perf(lexer): try perfect hash table for keyword lookup by YangchenYe323 · Pull Request #171 · oxc-project/oxc
Boshen
changed the title
perf(lexer): try perfect hash table based on CityHash
perf(lexer): try perfect hash table for keyword lookup
Boshen
deleted the
yangchen/perf/perfect-fast-hash
branch
Boshen added a commit that referenced this pull request
Oct 7, 2025Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses.
## Changes
**Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`.
**Before**:
```rust
pub fn is_any_keyword(self) -> bool {
matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8)
|| matches!(self, True | False | Null) // Extra check needed!
}
```
**After**:
```rust
pub fn is_any_keyword(self) -> bool {
matches!(self as u8, x if x >= Await as u8 && x <= Null as u8) // Pure range check!
}
```
## Assembly Impact
### Before (range + OR)
```asm
; Range check for Await..=Yield
and w8, w0, #0xff
sub w8, w8, #5
cmp w8, #86
cset w9, lo
; Additional checks for True/False/Null
cmp w0, #168
ccmp w0, #171, #0, ne
cset w0, lo
orr w0, w9, w0 ; Combine results
```
**7 instructions**
### After (pure range)
```asm
and w8, w0, #0xff
sub w8, w8, #5
cmp w8, #89
cset w0, lo
```
**4 instructions** (43% reduction!)
## Why This Works
`True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec:
- Keywords: Reserved words that cannot be used as identifiers
- Literals: Values with specific meanings
Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness.
## Performance
- Called from `advance()` on **every token**
- **3 fewer instructions** on the hottest path
- **Simpler control flow** = better branch prediction
- **No OR operation** = faster execution
Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Boshen added a commit that referenced this pull request
Oct 7, 2025Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses.
## Changes
**Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`.
**Before**:
```rust
pub fn is_any_keyword(self) -> bool {
matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8)
|| matches!(self, True | False | Null) // Extra check needed!
}
```
**After**:
```rust
pub fn is_any_keyword(self) -> bool {
matches!(self as u8, x if x >= Await as u8 && x <= Null as u8) // Pure range check!
}
```
## Assembly Impact
### Before (range + OR)
```asm
; Range check for Await..=Yield
and w8, w0, #0xff
sub w8, w8, #5
cmp w8, #86
cset w9, lo
; Additional checks for True/False/Null
cmp w0, #168
ccmp w0, #171, #0, ne
cset w0, lo
orr w0, w9, w0 ; Combine results
```
**7 instructions**
### After (pure range)
```asm
and w8, w0, #0xff
sub w8, w8, #5
cmp w8, #89
cset w0, lo
```
**4 instructions** (43% reduction!)
## Why This Works
`True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec:
- Keywords: Reserved words that cannot be used as identifiers
- Literals: Values with specific meanings
Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness.
## Performance
- Called from `advance()` on **every token**
- **3 fewer instructions** on the hottest path
- **Simpler control flow** = better branch prediction
- **No OR operation** = faster execution
Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters