perf(lexer): try perfect hash table for keyword lookup by YangchenYe323 · Pull Request #171 · oxc-project/oxc

@Boshen Boshen changed the title perf(lexer): try perfect hash table based on CityHash perf(lexer): try perfect hash table for keyword lookup

Mar 12, 2023

Boshen

YoniFeng

YoniFeng

@YangchenYe323

@YangchenYe323

@YangchenYe323

@YangchenYe323

@Boshen Boshen deleted the yangchen/perf/perfect-fast-hash branch

April 15, 2023 10:26

Boshen added a commit that referenced this pull request

Oct 7, 2025
Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses.

## Changes

**Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`.

**Before**:
```rust
pub fn is_any_keyword(self) -> bool {
    matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8)
        || matches!(self, True | False | Null)  // Extra check needed!
}
```

**After**:
```rust
pub fn is_any_keyword(self) -> bool {
    matches!(self as u8, x if x >= Await as u8 && x <= Null as u8)  // Pure range check!
}
```

## Assembly Impact

### Before (range + OR)
```asm
; Range check for Await..=Yield
and   w8, w0, #0xff
sub   w8, w8, #5
cmp   w8, #86
cset  w9, lo
; Additional checks for True/False/Null
cmp   w0, #168
ccmp  w0, #171, #0, ne
cset  w0, lo
orr   w0, w9, w0      ; Combine results
```
**7 instructions**

### After (pure range)
```asm
and   w8, w0, #0xff
sub   w8, w8, #5
cmp   w8, #89
cset  w0, lo
```
**4 instructions** (43% reduction!)

## Why This Works

`True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec:
- Keywords: Reserved words that cannot be used as identifiers
- Literals: Values with specific meanings

Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness.

## Performance

- Called from `advance()` on **every token**
- **3 fewer instructions** on the hottest path
- **Simpler control flow** = better branch prediction
- **No OR operation** = faster execution

Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Boshen added a commit that referenced this pull request

Oct 7, 2025
Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses.

## Changes

**Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`.

**Before**:
```rust
pub fn is_any_keyword(self) -> bool {
    matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8)
        || matches!(self, True | False | Null)  // Extra check needed!
}
```

**After**:
```rust
pub fn is_any_keyword(self) -> bool {
    matches!(self as u8, x if x >= Await as u8 && x <= Null as u8)  // Pure range check!
}
```

## Assembly Impact

### Before (range + OR)
```asm
; Range check for Await..=Yield
and   w8, w0, #0xff
sub   w8, w8, #5
cmp   w8, #86
cset  w9, lo
; Additional checks for True/False/Null
cmp   w0, #168
ccmp  w0, #171, #0, ne
cset  w0, lo
orr   w0, w9, w0      ; Combine results
```
**7 instructions**

### After (pure range)
```asm
and   w8, w0, #0xff
sub   w8, w8, #5
cmp   w8, #89
cset  w0, lo
```
**4 instructions** (43% reduction!)

## Why This Works

`True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec:
- Keywords: Reserved words that cannot be used as identifiers
- Literals: Values with specific meanings

Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness.

## Performance

- Called from `advance()` on **every token**
- **3 fewer instructions** on the hottest path
- **Simpler control flow** = better branch prediction
- **No OR operation** = faster execution

Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>