
perf: improve tokenizer scanning and add benchmark suite #51

Open
smith558 wants to merge 3 commits into fent:master from smith558:rescans-refactor

Conversation

@smith558

Summary

The tokenizer now parses the regex source directly instead of doing an upfront escaped-string rewrite and slice-based parsing for character classes and {m,n} repetitions.

What Changed

  • removed the up-front strToChars() pass from tokenization
  • replaced slice-based class parsing with an indexed class scanner
  • replaced regex-on-slice repetition parsing with an indexed {m,n} parser
  • consolidated escape and number parsing helpers to keep the hot path smaller
  • added light comments around the non-obvious tokenizer branches
  • added a repo benchmark suite covering:
    • tokenizer
    • reconstruct
    • roundtrip
  • documented benchmark usage in the README
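
The indexed class scanner replaces the old slice-and-rescan approach: instead of cutting the class body out of the source and parsing the copy, it walks the original string with a cursor and reports where the class ends. A minimal sketch of the idea (function name, return shape, and the simplified member handling are illustrative, not the actual implementation):

```javascript
// Scan a character class in place. `start` points at the character
// just after '['. Returns the parsed members and the index one past
// the closing ']' so the caller can resume scanning without a slice.
function scanClass(source, start) {
  let i = start;
  const negated = source[i] === "^";
  if (negated) i++;
  const members = [];
  while (i < source.length && source[i] !== "]") {
    let ch = source[i];
    if (ch === "\\") {
      // keep escapes as two-character members for illustration
      ch += source[i + 1];
      i += 2;
    } else {
      i += 1;
    }
    members.push(ch);
  }
  if (i >= source.length) throw new SyntaxError("Unterminated character class");
  return { negated, members, end: i + 1 }; // end = index after ']'
}
```

Because the scanner returns `end`, the caller advances its own cursor directly; no substring of the class body is ever allocated.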

Why

The main goal is to reduce per-parse allocations and avoid unnecessary rescans of the input string, especially on patterns with character classes and custom repetitions.
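
To make the rescan point concrete, here is a sketch of an indexed {m,n} parser in the spirit of this change: digits are read in place instead of matching a regex against a slice of the remaining input, so the hot path allocates no intermediate strings. Names and the exact return shape are hypothetical:

```javascript
// Parse a {m}, {m,} or {m,n} repetition in place. `start` points at the
// character just after '{'. Returns null when the braces are not a valid
// repetition (so the caller can treat '{' as a literal).
function parseRepetition(source, start) {
  let i = start;
  const readInt = () => {
    let n = -1;
    while (i < source.length && source[i] >= "0" && source[i] <= "9") {
      n = (n === -1 ? 0 : n * 10) + (source.charCodeAt(i) - 48);
      i++;
    }
    return n; // -1 means "no digits here"
  };
  const min = readInt();
  if (min === -1) return null; // e.g. '{x}' is a literal brace
  let max = min;
  if (source[i] === ",") {
    i++;
    max = readInt();
    if (max === -1) max = Infinity; // open-ended {m,}
  }
  if (source[i] !== "}") return null;
  return { min, max, end: i + 1 }; // end = index after '}'
}
```

A regex-on-slice version must first build `source.slice(start)` and then run a pattern over it; the indexed version touches only the characters it consumes.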

The benchmark suite is included so future tokenizer changes can be measured with representative workloads instead of one-off local scripts.
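
The core of a dependency-free harness like this is a timing loop that runs each case until a minimum time budget is spent, then reports a per-operation cost. A sketch of that loop (names are illustrative, not the suite's real API; assumes Node's `process.hrtime.bigint()`):

```javascript
// Run `fn` repeatedly for at least `minMs` milliseconds and report the
// mean cost per call in nanoseconds.
function benchCase(name, fn, minMs = 200) {
  // warm up so the JIT settles before timing starts
  for (let i = 0; i < 1000; i++) fn();
  let iterations = 0;
  let elapsedNs = 0n;
  const budgetNs = BigInt(minMs) * 1000000n;
  const start = process.hrtime.bigint();
  while (elapsedNs < budgetNs) {
    // batch calls so the clock is read once per 1000 iterations
    for (let i = 0; i < 1000; i++) fn();
    iterations += 1000;
    elapsedNs = process.hrtime.bigint() - start;
  }
  return { name, iterations, nsPerOp: Number(elapsedNs) / iterations };
}
```

Raising the budget (as `--min-ms` does below) trades runtime for lower variance between runs.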

Benchmark

I compared the current branch against HEAD~2 locally using the same benchmark case set.

Tokenizer results:

  • geometric mean speedup: 2.753x
  • arithmetic mean speedup: 2.975x

Selected tokenizer cases:

  • email-like: 4.922x
  • dense-sets: 4.027x
  • path-like: 3.447x
  • literal: 3.147x
  • class-heavy: 3.144x

Roundtrip results:

  • geometric mean speedup: 1.826x

reconstruct also benchmarked faster in this harness, but the tokenizer is the primary target of this change.
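
For reference, this is how the summary figures above are conventionally computed: each case's speedup is baseline time over branch time, the geometric mean is the n-th root of their product (taken via logs for numerical stability), and the arithmetic mean is the plain average. The sample numbers below are illustrative, not the benchmark data:

```javascript
// Geometric mean of per-case speedup ratios, computed in log space.
function geometricMean(speedups) {
  const logSum = speedups.reduce((acc, s) => acc + Math.log(s), 0);
  return Math.exp(logSum / speedups.length);
}

// Plain average of the same ratios.
function arithmeticMean(speedups) {
  return speedups.reduce((acc, s) => acc + s, 0) / speedups.length;
}
```

The geometric mean is the usual headline number for speedup ratios because one outlier case cannot dominate it the way it dominates an arithmetic mean.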

Testing

  • npm run build
  • npm test
  • npm run bench -- --min-ms 200

Notes

The benchmark suite is intentionally dependency-free and runs against the built dist output:

npm run bench
npm run bench -- --suite tokenizer --min-ms 750
