0.1.1 Plan #2

Closed
opened 2026-03-11 17:59:26 +03:00 by NiXTheDev · 0 comments
NiXTheDev commented 2026-03-11 17:59:26 +03:00 (Migrated from github.com)

Summary of Improvements

Architecture & Core Logic

  • Simplify Quantifier variants: Unify quantifier enum with greediness flag to reduce match arms.
  • Enhance backtracking with memoization: Add caching to NFA simulation for exponential-case protection.
  • Optimize group storage: Replace per-state HashMap with dense vector or snapshot mechanism.

Parser & Lexer Improvements

  • Span-aware lexer: Attach position information to tokens for precise error messages.
  • Better parser errors: Wrap parse errors with SpannedError using current position.

Safety & FFI

  • FFI panic safety: Ensure all exported functions use catch_unwind or are panic-free.
  • Document const pointer behavior: Clarify ownership of string returned by ogex_version.
  • WASM detailed errors: Provide structured error objects with span info.

Performance Optimizations

  • Character class lookup: Preprocess classes into bitset/table for O(1) matching.
  • Pre-calculate epsilon closures: Cache epsilon closures per state to reduce runtime computation.
  • Avoid UTF-8 decoding: Operate directly on byte slices for ASCII-heavy patterns.

Testing & Reliability

  • Property-based tests: Use proptest to validate engine invariants with random patterns.
  • Backreference edge cases: Test relative backreferences with missing groups and complex interactions.
  • ReDoS testing: Include pathological patterns to ensure backtracking limits are effective.

Codebase Maintainability

  • Unify lookaround implementation: Use a trait for assertions to keep matching logic modular.
  • Improve documentation: Add more examples to replace.rs and groups.rs.
  • Benchmark suite: Set up Criterion benchmarks to track performance regressions.

Quality-of-Life Improvements

  • CLI --explain flag: Show step-by-step matching process for debugging.
  • Transpiler round-trip: Convert legacy regex syntax to Ogex to ease migration.
# Summary of Improvements ## Architecture & Core Logic - **Simplify Quantifier variants**: Unify quantifier enum with greediness flag to reduce match arms. - **Enhance backtracking with memoization**: Add caching to NFA simulation for exponential-case protection. - **Optimize group storage**: Replace per-state `HashMap` with dense vector or snapshot mechanism. ## Parser & Lexer Improvements - **Span-aware lexer**: Attach position information to tokens for precise error messages. - **Better parser errors**: Wrap parse errors with `SpannedError` using current position. ## Safety & FFI - **FFI panic safety**: Ensure all exported functions use `catch_unwind` or are panic-free. - **Document const pointer behavior**: Clarify ownership of string returned by `ogex_version`. - **WASM detailed errors**: Provide structured error objects with span info. ## Performance Optimizations - **Character class lookup**: Preprocess classes into bitset/table for O(1) matching. - **Pre-calculate epsilon closures**: Cache epsilon closures per state to reduce runtime computation. - **Avoid UTF-8 decoding**: Operate directly on byte slices for ASCII-heavy patterns. ## Testing & Reliability - **Property-based tests**: Use `proptest` to validate engine invariants with random patterns. - **Backreference edge cases**: Test relative backreferences with missing groups and complex interactions. - **ReDoS testing**: Include pathological patterns to ensure backtracking limits are effective. ## Codebase Maintainability - **Unify lookaround implementation**: Use a trait for assertions to keep matching logic modular. - **Improve documentation**: Add more examples to `replace.rs` and `groups.rs`. - **Benchmark suite**: Set up Criterion benchmarks to track performance regressions. ## Quality-of-Life Improvements - **CLI `--explain` flag**: Show step-by-step matching process for debugging. - **Transpiler round-trip**: Convert legacy regex syntax to Ogex to ease migration.
Sign in to join this conversation.
No description provided.