Enhance Lexer to Attach Span Positions to Tokens #6

Closed
opened 2026-03-11 18:03:41 +03:00 by NiXTheDev · 0 comments
NiXTheDev commented 2026-03-11 18:03:41 +03:00 (Migrated from github.com)

Goal: Improve error messages by tracking the exact source location of each token.

Currently, the lexer in lexer.rs likely produces tokens without position information. This makes it impossible to report errors with precise locations.

Proposed Change:

  • Define a Token struct that contains the token kind and a Span (start..end).
  • Update the lexer to record positions as it consumes characters.
  • Modify the parser to accept spanned tokens and propagate spans to AST nodes (optional, but helpful for later stages).
  • In error.rs, use these spans to create SpannedError instances.

Example:

pub struct Span {
    pub start: usize, // byte offset of the token's first character
    pub end: usize,   // byte offset one past the token's last character
}

pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

pub enum TokenKind {
    Literal(char),
    LParen,
    RParen,
    // ...
}

Implementation Steps:

  1. Refactor lexer::Lexer to keep a current position counter.
  2. For each token, record the start and end positions.
  3. Change the parser to consume Token instead of just TokenKind.
  4. Update error reporting in the parser to use token spans.
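Steps 1–2 can be sketched as follows. This is a minimal illustration, not the actual contents of `lexer.rs`: the `Lexer` shape, the `next_token` method, and the single-character token set are assumptions chosen for brevity, and the definitions are repeated so the sketch is self-contained.

```rust
/// Hypothetical sketch: a lexer that records byte offsets while scanning.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind {
    Literal(char),
    LParen,
    RParen,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

pub struct Lexer<'a> {
    // CharIndices yields (byte_offset, char) pairs, which gives us
    // the position counter from step 1 for free.
    chars: std::str::CharIndices<'a>,
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Self {
        Self { chars: input.char_indices() }
    }

    /// Returns the next token with its byte span, or None at end of input.
    pub fn next_token(&mut self) -> Option<Token> {
        let (start, c) = self.chars.next()?;
        // Step 2: end position is the start plus the char's UTF-8 width.
        let end = start + c.len_utf8();
        let kind = match c {
            '(' => TokenKind::LParen,
            ')' => TokenKind::RParen,
            other => TokenKind::Literal(other),
        };
        Some(Token { kind, span: Span { start, end } })
    }
}
```

Using byte offsets (rather than char counts) keeps spans cheap to compute and directly usable for slicing the source when rendering errors.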

Benefits:

  • Errors like "unexpected character ']' at position 15" become possible.
  • Can eventually be used in IDEs/plugins to highlight errors.
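To show how the first benefit falls out of spans, here is a hedged sketch of what a `SpannedError` in `error.rs` might look like; the struct shape and `Display` wording are assumptions for illustration, not existing code.

```rust
use std::fmt;

// Assumed minimal span type, repeated here for self-containment.
#[derive(Debug, Clone, Copy)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical error type carrying the offending token's span.
#[derive(Debug)]
pub struct SpannedError {
    pub message: String,
    pub span: Span,
}

impl fmt::Display for SpannedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render the error with the position recorded by the lexer.
        write!(f, "{} at position {}", self.message, self.span.start)
    }
}
```

With this, a parser that sees a stray `]` whose token spans bytes 15..16 can produce exactly the message mentioned above: `unexpected character ']' at position 15`.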