faputa

A stateful parser generator that compiles .faputa grammar files into fast winnow-based Rust parsers.

The goal is to make context-sensitive grammars (the kind that trip up pure PEG parsers) expressible and readable. Think Markdown formatting rules, indentation tracking, or nesting-depth limits: constructs that require remembering what you have already seen while parsing.

Features

PEG-style grammar DSL — sequences, ordered choice, repetition, lookahead, character ranges, and built-in boundaries (SOI, EOI, ANY, LINE_START, LINE_END)
Stateful parsing — first-class flag and counter declarations with scoped mutation (with), committed measurement (measure), conditional dispatch (when / if ... else), rule-level guards, and rollback-safe recursion depth limits
Compile-time codegen — generates Rust code via derive macro or build script; no runtime interpretation overhead
IR optimization pipeline — rule inlining, literal fusion, CharSet merging, TakeWhile recognition (maps to winnow's SIMD-accelerated take_while), and dispatch table generation
Custom error messages — @ label syntax at rule and expression level for user-friendly parse errors with accurate source positions
Tracing — RUST_LOG=debug shows the full compilation pipeline; the debug feature enables winnow's runtime parse-tree tracing

Quick Start

Derive Macro

[dependencies]
faputa = "0.1"
faputa_derive = "0.1"

Write a grammar file (grammar.faputa):

alpha  = { 'a'..'z' | 'A'..'Z' | "_" }
digit  = { '0'..'9' }
ident  = { alpha (alpha | digit)* }
number = { digit+ }
assign = { ident "=" (number | ident) }

Derive the parser:

use faputa_derive::Parser;

#[derive(Parser)]
#[grammar("grammar.faputa")]
struct MyParser;

fn main() {
    match MyParser::parse_assign("x=42") {
        Ok(matched) => println!("parsed: {matched}"),
        Err(e) => eprintln!("{e}"),
    }
}

Each rule generates a parse_<name>(&str) -> Result<&str, String> method.

`build.rs` Codegen

If you prefer generating sources at build time (pest-style):

[dependencies]
faputa = "0.1"

[build-dependencies]
faputa_meta = "0.1"
faputa_generator = "0.1"
prettyplease = "0.2"
syn = "2"

// build.rs
use std::path::Path;

fn main() {
    let source = std::fs::read_to_string("grammar.faputa").unwrap();
    let grammar = faputa_meta::compile(&source).unwrap();
    let tokens = faputa_generator::generate(&grammar);

    let code = match syn::parse2::<syn::File>(tokens.clone()) {
        Ok(file) => prettyplease::unparse(&file),
        Err(_) => tokens.to_string(),
    };

    let out_file = Path::new(&std::env::var("OUT_DIR").unwrap()).join("grammar.rs");
    std::fs::write(&out_file, code).unwrap();

    println!("cargo::rerun-if-changed=grammar.faputa");
}

// main.rs
include!(concat!(env!("OUT_DIR"), "/grammar.rs"));

fn main() {
    let parsed = __faputa::parse_assign("x=42").unwrap();
    println!("{parsed}");
}

Grammar Syntax

Terminals

"hello"              // literal string
'a'..'z'             // character range (inclusive)
ANY                  // any single character
SOI  EOI             // start / end of input
LINE_START  LINE_END // line boundaries (zero-width)

Combinators

a b c                // sequence
a | b | c            // ordered choice
a*                   // zero or more
a+                   // one or more
a?                   // optional
a{3}                 // exactly 3
a{2,5}               // 2 to 5
a{2,}                // at least 2
a{width}             // exactly counter `width`
a{indent,limit}      // runtime counter range
&a                   // positive lookahead (zero-width)
!a                   // negative lookahead (zero-width)
(a | b) c            // grouping
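
To make the semantics of bounded repetition and zero-width lookahead concrete, here is a minimal hand-written sketch in plain Rust. It does not use faputa or winnow, and the names repeat_range and not_followed_by are hypothetical; the code faputa actually generates will look different.

```rust
/// Hand-written analogue of `a{min,max}`: greedily matches characters that
/// satisfy `pred`, at most `max` times, and fails if fewer than `min` match.
/// Returns `(matched, rest)` on success. (Hypothetical helper, not faputa API.)
fn repeat_range(
    input: &str,
    min: usize,
    max: usize,
    pred: fn(char) -> bool,
) -> Option<(&str, &str)> {
    let mut count = 0;
    let mut end = 0;
    for (idx, ch) in input.char_indices() {
        if count == max || !pred(ch) {
            break;
        }
        count += 1;
        end = idx + ch.len_utf8();
    }
    if count >= min {
        Some(input.split_at(end))
    } else {
        None
    }
}

/// Hand-written analogue of `!"literal"`: succeeds when the input does not
/// start with `literal`, and never consumes anything (zero-width).
fn not_followed_by(input: &str, literal: &str) -> bool {
    !input.starts_with(literal)
}
```

For instance, repeat_range("aaab", 2, 5, |c| c == 'a') yields the pair ("aaa", "b"), while the same call on "ab" fails because only one repetition is available.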

State declarations

Declare state at the top of the grammar, before any rules:

let flag    my_flag     // boolean, default false
let counter my_counter  // unsigned integer, default 0

Stateful expressions

with my_flag { ... }           // set flag to true inside block, restore on exit
with my_counter += 2 { ... }   // increment counter inside block, decrement on exit
with my_counter += width { ... } // runtime counter amount
measure width { ... }          // store consumed character count on success
when my_flag { ... }           // run body only if flag is true, else succeed empty
when !my_flag { ... }          // run body only if flag is false
when my_counter > 0 { ... }    // run body only if counter satisfies condition
if my_counter >= width { ... } else { ... } // true/false branch
depth_limit(64) { ... }        // fail if recursion exceeds 64 levels
depth_limit(limit) { ... }     // runtime counter limit

Supported comparison operators in when: ==, !=, <, <=, >, >=. Counter expressions are allowed on the right-hand side of comparisons and in repeat bounds.
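
As a rough mental model for with and when, the following plain-Rust sketch shows the behavior described above: a scoped flag that is restored on exit, and a conditional that succeeds with an empty match when its condition is false. The State struct and the helpers with_flag and when are assumptions made for illustration and are not part of faputa's generated code or runtime.

```rust
/// Illustrative parser state with a single flag (hypothetical, not faputa's).
#[derive(Debug, Default, Clone, PartialEq)]
struct State {
    my_flag: bool,
}

/// Analogue of `with my_flag { body }`: set the flag, run the body, then
/// restore the previous value whether the body succeeded or failed.
fn with_flag<T>(
    state: &mut State,
    body: impl FnOnce(&mut State) -> Option<T>,
) -> Option<T> {
    let previous = state.my_flag;
    state.my_flag = true;
    let result = body(state);
    state.my_flag = previous;
    result
}

/// Analogue of `when cond { body }`: run the body only if `cond` holds,
/// otherwise succeed with an empty match and leave the input untouched.
fn when<'a>(
    cond: bool,
    input: &'a str,
    body: impl FnOnce(&'a str) -> Option<(&'a str, &'a str)>,
) -> Option<(&'a str, &'a str)> {
    if cond {
        body(input)
    } else {
        Some(("", input))
    }
}
```

Saving the previous value instead of unconditionally resetting to false keeps nested with blocks on the same flag well-behaved in this sketch.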

Rule-level statements

These appear at the top of a rule body, before the main expression:

rule = {
    guard my_flag          // fail immediately if flag is not set
    guard !my_flag         // fail immediately if flag is set
    guard my_counter > 0   // fail immediately if condition does not hold
    guard LINE_START       // fail immediately if not at start of line
    inc my_counter         // increment counter by 1 on the current parse path
    ...
}

State mutations participate in backtracking. If a branch fails and rewinds, its counter updates are rolled back too; scoped with mutations still restore on both success and failure.
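
One simple way to picture rollback-safe state is a checkpoint taken before each alternative of an ordered choice. The sketch below is plain Rust under that assumption; State, Branch, choice, and the two branch functions are hypothetical names, and faputa's real mechanism may be implemented differently.

```rust
/// Illustrative parser state with one counter (hypothetical, not faputa's).
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct State {
    my_counter: u32,
}

/// A branch returns the number of bytes it consumed, or `None` on failure.
type Branch = fn(&str, &mut State) -> Option<usize>;

/// Ordered choice with rollback: snapshot the state before each branch and
/// restore it if that branch fails, so failed attempts leave no trace.
fn choice(input: &str, state: &mut State, branches: &[Branch]) -> Option<usize> {
    for branch in branches {
        let checkpoint = *state;
        if let Some(consumed) = branch(input, state) {
            return Some(consumed);
        }
        *state = checkpoint;
    }
    None
}

/// Increments the counter first (like `inc my_counter`), then requires "ab".
/// It can therefore fail after having already mutated the state.
fn inc_then_ab(input: &str, state: &mut State) -> Option<usize> {
    state.my_counter += 1;
    input.starts_with("ab").then_some(2)
}

/// Matches a single "a" and leaves the state untouched.
fn just_a(input: &str, _state: &mut State) -> Option<usize> {
    input.starts_with('a').then_some(1)
}
```

With the branches [inc_then_ab, just_a], parsing "ac" falls through to the second branch and the counter stays at 0, because the increment made by the failed first branch is undone.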

Error labels

ident  = @ "identifier" { alpha (alpha | digit)* }   // rule-level label
number = @ "number"     { digit+ }

assign = {
    ident "=" @ "right-hand side" (number | ident)   // expression-level label
}

Stateful Parsing Examples

Preventing re-entrant inline formatting (Markdown-style)

Flags prevent bold/italic from nesting inside themselves:

let flag inside_bold
let flag inside_italic

inline = { bold | italic | text }

bold = {
    guard !inside_bold
    with inside_bold {
        "**" inline+ "**"
    }
}

italic = {
    guard !inside_italic
    with inside_italic {
        "*" inline+ "*"
    }
}

text = { !("*") ANY }

Tracking nesting depth

Counters track depth; depth_limit prevents runaway recursion:

let counter brace_depth

document = {
    depth_limit(64) { block* }
}

block = {
    "{" with brace_depth += 1 { block* } "}"
}
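
For comparison, a hand-written recursive-descent version of the same idea might look like the sketch below. It is plain Rust without faputa, parse_blocks is a hypothetical name, and it reports the limit as an error string, which is a simplification and not necessarily how generated parsers surface a depth_limit failure.

```rust
/// Parses zero or more nested `{ ... }` blocks at the start of `input` and
/// returns how many bytes were consumed. `depth` is the current nesting
/// level; exceeding `limit` is reported as an error instead of recursing.
fn parse_blocks(input: &str, depth: usize, limit: usize) -> Result<usize, String> {
    let bytes = input.as_bytes();
    let mut pos = 0;
    while pos < bytes.len() && bytes[pos] == b'{' {
        if depth + 1 > limit {
            return Err(format!("nesting deeper than {limit} levels"));
        }
        // Recurse into the block body with the depth bumped by one.
        let inner = parse_blocks(&input[pos + 1..], depth + 1, limit)?;
        let close = pos + 1 + inner;
        if bytes.get(close) != Some(&b'}') {
            return Err(format!("expected '}}' at byte {close}"));
        }
        pos = close + 1;
    }
    Ok(pos)
}
```

Calling parse_blocks("{{}{}}", 0, 64) consumes all six bytes, while parse_blocks("{{{}}}", 0, 2) returns an error because the third level exceeds the limit.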

Conditional parsing based on position

Built-in predicates work inside guard and when:

let counter section_count

header = {
    guard LINE_START
    inc section_count
    "#"{1,6} " " text
}
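
The LINE_START guard can be thought of as a zero-width check on the byte before the current position. The plain-Rust sketch below illustrates that reading; at_line_start and header_level are hypothetical helpers, the sketch only recognizes the "#"{1,6} " " prefix, and it assumes "\n" line endings.

```rust
/// True when `pos` is the start of the input or directly follows a newline.
/// Zero-width: it inspects the input but consumes nothing.
fn at_line_start(input: &str, pos: usize) -> bool {
    pos == 0 || input.as_bytes().get(pos - 1) == Some(&b'\n')
}

/// Hand-written analogue of the `header` prefix: requires a line start, then
/// one to six '#' characters followed by a space. On success it bumps
/// `section_count` and returns the heading level.
fn header_level(input: &str, pos: usize, section_count: &mut u32) -> Option<usize> {
    if !at_line_start(input, pos) {
        return None; // corresponds to `guard LINE_START`
    }
    let rest = input.get(pos..)?;
    let level = rest.bytes().take_while(|&b| b == b'#').count();
    if !(1..=6).contains(&level) || rest.as_bytes().get(level) != Some(&b' ') {
        return None;
    }
    // Only count the section once the whole prefix has matched, which gives
    // the same visible result as incrementing early and rolling back on failure.
    *section_count += 1;
    Some(level)
}
```

A '#' in the middle of a line is rejected by the guard before any other matching happens, so the counter is not touched.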

Error Labels

Without labels, errors show raw rule names. With @ labels you control the message:

ident  = @ "identifier" { alpha (alpha | digit)* }
number = @ "number" { digit+ }

value  = @ "value" {
    number @ "a number"
  | ident  @ "an identifier"
}

assign = { ident "=" @ "right-hand side" value }

A failed parse at x= would report:

parse error at 1:3
expected right-hand side (a number or an identifier)
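
The 1:3 position is a 1-based line and column derived from the byte offset where parsing stopped. A minimal sketch of that conversion in plain Rust follows; line_col is a hypothetical helper written for illustration and is not the LineIndex type that faputa ships.

```rust
/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns are counted in characters. Assumes `offset` lies on a character
/// boundary and that lines are separated by '\n'.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset.min(source.len())];
    let line = before.matches('\n').count() + 1;
    // Byte index just past the last newline before `offset`, or 0 on line 1.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}
```

For the input x=, the parser stops at byte offset 2 (end of input), and line_col("x=", 2) gives (1, 3), matching the message above.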

Examples

| Example | Description | Run |
|---|---|---|
| `parse_demo` | Assignment parser | `cargo run -p parse_demo -- file.txt` |
| `parse_json` | Full RFC 8259 JSON | `cargo run -p parse_json -- file.json` |
| `error_labels` | Custom error messages | `cargo run -p error_labels -- file.txt` |

Crate Structure

| Crate | Purpose |
|---|---|
| `faputa` | Runtime: re-exports winnow types, `LineIndex`, `State` trait |
| `faputa_meta` | Lexer → Parser → Validator → HIR → MIR → Optimizer |
| `faputa_generator` | MIR → Rust/winnow codegen |
| `faputa_derive` | `#[derive(Parser)]` proc macro |

Feature Flags

| Feature | Effect |
|---|---|
| `debug` | Enables winnow's `trace()` combinator, which prints the parse tree to stderr at runtime |

faputa = { version = "0.1", features = ["debug"] }

Tracing

All example binaries include a tracing subscriber. Set RUST_LOG to see compilation internals:

RUST_LOG=debug cargo run -p parse_json -- file.json

Output includes each pipeline stage: lex → parse → validate → HIR lower → HIR optimize → MIR lower → MIR optimize → codegen.

Publishing

cargo xtask publish-dry  # dry run
cargo xtask publish      # publish to crates.io

Crates are published in dependency order: faputa → faputa_meta → faputa_generator → faputa_derive.

License

Apache-2.0
