Change parser state only after .await points to prevent state corruption by Mingun · Pull Request #935 · tafia/quick-xml

Mingun · 2026-02-12T19:07:14Z

See explanation in #624 (comment)

Unfortunately, I cannot create stable test for this, so this bugfix without tests.

…t state corruption after canceling of async reading

Mingun · 2026-02-12T19:16:00Z

Failing Fuzzing pipeline is due to google/oss-fuzz#14945

codecov-commenter · 2026-02-12T19:20:03Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.98%. Comparing base (d1acdb5) to head (a765dca).
⚠️ Report is 33 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #935      +/-   ##
==========================================
- Coverage   58.08%   54.98%   -3.11%     
==========================================
  Files          42       44       +2     
  Lines       15513    16776    +1263     
==========================================
+ Hits         9011     9224     +213     
- Misses       6502     7552    +1050

Flag	Coverage Δ
unittests	`54.98% <ø> (-3.11%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Mingun · 2026-02-13T06:30:34Z

After reflection, I think that still this PR does not fix the problem. It is that we do not save the state of the parser between .await points. Thus, if an interruption occurred at this point, we could read some piece of data from AsyncRead to the buffer, but we do not reflect that in our ParseState -- each variant hold the information only about that we in start of the next event, but not that we inside it.

I was already working on rewriting the parser in the style of "reading to the buffer separately, then feed the read piece to the parser" and suddenly this work became relevant to fix this problem. Parser will maintain state after consuming piece of data, so we will never lost it.

So I mark this a draft for now.

adamreichold · 2026-02-13T07:08:29Z

I was already working on rewriting the parser in the style of "reading to the buffer separately, then feed the read piece to the parser" and suddenly this work became relevant to fix this problem.

Sorry for the random tangent: Did you consider using async instead of manual state machines to write your parsers? I think this could very much simplify them, basing them off an async trait like AsyncBufRead and keeping the code in a much more readable recursive descent parser style with the compiler building those state machines for you. (Sync usage would then just need a little shim executor to reimplement the current "feed" style interface.)

Mingun · 2026-02-13T08:20:49Z

I don't quite understand the question. async is needed to wait for data from somewhere, when parsing we do not need to await anything, except the data, but getting data is independent of its parsing. For example, when we parse from slice we already have all needed data in the memory.

adamreichold · 2026-02-13T10:06:02Z

async can also be used to automatically generate the state machines currently open-coded via Parser::feed so that instead of manually managing a state enum, you just write imperative style code with the state enum a.k.a future being built by the Rust compiler so that you can keep writing sans I/O parsers but write them in recursive descent style. In that approach, the parser would use async in its implementation (but not its API) even when the source is not async at all.

Mingun · 2026-02-13T13:54:54Z

It is still unclear for me how the async may help with managing parse states. Could you provide, for example, how parsing of <tag attribute1=">" attribute2='<'> could look? We need 3 states here:

Outside -- outside of any quoted region
DoubleQ -- inside double-quoted region
SingleQ -- inside single-quoted region

adamreichold · 2026-02-13T14:38:22Z

It is still unclear for me how the async may help with managing parse states. Could you provide, for example, how parsing of <tag attribute1=">" attribute2='<'> could look? We need 3 states here:
* Outside -- outside of any quoted region

* DoubleQ -- inside double-quoted region

* SingleQ -- inside single-quoted region

Instead of a signature like

fn feed(&mut self, bytes: &[u8]) -> Option<usize>;

which tracks state via self, you'd track state via control flow, e.g.

async fn parse_quoted(read: &mut AsyncBufRead) -> Result<Vec<u8>> {
    let quote = expect_oneof(read, [b'\'', b'"']).await?;

    let mut buf = Vec::new();
    read.read_until(quote, &mut buf).await?;

    Ok(buf)
}

where the compiler will effectively create that state enum for you when it prepares the future to suspend after each suspension/await point.

This is just how I/O-based recursive descent parsers have been written forever and their main upside is that the code flow represents what symbols are expected where and when which is usually simpler for humans to understand than an explicit state machine.

The main trick here would to wrap this using a custom implementation of AsyncBufRead and trivial executor, so that one would could present this as a sans I/O interface on the outside, e.g.

fn feed(&mut self, bytes: &[u8]) -> Result<usize> {
   self.bytes = bytes;

   let mut cx = std::task::Context::from_waker(std::task::Waker::noop());

   match self.parser.as_mut().poll(&cx) {
     Poll::Pending => { Ok(self.consumed) }
     Poll::Ready(Ok(())) => { self.finished = true; Ok(self.consumed) }
     Poll::Ready(Err(err))) => { Err(err) }
   }
}

with some suitably constructed future stored in self.parser which was passed an AsyncBufRead implementation that is essentially backed by self.bytes and self.consumed. (Yes, there is a bit of hand-waving to safely making the sharing happen behind the scenes here.)

This would turn this around making the parser async-first and then derive the sync version from it, but just as a means to get state-management-via-control-flow known from classic recursive descent parsers.

Fix tafia#624: change parser state only after .await points to preven…

a765dca

…t state corruption after canceling of async reading

Mingun added the bug label Feb 12, 2026

dralley approved these changes Feb 13, 2026

View reviewed changes

Mingun marked this pull request as draft February 13, 2026 06:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change parser state only after .await points to prevent state corruption#935

Change parser state only after .await points to prevent state corruption#935
Mingun wants to merge 1 commit intotafia:masterfrom
Mingun:fix-async-cancel-safety

Mingun commented Feb 12, 2026

Uh oh!

Mingun commented Feb 12, 2026

Uh oh!

codecov-commenter commented Feb 12, 2026

Uh oh!

Mingun commented Feb 13, 2026

Uh oh!

adamreichold commented Feb 13, 2026

Uh oh!

Mingun commented Feb 13, 2026

Uh oh!

adamreichold commented Feb 13, 2026

Uh oh!

Mingun commented Feb 13, 2026

Uh oh!

adamreichold commented Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Mingun commented Feb 12, 2026

Uh oh!

Mingun commented Feb 12, 2026

Uh oh!

codecov-commenter commented Feb 12, 2026

Codecov Report

Uh oh!

Mingun commented Feb 13, 2026

Uh oh!

adamreichold commented Feb 13, 2026

Uh oh!

Mingun commented Feb 13, 2026

Uh oh!

adamreichold commented Feb 13, 2026

Uh oh!

Mingun commented Feb 13, 2026

Uh oh!

adamreichold commented Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants