Change parser state only after .await points to prevent state corruption#935
Change parser state only after .await points to prevent state corruption#935Mingun wants to merge 1 commit intotafia:masterfrom
Conversation
…t state corruption after canceling of async reading
|
Failing Fuzzing pipeline is due to google/oss-fuzz#14945 |
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #935 +/- ##
==========================================
- Coverage 58.08% 54.98% -3.11%
==========================================
Files 42 44 +2
Lines 15513 16776 +1263
==========================================
+ Hits 9011 9224 +213
- Misses 6502 7552 +1050
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
After reflection, I think that still this PR does not fix the problem. It is that we do not save the state of the parser between I was already working on rewriting the parser in the style of "reading to the buffer separately, then feed the read piece to the parser" and suddenly this work became relevant to fix this problem. Parser will maintain state after consuming piece of data, so we will never lost it. So I mark this a draft for now. |
Sorry for the random tangent: Did you consider using |
|
I don't quite understand the question. |
|
async can also be used to automatically generate the state machines currently open-coded via |
|
It is still unclear for me how the async may help with managing parse states. Could you provide, for example, how parsing of
|
Instead of a signature like fn feed(&mut self, bytes: &[u8]) -> Option<usize>;which tracks state via async fn parse_quoted(read: &mut AsyncBufRead) -> Result<Vec<u8>> {
let quote = expect_oneof(read, [b'\'', b'"']).await?;
let mut buf = Vec::new();
read.read_until(quote, &mut buf).await?;
Ok(buf)
}where the compiler will effectively create that state enum for you when it prepares the future to suspend after each suspension/await point. This is just how I/O-based recursive descent parsers have been written forever and their main upside is that the code flow represents what symbols are expected where and when which is usually simpler for humans to understand than an explicit state machine. The main trick here would to wrap this using a custom implementation of fn feed(&mut self, bytes: &[u8]) -> Result<usize> {
self.bytes = bytes;
let mut cx = std::task::Context::from_waker(std::task::Waker::noop());
match self.parser.as_mut().poll(&cx) {
Poll::Pending => { Ok(self.consumed) }
Poll::Ready(Ok(())) => { self.finished = true; Ok(self.consumed) }
Poll::Ready(Err(err))) => { Err(err) }
}
}with some suitably constructed future stored in This would turn this around making the parser async-first and then derive the sync version from it, but just as a means to get state-management-via-control-flow known from classic recursive descent parsers. |
See explanation in #624 (comment)
Unfortunately, I cannot create stable test for this, so this bugfix without tests.