Add research about specfile parsing service#229
Add research about specfile parsing service#229lbarcziova wants to merge 2 commits intopackit:mainfrom
Conversation
Assisted-by: Claude Opus 4.6 noreply@anthropic.com
|
IIUC this assumes spec file content would always come from the client, but what about other files, would there be a shared directory? We should be careful not to introduce a gap in the isolation. I don't think the proposed caching can work in practice. For example it doesn't consider multiple |
do you mean for external files defined in %include/%load mentinoed in the open questions, or something else?
good point, will look into this more. |
I meant dist-git content in general, but mainly sources. Yes, also those to be included/loaded, but more importantly common ones like tarballs, patches, signatures etc. We could leave out those that have no effect on spec parsing, but that's not so straightforward to determine. |
|
I assumed those are not necessarily available even currently, e.g. when parsing upstream spec files. From quick analysis with Opus: For both upstream and dist-git, specfile is parsed before source tarballs are downloaded.
Upstream repo (sourcedir = upstream repo working dir):
- Source code files: present (it's the upstream repo)
- Patches: present if committed to upstream
- Source tarballs: not present (not downloaded yet)
Dist-git repo (sourcedir = dist-git repo root):
- Patches: present (committed to dist-git)
- Source tarballs: not present (in lookaside cache, downloaded later)
- %include/%{load:...} files: present if committed
The service would differ from current behavior only in that patches and other git-tracked files wouldn't be in sourcedir. But since rpm.spec() with RPMSPEC_ANYARCH does a non-build parse (doesn't execute %prep), it only checks file existence — dummy files satisfy this.
The one real gap remains %include/%{load:...}, where file content matters. |
|
I mean, upstream spec files are usually written in a way not to require files that are not there. Or such files are somehow fetched or generated in actions. But I thought we are implementing a general solution. |
Assisted-by: Claude Opus 4.6 noreply@anthropic.com
yes, but as I mentioned in the previous comment, even for dist-git repos for parsing the specfile we don't necessarily have all the sources downloaded. But you are right, it is a fair point to include it since we are starting here from scratch, added notes on this. |
| | `%include`/`%{load:...}` targets | If committed | Yes (content as spec input) | Handled by `force_parse` dummy mechanism, same as when missing today | | ||
| | Files read by `%(...)` / `%{lua:...}` | If committed | Yes (arbitrary shell/Lua code) | Regression — see examples below | | ||
|
|
||
| The actual gap: committed source files explicitly read by `%(...)` or `%{lua:...}` during parsing. |
There was a problem hiding this comment.
%include/%load belong here as well. force_parse=True is best effort, dummy files are generated but there is a (very) high chance it won't work anyway.
|
|
||
| ### Option A: Spec content only | ||
|
|
||
| Send only the spec file text. No source files. Core packit operations (Name, Version, Release, source URLs, changelog) are unaffected — these tags rarely depend on external file content. Regresses parsing for the small set of specs that read committed files via `%(...)` / `%{lua:...}`. `force_parse` prevents hard failures. |
There was a problem hiding this comment.
force_parseprevents hard failures.
It doesn't. Like I said above, it can help, but if e.g. a missing macro from a file to be loaded/included leads to a broken syntax, parsing will still end early with RPMException.
No description provided.