Skip to content

Add research about specfile parsing service#229

Open
lbarcziova wants to merge 2 commits intopackit:mainfrom
lbarcziova:specfile-parsing-service
Open

Add research about specfile parsing service#229
lbarcziova wants to merge 2 commits intopackit:mainfrom
lbarcziova:specfile-parsing-service

Conversation

@lbarcziova
Copy link
Copy Markdown
Member

No description provided.

Assisted-by: Claude Opus 4.6 noreply@anthropic.com
@nforro
Copy link
Copy Markdown
Member

nforro commented Mar 16, 2026

IIUC this assumes spec file content would always come from the client, but what about other files, would there be a shared directory? We should be careful not to introduce a gap in the isolation.

I don't think the proposed caching can work in practice. For example it doesn't consider multiple Specfile instances at all. The pre-caching seems rather naive and fragile, but maybe it could work, I'm not sure it's worth it though.

@lbarcziova
Copy link
Copy Markdown
Member Author

IIUC this assumes spec file content would always come from the client, but what about other files, would there be a shared directory? We should be careful not to introduce a gap in the isolation.

do you mean for external files defined in %include/%load mentinoed in the open questions, or something else?

I don't think the proposed caching can work in practice. For example it doesn't consider multiple Specfile instances at all. The pre-caching seems rather naive and fragile, but maybe it could work, I'm not sure it's worth it though.

good point, will look into this more.

@nforro
Copy link
Copy Markdown
Member

nforro commented Mar 16, 2026

do you mean for external files defined in %include/%load mentinoed in the open questions, or something else?

I meant dist-git content in general, but mainly sources. Yes, also those to be included/loaded, but more importantly common ones like tarballs, patches, signatures etc. We could leave out those that have no effect on spec parsing, but that's not so straightforward to determine.

@lbarcziova
Copy link
Copy Markdown
Member Author

I assumed those are not necessarily available even currently, e.g. when parsing upstream spec files. From quick analysis with Opus:

For both upstream and dist-git, specfile is parsed before source tarballs are downloaded. 

  Upstream repo (sourcedir = upstream repo working dir):
  - Source code files: present (it's the upstream repo)
  - Patches: present if committed to upstream
  - Source tarballs: not present (not downloaded yet)

  Dist-git repo (sourcedir = dist-git repo root):
  - Patches: present (committed to dist-git)
  - Source tarballs: not present (in lookaside cache, downloaded later)
  - %include/%{load:...} files: present if committed

  The service would differ from current behavior only in that patches and other git-tracked files wouldn't be in sourcedir. But since rpm.spec() with RPMSPEC_ANYARCH does a non-build parse (doesn't execute %prep), it only checks file existence — dummy files satisfy this.

  The one real gap remains %include/%{load:...}, where file content matters.

@nforro
Copy link
Copy Markdown
Member

nforro commented Mar 16, 2026

I mean, upstream spec files are usually written in a way not to require files that are not there. Or such files are somehow fetched or generated in actions. But I thought we are implementing a general solution.
Here are some examples of spec files (not using %include/%load) that will fail to parse or have tags that will expand differently depending on existence or content of other files (sources):
https://src.fedoraproject.org/rpms/mingw-crt/blob/rawhide/f/mingw-crt.spec
https://src.fedoraproject.org/rpms/gawk/blob/rawhide/f/gawk.spec
https://src.fedoraproject.org/rpms/python-rpm-macros/blob/rawhide/f/python-rpm-macros.spec

Assisted-by: Claude Opus 4.6 noreply@anthropic.com
@lbarcziova
Copy link
Copy Markdown
Member Author

I mean, upstream spec files are usually written in a way not to require files that are not there. Or such files are somehow fetched or generated in actions. But I thought we are implementing a general solution.

yes, but as I mentioned in the previous comment, even for dist-git repos for parsing the specfile we don't necessarily have all the sources downloaded. But you are right, it is a fair point to include it since we are starting here from scratch, added notes on this.

| `%include`/`%{load:...}` targets | If committed | Yes (content as spec input) | Handled by `force_parse` dummy mechanism, same as when missing today |
| Files read by `%(...)` / `%{lua:...}` | If committed | Yes (arbitrary shell/Lua code) | Regression — see examples below |

The actual gap: committed source files explicitly read by `%(...)` or `%{lua:...}` during parsing.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%include/%load belong here as well. force_parse=True is best effort, dummy files are generated but there is a (very) high chance it won't work anyway.


### Option A: Spec content only

Send only the spec file text. No source files. Core packit operations (Name, Version, Release, source URLs, changelog) are unaffected — these tags rarely depend on external file content. Regresses parsing for the small set of specs that read committed files via `%(...)` / `%{lua:...}`. `force_parse` prevents hard failures.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force_parse prevents hard failures.

It doesn't. Like I said above, it can help, but if e.g. a missing macro from a file to be loaded/included leads to a broken syntax, parsing will still end early with RPMException.

Copy link
Copy Markdown
Member

@nforro nforro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

4 participants