Add research about specfile parsing service by lbarcziova · Pull Request #229 · packit/research

lbarcziova · 2026-03-13T10:34:14Z

No description provided.

Assisted-by: Claude Opus 4.6 noreply@anthropic.com

nforro · 2026-03-16T07:00:41Z

IIUC this assumes spec file content would always come from the client, but what about other files, would there be a shared directory? We should be careful not to introduce a gap in the isolation.

I don't think the proposed caching can work in practice. For example it doesn't consider multiple Specfile instances at all. The pre-caching seems rather naive and fragile, but maybe it could work, I'm not sure it's worth it though.

lbarcziova · 2026-03-16T18:32:05Z

> IIUC this assumes spec file content would always come from the client, but what about other files, would there be a shared directory? We should be careful not to introduce a gap in the isolation.

do you mean for external files defined in %include/%load mentioned in the open questions, or something else?

> I don't think the proposed caching can work in practice. For example it doesn't consider multiple Specfile instances at all. The pre-caching seems rather naive and fragile, but maybe it could work, I'm not sure it's worth it though.

good point, will look into this more.

nforro · 2026-03-16T18:42:38Z

> do you mean for external files defined in %include/%load mentioned in the open questions, or something else?

I meant dist-git content in general, but mainly sources. Yes, also those to be included/loaded, but more importantly common ones like tarballs, patches, signatures etc. We could leave out those that have no effect on spec parsing, but that's not so straightforward to determine.

lbarcziova · 2026-03-16T19:10:01Z

I assumed those are not necessarily available even currently, e.g. when parsing upstream spec files. From quick analysis with Opus:

For both upstream and dist-git, specfile is parsed before source tarballs are downloaded. 

  Upstream repo (sourcedir = upstream repo working dir):
  - Source code files: present (it's the upstream repo)
  - Patches: present if committed to upstream
  - Source tarballs: not present (not downloaded yet)

  Dist-git repo (sourcedir = dist-git repo root):
  - Patches: present (committed to dist-git)
  - Source tarballs: not present (in lookaside cache, downloaded later)
  - %include/%{load:...} files: present if committed

  The service would differ from current behavior only in that patches and other git-tracked files wouldn't be in sourcedir. But since rpm.spec() with RPMSPEC_ANYARCH does a non-build parse (doesn't execute %prep), it only checks file existence — dummy files satisfy this.

  The one real gap remains %include/%{load:...}, where file content matters.
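
A minimal sketch of the dummy-file idea described above, assuming the service only needs the referenced files to exist. The helper name `create_placeholders` and the regex are hypothetical and are not taken from packit or the specfile library; the sketch only handles literal `SourceN`/`PatchN` values and skips anything that still contains an unexpanded macro.

```python
import re
import tempfile
from pathlib import Path

# Matches tag lines such as "Source0: foo.tar.gz" or "Patch1: fix.patch".
# Illustrative only: real specs often build these values from macros.
TAG_RE = re.compile(
    r"^(?:Source|Patch)\d*[ \t]*:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE
)


def create_placeholders(spec_text: str, sourcedir: Path) -> list[Path]:
    """Create an empty file in sourcedir for each literal Source/Patch name."""
    created = []
    for location in TAG_RE.findall(spec_text):
        # A URL or path is reduced to its final component (the file name).
        name = location.rsplit("/", 1)[-1]
        if not name or "%" in name:
            continue  # unexpanded macro: the file name is unknown here
        target = sourcedir / name
        if not target.exists():
            target.touch()
            created.append(target)
    return created


SPEC = """\
Name: demo
Version: 1.0
Source0: https://example.com/demo-1.0.tar.gz
Patch0: fix-build.patch
Source1: %{name}.conf
"""

with tempfile.TemporaryDirectory() as tmp:
    made = create_placeholders(SPEC, Path(tmp))
    names = sorted(p.name for p in made)
    print(names)  # ['demo-1.0.tar.gz', 'fix-build.patch']
```

Empty placeholders only satisfy an existence check. They do nothing for a spec whose parsing depends on what a file contains, which is the gap noted for `%include`/`%{load:...}`.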

nforro · 2026-03-16T19:56:20Z

I mean, upstream spec files are usually written in a way not to require files that are not there. Or such files are somehow fetched or generated in actions. But I thought we are implementing a general solution.
Here are some examples of spec files (not using %include/%load) that will fail to parse or have tags that will expand differently depending on existence or content of other files (sources):
https://src.fedoraproject.org/rpms/mingw-crt/blob/rawhide/f/mingw-crt.spec
https://src.fedoraproject.org/rpms/gawk/blob/rawhide/f/gawk.spec
https://src.fedoraproject.org/rpms/python-rpm-macros/blob/rawhide/f/python-rpm-macros.spec
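
A rough heuristic for flagging spec files whose parsing may depend on other files could look like the sketch below. It is an assumption-laden illustration, not something packit or specfile provides: the function name `find_external_reads` and the pattern labels are hypothetical, and the check over-approximates, since a `%(...)` or `%{lua:...}` expansion does not necessarily read a file.

```python
import re

# Constructs through which content outside the spec file can reach the parser.
# A "%%" before the construct is an escaped percent sign, so it is excluded.
PATTERNS = {
    "shell expansion": re.compile(r"(?<!%)%\("),
    "lua macro": re.compile(r"(?<!%)%\{lua:"),
    "include directive": re.compile(r"^%include\b", re.MULTILINE),
    "load macro": re.compile(r"(?<!%)%\{load:"),
}


def find_external_reads(spec_text: str) -> list[str]:
    """Return sorted labels of constructs that may read external files."""
    return sorted(
        label for label, pattern in PATTERNS.items() if pattern.search(spec_text)
    )


SPEC = """\
Name: demo
Version: %(cat %{_sourcedir}/VERSION)
%include %{_sourcedir}/common.inc
"""

print(find_external_reads(SPEC))  # ['include directive', 'shell expansion']
```

A service could use a check like this to decide when spec content alone is probably enough and when additional dist-git files would have to be supplied, though only an actual parse can confirm either way.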

Assisted-by: Claude Opus 4.6 noreply@anthropic.com

lbarcziova · 2026-03-17T16:56:30Z

> I mean, upstream spec files are usually written in a way not to require files that are not there. Or such files are somehow fetched or generated in actions. But I thought we are implementing a general solution.

yes, but as I mentioned in the previous comment, even for dist-git repos we don't necessarily have all the sources downloaded when parsing the specfile. But you are right, it is a fair point to include since we are starting from scratch here; I added notes on this.

nforro · 2026-04-01T14:38:31Z

research/specfile-parsing-service/index.md

+| `%include`/`%{load:...}` targets      | If committed     | Yes (content as spec input)    | Handled by `force_parse` dummy mechanism, same as when missing today |
+| Files read by `%(...)` / `%{lua:...}` | If committed     | Yes (arbitrary shell/Lua code) | Regression — see examples below                                      |
+
+The actual gap: committed source files explicitly read by `%(...)` or `%{lua:...}` during parsing.


`%include`/`%load` belong here as well. `force_parse=True` is best effort: dummy files are generated, but there is a (very) high chance parsing won't work anyway.

nforro · 2026-04-01T14:51:08Z

research/specfile-parsing-service/index.md

+
+### Option A: Spec content only
+
+Send only the spec file text. No source files. Core packit operations (Name, Version, Release, source URLs, changelog) are unaffected — these tags rarely depend on external file content. Regresses parsing for the small set of specs that read committed files via `%(...)` / `%{lua:...}`. `force_parse` prevents hard failures.


> force_parse prevents hard failures.

It doesn't. Like I said above, it can help, but if e.g. a missing macro from a file to be loaded/included leads to broken syntax, parsing will still end early with RPMException.

nforro

LGTM, thanks!

Add research about specfile parsing service

d28b0cd

Assisted-by: Claude Opus 4.6 noreply@anthropic.com

lbarcziova assigned majamassarini and nforro Mar 13, 2026

usercont-release-bot added this to Packit pull requests Mar 13, 2026

github-project-automation bot moved this to New in Packit pull requests Mar 13, 2026

Add notes based on the review comments

1da03fe

Assisted-by: Claude Opus 4.6 noreply@anthropic.com

nforro reviewed Apr 1, 2026

nforro approved these changes Apr 1, 2026

Add research about specfile parsing service #229
lbarcziova wants to merge 2 commits into packit:main from lbarcziova:specfile-parsing-service

4 participants

