lock sha/digest POC #2410

Draft

maxandersen wants to merge 16 commits into jbangdev:main from maxandersen:lockfeature2
Conversation

@maxandersen
Collaborator

This PR adds lock-based integrity + reproducibility checks for JBang refs, while preserving normal run behavior when no lock file exists.

p.s. I was not actually expecting to have this in now but implementation shaped up nicely while I was assembling an IKEA table and having a chat with an LLM (brave new world :) - opening as draft PR to get feedback

What’s included

  • New command: jbang lock <ref>
  • Default lock file behavior:
    • local file refs (app.java) -> app.java.lock
    • alias/GAV/URL refs -> .jbang.lock
    • override via --lock-file=...
  • Run policy flag: --locked=<none|lenient|strict>
    • none: ignore lock checks
    • lenient (default): if lock data exists, enforce it; if lock missing, run normally
    • strict: require lock entry when lock file exists and enforce strict matching
  • Lock structure per ref:
    • ref=sha256:... (main resource digest)
    • ref.sources=... (resolved source manifest)
    • ref.deps=... (resolved transitive dependency coordinates)
    • ref.dep.<gav>=sha256:... (per-artifact dependency digests)
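
For illustration, here is what a lock file for a hypothetical hello.java might look like with this key structure (digests, coordinates, and exact key escaping are invented; the PR defines the real layout):

```properties
# hello.java.lock - as generated by `jbang lock hello.java` (example values invented)
hello.java=sha256:9c4e1ad2...
hello.java.sources=hello.java,util/Helper.java
hello.java.deps=com.example\:lib\:1.2.0,com.example\:transitive\:0.9.1
hello.java.dep.com.example\:lib\:1.2.0=sha256:77ab03fe...
hello.java.dep.com.example\:transitive\:0.9.1=sha256:51d08c21...
```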

Verification behavior

When lock data exists, JBang validates:

  1. Main resource digest
  2. Source manifest (.sources) consistency
  3. Dependency graph (.deps) consistency
  4. Per-artifact dependency digests (.dep.<gav>)

Mismatch -> fail with explicit error.
Digest mismatch errors include the resolved file path.
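
The main-resource check (step 1) amounts to recomputing a SHA-256 digest and comparing it to the locked value. A minimal sketch of that idea (not the PR's actual code; class and method names are mine):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class DigestCheck {
    // sha256:<hex> of a file, mirroring the lock entry format described above
    static String sha256(Path file) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        return "sha256:" + HexFormat.of().formatHex(hash);
    }

    public static void main(String[] args) throws Exception {
        Path script = Files.createTempFile("app", ".java");
        Files.writeString(script, "class App {}");
        String locked = sha256(script);   // what `jbang lock` would have recorded
        String current = sha256(script);  // recomputed at run time
        if (!current.equals(locked)) {
            // mismatch -> fail with an explicit error including the resolved file path
            throw new IllegalStateException("digest mismatch for " + script);
        }
        System.out.println("digest ok");
    }
}
```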

Why this design

  • Keeps “just run it” workflow intact when no lock is present.
  • Provides stronger guarantees as soon as lock data exists.
  • Separates mutation (jbang lock) from execution (jbang run ...).

Feedback requested

  1. Are --locked mode semantics (none|lenient|strict) right?
  2. Is <script>.lock the right default for local file refs?
  3. Is properties-based lock format acceptable for v1 with these keys?
  4. Should strict mode require complete per-artifact digests for every locked dep (current behavior)?

Related issues

Relates to:

Supersedes / consolidates discussion from:

Assisted-by: Haley (openai-codex/gpt-5.3-codex)
maxandersen marked this pull request as draft February 23, 2026 14:19
maxandersen changed the title from "lock sha/digest" to "lock sha/digest POC" Feb 23, 2026
@quintesse
Contributor

When lock data exists, JBang validates:

I understand the main resource digest, but why the others?

I'd think that having the sources and deps explicitly mentioned in the main source files would already take care of that?

Of course I'm assuming each included (re)source file would have its own .lock file. Is that a wrong assumption perhaps?
And the deps, at least the Maven ones, already have their own checksumming, don't they?

@maxandersen
Collaborator Author

When lock data exists, JBang validates:

I understand the main resource digest, but why the others?

How do I check the digest/sha of those without them being listed?

Look at .lock files from other ecosystems; there is a "root" (i.e. your project) and then a list of what "dependencies" are needed, with their SHAs (including transitives).

Same model here.

I'd think that having the sources and deps explicitly mentioned in the main source files would already take care of that?

not following what you mean here?

Of course I'm assuming each included (re)source file would have its own .lock file. Is that a wrong assumption perhaps?

Each included resource file does not have a lock file - and even if it did, it would not necessarily match what was resolved at the time of running the "parent".

A.java has //DEPS dep:something:1 and uses B.java, which has //DEPS dep:something:1.2 dep:other:3.

If I run jbang A.java, the lock will list dep:something:1 and dep:other:3 and should not have dep:something:1.2 ... the .lock file for B.java is not relevant for A.java here.
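
In script form that scenario would look something like this (the dep:* coordinates are the placeholders from the example, not real artifacts):

```java
// A.java
//DEPS dep:something:1
//SOURCES B.java
public class A { public static void main(String... args) { B.hello(); } }

// B.java (separate file)
//DEPS dep:something:1.2 dep:other:3
class B { static void hello() { } }
```

Running jbang A.java resolves one merged dependency graph for A, and A.java.lock records that merged result rather than B.java's own view of its dependencies.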

The only time "remote" .lock files are relevant is if you run them remotely, i.e. https://xam.dk/myapp.java could have a myapp.lock to help guide/validate what is expected. It's also where I realized we record the .java files, so as a side effect this lets us implement //SOURCES **/*.java for remote runs.

And the deps, at least the Maven ones, already have their own checksumming, don't they?

The checksumming Maven should do (which we haven't enabled) is to validate that the download matches the checksum of the specific artifact.

I still need some .lock file that lists what checksum I actually expect, so if the remote one gets messed with (including the checksum file) we will fail the run.

does that help?

@wfouche
Contributor

wfouche commented Feb 23, 2026

Is <script>.lock the right default for local file refs?

Lock files have many negatives associated with them. I would much rather use a sqlite DB instance stored in ~/.jbang/db.

Just imagine: if two jbang lock commands are run at the same time for the same <ref>, then a DB instance is much preferable to cumbersome, error-prone file system operations.

@quintesse
Contributor

and even if it did, it would not necessarily match what was resolved at the time of running the "parent".

Well, it should, shouldn't it? Our cache works the same way: if any of the files change, the checksum changes. So if in the case of our cache we can make do with one number, why can't we with this? (Not that I particularly care how many numbers there are in this lock file, since it gets generated for me, but just curious)

@wfouche
Contributor

wfouche commented Feb 23, 2026

Would love to see an ADR written for the requirements. :-)

There might be a trivial implementation that can be implemented in only a few lines of code. According to my understanding of the requirements, a simple implementation would be to:

  • compute SHA256 hashes of all relevant files
  • XOR the hashes together (call the result a "fingerprint"), and associate this fingerprint with the <ref>

To validate that all the files are still the same, just recompute the SHA256 hashes and XOR them together (order is not significant), and compare to the saved <ref> fingerprint value.
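
A sketch of that scheme (helper names are mine; this illustrates the proposal, not anything in the PR):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class Fingerprint {
    // XOR two equal-length byte arrays
    static byte[] xor(byte[] a, byte[] b) {
        byte[] r = new byte[a.length];
        for (int i = 0; i < a.length; i++) r[i] = (byte) (a[i] ^ b[i]);
        return r;
    }

    // XOR the SHA-256 digests of each file's contents into one 32-byte fingerprint
    static byte[] fingerprint(List<String> contents) throws Exception {
        byte[] fp = new byte[32];
        for (String c : contents) {
            fp = xor(fp, MessageDigest.getInstance("SHA-256").digest(c.getBytes(StandardCharsets.UTF_8)));
        }
        return fp;
    }

    public static void main(String[] args) throws Exception {
        byte[] a = fingerprint(List.of("class A {}", "class B {}", "dep-jar-bytes"));
        byte[] b = fingerprint(List.of("dep-jar-bytes", "class A {}", "class B {}"));
        // XOR is commutative and associative, so file order does not matter
        System.out.println("order-independent: " + Arrays.equals(a, b));
    }
}
```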

@quintesse
Contributor

quintesse commented Feb 23, 2026

Indeed, if we could do it with a single fingerprint that would be much simpler: Then we could even do things like:

> jbang ref$8b1810149d231d50d34e138bb44ec2ecda5dc0b9@fubar
Error: referenced resource does not match expected fingerprint

Btw, I never understood the idea of those remote checksums/lock files when in the end if somebody is able to upload a hacked version then they'll upload hacked checksums/lock files as well, right? Shouldn't they come from different sources so a hacker would at least need to hack different systems to be able to fully hack things?

@maxandersen
Collaborator Author

Lock files have many negatives associated with them. I would much rather use a sqlite DB instance stored in ~/.jbang/db.

Just imagine: if two jbang lock commands are run at the same time for the same <ref>, then a DB instance is much preferable to cumbersome, error-prone file system operations.

I'm not sure we are talking about the same thing here?

The data in the .lock files has to be "near" the run and not stored in a local user location.

If you have concurrent runs of jbang lock, something is wrong.

jbang lock is for generating a "snapshot" of the lock info that you then read when running (which, yes, can be done concurrently).

@maxandersen
Collaborator Author

and even if it did, it would not necessarily match what was resolved at the time of running the "parent".

Well, it should, shouldn't it? Our cache works the same way: if any of the files change, the checksum changes. So if in the case of our cache we can make do with one number, why can't we with this? (Not that I particularly care how many numbers there are in this lock file, since it gets generated for me, but just curious)

No it shouldn't.

If I run with a different version of a dependency than some transitive dependency uses, the set of dependencies is not the same.

You can imagine having multiple .lock files for one resource - all dependent on how you ran it.

jbang lock --deps psqldriver.jar db.java != jbang lock db.java

Thus the .lock file is more a snapshot of a specific run scenario; hence we have --lock-file to allow multiple lock files, but we do want something sensible as the default so users don't have to deal with this unless they really need it.
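
With the flags proposed in this PR, the two scenarios might look like this (the PostgreSQL coordinates are just an example dependency):

```shell
jbang lock db.java                                   # snapshot -> db.java.lock (default)
jbang lock --lock-file=db-psql.lock \
           --deps org.postgresql:postgresql:42.7.3 db.java
jbang run --locked=strict db.java                    # verified against db.java.lock
jbang run --locked=strict --lock-file=db-psql.lock db.java
```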

@maxandersen
Collaborator Author

Indeed, if we could do it with a single fingerprint that would be much simpler: Then we could even do things like:

> jbang ref$8b1810149d231d50d34e138bb44ec2ecda5dc0b9@fubar
Error: referenced resource does not match expected fingerprint

I don't grok what that does compared to jbang ref@fubar#8b1810149d231d50d34e138bb44ec2ecda5dc0b9 as suggested in this PR?

Btw, I never understood the idea of those remote checksums/lock files when in the end if somebody is able to upload a hacked version then they'll upload hacked checksums/lock files as well, right? Shouldn't they come from different sources so a hacker would at least need to hack different systems to be able to fully hack things?

different/similar/overlapping usecases.

remote checksums are used for verifying that the bits you manage to download are the same as what the remote server expects you to get, i.e. it's there to catch "trivial" man-in-the-middle attacks.

Common case: you download a Maven artifact on a hotel wifi that is dumb and gives an HTTP 200 OK status instead of 404, so when you download dev.jbang:jbang-devkitman:1.0 you get a .jar file with a bunch of HTML but don't notice... if you had also downloaded the remote .sha256 file you would have caught that the checksums did not match.

This is download verification - which is different from this lock feature.

jbang lock (or npm lock, and similar) is about exactly NOT trusting the remote checksums, because that source could be compromised (or, more commonly, versions might have drifted since you last ran and you would like to detect that).

Hence jbang lock downloads (preferably with the validation as above) and generates a .lock file to capture the current state. Now every time you jbang run with the .lock file present, it will only do so if the dependencies/sources stay the same - if they change, the run will fail (or possibly warn, depending on what lock modes we think are relevant).

Does that explain the difference?

@maxandersen
Collaborator Author

to try explain the differences:

Day 1:
Resolution picks qux:2.1.0
Checksum valid.
Build works.

Day 30:
Resolution now picks qux:2.2.0 (new transitive).
Checksum valid.
Build breaks.

jbang lock ensures reproducible builds by freezing the entire resolved dependency graph — including transitives and sources — and storing checksums for every artifact.

Checksum verification ensures the downloaded bytes are correct.

Lock verification ensures you’re building the same thing.

You need both to protect against dependency drift and supply-chain tampering.

@maxandersen
Collaborator Author

There might be a trivial implementation that can be implemented in only a few lines of code. According to my understanding of the requirements, a simple implementation would be to:

  • compute SHA256 hashes of all relevant files
  • XOR the hashes together (call the result a "fingerprint"), and associate this fingerprint with the <ref>

To validate that all the files are still the same, just recompute the SHA256 hashes and XOR them together (order is not significant), and compare to the saved <ref> fingerprint value.

That is basically what this is doing, but you still need the command defined, the semantics of how you set this up, where to store the files, where/when to apply the lock, and docs.

@wfouche
Contributor

wfouche commented Feb 24, 2026

I added a comment about storing a pre-computed checksum value in the catalog alias record.

Edit: it would be nice to have a centrally stored checksum value, but it might not work well with versioned aliases.

@maxandersen
Collaborator Author

added a comment about storing a pre-computed checksum value in the catalog alias record.

As mentioned there, I don't see a reason why a catalog would need a separate field for that. Just make it part of the alias reference.

Edit: it would be nice to have a centrally stored checksum value, but it might not work well with versioned aliases.

I don't know what you mean by a centrally stored checksum value - these can by definition not be centrally stored, as you cannot trust the central store. The values for checksum validation need to be "relative to the resource" (i.e. add .sha256 to the file path and download), and for run verification, which this issue is about, the checksums must be near your "run" to list what you expect.

@quintesse
Contributor

I don't grok what that does compared to jbang ref@fubar#8b1810149d231d50d34e138bb44ec2ecda5dc0b9 as suggested in this PR?

That is nowhere in your explanation and I did not examine the code in depth :-)

Does that explain the difference?

Well, except for the fact that what you implemented is not a "lock" feature as I understand them. In the Maven world transitive dependencies, AFAIK, hardly ever change (as long as you don't change your code). The lock feature in other languages, like Node, exists exactly because you can't trust what you're getting from one run to another; so, to avoid problems where code that worked 5 minutes ago suddenly stops working, they have this concept of a lock file: "use these versions, do NOT change them!"

So in this case, again AFAIK, what we have is a checksum, not a lock, right?
(and I still don't really understand why you'd want lists of checksums in the lock file)

But to recap, if I understand correctly, this is useful so you can publish .lock files together with your scripts, so people that remote-execute them can be somewhat more sure that what they will be running is what you, as the author, meant them to run?

@maxandersen
Collaborator Author

This is exactly what the node lock is, so I'm not following why you think this is just a checksum?

Npm records digest/checksum in an integrity field together with the transitive graph.

This is done so that even if you get a file from the local cache that has the right version metadata, you can verify it actually contains the binary content you expect.

@wfouche
Contributor

wfouche commented Feb 24, 2026

I was assembling an IKEA table and having a chat with an LLM (brave new world :)

I hope assembling the IKEA table did not require assistance from an LLM. :-) (just teasing)

@maxandersen
Collaborator Author

"recap, if I understand correctly, this is useful so you can publish .lock files together with your scripts so people that remote execute it can be somewhat more sure that what they will be running is what you, as the author, meant them to run?"

It's 3-fold:

  1. you can run jbang lock to record a "run" and then do a jbang run with that same lock to ensure you are running exactly what you want. Reproducibility + secure pipeline.

  2. you can share a lock file so that, when a user runs, it can be verified to be what the author expected it to be. That's more like the checksum feature - but for the transitive graph.

  3. since it has the paths to sources, it enables remote running of scripts even when //SOURCES **/*.java is used.

@quintesse
Contributor

quintesse commented Feb 25, 2026

This is exactly what the node lock is, so I'm not following why you think this is just a checksum?

What I'm saying is that Node needs a lock file because their dependency resolution can literally change results from one minute to another. That doesn't happen with Maven. The result it gives you today (for a fixed set of dependencies) is the same as it gave you yesterday and the same as it will give you tomorrow, or in one month's or one year's time.

So we (the Java ecosystem using Maven) don't need a lock file, we can do with a simple checksum. (To ensure none of the dependencies were changed)

@maxandersen
Collaborator Author

This is exactly what the node lock is, so I'm not following why you think this is just a checksum?

What I'm saying is that Node needs a lock file because their dependency resolution can literally change results from one minute to another. That doesn't happen with Maven. The result it gives you today (for a fixed set of dependencies) is the same as it gave you yesterday and the same as it will give you tomorrow, or in one month's or one year's time.

This is just not true in the general case; and it is in particular not true for jbang when you also consider //SOURCES, //FILES, jitpack, and the use of snapshots and version ranges in dependencies.

Even if you absolutely fix all the versions of all transitive dependencies, including sources, and then just verify the checksum matches what the server you download from publishes, you will NOT have ensured that you are running the exact same set of bits.

So we (the Java ecosystem using Maven) don't need a lock file,

Just FYI, Gradle has locking built in (https://docs.gradle.org/current/userguide/dependency_locking.html),
sbt has a plugin (https://stringbean.github.io/sbt-dependency-lock/), and Maven has multiple lock file plugins (the most popular seems to be https://github.com/chains-project/maven-lockfile).

we can do with a simple checksum. (To ensure none of the dependencies were changed)

Can you please tell me how you will do that without doing what this PR does:

Collect the list of dependencies, sources, and resources; record their checksums; store that in a file; and when you resolve again, verify that the list of dependencies and the checksums match what is downloaded (which cannot be the checksum from the remote server, as it could be wrong or hostile)?

@quintesse
Contributor

quintesse commented Feb 25, 2026

Can you please tell me how you will do that without doing what this PR does:

Collect the list of dependencies, sources, resources - record their checksums ...

I'm saying that it seems the result could be a single number, not a file with a list of them. So I'm simply wondering why you did it that way. It seemingly makes it more complex than it needs to be.

version ranges in dependencies.

OK, true; I've just never seen anyone actually use version ranges, so it wasn't something I'd normally consider. But it's possible, so we have to take it into account, indeed. But does this PR enforce that? (Meaning: will it force the resolver to use the locked versions?) Or will it just fail?

@maxandersen
Collaborator Author

Can you please tell me how you will do that without doing what this PR does:
Collect the list of dependencies, sources, resources - record their checksums ...

I'm saying that it seems the result could be a single number, not a file with a list of them. So I'm simply wondering why you did it that way. It seemingly makes it more complex than it needs to be.

How can it be a single number when, e.g., jbang run myapp.java points to a .java file with, say, 3 //DEPS lines?

That would require at least 4 checksums - and then the resolution can be different for those versions, so you need their transitives too.

version ranges in dependencies.

OK, true; I've just never seen anyone actually use version ranges, so it wasn't something I'd normally consider. But it's possible, so we have to take it into account, indeed. But does this PR enforce that? (Meaning: will it force the resolver to use the locked versions?) Or will it just fail?

Currently it fails as it signals tampering - having the lock file enforce the versions would be a feature enhancement.

@quintesse
Contributor

How can it be a single number when, e.g., jbang run myapp.java points to a .java file with, say, 3 //DEPS lines?

Because it all just boils down to a single thing? Just as we only have a single checksum now, even if you have a dozen //SOURCES and //FILES entries: they all just get added to the same number. Our current checksum already includes at least the identity of all the deps (their GAVs); you'd just need to add their contents as well.

Currently it fails as it signals tampering

What? You just said yourself there could be perfectly normal reasons for the resolving to return different versions. That's not tampering; that's completely normal and even expected behaviour. It just signals that what you're getting is not exactly the same as what you got last time. Saying it's "tampering" would just scare people without really knowing if it's true.

@maxandersen
Collaborator Author

How can it be a single number when, e.g., jbang run myapp.java points to a .java file with, say, 3 //DEPS lines?

Because it all just boils down to a single thing? Just as we only have a single checksum now, even if you have a dozen //SOURCES and //FILES entries: they all just get added to the same number. Our current checksum already includes at least the identity of all the deps (their GAVs); you'd just need to add their contents as well.

The checksum we have now is meant for capturing what sources were included and as a quick way of detecting change, to know if a recompile is needed. There it does not matter (as much) to know exactly what has changed.

But sure, it's similar and we could expand it, but we wouldn't be able to give a good user experience with just one big number.

Currently it fails as it signals tampering

What? You just said yourself there could be perfectly normal reasons for the resolving to return different versions. That's not tampering, that completely normal and even expected behaviour. It just signals that what you're getting is not exactly the same as what you got last time. Saying it's "tampering" would just scare people without really knowing if it's true.

I mean "signals tampering" as in "potential tampering"; maybe a better phrase is "signals something changed, so please verify before I run this".

Please remember: if you don't care whether things have changed, you don't use lock, and if there is no .lock file you don't get warned/blocked/etc.

@quintesse
Contributor

maybe a better phrase is "signals something changed, so please verify before I run this"

👍

Please remember: if you don't care whether things have changed, you don't use lock, and if there is no .lock file you don't get warned/blocked/etc.

Except if you want to use the "index" feature for //SOURCES **.java. Now I don't think many people will import a remote source file that uses globbing, but that is a point that might complicate things a bit.
