Conversation
Assisted-by: Haley (openai-codex/gpt-5.3-codex)
I understand the main resource digest, but why the others? I'd think that having the sources and deps explicitly mentioned in the main source files would already take care of that? Of course, I'm assuming each included (re)source file would have its own .lock file. Is that a wrong assumption, perhaps?
How do I check the digest/sha of those without them being listed? Look at .lock files from other ecosystems: there is a "root" (i.e. your project), and then the "dependencies" that are needed are listed with their SHAs (including transitives). Same model here.
Not following what you mean here?
Each included resource file does not have a lock file - and even if it did, it would not necessarily match what was resolved at the time of running the "parent". A.java has //DEPS dep:something:1 and uses B.java, which uses //DEPS dep:something:1.2 dep:other:3, and I run it. The only time "remote" .lock files are relevant is if you run them remotely, i.e. https://xam.dk/myapp.java could have a myapp.lock to help guide/validate what is expected. It's also where I realized we record the .java files, so as a side effect this lets us implement
The checksumming Maven should do (which we haven't enabled) is to validate that the download matches the checksum of the specific artifact. I still need some .lock file that lists what checksum I actually expect, so if the remote one gets messed with (including the checksum file) we will fail the run. Does that help?
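A minimal sketch of that validation step, assuming the lock records a hex-encoded SHA-256 per artifact (the class and method names here are made up for illustration, not the PR's actual code):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class DigestCheck {
    // Hex-encoded SHA-256 of a file's bytes.
    static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(Files.readAllBytes(file)));
    }

    // Fail the run if the artifact on disk does not match the digest
    // recorded in the lock file.
    static void verify(Path artifact, String expected) throws Exception {
        String actual = sha256(artifact);
        if (!actual.equals(expected)) {
            throw new IllegalStateException("Digest mismatch for " + artifact
                    + ": expected " + expected + " but got " + actual);
        }
    }
}
```

The key point is that `expected` comes from the locally stored lock file, not from the server you download from.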
Lock files have many negatives associated with them. I would much rather use a SQLite DB instance stored in ~/.jbang/db. Just imagine, if two "jbang lock" commands are run at the same time for the same
Well, it should, shouldn't it? Our cache works the same way: if any of the files change, the checksum changes. So if in the case of our cache we can make do with one number, why can't we with this? (Not that I particularly care how many numbers there are in this lock file, it gets generated for me, but just curious.)
Would love to see an ADR written for the requirements. :-) There might be a trivial implementation doable in only a few lines of code. According to my understanding of the requirements, a simple implementation would be to:
To validate that all the files are still the same, just recompute the SHA-256 hashes and XOR them together (order is not significant), and compare to the saved value.
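The XOR-of-hashes idea above could look roughly like this in Java (names illustrative, not from the PR):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class XorFingerprint {
    // Combine per-file SHA-256 digests with XOR: the result is independent
    // of the order in which the files are processed.
    static byte[] combine(byte[]... contents) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] acc = new byte[32]; // SHA-256 produces 32 bytes
        for (byte[] c : contents) {
            byte[] h = md.digest(c); // digest() also resets md for reuse
            for (int i = 0; i < acc.length; i++) {
                acc[i] ^= h[i];
            }
        }
        return acc;
    }
}
```

One caveat of XOR-combining: two identical inputs cancel each other out, so adding the same file twice leaves the fingerprint unchanged.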
Indeed, if we could do it with a single fingerprint that would be much simpler. Then we could even do things like: Btw, I never understood the idea of those remote checksums/lock files when, in the end, if somebody is able to upload a hacked version they'll upload hacked checksums/lock files as well, right? Shouldn't they come from different sources, so a hacker would at least need to compromise different systems to fully hack things?
I'm not sure we are talking about the same thing here? The data in the .lock files has to be "near" the run and not stored in a local user location. If you have concurrent runs of
No, it shouldn't. If I run with a different version of a dependency than some transitive dependency uses, the sets of dependencies are not the same. You can imagine having multiple .lock files for one resource - all depending on how you ran it.
Thus the .lock file is more a snapshot of a specific run scenario; that is why we have --lock-file to allow multiple lock files, but we do want something sensible for defaults so users don't have to deal with this unless they really need it.
I don't grok what that does compared to
Different/similar/overlapping use cases. Remote checksums are used for verifying that the bits you manage to download are the same as what the remote server expects you to get, i.e. they are there to catch "trivial" man-in-the-middle problems. Common case: you download a Maven artifact on a hotel wifi that is dumb and gives an HTTP 200 OK status instead of 404, so when you download dev.jbang:jbang-devkitman:1.0 you get a .jar file containing a bunch of HTML and you don't notice. If you had also downloaded the remote .sha256 file, you would have caught that the checksums did not match. This is download verification - which is different from this lock feature.
Hence: does that explain the difference?
To try to explain the differences:

Day 1:

Day 30:

jbang lock ensures reproducible builds by freezing the entire resolved dependency graph, including transitives and sources, and storing checksums for every artifact. Checksum verification ensures the downloaded bytes are correct. Lock verification ensures you're building the same thing. You need both to protect against dependency drift and supply-chain tampering.
That is basically what this is doing, but you still need the command defined, the semantics of how you set this up, where to store the files, where/when to apply the lock, and docs.
I added a comment about storing a pre-computed checksum value in the catalog alias record. Edit: it would be nice to have a centrally stored checksum value, but it might not work well with versioned aliases. |
As mentioned on that comment, I don't see a reason why a catalog would need a separate field for that. Just make it part of the alias reference.
I don't know what you mean by a centrally stored checksum value - these can by definition not be centrally stored, as you cannot trust the central store. The values for checksum validation need to be "relative to the resource" (i.e. append .sha256 to the file path and download it), and for the run verification this issue is about, the checksums must be near your "run" to list what you expect.
That is nowhere in your explanation and I did not examine the code in depth :-)
Well, except for the fact that what you implemented is not a "lock" feature, as I understand them. In the Maven world transitive dependencies, AFAIK, hardly ever change (as long as you don't change your code). The lock feature in other languages, like Node, exists exactly because you can't trust what you're getting from one run to another; so to avoid problems where code that worked 5 minutes ago suddenly stops working, you have this concept of a lock file: "use these versions, do NOT change them!". So in this case, again AFAIK, what we have is a checksum, not a lock, right? But to recap, if I understand correctly, this is useful so you can publish .lock files together with your scripts, so people that remote-execute them can be somewhat more sure that what they will be running is what you, as the author, meant them to run?
This is exactly what Node's lock is, so I'm not following why you think this is just a checksum? Npm records a digest/checksum in an integrity field together with the transitive graph. This is done so that even a file served from the local cache with the right version metadata is verified to actually contain the binary content you expect.
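For illustration, an entry in npm's package-lock.json looks roughly like this (the package, version, and hash below are shortened/illustrative, not taken from this PR):

```json
{
  "node_modules/left-pad": {
    "version": "1.3.0",
    "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
    "integrity": "sha512-..."
  }
}
```

The `integrity` field is the per-artifact digest; the surrounding entries pin the resolved transitive graph, which is the same two-part model being discussed here.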
I hope assembling the IKEA table did not require assistance from an LLM. :-) (just teasing)
"recap, if I understand correctly, this is useful so you can publish .lock files together with your scripts so people that remote execute it can be somewhat more sure that what they will be running is what you, as the author, meant them to run?" It's 3-fold:
What I'm saying is that Node needs a lock file because their dependency resolution can literally change results from one minute to another. That doesn't happen with Maven. The result it gives you today (for a fixed set of dependencies) is the same as it gave you yesterday and will be the same as it will give you tomorrow, in one month's, or in one year's time. So we (the Java ecosystem using Maven) don't need a lock file; we can do with a simple checksum (to ensure none of the dependencies were changed).
This is just not true in the general case, and it is in particular not true for jbang when you also consider //SOURCES, //FILES, jitpack, and the usage of snapshots and version ranges in dependencies. Even if you absolutely fix all the versions of all transitive dependencies, including sources, and then just verify that the checksum matches what the server you download from reports, you will NOT have ensured that you are running the exact same set of bits.
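For instance, a jbang script like the following (coordinates and file names invented for illustration) can resolve to different artifacts on different days with no code change, because both the version range and the snapshot are re-resolved at run time:

```java
//DEPS com.example:lib:[1.0,2.0)
//DEPS com.example:other:1.2-SNAPSHOT
//SOURCES Util.java
public class App {
    public static void main(String... args) {
        // whatever [1.0,2.0) resolves to today may differ tomorrow
    }
}
```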
Just fyi, Gradle has locking built in (https://docs.gradle.org/current/userguide/dependency_locking.html),
Can you please tell me how you will do that without doing what this PR does: collect the list of dependencies, sources, and resources; record their checksums; store that in a file; and when you resolve again, verify that the list of dependencies and checksums matches what is downloaded (which cannot be the checksum from the remote server, as it could be wrong or controlled by an attacker)?
I'm saying that it seems you can have the result be a single number, not a file with a list of them. So I'm simply wondering why you did it that way. It, seemingly, is making it more complex than it needs to be.
Ok, true, just never seen anyone actually use version ranges, so it wasn't something that I'd normally consider. But it's possible so we have to take it into account, indeed. But does this PR enforce that? (Meaning it will force the resolver to use the locked versions) Or will it just fail? |
How can it be a single number when, i.e., that would at least require 4 checksums - and then the resolution can be different for those versions, so we need their transitives too.
Currently it fails, as it signals tampering - having the lock file enforce the versions would be a feature enhancement.
Because it all just boils down to a single thing? Just as we only have a single checksum now, even if you have a dozen //SOURCES and //FILES entries - they all just get folded into the same number. Our current checksum already includes at least the identity of all the deps (their GAVs); you'd just need to add their contents as well.
What? You just said yourself there could be perfectly normal reasons for the resolving to return different versions. That's not tampering; that's completely normal and even expected behaviour. It just signals that what you're getting is not exactly the same as what you got last time. Saying it's "tampering" would just scare people without really knowing if it's true.
The checksum we have now is meant for capturing which sources were included and as a quick way of detecting change, to know if a recompile is needed. Here it does not matter (as much) to know what has changed. But sure, it's similar and we could expand it; we just wouldn't be able to give a good user experience with only one big number.
I mean "signals tampering" as in "potential tampering"; maybe a better phrase is "signals something changed, so please verify before I run this". Please remember: if you don't care whether things have changed, you don't use lock, and if there is no .lock file you don't get warned/blocked/etc.
👍
Except if you want to use the "index" feature for |
This PR adds lock-based integrity + reproducibility checks for JBang refs, while preserving normal run behavior when no lock file exists.
p.s. I was not actually expecting to have this in now, but the implementation shaped up nicely while I was assembling an IKEA table and having a chat with an LLM (brave new world :) - opening as a draft PR to get feedback.
What’s included
- `jbang lock <ref>` (e.g. `app.java`) -> `app.java.lock` / `.jbang.lock`
- `--lock-file=...`
- `--locked=<none|lenient|strict>`
  - `none`: ignore lock checks
  - `lenient` (default): if lock data exists, enforce it; if lock missing, run normally
  - `strict`: require lock entry when lock file exists and enforce strict matching

Lock file entries:

- `ref=sha256:...` (main resource digest)
- `ref.sources=...` (resolved source manifest)
- `ref.deps=...` (resolved transitive dependency coordinates)
- `ref.dep.<gav>=sha256:...` (per-artifact dependency digests)

Verification behavior
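Based on the entry format above, a lock file for a hypothetical `app.java` might look like this (digests truncated, source names and coordinates illustrative, not from the PR):

```properties
app.java=sha256:2cf24d...
app.java.sources=Util.java
app.java.deps=dev.jbang:jbang-devkitman:1.0,org.slf4j:slf4j-api:2.0.9
app.java.dep.dev.jbang:jbang-devkitman:1.0=sha256:9a271f...
app.java.dep.org.slf4j:slf4j-api:2.0.9=sha256:c1dfd9...
```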
When lock data exists, JBang validates:

- the main resource digest
- source manifest (`.sources`) consistency
- dependency coordinate list (`.deps`) consistency
- per-artifact dependency digests (`.dep.<gav>`)

Mismatch -> fail with explicit error.
Digest mismatch errors include the resolved file path.
Why this design
Separates lock creation (`jbang lock`) from execution (`jbang run ...`).

Feedback requested
- Are the `--locked` mode semantics (`none|lenient|strict`) right?
- Is `<script>.lock` the right default for local file refs?

Related issues
Relates to:
Supersedes / consolidates discussion from: