refactor: drop remarshal dependency #1008

Open
CertainLach wants to merge 1 commit into ipetkov:master from CertainLach:refactor/drop-remarshal

@CertainLach

Motivation

The remarshal closure is huge and causes too many rebuilds on platforms without good Hydra cache coverage (even aarch64-linux), because it depends on ffmpeg, gtk4, and openblas, among many other things.

A similar change has been proposed for nixpkgs: NixOS/nixpkgs#464514

One downside: yj, which I used to replace remarshal, doesn't support sorting keys during conversion. That is only used for checks, though, and it doesn't even seem to be required? It can be returned to checks though.

Checklist

  • added tests to verify new behavior
  • added an example template or updated an existing one
  • updated docs/API.md (or general documentation) with changes
  • updated CHANGELOG.md

@ipetkov
Owner

ipetkov commented Apr 13, 2026

Hi @CertainLach, thanks for the PR!

I'm curious what direction upstream will take, and I'd consider following their example. There are certain arguments to be made about whether yj is a good dependency to take on (e.g. is it maintained? is it "finished"?), and I'm not particularly familiar with the developers and dependency chains of either remarshal or yj (though at the very least the former is already used by nixpkgs for things).

That question aside, it ultimately comes down to choosing to depend on one ecosystem (Python) or another (Go). In terms of rebuilding without cache coverage, both might feel "out of place" if you just want to build a Rust project but suddenly see a bunch of "unrelated" stuff being built 😅

I'm sympathetic to flattening dependency graphs where it makes sense, but perhaps a better question is why does remarshal depend on so many things like gtk or ffmpeg? At a quick glance it appears that a bunch of test/optional/transitive dependencies end up pulling on this. Is there a way to build remarshal without all those and special casing that upstream in nixpkgs?

As an alternative, we can always add a new utility to crane-utils that does a json -> toml conversion (the only use case we have for remarshal, our own test suite notwithstanding). At the very least, that makes the full bootstrap chain dependent on only "one ecosystem" (e.g. to build a Rust crate with crane we'd have to build some Rust utilities first, but you don't have to care about a fully working Go compiler for yj or an ffmpeg build for remarshal, etc.).

The potential downside to this is that every single build will now require compiling crane-utils (currently this is only required if your dependency graph contains a git dependency). Granted this will be cached/deduped across all builds which use the same crane checkout, but for folks doing builds without their own cache, that's one extra thing to recompile and relink in their CI (whereas today remarshal is available via the NixOS cache, at least in my experience of sticking to the nix{pkgs,os}-unstable branches).

I'm not sure there's a blatantly obvious right answer here, besides exposing a version of writeTOML which uses our own crane-utils converter and documenting how consumers can opt into using that over remarshal.
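
For illustration only, a writeTOML-style helper built around a dedicated json -> toml converter might look roughly like the sketch below. It assumes a hypothetical package `jsonToToml` that provides a `json-to-toml` binary reading JSON on stdin and writing TOML on stdout; neither name comes from crane or nixpkgs. `runCommand`, `builtins.toJSON`, and `passAsFile` are the standard nixpkgs/Nix facilities.

```nix
# Sketch only: `jsonToToml` and its `json-to-toml` binary are hypothetical
# placeholders for whichever converter ends up being used.
{ runCommand, jsonToToml }:

name: value:
runCommand name
  {
    nativeBuildInputs = [ jsonToToml ];
    # Serialize the Nix value to JSON at evaluation time.
    valueJson = builtins.toJSON value;
    # Hand the JSON to the builder as a file rather than an environment
    # variable; its path is exposed as $valueJsonPath.
    passAsFile = [ "valueJson" ];
  }
  ''
    json-to-toml < "$valueJsonPath" > "$out"
  ''
```

If this function were bound to a name such as `writeTOML`, calling `writeTOML "Cargo.toml" { package = { name = "demo"; version = "0.1.0"; }; }` would yield a store path holding the converted file, with the converter as the only extra build-time dependency.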

@CertainLach
Author

CertainLach commented Apr 13, 2026

> At a quick glance it appears that a bunch of test/optional/transitive dependencies end up pulling on this. Is there a way to build remarshal without all those and special casing that upstream in nixpkgs?

remarshal depends on rich-argparse (an argument-parsing library), rich-argparse depends on rich (a terminal renderer for Markdown etc.), rich depends on markdown-it (a Markdown parser), and that depends on pytest and on ffmpeg (I forget how, but I have seen it in the graph of the runtime closure). Beyond that there are lots of further dependencies, and not all of them are test-only. I looked at which of them might be made optional, but haven't found a feasible way to disable them transitively without breaking anything else; these are just many packages suffering from feature creep.

On the other hand, because the Go ecosystem is so self-contained (or, I would say, isolated, since there's no FFI and so on), Go programs are usually the quickest and easiest to build, which makes yj ideal for this sort of utility.

I didn't know about crane-utils; I assumed that crane, being a Rust build tool, would not use Rust itself. It seems like a perfect place to have this utility instead, so I can rewrite this PR to use serde-toml+serde-json.

> whereas today remarshal is available via the NixOS cache, at least in my experience of sticking to the nix{pkgs,os}-unstable branches

Because of the huge dependency chain, it was not available for aarch64 at the time I created this PR, and it will never be available for armv7l, which I avoid cross-compiling to (my CI is already too complicated for that: https://github.com/CertainLach/jrsonnet/blob/master/flake.nix#L109-L142, and aarch64 is capable of running armv7l natively). Still, remarshal's closure is too large for my taste :D
The closure could be reduced once NixOS/nixpkgs#272178 is resolved, but I don't think that will happen anytime soon.

In the meantime I'll just use my fork of crane instead.

@CertainLach
Author

CertainLach commented Apr 13, 2026

Ah, yes, I also had a breakage in openblas on aarch64 on nixpkgs-unstable. It was only used in pytest, which made me question HOW a TOML serializer even depends on openblas.

It was fixed here: NixOS/nixpkgs#508576

@dpc
Contributor

dpc commented Apr 13, 2026

Ugh. Why do people keep building (and/or adapting) software as system tools in languages terribly unsuitable for it? :/

If crane indeed only needs json -> toml, then it probably should in fact roll its own in Rust. One could really nearly vibecode something so simple, and slap on a fuzzer/proptest, potentially comparing the results with another tool like it.

Probably nearly no one really needs all 4 in "YAML <=> TOML <=> JSON <=> HCL" conversion. Would be better to have multiple tools doing selected pairs only anyway.

@ipetkov
Owner

ipetkov commented Apr 13, 2026

I have a quick proof of concept here if anyone wants to give it a whirl before I polish it off further (e.g. add tests and docs): https://github.com/ipetkov/crane/tree/push-kxwrxkxslrxv

You can test this via:

let
 craneLib = (inputs.crane.mkLib pkgs).overrideScope (final: prev: {
   writeTOML = final.writeTOMLViaCraneUtils;
 });
in
etc

@CertainLach
Author

CertainLach commented Apr 13, 2026

> Probably nearly no one really needs all 4 in "YAML <=> TOML <=> JSON <=> HCL" conversion. Would be better to have multiple tools doing selected pairs only anyway.

Yep, but remarshal is even worse:
cbor <=> json <=> msgpack <=> python <=> toml <=> yaml <=> yaml-1.1 <=> yaml-1.2

...all of that while depending on ffmpeg and gtk4 for no reason.

I only picked yj because it had already been proposed in nixpkgs as a replacement for this Python abomination.
