where flakes fall off: an eval cache tale
nix flakes are great. they provide a fairly integrated experience: they standardize structured attributes like `packages`, `devShells`, and `nixosConfigurations`, make it fairly easy to integrate other flakes as inputs, and provide a much better interface to the “new style” nix commands than the old ones did. they also provide a way to automatically update hashes for external inputs, superseding the `nix-channel` approach of software distribution, which was very bad and unintuitive to beginners.
to start the story off: there’s one killer feature over normal non-flake nix commands that’s really valuable, namely evaluation caching. the premise is pretty simple: some nix expressions are fairly complicated and take a long time to evaluate, so if we’re sure none of the inputs for the expression have changed, we can avoid re-evaluating the whole expression altogether by substituting in the final result (usually a derivation path) from a cache.
there are two main ingredients in the flakes soup that enable this feature:
- that they prohibit “impure” nix; all nix expressions that fetch some value from the outside at runtime (eg. `fetchurl`, `fetchgit`) must do so by providing a hash of some sort (commonly referred to as “locking”). they also prohibit expressions that read directly from the runtime environment (eg. `nixPath`, `getEnv`), as those may cause unreproducibility with no code changes.
- that they have predefined types and schemas for every possible flake attribute path, such that you can be sure which commands can and cannot be cached based on which paths they access.
the theory states that if none of the inputs for a given pure expression have changed, then the output must be byte-for-byte equal every time1. the way flakes implement it is with a `.sqlite` file that links all inputs’ content hash, mashed together with the specific path used to access the flake, pointing directly to the `outPath` of the derivation, tucked in with some metadata. the mental model for this, then, is that when you invoke `nix build ./path#attr` (or another cacheable command), it will calculate the input content hash, try to fetch it from an sqlite file, and if it matches, great, you just skipped multiple seconds of your startup time.
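that lookup can be modeled as a tiny key-value store. this is only a sketch - the real cache is an sqlite file with nix’s own fingerprinting scheme, and every name and hash below is made up:

```shell
# toy model of the flake eval cache (all values hypothetical):
# key   = hash(input contents + attribute path)
# value = the previously evaluated outPath
cachedir=$(mktemp -d)

fingerprint() {
  printf '%s:%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# first run: cache miss, so we pay for the full evaluation and store it
key=$(fingerprint "input-content-hash-abc" "packages.x86_64-linux.default")
[ -f "$cachedir/$key" ] || echo "/nix/store/aaaa-hello-2.12" > "$cachedir/$key"

# second run with identical inputs: same key, pure lookup, no evaluation
key=$(fingerprint "input-content-hash-abc" "packages.x86_64-linux.default")
cat "$cachedir/$key"   # -> /nix/store/aaaa-hello-2.12
```

the purity restrictions above are what make this sound: if any input could change without changing its hash, the key would silently point at a stale result.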
as it so happens, I did have in hand a derivation that I wanted to evaluate very rapidly, that took about 5 seconds to evaluate, and that I thought flakes could very well be a solution for. after rewriting the expression and benchmarking, the time to build it went from 5s to 300ms. that’s a dandy improvement! case closed. nothing more to complain about. right? right? well… ergh.
the pitfalls
in order to understand the main pain points, let’s examine from the ground up what `nix build` evaluation caching looks like, through the following directory structure:
$ ~ :: mkdir my-repo
$ ~ :: cd my-repo
$ ~/my-repo :: git init
[...]
$ ~/my-repo :: mkdir a/b/c -p
$ ~/my-repo :: cd a/b/c && nix flake init
wrote: "home/leonardo/my-repo/a/b/c/flake.nix"
$ ~/my-repo/a/b/c ::
the default flake created by `nix flake init` contains the `nixpkgs.hello` derivation, so it’s perfect for our tests:
$ ~/my-repo :: nix build ./a/b/c#default
error: path '/nix/store/0ccnxa25whszw7mgbgyzdm4nqc0zwnm8-source/a/b/c/flake.nix' does not exist
huh, file doesn’t exist? what’s that path? it turns out that when no fetcher is specified, `./a/b/c#default` will be interpreted as `git+file:./a/b/c#default`, ie. it’s assumed to be a git repository. as such, nix will first do a pseudo `git clone --depth 1` into the nix store, with the contents taken from the working area, and then execute from there; but the generated path (`/nix/store/0ccnxa25whszw7mgbgyzdm4nqc0zwnm8-source`) doesn’t contain our `flake.nix` file, as it hasn’t been included in the git tree yet.
one may also notice an additional “quirk”: the copied directory is not `~/my-repo/a/b/c` but `~/my-repo`. as it turns out, my previous statement was misleading, as `./a/b/c#default` is actually interpreted as `git+file:.?dir=a/b/c#default`. this means that the whole `~/my-repo` git directory is copied to the store, not only the subdirectory which contains the flake. running `git add a/b/c/flake.nix` before calling `nix build` should do the trick.
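the failure can be reproduced in miniature without nix at all: a git-based fetcher only sees tracked files, so an un-added `flake.nix` is simply invisible to it (paths below are invented for the demo):

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/a/b/c"
echo '{ outputs = _: { }; }' > "$repo/a/b/c/flake.nix"

# before `git add`: the file exists on disk but is invisible to git,
# and therefore to anything that copies only the tracked tree
git -C "$repo" ls-files              # prints nothing

git -C "$repo" add a/b/c/flake.nix
git -C "$repo" ls-files              # prints a/b/c/flake.nix
```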
$ ~/my-repo :: time nix build ./a/b/c#default
warning: Git tree '/home/leonardo/my-repo' is dirty
warning: creating lock file '/home/leonardo/my-repo/a/b/c/flake.lock':
• Added input 'nixpkgs':
'github:nixos/nixpkgs/b024ced1aac25639f8ca8fdfc2f8c4fbd66c48ef?narHash=sha256-fusHbZCyv126cyArUwwKrLdCkgVAIaa/fQJYFlCEqiU%3D' (2025-04-17)
warning: Git tree '/home/leonardo/my-repo' is dirty
real 0m0,881s
user 0m0,460s
sys 0m0,277s
now, let’s get this eval cache party started!!!
$ ~/my-repo :: time nix build ./a/b/c
warning: Git tree '/home/leonardo/my-repo' is dirty
real 0m0,889s
user 0m0,461s
sys 0m0,284s
$ ~/my-repo :: time nix build ./a/b/c
warning: Git tree '/home/leonardo/my-repo' is dirty
real 0m0,896s
user 0m0,457s
sys 0m0,284s
$ ~/my-repo :: time nix build ./a/b/c
warning: Git tree '/home/leonardo/my-repo' is dirty
real 0m0,901s
user 0m0,469s
sys 0m0,288s
huh, 900ms is way too high for a cache hit, and it does not seem to go lower than that. this is because `nix build` will always copy the directory to the store when the git tree is dirty, which forces it to re-evaluate every time. I cannot find any explanation or discussion about this topic, but one plausible explanation would be that nix uses the git HEAD commit itself to check whether the tree has already been copied to the store, and if the tree is dirty then there’s no commit to look at. this should also be the reason why it complains so much about `Git tree '/home/leonardo/my-repo' is dirty`. invoking `nix build` with `-vvv` gives us a better view:
$ ~/my-repo :: time nix build ./a/b/c#default -vvv
evaluating file '<nix/derivation-internal.nix>'
evaluating derivation 'git+file:///home/leonardo/my-repo?dir=a/b/c#default'...
warning: Git tree '/home/leonardo/my-repo' is dirty
source path '/home/leonardo/my-repo/home/leonardo/my-repo/' is uncacheable
copying '/home/leonardo/my-repo/home/leonardo/my-repo/' to the store...
performing daemon worker op: 7
acquiring write lock on '/nix/var/nix/temproots/9659'
performing daemon worker op: 26
got tree '/nix/store/0r5c4j5yww8mljggyy3f97glppgnc2pk-source' from 'git+file:///home/leonardo/my-repo?dir=a/b/c'
evaluating file '/nix/store/0r5c4j5yww8mljggyy3f97glppgnc2pk-source/a/b/c/flake.nix'
(...)
not only do we see the whole copy going on behind our backs, together with all the file evaluations, we can also see the following worrying line: `source path '/home/leonardo/my-repo/home/leonardo/my-repo/' is uncacheable` (I’ll assume the duplicated path is just an issue with the debug print). yep, flakes can only cache evaluations on clean git trees.
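my working theory about the caching rule can be sketched like this (it mirrors the behaviour observed above; whether nix literally implements it this way is an assumption on my part):

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
echo '{ outputs = _: { }; }' > "$repo/flake.nix"
git -C "$repo" add flake.nix
git -C "$repo" -c user.email=me@example.com -c user.name=me commit -qm init

cacheable() {
  # clean tree: HEAD is a stable fingerprint for the whole source.
  # dirty tree: nothing stable to key on, so re-copy and re-evaluate.
  if [ -z "$(git -C "$1" status --porcelain)" ]
  then echo "clean: fingerprint=$(git -C "$1" rev-parse HEAD)"
  else echo "dirty: uncacheable"
  fi
}

cacheable "$repo"          # clean: fingerprint=<HEAD sha>
touch "$repo/newfile"      # even an unrelated untracked file...
cacheable "$repo"          # ...makes it dirty: uncacheable
```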
$ ~/my-repo :: git add .
$ ~/my-repo :: git commit -m 'initial commit'
[master (root-commit) 0a8327d] initial commit
3 files changed, 43 insertions(+)
create mode 100644 a/b/c/flake.lock
create mode 100644 a/b/c/flake.nix
create mode 120000 result
$ ~/my-repo :: time nix build ./a/b/c
real 0m0,948s
user 0m0,527s
sys 0m0,269s
$ ~/my-repo :: time nix build ./a/b/c
real 0m0,103s
user 0m0,051s
sys 0m0,026s
finally we see the cache hit. if we try to take a look into what’s going on - this time with `-vvvv` - we see:
$ ~/my-repo :: time nix build ./a/b/c -vvvv
evaluating file '<nix/derivation-internal.nix>'
evaluating derivation 'git+file:///home/leonardo/my-repo?dir=a/b/c#packages.x86_64-linux.default'...
using cache entry 'gitRevCount:{"rev":"0a8327deeee66e66f6385f7d134195620649f648"}' -> '{"revCount":1}'
using cache entry 'gitLastModified:{"rev":"0a8327deeee66e66f6385f7d134195620649f648"}' -> '{"lastModified":1745187032}'
using cache entry 'fetchToStore:{"fingerprint":"0a8327deeee66e66f6385f7d134195620649f648","method":"nar","name":"source","path":"/","store":"/nix/store"}' -> '{"storePath":"44yqsx1xq89f6ap49h7a7jp483229y3l-source"}'
performing daemon worker op: 11
acquiring write lock on '/nix/var/nix/temproots/11438'
performing daemon worker op: 1
using cache entry 'fetchToStore:{"fingerprint":"0a8327deeee66e66f6385f7d134195620649f648","method":"nar","name":"source","path":"/","store":"/nix/store"}' -> 'null', '/nix/store/44yqsx1xq89f6ap49h7a7jp483229y3l-source'
store path cache hit for '/home/leonardo/my-repo/home/leonardo/my-repo/'
performing daemon worker op: 26
got tree '/nix/store/44yqsx1xq89f6ap49h7a7jp483229y3l-source' from 'git+file:///home/leonardo/my-repo?dir=a/b/c&ref=refs/heads/master&rev=0a8327deeee66e66f6385f7d134195620649f648'
indeed, my theory seems to be correct: the `fetchToStore` cache hit uses a fingerprint - `0a8327deeee66e66f6385f7d134195620649f648` - that directly references the current commit:
$ ~/my-repo :: git log --pretty=oneline
0a8327deeee66e66f6385f7d134195620649f648 (HEAD -> master) initial commit
moreover, this state is extremely fragile. if we add a new file to the directory, even one not read by the nix expression, it will invalidate the cache: it generates a different store path, and the git tree isn’t clean anymore.
$ ~/my-repo :: touch newfile
$ ~/my-repo :: git add newfile
$ ~/my-repo :: time nix build ./a/b/c
warning: Git tree '/home/leonardo/my-repo' is dirty
real 0m0,885s
user 0m0,465s
sys 0m0,256s
in my own case, the derivation that I was examining initially lives in a subdirectory of a multi-gigabyte monolithic git repository, so one can only imagine how “fast” copying it to the store on almost every invocation really is. benchmarking it, a whole second was spent just on copying to the store, while actually evaluating the expression took around 3.5s.
but this is very disappointing. most of the time when working on a repository, you’ll have uncommitted source code, so you’ll most likely never follow the fastest cache codepath. hell, you’re most likely rarely hitting the cache at all, as it requires you to have no changes to the whole repository since the last time you ran it.
thus, we can summarize the two main issues with the flakes evaluation cache as:
- lack of granularity of inputs, where some of the inputs tracked to invalidate the cache are irrelevant to the nix expression itself.
- copying the repository to the store, as the local repository must be copied to the store before evaluation starts, adding a lot of time when the repository is sizable enough.
alternatives
I’m not the first person to have suffered from these issues. the lazy trees pull request has been ongoing since 2022, trying to avoid unnecessary copies to the store when the local repository store path isn’t read during evaluation. I don’t know the current state of its implementation, but the changes are so big that it’s been spliced into multiple smaller PRs, so I do not know when this issue is getting fixed. it also does not try to tackle the lack of granularity of cache invalidation, so I had to look for other solutions.
the devenv.sh team noticed that the flake eval cache was unreliable and hand-rolled their own. their solution to the copies is to simply not do them, by using `nix-build` instead of `nix build`; but in order to get better granularity, you need to track yourself which files are read during evaluation, and implement and invalidate the cache yourself.
they do it by utilizing the same approach as the lorri tool: leveraging nix’s internal logs. invoking `nix-build` with `-vv --log-format internal-json` provides a very barebones look into exactly which files and directories are read during evaluation, which forms the basis of a much nicer and more granular cache.
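to get a feel for what that parsing looks like: the internal-json stream is a series of `@nix {...}` lines on stderr, whose `msg` fields contain lines like `evaluating file '...'`. here’s a minimal extraction over a captured sample line (the sample and its field values are illustrative; the real format is undocumented and may change between nix versions):

```shell
# a captured sample log line; a real invocation would look like:
#   nix-build -vv --log-format internal-json default.nix 2>eval.log
sample='@nix {"action":"msg","level":5,"msg":"evaluating file '\''/src/default.nix'\''"}'

# strip the @nix prefix, then pull the path out of "evaluating file" messages
printf '%s\n' "$sample" \
  | sed -n 's/^@nix //p' \
  | grep -o "evaluating file '[^']*'" \
  | sed "s/^evaluating file '//; s/'\$//"
# -> /src/default.nix
```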
and while that implementation does work decently, it is very cumbersome to maintain, as any change to the internal log format breaks the implementation - one can only imagine the stability of that. another issue with this approach is that it cannot track reads done outside of the main nix runtime; in particular, it cannot track files read by IFDs, which is a downside given that, unfortunately, my original expression did contain an IFD.
the drive for better integration with the nix evaluation API was such that they decided to switch to tvix, whose runtime is designed with observability in mind and whose rust API access is much smoother2. to the best of my knowledge, tvix is not yet stable enough to be used even in development, as there are some missing implementation details, so I cannot ascertain how far along the switch to the new implementation is.
there’s one additional quality of the tvix evaluator that I think is worth mentioning. given that it is implemented through a bytecode VM, a smart enough optimizing compiler should be able to inline most function calls in an expression, given they all have concrete values at hand - the overall expression should just evaluate to a derivation after all. it seems that the devenv.sh team is heading in this direction, trying to leverage some mixed file-granular bytecode caching so as not to invalidate the whole cache when single files change. I reckon this could very well be the best approach for a change-resistant, very fast evaluator of nix.
nix-forall’s eval cache
in the meantime, for the last couple of months I’ve been developing a Rust wrapper around the FFI of the recently stabilized nix C API, called nix-forall, and decided to give my own twist to solving this problem. instead of relying on the internal logs - which aren’t even available from the nix C API (yet?) - I decided to try the linux `ptrace` API! what could go wrong tracing all system calls, looking for those that open file descriptors?
I chose `ptrace` over `fanotify` and `inotify` because I did not want to require a single root repository to trace, as I wanted to support non-flake alternatives, and there isn’t a good way to guess what the root repository might be without a `flake.nix`, other than `/`, which the user might not have privileges to read.
using `ptrace` meant I’d need to track all system calls and filter for those that open file descriptors, as well as those that read from them, which wasn’t easy. `ptrace` also meant that I’d need to `fork` the original process, as I wasn’t able to trace a process’s thread from the parent itself3. in order to reduce the number of system calls to track, one can use libseccomp to raise `PTRACE_EVENT_SECCOMP` stops at specific syscalls; with pure `ptrace` one must stop at every single syscall and ignore the unrelated ones, paying a big price in context switches between the kernel, the tracee and the tracer process every time4.
in the end, my implementation ended up with a hefty 10% performance tax on the slow codepath (though my benchmarks are not of great quality at the moment), as it pays both the `fork()` and syscall-monitoring costs, but it was able to severely limit the amount of files that needed to be tracked, while also including the files read by the IFD.
however, just as the slow code path got slower, the fast path got even faster: the unoptimized build, written with almost no optimizations, takes 5ms instead of the old 300ms - a whopping 60x improvement - and it persists through writes to unrelated files, something flakes never even dreamed of doing. this is to be expected, as there’s basically no heavy machinery needed for a cache hit: all it does is hash the source file and query for any cached results for the specific attribute path. if there are any, it asserts that none of the relevant files’ hashes have changed, and if so, returns the cached output directly.
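that fast path can be sketched as follows. the layout and names are invented for illustration - nix-forall’s actual storage differs - but the shape is the same: a cache entry for an attribute path records the output plus a hash for every file the traced evaluation read:

```shell
cache=$(mktemp -d); src=$(mktemp -d)
printf '%s\n' 'with import <nixpkgs> { }; hello' > "$src/default.nix"

# store an entry: first line is the cached output, the rest is one
# "hash  path" line per file the traced evaluation actually read
{ echo '/nix/store/bbbb-hello'; sha256sum "$src/default.nix"; } \
  > "$cache/packages.x86_64-linux.default"

entry="$cache/packages.x86_64-linux.default"

# lookup: re-hash only the tracked files; if nothing changed, cache hit
if tail -n +2 "$entry" | sha256sum -c --quiet - >/dev/null 2>&1
then head -n 1 "$entry"            # cache hit: print the cached output
fi

# touching a tracked file invalidates the entry; unrelated files don't
echo '# changed' >> "$src/default.nix"
tail -n +2 "$entry" | sha256sum -c --quiet - >/dev/null 2>&1 || echo "miss: re-evaluate"
```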
this is by no means a definitive solution. the current implementation is very barebones, and does not even support flakes: the C API itself is fairly unstable and not feature complete, having only recently gained support for handling and fetching flake inputs, so in the meantime I tried to design my evaluation to handle flakes as well as non-flakes. the best I came up with was to take the accessor path (eg. `packages.x86_64-linux.default`) into consideration when inserting things into the cache, much like flakes do, and to use a modified version of eelco’s flake-compat to access the flake through a non-flake interface, as the original package I wanted to interact with was offered through a flake interface after all.
it also contains implementation bugs in the main loop when multithreading is involved (solely due to my poor skills with it), as it isn’t exactly trivial to track the state of multiple processes at once when the `ptrace` API has a bunch of implicitly tracked state. for example, one can issue `PTRACE_SYSCALL` requests to stop the tracee at the next syscall, but whether a given stop refers to the entry (pre-execution) or exit (post-execution) of said syscall is tracked entirely implicitly, and you’re expected to keep track of it yourself. these kinds of issues are bound to be ironed out as time passes and I understand what I’m doing wrong, but it isn’t exactly easy to find much information about what handling this properly looks like, so usually I can only refer to debugger source code for reference.
still, I do not think my approach is going to be the future of evaluation caching, due to its sheer complexity and difficulty to maintain. I reckon an approach based on internal compiler observability is going to win long term, and it doesn’t necessarily need to be through tvix. the nix team has mentioned5 there’s some interest in integrating an OpenTelemetry approach into nix’s internal logs, improving observability and allowing it to be operated more cleanly from the outside.
either way, I would really like to see some attention towards these pain points of using flakes, as I believe a lot of people will end up with the same problem as mine.
- this isn’t exactly true. there are ways to create non-reproducible builds even if the source inputs haven’t changed, but the general consensus in the community seems to be that you need to go out of your way to cause unreproducibility in your derivation. this great blog post by Julien Malka reports his research measuring how well nixpkgs fares on reproducible builds, finding that the main way differences occur is when some kind of environment data is embedded into the build script itself. ↩︎
- I’m selling tvix a bit short here; their implementation is awesome. they offer a content-addressed store that reduces storage space by as much as 80%, a bytecode optimizing compiler and VM runtime instead of the slow AST fold nix does, and, most incredibly, an asynchronous evaluator that treats IFDs as first-class objects that can be executed in parallel with nix code. ↩︎
- I’m saying this based on this 2006 email thread from Linus saying that checks were added to avoid `ptrace`’ing when both the parent and the child are in the same thread group, which means that I cannot simply spawn a thread to dispatch the job. I cannot find any more information about this, but trying it locally indeed failed. ↩︎
- this amazing blog post by Alfonso Beato gives a very in-depth example of how to use the `ptrace` + `seccomp` power duo. ↩︎
- this was mentioned in the proceedings of one of the nix team meetings, referencing this 2022 issue that hasn’t seen much activity. ↩︎