* cat-file --batch-command info performance problem
@ 2025-08-05 21:21 Rob Browning
2025-08-05 22:10 ` Rob Browning
From: Rob Browning @ 2025-08-05 21:21 UTC
To: git
While doing some testing before a bup release, I ran into a performance
problem that I've narrowed down to git (2.47.2) cat-file --batch-command
(currently, bup often relies on git cat-file).
I can reproduce the problem with a ~125GB (~3M object) repository on
an external SSD and a system with 16GB RAM via "git cat-file
--batch-command < fetch-oids" where fetch-oids contains 8k "info HASH"
commands.
That process runs at 32 hashes/sec (overall average), with a cold
cache (echo 3 > /proc/sys/vm/drop_caches), and does not improve over
repeated runs. While that's running, every time I check, git's
reading about 300-400+ MB/s.
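
A measurement along these lines could be scripted roughly as follows.
This is a minimal sketch, not the script used for the numbers above:
the helper names info_commands and time_batch_info are hypothetical,
and it assumes a git new enough to support cat-file --batch-command.
Dropping the page cache beforehand is left to the caller.

```python
import subprocess
import time


def info_commands(oids):
    """Build --batch-command input: one "info <oid>" line per object."""
    return "".join(f"info {oid}\n" for oid in oids)


def time_batch_info(repo, oids):
    """Feed all info commands to one cat-file process.

    Returns (reply_lines, replies_per_second). The rate includes process
    startup, so it is an overall average, not a steady-state figure.
    """
    start = time.monotonic()
    proc = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-command"],
        input=info_commands(oids),
        capture_output=True,
        text=True,
        check=True,
    )
    elapsed = max(time.monotonic() - start, 1e-9)
    replies = proc.stdout.splitlines()
    return replies, len(replies) / elapsed
```

Each reply line is either "<oid> <type> <size>" or "<oid> missing", so
the reply count should match the number of commands sent.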
That didn't seem right, so I wrote a test command that produces the same
information via direct index access and packfile reads/seeks. With a
cold cache, that runs at 1.5k hashes/sec (even from python), and on the
second or third run, 9k hashes/sec. Interestingly, if I run cat-file
after the test command has warmed up the cache, cat-file then reaches
53k hashes/sec.
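
For readers unfamiliar with "direct index access", the lookup half can
be sketched as below. This is not the test command described above,
only an illustration under stated assumptions: it handles version 2
pack .idx files with 20-byte SHA-1 names, and find_pack_offset is a
hypothetical helper name. The idx argument can be bytes or an mmap of
the .idx file.

```python
import struct

IDX_MAGIC = b"\xfftOc"      # signature of a version 2 pack index
NAMES_START = 8 + 256 * 4   # header (8 bytes) + 256-entry fanout table


def find_pack_offset(idx, oid):
    """Return the packfile offset for a 20-byte oid, or None if absent."""
    if idx[:4] != IDX_MAGIC or struct.unpack_from(">I", idx, 4)[0] != 2:
        raise ValueError("not a version 2 pack index")
    fanout = struct.unpack_from(">256I", idx, 8)
    total = fanout[255]

    # The fanout table bounds the search to names sharing oid's first byte.
    first = oid[0]
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]

    # Binary search over the sorted 20-byte object names.
    pos = None
    while lo < hi:
        mid = (lo + hi) // 2
        start = NAMES_START + mid * 20
        name = idx[start:start + 20]
        if name < oid:
            lo = mid + 1
        elif name > oid:
            hi = mid
        else:
            pos = mid
            break
    if pos is None:
        return None

    # Layout after the names: CRC32 table, 4-byte offsets, 8-byte offsets.
    offsets = NAMES_START + total * 20 + total * 4
    (off,) = struct.unpack_from(">I", idx, offsets + pos * 4)
    if off & 0x80000000:
        # High bit set: the low 31 bits index the 8-byte offset table.
        big = offsets + total * 4
        (off,) = struct.unpack_from(">Q", idx, big + (off & 0x7FFFFFFF) * 8)
    return off
```

To answer an "info" query, a reader would still have to seek to that
offset in the matching .pack file and decode the object header there,
following delta bases where needed.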
My current guess is that cat-file's approach (all mmap?) is causing
some kind of kernel-derived read amplification that's vastly
increasing the working set.
This is easily repeatable here, so I'd be happy to help test if that's
desirable, and perhaps even to pursue it if I can, and if it seems like
something that could/should be addressed in git.
If not, that's also helpful to know, and then we'll just handle the
lookups ourselves.
Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
* Re: cat-file --batch-command info performance problem
2025-08-05 21:21 cat-file --batch-command info performance problem Rob Browning
@ 2025-08-05 22:10 ` Rob Browning
From: Rob Browning @ 2025-08-05 22:10 UTC
To: git
Rob Browning <rlb@defaultvalue.org> writes:
> While doing some testing before a bup release, I ran into a performance
> problem that I've narrowed down to git (2.47.2) cat-file --batch-command
> (currently, bup often relies on git cat-file).
>
> I can reproduce the problem with a ~125GB (~3M object) repository on
> an external SSD and a system with 16GB RAM via "git cat-file
> --batch-command < fetch-oids" where fetch-oids contains 8k "info HASH"
> commands.
Just after sending this, I thought to wonder whether the issue might
somehow be with the hardware/driver/etc. (that particular drive is an
nvme ssd in an external usb-3 case), and while I'm not testing the exact
same repository (it's a slightly larger, related one), preliminary
results suggest cat-file behaves much more reasonably with an internal
nvme drive, i.e. it starts slow, gets faster, and ends up repeatably at
37k hashes/sec after the second run.
So I'll probably test a bit more, but while I'd be quite interested in
the cause, it seems likely my solution should just be to replace that
hardware.
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4