* Slow git pack-refs --all
From: Martin Fick @ 2025-12-25 22:13 UTC
To: git@vger.kernel.org
I was hoping to get some help debugging a busy, large repository where git pack-refs --all regularly takes over 5 minutes to run in production. As you can imagine, this is particularly problematic on a busy Gerrit server, since pack-refs holds the packed-refs.lock file for most of that time. Any help is greatly appreciated; see the details below.
-Martin
What did you do before the bug happened? (Steps to reproduce your issue)
I have a large repository (~90M objects, ~50GB, ~3M refs) which is repacked and maintained regularly (every ~2 hours), but which generally receives 300+ updates per maintenance cycle.
What did you expect to happen? (Expected behavior)
git pack-refs --all to complete in under 20s when there are only 200 loose refs
What happened instead? (Actual behavior)
git pack-refs --all takes more than 3 minutes
What's different between what you expected and what actually happened?
This is much slower than expected
Anything else you want to add:
Although the packed-refs file is large, copying it takes less than 1s, so there isn't a write-throughput issue with the filesystem. Additionally, jgit can pack-refs --all in under 20s on the same repo, so I don't believe there is an issue with locking the 200 loose refs either. When observing the filesystem, I do see packed-refs.new growing at a rate that seems slower than expected, as if much more than just writing the file is happening.
An strace shows 200+ open("./objects..") calls interspersed among ~26K write() calls. I am surprised to see pack-refs reading objects at all.
The repository is not in terrible shape before packing refs (~1500 loose objects, 37 pack files). Surprisingly, though, repacking the repo first does speed things up, so that packing refs then takes under 20s.
This repository is on NFS.
[System Info]
git version:
git version 2.45.2
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64
compiler info: gnuc: 14.1
libc info: glibc: 2.17
$SHELL (typically, interactive shell): /bin/bash
[Enabled Hooks]
not run from a git repository - no hooks to show
* Re: Slow git pack-refs --all
From: brian m. carlson @ 2025-12-25 23:38 UTC
To: Martin Fick; +Cc: git@vger.kernel.org
On 2025-12-25 at 22:13:54, Martin Fick wrote:
> Although the packed-refs file is large, copying it takes less than 1s,
> so there isn't a writing throughput issue with the filesystem.
> Additionally, jgit can pack-refs --all in under 20s on the same repo,
> so I don't believe there is an issue locking the 200 loose refs
> either. When observing the filesystem, I do see the packed-refs.new
> growing at a rate that seems slower than expected as if much more is
> happening while writing this file, than just writing the file.
>
> An strace shows about 200+ open("./objects..") calls interspersed
> between around ~26K write() calls. I am surprised to see pack-refs
> reading objects at all.
I think this is from `should_pack_ref`:
	/* Do not pack broken refs: */
	if (!ref_resolves_to_object(ref->name, refs->base.repo, ref->oid, ref->flags))
		return 0;
So Git is going to need to verify that the object at least exists. I
don't know why we would need to _open_ them, however. Perhaps someone
else has ideas.
> Although the repository is not in terrible shape before packing refs
> (~1500 loose objects, 37pack files). Surprisingly, repacking the repo
> first does speed it up so that packing refs then takes under 20s.
>
> This repository is on NFS.
That's almost certainly part of your performance problem, too. Loading
a single pack file and index is going to be way, way faster than making
lots of network calls to open 37 pack files and 37 index files, plus at
least stat some loose objects.
I will note that at least some forges always have Git write pack files
and try to avoid loose objects altogether since that almost always
improves performance. You may want to set `receive.unpackLimit` to 1 to
see if that helps in the general case.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* Re: Slow git pack-refs --all
From: Jeff King @ 2025-12-26 4:45 UTC
To: brian m. carlson; +Cc: Martin Fick, git@vger.kernel.org
On Thu, Dec 25, 2025 at 11:38:30PM +0000, brian m. carlson wrote:
> I think this is from `should_pack_ref`:
>
> /* Do not pack broken refs: */
> if (!ref_resolves_to_object(ref->name, refs->base.repo, ref->oid, ref->flags))
> return 0;
>
> So Git is going to need to verify that the object at least exists. I
> don't know why we would need to _open_ them, however. Perhaps someone
> else has ideas.
The packed-refs file stores tag-peeling information. So pack-refs opens
the object for any newly written ref via peel_object(), which has to at
least read the header to get the type. That call happens via
write_with_updates() in packed-backend.c.
If we wanted to be really pedantic, anything in refs/heads/ should not
point to a non-commit and thus should never need to be peeled. I'm not
sure if we want to embed that assumption in this code path, though
(nor would it necessarily help Martin's case if the refs are not in
refs/heads anyway).
-Peff
* Re: Slow git pack-refs --all
From: brian m. carlson @ 2025-12-26 17:15 UTC
To: Jeff King; +Cc: Martin Fick, git@vger.kernel.org
On 2025-12-26 at 04:45:07, Jeff King wrote:
> On Thu, Dec 25, 2025 at 11:38:30PM +0000, brian m. carlson wrote:
>
> > I think this is from `should_pack_ref`:
> >
> > /* Do not pack broken refs: */
> > if (!ref_resolves_to_object(ref->name, refs->base.repo, ref->oid, ref->flags))
> > return 0;
> >
> > So Git is going to need to verify that the object at least exists. I
> > don't know why we would need to _open_ them, however. Perhaps someone
> > else has ideas.
>
> The packed-refs file stores tag-peeling information. So pack-refs opens
> the object for any newly written ref via peel_object(), which has to at
> least read the header to get the type. That call happens via
> write_with_updates() in packed-backend.c.
>
> If we wanted to be really pedantic, anything in refs/heads/ should not
> point to a non-commit and thus should never need to be peeled. I'm not
> sure if we want to embed that assumption in this code path, though
> (nor would it necessarily help Martin's case if the refs are not in
> refs/heads anyway).
I don't think that would be a good idea. I know that people definitely
do updates of the loose refs by hand (although they should not) and so
it's entirely possible for them to contain invalid values, such as
having branches contain non-commit objects.
I wonder if reftable would avoid the need for this kind of expensive
check since it would already have the data peeled if need be and
wouldn't need to recompute the values.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* Re: Slow git pack-refs --all
From: Jeff King @ 2025-12-27 7:36 UTC
To: brian m. carlson; +Cc: Martin Fick, git@vger.kernel.org
On Fri, Dec 26, 2025 at 05:15:31PM +0000, brian m. carlson wrote:
> > If we wanted to be really pedantic, anything in refs/heads/ should not
> > point to a non-commit and thus should never need to be peeled. I'm not
> > sure if we want to embed that assumption in this code path, though
> > (nor would it necessarily help Martin's case if the refs are not in
> > refs/heads anyway).
>
> I don't think that would be a good idea. I know that people definitely
> do updates of the loose refs by hand (although they should not) and so
> it's entirely possible for them to contain invalid values, such as
> having branches contain non-commit objects.
Yeah, that matches my inclination.
> I wonder if reftable would avoid the need for this kind of expensive
> check since it would already have the data peeled if need be and
> wouldn't need to recompute the values.
It does the same amount of peeling, but it's amortized across more
operations (i.e., whatever did those ref updates in the first place)
rather than during the pack operation. And of course there really is no
pack operation per se with reftables, but I believe it avoids re-peeling
when rewriting entries during compaction.
It might actually do fewer object accesses overall if the ref-writing
operations have already loaded the objects in question (and thus it
knows whether they're tags or not, and may even have parsed tags in
memory). It can also do more in some cases (e.g., two loose writes will
peel for each write, whereas the files backend only bothers to peel
during packing).
-Peff
* Re: Slow git pack-refs --all
From: Martin Fick @ 2025-12-31 5:39 UTC
To: brian m. carlson; +Cc: git@vger.kernel.org
> From: brian m. carlson
> Sent: Thursday, December 25, 2025 4:38 PM
> On 2025-12-25 at 22:13:54, Martin Fick wrote:
>> Although the repository is not in terrible shape before packing refs
>> (~1500 loose objects, 37pack files). Surprisingly, repacking the repo
>> first does speed it up so that packing refs then takes under 20s.
>>
>> This repository is on NFS.
> That's almost certainly part of your performance problem, too. Loading
> a single pack file and index is going to be way, way faster than making
> lots of network calls to open 37 pack file and 37 index files, plus at
> least stat some loose objects.
>
> I will note that at least some forges always have Git write pack files
> and try to avoid loose objects altogether since that almost always
> improves performance. You may want to set `receive.unpackLimit` to 1 to
> see if that helps in the general case.
This would not explain why jgit can pack refs much faster, since it also has to deal with these NFS latencies. In my experience with jgit, we generally don't see performance issues on NFS unless the packfile count exceeds 300 or so, and definitely not with only 37 of them. It could be that git is doing some things less efficiently here, but I would be pretty surprised if git could not also typically perform well on NFS with only 37 packfiles. I don't think 200-something object lookups should ever take 3+ minutes, even on NFS and even with far more packfiles than this. To be that slow, each lookup would have to take about 1s (180s / 200 lookups = 0.9s)! Something really seems fishy here. I have to think that somehow it isn't these reads that are slow, but I can't explain what else it could be.
-Martin
* Re: Slow git pack-refs --all
From: Martin Fick @ 2025-12-31 5:48 UTC
To: Jeff King, brian m. carlson; +Cc: git@vger.kernel.org
> From: Jeff King <peff@peff.net>
> Sent: Thursday, December 25, 2025 9:45 PM
>
> On Thu, Dec 25, 2025 at 11:38:30PM +0000, brian m. carlson wrote:
>> I think this is from `should_pack_ref`:
>>
>> /* Do not pack broken refs: */
>> if (!ref_resolves_to_object(ref->name, refs->base.repo, ref->oid, ref->flags))
>> return 0;
>>
>> So Git is going to need to verify that the object at least exists. I
>> don't know why we would need to _open_ them, however. Perhaps someone
>> else has ideas.
>
> The packed-refs file stores tag-peeling information. So pack-refs opens
> the object for any newly written ref via peel_object(), which has to at
> least read the header to get the type. That call happens via
> write_with_updates() in packed-backend.c.
Thanks, this makes sense. However, since jgit also needs to peel these objects, I don't see how this would be the bottleneck unless git is doing something terribly inefficient here. :(
Apart from the fact that repacking the objects made it faster, my observations suggest that it's the writing, not the reading, that is actually slow. Could there be too many small, unbuffered writes? Could this write path have been overlooked during optimization (it likely isn't used elsewhere)?
-Martin