git.vger.kernel.org archive mirror
* git keeps recreating packs, exploding backup increments
@ 2025-02-19  9:38 Pierre Ossman
  2025-02-20  3:03 ` [External] " Han Young
  0 siblings, 1 reply; 7+ messages in thread
From: Pierre Ossman @ 2025-02-19  9:38 UTC (permalink / raw)
  To: git

Hi,

I'm trying to understand git's repacking behaviour, as the observed 
behaviour doesn't match how I read the documentation or the code.

The problem we see is excessive backup increments for developer 
directories. The cause is that pack files keep getting regenerated for 
large repositories.

Users are not running 'git gc' manually, so the assumption is that this 
is caused by 'git gc --auto' being run implicitly.

From what I can see in the code, and the documentation, it should only 
pack up objects not already found in existing packs. Or at the very 
least, not the objects found in the largest existing pack.

(at least not until gc.autoPackLimit is hit)

But this isn't happening. Old packs are constantly being replaced by new 
ones, despite most of the objects being old and stable.

We tried gc.bigPackThreshold in the hope it would force it to reuse 
packs better. But all we got instead was duplication. It still creates 
new packs with everything. It just stopped removing the old ones.

Some guidance would be appreciated. I cannot find anything in the code 
or documentation that explains the current behaviour.

Regards,
-- 
Pierre Ossman           Software Development
Cendio AB               https://cendio.com
Teknikringen 8          https://twitter.com/ThinLinc
583 30 Linköping        https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



* Re: [External] git keeps recreating packs, exploding backup increments
  2025-02-19  9:38 git keeps recreating packs, exploding backup increments Pierre Ossman
@ 2025-02-20  3:03 ` Han Young
  2025-02-20  8:26   ` Pierre Ossman
  0 siblings, 1 reply; 7+ messages in thread
From: Han Young @ 2025-02-20  3:03 UTC (permalink / raw)
  To: Pierre Ossman; +Cc: git

On Wed, Feb 19, 2025 at 5:58 PM Pierre Ossman <ossman@cendio.se> wrote:
> We tried gc.bigPackThreshold in the hope it would force it to reuse
> packs better. But all we got instead was duplication. It still creates
> new packs with everything. It just stopped removing the old ones.

Is the repo partially cloned? git-repack will always repack promisor
packs, even if they are keep packs. This patch would fix it:
https://lore.kernel.org/git/2728513.vuYhMxLoTh@mintaka.ncbr.muni.cz/

Thanks


* Re: [External] git keeps recreating packs, exploding backup increments
  2025-02-20  3:03 ` [External] " Han Young
@ 2025-02-20  8:26   ` Pierre Ossman
  2025-02-21  8:16     ` Patrick Steinhardt
  0 siblings, 1 reply; 7+ messages in thread
From: Pierre Ossman @ 2025-02-20  8:26 UTC (permalink / raw)
  To: Han Young; +Cc: git

On 20/02/2025 04:03, Han Young wrote:
> On Wed, Feb 19, 2025 at 5:58 PM Pierre Ossman <ossman@cendio.se> wrote:
>> We tried gc.bigPackThreshold in the hope it would force it to reuse
>> packs better. But all we got instead was duplication. It still creates
>> new packs with everything. It just stopped removing the old ones.
> 
> Is the repo partially cloned? git-repack will always pack promisor
> packs even if it's a keep pack. This patch would fix it
> https://lore.kernel.org/git/2728513.vuYhMxLoTh@mintaka.ncbr.muni.cz/
> 

Yes, the big offender is often partially cloned. So that could be part 
of it, thanks.

But we're seeing it in other repositories as well. E.g. I have a 
long-lived TigerVNC repository where the biggest pack file is just one 
week old. In that case, it's merely 21 MiB, so it's not a practical 
issue. But it does show that git keeps replacing it.

Anything I/we can do to shed more light on the issue?

Regards,
-- 
Pierre Ossman


* Re: [External] git keeps recreating packs, exploding backup increments
  2025-02-20  8:26   ` Pierre Ossman
@ 2025-02-21  8:16     ` Patrick Steinhardt
  2025-02-24  9:10       ` Pierre Ossman
  0 siblings, 1 reply; 7+ messages in thread
From: Patrick Steinhardt @ 2025-02-21  8:16 UTC (permalink / raw)
  To: Pierre Ossman; +Cc: Han Young, git

On Thu, Feb 20, 2025 at 09:26:38AM +0100, Pierre Ossman wrote:
> On 20/02/2025 04:03, Han Young wrote:
> > On Wed, Feb 19, 2025 at 5:58 PM Pierre Ossman <ossman@cendio.se> wrote:
> > > We tried gc.bigPackThreshold in the hope it would force it to reuse
> > > packs better. But all we got instead was duplication. It still creates
> > > new packs with everything. It just stopped removing the old ones.
> > 
> > Is the repo partially cloned? git-repack will always pack promisor
> > packs even if it's a keep pack. This patch would fix it
> > https://lore.kernel.org/git/2728513.vuYhMxLoTh@mintaka.ncbr.muni.cz/
> > 
> 
> Yes, the big offender is often partially cloned. So that could be part of
> it, thanks.
> 
> But we're seeing it in other repositories as well. E.g. I have a long-lived
> TigerVNC repository where the biggest pack file is just one week old. In
> that case, it's merely 21 MiB, so it's not a practical issue. But it does
> show that git keeps replacing it.
> 
> Anything I/we can do to shed more light on the issue?

Well, one of the interesting things to learn would be how often you end
up updating those repositories. You have discovered "gc.autoPackLimit"
already, which determines when exactly Git is going to repack existing
packfiles into one, and mentioned that it doesn't seem to help you. But
whether it does or doesn't help really depends on how frequently you
gain new packfiles in the impacted repositories.

When you have fast-moving repositories and developers fetch several
times per day, then it is quite likely that they accumulate multiple new
packfiles per day. And thus, it's not all that unexpected that you will
have to repack the whole repository rather regularly. If so, this is
working as designed. You can tune the parameters for how often Git will
do an all-into-one repack, but also have to keep in mind that the more
packfiles there are, the less efficient Git will in general be.
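
To make the pack-count trigger concrete, here is a minimal Python sketch of
the check as described above. It is an illustration under assumptions, not
Git's implementation: the helper names are hypothetical, the default of 50 is
the documented default of gc.autoPackLimit, and the sketch ignores
gc.bigPackThreshold and packs borrowed from alternates.

```python
from pathlib import Path

# Documented default of gc.autoPackLimit; a value of 0 disables the check.
DEFAULT_AUTO_PACK_LIMIT = 50


def count_unkept_packs(pack_dir):
    """Count *.pack files in pack_dir that have no matching .keep file."""
    pack_dir = Path(pack_dir)
    return sum(
        1
        for pack in pack_dir.glob("*.pack")
        if not pack.with_suffix(".keep").exists()
    )


def exceeds_pack_limit(pack_dir, limit=DEFAULT_AUTO_PACK_LIMIT):
    """Return True if there are more unkept packs than the limit allows."""
    if limit <= 0:
        return False
    return count_unkept_packs(pack_dir) > limit
```

Running `exceeds_pack_limit(".git/objects/pack")` after each fetch for a few
days would show how quickly a given repository approaches the limit.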

That being said, there is an alternative: Git nowadays doesn't use
git-gc(1) anymore to perform auto-maintenance, but instead it invokes
git-maintenance(1). And that command allows the user to pick what tasks
should be performed. By default it does indeed use git-gc(1) under the hood,
but you can also ask it not to do so and instead use an alternative
mechanism to pack your objects.

The alternative would be the "incremental-repack" task. This task does
not use git-gc(1) with its incremental/all-into-one repack split, but it
instead uses git-multi-pack-index(1). git-maintenance(1) tweaks the
`--batch-size` parameter of `git multi-pack-index repack` so that it
typically doesn't have to repack the one large packfile, but combines at
least two smaller ones. I use a mechanism like that, which I've
configured as follows:

    [maintenance "commit-graph"]
        enabled = true
    [maintenance "gc"]
        enabled = false
    [maintenance "incremental-repack"]
        enabled = true
    [maintenance "loose-objects"]
        enabled = true
    [maintenance "pack-refs"]
        enabled = true

I think this strategy still isn't quite optimal, as nowadays we should
probably make use of `git repack --geometric` instead of manually
computing batch sizes. This would ensure that the packfiles present in
the repository form a geometric sequence regarding their size, so you
end up repacking the biggest packfile very infrequently. Such a task has
not been implemented yet, but it shouldn't be all that hard to do,
either.
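
To illustrate what a geometric sequence of packs means, here is a simplified
Python sketch. It is not Git's implementation: the function name is
hypothetical, the weights are plain numbers (as far as I know, Git weighs
packs by object count), and the factor of 2 is only an example value. The
idea is to find the smallest set of small packs to combine so that every
remaining pack is at least `factor` times as large as the next smaller one.

```python
def geometric_split(weights, factor=2):
    """Split pack weights into (to_repack, kept).

    After combining everything in to_repack into one pack, each pack in
    kept is at least `factor` times the weight of the one before it.
    """
    packs = sorted(weights)

    # Walk down from the largest pack until two neighbours break the
    # progression; everything below that point must be rolled up.
    i = len(packs) - 1
    while i > 0 and packs[i] >= factor * packs[i - 1]:
        i -= 1
    split = i + 1 if i > 0 else 0

    # The combined pack may now be too big relative to the next pack up,
    # so keep absorbing packs until the progression holds again.
    total = sum(packs[:split])
    while split < len(packs) and packs[split] < factor * total:
        total += packs[split]
        split += 1

    return packs[:split], packs[split:]
```

With weights `[1, 1, 2, 100]` only the three small packs are combined and the
large one is left alone, which is the property that matters for backups.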

Patrick


* Re: [External] git keeps recreating packs, exploding backup increments
  2025-02-21  8:16     ` Patrick Steinhardt
@ 2025-02-24  9:10       ` Pierre Ossman
  2025-05-09 10:27         ` Pierre Ossman
  0 siblings, 1 reply; 7+ messages in thread
From: Pierre Ossman @ 2025-02-24  9:10 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Han Young, git

On 21/02/2025 09:16, Patrick Steinhardt wrote:
>>
>> Anything I/we can do to shed more light on the issue?
> 
> Well, one of the interesting things to learn would be how often you end
> up updating those repositories. You have discovered "gc.autoPackLimit"
> already, which determines when exactly Git is going to repack existing
> packfiles into one, and mentioned that it doesn't seem to help you. But
> whether it does or doesn't help really depends on how frequently you
> gain new packfiles in the impacted repositories.
> 
> When you have fast-moving repositories and developers fetch several
> times per day, then it is quite likely that they accumulate multiple new
> packfiles per day. And thus, it's not all that unexpected that you will
> have to repack the whole repository rather regularly. If so, this is
> working as designed. You can tune the parameters for how often Git will
> do an all-into-one repack, but also have to keep in mind that the more
> packfiles there are, the less efficient Git will in general be.
> 

I don't think the most problematic repo should be moving that fast. But 
I might be wrong. We've reverted all settings to default, and we'll try 
to keep an eye on what happens to the pack files to gain more understanding.

> That being said, there is an alternative: Git nowadays doesn't use
> git-gc(1) anymore to perform auto-maintenance, but instead it invokes
> git-maintenance(1). And that command allows the user to pick what tasks
> should be performed. By default it does indeed use git-gc(1) under the hood,
> but you can also ask it not to do so and instead use an alternative
> mechanism to pack your objects.
> 

Thanks. This is definitely something we can try. We'll observe the 
system for now, to establish a new baseline. Then we'll try some of 
these settings and see how it affects things.

Regards,
-- 
Pierre Ossman


* Re: [External] git keeps recreating packs, exploding backup increments
  2025-02-24  9:10       ` Pierre Ossman
@ 2025-05-09 10:27         ` Pierre Ossman
  0 siblings, 0 replies; 7+ messages in thread
From: Pierre Ossman @ 2025-05-09 10:27 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Han Young, git

Following up on this old thread.

I think the entire thing might be a misdiagnosis on our part, and 
git's default behaviour is working just fine for us.

We initially started looking at this because the backups exploded in 
size. But git is complex enough that we had a hard time examining things 
retroactively. Instead, we started monitoring things going forward to 
find likely causes.

What we did was to keep an eye on how new the files were in 
.git/objects/pack. And we kept seeing that the large packs were brand 
spanking new. Hence, the attempts at reconfiguring git, and the thread here.

But it seems we didn't look closely enough. Although the timestamps 
suggest that the packs are constantly being modified, the names and 
contents actually stay the same.

I've been keeping a closer eye on a couple of active repositories, and 
we aren't actually seeing any excessive growth in size. But the largest 
pack files always have a very current modification time.

No idea what caused that initial spike in backup storage. We'll have to 
revisit that if it happens again.
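
The distinction between a pack being recreated and a pack merely getting a
new timestamp can be checked mechanically. The following Python sketch is one
way to do that, with hypothetical function names: it records name, size and
modification time for each pack and compares two snapshots. It assumes that a
pack keeping the same name and size still has the same contents, which seems
reasonable since pack names are derived from a checksum. One possible
explanation for the fresh timestamps, not confirmed in this thread, is that
Git updates a pack's modification time when it is asked to write an object
the pack already contains.

```python
from pathlib import Path


def snapshot_packs(pack_dir):
    """Map each pack file name to (size in bytes, mtime in nanoseconds)."""
    snapshot = {}
    for pack in Path(pack_dir).glob("*.pack"):
        st = pack.stat()
        snapshot[pack.name] = (st.st_size, st.st_mtime_ns)
    return snapshot


def compare_snapshots(old, new):
    """Classify packs as created, removed, or only touched (mtime changed)."""
    common = set(old) & set(new)
    return {
        "created": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "touched": sorted(
            name for name in common
            if old[name][0] == new[name][0] and old[name][1] != new[name][1]
        ),
    }
```

A backup tool that selects files by modification time would still copy the
"touched" packs again, whereas one that compares names or content would not.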

Regards,
-- 
Pierre Ossman


