From: Patrick Steinhardt <ps@pks.im>
To: Pierre Ossman <ossman@cendio.se>
Cc: Han Young <hanyang.tony@bytedance.com>, git@vger.kernel.org
Subject: Re: [External] git keeps recreating packs, exploding backup increments
Date: Fri, 21 Feb 2025 09:16:40 +0100
Message-ID: <Z7g2aEpEboL5mvRa@pks.im>
In-Reply-To: <ba212d4e-32c5-472a-8604-2a2653bde17c@cendio.se>

On Thu, Feb 20, 2025 at 09:26:38AM +0100, Pierre Ossman wrote:
> On 20/02/2025 04:03, Han Young wrote:
> > On Wed, Feb 19, 2025 at 5:58 PM Pierre Ossman <ossman@cendio.se> wrote:
> > > We tried gc.bigPackThreshold in the hope it would force it to reuse
> > > packs better. But all we got instead was duplication. It still creates
> > > new packs with everything. It just stopped removing the old ones.
> >
> > Is the repo partially cloned? git-repack will always pack promisor
> > packs even if it's a keep pack. This patch would fix it
> > https://lore.kernel.org/git/2728513.vuYhMxLoTh@mintaka.ncbr.muni.cz/
> >
>
> Yes, the big offender is often partially cloned. So that could be part of
> it, thanks.
>
> But we're seeing it in other repositories as well. E.g. I have a long-lived
> TigerVNC repository where the biggest pack file is just one week old. In
> that case, it's merely 21 MiB, so it's not a practical issue. But it does
> show that git keeps replacing it.
>
> Anything I/we can do to shed more light on the issue?

Well, one of the interesting things to learn would be how often you end
up updating those repositories. You have discovered "gc.autoPackLimit"
already, which determines when exactly Git is going to repack existing
packfiles into one, and mentioned that it doesn't seem to help you. But
whether it does or doesn't help really depends on how frequently you
gain new packfiles in the impacted repositories.

When you have fast-moving repositories and developers fetch several
times per day, then it is quite likely that they accumulate multiple new
packfiles per day. And thus, it's not all that unexpected that you will
have to repack the whole repository rather regularly. If so, this is
working as designed. You can tune the parameters for how often Git will
do an all-into-one repack, but also have to keep in mind that the more
packfiles there are, the less efficient Git will in general be.
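
To get a feel for how quickly packfiles accumulate, something like the following could be run periodically and its output logged. This is only a rough sketch: the helper name `list_packs` is made up for illustration, and it assumes a non-bare layout where packs live in `<git-dir>/objects/pack`. Comparing the number of non-kept packs against "gc.autoPackLimit" (50 by default, as far as I know) should indicate how soon the next all-into-one repack is due.

```python
from datetime import datetime, timezone
from pathlib import Path


def list_packs(git_dir):
    """Return (name, size_bytes, mtime, kept) for each packfile, oldest first.

    `kept` is True when a matching .keep file exists next to the pack.
    """
    pack_dir = Path(git_dir) / "objects" / "pack"
    packs = []
    for pack in pack_dir.glob("*.pack"):
        stat = pack.stat()
        packs.append((
            pack.name,
            stat.st_size,
            datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            pack.with_suffix(".keep").exists(),
        ))
    # Sort by modification time so the newest packs show up last.
    return sorted(packs, key=lambda entry: entry[2])
```

Calling `list_packs(".git")` once a day and diffing the results would show how many new packs arrive per day and how old the largest pack is when it gets replaced.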

That being said, there is an alternative: Git nowadays doesn't use
git-gc(1) anymore to perform auto-maintenance, but instead it invokes
git-maintenance(1). And that command allows the user to pick what tasks
should be performed. By default it does indeed use git-gc(1) under the
hood, but you can also ask it not to do so and instead use an
alternative mechanism to pack your objects.

The alternative would be the "incremental-repack" task. This task does
not use git-gc(1) with its incremental/all-into-one repack split, but it
instead uses git-multi-pack-index(1). git-maintenance(1) tweaks the
`--batch-size` parameter of `git multi-pack-index repack` so that it
typically doesn't have to repack the one large packfile, but combines at
least two smaller ones. I use a mechanism like that, which I've
configured as follows:

    [maintenance "commit-graph"]
        enabled = true
    [maintenance "gc"]
        enabled = false
    [maintenance "incremental-repack"]
        enabled = true
    [maintenance "loose-objects"]
        enabled = true
    [maintenance "pack-refs"]
        enabled = true
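
To illustrate why the "incremental-repack" task usually leaves the one large packfile alone, here is a simplified model of the batch-size selection. It is a sketch based on my reading of the documented behaviour, not a copy of Git's code: the function names are invented, plain on-disk sizes stand in for the "expected size" that git-multi-pack-index(1) computes, and details such as any upper cap on the batch size are ignored.

```python
def auto_batch_size(pack_sizes):
    """Pick a batch size one larger than the second-largest pack.

    With fewer than two packs there is no second-largest pack, so the
    result degenerates to 1 and nothing can be selected.
    """
    if len(pack_sizes) < 2:
        return 1
    return sorted(pack_sizes)[-2] + 1


def select_for_repack(sizes_oldest_first, batch_size):
    """Model which packs `multi-pack-index repack` would combine.

    Packs are visited oldest to newest. A pack is skipped when it is not
    smaller than the batch size; selection stops once the selected packs
    add up to at least the batch size. Fewer than two selected packs
    means there is nothing to do.
    """
    selected, total = [], 0
    for size in sizes_oldest_first:
        if total >= batch_size:
            break
        if size >= batch_size:
            continue  # too big for this batch, e.g. the pack from the clone
        selected.append(size)
        total += size
    return selected if len(selected) >= 2 else []
```

With packs of 900, 5, 7 and 3 units (oldest first), `auto_batch_size` gives 8 and `select_for_repack` picks the packs of size 5 and 7, leaving the 900-unit pack untouched.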

I think this strategy still isn't quite optimal, as nowadays we should
probably make use of `git repack --geometric` instead of manually
computing batch sizes. This would ensure that the packfiles present in
the repository form a geometric sequence regarding their size, so you
end up repacking the biggest packfile very infrequently. Such a task has
not been implemented yet, but it shouldn't be all that hard to do,
either.
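
For illustration, the idea behind a geometric sequence of packfiles can be modelled roughly like this. Again this is a simplified sketch and not Git's implementation: `packs_to_combine` is a hypothetical name, and the weights here are arbitrary numbers, whereas `git repack --geometric` measures packs by their object count rather than by bytes.

```python
def packs_to_combine(weights, factor=2):
    """Return the pack weights that would be rolled up into one new pack.

    After the roll-up, every remaining pack should be at least `factor`
    times as heavy as the next smaller one.
    """
    packs = sorted(weights)

    # Walk from the heaviest pack downwards and find the first adjacent
    # pair that breaks the progression. Everything up to and including
    # the heavier pack of that pair has to be rolled up.
    split = 0
    for i in range(len(packs) - 1, 0, -1):
        if packs[i] < factor * packs[i - 1]:
            split = i + 1
            break

    # The combined pack may in turn be too heavy relative to the next
    # pack, so keep absorbing packs until the progression holds again.
    total = sum(packs[:split])
    while split < len(packs) and packs[split] < factor * total:
        total += packs[split]
        split += 1

    return packs[:split]
```

For weights 3, 4, 100 and 1000 with a factor of 2, only the packs of weight 3 and 4 are combined. The heaviest pack is rewritten only once the smaller packs together have grown to more than half its weight, which is why it ends up being repacked so rarely.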

Patrick