From: Taylor Blau <me@ttaylorr.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Jeff King <peff@peff.net>, ZheNing Hu <adlternative@gmail.com>,
Git List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>,
Christian Couder <christian.couder@gmail.com>,
johncai86@gmail.com
Subject: Re: Question: How to execute git-gc correctly on the git server
Date: Wed, 14 Dec 2022 15:11:57 -0500
Message-ID: <Y5ouDcvjCaRlCGJf@nand.local>
In-Reply-To: <221208.86o7se6ou1.gmgdl@evledraar.gmail.com>

On Thu, Dec 08, 2022 at 01:35:04PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> The "cruft pack" facility does many different things, and my
> >> understanding of it is that GitHub's not using it only as an end-run
> >> around potential corruption issues, but that some not yet in tree
> >> patches on top of it allow more aggressive "gc" without the fear of
> >> corruption.
> >
> > I don't think cruft packs themselves help against corruption that much.
> > For many years, GitHub used "repack -k" to just never expire objects.
> > What cruft packs help with is:
> >
> > 1. They keep cruft objects out of the main pack, which reduces the
> > costs of lookups and bitmaps for the main pack.

Peff isn't wrong here, but there is a big caveat: this is only true
when using a single-pack bitmap. Single-pack bitmaps are guaranteed to
have reachability closure over their objects, but writing a MIDX bitmap
after generating the MIDX does not afford us the same guarantee.

So if you have a cruft pack which contains some object X that was
unreachable when the cruft pack was written, and X is later made
reachable by some other object that *is* reachable from some reference,
*and* that other object is included in one of the MIDX's packs, then we
won't have reachability closure unless we bitmap the cruft pack, too.

So even though keeping cruft objects out of the main pack helps a lot
with bitmapping in the single-pack case, in practice it doesn't make a
significant difference with multi-pack bitmaps.
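
To make the closure problem concrete, here is a toy sketch. It is not
how Git implements bitmaps; the object names, the graph, and the pack
layout are all made up for illustration. It only checks whether a set of
objects contains everything its members reference directly, which is
what reachability closure requires.

```python
# Toy model: each object maps to the objects it references directly.
# All names here are hypothetical; this is not Git's implementation.
GRAPH = {
    "C": ["T"],  # a commit reachable from some ref, pointing at tree T
    "T": ["X"],  # T references X
    "X": [],     # X lives only in the cruft pack
}

MIDX_PACKS = {"C", "T"}  # objects covered by the MIDX's packs
CRUFT_PACK = {"X"}       # objects stored in the cruft pack


def missing_from_closure(covered, graph):
    """Return objects referenced from `covered` but not contained in it."""
    missing = set()
    for oid in covered:
        for child in graph.get(oid, []):
            if child not in covered:
                missing.add(child)
    return missing


def has_closure(covered, graph):
    """A set is closed if no member references an object outside it."""
    return not missing_from_closure(covered, graph)


if __name__ == "__main__":
    print(has_closure(MIDX_PACKS, GRAPH))               # False
    print(has_closure(MIDX_PACKS | CRUFT_PACK, GRAPH))  # True
```

In this model the MIDX's packs alone are not closed because T points at
X, and closure only holds once the cruft pack is covered as well.
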
> > 2. When you _do_ choose to expire, you can do so without worrying
> > about accidentally exploding all of those old objects into loose
> > ones (which is not wrong from a correctness point of view, but can
> > have some amazingly bad performance characteristics).
> >
> > I think the bits you're thinking of on top are in v2.39. The "repack
> > --expire-to" option lets you write objects that _would_ be deleted into
> > a cruft pack, which can serve as a backup (but managing that is out of
> > scope for repack itself, so you have to roll your own strategy there).
>
> Yes, that's what I was referring to.

Yes, we use the `--expire-to` option when doing a pruning GC to move the
expired objects out of the repo to some "../backup.git" location. The
out-of-tree tooling that Ævar is speculating about basically runs
`cat-file --batch` in the backup repo, feeds it the list of missing
objects, and then writes those objects (back) into the GC'd repository.
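
A rough sketch of that recovery step, for illustration only. It assumes
the backup location can be read as a Git repository with `git -C`, and
that the list of missing object IDs has already been computed somehow
(for example from `git fsck` output). `parse_batch_output` and
`restore_missing` are hypothetical names, not the actual tooling.

```python
import subprocess


def parse_batch_output(data):
    """Parse `git cat-file --batch` output into (oid, type, content) tuples.

    Objects reported as "<oid> missing" are yielded with type None.
    """
    pos = 0
    while pos < len(data):
        eol = data.index(b"\n", pos)
        header = data[pos:eol].decode().split(" ")
        pos = eol + 1
        if header[-1] == "missing":
            yield header[0], None, b""
            continue
        oid, obj_type, size = header[0], header[1], int(header[2])
        yield oid, obj_type, data[pos:pos + size]
        pos += size + 1  # skip the content and its trailing newline


def restore_missing(backup_repo, repo, oids):
    """Copy the given objects from backup_repo into repo (hypothetical helper)."""
    out = subprocess.run(
        ["git", "-C", backup_repo, "cat-file", "--batch"],
        input="".join(oid + "\n" for oid in oids).encode(),
        stdout=subprocess.PIPE, check=True,
    ).stdout
    restored = []
    for oid, obj_type, content in parse_batch_output(out):
        if obj_type is None:
            continue  # the backup does not have this object either
        subprocess.run(
            ["git", "-C", repo, "hash-object", "-w", "-t", obj_type, "--stdin"],
            input=content, stdout=subprocess.DEVNULL, check=True,
        )
        restored.append(oid)
    return restored
```

Something like `restore_missing("../backup.git", ".", missing_oids)`
would then write back whatever the backup still has, where
`missing_oids` is a placeholder for the list gathered beforehand.
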
> I think I had feedback on that series saying that if held correctly this
> would also nicely solve that long-time race. Maybe I'm just
> misremembering, but I (mis?)recalled that Taylor indicated that it was
> being used like that at GitHub.

It (the above) doesn't solve the race, but it does make it easier to
recover from a corrupt repository when we lose that race.

Thanks,
Taylor