From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH] builtin/repack.c: avoid making cruft packs preferred
Date: Tue, 3 Oct 2023 17:52:29 -0400 [thread overview]
Message-ID: <ZRyNHRf+tQwV+T6P@nand.local> (raw)
In-Reply-To: <20231003201617.GE1562@coredump.intra.peff.net>
On Tue, Oct 03, 2023 at 04:16:17PM -0400, Jeff King wrote:
> On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote:
>
> > Note that this behavior is usually just a performance regression. But
> > it's possible it could be a correctness issue.
> >
> > Suppose an object was duplicated among the cruft and non-cruft pack. The
> > MIDX will pick the one from the pack with the lowest mtime, which will
> > always be the cruft one. But if the non-cruft pack happens to sort
> > earlier in lexical order, we'll treat that one as preferred, but not all
> > duplicates will be resolved in favor of that pack.
> >
> > So if we happened to have an object which appears in both packs
> > (e.g., due to a cruft object being freshened, causing it to appear
> > loose, and then repacking it via the `--geometric` repack) it's possible
> > the duplicate would be picked from the non-preferred pack.
>
> I'm not sure I understand how that is a correctness issue. The contents
> of the object are the same in either pack. Or do you mean that the
> pack-reuse code in pack-objects.c may get confused and try to use the
> wrong pack/offset when sending the object out? I would think it would
> always be coming from the preferred pack for that code (so the outcome
> is just that we fail to do the pack-reuse optimization very well, but we
> don't generate a wrong answer).
Admittedly I'm not 100% sure of my interpretation here either, since I
wrote most of this patch's log message nearly two years ago ;-).
I think the issue was as follows:
- you have an object duplicated among some cruft and non-cruft pack
- the cruft pack happens to have an older mtime than the non-cruft one
- the repack reused no non-cruft packs, which (before this patch)
would let the MIDX avoid picking a preferred pack
- if the non-cruft pack happens to sort ahead of the cruft one in
lexical order, it'll appear in the first few bits of the bitmap's
ordering, causing us to treat it as if it were the preferred pack
- ...but the MIDX will resolve duplicate objects in favor of the
oldest pack when neither are preferred, causing us to pick a
duplicate from the non-"preferred" pack.
(The last point is what causes the pack-bitmap reading code to get
confused, since it expects and makes assumption based on resolving ties
in favor of the preferred pack.)
That feels right to me, but admittedly my memory is pretty hazy on those
bitmap bugs. However, I don't think that this is an issue anymore, since
this patch was written before we had 177c0d6e63 (midx: infer preferred
pack when not given one, 2021-08-31) in GitHub's private fork, and the
patch message here incorrectly refers to statements about GitHub's fork,
not git.git.
But since we have 177c0d6e63 in git.git, this paragraph can and should
be dropped as finding ourselves in that situation is impossible, since
we would infer a preferred pack when not given one, and resolve
duplicates in favor of it.
That makes this purely a performance issue.
> Other than that, the explanation and patch make perfect sense to me, and
> the patch looks good.
Thanks!
> -Peff
Thanks,
Taylor
next prev parent reply other threads:[~2023-10-03 21:52 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-03 16:27 [PATCH] builtin/repack.c: avoid making cruft packs preferred Taylor Blau
2023-10-03 16:30 ` Taylor Blau
2023-10-03 20:20 ` Junio C Hamano
2023-10-03 21:39 ` Taylor Blau
2023-10-03 20:16 ` Jeff King
2023-10-03 21:52 ` Taylor Blau [this message]
2023-10-03 21:54 ` [PATCH v2] " Taylor Blau
2023-10-04 13:13 ` Jeff King
2023-10-05 11:14 ` Patrick Steinhardt
2023-10-05 20:27 ` Eric Sunshine
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZRyNHRf+tQwV+T6P@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).