git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com,
	Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
Date: Thu, 18 Jul 2024 18:50:19 -0400	[thread overview]
Message-ID: <ZpmcK23coi5Qqm7E@nand.local> (raw)
In-Reply-To: <pull.1764.git.1721332546.gitgitgadget@gmail.com>

On Thu, Jul 18, 2024 at 07:55:44PM +0000, Derrick Stolee via GitGitGadget wrote:
> The issue is that 'git multi-pack-index repack' was taught to call 'git
> pack-objects' with the new '--stdin-packs' option. However, this changes the
> object selection algorithm. Instead of using the objects referenced by the
> multi-pack-index, it compares pack-files using a list of "included" and
> "excluded" pack-files. This loses some granularity of how the
> multi-pack-index chooses among duplicate objects.

Thank you for looking at this so carefully! Let me double check my own
understanding.

Suppose we have a MIDX with some pack 'P' and say, some commit object
'C' which appears in that pack. Let's also suppose we have another pack
'Q' in the same MIDX which also contains 'C', but the MIDX selected its
copy from pack 'P'.

If we want to combine 'P' with some other packs (excluding 'Q'), then
the input to --stdin-packs will look something like:

    P.pack
    ^Q.pack
    ...

And the resulting pack would not contain 'C', since we would reject it
via: add_object_entry_from_pack() -> want_object_in_pack() ->
want_found_object() -> has_object_kept_pack(). The final function there
would find a copy of 'C' in 'Q', and since 'Q' is excluded, we would
reject 'C' as unwanted.

So the resulting pack would not contain 'C', and the MIDX would hold
onto its copy from pack 'P', resulting in 'P' being both (a) in the set
of packs to repack together, but also (b) non-expireable, since it has
at least one object selected from it in the MIDX.

> The end result is that some objects that would normally have been included
> in the new pack-file are no longer included. The copy that the
> multi-pack-index references is in the pack-file that was intended to be
> repacked, so that pack-file cannot be expired in the next 'git
> multi-pack-index expire' step and is included again in the batch of objects
> to repack.

I think this matches my own understanding, but let me know if I'm
missing something. Assuming I'm thinking about this the same way you
are, the fix (stop using --stdin-packss) makes sense to me.

Thanks,
Taylor

  parent reply	other threads:[~2024-07-18 22:50 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-18 19:55 [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Derrick Stolee via GitGitGadget
2024-07-18 19:55 ` [PATCH 1/2] t5319: add failing test case for repack/expire Derrick Stolee via GitGitGadget
2024-07-18 19:55 ` [PATCH 2/2] midx-write: revert use of --stdin-packs Derrick Stolee via GitGitGadget
2024-07-18 21:57 ` [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Junio C Hamano
2024-07-18 22:38   ` Taylor Blau
2024-07-19 13:23     ` Derrick Stolee
2024-07-19 13:24   ` Derrick Stolee
2024-07-19 15:13     ` Junio C Hamano
2024-07-19 16:20       ` Derrick Stolee
2024-07-18 22:50 ` Taylor Blau [this message]
2024-07-19 13:21   ` Derrick Stolee
2024-07-19 13:38     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZpmcK23coi5Qqm7E@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).