git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
Date: Fri, 19 Jul 2024 09:21:51 -0400	[thread overview]
Message-ID: <118b164e-67c5-4dfc-b440-62b8986bf356@gmail.com> (raw)
In-Reply-To: <ZpmcK23coi5Qqm7E@nand.local>

On 7/18/24 6:50 PM, Taylor Blau wrote:
> On Thu, Jul 18, 2024 at 07:55:44PM +0000, Derrick Stolee via GitGitGadget wrote:
>> The issue is that 'git multi-pack-index repack' was taught to call 'git
>> pack-objects' with the new '--stdin-packs' option. However, this changes the
>> object selection algorithm. Instead of using the objects referenced by the
>> multi-pack-index, it compares pack-files using a list of "included" and
>> "excluded" pack-files. This loses some granularity of how the
>> multi-pack-index chooses among duplicate objects.
> 
> Thank you for looking at this so carefully! Let me double check my own
> understanding.
> 
> Suppose we have a MIDX with some pack 'P' and say, some commit object
> 'C' which appears in that pack. Let's also suppose we have another pack
> 'Q' in the same MIDX which also contains 'C', but the MIDX selected its
> copy from pack 'P'.
> 
> If we want to combine 'P' with some other packs (excluding 'Q'), then
> the input to --stdin-packs will look something like:
> 
>      P.pack
>      ^Q.pack
>      ...
> 
> And the resulting pack would not contain 'C', since we would reject it
> via: add_object_entry_from_pack() -> want_object_in_pack() ->
> want_found_object() -> has_object_kept_pack(). The final function there
> would find a copy of 'C' in 'Q', and since 'Q' is excluded, we would
> reject 'C' as unwanted.
> 
> So the resulting pack would not contain 'C', and the MIDX would hold
> onto its copy from pack 'P', resulting in 'P' being both (a) in the set
> of packs to repack together, but also (b) non-expireable, since it has
> at least one object selected from it in the MIDX.
> 
>> The end result is that some objects that would normally have been included
>> in the new pack-file are no longer included. The copy that the
>> multi-pack-index references is in the pack-file that was intended to be
>> repacked, so that pack-file cannot be expired in the next 'git
>> multi-pack-index expire' step and is included again in the batch of objects
>> to repack.
> 
> I think this matches my own understanding, but let me know if I'm
> missing something. Assuming I'm thinking about this the same way you
> are, the fix (stop using --stdin-packss) makes sense to me.

Your interpretation matches mine. Thanks for the careful read.

I think we can accomplish similar goals that match the reasoning for
--stdin-packs (better deltas while also limiting the object walk to the 
repacked objects) with some changes to read_object_list_from_stdin(),
but that's a more subtle kind of change.

Taking this change as-is will cause a regression in the quality of
delta choices, but recovers our ability to expire "repacked" pack-files
in all cases.

Thanks,
-Stolee

  reply	other threads:[~2024-07-19 13:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-18 19:55 [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Derrick Stolee via GitGitGadget
2024-07-18 19:55 ` [PATCH 1/2] t5319: add failing test case for repack/expire Derrick Stolee via GitGitGadget
2024-07-18 19:55 ` [PATCH 2/2] midx-write: revert use of --stdin-packs Derrick Stolee via GitGitGadget
2024-07-18 21:57 ` [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Junio C Hamano
2024-07-18 22:38   ` Taylor Blau
2024-07-19 13:23     ` Derrick Stolee
2024-07-19 13:24   ` Derrick Stolee
2024-07-19 15:13     ` Junio C Hamano
2024-07-19 16:20       ` Derrick Stolee
2024-07-18 22:50 ` Taylor Blau
2024-07-19 13:21   ` Derrick Stolee [this message]
2024-07-19 13:38     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=118b164e-67c5-4dfc-b440-62b8986bf356@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).