From: Victoria Dye <vdye@github.com>
To: Taylor Blau <me@ttaylorr.com>,
Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] repack: respect --keep-pack with geometric repack
Date: Fri, 20 May 2022 10:27:21 -0700 [thread overview]
Message-ID: <bb19159b-6d0c-a683-a58e-95ebdc128627@github.com> (raw)
In-Reply-To: <YofJLv8+x5J7yPmf@nand.local>
Taylor Blau wrote:
> Hi Victoria,
>
> On Fri, May 20, 2022 at 04:36:12PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Update 'repack' to ignore packs named on the command line with the
>> '--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat
>> command line-kept packs the same way it treats packs with an on-disk '.keep'
>> file (that is, skip the pack and do not include it in the 'geometry'
>> structure).
>>
>> Without this handling, a '--keep-pack' pack would be included in the
>> 'geometry' structure. If the pack is *before* the geometry split line (with
>> at least one other pack and/or loose objects present), 'repack' assumes the
>> pack's contents are "rolled up" into another pack via 'pack-objects'.
>> However, because the internally-invoked 'pack-objects' properly excludes
>> '--keep-pack' objects, any new pack it creates will not contain the kept
>> objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since
>> it assumes 'pack-objects' created a new pack with its contents), resulting
>> in possible object loss and repository corruption.
>
> Nicely found and explained. Having discussed this fix with you already
> off-list, this approach (to treat kept packs as excluded from the list
> of packs in the `geometry` structure regardless of whether they are kept
> on disk or in-core) makes sense to me.
>
> I left a couple of small notes on the patch below, but since I have some
> patches that deal with a separate issue in the `git repack --geometric`
> code coming, do you want to combine forces (and I can send a
> lightly-reworked version of this patch as a part of my series)?
>
Works for me! I'm happy with all the suggested changes you noted below
(moving the 'string_list_sort' and cleaning up the test), so feel free to
include them in your series.
Thanks!
>> @@ -332,17 +332,34 @@ static int geometry_cmp(const void *va, const void *vb)
>> return 0;
>> }
>>
>> -static void init_pack_geometry(struct pack_geometry **geometry_p)
>> +static void init_pack_geometry(struct pack_geometry **geometry_p,
>> + struct string_list *existing_kept_packs)
>> {
>> struct packed_git *p;
>> struct pack_geometry *geometry;
>> + struct strbuf buf = STRBUF_INIT;
>>
>> *geometry_p = xcalloc(1, sizeof(struct pack_geometry));
>> geometry = *geometry_p;
>>
>> + string_list_sort(existing_kept_packs);
>
> Would it be worth sorting this as early as in collect_pack_filenames()?
> For our purposes in this patch, this works as-is, but it may be
> defensive to try and minimize the time that list has unsorted contents.
>
I went back and forth on this, eventually settling on this to keep the
'string_list_sort' as close as possible to where the sorted list is needed.
I'm still pretty indifferent, though, so moving it to the end of
'collect_pack_filenames()' is fine with me.
>> for (p = get_all_packs(the_repository); p; p = p->next) {
>> - if (!pack_kept_objects && p->pack_keep)
>> - continue;
>> + if (!pack_kept_objects) {
>> + if (p->pack_keep)
>> + continue;
>
> (You mentioned this to me off-list, but I'll repeat it here since it
> wasn't obvious to me on first read): this check for `p->pack_keep` isn't
> strictly necessary, since any packs that have their `pack_keep` bit set
> will appear in the `existing_kept_packs` list.
>
> But it does give us a fast path to avoid having to check that list, so
> it's worth checking that bit to avoid a slightly more expensive check
> where possible.
>
>> + /*
>> + * The pack may be kept via the --keep-pack option;
>> + * check 'existing_kept_packs' to determine whether to
>> + * ignore it.
>> + */
>> + strbuf_reset(&buf);
>> + strbuf_addstr(&buf, pack_basename(p));
>> + strbuf_strip_suffix(&buf, ".pack");
>> +
>> + if (string_list_has_string(existing_kept_packs, buf.buf))
>> + continue;
>
> It's too bad that we have to do this check at all, and can't rely on the
> `pack_keep_in_core` in the same way as we check `p->pack_keep`. But
> lifting that restriction is a more invasive change, so I'm happy to
> rely on the contents of existing_kept_packs here in the meantime.
>
>> + }
>>
>> ALLOC_GROW(geometry->pack,
>> geometry->pack_nr + 1,
>> @@ -353,6 +370,7 @@ static void init_pack_geometry(struct pack_geometry **geometry_p)
>> }
>>
>> QSORT(geometry->pack, geometry->pack_nr, geometry_cmp);
>> + strbuf_release(&buf);
>> }
>>
>> static void split_pack_geometry(struct pack_geometry *geometry, int factor)
>> @@ -714,17 +732,20 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>> strbuf_release(&path);
>> }
>>
>> + packdir = mkpathdup("%s/pack", get_object_directory());
>> + packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
>> + packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
>> +
>> + collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
>> + &keep_pack_list);
>> +
>
> Makes sense; we have to initialize existing_kept_packs before arranging
> the list of packs for the `--geometric` split. And presumably
> `collect_pack_filenames()` relies on `packdir`, `packtmp_name`, and
> `packtmp` being setup ahead of time, too.
>
>> if (geometric_factor) {
>> if (pack_everything)
>> die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
>> - init_pack_geometry(&geometry);
>> + init_pack_geometry(&geometry, &existing_kept_packs);
>> split_pack_geometry(geometry, geometric_factor);
>> }
>>
>> - packdir = mkpathdup("%s/pack", get_object_directory());
>> - packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
>> - packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
>> -
>> sigchain_push_common(remove_pack_on_signal);
>>
>> prepare_pack_objects(&cmd, &po_args);
>> @@ -764,9 +785,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>> if (use_delta_islands)
>> strvec_push(&cmd.args, "--delta-islands");
>>
>> - collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
>> - &keep_pack_list);
>> -
>> if (pack_everything & ALL_INTO_ONE) {
>> repack_promisor_objects(&po_args, &names);
>>
>> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
>> index bdbbcbf1eca..f5ac23413d5 100755
>> --- a/t/t7703-repack-geometric.sh
>> +++ b/t/t7703-repack-geometric.sh
>> @@ -180,6 +180,40 @@ test_expect_success '--geometric ignores kept packs' '
>> )
>> '
>>
>> +test_expect_success '--geometric ignores --keep-pack packs' '
>> + git init geometric &&
>> + test_when_finished "rm -fr geometric" &&
>> + (
>> + cd geometric &&
>> +
>> + # Create two equal-sized packs
>> + test_commit kept && # 3 objects
>> + test_commit pack && # 3 objects
>> +
>> + KEPT=$(git pack-objects --revs $objdir/pack/pack <<-EOF
>> + refs/tags/kept
>> + EOF
>> + ) &&
>> + PACK=$(git pack-objects --revs $objdir/pack/pack <<-EOF
>> + refs/tags/pack
>> + ^refs/tags/kept
>> + EOF
>> + ) &&
>
> Nit; we don't care about the name of $PACK, so it would probably be fine
> to avoid storing the `PACK` variable. We could write these packs with
> just `git repack -d` after each `test_commit` (which would avoid us
> having to call `prune-packed`).
>
Makes sense.
> Does it matter which one is kept? I don't think so, since AFAICT the
> critical bit is that we mark one of the packs being rolled up as a
> `--keep-pack`.
>
Correct, the two packs in this test are just two same-sized (or, more
generally, non-geometrically progressing) packs with non-overlapping
content.
>> + # Prune loose objects that are now packed into PACK and KEEP
>> + git prune-packed &&
>> +
>> + git repack --geometric 2 -dm --keep-pack=pack-$KEPT.pack >out &&
>> +
>> + # Packs should not have changed (only one non-kept pack, no
>> + # loose objects), but midx should now exist.
>> + test_i18ngrep "Nothing new to pack" out &&
>
> Nit; test_i18ngrep here should just be "grep".
>
Thanks for pointing this out - I've been a bit unsure of the difference for
a while, but this pushed me to figure out the difference and I found the
note in 'test-lib-functions.sh' clarifying that 'test_i18ngrep' is
deprecated.
>> + test_path_is_file $midx &&
>> + test_path_is_file $objdir/pack/pack-$KEPT.pack &&
>> + git fsck
>> + )
>
>
> Thanks,
> Taylor
next prev parent reply other threads:[~2022-05-20 17:27 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-20 16:36 [PATCH] repack: respect --keep-pack with geometric repack Victoria Dye via GitGitGadget
2022-05-20 17:00 ` Taylor Blau
2022-05-20 17:27 ` Victoria Dye [this message]
2022-05-20 19:05 ` Taylor Blau
2022-05-20 20:41 ` Junio C Hamano
2022-05-20 22:12 ` Junio C Hamano
2022-05-20 22:20 ` Taylor Blau
2022-05-20 23:26 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bb19159b-6d0c-a683-a58e-95ebdc128627@github.com \
--to=vdye@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.