All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Christian Couder <chriscool@tuxfamily.org>
Cc: Han Young <hanyang.tony@bytedance.com>,
	trnka@scm.com,  git@vger.kernel.org, jonathantanmy@google.com
Subject: Re: [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects
Date: Thu, 08 May 2025 08:36:23 -0700	[thread overview]
Message-ID: <xmqqwmar6rns.fsf@gitster.g> (raw)
In-Reply-To: <20250508104437.51513-1-hanyang.tony@bytedance.com> (Han Young's message of "Thu, 8 May 2025 18:44:37 +0800")

Han Young <hanyang.tony@bytedance.com> writes:

> From: Tomáš Trnka <trnka@scm.com>
>
> git-repack currently does not pass --keep-pack or --honor-pack-keep to
> the git-pack-objects handling promisor packs. This means that settings
> like gc.bigPackThreshold are completely ignored for promisor packs.
>
> The simple fix is to just skip keep packs when iterating promisor packs.
>
> Signed-off-by: Han Young <hanyang.tony@bytedance.com>
> Original-patch-by: Tomáš Trnka <trnka@scm.com>
> ---
> The original patch passes the --keep-pack arguments to pack-objects. While
> this approach avoids repacking the keep packs, the objects in the keep
> packs are still enumerated. These objects are iterated in repack and sent
> to be packed, only to be skipped in pack-objects.
>
> Filtering out the keep packs in repack_promisor_objects is more efficient,
> the objects in keep packs are not enumerated and sent to pack-objects.

Christian, I thought you have also been active around promisor area
these days, so let us pick your brain for this one, too.

Thanks.

>
> In a test repo with a 2.7G promisor pack. The original patch took longer to
> repack the repo, most of the time was spent on enumerating objects.
>
> The original patch:
>
> $ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
> Enumerating objects: 19091575, done.
> Counting objects: 100% (5934/5934), done.
> Delta compression using up to 12 threads
> Compressing objects: 100% (4517/4517), done.
> Writing objects: 100% (5934/5934), done.
> Total 5934 (delta 2885), reused 2986 (delta 1192), pack-reused 0 (from 0)
> git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  43.94s user 46.83s system 134% cpu 1:07.60 total
>
> This patch:
>
> $ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
> Enumerating objects: 20952, done.
> Counting objects: 100% (18323/18323), done.
> Delta compression using up to 12 threads
> Compressing objects: 100% (9022/9022), done.
> Writing objects: 100% (18323/18323), done.
> Total 18323 (delta 11340), reused 15753 (delta 8771), pack-reused 0 (from 0)
> git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  8.65s user 3.14s system 62% cpu 18.838 total
>
> I also noticed that this patch produced a smaller packfile.
> 1.6M compared to 4.5M generated by the original patch.
> I suspect this is because some objects exist in multiple packs,
> so pack-objects still repacks them even if they are actually present
> in keep-packs.
> ---
>  builtin/repack.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 59214dbdfd..fb09df53c0 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -252,8 +252,9 @@ static void existing_packs_release(struct existing_packs *existing)
>  /*
>   * Adds all packs hex strings (pack-$HASH) to either packs->non_kept
>   * or packs->kept based on whether each pack has a corresponding
> - * .keep file or not.  Packs without a .keep file are not to be kept
> - * if we are going to pack everything into one file.
> + * .keep file or not.  Packs without a .keep file or specified by
> + * --keep-pack are not to be kept if we are going to pack everything
> + * into one file.
>   */
>  static void collect_pack_filenames(struct existing_packs *existing,
>  				   const struct string_list *extra_keep)
> @@ -278,8 +279,15 @@ static void collect_pack_filenames(struct existing_packs *existing,
>  		strbuf_addstr(&buf, base);
>  		strbuf_strip_suffix(&buf, ".pack");
>  
> -		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
> +		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep) {
>  			string_list_append(&existing->kept_packs, buf.buf);
> +			/*
> +			 * mark packs specified by --keep-pack as keep in core, keep packs
> +			 * are ignored by repack_promisor_objects. This avoids passthrough
> +			 * --keep-pack args to pack-objects
> +			 */
> +			p->pack_keep_in_core = 1;
> +		}
>  		else if (p->is_cruft)
>  			string_list_append(&existing->cruft_packs, buf.buf);
>  		else
> @@ -412,7 +420,9 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
>  	 * of a {type -> size} ordering, which may produce better deltas.
>  	 */
>  	for_each_packed_object(the_repository, write_oid, &cmd,
> -			       FOR_EACH_OBJECT_PROMISOR_ONLY);
> +			       FOR_EACH_OBJECT_PROMISOR_ONLY |
> +				   FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS |
> +				   FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS);
>  
>  	if (cmd.in == -1) {
>  		/* No packed objects; cmd was never started */

  reply	other threads:[~2025-05-08 15:36 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-08 10:44 [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects Han Young
2025-05-08 15:36 ` Junio C Hamano [this message]
2025-05-21  8:31 ` Han Young

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqwmar6rns.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=hanyang.tony@bytedance.com \
    --cc=jonathantanmy@google.com \
    --cc=trnka@scm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.