git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Christian Couder <chriscool@tuxfamily.org>
Cc: Han Young <hanyang.tony@bytedance.com>,
	trnka@scm.com,  git@vger.kernel.org, jonathantanmy@google.com
Subject: Re: [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects
Date: Thu, 08 May 2025 08:36:23 -0700	[thread overview]
Message-ID: <xmqqwmar6rns.fsf@gitster.g> (raw)
In-Reply-To: <20250508104437.51513-1-hanyang.tony@bytedance.com> (Han Young's message of "Thu, 8 May 2025 18:44:37 +0800")

Han Young <hanyang.tony@bytedance.com> writes:

> From: Tomáš Trnka <trnka@scm.com>
>
> git-repack currently does not pass --keep-pack or --honor-pack-keep to
> the git-pack-objects handling promisor packs. This means that settings
> like gc.bigPackThreshold are completely ignored for promisor packs.
>
> The simple fix is to just skip keep packs when iterating promisor packs.
>
> Signed-off-by: Han Young <hanyang.tony@bytedance.com>
> Original-patch-by: Tomáš Trnka <trnka@scm.com>
> ---
> The original patch passes the --keep-pack arguments to pack-objects. While
> this approach avoids repacking the keep packs, the objects in the keep
> packs are still enumerated. These objects are iterated in repack and sent
> to be packed, only to be skipped in pack-objects.
>
> Filtering out the keep packs in repack_promisor_objects is more efficient,
> the objects in keep packs are not enumerated and sent to pack-objects.

Christian, I thought you have also been active around promisor area
these days, so let us pick your brain for this one, too.

Thanks.

>
> In a test repo with a 2.7G promisor pack. The original patch took longer to
> repack the repo, most of the time was spent on enumerating objects.
>
> The original patch:
>
> $ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
> Enumerating objects: 19091575, done.
> Counting objects: 100% (5934/5934), done.
> Delta compression using up to 12 threads
> Compressing objects: 100% (4517/4517), done.
> Writing objects: 100% (5934/5934), done.
> Total 5934 (delta 2885), reused 2986 (delta 1192), pack-reused 0 (from 0)
> git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  43.94s user 46.83s system 134% cpu 1:07.60 total
>
> This patch:
>
> $ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
> Enumerating objects: 20952, done.
> Counting objects: 100% (18323/18323), done.
> Delta compression using up to 12 threads
> Compressing objects: 100% (9022/9022), done.
> Writing objects: 100% (18323/18323), done.
> Total 18323 (delta 11340), reused 15753 (delta 8771), pack-reused 0 (from 0)
> git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  8.65s user 3.14s system 62% cpu 18.838 total
>
> I also noticed that this patch produced a smaller packfile.
> 1.6M compared to 4.5M generated by the original patch.
> I suspect this is because some objects exist in multiple packs,
> so pack-objects still repacks them even if they are actually present
> in keep-packs.
> ---
>  builtin/repack.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 59214dbdfd..fb09df53c0 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -252,8 +252,9 @@ static void existing_packs_release(struct existing_packs *existing)
>  /*
>   * Adds all packs hex strings (pack-$HASH) to either packs->non_kept
>   * or packs->kept based on whether each pack has a corresponding
> - * .keep file or not.  Packs without a .keep file are not to be kept
> - * if we are going to pack everything into one file.
> + * .keep file or not.  Packs without a .keep file or specified by
> + * --keep-pack are not to be kept if we are going to pack everything
> + * into one file.
>   */
>  static void collect_pack_filenames(struct existing_packs *existing,
>  				   const struct string_list *extra_keep)
> @@ -278,8 +279,15 @@ static void collect_pack_filenames(struct existing_packs *existing,
>  		strbuf_addstr(&buf, base);
>  		strbuf_strip_suffix(&buf, ".pack");
>  
> -		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
> +		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep) {
>  			string_list_append(&existing->kept_packs, buf.buf);
> +			/*
> +			 * mark packs specified by --keep-pack as keep in core, keep packs
> +			 * are ignored by repack_promisor_objects. This avoids passthrough
> +			 * --keep-pack args to pack-objects
> +			 */
> +			p->pack_keep_in_core = 1;
> +		}
>  		else if (p->is_cruft)
>  			string_list_append(&existing->cruft_packs, buf.buf);
>  		else
> @@ -412,7 +420,9 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
>  	 * of a {type -> size} ordering, which may produce better deltas.
>  	 */
>  	for_each_packed_object(the_repository, write_oid, &cmd,
> -			       FOR_EACH_OBJECT_PROMISOR_ONLY);
> +			       FOR_EACH_OBJECT_PROMISOR_ONLY |
> +				   FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS |
> +				   FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS);
>  
>  	if (cmd.in == -1) {
>  		/* No packed objects; cmd was never started */

  reply	other threads:[~2025-05-08 15:36 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-08 10:44 [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects Han Young
2025-05-08 15:36 ` Junio C Hamano [this message]
2025-05-21  8:31 ` Han Young

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqwmar6rns.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=hanyang.tony@bytedance.com \
    --cc=jonathantanmy@google.com \
    --cc=trnka@scm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).