git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects
@ 2025-05-08 10:44 Han Young
  2025-05-08 15:36 ` Junio C Hamano
  2025-05-21  8:31 ` Han Young
  0 siblings, 2 replies; 3+ messages in thread
From: Han Young @ 2025-05-08 10:44 UTC (permalink / raw)
  To: trnka; +Cc: git, jonathantanmy, Han Young

From: Tomáš Trnka <trnka@scm.com>

git-repack currently does not pass --keep-pack or --honor-pack-keep to
the git-pack-objects handling promisor packs. This means that settings
like gc.bigPackThreshold are completely ignored for promisor packs.

The simple fix is to just skip keep packs when iterating promisor packs.

Signed-off-by: Han Young <hanyang.tony@bytedance.com>
Original-patch-by: Tomáš Trnka <trnka@scm.com>
---
The original patch passes the --keep-pack arguments to pack-objects. While
this approach avoids repacking the keep packs, the objects in the keep
packs are still enumerated. These objects are iterated in repack and sent
to be packed, only to be skipped in pack-objects.

Filtering out the keep packs in repack_promisor_objects is more efficient,
the objects in keep packs are not enumerated and sent to pack-objects.

In a test repo with a 2.7G promisor pack. The original patch took longer to
repack the repo, most of the time was spent on enumerating objects.

The original patch:

$ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
Enumerating objects: 19091575, done.
Counting objects: 100% (5934/5934), done.
Delta compression using up to 12 threads
Compressing objects: 100% (4517/4517), done.
Writing objects: 100% (5934/5934), done.
Total 5934 (delta 2885), reused 2986 (delta 1192), pack-reused 0 (from 0)
git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  43.94s user 46.83s system 134% cpu 1:07.60 total

This patch:

$ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
Enumerating objects: 20952, done.
Counting objects: 100% (18323/18323), done.
Delta compression using up to 12 threads
Compressing objects: 100% (9022/9022), done.
Writing objects: 100% (18323/18323), done.
Total 18323 (delta 11340), reused 15753 (delta 8771), pack-reused 0 (from 0)
git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  8.65s user 3.14s system 62% cpu 18.838 total

I also noticed that this patch produced a smaller packfile.
1.6M compared to 4.5M generated by the original patch.
I suspect this is because some objects exist in multiple packs,
so pack-objects still repacks them even if they are actually present
in keep-packs.
---
 builtin/repack.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 59214dbdfd..fb09df53c0 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -252,8 +252,9 @@ static void existing_packs_release(struct existing_packs *existing)
 /*
  * Adds all packs hex strings (pack-$HASH) to either packs->non_kept
  * or packs->kept based on whether each pack has a corresponding
- * .keep file or not.  Packs without a .keep file are not to be kept
- * if we are going to pack everything into one file.
+ * .keep file or not.  Packs without a .keep file or specified by
+ * --keep-pack are not to be kept if we are going to pack everything
+ * into one file.
  */
 static void collect_pack_filenames(struct existing_packs *existing,
 				   const struct string_list *extra_keep)
@@ -278,8 +279,15 @@ static void collect_pack_filenames(struct existing_packs *existing,
 		strbuf_addstr(&buf, base);
 		strbuf_strip_suffix(&buf, ".pack");
 
-		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
+		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep) {
 			string_list_append(&existing->kept_packs, buf.buf);
+			/*
+			 * mark packs specified by --keep-pack as keep in core, keep packs
+			 * are ignored by repack_promisor_objects. This avoids passthrough
+			 * --keep-pack args to pack-objects
+			 */
+			p->pack_keep_in_core = 1;
+		}
 		else if (p->is_cruft)
 			string_list_append(&existing->cruft_packs, buf.buf);
 		else
@@ -412,7 +420,9 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
 	 * of a {type -> size} ordering, which may produce better deltas.
 	 */
 	for_each_packed_object(the_repository, write_oid, &cmd,
-			       FOR_EACH_OBJECT_PROMISOR_ONLY);
+			       FOR_EACH_OBJECT_PROMISOR_ONLY |
+				   FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS |
+				   FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS);
 
 	if (cmd.in == -1) {
 		/* No packed objects; cmd was never started */
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects
  2025-05-08 10:44 [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects Han Young
@ 2025-05-08 15:36 ` Junio C Hamano
  2025-05-21  8:31 ` Han Young
  1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2025-05-08 15:36 UTC (permalink / raw)
  To: Christian Couder; +Cc: Han Young, trnka, git, jonathantanmy

Han Young <hanyang.tony@bytedance.com> writes:

> From: Tomáš Trnka <trnka@scm.com>
>
> git-repack currently does not pass --keep-pack or --honor-pack-keep to
> the git-pack-objects handling promisor packs. This means that settings
> like gc.bigPackThreshold are completely ignored for promisor packs.
>
> The simple fix is to just skip keep packs when iterating promisor packs.
>
> Signed-off-by: Han Young <hanyang.tony@bytedance.com>
> Original-patch-by: Tomáš Trnka <trnka@scm.com>
> ---
> The original patch passes the --keep-pack arguments to pack-objects. While
> this approach avoids repacking the keep packs, the objects in the keep
> packs are still enumerated. These objects are iterated in repack and sent
> to be packed, only to be skipped in pack-objects.
>
> Filtering out the keep packs in repack_promisor_objects is more efficient,
> the objects in keep packs are not enumerated and sent to pack-objects.

Christian, I thought you have also been active around promisor area
these days, so let us pick your brain for this one, too.

Thanks.

>
> In a test repo with a 2.7G promisor pack. The original patch took longer to
> repack the repo, most of the time was spent on enumerating objects.
>
> The original patch:
>
> $ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
> Enumerating objects: 19091575, done.
> Counting objects: 100% (5934/5934), done.
> Delta compression using up to 12 threads
> Compressing objects: 100% (4517/4517), done.
> Writing objects: 100% (5934/5934), done.
> Total 5934 (delta 2885), reused 2986 (delta 1192), pack-reused 0 (from 0)
> git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  43.94s user 46.83s system 134% cpu 1:07.60 total
>
> This patch:
>
> $ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
> Enumerating objects: 20952, done.
> Counting objects: 100% (18323/18323), done.
> Delta compression using up to 12 threads
> Compressing objects: 100% (9022/9022), done.
> Writing objects: 100% (18323/18323), done.
> Total 18323 (delta 11340), reused 15753 (delta 8771), pack-reused 0 (from 0)
> git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  8.65s user 3.14s system 62% cpu 18.838 total
>
> I also noticed that this patch produced a smaller packfile.
> 1.6M compared to 4.5M generated by the original patch.
> I suspect this is because some objects exist in multiple packs,
> so pack-objects still repacks them even if they are actually present
> in keep-packs.
> ---
>  builtin/repack.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 59214dbdfd..fb09df53c0 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -252,8 +252,9 @@ static void existing_packs_release(struct existing_packs *existing)
>  /*
>   * Adds all packs hex strings (pack-$HASH) to either packs->non_kept
>   * or packs->kept based on whether each pack has a corresponding
> - * .keep file or not.  Packs without a .keep file are not to be kept
> - * if we are going to pack everything into one file.
> + * .keep file or not.  Packs without a .keep file or specified by
> + * --keep-pack are not to be kept if we are going to pack everything
> + * into one file.
>   */
>  static void collect_pack_filenames(struct existing_packs *existing,
>  				   const struct string_list *extra_keep)
> @@ -278,8 +279,15 @@ static void collect_pack_filenames(struct existing_packs *existing,
>  		strbuf_addstr(&buf, base);
>  		strbuf_strip_suffix(&buf, ".pack");
>  
> -		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
> +		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep) {
>  			string_list_append(&existing->kept_packs, buf.buf);
> +			/*
> +			 * mark packs specified by --keep-pack as keep in core, keep packs
> +			 * are ignored by repack_promisor_objects. This avoids passthrough
> +			 * --keep-pack args to pack-objects
> +			 */
> +			p->pack_keep_in_core = 1;
> +		}
>  		else if (p->is_cruft)
>  			string_list_append(&existing->cruft_packs, buf.buf);
>  		else
> @@ -412,7 +420,9 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
>  	 * of a {type -> size} ordering, which may produce better deltas.
>  	 */
>  	for_each_packed_object(the_repository, write_oid, &cmd,
> -			       FOR_EACH_OBJECT_PROMISOR_ONLY);
> +			       FOR_EACH_OBJECT_PROMISOR_ONLY |
> +				   FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS |
> +				   FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS);
>  
>  	if (cmd.in == -1) {
>  		/* No packed objects; cmd was never started */

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects
  2025-05-08 10:44 [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects Han Young
  2025-05-08 15:36 ` Junio C Hamano
@ 2025-05-21  8:31 ` Han Young
  1 sibling, 0 replies; 3+ messages in thread
From: Han Young @ 2025-05-21  8:31 UTC (permalink / raw)
  To: git; +Cc: trnka, Christian Couder

By the way, the origin patch is
https://lore.kernel.org/git/19759704.fSG56mABFh@electra/
Somehow this patch isn't a reply to the original, I must have pasted the
wrong message id.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-05-21  8:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-08 10:44 [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects Han Young
2025-05-08 15:36 ` Junio C Hamano
2025-05-21  8:31 ` Han Young

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).