git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, Derrick Stolee <stolee@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH 2/2] maintenance: add loose-objects.batchSize config
Date: Mon, 24 Mar 2025 00:51:51 +0000	[thread overview]
Message-ID: <6ed537862fa7585928fde33d21b77367a7fb708d.1742777512.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1885.git.1742777512.gitgitgadget@gmail.com>

From: Derrick Stolee <stolee@gmail.com>

The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.

Add a new config option that allows this limit to be adjusted or ignored
entirely.

While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 Documentation/config/maintenance.adoc |  5 +++++
 Documentation/git-maintenance.adoc    | 18 ++++++++++-------
 builtin/gc.c                          | 10 ++++++++++
 t/t7900-maintenance.sh                | 28 +++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 72a9d6cf816..42f9545da0e 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -61,6 +61,11 @@ maintenance.loose-objects.auto::
 	loose objects is at least the value of `maintenance.loose-objects.auto`.
 	The default value is 100.
 
+maintenance.loose-objects.batchSize::
+	This integer config option controls the maximum number of loose objects
+	written into a packfile during the `loose-objects` task. The default is
+	fifty thousand. Use value `0` to indicate no limit.
+
 maintenance.incremental-repack.auto::
 	This integer config option controls how often the `incremental-repack`
 	task should be run as part of `git maintenance run --auto`. If zero,
diff --git a/Documentation/git-maintenance.adoc b/Documentation/git-maintenance.adoc
index 0450d74aff1..c90b370b1fc 100644
--- a/Documentation/git-maintenance.adoc
+++ b/Documentation/git-maintenance.adoc
@@ -126,13 +126,17 @@ loose-objects::
 	objects that already exist in a pack-file; concurrent Git processes
 	will examine the pack-file for the object data instead of the loose
 	object. Second, it creates a new pack-file (starting with "loose-")
-	containing a batch of loose objects. The batch size is limited to 50
-	thousand objects to prevent the job from taking too long on a
-	repository with many loose objects. The `gc` task writes unreachable
-	objects as loose objects to be cleaned up by a later step only if
-	they are not re-added to a pack-file; for this reason it is not
-	advisable to enable both the `loose-objects` and `gc` tasks at the
-	same time.
+	containing a batch of loose objects.
++
+The batch size defaults to fifty thousand objects to prevent the job from
+taking too long on a repository with many loose objects. Use the
+`maintenance.loose-objects.batchSize` config option to adjust this size,
+including a value of `0` to remove the limit.
++
+The `gc` task writes unreachable objects as loose objects to be cleaned up
+by a later step only if they are not re-added to a pack-file; for this
+reason it is not advisable to enable both the `loose-objects` and `gc`
+tasks at the same time.
 
 incremental-repack::
 	The `incremental-repack` job repacks the object directory
diff --git a/builtin/gc.c b/builtin/gc.c
index 6672f165bda..817081e1a50 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1163,6 +1163,7 @@ static int write_loose_object_to_stdin(const struct object_id *oid,
 
 	fprintf(d->in, "%s\n", oid_to_hex(oid));
 
+	/* If batch_size is INT_MAX, then this will return 0 always. */
 	return ++(d->count) > d->batch_size;
 }
 
@@ -1208,6 +1209,15 @@ static int pack_loose(struct maintenance_run_opts *opts)
 	data.count = 0;
 	data.batch_size = 50000;
 
+	repo_config_get_int(r, "maintenance.loose-objects.batchSize",
+			    &data.batch_size);
+
+	/* If configured as 0, then remove limit. */
+	if (!data.batch_size)
+		data.batch_size = INT_MAX;
+	else if (data.batch_size > 0)
+		data.batch_size--; /* Decrease for equality on limit. */
+
 	for_each_loose_file_in_objdir(r->objects->odb->path,
 				      write_loose_object_to_stdin,
 				      NULL,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 1909aed95e0..834ddb5ad68 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -306,6 +306,34 @@ test_expect_success 'maintenance.loose-objects.auto' '
 	test_subcommand git prune-packed --quiet <trace-loC
 '
 
+test_expect_success 'maintenance.loose-objects.batchSize' '
+	git init loose-batch &&
+
+	# This creates three objects per commit.
+	test_commit_bulk -C loose-batch 34 &&
+	pack=$(ls loose-batch/.git/objects/pack/pack-*.pack) &&
+	index="${pack%pack}idx" &&
+	rm "$index" &&
+	git -C loose-batch unpack-objects <"$pack" &&
+	git -C loose-batch config maintenance.loose-objects.batchSize 50 &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 50, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 50, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 2, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	test_must_be_empty err
+'
+
 test_expect_success 'incremental-repack task' '
 	packDir=.git/objects/pack &&
 	for i in $(test_seq 1 5)
-- 
gitgitgadget

  parent reply	other threads:[~2025-03-24  0:51 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-24  0:51 [PATCH 0/2] Allow configuration for loose-objects maintenance task Derrick Stolee via GitGitGadget
2025-03-24  0:51 ` [PATCH 1/2] maintenance: force progress/no-quiet to children Derrick Stolee via GitGitGadget
2025-03-24  0:51 ` Derrick Stolee via GitGitGadget [this message]
2025-03-24  6:11   ` [PATCH 2/2] maintenance: add loose-objects.batchSize config Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ed537862fa7585928fde33d21b77367a7fb708d.1742777512.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).