From: Toon Claes <toon@iotcl.com>
To: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Subject: Re: [PATCH 3/6] object-file: extract logic to approximate object count
Date: Wed, 11 Mar 2026 13:47:13 +0100
Message-ID: <87v7f2lei6.fsf@iotcl.com>
In-Reply-To: <20260310-b4-pks-odb-source-count-objects-v1-3-109e07d425f4@pks.im>

Patrick Steinhardt <ps@pks.im> writes:
> In "builtin/gc.c" we have some logic that checks whether we need to
> repack objects. This is done by counting the number of objects that we
> have and checking whether it exceeds a certain threshold. We don't
> really need an accurate object count though, which is why we only
> open a single object diretcroy shard and then extrapolate from there.
s/diretcroy/directory/
>
> Extract this logic into a new function that is owned by the loose object
> database source. This is done to prepare for a subsequent change, where
> we'll introduce object counting on the object database source level.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> builtin/gc.c | 37 +++++++++----------------------------
> object-file.c | 41 +++++++++++++++++++++++++++++++++++++++++
> object-file.h | 13 +++++++++++++
> 3 files changed, 63 insertions(+), 28 deletions(-)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index fb329c2cff..a08c7554cb 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -467,37 +467,18 @@ static int rerere_gc_condition(struct gc_config *cfg UNUSED)
> static int too_many_loose_objects(int limit)
> {
> /*
> - * Quickly check if a "gc" is needed, by estimating how
> - * many loose objects there are. Because SHA-1 is evenly
> - * distributed, we can check only one and get a reasonable
> - * estimate.
> + * This is weird, but stems from legacy behaviour: the GC auto
> + * threshold was always essentially interpreted as if it was rounded up
> + * to the next multiple 256 of, so we retain this behaviour for now.
> */
> - DIR *dir;
> - struct dirent *ent;
> - int auto_threshold;
> - int num_loose = 0;
> - int needed = 0;
> - const unsigned hexsz_loose = the_hash_algo->hexsz - 2;
> - char *path;
> -
> - path = repo_git_path(the_repository, "objects/17");
> - dir = opendir(path);
> - free(path);
> - if (!dir)
> + int auto_threshold = DIV_ROUND_UP(limit, 256) * 256;
> + unsigned long loose_count;
> +
> + if (odb_source_loose_approximate_object_count(the_repository->objects->sources,
> + &loose_count) < 0)
> return 0;
>
> - auto_threshold = DIV_ROUND_UP(limit, 256);
> - while ((ent = readdir(dir)) != NULL) {
> - if (strspn(ent->d_name, "0123456789abcdef") != hexsz_loose ||
> - ent->d_name[hexsz_loose] != '\0')
> - continue;
> - if (++num_loose > auto_threshold) {
> - needed = 1;
> - break;
> - }
> - }
> - closedir(dir);
> - return needed;
> + return loose_count > auto_threshold;
> }
>
> static struct packed_git *find_base_packs(struct string_list *packs,
> diff --git a/object-file.c b/object-file.c
> index a3ff7f586c..da67e3c9ff 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1868,6 +1868,47 @@ int odb_source_loose_for_each_object(struct odb_source *source,
> NULL, NULL, &data);
> }
>
> +int odb_source_loose_approximate_object_count(struct odb_source *source,
> + unsigned long *out)
> +{
> + const unsigned hexsz = source->odb->repo->hash_algo->hexsz - 2;
> + unsigned long count = 0;
> + struct dirent *ent;
> + char *path = NULL;
> + DIR *dir = NULL;
> + int ret;
> +
> + path = xstrfmt("%s/17", source->path);
> +
> + dir = opendir(path);
> + if (!dir) {
> + if (errno == ENOENT) {
> + *out = 0;
> + ret = 0;
> + goto out;
> + }
> +
> + ret = error_errno("cannot open object shard '%s'", path);
> + goto out;
> + }
> +
> + while ((ent = readdir(dir)) != NULL) {
> + if (strspn(ent->d_name, "0123456789abcdef") != hexsz ||
> + ent->d_name[hexsz] != '\0')
> + continue;
> + count++;
> + }
> +
> + *out = count * 256;
This makes the number much larger, but I don't think we need to worry
about getting anywhere near ULONG_MAX, because I would expect Git to
grind to a halt long before that happens (not to mention that
filesystems would be unhappy about it too).
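
To put a number on that: even with a 32-bit unsigned long, the
multiplication only wraps once the sampled shard holds more than
ULONG_MAX / 256 = 16777215 entries, which extrapolates to roughly 4.3
billion loose objects. If we ever wanted to be defensive anyway, a guard
could look roughly like the sketch below. Both helper names are made up
for illustration and are not part of the patch.

```c
#include <assert.h>
#include <limits.h>

/* Number of "objects/XX" shard directories in a loose object store. */
#define SHARD_COUNT 256UL

/*
 * Hypothetical helper, not part of the patch: scale the entry count of
 * a single shard to a store-wide estimate, refusing to wrap around.
 * Returns 0 on success and -1 if the product would exceed ULONG_MAX.
 */
static int scale_shard_count(unsigned long shard_entries, unsigned long *out)
{
	assert(out);
	if (shard_entries > ULONG_MAX / SHARD_COUNT)
		return -1;
	*out = shard_entries * SHARD_COUNT;
	return 0;
}

/* Largest per-shard entry count that can still be scaled safely. */
static unsigned long max_safe_shard_entries(void)
{
	return ULONG_MAX / SHARD_COUNT;
}
```

I'm not asking for this to be added, it is only meant to show how far
away the limit is in practice.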
--
Cheers,
Toon