From: Justin Tobler <jltobler@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, ps@pks.im, kristofferhaugsbakk@fastmail.com,
eslam.reda.div@gmail.com
Subject: Re: [PATCH v2 2/5] builtin/repo: collect largest inflated objects
Date: Mon, 2 Mar 2026 11:28:32 -0600 [thread overview]
Message-ID: <aaXFcz8AQrwRtr5C@denethor> (raw)
In-Reply-To: <xmqqv7fj1dzg.fsf@gitster.g>
On 26/02/26 11:50AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> > @@ -485,6 +514,23 @@ static void structure_keyvalue_print(struct repo_structure *stats,
> > printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
> > (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
> >
> > + printf("objects.commits.max_size%c%" PRIuMAX "%c", key_delim,
> > + (uintmax_t)stats->objects.largest.commit_size.value, value_delim);
> > + printf("objects.commits.max_size_oid%c%s%c", key_delim,
> > + oid_to_hex(&stats->objects.largest.commit_size.oid), value_delim);
> > + printf("objects.trees.max_size%c%" PRIuMAX "%c", key_delim,
> > + (uintmax_t)stats->objects.largest.tree_size.value, value_delim);
> > + printf("objects.trees.max_size_oid%c%s%c", key_delim,
> > + oid_to_hex(&stats->objects.largest.tree_size.oid), value_delim);
> > + printf("objects.blobs.max_size%c%" PRIuMAX "%c", key_delim,
> > + (uintmax_t)stats->objects.largest.blob_size.value, value_delim);
> > + printf("objects.blobs.max_size_oid%c%s%c", key_delim,
> > + oid_to_hex(&stats->objects.largest.blob_size.oid), value_delim);
> > + printf("objects.tags.max_size%c%" PRIuMAX "%c", key_delim,
> > + (uintmax_t)stats->objects.largest.tag_size.value, value_delim);
> > + printf("objects.tags.max_size_oid%c%s%c", key_delim,
> > + oid_to_hex(&stats->objects.largest.tag_size.oid), value_delim);
>
> The repetition tires reviewers' eyes. I am reasonably sure if there
> were an intentional copy-and-paste error, I wouldn't be able to spot
> it. But I tried to be careful and read it over three times ;-).
Ya, I was thinking about adding another patch that reduces the
duplication for the output here. I'll go ahead and do that in the next
version.
> > @@ -553,6 +599,15 @@ struct count_objects_data {
> > struct progress *progress;
> > };
> >
> > +static void check_largest(struct object_data *data, struct object_id *oid,
> > + size_t value)
> > +{
> > + if (value > data->value) {
> > + oidcpy(&data->oid, oid);
> > + data->value = value;
> > + }
> > +}
>
> How important is it for this application to end up with a valid
> value in data->oid?
>
> If data->value is initialized to a valid value, instead of an
> impossible sentinel value that is strictly smaller than any valid
> values, this can leave data->value to a valid value from an existing
> object without recording its object name. Imagine a repository with
> a single empty blob, and data->value initialized to zero (it cannot
> be initialized to a sentinel -1, as use of size_t here makes it
> impossible to have any reasonable sentinel values).
So in cases where we do not record an OID for an object, the table
output format knows not to show any annotations and the machine parsable
formats display null OIDs. In the example you provided though, this
technically wouldn't be correct though as it possible we could have an
empty blob.
One way we could deal which this is have a sentinel value of -1 for the
size value as you mentioned. Another option could be to check if the OID
is a null value and if so record the value regardless. I'll work on this
in the next version.
Thanks,
-Justin
next prev parent reply other threads:[~2026-03-02 17:28 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-03 22:17 [PATCH 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-03 22:17 ` [PATCH 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-03 22:36 ` Junio C Hamano
2026-02-18 19:40 ` Justin Tobler
2026-02-26 19:20 ` Junio C Hamano
2026-02-26 19:29 ` Justin Tobler
2026-02-03 22:17 ` [PATCH 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-03 22:45 ` Junio C Hamano
2026-02-18 20:01 ` Justin Tobler
2026-02-03 22:17 ` [PATCH 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-13 13:14 ` Patrick Steinhardt
2026-02-18 20:13 ` Justin Tobler
2026-02-03 22:17 ` [PATCH 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-03 22:48 ` Junio C Hamano
2026-02-03 23:14 ` Kristoffer Haugsbakk
2026-02-03 23:33 ` Junio C Hamano
2026-02-18 20:06 ` Justin Tobler
2026-02-03 22:17 ` [PATCH 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-03 22:50 ` Junio C Hamano
2026-02-04 8:28 ` Patrick Steinhardt
2026-02-04 15:28 ` Junio C Hamano
2026-02-23 17:41 ` [PATCH v2 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-23 17:41 ` [PATCH v2 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-23 17:41 ` [PATCH v2 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-26 19:50 ` Junio C Hamano
2026-03-02 17:28 ` Justin Tobler [this message]
2026-02-28 23:36 ` Lucas Seiki Oshiro
2026-03-02 17:38 ` Justin Tobler
2026-02-23 17:41 ` [PATCH v2 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-26 19:56 ` Junio C Hamano
2026-03-02 17:39 ` Justin Tobler
2026-02-23 17:41 ` [PATCH v2 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-23 17:41 ` [PATCH v2 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-24 9:35 ` [PATCH v2 0/5] builtin/repo: include largest object information Patrick Steinhardt
2026-02-28 23:43 ` Lucas Seiki Oshiro
2026-03-01 19:22 ` Justin Tobler
2026-03-02 21:45 ` [PATCH v3 0/6] " Justin Tobler
2026-03-02 21:45 ` [PATCH v3 1/6] builtin/repo: update stats for each object Justin Tobler
2026-03-02 21:45 ` [PATCH v3 2/6] builtin/repo: add helper for printing keyvalue output Justin Tobler
2026-03-03 13:27 ` Patrick Steinhardt
2026-03-03 17:40 ` Junio C Hamano
2026-03-03 18:08 ` Justin Tobler
2026-03-02 21:45 ` [PATCH v3 3/6] builtin/repo: collect largest inflated objects Justin Tobler
2026-03-03 13:27 ` Patrick Steinhardt
2026-03-02 21:45 ` [PATCH v3 4/6] builtin/repo: add OID annotations to table output Justin Tobler
2026-03-02 21:45 ` [PATCH v3 5/6] builtin/repo: find commit with most parents Justin Tobler
2026-03-02 21:45 ` [PATCH v3 6/6] builtin/repo: find tree with most entries Justin Tobler
2026-03-02 22:09 ` [PATCH v3 0/6] builtin/repo: include largest object information Junio C Hamano
2026-03-06 22:36 ` Junio C Hamano
2026-03-08 18:44 ` Justin Tobler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aaXFcz8AQrwRtr5C@denethor \
--to=jltobler@gmail.com \
--cc=eslam.reda.div@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox