public inbox for git@vger.kernel.org
From: Junio C Hamano <gitster@pobox.com>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org,  ps@pks.im,
	 kristofferhaugsbakk@fastmail.com, eslam.reda.div@gmail.com
Subject: Re: [PATCH v2 2/5] builtin/repo: collect largest inflated objects
Date: Thu, 26 Feb 2026 11:50:11 -0800
Message-ID: <xmqqv7fj1dzg.fsf@gitster.g>
In-Reply-To: <20260223174120.2356504-3-jltobler@gmail.com> (Justin Tobler's message of "Mon, 23 Feb 2026 11:41:17 -0600")

Justin Tobler <jltobler@gmail.com> writes:

> @@ -485,6 +514,23 @@ static void structure_keyvalue_print(struct repo_structure *stats,
>  	printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
>  	       (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
>  
> +	printf("objects.commits.max_size%c%" PRIuMAX "%c", key_delim,
> +	       (uintmax_t)stats->objects.largest.commit_size.value, value_delim);
> +	printf("objects.commits.max_size_oid%c%s%c", key_delim,
> +	       oid_to_hex(&stats->objects.largest.commit_size.oid), value_delim);
> +	printf("objects.trees.max_size%c%" PRIuMAX "%c", key_delim,
> +	       (uintmax_t)stats->objects.largest.tree_size.value, value_delim);
> +	printf("objects.trees.max_size_oid%c%s%c", key_delim,
> +	       oid_to_hex(&stats->objects.largest.tree_size.oid), value_delim);
> +	printf("objects.blobs.max_size%c%" PRIuMAX "%c", key_delim,
> +	       (uintmax_t)stats->objects.largest.blob_size.value, value_delim);
> +	printf("objects.blobs.max_size_oid%c%s%c", key_delim,
> +	       oid_to_hex(&stats->objects.largest.blob_size.oid), value_delim);
> +	printf("objects.tags.max_size%c%" PRIuMAX "%c", key_delim,
> +	       (uintmax_t)stats->objects.largest.tag_size.value, value_delim);
> +	printf("objects.tags.max_size_oid%c%s%c", key_delim,
> +	       oid_to_hex(&stats->objects.largest.tag_size.oid), value_delim);

The repetition tires reviewers' eyes.  I am reasonably sure that if there
were an unintentional copy-and-paste error, I wouldn't be able to spot
it.  But I tried to be careful and read it over three times ;-).

> @@ -553,6 +599,15 @@ struct count_objects_data {
>  	struct progress *progress;
>  };
>  
> +static void check_largest(struct object_data *data, struct object_id *oid,
> +			  size_t value)
> +{
> +	if (value > data->value) {
> +		oidcpy(&data->oid, oid);
> +		data->value = value;
> +	}
> +}

How important is it for this application to end up with a valid
value in data->oid?

If data->value is initialized to a valid value, instead of an
impossible sentinel value that is strictly smaller than any valid
value, this can leave data->value at a valid value from an existing
object without recording its object name.  Imagine a repository with
a single empty blob, and data->value initialized to zero (it cannot
be initialized to a sentinel -1, as size_t is unsigned and -1 would
wrap around to the largest possible value, so there is no reasonable
sentinel).  The empty blob's size, 0, is not greater than 0, so
check_largest() never records its object name, and data->oid keeps
whatever it was initialized with.


> @@ -138,6 +158,14 @@ test_expect_success SHA1 'keyvalue and nul format' '
>  		objects.trees.disk_size=$(object_type_disk_usage tree)
>  		objects.blobs.disk_size=$(object_type_disk_usage blob)
>  		objects.tags.disk_size=$(object_type_disk_usage tag)
> +		objects.commits.max_size=221
> +		objects.commits.max_size_oid=de3508174b5c2ace6993da67cae9be9069e2df39
> +		objects.trees.max_size=1335
> +		objects.trees.max_size_oid=09931deea9d81ec21300d3e13c74412f32eacec5
> +		objects.blobs.max_size=11
> +		objects.blobs.max_size_oid=eaeeedced46482bd4281fda5a5f05ce24854151f
> +		objects.tags.max_size=132
> +		objects.tags.max_size_oid=1ee0f2b16ea37d895dbe9dbd76cd2ac70446176c
>  		EOF
>  
>  		git repo structure --format=keyvalue >out 2>err &&


Thread overview: 50+ messages
2026-02-03 22:17 [PATCH 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-03 22:17 ` [PATCH 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-03 22:36   ` Junio C Hamano
2026-02-18 19:40     ` Justin Tobler
2026-02-26 19:20       ` Junio C Hamano
2026-02-26 19:29         ` Justin Tobler
2026-02-03 22:17 ` [PATCH 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-03 22:45   ` Junio C Hamano
2026-02-18 20:01     ` Justin Tobler
2026-02-03 22:17 ` [PATCH 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-13 13:14   ` Patrick Steinhardt
2026-02-18 20:13     ` Justin Tobler
2026-02-03 22:17 ` [PATCH 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-03 22:48   ` Junio C Hamano
2026-02-03 23:14     ` Kristoffer Haugsbakk
2026-02-03 23:33       ` Junio C Hamano
2026-02-18 20:06       ` Justin Tobler
2026-02-03 22:17 ` [PATCH 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-03 22:50   ` Junio C Hamano
2026-02-04  8:28     ` Patrick Steinhardt
2026-02-04 15:28       ` Junio C Hamano
2026-02-23 17:41 ` [PATCH v2 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-23 17:41   ` [PATCH v2 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-23 17:41   ` [PATCH v2 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-26 19:50     ` Junio C Hamano [this message]
2026-03-02 17:28       ` Justin Tobler
2026-02-28 23:36     ` Lucas Seiki Oshiro
2026-03-02 17:38       ` Justin Tobler
2026-02-23 17:41   ` [PATCH v2 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-26 19:56     ` Junio C Hamano
2026-03-02 17:39       ` Justin Tobler
2026-02-23 17:41   ` [PATCH v2 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-23 17:41   ` [PATCH v2 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-24  9:35   ` [PATCH v2 0/5] builtin/repo: include largest object information Patrick Steinhardt
2026-02-28 23:43   ` Lucas Seiki Oshiro
2026-03-01 19:22     ` Justin Tobler
2026-03-02 21:45   ` [PATCH v3 0/6] " Justin Tobler
2026-03-02 21:45     ` [PATCH v3 1/6] builtin/repo: update stats for each object Justin Tobler
2026-03-02 21:45     ` [PATCH v3 2/6] builtin/repo: add helper for printing keyvalue output Justin Tobler
2026-03-03 13:27       ` Patrick Steinhardt
2026-03-03 17:40         ` Junio C Hamano
2026-03-03 18:08           ` Justin Tobler
2026-03-02 21:45     ` [PATCH v3 3/6] builtin/repo: collect largest inflated objects Justin Tobler
2026-03-03 13:27       ` Patrick Steinhardt
2026-03-02 21:45     ` [PATCH v3 4/6] builtin/repo: add OID annotations to table output Justin Tobler
2026-03-02 21:45     ` [PATCH v3 5/6] builtin/repo: find commit with most parents Justin Tobler
2026-03-02 21:45     ` [PATCH v3 6/6] builtin/repo: find tree with most entries Justin Tobler
2026-03-02 22:09     ` [PATCH v3 0/6] builtin/repo: include largest object information Junio C Hamano
2026-03-06 22:36       ` Junio C Hamano
2026-03-08 18:44         ` Justin Tobler
