From: Justin Tobler <jltobler@gmail.com>
To: git@vger.kernel.org
Cc: ps@pks.im, gitster@pobox.com, kristofferhaugsbakk@fastmail.com,
	lucasseikioshiro@gmail.com, Justin Tobler <jltobler@gmail.com>
Subject: [PATCH v3 0/6] builtin/repo: include largest object information
Date: Mon,  2 Mar 2026 15:45:20 -0600
Message-ID: <20260302214526.2034279-1-jltobler@gmail.com>
In-Reply-To: <20260223174120.2356504-1-jltobler@gmail.com>

Greetings,

The "structure" output for git-repo(1) currently provides count
information for references/objects as well as total inflated/disk sizes
of objects by type. Info regarding the largest individual objects in the
repository is not yet collected, but would be useful to users wishing to
identify such large objects.

This patch series adds the following data points:
- The OID and size of the largest objects by object type
- The OID and parent count of the commit with the most parents
- The OID and entries count of the tree with the most entries
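
As a rough illustration of how each data point surfaces in the keyvalue
output, here is a minimal standalone sketch. It assumes the pairing visible
in the range-diff below: a "<key>" line carrying the value, followed by a
"<key>_oid" line carrying the object ID. The names `object_data_sketch` and
`format_object_data` are hypothetical stand-ins, and the OID is held as a
plain hex string rather than a `struct object_id`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the series' `struct object_data`. The real
 * structure stores a `struct object_id`; a hex string keeps this sketch
 * free of Git internals.
 */
struct object_data_sketch {
	size_t value;
	char oid_hex[65]; /* room for a SHA-256 hex string plus NUL */
};

/*
 * Format "<key><kd><value><vd><key>_oid<kd><oid><vd>" into `out`,
 * mirroring the two records emitted per data point. Returns the
 * snprintf() result.
 */
static int format_object_data(char *out, size_t outlen, const char *key,
			      char key_delim,
			      const struct object_data_sketch *data,
			      char value_delim)
{
	return snprintf(out, outlen, "%s%c%" PRIuMAX "%c%s_oid%c%s%c",
			key, key_delim, (uintmax_t)data->value, value_delim,
			key, key_delim, data->oid_hex, value_delim);
}
```

With '=' as the key delimiter and a newline as the value delimiter, a call
for "objects.blobs.max_size" would produce one "objects.blobs.max_size=..."
line followed by one "objects.blobs.max_size_oid=..." line.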

Changes from V2:
- When checking for largest objects, zero-valued objects were not
  recorded even when they were the "largest" object seen. In this
  version, an object is always recorded if no object ID has been
  recorded yet, even if its value is zero.
- Added some helper functions for printing keyvalue info to cut down on
  duplicate code and hopefully make it a bit easier on the eyes.
- Moved the for-each loop that prints table OID annotations inside the
  preceding if-block, making it a bit easier to reason about.
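
To make the zero-value fix concrete, here is a minimal standalone sketch of
the updated comparison. It follows the `check_largest()` hunk in the
range-diff below, but the `sketch_*` types and helpers are simplified,
hypothetical stand-ins for Git's `struct object_id`, `is_null_oid()` and
`oidcpy()`, not the real implementations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_RAWSZ 20

/* Simplified stand-in for Git's `struct object_id`. */
struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Stand-in for the series' `struct object_data`. */
struct sketch_object_data {
	struct sketch_oid oid;
	size_t value;
};

/* An all-zero ID means "nothing recorded yet". */
static bool sketch_is_null_oid(const struct sketch_oid *oid)
{
	static const struct sketch_oid null_oid = { { 0 } };
	return !memcmp(oid->hash, null_oid.hash, SKETCH_RAWSZ);
}

/*
 * Record `oid` when it beats the current maximum, or when no object has
 * been recorded yet. The second condition lets a zero-valued object be
 * recorded as the largest one seen so far.
 */
static void sketch_check_largest(struct sketch_object_data *data,
				 const struct sketch_oid *oid, size_t value)
{
	if (value > data->value || sketch_is_null_oid(&data->oid)) {
		data->oid = *oid;
		data->value = value;
	}
}
```

With only the `value > data->value` test, a first object of value zero
would never replace the zero-initialized state, so the reported OID would
stay null. A later object with an equal value still does not displace the
one already recorded.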

Changes from V1:
- Avoided duplicating the annotation string by handing over ownership.
- I decided to leave the `struct object_stats` structure alone for now
  as storing the various object values per-type does make it convenient
  to calculate the various totals. I may revisit this in a future series
  though.

Thanks,
-Justin

Justin Tobler (6):
  builtin/repo: update stats for each object
  builtin/repo: add helper for printing keyvalue output
  builtin/repo: collect largest inflated objects
  builtin/repo: add OID annotations to table output
  builtin/repo: find commit with most parents
  builtin/repo: find tree with most entries

 Documentation/git-repo.adoc |   1 +
 builtin/repo.c              | 323 ++++++++++++++++++++++++++++--------
 t/t1901-repo-structure.sh   | 143 ++++++++++------
 3 files changed, 352 insertions(+), 115 deletions(-)

Range-diff against v2:
1:  94a44e0e0f = 1:  94a44e0e0f builtin/repo: update stats for each object
-:  ---------- > 2:  36c11351ae builtin/repo: add helper for printing keyvalue output
2:  92dbf34f2c ! 3:  90e71c058d builtin/repo: collect largest inflated objects
    @@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
      }
      
      static void stats_table_print_structure(const struct stats_table *table)
    +@@ builtin/repo.c: static inline void print_keyvalue(const char *key, char key_delim, size_t value,
    + 	       value_delim);
    + }
    + 
    ++static void print_object_data(const char *key, char key_delim,
    ++			      struct object_data *data, char value_delim)
    ++{
    ++	print_keyvalue(key, key_delim, data->value, value_delim);
    ++	printf("%s_oid%c%s%c", key, key_delim, oid_to_hex(&data->oid),
    ++	       value_delim);
    ++}
    ++
    + static void structure_keyvalue_print(struct repo_structure *stats,
    + 				     char key_delim, char value_delim)
    + {
     @@ builtin/repo.c: static void structure_keyvalue_print(struct repo_structure *stats,
    - 	printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
    - 	       (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
    + 	print_keyvalue("objects.tags.disk_size", key_delim,
    + 		       stats->objects.disk_sizes.tags, value_delim);
      
    -+	printf("objects.commits.max_size%c%" PRIuMAX "%c", key_delim,
    -+	       (uintmax_t)stats->objects.largest.commit_size.value, value_delim);
    -+	printf("objects.commits.max_size_oid%c%s%c", key_delim,
    -+	       oid_to_hex(&stats->objects.largest.commit_size.oid), value_delim);
    -+	printf("objects.trees.max_size%c%" PRIuMAX "%c", key_delim,
    -+	       (uintmax_t)stats->objects.largest.tree_size.value, value_delim);
    -+	printf("objects.trees.max_size_oid%c%s%c", key_delim,
    -+	       oid_to_hex(&stats->objects.largest.tree_size.oid), value_delim);
    -+	printf("objects.blobs.max_size%c%" PRIuMAX "%c", key_delim,
    -+	       (uintmax_t)stats->objects.largest.blob_size.value, value_delim);
    -+	printf("objects.blobs.max_size_oid%c%s%c", key_delim,
    -+	       oid_to_hex(&stats->objects.largest.blob_size.oid), value_delim);
    -+	printf("objects.tags.max_size%c%" PRIuMAX "%c", key_delim,
    -+	       (uintmax_t)stats->objects.largest.tag_size.value, value_delim);
    -+	printf("objects.tags.max_size_oid%c%s%c", key_delim,
    -+	       oid_to_hex(&stats->objects.largest.tag_size.oid), value_delim);
    ++	print_object_data("objects.commits.max_size", key_delim,
    ++			  &stats->objects.largest.commit_size, value_delim);
    ++	print_object_data("objects.trees.max_size", key_delim,
    ++			  &stats->objects.largest.tree_size, value_delim);
    ++	print_object_data("objects.blobs.max_size", key_delim,
    ++			  &stats->objects.largest.blob_size, value_delim);
    ++	print_object_data("objects.tags.max_size", key_delim,
    ++			  &stats->objects.largest.tag_size, value_delim);
     +
      	fflush(stdout);
      }
    @@ builtin/repo.c: struct count_objects_data {
     +static void check_largest(struct object_data *data, struct object_id *oid,
     +			  size_t value)
     +{
    -+	if (value > data->value) {
    ++	if (value > data->value || is_null_oid(&data->oid)) {
     +		oidcpy(&data->oid, oid);
     +		data->value = value;
     +	}
3:  1457d5d59c ! 4:  938c36df91 builtin/repo: add OID annotations to table output
    @@ builtin/repo.c: static void stats_table_print_structure(const struct stats_table
      		printf("%s\n", buf.buf);
      	}
      
    -+	if (table->annotations.nr)
    ++	if (table->annotations.nr) {
     +		printf("\n");
    -+	for_each_string_list_item(item, &table->annotations)
    -+		printf("%s\n", item->string);
    ++		for_each_string_list_item(item, &table->annotations)
    ++			printf("%s\n", item->string);
    ++	}
     +
      	strbuf_release(&buf);
      }
    @@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
     +	string_list_clear(&table->annotations, 1);
      }
      
    - static void structure_keyvalue_print(struct repo_structure *stats,
    + static inline void print_keyvalue(const char *key, char key_delim, size_t value,
     @@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
      {
      	struct stats_table table = {
4:  f4e92e3f09 ! 5:  ab9870f06e builtin/repo: find commit with most parents
    @@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
      	stats_table_object_size_addf(table,
      				     &objects->largest.tree_size.oid,
     @@ builtin/repo.c: static void structure_keyvalue_print(struct repo_structure *stats,
    - 	printf("objects.tags.max_size_oid%c%s%c", key_delim,
    - 	       oid_to_hex(&stats->objects.largest.tag_size.oid), value_delim);
    + 	print_object_data("objects.tags.max_size", key_delim,
    + 			  &stats->objects.largest.tag_size, value_delim);
      
    -+	printf("objects.commits.max_parents%c%" PRIuMAX "%c", key_delim,
    -+	       (uintmax_t)stats->objects.largest.parent_count.value, value_delim);
    -+	printf("objects.commits.max_parents_oid%c%s%c", key_delim,
    -+	       oid_to_hex(&stats->objects.largest.parent_count.oid), value_delim);
    ++	print_object_data("objects.commits.max_parents", key_delim,
    ++			  &stats->objects.largest.parent_count, value_delim);
     +
      	fflush(stdout);
      }
5:  af404fcc6c ! 6:  2884cb451c builtin/repo: find tree with most entries
    @@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
      	stats_table_object_size_addf(table,
      				     &objects->largest.blob_size.oid,
     @@ builtin/repo.c: static void structure_keyvalue_print(struct repo_structure *stats,
    - 	       (uintmax_t)stats->objects.largest.parent_count.value, value_delim);
    - 	printf("objects.commits.max_parents_oid%c%s%c", key_delim,
    - 	       oid_to_hex(&stats->objects.largest.parent_count.oid), value_delim);
    -+	printf("objects.trees.max_entries%c%" PRIuMAX "%c", key_delim,
    -+	       (uintmax_t)stats->objects.largest.tree_entries.value, value_delim);
    -+	printf("objects.trees.max_entries_oid%c%s%c", key_delim,
    -+	       oid_to_hex(&stats->objects.largest.tree_entries.oid), value_delim);
    + 
    + 	print_object_data("objects.commits.max_parents", key_delim,
    + 			  &stats->objects.largest.parent_count, value_delim);
    ++	print_object_data("objects.trees.max_entries", key_delim,
    ++			  &stats->objects.largest.tree_entries, value_delim);
      
      	fflush(stdout);
      }

base-commit: 67ad42147a7acc2af6074753ebd03d904476118f
-- 
2.53.0



Thread overview: 50+ messages
2026-02-03 22:17 [PATCH 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-03 22:17 ` [PATCH 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-03 22:36   ` Junio C Hamano
2026-02-18 19:40     ` Justin Tobler
2026-02-26 19:20       ` Junio C Hamano
2026-02-26 19:29         ` Justin Tobler
2026-02-03 22:17 ` [PATCH 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-03 22:45   ` Junio C Hamano
2026-02-18 20:01     ` Justin Tobler
2026-02-03 22:17 ` [PATCH 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-13 13:14   ` Patrick Steinhardt
2026-02-18 20:13     ` Justin Tobler
2026-02-03 22:17 ` [PATCH 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-03 22:48   ` Junio C Hamano
2026-02-03 23:14     ` Kristoffer Haugsbakk
2026-02-03 23:33       ` Junio C Hamano
2026-02-18 20:06       ` Justin Tobler
2026-02-03 22:17 ` [PATCH 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-03 22:50   ` Junio C Hamano
2026-02-04  8:28     ` Patrick Steinhardt
2026-02-04 15:28       ` Junio C Hamano
2026-02-23 17:41 ` [PATCH v2 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-23 17:41   ` [PATCH v2 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-23 17:41   ` [PATCH v2 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-26 19:50     ` Junio C Hamano
2026-03-02 17:28       ` Justin Tobler
2026-02-28 23:36     ` Lucas Seiki Oshiro
2026-03-02 17:38       ` Justin Tobler
2026-02-23 17:41   ` [PATCH v2 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-26 19:56     ` Junio C Hamano
2026-03-02 17:39       ` Justin Tobler
2026-02-23 17:41   ` [PATCH v2 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-23 17:41   ` [PATCH v2 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-24  9:35   ` [PATCH v2 0/5] builtin/repo: include largest object information Patrick Steinhardt
2026-02-28 23:43   ` Lucas Seiki Oshiro
2026-03-01 19:22     ` Justin Tobler
2026-03-02 21:45   ` Justin Tobler [this message]
2026-03-02 21:45     ` [PATCH v3 1/6] builtin/repo: update stats for each object Justin Tobler
2026-03-02 21:45     ` [PATCH v3 2/6] builtin/repo: add helper for printing keyvalue output Justin Tobler
2026-03-03 13:27       ` Patrick Steinhardt
2026-03-03 17:40         ` Junio C Hamano
2026-03-03 18:08           ` Justin Tobler
2026-03-02 21:45     ` [PATCH v3 3/6] builtin/repo: collect largest inflated objects Justin Tobler
2026-03-03 13:27       ` Patrick Steinhardt
2026-03-02 21:45     ` [PATCH v3 4/6] builtin/repo: add OID annotations to table output Justin Tobler
2026-03-02 21:45     ` [PATCH v3 5/6] builtin/repo: find commit with most parents Justin Tobler
2026-03-02 21:45     ` [PATCH v3 6/6] builtin/repo: find tree with most entries Justin Tobler
2026-03-02 22:09     ` [PATCH v3 0/6] builtin/repo: include largest object information Junio C Hamano
2026-03-06 22:36       ` Junio C Hamano
2026-03-08 18:44         ` Justin Tobler
