[PATCH 0/6] builtin/repo: add object size info to structure output


* [PATCH 0/6] builtin/repo: add object size info to structure output
@ 2025-12-09 22:58 Justin Tobler
  2025-12-09 22:58 ` [PATCH 1/6] builtin/repo: group per-type object values into struct Justin Tobler
                   ` (6 more replies)
  0 siblings, 7 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-09 22:58 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler

Greetings,

This patch series extends the recently introduced "structure" subcommand
for git-repo(1) to collect object size information. More specifically,
it shows total inflated and disk sizes of objects by object type. The
aim is to provide additional insight that may be useful to users regarding
the structure of a repository.

In addition to this change, this series also updates the table output
format to downscale larger output values along with the appropriate unit
prefix. This is done to make table output more human friendly. The
keyvalue and nul output formats are left the same since they are
intended more for machine parsing.

Thanks,
-Justin

Justin Tobler (6):
  builtin/repo: group per-type object values into struct
  builtin/repo: humanise count values in structure output
  builtin/repo: add inflated object info to keyvalue structure output
  builtin/repo: add inflated object info to structure table
  builtin/repo: add disk size info to keyvalue stucture output
  builtin/repo: add object disk size info to structure table

 Documentation/git-repo.adoc |   2 +
 builtin/repo.c              | 222 +++++++++++++++++++++++++++++++-----
 t/t1901-repo-structure.sh   | 142 ++++++++++++++++-------
 3 files changed, 295 insertions(+), 71 deletions(-)

base-commit: e85ae279b0d58edc2f4c3fd5ac391b51e1223985
-- 
2.52.0.209.ge85ae279b0


* [PATCH 1/6] builtin/repo: group per-type object values into struct
  2025-12-09 22:58 [PATCH 0/6] builtin/repo: add object size info to structure output Justin Tobler
@ 2025-12-09 22:58 ` Justin Tobler
  2025-12-09 22:58 ` [PATCH 2/6] builtin/repo: humanise count values in structure output Justin Tobler
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-09 22:58 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler

The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
 	size_t others;
 };
 
-struct object_stats {
+struct object_values {
 	size_t tags;
 	size_t commits;
 	size_t trees;
 	size_t blobs;
 };
 
+struct object_stats {
+	struct object_values type_counts;
+};
+
 struct repo_structure {
 	struct ref_stats refs;
 	struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
 	return stats->branches + stats->remotes + stats->tags + stats->others;
 }
 
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
 {
-	return stats->tags + stats->commits + stats->trees + stats->blobs;
+	return values->tags + values->commits + values->trees + values->blobs;
 }
 
 static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_count(objects);
+	object_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
 	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
-	stats_table_count_addf(table, objects->commits, "    * %s", _("Commits"));
-	stats_table_count_addf(table, objects->trees, "    * %s", _("Trees"));
-	stats_table_count_addf(table, objects->blobs, "    * %s", _("Blobs"));
-	stats_table_count_addf(table, objects->tags, "    * %s", _("Tags"));
+	stats_table_count_addf(table, objects->type_counts.commits,
+			       "    * %s", _("Commits"));
+	stats_table_count_addf(table, objects->type_counts.trees,
+			       "    * %s", _("Trees"));
+	stats_table_count_addf(table, objects->type_counts.blobs,
+			       "    * %s", _("Blobs"));
+	stats_table_count_addf(table, objects->type_counts.tags,
+			       "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	       (uintmax_t)stats->refs.others, value_delim);
 
 	printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.commits, value_delim);
+	       (uintmax_t)stats->objects.type_counts.commits, value_delim);
 	printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.trees, value_delim);
+	       (uintmax_t)stats->objects.type_counts.trees, value_delim);
 	printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.blobs, value_delim);
+	       (uintmax_t)stats->objects.type_counts.blobs, value_delim);
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.tags, value_delim);
+	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
 	fflush(stdout);
 }
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 
 	switch (type) {
 	case OBJ_TAG:
-		stats->tags += oids->nr;
+		stats->type_counts.tags += oids->nr;
 		break;
 	case OBJ_COMMIT:
-		stats->commits += oids->nr;
+		stats->type_counts.commits += oids->nr;
 		break;
 	case OBJ_TREE:
-		stats->trees += oids->nr;
+		stats->type_counts.trees += oids->nr;
 		break;
 	case OBJ_BLOB:
-		stats->blobs += oids->nr;
+		stats->type_counts.blobs += oids->nr;
 		break;
 	default:
 		BUG("invalid object type");
 	}
 
-	object_count = get_total_object_count(stats);
+	object_count = get_total_object_values(&stats->type_counts);
 	display_progress(data->progress, object_count);
 
 	return 0;
-- 
2.52.0.209.ge85ae279b0
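
For illustration only (not part of the patch): a minimal, self-contained
sketch of the grouping idea. Because counts and sizes share the same
four-field layout, one helper can total either. The `inflated_sizes`
member is the one added later in this series; everything else mirrors
the names in the diff above.

```c
#include <stddef.h>

/* Per-type values; the same layout can hold counts or sizes. */
struct object_values {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/* One object_values member per kind of measurement. */
struct object_stats {
	struct object_values type_counts;
	struct object_values inflated_sizes;
};

/* Sums the four per-type fields, whatever they measure. */
static size_t get_total_object_values(const struct object_values *values)
{
	return values->tags + values->commits + values->trees + values->blobs;
}
```

Calling `get_total_object_values(&stats.type_counts)` and
`get_total_object_values(&stats.inflated_sizes)` then yields the total
object count and the total inflated size without a second helper.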



* [PATCH 2/6] builtin/repo: humanise count values in structure output
  2025-12-09 22:58 [PATCH 0/6] builtin/repo: add object size info to structure output Justin Tobler
  2025-12-09 22:58 ` [PATCH 1/6] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-09 22:58 ` Justin Tobler
  2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-09 22:58 ` [PATCH 3/6] builtin/repo: add inflated object info to keyvalue " Justin Tobler
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-09 22:58 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler

The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.

For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 61 +++++++++++++++++++++++++++++++-------
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
 2 files changed, 82 insertions(+), 41 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..8fb728b3a5 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
 
 	int name_col_width;
 	int value_col_width;
+	int unit_col_width;
 };
 
 /*
@@ -230,6 +231,7 @@ struct stats_table {
  */
 struct stats_table_entry {
 	char *value;
+	const char *unit;
 };
 
 static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
 
 	if (name_width > table->name_col_width)
 		table->name_col_width = name_width;
-	if (entry) {
+	if (!entry)
+		return;
+	if (entry->value) {
 		int value_width = utf8_strwidth(entry->value);
 		if (value_width > table->value_col_width)
 			table->value_col_width = value_width;
 	}
+	if (entry->unit) {
+		int unit_width = utf8_strwidth(entry->unit);
+		if (unit_width > table->unit_col_width)
+			table->unit_col_width = unit_width;
+	}
 }
 
 static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -266,6 +275,10 @@ static void stats_table_addf(struct stats_table *table, const char *format, ...)
 	va_end(ap);
 }
 
+static const char *unit_k = "k";
+static const char *unit_M = "M";
+static const char *unit_G = "G";
+
 static void stats_table_count_addf(struct stats_table *table, size_t value,
 				   const char *format, ...)
 {
@@ -273,7 +286,26 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_list ap;
 
 	CALLOC_ARRAY(entry, 1);
-	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+
+	if (value >= 1000000000) {
+		uintmax_t x = (uintmax_t)value + 5000000;
+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
+				       x / 1000000000,
+				       x % 1000000000 / 10000000);
+		entry->unit = unit_G;
+	} else if (value >= 1000000) {
+		uintmax_t x = (uintmax_t)value + 5000;
+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
+				       x / 1000000, x % 1000000 / 10000);
+		entry->unit = unit_M;
+	} else if (value >= 1000) {
+		uintmax_t x = (uintmax_t)value + 5;
+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
+				       x / 1000, x % 1000 / 10);
+		entry->unit = unit_k;
+	} else {
+		entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+	}
 
 	va_start(ap, format);
 	stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +356,24 @@ static void stats_table_print_structure(const struct stats_table *table)
 {
 	const char *name_col_title = _("Repository structure");
 	const char *value_col_title = _("Value");
-	int name_col_width = utf8_strwidth(name_col_title);
-	int value_col_width = utf8_strwidth(value_col_title);
+	int title_name_width = utf8_strwidth(name_col_title);
+	int title_value_width = utf8_strwidth(value_col_title);
+	int name_col_width = table->name_col_width;
+	int value_col_width = table->value_col_width;
+	int unit_col_width = table->unit_col_width;
 	struct string_list_item *item;
 	struct strbuf buf = STRBUF_INIT;
 
-	if (table->name_col_width > name_col_width)
-		name_col_width = table->name_col_width;
-	if (table->value_col_width > value_col_width)
-		value_col_width = table->value_col_width;
+	if (title_name_width > name_col_width)
+		name_col_width = title_name_width;
+	if (title_value_width > value_col_width + unit_col_width + 1)
+		value_col_width = title_value_width - unit_col_width;
 
 	strbuf_addstr(&buf, "| ");
 	strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
 	strbuf_addstr(&buf, " | ");
-	strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+	strbuf_utf8_align(&buf, ALIGN_LEFT,
+			  value_col_width + unit_col_width + 1, value_col_title);
 	strbuf_addstr(&buf, " |");
 	printf("%s\n", buf.buf);
 
@@ -345,17 +381,20 @@ static void stats_table_print_structure(const struct stats_table *table)
 	for (int i = 0; i < name_col_width; i++)
 		putchar('-');
 	printf(" | ");
-	for (int i = 0; i < value_col_width; i++)
+	for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
 		putchar('-');
 	printf(" |\n");
 
 	for_each_string_list_item(item, &table->rows) {
 		struct stats_table_entry *entry = item->util;
 		const char *value = "";
+		const char *unit = "";
 
 		if (entry) {
 			struct stats_table_entry *entry = item->util;
 			value = entry->value;
+			if (entry->unit)
+				unit = entry->unit;
 		}
 
 		strbuf_reset(&buf);
@@ -363,6 +402,8 @@ static void stats_table_print_structure(const struct stats_table *table)
 		strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
 		strbuf_addstr(&buf, " | ");
 		strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+		strbuf_addch(&buf, ' ');
+		strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
 		strbuf_addstr(&buf, " |");
 		printf("%s\n", buf.buf);
 	}
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
 	(
 		cd repo &&
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     0 |
-		|     * Branches       |     0 |
-		|     * Tags           |     0 |
-		|     * Remotes        |     0 |
-		|     * Others         |     0 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |     0 |
-		|     * Commits        |     0 |
-		|     * Trees          |     0 |
-		|     * Blobs          |     0 |
-		|     * Tags           |     0 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |     0  |
+		|     * Branches       |     0  |
+		|     * Tags           |     0  |
+		|     * Remotes        |     0  |
+		|     * Others         |     0  |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            |     0  |
+		|     * Commits        |     0  |
+		|     * Trees          |     0  |
+		|     * Blobs          |     0  |
+		|     * Tags           |     0  |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
 	git init repo &&
 	(
 		cd repo &&
-		test_commit_bulk 42 &&
+		test_commit_bulk 1005 &&
 		git tag -a foo -m bar &&
 
 		oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     4 |
-		|     * Branches       |     1 |
-		|     * Tags           |     1 |
-		|     * Remotes        |     1 |
-		|     * Others         |     1 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |   130 |
-		|     * Commits        |    43 |
-		|     * Trees          |    43 |
-		|     * Blobs          |    43 |
-		|     * Tags           |     1 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |    4   |
+		|     * Branches       |    1   |
+		|     * Tags           |    1   |
+		|     * Remotes        |    1   |
+		|     * Others         |    1   |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            | 3.02 k |
+		|     * Commits        | 1.01 k |
+		|     * Trees          | 1.01 k |
+		|     * Blobs          | 1.01 k |
+		|     * Tags           |    1   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
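
For illustration only (not part of the patch): the rounding arithmetic
from stats_table_count_addf() pulled out into a standalone function, so
the scaling can be checked in isolation. The function name
`humanise_count` and the buffer-based interface are hypothetical; the
thresholds and the "add half of the last kept digit" rounding are taken
from the diff above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Writes the scaled value into buf and returns the unit prefix
 * ("" when the value is below 1000). Each branch adds half of the
 * last kept digit for rounding, then splits the result into a whole
 * part and a two-digit fractional part.
 */
static const char *humanise_count(uintmax_t value, char *buf, size_t len)
{
	if (value >= 1000000000) {
		uintmax_t x = value + 5000000;
		snprintf(buf, len, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000000000, x % 1000000000 / 10000000);
		return "G";
	} else if (value >= 1000000) {
		uintmax_t x = value + 5000;
		snprintf(buf, len, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000000, x % 1000000 / 10000);
		return "M";
	} else if (value >= 1000) {
		uintmax_t x = value + 5;
		snprintf(buf, len, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000, x % 1000 / 10);
		return "k";
	}
	snprintf(buf, len, "%" PRIuMAX, value);
	return "";
}
```

With this arithmetic, 1005 becomes "1.01" with prefix "k", which agrees
with the per-type rows in the updated test. One consequence of rounding
before choosing the whole part is that a value just under a threshold,
such as 999995, is rendered as "1000.00" with prefix "k" rather than
moving up to "M".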



* Re: [PATCH 2/6] builtin/repo: humanise count values in structure output
  2025-12-09 22:58 ` [PATCH 2/6] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-10 15:10     ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-10  6:28 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git

On Tue, Dec 09, 2025 at 04:58:16PM -0600, Justin Tobler wrote:
> diff --git a/builtin/repo.c b/builtin/repo.c
> index a69699857a..8fb728b3a5 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -266,6 +275,10 @@ static void stats_table_addf(struct stats_table *table, const char *format, ...)
>  	va_end(ap);
>  }
>  
> +static const char *unit_k = "k";
> +static const char *unit_M = "M";
> +static const char *unit_G = "G";
> +
>  static void stats_table_count_addf(struct stats_table *table, size_t value,
>  				   const char *format, ...)
>  {

I would assume that these units should be translatable.

> @@ -273,7 +286,26 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
>  	va_list ap;
>  
>  	CALLOC_ARRAY(entry, 1);
> -	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> +
> +	if (value >= 1000000000) {
> +		uintmax_t x = (uintmax_t)value + 5000000;
> +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
> +				       x / 1000000000,
> +				       x % 1000000000 / 10000000);
> +		entry->unit = unit_G;
> +	} else if (value >= 1000000) {
> +		uintmax_t x = (uintmax_t)value + 5000;
> +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
> +				       x / 1000000, x % 1000000 / 10000);
> +		entry->unit = unit_M;
> +	} else if (value >= 1000) {
> +		uintmax_t x = (uintmax_t)value + 5;
> +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
> +				       x / 1000, x % 1000 / 10);
> +		entry->unit = unit_k;
> +	} else {
> +		entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> +	}
>  
>  	va_start(ap, format);
>  	stats_table_vaddf(table, entry, format, ap);

These units are decimal-based (1000), whereas in "parse.c" we have
`get_unit_factor()` that is binary-based (1024). Arguably, it's
"parse.c" that is wrong because "k" is generally decimal-based whereas
"Ki" would be binary-based.

Not quite sure what to do with this. For counts it _could_ be okay if we
continue to use the wrong unit prefix. But as soon as we get to disk
sizes we certainly should use the correct units, which would probably be
KiB.
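
For illustration only: a small sketch of why the base matters. The
helper name `format_scaled` is hypothetical, and it truncates rather
than rounds to keep the arithmetic easy to follow; it is not the code
from "parse.c" or from this patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Formats value divided by base with two fractional digits
 * (truncated). Pass 1000 for a decimal "k" and 1024 for a
 * binary "Ki".
 */
static void format_scaled(uintmax_t value, uintmax_t base,
			  char *buf, size_t len)
{
	snprintf(buf, len, "%" PRIuMAX ".%02" PRIuMAX,
		 value / base, value % base * 100 / base);
}
```

The same 1536 bytes come out as "1.53" when scaled by 1000 but as
"1.50" when scaled by 1024, so labelling a 1024-based result with a
plain "k" would misstate the value by a few percent.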

> diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> index 36a71a144e..55fd13ad1b 100755
> --- a/t/t1901-repo-structure.sh
> +++ b/t/t1901-repo-structure.sh
> @@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
>  	(
>  		cd repo &&
>  		cat >expect <<-\EOF &&
> -		| Repository structure | Value |
> -		| -------------------- | ----- |
> -		| * References         |       |
> -		|   * Count            |     0 |
> -		|     * Branches       |     0 |
> -		|     * Tags           |     0 |
> -		|     * Remotes        |     0 |
> -		|     * Others         |     0 |
> -		|                      |       |
> -		| * Reachable objects  |       |
> -		|   * Count            |     0 |
> -		|     * Commits        |     0 |
> -		|     * Trees          |     0 |
> -		|     * Blobs          |     0 |
> -		|     * Tags           |     0 |
> +		| Repository structure | Value  |
> +		| -------------------- | ------ |
> +		| * References         |        |
> +		|   * Count            |     0  |
> +		|     * Branches       |     0  |
> +		|     * Tags           |     0  |
> +		|     * Remotes        |     0  |
> +		|     * Others         |     0  |
> +		|                      |        |
> +		| * Reachable objects  |        |
> +		|   * Count            |     0  |
> +		|     * Commits        |     0  |
> +		|     * Trees          |     0  |
> +		|     * Blobs          |     0  |
> +		|     * Tags           |     0  |
>  		EOF
>  
>  		git repo structure >out 2>err &&

It's a bit weird that this test here changes even though we don't even
use any units. But I don't mind it too much.

Patrick


* Re: [PATCH 2/6] builtin/repo: humanise count values in structure output
  2025-12-10  6:28   ` Patrick Steinhardt
@ 2025-12-10 15:10     ` Justin Tobler
  2025-12-11  2:57       ` Junio C Hamano
  0 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-10 15:10 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 25/12/10 07:28AM, Patrick Steinhardt wrote:
> On Tue, Dec 09, 2025 at 04:58:16PM -0600, Justin Tobler wrote:
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index a69699857a..8fb728b3a5 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -266,6 +275,10 @@ static void stats_table_addf(struct stats_table *table, const char *format, ...)
> >  	va_end(ap);
> >  }
> >  
> > +static const char *unit_k = "k";
> > +static const char *unit_M = "M";
> > +static const char *unit_G = "G";
> > +
> >  static void stats_table_count_addf(struct stats_table *table, size_t value,
> >  				   const char *format, ...)
> >  {
> 
> I would assume that these units should be translatable.

Ya, you are right. I'll make units translatable in the next version.

> > @@ -273,7 +286,26 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
> >  	va_list ap;
> >  
> >  	CALLOC_ARRAY(entry, 1);
> > -	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> > +
> > +	if (value >= 1000000000) {
> > +		uintmax_t x = (uintmax_t)value + 5000000;
> > +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
> > +				       x / 1000000000,
> > +				       x % 1000000000 / 10000000);
> > +		entry->unit = unit_G;
> > +	} else if (value >= 1000000) {
> > +		uintmax_t x = (uintmax_t)value + 5000;
> > +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
> > +				       x / 1000000, x % 1000000 / 10000);
> > +		entry->unit = unit_M;
> > +	} else if (value >= 1000) {
> > +		uintmax_t x = (uintmax_t)value + 5;
> > +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
> > +				       x / 1000, x % 1000 / 10);
> > +		entry->unit = unit_k;
> > +	} else {
> > +		entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> > +	}
> >  
> >  	va_start(ap, format);
> >  	stats_table_vaddf(table, entry, format, ap);
> 
> These units are decimal-based (1000), whereas in "parse.c" we have
> `get_unit_factor()` that is binary-based (1024). Arguably, it's
> "parse.c" that is wrong because "k" is generally decimal-based whereas
> "Ki" would be binary-based.
> 
> Not quite sure what to do with this. For counts it _could_ be okay if we
> continue to use the wrong unit prefix. But as soon as we get to disk
> sizes we certainly should use the correct units, which would probably be
> KiB.

For count values, such as number of references/objects, I'm using SI
unit prefixes which I think is more correct. In a subsequent patch where
we start collecting size information, I add a separate
`stats_table_size_addf()` function which uses the IEC unit prefixes.
This way we use the most appropriate option for both scenarios.

> > diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> > index 36a71a144e..55fd13ad1b 100755
> > --- a/t/t1901-repo-structure.sh
> > +++ b/t/t1901-repo-structure.sh
> > @@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
> >  	(
> >  		cd repo &&
> >  		cat >expect <<-\EOF &&
> > -		| Repository structure | Value |
> > -		| -------------------- | ----- |
> > -		| * References         |       |
> > -		|   * Count            |     0 |
> > -		|     * Branches       |     0 |
> > -		|     * Tags           |     0 |
> > -		|     * Remotes        |     0 |
> > -		|     * Others         |     0 |
> > -		|                      |       |
> > -		| * Reachable objects  |       |
> > -		|   * Count            |     0 |
> > -		|     * Commits        |     0 |
> > -		|     * Trees          |     0 |
> > -		|     * Blobs          |     0 |
> > -		|     * Tags           |     0 |
> > +		| Repository structure | Value  |
> > +		| -------------------- | ------ |
> > +		| * References         |        |
> > +		|   * Count            |     0  |
> > +		|     * Branches       |     0  |
> > +		|     * Tags           |     0  |
> > +		|     * Remotes        |     0  |
> > +		|     * Others         |     0  |
> > +		|                      |        |
> > +		| * Reachable objects  |        |
> > +		|   * Count            |     0  |
> > +		|     * Commits        |     0  |
> > +		|     * Trees          |     0  |
> > +		|     * Blobs          |     0  |
> > +		|     * Tags           |     0  |
> >  		EOF
> >  
> >  		git repo structure >out 2>err &&
> 
> It's a bit weird that this test here changes even though we don't even
> use any units. But I don't mind it too much.

Ya, the added space comes from the fixed space character between the
value and unit columns. I didn't think it mattered too much, but I may
try to only conditionally add it if needed in the next version.
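
For illustration only: one way the conditional separator could look,
assuming the unit column width is zero whenever no row carries a unit.
The helper `format_value_cell` is hypothetical and uses printf-style
byte widths for brevity, whereas the patch aligns by display width via
strbuf_utf8_align().

```c
#include <stdio.h>

/*
 * Builds the contents of the value cell. The separating space and the
 * unit column are emitted only when at least one row has a unit.
 */
static void format_value_cell(char *buf, size_t len, const char *value,
			      const char *unit, int value_w, int unit_w)
{
	if (unit_w > 0)
		snprintf(buf, len, "%*s %-*s", value_w, value, unit_w, unit);
	else
		snprintf(buf, len, "%*s", value_w, value);
}
```

With a unit width of zero the cell for "0" stays five characters wide,
as in the original 'empty repository' expectation, while a table that
does contain units still gets the separator and a padded unit column on
every row.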

-Justin


* Re: [PATCH 2/6] builtin/repo: humanise count values in structure output
  2025-12-10 15:10     ` Justin Tobler
@ 2025-12-11  2:57       ` Junio C Hamano
  2025-12-12 16:46         ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-11  2:57 UTC (permalink / raw)
  To: Justin Tobler; +Cc: Patrick Steinhardt, git

Justin Tobler <jltobler@gmail.com> writes:

> On 25/12/10 07:28AM, Patrick Steinhardt wrote:
>> On Tue, Dec 09, 2025 at 04:58:16PM -0600, Justin Tobler wrote:
>> > diff --git a/builtin/repo.c b/builtin/repo.c
>> > index a69699857a..8fb728b3a5 100644
>> > --- a/builtin/repo.c
>> > +++ b/builtin/repo.c
>> > @@ -266,6 +275,10 @@ static void stats_table_addf(struct stats_table *table, const char *format, ...)
>> >  	va_end(ap);
>> >  }
>> >  
>> > +static const char *unit_k = "k";
>> > +static const char *unit_M = "M";
>> > +static const char *unit_G = "G";
>> > +
>> >  static void stats_table_count_addf(struct stats_table *table, size_t value,
>> >  				   const char *format, ...)
>> >  {
>> 
>> I would assume that these units should be translatable.
>
> Ya, you are right. I'll make units translatable in the next version.

Whatever you do, please first consider reusing existing
"human-readable numbers" helpers, like strbuf_humanise_bytes() used
by the progress.c for showing throughput, before rolling your own
variant like the above.

Thanks.



* Re: [PATCH 2/6] builtin/repo: humanise count values in structure output
  2025-12-11  2:57       ` Junio C Hamano
@ 2025-12-12 16:46         ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 16:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git

On 25/12/11 11:57AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > On 25/12/10 07:28AM, Patrick Steinhardt wrote:
> >> On Tue, Dec 09, 2025 at 04:58:16PM -0600, Justin Tobler wrote:
> >> > diff --git a/builtin/repo.c b/builtin/repo.c
> >> > index a69699857a..8fb728b3a5 100644
> >> > --- a/builtin/repo.c
> >> > +++ b/builtin/repo.c
> >> > @@ -266,6 +275,10 @@ static void stats_table_addf(struct stats_table *table, const char *format, ...)
> >> >  	va_end(ap);
> >> >  }
> >> >  
> >> > +static const char *unit_k = "k";
> >> > +static const char *unit_M = "M";
> >> > +static const char *unit_G = "G";
> >> > +
> >> >  static void stats_table_count_addf(struct stats_table *table, size_t value,
> >> >  				   const char *format, ...)
> >> >  {
> >> 
> >> I would assume that these units should be translatable.
> >
> > Ya, you are right. I'll make units translatable in the next version.
> 
> Whatever you do, please first consider reusing existing
> "human-readable numbers" helpers, like strbuf_humanise_bytes() used
> by the progress.c for showing throughput, before rolling your own
> variant like the above.

Ya, I originally looked into using strbuf_humanise_bytes(), but went a
different direction due to how I wanted to align the values and unit
prefixes in the table output. In the next version though, I'm trying to
split out and reuse some of the same logic to avoid the duplication.

-Justin


* [PATCH 3/6] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-09 22:58 [PATCH 0/6] builtin/repo: add object size info to structure output Justin Tobler
  2025-12-09 22:58 ` [PATCH 1/6] builtin/repo: group per-type object values into struct Justin Tobler
  2025-12-09 22:58 ` [PATCH 2/6] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-09 22:58 ` Justin Tobler
  2025-12-09 22:58 ` [PATCH 4/6] builtin/repo: add inflated object info to structure table Justin Tobler
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-09 22:58 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler

The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.

For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 33 +++++++++++++++++++++++++++++++++
 t/t1901-repo-structure.sh   |  6 +++++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
 +
 * Reference counts categorized by type
 * Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 8fb728b3a5..a67215ae31 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
 
 #include "builtin.h"
 #include "environment.h"
+#include "hex.h"
+#include "odb.h"
 #include "parse-options.h"
 #include "path-walk.h"
 #include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
 
 struct object_stats {
 	struct object_values type_counts;
+	struct object_values inflated_sizes;
 };
 
 struct repo_structure {
@@ -446,6 +449,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
+	printf("objects.commits.inflated%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+	printf("objects.trees.inflated%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+	printf("objects.blobs.inflated%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+	printf("objects.tags.inflated%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -509,6 +521,7 @@ static void structure_count_references(struct ref_stats *stats,
 }
 
 struct count_objects_data {
+	struct object_database *odb;
 	struct object_stats *stats;
 	struct progress *progress;
 };
@@ -518,20 +531,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 {
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
+	size_t inflated_total = 0;
 	size_t object_count;
 
+	for (size_t i = 0; i < oids->nr; i++) {
+		struct object_info oi = OBJECT_INFO_INIT;
+		unsigned long inflated;
+
+		oi.sizep = &inflated;
+
+		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+						  OBJECT_INFO_FOR_PREFETCH) < 0)
+			die(_("cannot read object for %s"),
+			    oid_to_hex(&oids->oid[i]));
+
+		inflated_total += inflated;
+	}
+
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
+		stats->inflated_sizes.tags += inflated_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
+		stats->inflated_sizes.commits += inflated_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
+		stats->inflated_sizes.trees += inflated_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
+		stats->inflated_sizes.blobs += inflated_total;
 		break;
 	default:
 		BUG("invalid object type");
@@ -549,6 +581,7 @@ static void structure_count_objects(struct object_stats *stats,
 {
 	struct path_walk_info info = PATH_WALK_INFO_INIT;
 	struct count_objects_data data = {
+		.odb = repo->objects,
 		.stats = stats,
 	};
 
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..cf5e252f10 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
 	)
 '
 
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
 		objects.trees.count=42
 		objects.blobs.count=42
 		objects.tags.count=1
+		objects.commits.inflated=9225
+		objects.trees.inflated=28554
+		objects.blobs.inflated=453
+		objects.tags.inflated=132
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0



* [PATCH 4/6] builtin/repo: add inflated object info to structure table
  2025-12-09 22:58 [PATCH 0/6] builtin/repo: add object size info to structure output Justin Tobler
                   ` (2 preceding siblings ...)
  2025-12-09 22:58 ` [PATCH 3/6] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-09 22:58 ` Justin Tobler
  2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-09 22:58 ` [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-09 22:58 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler

Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 57 +++++++++++++++++++++++++++++++++--
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
 2 files changed, 90 insertions(+), 29 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index a67215ae31..5c37f4116f 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -315,6 +315,44 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_end(ap);
 }
 
+static const char *unit_B = "B";
+static const char *unit_KiB = "KiB";
+static const char *unit_MiB = "MiB";
+static const char *unit_GiB = "GiB";
+
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+				  const char *format, ...)
+{
+	struct stats_table_entry *entry;
+	va_list ap;
+
+	CALLOC_ARRAY(entry, 1);
+
+	if (value > 1 << 30) {
+		uintmax_t x = (uintmax_t)value + 5368709;
+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 30,
+				       ((x & ((1 << 30) - 1)) * 100) >> 30);
+		entry->unit = unit_GiB;
+	} else if (value > 1 << 20) {
+		uintmax_t x = (uintmax_t)value + 5243;
+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 20,
+				       ((x & ((1 << 20) - 1)) * 100) >> 20);
+		entry->unit = unit_MiB;
+	} else if (value > 1 << 10) {
+		uintmax_t x = (uintmax_t)value + 5;
+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 10,
+				       ((x & ((1 << 10) - 1)) * 100) >> 10);
+		entry->unit = unit_KiB;
+	} else {
+		entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+		entry->unit = unit_B;
+	}
+
+	va_start(ap, format);
+	stats_table_vaddf(table, entry, format, ap);
+	va_end(ap);
+}
+
 static inline size_t get_total_reference_count(struct ref_stats *stats)
 {
 	return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -330,7 +368,8 @@ static void stats_table_setup_structure(struct stats_table *table,
 {
 	struct object_stats *objects = &stats->objects;
 	struct ref_stats *refs = &stats->refs;
-	size_t object_total;
+	size_t inflated_object_total;
+	size_t object_count_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -341,10 +380,10 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_values(&objects->type_counts);
+	object_count_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
-	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
+	stats_table_count_addf(table, object_count_total, "  * %s", _("Count"));
 	stats_table_count_addf(table, objects->type_counts.commits,
 			       "    * %s", _("Commits"));
 	stats_table_count_addf(table, objects->type_counts.trees,
@@ -353,6 +392,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			       "    * %s", _("Blobs"));
 	stats_table_count_addf(table, objects->type_counts.tags,
 			       "    * %s", _("Tags"));
+
+	inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+	stats_table_size_addf(table, inflated_object_total,
+			      "  * %s", _("Inflated size"));
+	stats_table_size_addf(table, objects->inflated_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->inflated_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->inflated_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->inflated_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index cf5e252f10..0ae96e6bbf 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
 		| Repository structure | Value  |
 		| -------------------- | ------ |
 		| * References         |        |
-		|   * Count            |     0  |
-		|     * Branches       |     0  |
-		|     * Tags           |     0  |
-		|     * Remotes        |     0  |
-		|     * Others         |     0  |
+		|   * Count            |    0   |
+		|     * Branches       |    0   |
+		|     * Tags           |    0   |
+		|     * Remotes        |    0   |
+		|     * Others         |    0   |
 		|                      |        |
 		| * Reachable objects  |        |
-		|   * Count            |     0  |
-		|     * Commits        |     0  |
-		|     * Trees          |     0  |
-		|     * Blobs          |     0  |
-		|     * Tags           |     0  |
+		|   * Count            |    0   |
+		|     * Commits        |    0   |
+		|     * Trees          |    0   |
+		|     * Blobs          |    0   |
+		|     * Tags           |    0   |
+		|   * Inflated size    |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
 	)
 '
 
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value  |
-		| -------------------- | ------ |
-		| * References         |        |
-		|   * Count            |    4   |
-		|     * Branches       |    1   |
-		|     * Tags           |    1   |
-		|     * Remotes        |    1   |
-		|     * Others         |    1   |
-		|                      |        |
-		| * Reachable objects  |        |
-		|   * Count            | 3.02 k |
-		|     * Commits        | 1.01 k |
-		|     * Trees          | 1.01 k |
-		|     * Blobs          | 1.01 k |
-		|     * Tags           |    1   |
+		| Repository structure | Value      |
+		| -------------------- | ---------- |
+		| * References         |            |
+		|   * Count            |      4     |
+		|     * Branches       |      1     |
+		|     * Tags           |      1     |
+		|     * Remotes        |      1     |
+		|     * Others         |      1     |
+		|                      |            |
+		| * Reachable objects  |            |
+		|   * Count            |   3.02 k   |
+		|     * Commits        |   1.01 k   |
+		|     * Trees          |   1.01 k   |
+		|     * Blobs          |   1.01 k   |
+		|     * Tags           |      1     |
+		|   * Inflated size    |  16.03 MiB |
+		|     * Commits        | 217.92 KiB |
+		|     * Trees          |  15.81 MiB |
+		|     * Blobs          |  11.68 KiB |
+		|     * Tags           |    132 B   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0



* Re: [PATCH 4/6] builtin/repo: add inflated object info to structure table
  2025-12-09 22:58 ` [PATCH 4/6] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-10 15:21     ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-10  6:28 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git

On Tue, Dec 09, 2025 at 04:58:18PM -0600, Justin Tobler wrote:
> Update the table output format for the git-repo(1) structure command to
> begin printing the total inflated object size info by object type. To be
> more human-friendly, larger values are scaled down and displayed with
> the appropriate unit prefix. Output for the keyvalue and nul formats
> remains unchanged.
> 
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  builtin/repo.c            | 57 +++++++++++++++++++++++++++++++++--
>  t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
>  2 files changed, 90 insertions(+), 29 deletions(-)
> 
> diff --git a/builtin/repo.c b/builtin/repo.c
> index a67215ae31..5c37f4116f 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -315,6 +315,44 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
>  	va_end(ap);
>  }
>  
> +static const char *unit_B = "B";
> +static const char *unit_KiB = "KiB";
> +static const char *unit_MiB = "MiB";
> +static const char *unit_GiB = "GiB";

Okay, nice, you already use KiB et al as I suggested in an earlier
comment. But I guess these should also be marked as translatable.

> +static void stats_table_size_addf(struct stats_table *table, size_t value,
> +				  const char *format, ...)
> +{
> +	struct stats_table_entry *entry;
> +	va_list ap;
> +
> +	CALLOC_ARRAY(entry, 1);
> +
> +	if (value > 1 << 30) {
> +		uintmax_t x = (uintmax_t)value + 5368709;
> +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 30,
> +				       ((x & ((1 << 30) - 1)) * 100) >> 30);
> +		entry->unit = unit_GiB;
> +	} else if (value > 1 << 20) {
> +		uintmax_t x = (uintmax_t)value + 5243;
> +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 20,
> +				       ((x & ((1 << 20) - 1)) * 100) >> 20);
> +		entry->unit = unit_MiB;
> +	} else if (value > 1 << 10) {
> +		uintmax_t x = (uintmax_t)value + 5;
> +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 10,
> +				       ((x & ((1 << 10) - 1)) * 100) >> 10);
> +		entry->unit = unit_KiB;
> +	} else {
> +		entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> +		entry->unit = unit_B;
> +	}

Euh. What kind of black magic is this? This block at least warrants a
comment how you came up with these incantations.

Also, git-rev-list(1) already has logic to output human-formatted disk
sizes via `git rev-list --disk-usage=human`. Can we share the logic?

> diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> index cf5e252f10..0ae96e6bbf 100755
> --- a/t/t1901-repo-structure.sh
> +++ b/t/t1901-repo-structure.sh
> @@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
>  		git notes add -m foo &&
>  
>  		cat >expect <<-\EOF &&
> -		| Repository structure | Value  |
> -		| -------------------- | ------ |
> -		| * References         |        |
> -		|   * Count            |    4   |
> -		|     * Branches       |    1   |
> -		|     * Tags           |    1   |
> -		|     * Remotes        |    1   |
> -		|     * Others         |    1   |
> -		|                      |        |
> -		| * Reachable objects  |        |
> -		|   * Count            | 3.02 k |
> -		|     * Commits        | 1.01 k |
> -		|     * Trees          | 1.01 k |
> -		|     * Blobs          | 1.01 k |
> -		|     * Tags           |    1   |
> +		| Repository structure | Value      |
> +		| -------------------- | ---------- |
> +		| * References         |            |
> +		|   * Count            |      4     |
> +		|     * Branches       |      1     |
> +		|     * Tags           |      1     |
> +		|     * Remotes        |      1     |
> +		|     * Others         |      1     |
> +		|                      |            |
> +		| * Reachable objects  |            |
> +		|   * Count            |   3.02 k   |
> +		|     * Commits        |   1.01 k   |
> +		|     * Trees          |   1.01 k   |
> +		|     * Blobs          |   1.01 k   |
> +		|     * Tags           |      1     |
> +		|   * Inflated size    |  16.03 MiB |
> +		|     * Commits        | 217.92 KiB |
> +		|     * Trees          |  15.81 MiB |
> +		|     * Blobs          |  11.68 KiB |
> +		|     * Tags           |    132 B   |
>  		EOF

Nice, I like the end result.

Patrick


* Re: [PATCH 4/6] builtin/repo: add inflated object info to structure table
  2025-12-10  6:28   ` Patrick Steinhardt
@ 2025-12-10 15:21     ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-10 15:21 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 25/12/10 07:28AM, Patrick Steinhardt wrote:
> On Tue, Dec 09, 2025 at 04:58:18PM -0600, Justin Tobler wrote:
> > Update the table output format for the git-repo(1) structure command to
> > begin printing the total inflated object size info by object type. To be
> > more human-friendly, larger values are scaled down and displayed with
> > the appropriate unit prefix. Output for the keyvalue and nul formats
> > remains unchanged.
> > 
> > Signed-off-by: Justin Tobler <jltobler@gmail.com>
> > ---
> >  builtin/repo.c            | 57 +++++++++++++++++++++++++++++++++--
> >  t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
> >  2 files changed, 90 insertions(+), 29 deletions(-)
> > 
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index a67215ae31..5c37f4116f 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -315,6 +315,44 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
> >  	va_end(ap);
> >  }
> >  
> > +static const char *unit_B = "B";
> > +static const char *unit_KiB = "KiB";
> > +static const char *unit_MiB = "MiB";
> > +static const char *unit_GiB = "GiB";
> 
> Okay, nice, you already use KiB et al as I suggested in an earlier
> comment. But I guess these should also be marked as translatable.

Will do.

> > +static void stats_table_size_addf(struct stats_table *table, size_t value,
> > +				  const char *format, ...)
> > +{
> > +	struct stats_table_entry *entry;
> > +	va_list ap;
> > +
> > +	CALLOC_ARRAY(entry, 1);
> > +
> > +	if (value > 1 << 30) {
> > +		uintmax_t x = (uintmax_t)value + 5368709;
> > +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 30,
> > +				       ((x & ((1 << 30) - 1)) * 100) >> 30);
> > +		entry->unit = unit_GiB;
> > +	} else if (value > 1 << 20) {
> > +		uintmax_t x = (uintmax_t)value + 5243;
> > +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 20,
> > +				       ((x & ((1 << 20) - 1)) * 100) >> 20);
> > +		entry->unit = unit_MiB;
> > +	} else if (value > 1 << 10) {
> > +		uintmax_t x = (uintmax_t)value + 5;
> > +		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 10,
> > +				       ((x & ((1 << 10) - 1)) * 100) >> 10);
> > +		entry->unit = unit_KiB;
> > +	} else {
> > +		entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> > +		entry->unit = unit_B;
> > +	}
> 
> Euh. What kind of black magic is this? This block at least warrants a
> comment how you came up with these incantations.

Ya, I'll add some comments to explain what is going on here. :)
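
For readers following along, the arithmetic in the quoted block can be
sketched as a standalone helper. This is only an illustration:
format_scaled() and its parameters are hypothetical names, not part of
the patch. The idea is fixed-point rounding: add 0.005 of a unit
(5, 5243 and 5368709 bytes for KiB, MiB and GiB), then split the result
into an integer part and a two-digit fraction.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: format `bytes` as a
 * two-decimal value in units of 2^shift bytes (10 = KiB, 20 = MiB,
 * 30 = GiB).
 */
static void format_scaled(uintmax_t bytes, unsigned shift,
			  char *buf, size_t len)
{
	uintmax_t unit = (uintmax_t)1 << shift;
	/*
	 * 0.005 units, rounded to the nearest byte. This yields the
	 * constants used in the patch: 5, 5243 and 5368709.
	 */
	uintmax_t half_step = (unit + 100) / 200;
	uintmax_t x = bytes + half_step;

	/* Integer part, then the remainder scaled to hundredths. */
	snprintf(buf, len, "%" PRIuMAX ".%02" PRIuMAX,
		 x >> shift, ((x & (unit - 1)) * 100) >> shift);
}
```

With this sketch, format_scaled(1536, 10, ...) produces "1.50". Note
that the patch compares with `>`, so a value of exactly 1 << 20 takes
the KiB branch there and would be shown as 1024.00 KiB.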

> Also, git-rev-list(1) already has logic to output human-formatted disk
> sizes via `git rev-list --disk-usage=human`. Can we share the logic?

So I believe `git rev-list --disk-usage=human` relies on
strbuf_humanise_bytes() under the hood. The problem here is that it
combines the value and unit prefix together. For alignment purposes in
the table output, we need to store the value and unit prefix separately.
I couldn't immediately think of a great way to share logic here so I
opted to implement it separately to accommodate this specific use case.
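
To illustrate why keeping the two pieces apart helps with alignment,
here is a minimal sketch. The struct and function names are made up for
this example, and how builtin/repo.c actually pads its cells is an
assumption not shown in this message. The value is right-aligned and
the unit left-aligned, so the decimal values and the units each line up
in their own sub-column.

```c
#include <stdio.h>

/* Hypothetical cell type: value and unit are stored separately. */
struct size_cell {
	const char *value;
	const char *unit;
};

/*
 * Render one cell with the value right-aligned in `value_width`
 * columns and the unit left-aligned in `unit_width` columns.
 */
static void render_cell(const struct size_cell *cell, int value_width,
			int unit_width, char *buf, size_t len)
{
	snprintf(buf, len, "%*s %-*s", value_width, cell->value,
		 unit_width, cell->unit);
}
```

With widths 6 and 3, the cells { "16.03", "MiB" } and { "132", "B" }
render as " 16.03 MiB" and "   132 B  ", which matches the shape of the
expected table in the test. A single pre-joined string such as
"132 B" could only be padded as a whole, so the units would not line up.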

-Justin


* [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-09 22:58 [PATCH 0/6] builtin/repo: add object size info to structure output Justin Tobler
                   ` (3 preceding siblings ...)
  2025-12-09 22:58 ` [PATCH 4/6] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-09 22:58 ` Justin Tobler
  2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-10 14:58   ` Junio C Hamano
  2025-12-09 22:58 ` [PATCH 6/6] builtin/repo: add object disk size info to structure table Justin Tobler
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
  6 siblings, 2 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-09 22:58 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler

Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.

Since disk size may vary between platforms, tests do not validate actual
values and only check that size info is printed in an empty repository.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 18 +++++++++++++++
 t/t1901-repo-structure.sh   | 45 +++++++++++++++++++++++++++++--------
 3 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
 * Reference counts categorized by type
 * Reachable object counts categorized by type
 * Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 5c37f4116f..8ea7c9b24f 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
 struct object_stats {
 	struct object_values type_counts;
 	struct object_values inflated_sizes;
+	struct object_values disk_sizes;
 };
 
 struct repo_structure {
@@ -509,6 +510,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.inflated%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
 
+	printf("objects.commits.disk%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+	printf("objects.trees.disk%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+	printf("objects.blobs.disk%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+	printf("objects.tags.disk%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -583,13 +593,16 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
 	size_t inflated_total = 0;
+	size_t disk_total = 0;
 	size_t object_count;
 
 	for (size_t i = 0; i < oids->nr; i++) {
 		struct object_info oi = OBJECT_INFO_INIT;
 		unsigned long inflated;
+		off_t disk;
 
 		oi.sizep = &inflated;
+		oi.disk_sizep = &disk;
 
 		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
 						  OBJECT_INFO_FOR_PREFETCH) < 0)
@@ -597,24 +610,29 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 			    oid_to_hex(&oids->oid[i]));
 
 		inflated_total += inflated;
+		disk_total += disk;
 	}
 
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
 		stats->inflated_sizes.tags += inflated_total;
+		stats->disk_sizes.tags += disk_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
 		stats->inflated_sizes.commits += inflated_total;
+		stats->disk_sizes.commits += disk_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
 		stats->inflated_sizes.trees += inflated_total;
+		stats->disk_sizes.trees += disk_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
 		stats->inflated_sizes.blobs += inflated_total;
+		stats->disk_sizes.blobs += disk_total;
 		break;
 	default:
 		BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 0ae96e6bbf..a98c651f1d 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -35,6 +35,37 @@ test_expect_success 'empty repository' '
 		git repo structure >out 2>err &&
 
 		test_cmp expect out &&
+		test_line_count = 0 err &&
+
+		cat >expect <<-\EOF &&
+		references.branches.count=0
+		references.tags.count=0
+		references.remotes.count=0
+		references.others.count=0
+		objects.commits.count=0
+		objects.trees.count=0
+		objects.blobs.count=0
+		objects.tags.count=0
+		objects.commits.inflated=0
+		objects.trees.inflated=0
+		objects.blobs.inflated=0
+		objects.tags.inflated=0
+		objects.commits.disk=0
+		objects.trees.disk=0
+		objects.blobs.disk=0
+		objects.tags.disk=0
+		EOF
+
+		git repo structure --format=keyvalue >out 2>err &&
+
+		test_cmp expect out &&
+		test_line_count = 0 err &&
+
+		# Replace key and value delimiters for nul format.
+		tr "\n=" "\0\n" <expect >expect_nul &&
+		git repo structure --format=nul >out 2>err &&
+
+		test_cmp expect_nul out &&
 		test_line_count = 0 err
 	)
 '
@@ -83,7 +114,7 @@ test_expect_success SHA1 'repository with references and objects' '
 	)
 '
 
-test_expect_success SHA1 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue format' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -106,16 +137,12 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		objects.tags.inflated=132
 		EOF
 
-		git repo structure --format=keyvalue >out 2>err &&
+		git repo structure --format=keyvalue >out.raw 2>err &&
 
-		test_cmp expect out &&
-		test_line_count = 0 err &&
+		# Strip object disk usage from output due to platform variance.
+		grep -v "objects\..*\.disk=" out.raw >out &&
 
-		# Replace key and value delimiters for nul format.
-		tr "\n=" "\0\n" <expect >expect_nul &&
-		git repo structure --format=nul >out 2>err &&
-
-		test_cmp expect_nul out &&
+		test_cmp expect out &&
 		test_line_count = 0 err
 	)
 '
-- 
2.52.0.209.ge85ae279b0



* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-09 22:58 ` [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
@ 2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-10 15:24     ` Justin Tobler
  2025-12-12 20:40     ` Justin Tobler
  2025-12-10 14:58   ` Junio C Hamano
  1 sibling, 2 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-10  6:28 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git

On Tue, Dec 09, 2025 at 04:58:19PM -0600, Justin Tobler wrote:
> diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> index 0ae96e6bbf..a98c651f1d 100755
> --- a/t/t1901-repo-structure.sh
> +++ b/t/t1901-repo-structure.sh
> @@ -35,6 +35,37 @@ test_expect_success 'empty repository' '
>  		git repo structure >out 2>err &&
>  
>  		test_cmp expect out &&
> +		test_line_count = 0 err &&
> +
> +		cat >expect <<-\EOF &&
> +		references.branches.count=0
> +		references.tags.count=0
> +		references.remotes.count=0
> +		references.others.count=0
> +		objects.commits.count=0
> +		objects.trees.count=0
> +		objects.blobs.count=0
> +		objects.tags.count=0
> +		objects.commits.inflated=0
> +		objects.trees.inflated=0
> +		objects.blobs.inflated=0
> +		objects.tags.inflated=0
> +		objects.commits.disk=0
> +		objects.trees.disk=0
> +		objects.blobs.disk=0
> +		objects.tags.disk=0
> +		EOF

Do we maybe want to adapt the keys to be "inflated_size" and
"disk_size"?

> @@ -106,16 +137,12 @@ test_expect_success SHA1 'keyvalue and nul format' '
>  		objects.tags.inflated=132
>  		EOF
>  
> -		git repo structure --format=keyvalue >out 2>err &&
> +		git repo structure --format=keyvalue >out.raw 2>err &&
>  
> -		test_cmp expect out &&
> -		test_line_count = 0 err &&
> +		# Strip object disk usage from output due to platform variance.
> +		grep -v "objects\..*\.disk=" out.raw >out &&
>  
> -		# Replace key and value delimiters for nul format.
> -		tr "\n=" "\0\n" <expect >expect_nul &&
> -		git repo structure --format=nul >out 2>err &&
> -
> -		test_cmp expect_nul out &&
> +		test_cmp expect out &&
>  		test_line_count = 0 err
>  	)
>  '

We could test disk sizes here if we use git-rev-list(1) to compute
disk size by type:

    git rev-list --disk-usage HEAD --objects --filter=object:type=blob
    git rev-list --disk-usage HEAD --objects --filter=object:type=commit
    git rev-list --disk-usage HEAD --objects --filter=object:type=tag
    git rev-list --disk-usage HEAD --objects --filter=object:type=tree

The `--disk-usage` option also supports `--disk-usage=human`, which we
can use in the next commit to verify that our computations are the same
across git-rev-list(1) and git-repo(1).

Patrick


* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-10  6:28   ` Patrick Steinhardt
@ 2025-12-10 15:24     ` Justin Tobler
  2025-12-12 20:40     ` Justin Tobler
  1 sibling, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-10 15:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 25/12/10 07:28AM, Patrick Steinhardt wrote:
> On Tue, Dec 09, 2025 at 04:58:19PM -0600, Justin Tobler wrote:
> > diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> > index 0ae96e6bbf..a98c651f1d 100755
> > --- a/t/t1901-repo-structure.sh
> > +++ b/t/t1901-repo-structure.sh
> > @@ -35,6 +35,37 @@ test_expect_success 'empty repository' '
> >  		git repo structure >out 2>err &&
> >  
> >  		test_cmp expect out &&
> > +		test_line_count = 0 err &&
> > +
> > +		cat >expect <<-\EOF &&
> > +		references.branches.count=0
> > +		references.tags.count=0
> > +		references.remotes.count=0
> > +		references.others.count=0
> > +		objects.commits.count=0
> > +		objects.trees.count=0
> > +		objects.blobs.count=0
> > +		objects.tags.count=0
> > +		objects.commits.inflated=0
> > +		objects.trees.inflated=0
> > +		objects.blobs.inflated=0
> > +		objects.tags.inflated=0
> > +		objects.commits.disk=0
> > +		objects.trees.disk=0
> > +		objects.blobs.disk=0
> > +		objects.tags.disk=0
> > +		EOF
> 
> Do we maybe want to adapt the keys to be "inflated_size" and
> "disk_size"?

Good suggestion. I'll update in the next version.

> > @@ -106,16 +137,12 @@ test_expect_success SHA1 'keyvalue and nul format' '
> >  		objects.tags.inflated=132
> >  		EOF
> >  
> > -		git repo structure --format=keyvalue >out 2>err &&
> > +		git repo structure --format=keyvalue >out.raw 2>err &&
> >  
> > -		test_cmp expect out &&
> > -		test_line_count = 0 err &&
> > +		# Strip object disk usage from output due to platform variance.
> > +		grep -v "objects\..*\.disk=" out.raw >out &&
> >  
> > -		# Replace key and value delimiters for nul format.
> > -		tr "\n=" "\0\n" <expect >expect_nul &&
> > -		git repo structure --format=nul >out 2>err &&
> > -
> > -		test_cmp expect_nul out &&
> > +		test_cmp expect out &&
> >  		test_line_count = 0 err
> >  	)
> >  '
> 
> We could test disk sizes here if we use git-rev-list(1) to compute
> disk size by type:
> 
>     git rev-list --disk-usage HEAD --objects --filter=object:type=blob
>     git rev-list --disk-usage HEAD --objects --filter=object:type=commit
>     git rev-list --disk-usage HEAD --objects --filter=object:type=tag
>     git rev-list --disk-usage HEAD --objects --filter=object:type=tree
> 
> The `--disk-usage` option also supports `--disk-usage=human`, which we
> can use in the next commit to verify that our computations are the same
> across git-rev-list(1) and git-repo(1).

Thanks! I hadn't considered this. I'll try to update the tests in the
next version using this.

-Justin


* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-10 15:24     ` Justin Tobler
@ 2025-12-12 20:40     ` Justin Tobler
  2025-12-15  5:33       ` Patrick Steinhardt
  1 sibling, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 20:40 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 25/12/10 07:28AM, Patrick Steinhardt wrote:
> On Tue, Dec 09, 2025 at 04:58:19PM -0600, Justin Tobler wrote:
> > @@ -106,16 +137,12 @@ test_expect_success SHA1 'keyvalue and nul format' '
> >  		objects.tags.inflated=132
> >  		EOF
> >  
> > -		git repo structure --format=keyvalue >out 2>err &&
> > +		git repo structure --format=keyvalue >out.raw 2>err &&
> >  
> > -		test_cmp expect out &&
> > -		test_line_count = 0 err &&
> > +		# Strip object disk usage from output due to platform variance.
> > +		grep -v "objects\..*\.disk=" out.raw >out &&
> >  
> > -		# Replace key and value delimiters for nul format.
> > -		tr "\n=" "\0\n" <expect >expect_nul &&
> > -		git repo structure --format=nul >out 2>err &&
> > -
> > -		test_cmp expect_nul out &&
> > +		test_cmp expect out &&
> >  		test_line_count = 0 err
> >  	)
> >  '
> 
> We could test disk sizes here if we use git-rev-list(1) to compute
> disk size by type:
> 
>     git rev-list --disk-usage HEAD --objects --filter=object:type=blob
>     git rev-list --disk-usage HEAD --objects --filter=object:type=commit
>     git rev-list --disk-usage HEAD --objects --filter=object:type=tag
>     git rev-list --disk-usage HEAD --objects --filter=object:type=tree
> 
> The `--disk-usage` option also supports `--disk-usage=human`, which we
> can use in the next commit to verify that our computations are the same
> across git-rev-list(1) and git-repo(1).

So, I'm not sure we can use git-rev-list(1) in the manner suggested
above. It looks like user-specified objects are always included in the
output. When using "HEAD" this means the referenced object will always
be included regardless of the filter used. In practice, this means
reported disk-usage when filtering by trees or blobs will likely be
inflated by objects not specified by the filter. As far as I am aware,
there is not a way to suppress user-specified objects in git-rev-list(1)
output.

I am somewhat curious if always including user-specified objects in
git-rev-list(1) output regardless of the specified filter is
intentional. Looking at git-rev-list(1) --filter documentation:

  The form --filter=object:type=(tag|commit|tree|blob) omits all objects
  which are not of the requested type.

doesn't indicate this limitation. From looking at the code in
list-objects-filter.c:list_objects_filter__filter_object() though, it
does somewhat seem like this behavior is intentional.

Regardless, in the tests I can hack around this problem by using
something like:

  $ git cat-file --batch-check='%(objectsize:disk)' --batch-all-objects \
    --filter=object:type=tree | awk '{ sum += $1 } END { print sum }'

to add up the sizes by object type. This doesn't really leave me a great
way to verify the human-readable values in the table output though. I
may just continue to omit those values from the test like I already do
in the next patch.
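
The summing half of that pipeline can be sketched on its own. This is
only a minimal sketch, assuming one size per line on stdin; sum_sizes is
a hypothetical helper name and the sample values are made up for
illustration, not taken from a real repository.

```shell
# Hypothetical helper: add up one integer per line read from stdin.
sum_sizes () {
	# "sum + 0" makes awk print 0 rather than an empty line when
	# there is no input at all.
	awk '{ sum += $1 } END { print sum + 0 }'
}

# Illustrative input only; in a test this would be the cat-file output.
printf '120\n45\n9\n' | sum_sizes
```

In the test, the output of the cat-file invocation above would be piped
into the helper once per object type.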

-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-12 20:40     ` Justin Tobler
@ 2025-12-15  5:33       ` Patrick Steinhardt
  2025-12-15 16:24         ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15  5:33 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git

On Fri, Dec 12, 2025 at 02:40:24PM -0600, Justin Tobler wrote:
> On 25/12/10 07:28AM, Patrick Steinhardt wrote:
> > On Tue, Dec 09, 2025 at 04:58:19PM -0600, Justin Tobler wrote:
> > > @@ -106,16 +137,12 @@ test_expect_success SHA1 'keyvalue and nul format' '
> > >  		objects.tags.inflated=132
> > >  		EOF
> > >  
> > > -		git repo structure --format=keyvalue >out 2>err &&
> > > +		git repo structure --format=keyvalue >out.raw 2>err &&
> > >  
> > > -		test_cmp expect out &&
> > > -		test_line_count = 0 err &&
> > > +		# Strip object disk usage from output due to platform variance.
> > > +		grep -v "objects\..*\.disk=" out.raw >out &&
> > >  
> > > -		# Replace key and value delimiters for nul format.
> > > -		tr "\n=" "\0\n" <expect >expect_nul &&
> > > -		git repo structure --format=nul >out 2>err &&
> > > -
> > > -		test_cmp expect_nul out &&
> > > +		test_cmp expect out &&
> > >  		test_line_count = 0 err
> > >  	)
> > >  '
> > 
> > We could test disk sizes here if we use git-rev-list(1) to compute
> > disk size by type:
> > 
> >     git rev-list --disk-usage HEAD --objects --filter=object:type=blob
> >     git rev-list --disk-usage HEAD --objects --filter=object:type=commit
> >     git rev-list --disk-usage HEAD --objects --filter=object:type=tag
> >     git rev-list --disk-usage HEAD --objects --filter=object:type=tree
> > 
> > The `--disk-usage` option also supports `--disk-usage=human`, which we
> > can use in the next commit to verify that our computations are the same
> > across git-rev-list(1) and git-repo(1).
> 
> So, I'm not sure we can use git-rev-list(1) in the manner suggested
> above. It looks like user-specified objects are always included in the
> output. When using "HEAD" this means the referenced object will always
> be included regardless of the filter used. In practice, this means
> reported disk-usage when filtering by trees or blobs will likely be
> inflated by objects not specified by the filter. As far as I am aware,
> there is not a way to suppress user-specified objects in git-rev-list(1)
> output.

There is, you can use "--filter-provided-objects".

> I am somewhat curious if always including user-specified objects in
> git-rev-list(1) output regardless of the specified filter is
> intentional. Looking at git-rev-list(1) --filter documentation:
> 
>   The form --filter=object:type=(tag|commit|tree|blob) omits all objects
>   which are not of the requested type.
> 
> doesn't indicate this limitation. From looking at the code in
> list-objects-filter.c:list_objects_filter__filter_object() though, it
> does somewhat seem like this behavior is intentional.

It is intentional, but I've been bitten by it in the past. Hence I
introduced the above option in 9cf68b27d5 (rev-list: allow filtering of
provided items, 2021-04-19).

Patrick

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-15  5:33       ` Patrick Steinhardt
@ 2025-12-15 16:24         ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 16:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 25/12/15 06:33AM, Patrick Steinhardt wrote:
> On Fri, Dec 12, 2025 at 02:40:24PM -0600, Justin Tobler wrote:
> > So, I'm not sure we can use git-rev-list(1) in the manner suggested
> > above. It looks like user-specified objects are always included in the
> > output. When using "HEAD" this means the referenced object will always
> > be included regardless of the filter used. In practice, this means
> > reported disk-usage when filtering by trees or blobs will likely be
> > inflated by objects not specified by the filter. As far as I am aware,
> > there is not a way to suppress user-specified objects in git-rev-list(1)
> > output.
> 
> There is, you can use "--filter-provided-objects".

Perfect! I don't know how I missed that option. XD

> > I am somewhat curious if always including user-specified objects in
> > git-rev-list(1) output regardless of the specified filter is
> > intentional. Looking at git-rev-list(1) --filter documentation:
> > 
> >   The form --filter=object:type=(tag|commit|tree|blob) omits all objects
> >   which are not of the requested type.
> > 
> > doesn't indicate this limitation. From looking at the code in
> > list-objects-filter.c:list_objects_filter__filter_object() though, it
> > does somewhat seem like this behavior is intentional.
> 
> It is intentional, but I've been bitten by it in the past. Hence I
> introduced the above option in 9cf68b27d5 (rev-list: allow filtering of
> provided items, 2021-04-19).

Good to know. I think I'll submit a small patch today to try to clarify
the documentation here a little bit. It might be nice to point out this
behavior a bit more explicitly in the --filter section. :)

-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-09 22:58 ` [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
  2025-12-10  6:28   ` Patrick Steinhardt
@ 2025-12-10 14:58   ` Junio C Hamano
  2025-12-10 19:09     ` Lucas Seiki Oshiro
  2025-12-12 22:36     ` Justin Tobler
  1 sibling, 2 replies; 80+ messages in thread
From: Junio C Hamano @ 2025-12-10 14:58 UTC (permalink / raw)
  To: Justin Tobler, Lucas Seiki Oshiro; +Cc: git, ps

Justin Tobler <jltobler@gmail.com> writes:

> -test_expect_success SHA1 'keyvalue and nul format' '
> +test_expect_success SHA1 'keyvalue format' '
>  	test_when_finished "rm -rf repo" &&
>  	git init repo &&
>  	(
> @@ -106,16 +137,12 @@ test_expect_success SHA1 'keyvalue and nul format' '
>  		objects.tags.inflated=132
>  		EOF
>  
> -		git repo structure --format=keyvalue >out 2>err &&
> +		git repo structure --format=keyvalue >out.raw 2>err &&
>  
> -		test_cmp expect out &&
> -		test_line_count = 0 err &&
> +		# Strip object disk usage from output due to platform variance.
> +		grep -v "objects\..*\.disk=" out.raw >out &&
>  
> -		# Replace key and value delimiters for nul format.
> -		tr "\n=" "\0\n" <expect >expect_nul &&
> -		git repo structure --format=nul >out 2>err &&
> -
> -		test_cmp expect_nul out &&
> +		test_cmp expect out &&
>  		test_line_count = 0 err
>  	)
>  '

This part has both textual and semantic conflicts with Lucas's "-z
is a synonym for --format=nul" topic.  I think I resolved it
correctly while improving the "munge expected output into expected
NUL-terminated output" approach to "munge -z output into textual
output and compare with textual expected output".  Please sanity
check the result after I push it out, merged at 32f8d84b (Merge
branch 'jt/repo-struct-more-objinfo' into seen, 2025-12-10)

Thanks.
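
To illustrate the "munge -z output into textual output" direction in
isolation, here is a small sketch. nul_to_keyvalue is a hypothetical
helper name and the sample records are made up; it assumes the nul
format separates key from value with LF and terminates each record with
NUL, as the tr invocations in the test imply.

```shell
# Hypothetical helper: turn "key LF value NUL" records read from stdin
# into "key=value LF" lines, then drop the platform-dependent disk keys.
nul_to_keyvalue () {
	tr '\0\n' '\n=' | grep -v 'objects\..*\.disk='
}

# Illustrative input only; a real test would read "git repo structure -z".
printf 'objects.tags.count\n1\000objects.tags.disk\n99\000' | nul_to_keyvalue
```

With this direction, only one "expect" file in keyvalue form is needed
for both output formats.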

commit 32f8d84b5cfc5a5704e30fe4abc9d8755893179c
Merge: 09bd4419e7 b8cacabfa5
Author: Junio C Hamano <gitster@pobox.com>
Date:   Wed Dec 10 20:41:31 2025 +0900

    Merge branch 'jt/repo-struct-more-objinfo' into seen
    
    * jt/repo-struct-more-objinfo:
      builtin/repo: add object disk size info to structure table
      builtin/repo: add disk size info to keyvalue stucture output
      builtin/repo: add inflated object info to structure table
      builtin/repo: add inflated object info to keyvalue structure output
      builtin/repo: humanise count values in structure output
      builtin/repo: group per-type object values into struct

diff --cc t/t1901-repo-structure.sh
index df7d4ea524,51820cc3f6..31c77c4666
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@@ -90,25 -148,18 +148,29 @@@ test_expect_success SHA1 'keyvalue form
  		objects.trees.count=42
  		objects.blobs.count=42
  		objects.tags.count=1
+ 		objects.commits.inflated=9225
+ 		objects.trees.inflated=28554
+ 		objects.blobs.inflated=453
+ 		objects.tags.inflated=132
  		EOF
  
- 		git repo structure --format=keyvalue >out 2>err &&
+ 		git repo structure --format=keyvalue >out.raw 2>err &&
  
- 		test_cmp expect out &&
- 		test_line_count = 0 err &&
+ 		# Strip object disk usage from output due to platform variance.
+ 		grep -v "objects\..*\.disk=" out.raw >out &&
  
- 		# Replace key and value delimiters for nul format.
- 		tr "\n=" "\0\n" <expect >expect_nul &&
- 		git repo structure --format=nul >out 2>err &&
- 
- 		test_cmp expect_nul out &&
++		test_cmp expect out &&
 +		test_line_count = 0 err &&
 +
 +		# "-z", as a synonym to "--format=nul", participates in the
 +		# usual "last one wins" rule.
- 		git repo structure --format=table -z >out 2>err &&
++		git repo structure --format=table -z >out.raw 2>err &&
 +
- 		test_cmp expect_nul out &&
++		# Replace key and value delimiters for nul format.
++		tr "\0\n" "\n=" <out.raw |
++		grep -v "objects\..*\.disk=" >out &&
++
+ 		test_cmp expect out &&
  		test_line_count = 0 err
  	)
  '

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-10 14:58   ` Junio C Hamano
@ 2025-12-10 19:09     ` Lucas Seiki Oshiro
  2025-12-12 22:36     ` Justin Tobler
  1 sibling, 0 replies; 80+ messages in thread
From: Lucas Seiki Oshiro @ 2025-12-10 19:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Justin Tobler, git, ps


> This part has both textual and semantic conflicts with Lucas's "-z
> is a synonym for --format=nul" topic.

My only change in this test was:

diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..df7d4ea524 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -101,6 +101,13 @@ test_expect_success 'keyvalue and nul format' '
 		tr "\n=" "\0\n" <expect >expect_nul &&
 		git repo structure --format=nul >out 2>err &&

+		test_cmp expect_nul out &&
+		test_line_count = 0 err &&
+
+		# "-z", as a synonym to "--format=nul", participates in the
+		# usual "last one wins" rule.
+		git repo structure --format=table -z >out 2>err &&
+
 		test_cmp expect_nul out &&
 		test_line_count = 0 err
 	)

Given that Justin moved the --format=nul test to
`test_expect_success 'empty repository'`, it should be ok to move
my change together with it. I did it here and everything seems to
be working.

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-10 14:58   ` Junio C Hamano
  2025-12-10 19:09     ` Lucas Seiki Oshiro
@ 2025-12-12 22:36     ` Justin Tobler
  2025-12-12 23:58       ` Junio C Hamano
  1 sibling, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lucas Seiki Oshiro, git, ps

On 25/12/10 11:58PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > -test_expect_success SHA1 'keyvalue and nul format' '
> > +test_expect_success SHA1 'keyvalue format' '
> >  	test_when_finished "rm -rf repo" &&
> >  	git init repo &&
> >  	(
> > @@ -106,16 +137,12 @@ test_expect_success SHA1 'keyvalue and nul format' '
> >  		objects.tags.inflated=132
> >  		EOF
> >  
> > -		git repo structure --format=keyvalue >out 2>err &&
> > +		git repo structure --format=keyvalue >out.raw 2>err &&
> >  
> > -		test_cmp expect out &&
> > -		test_line_count = 0 err &&
> > +		# Strip object disk usage from output due to platform variance.
> > +		grep -v "objects\..*\.disk=" out.raw >out &&
> >  
> > -		# Replace key and value delimiters for nul format.
> > -		tr "\n=" "\0\n" <expect >expect_nul &&
> > -		git repo structure --format=nul >out 2>err &&
> > -
> > -		test_cmp expect_nul out &&
> > +		test_cmp expect out &&
> >  		test_line_count = 0 err
> >  	)
> >  '
> 
> This part has both textual and semantic conflicts with Lucas's "-z
> is a synonym for --format=nul" topic.  I think I resolved it
> correctly while improving the "munge expected output into expected
> NUL-terminated output" approach to "munge -z output into textual
> output and compare with textual expected output".  Please sanity
> check the result after I push it out, merged at 32f8d84b (Merge
> branch 'jt/repo-struct-more-objinfo' into seen, 2025-12-10)

Thanks, this looks correct.

Just FYI, some of the test changes I made here are reverted in the next
version since Patrick suggested a better way to test disk usage output.
This should allow Lucas's changes to apply a bit more cleanly to this
file.

Thanks,
-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-12 22:36     ` Justin Tobler
@ 2025-12-12 23:58       ` Junio C Hamano
  0 siblings, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2025-12-12 23:58 UTC (permalink / raw)
  To: Justin Tobler; +Cc: Lucas Seiki Oshiro, git, ps

Justin Tobler <jltobler@gmail.com> writes:

> Just FYI, some of the test changes I made here are reverted in the next
> version since Patrick suggested a better way to test disk usage output.
> This should allow Lucas's changes to apply a bit more cleanly to this
> file.

Good.  I expect that Lucas's series would also be updated,
especially in the way the nul-delimited output is tested, so we'll
see what happens ;-).

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 6/6] builtin/repo: add object disk size info to structure table
  2025-12-09 22:58 [PATCH 0/6] builtin/repo: add object size info to structure output Justin Tobler
                   ` (4 preceding siblings ...)
  2025-12-09 22:58 ` [PATCH 5/6] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
@ 2025-12-09 22:58 ` Justin Tobler
  2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
  6 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-09 22:58 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler

Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.

Since disk size may vary between platforms, tests do not validate actual
values and only check that size info is printed in an empty repository.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 13 +++++++++++++
 t/t1901-repo-structure.sh | 19 ++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 8ea7c9b24f..8ddefd523e 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -371,6 +371,7 @@ static void stats_table_setup_structure(struct stats_table *table,
 	struct ref_stats *refs = &stats->refs;
 	size_t inflated_object_total;
 	size_t object_count_total;
+	size_t disk_object_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -405,6 +406,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			      "    * %s", _("Blobs"));
 	stats_table_size_addf(table, objects->inflated_sizes.tags,
 			      "    * %s", _("Tags"));
+
+	disk_object_total = get_total_object_values(&objects->disk_sizes);
+	stats_table_size_addf(table, disk_object_total,
+			      "  * %s", _("Disk size"));
+	stats_table_size_addf(table, objects->disk_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->disk_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->disk_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->disk_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index a98c651f1d..51820cc3f6 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,15 @@ test_description='test git repo structure'
 
 . ./test-lib.sh
 
+strip_object_disk_usage() {
+	awk '
+		/^\|   \* Disk size/ { skip=1; next }
+		skip && /^\|     \* / { next }
+		skip && !/^\|     \* / { skip=0 }
+		{ print }
+	' $1
+}
+
 test_expect_success 'empty repository' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -30,6 +39,11 @@ test_expect_success 'empty repository' '
 		|     * Trees          |    0 B |
 		|     * Blobs          |    0 B |
 		|     * Tags           |    0 B |
+		|   * Disk size        |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -107,7 +121,10 @@ test_expect_success SHA1 'repository with references and objects' '
 		|     * Tags           |    132 B   |
 		EOF
 
-		git repo structure >out 2>err &&
+		git repo structure >out.raw 2>err &&
+
+		# Skip object disk sizes due to platform variance.
+		strip_object_disk_usage out.raw >out &&
 
 		test_cmp expect out &&
 		test_line_count = 0 err
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 6/6] builtin/repo: add object disk size info to structure table
  2025-12-09 22:58 ` [PATCH 6/6] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-10  6:28   ` Patrick Steinhardt
  2025-12-10 15:24     ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-10  6:28 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git

On Tue, Dec 09, 2025 at 04:58:20PM -0600, Justin Tobler wrote:
> diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> index a98c651f1d..51820cc3f6 100755
> --- a/t/t1901-repo-structure.sh
> +++ b/t/t1901-repo-structure.sh
> @@ -107,7 +121,10 @@ test_expect_success SHA1 'repository with references and objects' '
>  		|     * Tags           |    132 B   |
>  		EOF
>  
> -		git repo structure >out 2>err &&
> +		git repo structure >out.raw 2>err &&
> +
> +		# Skip object disk sizes due to platform variance.
> +		strip_object_disk_usage out.raw >out &&

As mentioned, we can use git-rev-list(1) to compute the expected disk
sizes.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 6/6] builtin/repo: add object disk size info to structure table
  2025-12-10  6:28   ` Patrick Steinhardt
@ 2025-12-10 15:24     ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-10 15:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 25/12/10 07:28AM, Patrick Steinhardt wrote:
> On Tue, Dec 09, 2025 at 04:58:20PM -0600, Justin Tobler wrote:
> > diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> > index a98c651f1d..51820cc3f6 100755
> > --- a/t/t1901-repo-structure.sh
> > +++ b/t/t1901-repo-structure.sh
> > @@ -107,7 +121,10 @@ test_expect_success SHA1 'repository with references and objects' '
> >  		|     * Tags           |    132 B   |
> >  		EOF
> >  
> > -		git repo structure >out 2>err &&
> > +		git repo structure >out.raw 2>err &&
> > +
> > +		# Skip object disk sizes due to platform variance.
> > +		strip_object_disk_usage out.raw >out &&
> 
> As mentioned, we can use git-rev-list(1) to compute the expected disk
> sizes.

Thanks, I'll give this a go. :)

-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v2 0/7] builtin/repo: add object size info to structure output
  2025-12-09 22:58 [PATCH 0/6] builtin/repo: add object size info to structure output Justin Tobler
                   ` (5 preceding siblings ...)
  2025-12-09 22:58 ` [PATCH 6/6] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-12 22:36 ` Justin Tobler
  2025-12-12 22:36   ` [PATCH v2 1/7] builtin/repo: group per-type object values into struct Justin Tobler
                     ` (7 more replies)
  6 siblings, 8 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

Greetings,

This patch series extends the recently introduced "structure" subcommand
for git-repo(1) to collect object size information. More specifically,
it shows total inflated and disk sizes of objects by object type. The
aim is to provide additional insight that may be useful to users regarding
the structure of a repository.

In addition to this change, this series also updates the table output
format to downscale larger output values along with the appropriate unit
prefix. This is done to make table output more human friendly. The
keyvalue and nul output formats are left the same since they are
intended more for machine parsing.

Changes in V2:
- Factor out and reuse existing logic from strbuf_humanise() to handle
  downscaling values and determining the appropriate unit prefix
  separately. This enables more control over how exactly the values are
  written to the structure output table which is useful for alignment
  reasons. I'm not sure about the interface used in patch 2. Feedback is
  most welcome.
- In the previous version, when checking object size on a missing object
  we would die. Instead we now ignore missing objects. This allows the
  structure command to work on partial clones.
- disk/inflated keyvalue names renamed to disk_size/inflated_size.
- Unit prefixes are marked for translation.
- The tests for keyvalue disk size values are updated to check against
  real expected values instead of skipping. Table output tests still
  skip verifying human-readable values though.
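
For readers who want to see the value/unit split from the first item
without reading the C code, here is a rough shell sketch. It mirrors the
thresholds and rounding used in strbuf.c, but humanise_bytes_value here
is a hypothetical shell function, not the C interface from patch 2, it
leaves out the rate variants, and it assumes the shell does 64-bit
arithmetic.

```shell
# Hypothetical shell analogue: print the downscaled value and the unit as
# two separate space-delimited fields so a caller can align them.
humanise_bytes_value () {
	bytes=$1
	if [ "$bytes" -gt $((1 << 30)) ]; then
		printf '%u.%02u GiB\n' $((bytes >> 30)) \
			$(( (bytes & ((1 << 30) - 1)) / 10737419 ))
	elif [ "$bytes" -gt $((1 << 20)) ]; then
		x=$((bytes + 5243)) # for rounding
		printf '%u.%02u MiB\n' $((x >> 20)) \
			$(( ((x & ((1 << 20) - 1)) * 100) >> 20 ))
	elif [ "$bytes" -gt $((1 << 10)) ]; then
		x=$((bytes + 5)) # for rounding
		printf '%u.%02u KiB\n' $((x >> 10)) \
			$(( ((x & ((1 << 10) - 1)) * 100) >> 10 ))
	elif [ "$bytes" -eq 1 ]; then
		printf '%u byte\n' "$bytes"
	else
		printf '%u bytes\n' "$bytes"
	fi
}
```

A caller can then read the two fields separately, for example with
"humanise_bytes_value 1536 | { read value unit; ...; }", and pad each
column on its own.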

Thanks,
-Justin

Justin Tobler (7):
  builtin/repo: group per-type object values into struct
  strbuf: split out logic to humanise byte values
  builtin/repo: humanise count values in structure output
  builtin/repo: add inflated object info to keyvalue structure output
  builtin/repo: add inflated object info to structure table
  builtin/repo: add disk size info to keyvalue stucture output
  builtin/repo: add object disk size info to structure table

 Documentation/git-repo.adoc |   2 +
 builtin/repo.c              | 185 ++++++++++++++++++++++++++++++------
 strbuf.c                    |  89 ++++++++++-------
 strbuf.h                    |  17 ++++
 t/t1901-repo-structure.sh   | 110 ++++++++++++++-------
 5 files changed, 304 insertions(+), 99 deletions(-)

Range-diff against v1:
1:  bd3f1e6ec6 = 1:  be14de68f6 builtin/repo: group per-type object values into struct
6:  bce4c7b5f1 ! 2:  5ca6f9b708 builtin/repo: add object disk size info to structure table
    @@ Metadata
     Author: Justin Tobler <jltobler@gmail.com>
     
      ## Commit message ##
    -    builtin/repo: add object disk size info to structure table
    +    strbuf: split out logic to humanise byte values
     
    -    Similar to a prior commit, update the table output format for the
    -    git-repo(1) structure command to display the total object disk usage by
    -    object type.
    -
    -    Since disk size may vary between platforms, tests do not validate actual
    -    values and only check that size info is printed in an empty repository.
    +    In a subsequent commit, byte size values displayed in table output for
    +    the git-repo(1) "structure" subcommand will be shown in a more
    +    human-readable format with the appropriate unit prefixes. For this
    +    usecase, the downscaled values and unit prefixes must be handled
    +    separately to ensure proper column alignment. Refactor strbuf_humanise()
    +    to instead append the downscaled byte value to the buffer only and
    +    return the appropriate unit prefix string.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
    - ## builtin/repo.c ##
    -@@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *table,
    - 	struct ref_stats *refs = &stats->refs;
    - 	size_t inflated_object_total;
    - 	size_t object_count_total;
    -+	size_t disk_object_total;
    - 	size_t ref_total;
    + ## strbuf.c ##
    +@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
    + 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
    + }
      
    - 	ref_total = get_total_reference_count(refs);
    -@@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *table,
    - 			      "    * %s", _("Blobs"));
    - 	stats_table_size_addf(table, objects->inflated_sizes.tags,
    - 			      "    * %s", _("Tags"));
    +-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
    +-				 int humanise_rate)
    ++char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
    + {
    ++	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
     +
    -+	disk_object_total = get_total_object_values(&objects->disk_sizes);
    -+	stats_table_size_addf(table, disk_object_total,
    -+			      "  * %s", _("Disk size"));
    -+	stats_table_size_addf(table, objects->disk_sizes.commits,
    -+			      "    * %s", _("Commits"));
    -+	stats_table_size_addf(table, objects->disk_sizes.trees,
    -+			      "    * %s", _("Trees"));
    -+	stats_table_size_addf(table, objects->disk_sizes.blobs,
    -+			      "    * %s", _("Blobs"));
    -+	stats_table_size_addf(table, objects->disk_sizes.tags,
    -+			      "    * %s", _("Tags"));
    + 	if (bytes > 1 << 30) {
    +-		strbuf_addf(buf,
    +-				humanise_rate == 0 ?
    +-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte */
    +-					_("%u.%2.2u GiB") :
    +-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
    +-					_("%u.%2.2u GiB/s"),
    +-			    (unsigned)(bytes >> 30),
    ++		strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
    + 			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
    ++		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
    ++		return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
    + 	} else if (bytes > 1 << 20) {
    +-		unsigned x = bytes + 5243;  /* for rounding */
    +-		strbuf_addf(buf,
    +-				humanise_rate == 0 ?
    +-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte */
    +-					_("%u.%2.2u MiB") :
    +-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
    +-					_("%u.%2.2u MiB/s"),
    +-			    x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
    ++		unsigned x = bytes + 5243; /* for rounding */
    ++		strbuf_addf(buf, "%u.%2.2u", x >> 20,
    ++			    ((x & ((1 << 20) - 1)) * 100) >> 20);
    ++		/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
    ++		return humanise_rate ? xstrfmt(_("MiB/s")) : xstrfmt(_("MiB"));
    + 	} else if (bytes > 1 << 10) {
    +-		unsigned x = bytes + 5;  /* for rounding */
    +-		strbuf_addf(buf,
    +-				humanise_rate == 0 ?
    +-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte */
    +-					_("%u.%2.2u KiB") :
    +-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
    +-					_("%u.%2.2u KiB/s"),
    +-			    x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
    ++		unsigned x = bytes + 5; /* for rounding */
    ++		strbuf_addf(buf, "%u.%2.2u", x >> 10,
    ++			    ((x & ((1 << 10) - 1)) * 100) >> 10);
    ++		/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
    ++		return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
    + 	} else {
    +-		strbuf_addf(buf,
    +-				humanise_rate == 0 ?
    +-					/* TRANSLATORS: IEC 80000-13:2008 byte */
    +-					Q_("%u byte", "%u bytes", bytes) :
    +-					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
    +-					Q_("%u byte/s", "%u bytes/s", bytes),
    +-				(unsigned)bytes);
    ++		strbuf_addf(buf, "%u", (unsigned)bytes);
    ++		return humanise_rate ?
    ++			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
    ++			       xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
    ++			       /* TRANSLATORS: IEC 80000-13:2008 byte */
    ++			       xstrfmt(Q_("byte", "bytes", bytes));
    + 	}
      }
      
    - static void stats_table_print_structure(const struct stats_table *table)
    -
    - ## t/t1901-repo-structure.sh ##
    -@@ t/t1901-repo-structure.sh: test_description='test git repo structure'
    - 
    - . ./test-lib.sh
    + void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
    + {
    +-	strbuf_humanise(buf, bytes, 0);
    ++	char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
    ++	strbuf_addf(buf, " %s", unit);
    ++	free(unit);
    + }
      
    -+strip_object_disk_usage() {
    -+	awk '
    -+		/^\|   \* Disk size/ { skip=1; next }
    -+		skip && /^\|     \* / { next }
    -+		skip && !/^\|     \* / { skip=0 }
    -+		{ print }
    -+	' $1
    -+}
    -+
    - test_expect_success 'empty repository' '
    - 	test_when_finished "rm -rf repo" &&
    - 	git init repo &&
    -@@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
    - 		|     * Trees          |    0 B |
    - 		|     * Blobs          |    0 B |
    - 		|     * Tags           |    0 B |
    -+		|   * Disk size        |    0 B |
    -+		|     * Commits        |    0 B |
    -+		|     * Trees          |    0 B |
    -+		|     * Blobs          |    0 B |
    -+		|     * Tags           |    0 B |
    - 		EOF
    + void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
    + {
    +-	strbuf_humanise(buf, bytes, 1);
    ++	char *unit = strbuf_humanise_bytes_value(buf, bytes, STRBUF_HUMANISE_RATE);
    ++	strbuf_addf(buf, " %s", unit);
    ++	free(unit);
    + }
      
    - 		git repo structure >out 2>err &&
    -@@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references and objects' '
    - 		|     * Tags           |    132 B   |
    - 		EOF
    + int printf_ln(const char *fmt, ...)
    +
    + ## strbuf.h ##
    +@@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
    +  */
    + void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
      
    --		git repo structure >out 2>err &&
    -+		git repo structure >out.raw 2>err &&
    ++#define STRBUF_HUMANISE_RATE 1 << 0
     +
    -+		# Skip object disk sizes due to platform variance.
    -+		strip_object_disk_usage out.raw >out &&
    - 
    - 		test_cmp expect out &&
    - 		test_line_count = 0 err
    ++/**
    ++ * Append the given byte size as a human-readable string that is downscaled by
    ++ * some factor. A string with the corresponding unit prefix is returned
    ++ * separately.
    ++ */
    ++char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
    ++
    + /**
    +  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
    +  * 3.50 MiB).
2:  3f56d52cd9 ! 3:  2efc3533ef builtin/repo: humanise count values in structure output
    @@ builtin/repo.c: struct stats_table {
       */
      struct stats_table_entry {
      	char *value;
    -+	const char *unit;
    ++	char *unit;
      };
      
      static void stats_table_vaddf(struct stats_table *table,
    @@ builtin/repo.c: static void stats_table_vaddf(struct stats_table *table,
      }
      
      static void stats_table_addf(struct stats_table *table, const char *format, ...)
    -@@ builtin/repo.c: static void stats_table_addf(struct stats_table *table, const char *format, ...)
    - 	va_end(ap);
    - }
    - 
    -+static const char *unit_k = "k";
    -+static const char *unit_M = "M";
    -+static const char *unit_G = "G";
    -+
    - static void stats_table_count_addf(struct stats_table *table, size_t value,
    +@@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, size_t value,
      				   const char *format, ...)
      {
    -@@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, size_t value,
    + 	struct stats_table_entry *entry;
    ++	struct strbuf buf = STRBUF_INIT;
      	va_list ap;
      
      	CALLOC_ARRAY(entry, 1);
     -	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
     +
    -+	if (value >= 1000000000) {
    -+		uintmax_t x = (uintmax_t)value + 5000000;
    -+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
    -+				       x / 1000000000,
    -+				       x % 1000000000 / 10000000);
    -+		entry->unit = unit_G;
    -+	} else if (value >= 1000000) {
    -+		uintmax_t x = (uintmax_t)value + 5000;
    -+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
    -+				       x / 1000000, x % 1000000 / 10000);
    -+		entry->unit = unit_M;
    -+	} else if (value >= 1000) {
    -+		uintmax_t x = (uintmax_t)value + 5;
    -+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX,
    -+				       x / 1000, x % 1000 / 10);
    -+		entry->unit = unit_k;
    -+	} else {
    -+		entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
    -+	}
    ++	entry->unit = strbuf_humanise_count_value(&buf, value);
    ++	entry->value = strbuf_detach(&buf, NULL);
      
      	va_start(ap, format);
      	stats_table_vaddf(table, entry, format, ap);
    @@ builtin/repo.c: static void stats_table_print_structure(const struct stats_table
      		strbuf_addstr(&buf, " |");
      		printf("%s\n", buf.buf);
      	}
    +@@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
    + 
    + 	for_each_string_list_item(item, &table->rows) {
    + 		entry = item->util;
    +-		if (entry)
    ++		if (entry) {
    + 			free(entry->value);
    ++			free(entry->unit);
    ++		}
    + 	}
    + 
    + 	string_list_clear(&table->rows, 1);
    +
    + ## strbuf.c ##
    +@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
    + 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
    + }
    + 
    ++char *strbuf_humanise_count_value(struct strbuf *buf, size_t value)
    ++{
    ++	if (value >= 1000000000) {
    ++		uintmax_t x = (uintmax_t)value + 5000000; /* for rounding */
    ++		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
    ++			    x / 1000000000, x % 1000000000 / 10000000);
    ++		return xstrfmt(_("G"));
    ++	} else if (value >= 1000000) {
    ++		uintmax_t x = (uintmax_t)value + 5000; /* for rounding */
    ++		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
    ++			    x / 1000000, x % 1000000 / 10000);
    ++		return xstrfmt(_("M"));
    ++	} else if (value >= 1000) {
    ++		uintmax_t x = (uintmax_t)value + 5; /* for rounding */
    ++		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
    ++			    x / 1000, x % 1000 / 10);
    ++		return xstrfmt(_("k"));
    ++	} else {
    ++		strbuf_addf(buf, "%" PRIuMAX, (uintmax_t)value);
    ++		return NULL;
    ++	}
    ++}
    ++
    + char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
    + {
    + 	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
    +
    + ## strbuf.h ##
    +@@ strbuf.h: void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
    +  */
    + char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
    + 
    ++/**
    ++ * Append the given count value as a human-readable string that is downsacled by
    ++ * some factor. A string with the corresponding unit prefix is returned
    ++ * separately.
    ++ */
    ++char *strbuf_humanise_count_value(struct strbuf *buf, size_t value);
    ++
    + /**
    +  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
    +  * 3.50 MiB).
     
      ## t/t1901-repo-structure.sh ##
     @@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
3:  594bd320d1 ! 4:  627b8bf025 builtin/repo: add inflated object info to keyvalue structure output
    @@ builtin/repo.c: static void structure_keyvalue_print(struct repo_structure *stat
      	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
      	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
      
    -+	printf("objects.commits.inflated%c%" PRIuMAX "%c", key_delim,
    ++	printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
     +	       (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
    -+	printf("objects.trees.inflated%c%" PRIuMAX "%c", key_delim,
    ++	printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
     +	       (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
    -+	printf("objects.blobs.inflated%c%" PRIuMAX "%c", key_delim,
    ++	printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
     +	       (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
    -+	printf("objects.tags.inflated%c%" PRIuMAX "%c", key_delim,
    ++	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
     +	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
     +
      	fflush(stdout);
    @@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
     +
     +		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
     +						  OBJECT_INFO_FOR_PREFETCH) < 0)
    -+			die(_("cannot read object for %s"),
    -+			    oid_to_hex(&oids->oid[i]));
    ++			continue;
     +
     +		inflated_total += inflated;
     +	}
    @@ t/t1901-repo-structure.sh: test_expect_success 'keyvalue and nul format' '
      		objects.trees.count=42
      		objects.blobs.count=42
      		objects.tags.count=1
    -+		objects.commits.inflated=9225
    -+		objects.trees.inflated=28554
    -+		objects.blobs.inflated=453
    -+		objects.tags.inflated=132
    ++		objects.commits.inflated_size=9225
    ++		objects.trees.inflated_size=28554
    ++		objects.blobs.inflated_size=453
    ++		objects.tags.inflated_size=132
      		EOF
      
      		git repo structure --format=keyvalue >out 2>err &&
4:  3406b1ed90 ! 5:  14f4983e1d builtin/repo: add inflated object info to structure table
    @@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, si
      	va_end(ap);
      }
      
    -+static const char *unit_B = "B";
    -+static const char *unit_KiB = "KiB";
    -+static const char *unit_MiB = "MiB";
    -+static const char *unit_GiB = "GiB";
    -+
     +static void stats_table_size_addf(struct stats_table *table, size_t value,
     +				  const char *format, ...)
     +{
     +	struct stats_table_entry *entry;
    ++	struct strbuf buf = STRBUF_INIT;
     +	va_list ap;
     +
     +	CALLOC_ARRAY(entry, 1);
     +
    -+	if (value > 1 << 30) {
    -+		uintmax_t x = (uintmax_t)value + 5368709;
    -+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 30,
    -+				       ((x & ((1 << 30) - 1)) * 100) >> 30);
    -+		entry->unit = unit_GiB;
    -+	} else if (value > 1 << 20) {
    -+		uintmax_t x = (uintmax_t)value + 5243;
    -+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 20,
    -+				       ((x & ((1 << 20) - 1)) * 100) >> 20);
    -+		entry->unit = unit_MiB;
    -+	} else if (value > 1 << 10) {
    -+		uintmax_t x = (uintmax_t)value + 5;
    -+		entry->value = xstrfmt("%" PRIuMAX ".%02" PRIuMAX, x >> 10,
    -+				       ((x & ((1 << 10) - 1)) * 100) >> 10);
    -+		entry->unit = unit_KiB;
    -+	} else {
    -+		entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
    -+		entry->unit = unit_B;
    -+	}
    ++	entry->unit = strbuf_humanise_bytes_value(&buf, value,
    ++						  STRBUF_HUMANISE_COMPACT);
    ++	entry->value = strbuf_detach(&buf, NULL);
     +
     +	va_start(ap, format);
     +	stats_table_vaddf(table, entry, format, ap);
    @@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
      
      static void stats_table_print_structure(const struct stats_table *table)
     
    + ## strbuf.c ##
    +@@ strbuf.c: char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flag
    + 		return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
    + 	} else {
    + 		strbuf_addf(buf, "%u", (unsigned)bytes);
    ++		if (flags & STRBUF_HUMANISE_COMPACT)
    ++			return humanise_rate ?
    ++				       xstrfmt(_("B/s")) :
    ++				       xstrfmt(_("B"));
    + 		return humanise_rate ?
    + 			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
    + 			       xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
    +
    + ## strbuf.h ##
    +@@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
    +  */
    + void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
    + 
    +-#define STRBUF_HUMANISE_RATE 1 << 0
    ++#define STRBUF_HUMANISE_RATE	1 << 0
    ++#define STRBUF_HUMANISE_COMPACT 1 << 1
    + 
    + /**
    +  * Append the given byte size as a human-readable string that is downscaled by
    +
      ## t/t1901-repo-structure.sh ##
     @@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
      		| Repository structure | Value  |
5:  48461ac6a0 ! 6:  dc9e82889f builtin/repo: add disk size info to keyvalue stucture output
    @@ Commit message
         the git-repo(1) structure command to additionally provide info regarding
         total object disk sizes by object type.
     
    -    Since disk size may vary between platforms, tests do not validate actual
    -    values and only check that size info is printed in an empty repository.
    -
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
      ## Documentation/git-repo.adoc ##
    @@ builtin/repo.c: struct object_values {
      
      struct repo_structure {
     @@ builtin/repo.c: static void structure_keyvalue_print(struct repo_structure *stats,
    - 	printf("objects.tags.inflated%c%" PRIuMAX "%c", key_delim,
    + 	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
      	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
      
    -+	printf("objects.commits.disk%c%" PRIuMAX "%c", key_delim,
    ++	printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
     +	       (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
    -+	printf("objects.trees.disk%c%" PRIuMAX "%c", key_delim,
    ++	printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
     +	       (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
    -+	printf("objects.blobs.disk%c%" PRIuMAX "%c", key_delim,
    ++	printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
     +	       (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
    -+	printf("objects.tags.disk%c%" PRIuMAX "%c", key_delim,
    ++	printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
     +	       (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
     +
      	fflush(stdout);
    @@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
      
      		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
      						  OBJECT_INFO_FOR_PREFETCH) < 0)
    -@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_array *oids,
    - 			    oid_to_hex(&oids->oid[i]));
    + 			continue;
      
      		inflated_total += inflated;
     +		disk_total += disk;
    @@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
      		BUG("invalid object type");
     
      ## t/t1901-repo-structure.sh ##
    -@@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
    - 		git repo structure >out 2>err &&
    +@@ t/t1901-repo-structure.sh: test_description='test git repo structure'
      
    - 		test_cmp expect out &&
    -+		test_line_count = 0 err &&
    -+
    -+		cat >expect <<-\EOF &&
    -+		references.branches.count=0
    -+		references.tags.count=0
    -+		references.remotes.count=0
    -+		references.others.count=0
    -+		objects.commits.count=0
    -+		objects.trees.count=0
    -+		objects.blobs.count=0
    -+		objects.tags.count=0
    -+		objects.commits.inflated=0
    -+		objects.trees.inflated=0
    -+		objects.blobs.inflated=0
    -+		objects.tags.inflated=0
    -+		objects.commits.disk=0
    -+		objects.trees.disk=0
    -+		objects.blobs.disk=0
    -+		objects.tags.disk=0
    -+		EOF
    -+
    -+		git repo structure --format=keyvalue >out 2>err &&
    -+
    -+		test_cmp expect out &&
    -+		test_line_count = 0 err &&
    -+
    -+		# Replace key and value delimiters for nul format.
    -+		tr "\n=" "\0\n" <expect >expect_nul &&
    -+		git repo structure --format=nul >out 2>err &&
    -+
    -+		test_cmp expect_nul out &&
    - 		test_line_count = 0 err
    - 	)
    - '
    -@@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references and objects' '
    - 	)
    - '
    + . ./test-lib.sh
      
    --test_expect_success SHA1 'keyvalue and nul format' '
    -+test_expect_success SHA1 'keyvalue format' '
    ++object_type_disk_usage() {
    ++	git cat-file --batch-check='%(objectsize:disk)' --batch-all-objects \
    ++		--filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
    ++}
    ++
    + test_expect_success 'empty repository' '
      	test_when_finished "rm -rf repo" &&
      	git init repo &&
    - 	(
     @@ t/t1901-repo-structure.sh: test_expect_success SHA1 'keyvalue and nul format' '
    - 		objects.tags.inflated=132
    - 		EOF
    + 		test_commit_bulk 42 &&
    + 		git tag -a foo -m bar &&
      
    --		git repo structure --format=keyvalue >out 2>err &&
    -+		git repo structure --format=keyvalue >out.raw 2>err &&
    - 
    --		test_cmp expect out &&
    --		test_line_count = 0 err &&
    -+		# Strip object disk usage from output due to platform variance.
    -+		grep -v "objects\..*\.disk=" out.raw >out &&
    +-		cat >expect <<-\EOF &&
    ++		cat >expect <<-EOF &&
    + 		references.branches.count=1
    + 		references.tags.count=1
    + 		references.remotes.count=0
    +@@ t/t1901-repo-structure.sh: test_expect_success SHA1 'keyvalue and nul format' '
    + 		objects.trees.inflated_size=28554
    + 		objects.blobs.inflated_size=453
    + 		objects.tags.inflated_size=132
    ++		objects.commits.disk_size=$(object_type_disk_usage commit)
    ++		objects.trees.disk_size=$(object_type_disk_usage tree)
    ++		objects.blobs.disk_size=$(object_type_disk_usage blob)
    ++		objects.tags.disk_size=$(object_type_disk_usage tag)
    + 		EOF
      
    --		# Replace key and value delimiters for nul format.
    --		tr "\n=" "\0\n" <expect >expect_nul &&
    --		git repo structure --format=nul >out 2>err &&
    --
    --		test_cmp expect_nul out &&
    -+		test_cmp expect out &&
    - 		test_line_count = 0 err
    - 	)
    - '
    + 		git repo structure --format=keyvalue >out 2>err &&
-:  ---------- > 7:  213b19dc7f builtin/repo: add object disk size info to structure table

base-commit: e85ae279b0d58edc2f4c3fd5ac391b51e1223985
-- 
2.52.0.209.ge85ae279b0



* [PATCH v2 1/7] builtin/repo: group per-type object values into struct
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
@ 2025-12-12 22:36   ` Justin Tobler
  2025-12-12 22:36   ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
 	size_t others;
 };
 
-struct object_stats {
+struct object_values {
 	size_t tags;
 	size_t commits;
 	size_t trees;
 	size_t blobs;
 };
 
+struct object_stats {
+	struct object_values type_counts;
+};
+
 struct repo_structure {
 	struct ref_stats refs;
 	struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
 	return stats->branches + stats->remotes + stats->tags + stats->others;
 }
 
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
 {
-	return stats->tags + stats->commits + stats->trees + stats->blobs;
+	return values->tags + values->commits + values->trees + values->blobs;
 }
 
 static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_count(objects);
+	object_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
 	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
-	stats_table_count_addf(table, objects->commits, "    * %s", _("Commits"));
-	stats_table_count_addf(table, objects->trees, "    * %s", _("Trees"));
-	stats_table_count_addf(table, objects->blobs, "    * %s", _("Blobs"));
-	stats_table_count_addf(table, objects->tags, "    * %s", _("Tags"));
+	stats_table_count_addf(table, objects->type_counts.commits,
+			       "    * %s", _("Commits"));
+	stats_table_count_addf(table, objects->type_counts.trees,
+			       "    * %s", _("Trees"));
+	stats_table_count_addf(table, objects->type_counts.blobs,
+			       "    * %s", _("Blobs"));
+	stats_table_count_addf(table, objects->type_counts.tags,
+			       "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	       (uintmax_t)stats->refs.others, value_delim);
 
 	printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.commits, value_delim);
+	       (uintmax_t)stats->objects.type_counts.commits, value_delim);
 	printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.trees, value_delim);
+	       (uintmax_t)stats->objects.type_counts.trees, value_delim);
 	printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.blobs, value_delim);
+	       (uintmax_t)stats->objects.type_counts.blobs, value_delim);
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.tags, value_delim);
+	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
 	fflush(stdout);
 }
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 
 	switch (type) {
 	case OBJ_TAG:
-		stats->tags += oids->nr;
+		stats->type_counts.tags += oids->nr;
 		break;
 	case OBJ_COMMIT:
-		stats->commits += oids->nr;
+		stats->type_counts.commits += oids->nr;
 		break;
 	case OBJ_TREE:
-		stats->trees += oids->nr;
+		stats->type_counts.trees += oids->nr;
 		break;
 	case OBJ_BLOB:
-		stats->blobs += oids->nr;
+		stats->type_counts.blobs += oids->nr;
 		break;
 	default:
 		BUG("invalid object type");
 	}
 
-	object_count = get_total_object_count(stats);
+	object_count = get_total_object_values(&stats->type_counts);
 	display_progress(data->progress, object_count);
 
 	return 0;
-- 
2.52.0.209.ge85ae279b0



* [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
  2025-12-12 22:36   ` [PATCH v2 1/7] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-12 22:36   ` Justin Tobler
  2025-12-15  5:33     ` Patrick Steinhardt
                       ` (2 more replies)
  2025-12-12 22:36   ` [PATCH v2 3/7] builtin/repo: humanise count values in structure output Justin Tobler
                     ` (5 subsequent siblings)
  7 siblings, 3 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
use case, the downscaled values and unit prefixes must be handled
separately to ensure proper column alignment. Refactor strbuf_humanise()
to instead append the downscaled byte value to the buffer only and
return the appropriate unit prefix string.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 strbuf.c | 62 +++++++++++++++++++++++++-------------------------------
 strbuf.h |  9 ++++++++
 2 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 6c3851a7f8..1fb47bf21b 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,55 +836,49 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
-				 int humanise_rate)
+char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
 {
+	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
+
 	if (bytes > 1 << 30) {
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte */
-					_("%u.%2.2u GiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
-					_("%u.%2.2u GiB/s"),
-			    (unsigned)(bytes >> 30),
+		strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
 			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
+		return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
 	} else if (bytes > 1 << 20) {
-		unsigned x = bytes + 5243;  /* for rounding */
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte */
-					_("%u.%2.2u MiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
-					_("%u.%2.2u MiB/s"),
-			    x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+		unsigned x = bytes + 5243; /* for rounding */
+		strbuf_addf(buf, "%u.%2.2u", x >> 20,
+			    ((x & ((1 << 20) - 1)) * 100) >> 20);
+		/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
+		return humanise_rate ? xstrfmt(_("MiB/s")) : xstrfmt(_("MiB"));
 	} else if (bytes > 1 << 10) {
-		unsigned x = bytes + 5;  /* for rounding */
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte */
-					_("%u.%2.2u KiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
-					_("%u.%2.2u KiB/s"),
-			    x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+		unsigned x = bytes + 5; /* for rounding */
+		strbuf_addf(buf, "%u.%2.2u", x >> 10,
+			    ((x & ((1 << 10) - 1)) * 100) >> 10);
+		/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
+		return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
 	} else {
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 byte */
-					Q_("%u byte", "%u bytes", bytes) :
-					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
-					Q_("%u byte/s", "%u bytes/s", bytes),
-				(unsigned)bytes);
+		strbuf_addf(buf, "%u", (unsigned)bytes);
+		return humanise_rate ?
+			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+			       xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
+			       /* TRANSLATORS: IEC 80000-13:2008 byte */
+			       xstrfmt(Q_("byte", "bytes", bytes));
 	}
 }
 
 void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
 {
-	strbuf_humanise(buf, bytes, 0);
+	char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
+	strbuf_addf(buf, " %s", unit);
+	free(unit);
 }
 
 void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
 {
-	strbuf_humanise(buf, bytes, 1);
+	char *unit = strbuf_humanise_bytes_value(buf, bytes, STRBUF_HUMANISE_RATE);
+	strbuf_addf(buf, " %s", unit);
+	free(unit);
 }
 
 int printf_ln(const char *fmt, ...)
diff --git a/strbuf.h b/strbuf.h
index a580ac6084..a5e3ab0cb4 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,15 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
  */
 void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
 
+#define STRBUF_HUMANISE_RATE 1 << 0
+
+/**
+ * Append the given byte size as a human-readable string that is downscaled by
+ * some factor. A string with the corresponding unit prefix is returned
+ * separately.
+ */
+char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
+
 /**
  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
  * 3.50 MiB).
-- 
2.52.0.209.ge85ae279b0



* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-12 22:36   ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-15  5:33     ` Patrick Steinhardt
  2025-12-15 16:26       ` Justin Tobler
  2025-12-15  8:21     ` Junio C Hamano
  2025-12-16  2:26     ` Jiang Xin
  2 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15  5:33 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster

On Fri, Dec 12, 2025 at 04:36:39PM -0600, Justin Tobler wrote:
> diff --git a/strbuf.c b/strbuf.c
> index 6c3851a7f8..1fb47bf21b 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -836,55 +836,49 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
>  	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
>  }
>  
> -static void strbuf_humanise(struct strbuf *buf, off_t bytes,
> -				 int humanise_rate)
> +char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
>  {
> +	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
> +
>  	if (bytes > 1 << 30) {
> -		strbuf_addf(buf,
> -				humanise_rate == 0 ?
> -					/* TRANSLATORS: IEC 80000-13:2008 gibibyte */
> -					_("%u.%2.2u GiB") :
> -					/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
> -					_("%u.%2.2u GiB/s"),
> -			    (unsigned)(bytes >> 30),
> +		strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
>  			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
> +		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
> +		return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
>  	} else if (bytes > 1 << 20) {
> -		unsigned x = bytes + 5243;  /* for rounding */
> -		strbuf_addf(buf,
> -				humanise_rate == 0 ?
> -					/* TRANSLATORS: IEC 80000-13:2008 mebibyte */
> -					_("%u.%2.2u MiB") :
> -					/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
> -					_("%u.%2.2u MiB/s"),
> -			    x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
> +		unsigned x = bytes + 5243; /* for rounding */
> +		strbuf_addf(buf, "%u.%2.2u", x >> 20,
> +			    ((x & ((1 << 20) - 1)) * 100) >> 20);
> +		/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
> +		return humanise_rate ? xstrfmt(_("MiB/s")) : xstrfmt(_("MiB"));
>  	} else if (bytes > 1 << 10) {
> -		unsigned x = bytes + 5;  /* for rounding */
> -		strbuf_addf(buf,
> -				humanise_rate == 0 ?
> -					/* TRANSLATORS: IEC 80000-13:2008 kibibyte */
> -					_("%u.%2.2u KiB") :
> -					/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
> -					_("%u.%2.2u KiB/s"),
> -			    x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
> +		unsigned x = bytes + 5; /* for rounding */
> +		strbuf_addf(buf, "%u.%2.2u", x >> 10,
> +			    ((x & ((1 << 10) - 1)) * 100) >> 10);
> +		/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
> +		return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
>  	} else {
> -		strbuf_addf(buf,
> -				humanise_rate == 0 ?
> -					/* TRANSLATORS: IEC 80000-13:2008 byte */
> -					Q_("%u byte", "%u bytes", bytes) :
> -					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
> -					Q_("%u byte/s", "%u bytes/s", bytes),
> -				(unsigned)bytes);
> +		strbuf_addf(buf, "%u", (unsigned)bytes);
> +		return humanise_rate ?
> +			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> +			       xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
> +			       /* TRANSLATORS: IEC 80000-13:2008 byte */
> +			       xstrfmt(Q_("byte", "bytes", bytes));
>  	}
>  }

All branches use `xstrfmt()` with strings that are essentially
constants, except for the translation part. So isn't it possible to drop
all these allocations and have the function return a `const char *`
instead?
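
A minimal sketch of that idea, not the actual follow-up patch: the
downscaled value is written into a caller-supplied buffer and the unit
comes back as a pointer to a constant string, so nothing has to be
freed. The name `humanise_bytes_value()`, the plain `char` buffer in
place of a strbuf, the `_()` stand-in macro, and the simplified plural
handling (instead of `Q_()`) are all assumptions made so that the
example builds on its own.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for gettext's _() so the sketch is standalone. */
#define _(s) (s)

/*
 * Write the downscaled value into `out` and return the matching unit.
 * The returned pointer refers to a constant string; callers must not
 * free it.
 */
static const char *humanise_bytes_value(char *out, size_t outlen,
					uint64_t bytes, int rate)
{
	if (bytes > (1u << 30)) {
		snprintf(out, outlen, "%u.%2.2u", (unsigned)(bytes >> 30),
			 (unsigned)(bytes & ((1u << 30) - 1)) / 10737419);
		return rate ? _("GiB/s") : _("GiB");
	} else if (bytes > (1u << 20)) {
		unsigned x = (unsigned)bytes + 5243; /* for rounding */
		snprintf(out, outlen, "%u.%2.2u", x >> 20,
			 ((x & ((1u << 20) - 1)) * 100) >> 20);
		return rate ? _("MiB/s") : _("MiB");
	} else if (bytes > (1u << 10)) {
		unsigned x = (unsigned)bytes + 5; /* for rounding */
		snprintf(out, outlen, "%u.%2.2u", x >> 10,
			 ((x & ((1u << 10) - 1)) * 100) >> 10);
		return rate ? _("KiB/s") : _("KiB");
	}

	snprintf(out, outlen, "%u", (unsigned)bytes);
	/* Simplified plural selection; the real code goes through Q_(). */
	if (bytes == 1)
		return rate ? _("byte/s") : _("byte");
	return rate ? _("bytes/s") : _("bytes");
}
```

A caller would declare something like `char value[32];`, keep the
returned pointer as the unit, and print the two columns separately.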

> diff --git a/strbuf.h b/strbuf.h
> index a580ac6084..a5e3ab0cb4 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -367,6 +367,15 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
>   */
>  void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
>  
> +#define STRBUF_HUMANISE_RATE 1 << 0

I think nowadays it's a bit more common to use an enum, and I think we
should also document what the flag does:

    enum strbuf_humanise_flags {
        /*
         * Frobnicate the string.
         */
        STRBUF_HUMANISE_RATE = (1 << 0),
    };
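
For illustration only, a documented flag enum along those lines might
look like the sketch below. The enum and function names are
hypothetical; the second flag mirrors the "compact" behaviour that a
later patch in this series adds, where values below 1 KiB use "B"
rather than "byte"/"bytes". Parenthesised enumerators also avoid the
precedence surprises that an unparenthesised `1 << 0` macro can cause
when it is used inside a larger expression.

```c
/* Hypothetical stand-in for gettext's _() so the sketch is standalone. */
#define _(s) (s)

enum humanise_flags {
	/* Report a rate: units carry a "/s" suffix (e.g. "KiB/s"). */
	HUMANISE_RATE = (1 << 0),
	/* Use the short "B" unit instead of "byte"/"bytes" below 1 KiB. */
	HUMANISE_COMPACT = (1 << 1),
};

/* Pick the unit for values that are too small to be downscaled. */
static const char *small_unit(unsigned long bytes, unsigned flags)
{
	int rate = !!(flags & HUMANISE_RATE);

	if (flags & HUMANISE_COMPACT)
		return rate ? _("B/s") : _("B");
	if (bytes == 1)
		return rate ? _("byte/s") : _("byte");
	return rate ? _("bytes/s") : _("bytes");
}
```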

Patrick


* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-15  5:33     ` Patrick Steinhardt
@ 2025-12-15 16:26       ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 16:26 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, gitster

On 25/12/15 06:33AM, Patrick Steinhardt wrote:
> On Fri, Dec 12, 2025 at 04:36:39PM -0600, Justin Tobler wrote:
>
> All branches use `xstrfmt()` with strings that are essentially
> constants, except for the translation part. So isn't it possible to drop
> all these allocations and have the function return a `const char *`
> instead?

Ya, that would indeed be better. Will fix.

> > diff --git a/strbuf.h b/strbuf.h
> > index a580ac6084..a5e3ab0cb4 100644
> > --- a/strbuf.h
> > +++ b/strbuf.h
> > @@ -367,6 +367,15 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
> >   */
> >  void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
> >  
> > +#define STRBUF_HUMANISE_RATE 1 << 0
> 
> I think nowadays it's a bit more common to use an enum, and I think we
> should also document what the flag does:
> 
>     enum strbuf_humanise_flags {
>         /*
>          * Frobnicate the string.
>          */
>         STRBUF_HUMANISE_RATE = (1 << 0),
>     };

Will do.

-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-12 22:36   ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
  2025-12-15  5:33     ` Patrick Steinhardt
@ 2025-12-15  8:21     ` Junio C Hamano
  2025-12-15 16:47       ` Justin Tobler
  2025-12-16  2:26     ` Jiang Xin
  2 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-15  8:21 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps

Justin Tobler <jltobler@gmail.com> writes:

> +char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
>  {
> +	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
> +
>  	if (bytes > 1 << 30) {
> +		strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
>  			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
> +		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
> +		return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
> ...
> }
>  void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
>  {
> -	strbuf_humanise(buf, bytes, 0);
> +	char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
> +	strbuf_addf(buf, " %s", unit);
> +	free(unit);
>  }

The old "strbuf-humanise" used to treat the whole "<number> <unit>",
e.g., _("%u.%2.2u GiB"), as a single thing to be translated.
However, the new code requires that in all languages:

 - Decimal point in number MUST be "." (don't some Europeans prefer
   comma instead?);

 - Number MUST come before the unit;

 - Between the number and the unit, there has to be one and only one
   SP.

All of which could be a severe regression from localization's point
of view.

The first point among the above three can relatively easily be
remedied.  It is a bit more involved, but it is possible to fix the
other two, too.
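
To illustrate the concern, here is a minimal standalone sketch (not
code from this series) of the old single-msgid approach, in which the
number and the unit are translated as one format string.  The names
fake_translate() and format_gib() are hypothetical, and the
comma-decimal catalog entry is a made-up stand-in for what a real
translation might provide:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in for gettext's _(). A real build would look the msgid up in
 * a message catalog; this toy version knows a single, made-up entry
 * that uses a decimal comma instead of a decimal point.
 */
static const char *fake_translate(const char *msgid)
{
	if (!strcmp(msgid, "%u.%2.2u GiB"))
		return "%u,%2.2u GiB";
	return msgid;
}
#define _(msgid) fake_translate(msgid)

/*
 * Format a byte count in GiB. Because the whole "<number> <unit>"
 * pattern is one msgid, a translation is free to change the decimal
 * separator, the spacing, or the order of number and unit.
 */
static void format_gib(char *out, size_t len, unsigned long long bytes)
{
	unsigned whole = (unsigned)(bytes >> 30);
	unsigned frac = (unsigned)(bytes & ((1ULL << 30) - 1)) / 10737419;

	assert(out && len);
	snprintf(out, len, _("%u.%2.2u GiB"), whole, frac);
}
```

With the toy catalog above, calling format_gib() on 2ULL << 30 would
produce "2,00 GiB", something the split value/unit code in the patch
cannot express.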

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-15  8:21     ` Junio C Hamano
@ 2025-12-15 16:47       ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 16:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps

On 25/12/15 05:21PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > +char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
> >  {
> > +	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
> > +
> >  	if (bytes > 1 << 30) {
> > +		strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
> >  			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
> > +		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
> > +		return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
> > ...
> > }
> >  void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
> >  {
> > -	strbuf_humanise(buf, bytes, 0);
> > +	char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
> > +	strbuf_addf(buf, " %s", unit);
> > +	free(unit);
> >  }
> 
> The old "strbuf-humanise" used to treat the whole "<number> <unit>",
> e.g., _("%u.%2.2u GiB"), as a single thing to be translated.
> However, the new code requires that in all languages:
> 
>  - Decimal point in number MUST be "." (don't some Europeans prefer
>    comma instead?);
> 
>  - Number MUST come before the unit;
> 
>  - Between the number and the unit, there has to be one and only one
>    SP.
> 
> All of which could be a severe regression from localization's point
> of view.
> 
> The first point among the above three can relatively easily be
> remedied.  It is a bit more involved, but it is possible to fix the
> other two, too.

The first point could be addressed by just making "%u.%2.2u"
translatable. To address the others, we could have
strbuf_humanise_bytes_value() output two separate strings (value and
unit) instead of appending the value and returning the unit. Maybe
something like:

  void humanise_bytes(off_t bytes, char **value, const char **unit)

We could then have another translatable string to configure the format:

  void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
  {
    char *value;
    const char *unit;

    humanise_bytes(bytes, &value, &unit);
    strbuf_addf(buf, _("%s %s"), value, unit);
    free(value);
  }

This is certainly a somewhat more involved setup for translators, though.
But maybe it's ok? I'll move forward with something like the above in the
next version for now.
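
As a rough standalone sketch of that idea (not the code that will be
sent), the split could look like the following.  The assumptions: _()
is a no-op stand-in for gettext, unsigned long long replaces off_t,
only the GiB tier from the patch is kept with everything else falling
back to plain bytes, the English-only byte/bytes choice stands in for
Q_(), and format_size() is a hypothetical name for the caller:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define _(msgid) (msgid) /* no-op stand-in for gettext */

/*
 * Produce the numeric part as an allocated string and the unit as a
 * constant string, so that only the value needs to be freed.
 */
static void humanise_bytes(unsigned long long bytes,
			   char **value, const char **unit)
{
	char *buf = malloc(32);

	assert(value && unit);
	if (!buf)
		abort();

	if (bytes > 1ULL << 30) {
		/* The number format is itself marked for translation. */
		snprintf(buf, 32, _("%u.%2.2u"), (unsigned)(bytes >> 30),
			 (unsigned)(bytes & ((1ULL << 30) - 1)) / 10737419);
		*unit = _("GiB");
	} else {
		/* Simplification: real code would use Q_() for plurals. */
		snprintf(buf, 32, "%llu", bytes);
		*unit = bytes == 1 ? _("byte") : _("bytes");
	}
	*value = buf;
}

/* Join value and unit through one more translatable format. */
static void format_size(char *out, size_t len, unsigned long long bytes)
{
	char *value;
	const char *unit;

	humanise_bytes(bytes, &value, &unit);
	/* A translation could reorder or re-space the two pieces here. */
	snprintf(out, len, _("%s %s"), value, unit);
	free(value);
}
```

The table code would call humanise_bytes() directly to keep value and
unit in separate columns, while format_size() covers callers that want
a single string.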

Thanks
-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-12 22:36   ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
  2025-12-15  5:33     ` Patrick Steinhardt
  2025-12-15  8:21     ` Junio C Hamano
@ 2025-12-16  2:26     ` Jiang Xin
  2025-12-16  4:37       ` Junio C Hamano
  2 siblings, 1 reply; 80+ messages in thread
From: Jiang Xin @ 2025-12-16  2:26 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, gitster, Jeff Hostetler

On Sat, Dec 13, 2025 at 6:37 AM Justin Tobler <jltobler@gmail.com> wrote:
> +               return humanise_rate ?
> +                              /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> +                              xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
> +                              /* TRANSLATORS: IEC 80000-13:2008 byte */
> +                              xstrfmt(Q_("byte", "bytes", bytes));

We have already defined "byte" as an l10n string without plural forms in the
file "t/helper/test-simple-ipc.c" via commit 36a7eb6876 (t0052: add simple-ipc
tests and t/helper/test-simple-ipc tool, 2021-03-22 10:29:48 +0000).

    OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),

The newly introduced usage of "byte" is now marked as having a plural form
(via Q_("byte", "bytes", bytes)), which causes a conflict. This results in make
pot failing with the following error:

    msgcat: msgid 'byte' is used without plural and with plural.

This happens because gettext requires that a given msgid be treated
consistently—either exclusively as a singular string or as part of a plural
construct—but not both.

To resolve this conflict, we can unmark the singular "byte" in
t/helper/test-simple-ipc.c, allowing it to reuse the translation from the
plural-form definition of "byte".

--
Jiang Xin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-16  2:26     ` Jiang Xin
@ 2025-12-16  4:37       ` Junio C Hamano
  2025-12-16  6:18         ` Jiang Xin
  0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-16  4:37 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Justin Tobler, git, ps, Jeff Hostetler

Jiang Xin <worldhello.net@gmail.com> writes:

> On Sat, Dec 13, 2025 at 6:37 AM Justin Tobler <jltobler@gmail.com> wrote:
>> +               return humanise_rate ?
>> +                              /* TRANSLATORS: IEC 80000-13:2008 byte/second */
>> +                              xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
>> +                              /* TRANSLATORS: IEC 80000-13:2008 byte */
>> +                              xstrfmt(Q_("byte", "bytes", bytes));
>
> We have already defined "byte" as an l10n string without plural forms in the
> file "t/helper/test-simple-ipc.c" via commit 36a7eb6876 (t0052: add simple-ipc
> tests and t/helper/test-simple-ipc tool, 2021-03-22 10:29:48 +0000).
>
>     OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
>
> The newly introduced usage of "byte" is now marked as having a plural form
> (via Q_("byte", "bytes", bytes)), which causes a conflict. This results in make
> pot failing with the following error:
>
>     msgcat: msgid 'byte' is used without plural and with plural.
>
> This happens because gettext requires that a given msgid be treated
> consistently—either exclusively as a singular string or as part of a plural
> construct—but not both.
>
> To resolve this conflict, we can unmark the singular "byte" in
> t/helper/test-simple-ipc.c, allowing it to reuse the translation from the
> plural-form definition of "byte".

I learned a new thing today and am happy :).

But how does one "unmark" the singular "byte" there, exactly?

Would something like this ...

     OPT_STRING(0, "byte", &bytevalue, Q_("byte", "bytes", 1), N_("ballast character")),

... be a good idea, to "mark" it as a countable noun that has a plural
form?

Or did you mean that we can simply drop N_() around it, i.e.,
N_("byte") -> "byte", to discard the i18n, because it merely is a
test helper?

Punting is fine in this case, but in case a similar situation arises
in real code, it would be better to establish a pattern we can
follow.

Thanks.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-16  4:37       ` Junio C Hamano
@ 2025-12-16  6:18         ` Jiang Xin
  2025-12-16 14:41           ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Jiang Xin @ 2025-12-16  6:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Justin Tobler, git, ps, Jeff Hostetler

On Tue, Dec 16, 2025 at 12:37 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jiang Xin <worldhello.net@gmail.com> writes:
>
> > On Sat, Dec 13, 2025 at 6:37 AM Justin Tobler <jltobler@gmail.com> wrote:
> >> +               return humanise_rate ?
> >> +                              /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> >> +                              xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
> >> +                              /* TRANSLATORS: IEC 80000-13:2008 byte */
> >> +                              xstrfmt(Q_("byte", "bytes", bytes));
> >
> > We have already defined "byte" as an l10n string without plural forms in the
> > file "t/helper/test-simple-ipc.c" via commit 36a7eb6876 (t0052: add simple-ipc
> > tests and t/helper/test-simple-ipc tool, 2021-03-22 10:29:48 +0000).
> >
> >     OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
> >
> > The newly introduced usage of "byte" is now marked as having a plural form
> > (via Q_("byte", "bytes", bytes)), which causes a conflict. This results in make
> > pot failing with the following error:
> >
> >     msgcat: msgid 'byte' is used without plural and with plural.
> >
> > This happens because gettext requires that a given msgid be treated
> > consistently—either exclusively as a singular string or as part of a plural
> > construct—but not both.
> >
> > To resolve this conflict, we can unmark the singular "byte" in
> > t/helper/test-simple-ipc.c, allowing it to reuse the translation from the
> > plural-form definition of "byte".
>
> I learned a new thing today and am happy :).
>
> But how does one "unmark" the singular "byte" there, exactly?
>
> Would something like this ...
>
>      OPT_STRING(0, "byte", &bytevalue, Q_("byte", "bytes", 1), N_("ballast character")),
>
> ... be a good idea, to "mark" it as a countable noun that has a plural
> form?
>
> Or did you mean that we can simply drop N_() around it, i.e.,
> N_("byte") -> "byte", to discard the i18n, because it merely is a
> test helper?

I prefer dropping N_() for "byte" in "t/helper/test-simple-ipc.c", and
the i18n for the test helper will continue to work as before if we also
mark the plural-form of "byte" in this patch series. (i.e., drop the N_()
for "byte" in the test helper in this patch.)

This is because N_() is a macro that does not invoke any gettext
function; it merely expands to its msgid, as defined in gettext.h:

    #define N_(msgid) msgid

And the actual translation for the msgid (the argh field of an option)
occurs later by calling:

    opts->argh ? _(opts->argh) : _("...")

in "parse-options.c".

However, replacing N_() with Q_() would cause the string to be
processed by gettext twice: once at runtime via Q_(), and again
when _(opts->argh) is evaluated.
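
The double lookup can be shown with a small standalone sketch.
Everything here is a stand-in: fake_gettext() replaces the real
gettext call, "Byte" is a made-up translation, and struct
option_sketch and render_argh() only mimic the deferred
_(opts->argh) call quoted above:

```c
#include <assert.h>
#include <string.h>

static int lookups; /* number of catalog lookups performed so far */

/* Toy catalog with one made-up entry; counts every lookup. */
static const char *fake_gettext(const char *msgid)
{
	lookups++;
	if (!strcmp(msgid, "byte"))
		return "Byte"; /* hypothetical translation */
	return msgid;
}

#define N_(msgid) msgid              /* marks only, no lookup */
#define _(msgid) fake_gettext(msgid) /* translates at runtime */

struct option_sketch {
	const char *argh;
};

/* Mimics the deferred translation done when usage is printed. */
static const char *render_argh(const struct option_sketch *opt)
{
	assert(opt);
	return opt->argh ? _(opt->argh) : _("...");
}
```

An option initialised with N_("byte") triggers exactly one lookup when
render_argh() runs.  If the field were instead filled with an eagerly
translated string, render_argh() would perform a second lookup, this
time using the already translated text as the msgid.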

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
  2025-12-16  6:18         ` Jiang Xin
@ 2025-12-16 14:41           ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 14:41 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Junio C Hamano, git, ps, Jeff Hostetler

On 25/12/16 02:18PM, Jiang Xin wrote:
> On Tue, Dec 16, 2025 at 12:37 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Jiang Xin <worldhello.net@gmail.com> writes:
> >
> > > On Sat, Dec 13, 2025 at 6:37 AM Justin Tobler <jltobler@gmail.com> wrote:
> > >> +               return humanise_rate ?
> > >> +                              /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> > >> +                              xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
> > >> +                              /* TRANSLATORS: IEC 80000-13:2008 byte */
> > >> +                              xstrfmt(Q_("byte", "bytes", bytes));
> > >
> > > We have already defined "byte" as an l10n string without plural forms in the
> > > file "t/helper/test-simple-ipc.c" via commit 36a7eb6876 (t0052: add simple-ipc
> > > tests and t/helper/test-simple-ipc tool, 2021-03-22 10:29:48 +0000).
> > >
> > >     OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
> > >
> > > The newly introduced usage of "byte" is now marked as having a plural form
> > > (via Q_("byte", "bytes", bytes)), which causes a conflict. This results in make
> > > pot failing with the following error:
> > >
> > >     msgcat: msgid 'byte' is used without plural and with plural.
> > >
> > > This happens because gettext requires that a given msgid be treated
> > > consistently—either exclusively as a singular string or as part of a plural
> > > construct—but not both.
> > >
> > > To resolve this conflict, we can unmark the singular "byte" in
> > > t/helper/test-simple-ipc.c, allowing it to reuse the translation from the
> > > plural-form definition of "byte".
> >
> > I learned a new thing today and am happy :).
> >
> > But how does one "unmark" the singular "byte" there, exactly?
> >
> > Would something like this ...
> >
> >      OPT_STRING(0, "byte", &bytevalue, Q_("byte", "bytes", 1), N_("ballast character")),
> >
> > ... be a good idea, to "mark" it as a countable noun that has a plural
> > form?
> >
> > Or did you mean that we can simply drop N_() around it, i.e.,
> > N_("byte") -> "byte", to discard the i18n, because it merely is a
> > test helper?
> 
> I prefer dropping N_() for "byte" in "t/helper/test-simple-ipc.c", and
> the i18n for the test helper will continue to work as before if we also
> mark the plural-form of "byte" in this patch series. (i.e., drop the N_()
> for "byte" in the test helper in this patch.)
> 
> This is because N_() is a macro that does not invoke any gettext
> function, only returns msgid as in gettext.h:
> 
>     #define N_(msgid) msgid
> 
> And the actual translation for the msgid (the argh field of an option)
> occurs later by calling:
> 
>     opts->argh ? _(opts->argh) : _("...")
> 
> in "parse-options.c".
> 
> However, replacing N_() with Q_() would cause the string to be
> processed by gettext twice: once at runtime via Q_(), and again
> when _(opts->argh) is evaluated.

Thanks both! This thread has been very informative. In the next version
I'll go ahead and drop the N_() here for this patch. :)

-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v2 3/7] builtin/repo: humanise count values in structure output
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
  2025-12-12 22:36   ` [PATCH v2 1/7] builtin/repo: group per-type object values into struct Justin Tobler
  2025-12-12 22:36   ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-12 22:36   ` Justin Tobler
  2025-12-15  5:33     ` Patrick Steinhardt
  2025-12-12 22:36   ` [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.

For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 45 +++++++++++++++++++++-------
 strbuf.c                  | 23 +++++++++++++++
 strbuf.h                  |  7 +++++
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
 4 files changed, 95 insertions(+), 42 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..d3dfe416d0 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
 
 	int name_col_width;
 	int value_col_width;
+	int unit_col_width;
 };
 
 /*
@@ -230,6 +231,7 @@ struct stats_table {
  */
 struct stats_table_entry {
 	char *value;
+	char *unit;
 };
 
 static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
 
 	if (name_width > table->name_col_width)
 		table->name_col_width = name_width;
-	if (entry) {
+	if (!entry)
+		return;
+	if (entry->value) {
 		int value_width = utf8_strwidth(entry->value);
 		if (value_width > table->value_col_width)
 			table->value_col_width = value_width;
 	}
+	if (entry->unit) {
+		int unit_width = utf8_strwidth(entry->unit);
+		if (unit_width > table->unit_col_width)
+			table->unit_col_width = unit_width;
+	}
 }
 
 static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -270,10 +279,13 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 				   const char *format, ...)
 {
 	struct stats_table_entry *entry;
+	struct strbuf buf = STRBUF_INIT;
 	va_list ap;
 
 	CALLOC_ARRAY(entry, 1);
-	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+
+	entry->unit = strbuf_humanise_count_value(&buf, value);
+	entry->value = strbuf_detach(&buf, NULL);
 
 	va_start(ap, format);
 	stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +336,24 @@ static void stats_table_print_structure(const struct stats_table *table)
 {
 	const char *name_col_title = _("Repository structure");
 	const char *value_col_title = _("Value");
-	int name_col_width = utf8_strwidth(name_col_title);
-	int value_col_width = utf8_strwidth(value_col_title);
+	int title_name_width = utf8_strwidth(name_col_title);
+	int title_value_width = utf8_strwidth(value_col_title);
+	int name_col_width = table->name_col_width;
+	int value_col_width = table->value_col_width;
+	int unit_col_width = table->unit_col_width;
 	struct string_list_item *item;
 	struct strbuf buf = STRBUF_INIT;
 
-	if (table->name_col_width > name_col_width)
-		name_col_width = table->name_col_width;
-	if (table->value_col_width > value_col_width)
-		value_col_width = table->value_col_width;
+	if (title_name_width > name_col_width)
+		name_col_width = title_name_width;
+	if (title_value_width > value_col_width + unit_col_width + 1)
+		value_col_width = title_value_width - unit_col_width;
 
 	strbuf_addstr(&buf, "| ");
 	strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
 	strbuf_addstr(&buf, " | ");
-	strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+	strbuf_utf8_align(&buf, ALIGN_LEFT,
+			  value_col_width + unit_col_width + 1, value_col_title);
 	strbuf_addstr(&buf, " |");
 	printf("%s\n", buf.buf);
 
@@ -345,17 +361,20 @@ static void stats_table_print_structure(const struct stats_table *table)
 	for (int i = 0; i < name_col_width; i++)
 		putchar('-');
 	printf(" | ");
-	for (int i = 0; i < value_col_width; i++)
+	for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
 		putchar('-');
 	printf(" |\n");
 
 	for_each_string_list_item(item, &table->rows) {
 		struct stats_table_entry *entry = item->util;
 		const char *value = "";
+		const char *unit = "";
 
 		if (entry) {
 			struct stats_table_entry *entry = item->util;
 			value = entry->value;
+			if (entry->unit)
+				unit = entry->unit;
 		}
 
 		strbuf_reset(&buf);
@@ -363,6 +382,8 @@ static void stats_table_print_structure(const struct stats_table *table)
 		strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
 		strbuf_addstr(&buf, " | ");
 		strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+		strbuf_addch(&buf, ' ');
+		strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
 		strbuf_addstr(&buf, " |");
 		printf("%s\n", buf.buf);
 	}
@@ -377,8 +398,10 @@ static void stats_table_clear(struct stats_table *table)
 
 	for_each_string_list_item(item, &table->rows) {
 		entry = item->util;
-		if (entry)
+		if (entry) {
 			free(entry->value);
+			free(entry->unit);
+		}
 	}
 
 	string_list_clear(&table->rows, 1);
diff --git a/strbuf.c b/strbuf.c
index 1fb47bf21b..cebb1593ab 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,6 +836,29 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
+char *strbuf_humanise_count_value(struct strbuf *buf, size_t value)
+{
+	if (value >= 1000000000) {
+		uintmax_t x = (uintmax_t)value + 5000000; /* for rounding */
+		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
+			    x / 1000000000, x % 1000000000 / 10000000);
+		return xstrfmt(_("G"));
+	} else if (value >= 1000000) {
+		uintmax_t x = (uintmax_t)value + 5000; /* for rounding */
+		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
+			    x / 1000000, x % 1000000 / 10000);
+		return xstrfmt(_("M"));
+	} else if (value >= 1000) {
+		uintmax_t x = (uintmax_t)value + 5; /* for rounding */
+		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
+			    x / 1000, x % 1000 / 10);
+		return xstrfmt(_("k"));
+	} else {
+		strbuf_addf(buf, "%" PRIuMAX, (uintmax_t)value);
+		return NULL;
+	}
+}
+
 char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
 {
 	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
diff --git a/strbuf.h b/strbuf.h
index a5e3ab0cb4..7532eadd02 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -376,6 +376,13 @@ void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
  */
 char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
 
+/**
+ * Append the given count value as a human-readable string that is downscaled by
+ * some factor. A string with the corresponding unit prefix is returned
+ * separately.
+ */
+char *strbuf_humanise_count_value(struct strbuf *buf, size_t value);
+
 /**
  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
  * 3.50 MiB).
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
 	(
 		cd repo &&
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     0 |
-		|     * Branches       |     0 |
-		|     * Tags           |     0 |
-		|     * Remotes        |     0 |
-		|     * Others         |     0 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |     0 |
-		|     * Commits        |     0 |
-		|     * Trees          |     0 |
-		|     * Blobs          |     0 |
-		|     * Tags           |     0 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |     0  |
+		|     * Branches       |     0  |
+		|     * Tags           |     0  |
+		|     * Remotes        |     0  |
+		|     * Others         |     0  |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            |     0  |
+		|     * Commits        |     0  |
+		|     * Trees          |     0  |
+		|     * Blobs          |     0  |
+		|     * Tags           |     0  |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
 	git init repo &&
 	(
 		cd repo &&
-		test_commit_bulk 42 &&
+		test_commit_bulk 1005 &&
 		git tag -a foo -m bar &&
 
 		oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     4 |
-		|     * Branches       |     1 |
-		|     * Tags           |     1 |
-		|     * Remotes        |     1 |
-		|     * Others         |     1 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |   130 |
-		|     * Commits        |    43 |
-		|     * Trees          |    43 |
-		|     * Blobs          |    43 |
-		|     * Tags           |     1 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |    4   |
+		|     * Branches       |    1   |
+		|     * Tags           |    1   |
+		|     * Remotes        |    1   |
+		|     * Others         |    1   |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            | 3.02 k |
+		|     * Commits        | 1.01 k |
+		|     * Trees          | 1.01 k |
+		|     * Blobs          | 1.01 k |
+		|     * Tags           |    1   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 3/7] builtin/repo: humanise count values in structure output
  2025-12-12 22:36   ` [PATCH v2 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-15  5:33     ` Patrick Steinhardt
  0 siblings, 0 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15  5:33 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster

On Fri, Dec 12, 2025 at 04:36:40PM -0600, Justin Tobler wrote:
> diff --git a/strbuf.c b/strbuf.c
> index 1fb47bf21b..cebb1593ab 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -836,6 +836,29 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
>  	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
>  }
>  
> +char *strbuf_humanise_count_value(struct strbuf *buf, size_t value)
> +{
> +	if (value >= 1000000000) {
> +		uintmax_t x = (uintmax_t)value + 5000000; /* for rounding */
> +		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
> +			    x / 1000000000, x % 1000000000 / 10000000);
> +		return xstrfmt(_("G"));
> +	} else if (value >= 1000000) {
> +		uintmax_t x = (uintmax_t)value + 5000; /* for rounding */
> +		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
> +			    x / 1000000, x % 1000000 / 10000);
> +		return xstrfmt(_("M"));
> +	} else if (value >= 1000) {
> +		uintmax_t x = (uintmax_t)value + 5; /* for rounding */
> +		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
> +			    x / 1000, x % 1000 / 10);
> +		return xstrfmt(_("k"));
> +	} else {
> +		strbuf_addf(buf, "%" PRIuMAX, (uintmax_t)value);
> +		return NULL;
> +	}
> +}

Same comment here as on the previous patch: can't we return `const char *`
here if we drop all the allocations?
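
A minimal standalone sketch of what that could look like, keeping the
rounding arithmetic from the quoted hunk.  It is only an illustration
under a few assumptions: a caller-supplied buffer with snprintf()
replaces the strbuf, _() is a no-op stand-in for gettext, and
humanise_count() is a hypothetical name:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

#define _(msgid) (msgid) /* no-op stand-in for gettext */

/*
 * Write the downscaled count into `out` and return the unit prefix as
 * a constant string, or NULL when the value is printed unscaled.
 * Nothing is allocated, so the caller has nothing to free.
 */
static const char *humanise_count(char *out, size_t len, size_t value)
{
	assert(out && len);

	if (value >= 1000000000) {
		uintmax_t x = (uintmax_t)value + 5000000; /* for rounding */
		snprintf(out, len, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000000000, x % 1000000000 / 10000000);
		return _("G");
	} else if (value >= 1000000) {
		uintmax_t x = (uintmax_t)value + 5000; /* for rounding */
		snprintf(out, len, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000000, x % 1000000 / 10000);
		return _("M");
	} else if (value >= 1000) {
		uintmax_t x = (uintmax_t)value + 5; /* for rounding */
		snprintf(out, len, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000, x % 1000 / 10);
		return _("k");
	}

	snprintf(out, len, "%" PRIuMAX, (uintmax_t)value);
	return NULL;
}
```

With a constant return value, the `unit` member of the table entry
would presumably become `const char *` as well, and the free() of it
in stats_table_clear() would go away.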

Patrick

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
                     ` (2 preceding siblings ...)
  2025-12-12 22:36   ` [PATCH v2 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-12 22:36   ` Justin Tobler
  2025-12-15  5:33     ` Patrick Steinhardt
  2025-12-12 22:36   ` [PATCH v2 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.

For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 32 ++++++++++++++++++++++++++++++++
 t/t1901-repo-structure.sh   |  6 +++++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
 +
 * Reference counts categorized by type
 * Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index d3dfe416d0..3a2d15cec4 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
 
 #include "builtin.h"
 #include "environment.h"
+#include "hex.h"
+#include "odb.h"
 #include "parse-options.h"
 #include "path-walk.h"
 #include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
 
 struct object_stats {
 	struct object_values type_counts;
+	struct object_values inflated_sizes;
 };
 
 struct repo_structure {
@@ -428,6 +431,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
+	printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+	printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+	printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -491,6 +503,7 @@ static void structure_count_references(struct ref_stats *stats,
 }
 
 struct count_objects_data {
+	struct object_database *odb;
 	struct object_stats *stats;
 	struct progress *progress;
 };
@@ -500,20 +513,38 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 {
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
+	size_t inflated_total = 0;
 	size_t object_count;
 
+	for (size_t i = 0; i < oids->nr; i++) {
+		struct object_info oi = OBJECT_INFO_INIT;
+		unsigned long inflated;
+
+		oi.sizep = &inflated;
+
+		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+						  OBJECT_INFO_FOR_PREFETCH) < 0)
+			continue;
+
+		inflated_total += inflated;
+	}
+
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
+		stats->inflated_sizes.tags += inflated_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
+		stats->inflated_sizes.commits += inflated_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
+		stats->inflated_sizes.trees += inflated_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
+		stats->inflated_sizes.blobs += inflated_total;
 		break;
 	default:
 		BUG("invalid object type");
@@ -531,6 +562,7 @@ static void structure_count_objects(struct object_stats *stats,
 {
 	struct path_walk_info info = PATH_WALK_INFO_INIT;
 	struct count_objects_data data = {
+		.odb = repo->objects,
 		.stats = stats,
 	};
 
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..33237822fd 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
 	)
 '
 
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
 		objects.trees.count=42
 		objects.blobs.count=42
 		objects.tags.count=1
+		objects.commits.inflated_size=9225
+		objects.trees.inflated_size=28554
+		objects.blobs.inflated_size=453
+		objects.tags.inflated_size=132
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0



* Re: [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-12 22:36   ` [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-15  5:33     ` Patrick Steinhardt
  2025-12-15 16:48       ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15  5:33 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster

On Fri, Dec 12, 2025 at 04:36:41PM -0600, Justin Tobler wrote:
> diff --git a/builtin/repo.c b/builtin/repo.c
> index d3dfe416d0..3a2d15cec4 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -500,20 +513,38 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
>  {
>  	struct count_objects_data *data = cb_data;
>  	struct object_stats *stats = data->stats;
> +	size_t inflated_total = 0;
>  	size_t object_count;
>  
> +	for (size_t i = 0; i < oids->nr; i++) {
> +		struct object_info oi = OBJECT_INFO_INIT;
> +		unsigned long inflated;
> +
> +		oi.sizep = &inflated;
> +
> +		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
> +						  OBJECT_INFO_FOR_PREFETCH) < 0)

Using `OBJECT_INFO_FOR_PREFETCH` feels a bit weird to me, as we're not
in a context where we want to do a prefetch. And if we ever were to
extend that flag with more semantics that are relevant only to
prefetches, then this code here might break.

Using `OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK` does make
sense though, so I'd suggest expanding the flag here.

Patrick
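
The suggestion above can be sketched roughly as follows. This is only
an illustration under assumptions: the `SKETCH_*` names, their numeric
values and the `sketch_size_query_flags()` helper are hypothetical and
not taken from git's headers, and the sketch assumes the composite flag
is currently nothing more than the OR of the two individual flags.

```c
#include <stdio.h>

/* Hypothetical flag values, chosen for illustration only. */
enum {
	SKETCH_INFO_QUICK = 1u << 3,
	SKETCH_INFO_SKIP_FETCH_OBJECT = 1u << 4,
	/* Assumed: the composite is currently just the OR of the two. */
	SKETCH_INFO_FOR_PREFETCH = SKETCH_INFO_QUICK |
				   SKETCH_INFO_SKIP_FETCH_OBJECT,
};

/*
 * Spell out the two behaviours the size query relies on instead of
 * borrowing the prefetch composite. If the composite later gains
 * prefetch-only bits, this call site is unaffected.
 */
static unsigned sketch_size_query_flags(void)
{
	return SKETCH_INFO_SKIP_FETCH_OBJECT | SKETCH_INFO_QUICK;
}
```

Under the stated assumption the explicit form yields the same bits as
the composite today, so the change would not alter behaviour; it only
decouples the caller from the composite's future meaning.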


* Re: [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-15  5:33     ` Patrick Steinhardt
@ 2025-12-15 16:48       ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 16:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, gitster

On 25/12/15 06:33AM, Patrick Steinhardt wrote:
> On Fri, Dec 12, 2025 at 04:36:41PM -0600, Justin Tobler wrote:
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index d3dfe416d0..3a2d15cec4 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -500,20 +513,38 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
> >  {
> >  	struct count_objects_data *data = cb_data;
> >  	struct object_stats *stats = data->stats;
> > +	size_t inflated_total = 0;
> >  	size_t object_count;
> >  
> > +	for (size_t i = 0; i < oids->nr; i++) {
> > +		struct object_info oi = OBJECT_INFO_INIT;
> > +		unsigned long inflated;
> > +
> > +		oi.sizep = &inflated;
> > +
> > +		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
> > +						  OBJECT_INFO_FOR_PREFETCH) < 0)
> 
> Using `OBJECT_INFO_FOR_PREFETCH` feels a bit weird to me, as we're not
> in a context where we want to do a prefetch. And if we ever were to
> extend that flag with more semantics that are relevant only to
> prefetches, then this code here might break.
> 
> Using `OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK` does make
> sense though, so I'd suggest expanding the flag here.

Good points. I'll update in the next version.

-Justin


* [PATCH v2 5/7] builtin/repo: add inflated object info to structure table
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
                     ` (3 preceding siblings ...)
  2025-12-12 22:36   ` [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-12 22:36   ` Justin Tobler
  2025-12-12 22:36   ` [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 37 +++++++++++++++++++++--
 strbuf.c                  |  4 +++
 strbuf.h                  |  3 +-
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
 4 files changed, 76 insertions(+), 30 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 3a2d15cec4..b0609cfae5 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -295,6 +295,24 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_end(ap);
 }
 
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+				  const char *format, ...)
+{
+	struct stats_table_entry *entry;
+	struct strbuf buf = STRBUF_INIT;
+	va_list ap;
+
+	CALLOC_ARRAY(entry, 1);
+
+	entry->unit = strbuf_humanise_bytes_value(&buf, value,
+						  STRBUF_HUMANISE_COMPACT);
+	entry->value = strbuf_detach(&buf, NULL);
+
+	va_start(ap, format);
+	stats_table_vaddf(table, entry, format, ap);
+	va_end(ap);
+}
+
 static inline size_t get_total_reference_count(struct ref_stats *stats)
 {
 	return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -310,7 +328,8 @@ static void stats_table_setup_structure(struct stats_table *table,
 {
 	struct object_stats *objects = &stats->objects;
 	struct ref_stats *refs = &stats->refs;
-	size_t object_total;
+	size_t inflated_object_total;
+	size_t object_count_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -321,10 +340,10 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_values(&objects->type_counts);
+	object_count_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
-	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
+	stats_table_count_addf(table, object_count_total, "  * %s", _("Count"));
 	stats_table_count_addf(table, objects->type_counts.commits,
 			       "    * %s", _("Commits"));
 	stats_table_count_addf(table, objects->type_counts.trees,
@@ -333,6 +352,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			       "    * %s", _("Blobs"));
 	stats_table_count_addf(table, objects->type_counts.tags,
 			       "    * %s", _("Tags"));
+
+	inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+	stats_table_size_addf(table, inflated_object_total,
+			      "  * %s", _("Inflated size"));
+	stats_table_size_addf(table, objects->inflated_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->inflated_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->inflated_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->inflated_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/strbuf.c b/strbuf.c
index cebb1593ab..eed4e167ca 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -882,6 +882,10 @@ char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flag
 		return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
 	} else {
 		strbuf_addf(buf, "%u", (unsigned)bytes);
+		if (flags & STRBUF_HUMANISE_COMPACT)
+			return humanise_rate ?
+				       xstrfmt(_("B/s")) :
+				       xstrfmt(_("B"));
 		return humanise_rate ?
 			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
 			       xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
diff --git a/strbuf.h b/strbuf.h
index 7532eadd02..919527d26b 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,7 +367,8 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
  */
 void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
 
-#define STRBUF_HUMANISE_RATE 1 << 0
+#define STRBUF_HUMANISE_RATE	1 << 0
+#define STRBUF_HUMANISE_COMPACT 1 << 1
 
 /**
  * Append the given byte size as a human-readable string that is downscaled by
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 33237822fd..b18213c660 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
 		| Repository structure | Value  |
 		| -------------------- | ------ |
 		| * References         |        |
-		|   * Count            |     0  |
-		|     * Branches       |     0  |
-		|     * Tags           |     0  |
-		|     * Remotes        |     0  |
-		|     * Others         |     0  |
+		|   * Count            |    0   |
+		|     * Branches       |    0   |
+		|     * Tags           |    0   |
+		|     * Remotes        |    0   |
+		|     * Others         |    0   |
 		|                      |        |
 		| * Reachable objects  |        |
-		|   * Count            |     0  |
-		|     * Commits        |     0  |
-		|     * Trees          |     0  |
-		|     * Blobs          |     0  |
-		|     * Tags           |     0  |
+		|   * Count            |    0   |
+		|     * Commits        |    0   |
+		|     * Trees          |    0   |
+		|     * Blobs          |    0   |
+		|     * Tags           |    0   |
+		|   * Inflated size    |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
 	)
 '
 
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value  |
-		| -------------------- | ------ |
-		| * References         |        |
-		|   * Count            |    4   |
-		|     * Branches       |    1   |
-		|     * Tags           |    1   |
-		|     * Remotes        |    1   |
-		|     * Others         |    1   |
-		|                      |        |
-		| * Reachable objects  |        |
-		|   * Count            | 3.02 k |
-		|     * Commits        | 1.01 k |
-		|     * Trees          | 1.01 k |
-		|     * Blobs          | 1.01 k |
-		|     * Tags           |    1   |
+		| Repository structure | Value      |
+		| -------------------- | ---------- |
+		| * References         |            |
+		|   * Count            |      4     |
+		|     * Branches       |      1     |
+		|     * Tags           |      1     |
+		|     * Remotes        |      1     |
+		|     * Others         |      1     |
+		|                      |            |
+		| * Reachable objects  |            |
+		|   * Count            |   3.02 k   |
+		|     * Commits        |   1.01 k   |
+		|     * Trees          |   1.01 k   |
+		|     * Blobs          |   1.01 k   |
+		|     * Tags           |      1     |
+		|   * Inflated size    |  16.03 MiB |
+		|     * Commits        | 217.92 KiB |
+		|     * Trees          |  15.81 MiB |
+		|     * Blobs          |  11.68 KiB |
+		|     * Tags           |    132 B   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0



* [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
                     ` (4 preceding siblings ...)
  2025-12-12 22:36   ` [PATCH v2 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-12 22:36   ` Justin Tobler
  2025-12-15  5:33     ` Patrick Steinhardt
  2025-12-12 22:36   ` [PATCH v2 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
  7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 18 ++++++++++++++++++
 t/t1901-repo-structure.sh   | 11 ++++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
 * Reference counts categorized by type
 * Reachable object counts categorized by type
 * Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index b0609cfae5..252a53f452 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
 struct object_stats {
 	struct object_values type_counts;
 	struct object_values inflated_sizes;
+	struct object_values disk_sizes;
 };
 
 struct repo_structure {
@@ -471,6 +472,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
 
+	printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+	printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+	printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+	printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -545,37 +555,45 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
 	size_t inflated_total = 0;
+	size_t disk_total = 0;
 	size_t object_count;
 
 	for (size_t i = 0; i < oids->nr; i++) {
 		struct object_info oi = OBJECT_INFO_INIT;
 		unsigned long inflated;
+		off_t disk;
 
 		oi.sizep = &inflated;
+		oi.disk_sizep = &disk;
 
 		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
 						  OBJECT_INFO_FOR_PREFETCH) < 0)
 			continue;
 
 		inflated_total += inflated;
+		disk_total += disk;
 	}
 
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
 		stats->inflated_sizes.tags += inflated_total;
+		stats->disk_sizes.tags += disk_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
 		stats->inflated_sizes.commits += inflated_total;
+		stats->disk_sizes.commits += disk_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
 		stats->inflated_sizes.trees += inflated_total;
+		stats->disk_sizes.trees += disk_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
 		stats->inflated_sizes.blobs += inflated_total;
+		stats->disk_sizes.blobs += disk_total;
 		break;
 	default:
 		BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index b18213c660..1553f3cd32 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,11 @@ test_description='test git repo structure'
 
 . ./test-lib.sh
 
+object_type_disk_usage() {
+	git cat-file --batch-check='%(objectsize:disk)' --batch-all-objects \
+		--filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
+}
+
 test_expect_success 'empty repository' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -91,7 +96,7 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		test_commit_bulk 42 &&
 		git tag -a foo -m bar &&
 
-		cat >expect <<-\EOF &&
+		cat >expect <<-EOF &&
 		references.branches.count=1
 		references.tags.count=1
 		references.remotes.count=0
@@ -104,6 +109,10 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		objects.trees.inflated_size=28554
 		objects.blobs.inflated_size=453
 		objects.tags.inflated_size=132
+		objects.commits.disk_size=$(object_type_disk_usage commit)
+		objects.trees.disk_size=$(object_type_disk_usage tree)
+		objects.blobs.disk_size=$(object_type_disk_usage blob)
+		objects.tags.disk_size=$(object_type_disk_usage tag)
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0



* Re: [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue stucture output
  2025-12-12 22:36   ` [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
@ 2025-12-15  5:33     ` Patrick Steinhardt
  0 siblings, 0 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15  5:33 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster

On Fri, Dec 12, 2025 at 04:36:43PM -0600, Justin Tobler wrote:
> diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> index b18213c660..1553f3cd32 100755
> --- a/t/t1901-repo-structure.sh
> +++ b/t/t1901-repo-structure.sh
> @@ -4,6 +4,11 @@ test_description='test git repo structure'
>  
>  . ./test-lib.sh
>  
> +object_type_disk_usage() {
> +	git cat-file --batch-check='%(objectsize:disk)' --batch-all-objects \
> +		--filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
> +}
> +

Using `git rev-list --all --disk-usage --filter=object:type=$1
--filter-provided-objects` would avoid the separate call to awk(1).

Patrick


* [PATCH v2 7/7] builtin/repo: add object disk size info to structure table
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
                     ` (5 preceding siblings ...)
  2025-12-12 22:36   ` [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
@ 2025-12-12 22:36   ` Justin Tobler
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.

Since disk size may vary between platforms, tests do not validate actual
values and only check that size info is printed in an empty repository.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 13 +++++++++++++
 t/t1901-repo-structure.sh | 19 ++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 252a53f452..c294fa11d2 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -331,6 +331,7 @@ static void stats_table_setup_structure(struct stats_table *table,
 	struct ref_stats *refs = &stats->refs;
 	size_t inflated_object_total;
 	size_t object_count_total;
+	size_t disk_object_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -365,6 +366,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			      "    * %s", _("Blobs"));
 	stats_table_size_addf(table, objects->inflated_sizes.tags,
 			      "    * %s", _("Tags"));
+
+	disk_object_total = get_total_object_values(&objects->disk_sizes);
+	stats_table_size_addf(table, disk_object_total,
+			      "  * %s", _("Disk size"));
+	stats_table_size_addf(table, objects->disk_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->disk_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->disk_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->disk_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 1553f3cd32..6a992222df 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -9,6 +9,15 @@ object_type_disk_usage() {
 		--filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
 }
 
+strip_object_disk_usage() {
+	awk '
+		/^\|   \* Disk size/ { skip=1; next }
+		skip && /^\|     \* / { next }
+		skip && !/^\|     \* / { skip=0 }
+		{ print }
+	' $1
+}
+
 test_expect_success 'empty repository' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -35,6 +44,11 @@ test_expect_success 'empty repository' '
 		|     * Trees          |    0 B |
 		|     * Blobs          |    0 B |
 		|     * Tags           |    0 B |
+		|   * Disk size        |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -81,7 +95,10 @@ test_expect_success SHA1 'repository with references and objects' '
 		|     * Tags           |    132 B   |
 		EOF
 
-		git repo structure >out 2>err &&
+		git repo structure >out.raw 2>err &&
+
+		# Skip object disk sizes due to platform variance.
+		strip_object_disk_usage out.raw >out &&
 
 		test_cmp expect out &&
 		test_line_count = 0 err
-- 
2.52.0.209.ge85ae279b0



* [PATCH v3 0/7] builtin/repo: add object size info to structure output
  2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
                     ` (6 preceding siblings ...)
  2025-12-12 22:36   ` [PATCH v2 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-15 20:56   ` Justin Tobler
  2025-12-15 20:56     ` [PATCH v3 1/7] builtin/repo: group per-type object values into struct Justin Tobler
                       ` (7 more replies)
  7 siblings, 8 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

Greetings,

This patch series extends the recently introduced "structure" subcommand
for git-repo(1) to collect object size information. More specifically,
it shows total inflated and disk sizes of objects by object type. The
aim is to provide additional insight that may be useful to users regarding
the structure of a repository.

In addition to this change, this series also updates the table output
format to downscale larger output values along with the appropriate unit
prefix. This is done to make table output more human friendly. The
keyvalue and nul output formats are left the same since they are
intended more for machine parsing.

Changes in V3:
- Address potential localization regression by making the downscaled
  number format string also translatable. Also make the format string
  for how the values and unit prefixes are displayed via
  `strbuf_humanise_{bytes,rate}()` translatable to be more flexible.
- `strbuf_humanise_{bytes,count}_value()` has been renamed to
  `humanise_{bytes,count}()` and updated to provide both the value and
  unit prefix as separate strings.
- Unit prefix strings are no longer allocated and are instead constant.
- The humanise flags are now defined in an enum.
- Instead of using `OBJECT_INFO_FOR_PREFETCH`,
  `OBJECT_INFO_SKIP_FETCH_OBJECT` and `OBJECT_INFO_QUICK` are used
  explicitly.
- Tests now use git-rev-list(1) to verify disk size info.
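
To make the value/unit split described above concrete, here is a
minimal sketch. It is not the `humanise_bytes()` from this series: the
`sketch_humanise_bytes()` name and its signature are hypothetical, it
writes into a caller-provided buffer instead of allocating, it
truncates to two decimals rather than rounding, and it skips
translation entirely. What it does show is the downscaled value and a
constant unit string being produced separately.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes the downscaled value into `value` and
 * returns a constant unit string that the caller must not free.
 */
static const char *sketch_humanise_bytes(unsigned long long bytes,
					 char *value, size_t len)
{
	static const char *const units[] = { "B", "KiB", "MiB", "GiB" };
	unsigned shift = 0, idx = 0;

	/* Pick the largest unit (up to GiB) that keeps the value >= 1. */
	while (idx < 3 && (bytes >> (shift + 10)) > 0) {
		shift += 10;
		idx++;
	}

	if (!shift) {
		snprintf(value, len, "%llu", bytes);
	} else {
		unsigned long long whole = bytes >> shift;
		unsigned long long rem = bytes & ((1ULL << shift) - 1);

		/* Two decimal places, truncated rather than rounded. */
		snprintf(value, len, "%llu.%02llu", whole,
			 (rem * 100) >> shift);
	}

	return units[idx];
}
```

Because the two strings come back separately, a table printer can
right-align the value column and left-align the unit column on its
own, for example with a format such as `"%6s %-3s"`.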

Changes in V2:
- Factor out and reuse existing logic from strbuf_humanise() to handle
  downscaling values and determining the appropriate unit prefix
  separately. This enables more control over how exactly the values are
  written to the structure output table which is useful for alignment
  reasons. I'm not sure about the interface used in patch 2. Feedback is
  most welcome.
- In the previous version, when checking object size on a missing object
  we would die. Instead we now ignore missing objects. This allows the
  structure command to work on partial clones.
- disk/inflated keyvalue names renamed to disk_size/inflated_size.
- Unit prefixes are marked for translation.
- The tests for keyvalue disk size values are updated to check against
  real expected values instead of skipping. Table output tests still
  skip verifying human-readable values though.
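
The "ignore missing objects" behaviour mentioned above can be sketched
in isolation. Everything here is a hypothetical stand-in:
`struct sketch_object` and `sketch_read_info()` model an object lookup
that fails for objects not present locally (as in a partial clone), and
none of it is git's actual object database API.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object known to the repository. */
struct sketch_object {
	int present;            /* 0 models an object missing locally */
	unsigned long inflated; /* inflated size in bytes */
	long disk;              /* on-disk size in bytes */
};

/* Hypothetical lookup: returns < 0 when the object is missing. */
static int sketch_read_info(const struct sketch_object *obj,
			    unsigned long *inflated, long *disk)
{
	if (!obj->present)
		return -1;
	*inflated = obj->inflated;
	*disk = obj->disk;
	return 0;
}

/* Sum both sizes, skipping objects whose lookup fails. */
static void sketch_sum_sizes(const struct sketch_object *objs, size_t nr,
			     size_t *inflated_total, size_t *disk_total)
{
	*inflated_total = 0;
	*disk_total = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned long inflated;
		long disk;

		if (sketch_read_info(&objs[i], &inflated, &disk) < 0)
			continue; /* missing object: skip, do not die */

		*inflated_total += inflated;
		*disk_total += (size_t)disk;
	}
}
```

One consequence of this design is that the totals only cover objects
that are actually available locally, so on a partial clone the sizes
can undercount relative to the object counts.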

Thanks,
-Justin

Justin Tobler (7):
  builtin/repo: group per-type object values into struct
  strbuf: split out logic to humanise byte values
  builtin/repo: humanise count values in structure output
  builtin/repo: add inflated object info to keyvalue structure output
  builtin/repo: add inflated object info to structure table
  builtin/repo: add disk size info to keyvalue stucture output
  builtin/repo: add object disk size info to structure table

 Documentation/git-repo.adoc |   2 +
 builtin/repo.c              | 175 ++++++++++++++++++++++++++++++------
 strbuf.c                    |  93 ++++++++++++-------
 strbuf.h                    |  25 ++++++
 t/t1901-repo-structure.sh   | 113 +++++++++++++++--------
 5 files changed, 311 insertions(+), 97 deletions(-)

Range-diff against v2:
1:  be14de68f6 = 1:  be14de68f6 builtin/repo: group per-type object values into struct
2:  5ca6f9b708 ! 2:  1fa33f5906 strbuf: split out logic to humanise byte values
    @@ Commit message
         the git-repo(1) "structure" subcommand will be shown in a more
         human-readable format with the appropriate unit prefixes. For this
         usecase, the downscaled values and unit prefixes must be handled
    -    separately to ensure proper column alignment. Refactor strbuf_humanise()
    -    to instead append the downscaled byte value to the buffer only and
    -    return the appropriate unit prefix string.
    +    separately to ensure proper column alignment.
    +
    +    Split out logic from strbuf_humanise() to downscale byte values and
    +    determine the corresponding unit prefix into a separate humanise_bytes()
    +    function that provides separate value and unit strings.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
      
     -static void strbuf_humanise(struct strbuf *buf, off_t bytes,
     -				 int humanise_rate)
    -+char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
    ++void humanise_bytes(off_t bytes, char **value, const char **unit,
    ++		    unsigned flags)
      {
    -+	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
    ++	int humanise_rate = flags & HUMANISE_RATE;
     +
      	if (bytes > 1 << 30) {
     -		strbuf_addf(buf,
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
     -					/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
     -					_("%u.%2.2u GiB/s"),
     -			    (unsigned)(bytes >> 30),
    -+		strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
    - 			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
    +-			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
    ++		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(bytes >> 30),
    ++				 (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
     +		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
    -+		return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
    ++		*unit = humanise_rate ? _("GiB/s") : _("GiB");
      	} else if (bytes > 1 << 20) {
     -		unsigned x = bytes + 5243;  /* for rounding */
     -		strbuf_addf(buf,
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
     -					_("%u.%2.2u MiB/s"),
     -			    x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
     +		unsigned x = bytes + 5243; /* for rounding */
    -+		strbuf_addf(buf, "%u.%2.2u", x >> 20,
    -+			    ((x & ((1 << 20) - 1)) * 100) >> 20);
    ++		*value = xstrfmt(_("%u.%2.2u"), x >> 20,
    ++				 ((x & ((1 << 20) - 1)) * 100) >> 20);
     +		/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
    -+		return humanise_rate ? xstrfmt(_("MiB/s")) : xstrfmt(_("MiB"));
    ++		*unit = humanise_rate ? _("MiB/s") : _("MiB");
      	} else if (bytes > 1 << 10) {
     -		unsigned x = bytes + 5;  /* for rounding */
     -		strbuf_addf(buf,
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
     -					_("%u.%2.2u KiB/s"),
     -			    x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
     +		unsigned x = bytes + 5; /* for rounding */
    -+		strbuf_addf(buf, "%u.%2.2u", x >> 10,
    -+			    ((x & ((1 << 10) - 1)) * 100) >> 10);
    ++		*value = xstrfmt(_("%u.%2.2u"), x >> 10,
    ++				 ((x & ((1 << 10) - 1)) * 100) >> 10);
     +		/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
    -+		return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
    ++		*unit = humanise_rate ? _("KiB/s") : _("KiB");
      	} else {
     -		strbuf_addf(buf,
     -				humanise_rate == 0 ?
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
     -					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
     -					Q_("%u byte/s", "%u bytes/s", bytes),
     -				(unsigned)bytes);
    -+		strbuf_addf(buf, "%u", (unsigned)bytes);
    -+		return humanise_rate ?
    ++		*value = xstrfmt(_("%u"), (unsigned)bytes);
    ++		*unit = humanise_rate ?
     +			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
    -+			       xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
    ++			       Q_("byte/s", "bytes/s", bytes) :
     +			       /* TRANSLATORS: IEC 80000-13:2008 byte */
    -+			       xstrfmt(Q_("byte", "bytes", bytes));
    ++			       Q_("byte", "bytes", bytes);
      	}
      }
      
    ++static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
    ++{
    ++	char *value;
    ++	const char *unit;
    ++
    ++	humanise_bytes(bytes, &value, &unit, flags);
    ++	strbuf_addf(buf, _("%s %s"), value, unit);
    ++	free(value);
    ++}
    ++
      void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
      {
    --	strbuf_humanise(buf, bytes, 0);
    -+	char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
    -+	strbuf_addf(buf, " %s", unit);
    -+	free(unit);
    - }
    + 	strbuf_humanise(buf, bytes, 0);
    +@@ strbuf.c: void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
      
      void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
      {
     -	strbuf_humanise(buf, bytes, 1);
    -+	char *unit = strbuf_humanise_bytes_value(buf, bytes, STRBUF_HUMANISE_RATE);
    -+	strbuf_addf(buf, " %s", unit);
    -+	free(unit);
    ++	strbuf_humanise(buf, bytes, HUMANISE_RATE);
      }
      
      int printf_ln(const char *fmt, ...)
    @@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbu
       */
      void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
      
    -+#define STRBUF_HUMANISE_RATE 1 << 0
    ++enum humanise_flags {
    ++	/*
    ++	 * Use rate based unit prefixes for humanised values.
    ++	 */
    ++	HUMANISE_RATE = (1 << 0),
    ++};
     +
     +/**
    -+ * Append the given byte size as a human-readable string that is downscaled by
    -+ * some factor. A string with the corresponding unit prefix is returned
    -+ * separately.
    ++ * Converts the given byte size into a downscaled human-readable value and
    ++ * corresponding unit prefix as two separate strings.
     + */
    -+char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
    ++void humanise_bytes(off_t bytes, char **value, const char **unit,
    ++		    unsigned flags);
     +
      /**
       * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
3:  2efc3533ef ! 3:  8f09f6358e builtin/repo: humanise count values in structure output
    @@ builtin/repo.c: struct stats_table {
       */
      struct stats_table_entry {
      	char *value;
    -+	char *unit;
    ++	const char *unit;
      };
      
      static void stats_table_vaddf(struct stats_table *table,
    @@ builtin/repo.c: static void stats_table_vaddf(struct stats_table *table,
      
      static void stats_table_addf(struct stats_table *table, const char *format, ...)
     @@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, size_t value,
    - 				   const char *format, ...)
    - {
    - 	struct stats_table_entry *entry;
    -+	struct strbuf buf = STRBUF_INIT;
      	va_list ap;
      
      	CALLOC_ARRAY(entry, 1);
     -	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
    -+
    -+	entry->unit = strbuf_humanise_count_value(&buf, value);
    -+	entry->value = strbuf_detach(&buf, NULL);
    ++	humanise_count(value, &entry->value, &entry->unit);
      
      	va_start(ap, format);
      	stats_table_vaddf(table, entry, format, ap);
    @@ builtin/repo.c: static void stats_table_print_structure(const struct stats_table
      		strbuf_addstr(&buf, " |");
      		printf("%s\n", buf.buf);
      	}
    -@@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
    - 
    - 	for_each_string_list_item(item, &table->rows) {
    - 		entry = item->util;
    --		if (entry)
    -+		if (entry) {
    - 			free(entry->value);
    -+			free(entry->unit);
    -+		}
    - 	}
    - 
    - 	string_list_clear(&table->rows, 1);
     
      ## strbuf.c ##
     @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
      	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
      }
      
    -+char *strbuf_humanise_count_value(struct strbuf *buf, size_t value)
    ++void humanise_count(size_t count, char **value, const char **unit)
     +{
    -+	if (value >= 1000000000) {
    -+		uintmax_t x = (uintmax_t)value + 5000000; /* for rounding */
    -+		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
    -+			    x / 1000000000, x % 1000000000 / 10000000);
    -+		return xstrfmt(_("G"));
    -+	} else if (value >= 1000000) {
    -+		uintmax_t x = (uintmax_t)value + 5000; /* for rounding */
    -+		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
    -+			    x / 1000000, x % 1000000 / 10000);
    -+		return xstrfmt(_("M"));
    -+	} else if (value >= 1000) {
    -+		uintmax_t x = (uintmax_t)value + 5; /* for rounding */
    -+		strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
    -+			    x / 1000, x % 1000 / 10);
    -+		return xstrfmt(_("k"));
    ++	if (count >= 1000000000) {
    ++		size_t x = count + 5000000; /* for rounding */
    ++		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
    ++				 (unsigned)(x % 1000000000 / 10000000));
    ++		*unit = _("G");
    ++	} else if (count >= 1000000) {
    ++		size_t x = count + 5000; /* for rounding */
    ++		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
    ++				 (unsigned)(x % 1000000 / 10000));
    ++		*unit = _("M");
    ++	} else if (count >= 1000) {
    ++		size_t x = count + 5; /* for rounding */
    ++		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
    ++				 (unsigned)(x % 1000 / 10));
    ++		*unit = _("k");
     +	} else {
    -+		strbuf_addf(buf, "%" PRIuMAX, (uintmax_t)value);
    -+		return NULL;
    ++		*value = xstrfmt(_("%u"), (unsigned)count);
    ++		*unit = NULL;
     +	}
     +}
     +
    - char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
    + void humanise_bytes(off_t bytes, char **value, const char **unit,
    + 		    unsigned flags)
      {
    - 	int humanise_rate = flags & STRBUF_HUMANISE_RATE;
     
      ## strbuf.h ##
    -@@ strbuf.h: void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
    -  */
    - char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
    +@@ strbuf.h: enum humanise_flags {
    + void humanise_bytes(off_t bytes, char **value, const char **unit,
    + 		    unsigned flags);
      
     +/**
    -+ * Append the given count value as a human-readable string that is downsacled by
    -+ * some factor. A string with the corresponding unit prefix is returned
    -+ * separately.
    ++ * Converts the given count into a downscaled human-readable value and
    ++ * corresponding unit prefix as two separate strings.
     + */
    -+char *strbuf_humanise_count_value(struct strbuf *buf, size_t value);
    ++void humanise_count(size_t count, char **value, const char **unit);
     +
      /**
       * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
4:  627b8bf025 ! 4:  3f4eabe94f builtin/repo: add inflated object info to keyvalue structure output
    @@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
     +		oi.sizep = &inflated;
     +
     +		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
    -+						  OBJECT_INFO_FOR_PREFETCH) < 0)
    ++						  OBJECT_INFO_SKIP_FETCH_OBJECT |
    ++							  OBJECT_INFO_QUICK) < 0)
     +			continue;
     +
     +		inflated_total += inflated;
5:  14f4983e1d ! 5:  85d1052100 builtin/repo: add inflated object info to structure table
    @@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, si
     +				  const char *format, ...)
     +{
     +	struct stats_table_entry *entry;
    -+	struct strbuf buf = STRBUF_INIT;
     +	va_list ap;
     +
     +	CALLOC_ARRAY(entry, 1);
    -+
    -+	entry->unit = strbuf_humanise_bytes_value(&buf, value,
    -+						  STRBUF_HUMANISE_COMPACT);
    -+	entry->value = strbuf_detach(&buf, NULL);
    ++	humanise_bytes(value, &entry->value, &entry->unit, HUMANISE_COMPACT);
     +
     +	va_start(ap, format);
     +	stats_table_vaddf(table, entry, format, ap);
    @@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
      static void stats_table_print_structure(const struct stats_table *table)
     
      ## strbuf.c ##
    -@@ strbuf.c: char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flag
    - 		return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
    +@@ strbuf.c: void humanise_bytes(off_t bytes, char **value, const char **unit,
    + 		*unit = humanise_rate ? _("KiB/s") : _("KiB");
      	} else {
    - 		strbuf_addf(buf, "%u", (unsigned)bytes);
    -+		if (flags & STRBUF_HUMANISE_COMPACT)
    -+			return humanise_rate ?
    -+				       xstrfmt(_("B/s")) :
    -+				       xstrfmt(_("B"));
    - 		return humanise_rate ?
    - 			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
    - 			       xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
    + 		*value = xstrfmt(_("%u"), (unsigned)bytes);
    +-		*unit = humanise_rate ?
    +-			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
    +-			       Q_("byte/s", "bytes/s", bytes) :
    +-			       /* TRANSLATORS: IEC 80000-13:2008 byte */
    +-			       Q_("byte", "bytes", bytes);
    ++		if (flags & HUMANISE_COMPACT)
    ++			*unit = humanise_rate ? _("B/s") : _("B");
    ++		else
    ++			*unit = humanise_rate ?
    ++					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
    ++					Q_("byte/s", "bytes/s", bytes) :
    ++					/* TRANSLATORS: IEC 80000-13:2008 byte */
    ++					Q_("byte", "bytes", bytes);
    + 	}
    + }
    + 
     
      ## strbuf.h ##
    -@@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
    -  */
    - void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
    - 
    --#define STRBUF_HUMANISE_RATE 1 << 0
    -+#define STRBUF_HUMANISE_RATE	1 << 0
    -+#define STRBUF_HUMANISE_COMPACT 1 << 1
    +@@ strbuf.h: enum humanise_flags {
    + 	 * Use rate based unit prefixes for humanised values.
    + 	 */
    + 	HUMANISE_RATE = (1 << 0),
    ++	/*
    ++	 * Use compact "B" unit prefixes instead of "byte/bytes" for humanised
    ++	 * values.
    ++	 */
    ++	HUMANISE_COMPACT = (1 << 1),
    + };
      
      /**
    -  * Append the given byte size as a human-readable string that is downscaled by
     
      ## t/t1901-repo-structure.sh ##
     @@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
6:  dc9e82889f ! 6:  e9fa9babec builtin/repo: add disk size info to keyvalue stucture output
    @@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
     +		oi.disk_sizep = &disk;
      
      		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
    - 						  OBJECT_INFO_FOR_PREFETCH) < 0)
    + 						  OBJECT_INFO_SKIP_FETCH_OBJECT |
    +@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_array *oids,
      			continue;
      
      		inflated_total += inflated;
    @@ t/t1901-repo-structure.sh: test_description='test git repo structure'
      . ./test-lib.sh
      
     +object_type_disk_usage() {
    -+	git cat-file --batch-check='%(objectsize:disk)' --batch-all-objects \
    -+		--filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
    ++	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
    ++		--filter-provided-objects
     +}
     +
      test_expect_success 'empty repository' '
7:  213b19dc7f ! 7:  df542c7bdf builtin/repo: add object disk size info to structure table
    @@ Commit message
         git-repo(1) structure command to display the total object disk usage by
         object type.
     
    -    Since disk size may vary between platforms, tests do not validate actual
    -    values and only check that size info is printed in an empty repository.
    -
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
      ## builtin/repo.c ##
    @@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
      static void stats_table_print_structure(const struct stats_table *table)
     
      ## t/t1901-repo-structure.sh ##
    -@@ t/t1901-repo-structure.sh: object_type_disk_usage() {
    - 		--filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
    - }
    +@@ t/t1901-repo-structure.sh: test_description='test git repo structure'
    + . ./test-lib.sh
      
    -+strip_object_disk_usage() {
    -+	awk '
    -+		/^\|   \* Disk size/ { skip=1; next }
    -+		skip && /^\|     \* / { next }
    -+		skip && !/^\|     \* / { skip=0 }
    -+		{ print }
    -+	' $1
    -+}
    + object_type_disk_usage() {
    +-	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
    +-		--filter-provided-objects
    ++	disk_usage_opt="--disk-usage"
    ++
    ++	if [ "$2" = "true" ]; then
    ++		disk_usage_opt="--disk-usage=human"
    ++	fi
     +
    ++	if [ "$1" = "all" ]; then
    ++		git rev-list --all --objects $disk_usage_opt
    ++	else
    ++		git rev-list --all --objects $disk_usage_opt \
    ++			--filter=object:type=$1 --filter-provided-objects
    ++	fi
    + }
    + 
      test_expect_success 'empty repository' '
    - 	test_when_finished "rm -rf repo" &&
    - 	git init repo &&
     @@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
      		|     * Trees          |    0 B |
      		|     * Blobs          |    0 B |
    @@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
      
      		git repo structure >out 2>err &&
     @@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references and objects' '
    + 		# Also creates a commit, tree, and blob.
    + 		git notes add -m foo &&
    + 
    +-		cat >expect <<-\EOF &&
    ++		cat >expect <<-EOF &&
    + 		| Repository structure | Value      |
    + 		| -------------------- | ---------- |
    + 		| * References         |            |
    +@@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references and objects' '
    + 		|     * Trees          |  15.81 MiB |
    + 		|     * Blobs          |  11.68 KiB |
      		|     * Tags           |    132 B   |
    ++		|   * Disk size        | $(object_type_disk_usage all true) |
    ++		|     * Commits        | $(object_type_disk_usage commit true) |
    ++		|     * Trees          | $(object_type_disk_usage tree true) |
    ++		|     * Blobs          |  $(object_type_disk_usage blob true) |
    ++		|     * Tags           |    $(object_type_disk_usage tag) B   |
      		EOF
      
    --		git repo structure >out 2>err &&
    -+		git repo structure >out.raw 2>err &&
    -+
    -+		# Skip object disk sizes due to platform variance.
    -+		strip_object_disk_usage out.raw >out &&
    - 
    - 		test_cmp expect out &&
    - 		test_line_count = 0 err
    + 		git repo structure >out 2>err &&

base-commit: e85ae279b0d58edc2f4c3fd5ac391b51e1223985
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v3 1/7] builtin/repo: group per-type object values into struct
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
@ 2025-12-15 20:56     ` Justin Tobler
  2025-12-15 20:56     ` [PATCH v3 2/7] strbuf: split out logic to humanise byte values Justin Tobler
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
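For readers following along, here is a minimal standalone sketch of why
the grouping helps: once the per-type values live in their own struct,
one total helper can serve any per-type measurement. This is an
illustration and not part of the patch; `object_stats_sketch`, its
`inflated_sizes` member and `total_object_values()` are hypothetical
names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the struct introduced by the patch. */
struct object_values {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/* Hypothetical container holding two per-type measurements. */
struct object_stats_sketch {
	struct object_values type_counts;
	struct object_values inflated_sizes; /* assumed later addition */
};

/* One helper sums any per-type measurement. */
static size_t total_object_values(const struct object_values *values)
{
	assert(values);
	return values->tags + values->commits + values->trees + values->blobs;
}
```

The same helper can then be called with `&stats.type_counts` or with any
other `struct object_values` member, with no per-measurement copy of the
summing logic.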
 builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
 	size_t others;
 };
 
-struct object_stats {
+struct object_values {
 	size_t tags;
 	size_t commits;
 	size_t trees;
 	size_t blobs;
 };
 
+struct object_stats {
+	struct object_values type_counts;
+};
+
 struct repo_structure {
 	struct ref_stats refs;
 	struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
 	return stats->branches + stats->remotes + stats->tags + stats->others;
 }
 
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
 {
-	return stats->tags + stats->commits + stats->trees + stats->blobs;
+	return values->tags + values->commits + values->trees + values->blobs;
 }
 
 static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_count(objects);
+	object_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
 	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
-	stats_table_count_addf(table, objects->commits, "    * %s", _("Commits"));
-	stats_table_count_addf(table, objects->trees, "    * %s", _("Trees"));
-	stats_table_count_addf(table, objects->blobs, "    * %s", _("Blobs"));
-	stats_table_count_addf(table, objects->tags, "    * %s", _("Tags"));
+	stats_table_count_addf(table, objects->type_counts.commits,
+			       "    * %s", _("Commits"));
+	stats_table_count_addf(table, objects->type_counts.trees,
+			       "    * %s", _("Trees"));
+	stats_table_count_addf(table, objects->type_counts.blobs,
+			       "    * %s", _("Blobs"));
+	stats_table_count_addf(table, objects->type_counts.tags,
+			       "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	       (uintmax_t)stats->refs.others, value_delim);
 
 	printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.commits, value_delim);
+	       (uintmax_t)stats->objects.type_counts.commits, value_delim);
 	printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.trees, value_delim);
+	       (uintmax_t)stats->objects.type_counts.trees, value_delim);
 	printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.blobs, value_delim);
+	       (uintmax_t)stats->objects.type_counts.blobs, value_delim);
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.tags, value_delim);
+	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
 	fflush(stdout);
 }
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 
 	switch (type) {
 	case OBJ_TAG:
-		stats->tags += oids->nr;
+		stats->type_counts.tags += oids->nr;
 		break;
 	case OBJ_COMMIT:
-		stats->commits += oids->nr;
+		stats->type_counts.commits += oids->nr;
 		break;
 	case OBJ_TREE:
-		stats->trees += oids->nr;
+		stats->type_counts.trees += oids->nr;
 		break;
 	case OBJ_BLOB:
-		stats->blobs += oids->nr;
+		stats->type_counts.blobs += oids->nr;
 		break;
 	default:
 		BUG("invalid object type");
 	}
 
-	object_count = get_total_object_count(stats);
+	object_count = get_total_object_values(&stats->type_counts);
 	display_progress(data->progress, object_count);
 
 	return 0;
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v3 2/7] strbuf: split out logic to humanise byte values
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
  2025-12-15 20:56     ` [PATCH v3 1/7] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-15 20:56     ` Justin Tobler
  2025-12-16  1:19       ` Junio C Hamano
  2025-12-15 20:56     ` [PATCH v3 3/7] builtin/repo: humanise count values in structure output Justin Tobler
                       ` (5 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
use case, the downscaled values and unit prefixes must be handled
separately to ensure proper column alignment.

Split out logic from strbuf_humanise() to downscale byte values and
determine the corresponding unit prefix into a separate humanise_bytes()
function that provides separate value and unit strings.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
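As a rough illustration of the value/unit split described above, the
standalone sketch below mirrors the downscaling arithmetic using plain
snprintf(3). It is not the code in this patch: the name
`sketch_humanise_bytes()`, the caller-supplied buffer and the
untranslated unit strings are assumptions made for the example, whereas
the real helper allocates the value with xstrfmt() and marks units for
translation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for humanise_bytes(): writes the downscaled
 * value into `value` and returns the matching unit as a static string.
 */
static const char *sketch_humanise_bytes(unsigned long long bytes,
					 char *value, size_t len)
{
	assert(value && len > 0);

	if (bytes > 1ULL << 30) {
		/* Fraction is truncated, not rounded, in this branch. */
		snprintf(value, len, "%u.%2.2u", (unsigned)(bytes >> 30),
			 (unsigned)(bytes & ((1ULL << 30) - 1)) / 10737419);
		return "GiB";
	} else if (bytes > 1ULL << 20) {
		unsigned x = (unsigned)bytes + 5243; /* for rounding */
		snprintf(value, len, "%u.%2.2u", x >> 20,
			 ((x & ((1u << 20) - 1)) * 100) >> 20);
		return "MiB";
	} else if (bytes > 1ULL << 10) {
		unsigned x = (unsigned)bytes + 5; /* for rounding */
		snprintf(value, len, "%u.%2.2u", x >> 10,
			 ((x & ((1u << 10) - 1)) * 100) >> 10);
		return "KiB";
	}

	snprintf(value, len, "%u", (unsigned)bytes);
	return bytes == 1 ? "byte" : "bytes";
}
```

With the two strings kept apart, a caller can pad each column on its
own, for example right-aligning the value and left-aligning the unit
with something like `printf("%8s %-5s", value, unit)`.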
 strbuf.c | 69 ++++++++++++++++++++++++++++----------------------------
 strbuf.h | 14 ++++++++++++
 2 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 6c3851a7f8..bb8e98872f 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,47 +836,48 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
-				 int humanise_rate)
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+		    unsigned flags)
 {
+	int humanise_rate = flags & HUMANISE_RATE;
+
 	if (bytes > 1 << 30) {
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte */
-					_("%u.%2.2u GiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
-					_("%u.%2.2u GiB/s"),
-			    (unsigned)(bytes >> 30),
-			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(bytes >> 30),
+				 (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
+		*unit = humanise_rate ? _("GiB/s") : _("GiB");
 	} else if (bytes > 1 << 20) {
-		unsigned x = bytes + 5243;  /* for rounding */
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte */
-					_("%u.%2.2u MiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
-					_("%u.%2.2u MiB/s"),
-			    x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+		unsigned x = bytes + 5243; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), x >> 20,
+				 ((x & ((1 << 20) - 1)) * 100) >> 20);
+		/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
+		*unit = humanise_rate ? _("MiB/s") : _("MiB");
 	} else if (bytes > 1 << 10) {
-		unsigned x = bytes + 5;  /* for rounding */
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte */
-					_("%u.%2.2u KiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
-					_("%u.%2.2u KiB/s"),
-			    x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+		unsigned x = bytes + 5; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), x >> 10,
+				 ((x & ((1 << 10) - 1)) * 100) >> 10);
+		/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
+		*unit = humanise_rate ? _("KiB/s") : _("KiB");
 	} else {
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 byte */
-					Q_("%u byte", "%u bytes", bytes) :
-					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
-					Q_("%u byte/s", "%u bytes/s", bytes),
-				(unsigned)bytes);
+		*value = xstrfmt(_("%u"), (unsigned)bytes);
+		*unit = humanise_rate ?
+			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+			       Q_("byte/s", "bytes/s", bytes) :
+			       /* TRANSLATORS: IEC 80000-13:2008 byte */
+			       Q_("byte", "bytes", bytes);
 	}
 }
 
+static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
+{
+	char *value;
+	const char *unit;
+
+	humanise_bytes(bytes, &value, &unit, flags);
+	strbuf_addf(buf, _("%s %s"), value, unit);
+	free(value);
+}
+
 void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
 {
 	strbuf_humanise(buf, bytes, 0);
@@ -884,7 +885,7 @@ void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
 
 void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
 {
-	strbuf_humanise(buf, bytes, 1);
+	strbuf_humanise(buf, bytes, HUMANISE_RATE);
 }
 
 int printf_ln(const char *fmt, ...)
diff --git a/strbuf.h b/strbuf.h
index a580ac6084..4426163e7e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,20 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
  */
 void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
 
+enum humanise_flags {
+	/*
+	 * Use rate based unit prefixes for humanised values.
+	 */
+	HUMANISE_RATE = (1 << 0),
+};
+
+/**
+ * Converts the given byte size into a downscaled human-readable value and
+ * corresponding unit prefix as two separate strings.
+ */
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+		    unsigned flags);
+
 /**
  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
  * 3.50 MiB).
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 2/7] strbuf: split out logic to humanise byte values
  2025-12-15 20:56     ` [PATCH v3 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-16  1:19       ` Junio C Hamano
  2025-12-16  1:36         ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-16  1:19 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps

Justin Tobler <jltobler@gmail.com> writes:

> +		*value = xstrfmt(_("%u"), (unsigned)bytes);

Does this "%u" need translation?

I very much doubt it, but if it did, this does need TRANSLATORS
comment.

> +		*unit = humanise_rate ?
> +			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> +			       Q_("byte/s", "bytes/s", bytes) :
> +			       /* TRANSLATORS: IEC 80000-13:2008 byte */
> +			       Q_("byte", "bytes", bytes);
>  	}
>  }
>  
> +static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
> +{
> +	char *value;
> +	const char *unit;
> +
> +	humanise_bytes(bytes, &value, &unit, flags);
> +	strbuf_addf(buf, _("%s %s"), value, unit);

This definitely needs the TRANSLATORS comment to tell what is going on.

> +	free(value);
> +}



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 2/7] strbuf: split out logic to humanise byte values
  2025-12-16  1:19       ` Junio C Hamano
@ 2025-12-16  1:36         ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16  1:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps

On 25/12/16 10:19AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > +		*value = xstrfmt(_("%u"), (unsigned)bytes);
> 
> Does this "%u" need translation?
> 
> I very much doubt it, but if it did, this does need TRANSLATORS
> comment.

Ya, I don't think one should be necessary. Will remove in the next
version.

I think I made the same mistake in humanise_count() in a later patch.
I'll also adjust it there.

> 
> > +		*unit = humanise_rate ?
> > +			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> > +			       Q_("byte/s", "bytes/s", bytes) :
> > +			       /* TRANSLATORS: IEC 80000-13:2008 byte */
> > +			       Q_("byte", "bytes", bytes);
> >  	}
> >  }
> >  
> > +static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
> > +{
> > +	char *value;
> > +	const char *unit;
> > +
> > +	humanise_bytes(bytes, &value, &unit, flags);
> > +	strbuf_addf(buf, _("%s %s"), value, unit);
> 
> This definitely needs the TRANSLATORS comment to tell what is going on.

Ok, will do in the next version. Thanks :)

-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v3 3/7] builtin/repo: humanise count values in structure output
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
  2025-12-15 20:56     ` [PATCH v3 1/7] builtin/repo: group per-type object values into struct Justin Tobler
  2025-12-15 20:56     ` [PATCH v3 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-15 20:56     ` Justin Tobler
  2025-12-16  8:25       ` Patrick Steinhardt
  2025-12-15 20:56     ` [PATCH v3 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.

For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
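To make the scaling described above concrete, here is a minimal
standalone sketch of the decimal downscaling with rounding to two
fractional digits. It is an illustration under stated assumptions and
not the code in this patch: `sketch_humanise_count()`, the
caller-supplied buffer and the untranslated unit strings are
hypothetical, and the sketch uses `unsigned long long` where the patch
uses `size_t`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for humanise_count(): writes the scaled value
 * into `value` and returns the unit prefix, or NULL when the count is
 * small enough to be shown unscaled.
 */
static const char *sketch_humanise_count(unsigned long long count,
					 char *value, size_t len)
{
	assert(value && len > 0);

	if (count >= 1000000000ULL) {
		unsigned long long x = count + 5000000; /* for rounding */
		snprintf(value, len, "%llu.%02llu", x / 1000000000,
			 x % 1000000000 / 10000000);
		return "G";
	} else if (count >= 1000000ULL) {
		unsigned long long x = count + 5000; /* for rounding */
		snprintf(value, len, "%llu.%02llu", x / 1000000,
			 x % 1000000 / 10000);
		return "M";
	} else if (count >= 1000ULL) {
		unsigned long long x = count + 5; /* for rounding */
		snprintf(value, len, "%llu.%02llu", x / 1000,
			 x % 1000 / 10);
		return "k";
	}

	snprintf(value, len, "%llu", count);
	return NULL;
}
```

For example, 1234 becomes "1.23" with unit "k", and 1995 rounds up to
"2.00" with unit "k". A NULL unit lets the table code leave the unit
column blank for small counts.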
 builtin/repo.c            | 38 +++++++++++++++++-------
 strbuf.c                  | 23 +++++++++++++++
 strbuf.h                  |  6 ++++
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
 4 files changed, 88 insertions(+), 41 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..9c61bc3e17 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
 
 	int name_col_width;
 	int value_col_width;
+	int unit_col_width;
 };
 
 /*
@@ -230,6 +231,7 @@ struct stats_table {
  */
 struct stats_table_entry {
 	char *value;
+	const char *unit;
 };
 
 static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
 
 	if (name_width > table->name_col_width)
 		table->name_col_width = name_width;
-	if (entry) {
+	if (!entry)
+		return;
+	if (entry->value) {
 		int value_width = utf8_strwidth(entry->value);
 		if (value_width > table->value_col_width)
 			table->value_col_width = value_width;
 	}
+	if (entry->unit) {
+		int unit_width = utf8_strwidth(entry->unit);
+		if (unit_width > table->unit_col_width)
+			table->unit_col_width = unit_width;
+	}
 }
 
 static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -273,7 +282,7 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_list ap;
 
 	CALLOC_ARRAY(entry, 1);
-	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+	humanise_count(value, &entry->value, &entry->unit);
 
 	va_start(ap, format);
 	stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +333,24 @@ static void stats_table_print_structure(const struct stats_table *table)
 {
 	const char *name_col_title = _("Repository structure");
 	const char *value_col_title = _("Value");
-	int name_col_width = utf8_strwidth(name_col_title);
-	int value_col_width = utf8_strwidth(value_col_title);
+	int title_name_width = utf8_strwidth(name_col_title);
+	int title_value_width = utf8_strwidth(value_col_title);
+	int name_col_width = table->name_col_width;
+	int value_col_width = table->value_col_width;
+	int unit_col_width = table->unit_col_width;
 	struct string_list_item *item;
 	struct strbuf buf = STRBUF_INIT;
 
-	if (table->name_col_width > name_col_width)
-		name_col_width = table->name_col_width;
-	if (table->value_col_width > value_col_width)
-		value_col_width = table->value_col_width;
+	if (title_name_width > name_col_width)
+		name_col_width = title_name_width;
+	if (title_value_width > value_col_width + unit_col_width + 1)
+		value_col_width = title_value_width - unit_col_width;
 
 	strbuf_addstr(&buf, "| ");
 	strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
 	strbuf_addstr(&buf, " | ");
-	strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+	strbuf_utf8_align(&buf, ALIGN_LEFT,
+			  value_col_width + unit_col_width + 1, value_col_title);
 	strbuf_addstr(&buf, " |");
 	printf("%s\n", buf.buf);
 
@@ -345,17 +358,20 @@ static void stats_table_print_structure(const struct stats_table *table)
 	for (int i = 0; i < name_col_width; i++)
 		putchar('-');
 	printf(" | ");
-	for (int i = 0; i < value_col_width; i++)
+	for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
 		putchar('-');
 	printf(" |\n");
 
 	for_each_string_list_item(item, &table->rows) {
 		struct stats_table_entry *entry = item->util;
 		const char *value = "";
+		const char *unit = "";
 
 		if (entry) {
 			struct stats_table_entry *entry = item->util;
 			value = entry->value;
+			if (entry->unit)
+				unit = entry->unit;
 		}
 
 		strbuf_reset(&buf);
@@ -363,6 +379,8 @@ static void stats_table_print_structure(const struct stats_table *table)
 		strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
 		strbuf_addstr(&buf, " | ");
 		strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+		strbuf_addch(&buf, ' ');
+		strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
 		strbuf_addstr(&buf, " |");
 		printf("%s\n", buf.buf);
 	}
diff --git a/strbuf.c b/strbuf.c
index bb8e98872f..662edd4d19 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,6 +836,29 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
+void humanise_count(size_t count, char **value, const char **unit)
+{
+	if (count >= 1000000000) {
+		size_t x = count + 5000000; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
+				 (unsigned)(x % 1000000000 / 10000000));
+		*unit = _("G");
+	} else if (count >= 1000000) {
+		size_t x = count + 5000; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
+				 (unsigned)(x % 1000000 / 10000));
+		*unit = _("M");
+	} else if (count >= 1000) {
+		size_t x = count + 5; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
+				 (unsigned)(x % 1000 / 10));
+		*unit = _("k");
+	} else {
+		*value = xstrfmt(_("%u"), (unsigned)count);
+		*unit = NULL;
+	}
+}
+
 void humanise_bytes(off_t bytes, char **value, const char **unit,
 		    unsigned flags)
 {
diff --git a/strbuf.h b/strbuf.h
index 4426163e7e..571bd889df 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -381,6 +381,12 @@ enum humanise_flags {
 void humanise_bytes(off_t bytes, char **value, const char **unit,
 		    unsigned flags);
 
+/**
+ * Converts the given count into a downscaled human-readable value and
+ * corresponding unit prefix as two separate strings.
+ */
+void humanise_count(size_t count, char **value, const char **unit);
+
 /**
  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
  * 3.50 MiB).
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
 	(
 		cd repo &&
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     0 |
-		|     * Branches       |     0 |
-		|     * Tags           |     0 |
-		|     * Remotes        |     0 |
-		|     * Others         |     0 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |     0 |
-		|     * Commits        |     0 |
-		|     * Trees          |     0 |
-		|     * Blobs          |     0 |
-		|     * Tags           |     0 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |     0  |
+		|     * Branches       |     0  |
+		|     * Tags           |     0  |
+		|     * Remotes        |     0  |
+		|     * Others         |     0  |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            |     0  |
+		|     * Commits        |     0  |
+		|     * Trees          |     0  |
+		|     * Blobs          |     0  |
+		|     * Tags           |     0  |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
 	git init repo &&
 	(
 		cd repo &&
-		test_commit_bulk 42 &&
+		test_commit_bulk 1005 &&
 		git tag -a foo -m bar &&
 
 		oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     4 |
-		|     * Branches       |     1 |
-		|     * Tags           |     1 |
-		|     * Remotes        |     1 |
-		|     * Others         |     1 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |   130 |
-		|     * Commits        |    43 |
-		|     * Trees          |    43 |
-		|     * Blobs          |    43 |
-		|     * Tags           |     1 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |    4   |
+		|     * Branches       |    1   |
+		|     * Tags           |    1   |
+		|     * Remotes        |    1   |
+		|     * Others         |    1   |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            | 3.02 k |
+		|     * Commits        | 1.01 k |
+		|     * Trees          | 1.01 k |
+		|     * Blobs          | 1.01 k |
+		|     * Tags           |    1   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
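
The rounding arithmetic in humanise_count() above can be sketched as a
stand-alone function. This is only an illustrative sketch, not the
patch's code: the name sketch_humanise_count(), the caller-supplied
buffer, and the table of steps are assumptions made here so that the
example compiles without Git's xstrfmt() and _() helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-alone variant of the count humanising logic.
 * Writes the scaled value into `value` and returns the unit prefix
 * ("k", "M" or "G"), or NULL when the count is printed unscaled.
 */
static const char *sketch_humanise_count(unsigned long long count,
					 char *value, size_t len)
{
	static const struct {
		unsigned long long scale;
		const char *unit;
	} steps[] = {
		{ 1000000000ULL, "G" },
		{ 1000000ULL, "M" },
		{ 1000ULL, "k" },
	};

	assert(value && len);

	for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		unsigned long long scale = steps[i].scale;

		if (count >= scale) {
			/* Add half of the last printed digit, then truncate. */
			unsigned long long x = count + scale / 200;

			snprintf(value, len, "%llu.%02llu", x / scale,
				 x % scale / (scale / 100));
			return steps[i].unit;
		}
	}

	/* Below 1000 the count is printed as-is, with no unit. */
	snprintf(value, len, "%llu", count);
	return NULL;
}
```

With this sketch, a count of 1006 yields "1.01" with unit "k" and 3016
yields "3.02" with unit "k", which lines up with the values in the
updated test table. A count just below a threshold stays in the lower
unit, so 999999 is rendered as "1000.00" with unit "k" here.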



* Re: [PATCH v3 3/7] builtin/repo: humanise count values in structure output
  2025-12-15 20:56     ` [PATCH v3 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-16  8:25       ` Patrick Steinhardt
  0 siblings, 0 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-16  8:25 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster

On Mon, Dec 15, 2025 at 02:56:35PM -0600, Justin Tobler wrote:
> diff --git a/strbuf.c b/strbuf.c
> index bb8e98872f..662edd4d19 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -836,6 +836,29 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
>  	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
>  }
>  
> +void humanise_count(size_t count, char **value, const char **unit)
> +{
> +	if (count >= 1000000000) {
> +		size_t x = count + 5000000; /* for rounding */
> +		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
> +				 (unsigned)(x % 1000000000 / 10000000));
> +		*unit = _("G");
> +	} else if (count >= 1000000) {
> +		size_t x = count + 5000; /* for rounding */
> +		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
> +				 (unsigned)(x % 1000000 / 10000));
> +		*unit = _("M");
> +	} else if (count >= 1000) {
> +		size_t x = count + 5; /* for rounding */
> +		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
> +				 (unsigned)(x % 1000 / 10));
> +		*unit = _("k");
> +	} else {
> +		*value = xstrfmt(_("%u"), (unsigned)count);
> +		*unit = NULL;
> +	}
> +}

I guess these could all use TRANSLATORS comments as well.

Patrick


* [PATCH v3 4/7] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
                       ` (2 preceding siblings ...)
  2025-12-15 20:56     ` [PATCH v3 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-15 20:56     ` Justin Tobler
  2025-12-15 20:56     ` [PATCH v3 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also report the total
inflated size of reachable objects, grouped by object type.

For now, the per-type inflated sizes are only added to the keyvalue and
nul output formats. A subsequent commit adds them to the table format as
well.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 33 +++++++++++++++++++++++++++++++++
 t/t1901-repo-structure.sh   |  6 +++++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
 +
 * Reference counts categorized by type
 * Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 9c61bc3e17..e207108346 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
 
 #include "builtin.h"
 #include "environment.h"
+#include "hex.h"
+#include "odb.h"
 #include "parse-options.h"
 #include "path-walk.h"
 #include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
 
 struct object_stats {
 	struct object_values type_counts;
+	struct object_values inflated_sizes;
 };
 
 struct repo_structure {
@@ -423,6 +426,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
+	printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+	printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+	printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -486,6 +498,7 @@ static void structure_count_references(struct ref_stats *stats,
 }
 
 struct count_objects_data {
+	struct object_database *odb;
 	struct object_stats *stats;
 	struct progress *progress;
 };
@@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 {
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
+	size_t inflated_total = 0;
 	size_t object_count;
 
+	for (size_t i = 0; i < oids->nr; i++) {
+		struct object_info oi = OBJECT_INFO_INIT;
+		unsigned long inflated;
+
+		oi.sizep = &inflated;
+
+		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+						  OBJECT_INFO_SKIP_FETCH_OBJECT |
+							  OBJECT_INFO_QUICK) < 0)
+			continue;
+
+		inflated_total += inflated;
+	}
+
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
+		stats->inflated_sizes.tags += inflated_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
+		stats->inflated_sizes.commits += inflated_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
+		stats->inflated_sizes.trees += inflated_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
+		stats->inflated_sizes.blobs += inflated_total;
 		break;
 	default:
 		BUG("invalid object type");
@@ -526,6 +558,7 @@ static void structure_count_objects(struct object_stats *stats,
 {
 	struct path_walk_info info = PATH_WALK_INFO_INIT;
 	struct count_objects_data data = {
+		.odb = repo->objects,
 		.stats = stats,
 	};
 
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..33237822fd 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
 	)
 '
 
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
 		objects.trees.count=42
 		objects.blobs.count=42
 		objects.tags.count=1
+		objects.commits.inflated_size=9225
+		objects.trees.inflated_size=28554
+		objects.blobs.inflated_size=453
+		objects.tags.inflated_size=132
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
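
The per-type bookkeeping that count_objects() performs above can be
sketched without Git's object database. The names below (sketch_type,
sketch_values, sketch_stats, sketch_add_batch, sketch_total) are
hypothetical and only mirror the shape of struct object_values and
struct object_stats; the sizes are passed in as a plain array instead
of being looked up per object ID.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_type { SKETCH_COMMIT, SKETCH_TREE, SKETCH_BLOB, SKETCH_TAG };

/* One value per object type, as in the patch's struct object_values. */
struct sketch_values {
	size_t commits;
	size_t trees;
	size_t blobs;
	size_t tags;
};

struct sketch_stats {
	struct sketch_values type_counts;
	struct sketch_values inflated_sizes;
};

/* Pick the field that belongs to the given object type. */
static size_t *sketch_slot(struct sketch_values *v, enum sketch_type type)
{
	switch (type) {
	case SKETCH_COMMIT:
		return &v->commits;
	case SKETCH_TREE:
		return &v->trees;
	case SKETCH_BLOB:
		return &v->blobs;
	case SKETCH_TAG:
		return &v->tags;
	}
	return NULL; /* not reached for valid enum values */
}

/* Sum one batch of same-typed objects, then fold it into the stats. */
static void sketch_add_batch(struct sketch_stats *stats,
			     enum sketch_type type,
			     const size_t *sizes, size_t nr)
{
	size_t *count = sketch_slot(&stats->type_counts, type);
	size_t *inflated = sketch_slot(&stats->inflated_sizes, type);
	size_t total = 0;

	assert(count && inflated);

	for (size_t i = 0; i < nr; i++)
		total += sizes[i];

	*count += nr;
	*inflated += total;
}

/* Total across all four object types. */
static size_t sketch_total(const struct sketch_values *v)
{
	return v->commits + v->trees + v->blobs + v->tags;
}
```

One difference worth keeping in mind: in the patch, an object whose
info lookup fails is skipped when summing sizes but is still included
in the count, because the count is taken from oids->nr. The sketch has
no failing lookups, so it does not model that case.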



* [PATCH v3 5/7] builtin/repo: add inflated object info to structure table
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
                       ` (3 preceding siblings ...)
  2025-12-15 20:56     ` [PATCH v3 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-15 20:56     ` Justin Tobler
  2025-12-15 20:56     ` [PATCH v3 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

Update the table output format for the git-repo(1) structure command to
also print the total inflated object size by object type. To be more
human-friendly, larger values are scaled down and displayed with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 33 +++++++++++++++++++--
 strbuf.c                  | 13 ++++----
 strbuf.h                  |  5 ++++
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
 4 files changed, 79 insertions(+), 34 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index e207108346..b73cfd975b 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -292,6 +292,20 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_end(ap);
 }
 
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+				  const char *format, ...)
+{
+	struct stats_table_entry *entry;
+	va_list ap;
+
+	CALLOC_ARRAY(entry, 1);
+	humanise_bytes(value, &entry->value, &entry->unit, HUMANISE_COMPACT);
+
+	va_start(ap, format);
+	stats_table_vaddf(table, entry, format, ap);
+	va_end(ap);
+}
+
 static inline size_t get_total_reference_count(struct ref_stats *stats)
 {
 	return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -307,7 +321,8 @@ static void stats_table_setup_structure(struct stats_table *table,
 {
 	struct object_stats *objects = &stats->objects;
 	struct ref_stats *refs = &stats->refs;
-	size_t object_total;
+	size_t inflated_object_total;
+	size_t object_count_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -318,10 +333,10 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_values(&objects->type_counts);
+	object_count_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
-	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
+	stats_table_count_addf(table, object_count_total, "  * %s", _("Count"));
 	stats_table_count_addf(table, objects->type_counts.commits,
 			       "    * %s", _("Commits"));
 	stats_table_count_addf(table, objects->type_counts.trees,
@@ -330,6 +345,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			       "    * %s", _("Blobs"));
 	stats_table_count_addf(table, objects->type_counts.tags,
 			       "    * %s", _("Tags"));
+
+	inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+	stats_table_size_addf(table, inflated_object_total,
+			      "  * %s", _("Inflated size"));
+	stats_table_size_addf(table, objects->inflated_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->inflated_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->inflated_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->inflated_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/strbuf.c b/strbuf.c
index 662edd4d19..1e2d1f70a7 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -883,11 +883,14 @@ void humanise_bytes(off_t bytes, char **value, const char **unit,
 		*unit = humanise_rate ? _("KiB/s") : _("KiB");
 	} else {
 		*value = xstrfmt(_("%u"), (unsigned)bytes);
-		*unit = humanise_rate ?
-			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
-			       Q_("byte/s", "bytes/s", bytes) :
-			       /* TRANSLATORS: IEC 80000-13:2008 byte */
-			       Q_("byte", "bytes", bytes);
+		if (flags & HUMANISE_COMPACT)
+			*unit = humanise_rate ? _("B/s") : _("B");
+		else
+			*unit = humanise_rate ?
+					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
+					Q_("byte/s", "bytes/s", bytes) :
+					/* TRANSLATORS: IEC 80000-13:2008 byte */
+					Q_("byte", "bytes", bytes);
 	}
 }
 
diff --git a/strbuf.h b/strbuf.h
index 571bd889df..005c155808 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -372,6 +372,11 @@ enum humanise_flags {
 	 * Use rate based unit prefixes for humanised values.
 	 */
 	HUMANISE_RATE = (1 << 0),
+	/*
+	 * Use compact "B" unit prefixes instead of "byte/bytes" for humanised
+	 * values.
+	 */
+	HUMANISE_COMPACT = (1 << 1),
 };
 
 /**
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 33237822fd..b18213c660 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
 		| Repository structure | Value  |
 		| -------------------- | ------ |
 		| * References         |        |
-		|   * Count            |     0  |
-		|     * Branches       |     0  |
-		|     * Tags           |     0  |
-		|     * Remotes        |     0  |
-		|     * Others         |     0  |
+		|   * Count            |    0   |
+		|     * Branches       |    0   |
+		|     * Tags           |    0   |
+		|     * Remotes        |    0   |
+		|     * Others         |    0   |
 		|                      |        |
 		| * Reachable objects  |        |
-		|   * Count            |     0  |
-		|     * Commits        |     0  |
-		|     * Trees          |     0  |
-		|     * Blobs          |     0  |
-		|     * Tags           |     0  |
+		|   * Count            |    0   |
+		|     * Commits        |    0   |
+		|     * Trees          |    0   |
+		|     * Blobs          |    0   |
+		|     * Tags           |    0   |
+		|   * Inflated size    |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
 	)
 '
 
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value  |
-		| -------------------- | ------ |
-		| * References         |        |
-		|   * Count            |    4   |
-		|     * Branches       |    1   |
-		|     * Tags           |    1   |
-		|     * Remotes        |    1   |
-		|     * Others         |    1   |
-		|                      |        |
-		| * Reachable objects  |        |
-		|   * Count            | 3.02 k |
-		|     * Commits        | 1.01 k |
-		|     * Trees          | 1.01 k |
-		|     * Blobs          | 1.01 k |
-		|     * Tags           |    1   |
+		| Repository structure | Value      |
+		| -------------------- | ---------- |
+		| * References         |            |
+		|   * Count            |      4     |
+		|     * Branches       |      1     |
+		|     * Tags           |      1     |
+		|     * Remotes        |      1     |
+		|     * Others         |      1     |
+		|                      |            |
+		| * Reachable objects  |            |
+		|   * Count            |   3.02 k   |
+		|     * Commits        |   1.01 k   |
+		|     * Trees          |   1.01 k   |
+		|     * Blobs          |   1.01 k   |
+		|     * Tags           |      1     |
+		|   * Inflated size    |  16.03 MiB |
+		|     * Commits        | 217.92 KiB |
+		|     * Trees          |  15.81 MiB |
+		|     * Blobs          |  11.68 KiB |
+		|     * Tags           |    132 B   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
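
The effect of the HUMANISE_COMPACT flag on the unit string can be
sketched in isolation. This is an assumption-laden illustration, not
Git's humanise_bytes(): the function name, the `compact` parameter, the
caller-supplied buffer, the ">=" thresholds, and the round-to-nearest
arithmetic are all choices made for this example, and the rate
("/s") units are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical byte formatter. Writes the scaled value into `value`
 * and returns the unit string. When `compact` is non-zero, unscaled
 * values use "B" instead of "byte"/"bytes".
 */
static const char *sketch_humanise_bytes(unsigned long long bytes,
					 char *value, size_t len,
					 int compact)
{
	static const struct {
		unsigned long long scale;
		const char *unit;
	} steps[] = {
		{ 1ULL << 30, "GiB" },
		{ 1ULL << 20, "MiB" },
		{ 1ULL << 10, "KiB" },
	};

	assert(value && len);

	for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		unsigned long long scale = steps[i].scale;

		if (bytes >= scale) {
			/*
			 * Value in hundredths of the unit, rounded to
			 * nearest. bytes * 100 can overflow for extremely
			 * large inputs; the sketch ignores that.
			 */
			unsigned long long h =
				(bytes * 100 + scale / 2) / scale;

			snprintf(value, len, "%llu.%02llu", h / 100, h % 100);
			return steps[i].unit;
		}
	}

	snprintf(value, len, "%llu", bytes);
	if (compact)
		return "B";
	return bytes == 1 ? "byte" : "bytes";
}
```

In compact mode, 132 bytes is rendered as "132" with unit "B", which is
the shape of the "Tags" row in the updated table. Because the value and
unit are returned separately, the table code can right-align the value
and left-align the unit in their own columns.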



* [PATCH v3 6/7] builtin/repo: add disk size info to keyvalue structure output
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
                       ` (4 preceding siblings ...)
  2025-12-15 20:56     ` [PATCH v3 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-15 20:56     ` Justin Tobler
  2025-12-15 20:56     ` [PATCH v3 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

Similar to the prior commit that added inflated sizes, extend the
keyvalue and nul output formats of the git-repo(1) structure command to
also report the total on-disk size of objects by object type.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 18 ++++++++++++++++++
 t/t1901-repo-structure.sh   | 11 ++++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
 * Reference counts categorized by type
 * Reachable object counts categorized by type
 * Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index b73cfd975b..0ed41bf9d4 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
 struct object_stats {
 	struct object_values type_counts;
 	struct object_values inflated_sizes;
+	struct object_values disk_sizes;
 };
 
 struct repo_structure {
@@ -462,6 +463,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
 
+	printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+	printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+	printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+	printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -536,13 +546,16 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
 	size_t inflated_total = 0;
+	size_t disk_total = 0;
 	size_t object_count;
 
 	for (size_t i = 0; i < oids->nr; i++) {
 		struct object_info oi = OBJECT_INFO_INIT;
 		unsigned long inflated;
+		off_t disk;
 
 		oi.sizep = &inflated;
+		oi.disk_sizep = &disk;
 
 		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
 						  OBJECT_INFO_SKIP_FETCH_OBJECT |
@@ -550,24 +563,29 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 			continue;
 
 		inflated_total += inflated;
+		disk_total += disk;
 	}
 
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
 		stats->inflated_sizes.tags += inflated_total;
+		stats->disk_sizes.tags += disk_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
 		stats->inflated_sizes.commits += inflated_total;
+		stats->disk_sizes.commits += disk_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
 		stats->inflated_sizes.trees += inflated_total;
+		stats->disk_sizes.trees += disk_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
 		stats->inflated_sizes.blobs += inflated_total;
+		stats->disk_sizes.blobs += disk_total;
 		break;
 	default:
 		BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index b18213c660..dd17caad05 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,11 @@ test_description='test git repo structure'
 
 . ./test-lib.sh
 
+object_type_disk_usage() {
+	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
+		--filter-provided-objects
+}
+
 test_expect_success 'empty repository' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -91,7 +96,7 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		test_commit_bulk 42 &&
 		git tag -a foo -m bar &&
 
-		cat >expect <<-\EOF &&
+		cat >expect <<-EOF &&
 		references.branches.count=1
 		references.tags.count=1
 		references.remotes.count=0
@@ -104,6 +109,10 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		objects.trees.inflated_size=28554
 		objects.blobs.inflated_size=453
 		objects.tags.inflated_size=132
+		objects.commits.disk_size=$(object_type_disk_usage commit)
+		objects.trees.disk_size=$(object_type_disk_usage tree)
+		objects.blobs.disk_size=$(object_type_disk_usage blob)
+		objects.tags.disk_size=$(object_type_disk_usage tag)
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0



* [PATCH v3 7/7] builtin/repo: add object disk size info to structure table
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
                       ` (5 preceding siblings ...)
  2025-12-15 20:56     ` [PATCH v3 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
@ 2025-12-15 20:56     ` Justin Tobler
  2025-12-16  8:25       ` Patrick Steinhardt
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
  7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, Justin Tobler

Similar to the prior commit that added inflated sizes to the table,
update the table output format for the git-repo(1) structure command to
also display the total object disk usage by object type.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 13 +++++++++++++
 t/t1901-repo-structure.sh | 26 +++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 0ed41bf9d4..a071d2fdfe 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -324,6 +324,7 @@ static void stats_table_setup_structure(struct stats_table *table,
 	struct ref_stats *refs = &stats->refs;
 	size_t inflated_object_total;
 	size_t object_count_total;
+	size_t disk_object_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -358,6 +359,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			      "    * %s", _("Blobs"));
 	stats_table_size_addf(table, objects->inflated_sizes.tags,
 			      "    * %s", _("Tags"));
+
+	disk_object_total = get_total_object_values(&objects->disk_sizes);
+	stats_table_size_addf(table, disk_object_total,
+			      "  * %s", _("Disk size"));
+	stats_table_size_addf(table, objects->disk_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->disk_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->disk_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->disk_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index dd17caad05..64db191234 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -5,8 +5,18 @@ test_description='test git repo structure'
 . ./test-lib.sh
 
 object_type_disk_usage() {
-	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
-		--filter-provided-objects
+	disk_usage_opt="--disk-usage"
+
+	if [ "$2" = "true" ]; then
+		disk_usage_opt="--disk-usage=human"
+	fi
+
+	if [ "$1" = "all" ]; then
+		git rev-list --all --objects $disk_usage_opt
+	else
+		git rev-list --all --objects $disk_usage_opt \
+			--filter=object:type=$1 --filter-provided-objects
+	fi
 }
 
 test_expect_success 'empty repository' '
@@ -35,6 +45,11 @@ test_expect_success 'empty repository' '
 		|     * Trees          |    0 B |
 		|     * Blobs          |    0 B |
 		|     * Tags           |    0 B |
+		|   * Disk size        |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -58,7 +73,7 @@ test_expect_success SHA1 'repository with references and objects' '
 		# Also creates a commit, tree, and blob.
 		git notes add -m foo &&
 
-		cat >expect <<-\EOF &&
+		cat >expect <<-EOF &&
 		| Repository structure | Value      |
 		| -------------------- | ---------- |
 		| * References         |            |
@@ -79,6 +94,11 @@ test_expect_success SHA1 'repository with references and objects' '
 		|     * Trees          |  15.81 MiB |
 		|     * Blobs          |  11.68 KiB |
 		|     * Tags           |    132 B   |
+		|   * Disk size        | $(object_type_disk_usage all true) |
+		|     * Commits        | $(object_type_disk_usage commit true) |
+		|     * Trees          | $(object_type_disk_usage tree true) |
+		|     * Blobs          |  $(object_type_disk_usage blob true) |
+		|     * Tags           |    $(object_type_disk_usage tag) B   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0



* Re: [PATCH v3 7/7] builtin/repo: add object disk size info to structure table
  2025-12-15 20:56     ` [PATCH v3 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-16  8:25       ` Patrick Steinhardt
  2025-12-16 14:48         ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-16  8:25 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster

On Mon, Dec 15, 2025 at 02:56:39PM -0600, Justin Tobler wrote:
> diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> index dd17caad05..64db191234 100755
> --- a/t/t1901-repo-structure.sh
> +++ b/t/t1901-repo-structure.sh
> @@ -5,8 +5,18 @@ test_description='test git repo structure'
>  . ./test-lib.sh
>  
>  object_type_disk_usage() {
> -	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
> -		--filter-provided-objects
> +	disk_usage_opt="--disk-usage"
> +
> +	if [ "$2" = "true" ]; then
> +		disk_usage_opt="--disk-usage=human"
> +	fi
> +
> +	if [ "$1" = "all" ]; then
> +		git rev-list --all --objects $disk_usage_opt
> +	else
> +		git rev-list --all --objects $disk_usage_opt \
> +			--filter=object:type=$1 --filter-provided-objects
> +	fi
>  }
>  
>  test_expect_success 'empty repository' '

We don't use `if [ ... ]` in our codebase, and we typically have the
`then` on the next line:

    if test "$2" = "true"
    then
        ...
    fi

    if test "$1" = "all"
    then
        ...
    else
        ...
    fi

> @@ -79,6 +94,11 @@ test_expect_success SHA1 'repository with references and objects' '
>  		|     * Trees          |  15.81 MiB |
>  		|     * Blobs          |  11.68 KiB |
>  		|     * Tags           |    132 B   |
> +		|   * Disk size        | $(object_type_disk_usage all true) |
> +		|     * Commits        | $(object_type_disk_usage commit true) |
> +		|     * Trees          | $(object_type_disk_usage tree true) |
> +		|     * Blobs          |  $(object_type_disk_usage blob true) |
> +		|     * Tags           |    $(object_type_disk_usage tag) B   |
>  		EOF

Curious, but why is the last one special here?

Patrick

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 7/7] builtin/repo: add object disk size info to structure table
  2025-12-16  8:25       ` Patrick Steinhardt
@ 2025-12-16 14:48         ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 14:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, gitster

On 25/12/16 09:25AM, Patrick Steinhardt wrote:
> On Mon, Dec 15, 2025 at 02:56:39PM -0600, Justin Tobler wrote:
> > diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> > index dd17caad05..64db191234 100755
> > --- a/t/t1901-repo-structure.sh
> > +++ b/t/t1901-repo-structure.sh
> > @@ -5,8 +5,18 @@ test_description='test git repo structure'
> >  . ./test-lib.sh
> >  
> >  object_type_disk_usage() {
> > -	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
> > -		--filter-provided-objects
> > +	disk_usage_opt="--disk-usage"
> > +
> > +	if [ "$2" = "true" ]; then
> > +		disk_usage_opt="--disk-usage=human"
> > +	fi
> > +
> > +	if [ "$1" = "all" ]; then
> > +		git rev-list --all --objects $disk_usage_opt
> > +	else
> > +		git rev-list --all --objects $disk_usage_opt \
> > +			--filter=object:type=$1 --filter-provided-objects
> > +	fi
> >  }
> >  
> >  test_expect_success 'empty repository' '
> 
> We don't use `if [ ... ]` in our codebase, and we typically have the
> `then` on the next line:
> 
>     if test "$2" = "true"
>     then
>         ...
>     fi
> 
>     if test "$1" = "all"
>     then
>         ...
>     else
>         ...
>     fi

Noted, will fix.

> > @@ -79,6 +94,11 @@ test_expect_success SHA1 'repository with references and objects' '
> >  		|     * Trees          |  15.81 MiB |
> >  		|     * Blobs          |  11.68 KiB |
> >  		|     * Tags           |    132 B   |
> > +		|   * Disk size        | $(object_type_disk_usage all true) |
> > +		|     * Commits        | $(object_type_disk_usage commit true) |
> > +		|     * Trees          | $(object_type_disk_usage tree true) |
> > +		|     * Blobs          |  $(object_type_disk_usage blob true) |
> > +		|     * Tags           |    $(object_type_disk_usage tag) B   |
> >  		EOF
> 
> Curious, but why is the last one special here?

The `--disk-usage=human` rev-list option here outputs "byte/bytes"
instead of "B". In patch 5, the HUMANISE_COMPACT flag was added to
humanise_bytes() to toggle this behavior. For the git-repo(1) structure
table output, I wanted to always use the more compact unit prefix
representation.

I'll leave a comment here to explain this special case.
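
To illustrate the distinction, here is a minimal sketch of how the unit
could be chosen for values below the KiB threshold. The names
`sketch_flags`, `SKETCH_RATE`, `SKETCH_COMPACT` and `sketch_byte_unit()`
are hypothetical and not part of the series. The English singular/plural
check stands in for the Q_() plural lookup, and translation markers are
left out.

```c
/* Hypothetical flag names; the series defines its own enum in strbuf.h. */
enum sketch_flags {
	SKETCH_RATE = 1 << 0,
	SKETCH_COMPACT = 1 << 1,
};

/*
 * Pick the unit string for a value that is too small to be downscaled.
 * Assumption: an English plural rule replaces the Q_() lookup.
 */
static const char *sketch_byte_unit(unsigned long bytes, unsigned flags)
{
	int rate = flags & SKETCH_RATE;

	/* Compact form: always the short symbol, as in the structure table. */
	if (flags & SKETCH_COMPACT)
		return rate ? "B/s" : "B";

	/* Full form: what `--disk-usage=human` prints for small values. */
	if (bytes == 1)
		return rate ? "byte/s" : "byte";
	return rate ? "bytes/s" : "bytes";
}
```

With this shape, the tags row in the test gets "B" from the compact
path, while the rev-list helper would yield "bytes" for the same value.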

-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 0/7] builtin/repo: add object size info to structure output
  2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
                       ` (6 preceding siblings ...)
  2025-12-15 20:56     ` [PATCH v3 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-16 17:38     ` Justin Tobler
  2025-12-16 17:38       ` [PATCH v4 1/7] builtin/repo: group per-type object values into struct Justin Tobler
                         ` (8 more replies)
  7 siblings, 9 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

Greetings,

This patch series extends the recently introduced "structure" subcommand
for git-repo(1) to collect object size information. More specifically,
it shows total inflated and disk sizes of objects by object type. The
aim is to provide additional insight that may be useful to users regarding
the structure of a repository.

In addition to this change, this series also updates the table output
format to downscale larger output values along with the appropriate unit
prefix. This is done to make table output more human friendly. The
keyvalue and nul output formats are left the same since they are
intended more for machine parsing.
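
As a rough illustration of the downscaling described above, the sketch
below scales a count with SI decimal prefixes and rounds to two decimal
places. It follows the arithmetic of humanise_count() in patch 3, but
`sketch_humanise_count()` and its buffer-based signature are assumptions
made for a self-contained example, and translation markers are omitted.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for humanise_count(): writes the downscaled
 * number into `value` and returns the SI prefix, or NULL when the
 * count is printed as-is.
 */
static const char *sketch_humanise_count(size_t count, char *value, size_t len)
{
	unsigned long long x;

	if (count >= 1000000000) {
		x = (unsigned long long)count + 5000000; /* for rounding */
		snprintf(value, len, "%u.%02u", (unsigned)(x / 1000000000),
			 (unsigned)(x % 1000000000 / 10000000));
		return "G";
	} else if (count >= 1000000) {
		x = (unsigned long long)count + 5000; /* for rounding */
		snprintf(value, len, "%u.%02u", (unsigned)(x / 1000000),
			 (unsigned)(x % 1000000 / 10000));
		return "M";
	} else if (count >= 1000) {
		x = (unsigned long long)count + 5; /* for rounding */
		snprintf(value, len, "%u.%02u", (unsigned)(x / 1000),
			 (unsigned)(x % 1000 / 10));
		return "k";
	}

	/* Small counts need no prefix. */
	snprintf(value, len, "%u", (unsigned)count);
	return NULL;
}
```

Returning the number and the prefix separately is what lets the table
code align the two in their own columns.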

Changes in V4:
- Unmark "byte" string in "t/helper/test-simple-ipc.c" for translation
  to avoid conflict with translated plural "byte/bytes" string.
- Remove some unnecessary translations and add comments to clarify some
  of the added translations.
- Some small changes to the tests in patch 7.

Changes in V3:
- Address potential localization regression by making the downscaled
  number format string also translatable. Also make the format string
  for how the values and unit prefixes are displayed via
  `strbuf_humanise_{bytes,rate}()` translatable to be more flexible.
- `strbuf_humanise_{bytes,count}_value()` has been renamed to
  `humanise_{bytes,count}()` and updated to provide both the value and
  unit prefix as separate strings.
- Unit prefix strings are no longer allocated and instead constant.
- The humanise flags are now defined in an enum.
- Instead of using `OBJECT_INFO_FOR_PREFETCH`,
  `OBJECT_INFO_SKIP_FETCH_OBJECT` and `OBJECT_INFO_QUICK` are used
  explicitly.
- Tests now use git-rev-list(1) to verify disk size info.

Changes in V2:
- Factor out and reuse existing logic from strbuf_humanise() to handle
  downscaling values and determining the appropriate unit prefix
  separately. This enables more control over how exactly the values are
  written to the structure output table which is useful for alignment
  reasons. I'm not sure about the interface used in patch 2. Feedback is
  most welcome.
- In the previous version, when checking object size on a missing object
  we would die. Instead we now ignore missing objects. This allows the
  structure command to work on partial clones.
- disk/inflated keyvalue names renamed to disk_size/inflated_size.
- Unit prefixes are marked for translation.
- The test for keyvalue disk size values are updated to check against
  real expected values instead of skipping. Table output tests still
  skip verifying human-readable values though.
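
The alignment point from the V2 notes can be sketched as follows: the
number is right-aligned in its own column and the unit is left-aligned
in a second column, so that decimal values line up regardless of unit
length. `sketch_format_cell()` is a hypothetical helper, and it assumes
ASCII input, whereas the series measures display width with
utf8_strwidth().

```c
#include <stdio.h>

/*
 * Format one table cell as "<value> <unit>", with the value padded on
 * the left to `value_width` and the unit padded on the right to
 * `unit_width`. A NULL unit is treated as an empty string.
 */
static void sketch_format_cell(char *out, size_t len, int value_width,
			       int unit_width, const char *value,
			       const char *unit)
{
	snprintf(out, len, "%*s %-*s", value_width, value,
		 unit_width, unit ? unit : "");
}
```

With a value width of 6 and a unit width of 3, "15.81"/"MiB" and
"132"/"B" produce cells of equal length, matching the spacing seen in
the expected test output.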

Thanks,
-Justin

Justin Tobler (7):
  builtin/repo: group per-type object values into struct
  strbuf: split out logic to humanise byte values
  builtin/repo: humanise count values in structure output
  builtin/repo: add inflated object info to keyvalue structure output
  builtin/repo: add inflated object info to structure table
  builtin/repo: add disk size info to keyvalue stucture output
  builtin/repo: add object disk size info to structure table

 Documentation/git-repo.adoc |   2 +
 builtin/repo.c              | 175 ++++++++++++++++++++++++++++++------
 strbuf.c                    | 102 ++++++++++++++-------
 strbuf.h                    |  25 ++++++
 t/helper/test-simple-ipc.c  |   7 +-
 t/t1901-repo-structure.sh   | 118 ++++++++++++++++--------
 6 files changed, 331 insertions(+), 98 deletions(-)

Range-diff against v3:
1:  be14de68f6 = 1:  be14de68f6 builtin/repo: group per-type object values into struct
2:  1fa33f5906 ! 2:  0a145cfeec strbuf: split out logic to humanise byte values
    @@ Commit message
         determine the corresponding unit prefix into a separate humanise_bytes()
         function that provides separate value and unit strings.
     
    +    Note that the "byte" string in "t/helper/test-simple-ipc.c" is unmarked
    +    for translation here so that it doesn't conflict with the newly defined
    +    plural "byte/bytes" translation and instead uses it.
    +
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
      ## strbuf.c ##
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
     -					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
     -					Q_("%u byte/s", "%u bytes/s", bytes),
     -				(unsigned)bytes);
    -+		*value = xstrfmt(_("%u"), (unsigned)bytes);
    ++		*value = xstrfmt("%u", (unsigned)bytes);
     +		*unit = humanise_rate ?
     +			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
     +			       Q_("byte/s", "bytes/s", bytes) :
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
     +	const char *unit;
     +
     +	humanise_bytes(bytes, &value, &unit, flags);
    ++
    ++	/*
    ++	 * TRANSLATORS: The first argument is the number string. The second
    ++	 * argument is the unit prefix string (i.e. "12.34 MiB/s").
    ++	 */
     +	strbuf_addf(buf, _("%s %s"), value, unit);
     +	free(value);
     +}
    @@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbu
      /**
       * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
       * 3.50 MiB).
    +
    + ## t/helper/test-simple-ipc.c ##
    +@@ t/helper/test-simple-ipc.c: int cmd__simple_ipc(int argc, const char **argv)
    + 		OPT_INTEGER(0, "bytecount", &cl_args.bytecount, N_("number of bytes")),
    + 		OPT_INTEGER(0, "batchsize", &cl_args.batchsize, N_("number of requests per thread")),
    + 
    +-		OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
    ++		/*
    ++		 * The "byte" string here is not marked for translation and
    ++		 * instead relies on translation in strbuf.c:humanise_bytes() to
    ++		 * avoid conflict with the plural form.
    ++		 */
    ++		OPT_STRING(0, "byte", &bytevalue, "byte", N_("ballast character")),
    + 		OPT_STRING(0, "token", &cl_args.token, N_("token"), N_("command token to send to the server")),
    + 
    + 		OPT_END()
3:  8f09f6358e ! 3:  eebf0d917b builtin/repo: humanise count values in structure output
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
     +		size_t x = count + 5000000; /* for rounding */
     +		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
     +				 (unsigned)(x % 1000000000 / 10000000));
    ++		/* TRANSLATORS: SI decimal prefix symbol for 10^9 */
     +		*unit = _("G");
     +	} else if (count >= 1000000) {
     +		size_t x = count + 5000; /* for rounding */
     +		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
     +				 (unsigned)(x % 1000000 / 10000));
    ++		/* TRANSLATORS: SI decimal prefix symbol for 10^6 */
     +		*unit = _("M");
     +	} else if (count >= 1000) {
     +		size_t x = count + 5; /* for rounding */
     +		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
     +				 (unsigned)(x % 1000 / 10));
    ++		/* TRANSLATORS: SI decimal prefix symbol for 10^3 */
     +		*unit = _("k");
     +	} else {
    -+		*value = xstrfmt(_("%u"), (unsigned)count);
    ++		*value = xstrfmt("%u", (unsigned)count);
     +		*unit = NULL;
     +	}
     +}
4:  3f4eabe94f = 4:  37f71cc1bc builtin/repo: add inflated object info to keyvalue structure output
5:  85d1052100 ! 5:  40edf4c20b builtin/repo: add inflated object info to structure table
    @@ strbuf.c
     @@ strbuf.c: void humanise_bytes(off_t bytes, char **value, const char **unit,
      		*unit = humanise_rate ? _("KiB/s") : _("KiB");
      	} else {
    - 		*value = xstrfmt(_("%u"), (unsigned)bytes);
    + 		*value = xstrfmt("%u", (unsigned)bytes);
     -		*unit = humanise_rate ?
     -			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
     -			       Q_("byte/s", "bytes/s", bytes) :
     -			       /* TRANSLATORS: IEC 80000-13:2008 byte */
     -			       Q_("byte", "bytes", bytes);
     +		if (flags & HUMANISE_COMPACT)
    ++			/* TRANSLATORS: IEC 80000-13:2008 byte/second and byte */
     +			*unit = humanise_rate ? _("B/s") : _("B");
     +		else
     +			*unit = humanise_rate ?
6:  e9fa9babec = 6:  ba861f37c9 builtin/repo: add disk size info to keyvalue stucture output
7:  df542c7bdf ! 7:  3118c17ae3 builtin/repo: add object disk size info to structure table
    @@ t/t1901-repo-structure.sh: test_description='test git repo structure'
     -		--filter-provided-objects
     +	disk_usage_opt="--disk-usage"
     +
    -+	if [ "$2" = "true" ]; then
    ++	if test "$2" = "true"
    ++	then
     +		disk_usage_opt="--disk-usage=human"
     +	fi
     +
    -+	if [ "$1" = "all" ]; then
    ++	if test "$1" = "all"
    ++	then
     +		git rev-list --all --objects $disk_usage_opt
     +	else
     +		git rev-list --all --objects $disk_usage_opt \
    @@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references
      		git notes add -m foo &&
      
     -		cat >expect <<-\EOF &&
    ++		# The tags disk size is handled specially due to the
    ++		# git-rev-list(1) --disk-usage=human option printing the full
    ++		# "byte/bytes" unit prefix instead of just "B".
     +		cat >expect <<-EOF &&
      		| Repository structure | Value      |
      		| -------------------- | ---------- |

base-commit: e85ae279b0d58edc2f4c3fd5ac391b51e1223985
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 1/7] builtin/repo: group per-type object values into struct
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
@ 2025-12-16 17:38       ` Justin Tobler
  2025-12-16 17:38       ` [PATCH v4 2/7] strbuf: split out logic to humanise byte values Justin Tobler
                         ` (7 subsequent siblings)
  8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
 	size_t others;
 };
 
-struct object_stats {
+struct object_values {
 	size_t tags;
 	size_t commits;
 	size_t trees;
 	size_t blobs;
 };
 
+struct object_stats {
+	struct object_values type_counts;
+};
+
 struct repo_structure {
 	struct ref_stats refs;
 	struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
 	return stats->branches + stats->remotes + stats->tags + stats->others;
 }
 
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
 {
-	return stats->tags + stats->commits + stats->trees + stats->blobs;
+	return values->tags + values->commits + values->trees + values->blobs;
 }
 
 static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_count(objects);
+	object_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
 	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
-	stats_table_count_addf(table, objects->commits, "    * %s", _("Commits"));
-	stats_table_count_addf(table, objects->trees, "    * %s", _("Trees"));
-	stats_table_count_addf(table, objects->blobs, "    * %s", _("Blobs"));
-	stats_table_count_addf(table, objects->tags, "    * %s", _("Tags"));
+	stats_table_count_addf(table, objects->type_counts.commits,
+			       "    * %s", _("Commits"));
+	stats_table_count_addf(table, objects->type_counts.trees,
+			       "    * %s", _("Trees"));
+	stats_table_count_addf(table, objects->type_counts.blobs,
+			       "    * %s", _("Blobs"));
+	stats_table_count_addf(table, objects->type_counts.tags,
+			       "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	       (uintmax_t)stats->refs.others, value_delim);
 
 	printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.commits, value_delim);
+	       (uintmax_t)stats->objects.type_counts.commits, value_delim);
 	printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.trees, value_delim);
+	       (uintmax_t)stats->objects.type_counts.trees, value_delim);
 	printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.blobs, value_delim);
+	       (uintmax_t)stats->objects.type_counts.blobs, value_delim);
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.tags, value_delim);
+	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
 	fflush(stdout);
 }
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 
 	switch (type) {
 	case OBJ_TAG:
-		stats->tags += oids->nr;
+		stats->type_counts.tags += oids->nr;
 		break;
 	case OBJ_COMMIT:
-		stats->commits += oids->nr;
+		stats->type_counts.commits += oids->nr;
 		break;
 	case OBJ_TREE:
-		stats->trees += oids->nr;
+		stats->type_counts.trees += oids->nr;
 		break;
 	case OBJ_BLOB:
-		stats->blobs += oids->nr;
+		stats->type_counts.blobs += oids->nr;
 		break;
 	default:
 		BUG("invalid object type");
 	}
 
-	object_count = get_total_object_count(stats);
+	object_count = get_total_object_values(&stats->type_counts);
 	display_progress(data->progress, object_count);
 
 	return 0;
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 2/7] strbuf: split out logic to humanise byte values
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
  2025-12-16 17:38       ` [PATCH v4 1/7] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-16 17:38       ` Justin Tobler
  2025-12-16 18:59         ` Junio C Hamano
  2025-12-16 17:38       ` [PATCH v4 3/7] builtin/repo: humanise count values in structure output Justin Tobler
                         ` (6 subsequent siblings)
  8 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
use case, the downscaled values and unit prefixes must be handled
separately to ensure proper column alignment.

Split out logic from strbuf_humanise() to downscale byte values and
determine the corresponding unit prefix into a separate humanise_bytes()
function that provides separate value and unit strings.

Note that the "byte" string in "t/helper/test-simple-ipc.c" is unmarked
for translation here so that it doesn't conflict with the newly defined
plural "byte/bytes" translation and instead uses it.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 strbuf.c                   | 74 ++++++++++++++++++++------------------
 strbuf.h                   | 14 ++++++++
 t/helper/test-simple-ipc.c |  7 +++-
 3 files changed, 60 insertions(+), 35 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 6c3851a7f8..3fbd375ad6 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,47 +836,53 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
-				 int humanise_rate)
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+		    unsigned flags)
 {
+	int humanise_rate = flags & HUMANISE_RATE;
+
 	if (bytes > 1 << 30) {
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte */
-					_("%u.%2.2u GiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
-					_("%u.%2.2u GiB/s"),
-			    (unsigned)(bytes >> 30),
-			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(bytes >> 30),
+				 (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
+		*unit = humanise_rate ? _("GiB/s") : _("GiB");
 	} else if (bytes > 1 << 20) {
-		unsigned x = bytes + 5243;  /* for rounding */
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte */
-					_("%u.%2.2u MiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
-					_("%u.%2.2u MiB/s"),
-			    x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+		unsigned x = bytes + 5243; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), x >> 20,
+				 ((x & ((1 << 20) - 1)) * 100) >> 20);
+		/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
+		*unit = humanise_rate ? _("MiB/s") : _("MiB");
 	} else if (bytes > 1 << 10) {
-		unsigned x = bytes + 5;  /* for rounding */
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte */
-					_("%u.%2.2u KiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
-					_("%u.%2.2u KiB/s"),
-			    x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+		unsigned x = bytes + 5; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), x >> 10,
+				 ((x & ((1 << 10) - 1)) * 100) >> 10);
+		/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
+		*unit = humanise_rate ? _("KiB/s") : _("KiB");
 	} else {
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 byte */
-					Q_("%u byte", "%u bytes", bytes) :
-					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
-					Q_("%u byte/s", "%u bytes/s", bytes),
-				(unsigned)bytes);
+		*value = xstrfmt("%u", (unsigned)bytes);
+		*unit = humanise_rate ?
+			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+			       Q_("byte/s", "bytes/s", bytes) :
+			       /* TRANSLATORS: IEC 80000-13:2008 byte */
+			       Q_("byte", "bytes", bytes);
 	}
 }
 
+static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
+{
+	char *value;
+	const char *unit;
+
+	humanise_bytes(bytes, &value, &unit, flags);
+
+	/*
+	 * TRANSLATORS: The first argument is the number string. The second
+	 * argument is the unit prefix string (i.e. "12.34 MiB/s").
+	 */
+	strbuf_addf(buf, _("%s %s"), value, unit);
+	free(value);
+}
+
 void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
 {
 	strbuf_humanise(buf, bytes, 0);
@@ -884,7 +890,7 @@ void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
 
 void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
 {
-	strbuf_humanise(buf, bytes, 1);
+	strbuf_humanise(buf, bytes, HUMANISE_RATE);
 }
 
 int printf_ln(const char *fmt, ...)
diff --git a/strbuf.h b/strbuf.h
index a580ac6084..4426163e7e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,20 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
  */
 void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
 
+enum humanise_flags {
+	/*
+	 * Use rate based unit prefixes for humanised values.
+	 */
+	HUMANISE_RATE = (1 << 0),
+};
+
+/**
+ * Converts the given byte size into a downscaled human-readable value and
+ * corresponding unit prefix as two separate strings.
+ */
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+		    unsigned flags);
+
 /**
  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
  * 3.50 MiB).
diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
index 03cc5eea2c..442ad6b16f 100644
--- a/t/helper/test-simple-ipc.c
+++ b/t/helper/test-simple-ipc.c
@@ -603,7 +603,12 @@ int cmd__simple_ipc(int argc, const char **argv)
 		OPT_INTEGER(0, "bytecount", &cl_args.bytecount, N_("number of bytes")),
 		OPT_INTEGER(0, "batchsize", &cl_args.batchsize, N_("number of requests per thread")),
 
-		OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
+		/*
+		 * The "byte" string here is not marked for translation and
+		 * instead relies on translation in strbuf.c:humanise_bytes() to
+		 * avoid conflict with the plural form.
+		 */
+		OPT_STRING(0, "byte", &bytevalue, "byte", N_("ballast character")),
 		OPT_STRING(0, "token", &cl_args.token, N_("token"), N_("command token to send to the server")),
 
 		OPT_END()
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v4 2/7] strbuf: split out logic to humanise byte values
  2025-12-16 17:38       ` [PATCH v4 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-16 18:59         ` Junio C Hamano
  2025-12-16 19:39           ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-16 18:59 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, worldhello.net

Justin Tobler <jltobler@gmail.com> writes:

> +static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
> +{
> +	char *value;
> +	const char *unit;
> +
> +	humanise_bytes(bytes, &value, &unit, flags);
> +
> +	/*
> +	 * TRANSLATORS: The first argument is the number string. The second
> +	 * argument is the unit prefix string (i.e. "12.34 MiB/s").
> +	 */
> +	strbuf_addf(buf, _("%s %s"), value, unit);

"unit prefix string"?  Prefix is something that comes before
something else, but this one is at the end.  Simply saying a "unit
string" would probably be a sufficient fix, perhaps?

I read the changes since the last round, and other than this part,
everything looked good.

Thanks.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v4 2/7] strbuf: split out logic to humanise byte values
  2025-12-16 18:59         ` Junio C Hamano
@ 2025-12-16 19:39           ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 19:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, worldhello.net

On 25/12/17 03:59AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > +static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
> > +{
> > +	char *value;
> > +	const char *unit;
> > +
> > +	humanise_bytes(bytes, &value, &unit, flags);
> > +
> > +	/*
> > +	 * TRANSLATORS: The first argument is the number string. The second
> > +	 * argument is the unit prefix string (i.e. "12.34 MiB/s").
> > +	 */
> > +	strbuf_addf(buf, _("%s %s"), value, unit);
> 
> "unit prefix string"?  Prefix is something that comes before
> something else, but this one is at the end.  Simply saying a "unit
> string" would probably be a sufficient fix, perhaps?

Ya my bad, the prefix part would be just the Ki, Mi, etc. In this case
it is the whole unit string. Saying "unit string" would be correct. I
can send another version fixing this if you would like.

Thanks for the review,
-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 3/7] builtin/repo: humanise count values in structure output
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
  2025-12-16 17:38       ` [PATCH v4 1/7] builtin/repo: group per-type object values into struct Justin Tobler
  2025-12-16 17:38       ` [PATCH v4 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-16 17:38       ` Justin Tobler
  2025-12-16 17:38       ` [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.

For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 38 +++++++++++++++++-------
 strbuf.c                  | 26 ++++++++++++++++
 strbuf.h                  |  6 ++++
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
 4 files changed, 91 insertions(+), 41 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..9c61bc3e17 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
 
 	int name_col_width;
 	int value_col_width;
+	int unit_col_width;
 };
 
 /*
@@ -230,6 +231,7 @@ struct stats_table {
  */
 struct stats_table_entry {
 	char *value;
+	const char *unit;
 };
 
 static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
 
 	if (name_width > table->name_col_width)
 		table->name_col_width = name_width;
-	if (entry) {
+	if (!entry)
+		return;
+	if (entry->value) {
 		int value_width = utf8_strwidth(entry->value);
 		if (value_width > table->value_col_width)
 			table->value_col_width = value_width;
 	}
+	if (entry->unit) {
+		int unit_width = utf8_strwidth(entry->unit);
+		if (unit_width > table->unit_col_width)
+			table->unit_col_width = unit_width;
+	}
 }
 
 static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -273,7 +282,7 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_list ap;
 
 	CALLOC_ARRAY(entry, 1);
-	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+	humanise_count(value, &entry->value, &entry->unit);
 
 	va_start(ap, format);
 	stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +333,24 @@ static void stats_table_print_structure(const struct stats_table *table)
 {
 	const char *name_col_title = _("Repository structure");
 	const char *value_col_title = _("Value");
-	int name_col_width = utf8_strwidth(name_col_title);
-	int value_col_width = utf8_strwidth(value_col_title);
+	int title_name_width = utf8_strwidth(name_col_title);
+	int title_value_width = utf8_strwidth(value_col_title);
+	int name_col_width = table->name_col_width;
+	int value_col_width = table->value_col_width;
+	int unit_col_width = table->unit_col_width;
 	struct string_list_item *item;
 	struct strbuf buf = STRBUF_INIT;
 
-	if (table->name_col_width > name_col_width)
-		name_col_width = table->name_col_width;
-	if (table->value_col_width > value_col_width)
-		value_col_width = table->value_col_width;
+	if (title_name_width > name_col_width)
+		name_col_width = title_name_width;
+	if (title_value_width > value_col_width + unit_col_width + 1)
+		value_col_width = title_value_width - unit_col_width;
 
 	strbuf_addstr(&buf, "| ");
 	strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
 	strbuf_addstr(&buf, " | ");
-	strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+	strbuf_utf8_align(&buf, ALIGN_LEFT,
+			  value_col_width + unit_col_width + 1, value_col_title);
 	strbuf_addstr(&buf, " |");
 	printf("%s\n", buf.buf);
 
@@ -345,17 +358,20 @@ static void stats_table_print_structure(const struct stats_table *table)
 	for (int i = 0; i < name_col_width; i++)
 		putchar('-');
 	printf(" | ");
-	for (int i = 0; i < value_col_width; i++)
+	for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
 		putchar('-');
 	printf(" |\n");
 
 	for_each_string_list_item(item, &table->rows) {
 		struct stats_table_entry *entry = item->util;
 		const char *value = "";
+		const char *unit = "";
 
 		if (entry) {
 			struct stats_table_entry *entry = item->util;
 			value = entry->value;
+			if (entry->unit)
+				unit = entry->unit;
 		}
 
 		strbuf_reset(&buf);
@@ -363,6 +379,8 @@ static void stats_table_print_structure(const struct stats_table *table)
 		strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
 		strbuf_addstr(&buf, " | ");
 		strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+		strbuf_addch(&buf, ' ');
+		strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
 		strbuf_addstr(&buf, " |");
 		printf("%s\n", buf.buf);
 	}
diff --git a/strbuf.c b/strbuf.c
index 3fbd375ad6..9beebad5b9 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,6 +836,32 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
+void humanise_count(size_t count, char **value, const char **unit)
+{
+	if (count >= 1000000000) {
+		size_t x = count + 5000000; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
+				 (unsigned)(x % 1000000000 / 10000000));
+		/* TRANSLATORS: SI decimal prefix symbol for 10^9 */
+		*unit = _("G");
+	} else if (count >= 1000000) {
+		size_t x = count + 5000; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
+				 (unsigned)(x % 1000000 / 10000));
+		/* TRANSLATORS: SI decimal prefix symbol for 10^6 */
+		*unit = _("M");
+	} else if (count >= 1000) {
+		size_t x = count + 5; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
+				 (unsigned)(x % 1000 / 10));
+		/* TRANSLATORS: SI decimal prefix symbol for 10^3 */
+		*unit = _("k");
+	} else {
+		*value = xstrfmt("%u", (unsigned)count);
+		*unit = NULL;
+	}
+}
+
 void humanise_bytes(off_t bytes, char **value, const char **unit,
 		    unsigned flags)
 {
diff --git a/strbuf.h b/strbuf.h
index 4426163e7e..571bd889df 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -381,6 +381,12 @@ enum humanise_flags {
 void humanise_bytes(off_t bytes, char **value, const char **unit,
 		    unsigned flags);
 
+/**
+ * Converts the given count into a downscaled human-readable value and
+ * corresponding unit prefix as two separate strings.
+ */
+void humanise_count(size_t count, char **value, const char **unit);
+
 /**
  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
  * 3.50 MiB).
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
 	(
 		cd repo &&
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     0 |
-		|     * Branches       |     0 |
-		|     * Tags           |     0 |
-		|     * Remotes        |     0 |
-		|     * Others         |     0 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |     0 |
-		|     * Commits        |     0 |
-		|     * Trees          |     0 |
-		|     * Blobs          |     0 |
-		|     * Tags           |     0 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |     0  |
+		|     * Branches       |     0  |
+		|     * Tags           |     0  |
+		|     * Remotes        |     0  |
+		|     * Others         |     0  |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            |     0  |
+		|     * Commits        |     0  |
+		|     * Trees          |     0  |
+		|     * Blobs          |     0  |
+		|     * Tags           |     0  |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
 	git init repo &&
 	(
 		cd repo &&
-		test_commit_bulk 42 &&
+		test_commit_bulk 1005 &&
 		git tag -a foo -m bar &&
 
 		oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     4 |
-		|     * Branches       |     1 |
-		|     * Tags           |     1 |
-		|     * Remotes        |     1 |
-		|     * Others         |     1 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |   130 |
-		|     * Commits        |    43 |
-		|     * Trees          |    43 |
-		|     * Blobs          |    43 |
-		|     * Tags           |     1 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |    4   |
+		|     * Branches       |    1   |
+		|     * Tags           |    1   |
+		|     * Remotes        |    1   |
+		|     * Others         |    1   |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            | 3.02 k |
+		|     * Commits        | 1.01 k |
+		|     * Trees          | 1.01 k |
+		|     * Blobs          | 1.01 k |
+		|     * Tags           |    1   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
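
For readers following along, the count scaling added in strbuf.c above can
be tried outside of Git with a small stand-alone sketch. The name
format_count() and the single output buffer are assumptions made for
illustration only; the patch itself returns the value and the unit as two
separate strings via humanise_count().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-alone helper, not part of Git: scale a count to
 * two decimals plus an SI prefix, mirroring the rounding arithmetic
 * of humanise_count() in the patch.
 */
void format_count(size_t count, char *buf, size_t buflen)
{
	assert(buf && buflen > 0);

	if (count >= 1000000000) {
		size_t x = count + 5000000; /* round to nearest 0.01 G */
		snprintf(buf, buflen, "%u.%02u G",
			 (unsigned)(x / 1000000000),
			 (unsigned)(x % 1000000000 / 10000000));
	} else if (count >= 1000000) {
		size_t x = count + 5000; /* round to nearest 0.01 M */
		snprintf(buf, buflen, "%u.%02u M",
			 (unsigned)(x / 1000000),
			 (unsigned)(x % 1000000 / 10000));
	} else if (count >= 1000) {
		size_t x = count + 5; /* round to nearest 0.01 k */
		snprintf(buf, buflen, "%u.%02u k",
			 (unsigned)(x / 1000),
			 (unsigned)(x % 1000 / 10));
	} else {
		/* Small values are printed as-is, without a unit. */
		snprintf(buf, buflen, "%u", (unsigned)count);
	}
}
```

With this sketch, 1006 formats as "1.01 k" and 42 stays "42". Note that a
value just below a boundary, such as 999999, comes out as "1000.00 k"
rather than moving to the next prefix, since the branch is chosen before
rounding.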


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
                         ` (2 preceding siblings ...)
  2025-12-16 17:38       ` [PATCH v4 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-16 17:38       ` Justin Tobler
  2025-12-17  7:03         ` Patrick Steinhardt
  2025-12-16 17:38       ` [PATCH v4 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
                         ` (4 subsequent siblings)
  8 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.

For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 33 +++++++++++++++++++++++++++++++++
 t/t1901-repo-structure.sh   |  6 +++++-
 3 files changed, 39 insertions(+), 1 deletion(-)
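
As a rough illustration of the record layout used by the printf() calls in
this patch, here is a stand-alone sketch. format_record() is a hypothetical
helper, not something in builtin/repo.c. The delimiter pair for the nul
format (newline between key and value, NUL after the value) is an
assumption taken from how git-repo(1) documents that format; the patch
itself only shows the key_delim/value_delim parameters.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write one "key<key_delim>value<value_delim>"
 * record into buf and return the number of bytes the record needs
 * (snprintf's return value, which counts an embedded NUL delimiter).
 */
int format_record(char *buf, size_t buflen, const char *key, size_t value,
		  char key_delim, char value_delim)
{
	return snprintf(buf, buflen, "%s%c%" PRIuMAX "%c", key, key_delim,
			(uintmax_t)value, value_delim);
}
```

Called with '=' and '\n' this yields "objects.commits.inflated_size=9225"
followed by a newline, matching the expectation in the test below. Because
the nul variant embeds a NUL byte, a caller would need the returned length
rather than strlen() to know where the record ends.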

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
 +
 * Reference counts categorized by type
 * Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 9c61bc3e17..e207108346 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
 
 #include "builtin.h"
 #include "environment.h"
+#include "hex.h"
+#include "odb.h"
 #include "parse-options.h"
 #include "path-walk.h"
 #include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
 
 struct object_stats {
 	struct object_values type_counts;
+	struct object_values inflated_sizes;
 };
 
 struct repo_structure {
@@ -423,6 +426,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
+	printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+	printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+	printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -486,6 +498,7 @@ static void structure_count_references(struct ref_stats *stats,
 }
 
 struct count_objects_data {
+	struct object_database *odb;
 	struct object_stats *stats;
 	struct progress *progress;
 };
@@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 {
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
+	size_t inflated_total = 0;
 	size_t object_count;
 
+	for (size_t i = 0; i < oids->nr; i++) {
+		struct object_info oi = OBJECT_INFO_INIT;
+		unsigned long inflated;
+
+		oi.sizep = &inflated;
+
+		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+						  OBJECT_INFO_SKIP_FETCH_OBJECT |
+							  OBJECT_INFO_QUICK) < 0)
+			continue;
+
+		inflated_total += inflated;
+	}
+
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
+		stats->inflated_sizes.tags += inflated_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
+		stats->inflated_sizes.commits += inflated_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
+		stats->inflated_sizes.trees += inflated_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
+		stats->inflated_sizes.blobs += inflated_total;
 		break;
 	default:
 		BUG("invalid object type");
@@ -526,6 +558,7 @@ static void structure_count_objects(struct object_stats *stats,
 {
 	struct path_walk_info info = PATH_WALK_INFO_INIT;
 	struct count_objects_data data = {
+		.odb = repo->objects,
 		.stats = stats,
 	};
 
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..33237822fd 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
 	)
 '
 
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
 		objects.trees.count=42
 		objects.blobs.count=42
 		objects.tags.count=1
+		objects.commits.inflated_size=9225
+		objects.trees.inflated_size=28554
+		objects.blobs.inflated_size=453
+		objects.tags.inflated_size=132
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-16 17:38       ` [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-17  7:03         ` Patrick Steinhardt
  2025-12-17 16:10           ` Justin Tobler
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-17  7:03 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster, worldhello.net

On Tue, Dec 16, 2025 at 11:38:39AM -0600, Justin Tobler wrote:
> diff --git a/builtin/repo.c b/builtin/repo.c
> index 9c61bc3e17..e207108346 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
>  {
>  	struct count_objects_data *data = cb_data;
>  	struct object_stats *stats = data->stats;
> +	size_t inflated_total = 0;
>  	size_t object_count;
>  
> +	for (size_t i = 0; i < oids->nr; i++) {
> +		struct object_info oi = OBJECT_INFO_INIT;
> +		unsigned long inflated;
> +
> +		oi.sizep = &inflated;
> +
> +		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
> +						  OBJECT_INFO_SKIP_FETCH_OBJECT |
> +							  OBJECT_INFO_QUICK) < 0)

Tiny nit: there seems to be an extra tab here. This really is only worth
fixing if you intend to reroll anyway.

Patrick

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-17  7:03         ` Patrick Steinhardt
@ 2025-12-17 16:10           ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 16:10 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, gitster, worldhello.net

On 25/12/17 08:03AM, Patrick Steinhardt wrote:
> On Tue, Dec 16, 2025 at 11:38:39AM -0600, Justin Tobler wrote:
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index 9c61bc3e17..e207108346 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
> >  {
> >  	struct count_objects_data *data = cb_data;
> >  	struct object_stats *stats = data->stats;
> > +	size_t inflated_total = 0;
> >  	size_t object_count;
> >  
> > +	for (size_t i = 0; i < oids->nr; i++) {
> > +		struct object_info oi = OBJECT_INFO_INIT;
> > +		unsigned long inflated;
> > +
> > +		oi.sizep = &inflated;
> > +
> > +		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
> > +						  OBJECT_INFO_SKIP_FETCH_OBJECT |
> > +							  OBJECT_INFO_QUICK) < 0)
> 
> Tiny nit: there seems to be an extra tab here. This really is only worth
> fixing if you intend to reroll anyway.

I had that initially, but it was failing the check_style CI job, so I
just went with what clang-format wanted. I can change it though if I send
another version. I haven't quite figured out the best way to wrap long
lines.

-Justin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 5/7] builtin/repo: add inflated object info to structure table
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
                         ` (3 preceding siblings ...)
  2025-12-16 17:38       ` [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-16 17:38       ` Justin Tobler
  2025-12-16 17:38       ` [PATCH v4 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 33 +++++++++++++++++++--
 strbuf.c                  | 14 +++++----
 strbuf.h                  |  5 ++++
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
 4 files changed, 80 insertions(+), 34 deletions(-)
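
To make the effect of the compact unit easier to picture, below is a
minimal stand-alone sketch of byte scaling with binary prefixes and a bare
"B" for small values. It is not Git's humanise_bytes(): the name
format_bytes_compact() and the round-to-nearest-hundredth arithmetic are
assumptions chosen for brevity, and the result may differ from Git's output
in the last digit for some inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print value/unit with two decimals, rounding to the nearest 0.01. */
static void scale(uint64_t bytes, uint64_t unit, const char *suffix,
		  char *buf, size_t buflen)
{
	/* Value in hundredths of a unit; fine for sizes well below 2^57. */
	uint64_t hundredths = (bytes * 100 + unit / 2) / unit;

	snprintf(buf, buflen, "%u.%02u %s", (unsigned)(hundredths / 100),
		 (unsigned)(hundredths % 100), suffix);
}

/* Hypothetical helper, not part of Git. */
void format_bytes_compact(uint64_t bytes, char *buf, size_t buflen)
{
	assert(buf && buflen > 0);

	if (bytes >= (UINT64_C(1) << 30))
		scale(bytes, UINT64_C(1) << 30, "GiB", buf, buflen);
	else if (bytes >= (UINT64_C(1) << 20))
		scale(bytes, UINT64_C(1) << 20, "MiB", buf, buflen);
	else if (bytes >= (UINT64_C(1) << 10))
		scale(bytes, UINT64_C(1) << 10, "KiB", buf, buflen);
	else
		/* Compact form: "B" instead of "byte"/"bytes". */
		snprintf(buf, buflen, "%u B", (unsigned)bytes);
}
```

For example, 132 stays "132 B" and 28554 becomes "27.88 KiB". The point of
the compact "B" is that the unit column of the table never has to be wider
than the three characters of "KiB" or "MiB".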

diff --git a/builtin/repo.c b/builtin/repo.c
index e207108346..b73cfd975b 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -292,6 +292,20 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_end(ap);
 }
 
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+				  const char *format, ...)
+{
+	struct stats_table_entry *entry;
+	va_list ap;
+
+	CALLOC_ARRAY(entry, 1);
+	humanise_bytes(value, &entry->value, &entry->unit, HUMANISE_COMPACT);
+
+	va_start(ap, format);
+	stats_table_vaddf(table, entry, format, ap);
+	va_end(ap);
+}
+
 static inline size_t get_total_reference_count(struct ref_stats *stats)
 {
 	return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -307,7 +321,8 @@ static void stats_table_setup_structure(struct stats_table *table,
 {
 	struct object_stats *objects = &stats->objects;
 	struct ref_stats *refs = &stats->refs;
-	size_t object_total;
+	size_t inflated_object_total;
+	size_t object_count_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -318,10 +333,10 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_values(&objects->type_counts);
+	object_count_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
-	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
+	stats_table_count_addf(table, object_count_total, "  * %s", _("Count"));
 	stats_table_count_addf(table, objects->type_counts.commits,
 			       "    * %s", _("Commits"));
 	stats_table_count_addf(table, objects->type_counts.trees,
@@ -330,6 +345,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			       "    * %s", _("Blobs"));
 	stats_table_count_addf(table, objects->type_counts.tags,
 			       "    * %s", _("Tags"));
+
+	inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+	stats_table_size_addf(table, inflated_object_total,
+			      "  * %s", _("Inflated size"));
+	stats_table_size_addf(table, objects->inflated_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->inflated_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->inflated_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->inflated_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/strbuf.c b/strbuf.c
index 9beebad5b9..512c7ba680 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -886,11 +886,15 @@ void humanise_bytes(off_t bytes, char **value, const char **unit,
 		*unit = humanise_rate ? _("KiB/s") : _("KiB");
 	} else {
 		*value = xstrfmt("%u", (unsigned)bytes);
-		*unit = humanise_rate ?
-			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
-			       Q_("byte/s", "bytes/s", bytes) :
-			       /* TRANSLATORS: IEC 80000-13:2008 byte */
-			       Q_("byte", "bytes", bytes);
+		if (flags & HUMANISE_COMPACT)
+			/* TRANSLATORS: IEC 80000-13:2008 byte/second and byte */
+			*unit = humanise_rate ? _("B/s") : _("B");
+		else
+			*unit = humanise_rate ?
+					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
+					Q_("byte/s", "bytes/s", bytes) :
+					/* TRANSLATORS: IEC 80000-13:2008 byte */
+					Q_("byte", "bytes", bytes);
 	}
 }
 
diff --git a/strbuf.h b/strbuf.h
index 571bd889df..005c155808 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -372,6 +372,11 @@ enum humanise_flags {
 	 * Use rate based unit prefixes for humanised values.
 	 */
 	HUMANISE_RATE = (1 << 0),
+	/*
+	 * Use compact "B" unit prefixes instead of "byte/bytes" for humanised
+	 * values.
+	 */
+	HUMANISE_COMPACT = (1 << 1),
 };
 
 /**
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 33237822fd..b18213c660 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
 		| Repository structure | Value  |
 		| -------------------- | ------ |
 		| * References         |        |
-		|   * Count            |     0  |
-		|     * Branches       |     0  |
-		|     * Tags           |     0  |
-		|     * Remotes        |     0  |
-		|     * Others         |     0  |
+		|   * Count            |    0   |
+		|     * Branches       |    0   |
+		|     * Tags           |    0   |
+		|     * Remotes        |    0   |
+		|     * Others         |    0   |
 		|                      |        |
 		| * Reachable objects  |        |
-		|   * Count            |     0  |
-		|     * Commits        |     0  |
-		|     * Trees          |     0  |
-		|     * Blobs          |     0  |
-		|     * Tags           |     0  |
+		|   * Count            |    0   |
+		|     * Commits        |    0   |
+		|     * Trees          |    0   |
+		|     * Blobs          |    0   |
+		|     * Tags           |    0   |
+		|   * Inflated size    |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
 	)
 '
 
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value  |
-		| -------------------- | ------ |
-		| * References         |        |
-		|   * Count            |    4   |
-		|     * Branches       |    1   |
-		|     * Tags           |    1   |
-		|     * Remotes        |    1   |
-		|     * Others         |    1   |
-		|                      |        |
-		| * Reachable objects  |        |
-		|   * Count            | 3.02 k |
-		|     * Commits        | 1.01 k |
-		|     * Trees          | 1.01 k |
-		|     * Blobs          | 1.01 k |
-		|     * Tags           |    1   |
+		| Repository structure | Value      |
+		| -------------------- | ---------- |
+		| * References         |            |
+		|   * Count            |      4     |
+		|     * Branches       |      1     |
+		|     * Tags           |      1     |
+		|     * Remotes        |      1     |
+		|     * Others         |      1     |
+		|                      |            |
+		| * Reachable objects  |            |
+		|   * Count            |   3.02 k   |
+		|     * Commits        |   1.01 k   |
+		|     * Trees          |   1.01 k   |
+		|     * Blobs          |   1.01 k   |
+		|     * Tags           |      1     |
+		|   * Inflated size    |  16.03 MiB |
+		|     * Commits        | 217.92 KiB |
+		|     * Trees          |  15.81 MiB |
+		|     * Blobs          |  11.68 KiB |
+		|     * Tags           |    132 B   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 6/7] builtin/repo: add disk size info to keyvalue structure output
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
                         ` (4 preceding siblings ...)
  2025-12-16 17:38       ` [PATCH v4 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-16 17:38       ` Justin Tobler
  2025-12-16 17:38       ` [PATCH v4 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 18 ++++++++++++++++++
 t/t1901-repo-structure.sh   | 11 ++++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
 * Reference counts categorized by type
 * Reachable object counts categorized by type
 * Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index b73cfd975b..0ed41bf9d4 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
 struct object_stats {
 	struct object_values type_counts;
 	struct object_values inflated_sizes;
+	struct object_values disk_sizes;
 };
 
 struct repo_structure {
@@ -462,6 +463,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
 
+	printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+	printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+	printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+	printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -536,13 +546,16 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
 	size_t inflated_total = 0;
+	size_t disk_total = 0;
 	size_t object_count;
 
 	for (size_t i = 0; i < oids->nr; i++) {
 		struct object_info oi = OBJECT_INFO_INIT;
 		unsigned long inflated;
+		off_t disk;
 
 		oi.sizep = &inflated;
+		oi.disk_sizep = &disk;
 
 		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
 						  OBJECT_INFO_SKIP_FETCH_OBJECT |
@@ -550,24 +563,29 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 			continue;
 
 		inflated_total += inflated;
+		disk_total += disk;
 	}
 
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
 		stats->inflated_sizes.tags += inflated_total;
+		stats->disk_sizes.tags += disk_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
 		stats->inflated_sizes.commits += inflated_total;
+		stats->disk_sizes.commits += disk_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
 		stats->inflated_sizes.trees += inflated_total;
+		stats->disk_sizes.trees += disk_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
 		stats->inflated_sizes.blobs += inflated_total;
+		stats->disk_sizes.blobs += disk_total;
 		break;
 	default:
 		BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index b18213c660..dd17caad05 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,11 @@ test_description='test git repo structure'
 
 . ./test-lib.sh
 
+object_type_disk_usage() {
+	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
+		--filter-provided-objects
+}
+
 test_expect_success 'empty repository' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -91,7 +96,7 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		test_commit_bulk 42 &&
 		git tag -a foo -m bar &&
 
-		cat >expect <<-\EOF &&
+		cat >expect <<-EOF &&
 		references.branches.count=1
 		references.tags.count=1
 		references.remotes.count=0
@@ -104,6 +109,10 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		objects.trees.inflated_size=28554
 		objects.blobs.inflated_size=453
 		objects.tags.inflated_size=132
+		objects.commits.disk_size=$(object_type_disk_usage commit)
+		objects.trees.disk_size=$(object_type_disk_usage tree)
+		objects.blobs.disk_size=$(object_type_disk_usage blob)
+		objects.tags.disk_size=$(object_type_disk_usage tag)
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 7/7] builtin/repo: add object disk size info to structure table
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
                         ` (5 preceding siblings ...)
  2025-12-16 17:38       ` [PATCH v4 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
@ 2025-12-16 17:38       ` Justin Tobler
  2025-12-17  7:03       ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
  8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 13 +++++++++++++
 t/t1901-repo-structure.sh | 31 ++++++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 3 deletions(-)
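
The row layout that the new "Disk size" entries rely on (name left-aligned,
value right-aligned, unit left-aligned in its own column) can be
approximated with plain printf-style width specifiers. This is only a
sketch: format_row() is a hypothetical helper, and printf widths count
bytes, whereas the series uses strbuf_utf8_align() so that translated,
multi-byte labels still line up.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: render one table row. A NULL value or unit is
 * printed as an empty, padded cell.
 */
int format_row(char *buf, size_t buflen, const char *name,
	       const char *value, const char *unit,
	       int name_w, int value_w, int unit_w)
{
	return snprintf(buf, buflen, "| %-*s | %*s %-*s |",
			name_w, name,
			value_w, value ? value : "",
			unit_w, unit ? unit : "");
}
```

With column widths of 20, 6 and 3, a row for "    * Tags", "132" and "B"
comes out as "|     * Tags           |    132 B   |", the same shape as
the rows in the expected test output.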

diff --git a/builtin/repo.c b/builtin/repo.c
index 0ed41bf9d4..a071d2fdfe 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -324,6 +324,7 @@ static void stats_table_setup_structure(struct stats_table *table,
 	struct ref_stats *refs = &stats->refs;
 	size_t inflated_object_total;
 	size_t object_count_total;
+	size_t disk_object_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -358,6 +359,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			      "    * %s", _("Blobs"));
 	stats_table_size_addf(table, objects->inflated_sizes.tags,
 			      "    * %s", _("Tags"));
+
+	disk_object_total = get_total_object_values(&objects->disk_sizes);
+	stats_table_size_addf(table, disk_object_total,
+			      "  * %s", _("Disk size"));
+	stats_table_size_addf(table, objects->disk_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->disk_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->disk_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->disk_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index dd17caad05..1b68525079 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -5,8 +5,20 @@ test_description='test git repo structure'
 . ./test-lib.sh
 
 object_type_disk_usage() {
-	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
-		--filter-provided-objects
+	disk_usage_opt="--disk-usage"
+
+	if test "$2" = "true"
+	then
+		disk_usage_opt="--disk-usage=human"
+	fi
+
+	if test "$1" = "all"
+	then
+		git rev-list --all --objects $disk_usage_opt
+	else
+		git rev-list --all --objects $disk_usage_opt \
+			--filter=object:type=$1 --filter-provided-objects
+	fi
 }
 
 test_expect_success 'empty repository' '
@@ -35,6 +47,11 @@ test_expect_success 'empty repository' '
 		|     * Trees          |    0 B |
 		|     * Blobs          |    0 B |
 		|     * Tags           |    0 B |
+		|   * Disk size        |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -58,7 +75,10 @@ test_expect_success SHA1 'repository with references and objects' '
 		# Also creates a commit, tree, and blob.
 		git notes add -m foo &&
 
-		cat >expect <<-\EOF &&
+		# The tags disk size is handled specially due to the
+		# git-rev-list(1) --disk-usage=human option printing the full
+		# "byte/bytes" unit prefix instead of just "B".
+		cat >expect <<-EOF &&
 		| Repository structure | Value      |
 		| -------------------- | ---------- |
 		| * References         |            |
@@ -79,6 +99,11 @@ test_expect_success SHA1 'repository with references and objects' '
 		|     * Trees          |  15.81 MiB |
 		|     * Blobs          |  11.68 KiB |
 		|     * Tags           |    132 B   |
+		|   * Disk size        | $(object_type_disk_usage all true) |
+		|     * Commits        | $(object_type_disk_usage commit true) |
+		|     * Trees          | $(object_type_disk_usage tree true) |
+		|     * Blobs          |  $(object_type_disk_usage blob true) |
+		|     * Tags           |    $(object_type_disk_usage tag) B   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0



* Re: [PATCH v4 0/7] builtin/repo: add object size info to structure output
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
                         ` (6 preceding siblings ...)
  2025-12-16 17:38       ` [PATCH v4 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-17  7:03       ` Patrick Steinhardt
  2025-12-17 17:49         ` Justin Tobler
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
  8 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-17  7:03 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster, worldhello.net

On Tue, Dec 16, 2025 at 11:38:35AM -0600, Justin Tobler wrote:
> Changes in V4:
> - Unmark "byte" string in "t/helper/test-simple-ipc.c" for translation
>   to avoid conflict with translated plural "byte/bytes" string.
> - Remove some unnecessary translations and add comments to clarify some
>   of the added translations.
> - Some small changes to the tests in patch 7.

I had a last tiny nit that doesn't warrant a reroll on its own. Other
than that this series looks great to me now. Thanks!

Patrick


* Re: [PATCH v4 0/7] builtin/repo: add object size info to structure output
  2025-12-17  7:03       ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
@ 2025-12-17 17:49         ` Justin Tobler
  0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:49 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, gitster, worldhello.net

On 25/12/17 08:03AM, Patrick Steinhardt wrote:
> On Tue, Dec 16, 2025 at 11:38:35AM -0600, Justin Tobler wrote:
> > Changes in V4:
> > - Unmark "byte" string in "t/helper/test-simple-ipc.c" for translation
> >   to avoid conflict with translated plural "byte/bytes" string.
> > - Remove some unnecessary translations and add comments to clarify some
> >   of the added translations.
> > - Some small changes to the tests in patch 7.
> 
> I had a last tiny nit that doesn't warrant a reroll on its own. Other
> than that this series looks great to me now. Thanks!

Junio also had some small comments. I'll go ahead and send another
version. Thanks for the review. :)

-Justin


* [PATCH v5 0/7] builtin/repo: add object size info to structure output
  2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
                         ` (7 preceding siblings ...)
  2025-12-17  7:03       ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
@ 2025-12-17 17:53       ` Justin Tobler
  2025-12-17 17:53         ` [PATCH v5 1/7] builtin/repo: group per-type object values into struct Justin Tobler
                           ` (7 more replies)
  8 siblings, 8 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:53 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

Greetings,

This patch series extends the recently introduced "structure" subcommand
for git-repo(1) to collect object size information. More specifically,
it shows total inflated and disk sizes of objects by object type. The
aim is to provide additional insight that may be useful to users regarding
the structure of a repository.

In addition to this change, this series also updates the table output
format to downscale larger output values along with the appropriate unit
prefix. This is done to make table output more human friendly. The
keyvalue and nul output formats are left the same since they are
intended more for machine parsing.
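
To make the split between the two kinds of output concrete, here is a
minimal sketch of how a machine-oriented line keeps the raw number. The
helper name sketch_keyvalue_line() is hypothetical, and the '=' and
newline delimiters used in the usage note are an assumption about the
keyvalue format rather than something quoted from the series; only the
"objects.commits.count" key and the printf() shape come from
structure_keyvalue_print().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one "key<delim>value<delim>" record with
 * the unscaled value, so a script can parse it without knowing units.
 */
int sketch_keyvalue_line(char *out, size_t len, const char *key,
			 size_t value, char key_delim, char value_delim)
{
	assert(out && len);
	return snprintf(out, len, "%s%c%" PRIuMAX "%c", key, key_delim,
			(uintmax_t)value, value_delim);
}
```

Called with '=' and '\n' as delimiters, a count of 1006 is written out
as "objects.commits.count=1006", whereas the table would show the same
value as "1.01 k".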

Changes in V5:
- Small updates to some comments and log messages to improve
  correctness.
- Adjusted spacing in builtin/repo.c:count_objects().

Changes in V4:
- Unmark "byte" string in "t/helper/test-simple-ipc.c" for translation
  to avoid conflict with translated plural "byte/bytes" string.
- Remove some unnecessary translations and add comments to clarify some
  of the added translations.
- Some small changes to the tests in patch 7.

Changes in V3:
- Address potential localization regression by making the downscaled
  number format string also translatable. Also make the format string
  for how the values and unit prefixes are displayed via
  `strbuf_humanise_{bytes,rate}()` translatable to be more flexible.
- `strbuf_humanise_{bytes,count}_value()` has been renamed to
  `humanise_{bytes,count}()` and updated to provide both the value and
  unit prefix as separate strings.
- Unit prefix strings are no longer allocated and instead constant.
- The humanise flags are now defined in an enum.
- Instead of using `OBJECT_INFO_FOR_PREFETCH`,
  `OBJECT_INFO_SKIP_FETCH_OBJECT` and `OBJECT_INFO_QUICK` are used
  explicitly.
- Tests now use git-rev-list(1) to verify disk size info.

Changes in V2:
- Factor out and reuse existing logic from strbuf_humanise() to handle
  downscaling values and determining the appropriate unit prefix
  separately. This enables more control over how exactly the values are
  written to the structure output table which is useful for alignment
  reasons. I'm not sure about the interface used in patch 2. Feedback is
  most welcome.
- In the previous version, when checking object size on a missing object
  we would die. Instead we now ignore missing objects. This allows the
  structure command to work on partial clones.
- disk/inflated keyvalue names renamed to disk_size/inflated_size.
- Unit prefixes are marked for translation.
- The tests for keyvalue disk size values are updated to check against
  real expected values instead of skipping. Table output tests still
  skip verifying human-readable values though.

Thanks,
-Justin

Justin Tobler (7):
  builtin/repo: group per-type object values into struct
  strbuf: split out logic to humanise byte values
  builtin/repo: humanise count values in structure output
  builtin/repo: add inflated object info to keyvalue structure output
  builtin/repo: add inflated object info to structure table
  builtin/repo: add disk size info to keyvalue stucture output
  builtin/repo: add object disk size info to structure table

 Documentation/git-repo.adoc |   2 +
 builtin/repo.c              | 175 ++++++++++++++++++++++++++++++------
 strbuf.c                    | 102 ++++++++++++++-------
 strbuf.h                    |  25 ++++++
 t/helper/test-simple-ipc.c  |   7 +-
 t/t1901-repo-structure.sh   | 118 ++++++++++++++++--------
 6 files changed, 331 insertions(+), 98 deletions(-)

Range-diff against v4:
1:  be14de68f6 = 1:  be14de68f6 builtin/repo: group per-type object values into struct
2:  0a145cfeec ! 2:  61cff22afa strbuf: split out logic to humanise byte values
    @@ Commit message
         In a subsequent commit, byte size values displayed in table output for
         the git-repo(1) "structure" subcommand will be shown in a more
         human-readable format with the appropriate unit prefixes. For this
    -    usecase, the downscaled values and unit prefixes must be handled
    +    usecase, the downscaled values and unit strings must be handled
         separately to ensure proper column alignment.
     
         Split out logic from strbuf_humanise() to downscale byte values and
    @@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
     +
     +	/*
     +	 * TRANSLATORS: The first argument is the number string. The second
    -+	 * argument is the unit prefix string (i.e. "12.34 MiB/s").
    ++	 * argument is the unit string (i.e. "12.34 MiB/s").
     +	 */
     +	strbuf_addf(buf, _("%s %s"), value, unit);
     +	free(value);
    @@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbu
      
     +enum humanise_flags {
     +	/*
    -+	 * Use rate based unit prefixes for humanised values.
    ++	 * Use rate based units for humanised values.
     +	 */
     +	HUMANISE_RATE = (1 << 0),
     +};
     +
     +/**
     + * Converts the given byte size into a downscaled human-readable value and
    -+ * corresponding unit prefix as two separate strings.
    ++ * corresponding unit as two separate strings.
     + */
     +void humanise_bytes(off_t bytes, char **value, const char **unit,
     +		    unsigned flags);
3:  eebf0d917b ! 3:  0b575738c2 builtin/repo: humanise count values in structure output
    @@ strbuf.h: enum humanise_flags {
      
     +/**
     + * Converts the given count into a downscaled human-readable value and
    -+ * corresponding unit prefix as two separate strings.
    ++ * corresponding unit as two separate strings.
     + */
     +void humanise_count(size_t count, char **value, const char **unit);
     +
4:  37f71cc1bc ! 4:  e2c79c8759 builtin/repo: add inflated object info to keyvalue structure output
    @@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
     +
     +		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
     +						  OBJECT_INFO_SKIP_FETCH_OBJECT |
    -+							  OBJECT_INFO_QUICK) < 0)
    ++						  OBJECT_INFO_QUICK) < 0)
     +			continue;
     +
     +		inflated_total += inflated;
5:  40edf4c20b ! 5:  03219630cc builtin/repo: add inflated object info to structure table
    @@ strbuf.c: void humanise_bytes(off_t bytes, char **value, const char **unit,
     
      ## strbuf.h ##
     @@ strbuf.h: enum humanise_flags {
    - 	 * Use rate based unit prefixes for humanised values.
    + 	 * Use rate based units for humanised values.
      	 */
      	HUMANISE_RATE = (1 << 0),
     +	/*
    -+	 * Use compact "B" unit prefixes instead of "byte/bytes" for humanised
    ++	 * Use compact "B" unit symbol instead of "byte/bytes" for humanised
     +	 * values.
     +	 */
     +	HUMANISE_COMPACT = (1 << 1),
6:  ba861f37c9 = 6:  7d8862a064 builtin/repo: add disk size info to keyvalue stucture output
7:  3118c17ae3 ! 7:  3e2d5c20f8 builtin/repo: add object disk size info to structure table
    @@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references
     -		cat >expect <<-\EOF &&
     +		# The tags disk size is handled specially due to the
     +		# git-rev-list(1) --disk-usage=human option printing the full
    -+		# "byte/bytes" unit prefix instead of just "B".
    ++		# "byte/bytes" unit string instead of just "B".
     +		cat >expect <<-EOF &&
      		| Repository structure | Value      |
      		| -------------------- | ---------- |

base-commit: e85ae279b0d58edc2f4c3fd5ac391b51e1223985
-- 
2.52.0.209.ge85ae279b0



* [PATCH v5 1/7] builtin/repo: group per-type object values into struct
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
@ 2025-12-17 17:53         ` Justin Tobler
  2025-12-17 17:53         ` [PATCH v5 2/7] strbuf: split out logic to humanise byte values Justin Tobler
                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:53 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
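The reuse this refactor is after can be sketched in isolation. The
snippet below is a standalone illustration, not the code in
builtin/repo.c: total_object_values() is a hypothetical stand-in for
get_total_object_values(), and the inflated_sizes member only appears
in a later patch of the series.

```c
#include <assert.h>
#include <stddef.h>

/* One slot per object type; the same layout can hold counts or sizes. */
struct object_values {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/* Each measurement gets its own object_values instance. */
struct object_stats {
	struct object_values type_counts;
	struct object_values inflated_sizes;
};

/* A single helper sums any per-type measurement. */
size_t total_object_values(const struct object_values *values)
{
	assert(values);
	return values->tags + values->commits + values->trees + values->blobs;
}
```

Because the helper takes the inner struct, adding another measurement
only needs a new member in object_stats and no new summing function.
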
 builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
 	size_t others;
 };
 
-struct object_stats {
+struct object_values {
 	size_t tags;
 	size_t commits;
 	size_t trees;
 	size_t blobs;
 };
 
+struct object_stats {
+	struct object_values type_counts;
+};
+
 struct repo_structure {
 	struct ref_stats refs;
 	struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
 	return stats->branches + stats->remotes + stats->tags + stats->others;
 }
 
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
 {
-	return stats->tags + stats->commits + stats->trees + stats->blobs;
+	return values->tags + values->commits + values->trees + values->blobs;
 }
 
 static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_count(objects);
+	object_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
 	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
-	stats_table_count_addf(table, objects->commits, "    * %s", _("Commits"));
-	stats_table_count_addf(table, objects->trees, "    * %s", _("Trees"));
-	stats_table_count_addf(table, objects->blobs, "    * %s", _("Blobs"));
-	stats_table_count_addf(table, objects->tags, "    * %s", _("Tags"));
+	stats_table_count_addf(table, objects->type_counts.commits,
+			       "    * %s", _("Commits"));
+	stats_table_count_addf(table, objects->type_counts.trees,
+			       "    * %s", _("Trees"));
+	stats_table_count_addf(table, objects->type_counts.blobs,
+			       "    * %s", _("Blobs"));
+	stats_table_count_addf(table, objects->type_counts.tags,
+			       "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	       (uintmax_t)stats->refs.others, value_delim);
 
 	printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.commits, value_delim);
+	       (uintmax_t)stats->objects.type_counts.commits, value_delim);
 	printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.trees, value_delim);
+	       (uintmax_t)stats->objects.type_counts.trees, value_delim);
 	printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.blobs, value_delim);
+	       (uintmax_t)stats->objects.type_counts.blobs, value_delim);
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
-	       (uintmax_t)stats->objects.tags, value_delim);
+	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
 	fflush(stdout);
 }
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 
 	switch (type) {
 	case OBJ_TAG:
-		stats->tags += oids->nr;
+		stats->type_counts.tags += oids->nr;
 		break;
 	case OBJ_COMMIT:
-		stats->commits += oids->nr;
+		stats->type_counts.commits += oids->nr;
 		break;
 	case OBJ_TREE:
-		stats->trees += oids->nr;
+		stats->type_counts.trees += oids->nr;
 		break;
 	case OBJ_BLOB:
-		stats->blobs += oids->nr;
+		stats->type_counts.blobs += oids->nr;
 		break;
 	default:
 		BUG("invalid object type");
 	}
 
-	object_count = get_total_object_count(stats);
+	object_count = get_total_object_values(&stats->type_counts);
 	display_progress(data->progress, object_count);
 
 	return 0;
-- 
2.52.0.209.ge85ae279b0



* [PATCH v5 2/7] strbuf: split out logic to humanise byte values
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
  2025-12-17 17:53         ` [PATCH v5 1/7] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-17 17:53         ` Justin Tobler
  2025-12-17 17:54         ` [PATCH v5 3/7] builtin/repo: humanise count values in structure output Justin Tobler
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:53 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
usecase, the downscaled values and unit strings must be handled
separately to ensure proper column alignment.

Split out logic from strbuf_humanise() to downscale byte values and
determine the corresponding unit prefix into a separate humanise_bytes()
function that provides separate value and unit strings.

Note that the "byte" string in "t/helper/test-simple-ipc.c" is unmarked
for translation here so that it doesn't conflict with the newly defined
plural "byte/bytes" translation and instead uses it.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
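Why separate strings help with alignment can be shown with a small
standalone sketch. It follows the same arithmetic as the diff below but
is not the patch itself: sketch_humanise_bytes() and
sketch_format_cell() are hypothetical names, the value is written to a
caller-supplied buffer instead of being allocated, and translation and
the rate flag are left out. The column widths are arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the downscaled number into `value` and return the unit. */
const char *sketch_humanise_bytes(unsigned long long bytes, char *value,
				  size_t len)
{
	assert(value && len);
	if (bytes > 1ULL << 30) {
		snprintf(value, len, "%u.%2.2u", (unsigned)(bytes >> 30),
			 (unsigned)(bytes & ((1ULL << 30) - 1)) / 10737419u);
		return "GiB";
	} else if (bytes > 1ULL << 20) {
		unsigned x = (unsigned)bytes + 5243; /* for rounding */
		snprintf(value, len, "%u.%2.2u", x >> 20,
			 ((x & ((1u << 20) - 1)) * 100) >> 20);
		return "MiB";
	} else if (bytes > 1ULL << 10) {
		unsigned x = (unsigned)bytes + 5; /* for rounding */
		snprintf(value, len, "%u.%2.2u", x >> 10,
			 ((x & ((1u << 10) - 1)) * 100) >> 10);
		return "KiB";
	}
	snprintf(value, len, "%u", (unsigned)bytes);
	return bytes == 1 ? "byte" : "bytes";
}

/* Right-align the number and left-align the unit in fixed columns. */
int sketch_format_cell(char *out, size_t len, unsigned long long bytes)
{
	char value[32];
	const char *unit = sketch_humanise_bytes(bytes, value, sizeof(value));

	return snprintf(out, len, "%6s %-5s", value, unit);
}
```

With a single combined string such as "1.50 KiB" the caller could only
pad the whole cell; with two strings the digits and the units each line
up in their own column.
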
 strbuf.c                   | 74 ++++++++++++++++++++------------------
 strbuf.h                   | 14 ++++++++
 t/helper/test-simple-ipc.c |  7 +++-
 3 files changed, 60 insertions(+), 35 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 6c3851a7f8..349ee9727a 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,47 +836,53 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
-				 int humanise_rate)
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+		    unsigned flags)
 {
+	int humanise_rate = flags & HUMANISE_RATE;
+
 	if (bytes > 1 << 30) {
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte */
-					_("%u.%2.2u GiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
-					_("%u.%2.2u GiB/s"),
-			    (unsigned)(bytes >> 30),
-			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(bytes >> 30),
+				 (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+		/* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
+		*unit = humanise_rate ? _("GiB/s") : _("GiB");
 	} else if (bytes > 1 << 20) {
-		unsigned x = bytes + 5243;  /* for rounding */
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte */
-					_("%u.%2.2u MiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
-					_("%u.%2.2u MiB/s"),
-			    x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+		unsigned x = bytes + 5243; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), x >> 20,
+				 ((x & ((1 << 20) - 1)) * 100) >> 20);
+		/* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
+		*unit = humanise_rate ? _("MiB/s") : _("MiB");
 	} else if (bytes > 1 << 10) {
-		unsigned x = bytes + 5;  /* for rounding */
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte */
-					_("%u.%2.2u KiB") :
-					/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
-					_("%u.%2.2u KiB/s"),
-			    x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+		unsigned x = bytes + 5; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), x >> 10,
+				 ((x & ((1 << 10) - 1)) * 100) >> 10);
+		/* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
+		*unit = humanise_rate ? _("KiB/s") : _("KiB");
 	} else {
-		strbuf_addf(buf,
-				humanise_rate == 0 ?
-					/* TRANSLATORS: IEC 80000-13:2008 byte */
-					Q_("%u byte", "%u bytes", bytes) :
-					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
-					Q_("%u byte/s", "%u bytes/s", bytes),
-				(unsigned)bytes);
+		*value = xstrfmt("%u", (unsigned)bytes);
+		*unit = humanise_rate ?
+			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+			       Q_("byte/s", "bytes/s", bytes) :
+			       /* TRANSLATORS: IEC 80000-13:2008 byte */
+			       Q_("byte", "bytes", bytes);
 	}
 }
 
+static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
+{
+	char *value;
+	const char *unit;
+
+	humanise_bytes(bytes, &value, &unit, flags);
+
+	/*
+	 * TRANSLATORS: The first argument is the number string. The second
+	 * argument is the unit string (i.e. "12.34 MiB/s").
+	 */
+	strbuf_addf(buf, _("%s %s"), value, unit);
+	free(value);
+}
+
 void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
 {
 	strbuf_humanise(buf, bytes, 0);
@@ -884,7 +890,7 @@ void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
 
 void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
 {
-	strbuf_humanise(buf, bytes, 1);
+	strbuf_humanise(buf, bytes, HUMANISE_RATE);
 }
 
 int printf_ln(const char *fmt, ...)
diff --git a/strbuf.h b/strbuf.h
index a580ac6084..698b3cc4a5 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,20 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
  */
 void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
 
+enum humanise_flags {
+	/*
+	 * Use rate based units for humanised values.
+	 */
+	HUMANISE_RATE = (1 << 0),
+};
+
+/**
+ * Converts the given byte size into a downscaled human-readable value and
+ * corresponding unit as two separate strings.
+ */
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+		    unsigned flags);
+
 /**
  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
  * 3.50 MiB).
diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
index 03cc5eea2c..442ad6b16f 100644
--- a/t/helper/test-simple-ipc.c
+++ b/t/helper/test-simple-ipc.c
@@ -603,7 +603,12 @@ int cmd__simple_ipc(int argc, const char **argv)
 		OPT_INTEGER(0, "bytecount", &cl_args.bytecount, N_("number of bytes")),
 		OPT_INTEGER(0, "batchsize", &cl_args.batchsize, N_("number of requests per thread")),
 
-		OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
+		/*
+		 * The "byte" string here is not marked for translation and
+		 * instead relies on translation in strbuf.c:humanise_bytes() to
+		 * avoid conflict with the plural form.
+		 */
+		OPT_STRING(0, "byte", &bytevalue, "byte", N_("ballast character")),
 		OPT_STRING(0, "token", &cl_args.token, N_("token"), N_("command token to send to the server")),
 
 		OPT_END()
-- 
2.52.0.209.ge85ae279b0



* [PATCH v5 3/7] builtin/repo: humanise count values in structure output
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
  2025-12-17 17:53         ` [PATCH v5 1/7] builtin/repo: group per-type object values into struct Justin Tobler
  2025-12-17 17:53         ` [PATCH v5 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-17 17:54         ` Justin Tobler
  2025-12-17 17:54         ` [PATCH v5 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.

For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
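The downscaling of counts can be tried out with the standalone sketch
below. It uses the same thresholds and rounding offsets as the
humanise_count() added in this patch, but sketch_humanise_count() is a
hypothetical name, the value goes into a caller-supplied buffer rather
than an allocated string, and the unit strings are not translated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the downscaled count into `value` and return the SI prefix,
 * or NULL when the count is small enough to be shown as-is.
 */
const char *sketch_humanise_count(size_t count, char *value, size_t len)
{
	assert(value && len);
	if (count >= 1000000000) {
		size_t x = count + 5000000; /* for rounding */
		snprintf(value, len, "%u.%2.2u", (unsigned)(x / 1000000000),
			 (unsigned)(x % 1000000000 / 10000000));
		return "G";
	} else if (count >= 1000000) {
		size_t x = count + 5000; /* for rounding */
		snprintf(value, len, "%u.%2.2u", (unsigned)(x / 1000000),
			 (unsigned)(x % 1000000 / 10000));
		return "M";
	} else if (count >= 1000) {
		size_t x = count + 5; /* for rounding */
		snprintf(value, len, "%u.%2.2u", (unsigned)(x / 1000),
			 (unsigned)(x % 1000 / 10));
		return "k";
	}
	snprintf(value, len, "%u", (unsigned)count);
	return NULL;
}
```

The offset added before dividing is half of the last displayed digit,
so truncation rounds to nearest: 1006 becomes 1011 and prints as
"1.01 k", matching the commit count expected in the updated test.
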
 builtin/repo.c            | 38 +++++++++++++++++-------
 strbuf.c                  | 26 ++++++++++++++++
 strbuf.h                  |  6 ++++
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
 4 files changed, 91 insertions(+), 41 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..9c61bc3e17 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
 
 	int name_col_width;
 	int value_col_width;
+	int unit_col_width;
 };
 
 /*
@@ -230,6 +231,7 @@ struct stats_table {
  */
 struct stats_table_entry {
 	char *value;
+	const char *unit;
 };
 
 static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
 
 	if (name_width > table->name_col_width)
 		table->name_col_width = name_width;
-	if (entry) {
+	if (!entry)
+		return;
+	if (entry->value) {
 		int value_width = utf8_strwidth(entry->value);
 		if (value_width > table->value_col_width)
 			table->value_col_width = value_width;
 	}
+	if (entry->unit) {
+		int unit_width = utf8_strwidth(entry->unit);
+		if (unit_width > table->unit_col_width)
+			table->unit_col_width = unit_width;
+	}
 }
 
 static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -273,7 +282,7 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_list ap;
 
 	CALLOC_ARRAY(entry, 1);
-	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+	humanise_count(value, &entry->value, &entry->unit);
 
 	va_start(ap, format);
 	stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +333,24 @@ static void stats_table_print_structure(const struct stats_table *table)
 {
 	const char *name_col_title = _("Repository structure");
 	const char *value_col_title = _("Value");
-	int name_col_width = utf8_strwidth(name_col_title);
-	int value_col_width = utf8_strwidth(value_col_title);
+	int title_name_width = utf8_strwidth(name_col_title);
+	int title_value_width = utf8_strwidth(value_col_title);
+	int name_col_width = table->name_col_width;
+	int value_col_width = table->value_col_width;
+	int unit_col_width = table->unit_col_width;
 	struct string_list_item *item;
 	struct strbuf buf = STRBUF_INIT;
 
-	if (table->name_col_width > name_col_width)
-		name_col_width = table->name_col_width;
-	if (table->value_col_width > value_col_width)
-		value_col_width = table->value_col_width;
+	if (title_name_width > name_col_width)
+		name_col_width = title_name_width;
+	if (title_value_width > value_col_width + unit_col_width + 1)
+		value_col_width = title_value_width - unit_col_width;
 
 	strbuf_addstr(&buf, "| ");
 	strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
 	strbuf_addstr(&buf, " | ");
-	strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+	strbuf_utf8_align(&buf, ALIGN_LEFT,
+			  value_col_width + unit_col_width + 1, value_col_title);
 	strbuf_addstr(&buf, " |");
 	printf("%s\n", buf.buf);
 
@@ -345,17 +358,20 @@ static void stats_table_print_structure(const struct stats_table *table)
 	for (int i = 0; i < name_col_width; i++)
 		putchar('-');
 	printf(" | ");
-	for (int i = 0; i < value_col_width; i++)
+	for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
 		putchar('-');
 	printf(" |\n");
 
 	for_each_string_list_item(item, &table->rows) {
 		struct stats_table_entry *entry = item->util;
 		const char *value = "";
+		const char *unit = "";
 
 		if (entry) {
 			struct stats_table_entry *entry = item->util;
 			value = entry->value;
+			if (entry->unit)
+				unit = entry->unit;
 		}
 
 		strbuf_reset(&buf);
@@ -363,6 +379,8 @@ static void stats_table_print_structure(const struct stats_table *table)
 		strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
 		strbuf_addstr(&buf, " | ");
 		strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+		strbuf_addch(&buf, ' ');
+		strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
 		strbuf_addstr(&buf, " |");
 		printf("%s\n", buf.buf);
 	}
diff --git a/strbuf.c b/strbuf.c
index 349ee9727a..995ff15169 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,6 +836,32 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
 	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
+void humanise_count(size_t count, char **value, const char **unit)
+{
+	if (count >= 1000000000) {
+		size_t x = count + 5000000; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
+				 (unsigned)(x % 1000000000 / 10000000));
+		/* TRANSLATORS: SI decimal prefix symbol for 10^9 */
+		*unit = _("G");
+	} else if (count >= 1000000) {
+		size_t x = count + 5000; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
+				 (unsigned)(x % 1000000 / 10000));
+		/* TRANSLATORS: SI decimal prefix symbol for 10^6 */
+		*unit = _("M");
+	} else if (count >= 1000) {
+		size_t x = count + 5; /* for rounding */
+		*value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
+				 (unsigned)(x % 1000 / 10));
+		/* TRANSLATORS: SI decimal prefix symbol for 10^3 */
+		*unit = _("k");
+	} else {
+		*value = xstrfmt("%u", (unsigned)count);
+		*unit = NULL;
+	}
+}
+
 void humanise_bytes(off_t bytes, char **value, const char **unit,
 		    unsigned flags)
 {
diff --git a/strbuf.h b/strbuf.h
index 698b3cc4a5..52feef4c1b 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -381,6 +381,12 @@ enum humanise_flags {
 void humanise_bytes(off_t bytes, char **value, const char **unit,
 		    unsigned flags);
 
+/**
+ * Converts the given count into a downscaled human-readable value and
+ * corresponding unit as two separate strings.
+ */
+void humanise_count(size_t count, char **value, const char **unit);
+
 /**
  * Append the given byte size as a human-readable string (i.e. 12.23 KiB,
  * 3.50 MiB).
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
 	(
 		cd repo &&
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     0 |
-		|     * Branches       |     0 |
-		|     * Tags           |     0 |
-		|     * Remotes        |     0 |
-		|     * Others         |     0 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |     0 |
-		|     * Commits        |     0 |
-		|     * Trees          |     0 |
-		|     * Blobs          |     0 |
-		|     * Tags           |     0 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |     0  |
+		|     * Branches       |     0  |
+		|     * Tags           |     0  |
+		|     * Remotes        |     0  |
+		|     * Others         |     0  |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            |     0  |
+		|     * Commits        |     0  |
+		|     * Trees          |     0  |
+		|     * Blobs          |     0  |
+		|     * Tags           |     0  |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
 	git init repo &&
 	(
 		cd repo &&
-		test_commit_bulk 42 &&
+		test_commit_bulk 1005 &&
 		git tag -a foo -m bar &&
 
 		oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value |
-		| -------------------- | ----- |
-		| * References         |       |
-		|   * Count            |     4 |
-		|     * Branches       |     1 |
-		|     * Tags           |     1 |
-		|     * Remotes        |     1 |
-		|     * Others         |     1 |
-		|                      |       |
-		| * Reachable objects  |       |
-		|   * Count            |   130 |
-		|     * Commits        |    43 |
-		|     * Trees          |    43 |
-		|     * Blobs          |    43 |
-		|     * Tags           |     1 |
+		| Repository structure | Value  |
+		| -------------------- | ------ |
+		| * References         |        |
+		|   * Count            |    4   |
+		|     * Branches       |    1   |
+		|     * Tags           |    1   |
+		|     * Remotes        |    1   |
+		|     * Others         |    1   |
+		|                      |        |
+		| * Reachable objects  |        |
+		|   * Count            | 3.02 k |
+		|     * Commits        | 1.01 k |
+		|     * Trees          | 1.01 k |
+		|     * Blobs          | 1.01 k |
+		|     * Tags           |    1   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
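
For readers following the thread, the rounding used by the "k" branch in the strbuf.c hunk above can be tried in isolation. The following is a minimal sketch, not the patch's code: `sketch_humanise_count` is a hypothetical name, it writes into a caller-supplied buffer instead of returning allocated strings, and it only mirrors the two branches visible in the hunk (the "k" branch and the unscaled branch). Larger prefixes are deliberately left out.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of git: formats `count` into `buf`
 * using the same arithmetic as the "k" branch shown in the hunk.
 * Values of one million and above are not handled specially here.
 */
void sketch_humanise_count(size_t count, char *buf, size_t len)
{
	if (count >= 1000) {
		size_t x = count + 5; /* round to the nearest hundredth */

		snprintf(buf, len, "%u.%2.2u k", (unsigned)(x / 1000),
			 (unsigned)(x % 1000 / 10));
	} else {
		snprintf(buf, len, "%u", (unsigned)count);
	}
}
```

Under these assumptions, 1006 commits format as "1.01 k" and 3019 objects as "3.02 k", which is consistent with the values in the updated test expectation.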


* [PATCH v5 4/7] builtin/repo: add inflated object info to keyvalue structure output
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
                           ` (2 preceding siblings ...)
  2025-12-17 17:54         ` [PATCH v5 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-17 17:54         ` Justin Tobler
  2025-12-17 17:54         ` [PATCH v5 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.

For now, the per-type object size info is only added to the keyvalue
and nul output formats. A subsequent commit also adds this info to the
table format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 33 +++++++++++++++++++++++++++++++++
 t/t1901-repo-structure.sh   |  6 +++++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
 +
 * Reference counts categorized by type
 * Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 9c61bc3e17..8da321a386 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
 
 #include "builtin.h"
 #include "environment.h"
+#include "hex.h"
+#include "odb.h"
 #include "parse-options.h"
 #include "path-walk.h"
 #include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
 
 struct object_stats {
 	struct object_values type_counts;
+	struct object_values inflated_sizes;
 };
 
 struct repo_structure {
@@ -423,6 +426,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.type_counts.tags, value_delim);
 
+	printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+	printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+	printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -486,6 +498,7 @@ static void structure_count_references(struct ref_stats *stats,
 }
 
 struct count_objects_data {
+	struct object_database *odb;
 	struct object_stats *stats;
 	struct progress *progress;
 };
@@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 {
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
+	size_t inflated_total = 0;
 	size_t object_count;
 
+	for (size_t i = 0; i < oids->nr; i++) {
+		struct object_info oi = OBJECT_INFO_INIT;
+		unsigned long inflated;
+
+		oi.sizep = &inflated;
+
+		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+						  OBJECT_INFO_SKIP_FETCH_OBJECT |
+						  OBJECT_INFO_QUICK) < 0)
+			continue;
+
+		inflated_total += inflated;
+	}
+
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
+		stats->inflated_sizes.tags += inflated_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
+		stats->inflated_sizes.commits += inflated_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
+		stats->inflated_sizes.trees += inflated_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
+		stats->inflated_sizes.blobs += inflated_total;
 		break;
 	default:
 		BUG("invalid object type");
@@ -526,6 +558,7 @@ static void structure_count_objects(struct object_stats *stats,
 {
 	struct path_walk_info info = PATH_WALK_INFO_INIT;
 	struct count_objects_data data = {
+		.odb = repo->objects,
 		.stats = stats,
 	};
 
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..33237822fd 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
 	)
 '
 
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
 		objects.trees.count=42
 		objects.blobs.count=42
 		objects.tags.count=1
+		objects.commits.inflated_size=9225
+		objects.trees.inflated_size=28554
+		objects.blobs.inflated_size=453
+		objects.tags.inflated_size=132
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
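
Since the keyvalue format is meant for machine parsing, here is a minimal consumer-side sketch of reading one line such as `objects.commits.inflated_size=9225`. It is not part of the series: `sketch_parse_keyvalue` is a hypothetical name, and the sketch assumes the value is an unsigned decimal number, which holds for the count and size keys shown in the test above. The nul format uses different delimiters and is not covered.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: splits one "key=value" line at
 * the first '=' and parses the value as an unsigned decimal number.
 * Returns 0 on success and -1 on malformed input.
 */
int sketch_parse_keyvalue(const char *line, char *key, size_t keylen,
			  unsigned long long *value)
{
	const char *eq = strchr(line, '=');
	char *end;
	size_t n;

	if (!eq)
		return -1;

	n = (size_t)(eq - line);
	if (n == 0 || n >= keylen)
		return -1; /* empty key, or key does not fit in the buffer */
	memcpy(key, line, n);
	key[n] = '\0';

	if (eq[1] < '0' || eq[1] > '9')
		return -1; /* reject empty, signed or non-numeric values */
	*value = strtoull(eq + 1, &end, 10);
	if (*end != '\0' && *end != '\n')
		return -1; /* trailing garbage after the number */

	return 0;
}
```

A caller would feed it each line of `git repo structure --format=keyvalue` output and dispatch on the returned key.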


* [PATCH v5 5/7] builtin/repo: add inflated object info to structure table
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
                           ` (3 preceding siblings ...)
  2025-12-17 17:54         ` [PATCH v5 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-17 17:54         ` Justin Tobler
  2025-12-17 17:54         ` [PATCH v5 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 33 +++++++++++++++++++--
 strbuf.c                  | 14 +++++----
 strbuf.h                  |  5 ++++
 t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
 4 files changed, 80 insertions(+), 34 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 8da321a386..67d7548b88 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -292,6 +292,20 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
 	va_end(ap);
 }
 
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+				  const char *format, ...)
+{
+	struct stats_table_entry *entry;
+	va_list ap;
+
+	CALLOC_ARRAY(entry, 1);
+	humanise_bytes(value, &entry->value, &entry->unit, HUMANISE_COMPACT);
+
+	va_start(ap, format);
+	stats_table_vaddf(table, entry, format, ap);
+	va_end(ap);
+}
+
 static inline size_t get_total_reference_count(struct ref_stats *stats)
 {
 	return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -307,7 +321,8 @@ static void stats_table_setup_structure(struct stats_table *table,
 {
 	struct object_stats *objects = &stats->objects;
 	struct ref_stats *refs = &stats->refs;
-	size_t object_total;
+	size_t inflated_object_total;
+	size_t object_count_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -318,10 +333,10 @@ static void stats_table_setup_structure(struct stats_table *table,
 	stats_table_count_addf(table, refs->remotes, "    * %s", _("Remotes"));
 	stats_table_count_addf(table, refs->others, "    * %s", _("Others"));
 
-	object_total = get_total_object_values(&objects->type_counts);
+	object_count_total = get_total_object_values(&objects->type_counts);
 	stats_table_addf(table, "");
 	stats_table_addf(table, "* %s", _("Reachable objects"));
-	stats_table_count_addf(table, object_total, "  * %s", _("Count"));
+	stats_table_count_addf(table, object_count_total, "  * %s", _("Count"));
 	stats_table_count_addf(table, objects->type_counts.commits,
 			       "    * %s", _("Commits"));
 	stats_table_count_addf(table, objects->type_counts.trees,
@@ -330,6 +345,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			       "    * %s", _("Blobs"));
 	stats_table_count_addf(table, objects->type_counts.tags,
 			       "    * %s", _("Tags"));
+
+	inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+	stats_table_size_addf(table, inflated_object_total,
+			      "  * %s", _("Inflated size"));
+	stats_table_size_addf(table, objects->inflated_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->inflated_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->inflated_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->inflated_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/strbuf.c b/strbuf.c
index 995ff15169..7fb7d12ac0 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -886,11 +886,15 @@ void humanise_bytes(off_t bytes, char **value, const char **unit,
 		*unit = humanise_rate ? _("KiB/s") : _("KiB");
 	} else {
 		*value = xstrfmt("%u", (unsigned)bytes);
-		*unit = humanise_rate ?
-			       /* TRANSLATORS: IEC 80000-13:2008 byte/second */
-			       Q_("byte/s", "bytes/s", bytes) :
-			       /* TRANSLATORS: IEC 80000-13:2008 byte */
-			       Q_("byte", "bytes", bytes);
+		if (flags & HUMANISE_COMPACT)
+			/* TRANSLATORS: IEC 80000-13:2008 byte/second and byte */
+			*unit = humanise_rate ? _("B/s") : _("B");
+		else
+			*unit = humanise_rate ?
+					/* TRANSLATORS: IEC 80000-13:2008 byte/second */
+					Q_("byte/s", "bytes/s", bytes) :
+					/* TRANSLATORS: IEC 80000-13:2008 byte */
+					Q_("byte", "bytes", bytes);
 	}
 }
 
diff --git a/strbuf.h b/strbuf.h
index 52feef4c1b..06e284f9cc 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -372,6 +372,11 @@ enum humanise_flags {
 	 * Use rate based units for humanised values.
 	 */
 	HUMANISE_RATE = (1 << 0),
+	/*
+	 * Use compact "B" unit symbol instead of "byte/bytes" for humanised
+	 * values.
+	 */
+	HUMANISE_COMPACT = (1 << 1),
 };
 
 /**
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 33237822fd..b18213c660 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
 		| Repository structure | Value  |
 		| -------------------- | ------ |
 		| * References         |        |
-		|   * Count            |     0  |
-		|     * Branches       |     0  |
-		|     * Tags           |     0  |
-		|     * Remotes        |     0  |
-		|     * Others         |     0  |
+		|   * Count            |    0   |
+		|     * Branches       |    0   |
+		|     * Tags           |    0   |
+		|     * Remotes        |    0   |
+		|     * Others         |    0   |
 		|                      |        |
 		| * Reachable objects  |        |
-		|   * Count            |     0  |
-		|     * Commits        |     0  |
-		|     * Trees          |     0  |
-		|     * Blobs          |     0  |
-		|     * Tags           |     0  |
+		|   * Count            |    0   |
+		|     * Commits        |    0   |
+		|     * Trees          |    0   |
+		|     * Blobs          |    0   |
+		|     * Tags           |    0   |
+		|   * Inflated size    |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
 	)
 '
 
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
 		git notes add -m foo &&
 
 		cat >expect <<-\EOF &&
-		| Repository structure | Value  |
-		| -------------------- | ------ |
-		| * References         |        |
-		|   * Count            |    4   |
-		|     * Branches       |    1   |
-		|     * Tags           |    1   |
-		|     * Remotes        |    1   |
-		|     * Others         |    1   |
-		|                      |        |
-		| * Reachable objects  |        |
-		|   * Count            | 3.02 k |
-		|     * Commits        | 1.01 k |
-		|     * Trees          | 1.01 k |
-		|     * Blobs          | 1.01 k |
-		|     * Tags           |    1   |
+		| Repository structure | Value      |
+		| -------------------- | ---------- |
+		| * References         |            |
+		|   * Count            |      4     |
+		|     * Branches       |      1     |
+		|     * Tags           |      1     |
+		|     * Remotes        |      1     |
+		|     * Others         |      1     |
+		|                      |            |
+		| * Reachable objects  |            |
+		|   * Count            |   3.02 k   |
+		|     * Commits        |   1.01 k   |
+		|     * Trees          |   1.01 k   |
+		|     * Blobs          |   1.01 k   |
+		|     * Tags           |      1     |
+		|   * Inflated size    |  16.03 MiB |
+		|     * Commits        | 217.92 KiB |
+		|     * Trees          |  15.81 MiB |
+		|     * Blobs          |  11.68 KiB |
+		|     * Tags           |    132 B   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
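
To illustrate the compact unit handling described in this patch, here is a minimal sketch of scaling a byte count to binary units with a bare "B" for unscaled values. It is not git's humanise_bytes(): `sketch_humanise_bytes` is a hypothetical name, the output goes into a caller-supplied buffer as a single string, and the round-to-nearest-hundredth arithmetic is an assumption of this sketch, not taken from strbuf.c.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not git's humanise_bytes(): scales `bytes` to
 * KiB, MiB or GiB with two decimals, or prints it unscaled with the
 * compact "B" unit. Very large inputs could overflow `bytes * 100`;
 * that case is ignored in this sketch.
 */
void sketch_humanise_bytes(unsigned long long bytes, char *buf, size_t len)
{
	static const char *units[] = { "KiB", "MiB", "GiB" };
	int i;

	/* Try the largest unit first so the biggest fitting one wins. */
	for (i = 2; i >= 0; i--) {
		unsigned long long unit = 1ULL << (10 * (i + 1));

		if (bytes >= unit) {
			/* value in hundredths of the unit, rounded to nearest */
			unsigned long long h = (bytes * 100 + unit / 2) / unit;

			snprintf(buf, len, "%llu.%02llu %s",
				 h / 100, h % 100, units[i]);
			return;
		}
	}

	snprintf(buf, len, "%llu B", bytes);
}
```

With this sketch, 132 bytes render as "132 B", matching the compact tag size in the table above, and 1536 bytes render as "1.50 KiB".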


* [PATCH v5 6/7] builtin/repo: add disk size info to keyvalue structure output
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
                           ` (4 preceding siblings ...)
  2025-12-17 17:54         ` [PATCH v5 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-17 17:54         ` Justin Tobler
  2025-12-17 17:54         ` [PATCH v5 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
  2025-12-18  6:32         ` [PATCH v5 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/git-repo.adoc |  1 +
 builtin/repo.c              | 18 ++++++++++++++++++
 t/t1901-repo-structure.sh   | 11 ++++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
 * Reference counts categorized by type
 * Reachable object counts categorized by type
 * Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
 
 +
 The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 67d7548b88..7ea051f3af 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
 struct object_stats {
 	struct object_values type_counts;
 	struct object_values inflated_sizes;
+	struct object_values disk_sizes;
 };
 
 struct repo_structure {
@@ -462,6 +463,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
 	printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
 	       (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
 
+	printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+	printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+	printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+	printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
+	       (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
 	fflush(stdout);
 }
 
@@ -536,13 +546,16 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 	struct count_objects_data *data = cb_data;
 	struct object_stats *stats = data->stats;
 	size_t inflated_total = 0;
+	size_t disk_total = 0;
 	size_t object_count;
 
 	for (size_t i = 0; i < oids->nr; i++) {
 		struct object_info oi = OBJECT_INFO_INIT;
 		unsigned long inflated;
+		off_t disk;
 
 		oi.sizep = &inflated;
+		oi.disk_sizep = &disk;
 
 		if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
 						  OBJECT_INFO_SKIP_FETCH_OBJECT |
@@ -550,24 +563,29 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
 			continue;
 
 		inflated_total += inflated;
+		disk_total += disk;
 	}
 
 	switch (type) {
 	case OBJ_TAG:
 		stats->type_counts.tags += oids->nr;
 		stats->inflated_sizes.tags += inflated_total;
+		stats->disk_sizes.tags += disk_total;
 		break;
 	case OBJ_COMMIT:
 		stats->type_counts.commits += oids->nr;
 		stats->inflated_sizes.commits += inflated_total;
+		stats->disk_sizes.commits += disk_total;
 		break;
 	case OBJ_TREE:
 		stats->type_counts.trees += oids->nr;
 		stats->inflated_sizes.trees += inflated_total;
+		stats->disk_sizes.trees += disk_total;
 		break;
 	case OBJ_BLOB:
 		stats->type_counts.blobs += oids->nr;
 		stats->inflated_sizes.blobs += inflated_total;
+		stats->disk_sizes.blobs += disk_total;
 		break;
 	default:
 		BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index b18213c660..dd17caad05 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,11 @@ test_description='test git repo structure'
 
 . ./test-lib.sh
 
+object_type_disk_usage() {
+	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
+		--filter-provided-objects
+}
+
 test_expect_success 'empty repository' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
@@ -91,7 +96,7 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		test_commit_bulk 42 &&
 		git tag -a foo -m bar &&
 
-		cat >expect <<-\EOF &&
+		cat >expect <<-EOF &&
 		references.branches.count=1
 		references.tags.count=1
 		references.remotes.count=0
@@ -104,6 +109,10 @@ test_expect_success SHA1 'keyvalue and nul format' '
 		objects.trees.inflated_size=28554
 		objects.blobs.inflated_size=453
 		objects.tags.inflated_size=132
+		objects.commits.disk_size=$(object_type_disk_usage commit)
+		objects.trees.disk_size=$(object_type_disk_usage tree)
+		objects.blobs.disk_size=$(object_type_disk_usage blob)
+		objects.tags.disk_size=$(object_type_disk_usage tag)
 		EOF
 
 		git repo structure --format=keyvalue >out 2>err &&
-- 
2.52.0.209.ge85ae279b0
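
The accumulation pattern in count_objects() above (sum a batch, then add the sum to the bucket for its object type) and the later summing of all buckets for the table's total rows can be sketched on their own. The names below (`sketch_type`, `sketch_values`, `sketch_add`, `sketch_total`) are hypothetical stand-ins rather than git's types, and unlike the patch, which calls BUG() on an unexpected type, this sketch returns an error code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for git's object type enum and stats struct. */
enum sketch_type { SKETCH_COMMIT, SKETCH_TREE, SKETCH_BLOB, SKETCH_TAG };

struct sketch_values {
	size_t commits;
	size_t trees;
	size_t blobs;
	size_t tags;
};

/* Adds `amount` to the bucket for `type`; returns -1 for unknown types. */
int sketch_add(struct sketch_values *v, enum sketch_type type, size_t amount)
{
	switch (type) {
	case SKETCH_COMMIT:
		v->commits += amount;
		break;
	case SKETCH_TREE:
		v->trees += amount;
		break;
	case SKETCH_BLOB:
		v->blobs += amount;
		break;
	case SKETCH_TAG:
		v->tags += amount;
		break;
	default:
		return -1;
	}
	return 0;
}

/* Sums all buckets, as needed for a "total" row. */
size_t sketch_total(const struct sketch_values *v)
{
	return v->commits + v->trees + v->blobs + v->tags;
}
```

Feeding in the inflated sizes from the keyvalue test (9225, 28554, 453 and 132) gives a total of 38364 bytes.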


* [PATCH v5 7/7] builtin/repo: add object disk size info to structure table
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
                           ` (5 preceding siblings ...)
  2025-12-17 17:54         ` [PATCH v5 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
@ 2025-12-17 17:54         ` Justin Tobler
  2025-12-18  6:32         ` [PATCH v5 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
  7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
  To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler

Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/repo.c            | 13 +++++++++++++
 t/t1901-repo-structure.sh | 31 ++++++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/builtin/repo.c b/builtin/repo.c
index 7ea051f3af..09bc8fccfd 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -324,6 +324,7 @@ static void stats_table_setup_structure(struct stats_table *table,
 	struct ref_stats *refs = &stats->refs;
 	size_t inflated_object_total;
 	size_t object_count_total;
+	size_t disk_object_total;
 	size_t ref_total;
 
 	ref_total = get_total_reference_count(refs);
@@ -358,6 +359,18 @@ static void stats_table_setup_structure(struct stats_table *table,
 			      "    * %s", _("Blobs"));
 	stats_table_size_addf(table, objects->inflated_sizes.tags,
 			      "    * %s", _("Tags"));
+
+	disk_object_total = get_total_object_values(&objects->disk_sizes);
+	stats_table_size_addf(table, disk_object_total,
+			      "  * %s", _("Disk size"));
+	stats_table_size_addf(table, objects->disk_sizes.commits,
+			      "    * %s", _("Commits"));
+	stats_table_size_addf(table, objects->disk_sizes.trees,
+			      "    * %s", _("Trees"));
+	stats_table_size_addf(table, objects->disk_sizes.blobs,
+			      "    * %s", _("Blobs"));
+	stats_table_size_addf(table, objects->disk_sizes.tags,
+			      "    * %s", _("Tags"));
 }
 
 static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index dd17caad05..435fd979fa 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -5,8 +5,20 @@ test_description='test git repo structure'
 . ./test-lib.sh
 
 object_type_disk_usage() {
-	git rev-list --all --objects --disk-usage --filter=object:type=$1 \
-		--filter-provided-objects
+	disk_usage_opt="--disk-usage"
+
+	if test "$2" = "true"
+	then
+		disk_usage_opt="--disk-usage=human"
+	fi
+
+	if test "$1" = "all"
+	then
+		git rev-list --all --objects $disk_usage_opt
+	else
+		git rev-list --all --objects $disk_usage_opt \
+			--filter=object:type=$1 --filter-provided-objects
+	fi
 }
 
 test_expect_success 'empty repository' '
@@ -35,6 +47,11 @@ test_expect_success 'empty repository' '
 		|     * Trees          |    0 B |
 		|     * Blobs          |    0 B |
 		|     * Tags           |    0 B |
+		|   * Disk size        |    0 B |
+		|     * Commits        |    0 B |
+		|     * Trees          |    0 B |
+		|     * Blobs          |    0 B |
+		|     * Tags           |    0 B |
 		EOF
 
 		git repo structure >out 2>err &&
@@ -58,7 +75,10 @@ test_expect_success SHA1 'repository with references and objects' '
 		# Also creates a commit, tree, and blob.
 		git notes add -m foo &&
 
-		cat >expect <<-\EOF &&
+		# The tags disk size is handled specially due to the
+		# git-rev-list(1) --disk-usage=human option printing the full
+		# "byte/bytes" unit string instead of just "B".
+		cat >expect <<-EOF &&
 		| Repository structure | Value      |
 		| -------------------- | ---------- |
 		| * References         |            |
@@ -79,6 +99,11 @@ test_expect_success SHA1 'repository with references and objects' '
 		|     * Trees          |  15.81 MiB |
 		|     * Blobs          |  11.68 KiB |
 		|     * Tags           |    132 B   |
+		|   * Disk size        | $(object_type_disk_usage all true) |
+		|     * Commits        | $(object_type_disk_usage commit true) |
+		|     * Trees          | $(object_type_disk_usage tree true) |
+		|     * Blobs          |  $(object_type_disk_usage blob true) |
+		|     * Tags           |    $(object_type_disk_usage tag) B   |
 		EOF
 
 		git repo structure >out 2>err &&
-- 
2.52.0.209.ge85ae279b0


* Re: [PATCH v5 0/7] builtin/repo: add object size info to structure output
  2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
                           ` (6 preceding siblings ...)
  2025-12-17 17:54         ` [PATCH v5 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-18  6:32         ` Patrick Steinhardt
  7 siblings, 0 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-18  6:32 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, gitster, worldhello.net

On Wed, Dec 17, 2025 at 11:53:57AM -0600, Justin Tobler wrote:
> Greetings,
> 
> This patch series extends the recently introduced "structure" subcommand
> for git-repo(1) to collect object size information. More specifically,
> it shows total inflated and disk sizes of objects by object type. The
> aim is to provide additional insight that may be useful to users regarding
> the structure of a repository.
> 
> In addition to this change, this series also updates the table output
> format to downscale larger output values along with the appropriate unit
> prefix. This is done to make table output more human friendly. The
> keyvalue and nul output formats are left the same since they are
> intended more for machine parsing.
> 
> Changes in V5:
> - Small updates to some comments and log messages to improve
>   correctness.
> - Adjusted spacing in builtin/repo.c:count_objects().

I'm happy with this version, thanks!

Patrick

Thread overview: 80+ messages
2025-12-09 22:58 [PATCH 0/6] builtin/repo: add object size info to structure output Justin Tobler
2025-12-09 22:58 ` [PATCH 1/6] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-09 22:58 ` [PATCH 2/6] builtin/repo: humanise count values in structure output Justin Tobler
2025-12-10  6:28   ` Patrick Steinhardt
2025-12-10 15:10     ` Justin Tobler
2025-12-11  2:57       ` Junio C Hamano
2025-12-12 16:46         ` Justin Tobler
2025-12-09 22:58 ` [PATCH 3/6] builtin/repo: add inflated object info to keyvalue " Justin Tobler
2025-12-09 22:58 ` [PATCH 4/6] builtin/repo: add inflated object info to structure table Justin Tobler
2025-12-10  6:28   ` Patrick Steinhardt
2025-12-10 15:21     ` Justin Tobler
2025-12-09 22:58 ` [PATCH 5/6] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
2025-12-10  6:28   ` Patrick Steinhardt
2025-12-10 15:24     ` Justin Tobler
2025-12-12 20:40     ` Justin Tobler
2025-12-15  5:33       ` Patrick Steinhardt
2025-12-15 16:24         ` Justin Tobler
2025-12-10 14:58   ` Junio C Hamano
2025-12-10 19:09     ` Lucas Seiki Oshiro
2025-12-12 22:36     ` Justin Tobler
2025-12-12 23:58       ` Junio C Hamano
2025-12-09 22:58 ` [PATCH 6/6] builtin/repo: add object disk size info to structure table Justin Tobler
2025-12-10  6:28   ` Patrick Steinhardt
2025-12-10 15:24     ` Justin Tobler
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-12 22:36   ` [PATCH v2 1/7] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-12 22:36   ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
2025-12-15  5:33     ` Patrick Steinhardt
2025-12-15 16:26       ` Justin Tobler
2025-12-15  8:21     ` Junio C Hamano
2025-12-15 16:47       ` Justin Tobler
2025-12-16  2:26     ` Jiang Xin
2025-12-16  4:37       ` Junio C Hamano
2025-12-16  6:18         ` Jiang Xin
2025-12-16 14:41           ` Justin Tobler
2025-12-12 22:36   ` [PATCH v2 3/7] builtin/repo: humanise count values in structure output Justin Tobler
2025-12-15  5:33     ` Patrick Steinhardt
2025-12-12 22:36   ` [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
2025-12-15  5:33     ` Patrick Steinhardt
2025-12-15 16:48       ` Justin Tobler
2025-12-12 22:36   ` [PATCH v2 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
2025-12-12 22:36   ` [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
2025-12-15  5:33     ` Patrick Steinhardt
2025-12-12 22:36   ` [PATCH v2 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
2025-12-15 20:56   ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-15 20:56     ` [PATCH v3 1/7] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-15 20:56     ` [PATCH v3 2/7] strbuf: split out logic to humanise byte values Justin Tobler
2025-12-16  1:19       ` Junio C Hamano
2025-12-16  1:36         ` Justin Tobler
2025-12-15 20:56     ` [PATCH v3 3/7] builtin/repo: humanise count values in structure output Justin Tobler
2025-12-16  8:25       ` Patrick Steinhardt
2025-12-15 20:56     ` [PATCH v3 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
2025-12-15 20:56     ` [PATCH v3 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
2025-12-15 20:56     ` [PATCH v3 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
2025-12-15 20:56     ` [PATCH v3 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
2025-12-16  8:25       ` Patrick Steinhardt
2025-12-16 14:48         ` Justin Tobler
2025-12-16 17:38     ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-16 17:38       ` [PATCH v4 1/7] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-16 17:38       ` [PATCH v4 2/7] strbuf: split out logic to humanise byte values Justin Tobler
2025-12-16 18:59         ` Junio C Hamano
2025-12-16 19:39           ` Justin Tobler
2025-12-16 17:38       ` [PATCH v4 3/7] builtin/repo: humanise count values in structure output Justin Tobler
2025-12-16 17:38       ` [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
2025-12-17  7:03         ` Patrick Steinhardt
2025-12-17 16:10           ` Justin Tobler
2025-12-16 17:38       ` [PATCH v4 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
2025-12-16 17:38       ` [PATCH v4 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
2025-12-16 17:38       ` [PATCH v4 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
2025-12-17  7:03       ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
2025-12-17 17:49         ` Justin Tobler
2025-12-17 17:53       ` [PATCH v5 " Justin Tobler
2025-12-17 17:53         ` [PATCH v5 1/7] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-17 17:53         ` [PATCH v5 2/7] strbuf: split out logic to humanise byte values Justin Tobler
2025-12-17 17:54         ` [PATCH v5 3/7] builtin/repo: humanise count values in structure output Justin Tobler
2025-12-17 17:54         ` [PATCH v5 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
2025-12-17 17:54         ` [PATCH v5 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
2025-12-17 17:54         ` [PATCH v5 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
2025-12-17 17:54         ` [PATCH v5 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
2025-12-18  6:32         ` [PATCH v5 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
