* [PATCH v2 1/7] builtin/repo: group per-type object values into struct
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
@ 2025-12-12 22:36 ` Justin Tobler
2025-12-12 22:36 ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
` (6 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 17 deletions(-)
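A minimal sketch of the reuse this grouping is meant to enable. The `type_sizes` member is hypothetical and only illustrates a second per-type measurement; it is not part of this patch, and the helper takes a `const` pointer here purely for the sketch.

```c
#include <stddef.h>

/* One value per object type; the values may be counts or any other measurement. */
struct object_values {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

struct object_stats {
	struct object_values type_counts;
	/* Hypothetical second measurement, not introduced by this patch. */
	struct object_values type_sizes;
};

/* Sum one group of per-type values, whatever they measure. */
static inline size_t get_total_object_values(const struct object_values *values)
{
	return values->tags + values->commits + values->trees + values->blobs;
}
```

With this shape the same helper can total counts and, later, any other per-type measurement without a second copy of the summing logic.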
diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
size_t others;
};
-struct object_stats {
+struct object_values {
size_t tags;
size_t commits;
size_t trees;
size_t blobs;
};
+struct object_stats {
+ struct object_values type_counts;
+};
+
struct repo_structure {
struct ref_stats refs;
struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
return stats->branches + stats->remotes + stats->tags + stats->others;
}
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
{
- return stats->tags + stats->commits + stats->trees + stats->blobs;
+ return values->tags + values->commits + values->trees + values->blobs;
}
static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
- object_total = get_total_object_count(objects);
+ object_total = get_total_object_values(&objects->type_counts);
stats_table_addf(table, "");
stats_table_addf(table, "* %s", _("Reachable objects"));
stats_table_count_addf(table, object_total, " * %s", _("Count"));
- stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
- stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
- stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
- stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, objects->type_counts.commits,
+ " * %s", _("Commits"));
+ stats_table_count_addf(table, objects->type_counts.trees,
+ " * %s", _("Trees"));
+ stats_table_count_addf(table, objects->type_counts.blobs,
+ " * %s", _("Blobs"));
+ stats_table_count_addf(table, objects->type_counts.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
(uintmax_t)stats->refs.others, value_delim);
printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.commits, value_delim);
+ (uintmax_t)stats->objects.type_counts.commits, value_delim);
printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.trees, value_delim);
+ (uintmax_t)stats->objects.type_counts.trees, value_delim);
printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.blobs, value_delim);
+ (uintmax_t)stats->objects.type_counts.blobs, value_delim);
printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.tags, value_delim);
+ (uintmax_t)stats->objects.type_counts.tags, value_delim);
fflush(stdout);
}
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
switch (type) {
case OBJ_TAG:
- stats->tags += oids->nr;
+ stats->type_counts.tags += oids->nr;
break;
case OBJ_COMMIT:
- stats->commits += oids->nr;
+ stats->type_counts.commits += oids->nr;
break;
case OBJ_TREE:
- stats->trees += oids->nr;
+ stats->type_counts.trees += oids->nr;
break;
case OBJ_BLOB:
- stats->blobs += oids->nr;
+ stats->type_counts.blobs += oids->nr;
break;
default:
BUG("invalid object type");
}
- object_count = get_total_object_count(stats);
+ object_count = get_total_object_values(&stats->type_counts);
display_progress(data->progress, object_count);
return 0;
--
2.52.0.209.ge85ae279b0
* [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-12 22:36 ` [PATCH v2 1/7] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-12 22:36 ` Justin Tobler
2025-12-15 5:33 ` Patrick Steinhardt
` (2 more replies)
2025-12-12 22:36 ` [PATCH v2 3/7] builtin/repo: humanise count values in structure output Justin Tobler
` (5 subsequent siblings)
7 siblings, 3 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
use case, the downscaled values and unit prefixes must be handled
separately to ensure proper column alignment. Refactor strbuf_humanise()
to instead append the downscaled byte value to the buffer only and
return the appropriate unit prefix string.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
strbuf.c | 62 +++++++++++++++++++++++++-------------------------------
strbuf.h | 9 ++++++++
2 files changed, 37 insertions(+), 34 deletions(-)
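To illustrate why the value and the unit need to be separate for column alignment, here is a rough sketch. `struct row`, `max_width()` and `format_row()` are hypothetical names, and `strlen()` stands in for a display-width helper such as `utf8_strwidth()`; this is not the table code from the series.

```c
#include <stdio.h>
#include <string.h>

struct row {
	const char *name;
	const char *value; /* downscaled number, e.g. "1.50" */
	const char *unit;  /* unit prefix string, e.g. "KiB" */
};

/* Widest string in one column (0 = name, 1 = value, 2 = unit). */
static int max_width(const struct row *rows, size_t nr, int column)
{
	int width = 0;

	for (size_t i = 0; i < nr; i++) {
		const char *s = column == 0 ? rows[i].name :
				column == 1 ? rows[i].value : rows[i].unit;
		int len = (int)strlen(s);
		if (len > width)
			width = len;
	}
	return width;
}

/* Right-align the number and left-align the unit so both columns line up. */
static void format_row(char *out, size_t len, const struct row *r,
		       int name_w, int value_w, int unit_w)
{
	snprintf(out, len, "| %-*s | %*s %-*s |",
		 name_w, r->name, value_w, r->value, unit_w, r->unit);
}
```

If the number and unit were a single string such as "1.50 KiB", the numbers could not be right-aligned independently of the differing unit lengths.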
diff --git a/strbuf.c b/strbuf.c
index 6c3851a7f8..1fb47bf21b 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,55 +836,49 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
- int humanise_rate)
+char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
{
+ int humanise_rate = flags & STRBUF_HUMANISE_RATE;
+
if (bytes > 1 << 30) {
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte */
- _("%u.%2.2u GiB") :
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
- _("%u.%2.2u GiB/s"),
- (unsigned)(bytes >> 30),
+ strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
(unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+ /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
+ return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
} else if (bytes > 1 << 20) {
- unsigned x = bytes + 5243; /* for rounding */
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 mebibyte */
- _("%u.%2.2u MiB") :
- /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
- _("%u.%2.2u MiB/s"),
- x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+ unsigned x = bytes + 5243; /* for rounding */
+ strbuf_addf(buf, "%u.%2.2u", x >> 20,
+ ((x & ((1 << 20) - 1)) * 100) >> 20);
+ /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
+ return humanise_rate ? xstrfmt(_("MiB/s")) : xstrfmt(_("MiB"));
} else if (bytes > 1 << 10) {
- unsigned x = bytes + 5; /* for rounding */
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 kibibyte */
- _("%u.%2.2u KiB") :
- /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
- _("%u.%2.2u KiB/s"),
- x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+ unsigned x = bytes + 5; /* for rounding */
+ strbuf_addf(buf, "%u.%2.2u", x >> 10,
+ ((x & ((1 << 10) - 1)) * 100) >> 10);
+ /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
+ return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
} else {
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 byte */
- Q_("%u byte", "%u bytes", bytes) :
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("%u byte/s", "%u bytes/s", bytes),
- (unsigned)bytes);
+ strbuf_addf(buf, "%u", (unsigned)bytes);
+ return humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+ xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
+ /* TRANSLATORS: IEC 80000-13:2008 byte */
+ xstrfmt(Q_("byte", "bytes", bytes));
}
}
void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
{
- strbuf_humanise(buf, bytes, 0);
+ char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
+ strbuf_addf(buf, " %s", unit);
+ free(unit);
}
void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
{
- strbuf_humanise(buf, bytes, 1);
+ char *unit = strbuf_humanise_bytes_value(buf, bytes, STRBUF_HUMANISE_RATE);
+ strbuf_addf(buf, " %s", unit);
+ free(unit);
}
int printf_ln(const char *fmt, ...)
diff --git a/strbuf.h b/strbuf.h
index a580ac6084..a5e3ab0cb4 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,15 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
*/
void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
+#define STRBUF_HUMANISE_RATE 1 << 0
+
+/**
+ * Append the given byte size as a human-readable string that is downscaled by
+ * some factor. A string with the corresponding unit prefix is returned
+ * separately.
+ */
+char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
--
2.52.0.209.ge85ae279b0
* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-12 22:36 ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-15 5:33 ` Patrick Steinhardt
2025-12-15 16:26 ` Justin Tobler
2025-12-15 8:21 ` Junio C Hamano
2025-12-16 2:26 ` Jiang Xin
2 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15 5:33 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster
On Fri, Dec 12, 2025 at 04:36:39PM -0600, Justin Tobler wrote:
> diff --git a/strbuf.c b/strbuf.c
> index 6c3851a7f8..1fb47bf21b 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -836,55 +836,49 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
> strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
> }
>
> -static void strbuf_humanise(struct strbuf *buf, off_t bytes,
> - int humanise_rate)
> +char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
> {
> + int humanise_rate = flags & STRBUF_HUMANISE_RATE;
> +
> if (bytes > 1 << 30) {
> - strbuf_addf(buf,
> - humanise_rate == 0 ?
> - /* TRANSLATORS: IEC 80000-13:2008 gibibyte */
> - _("%u.%2.2u GiB") :
> - /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
> - _("%u.%2.2u GiB/s"),
> - (unsigned)(bytes >> 30),
> + strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
> (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
> + /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
> + return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
> } else if (bytes > 1 << 20) {
> - unsigned x = bytes + 5243; /* for rounding */
> - strbuf_addf(buf,
> - humanise_rate == 0 ?
> - /* TRANSLATORS: IEC 80000-13:2008 mebibyte */
> - _("%u.%2.2u MiB") :
> - /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
> - _("%u.%2.2u MiB/s"),
> - x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
> + unsigned x = bytes + 5243; /* for rounding */
> + strbuf_addf(buf, "%u.%2.2u", x >> 20,
> + ((x & ((1 << 20) - 1)) * 100) >> 20);
> + /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
> + return humanise_rate ? xstrfmt(_("MiB/s")) : xstrfmt(_("MiB"));
> } else if (bytes > 1 << 10) {
> - unsigned x = bytes + 5; /* for rounding */
> - strbuf_addf(buf,
> - humanise_rate == 0 ?
> - /* TRANSLATORS: IEC 80000-13:2008 kibibyte */
> - _("%u.%2.2u KiB") :
> - /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
> - _("%u.%2.2u KiB/s"),
> - x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
> + unsigned x = bytes + 5; /* for rounding */
> + strbuf_addf(buf, "%u.%2.2u", x >> 10,
> + ((x & ((1 << 10) - 1)) * 100) >> 10);
> + /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
> + return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
> } else {
> - strbuf_addf(buf,
> - humanise_rate == 0 ?
> - /* TRANSLATORS: IEC 80000-13:2008 byte */
> - Q_("%u byte", "%u bytes", bytes) :
> - /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> - Q_("%u byte/s", "%u bytes/s", bytes),
> - (unsigned)bytes);
> + strbuf_addf(buf, "%u", (unsigned)bytes);
> + return humanise_rate ?
> + /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> + xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
> + /* TRANSLATORS: IEC 80000-13:2008 byte */
> + xstrfmt(Q_("byte", "bytes", bytes));
> }
> }
All branches use `xstrfmt()` with strings that are essentially
constants, except for the translation part. So isn't it possible to drop
all these allocations and have the function return a `const char *`
instead?
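A sketch of what that could look like, under some assumptions: `byte_unit()` is a hypothetical name, `_()` is a stand-in macro rather than the real gettext wrapper, and the plural handling done by `Q_()` in the patch is left out. As far as I know, gettext() hands back a string the caller must not free, so no copy should be needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the gettext wrapper: returns its argument, allocates nothing. */
#define _(msgid) (msgid)

/*
 * Return the unit for a byte value as a constant string. The caller does
 * not own the result and must not free it.
 */
static const char *byte_unit(uint64_t bytes, bool rate)
{
	if (bytes > (UINT64_C(1) << 30))
		return rate ? _("GiB/s") : _("GiB");
	if (bytes > (UINT64_C(1) << 20))
		return rate ? _("MiB/s") : _("MiB");
	if (bytes > (UINT64_C(1) << 10))
		return rate ? _("KiB/s") : _("KiB");
	return rate ? _("bytes/s") : _("bytes");
}
```

The callers would then lose their `free(unit)` calls, since nothing is allocated on any branch.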
> diff --git a/strbuf.h b/strbuf.h
> index a580ac6084..a5e3ab0cb4 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -367,6 +367,15 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
> */
> void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
>
> +#define STRBUF_HUMANISE_RATE 1 << 0
I think nowadays it's a bit more common to use an enum, and I think we
should also document what the flag does:
enum strbuf_humanise_flags {
/*
* Frobnicate the string.
*/
STRBUF_HUMANISE_RATE = (1 << 0),
};
Patrick
* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-15 5:33 ` Patrick Steinhardt
@ 2025-12-15 16:26 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 16:26 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster
On 25/12/15 06:33AM, Patrick Steinhardt wrote:
> On Fri, Dec 12, 2025 at 04:36:39PM -0600, Justin Tobler wrote:
>
> All branches use `xstrfmt()` with strings that are essentially
> constants, except for the translation part. So isn't it possible to drop
> all these allocations and have the function return a `const char *`
> instead?
Ya, that would indeed be better. Will fix.
> > diff --git a/strbuf.h b/strbuf.h
> > index a580ac6084..a5e3ab0cb4 100644
> > --- a/strbuf.h
> > +++ b/strbuf.h
> > @@ -367,6 +367,15 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
> > */
> > void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
> >
> > +#define STRBUF_HUMANISE_RATE 1 << 0
>
> I think nowadays it's a bit more common to use an enum, and I think we
> should also document what the flag does:
>
> enum strbuf_humanise_flags {
> /*
> * Frobnicate the string.
> */
> STRBUF_HUMANISE_RATE = (1 << 0),
> };
Will do.
-Justin
* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-12 22:36 ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
2025-12-15 5:33 ` Patrick Steinhardt
@ 2025-12-15 8:21 ` Junio C Hamano
2025-12-15 16:47 ` Justin Tobler
2025-12-16 2:26 ` Jiang Xin
2 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-15 8:21 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps
Justin Tobler <jltobler@gmail.com> writes:
> +char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
> {
> + int humanise_rate = flags & STRBUF_HUMANISE_RATE;
> +
> if (bytes > 1 << 30) {
> + strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
> (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
> + /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
> + return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
> ...
> }
> void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
> {
> - strbuf_humanise(buf, bytes, 0);
> + char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
> + strbuf_addf(buf, " %s", unit);
> + free(unit);
> }
The old "strbuf-humanise" used to treat the whole "<number> <unit>",
e.g., _("%u.%2.2u GiB"), as a single thing to be translated.
However, the new code requires that in all languages:
- Decimal point in number MUST be "." (don't some Europeans prefer
comma instead?);
- Number MUST come before the unit;
- Between the number and the unit, there has to be one and only one
SP.
All of which could be a severe regression from localization's point
of view.
The first point among the above three can relatively easily be
remedied. It is a bit more involved, but it is possible to fix the
other two, too.
* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-15 8:21 ` Junio C Hamano
@ 2025-12-15 16:47 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 16:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps
On 25/12/15 05:21PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> > +char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
> > {
> > + int humanise_rate = flags & STRBUF_HUMANISE_RATE;
> > +
> > if (bytes > 1 << 30) {
> > + strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
> > (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
> > + /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
> > + return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
> > ...
> > }
> > void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
> > {
> > - strbuf_humanise(buf, bytes, 0);
> > + char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
> > + strbuf_addf(buf, " %s", unit);
> > + free(unit);
> > }
>
> The old "strbuf-humanise" used to treat the whole "<number> <unit>",
> e.g., _("%u.%2.2u GiB"), as a single thing to be translated.
> However, the new code requires that in all languages:
>
> - Decimal point in number MUST be "." (don't some Europeans prefer
> comma instead?);
>
> - Number MUST come before the unit;
>
> - Between the number and the unit, there has to be one and only one
> SP.
>
> All of which could be a severe regression from localization's point
> of view.
>
> The first point among the above three can relatively easily be
> remedied. It is a bit more involved, but it is possible to fix the
> other two, too.
The first point could be addressed by just making "%u.%2.2u"
translatable. To address the others, we could have
strbuf_humanise_bytes_value() output two separate strings (value and
unit) instead of appending the value and returning the unit. Maybe
something like:
void humanise_bytes(off_t bytes, char **value, const char **unit)
We could then have another translatable string to configure the format:
void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
{
char *value;
const char *unit;
humanise_bytes(bytes, &value, &unit);
strbuf_addf(buf, _("%s %s"), value, unit);
free(value);
}
This is certainly a bit more involved setup for translators though. But
maybe it's ok? I'll move forward with something like above in the next
version for now.
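To make the translator side of this concrete, here is a small sketch. `join_value_unit()` is a hypothetical helper, and the `fmt` argument plays the role of a translated msgstr for "%s %s"; no real gettext catalogue is involved. It assumes the POSIX positional conversions (`%1$s`, `%2$s`) are available in the platform's printf family.

```c
#include <stdio.h>

/*
 * Combine a value and a unit using a format that a translator controls.
 * The default "%s %s" keeps "<number> <unit>"; a translation may reorder
 * the two or change the separator.
 */
static void join_value_unit(char *out, size_t len, const char *fmt,
			    const char *value, const char *unit)
{
	snprintf(out, len, fmt, value, unit);
}
```

For example, passing "%2$s %1$s" as `fmt` puts the unit first, and "%s%s" drops the space, so the ordering and spacing are no longer fixed by the C code.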
Thanks
-Justin
* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-12 22:36 ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
2025-12-15 5:33 ` Patrick Steinhardt
2025-12-15 8:21 ` Junio C Hamano
@ 2025-12-16 2:26 ` Jiang Xin
2025-12-16 4:37 ` Junio C Hamano
2 siblings, 1 reply; 80+ messages in thread
From: Jiang Xin @ 2025-12-16 2:26 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, gitster, Jeff Hostetler
On Sat, Dec 13, 2025 at 6:37 AM Justin Tobler <jltobler@gmail.com> wrote:
> + return humanise_rate ?
> + /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> + xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
> + /* TRANSLATORS: IEC 80000-13:2008 byte */
> + xstrfmt(Q_("byte", "bytes", bytes));
We have already defined "byte" as an l10n string without plural forms in the
file "t/helper/test-simple-ipc.c" via commit 36a7eb6876 (t0052: add simple-ipc
tests and t/helper/test-simple-ipc tool, 2021-03-22 10:29:48 +0000).
OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
The newly introduced usage of "byte" is now marked as having a plural form
(via Q_("byte", "bytes", bytes)), which causes a conflict. This results in make
pot failing with the following error:
msgcat: msgid 'byte' is used without plural and with plural.
This happens because gettext requires that a given msgid be treated
consistently—either exclusively as a singular string or as part of a plural
construct—but not both.
To resolve this conflict, we can unmark the singular "byte" in
t/helper/test-simple-ipc.c, allowing it to reuse the translation from the
plural-form definition of "byte".
--
Jiang Xin
* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-16 2:26 ` Jiang Xin
@ 2025-12-16 4:37 ` Junio C Hamano
2025-12-16 6:18 ` Jiang Xin
0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-16 4:37 UTC (permalink / raw)
To: Jiang Xin; +Cc: Justin Tobler, git, ps, Jeff Hostetler
Jiang Xin <worldhello.net@gmail.com> writes:
> On Sat, Dec 13, 2025 at 6:37 AM Justin Tobler <jltobler@gmail.com> wrote:
>> + return humanise_rate ?
>> + /* TRANSLATORS: IEC 80000-13:2008 byte/second */
>> + xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
>> + /* TRANSLATORS: IEC 80000-13:2008 byte */
>> + xstrfmt(Q_("byte", "bytes", bytes));
>
> We have already defined "byte" as an l10n string without plural forms in the
> file "t/helper/test-simple-ipc.c" via commit 36a7eb6876 (t0052: add simple-ipc
> tests and t/helper/test-simple-ipc tool, 2021-03-22 10:29:48 +0000).
>
> OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
>
> The newly introduced usage of "byte" is now marked as having a plural form
> (via Q_("byte", "bytes", bytes)), which causes a conflict. This results in make
> pot failing with the following error:
>
> msgcat: msgid 'byte' is used without plural and with plural.
>
> This happens because gettext requires that a given msgid be treated
> consistently—either exclusively as a singular string or as part of a plural
> construct—but not both.
>
> To resolve this conflict, we can unmark the singular "byte" in
> t/helper/test-simple-ipc.c, allowing it to reuse the translation from the
> plural-form definition of "byte".
I learned a new thing today and am happy :).
But how does one "unmark" the singular "byte" there, exactly?
Would something like this ...
OPT_STRING(0, "byte", &bytevalue, Q_("byte", "bytes", 1), N_("ballast character")),
... be a good idea, to "mark" it as a countable noun that has a plural
form?
Or did you mean that we can simply drop N_() around it, i.e.,
N_("byte") -> "byte", to discard the i18n, because it merely is a
test helper?
Punting is fine in this case, but in case a similar situation arises
in real code, it would be better to establish a pattern we can
follow.
Thanks.
* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-16 4:37 ` Junio C Hamano
@ 2025-12-16 6:18 ` Jiang Xin
2025-12-16 14:41 ` Justin Tobler
0 siblings, 1 reply; 80+ messages in thread
From: Jiang Xin @ 2025-12-16 6:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Justin Tobler, git, ps, Jeff Hostetler
On Tue, Dec 16, 2025 at 12:37 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jiang Xin <worldhello.net@gmail.com> writes:
>
> > On Sat, Dec 13, 2025 at 6:37 AM Justin Tobler <jltobler@gmail.com> wrote:
> >> + return humanise_rate ?
> >> + /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> >> + xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
> >> + /* TRANSLATORS: IEC 80000-13:2008 byte */
> >> + xstrfmt(Q_("byte", "bytes", bytes));
> >
> > We have already defined "byte" as an l10n string without plural forms in the
> > file "t/helper/test-simple-ipc.c" via commit 36a7eb6876 (t0052: add simple-ipc
> > tests and t/helper/test-simple-ipc tool, 2021-03-22 10:29:48 +0000).
> >
> > OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
> >
> > The newly introduced usage of "byte" is now marked as having a plural form
> > (via Q_("byte", "bytes", bytes)), which causes a conflict. This results in make
> > pot failing with the following error:
> >
> > msgcat: msgid 'byte' is used without plural and with plural.
> >
> > This happens because gettext requires that a given msgid be treated
> > consistently—either exclusively as a singular string or as part of a plural
> > construct—but not both.
> >
> > To resolve this conflict, we can unmark the singular "byte" in
> > t/helper/test-simple-ipc.c, allowing it to reuse the translation from the
> > plural-form definition of "byte".
>
> I learned a new thing today and am happy :).
>
> But how does one "unmark" the singular "byte" there, exactly?
>
> Would something like this ...
>
> OPT_STRING(0, "byte", &bytevalue, Q_("byte", "bytes", 1), N_("ballast character")),
>
> ... be a good idea, to "mark" it as a countable noun that has a plural
> form?
>
> Or did you mean that we can simply drop N_() around it, i.e.,
> N_("byte") -> "byte", to discard the i18n, because it merely is a
> test helper?
I prefer dropping N_() for "byte" in "t/helper/test-simple-ipc.c", and
the i18n for the test helper will continue to work as before if we also
mark the plural-form of "byte" in this patch series. (i.e., drop the N_()
for "byte" in the test helper in this patch.)
This is because N_() is a macro that does not invoke any gettext
function, only returns msgid as in gettext.h:
#define N_(msgid) msgid
And the actual translation for the msgid (the argh field of an option)
occurs later by calling:
opts->argh ? _(opts->argh) : _("...")
in "parse-options.c".
However, replacing N_() with Q_() would cause the string to be
processed by gettext twice: once at runtime via Q_(), and again
when _(opts->argh) is evaluated.
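The difference can be sketched with a toy catalogue. Everything here is a stand-in: `toy_gettext()` maps "byte" to "octet" and counts lookups, `struct option_sketch` and `show_argh()` only mimic the `argh` handling quoted above, and none of it is the real gettext or parse-options code.

```c
#include <string.h>

static int lookups; /* number of catalogue lookups performed so far */

/* Toy catalogue standing in for gettext(): translates "byte" to "octet". */
static const char *toy_gettext(const char *msgid)
{
	lookups++;
	return strcmp(msgid, "byte") ? msgid : "octet";
}

#define _(msgid) toy_gettext(msgid)
#define N_(msgid) msgid /* marks the string for extraction only */

struct option_sketch {
	const char *argh;
};

/* Mirrors the deferred lookup: the msgid is translated when it is shown. */
static const char *show_argh(const struct option_sketch *opt)
{
	return opt->argh ? _(opt->argh) : _("...");
}
```

An option initialised with `N_("byte")` or plain `"byte"` costs one lookup when shown. One initialised with an eager `_("byte")` is looked up at initialisation and again when shown, the second time with the already translated text as the msgid.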
* Re: [PATCH v2 2/7] strbuf: split out logic to humanise byte values
2025-12-16 6:18 ` Jiang Xin
@ 2025-12-16 14:41 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 14:41 UTC (permalink / raw)
To: Jiang Xin; +Cc: Junio C Hamano, git, ps, Jeff Hostetler
On 25/12/16 02:18PM, Jiang Xin wrote:
> On Tue, Dec 16, 2025 at 12:37 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Jiang Xin <worldhello.net@gmail.com> writes:
> >
> > > On Sat, Dec 13, 2025 at 6:37 AM Justin Tobler <jltobler@gmail.com> wrote:
> > >> + return humanise_rate ?
> > >> + /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> > >> + xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
> > >> + /* TRANSLATORS: IEC 80000-13:2008 byte */
> > >> + xstrfmt(Q_("byte", "bytes", bytes));
> > >
> > > We have already defined "byte" as an l10n string without plural forms in the
> > > file "t/helper/test-simple-ipc.c" via commit 36a7eb6876 (t0052: add simple-ipc
> > > tests and t/helper/test-simple-ipc tool, 2021-03-22 10:29:48 +0000).
> > >
> > > OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
> > >
> > > The newly introduced usage of "byte" is now marked as having a plural form
> > > (via Q_("byte", "bytes", bytes)), which causes a conflict. This results in make
> > > pot failing with the following error:
> > >
> > > msgcat: msgid 'byte' is used without plural and with plural.
> > >
> > > This happens because gettext requires that a given msgid be treated
> > > consistently—either exclusively as a singular string or as part of a plural
> > > construct—but not both.
> > >
> > > To resolve this conflict, we can unmark the singular "byte" in
> > > t/helper/test-simple-ipc.c, allowing it to reuse the translation from the
> > > plural-form definition of "byte".
> >
> > I learned a new thing today and am happy :).
> >
> > But how does one "unmark" the singular "byte" there, exactly?
> >
> > Would something like this ...
> >
> > OPT_STRING(0, "byte", &bytevalue, Q_("byte", "bytes", 1), N_("ballast character")),
> >
> > ... be a good idea, to "mark" it as a countable noun that has a plural
> > form?
> >
> > Or did you mean that we can simply drop N_() around it, i.e.,
> > N_("byte") -> "byte", to discard the i18n, because it merely is a
> > test helper?
>
> I prefer dropping N_() for "byte" in "t/helper/test-simple-ipc.c", and
> the i18n for the test helper will continue to work as before if we also
> mark the plural-form of "byte" in this patch series. (i.e., drop the N_()
> for "byte" in the test helper in this patch.)
>
> This is because N_() is a macro that does not invoke any gettext
> function, only returns msgid as in gettext.h:
>
> #define N_(msgid) msgid
>
> And the actual translation for the msgid (the argh field of an option)
> occurs later by calling:
>
> opts->argh ? _(opts->argh) : _("...")
>
> in "parse-options.c".
>
> However, replacing N_() with Q_() would cause the string to be
> processed by gettext twice: once at runtime via Q_(), and again
> when _(opts->argh) is evaluated.
Thanks both! This thread has been very informative. In the next version
I'll go ahead and drop the N_() here for this patch. :)
-Justin
* [PATCH v2 3/7] builtin/repo: humanise count values in structure output
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-12 22:36 ` [PATCH v2 1/7] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-12 22:36 ` [PATCH v2 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-12 22:36 ` Justin Tobler
2025-12-15 5:33 ` Patrick Steinhardt
2025-12-12 22:36 ` [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
` (4 subsequent siblings)
7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.
For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 45 +++++++++++++++++++++-------
strbuf.c | 23 +++++++++++++++
strbuf.h | 7 +++++
t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
4 files changed, 95 insertions(+), 42 deletions(-)
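For readers skimming the thread, a rough sketch of what downscaling a count could look like. This is not the `strbuf_humanise_count_value()` from the patch: `humanise_count_sketch()` is a hypothetical name, the "k"/"M"/"G" prefixes and the factor of 1000 are assumptions, and the fraction is truncated rather than rounded.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write the downscaled count into `value` and return the unit prefix.
 * Counts below 1000 are written unchanged with an empty prefix.
 */
static const char *humanise_count_sketch(uint64_t count, char *value, size_t len)
{
	static const char *const units[] = { "k", "M", "G" };
	uint64_t scale = UINT64_C(1000000000);

	for (int i = 2; i >= 0; i--, scale /= 1000) {
		if (count >= scale) {
			snprintf(value, len, "%llu.%02llu",
				 (unsigned long long)(count / scale),
				 (unsigned long long)(count % scale * 100 / scale));
			return units[i];
		}
	}
	snprintf(value, len, "%llu", (unsigned long long)count);
	return "";
}
```

Returning the prefix separately from the number matches the split used for byte values, so the table can align both columns the same way.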
diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..d3dfe416d0 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
int name_col_width;
int value_col_width;
+ int unit_col_width;
};
/*
@@ -230,6 +231,7 @@ struct stats_table {
*/
struct stats_table_entry {
char *value;
+ char *unit;
};
static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
if (name_width > table->name_col_width)
table->name_col_width = name_width;
- if (entry) {
+ if (!entry)
+ return;
+ if (entry->value) {
int value_width = utf8_strwidth(entry->value);
if (value_width > table->value_col_width)
table->value_col_width = value_width;
}
+ if (entry->unit) {
+ int unit_width = utf8_strwidth(entry->unit);
+ if (unit_width > table->unit_col_width)
+ table->unit_col_width = unit_width;
+ }
}
static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -270,10 +279,13 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
const char *format, ...)
{
struct stats_table_entry *entry;
+ struct strbuf buf = STRBUF_INIT;
va_list ap;
CALLOC_ARRAY(entry, 1);
- entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+
+ entry->unit = strbuf_humanise_count_value(&buf, value);
+ entry->value = strbuf_detach(&buf, NULL);
va_start(ap, format);
stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +336,24 @@ static void stats_table_print_structure(const struct stats_table *table)
{
const char *name_col_title = _("Repository structure");
const char *value_col_title = _("Value");
- int name_col_width = utf8_strwidth(name_col_title);
- int value_col_width = utf8_strwidth(value_col_title);
+ int title_name_width = utf8_strwidth(name_col_title);
+ int title_value_width = utf8_strwidth(value_col_title);
+ int name_col_width = table->name_col_width;
+ int value_col_width = table->value_col_width;
+ int unit_col_width = table->unit_col_width;
struct string_list_item *item;
struct strbuf buf = STRBUF_INIT;
- if (table->name_col_width > name_col_width)
- name_col_width = table->name_col_width;
- if (table->value_col_width > value_col_width)
- value_col_width = table->value_col_width;
+ if (title_name_width > name_col_width)
+ name_col_width = title_name_width;
+ if (title_value_width > value_col_width + unit_col_width + 1)
+ value_col_width = title_value_width - unit_col_width;
strbuf_addstr(&buf, "| ");
strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
strbuf_addstr(&buf, " | ");
- strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+ strbuf_utf8_align(&buf, ALIGN_LEFT,
+ value_col_width + unit_col_width + 1, value_col_title);
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
@@ -345,17 +361,20 @@ static void stats_table_print_structure(const struct stats_table *table)
for (int i = 0; i < name_col_width; i++)
putchar('-');
printf(" | ");
- for (int i = 0; i < value_col_width; i++)
+ for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
putchar('-');
printf(" |\n");
for_each_string_list_item(item, &table->rows) {
struct stats_table_entry *entry = item->util;
const char *value = "";
+ const char *unit = "";
if (entry) {
struct stats_table_entry *entry = item->util;
value = entry->value;
+ if (entry->unit)
+ unit = entry->unit;
}
strbuf_reset(&buf);
@@ -363,6 +382,8 @@ static void stats_table_print_structure(const struct stats_table *table)
strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
strbuf_addstr(&buf, " | ");
strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+ strbuf_addch(&buf, ' ');
+ strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
}
@@ -377,8 +398,10 @@ static void stats_table_clear(struct stats_table *table)
for_each_string_list_item(item, &table->rows) {
entry = item->util;
- if (entry)
+ if (entry) {
free(entry->value);
+ free(entry->unit);
+ }
}
string_list_clear(&table->rows, 1);
diff --git a/strbuf.c b/strbuf.c
index 1fb47bf21b..cebb1593ab 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,6 +836,29 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
+char *strbuf_humanise_count_value(struct strbuf *buf, size_t value)
+{
+ if (value >= 1000000000) {
+ uintmax_t x = (uintmax_t)value + 5000000; /* for rounding */
+ strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
+ x / 1000000000, x % 1000000000 / 10000000);
+ return xstrfmt(_("G"));
+ } else if (value >= 1000000) {
+ uintmax_t x = (uintmax_t)value + 5000; /* for rounding */
+ strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
+ x / 1000000, x % 1000000 / 10000);
+ return xstrfmt(_("M"));
+ } else if (value >= 1000) {
+ uintmax_t x = (uintmax_t)value + 5; /* for rounding */
+ strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
+ x / 1000, x % 1000 / 10);
+ return xstrfmt(_("k"));
+ } else {
+ strbuf_addf(buf, "%" PRIuMAX, (uintmax_t)value);
+ return NULL;
+ }
+}
+
char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
{
int humanise_rate = flags & STRBUF_HUMANISE_RATE;
diff --git a/strbuf.h b/strbuf.h
index a5e3ab0cb4..7532eadd02 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -376,6 +376,13 @@ void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
*/
char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
+/**
+ * Append the given count value as a human-readable string that is downscaled by
+ * some factor. A string with the corresponding unit prefix is returned
+ * separately.
+ */
+char *strbuf_humanise_count_value(struct strbuf *buf, size_t value);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
(
cd repo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ----- |
- | * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
- | | |
- | * Reachable objects | |
- | * Count | 0 |
- | * Commits | 0 |
- | * Trees | 0 |
- | * Blobs | 0 |
- | * Tags | 0 |
+ | Repository structure | Value |
+ | -------------------- | ------ |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
git init repo &&
(
cd repo &&
- test_commit_bulk 42 &&
+ test_commit_bulk 1005 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ----- |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
- | | |
- | * Reachable objects | |
- | * Count | 130 |
- | * Commits | 43 |
- | * Trees | 43 |
- | * Blobs | 43 |
- | * Tags | 1 |
+ | Repository structure | Value |
+ | -------------------- | ------ |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 3.02 k |
+ | * Commits | 1.01 k |
+ | * Trees | 1.01 k |
+ | * Blobs | 1.01 k |
+ | * Tags | 1 |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
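
The column sizing in stats_table_print_structure() above can be sketched in
isolation. This is only an illustration: `combined_value_width` is a
hypothetical helper, not a function in builtin/repo.c, and the widths passed
to it are assumed to be display widths as returned by utf8_strwidth().

```c
/*
 * Hypothetical helper (not part of the patch): width of the combined
 * "value + space + unit" column, following the sizing rule used when
 * printing the table header and separator row.
 */
static int combined_value_width(int title_width, int value_width,
				int unit_width)
{
	/* The title widens the column only if it exceeds value + ' ' + unit. */
	if (title_width > value_width + unit_width + 1)
		value_width = title_width - unit_width;

	/* One separator column is always reserved between value and unit. */
	return value_width + unit_width + 1;
}
```

With the title "Value" (width 5), an empty repository (widest value "0", no
unit) and the populated test repository (widest value "1.01", unit "k") both
come out at a width of 6, which lines up with the six dashes in the updated
test expectations.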
^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [PATCH v2 3/7] builtin/repo: humanise count values in structure output
2025-12-12 22:36 ` [PATCH v2 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-15 5:33 ` Patrick Steinhardt
0 siblings, 0 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15 5:33 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster
On Fri, Dec 12, 2025 at 04:36:40PM -0600, Justin Tobler wrote:
> diff --git a/strbuf.c b/strbuf.c
> index 1fb47bf21b..cebb1593ab 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -836,6 +836,29 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
> strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
> }
>
> +char *strbuf_humanise_count_value(struct strbuf *buf, size_t value)
> +{
> + if (value >= 1000000000) {
> + uintmax_t x = (uintmax_t)value + 5000000; /* for rounding */
> + strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
> + x / 1000000000, x % 1000000000 / 10000000);
> + return xstrfmt(_("G"));
> + } else if (value >= 1000000) {
> + uintmax_t x = (uintmax_t)value + 5000; /* for rounding */
> + strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
> + x / 1000000, x % 1000000 / 10000);
> + return xstrfmt(_("M"));
> + } else if (value >= 1000) {
> + uintmax_t x = (uintmax_t)value + 5; /* for rounding */
> + strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
> + x / 1000, x % 1000 / 10);
> + return xstrfmt(_("k"));
> + } else {
> + strbuf_addf(buf, "%" PRIuMAX, (uintmax_t)value);
> + return NULL;
> + }
> +}
Same comment here as in the previous patch, can't we return `const char *`
here in case we drop all allocations?
Patrick
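
One possible shape of the suggestion above, as a standalone sketch rather than
the actual follow-up patch: the unit is returned as a pointer to a string
literal, so the caller has nothing to free. `humanise_count` and its
caller-provided output buffer are assumptions made for this example; the real
function appends to a strbuf and passes the unit through _() for translation.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical variant of strbuf_humanise_count_value(): writes the scaled
 * number into `out` and returns a static unit string, or NULL when the value
 * is below 1000 and is printed unscaled.
 */
static const char *humanise_count(char *out, size_t outlen, size_t value)
{
	uintmax_t x;

	if (value >= 1000000000) {
		x = (uintmax_t)value + 5000000; /* round to two decimals */
		snprintf(out, outlen, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000000000, x % 1000000000 / 10000000);
		return "G";
	} else if (value >= 1000000) {
		x = (uintmax_t)value + 5000; /* round to two decimals */
		snprintf(out, outlen, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000000, x % 1000000 / 10000);
		return "M";
	} else if (value >= 1000) {
		x = (uintmax_t)value + 5; /* round to two decimals */
		snprintf(out, outlen, "%" PRIuMAX ".%02" PRIuMAX,
			 x / 1000, x % 1000 / 10);
		return "k";
	}

	snprintf(out, outlen, "%" PRIuMAX, (uintmax_t)value);
	return NULL;
}
```

For example, a value of 1005 yields "1.01" with unit "k", while 999 yields
"999" with a NULL unit. With this shape, stats_table_clear() would no longer
need to free the unit.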
^ permalink raw reply [flat|nested] 80+ messages in thread
* [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue structure output
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (2 preceding siblings ...)
2025-12-12 22:36 ` [PATCH v2 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-12 22:36 ` Justin Tobler
2025-12-15 5:33 ` Patrick Steinhardt
2025-12-12 22:36 ` [PATCH v2 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
` (3 subsequent siblings)
7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.
For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 32 ++++++++++++++++++++++++++++++++
t/t1901-repo-structure.sh | 6 +++++-
3 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
+
* Reference counts categorized by type
* Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
+
The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index d3dfe416d0..3a2d15cec4 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
#include "builtin.h"
#include "environment.h"
+#include "hex.h"
+#include "odb.h"
#include "parse-options.h"
#include "path-walk.h"
#include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
struct object_stats {
struct object_values type_counts;
+ struct object_values inflated_sizes;
};
struct repo_structure {
@@ -428,6 +431,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
(uintmax_t)stats->objects.type_counts.tags, value_delim);
+ printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+ printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+ printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+ printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
fflush(stdout);
}
@@ -491,6 +503,7 @@ static void structure_count_references(struct ref_stats *stats,
}
struct count_objects_data {
+ struct object_database *odb;
struct object_stats *stats;
struct progress *progress;
};
@@ -500,20 +513,38 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
{
struct count_objects_data *data = cb_data;
struct object_stats *stats = data->stats;
+ size_t inflated_total = 0;
size_t object_count;
+ for (size_t i = 0; i < oids->nr; i++) {
+ struct object_info oi = OBJECT_INFO_INIT;
+ unsigned long inflated;
+
+ oi.sizep = &inflated;
+
+ if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+ OBJECT_INFO_FOR_PREFETCH) < 0)
+ continue;
+
+ inflated_total += inflated;
+ }
+
switch (type) {
case OBJ_TAG:
stats->type_counts.tags += oids->nr;
+ stats->inflated_sizes.tags += inflated_total;
break;
case OBJ_COMMIT:
stats->type_counts.commits += oids->nr;
+ stats->inflated_sizes.commits += inflated_total;
break;
case OBJ_TREE:
stats->type_counts.trees += oids->nr;
+ stats->inflated_sizes.trees += inflated_total;
break;
case OBJ_BLOB:
stats->type_counts.blobs += oids->nr;
+ stats->inflated_sizes.blobs += inflated_total;
break;
default:
BUG("invalid object type");
@@ -531,6 +562,7 @@ static void structure_count_objects(struct object_stats *stats,
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
struct count_objects_data data = {
+ .odb = repo->objects,
.stats = stats,
};
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..33237822fd 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
)
'
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
objects.trees.count=42
objects.blobs.count=42
objects.tags.count=1
+ objects.commits.inflated_size=9225
+ objects.trees.inflated_size=28554
+ objects.blobs.inflated_size=453
+ objects.tags.inflated_size=132
EOF
git repo structure --format=keyvalue >out 2>err &&
--
2.52.0.209.ge85ae279b0
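
The per-type bookkeeping in count_objects() above can be illustrated without
the object database. In this sketch the struct layout follows the patch, but
`enum ex_object_type`, `slot_for`, `account_batch` and `total_values` are
names invented for the example, and the sizes are passed in directly instead
of being read through odb_read_object_info_extended().

```c
#include <stddef.h>

/* Stand-in for Git's object type enum; hypothetical, for this sketch only. */
enum ex_object_type { EX_OBJ_COMMIT, EX_OBJ_TREE, EX_OBJ_BLOB, EX_OBJ_TAG };

struct object_values {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

struct object_stats {
	struct object_values type_counts;
	struct object_values inflated_sizes;
};

/* Pick the field of `values` that corresponds to `type`. */
static size_t *slot_for(struct object_values *values, enum ex_object_type type)
{
	switch (type) {
	case EX_OBJ_TAG:
		return &values->tags;
	case EX_OBJ_COMMIT:
		return &values->commits;
	case EX_OBJ_TREE:
		return &values->trees;
	default:
		return &values->blobs;
	}
}

/* Account for one batch of same-typed objects with the given sizes. */
static void account_batch(struct object_stats *stats, enum ex_object_type type,
			  const size_t *sizes, size_t nr)
{
	size_t inflated_total = 0;

	for (size_t i = 0; i < nr; i++)
		inflated_total += sizes[i];

	*slot_for(&stats->type_counts, type) += nr;
	*slot_for(&stats->inflated_sizes, type) += inflated_total;
}

/* Same summation as get_total_object_values() in the series. */
static size_t total_values(const struct object_values *values)
{
	return values->tags + values->commits + values->trees + values->blobs;
}
```

Because counts and sizes share `struct object_values`, the same total helper
serves both measurements.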
^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue structure output
2025-12-12 22:36 ` [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-15 5:33 ` Patrick Steinhardt
2025-12-15 16:48 ` Justin Tobler
0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15 5:33 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster
On Fri, Dec 12, 2025 at 04:36:41PM -0600, Justin Tobler wrote:
> diff --git a/builtin/repo.c b/builtin/repo.c
> index d3dfe416d0..3a2d15cec4 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -500,20 +513,38 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
> {
> struct count_objects_data *data = cb_data;
> struct object_stats *stats = data->stats;
> + size_t inflated_total = 0;
> size_t object_count;
>
> + for (size_t i = 0; i < oids->nr; i++) {
> + struct object_info oi = OBJECT_INFO_INIT;
> + unsigned long inflated;
> +
> + oi.sizep = &inflated;
> +
> + if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
> + OBJECT_INFO_FOR_PREFETCH) < 0)
Using `OBJECT_INFO_FOR_PREFETCH` feels a bit weird to me, as we're not
in a context where we want to do a prefetch. And if we ever were to
extend that flag to have more semantics that are relevant to prefetches,
only, then this code here might become broken.
Using `SKIP_FETCH_OBJECT | INFO_QUICK` does make sense though, so I'd
suggest to expand the flag here.
Patrick
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue structure output
2025-12-15 5:33 ` Patrick Steinhardt
@ 2025-12-15 16:48 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 16:48 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster
On 25/12/15 06:33AM, Patrick Steinhardt wrote:
> On Fri, Dec 12, 2025 at 04:36:41PM -0600, Justin Tobler wrote:
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index d3dfe416d0..3a2d15cec4 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -500,20 +513,38 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
> > {
> > struct count_objects_data *data = cb_data;
> > struct object_stats *stats = data->stats;
> > + size_t inflated_total = 0;
> > size_t object_count;
> >
> > + for (size_t i = 0; i < oids->nr; i++) {
> > + struct object_info oi = OBJECT_INFO_INIT;
> > + unsigned long inflated;
> > +
> > + oi.sizep = &inflated;
> > +
> > + if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
> > + OBJECT_INFO_FOR_PREFETCH) < 0)
>
> Using `OBJECT_INFO_FOR_PREFETCH` feels a bit weird to me, as we're not
> in a context where we want to do a prefetch. And if we ever were to
> extend that flag to have more semantics that are relevant to prefetches,
> only, then this code here might become broken.
>
> Using `SKIP_FETCH_OBJECT | INFO_QUICK` does make sense though, so I'd
> suggest to expand the flag here.
Good points. I'll update in the next version.
-Justin
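
The point about expanding the flag can be shown with stand-in definitions.
The EX_-prefixed names and their bit values below are placeholders chosen for
this sketch and are not Git's real definitions; the only assumption carried
over from the review is that the prefetch flag is currently a shorthand for
the other two flags combined.

```c
/* Placeholder bit values, chosen only for this sketch. */
#define EX_INFO_QUICK             (1u << 0)
#define EX_INFO_SKIP_FETCH_OBJECT (1u << 1)

/* Today the alias is just the combination of the two flags. */
#define EX_INFO_FOR_PREFETCH (EX_INFO_SKIP_FETCH_OBJECT | EX_INFO_QUICK)

/* Caller that relies on the alias and inherits any future change to it. */
static unsigned flags_via_alias(void)
{
	return EX_INFO_FOR_PREFETCH;
}

/* Caller that spells out exactly the behaviour it depends on. */
static unsigned flags_expanded(void)
{
	return EX_INFO_SKIP_FETCH_OBJECT | EX_INFO_QUICK;
}
```

Both callers produce the same bits today. If the alias later gained a
prefetch-only bit, only flags_via_alias() would change behaviour.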
^ permalink raw reply [flat|nested] 80+ messages in thread
* [PATCH v2 5/7] builtin/repo: add inflated object info to structure table
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (3 preceding siblings ...)
2025-12-12 22:36 ` [PATCH v2 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-12 22:36 ` Justin Tobler
2025-12-12 22:36 ` [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
` (2 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 37 +++++++++++++++++++++--
strbuf.c | 4 +++
strbuf.h | 3 +-
t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
4 files changed, 76 insertions(+), 30 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 3a2d15cec4..b0609cfae5 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -295,6 +295,24 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_end(ap);
}
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
+ struct strbuf buf = STRBUF_INIT;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+
+ entry->unit = strbuf_humanise_bytes_value(&buf, value,
+ STRBUF_HUMANISE_COMPACT);
+ entry->value = strbuf_detach(&buf, NULL);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
+ va_end(ap);
+}
+
static inline size_t get_total_reference_count(struct ref_stats *stats)
{
return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -310,7 +328,8 @@ static void stats_table_setup_structure(struct stats_table *table,
{
struct object_stats *objects = &stats->objects;
struct ref_stats *refs = &stats->refs;
- size_t object_total;
+ size_t inflated_object_total;
+ size_t object_count_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -321,10 +340,10 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
- object_total = get_total_object_values(&objects->type_counts);
+ object_count_total = get_total_object_values(&objects->type_counts);
stats_table_addf(table, "");
stats_table_addf(table, "* %s", _("Reachable objects"));
- stats_table_count_addf(table, object_total, " * %s", _("Count"));
+ stats_table_count_addf(table, object_count_total, " * %s", _("Count"));
stats_table_count_addf(table, objects->type_counts.commits,
" * %s", _("Commits"));
stats_table_count_addf(table, objects->type_counts.trees,
@@ -333,6 +352,18 @@ static void stats_table_setup_structure(struct stats_table *table,
" * %s", _("Blobs"));
stats_table_count_addf(table, objects->type_counts.tags,
" * %s", _("Tags"));
+
+ inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+ stats_table_size_addf(table, inflated_object_total,
+ " * %s", _("Inflated size"));
+ stats_table_size_addf(table, objects->inflated_sizes.commits,
+ " * %s", _("Commits"));
+ stats_table_size_addf(table, objects->inflated_sizes.trees,
+ " * %s", _("Trees"));
+ stats_table_size_addf(table, objects->inflated_sizes.blobs,
+ " * %s", _("Blobs"));
+ stats_table_size_addf(table, objects->inflated_sizes.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
diff --git a/strbuf.c b/strbuf.c
index cebb1593ab..eed4e167ca 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -882,6 +882,10 @@ char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flag
return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
} else {
strbuf_addf(buf, "%u", (unsigned)bytes);
+ if (flags & STRBUF_HUMANISE_COMPACT)
+ return humanise_rate ?
+ xstrfmt(_("B/s")) :
+ xstrfmt(_("B"));
return humanise_rate ?
/* TRANSLATORS: IEC 80000-13:2008 byte/second */
xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
diff --git a/strbuf.h b/strbuf.h
index 7532eadd02..919527d26b 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,7 +367,8 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
*/
void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
-#define STRBUF_HUMANISE_RATE 1 << 0
+#define STRBUF_HUMANISE_RATE 1 << 0
+#define STRBUF_HUMANISE_COMPACT 1 << 1
/**
* Append the given byte size as a human-readable string that is downscaled by
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 33237822fd..b18213c660 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
| Repository structure | Value |
| -------------------- | ------ |
| * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
| | |
| * Reachable objects | |
- | * Count | 0 |
- | * Commits | 0 |
- | * Trees | 0 |
- | * Blobs | 0 |
- | * Tags | 0 |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
+ | * Inflated size | 0 B |
+ | * Commits | 0 B |
+ | * Trees | 0 B |
+ | * Blobs | 0 B |
+ | * Tags | 0 B |
EOF
git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ------ |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
- | | |
- | * Reachable objects | |
- | * Count | 3.02 k |
- | * Commits | 1.01 k |
- | * Trees | 1.01 k |
- | * Blobs | 1.01 k |
- | * Tags | 1 |
+ | Repository structure | Value |
+ | -------------------- | ---------- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 3.02 k |
+ | * Commits | 1.01 k |
+ | * Trees | 1.01 k |
+ | * Blobs | 1.01 k |
+ | * Tags | 1 |
+ | * Inflated size | 16.03 MiB |
+ | * Commits | 217.92 KiB |
+ | * Trees | 15.81 MiB |
+ | * Blobs | 11.68 KiB |
+ | * Tags | 132 B |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
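
To illustrate what the compact flag changes, here is a simplified standalone
sketch. It is not the strbuf implementation: `humanise_bytes` and
`EX_HUMANISE_COMPACT` are hypothetical names, translation is omitted, and the
scaling uses floating-point formatting, so thresholds and the last digit may
differ from what strbuf_humanise_bytes_value() prints.

```c
#include <stddef.h>
#include <stdio.h>

#define EX_HUMANISE_COMPACT (1u << 0) /* stand-in for STRBUF_HUMANISE_COMPACT */

/*
 * Write the scaled number into `out` and return the unit separately, so a
 * table can align values and units in their own columns.
 */
static const char *humanise_bytes(char *out, size_t outlen,
				  unsigned long long bytes, unsigned flags)
{
	const double kib = 1024.0;

	if (bytes >= 1024ULL * 1024 * 1024) {
		snprintf(out, outlen, "%.2f", (double)bytes / (kib * kib * kib));
		return "GiB";
	} else if (bytes >= 1024ULL * 1024) {
		snprintf(out, outlen, "%.2f", (double)bytes / (kib * kib));
		return "MiB";
	} else if (bytes >= 1024ULL) {
		snprintf(out, outlen, "%.2f", (double)bytes / kib);
		return "KiB";
	}

	snprintf(out, outlen, "%llu", bytes);
	if (flags & EX_HUMANISE_COMPACT)
		return "B"; /* short unit keeps the unit column narrow */
	return bytes == 1 ? "byte" : "bytes";
}
```

In compact mode, 132 bytes is rendered as "132" with unit "B", as in the
"Tags" row of the expected table; without the flag, the unit is spelled out
as "bytes".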
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue structure output
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (4 preceding siblings ...)
2025-12-12 22:36 ` [PATCH v2 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-12 22:36 ` Justin Tobler
2025-12-15 5:33 ` Patrick Steinhardt
2025-12-12 22:36 ` [PATCH v2 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 18 ++++++++++++++++++
t/t1901-repo-structure.sh | 11 ++++++++++-
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
* Reference counts categorized by type
* Reachable object counts categorized by type
* Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
+
The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index b0609cfae5..252a53f452 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
struct object_stats {
struct object_values type_counts;
struct object_values inflated_sizes;
+ struct object_values disk_sizes;
};
struct repo_structure {
@@ -471,6 +472,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
(uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+ printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+ printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+ printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+ printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
fflush(stdout);
}
@@ -545,37 +555,45 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
struct count_objects_data *data = cb_data;
struct object_stats *stats = data->stats;
size_t inflated_total = 0;
+ size_t disk_total = 0;
size_t object_count;
for (size_t i = 0; i < oids->nr; i++) {
struct object_info oi = OBJECT_INFO_INIT;
unsigned long inflated;
+ off_t disk;
oi.sizep = &inflated;
+ oi.disk_sizep = &disk;
if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
OBJECT_INFO_FOR_PREFETCH) < 0)
continue;
inflated_total += inflated;
+ disk_total += disk;
}
switch (type) {
case OBJ_TAG:
stats->type_counts.tags += oids->nr;
stats->inflated_sizes.tags += inflated_total;
+ stats->disk_sizes.tags += disk_total;
break;
case OBJ_COMMIT:
stats->type_counts.commits += oids->nr;
stats->inflated_sizes.commits += inflated_total;
+ stats->disk_sizes.commits += disk_total;
break;
case OBJ_TREE:
stats->type_counts.trees += oids->nr;
stats->inflated_sizes.trees += inflated_total;
+ stats->disk_sizes.trees += disk_total;
break;
case OBJ_BLOB:
stats->type_counts.blobs += oids->nr;
stats->inflated_sizes.blobs += inflated_total;
+ stats->disk_sizes.blobs += disk_total;
break;
default:
BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index b18213c660..1553f3cd32 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,11 @@ test_description='test git repo structure'
. ./test-lib.sh
+object_type_disk_usage() {
+ git cat-file --batch-check='%(objectsize:disk)' --batch-all-objects \
+ --filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
+}
+
test_expect_success 'empty repository' '
test_when_finished "rm -rf repo" &&
git init repo &&
@@ -91,7 +96,7 @@ test_expect_success SHA1 'keyvalue and nul format' '
test_commit_bulk 42 &&
git tag -a foo -m bar &&
- cat >expect <<-\EOF &&
+ cat >expect <<-EOF &&
references.branches.count=1
references.tags.count=1
references.remotes.count=0
@@ -104,6 +109,10 @@ test_expect_success SHA1 'keyvalue and nul format' '
objects.trees.inflated_size=28554
objects.blobs.inflated_size=453
objects.tags.inflated_size=132
+ objects.commits.disk_size=$(object_type_disk_usage commit)
+ objects.trees.disk_size=$(object_type_disk_usage tree)
+ objects.blobs.disk_size=$(object_type_disk_usage blob)
+ objects.tags.disk_size=$(object_type_disk_usage tag)
EOF
git repo structure --format=keyvalue >out 2>err &&
--
2.52.0.209.ge85ae279b0
^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue structure output
2025-12-12 22:36 ` [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
@ 2025-12-15 5:33 ` Patrick Steinhardt
0 siblings, 0 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-15 5:33 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster
On Fri, Dec 12, 2025 at 04:36:43PM -0600, Justin Tobler wrote:
> diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> index b18213c660..1553f3cd32 100755
> --- a/t/t1901-repo-structure.sh
> +++ b/t/t1901-repo-structure.sh
> @@ -4,6 +4,11 @@ test_description='test git repo structure'
>
> . ./test-lib.sh
>
> +object_type_disk_usage() {
> + git cat-file --batch-check='%(objectsize:disk)' --batch-all-objects \
> + --filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
> +}
> +
Using `git rev-list --all --disk-usage --filter=object:type=$1
--filter-provided-objects` would avoid the separate call to awk(1).
Patrick
^ permalink raw reply [flat|nested] 80+ messages in thread
* [PATCH v2 7/7] builtin/repo: add object disk size info to structure table
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (5 preceding siblings ...)
2025-12-12 22:36 ` [PATCH v2 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
@ 2025-12-12 22:36 ` Justin Tobler
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-12 22:36 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.
Since disk size may vary between platforms, tests do not validate actual
values and only check that size info is printed in an empty repository.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 13 +++++++++++++
t/t1901-repo-structure.sh | 19 ++++++++++++++++++-
2 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 252a53f452..c294fa11d2 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -331,6 +331,7 @@ static void stats_table_setup_structure(struct stats_table *table,
struct ref_stats *refs = &stats->refs;
size_t inflated_object_total;
size_t object_count_total;
+ size_t disk_object_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -365,6 +366,18 @@ static void stats_table_setup_structure(struct stats_table *table,
" * %s", _("Blobs"));
stats_table_size_addf(table, objects->inflated_sizes.tags,
" * %s", _("Tags"));
+
+ disk_object_total = get_total_object_values(&objects->disk_sizes);
+ stats_table_size_addf(table, disk_object_total,
+ " * %s", _("Disk size"));
+ stats_table_size_addf(table, objects->disk_sizes.commits,
+ " * %s", _("Commits"));
+ stats_table_size_addf(table, objects->disk_sizes.trees,
+ " * %s", _("Trees"));
+ stats_table_size_addf(table, objects->disk_sizes.blobs,
+ " * %s", _("Blobs"));
+ stats_table_size_addf(table, objects->disk_sizes.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 1553f3cd32..6a992222df 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -9,6 +9,15 @@ object_type_disk_usage() {
--filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
}
+strip_object_disk_usage() {
+ awk '
+ /^\| \* Disk size/ { skip=1; next }
+ skip && /^\| \* / { next }
+ skip && !/^\| \* / { skip=0 }
+ { print }
+ ' $1
+}
+
test_expect_success 'empty repository' '
test_when_finished "rm -rf repo" &&
git init repo &&
@@ -35,6 +44,11 @@ test_expect_success 'empty repository' '
| * Trees | 0 B |
| * Blobs | 0 B |
| * Tags | 0 B |
+ | * Disk size | 0 B |
+ | * Commits | 0 B |
+ | * Trees | 0 B |
+ | * Blobs | 0 B |
+ | * Tags | 0 B |
EOF
git repo structure >out 2>err &&
@@ -81,7 +95,10 @@ test_expect_success SHA1 'repository with references and objects' '
| * Tags | 132 B |
EOF
- git repo structure >out 2>err &&
+ git repo structure >out.raw 2>err &&
+
+ # Skip object disk sizes due to platform variance.
+ strip_object_disk_usage out.raw >out &&
test_cmp expect out &&
test_line_count = 0 err
--
2.52.0.209.ge85ae279b0
* [PATCH v3 0/7] builtin/repo: add object size info to structure output
2025-12-12 22:36 ` [PATCH v2 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (6 preceding siblings ...)
2025-12-12 22:36 ` [PATCH v2 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-15 20:56 ` Justin Tobler
2025-12-15 20:56 ` [PATCH v3 1/7] builtin/repo: group per-type object values into struct Justin Tobler
` (7 more replies)
7 siblings, 8 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
Greetings,
This patch series extends the recently introduced "structure" subcommand
for git-repo(1) to collect object size information. More specifically,
it shows total inflated and disk sizes of objects by object type. The
aim is to provide additional insight that may be useful to users regarding
the structure of a repository.
In addition to this change, this series also updates the table output
format to downscale larger output values along with the appropriate unit
prefix. This is done to make table output more human friendly. The
keyvalue and nul output formats are left the same since they are
intended more for machine parsing.
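
As a rough illustration of the downscaling described above, the following standalone C sketch splits a count into a value string and a decimal unit prefix. It is not the code from this series: `sketch_humanise_count()` and its caller-supplied buffer are assumptions made so the example compiles without Git's internal helpers, and translation markers are left out.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in, not Git's API: writes the downscaled count into
 * `value` and points `*unit` at a constant prefix string. `*unit` is set
 * to NULL when the count is below 1000 and needs no prefix.
 */
static void sketch_humanise_count(unsigned long long count, char *value,
				  size_t len, const char **unit)
{
	if (count >= 1000000000ULL) {
		/* Add half of the last kept digit so truncation rounds. */
		unsigned long long x = count + 5000000ULL;
		snprintf(value, len, "%llu.%02llu", x / 1000000000ULL,
			 x % 1000000000ULL / 10000000ULL);
		*unit = "G";
	} else if (count >= 1000000ULL) {
		unsigned long long x = count + 5000ULL;
		snprintf(value, len, "%llu.%02llu", x / 1000000ULL,
			 x % 1000000ULL / 10000ULL);
		*unit = "M";
	} else if (count >= 1000ULL) {
		unsigned long long x = count + 5ULL;
		snprintf(value, len, "%llu.%02llu", x / 1000ULL,
			 x % 1000ULL / 10ULL);
		*unit = "k";
	} else {
		snprintf(value, len, "%llu", count);
		*unit = NULL;
	}
}
```

With this sketch, a count of 1234 yields the value `1.23` with the unit `k`, while 999 is left as `999` with no unit.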
Changes in V3:
- Address potential localization regression by making the downscaled
number format string also translatable. Also make the format string
for how the values and unit prefixes are displayed via
`strbuf_humanise_{bytes,rate}()` translatable to be more flexible.
- `strbuf_humanise_{bytes,count}_value()` has been renamed to
`humanise_{bytes,count}()` and updated to provide both the value and
unit prefix as separate strings.
- Unit prefix strings are no longer allocated and are instead constant.
- The humanise flags are now defined in an enum.
- Instead of using `OBJECT_INFO_FOR_PREFETCH`,
`OBJECT_INFO_SKIP_FETCH_OBJECT` and `OBJECT_INFO_QUICK` are used
explicitly.
- Tests now use git-rev-list(1) to verify disk size info.
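
The reworked interface in the list above (separate value and unit strings, constant unit strings, flags in an enum) can be sketched in standalone C roughly as follows. This is an approximation under stated assumptions, not the patch itself: the `sketch_` names are hypothetical, `uint64_t` stands in for `off_t`, the value is written into a caller-supplied buffer instead of being allocated, and plural selection is a plain `bytes == 1` check instead of going through the translation machinery.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag names modelled on the HUMANISE_* enum in this series. */
enum sketch_humanise_flags {
	SKETCH_HUMANISE_RATE = (1 << 0),    /* use "/s" rate units */
	SKETCH_HUMANISE_COMPACT = (1 << 1), /* "B" instead of "byte(s)" */
};

/*
 * Writes the downscaled value into `value` and points `*unit` at a
 * constant string, so the caller never has to free the unit.
 */
static void sketch_humanise_bytes(uint64_t bytes, char *value, size_t len,
				  const char **unit, unsigned flags)
{
	int rate = flags & SKETCH_HUMANISE_RATE;

	if (bytes > (1u << 30)) {
		snprintf(value, len, "%u.%2.2u", (unsigned)(bytes >> 30),
			 (unsigned)(bytes & ((1u << 30) - 1)) / 10737419);
		*unit = rate ? "GiB/s" : "GiB";
	} else if (bytes > (1u << 20)) {
		unsigned x = (unsigned)bytes + 5243; /* for rounding */
		snprintf(value, len, "%u.%2.2u", x >> 20,
			 ((x & ((1u << 20) - 1)) * 100) >> 20);
		*unit = rate ? "MiB/s" : "MiB";
	} else if (bytes > (1u << 10)) {
		unsigned x = (unsigned)bytes + 5; /* for rounding */
		snprintf(value, len, "%u.%2.2u", x >> 10,
			 ((x & ((1u << 10) - 1)) * 100) >> 10);
		*unit = rate ? "KiB/s" : "KiB";
	} else {
		snprintf(value, len, "%u", (unsigned)bytes);
		if (flags & SKETCH_HUMANISE_COMPACT)
			*unit = rate ? "B/s" : "B";
		else if (bytes == 1)
			*unit = rate ? "byte/s" : "byte";
		else
			*unit = rate ? "bytes/s" : "bytes";
	}
}
```

Because the flags are independent bits, a caller can combine them, for example `SKETCH_HUMANISE_RATE | SKETCH_HUMANISE_COMPACT` to get `B/s` for small values.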
Changes in V2:
- Factor out and reuse existing logic from strbuf_humanise() to handle
downscaling values and determining the appropriate unit prefix
separately. This enables more control over how exactly the values are
written to the structure output table which is useful for alignment
reasons. I'm not sure about the interface used in patch 2. Feedback is
most welcome.
- In the previous version, when checking object size on a missing object
we would die. Instead we now ignore missing objects. This allows the
structure command to work on partial clones.
- disk/inflated keyvalue names renamed to disk_size/inflated_size.
- Unit prefixes are marked for translation.
- The tests for keyvalue disk size values are updated to check against
real expected values instead of skipping. Table output tests still
skip verifying human-readable values though.
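
To illustrate why keeping the value and the unit prefix as separate strings helps with alignment, here is a small standalone C sketch that pads each into its own column. It is only an assumption about how such a row could be formatted: `sketch_format_row()` and the fixed column widths are hypothetical and do not reflect how builtin/repo.c computes or prints its table.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats one table row with the value right-aligned
 * and the unit left-aligned, each in its own fixed-width column. A NULL
 * unit is printed as an empty, padded column.
 */
static void sketch_format_row(char *out, size_t len, const char *name,
			      int name_width, const char *value,
			      int value_width, const char *unit,
			      int unit_width)
{
	snprintf(out, len, "| %-*s | %*s %-*s |", name_width, name,
		 value_width, value, unit_width, unit ? unit : "");
}
```

With a value width of 6 and a unit width of 3, the numbers line up on their last digit and the units start in the same column regardless of whether the unit is `B` or `MiB`. A single pre-joined string such as `15.81 MiB` could only be aligned as a whole.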
Thanks,
-Justin
Justin Tobler (7):
builtin/repo: group per-type object values into struct
strbuf: split out logic to humanise byte values
builtin/repo: humanise count values in structure output
builtin/repo: add inflated object info to keyvalue structure output
builtin/repo: add inflated object info to structure table
builtin/repo: add disk size info to keyvalue stucture output
builtin/repo: add object disk size info to structure table
Documentation/git-repo.adoc | 2 +
builtin/repo.c | 175 ++++++++++++++++++++++++++++++------
strbuf.c | 93 ++++++++++++-------
strbuf.h | 25 ++++++
t/t1901-repo-structure.sh | 113 +++++++++++++++--------
5 files changed, 311 insertions(+), 97 deletions(-)
Range-diff against v2:
1: be14de68f6 = 1: be14de68f6 builtin/repo: group per-type object values into struct
2: 5ca6f9b708 ! 2: 1fa33f5906 strbuf: split out logic to humanise byte values
@@ Commit message
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
usecase, the downscaled values and unit prefixes must be handled
- separately to ensure proper column alignment. Refactor strbuf_humanise()
- to instead append the downscaled byte value to the buffer only and
- return the appropriate unit prefix string.
+ separately to ensure proper column alignment.
+
+ Split out logic from strbuf_humanise() to downscale byte values and
+ determine the corresponding unit prefix into a separate humanise_bytes()
+ function that provides separate value and unit strings.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
- int humanise_rate)
-+char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
++void humanise_bytes(off_t bytes, char **value, const char **unit,
++ unsigned flags)
{
-+ int humanise_rate = flags & STRBUF_HUMANISE_RATE;
++ int humanise_rate = flags & HUMANISE_RATE;
+
if (bytes > 1 << 30) {
- strbuf_addf(buf,
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
- _("%u.%2.2u GiB/s"),
- (unsigned)(bytes >> 30),
-+ strbuf_addf(buf, "%u.%2.2u", (unsigned)(bytes >> 30),
- (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+- (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
++ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(bytes >> 30),
++ (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+ /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
-+ return humanise_rate ? xstrfmt(_("GiB/s")) : xstrfmt(_("GiB"));
++ *unit = humanise_rate ? _("GiB/s") : _("GiB");
} else if (bytes > 1 << 20) {
- unsigned x = bytes + 5243; /* for rounding */
- strbuf_addf(buf,
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
- _("%u.%2.2u MiB/s"),
- x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+ unsigned x = bytes + 5243; /* for rounding */
-+ strbuf_addf(buf, "%u.%2.2u", x >> 20,
-+ ((x & ((1 << 20) - 1)) * 100) >> 20);
++ *value = xstrfmt(_("%u.%2.2u"), x >> 20,
++ ((x & ((1 << 20) - 1)) * 100) >> 20);
+ /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
-+ return humanise_rate ? xstrfmt(_("MiB/s")) : xstrfmt(_("MiB"));
++ *unit = humanise_rate ? _("MiB/s") : _("MiB");
} else if (bytes > 1 << 10) {
- unsigned x = bytes + 5; /* for rounding */
- strbuf_addf(buf,
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
- _("%u.%2.2u KiB/s"),
- x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+ unsigned x = bytes + 5; /* for rounding */
-+ strbuf_addf(buf, "%u.%2.2u", x >> 10,
-+ ((x & ((1 << 10) - 1)) * 100) >> 10);
++ *value = xstrfmt(_("%u.%2.2u"), x >> 10,
++ ((x & ((1 << 10) - 1)) * 100) >> 10);
+ /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
-+ return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
++ *unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
- strbuf_addf(buf,
- humanise_rate == 0 ?
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("%u byte/s", "%u bytes/s", bytes),
- (unsigned)bytes);
-+ strbuf_addf(buf, "%u", (unsigned)bytes);
-+ return humanise_rate ?
++ *value = xstrfmt(_("%u"), (unsigned)bytes);
++ *unit = humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
-+ xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
++ Q_("byte/s", "bytes/s", bytes) :
+ /* TRANSLATORS: IEC 80000-13:2008 byte */
-+ xstrfmt(Q_("byte", "bytes", bytes));
++ Q_("byte", "bytes", bytes);
}
}
++static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
++{
++ char *value;
++ const char *unit;
++
++ humanise_bytes(bytes, &value, &unit, flags);
++ strbuf_addf(buf, _("%s %s"), value, unit);
++ free(value);
++}
++
void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
{
-- strbuf_humanise(buf, bytes, 0);
-+ char *unit = strbuf_humanise_bytes_value(buf, bytes, 0);
-+ strbuf_addf(buf, " %s", unit);
-+ free(unit);
- }
+ strbuf_humanise(buf, bytes, 0);
+@@ strbuf.c: void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
{
- strbuf_humanise(buf, bytes, 1);
-+ char *unit = strbuf_humanise_bytes_value(buf, bytes, STRBUF_HUMANISE_RATE);
-+ strbuf_addf(buf, " %s", unit);
-+ free(unit);
++ strbuf_humanise(buf, bytes, HUMANISE_RATE);
}
int printf_ln(const char *fmt, ...)
@@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbu
*/
void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
-+#define STRBUF_HUMANISE_RATE 1 << 0
++enum humanise_flags {
++ /*
++ * Use rate based unit prefixes for humanised values.
++ */
++ HUMANISE_RATE = (1 << 0),
++};
+
+/**
-+ * Append the given byte size as a human-readable string that is downscaled by
-+ * some factor. A string with the corresponding unit prefix is returned
-+ * separately.
++ * Converts the given byte size into a downscaled human-readable value and
++ * corresponding unit prefix as two separate strings.
+ */
-+char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
++void humanise_bytes(off_t bytes, char **value, const char **unit,
++ unsigned flags);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
3: 2efc3533ef ! 3: 8f09f6358e builtin/repo: humanise count values in structure output
@@ builtin/repo.c: struct stats_table {
*/
struct stats_table_entry {
char *value;
-+ char *unit;
++ const char *unit;
};
static void stats_table_vaddf(struct stats_table *table,
@@ builtin/repo.c: static void stats_table_vaddf(struct stats_table *table,
static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, size_t value,
- const char *format, ...)
- {
- struct stats_table_entry *entry;
-+ struct strbuf buf = STRBUF_INIT;
va_list ap;
CALLOC_ARRAY(entry, 1);
- entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
-+
-+ entry->unit = strbuf_humanise_count_value(&buf, value);
-+ entry->value = strbuf_detach(&buf, NULL);
++ humanise_count(value, &entry->value, &entry->unit);
va_start(ap, format);
stats_table_vaddf(table, entry, format, ap);
@@ builtin/repo.c: static void stats_table_print_structure(const struct stats_table
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
}
-@@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
-
- for_each_string_list_item(item, &table->rows) {
- entry = item->util;
-- if (entry)
-+ if (entry) {
- free(entry->value);
-+ free(entry->unit);
-+ }
- }
-
- string_list_clear(&table->rows, 1);
## strbuf.c ##
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
-+char *strbuf_humanise_count_value(struct strbuf *buf, size_t value)
++void humanise_count(size_t count, char **value, const char **unit)
+{
-+ if (value >= 1000000000) {
-+ uintmax_t x = (uintmax_t)value + 5000000; /* for rounding */
-+ strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
-+ x / 1000000000, x % 1000000000 / 10000000);
-+ return xstrfmt(_("G"));
-+ } else if (value >= 1000000) {
-+ uintmax_t x = (uintmax_t)value + 5000; /* for rounding */
-+ strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
-+ x / 1000000, x % 1000000 / 10000);
-+ return xstrfmt(_("M"));
-+ } else if (value >= 1000) {
-+ uintmax_t x = (uintmax_t)value + 5; /* for rounding */
-+ strbuf_addf(buf, "%" PRIuMAX ".%02" PRIuMAX,
-+ x / 1000, x % 1000 / 10);
-+ return xstrfmt(_("k"));
++ if (count >= 1000000000) {
++ size_t x = count + 5000000; /* for rounding */
++ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
++ (unsigned)(x % 1000000000 / 10000000));
++ *unit = _("G");
++ } else if (count >= 1000000) {
++ size_t x = count + 5000; /* for rounding */
++ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
++ (unsigned)(x % 1000000 / 10000));
++ *unit = _("M");
++ } else if (count >= 1000) {
++ size_t x = count + 5; /* for rounding */
++ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
++ (unsigned)(x % 1000 / 10));
++ *unit = _("k");
+ } else {
-+ strbuf_addf(buf, "%" PRIuMAX, (uintmax_t)value);
-+ return NULL;
++ *value = xstrfmt(_("%u"), (unsigned)count);
++ *unit = NULL;
+ }
+}
+
- char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags)
+ void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags)
{
- int humanise_rate = flags & STRBUF_HUMANISE_RATE;
## strbuf.h ##
-@@ strbuf.h: void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
- */
- char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flags);
+@@ strbuf.h: enum humanise_flags {
+ void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags);
+/**
-+ * Append the given count value as a human-readable string that is downscaled by
-+ * some factor. A string with the corresponding unit prefix is returned
-+ * separately.
++ * Converts the given count into a downscaled human-readable value and
++ * corresponding unit prefix as two separate strings.
+ */
-+char *strbuf_humanise_count_value(struct strbuf *buf, size_t value);
++void humanise_count(size_t count, char **value, const char **unit);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
4: 627b8bf025 ! 4: 3f4eabe94f builtin/repo: add inflated object info to keyvalue structure output
@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
+ oi.sizep = &inflated;
+
+ if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
-+ OBJECT_INFO_FOR_PREFETCH) < 0)
++ OBJECT_INFO_SKIP_FETCH_OBJECT |
++ OBJECT_INFO_QUICK) < 0)
+ continue;
+
+ inflated_total += inflated;
5: 14f4983e1d ! 5: 85d1052100 builtin/repo: add inflated object info to structure table
@@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, si
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
-+ struct strbuf buf = STRBUF_INIT;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
-+
-+ entry->unit = strbuf_humanise_bytes_value(&buf, value,
-+ STRBUF_HUMANISE_COMPACT);
-+ entry->value = strbuf_detach(&buf, NULL);
++ humanise_bytes(value, &entry->value, &entry->unit, HUMANISE_COMPACT);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
@@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
static void stats_table_print_structure(const struct stats_table *table)
## strbuf.c ##
-@@ strbuf.c: char *strbuf_humanise_bytes_value(struct strbuf *buf, off_t bytes, unsigned flag
- return humanise_rate ? xstrfmt(_("KiB/s")) : xstrfmt(_("KiB"));
+@@ strbuf.c: void humanise_bytes(off_t bytes, char **value, const char **unit,
+ *unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
- strbuf_addf(buf, "%u", (unsigned)bytes);
-+ if (flags & STRBUF_HUMANISE_COMPACT)
-+ return humanise_rate ?
-+ xstrfmt(_("B/s")) :
-+ xstrfmt(_("B"));
- return humanise_rate ?
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- xstrfmt(Q_("byte/s", "bytes/s", bytes)) :
+ *value = xstrfmt(_("%u"), (unsigned)bytes);
+- *unit = humanise_rate ?
+- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+- Q_("byte/s", "bytes/s", bytes) :
+- /* TRANSLATORS: IEC 80000-13:2008 byte */
+- Q_("byte", "bytes", bytes);
++ if (flags & HUMANISE_COMPACT)
++ *unit = humanise_rate ? _("B/s") : _("B");
++ else
++ *unit = humanise_rate ?
++ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
++ Q_("byte/s", "bytes/s", bytes) :
++ /* TRANSLATORS: IEC 80000-13:2008 byte */
++ Q_("byte", "bytes", bytes);
+ }
+ }
+
## strbuf.h ##
-@@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
- */
- void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
-
--#define STRBUF_HUMANISE_RATE 1 << 0
-+#define STRBUF_HUMANISE_RATE 1 << 0
-+#define STRBUF_HUMANISE_COMPACT 1 << 1
+@@ strbuf.h: enum humanise_flags {
+ * Use rate based unit prefixes for humanised values.
+ */
+ HUMANISE_RATE = (1 << 0),
++ /*
++ * Use compact "B" unit prefixes instead of "byte/bytes" for humanised
++ * values.
++ */
++ HUMANISE_COMPACT = (1 << 1),
+ };
/**
- * Append the given byte size as a human-readable string that is downscaled by
## t/t1901-repo-structure.sh ##
@@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
6: dc9e82889f ! 6: e9fa9babec builtin/repo: add disk size info to keyvalue stucture output
@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
+ oi.disk_sizep = &disk;
if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
- OBJECT_INFO_FOR_PREFETCH) < 0)
+ OBJECT_INFO_SKIP_FETCH_OBJECT |
+@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_array *oids,
continue;
inflated_total += inflated;
@@ t/t1901-repo-structure.sh: test_description='test git repo structure'
. ./test-lib.sh
+object_type_disk_usage() {
-+ git cat-file --batch-check='%(objectsize:disk)' --batch-all-objects \
-+ --filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
++ git rev-list --all --objects --disk-usage --filter=object:type=$1 \
++ --filter-provided-objects
+}
+
test_expect_success 'empty repository' '
7: 213b19dc7f ! 7: df542c7bdf builtin/repo: add object disk size info to structure table
@@ Commit message
git-repo(1) structure command to display the total object disk usage by
object type.
- Since disk size may vary between platforms, tests do not validate actual
- values and only check that size info is printed in an empty repository.
-
Signed-off-by: Justin Tobler <jltobler@gmail.com>
## builtin/repo.c ##
@@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
static void stats_table_print_structure(const struct stats_table *table)
## t/t1901-repo-structure.sh ##
-@@ t/t1901-repo-structure.sh: object_type_disk_usage() {
- --filter=object:type=$1 | awk '{ sum += $1 } END { print sum }'
- }
+@@ t/t1901-repo-structure.sh: test_description='test git repo structure'
+ . ./test-lib.sh
-+strip_object_disk_usage() {
-+ awk '
-+ /^\| \* Disk size/ { skip=1; next }
-+ skip && /^\| \* / { next }
-+ skip && !/^\| \* / { skip=0 }
-+ { print }
-+ ' $1
-+}
+ object_type_disk_usage() {
+- git rev-list --all --objects --disk-usage --filter=object:type=$1 \
+- --filter-provided-objects
++ disk_usage_opt="--disk-usage"
++
++ if [ "$2" = "true" ]; then
++ disk_usage_opt="--disk-usage=human"
++ fi
+
++ if [ "$1" = "all" ]; then
++ git rev-list --all --objects $disk_usage_opt
++ else
++ git rev-list --all --objects $disk_usage_opt \
++ --filter=object:type=$1 --filter-provided-objects
++ fi
+ }
+
test_expect_success 'empty repository' '
- test_when_finished "rm -rf repo" &&
- git init repo &&
@@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
| * Trees | 0 B |
| * Blobs | 0 B |
@@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
git repo structure >out 2>err &&
@@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references and objects' '
+ # Also creates a commit, tree, and blob.
+ git notes add -m foo &&
+
+- cat >expect <<-\EOF &&
++ cat >expect <<-EOF &&
+ | Repository structure | Value |
+ | -------------------- | ---------- |
+ | * References | |
+@@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references and objects' '
+ | * Trees | 15.81 MiB |
+ | * Blobs | 11.68 KiB |
| * Tags | 132 B |
++ | * Disk size | $(object_type_disk_usage all true) |
++ | * Commits | $(object_type_disk_usage commit true) |
++ | * Trees | $(object_type_disk_usage tree true) |
++ | * Blobs | $(object_type_disk_usage blob true) |
++ | * Tags | $(object_type_disk_usage tag) B |
EOF
-- git repo structure >out 2>err &&
-+ git repo structure >out.raw 2>err &&
-+
-+ # Skip object disk sizes due to platform variance.
-+ strip_object_disk_usage out.raw >out &&
-
- test_cmp expect out &&
- test_line_count = 0 err
+ git repo structure >out 2>err &&
base-commit: e85ae279b0d58edc2f4c3fd5ac391b51e1223985
--
2.52.0.209.ge85ae279b0
* [PATCH v3 1/7] builtin/repo: group per-type object values into struct
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
@ 2025-12-15 20:56 ` Justin Tobler
2025-12-15 20:56 ` [PATCH v3 2/7] strbuf: split out logic to humanise byte values Justin Tobler
` (6 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 17 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
size_t others;
};
-struct object_stats {
+struct object_values {
size_t tags;
size_t commits;
size_t trees;
size_t blobs;
};
+struct object_stats {
+ struct object_values type_counts;
+};
+
struct repo_structure {
struct ref_stats refs;
struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
return stats->branches + stats->remotes + stats->tags + stats->others;
}
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
{
- return stats->tags + stats->commits + stats->trees + stats->blobs;
+ return values->tags + values->commits + values->trees + values->blobs;
}
static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
- object_total = get_total_object_count(objects);
+ object_total = get_total_object_values(&objects->type_counts);
stats_table_addf(table, "");
stats_table_addf(table, "* %s", _("Reachable objects"));
stats_table_count_addf(table, object_total, " * %s", _("Count"));
- stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
- stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
- stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
- stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, objects->type_counts.commits,
+ " * %s", _("Commits"));
+ stats_table_count_addf(table, objects->type_counts.trees,
+ " * %s", _("Trees"));
+ stats_table_count_addf(table, objects->type_counts.blobs,
+ " * %s", _("Blobs"));
+ stats_table_count_addf(table, objects->type_counts.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
(uintmax_t)stats->refs.others, value_delim);
printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.commits, value_delim);
+ (uintmax_t)stats->objects.type_counts.commits, value_delim);
printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.trees, value_delim);
+ (uintmax_t)stats->objects.type_counts.trees, value_delim);
printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.blobs, value_delim);
+ (uintmax_t)stats->objects.type_counts.blobs, value_delim);
printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.tags, value_delim);
+ (uintmax_t)stats->objects.type_counts.tags, value_delim);
fflush(stdout);
}
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
switch (type) {
case OBJ_TAG:
- stats->tags += oids->nr;
+ stats->type_counts.tags += oids->nr;
break;
case OBJ_COMMIT:
- stats->commits += oids->nr;
+ stats->type_counts.commits += oids->nr;
break;
case OBJ_TREE:
- stats->trees += oids->nr;
+ stats->type_counts.trees += oids->nr;
break;
case OBJ_BLOB:
- stats->blobs += oids->nr;
+ stats->type_counts.blobs += oids->nr;
break;
default:
BUG("invalid object type");
}
- object_count = get_total_object_count(stats);
+ object_count = get_total_object_values(&stats->type_counts);
display_progress(data->progress, object_count);
return 0;
--
2.52.0.209.ge85ae279b0
* [PATCH v3 2/7] strbuf: split out logic to humanise byte values
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-15 20:56 ` [PATCH v3 1/7] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-15 20:56 ` Justin Tobler
2025-12-16 1:19 ` Junio C Hamano
2025-12-15 20:56 ` [PATCH v3 3/7] builtin/repo: humanise count values in structure output Justin Tobler
` (5 subsequent siblings)
7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
usecase, the downscaled values and unit prefixes must be handled
separately to ensure proper column alignment.
Split out logic from strbuf_humanise() to downscale byte values and
determine the corresponding unit prefix into a separate humanise_bytes()
function that provides separate value and unit strings.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
strbuf.c | 69 ++++++++++++++++++++++++++++----------------------------
strbuf.h | 14 ++++++++++++
2 files changed, 49 insertions(+), 34 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 6c3851a7f8..bb8e98872f 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,47 +836,48 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
- int humanise_rate)
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags)
{
+ int humanise_rate = flags & HUMANISE_RATE;
+
if (bytes > 1 << 30) {
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte */
- _("%u.%2.2u GiB") :
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
- _("%u.%2.2u GiB/s"),
- (unsigned)(bytes >> 30),
- (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(bytes >> 30),
+ (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+ /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
+ *unit = humanise_rate ? _("GiB/s") : _("GiB");
} else if (bytes > 1 << 20) {
- unsigned x = bytes + 5243; /* for rounding */
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 mebibyte */
- _("%u.%2.2u MiB") :
- /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
- _("%u.%2.2u MiB/s"),
- x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+ unsigned x = bytes + 5243; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), x >> 20,
+ ((x & ((1 << 20) - 1)) * 100) >> 20);
+ /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
+ *unit = humanise_rate ? _("MiB/s") : _("MiB");
} else if (bytes > 1 << 10) {
- unsigned x = bytes + 5; /* for rounding */
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 kibibyte */
- _("%u.%2.2u KiB") :
- /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
- _("%u.%2.2u KiB/s"),
- x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+ unsigned x = bytes + 5; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), x >> 10,
+ ((x & ((1 << 10) - 1)) * 100) >> 10);
+ /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
+ *unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 byte */
- Q_("%u byte", "%u bytes", bytes) :
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("%u byte/s", "%u bytes/s", bytes),
- (unsigned)bytes);
+ *value = xstrfmt(_("%u"), (unsigned)bytes);
+ *unit = humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+ Q_("byte/s", "bytes/s", bytes) :
+ /* TRANSLATORS: IEC 80000-13:2008 byte */
+ Q_("byte", "bytes", bytes);
}
}
+static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
+{
+ char *value;
+ const char *unit;
+
+ humanise_bytes(bytes, &value, &unit, flags);
+ strbuf_addf(buf, _("%s %s"), value, unit);
+ free(value);
+}
+
void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
{
strbuf_humanise(buf, bytes, 0);
@@ -884,7 +885,7 @@ void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
{
- strbuf_humanise(buf, bytes, 1);
+ strbuf_humanise(buf, bytes, HUMANISE_RATE);
}
int printf_ln(const char *fmt, ...)
diff --git a/strbuf.h b/strbuf.h
index a580ac6084..4426163e7e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,20 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
*/
void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
+enum humanise_flags {
+ /*
+ * Use rate based unit prefixes for humanised values.
+ */
+ HUMANISE_RATE = (1 << 0),
+};
+
+/**
+ * Converts the given byte size into a downscaled human-readable value and
+ * corresponding unit prefix as two separate strings.
+ */
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
--
2.52.0.209.ge85ae279b0
* Re: [PATCH v3 2/7] strbuf: split out logic to humanise byte values
2025-12-15 20:56 ` [PATCH v3 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-16 1:19 ` Junio C Hamano
2025-12-16 1:36 ` Justin Tobler
0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-16 1:19 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps
Justin Tobler <jltobler@gmail.com> writes:
> + *value = xstrfmt(_("%u"), (unsigned)bytes);
Does this "%u" need translation?
I very much doubt it, but if it did, this does need TRANSLATORS
comment.
> + *unit = humanise_rate ?
> + /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> + Q_("byte/s", "bytes/s", bytes) :
> + /* TRANSLATORS: IEC 80000-13:2008 byte */
> + Q_("byte", "bytes", bytes);
> }
> }
>
> +static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
> +{
> + char *value;
> + const char *unit;
> +
> + humanise_bytes(bytes, &value, &unit, flags);
> + strbuf_addf(buf, _("%s %s"), value, unit);
This definitely needs the TRANSLATORS comment to tell what is going on.
> + free(value);
> +}
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [PATCH v3 2/7] strbuf: split out logic to humanise byte values
2025-12-16 1:19 ` Junio C Hamano
@ 2025-12-16 1:36 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 1:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps
On 25/12/16 10:19AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> > + *value = xstrfmt(_("%u"), (unsigned)bytes);
>
> Does this "%u" need translation?
>
> I very much doubt it, but if it did, this does need TRANSLATORS
> comment.
Ya, I don't think one should be necessary. Will remove in the next
version.
I think I made the same mistake in humanise_count() in a later patch.
I'll also adjust it there.
>
> > + *unit = humanise_rate ?
> > + /* TRANSLATORS: IEC 80000-13:2008 byte/second */
> > + Q_("byte/s", "bytes/s", bytes) :
> > + /* TRANSLATORS: IEC 80000-13:2008 byte */
> > + Q_("byte", "bytes", bytes);
> > }
> > }
> >
> > +static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
> > +{
> > + char *value;
> > + const char *unit;
> > +
> > + humanise_bytes(bytes, &value, &unit, flags);
> > + strbuf_addf(buf, _("%s %s"), value, unit);
>
> This definitely needs the TRANSLATORS comment to tell what is going on.
Ok, will do in the next version. Thanks :)
-Justin
^ permalink raw reply [flat|nested] 80+ messages in thread
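Editorial sketch of the review point above: a TRANSLATORS comment placed directly before the translatable format string tells translators what the two `%s` placeholders stand for. The helper name `join_value_and_unit()`, the stub `_()` macro and the comment wording are assumptions made for illustration; the wording used in the next version of the series is not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Stand-in for gettext's _() so the sketch compiles on its own; in Git the
 * macro looks the string up in the message catalogue.
 */
#define _(msgid) (msgid)

/* Hypothetical helper: joins a scaled value and its unit, e.g. "12.23 KiB". */
static int join_value_and_unit(char *out, size_t len,
			       const char *value, const char *unit)
{
	assert(out && len > 0 && value && unit);
	/*
	 * TRANSLATORS: the first %s is a number such as "12.23" and the
	 * second %s is its unit such as "KiB".
	 */
	return snprintf(out, len, _("%s %s"), value, unit);
}
```

A plain numeric format such as "%u" carries nothing for a translator to act on, which is presumably why the reply above agrees to drop the `_()` around it rather than annotate it.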
* [PATCH v3 3/7] builtin/repo: humanise count values in structure output
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-15 20:56 ` [PATCH v3 1/7] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-15 20:56 ` [PATCH v3 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-15 20:56 ` Justin Tobler
2025-12-16 8:25 ` Patrick Steinhardt
2025-12-15 20:56 ` [PATCH v3 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
` (4 subsequent siblings)
7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.
For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 38 +++++++++++++++++-------
strbuf.c | 23 +++++++++++++++
strbuf.h | 6 ++++
t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
4 files changed, 88 insertions(+), 41 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..9c61bc3e17 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
int name_col_width;
int value_col_width;
+ int unit_col_width;
};
/*
@@ -230,6 +231,7 @@ struct stats_table {
*/
struct stats_table_entry {
char *value;
+ const char *unit;
};
static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
if (name_width > table->name_col_width)
table->name_col_width = name_width;
- if (entry) {
+ if (!entry)
+ return;
+ if (entry->value) {
int value_width = utf8_strwidth(entry->value);
if (value_width > table->value_col_width)
table->value_col_width = value_width;
}
+ if (entry->unit) {
+ int unit_width = utf8_strwidth(entry->unit);
+ if (unit_width > table->unit_col_width)
+ table->unit_col_width = unit_width;
+ }
}
static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -273,7 +282,7 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_list ap;
CALLOC_ARRAY(entry, 1);
- entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+ humanise_count(value, &entry->value, &entry->unit);
va_start(ap, format);
stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +333,24 @@ static void stats_table_print_structure(const struct stats_table *table)
{
const char *name_col_title = _("Repository structure");
const char *value_col_title = _("Value");
- int name_col_width = utf8_strwidth(name_col_title);
- int value_col_width = utf8_strwidth(value_col_title);
+ int title_name_width = utf8_strwidth(name_col_title);
+ int title_value_width = utf8_strwidth(value_col_title);
+ int name_col_width = table->name_col_width;
+ int value_col_width = table->value_col_width;
+ int unit_col_width = table->unit_col_width;
struct string_list_item *item;
struct strbuf buf = STRBUF_INIT;
- if (table->name_col_width > name_col_width)
- name_col_width = table->name_col_width;
- if (table->value_col_width > value_col_width)
- value_col_width = table->value_col_width;
+ if (title_name_width > name_col_width)
+ name_col_width = title_name_width;
+ if (title_value_width > value_col_width + unit_col_width + 1)
+ value_col_width = title_value_width - unit_col_width;
strbuf_addstr(&buf, "| ");
strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
strbuf_addstr(&buf, " | ");
- strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+ strbuf_utf8_align(&buf, ALIGN_LEFT,
+ value_col_width + unit_col_width + 1, value_col_title);
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
@@ -345,17 +358,20 @@ static void stats_table_print_structure(const struct stats_table *table)
for (int i = 0; i < name_col_width; i++)
putchar('-');
printf(" | ");
- for (int i = 0; i < value_col_width; i++)
+ for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
putchar('-');
printf(" |\n");
for_each_string_list_item(item, &table->rows) {
struct stats_table_entry *entry = item->util;
const char *value = "";
+ const char *unit = "";
if (entry) {
struct stats_table_entry *entry = item->util;
value = entry->value;
+ if (entry->unit)
+ unit = entry->unit;
}
strbuf_reset(&buf);
@@ -363,6 +379,8 @@ static void stats_table_print_structure(const struct stats_table *table)
strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
strbuf_addstr(&buf, " | ");
strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+ strbuf_addch(&buf, ' ');
+ strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
}
diff --git a/strbuf.c b/strbuf.c
index bb8e98872f..662edd4d19 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,6 +836,29 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
+void humanise_count(size_t count, char **value, const char **unit)
+{
+ if (count >= 1000000000) {
+ size_t x = count + 5000000; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
+ (unsigned)(x % 1000000000 / 10000000));
+ *unit = _("G");
+ } else if (count >= 1000000) {
+ size_t x = count + 5000; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
+ (unsigned)(x % 1000000 / 10000));
+ *unit = _("M");
+ } else if (count >= 1000) {
+ size_t x = count + 5; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
+ (unsigned)(x % 1000 / 10));
+ *unit = _("k");
+ } else {
+ *value = xstrfmt(_("%u"), (unsigned)count);
+ *unit = NULL;
+ }
+}
+
void humanise_bytes(off_t bytes, char **value, const char **unit,
unsigned flags)
{
diff --git a/strbuf.h b/strbuf.h
index 4426163e7e..571bd889df 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -381,6 +381,12 @@ enum humanise_flags {
void humanise_bytes(off_t bytes, char **value, const char **unit,
unsigned flags);
+/**
+ * Converts the given count into a downscaled human-readable value and
+ * corresponding unit prefix as two separate strings.
+ */
+void humanise_count(size_t count, char **value, const char **unit);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
(
cd repo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ----- |
- | * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
- | | |
- | * Reachable objects | |
- | * Count | 0 |
- | * Commits | 0 |
- | * Trees | 0 |
- | * Blobs | 0 |
- | * Tags | 0 |
+ | Repository structure | Value |
+ | -------------------- | ------ |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
git init repo &&
(
cd repo &&
- test_commit_bulk 42 &&
+ test_commit_bulk 1005 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ----- |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
- | | |
- | * Reachable objects | |
- | * Count | 130 |
- | * Commits | 43 |
- | * Trees | 43 |
- | * Blobs | 43 |
- | * Tags | 1 |
+ | Repository structure | Value |
+ | -------------------- | ------ |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 3.02 k |
+ | * Commits | 1.01 k |
+ | * Trees | 1.01 k |
+ | * Blobs | 1.01 k |
+ | * Tags | 1 |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
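Editorial sketch of the row layout this patch introduces: the value is right-aligned in its own column, followed by one space and a left-aligned unit column, so decimal values and their prefixes line up. The function name `format_row()` is hypothetical, and the sketch pads by byte count with printf field widths, whereas the patch uses `strbuf_utf8_align()` and `utf8_strwidth()`; the two agree only for ASCII input.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats one table row. A NULL value or unit is
 * rendered as an empty, padded cell.
 */
static int format_row(char *out, size_t len,
		      int name_w, int value_w, int unit_w,
		      const char *name, const char *value, const char *unit)
{
	assert(out && len > 0 && name);
	return snprintf(out, len, "| %-*s | %*s %-*s |",
			name_w, name,
			value_w, value ? value : "",
			unit_w, unit ? unit : "");
}
```

With a name width of 10, a value width of 4 and a unit width of 1, a row for ("* Count", "3.02", "k") and a row for ("* Tags", "1", no unit) come out the same length, which is what keeps the right-hand border straight.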
^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [PATCH v3 3/7] builtin/repo: humanise count values in structure output
2025-12-15 20:56 ` [PATCH v3 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-16 8:25 ` Patrick Steinhardt
0 siblings, 0 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-16 8:25 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster
On Mon, Dec 15, 2025 at 02:56:35PM -0600, Justin Tobler wrote:
> diff --git a/strbuf.c b/strbuf.c
> index bb8e98872f..662edd4d19 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -836,6 +836,29 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
> strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
> }
>
> +void humanise_count(size_t count, char **value, const char **unit)
> +{
> + if (count >= 1000000000) {
> + size_t x = count + 5000000; /* for rounding */
> + *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
> + (unsigned)(x % 1000000000 / 10000000));
> + *unit = _("G");
> + } else if (count >= 1000000) {
> + size_t x = count + 5000; /* for rounding */
> + *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
> + (unsigned)(x % 1000000 / 10000));
> + *unit = _("M");
> + } else if (count >= 1000) {
> + size_t x = count + 5; /* for rounding */
> + *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
> + (unsigned)(x % 1000 / 10));
> + *unit = _("k");
> + } else {
> + *value = xstrfmt(_("%u"), (unsigned)count);
> + *unit = NULL;
> + }
> +}
I guess these could all use TRANSLATORS comments as well.
Patrick
^ permalink raw reply [flat|nested] 80+ messages in thread
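Editorial sketch of the rounding arithmetic in the quoted humanise_count(): add half of the last displayed digit, then split the result into an integer part and two decimals. The function `scale_count()` is a hypothetical stand-alone variant that writes into a caller-supplied buffer, returns an empty string instead of NULL when no prefix applies, and leaves out gettext; it is not the code from the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical variant of the quoted function: writes the scaled value into
 * `value` and returns the decimal prefix ("" when the count is not scaled).
 */
static const char *scale_count(unsigned long long count, char *value, size_t len)
{
	assert(value && len > 0);
	if (count >= 1000000000ULL) {
		unsigned long long x = count + 5000000ULL; /* round to 2 decimals */
		snprintf(value, len, "%llu.%02llu",
			 x / 1000000000ULL, x % 1000000000ULL / 10000000ULL);
		return "G";
	}
	if (count >= 1000000ULL) {
		unsigned long long x = count + 5000ULL; /* round to 2 decimals */
		snprintf(value, len, "%llu.%02llu",
			 x / 1000000ULL, x % 1000000ULL / 10000ULL);
		return "M";
	}
	if (count >= 1000ULL) {
		unsigned long long x = count + 5ULL; /* round to 2 decimals */
		snprintf(value, len, "%llu.%02llu", x / 1000ULL, x % 1000ULL / 10ULL);
		return "k";
	}
	snprintf(value, len, "%llu", count);
	return "";
}
```

For a count of 1005 this gives 1.01 with prefix k, in line with the "1.01 k" rows in the patch's updated test. One property of this arithmetic is that a value just under a threshold can round up without switching prefix: 999995 comes out as 1000.00 k rather than 1.00 M.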
* [PATCH v3 4/7] builtin/repo: add inflated object info to keyvalue structure output
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (2 preceding siblings ...)
2025-12-15 20:56 ` [PATCH v3 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-15 20:56 ` Justin Tobler
2025-12-15 20:56 ` [PATCH v3 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
` (3 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.
For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 33 +++++++++++++++++++++++++++++++++
t/t1901-repo-structure.sh | 6 +++++-
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
+
* Reference counts categorized by type
* Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
+
The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 9c61bc3e17..e207108346 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
#include "builtin.h"
#include "environment.h"
+#include "hex.h"
+#include "odb.h"
#include "parse-options.h"
#include "path-walk.h"
#include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
struct object_stats {
struct object_values type_counts;
+ struct object_values inflated_sizes;
};
struct repo_structure {
@@ -423,6 +426,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
(uintmax_t)stats->objects.type_counts.tags, value_delim);
+ printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+ printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+ printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+ printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
fflush(stdout);
}
@@ -486,6 +498,7 @@ static void structure_count_references(struct ref_stats *stats,
}
struct count_objects_data {
+ struct object_database *odb;
struct object_stats *stats;
struct progress *progress;
};
@@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
{
struct count_objects_data *data = cb_data;
struct object_stats *stats = data->stats;
+ size_t inflated_total = 0;
size_t object_count;
+ for (size_t i = 0; i < oids->nr; i++) {
+ struct object_info oi = OBJECT_INFO_INIT;
+ unsigned long inflated;
+
+ oi.sizep = &inflated;
+
+ if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+ OBJECT_INFO_SKIP_FETCH_OBJECT |
+ OBJECT_INFO_QUICK) < 0)
+ continue;
+
+ inflated_total += inflated;
+ }
+
switch (type) {
case OBJ_TAG:
stats->type_counts.tags += oids->nr;
+ stats->inflated_sizes.tags += inflated_total;
break;
case OBJ_COMMIT:
stats->type_counts.commits += oids->nr;
+ stats->inflated_sizes.commits += inflated_total;
break;
case OBJ_TREE:
stats->type_counts.trees += oids->nr;
+ stats->inflated_sizes.trees += inflated_total;
break;
case OBJ_BLOB:
stats->type_counts.blobs += oids->nr;
+ stats->inflated_sizes.blobs += inflated_total;
break;
default:
BUG("invalid object type");
@@ -526,6 +558,7 @@ static void structure_count_objects(struct object_stats *stats,
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
struct count_objects_data data = {
+ .odb = repo->objects,
.stats = stats,
};
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..33237822fd 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
)
'
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
objects.trees.count=42
objects.blobs.count=42
objects.tags.count=1
+ objects.commits.inflated_size=9225
+ objects.trees.inflated_size=28554
+ objects.blobs.inflated_size=453
+ objects.tags.inflated_size=132
EOF
git repo structure --format=keyvalue >out 2>err &&
--
2.52.0.209.ge85ae279b0
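Editorial sketch of how one line of the keyvalue output is assembled: key, key delimiter, unscaled value, value delimiter. The helper `format_stat()` is hypothetical. The patch itself only shows that `key_delim` and `value_delim` are passed in; the concrete pairs used below ('=' with newline for keyvalue, newline with NUL for nul) are an assumption about the two formats.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats "<key><key_delim><value><value_delim>" and
 * returns the number of bytes produced, including an embedded NUL delimiter.
 */
static int format_stat(char *out, size_t len, const char *key, uintmax_t value,
		       char key_delim, char value_delim)
{
	assert(out && len > 0 && key);
	return snprintf(out, len, "%s%c%" PRIuMAX "%c",
			key, key_delim, value, value_delim);
}
```

Because the machine-readable formats print the raw byte count, a script can compare values exactly, which is why the test above can expect `objects.tags.inflated_size=132` while the table format scales and labels the same number.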
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v3 5/7] builtin/repo: add inflated object info to structure table
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (3 preceding siblings ...)
2025-12-15 20:56 ` [PATCH v3 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-15 20:56 ` Justin Tobler
2025-12-15 20:56 ` [PATCH v3 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
` (2 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 33 +++++++++++++++++++--
strbuf.c | 13 ++++----
strbuf.h | 5 ++++
t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
4 files changed, 79 insertions(+), 34 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index e207108346..b73cfd975b 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -292,6 +292,20 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_end(ap);
}
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+ humanise_bytes(value, &entry->value, &entry->unit, HUMANISE_COMPACT);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
+ va_end(ap);
+}
+
static inline size_t get_total_reference_count(struct ref_stats *stats)
{
return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -307,7 +321,8 @@ static void stats_table_setup_structure(struct stats_table *table,
{
struct object_stats *objects = &stats->objects;
struct ref_stats *refs = &stats->refs;
- size_t object_total;
+ size_t inflated_object_total;
+ size_t object_count_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -318,10 +333,10 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
- object_total = get_total_object_values(&objects->type_counts);
+ object_count_total = get_total_object_values(&objects->type_counts);
stats_table_addf(table, "");
stats_table_addf(table, "* %s", _("Reachable objects"));
- stats_table_count_addf(table, object_total, " * %s", _("Count"));
+ stats_table_count_addf(table, object_count_total, " * %s", _("Count"));
stats_table_count_addf(table, objects->type_counts.commits,
" * %s", _("Commits"));
stats_table_count_addf(table, objects->type_counts.trees,
@@ -330,6 +345,18 @@ static void stats_table_setup_structure(struct stats_table *table,
" * %s", _("Blobs"));
stats_table_count_addf(table, objects->type_counts.tags,
" * %s", _("Tags"));
+
+ inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+ stats_table_size_addf(table, inflated_object_total,
+ " * %s", _("Inflated size"));
+ stats_table_size_addf(table, objects->inflated_sizes.commits,
+ " * %s", _("Commits"));
+ stats_table_size_addf(table, objects->inflated_sizes.trees,
+ " * %s", _("Trees"));
+ stats_table_size_addf(table, objects->inflated_sizes.blobs,
+ " * %s", _("Blobs"));
+ stats_table_size_addf(table, objects->inflated_sizes.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
diff --git a/strbuf.c b/strbuf.c
index 662edd4d19..1e2d1f70a7 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -883,11 +883,14 @@ void humanise_bytes(off_t bytes, char **value, const char **unit,
*unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
*value = xstrfmt(_("%u"), (unsigned)bytes);
- *unit = humanise_rate ?
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("byte/s", "bytes/s", bytes) :
- /* TRANSLATORS: IEC 80000-13:2008 byte */
- Q_("byte", "bytes", bytes);
+ if (flags & HUMANISE_COMPACT)
+ *unit = humanise_rate ? _("B/s") : _("B");
+ else
+ *unit = humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+ Q_("byte/s", "bytes/s", bytes) :
+ /* TRANSLATORS: IEC 80000-13:2008 byte */
+ Q_("byte", "bytes", bytes);
}
}
diff --git a/strbuf.h b/strbuf.h
index 571bd889df..005c155808 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -372,6 +372,11 @@ enum humanise_flags {
* Use rate based unit prefixes for humanised values.
*/
HUMANISE_RATE = (1 << 0),
+ /*
+ * Use compact "B" unit prefixes instead of "byte/bytes" for humanised
+ * values.
+ */
+ HUMANISE_COMPACT = (1 << 1),
};
/**
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 33237822fd..b18213c660 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
| Repository structure | Value |
| -------------------- | ------ |
| * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
| | |
| * Reachable objects | |
- | * Count | 0 |
- | * Commits | 0 |
- | * Trees | 0 |
- | * Blobs | 0 |
- | * Tags | 0 |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
+ | * Inflated size | 0 B |
+ | * Commits | 0 B |
+ | * Trees | 0 B |
+ | * Blobs | 0 B |
+ | * Tags | 0 B |
EOF
git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ------ |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
- | | |
- | * Reachable objects | |
- | * Count | 3.02 k |
- | * Commits | 1.01 k |
- | * Trees | 1.01 k |
- | * Blobs | 1.01 k |
- | * Tags | 1 |
+ | Repository structure | Value |
+ | -------------------- | ---------- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 3.02 k |
+ | * Commits | 1.01 k |
+ | * Trees | 1.01 k |
+ | * Blobs | 1.01 k |
+ | * Tags | 1 |
+ | * Inflated size | 16.03 MiB |
+ | * Commits | 217.92 KiB |
+ | * Trees | 15.81 MiB |
+ | * Blobs | 11.68 KiB |
+ | * Tags | 132 B |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
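Editorial sketch of how the two flag bits combine for values below 1 KiB, following the hunk in strbuf.c above: HUMANISE_COMPACT picks the short "B" spelling, HUMANISE_RATE appends "/s", and without the compact flag the unit is pluralised. The function `small_value_unit()` is hypothetical, and the `bytes == 1` check is an English-only stand-in for the `Q_()` plural lookup used in the patch.

```c
enum humanise_flags {
	HUMANISE_RATE = (1 << 0),    /* append "/s" to the unit */
	HUMANISE_COMPACT = (1 << 1), /* "B" instead of "byte"/"bytes" */
};

/* Hypothetical helper: unit string for a value that is not scaled to KiB. */
static const char *small_value_unit(unsigned long bytes, unsigned flags)
{
	int rate = !!(flags & HUMANISE_RATE);

	if (flags & HUMANISE_COMPACT)
		return rate ? "B/s" : "B";
	/* English-only stand-in for Q_("byte", "bytes", bytes). */
	if (bytes == 1)
		return rate ? "byte/s" : "byte";
	return rate ? "bytes/s" : "bytes";
}
```

The compact spelling keeps the unit column of the table narrow, so a row such as "132 B" aligns with rows carrying "KiB" or "MiB".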
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v3 6/7] builtin/repo: add disk size info to keyvalue structure output
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (4 preceding siblings ...)
2025-12-15 20:56 ` [PATCH v3 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-15 20:56 ` Justin Tobler
2025-12-15 20:56 ` [PATCH v3 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 18 ++++++++++++++++++
t/t1901-repo-structure.sh | 11 ++++++++++-
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
* Reference counts categorized by type
* Reachable object counts categorized by type
* Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
+
The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index b73cfd975b..0ed41bf9d4 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
struct object_stats {
struct object_values type_counts;
struct object_values inflated_sizes;
+ struct object_values disk_sizes;
};
struct repo_structure {
@@ -462,6 +463,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
(uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+ printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+ printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+ printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+ printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
fflush(stdout);
}
@@ -536,13 +546,16 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
struct count_objects_data *data = cb_data;
struct object_stats *stats = data->stats;
size_t inflated_total = 0;
+ size_t disk_total = 0;
size_t object_count;
for (size_t i = 0; i < oids->nr; i++) {
struct object_info oi = OBJECT_INFO_INIT;
unsigned long inflated;
+ off_t disk;
oi.sizep = &inflated;
+ oi.disk_sizep = &disk;
if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
OBJECT_INFO_SKIP_FETCH_OBJECT |
@@ -550,24 +563,29 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
continue;
inflated_total += inflated;
+ disk_total += disk;
}
switch (type) {
case OBJ_TAG:
stats->type_counts.tags += oids->nr;
stats->inflated_sizes.tags += inflated_total;
+ stats->disk_sizes.tags += disk_total;
break;
case OBJ_COMMIT:
stats->type_counts.commits += oids->nr;
stats->inflated_sizes.commits += inflated_total;
+ stats->disk_sizes.commits += disk_total;
break;
case OBJ_TREE:
stats->type_counts.trees += oids->nr;
stats->inflated_sizes.trees += inflated_total;
+ stats->disk_sizes.trees += disk_total;
break;
case OBJ_BLOB:
stats->type_counts.blobs += oids->nr;
stats->inflated_sizes.blobs += inflated_total;
+ stats->disk_sizes.blobs += disk_total;
break;
default:
BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index b18213c660..dd17caad05 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,11 @@ test_description='test git repo structure'
. ./test-lib.sh
+object_type_disk_usage() {
+ git rev-list --all --objects --disk-usage --filter=object:type=$1 \
+ --filter-provided-objects
+}
+
test_expect_success 'empty repository' '
test_when_finished "rm -rf repo" &&
git init repo &&
@@ -91,7 +96,7 @@ test_expect_success SHA1 'keyvalue and nul format' '
test_commit_bulk 42 &&
git tag -a foo -m bar &&
- cat >expect <<-\EOF &&
+ cat >expect <<-EOF &&
references.branches.count=1
references.tags.count=1
references.remotes.count=0
@@ -104,6 +109,10 @@ test_expect_success SHA1 'keyvalue and nul format' '
objects.trees.inflated_size=28554
objects.blobs.inflated_size=453
objects.tags.inflated_size=132
+ objects.commits.disk_size=$(object_type_disk_usage commit)
+ objects.trees.disk_size=$(object_type_disk_usage tree)
+ objects.blobs.disk_size=$(object_type_disk_usage blob)
+ objects.tags.disk_size=$(object_type_disk_usage tag)
EOF
git repo structure --format=keyvalue >out 2>err &&
--
2.52.0.209.ge85ae279b0
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v3 7/7] builtin/repo: add object disk size info to structure table
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (5 preceding siblings ...)
2025-12-15 20:56 ` [PATCH v3 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
@ 2025-12-15 20:56 ` Justin Tobler
2025-12-16 8:25 ` Patrick Steinhardt
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
7 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-15 20:56 UTC (permalink / raw)
To: git; +Cc: ps, gitster, Justin Tobler
Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 13 +++++++++++++
t/t1901-repo-structure.sh | 26 +++++++++++++++++++++++---
2 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 0ed41bf9d4..a071d2fdfe 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -324,6 +324,7 @@ static void stats_table_setup_structure(struct stats_table *table,
struct ref_stats *refs = &stats->refs;
size_t inflated_object_total;
size_t object_count_total;
+ size_t disk_object_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -358,6 +359,18 @@ static void stats_table_setup_structure(struct stats_table *table,
" * %s", _("Blobs"));
stats_table_size_addf(table, objects->inflated_sizes.tags,
" * %s", _("Tags"));
+
+ disk_object_total = get_total_object_values(&objects->disk_sizes);
+ stats_table_size_addf(table, disk_object_total,
+ " * %s", _("Disk size"));
+ stats_table_size_addf(table, objects->disk_sizes.commits,
+ " * %s", _("Commits"));
+ stats_table_size_addf(table, objects->disk_sizes.trees,
+ " * %s", _("Trees"));
+ stats_table_size_addf(table, objects->disk_sizes.blobs,
+ " * %s", _("Blobs"));
+ stats_table_size_addf(table, objects->disk_sizes.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index dd17caad05..64db191234 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -5,8 +5,18 @@ test_description='test git repo structure'
. ./test-lib.sh
object_type_disk_usage() {
- git rev-list --all --objects --disk-usage --filter=object:type=$1 \
- --filter-provided-objects
+ disk_usage_opt="--disk-usage"
+
+ if [ "$2" = "true" ]; then
+ disk_usage_opt="--disk-usage=human"
+ fi
+
+ if [ "$1" = "all" ]; then
+ git rev-list --all --objects $disk_usage_opt
+ else
+ git rev-list --all --objects $disk_usage_opt \
+ --filter=object:type=$1 --filter-provided-objects
+ fi
}
test_expect_success 'empty repository' '
@@ -35,6 +45,11 @@ test_expect_success 'empty repository' '
| * Trees | 0 B |
| * Blobs | 0 B |
| * Tags | 0 B |
+ | * Disk size | 0 B |
+ | * Commits | 0 B |
+ | * Trees | 0 B |
+ | * Blobs | 0 B |
+ | * Tags | 0 B |
EOF
git repo structure >out 2>err &&
@@ -58,7 +73,7 @@ test_expect_success SHA1 'repository with references and objects' '
# Also creates a commit, tree, and blob.
git notes add -m foo &&
- cat >expect <<-\EOF &&
+ cat >expect <<-EOF &&
| Repository structure | Value |
| -------------------- | ---------- |
| * References | |
@@ -79,6 +94,11 @@ test_expect_success SHA1 'repository with references and objects' '
| * Trees | 15.81 MiB |
| * Blobs | 11.68 KiB |
| * Tags | 132 B |
+ | * Disk size | $(object_type_disk_usage all true) |
+ | * Commits | $(object_type_disk_usage commit true) |
+ | * Trees | $(object_type_disk_usage tree true) |
+ | * Blobs | $(object_type_disk_usage blob true) |
+ | * Tags | $(object_type_disk_usage tag) B |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
* Re: [PATCH v3 7/7] builtin/repo: add object disk size info to structure table
2025-12-15 20:56 ` [PATCH v3 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-16 8:25 ` Patrick Steinhardt
2025-12-16 14:48 ` Justin Tobler
0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-16 8:25 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster
On Mon, Dec 15, 2025 at 02:56:39PM -0600, Justin Tobler wrote:
> diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> index dd17caad05..64db191234 100755
> --- a/t/t1901-repo-structure.sh
> +++ b/t/t1901-repo-structure.sh
> @@ -5,8 +5,18 @@ test_description='test git repo structure'
> . ./test-lib.sh
>
> object_type_disk_usage() {
> - git rev-list --all --objects --disk-usage --filter=object:type=$1 \
> - --filter-provided-objects
> + disk_usage_opt="--disk-usage"
> +
> + if [ "$2" = "true" ]; then
> + disk_usage_opt="--disk-usage=human"
> + fi
> +
> + if [ "$1" = "all" ]; then
> + git rev-list --all --objects $disk_usage_opt
> + else
> + git rev-list --all --objects $disk_usage_opt \
> + --filter=object:type=$1 --filter-provided-objects
> + fi
> }
>
> test_expect_success 'empty repository' '
We don't use `if [ ... ]` in our codebase, and we typically have the
`then` on the next line:
if test "$2" = "true"
then
...
fi
if test "$1" = "all"
then
...
else
...
fi
> @@ -79,6 +94,11 @@ test_expect_success SHA1 'repository with references and objects' '
> | * Trees | 15.81 MiB |
> | * Blobs | 11.68 KiB |
> | * Tags | 132 B |
> + | * Disk size | $(object_type_disk_usage all true) |
> + | * Commits | $(object_type_disk_usage commit true) |
> + | * Trees | $(object_type_disk_usage tree true) |
> + | * Blobs | $(object_type_disk_usage blob true) |
> + | * Tags | $(object_type_disk_usage tag) B |
> EOF
Curious, but why is the last one special here?
Patrick
* Re: [PATCH v3 7/7] builtin/repo: add object disk size info to structure table
2025-12-16 8:25 ` Patrick Steinhardt
@ 2025-12-16 14:48 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 14:48 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster
On 25/12/16 09:25AM, Patrick Steinhardt wrote:
> On Mon, Dec 15, 2025 at 02:56:39PM -0600, Justin Tobler wrote:
> > diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
> > index dd17caad05..64db191234 100755
> > --- a/t/t1901-repo-structure.sh
> > +++ b/t/t1901-repo-structure.sh
> > @@ -5,8 +5,18 @@ test_description='test git repo structure'
> > . ./test-lib.sh
> >
> > object_type_disk_usage() {
> > - git rev-list --all --objects --disk-usage --filter=object:type=$1 \
> > - --filter-provided-objects
> > + disk_usage_opt="--disk-usage"
> > +
> > + if [ "$2" = "true" ]; then
> > + disk_usage_opt="--disk-usage=human"
> > + fi
> > +
> > + if [ "$1" = "all" ]; then
> > + git rev-list --all --objects $disk_usage_opt
> > + else
> > + git rev-list --all --objects $disk_usage_opt \
> > + --filter=object:type=$1 --filter-provided-objects
> > + fi
> > }
> >
> > test_expect_success 'empty repository' '
>
> We don't use `if [ ... ]` in our codebase, and we typically have the
> `then` on the next line:
>
> if test "$2" = "true"
> then
> ...
> fi
>
> if test "$1" = "all"
> then
> ...
> else
> ...
> fi
Noted, will fix.
> > @@ -79,6 +94,11 @@ test_expect_success SHA1 'repository with references and objects' '
> > | * Trees | 15.81 MiB |
> > | * Blobs | 11.68 KiB |
> > | * Tags | 132 B |
> > + | * Disk size | $(object_type_disk_usage all true) |
> > + | * Commits | $(object_type_disk_usage commit true) |
> > + | * Trees | $(object_type_disk_usage tree true) |
> > + | * Blobs | $(object_type_disk_usage blob true) |
> > + | * Tags | $(object_type_disk_usage tag) B |
> > EOF
>
> Curious, but why is the last one special here?
The `--disk-usage=human` rev-list option here outputs "byte/bytes"
instead of "B". In patch 5, the HUMANISE_COMPACT flag was added to
humanise_bytes() to toggle this behavior. For the git-repo(1) structure
table output, I wanted to always use the more compact unit prefix
representation.
I'll leave a comment here to explain this special case.
-Justin
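
For readers following along, the difference between the two unit styles
can be sketched in isolation. This is only an illustration and not the
code from patch 5: `SKETCH_COMPACT`, `sketch_byte_unit()` and
`sketch_format_small()` are hypothetical names, and the English-only
singular/plural check stands in for the Q_() translation lookup that
the real code uses.

```c
#include <stdio.h>

/* Hypothetical stand-in for the HUMANISE_COMPACT flag from patch 5. */
enum sketch_flags {
	SKETCH_COMPACT = (1 << 0),
};

/*
 * Pick the unit string for a value that is not downscaled.
 * Without the compact flag, the plural-aware "byte"/"bytes" form is
 * used, which is the form `--disk-usage=human` prints. With the flag,
 * the unit is always "B".
 * Assumption: plain English plural rule instead of Q_().
 */
static const char *sketch_byte_unit(unsigned long bytes, unsigned flags)
{
	if (flags & SKETCH_COMPACT)
		return "B";
	return bytes == 1 ? "byte" : "bytes";
}

/* Write "<value> <unit>" into buf; returns what snprintf() returns. */
static int sketch_format_small(char *buf, size_t len,
			       unsigned long bytes, unsigned flags)
{
	return snprintf(buf, len, "%lu %s", bytes,
			sketch_byte_unit(bytes, flags));
}
```

With this sketch, a 132-byte value formats as "132 bytes" by default
and as "132 B" with the compact flag, which is why the test appends
" B" to the raw rev-list number for the tags row.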
* [PATCH v4 0/7] builtin/repo: add object size info to structure output
2025-12-15 20:56 ` [PATCH v3 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (6 preceding siblings ...)
2025-12-15 20:56 ` [PATCH v3 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-16 17:38 ` Justin Tobler
2025-12-16 17:38 ` [PATCH v4 1/7] builtin/repo: group per-type object values into struct Justin Tobler
` (8 more replies)
7 siblings, 9 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
Greetings,
This patch series extends the recently introduced "structure" subcommand
for git-repo(1) to collect object size information. More specifically,
it shows total inflated and disk sizes of objects by object type. The
aim is to provide additional insight that may be useful to users regarding
the structure of a repository.
In addition to this change, this series also updates the table output
format to downscale larger output values along with the appropriate unit
prefix. This is done to make table output more human friendly. The
keyvalue and nul output formats are left the same since they are
intended more for machine parsing.
Changes in V4:
- Unmark "byte" string in "t/helper/test-simple-ipc.c" for translation
to avoid conflict with translated plural "byte/bytes" string.
- Remove some unnecessary translations and add comments to clarify some
of the added translations.
- Some small changes to the tests in patch 7.
Changes in V3:
- Address potential localization regression by making the downscaled
number format string also translatable. Also make the format string
for how the values and unit prefixes are displayed via
`strbuf_humanise_{bytes,rate}()` translatable to be more flexible.
- `strbuf_humanise_{bytes,count}_value()` has been renamed to
`humanise_{bytes,count}()` and updated to provide both the value and
unit prefix as separate strings.
- Unit prefix strings are no longer allocated and instead constant.
- The humanise flags are now defined in an enum.
- Instead of using `OBJECT_INFO_FOR_PREFETCH`,
`OBJECT_INFO_SKIP_FETCH_OBJECT` and `OBJECT_INFO_QUICK` are used
explicitly.
- Tests now use git-rev-list(1) to verify disk size info.
Changes in V2:
- Factor out and reuse existing logic from strbuf_humanise() to handle
downscaling values and determining the appropriate unit prefix
separately. This enables more control over how exactly the values are
written to the structure output table which is useful for alignment
reasons. I'm not sure about the interface used in patch 2. Feedback is
most welcome.
- In the previous version, when checking object size on a missing object
we would die. Instead we now ignore missing objects. This allows the
structure command to work on partial clones.
- disk/inflated keyvalue names renamed to disk_size/inflated_size.
- Unit prefixes are marked for translation.
- The tests for keyvalue disk size values are updated to check against
real expected values instead of skipping. Table output tests still
skip verifying human-readable values though.
Thanks,
-Justin
Justin Tobler (7):
builtin/repo: group per-type object values into struct
strbuf: split out logic to humanise byte values
builtin/repo: humanise count values in structure output
builtin/repo: add inflated object info to keyvalue structure output
builtin/repo: add inflated object info to structure table
builtin/repo: add disk size info to keyvalue stucture output
builtin/repo: add object disk size info to structure table
Documentation/git-repo.adoc | 2 +
builtin/repo.c | 175 ++++++++++++++++++++++++++++++------
strbuf.c | 102 ++++++++++++++-------
strbuf.h | 25 ++++++
t/helper/test-simple-ipc.c | 7 +-
t/t1901-repo-structure.sh | 118 ++++++++++++++++--------
6 files changed, 331 insertions(+), 98 deletions(-)
Range-diff against v3:
1: be14de68f6 = 1: be14de68f6 builtin/repo: group per-type object values into struct
2: 1fa33f5906 ! 2: 0a145cfeec strbuf: split out logic to humanise byte values
@@ Commit message
determine the corresponding unit prefix into a separate humanise_bytes()
function that provides seperate value and unit strings.
+ Note that the "byte" string in "t/helper/test-simple-ipc.c" is unmarked
+ for translation here so that it doesn't conflict with the newly defined
+ plural "byte/bytes" translation and instead uses it.
+
Signed-off-by: Justin Tobler <jltobler@gmail.com>
## strbuf.c ##
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("%u byte/s", "%u bytes/s", bytes),
- (unsigned)bytes);
-+ *value = xstrfmt(_("%u"), (unsigned)bytes);
++ *value = xstrfmt("%u", (unsigned)bytes);
+ *unit = humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+ Q_("byte/s", "bytes/s", bytes) :
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
+ const char *unit;
+
+ humanise_bytes(bytes, &value, &unit, flags);
++
++ /*
++ * TRANSLATORS: The first argument is the number string. The second
++ * argument is the unit prefix string (i.e. "12.34 MiB/s").
++ */
+ strbuf_addf(buf, _("%s %s"), value, unit);
+ free(value);
+}
@@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbu
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
+
+ ## t/helper/test-simple-ipc.c ##
+@@ t/helper/test-simple-ipc.c: int cmd__simple_ipc(int argc, const char **argv)
+ OPT_INTEGER(0, "bytecount", &cl_args.bytecount, N_("number of bytes")),
+ OPT_INTEGER(0, "batchsize", &cl_args.batchsize, N_("number of requests per thread")),
+
+- OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
++ /*
++ * The "byte" string here is not marked for translation and
++ * instead relies on translation in strbuf.c:humanise_bytes() to
++ * avoid conflict with the plural form.
++ */
++ OPT_STRING(0, "byte", &bytevalue, "byte", N_("ballast character")),
+ OPT_STRING(0, "token", &cl_args.token, N_("token"), N_("command token to send to the server")),
+
+ OPT_END()
3: 8f09f6358e ! 3: eebf0d917b builtin/repo: humanise count values in structure output
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
+ size_t x = count + 5000000; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
+ (unsigned)(x % 1000000000 / 10000000));
++ /* TRANSLATORS: SI decimal prefix symbol for 10^9 */
+ *unit = _("G");
+ } else if (count >= 1000000) {
+ size_t x = count + 5000; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
+ (unsigned)(x % 1000000 / 10000));
++ /* TRANSLATORS: SI decimal prefix symbol for 10^6 */
+ *unit = _("M");
+ } else if (count >= 1000) {
+ size_t x = count + 5; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
+ (unsigned)(x % 1000 / 10));
++ /* TRANSLATORS: SI decimal prefix symbol for 10^3 */
+ *unit = _("k");
+ } else {
-+ *value = xstrfmt(_("%u"), (unsigned)count);
++ *value = xstrfmt("%u", (unsigned)count);
+ *unit = NULL;
+ }
+}
4: 3f4eabe94f = 4: 37f71cc1bc builtin/repo: add inflated object info to keyvalue structure output
5: 85d1052100 ! 5: 40edf4c20b builtin/repo: add inflated object info to structure table
@@ strbuf.c
@@ strbuf.c: void humanise_bytes(off_t bytes, char **value, const char **unit,
*unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
- *value = xstrfmt(_("%u"), (unsigned)bytes);
+ *value = xstrfmt("%u", (unsigned)bytes);
- *unit = humanise_rate ?
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("byte/s", "bytes/s", bytes) :
- /* TRANSLATORS: IEC 80000-13:2008 byte */
- Q_("byte", "bytes", bytes);
+ if (flags & HUMANISE_COMPACT)
++ /* TRANSLATORS: IEC 80000-13:2008 byte/second and byte */
+ *unit = humanise_rate ? _("B/s") : _("B");
+ else
+ *unit = humanise_rate ?
6: e9fa9babec = 6: ba861f37c9 builtin/repo: add disk size info to keyvalue stucture output
7: df542c7bdf ! 7: 3118c17ae3 builtin/repo: add object disk size info to structure table
@@ t/t1901-repo-structure.sh: test_description='test git repo structure'
- --filter-provided-objects
+ disk_usage_opt="--disk-usage"
+
-+ if [ "$2" = "true" ]; then
++ if test "$2" = "true"
++ then
+ disk_usage_opt="--disk-usage=human"
+ fi
+
-+ if [ "$1" = "all" ]; then
++ if test "$1" = "all"
++ then
+ git rev-list --all --objects $disk_usage_opt
+ else
+ git rev-list --all --objects $disk_usage_opt \
@@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references
git notes add -m foo &&
- cat >expect <<-\EOF &&
++ # The tags disk size is handled specially due to the
++ # git-rev-list(1) --disk-usage=human option printing the full
++ # "byte/bytes" unit prefix instead of just "B".
+ cat >expect <<-EOF &&
| Repository structure | Value |
| -------------------- | ---------- |
base-commit: e85ae279b0d58edc2f4c3fd5ac391b51e1223985
--
2.52.0.209.ge85ae279b0
* [PATCH v4 1/7] builtin/repo: group per-type object values into struct
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
@ 2025-12-16 17:38 ` Justin Tobler
2025-12-16 17:38 ` [PATCH v4 2/7] strbuf: split out logic to humanise byte values Justin Tobler
` (7 subsequent siblings)
8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 17 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
size_t others;
};
-struct object_stats {
+struct object_values {
size_t tags;
size_t commits;
size_t trees;
size_t blobs;
};
+struct object_stats {
+ struct object_values type_counts;
+};
+
struct repo_structure {
struct ref_stats refs;
struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
return stats->branches + stats->remotes + stats->tags + stats->others;
}
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
{
- return stats->tags + stats->commits + stats->trees + stats->blobs;
+ return values->tags + values->commits + values->trees + values->blobs;
}
static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
- object_total = get_total_object_count(objects);
+ object_total = get_total_object_values(&objects->type_counts);
stats_table_addf(table, "");
stats_table_addf(table, "* %s", _("Reachable objects"));
stats_table_count_addf(table, object_total, " * %s", _("Count"));
- stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
- stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
- stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
- stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, objects->type_counts.commits,
+ " * %s", _("Commits"));
+ stats_table_count_addf(table, objects->type_counts.trees,
+ " * %s", _("Trees"));
+ stats_table_count_addf(table, objects->type_counts.blobs,
+ " * %s", _("Blobs"));
+ stats_table_count_addf(table, objects->type_counts.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
(uintmax_t)stats->refs.others, value_delim);
printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.commits, value_delim);
+ (uintmax_t)stats->objects.type_counts.commits, value_delim);
printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.trees, value_delim);
+ (uintmax_t)stats->objects.type_counts.trees, value_delim);
printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.blobs, value_delim);
+ (uintmax_t)stats->objects.type_counts.blobs, value_delim);
printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.tags, value_delim);
+ (uintmax_t)stats->objects.type_counts.tags, value_delim);
fflush(stdout);
}
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
switch (type) {
case OBJ_TAG:
- stats->tags += oids->nr;
+ stats->type_counts.tags += oids->nr;
break;
case OBJ_COMMIT:
- stats->commits += oids->nr;
+ stats->type_counts.commits += oids->nr;
break;
case OBJ_TREE:
- stats->trees += oids->nr;
+ stats->type_counts.trees += oids->nr;
break;
case OBJ_BLOB:
- stats->blobs += oids->nr;
+ stats->type_counts.blobs += oids->nr;
break;
default:
BUG("invalid object type");
}
- object_count = get_total_object_count(stats);
+ object_count = get_total_object_values(&stats->type_counts);
display_progress(data->progress, object_count);
return 0;
--
2.52.0.209.ge85ae279b0
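
The reuse this refactoring is aiming for can be sketched outside of the
Git tree as follows. It is a minimal illustration, assuming the
`inflated_sizes` and `disk_sizes` members that later patches in the
series refer to are further `struct object_values` instances; the exact
layout in builtin/repo.c may differ.

```c
#include <stddef.h>

/* One value per object type; the meaning depends on the owner. */
struct object_values {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/*
 * Assumed final shape: the same per-type struct is reused for
 * counts, inflated sizes and on-disk sizes.
 */
struct object_stats {
	struct object_values type_counts;
	struct object_values inflated_sizes;
	struct object_values disk_sizes;
};

/* Sum across all object types, whatever the values represent. */
static inline size_t get_total_object_values(const struct object_values *values)
{
	return values->tags + values->commits + values->trees + values->blobs;
}
```

Because the helper only takes a `struct object_values`, one function
serves all three totals, e.g.
`get_total_object_values(&stats.disk_sizes)`.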
* [PATCH v4 2/7] strbuf: split out logic to humanise byte values
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-16 17:38 ` [PATCH v4 1/7] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-16 17:38 ` Justin Tobler
2025-12-16 18:59 ` Junio C Hamano
2025-12-16 17:38 ` [PATCH v4 3/7] builtin/repo: humanise count values in structure output Justin Tobler
` (6 subsequent siblings)
8 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
use case, the downscaled values and unit prefixes must be handled
separately to ensure proper column alignment.
Split out logic from strbuf_humanise() to downscale byte values and
determine the corresponding unit prefix into a separate humanise_bytes()
function that provides separate value and unit strings.
Note that the "byte" string in "t/helper/test-simple-ipc.c" is unmarked
for translation here so that it doesn't conflict with the newly defined
plural "byte/bytes" translation and instead uses it.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
strbuf.c | 74 ++++++++++++++++++++------------------
strbuf.h | 14 ++++++++
t/helper/test-simple-ipc.c | 7 +++-
3 files changed, 60 insertions(+), 35 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 6c3851a7f8..3fbd375ad6 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,47 +836,53 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
- int humanise_rate)
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags)
{
+ int humanise_rate = flags & HUMANISE_RATE;
+
if (bytes > 1 << 30) {
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte */
- _("%u.%2.2u GiB") :
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
- _("%u.%2.2u GiB/s"),
- (unsigned)(bytes >> 30),
- (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(bytes >> 30),
+ (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+ /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
+ *unit = humanise_rate ? _("GiB/s") : _("GiB");
} else if (bytes > 1 << 20) {
- unsigned x = bytes + 5243; /* for rounding */
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 mebibyte */
- _("%u.%2.2u MiB") :
- /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
- _("%u.%2.2u MiB/s"),
- x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+ unsigned x = bytes + 5243; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), x >> 20,
+ ((x & ((1 << 20) - 1)) * 100) >> 20);
+ /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
+ *unit = humanise_rate ? _("MiB/s") : _("MiB");
} else if (bytes > 1 << 10) {
- unsigned x = bytes + 5; /* for rounding */
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 kibibyte */
- _("%u.%2.2u KiB") :
- /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
- _("%u.%2.2u KiB/s"),
- x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+ unsigned x = bytes + 5; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), x >> 10,
+ ((x & ((1 << 10) - 1)) * 100) >> 10);
+ /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
+ *unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 byte */
- Q_("%u byte", "%u bytes", bytes) :
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("%u byte/s", "%u bytes/s", bytes),
- (unsigned)bytes);
+ *value = xstrfmt("%u", (unsigned)bytes);
+ *unit = humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+ Q_("byte/s", "bytes/s", bytes) :
+ /* TRANSLATORS: IEC 80000-13:2008 byte */
+ Q_("byte", "bytes", bytes);
}
}
+static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
+{
+ char *value;
+ const char *unit;
+
+ humanise_bytes(bytes, &value, &unit, flags);
+
+ /*
+ * TRANSLATORS: The first argument is the number string. The second
+ * argument is the unit prefix string (i.e. "12.34 MiB/s").
+ */
+ strbuf_addf(buf, _("%s %s"), value, unit);
+ free(value);
+}
+
void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
{
strbuf_humanise(buf, bytes, 0);
@@ -884,7 +890,7 @@ void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
{
- strbuf_humanise(buf, bytes, 1);
+ strbuf_humanise(buf, bytes, HUMANISE_RATE);
}
int printf_ln(const char *fmt, ...)
diff --git a/strbuf.h b/strbuf.h
index a580ac6084..4426163e7e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,20 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
*/
void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
+enum humanise_flags {
+ /*
+ * Use rate based unit prefixes for humanised values.
+ */
+ HUMANISE_RATE = (1 << 0),
+};
+
+/**
+ * Converts the given byte size into a downscaled human-readable value and
+ * corresponding unit prefix as two separate strings.
+ */
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
index 03cc5eea2c..442ad6b16f 100644
--- a/t/helper/test-simple-ipc.c
+++ b/t/helper/test-simple-ipc.c
@@ -603,7 +603,12 @@ int cmd__simple_ipc(int argc, const char **argv)
OPT_INTEGER(0, "bytecount", &cl_args.bytecount, N_("number of bytes")),
OPT_INTEGER(0, "batchsize", &cl_args.batchsize, N_("number of requests per thread")),
- OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
+ /*
+ * The "byte" string here is not marked for translation and
+ * instead relies on translation in strbuf.c:humanise_bytes() to
+ * avoid conflict with the plural form.
+ */
+ OPT_STRING(0, "byte", &bytevalue, "byte", N_("ballast character")),
OPT_STRING(0, "token", &cl_args.token, N_("token"), N_("command token to send to the server")),
OPT_END()
--
2.52.0.209.ge85ae279b0
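
To make the value/unit split easier to try outside of Git, here is a
standalone sketch of the same downscaling arithmetic. It is not the
patched function: `sketch_humanise_bytes()` and `SKETCH_HUMANISE_RATE`
are hypothetical names, the value is written into a caller-provided
buffer instead of being allocated with xstrfmt(), and the translation
macros are replaced by plain English strings.

```c
#include <stdio.h>

/* Hypothetical stand-in for HUMANISE_RATE. */
enum sketch_humanise_flags {
	SKETCH_HUMANISE_RATE = (1 << 0),
};

/*
 * Downscale `bytes` and return the number and the unit separately so
 * that a caller can align them in different table columns.
 * Assumption: untranslated units and an English-only plural rule.
 */
static void sketch_humanise_bytes(unsigned long long bytes,
				  char *value, size_t value_len,
				  const char **unit, unsigned flags)
{
	int rate = flags & SKETCH_HUMANISE_RATE;

	if (bytes > 1ULL << 30) {
		/* Fraction is truncated, not rounded, as in the patch. */
		snprintf(value, value_len, "%u.%2.2u",
			 (unsigned)(bytes >> 30),
			 (unsigned)(bytes & ((1ULL << 30) - 1)) / 10737419);
		*unit = rate ? "GiB/s" : "GiB";
	} else if (bytes > 1ULL << 20) {
		unsigned long long x = bytes + 5243; /* for rounding */
		snprintf(value, value_len, "%u.%2.2u",
			 (unsigned)(x >> 20),
			 (unsigned)(((x & ((1ULL << 20) - 1)) * 100) >> 20));
		*unit = rate ? "MiB/s" : "MiB";
	} else if (bytes > 1ULL << 10) {
		unsigned long long x = bytes + 5; /* for rounding */
		snprintf(value, value_len, "%u.%2.2u",
			 (unsigned)(x >> 10),
			 (unsigned)(((x & ((1ULL << 10) - 1)) * 100) >> 10));
		*unit = rate ? "KiB/s" : "KiB";
	} else {
		snprintf(value, value_len, "%u", (unsigned)bytes);
		if (rate)
			*unit = bytes == 1 ? "byte/s" : "bytes/s";
		else
			*unit = bytes == 1 ? "byte" : "bytes";
	}
}
```

A caller would then join the two parts itself, for instance with
`printf("%s %s\n", value, unit)`, or pad them independently as the
table output in the later patches does.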
* Re: [PATCH v4 2/7] strbuf: split out logic to humanise byte values
2025-12-16 17:38 ` [PATCH v4 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-16 18:59 ` Junio C Hamano
2025-12-16 19:39 ` Justin Tobler
0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2025-12-16 18:59 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, worldhello.net
Justin Tobler <jltobler@gmail.com> writes:
> +static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
> +{
> + char *value;
> + const char *unit;
> +
> + humanise_bytes(bytes, &value, &unit, flags);
> +
> + /*
> + * TRANSLATORS: The first argument is the number string. The second
> + * argument is the unit prefix string (i.e. "12.34 MiB/s").
> + */
> + strbuf_addf(buf, _("%s %s"), value, unit);
"unit prefix string"? Prefix is something that comes before
something else, but this one is at the end. Simply saying a "unit
string" would probably be a sufficient fix, perhaps?
I read the changes since the last round, and other than this part,
everything looked good.
Thanks.
* Re: [PATCH v4 2/7] strbuf: split out logic to humanise byte values
2025-12-16 18:59 ` Junio C Hamano
@ 2025-12-16 19:39 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 19:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps, worldhello.net
On 25/12/17 03:59AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> > +static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
> > +{
> > + char *value;
> > + const char *unit;
> > +
> > + humanise_bytes(bytes, &value, &unit, flags);
> > +
> > + /*
> > + * TRANSLATORS: The first argument is the number string. The second
> > + * argument is the unit prefix string (i.e. "12.34 MiB/s").
> > + */
> > + strbuf_addf(buf, _("%s %s"), value, unit);
>
> "unit prefix string"? Prefix is something that comes before
> something else, but this one is at the end. Simply saying a "unit
> string" would probably be a sufficient fix, perhaps?
Ya my bad, the prefix part would be just the Ki, Mi, etc. In this case
it is the whole unit string. Saying "unit string" would be correct. I
can send another version fixing this if you would like.
Thanks for the review,
-Justin
* [PATCH v4 3/7] builtin/repo: humanise count values in structure output
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
2025-12-16 17:38 ` [PATCH v4 1/7] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-16 17:38 ` [PATCH v4 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-16 17:38 ` Justin Tobler
2025-12-16 17:38 ` [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
` (5 subsequent siblings)
8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.
For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 38 +++++++++++++++++-------
strbuf.c | 26 ++++++++++++++++
strbuf.h | 6 ++++
t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
4 files changed, 91 insertions(+), 41 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..9c61bc3e17 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
int name_col_width;
int value_col_width;
+ int unit_col_width;
};
/*
@@ -230,6 +231,7 @@ struct stats_table {
*/
struct stats_table_entry {
char *value;
+ const char *unit;
};
static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
if (name_width > table->name_col_width)
table->name_col_width = name_width;
- if (entry) {
+ if (!entry)
+ return;
+ if (entry->value) {
int value_width = utf8_strwidth(entry->value);
if (value_width > table->value_col_width)
table->value_col_width = value_width;
}
+ if (entry->unit) {
+ int unit_width = utf8_strwidth(entry->unit);
+ if (unit_width > table->unit_col_width)
+ table->unit_col_width = unit_width;
+ }
}
static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -273,7 +282,7 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_list ap;
CALLOC_ARRAY(entry, 1);
- entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+ humanise_count(value, &entry->value, &entry->unit);
va_start(ap, format);
stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +333,24 @@ static void stats_table_print_structure(const struct stats_table *table)
{
const char *name_col_title = _("Repository structure");
const char *value_col_title = _("Value");
- int name_col_width = utf8_strwidth(name_col_title);
- int value_col_width = utf8_strwidth(value_col_title);
+ int title_name_width = utf8_strwidth(name_col_title);
+ int title_value_width = utf8_strwidth(value_col_title);
+ int name_col_width = table->name_col_width;
+ int value_col_width = table->value_col_width;
+ int unit_col_width = table->unit_col_width;
struct string_list_item *item;
struct strbuf buf = STRBUF_INIT;
- if (table->name_col_width > name_col_width)
- name_col_width = table->name_col_width;
- if (table->value_col_width > value_col_width)
- value_col_width = table->value_col_width;
+ if (title_name_width > name_col_width)
+ name_col_width = title_name_width;
+ if (title_value_width > value_col_width + unit_col_width + 1)
+ value_col_width = title_value_width - unit_col_width;
strbuf_addstr(&buf, "| ");
strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
strbuf_addstr(&buf, " | ");
- strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+ strbuf_utf8_align(&buf, ALIGN_LEFT,
+ value_col_width + unit_col_width + 1, value_col_title);
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
@@ -345,17 +358,20 @@ static void stats_table_print_structure(const struct stats_table *table)
for (int i = 0; i < name_col_width; i++)
putchar('-');
printf(" | ");
- for (int i = 0; i < value_col_width; i++)
+ for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
putchar('-');
printf(" |\n");
for_each_string_list_item(item, &table->rows) {
struct stats_table_entry *entry = item->util;
const char *value = "";
+ const char *unit = "";
if (entry) {
struct stats_table_entry *entry = item->util;
value = entry->value;
+ if (entry->unit)
+ unit = entry->unit;
}
strbuf_reset(&buf);
@@ -363,6 +379,8 @@ static void stats_table_print_structure(const struct stats_table *table)
strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
strbuf_addstr(&buf, " | ");
strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+ strbuf_addch(&buf, ' ');
+ strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
}
diff --git a/strbuf.c b/strbuf.c
index 3fbd375ad6..9beebad5b9 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,6 +836,32 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
+void humanise_count(size_t count, char **value, const char **unit)
+{
+ if (count >= 1000000000) {
+ size_t x = count + 5000000; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
+ (unsigned)(x % 1000000000 / 10000000));
+ /* TRANSLATORS: SI decimal prefix symbol for 10^9 */
+ *unit = _("G");
+ } else if (count >= 1000000) {
+ size_t x = count + 5000; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
+ (unsigned)(x % 1000000 / 10000));
+ /* TRANSLATORS: SI decimal prefix symbol for 10^6 */
+ *unit = _("M");
+ } else if (count >= 1000) {
+ size_t x = count + 5; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
+ (unsigned)(x % 1000 / 10));
+ /* TRANSLATORS: SI decimal prefix symbol for 10^3 */
+ *unit = _("k");
+ } else {
+ *value = xstrfmt("%u", (unsigned)count);
+ *unit = NULL;
+ }
+}
+
void humanise_bytes(off_t bytes, char **value, const char **unit,
unsigned flags)
{
diff --git a/strbuf.h b/strbuf.h
index 4426163e7e..571bd889df 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -381,6 +381,12 @@ enum humanise_flags {
void humanise_bytes(off_t bytes, char **value, const char **unit,
unsigned flags);
+/**
+ * Converts the given count into a downscaled human-readable value and
+ * corresponding unit prefix as two separate strings.
+ */
+void humanise_count(size_t count, char **value, const char **unit);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
(
cd repo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ----- |
- | * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
- | | |
- | * Reachable objects | |
- | * Count | 0 |
- | * Commits | 0 |
- | * Trees | 0 |
- | * Blobs | 0 |
- | * Tags | 0 |
+ | Repository structure | Value |
+ | -------------------- | ------ |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
git init repo &&
(
cd repo &&
- test_commit_bulk 42 &&
+ test_commit_bulk 1005 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ----- |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
- | | |
- | * Reachable objects | |
- | * Count | 130 |
- | * Commits | 43 |
- | * Trees | 43 |
- | * Blobs | 43 |
- | * Tags | 1 |
+ | Repository structure | Value |
+ | -------------------- | ------ |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 3.02 k |
+ | * Commits | 1.01 k |
+ | * Trees | 1.01 k |
+ | * Blobs | 1.01 k |
+ | * Tags | 1 |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue structure output
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (2 preceding siblings ...)
2025-12-16 17:38 ` [PATCH v4 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-16 17:38 ` Justin Tobler
2025-12-17 7:03 ` Patrick Steinhardt
2025-12-16 17:38 ` [PATCH v4 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
` (4 subsequent siblings)
8 siblings, 1 reply; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.
For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.
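
As a rough illustration of the two machine-readable formats, the sketch
below formats a single record with a configurable key delimiter and
value delimiter, mirroring the key_delim/value_delim parameters used by
structure_keyvalue_print(). The helper name sketch_format_kv() is
hypothetical and not part of git, and the delimiter pairs mentioned in
the comment ('=' and newline for keyvalue, newline and NUL for nul) are
an assumption based on the git-repo(1) documentation rather than
something stated in this patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: write
 * "<key><key_delim><value><value_delim>" into buf and return the number
 * of bytes produced, not counting the terminating NUL that snprintf()
 * appends.
 *
 * Assumed delimiter pairs:
 *   keyvalue format: key_delim = '=',  value_delim = '\n'
 *   nul format:      key_delim = '\n', value_delim = '\0'
 */
static int sketch_format_kv(char *buf, size_t len, const char *key,
			    uintmax_t value, char key_delim, char value_delim)
{
	assert(buf && key && len > 0);
	return snprintf(buf, len, "%s%c%" PRIuMAX "%c", key, key_delim,
			value, value_delim);
}
```

Called with the key "objects.tags.inflated_size", the value 132, and the
delimiters '=' and '\n', this produces the line
"objects.tags.inflated_size=132", which is the shape the updated test
expectation uses.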
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 33 +++++++++++++++++++++++++++++++++
t/t1901-repo-structure.sh | 6 +++++-
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
+
* Reference counts categorized by type
* Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
+
The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 9c61bc3e17..e207108346 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
#include "builtin.h"
#include "environment.h"
+#include "hex.h"
+#include "odb.h"
#include "parse-options.h"
#include "path-walk.h"
#include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
struct object_stats {
struct object_values type_counts;
+ struct object_values inflated_sizes;
};
struct repo_structure {
@@ -423,6 +426,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
(uintmax_t)stats->objects.type_counts.tags, value_delim);
+ printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+ printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+ printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+ printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
fflush(stdout);
}
@@ -486,6 +498,7 @@ static void structure_count_references(struct ref_stats *stats,
}
struct count_objects_data {
+ struct object_database *odb;
struct object_stats *stats;
struct progress *progress;
};
@@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
{
struct count_objects_data *data = cb_data;
struct object_stats *stats = data->stats;
+ size_t inflated_total = 0;
size_t object_count;
+ for (size_t i = 0; i < oids->nr; i++) {
+ struct object_info oi = OBJECT_INFO_INIT;
+ unsigned long inflated;
+
+ oi.sizep = &inflated;
+
+ if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+ OBJECT_INFO_SKIP_FETCH_OBJECT |
+ OBJECT_INFO_QUICK) < 0)
+ continue;
+
+ inflated_total += inflated;
+ }
+
switch (type) {
case OBJ_TAG:
stats->type_counts.tags += oids->nr;
+ stats->inflated_sizes.tags += inflated_total;
break;
case OBJ_COMMIT:
stats->type_counts.commits += oids->nr;
+ stats->inflated_sizes.commits += inflated_total;
break;
case OBJ_TREE:
stats->type_counts.trees += oids->nr;
+ stats->inflated_sizes.trees += inflated_total;
break;
case OBJ_BLOB:
stats->type_counts.blobs += oids->nr;
+ stats->inflated_sizes.blobs += inflated_total;
break;
default:
BUG("invalid object type");
@@ -526,6 +558,7 @@ static void structure_count_objects(struct object_stats *stats,
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
struct count_objects_data data = {
+ .odb = repo->objects,
.stats = stats,
};
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..33237822fd 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
)
'
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
objects.trees.count=42
objects.blobs.count=42
objects.tags.count=1
+ objects.commits.inflated_size=9225
+ objects.trees.inflated_size=28554
+ objects.blobs.inflated_size=453
+ objects.tags.inflated_size=132
EOF
git repo structure --format=keyvalue >out 2>err &&
--
2.52.0.209.ge85ae279b0
^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue structure output
2025-12-16 17:38 ` [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-17 7:03 ` Patrick Steinhardt
2025-12-17 16:10 ` Justin Tobler
0 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-17 7:03 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster, worldhello.net
On Tue, Dec 16, 2025 at 11:38:39AM -0600, Justin Tobler wrote:
> diff --git a/builtin/repo.c b/builtin/repo.c
> index 9c61bc3e17..e207108346 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
> {
> struct count_objects_data *data = cb_data;
> struct object_stats *stats = data->stats;
> + size_t inflated_total = 0;
> size_t object_count;
>
> + for (size_t i = 0; i < oids->nr; i++) {
> + struct object_info oi = OBJECT_INFO_INIT;
> + unsigned long inflated;
> +
> + oi.sizep = &inflated;
> +
> + if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
> + OBJECT_INFO_SKIP_FETCH_OBJECT |
> + OBJECT_INFO_QUICK) < 0)
Tiny nit: there seems to be an extra tab here. This really is only worth
fixing if you intend to reroll anyway.
Patrick
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue structure output
2025-12-17 7:03 ` Patrick Steinhardt
@ 2025-12-17 16:10 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 16:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster, worldhello.net
On 25/12/17 08:03AM, Patrick Steinhardt wrote:
> On Tue, Dec 16, 2025 at 11:38:39AM -0600, Justin Tobler wrote:
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index 9c61bc3e17..e207108346 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
> > {
> > struct count_objects_data *data = cb_data;
> > struct object_stats *stats = data->stats;
> > + size_t inflated_total = 0;
> > size_t object_count;
> >
> > + for (size_t i = 0; i < oids->nr; i++) {
> > + struct object_info oi = OBJECT_INFO_INIT;
> > + unsigned long inflated;
> > +
> > + oi.sizep = &inflated;
> > +
> > + if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
> > + OBJECT_INFO_SKIP_FETCH_OBJECT |
> > + OBJECT_INFO_QUICK) < 0)
>
> Tiny nit: there seems to be an extra tab here. This really is only worth
> fixing if you intend to reroll anyway.
I had that initially, but it was failing the check_style CI job so I
just opted for what clang-format wanted. I can change it though if I send
another version. I haven't quite figured out the best way to wrap long
lines.
-Justin
^ permalink raw reply [flat|nested] 80+ messages in thread
* [PATCH v4 5/7] builtin/repo: add inflated object info to structure table
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (3 preceding siblings ...)
2025-12-16 17:38 ` [PATCH v4 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-16 17:38 ` Justin Tobler
2025-12-16 17:38 ` [PATCH v4 6/7] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
` (3 subsequent siblings)
8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.
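
To make the scaling concrete, here is a minimal standalone sketch of
splitting a byte count into a value string and a unit string, with a
flag selecting the compact "B" unit for small values. It is not the
humanise_bytes() implementation from strbuf.c: the name
sketch_humanise_bytes(), its signature, the use of snprintf(), and the
exact rounding are assumptions made for illustration, and translation
markers are left out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not git's humanise_bytes(): store the scaled
 * value in `value` and point `*unit` at a static unit string. Values
 * are rounded to two decimals with integer arithmetic; the rounding
 * differs in detail from git's. The `bytes * 100` step can overflow
 * for extremely large inputs, which is ignored in this sketch.
 */
static void sketch_humanise_bytes(uint64_t bytes, int compact,
				  char *value, size_t len, const char **unit)
{
	static const char *const units[] = { "KiB", "MiB", "GiB" };
	uint64_t div = 1024;
	uint64_t hundredths;
	int idx = 0;

	assert(value && unit && len > 0);

	if (bytes < 1024) {
		snprintf(value, len, "%u", (unsigned)bytes);
		if (compact)
			*unit = "B";
		else
			*unit = bytes == 1 ? "byte" : "bytes";
		return;
	}

	/* Pick the largest unit that keeps the scaled value >= 1. */
	while (idx < 2 && bytes >= div * 1024) {
		div *= 1024;
		idx++;
	}

	/* Scaled value in hundredths, rounded to nearest. */
	hundredths = (bytes * 100 + div / 2) / div;
	snprintf(value, len, "%" PRIu64 ".%02" PRIu64,
		 hundredths / 100, hundredths % 100);
	*unit = units[idx];
}
```

Keeping the value and unit as separate strings is what lets the table
right-align the numbers and left-align the units in their own column.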
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 33 +++++++++++++++++++--
strbuf.c | 14 +++++----
strbuf.h | 5 ++++
t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
4 files changed, 80 insertions(+), 34 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index e207108346..b73cfd975b 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -292,6 +292,20 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_end(ap);
}
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+ humanise_bytes(value, &entry->value, &entry->unit, HUMANISE_COMPACT);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
+ va_end(ap);
+}
+
static inline size_t get_total_reference_count(struct ref_stats *stats)
{
return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -307,7 +321,8 @@ static void stats_table_setup_structure(struct stats_table *table,
{
struct object_stats *objects = &stats->objects;
struct ref_stats *refs = &stats->refs;
- size_t object_total;
+ size_t inflated_object_total;
+ size_t object_count_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -318,10 +333,10 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
- object_total = get_total_object_values(&objects->type_counts);
+ object_count_total = get_total_object_values(&objects->type_counts);
stats_table_addf(table, "");
stats_table_addf(table, "* %s", _("Reachable objects"));
- stats_table_count_addf(table, object_total, " * %s", _("Count"));
+ stats_table_count_addf(table, object_count_total, " * %s", _("Count"));
stats_table_count_addf(table, objects->type_counts.commits,
" * %s", _("Commits"));
stats_table_count_addf(table, objects->type_counts.trees,
@@ -330,6 +345,18 @@ static void stats_table_setup_structure(struct stats_table *table,
" * %s", _("Blobs"));
stats_table_count_addf(table, objects->type_counts.tags,
" * %s", _("Tags"));
+
+ inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+ stats_table_size_addf(table, inflated_object_total,
+ " * %s", _("Inflated size"));
+ stats_table_size_addf(table, objects->inflated_sizes.commits,
+ " * %s", _("Commits"));
+ stats_table_size_addf(table, objects->inflated_sizes.trees,
+ " * %s", _("Trees"));
+ stats_table_size_addf(table, objects->inflated_sizes.blobs,
+ " * %s", _("Blobs"));
+ stats_table_size_addf(table, objects->inflated_sizes.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
diff --git a/strbuf.c b/strbuf.c
index 9beebad5b9..512c7ba680 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -886,11 +886,15 @@ void humanise_bytes(off_t bytes, char **value, const char **unit,
*unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
*value = xstrfmt("%u", (unsigned)bytes);
- *unit = humanise_rate ?
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("byte/s", "bytes/s", bytes) :
- /* TRANSLATORS: IEC 80000-13:2008 byte */
- Q_("byte", "bytes", bytes);
+ if (flags & HUMANISE_COMPACT)
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second and byte */
+ *unit = humanise_rate ? _("B/s") : _("B");
+ else
+ *unit = humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+ Q_("byte/s", "bytes/s", bytes) :
+ /* TRANSLATORS: IEC 80000-13:2008 byte */
+ Q_("byte", "bytes", bytes);
}
}
diff --git a/strbuf.h b/strbuf.h
index 571bd889df..005c155808 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -372,6 +372,11 @@ enum humanise_flags {
* Use rate based unit prefixes for humanised values.
*/
HUMANISE_RATE = (1 << 0),
+ /*
+ * Use compact "B" unit prefixes instead of "byte/bytes" for humanised
+ * values.
+ */
+ HUMANISE_COMPACT = (1 << 1),
};
/**
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 33237822fd..b18213c660 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
| Repository structure | Value |
| -------------------- | ------ |
| * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
| | |
| * Reachable objects | |
- | * Count | 0 |
- | * Commits | 0 |
- | * Trees | 0 |
- | * Blobs | 0 |
- | * Tags | 0 |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
+ | * Inflated size | 0 B |
+ | * Commits | 0 B |
+ | * Trees | 0 B |
+ | * Blobs | 0 B |
+ | * Tags | 0 B |
EOF
git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ------ |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
- | | |
- | * Reachable objects | |
- | * Count | 3.02 k |
- | * Commits | 1.01 k |
- | * Trees | 1.01 k |
- | * Blobs | 1.01 k |
- | * Tags | 1 |
+ | Repository structure | Value |
+ | -------------------- | ---------- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 3.02 k |
+ | * Commits | 1.01 k |
+ | * Trees | 1.01 k |
+ | * Blobs | 1.01 k |
+ | * Tags | 1 |
+ | * Inflated size | 16.03 MiB |
+ | * Commits | 217.92 KiB |
+ | * Trees | 15.81 MiB |
+ | * Blobs | 11.68 KiB |
+ | * Tags | 132 B |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v4 6/7] builtin/repo: add disk size info to keyvalue stucture output
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (4 preceding siblings ...)
2025-12-16 17:38 ` [PATCH v4 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-16 17:38 ` Justin Tobler
2025-12-16 17:38 ` [PATCH v4 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
` (2 subsequent siblings)
8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 18 ++++++++++++++++++
t/t1901-repo-structure.sh | 11 ++++++++++-
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
* Reference counts categorized by type
* Reachable object counts categorized by type
* Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
+
The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index b73cfd975b..0ed41bf9d4 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
struct object_stats {
struct object_values type_counts;
struct object_values inflated_sizes;
+ struct object_values disk_sizes;
};
struct repo_structure {
@@ -462,6 +463,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
(uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+ printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+ printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+ printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+ printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
fflush(stdout);
}
@@ -536,13 +546,16 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
struct count_objects_data *data = cb_data;
struct object_stats *stats = data->stats;
size_t inflated_total = 0;
+ size_t disk_total = 0;
size_t object_count;
for (size_t i = 0; i < oids->nr; i++) {
struct object_info oi = OBJECT_INFO_INIT;
unsigned long inflated;
+ off_t disk;
oi.sizep = &inflated;
+ oi.disk_sizep = &disk;
if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
OBJECT_INFO_SKIP_FETCH_OBJECT |
@@ -550,24 +563,29 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
continue;
inflated_total += inflated;
+ disk_total += disk;
}
switch (type) {
case OBJ_TAG:
stats->type_counts.tags += oids->nr;
stats->inflated_sizes.tags += inflated_total;
+ stats->disk_sizes.tags += disk_total;
break;
case OBJ_COMMIT:
stats->type_counts.commits += oids->nr;
stats->inflated_sizes.commits += inflated_total;
+ stats->disk_sizes.commits += disk_total;
break;
case OBJ_TREE:
stats->type_counts.trees += oids->nr;
stats->inflated_sizes.trees += inflated_total;
+ stats->disk_sizes.trees += disk_total;
break;
case OBJ_BLOB:
stats->type_counts.blobs += oids->nr;
stats->inflated_sizes.blobs += inflated_total;
+ stats->disk_sizes.blobs += disk_total;
break;
default:
BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index b18213c660..dd17caad05 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,11 @@ test_description='test git repo structure'
. ./test-lib.sh
+object_type_disk_usage() {
+ git rev-list --all --objects --disk-usage --filter=object:type=$1 \
+ --filter-provided-objects
+}
+
test_expect_success 'empty repository' '
test_when_finished "rm -rf repo" &&
git init repo &&
@@ -91,7 +96,7 @@ test_expect_success SHA1 'keyvalue and nul format' '
test_commit_bulk 42 &&
git tag -a foo -m bar &&
- cat >expect <<-\EOF &&
+ cat >expect <<-EOF &&
references.branches.count=1
references.tags.count=1
references.remotes.count=0
@@ -104,6 +109,10 @@ test_expect_success SHA1 'keyvalue and nul format' '
objects.trees.inflated_size=28554
objects.blobs.inflated_size=453
objects.tags.inflated_size=132
+ objects.commits.disk_size=$(object_type_disk_usage commit)
+ objects.trees.disk_size=$(object_type_disk_usage tree)
+ objects.blobs.disk_size=$(object_type_disk_usage blob)
+ objects.tags.disk_size=$(object_type_disk_usage tag)
EOF
git repo structure --format=keyvalue >out 2>err &&
--
2.52.0.209.ge85ae279b0
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v4 7/7] builtin/repo: add object disk size info to structure table
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (5 preceding siblings ...)
2025-12-16 17:38 ` [PATCH v4 6/7] builtin/repo: add disk size info to keyvalue stucture output Justin Tobler
@ 2025-12-16 17:38 ` Justin Tobler
2025-12-17 7:03 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
8 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-16 17:38 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 13 +++++++++++++
t/t1901-repo-structure.sh | 31 ++++++++++++++++++++++++++++---
2 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 0ed41bf9d4..a071d2fdfe 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -324,6 +324,7 @@ static void stats_table_setup_structure(struct stats_table *table,
struct ref_stats *refs = &stats->refs;
size_t inflated_object_total;
size_t object_count_total;
+ size_t disk_object_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -358,6 +359,18 @@ static void stats_table_setup_structure(struct stats_table *table,
" * %s", _("Blobs"));
stats_table_size_addf(table, objects->inflated_sizes.tags,
" * %s", _("Tags"));
+
+ disk_object_total = get_total_object_values(&objects->disk_sizes);
+ stats_table_size_addf(table, disk_object_total,
+ " * %s", _("Disk size"));
+ stats_table_size_addf(table, objects->disk_sizes.commits,
+ " * %s", _("Commits"));
+ stats_table_size_addf(table, objects->disk_sizes.trees,
+ " * %s", _("Trees"));
+ stats_table_size_addf(table, objects->disk_sizes.blobs,
+ " * %s", _("Blobs"));
+ stats_table_size_addf(table, objects->disk_sizes.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index dd17caad05..1b68525079 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -5,8 +5,20 @@ test_description='test git repo structure'
. ./test-lib.sh
object_type_disk_usage() {
- git rev-list --all --objects --disk-usage --filter=object:type=$1 \
- --filter-provided-objects
+ disk_usage_opt="--disk-usage"
+
+ if test "$2" = "true"
+ then
+ disk_usage_opt="--disk-usage=human"
+ fi
+
+ if test "$1" = "all"
+ then
+ git rev-list --all --objects $disk_usage_opt
+ else
+ git rev-list --all --objects $disk_usage_opt \
+ --filter=object:type=$1 --filter-provided-objects
+ fi
}
test_expect_success 'empty repository' '
@@ -35,6 +47,11 @@ test_expect_success 'empty repository' '
| * Trees | 0 B |
| * Blobs | 0 B |
| * Tags | 0 B |
+ | * Disk size | 0 B |
+ | * Commits | 0 B |
+ | * Trees | 0 B |
+ | * Blobs | 0 B |
+ | * Tags | 0 B |
EOF
git repo structure >out 2>err &&
@@ -58,7 +75,10 @@ test_expect_success SHA1 'repository with references and objects' '
# Also creates a commit, tree, and blob.
git notes add -m foo &&
- cat >expect <<-\EOF &&
+ # The tags disk size is handled specially due to the
+ # git-rev-list(1) --disk-usage=human option printing the full
+ # "byte/bytes" unit prefix instead of just "B".
+ cat >expect <<-EOF &&
| Repository structure | Value |
| -------------------- | ---------- |
| * References | |
@@ -79,6 +99,11 @@ test_expect_success SHA1 'repository with references and objects' '
| * Trees | 15.81 MiB |
| * Blobs | 11.68 KiB |
| * Tags | 132 B |
+ | * Disk size | $(object_type_disk_usage all true) |
+ | * Commits | $(object_type_disk_usage commit true) |
+ | * Trees | $(object_type_disk_usage tree true) |
+ | * Blobs | $(object_type_disk_usage blob true) |
+ | * Tags | $(object_type_disk_usage tag) B |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [PATCH v4 0/7] builtin/repo: add object size info to structure output
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (6 preceding siblings ...)
2025-12-16 17:38 ` [PATCH v4 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-17 7:03 ` Patrick Steinhardt
2025-12-17 17:49 ` Justin Tobler
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
8 siblings, 1 reply; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-17 7:03 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster, worldhello.net
On Tue, Dec 16, 2025 at 11:38:35AM -0600, Justin Tobler wrote:
> Changes in V4:
> - Unmark "byte" string in "t/helper/test-simple-ipc.c" for translation
> to avoid conflict with translated plural "byte/bytes" string.
> - Remove some unnecessary translations and add comments to clarify some
> of the added translations.
> - Some small changes to the tests in patch 7.
I had a last tiny nit that doesn't warrant a reroll on its own. Other
than that this series looks great to me now. Thanks!
Patrick
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [PATCH v4 0/7] builtin/repo: add object size info to structure output
2025-12-17 7:03 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
@ 2025-12-17 17:49 ` Justin Tobler
0 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:49 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster, worldhello.net
On 25/12/17 08:03AM, Patrick Steinhardt wrote:
> On Tue, Dec 16, 2025 at 11:38:35AM -0600, Justin Tobler wrote:
> > Changes in V4:
> > - Unmark "byte" string in "t/helper/test-simple-ipc.c" for translation
> > to avoid conflict with translated plural "byte/bytes" string.
> > - Remove some unnecessary translations and add comments to clarify some
> > of the added translations.
> > - Some small changes to the tests in patch 7.
>
> I had a last tiny nit that doesn't warrant a reroll on its own. Other
> than that this series looks great to me now. Thanks!
Junio also had some small comments. I'll go ahead and send another
version. Thanks for the review. :)
-Justin
^ permalink raw reply [flat|nested] 80+ messages in thread
* [PATCH v5 0/7] builtin/repo: add object size info to structure output
2025-12-16 17:38 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Justin Tobler
` (7 preceding siblings ...)
2025-12-17 7:03 ` [PATCH v4 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
@ 2025-12-17 17:53 ` Justin Tobler
2025-12-17 17:53 ` [PATCH v5 1/7] builtin/repo: group per-type object values into struct Justin Tobler
` (7 more replies)
8 siblings, 8 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:53 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
Greetings,
This patch series extends the recently introduced "structure" subcommand
for git-repo(1) to collect object size information. More specifically,
it shows total inflated and disk sizes of objects by object type. The
aim is to provide additional insight that may be useful to users
regarding the structure of a repository.
In addition to this change, this series also updates the table output
format to downscale larger output values along with the appropriate unit
prefix. This is done to make table output more human friendly. The
keyvalue and nul output formats are left the same since they are
intended more for machine parsing.
Changes in V5:
- Small updates to some comments and log messages to improve
correctness.
- Adjusted spacing in builtin/repo.c:count_objects().
Changes in V4:
- Unmark "byte" string in "t/helper/test-simple-ipc.c" for translation
to avoid conflict with translated plural "byte/bytes" string.
- Remove some unnecessary translations and add comments to clarify some
of the added translations.
- Some small changes to the tests in patch 7.
Changes in V3:
- Address potential localization regression by making the downscaled
number format string also translatable. Also make the format string
for how the values and unit prefixes are displayed via
`strbuf_humanise_{bytes,rate}()` translatable to be more flexible.
- `strbuf_humanise_{bytes,count}_value()` has been renamed to
`humanise_{bytes,count}()` and updated to provide both the value and
unit prefix as separate strings.
- Unit prefix strings are no longer allocated and instead constant.
- The humanise flags are now defined in an enum.
- Instead of using `OBJECT_INFO_FOR_PREFETCH`,
`OBJECT_INFO_SKIP_FETCH_OBJECT` and `OBJECT_INFO_QUICK` are used
explicitly.
- Tests now use git-rev-list(1) to verify disk size info.
Changes in V2:
- Factor out and reuse existing logic from strbuf_humanise() to handle
downscaling values and determining the appropriate unit prefix
separately. This enables more control over how exactly the values are
written to the structure output table which is useful for alignment
reasons. I'm not sure about the interface used in patch 2. Feedback is
most welcome.
- In the previous version, when checking object size on a missing object
we would die. Instead we now ignore missing objects. This allows the
structure command to work on partial clones.
- disk/inflated keyvalue names renamed to disk_size/inflated_size.
- Unit prefixes are marked for translation.
- The tests for keyvalue disk size values are updated to check against
real expected values instead of skipping. Table output tests still
skip verifying human-readable values though.
Thanks,
-Justin
Justin Tobler (7):
builtin/repo: group per-type object values into struct
strbuf: split out logic to humanise byte values
builtin/repo: humanise count values in structure output
builtin/repo: add inflated object info to keyvalue structure output
builtin/repo: add inflated object info to structure table
builtin/repo: add disk size info to keyvalue stucture output
builtin/repo: add object disk size info to structure table
Documentation/git-repo.adoc | 2 +
builtin/repo.c | 175 ++++++++++++++++++++++++++++++------
strbuf.c | 102 ++++++++++++++-------
strbuf.h | 25 ++++++
t/helper/test-simple-ipc.c | 7 +-
t/t1901-repo-structure.sh | 118 ++++++++++++++++--------
6 files changed, 331 insertions(+), 98 deletions(-)
Range-diff against v4:
1: be14de68f6 = 1: be14de68f6 builtin/repo: group per-type object values into struct
2: 0a145cfeec ! 2: 61cff22afa strbuf: split out logic to humanise byte values
@@ Commit message
In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
- usecase, the downscaled values and unit prefixes must be handled
+ usecase, the downscaled values and unit strings must be handled
separately to ensure proper column alignment.
Split out logic from strbuf_humanise() to downscale byte values and
@@ strbuf.c: void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
+
+ /*
+ * TRANSLATORS: The first argument is the number string. The second
-+ * argument is the unit prefix string (i.e. "12.34 MiB/s").
++ * argument is the unit string (i.e. "12.34 MiB/s").
+ */
+ strbuf_addf(buf, _("%s %s"), value, unit);
+ free(value);
@@ strbuf.h: void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbu
+enum humanise_flags {
+ /*
-+ * Use rate based unit prefixes for humanised values.
++ * Use rate based units for humanised values.
+ */
+ HUMANISE_RATE = (1 << 0),
+};
+
+/**
+ * Converts the given byte size into a downscaled human-readable value and
-+ * corresponding unit prefix as two separate strings.
++ * corresponding unit as two separate strings.
+ */
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags);
3: eebf0d917b ! 3: 0b575738c2 builtin/repo: humanise count values in structure output
@@ strbuf.h: enum humanise_flags {
+/**
+ * Converts the given count into a downscaled human-readable value and
-+ * corresponding unit prefix as two separate strings.
++ * corresponding unit as two separate strings.
+ */
+void humanise_count(size_t count, char **value, const char **unit);
+
4: 37f71cc1bc ! 4: e2c79c8759 builtin/repo: add inflated object info to keyvalue structure output
@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
+
+ if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+ OBJECT_INFO_SKIP_FETCH_OBJECT |
-+ OBJECT_INFO_QUICK) < 0)
++ OBJECT_INFO_QUICK) < 0)
+ continue;
+
+ inflated_total += inflated;
5: 40edf4c20b ! 5: 03219630cc builtin/repo: add inflated object info to structure table
@@ strbuf.c: void humanise_bytes(off_t bytes, char **value, const char **unit,
## strbuf.h ##
@@ strbuf.h: enum humanise_flags {
- * Use rate based unit prefixes for humanised values.
+ * Use rate based units for humanised values.
*/
HUMANISE_RATE = (1 << 0),
+ /*
-+ * Use compact "B" unit prefixes instead of "byte/bytes" for humanised
++ * Use compact "B" unit symbol instead of "byte/bytes" for humanised
+ * values.
+ */
+ HUMANISE_COMPACT = (1 << 1),
6: ba861f37c9 = 6: 7d8862a064 builtin/repo: add disk size info to keyvalue stucture output
7: 3118c17ae3 ! 7: 3e2d5c20f8 builtin/repo: add object disk size info to structure table
@@ t/t1901-repo-structure.sh: test_expect_success SHA1 'repository with references
- cat >expect <<-\EOF &&
+ # The tags disk size is handled specially due to the
+ # git-rev-list(1) --disk-usage=human option printing the full
-+ # "byte/bytes" unit prefix instead of just "B".
++ # "byte/bytes" unit string instead of just "B".
+ cat >expect <<-EOF &&
| Repository structure | Value |
| -------------------- | ---------- |
base-commit: e85ae279b0d58edc2f4c3fd5ac391b51e1223985
--
2.52.0.209.ge85ae279b0
^ permalink raw reply [flat|nested] 80+ messages in thread
* [PATCH v5 1/7] builtin/repo: group per-type object values into struct
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
@ 2025-12-17 17:53 ` Justin Tobler
2025-12-17 17:53 ` [PATCH v5 2/7] strbuf: split out logic to humanise byte values Justin Tobler
` (6 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:53 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 42 +++++++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 17 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 2a653bd3ea..a69699857a 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -202,13 +202,17 @@ struct ref_stats {
size_t others;
};
-struct object_stats {
+struct object_values {
size_t tags;
size_t commits;
size_t trees;
size_t blobs;
};
+struct object_stats {
+ struct object_values type_counts;
+};
+
struct repo_structure {
struct ref_stats refs;
struct object_stats objects;
@@ -281,9 +285,9 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
return stats->branches + stats->remotes + stats->tags + stats->others;
}
-static inline size_t get_total_object_count(struct object_stats *stats)
+static inline size_t get_total_object_values(struct object_values *values)
{
- return stats->tags + stats->commits + stats->trees + stats->blobs;
+ return values->tags + values->commits + values->trees + values->blobs;
}
static void stats_table_setup_structure(struct stats_table *table,
@@ -302,14 +306,18 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
- object_total = get_total_object_count(objects);
+ object_total = get_total_object_values(&objects->type_counts);
stats_table_addf(table, "");
stats_table_addf(table, "* %s", _("Reachable objects"));
stats_table_count_addf(table, object_total, " * %s", _("Count"));
- stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
- stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
- stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
- stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, objects->type_counts.commits,
+ " * %s", _("Commits"));
+ stats_table_count_addf(table, objects->type_counts.trees,
+ " * %s", _("Trees"));
+ stats_table_count_addf(table, objects->type_counts.blobs,
+ " * %s", _("Blobs"));
+ stats_table_count_addf(table, objects->type_counts.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
@@ -389,13 +397,13 @@ static void structure_keyvalue_print(struct repo_structure *stats,
(uintmax_t)stats->refs.others, value_delim);
printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.commits, value_delim);
+ (uintmax_t)stats->objects.type_counts.commits, value_delim);
printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.trees, value_delim);
+ (uintmax_t)stats->objects.type_counts.trees, value_delim);
printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.blobs, value_delim);
+ (uintmax_t)stats->objects.type_counts.blobs, value_delim);
printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
- (uintmax_t)stats->objects.tags, value_delim);
+ (uintmax_t)stats->objects.type_counts.tags, value_delim);
fflush(stdout);
}
@@ -473,22 +481,22 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
switch (type) {
case OBJ_TAG:
- stats->tags += oids->nr;
+ stats->type_counts.tags += oids->nr;
break;
case OBJ_COMMIT:
- stats->commits += oids->nr;
+ stats->type_counts.commits += oids->nr;
break;
case OBJ_TREE:
- stats->trees += oids->nr;
+ stats->type_counts.trees += oids->nr;
break;
case OBJ_BLOB:
- stats->blobs += oids->nr;
+ stats->type_counts.blobs += oids->nr;
break;
default:
BUG("invalid object type");
}
- object_count = get_total_object_count(stats);
+ object_count = get_total_object_values(&stats->type_counts);
display_progress(data->progress, object_count);
return 0;
--
2.52.0.209.ge85ae279b0
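
For readers skimming the thread, the regrouping in this patch can be
sketched in isolation. The struct and function names below are taken
from the diff above; everything else is trimmed away, so treat this as
an illustration of the layout rather than the code in builtin/repo.c.

```c
#include <stddef.h>

/* Per-type values; the same shape can hold counts now and sizes later. */
struct object_values {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/* One member per kind of measurement. */
struct object_stats {
	struct object_values type_counts;
};

/* Sums whichever measurement is passed in, so it is not count-specific. */
static size_t get_total_object_values(const struct object_values *values)
{
	return values->tags + values->commits + values->trees + values->blobs;
}
```

Because the helper takes a `struct object_values *` rather than the
whole `struct object_stats *`, the same function can total any later
per-type measurement without a second copy of the arithmetic.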
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v5 2/7] strbuf: split out logic to humanise byte values
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
2025-12-17 17:53 ` [PATCH v5 1/7] builtin/repo: group per-type object values into struct Justin Tobler
@ 2025-12-17 17:53 ` Justin Tobler
2025-12-17 17:54 ` [PATCH v5 3/7] builtin/repo: humanise count values in structure output Justin Tobler
` (5 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:53 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
usecase, the downscaled values and unit strings must be handled
separately to ensure proper column alignment.
Split out logic from strbuf_humanise() to downscale byte values and
determine the corresponding unit prefix into a separate humanise_bytes()
function that provides separate value and unit strings.
Note that the "byte" string in "t/helper/test-simple-ipc.c" is unmarked
for translation here so that it doesn't conflict with the newly defined
plural "byte/bytes" translation and instead uses it.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
strbuf.c | 74 ++++++++++++++++++++------------------
strbuf.h | 14 ++++++++
t/helper/test-simple-ipc.c | 7 +++-
3 files changed, 60 insertions(+), 35 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 6c3851a7f8..349ee9727a 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,47 +836,53 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
-static void strbuf_humanise(struct strbuf *buf, off_t bytes,
- int humanise_rate)
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags)
{
+ int humanise_rate = flags & HUMANISE_RATE;
+
if (bytes > 1 << 30) {
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte */
- _("%u.%2.2u GiB") :
- /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second */
- _("%u.%2.2u GiB/s"),
- (unsigned)(bytes >> 30),
- (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(bytes >> 30),
+ (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
+ /* TRANSLATORS: IEC 80000-13:2008 gibibyte/second and gibibyte */
+ *unit = humanise_rate ? _("GiB/s") : _("GiB");
} else if (bytes > 1 << 20) {
- unsigned x = bytes + 5243; /* for rounding */
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 mebibyte */
- _("%u.%2.2u MiB") :
- /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second */
- _("%u.%2.2u MiB/s"),
- x >> 20, ((x & ((1 << 20) - 1)) * 100) >> 20);
+ unsigned x = bytes + 5243; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), x >> 20,
+ ((x & ((1 << 20) - 1)) * 100) >> 20);
+ /* TRANSLATORS: IEC 80000-13:2008 mebibyte/second and mebibyte */
+ *unit = humanise_rate ? _("MiB/s") : _("MiB");
} else if (bytes > 1 << 10) {
- unsigned x = bytes + 5; /* for rounding */
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 kibibyte */
- _("%u.%2.2u KiB") :
- /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second */
- _("%u.%2.2u KiB/s"),
- x >> 10, ((x & ((1 << 10) - 1)) * 100) >> 10);
+ unsigned x = bytes + 5; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), x >> 10,
+ ((x & ((1 << 10) - 1)) * 100) >> 10);
+ /* TRANSLATORS: IEC 80000-13:2008 kibibyte/second and kibibyte */
+ *unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
- strbuf_addf(buf,
- humanise_rate == 0 ?
- /* TRANSLATORS: IEC 80000-13:2008 byte */
- Q_("%u byte", "%u bytes", bytes) :
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("%u byte/s", "%u bytes/s", bytes),
- (unsigned)bytes);
+ *value = xstrfmt("%u", (unsigned)bytes);
+ *unit = humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+ Q_("byte/s", "bytes/s", bytes) :
+ /* TRANSLATORS: IEC 80000-13:2008 byte */
+ Q_("byte", "bytes", bytes);
}
}
+static void strbuf_humanise(struct strbuf *buf, off_t bytes, unsigned flags)
+{
+ char *value;
+ const char *unit;
+
+ humanise_bytes(bytes, &value, &unit, flags);
+
+ /*
+ * TRANSLATORS: The first argument is the number string. The second
+ * argument is the unit string (i.e. "12.34 MiB/s").
+ */
+ strbuf_addf(buf, _("%s %s"), value, unit);
+ free(value);
+}
+
void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
{
strbuf_humanise(buf, bytes, 0);
@@ -884,7 +890,7 @@ void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
void strbuf_humanise_rate(struct strbuf *buf, off_t bytes)
{
- strbuf_humanise(buf, bytes, 1);
+ strbuf_humanise(buf, bytes, HUMANISE_RATE);
}
int printf_ln(const char *fmt, ...)
diff --git a/strbuf.h b/strbuf.h
index a580ac6084..698b3cc4a5 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,20 @@ void strbuf_addbuf_percentquote(struct strbuf *dst, const struct strbuf *src);
*/
void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags);
+enum humanise_flags {
+ /*
+ * Use rate based units for humanised values.
+ */
+ HUMANISE_RATE = (1 << 0),
+};
+
+/**
+ * Converts the given byte size into a downscaled human-readable value and
+ * corresponding unit as two separate strings.
+ */
+void humanise_bytes(off_t bytes, char **value, const char **unit,
+ unsigned flags);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
index 03cc5eea2c..442ad6b16f 100644
--- a/t/helper/test-simple-ipc.c
+++ b/t/helper/test-simple-ipc.c
@@ -603,7 +603,12 @@ int cmd__simple_ipc(int argc, const char **argv)
OPT_INTEGER(0, "bytecount", &cl_args.bytecount, N_("number of bytes")),
OPT_INTEGER(0, "batchsize", &cl_args.batchsize, N_("number of requests per thread")),
- OPT_STRING(0, "byte", &bytevalue, N_("byte"), N_("ballast character")),
+ /*
+ * The "byte" string here is not marked for translation and
+ * instead relies on translation in strbuf.c:humanise_bytes() to
+ * avoid conflict with the plural form.
+ */
+ OPT_STRING(0, "byte", &bytevalue, "byte", N_("ballast character")),
OPT_STRING(0, "token", &cl_args.token, N_("token"), N_("command token to send to the server")),
OPT_END()
--
2.52.0.209.ge85ae279b0
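
The value/unit split described in this patch can be tried outside of
git with a rough standalone sketch. The assumptions are: `long long`
stands in for `off_t`, plain malloc()/snprintf() stand in for
xstrfmt(), translation via _() and Q_() is dropped (the plural choice
is hardcoded for English), and the `sketch_`/`SKETCH_` names are made
up for this example. The thresholds and rounding are copied from the
diff above.

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_RATE (1u << 0) /* stand-in for HUMANISE_RATE */

/*
 * Sketch only: writes the downscaled number into a newly allocated
 * string (*value) and points *unit at a constant unit string.
 */
static void sketch_humanise_bytes(long long bytes, char **value,
				  const char **unit, unsigned flags)
{
	int rate = flags & SKETCH_RATE;
	char *out = malloc(32);

	if (!out)
		abort();

	if (bytes > 1 << 30) {
		snprintf(out, 32, "%u.%2.2u", (unsigned)(bytes >> 30),
			 (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
		*unit = rate ? "GiB/s" : "GiB";
	} else if (bytes > 1 << 20) {
		unsigned x = bytes + 5243; /* for rounding */
		snprintf(out, 32, "%u.%2.2u", x >> 20,
			 ((x & ((1 << 20) - 1)) * 100) >> 20);
		*unit = rate ? "MiB/s" : "MiB";
	} else if (bytes > 1 << 10) {
		unsigned x = bytes + 5; /* for rounding */
		snprintf(out, 32, "%u.%2.2u", x >> 10,
			 ((x & ((1 << 10) - 1)) * 100) >> 10);
		*unit = rate ? "KiB/s" : "KiB";
	} else {
		snprintf(out, 32, "%u", (unsigned)bytes);
		/* English-only stand-in for the Q_() plural lookup. */
		if (rate)
			*unit = bytes == 1 ? "byte/s" : "bytes/s";
		else
			*unit = bytes == 1 ? "byte" : "bytes";
	}
	*value = out;
}
```

As in strbuf_humanise() above, the caller owns and frees the value
string while the unit string is constant and must not be freed. Keeping
the two apart is what lets a table writer pad each column on its own.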
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v5 3/7] builtin/repo: humanise count values in structure output
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
2025-12-17 17:53 ` [PATCH v5 1/7] builtin/repo: group per-type object values into struct Justin Tobler
2025-12-17 17:53 ` [PATCH v5 2/7] strbuf: split out logic to humanise byte values Justin Tobler
@ 2025-12-17 17:54 ` Justin Tobler
2025-12-17 17:54 ` [PATCH v5 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
` (4 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.
For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 38 +++++++++++++++++-------
strbuf.c | 26 ++++++++++++++++
strbuf.h | 6 ++++
t/t1901-repo-structure.sh | 62 +++++++++++++++++++--------------------
4 files changed, 91 insertions(+), 41 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index a69699857a..9c61bc3e17 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -223,6 +223,7 @@ struct stats_table {
int name_col_width;
int value_col_width;
+ int unit_col_width;
};
/*
@@ -230,6 +231,7 @@ struct stats_table {
*/
struct stats_table_entry {
char *value;
+ const char *unit;
};
static void stats_table_vaddf(struct stats_table *table,
@@ -250,11 +252,18 @@ static void stats_table_vaddf(struct stats_table *table,
if (name_width > table->name_col_width)
table->name_col_width = name_width;
- if (entry) {
+ if (!entry)
+ return;
+ if (entry->value) {
int value_width = utf8_strwidth(entry->value);
if (value_width > table->value_col_width)
table->value_col_width = value_width;
}
+ if (entry->unit) {
+ int unit_width = utf8_strwidth(entry->unit);
+ if (unit_width > table->unit_col_width)
+ table->unit_col_width = unit_width;
+ }
}
static void stats_table_addf(struct stats_table *table, const char *format, ...)
@@ -273,7 +282,7 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_list ap;
CALLOC_ARRAY(entry, 1);
- entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+ humanise_count(value, &entry->value, &entry->unit);
va_start(ap, format);
stats_table_vaddf(table, entry, format, ap);
@@ -324,20 +333,24 @@ static void stats_table_print_structure(const struct stats_table *table)
{
const char *name_col_title = _("Repository structure");
const char *value_col_title = _("Value");
- int name_col_width = utf8_strwidth(name_col_title);
- int value_col_width = utf8_strwidth(value_col_title);
+ int title_name_width = utf8_strwidth(name_col_title);
+ int title_value_width = utf8_strwidth(value_col_title);
+ int name_col_width = table->name_col_width;
+ int value_col_width = table->value_col_width;
+ int unit_col_width = table->unit_col_width;
struct string_list_item *item;
struct strbuf buf = STRBUF_INIT;
- if (table->name_col_width > name_col_width)
- name_col_width = table->name_col_width;
- if (table->value_col_width > value_col_width)
- value_col_width = table->value_col_width;
+ if (title_name_width > name_col_width)
+ name_col_width = title_name_width;
+ if (title_value_width > value_col_width + unit_col_width + 1)
+ value_col_width = title_value_width - unit_col_width;
strbuf_addstr(&buf, "| ");
strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, name_col_title);
strbuf_addstr(&buf, " | ");
- strbuf_utf8_align(&buf, ALIGN_LEFT, value_col_width, value_col_title);
+ strbuf_utf8_align(&buf, ALIGN_LEFT,
+ value_col_width + unit_col_width + 1, value_col_title);
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
@@ -345,17 +358,20 @@ static void stats_table_print_structure(const struct stats_table *table)
for (int i = 0; i < name_col_width; i++)
putchar('-');
printf(" | ");
- for (int i = 0; i < value_col_width; i++)
+ for (int i = 0; i < value_col_width + unit_col_width + 1; i++)
putchar('-');
printf(" |\n");
for_each_string_list_item(item, &table->rows) {
struct stats_table_entry *entry = item->util;
const char *value = "";
+ const char *unit = "";
if (entry) {
struct stats_table_entry *entry = item->util;
value = entry->value;
+ if (entry->unit)
+ unit = entry->unit;
}
strbuf_reset(&buf);
@@ -363,6 +379,8 @@ static void stats_table_print_structure(const struct stats_table *table)
strbuf_utf8_align(&buf, ALIGN_LEFT, name_col_width, item->string);
strbuf_addstr(&buf, " | ");
strbuf_utf8_align(&buf, ALIGN_RIGHT, value_col_width, value);
+ strbuf_addch(&buf, ' ');
+ strbuf_utf8_align(&buf, ALIGN_LEFT, unit_col_width, unit);
strbuf_addstr(&buf, " |");
printf("%s\n", buf.buf);
}
diff --git a/strbuf.c b/strbuf.c
index 349ee9727a..995ff15169 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -836,6 +836,32 @@ void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
}
+void humanise_count(size_t count, char **value, const char **unit)
+{
+ if (count >= 1000000000) {
+ size_t x = count + 5000000; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000000),
+ (unsigned)(x % 1000000000 / 10000000));
+ /* TRANSLATORS: SI decimal prefix symbol for 10^9 */
+ *unit = _("G");
+ } else if (count >= 1000000) {
+ size_t x = count + 5000; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000000),
+ (unsigned)(x % 1000000 / 10000));
+ /* TRANSLATORS: SI decimal prefix symbol for 10^6 */
+ *unit = _("M");
+ } else if (count >= 1000) {
+ size_t x = count + 5; /* for rounding */
+ *value = xstrfmt(_("%u.%2.2u"), (unsigned)(x / 1000),
+ (unsigned)(x % 1000 / 10));
+ /* TRANSLATORS: SI decimal prefix symbol for 10^3 */
+ *unit = _("k");
+ } else {
+ *value = xstrfmt("%u", (unsigned)count);
+ *unit = NULL;
+ }
+}
+
void humanise_bytes(off_t bytes, char **value, const char **unit,
unsigned flags)
{
diff --git a/strbuf.h b/strbuf.h
index 698b3cc4a5..52feef4c1b 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -381,6 +381,12 @@ enum humanise_flags {
void humanise_bytes(off_t bytes, char **value, const char **unit,
unsigned flags);
+/**
+ * Converts the given count into a downscaled human-readable value and
+ * corresponding unit as two separate strings.
+ */
+void humanise_count(size_t count, char **value, const char **unit);
+
/**
* Append the given byte size as a human-readable string (i.e. 12.23 KiB,
* 3.50 MiB).
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 36a71a144e..55fd13ad1b 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -10,21 +10,21 @@ test_expect_success 'empty repository' '
(
cd repo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ----- |
- | * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
- | | |
- | * Reachable objects | |
- | * Count | 0 |
- | * Commits | 0 |
- | * Trees | 0 |
- | * Blobs | 0 |
- | * Tags | 0 |
+ | Repository structure | Value |
+ | -------------------- | ------ |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo structure >out 2>err &&
@@ -39,7 +39,7 @@ test_expect_success 'repository with references and objects' '
git init repo &&
(
cd repo &&
- test_commit_bulk 42 &&
+ test_commit_bulk 1005 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
@@ -49,21 +49,21 @@ test_expect_success 'repository with references and objects' '
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ----- |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
- | | |
- | * Reachable objects | |
- | * Count | 130 |
- | * Commits | 43 |
- | * Trees | 43 |
- | * Blobs | 43 |
- | * Tags | 1 |
+ | Repository structure | Value |
+ | -------------------- | ------ |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 3.02 k |
+ | * Commits | 1.01 k |
+ | * Trees | 1.01 k |
+ | * Blobs | 1.01 k |
+ | * Tags | 1 |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
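
A standalone approximation of the count scaling and of the
value-plus-unit row layout may help when reading the test expectations
above. The assumptions are: malloc()/snprintf() replace xstrfmt(),
translation is dropped, printf-style padding replaces
strbuf_utf8_align() (so it is only correct for ASCII), and the
`sketch_` names are hypothetical. The thresholds and rounding follow
humanise_count() in the diff.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch only: decimal downscaling; *unit is NULL for values below 1000. */
static void sketch_humanise_count(size_t count, char **value,
				  const char **unit)
{
	char *out = malloc(32);

	if (!out)
		abort();

	if (count >= 1000000000) {
		size_t x = count + 5000000; /* for rounding */
		snprintf(out, 32, "%u.%2.2u", (unsigned)(x / 1000000000),
			 (unsigned)(x % 1000000000 / 10000000));
		*unit = "G";
	} else if (count >= 1000000) {
		size_t x = count + 5000; /* for rounding */
		snprintf(out, 32, "%u.%2.2u", (unsigned)(x / 1000000),
			 (unsigned)(x % 1000000 / 10000));
		*unit = "M";
	} else if (count >= 1000) {
		size_t x = count + 5; /* for rounding */
		snprintf(out, 32, "%u.%2.2u", (unsigned)(x / 1000),
			 (unsigned)(x % 1000 / 10));
		*unit = "k";
	} else {
		snprintf(out, 32, "%u", (unsigned)count);
		*unit = NULL;
	}
	*value = out;
}

/*
 * Sketch only: name left-aligned, value right-aligned, one space, then
 * the unit left-aligned, mirroring the row layout in the patch.
 */
static void sketch_format_row(char *buf, size_t len,
			      const char *name, int name_w,
			      const char *value, int value_w,
			      const char *unit, int unit_w)
{
	snprintf(buf, len, "| %-*s | %*s %-*s |", name_w, name,
		 value_w, value, unit_w, unit ? unit : "");
}
```

Right-aligning the number and left-aligning the unit keeps the units in
one column even when some rows, such as a tag count of 1, have no unit
at all.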
^ permalink raw reply related [flat|nested] 80+ messages in thread
* [PATCH v5 4/7] builtin/repo: add inflated object info to keyvalue structure output
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
` (2 preceding siblings ...)
2025-12-17 17:54 ` [PATCH v5 3/7] builtin/repo: humanise count values in structure output Justin Tobler
@ 2025-12-17 17:54 ` Justin Tobler
2025-12-17 17:54 ` [PATCH v5 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
` (3 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.
For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 33 +++++++++++++++++++++++++++++++++
t/t1901-repo-structure.sh | 6 +++++-
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 70f0a6d2e4..287eee4b93 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -50,6 +50,7 @@ supported:
+
* Reference counts categorized by type
* Reachable object counts categorized by type
+* Total inflated size of reachable objects by type
+
The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 9c61bc3e17..8da321a386 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -2,6 +2,8 @@
#include "builtin.h"
#include "environment.h"
+#include "hex.h"
+#include "odb.h"
#include "parse-options.h"
#include "path-walk.h"
#include "progress.h"
@@ -211,6 +213,7 @@ struct object_values {
struct object_stats {
struct object_values type_counts;
+ struct object_values inflated_sizes;
};
struct repo_structure {
@@ -423,6 +426,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
(uintmax_t)stats->objects.type_counts.tags, value_delim);
+ printf("objects.commits.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.commits, value_delim);
+ printf("objects.trees.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.trees, value_delim);
+ printf("objects.blobs.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.blobs, value_delim);
+ printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+
fflush(stdout);
}
@@ -486,6 +498,7 @@ static void structure_count_references(struct ref_stats *stats,
}
struct count_objects_data {
+ struct object_database *odb;
struct object_stats *stats;
struct progress *progress;
};
@@ -495,20 +508,39 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
{
struct count_objects_data *data = cb_data;
struct object_stats *stats = data->stats;
+ size_t inflated_total = 0;
size_t object_count;
+ for (size_t i = 0; i < oids->nr; i++) {
+ struct object_info oi = OBJECT_INFO_INIT;
+ unsigned long inflated;
+
+ oi.sizep = &inflated;
+
+ if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
+ OBJECT_INFO_SKIP_FETCH_OBJECT |
+ OBJECT_INFO_QUICK) < 0)
+ continue;
+
+ inflated_total += inflated;
+ }
+
switch (type) {
case OBJ_TAG:
stats->type_counts.tags += oids->nr;
+ stats->inflated_sizes.tags += inflated_total;
break;
case OBJ_COMMIT:
stats->type_counts.commits += oids->nr;
+ stats->inflated_sizes.commits += inflated_total;
break;
case OBJ_TREE:
stats->type_counts.trees += oids->nr;
+ stats->inflated_sizes.trees += inflated_total;
break;
case OBJ_BLOB:
stats->type_counts.blobs += oids->nr;
+ stats->inflated_sizes.blobs += inflated_total;
break;
default:
BUG("invalid object type");
@@ -526,6 +558,7 @@ static void structure_count_objects(struct object_stats *stats,
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
struct count_objects_data data = {
+ .odb = repo->objects,
.stats = stats,
};
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 55fd13ad1b..33237822fd 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,7 +73,7 @@ test_expect_success 'repository with references and objects' '
)
'
-test_expect_success 'keyvalue and nul format' '
+test_expect_success SHA1 'keyvalue and nul format' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
@@ -90,6 +90,10 @@ test_expect_success 'keyvalue and nul format' '
objects.trees.count=42
objects.blobs.count=42
objects.tags.count=1
+ objects.commits.inflated_size=9225
+ objects.trees.inflated_size=28554
+ objects.blobs.inflated_size=453
+ objects.tags.inflated_size=132
EOF
git repo structure --format=keyvalue >out 2>err &&
--
2.52.0.209.ge85ae279b0
* [PATCH v5 5/7] builtin/repo: add inflated object info to structure table
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
` (3 preceding siblings ...)
2025-12-17 17:54 ` [PATCH v5 4/7] builtin/repo: add inflated object info to keyvalue " Justin Tobler
@ 2025-12-17 17:54 ` Justin Tobler
2025-12-17 17:54 ` [PATCH v5 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
` (2 subsequent siblings)
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
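Reader's note (illustration only, not part of the patch): the scaling
described in the message above can be sketched as a small standalone C
helper. `format_size_compact` is a hypothetical name, and the thresholds
and rounding here are simplified assumptions; this is not the
humanise_bytes() code in strbuf.c. It only assumes 1024-based units, two
decimal places, and the compact "B" symbol for unscaled values.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of git): render a byte count with a
 * 1024-based unit. Values below 1024 are printed unscaled with "B";
 * larger values get two decimals, truncated rather than rounded.
 */
static void format_size_compact(uint64_t bytes, char *out, size_t outlen)
{
	static const char *units[] = { "B", "KiB", "MiB", "GiB" };
	uint64_t divisor = 1;
	size_t idx = 0;

	/* Pick the largest unit that keeps the whole part below 1024. */
	while (idx < 3 && bytes / divisor >= 1024) {
		divisor *= 1024;
		idx++;
	}

	if (idx == 0) {
		snprintf(out, outlen, "%" PRIu64 " B", bytes);
		return;
	}

	uint64_t whole = bytes / divisor;
	uint64_t frac = (bytes % divisor) * 100 / divisor;

	snprintf(out, outlen, "%" PRIu64 ".%02" PRIu64 " %s",
		 whole, frac, units[idx]);
}
```

Calling the helper with 132 yields "132 B", which matches the shape of
the "Tags" row in the updated test expectations. The exact digits git
prints for scaled values may differ because its rounding is not
reproduced here.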
builtin/repo.c | 33 +++++++++++++++++++--
strbuf.c | 14 +++++----
strbuf.h | 5 ++++
t/t1901-repo-structure.sh | 62 +++++++++++++++++++++++----------------
4 files changed, 80 insertions(+), 34 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 8da321a386..67d7548b88 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -292,6 +292,20 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_end(ap);
}
+static void stats_table_size_addf(struct stats_table *table, size_t value,
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+ humanise_bytes(value, &entry->value, &entry->unit, HUMANISE_COMPACT);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
+ va_end(ap);
+}
+
static inline size_t get_total_reference_count(struct ref_stats *stats)
{
return stats->branches + stats->remotes + stats->tags + stats->others;
@@ -307,7 +321,8 @@ static void stats_table_setup_structure(struct stats_table *table,
{
struct object_stats *objects = &stats->objects;
struct ref_stats *refs = &stats->refs;
- size_t object_total;
+ size_t inflated_object_total;
+ size_t object_count_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -318,10 +333,10 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
- object_total = get_total_object_values(&objects->type_counts);
+ object_count_total = get_total_object_values(&objects->type_counts);
stats_table_addf(table, "");
stats_table_addf(table, "* %s", _("Reachable objects"));
- stats_table_count_addf(table, object_total, " * %s", _("Count"));
+ stats_table_count_addf(table, object_count_total, " * %s", _("Count"));
stats_table_count_addf(table, objects->type_counts.commits,
" * %s", _("Commits"));
stats_table_count_addf(table, objects->type_counts.trees,
@@ -330,6 +345,18 @@ static void stats_table_setup_structure(struct stats_table *table,
" * %s", _("Blobs"));
stats_table_count_addf(table, objects->type_counts.tags,
" * %s", _("Tags"));
+
+ inflated_object_total = get_total_object_values(&objects->inflated_sizes);
+ stats_table_size_addf(table, inflated_object_total,
+ " * %s", _("Inflated size"));
+ stats_table_size_addf(table, objects->inflated_sizes.commits,
+ " * %s", _("Commits"));
+ stats_table_size_addf(table, objects->inflated_sizes.trees,
+ " * %s", _("Trees"));
+ stats_table_size_addf(table, objects->inflated_sizes.blobs,
+ " * %s", _("Blobs"));
+ stats_table_size_addf(table, objects->inflated_sizes.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
diff --git a/strbuf.c b/strbuf.c
index 995ff15169..7fb7d12ac0 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -886,11 +886,15 @@ void humanise_bytes(off_t bytes, char **value, const char **unit,
*unit = humanise_rate ? _("KiB/s") : _("KiB");
} else {
*value = xstrfmt("%u", (unsigned)bytes);
- *unit = humanise_rate ?
- /* TRANSLATORS: IEC 80000-13:2008 byte/second */
- Q_("byte/s", "bytes/s", bytes) :
- /* TRANSLATORS: IEC 80000-13:2008 byte */
- Q_("byte", "bytes", bytes);
+ if (flags & HUMANISE_COMPACT)
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second and byte */
+ *unit = humanise_rate ? _("B/s") : _("B");
+ else
+ *unit = humanise_rate ?
+ /* TRANSLATORS: IEC 80000-13:2008 byte/second */
+ Q_("byte/s", "bytes/s", bytes) :
+ /* TRANSLATORS: IEC 80000-13:2008 byte */
+ Q_("byte", "bytes", bytes);
}
}
diff --git a/strbuf.h b/strbuf.h
index 52feef4c1b..06e284f9cc 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -372,6 +372,11 @@ enum humanise_flags {
* Use rate based units for humanised values.
*/
HUMANISE_RATE = (1 << 0),
+ /*
+ * Use compact "B" unit symbol instead of "byte/bytes" for humanised
+ * values.
+ */
+ HUMANISE_COMPACT = (1 << 1),
};
/**
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 33237822fd..b18213c660 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -13,18 +13,23 @@ test_expect_success 'empty repository' '
| Repository structure | Value |
| -------------------- | ------ |
| * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
| | |
| * Reachable objects | |
- | * Count | 0 |
- | * Commits | 0 |
- | * Trees | 0 |
- | * Blobs | 0 |
- | * Tags | 0 |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
+ | * Inflated size | 0 B |
+ | * Commits | 0 B |
+ | * Trees | 0 B |
+ | * Blobs | 0 B |
+ | * Tags | 0 B |
EOF
git repo structure >out 2>err &&
@@ -34,7 +39,7 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references and objects' '
+test_expect_success SHA1 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
@@ -49,21 +54,26 @@ test_expect_success 'repository with references and objects' '
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository structure | Value |
- | -------------------- | ------ |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
- | | |
- | * Reachable objects | |
- | * Count | 3.02 k |
- | * Commits | 1.01 k |
- | * Trees | 1.01 k |
- | * Blobs | 1.01 k |
- | * Tags | 1 |
+ | Repository structure | Value |
+ | -------------------- | ---------- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 3.02 k |
+ | * Commits | 1.01 k |
+ | * Trees | 1.01 k |
+ | * Blobs | 1.01 k |
+ | * Tags | 1 |
+ | * Inflated size | 16.03 MiB |
+ | * Commits | 217.92 KiB |
+ | * Trees | 15.81 MiB |
+ | * Blobs | 11.68 KiB |
+ | * Tags | 132 B |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
* [PATCH v5 6/7] builtin/repo: add disk size info to keyvalue structure output
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
` (4 preceding siblings ...)
2025-12-17 17:54 ` [PATCH v5 5/7] builtin/repo: add inflated object info to structure table Justin Tobler
@ 2025-12-17 17:54 ` Justin Tobler
2025-12-17 17:54 ` [PATCH v5 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
2025-12-18 6:32 ` [PATCH v5 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
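Reader's note (illustration only, not part of the patch): each
keyvalue/nul record is a key, a key delimiter, a decimal value, and a
value delimiter. The sketch below shows that shape with a hypothetical
`format_entry` helper. The '=' and newline pair for the keyvalue format
is visible in the test expectations; the newline and NUL pair for the
nul format is an assumption based on my reading of the git-repo(1)
documentation.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of git): write one record of the form
 * "<key><key_delim><value><value_delim>" into out. Returns the record
 * length in bytes, delimiters included, or -1 if it does not fit.
 */
static int format_entry(char *out, size_t outlen, const char *key,
			uintmax_t value, char key_delim, char value_delim)
{
	int n = snprintf(out, outlen, "%s%c%" PRIuMAX "%c",
			 key, key_delim, value, value_delim);

	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return n;
}
```

With ('=', '\n') as delimiters and a value of 132, the record for the
key "objects.tags.disk_size" reads "objects.tags.disk_size=132" followed
by a newline. With ('\n', '\0') the same record carries a NUL byte after
the value, so callers need the returned length rather than strlen().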
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 18 ++++++++++++++++++
t/t1901-repo-structure.sh | 11 ++++++++++-
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 287eee4b93..861073f641 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -51,6 +51,7 @@ supported:
* Reference counts categorized by type
* Reachable object counts categorized by type
* Total inflated size of reachable objects by type
+* Total disk size of reachable objects by type
+
The output format can be chosen through the flag `--format`. Three formats are
diff --git a/builtin/repo.c b/builtin/repo.c
index 67d7548b88..7ea051f3af 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -214,6 +214,7 @@ struct object_values {
struct object_stats {
struct object_values type_counts;
struct object_values inflated_sizes;
+ struct object_values disk_sizes;
};
struct repo_structure {
@@ -462,6 +463,15 @@ static void structure_keyvalue_print(struct repo_structure *stats,
printf("objects.tags.inflated_size%c%" PRIuMAX "%c", key_delim,
(uintmax_t)stats->objects.inflated_sizes.tags, value_delim);
+ printf("objects.commits.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.commits, value_delim);
+ printf("objects.trees.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.trees, value_delim);
+ printf("objects.blobs.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.blobs, value_delim);
+ printf("objects.tags.disk_size%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.disk_sizes.tags, value_delim);
+
fflush(stdout);
}
@@ -536,13 +546,16 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
struct count_objects_data *data = cb_data;
struct object_stats *stats = data->stats;
size_t inflated_total = 0;
+ size_t disk_total = 0;
size_t object_count;
for (size_t i = 0; i < oids->nr; i++) {
struct object_info oi = OBJECT_INFO_INIT;
unsigned long inflated;
+ off_t disk;
oi.sizep = &inflated;
+ oi.disk_sizep = &disk;
if (odb_read_object_info_extended(data->odb, &oids->oid[i], &oi,
OBJECT_INFO_SKIP_FETCH_OBJECT |
@@ -550,24 +563,29 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
continue;
inflated_total += inflated;
+ disk_total += disk;
}
switch (type) {
case OBJ_TAG:
stats->type_counts.tags += oids->nr;
stats->inflated_sizes.tags += inflated_total;
+ stats->disk_sizes.tags += disk_total;
break;
case OBJ_COMMIT:
stats->type_counts.commits += oids->nr;
stats->inflated_sizes.commits += inflated_total;
+ stats->disk_sizes.commits += disk_total;
break;
case OBJ_TREE:
stats->type_counts.trees += oids->nr;
stats->inflated_sizes.trees += inflated_total;
+ stats->disk_sizes.trees += disk_total;
break;
case OBJ_BLOB:
stats->type_counts.blobs += oids->nr;
stats->inflated_sizes.blobs += inflated_total;
+ stats->disk_sizes.blobs += disk_total;
break;
default:
BUG("invalid object type");
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index b18213c660..dd17caad05 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -4,6 +4,11 @@ test_description='test git repo structure'
. ./test-lib.sh
+object_type_disk_usage() {
+ git rev-list --all --objects --disk-usage --filter=object:type=$1 \
+ --filter-provided-objects
+}
+
test_expect_success 'empty repository' '
test_when_finished "rm -rf repo" &&
git init repo &&
@@ -91,7 +96,7 @@ test_expect_success SHA1 'keyvalue and nul format' '
test_commit_bulk 42 &&
git tag -a foo -m bar &&
- cat >expect <<-\EOF &&
+ cat >expect <<-EOF &&
references.branches.count=1
references.tags.count=1
references.remotes.count=0
@@ -104,6 +109,10 @@ test_expect_success SHA1 'keyvalue and nul format' '
objects.trees.inflated_size=28554
objects.blobs.inflated_size=453
objects.tags.inflated_size=132
+ objects.commits.disk_size=$(object_type_disk_usage commit)
+ objects.trees.disk_size=$(object_type_disk_usage tree)
+ objects.blobs.disk_size=$(object_type_disk_usage blob)
+ objects.tags.disk_size=$(object_type_disk_usage tag)
EOF
git repo structure --format=keyvalue >out 2>err &&
--
2.52.0.209.ge85ae279b0
* [PATCH v5 7/7] builtin/repo: add object disk size info to structure table
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
` (5 preceding siblings ...)
2025-12-17 17:54 ` [PATCH v5 6/7] builtin/repo: add disk size info to keyvalue structure output Justin Tobler
@ 2025-12-17 17:54 ` Justin Tobler
2025-12-18 6:32 ` [PATCH v5 0/7] builtin/repo: add object size info to structure output Patrick Steinhardt
7 siblings, 0 replies; 80+ messages in thread
From: Justin Tobler @ 2025-12-17 17:54 UTC (permalink / raw)
To: git; +Cc: ps, gitster, worldhello.net, Justin Tobler
Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
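Reader's note (illustration only, not part of the patch): the "Disk
size" row is the sum of the four per-type values, obtained in the diff
below through get_total_object_values(). A standalone sketch of that
summation follows. The struct mirrors the `object_values` grouping from
this series, `total_object_values` is a hypothetical stand-in name, and
any numbers passed to it are placeholders, not measurements.

```c
#include <stddef.h>

/* Standalone copy of the per-type grouping, for illustration only. */
struct object_values {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/* Hypothetical stand-in for the series' total helper. */
static size_t total_object_values(const struct object_values *values)
{
	return values->tags + values->commits + values->trees + values->blobs;
}
```

Because counts, inflated sizes, and disk sizes all share the same
struct, one helper can produce the total row for each group.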
builtin/repo.c | 13 +++++++++++++
t/t1901-repo-structure.sh | 31 ++++++++++++++++++++++++++++---
2 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 7ea051f3af..09bc8fccfd 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -324,6 +324,7 @@ static void stats_table_setup_structure(struct stats_table *table,
struct ref_stats *refs = &stats->refs;
size_t inflated_object_total;
size_t object_count_total;
+ size_t disk_object_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -358,6 +359,18 @@ static void stats_table_setup_structure(struct stats_table *table,
" * %s", _("Blobs"));
stats_table_size_addf(table, objects->inflated_sizes.tags,
" * %s", _("Tags"));
+
+ disk_object_total = get_total_object_values(&objects->disk_sizes);
+ stats_table_size_addf(table, disk_object_total,
+ " * %s", _("Disk size"));
+ stats_table_size_addf(table, objects->disk_sizes.commits,
+ " * %s", _("Commits"));
+ stats_table_size_addf(table, objects->disk_sizes.trees,
+ " * %s", _("Trees"));
+ stats_table_size_addf(table, objects->disk_sizes.blobs,
+ " * %s", _("Blobs"));
+ stats_table_size_addf(table, objects->disk_sizes.tags,
+ " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index dd17caad05..435fd979fa 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -5,8 +5,20 @@ test_description='test git repo structure'
. ./test-lib.sh
object_type_disk_usage() {
- git rev-list --all --objects --disk-usage --filter=object:type=$1 \
- --filter-provided-objects
+ disk_usage_opt="--disk-usage"
+
+ if test "$2" = "true"
+ then
+ disk_usage_opt="--disk-usage=human"
+ fi
+
+ if test "$1" = "all"
+ then
+ git rev-list --all --objects $disk_usage_opt
+ else
+ git rev-list --all --objects $disk_usage_opt \
+ --filter=object:type=$1 --filter-provided-objects
+ fi
}
test_expect_success 'empty repository' '
@@ -35,6 +47,11 @@ test_expect_success 'empty repository' '
| * Trees | 0 B |
| * Blobs | 0 B |
| * Tags | 0 B |
+ | * Disk size | 0 B |
+ | * Commits | 0 B |
+ | * Trees | 0 B |
+ | * Blobs | 0 B |
+ | * Tags | 0 B |
EOF
git repo structure >out 2>err &&
@@ -58,7 +75,10 @@ test_expect_success SHA1 'repository with references and objects' '
# Also creates a commit, tree, and blob.
git notes add -m foo &&
- cat >expect <<-\EOF &&
+ # The tags disk size is handled specially due to the
+ # git-rev-list(1) --disk-usage=human option printing the full
+ # "byte/bytes" unit string instead of just "B".
+ cat >expect <<-EOF &&
| Repository structure | Value |
| -------------------- | ---------- |
| * References | |
@@ -79,6 +99,11 @@ test_expect_success SHA1 'repository with references and objects' '
| * Trees | 15.81 MiB |
| * Blobs | 11.68 KiB |
| * Tags | 132 B |
+ | * Disk size | $(object_type_disk_usage all true) |
+ | * Commits | $(object_type_disk_usage commit true) |
+ | * Trees | $(object_type_disk_usage tree true) |
+ | * Blobs | $(object_type_disk_usage blob true) |
+ | * Tags | $(object_type_disk_usage tag) B |
EOF
git repo structure >out 2>err &&
--
2.52.0.209.ge85ae279b0
* Re: [PATCH v5 0/7] builtin/repo: add object size info to structure output
2025-12-17 17:53 ` [PATCH v5 " Justin Tobler
` (6 preceding siblings ...)
2025-12-17 17:54 ` [PATCH v5 7/7] builtin/repo: add object disk size info to structure table Justin Tobler
@ 2025-12-18 6:32 ` Patrick Steinhardt
7 siblings, 0 replies; 80+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:32 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, gitster, worldhello.net
On Wed, Dec 17, 2025 at 11:53:57AM -0600, Justin Tobler wrote:
> Greetings,
>
> This patch series extends the recently introduced "structure" subcommand
> for git-repo(1) to collect object size information. More specifically,
> it shows total inflated and disk sizes of objects by object type. The
> aim is to provide additional insight that may be useful to users regarding
> the structure of a repository.
>
> In addition to this change, this series also updates the table output
> format to downscale larger output values along with the appropriate unit
> prefix. This is done to make table output more human friendly. The
> keyvalue and nul output formats are left the same since they are
> intended more for machine parsing.
>
> Changes in V5:
> - Small updates to some comments and log messages to improve
> correctness.
> - Adjusted spacing in builtin/repo.c:count_objects().
I'm happy with this version, thanks!
Patrick