From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, karthik.188@gmail.com
Subject: Re: [PATCH 1/4] builtin/repo: introduce stats subcommand
Date: Tue, 23 Sep 2025 12:52:44 +0200
Message-ID: <aNJ7_GoKT5ea4QJE@pks.im>
In-Reply-To: <20250923025700.3046260-2-jltobler@gmail.com>
On Mon, Sep 22, 2025 at 09:56:57PM -0500, Justin Tobler wrote:
> The shape of a repository's history can have huge impacts on the
> performance and health of the repository itself. Currently, Git lacks a
> means to surface key stats/information regarding the shape of a
> repository via a single command. Acquiring this information requires
> users to be fairly knowledgeable about the structure of a Git repository
> and how to identify the relevant data points. To fill this gap,
> supplemental tools such as git-sizer(1) have been developed.
>
> To allow users to more readily identify potential issues for a
> repository, introduce the "stats" subcommand in git-repo(1) to output
> stats for the repository that may be of interest to users. The goal of
> this subcommand is to eventually provide similar functionality to
> git-sizer(1), but in Git natively.
Nit: "but natively in Git" would read more naturally.
> The initial version of this command only iterates through all references
> in the repository and tracks the count of branches, tags, remotes, and
s/remotes/remote refs/
> other reference types. The corresponding information is displayed in a
> human-friendly table formatted in a very similar manner to git-sizer(1).
> The width of each table column is adjusted automatically to satisfy the
> requirements of the widest row contained.
>
> Subsequent commits will surface additional relevant data points to
> output.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
Is this command built on Derrick's git-survey(1)? If so, it would
probably be nice to add a "Based-on-patch-by" tag.
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 209afd1b61..7762329551 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -43,6 +44,12 @@ supported:
> +
> `-z` is an alias for `--format=nul`.
>
> +stats::
> + Retrieve stats about the current repository. All references in the
s/stats/statistics/
> + repository are categorized and counted accordingly.
> ++
> +The table output format may change and is not intended for machine parsing.
I guess we don't yet have a machine-parseable interface at this stage,
so we cannot point to it.
> @@ -156,12 +159,160 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> return print_fields(argc, argv, repo, format);
> }
>
> +struct stats {
> + size_t branches;
> + size_t remotes;
> + size_t tags;
> + size_t others;
> +};
> +
> +struct stats_table {
> + struct string_list rows;
> +
> + int name_col_width;
> + int value_col_width;
You assign the result from `strlen()` to these fields, so they should
probably be `size_t`.
> +};
> +
> +struct stats_table_entry {
> + char *value;
> +};
> +
> +static void stats_table_add(struct stats_table *table, const char *name,
> + struct stats_table_entry *entry)
> +{
> + int name_width = strlen(name);
`strlen()` returns `size_t`.
> + struct string_list_item *item;
> +
> + item = string_list_append(&table->rows, name);
> + item->util = entry;
> +
> + if (name_width > table->name_col_width)
> + table->name_col_width = name_width;
> + if (entry) {
> + int value_width = strlen(entry->value);
> + if (value_width > table->value_col_width)
> + table->value_col_width = value_width;
> + }
I was wondering at first why you'd ever want to not pass an entry, but
we use that to have "dividers" in the table. Makes sense.
> +}
> +
> +static void stats_table_add_count(struct stats_table *table, const char *name,
> + size_t value)
> +{
> + struct stats_table_entry *entry;
> +
> + CALLOC_ARRAY(entry, 1);
> + entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> + stats_table_add(table, name, entry);
> +}
> +
> +static void stats_table_setup(struct stats_table *table, struct stats *stats)
> +{
> + size_t ref_total;
> +
> + ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> + stats_table_add(table, _("* References"), NULL);
> + stats_table_add_count(table, _(" * Count"), ref_total);
> + stats_table_add_count(table, _(" * Branches"), stats->branches);
> + stats_table_add_count(table, _(" * Tags"), stats->tags);
> + stats_table_add_count(table, _(" * Remotes"), stats->remotes);
> + stats_table_add_count(table, _(" * Others"), stats->others);
> +}
Would it make sense to not translate the formatting directives, but only
the actual words?
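
A minimal standalone sketch of that idea, not taken from the patch: the
indentation and the bullet come from the format string, and only the bare
word would go through `_()`. The helper name `format_label()` and its
`depth` parameter are hypothetical, and `_()` is stubbed out as an identity
macro so the snippet builds on its own.

```c
#include <stdio.h>

/* Stand-in for the gettext wrapper _(); identity here so the sketch builds alone. */
#define _(s) (s)

/*
 * Hypothetical helper: the indentation and the "* " bullet are produced by
 * the format string, so only the bare word needs to be marked for
 * translation by the caller.
 */
static int format_label(char *buf, size_t buflen, int depth, const char *word)
{
	/* "%*s" with an empty string emits depth * 2 spaces of indentation. */
	return snprintf(buf, buflen, "%*s* %s", depth * 2, "", word);
}
```

With this shape, a call such as `format_label(buf, sizeof(buf), 1, _("Count"))`
produces the indented "* Count" label while translators only ever see "Count".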
> +static void stats_table_print(struct stats_table *table)
> +{
> + const char *name_col_title = _("Repository stats");
> + const char *value_col_title = _("Value");
> + int name_col_width = strlen(name_col_title);
> + int value_col_width = strlen(value_col_title);
These should both be `size_t`.
> + struct strbuf buf = STRBUF_INIT;
> + struct string_list_item *item;
> +
> + if (table->name_col_width > name_col_width)
> + name_col_width = table->name_col_width;
> + if (table->value_col_width > value_col_width)
> + value_col_width = table->value_col_width;
> +
> + strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
> + value_col_width, value_col_title);
Aha, that's why you went with `int`. You can use `cast_size_t_to_int()`
to convert between the types.
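
As a rough illustration of keeping the widths as `size_t` and converting only
at the `printf`-style call site: the sketch below is standalone, so
`size_to_int_checked()` is a hypothetical stand-in for Git's
`cast_size_t_to_int()`, and `format_cell()` is likewise invented for the
example.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's cast_size_t_to_int(); aborts on overflow. */
static int size_to_int_checked(size_t n)
{
	if (n > INT_MAX) {
		fprintf(stderr, "value too large for int: %zu\n", n);
		abort();
	}
	return (int)n;
}

/*
 * Format one left-aligned table cell. The width stays a size_t everywhere
 * and is narrowed only where "%-*s" requires an int argument.
 */
static int format_cell(char *buf, size_t buflen, const char *text, size_t width)
{
	return snprintf(buf, buflen, "| %-*s |", size_to_int_checked(width), text);
}
```

The struct fields and locals can then hold `strlen()` results without a
silent narrowing conversion.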
> + strbuf_addstr(&buf, "| ");
> + strbuf_addchars(&buf, '-', name_col_width);
> + strbuf_addstr(&buf, " | ");
> + strbuf_addchars(&buf, '-', value_col_width);
> + strbuf_addstr(&buf, " |\n");
> +
> + for_each_string_list_item (item, &table->rows) {
We typically don't have a space between the macro and its
arguments.
> + struct stats_table_entry *entry = item->util;
> + const char *value = "";
> +
> + if (entry) {
> + struct stats_table_entry *entry = item->util;
> + value = entry->value;
> + }
> +
> + strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
> + item->string, value_col_width, value);
> +
> + if (entry)
> + free(entry->value);
It's a bit weird that we free the values in a function that claims to
only print data. Sure, we probably won't ever have a use case where we
want to print the data a second time. But I still think it would be nice
to separate concerns.
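
One possible shape for that separation, sketched with simplified,
hypothetical types (`struct entry`, `struct table`) rather than the patch's
`string_list`-based table: printing only reads, and a dedicated clear
function owns the cleanup.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the patch's table types. */
struct entry {
	char *value;
};

struct table {
	struct entry **rows; /* a NULL row acts as a divider */
	size_t nr;
};

/* Append a row; value == NULL adds a divider. Returns 0 on success. */
static int table_add(struct table *t, const char *value)
{
	struct entry **rows = realloc(t->rows, (t->nr + 1) * sizeof(*rows));
	struct entry *e = NULL;

	if (!rows)
		return -1;
	t->rows = rows;
	if (value) {
		e = malloc(sizeof(*e));
		if (!e)
			return -1;
		e->value = malloc(strlen(value) + 1);
		if (!e->value) {
			free(e);
			return -1;
		}
		strcpy(e->value, value);
	}
	t->rows[t->nr++] = e;
	return 0;
}

/* Printing only reads the table, so it can run any number of times. */
static size_t table_print(const struct table *t, FILE *out)
{
	for (size_t i = 0; i < t->nr; i++)
		fprintf(out, "| %s |\n", t->rows[i] ? t->rows[i]->value : "");
	return t->nr;
}

/* Cleanup lives in its own function and leaves the table reusable. */
static void table_clear(struct table *t)
{
	for (size_t i = 0; i < t->nr; i++) {
		if (t->rows[i])
			free(t->rows[i]->value);
		free(t->rows[i]);
	}
	free(t->rows);
	t->rows = NULL;
	t->nr = 0;
}
```

The caller would then print and clear as two explicit steps, which keeps the
print function free of side effects on the table.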
> + }
> +
> + fputs(buf.buf, stdout);
> + strbuf_release(&buf);
> +}
> +
> +static void stats_count_references(struct stats *stats, struct ref_array *refs)
> +{
> + for (int i = 0; i < refs->nr; i++) {
> + struct ref_array_item *ref = refs->items[i];
> +
> + switch (ref->kind) {
> + case FILTER_REFS_BRANCHES:
> + stats->branches++;
> + break;
> + case FILTER_REFS_REMOTES:
> + stats->remotes++;
> + break;
> + case FILTER_REFS_TAGS:
> + stats->tags++;
> + break;
> + case FILTER_REFS_OTHERS:
> + stats->others++;
> + break;
Do we want to have a `default:` case where we `BUG()`? Otherwise we may
not notice that we undercount the overall number of refs.
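
A small standalone sketch of the `default:` idea. The enum and struct names
are hypothetical, and `abort()` stands in for Git's `BUG()`, which is not
available outside the Git tree.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ref kinds and counters in the patch. */
enum ref_kind { KIND_BRANCH, KIND_REMOTE, KIND_TAG, KIND_OTHER };

struct ref_counts {
	size_t branches;
	size_t remotes;
	size_t tags;
	size_t others;
};

static void count_ref(struct ref_counts *c, enum ref_kind kind)
{
	switch (kind) {
	case KIND_BRANCH:
		c->branches++;
		break;
	case KIND_REMOTE:
		c->remotes++;
		break;
	case KIND_TAG:
		c->tags++;
		break;
	case KIND_OTHER:
		c->others++;
		break;
	default:
		/* Git would call BUG() here; abort() is the stand-in. */
		fprintf(stderr, "BUG: unexpected ref kind %d\n", (int)kind);
		abort();
	}
}

/* The total is only trustworthy if every ref landed in some counter. */
static size_t ref_total(const struct ref_counts *c)
{
	return c->branches + c->remotes + c->tags + c->others;
}
```

With the `default:` arm in place, an unexpected kind fails loudly instead of
being dropped from the total.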
> + }
> + }
> +}
> +
> +static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> + const char *prefix UNUSED, struct repository *repo UNUSED)
Not a new issue, but I'd rather call this `cmd_repo_stats()` to note
that this is the entry point. We might as well adapt the other subcommand
to follow that naming scheme in a preparatory commit.
> +{
> + struct ref_filter filter = REF_FILTER_INIT;
> + struct strvec ref_patterns = STRVEC_INIT;
> + struct stats_table table = { 0 };
> + struct ref_array refs = { 0 };
> + struct stats stats = { 0 };
> +
> + filter.name_patterns = ref_patterns.v;
> + filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
`filter_refs()` may return an error code which we should probably
handle.
> diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> new file mode 100755
> index 0000000000..27c32ec45f
> --- /dev/null
> +++ b/t/t1901-repo-stats.sh
> @@ -0,0 +1,59 @@
> +#!/bin/sh
> +
> +test_description='test git repo stats'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'empty repository stats' '
Nit: I don't think it's necessary to repeat "repository stats" in every
test name. That's already clear from the test suite.
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git repo stats >out 2>err &&
> +
> + cat >expect <<-EOF &&
s/EOF/\EOF/, as we don't need to expand any variables.
> + | Repository stats | Value |
> + | ---------------- | ----- |
> + | * References | |
> + | * Count | 0 |
> + | * Branches | 0 |
> + | * Tags | 0 |
> + | * Remotes | 0 |
> + | * Others | 0 |
> + EOF
> +
> + test_cmp expect out &&
> + test_line_count = 0 err
> + )
> +'
> +
> +test_expect_success 'repository stats with references' '
Same comment regarding the name.
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m init &&
> + oid="$(git rev-parse HEAD)" &&
> + git switch -c foo &&
> + git tag init &&
> + git update-ref refs/remotes/origin/foo "$oid" &&
> + git notes add -m foo &&
> + git repo stats >out 2>err &&
> +
> + cat >expect <<-EOF &&
Likewise regarding "\EOF".
> + | Repository stats | Value |
> + | ---------------- | ----- |
> + | * References | |
> + | * Count | 5 |
> + | * Branches | 2 |
> + | * Tags | 1 |
> + | * Remotes | 1 |
> + | * Others | 1 |
> + EOF
> +
> + test_cmp expect out &&
> + test_line_count = 0 err
> + )
> +'
Patrick