git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, karthik.188@gmail.com
Subject: Re: [PATCH 1/4] builtin/repo: introduce stats subcommand
Date: Tue, 23 Sep 2025 12:52:44 +0200	[thread overview]
Message-ID: <aNJ7_GoKT5ea4QJE@pks.im> (raw)
In-Reply-To: <20250923025700.3046260-2-jltobler@gmail.com>

On Mon, Sep 22, 2025 at 09:56:57PM -0500, Justin Tobler wrote:
> The shape of a repository's history can have huge impacts on the
> performance and health of the repository itself. Currently, Git lacks a
> means to surface key stats/information regarding the shape of a
> repository via a single command. Acquiring this information requires
> users to be fairly knowledgeable about the structure of a Git repository
> and how to identify the relevant data points. To fill this gap,
> supplemental tools such as git-sizer(1) have been developed.
> 
> To allow users to more readily identify potential issues for a
> repository, introduce the "stats" subcommand in git-repo(1) to output
> stats for the repository that may be of interest to users. The goal of
> this subcommand is to eventually provide similar functionality to
> git-sizer(1), but in Git natively.

Nit: "but natively in Git" would read more natural.

> The initial version of this command only iterates through all references
> in the repository and tracks the count of branches, tags, remotes, and

s/remotes/remote refs/

> other reference types. The corresponding information is displayed in a
> human-friendly table formatted in a very similar manner to git-sizer(1).
> The width of each table column is adjusted automatically to satisfy the
> requirements of the widest row contained.
> 
> Subsequent commits will surface additional relevant data points to
> output.
> 
> Signed-off-by: Justin Tobler <jltobler@gmail.com>

Is this command built on Derrick's git-survey(1)? If so, it would
probably be nice to add a "Based-on-patch-by" tag.

> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 209afd1b61..7762329551 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -43,6 +44,12 @@ supported:
>  +
>  `-z` is an alias for `--format=nul`.
>  
> +stats::
> +	Retrieve stats about the current repository. All references in the

s/stats/statistics/

> +	repository are categorized and counted accordingly.
> ++
> +The table output format may change and is not intended for machine parsing.

I guess we don't yet have a machine-parseable interface in this state,
so we cannot ponit to it.

> @@ -156,12 +159,160 @@ static int repo_info(int argc, const char **argv, const char *prefix,
>  	return print_fields(argc, argv, repo, format);
>  }
>  
> +struct stats {
> +	size_t branches;
> +	size_t remotes;
> +	size_t tags;
> +	size_t others;
> +};
> +
> +struct stats_table {
> +	struct string_list rows;
> +
> +	int name_col_width;
> +	int value_col_width;

You assign the result from `strlen()` to these fields, so they should
probably be `size_t`.

> +};
> +
> +struct stats_table_entry {
> +	char *value;
> +};
> +
> +static void stats_table_add(struct stats_table *table, const char *name,
> +			    struct stats_table_entry *entry)
> +{
> +	int name_width = strlen(name);

`strlen()` returns `size_t`.

> +	struct string_list_item *item;
> +
> +	item = string_list_append(&table->rows, name);
> +	item->util = entry;
> +
> +	if (name_width > table->name_col_width)
> +		table->name_col_width = name_width;
> +	if (entry) {
> +		int value_width = strlen(entry->value);
> +		if (value_width > table->value_col_width)
> +			table->value_col_width = value_width;
> +	}

I was wondering at first why you'd ever want to not pass an entry, but
we use that to have "dividers" in the table. Makes sense.

> +}
> +
> +static void stats_table_add_count(struct stats_table *table, const char *name,
> +				  size_t value)
> +{
> +	struct stats_table_entry *entry;
> +
> +	CALLOC_ARRAY(entry, 1);
> +	entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> +	stats_table_add(table, name, entry);
> +}
> +
> +static void stats_table_setup(struct stats_table *table, struct stats *stats)
> +{
> +	size_t ref_total;
> +
> +	ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> +	stats_table_add(table, _("* References"), NULL);
> +	stats_table_add_count(table, _("  * Count"), ref_total);
> +	stats_table_add_count(table, _("    * Branches"), stats->branches);
> +	stats_table_add_count(table, _("    * Tags"), stats->tags);
> +	stats_table_add_count(table, _("    * Remotes"), stats->remotes);
> +	stats_table_add_count(table, _("    * Others"), stats->others);
> +}

Would it make sense to not translate the formatting directives, but only
the actual words?

> +static void stats_table_print(struct stats_table *table)
> +{
> +	const char *name_col_title = _("Repository stats");
> +	const char *value_col_title = _("Value");
> +	int name_col_width = strlen(name_col_title);
> +	int value_col_width = strlen(value_col_title);

These should both be `size_t`.

> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list_item *item;
> +
> +	if (table->name_col_width > name_col_width)
> +		name_col_width = table->name_col_width;
> +	if (table->value_col_width > value_col_width)
> +		value_col_width = table->value_col_width;
> +
> +	strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
> +		    value_col_width, value_col_title);

Aha, that's why you went with `int`. You can use `cast_size_to_to_int()`
to convert between the types.

> +	strbuf_addstr(&buf, "| ");
> +	strbuf_addchars(&buf, '-', name_col_width);
> +	strbuf_addstr(&buf, " | ");
> +	strbuf_addchars(&buf, '-', value_col_width);
> +	strbuf_addstr(&buf, " |\n");
> +
> +	for_each_string_list_item (item, &table->rows) {

We typically don't have a space after between the macro and its
arguments.

> +		struct stats_table_entry *entry = item->util;
> +		const char *value = "";
> +
> +		if (entry) {
> +			struct stats_table_entry *entry = item->util;
> +			value = entry->value;
> +		}
> +
> +		strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
> +			    item->string, value_col_width, value);
> +
> +		if (entry)
> +			free(entry->value);

It's a bit weird that we free the values when we pretend to only print
data. Sure, we probably don't ever have a usecase where we want to print
data a second time. But I still think it would be nice to separate
concerns.

> +	}
> +
> +	fputs(buf.buf, stdout);
> +	strbuf_release(&buf);
> +}
> +
> +static void stats_count_references(struct stats *stats, struct ref_array *refs)
> +{
> +	for (int i = 0; i < refs->nr; i++) {
> +		struct ref_array_item *ref = refs->items[i];
> +
> +		switch (ref->kind) {
> +		case FILTER_REFS_BRANCHES:
> +			stats->branches++;
> +			break;
> +		case FILTER_REFS_REMOTES:
> +			stats->remotes++;
> +			break;
> +		case FILTER_REFS_TAGS:
> +			stats->tags++;
> +			break;
> +		case FILTER_REFS_OTHERS:
> +			stats->others++;
> +			break;

Do we want to have a `default:` case where we `BUG()`? Otherwise we may
not notice that we undercount the overall number of refs.

> +		}
> +	}
> +}
> +
> +static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> +		      const char *prefix UNUSED, struct repository *repo UNUSED)

Not a new issue, but I'd rather call this `cmd_repo_stats()` to note
that this is the entrypoint. We might as well adapt the other subcommand
to follow that naming schema in a preparatory commit.

> +{
> +	struct ref_filter filter = REF_FILTER_INIT;
> +	struct strvec ref_patterns = STRVEC_INIT;
> +	struct stats_table table = { 0 };
> +	struct ref_array refs = { 0 };
> +	struct stats stats = { 0 };
> +
> +	filter.name_patterns = ref_patterns.v;
> +	filter_refs(&refs, &filter, FILTER_REFS_REGULAR);

`filter_refs()` may return an error code which we should probably
handle.

> diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> new file mode 100755
> index 0000000000..27c32ec45f
> --- /dev/null
> +++ b/t/t1901-repo-stats.sh
> @@ -0,0 +1,59 @@
> +#!/bin/sh
> +
> +test_description='test git repo stats'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'empty repository stats' '

Nit: I don't think it's necessary to repeat "repository stats" in every
test name. That's already clear from the test suite.

> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git repo stats >out 2>err &&
> +
> +		cat >expect <<-EOF &&

s/EOF/\EOF/, as we don't need to expand any variables.

> +		| Repository stats | Value |
> +		| ---------------- | ----- |
> +		| * References     |       |
> +		|   * Count        |     0 |
> +		|     * Branches   |     0 |
> +		|     * Tags       |     0 |
> +		|     * Remotes    |     0 |
> +		|     * Others     |     0 |
> +		EOF
> +
> +		test_cmp expect out &&
> +		test_line_count = 0 err
> +	)
> +'
> +
> +test_expect_success 'repository stats with references' '

Same comment regarding the name.

> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git commit --allow-empty -m init &&
> +		oid="$(git rev-parse HEAD)" &&
> +		git switch -c foo &&
> +		git tag init &&
> +		git update-ref refs/remotes/origin/foo "$oid" &&
> +		git notes add -m foo &&
> +		git repo stats >out 2>err &&
> +
> +		cat >expect <<-EOF &&

Likewise regarding "\EOF".

> +		| Repository stats | Value |
> +		| ---------------- | ----- |
> +		| * References     |       |
> +		|   * Count        |     5 |
> +		|     * Branches   |     2 |
> +		|     * Tags       |     1 |
> +		|     * Remotes    |     1 |
> +		|     * Others     |     1 |
> +		EOF
> +
> +		test_cmp expect out &&
> +		test_line_count = 0 err
> +	)
> +'

Patrick

  reply	other threads:[~2025-09-23 10:52 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-23  2:56 [PATCH 0/4] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-23  2:56 ` [PATCH 1/4] " Justin Tobler
2025-09-23 10:52   ` Patrick Steinhardt [this message]
2025-09-23 15:10     ` Justin Tobler
2025-09-23 15:26       ` Patrick Steinhardt
2025-09-23 15:22   ` Karthik Nayak
2025-09-23 15:55     ` Justin Tobler
2025-09-23  2:56 ` [PATCH 2/4] builtin/repo: add object counts in stats output Justin Tobler
2025-09-23 10:52   ` Patrick Steinhardt
2025-09-23 15:19     ` Justin Tobler
2025-09-23 15:30   ` Karthik Nayak
2025-09-23 15:56     ` Justin Tobler
2025-09-23  2:56 ` [PATCH 3/4] builtin/repo: add keyvalue format for stats Justin Tobler
2025-09-23 10:53   ` Patrick Steinhardt
2025-09-23 15:26     ` Justin Tobler
2025-09-23 15:39   ` Karthik Nayak
2025-09-23 15:59     ` Justin Tobler
2025-09-23  2:57 ` [PATCH 4/4] builtin/repo: add nul " Justin Tobler
2025-09-23 10:53   ` Patrick Steinhardt
2025-09-23 15:33     ` Justin Tobler
2025-09-24  4:48       ` Patrick Steinhardt
2025-09-23 15:41   ` Karthik Nayak
2025-09-23 16:02     ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-24 21:24   ` [PATCH v2 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-24 21:24   ` [PATCH v2 2/6] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-24 21:24   ` [PATCH v2 3/6] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25  5:38     ` Patrick Steinhardt
2025-09-25 13:01       ` Justin Tobler
2025-09-24 21:24   ` [PATCH v2 4/6] builtin/repo: add object counts in stats output Justin Tobler
2025-09-24 21:24   ` [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-25  5:39     ` Patrick Steinhardt
2025-09-25 13:16       ` Justin Tobler
2025-09-25 13:58         ` Patrick Steinhardt
2025-09-24 21:24   ` [PATCH v2 6/6] builtin/repo: add progress meter " Justin Tobler
2025-09-25  5:39     ` Patrick Steinhardt
2025-09-25 13:20       ` Justin Tobler
2025-09-25 23:29   ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 23:29     ` [PATCH v3 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-25 23:29     ` [PATCH v3 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-25 23:29     ` [PATCH v3 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
2025-09-25 23:29     ` [PATCH v3 4/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 23:51       ` Eric Sunshine
2025-09-26  1:38         ` Justin Tobler
2025-09-25 23:29     ` [PATCH v3 5/7] builtin/repo: add object counts in stats output Justin Tobler
2025-09-25 23:29     ` [PATCH v3 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-25 23:29     ` [PATCH v3 7/7] builtin/repo: add progress meter " Justin Tobler
2025-09-27 14:50     ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-27 14:50       ` [PATCH v4 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-27 14:50       ` [PATCH v4 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-27 14:50       ` [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
2025-09-27 15:40         ` Junio C Hamano
2025-09-27 15:51           ` Justin Tobler
2025-09-27 23:49             ` Junio C Hamano
2025-09-27 14:50       ` [PATCH v4 4/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-27 16:32         ` Junio C Hamano
2025-10-09 22:09           ` Justin Tobler
2025-10-10  0:42             ` Justin Tobler
2025-10-10  6:53               ` Patrick Steinhardt
2025-10-10 14:34                 ` Justin Tobler
2025-10-13  6:13                   ` Patrick Steinhardt
2025-09-27 14:50       ` [PATCH v4 5/7] builtin/repo: add object counts in stats output Justin Tobler
2025-09-27 14:50       ` [PATCH v4 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-27 14:50       ` [PATCH v4 7/7] builtin/repo: add progress meter " Justin Tobler
2025-09-27 16:33       ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Junio C Hamano
2025-10-15 21:12       ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-15 21:12         ` [PATCH v5 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-10-15 21:12         ` [PATCH v5 2/6] ref-filter: allow NULL filter pattern Justin Tobler
2025-10-15 21:12         ` [PATCH v5 3/6] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-16 10:58           ` Patrick Steinhardt
2025-10-21 16:04             ` Justin Tobler
2025-10-15 21:12         ` [PATCH v5 4/6] builtin/repo: add object counts in structure output Justin Tobler
2025-10-15 21:12         ` [PATCH v5 5/6] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
2025-10-15 21:12         ` [PATCH v5 6/6] builtin/repo: add progress meter " Justin Tobler
2025-10-21 18:25         ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-21 18:25           ` [PATCH v6 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-10-21 18:25           ` [PATCH v6 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-10-21 18:25           ` [PATCH v6 3/7] ref-filter: export ref_kind_from_refname() Justin Tobler
2025-10-21 18:25           ` [PATCH v6 4/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-22  5:01             ` Patrick Steinhardt
2025-10-22 13:50               ` Justin Tobler
2025-10-22 20:15             ` Lucas Seiki Oshiro
2025-10-22 23:42               ` Justin Tobler
2025-10-21 18:25           ` [PATCH v6 5/7] builtin/repo: add object counts in structure output Justin Tobler
2025-10-21 18:26           ` [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
2025-10-22 20:34             ` Lucas Seiki Oshiro
2025-10-23  0:03               ` Justin Tobler
2025-10-21 18:26           ` [PATCH v6 7/7] builtin/repo: add progress meter " Justin Tobler
2025-10-22 19:23           ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Lucas Seiki Oshiro
2025-10-23  0:05             ` Justin Tobler
2025-10-23 20:54           ` Junio C Hamano
2025-10-24  5:14             ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aNJ7_GoKT5ea4QJE@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).