From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, karthik.188@gmail.com
Subject: Re: [PATCH 2/4] builtin/repo: add object counts in stats output
Date: Tue, 23 Sep 2025 12:52:54 +0200
Message-ID: <aNJ8BvTZ_yNSrBA6@pks.im>
In-Reply-To: <20250923025700.3046260-3-jltobler@gmail.com>

On Mon, Sep 22, 2025 at 09:56:58PM -0500, Justin Tobler wrote:
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 7762329551..2a67abfca8 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -45,8 +45,9 @@ supported:
>  `-z` is an alias for `--format=nul`.
>  
>  stats::
> -	Retrieve stats about the current repository. All references in the
> -	repository are categorized and counted accordingly.
> +	Retrieve stats about the current repository. All references and
> +	reachable objects in the repository are categorized and counted
> +	accordingly.
>  +
>  The table output format may change and is not intended for machine parsing.

I already wanted to mention this on the first commit, but would it maybe
make sense if this was a bulleted list of information that we surface
right from the start? Then we don't have to reflow the whole paragraph
every time we surface new information.

> diff --git a/builtin/repo.c b/builtin/repo.c
> index 15899dd74c..a24ea0e66b 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -159,13 +161,25 @@ static int repo_info(int argc, const char **argv, const char *prefix,
>  	return print_fields(argc, argv, repo, format);
>  }
>  
> -struct stats {
> +struct ref_stats {

Nit: let's call it `ref_stats` right from the start instead of renaming.

>  	size_t branches;
>  	size_t remotes;
>  	size_t tags;
>  	size_t others;
>  };
>  
> +struct object_stats {
> +	size_t tags;
> +	size_t commits;
> +	size_t trees;
> +	size_t blobs;
> +};
> +
> +struct stats {

I'd maybe call this `struct repo_stats`. `stats` feels quite generic and
very close to a collision with `struct stat`.

> @@ -207,15 +221,27 @@ static void stats_table_add_count(struct stats_table *table, const char *name,
>  
>  static void stats_table_setup(struct stats_table *table, struct stats *stats)
>  {
> +	struct object_stats objects = stats->objects;
> +	struct ref_stats refs = stats->refs;

We can avoid the copies by making these pointers. Not that it'd really
matter all that much.
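
To illustrate, here is a minimal standalone sketch, not the actual patch.
The struct layouts are taken from the quoted diff, but the member order of
`struct stats` and the two helper names (`ref_total_of`, `object_total_of`)
are assumptions made up for this example. Each helper reads the nested
struct through a pointer instead of copying it into a local:

```c
#include <stddef.h>

/* Layouts as they appear in the quoted patch. */
struct ref_stats {
	size_t branches;
	size_t remotes;
	size_t tags;
	size_t others;
};

struct object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/* Assumption: the patch only shows that `refs` and `objects` exist. */
struct stats {
	struct ref_stats refs;
	struct object_stats objects;
};

/* Hypothetical helper: sums the ref counts without copying the struct. */
static size_t ref_total_of(const struct stats *stats)
{
	const struct ref_stats *refs = &stats->refs;

	return refs->branches + refs->remotes + refs->tags + refs->others;
}

/* Hypothetical helper: sums the object counts without copying the struct. */
static size_t object_total_of(const struct stats *stats)
{
	const struct object_stats *objects = &stats->objects;

	return objects->commits + objects->trees + objects->blobs +
	       objects->tags;
}
```

In `stats_table_setup()` the same idea would amount to declaring the two
locals as pointers and switching `refs.branches` to `refs->branches` and
so on.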

> +	size_t object_total;
>  	size_t ref_total;
>  
> -	ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> +	ref_total = refs.branches + refs.remotes + refs.tags + refs.others;
>  	stats_table_add(table, _("* References"), NULL);
>  	stats_table_add_count(table, _("  * Count"), ref_total);
> -	stats_table_add_count(table, _("    * Branches"), stats->branches);
> -	stats_table_add_count(table, _("    * Tags"), stats->tags);
> -	stats_table_add_count(table, _("    * Remotes"), stats->remotes);
> -	stats_table_add_count(table, _("    * Others"), stats->others);
> +	stats_table_add_count(table, _("    * Branches"), refs.branches);
> +	stats_table_add_count(table, _("    * Tags"), refs.tags);
> +	stats_table_add_count(table, _("    * Remotes"), refs.remotes);
> +	stats_table_add_count(table, _("    * Others"), refs.others);
> +
> +	object_total = objects.commits + objects.trees + objects.blobs + objects.tags;
> +	stats_table_add(table, "", NULL);
> +	stats_table_add(table, _("* Objects"), NULL);

Should we maybe say "Reachable objects" here to clarify that this
doesn't count unreachable ones?

> @@ -282,25 +308,80 @@ static void stats_count_references(struct stats *stats, struct ref_array *refs)
>  	}
>  }
>  
> +static int count_objects(const char *path UNUSED, struct oid_array *oids,
> +			 enum object_type type, void *data)
> +{
> +	struct object_stats *stats = data;
> +
> +	switch (type) {
> +	case OBJ_TAG:
> +		stats->tags += oids->nr;
> +		break;
> +	case OBJ_COMMIT:
> +		stats->commits += oids->nr;
> +		break;
> +	case OBJ_TREE:
> +		stats->trees += oids->nr;
> +		break;
> +	case OBJ_BLOB:
> +		stats->blobs += oids->nr;
> +		break;
> +	default:

Let's `BUG()` here. This case should never happen, and if it does
something is seriously wrong.
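
Roughly like the following standalone sketch. To keep it compilable
outside of Git, the `BUG()` macro, `enum object_type` and
`struct oid_array` below are simplified stand-ins written for this
example, not Git's real definitions, and the message text is only a
suggestion:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for Git's BUG(): print a message and abort. */
#define BUG(...) \
	do { \
		fprintf(stderr, "BUG: " __VA_ARGS__); \
		fputc('\n', stderr); \
		abort(); \
	} while (0)

/* Simplified stand-ins for the Git types used by the callback. */
enum object_type {
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4,
};

struct oid_array {
	size_t nr; /* number of object IDs in the batch */
};

struct object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

static int count_objects(const char *path, struct oid_array *oids,
			 enum object_type type, void *data)
{
	struct object_stats *stats = data;

	(void)path; /* unused, like `path UNUSED` in the patch */

	switch (type) {
	case OBJ_TAG:
		stats->tags += oids->nr;
		break;
	case OBJ_COMMIT:
		stats->commits += oids->nr;
		break;
	case OBJ_TREE:
		stats->trees += oids->nr;
		break;
	case OBJ_BLOB:
		stats->blobs += oids->nr;
		break;
	default:
		/* Fail loudly instead of silently dropping the batch. */
		BUG("unexpected object type %d", (int)type);
	}

	return 0;
}
```

With this, an unexpected type aborts immediately instead of producing
counts that are silently too low.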

> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +static void stats_count_objects(struct object_stats *stats,
> +				struct ref_array *refs, struct rev_info *revs)
> +{
> +	struct path_walk_info info = PATH_WALK_INFO_INIT;
> +
> +	info.revs = revs;
> +	info.path_fn = count_objects;
> +	info.path_fn_data = stats;
> +
> +	for (int i = 0; i < refs->nr; i++) {
> +		struct ref_array_item *ref = refs->items[i];
> +
> +		switch (ref->kind) {
> +		case FILTER_REFS_BRANCHES:
> +		case FILTER_REFS_TAGS:
> +		case FILTER_REFS_REMOTES:
> +		case FILTER_REFS_OTHERS:
> +			add_pending_oid(revs, NULL, &ref->objectname, 0);
> +			break;
> +		}
> +	}
> +
> +	walk_objects_by_path(&info);
> +	path_walk_info_clear(&info);
> +}

I guess this can take a while, so a progress meter would be great to
give the user some info about what's happening. It doesn't have to be
part of the first iteration though, as long as it is something we plan
to add at a later point.

> diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> index 27c32ec45f..c6a7f08be5 100755
> --- a/t/t1901-repo-stats.sh
> +++ b/t/t1901-repo-stats.sh
> @@ -20,6 +20,13 @@ test_expect_success 'empty repository stats' '
>  		|     * Tags       |     0 |
>  		|     * Remotes    |     0 |
>  		|     * Others     |     0 |
> +		|                  |       |
> +		| * Objects        |       |
> +		|   * Count        |     0 |
> +		|     * Commits    |     0 |
> +		|     * Trees      |     0 |
> +		|     * Blobs      |     0 |
> +		|     * Tags       |     0 |
>  		EOF
>  
>  		test_cmp expect out &&
> @@ -49,6 +56,45 @@ test_expect_success 'repository stats with references' '
>  		|     * Tags       |     1 |
>  		|     * Remotes    |     1 |
>  		|     * Others     |     1 |
> +		|                  |       |
> +		| * Objects        |       |
> +		|   * Count        |     5 |
> +		|     * Commits    |     2 |
> +		|     * Trees      |     2 |
> +		|     * Blobs      |     1 |
> +		|     * Tags       |     0 |
> +		EOF
> +
> +		test_cmp expect out &&
> +		test_line_count = 0 err
> +	)
> +'
> +
> +test_expect_success 'repository stats with objects' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit_bulk 42 &&
> +		git tag -a foo -m bar &&
> +		git repo stats >out 2>err &&
> +
> +		cat >expect <<-EOF &&
> +		| Repository stats | Value |
> +		| ---------------- | ----- |
> +		| * References     |       |
> +		|   * Count        |     2 |
> +		|     * Branches   |     1 |
> +		|     * Tags       |     1 |
> +		|     * Remotes    |     0 |
> +		|     * Others     |     0 |
> +		|                  |       |
> +		| * Objects        |       |
> +		|   * Count        |   127 |
> +		|     * Commits    |    42 |
> +		|     * Trees      |    42 |
> +		|     * Blobs      |    42 |
> +		|     * Tags       |     1 |
>  		EOF

I quite like the output format, by the way. It's nice to read and makes
it sufficiently clear that this is not expected to be parsed by a
machine.

Patrick
