From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, karthik.188@gmail.com
Subject: Re: [PATCH 2/4] builtin/repo: add object counts in stats output
Date: Tue, 23 Sep 2025 12:52:54 +0200
Message-ID: <aNJ8BvTZ_yNSrBA6@pks.im>
In-Reply-To: <20250923025700.3046260-3-jltobler@gmail.com>
On Mon, Sep 22, 2025 at 09:56:58PM -0500, Justin Tobler wrote:
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 7762329551..2a67abfca8 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -45,8 +45,9 @@ supported:
> `-z` is an alias for `--format=nul`.
>
> stats::
> - Retrieve stats about the current repository. All references in the
> - repository are categorized and counted accordingly.
> + Retrieve stats about the current repository. All references and
> + reachable objects in the repository are categorized and counted
> + accordingly.
> +
> The table output format may change and is not intended for machine parsing.
I meant to mention this on the first commit already: would it make
sense to turn this into a bulleted list of the information we surface
right from the start? Then we wouldn't have to reflow the whole
paragraph every time we surface new information.
> diff --git a/builtin/repo.c b/builtin/repo.c
> index 15899dd74c..a24ea0e66b 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -159,13 +161,25 @@ static int repo_info(int argc, const char **argv, const char *prefix,
>  	return print_fields(argc, argv, repo, format);
>  }
>
> -struct stats {
> +struct ref_stats {
Nit: let's call it `ref_stats` right from the start instead of renaming.
>  	size_t branches;
>  	size_t remotes;
>  	size_t tags;
>  	size_t others;
>  };
>
> +struct object_stats {
> +	size_t tags;
> +	size_t commits;
> +	size_t trees;
> +	size_t blobs;
> +};
> +
> +struct stats {
I'd maybe call this `struct repo_stats`. `stats` feels quite generic and
very close to a collision with `struct stat`.
> @@ -207,15 +221,27 @@ static void stats_table_add_count(struct stats_table *table, const char *name,
>
>  static void stats_table_setup(struct stats_table *table, struct stats *stats)
>  {
> +	struct object_stats objects = stats->objects;
> +	struct ref_stats refs = stats->refs;
We can avoid the copies by making these pointers. Not that it'd really
matter all that much.
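For illustration, a minimal standalone sketch of the pointer variant.
The `repo_stats` name and the `refs`/`objects` member layout are
assumptions inferred from the hunks above, and `ref_total()` and
`object_total()` are hypothetical helpers, not functions from the patch:

```c
#include <stddef.h>

struct ref_stats {
	size_t branches;
	size_t remotes;
	size_t tags;
	size_t others;
};

struct object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/* Assumed layout: one member per category, as the hunk above implies. */
struct repo_stats {
	struct ref_stats refs;
	struct object_stats objects;
};

/* Hypothetical helper: sums all reference counts through a pointer. */
static size_t ref_total(const struct repo_stats *stats)
{
	const struct ref_stats *refs = &stats->refs; /* no struct copy */

	return refs->branches + refs->remotes + refs->tags + refs->others;
}

/* Hypothetical helper: sums all object counts through a pointer. */
static size_t object_total(const struct repo_stats *stats)
{
	const struct object_stats *objects = &stats->objects; /* no struct copy */

	return objects->commits + objects->trees + objects->blobs +
	       objects->tags;
}
```

The only change relative to the patch is `&stats->refs` and
`&stats->objects` plus `->` at the use sites, so the two small structs
are no longer copied onto the stack.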
> +	size_t object_total;
>  	size_t ref_total;
>
> -	ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> +	ref_total = refs.branches + refs.remotes + refs.tags + refs.others;
>  	stats_table_add(table, _("* References"), NULL);
>  	stats_table_add_count(table, _(" * Count"), ref_total);
> -	stats_table_add_count(table, _(" * Branches"), stats->branches);
> -	stats_table_add_count(table, _(" * Tags"), stats->tags);
> -	stats_table_add_count(table, _(" * Remotes"), stats->remotes);
> -	stats_table_add_count(table, _(" * Others"), stats->others);
> +	stats_table_add_count(table, _(" * Branches"), refs.branches);
> +	stats_table_add_count(table, _(" * Tags"), refs.tags);
> +	stats_table_add_count(table, _(" * Remotes"), refs.remotes);
> +	stats_table_add_count(table, _(" * Others"), refs.others);
> +
> +	object_total = objects.commits + objects.trees + objects.blobs + objects.tags;
> +	stats_table_add(table, "", NULL);
> +	stats_table_add(table, _("* Objects"), NULL);
Should we maybe say "Reachable objects" here to clarify that this
doesn't count unreachable ones?
> @@ -282,25 +308,80 @@ static void stats_count_references(struct stats *stats, struct ref_array *refs)
> }
> }
>
> +static int count_objects(const char *path UNUSED, struct oid_array *oids,
> +			 enum object_type type, void *data)
> +{
> +	struct object_stats *stats = data;
> +
> +	switch (type) {
> +	case OBJ_TAG:
> +		stats->tags += oids->nr;
> +		break;
> +	case OBJ_COMMIT:
> +		stats->commits += oids->nr;
> +		break;
> +	case OBJ_TREE:
> +		stats->trees += oids->nr;
> +		break;
> +	case OBJ_BLOB:
> +		stats->blobs += oids->nr;
> +		break;
> +	default:
Let's `BUG()` here. This case should never happen, and if it does,
something is seriously wrong.
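Roughly like the sketch below. It is standalone so that it compiles
outside of git: the `BUG()` macro and `enum object_type` are local
stand-ins for git's own definitions, and `count_by_type()` is a
hypothetical simplification of `count_objects()` that takes the count
directly instead of an `oid_array`:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Local stand-in for git's BUG(): report and abort. */
#define BUG(...) do { \
	fprintf(stderr, "BUG: " __VA_ARGS__); \
	fputc('\n', stderr); \
	abort(); \
} while (0)

/* Local stand-in for the object types handled by the callback. */
enum object_type {
	OBJ_COMMIT = 1,
	OBJ_TREE,
	OBJ_BLOB,
	OBJ_TAG,
};

struct object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/* Hypothetical simplification of count_objects(). */
static int count_by_type(struct object_stats *stats, enum object_type type,
			 size_t nr)
{
	switch (type) {
	case OBJ_TAG:
		stats->tags += nr;
		break;
	case OBJ_COMMIT:
		stats->commits += nr;
		break;
	case OBJ_TREE:
		stats->trees += nr;
		break;
	case OBJ_BLOB:
		stats->blobs += nr;
		break;
	default:
		/* Unknown types are a programming error, so fail loudly. */
		BUG("unexpected object type %d", (int)type);
	}

	return 0;
}
```

With a silent `break`, an unexpected type would simply be left out of
the totals; aborting makes such a mistake visible right away.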
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +static void stats_count_objects(struct object_stats *stats,
> +				struct ref_array *refs, struct rev_info *revs)
> +{
> +	struct path_walk_info info = PATH_WALK_INFO_INIT;
> +
> +	info.revs = revs;
> +	info.path_fn = count_objects;
> +	info.path_fn_data = stats;
> +
> +	for (int i = 0; i < refs->nr; i++) {
> +		struct ref_array_item *ref = refs->items[i];
> +
> +		switch (ref->kind) {
> +		case FILTER_REFS_BRANCHES:
> +		case FILTER_REFS_TAGS:
> +		case FILTER_REFS_REMOTES:
> +		case FILTER_REFS_OTHERS:
> +			add_pending_oid(revs, NULL, &ref->objectname, 0);
> +			break;
> +		}
> +	}
> +
> +	walk_objects_by_path(&info);
> +	path_walk_info_clear(&info);
> +}
I guess this can take a while, so a progress meter would be great to
let the user know what's happening. It doesn't have to be part of the
first iteration though, as long as this is something we plan to add at
a later point.
> diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> index 27c32ec45f..c6a7f08be5 100755
> --- a/t/t1901-repo-stats.sh
> +++ b/t/t1901-repo-stats.sh
> @@ -20,6 +20,13 @@ test_expect_success 'empty repository stats' '
> | * Tags | 0 |
> | * Remotes | 0 |
> | * Others | 0 |
> + | | |
> + | * Objects | |
> + | * Count | 0 |
> + | * Commits | 0 |
> + | * Trees | 0 |
> + | * Blobs | 0 |
> + | * Tags | 0 |
> EOF
>
> test_cmp expect out &&
> @@ -49,6 +56,45 @@ test_expect_success 'repository stats with references' '
> | * Tags | 1 |
> | * Remotes | 1 |
> | * Others | 1 |
> + | | |
> + | * Objects | |
> + | * Count | 5 |
> + | * Commits | 2 |
> + | * Trees | 2 |
> + | * Blobs | 1 |
> + | * Tags | 0 |
> + EOF
> +
> + test_cmp expect out &&
> + test_line_count = 0 err
> + )
> +'
> +
> +test_expect_success 'repository stats with objects' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + test_commit_bulk 42 &&
> + git tag -a foo -m bar &&
> + git repo stats >out 2>err &&
> +
> + cat >expect <<-EOF &&
> + | Repository stats | Value |
> + | ---------------- | ----- |
> + | * References | |
> + | * Count | 2 |
> + | * Branches | 1 |
> + | * Tags | 1 |
> + | * Remotes | 0 |
> + | * Others | 0 |
> + | | |
> + | * Objects | |
> + | * Count | 127 |
> + | * Commits | 42 |
> + | * Trees | 42 |
> + | * Blobs | 42 |
> + | * Tags | 1 |
> EOF
I quite like the output format, by the way. It's nice to read and makes
it sufficiently clear that this is not expected to be parsed by a
machine.
Patrick