From: Junio C Hamano <gitster@pobox.com>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, ps@pks.im, karthik.188@gmail.com,
sunshine@sunshineco.com, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v4 4/7] builtin/repo: introduce stats subcommand
Date: Sat, 27 Sep 2025 09:32:55 -0700 [thread overview]
Message-ID: <xmqqfrc797pk.fsf@gitster.g> (raw)
In-Reply-To: <20250927145049.723341-5-jltobler@gmail.com> (Justin Tobler's message of "Sat, 27 Sep 2025 09:50:46 -0500")
Justin Tobler <jltobler@gmail.com> writes:
> The shape of a repository's history can have huge impacts on the
> performance and health of the repository itself. Currently, Git lacks a
> means to surface key stats/information regarding the shape of a
> repository via a single command.
Talking about "shape of a repository's *history*" may negatively
affect your goal here. If a project is overly mergy with many
octopus merges, it would have huge impacts on the performance to run
"git bisect" over its history, so it may be interesting to know the
ratio of the merge commits in the total commits, and also the
average number of merge parents. But after you obtain such numbers,
you cannot do anything about it, as you cannot afford to rewrite its
history only to improve the "performance and health".
And that is what makes "key stats" relative to your goal. If your
goal is to give stats on the things you can control (e.g., how long
a typical delta chain is, how many loose objects there are that can
be moved to a packfile, how small would your object database would
become if you prune all the unreachable objects), that would cut off
some stats that may still be interesting but may not contribute to
address "huge impacts on the performance and health".
With Devil's advocate hat on, a single command that gives a set of
stats that are "key" to a goal of a single use case may not be as
useful as a collection of commands, each of which gives stats on one
aspect of the repository, that can be combined to help you address
various different goals.
> To allow users to more readily identify potential issues for a
> repository, introduce the "stats" subcommand in git-repo(1) to output
> stats for the repository that may be of interest to users. The goal of
> this subcommand is to eventually provide similar functionality to
> git-sizer(1), but natively in Git.
So, it is needless to say that the kind of "stats" obtained by such
a single tool needs to be chosen carefully, but more importantly,
its output should give users actionable output, as whoever designed
such a tool and chose what "key stats" are has a clear idea on
various aspects of repository. "stats" measure the health of the
repository against certain yardstick, but it should come with a
clear instruction to make use of that measurement. The tool may say
"the stats indicate that you have commits that touch too many paths
at the same time". The users need to be know what consequence of
that finding is, and what they can do about it.
For example, what would the user do with the new knowledge that the
repository has 100x as many local branches as there are
remote-tracking branches? Without breaking down these numerous
local branches into those that are still used in active development
(hint: peek into their reflog), kept as historical landmarks, past
development that has already been merged (hint: "git branch --list
--merged origin/master"), or abandoned cruft that hasn't been
touched with some changes that are not merged anywhere, the users
would not know what to do.
> +`stats`::
> + Retrieve statistics about the current repository. The following kinds
> + of information are reported:
> ++
> +* Reference counts categorized by type
> +
> ++
> +The table output format may change and is not intended for machine parsing.
Do we eventually want to give another format that is intended for
machine parsing?
In a format meant for human consumption, is it still sensible to
target fixed-column terminals these days? Rather, would they want
prettier-formatted html, or csv that they can easily import to
spreadsheet? (these are not objections but genuine questions).
> +static void stats_table_vaddf(struct stats_table *table,
> + struct stats_table_entry *entry,
> + const char *format, va_list ap)
> +{
> + struct strbuf buf = STRBUF_INIT;
> + struct string_list_item *item;
> + char *formatted_name;
> + size_t name_width;
> +
> + strbuf_vaddf(&buf, format, ap);
> + formatted_name = strbuf_detach(&buf, NULL);
> + name_width = utf8_strwidth(formatted_name);
> +
> + item = string_list_append_nodup(&table->rows, formatted_name);
> + item->util = entry;
> +
> + if (name_width > table->name_col_width)
> + table->name_col_width = name_width;
> + if (entry) {
> + size_t value_width = utf8_strwidth(entry->value);
> + if (value_width > table->value_col_width)
> + table->value_col_width = value_width;
> + }
> +}
OK, accumulate while measuring, so that you can compute the max
width of these things before writing them out, which is quite
bog-standard way to collect data.
> +static inline size_t max_size_t(size_t a, size_t b)
> +{
> + return (a > b) ? a : b;
> +}
Heh.
> +static void stats_table_print(const struct stats_table *table)
> +{
> + const char *name_col_title = _("Repository stats");
> + const char *value_col_title = _("Value");
> + size_t name_title_len = utf8_strwidth(name_col_title);
> + size_t value_title_len = utf8_strwidth(value_col_title);
> + struct string_list_item *item;
> + int name_col_width;
> + int value_col_width;
> +
> + name_col_width = cast_size_t_to_int(
> + max_size_t(table->name_col_width, name_title_len));
> + value_col_width = cast_size_t_to_int(
> + max_size_t(table->value_col_width, value_title_len));
If table->name_col_width and table->value_col_width were int to
begin with, none of these casts would have been necessary. Aren't
we overusing size_t to count things that are not memory allocations?
> + printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
> + value_col_width, value_col_title);
> + printf("| ");
> + for (int i = 0; i < name_col_width; i++)
> + putchar('-');
> + printf(" | ");
> + for (int i = 0; i < value_col_width; i++)
> + putchar('-');
> + printf(" |\n");
I wonder if people want to use unicode "Box Drawing" block and other
fancier things, as we assume utf8 for names and values, in which
case these printf would need to be "translatable", but locale
administrators should not have more say than others what kind of
line drawing elements are to be used, so perhaps the above is good
enough at least for now.
next prev parent reply other threads:[~2025-09-27 16:32 UTC|newest]
Thread overview: 92+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-23 2:56 [PATCH 0/4] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-23 2:56 ` [PATCH 1/4] " Justin Tobler
2025-09-23 10:52 ` Patrick Steinhardt
2025-09-23 15:10 ` Justin Tobler
2025-09-23 15:26 ` Patrick Steinhardt
2025-09-23 15:22 ` Karthik Nayak
2025-09-23 15:55 ` Justin Tobler
2025-09-23 2:56 ` [PATCH 2/4] builtin/repo: add object counts in stats output Justin Tobler
2025-09-23 10:52 ` Patrick Steinhardt
2025-09-23 15:19 ` Justin Tobler
2025-09-23 15:30 ` Karthik Nayak
2025-09-23 15:56 ` Justin Tobler
2025-09-23 2:56 ` [PATCH 3/4] builtin/repo: add keyvalue format for stats Justin Tobler
2025-09-23 10:53 ` Patrick Steinhardt
2025-09-23 15:26 ` Justin Tobler
2025-09-23 15:39 ` Karthik Nayak
2025-09-23 15:59 ` Justin Tobler
2025-09-23 2:57 ` [PATCH 4/4] builtin/repo: add nul " Justin Tobler
2025-09-23 10:53 ` Patrick Steinhardt
2025-09-23 15:33 ` Justin Tobler
2025-09-24 4:48 ` Patrick Steinhardt
2025-09-23 15:41 ` Karthik Nayak
2025-09-23 16:02 ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-24 21:24 ` [PATCH v2 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-24 21:24 ` [PATCH v2 2/6] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-24 21:24 ` [PATCH v2 3/6] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 5:38 ` Patrick Steinhardt
2025-09-25 13:01 ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 4/6] builtin/repo: add object counts in stats output Justin Tobler
2025-09-24 21:24 ` [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-25 5:39 ` Patrick Steinhardt
2025-09-25 13:16 ` Justin Tobler
2025-09-25 13:58 ` Patrick Steinhardt
2025-09-24 21:24 ` [PATCH v2 6/6] builtin/repo: add progress meter " Justin Tobler
2025-09-25 5:39 ` Patrick Steinhardt
2025-09-25 13:20 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 23:29 ` [PATCH v3 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-25 23:29 ` [PATCH v3 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-25 23:29 ` [PATCH v3 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
2025-09-25 23:29 ` [PATCH v3 4/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 23:51 ` Eric Sunshine
2025-09-26 1:38 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 5/7] builtin/repo: add object counts in stats output Justin Tobler
2025-09-25 23:29 ` [PATCH v3 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-25 23:29 ` [PATCH v3 7/7] builtin/repo: add progress meter " Justin Tobler
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-27 14:50 ` [PATCH v4 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-27 14:50 ` [PATCH v4 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-27 14:50 ` [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
2025-09-27 15:40 ` Junio C Hamano
2025-09-27 15:51 ` Justin Tobler
2025-09-27 23:49 ` Junio C Hamano
2025-09-27 14:50 ` [PATCH v4 4/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-27 16:32 ` Junio C Hamano [this message]
2025-10-09 22:09 ` Justin Tobler
2025-10-10 0:42 ` Justin Tobler
2025-10-10 6:53 ` Patrick Steinhardt
2025-10-10 14:34 ` Justin Tobler
2025-10-13 6:13 ` Patrick Steinhardt
2025-09-27 14:50 ` [PATCH v4 5/7] builtin/repo: add object counts in stats output Justin Tobler
2025-09-27 14:50 ` [PATCH v4 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-27 14:50 ` [PATCH v4 7/7] builtin/repo: add progress meter " Justin Tobler
2025-09-27 16:33 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Junio C Hamano
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-15 21:12 ` [PATCH v5 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-10-15 21:12 ` [PATCH v5 2/6] ref-filter: allow NULL filter pattern Justin Tobler
2025-10-15 21:12 ` [PATCH v5 3/6] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-16 10:58 ` Patrick Steinhardt
2025-10-21 16:04 ` Justin Tobler
2025-10-15 21:12 ` [PATCH v5 4/6] builtin/repo: add object counts in structure output Justin Tobler
2025-10-15 21:12 ` [PATCH v5 5/6] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
2025-10-15 21:12 ` [PATCH v5 6/6] builtin/repo: add progress meter " Justin Tobler
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-21 18:25 ` [PATCH v6 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-10-21 18:25 ` [PATCH v6 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-10-21 18:25 ` [PATCH v6 3/7] ref-filter: export ref_kind_from_refname() Justin Tobler
2025-10-21 18:25 ` [PATCH v6 4/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-22 5:01 ` Patrick Steinhardt
2025-10-22 13:50 ` Justin Tobler
2025-10-22 20:15 ` Lucas Seiki Oshiro
2025-10-22 23:42 ` Justin Tobler
2025-10-21 18:25 ` [PATCH v6 5/7] builtin/repo: add object counts in structure output Justin Tobler
2025-10-21 18:26 ` [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
2025-10-22 20:34 ` Lucas Seiki Oshiro
2025-10-23 0:03 ` Justin Tobler
2025-10-21 18:26 ` [PATCH v6 7/7] builtin/repo: add progress meter " Justin Tobler
2025-10-22 19:23 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Lucas Seiki Oshiro
2025-10-23 0:05 ` Justin Tobler
2025-10-23 20:54 ` Junio C Hamano
2025-10-24 5:14 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqfrc797pk.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
--cc=karthik.188@gmail.com \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).