* [PATCH 0/4] builtin/repo: introduce stats subcommand
@ 2025-09-23 2:56 Justin Tobler
2025-09-23 2:56 ` [PATCH 1/4] " Justin Tobler
` (4 more replies)
0 siblings, 5 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 2:56 UTC
To: git; +Cc: karthik.188, Justin Tobler
Greetings,
The shape of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface key stats/information regarding the shape of a
repository via a single command. Acquiring this information requires
users to be fairly knowledgeable about the structure of a Git repository
and how to identify the relevant data points. To fill this gap,
supplemental tools such as git-sizer(1) have been developed.
To allow users to more readily identify potential issues for a
repository, introduce the "stats" subcommand in git-repo(1) to output
stats for the repository that may be of interest to users. The goal of
this subcommand is to eventually provide similar functionality to
git-sizer(1), but in Git natively.
In this initial version, the "stats" subcommand only surfaces counts of
the various reference and object types in a repository. In a follow-up
series, I would like to introduce additional data points that are
present in git-sizer(1) such as largest objects, combined object sizes
by type, and other general repository shape information.
Some other general features that would be nice to introduce eventually:
- A "level of concern" meter for reported stats. This could indicate to
  users which stats may be worth looking into further.
- Links to OIDs of interesting objects that correspond to certain stats.
- Options to limit which references to use when evaluating the
  repository.
- A progress meter to provide better user feedback while the repository
  is being evaluated.
Thanks,
-Justin
Justin Tobler (4):
builtin/repo: introduce stats subcommand
builtin/repo: add object counts in stats output
builtin/repo: add keyvalue format for stats
builtin/repo: add nul format for stats
Documentation/git-repo.adoc | 22 +++
builtin/repo.c | 287 ++++++++++++++++++++++++++++++++++++
t/meson.build | 1 +
t/t1901-repo-stats.sh | 157 ++++++++++++++++++++
4 files changed, 467 insertions(+)
create mode 100755 t/t1901-repo-stats.sh
base-commit: ca2559c1d630eb4f04cdee2328aaf1c768907a9e
--
2.51.0.193.g4975ec3473b
* [PATCH 1/4] builtin/repo: introduce stats subcommand
2025-09-23 2:56 [PATCH 0/4] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-23 2:56 ` Justin Tobler
2025-09-23 10:52 ` Patrick Steinhardt
2025-09-23 15:22 ` Karthik Nayak
2025-09-23 2:56 ` [PATCH 2/4] builtin/repo: add object counts in stats output Justin Tobler
` (3 subsequent siblings)
4 siblings, 2 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 2:56 UTC
To: git; +Cc: karthik.188, Justin Tobler
The shape of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface key stats/information regarding the shape of a
repository via a single command. Acquiring this information requires
users to be fairly knowledgeable about the structure of a Git repository
and how to identify the relevant data points. To fill this gap,
supplemental tools such as git-sizer(1) have been developed.
To allow users to more readily identify potential issues for a
repository, introduce the "stats" subcommand in git-repo(1) to output
stats for the repository that may be of interest to users. The goal of
this subcommand is to eventually provide similar functionality to
git-sizer(1), but in Git natively.
The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remotes, and
other reference types. The corresponding information is displayed in a
human-friendly table formatted in a very similar manner to git-sizer(1).
The width of each table column is adjusted automatically to satisfy the
requirements of the widest row contained.
Subsequent commits will surface additional relevant data points to
output.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
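For readers who want to cross-check the reference counts, here is a
minimal shell sketch of the same bucketing. It is not part of this
patch. The helper name count_ref_kinds is hypothetical, and the
prefix-to-bucket mapping (refs/heads/, refs/tags/, refs/remotes/,
everything else) is an assumption about how the ref-filter kinds used
above correspond to ref names.

```shell
# Hypothetical helper, not part of the patch: tally full ref names read
# from stdin into the four buckets reported by "git repo stats".
# Assumption: the buckets correspond to these ref name prefixes.
count_ref_kinds () {
	branches=0 tags=0 remotes=0 others=0
	while IFS= read -r ref
	do
		case "$ref" in
		refs/heads/*)   branches=$((branches + 1)) ;;
		refs/tags/*)    tags=$((tags + 1)) ;;
		refs/remotes/*) remotes=$((remotes + 1)) ;;
		*)              others=$((others + 1)) ;;
		esac
	done
	printf 'branches=%d tags=%d remotes=%d others=%d\n' \
		"$branches" "$tags" "$remotes" "$others"
}
```

It could be fed from an existing command, for example
"git for-each-ref --format='%(refname)' | count_ref_kinds".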
Documentation/git-repo.adoc | 7 ++
builtin/repo.c | 151 ++++++++++++++++++++++++++++++++++++
t/meson.build | 1 +
t/t1901-repo-stats.sh | 59 ++++++++++++++
4 files changed, 218 insertions(+)
create mode 100755 t/t1901-repo-stats.sh
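The column sizing described in the commit message can be illustrated
outside of C as well. The following is only a sketch under stated
assumptions, not the implementation in this patch: render_table is a
hypothetical helper that reads "name<TAB>value" rows from stdin, takes
the widest cell (or the column title, if wider) as the column width,
left-aligns names and right-aligns values.

```shell
# Hypothetical helper, not part of the patch: render "name<TAB>value"
# rows from stdin as a table whose columns fit the widest cell.
render_table () {
	awk -F '\t' '
	function dashes(n,    s) {
		s = ""
		while (n-- > 0)
			s = s "-"
		return s
	}
	BEGIN {
		name_title = "Repository stats"
		value_title = "Value"
		# Column titles set the minimum width of each column.
		name_w = length(name_title)
		value_w = length(value_title)
	}
	{
		# First pass: remember every row and track the widest cell.
		names[NR] = $1
		values[NR] = $2
		if (length($1) > name_w)
			name_w = length($1)
		if (length($2) > value_w)
			value_w = length($2)
	}
	END {
		# Second pass: print with the final widths baked into the format.
		header_fmt = "| %-" name_w "s | %-" value_w "s |\n"
		row_fmt = "| %-" name_w "s | %" value_w "s |\n"
		printf header_fmt, name_title, value_title
		printf "| %s | %s |\n", dashes(name_w), dashes(value_w)
		for (i = 1; i <= NR; i++)
			printf row_fmt, names[i], values[i]
	}'
}
```

Buffering all rows before printing is what allows the header to be
sized correctly; the patch does the same by collecting rows in a
string_list before calling stats_table_print().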
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 209afd1b61..7762329551 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,6 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
+git repo stats
DESCRIPTION
-----------
@@ -43,6 +44,12 @@ supported:
+
`-z` is an alias for `--format=nul`.
+stats::
+ Retrieve stats about the current repository. All references in the
+ repository are categorized and counted accordingly.
++
+The table output format may change and is not intended for machine parsing.
+
INFO KEYS
---------
In order to obtain a set of values from `git repo info`, you should provide
diff --git a/builtin/repo.c b/builtin/repo.c
index bbb0966f2d..15899dd74c 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,12 +4,15 @@
#include "environment.h"
#include "parse-options.h"
#include "quote.h"
+#include "ref-filter.h"
#include "refs.h"
#include "strbuf.h"
+#include "string-list.h"
#include "shallow.h"
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
+ "git repo stats",
NULL
};
@@ -156,12 +159,160 @@ static int repo_info(int argc, const char **argv, const char *prefix,
return print_fields(argc, argv, repo, format);
}
+struct stats {
+ size_t branches;
+ size_t remotes;
+ size_t tags;
+ size_t others;
+};
+
+struct stats_table {
+ struct string_list rows;
+
+ int name_col_width;
+ int value_col_width;
+};
+
+struct stats_table_entry {
+ char *value;
+};
+
+static void stats_table_add(struct stats_table *table, const char *name,
+ struct stats_table_entry *entry)
+{
+ int name_width = strlen(name);
+ struct string_list_item *item;
+
+ item = string_list_append(&table->rows, name);
+ item->util = entry;
+
+ if (name_width > table->name_col_width)
+ table->name_col_width = name_width;
+ if (entry) {
+ int value_width = strlen(entry->value);
+ if (value_width > table->value_col_width)
+ table->value_col_width = value_width;
+ }
+}
+
+static void stats_table_add_count(struct stats_table *table, const char *name,
+ size_t value)
+{
+ struct stats_table_entry *entry;
+
+ CALLOC_ARRAY(entry, 1);
+ entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+ stats_table_add(table, name, entry);
+}
+
+static void stats_table_setup(struct stats_table *table, struct stats *stats)
+{
+ size_t ref_total;
+
+ ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
+ stats_table_add(table, _("* References"), NULL);
+ stats_table_add_count(table, _(" * Count"), ref_total);
+ stats_table_add_count(table, _(" * Branches"), stats->branches);
+ stats_table_add_count(table, _(" * Tags"), stats->tags);
+ stats_table_add_count(table, _(" * Remotes"), stats->remotes);
+ stats_table_add_count(table, _(" * Others"), stats->others);
+}
+
+static void stats_table_print(struct stats_table *table)
+{
+ const char *name_col_title = _("Repository stats");
+ const char *value_col_title = _("Value");
+ int name_col_width = strlen(name_col_title);
+ int value_col_width = strlen(value_col_title);
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+
+ if (table->name_col_width > name_col_width)
+ name_col_width = table->name_col_width;
+ if (table->value_col_width > value_col_width)
+ value_col_width = table->value_col_width;
+
+ strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
+ value_col_width, value_col_title);
+ strbuf_addstr(&buf, "| ");
+ strbuf_addchars(&buf, '-', name_col_width);
+ strbuf_addstr(&buf, " | ");
+ strbuf_addchars(&buf, '-', value_col_width);
+ strbuf_addstr(&buf, " |\n");
+
+ for_each_string_list_item (item, &table->rows) {
+ struct stats_table_entry *entry = item->util;
+ const char *value = "";
+
+ if (entry) {
+ /* 'entry' already holds item->util; do not redeclare it here. */
+ value = entry->value;
+ }
+
+ strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
+ item->string, value_col_width, value);
+
+ if (entry)
+ free(entry->value);
+ }
+
+ fputs(buf.buf, stdout);
+ strbuf_release(&buf);
+}
+
+static void stats_count_references(struct stats *stats, struct ref_array *refs)
+{
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ stats->branches++;
+ break;
+ case FILTER_REFS_REMOTES:
+ stats->remotes++;
+ break;
+ case FILTER_REFS_TAGS:
+ stats->tags++;
+ break;
+ case FILTER_REFS_OTHERS:
+ stats->others++;
+ break;
+ }
+ }
+}
+
+static int repo_stats(int argc UNUSED, const char **argv UNUSED,
+ const char *prefix UNUSED, struct repository *repo UNUSED)
+{
+ struct ref_filter filter = REF_FILTER_INIT;
+ struct strvec ref_patterns = STRVEC_INIT;
+ struct stats_table table = { 0 };
+ struct ref_array refs = { 0 };
+ struct stats stats = { 0 };
+
+ filter.name_patterns = ref_patterns.v;
+ filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
+
+ stats_count_references(&stats, &refs);
+
+ stats_table_setup(&table, &stats);
+ stats_table_print(&table);
+
+ string_list_clear(&table.rows, 1);
+ strvec_clear(&ref_patterns);
+ ref_array_clear(&refs);
+
+ return 0;
+}
+
int cmd_repo(int argc, const char **argv, const char *prefix,
struct repository *repo)
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
OPT_SUBCOMMAND("info", &fn, repo_info),
+ OPT_SUBCOMMAND("stats", &fn, repo_stats),
OPT_END()
};
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..071d4a5112 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -236,6 +236,7 @@ integration_tests = [
't1701-racy-split-index.sh',
't1800-hook.sh',
't1900-repo.sh',
+ 't1901-repo-stats.sh',
't2000-conflict-when-checking-files-out.sh',
't2002-checkout-cache-u.sh',
't2003-checkout-cache-mkdir.sh',
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
new file mode 100755
index 0000000000..27c32ec45f
--- /dev/null
+++ b/t/t1901-repo-stats.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+test_description='test git repo stats'
+
+. ./test-lib.sh
+
+test_expect_success 'empty repository stats' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git repo stats >out 2>err &&
+
+ cat >expect <<-EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ EOF
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_expect_success 'repository stats with references' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m init &&
+ oid="$(git rev-parse HEAD)" &&
+ git switch -c foo &&
+ git tag init &&
+ git update-ref refs/remotes/origin/foo "$oid" &&
+ git notes add -m foo &&
+ git repo stats >out 2>err &&
+
+ cat >expect <<-EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 5 |
+ | * Branches | 2 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ EOF
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_done
--
2.51.0.193.g4975ec3473b
* [PATCH 2/4] builtin/repo: add object counts in stats output
2025-09-23 2:56 [PATCH 0/4] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-23 2:56 ` [PATCH 1/4] " Justin Tobler
@ 2025-09-23 2:56 ` Justin Tobler
2025-09-23 10:52 ` Patrick Steinhardt
2025-09-23 15:30 ` Karthik Nayak
2025-09-23 2:56 ` [PATCH 3/4] builtin/repo: add keyvalue format for stats Justin Tobler
` (2 subsequent siblings)
4 siblings, 2 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 2:56 UTC
To: git; +Cc: karthik.188, Justin Tobler
The number of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
number of reachable objects in the repository by object type. All
regular references are used to determine the reachable set of objects.
The object counts are appended to the same table containing the
reference information.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
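As a rough way to sanity-check the object counts with existing
plumbing, here is a small shell sketch. It is not part of this patch
and does not use the path-walk API. The helper name count_object_types
is hypothetical; it simply tallies object type names, one per line,
read from stdin.

```shell
# Hypothetical helper, not part of the patch: tally object type names
# ("commit", "tree", "blob", "tag"), one per line on stdin.
count_object_types () {
	commits=0 trees=0 blobs=0 tags=0
	while IFS= read -r type
	do
		case "$type" in
		commit) commits=$((commits + 1)) ;;
		tree)   trees=$((trees + 1)) ;;
		blob)   blobs=$((blobs + 1)) ;;
		tag)    tags=$((tags + 1)) ;;
		esac
	done
	total=$((commits + trees + blobs + tags))
	printf 'count=%d commits=%d trees=%d blobs=%d tags=%d\n' \
		"$total" "$commits" "$trees" "$blobs" "$tags"
}
```

One possible input, assuming that "--all" is an acceptable stand-in for
the set of regular references walked here (it also includes HEAD), is:
"git rev-list --objects --all | cut -d' ' -f1 |
git cat-file --batch-check='%(objecttype)' | count_object_types".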
Documentation/git-repo.adoc | 5 +-
builtin/repo.c | 99 +++++++++++++++++++++++++++++++++----
t/t1901-repo-stats.sh | 46 +++++++++++++++++
3 files changed, 139 insertions(+), 11 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 7762329551..2a67abfca8 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -45,8 +45,9 @@ supported:
`-z` is an alias for `--format=nul`.
stats::
- Retrieve stats about the current repository. All references in the
- repository are categorized and counted accordingly.
+ Retrieve stats about the current repository. All references and
+ reachable objects in the repository are categorized and counted
+ accordingly.
+
The table output format may change and is not intended for machine parsing.
diff --git a/builtin/repo.c b/builtin/repo.c
index 15899dd74c..a24ea0e66b 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -3,9 +3,11 @@
#include "builtin.h"
#include "environment.h"
#include "parse-options.h"
+#include "path-walk.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
+#include "revision.h"
#include "strbuf.h"
#include "string-list.h"
#include "shallow.h"
@@ -159,13 +161,25 @@ static int repo_info(int argc, const char **argv, const char *prefix,
return print_fields(argc, argv, repo, format);
}
-struct stats {
+struct ref_stats {
size_t branches;
size_t remotes;
size_t tags;
size_t others;
};
+struct object_stats {
+ size_t tags;
+ size_t commits;
+ size_t trees;
+ size_t blobs;
+};
+
+struct stats {
+ struct ref_stats refs;
+ struct object_stats objects;
+};
+
struct stats_table {
struct string_list rows;
@@ -207,15 +221,27 @@ static void stats_table_add_count(struct stats_table *table, const char *name,
static void stats_table_setup(struct stats_table *table, struct stats *stats)
{
+ struct object_stats objects = stats->objects;
+ struct ref_stats refs = stats->refs;
+ size_t object_total;
size_t ref_total;
- ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
+ ref_total = refs.branches + refs.remotes + refs.tags + refs.others;
stats_table_add(table, _("* References"), NULL);
stats_table_add_count(table, _(" * Count"), ref_total);
- stats_table_add_count(table, _(" * Branches"), stats->branches);
- stats_table_add_count(table, _(" * Tags"), stats->tags);
- stats_table_add_count(table, _(" * Remotes"), stats->remotes);
- stats_table_add_count(table, _(" * Others"), stats->others);
+ stats_table_add_count(table, _(" * Branches"), refs.branches);
+ stats_table_add_count(table, _(" * Tags"), refs.tags);
+ stats_table_add_count(table, _(" * Remotes"), refs.remotes);
+ stats_table_add_count(table, _(" * Others"), refs.others);
+
+ object_total = objects.commits + objects.trees + objects.blobs + objects.tags;
+ stats_table_add(table, "", NULL);
+ stats_table_add(table, _("* Objects"), NULL);
+ stats_table_add_count(table, _(" * Count"), object_total);
+ stats_table_add_count(table, _(" * Commits"), objects.commits);
+ stats_table_add_count(table, _(" * Trees"), objects.trees);
+ stats_table_add_count(table, _(" * Blobs"), objects.blobs);
+ stats_table_add_count(table, _(" * Tags"), objects.tags);
}
static void stats_table_print(struct stats_table *table)
@@ -260,7 +286,7 @@ static void stats_table_print(struct stats_table *table)
strbuf_release(&buf);
}
-static void stats_count_references(struct stats *stats, struct ref_array *refs)
+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
{
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -282,25 +308,80 @@ static void stats_count_references(struct stats *stats, struct ref_array *refs)
}
}
+static int count_objects(const char *path UNUSED, struct oid_array *oids,
+ enum object_type type, void *data)
+{
+ struct object_stats *stats = data;
+
+ switch (type) {
+ case OBJ_TAG:
+ stats->tags += oids->nr;
+ break;
+ case OBJ_COMMIT:
+ stats->commits += oids->nr;
+ break;
+ case OBJ_TREE:
+ stats->trees += oids->nr;
+ break;
+ case OBJ_BLOB:
+ stats->blobs += oids->nr;
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static void stats_count_objects(struct object_stats *stats,
+ struct ref_array *refs, struct rev_info *revs)
+{
+ struct path_walk_info info = PATH_WALK_INFO_INIT;
+
+ info.revs = revs;
+ info.path_fn = count_objects;
+ info.path_fn_data = stats;
+
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ case FILTER_REFS_TAGS:
+ case FILTER_REFS_REMOTES:
+ case FILTER_REFS_OTHERS:
+ add_pending_oid(revs, NULL, &ref->objectname, 0);
+ break;
+ }
+ }
+
+ walk_objects_by_path(&info);
+ path_walk_info_clear(&info);
+}
+
static int repo_stats(int argc UNUSED, const char **argv UNUSED,
- const char *prefix UNUSED, struct repository *repo UNUSED)
+ const char *prefix, struct repository *repo)
{
struct ref_filter filter = REF_FILTER_INIT;
struct strvec ref_patterns = STRVEC_INIT;
struct stats_table table = { 0 };
struct ref_array refs = { 0 };
struct stats stats = { 0 };
+ struct rev_info revs;
+ repo_init_revisions(repo, &revs, prefix);
filter.name_patterns = ref_patterns.v;
filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
- stats_count_references(&stats, &refs);
+ stats_count_references(&stats.refs, &refs);
+ stats_count_objects(&stats.objects, &refs, &revs);
stats_table_setup(&table, &stats);
stats_table_print(&table);
string_list_clear(&table.rows, 1);
strvec_clear(&ref_patterns);
+ release_revisions(&revs);
ref_array_clear(&refs);
return 0;
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 27c32ec45f..c6a7f08be5 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -20,6 +20,13 @@ test_expect_success 'empty repository stats' '
| * Tags | 0 |
| * Remotes | 0 |
| * Others | 0 |
+ | | |
+ | * Objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
test_cmp expect out &&
@@ -49,6 +56,45 @@ test_expect_success 'repository stats with references' '
| * Tags | 1 |
| * Remotes | 1 |
| * Others | 1 |
+ | | |
+ | * Objects | |
+ | * Count | 5 |
+ | * Commits | 2 |
+ | * Trees | 2 |
+ | * Blobs | 1 |
+ | * Tags | 0 |
+ EOF
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_expect_success 'repository stats with objects' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit_bulk 42 &&
+ git tag -a foo -m bar &&
+ git repo stats >out 2>err &&
+
+ cat >expect <<-EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 2 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ | | |
+ | * Objects | |
+ | * Count | 127 |
+ | * Commits | 42 |
+ | * Trees | 42 |
+ | * Blobs | 42 |
+ | * Tags | 1 |
EOF
test_cmp expect out &&
--
2.51.0.193.g4975ec3473b
* [PATCH 3/4] builtin/repo: add keyvalue format for stats
2025-09-23 2:56 [PATCH 0/4] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-23 2:56 ` [PATCH 1/4] " Justin Tobler
2025-09-23 2:56 ` [PATCH 2/4] builtin/repo: add object counts in stats output Justin Tobler
@ 2025-09-23 2:56 ` Justin Tobler
2025-09-23 10:53 ` Patrick Steinhardt
2025-09-23 15:39 ` Karthik Nayak
2025-09-23 2:57 ` [PATCH 4/4] builtin/repo: add nul " Justin Tobler
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
4 siblings, 2 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 2:56 UTC
To: git; +Cc: karthik.188, Justin Tobler
Repository stats are currently printed only as a human-friendly table,
which is not suitable for machine parsing. Add a --format option that
supports two output modes: `table` and `keyvalue`. The `table` mode is
the default and prints the same table as before. In `keyvalue` mode,
each line of output contains one key-value pair for a repository stat,
with the '=' character separating the key from the value. This mode
provides output that is more machine-friendly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
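To show how a script might consume the keyvalue output, here is a
minimal shell sketch. It is not part of this patch. The helper name
stat_value is hypothetical; it reads "key=value" lines from stdin and
prints the value for the requested key, failing if the key is absent.

```shell
# Hypothetical helper, not part of the patch: print the value of one
# key from "key=value" lines on stdin; return non-zero if not found.
stat_value () {
	key=$1
	# IFS='=' splits each line at the first '='; the remainder of the
	# line ends up in "value".
	while IFS='=' read -r name value
	do
		if test "$name" = "$key"
		then
			printf '%s\n' "$value"
			return 0
		fi
	done
	return 1
}
```

With this series applied, usage could look like
"git repo stats --format=keyvalue | stat_value objects.commits.count".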
Documentation/git-repo.adoc | 16 ++++++++--
builtin/repo.c | 61 ++++++++++++++++++++++++++++++++++---
t/t1901-repo-stats.sh | 25 +++++++++++++++
3 files changed, 94 insertions(+), 8 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 2a67abfca8..7d0341e4f1 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
-git repo stats
+git repo stats [--format=(table|keyvalue)]
DESCRIPTION
-----------
@@ -44,12 +44,22 @@ supported:
+
`-z` is an alias for `--format=nul`.
-stats::
+`stats [--format=(table|keyvalue)]`::
Retrieve stats about the current repository. All references and
reachable objects in the repository are categorized and counted
accordingly.
+
-The table output format may change and is not intended for machine parsing.
+The output format can be chosen through the flag `--format`. Two formats are
+supported:
++
+`table`:::
+ Outputs repository stats in a human-friendly table and is used by
+ default. This format may change and is not intended for machine
+ parsing.
+
+`keyvalue`:::
+ Each line of output contains a key-value pair of a repository stat. The
+ '=' character is used to delimit between the key and the value.
INFO KEYS
---------
diff --git a/builtin/repo.c b/builtin/repo.c
index a24ea0e66b..4c16a68e4e 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -14,13 +14,14 @@
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
- "git repo stats",
+ "git repo stats [--format=(table|keyvalue)]",
NULL
};
typedef int get_value_fn(struct repository *repo, struct strbuf *buf);
enum output_format {
+ FORMAT_TABLE,
FORMAT_KEYVALUE,
FORMAT_NUL_TERMINATED,
};
@@ -135,6 +136,8 @@ static int parse_format_cb(const struct option *opt,
*format = FORMAT_NUL_TERMINATED;
else if (!strcmp(arg, "keyvalue"))
*format = FORMAT_KEYVALUE;
+ else if (!strcmp(arg, "table"))
+ *format = FORMAT_TABLE;
else
die(_("invalid format '%s'"), arg);
@@ -157,6 +160,8 @@ static int repo_info(int argc, const char **argv, const char *prefix,
};
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (format == FORMAT_TABLE)
+ die(_("table format not supported"));
return print_fields(argc, argv, repo, format);
}
@@ -286,6 +291,32 @@ static void stats_table_print(struct stats_table *table)
strbuf_release(&buf);
}
+static void stats_print(struct stats *stats)
+{
+ struct strbuf buf = STRBUF_INIT;
+
+ strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "\n",
+ (uintmax_t)stats->refs.branches);
+ strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "\n",
+ (uintmax_t)stats->refs.tags);
+ strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "\n",
+ (uintmax_t)stats->refs.remotes);
+ strbuf_addf(&buf, "references.others.count=%" PRIuMAX "\n",
+ (uintmax_t)stats->refs.others);
+
+ strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "\n",
+ (uintmax_t)stats->objects.commits);
+ strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "\n",
+ (uintmax_t)stats->objects.trees);
+ strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "\n",
+ (uintmax_t)stats->objects.blobs);
+ strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "\n",
+ (uintmax_t)stats->objects.tags);
+
+ fwrite(buf.buf, sizeof(char), buf.len, stdout);
+ strbuf_release(&buf);
+}
+
static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
{
for (int i = 0; i < refs->nr; i++) {
@@ -359,9 +390,16 @@ static void stats_count_objects(struct object_stats *stats,
path_walk_info_clear(&info);
}
-static int repo_stats(int argc UNUSED, const char **argv UNUSED,
- const char *prefix, struct repository *repo)
+static int repo_stats(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
{
+ enum output_format format = FORMAT_TABLE;
+ struct option options[] = {
+ OPT_CALLBACK_F(0, "format", &format, N_("format"),
+ N_("output format"),
+ PARSE_OPT_NONEG, parse_format_cb),
+ OPT_END()
+ };
struct ref_filter filter = REF_FILTER_INIT;
struct strvec ref_patterns = STRVEC_INIT;
struct stats_table table = { 0 };
@@ -369,6 +407,10 @@ static int repo_stats(int argc UNUSED, const char **argv UNUSED,
struct stats stats = { 0 };
struct rev_info revs;
+ parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (format == FORMAT_NUL_TERMINATED)
+ die(_("nul format not yet supported"));
+
repo_init_revisions(repo, &revs, prefix);
filter.name_patterns = ref_patterns.v;
filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
@@ -376,8 +418,17 @@ static int repo_stats(int argc UNUSED, const char **argv UNUSED,
stats_count_references(&stats.refs, &refs);
stats_count_objects(&stats.objects, &refs, &revs);
- stats_table_setup(&table, &stats);
- stats_table_print(&table);
+ switch (format) {
+ case FORMAT_TABLE:
+ stats_table_setup(&table, &stats);
+ stats_table_print(&table);
+ break;
+ case FORMAT_KEYVALUE:
+ stats_print(&stats);
+ break;
+ default:
+ BUG("not a valid output format: %d", format);
+ }
string_list_clear(&table.rows, 1);
strvec_clear(&ref_patterns);
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index c6a7f08be5..5bc6d9d5c4 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -102,4 +102,29 @@ test_expect_success 'repository stats with objects' '
)
'
+test_expect_success 'repository stats with keyvalue format' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit_bulk 42 &&
+ git tag -a foo -m bar &&
+ git repo stats --format=keyvalue >out 2>err &&
+
+ cat >expect <<-EOF &&
+ references.branches.count=1
+ references.tags.count=1
+ references.remotes.count=0
+ references.others.count=0
+ objects.commits.count=42
+ objects.trees.count=42
+ objects.blobs.count=42
+ objects.tags.count=1
+ EOF
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* [PATCH 4/4] builtin/repo: add nul format for stats
2025-09-23 2:56 [PATCH 0/4] builtin/repo: introduce stats subcommand Justin Tobler
` (2 preceding siblings ...)
2025-09-23 2:56 ` [PATCH 3/4] builtin/repo: add keyvalue format for stats Justin Tobler
@ 2025-09-23 2:57 ` Justin Tobler
2025-09-23 10:53 ` Patrick Steinhardt
2025-09-23 15:41 ` Karthik Nayak
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
4 siblings, 2 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 2:57 UTC
To: git; +Cc: karthik.188, Justin Tobler
Introduce the `nul` mode for the --format option. When enabled, the
output is similar to the `keyvalue` mode, but key-value pairs are
delimited by a NUL character instead of a newline. This allows stat
values to contain special characters without having to C-quote them.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
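For scripts that prefer line-oriented tools, the NUL-delimited records
can be turned back into lines. The sketch below is not part of this
patch and the helper name nul_to_lines is hypothetical. It is only
safe when no value contains a newline, which holds for the numeric
counts emitted so far.

```shell
# Hypothetical helper, not part of the patch: convert NUL-delimited
# "key=value" records on stdin into one record per line.
# Assumption: no value contains a newline of its own.
nul_to_lines () {
	tr '\0' '\n'
}
```

With this series applied, usage could look like
"git repo stats --format=nul | nul_to_lines", which mirrors in reverse
the "tr" call used by the test below to build the expected output.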
Documentation/git-repo.adoc | 10 ++++---
builtin/repo.c | 52 ++++++++++++++++++++-----------------
t/t1901-repo-stats.sh | 27 +++++++++++++++++++
3 files changed, 62 insertions(+), 27 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 7d0341e4f1..57267064ea 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
-git repo stats [--format=(table|keyvalue)]
+git repo stats [--format=(table|keyvalue|nul)]
DESCRIPTION
-----------
@@ -44,12 +44,12 @@ supported:
+
`-z` is an alias for `--format=nul`.
-`stats [--format=(table|keyvalue)]`::
+`stats [--format=(table|keyvalue|nul)]`::
Retrieve stats about the current repository. All references and
reachable objects in the repository are categorized and counted
accordingly.
+
-The output format can be chosen through the flag `--format`. Two formats are
+The output format can be chosen through the flag `--format`. Three formats are
supported:
+
`table`:::
@@ -61,6 +61,10 @@ supported:
Each line of output contains a key-value pair of a repository stat. The
'=' character is used to delimit between the key and the value.
+`nul`:::
+ Similar to `keyvalue`, but uses a NUL character to delimit between
+ key-value pairs instead of a newline.
+
INFO KEYS
---------
In order to obtain a set of values from `git repo info`, you should provide
diff --git a/builtin/repo.c b/builtin/repo.c
index 4c16a68e4e..37034e6347 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -14,7 +14,7 @@
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
- "git repo stats [--format=(table|keyvalue)]",
+ "git repo stats [--format=(table|keyvalue|nul)]",
NULL
};
@@ -291,27 +291,31 @@ static void stats_table_print(struct stats_table *table)
strbuf_release(&buf);
}
-static void stats_print(struct stats *stats)
+static void stats_print(struct stats *stats, int nul_delim)
{
struct strbuf buf = STRBUF_INIT;
-
- strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "\n",
- (uintmax_t)stats->refs.branches);
- strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "\n",
- (uintmax_t)stats->refs.tags);
- strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "\n",
- (uintmax_t)stats->refs.remotes);
- strbuf_addf(&buf, "references.others.count=%" PRIuMAX "\n",
- (uintmax_t)stats->refs.others);
-
- strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "\n",
- (uintmax_t)stats->objects.commits);
- strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "\n",
- (uintmax_t)stats->objects.trees);
- strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "\n",
- (uintmax_t)stats->objects.blobs);
- strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "\n",
- (uintmax_t)stats->objects.tags);
+ char delim = '\n';
+
+ if (nul_delim)
+ delim = '\0';
+
+ strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "%c",
+ (uintmax_t)stats->refs.branches, delim);
+ strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "%c",
+ (uintmax_t)stats->refs.tags, delim);
+ strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "%c",
+ (uintmax_t)stats->refs.remotes, delim);
+ strbuf_addf(&buf, "references.others.count=%" PRIuMAX "%c",
+ (uintmax_t)stats->refs.others, delim);
+
+ strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "%c",
+ (uintmax_t)stats->objects.commits, delim);
+ strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "%c",
+ (uintmax_t)stats->objects.trees, delim);
+ strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "%c",
+ (uintmax_t)stats->objects.blobs, delim);
+ strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "%c",
+ (uintmax_t)stats->objects.tags, delim);
fwrite(buf.buf, sizeof(char), buf.len, stdout);
strbuf_release(&buf);
@@ -408,9 +412,6 @@ static int repo_stats(int argc, const char **argv, const char *prefix,
struct rev_info revs;
parse_options(argc, argv, prefix, options, repo_usage, 0);
- if (format == FORMAT_NUL_TERMINATED)
- die(_("nul format not yet supported"));
-
repo_init_revisions(repo, &revs, prefix);
filter.name_patterns = ref_patterns.v;
filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
@@ -424,7 +425,10 @@ static int repo_stats(int argc, const char **argv, const char *prefix,
stats_table_print(&table);
break;
case FORMAT_KEYVALUE:
- stats_print(&stats);
+ stats_print(&stats, 0);
+ break;
+ case FORMAT_NUL_TERMINATED:
+ stats_print(&stats, 1);
break;
default:
BUG("not a valid output format: %d", format);
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 5bc6d9d5c4..061b2fbbc1 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -127,4 +127,31 @@ test_expect_success 'repository stats with keyvalue format' '
)
'
+test_expect_success 'repository stats with nul format' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit_bulk 42 &&
+ git tag -a foo -m bar &&
+ git repo stats --format=nul >out 2>err &&
+
+ cat >expect <<-EOF &&
+ references.branches.count=1
+ references.tags.count=1
+ references.remotes.count=0
+ references.others.count=0
+ objects.commits.count=42
+ objects.trees.count=42
+ objects.blobs.count=42
+ objects.tags.count=1
+ EOF
+
+ tr "\n" "\0" <expect >expect_null &&
+
+ test_cmp expect_null out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* Re: [PATCH 1/4] builtin/repo: introduce stats subcommand
2025-09-23 2:56 ` [PATCH 1/4] " Justin Tobler
@ 2025-09-23 10:52 ` Patrick Steinhardt
2025-09-23 15:10 ` Justin Tobler
2025-09-23 15:22 ` Karthik Nayak
1 sibling, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-23 10:52 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Mon, Sep 22, 2025 at 09:56:57PM -0500, Justin Tobler wrote:
> The shape of a repository's history can have huge impacts on the
> performance and health of the repository itself. Currently, Git lacks a
> means to surface key stats/information regarding the shape of a
> repository via a single command. Acquiring this information requires
> users to be fairly knowledgeable about the structure of a Git repository
> and how to identify the relevant data points. To fill this gap,
> supplemental tools such as git-sizer(1) have been developed.
>
> To allow users to more readily identify potential issues for a
> repository, introduce the "stats" subcommand in git-repo(1) to output
> stats for the repository that may be of interest to users. The goal of
> this subcommand is to eventually provide similar functionality to
> git-sizer(1), but in Git natively.
Nit: "but natively in Git" would read more naturally.
> The initial version of this command only iterates through all references
> in the repository and tracks the count of branches, tags, remotes, and
s/remotes/remote refs/
> other reference types. The corresponding information is displayed in a
> human-friendly table formatted in a very similar manner to git-sizer(1).
> The width of each table column is adjusted automatically to satisfy the
> requirements of the widest row contained.
>
> Subsequent commits will surface additional relevant data points to
> output.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
Is this command built on Derrick's git-survey(1)? If so, it would
probably be nice to add a "Based-on-patch-by" tag.
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 209afd1b61..7762329551 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -43,6 +44,12 @@ supported:
> +
> `-z` is an alias for `--format=nul`.
>
> +stats::
> + Retrieve stats about the current repository. All references in the
s/stats/statistics/
> + repository are categorized and counted accordingly.
> ++
> +The table output format may change and is not intended for machine parsing.
I guess we don't yet have a machine-parseable interface at this stage,
so we cannot point to it.
> @@ -156,12 +159,160 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> return print_fields(argc, argv, repo, format);
> }
>
> +struct stats {
> + size_t branches;
> + size_t remotes;
> + size_t tags;
> + size_t others;
> +};
> +
> +struct stats_table {
> + struct string_list rows;
> +
> + int name_col_width;
> + int value_col_width;
You assign the result from `strlen()` to these fields, so they should
probably be `size_t`.
> +};
> +
> +struct stats_table_entry {
> + char *value;
> +};
> +
> +static void stats_table_add(struct stats_table *table, const char *name,
> + struct stats_table_entry *entry)
> +{
> + int name_width = strlen(name);
`strlen()` returns `size_t`.
> + struct string_list_item *item;
> +
> + item = string_list_append(&table->rows, name);
> + item->util = entry;
> +
> + if (name_width > table->name_col_width)
> + table->name_col_width = name_width;
> + if (entry) {
> + int value_width = strlen(entry->value);
> + if (value_width > table->value_col_width)
> + table->value_col_width = value_width;
> + }
I was wondering at first why you'd ever want to not pass an entry, but
we use that to have "dividers" in the table. Makes sense.
> +}
> +
> +static void stats_table_add_count(struct stats_table *table, const char *name,
> + size_t value)
> +{
> + struct stats_table_entry *entry;
> +
> + CALLOC_ARRAY(entry, 1);
> + entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> + stats_table_add(table, name, entry);
> +}
> +
> +static void stats_table_setup(struct stats_table *table, struct stats *stats)
> +{
> + size_t ref_total;
> +
> + ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> + stats_table_add(table, _("* References"), NULL);
> + stats_table_add_count(table, _(" * Count"), ref_total);
> + stats_table_add_count(table, _(" * Branches"), stats->branches);
> + stats_table_add_count(table, _(" * Tags"), stats->tags);
> + stats_table_add_count(table, _(" * Remotes"), stats->remotes);
> + stats_table_add_count(table, _(" * Others"), stats->others);
> +}
Would it make sense to not translate the formatting directives, but only
the actual words?
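To make the question concrete, here is a minimal standalone sketch, not
Git code. It assumes the indentation and bullet can live in an
untranslated format string while only the word goes through `_()`. The
`_()` macro below is a stub so the sketch compiles on its own, and
`format_label()` is a hypothetical helper.

```c
#include <stdio.h>

/* Stand-in for gettext's _() so the sketch compiles on its own. */
#define _(s) (s)

/*
 * Build a row label from an untranslated layout ("<indent>* ") and a
 * word supplied by the caller. Only the word needs translating.
 */
static int format_label(char *out, size_t outlen, int indent, const char *word)
{
	/* "%*s" with an empty string emits exactly `indent` spaces. */
	return snprintf(out, outlen, "%*s* %s", indent, "", word);
}
```

A caller would write something like `format_label(buf, sizeof(buf), 2,
_("Count"))`, so translators only ever see "Count".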
> +static void stats_table_print(struct stats_table *table)
> +{
> + const char *name_col_title = _("Repository stats");
> + const char *value_col_title = _("Value");
> + int name_col_width = strlen(name_col_title);
> + int value_col_width = strlen(value_col_title);
These should both be `size_t`.
> + struct strbuf buf = STRBUF_INIT;
> + struct string_list_item *item;
> +
> + if (table->name_col_width > name_col_width)
> + name_col_width = table->name_col_width;
> + if (table->value_col_width > value_col_width)
> + value_col_width = table->value_col_width;
> +
> + strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
> + value_col_width, value_col_title);
Aha, that's why you went with `int`. You can use `cast_size_t_to_int()`
to convert between the types.
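Roughly what I have in mind, as a standalone sketch rather than a patch:
widths stay `size_t` everywhere and are narrowed only at the printf
boundary, where `%*s` requires an `int`. `checked_size_to_int()` is a
hypothetical stand-in for the Git helper mentioned above, and `widest()`
and `format_cell()` are made up for the example.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked cast: abort instead of silently truncating. */
static int checked_size_to_int(size_t n)
{
	if (n > (size_t)INT_MAX) {
		fprintf(stderr, "width too large for int: %zu\n", n);
		abort();
	}
	return (int)n;
}

/* Track the widest name as size_t, the type strlen() returns. */
static size_t widest(const char **names, size_t nr)
{
	size_t max = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(names[i]);
		if (len > max)
			max = len;
	}
	return max;
}

/* Format one left-aligned cell; the cast happens only here. */
static int format_cell(char *out, size_t outlen, const char *name, size_t width)
{
	return snprintf(out, outlen, "| %-*s |", checked_size_to_int(width), name);
}
```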
> + strbuf_addstr(&buf, "| ");
> + strbuf_addchars(&buf, '-', name_col_width);
> + strbuf_addstr(&buf, " | ");
> + strbuf_addchars(&buf, '-', value_col_width);
> + strbuf_addstr(&buf, " |\n");
> +
> + for_each_string_list_item (item, &table->rows) {
We typically don't have a space between the macro and its
arguments.
> + struct stats_table_entry *entry = item->util;
> + const char *value = "";
> +
> + if (entry) {
> + struct stats_table_entry *entry = item->util;
> + value = entry->value;
> + }
> +
> + strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
> + item->string, value_col_width, value);
> +
> + if (entry)
> + free(entry->value);
It's a bit weird that we free the values when we pretend to only print
data. Sure, we probably don't ever have a usecase where we want to print
data a second time. But I still think it would be nice to separate
concerns.
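As a rough illustration of that separation (a standalone sketch with
simplified, hypothetical types, not the structs from this patch):
printing only reads the table, and a dedicated clear function owns the
cleanup.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the table types; names are hypothetical. */
struct row {
	char *name;
	char *value; /* NULL for heading or divider rows */
};

struct table {
	struct row *rows;
	size_t nr;
};

/* Duplicate a string, passing NULL through unchanged. */
static char *dup_or_null(const char *s)
{
	char *copy;
	size_t len;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

static void table_add(struct table *t, const char *name, const char *value)
{
	struct row *rows = realloc(t->rows, (t->nr + 1) * sizeof(*rows));

	if (!rows)
		abort();
	t->rows = rows;
	t->rows[t->nr].name = dup_or_null(name);
	t->rows[t->nr].value = dup_or_null(value);
	t->nr++;
}

/* Printing takes a const table, so it can safely run more than once. */
static void table_print(const struct table *t, FILE *out)
{
	for (size_t i = 0; i < t->nr; i++) {
		const char *value = t->rows[i].value ? t->rows[i].value : "";
		fprintf(out, "| %s | %s |\n", t->rows[i].name, value);
	}
}

/* Releasing memory is a separate step owned by the caller. */
static void table_clear(struct table *t)
{
	for (size_t i = 0; i < t->nr; i++) {
		free(t->rows[i].name);
		free(t->rows[i].value);
	}
	free(t->rows);
	t->rows = NULL;
	t->nr = 0;
}
```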
> + }
> +
> + fputs(buf.buf, stdout);
> + strbuf_release(&buf);
> +}
> +
> +static void stats_count_references(struct stats *stats, struct ref_array *refs)
> +{
> + for (int i = 0; i < refs->nr; i++) {
> + struct ref_array_item *ref = refs->items[i];
> +
> + switch (ref->kind) {
> + case FILTER_REFS_BRANCHES:
> + stats->branches++;
> + break;
> + case FILTER_REFS_REMOTES:
> + stats->remotes++;
> + break;
> + case FILTER_REFS_TAGS:
> + stats->tags++;
> + break;
> + case FILTER_REFS_OTHERS:
> + stats->others++;
> + break;
Do we want to have a `default:` case where we `BUG()`? Otherwise we may
not notice that we undercount the overall number of refs.
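Something along these lines, sketched standalone with a made-up enum and
with `abort()` standing in for `BUG()`, so it is an illustration and not
a drop-in replacement:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical ref kinds; the real code switches on FILTER_REFS_* values. */
enum ref_kind { REF_BRANCH, REF_REMOTE, REF_TAG, REF_OTHER };

struct ref_counts {
	size_t branches;
	size_t remotes;
	size_t tags;
	size_t others;
};

static void count_ref(struct ref_counts *c, int kind)
{
	switch (kind) {
	case REF_BRANCH:
		c->branches++;
		break;
	case REF_REMOTE:
		c->remotes++;
		break;
	case REF_TAG:
		c->tags++;
		break;
	case REF_OTHER:
		c->others++;
		break;
	default:
		/* An unknown kind would otherwise be dropped from the total. */
		fprintf(stderr, "BUG: unexpected ref kind: %d\n", kind);
		abort();
	}
}

static size_t total_refs(const struct ref_counts *c)
{
	return c->branches + c->remotes + c->tags + c->others;
}
```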
> + }
> + }
> +}
> +
> +static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> + const char *prefix UNUSED, struct repository *repo UNUSED)
Not a new issue, but I'd rather call this `cmd_repo_stats()` to note
that this is the entrypoint. We might as well adapt the other subcommand
to follow that naming schema in a preparatory commit.
> +{
> + struct ref_filter filter = REF_FILTER_INIT;
> + struct strvec ref_patterns = STRVEC_INIT;
> + struct stats_table table = { 0 };
> + struct ref_array refs = { 0 };
> + struct stats stats = { 0 };
> +
> + filter.name_patterns = ref_patterns.v;
> + filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
`filter_refs()` may return an error code which we should probably
handle.
> diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> new file mode 100755
> index 0000000000..27c32ec45f
> --- /dev/null
> +++ b/t/t1901-repo-stats.sh
> @@ -0,0 +1,59 @@
> +#!/bin/sh
> +
> +test_description='test git repo stats'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'empty repository stats' '
Nit: I don't think it's necessary to repeat "repository stats" in every
test name. That's already clear from the test suite.
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git repo stats >out 2>err &&
> +
> + cat >expect <<-EOF &&
s/EOF/\EOF/, as we don't need to expand any variables.
> + | Repository stats | Value |
> + | ---------------- | ----- |
> + | * References | |
> + | * Count | 0 |
> + | * Branches | 0 |
> + | * Tags | 0 |
> + | * Remotes | 0 |
> + | * Others | 0 |
> + EOF
> +
> + test_cmp expect out &&
> + test_line_count = 0 err
> + )
> +'
> +
> +test_expect_success 'repository stats with references' '
Same comment regarding the name.
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m init &&
> + oid="$(git rev-parse HEAD)" &&
> + git switch -c foo &&
> + git tag init &&
> + git update-ref refs/remotes/origin/foo "$oid" &&
> + git notes add -m foo &&
> + git repo stats >out 2>err &&
> +
> + cat >expect <<-EOF &&
Likewise regarding "\EOF".
> + | Repository stats | Value |
> + | ---------------- | ----- |
> + | * References | |
> + | * Count | 5 |
> + | * Branches | 2 |
> + | * Tags | 1 |
> + | * Remotes | 1 |
> + | * Others | 1 |
> + EOF
> +
> + test_cmp expect out &&
> + test_line_count = 0 err
> + )
> +'
Patrick
* Re: [PATCH 2/4] builtin/repo: add object counts in stats output
2025-09-23 2:56 ` [PATCH 2/4] builtin/repo: add object counts in stats output Justin Tobler
@ 2025-09-23 10:52 ` Patrick Steinhardt
2025-09-23 15:19 ` Justin Tobler
2025-09-23 15:30 ` Karthik Nayak
1 sibling, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-23 10:52 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Mon, Sep 22, 2025 at 09:56:58PM -0500, Justin Tobler wrote:
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 7762329551..2a67abfca8 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -45,8 +45,9 @@ supported:
> `-z` is an alias for `--format=nul`.
>
> stats::
> - Retrieve stats about the current repository. All references in the
> - repository are categorized and counted accordingly.
> + Retrieve stats about the current repository. All references and
> + reachable objects in the repository are categorized and counted
> + accordingly.
> +
> The table output format may change and is not intended for machine parsing.
I already wanted to mention this on the first commit, but would it maybe
make sense if this was a bulleted list of information that we surface
right from the start? Then we don't have to reflow the whole paragraph
every time we surface new information.
> diff --git a/builtin/repo.c b/builtin/repo.c
> index 15899dd74c..a24ea0e66b 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -159,13 +161,25 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> return print_fields(argc, argv, repo, format);
> }
>
> -struct stats {
> +struct ref_stats {
Nit: let's call it `ref_stats` right from the start instead of renaming.
> size_t branches;
> size_t remotes;
> size_t tags;
> size_t others;
> };
>
> +struct object_stats {
> + size_t tags;
> + size_t commits;
> + size_t trees;
> + size_t blobs;
> +};
> +
> +struct stats {
I'd maybe call this `struct repo_stats`. `stats` feels quite generic and
very close to a collision with `struct stat`.
> @@ -207,15 +221,27 @@ static void stats_table_add_count(struct stats_table *table, const char *name,
>
> static void stats_table_setup(struct stats_table *table, struct stats *stats)
> {
> + struct object_stats objects = stats->objects;
> + struct ref_stats refs = stats->refs;
We can avoid the copies by making these pointers. Not that it'd really
matter all that much.
> + size_t object_total;
> size_t ref_total;
>
> - ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> + ref_total = refs.branches + refs.remotes + refs.tags + refs.others;
> stats_table_add(table, _("* References"), NULL);
> stats_table_add_count(table, _(" * Count"), ref_total);
> - stats_table_add_count(table, _(" * Branches"), stats->branches);
> - stats_table_add_count(table, _(" * Tags"), stats->tags);
> - stats_table_add_count(table, _(" * Remotes"), stats->remotes);
> - stats_table_add_count(table, _(" * Others"), stats->others);
> + stats_table_add_count(table, _(" * Branches"), refs.branches);
> + stats_table_add_count(table, _(" * Tags"), refs.tags);
> + stats_table_add_count(table, _(" * Remotes"), refs.remotes);
> + stats_table_add_count(table, _(" * Others"), refs.others);
> +
> + object_total = objects.commits + objects.trees + objects.blobs + objects.tags;
> + stats_table_add(table, "", NULL);
> + stats_table_add(table, _("* Objects"), NULL);
Should we maybe say "Reachable objects" here to clarify that this
doesn't count unreachable ones?
> @@ -282,25 +308,80 @@ static void stats_count_references(struct stats *stats, struct ref_array *refs)
> }
> }
>
> +static int count_objects(const char *path UNUSED, struct oid_array *oids,
> + enum object_type type, void *data)
> +{
> + struct object_stats *stats = data;
> +
> + switch (type) {
> + case OBJ_TAG:
> + stats->tags += oids->nr;
> + break;
> + case OBJ_COMMIT:
> + stats->commits += oids->nr;
> + break;
> + case OBJ_TREE:
> + stats->trees += oids->nr;
> + break;
> + case OBJ_BLOB:
> + stats->blobs += oids->nr;
> + break;
> + default:
Let's `BUG()` here. This case should never happen, and if it does
something is seriously wrong.
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static void stats_count_objects(struct object_stats *stats,
> + struct ref_array *refs, struct rev_info *revs)
> +{
> + struct path_walk_info info = PATH_WALK_INFO_INIT;
> +
> + info.revs = revs;
> + info.path_fn = count_objects;
> + info.path_fn_data = stats;
> +
> + for (int i = 0; i < refs->nr; i++) {
> + struct ref_array_item *ref = refs->items[i];
> +
> + switch (ref->kind) {
> + case FILTER_REFS_BRANCHES:
> + case FILTER_REFS_TAGS:
> + case FILTER_REFS_REMOTES:
> + case FILTER_REFS_OTHERS:
> + add_pending_oid(revs, NULL, &ref->objectname, 0);
> + break;
> + }
> + }
> +
> + walk_objects_by_path(&info);
> + path_walk_info_clear(&info);
> +}
I guess this can take a while, so a progress meter would be great to
give the user some info about what's happening. I guess it doesn't have
to be part of the first iteration though, as long as this is something
we plan to add at a later point.
> diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> index 27c32ec45f..c6a7f08be5 100755
> --- a/t/t1901-repo-stats.sh
> +++ b/t/t1901-repo-stats.sh
> @@ -20,6 +20,13 @@ test_expect_success 'empty repository stats' '
> | * Tags | 0 |
> | * Remotes | 0 |
> | * Others | 0 |
> + | | |
> + | * Objects | |
> + | * Count | 0 |
> + | * Commits | 0 |
> + | * Trees | 0 |
> + | * Blobs | 0 |
> + | * Tags | 0 |
> EOF
>
> test_cmp expect out &&
> @@ -49,6 +56,45 @@ test_expect_success 'repository stats with references' '
> | * Tags | 1 |
> | * Remotes | 1 |
> | * Others | 1 |
> + | | |
> + | * Objects | |
> + | * Count | 5 |
> + | * Commits | 2 |
> + | * Trees | 2 |
> + | * Blobs | 1 |
> + | * Tags | 0 |
> + EOF
> +
> + test_cmp expect out &&
> + test_line_count = 0 err
> + )
> +'
> +
> +test_expect_success 'repository stats with objects' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + test_commit_bulk 42 &&
> + git tag -a foo -m bar &&
> + git repo stats >out 2>err &&
> +
> + cat >expect <<-EOF &&
> + | Repository stats | Value |
> + | ---------------- | ----- |
> + | * References | |
> + | * Count | 2 |
> + | * Branches | 1 |
> + | * Tags | 1 |
> + | * Remotes | 0 |
> + | * Others | 0 |
> + | | |
> + | * Objects | |
> + | * Count | 127 |
> + | * Commits | 42 |
> + | * Trees | 42 |
> + | * Blobs | 42 |
> + | * Tags | 1 |
> EOF
I quite like the output format, by the way. It's nice to read and makes
it sufficiently clear that this is not expected to be parsed by a
machine.
Patrick
* Re: [PATCH 3/4] builtin/repo: add keyvalue format for stats
2025-09-23 2:56 ` [PATCH 3/4] builtin/repo: add keyvalue format for stats Justin Tobler
@ 2025-09-23 10:53 ` Patrick Steinhardt
2025-09-23 15:26 ` Justin Tobler
2025-09-23 15:39 ` Karthik Nayak
1 sibling, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-23 10:53 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Mon, Sep 22, 2025 at 09:56:59PM -0500, Justin Tobler wrote:
> @@ -157,6 +160,8 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> };
>
> argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
> + if (format == FORMAT_TABLE)
> + die(_("table format not supported"));
We can deduplicate this error message by saying "format '%s' not
supported".
Other than that, let's fail closed and say `if (format !=
FORMAT_KEYVALUE && format != FORMAT_NUL_TERMINATED)`. Like this, we won't
have to update the condition every time a new format is added, which is
easy to forget.
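A standalone sketch of both points, assuming the three enum values used
in this series. The helper names and the "table"/"keyvalue"/"nul"
strings are made up for the example, and the error is written to a
buffer here instead of going through `die()`.

```c
#include <stdio.h>

enum output_format {
	FORMAT_TABLE,
	FORMAT_KEYVALUE,
	FORMAT_NUL_TERMINATED
};

/* Hypothetical name lookup used only to build the shared message. */
static const char *format_name(enum output_format format)
{
	switch (format) {
	case FORMAT_TABLE:
		return "table";
	case FORMAT_KEYVALUE:
		return "keyvalue";
	case FORMAT_NUL_TERMINATED:
		return "nul";
	}
	return "unknown";
}

/*
 * Allow-list check: anything not explicitly listed is rejected, so a
 * format added to the enum later fails here until someone opts it in.
 */
static int check_info_format(enum output_format format, char *err, size_t errlen)
{
	if (format == FORMAT_KEYVALUE || format == FORMAT_NUL_TERMINATED)
		return 0;
	snprintf(err, errlen, "format '%s' not supported", format_name(format));
	return -1;
}
```

The same message template then serves every subcommand; only the
allow-list differs per caller.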
> @@ -286,6 +291,32 @@ static void stats_table_print(struct stats_table *table)
> strbuf_release(&buf);
> }
>
> +static void stats_print(struct stats *stats)
I think it would make sense to call this `stats_keyvalue_print()` to
clearly distinguish it from `stats_table_print()`.
> @@ -359,9 +390,16 @@ static void stats_count_objects(struct object_stats *stats,
> path_walk_info_clear(&info);
> }
>
> -static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> - const char *prefix, struct repository *repo)
> +static int repo_stats(int argc, const char **argv, const char *prefix,
> + struct repository *repo)
> {
> + enum output_format format = FORMAT_TABLE;
> + struct option options[] = {
> + OPT_CALLBACK_F(0, "format", &format, N_("format"),
> + N_("output format"),
> + PARSE_OPT_NONEG, parse_format_cb),
> + OPT_END()
> + };
> struct ref_filter filter = REF_FILTER_INIT;
> struct strvec ref_patterns = STRVEC_INIT;
> struct stats_table table = { 0 };
Nice that we can reuse the callback.
> @@ -369,6 +407,10 @@ static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> struct stats stats = { 0 };
> struct rev_info revs;
>
> + parse_options(argc, argv, prefix, options, repo_usage, 0);
> + if (format == FORMAT_NUL_TERMINATED)
> + die(_("nul format not yet supported"));
> +
> repo_init_revisions(repo, &revs, prefix);
> filter.name_patterns = ref_patterns.v;
> filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
Same comment here regarding failing in a closed way.
> @@ -376,8 +418,17 @@ static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> stats_count_references(&stats.refs, &refs);
> stats_count_objects(&stats.objects, &refs, &revs);
>
> - stats_table_setup(&table, &stats);
> - stats_table_print(&table);
> + switch (format) {
> + case FORMAT_TABLE:
> + stats_table_setup(&table, &stats);
> + stats_table_print(&table);
> + break;
> + case FORMAT_KEYVALUE:
> + stats_print(&stats);
> + break;
> + default:
> + BUG("not a valid output format: %d", format);
Nit: it may be valid, but definitely not supported.
Patrick
* Re: [PATCH 4/4] builtin/repo: add nul format for stats
2025-09-23 2:57 ` [PATCH 4/4] builtin/repo: add nul " Justin Tobler
@ 2025-09-23 10:53 ` Patrick Steinhardt
2025-09-23 15:33 ` Justin Tobler
2025-09-23 15:41 ` Karthik Nayak
1 sibling, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-23 10:53 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Mon, Sep 22, 2025 at 09:57:00PM -0500, Justin Tobler wrote:
> diff --git a/builtin/repo.c b/builtin/repo.c
> index 4c16a68e4e..37034e6347 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -291,27 +291,31 @@ static void stats_table_print(struct stats_table *table)
> strbuf_release(&buf);
> }
>
> -static void stats_print(struct stats *stats)
> +static void stats_print(struct stats *stats, int nul_delim)
Instead of passing a boolean-style option, can't we pass the expected
delimiter directly? Makes the callsite a bit more obvious.
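For illustration, a standalone sketch of that shape. It uses a fixed
buffer and `snprintf()` instead of strbuf, only two of the keys, and a
made-up `struct counts`, so treat it as a sketch of the interface and
not as the final code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the stats, enough to show the interface. */
struct counts {
	size_t branches;
	size_t tags;
};

/*
 * Append "key=value" followed by `delim` at out + pos. Returns the new
 * length, or `pos` unchanged if the record does not fit.
 */
static size_t append_kv(char *out, size_t pos, size_t cap,
			const char *key, size_t value, char delim)
{
	int n = snprintf(out + pos, cap - pos, "%s=%" PRIuMAX, key,
			 (uintmax_t)value);

	if (n < 0 || (size_t)n >= cap - pos)
		return pos;
	/* Overwrite snprintf's terminator; works for '\n' and '\0' alike. */
	out[pos + n] = delim;
	return pos + (size_t)n + 1;
}

/* The caller picks the delimiter; no boolean to decode at the callsite. */
static size_t format_counts(char *out, size_t cap, const struct counts *c,
			    char delim)
{
	size_t len = 0;

	len = append_kv(out, len, cap, "references.branches.count",
			c->branches, delim);
	len = append_kv(out, len, cap, "references.tags.count",
			c->tags, delim);
	return len;
}
```

The callsites then read `format_counts(buf, sizeof(buf), &c, '\n')` and
`format_counts(buf, sizeof(buf), &c, '\0')`. Since the output may
contain NUL bytes, it has to be written by length, e.g. with `fwrite()`,
as the patch already does.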
> {
> struct strbuf buf = STRBUF_INIT;
> -
> - strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->refs.branches);
> - strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->refs.tags);
> - strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->refs.remotes);
> - strbuf_addf(&buf, "references.others.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->refs.others);
> -
> - strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->objects.commits);
> - strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->objects.trees);
> - strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->objects.blobs);
> - strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->objects.tags);
> + char delim = '\n';
> +
> + if (nul_delim)
> + delim = '\0';
> +
> + strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->refs.branches, delim);
> + strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->refs.tags, delim);
> + strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->refs.remotes, delim);
> + strbuf_addf(&buf, "references.others.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->refs.others, delim);
> +
> + strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->objects.commits, delim);
> + strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->objects.trees, delim);
> + strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->objects.blobs, delim);
> + strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->objects.tags, delim);
>
> fwrite(buf.buf, sizeof(char), buf.len, stdout);
> strbuf_release(&buf);
It's a bit unfortunate we have to rewrite most of the function. I'd
either have the `delim` parameter right from the start or just squash
these two patches together.
> diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> index 5bc6d9d5c4..061b2fbbc1 100755
> --- a/t/t1901-repo-stats.sh
> +++ b/t/t1901-repo-stats.sh
> @@ -127,4 +127,31 @@ test_expect_success 'repository stats with keyvalue format' '
> )
> '
>
> +test_expect_success 'repository stats with nul format' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + test_commit_bulk 42 &&
> + git tag -a foo -m bar &&
> + git repo stats --format=nul >out 2>err &&
> +
> + cat >expect <<-EOF &&
> + references.branches.count=1
> + references.tags.count=1
> + references.remotes.count=0
> + references.others.count=0
> + objects.commits.count=42
> + objects.trees.count=42
> + objects.blobs.count=42
> + objects.tags.count=1
> + EOF
> +
> + tr "\n" "\0" <expect >expect_null &&
> +
> + test_cmp expect_null out &&
> + test_line_count = 0 err
> + )
> +'
We already have a test for the keyvalue format that looks mostly the
same, so we may just as well test both formats in a single test.
Patrick
* Re: [PATCH 1/4] builtin/repo: introduce stats subcommand
2025-09-23 10:52 ` Patrick Steinhardt
@ 2025-09-23 15:10 ` Justin Tobler
2025-09-23 15:26 ` Patrick Steinhardt
0 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 15:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188
On 25/09/23 12:52PM, Patrick Steinhardt wrote:
> On Mon, Sep 22, 2025 at 09:56:57PM -0500, Justin Tobler wrote:
[snip]
> > Signed-off-by: Justin Tobler <jltobler@gmail.com>
>
> Is this command built on Derrick's git-survey(1)? If so, it would
> probably be nice to add a "Based-on-patch-by" tag.
The git-survey(1) series sent by Derrick certainly served as inspiration
for this series. I didn't build this series from those patches, but
there is probably some overlap since I'm trying to accomplish something
very similar. I don't mind at all adding a "Based-on-patch-by" tag
though. Will do in the next version :)
[snip]
> > + struct string_list_item *item;
> > +
> > + item = string_list_append(&table->rows, name);
> > + item->util = entry;
> > +
> > + if (name_width > table->name_col_width)
> > + table->name_col_width = name_width;
> > + if (entry) {
> > + int value_width = strlen(entry->value);
> > + if (value_width > table->value_col_width)
> > + table->value_col_width = value_width;
> > + }
>
> I was wondering at first why you'd ever want to not pass an entry, but
> we use that to have "dividers" in the table. Makes sense.
Yup. Also, some rows like "* References" may have a name, but no value.
> > +}
> > +
> > +static void stats_table_add_count(struct stats_table *table, const char *name,
> > + size_t value)
> > +{
> > + struct stats_table_entry *entry;
> > +
> > + CALLOC_ARRAY(entry, 1);
> > + entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> > + stats_table_add(table, name, entry);
> > +}
> > +
> > +static void stats_table_setup(struct stats_table *table, struct stats *stats)
> > +{
> > + size_t ref_total;
> > +
> > + ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> > + stats_table_add(table, _("* References"), NULL);
> > + stats_table_add_count(table, _(" * Count"), ref_total);
> > + stats_table_add_count(table, _(" * Branches"), stats->branches);
> > + stats_table_add_count(table, _(" * Tags"), stats->tags);
> > + stats_table_add_count(table, _(" * Remotes"), stats->remotes);
> > + stats_table_add_count(table, _(" * Others"), stats->others);
> > +}
>
> Would it make sense to not translate the formatting directives, but only
> the actual words?
From a simplicity standpoint, it is quite nice to have the formatted
offsets baked in. It is probably better to separate out the
translations though. I'll iterate on this in the next version.
> > + struct strbuf buf = STRBUF_INIT;
> > + struct string_list_item *item;
> > +
> > + if (table->name_col_width > name_col_width)
> > + name_col_width = table->name_col_width;
> > + if (table->value_col_width > value_col_width)
> > + value_col_width = table->value_col_width;
> > +
> > + strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
> > + value_col_width, value_col_title);
>
> Aha, that's why you went with `int`. You can use `cast_size_t_to_int()`
> to convert between the types.
Yep :) I'll adapt following your suggestion in the next version.
> > + strbuf_addstr(&buf, "| ");
> > + strbuf_addchars(&buf, '-', name_col_width);
> > + strbuf_addstr(&buf, " | ");
> > + strbuf_addchars(&buf, '-', value_col_width);
> > + strbuf_addstr(&buf, " |\n");
> > +
> > + for_each_string_list_item (item, &table->rows) {
>
> We typically don't have a space between the macro and its
> arguments.
You're right! I seem to recall the style linter wanted me to have it this
way, but I'll change it back to be consistent.
> > + struct stats_table_entry *entry = item->util;
> > + const char *value = "";
> > +
> > + if (entry) {
> > + struct stats_table_entry *entry = item->util;
> > + value = entry->value;
> > + }
> > +
> > + strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
> > + item->string, value_col_width, value);
> > +
> > + if (entry)
> > + free(entry->value);
>
> It's a bit weird that we free the values when we pretend to only print
> data. Sure, we probably don't ever have a usecase where we want to print
> data a second time. But I still think it would be nice to separate
> concerns.
That's fair. I'll probably add a stats_table_clear() in the next
version.
> > + }
> > +
> > + fputs(buf.buf, stdout);
> > + strbuf_release(&buf);
> > +}
> > +
> > +static void stats_count_references(struct stats *stats, struct ref_array *refs)
> > +{
> > + for (int i = 0; i < refs->nr; i++) {
> > + struct ref_array_item *ref = refs->items[i];
> > +
> > + switch (ref->kind) {
> > + case FILTER_REFS_BRANCHES:
> > + stats->branches++;
> > + break;
> > + case FILTER_REFS_REMOTES:
> > + stats->remotes++;
> > + break;
> > + case FILTER_REFS_TAGS:
> > + stats->tags++;
> > + break;
> > + case FILTER_REFS_OTHERS:
> > + stats->others++;
> > + break;
>
> Do we want to have a `default:` case where we `BUG()`? Otherwise we may
> not notice that we undercount the overall number of refs.
Since filter_refs() is only checking regular references, we shouldn't
ever encounter other types, but it doesn't hurt to BUG() here if it were
to happen. Will do.
> > + }
> > + }
> > +}
> > +
> > +static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> > + const char *prefix UNUSED, struct repository *repo UNUSED)
>
> Not a new issue, but I'd rather call this `cmd_repo_stats()` to note
> that this is the entrypoint. We might as well adapt the other subcommand
> to follow that naming schema in a preparatory commit.
Makes sense. I'll update repo_info() in a preparatory commit as
suggested.
> > +{
> > + struct ref_filter filter = REF_FILTER_INIT;
> > + struct strvec ref_patterns = STRVEC_INIT;
> > + struct stats_table table = { 0 };
> > + struct ref_array refs = { 0 };
> > + struct stats stats = { 0 };
> > +
> > + filter.name_patterns = ref_patterns.v;
> > + filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
>
> `filter_refs()` may return an error code which we should probably
> handle.
Will do.
> > diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> > new file mode 100755
> > index 0000000000..27c32ec45f
> > --- /dev/null
> > +++ b/t/t1901-repo-stats.sh
> > @@ -0,0 +1,59 @@
> > +#!/bin/sh
> > +
> > +test_description='test git repo stats'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'empty repository stats' '
>
> Nit: I don't think it's necessary to repeat "repository stats" in every
> test name. That's already clear from the test suite.
That's fair. Will adapt the tests accordingly.
Thanks for the review,
-Justin
* Re: [PATCH 2/4] builtin/repo: add object counts in stats output
2025-09-23 10:52 ` Patrick Steinhardt
@ 2025-09-23 15:19 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 15:19 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188
On 25/09/23 12:52PM, Patrick Steinhardt wrote:
> On Mon, Sep 22, 2025 at 09:56:58PM -0500, Justin Tobler wrote:
> > diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> > index 7762329551..2a67abfca8 100644
> > --- a/Documentation/git-repo.adoc
> > +++ b/Documentation/git-repo.adoc
> > @@ -45,8 +45,9 @@ supported:
> > `-z` is an alias for `--format=nul`.
> >
> > stats::
> > - Retrieve stats about the current repository. All references in the
> > - repository are categorized and counted accordingly.
> > + Retrieve stats about the current repository. All references and
> > + reachable objects in the repository are categorized and counted
> > + accordingly.
> > +
> > The table output format may change and is not intended for machine parsing.
>
> I already wanted to mention this on the first commit, but would it maybe
> make sense if this was a bulleted list of information that we surface
> right from the start? Then we don't have to reflow the whole paragraph
> every time we surface new information.
Makes sense.
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index 15899dd74c..a24ea0e66b 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -159,13 +161,25 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> > return print_fields(argc, argv, repo, format);
> > }
> >
> > -struct stats {
> > +struct ref_stats {
>
> Nit: let's call it `ref_stats` right from the start instead of renaming.
Will do.
> > size_t branches;
> > size_t remotes;
> > size_t tags;
> > size_t others;
> > };
> >
> > +struct object_stats {
> > + size_t tags;
> > + size_t commits;
> > + size_t trees;
> > + size_t blobs;
> > +};
> > +
> > +struct stats {
>
> I'd maybe call this `struct repo_stats`. `stats` feels quite generic and
> very close to a collision with `struct stat`.
That's fair. I quite like the name `repo_stats`. I'll use that instead.
Thanks
>
> > @@ -207,15 +221,27 @@ static void stats_table_add_count(struct stats_table *table, const char *name,
> >
> > static void stats_table_setup(struct stats_table *table, struct stats *stats)
> > {
> > + struct object_stats objects = stats->objects;
> > + struct ref_stats refs = stats->refs;
>
> We can avoid the copies by making these pointers. Not that it'd really
> matter all that much.
Ya, I'll change this in the next version.
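For illustration, the pointer-based variant suggested above might look roughly like the standalone sketch below. The struct names follow the `ref_stats`/`repo_stats` naming discussed in this thread; `ref_total()` and `object_total()` are hypothetical helpers that are not part of the series.

```c
#include <stddef.h>

struct ref_stats {
	size_t branches;
	size_t remotes;
	size_t tags;
	size_t others;
};

struct object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

struct repo_stats {
	struct ref_stats refs;
	struct object_stats objects;
};

/* Hypothetical helper: sum the reference counts through a pointer. */
static size_t ref_total(const struct repo_stats *stats)
{
	const struct ref_stats *refs = &stats->refs; /* no struct copy */

	return refs->branches + refs->remotes + refs->tags + refs->others;
}

/* Hypothetical helper: sum the object counts through a pointer. */
static size_t object_total(const struct repo_stats *stats)
{
	const struct object_stats *objects = &stats->objects;

	return objects->commits + objects->trees + objects->blobs +
	       objects->tags;
}
```

The structs are small, so the saving is cosmetic, as noted above. The pointers mainly make it obvious that the nested data is only read.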
> > + size_t object_total;
> > size_t ref_total;
> >
> > - ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> > + ref_total = refs.branches + refs.remotes + refs.tags + refs.others;
> > stats_table_add(table, _("* References"), NULL);
> > stats_table_add_count(table, _(" * Count"), ref_total);
> > - stats_table_add_count(table, _(" * Branches"), stats->branches);
> > - stats_table_add_count(table, _(" * Tags"), stats->tags);
> > - stats_table_add_count(table, _(" * Remotes"), stats->remotes);
> > - stats_table_add_count(table, _(" * Others"), stats->others);
> > + stats_table_add_count(table, _(" * Branches"), refs.branches);
> > + stats_table_add_count(table, _(" * Tags"), refs.tags);
> > + stats_table_add_count(table, _(" * Remotes"), refs.remotes);
> > + stats_table_add_count(table, _(" * Others"), refs.others);
> > +
> > + object_total = objects.commits + objects.trees + objects.blobs + objects.tags;
> > + stats_table_add(table, "", NULL);
> > + stats_table_add(table, _("* Objects"), NULL);
>
> Should we maybe say "Reachable objects" here to clarify that this
> doesn't count unreachable ones?
Good suggestion. Will update.
> > @@ -282,25 +308,80 @@ static void stats_count_references(struct stats *stats, struct ref_array *refs)
> > }
> > }
> >
> > +static int count_objects(const char *path UNUSED, struct oid_array *oids,
> > + enum object_type type, void *data)
> > +{
> > + struct object_stats *stats = data;
> > +
> > + switch (type) {
> > + case OBJ_TAG:
> > + stats->tags += oids->nr;
> > + break;
> > + case OBJ_COMMIT:
> > + stats->commits += oids->nr;
> > + break;
> > + case OBJ_TREE:
> > + stats->trees += oids->nr;
> > + break;
> > + case OBJ_BLOB:
> > + stats->blobs += oids->nr;
> > + break;
> > + default:
>
> Let's `BUG()` here. This case should never happen, and if it does
> something is seriously wrong.
I agree it doesn't hurt to be more defensive here. I'll update in the
next version.
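A minimal standalone sketch of the defensive variant, under stated assumptions: `enum sketch_object_type` is a stand-in for Git's `enum object_type`, and the function returns -1 at the point where the real callback would presumably call `BUG()`, so that the sketch can be exercised without aborting.

```c
#include <stddef.h>

/* Stand-in for Git's enum object_type; the values are illustrative only. */
enum sketch_object_type {
	SKETCH_OBJ_COMMIT,
	SKETCH_OBJ_TREE,
	SKETCH_OBJ_BLOB,
	SKETCH_OBJ_TAG,
	SKETCH_OBJ_UNKNOWN
};

struct object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/*
 * Add `nr` objects of `type` to the counters. Returns 0 on success and
 * -1 for an unexpected type, which is where the real code would BUG().
 */
static int count_objects_sketch(struct object_stats *stats,
				enum sketch_object_type type, size_t nr)
{
	switch (type) {
	case SKETCH_OBJ_TAG:
		stats->tags += nr;
		break;
	case SKETCH_OBJ_COMMIT:
		stats->commits += nr;
		break;
	case SKETCH_OBJ_TREE:
		stats->trees += nr;
		break;
	case SKETCH_OBJ_BLOB:
		stats->blobs += nr;
		break;
	default:
		return -1; /* never silently ignore an unknown type */
	}

	return 0;
}
```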
> > + break;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void stats_count_objects(struct object_stats *stats,
> > + struct ref_array *refs, struct rev_info *revs)
> > +{
> > + struct path_walk_info info = PATH_WALK_INFO_INIT;
> > +
> > + info.revs = revs;
> > + info.path_fn = count_objects;
> > + info.path_fn_data = stats;
> > +
> > + for (int i = 0; i < refs->nr; i++) {
> > + struct ref_array_item *ref = refs->items[i];
> > +
> > + switch (ref->kind) {
> > + case FILTER_REFS_BRANCHES:
> > + case FILTER_REFS_TAGS:
> > + case FILTER_REFS_REMOTES:
> > + case FILTER_REFS_OTHERS:
> > + add_pending_oid(revs, NULL, &ref->objectname, 0);
> > + break;
> > + }
> > + }
> > +
> > + walk_objects_by_path(&info);
> > + path_walk_info_clear(&info);
> > +}
>
> I guess this can take a while, so having a progress meter would be great
> to have to give the user some info what's happening. I guess it doesn't
> > have to be part of the first iteration though as long as this is
> something we plan to add at a later point.
Ya, I was planning on adding a progress meter in the future. It may not
be too much to add it as part of this series though. I'll take a look.
-Justin
* Re: [PATCH 1/4] builtin/repo: introduce stats subcommand
2025-09-23 2:56 ` [PATCH 1/4] " Justin Tobler
2025-09-23 10:52 ` Patrick Steinhardt
@ 2025-09-23 15:22 ` Karthik Nayak
2025-09-23 15:55 ` Justin Tobler
1 sibling, 1 reply; 92+ messages in thread
From: Karthik Nayak @ 2025-09-23 15:22 UTC (permalink / raw)
To: Justin Tobler, git
Justin Tobler <jltobler@gmail.com> writes:
[snip]
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 209afd1b61..7762329551 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -9,6 +9,7 @@ SYNOPSIS
> --------
> [synopsis]
> git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
> +git repo stats
>
> DESCRIPTION
> -----------
> @@ -43,6 +44,12 @@ supported:
> +
> `-z` is an alias for `--format=nul`.
>
> +stats::
> + Retrieve stats about the current repository. All references in the
> + repository are categorized and counted accordingly.
> ++
Nit: Should we use bullet points for the information provided? As of
this commit it is only references, but with time, this will grow, so
would be nice to have a set of bullet points to understand the different
information retrieved.
> +The table output format may change and is not intended for machine parsing.
> +
> INFO KEYS
> ---------
> In order to obtain a set of values from `git repo info`, you should provide
> diff --git a/builtin/repo.c b/builtin/repo.c
> index bbb0966f2d..15899dd74c 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -4,12 +4,15 @@
> #include "environment.h"
> #include "parse-options.h"
> #include "quote.h"
> +#include "ref-filter.h"
> #include "refs.h"
> #include "strbuf.h"
> +#include "string-list.h"
> #include "shallow.h"
>
> static const char *const repo_usage[] = {
> "git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
> + "git repo stats",
> NULL
> };
>
> @@ -156,12 +159,160 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> return print_fields(argc, argv, repo, format);
> }
>
> +struct stats {
> + size_t branches;
> + size_t remotes;
> + size_t tags;
> + size_t others;
Maybe we can use a nested structure here, which would reflect the output
table. That would be much nicer as the code grows: currently we know
'others' here refers to references, but with other fields coming in,
this would no longer be obvious.
> +};
> +
> +struct stats_table {
> + struct string_list rows;
> +
> + int name_col_width;
> + int value_col_width;
Can these be negative?
> +};
> +
Nit: Could we add a comment about what this structure is and what it
holds? I think it is sort of obvious, but would be nice to clarify.
> +struct stats_table_entry {
> + char *value;
> +};
> +
> +static void stats_table_add(struct stats_table *table, const char *name,
> + struct stats_table_entry *entry)
> +{
> + int name_width = strlen(name);
> + struct string_list_item *item;
> +
> + item = string_list_append(&table->rows, name);
> + item->util = entry;
> +
> + if (name_width > table->name_col_width)
> + table->name_col_width = name_width;
> + if (entry) {
> + int value_width = strlen(entry->value);
> + if (value_width > table->value_col_width)
> + table->value_col_width = value_width;
> + }
> +}
> +
> +static void stats_table_add_count(struct stats_table *table, const char *name,
> + size_t value)
> +{
> + struct stats_table_entry *entry;
> +
> + CALLOC_ARRAY(entry, 1);
> + entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
> + stats_table_add(table, name, entry);
> +}
> +
> +static void stats_table_setup(struct stats_table *table, struct stats *stats)
> +{
> + size_t ref_total;
> +
> + ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> + stats_table_add(table, _("* References"), NULL);
> + stats_table_add_count(table, _(" * Count"), ref_total);
> + stats_table_add_count(table, _(" * Branches"), stats->branches);
> + stats_table_add_count(table, _(" * Tags"), stats->tags);
> + stats_table_add_count(table, _(" * Remotes"), stats->remotes);
> + stats_table_add_count(table, _(" * Others"), stats->others);
> +}
> +
> +static void stats_table_print(struct stats_table *table)
> +{
> + const char *name_col_title = _("Repository stats");
> + const char *value_col_title = _("Value");
> + int name_col_width = strlen(name_col_title);
> + int value_col_width = strlen(value_col_title);
> + struct strbuf buf = STRBUF_INIT;
> + struct string_list_item *item;
> +
> + if (table->name_col_width > name_col_width)
> + name_col_width = table->name_col_width;
> + if (table->value_col_width > value_col_width)
> + value_col_width = table->value_col_width;
> +
> + strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
> + value_col_width, value_col_title);
> + strbuf_addstr(&buf, "| ");
> + strbuf_addchars(&buf, '-', name_col_width);
> + strbuf_addstr(&buf, " | ");
> + strbuf_addchars(&buf, '-', value_col_width);
> + strbuf_addstr(&buf, " |\n");
> +
> + for_each_string_list_item (item, &table->rows) {
> + struct stats_table_entry *entry = item->util;
> + const char *value = "";
> +
> + if (entry) {
> + struct stats_table_entry *entry = item->util;
> + value = entry->value;
> + }
> +
> + strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
> + item->string, value_col_width, value);
> +
> + if (entry)
> + free(entry->value);
> + }
> +
> + fputs(buf.buf, stdout);
> + strbuf_release(&buf);
> +}
> +
> +static void stats_count_references(struct stats *stats, struct ref_array *refs)
> +{
> + for (int i = 0; i < refs->nr; i++) {
> + struct ref_array_item *ref = refs->items[i];
> +
> + switch (ref->kind) {
> + case FILTER_REFS_BRANCHES:
> + stats->branches++;
> + break;
> + case FILTER_REFS_REMOTES:
> + stats->remotes++;
> + break;
> + case FILTER_REFS_TAGS:
> + stats->tags++;
> + break;
> + case FILTER_REFS_OTHERS:
> + stats->others++;
> + break;
> + }
> + }
> +}
> +
> +static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> + const char *prefix UNUSED, struct repository *repo UNUSED)
> +{
> + struct ref_filter filter = REF_FILTER_INIT;
> + struct strvec ref_patterns = STRVEC_INIT;
> + struct stats_table table = { 0 };
> + struct ref_array refs = { 0 };
> + struct stats stats = { 0 };
> +
> + filter.name_patterns = ref_patterns.v;
> + filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
> +
I was wondering why we need the filter mechanism here, but seems like
this is to obtain the type of the reference. This is set up automatically
by the filter mechanism and so it's okay.
We could replicate the same using the ref iterator code. This would be a
little more involved, but would remove the need to store all refs in a
'ref_array' and also the need to loop over references twice. But it
doesn't really matter in this use case, I assume.
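To make the single-pass idea concrete, here is a rough standalone sketch. It deliberately does not use Git's ref iterator API. Instead, `count_ref()` is a hypothetical per-reference callback, and classifying by refname prefix is an assumption about how the kinds map, not a copy of what ref-filter does internally.

```c
#include <stddef.h>
#include <string.h>

struct ref_stats {
	size_t branches;
	size_t remotes;
	size_t tags;
	size_t others;
};

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * Hypothetical per-reference callback: classify and count in one step,
 * so no intermediate array of references has to be kept around.
 */
static void count_ref(struct ref_stats *stats, const char *refname)
{
	if (has_prefix(refname, "refs/heads/"))
		stats->branches++;
	else if (has_prefix(refname, "refs/remotes/"))
		stats->remotes++;
	else if (has_prefix(refname, "refs/tags/"))
		stats->tags++;
	else
		stats->others++;
}
```

Whether this is worth the extra code compared to reusing `filter_refs()` is exactly the trade-off raised above.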
> + stats_count_references(&stats, &refs);
> +
> + stats_table_setup(&table, &stats);
> + stats_table_print(&table);
> +
> + string_list_clear(&table.rows, 1);
> + strvec_clear(&ref_patterns);
Huh. So `ref_patterns` is simply a dummy variable, I was wondering why
we simply can't set `filter.name_patterns = NULL`. I see that it is
because in `filter_pattern_match()` we do `if
(!*filter->name_patterns)`. This is not a great interface for
ref-filter. Perhaps we could add a precursor commit like:
diff --git a/ref-filter.c b/ref-filter.c
index 520d2539c9..20284b5918 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2664,7 +2664,7 @@ static int match_name_as_path(const char **pattern, const char *refname,
/* Return 1 if the refname matches one of the patterns, otherwise 0. */
static int filter_pattern_match(struct ref_filter *filter, const char *refname)
{
- if (!*filter->name_patterns)
+ if (!filter->name_patterns || !*filter->name_patterns)
return 1; /* No pattern always matches */
if (filter->match_as_path)
return match_name_as_path(filter->name_patterns, refname,
diff --git a/ref-filter.h b/ref-filter.h
index f22ca94b49..44d9b481ad 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -109,6 +109,7 @@ struct ref_format {
#define REF_FILTER_INIT { \
.points_at = OID_ARRAY_INIT, \
.exclude = STRVEC_INIT, \
+ .name_patterns = NULL, \
}
#define REF_FORMAT_INIT { \
.use_color = -1, \
and then we could modify this commit:
diff --git a/builtin/repo.c b/builtin/repo.c
index 15899dd74c..09733b8df7 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -286,12 +286,11 @@ static int repo_stats(int argc UNUSED, const char **argv UNUSED,
const char *prefix UNUSED, struct repository *repo UNUSED)
{
struct ref_filter filter = REF_FILTER_INIT;
- struct strvec ref_patterns = STRVEC_INIT;
struct stats_table table = { 0 };
struct ref_array refs = { 0 };
struct stats stats = { 0 };
- filter.name_patterns = ref_patterns.v;
filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
stats_count_references(&stats, &refs);
@@ -300,7 +299,6 @@ static int repo_stats(int argc UNUSED, const char **argv UNUSED,
stats_table_print(&table);
string_list_clear(&table.rows, 1);
- strvec_clear(&ref_patterns);
ref_array_clear(&refs);
return 0;
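As a standalone illustration of the NULL-tolerant check proposed in the precursor diff, consider the sketch below. `struct filter_sketch` and `pattern_match_sketch()` are hypothetical names, and the prefix comparison is a simplification standing in for the real pattern matching.

```c
#include <stddef.h>
#include <string.h>

struct filter_sketch {
	const char **name_patterns; /* NULL, or a NULL-terminated array */
};

/* Return 1 if the refname matches one of the patterns, otherwise 0. */
static int pattern_match_sketch(const struct filter_sketch *filter,
				const char *refname)
{
	const char **p;

	/* Check the pointer itself before dereferencing it. */
	if (!filter->name_patterns || !*filter->name_patterns)
		return 1; /* no pattern always matches */

	for (p = filter->name_patterns; *p; p++)
		if (!strncmp(refname, *p, strlen(*p)))
			return 1; /* simplified: plain prefix match */

	return 0;
}
```

With this shape, a zero-initialized filter behaves the same as one carrying an empty pattern array, which is what lets the dummy `ref_patterns` variable go away.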
> + ref_array_clear(&refs);
> +
> + return 0;
> +}
> +
> int cmd_repo(int argc, const char **argv, const char *prefix,
> struct repository *repo)
> {
> parse_opt_subcommand_fn *fn = NULL;
> struct option options[] = {
> OPT_SUBCOMMAND("info", &fn, repo_info),
> + OPT_SUBCOMMAND("stats", &fn, repo_stats),
> OPT_END()
> };
>
[snip]
* Re: [PATCH 1/4] builtin/repo: introduce stats subcommand
2025-09-23 15:10 ` Justin Tobler
@ 2025-09-23 15:26 ` Patrick Steinhardt
0 siblings, 0 replies; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-23 15:26 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Tue, Sep 23, 2025 at 10:10:50AM -0500, Justin Tobler wrote:
> On 25/09/23 12:52PM, Patrick Steinhardt wrote:
> > On Mon, Sep 22, 2025 at 09:56:57PM -0500, Justin Tobler wrote:
[snip]
> > > +static void stats_table_setup(struct stats_table *table, struct stats *stats)
> > > +{
> > > + size_t ref_total;
> > > +
> > > + ref_total = stats->branches + stats->remotes + stats->tags + stats->others;
> > > + stats_table_add(table, _("* References"), NULL);
> > > + stats_table_add_count(table, _(" * Count"), ref_total);
> > > + stats_table_add_count(table, _(" * Branches"), stats->branches);
> > > + stats_table_add_count(table, _(" * Tags"), stats->tags);
> > > + stats_table_add_count(table, _(" * Remotes"), stats->remotes);
> > > + stats_table_add_count(table, _(" * Others"), stats->others);
> > > +}
> >
> > Would it make sense to not translate the formatting directives, but only
> > the actual words?
>
> From a simplicity standpoint, it is quite nice to have the formatted
> offsets baked-in. It is probably better to separate out the
> translations though? I'll iterate on this in the next version.
Yeah, I don't mind the baked-in offsets. We could allow formatting
directives here and then pass the translated nouns as varargs.
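A rough sketch of that idea, assuming a hypothetical `row_namef()` helper and a `tr_sketch()` stand-in for gettext's `_()` (neither exists in the series): the indentation and bullet stay in an untranslated format string, and only the noun goes through translation.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for gettext's _(); returns its argument untranslated. */
static const char *tr_sketch(const char *msgid)
{
	return msgid;
}

/*
 * Hypothetical helper: format a row name into `out`. Returns the length
 * reported by vsnprintf(), i.e. the untruncated length of the name.
 */
static int row_namef(char *out, size_t outlen, const char *fmt, ...)
{
	va_list ap;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(out, outlen, fmt, ap);
	va_end(ap);

	return len;
}
```

A call site could then read `row_namef(buf, sizeof(buf), "  * %s", tr_sketch("Branches"))`, so translators only ever see the word "Branches".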
Patrick
* Re: [PATCH 3/4] builtin/repo: add keyvalue format for stats
2025-09-23 10:53 ` Patrick Steinhardt
@ 2025-09-23 15:26 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 15:26 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188
On 25/09/23 12:53PM, Patrick Steinhardt wrote:
> On Mon, Sep 22, 2025 at 09:56:59PM -0500, Justin Tobler wrote:
> > @@ -157,6 +160,8 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> > };
> >
> > argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
> > + if (format == FORMAT_TABLE)
> > + die(_("table format not supported"));
>
> We can deduplicate this error message by saying "format '%s' not
> supported".
>
> Other than that, let's fail closed and say `if (format !=
> FORMAT_KEYVALUE && format != FORMAT_NUL)`. Like this, we won't have to
> update the condition every time a new format is added, which is easy to
> forget.
That's fair. Will update.
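A small standalone sketch of the fail-closed check, for illustration only. The enum mirrors the format names quoted in this thread, `FORMAT_SKETCH_FUTURE` is a made-up value standing in for a format added later, and both helper functions are hypothetical.

```c
enum output_format {
	FORMAT_TABLE,
	FORMAT_KEYVALUE,
	FORMAT_NUL_TERMINATED,
	FORMAT_SKETCH_FUTURE /* made up: some format added later */
};

/*
 * Allow-list for `git repo info`: any format not named here is
 * rejected, so new enum values need no change to this condition.
 */
static int info_format_supported(enum output_format format)
{
	return format == FORMAT_KEYVALUE || format == FORMAT_NUL_TERMINATED;
}

/* Name to plug into a shared "format '%s' not supported" message. */
static const char *format_name(enum output_format format)
{
	switch (format) {
	case FORMAT_TABLE:
		return "table";
	case FORMAT_KEYVALUE:
		return "keyvalue";
	case FORMAT_NUL_TERMINATED:
		return "nul";
	default:
		return "unknown";
	}
}
```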
> > @@ -286,6 +291,32 @@ static void stats_table_print(struct stats_table *table)
> > strbuf_release(&buf);
> > }
> >
> > +static void stats_print(struct stats *stats)
>
> I think it would make sense to call this `stats_keyvalue_print()` to
> clearly distinguish it from `stats_table_print()`.
I adapt this same function in the next patch to also print the NUL
format, which is why I kept it generic. The NUL format is really still
just a key-value format with different delimiters. I'll change this in
the next version.
[snip]
> > @@ -376,8 +418,17 @@ static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> > stats_count_references(&stats.refs, &refs);
> > stats_count_objects(&stats.objects, &refs, &revs);
> >
> > - stats_table_setup(&table, &stats);
> > - stats_table_print(&table);
> > + switch (format) {
> > + case FORMAT_TABLE:
> > + stats_table_setup(&table, &stats);
> > + stats_table_print(&table);
> > + break;
> > + case FORMAT_KEYVALUE:
> > + stats_print(&stats);
> > + break;
> > + default:
> > + BUG("not a valid output format: %d", format);
>
> Nit: it may be valid, but definitely not supported.
At this point, the only other valid option was handled above. This would
leave only invalid options. I think in the next version I'm just going
to combine patches 3 and 4 to avoid this issue altogether though.
-Justin
* Re: [PATCH 2/4] builtin/repo: add object counts in stats output
2025-09-23 2:56 ` [PATCH 2/4] builtin/repo: add object counts in stats output Justin Tobler
2025-09-23 10:52 ` Patrick Steinhardt
@ 2025-09-23 15:30 ` Karthik Nayak
2025-09-23 15:56 ` Justin Tobler
1 sibling, 1 reply; 92+ messages in thread
From: Karthik Nayak @ 2025-09-23 15:30 UTC (permalink / raw)
To: Justin Tobler, git
Justin Tobler <jltobler@gmail.com> writes:
[snip]
> @@ -159,13 +161,25 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> return print_fields(argc, argv, repo, format);
> }
>
> -struct stats {
> +struct ref_stats {
> size_t branches;
> size_t remotes;
> size_t tags;
> size_t others;
> };
>
> +struct object_stats {
> + size_t tags;
> + size_t commits;
> + size_t trees;
> + size_t blobs;
> +};
> +
> +struct stats {
> + struct ref_stats refs;
> + struct object_stats objects;
> +};
> +
Okay so we do use nested structs, which I mentioned in the previous
commit, so we're all good.
Nit: A lot of the diff in this commit is because we don't make the
required structs in the previous commit. Perhaps we could start with the
nested structs in the previous commit? That way this patch would be much
simpler.
[snip]
* Re: [PATCH 4/4] builtin/repo: add nul format for stats
2025-09-23 10:53 ` Patrick Steinhardt
@ 2025-09-23 15:33 ` Justin Tobler
2025-09-24 4:48 ` Patrick Steinhardt
0 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 15:33 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188
On 25/09/23 12:53PM, Patrick Steinhardt wrote:
> On Mon, Sep 22, 2025 at 09:57:00PM -0500, Justin Tobler wrote:
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index 4c16a68e4e..37034e6347 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -291,27 +291,31 @@ static void stats_table_print(struct stats_table *table)
> > strbuf_release(&buf);
> > }
> >
> > -static void stats_print(struct stats *stats)
> > +static void stats_print(struct stats *stats, int nul_delim)
>
> Instead of passing a boolean-style option, can't we pass the expected
> delimiter directly? Makes the callsite a bit more obvious.
Ya, we could do that here instead. Something I just noticed is that the
NUL format in `git repo info` also replaces the '=' delimiter with a
newline. I'm not sure if it would be best to match the same behavior
here?
If so, we would have to either pass both delimiters as arguments to the
function, or just keep the boolean toggle for the mode.
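To illustrate the first option, a standalone sketch that takes both delimiters as arguments could look like this. `format_stat()` is a hypothetical helper and it writes into a plain buffer rather than a strbuf. The NUL-format layout (newline between key and value, NUL after each entry) follows the `git repo info` behavior described above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write "<key><key_delim><value><entry_delim>" into `out` and return
 * the number of bytes written, or 0 if the buffer is too small. The
 * result is a byte sequence, not a C string: `entry_delim` replaces
 * the terminator written by snprintf() and may itself be NUL.
 */
static size_t format_stat(char *out, size_t outlen, const char *key,
			  unsigned long value, char key_delim,
			  char entry_delim)
{
	int len = snprintf(out, outlen, "%s%c%lu", key, key_delim, value);

	if (len < 0 || (size_t)len + 1 > outlen)
		return 0;

	out[len] = entry_delim;
	return (size_t)len + 1;
}
```

A keyvalue caller would pass `'='` and `'\n'`, while a nul caller would pass `'\n'` and `'\0'`, which keeps the call sites self-describing without a boolean toggle.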
> > {
> > struct strbuf buf = STRBUF_INIT;
> > -
> > - strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "\n",
> > - (uintmax_t)stats->refs.branches);
> > - strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "\n",
> > - (uintmax_t)stats->refs.tags);
> > - strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "\n",
> > - (uintmax_t)stats->refs.remotes);
> > - strbuf_addf(&buf, "references.others.count=%" PRIuMAX "\n",
> > - (uintmax_t)stats->refs.others);
> > -
> > - strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "\n",
> > - (uintmax_t)stats->objects.commits);
> > - strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "\n",
> > - (uintmax_t)stats->objects.trees);
> > - strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "\n",
> > - (uintmax_t)stats->objects.blobs);
> > - strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "\n",
> > - (uintmax_t)stats->objects.tags);
> > + char delim = '\n';
> > +
> > + if (nul_delim)
> > + delim = '\0';
> > +
> > + strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "%c",
> > + (uintmax_t)stats->refs.branches, delim);
> > + strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "%c",
> > + (uintmax_t)stats->refs.tags, delim);
> > + strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "%c",
> > + (uintmax_t)stats->refs.remotes, delim);
> > + strbuf_addf(&buf, "references.others.count=%" PRIuMAX "%c",
> > + (uintmax_t)stats->refs.others, delim);
> > +
> > + strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "%c",
> > + (uintmax_t)stats->objects.commits, delim);
> > + strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "%c",
> > + (uintmax_t)stats->objects.trees, delim);
> > + strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "%c",
> > + (uintmax_t)stats->objects.blobs, delim);
> > + strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "%c",
> > + (uintmax_t)stats->objects.tags, delim);
> >
> > fwrite(buf.buf, sizeof(char), buf.len, stdout);
> > strbuf_release(&buf);
>
> It's a bit unfortunate we have to rewrite most of the function. I'd
> either have the `delim` parameter right from the start or just squash
> these two patches together.
I'll just squash these two patches together.
> > diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> > index 5bc6d9d5c4..061b2fbbc1 100755
> > --- a/t/t1901-repo-stats.sh
> > +++ b/t/t1901-repo-stats.sh
> > @@ -127,4 +127,31 @@ test_expect_success 'repository stats with keyvalue format' '
> > )
> > '
> >
> > +test_expect_success 'repository stats with nul format' '
> > + test_when_finished "rm -rf repo" &&
> > + git init repo &&
> > + (
> > + cd repo &&
> > + test_commit_bulk 42 &&
> > + git tag -a foo -m bar &&
> > + git repo stats --format=nul >out 2>err &&
> > +
> > + cat >expect <<-EOF &&
> > + references.branches.count=1
> > + references.tags.count=1
> > + references.remotes.count=0
> > + references.others.count=0
> > + objects.commits.count=42
> > + objects.trees.count=42
> > + objects.blobs.count=42
> > + objects.tags.count=1
> > + EOF
> > +
> > + tr "\n" "\0" <expect >expect_null &&
> > +
> > + test_cmp expect_null out &&
> > + test_line_count = 0 err
> > + )
> > +'
>
> We already have a test for the keyvalue format that looks mostly the
> same, so we may just as well test both formats in a single test.
Good suggestion. Will update.
-Justin
* Re: [PATCH 3/4] builtin/repo: add keyvalue format for stats
2025-09-23 2:56 ` [PATCH 3/4] builtin/repo: add keyvalue format for stats Justin Tobler
2025-09-23 10:53 ` Patrick Steinhardt
@ 2025-09-23 15:39 ` Karthik Nayak
2025-09-23 15:59 ` Justin Tobler
1 sibling, 1 reply; 92+ messages in thread
From: Karthik Nayak @ 2025-09-23 15:39 UTC (permalink / raw)
To: Justin Tobler, git
Justin Tobler <jltobler@gmail.com> writes:
[snip]
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 2a67abfca8..7d0341e4f1 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -9,7 +9,7 @@ SYNOPSIS
> --------
> [synopsis]
> git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
> -git repo stats
> +git repo stats [--format=(table|keyvalue)]
>
> DESCRIPTION
> -----------
> @@ -44,12 +44,22 @@ supported:
> +
> `-z` is an alias for `--format=nul`.
>
> -stats::
> +`stats [--format=(table|keyvalue)]`::
> Retrieve stats about the current repository. All references and
> reachable objects in the repository are categorized and counted
> accordingly.
> +
> -The table output format may change and is not intended for machine parsing.
> +The output format can be chosen through the flag `--format`. Two formats are
> +supported:
> ++
> +`table`:::
> + Outputs repository stats in a human-friendly table and is used by
> + default. This format may change and is not intended for machine
> + parsing.
> +
> +`keyvalue`:::
> + Each line of output contains a key-value pair of a repostiory stat. The
s/repostiory/repository
> + '=' character is used to delimit between the key and the value.
>
Does each value end with a newline or with a NUL? We should mention
that here.
[snip]
> +static void stats_print(struct stats *stats)
> +{
> + struct strbuf buf = STRBUF_INIT;
> +
> + strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "\n",
> + (uintmax_t)stats->refs.branches);
> + strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "\n",
> + (uintmax_t)stats->refs.tags);
> + strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "\n",
> + (uintmax_t)stats->refs.remotes);
> + strbuf_addf(&buf, "references.others.count=%" PRIuMAX "\n",
> + (uintmax_t)stats->refs.others);
> +
> + strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "\n",
> + (uintmax_t)stats->objects.commits);
> + strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "\n",
> + (uintmax_t)stats->objects.trees);
> + strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "\n",
> + (uintmax_t)stats->objects.blobs);
> + strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "\n",
> + (uintmax_t)stats->objects.tags);
> +
> + fwrite(buf.buf, sizeof(char), buf.len, stdout);
> + strbuf_release(&buf);
> +}
> +
Okay, so newline-delimited, similar to 'git repo info'.
> static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
> {
> for (int i = 0; i < refs->nr; i++) {
> @@ -359,9 +390,16 @@ static void stats_count_objects(struct object_stats *stats,
> path_walk_info_clear(&info);
> }
>
> -static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> - const char *prefix, struct repository *repo)
> +static int repo_stats(int argc, const char **argv, const char *prefix,
> + struct repository *repo)
> {
> + enum output_format format = FORMAT_TABLE;
> + struct option options[] = {
> + OPT_CALLBACK_F(0, "format", &format, N_("format"),
> + N_("output format"),
> + PARSE_OPT_NONEG, parse_format_cb),
> + OPT_END()
> + };
> struct ref_filter filter = REF_FILTER_INIT;
> struct strvec ref_patterns = STRVEC_INIT;
> struct stats_table table = { 0 };
> @@ -369,6 +407,10 @@ static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> struct stats stats = { 0 };
> struct rev_info revs;
>
> + parse_options(argc, argv, prefix, options, repo_usage, 0);
> + if (format == FORMAT_NUL_TERMINATED)
> + die(_("nul format not yet supported"));
> +
Okay, I foresee the next patch adding this support.
The patch looks good!
[snip]
* Re: [PATCH 4/4] builtin/repo: add nul format for stats
2025-09-23 2:57 ` [PATCH 4/4] builtin/repo: add nul " Justin Tobler
2025-09-23 10:53 ` Patrick Steinhardt
@ 2025-09-23 15:41 ` Karthik Nayak
2025-09-23 16:02 ` Justin Tobler
1 sibling, 1 reply; 92+ messages in thread
From: Karthik Nayak @ 2025-09-23 15:41 UTC (permalink / raw)
To: Justin Tobler, git
Justin Tobler <jltobler@gmail.com> writes:
[snip]
> @@ -291,27 +291,31 @@ static void stats_table_print(struct stats_table *table)
> strbuf_release(&buf);
> }
>
> -static void stats_print(struct stats *stats)
> +static void stats_print(struct stats *stats, int nul_delim)
> {
>
Nit: we can use 'bool' variable types now.
> struct strbuf buf = STRBUF_INIT;
> -
> - strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->refs.branches);
> - strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->refs.tags);
> - strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->refs.remotes);
> - strbuf_addf(&buf, "references.others.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->refs.others);
> -
> - strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->objects.commits);
> - strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->objects.trees);
> - strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->objects.blobs);
> - strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "\n",
> - (uintmax_t)stats->objects.tags);
> + char delim = '\n';
> +
> + if (nul_delim)
> + delim = '\0';
> +
> + strbuf_addf(&buf, "references.branches.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->refs.branches, delim);
> + strbuf_addf(&buf, "references.tags.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->refs.tags, delim);
> + strbuf_addf(&buf, "references.remotes.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->refs.remotes, delim);
> + strbuf_addf(&buf, "references.others.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->refs.others, delim);
> +
> + strbuf_addf(&buf, "objects.commits.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->objects.commits, delim);
> + strbuf_addf(&buf, "objects.trees.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->objects.trees, delim);
> + strbuf_addf(&buf, "objects.blobs.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->objects.blobs, delim);
> + strbuf_addf(&buf, "objects.tags.count=%" PRIuMAX "%c",
> + (uintmax_t)stats->objects.tags, delim);
>
> fwrite(buf.buf, sizeof(char), buf.len, stdout);
> strbuf_release(&buf);
The rest looks good to me! :)
* Re: [PATCH 1/4] builtin/repo: introduce stats subcommand
2025-09-23 15:22 ` Karthik Nayak
@ 2025-09-23 15:55 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 15:55 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On 25/09/23 11:22AM, Karthik Nayak wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> [snip]
>
> > diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> > index 209afd1b61..7762329551 100644
> > --- a/Documentation/git-repo.adoc
> > +++ b/Documentation/git-repo.adoc
> > @@ -9,6 +9,7 @@ SYNOPSIS
> > --------
> > [synopsis]
> > git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
> > +git repo stats
> >
> > DESCRIPTION
> > -----------
> > @@ -43,6 +44,12 @@ supported:
> > +
> > `-z` is an alias for `--format=nul`.
> >
> > +stats::
> > + Retrieve stats about the current repository. All references in the
> > + repository are categorized and counted accordingly.
> > ++
>
> Nit: Should we use bullet points for the information provided? As of
> this commit it is only references, but with time, this will grow, so
> would be nice to have a set of bullet points to understand the different
> information retrieved.
Ya, it was also suggested elsewhere to do the same. Will update.
> > +The table output format may change and is not intended for machine parsing.
> > +
> > INFO KEYS
> > ---------
> > In order to obtain a set of values from `git repo info`, you should provide
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index bbb0966f2d..15899dd74c 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -4,12 +4,15 @@
> > #include "environment.h"
> > #include "parse-options.h"
> > #include "quote.h"
> > +#include "ref-filter.h"
> > #include "refs.h"
> > #include "strbuf.h"
> > +#include "string-list.h"
> > #include "shallow.h"
> >
> > static const char *const repo_usage[] = {
> > "git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
> > + "git repo stats",
> > NULL
> > };
> >
> > @@ -156,12 +159,160 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> > return print_fields(argc, argv, repo, format);
> > }
> >
> > +struct stats {
> > + size_t branches;
> > + size_t remotes;
> > + size_t tags;
> > + size_t others;
>
> Maybe we can use a nested structure here, which would reflect the output
> table. That would be much nicer, as the code would grow, currently we
> know 'others' here refers to references, but with other fields coming
> in, this would no longer be obvious.
Will do.
> > +};
> > +
> > +struct stats_table {
> > + struct string_list rows;
> > +
> > + int name_col_width;
> > + int value_col_width;
>
> Can these be negative?
No, but the only usage of these values is in the format string used to
print each row and it requires an int. I'm not sure it makes sense to
cast in this situation.
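
For illustration only: the `*` width in a printf-style format consumes an
`int` argument, so the widths either have to be stored as `int` or be
converted where the row is printed. A minimal sketch of the second
option, where format_row() is a hypothetical helper and not code from
this series:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one table row into `out`. The widths are
 * kept as size_t by the caller and only narrowed to int here, because
 * that is the type the `*` width specifier expects.
 */
static int format_row(char *out, size_t outlen, const char *name,
		      const char *value, size_t name_width,
		      size_t value_width)
{
	/* Refuse widths that cannot be represented as an int. */
	assert(name_width <= INT_MAX && value_width <= INT_MAX);

	/* Name is left-aligned, value is right-aligned. */
	return snprintf(out, outlen, "| %-*s | %*s |", (int)name_width, name,
			(int)value_width, value);
}
```

Whether the narrowing happens in the struct or at the call boundary is
mostly a matter of taste; the sketch only shows that the conversion can
be confined to one place.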
> > +};
> > +
>
> Nit: Could we add a comment about what this structure is and what it
> holds? I think it is sort of obvious, but would be nice to clarify.
Adding a comment doesn't hurt. Will do.
[snip]
> > +static int repo_stats(int argc UNUSED, const char **argv UNUSED,
> > + const char *prefix UNUSED, struct repository *repo UNUSED)
> > +{
> > + struct ref_filter filter = REF_FILTER_INIT;
> > + struct strvec ref_patterns = STRVEC_INIT;
> > + struct stats_table table = { 0 };
> > + struct ref_array refs = { 0 };
> > + struct stats stats = { 0 };
> > +
> > + filter.name_patterns = ref_patterns.v;
> > + filter_refs(&refs, &filter, FILTER_REFS_REGULAR);
> > +
>
> I was wondering why we need the filter mechanism here, but seems like
> this is to obtain the type of the reference. This is setup automatically
> by the filter mechanism and so it's okay.
>
> We could replicate the same using the ref iterator code. This would be a
> little more involved, but would remove the need to store all refs in a
> 'ref_array' and also the need to loop over references twice. But it
> doesn't really matter in this usecase I assume.
In a future series, I would like to add options that restrict the set of
references used when evaluating the repository. For this reason, I think
it makes sense to stick with filter_refs() here.
> > + stats_count_references(&stats, &refs);
> > +
> > + stats_table_setup(&table, &stats);
> > + stats_table_print(&table);
> > +
> > + string_list_clear(&table.rows, 1);
> > + strvec_clear(&ref_patterns);
>
> Huh. So `ref_patterns` is simply a dummy variable, I was wondering why
> we simply can't set `filter.name_patterns = NULL`. I see that it is
> because in `filter_pattern_match()` we do `if
> (!*filter->name_patterns)`. This is not a great interface for
> ref-filter. Perhaps we could add a precursor commit like:
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 520d2539c9..20284b5918 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2664,7 +2664,7 @@ static int match_name_as_path(const char **pattern, const char *refname,
> /* Return 1 if the refname matches one of the patterns, otherwise 0. */
> static int filter_pattern_match(struct ref_filter *filter, const char *refname)
> {
> - if (!*filter->name_patterns)
> + if (!filter->name_patterns || !*filter->name_patterns)
> return 1; /* No pattern always matches */
> if (filter->match_as_path)
> return match_name_as_path(filter->name_patterns, refname,
> diff --git a/ref-filter.h b/ref-filter.h
> index f22ca94b49..44d9b481ad 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -109,6 +109,7 @@ struct ref_format {
> #define REF_FILTER_INIT { \
> .points_at = OID_ARRAY_INIT, \
> .exclude = STRVEC_INIT, \
> + .name_patterns = NULL, \
> }
> #define REF_FORMAT_INIT { \
> .use_color = -1, \
Ya, I actually already have this exact patch locally. :)
I considered sending it as part of this series, but opted to just use
the dummy variable instead. It probably makes sense to improve the
ref-filter interface though. I'll do so in the next version.
Thanks for the review,
-Justin
* Re: [PATCH 2/4] builtin/repo: add object counts in stats output
2025-09-23 15:30 ` Karthik Nayak
@ 2025-09-23 15:56 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 15:56 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On 25/09/23 10:30AM, Karthik Nayak wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> [snip]
>
> > @@ -159,13 +161,25 @@ static int repo_info(int argc, const char **argv, const char *prefix,
> > return print_fields(argc, argv, repo, format);
> > }
> >
> > -struct stats {
> > +struct ref_stats {
> > size_t branches;
> > size_t remotes;
> > size_t tags;
> > size_t others;
> > };
> >
> > +struct object_stats {
> > + size_t tags;
> > + size_t commits;
> > + size_t trees;
> > + size_t blobs;
> > +};
> > +
> > +struct stats {
> > + struct ref_stats refs;
> > + struct object_stats objects;
> > +};
> > +
>
> Okay so we do use nested structs, which I mentioned in the previous
> commit, so we're all good.
>
> Nit: A lot of the diff in this commit is because we don't make the
> required structs in the previous commit. Perhaps we could start with the
> nested structs in the previous commit? That way this patch would be much
> simpler.
Ya, I'll clean this up in the next version.
-Justin
* Re: [PATCH 3/4] builtin/repo: add keyvalue format for stats
2025-09-23 15:39 ` Karthik Nayak
@ 2025-09-23 15:59 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 15:59 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On 25/09/23 11:39AM, Karthik Nayak wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> [snip]
>
> > diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> > index 2a67abfca8..7d0341e4f1 100644
> > --- a/Documentation/git-repo.adoc
> > +++ b/Documentation/git-repo.adoc
> > @@ -9,7 +9,7 @@ SYNOPSIS
> > --------
> > [synopsis]
> > git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
> > -git repo stats
> > +git repo stats [--format=(table|keyvalue)]
> >
> > DESCRIPTION
> > -----------
> > @@ -44,12 +44,22 @@ supported:
> > +
> > `-z` is an alias for `--format=nul`.
> >
> > -stats::
> > +`stats [--format=(table|keyvalue)]`::
> > Retrieve stats about the current repository. All references and
> > reachable objects in the repository are categorized and counted
> > accordingly.
> > +
> > -The table output format may change and is not intended for machine parsing.
> > +The output format can be chosen through the flag `--format`. Two formats are
> > +supported:
> > ++
> > +`table`:::
> > + Outputs repository stats in a human-friendly table and is used by
> > + default. This format may change and is not intended for machine
> > + parsing.
> > +
> > +`keyvalue`:::
> > + Each line of output contains a key-value pair of a repostiory stat. The
>
> s/repostiory/repository
Will fix.
> > + '=' character is used to delimit between the key and the value.
> >
>
> Does each value end with a newline or with a NUL? We should mention
> that here.
I was thinking that this would be implied by the prior sentence, but it
is probably better to be more explicit here. Will update. :)
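
To make the expectation concrete from a consumer's point of view, here
is a rough sketch of how a script-side tool might read one line of the
keyvalue output. It assumes each record is "key=value" terminated by a
newline; parse_stat_line() is a hypothetical helper and not part of Git:

```c
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical consumer-side helper: split "key=value" at the first
 * '=' and parse the value as an unsigned count. The value may be
 * followed by a newline. Returns 0 on success, -1 on malformed input.
 */
static int parse_stat_line(const char *line, char *key, size_t keylen,
			   uintmax_t *value)
{
	const char *eq = strchr(line, '=');
	char *end;
	size_t len;

	if (!eq)
		return -1;

	/* Copy the key, making sure it fits including the terminator. */
	len = (size_t)(eq - line);
	if (!len || len >= keylen)
		return -1;
	memcpy(key, line, len);
	key[len] = '\0';

	/* Reject signs and whitespace that strtoumax() would accept. */
	if (!isdigit((unsigned char)eq[1]))
		return -1;

	errno = 0;
	*value = strtoumax(eq + 1, &end, 10);
	if (errno || (*end && *end != '\n'))
		return -1;

	return 0;
}
```

A parser like this depends on the record terminator being documented,
which is the point of spelling it out in the docs.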
-Justin
* Re: [PATCH 4/4] builtin/repo: add nul format for stats
2025-09-23 15:41 ` Karthik Nayak
@ 2025-09-23 16:02 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-23 16:02 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On 25/09/23 11:41AM, Karthik Nayak wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> > -static void stats_print(struct stats *stats)
> > +static void stats_print(struct stats *stats, int nul_delim)
> > {
> >
>
> Nit: we can use 'bool' variable types now.
It was suggested elsewhere to just pass the delimiter character here
directly instead. I will probably do that. :)
-Justin
* Re: [PATCH 4/4] builtin/repo: add nul format for stats
2025-09-23 15:33 ` Justin Tobler
@ 2025-09-24 4:48 ` Patrick Steinhardt
0 siblings, 0 replies; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 4:48 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Tue, Sep 23, 2025 at 10:33:45AM -0500, Justin Tobler wrote:
> On 25/09/23 12:53PM, Patrick Steinhardt wrote:
> > On Mon, Sep 22, 2025 at 09:57:00PM -0500, Justin Tobler wrote:
> > > diff --git a/builtin/repo.c b/builtin/repo.c
> > > index 4c16a68e4e..37034e6347 100644
> > > --- a/builtin/repo.c
> > > +++ b/builtin/repo.c
> > > @@ -291,27 +291,31 @@ static void stats_table_print(struct stats_table *table)
> > > strbuf_release(&buf);
> > > }
> > >
> > > -static void stats_print(struct stats *stats)
> > > +static void stats_print(struct stats *stats, int nul_delim)
> >
> > Instead of passing a boolean-style option, can't we pass the expected
> > delimiter directly? Makes the callsite a bit more obvious.
>
> Ya, we could do that here instead. Something I just noticed is that the
> NUL format in `git repo info` also replaces the '=' delimiter with a
> newline. I'm not sure if it would be best to match the same behavior
> here?
>
> If so, we would have to either pass both delimiters as arguments to the
> function, or just keep the boolean toggle for the mode.
I think being consistent would be nice, yes. I'd personally lean towards
passing both delimiters as arguments in that case. Makes the callsites
way easier to read.
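
Roughly what I have in mind, as a sketch only. format_stat() is a
made-up name, and writing into a caller-provided buffer is just to keep
the example self-contained:

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper, not code from this series: format one stat as
 * <key><key_delim><value><value_delim> into `out`. Returns the number
 * of bytes the full record needs, as snprintf() does. The record
 * contains an embedded NUL when one of the delimiters is '\0'.
 */
static int format_stat(char *out, size_t outlen, const char *key,
		       uintmax_t value, char key_delim, char value_delim)
{
	return snprintf(out, outlen, "%s%c%" PRIuMAX "%c", key, key_delim,
			value, value_delim);
}
```

A keyvalue caller would then pass ('=', '\n') and a nul caller would
pass ('\n', '\0'), assuming the nul format mirrors `git repo info`.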
Patrick
* [PATCH v2 0/6] builtin/repo: introduce stats subcommand
2025-09-23 2:56 [PATCH 0/4] builtin/repo: introduce stats subcommand Justin Tobler
` (3 preceding siblings ...)
2025-09-23 2:57 ` [PATCH 4/4] builtin/repo: add nul " Justin Tobler
@ 2025-09-24 21:24 ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
` (6 more replies)
4 siblings, 7 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-24 21:24 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
Greetings,
The shape of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface key stats/information regarding the shape of a
repository via a single command. Acquiring this information requires
users to be fairly knowledgeable about the structure of a Git repository
and how to identify the relevant data points. To fill this gap,
supplemental tools such as git-sizer(1) have been developed.
To allow users to more readily identify potential issues for a
repository, introduce the "stats" subcommand in git-repo(1) to output
stats for the repository that may be of interest to users. The goal of
this subcommand is to eventually provide similar functionality to
git-sizer(1), but in Git natively.
In this initial version, the "stats" subcommand only surfaces counts of
the various reference and object types in a repository. In a follow-up
series, I would like to introduce additional data points that are
present in git-sizer(1) such as largest objects, combined object sizes
by type, and other general repository shape information.
Some other general features that would be nice to introduce eventually:
- A "level of concern" meter for reported stats. This could indicate to
users which stats may be worth looking into further.
- Links to OIDs of interesting objects that correspond to certain stats.
- Options to limit which references to use when evaluating the
repository.
Changes since V1:
- Translatable terms displayed in the table have formatting separated
out.
- Squashed the `keyvalue` and `nul` output format patches into one.
- Added a progress meter to provide users with more feedback.
- Updated docs to outline reported data in a bulleted list.
- Combined similar tests together to reduce repetitive setup.
- Added patch to improve ref-filter interface so we don't have to create
a dummy patterns array.
- Many other renames and cleanups to improve patch clarity.
Thanks,
-Justin
Justin Tobler (6):
builtin/repo: rename repo_info() to cmd_repo_info()
ref-filter: allow NULL filter pattern
builtin/repo: introduce stats subcommand
builtin/repo: add object counts in stats output
builtin/repo: add keyvalue and nul format for stats
builtin/repo: add progress meter for stats
Documentation/git-repo.adoc | 30 +++
builtin/repo.c | 354 +++++++++++++++++++++++++++++++++++-
ref-filter.c | 4 +-
t/meson.build | 1 +
t/t1901-repo-stats.sh | 109 +++++++++++
5 files changed, 493 insertions(+), 5 deletions(-)
create mode 100755 t/t1901-repo-stats.sh
base-commit: ca2559c1d630eb4f04cdee2328aaf1c768907a9e
--
2.51.0.193.g4975ec3473b
* [PATCH v2 1/6] builtin/repo: rename repo_info() to cmd_repo_info()
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-24 21:24 ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 2/6] ref-filter: allow NULL filter pattern Justin Tobler
` (5 subsequent siblings)
6 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-24 21:24 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
Subcommand functions are often prefixed with `cmd_` to denote that they
are an entrypoint. Rename repo_info() to cmd_repo_info() accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index bbb0966f2d..eeeab8fbd2 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -136,8 +136,8 @@ static int parse_format_cb(const struct option *opt,
return 0;
}
-static int repo_info(int argc, const char **argv, const char *prefix,
- struct repository *repo)
+static int cmd_repo_info(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
{
enum output_format format = FORMAT_KEYVALUE;
struct option options[] = {
@@ -161,7 +161,7 @@ int cmd_repo(int argc, const char **argv, const char *prefix,
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
- OPT_SUBCOMMAND("info", &fn, repo_info),
+ OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
OPT_END()
};
--
2.51.0.193.g4975ec3473b
* [PATCH v2 2/6] ref-filter: allow NULL filter pattern
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-24 21:24 ` [PATCH v2 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
@ 2025-09-24 21:24 ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 3/6] builtin/repo: introduce stats subcommand Justin Tobler
` (4 subsequent siblings)
6 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-24 21:24 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
When setting up `struct ref_filter` for filter_refs(), the
`name_patterns` field must point to an array of pattern strings even if
no patterns are required. To improve this interface, treat a NULL
`name_patterns` field the same as when it points to an empty array.
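
The check can be sketched in isolation as follows. The struct is a
simplified stand-in for `struct ref_filter` and has_no_patterns() is a
hypothetical name used only for this illustration:

```c
#include <stddef.h>

/*
 * Stand-in for the relevant part of `struct ref_filter`; only the
 * field discussed here is modelled.
 */
struct pattern_filter {
	const char **name_patterns; /* NULL-terminated array, or NULL */
};

/*
 * Return 1 if there is nothing to match against, i.e. the array is
 * either absent (NULL) or present but empty. The NULL test must come
 * first so the array is never dereferenced when it is absent.
 */
static int has_no_patterns(const struct pattern_filter *filter)
{
	return !filter->name_patterns || !*filter->name_patterns;
}
```

With this, a caller that has no patterns can leave the field
zero-initialized instead of allocating an empty array.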
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
ref-filter.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ref-filter.c b/ref-filter.c
index 520d2539c9..2cb5a166d6 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2664,7 +2664,7 @@ static int match_name_as_path(const char **pattern, const char *refname,
/* Return 1 if the refname matches one of the patterns, otherwise 0. */
static int filter_pattern_match(struct ref_filter *filter, const char *refname)
{
- if (!*filter->name_patterns)
+ if (!filter->name_patterns || !*filter->name_patterns)
return 1; /* No pattern always matches */
if (filter->match_as_path)
return match_name_as_path(filter->name_patterns, refname,
@@ -2751,7 +2751,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
- if (!filter->name_patterns[0]) {
+ if (!filter->name_patterns || !filter->name_patterns[0]) {
/* no patterns; we have to look at everything */
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
--
2.51.0.193.g4975ec3473b
* [PATCH v2 3/6] builtin/repo: introduce stats subcommand
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-24 21:24 ` [PATCH v2 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-24 21:24 ` [PATCH v2 2/6] ref-filter: allow NULL filter pattern Justin Tobler
@ 2025-09-24 21:24 ` Justin Tobler
2025-09-25 5:38 ` Patrick Steinhardt
2025-09-24 21:24 ` [PATCH v2 4/6] builtin/repo: add object counts in stats output Justin Tobler
` (3 subsequent siblings)
6 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-24 21:24 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler, Derrick Stolee
The shape of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface key stats/information regarding the shape of a
repository via a single command. Acquiring this information requires
users to be fairly knowledgeable about the structure of a Git repository
and how to identify the relevant data points. To fill this gap,
supplemental tools such as git-sizer(1) have been developed.
To allow users to more readily identify potential issues for a
repository, introduce the "stats" subcommand in git-repo(1) to output
stats for the repository that may be of interest to users. The goal of
this subcommand is to eventually provide similar functionality to
git-sizer(1), but natively in Git.
The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remote refs,
and other reference types. The corresponding information is displayed in
a human-friendly table formatted in a very similar manner to
git-sizer(1). The width of each table column is adjusted automatically
to satisfy the requirements of the widest row contained.
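
The width calculation amounts to taking the maximum over the column
title and every cell in that column. A simplified sketch, in which
column_width() is a hypothetical helper and widths are measured in
bytes with strlen(), so multi-byte translations are not accounted for:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: width of a column is the longest of its title
 * and all of its cells, measured in bytes.
 */
static size_t column_width(const char *title, const char **cells, size_t nr)
{
	size_t width = strlen(title);

	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(cells[i]);
		if (len > width)
			width = len;
	}

	return width;
}
```

The patch tracks the running maximum while rows are added rather than
in a separate pass, but the result is the same.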
Subsequent commits will surface additional relevant data points to
output.
Based-on-patch-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 10 ++
builtin/repo.c | 179 ++++++++++++++++++++++++++++++++++++
t/meson.build | 1 +
t/t1901-repo-stats.sh | 61 ++++++++++++
4 files changed, 251 insertions(+)
create mode 100755 t/t1901-repo-stats.sh
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 209afd1b61..a009bf8cf1 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,6 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
+git repo stats
DESCRIPTION
-----------
@@ -43,6 +44,15 @@ supported:
+
`-z` is an alias for `--format=nul`.
+`stats`::
+ Retrieve statistics about the current repository. The following kinds
+ of information are reported:
++
+* Reference counts categorized by type
+
++
+The table output format may change and is not intended for machine parsing.
+
INFO KEYS
---------
In order to obtain a set of values from `git repo info`, you should provide
diff --git a/builtin/repo.c b/builtin/repo.c
index eeeab8fbd2..32ddf2350e 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,12 +4,15 @@
#include "environment.h"
#include "parse-options.h"
#include "quote.h"
+#include "ref-filter.h"
#include "refs.h"
#include "strbuf.h"
+#include "string-list.h"
#include "shallow.h"
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
+ "git repo stats",
NULL
};
@@ -156,12 +159,188 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
return print_fields(argc, argv, repo, format);
}
+struct ref_stats {
+ size_t branches;
+ size_t remotes;
+ size_t tags;
+ size_t others;
+};
+
+struct stats_table {
+ struct string_list rows;
+
+ size_t name_col_width;
+ size_t value_col_width;
+};
+
+/*
+ * Holds column data that gets stored for each row.
+ */
+struct stats_table_entry {
+ char *value;
+};
+
+static void stats_table_add(struct stats_table *table, const char *format,
+ const char *name, struct stats_table_entry *entry)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ char *formatted_name;
+ size_t name_width;
+
+ strbuf_addf(&buf, format, name);
+ formatted_name = strbuf_detach(&buf, &name_width);
+
+ item = string_list_append_nodup(&table->rows, formatted_name);
+ item->util = entry;
+
+ if (name_width > table->name_col_width)
+ table->name_col_width = name_width;
+ if (entry) {
+ size_t value_width = strlen(entry->value);
+ if (value_width > table->value_col_width)
+ table->value_col_width = value_width;
+ }
+}
+
+static void stats_table_add_count(struct stats_table *table, const char *format,
+ const char *name, size_t value)
+{
+ struct stats_table_entry *entry;
+
+ CALLOC_ARRAY(entry, 1);
+ entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+ stats_table_add(table, format, name, entry);
+}
+
+static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
+{
+ size_t ref_total;
+
+ ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
+ stats_table_add(table, "* %s", _("References"), NULL);
+ stats_table_add_count(table, " * %s", _("Count"), ref_total);
+ stats_table_add_count(table, " * %s", _("Branches"), refs->branches);
+ stats_table_add_count(table, " * %s", _("Tags"), refs->tags);
+ stats_table_add_count(table, " * %s", _("Remotes"), refs->remotes);
+ stats_table_add_count(table, " * %s", _("Others"), refs->others);
+}
+
+static inline size_t max_size_t(size_t a, size_t b)
+{
+ return (a > b) ? a : b;
+}
+
+static void stats_table_print(struct stats_table *table)
+{
+ const char *name_col_title = _("Repository stats");
+ const char *value_col_title = _("Value");
+ size_t name_title_len = strlen(name_col_title);
+ size_t value_title_len = strlen(value_col_title);
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ int name_col_width;
+ int value_col_width;
+
+ name_col_width = cast_size_t_to_int(
+ max_size_t(table->name_col_width, name_title_len));
+ value_col_width = cast_size_t_to_int(
+ max_size_t(table->value_col_width, value_title_len));
+
+ strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
+ value_col_width, value_col_title);
+ strbuf_addstr(&buf, "| ");
+ strbuf_addchars(&buf, '-', name_col_width);
+ strbuf_addstr(&buf, " | ");
+ strbuf_addchars(&buf, '-', value_col_width);
+ strbuf_addstr(&buf, " |\n");
+
+ for_each_string_list_item(item, &table->rows) {
+ struct stats_table_entry *entry = item->util;
+ const char *value = "";
+
+ if (entry) {
+ struct stats_table_entry *entry = item->util;
+ value = entry->value;
+ }
+
+ strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
+ item->string, value_col_width, value);
+ }
+
+ fputs(buf.buf, stdout);
+ strbuf_release(&buf);
+}
+
+static void stats_table_clear(struct stats_table *table)
+{
+ struct stats_table_entry *entry;
+ struct string_list_item *item;
+
+ for_each_string_list_item(item, &table->rows) {
+ entry = item->util;
+ if (entry)
+ free(entry->value);
+ }
+
+ string_list_clear(&table->rows, 1);
+}
+
+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
+{
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ stats->branches++;
+ break;
+ case FILTER_REFS_REMOTES:
+ stats->remotes++;
+ break;
+ case FILTER_REFS_TAGS:
+ stats->tags++;
+ break;
+ case FILTER_REFS_OTHERS:
+ stats->others++;
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+ }
+}
+
+static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
+ const char *prefix UNUSED, struct repository *repo UNUSED)
+{
+ struct ref_filter filter = REF_FILTER_INIT;
+ struct stats_table table = {
+ .rows = STRING_LIST_INIT_DUP,
+ };
+ struct ref_stats stats = { 0 };
+ struct ref_array refs = { 0 };
+
+ if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
+ die(_("unable to filter refs"));
+
+ stats_count_references(&stats, &refs);
+
+ stats_table_setup(&table, &stats);
+ stats_table_print(&table);
+
+ stats_table_clear(&table);
+ ref_array_clear(&refs);
+
+ return 0;
+}
+
int cmd_repo(int argc, const char **argv, const char *prefix,
struct repository *repo)
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
+ OPT_SUBCOMMAND("stats", &fn, cmd_repo_stats),
OPT_END()
};
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..071d4a5112 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -236,6 +236,7 @@ integration_tests = [
't1701-racy-split-index.sh',
't1800-hook.sh',
't1900-repo.sh',
+ 't1901-repo-stats.sh',
't2000-conflict-when-checking-files-out.sh',
't2002-checkout-cache-u.sh',
't2003-checkout-cache-mkdir.sh',
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
new file mode 100755
index 0000000000..535ac511dd
--- /dev/null
+++ b/t/t1901-repo-stats.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='test git repo stats'
+
+. ./test-lib.sh
+
+test_expect_success 'empty repository' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ cat >expect <<-\EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ EOF
+
+ git repo stats >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_expect_success 'repository with references' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m init &&
+ git tag -a foo -m bar &&
+
+ oid="$(git rev-parse HEAD)" &&
+ git update-ref refs/remotes/origin/foo "$oid" &&
+
+ git notes add -m foo &&
+
+ cat >expect <<-\EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ EOF
+
+ git repo stats >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v2 4/6] builtin/repo: add object counts in stats output
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
` (2 preceding siblings ...)
2025-09-24 21:24 ` [PATCH v2 3/6] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-24 21:24 ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats Justin Tobler
` (2 subsequent siblings)
6 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-24 21:24 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
The number of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
number of reachable objects in the repository by object type. All
regular references are used to determine the reachable set of objects.
The object counts are appended to the same table containing the
reference information.
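
The counting itself is a per-type accumulation over batches of objects.
A standalone sketch of that idea follows; the enum, struct, and
count_batch() are illustrative stand-ins and do not use Git's own types
or the path-walk API:

```c
#include <stddef.h>

/* Simplified stand-ins for Git's object types; names are illustrative. */
enum object_kind { KIND_COMMIT, KIND_TREE, KIND_BLOB, KIND_TAG };

struct object_counts {
	size_t commits;
	size_t trees;
	size_t blobs;
	size_t tags;
};

/*
 * Add a batch of `nr` objects of one kind to the running totals.
 * Returns 0 on success and -1 for an unknown kind.
 */
static int count_batch(struct object_counts *counts, enum object_kind kind,
		       size_t nr)
{
	switch (kind) {
	case KIND_COMMIT:
		counts->commits += nr;
		break;
	case KIND_TREE:
		counts->trees += nr;
		break;
	case KIND_BLOB:
		counts->blobs += nr;
		break;
	case KIND_TAG:
		counts->tags += nr;
		break;
	default:
		return -1;
	}

	return 0;
}
```

Because each batch carries a single type, the callback only needs to
add the batch size to the matching counter.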
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 91 +++++++++++++++++++++++++++++++++++--
t/t1901-repo-stats.sh | 51 +++++++++++++--------
3 files changed, 121 insertions(+), 22 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index a009bf8cf1..0b8d74ed3e 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -49,6 +49,7 @@ supported:
of information are reported:
+
* Reference counts categorized by type
+* Reachable object counts categorized by type
+
The table output format may change and is not intended for machine parsing.
diff --git a/builtin/repo.c b/builtin/repo.c
index 32ddf2350e..8f130bca66 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -3,9 +3,11 @@
#include "builtin.h"
#include "environment.h"
#include "parse-options.h"
+#include "path-walk.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
+#include "revision.h"
#include "strbuf.h"
#include "string-list.h"
#include "shallow.h"
@@ -166,6 +168,18 @@ struct ref_stats {
size_t others;
};
+struct object_stats {
+ size_t tags;
+ size_t commits;
+ size_t trees;
+ size_t blobs;
+};
+
+struct repo_stats {
+ struct ref_stats refs;
+ struct object_stats objects;
+};
+
struct stats_table {
struct string_list rows;
@@ -213,8 +227,11 @@ static void stats_table_add_count(struct stats_table *table, const char *format,
stats_table_add(table, format, name, entry);
}
-static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
+static void stats_table_setup(struct stats_table *table, struct repo_stats *stats)
{
+ struct object_stats *objects = &stats->objects;
+ struct ref_stats *refs = &stats->refs;
+ size_t object_total;
size_t ref_total;
ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
@@ -224,6 +241,15 @@ static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
stats_table_add_count(table, " * %s", _("Tags"), refs->tags);
stats_table_add_count(table, " * %s", _("Remotes"), refs->remotes);
stats_table_add_count(table, " * %s", _("Others"), refs->others);
+
+ object_total = objects->commits + objects->trees + objects->blobs + objects->tags;
+ stats_table_add(table, "%s", "", NULL);
+ stats_table_add(table, "* %s", _("Reachable objects"), NULL);
+ stats_table_add_count(table, " * %s", _("Count"), object_total);
+ stats_table_add_count(table, " * %s", _("Commits"), objects->commits);
+ stats_table_add_count(table, " * %s", _("Trees"), objects->trees);
+ stats_table_add_count(table, " * %s", _("Blobs"), objects->blobs);
+ stats_table_add_count(table, " * %s", _("Tags"), objects->tags);
}
static inline size_t max_size_t(size_t a, size_t b)
@@ -310,25 +336,82 @@ static void stats_count_references(struct ref_stats *stats, struct ref_array *re
}
}
+static int count_objects(const char *path UNUSED, struct oid_array *oids,
+ enum object_type type, void *cb_data)
+{
+ struct object_stats *stats = cb_data;
+
+ switch (type) {
+ case OBJ_TAG:
+ stats->tags += oids->nr;
+ break;
+ case OBJ_COMMIT:
+ stats->commits += oids->nr;
+ break;
+ case OBJ_TREE:
+ stats->trees += oids->nr;
+ break;
+ case OBJ_BLOB:
+ stats->blobs += oids->nr;
+ break;
+ default:
+ BUG("invalid object type");
+ }
+
+ return 0;
+}
+
+static void stats_count_objects(struct object_stats *stats,
+ struct ref_array *refs, struct rev_info *revs)
+{
+ struct path_walk_info info = PATH_WALK_INFO_INIT;
+
+ info.revs = revs;
+ info.path_fn = count_objects;
+ info.path_fn_data = stats;
+
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ case FILTER_REFS_TAGS:
+ case FILTER_REFS_REMOTES:
+ case FILTER_REFS_OTHERS:
+ add_pending_oid(revs, NULL, &ref->objectname, 0);
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+ }
+
+ walk_objects_by_path(&info);
+ path_walk_info_clear(&info);
+}
+
static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
- const char *prefix UNUSED, struct repository *repo UNUSED)
+ const char *prefix, struct repository *repo)
{
struct ref_filter filter = REF_FILTER_INIT;
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
- struct ref_stats stats = { 0 };
+ struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
+ struct rev_info revs;
+ repo_init_revisions(repo, &revs, prefix);
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
- stats_count_references(&stats, &refs);
+ stats_count_references(&stats.refs, &refs);
+ stats_count_objects(&stats.objects, &refs, &revs);
stats_table_setup(&table, &stats);
stats_table_print(&table);
stats_table_clear(&table);
+ release_revisions(&revs);
ref_array_clear(&refs);
return 0;
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 535ac511dd..315b9e1767 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -10,14 +10,21 @@ test_expect_success 'empty repository' '
(
cd repo &&
cat >expect <<-\EOF &&
- | Repository stats | Value |
- | ---------------- | ----- |
- | * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
+ | Repository stats | Value |
+ | ------------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo stats >out 2>err &&
@@ -27,28 +34,36 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references' '
+test_expect_success 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
- git commit --allow-empty -m init &&
+ test_commit_bulk 42 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
git update-ref refs/remotes/origin/foo "$oid" &&
+ # Also creates a commit, tree, and blob.
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository stats | Value |
- | ---------------- | ----- |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
+ | Repository stats | Value |
+ | ------------------- | ----- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 130 |
+ | * Commits | 43 |
+ | * Trees | 43 |
+ | * Blobs | 43 |
+ | * Tags | 1 |
EOF
git repo stats >out 2>err &&
--
2.51.0.193.g4975ec3473b
* [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
` (3 preceding siblings ...)
2025-09-24 21:24 ` [PATCH v2 4/6] builtin/repo: add object counts in stats output Justin Tobler
@ 2025-09-24 21:24 ` Justin Tobler
2025-09-25 5:39 ` Patrick Steinhardt
2025-09-24 21:24 ` [PATCH v2 6/6] builtin/repo: add progress meter " Justin Tobler
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
6 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-24 21:24 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
All repository stats are outputted in a human-friendly table form. This
format is not suitable for machine parsing. Add a --format option that
supports three output modes: `table`, `keyvalue`, and `nul`. The `table`
mode is the default format and prints the same table output as before.
With the `keyvalue` mode, each line of output contains a key-value pair
of a repository stat. The '=' character is used to delimit between keys
and values. The `nul` mode is similar to `keyvalue`, but key-values are
delimited by a NUL character instead of a newline. Also, instead of a
'=' character to delimit between keys and values, a newline character is
used. This allows stat values to support special characters without
having to cquote them. These two new modes provide output that is more
machine-friendly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 25 +++++++++++++--
builtin/repo.c | 62 ++++++++++++++++++++++++++++++++++---
t/t1901-repo-stats.sh | 33 ++++++++++++++++++++
3 files changed, 112 insertions(+), 8 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 0b8d74ed3e..db21b75522 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
-git repo stats
+git repo stats [--format=(table|keyvalue|nul)]
DESCRIPTION
-----------
@@ -44,7 +44,7 @@ supported:
+
`-z` is an alias for `--format=nul`.
-`stats`::
+`stats [--format=(table|keyvalue|nul)]`::
Retrieve statistics about the current repository. The following kinds
of information are reported:
+
@@ -52,7 +52,26 @@ supported:
* Reachable object counts categorized by type
+
-The table output format may change and is not intended for machine parsing.
+The output format can be chosen through the flag `--format`. Three formats are
+supported:
++
+`table`:::
+ Outputs repository stats in a human-friendly table and is used by
+ default. This format may change and is not intended for machine
+ parsing.
+
+`keyvalue`:::
+ Each line of output contains a key-value pair for a repository stat.
+ The '=' character is used to delimit between the key and the value.
+ Values containing "unusual" characters are quoted as explained for the
+ configuration variable `core.quotePath` (see linkgit:git-config[1]).
+
+`nul`:::
+ Similar to `keyvalue`, but uses a NUL character to delimit between
+ key-value pairs instead of a newline. Also uses a newline character as
+ the delimiter between the key and value instead of '='. Unlike the
+ `keyvalue` format, values containing "unusual" characters are never
+ quoted.
INFO KEYS
---------
diff --git a/builtin/repo.c b/builtin/repo.c
index 8f130bca66..fe7d43f78e 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -14,13 +14,14 @@
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
- "git repo stats",
+ "git repo stats [--format=(table|keyvalue|nul)]",
NULL
};
typedef int get_value_fn(struct repository *repo, struct strbuf *buf);
enum output_format {
+ FORMAT_TABLE,
FORMAT_KEYVALUE,
FORMAT_NUL_TERMINATED,
};
@@ -135,6 +136,8 @@ static int parse_format_cb(const struct option *opt,
*format = FORMAT_NUL_TERMINATED;
else if (!strcmp(arg, "keyvalue"))
*format = FORMAT_KEYVALUE;
+ else if (!strcmp(arg, "table"))
+ *format = FORMAT_TABLE;
else
die(_("invalid format '%s'"), arg);
@@ -157,6 +160,8 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
};
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (format != FORMAT_KEYVALUE && format != FORMAT_NUL_TERMINATED)
+ die(_("unsupported output format"));
return print_fields(argc, argv, repo, format);
}
@@ -312,6 +317,33 @@ static void stats_table_clear(struct stats_table *table)
string_list_clear(&table->rows, 1);
}
+static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
+ char value_delim)
+{
+ struct strbuf buf = STRBUF_INIT;
+
+ strbuf_addf(&buf, "references.branches.count%c%" PRIuMAX "%c",
+ key_delim, (uintmax_t)stats->refs.branches, value_delim);
+ strbuf_addf(&buf, "references.tags.count%c%" PRIuMAX "%c",
+ key_delim, (uintmax_t)stats->refs.tags, value_delim);
+ strbuf_addf(&buf, "references.remotes.count%c%" PRIuMAX "%c",
+ key_delim, (uintmax_t)stats->refs.remotes, value_delim);
+ strbuf_addf(&buf, "references.others.count%c%" PRIuMAX "%c",
+ key_delim, (uintmax_t)stats->refs.others, value_delim);
+
+ strbuf_addf(&buf, "objects.commits.count%c%" PRIuMAX "%c",
+ key_delim, (uintmax_t)stats->objects.commits, value_delim);
+ strbuf_addf(&buf, "objects.trees.count%c%" PRIuMAX "%c",
+ key_delim, (uintmax_t)stats->objects.trees, value_delim);
+ strbuf_addf(&buf, "objects.blobs.count%c%" PRIuMAX "%c",
+ key_delim, (uintmax_t)stats->objects.blobs, value_delim);
+ strbuf_addf(&buf, "objects.tags.count%c%" PRIuMAX "%c",
+ key_delim, (uintmax_t)stats->objects.tags, value_delim);
+
+ fwrite(buf.buf, sizeof(char), buf.len, stdout);
+ strbuf_release(&buf);
+}
+
static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
{
for (int i = 0; i < refs->nr; i++) {
@@ -389,17 +421,25 @@ static void stats_count_objects(struct object_stats *stats,
path_walk_info_clear(&info);
}
-static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
- const char *prefix, struct repository *repo)
+static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
{
struct ref_filter filter = REF_FILTER_INIT;
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
+ enum output_format format = FORMAT_TABLE;
struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
+ struct option options[] = {
+ OPT_CALLBACK_F(0, "format", &format, N_("format"),
+ N_("output format"),
+ PARSE_OPT_NONEG, parse_format_cb),
+ OPT_END()
+ };
+ parse_options(argc, argv, prefix, options, repo_usage, 0);
repo_init_revisions(repo, &revs, prefix);
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
@@ -407,8 +447,20 @@ static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
stats_count_references(&stats.refs, &refs);
stats_count_objects(&stats.objects, &refs, &revs);
- stats_table_setup(&table, &stats);
- stats_table_print(&table);
+ switch (format) {
+ case FORMAT_TABLE:
+ stats_table_setup(&table, &stats);
+ stats_table_print(&table);
+ break;
+ case FORMAT_KEYVALUE:
+ stats_keyvalue_print(&stats, '=', '\n');
+ break;
+ case FORMAT_NUL_TERMINATED:
+ stats_keyvalue_print(&stats, '\n', '\0');
+ break;
+ default:
+ BUG("invalid output format");
+ }
stats_table_clear(&table);
release_revisions(&revs);
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 315b9e1767..d2c1b6e307 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -73,4 +73,37 @@ test_expect_success 'repository with references and objects' '
)
'
+test_expect_success 'repository stats with keyvalue and nul format' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit_bulk 42 &&
+ git tag -a foo -m bar &&
+
+ cat >expect <<-\EOF &&
+ references.branches.count=1
+ references.tags.count=1
+ references.remotes.count=0
+ references.others.count=0
+ objects.commits.count=42
+ objects.trees.count=42
+ objects.blobs.count=42
+ objects.tags.count=1
+ EOF
+
+ git repo stats --format=keyvalue >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err &&
+
+ # Replace key and value delimiters for nul format.
+ tr "\n" "\0" <expect | tr "=" "\n" >expect_null &&
+ git repo stats --format=nul >out 2>err &&
+
+ test_cmp expect_null out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v2 6/6] builtin/repo: add progress meter for stats
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
` (4 preceding siblings ...)
2025-09-24 21:24 ` [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats Justin Tobler
@ 2025-09-24 21:24 ` Justin Tobler
2025-09-25 5:39 ` Patrick Steinhardt
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
6 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-24 21:24 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
When using the stats subcommand for git-repo(1), evaluating a repository
may take some time depending on its shape. Add a progress meter to
provide feedback to the user about what is happening. The progress meter
is enabled by default when the command is executed from a tty. It can
also be explicitly enabled/disabled via the --[no-]progress option.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 46 ++++++++++++++++++++++++++++++++++++++++------
1 file changed, 40 insertions(+), 6 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index fe7d43f78e..fdc8af92dc 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,6 +4,7 @@
#include "environment.h"
#include "parse-options.h"
#include "path-walk.h"
+#include "progress.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
@@ -344,8 +345,14 @@ static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
strbuf_release(&buf);
}
-static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs,
+ struct repository *repo, int show_progress)
{
+ struct progress *progress = NULL;
+
+ if (show_progress)
+ progress = start_progress(repo, _("Counting references"), refs->nr);
+
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -365,13 +372,24 @@ static void stats_count_references(struct ref_stats *stats, struct ref_array *re
default:
BUG("unexpected reference type");
}
+
+ display_progress(progress, i + 1);
}
+
+ stop_progress(&progress);
}
+struct count_objects_data {
+ struct object_stats *stats;
+ struct progress *progress;
+};
+
static int count_objects(const char *path UNUSED, struct oid_array *oids,
enum object_type type, void *cb_data)
{
- struct object_stats *stats = cb_data;
+ struct count_objects_data *data = cb_data;
+ struct object_stats *stats = data->stats;
+ size_t object_count;
switch (type) {
case OBJ_TAG:
@@ -390,17 +408,24 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
BUG("invalid object type");
}
+ object_count = stats->tags + stats->commits + stats->trees + stats->blobs;
+ display_progress(data->progress, object_count);
+
return 0;
}
static void stats_count_objects(struct object_stats *stats,
- struct ref_array *refs, struct rev_info *revs)
+ struct ref_array *refs, struct rev_info *revs,
+ struct repository *repo, int show_progress)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
+ struct count_objects_data data = {
+ .stats = stats,
+ };
info.revs = revs;
info.path_fn = count_objects;
- info.path_fn_data = stats;
+ info.path_fn_data = &data;
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -417,8 +442,12 @@ static void stats_count_objects(struct object_stats *stats,
}
}
+ if (show_progress)
+ data.progress = start_progress(repo, _("Counting Objects"), 0);
+
walk_objects_by_path(&info);
path_walk_info_clear(&info);
+ stop_progress(&data.progress);
}
static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
@@ -432,10 +461,12 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
+ int show_progress = -1;
struct option options[] = {
OPT_CALLBACK_F(0, "format", &format, N_("format"),
N_("output format"),
PARSE_OPT_NONEG, parse_format_cb),
+ OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
OPT_END()
};
@@ -444,8 +475,11 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
- stats_count_references(&stats.refs, &refs);
- stats_count_objects(&stats.objects, &refs, &revs);
+ if (show_progress < 0)
+ show_progress = isatty(2);
+
+ stats_count_references(&stats.refs, &refs, repo, show_progress);
+ stats_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
switch (format) {
case FORMAT_TABLE:
--
2.51.0.193.g4975ec3473b
* Re: [PATCH v2 3/6] builtin/repo: introduce stats subcommand
2025-09-24 21:24 ` [PATCH v2 3/6] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-25 5:38 ` Patrick Steinhardt
2025-09-25 13:01 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-25 5:38 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188, Derrick Stolee
On Wed, Sep 24, 2025 at 04:24:23PM -0500, Justin Tobler wrote:
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 209afd1b61..a009bf8cf1 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -156,12 +159,188 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
> return print_fields(argc, argv, repo, format);
> }
>
> +struct ref_stats {
> + size_t branches;
> + size_t remotes;
> + size_t tags;
> + size_t others;
> +};
> +
> +struct stats_table {
> + struct string_list rows;
> +
> + size_t name_col_width;
> + size_t value_col_width;
> +};
> +
> +/*
> + * Holds column data that gets stored for each row.
> + */
> +struct stats_table_entry {
> + char *value;
> +};
> +
> +static void stats_table_add(struct stats_table *table, const char *format,
> + const char *name, struct stats_table_entry *entry)
We could of course accept varargs right from the start and thus allow
the caller to pass arbitrary formatting directives. But I guess we don't
need it now, so it's fine to not do it.
[snip]
> +static void stats_table_print(struct stats_table *table)
Nit: The table can be marked as `const` as we don't modify it.
> +{
> + const char *name_col_title = _("Repository stats");
> + const char *value_col_title = _("Value");
> + size_t name_title_len = strlen(name_col_title);
> + size_t value_title_len = strlen(value_col_title);
> + struct strbuf buf = STRBUF_INIT;
> + struct string_list_item *item;
> + int name_col_width;
> + int value_col_width;
> +
> + name_col_width = cast_size_t_to_int(
> + max_size_t(table->name_col_width, name_title_len));
> + value_col_width = cast_size_t_to_int(
> + max_size_t(table->value_col_width, value_title_len));
> +
> + strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
> + value_col_width, value_col_title);
> + strbuf_addstr(&buf, "| ");
> + strbuf_addchars(&buf, '-', name_col_width);
> + strbuf_addstr(&buf, " | ");
> + strbuf_addchars(&buf, '-', value_col_width);
> + strbuf_addstr(&buf, " |\n");
> +
> + for_each_string_list_item(item, &table->rows) {
> + struct stats_table_entry *entry = item->util;
> + const char *value = "";
> +
> + if (entry) {
> + struct stats_table_entry *entry = item->util;
> + value = entry->value;
> + }
> +
> + strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
> + item->string, value_col_width, value);
> + }
> +
> + fputs(buf.buf, stdout);
> + strbuf_release(&buf);
> +}
By the way, is there any specific reason we do the detour via the strbuf
instead of printing the data to stdout directly?
> +static void stats_table_clear(struct stats_table *table)
> +{
> + struct stats_table_entry *entry;
> + struct string_list_item *item;
> +
> + for_each_string_list_item(item, &table->rows) {
> + entry = item->util;
> + if (entry)
> + free(entry->value);
> + }
> +
> + string_list_clear(&table->rows, 1);
> +}
Yeah, this is much nicer now.
> +static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
> +{
> + for (int i = 0; i < refs->nr; i++) {
> + struct ref_array_item *ref = refs->items[i];
> +
> + switch (ref->kind) {
> + case FILTER_REFS_BRANCHES:
> + stats->branches++;
> + break;
> + case FILTER_REFS_REMOTES:
> + stats->remotes++;
> + break;
> + case FILTER_REFS_TAGS:
> + stats->tags++;
> + break;
> + case FILTER_REFS_OTHERS:
> + stats->others++;
> + break;
> + default:
> + BUG("unexpected reference type");
> + }
> + }
> +}
Here we're now being defensive. Good.
Patrick
* Re: [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats
2025-09-24 21:24 ` [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats Justin Tobler
@ 2025-09-25 5:39 ` Patrick Steinhardt
2025-09-25 13:16 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-25 5:39 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Wed, Sep 24, 2025 at 04:24:25PM -0500, Justin Tobler wrote:
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 0b8d74ed3e..db21b75522 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -52,7 +52,26 @@ supported:
> * Reachable object counts categorized by type
>
> +
> -The table output format may change and is not intended for machine parsing.
> +The output format can be chosen through the flag `--format`. Three formats are
> +supported:
> ++
> +`table`:::
> + Outputs repository stats in a human-friendly table and is used by
> + default. This format may change and is not intended for machine
> + parsing.
Let's mention that this is the default format.
> +`keyvalue`:::
> + Each line of output contains a key-value pair for a repository stat.
> + The '=' character is used to delimit between the key and the value.
> + Values containing "unusual" characters are quoted as explained for the
> + configuration variable `core.quotePath` (see linkgit:git-config[1]).
In the current state there is never any quoting, so this statement here
is a bit misleading. Should we maybe drop that part?
> +`nul`:::
> + Similar to `keyvalue`, but uses a NUL character to delimit between
> + key-value pairs instead of a newline. Also uses a newline character as
> + the delimiter between the key and value instead of '='. Unlike the
> + `keyvalue` format, values containing "unusual" characters are never
> + quoted.
Likewise.
> diff --git a/builtin/repo.c b/builtin/repo.c
> index 8f130bca66..fe7d43f78e 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -312,6 +317,33 @@ static void stats_table_clear(struct stats_table *table)
> string_list_clear(&table->rows, 1);
> }
>
> +static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
> + char value_delim)
> +{
> + struct strbuf buf = STRBUF_INIT;
> +
> + strbuf_addf(&buf, "references.branches.count%c%" PRIuMAX "%c",
> + key_delim, (uintmax_t)stats->refs.branches, value_delim);
> + strbuf_addf(&buf, "references.tags.count%c%" PRIuMAX "%c",
> + key_delim, (uintmax_t)stats->refs.tags, value_delim);
> + strbuf_addf(&buf, "references.remotes.count%c%" PRIuMAX "%c",
> + key_delim, (uintmax_t)stats->refs.remotes, value_delim);
> + strbuf_addf(&buf, "references.others.count%c%" PRIuMAX "%c",
> + key_delim, (uintmax_t)stats->refs.others, value_delim);
> +
> + strbuf_addf(&buf, "objects.commits.count%c%" PRIuMAX "%c",
> + key_delim, (uintmax_t)stats->objects.commits, value_delim);
> + strbuf_addf(&buf, "objects.trees.count%c%" PRIuMAX "%c",
> + key_delim, (uintmax_t)stats->objects.trees, value_delim);
> + strbuf_addf(&buf, "objects.blobs.count%c%" PRIuMAX "%c",
> + key_delim, (uintmax_t)stats->objects.blobs, value_delim);
> + strbuf_addf(&buf, "objects.tags.count%c%" PRIuMAX "%c",
> + key_delim, (uintmax_t)stats->objects.tags, value_delim);
> +
> + fwrite(buf.buf, sizeof(char), buf.len, stdout);
> + strbuf_release(&buf);
> +}
Same question here regarding the buffering. Can't we print to stdout
directly, or is there a reason not to?
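For illustration, a minimal sketch of what printing one key-value pair
directly could look like. The helper name `print_count` and its signature
are assumptions made for this example and are not part of the patch. The
point it shows is that `fprintf()` with `%c` writes a NUL delimiter just
fine, so the strbuf plus `fwrite()` detour is not required for the `nul`
format.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): write "<key><key_delim>
 * <count><value_delim>" straight to `out`. Returns what fprintf()
 * returns, i.e. the number of bytes written, with a NUL delimiter
 * counted like any other byte.
 */
static int print_count(FILE *out, const char *key, size_t count,
		       char key_delim, char value_delim)
{
	return fprintf(out, "%s%c%" PRIuMAX "%c", key, key_delim,
		       (uintmax_t)count, value_delim);
}
```

A caller would pass `'='` and `'\n'` for the keyvalue format, or `'\n'`
and `'\0'` for the nul format, mirroring the delimiters used in the patch.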
> @@ -389,17 +421,25 @@ static void stats_count_objects(struct object_stats *stats,
> path_walk_info_clear(&info);
> }
>
> -static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
> - const char *prefix, struct repository *repo)
> +static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
> + struct repository *repo)
> {
> struct ref_filter filter = REF_FILTER_INIT;
> struct stats_table table = {
> .rows = STRING_LIST_INIT_DUP,
> };
> + enum output_format format = FORMAT_TABLE;
> struct repo_stats stats = { 0 };
> struct ref_array refs = { 0 };
> struct rev_info revs;
> + struct option options[] = {
> + OPT_CALLBACK_F(0, "format", &format, N_("format"),
> + N_("output format"),
> + PARSE_OPT_NONEG, parse_format_cb),
> + OPT_END()
> + };
>
> + parse_options(argc, argv, prefix, options, repo_usage, 0);
I think it would be sensible to introduce this call to `parse_options()`
right in the first commit that wires up the new subcommand. Otherwise we
accept arbitrary arguments without raising any error, and we don't know
to output help either.
So we should move the addition to a previous commit and probably do the
following:
argc = parse_options(...);
if (argc)
usagef("too many arguments");
> @@ -407,8 +447,20 @@ static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
> stats_count_references(&stats.refs, &refs);
> stats_count_objects(&stats.objects, &refs, &revs);
>
> - stats_table_setup(&table, &stats);
> - stats_table_print(&table);
> + switch (format) {
> + case FORMAT_TABLE:
> + stats_table_setup(&table, &stats);
> + stats_table_print(&table);
> + break;
> + case FORMAT_KEYVALUE:
> + stats_keyvalue_print(&stats, '=', '\n');
> + break;
> + case FORMAT_NUL_TERMINATED:
> + stats_keyvalue_print(&stats, '\n', '\0');
> + break;
This reads much nicer now. The newline as key-value delimiter is a
curious choice, but you simply do what we already do in `git repo info`.
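To make the consequence of that delimiter choice concrete, here is a
rough consumer-side sketch of splitting `--format=nul` output into pairs.
Everything in it (`struct kv`, `parse_nul_records`) is hypothetical and
only assumes the record layout described in the commit message: the key,
a newline, the value, then a NUL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type; both pointers point into the input buffer. */
struct kv {
	const char *key;
	const char *value;
};

/*
 * Split a buffer of "key\nvalue\0" records in place. Each record is
 * cut at its first newline, so a value may itself contain newlines.
 * Returns the number of pairs parsed, or -1 if a record lacks its
 * terminating NUL or its newline, or if `out` is too small.
 */
static int parse_nul_records(char *buf, size_t len, struct kv *out, int max)
{
	size_t pos = 0;
	int nr = 0;

	while (pos < len) {
		char *rec = buf + pos;
		char *end = memchr(rec, '\0', len - pos);
		char *sep;

		if (!end || nr == max)
			return -1;
		sep = memchr(rec, '\n', (size_t)(end - rec));
		if (!sep)
			return -1;

		*sep = '\0'; /* terminate the key */
		out[nr].key = rec;
		out[nr].value = sep + 1;
		nr++;
		pos = (size_t)(end - buf) + 1;
	}

	return nr;
}
```

Because keys never contain a newline, cutting at the first one is
unambiguous, which is what lets values go unquoted in this format.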
> diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> index 315b9e1767..d2c1b6e307 100755
> --- a/t/t1901-repo-stats.sh
> +++ b/t/t1901-repo-stats.sh
> @@ -73,4 +73,37 @@ test_expect_success 'repository with references and objects' '
> )
> '
>
> +test_expect_success 'repository stats with keyvalue and nul format' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + test_commit_bulk 42 &&
> + git tag -a foo -m bar &&
> +
> + cat >expect <<-\EOF &&
> + references.branches.count=1
> + references.tags.count=1
> + references.remotes.count=0
> + references.others.count=0
> + objects.commits.count=42
> + objects.trees.count=42
> + objects.blobs.count=42
> + objects.tags.count=1
> + EOF
> +
> + git repo stats --format=keyvalue >out 2>err &&
> +
> + test_cmp expect out &&
> + test_line_count = 0 err &&
> +
> + # Replace key and value delimiters for nul format.
> + tr "\n" "\0" <expect | tr "=" "\n" >expect_null &&
You can do this without the pipe:
tr "\n=" "\0\n" <expect >expect_nul
Also, let's call the file `expect_nul` (with a single 'l') to match the
format.
Patrick
* Re: [PATCH v2 6/6] builtin/repo: add progress meter for stats
2025-09-24 21:24 ` [PATCH v2 6/6] builtin/repo: add progress meter " Justin Tobler
@ 2025-09-25 5:39 ` Patrick Steinhardt
2025-09-25 13:20 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-25 5:39 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Wed, Sep 24, 2025 at 04:24:26PM -0500, Justin Tobler wrote:
> diff --git a/builtin/repo.c b/builtin/repo.c
> index fe7d43f78e..fdc8af92dc 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -344,8 +345,14 @@ static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
> strbuf_release(&buf);
> }
>
> -static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
> +static void stats_count_references(struct ref_stats *stats, struct ref_array *refs,
> + struct repository *repo, int show_progress)
> {
> + struct progress *progress = NULL;
> +
> + if (show_progress)
> + progress = start_progress(repo, _("Counting references"), refs->nr);
We tend to use `start_delayed_progress()` so that the progress meter is
not displayed when the action takes less than a second. The delay can be
disabled in our tests by using `GIT_PROGRESS_DELAY=0`.
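As a rough picture of the idea only: the sketch below is a simplified,
self-contained model of a delayed meter and is not Git's progress API.
The struct, the function name, and the explicit `now` parameter are all
assumptions made so that the behaviour can be exercised deterministically.

```c
#include <stdio.h>

/* Hypothetical, simplified progress meter; not Git's struct progress. */
struct delayed_progress {
	const char *title;
	long start; /* time the work began, in seconds */
	long delay; /* stay silent until this many seconds have passed */
	int shown;  /* number of updates actually printed */
};

/*
 * Print an update to stderr only once `delay` seconds have elapsed
 * since `start`. Returns 1 if something was printed, 0 otherwise.
 */
static int delayed_progress_update(struct delayed_progress *p, long now,
				   unsigned long done)
{
	if (now - p->start < p->delay)
		return 0;

	fprintf(stderr, "\r%s: %lu", p->title, done);
	p->shown++;
	return 1;
}
```

With a delay of zero every update is printed, which is the effect the
test-suite knob mentioned above is meant to have.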
> @@ -365,13 +372,24 @@ static void stats_count_references(struct ref_stats *stats, struct ref_array *re
> default:
> BUG("unexpected reference type");
> }
> +
> + display_progress(progress, i + 1);
> }
> +
> + stop_progress(&progress);
> }
>
> +struct count_objects_data {
> + struct object_stats *stats;
> + struct progress *progress;
> +};
> +
> static int count_objects(const char *path UNUSED, struct oid_array *oids,
> enum object_type type, void *cb_data)
> {
> - struct object_stats *stats = cb_data;
> + struct count_objects_data *data = cb_data;
> + struct object_stats *stats = data->stats;
> + size_t object_count;
>
> switch (type) {
> case OBJ_TAG:
> @@ -390,17 +408,24 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
> BUG("invalid object type");
> }
>
> + object_count = stats->tags + stats->commits + stats->trees + stats->blobs;
We have this computation in two locations now. Maybe we should
deduplicate it via something like:
static inline size_t stats_get_total_object_count(const struct object_stats *stats)
{
return stats->tags + stats->commits + stats->trees + stats->blobs;
}
> @@ -417,8 +442,12 @@ static void stats_count_objects(struct object_stats *stats,
> }
> }
>
> + if (show_progress)
> + data.progress = start_progress(repo, _("Counting Objects"), 0);
s/Objects/objects/
> @@ -432,10 +461,12 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
> struct repo_stats stats = { 0 };
> struct ref_array refs = { 0 };
> struct rev_info revs;
> + int show_progress = -1;
> struct option options[] = {
> OPT_CALLBACK_F(0, "format", &format, N_("format"),
> N_("output format"),
> PARSE_OPT_NONEG, parse_format_cb),
> + OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
> OPT_END()
> };
>
> @@ -444,8 +475,11 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
> if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
> die(_("unable to filter refs"));
>
> - stats_count_references(&stats.refs, &refs);
> - stats_count_objects(&stats.objects, &refs, &revs);
> + if (show_progress < 0)
> + show_progress = isatty(2);
Makes sense.
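Spelled out as a standalone sketch, the tri-state handling amounts to the
following. The function name `resolve_show_progress` is made up for this
example; the patch does the equivalent inline.

```c
/*
 * Hypothetical helper illustrating the tri-state option:
 *   requested < 0  -> neither --progress nor --no-progress was given,
 *                     so fall back to whether stderr is a terminal
 *   requested == 0 -> --no-progress
 *   requested > 0  -> --progress
 */
static int resolve_show_progress(int requested, int stderr_is_tty)
{
	if (requested < 0)
		return stderr_is_tty ? 1 : 0;
	return requested ? 1 : 0;
}
```

A caller would pass the parsed option value together with `isatty(2)`, so
an explicit flag always wins over terminal detection.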
> + stats_count_references(&stats.refs, &refs, repo, show_progress);
> + stats_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
>
> switch (format) {
> case FORMAT_TABLE:
Should our tests be updated to verify that we know to print progress
depending on whether or not `--progress` is passed?
Patrick
* Re: [PATCH v2 3/6] builtin/repo: introduce stats subcommand
2025-09-25 5:38 ` Patrick Steinhardt
@ 2025-09-25 13:01 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 13:01 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188, Derrick Stolee
On 25/09/25 07:38AM, Patrick Steinhardt wrote:
> On Wed, Sep 24, 2025 at 04:24:23PM -0500, Justin Tobler wrote:
[snip]
> > +static void stats_table_add(struct stats_table *table, const char *format,
> > + const char *name, struct stats_table_entry *entry)
>
> We could of course accept varargs right from the start and thus allow
> the caller to pass arbitrary formatting directives. But I guess we don't
> need it now, so it's fine to not do it.
I was on the fence about using varargs from the start. I figured since
we don't have a need right now for multiple formatting directives, we
could just pass the one. I might just go ahead and implement varargs
here in the next version though.
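For reference, a varargs version could be shaped roughly like the sketch
below. It uses a hypothetical fixed-size `struct row` and plain
`vsnprintf()` rather than the string_list and strbuf machinery in the
patch, so treat it as an outline of the calling convention only.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a table row: a fixed-size name buffer.
 * The patch stores rows in a string_list instead.
 */
struct row {
	char name[64];
};

/*
 * Format the row name from a printf-style format and its arguments.
 * Returns the untruncated length as reported by vsnprintf(), or a
 * negative value on error.
 */
static int row_set_name(struct row *row, const char *format, ...)
{
	va_list ap;
	int len;

	va_start(ap, format);
	len = vsnprintf(row->name, sizeof(row->name), format, ap);
	va_end(ap);

	return len;
}
```

Callers could then write `row_set_name(&r, "* %s", _("Tags"))` today and
add further directives later without changing the signature.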
> [snip]
> > +static void stats_table_print(struct stats_table *table)
>
> Nit: The table can be marked as `const` as we don't modify it.
Will update.
> > +{
> > + const char *name_col_title = _("Repository stats");
> > + const char *value_col_title = _("Value");
> > + size_t name_title_len = strlen(name_col_title);
> > + size_t value_title_len = strlen(value_col_title);
> > + struct strbuf buf = STRBUF_INIT;
> > + struct string_list_item *item;
> > + int name_col_width;
> > + int value_col_width;
> > +
> > + name_col_width = cast_size_t_to_int(
> > + max_size_t(table->name_col_width, name_title_len));
> > + value_col_width = cast_size_t_to_int(
> > + max_size_t(table->value_col_width, value_title_len));
> > +
> > + strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
> > + value_col_width, value_col_title);
> > + strbuf_addstr(&buf, "| ");
> > + strbuf_addchars(&buf, '-', name_col_width);
> > + strbuf_addstr(&buf, " | ");
> > + strbuf_addchars(&buf, '-', value_col_width);
> > + strbuf_addstr(&buf, " |\n");
> > +
> > + for_each_string_list_item(item, &table->rows) {
> > + struct stats_table_entry *entry = item->util;
> > + const char *value = "";
> > +
> > + if (entry) {
> > + struct stats_table_entry *entry = item->util;
> > + value = entry->value;
> > + }
> > +
> > + strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
> > + item->string, value_col_width, value);
> > + }
> > +
> > + fputs(buf.buf, stdout);
> > + strbuf_release(&buf);
> > +}
>
> By the way, is there any specific reason we do the detour via the strbuf
> instead of printing the data to stdout directly?
Not really. I think it's just a holdover from a previous implementation
that was passing around a strbuf before printing. I'll go ahead and just
print directly to stdout via printf in the next version.
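A minimal sketch of what printing a row directly could look like, assuming
the same "| name | value |" layout as the quoted hunk. The function name
print_table_row() and the FILE * parameter are illustrative assumptions;
the patch itself may be structured differently.

```c
#include <stdio.h>

/*
 * Print one table row straight to the given stream, with no intermediate
 * buffer. "%-*s" left-aligns the name within name_width columns and
 * "%*s" right-aligns the value within value_width columns.
 * Returns the number of bytes written, as fprintf() does.
 */
int print_table_row(FILE *out, int name_width, const char *name,
		    int value_width, const char *value)
{
	return fprintf(out, "| %-*s | %*s |\n", name_width, name,
		       value_width, value);
}
```

Since stdio already buffers stdout, going through a separate buffer first
gains little for output of this size.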
-Justin
* Re: [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats
2025-09-25 5:39 ` Patrick Steinhardt
@ 2025-09-25 13:16 ` Justin Tobler
2025-09-25 13:58 ` Patrick Steinhardt
0 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 13:16 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188
On 25/09/25 07:39AM, Patrick Steinhardt wrote:
> On Wed, Sep 24, 2025 at 04:24:25PM -0500, Justin Tobler wrote:
> > diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> > index 0b8d74ed3e..db21b75522 100644
> > --- a/Documentation/git-repo.adoc
> > +++ b/Documentation/git-repo.adoc
> > @@ -52,7 +52,26 @@ supported:
> > * Reachable object counts categorized by type
> >
> > +
> > -The table output format may change and is not intended for machine parsing.
> > +The output format can be chosen through the flag `--format`. Three formats are
> > +supported:
> > ++
> > +`table`:::
> > + Outputs repository stats in a human-friendly table and is used by
> > + default. This format may change and is not intended for machine
> > + parsing.
>
> Let's mention that this is the default format.
I did mention that it is "used by default", but I think the wording
could be clearer here. Will improve in the next version.
> > +`keyvalue`:::
> > + Each line of output contains a key-value pair for a repository stat.
> > + The '=' character is used to delimit between the key and the value.
> > + Values containing "unusual" characters are quoted as explained for the
> > + configuration variable `core.quotePath` (see linkgit:git-config[1]).
>
> In the current state there is never any quoting, so this statement here
> is a bit misleading. Should we maybe drop that part?
While there are currently no values in the output that would
require quoting, I'm inclined to leave this note in the documentation.
That way we set the expectation regarding how parsers should handle the
output from the start.
[snip]
> > +static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
> > + char value_delim)
> > +{
> > + struct strbuf buf = STRBUF_INIT;
> > +
> > + strbuf_addf(&buf, "references.branches.count%c%" PRIuMAX "%c",
> > + key_delim, (uintmax_t)stats->refs.branches, value_delim);
> > + strbuf_addf(&buf, "references.tags.count%c%" PRIuMAX "%c",
> > + key_delim, (uintmax_t)stats->refs.tags, value_delim);
> > + strbuf_addf(&buf, "references.remotes.count%c%" PRIuMAX "%c",
> > + key_delim, (uintmax_t)stats->refs.remotes, value_delim);
> > + strbuf_addf(&buf, "references.others.count%c%" PRIuMAX "%c",
> > + key_delim, (uintmax_t)stats->refs.others, value_delim);
> > +
> > + strbuf_addf(&buf, "objects.commits.count%c%" PRIuMAX "%c",
> > + key_delim, (uintmax_t)stats->objects.commits, value_delim);
> > + strbuf_addf(&buf, "objects.trees.count%c%" PRIuMAX "%c",
> > + key_delim, (uintmax_t)stats->objects.trees, value_delim);
> > + strbuf_addf(&buf, "objects.blobs.count%c%" PRIuMAX "%c",
> > + key_delim, (uintmax_t)stats->objects.blobs, value_delim);
> > + strbuf_addf(&buf, "objects.tags.count%c%" PRIuMAX "%c",
> > + key_delim, (uintmax_t)stats->objects.tags, value_delim);
> > +
> > + fwrite(buf.buf, sizeof(char), buf.len, stdout);
> > + strbuf_release(&buf);
> > +}
>
> Same question here regarding the buffering. Can't we print to stdout
> directly, or is there a reason not to?
Ya, we can just write to stdout directly. Will update.
> > @@ -389,17 +421,25 @@ static void stats_count_objects(struct object_stats *stats,
> > path_walk_info_clear(&info);
> > }
> >
> > -static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
> > - const char *prefix, struct repository *repo)
> > +static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
> > + struct repository *repo)
> > {
> > struct ref_filter filter = REF_FILTER_INIT;
> > struct stats_table table = {
> > .rows = STRING_LIST_INIT_DUP,
> > };
> > + enum output_format format = FORMAT_TABLE;
> > struct repo_stats stats = { 0 };
> > struct ref_array refs = { 0 };
> > struct rev_info revs;
> > + struct option options[] = {
> > + OPT_CALLBACK_F(0, "format", &format, N_("format"),
> > + N_("output format"),
> > + PARSE_OPT_NONEG, parse_format_cb),
> > + OPT_END()
> > + };
> >
> > + parse_options(argc, argv, prefix, options, repo_usage, 0);
>
> I think it would be sensible to introduce this call to `parse_options()`
> right in the first commit that wires up the new subcommand. If we don't
> do that we otherwise accept arbitrary arguments without raising any
> error, and neither do we know to output help.
>
> So we should move the addition to a previous commit and probably do the
> following:
>
> argc = parse_options(...);
> if (argc)
> usagef("too many arguments");
Good point. I'll add this in the next version.
> > @@ -407,8 +447,20 @@ static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
> > stats_count_references(&stats.refs, &refs);
> > stats_count_objects(&stats.objects, &refs, &revs);
> >
> > - stats_table_setup(&table, &stats);
> > - stats_table_print(&table);
> > + switch (format) {
> > + case FORMAT_TABLE:
> > + stats_table_setup(&table, &stats);
> > + stats_table_print(&table);
> > + break;
> > + case FORMAT_KEYVALUE:
> > + stats_keyvalue_print(&stats, '=', '\n');
> > + break;
> > + case FORMAT_NUL_TERMINATED:
> > + stats_keyvalue_print(&stats, '\n', '\0');
> > + break;
>
> This reads much nicer now. The newline as key-value delimiter is a
> curious choice, but you simply do what we already do in `git repo info`.
I agree that the key-value delimiter chosen is a bit strange. The
command is still experimental so we could maybe change it if we want. Not
sure if it would be worth it though.
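To make the delimiter choice concrete from a consumer's point of view, here
is a hedged sketch of a lookup over nul-format output, assuming each record
has the form "key\nvalue\0" as in the switch statement quoted above. The
function nul_format_lookup() is hypothetical and only meant to show how a
parser would split such records.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up `key` in a buffer of "key\nvalue\0" records. Returns a pointer
 * to the NUL-terminated value inside `buf`, or NULL when the key is
 * absent or a record is malformed. `len` is the number of bytes in `buf`.
 */
const char *nul_format_lookup(const char *buf, size_t len, const char *key)
{
	size_t keylen = strlen(key);
	size_t pos = 0;

	while (pos < len) {
		const char *record = buf + pos;
		const char *end = memchr(record, '\0', len - pos);
		const char *sep;

		if (!end)
			return NULL; /* unterminated trailing record */
		sep = memchr(record, '\n', (size_t)(end - record));
		if (!sep)
			return NULL; /* record without a key delimiter */
		if ((size_t)(sep - record) == keylen &&
		    !memcmp(record, key, keylen))
			return sep + 1;
		pos += (size_t)(end - record) + 1;
	}
	return NULL;
}
```

Because the first newline ends the key and the NUL ends the record, a value
could itself contain '=' or newlines without confusing the parser.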
> > diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
> > index 315b9e1767..d2c1b6e307 100755
> > --- a/t/t1901-repo-stats.sh
> > +++ b/t/t1901-repo-stats.sh
> > @@ -73,4 +73,37 @@ test_expect_success 'repository with references and objects' '
> > )
> > '
> >
> > +test_expect_success 'repository stats with keyvalue and nul format' '
> > + test_when_finished "rm -rf repo" &&
> > + git init repo &&
> > + (
> > + cd repo &&
> > + test_commit_bulk 42 &&
> > + git tag -a foo -m bar &&
> > +
> > + cat >expect <<-\EOF &&
> > + references.branches.count=1
> > + references.tags.count=1
> > + references.remotes.count=0
> > + references.others.count=0
> > + objects.commits.count=42
> > + objects.trees.count=42
> > + objects.blobs.count=42
> > + objects.tags.count=1
> > + EOF
> > +
> > + git repo stats --format=keyvalue >out 2>err &&
> > +
> > + test_cmp expect out &&
> > + test_line_count = 0 err &&
> > +
> > + # Replace key and value delimiters for nul format.
> > + tr "\n" "\0" <expect | tr "=" "\n" >expect_null &&
>
> You can do this without the pipe:
>
> tr "\n=" "\0\n" <expect >expect_nul
>
> Also, let's call the file `expect_nul` (with a single 'l') to match the
> format.
Thanks. Will update.
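For anyone wondering why the single tr invocation is equivalent to the
pipe: tr maps each input byte exactly once, so a newline produced from '='
is never translated again. A small C sketch of the same byte mapping (the
function name keyvalue_to_nul() is made up for illustration, and it assumes
values never contain '=' themselves):

```c
#include <stddef.h>

/*
 * In-place equivalent of `tr "\n=" "\0\n"`: every byte is examined once,
 * so a '\n' that was just written in place of '=' is not turned into NUL.
 */
void keyvalue_to_nul(char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '\n')
			buf[i] = '\0';
		else if (buf[i] == '=')
			buf[i] = '\n';
	}
}
```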
-Justin
* Re: [PATCH v2 6/6] builtin/repo: add progress meter for stats
2025-09-25 5:39 ` Patrick Steinhardt
@ 2025-09-25 13:20 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 13:20 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188
On 25/09/25 07:39AM, Patrick Steinhardt wrote:
> On Wed, Sep 24, 2025 at 04:24:26PM -0500, Justin Tobler wrote:
> > diff --git a/builtin/repo.c b/builtin/repo.c
> > index fe7d43f78e..fdc8af92dc 100644
> > --- a/builtin/repo.c
> > +++ b/builtin/repo.c
> > @@ -344,8 +345,14 @@ static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
> > strbuf_release(&buf);
> > }
> >
> > -static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
> > +static void stats_count_references(struct ref_stats *stats, struct ref_array *refs,
> > + struct repository *repo, int show_progress)
> > {
> > + struct progress *progress = NULL;
> > +
> > + if (show_progress)
> > + progress = start_progress(repo, _("Counting references"), refs->nr);
>
> We tend to use `start_delayed_progress()` so that the progress meter is
> not displayed when the action takes less than a second. The delay can be
> disabled in our tests by using `GIT_PROGRESS_DELAY=0`.
Good to know. I'll update in the next version.
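The idea behind a delayed meter can be modelled in a few lines. This is a
hypothetical toy model and not how Git's progress code is implemented: the
struct delayed_meter and delayed_meter_should_display() names are invented
here, and timestamps are passed in explicitly to keep the example
deterministic.

```c
/*
 * Toy model of a delayed progress meter: nothing is displayed until
 * delay_ms milliseconds have passed since start_ms. A delay of 0 makes
 * the meter visible immediately, which is what tests typically want.
 */
struct delayed_meter {
	unsigned long start_ms;
	unsigned long delay_ms;
};

int delayed_meter_should_display(const struct delayed_meter *meter,
				 unsigned long now_ms)
{
	return now_ms - meter->start_ms >= meter->delay_ms;
}
```

An operation that finishes before the delay elapses therefore never prints
anything, which keeps short runs quiet.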
[snip]
> > + object_count = stats->tags + stats->commits + stats->trees + stats->blobs;
>
> We have this computation in two locations now. Maybe we should
> deduplicate it via something like:
>
> static inline size_t stats_get_total_object_count(const struct object_stats *stats)
> {
> return stats->tags + stats->commits + stats->trees + stats->blobs;
> }
Makes sense. Will add.
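A self-contained sketch of such a helper, assuming a simplified stand-in
for struct object_stats with only the four counters used in the quoted
expression (the real struct in builtin/repo.c may carry more fields):

```c
#include <stddef.h>

/* Simplified stand-in for the series' per-type object counters. */
struct object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/*
 * Single place to compute the total, so the table output and the
 * progress meter cannot drift apart when a new object type is added.
 */
size_t get_total_object_count(const struct object_stats *stats)
{
	return stats->tags + stats->commits + stats->trees + stats->blobs;
}
```

With the counts from the keyvalue test (42 commits, 42 trees, 42 blobs and
1 tag) the helper returns 127.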
>
> > @@ -417,8 +442,12 @@ static void stats_count_objects(struct object_stats *stats,
> > }
> > }
> >
> > + if (show_progress)
> > + data.progress = start_progress(repo, _("Counting Objects"), 0);
>
> s/Objects/objects/
Will fix in the next version.
[snip]
> > + stats_count_references(&stats.refs, &refs, repo, show_progress);
> > + stats_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
> >
> > switch (format) {
> > case FORMAT_TABLE:
>
> Should our tests be updated to verify that we know to print progress
> depending on whether or not `--progress` is passed?
That seems reasonable to me. I'll add a test for this in the next
version.
-Justin
* Re: [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats
2025-09-25 13:16 ` Justin Tobler
@ 2025-09-25 13:58 ` Patrick Steinhardt
0 siblings, 0 replies; 92+ messages in thread
From: Patrick Steinhardt @ 2025-09-25 13:58 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188
On Thu, Sep 25, 2025 at 08:16:06AM -0500, Justin Tobler wrote:
> On 25/09/25 07:39AM, Patrick Steinhardt wrote:
> > On Wed, Sep 24, 2025 at 04:24:25PM -0500, Justin Tobler wrote:
> > > diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> > > index 0b8d74ed3e..db21b75522 100644
> > > --- a/Documentation/git-repo.adoc
> > > +++ b/Documentation/git-repo.adoc
> > > @@ -52,7 +52,26 @@ supported:
> > > * Reachable object counts categorized by type
> > >
> > > +
> > > -The table output format may change and is not intended for machine parsing.
> > > +The output format can be chosen through the flag `--format`. Three formats are
> > > +supported:
> > > ++
> > > +`table`:::
> > > + Outputs repository stats in a human-friendly table and is used by
> > > + default. This format may change and is not intended for machine
> > > + parsing.
> >
> > Let's mention that this is the default format.
>
> I did mention that it is "used by default", but I think the wording
> could be clearer here. Will improve in the next version.
Oh, I completely missed this. I guess I was looking for a sentence like
"This is the default." at the end of this paragraph, which is what we
often use in other parts.
> > > +`keyvalue`:::
> > > + Each line of output contains a key-value pair for a repository stat.
> > > + The '=' character is used to delimit between the key and the value.
> > > + Values containing "unusual" characters are quoted as explained for the
> > > + configuration variable `core.quotePath` (see linkgit:git-config[1]).
> >
> > In the current state there is never any quoting, so this statement here
> > is a bit misleading. Should we maybe drop that part?
>
> While there are currently not any values in the output that would
> require quoting, I'm inclined to leave this note in the documentation.
> That way we set the expectation regarding how parsers should handle the
> output from the start.
Fair enough.
> > > @@ -407,8 +447,20 @@ static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
> > > stats_count_references(&stats.refs, &refs);
> > > stats_count_objects(&stats.objects, &refs, &revs);
> > >
> > > - stats_table_setup(&table, &stats);
> > > - stats_table_print(&table);
> > > + switch (format) {
> > > + case FORMAT_TABLE:
> > > + stats_table_setup(&table, &stats);
> > > + stats_table_print(&table);
> > > + break;
> > > + case FORMAT_KEYVALUE:
> > > + stats_keyvalue_print(&stats, '=', '\n');
> > > + break;
> > > + case FORMAT_NUL_TERMINATED:
> > > + stats_keyvalue_print(&stats, '\n', '\0');
> > > + break;
> >
> > This reads much nicer now. The newline as key-value delimiter is a
> > curious choice, but you simply do what we already do in `git repo info`.
>
> I agree that the key-value delimiter chosen is a bit strange. The
> command is still experimental so we could maybe change it if we want. Not
> sure if it would be worth it though.
Dunno. If we wanted to do it I'd do it in a follow-up patch series
anyway.
Patrick
* [PATCH v3 0/7] builtin/repo: introduce stats subcommand
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
` (5 preceding siblings ...)
2025-09-24 21:24 ` [PATCH v2 6/6] builtin/repo: add progress meter " Justin Tobler
@ 2025-09-25 23:29 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
` (7 more replies)
6 siblings, 8 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 23:29 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
Greetings,
The shape of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface key stats/information regarding the shape of a
repository via a single command. Acquiring this information requires
users to be fairly knowledgeable about the structure of a Git repository
and how to identify the relevant data points. To fill this gap,
supplemental tools such as git-sizer(1) have been developed.
To allow users to more readily identify potential issues for a
repository, introduce the "stats" subcommand in git-repo(1) to output
stats for the repository that may be of interest to users. The goal of
this subcommand is to eventually provide similar functionality to
git-sizer(1), but in Git natively.
In this initial version, the "stats" subcommand only surfaces counts of
the various reference and object types in a repository. In a follow-up
series, I would like to introduce additional data points that are
present in git-sizer(1) such as largest objects, combined object sizes
by type, and other general repository shape information.
Some other general features that would be nice to introduce eventually:
- A "level of concern" meter for reported stats. This could indicate to
users which stats may be worth looking into further.
- Links to OIDs of interesting objects that correspond to certain stats.
- Options to limit which references to use when evaluating the
repository.
Changes since V2:
- Added clang-format patch to address a false positive triggered in this
series.
- Use varargs for stats_table_add() family of functions.
- Print to stdout directly instead of using strbuf.
- Add parse_options() call earlier in the series.
- Use start_delayed_progress() instead of start_progress().
- Add test to validate --[no-]progress options.
- Some other small fixes.
Changes since V1:
- Translatable terms displayed in the table have formatting separated
out.
- Squashed the `keyvalue` and `nul` output format patches into one.
- Added a progress meter to provide users with more feedback.
- Updated docs to outline reported data in a bulleted list.
- Combined similar tests together to reduce repetitive setup.
- Added patch to improve ref-filter interface so we don't have to create
a dummy patterns array.
- Many other renames and cleanups to improve patch clarity.
Thanks,
-Justin
Justin Tobler (7):
builtin/repo: rename repo_info() to cmd_repo_info()
ref-filter: allow NULL filter pattern
clang-format: exclude control macros from SpaceBeforeParens
builtin/repo: introduce stats subcommand
builtin/repo: add object counts in stats output
builtin/repo: add keyvalue and nul format for stats
builtin/repo: add progress meter for stats
.clang-format | 2 +-
Documentation/git-repo.adoc | 30 +++
builtin/repo.c | 372 +++++++++++++++++++++++++++++++++++-
ref-filter.c | 4 +-
t/meson.build | 1 +
t/t1901-repo-stats.sh | 129 +++++++++++++
6 files changed, 532 insertions(+), 6 deletions(-)
create mode 100755 t/t1901-repo-stats.sh
Range-diff against v2:
1: ed04168562 = 1: ed04168562 builtin/repo: rename repo_info() to cmd_repo_info()
2: 6aa76d1323 = 2: 6aa76d1323 ref-filter: allow NULL filter pattern
-: ---------- > 3: 02a3fcc5fb clang-format: exclude control macros from SpaceBeforeParens
3: dc06ca98fb ! 4: 12cfbdc464 builtin/repo: introduce stats subcommand
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ char *value;
+};
+
-+static void stats_table_add(struct stats_table *table, const char *format,
-+ const char *name, struct stats_table_entry *entry)
++static void stats_table_vaddf(struct stats_table *table,
++ struct stats_table_entry *entry,
++ const char *format, va_list ap)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ char *formatted_name;
+ size_t name_width;
+
-+ strbuf_addf(&buf, format, name);
++ strbuf_vaddf(&buf, format, ap);
+ formatted_name = strbuf_detach(&buf, &name_width);
+
+ item = string_list_append_nodup(&table->rows, formatted_name);
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ }
+}
+
-+static void stats_table_add_count(struct stats_table *table, const char *format,
-+ const char *name, size_t value)
++static void stats_table_addf(struct stats_table *table, const char *format, ...)
++{
++ va_list ap;
++
++ va_start(ap, format);
++ stats_table_vaddf(table, NULL, format, ap);
++ va_end(ap);
++}
++
++static void stats_table_count_addf(struct stats_table *table, size_t value,
++ const char *format, ...)
+{
+ struct stats_table_entry *entry;
++ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+ entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
-+ stats_table_add(table, format, name, entry);
++
++ va_start(ap, format);
++ stats_table_vaddf(table, entry, format, ap);
++ va_end(ap);
+}
+
+static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ size_t ref_total;
+
+ ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
-+ stats_table_add(table, "* %s", _("References"), NULL);
-+ stats_table_add_count(table, " * %s", _("Count"), ref_total);
-+ stats_table_add_count(table, " * %s", _("Branches"), refs->branches);
-+ stats_table_add_count(table, " * %s", _("Tags"), refs->tags);
-+ stats_table_add_count(table, " * %s", _("Remotes"), refs->remotes);
-+ stats_table_add_count(table, " * %s", _("Others"), refs->others);
++ stats_table_addf(table, "* %s", _("References"));
++ stats_table_count_addf(table, ref_total, " * %s", _("Count"));
++ stats_table_count_addf(table, refs->branches, " * %s", _("Branches"));
++ stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
++ stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
++ stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+}
+
+static inline size_t max_size_t(size_t a, size_t b)
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ return (a > b) ? a : b;
+}
+
-+static void stats_table_print(struct stats_table *table)
++static void stats_table_print(const struct stats_table *table)
+{
+ const char *name_col_title = _("Repository stats");
+ const char *value_col_title = _("Value");
+ size_t name_title_len = strlen(name_col_title);
+ size_t value_title_len = strlen(value_col_title);
-+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ int name_col_width;
+ int value_col_width;
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ value_col_width = cast_size_t_to_int(
+ max_size_t(table->value_col_width, value_title_len));
+
-+ strbuf_addf(&buf, "| %-*s | %-*s |\n", name_col_width, name_col_title,
-+ value_col_width, value_col_title);
-+ strbuf_addstr(&buf, "| ");
-+ strbuf_addchars(&buf, '-', name_col_width);
-+ strbuf_addstr(&buf, " | ");
-+ strbuf_addchars(&buf, '-', value_col_width);
-+ strbuf_addstr(&buf, " |\n");
++ printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
++ value_col_width, value_col_title);
++ printf("| ");
++ for (int i = 0; i < name_col_width; i++)
++ putchar('-');
++ printf(" | ");
++ for (int i = 0; i < value_col_width; i++)
++ putchar('-');
++ printf(" |\n");
+
+ for_each_string_list_item(item, &table->rows) {
+ struct stats_table_entry *entry = item->util;
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ value = entry->value;
+ }
+
-+ strbuf_addf(&buf, "| %-*s | %*s |\n", name_col_width,
-+ item->string, value_col_width, value);
++ printf("| %-*s | %*s |\n", name_col_width, item->string,
++ value_col_width, value);
+ }
-+
-+ fputs(buf.buf, stdout);
-+ strbuf_release(&buf);
+}
+
+static void stats_table_clear(struct stats_table *table)
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ }
+}
+
-+static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
-+ const char *prefix UNUSED, struct repository *repo UNUSED)
++static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
++ struct repository *repo UNUSED)
+{
+ struct ref_filter filter = REF_FILTER_INIT;
+ struct stats_table table = {
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ };
+ struct ref_stats stats = { 0 };
+ struct ref_array refs = { 0 };
++ struct option options[] = { 0 };
++
++ argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
++ if (argc)
++ usage(_("too many arguments"));
+
+ if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
+ die(_("unable to filter refs"));
4: 2832670e82 ! 5: ab27340d58 builtin/repo: add object counts in stats output
@@ builtin/repo.c: struct ref_stats {
struct stats_table {
struct string_list rows;
-@@ builtin/repo.c: static void stats_table_add_count(struct stats_table *table, const char *format,
- stats_table_add(table, format, name, entry);
+@@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, size_t value,
+ va_end(ap);
}
-static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
-+static void stats_table_setup(struct stats_table *table, struct repo_stats *stats)
++static inline size_t get_total_object_count(struct object_stats *stats)
{
++ return stats->tags + stats->commits + stats->trees + stats->blobs;
++}
++
++static void stats_table_setup(struct stats_table *table, struct repo_stats *stats)
++{
+ struct object_stats *objects = &stats->objects;
+ struct ref_stats *refs = &stats->refs;
+ size_t object_total;
@@ builtin/repo.c: static void stats_table_add_count(struct stats_table *table, con
ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
@@ builtin/repo.c: static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
- stats_table_add_count(table, " * %s", _("Tags"), refs->tags);
- stats_table_add_count(table, " * %s", _("Remotes"), refs->remotes);
- stats_table_add_count(table, " * %s", _("Others"), refs->others);
+ stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
+ stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+
-+ object_total = objects->commits + objects->trees + objects->blobs + objects->tags;
-+ stats_table_add(table, "%s", "", NULL);
-+ stats_table_add(table, "* %s", _("Reachable objects"), NULL);
-+ stats_table_add_count(table, " * %s", _("Count"), object_total);
-+ stats_table_add_count(table, " * %s", _("Commits"), objects->commits);
-+ stats_table_add_count(table, " * %s", _("Trees"), objects->trees);
-+ stats_table_add_count(table, " * %s", _("Blobs"), objects->blobs);
-+ stats_table_add_count(table, " * %s", _("Tags"), objects->tags);
++ object_total = get_total_object_count(objects);
++ stats_table_addf(table, "");
++ stats_table_addf(table, "* %s", _("Reachable objects"));
++ stats_table_count_addf(table, object_total, " * %s", _("Count"));
++ stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
++ stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
++ stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
++ stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
}
static inline size_t max_size_t(size_t a, size_t b)
@@ builtin/repo.c: static void stats_count_references(struct ref_stats *stats, stru
+ path_walk_info_clear(&info);
+}
+
- static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
-- const char *prefix UNUSED, struct repository *repo UNUSED)
-+ const char *prefix, struct repository *repo)
+ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
+- struct repository *repo UNUSED)
++ struct repository *repo)
{
struct ref_filter filter = REF_FILTER_INIT;
struct stats_table table = {
@@ builtin/repo.c: static void stats_count_references(struct ref_stats *stats, stru
+ struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
+ struct rev_info revs;
+ struct option options[] = { 0 };
+
+ argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (argc)
+ usage(_("too many arguments"));
+ repo_init_revisions(repo, &revs, prefix);
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
5: 5d1adf7905 ! 6: f69110224d builtin/repo: add keyvalue and nul format for stats
@@ Documentation/git-repo.adoc: supported:
+supported:
++
+`table`:::
-+ Outputs repository stats in a human-friendly table and is used by
-+ default. This format may change and is not intended for machine
-+ parsing.
++ Outputs repository stats in a human-friendly table. This format may
++ change and is not intended for machine parsing. This is the default
++ format.
+
+`keyvalue`:::
+ Each line of output contains a key-value pair for a repository stat.
@@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
+static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
+ char value_delim)
+{
-+ struct strbuf buf = STRBUF_INIT;
++ printf("references.branches.count%c%" PRIuMAX "%c", key_delim,
++ (uintmax_t)stats->refs.branches, value_delim);
++ printf("references.tags.count%c%" PRIuMAX "%c", key_delim,
++ (uintmax_t)stats->refs.tags, value_delim);
++ printf("references.remotes.count%c%" PRIuMAX "%c", key_delim,
++ (uintmax_t)stats->refs.remotes, value_delim);
++ printf("references.others.count%c%" PRIuMAX "%c", key_delim,
++ (uintmax_t)stats->refs.others, value_delim);
+
-+ strbuf_addf(&buf, "references.branches.count%c%" PRIuMAX "%c",
-+ key_delim, (uintmax_t)stats->refs.branches, value_delim);
-+ strbuf_addf(&buf, "references.tags.count%c%" PRIuMAX "%c",
-+ key_delim, (uintmax_t)stats->refs.tags, value_delim);
-+ strbuf_addf(&buf, "references.remotes.count%c%" PRIuMAX "%c",
-+ key_delim, (uintmax_t)stats->refs.remotes, value_delim);
-+ strbuf_addf(&buf, "references.others.count%c%" PRIuMAX "%c",
-+ key_delim, (uintmax_t)stats->refs.others, value_delim);
++ printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
++ (uintmax_t)stats->objects.commits, value_delim);
++ printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
++ (uintmax_t)stats->objects.trees, value_delim);
++ printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
++ (uintmax_t)stats->objects.blobs, value_delim);
++ printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
++ (uintmax_t)stats->objects.tags, value_delim);
+
-+ strbuf_addf(&buf, "objects.commits.count%c%" PRIuMAX "%c",
-+ key_delim, (uintmax_t)stats->objects.commits, value_delim);
-+ strbuf_addf(&buf, "objects.trees.count%c%" PRIuMAX "%c",
-+ key_delim, (uintmax_t)stats->objects.trees, value_delim);
-+ strbuf_addf(&buf, "objects.blobs.count%c%" PRIuMAX "%c",
-+ key_delim, (uintmax_t)stats->objects.blobs, value_delim);
-+ strbuf_addf(&buf, "objects.tags.count%c%" PRIuMAX "%c",
-+ key_delim, (uintmax_t)stats->objects.tags, value_delim);
-+
-+ fwrite(buf.buf, sizeof(char), buf.len, stdout);
-+ strbuf_release(&buf);
++ fflush(stdout);
+}
+
static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
{
for (int i = 0; i < refs->nr; i++) {
-@@ builtin/repo.c: static void stats_count_objects(struct object_stats *stats,
- path_walk_info_clear(&info);
- }
-
--static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
-- const char *prefix, struct repository *repo)
-+static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
-+ struct repository *repo)
- {
- struct ref_filter filter = REF_FILTER_INIT;
+@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
@@ builtin/repo.c: static void stats_count_objects(struct object_stats *stats,
struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
+- struct option options[] = { 0 };
+ struct option options[] = {
+ OPT_CALLBACK_F(0, "format", &format, N_("format"),
+ N_("output format"),
@@ builtin/repo.c: static void stats_count_objects(struct object_stats *stats,
+ OPT_END()
+ };
-+ parse_options(argc, argv, prefix, options, repo_usage, 0);
- repo_init_revisions(repo, &revs, prefix);
- if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
- die(_("unable to filter refs"));
-@@ builtin/repo.c: static int cmd_repo_stats(int argc UNUSED, const char **argv UNUSED,
+ argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (argc)
+@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
stats_count_references(&stats.refs, &refs);
stats_count_objects(&stats.objects, &refs, &revs);
@@ t/t1901-repo-stats.sh: test_expect_success 'repository with references and objec
)
'
-+test_expect_success 'repository stats with keyvalue and nul format' '
++test_expect_success 'keyvalue and nul format' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
@@ t/t1901-repo-stats.sh: test_expect_success 'repository with references and objec
+ test_line_count = 0 err &&
+
+ # Replace key and value delimiters for nul format.
-+ tr "\n" "\0" <expect | tr "=" "\n" >expect_null &&
++ tr "\n=" "\0\n" <expect >expect_nul &&
+ git repo stats --format=nul >out 2>err &&
+
-+ test_cmp expect_null out &&
++ test_cmp expect_nul out &&
+ test_line_count = 0 err
+ )
+'
6: 2a745c8417 ! 7: cff5e183bb builtin/repo: add progress meter for stats
@@ builtin/repo.c
#include "ref-filter.h"
#include "refs.h"
@@ builtin/repo.c: static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
- strbuf_release(&buf);
+ fflush(stdout);
}
-static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
@@ builtin/repo.c: static void stats_keyvalue_print(struct repo_stats *stats, char
+ struct progress *progress = NULL;
+
+ if (show_progress)
-+ progress = start_progress(repo, _("Counting references"), refs->nr);
++ progress = start_delayed_progress(repo, _("Counting references"),
++ refs->nr);
+
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
BUG("invalid object type");
}
-+ object_count = stats->tags + stats->commits + stats->trees + stats->blobs;
++ object_count = get_total_object_count(stats);
+ display_progress(data->progress, object_count);
+
return 0;
@@ builtin/repo.c: static void stats_count_objects(struct object_stats *stats,
}
+ if (show_progress)
-+ data.progress = start_progress(repo, _("Counting Objects"), 0);
++ data.progress = start_delayed_progress(repo, _("Counting objects"), 0);
+
walk_objects_by_path(&info);
path_walk_info_clear(&info);
@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const cha
switch (format) {
case FORMAT_TABLE:
+
+ ## t/t1901-repo-stats.sh ##
+@@ t/t1901-repo-stats.sh: test_expect_success 'keyvalue and nul format' '
+ )
+ '
+
++test_expect_success 'progress meter option' '
++ test_when_finished "rm -rf repo" &&
++ git init repo &&
++ (
++ cd repo &&
++ test_commit foo &&
++
++ GIT_PROGRESS_DELAY=0 git repo stats --progress >out 2>err &&
++
++ test_file_not_empty out &&
++ test_grep "Counting references: 100% (2/2), done." err &&
++ test_grep "Counting objects: 3, done." err &&
++
++ GIT_PROGRESS_DELAY=0 git repo stats --no-progress >out 2>err &&
++
++ test_file_not_empty out &&
++ test_line_count = 0 err
++ )
++'
++
+ test_done
base-commit: ca2559c1d630eb4f04cdee2328aaf1c768907a9e
--
2.51.0.193.g4975ec3473b
* [PATCH v3 1/7] builtin/repo: rename repo_info() to cmd_repo_info()
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-25 23:29 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 2/7] ref-filter: allow NULL filter pattern Justin Tobler
` (6 subsequent siblings)
7 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 23:29 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
Subcommand functions are often prefixed with `cmd_` to denote that they
are an entrypoint. Rename repo_info() to cmd_repo_info() accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index bbb0966f2d..eeeab8fbd2 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -136,8 +136,8 @@ static int parse_format_cb(const struct option *opt,
return 0;
}
-static int repo_info(int argc, const char **argv, const char *prefix,
- struct repository *repo)
+static int cmd_repo_info(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
{
enum output_format format = FORMAT_KEYVALUE;
struct option options[] = {
@@ -161,7 +161,7 @@ int cmd_repo(int argc, const char **argv, const char *prefix,
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
- OPT_SUBCOMMAND("info", &fn, repo_info),
+ OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
OPT_END()
};
--
2.51.0.193.g4975ec3473b
* [PATCH v3 2/7] ref-filter: allow NULL filter pattern
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 23:29 ` [PATCH v3 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
@ 2025-09-25 23:29 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
` (5 subsequent siblings)
7 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 23:29 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
When setting up `struct ref_filter` for filter_refs(), the
`name_patterns` field must point to an array of pattern strings even if
no patterns are required. To improve this interface, treat a NULL
`name_patterns` field the same as when it points to an empty array.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
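For illustration only, the check described above can be sketched outside of
Git as follows. `struct pattern_filter` and has_no_patterns() are hypothetical
stand-ins and not part of ref-filter; only the NULL-or-empty condition mirrors
this patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of `struct ref_filter`. */
struct pattern_filter {
	const char **name_patterns; /* NULL, or a NULL-terminated array */
};

/*
 * Return 1 when the filter carries no patterns, either because the
 * field is NULL or because it points to an empty array.
 */
static int has_no_patterns(const struct pattern_filter *filter)
{
	return !filter->name_patterns || !filter->name_patterns[0];
}
```

Checking the pointer before dereferencing it is what lets a caller leave the
field unset without first allocating an empty array.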
ref-filter.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ref-filter.c b/ref-filter.c
index 520d2539c9..2cb5a166d6 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2664,7 +2664,7 @@ static int match_name_as_path(const char **pattern, const char *refname,
/* Return 1 if the refname matches one of the patterns, otherwise 0. */
static int filter_pattern_match(struct ref_filter *filter, const char *refname)
{
- if (!*filter->name_patterns)
+ if (!filter->name_patterns || !*filter->name_patterns)
return 1; /* No pattern always matches */
if (filter->match_as_path)
return match_name_as_path(filter->name_patterns, refname,
@@ -2751,7 +2751,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
- if (!filter->name_patterns[0]) {
+ if (!filter->name_patterns || !filter->name_patterns[0]) {
/* no patterns; we have to look at everything */
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
--
2.51.0.193.g4975ec3473b
* [PATCH v3 3/7] clang-format: exclude control macros from SpaceBeforeParens
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 23:29 ` [PATCH v3 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-25 23:29 ` [PATCH v3 2/7] ref-filter: allow NULL filter pattern Justin Tobler
@ 2025-09-25 23:29 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 4/7] builtin/repo: introduce stats subcommand Justin Tobler
` (4 subsequent siblings)
7 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 23:29 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
The formatter currently suggests adding a space between a control macro
and parentheses. In the Git project, this is not typically expected. Set
`SpaceBeforeParens` to `ControlStatementsExceptControlMacros`
accordingly.
Helped-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
.clang-format | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.clang-format b/.clang-format
index dcfd0aad60..86b4fe33e5 100644
--- a/.clang-format
+++ b/.clang-format
@@ -149,7 +149,7 @@ SpaceBeforeCaseColon: false
# f();
# }
# }
-SpaceBeforeParens: ControlStatements
+SpaceBeforeParens: ControlStatementsExceptControlMacros
# Don't insert spaces inside empty '()'
SpaceInEmptyParentheses: false
--
2.51.0.193.g4975ec3473b
* [PATCH v3 4/7] builtin/repo: introduce stats subcommand
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (2 preceding siblings ...)
2025-09-25 23:29 ` [PATCH v3 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
@ 2025-09-25 23:29 ` Justin Tobler
2025-09-25 23:51 ` Eric Sunshine
2025-09-25 23:29 ` [PATCH v3 5/7] builtin/repo: add object counts in stats output Justin Tobler
` (3 subsequent siblings)
7 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 23:29 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler, Derrick Stolee
The shape of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface key stats/information regarding the shape of a
repository via a single command. Acquiring this information requires
users to be fairly knowledgeable about the structure of a Git repository
and how to identify the relevant data points. To fill this gap,
supplemental tools such as git-sizer(1) have been developed.
To allow users to more readily identify potential issues for a
repository, introduce the "stats" subcommand in git-repo(1) to output
stats for the repository that may be of interest to users. The goal of
this subcommand is to eventually provide similar functionality to
git-sizer(1), but natively in Git.
The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remote refs,
and other reference types. The corresponding information is displayed in
a human-friendly table formatted in a very similar manner to
git-sizer(1). The width of each table column is adjusted automatically
to satisfy the requirements of the widest row contained.
Subsequent commits will surface additional relevant data points to
output.
Based-on-patch-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
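As a rough sketch of the column sizing described above, assuming a simplified
row type: `struct row`, name_column_width() and format_row() are hypothetical
names used only for this note, whereas the patch itself stores rows in a
string_list and tracks the widths while rows are added.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical row type; the patch stores rows in a string_list. */
struct row {
	const char *name;
	const char *value; /* NULL for heading rows */
};

/* Width of the widest string in the name column, including the title. */
static size_t name_column_width(const char *title, const struct row *rows,
				size_t nr)
{
	size_t width = strlen(title);

	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(rows[i].name);
		if (len > width)
			width = len;
	}
	return width;
}

/* Format one row: names are left-aligned, values are right-aligned. */
static int format_row(char *buf, size_t size, const struct row *row,
		      int name_width, int value_width)
{
	return snprintf(buf, size, "| %-*s | %*s |", name_width, row->name,
			value_width, row->value ? row->value : "");
}
```

The `%-*s` and `%*s` conversions take the width as an int argument, which is
why the patch converts the size_t widths with cast_size_t_to_int() before
printing.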
Documentation/git-repo.adoc | 10 ++
builtin/repo.c | 196 ++++++++++++++++++++++++++++++++++++
t/meson.build | 1 +
t/t1901-repo-stats.sh | 61 +++++++++++
4 files changed, 268 insertions(+)
create mode 100755 t/t1901-repo-stats.sh
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 209afd1b61..a009bf8cf1 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,6 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
+git repo stats
DESCRIPTION
-----------
@@ -43,6 +44,15 @@ supported:
+
`-z` is an alias for `--format=nul`.
+`stats`::
+ Retrieve statistics about the current repository. The following kinds
+ of information are reported:
++
+* Reference counts categorized by type
+
++
+The table output format may change and is not intended for machine parsing.
+
INFO KEYS
---------
In order to obtain a set of values from `git repo info`, you should provide
diff --git a/builtin/repo.c b/builtin/repo.c
index eeeab8fbd2..0b7dd636e5 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,12 +4,15 @@
#include "environment.h"
#include "parse-options.h"
#include "quote.h"
+#include "ref-filter.h"
#include "refs.h"
#include "strbuf.h"
+#include "string-list.h"
#include "shallow.h"
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
+ "git repo stats",
NULL
};
@@ -156,12 +159,205 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
return print_fields(argc, argv, repo, format);
}
+struct ref_stats {
+ size_t branches;
+ size_t remotes;
+ size_t tags;
+ size_t others;
+};
+
+struct stats_table {
+ struct string_list rows;
+
+ size_t name_col_width;
+ size_t value_col_width;
+};
+
+/*
+ * Holds column data that gets stored for each row.
+ */
+struct stats_table_entry {
+ char *value;
+};
+
+static void stats_table_vaddf(struct stats_table *table,
+ struct stats_table_entry *entry,
+ const char *format, va_list ap)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ char *formatted_name;
+ size_t name_width;
+
+ strbuf_vaddf(&buf, format, ap);
+ formatted_name = strbuf_detach(&buf, &name_width);
+
+ item = string_list_append_nodup(&table->rows, formatted_name);
+ item->util = entry;
+
+ if (name_width > table->name_col_width)
+ table->name_col_width = name_width;
+ if (entry) {
+ size_t value_width = strlen(entry->value);
+ if (value_width > table->value_col_width)
+ table->value_col_width = value_width;
+ }
+}
+
+static void stats_table_addf(struct stats_table *table, const char *format, ...)
+{
+ va_list ap;
+
+ va_start(ap, format);
+ stats_table_vaddf(table, NULL, format, ap);
+ va_end(ap);
+}
+
+static void stats_table_count_addf(struct stats_table *table, size_t value,
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+ entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
+ va_end(ap);
+}
+
+static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
+{
+ size_t ref_total;
+
+ ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
+ stats_table_addf(table, "* %s", _("References"));
+ stats_table_count_addf(table, ref_total, " * %s", _("Count"));
+ stats_table_count_addf(table, refs->branches, " * %s", _("Branches"));
+ stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
+ stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+}
+
+static inline size_t max_size_t(size_t a, size_t b)
+{
+ return (a > b) ? a : b;
+}
+
+static void stats_table_print(const struct stats_table *table)
+{
+ const char *name_col_title = _("Repository stats");
+ const char *value_col_title = _("Value");
+ size_t name_title_len = strlen(name_col_title);
+ size_t value_title_len = strlen(value_col_title);
+ struct string_list_item *item;
+ int name_col_width;
+ int value_col_width;
+
+ name_col_width = cast_size_t_to_int(
+ max_size_t(table->name_col_width, name_title_len));
+ value_col_width = cast_size_t_to_int(
+ max_size_t(table->value_col_width, value_title_len));
+
+ printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
+ value_col_width, value_col_title);
+ printf("| ");
+ for (int i = 0; i < name_col_width; i++)
+ putchar('-');
+ printf(" | ");
+ for (int i = 0; i < value_col_width; i++)
+ putchar('-');
+ printf(" |\n");
+
+ for_each_string_list_item(item, &table->rows) {
+ struct stats_table_entry *entry = item->util;
+ const char *value = "";
+
+ if (entry) {
+ /* Only count rows carry an entry; headings print no value. */
+ value = entry->value;
+ }
+
+ printf("| %-*s | %*s |\n", name_col_width, item->string,
+ value_col_width, value);
+ }
+}
+
+static void stats_table_clear(struct stats_table *table)
+{
+ struct stats_table_entry *entry;
+ struct string_list_item *item;
+
+ for_each_string_list_item(item, &table->rows) {
+ entry = item->util;
+ if (entry)
+ free(entry->value);
+ }
+
+ string_list_clear(&table->rows, 1);
+}
+
+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
+{
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ stats->branches++;
+ break;
+ case FILTER_REFS_REMOTES:
+ stats->remotes++;
+ break;
+ case FILTER_REFS_TAGS:
+ stats->tags++;
+ break;
+ case FILTER_REFS_OTHERS:
+ stats->others++;
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+ }
+}
+
+static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
+ struct repository *repo UNUSED)
+{
+ struct ref_filter filter = REF_FILTER_INIT;
+ struct stats_table table = {
+ .rows = STRING_LIST_INIT_DUP,
+ };
+ struct ref_stats stats = { 0 };
+ struct ref_array refs = { 0 };
+ struct option options[] = { 0 };
+
+ argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (argc)
+ usage(_("too many arguments"));
+
+ if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
+ die(_("unable to filter refs"));
+
+ stats_count_references(&stats, &refs);
+
+ stats_table_setup(&table, &stats);
+ stats_table_print(&table);
+
+ stats_table_clear(&table);
+ ref_array_clear(&refs);
+
+ return 0;
+}
+
int cmd_repo(int argc, const char **argv, const char *prefix,
struct repository *repo)
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
+ OPT_SUBCOMMAND("stats", &fn, cmd_repo_stats),
OPT_END()
};
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..071d4a5112 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -236,6 +236,7 @@ integration_tests = [
't1701-racy-split-index.sh',
't1800-hook.sh',
't1900-repo.sh',
+ 't1901-repo-stats.sh',
't2000-conflict-when-checking-files-out.sh',
't2002-checkout-cache-u.sh',
't2003-checkout-cache-mkdir.sh',
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
new file mode 100755
index 0000000000..535ac511dd
--- /dev/null
+++ b/t/t1901-repo-stats.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='test git repo stats'
+
+. ./test-lib.sh
+
+test_expect_success 'empty repository' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ cat >expect <<-\EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ EOF
+
+ git repo stats >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_expect_success 'repository with references' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m init &&
+ git tag -a foo -m bar &&
+
+ oid="$(git rev-parse HEAD)" &&
+ git update-ref refs/remotes/origin/foo "$oid" &&
+
+ git notes add -m foo &&
+
+ cat >expect <<-\EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ EOF
+
+ git repo stats >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v3 5/7] builtin/repo: add object counts in stats output
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (3 preceding siblings ...)
2025-09-25 23:29 ` [PATCH v3 4/7] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-25 23:29 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
` (2 subsequent siblings)
7 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 23:29 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
The number of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
number of reachable objects in the repository by object type. All
regular references are used to determine the reachable set of objects.
The object counts are appended to the same table containing the
reference information.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
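A minimal sketch of the per-type accumulation, assuming a simplified callback:
`enum kind`, `struct counts`, count_batch() and total_count() are hypothetical
names for this note, while the patch itself switches on `enum object_type`
inside a path-walk callback.

```c
#include <stddef.h>

/* Hypothetical object kinds; the patch uses `enum object_type`. */
enum kind { KIND_COMMIT, KIND_TREE, KIND_BLOB, KIND_TAG };

struct counts {
	size_t commits;
	size_t trees;
	size_t blobs;
	size_t tags;
};

/* Add a batch of `nr` objects of the given kind to the running counts. */
static void count_batch(struct counts *c, enum kind kind, size_t nr)
{
	switch (kind) {
	case KIND_COMMIT:
		c->commits += nr;
		break;
	case KIND_TREE:
		c->trees += nr;
		break;
	case KIND_BLOB:
		c->blobs += nr;
		break;
	case KIND_TAG:
		c->tags += nr;
		break;
	}
}

/* Sum of all per-type counts. */
static size_t total_count(const struct counts *c)
{
	return c->commits + c->trees + c->blobs + c->tags;
}
```

With the updated test repository (42 bulk commits plus the notes commit, each
with its own tree and blob, and one annotated tag) the same sum gives
43 + 43 + 43 + 1 = 130.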
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 96 +++++++++++++++++++++++++++++++++++--
t/t1901-repo-stats.sh | 51 +++++++++++++-------
3 files changed, 126 insertions(+), 22 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index a009bf8cf1..0b8d74ed3e 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -49,6 +49,7 @@ supported:
of information are reported:
+
* Reference counts categorized by type
+* Reachable object counts categorized by type
+
The table output format may change and is not intended for machine parsing.
diff --git a/builtin/repo.c b/builtin/repo.c
index 0b7dd636e5..43cd6b1b38 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -3,9 +3,11 @@
#include "builtin.h"
#include "environment.h"
#include "parse-options.h"
+#include "path-walk.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
+#include "revision.h"
#include "strbuf.h"
#include "string-list.h"
#include "shallow.h"
@@ -166,6 +168,18 @@ struct ref_stats {
size_t others;
};
+struct object_stats {
+ size_t tags;
+ size_t commits;
+ size_t trees;
+ size_t blobs;
+};
+
+struct repo_stats {
+ struct ref_stats refs;
+ struct object_stats objects;
+};
+
struct stats_table {
struct string_list rows;
@@ -227,8 +241,16 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_end(ap);
}
-static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
+static inline size_t get_total_object_count(struct object_stats *stats)
{
+ return stats->tags + stats->commits + stats->trees + stats->blobs;
+}
+
+static void stats_table_setup(struct stats_table *table, struct repo_stats *stats)
+{
+ struct object_stats *objects = &stats->objects;
+ struct ref_stats *refs = &stats->refs;
+ size_t object_total;
size_t ref_total;
ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
@@ -238,6 +260,15 @@ static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+
+ object_total = get_total_object_count(objects);
+ stats_table_addf(table, "");
+ stats_table_addf(table, "* %s", _("Reachable objects"));
+ stats_table_count_addf(table, object_total, " * %s", _("Count"));
+ stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
+ stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
+ stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
+ stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
}
static inline size_t max_size_t(size_t a, size_t b)
@@ -322,30 +353,87 @@ static void stats_count_references(struct ref_stats *stats, struct ref_array *re
}
}
+static int count_objects(const char *path UNUSED, struct oid_array *oids,
+ enum object_type type, void *cb_data)
+{
+ struct object_stats *stats = cb_data;
+
+ switch (type) {
+ case OBJ_TAG:
+ stats->tags += oids->nr;
+ break;
+ case OBJ_COMMIT:
+ stats->commits += oids->nr;
+ break;
+ case OBJ_TREE:
+ stats->trees += oids->nr;
+ break;
+ case OBJ_BLOB:
+ stats->blobs += oids->nr;
+ break;
+ default:
+ BUG("invalid object type");
+ }
+
+ return 0;
+}
+
+static void stats_count_objects(struct object_stats *stats,
+ struct ref_array *refs, struct rev_info *revs)
+{
+ struct path_walk_info info = PATH_WALK_INFO_INIT;
+
+ info.revs = revs;
+ info.path_fn = count_objects;
+ info.path_fn_data = stats;
+
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ case FILTER_REFS_TAGS:
+ case FILTER_REFS_REMOTES:
+ case FILTER_REFS_OTHERS:
+ add_pending_oid(revs, NULL, &ref->objectname, 0);
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+ }
+
+ walk_objects_by_path(&info);
+ path_walk_info_clear(&info);
+}
+
static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
- struct repository *repo UNUSED)
+ struct repository *repo)
{
struct ref_filter filter = REF_FILTER_INIT;
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
- struct ref_stats stats = { 0 };
+ struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
+ struct rev_info revs;
struct option options[] = { 0 };
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
usage(_("too many arguments"));
+ repo_init_revisions(repo, &revs, prefix);
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
- stats_count_references(&stats, &refs);
+ stats_count_references(&stats.refs, &refs);
+ stats_count_objects(&stats.objects, &refs, &revs);
stats_table_setup(&table, &stats);
stats_table_print(&table);
stats_table_clear(&table);
+ release_revisions(&revs);
ref_array_clear(&refs);
return 0;
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 535ac511dd..315b9e1767 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -10,14 +10,21 @@ test_expect_success 'empty repository' '
(
cd repo &&
cat >expect <<-\EOF &&
- | Repository stats | Value |
- | ---------------- | ----- |
- | * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
+ | Repository stats | Value |
+ | ------------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo stats >out 2>err &&
@@ -27,28 +34,36 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references' '
+test_expect_success 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
- git commit --allow-empty -m init &&
+ test_commit_bulk 42 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
git update-ref refs/remotes/origin/foo "$oid" &&
+ # Also creates a commit, tree, and blob.
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository stats | Value |
- | ---------------- | ----- |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
+ | Repository stats | Value |
+ | ------------------- | ----- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 130 |
+ | * Commits | 43 |
+ | * Trees | 43 |
+ | * Blobs | 43 |
+ | * Tags | 1 |
EOF
git repo stats >out 2>err &&
--
2.51.0.193.g4975ec3473b
* [PATCH v3 6/7] builtin/repo: add keyvalue and nul format for stats
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (4 preceding siblings ...)
2025-09-25 23:29 ` [PATCH v3 5/7] builtin/repo: add object counts in stats output Justin Tobler
@ 2025-09-25 23:29 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 7/7] builtin/repo: add progress meter " Justin Tobler
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
7 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 23:29 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
All repository stats are outputted in a human-friendly table form. This
format is not suitable for machine parsing. Add a --format option that
supports three output modes: `table`, `keyvalue`, and `nul`. The `table`
mode is the default format and prints the same table output as before.
With the `keyvalue` mode, each line of output contains a key-value pair
of a repository stat. The '=' character is used to delimit between keys
and values. The `nul` mode is similar to `keyvalue`, but key-values are
delimited by a NUL character instead of a newline. Also, instead of a
'=' character to delimit between keys and values, a newline character is
used. This allows stat values to support special characters without
having to cquote them. These two new modes provide output that is more
machine-friendly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
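To make the two delimiter schemes concrete, here is a small sketch that
formats a single stat into a buffer. format_stat() is a hypothetical helper
for this note; the patch itself prints straight to stdout with printf().

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Write "<key><key_delim><value><value_delim>" into buf and return the
 * number of bytes produced, not counting the terminating NUL that
 * snprintf() appends. The value delimiter may itself be a NUL byte.
 */
static int format_stat(char *buf, size_t size, const char *key,
		       uintmax_t value, char key_delim, char value_delim)
{
	return snprintf(buf, size, "%s%c%" PRIuMAX "%c", key, key_delim,
			value, value_delim);
}
```

Calling it with ('=', '\n') yields a `keyvalue` record and with ('\n', '\0')
a `nul` record. Because the latter contains an embedded NUL byte, a consumer
has to rely on the returned length rather than on strlen().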
Documentation/git-repo.adoc | 25 +++++++++++++++--
builtin/repo.c | 55 ++++++++++++++++++++++++++++++++++---
t/t1901-repo-stats.sh | 33 ++++++++++++++++++++++
3 files changed, 106 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 0b8d74ed3e..3fbce0b88c 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
-git repo stats
+git repo stats [--format=(table|keyvalue|nul)]
DESCRIPTION
-----------
@@ -44,7 +44,7 @@ supported:
+
`-z` is an alias for `--format=nul`.
-`stats`::
+`stats [--format=(table|keyvalue|nul)]`::
Retrieve statistics about the current repository. The following kinds
of information are reported:
+
@@ -52,7 +52,26 @@ supported:
* Reachable object counts categorized by type
+
-The table output format may change and is not intended for machine parsing.
+The output format can be chosen through the flag `--format`. Three formats are
+supported:
++
+`table`:::
+ Outputs repository stats in a human-friendly table. This format may
+ change and is not intended for machine parsing. This is the default
+ format.
+
+`keyvalue`:::
+ Each line of output contains a key-value pair for a repository stat.
+ The '=' character is used to delimit between the key and the value.
+ Values containing "unusual" characters are quoted as explained for the
+ configuration variable `core.quotePath` (see linkgit:git-config[1]).
+
+`nul`:::
+ Similar to `keyvalue`, but uses a NUL character to delimit between
+ key-value pairs instead of a newline. Also uses a newline character as
+ the delimiter between the key and value instead of '='. Unlike the
+ `keyvalue` format, values containing "unusual" characters are never
+ quoted.
INFO KEYS
---------
diff --git a/builtin/repo.c b/builtin/repo.c
index 43cd6b1b38..e8a02c950b 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -14,13 +14,14 @@
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
- "git repo stats",
+ "git repo stats [--format=(table|keyvalue|nul)]",
NULL
};
typedef int get_value_fn(struct repository *repo, struct strbuf *buf);
enum output_format {
+ FORMAT_TABLE,
FORMAT_KEYVALUE,
FORMAT_NUL_TERMINATED,
};
@@ -135,6 +136,8 @@ static int parse_format_cb(const struct option *opt,
*format = FORMAT_NUL_TERMINATED;
else if (!strcmp(arg, "keyvalue"))
*format = FORMAT_KEYVALUE;
+ else if (!strcmp(arg, "table"))
+ *format = FORMAT_TABLE;
else
die(_("invalid format '%s'"), arg);
@@ -157,6 +160,8 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
};
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (format != FORMAT_KEYVALUE && format != FORMAT_NUL_TERMINATED)
+ die(_("unsupported output format"));
return print_fields(argc, argv, repo, format);
}
@@ -329,6 +334,30 @@ static void stats_table_clear(struct stats_table *table)
string_list_clear(&table->rows, 1);
}
+static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
+ char value_delim)
+{
+ printf("references.branches.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.branches, value_delim);
+ printf("references.tags.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.tags, value_delim);
+ printf("references.remotes.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.remotes, value_delim);
+ printf("references.others.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.others, value_delim);
+
+ printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.commits, value_delim);
+ printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.trees, value_delim);
+ printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.blobs, value_delim);
+ printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.tags, value_delim);
+
+ fflush(stdout);
+}
+
static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
{
for (int i = 0; i < refs->nr; i++) {
@@ -413,10 +442,16 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
+ enum output_format format = FORMAT_TABLE;
struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
- struct option options[] = { 0 };
+ struct option options[] = {
+ OPT_CALLBACK_F(0, "format", &format, N_("format"),
+ N_("output format"),
+ PARSE_OPT_NONEG, parse_format_cb),
+ OPT_END()
+ };
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
@@ -429,8 +464,20 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
stats_count_references(&stats.refs, &refs);
stats_count_objects(&stats.objects, &refs, &revs);
- stats_table_setup(&table, &stats);
- stats_table_print(&table);
+ switch (format) {
+ case FORMAT_TABLE:
+ stats_table_setup(&table, &stats);
+ stats_table_print(&table);
+ break;
+ case FORMAT_KEYVALUE:
+ stats_keyvalue_print(&stats, '=', '\n');
+ break;
+ case FORMAT_NUL_TERMINATED:
+ stats_keyvalue_print(&stats, '\n', '\0');
+ break;
+ default:
+ BUG("invalid output format");
+ }
stats_table_clear(&table);
release_revisions(&revs);
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 315b9e1767..2409edae4f 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -73,4 +73,37 @@ test_expect_success 'repository with references and objects' '
)
'
+test_expect_success 'keyvalue and nul format' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit_bulk 42 &&
+ git tag -a foo -m bar &&
+
+ cat >expect <<-\EOF &&
+ references.branches.count=1
+ references.tags.count=1
+ references.remotes.count=0
+ references.others.count=0
+ objects.commits.count=42
+ objects.trees.count=42
+ objects.blobs.count=42
+ objects.tags.count=1
+ EOF
+
+ git repo stats --format=keyvalue >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err &&
+
+ # Replace key and value delimiters for nul format.
+ tr "\n=" "\0\n" <expect >expect_nul &&
+ git repo stats --format=nul >out 2>err &&
+
+ test_cmp expect_nul out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v3 7/7] builtin/repo: add progress meter for stats
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (5 preceding siblings ...)
2025-09-25 23:29 ` [PATCH v3 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
@ 2025-09-25 23:29 ` Justin Tobler
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
7 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-25 23:29 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, Justin Tobler
When using the stats subcommand for git-repo(1), evaluating a repository
may take some time depending on its shape. Add a progress meter to
provide feedback to the user about what is happening. The progress meter
is enabled by default when the command is executed from a tty. It can
also be explicitly enabled/disabled via the --[no-]progress option.
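The default described above can be sketched as a tri-state flag, roughly as follows. This is only an illustration under stated assumptions: resolve_progress() and progress_enabled() are hypothetical names, not functions from this series, and only the standard isatty() call is used.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve a tri-state progress request.
 * requested: -1 = option not given, 0 = --no-progress, 1 = --progress.
 * An explicit request always wins; otherwise fall back to whether
 * stderr is attached to a terminal.
 */
static int resolve_progress(int requested, int stderr_is_tty)
{
	assert(requested >= -1 && requested <= 1);
	if (requested < 0)
		return stderr_is_tty ? 1 : 0;
	return requested;
}

/* Hypothetical wrapper that queries the real stderr descriptor. */
static int progress_enabled(int requested)
{
	return resolve_progress(requested, isatty(STDERR_FILENO));
}
```

Starting the flag at -1 keeps "not specified" distinct from an explicit --no-progress, so the tty check only runs when the user expressed no preference.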
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 47 +++++++++++++++++++++++++++++++++++++------
t/t1901-repo-stats.sh | 20 ++++++++++++++++++
2 files changed, 61 insertions(+), 6 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index e8a02c950b..e553d7aa28 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,6 +4,7 @@
#include "environment.h"
#include "parse-options.h"
#include "path-walk.h"
+#include "progress.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
@@ -358,8 +359,15 @@ static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
fflush(stdout);
}
-static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs,
+ struct repository *repo, int show_progress)
{
+ struct progress *progress = NULL;
+
+ if (show_progress)
+ progress = start_delayed_progress(repo, _("Counting references"),
+ refs->nr);
+
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -379,13 +387,24 @@ static void stats_count_references(struct ref_stats *stats, struct ref_array *re
default:
BUG("unexpected reference type");
}
+
+ display_progress(progress, i + 1);
}
+
+ stop_progress(&progress);
}
+struct count_objects_data {
+ struct object_stats *stats;
+ struct progress *progress;
+};
+
static int count_objects(const char *path UNUSED, struct oid_array *oids,
enum object_type type, void *cb_data)
{
- struct object_stats *stats = cb_data;
+ struct count_objects_data *data = cb_data;
+ struct object_stats *stats = data->stats;
+ size_t object_count;
switch (type) {
case OBJ_TAG:
@@ -404,17 +423,24 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
BUG("invalid object type");
}
+ object_count = get_total_object_count(stats);
+ display_progress(data->progress, object_count);
+
return 0;
}
static void stats_count_objects(struct object_stats *stats,
- struct ref_array *refs, struct rev_info *revs)
+ struct ref_array *refs, struct rev_info *revs,
+ struct repository *repo, int show_progress)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
+ struct count_objects_data data = {
+ .stats = stats,
+ };
info.revs = revs;
info.path_fn = count_objects;
- info.path_fn_data = stats;
+ info.path_fn_data = &data;
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -431,8 +457,12 @@ static void stats_count_objects(struct object_stats *stats,
}
}
+ if (show_progress)
+ data.progress = start_delayed_progress(repo, _("Counting objects"), 0);
+
walk_objects_by_path(&info);
path_walk_info_clear(&info);
+ stop_progress(&data.progress);
}
static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
@@ -446,10 +476,12 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
+ int show_progress = -1;
struct option options[] = {
OPT_CALLBACK_F(0, "format", &format, N_("format"),
N_("output format"),
PARSE_OPT_NONEG, parse_format_cb),
+ OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
OPT_END()
};
@@ -461,8 +493,11 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
- stats_count_references(&stats.refs, &refs);
- stats_count_objects(&stats.objects, &refs, &revs);
+ if (show_progress < 0)
+ show_progress = isatty(2);
+
+ stats_count_references(&stats.refs, &refs, repo, show_progress);
+ stats_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
switch (format) {
case FORMAT_TABLE:
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 2409edae4f..03d6db479f 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -106,4 +106,24 @@ test_expect_success 'keyvalue and nul format' '
)
'
+test_expect_success 'progress meter option' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit foo &&
+
+ GIT_PROGRESS_DELAY=0 git repo stats --progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_grep "Counting references: 100% (2/2), done." err &&
+ test_grep "Counting objects: 3, done." err &&
+
+ GIT_PROGRESS_DELAY=0 git repo stats --no-progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* Re: [PATCH v3 4/7] builtin/repo: introduce stats subcommand
2025-09-25 23:29 ` [PATCH v3 4/7] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-25 23:51 ` Eric Sunshine
2025-09-26 1:38 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Eric Sunshine @ 2025-09-25 23:51 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, Derrick Stolee
On Thu, Sep 25, 2025 at 7:30 PM Justin Tobler <jltobler@gmail.com> wrote:
> The shape of a repository's history can have huge impacts on the
> performance and health of the repository itself. Currently, Git lacks a
> means to surface key stats/information regarding the shape of a
> repository via a single command. Acquiring this information requires
> users to be fairly knowledgeable about the structure of a Git repository
> and how to identify the relevant data points. To fill this gap,
> supplemental tools such as git-sizer(1) have been developed.
> [...]
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
> diff --git a/builtin/repo.c b/builtin/repo.c
> @@ -156,12 +159,205 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
> +static void stats_table_vaddf(struct stats_table *table,
> + struct stats_table_entry *entry,
> + const char *format, va_list ap)
> +{
> + size_t name_width;
> +
> + strbuf_vaddf(&buf, format, ap);
> + formatted_name = strbuf_detach(&buf, &name_width);
> + [...]
> + if (name_width > table->name_col_width)
> + table->name_col_width = name_width;
Here, you're using the byte length of the composed string to compute
the table width which you will use for alignment purposes when
rendering the table...
> +static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
> +{
> + size_t ref_total;
> +
> + ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
> + stats_table_addf(table, "* %s", _("References"));
> + stats_table_count_addf(table, ref_total, " * %s", _("Count"));
> + stats_table_count_addf(table, refs->branches, " * %s", _("Branches"));
> + stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
> + stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
> + stats_table_count_addf(table, refs->others, " * %s", _("Others"));
> +}
...however, here you feed the function translatable strings, which
means that the display length of the composed string is not guaranteed
to be the same as the byte length.
To resolve this, you probably want to investigate Git's utf8.h header,
in particular, the utf8_strwidth() function.
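To illustrate the difference between byte length and display length, here is a minimal standalone sketch. It is deliberately simplified: utf8_codepoints() and pad_for() are hypothetical helpers, the input is assumed to be valid UTF-8, and each code point is assumed to occupy one column, which is not true for wide or combining characters (the cases utf8_strwidth() exists to handle).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count UTF-8 code points by skipping
 * continuation bytes (those of the form 10xxxxxx).
 * Assumes valid UTF-8 and one column per code point.
 */
static size_t utf8_codepoints(const char *s)
{
	size_t n = 0;

	assert(s);
	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}

/*
 * Hypothetical helper: number of padding cells needed to left-align
 * `s` in a column that is `width` cells wide.
 */
static size_t pad_for(const char *s, size_t width)
{
	size_t cells = utf8_codepoints(s);

	return cells < width ? width - cells : 0;
}
```

For a string such as "\xc3\xa9t\xc3\xa9" (three characters encoded in five bytes), strlen() reports 5 while the helper above reports 3, so a column sized from strlen() would come out two cells too wide for that row.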
* Re: [PATCH v3 4/7] builtin/repo: introduce stats subcommand
2025-09-25 23:51 ` Eric Sunshine
@ 2025-09-26 1:38 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-26 1:38 UTC (permalink / raw)
To: Eric Sunshine; +Cc: git, ps, karthik.188, Derrick Stolee
On 25/09/25 07:51PM, Eric Sunshine wrote:
> On Thu, Sep 25, 2025 at 7:30 PM Justin Tobler <jltobler@gmail.com> wrote:
[snip]
> > + strbuf_vaddf(&buf, format, ap);
> > + formatted_name = strbuf_detach(&buf, &name_width);
> > + [...]
> > + if (name_width > table->name_col_width)
> > + table->name_col_width = name_width;
>
> Here, you're using the byte length of the composed string to compute
> the table width which you will use for alignment purposes when
> rendering the table...
>
> > +static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
> > +{
> > + size_t ref_total;
> > +
> > + ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
> > + stats_table_addf(table, "* %s", _("References"));
> > + stats_table_count_addf(table, ref_total, " * %s", _("Count"));
> > + stats_table_count_addf(table, refs->branches, " * %s", _("Branches"));
> > + stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
> > + stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
> > + stats_table_count_addf(table, refs->others, " * %s", _("Others"));
> > +}
>
> ...however, here you feed the function translatable strings, which
> means that the display length of the composed string is not guaranteed
> to be the same as the byte length.
>
> To resolve this, you probably want to investigate Git's utf8.h header,
> in particular, the utf8_strwidth() function.
Thanks for raising this. I hadn't considered the fact that the
characters in translated strings could occupy more than one byte. I'll
address this in the next version by using utf8_strwidth() as you
mentioned. :)
Thanks,
-Justin
* [PATCH v4 0/7] builtin/repo: introduce stats subcommand
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (6 preceding siblings ...)
2025-09-25 23:29 ` [PATCH v3 7/7] builtin/repo: add progress meter " Justin Tobler
@ 2025-09-27 14:50 ` Justin Tobler
2025-09-27 14:50 ` [PATCH v4 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
` (8 more replies)
7 siblings, 9 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 14:50 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, Justin Tobler
Greetings,
The shape of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface key stats/information regarding the shape of a
repository via a single command. Acquiring this information requires
users to be fairly knowledgeable about the structure of a Git repository
and how to identify the relevant data points. To fill this gap,
supplemental tools such as git-sizer(1) have been developed.
To allow users to more readily identify potential issues for a
repository, introduce the "stats" subcommand in git-repo(1) to output
stats for the repository that may be of interest to users. The goal of
this subcommand is to eventually provide similar functionality to
git-sizer(1), but in Git natively.
In this initial version, the "stats" subcommand only surfaces counts of
the various reference and object types in a repository. In a follow-up
series, I would like to introduce additional data points that are
present in git-sizer(1) such as largest objects, combined object sizes
by type, and other general repository shape information.
Some other general features that would be nice to introduce eventually:
- A "level of concern" meter for reported stats. This could indicate to
users which stats may be worth looking into further.
- Links to OIDs of interesting objects that correspond to certain stats.
- Options to limit which references to use when evaluating the
repository.
Changes since V3:
- Changed from using strlen() to utf8_strwidth() to take into
consideration that translatable strings may have characters that are
more than one byte.
Changes since V2:
- Added clang-format patch to address false positive triggered in this
series.
- Use varargs for stats_table_add() family of functions.
- Print to stdout directly instead of using strbuf.
- Add parse_options() earlier in the series.
- Use start_delayed_progress() instead of start_progress().
- Add test to validate --[no-]progress options.
- Some other small fixes.
Changes since V1:
- Translatable terms displayed in the table have formatting separated
out.
- Squashed the `keyvalue` and `nul` output format patches into one.
- Added a progress meter to provide users with more feedback.
- Updated docs to outline reported data in a bulleted list.
- Combined similar tests together to reduce repetitive setup.
- Added patch to improve ref-filter interface so we don't have to create
a dummy patterns array.
- Many other renames and cleanups to improve patch clarity.
Thanks,
-Justin
Justin Tobler (7):
builtin/repo: rename repo_info() to cmd_repo_info()
ref-filter: allow NULL filter pattern
clang-format: exclude control macros from SpaceBeforeParens
builtin/repo: introduce stats subcommand
builtin/repo: add object counts in stats output
builtin/repo: add keyvalue and nul format for stats
builtin/repo: add progress meter for stats
.clang-format | 2 +-
Documentation/git-repo.adoc | 30 +++
builtin/repo.c | 374 +++++++++++++++++++++++++++++++++++-
ref-filter.c | 4 +-
t/meson.build | 1 +
t/t1901-repo-stats.sh | 129 +++++++++++++
6 files changed, 534 insertions(+), 6 deletions(-)
create mode 100755 t/t1901-repo-stats.sh
Range-diff against v3:
1: ed04168562 = 1: ed04168562 builtin/repo: rename repo_info() to cmd_repo_info()
2: 6aa76d1323 = 2: 6aa76d1323 ref-filter: allow NULL filter pattern
3: 02a3fcc5fb = 3: 02a3fcc5fb clang-format: exclude control macros from SpaceBeforeParens
4: 12cfbdc464 ! 4: 8ec9914886 builtin/repo: introduce stats subcommand
@@ builtin/repo.c
#include "strbuf.h"
+#include "string-list.h"
#include "shallow.h"
++#include "utf8.h"
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ size_t name_width;
+
+ strbuf_vaddf(&buf, format, ap);
-+ formatted_name = strbuf_detach(&buf, &name_width);
++ formatted_name = strbuf_detach(&buf, NULL);
++ name_width = utf8_strwidth(formatted_name);
+
+ item = string_list_append_nodup(&table->rows, formatted_name);
+ item->util = entry;
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ if (name_width > table->name_col_width)
+ table->name_col_width = name_width;
+ if (entry) {
-+ size_t value_width = strlen(entry->value);
++ size_t value_width = utf8_strwidth(entry->value);
+ if (value_width > table->value_col_width)
+ table->value_col_width = value_width;
+ }
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+{
+ const char *name_col_title = _("Repository stats");
+ const char *value_col_title = _("Value");
-+ size_t name_title_len = strlen(name_col_title);
-+ size_t value_title_len = strlen(value_col_title);
++ size_t name_title_len = utf8_strwidth(name_col_title);
++ size_t value_title_len = utf8_strwidth(value_col_title);
+ struct string_list_item *item;
+ int name_col_width;
+ int value_col_width;
5: ab27340d58 = 5: 584d35f2c7 builtin/repo: add object counts in stats output
6: f69110224d = 6: 76975b2eab builtin/repo: add keyvalue and nul format for stats
7: cff5e183bb = 7: 1105346a3c builtin/repo: add progress meter for stats
base-commit: ca2559c1d630eb4f04cdee2328aaf1c768907a9e
--
2.51.0.193.g4975ec3473b
* [PATCH v4 1/7] builtin/repo: rename repo_info() to cmd_repo_info()
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-27 14:50 ` Justin Tobler
2025-09-27 14:50 ` [PATCH v4 2/7] ref-filter: allow NULL filter pattern Justin Tobler
` (7 subsequent siblings)
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 14:50 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, Justin Tobler
Subcommand functions are often prefixed with `cmd_` to denote that they
are an entrypoint. Rename repo_info() to cmd_repo_info() accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index bbb0966f2d..eeeab8fbd2 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -136,8 +136,8 @@ static int parse_format_cb(const struct option *opt,
return 0;
}
-static int repo_info(int argc, const char **argv, const char *prefix,
- struct repository *repo)
+static int cmd_repo_info(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
{
enum output_format format = FORMAT_KEYVALUE;
struct option options[] = {
@@ -161,7 +161,7 @@ int cmd_repo(int argc, const char **argv, const char *prefix,
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
- OPT_SUBCOMMAND("info", &fn, repo_info),
+ OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
OPT_END()
};
--
2.51.0.193.g4975ec3473b
* [PATCH v4 2/7] ref-filter: allow NULL filter pattern
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-27 14:50 ` [PATCH v4 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
@ 2025-09-27 14:50 ` Justin Tobler
2025-09-27 14:50 ` [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
` (6 subsequent siblings)
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 14:50 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, Justin Tobler
When setting up `struct ref_filter` for filter_refs(), the
`name_patterns` field must point to an array of pattern strings even if
no patterns are required. To improve this interface, treat a NULL
`name_patterns` field the same as when it points to an empty array.
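The idea of treating a NULL pointer like an empty NULL-terminated array can be sketched in isolation as below. This is not the ref-filter code: struct name_filter, has_no_patterns() and filter_matches() are hypothetical stand-ins, and exact string comparison replaces real pattern matching to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a filter carrying optional patterns. */
struct name_filter {
	const char **patterns; /* NULL, or a NULL-terminated array */
};

/* A NULL array and an array whose first slot is NULL both mean "none". */
static int has_no_patterns(const struct name_filter *f)
{
	assert(f);
	return !f->patterns || !f->patterns[0];
}

/*
 * Return 1 if `name` passes the filter, otherwise 0. With no
 * patterns, everything matches. Exact comparison is used here
 * purely for illustration.
 */
static int filter_matches(const struct name_filter *f, const char *name)
{
	if (has_no_patterns(f))
		return 1;
	for (const char **p = f->patterns; *p; p++)
		if (!strcmp(*p, name))
			return 1;
	return 0;
}
```

Checking the pointer before dereferencing it lets callers zero-initialize the filter instead of allocating a dummy one-element array.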
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
ref-filter.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ref-filter.c b/ref-filter.c
index 520d2539c9..2cb5a166d6 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2664,7 +2664,7 @@ static int match_name_as_path(const char **pattern, const char *refname,
/* Return 1 if the refname matches one of the patterns, otherwise 0. */
static int filter_pattern_match(struct ref_filter *filter, const char *refname)
{
- if (!*filter->name_patterns)
+ if (!filter->name_patterns || !*filter->name_patterns)
return 1; /* No pattern always matches */
if (filter->match_as_path)
return match_name_as_path(filter->name_patterns, refname,
@@ -2751,7 +2751,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
- if (!filter->name_patterns[0]) {
+ if (!filter->name_patterns || !filter->name_patterns[0]) {
/* no patterns; we have to look at everything */
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
--
2.51.0.193.g4975ec3473b
* [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-27 14:50 ` [PATCH v4 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-27 14:50 ` [PATCH v4 2/7] ref-filter: allow NULL filter pattern Justin Tobler
@ 2025-09-27 14:50 ` Justin Tobler
2025-09-27 15:40 ` Junio C Hamano
2025-09-27 14:50 ` [PATCH v4 4/7] builtin/repo: introduce stats subcommand Justin Tobler
` (5 subsequent siblings)
8 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 14:50 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, Justin Tobler
The formatter currently suggests adding a space between a control macro
and parentheses. In the Git project, this is not typically expected. Set
`SpaceBeforeParens` to `ControlStatementsExceptControlMacros`
accordingly.
Helped-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
.clang-format | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.clang-format b/.clang-format
index dcfd0aad60..86b4fe33e5 100644
--- a/.clang-format
+++ b/.clang-format
@@ -149,7 +149,7 @@ SpaceBeforeCaseColon: false
# f();
# }
# }
-SpaceBeforeParens: ControlStatements
+SpaceBeforeParens: ControlStatementsExceptControlMacros
# Don't insert spaces inside empty '()'
SpaceInEmptyParentheses: false
--
2.51.0.193.g4975ec3473b
* [PATCH v4 4/7] builtin/repo: introduce stats subcommand
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (2 preceding siblings ...)
2025-09-27 14:50 ` [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
@ 2025-09-27 14:50 ` Justin Tobler
2025-09-27 16:32 ` Junio C Hamano
2025-09-27 14:50 ` [PATCH v4 5/7] builtin/repo: add object counts in stats output Justin Tobler
` (4 subsequent siblings)
8 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 14:50 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, Justin Tobler, Derrick Stolee
The shape of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface key stats/information regarding the shape of a
repository via a single command. Acquiring this information requires
users to be fairly knowledgeable about the structure of a Git repository
and how to identify the relevant data points. To fill this gap,
supplemental tools such as git-sizer(1) have been developed.
To allow users to more readily identify potential issues for a
repository, introduce the "stats" subcommand in git-repo(1) to output
stats for the repository that may be of interest to users. The goal of
this subcommand is to eventually provide similar functionality to
git-sizer(1), but natively in Git.
The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remote refs,
and other reference types. The corresponding information is displayed in
a human-friendly table formatted in a very similar manner to
git-sizer(1). The width of each table column is adjusted automatically
to satisfy the requirements of the widest row contained.
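The column sizing described above amounts to taking the maximum over the title and every cell, then padding each row to that width. A rough standalone sketch follows, assuming ASCII-only cells so that strlen() equals display width; column_width() and format_row() are hypothetical names, not the helpers added by this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: width of a column is the longest of its title
 * and all of its cells. Uses strlen(), so it is only correct for
 * ASCII content.
 */
static int column_width(const char *title, const char *const *cells,
			size_t nr)
{
	size_t width;

	assert(title);
	width = strlen(title);
	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(cells[i]);
		if (len > width)
			width = len;
	}
	return (int)width;
}

/*
 * Hypothetical helper: format one row with a left-aligned name and a
 * right-aligned value. Returns the snprintf() result.
 */
static int format_row(char *out, size_t outsz, const char *name,
		      int name_width, const char *value, int value_width)
{
	return snprintf(out, outsz, "| %-*s | %*s |", name_width, name,
			value_width, value);
}
```

Because both widths are computed before any row is printed, every row comes out the same length regardless of which cell happened to be the widest.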
Subsequent commits will surface additional relevant data points to
output.
Based-on-patch-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 10 ++
builtin/repo.c | 198 ++++++++++++++++++++++++++++++++++++
t/meson.build | 1 +
t/t1901-repo-stats.sh | 61 +++++++++++
4 files changed, 270 insertions(+)
create mode 100755 t/t1901-repo-stats.sh
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 209afd1b61..a009bf8cf1 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,6 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
+git repo stats
DESCRIPTION
-----------
@@ -43,6 +44,15 @@ supported:
+
`-z` is an alias for `--format=nul`.
+`stats`::
+ Retrieve statistics about the current repository. The following kinds
+ of information are reported:
++
+* Reference counts categorized by type
+
++
+The table output format may change and is not intended for machine parsing.
+
INFO KEYS
---------
In order to obtain a set of values from `git repo info`, you should provide
diff --git a/builtin/repo.c b/builtin/repo.c
index eeeab8fbd2..889e344f15 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,12 +4,16 @@
#include "environment.h"
#include "parse-options.h"
#include "quote.h"
+#include "ref-filter.h"
#include "refs.h"
#include "strbuf.h"
+#include "string-list.h"
#include "shallow.h"
+#include "utf8.h"
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
+ "git repo stats",
NULL
};
@@ -156,12 +160,206 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
return print_fields(argc, argv, repo, format);
}
+struct ref_stats {
+ size_t branches;
+ size_t remotes;
+ size_t tags;
+ size_t others;
+};
+
+struct stats_table {
+ struct string_list rows;
+
+ size_t name_col_width;
+ size_t value_col_width;
+};
+
+/*
+ * Holds column data that gets stored for each row.
+ */
+struct stats_table_entry {
+ char *value;
+};
+
+static void stats_table_vaddf(struct stats_table *table,
+ struct stats_table_entry *entry,
+ const char *format, va_list ap)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ char *formatted_name;
+ size_t name_width;
+
+ strbuf_vaddf(&buf, format, ap);
+ formatted_name = strbuf_detach(&buf, NULL);
+ name_width = utf8_strwidth(formatted_name);
+
+ item = string_list_append_nodup(&table->rows, formatted_name);
+ item->util = entry;
+
+ if (name_width > table->name_col_width)
+ table->name_col_width = name_width;
+ if (entry) {
+ size_t value_width = utf8_strwidth(entry->value);
+ if (value_width > table->value_col_width)
+ table->value_col_width = value_width;
+ }
+}
+
+static void stats_table_addf(struct stats_table *table, const char *format, ...)
+{
+ va_list ap;
+
+ va_start(ap, format);
+ stats_table_vaddf(table, NULL, format, ap);
+ va_end(ap);
+}
+
+static void stats_table_count_addf(struct stats_table *table, size_t value,
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+ entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
+ va_end(ap);
+}
+
+static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
+{
+ size_t ref_total;
+
+ ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
+ stats_table_addf(table, "* %s", _("References"));
+ stats_table_count_addf(table, ref_total, " * %s", _("Count"));
+ stats_table_count_addf(table, refs->branches, " * %s", _("Branches"));
+ stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
+ stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+}
+
+static inline size_t max_size_t(size_t a, size_t b)
+{
+ return (a > b) ? a : b;
+}
+
+static void stats_table_print(const struct stats_table *table)
+{
+ const char *name_col_title = _("Repository stats");
+ const char *value_col_title = _("Value");
+ size_t name_title_len = utf8_strwidth(name_col_title);
+ size_t value_title_len = utf8_strwidth(value_col_title);
+ struct string_list_item *item;
+ int name_col_width;
+ int value_col_width;
+
+ name_col_width = cast_size_t_to_int(
+ max_size_t(table->name_col_width, name_title_len));
+ value_col_width = cast_size_t_to_int(
+ max_size_t(table->value_col_width, value_title_len));
+
+ printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
+ value_col_width, value_col_title);
+ printf("| ");
+ for (int i = 0; i < name_col_width; i++)
+ putchar('-');
+ printf(" | ");
+ for (int i = 0; i < value_col_width; i++)
+ putchar('-');
+ printf(" |\n");
+
+ for_each_string_list_item(item, &table->rows) {
+ struct stats_table_entry *entry = item->util;
+ const char *value = "";
+
+ if (entry) {
+ struct stats_table_entry *entry = item->util;
+ value = entry->value;
+ }
+
+ printf("| %-*s | %*s |\n", name_col_width, item->string,
+ value_col_width, value);
+ }
+}
+
+static void stats_table_clear(struct stats_table *table)
+{
+ struct stats_table_entry *entry;
+ struct string_list_item *item;
+
+ for_each_string_list_item(item, &table->rows) {
+ entry = item->util;
+ if (entry)
+ free(entry->value);
+ }
+
+ string_list_clear(&table->rows, 1);
+}
+
+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
+{
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ stats->branches++;
+ break;
+ case FILTER_REFS_REMOTES:
+ stats->remotes++;
+ break;
+ case FILTER_REFS_TAGS:
+ stats->tags++;
+ break;
+ case FILTER_REFS_OTHERS:
+ stats->others++;
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+ }
+}
+
+static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
+ struct repository *repo UNUSED)
+{
+ struct ref_filter filter = REF_FILTER_INIT;
+ struct stats_table table = {
+ .rows = STRING_LIST_INIT_DUP,
+ };
+ struct ref_stats stats = { 0 };
+ struct ref_array refs = { 0 };
+ struct option options[] = { 0 };
+
+ argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (argc)
+ usage(_("too many arguments"));
+
+ if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
+ die(_("unable to filter refs"));
+
+ stats_count_references(&stats, &refs);
+
+ stats_table_setup(&table, &stats);
+ stats_table_print(&table);
+
+ stats_table_clear(&table);
+ ref_array_clear(&refs);
+
+ return 0;
+}
+
int cmd_repo(int argc, const char **argv, const char *prefix,
struct repository *repo)
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
+ OPT_SUBCOMMAND("stats", &fn, cmd_repo_stats),
OPT_END()
};
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..071d4a5112 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -236,6 +236,7 @@ integration_tests = [
't1701-racy-split-index.sh',
't1800-hook.sh',
't1900-repo.sh',
+ 't1901-repo-stats.sh',
't2000-conflict-when-checking-files-out.sh',
't2002-checkout-cache-u.sh',
't2003-checkout-cache-mkdir.sh',
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
new file mode 100755
index 0000000000..535ac511dd
--- /dev/null
+++ b/t/t1901-repo-stats.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='test git repo stats'
+
+. ./test-lib.sh
+
+test_expect_success 'empty repository' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ cat >expect <<-\EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ EOF
+
+ git repo stats >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_expect_success 'repository with references' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m init &&
+ git tag -a foo -m bar &&
+
+ oid="$(git rev-parse HEAD)" &&
+ git update-ref refs/remotes/origin/foo "$oid" &&
+
+ git notes add -m foo &&
+
+ cat >expect <<-\EOF &&
+ | Repository stats | Value |
+ | ---------------- | ----- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ EOF
+
+ git repo stats >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v4 5/7] builtin/repo: add object counts in stats output
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (3 preceding siblings ...)
2025-09-27 14:50 ` [PATCH v4 4/7] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-27 14:50 ` Justin Tobler
2025-09-27 14:50 ` [PATCH v4 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
` (3 subsequent siblings)
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 14:50 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, Justin Tobler
The number of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
number of reachable objects in the repository by object type. All
regular references are used to determine the reachable set of objects.
The object counts are appended to the same table containing the
reference information.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
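A minimal standalone sketch of the per-type accounting described above,
not part of this patch. The `sketch_` names and the enum are hypothetical
stand-ins for Git's object types and the path-walk callback; only the
arithmetic is the same. Feeding it the counts from the updated test
(43 commits, 43 trees, 43 blobs, 1 tag) gives the total of 130 that the
table is expected to report.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Git's object types; not Git's real enum. */
enum sketch_object_type {
	SKETCH_COMMIT,
	SKETCH_TREE,
	SKETCH_BLOB,
	SKETCH_TAG,
};

struct sketch_object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/*
 * Account for one batch of `nr` objects that all share `type`.
 * Returns 0 on success and -1 when the type is not recognized.
 */
static int sketch_count_batch(struct sketch_object_stats *stats,
			      enum sketch_object_type type, size_t nr)
{
	switch (type) {
	case SKETCH_TAG:
		stats->tags += nr;
		break;
	case SKETCH_COMMIT:
		stats->commits += nr;
		break;
	case SKETCH_TREE:
		stats->trees += nr;
		break;
	case SKETCH_BLOB:
		stats->blobs += nr;
		break;
	default:
		return -1;
	}

	return 0;
}

/* Sum of all per-type counters, as shown in the "Count" row. */
static size_t sketch_total(const struct sketch_object_stats *stats)
{
	return stats->tags + stats->commits + stats->trees + stats->blobs;
}
```

Adding a whole batch at once, rather than one object at a time, mirrors
how the callback in this patch adds `oids->nr` per invocation.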
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 96 +++++++++++++++++++++++++++++++++++--
t/t1901-repo-stats.sh | 51 +++++++++++++-------
3 files changed, 126 insertions(+), 22 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index a009bf8cf1..0b8d74ed3e 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -49,6 +49,7 @@ supported:
of information are reported:
+
* Reference counts categorized by type
+* Reachable object counts categorized by type
+
The table output format may change and is not intended for machine parsing.
diff --git a/builtin/repo.c b/builtin/repo.c
index 889e344f15..3eefbeddba 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -3,9 +3,11 @@
#include "builtin.h"
#include "environment.h"
#include "parse-options.h"
+#include "path-walk.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
+#include "revision.h"
#include "strbuf.h"
#include "string-list.h"
#include "shallow.h"
@@ -167,6 +169,18 @@ struct ref_stats {
size_t others;
};
+struct object_stats {
+ size_t tags;
+ size_t commits;
+ size_t trees;
+ size_t blobs;
+};
+
+struct repo_stats {
+ struct ref_stats refs;
+ struct object_stats objects;
+};
+
struct stats_table {
struct string_list rows;
@@ -229,8 +243,16 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_end(ap);
}
-static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
+static inline size_t get_total_object_count(struct object_stats *stats)
{
+ return stats->tags + stats->commits + stats->trees + stats->blobs;
+}
+
+static void stats_table_setup(struct stats_table *table, struct repo_stats *stats)
+{
+ struct object_stats *objects = &stats->objects;
+ struct ref_stats *refs = &stats->refs;
+ size_t object_total;
size_t ref_total;
ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
@@ -240,6 +262,15 @@ static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+
+ object_total = get_total_object_count(objects);
+ stats_table_addf(table, "");
+ stats_table_addf(table, "* %s", _("Reachable objects"));
+ stats_table_count_addf(table, object_total, " * %s", _("Count"));
+ stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
+ stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
+ stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
+ stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
}
static inline size_t max_size_t(size_t a, size_t b)
@@ -324,30 +355,87 @@ static void stats_count_references(struct ref_stats *stats, struct ref_array *re
}
}
+static int count_objects(const char *path UNUSED, struct oid_array *oids,
+ enum object_type type, void *cb_data)
+{
+ struct object_stats *stats = cb_data;
+
+ switch (type) {
+ case OBJ_TAG:
+ stats->tags += oids->nr;
+ break;
+ case OBJ_COMMIT:
+ stats->commits += oids->nr;
+ break;
+ case OBJ_TREE:
+ stats->trees += oids->nr;
+ break;
+ case OBJ_BLOB:
+ stats->blobs += oids->nr;
+ break;
+ default:
+ BUG("invalid object type");
+ }
+
+ return 0;
+}
+
+static void stats_count_objects(struct object_stats *stats,
+ struct ref_array *refs, struct rev_info *revs)
+{
+ struct path_walk_info info = PATH_WALK_INFO_INIT;
+
+ info.revs = revs;
+ info.path_fn = count_objects;
+ info.path_fn_data = stats;
+
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ case FILTER_REFS_TAGS:
+ case FILTER_REFS_REMOTES:
+ case FILTER_REFS_OTHERS:
+ add_pending_oid(revs, NULL, &ref->objectname, 0);
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+ }
+
+ walk_objects_by_path(&info);
+ path_walk_info_clear(&info);
+}
+
static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
- struct repository *repo UNUSED)
+ struct repository *repo)
{
struct ref_filter filter = REF_FILTER_INIT;
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
- struct ref_stats stats = { 0 };
+ struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
+ struct rev_info revs;
struct option options[] = { 0 };
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
usage(_("too many arguments"));
+ repo_init_revisions(repo, &revs, prefix);
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
- stats_count_references(&stats, &refs);
+ stats_count_references(&stats.refs, &refs);
+ stats_count_objects(&stats.objects, &refs, &revs);
stats_table_setup(&table, &stats);
stats_table_print(&table);
stats_table_clear(&table);
+ release_revisions(&revs);
ref_array_clear(&refs);
return 0;
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 535ac511dd..315b9e1767 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -10,14 +10,21 @@ test_expect_success 'empty repository' '
(
cd repo &&
cat >expect <<-\EOF &&
- | Repository stats | Value |
- | ---------------- | ----- |
- | * References | |
- | * Count | 0 |
- | * Branches | 0 |
- | * Tags | 0 |
- | * Remotes | 0 |
- | * Others | 0 |
+ | Repository stats | Value |
+ | ------------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo stats >out 2>err &&
@@ -27,28 +34,36 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references' '
+test_expect_success 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
- git commit --allow-empty -m init &&
+ test_commit_bulk 42 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
git update-ref refs/remotes/origin/foo "$oid" &&
+ # Also creates a commit, tree, and blob.
git notes add -m foo &&
cat >expect <<-\EOF &&
- | Repository stats | Value |
- | ---------------- | ----- |
- | * References | |
- | * Count | 4 |
- | * Branches | 1 |
- | * Tags | 1 |
- | * Remotes | 1 |
- | * Others | 1 |
+ | Repository stats | Value |
+ | ------------------- | ----- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 130 |
+ | * Commits | 43 |
+ | * Trees | 43 |
+ | * Blobs | 43 |
+ | * Tags | 1 |
EOF
git repo stats >out 2>err &&
--
2.51.0.193.g4975ec3473b
* [PATCH v4 6/7] builtin/repo: add keyvalue and nul format for stats
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (4 preceding siblings ...)
2025-09-27 14:50 ` [PATCH v4 5/7] builtin/repo: add object counts in stats output Justin Tobler
@ 2025-09-27 14:50 ` Justin Tobler
2025-09-27 14:50 ` [PATCH v4 7/7] builtin/repo: add progress meter " Justin Tobler
` (2 subsequent siblings)
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 14:50 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, Justin Tobler
All repository stats are currently printed in a human-friendly table.
This format is not suitable for machine parsing. Add a --format option
that supports three output modes: `table`, `keyvalue`, and `nul`. The
`table` mode is the default and prints the same table output as before.
With the `keyvalue` mode, each line of output contains a key-value pair
for a repository stat, with the '=' character separating the key from
the value. The `nul` mode is similar to `keyvalue`, but key-value pairs
are delimited by a NUL character instead of a newline, and a newline
character, instead of '=', separates each key from its value. This
allows stat values to contain special characters without having to
cquote them. These two new modes provide output that is more
machine-friendly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
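A minimal standalone sketch of how the two machine-readable modes differ
only in their delimiters, not part of this patch. The `sketch_` names and
the buffer-based interface are assumptions made for illustration; the
patch itself prints directly with printf(). The key names are taken from
the test below.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct sketch_stat {
	const char *key;
	unsigned long value;
};

/*
 * Write each stat as <key><key_delim><value><pair_delim> into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 * The result may contain embedded NUL bytes, so callers must rely on
 * the returned length rather than on strlen().
 */
static size_t sketch_format_stats(char *buf, size_t cap,
				  const struct sketch_stat *stats, size_t nr,
				  char key_delim, char pair_delim)
{
	size_t len = 0;

	for (size_t i = 0; i < nr; i++) {
		int n = snprintf(buf + len, cap - len, "%s%c%lu",
				 stats[i].key, key_delim, stats[i].value);

		/* Need room for the formatted text plus the pair delimiter. */
		if (n < 0 || (size_t)n + 1 > cap - len)
			return 0;

		len += (size_t)n;
		/* Overwrite snprintf()'s terminating NUL with the delimiter. */
		buf[len++] = pair_delim;
	}

	return len;
}
```

Passing ('=', '\n') yields the `keyvalue` layout and ('\n', '\0') yields
the `nul` layout, which is the same substitution the test performs with
tr(1).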
Documentation/git-repo.adoc | 25 +++++++++++++++--
builtin/repo.c | 55 ++++++++++++++++++++++++++++++++++---
t/t1901-repo-stats.sh | 33 ++++++++++++++++++++++
3 files changed, 106 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 0b8d74ed3e..3fbce0b88c 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
-git repo stats
+git repo stats [--format=(table|keyvalue|nul)]
DESCRIPTION
-----------
@@ -44,7 +44,7 @@ supported:
+
`-z` is an alias for `--format=nul`.
-`stats`::
+`stats [--format=(table|keyvalue|nul)]`::
Retrieve statistics about the current repository. The following kinds
of information are reported:
+
@@ -52,7 +52,26 @@ supported:
* Reachable object counts categorized by type
+
-The table output format may change and is not intended for machine parsing.
+The output format can be chosen through the flag `--format`. Three formats are
+supported:
++
+`table`:::
+ Outputs repository stats in a human-friendly table. This format may
+ change and is not intended for machine parsing. This is the default
+ format.
+
+`keyvalue`:::
+ Each line of output contains a key-value pair for a repository stat.
+ The '=' character is used to delimit between the key and the value.
+ Values containing "unusual" characters are quoted as explained for the
+ configuration variable `core.quotePath` (see linkgit:git-config[1]).
+
+`nul`:::
+ Similar to `keyvalue`, but uses a NUL character to delimit between
+ key-value pairs instead of a newline. Also uses a newline character as
+ the delimiter between the key and value instead of '='. Unlike the
+ `keyvalue` format, values containing "unusual" characters are never
+ quoted.
INFO KEYS
---------
diff --git a/builtin/repo.c b/builtin/repo.c
index 3eefbeddba..6f41c9ada2 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -15,13 +15,14 @@
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
- "git repo stats",
+ "git repo stats [--format=(table|keyvalue|nul)]",
NULL
};
typedef int get_value_fn(struct repository *repo, struct strbuf *buf);
enum output_format {
+ FORMAT_TABLE,
FORMAT_KEYVALUE,
FORMAT_NUL_TERMINATED,
};
@@ -136,6 +137,8 @@ static int parse_format_cb(const struct option *opt,
*format = FORMAT_NUL_TERMINATED;
else if (!strcmp(arg, "keyvalue"))
*format = FORMAT_KEYVALUE;
+ else if (!strcmp(arg, "table"))
+ *format = FORMAT_TABLE;
else
die(_("invalid format '%s'"), arg);
@@ -158,6 +161,8 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
};
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (format != FORMAT_KEYVALUE && format != FORMAT_NUL_TERMINATED)
+ die(_("unsupported output format"));
return print_fields(argc, argv, repo, format);
}
@@ -331,6 +336,30 @@ static void stats_table_clear(struct stats_table *table)
string_list_clear(&table->rows, 1);
}
+static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
+ char value_delim)
+{
+ printf("references.branches.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.branches, value_delim);
+ printf("references.tags.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.tags, value_delim);
+ printf("references.remotes.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.remotes, value_delim);
+ printf("references.others.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.others, value_delim);
+
+ printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.commits, value_delim);
+ printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.trees, value_delim);
+ printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.blobs, value_delim);
+ printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.tags, value_delim);
+
+ fflush(stdout);
+}
+
static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
{
for (int i = 0; i < refs->nr; i++) {
@@ -415,10 +444,16 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
+ enum output_format format = FORMAT_TABLE;
struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
- struct option options[] = { 0 };
+ struct option options[] = {
+ OPT_CALLBACK_F(0, "format", &format, N_("format"),
+ N_("output format"),
+ PARSE_OPT_NONEG, parse_format_cb),
+ OPT_END()
+ };
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
@@ -431,8 +466,20 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
stats_count_references(&stats.refs, &refs);
stats_count_objects(&stats.objects, &refs, &revs);
- stats_table_setup(&table, &stats);
- stats_table_print(&table);
+ switch (format) {
+ case FORMAT_TABLE:
+ stats_table_setup(&table, &stats);
+ stats_table_print(&table);
+ break;
+ case FORMAT_KEYVALUE:
+ stats_keyvalue_print(&stats, '=', '\n');
+ break;
+ case FORMAT_NUL_TERMINATED:
+ stats_keyvalue_print(&stats, '\n', '\0');
+ break;
+ default:
+ BUG("invalid output format");
+ }
stats_table_clear(&table);
release_revisions(&revs);
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 315b9e1767..2409edae4f 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -73,4 +73,37 @@ test_expect_success 'repository with references and objects' '
)
'
+test_expect_success 'keyvalue and nul format' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit_bulk 42 &&
+ git tag -a foo -m bar &&
+
+ cat >expect <<-\EOF &&
+ references.branches.count=1
+ references.tags.count=1
+ references.remotes.count=0
+ references.others.count=0
+ objects.commits.count=42
+ objects.trees.count=42
+ objects.blobs.count=42
+ objects.tags.count=1
+ EOF
+
+ git repo stats --format=keyvalue >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err &&
+
+ # Replace key and value delimiters for nul format.
+ tr "\n=" "\0\n" <expect >expect_nul &&
+ git repo stats --format=nul >out 2>err &&
+
+ test_cmp expect_nul out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v4 7/7] builtin/repo: add progress meter for stats
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (5 preceding siblings ...)
2025-09-27 14:50 ` [PATCH v4 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
@ 2025-09-27 14:50 ` Justin Tobler
2025-09-27 16:33 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Junio C Hamano
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 14:50 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, Justin Tobler
When using the stats subcommand for git-repo(1), evaluating a repository
may take some time depending on its shape. Add a progress meter to
provide feedback to the user about what is happening. The progress meter
is enabled by default when the command is executed from a tty. It can
also be explicitly enabled/disabled via the --[no-]progress option.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
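A minimal standalone sketch of the tri-state default described above, not
part of this patch. The function name is hypothetical; the convention
that -1 means "not given on the command line" matches the initial value
of `show_progress` in the diff below.

```c
/*
 * requested:     -1 if neither --progress nor --no-progress was given,
 *                 0 for --no-progress, 1 for --progress.
 * stderr_is_tty: nonzero when standard error is a terminal.
 *
 * An explicit request always wins; otherwise fall back to the tty check.
 */
static int sketch_resolve_progress(int requested, int stderr_is_tty)
{
	if (requested < 0)
		return stderr_is_tty ? 1 : 0;
	return requested ? 1 : 0;
}
```

A caller would typically pass the result of isatty(2) as the second
argument, so that redirecting stderr to a file silences the meter unless
--progress is given.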
builtin/repo.c | 47 +++++++++++++++++++++++++++++++++++++------
t/t1901-repo-stats.sh | 20 ++++++++++++++++++
2 files changed, 61 insertions(+), 6 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 6f41c9ada2..c5fe9901ec 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,6 +4,7 @@
#include "environment.h"
#include "parse-options.h"
#include "path-walk.h"
+#include "progress.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
@@ -360,8 +361,15 @@ static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
fflush(stdout);
}
-static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs,
+ struct repository *repo, int show_progress)
{
+ struct progress *progress = NULL;
+
+ if (show_progress)
+ progress = start_delayed_progress(repo, _("Counting references"),
+ refs->nr);
+
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -381,13 +389,24 @@ static void stats_count_references(struct ref_stats *stats, struct ref_array *re
default:
BUG("unexpected reference type");
}
+
+ display_progress(progress, i + 1);
}
+
+ stop_progress(&progress);
}
+struct count_objects_data {
+ struct object_stats *stats;
+ struct progress *progress;
+};
+
static int count_objects(const char *path UNUSED, struct oid_array *oids,
enum object_type type, void *cb_data)
{
- struct object_stats *stats = cb_data;
+ struct count_objects_data *data = cb_data;
+ struct object_stats *stats = data->stats;
+ size_t object_count;
switch (type) {
case OBJ_TAG:
@@ -406,17 +425,24 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
BUG("invalid object type");
}
+ object_count = get_total_object_count(stats);
+ display_progress(data->progress, object_count);
+
return 0;
}
static void stats_count_objects(struct object_stats *stats,
- struct ref_array *refs, struct rev_info *revs)
+ struct ref_array *refs, struct rev_info *revs,
+ struct repository *repo, int show_progress)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
+ struct count_objects_data data = {
+ .stats = stats,
+ };
info.revs = revs;
info.path_fn = count_objects;
- info.path_fn_data = stats;
+ info.path_fn_data = &data;
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -433,8 +459,12 @@ static void stats_count_objects(struct object_stats *stats,
}
}
+ if (show_progress)
+ data.progress = start_delayed_progress(repo, _("Counting objects"), 0);
+
walk_objects_by_path(&info);
path_walk_info_clear(&info);
+ stop_progress(&data.progress);
}
static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
@@ -448,10 +478,12 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
struct repo_stats stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
+ int show_progress = -1;
struct option options[] = {
OPT_CALLBACK_F(0, "format", &format, N_("format"),
N_("output format"),
PARSE_OPT_NONEG, parse_format_cb),
+ OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
OPT_END()
};
@@ -463,8 +495,11 @@ static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
- stats_count_references(&stats.refs, &refs);
- stats_count_objects(&stats.objects, &refs, &revs);
+ if (show_progress < 0)
+ show_progress = isatty(2);
+
+ stats_count_references(&stats.refs, &refs, repo, show_progress);
+ stats_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
switch (format) {
case FORMAT_TABLE:
diff --git a/t/t1901-repo-stats.sh b/t/t1901-repo-stats.sh
index 2409edae4f..03d6db479f 100755
--- a/t/t1901-repo-stats.sh
+++ b/t/t1901-repo-stats.sh
@@ -106,4 +106,24 @@ test_expect_success 'keyvalue and nul format' '
)
'
+test_expect_success 'progress meter option' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit foo &&
+
+ GIT_PROGRESS_DELAY=0 git repo stats --progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_grep "Counting references: 100% (2/2), done." err &&
+ test_grep "Counting objects: 3, done." err &&
+
+ GIT_PROGRESS_DELAY=0 git repo stats --no-progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* Re: [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens
2025-09-27 14:50 ` [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
@ 2025-09-27 15:40 ` Junio C Hamano
2025-09-27 15:51 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2025-09-27 15:40 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, sunshine
Justin Tobler <jltobler@gmail.com> writes:
> The formatter currently suggests adding a space between a control macro
> and parentheses. In the Git project, this is not typically expected. Set
> `SpaceBeforeParens` to `ControlStatementsExceptControlMacros`
> accordingly.
>
> Helped-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
> .clang-format | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
While this may be a welcome addition, I somehow do not think it
belongs as [3/7] to this series, whose theme is about "git repo
stats".
Perhaps make it a separate topic and have it graduate sooner?
> diff --git a/.clang-format b/.clang-format
> index dcfd0aad60..86b4fe33e5 100644
> --- a/.clang-format
> +++ b/.clang-format
> @@ -149,7 +149,7 @@ SpaceBeforeCaseColon: false
> # f();
> # }
> # }
> -SpaceBeforeParens: ControlStatements
> +SpaceBeforeParens: ControlStatementsExceptControlMacros
>
> # Don't insert spaces inside empty '()'
> SpaceInEmptyParentheses: false
* Re: [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens
2025-09-27 15:40 ` Junio C Hamano
@ 2025-09-27 15:51 ` Justin Tobler
2025-09-27 23:49 ` Junio C Hamano
0 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-09-27 15:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps, karthik.188, sunshine
On 25/09/27 08:40AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> > The formatter currently suggests adding a space between a control macro
> > and parentheses. In the Git project, this is not typically expected. Set
> > `SpaceBeforeParens` to `ControlStatementsExceptControlMacros`
> > accordingly.
> >
> > Helped-by: Karthik Nayak <karthik.188@gmail.com>
> > Signed-off-by: Justin Tobler <jltobler@gmail.com>
> > ---
> > .clang-format | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> While this may be a welcome addition, I somehow do not think it
> belongs as [3/7] to this series, whose theme is about "git repo
> stats".
That's completely fair. I noticed the formatter flagged this issue while
working on this series, but I was also on the fence as to whether it
should be submitted separately.
> Perhaps make it a separate topic and have it graduate sooner?
I'll send this patch as a separate topic and can send another version of
this series without this patch.
Thanks,
-Justin
* Re: [PATCH v4 4/7] builtin/repo: introduce stats subcommand
2025-09-27 14:50 ` [PATCH v4 4/7] builtin/repo: introduce stats subcommand Justin Tobler
@ 2025-09-27 16:32 ` Junio C Hamano
2025-10-09 22:09 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2025-09-27 16:32 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, sunshine, Derrick Stolee
Justin Tobler <jltobler@gmail.com> writes:
> The shape of a repository's history can have huge impacts on the
> performance and health of the repository itself. Currently, Git lacks a
> means to surface key stats/information regarding the shape of a
> repository via a single command.
Talking about "shape of a repository's *history*" may negatively
affect your goal here. If a project is overly mergy with many
octopus merges, it would have huge impacts on the performance to run
"git bisect" over its history, so it may be interesting to know the
ratio of the merge commits in the total commits, and also the
average number of merge parents. But after you obtain such numbers,
you cannot do anything about it, as you cannot afford to rewrite its
history only to improve the "performance and health".
And that is what makes "key stats" relative to your goal. If your
goal is to give stats on the things you can control (e.g., how long
a typical delta chain is, how many loose objects there are that can
be moved to a packfile, how small your object database would
become if you prune all the unreachable objects), that would cut off
some stats that may still be interesting but may not contribute to
address "huge impacts on the performance and health".
With Devil's advocate hat on, a single command that gives a set of
stats that are "key" to a goal of a single use case may not be as
useful as a collection of commands, each of which gives stats on one
aspect of the repository, that can be combined to help you address
various different goals.
> To allow users to more readily identify potential issues for a
> repository, introduce the "stats" subcommand in git-repo(1) to output
> stats for the repository that may be of interest to users. The goal of
> this subcommand is to eventually provide similar functionality to
> git-sizer(1), but natively in Git.
So, it is needless to say that the kind of "stats" obtained by such
a single tool needs to be chosen carefully, but more importantly,
its output should give users actionable output, as whoever designed
such a tool and chose what "key stats" are has a clear idea on
various aspects of repository. "stats" measure the health of the
repository against certain yardstick, but it should come with a
clear instruction to make use of that measurement. The tool may say
"the stats indicate that you have commits that touch too many paths
at the same time". The users need to know what the consequence of
that finding is, and what they can do about it.
For example, what would the user do with the new knowledge that the
repository has 100x as many local branches as there are
remote-tracking branches? Without breaking down these numerous
local branches into those that are still used in active development
(hint: peek into their reflog), kept as historical landmarks, past
development that has already been merged (hint: "git branch --list
--merged origin/master"), or abandoned cruft that hasn't been
touched with some changes that are not merged anywhere, the users
would not know what to do.
> +`stats`::
> + Retrieve statistics about the current repository. The following kinds
> + of information are reported:
> ++
> +* Reference counts categorized by type
> +
> ++
> +The table output format may change and is not intended for machine parsing.
Do we eventually want to give another format that is intended for
machine parsing?
In a format meant for human consumption, is it still sensible to
target fixed-column terminals these days? Rather, would they want
prettier-formatted html, or csv that they can easily import to
spreadsheet? (these are not objections but genuine questions).
> +static void stats_table_vaddf(struct stats_table *table,
> + struct stats_table_entry *entry,
> + const char *format, va_list ap)
> +{
> + struct strbuf buf = STRBUF_INIT;
> + struct string_list_item *item;
> + char *formatted_name;
> + size_t name_width;
> +
> + strbuf_vaddf(&buf, format, ap);
> + formatted_name = strbuf_detach(&buf, NULL);
> + name_width = utf8_strwidth(formatted_name);
> +
> + item = string_list_append_nodup(&table->rows, formatted_name);
> + item->util = entry;
> +
> + if (name_width > table->name_col_width)
> + table->name_col_width = name_width;
> + if (entry) {
> + size_t value_width = utf8_strwidth(entry->value);
> + if (value_width > table->value_col_width)
> + table->value_col_width = value_width;
> + }
> +}
OK, accumulate while measuring, so that you can compute the max
width of these things before writing them out, which is quite
bog-standard way to collect data.
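For readers less familiar with the pattern, here is a minimal standalone
sketch of "measure first, print second". It is not the code from the
patch: the `sketch_` names are hypothetical, strlen() stands in for a
display-width helper (so it is only right for ASCII), and the dashed
separator row is left out for brevity.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct sketch_row {
	const char *name;
	const char *value;
};

static int sketch_max(int a, int b)
{
	return a > b ? a : b;
}

/*
 * Render a two-column table into buf. Returns the number of bytes
 * written (excluding the terminating NUL), or -1 if buf is too small.
 */
static int sketch_render_table(char *buf, size_t cap,
			       const char *name_title,
			       const char *value_title,
			       const struct sketch_row *rows, size_t nr)
{
	int name_w = (int)strlen(name_title);
	int value_w = (int)strlen(value_title);
	size_t len;
	int n;

	/* Pass 1: find the widest cell in each column. */
	for (size_t i = 0; i < nr; i++) {
		name_w = sketch_max(name_w, (int)strlen(rows[i].name));
		value_w = sketch_max(value_w, (int)strlen(rows[i].value));
	}

	/* Pass 2: print, padding every cell to the measured width. */
	n = snprintf(buf, cap, "| %-*s | %-*s |\n",
		     name_w, name_title, value_w, value_title);
	if (n < 0 || (size_t)n >= cap)
		return -1;
	len = (size_t)n;

	for (size_t i = 0; i < nr; i++) {
		n = snprintf(buf + len, cap - len, "| %-*s | %-*s |\n",
			     name_w, rows[i].name, value_w, rows[i].value);
		if (n < 0 || (size_t)n >= cap - len)
			return -1;
		len += (size_t)n;
	}

	return (int)len;
}
```

Because the widths are only known after every row has been seen, nothing
can be printed until the accumulation pass is complete.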
> +static inline size_t max_size_t(size_t a, size_t b)
> +{
> + return (a > b) ? a : b;
> +}
Heh.
> +static void stats_table_print(const struct stats_table *table)
> +{
> + const char *name_col_title = _("Repository stats");
> + const char *value_col_title = _("Value");
> + size_t name_title_len = utf8_strwidth(name_col_title);
> + size_t value_title_len = utf8_strwidth(value_col_title);
> + struct string_list_item *item;
> + int name_col_width;
> + int value_col_width;
> +
> + name_col_width = cast_size_t_to_int(
> + max_size_t(table->name_col_width, name_title_len));
> + value_col_width = cast_size_t_to_int(
> + max_size_t(table->value_col_width, value_title_len));
If table->name_col_width and table->value_col_width were int to
begin with, none of these casts would have been necessary. Aren't
we overusing size_t to count things that are not memory allocations?
> + printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
> + value_col_width, value_col_title);
> + printf("| ");
> + for (int i = 0; i < name_col_width; i++)
> + putchar('-');
> + printf(" | ");
> + for (int i = 0; i < value_col_width; i++)
> + putchar('-');
> + printf(" |\n");
I wonder if people want to use unicode "Box Drawing" block and other
fancier things, as we assume utf8 for names and values, in which
case these printf would need to be "translatable", but locale
administrators should not have more say than others what kind of
line drawing elements are to be used, so perhaps the above is good
enough at least for now.
* Re: [PATCH v4 0/7] builtin/repo: introduce stats subcommand
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (6 preceding siblings ...)
2025-09-27 14:50 ` [PATCH v4 7/7] builtin/repo: add progress meter " Justin Tobler
@ 2025-09-27 16:33 ` Junio C Hamano
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
8 siblings, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2025-09-27 16:33 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, sunshine
Justin Tobler <jltobler@gmail.com> writes:
> Changes since V3:
>
> - Changed from using strlen() to utf8_strlen() to take into
> consideration that translatable strings may have characters that are
> more than one byte.
If you are truncating a string, chopping the tail end of it, you do
have to take into account the fact that a single character can be
more than one byte long and avoid chopping in the middle of one
character.
But your use of <utf8.h> to measure the display columns for a
formatted name in stats table needs a different care. You are using
utf8_strwidth() to account for the fact that the number of display
columns consumed to show a non-ASCII string may be different from
the number of bytes in the string.
No need to resend, for the obvious reason that the cover letter
material would not make it into the etched-in-stone "git log" history.
I just thought I should mention the above to unconfuse potential
readers.
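The two concerns described above (not splitting a multi-byte character
when truncating, and measuring display columns rather than bytes) can be
sketched as follows. This is only an illustration: utf8_codepoints() and
utf8_truncate_len() are hypothetical helpers, not functions from Git's
<utf8.h>, and counting code points is a rough stand-in for display width
that ignores wide and zero-width characters, which utf8_strwidth()
accounts for.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count UTF-8 code points by skipping continuation
 * bytes (those of the form 10xxxxxx). This only approximates display
 * width; wide and zero-width characters are not handled.
 */
static size_t utf8_codepoints(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}

/*
 * Hypothetical helper: return the largest length <= max_bytes at which
 * the string can be cut without splitting a multi-byte character.
 * Assumes s is valid UTF-8.
 */
static size_t utf8_truncate_len(const char *s, size_t max_bytes)
{
	size_t len = strlen(s);

	if (len <= max_bytes)
		return len;
	len = max_bytes;
	/* Back up while the cut point lands on a continuation byte. */
	while (len > 0 && ((unsigned char)s[len] & 0xC0) == 0x80)
		len--;
	return len;
}
```

For a string such as "caf\xc3\xa9" (the last character takes two
bytes), strlen() reports five bytes while the sketch counts four
characters, and a truncation limit of four bytes is pulled back to
three so that the final character is not cut in half.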
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens
2025-09-27 15:51 ` Justin Tobler
@ 2025-09-27 23:49 ` Junio C Hamano
0 siblings, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2025-09-27 23:49 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, sunshine
Justin Tobler <jltobler@gmail.com> writes:
>> Perhaps make it a separate topic and have it graduate sooner?
>
> I'll send this patch as a separate topic and can send another version of
> this series without this patch.
Nah, let's save one iteration by having me queue them on two
separate topics. If you need update to the "repo" topic, you can
just send it as a series without this step.
Thanks.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v4 4/7] builtin/repo: introduce stats subcommand
2025-09-27 16:32 ` Junio C Hamano
@ 2025-10-09 22:09 ` Justin Tobler
2025-10-10 0:42 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-10-09 22:09 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps, karthik.188, sunshine, Derrick Stolee
On 25/09/27 09:32AM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> > The shape of a repository's history can have huge impacts on the
> > performance and health of the repository itself. Currently, Git lacks a
> > means to surface key stats/information regarding the shape of a
> > repository via a single command.
>
> Talking about "shape of a repository's *history*" may negatively
> affect your goal here. If a project is overly mergy with many
> octopus merges, it would have huge impacts on the performance to run
> "git bisect" over its history, so it may be interesting to know the
> ratio of the merge commits in the total commits, and also the
> average number of merge parents. But after you obtain such numbers,
> you cannot do anything about it, as you cannot afford to rewrite its
> history only to improve the "performance and health".
The above example is actually something I would like to add in a future
series. From my perspective, if a certain repository operation is
performing poorly, it is still valuable to have insight into the reason
why, regardless of whether it is realistically actionable or not for the
user.
I will try to better clarify the command's intent in the log message.
> And that is what makes "key stats" relative to your goal. If your
> goal is to give stats on the things you can control (e.g., how long
> a typical delta chain is, how many loose objects there are that can
> be moved to a packfile, how small your object database would
> become if you prune all the unreachable objects), that would cut off
> some stats that may still be interesting but may not contribute to
> address "huge impacts on the performance and health".
I would say the main goal of this command is to surface interesting
information about the repository and its object graph structure.
Something that may make a stat "interesting" is if its value could be a
potential indicator of poor repository performance. Like the max number of
parents a commit has, or the max number of entries a tree has. I don't
think the actual stat value needs to be concerning itself for it to be
displayed though.
If there is actionable recourse a user can take to remediate a
concerning stat, that would be ideal, but I see the primary goal being
to just surface the information regardless.
> With Devil's advocate hat on, a single command that gives a set of
> stats that are "key" to a goal of a single use case may not be as
> useful as a collection of commands, each of which gives stats on one
> aspect of the repository, that can be combined to help you address
> various different goals.
Good points. From my perspective, the benefit of having a single command
here is to provide a simple means to generate a report of the general
repository shape. In this context, "key stats" reflect certain
characteristics about the repository that may be concerning performance
wise for typical repository operations or just of interest in general.
One of the motivations here is to enable a user to easily generate such
a report and be able to share it with others that may not have access to
the underlying repository.
I think this still could leave room for more fine-grained commands that
can surface more targeted information about a repository with other
goals in mind in the future though.
> > To allow users to more readily identify potential issues for a
> > repository, introduce the "stats" subcommand in git-repo(1) to output
> > stats for the repository that may be of interest to users. The goal of
> > this subcommand is to eventually provide similar functionality to
> > git-sizer(1), but natively in Git.
>
> So, it is needless to say that the kind of "stats" obtained by such
> a single tool needs to be chosen carefully, but more importantly,
> its output should give users actionable output, as whoever designed
> such a tool and chose what "key stats" are has a clear idea on
> various aspects of repository. "stats" measure the health of the
> repository against certain yardstick, but it should come with a
> clear instruction to make use of that measurement. The tool may say
> "the stats indicate that you have commits that touch too many paths
> at the same time". The users need to know what the consequence of
> that finding is, and what they can do about it.
As mentioned above, from my perspective, the git-repo-stats command
itself is not about specifically targeting and diagnosing actionable
issues that a repository has. Its primary focus should be to provide
insight about the repository structure that may be helpful when trying
to understand certain repository performance characteristics in general.
In the next version I'll rework this log message to better clarify the
intent of this command.
> For example, what would the user do with the new knowledge that the
> repository has 100x as many local branches as there are
> remote-tracking branches? Without breaking down these numerous
> local branches into those that are still used in active development
> (hint: peek into their reflog), kept as historical landmarks, past
> development that has already been merged (hint: "git branch --list
> --merged origin/master"), or abandoned cruft that hasn't been
> touched with some changes that are not merged anywhere, the users
> would not know what to do.
In a future series, I would like to introduce a "level of concern" meter
for the outputted stats. As you mentioned earlier, this could provide a
measure of health for a repository stat against a certain yardstick. At
that point in time, I think it would also make sense to provide
documentation on actions that a user could potentially take to address
specific stats that are marked with a higher level of concern. Certain
stats that get identified as concerning may not be realistically
actionable though.
For now, I think it's fine to omit this though because the outputted
stats are presented agnostically without any concern level.
> > +`stats`::
> > + Retrieve statistics about the current repository. The following kinds
> > + of information are reported:
> > ++
> > +* Reference counts categorized by type
> > +
> > ++
> > +The table output format may change and is not intended for machine parsing.
>
> Do we eventually want to give another format that is intended for
> machine parsing?
Yes, and we introduce a key-value and NUL format later in this series. I
will mention this in the log message.
> In a format meant for human consumption, is it still sensible to
> target fixed-column terminals these days? Rather, would they want
> prettier-formatted html, or csv that they can easily import to
> spreadsheet? (these are not objections but genuine questions).
From my perspective, having an output format that can be immediately
viewed in the same place the command is run is still quite valuable and
is still common in similar tooling.
I think there may also be value in additional formats, such as the ones
mentioned above, but I think those should be implemented as a separate
series if demand presents itself.
> > +static void stats_table_print(const struct stats_table *table)
> > +{
> > + const char *name_col_title = _("Repository stats");
> > + const char *value_col_title = _("Value");
> > + size_t name_title_len = utf8_strwidth(name_col_title);
> > + size_t value_title_len = utf8_strwidth(value_col_title);
> > + struct string_list_item *item;
> > + int name_col_width;
> > + int value_col_width;
> > +
> > + name_col_width = cast_size_t_to_int(
> > + max_size_t(table->name_col_width, name_title_len));
> > + value_col_width = cast_size_t_to_int(
> > + max_size_t(table->value_col_width, value_title_len));
>
> If table->name_col_width and table->value_col_width were int to
> begin with, none of these casts would have been necessary. Aren't
> we overusing size_t to count things that are not memory allocations?
Yes. Storing the column widths as size_t in this scenario doesn't make
much sense because they need to be an int for the format string anyways.
Furthermore, the number of columns will always be a relatively small
number.
I'll fix this in the next version. :)
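The int-based width handling agreed on above could look roughly like
the following. This is a minimal sketch under stated assumptions:
fit_width() and format_header() are hypothetical helpers that are not
part of the patch, and the example sticks to ASCII titles because
printf-style field widths pad by bytes rather than display columns.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: a column must be at least as wide as its title,
 * so take the larger of the widest row and the title width. Keeping
 * both values as int avoids any cast before passing them to "%-*s".
 */
static int fit_width(int widest_row, int title_width)
{
	return widest_row > title_width ? widest_row : title_width;
}

/*
 * Hypothetical helper: format the two-column table header into buf.
 * Returns the value of snprintf(), i.e. the number of bytes that the
 * full header needs (excluding the terminating NUL).
 */
static int format_header(char *buf, size_t size,
			 const char *name_title, int name_width,
			 const char *value_title, int value_width)
{
	return snprintf(buf, size, "| %-*s | %-*s |\n",
			name_width, name_title,
			value_width, value_title);
}
```

With a widest row of 12 columns and the 20-column title "Repository
structure", fit_width() yields 20, and format_header() then produces a
header line in the same shape as the one in the expected test output.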
> > + printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
> > + value_col_width, value_col_title);
> > + printf("| ");
> > + for (int i = 0; i < name_col_width; i++)
> > + putchar('-');
> > + printf(" | ");
> > + for (int i = 0; i < value_col_width; i++)
> > + putchar('-');
> > + printf(" |\n");
>
> I wonder if people want to use unicode "Box Drawing" block and other
> fancier things, as we assume utf8 for names and values, in which
> case these printf would need to be "translatable", but locale
> administrators should not have more say than others what kind of
> line drawing elements are to be used, so perhaps the above is good
> enough at least for now.
I think the current table is probably sufficient for now. I do foresee
iteration on this format in future series though.
-Justin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v4 4/7] builtin/repo: introduce stats subcommand
2025-10-09 22:09 ` Justin Tobler
@ 2025-10-10 0:42 ` Justin Tobler
2025-10-10 6:53 ` Patrick Steinhardt
0 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-10-10 0:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps, karthik.188, sunshine, Derrick Stolee
On 25/10/09 05:09PM, Justin Tobler wrote:
> On 25/09/27 09:32AM, Junio C Hamano wrote:
> > With Devil's advocate hat on, a single command that gives a set of
> > stats that are "key" to a goal of a single use case may not be as
> > useful as a collection of commands, each of which gives stats on one
> > aspect of the repository, that can be combined to help you address
> > various different goals.
>
> Good points. From my perspective, the benefit of having a single command
> here is to provide a simple means to generate a report of the general
> repository shape. In this context, "key stats" reflect certain
> characteristics about the repository that may be concerning performance
> wise for typical repository operations or just of interest in general.
>
> One of the motivations here is to enable a user to easily generate such
> a report and be able to share it with others that may not have access to
> the underlying repository.
>
> I think this still could leave room for more fine-grained commands that
> can surface more targeted information about a repository with other
> goals in mind in the future though.
Thinking about this some more, a single "stats" command is indeed rather
vague. Furthermore, as Junio mentioned, there could be other aspects of
a repository that we want to display stats for in the future.
Since the goal of this command is to surface info about a repository's
structure, maybe we should instead call this command `git repo structure`?
Or something else along those lines that is more specific and related to
the goal of the command?
-Justin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v4 4/7] builtin/repo: introduce stats subcommand
2025-10-10 0:42 ` Justin Tobler
@ 2025-10-10 6:53 ` Patrick Steinhardt
2025-10-10 14:34 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-10-10 6:53 UTC (permalink / raw)
To: Justin Tobler; +Cc: Junio C Hamano, git, karthik.188, sunshine, Derrick Stolee
On Thu, Oct 09, 2025 at 07:42:40PM -0500, Justin Tobler wrote:
> On 25/10/09 05:09PM, Justin Tobler wrote:
> > On 25/09/27 09:32AM, Junio C Hamano wrote:
> > > With Devil's advocate hat on, a single command that gives a set of
> > > stats that are "key" to a goal of a single use case may not be as
> > > useful as a collection of commands, each of which gives stats on one
> > > aspect of the repository, that can be combined to help you address
> > > various different goals.
> >
> > Good points. From my perspective, the benefit of having a single command
> > here is to provide a simple means to generate a report of the general
> > repository shape. In this context, "key stats" reflect certain
> > characteristics about the repository that may be concerning performance
> > wise for typical repository operations or just of interest in general.
> >
> > One of the motivations here is to enable a user to easily generate such
> > a report and be able to share it with others that may not have access to
> > the underlying repository.
> >
> > I think this still could leave room for more fine-grained commands that
> > can surface more targeted information about a repository with other
> > goals in mind in the future though.
>
> Thinking about this some more, a single "stats" command is indeed rather
> vague. Furthermore, as Junio mentioned, there could be other aspects of
> a repository that we want to display stats for in the future.
>
> Since the goal of this command is to surface info about a repository's
> structure, maybe we should instead call this command `git repo structure`?
> Or something else along those lines that is more specific and related to
> the goal of the command?
Some alternatives that come to my mind:
- inspect
- analyze
- scan
- survey
- measure
I don't have any specific preference. What I like though is that those
are verbs, which makes it a bit more natural to use them.
Feel free to take any or ignore all of these.
Patrick
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v4 4/7] builtin/repo: introduce stats subcommand
2025-10-10 6:53 ` Patrick Steinhardt
@ 2025-10-10 14:34 ` Justin Tobler
2025-10-13 6:13 ` Patrick Steinhardt
0 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-10-10 14:34 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Junio C Hamano, git, karthik.188, sunshine, Derrick Stolee
On 25/10/10 08:53AM, Patrick Steinhardt wrote:
> On Thu, Oct 09, 2025 at 07:42:40PM -0500, Justin Tobler wrote:
> > Thinking about this some more, a single "stats" command is indeed rather
> > vague. Furthermore, as Junio mentioned, there could be other aspects of
> > a repository that we want to display stats for in the future.
> >
> > Since the goal of this command is to surface info about a repository's
> > structure, maybe we should instead call this command `git repo structure`?
> > Or something else along those lines that is more specific and related to
> > the goal of the command?
>
> Some alternatives that come to my mind:
>
> - inspect
> - analyze
> - scan
> - survey
> - measure
>
> I don't have any specific preference. What I like though is that those
> are verbs, which makes it a bit more natural to use them.
If we want the theme of this command to be the repository's
structure/shape and have a name that matches this scope, I'm not sure
any of the above examples would move us closer to that. Aligning the
command name to its scope is beneficial if we foresee the potential to
introduce additional subcommands for git-repo(1) that target other
aspects of a repository.
From my perspective, the main question is: should git-repo-stats be a
generic command that can eventually provide all sorts of different stats?
Or should it stick to repository structure/shape information? I think
I'm currently leaning towards the latter.
-Justin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v4 4/7] builtin/repo: introduce stats subcommand
2025-10-10 14:34 ` Justin Tobler
@ 2025-10-13 6:13 ` Patrick Steinhardt
0 siblings, 0 replies; 92+ messages in thread
From: Patrick Steinhardt @ 2025-10-13 6:13 UTC (permalink / raw)
To: Justin Tobler; +Cc: Junio C Hamano, git, karthik.188, sunshine, Derrick Stolee
On Fri, Oct 10, 2025 at 09:34:54AM -0500, Justin Tobler wrote:
> On 25/10/10 08:53AM, Patrick Steinhardt wrote:
> > On Thu, Oct 09, 2025 at 07:42:40PM -0500, Justin Tobler wrote:
> > > Thinking about this some more, a single "stats" command is indeed rather
> > > vague. Furthermore, as Junio mentioned, there could be other aspects of
> > > a repository that we want to display stats for in the future.
> > >
> > > Since the goal of this command is to surface info about a repository's
> > > structure, maybe we should instead call this command `git repo structure`?
> > > Or something else along those lines that is more specific and related to
> > > the goal of the command?
> >
> > Some alternatives that come to my mind:
> >
> > - inspect
> > - analyze
> > - scan
> > - survey
> > - measure
> >
> > I don't have any specific preference. What I like though is that those
> > are verbs, which makes it a bit more natural to use them.
>
> If we want the theme of this command to be the repository's
> structure/shape and have a name that matches this scope, I'm not sure
> any of the above examples would move us closer to that. Aligning the
> command name to its scope is beneficial if we foresee the potential to
> introduce additional subcommands for git-repo(1) that target other
> aspects of a repository.
Fair.
> From my perspective, the main question is: should git-repo-stats be a
> generic command that can eventually provide all sorts of different stats?
> Or should it stick to repository structure/shape information? I think
> I'm currently leaning towards the latter.
Yeah, I'm leaning towards the latter, as well.
Patrick
^ permalink raw reply [flat|nested] 92+ messages in thread
* [PATCH v5 0/6] builtin/repo: introduce structure subcommand
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
` (7 preceding siblings ...)
2025-09-27 16:33 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Junio C Hamano
@ 2025-10-15 21:12 ` Justin Tobler
2025-10-15 21:12 ` [PATCH v5 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
` (6 more replies)
8 siblings, 7 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-15 21:12 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
Greetings,
The structure of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface repository metrics regarding its structure/shape via a
single command. Acquiring this information requires users to be familiar
with the relevant data points and the various Git commands required to
surface them. To fill this gap, supplemental tools such as git-sizer(1)
have been developed.
To allow users to more readily identify repository structure related
information, introduce the "structure" subcommand in git-repo(1). The
goal of this subcommand is to eventually provide similar functionality
to git-sizer(1), but natively in Git.
In this initial version, the "structure" subcommand only surfaces counts
of the various reference and object types in a repository. In a
follow-up series, I would like to introduce additional data points that
are present in git-sizer(1) such as largest objects, combined object
sizes by type, and other general repository shape information.
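To illustrate what counting references by type means here, the
following is a rough sketch and not the series' implementation (the
patch goes through the ref-filter machinery instead). It assumes a
hypothetical struct ref_counts and classifies refs purely by name
prefix, with anything outside refs/heads/, refs/tags/ and refs/remotes/
counted as "others".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counters mirroring the four rows shown in the table. */
struct ref_counts {
	size_t branches;
	size_t tags;
	size_t remotes;
	size_t others;
};

/* Return nonzero if s begins with prefix. */
static int has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: bump the counter matching the ref's namespace.
 * Classification by name prefix is an assumption made for this sketch.
 */
static void count_ref(struct ref_counts *counts, const char *refname)
{
	if (has_prefix(refname, "refs/heads/"))
		counts->branches++;
	else if (has_prefix(refname, "refs/tags/"))
		counts->tags++;
	else if (has_prefix(refname, "refs/remotes/"))
		counts->remotes++;
	else
		counts->others++;
}
```

Feeding it one branch, one tag, one remote-tracking ref and a notes ref
such as refs/notes/commits gives one count in each category, which
matches the shape of the second test case in the series.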
Some other general features that would be nice to introduce eventually:
- A "level of concern" meter for reported stats. This could indicate to
users which stats may be worth looking into further.
- Links to OIDs of interesting objects that correspond to certain stats.
- Options to limit which references to use when evaluating the
repository.
Changes since V4:
- The subcommand was renamed from "stats" to "structure". This was done
to define a narrower scope for the types of stats that would be
output. This also allows for other types of stat-related subcommands
to be implemented in the future that may cover different aspects of
the repository.
- Table column widths are now stored as ints to avoid the unneeded back
and forth conversions.
- Dropped the clang-format patch as it has been upstreamed separately.
- Updated commit messages accordingly.
Changes since V3:
- Changed from using strlen() to utf8_strwidth() so that column widths
reflect the number of display columns a translatable string occupies,
which may differ from its length in bytes.
Changes since V2:
- Added clang-format patch to address a false positive triggered in this
series.
- Use varargs for stats_table_add() family of functions.
- Print to stdout directly instead of using strbuf.
- Add parse_option() earlier in the series.
- Use start_delayed_progress() instead of start_progress().
- Add test to validate --[no-]progress options.
- Some other small fixes.
Changes since V1:
- Translatable terms displayed in the table have formatting separated
out.
- Squashed the `keyvalue` and `nul` output format patches into one.
- Added a progress meter to provide users with more feedback.
- Updated docs to outline reported data in a bulleted list.
- Combined similar tests together to reduce repetitive setup.
- Added patch to improve ref-filter interface so we don't have to create
a dummy patterns array.
- Many other renames and cleanups to improve patch clarity.
Thanks,
-Justin
Justin Tobler (6):
builtin/repo: rename repo_info() to cmd_repo_info()
ref-filter: allow NULL filter pattern
builtin/repo: introduce structure subcommand
builtin/repo: add object counts in structure output
builtin/repo: add keyvalue and nul format for structure stats
builtin/repo: add progress meter for structure stats
Documentation/git-repo.adoc | 30 +++
builtin/repo.c | 370 +++++++++++++++++++++++++++++++++++-
ref-filter.c | 4 +-
t/meson.build | 1 +
t/t1901-repo-structure.sh | 129 +++++++++++++
5 files changed, 529 insertions(+), 5 deletions(-)
create mode 100755 t/t1901-repo-structure.sh
Range-diff against v4:
1: ed04168562 = 1: ed04168562 builtin/repo: rename repo_info() to cmd_repo_info()
2: 6aa76d1323 = 2: 6aa76d1323 ref-filter: allow NULL filter pattern
3: 02a3fcc5fb < -: ---------- clang-format: exclude control macros from SpaceBeforeParens
4: 8ec9914886 ! 3: eda1afbe3d builtin/repo: introduce stats subcommand
@@ Metadata
Author: Justin Tobler <jltobler@gmail.com>
## Commit message ##
- builtin/repo: introduce stats subcommand
+ builtin/repo: introduce structure subcommand
- The shape of a repository's history can have huge impacts on the
+ The structure of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
- means to surface key stats/information regarding the shape of a
- repository via a single command. Acquiring this information requires
- users to be fairly knowledgeable about the structure of a Git repository
- and how to identify the relevant data points. To fill this gap,
- supplemental tools such as git-sizer(1) have been developed.
+ means to surface repository metrics regarding its structure/shape via a
+ single command. Acquiring this information requires users to be familiar
+ with the relevant data points and the various Git commands required to
+ surface them. To fill this gap, supplemental tools such as git-sizer(1)
+ have been developed.
- To allow users to more readily identify potential issues for a
- repository, introduce the "stats" subcommand in git-repo(1) to output
- stats for the repository that may be of interest to users. The goal of
- this subcommand is to eventually provide similar functionality to
- git-sizer(1), but natively in Git.
+ To allow users to more readily identify repository structure related
+ information, introduce the "structure" subcommand in git-repo(1). The
+ goal of this subcommand is to eventually provide similar functionality
+ to git-sizer(1), but natively in Git.
The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remote refs,
@@ Commit message
to satisfy the requirements of the widest row contained.
Subsequent commits will surface additional relevant data points to
- output.
+ output and also provide other more machine-friendly output formats.
Based-on-patch-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
@@ Documentation/git-repo.adoc: SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
-+git repo stats
++git repo structure
DESCRIPTION
-----------
@@ Documentation/git-repo.adoc: supported:
+
`-z` is an alias for `--format=nul`.
-+`stats`::
-+ Retrieve statistics about the current repository. The following kinds
-+ of information are reported:
++`structure`::
++ Retrieve statistics about the current repository structure. The
++ following kinds of information are reported:
++
+* Reference counts categorized by type
+
@@ builtin/repo.c
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
-+ "git repo stats",
++ "git repo structure",
NULL
};
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+struct stats_table {
+ struct string_list rows;
+
-+ size_t name_col_width;
-+ size_t value_col_width;
++ int name_col_width;
++ int value_col_width;
+};
+
+/*
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ char *formatted_name;
-+ size_t name_width;
++ int name_width;
+
+ strbuf_vaddf(&buf, format, ap);
+ formatted_name = strbuf_detach(&buf, NULL);
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ if (name_width > table->name_col_width)
+ table->name_col_width = name_width;
+ if (entry) {
-+ size_t value_width = utf8_strwidth(entry->value);
++ int value_width = utf8_strwidth(entry->value);
+ if (value_width > table->value_col_width)
+ table->value_col_width = value_width;
+ }
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ va_end(ap);
+}
+
-+static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
++static void stats_table_setup_structure(struct stats_table *table,
++ struct ref_stats *refs)
+{
+ size_t ref_total;
+
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+}
+
-+static inline size_t max_size_t(size_t a, size_t b)
++static void stats_table_print_structure(const struct stats_table *table)
+{
-+ return (a > b) ? a : b;
-+}
-+
-+static void stats_table_print(const struct stats_table *table)
-+{
-+ const char *name_col_title = _("Repository stats");
++ const char *name_col_title = _("Repository structure");
+ const char *value_col_title = _("Value");
-+ size_t name_title_len = utf8_strwidth(name_col_title);
-+ size_t value_title_len = utf8_strwidth(value_col_title);
++ int name_col_width = utf8_strwidth(name_col_title);
++ int value_col_width = utf8_strwidth(value_col_title);
+ struct string_list_item *item;
-+ int name_col_width;
-+ int value_col_width;
+
-+ name_col_width = cast_size_t_to_int(
-+ max_size_t(table->name_col_width, name_title_len));
-+ value_col_width = cast_size_t_to_int(
-+ max_size_t(table->value_col_width, value_title_len));
++ if (table->name_col_width > name_col_width)
++ name_col_width = table->name_col_width;
++ if (table->value_col_width > value_col_width)
++ value_col_width = table->value_col_width;
+
+ printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
+ value_col_width, value_col_title);
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ string_list_clear(&table->rows, 1);
+}
+
-+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
++static void structure_count_references(struct ref_stats *stats,
++ struct ref_array *refs)
+{
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ }
+}
+
-+static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
-+ struct repository *repo UNUSED)
++static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
++ struct repository *repo UNUSED)
+{
+ struct ref_filter filter = REF_FILTER_INIT;
+ struct stats_table table = {
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
+ die(_("unable to filter refs"));
+
-+ stats_count_references(&stats, &refs);
++ structure_count_references(&stats, &refs);
+
-+ stats_table_setup(&table, &stats);
-+ stats_table_print(&table);
++ stats_table_setup_structure(&table, &stats);
++ stats_table_print_structure(&table);
+
+ stats_table_clear(&table);
+ ref_array_clear(&refs);
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
-+ OPT_SUBCOMMAND("stats", &fn, cmd_repo_stats),
++ OPT_SUBCOMMAND("structure", &fn, cmd_repo_structure),
OPT_END()
};
@@ t/meson.build: integration_tests = [
't1701-racy-split-index.sh',
't1800-hook.sh',
't1900-repo.sh',
-+ 't1901-repo-stats.sh',
++ 't1901-repo-structure.sh',
't2000-conflict-when-checking-files-out.sh',
't2002-checkout-cache-u.sh',
't2003-checkout-cache-mkdir.sh',
- ## t/t1901-repo-stats.sh (new) ##
+ ## t/t1901-repo-structure.sh (new) ##
@@
+#!/bin/sh
+
-+test_description='test git repo stats'
++test_description='test git repo structure'
+
+. ./test-lib.sh
+
@@ t/t1901-repo-stats.sh (new)
+ (
+ cd repo &&
+ cat >expect <<-\EOF &&
-+ | Repository stats | Value |
-+ | ---------------- | ----- |
-+ | * References | |
-+ | * Count | 0 |
-+ | * Branches | 0 |
-+ | * Tags | 0 |
-+ | * Remotes | 0 |
-+ | * Others | 0 |
++ | Repository structure | Value |
++ | -------------------- | ----- |
++ | * References | |
++ | * Count | 0 |
++ | * Branches | 0 |
++ | * Tags | 0 |
++ | * Remotes | 0 |
++ | * Others | 0 |
+ EOF
+
-+ git repo stats >out 2>err &&
++ git repo structure >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
@@ t/t1901-repo-stats.sh (new)
+ git notes add -m foo &&
+
+ cat >expect <<-\EOF &&
-+ | Repository stats | Value |
-+ | ---------------- | ----- |
-+ | * References | |
-+ | * Count | 4 |
-+ | * Branches | 1 |
-+ | * Tags | 1 |
-+ | * Remotes | 1 |
-+ | * Others | 1 |
++ | Repository structure | Value |
++ | -------------------- | ----- |
++ | * References | |
++ | * Count | 4 |
++ | * Branches | 1 |
++ | * Tags | 1 |
++ | * Remotes | 1 |
++ | * Others | 1 |
+ EOF
+
-+ git repo stats >out 2>err &&
++ git repo structure >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
5: 584d35f2c7 ! 4: 503af885d3 builtin/repo: add object counts in stats output
@@ Metadata
Author: Justin Tobler <jltobler@gmail.com>
## Commit message ##
- builtin/repo: add object counts in stats output
+ builtin/repo: add object counts in structure output
The amount of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
@@ Commit message
## Documentation/git-repo.adoc ##
@@ Documentation/git-repo.adoc: supported:
- of information are reported:
+ following kinds of information are reported:
+
* Reference counts categorized by type
+* Reachable object counts categorized by type
@@ builtin/repo.c: struct ref_stats {
+ size_t blobs;
+};
+
-+struct repo_stats {
++struct repo_structure {
+ struct ref_stats refs;
+ struct object_stats objects;
+};
@@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, si
va_end(ap);
}
--static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
+static inline size_t get_total_object_count(struct object_stats *stats)
- {
++{
+ return stats->tags + stats->commits + stats->trees + stats->blobs;
+}
+
-+static void stats_table_setup(struct stats_table *table, struct repo_stats *stats)
-+{
+ static void stats_table_setup_structure(struct stats_table *table,
+- struct ref_stats *refs)
++ struct repo_structure *stats)
+ {
+ struct object_stats *objects = &stats->objects;
+ struct ref_stats *refs = &stats->refs;
+ size_t object_total;
size_t ref_total;
ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
-@@ builtin/repo.c: static void stats_table_setup(struct stats_table *table, struct ref_stats *refs)
+@@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
@@ builtin/repo.c: static void stats_table_setup(struct stats_table *table, struct
+ stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
}
- static inline size_t max_size_t(size_t a, size_t b)
-@@ builtin/repo.c: static void stats_count_references(struct ref_stats *stats, struct ref_array *re
+ static void stats_table_print_structure(const struct stats_table *table)
+@@ builtin/repo.c: static void structure_count_references(struct ref_stats *stats,
}
}
@@ builtin/repo.c: static void stats_count_references(struct ref_stats *stats, stru
+ return 0;
+}
+
-+static void stats_count_objects(struct object_stats *stats,
-+ struct ref_array *refs, struct rev_info *revs)
++static void structure_count_objects(struct object_stats *stats,
++ struct ref_array *refs,
++ struct rev_info *revs)
+{
+ struct path_walk_info info = PATH_WALK_INFO_INIT;
+
@@ builtin/repo.c: static void stats_count_references(struct ref_stats *stats, stru
+ path_walk_info_clear(&info);
+}
+
- static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
-- struct repository *repo UNUSED)
-+ struct repository *repo)
+ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
+- struct repository *repo UNUSED)
++ struct repository *repo)
{
struct ref_filter filter = REF_FILTER_INIT;
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
- struct ref_stats stats = { 0 };
-+ struct repo_stats stats = { 0 };
++ struct repo_structure stats = { 0 };
struct ref_array refs = { 0 };
+ struct rev_info revs;
struct option options[] = { 0 };
@@ builtin/repo.c: static void stats_count_references(struct ref_stats *stats, stru
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
-- stats_count_references(&stats, &refs);
-+ stats_count_references(&stats.refs, &refs);
-+ stats_count_objects(&stats.objects, &refs, &revs);
+- structure_count_references(&stats, &refs);
++ structure_count_references(&stats.refs, &refs);
++ structure_count_objects(&stats.objects, &refs, &revs);
- stats_table_setup(&table, &stats);
- stats_table_print(&table);
+ stats_table_setup_structure(&table, &stats);
+ stats_table_print_structure(&table);
stats_table_clear(&table);
+ release_revisions(&revs);
@@ builtin/repo.c: static void stats_count_references(struct ref_stats *stats, stru
return 0;
- ## t/t1901-repo-stats.sh ##
-@@ t/t1901-repo-stats.sh: test_expect_success 'empty repository' '
- (
- cd repo &&
- cat >expect <<-\EOF &&
-- | Repository stats | Value |
-- | ---------------- | ----- |
-- | * References | |
-- | * Count | 0 |
-- | * Branches | 0 |
-- | * Tags | 0 |
-- | * Remotes | 0 |
-- | * Others | 0 |
-+ | Repository stats | Value |
-+ | ------------------- | ----- |
-+ | * References | |
-+ | * Count | 0 |
-+ | * Branches | 0 |
-+ | * Tags | 0 |
-+ | * Remotes | 0 |
-+ | * Others | 0 |
-+ | | |
-+ | * Reachable objects | |
-+ | * Count | 0 |
-+ | * Commits | 0 |
-+ | * Trees | 0 |
-+ | * Blobs | 0 |
-+ | * Tags | 0 |
+ ## t/t1901-repo-structure.sh ##
+@@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
++ | | |
++ | * Reachable objects | |
++ | * Count | 0 |
++ | * Commits | 0 |
++ | * Trees | 0 |
++ | * Blobs | 0 |
++ | * Tags | 0 |
EOF
- git repo stats >out 2>err &&
-@@ t/t1901-repo-stats.sh: test_expect_success 'empty repository' '
+ git repo structure >out 2>err &&
+@@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
)
'
@@ t/t1901-repo-stats.sh: test_expect_success 'empty repository' '
git notes add -m foo &&
cat >expect <<-\EOF &&
-- | Repository stats | Value |
-- | ---------------- | ----- |
-- | * References | |
-- | * Count | 4 |
-- | * Branches | 1 |
-- | * Tags | 1 |
-- | * Remotes | 1 |
-- | * Others | 1 |
-+ | Repository stats | Value |
-+ | ------------------- | ----- |
-+ | * References | |
-+ | * Count | 4 |
-+ | * Branches | 1 |
-+ | * Tags | 1 |
-+ | * Remotes | 1 |
-+ | * Others | 1 |
-+ | | |
-+ | * Reachable objects | |
-+ | * Count | 130 |
-+ | * Commits | 43 |
-+ | * Trees | 43 |
-+ | * Blobs | 43 |
-+ | * Tags | 1 |
+@@ t/t1901-repo-structure.sh: test_expect_success 'repository with references' '
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
++ | | |
++ | * Reachable objects | |
++ | * Count | 130 |
++ | * Commits | 43 |
++ | * Trees | 43 |
++ | * Blobs | 43 |
++ | * Tags | 1 |
EOF
- git repo stats >out 2>err &&
+ git repo structure >out 2>err &&
6: 76975b2eab ! 5: b336578445 builtin/repo: add keyvalue and nul format for stats
@@ Metadata
Author: Justin Tobler <jltobler@gmail.com>
## Commit message ##
- builtin/repo: add keyvalue and nul format for stats
+ builtin/repo: add keyvalue and nul format for structure stats
- All repository stats are outputted in a human-friendly table form. This
- format is not suitable for machine parsing. Add a --format option that
- supports three output modes: `table`, `keyvalue`, and `nul`. The `table`
- mode is the default format and prints the same table output as before.
+ All repository structure stats are outputted in a human-friendly table
+ form. This format is not suitable for machine parsing. Add a --format
+ option that supports three output modes: `table`, `keyvalue`, and `nul`.
+ The `table` mode is the default format and prints the same table output
+ as before.
With the `keyvalue` mode, each line of output contains a key-value pair
of a repository stat. The '=' character is used to delimit between keys
@@ Documentation/git-repo.adoc: SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
--git repo stats
-+git repo stats [--format=(table|keyvalue|nul)]
+-git repo structure
++git repo structure [--format=(table|keyvalue|nul)]
DESCRIPTION
-----------
@@ Documentation/git-repo.adoc: supported:
+
`-z` is an alias for `--format=nul`.
--`stats`::
-+`stats [--format=(table|keyvalue|nul)]`::
- Retrieve statistics about the current repository. The following kinds
- of information are reported:
+-`structure`::
++`structure [--format=(table|keyvalue|nul)]`::
+ Retrieve statistics about the current repository structure. The
+ following kinds of information are reported:
+
@@ Documentation/git-repo.adoc: supported:
* Reachable object counts categorized by type
@@ builtin/repo.c
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
-- "git repo stats",
-+ "git repo stats [--format=(table|keyvalue|nul)]",
+- "git repo structure",
++ "git repo structure [--format=(table|keyvalue|nul)]",
NULL
};
@@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
string_list_clear(&table->rows, 1);
}
-+static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
-+ char value_delim)
++static void structure_keyvalue_print(struct repo_structure *stats,
++ char key_delim, char value_delim)
+{
+ printf("references.branches.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.branches, value_delim);
@@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
+ fflush(stdout);
+}
+
- static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
+ static void structure_count_references(struct ref_stats *stats,
+ struct ref_array *refs)
{
- for (int i = 0; i < refs->nr; i++) {
-@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
+@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
+ enum output_format format = FORMAT_TABLE;
- struct repo_stats stats = { 0 };
+ struct repo_structure stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
- struct option options[] = { 0 };
@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const cha
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
-@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
- stats_count_references(&stats.refs, &refs);
- stats_count_objects(&stats.objects, &refs, &revs);
+@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
+ structure_count_references(&stats.refs, &refs);
+ structure_count_objects(&stats.objects, &refs, &revs);
-- stats_table_setup(&table, &stats);
-- stats_table_print(&table);
+- stats_table_setup_structure(&table, &stats);
+- stats_table_print_structure(&table);
+ switch (format) {
+ case FORMAT_TABLE:
-+ stats_table_setup(&table, &stats);
-+ stats_table_print(&table);
++ stats_table_setup_structure(&table, &stats);
++ stats_table_print_structure(&table);
+ break;
+ case FORMAT_KEYVALUE:
-+ stats_keyvalue_print(&stats, '=', '\n');
++ structure_keyvalue_print(&stats, '=', '\n');
+ break;
+ case FORMAT_NUL_TERMINATED:
-+ stats_keyvalue_print(&stats, '\n', '\0');
++ structure_keyvalue_print(&stats, '\n', '\0');
+ break;
+ default:
+ BUG("invalid output format");
@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const cha
stats_table_clear(&table);
release_revisions(&revs);
- ## t/t1901-repo-stats.sh ##
-@@ t/t1901-repo-stats.sh: test_expect_success 'repository with references and objects' '
+ ## t/t1901-repo-structure.sh ##
+@@ t/t1901-repo-structure.sh: test_expect_success 'repository with references and objects' '
)
'
@@ t/t1901-repo-stats.sh: test_expect_success 'repository with references and objec
+ objects.tags.count=1
+ EOF
+
-+ git repo stats --format=keyvalue >out 2>err &&
++ git repo structure --format=keyvalue >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err &&
+
+ # Replace key and value delimiters for nul format.
+ tr "\n=" "\0\n" <expect >expect_nul &&
-+ git repo stats --format=nul >out 2>err &&
++ git repo structure --format=nul >out 2>err &&
+
+ test_cmp expect_nul out &&
+ test_line_count = 0 err
7: 1105346a3c ! 6: 70c0b7e200 builtin/repo: add progress meter for stats
@@ Metadata
Author: Justin Tobler <jltobler@gmail.com>
## Commit message ##
- builtin/repo: add progress meter for stats
+ builtin/repo: add progress meter for structure stats
- When using the stats subcommand for git-repo(1), evaluating a repository
- may take some time depending on its shape. Add a progress meter to
- provide feedback to the user about what is happening. The progress meter
- is enabled by default when the command is executed from a tty. It can
- also be explicitly enabled/disabled via the --[no-]progress option.
+ When using the structure subcommand for git-repo(1), evaluating a
+ repository may take some time depending on its shape. Add a progress
+ meter to provide feedback to the user about what is happening. The
+ progress meter is enabled by default when the command is executed from a
+ tty. It can also be explicitly enabled/disabled via the --[no-]progress
+ option.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
@@ builtin/repo.c
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
-@@ builtin/repo.c: static void stats_keyvalue_print(struct repo_stats *stats, char key_delim,
- fflush(stdout);
+@@ builtin/repo.c: static void structure_keyvalue_print(struct repo_structure *stats,
}
--static void stats_count_references(struct ref_stats *stats, struct ref_array *refs)
-+static void stats_count_references(struct ref_stats *stats, struct ref_array *refs,
-+ struct repository *repo, int show_progress)
+ static void structure_count_references(struct ref_stats *stats,
+- struct ref_array *refs)
++ struct ref_array *refs,
++ struct repository *repo,
++ int show_progress)
{
+ struct progress *progress = NULL;
+
@@ builtin/repo.c: static void stats_keyvalue_print(struct repo_stats *stats, char
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
-@@ builtin/repo.c: static void stats_count_references(struct ref_stats *stats, struct ref_array *re
+@@ builtin/repo.c: static void structure_count_references(struct ref_stats *stats,
default:
BUG("unexpected reference type");
}
@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
return 0;
}
- static void stats_count_objects(struct object_stats *stats,
-- struct ref_array *refs, struct rev_info *revs)
-+ struct ref_array *refs, struct rev_info *revs,
-+ struct repository *repo, int show_progress)
+ static void structure_count_objects(struct object_stats *stats,
+- struct ref_array *refs,
+- struct rev_info *revs)
++ struct ref_array *refs, struct rev_info *revs,
++ struct repository *repo, int show_progress)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
+ struct count_objects_data data = {
@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
-@@ builtin/repo.c: static void stats_count_objects(struct object_stats *stats,
+@@ builtin/repo.c: static void structure_count_objects(struct object_stats *stats,
}
}
@@ builtin/repo.c: static void stats_count_objects(struct object_stats *stats,
+ stop_progress(&data.progress);
}
- static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
-@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
- struct repo_stats stats = { 0 };
+ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
+@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
+ struct repo_structure stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
+ int show_progress = -1;
@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const cha
OPT_END()
};
-@@ builtin/repo.c: static int cmd_repo_stats(int argc, const char **argv, const char *prefix,
+@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
-- stats_count_references(&stats.refs, &refs);
-- stats_count_objects(&stats.objects, &refs, &revs);
+- structure_count_references(&stats.refs, &refs);
+- structure_count_objects(&stats.objects, &refs, &revs);
+ if (show_progress < 0)
+ show_progress = isatty(2);
+
-+ stats_count_references(&stats.refs, &refs, repo, show_progress);
-+ stats_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
++ structure_count_references(&stats.refs, &refs, repo, show_progress);
++ structure_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
switch (format) {
case FORMAT_TABLE:
- ## t/t1901-repo-stats.sh ##
-@@ t/t1901-repo-stats.sh: test_expect_success 'keyvalue and nul format' '
+ ## t/t1901-repo-structure.sh ##
+@@ t/t1901-repo-structure.sh: test_expect_success 'keyvalue and nul format' '
)
'
@@ t/t1901-repo-stats.sh: test_expect_success 'keyvalue and nul format' '
+ cd repo &&
+ test_commit foo &&
+
-+ GIT_PROGRESS_DELAY=0 git repo stats --progress >out 2>err &&
++ GIT_PROGRESS_DELAY=0 git repo structure --progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_grep "Counting references: 100% (2/2), done." err &&
+ test_grep "Counting objects: 3, done." err &&
+
-+ GIT_PROGRESS_DELAY=0 git repo stats --no-progress >out 2>err &&
++ GIT_PROGRESS_DELAY=0 git repo structure --no-progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_line_count = 0 err
base-commit: ca2559c1d630eb4f04cdee2328aaf1c768907a9e
--
2.51.0.193.g4975ec3473b
* [PATCH v5 1/6] builtin/repo: rename repo_info() to cmd_repo_info()
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
@ 2025-10-15 21:12 ` Justin Tobler
2025-10-15 21:12 ` [PATCH v5 2/6] ref-filter: allow NULL filter pattern Justin Tobler
` (5 subsequent siblings)
6 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-15 21:12 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
Subcommand functions are often prefixed with `cmd_` to denote that they
are an entrypoint. Rename repo_info() to cmd_repo_info() accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index bbb0966f2d..eeeab8fbd2 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -136,8 +136,8 @@ static int parse_format_cb(const struct option *opt,
return 0;
}
-static int repo_info(int argc, const char **argv, const char *prefix,
- struct repository *repo)
+static int cmd_repo_info(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
{
enum output_format format = FORMAT_KEYVALUE;
struct option options[] = {
@@ -161,7 +161,7 @@ int cmd_repo(int argc, const char **argv, const char *prefix,
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
- OPT_SUBCOMMAND("info", &fn, repo_info),
+ OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
OPT_END()
};
--
2.51.0.193.g4975ec3473b
* [PATCH v5 2/6] ref-filter: allow NULL filter pattern
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-15 21:12 ` [PATCH v5 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
@ 2025-10-15 21:12 ` Justin Tobler
2025-10-15 21:12 ` [PATCH v5 3/6] builtin/repo: introduce structure subcommand Justin Tobler
` (4 subsequent siblings)
6 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-15 21:12 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
When setting up `struct ref_filter` for filter_refs(), the
`name_patterns` field must point to an array of pattern strings even if
no patterns are required. To improve this interface, treat a NULL
`name_patterns` field the same as when it points to an empty array.
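The relaxed check can be illustrated with a minimal sketch. It assumes a hypothetical `struct demo_filter` that stands in for the one relevant field of `struct ref_filter`; the helper name `demo_has_no_patterns()` is also made up for illustration and is not part of ref-filter.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of `struct ref_filter`. */
struct demo_filter {
	const char **name_patterns; /* NULL, or a NULL-terminated array */
};

/*
 * Return 1 when the filter carries no patterns at all. The NULL test
 * must come first so that the array is never dereferenced when the
 * field was left unset.
 */
static int demo_has_no_patterns(const struct demo_filter *filter)
{
	return !filter->name_patterns || !*filter->name_patterns;
}
```

With this shape, a caller that zero-initializes the filter and a caller that passes an array holding only the terminating NULL are treated the same way.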
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
ref-filter.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ref-filter.c b/ref-filter.c
index 520d2539c9..2cb5a166d6 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2664,7 +2664,7 @@ static int match_name_as_path(const char **pattern, const char *refname,
/* Return 1 if the refname matches one of the patterns, otherwise 0. */
static int filter_pattern_match(struct ref_filter *filter, const char *refname)
{
- if (!*filter->name_patterns)
+ if (!filter->name_patterns || !*filter->name_patterns)
return 1; /* No pattern always matches */
if (filter->match_as_path)
return match_name_as_path(filter->name_patterns, refname,
@@ -2751,7 +2751,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
- if (!filter->name_patterns[0]) {
+ if (!filter->name_patterns || !filter->name_patterns[0]) {
/* no patterns; we have to look at everything */
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
--
2.51.0.193.g4975ec3473b
* [PATCH v5 3/6] builtin/repo: introduce structure subcommand
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-15 21:12 ` [PATCH v5 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-10-15 21:12 ` [PATCH v5 2/6] ref-filter: allow NULL filter pattern Justin Tobler
@ 2025-10-15 21:12 ` Justin Tobler
2025-10-16 10:58 ` Patrick Steinhardt
2025-10-15 21:12 ` [PATCH v5 4/6] builtin/repo: add object counts in structure output Justin Tobler
` (3 subsequent siblings)
6 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-10-15 21:12 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler, Derrick Stolee
The structure of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface repository metrics regarding its structure/shape via a
single command. Acquiring this information requires users to be familiar
with the relevant data points and the various Git commands required to
surface them. To fill this gap, supplemental tools such as git-sizer(1)
have been developed.
To let users more readily obtain information about a repository's
structure, introduce the "structure" subcommand in git-repo(1). The
goal of this subcommand is to eventually provide similar functionality
to git-sizer(1), but natively in Git.
The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remote refs,
and other reference types. The corresponding information is displayed in
a human-friendly table formatted in a very similar manner to
git-sizer(1). The width of each table column is adjusted automatically
to fit its widest entry.
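The column sizing described above can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: the `demo_` helpers are hypothetical, and strlen() is used as an ASCII-only approximation where the patch measures display width with utf8_strwidth().

```c
#include <stdio.h>
#include <string.h>

/*
 * Width of a column: the widest of its title and its cells. strlen() is
 * a simplification that only holds for ASCII text.
 */
static int demo_col_width(const char *title, const char **cells, size_t nr)
{
	int width = (int)strlen(title);

	for (size_t i = 0; i < nr; i++) {
		int w = (int)strlen(cells[i]);
		if (w > width)
			width = w;
	}
	return width;
}

/* Format one row: name left-aligned, value right-aligned. */
static int demo_format_row(char *out, size_t outsz, int name_w, int value_w,
			   const char *name, const char *value)
{
	return snprintf(out, outsz, "| %-*s | %*s |", name_w, name,
			value_w, value);
}
```

The `*` in `%-*s` and `%*s` takes the field width from an argument, which is what lets the widths be computed once all rows are known rather than fixed at compile time.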
Subsequent commits will surface additional relevant data points to
output and also provide other more machine-friendly output formats.
Based-on-patch-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
Documentation/git-repo.adoc | 10 ++
builtin/repo.c | 193 ++++++++++++++++++++++++++++++++++++
t/meson.build | 1 +
t/t1901-repo-structure.sh | 61 ++++++++++++
4 files changed, 265 insertions(+)
create mode 100755 t/t1901-repo-structure.sh
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 209afd1b61..8193298dd5 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,6 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
+git repo structure
DESCRIPTION
-----------
@@ -43,6 +44,15 @@ supported:
+
`-z` is an alias for `--format=nul`.
+`structure`::
+ Retrieve statistics about the current repository structure. The
+ following kinds of information are reported:
++
+* Reference counts categorized by type
+
++
+The table output format may change and is not intended for machine parsing.
+
INFO KEYS
---------
In order to obtain a set of values from `git repo info`, you should provide
diff --git a/builtin/repo.c b/builtin/repo.c
index eeeab8fbd2..4575cf9467 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,12 +4,16 @@
#include "environment.h"
#include "parse-options.h"
#include "quote.h"
+#include "ref-filter.h"
#include "refs.h"
#include "strbuf.h"
+#include "string-list.h"
#include "shallow.h"
+#include "utf8.h"
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
+ "git repo structure",
NULL
};
@@ -156,12 +160,201 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
return print_fields(argc, argv, repo, format);
}
+struct ref_stats {
+ size_t branches;
+ size_t remotes;
+ size_t tags;
+ size_t others;
+};
+
+struct stats_table {
+ struct string_list rows;
+
+ int name_col_width;
+ int value_col_width;
+};
+
+/*
+ * Holds column data that gets stored for each row.
+ */
+struct stats_table_entry {
+ char *value;
+};
+
+static void stats_table_vaddf(struct stats_table *table,
+ struct stats_table_entry *entry,
+ const char *format, va_list ap)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ char *formatted_name;
+ int name_width;
+
+ strbuf_vaddf(&buf, format, ap);
+ formatted_name = strbuf_detach(&buf, NULL);
+ name_width = utf8_strwidth(formatted_name);
+
+ item = string_list_append_nodup(&table->rows, formatted_name);
+ item->util = entry;
+
+ if (name_width > table->name_col_width)
+ table->name_col_width = name_width;
+ if (entry) {
+ int value_width = utf8_strwidth(entry->value);
+ if (value_width > table->value_col_width)
+ table->value_col_width = value_width;
+ }
+}
+
+static void stats_table_addf(struct stats_table *table, const char *format, ...)
+{
+ va_list ap;
+
+ va_start(ap, format);
+ stats_table_vaddf(table, NULL, format, ap);
+ va_end(ap);
+}
+
+static void stats_table_count_addf(struct stats_table *table, size_t value,
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+ entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
+ va_end(ap);
+}
+
+static void stats_table_setup_structure(struct stats_table *table,
+ struct ref_stats *refs)
+{
+ size_t ref_total;
+
+ ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
+ stats_table_addf(table, "* %s", _("References"));
+ stats_table_count_addf(table, ref_total, " * %s", _("Count"));
+ stats_table_count_addf(table, refs->branches, " * %s", _("Branches"));
+ stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
+ stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+}
+
+static void stats_table_print_structure(const struct stats_table *table)
+{
+ const char *name_col_title = _("Repository structure");
+ const char *value_col_title = _("Value");
+ int name_col_width = utf8_strwidth(name_col_title);
+ int value_col_width = utf8_strwidth(value_col_title);
+ struct string_list_item *item;
+
+ if (table->name_col_width > name_col_width)
+ name_col_width = table->name_col_width;
+ if (table->value_col_width > value_col_width)
+ value_col_width = table->value_col_width;
+
+ printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
+ value_col_width, value_col_title);
+ printf("| ");
+ for (int i = 0; i < name_col_width; i++)
+ putchar('-');
+ printf(" | ");
+ for (int i = 0; i < value_col_width; i++)
+ putchar('-');
+ printf(" |\n");
+
+ for_each_string_list_item(item, &table->rows) {
+ struct stats_table_entry *entry = item->util;
+ const char *value = "";
+
+ if (entry) {
+ struct stats_table_entry *entry = item->util;
+ value = entry->value;
+ }
+
+ printf("| %-*s | %*s |\n", name_col_width, item->string,
+ value_col_width, value);
+ }
+}
+
+static void stats_table_clear(struct stats_table *table)
+{
+ struct stats_table_entry *entry;
+ struct string_list_item *item;
+
+ for_each_string_list_item(item, &table->rows) {
+ entry = item->util;
+ if (entry)
+ free(entry->value);
+ }
+
+ string_list_clear(&table->rows, 1);
+}
+
+static void structure_count_references(struct ref_stats *stats,
+ struct ref_array *refs)
+{
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ stats->branches++;
+ break;
+ case FILTER_REFS_REMOTES:
+ stats->remotes++;
+ break;
+ case FILTER_REFS_TAGS:
+ stats->tags++;
+ break;
+ case FILTER_REFS_OTHERS:
+ stats->others++;
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+ }
+}
+
+static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
+ struct repository *repo UNUSED)
+{
+ struct ref_filter filter = REF_FILTER_INIT;
+ struct stats_table table = {
+ .rows = STRING_LIST_INIT_DUP,
+ };
+ struct ref_stats stats = { 0 };
+ struct ref_array refs = { 0 };
+ struct option options[] = { 0 };
+
+ argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (argc)
+ usage(_("too many arguments"));
+
+ if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
+ die(_("unable to filter refs"));
+
+ structure_count_references(&stats, &refs);
+
+ stats_table_setup_structure(&table, &stats);
+ stats_table_print_structure(&table);
+
+ stats_table_clear(&table);
+ ref_array_clear(&refs);
+
+ return 0;
+}
+
int cmd_repo(int argc, const char **argv, const char *prefix,
struct repository *repo)
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
+ OPT_SUBCOMMAND("structure", &fn, cmd_repo_structure),
OPT_END()
};
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..9e426f8edc 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -236,6 +236,7 @@ integration_tests = [
't1701-racy-split-index.sh',
't1800-hook.sh',
't1900-repo.sh',
+ 't1901-repo-structure.sh',
't2000-conflict-when-checking-files-out.sh',
't2002-checkout-cache-u.sh',
't2003-checkout-cache-mkdir.sh',
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
new file mode 100755
index 0000000000..e592eea0eb
--- /dev/null
+++ b/t/t1901-repo-structure.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='test git repo structure'
+
+. ./test-lib.sh
+
+test_expect_success 'empty repository' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ cat >expect <<-\EOF &&
+ | Repository structure | Value |
+ | -------------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ EOF
+
+ git repo structure >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_expect_success 'repository with references' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m init &&
+ git tag -a foo -m bar &&
+
+ oid="$(git rev-parse HEAD)" &&
+ git update-ref refs/remotes/origin/foo "$oid" &&
+
+ git notes add -m foo &&
+
+ cat >expect <<-\EOF &&
+ | Repository structure | Value |
+ | -------------------- | ----- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ EOF
+
+ git repo structure >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v5 4/6] builtin/repo: add object counts in structure output
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
` (2 preceding siblings ...)
2025-10-15 21:12 ` [PATCH v5 3/6] builtin/repo: introduce structure subcommand Justin Tobler
@ 2025-10-15 21:12 ` Justin Tobler
2025-10-15 21:12 ` [PATCH v5 5/6] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
` (2 subsequent siblings)
6 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-15 21:12 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
The number of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
number of reachable objects in the repository by object type. All
regular references are used to determine the reachable set of objects.
The object counts are appended to the same table containing the
reference information.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
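For readers following along, the per-type accumulation done by
count_objects() can be sketched outside of Git. This is only an
illustration under stated assumptions: the sketch_* names and the
simplified enum are hypothetical stand-ins for Git's own types, and the
callback takes a batch size directly rather than a `struct oid_array`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Git's object types; not Git's real enum. */
enum sketch_object_type { SK_COMMIT, SK_TREE, SK_BLOB, SK_TAG };

struct sketch_object_stats {
	size_t tags;
	size_t commits;
	size_t trees;
	size_t blobs;
};

/*
 * Called once per batch of objects that share a type. Adds the batch
 * size to the matching counter and returns 0, or -1 for an unknown type.
 */
static int sketch_count_objects(enum sketch_object_type type, size_t nr,
				void *cb_data)
{
	struct sketch_object_stats *stats = cb_data;

	switch (type) {
	case SK_TAG:
		stats->tags += nr;
		break;
	case SK_COMMIT:
		stats->commits += nr;
		break;
	case SK_TREE:
		stats->trees += nr;
		break;
	case SK_BLOB:
		stats->blobs += nr;
		break;
	default:
		return -1;
	}

	return 0;
}

/* Sum of all per-type counters, mirroring the "Count" row of the table. */
static size_t sketch_total_objects(const struct sketch_object_stats *stats)
{
	return stats->tags + stats->commits + stats->trees + stats->blobs;
}
```

Feeding it batches of 43 commits, 43 trees, 43 blobs and 1 tag gives a
total of 130, which is the same arithmetic the updated test expects.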
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 97 +++++++++++++++++++++++++++++++++++--
t/t1901-repo-structure.sh | 19 +++++++-
3 files changed, 111 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 8193298dd5..ae62d2415f 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -49,6 +49,7 @@ supported:
following kinds of information are reported:
+
* Reference counts categorized by type
+* Reachable object counts categorized by type
+
The table output format may change and is not intended for machine parsing.
diff --git a/builtin/repo.c b/builtin/repo.c
index 4575cf9467..0bc3c1e458 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -3,9 +3,11 @@
#include "builtin.h"
#include "environment.h"
#include "parse-options.h"
+#include "path-walk.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
+#include "revision.h"
#include "strbuf.h"
#include "string-list.h"
#include "shallow.h"
@@ -167,6 +169,18 @@ struct ref_stats {
size_t others;
};
+struct object_stats {
+ size_t tags;
+ size_t commits;
+ size_t trees;
+ size_t blobs;
+};
+
+struct repo_structure {
+ struct ref_stats refs;
+ struct object_stats objects;
+};
+
struct stats_table {
struct string_list rows;
@@ -229,9 +243,17 @@ static void stats_table_count_addf(struct stats_table *table, size_t value,
va_end(ap);
}
+static inline size_t get_total_object_count(struct object_stats *stats)
+{
+ return stats->tags + stats->commits + stats->trees + stats->blobs;
+}
+
static void stats_table_setup_structure(struct stats_table *table,
- struct ref_stats *refs)
+ struct repo_structure *stats)
{
+ struct object_stats *objects = &stats->objects;
+ struct ref_stats *refs = &stats->refs;
+ size_t object_total;
size_t ref_total;
ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
@@ -241,6 +263,15 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+
+ object_total = get_total_object_count(objects);
+ stats_table_addf(table, "");
+ stats_table_addf(table, "* %s", _("Reachable objects"));
+ stats_table_count_addf(table, object_total, " * %s", _("Count"));
+ stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
+ stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
+ stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
+ stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
@@ -319,30 +350,88 @@ static void structure_count_references(struct ref_stats *stats,
}
}
+static int count_objects(const char *path UNUSED, struct oid_array *oids,
+ enum object_type type, void *cb_data)
+{
+ struct object_stats *stats = cb_data;
+
+ switch (type) {
+ case OBJ_TAG:
+ stats->tags += oids->nr;
+ break;
+ case OBJ_COMMIT:
+ stats->commits += oids->nr;
+ break;
+ case OBJ_TREE:
+ stats->trees += oids->nr;
+ break;
+ case OBJ_BLOB:
+ stats->blobs += oids->nr;
+ break;
+ default:
+ BUG("invalid object type");
+ }
+
+ return 0;
+}
+
+static void structure_count_objects(struct object_stats *stats,
+ struct ref_array *refs,
+ struct rev_info *revs)
+{
+ struct path_walk_info info = PATH_WALK_INFO_INIT;
+
+ info.revs = revs;
+ info.path_fn = count_objects;
+ info.path_fn_data = stats;
+
+ for (int i = 0; i < refs->nr; i++) {
+ struct ref_array_item *ref = refs->items[i];
+
+ switch (ref->kind) {
+ case FILTER_REFS_BRANCHES:
+ case FILTER_REFS_TAGS:
+ case FILTER_REFS_REMOTES:
+ case FILTER_REFS_OTHERS:
+ add_pending_oid(revs, NULL, &ref->objectname, 0);
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+ }
+
+ walk_objects_by_path(&info);
+ path_walk_info_clear(&info);
+}
+
static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
- struct repository *repo UNUSED)
+ struct repository *repo)
{
struct ref_filter filter = REF_FILTER_INIT;
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
- struct ref_stats stats = { 0 };
+ struct repo_structure stats = { 0 };
struct ref_array refs = { 0 };
+ struct rev_info revs;
struct option options[] = { 0 };
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
usage(_("too many arguments"));
+ repo_init_revisions(repo, &revs, prefix);
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
- structure_count_references(&stats, &refs);
+ structure_count_references(&stats.refs, &refs);
+ structure_count_objects(&stats.objects, &refs, &revs);
stats_table_setup_structure(&table, &stats);
stats_table_print_structure(&table);
stats_table_clear(&table);
+ release_revisions(&revs);
ref_array_clear(&refs);
return 0;
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index e592eea0eb..c32cf4e239 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -18,6 +18,13 @@ test_expect_success 'empty repository' '
| * Tags | 0 |
| * Remotes | 0 |
| * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo structure >out 2>err &&
@@ -27,17 +34,18 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references' '
+test_expect_success 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
- git commit --allow-empty -m init &&
+ test_commit_bulk 42 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
git update-ref refs/remotes/origin/foo "$oid" &&
+ # Also creates a commit, tree, and blob.
git notes add -m foo &&
cat >expect <<-\EOF &&
@@ -49,6 +57,13 @@ test_expect_success 'repository with references' '
| * Tags | 1 |
| * Remotes | 1 |
| * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 130 |
+ | * Commits | 43 |
+ | * Trees | 43 |
+ | * Blobs | 43 |
+ | * Tags | 1 |
EOF
git repo structure >out 2>err &&
--
2.51.0.193.g4975ec3473b
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v5 5/6] builtin/repo: add keyvalue and nul format for structure stats
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
` (3 preceding siblings ...)
2025-10-15 21:12 ` [PATCH v5 4/6] builtin/repo: add object counts in structure output Justin Tobler
@ 2025-10-15 21:12 ` Justin Tobler
2025-10-15 21:12 ` [PATCH v5 6/6] builtin/repo: add progress meter " Justin Tobler
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
6 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-15 21:12 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
All repository structure stats are output in a human-friendly table
form. This format is not suitable for machine parsing. Add a --format
option that supports three output modes: `table`, `keyvalue`, and `nul`.
The `table` mode is the default format and prints the same table output
as before.
With the `keyvalue` mode, each line of output contains a key-value pair
of a repository stat. The '=' character is used to delimit between keys
and values. The `nul` mode is similar to `keyvalue`, but key-values are
delimited by a NUL character instead of a newline. Also, instead of a
'=' character to delimit between keys and values, a newline character is
used. This allows stat values to support special characters without
having to cquote them. These two new modes provide output that is more
machine-friendly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
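As a side note for reviewers, the two machine-friendly formats differ
only in their delimiters, which a small standalone sketch can show. The
helper name sketch_format_stat() and its buffer-based interface are
assumptions made for illustration; the patch itself prints straight to
stdout with printf().

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write one "<key><key_delim><value><value_delim>"
 * record into buf and NUL-terminate it. Returns the record length in
 * bytes (including value_delim, which may itself be NUL), or 0 if the
 * buffer is too small.
 */
static size_t sketch_format_stat(char *buf, size_t size, const char *key,
				 size_t value, char key_delim,
				 char value_delim)
{
	int n = snprintf(buf, size, "%s%c%zu", key, key_delim, value);

	/* Need room for the record, the value delimiter, and a final NUL. */
	if (n < 0 || (size_t)n + 2 > size)
		return 0;

	buf[n] = value_delim;
	buf[n + 1] = '\0';
	return (size_t)n + 1;
}
```

Calling it with ('=', '\n') yields a `keyvalue` style record, while
('\n', '\0') yields a `nul` style record for the same key and value.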
Documentation/git-repo.adoc | 25 +++++++++++++++--
builtin/repo.c | 55 ++++++++++++++++++++++++++++++++++---
t/t1901-repo-structure.sh | 33 ++++++++++++++++++++++
3 files changed, 106 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index ae62d2415f..ce43cb19c8 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
-git repo structure
+git repo structure [--format=(table|keyvalue|nul)]
DESCRIPTION
-----------
@@ -44,7 +44,7 @@ supported:
+
`-z` is an alias for `--format=nul`.
-`structure`::
+`structure [--format=(table|keyvalue|nul)]`::
Retrieve statistics about the current repository structure. The
following kinds of information are reported:
+
@@ -52,7 +52,26 @@ supported:
* Reachable object counts categorized by type
+
-The table output format may change and is not intended for machine parsing.
+The output format can be chosen through the flag `--format`. Three formats are
+supported:
++
+`table`:::
+ Outputs repository stats in a human-friendly table. This format may
+ change and is not intended for machine parsing. This is the default
+ format.
+
+`keyvalue`:::
+ Each line of output contains a key-value pair for a repository stat.
+ The '=' character is used to delimit between the key and the value.
+ Values containing "unusual" characters are quoted as explained for the
+ configuration variable `core.quotePath` (see linkgit:git-config[1]).
+
+`nul`:::
+ Similar to `keyvalue`, but uses a NUL character to delimit between
+ key-value pairs instead of a newline. Also uses a newline character as
+ the delimiter between the key and value instead of '='. Unlike the
+ `keyvalue` format, values containing "unusual" characters are never
+ quoted.
INFO KEYS
---------
diff --git a/builtin/repo.c b/builtin/repo.c
index 0bc3c1e458..6bf93b6da8 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -15,13 +15,14 @@
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
- "git repo structure",
+ "git repo structure [--format=(table|keyvalue|nul)]",
NULL
};
typedef int get_value_fn(struct repository *repo, struct strbuf *buf);
enum output_format {
+ FORMAT_TABLE,
FORMAT_KEYVALUE,
FORMAT_NUL_TERMINATED,
};
@@ -136,6 +137,8 @@ static int parse_format_cb(const struct option *opt,
*format = FORMAT_NUL_TERMINATED;
else if (!strcmp(arg, "keyvalue"))
*format = FORMAT_KEYVALUE;
+ else if (!strcmp(arg, "table"))
+ *format = FORMAT_TABLE;
else
die(_("invalid format '%s'"), arg);
@@ -158,6 +161,8 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
};
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (format != FORMAT_KEYVALUE && format != FORMAT_NUL_TERMINATED)
+ die(_("unsupported output format"));
return print_fields(argc, argv, repo, format);
}
@@ -325,6 +330,30 @@ static void stats_table_clear(struct stats_table *table)
string_list_clear(&table->rows, 1);
}
+static void structure_keyvalue_print(struct repo_structure *stats,
+ char key_delim, char value_delim)
+{
+ printf("references.branches.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.branches, value_delim);
+ printf("references.tags.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.tags, value_delim);
+ printf("references.remotes.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.remotes, value_delim);
+ printf("references.others.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.others, value_delim);
+
+ printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.commits, value_delim);
+ printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.trees, value_delim);
+ printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.blobs, value_delim);
+ printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.tags, value_delim);
+
+ fflush(stdout);
+}
+
static void structure_count_references(struct ref_stats *stats,
struct ref_array *refs)
{
@@ -411,10 +440,16 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
+ enum output_format format = FORMAT_TABLE;
struct repo_structure stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
- struct option options[] = { 0 };
+ struct option options[] = {
+ OPT_CALLBACK_F(0, "format", &format, N_("format"),
+ N_("output format"),
+ PARSE_OPT_NONEG, parse_format_cb),
+ OPT_END()
+ };
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
@@ -427,8 +462,20 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
structure_count_references(&stats.refs, &refs);
structure_count_objects(&stats.objects, &refs, &revs);
- stats_table_setup_structure(&table, &stats);
- stats_table_print_structure(&table);
+ switch (format) {
+ case FORMAT_TABLE:
+ stats_table_setup_structure(&table, &stats);
+ stats_table_print_structure(&table);
+ break;
+ case FORMAT_KEYVALUE:
+ structure_keyvalue_print(&stats, '=', '\n');
+ break;
+ case FORMAT_NUL_TERMINATED:
+ structure_keyvalue_print(&stats, '\n', '\0');
+ break;
+ default:
+ BUG("invalid output format");
+ }
stats_table_clear(&table);
release_revisions(&revs);
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index c32cf4e239..14bd8aede5 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,4 +73,37 @@ test_expect_success 'repository with references and objects' '
)
'
+test_expect_success 'keyvalue and nul format' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit_bulk 42 &&
+ git tag -a foo -m bar &&
+
+ cat >expect <<-\EOF &&
+ references.branches.count=1
+ references.tags.count=1
+ references.remotes.count=0
+ references.others.count=0
+ objects.commits.count=42
+ objects.trees.count=42
+ objects.blobs.count=42
+ objects.tags.count=1
+ EOF
+
+ git repo structure --format=keyvalue >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err &&
+
+ # Replace key and value delimiters for nul format.
+ tr "\n=" "\0\n" <expect >expect_nul &&
+ git repo structure --format=nul >out 2>err &&
+
+ test_cmp expect_nul out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v5 6/6] builtin/repo: add progress meter for structure stats
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
` (4 preceding siblings ...)
2025-10-15 21:12 ` [PATCH v5 5/6] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
@ 2025-10-15 21:12 ` Justin Tobler
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
6 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-15 21:12 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
When using the structure subcommand for git-repo(1), evaluating a
repository may take some time depending on its shape. Add a progress
meter to provide feedback to the user about what is happening. The
progress meter is enabled by default when the command is executed from a
tty. It can also be explicitly enabled/disabled via the --[no-]progress
option.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
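The tri-state handling of --[no-]progress can be sketched in isolation.
The sketch_* helpers are hypothetical; only isatty() is a real POSIX
call, and -1 is assumed to mean that the option was not given on the
command line, as in the patch.

```c
#include <unistd.h>

/*
 * Resolve a tri-state progress option:
 *   -1  option not given, fall back to whether stderr is a terminal
 *    0  --no-progress
 *    1  --progress
 */
static int sketch_resolve_progress(int option, int stderr_is_tty)
{
	if (option < 0)
		return stderr_is_tty ? 1 : 0;
	return option ? 1 : 0;
}

/* Convenience wrapper that queries file descriptor 2, like isatty(2). */
static int sketch_show_progress(int option)
{
	return sketch_resolve_progress(option, isatty(STDERR_FILENO));
}
```

Keeping the terminal check as a plain parameter in the first helper
makes the decision easy to exercise without a real tty.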
builtin/repo.c | 49 +++++++++++++++++++++++++++++++++------
t/t1901-repo-structure.sh | 20 ++++++++++++++++
2 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 6bf93b6da8..763da436ad 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,6 +4,7 @@
#include "environment.h"
#include "parse-options.h"
#include "path-walk.h"
+#include "progress.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
@@ -355,8 +356,16 @@ static void structure_keyvalue_print(struct repo_structure *stats,
}
static void structure_count_references(struct ref_stats *stats,
- struct ref_array *refs)
+ struct ref_array *refs,
+ struct repository *repo,
+ int show_progress)
{
+ struct progress *progress = NULL;
+
+ if (show_progress)
+ progress = start_delayed_progress(repo, _("Counting references"),
+ refs->nr);
+
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -376,13 +385,24 @@ static void structure_count_references(struct ref_stats *stats,
default:
BUG("unexpected reference type");
}
+
+ display_progress(progress, i + 1);
}
+
+ stop_progress(&progress);
}
+struct count_objects_data {
+ struct object_stats *stats;
+ struct progress *progress;
+};
+
static int count_objects(const char *path UNUSED, struct oid_array *oids,
enum object_type type, void *cb_data)
{
- struct object_stats *stats = cb_data;
+ struct count_objects_data *data = cb_data;
+ struct object_stats *stats = data->stats;
+ size_t object_count;
switch (type) {
case OBJ_TAG:
@@ -401,18 +421,24 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
BUG("invalid object type");
}
+ object_count = get_total_object_count(stats);
+ display_progress(data->progress, object_count);
+
return 0;
}
static void structure_count_objects(struct object_stats *stats,
- struct ref_array *refs,
- struct rev_info *revs)
+ struct ref_array *refs, struct rev_info *revs,
+ struct repository *repo, int show_progress)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
+ struct count_objects_data data = {
+ .stats = stats,
+ };
info.revs = revs;
info.path_fn = count_objects;
- info.path_fn_data = stats;
+ info.path_fn_data = &data;
for (int i = 0; i < refs->nr; i++) {
struct ref_array_item *ref = refs->items[i];
@@ -429,8 +455,12 @@ static void structure_count_objects(struct object_stats *stats,
}
}
+ if (show_progress)
+ data.progress = start_delayed_progress(repo, _("Counting objects"), 0);
+
walk_objects_by_path(&info);
path_walk_info_clear(&info);
+ stop_progress(&data.progress);
}
static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
@@ -444,10 +474,12 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
struct repo_structure stats = { 0 };
struct ref_array refs = { 0 };
struct rev_info revs;
+ int show_progress = -1;
struct option options[] = {
OPT_CALLBACK_F(0, "format", &format, N_("format"),
N_("output format"),
PARSE_OPT_NONEG, parse_format_cb),
+ OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
OPT_END()
};
@@ -459,8 +491,11 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
die(_("unable to filter refs"));
- structure_count_references(&stats.refs, &refs);
- structure_count_objects(&stats.objects, &refs, &revs);
+ if (show_progress < 0)
+ show_progress = isatty(2);
+
+ structure_count_references(&stats.refs, &refs, repo, show_progress);
+ structure_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
switch (format) {
case FORMAT_TABLE:
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 14bd8aede5..5f513feadb 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -106,4 +106,24 @@ test_expect_success 'keyvalue and nul format' '
)
'
+test_expect_success 'progress meter option' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit foo &&
+
+ GIT_PROGRESS_DELAY=0 git repo structure --progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_grep "Counting references: 100% (2/2), done." err &&
+ test_grep "Counting objects: 3, done." err &&
+
+ GIT_PROGRESS_DELAY=0 git repo structure --no-progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v5 3/6] builtin/repo: introduce structure subcommand
2025-10-15 21:12 ` [PATCH v5 3/6] builtin/repo: introduce structure subcommand Justin Tobler
@ 2025-10-16 10:58 ` Patrick Steinhardt
2025-10-21 16:04 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-10-16 10:58 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188, sunshine, gitster, Derrick Stolee
On Wed, Oct 15, 2025 at 04:12:10PM -0500, Justin Tobler wrote:
> diff --git a/builtin/repo.c b/builtin/repo.c
> index eeeab8fbd2..4575cf9467 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -156,12 +160,201 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
[snip]
> +static void structure_count_references(struct ref_stats *stats,
> + struct ref_array *refs)
> +{
> + for (int i = 0; i < refs->nr; i++) {
> + struct ref_array_item *ref = refs->items[i];
> +
> + switch (ref->kind) {
> + case FILTER_REFS_BRANCHES:
> + stats->branches++;
> + break;
> + case FILTER_REFS_REMOTES:
> + stats->remotes++;
> + break;
> + case FILTER_REFS_TAGS:
> + stats->tags++;
> + break;
> + case FILTER_REFS_OTHERS:
> + stats->others++;
> + break;
> + default:
> + BUG("unexpected reference type");
> + }
> + }
> +}
> +
> +static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
> + struct repository *repo UNUSED)
> +{
> + struct ref_filter filter = REF_FILTER_INIT;
> + struct stats_table table = {
> + .rows = STRING_LIST_INIT_DUP,
> + };
> + struct ref_stats stats = { 0 };
> + struct ref_array refs = { 0 };
> + struct option options[] = { 0 };
> +
> + argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
> + if (argc)
> + usage(_("too many arguments"));
> +
> + if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
> + die(_("unable to filter refs"));
> +
> + structure_count_references(&stats, &refs);
I only noticed this when taking a look at the last patch that introduces
a progress meter, but I think we should change how we count references.
The way you do it here means that we have to temporarily store all refs
in an array, which is completely unnecessary and thus a waste of memory.
Furthermore, the resulting progress meter will be somewhat useless
because it only starts counting _after_ we have enumerated all
references already. The second phase where we basically just classify
the refs by type is going to be orders of magnitude faster and probably
not noticeable even with millions of refs.
Instead, I think we should use e.g. `refs_for_each_ref()` and count them
in the callback function. This means we don't have to store them
anymore, and also the progress meter becomes more useful.
The only downside is that we cannot set up the progress meter with an
upper limit. But I think that is an acceptable tradeoff.
The other patches all look good to me, thanks!
Patrick
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v5 3/6] builtin/repo: introduce structure subcommand
2025-10-16 10:58 ` Patrick Steinhardt
@ 2025-10-21 16:04 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 16:04 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188, sunshine, gitster, Derrick Stolee
On 25/10/16 12:58PM, Patrick Steinhardt wrote:
> I only noticed this when taking a look at the last patch that introduces
> a progress meter, but I think we should change how we count references.
> The way you do it here means that we have to temporarily store all refs
> in an array, which is completely unnecessary and thus a waste of memory.
> Furthermore, the resulting progress meter will be somewhat useless
> because it only starts counting _after_ we have enumerated all
> references already. The second phase where we basically just classify
> the refs by type is going to be orders of magnitude faster and probably
> not noticeable even with millions of refs.
>
> Instead, I think we should use e.g. `refs_for_each_ref()` and count them
> in the callback function. This means we don't have to store them
> anymore, and also the progress meter becomes more useful.
One nice thing about `filter_refs()` is that we can easily set it up to
exclude certain sets of references. In the future, I do foresee options
being introduced that restrict what references we use when performing
this operation. We really only need to iterate through the references
once though to count references and append OIDs in preparation for the
path walk. So I agree that storing all the references in an array is
quite wasteful.
In the next version, I'll update to use `refs_for_each_ref()` and set up
a callback that counts references and appends their OIDs in preparation
for the path walk in a single iteration. When we introduce the ability
to restrict references, we could export `do_filter_refs()` via
"ref-filter.h" and be able to reuse the existing filtering machinery.
> The only downside is that we cannot set up the progress meter with an
> upper limit. But I think that is an acceptable tradeoff.
I agree that this is only a minor downside and sounds reasonable.
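To make the single-pass counting discussed above a bit more concrete,
here is a rough standalone sketch. Everything named sketch_* is
hypothetical, including the iterator that stands in for a ref store
walk, and the prefix-based classification is an assumption about how
references would be categorized rather than a copy of the ref-filter
code.

```c
#include <stddef.h>
#include <string.h>

struct sketch_ref_stats {
	size_t branches;
	size_t remotes;
	size_t tags;
	size_t others;
};

typedef int sketch_each_ref_fn(const char *refname, void *cb_data);

static int sketch_has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Callback: classify one reference by name and bump its counter. */
static int sketch_count_ref(const char *refname, void *cb_data)
{
	struct sketch_ref_stats *stats = cb_data;

	if (sketch_has_prefix(refname, "refs/heads/"))
		stats->branches++;
	else if (sketch_has_prefix(refname, "refs/remotes/"))
		stats->remotes++;
	else if (sketch_has_prefix(refname, "refs/tags/"))
		stats->tags++;
	else
		stats->others++;

	return 0;
}

/*
 * Hypothetical iterator: visits each name once and stops early if the
 * callback returns non-zero. Nothing is stored between callbacks.
 */
static int sketch_for_each_ref(const char *const *names, size_t nr,
			       sketch_each_ref_fn *fn, void *cb_data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(names[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}
```

The caller passes the address of the stats struct itself as cb_data, so
the callback can update the counters in place while the walk proceeds.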
-Justin
^ permalink raw reply [flat|nested] 92+ messages in thread
* [PATCH v6 0/7] builtin/repo: introduce structure subcommand
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
` (5 preceding siblings ...)
2025-10-15 21:12 ` [PATCH v5 6/6] builtin/repo: add progress meter " Justin Tobler
@ 2025-10-21 18:25 ` Justin Tobler
2025-10-21 18:25 ` [PATCH v6 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
` (8 more replies)
6 siblings, 9 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 18:25 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
Greetings,
The structure of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface repository metrics regarding its structure/shape via a
single command. Acquiring this information requires users to be familiar
with the relevant data points and the various Git commands required to
surface them. To fill this gap, supplemental tools such as git-sizer(1)
have been developed.
To allow users to more readily identify repository structure related
information, introduce the "structure" subcommand in git-repo(1). The
goal of this subcommand is to eventually provide similar functionality
to git-sizer(1), but natively in Git.
In this initial version, the "structure" subcommand only surfaces counts
of the various reference and object types in a repository. In a
follow-up series, I would like to introduce additional data points that
are present in git-sizer(1) such as largest objects, combined object
sizes by type, and other general repository shape information.
Some other general features that would be nice to introduce eventually:
- A "level of concern" meter for reported stats. This could indicate to
users which stats may be worth looking into further.
- Links to OIDs of interesting objects that correspond to certain stats.
- Options to limit which references to use when evaluating the
repository.
Changes since V5:
- Instead of using `filter_refs()` to get an array of all references, we
now use `refs_for_each_ref()` to count references, and set up OIDs for
the path walk, in place. Doing this not only allows us to avoid
wasting memory storing all the reference info, but also to display
progress info to the user while iterating across the references
initially.
- Add a preparatory patch to export `ref_kind_from_refname()` via
"ref-filter.h" so we can reuse logic to categorize references while
counting.
Changes since V4:
- The subcommand was renamed from "stats" to "structure". This was done
to define a more narrow scope for the types of stats that would be
output. This also allows for other types of stat-related subcommands
to be implemented in the future that may cover different aspects of
the repository.
- Table column widths are now stored as ints to avoid the unneeded back
and forth conversions.
- Dropped the clang-format patch as it has been upstreamed separately.
- Updated commit messages accordingly.
Changes since V3:
- Changed from using strlen() to utf8_strlen() to take into
consideration that translatable strings may have characters that are
more than one byte.
Changes since V2:
- Added clang-format patch to address false positive triggered in this
series.
- Use varargs for stats_table_add() family of functions.
- Print to stdout directly instead of using strbuf.
- Add parse_options() earlier in the series.
- Use start_delayed_progress() instead of start_progress().
- Add test to validate --[no-]progress options.
- Some other small fixes.
Changes since V1:
- Translatable terms displayed in the table have formatting separated
out.
- Squashed the `keyvalue` and `nul` output format patches into one.
- Added a progress meter to provide users with more feedback.
- Updated docs to outline reported data in a bulleted list.
- Combined similar tests together to reduce repetitive setup.
- Added patch to improve ref-filter interface so we don't have to create
a dummy patterns array.
- Many other renames and cleanups to improve patch clarity.
Thanks,
-Justin
Justin Tobler (7):
builtin/repo: rename repo_info() to cmd_repo_info()
ref-filter: allow NULL filter pattern
ref-filter: export ref_kind_from_refname()
builtin/repo: introduce structure subcommand
builtin/repo: add object counts in structure output
builtin/repo: add keyvalue and nul format for structure stats
builtin/repo: add progress meter for structure stats
Documentation/git-repo.adoc | 30 +++
builtin/repo.c | 380 +++++++++++++++++++++++++++++++++++-
ref-filter.c | 6 +-
ref-filter.h | 2 +
t/meson.build | 1 +
t/t1901-repo-structure.sh | 129 ++++++++++++
6 files changed, 542 insertions(+), 6 deletions(-)
create mode 100755 t/t1901-repo-structure.sh
Range-diff against v5:
1: ed04168562 = 1: ed04168562 builtin/repo: rename repo_info() to cmd_repo_info()
2: 6aa76d1323 = 2: 6aa76d1323 ref-filter: allow NULL filter pattern
-: ---------- > 3: aee696c69b ref-filter: export ref_kind_from_refname()
3: eda1afbe3d ! 4: 4ad599d0ec builtin/repo: introduce structure subcommand
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ va_end(ap);
+}
+
++static inline size_t get_total_reference_count(struct ref_stats *stats)
++{
++ return stats->branches + stats->remotes + stats->tags + stats->others;
++}
++
+static void stats_table_setup_structure(struct stats_table *table,
+ struct ref_stats *refs)
+{
+ size_t ref_total;
+
-+ ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
++ ref_total = get_total_reference_count(refs);
+ stats_table_addf(table, "* %s", _("References"));
+ stats_table_count_addf(table, ref_total, " * %s", _("Count"));
+ stats_table_count_addf(table, refs->branches, " * %s", _("Branches"));
@@ builtin/repo.c: static int cmd_repo_info(int argc, const char **argv, const char
+ string_list_clear(&table->rows, 1);
+}
+
-+static void structure_count_references(struct ref_stats *stats,
-+ struct ref_array *refs)
++static int count_references(const char *refname,
++ const char *referent UNUSED,
++ const struct object_id *oid UNUSED,
++ int flags UNUSED, void *cb_data)
+{
-+ for (int i = 0; i < refs->nr; i++) {
-+ struct ref_array_item *ref = refs->items[i];
-+
-+ switch (ref->kind) {
-+ case FILTER_REFS_BRANCHES:
-+ stats->branches++;
-+ break;
-+ case FILTER_REFS_REMOTES:
-+ stats->remotes++;
-+ break;
-+ case FILTER_REFS_TAGS:
-+ stats->tags++;
-+ break;
-+ case FILTER_REFS_OTHERS:
-+ stats->others++;
-+ break;
-+ default:
-+ BUG("unexpected reference type");
-+ }
++ struct ref_stats *stats = cb_data;
++
++ switch (ref_kind_from_refname(refname)) {
++ case FILTER_REFS_BRANCHES:
++ stats->branches++;
++ break;
++ case FILTER_REFS_REMOTES:
++ stats->remotes++;
++ break;
++ case FILTER_REFS_TAGS:
++ stats->tags++;
++ break;
++ case FILTER_REFS_OTHERS:
++ stats->others++;
++ break;
++ default:
++ BUG("unexpected reference type");
+ }
++
++ return 0;
++}
++
++static void structure_count_references(struct ref_stats *stats,
++ struct repository *repo)
++{
++ refs_for_each_ref(get_main_ref_store(repo), count_references, &stats);
+}
+
+static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
-+ struct repository *repo UNUSED)
++ struct repository *repo)
+{
-+ struct ref_filter filter = REF_FILTER_INIT;
+ struct stats_table table = {
+ .rows = STRING_LIST_INIT_DUP,
+ };
+ struct ref_stats stats = { 0 };
-+ struct ref_array refs = { 0 };
+ struct option options[] = { 0 };
+
+ argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (argc)
+ usage(_("too many arguments"));
+
-+ if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
-+ die(_("unable to filter refs"));
-+
-+ structure_count_references(&stats, &refs);
++ structure_count_references(&stats, repo);
+
+ stats_table_setup_structure(&table, &stats);
+ stats_table_print_structure(&table);
+
+ stats_table_clear(&table);
-+ ref_array_clear(&refs);
+
+ return 0;
+}
4: 503af885d3 ! 5: 4d37f65331 builtin/repo: add object counts in structure output
@@ builtin/repo.c: struct ref_stats {
struct stats_table {
struct string_list rows;
-@@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, size_t value,
- va_end(ap);
+@@ builtin/repo.c: static inline size_t get_total_reference_count(struct ref_stats *stats)
+ return stats->branches + stats->remotes + stats->tags + stats->others;
}
+static inline size_t get_total_object_count(struct object_stats *stats)
@@ builtin/repo.c: static void stats_table_count_addf(struct stats_table *table, si
+ size_t object_total;
size_t ref_total;
- ref_total = refs->branches + refs->remotes + refs->tags + refs->others;
+ ref_total = get_total_reference_count(refs);
@@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
@@ builtin/repo.c: static void stats_table_setup_structure(struct stats_table *tabl
}
static void stats_table_print_structure(const struct stats_table *table)
-@@ builtin/repo.c: static void structure_count_references(struct ref_stats *stats,
+@@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
+ string_list_clear(&table->rows, 1);
+ }
+
++struct count_references_data {
++ struct ref_stats *stats;
++ struct rev_info *revs;
++};
++
+ static int count_references(const char *refname,
+ const char *referent UNUSED,
+- const struct object_id *oid UNUSED,
++ const struct object_id *oid,
+ int flags UNUSED, void *cb_data)
+ {
+- struct ref_stats *stats = cb_data;
++ struct count_references_data *data = cb_data;
++ struct ref_stats *stats = data->stats;
+
+ switch (ref_kind_from_refname(refname)) {
+ case FILTER_REFS_BRANCHES:
+@@ builtin/repo.c: static int count_references(const char *refname,
+ BUG("unexpected reference type");
}
+
++ /*
++ * While iterating through references for counting, also add OIDs in
++ * preparation for the path walk.
++ */
++ add_pending_oid(data->revs, NULL, oid, 0);
++
+ return 0;
}
+ static void structure_count_references(struct ref_stats *stats,
++ struct rev_info *revs,
+ struct repository *repo)
+ {
+- refs_for_each_ref(get_main_ref_store(repo), count_references, &stats);
++ struct count_references_data data = {
++ .stats = stats,
++ .revs = revs,
++ };
++
++ refs_for_each_ref(get_main_ref_store(repo), count_references, &data);
++}
++
++
+static int count_objects(const char *path UNUSED, struct oid_array *oids,
+ enum object_type type, void *cb_data)
+{
@@ builtin/repo.c: static void structure_count_references(struct ref_stats *stats,
+}
+
+static void structure_count_objects(struct object_stats *stats,
-+ struct ref_array *refs,
+ struct rev_info *revs)
+{
+ struct path_walk_info info = PATH_WALK_INFO_INIT;
@@ builtin/repo.c: static void structure_count_references(struct ref_stats *stats,
+ info.path_fn = count_objects;
+ info.path_fn_data = stats;
+
-+ for (int i = 0; i < refs->nr; i++) {
-+ struct ref_array_item *ref = refs->items[i];
-+
-+ switch (ref->kind) {
-+ case FILTER_REFS_BRANCHES:
-+ case FILTER_REFS_TAGS:
-+ case FILTER_REFS_REMOTES:
-+ case FILTER_REFS_OTHERS:
-+ add_pending_oid(revs, NULL, &ref->objectname, 0);
-+ break;
-+ default:
-+ BUG("unexpected reference type");
-+ }
-+ }
-+
+ walk_objects_by_path(&info);
+ path_walk_info_clear(&info);
-+}
-+
+ }
+
static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
-- struct repository *repo UNUSED)
-+ struct repository *repo)
- {
- struct ref_filter filter = REF_FILTER_INIT;
+@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
- struct ref_stats stats = { 0 };
+ struct repo_structure stats = { 0 };
- struct ref_array refs = { 0 };
+ struct rev_info revs;
struct option options[] = { 0 };
@@ builtin/repo.c: static void structure_count_references(struct ref_stats *stats,
if (argc)
usage(_("too many arguments"));
+- structure_count_references(&stats, repo);
+ repo_init_revisions(repo, &revs, prefix);
- if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
- die(_("unable to filter refs"));
-
-- structure_count_references(&stats, &refs);
-+ structure_count_references(&stats.refs, &refs);
-+ structure_count_objects(&stats.objects, &refs, &revs);
++
++ structure_count_references(&stats.refs, &revs, repo);
++ structure_count_objects(&stats.objects, &revs);
stats_table_setup_structure(&table, &stats);
stats_table_print_structure(&table);
stats_table_clear(&table);
+ release_revisions(&revs);
- ref_array_clear(&refs);
return 0;
+ }
## t/t1901-repo-structure.sh ##
@@ t/t1901-repo-structure.sh: test_expect_success 'empty repository' '
5: b336578445 ! 6: 3d42929434 builtin/repo: add keyvalue and nul format for structure stats
@@ builtin/repo.c: static void stats_table_clear(struct stats_table *table)
+ fflush(stdout);
+}
+
- static void structure_count_references(struct ref_stats *stats,
- struct ref_array *refs)
- {
+ struct count_references_data {
+ struct ref_stats *stats;
+ struct rev_info *revs;
@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
+ enum output_format format = FORMAT_TABLE;
struct repo_structure stats = { 0 };
- struct ref_array refs = { 0 };
struct rev_info revs;
- struct option options[] = { 0 };
+ struct option options[] = {
@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
- structure_count_references(&stats.refs, &refs);
- structure_count_objects(&stats.objects, &refs, &revs);
+ structure_count_references(&stats.refs, &revs, repo);
+ structure_count_objects(&stats.objects, &revs);
- stats_table_setup_structure(&table, &stats);
- stats_table_print_structure(&table);
6: 70c0b7e200 ! 7: 67d7d8eb8c builtin/repo: add progress meter for structure stats
@@ builtin/repo.c
#include "ref-filter.h"
#include "refs.h"
@@ builtin/repo.c: static void structure_keyvalue_print(struct repo_structure *stats,
+ struct count_references_data {
+ struct ref_stats *stats;
+ struct rev_info *revs;
++ struct progress *progress;
+ };
+
+ static int count_references(const char *refname,
+@@ builtin/repo.c: static int count_references(const char *refname,
+ {
+ struct count_references_data *data = cb_data;
+ struct ref_stats *stats = data->stats;
++ size_t ref_count;
+
+ switch (ref_kind_from_refname(refname)) {
+ case FILTER_REFS_BRANCHES:
+@@ builtin/repo.c: static int count_references(const char *refname,
+ */
+ add_pending_oid(data->revs, NULL, oid, 0);
+
++ ref_count = get_total_reference_count(stats);
++ display_progress(data->progress, ref_count);
++
+ return 0;
}
static void structure_count_references(struct ref_stats *stats,
-- struct ref_array *refs)
-+ struct ref_array *refs,
+ struct rev_info *revs,
+- struct repository *repo)
+ struct repository *repo,
+ int show_progress)
{
-+ struct progress *progress = NULL;
-+
-+ if (show_progress)
-+ progress = start_delayed_progress(repo, _("Counting references"),
-+ refs->nr);
-+
- for (int i = 0; i < refs->nr; i++) {
- struct ref_array_item *ref = refs->items[i];
+ struct count_references_data data = {
+ .stats = stats,
+ .revs = revs,
+ };
-@@ builtin/repo.c: static void structure_count_references(struct ref_stats *stats,
- default:
- BUG("unexpected reference type");
- }
-+
-+ display_progress(progress, i + 1);
- }
++ if (show_progress)
++ data.progress = start_delayed_progress(repo,
++ _("Counting references"), 0);
+
-+ stop_progress(&progress);
+ refs_for_each_ref(get_main_ref_store(repo), count_references, &data);
++ stop_progress(&data.progress);
}
+struct count_objects_data {
+ struct object_stats *stats;
+ struct progress *progress;
+};
-+
+
static int count_objects(const char *path UNUSED, struct oid_array *oids,
enum object_type type, void *cb_data)
{
@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
}
static void structure_count_objects(struct object_stats *stats,
-- struct ref_array *refs,
- struct rev_info *revs)
-+ struct ref_array *refs, struct rev_info *revs,
++ struct rev_info *revs,
+ struct repository *repo, int show_progress)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
@@ builtin/repo.c: static int count_objects(const char *path UNUSED, struct oid_arr
info.path_fn = count_objects;
- info.path_fn_data = stats;
+ info.path_fn_data = &data;
-
- for (int i = 0; i < refs->nr; i++) {
- struct ref_array_item *ref = refs->items[i];
-@@ builtin/repo.c: static void structure_count_objects(struct object_stats *stats,
- }
- }
-
++
+ if (show_progress)
+ data.progress = start_delayed_progress(repo, _("Counting objects"), 0);
-+
+
walk_objects_by_path(&info);
path_walk_info_clear(&info);
+ stop_progress(&data.progress);
@@ builtin/repo.c: static void structure_count_objects(struct object_stats *stats,
static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
+ enum output_format format = FORMAT_TABLE;
struct repo_structure stats = { 0 };
- struct ref_array refs = { 0 };
struct rev_info revs;
+ int show_progress = -1;
struct option options[] = {
@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const
};
@@ builtin/repo.c: static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
- if (filter_refs(&refs, &filter, FILTER_REFS_REGULAR))
- die(_("unable to filter refs"));
-- structure_count_references(&stats.refs, &refs);
-- structure_count_objects(&stats.objects, &refs, &revs);
+ repo_init_revisions(repo, &revs, prefix);
+
+- structure_count_references(&stats.refs, &revs, repo);
+- structure_count_objects(&stats.objects, &revs);
+ if (show_progress < 0)
+ show_progress = isatty(2);
+
-+ structure_count_references(&stats.refs, &refs, repo, show_progress);
-+ structure_count_objects(&stats.objects, &refs, &revs, repo, show_progress);
++ structure_count_references(&stats.refs, &revs, repo, show_progress);
++ structure_count_objects(&stats.objects, &revs, repo, show_progress);
switch (format) {
case FORMAT_TABLE:
@@ t/t1901-repo-structure.sh: test_expect_success 'keyvalue and nul format' '
+ GIT_PROGRESS_DELAY=0 git repo structure --progress >out 2>err &&
+
+ test_file_not_empty out &&
-+ test_grep "Counting references: 100% (2/2), done." err &&
++ test_grep "Counting references: 2, done." err &&
+ test_grep "Counting objects: 3, done." err &&
+
+ GIT_PROGRESS_DELAY=0 git repo structure --no-progress >out 2>err &&
base-commit: ca2559c1d630eb4f04cdee2328aaf1c768907a9e
--
2.51.0.193.g4975ec3473b
* [PATCH v6 1/7] builtin/repo: rename repo_info() to cmd_repo_info()
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
@ 2025-10-21 18:25 ` Justin Tobler
2025-10-21 18:25 ` [PATCH v6 2/7] ref-filter: allow NULL filter pattern Justin Tobler
` (7 subsequent siblings)
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 18:25 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
Subcommand functions are often prefixed with `cmd_` to denote that they
are an entrypoint. Rename repo_info() to cmd_repo_info() accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
builtin/repo.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index bbb0966f2d..eeeab8fbd2 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -136,8 +136,8 @@ static int parse_format_cb(const struct option *opt,
return 0;
}
-static int repo_info(int argc, const char **argv, const char *prefix,
- struct repository *repo)
+static int cmd_repo_info(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
{
enum output_format format = FORMAT_KEYVALUE;
struct option options[] = {
@@ -161,7 +161,7 @@ int cmd_repo(int argc, const char **argv, const char *prefix,
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
- OPT_SUBCOMMAND("info", &fn, repo_info),
+ OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
OPT_END()
};
--
2.51.0.193.g4975ec3473b
* [PATCH v6 2/7] ref-filter: allow NULL filter pattern
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-21 18:25 ` [PATCH v6 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
@ 2025-10-21 18:25 ` Justin Tobler
2025-10-21 18:25 ` [PATCH v6 3/7] ref-filter: export ref_kind_from_refname() Justin Tobler
` (6 subsequent siblings)
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 18:25 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
When setting up `struct ref_filter` for filter_refs(), the
`name_patterns` field must point to an array of pattern strings even if
no patterns are required. To improve this interface, treat a NULL
`name_patterns` field the same as when it points to an empty array.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
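For illustration only (not part of the patch): a minimal, standalone
sketch of the check described above. `struct pattern_filter` and
`filter_has_no_patterns()` are hypothetical names that model only the
`name_patterns` field; they are not taken from ref-filter.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the relevant part of `struct ref_filter`;
 * only the field discussed in the commit message is modelled.
 */
struct pattern_filter {
	const char **name_patterns; /* NULL, or a NULL-terminated array */
};

/* Return 1 when the filter carries no patterns at all, otherwise 0. */
static int filter_has_no_patterns(const struct pattern_filter *filter)
{
	assert(filter != NULL);
	/* Test the pointer first so a NULL field is never dereferenced. */
	return !filter->name_patterns || !filter->name_patterns[0];
}
```

The order of the two operands matters: short-circuit evaluation is what
keeps the NULL case from reaching the array access.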
ref-filter.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ref-filter.c b/ref-filter.c
index 520d2539c9..2cb5a166d6 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2664,7 +2664,7 @@ static int match_name_as_path(const char **pattern, const char *refname,
/* Return 1 if the refname matches one of the patterns, otherwise 0. */
static int filter_pattern_match(struct ref_filter *filter, const char *refname)
{
- if (!*filter->name_patterns)
+ if (!filter->name_patterns || !*filter->name_patterns)
return 1; /* No pattern always matches */
if (filter->match_as_path)
return match_name_as_path(filter->name_patterns, refname,
@@ -2751,7 +2751,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
- if (!filter->name_patterns[0]) {
+ if (!filter->name_patterns || !filter->name_patterns[0]) {
/* no patterns; we have to look at everything */
return for_each_fullref_with_seek(filter, cb, cb_data, 0);
}
--
2.51.0.193.g4975ec3473b
* [PATCH v6 3/7] ref-filter: export ref_kind_from_refname()
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-21 18:25 ` [PATCH v6 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-10-21 18:25 ` [PATCH v6 2/7] ref-filter: allow NULL filter pattern Justin Tobler
@ 2025-10-21 18:25 ` Justin Tobler
2025-10-21 18:25 ` [PATCH v6 4/7] builtin/repo: introduce structure subcommand Justin Tobler
` (5 subsequent siblings)
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 18:25 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
When filtering refs, `ref_kind_from_refname()` is used to determine the
ref type. In a subsequent commit, this same logic is reused when
counting refs by type. Export the function to prepare for this change.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
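For illustration only (not part of the patch): a rough sketch of how a
caller might bucket references by kind. It assumes plain prefix
matching on the refname; the real ref_kind_from_refname() may handle
further cases. Every `sketch_` name below is hypothetical.

```c
#include <assert.h>
#include <string.h>

enum sketch_ref_kind {
	SKETCH_REF_BRANCH,
	SKETCH_REF_REMOTE,
	SKETCH_REF_TAG,
	SKETCH_REF_OTHER
};

struct sketch_ref_counts {
	size_t branches;
	size_t remotes;
	size_t tags;
	size_t others;
};

/* Classify a refname by its namespace prefix (simplified assumption). */
static enum sketch_ref_kind sketch_ref_kind(const char *refname)
{
	static const struct {
		const char *prefix;
		enum sketch_ref_kind kind;
	} table[] = {
		{ "refs/heads/", SKETCH_REF_BRANCH },
		{ "refs/remotes/", SKETCH_REF_REMOTE },
		{ "refs/tags/", SKETCH_REF_TAG },
	};
	size_t i;

	assert(refname != NULL);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (!strncmp(refname, table[i].prefix, strlen(table[i].prefix)))
			return table[i].kind;
	return SKETCH_REF_OTHER;
}

/* Bump the counter that matches the kind of the given refname. */
static void sketch_count_ref(struct sketch_ref_counts *counts,
			     const char *refname)
{
	switch (sketch_ref_kind(refname)) {
	case SKETCH_REF_BRANCH:
		counts->branches++;
		break;
	case SKETCH_REF_REMOTE:
		counts->remotes++;
		break;
	case SKETCH_REF_TAG:
		counts->tags++;
		break;
	case SKETCH_REF_OTHER:
		counts->others++;
		break;
	}
}
```

Anything outside the three well-known namespaces, for example a notes
ref, lands in the "others" bucket in this sketch.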
ref-filter.c | 2 +-
ref-filter.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/ref-filter.c b/ref-filter.c
index 2cb5a166d6..30cc488d8a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2833,7 +2833,7 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
return ref;
}
-static int ref_kind_from_refname(const char *refname)
+int ref_kind_from_refname(const char *refname)
{
unsigned int i;
diff --git a/ref-filter.h b/ref-filter.h
index f22ca94b49..4ed1edf09a 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -135,6 +135,8 @@ struct ref_format {
OPT_STRVEC(0, "exclude", &(var)->exclude, \
N_("pattern"), N_("exclude refs which match pattern"))
+/* Get the reference kind from the provided reference name. */
+int ref_kind_from_refname(const char *refname);
/*
* API for filtering a set of refs. Based on the type of refs the user
* has requested, we iterate through those refs and apply filters
--
2.51.0.193.g4975ec3473b
* [PATCH v6 4/7] builtin/repo: introduce structure subcommand
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
` (2 preceding siblings ...)
2025-10-21 18:25 ` [PATCH v6 3/7] ref-filter: export ref_kind_from_refname() Justin Tobler
@ 2025-10-21 18:25 ` Justin Tobler
2025-10-22 5:01 ` Patrick Steinhardt
2025-10-22 20:15 ` Lucas Seiki Oshiro
2025-10-21 18:25 ` [PATCH v6 5/7] builtin/repo: add object counts in structure output Justin Tobler
` (4 subsequent siblings)
8 siblings, 2 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 18:25 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler, Derrick Stolee
The structure of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface repository metrics regarding its structure/shape via a
single command. Acquiring this information requires users to be familiar
with the relevant data points and the various Git commands required to
surface them. To fill this gap, supplemental tools such as git-sizer(1)
have been developed.
To allow users to more readily identify repository structure related
information, introduce the "structure" subcommand in git-repo(1). The
goal of this subcommand is to eventually provide similar functionality
to git-sizer(1), but natively in Git.
The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remote refs,
and other reference types. The corresponding information is displayed in
a human-friendly table formatted in a very similar manner to
git-sizer(1). The width of each table column is adjusted automatically
to satisfy the requirements of the widest row contained.
Subsequent commits will surface additional relevant data points to
output and also provide other more machine-friendly output formats.
Based-on-patch-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
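For illustration only (not part of the patch): a standalone sketch of
the column-width adjustment described above. It assumes ASCII-only
cells and measures them with strlen(), whereas the patch measures
display width. The `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sketch_row {
	const char *name;
	const char *value;
};

/* Widest name across all rows, never narrower than the column title. */
static int sketch_name_width(const struct sketch_row *rows, size_t nr,
			     const char *title)
{
	int width = (int)strlen(title);

	for (size_t i = 0; i < nr; i++) {
		int w = (int)strlen(rows[i].name);
		if (w > width)
			width = w;
	}
	return width;
}

/*
 * Format one row into buf: names are left-aligned ("%-*s") and values
 * right-aligned ("%*s"), each padded to the width passed in.
 */
static int sketch_format_row(char *buf, size_t len,
			     const struct sketch_row *row,
			     int name_width, int value_width)
{
	assert(buf != NULL && row != NULL);
	return snprintf(buf, len, "| %-*s | %*s |",
			name_width, row->name, value_width, row->value);
}
```

Passing the width through `*` keeps a single format string usable for
any table, since the padding is decided only after all rows are known.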
Documentation/git-repo.adoc | 10 ++
builtin/repo.c | 200 ++++++++++++++++++++++++++++++++++++
t/meson.build | 1 +
t/t1901-repo-structure.sh | 61 +++++++++++
4 files changed, 272 insertions(+)
create mode 100755 t/t1901-repo-structure.sh
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 209afd1b61..8193298dd5 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,6 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
+git repo structure
DESCRIPTION
-----------
@@ -43,6 +44,15 @@ supported:
+
`-z` is an alias for `--format=nul`.
+`structure`::
+ Retrieve statistics about the current repository structure. The
+ following kinds of information are reported:
++
+* Reference counts categorized by type
+
++
+The table output format may change and is not intended for machine parsing.
+
INFO KEYS
---------
In order to obtain a set of values from `git repo info`, you should provide
diff --git a/builtin/repo.c b/builtin/repo.c
index eeeab8fbd2..e77e8db563 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,12 +4,16 @@
#include "environment.h"
#include "parse-options.h"
#include "quote.h"
+#include "ref-filter.h"
#include "refs.h"
#include "strbuf.h"
+#include "string-list.h"
#include "shallow.h"
+#include "utf8.h"
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
+ "git repo structure",
NULL
};
@@ -156,12 +160,208 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
return print_fields(argc, argv, repo, format);
}
+struct ref_stats {
+ size_t branches;
+ size_t remotes;
+ size_t tags;
+ size_t others;
+};
+
+struct stats_table {
+ struct string_list rows;
+
+ int name_col_width;
+ int value_col_width;
+};
+
+/*
+ * Holds column data that gets stored for each row.
+ */
+struct stats_table_entry {
+ char *value;
+};
+
+static void stats_table_vaddf(struct stats_table *table,
+ struct stats_table_entry *entry,
+ const char *format, va_list ap)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct string_list_item *item;
+ char *formatted_name;
+ int name_width;
+
+ strbuf_vaddf(&buf, format, ap);
+ formatted_name = strbuf_detach(&buf, NULL);
+ name_width = utf8_strwidth(formatted_name);
+
+ item = string_list_append_nodup(&table->rows, formatted_name);
+ item->util = entry;
+
+ if (name_width > table->name_col_width)
+ table->name_col_width = name_width;
+ if (entry) {
+ int value_width = utf8_strwidth(entry->value);
+ if (value_width > table->value_col_width)
+ table->value_col_width = value_width;
+ }
+}
+
+static void stats_table_addf(struct stats_table *table, const char *format, ...)
+{
+ va_list ap;
+
+ va_start(ap, format);
+ stats_table_vaddf(table, NULL, format, ap);
+ va_end(ap);
+}
+
+static void stats_table_count_addf(struct stats_table *table, size_t value,
+ const char *format, ...)
+{
+ struct stats_table_entry *entry;
+ va_list ap;
+
+ CALLOC_ARRAY(entry, 1);
+ entry->value = xstrfmt("%" PRIuMAX, (uintmax_t)value);
+
+ va_start(ap, format);
+ stats_table_vaddf(table, entry, format, ap);
+ va_end(ap);
+}
+
+static inline size_t get_total_reference_count(struct ref_stats *stats)
+{
+ return stats->branches + stats->remotes + stats->tags + stats->others;
+}
+
+static void stats_table_setup_structure(struct stats_table *table,
+ struct ref_stats *refs)
+{
+ size_t ref_total;
+
+ ref_total = get_total_reference_count(refs);
+ stats_table_addf(table, "* %s", _("References"));
+ stats_table_count_addf(table, ref_total, " * %s", _("Count"));
+ stats_table_count_addf(table, refs->branches, " * %s", _("Branches"));
+ stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
+ stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
+ stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+}
+
+static void stats_table_print_structure(const struct stats_table *table)
+{
+ const char *name_col_title = _("Repository structure");
+ const char *value_col_title = _("Value");
+ int name_col_width = utf8_strwidth(name_col_title);
+ int value_col_width = utf8_strwidth(value_col_title);
+ struct string_list_item *item;
+
+ if (table->name_col_width > name_col_width)
+ name_col_width = table->name_col_width;
+ if (table->value_col_width > value_col_width)
+ value_col_width = table->value_col_width;
+
+ printf("| %-*s | %-*s |\n", name_col_width, name_col_title,
+ value_col_width, value_col_title);
+ printf("| ");
+ for (int i = 0; i < name_col_width; i++)
+ putchar('-');
+ printf(" | ");
+ for (int i = 0; i < value_col_width; i++)
+ putchar('-');
+ printf(" |\n");
+
+ for_each_string_list_item(item, &table->rows) {
+ struct stats_table_entry *entry = item->util;
+ const char *value = "";
+
+ if (entry) {
+ struct stats_table_entry *entry = item->util;
+ value = entry->value;
+ }
+
+ printf("| %-*s | %*s |\n", name_col_width, item->string,
+ value_col_width, value);
+ }
+}
+
+static void stats_table_clear(struct stats_table *table)
+{
+ struct stats_table_entry *entry;
+ struct string_list_item *item;
+
+ for_each_string_list_item(item, &table->rows) {
+ entry = item->util;
+ if (entry)
+ free(entry->value);
+ }
+
+ string_list_clear(&table->rows, 1);
+}
+
+static int count_references(const char *refname,
+ const char *referent UNUSED,
+ const struct object_id *oid UNUSED,
+ int flags UNUSED, void *cb_data)
+{
+ struct ref_stats *stats = cb_data;
+
+ switch (ref_kind_from_refname(refname)) {
+ case FILTER_REFS_BRANCHES:
+ stats->branches++;
+ break;
+ case FILTER_REFS_REMOTES:
+ stats->remotes++;
+ break;
+ case FILTER_REFS_TAGS:
+ stats->tags++;
+ break;
+ case FILTER_REFS_OTHERS:
+ stats->others++;
+ break;
+ default:
+ BUG("unexpected reference type");
+ }
+
+ return 0;
+}
+
+static void structure_count_references(struct ref_stats *stats,
+ struct repository *repo)
+{
+ refs_for_each_ref(get_main_ref_store(repo), count_references, &stats);
+}
+
+static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
+{
+ struct stats_table table = {
+ .rows = STRING_LIST_INIT_DUP,
+ };
+ struct ref_stats stats = { 0 };
+ struct option options[] = { 0 };
+
+ argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (argc)
+ usage(_("too many arguments"));
+
+ structure_count_references(&stats, repo);
+
+ stats_table_setup_structure(&table, &stats);
+ stats_table_print_structure(&table);
+
+ stats_table_clear(&table);
+
+ return 0;
+}
+
int cmd_repo(int argc, const char **argv, const char *prefix,
struct repository *repo)
{
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
OPT_SUBCOMMAND("info", &fn, cmd_repo_info),
+ OPT_SUBCOMMAND("structure", &fn, cmd_repo_structure),
OPT_END()
};
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..9e426f8edc 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -236,6 +236,7 @@ integration_tests = [
't1701-racy-split-index.sh',
't1800-hook.sh',
't1900-repo.sh',
+ 't1901-repo-structure.sh',
't2000-conflict-when-checking-files-out.sh',
't2002-checkout-cache-u.sh',
't2003-checkout-cache-mkdir.sh',
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
new file mode 100755
index 0000000000..e592eea0eb
--- /dev/null
+++ b/t/t1901-repo-structure.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='test git repo structure'
+
+. ./test-lib.sh
+
+test_expect_success 'empty repository' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ cat >expect <<-\EOF &&
+ | Repository structure | Value |
+ | -------------------- | ----- |
+ | * References | |
+ | * Count | 0 |
+ | * Branches | 0 |
+ | * Tags | 0 |
+ | * Remotes | 0 |
+ | * Others | 0 |
+ EOF
+
+ git repo structure >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_expect_success 'repository with references' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m init &&
+ git tag -a foo -m bar &&
+
+ oid="$(git rev-parse HEAD)" &&
+ git update-ref refs/remotes/origin/foo "$oid" &&
+
+ git notes add -m foo &&
+
+ cat >expect <<-\EOF &&
+ | Repository structure | Value |
+ | -------------------- | ----- |
+ | * References | |
+ | * Count | 4 |
+ | * Branches | 1 |
+ | * Tags | 1 |
+ | * Remotes | 1 |
+ | * Others | 1 |
+ EOF
+
+ git repo structure >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err
+ )
+'
+
+test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v6 5/7] builtin/repo: add object counts in structure output
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
` (3 preceding siblings ...)
2025-10-21 18:25 ` [PATCH v6 4/7] builtin/repo: introduce structure subcommand Justin Tobler
@ 2025-10-21 18:25 ` Justin Tobler
2025-10-21 18:26 ` [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
` (3 subsequent siblings)
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 18:25 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
The number of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
number of reachable objects in the repository by object type. All
regular references are used to determine the reachable set of objects.
The object counts are appended to the same table containing the
reference information.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
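For illustration only (not part of the patch): a generic sketch of
doing two jobs in one iteration by bundling several pointers into a
single context struct handed to the callback through `void *cb_data`.
All `sketch_` names are hypothetical, and the fixed-size pending list
is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_each_fn)(const char *name, void *cb_data);

struct sketch_counts {
	size_t seen;
};

struct sketch_pending {
	const char *names[8];
	size_t nr;
};

/* One context struct so a single callback can update both targets. */
struct sketch_cb_data {
	struct sketch_counts *counts;
	struct sketch_pending *pending;
};

/* Count the entry and also queue it for a later walk. */
static int sketch_visit(const char *name, void *cb_data)
{
	struct sketch_cb_data *data = cb_data;

	assert(data && data->counts && data->pending);
	data->counts->seen++;
	if (data->pending->nr < 8)
		data->pending->names[data->pending->nr++] = name;
	return 0;
}

/* Call fn for each name; stop early on a non-zero return value. */
static int sketch_for_each(const char **names, size_t nr,
			   sketch_each_fn fn, void *cb_data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(names[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}
```

The caller passes the address of the context struct itself, so the cast
inside the callback recovers exactly the type that was handed in.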
Documentation/git-repo.adoc | 1 +
builtin/repo.c | 105 +++++++++++++++++++++++++++++++++---
t/t1901-repo-structure.sh | 19 ++++++-
3 files changed, 117 insertions(+), 8 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 8193298dd5..ae62d2415f 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -49,6 +49,7 @@ supported:
following kinds of information are reported:
+
* Reference counts categorized by type
+* Reachable object counts categorized by type
+
The table output format may change and is not intended for machine parsing.
diff --git a/builtin/repo.c b/builtin/repo.c
index e77e8db563..f39f06ee8c 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -3,9 +3,11 @@
#include "builtin.h"
#include "environment.h"
#include "parse-options.h"
+#include "path-walk.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
+#include "revision.h"
#include "strbuf.h"
#include "string-list.h"
#include "shallow.h"
@@ -167,6 +169,18 @@ struct ref_stats {
size_t others;
};
+struct object_stats {
+ size_t tags;
+ size_t commits;
+ size_t trees;
+ size_t blobs;
+};
+
+struct repo_structure {
+ struct ref_stats refs;
+ struct object_stats objects;
+};
+
struct stats_table {
struct string_list rows;
@@ -234,9 +248,17 @@ static inline size_t get_total_reference_count(struct ref_stats *stats)
return stats->branches + stats->remotes + stats->tags + stats->others;
}
+static inline size_t get_total_object_count(struct object_stats *stats)
+{
+ return stats->tags + stats->commits + stats->trees + stats->blobs;
+}
+
static void stats_table_setup_structure(struct stats_table *table,
- struct ref_stats *refs)
+ struct repo_structure *stats)
{
+ struct object_stats *objects = &stats->objects;
+ struct ref_stats *refs = &stats->refs;
+ size_t object_total;
size_t ref_total;
ref_total = get_total_reference_count(refs);
@@ -246,6 +268,15 @@ static void stats_table_setup_structure(struct stats_table *table,
stats_table_count_addf(table, refs->tags, " * %s", _("Tags"));
stats_table_count_addf(table, refs->remotes, " * %s", _("Remotes"));
stats_table_count_addf(table, refs->others, " * %s", _("Others"));
+
+ object_total = get_total_object_count(objects);
+ stats_table_addf(table, "");
+ stats_table_addf(table, "* %s", _("Reachable objects"));
+ stats_table_count_addf(table, object_total, " * %s", _("Count"));
+ stats_table_count_addf(table, objects->commits, " * %s", _("Commits"));
+ stats_table_count_addf(table, objects->trees, " * %s", _("Trees"));
+ stats_table_count_addf(table, objects->blobs, " * %s", _("Blobs"));
+ stats_table_count_addf(table, objects->tags, " * %s", _("Tags"));
}
static void stats_table_print_structure(const struct stats_table *table)
@@ -299,12 +330,18 @@ static void stats_table_clear(struct stats_table *table)
string_list_clear(&table->rows, 1);
}
+struct count_references_data {
+ struct ref_stats *stats;
+ struct rev_info *revs;
+};
+
static int count_references(const char *refname,
const char *referent UNUSED,
- const struct object_id *oid UNUSED,
+ const struct object_id *oid,
int flags UNUSED, void *cb_data)
{
- struct ref_stats *stats = cb_data;
+ struct count_references_data *data = cb_data;
+ struct ref_stats *stats = data->stats;
switch (ref_kind_from_refname(refname)) {
case FILTER_REFS_BRANCHES:
@@ -323,13 +360,64 @@ static int count_references(const char *refname,
BUG("unexpected reference type");
}
+ /*
+ * While iterating through references for counting, also add OIDs in
+ * preparation for the path walk.
+ */
+ add_pending_oid(data->revs, NULL, oid, 0);
+
return 0;
}
static void structure_count_references(struct ref_stats *stats,
+ struct rev_info *revs,
struct repository *repo)
{
- refs_for_each_ref(get_main_ref_store(repo), count_references, &stats);
+ struct count_references_data data = {
+ .stats = stats,
+ .revs = revs,
+ };
+
+ refs_for_each_ref(get_main_ref_store(repo), count_references, &data);
+}
+
+
+static int count_objects(const char *path UNUSED, struct oid_array *oids,
+ enum object_type type, void *cb_data)
+{
+ struct object_stats *stats = cb_data;
+
+ switch (type) {
+ case OBJ_TAG:
+ stats->tags += oids->nr;
+ break;
+ case OBJ_COMMIT:
+ stats->commits += oids->nr;
+ break;
+ case OBJ_TREE:
+ stats->trees += oids->nr;
+ break;
+ case OBJ_BLOB:
+ stats->blobs += oids->nr;
+ break;
+ default:
+ BUG("invalid object type");
+ }
+
+ return 0;
+}
+
+static void structure_count_objects(struct object_stats *stats,
+ struct rev_info *revs)
+{
+ struct path_walk_info info = PATH_WALK_INFO_INIT;
+
+ info.revs = revs;
+ info.path_fn = count_objects;
+ info.path_fn_data = stats;
+
+ walk_objects_by_path(&info);
+ path_walk_info_clear(&info);
}
static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
@@ -338,19 +426,24 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
- struct ref_stats stats = { 0 };
+ struct repo_structure stats = { 0 };
+ struct rev_info revs;
struct option options[] = { 0 };
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
usage(_("too many arguments"));
- structure_count_references(&stats, repo);
+ repo_init_revisions(repo, &revs, prefix);
+
+ structure_count_references(&stats.refs, &revs, repo);
+ structure_count_objects(&stats.objects, &revs);
stats_table_setup_structure(&table, &stats);
stats_table_print_structure(&table);
stats_table_clear(&table);
+ release_revisions(&revs);
return 0;
}
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index e592eea0eb..c32cf4e239 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -18,6 +18,13 @@ test_expect_success 'empty repository' '
| * Tags | 0 |
| * Remotes | 0 |
| * Others | 0 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 0 |
+ | * Commits | 0 |
+ | * Trees | 0 |
+ | * Blobs | 0 |
+ | * Tags | 0 |
EOF
git repo structure >out 2>err &&
@@ -27,17 +34,18 @@ test_expect_success 'empty repository' '
)
'
-test_expect_success 'repository with references' '
+test_expect_success 'repository with references and objects' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
- git commit --allow-empty -m init &&
+ test_commit_bulk 42 &&
git tag -a foo -m bar &&
oid="$(git rev-parse HEAD)" &&
git update-ref refs/remotes/origin/foo "$oid" &&
+ # Also creates a commit, tree, and blob.
git notes add -m foo &&
cat >expect <<-\EOF &&
@@ -49,6 +57,13 @@ test_expect_success 'repository with references' '
| * Tags | 1 |
| * Remotes | 1 |
| * Others | 1 |
+ | | |
+ | * Reachable objects | |
+ | * Count | 130 |
+ | * Commits | 43 |
+ | * Trees | 43 |
+ | * Blobs | 43 |
+ | * Tags | 1 |
EOF
git repo structure >out 2>err &&
--
2.51.0.193.g4975ec3473b
* [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
` (4 preceding siblings ...)
2025-10-21 18:25 ` [PATCH v6 5/7] builtin/repo: add object counts in structure output Justin Tobler
@ 2025-10-21 18:26 ` Justin Tobler
2025-10-22 20:34 ` Lucas Seiki Oshiro
2025-10-21 18:26 ` [PATCH v6 7/7] builtin/repo: add progress meter " Justin Tobler
` (2 subsequent siblings)
8 siblings, 1 reply; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 18:26 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
All repository structure stats are outputted in a human-friendly table
form. This format is not suitable for machine parsing. Add a --format
option that supports three output modes: `table`, `keyvalue`, and `nul`.
The `table` mode is the default format and prints the same table output
as before.
With the `keyvalue` mode, each line of output contains a key-value pair
of a repository stat. The '=' character is used to delimit between keys
and values. The `nul` mode is similar to `keyvalue`, but key-values are
delimited by a NUL character instead of a newline. Also, instead of a
'=' character to delimit between keys and values, a newline character is
used. This allows stat values to support special characters without
having to cquote them. These two new modes provide output that is more
machine-friendly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
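A note for consumers of the `nul` format described above: the following is
a minimal sketch, not part of this patch, of how a caller might split such
output into keys and values. It assumes the output has already been read
into a writable buffer of known length; `stat_fn`, `parse_nul_records()`,
`struct stat_lookup` and `lookup_stat()` are hypothetical names made up for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one NUL-terminated key and value. */
typedef void (*stat_fn)(const char *key, const char *value, void *data);

/*
 * Walk a buffer holding `--format=nul` style output: each record is
 * terminated by NUL, and the first newline in a record separates the
 * key from the value. The buffer is modified in place (that newline is
 * overwritten with NUL). Returns the number of records seen, or -1 if
 * a record has no newline or the last record is unterminated.
 */
static int parse_nul_records(char *buf, size_t len, stat_fn fn, void *data)
{
	size_t pos = 0;
	int count = 0;

	assert(buf || !len);

	while (pos < len) {
		char *record = buf + pos;
		char *end = memchr(record, '\0', len - pos);
		char *nl;

		if (!end)
			return -1; /* unterminated trailing record */

		nl = memchr(record, '\n', (size_t)(end - record));
		if (!nl)
			return -1; /* key without a value */

		*nl = '\0';
		fn(record, nl + 1, data);

		count++;
		pos = (size_t)(end - buf) + 1;
	}

	return count;
}

/* Hypothetical helper: remember the value of one key of interest. */
struct stat_lookup {
	const char *key;
	const char *value;
};

static void lookup_stat(const char *key, const char *value, void *data)
{
	struct stat_lookup *lookup = data;

	if (!strcmp(key, lookup->key))
		lookup->value = value;
}
```

A caller interested in a single stat could pass `lookup_stat` together with a
`struct stat_lookup` whose `key` is, for example, "objects.commits.count".
Because values are never quoted in this format, no unquoting step is needed
after the split.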
Documentation/git-repo.adoc | 25 +++++++++++++++--
builtin/repo.c | 55 ++++++++++++++++++++++++++++++++++---
t/t1901-repo-structure.sh | 33 ++++++++++++++++++++++
3 files changed, 106 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index ae62d2415f..ce43cb19c8 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[synopsis]
git repo info [--format=(keyvalue|nul)] [-z] [<key>...]
-git repo structure
+git repo structure [--format=(table|keyvalue|nul)]
DESCRIPTION
-----------
@@ -44,7 +44,7 @@ supported:
+
`-z` is an alias for `--format=nul`.
-`structure`::
+`structure [--format=(table|keyvalue|nul)]`::
Retrieve statistics about the current repository structure. The
following kinds of information are reported:
+
@@ -52,7 +52,26 @@ supported:
* Reachable object counts categorized by type
+
-The table output format may change and is not intended for machine parsing.
+The output format can be chosen through the flag `--format`. Three formats are
+supported:
++
+`table`:::
+ Outputs repository stats in a human-friendly table. This format may
+ change and is not intended for machine parsing. This is the default
+ format.
+
+`keyvalue`:::
+ Each line of output contains a key-value pair for a repository stat.
+ The '=' character is used to delimit between the key and the value.
+ Values containing "unusual" characters are quoted as explained for the
+ configuration variable `core.quotePath` (see linkgit:git-config[1]).
+
+`nul`:::
+ Similar to `keyvalue`, but uses a NUL character to delimit between
+ key-value pairs instead of a newline. Also uses a newline character as
+ the delimiter between the key and value instead of '='. Unlike the
+ `keyvalue` format, values containing "unusual" characters are never
+ quoted.
INFO KEYS
---------
diff --git a/builtin/repo.c b/builtin/repo.c
index f39f06ee8c..1754cc7e5d 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -15,13 +15,14 @@
static const char *const repo_usage[] = {
"git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
- "git repo structure",
+ "git repo structure [--format=(table|keyvalue|nul)]",
NULL
};
typedef int get_value_fn(struct repository *repo, struct strbuf *buf);
enum output_format {
+ FORMAT_TABLE,
FORMAT_KEYVALUE,
FORMAT_NUL_TERMINATED,
};
@@ -136,6 +137,8 @@ static int parse_format_cb(const struct option *opt,
*format = FORMAT_NUL_TERMINATED;
else if (!strcmp(arg, "keyvalue"))
*format = FORMAT_KEYVALUE;
+ else if (!strcmp(arg, "table"))
+ *format = FORMAT_TABLE;
else
die(_("invalid format '%s'"), arg);
@@ -158,6 +161,8 @@ static int cmd_repo_info(int argc, const char **argv, const char *prefix,
};
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
+ if (format != FORMAT_KEYVALUE && format != FORMAT_NUL_TERMINATED)
+ die(_("unsupported output format"));
return print_fields(argc, argv, repo, format);
}
@@ -330,6 +335,30 @@ static void stats_table_clear(struct stats_table *table)
string_list_clear(&table->rows, 1);
}
+static void structure_keyvalue_print(struct repo_structure *stats,
+ char key_delim, char value_delim)
+{
+ printf("references.branches.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.branches, value_delim);
+ printf("references.tags.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.tags, value_delim);
+ printf("references.remotes.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.remotes, value_delim);
+ printf("references.others.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->refs.others, value_delim);
+
+ printf("objects.commits.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.commits, value_delim);
+ printf("objects.trees.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.trees, value_delim);
+ printf("objects.blobs.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.blobs, value_delim);
+ printf("objects.tags.count%c%" PRIuMAX "%c", key_delim,
+ (uintmax_t)stats->objects.tags, value_delim);
+
+ fflush(stdout);
+}
+
struct count_references_data {
struct ref_stats *stats;
struct rev_info *revs;
@@ -426,9 +455,15 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
struct stats_table table = {
.rows = STRING_LIST_INIT_DUP,
};
+ enum output_format format = FORMAT_TABLE;
struct repo_structure stats = { 0 };
struct rev_info revs;
- struct option options[] = { 0 };
+ struct option options[] = {
+ OPT_CALLBACK_F(0, "format", &format, N_("format"),
+ N_("output format"),
+ PARSE_OPT_NONEG, parse_format_cb),
+ OPT_END()
+ };
argc = parse_options(argc, argv, prefix, options, repo_usage, 0);
if (argc)
@@ -439,8 +474,20 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
structure_count_references(&stats.refs, &revs, repo);
structure_count_objects(&stats.objects, &revs);
- stats_table_setup_structure(&table, &stats);
- stats_table_print_structure(&table);
+ switch (format) {
+ case FORMAT_TABLE:
+ stats_table_setup_structure(&table, &stats);
+ stats_table_print_structure(&table);
+ break;
+ case FORMAT_KEYVALUE:
+ structure_keyvalue_print(&stats, '=', '\n');
+ break;
+ case FORMAT_NUL_TERMINATED:
+ structure_keyvalue_print(&stats, '\n', '\0');
+ break;
+ default:
+ BUG("invalid output format");
+ }
stats_table_clear(&table);
release_revisions(&revs);
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index c32cf4e239..14bd8aede5 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -73,4 +73,37 @@ test_expect_success 'repository with references and objects' '
)
'
+test_expect_success 'keyvalue and nul format' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit_bulk 42 &&
+ git tag -a foo -m bar &&
+
+ cat >expect <<-\EOF &&
+ references.branches.count=1
+ references.tags.count=1
+ references.remotes.count=0
+ references.others.count=0
+ objects.commits.count=42
+ objects.trees.count=42
+ objects.blobs.count=42
+ objects.tags.count=1
+ EOF
+
+ git repo structure --format=keyvalue >out 2>err &&
+
+ test_cmp expect out &&
+ test_line_count = 0 err &&
+
+ # Replace key and value delimiters for nul format.
+ tr "\n=" "\0\n" <expect >expect_nul &&
+ git repo structure --format=nul >out 2>err &&
+
+ test_cmp expect_nul out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* [PATCH v6 7/7] builtin/repo: add progress meter for structure stats
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
` (5 preceding siblings ...)
2025-10-21 18:26 ` [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
@ 2025-10-21 18:26 ` Justin Tobler
2025-10-22 19:23 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Lucas Seiki Oshiro
2025-10-23 20:54 ` Junio C Hamano
8 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-21 18:26 UTC (permalink / raw)
To: git; +Cc: ps, karthik.188, sunshine, gitster, Justin Tobler
When using the structure subcommand for git-repo(1), evaluating a
repository may take some time depending on its shape. Add a progress
meter to provide feedback to the user about what is happening. The
progress meter is enabled by default when the command is executed from a
tty. It can also be explicitly enabled/disabled via the --[no-]progress
option.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
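The default described above (explicit option wins, otherwise follow the
tty) can be read as a small tri-state resolution. The following is a
minimal standalone sketch of that logic, not the code from this patch;
`resolve_show_progress()` and `show_progress_default()` are hypothetical
helper names, and the value -1 is assumed to mean "option not given".

```c
#include <assert.h>
#include <unistd.h>

/*
 * Resolve a tri-state progress setting: -1 means "not specified on
 * the command line", 0 means --no-progress, and 1 means --progress.
 * An explicit choice always wins; otherwise fall back to whether
 * stderr is a terminal.
 */
static int resolve_show_progress(int requested, int stderr_is_tty)
{
	assert(requested >= -1 && requested <= 1);

	if (requested < 0)
		return stderr_is_tty ? 1 : 0;
	return requested;
}

/* Convenience wrapper that asks the OS about stderr (fd 2). */
static int show_progress_default(int requested)
{
	return resolve_show_progress(requested, isatty(STDERR_FILENO));
}
```

Keeping the tty check as a parameter in `resolve_show_progress()` is only a
testing convenience: it lets the decision be exercised without a real
terminal.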
builtin/repo.c | 46 ++++++++++++++++++++++++++++++++++-----
t/t1901-repo-structure.sh | 20 +++++++++++++++++
2 files changed, 60 insertions(+), 6 deletions(-)
diff --git a/builtin/repo.c b/builtin/repo.c
index 1754cc7e5d..9d4749f79b 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -4,6 +4,7 @@
#include "environment.h"
#include "parse-options.h"
#include "path-walk.h"
+#include "progress.h"
#include "quote.h"
#include "ref-filter.h"
#include "refs.h"
@@ -362,6 +363,7 @@ static void structure_keyvalue_print(struct repo_structure *stats,
struct count_references_data {
struct ref_stats *stats;
struct rev_info *revs;
+ struct progress *progress;
};
static int count_references(const char *refname,
@@ -371,6 +373,7 @@ static int count_references(const char *refname,
{
struct count_references_data *data = cb_data;
struct ref_stats *stats = data->stats;
+ size_t ref_count;
switch (ref_kind_from_refname(refname)) {
case FILTER_REFS_BRANCHES:
@@ -395,26 +398,41 @@ static int count_references(const char *refname,
*/
add_pending_oid(data->revs, NULL, oid, 0);
+ ref_count = get_total_reference_count(stats);
+ display_progress(data->progress, ref_count);
+
return 0;
}
static void structure_count_references(struct ref_stats *stats,
struct rev_info *revs,
- struct repository *repo)
+ struct repository *repo,
+ int show_progress)
{
struct count_references_data data = {
.stats = stats,
.revs = revs,
};
+ if (show_progress)
+ data.progress = start_delayed_progress(repo,
+ _("Counting references"), 0);
+
refs_for_each_ref(get_main_ref_store(repo), count_references, &data);
+ stop_progress(&data.progress);
}
+struct count_objects_data {
+ struct object_stats *stats;
+ struct progress *progress;
+};
static int count_objects(const char *path UNUSED, struct oid_array *oids,
enum object_type type, void *cb_data)
{
- struct object_stats *stats = cb_data;
+ struct count_objects_data *data = cb_data;
+ struct object_stats *stats = data->stats;
+ size_t object_count;
switch (type) {
case OBJ_TAG:
@@ -433,20 +451,31 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids,
BUG("invalid object type");
}
+ object_count = get_total_object_count(stats);
+ display_progress(data->progress, object_count);
+
return 0;
}
static void structure_count_objects(struct object_stats *stats,
- struct rev_info *revs)
+ struct rev_info *revs,
+ struct repository *repo, int show_progress)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
+ struct count_objects_data data = {
+ .stats = stats,
+ };
info.revs = revs;
info.path_fn = count_objects;
- info.path_fn_data = stats;
+ info.path_fn_data = &data;
+
+ if (show_progress)
+ data.progress = start_delayed_progress(repo, _("Counting objects"), 0);
walk_objects_by_path(&info);
path_walk_info_clear(&info);
+ stop_progress(&data.progress);
}
static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
@@ -458,10 +487,12 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
enum output_format format = FORMAT_TABLE;
struct repo_structure stats = { 0 };
struct rev_info revs;
+ int show_progress = -1;
struct option options[] = {
OPT_CALLBACK_F(0, "format", &format, N_("format"),
N_("output format"),
PARSE_OPT_NONEG, parse_format_cb),
+ OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
OPT_END()
};
@@ -471,8 +502,11 @@ static int cmd_repo_structure(int argc, const char **argv, const char *prefix,
repo_init_revisions(repo, &revs, prefix);
- structure_count_references(&stats.refs, &revs, repo);
- structure_count_objects(&stats.objects, &revs);
+ if (show_progress < 0)
+ show_progress = isatty(2);
+
+ structure_count_references(&stats.refs, &revs, repo, show_progress);
+ structure_count_objects(&stats.objects, &revs, repo, show_progress);
switch (format) {
case FORMAT_TABLE:
diff --git a/t/t1901-repo-structure.sh b/t/t1901-repo-structure.sh
index 14bd8aede5..36a71a144e 100755
--- a/t/t1901-repo-structure.sh
+++ b/t/t1901-repo-structure.sh
@@ -106,4 +106,24 @@ test_expect_success 'keyvalue and nul format' '
)
'
+test_expect_success 'progress meter option' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit foo &&
+
+ GIT_PROGRESS_DELAY=0 git repo structure --progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_grep "Counting references: 2, done." err &&
+ test_grep "Counting objects: 3, done." err &&
+
+ GIT_PROGRESS_DELAY=0 git repo structure --no-progress >out 2>err &&
+
+ test_file_not_empty out &&
+ test_line_count = 0 err
+ )
+'
+
test_done
--
2.51.0.193.g4975ec3473b
* Re: [PATCH v6 4/7] builtin/repo: introduce structure subcommand
2025-10-21 18:25 ` [PATCH v6 4/7] builtin/repo: introduce structure subcommand Justin Tobler
@ 2025-10-22 5:01 ` Patrick Steinhardt
2025-10-22 13:50 ` Justin Tobler
2025-10-22 20:15 ` Lucas Seiki Oshiro
1 sibling, 1 reply; 92+ messages in thread
From: Patrick Steinhardt @ 2025-10-22 5:01 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, karthik.188, sunshine, gitster, Derrick Stolee
On Tue, Oct 21, 2025 at 01:25:58PM -0500, Justin Tobler wrote:
[snip]
> +static int count_references(const char *refname,
> + const char *referent UNUSED,
> + const struct object_id *oid UNUSED,
> + int flags UNUSED, void *cb_data)
Tiniest nit, not worth a reroll: we tend to use these callbacks in
singular, as you end up doing the thing for one specific entity.
Other than that this series looks good to me now and is ready to be
merged from my point of view. Thanks!
Patrick
* Re: [PATCH v6 4/7] builtin/repo: introduce structure subcommand
2025-10-22 5:01 ` Patrick Steinhardt
@ 2025-10-22 13:50 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-22 13:50 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, karthik.188, sunshine, gitster, Derrick Stolee
On 25/10/22 07:01AM, Patrick Steinhardt wrote:
> On Tue, Oct 21, 2025 at 01:25:58PM -0500, Justin Tobler wrote:
> > +static int count_references(const char *refname,
> > + const char *referent UNUSED,
> > + const struct object_id *oid UNUSED,
> > + int flags UNUSED, void *cb_data)
>
> Tiniest nit, not worth a reroll: we tend to use these callbacks in
> singular, as you end up doing the thing for one specific entity.
Good to know and makes sense. I'll hold off from sending another version
for now. I can also probably adjust as part of my planned followup
series.
> Other than that this series looks good to me now and is ready to be
> merged from my point of view. Thanks!
Thanks for the review.
-Justin
* Re: [PATCH v6 0/7] builtin/repo: introduce structure subcommand
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
` (6 preceding siblings ...)
2025-10-21 18:26 ` [PATCH v6 7/7] builtin/repo: add progress meter " Justin Tobler
@ 2025-10-22 19:23 ` Lucas Seiki Oshiro
2025-10-23 0:05 ` Justin Tobler
2025-10-23 20:54 ` Junio C Hamano
8 siblings, 1 reply; 92+ messages in thread
From: Lucas Seiki Oshiro @ 2025-10-22 19:23 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, sunshine, gitster
Hi, Justin!
Nice to see this happening. I'm really happy to see `git repo` getting
new features, and I think that these new features will be very useful
especially for people who research on free/open source software.
Sorry for only reviewing this today, I've been busy finishing my master's
and I didn't have enough time to see your work here. Can you please CC
me in the next versions?
* Re: [PATCH v6 4/7] builtin/repo: introduce structure subcommand
2025-10-21 18:25 ` [PATCH v6 4/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-22 5:01 ` Patrick Steinhardt
@ 2025-10-22 20:15 ` Lucas Seiki Oshiro
2025-10-22 23:42 ` Justin Tobler
1 sibling, 1 reply; 92+ messages in thread
From: Lucas Seiki Oshiro @ 2025-10-22 20:15 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, sunshine, gitster, Derrick Stolee
> +static void stats_table_print_structure(const struct stats_table *table)
Question: isn't it possible to use the tables from column.c by
allowing them to use a cell delimiter? This table formatter seems to
be useful for other places, like git-repo-info itself.
* Re: [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats
2025-10-21 18:26 ` [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
@ 2025-10-22 20:34 ` Lucas Seiki Oshiro
2025-10-23 0:03 ` Justin Tobler
0 siblings, 1 reply; 92+ messages in thread
From: Lucas Seiki Oshiro @ 2025-10-22 20:34 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, sunshine, gitster
> All repository structure stats are outputted in a human-friendly table
> form. This format is not suitable for machine parsing. Add a --format
> option that supports three output modes: `table`, `keyvalue`, and `nul`.
> The `table` mode is the default format and prints the same table output
> as before.
Now I'm thinking... What about making --format a flag for git-repo,
working for both git-repo-info and git-repo-structure? It doesn't seem
to be hard to make git-repo-info compatible with your table format and
it looks to me that it would make git-repo more consistent.
I'm also wondering if git-repo-info could use this table format by default.
git-repo-info and git-repo-structure are completely different under
the hood, but their interfaces are very similar and it seems to me that
they could be closer to each other.
> "git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
> - "git repo structure",
> + "git repo structure [--format=(table|keyvalue|nul)]",
Do you intend to add a key parameter like in git-repo-info? This way,
we could run:
git repo structure references.branches.count references.tags.count
and it would return only the branch and tag count.
* Re: [PATCH v6 4/7] builtin/repo: introduce structure subcommand
2025-10-22 20:15 ` Lucas Seiki Oshiro
@ 2025-10-22 23:42 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-22 23:42 UTC (permalink / raw)
To: Lucas Seiki Oshiro
Cc: git, ps, karthik.188, sunshine, gitster, Derrick Stolee
On 25/10/22 05:15PM, Lucas Seiki Oshiro wrote:
>
> > +static void stats_table_print_structure(const struct stats_table *table)
>
> Question: isn't it possible to use the tables from column.c by
> allowing them to use a cell delimiter? This table formatter seems to
> be useful for other places, like git-repo-info itself.
I opted to create a table formatter specific to stats output so we could be
more flexible with how individual column data is displayed.
git-repo-structure having its own table format allows it to more easily
configure things like cell value alignment column-by-column. Also, in a
followup series I plan to introduce measurement units to values
displayed in the table, but I still want to align values and units
separately while still being in the same cell. This sort of
configuration wouldn't fit well in the existing "column.c" table format
IMO.
I did try to design this new stats table format in a way that it could
be reused by other git-repo(1) subcommands if ever needed. So if we want
to use it for git-repo-info itself in the future, I think we could do so
relatively easily.
-Justin
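To make the point about per-column alignment concrete, here is a minimal
sketch, assuming a two-column row with a left-aligned name and a
right-aligned value. It is not the formatter from the series;
`format_row()` and its width parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one table row into `out`: the name is left-aligned and padded
 * to `name_width`, the value is right-aligned within `value_width`.
 * Returns the snprintf() result, i.e. the length the full row needs.
 */
static int format_row(char *out, size_t size,
		      const char *name, int name_width,
		      const char *value, int value_width)
{
	assert(out && name && value);

	return snprintf(out, size, "| %-*s | %*s |",
			name_width, name, value_width, value);
}
```

Since each column's alignment is chosen by its own conversion (`%-*s` versus
`%*s`), a later split of the value cell into a right-aligned number and a
left-aligned unit would only mean adding one more conversion to the format
string.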
* Re: [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats
2025-10-22 20:34 ` Lucas Seiki Oshiro
@ 2025-10-23 0:03 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-23 0:03 UTC (permalink / raw)
To: Lucas Seiki Oshiro; +Cc: git, ps, karthik.188, sunshine, gitster
On 25/10/22 05:34PM, Lucas Seiki Oshiro wrote:
>
> > All repository structure stats are outputted in a human-friendly table
> > form. This format is not suitable for machine parsing. Add a --format
> > option that supports three output modes: `table`, `keyvalue`, and `nul`.
> > The `table` mode is the default format and prints the same table output
> > as before.
>
> Now I'm thinking... What about making --format a flag for git-repo,
> working for both git-repo-info and git-repo-structure? It doesn't seem
> to be hard to make git-repo-info compatible with your table format and
> it looks to me that it would make git-repo more consistent.
Since currently these different subcommands don't support all the same
formats anyways, I don't think it matters too much that they both define
their own `--format` option. If git-repo-info were to add support for a
table format in the future though, I do think we could probably lift
this option up at that time if we wanted.
> I'm also wondering if git-repo-info could use this table format by default.
> git-repo-info and git-repo-structure are completely different under
> the hood, but their interfaces are very similar and it seems to me that
> they could be closer to each other.
I think that could be nice, but it should probably be done in a separate
series. I do like the idea of aligning the interfaces of these
subcommands though. :)
> > "git repo info [--format=(keyvalue|nul)] [-z] [<key>...]",
> > - "git repo structure",
> > + "git repo structure [--format=(table|keyvalue|nul)]",
>
> Do you intend to add a key parameter like in git-repo-info? This way,
> we could run:
>
> git repo structure references.branches.count references.tags.count
>
> and it would return only the branch and tag count.
Currently I do not intend to add key parameters similar to
git-repo-info. For git-repo-structure, I don't think it would add much
value to filter out certain values unless it would simplify overall
computation. In the case of `references.branches.count` and
`references.tags.count`, figuring these values out requires iterating
across all the references anyways. Even if we only wanted object count
data we would still have to iterate across all the references too.
If we just wanted reference data, we could have an option to skip the
path walking, but that would really limit the data we display and
probably not be that useful overall.
-Justin
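The argument above, that asking for only some counts would not avoid the
full iteration, can be seen in a minimal sketch of single-pass counting.
This assumes references are classified purely by refname prefix, which is
a simplification and not necessarily how `ref_kind_from_refname()` behaves;
`struct ref_counts`, `has_prefix()`, `count_one_ref()` and `count_refs()`
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ref_counts {
	size_t branches;
	size_t tags;
	size_t remotes;
	size_t others;
};

static int has_prefix(const char *str, const char *prefix)
{
	return !strncmp(str, prefix, strlen(prefix));
}

/* Simplified, hypothetical classification by refname prefix only. */
static void count_one_ref(struct ref_counts *counts, const char *refname)
{
	assert(counts && refname);

	if (has_prefix(refname, "refs/heads/"))
		counts->branches++;
	else if (has_prefix(refname, "refs/tags/"))
		counts->tags++;
	else if (has_prefix(refname, "refs/remotes/"))
		counts->remotes++;
	else
		counts->others++;
}

/* Every refname is visited once, whichever counters the caller wants. */
static void count_refs(struct ref_counts *counts,
		       const char *const *refnames, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		count_one_ref(counts, refnames[i]);
}
```

Even a caller that only reads `branches` and `tags` pays for the whole loop,
so filtering by key would change what is printed but not the cost of
collecting it.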
* Re: [PATCH v6 0/7] builtin/repo: introduce structure subcommand
2025-10-22 19:23 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Lucas Seiki Oshiro
@ 2025-10-23 0:05 ` Justin Tobler
0 siblings, 0 replies; 92+ messages in thread
From: Justin Tobler @ 2025-10-23 0:05 UTC (permalink / raw)
To: Lucas Seiki Oshiro; +Cc: git, ps, karthik.188, sunshine, gitster
On 25/10/22 04:23PM, Lucas Seiki Oshiro wrote:
>
> Hi, Justin!
>
> Nice to see this happening. I'm really happy to see `git repo` getting
> new features, and I think that these new features will be very useful
> especially for people who research on free/open source software.
>
> Sorry for only reviewing this today, I've been busy finishing my master's
> and I didn't have enough time to see your work here. Can you please CC
> me in the next versions?
Thanks Lucas for taking a look. If I do need to send out another
version, I'll make sure to CC you. :)
-Justin
* Re: [PATCH v6 0/7] builtin/repo: introduce structure subcommand
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
` (7 preceding siblings ...)
2025-10-22 19:23 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Lucas Seiki Oshiro
@ 2025-10-23 20:54 ` Junio C Hamano
2025-10-24 5:14 ` Patrick Steinhardt
8 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2025-10-23 20:54 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, ps, karthik.188, sunshine
Justin Tobler <jltobler@gmail.com> writes:
> In this initial version, the "structure" subcommand only surfaces counts
> of the various reference and object types in a repository. In a
> follow-up series, I would like to introduce additional data points that
> are present in git-sizer(1) such as largest objects, combined object
> sizes by type, and other general repository shape information.
>
> Some other general features that would be nice to introduce eventually:
>
> - A "level of concern" meter for reported stats. This could indicate to
> users which stats may be worth looking into further.
> - Links to OIDs of interesting objects that correspond to certain stats.
> - Options to limit which references to use when evaluating the
> repository.
>
> Changes since V5:
> - Instead of using `filter_refs()` to get an array of all references, we
> now use `refs_for_each_ref()` to count references, and setup OIDs for
> the path walk, in place. Doing this not only allows us to avoid
> wasting memory storing all the reference info, but also to display
> progress info to the user while iterating across the references
> initially.
> - Add a preparatory patch to export `ref_kind_from_refname()` via
> "ref-filter.h" so we can reuse logic to categorize references while
> counting.
This round looked pretty well done to me. Shall we declare victory
and mark it for 'next' real soon now?
Thanks.
* Re: [PATCH v6 0/7] builtin/repo: introduce structure subcommand
2025-10-23 20:54 ` Junio C Hamano
@ 2025-10-24 5:14 ` Patrick Steinhardt
0 siblings, 0 replies; 92+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 5:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Justin Tobler, git, karthik.188, sunshine
On Thu, Oct 23, 2025 at 01:54:17PM -0700, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
>
> > In this initial version, the "structure" subcommand only surfaces counts
> > of the various reference and object types in a repository. In a
> > follow-up series, I would like to introduce additional data points that
> > are present in git-sizer(1) such as largest objects, combined object
> > sizes by type, and other general repository shape information.
> >
> > Some other general features that would be nice to introduce eventually:
> >
> > - A "level of concern" meter for reported stats. This could indicate to
> > users which stats may be worth looking into further.
> > - Links to OIDs of interesting objects that correspond to certain stats.
> > - Options to limit which references to use when evaluating the
> > repository.
> >
> > Changes since V5:
> > - Instead of using `filter_refs()` to get an array of all references, we
> > now use `refs_for_each_ref()` to count references, and set up OIDs for
> > the path walk, in place. Doing this not only allows us to avoid
> > wasting memory storing all the reference info, but also to display
> > progress info to the user while iterating across the references
> > initially.
> > - Add a preparatory patch to export `ref_kind_from_refname()` via
> > "ref_filter.h" so we can reuse logic to categorize references while
> > counting.
>
> This round looked pretty well done to me. Shall we declare victory
> and mark it for 'next' real soon now?
Agreed, I also think that this round looks good and is ready to be
merged down to next.
Thanks!
Patrick
Thread overview: 92+ messages
2025-09-23 2:56 [PATCH 0/4] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-23 2:56 ` [PATCH 1/4] " Justin Tobler
2025-09-23 10:52 ` Patrick Steinhardt
2025-09-23 15:10 ` Justin Tobler
2025-09-23 15:26 ` Patrick Steinhardt
2025-09-23 15:22 ` Karthik Nayak
2025-09-23 15:55 ` Justin Tobler
2025-09-23 2:56 ` [PATCH 2/4] builtin/repo: add object counts in stats output Justin Tobler
2025-09-23 10:52 ` Patrick Steinhardt
2025-09-23 15:19 ` Justin Tobler
2025-09-23 15:30 ` Karthik Nayak
2025-09-23 15:56 ` Justin Tobler
2025-09-23 2:56 ` [PATCH 3/4] builtin/repo: add keyvalue format for stats Justin Tobler
2025-09-23 10:53 ` Patrick Steinhardt
2025-09-23 15:26 ` Justin Tobler
2025-09-23 15:39 ` Karthik Nayak
2025-09-23 15:59 ` Justin Tobler
2025-09-23 2:57 ` [PATCH 4/4] builtin/repo: add nul " Justin Tobler
2025-09-23 10:53 ` Patrick Steinhardt
2025-09-23 15:33 ` Justin Tobler
2025-09-24 4:48 ` Patrick Steinhardt
2025-09-23 15:41 ` Karthik Nayak
2025-09-23 16:02 ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 0/6] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-24 21:24 ` [PATCH v2 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-24 21:24 ` [PATCH v2 2/6] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-24 21:24 ` [PATCH v2 3/6] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 5:38 ` Patrick Steinhardt
2025-09-25 13:01 ` Justin Tobler
2025-09-24 21:24 ` [PATCH v2 4/6] builtin/repo: add object counts in stats output Justin Tobler
2025-09-24 21:24 ` [PATCH v2 5/6] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-25 5:39 ` Patrick Steinhardt
2025-09-25 13:16 ` Justin Tobler
2025-09-25 13:58 ` Patrick Steinhardt
2025-09-24 21:24 ` [PATCH v2 6/6] builtin/repo: add progress meter " Justin Tobler
2025-09-25 5:39 ` Patrick Steinhardt
2025-09-25 13:20 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 23:29 ` [PATCH v3 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-25 23:29 ` [PATCH v3 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-25 23:29 ` [PATCH v3 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
2025-09-25 23:29 ` [PATCH v3 4/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-25 23:51 ` Eric Sunshine
2025-09-26 1:38 ` Justin Tobler
2025-09-25 23:29 ` [PATCH v3 5/7] builtin/repo: add object counts in stats output Justin Tobler
2025-09-25 23:29 ` [PATCH v3 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-25 23:29 ` [PATCH v3 7/7] builtin/repo: add progress meter " Justin Tobler
2025-09-27 14:50 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-27 14:50 ` [PATCH v4 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-09-27 14:50 ` [PATCH v4 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-09-27 14:50 ` [PATCH v4 3/7] clang-format: exclude control macros from SpaceBeforeParens Justin Tobler
2025-09-27 15:40 ` Junio C Hamano
2025-09-27 15:51 ` Justin Tobler
2025-09-27 23:49 ` Junio C Hamano
2025-09-27 14:50 ` [PATCH v4 4/7] builtin/repo: introduce stats subcommand Justin Tobler
2025-09-27 16:32 ` Junio C Hamano
2025-10-09 22:09 ` Justin Tobler
2025-10-10 0:42 ` Justin Tobler
2025-10-10 6:53 ` Patrick Steinhardt
2025-10-10 14:34 ` Justin Tobler
2025-10-13 6:13 ` Patrick Steinhardt
2025-09-27 14:50 ` [PATCH v4 5/7] builtin/repo: add object counts in stats output Justin Tobler
2025-09-27 14:50 ` [PATCH v4 6/7] builtin/repo: add keyvalue and nul format for stats Justin Tobler
2025-09-27 14:50 ` [PATCH v4 7/7] builtin/repo: add progress meter " Justin Tobler
2025-09-27 16:33 ` [PATCH v4 0/7] builtin/repo: introduce stats subcommand Junio C Hamano
2025-10-15 21:12 ` [PATCH v5 0/6] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-15 21:12 ` [PATCH v5 1/6] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-10-15 21:12 ` [PATCH v5 2/6] ref-filter: allow NULL filter pattern Justin Tobler
2025-10-15 21:12 ` [PATCH v5 3/6] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-16 10:58 ` Patrick Steinhardt
2025-10-21 16:04 ` Justin Tobler
2025-10-15 21:12 ` [PATCH v5 4/6] builtin/repo: add object counts in structure output Justin Tobler
2025-10-15 21:12 ` [PATCH v5 5/6] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
2025-10-15 21:12 ` [PATCH v5 6/6] builtin/repo: add progress meter " Justin Tobler
2025-10-21 18:25 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-21 18:25 ` [PATCH v6 1/7] builtin/repo: rename repo_info() to cmd_repo_info() Justin Tobler
2025-10-21 18:25 ` [PATCH v6 2/7] ref-filter: allow NULL filter pattern Justin Tobler
2025-10-21 18:25 ` [PATCH v6 3/7] ref-filter: export ref_kind_from_refname() Justin Tobler
2025-10-21 18:25 ` [PATCH v6 4/7] builtin/repo: introduce structure subcommand Justin Tobler
2025-10-22 5:01 ` Patrick Steinhardt
2025-10-22 13:50 ` Justin Tobler
2025-10-22 20:15 ` Lucas Seiki Oshiro
2025-10-22 23:42 ` Justin Tobler
2025-10-21 18:25 ` [PATCH v6 5/7] builtin/repo: add object counts in structure output Justin Tobler
2025-10-21 18:26 ` [PATCH v6 6/7] builtin/repo: add keyvalue and nul format for structure stats Justin Tobler
2025-10-22 20:34 ` Lucas Seiki Oshiro
2025-10-23 0:03 ` Justin Tobler
2025-10-21 18:26 ` [PATCH v6 7/7] builtin/repo: add progress meter " Justin Tobler
2025-10-22 19:23 ` [PATCH v6 0/7] builtin/repo: introduce structure subcommand Lucas Seiki Oshiro
2025-10-23 0:05 ` Justin Tobler
2025-10-23 20:54 ` Junio C Hamano
2025-10-24 5:14 ` Patrick Steinhardt