* [PATCH GSoC RFC v13 08/12] serve: advertise object-info feature
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Calvin Wan, Jonathan Tan, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Calvin Wan <calvinwan@google.com>
In order for a client to know what object-info components a server can
provide, advertise supported object-info features. This will allow a
client to decide whether to query the server for object-info or fetch
as a fallback.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
serve.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
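Illustration only, not part of this patch: a minimal sketch of how a client might test the advertised value before choosing between an object-info request and a fetch. The helper name `feature_list_has()` is hypothetical, and the space-separated layout for several components is an assumption; this patch only ever advertises the single token "size".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `feature` appears as a whole
 * space-separated token in `value`, 0 otherwise. The separator is an
 * assumption; only "size" is advertised today.
 */
static int feature_list_has(const char *value, const char *feature)
{
	size_t flen;

	assert(feature && *feature);
	if (!value)
		return 0;	/* nothing advertised: caller falls back to fetch */

	flen = strlen(feature);
	while (*value) {
		size_t tlen = strcspn(value, " ");

		/* Compare whole tokens so "sizes" does not match "size". */
		if (tlen == flen && !strncmp(value, feature, flen))
			return 1;
		value += tlen;
		while (*value == ' ')
			value++;
	}
	return 0;
}
```

A caller would pass whatever value the server sent for object-info and fall back to a regular fetch when the helper returns 0.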
diff --git a/serve.c b/serve.c
index 49a6e39b1d..2b07d922b3 100644
--- a/serve.c
+++ b/serve.c
@@ -89,7 +89,7 @@ static void session_id_receive(struct repository *r UNUSED,
trace2_data_string("transfer", NULL, "client-sid", client_sid);
}
-static int object_info_advertise(struct repository *r, struct strbuf *value UNUSED)
+static int object_info_advertise(struct repository *r, struct strbuf *value)
{
if (advertise_object_info == -1 &&
repo_config_get_bool(r, "transfer.advertiseobjectinfo",
@@ -97,6 +97,9 @@ static int object_info_advertise(struct repository *r, struct strbuf *value UNUS
/* disabled by default */
advertise_object_info = 0;
}
+ /* Currently only size is supported */
+ if (value && advertise_object_info)
+ strbuf_addstr(value, "size");
return advertise_object_info;
}
--
2.54.0
* [PATCH GSoC RFC v13 07/12] fetch-pack: move fetch initialization
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Calvin Wan, Jonathan Tan, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Calvin Wan <calvinwan@google.com>
There are some variables initialized at the start of the
do_fetch_pack_v2() state machine. Currently, they are initialized
in FETCH_CHECK_LOCAL, which is the initial state set at the beginning
of the function.
However, a subsequent patch will allow for another initial state,
while still requiring these initialized variables.
Move the initialization to be before the state machine,
so that they are set regardless of the initial state.
Note that there is no change in behavior, because we're moving code
from the beginning of the first state to just before the execution of
the state machine.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
fetch-pack.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
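Illustration only, not part of this patch: a minimal sketch of why hoisting the setup out of the first state matters once a second entry point exists. All names (`sketch_state`, `sketch_ctx`, `run_sketch()`) are hypothetical; only the shape mirrors do_fetch_pack_v2().

```c
#include <assert.h>

/* Hypothetical states; only the shape mirrors do_fetch_pack_v2(). */
enum sketch_state {
	SKETCH_CHECK_LOCAL,
	SKETCH_OTHER_START,	/* stand-in for a future initial state */
	SKETCH_DONE
};

struct sketch_ctx {
	int use_sideband;
	int visited_check_local;
};

static void run_sketch(struct sketch_ctx *ctx, enum sketch_state state)
{
	assert(ctx);

	/* Hoisted setup: runs once, whatever the initial state is. */
	ctx->use_sideband = 2;
	ctx->visited_check_local = 0;

	while (state != SKETCH_DONE) {
		switch (state) {
		case SKETCH_CHECK_LOCAL:
			ctx->visited_check_local = 1;
			state = SKETCH_DONE;
			break;
		case SKETCH_OTHER_START:
			/* Never passes through SKETCH_CHECK_LOCAL. */
			state = SKETCH_DONE;
			break;
		case SKETCH_DONE:
			break;
		}
	}
}
```

Had the setup stayed inside SKETCH_CHECK_LOCAL, a run starting at SKETCH_OTHER_START would finish with `use_sideband` never set.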
diff --git a/fetch-pack.c b/fetch-pack.c
index 3d32114907..cdebd3476f 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1736,18 +1736,18 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
reader.me = "fetch-pack";
}
+ /* v2 supports these by default */
+ allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
+ use_sideband = 2;
+ if (args->depth > 0 || args->deepen_since || args->deepen_not)
+ args->deepen = 1;
+
while (state != FETCH_DONE) {
switch (state) {
case FETCH_CHECK_LOCAL:
sort_ref_list(&ref, ref_compare_name);
QSORT(sought, nr_sought, cmp_ref_by_name);
- /* v2 supports these by default */
- allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
- use_sideband = 2;
- if (args->depth > 0 || args->deepen_since || args->deepen_not)
- args->deepen = 1;
-
/* Filter 'ref' by 'sought' and those that aren't local */
mark_complete_and_common_ref(negotiator, args, &ref);
filter_refs(args, &ref, sought, nr_sought);
--
2.54.0
* [PATCH GSoC RFC v13 06/12] connect: refactor packet writing
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater, Jonathan Tan, Calvin Wan
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
Refactor `write_fetch_command_and_capabilities()`, enabling it to serve
both fetch and additional commands.
In this context, "command" refers to the "operations" supported by
Git's wire protocol https://git-scm.com/docs/protocol-v2, such as a Git
subcommand (e.g., git-fetch(1)) or a server-side operation like
"object-info" as implemented in commit a2ba162
(object-info: support for retrieving object info, 2021-04-20).
Refactor the function signature to accept a command instead of the
hardcoded "fetch".
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
connect.c | 10 +++++-----
connect.h | 8 ++++++--
fetch-pack.c | 4 ++--
3 files changed, 13 insertions(+), 9 deletions(-)
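Illustration only, not part of this patch: a minimal sketch of what the parameterized "command=%s" line looks like once framed as a pkt-line. `sketch_write_command()` is a hypothetical stand-in that writes into a plain char buffer instead of a strbuf; it is not Git's packet_buf_write().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for writing "command=<command>" as one pkt-line.
 * The 4-hex-digit prefix is the total length, including the prefix
 * itself. Returns 0 on success, -1 if the line does not fit.
 */
static int sketch_write_command(char *out, size_t outsz, const char *command)
{
	size_t len;
	int n;

	assert(out && command);
	len = 4 + strlen("command=") + strlen(command);
	/* 65520 is the largest pkt-line; also leave room for the NUL. */
	if (len > 65520 || len + 1 > outsz)
		return -1;
	n = snprintf(out, outsz, "%04zx" "command=%s", len, command);
	return (n < 0 || (size_t)n != len) ? -1 : 0;
}
```

With "fetch" the buffer holds `0011command=fetch`; with "object-info" it holds `0017command=object-info`. Only the command string differs between the two callers.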
diff --git a/connect.c b/connect.c
index 1dced8e632..78c69d4485 100644
--- a/connect.c
+++ b/connect.c
@@ -700,16 +700,16 @@ int server_supports(const char *feature)
return !!server_feature_value(feature, NULL);
}
-void write_fetch_command_and_capabilities(struct strbuf *req_buf,
- const struct string_list *server_options)
+void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
+ const struct string_list *server_options)
{
const char *hash_name;
int advertise_sid;
repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
- ensure_server_supports_v2("fetch");
- packet_buf_write(req_buf, "command=fetch");
+ ensure_server_supports_v2(command);
+ packet_buf_write(req_buf, "command=%s", command);
if (server_supports_v2("agent"))
packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
if (advertise_sid && server_supports_v2("session-id"))
@@ -727,7 +727,7 @@ void write_fetch_command_and_capabilities(struct strbuf *req_buf,
die(_("mismatched algorithms: client %s; server %s"),
the_hash_algo->name, hash_name);
packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
- } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
+ } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
die(_("the server does not support algorithm '%s'"),
the_hash_algo->name);
}
diff --git a/connect.h b/connect.h
index c4f6ea4b0a..8f4c523892 100644
--- a/connect.h
+++ b/connect.h
@@ -34,8 +34,12 @@ void check_stateless_delimiter(int stateless_rpc,
struct packet_reader *reader,
const char *error);
+/*
+ * Writes a command along with the requested server capabilities/features into a
+ * request buffer.
+ */
struct string_list;
-void write_fetch_command_and_capabilities(struct strbuf *req_buf,
- const struct string_list *server_options);
+void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
+ const struct string_list *server_options);
#endif
diff --git a/fetch-pack.c b/fetch-pack.c
index 4a8a70b5f3..3d32114907 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1387,7 +1387,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
int done_sent = 0;
struct strbuf req_buf = STRBUF_INIT;
- write_fetch_command_and_capabilities(&req_buf, args->server_options);
+ write_command_and_capabilities(&req_buf, "fetch", args->server_options);
if (args->use_thin_pack)
packet_buf_write(&req_buf, "thin-pack");
@@ -2255,7 +2255,7 @@ void negotiate_using_fetch(const struct oid_array *negotiation_restrict_tips,
the_repository, "%d",
negotiation_round);
strbuf_reset(&req_buf);
- write_fetch_command_and_capabilities(&req_buf, server_options);
+ write_command_and_capabilities(&req_buf, "fetch", server_options);
packet_buf_write(&req_buf, "wait-for-done");
--
2.54.0
* [PATCH GSoC RFC v13 05/12] fetch-pack: move function to connect.c
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater, Jonathan Tan, Calvin Wan
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
write_fetch_command_and_capabilities will be refactored in a subsequent
commit where it will become a more general-purpose function, making it
more accessible to additional commands in the future.
To move `write_fetch_command_and_capabilities()` to `connect.c`, we need
to adjust how `advertise_sid` is managed. Previously in `fetch-pack.c`,
`advertise_sid` was a static variable, modified using
`repo_config_get_bool()`.
In `connect.c`, we now initialize `advertise_sid` at the beginning of the
function by directly using `repo_config_get_bool()`. This change is safe
because:
In the original `fetch-pack.c` code, there are only two places that write
`advertise_sid`:
1. In function `do_fetch_pack()`:
	if (!server_supports("session-id"))
		advertise_sid = 0;
2. In function `fetch_pack_config()`:
	repo_config_get_bool("transfer.advertisesid", &advertise_sid);
Regarding 1, since `do_fetch_pack()` is only relevant for protocol v1, this
assignment can be ignored, as `write_fetch_command_and_capabilities()`
is only used in v2.
Regarding 2, `repo_config_get_bool()` is declared in `config.h`, which is
already available to `connect.c`, so we can reuse it directly.
Move `write_fetch_command_and_capabilities()` to `connect.c`.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
connect.c | 34 ++++++++++++++++++++++++++++++++++
connect.h | 4 ++++
fetch-pack.c | 31 -------------------------------
3 files changed, 38 insertions(+), 31 deletions(-)
diff --git a/connect.c b/connect.c
index 47e39d2a73..1dced8e632 100644
--- a/connect.c
+++ b/connect.c
@@ -700,6 +700,40 @@ int server_supports(const char *feature)
return !!server_feature_value(feature, NULL);
}
+void write_fetch_command_and_capabilities(struct strbuf *req_buf,
+ const struct string_list *server_options)
+{
+ const char *hash_name;
+ int advertise_sid;
+
+ repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
+
+ ensure_server_supports_v2("fetch");
+ packet_buf_write(req_buf, "command=fetch");
+ if (server_supports_v2("agent"))
+ packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
+ if (advertise_sid && server_supports_v2("session-id"))
+ packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
+ if (server_options && server_options->nr) {
+ ensure_server_supports_v2("server-option");
+ for (size_t i = 0; i < server_options->nr; i++)
+ packet_buf_write(req_buf, "server-option=%s",
+ server_options->items[i].string);
+ }
+
+ if (server_feature_v2("object-format", &hash_name)) {
+ const unsigned int hash_algo = hash_algo_by_name(hash_name);
+ if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
+ die(_("mismatched algorithms: client %s; server %s"),
+ the_hash_algo->name, hash_name);
+ packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
+ } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
+ die(_("the server does not support algorithm '%s'"),
+ the_hash_algo->name);
+ }
+ packet_buf_delim(req_buf);
+}
+
static const char *url_scheme_name(enum url_scheme scheme)
{
switch (scheme) {
diff --git a/connect.h b/connect.h
index aa482a37fb..c4f6ea4b0a 100644
--- a/connect.h
+++ b/connect.h
@@ -34,4 +34,8 @@ void check_stateless_delimiter(int stateless_rpc,
struct packet_reader *reader,
const char *error);
+struct string_list;
+void write_fetch_command_and_capabilities(struct strbuf *req_buf,
+ const struct string_list *server_options);
+
#endif
diff --git a/fetch-pack.c b/fetch-pack.c
index f13951d154..4a8a70b5f3 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1376,37 +1376,6 @@ static int add_haves(struct fetch_negotiator *negotiator,
return haves_added;
}
-static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
- const struct string_list *server_options)
-{
- const char *hash_name;
-
- ensure_server_supports_v2("fetch");
- packet_buf_write(req_buf, "command=fetch");
- if (server_supports_v2("agent"))
- packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
- if (advertise_sid && server_supports_v2("session-id"))
- packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
- if (server_options && server_options->nr) {
- ensure_server_supports_v2("server-option");
- for (size_t i = 0; i < server_options->nr; i++)
- packet_buf_write(req_buf, "server-option=%s",
- server_options->items[i].string);
- }
-
- if (server_feature_v2("object-format", &hash_name)) {
- int hash_algo = hash_algo_by_name(hash_name);
- if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
- die(_("mismatched algorithms: client %s; server %s"),
- the_hash_algo->name, hash_name);
- packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
- } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
- die(_("the server does not support algorithm '%s'"),
- the_hash_algo->name);
- }
- packet_buf_delim(req_buf);
-}
-
static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
struct fetch_pack_args *args,
const struct ref *wants, struct oidset *common,
--
2.54.0
* [PATCH GSoC RFC v13 04/12] t1006: split test utility functions into new "lib-cat-file.sh"
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Eric Ju <eric.peijian@gmail.com>
This refactor extracts utility functions from the cat-file's test
script "t1006-cat-file.sh" into a new "lib-cat-file.sh" dedicated
library file. The goal is to improve code reuse and readability,
enabling future tests to leverage these utilities without duplicating
code.
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
t/lib-cat-file.sh | 16 ++++++++++++++++
t/t1006-cat-file.sh | 13 +------------
2 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/t/lib-cat-file.sh b/t/lib-cat-file.sh
new file mode 100644
index 0000000000..44af232d74
--- /dev/null
+++ b/t/lib-cat-file.sh
@@ -0,0 +1,16 @@
+# Library of git-cat-file related test functions.
+
+# Print a string without a trailing newline.
+echo_without_newline () {
+ printf '%s' "$*"
+}
+
+# Print a string without newlines and replace them with a NULL character (\0).
+echo_without_newline_nul () {
+ echo_without_newline "$@" | tr '\n' '\0'
+}
+
+# Calculate the length of a string.
+strlen () {
+ echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
+}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8e2c52652c..8360f3bbd9 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -4,6 +4,7 @@ test_description='git cat-file'
. ./test-lib.sh
. "$TEST_DIRECTORY/lib-loose.sh"
+. "$TEST_DIRECTORY"/lib-cat-file.sh
test_cmdmode_usage () {
test_expect_code 129 "$@" 2>err &&
@@ -99,18 +100,6 @@ do
'
done
-echo_without_newline () {
- printf '%s' "$*"
-}
-
-echo_without_newline_nul () {
- echo_without_newline "$@" | tr '\n' '\0'
-}
-
-strlen () {
- echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
-}
-
run_tests () {
type=$1
object_name="$2"
--
2.54.0
* [PATCH GSoC RFC v13 03/12] cat-file: declare loop counter inside for()
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Eric Ju <eric.peijian@gmail.com>
Some code used in this series declares variable i and only uses it
in a for loop, not in any other logic outside the loop.
Change the declaration of i to be inside the for loop for readability.
While at it, we also change its type from "int" to "size_t" where the
latter makes more sense.
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
builtin/cat-file.c | 13 ++++---------
fetch-pack.c | 3 +--
2 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 446d649904..fab55c11de 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -723,14 +723,12 @@ static void dispatch_calls(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data,
struct queued_cmd *cmd,
- int nr)
+ size_t nr)
{
- int i;
-
if (!opt->buffer_output)
die(_("flush is only for --buffer mode"));
- for (i = 0; i < nr; i++)
+ for (size_t i = 0; i < nr; i++)
cmd[i].fn(opt, cmd[i].line, output, data);
fflush(stdout);
@@ -738,9 +736,7 @@ static void dispatch_calls(struct batch_options *opt,
static void free_cmds(struct queued_cmd *cmd, size_t *nr)
{
- size_t i;
-
- for (i = 0; i < *nr; i++)
+ for (size_t i = 0; i < *nr; i++)
FREE_AND_NULL(cmd[i].line);
*nr = 0;
@@ -767,7 +763,6 @@ static void batch_objects_command(struct batch_options *opt,
size_t alloc = 0, nr = 0;
while (strbuf_getdelim_strip_crlf(&input, stdin, opt->input_delim) != EOF) {
- int i;
const struct parse_cmd *cmd = NULL;
const char *p = NULL, *cmd_end;
struct queued_cmd call = {0};
@@ -777,7 +772,7 @@ static void batch_objects_command(struct batch_options *opt,
if (isspace(*input.buf))
die(_("whitespace before command: '%s'"), input.buf);
- for (i = 0; i < ARRAY_SIZE(commands); i++) {
+ for (size_t i = 0; i < ARRAY_SIZE(commands); i++) {
if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
continue;
diff --git a/fetch-pack.c b/fetch-pack.c
index 120e01f3cf..f13951d154 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1388,9 +1388,8 @@ static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
if (advertise_sid && server_supports_v2("session-id"))
packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
if (server_options && server_options->nr) {
- int i;
ensure_server_supports_v2("server-option");
- for (i = 0; i < server_options->nr; i++)
+ for (size_t i = 0; i < server_options->nr; i++)
packet_buf_write(req_buf, "server-option=%s",
server_options->items[i].string);
}
--
2.54.0
* [PATCH GSoC RFC v13 02/12] git-compat-util: add strtoul_ul() with error handling
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Eric Ju <eric.peijian@gmail.com>
We already have strtoul_ui() and similar functions that provide proper
error handling using strtoul from the standard library. However,
there isn't currently a variant that returns an unsigned long.
This variant is needed in a subsequent commit to enable returning an
unsigned long with proper error handling.
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
git-compat-util.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
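Illustration only, not part of this patch: a minimal sketch of which inputs the new helper accepts and rejects. The function body is copied from the diff below so that it builds outside Git; `parse_size()` is a hypothetical caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Standalone copy of the helper added by this patch. */
static int strtoul_ul(char const *s, int base, unsigned long *result)
{
	unsigned long ul;
	char *p;

	errno = 0;
	/* negative values would be accepted by strtoul */
	if (strchr(s, '-'))
		return -1;
	ul = strtoul(s, &p, base);
	/* Reject overflow, trailing garbage, and input with no digits. */
	if (errno || *p || p == s)
		return -1;
	*result = ul;
	return 0;
}

/* Hypothetical caller: parse a decimal size, reporting failure as -1. */
static int parse_size(const char *text, unsigned long *size)
{
	assert(text && size);
	return strtoul_ul(text, 10, size);
}
```

On failure `*result` is left untouched, so a caller can keep a previous or default value without extra bookkeeping.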
diff --git a/git-compat-util.h b/git-compat-util.h
index 8809776407..4bf569f35c 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -975,6 +975,26 @@ static inline int strtoul_ui(char const *s, int base, unsigned int *result)
return 0;
}
+/*
+ * Convert a string to an unsigned long using the standard library's strtoul,
+ * with additional error handling to ensure robustness.
+ */
+static inline int strtoul_ul(char const *s, int base, unsigned long *result)
+{
+ unsigned long ul;
+ char *p;
+
+ errno = 0;
+ /* negative values would be accepted by strtoul */
+ if (strchr(s, '-'))
+ return -1;
+ ul = strtoul(s, &p, base);
+ if (errno || *p || p == s)
+ return -1;
+ *result = ul;
+ return 0;
+}
+
static inline int strtol_i(char const *s, int base, int *result)
{
long ul;
--
2.54.0
* [PATCH GSoC RFC v13 01/12] transport-helper: fix memory leak of helper on disconnect
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
disconnect_helper() only frees data inside of the if(data->helper)
block [1]. When the transport is disconnected without the helper
being fully started, data->name allocated in transport_helper_init()
is never freed.
Move FREE_AND_NULL(data->name) outside the conditional block so it's
always freed on disconnect.
[1]: https://lore.kernel.org/git/05fbadbae2184479c87c37675dde7bd79b3e32ab.1716465556.git.ps@pks.im/
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
transport-helper.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
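Illustration only, not part of this patch: a minimal sketch of the ownership pattern being fixed. The struct and function names are hypothetical and heavily trimmed; the point is that `name` is allocated at init time, so it has to be released whether or not the helper ever started.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the helper transport data. */
struct sketch_helper_data {
	char *name;	/* allocated at init time */
	void *helper;	/* only set once the helper process has started */
};

static void sketch_init(struct sketch_helper_data *data, const char *name)
{
	size_t len = strlen(name) + 1;

	data->name = malloc(len);
	assert(data->name);
	memcpy(data->name, name, len);
	data->helper = NULL;
}

static void sketch_disconnect(struct sketch_helper_data *data)
{
	if (data->helper) {
		/* ... close pipes and reap the helper here ... */
		free(data->helper);
		data->helper = NULL;
	}
	/* Outside the conditional: released even if the helper never started. */
	free(data->name);
	data->name = NULL;
}
```

Resetting the pointer to NULL after freeing, as FREE_AND_NULL() does, keeps a second disconnect harmless.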
diff --git a/transport-helper.c b/transport-helper.c
index 0fa0eb2d72..8a71354d50 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -266,9 +266,9 @@ static int disconnect_helper(struct transport *transport)
close(data->helper->out);
fclose(data->out);
res = finish_command(data->helper);
- FREE_AND_NULL(data->name);
FREE_AND_NULL(data->helper);
}
+ FREE_AND_NULL(data->name);
return res;
}
--
2.54.0
* [PATCH GSoC RFC v13 00/12] cat-file: add remote-object-info to batch-command
From: Pablo Sabater @ 2026-06-19 14:56 UTC
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260608-ps-eric-work-rebase-v12-0-5338b766e658@gmail.com>
This patch series is a continuation of Eric Ju's (eric.peijian@gmail.com) and
Calvin Wan's (calvinwan@google.com) patch series, [1] and [2] respectively.
Sometimes it is beneficial to retrieve information about an object without
having to download it completely. The server logic for retrieving size has
already been implemented and merged in "a2ba162cda (object-info: support for
retrieving object info, 2021-04-20)"[3]. This patch series implements the
client side of it.
Eric's series adds the `remote-object-info` command to
`cat-file --batch-command`. This command allows the client to make an
object-info command request to a server that supports protocol v2.
If the server uses protocol v2 but does not support the object-info capability,
`cat-file --batch-command` will die.
If a user attempts to use `remote-object-info` with protocol v1,
`cat-file --batch-command` will die.
Currently, only the size (%(objectsize)) is supported end to end in this
implementation. The type (%(objecttype)) is known to the client's allow-list
and request path, but is supported neither on the server side nor in the
response parsing. A follow-up series will add full end-to-end support for
%(objecttype).
The default format for remote-object-info is set to %(objectname) %(objectsize).
Once %(objecttype) is supported, the default format will be unified accordingly.
If the batch command format includes unsupported fields such as %(objecttype),
%(objectsize:disk), or %(deltabase), the command will return empty strings for
each unsupported field.
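To make the empty-string behavior concrete, here is a minimal sketch of expanding one placeholder for a remote object. It is not the code from this series: `sketch_expand_atom()` and its parameters are hypothetical, and the real implementation goes through the allow-list described in this letter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical expansion of one placeholder for a remote object.
 * `have_size` says whether the server supplied a size. Atoms that the
 * remote path cannot fill expand to the empty string.
 */
static void sketch_expand_atom(char *out, size_t outsz, const char *atom,
			       const char *oid, int have_size,
			       unsigned long size)
{
	assert(out && outsz && atom && oid);

	if (!strcmp(atom, "objectname"))
		snprintf(out, outsz, "%s", oid);
	else if (!strcmp(atom, "objectsize") && have_size)
		snprintf(out, outsz, "%lu", size);
	else
		out[0] = '\0';	/* unsupported or unavailable */
}
```

In this sketch an unsupported atom and a supported atom with no data both produce the same empty output.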
This series completes Eric's work, mainly by refactoring the placeholder
validation into an allow-list that filters what the client asks for against
what the server is capable of providing, following Jeff King's idea [4].
I have two questions about the design:
1. If the format includes unsupported fields such as %(objecttype) or
%(deltabase), it currently returns an empty string for each unsupported
field; this follows what for-each-ref does with known but inapplicable
atoms. However, future placeholders that will be implemented (%(rest),
%(objectmode)) can legitimately return empty strings. How should we
differentiate "unsupported" from "no data"?
Eric proposed to use a placeholder like "???" [5].
Should a placeholder be used?
2. _tangent/not related to this series_
'a2ba162cda' is designed to only work with full OIDs, which is
inconsistent with local `info`, which does support short OIDs and, in
case of ambiguity, returns a list of the objects the user may have meant.
Because protocol v2 is meant to be stateless, supporting short OIDs
could make it inconsistent with other remote commands that do not
support short OIDs. Maybe a --pick-first option that accepts
short OIDs and picks the first match?
Alternatively, would sending a list of possible OIDs to the client so
it can re-request with the correct one be ok?
[1]: https://lore.kernel.org/git/20250221190451.12536-1-eric.peijian@gmail.com/
[2]: https://lore.kernel.org/git/20220728230210.2952731-1-calvinwan@google.com/#t
[3]: https://git.kernel.org/pub/scm/git/git.git/commit/?id=a2ba162cda2acc171c3e36acbbc854792b093cb7
[4]: https://lore.kernel.org/git/20250313060250.GH94015@coredump.intra.peff.net/
[5]: https://lore.kernel.org/git/CAN2LT1D3d=yMYVhBjpj5PvyjfTVjwqcFPNViuCJ=f49YbCZuJg@mail.gmail.com/
Changes since v12:
- Remote-object-info no longer dies when the server doesn't recognize
the object, printing "<oid> missing" like `info` does.
- In the 12th commit, explicitly cast to int and add a comment explaining
why the list is iterated backward.
- Renamed the 3rd commit and, in it, changed the signature of
dispatch_calls(), as it is only called with a size_t rather than an int.
- Because remote-object-info does not support short OIDs, added a check to
improve the error report if the OID passed is valid but not long
enough, or if it is an invalid OID.
- Fixed overly long lines.
- Reworded 4th commit.
- Avoid unnecessary request to the server when no placeholder is supported.
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
Calvin Wan (3):
fetch-pack: move fetch initialization
serve: advertise object-info feature
transport: add client support for object-info
Eric Ju (4):
git-compat-util: add strtoul_ul() with error handling
cat-file: declare loop counter inside for()
t1006: split test utility functions into new "lib-cat-file.sh"
cat-file: add remote-object-info to batch-command
Pablo Sabater (5):
transport-helper: fix memory leak of helper on disconnect
fetch-pack: move function to connect.c
connect: refactor packet writing
cat-file: validate remote atoms with allow_list
cat-file: make remote-object-info allow-list dynamic
Documentation/git-cat-file.adoc | 25 +-
Makefile | 1 +
builtin/cat-file.c | 221 ++++++++++-
connect.c | 34 ++
connect.h | 8 +
fetch-object-info.c | 106 +++++
fetch-object-info.h | 22 ++
fetch-pack.c | 51 +--
fetch-pack.h | 2 +
git-compat-util.h | 20 +
meson.build | 1 +
object-file.c | 10 +
odb.h | 3 +
serve.c | 5 +-
t/lib-cat-file.sh | 16 +
t/meson.build | 1 +
t/t1006-cat-file.sh | 13 +-
t/t1017-cat-file-remote-object-info.sh | 699 +++++++++++++++++++++++++++++++++
transport-helper.c | 13 +-
transport.c | 28 +-
transport.h | 11 +
21 files changed, 1215 insertions(+), 75 deletions(-)
---
base-commit: 4621f8ce5e9b97aa2e8d0d9ffe9d25df2471074d
change-id: 20260608-ps-eric-work-rebase-b73ae84ba671
Best regards,
--
Pablo Sabater <pabloosabaterr@gmail.com>
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Taylor Blau @ 2026-06-19 14:52 UTC
To: Derrick Stolee; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <131d7ad3-7791-4d6f-bdf3-afa6b0831a71@gmail.com>
On Fri, Jun 19, 2026 at 10:40:51AM -0400, Derrick Stolee wrote:
> > [...]
> > , which gives us:
> >
> > Test HEAD^ HEAD
> > ----------------------------------------------------------------------------------------
> > 5311.3: size of bitmapped pack 278.8M 278.8M -0.0%
> > 5311.38: size of bitmapped pack (--path-walk) 278.7M 278.7M +0.0%
> >
> > (eliding other tests). I considered whether there are other interesting
> > tests, but I think "repack" is the right layer to run perf tests, since
> > you're always writing a closed pack. We could try different subsets of
> > the repository's objects (which would also have to be closed), but I
> > don't think this is that interesting.
>
> This sort of thing does help to show that we're getting different
> behavior when repacking with and without --path-walk. And this test
> is showing the slightest change for git.git, but is likely more
> impactful for the other repos I've used to demonstrate the benefits.
>
> So this is the kind of data I'm hoping to see, but also with data
> from other repos whose data shapes benefit from --path-walk more
> than git.git and repos where name-hash v1 is sufficient to give a
> similar result.
I'm glad this is the sort of data you're looking for. I'm happy to run
this on other repositories.
> I'd also like to see if the repack _time_ changes with this, but
> these direct size comparisons are the biggest indicator I'd like to
> see.
Unfortunately a timing comparison is kind of a pain here. We'd have to
use test_perf, which will perform the same repack multiple times. We
could do that, though it's wasteful, and changes like bf4a60874af
(p5326: generate pack bitmaps before writing the MIDX bitmap,
2021-09-17) move us in the opposite direction.
I'm not opposed to changing this to test_perf if you feel strongly about
it.
Thanks,
Taylor
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Taylor Blau @ 2026-06-19 14:46 UTC
To: Derrick Stolee; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <ec45260a-1d4e-49d1-9aa8-9ec94ecd9b23@gmail.com>
On Fri, Jun 19, 2026 at 10:36:54AM -0400, Derrick Stolee wrote:
> On 6/19/2026 10:16 AM, Taylor Blau wrote:
> > On Fri, Jun 12, 2026 at 09:03:41AM -0400, Derrick Stolee wrote:
> >> On 6/2/2026 6:21 PM, Taylor Blau wrote:
> >>
> >>> As a result, we can see significantly reduced pack sizes from p5311
> >>> before this commit:
> >>
> >> I mentioned this before, but the pack _sizes_ aren't changing in this
> >> example. We are computing them more quickly, though.
> >
> > Thanks for pointing this out. The paragraph following the perf output
> > below correctly explains the results ("We get the same size of output
> > pack, but [...]"), but this one is obviously wrong.
> >
> >> Since we are testing --path-walk on both sides, the change across this
> >> commit is that we are using the bitmaps for the "counting objects" phase
> >> and then potentially using the --path-walk algorithm to construct the
> >> packfile.
> >
> > I'm not sure I agree here. Because we are using bitmaps, we're relying
> > on pack-reuse to construct the output pack, not --path-walk. I mentioned
> > in git-pack-objects(1), but the combination of seeing "--path-walk" and
> > "--use-bitmap-index" together only means that we will use a path-walk
> > traversal as fallback if we can't get an answer by relying on bitmaps.
>
> I guess my thought was that we'd construct bitmaps when they are
> available, but how do we walk objects to get the objects for commits
> that are not represented by bitmaps?
Good question, and we use the existing bitmap traversal (or the
boundary-based one, if enabled). In that case we really want something
that is topological and not path-based, so we can terminate the walk as
soon as we run into an existing set bit, or something on the negated
side of the query.
> But you make a good point: we don't need to do that for functional
> use: the bitmap code does an object walk to produce a bitmap, and it's
> all in a layer "below" the pack-objects code.
>
> So essentially, this _isn't_ a combined approach: it's "use bitmaps if
> we can, and fall back to --path-walk if we can't" which is changing
> from our previous behavior of "--path-walk means we don't try to use
> bitmaps".
Exactly!
Thanks,
Taylor
^ permalink raw reply
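(Illustration, not part of the message above.) The behavior change agreed on in this exchange can be sketched as a small decision function. This is only a toy model under stated assumptions: the names `demo_opts`, `demo_traversal`, `choose_before()` and `choose_after()` are hypothetical and do not exist in pack-objects, and `bitmap_answers` stands in for "a usable bitmap could answer the query".

```c
/* Hypothetical model of which traversal ends up being used; not Git's code. */
enum demo_traversal {
	DEMO_BITMAP,       /* answer the query from reachability bitmaps */
	DEMO_PATH_WALK,    /* path-walk traversal */
	DEMO_REGULAR_WALK  /* ordinary revision walk */
};

struct demo_opts {
	int use_bitmap_index; /* models --use-bitmap-index */
	int path_walk;        /* models --path-walk */
};

/* Previous behavior: "--path-walk means we don't try to use bitmaps". */
static enum demo_traversal choose_before(const struct demo_opts *o,
					 int bitmap_answers)
{
	if (o->path_walk)
		return DEMO_PATH_WALK;
	if (o->use_bitmap_index && bitmap_answers)
		return DEMO_BITMAP;
	return DEMO_REGULAR_WALK;
}

/* Proposed behavior: "use bitmaps if we can, fall back to --path-walk". */
static enum demo_traversal choose_after(const struct demo_opts *o,
					int bitmap_answers)
{
	if (o->use_bitmap_index && bitmap_answers)
		return DEMO_BITMAP;
	if (o->path_walk)
		return DEMO_PATH_WALK;
	return DEMO_REGULAR_WALK;
}
```

With both options set, `choose_before()` always picks the path-walk, while `choose_after()` picks the bitmap whenever one can answer the query and only then falls back to the path-walk. Neither function ever combines the two, which matches the "either/or" framing in the thread.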
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Derrick Stolee @ 2026-06-19 14:40 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <ajVSHvL+On9AEV+g@nand.local>
On 6/19/2026 10:28 AM, Taylor Blau wrote:
> On Fri, Jun 12, 2026 at 09:24:32AM -0400, Derrick Stolee wrote:
>> On 6/2/2026 6:21 PM, Taylor Blau wrote:
>>> When 'pack-objects' is invoked with '--path-walk', it prevents us from
>>> using reachability bitmaps.
>>
>> My earlier response focused on the _use_ of bitmaps when creating a
>> packfile, but your patch also enables _writing_ bitmaps with the
>> --path-walk option, which is significant and potentially more
>> interesting from my perspective: we have evidence that --path-walk
>> can produce significantly smaller packfiles than the standard
>> algorithm, and once those packfiles are created we can benefit from
>> that size in later packfile creation steps by reusing those deltas.
>
> I am perhaps splitting hairs here, but I would frame the use of bitmaps
> when reading with "--path-walk" as "either/or" not "both/and". The main
> goal of this patch is to enable us to still generate bitmaps when
> *writing* a pack with "--path-walk".
Yes. I was confused but your response to the earlier thread made this
more clear. I'm no longer confused.
>> Even more important here is that we have demonstrated examples of repos
>> that change their packfile size when using the --path-walk method. We
>> should demonstrate that the size continues to shrink with --path-walk
>> even when producing a matching .bitmap file with --write-bitmap-index.
>
> That's fair. One way to do this would be to:
>
> --- 8< ---
> diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
> index 1b115d921a1..c1aed3e2aef 100755
> --- a/t/perf/p5311-pack-bitmaps-fetch.sh
> +++ b/t/perf/p5311-pack-bitmaps-fetch.sh
> @@ -18,6 +18,10 @@ test_fetch_bitmaps () {
> git repack -ad $argv
> '
>
> + test_size "size of bitmapped pack ${argv:+($argv)}" '
> + test_file_size .git/objects/pack/pack-*.pack
> + '
> +
> # simulate a fetch from a repository that last fetched N days ago, for
> # various values of N. We do so by following the first-parent chain,
> # and assume the first entry in the chain that is N days older than the current
> --- >8 ---
>
> , which gives us:
>
> Test HEAD^ HEAD
> ----------------------------------------------------------------------------------------
> 5311.3: size of bitmapped pack 278.8M 278.8M -0.0%
> 5311.38: size of bitmapped pack (--path-walk) 278.7M 278.7M +0.0%
>
> (eliding other tests). I considered whether there are other interesting
> tests, but I think "repack" is the right layer to run perf tests, since
> you're always writing a closed pack. We could try different subsets of
> the repository's objects (which would also have to be closed), but I
> don't think this is that interesting.
This sort of thing does help to show that we're getting different
behavior when repacking with and without --path-walk. And this test
is showing the slightest change for git.git, but is likely more
impactful for the other repos I've used to demonstrate the benefits.
So this is the kind of data I'm hoping to see, but also with data
from other repos whose data shapes benefit from --path-walk more
than git.git and repos where name-hash v1 is sufficient to give a
similar result.
I'd also like to see if the repack _time_ changes with this, but
these direct size comparisons are the biggest indicator I'd like to
see.
>> The other thing that I notice here is that the bitmaps will need to
>> compute their reachable object set independently from the path-walk
>> algorithm. But I suppose that already happens separately from the
>> revision-walk approach that normally produces the packfile contents.
>
> Right. The only wrinkle here is how we handle the internal traversal's
> "--boundary" option, but see the last paragraph in the commit message
> for details on why the proposed approach is OK.
>
>> From my perspective, the point of integrating these two things are:
>>
>> 1. Reachability bitmaps make it much faster to discover the reachable
>> set and reuse bits of existing packfiles. (Your performance table
>> demonstrates this is true.)
>>
>> 2. The --path-walk option can shrink packfile sizes by grouping
>> trees and blobs by path before those paths collide in the name-hash
>> sort. (I haven't seen evidence that this is happening.)
>>
>> With evidence of (1) and not (2), it's not clear from the data that
>> these features are integrating completely. Without looking at the
>> code, those numbers would be the same if we had instead swapped the
>> preference of "the --path-walk option disables bitmaps" to "bitmaps
>> disable --path-walk".
>
> Let me know if modifying the perf test as above (and including the
> relevant results in the commit message) would be sufficient in
> addressing your concern.
Yes, the perf test modification and data reporting is the only
missing thing at this point. You've helped me better understand the
"integration" between the features during fetches and clones.
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Derrick Stolee @ 2026-06-19 14:36 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <ajVPJGXuhugDcT+A@nand.local>
On 6/19/2026 10:16 AM, Taylor Blau wrote:
> On Fri, Jun 12, 2026 at 09:03:41AM -0400, Derrick Stolee wrote:
>> On 6/2/2026 6:21 PM, Taylor Blau wrote:
>>
>>> As a result, we can see significantly reduced pack sizes from p5311
>>> before this commit:
>>
>> I mentioned this before, but the pack _sizes_ aren't changing in this
>> example. We are computing them more quickly, though.
>
> Thanks for pointing this out. The paragraph following the perf output
> below correctly explains the results ("We get the same size of output
> pack, but [...]"), but this one is obviously wrong.
>
>> Since we are testing --path-walk on both sides, the change across this
>> commit is that we are using the bitmaps for the "counting objects" phase
>> and then potentially using the --path-walk algorithm to construct the
>> packfile.
>
> I'm not sure I agree here. Because we are using bitmaps, we're relying
> on pack-reuse to construct the output pack, not --path-walk. I mentioned
> in git-pack-objects(1), but the combination of seeing "--path-walk" and
> "--use-bitmap-index" together only means that we will use a path-walk
> traversal as fallback if we can't get an answer by relying on bitmaps.
I guess my thought was that we'd construct bitmaps when they are
available, but how do we walk objects to get the objects for commits
that are not represented by bitmaps?
But you make a good point: we don't need to do that for functional
use: the bitmap code does an object walk to produce a bitmap, and it's
all in a layer "below" the pack-objects code.
So essentially, this _isn't_ a combined approach: it's "use bitmaps if
we can, and fall back to --path-walk if we can't" which is changing
from our previous behavior of "--path-walk means we don't try to use
bitmaps".
Thanks,
-Stolee
^ permalink raw reply
* Re: What's cooking in git.git (Jun 2026, #06)
From: Taylor Blau @ 2026-06-19 14:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, git
In-Reply-To: <xmqqtsr1w0z4.fsf@gitster.g>
On Wed, Jun 17, 2026 at 10:06:23AM -0700, Junio C Hamano wrote:
> * tb/midx-incremental-custom-base (2026-06-12) 3 commits
> - midx-write: include packs above custom incremental base
> - midx: pass custom '--base' through incremental writes
> - t5334: expose shared `nth_line()` helper
>
> The `git multi-pack-index write --incremental` command has been
> corrected to properly honor the `--base` option. Previously, the
> custom base was ignored by the normal write path, and the pack
> exclusion logic incorrectly skipped packs from layers above the
> selected base, breaking reachability closure for bitmaps.
>
> Needs review.
> source: <cover.1781294771.git.me@ttaylorr.com>
It would be nice to get this in before v2.55.0 is tagged, but I don't
think it's critical. In my analysis, the worst thing that could happen
is that generating MIDXs with a custom --base would result in a failure
to generate bitmaps, but not much else.
That's unlikely to be invoked manually, but does have the unfortunate
effect of rendering the new incremental MIDX-based repacking strategy as
useless in this release.
I'll add Peff to CC in case he has a moment to look it over.
Thanks,
Taylor
^ permalink raw reply
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Taylor Blau @ 2026-06-19 14:28 UTC (permalink / raw)
To: Derrick Stolee; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <849c659f-efa8-430a-bfac-0c26a3ed1aaa@gmail.com>
On Fri, Jun 12, 2026 at 09:24:32AM -0400, Derrick Stolee wrote:
> On 6/2/2026 6:21 PM, Taylor Blau wrote:
> > When 'pack-objects' is invoked with '--path-walk', it prevents us from
> > using reachability bitmaps.
>
> My earlier response focused on the _use_ of bitmaps when creating a
> packfile, but your patch also enables _writing_ bitmaps with the
> --path-walk option, which is significant and potentially more
> interesting from my perspective: we have evidence that --path-walk
> can produce significantly smaller packfiles than the standard
> algorithm, and once those packfiles are created we can benefit from
> that size in later packfile creation steps by reusing those deltas.
I am perhaps splitting hairs here, but I would frame the use of bitmaps
when reading with "--path-walk" as "either/or" not "both/and". The main
goal of this patch is to enable us to still generate bitmaps when
*writing* a pack with "--path-walk".
> Even more important here is that we have demonstrated examples of repos
> that change their packfile size when using the --path-walk method. We
> should demonstrate that the size continues to shrink with --path-walk
> even when producing a matching .bitmap file with --write-bitmap-index.
That's fair. One way to do this would be to:
--- 8< ---
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index 1b115d921a1..c1aed3e2aef 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -18,6 +18,10 @@ test_fetch_bitmaps () {
git repack -ad $argv
'
+ test_size "size of bitmapped pack ${argv:+($argv)}" '
+ test_file_size .git/objects/pack/pack-*.pack
+ '
+
# simulate a fetch from a repository that last fetched N days ago, for
# various values of N. We do so by following the first-parent chain,
# and assume the first entry in the chain that is N days older than the current
--- >8 ---
, which gives us:
Test HEAD^ HEAD
----------------------------------------------------------------------------------------
5311.3: size of bitmapped pack 278.8M 278.8M -0.0%
5311.38: size of bitmapped pack (--path-walk) 278.7M 278.7M +0.0%
(eliding other tests). I considered whether there are other interesting
tests, but I think "repack" is the right layer to run perf tests, since
you're always writing a closed pack. We could try different subsets of
the repository's objects (which would also have to be closed), but I
don't think this is that interesting.
> The other thing that I notice here is that the bitmaps will need to
> compute their reachable object set independently from the path-walk
> algorithm. But I suppose that already happens separately from the
> revision-walk approach that normally produces the packfile contents.
Right. The only wrinkle here is how we handle the internal traversal's
"--boundary" option, but see the last paragraph in the commit message
for details on why the proposed approach is OK.
> From my perspective, the point of integrating these two things are:
>
> 1. Reachability bitmaps make it much faster to discover the reachable
> set and reuse bits of existing packfiles. (Your performance table
> demonstrates this is true.)
>
> 2. The --path-walk option can shrink packfile sizes by grouping
> trees and blobs by path before those paths collide in the name-hash
> sort. (I haven't seen evidence that this is happening.)
>
> With evidence of (1) and not (2), it's not clear from the data that
> these features are integrating completely. Without looking at the
> code, those numbers would be the same if we had instead swapped the
> preference of "the --path-walk option disables bitmaps" to "bitmaps
> disable --path-walk".
Let me know if modifying the perf test as above (and including the
relevant results in the commit message) would be sufficient in
addressing your concern.
Thanks,
Taylor
^ permalink raw reply related
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Taylor Blau @ 2026-06-19 14:16 UTC (permalink / raw)
To: Derrick Stolee; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <6e4a8764-3c56-42c8-a87e-40a94c6c34e9@gmail.com>
On Fri, Jun 12, 2026 at 09:03:41AM -0400, Derrick Stolee wrote:
> On 6/2/2026 6:21 PM, Taylor Blau wrote:
>
> > As a result, we can see significantly reduced pack sizes from p5311
> > before this commit:
>
> I mentioned this before, but the pack _sizes_ aren't changing in this
> example. We are computing them more quickly, though.
Thanks for pointing this out. The paragraph following the perf output
below correctly explains the results ("We get the same size of output
pack, but [...]"), but this one is obviously wrong.
> Since we are testing --path-walk on both sides, the change across this
> commit is that we are using the bitmaps for the "counting objects" phase
> and then potentially using the --path-walk algorithm to construct the
> packfile.
I'm not sure I agree here. Because we are using bitmaps, we're relying
on pack-reuse to construct the output pack, not --path-walk. I mentioned
in git-pack-objects(1), but the combination of seeing "--path-walk" and
"--use-bitmap-index" together only means that we will use a path-walk
traversal as fallback if we can't get an answer by relying on bitmaps.
> And I wonder if the test setup creates a situation where we are always
> reusing deltas from the underlying packfile, so the --path-walk algorithm
> isn't doing anything to help with delta compression at this point and the
> difference in this patch is that we are replacing the object reachability
> calculation entirely with bitmaps.
>
> I suppose what I'm really worried about is that I'm hoping to see some
> evidence from a large-scale test that demonstrates that the two algorithms
> are working in tandem in a non-trivial way. I haven't seen it yet, but I
> also don't have evidence that they _aren't_ working together.
Your thinking is correct here that the test setup intentionally creates
a situation where we are reusing objects/deltas verbatim from the
bitmapped pack.
I'm not sure what "working in tandem" means here. At read time, the two
options mutually exclude one another, meaning we'll use bitmaps if we
have them, or do a path-walk traversal otherwise (or if the bitmaps we
have are somehow insufficient to perform the traversal).
The goal of this patch is not to demonstrate that the two work together
at the same time, but rather that we can write a pack using --path-walk,
and generate reachability bitmaps simultaneously.
Let me know if you have more thoughts on what "working together in a
non-trivial" way would look like here. If there are ways to improve the
compatibility of these two features in a way that yields better
performance via either smaller packs, faster generation, or both, I'm
all ears :-).
Thanks,
Taylor
^ permalink raw reply
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Taylor Blau @ 2026-06-19 14:08 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Michael Montalbo, Derrick Stolee, Jeff King, Elijah Newren
In-Reply-To: <xmqqjyrzbjyf.fsf@gitster.g>
On Mon, Jun 15, 2026 at 01:57:28PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> > index f693cb56691..69c5da1580a 100755
> > --- a/t/t5310-pack-bitmaps.sh
> > +++ b/t/t5310-pack-bitmaps.sh
> > ...
> > + for reuse in true false
> > + do
> > + : >trace.txt &&
> > +
> > + GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
> > + git -c pack.allowPackReuse=$reuse pack-objects \
> > + --stdout --revs --path-walk --use-bitmap-index \
> > + <in >out.pack &&
> > + grep "\"category\":\"bitmap\",\"key\":\"bitmap/hits\"" trace.txt &&
>
> This gets flagged by updated test linter X-<. Use test_grep to
> pacify it.
Oops, thanks for spotting.
Thanks,
Taylor
^ permalink raw reply
* Re: [PATCH] commit-graph: use timestamp_t for max parent generation accumulator
From: Taylor Blau @ 2026-06-19 14:05 UTC (permalink / raw)
To: Derrick Stolee
Cc: Patrick Steinhardt, Elijah Newren via GitGitGadget, git,
Elijah Newren
In-Reply-To: <09e50180-e165-48d8-a9d0-485283342f5c@gmail.com>
On Mon, Jun 15, 2026 at 07:44:19AM -0400, Derrick Stolee wrote:
> On 6/15/26 4:11 AM, Patrick Steinhardt wrote:
> > On Sun, Jun 14, 2026 at 06:57:50AM +0000, Elijah Newren via GitGitGadget wrote:
> > > commit-graph: use timestamp_t for max parent generation accumulator
> > > We found a few repositories in the wild with commits whose authors were
> > > apparently on a computer in the year 2120 when they recorded their
> > > commits. Apparently, in a century from now, some folks are going to have
> > > a really weird timezone as well (-13068837), though the timezone doesn't
> > > factor into this patch at all.
>
> > > @@ -1669,7 +1669,7 @@ static void compute_reachable_generation_numbers(
> > > struct commit *current = list->item;
> > > struct commit_list *parent;
> > > int all_parents_computed = 1;
> > > - uint32_t max_gen = 0;
> > > + timestamp_t max_gen = 0;
> > > for (parent = current->parents; parent; parent = parent->next) {
> > > repo_parse_commit(info->r, parent->item);
> >
> > This looks obviously correct.
>
> I agree. I was surprised this was the only necessary change, but
> your message clearly describes how the timing of the patch that
> delivered this change contributed to the mismatch.
Ditto. I reviewed a version of this patch before Elijah sent it to the
list, but this LGTM and is
Acked-by: Taylor Blau <me@ttaylorr.com>
Thanks,
Taylor
^ permalink raw reply
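(Illustration, not part of the message above.) The one-line fix in this patch is easier to appreciate with a standalone sketch of what a 32-bit accumulator does to a date-based value from the year 2120. Everything here is an assumption made for the demo: `demo_timestamp_t` stands in for Git's `timestamp_t` (assumed to be at least 64 bits wide), and the two helpers are hypothetical, not the code in commit-graph.c.

```c
#include <stdint.h>

/* Stand-in for Git's timestamp_t; assumed to be a 64-bit unsigned type. */
typedef uint64_t demo_timestamp_t;

/* Running maximum kept in 32 bits: values above 2^32 - 1 get truncated. */
static uint32_t max_gen_narrow(const demo_timestamp_t *gens, int nr)
{
	uint32_t max_gen = 0;
	int i;

	for (i = 0; i < nr; i++)
		if (gens[i] > max_gen)
			max_gen = (uint32_t)gens[i]; /* silently drops high bits */
	return max_gen;
}

/* Running maximum kept in the wide type: no truncation. */
static demo_timestamp_t max_gen_wide(const demo_timestamp_t *gens, int nr)
{
	demo_timestamp_t max_gen = 0;
	int i;

	for (i = 0; i < nr; i++)
		if (gens[i] > max_gen)
			max_gen = gens[i];
	return max_gen;
}
```

For example, 2120-01-01 00:00:00 UTC is 4733510400 seconds after the epoch, which is larger than 2^32 = 4294967296. Stored in a `uint32_t` it becomes 4733510400 - 4294967296 = 438543104, a date in late 1983, so a maximum computed over parents would come out far too small.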
* Re: [PATCH] t4216: fix no-op test that breaks TAP output
From: Taylor Blau @ 2026-06-19 14:04 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Todd Zullinger, Junio C Hamano, Jeff King
In-Reply-To: <20260619-pks-t4216-drop-unused-prereq-v1-1-2ce0d7bea088@pks.im>
Hi Patrick,
A couple of thanks are owed: one to Todd for reporting this issue in the
first place, another to Peff for analyzing why it didn't appear broken
before, and a third for you for proposing a patch to fix it.
If you choose to delete this piece of test infrastructure entirely (I
think that there is an alternative direction that I would prefer, but
see below for more on why), I think the patch you wrote below is OK.
But...
On Fri, Jun 19, 2026 at 09:20:20AM +0200, Patrick Steinhardt wrote:
> In t4216 we have a prerequisite that is active in case the system's
> `char` type is signed by default. This prerequisite isn't really used by
> anything though: while it is used to guard one of our tests, that
> specific test is essentially a no-op. So all this infrastructure does is
> to provide some debugging hint to a reader that pays a lot of attention.
I don't think that this is guarding nothing, but I agree that the test
as written is strange. As I recall, this was to sanity check the v1
Bloom values, but allow failures on platforms where the `char` type is
unsigned by default.
I don't feel that strongly about whether or not we check the exact
value of the filter, but I think there are a couple of arguments in
favor of doing so. Most compelling would be that we know that our
murmur3 implementation is correct (in at least one case) and that we
don't regress that case in the future. We do have these checks for v2
changed-path Bloom filters where the signed-ness of `char` is
irrelevant.
> Besides that, the way we set up the prerequisite also results in broken
> TAP output on systems where `char` is unsigned by default: we use
> `test_cmp()` to diff two files outside of any test body, and if the
> files differ we enable the prerequisite. If so, the call to `test_cmp()`
> would also print output, and that output is of course not valid TAP
> output.
Given this and the above, I would probably err on the side of
designating this as 'test_lazy_prereq' or otherwise silencing the output
of 'test_cmp' so that this does not taint the TAP output.
Thanks,
Taylor
^ permalink raw reply
* Re: [RFH] Why do osx CI jobs so unreliable?
From: Patrick Steinhardt @ 2026-06-19 14:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqik7fnz90.fsf@gitster.g>
On Thu, Jun 18, 2026 at 05:35:23PM -0700, Junio C Hamano wrote:
> I've been observing that in recent push-out to 'master' and 'next',
> osx-* jobs in GitHub Actions CI keep running for 6 hours and get
> killed.
>
> What is troubling is that this seems to be very flaky. For example,
> https://github.com/git/git/actions/runs/27778820659 is testing
> 95e20213 (Hopefully final batch before -rc2, 2026-06-17) which got
> killed after wasting 6 hours in osx-clang and osx-gcc jobs.
>
> https://github.com/git/git/actions/runs/27790036076 is testing
> the same 'master', with a patch to .github/workflows/main.yml to
> remove everything except for config and osx-* jobs, which succeeded
> within 30 minutes.
>
> Stumped...
So the raw logs have the following trailer:
2026-06-18T23:53:33.2996180Z Cleaning up orphan processes
2026-06-18T23:53:33.7900380Z Terminate orphan process: pid (34022) (git-remote-http)
2026-06-18T23:53:33.9848670Z Terminate orphan process: pid (15488) (httpd)
2026-06-18T23:53:34.0321490Z Terminate orphan process: pid (13146) (httpd)
2026-06-18T23:53:34.0808280Z Terminate orphan process: pid (13145) (httpd)
2026-06-18T23:53:34.1212760Z Terminate orphan process: pid (13144) (httpd)
2026-06-18T23:53:34.1570160Z Terminate orphan process: pid (13141) (httpd)
2026-06-18T23:53:34.1924140Z Terminate orphan process: pid (12553) (bash)
2026-06-18T23:53:34.2472970Z Terminate orphan process: pid (12552) (tee)
2026-06-18T23:53:34.6547890Z Terminate orphan process: pid (21209) (bash)
So I strongly suspect that it must be one of the t555* tests.
Furthermore, the t5551 and t5559 (both of which are actually the same
test) are the only test suites that use lib-httpd.sh and which are
missing in the job logs.
I have not been able to reproduce this hang on my macOS virtual machine
though, and on GitLab I didn't notice a similar hang recently. Maybe
this is something that's specific to GitHub's environment...? No idea.
Patrick
^ permalink raw reply
* Re: [PATCH v2 2/2] doc: advise batching patch rerolls
From: Weijie Yuan @ 2026-06-19 13:20 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps
In-Reply-To: <xmqq4ij1vywy.fsf@gitster.g>
Sorry for the late reply. I spent some time looking back through the
discussions on earlier patch series, to check my patch itself, of course
because I'm apparently a newcomer here.
On Wed, Jun 17, 2026 at 10:50:53AM -0700, Junio C Hamano wrote:
> > If the comments require substantial rework, sending a new version
> > +sooner may save reviewers from spending time on a version you already know will
> > +change significantly.
>
> I am not sure about this one. Even though the intention to avoid
> wasting reviewers' time spent on reading through the previous
> version that will be invalidated is a good one, by definition, a
> substantial rework will naturally take time, and it is better not to
> rush and send an updated version with substantial changes that you
> yourself haven't had a chance to thoroughly review yet.
>
> In such a case, it would be a better idea to respond to the review
> that made you realize a substantial rewrite is needed with a simple
> "I'll make a substantial rework based on this comment, which would
> invalidate this and that part of the current patch series, so please
> do not waste reviewer cycles on these parts until I send an updated
> series out" message.
I think the approach you recommended is obviously more reasonable.
It would be better to give everyone a heads-up "I am working on a
new version."
I will improve this part accordingly.
> > If the topic is close to being accepted and the remaining
> > +comments are small, a quicker new version may also be fine.
>
> I am not sure if this needs to be codified.
>
> I often see (e.g., in patches from Patrick) that an iteration is
> marked clearly as final candidate that the author is not aware of
> any outstanding issues. This encourages reviewers to ask "what
> about this one raised there?" to remind what is missed, or chime in
> with "yup, this looks good" to show support. Such a note is highly
> recommended, but I do not see a need to say "the (supposedly) final
> one is specifically allowed to be sent without waiting" even then.
Actually I thought Patrick would say something here ;-) so I waited a
few more days to see whether anyone else had any suggestions.
But here I think Patrick's original intention is: if your series is
*close* to being accepted (though I'm not sure what the precise definition
of "close to being accepted" is; does it mean commented on by Junio with
"Looks good", or reviewed by the community/core contributors with "Makes
sense"?) and there happens to be a small issue, you can
re-roll quickly to make your series more "sturdy" while it waits for the
maintainer's final examination and merge.
So, I think the situation you are describing here is that this version
of the patch has already been declared by the *author* to be the final
version. (i.e. waiting for Junio to do the last exam)
Therefore, I do not think the two situations conflict with each other,
or are directly related. One concerns a patch that is already close to
receiving the maintainer's final verdict, where a minor issue is
discovered and the author quickly rerolls it. The other concerns an
author who, without realizing that some issues remain unresolved, rushes
to send what they believe to be the final version and then waits for the
maintainer to review it.
For the latter case, I think it would be better to add a sentence along
the lines of: "Before sending a new version/the final version, check
once more whether there are any unresolved issues," if the existing
documentation does not already make this clear.
That said, I am not familiar with how patch discussions have played out
in the past, so please directly point out any mistakes in my
understanding. I have to admit that, by this point in writing the
message, I have become a little tangled up in my own reasoning.
Thanks!
^ permalink raw reply
* Re: [PATCH v14 4/6] branch: add --prune-merged <branch>
From: Phillip Wood @ 2026-06-19 13:13 UTC (permalink / raw)
To: Junio C Hamano
Cc: Harald Nordgren, Harald Nordgren via GitGitGadget, git,
Kristoffer Haugsbakk, Johannes Sixt
In-Reply-To: <xmqqcxxnsufl.fsf@gitster.g>
On 18/06/2026 17:08, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> One thing I've just thought of related to this patch is whether we want
>> to protect branches that are the upstreams of branches that are not
>> slated for deletion. With stacked branches it is possible that a branch
>> has been merged but has other branches stacked on top of it that have
>> not been merged.
>
> An interesting point. We do have "this topic is built on the result
> of merging these other topics into main" and I expect the practice
> is widespread. These base topics may graduate first, but other
> topics may still be updated.
>
> But when you rewrite these other topics, wouldn't you leave their
> bases untouched? IOW, a new iteration (i.e. "rebase -i") would
> reuse the base that was used in an earlier iteration, i.e. the
> result of an earlier merge of the other topics, some of which might
> have been pruned since then, into an older 'main', so it is OK to
> lose these other topics once they have graduated, simply because you
> wouldn't be recreating the merge that you used as the base of this
> remaining topic, no?
>
> Or am I missing something?
I was thinking that if I have feature1 with upstream origin/master and
feature2 with upstream feature1, then once feature1 is merged I'd still
like "git log @{u}.." and "git rebase" without an explicit upstream to
work when feature2 is checked out. If "git branch --prune-merged
origin/master" deletes feature1 then those commands stop working. Maybe
it would be sensible to update feature2's upstream once feature1 is
merged (which I think is what you're saying above) but do we really want
to force the user to do that by deleting feature1?
Thanks
Phillip
^ permalink raw reply
* Re: [PATCH] sequencer: Skip copying notes for commits that disappear during rebase
From: Uwe Kleine-König @ 2026-06-19 13:01 UTC (permalink / raw)
To: Phillip Wood; +Cc: Junio C Hamano, git
In-Reply-To: <67dbfb5c-5f07-49b8-aa32-a4635c585028@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 366 bytes --]
Hello Phillip,
On Fri, Jun 19, 2026 at 11:13:32AM +0100, Phillip Wood wrote:
> I'm happy to take this forward and try and fix at least some of the other
> bugs I've listed above. Uwe - if I don't cc you on some patches within the
> next couple of weeks please feel free to send a reminder.
Much appreciated! Looking forward to testing your patches.
Best regards
Uwe
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH v3 3/4] history: add squash subcommand to fold a range
From: Patrick Steinhardt @ 2026-06-19 12:55 UTC (permalink / raw)
To: Harald Nordgren via GitGitGadget; +Cc: git, Harald Nordgren
In-Reply-To: <66b2f49fb427c7328136b2d440dc7461b97fb4e0.1781810227.git.gitgitgadget@gmail.com>
On Thu, Jun 18, 2026 at 07:17:05PM +0000, Harald Nordgren via GitGitGadget wrote:
> diff --git a/builtin/history.c b/builtin/history.c
> index 305bde3102..9d9416870f 100644
> --- a/builtin/history.c
> +++ b/builtin/history.c
> @@ -973,6 +975,156 @@ out:
> return ret;
> }
>
> +/*
> + * Resolve a "<base>..<tip>" revision range into the base commit just outside
> + * the range (which becomes the parent of the squashed commit), the oldest
> + * commit contained in the range (whose message the squash reuses), and the
> + * range tip (whose tree becomes the result). A merge inside the range is fine,
> + * but the range must have a single base and must not reach a root commit.
> + */
> +static int resolve_squash_range(struct repository *repo,
> + const char *range,
> + struct commit **base_out,
> + struct commit **oldest_out,
> + struct commit **tip_out)
> +{
> + struct rev_info revs;
> + struct commit *commit, *base = NULL, *oldest = NULL, *tip = NULL;
> + struct strvec args = STRVEC_INIT;
> + int ret;
> +
> + repo_init_revisions(repo, &revs, NULL);
> + strvec_push(&args, "ignored");
> + strvec_push(&args, "--reverse");
> + strvec_push(&args, "--topo-order");
> + strvec_push(&args, "--boundary");
> + strvec_push(&args, range);
We don't have any kind of input verification for "range". So in theory,
the user could pass whatever string here, and this may or may not work.
Also, should we use "--ancestry-path" with the first commit of the range
here? Otherwise we may include commits that aren't descendants of A in a
range "A..B". If not I wonder whether we might see multiple boundaries
even though we would be able to resolve the boundary unambiguously in
some cases.
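To illustrate the concern, here is a minimal sketch in plain C. It is a toy model of a four-commit history, not Git's revision machinery, and every name in it (`compute_range`, `descends_from`, the `ROOT`/`A`/`S`/`M` commits) is hypothetical. M merges A with a side branch S that forks from the root, so "A..M" contains S even though S does not descend from A. The `ancestry_path` flag models the intent of "--ancestry-path" by keeping only descendants of the base.

```c
#define NCOMMITS 4
#define MAXPARENTS 2

/* Toy history: ROOT <- A <- M and ROOT <- S <- M (M merges A and S). */
enum { ROOT, A, S, M };

static const int parents[NCOMMITS][MAXPARENTS] = {
	{ -1, -1 },	/* ROOT has no parents */
	{ ROOT, -1 },	/* A */
	{ ROOT, -1 },	/* S, a side branch that does not descend from A */
	{ A, S },	/* M, a merge of A and S */
};

/* Mark every commit reachable from "start", including "start" itself. */
static void mark_reachable(int start, int *seen)
{
	if (start < 0 || seen[start])
		return;
	seen[start] = 1;
	for (int i = 0; i < MAXPARENTS; i++)
		mark_reachable(parents[start][i], seen);
}

/* Return 1 if "ancestor" is reachable from "commit" (or is the same commit). */
static int descends_from(int commit, int ancestor)
{
	int seen[NCOMMITS] = { 0 };

	mark_reachable(commit, seen);
	return seen[ancestor];
}

/*
 * Fill in_range[] for "base..tip": commits reachable from tip but not
 * from base. With ancestry_path set, additionally drop commits that do
 * not descend from base. Returns the number of commits in the range.
 */
static int compute_range(int base, int tip, int ancestry_path, int *in_range)
{
	int from_tip[NCOMMITS] = { 0 }, from_base[NCOMMITS] = { 0 };
	int count = 0;

	mark_reachable(tip, from_tip);
	mark_reachable(base, from_base);
	for (int c = 0; c < NCOMMITS; c++) {
		in_range[c] = from_tip[c] && !from_base[c];
		if (ancestry_path && in_range[c] && !descends_from(c, base))
			in_range[c] = 0;
		count += in_range[c];
	}
	return count;
}
```

In this model, `compute_range(A, M, 0, in_range)` selects S and M, while `compute_range(A, M, 1, in_range)` selects only M. How the real "--boundary" output changes under "--ancestry-path" would still need to be checked against Git itself.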
> + setup_revisions_from_strvec(&args, &revs, NULL);
> + if (args.nr != 1) {
> + ret = error(_("'%s' does not name a revision range"), range);
> + goto out;
> + }
> +
> + if (prepare_revision_walk(&revs) < 0) {
> + ret = error(_("error preparing revisions"));
> + goto out;
> + }
> +
> + while ((commit = get_revision(&revs))) {
> + if (commit->object.flags & BOUNDARY) {
> + if (base) {
> + ret = error(_("range '%s' has more than one base; "
> + "cannot squash"), range);
> + goto out;
> + }
> + base = commit;
> + continue;
> + }
> + if (!oldest)
> + oldest = commit;
> + tip = commit;
> + }
Hmm. I really wonder whether we should also restrict merges. It might be
somewhat obvious that intermediate merge commits should just be
discarded. But is that equally obvious for HEAD and the base commit?
> + if (!oldest) {
> + ret = error(_("the range '%s' is empty"), range);
> + goto out;
> + }
> +
> + if (!base) {
> + ret = error(_("cannot squash the root commit"));
> + goto out;
> + }
In theory we could, by squashing onto an empty tree. But it's fine not
to care about this edge case; we can still address it at a later point
in time if we ever feel the need to.
> + *base_out = base;
> + *oldest_out = oldest;
> + *tip_out = tip;
> + ret = 0;
> +
> +out:
> + reset_revision_walk();
> + release_revisions(&revs);
> + strvec_clear(&args);
> + return ret;
> +}
> +
> +static int cmd_history_squash(int argc,
> + const char **argv,
> + const char *prefix,
> + struct repository *repo)
> +{
> + const char * const usage[] = {
> + GIT_HISTORY_SQUASH_USAGE,
> + NULL,
> + };
> + enum ref_action action = REF_ACTION_DEFAULT;
> + enum commit_tree_flags flags = 0;
> + int dry_run = 0;
> + struct option options[] = {
> + OPT_CALLBACK_F(0, "update-refs", &action, "(branches|head)",
> + N_("control which refs should be updated"),
> + PARSE_OPT_NONEG, parse_ref_action),
> + OPT_BOOL('n', "dry-run", &dry_run,
> + N_("perform a dry-run without updating any refs")),
> + OPT_BIT(0, "reedit-message", &flags,
> + N_("open an editor to modify the commit message"),
> + COMMIT_TREE_EDIT_MESSAGE),
> + OPT_END(),
> + };
> + struct strbuf reflog_msg = STRBUF_INIT;
> + struct commit *base, *oldest, *tip, *rewritten;
> + const struct object_id *base_tree_oid, *tip_tree_oid;
> + struct commit_list *parents = NULL;
> + struct rev_info revs = { 0 };
> + int ret;
> +
> + argc = parse_options(argc, argv, prefix, options, usage, 0);
> + if (argc != 1) {
> + ret = error(_("command expects a single revision range"));
> + goto out;
> + }
> + repo_config(repo, git_default_config, NULL);
> +
> + if (action == REF_ACTION_DEFAULT)
> + action = REF_ACTION_BRANCHES;
> +
> + ret = resolve_squash_range(repo, argv[0], &base, &oldest, &tip);
> + if (ret < 0)
> + goto out;
> +
> + ret = setup_revwalk(repo, action, tip, &revs);
> + if (ret < 0)
> + goto out;
Oh, you already use `setup_revwalk()` here. Wouldn't that keep us from
accepting merge commits?
Patrick
* Re: [PATCH v3 0/4] history: add squash subcommand to fold a range
From: Patrick Steinhardt @ 2026-06-19 12:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Harald Nordgren via GitGitGadget, git, Harald Nordgren
In-Reply-To: <xmqqo6h7nza3.fsf@gitster.g>
On Thu, Jun 18, 2026 at 05:34:44PM -0700, Junio C Hamano wrote:
> "Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Adds git history squash <revision-range> to fold a range of commits into its
> > oldest one, reusing that commit's message and replaying any descendants on
> > top.
>
> One thing that just occurred to me.
>
> When you have a linear history
>
> o---A---B---C
>
> you run "git history squash A..C" and come to
>
> o---X
>
> where the tree of X is the same as C, with the log message of A
> reused for it. That is simple, clean, and easy to explain.
>
> But what should happen to refs (i.e., branch head) that point at A
> or B?
It's a very good question. I had `git history squash` in my backlog for
a while, and this very question made me defer that topic repeatedly.
> I am addressing this message to Patrick as this question relates to
> the grand vision for the "git history" command. I think "git
> replay" wants to rewrite all the refs that are involved in the
> rewrite operation, while "git rebase" (without "--update-refs")
> wants to leave all other refs intact and update only the branch it
> was told to rewrite. Is it the same design as "rebase" and
> "--update-refs" controls if we update _other_ refs that happened to
> be in the range that is rewritten?
Yeah.
> Now, assuming that there does exist a mode where the command can
> update these refs that point into the history that got rewritten,
> there probably are at least two possibilities.
>
> On one hand, I think it is reasonable to _remove_ these refs that
> used to point at a section of history that disappeared (like the ones
> that were pointing at A or B). Perhaps A and B were pointed at by
> two branches or tags that were used to mark "up to this point things
> are broken" and "from here on things are fixed" (i.e., imagine a
> manual bisection). After squashing all of the commits in this
> section of history, the result no longer has such transition points.
I think just pruning references would be extremely surprising to our
users.
> It also is plausible that users may want these refs that used to
> point at A or B to point at X, just like the ref that used to point
> at C would now point at X, even though I cannot offhand think of a
> good story (like "there used to be transition points, now there
> aren't" I said above to explain why these refs should disappear) to
> support such a behaviour.
>
> Thoughts?
There are two more modes:
- If a reference points at an intermediate commit then it stays there.
- We detect this case and reject the update. Optionally, we may ask
the user what they intend to do with those other refs.
It really is kind of ambiguous what is supposed to happen, and I can
think of different scenarios where each of the possibilities would be
the best choice. So ultimately, I think the last option is the best one,
as it also gives us a way to iterate.
If so, a user would already be able to keep other refs pointing at
their original commits (A or B) by saying `git history squash
--update-refs=head`. The other modes can then be added at a later point
in time as the need arises.
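For reference, the four candidate behaviours discussed in this thread can be written down as a small policy switch. This is only a sketch in plain C over a toy ref table. `enum stale_ref_policy`, `struct toy_ref` and `apply_policy()` are hypothetical names and do not exist in Git, and commits are represented by plain strings such as "A" and "X".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policies for refs that point at intermediate commits. */
enum stale_ref_policy {
	STALE_REF_DELETE,	/* remove the ref */
	STALE_REF_MOVE,		/* repoint the ref at the squashed commit */
	STALE_REF_KEEP,		/* leave the ref at the old commit */
	STALE_REF_REJECT,	/* refuse the operation */
};

struct toy_ref {
	const char *name;
	const char *target;	/* commit name, or NULL once deleted */
};

/* Return 1 if "target" names one of the intermediate commits. */
static int is_intermediate(const char *target,
			   const char **intermediate, size_t nr)
{
	if (!target)
		return 0;
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(target, intermediate[i]))
			return 1;
	return 0;
}

/*
 * Apply "policy" to every ref pointing at an intermediate commit.
 * Returns 0 on success, or -1 if the policy is STALE_REF_REJECT and
 * such a ref exists; in that case no ref has been modified.
 */
static int apply_policy(struct toy_ref *refs, size_t nr_refs,
			const char **intermediate, size_t nr_intermediate,
			const char *squashed, enum stale_ref_policy policy)
{
	for (size_t i = 0; i < nr_refs; i++) {
		if (!is_intermediate(refs[i].target, intermediate,
				     nr_intermediate))
			continue;
		switch (policy) {
		case STALE_REF_REJECT:
			return -1;
		case STALE_REF_DELETE:
			refs[i].target = NULL;
			break;
		case STALE_REF_MOVE:
			refs[i].target = squashed;
			break;
		case STALE_REF_KEEP:
			break;
		}
	}
	return 0;
}
```

Starting with the rejecting policy leaves room to add the others later without changing what existing invocations do.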
Patrick