* [PATCH v3 1/2] environment: move ignore_case into repo_config_values
From: Tian Yuchen @ 2026-06-19 15:51 UTC (permalink / raw)
To: git
Cc: ps, phillip.wood123, johannes.schindelin, stolee, Tian Yuchen,
Christian Couder, Ayush Chandekar, Olamide Caleb Bello
In-Reply-To: <20260619155152.642760-1-cat@malon.dev>
The 'core.ignorecase' configuration, which is stored in the
global variable 'ignore_case', acts as a core filesystem
capability flag.
Move this global variable into 'struct repo_config_values' to tie it
to the specific repository instance it was read from. This reduces
global state and aligns with the ongoing libification effort.
To keep call sites readable, introduce the getter function
'repo_ignore_case()'.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Mentored-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Tian Yuchen <cat@malon.dev>
---
environment.c | 8 ++++++++
environment.h | 8 ++++++++
2 files changed, 16 insertions(+)
diff --git a/environment.c b/environment.c
index fc3ed8bb1c..bfa3cb3045 100644
--- a/environment.c
+++ b/environment.c
@@ -142,6 +142,13 @@ int is_bare_repository(void)
return is_bare_repository_cfg && !repo_get_work_tree(the_repository);
}
+int repo_ignore_case(struct repository *repo)
+{
+ return (repo && repo->initialized) ?
+ repo_config_values(repo)->ignore_case :
+ 0;
+}
+
int have_git_dir(void)
{
return startup_info->have_repository
@@ -720,5 +727,6 @@ void repo_config_values_init(struct repo_config_values *cfg)
{
cfg->attributes_file = NULL;
cfg->apply_sparse_checkout = 0;
+ cfg->ignore_case = 0;
cfg->branch_track = BRANCH_TRACK_REMOTE;
}
diff --git a/environment.h b/environment.h
index 9eb97b3869..39a8bf0b49 100644
--- a/environment.h
+++ b/environment.h
@@ -91,6 +91,7 @@ struct repo_config_values {
/* section "core" config values */
char *attributes_file;
int apply_sparse_checkout;
+ int ignore_case;
/* section "branch" config values */
enum branch_track branch_track;
@@ -123,6 +124,13 @@ int git_default_config(const char *, const char *,
int git_default_core_config(const char *var, const char *value,
const struct config_context *ctx, void *cb);
+/*
+ * Getter for the `ignore_case` field of `struct repo_config_values`.
+ * It checks `repo->initialized` to prevent calling `repo_config_values()`
+ * before the repository setup is fully complete or in non-git environments.
+ */
+int repo_ignore_case(struct repository *repo);
+
void repo_config_values_init(struct repo_config_values *cfg);
/*
--
2.43.0
* [PATCH v3 0/2] environment: move ignore_case into repo_config_values
From: Tian Yuchen @ 2026-06-19 15:51 UTC (permalink / raw)
To: git; +Cc: ps, phillip.wood123, johannes.schindelin, stolee, Tian Yuchen
In-Reply-To: <20260618114207.605211-1-cat@malon.dev>
This series continues the ongoing libification effort by moving
the global variable 'ignore_case' (which stores 'core.ignorecase')
into 'struct repo_config_values', tying it to the specific
repository instance it was read from. This allows us to encapsulate
the configuration without altering its eager-parsing behavior.
The getter function 'repo_ignore_case()' is introduced so
that we can safely retrieve the configuration value whilst
maintaining the correct fallback logic.
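As a rough illustration of the fallback logic described above, here is a
self-contained sketch. The struct and function names ending in `_sketch`, and
the helper `names_match()`, are stand-ins invented for this example and are
not Git's real definitions. Only the shape of the check (a NULL or
uninitialized repository falls back to 0) mirrors the getter in patch 1/2.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for Git's structs; only the fields this sketch needs. */
struct repo_config_values_sketch {
	int ignore_case;
};

struct repository_sketch {
	int initialized;
	struct repo_config_values_sketch config_values;
};

/* Fall back to 0 (case-sensitive) when there is no usable repository. */
static int repo_ignore_case_sketch(const struct repository_sketch *repo)
{
	return (repo && repo->initialized) ?
		repo->config_values.ignore_case :
		0;
}

/* Hypothetical caller: compare two names according to the repo's setting. */
static int names_match(const struct repository_sketch *repo,
		       const char *a, const char *b)
{
	if (!repo_ignore_case_sketch(repo))
		return !strcmp(a, b);
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	/* At least one string has ended; they match only if both have. */
	return *a == *b;
}
```

The point of routing every caller through one getter is that the "no
repository yet" case is decided in a single place rather than at each call
site.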
RFC Questions:
- dir.c: does the getter add performance overhead?
- compat/win32/path-utils.c: is it appropriate to include the
repository.h header file?
Related materials:
[1] This earlier patch introduced getters for filesystem flags
to ensure safe access when migrating 'protect_hfs' and
'protect_ntfs'. The migration of 'ignore_case' follows the same
approach.
[2] Derrick Stolee's previous attempt. This patch series attempted
to wrap this kind of filesystem-level variable using a lazy-loaded
global accessor get_int_config_global().
However, as Glen Choo pointed out in his review of that
series, it is strongly preferred to use plain fields in a
repository-scoped struct over global lazy-loaders, provided
those fields are properly initialized during the setup process.
Changes since V2:
- Revise the cover letter to clarify what the links lead to.
Thanks!
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Mentored-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Tian Yuchen <cat@malon.dev>
[1] https://lore.kernel.org/git/20260606143412.15443-1-cat@malon.dev/
[2] https://lore.kernel.org/git/2b4198c09cb6c04c60608d19072d419503dfe5df.1685716421.git.gitgitgadget@gmail.com/
Tian Yuchen (2):
environment: move ignore_case into repo_config_values
config: use repo_ignore_case() to access core.ignorecase
apply.c | 2 +-
builtin/fetch.c | 2 +-
builtin/mv.c | 2 +-
compat/win32/path-utils.c | 3 ++-
dir.c | 18 +++++++++---------
environment.c | 11 +++++++++--
environment.h | 9 ++++++++-
fsmonitor.c | 2 +-
name-hash.c | 6 +++---
read-cache.c | 6 +++---
refs/files-backend.c | 4 ++--
submodule.c | 2 +-
t/helper/test-lazy-init-name-hash.c | 2 +-
unpack-trees.c | 2 +-
14 files changed, 43 insertions(+), 28 deletions(-)
--
2.43.0
* Strange behavior of "git log" with file argument
From: Vincent Lefevre @ 2026-06-19 15:44 UTC (permalink / raw)
To: git
With 2.53.0 under Debian/unstable:
* https://github.com/git/git.git repository
at 95e20213faefeb95df29277c58ac1980ab68f701
"git log git-gui/git-gui--askyesno.sh" outputs nothing. To get logs, I
can add the -m option. In particular, this shows 3 non-merge commits.
So the behavior without -m seems incorrect, and at least unhelpful.
* https://gitlab.inria.fr/mpfr/mpfr.git repository
at 74cb29f0908c2887dc8c3e6ba7a3c5a2f20710a3
"git log --reverse AUTHORS" shows only 2 commits:
bfa9d064e1cf7a736740c73b9773eabb11da6ed7
5a9521d1f305268e575c4a5c4de13614acef6321
where bfa9d064e1cf7a736740c73b9773eabb11da6ed7 corresponds to the file
creation and 5a9521d1f305268e575c4a5c4de13614acef6321 is unrelated to
the AUTHORS file:
git show 5a9521d1f305268e575c4a5c4de13614acef6321
contains nothing about AUTHORS, and "git log AUTHORS" does not list
this commit. But "git log AUTHORS" also lists
b28347ab59db2a99168a17c3e1804000069199aa
which had been done *before* the AUTHORS file was created!
Note: According to the git-log(1) man page, the --reverse option
is supposed to affect only the order, not the list of commits to
be shown:
--reverse
Output the commits chosen to be shown (see Commit Limiting section
above) in reverse order. Cannot be combined with --walk-reflogs.
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
* Re: [PATCH v14 4/6] branch: add --prune-merged <branch>
From: Junio C Hamano @ 2026-06-19 15:42 UTC (permalink / raw)
To: Phillip Wood
Cc: Harald Nordgren, Harald Nordgren via GitGitGadget, git,
Kristoffer Haugsbakk, Johannes Sixt
In-Reply-To: <42ffcb36-7fff-4948-9b8d-2c54eb626e66@gmail.com>
Phillip Wood <phillip.wood123@gmail.com> writes:
> I was thinking that if I have feature1 with upstream origin/master and
> feature2 with upstream feature1, then once feature1 is merged I'd still
> like "git log @{u}.." and "git rebase" without an explicit upstream to
> work when feature2 is checked out. If "git branch --prune-merged
> origin/master" deletes feature1 then those commands stop working. Maybe
> it would be sensible to update feature2's upstream once feature1 is
> merged (which I think is what you're saying above) but do we really want
> to force the user to do that by deleting feature1?
Ahh, reference with @{upstream}. Yeah, that _does_ make sense.
* Re: [PATCH v2] Makefile: dedup archives in $(LIBS) so link recipes don't repeat them
From: Junio C Hamano @ 2026-06-19 15:41 UTC (permalink / raw)
To: Harald Nordgren; +Cc: Harald Nordgren via GitGitGadget, git
In-Reply-To: <CAHwyqnWBb65dC+qSYTw9SKdufjibUmTm065feM5D9906H5SQ4w@mail.gmail.com>
Harald Nordgren <haraldnordgren@gmail.com> writes:
> I think this would be quite nice to fix for all the macOS developers
> (I don't know how many we have who are active on this list), but when
> running repeated tests it does take up some space on the terminal:
>
> ````
> ❯ git rebase --keep-base -x 'make -s && cd t && prove -j8
> t345?-history*.sh && echo'
>
> Executing: make -s && cd t && prove -j8 t345?-history*.sh && echo
> GIT_VERSION=2.55.0.rc1.20.g1e31474ef6
> ld: warning: ignoring duplicate libraries: 'libgit.a',
> 'target/release/libgitcore.a'
> ld: warning: ignoring duplicate libraries: 'libgit.a',
> 'target/release/libgitcore.a'
While I am very sympathetic that it may be annoying, I have to
wonder if it is ultimately the linker's job to accept the same
library listed twice on the same command line, decide when it can
ignore the second one, and *silently* ignore it.
Imagine this situation.
- There are two library archives, libA.a has a.o in it and libB.a
has b.o in it, respectively.
- The object file a.o defines a symbol that b.o needs, and b.o
defines a symbol a.o needs (i.e., mutually dependent). libA.a and
libB.a have other symbols in them. There are valid reasons why we
do not want to combine them into a single libAB.a.
- Now our program X uses both libraries and we build and try to link it this way:
$(CC) -c x.c # this builds x.o
$(CC) -o programX x.o libA.a libB.a # unfortunately does not work as-is
which fails because x.o uses a symbol from libA.a that is not in
a.o (so a.o is not linked), and then x.o also uses something in
b.o that is picked up from libB.a. But b.o in turn needs a.o,
which we already skipped. One way to make it work is to tweak the
final link phase to read like this:
$(CC) -o programX x.o libA.a libB.a libA.a
If your linker complains because we list libA.a twice, it would be
annoying.
I guess if we can assume GNU ld (e.g., gcc/clang), we can use
$(CC) -o programX x.o -Wl,--start-group libA.a libB.a -Wl,--end-group
to tell the linker that they need to be processed for circular
dependencies, but listing them twice is a more portable and harmless
(i.e., if all the symbols are resolved by the time the linker sees
the second libA.a, then it would not pick up anything extra from
there) way to achieve the same thing.
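To make the scenario concrete, the following toy model may help. It is not
how any real linker is implemented, and every name in it is made up for
illustration. It processes archives strictly left to right and pulls in a
member only when that member defines a symbol that is currently undefined.
The extra member a2.o and the symbol names are assumptions added to match
the description above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 32

struct object {
	const char *defines[4];	/* symbols provided, NULL-terminated */
	const char *needs[4];	/* symbols referenced, NULL-terminated */
};

struct archive {
	const struct object *members;
	size_t nr;
};

struct linker {
	const char *defined[MAX_SYMS];
	size_t defined_nr;
	const char *undefined[MAX_SYMS];
	size_t undefined_nr;
};

static int find(const char *const *set, size_t nr, const char *sym)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(set[i], sym))
			return (int)i;
	return -1;
}

/* Link one object: record what it defines, then queue what it still needs. */
static void load_object(struct linker *ld, const struct object *obj)
{
	for (size_t i = 0; obj->defines[i]; i++) {
		int pos = find(ld->undefined, ld->undefined_nr, obj->defines[i]);
		if (pos >= 0)
			ld->undefined[pos] = ld->undefined[--ld->undefined_nr];
		if (find(ld->defined, ld->defined_nr, obj->defines[i]) < 0)
			ld->defined[ld->defined_nr++] = obj->defines[i];
	}
	for (size_t i = 0; obj->needs[i]; i++)
		if (find(ld->defined, ld->defined_nr, obj->needs[i]) < 0 &&
		    find(ld->undefined, ld->undefined_nr, obj->needs[i]) < 0)
			ld->undefined[ld->undefined_nr++] = obj->needs[i];
}

static int is_wanted(const struct linker *ld, const struct object *obj)
{
	for (size_t i = 0; obj->defines[i]; i++)
		if (find(ld->undefined, ld->undefined_nr, obj->defines[i]) >= 0)
			return 1;
	return 0;
}

/* Visit one archive once, pulling members while they satisfy a pending need. */
static void scan_archive(struct linker *ld, const struct archive *ar)
{
	int progress;
	do {
		progress = 0;
		for (size_t i = 0; i < ar->nr; i++)
			if (is_wanted(ld, &ar->members[i])) {
				load_object(ld, &ar->members[i]);
				progress = 1;
			}
	} while (progress);
}

/* Returns the number of symbols still undefined after the whole command line. */
static size_t link_program(const struct object *first,
			   const struct archive *const *line, size_t nr)
{
	struct linker ld = { 0 };
	load_object(&ld, first);
	for (size_t i = 0; i < nr; i++)
		scan_archive(&ld, line[i]);
	return ld.undefined_nr;
}

static const struct object x_o = { { "main", NULL }, { "a2_func", "b_func", NULL } };
static const struct object liba_members[] = {
	{ { "a_func", NULL }, { "b_helper", NULL } },		/* a.o */
	{ { "a2_func", NULL }, { NULL } },			/* a2.o */
};
static const struct object libb_members[] = {
	{ { "b_func", "b_helper", NULL }, { "a_func", NULL } },	/* b.o */
};
static const struct archive libA = { liba_members, 2 };
static const struct archive libB = { libb_members, 1 };
static const struct archive *const order_once[] = { &libA, &libB };
static const struct archive *const order_twice[] = { &libA, &libB, &libA };
```

In this model, `link_program(&x_o, order_once, 2)` leaves `a_func`
unresolved because a.o was skipped on the only visit to libA.a, while
`link_program(&x_o, order_twice, 3)` resolves everything on the second
visit.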
So from a future-proofing and portability perspective (which is
another way of saying the maintainability we care about), I would
very much prefer to see this solved at the linker level, allowing
the build procedure to list the same library twice on the command line.
It seems that various folks on the Internet, including mesonbuild
and CMake, have heard enough complaints from users and worked around
the linker warning by using the -no_warn_duplicate_libraries option.
Their approach translates to something like the following in our
build environment.
config.mak.uname | 4 ++++
1 file changed, 4 insertions(+)
diff --git i/config.mak.uname w/config.mak.uname
index 8719e09f66..e29eaaf3fd 100644
--- i/config.mak.uname
+++ w/config.mak.uname
@@ -149,6 +149,10 @@ ifeq ($(uname_S),Darwin)
ifeq ($(shell test "`expr "$(uname_R)" : '\([0-9][0-9]*\)\.'`" -ge 20 && echo 1),1)
OPEN_RETURNS_EINTR = UnfortunatelyYes
endif
+
+ # NEEDSWORK: do this only for XCode 15 or later
+ BASIC_LDFLAGS += -Wl,-no_warn_duplicate_libraries
+
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Derrick Stolee @ 2026-06-19 15:33 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <ajVXlcHgIF2XkmMQ@nand.local>
On 6/19/2026 10:52 AM, Taylor Blau wrote:
> On Fri, Jun 19, 2026 at 10:40:51AM -0400, Derrick Stolee wrote:
>>> [...]
>>> , which gives us:
>>>
>>> Test HEAD^ HEAD
>>> ----------------------------------------------------------------------------------------
>>> 5311.3: size of bitmapped pack 278.8M 278.8M -0.0%
>>> 5311.38: size of bitmapped pack (--path-walk) 278.7M 278.7M +0.0%
>>>
>>> (eliding other tests). I considered whether there are other interesting
>>> tests, but I think "repack" is the right layer to run perf tests, since
>>> you're always writing a closed pack. We could try different subsets of
>>> the repository's objects (which would also have to be closed), but I
>>> don't think this is that interesting.
>>
>> This sort of thing does help to show that we're getting different
>> behavior when repacking with and without --path-walk. And this test
>> is showing the slightest change for git.git, but is likely more
>> impactful for the other repos I've used to demonstrate the benefits.
>>
>> So this is the kind of data I'm hoping to see, but also with data
>> from other repos whose data shapes benefit from --path-walk more
>> than git.git and repos where name-hash v1 is sufficient to give a
>> similar result.
>
> I'm glad this is the sort of data you're looking for. I'm happy to run
> this on other repositories.
>
>> I'd also like to see if the repack _time_ changes with this, but
>> these direct size comparisons are the biggest indicator I'd like to
>> see.
>
> Unfortunately a timing comparison is kind of a pain here. We'd have to
> use test_perf, which will perform the same repack multiple times. We
> could do that, though it's wasteful, and changes like bf4a60874af
> (p5326: generate pack bitmaps before writing the MIDX bitmap,
> 2021-09-17) move us in the opposite direction.
>
> I'm not opposed to changing this to test_perf if you feel strongly about
> it.
Repacking is expensive and time-consuming. I care a bit about it,
but not as much as I care about the size difference. Feel free to
skip the time performance impact for now.
Thanks,
-Stolee
* Re: [PATCH GSoC RFC v13 02/12] git-compat-util: add strtoul_ul() with error handling
From: Pablo Sabater @ 2026-06-19 15:06 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519
In-Reply-To: <20260619-ps-eric-work-rebase-v13-2-3d4c7315d2f8@gmail.com>
On Fri, 19 Jun 2026 at 16:56, Pablo Sabater
(<pabloosabaterr@gmail.com>) wrote:
>
> From: Eric Ju <eric.peijian@gmail.com>
>
> We already have strtoul_ui() and similar functions that provide proper
> error handling using strtoul from the standard library. However,
> there isn't currently a variant that returns an unsigned long.
>
> This variant is needed in a subsequent commit.
>
> This variant is needed in a subsequent commit to enable returning an
> unsigned long with proper error handling.
>
> Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
I should have removed the second paragraph, which is duplicated. I
quickly fixed the last paragraph but didn't realize it repeated the
one above, sorry.
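For readers following along, here is a sketch of what such a helper might
look like. The patch body is not quoted in this message, so this is an
assumption about the general shape rather than the code from the series.
The name `strtoul_ul_sketch()` is hypothetical and chosen to avoid implying
it is the real function.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse `s` as an unsigned long in the given base.
 * Returns 0 and stores the value in `*result` on success, -1 on error.
 */
static int strtoul_ul_sketch(const char *s, int base, unsigned long *result)
{
	unsigned long ul;
	char *end;

	/* strtoul() silently accepts a leading '-', so reject it up front. */
	if (strchr(s, '-'))
		return -1;
	errno = 0;
	ul = strtoul(s, &end, base);
	/* Fail on overflow (ERANGE), no digits at all, or trailing garbage. */
	if (errno || end == s || *end)
		return -1;
	*result = ul;
	return 0;
}
```

Returning a status code and writing through an out-parameter keeps the full
range of `unsigned long` available for valid results, which a sentinel
return value could not do.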
* Re: [PATCH GSoC RFC v13 04/12] t1006: split test utility functions into new "lib-cat-file.sh"
From: Pablo Sabater @ 2026-06-19 15:02 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519
In-Reply-To: <20260619-ps-eric-work-rebase-v13-4-3d4c7315d2f8@gmail.com>
On Fri, 19 Jun 2026 at 16:56, Pablo Sabater
(<pabloosabaterr@gmail.com>) wrote:
>
> From: Eric Ju <eric.peijian@gmail.com>
>
> This refactor extracts utility functions from the cat-file's test
> script "t1006-cat-file.sh" into a new "lib-cat-file.sh" dedicated
> library file. The goal is to improve code reuse and readability,
> enabling future tests to leverage these utilities without duplicating
> code.
>
> Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
I forgot to mention that it is needed in a subsequent commit, sorry.
It will be in the next version.
* [PATCH GSoC RFC v13 12/12] cat-file: make remote-object-info allow-list dynamic
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
The static allow-list in expand_atom() is hardcoded to only allow
"objectname" and "objectsize" for remote queries. This works because,
up to this point, every server either supports object-info with name
and size or does not support it at all. But we cannot expect future
servers running different Git versions to have the same object-info
capabilities. Therefore, the allow_list needs to be dynamic, depending
on what the server advertises.
The client will now:
1. Request the protocol option that the placeholder refers to (e.g.
"size" for "%(objectsize)").
2. Filter the request in fetch_object_info(), dropping any option that
the server does not advertise.
3. After fetching, treat the options that were not dropped as the ones
fetched and supported by the server; map them to their placeholders
and populate remote_allowed_atoms.
4. Check remote_allowed_atoms in expand_atom(), with the same behaviour
the static allow_list had.
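The filtering and mapping steps above can be sketched in isolation. This is
a simplified model using plain arrays instead of Git's string_list, and the
helper names (`server_advertises()`, `filter_options()`,
`build_allowed_atoms()`) are hypothetical. It only illustrates why the
filter walks backwards when deletion swaps in the last element.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for asking what the server advertises. */
static int server_advertises(const char *const *advertised, size_t nr,
			     const char *opt)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(advertised[i], opt))
			return 1;
	return 0;
}

/*
 * Drop unadvertised options in place and return the new count.
 * Deleting swaps in the last element, so walk backwards: the element
 * moved into slot i has always been checked already.
 */
static size_t filter_options(const char **opts, size_t nr,
			     const char *const *advertised, size_t adv_nr)
{
	for (size_t i = nr; i-- > 0; )
		if (!server_advertises(advertised, adv_nr, opts[i]))
			opts[i] = opts[--nr];
	return nr;
}

struct option_atom {
	const char *option;	/* protocol option name */
	const char *atom;	/* format placeholder it backs */
};

static const struct option_atom option_atom_map[] = {
	{ "size", "objectsize" },
	{ "type", "objecttype" },
};

/* "objectname" is always allowed; add one atom per surviving option. */
static size_t build_allowed_atoms(const char *const *opts, size_t nr,
				  const char **allowed, size_t cap)
{
	size_t n = 0;
	size_t map_nr = sizeof(option_atom_map) / sizeof(option_atom_map[0]);

	if (n < cap)
		allowed[n++] = "objectname";
	for (size_t i = 0; i < map_nr && n < cap; i++)
		if (server_advertises(opts, nr, option_atom_map[i].option))
			allowed[n++] = option_atom_map[i].atom;
	return n;
}
```

With a client that asks for "size" and "type" against a server that
advertises only "size", the filter leaves one option, and the resulting
allow-list holds "objectname" and "objectsize".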
Move object_info_options out of get_remote_info() so that the caller,
which has access to data, can select which options are requested
instead of always requesting size.
Move batch_object_write() out so there will always be an output even if
none of the placeholders is supported by the server (an empty line is
printed).
Include "type" in the object_info_options so that, once servers support
it, clients already know how to request it.
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
builtin/cat-file.c | 97 +++++++++++++++++++++++++++++++++++------------------
fetch-object-info.c | 16 +++++++++
2 files changed, 80 insertions(+), 33 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7ad6165032..4c7b2781da 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -341,13 +341,11 @@ struct expand_data {
* Flags about when an object info is being fetched from remote.
*/
unsigned is_remote:1;
-};
-#define EXPAND_DATA_INIT { .mode = S_IFINVALID, .type = OBJ_BAD }
-static const char *remote_object_info_atoms[] = {
- "objectname",
- "objectsize",
+ struct string_list remote_allowed_atoms;
};
+#define EXPAND_DATA_INIT { .mode = S_IFINVALID, .type = OBJ_BAD, \
+ .remote_allowed_atoms = STRING_LIST_INIT_NODUP }
static int is_atom(const char *atom, const char *s, int slen)
{
@@ -359,17 +357,11 @@ static int expand_atom(struct strbuf *sb, const char *atom, int len,
struct expand_data *data)
{
if (data->is_remote) {
- size_t i, allowed_nr = ARRAY_SIZE(remote_object_info_atoms);
- for (i = 0; i < allowed_nr; i++)
- if (is_atom(remote_object_info_atoms[i], atom, len))
+ size_t i;
+ for (i = 0; i < data->remote_allowed_atoms.nr; i++)
+ if (is_atom(data->remote_allowed_atoms.items[i].string, atom, len))
break;
-
- /*
- * On remote, skip unsupported atoms returning an empty sb,
- * honoring how for-each-ref handles known but inapplicable
- * atoms (e.g. %(tagger)).
- */
- if (i == allowed_nr)
+ if (i == data->remote_allowed_atoms.nr)
return 1;
}
@@ -685,12 +677,12 @@ static int get_remote_info(struct batch_options *opt,
int argc,
const char **argv,
struct object_info **remote_object_info,
- struct oid_array *object_info_oids)
+ struct oid_array *object_info_oids,
+ struct string_list *object_info_options)
{
int retval = 0;
struct remote *remote = NULL;
struct object_id oid;
- struct string_list object_info_options = STRING_LIST_INIT_NODUP;
static struct transport *gtransport;
/*
@@ -739,15 +731,12 @@ static int get_remote_info(struct batch_options *opt,
gtransport->smart_options->object_info = 1;
gtransport->smart_options->object_info_oids = object_info_oids;
- string_list_append(&object_info_options, "size");
-
- if (object_info_options.nr > 0) {
- gtransport->smart_options->object_info_options = &object_info_options;
+ if (object_info_options->nr > 0) {
+ gtransport->smart_options->object_info_options = object_info_options;
gtransport->smart_options->object_info_data = *remote_object_info;
retval = transport_fetch_refs(gtransport, NULL);
}
cleanup:
- string_list_clear(&object_info_options, 0);
transport_disconnect(gtransport);
return retval;
}
@@ -833,6 +822,21 @@ static void parse_cmd_mailmap(struct batch_options *opt UNUSED,
load_mailmap();
}
+struct protocol_placeholder_entry {
+ const char *option;
+ const char *atom;
+};
+
+static const struct protocol_placeholder_entry remote_atom_map[] = {
+ {"size", "objectsize"},
+ {"type", "objecttype"},
+ /*
+ * Add new protocol options here. Even if the server doesn't support
+ * them the allow_list will drop them if the server doesn't advertise
+ * them.
+ */
+};
+
static void parse_cmd_remote_object_info(struct batch_options *opt,
const char *line, struct strbuf *output,
struct expand_data *data)
@@ -842,6 +846,7 @@ static void parse_cmd_remote_object_info(struct batch_options *opt,
char *line_to_split;
static struct object_info *remote_object_info;
static struct oid_array object_info_oids = OID_ARRAY_INIT;
+ struct string_list object_info_options = STRING_LIST_INIT_NODUP;
if (strlen(line) >= MAX_REMOTE_OBJ_INFO_LINE)
die(_("remote-object-info command too long"));
@@ -854,32 +859,57 @@ static void parse_cmd_remote_object_info(struct batch_options *opt,
die(_("remote-object-info supports at most %d objects"),
MAX_ALLOWED_OBJ_LIMIT);
+ if (data->info.sizep)
+ string_list_append(&object_info_options, "size");
+ if (data->info.typep)
+ string_list_append(&object_info_options, "type");
+
if (get_remote_info(opt, count, argv, &remote_object_info,
- &object_info_oids))
+ &object_info_oids, &object_info_options))
goto cleanup;
+ string_list_clear(&data->remote_allowed_atoms, 0);
+ string_list_append(&data->remote_allowed_atoms, "objectname");
+ for (size_t i = 0; i < ARRAY_SIZE(remote_atom_map); i++)
+ if (unsorted_string_list_has_string(&object_info_options, remote_atom_map[i].option))
+ string_list_append(&data->remote_allowed_atoms,
+ remote_atom_map[i].atom);
+
data->skip_object_info = 1;
for (size_t i = 0; i < object_info_oids.nr; i++) {
+ int found = 0;
data->oid = object_info_oids.oid[i];
+ /*
+ * When reaching here, it means remote-object-info can retrieve
+ * information from server without downloading them.
+ */
if (remote_object_info[i].sizep) {
- /*
- * When reaching here, it means remote-object-info can retrieve
- * information from server without downloading them.
- */
data->size = *remote_object_info[i].sizep;
- opt->batch_mode = BATCH_MODE_INFO;
- data->is_remote = 1;
- batch_object_write(argv[i + 1], output, opt, data, NULL, 0);
- data->is_remote = 0;
- } else {
- report_object_status(opt, oid_to_hex(&data->oid), &data->oid, "missing");
+ found = 1;
}
+
+ if (remote_object_info[i].typep) {
+ data->type = *remote_object_info[i].typep;
+ found = 1;
+ }
+
+ if (!found && object_info_options.nr > 0) {
+ report_object_status(opt, oid_to_hex(&data->oid),
+ &data->oid, "missing");
+ continue;
+ }
+
+ opt->batch_mode = BATCH_MODE_INFO;
+ data->is_remote = 1;
+ batch_object_write(argv[i + 1], output, opt, data, NULL, 0);
+ data->is_remote = 0;
}
data->skip_object_info = 0;
cleanup:
for (size_t i = 0; i < object_info_oids.nr; i++)
free_object_info_contents(&remote_object_info[i]);
+ string_list_clear(&object_info_options, 0);
free(line_to_split);
free(argv);
free(remote_object_info);
@@ -1194,6 +1224,7 @@ static int batch_objects(struct batch_options *opt)
cleanup:
strbuf_release(&input);
strbuf_release(&output);
+ string_list_clear(&data.remote_allowed_atoms, 0);
cfg->warn_on_object_refname_ambiguity = save_warning;
return retval;
}
diff --git a/fetch-object-info.c b/fetch-object-info.c
index ae035c9598..de927ecd48 100644
--- a/fetch-object-info.c
+++ b/fetch-object-info.c
@@ -39,6 +39,22 @@ int fetch_object_info(const enum protocol_version version, struct object_info_ar
case protocol_v2:
if (!server_supports_v2("object-info"))
die(_("object-info capability is not enabled on the server"));
+ /*
+ * When removing an element from the list it gets swapped by the
+ * last element, iterate backwards to prevent elements skipping
+ * evaluation.
+ */
+ for (int i = (int)args->object_info_options->nr - 1; i >= 0; i--)
+ if (!server_supports_feature("object-info",
+ args->object_info_options->items[i].string, 0))
+ unsorted_string_list_delete_item(args->object_info_options, i, 0);
+ /*
+ * If no options are left after the filtering, avoid unnecessary
+ * request to the server.
+ */
+ if (!args->object_info_options->nr)
+ return 0;
+
send_object_info_request(fd_out, args);
break;
case protocol_v1:
--
2.54.0
* [PATCH GSoC RFC v13 11/12] cat-file: validate remote atoms with allow_list
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
strstr() is not enough to validate the format placeholders in
remote-object-info, causing two errors:
- For atoms that expand_atom() recognizes but the remote does not
support, expand_atom() returns 1, but data->type contains garbage,
causing a segfault.
- expand_atom() returns 0 for unknown atoms, which calls
strbuf_expand_bad_format() and ends in die(), blocking local queries
if the same format is shared.
Add an allow_list with the supported atoms at the top of expand_atom().
In remote mode, unsupported atoms return 1 leaving the sb empty,
honoring how for-each-ref handles known but inapplicable atoms.
As extra safety, initialize data->type to OBJ_BAD and add a NULL check
for type_name() so uninitialized data doesn't cause a segfault.
Update tests that expect previous die() behaviour to expect an empty
string and add an explicit test for empty string return on unknown
placeholder.
Update caveat behaviour documentation.
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
Documentation/git-cat-file.adoc | 5 +++--
builtin/cat-file.c | 41 +++++++++++++++++++++++++++-------
t/t1017-cat-file-remote-object-info.sh | 27 ++++++++++++++++++----
3 files changed, 59 insertions(+), 14 deletions(-)
diff --git a/Documentation/git-cat-file.adoc b/Documentation/git-cat-file.adoc
index aba20eb770..3b7a85b383 100644
--- a/Documentation/git-cat-file.adoc
+++ b/Documentation/git-cat-file.adoc
@@ -451,8 +451,9 @@ CAVEATS
-------
Note that since %(objecttype), %(objectsize:disk) and %(deltabase) are
-currently not supported by the `remote-object-info` command, we will raise
-an error and exit when they appear in the format string.
+currently not supported by the `remote-object-info` command, they will
+return an empty string for remote queries, matching how `for-each-ref`
+behaves for known but inapplicable placeholders.
Note that the sizes of objects on disk are reported accurately, but care
should be taken in drawing conclusions about which refs or objects are
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 402b2c31a1..7ad6165032 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -336,8 +336,18 @@ struct expand_data {
* optimized out.
*/
unsigned skip_object_info : 1;
+
+ /*
+ * Flags about when an object info is being fetched from remote.
+ */
+ unsigned is_remote:1;
+};
+#define EXPAND_DATA_INIT { .mode = S_IFINVALID, .type = OBJ_BAD }
+
+static const char *remote_object_info_atoms[] = {
+ "objectname",
+ "objectsize",
};
-#define EXPAND_DATA_INIT { .mode = S_IFINVALID }
static int is_atom(const char *atom, const char *s, int slen)
{
@@ -348,14 +358,31 @@ static int is_atom(const char *atom, const char *s, int slen)
static int expand_atom(struct strbuf *sb, const char *atom, int len,
struct expand_data *data)
{
+ if (data->is_remote) {
+ size_t i, allowed_nr = ARRAY_SIZE(remote_object_info_atoms);
+ for (i = 0; i < allowed_nr; i++)
+ if (is_atom(remote_object_info_atoms[i], atom, len))
+ break;
+
+ /*
+ * On remote, skip unsupported atoms returning an empty sb,
+ * honoring how for-each-ref handles known but inapplicable
+ * atoms (e.g. %(tagger)).
+ */
+ if (i == allowed_nr)
+ return 1;
+ }
+
if (is_atom("objectname", atom, len)) {
if (!data->mark_query)
strbuf_add_oid_hex(sb, &data->oid);
} else if (is_atom("objecttype", atom, len)) {
- if (data->mark_query)
+ if (data->mark_query) {
data->info.typep = &data->type;
- else
- strbuf_addstr(sb, type_name(data->type));
+ } else {
+ const char *t = type_name(data->type);
+ strbuf_addstr(sb, t ? t : "");
+ }
} else if (is_atom("objectsize", atom, len)) {
if (data->mark_query)
data->info.sizep = &data->size;
@@ -712,10 +739,6 @@ static int get_remote_info(struct batch_options *opt,
gtransport->smart_options->object_info = 1;
gtransport->smart_options->object_info_oids = object_info_oids;
- /* 'objectsize' is the only option currently supported */
- if (!strstr(opt->format, "%(objectsize)"))
- die(_("%s is currently not supported with remote-object-info"), opt->format);
-
string_list_append(&object_info_options, "size");
if (object_info_options.nr > 0) {
@@ -845,7 +868,9 @@ static void parse_cmd_remote_object_info(struct batch_options *opt,
*/
data->size = *remote_object_info[i].sizep;
opt->batch_mode = BATCH_MODE_INFO;
+ data->is_remote = 1;
batch_object_write(argv[i + 1], output, opt, data, NULL, 0);
+ data->is_remote = 0;
} else {
report_object_status(opt, oid_to_hex(&data->oid), &data->oid, "missing");
}
diff --git a/t/t1017-cat-file-remote-object-info.sh b/t/t1017-cat-file-remote-object-info.sh
index b744e81701..9d8f114b72 100755
--- a/t/t1017-cat-file-remote-object-info.sh
+++ b/t/t1017-cat-file-remote-object-info.sh
@@ -236,6 +236,21 @@ test_expect_success 'remote-object-info does not die on missing oid like info' '
)
'
+# This test depends on %(objecttype) not being supported yet; once it is
+# supported, the test needs to be updated.
+test_expect_success 'unsupported placeholder on remote returns empty string' '
+ (
+ set_transport_variables "$daemon_parent" &&
+ cd "$daemon_parent/daemon_client_empty" &&
+
+ echo "" >expect &&
+ git cat-file --batch-command="%(objecttype)" >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
# Test --batch-command remote-object-info with 'git://' and
# transfer.advertiseobjectinfo set to false, i.e. server does not have object-info capability
test_expect_success 'batch-command remote-object-info git:// fails when transfer.advertiseobjectinfo=false' '
@@ -575,10 +590,12 @@ test_expect_success 'remote-object-info fails on unsupported filter option (obje
set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
- test_must_fail git cat-file --batch-command="%(objectsize:disk)" 2>err <<-EOF &&
+ echo "$hello_oid " >expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize:disk)" >actual <<-EOF &&
remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid
EOF
- test_grep "%(objectsize:disk) is currently not supported with remote-object-info" err
+ test_cmp expect actual
)
'
@@ -587,10 +604,12 @@ test_expect_success 'remote-object-info fails on unsupported filter option (delt
set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
- test_must_fail git cat-file --batch-command="%(deltabase)" 2>err <<-EOF &&
+ echo "" >expect &&
+
+ git cat-file --batch-command="%(deltabase)" >actual <<-EOF &&
remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid
EOF
- test_grep "%(deltabase) is currently not supported with remote-object-info" err
+ test_cmp expect actual
)
'
--
2.54.0
* [PATCH GSoC RFC v13 10/12] cat-file: add remote-object-info to batch-command
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Jonathan Tan, Calvin Wan, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Eric Ju <eric.peijian@gmail.com>
Since the `info` command in `cat-file --batch-command` prints object
info for a given object, it is natural to add another command in
`cat-file --batch-command` to print object info for a given object
from a remote.
Add `remote-object-info` to `cat-file --batch-command`.
`info` takes object ids one at a time, which would create overhead if
each lookup required its own request to a server. `remote-object-info`
can therefore take multiple object ids at once.
The `cat-file --batch-command` command is generally implemented in
the following manner:
- Receive and parse input from user
- Call respective function attached to command
- Get object info, print object info
In --buffer mode, this changes to:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue
- Call respective function attached to command
- Get object info, print object info
Notice how the getting and printing of object info is accomplished one
at a time. As described above, this creates a problem for making
requests to a server. Therefore, `remote-object-info` is implemented in
the following manner:
- Receive and parse input from user
If command is `remote-object-info`:
- Get object info from remote
- Loop through and print each object info
Else:
- Call respective function attached to command
- Parse input, get object info, print object info
And finally for --buffer mode `remote-object-info`:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue:
If command is `remote-object-info`:
- Get object info from remote
- Loop through and print each object info
Else:
- Call respective function attached to command
- Get object info, print object info
To summarize, `remote-object-info` first gets the object info from the
remote and then loops through the returned entries, printing each one.
In order for `remote-object-info` to avoid remote communication
overhead in non-buffer mode, the objects are passed in as follows:
remote-object-info <remote> <oid> <oid> ... <oid>
rather than
remote-object-info <remote> <oid>
remote-object-info <remote> <oid>
...
remote-object-info <remote> <oid>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
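As an illustration of the batching described in the commit message, a
caller holding several object ids could assemble one `remote-object-info`
line instead of one line per object. This is only a sketch: the helper
name `build_remote_object_info_line` is hypothetical and not part of this
series, it merely formats the input line, and it assumes that the remote
and the object ids contain no whitespace.

```shell
# Hypothetical helper (not part of this patch): print a single
# "remote-object-info <remote> <oid>..." line for all given object ids.
# Assumes the remote and the object ids contain no whitespace.
build_remote_object_info_line () {
	remote=$1
	shift
	printf 'remote-object-info %s' "$remote"
	for oid in "$@"
	do
		printf ' %s' "$oid"
	done
	printf '\n'
}
```

The resulting line could then be fed to `git cat-file --batch-command` on
standard input. Per this patch, the object ids must be full-length (short
oids are rejected) and a single command accepts at most
MAX_ALLOWED_OBJ_LIMIT (10000) objects.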
Documentation/git-cat-file.adoc | 24 +-
builtin/cat-file.c | 144 ++++++-
object-file.c | 10 +
odb.h | 3 +
t/meson.build | 1 +
t/t1017-cat-file-remote-object-info.sh | 680 +++++++++++++++++++++++++++++++++
transport.c | 4 +-
7 files changed, 859 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-cat-file.adoc b/Documentation/git-cat-file.adoc
index 86b9181599..aba20eb770 100644
--- a/Documentation/git-cat-file.adoc
+++ b/Documentation/git-cat-file.adoc
@@ -169,6 +169,13 @@ info <object>::
Print object info for object reference `<object>`. This corresponds to the
output of `--batch-check`.
+remote-object-info <remote> <object>...::
+ Print object info for object references `<object>` at specified
+ `<remote>` without downloading objects from the remote.
+ Raise an error when the `object-info` capability is not supported by the remote.
+ Raise an error when no object references are provided.
+ This command may be combined with `--buffer`.
+
flush::
Used with `--buffer` to execute all preceding commands that were issued
since the beginning or since the last flush was issued. When `--buffer`
@@ -312,7 +319,8 @@ newline. The available atoms are:
The full hex representation of the object name.
`objecttype`::
- The type of the object (the same as `cat-file -t` reports).
+ The type of the object (the same as `cat-file -t` reports). See
+ `CAVEATS` below. Not supported by `remote-object-info`.
`objectmode`::
If the specified object has mode information (such as a tree or
@@ -325,13 +333,14 @@ newline. The available atoms are:
`objectsize:disk`::
The size, in bytes, that the object takes up on disk. See the
- note about on-disk sizes in the `CAVEATS` section below.
+ note about on-disk sizes in the `CAVEATS` section below. Not
+ supported by `remote-object-info`.
`deltabase`::
If the object is stored as a delta on-disk, this expands to the
full hex representation of the delta base object name.
Otherwise, expands to the null OID (all zeroes). See `CAVEATS`
- below.
+ below. Not supported by `remote-object-info`.
`rest`::
If this atom is used in the output string, input lines are split
@@ -341,7 +350,10 @@ newline. The available atoms are:
line) are output in place of the `%(rest)` atom.
If no format is specified, the default format is `%(objectname)
-%(objecttype) %(objectsize)`.
+%(objecttype) %(objectsize)`, except for `remote-object-info` commands, which use
+`%(objectname) %(objectsize)` for now because `%(objecttype)` is not supported yet.
+WARNING: Once `%(objecttype)` is supported, the default format will be unified, so
+do not rely on the current `remote-object-info` default format staying the same.
If `--batch` is specified, or if `--batch-command` is used with the `contents`
command, the object information is followed by the object contents (consisting
@@ -438,6 +450,10 @@ scripting purposes.
CAVEATS
-------
+Note that since %(objecttype), %(objectsize:disk) and %(deltabase) are
+currently not supported by the `remote-object-info` command, we will raise
+an error and exit when they appear in the format string.
+
Note that the sizes of objects on disk are reported accurately, but care
should be taken in drawing conclusions about which refs or objects are
responsible for disk usage. The size of a packed non-delta object may be
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index fab55c11de..402b2c31a1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -29,6 +29,22 @@
#include "promisor-remote.h"
#include "mailmap.h"
#include "write-or-die.h"
+#include "alias.h"
+#include "remote.h"
+#include "transport.h"
+
+/*
+ * Maximum length for a remote URL. While no universal standard exists,
+ * 8K is assumed to be a reasonable limit.
+ */
+#define MAX_REMOTE_URL_LEN (8 * 1024)
+
+/* Maximum number of objects allowed in a single remote-object-info request. */
+#define MAX_ALLOWED_OBJ_LIMIT 10000
+
+/* Maximum input size permitted for the remote-object-info command. */
+#define MAX_REMOTE_OBJ_INFO_LINE \
+ (MAX_REMOTE_URL_LEN + MAX_ALLOWED_OBJ_LIMIT * (GIT_MAX_HEXSZ + 1))
enum batch_mode {
BATCH_MODE_CONTENTS,
@@ -638,6 +654,81 @@ static void batch_one_object(const char *obj_name,
object_context_release(&ctx);
}
+static int get_remote_info(struct batch_options *opt,
+ int argc,
+ const char **argv,
+ struct object_info **remote_object_info,
+ struct oid_array *object_info_oids)
+{
+ int retval = 0;
+ struct remote *remote = NULL;
+ struct object_id oid;
+ struct string_list object_info_options = STRING_LIST_INIT_NODUP;
+ static struct transport *gtransport;
+
+ /*
+ * Change the format to "%(objectname) %(objectsize)" when
+ * remote-object-info command is used. Once we start supporting objecttype
+ * the default format should change to DEFAULT_FORMAT.
+ */
+ if (!opt->format)
+ opt->format = "%(objectname) %(objectsize)";
+
+ remote = remote_get(argv[0]);
+ if (!remote)
+ die(_("must supply valid remote when using remote-object-info"));
+
+ oid_array_clear(object_info_oids);
+ for (size_t i = 1; i < argc; i++) {
+ if (get_oid_hex(argv[i], &oid)) {
+ size_t len = strlen(argv[i]);
+
+ if (len < the_hash_algo->hexsz && len >= 4) {
+ size_t j;
+ for (j = 0; j < len; j++)
+ if (!isxdigit(argv[i][j]))
+ break;
+ if (j == len)
+ die(_("remote-object-info does not support "
+ "short oids, %d characters required"),
+ (int)the_hash_algo->hexsz);
+ }
+ die(_("not a valid object name '%s'"), argv[i]);
+ }
+ oid_array_append(object_info_oids, &oid);
+ }
+
+ if (!object_info_oids->nr)
+ die(_("remote-object-info requires objects"));
+
+ gtransport = transport_get(remote, NULL);
+
+ if (!gtransport->smart_options) {
+ retval = -1;
+ goto cleanup;
+ }
+
+ CALLOC_ARRAY(*remote_object_info, object_info_oids->nr);
+ gtransport->smart_options->object_info = 1;
+ gtransport->smart_options->object_info_oids = object_info_oids;
+
+ /* 'objectsize' is the only option currently supported */
+ if (!strstr(opt->format, "%(objectsize)"))
+ die(_("%s is currently not supported with remote-object-info"), opt->format);
+
+ string_list_append(&object_info_options, "size");
+
+ if (object_info_options.nr > 0) {
+ gtransport->smart_options->object_info_options = &object_info_options;
+ gtransport->smart_options->object_info_data = *remote_object_info;
+ retval = transport_fetch_refs(gtransport, NULL);
+ }
+cleanup:
+ string_list_clear(&object_info_options, 0);
+ transport_disconnect(gtransport);
+ return retval;
+}
+
struct object_cb_data {
struct batch_options *opt;
struct expand_data *expand;
@@ -719,6 +810,56 @@ static void parse_cmd_mailmap(struct batch_options *opt UNUSED,
load_mailmap();
}
+static void parse_cmd_remote_object_info(struct batch_options *opt,
+ const char *line, struct strbuf *output,
+ struct expand_data *data)
+{
+ int count;
+ const char **argv;
+ char *line_to_split;
+ static struct object_info *remote_object_info;
+ static struct oid_array object_info_oids = OID_ARRAY_INIT;
+
+ if (strlen(line) >= MAX_REMOTE_OBJ_INFO_LINE)
+ die(_("remote-object-info command too long"));
+
+ line_to_split = xstrdup(line);
+ count = split_cmdline(line_to_split, &argv);
+ if (count < 0)
+ die(_("split remote-object-info command"));
+ if (count - 1 > MAX_ALLOWED_OBJ_LIMIT)
+ die(_("remote-object-info supports at most %d objects"),
+ MAX_ALLOWED_OBJ_LIMIT);
+
+ if (get_remote_info(opt, count, argv, &remote_object_info,
+ &object_info_oids))
+ goto cleanup;
+
+ data->skip_object_info = 1;
+ for (size_t i = 0; i < object_info_oids.nr; i++) {
+ data->oid = object_info_oids.oid[i];
+ if (remote_object_info[i].sizep) {
+ /*
+ * Reaching here means remote-object-info retrieved the object's
+ * information from the server without downloading the object.
+ */
+ data->size = *remote_object_info[i].sizep;
+ opt->batch_mode = BATCH_MODE_INFO;
+ batch_object_write(argv[i + 1], output, opt, data, NULL, 0);
+ } else {
+ report_object_status(opt, oid_to_hex(&data->oid), &data->oid, "missing");
+ }
+ }
+ data->skip_object_info = 0;
+
+cleanup:
+ for (size_t i = 0; i < object_info_oids.nr; i++)
+ free_object_info_contents(&remote_object_info[i]);
+ free(line_to_split);
+ free(argv);
+ free(remote_object_info);
+}
+
static void dispatch_calls(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data,
@@ -750,8 +891,9 @@ static const struct parse_cmd {
} commands[] = {
{ "contents", parse_cmd_contents, 1 },
{ "info", parse_cmd_info, 1 },
- { "flush", NULL, 0 },
{ "mailmap", parse_cmd_mailmap, 1 },
+ { "remote-object-info", parse_cmd_remote_object_info, 1 },
+ { "flush", NULL, 0 },
};
static void batch_objects_command(struct batch_options *opt,
diff --git a/object-file.c b/object-file.c
index 9afa842da2..ef31a47939 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1694,3 +1694,13 @@ struct odb_transaction *odb_transaction_files_begin(struct odb_source *source)
return &transaction->base;
}
+
+void free_object_info_contents(struct object_info *object_info)
+{
+ if (!object_info)
+ return;
+ free(object_info->typep);
+ free(object_info->sizep);
+ free(object_info->disk_sizep);
+ free(object_info->delta_base_oid);
+}
diff --git a/odb.h b/odb.h
index 0030467a52..168ea12da7 100644
--- a/odb.h
+++ b/odb.h
@@ -573,4 +573,7 @@ void parse_alternates(const char *string,
const char *relative_base,
struct strvec *out);
+/* Free pointers inside of object_info, but not object_info itself */
+void free_object_info_contents(struct object_info *object_info);
+
#endif /* ODB_H */
diff --git a/t/meson.build b/t/meson.build
index c5832fee05..33327dd1df 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -170,6 +170,7 @@ integration_tests = [
't1014-read-tree-confusing.sh',
't1015-read-index-unmerged.sh',
't1016-compatObjectFormat.sh',
+ 't1017-cat-file-remote-object-info.sh',
't1020-subdirectory.sh',
't1022-read-tree-partial-clone.sh',
't1050-large.sh',
diff --git a/t/t1017-cat-file-remote-object-info.sh b/t/t1017-cat-file-remote-object-info.sh
new file mode 100755
index 0000000000..b744e81701
--- /dev/null
+++ b/t/t1017-cat-file-remote-object-info.sh
@@ -0,0 +1,680 @@
+#!/bin/sh
+
+test_description='git cat-file --batch-command with remote-object-info command'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-cat-file.sh
+
+hello_content="Hello World"
+hello_size=$(strlen "$hello_content")
+hello_oid=$(echo_without_newline "$hello_content" | git hash-object --stdin)
+hello_short_oid=$(git rev-parse --short "$hello_oid")
+
+unstored_content="Hello Git"
+unstored_oid=$(echo_without_newline "$unstored_content" | git hash-object --stdin)
+
+# This is how we get 13:
+# 13 = <file mode> + <a_space> + <file name> + <a_null>, where
+# file mode is 100644, which is 6 characters;
+# file name is hello, which is 5 characters
+# a space is 1 character and a null is 1 character
+tree_size=$(($(test_oid rawsz) + 13))
+
+commit_message="Initial commit"
+
+# This is how we get 137:
+# 137 = <tree header> + <a_space> + <a newline> +
+# <Author line> + <a newline> +
+# <Committer line> + <a newline> +
+# <a newline> +
+# <commit message length>
+# An easier way to calculate is: 1. use `git cat-file commit <commit hash> | wc -c`,
+# to get 177, 2. then deduct 40 hex characters to get 137
+commit_size=$(($(test_oid hexsz) + 137))
+
+tag_header_without_oid="type blob
+tag hellotag
+tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+tag_header_without_timestamp="object $hello_oid
+$tag_header_without_oid"
+tag_description="This is a tag"
+tag_content="$tag_header_without_timestamp 0 +0000
+
+$tag_description"
+
+tag_oid=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
+tag_size=$(strlen "$tag_content")
+
+set_transport_variables () {
+ hello_oid=$(echo_without_newline "$hello_content" | git hash-object --stdin)
+ tree_oid=$(git -C "$1" write-tree)
+ commit_oid=$(echo_without_newline "$commit_message" | git -C "$1" commit-tree $tree_oid)
+ tag_oid=$(echo_without_newline "$tag_content" | git -C "$1" hash-object -t tag --stdin -w)
+ tag_size=$(strlen "$tag_content")
+}
+
+# This section tests --batch-command with remote-object-info command
+# Since "%(objecttype)" is currently not supported by the command remote-object-info,
+# the filters are set to "%(objectname) %(objectsize)" in some test cases.
+
+# Test --batch-command remote-object-info with 'git://' transport with
+# transfer.advertiseobjectinfo set to true, i.e. server has object-info capability
+. "$TEST_DIRECTORY"/lib-git-daemon.sh
+start_git_daemon --export-all --enable=receive-pack
+daemon_parent=$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent
+
+test_expect_success 'create repo to be served by git-daemon' '
+ git init "$daemon_parent" &&
+ echo_without_newline "$hello_content" > $daemon_parent/hello &&
+ git -C "$daemon_parent" update-index --add hello &&
+ git -C "$daemon_parent" config transfer.advertiseobjectinfo true &&
+ git clone "$GIT_DAEMON_URL/parent" -n "$daemon_parent/daemon_client_empty"
+'
+
+test_expect_success 'batch-command remote-object-info git://' '
+ (
+ set_transport_variables "$daemon_parent" &&
+ cd "$daemon_parent/daemon_client_empty" &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_oid
+ remote-object-info "$GIT_DAEMON_URL/parent" $tree_oid
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_oid
+ remote-object-info "$GIT_DAEMON_URL/parent" $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info git:// multiple sha1 per line' '
+ (
+ set_transport_variables "$daemon_parent" &&
+ cd "$daemon_parent/daemon_client_empty" &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_oid $tree_oid $commit_oid $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info git:// default filter' '
+ (
+ set_transport_variables "$daemon_parent" &&
+ cd "$daemon_parent/daemon_client_empty" &&
+
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+ GIT_TRACE_PACKET=1 git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_oid $tree_oid
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_oid $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info git://' '
+ (
+ set_transport_variables "$daemon_parent" &&
+ cd "$daemon_parent/daemon_client_empty" &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "$GIT_DAEMON_URL/parent" $hello_oid $tree_oid
+ remote-object-info "$GIT_DAEMON_URL/parent" $commit_oid $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command -Z remote-object-info git:// default filter' '
+ (
+ set_transport_variables "$daemon_parent" &&
+ cd "$daemon_parent/daemon_client_empty" &&
+
+ printf "%s\0" "$hello_oid $hello_size" >expect &&
+ printf "%s\0" "$tree_oid $tree_size" >>expect &&
+ printf "%s\0" "$commit_oid $commit_size" >>expect &&
+ printf "%s\0" "$tag_oid $tag_size" >>expect &&
+
+ printf "%s\0" "$hello_oid missing" >>expect &&
+ printf "%s\0" "$tree_oid missing" >>expect &&
+ printf "%s\0" "$commit_oid missing" >>expect &&
+ printf "%s\0" "$tag_oid missing" >>expect &&
+
+ batch_input="remote-object-info $GIT_DAEMON_URL/parent $hello_oid $tree_oid
+remote-object-info $GIT_DAEMON_URL/parent $commit_oid $tag_oid
+info $hello_oid
+info $tree_oid
+info $commit_oid
+info $tag_oid
+" &&
+ echo_without_newline_nul "$batch_input" >commands_null_delimited &&
+
+ git cat-file --batch-command -Z < commands_null_delimited >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'remote-object-info does not support short oids' '
+ (
+ set_transport_variables "$daemon_parent" &&
+ cd "$daemon_parent/daemon_client_empty" &&
+
+ test_must_fail git cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info $GIT_DAEMON_URL/parent $hello_short_oid
+ EOF
+ test_grep "does not support short oids" err
+ )
+'
+
+test_expect_success 'remote-object-info does not die on missing oid like info' '
+ (
+ set_transport_variables "$daemon_parent" &&
+ cd "$daemon_parent/daemon_client_empty" &&
+
+ git cat-file --batch-command >local <<-EOF &&
+ info $unstored_oid
+ EOF
+ git cat-file --batch-command >remote <<-EOF &&
+ remote-object-info $GIT_DAEMON_URL/parent $unstored_oid
+ EOF
+ test_cmp local remote
+ )
+'
+
+# Test --batch-command remote-object-info with 'git://' and
+# transfer.advertiseobjectinfo set to false, i.e. server does not have object-info capability
+test_expect_success 'batch-command remote-object-info git:// fails when transfer.advertiseobjectinfo=false' '
+ (
+ git -C "$daemon_parent" config transfer.advertiseobjectinfo false &&
+ set_transport_variables "$daemon_parent" &&
+
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info $GIT_DAEMON_URL/parent $hello_oid $tree_oid $commit_oid $tag_oid
+ EOF
+ test_grep "object-info capability is not enabled on the server" err &&
+
+ # revert server state back
+ git -C "$daemon_parent" config transfer.advertiseobjectinfo true
+
+ )
+'
+
+stop_git_daemon
+
+# Test --batch-command remote-object-info with 'file://' transport with
+# transfer.advertiseobjectinfo set to true, i.e. server has object-info capability
+# shellcheck disable=SC2016
+test_expect_success 'create repo to be served by file:// transport' '
+ git init server &&
+ git -C server config protocol.version 2 &&
+ git -C server config transfer.advertiseobjectinfo true &&
+ echo_without_newline "$hello_content" > server/hello &&
+ git -C server update-index --add hello &&
+ git clone -n "file://$(pwd)/server" file_client_empty
+'
+
+test_expect_success 'batch-command remote-object-info file://' '
+ (
+ set_transport_variables "server" &&
+ server_path="$(pwd)/server" &&
+ cd file_client_empty &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "file://${server_path}" $hello_oid
+ remote-object-info "file://${server_path}" $tree_oid
+ remote-object-info "file://${server_path}" $commit_oid
+ remote-object-info "file://${server_path}" $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info file:// multiple sha1 per line' '
+ (
+ set_transport_variables "server" &&
+ server_path="$(pwd)/server" &&
+ cd file_client_empty &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "file://${server_path}" $hello_oid $tree_oid $commit_oid $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info file://' '
+ (
+ set_transport_variables "server" &&
+ server_path="$(pwd)/server" &&
+ cd file_client_empty &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "file://${server_path}" $hello_oid $tree_oid
+ remote-object-info "file://${server_path}" $commit_oid $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info file:// default filter' '
+ (
+ set_transport_variables "server" &&
+ server_path="$(pwd)/server" &&
+ cd file_client_empty &&
+
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "file://${server_path}" $hello_oid $tree_oid
+ remote-object-info "file://${server_path}" $commit_oid $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command -Z remote-object-info file:// default filter' '
+ (
+ set_transport_variables "server" &&
+ server_path="$(pwd)/server" &&
+ cd file_client_empty &&
+
+ printf "%s\0" "$hello_oid $hello_size" >expect &&
+ printf "%s\0" "$tree_oid $tree_size" >>expect &&
+ printf "%s\0" "$commit_oid $commit_size" >>expect &&
+ printf "%s\0" "$tag_oid $tag_size" >>expect &&
+
+ printf "%s\0" "$hello_oid missing" >>expect &&
+ printf "%s\0" "$tree_oid missing" >>expect &&
+ printf "%s\0" "$commit_oid missing" >>expect &&
+ printf "%s\0" "$tag_oid missing" >>expect &&
+
+ batch_input="remote-object-info \"file://${server_path}\" $hello_oid $tree_oid
+remote-object-info \"file://${server_path}\" $commit_oid $tag_oid
+info $hello_oid
+info $tree_oid
+info $commit_oid
+info $tag_oid
+" &&
+ echo_without_newline_nul "$batch_input" >commands_null_delimited &&
+
+ git cat-file --batch-command -Z < commands_null_delimited >actual &&
+ test_cmp expect actual
+ )
+'
+
+# Test --batch-command remote-object-info with 'file://' and
+# transfer.advertiseobjectinfo set to false, i.e. server does not have object-info capability
+test_expect_success 'batch-command remote-object-info file:// fails when transfer.advertiseobjectinfo=false' '
+ (
+ set_transport_variables "server" &&
+ server_path="$(pwd)/server" &&
+ git -C "${server_path}" config transfer.advertiseobjectinfo false &&
+
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "file://${server_path}" $hello_oid $tree_oid $commit_oid $tag_oid
+ EOF
+ test_grep "object-info capability is not enabled on the server" err &&
+
+ # revert server state back
+ git -C "${server_path}" config transfer.advertiseobjectinfo true
+ )
+'
+
+# Test --batch-command remote-object-info with 'http://' transport with
+# transfer.advertiseobjectinfo set to true, i.e. server has object-info capability
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'create repo to be served by http:// transport' '
+ git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack true &&
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config transfer.advertiseobjectinfo true &&
+ echo_without_newline "$hello_content" > $HTTPD_DOCUMENT_ROOT_PATH/http_parent/hello &&
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" update-index --add hello &&
+ git clone "$HTTPD_URL/smart/http_parent" -n "$HTTPD_DOCUMENT_ROOT_PATH/http_client_empty"
+'
+
+test_expect_success 'batch-command remote-object-info http://' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_client_empty" &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid
+ remote-object-info "$HTTPD_URL/smart/http_parent" $tree_oid
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_oid
+ remote-object-info "$HTTPD_URL/smart/http_parent" $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info http:// one line' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_client_empty" &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid $tree_oid $commit_oid $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command --buffer remote-object-info http://' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_client_empty" &&
+
+ # These results prove remote-object-info can get object info from the remote
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ # These results prove remote-object-info did not download objects from the remote
+ echo "$hello_oid missing" >>expect &&
+ echo "$tree_oid missing" >>expect &&
+ echo "$commit_oid missing" >>expect &&
+ echo "$tag_oid missing" >>expect &&
+
+ git cat-file --batch-command="%(objectname) %(objectsize)" --buffer >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid $tree_oid
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_oid $tag_oid
+ info $hello_oid
+ info $tree_oid
+ info $commit_oid
+ info $tag_oid
+ flush
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command remote-object-info http:// default filter' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_client_empty" &&
+
+ echo "$hello_oid $hello_size" >expect &&
+ echo "$tree_oid $tree_size" >>expect &&
+ echo "$commit_oid $commit_size" >>expect &&
+ echo "$tag_oid $tag_size" >>expect &&
+
+ git cat-file --batch-command >actual <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid $tree_oid
+ remote-object-info "$HTTPD_URL/smart/http_parent" $commit_oid $tag_oid
+ EOF
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'batch-command -Z remote-object-info http:// default filter' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_client_empty" &&
+
+ printf "%s\0" "$hello_oid $hello_size" >expect &&
+ printf "%s\0" "$tree_oid $tree_size" >>expect &&
+ printf "%s\0" "$commit_oid $commit_size" >>expect &&
+ printf "%s\0" "$tag_oid $tag_size" >>expect &&
+
+ batch_input="remote-object-info $HTTPD_URL/smart/http_parent $hello_oid $tree_oid
+remote-object-info $HTTPD_URL/smart/http_parent $commit_oid $tag_oid
+" &&
+ echo_without_newline_nul "$batch_input" >commands_null_delimited &&
+
+ git cat-file --batch-command -Z < commands_null_delimited >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'remote-object-info fails on unsupported filter option (objectsize:disk)' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git cat-file --batch-command="%(objectsize:disk)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid
+ EOF
+ test_grep "%(objectsize:disk) is currently not supported with remote-object-info" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on unsupported filter option (deltabase)' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git cat-file --batch-command="%(deltabase)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid
+ EOF
+ test_grep "%(deltabase) is currently not supported with remote-object-info" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on server with legacy protocol' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git -c protocol.version=0 cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid
+ EOF
+ test_grep "remote-object-info requires protocol v2" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on server with legacy protocol with default filter' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git -c protocol.version=0 cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid
+ EOF
+ test_grep "remote-object-info requires protocol v2" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on malformed OID' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ malformed_object_id="this_id_is_not_valid" &&
+
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $malformed_object_id
+ EOF
+ test_grep "not a valid object name '$malformed_object_id'" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on malformed OID with default filter' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ malformed_object_id="this_id_is_not_valid" &&
+
+ test_must_fail git cat-file --batch-command 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $malformed_object_id
+ EOF
+ test_grep "not a valid object name '$malformed_object_id'" err
+ )
+'
+
+test_expect_success 'remote-object-info fails on not providing OID' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent"
+ EOF
+ test_grep "remote-object-info requires objects" err
+ )
+'
+
+
+# Test --batch-command remote-object-info with 'http://' transport and
+# transfer.advertiseobjectinfo set to false, i.e. server does not have object-info capability
+test_expect_success 'batch-command remote-object-info http:// fails when transfer.advertiseobjectinfo=false ' '
+ (
+ set_transport_variables "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config transfer.advertiseobjectinfo false &&
+
+ test_must_fail git cat-file --batch-command="%(objectname) %(objectsize)" 2>err <<-EOF &&
+ remote-object-info "$HTTPD_URL/smart/http_parent" $hello_oid $tree_oid $commit_oid $tag_oid
+ EOF
+ test_grep "object-info capability is not enabled on the server" err &&
+
+ # revert server state back
+ git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config transfer.advertiseobjectinfo true
+ )
+'
+
+# DO NOT add non-httpd-specific tests here, because the last part of this
+# test script is only executed when httpd is available and enabled.
+
+test_done
diff --git a/transport.c b/transport.c
index 7d3246e12b..81faf8e748 100644
--- a/transport.c
+++ b/transport.c
@@ -470,8 +470,8 @@ static int fetch_refs_via_pack(struct transport *transport,
args.reject_shallow_remote = transport->smart_options->reject_shallow;
args.object_info = transport->smart_options->object_info;
- if (transport->smart_options->object_info
- && transport->smart_options->object_info_oids->nr > 0) {
+ if (transport->smart_options->object_info &&
+ transport->smart_options->object_info_oids->nr > 0) {
struct packet_reader reader;
struct object_info_args obj_info_args = { 0 };
--
2.54.0
* [PATCH GSoC RFC v13 09/12] transport: add client support for object-info
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Calvin Wan, Jonathan Tan, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Calvin Wan <calvinwan@google.com>
Sometimes, it is beneficial to retrieve information about an object
without downloading it entirely. The server-side logic for this
functionality was implemented in commit a2ba162cda (object-info:
support for retrieving object info, 2021-04-20), and the wire
format is documented at
https://git-scm.com/docs/protocol-v2#_object_info.
This commit introduces client functions to interact with the server.
Currently, the client supports requesting a list of object IDs with
the 'size' feature from a v2 server. If the server does not advertise
this feature (i.e., transfer.advertiseobjectinfo is set to false),
the client will return an error and exit.
Notice that the entire request is written into req_buf before being
sent to the remote. This approach follows the pattern used in the
`send_fetch_request()` logic within fetch-pack.c.
Streaming the request is not addressed in this patch.
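As a rough illustration of the "build the whole request, then send it"
approach described above, the following standalone sketch assembles an
object-info request in pkt-line framing. It is not the code from this
patch: the helpers pkt_append(), pkt_append_special() and
build_object_info_request() are hypothetical stand-ins for
packet_buf_write(), packet_buf_delim()/packet_buf_flush() and
send_object_info_request(), and the capability lines (agent,
object-format, server options) are left out for brevity.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one pkt-line to buf at offset off: four hex digits giving the
 * total length (payload plus the 4-byte header), then the payload.
 * Returns the new offset; a result >= cap means the buffer was too small.
 * Hypothetical helper, not Git's packet_buf_write().
 */
static size_t pkt_append(char *buf, size_t cap, size_t off, const char *payload)
{
	int n;

	if (off >= cap)
		return off;
	n = snprintf(buf + off, cap - off, "%04zx%s", strlen(payload) + 4, payload);
	return off + (size_t)n;
}

/* Append a special packet: "0001" (delimiter) or "0000" (flush). */
static size_t pkt_append_special(char *buf, size_t cap, size_t off, const char *marker)
{
	int n;

	if (off >= cap)
		return off;
	n = snprintf(buf + off, cap - off, "%s", marker);
	return off + (size_t)n;
}

/*
 * Build the complete request in buf before anything is written to the
 * remote: command, delimiter, the "size" attribute, one "oid" line per
 * object, and a final flush packet.
 */
static size_t build_object_info_request(char *buf, size_t cap,
					 const char **oids, size_t nr)
{
	char line[128];
	size_t off = 0;

	off = pkt_append(buf, cap, off, "command=object-info");
	off = pkt_append_special(buf, cap, off, "0001");
	off = pkt_append(buf, cap, off, "size");
	for (size_t i = 0; i < nr; i++) {
		snprintf(line, sizeof(line), "oid %s", oids[i]);
		off = pkt_append(buf, cap, off, line);
	}
	return pkt_append_special(buf, cap, off, "0000");
}
```

A caller would fill a buffer with build_object_info_request() and then
hand the result to a single write, which is the same shape as the
req_buf handling in send_object_info_request().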
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
Makefile | 1 +
fetch-object-info.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fetch-object-info.h | 22 +++++++++++++
fetch-pack.c | 3 ++
fetch-pack.h | 2 ++
meson.build | 1 +
transport-helper.c | 11 +++++--
transport.c | 28 ++++++++++++++++-
transport.h | 11 +++++++
9 files changed, 166 insertions(+), 3 deletions(-)
diff --git a/Makefile b/Makefile
index 1cec251f43..ec4df39a6b 100644
--- a/Makefile
+++ b/Makefile
@@ -1159,6 +1159,7 @@ LIB_OBJS += ewah/ewah_rlw.o
LIB_OBJS += exec-cmd.o
LIB_OBJS += fetch-negotiator.o
LIB_OBJS += fetch-pack.o
+LIB_OBJS += fetch-object-info.o
LIB_OBJS += fmt-merge-msg.o
LIB_OBJS += fsck.o
LIB_OBJS += fsmonitor.o
diff --git a/fetch-object-info.c b/fetch-object-info.c
new file mode 100644
index 0000000000..ae035c9598
--- /dev/null
+++ b/fetch-object-info.c
@@ -0,0 +1,90 @@
+#include "git-compat-util.h"
+#include "gettext.h"
+#include "hex.h"
+#include "pkt-line.h"
+#include "connect.h"
+#include "oid-array.h"
+#include "odb.h"
+#include "fetch-object-info.h"
+#include "string-list.h"
+
+/* Writes the object-info command and its arguments into a request buffer and sends it. */
+static void send_object_info_request(const int fd_out, struct object_info_args *args)
+{
+ struct strbuf req_buf = STRBUF_INIT;
+
+ write_command_and_capabilities(&req_buf, "object-info", args->server_options);
+
+ if (unsorted_string_list_has_string(args->object_info_options, "size"))
+ packet_buf_write(&req_buf, "size");
+
+ if (args->oids)
+ for (size_t i = 0; i < args->oids->nr; i++)
+ packet_buf_write(&req_buf, "oid %s", oid_to_hex(&args->oids->oid[i]));
+
+ packet_buf_flush(&req_buf);
+ if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
+ die_errno(_("unable to write request to remote"));
+
+ strbuf_release(&req_buf);
+}
+
+int fetch_object_info(const enum protocol_version version, struct object_info_args *args,
+ struct packet_reader *reader, struct object_info *object_info_data,
+ const int stateless_rpc, const int fd_out)
+{
+ int size_index = -1;
+
+ switch (version) {
+ case protocol_v2:
+ if (!server_supports_v2("object-info"))
+ die(_("object-info capability is not enabled on the server"));
+ send_object_info_request(fd_out, args);
+ break;
+ case protocol_v1:
+ case protocol_v0:
+ die(_("unsupported protocol version. expected v2"));
+ case protocol_unknown_version:
+ BUG("unknown protocol version");
+ }
+
+ for (size_t i = 0; i < args->object_info_options->nr; i++) {
+ if (packet_reader_read(reader) != PACKET_READ_NORMAL) {
+ check_stateless_delimiter(stateless_rpc, reader,
+ "stateless delimiter expected");
+ return -1;
+ }
+
+ if (!string_list_has_string(args->object_info_options, reader->line))
+ return -1;
+
+ if (!strcmp(reader->line, "size")) {
+ size_index = i;
+ for (size_t j = 0; j < args->oids->nr; j++)
+ object_info_data[j].sizep = xcalloc(1, sizeof(*object_info_data[j].sizep));
+ }
+ }
+
+ for (size_t i = 0; packet_reader_read(reader) == PACKET_READ_NORMAL && i < args->oids->nr; i++) {
+ struct string_list object_info_values = STRING_LIST_INIT_DUP;
+
+ string_list_split(&object_info_values, reader->line, " ", -1);
+ if (0 <= size_index) {
+ if (!strcmp(object_info_values.items[1 + size_index].string, "")) {
+ FREE_AND_NULL(object_info_data[i].sizep);
+ string_list_clear(&object_info_values, 0);
+ continue;
+ }
+ if (strtoul_ul(object_info_values.items[1 + size_index].string,
+ 10, object_info_data[i].sizep))
+ die("object-info: ref %s has invalid size %s",
+ object_info_values.items[0].string,
+ object_info_values.items[1 + size_index].string);
+ }
+
+ string_list_clear(&object_info_values, 0);
+ }
+ check_stateless_delimiter(stateless_rpc, reader, "stateless delimiter expected");
+
+ return 0;
+}
diff --git a/fetch-object-info.h b/fetch-object-info.h
new file mode 100644
index 0000000000..d35284bd6b
--- /dev/null
+++ b/fetch-object-info.h
@@ -0,0 +1,22 @@
+#ifndef FETCH_OBJECT_INFO_H
+#define FETCH_OBJECT_INFO_H
+
+#include "pkt-line.h"
+#include "protocol.h"
+#include "odb.h"
+
+struct object_info_args {
+ struct string_list *object_info_options;
+ const struct string_list *server_options;
+ struct oid_array *oids;
+};
+
+/*
+ * Sends the object-info command to the remote via a request buffer and reads
+ * the results from the response packets.
+ */
+int fetch_object_info(enum protocol_version version, struct object_info_args *args,
+ struct packet_reader *reader, struct object_info *object_info_data,
+ int stateless_rpc, int fd_out);
+
+#endif /* FETCH_OBJECT_INFO_H */
diff --git a/fetch-pack.c b/fetch-pack.c
index cdebd3476f..a86c93fc52 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1742,6 +1742,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
if (args->depth > 0 || args->deepen_since || args->deepen_not)
args->deepen = 1;
+ if (args->object_info)
+ state = FETCH_SEND_REQUEST;
+
while (state != FETCH_DONE) {
switch (state) {
case FETCH_CHECK_LOCAL:
diff --git a/fetch-pack.h b/fetch-pack.h
index 6d0dec7f41..5a428f11ed 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -16,6 +16,7 @@ struct fetch_pack_args {
const struct string_list *deepen_not;
struct list_objects_filter_options filter_options;
const struct string_list *server_options;
+ struct object_info *object_info_data;
/*
* If not NULL, during packfile negotiation, fetch-pack will send "have"
@@ -43,6 +44,7 @@ struct fetch_pack_args {
unsigned reject_shallow_remote:1;
unsigned deepen:1;
unsigned refetch:1;
+ unsigned object_info:1;
/*
* Indicate that the remote of this request is a promisor remote. The
diff --git a/meson.build b/meson.build
index 3247697f74..145c6882eb 100644
--- a/meson.build
+++ b/meson.build
@@ -347,6 +347,7 @@ libgit_sources = [
'exec-cmd.c',
'fetch-negotiator.c',
'fetch-pack.c',
+ 'fetch-object-info.c',
'fmt-merge-msg.c',
'fsck.c',
'fsmonitor.c',
diff --git a/transport-helper.c b/transport-helper.c
index 8a71354d50..fdb0590417 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -727,8 +727,8 @@ static int fetch_refs(struct transport *transport,
/*
* If we reach here, then the server, the client, and/or the transport
- * helper does not support protocol v2. --negotiate-only requires
- * protocol v2.
+ * helper does not support protocol v2. --negotiate-only and cat-file
+ * remote-object-info require protocol v2.
*/
if (data->transport_options.acked_commits) {
warning(_("--negotiate-only requires protocol v2"));
@@ -744,6 +744,13 @@ static int fetch_refs(struct transport *transport,
free_refs(dummy);
}
+ /* fail the command explicitly to avoid further commands input. */
+ if (transport->smart_options->object_info)
+ die(_("remote-object-info requires protocol v2"));
+
+ if (!data->get_refs_list_called)
+ get_refs_list_using_list(transport, 0);
+
count = 0;
for (i = 0; i < nr_heads; i++)
if (!(to_fetch[i]->status & REF_STATUS_UPTODATE))
diff --git a/transport.c b/transport.c
index 0f5ec30247..7d3246e12b 100644
--- a/transport.c
+++ b/transport.c
@@ -9,6 +9,7 @@
#include "hook.h"
#include "pkt-line.h"
#include "fetch-pack.h"
+#include "fetch-object-info.h"
#include "remote.h"
#include "connect.h"
#include "send-pack.h"
@@ -467,8 +468,33 @@ static int fetch_refs_via_pack(struct transport *transport,
args.negotiation_restrict_tips = data->options.negotiation_restrict_tips;
args.negotiation_include_tips = data->options.negotiation_include_tips;
args.reject_shallow_remote = transport->smart_options->reject_shallow;
+ args.object_info = transport->smart_options->object_info;
+
+ if (transport->smart_options->object_info
+ && transport->smart_options->object_info_oids->nr > 0) {
+ struct packet_reader reader;
+ struct object_info_args obj_info_args = { 0 };
+
+ obj_info_args.server_options = transport->server_options;
+ obj_info_args.oids = transport->smart_options->object_info_oids;
+ obj_info_args.object_info_options = transport->smart_options->object_info_options;
+ string_list_sort(obj_info_args.object_info_options);
+
+ connect_setup(transport, 0);
+ packet_reader_init(&reader, data->fd[0], NULL, 0,
+ PACKET_READ_CHOMP_NEWLINE |
+ PACKET_READ_GENTLE_ON_EOF |
+ PACKET_READ_DIE_ON_ERR_PACKET);
+
+ data->version = discover_version(&reader);
+ transport->hash_algo = reader.hash_algo;
+
+ ret = fetch_object_info(data->version, &obj_info_args, &reader,
+ data->options.object_info_data, transport->stateless_rpc,
+ data->fd[1]);
+ goto cleanup;
- if (!data->finished_handshake) {
+ } else if (!data->finished_handshake) {
int i;
int must_list_refs = 0;
for (i = 0; i < nr_heads; i++) {
diff --git a/transport.h b/transport.h
index 7e5867cffa..bd60b10af4 100644
--- a/transport.h
+++ b/transport.h
@@ -6,6 +6,7 @@
#include "list-objects-filter-options.h"
#include "string-list.h"
#include "connect.h"
+#include "odb.h"
struct git_transport_options {
unsigned thin : 1;
@@ -31,6 +32,12 @@ struct git_transport_options {
*/
unsigned connectivity_checked:1;
+ /*
+ * Transport will attempt to retrieve only object-info.
+ * If object-info is not supported, the operation will error and exit.
+ */
+ unsigned object_info : 1;
+
int depth;
const char *deepen_since;
const struct string_list *deepen_not;
@@ -55,6 +62,10 @@ struct git_transport_options {
* common commits to this oidset instead of fetching any packfiles.
*/
struct oidset *acked_commits;
+
+ struct oid_array *object_info_oids;
+ struct object_info *object_info_data;
+ struct string_list *object_info_options;
};
enum transport_family {
--
2.54.0
* [PATCH GSoC RFC v13 08/12] serve: advertise object-info feature
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Calvin Wan, Jonathan Tan, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Calvin Wan <calvinwan@google.com>
In order for a client to know what object-info components a server can
provide, advertise supported object-info features. This will allow a
client to decide whether to query the server for object-info or fetch
as a fallback.
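A client-side check for an advertised feature could look roughly like
the sketch below. The function name object_info_feature_advertised() is
hypothetical, and so is the assumption that several features would be
separated by spaces; this patch only ever advertises "size", so the
separator is not defined by the series.

```c
#include <string.h>

/*
 * Return 1 if `feature` appears as a whole token in the capability
 * value `value`, 0 otherwise. A NULL or empty value means that no
 * feature was advertised. Tokens are assumed to be space-separated,
 * which is an illustration-only assumption.
 */
static int object_info_feature_advertised(const char *value, const char *feature)
{
	size_t flen = strlen(feature);

	if (!value || !flen)
		return 0;
	while (*value) {
		size_t tlen = strcspn(value, " ");

		if (tlen == flen && !strncmp(value, feature, flen))
			return 1;
		value += tlen;
		while (*value == ' ')
			value++;
	}
	return 0;
}
```

With such a helper, a client could query the server for object-info
when "size" is advertised and fall back to a regular fetch otherwise.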
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
serve.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/serve.c b/serve.c
index 49a6e39b1d..2b07d922b3 100644
--- a/serve.c
+++ b/serve.c
@@ -89,7 +89,7 @@ static void session_id_receive(struct repository *r UNUSED,
trace2_data_string("transfer", NULL, "client-sid", client_sid);
}
-static int object_info_advertise(struct repository *r, struct strbuf *value UNUSED)
+static int object_info_advertise(struct repository *r, struct strbuf *value)
{
if (advertise_object_info == -1 &&
repo_config_get_bool(r, "transfer.advertiseobjectinfo",
@@ -97,6 +97,9 @@ static int object_info_advertise(struct repository *r, struct strbuf *value UNUS
/* disabled by default */
advertise_object_info = 0;
}
+ /* Currently only size is supported */
+ if (value && advertise_object_info)
+ strbuf_addstr(value, "size");
return advertise_object_info;
}
--
2.54.0
* [PATCH GSoC RFC v13 07/12] fetch-pack: move fetch initialization
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Calvin Wan, Jonathan Tan, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Calvin Wan <calvinwan@google.com>
There are some variables initialized at the start of the
do_fetch_pack_v2() state machine. Currently, they are initialized
in FETCH_CHECK_LOCAL, which is the initial state set at the beginning
of the function.
However, a subsequent patch will allow for another initial state,
while still requiring these initialized variables.
Move the initialization to be before the state machine,
so that they are set regardless of the initial state.
Note that there is no change in behavior, because we're moving code
from the beginning of the first state to just before the execution of
the state machine.
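The reasoning can be illustrated with a reduced state machine. This is
only a sketch: the enum values, struct fetch_ctx and run_fetch() are
hypothetical and merely mirror the shape of do_fetch_pack_v2(), where
shared setup must happen no matter which state is entered first.

```c
enum fetch_state { CHECK_LOCAL, SEND_REQUEST, DONE };

struct fetch_ctx {
	int use_sideband;  /* shared setup that every state relies on */
	int checked_local; /* set only when CHECK_LOCAL runs */
	int sent_request;  /* set only when SEND_REQUEST runs */
};

static void run_fetch(struct fetch_ctx *ctx, enum fetch_state state)
{
	/*
	 * Shared initialization sits before the loop, so it also runs
	 * when the caller starts in a state other than CHECK_LOCAL.
	 */
	ctx->use_sideband = 2;

	while (state != DONE) {
		switch (state) {
		case CHECK_LOCAL:
			ctx->checked_local = 1;
			state = SEND_REQUEST;
			break;
		case SEND_REQUEST:
			ctx->sent_request = 1;
			state = DONE;
			break;
		case DONE:
			break;
		}
	}
}
```

Starting at CHECK_LOCAL behaves exactly as if the assignment were still
the first statement of that state, while starting at SEND_REQUEST now
sees the same initialized values.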
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
fetch-pack.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 3d32114907..cdebd3476f 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1736,18 +1736,18 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
reader.me = "fetch-pack";
}
+ /* v2 supports these by default */
+ allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
+ use_sideband = 2;
+ if (args->depth > 0 || args->deepen_since || args->deepen_not)
+ args->deepen = 1;
+
while (state != FETCH_DONE) {
switch (state) {
case FETCH_CHECK_LOCAL:
sort_ref_list(&ref, ref_compare_name);
QSORT(sought, nr_sought, cmp_ref_by_name);
- /* v2 supports these by default */
- allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
- use_sideband = 2;
- if (args->depth > 0 || args->deepen_since || args->deepen_not)
- args->deepen = 1;
-
/* Filter 'ref' by 'sought' and those that aren't local */
mark_complete_and_common_ref(negotiator, args, &ref);
filter_refs(args, &ref, sought, nr_sought);
--
2.54.0
* [PATCH GSoC RFC v13 06/12] connect: refactor packet writing
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater, Jonathan Tan, Calvin Wan
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
Refactor `write_fetch_command_and_capabilities()` so that it can serve
both fetch and additional commands.
In this context, a "command" is an operation supported by Git's wire
protocol (https://git-scm.com/docs/protocol-v2), such as "fetch", which
backs git-fetch(1), or a server-side operation like "object-info" as
implemented in commit a2ba162cda (object-info: support for retrieving
object info, 2021-04-20).
Rename the function to `write_command_and_capabilities()` and change its
signature to accept a command instead of the hardcoded "fetch".
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
connect.c | 10 +++++-----
connect.h | 8 ++++++--
fetch-pack.c | 4 ++--
3 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/connect.c b/connect.c
index 1dced8e632..78c69d4485 100644
--- a/connect.c
+++ b/connect.c
@@ -700,16 +700,16 @@ int server_supports(const char *feature)
return !!server_feature_value(feature, NULL);
}
-void write_fetch_command_and_capabilities(struct strbuf *req_buf,
- const struct string_list *server_options)
+void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
+ const struct string_list *server_options)
{
const char *hash_name;
int advertise_sid;
repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
- ensure_server_supports_v2("fetch");
- packet_buf_write(req_buf, "command=fetch");
+ ensure_server_supports_v2(command);
+ packet_buf_write(req_buf, "command=%s", command);
if (server_supports_v2("agent"))
packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
if (advertise_sid && server_supports_v2("session-id"))
@@ -727,7 +727,7 @@ void write_fetch_command_and_capabilities(struct strbuf *req_buf,
die(_("mismatched algorithms: client %s; server %s"),
the_hash_algo->name, hash_name);
packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
- } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
+ } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
die(_("the server does not support algorithm '%s'"),
the_hash_algo->name);
}
diff --git a/connect.h b/connect.h
index c4f6ea4b0a..8f4c523892 100644
--- a/connect.h
+++ b/connect.h
@@ -34,8 +34,12 @@ void check_stateless_delimiter(int stateless_rpc,
struct packet_reader *reader,
const char *error);
+/*
+ * Writes a command along with the requested server capabilities/features into a
+ * request buffer.
+ */
struct string_list;
-void write_fetch_command_and_capabilities(struct strbuf *req_buf,
- const struct string_list *server_options);
+void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
+ const struct string_list *server_options);
#endif
diff --git a/fetch-pack.c b/fetch-pack.c
index 4a8a70b5f3..3d32114907 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1387,7 +1387,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
int done_sent = 0;
struct strbuf req_buf = STRBUF_INIT;
- write_fetch_command_and_capabilities(&req_buf, args->server_options);
+ write_command_and_capabilities(&req_buf, "fetch", args->server_options);
if (args->use_thin_pack)
packet_buf_write(&req_buf, "thin-pack");
@@ -2255,7 +2255,7 @@ void negotiate_using_fetch(const struct oid_array *negotiation_restrict_tips,
the_repository, "%d",
negotiation_round);
strbuf_reset(&req_buf);
- write_fetch_command_and_capabilities(&req_buf, server_options);
+ write_command_and_capabilities(&req_buf, "fetch", server_options);
packet_buf_write(&req_buf, "wait-for-done");
--
2.54.0
* [PATCH GSoC RFC v13 05/12] fetch-pack: move function to connect.c
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater, Jonathan Tan, Calvin Wan
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
`write_fetch_command_and_capabilities()` will be refactored in a
subsequent commit into a more general-purpose function that other
commands can use as well.
To move `write_fetch_command_and_capabilities()` to `connect.c`, we need
to adjust how `advertise_sid` is managed. Previously in `fetch-pack.c`,
`advertise_sid` was a static variable, set using
`repo_config_get_bool()`.
In `connect.c`, we now initialize `advertise_sid` at the beginning of the
function by calling `repo_config_get_bool()` directly. This change is
safe because, in the original `fetch-pack.c` code, there are only two
places that write `advertise_sid`:
1. In function `do_fetch_pack()`:
    if (!server_supports("session-id"))
        advertise_sid = 0;
2. In function `fetch_pack_config()`:
    repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
Regarding 1, since `do_fetch_pack()` is only relevant for protocol
versions before v2, this assignment can be ignored, as
`write_fetch_command_and_capabilities()` is only used in v2.
Regarding 2, `repo_config_get_bool()` is declared in `config.h`, which is
already available to `connect.c`, so we can call it directly.
Move `write_fetch_command_and_capabilities()` to `connect.c`.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
connect.c | 34 ++++++++++++++++++++++++++++++++++
connect.h | 4 ++++
fetch-pack.c | 31 -------------------------------
3 files changed, 38 insertions(+), 31 deletions(-)
diff --git a/connect.c b/connect.c
index 47e39d2a73..1dced8e632 100644
--- a/connect.c
+++ b/connect.c
@@ -700,6 +700,40 @@ int server_supports(const char *feature)
return !!server_feature_value(feature, NULL);
}
+void write_fetch_command_and_capabilities(struct strbuf *req_buf,
+ const struct string_list *server_options)
+{
+ const char *hash_name;
+ int advertise_sid;
+
+ repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
+
+ ensure_server_supports_v2("fetch");
+ packet_buf_write(req_buf, "command=fetch");
+ if (server_supports_v2("agent"))
+ packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
+ if (advertise_sid && server_supports_v2("session-id"))
+ packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
+ if (server_options && server_options->nr) {
+ ensure_server_supports_v2("server-option");
+ for (size_t i = 0; i < server_options->nr; i++)
+ packet_buf_write(req_buf, "server-option=%s",
+ server_options->items[i].string);
+ }
+
+ if (server_feature_v2("object-format", &hash_name)) {
+ const unsigned int hash_algo = hash_algo_by_name(hash_name);
+ if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
+ die(_("mismatched algorithms: client %s; server %s"),
+ the_hash_algo->name, hash_name);
+ packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
+ } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
+ die(_("the server does not support algorithm '%s'"),
+ the_hash_algo->name);
+ }
+ packet_buf_delim(req_buf);
+}
+
static const char *url_scheme_name(enum url_scheme scheme)
{
switch (scheme) {
diff --git a/connect.h b/connect.h
index aa482a37fb..c4f6ea4b0a 100644
--- a/connect.h
+++ b/connect.h
@@ -34,4 +34,8 @@ void check_stateless_delimiter(int stateless_rpc,
struct packet_reader *reader,
const char *error);
+struct string_list;
+void write_fetch_command_and_capabilities(struct strbuf *req_buf,
+ const struct string_list *server_options);
+
#endif
diff --git a/fetch-pack.c b/fetch-pack.c
index f13951d154..4a8a70b5f3 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1376,37 +1376,6 @@ static int add_haves(struct fetch_negotiator *negotiator,
return haves_added;
}
-static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
- const struct string_list *server_options)
-{
- const char *hash_name;
-
- ensure_server_supports_v2("fetch");
- packet_buf_write(req_buf, "command=fetch");
- if (server_supports_v2("agent"))
- packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
- if (advertise_sid && server_supports_v2("session-id"))
- packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
- if (server_options && server_options->nr) {
- ensure_server_supports_v2("server-option");
- for (size_t i = 0; i < server_options->nr; i++)
- packet_buf_write(req_buf, "server-option=%s",
- server_options->items[i].string);
- }
-
- if (server_feature_v2("object-format", &hash_name)) {
- int hash_algo = hash_algo_by_name(hash_name);
- if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
- die(_("mismatched algorithms: client %s; server %s"),
- the_hash_algo->name, hash_name);
- packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
- } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
- die(_("the server does not support algorithm '%s'"),
- the_hash_algo->name);
- }
- packet_buf_delim(req_buf);
-}
-
static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
struct fetch_pack_args *args,
const struct ref *wants, struct oidset *common,
--
2.54.0
* [PATCH GSoC RFC v13 04/12] t1006: split test utility functions into new "lib-cat-file.sh"
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Eric Ju <eric.peijian@gmail.com>
Extract the utility functions from the cat-file test script
"t1006-cat-file.sh" into a new, dedicated library file
"lib-cat-file.sh". The goal is to improve code reuse and readability,
enabling future tests to use these utilities without duplicating
code.
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
t/lib-cat-file.sh | 16 ++++++++++++++++
t/t1006-cat-file.sh | 13 +------------
2 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/t/lib-cat-file.sh b/t/lib-cat-file.sh
new file mode 100644
index 0000000000..44af232d74
--- /dev/null
+++ b/t/lib-cat-file.sh
@@ -0,0 +1,16 @@
+# Library of git-cat-file related test functions.
+
+# Print a string without a trailing newline.
+echo_without_newline () {
+ printf '%s' "$*"
+}
+
+# Print a string with each newline replaced by a NUL character (\0).
+echo_without_newline_nul () {
+ echo_without_newline "$@" | tr '\n' '\0'
+}
+
+# Calculate the length of a string.
+strlen () {
+ echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
+}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8e2c52652c..8360f3bbd9 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -4,6 +4,7 @@ test_description='git cat-file'
. ./test-lib.sh
. "$TEST_DIRECTORY/lib-loose.sh"
+. "$TEST_DIRECTORY"/lib-cat-file.sh
test_cmdmode_usage () {
test_expect_code 129 "$@" 2>err &&
@@ -99,18 +100,6 @@ do
'
done
-echo_without_newline () {
- printf '%s' "$*"
-}
-
-echo_without_newline_nul () {
- echo_without_newline "$@" | tr '\n' '\0'
-}
-
-strlen () {
- echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
-}
-
run_tests () {
type=$1
object_name="$2"
--
2.54.0
* [PATCH GSoC RFC v13 03/12] cat-file: declare loop counter inside for()
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Eric Ju <eric.peijian@gmail.com>
Some code used in this series declares variable i and only uses it
in a for loop, not in any other logic outside the loop.
Change the declaration of i to be inside the for loop for readability.
While at it, we also change its type from "int" to "size_t" where the
latter makes more sense.
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
builtin/cat-file.c | 13 ++++---------
fetch-pack.c | 3 +--
2 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 446d649904..fab55c11de 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -723,14 +723,12 @@ static void dispatch_calls(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data,
struct queued_cmd *cmd,
- int nr)
+ size_t nr)
{
- int i;
-
if (!opt->buffer_output)
die(_("flush is only for --buffer mode"));
- for (i = 0; i < nr; i++)
+ for (size_t i = 0; i < nr; i++)
cmd[i].fn(opt, cmd[i].line, output, data);
fflush(stdout);
@@ -738,9 +736,7 @@ static void dispatch_calls(struct batch_options *opt,
static void free_cmds(struct queued_cmd *cmd, size_t *nr)
{
- size_t i;
-
- for (i = 0; i < *nr; i++)
+ for (size_t i = 0; i < *nr; i++)
FREE_AND_NULL(cmd[i].line);
*nr = 0;
@@ -767,7 +763,6 @@ static void batch_objects_command(struct batch_options *opt,
size_t alloc = 0, nr = 0;
while (strbuf_getdelim_strip_crlf(&input, stdin, opt->input_delim) != EOF) {
- int i;
const struct parse_cmd *cmd = NULL;
const char *p = NULL, *cmd_end;
struct queued_cmd call = {0};
@@ -777,7 +772,7 @@ static void batch_objects_command(struct batch_options *opt,
if (isspace(*input.buf))
die(_("whitespace before command: '%s'"), input.buf);
- for (i = 0; i < ARRAY_SIZE(commands); i++) {
+ for (size_t i = 0; i < ARRAY_SIZE(commands); i++) {
if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
continue;
diff --git a/fetch-pack.c b/fetch-pack.c
index 120e01f3cf..f13951d154 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1388,9 +1388,8 @@ static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
if (advertise_sid && server_supports_v2("session-id"))
packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
if (server_options && server_options->nr) {
- int i;
ensure_server_supports_v2("server-option");
- for (i = 0; i < server_options->nr; i++)
+ for (size_t i = 0; i < server_options->nr; i++)
packet_buf_write(req_buf, "server-option=%s",
server_options->items[i].string);
}
--
2.54.0
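
The counter change above can be tried in isolation. The following is a minimal sketch, not code from the series: ARRAY_SIZE here is a simplified stand-in for Git's macro, and the command table and find_command() are hypothetical names used only for illustration. Because ARRAY_SIZE() yields a size_t, a size_t counter declared in the for() header compares like with like and stays scoped to the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for Git's ARRAY_SIZE(); the result has type size_t. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical command table, loosely modelled on the one in cat-file. */
static const char *const commands[] = { "contents", "info", "flush" };

/*
 * Return the index of the first command that is a prefix of `line`,
 * or ARRAY_SIZE(commands) when nothing matches.
 */
size_t find_command(const char *line)
{
	/* `i` is size_t, so there is no signed/unsigned comparison here. */
	for (size_t i = 0; i < ARRAY_SIZE(commands); i++) {
		if (!strncmp(line, commands[i], strlen(commands[i])))
			return i;
	}
	return ARRAY_SIZE(commands);
}

/* Hypothetical self-check; not part of the series. */
void find_command_examples(void)
{
	assert(find_command("contents HEAD") == 0);
	assert(find_command("info HEAD") == 1);
	assert(find_command("bogus") == ARRAY_SIZE(commands));
}
```

With an "int i" declared at the top of the function instead, the same comparison would mix signed and unsigned operands, which compilers can flag under -Wsign-compare.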
* [PATCH GSoC RFC v13 02/12] git-compat-util: add strtoul_ul() with error handling
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
From: Eric Ju <eric.peijian@gmail.com>
We already have strtoul_ui() and similar functions that provide proper
error handling using strtoul from the standard library. However,
there isn't currently a variant that returns an unsigned long.
This variant is needed in a subsequent commit to enable returning an
unsigned long with proper error handling.
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
git-compat-util.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/git-compat-util.h b/git-compat-util.h
index 8809776407..4bf569f35c 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -975,6 +975,26 @@ static inline int strtoul_ui(char const *s, int base, unsigned int *result)
return 0;
}
+/*
+ * Convert a string to an unsigned long using the standard library's strtoul,
+ * with additional error handling to ensure robustness.
+ */
+static inline int strtoul_ul(char const *s, int base, unsigned long *result)
+{
+ unsigned long ul;
+ char *p;
+
+ errno = 0;
+ /* negative values would be accepted by strtoul */
+ if (strchr(s, '-'))
+ return -1;
+ ul = strtoul(s, &p, base);
+ if (errno || *p || p == s)
+ return -1;
+ *result = ul;
+ return 0;
+}
+
static inline int strtol_i(char const *s, int base, int *result)
{
long ul;
--
2.54.0
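
To see which inputs the new helper rejects, the function body from the patch can be compiled on its own. This is a sketch for experimentation only: strtoul_ul() is copied from the diff above, while strtoul_ul_examples() is a hypothetical self-check that does not appear in the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Standalone copy of the helper added by the patch. */
static inline int strtoul_ul(char const *s, int base, unsigned long *result)
{
	unsigned long ul;
	char *p;

	errno = 0;
	/* plain strtoul() would accept "-1" and wrap it around */
	if (strchr(s, '-'))
		return -1;
	ul = strtoul(s, &p, base);
	/* overflow, trailing garbage, or no digits consumed at all */
	if (errno || *p || p == s)
		return -1;
	*result = ul;
	return 0;
}

/* Hypothetical self-check; not part of the series. */
void strtoul_ul_examples(void)
{
	unsigned long v = 0;

	assert(!strtoul_ul("42", 10, &v) && v == 42);
	assert(!strtoul_ul("ff", 16, &v) && v == 255);
	assert(strtoul_ul("-1", 10, &v) == -1);    /* negative input */
	assert(strtoul_ul("12abc", 10, &v) == -1); /* trailing garbage */
	assert(strtoul_ul("", 10, &v) == -1);      /* empty string */
	assert(v == 255); /* failed calls leave *result untouched */
}
```

Note that *result is only written on success, so callers can keep a default value in place and check the return code.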
* [PATCH GSoC RFC v13 01/12] transport-helper: fix memory leak of helper on disconnect
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260619-ps-eric-work-rebase-v13-0-3d4c7315d2f8@gmail.com>
disconnect_helper() only frees data inside of the if(data->helper)
block [1]. When the transport is disconnected without the helper
being fully started, data->name allocated in transport_helper_init()
is never freed.
Move FREE_AND_NULL(data->name) outside the conditional block so it's
always freed on disconnect.
[1]: https://lore.kernel.org/git/05fbadbae2184479c87c37675dde7bd79b3e32ab.1716465556.git.ps@pks.im/
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
transport-helper.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/transport-helper.c b/transport-helper.c
index 0fa0eb2d72..8a71354d50 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -266,9 +266,9 @@ static int disconnect_helper(struct transport *transport)
close(data->helper->out);
fclose(data->out);
res = finish_command(data->helper);
- FREE_AND_NULL(data->name);
FREE_AND_NULL(data->helper);
}
+ FREE_AND_NULL(data->name);
return res;
}
--
2.54.0
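
The shape of the fix can be illustrated outside of Git. The sketch below is an assumption-laden model, not the real transport-helper code: struct helper_data_sketch, init_sketch() and disconnect_sketch() are hypothetical, the helper is reduced to a plain allocation, and FREE_AND_NULL is a simplified stand-in for Git's macro. It only shows that the name is released even when the helper was never started.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Git's FREE_AND_NULL() macro. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical, pared-down model of the transport helper state. */
struct helper_data_sketch {
	char *name;   /* allocated at init time, before any helper runs */
	char *helper; /* stands in for the child process; NULL until started */
};

/* Mirrors the post-patch layout: name is freed on every path. */
void disconnect_sketch(struct helper_data_sketch *data)
{
	if (data->helper) {
		/* the real code closes pipes and reaps the child here */
		FREE_AND_NULL(data->helper);
	}
	FREE_AND_NULL(data->name);
}

/* Build the state an init without a started helper would leave behind. */
struct helper_data_sketch init_sketch(const char *name)
{
	struct helper_data_sketch data = { NULL, NULL };

	data.name = malloc(strlen(name) + 1);
	if (data.name)
		strcpy(data.name, name);
	return data;
}

/* Hypothetical self-check; not part of the series. */
void disconnect_sketch_demo(void)
{
	struct helper_data_sketch data = init_sketch("origin");

	disconnect_sketch(&data); /* helper never started */
	assert(!data.name && !data.helper);
}
```

With the pre-patch layout, where the name is freed inside the conditional, the same sequence would leave data.name allocated.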
* [PATCH GSoC RFC v13 00/12] cat-file: add remote-object-info to batch-command
From: Pablo Sabater @ 2026-06-19 14:56 UTC (permalink / raw)
To: gitster
Cc: peff, eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519, Pablo Sabater
In-Reply-To: <20260608-ps-eric-work-rebase-v12-0-5338b766e658@gmail.com>
This patch series is a continuation of Eric Ju's (eric.peijian@gmail.com) and
Calvin Wan's (calvinwan@google.com) patch series, [1] and [2] respectively.
Sometimes it is beneficial to retrieve information about an object without
having to download it completely. The server logic for retrieving size has
already been implemented and merged in "a2ba162cda (object-info: support for
retrieving object info, 2021-04-20)"[3]. This patch series implements the
client-side option for it.
Eric's series adds the `remote-object-info` command to
`cat-file --batch-command`. This command allows the client to make an
object-info command request to a server that supports protocol v2.
If the server uses protocol v2 but does not support the object-info capability,
`cat-file --batch-command` will die.
If a user attempts to use `remote-object-info` with protocol v1,
`cat-file --batch-command` will die.
Currently, only the size (%(objectsize)) is supported end to end in this
implementation. The type (%(objecttype)) is known to the client's allow-list
and request path but is supported neither on the server side nor in the
response parsing. A follow-up series will add full end-to-end support for
%(objecttype).
The default format for remote-object-info is set to %(objectname) %(objectsize).
Once %(objecttype) is supported, the default format will be unified accordingly.
If the batch command format includes unsupported fields such as %(objecttype),
%(objectsize:disk), or %(deltabase), the command will return empty strings for
each unsupported field.
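
The empty-string behaviour described above can be modelled with a small expander. This is only a sketch of the observable behaviour, not the implementation in builtin/cat-file.c: struct remote_info_sketch and expand_remote_format() are hypothetical names, and the hard-coded pair of atoms stands in for the series' allow-list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-object data; only name and size are known remotely. */
struct remote_info_sketch {
	const char *oid;
	unsigned long size;
};

/*
 * Expand `fmt` into `out` (capacity `cap`). Only %(objectname) and
 * %(objectsize) produce output; every other %(atom) expands to "".
 */
void expand_remote_format(const char *fmt,
			  const struct remote_info_sketch *info,
			  char *out, size_t cap)
{
	size_t len = 0;

	if (!cap)
		return;
	out[0] = '\0';
	while (*fmt && len + 1 < cap) {
		const char *end;

		if (fmt[0] == '%' && fmt[1] == '(' && (end = strchr(fmt, ')'))) {
			const char *atom = fmt + 2;
			size_t n = (size_t)(end - atom);
			int w = 0;

			if (n == 10 && !strncmp(atom, "objectname", n))
				w = snprintf(out + len, cap - len, "%s", info->oid);
			else if (n == 10 && !strncmp(atom, "objectsize", n))
				w = snprintf(out + len, cap - len, "%lu", info->size);
			/* any other atom leaves w at 0: an empty expansion */
			if (w > 0)
				len += ((size_t)w < cap - len) ? (size_t)w : cap - len - 1;
			fmt = end + 1;
		} else {
			out[len++] = *fmt++;
			out[len] = '\0';
		}
	}
}

/* Hypothetical self-check; not part of the series. */
void expand_remote_format_demo(void)
{
	struct remote_info_sketch info = { "abc123", 42 };
	char buf[64];

	expand_remote_format("%(objectname) %(objecttype) %(objectsize)",
			     &info, buf, sizeof(buf));
	/* the unsupported %(objecttype) leaves two adjacent spaces */
	assert(!strcmp(buf, "abc123  42"));
}
```

The doubled space in the expected output shows how an unsupported atom cannot be told apart from an atom that legitimately has no data.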
This series completes Eric's work, mainly by refactoring placeholder
validation around an allow-list that filters what the client asks for against
what the server is capable of providing, following Jeff King's idea [4].
I have two questions about the design:
1. If the format includes unsupported fields such as %(objecttype) or
%(deltabase), it currently returns an empty string for each unsupported
field; this follows what for-each-ref does with known but inapplicable
atoms. However, placeholders to be implemented in the future, such as
%(rest) and %(objectmode), can legitimately return empty strings. How
should we differentiate "unsupported" from "no data"?
Eric proposed using a placeholder like "???" [5].
Should a placeholder be used?
2. _tangent/not related to this series_
'a2ba162cda' is designed to only work with full OIDs, which is
inconsistent with local `info`, which does support short OIDs and, when
one is ambiguous, returns a list of what the user possibly meant.
Because protocol v2 is meant to be stateless, supporting short OIDs
could become more inconsistent with other remote commands that do not
support short OIDs. Maybe a --pick-first option that accepts
short OIDs and picks the first match?
Alternatively, would sending a list of possible OIDs to the client so
it can re-request with the correct one be ok?
[1]: https://lore.kernel.org/git/20250221190451.12536-1-eric.peijian@gmail.com/
[2]: https://lore.kernel.org/git/20220728230210.2952731-1-calvinwan@google.com/#t
[3]: https://git.kernel.org/pub/scm/git/git.git/commit/?id=a2ba162cda2acc171c3e36acbbc854792b093cb7
[4]: https://lore.kernel.org/git/20250313060250.GH94015@coredump.intra.peff.net/
[5]: https://lore.kernel.org/git/CAN2LT1D3d=yMYVhBjpj5PvyjfTVjwqcFPNViuCJ=f49YbCZuJg@mail.gmail.com/
Changes since v12:
- Remote-object-info no longer dies when the server doesn't recognize
the object, printing "<oid> missing" like `info` does.
- In the 12th commit, explicitly cast to int and add a comment explaining
the backward iteration of the list.
- Renamed the 3rd commit and, in that commit, changed the signature of
dispatch_calls(), as it is only called with size_t values rather than ints.
- Because remote-object-info does not support short OIDs, added a check to
improve the error report when the OID passed is valid but not long
enough, or when it is an invalid OID.
- Fixed overly long lines.
- Reworded 4th commit.
- Avoid unnecessary request to the server when no placeholder is supported.
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
Calvin Wan (3):
fetch-pack: move fetch initialization
serve: advertise object-info feature
transport: add client support for object-info
Eric Ju (4):
git-compat-util: add strtoul_ul() with error handling
cat-file: declare loop counter inside for()
t1006: split test utility functions into new "lib-cat-file.sh"
cat-file: add remote-object-info to batch-command
Pablo Sabater (5):
transport-helper: fix memory leak of helper on disconnect
fetch-pack: move function to connect.c
connect: refactor packet writing
cat-file: validate remote atoms with allow_list
cat-file: make remote-object-info allow-list dynamic
Documentation/git-cat-file.adoc | 25 +-
Makefile | 1 +
builtin/cat-file.c | 221 ++++++++++-
connect.c | 34 ++
connect.h | 8 +
fetch-object-info.c | 106 +++++
fetch-object-info.h | 22 ++
fetch-pack.c | 51 +--
fetch-pack.h | 2 +
git-compat-util.h | 20 +
meson.build | 1 +
object-file.c | 10 +
odb.h | 3 +
serve.c | 5 +-
t/lib-cat-file.sh | 16 +
t/meson.build | 1 +
t/t1006-cat-file.sh | 13 +-
t/t1017-cat-file-remote-object-info.sh | 699 +++++++++++++++++++++++++++++++++
transport-helper.c | 13 +-
transport.c | 28 +-
transport.h | 11 +
21 files changed, 1215 insertions(+), 75 deletions(-)
---
base-commit: 4621f8ce5e9b97aa2e8d0d9ffe9d25df2471074d
change-id: 20260608-ps-eric-work-rebase-b73ae84ba671
Best regards,
--
Pablo Sabater <pabloosabaterr@gmail.com>
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Taylor Blau @ 2026-06-19 14:52 UTC (permalink / raw)
To: Derrick Stolee; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <131d7ad3-7791-4d6f-bdf3-afa6b0831a71@gmail.com>
On Fri, Jun 19, 2026 at 10:40:51AM -0400, Derrick Stolee wrote:
> > [...]
> > , which gives us:
> >
> > Test HEAD^ HEAD
> > ----------------------------------------------------------------------------------------
> > 5311.3: size of bitmapped pack 278.8M 278.8M -0.0%
> > 5311.38: size of bitmapped pack (--path-walk) 278.7M 278.7M +0.0%
> >
> > (eliding other tests). I considered whether there are other interesting
> > tests, but I think "repack" is the right layer to run perf tests, since
> > you're always writing a closed pack. We could try different subsets of
> > the repository's objects (which would also have to be closed), but I
> > don't think this is that interesting.
>
> This sort of thing does help to show that we're getting different
> behavior when repacking with and without --path-walk. And this test
> is showing the slightest change for git.git, but is likely more
> impactful for the other repos I've used to demonstrate the benefits.
>
> So this is the kind of data I'm hoping to see, but also with data
> from other repos whose data shapes benefit from --path-walk more
> than git.git and repos where name-hash v1 is sufficient to give a
> similar result.
I'm glad this is the sort of data you're looking for. I'm happy to run
this on other repositories.
> I'd also like to see if the repack _time_ changes with this, but
> these direct size comparisons are the biggest indicator I'd like to
> see.
Unfortunately a timing comparison is kind of a pain here. We'd have to
use test_perf, which will perform the same repack multiple times. We
could do that, though it's wasteful, and changes like bf4a60874af
(p5326: generate pack bitmaps before writing the MIDX bitmap,
2021-09-17) move us in the opposite direction.
I'm not opposed to changing this to test_perf if you feel strongly about
it.
Thanks,
Taylor
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Taylor Blau @ 2026-06-19 14:46 UTC (permalink / raw)
To: Derrick Stolee; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <ec45260a-1d4e-49d1-9aa8-9ec94ecd9b23@gmail.com>
On Fri, Jun 19, 2026 at 10:36:54AM -0400, Derrick Stolee wrote:
> On 6/19/2026 10:16 AM, Taylor Blau wrote:
> > On Fri, Jun 12, 2026 at 09:03:41AM -0400, Derrick Stolee wrote:
> >> On 6/2/2026 6:21 PM, Taylor Blau wrote:
> >>
> >>> As a result, we can see significantly reduced pack sizes from p5311
> >>> before this commit:
> >>
> >> I mentioned this before, but the pack _sizes_ aren't changing in this
> >> example. We are computing them more quickly, though.
> >
> > Thanks for pointing this out. The paragraph following the perf output
> > below correctly explains the results ("We get the same size of output
> > pack, but [...]"), but this one is obviously wrong.
> >
> >> Since we are testing --path-walk on both sides, the change across this
> >> commit is that we are using the bitmaps for the "counting objects" phase
> >> and then potentially using the --path-walk algorithm to construct the
> >> packfile.
> >
> > I'm not sure I agree here. Because we are using bitmaps, we're relying
> > on pack-reuse to construct the output pack, not --path-walk. I mentioned
> > in git-pack-objects(1), but the combination of seeing "--path-walk" and
> > "--use-bitmap-index" together only means that we will use a path-walk
> > traversal as fallback if we can't get an answer by relying on bitmaps.
>
> I guess my thought was that we'd construct bitmaps when they are
> available, but how do we walk objects to get the objects for commits
> that are not represented by bitmaps?
Good question, and we use the existing bitmap traversal (or the
boundary-based one, if enabled). In that case we really want something
that is topological and not path-based, so we can terminate the walk as
soon as we run into an existing set bit, or something on the negated
side of the query.
> But you make a good point: we don't need to do that for functional
> use: the bitmap code does an object walk to produce a bitmap, and it's
> all in a layer "below" the pack-objects code.
>
> So essentially, this _isn't_ a combined approach: it's "use bitmaps if
> we can, and fall back to --path-walk if we can't" which is changing
> from our previous behavior of "--path-walk means we don't try to use
> bitmaps".
Exactly!
Thanks,
Taylor
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Derrick Stolee @ 2026-06-19 14:40 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <ajVSHvL+On9AEV+g@nand.local>
On 6/19/2026 10:28 AM, Taylor Blau wrote:
> On Fri, Jun 12, 2026 at 09:24:32AM -0400, Derrick Stolee wrote:
>> On 6/2/2026 6:21 PM, Taylor Blau wrote:
>>> When 'pack-objects' is invoked with '--path-walk', it prevents us from
>>> using reachability bitmaps.
>>
>> My earlier response focused on the _use_ of bitmaps when creating a
>> packfile, but your patch also enables _writing_ bitmaps with the
>> --path-walk option, which is significant and potentially more
>> interesting from my perspective: we have evidence that --path-walk
>> can produce significantly smaller packfiles than the standard
>> algorithm, and once those packfiles are created we can benefit from
>> that size in later packfile creation steps by reusing those deltas.
>
> I am perhaps splitting hairs here, but I would frame the use of bitmaps
> when reading with "--path-walk" as "either/or" not "both/and". The main
> goal of this patch is to enable us to still generate bitmaps when
> *writing* a pack with "--path-walk".
Yes. I was confused but your response to the earlier thread made this
more clear. I'm no longer confused.
>> Even more important here is that we have demonstrated examples of repos
>> that change their packfile size when using the --path-walk method. We
>> should demonstrate that the size continues to shrink with --path-walk
>> even when producing a matching .bitmap file with --write-bitmap-index.
>
> That's fair. One way to do this would be to:
>
> --- 8< ---
> diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
> index 1b115d921a1..c1aed3e2aef 100755
> --- a/t/perf/p5311-pack-bitmaps-fetch.sh
> +++ b/t/perf/p5311-pack-bitmaps-fetch.sh
> @@ -18,6 +18,10 @@ test_fetch_bitmaps () {
> git repack -ad $argv
> '
>
> + test_size "size of bitmapped pack ${argv:+($argv)}" '
> + test_file_size .git/objects/pack/pack-*.pack
> + '
> +
> # simulate a fetch from a repository that last fetched N days ago, for
> # various values of N. We do so by following the first-parent chain,
> # and assume the first entry in the chain that is N days older than the current
> --- >8 ---
>
> , which gives us:
>
> Test HEAD^ HEAD
> ----------------------------------------------------------------------------------------
> 5311.3: size of bitmapped pack 278.8M 278.8M -0.0%
> 5311.38: size of bitmapped pack (--path-walk) 278.7M 278.7M +0.0%
>
> (eliding other tests). I considered whether there are other interesting
> tests, but I think "repack" is the right layer to run perf tests, since
> you're always writing a closed pack. We could try different subsets of
> the repository's objects (which would also have to be closed), but I
> don't think this is that interesting.
This sort of thing does help to show that we're getting different
behavior when repacking with and without --path-walk. And this test
is showing the slightest change for git.git, but is likely more
impactful for the other repos I've used to demonstrate the benefits.
So this is the kind of data I'm hoping to see, but also with data
from other repos whose data shapes benefit from --path-walk more
than git.git and repos where name-hash v1 is sufficient to give a
similar result.
I'd also like to see if the repack _time_ changes with this, but
these direct size comparisons are the biggest indicator I'd like to
see.
>> The other thing that I notice here is that the bitmaps will need to
>> compute their reachable object set independently from the path-walk
>> algorithm. But I suppose that already happens separately from the
>> revision-walk approach that normally produces the packfile contents.
>
> Right. The only wrinkle here is how we handle the internal traversal's
> "--boundary" option, but see the last paragraph in the commit message
> for details on why the proposed approach is OK.
>
>> From my perspective, the point of integrating these two things are:
>>
>> 1. Reachability bitmaps make it much faster to discover the reachable
>> set and reuse bits of existing packfiles. (Your performance table
>> demonstrates this is true.)
>>
>> 2. The --path-walk option can shrink packfile sizes by grouping
>> trees and blobs by path before those paths collide in the name-hash
>> sort. (I haven't seen evidence that this is happening.)
>>
>> With evidence of (1) and not (2), it's not clear from the data that
>> these features are integrating completely. Without looking at the
>> code, those numbers would be the same if we had instead swapped the
>> preference of "the --path-walk option disables bitmaps" to "bitmaps
>> disable --path-walk".
>
> Let me know if modifying the perf test as above (and including the
> relevant results in the commit message) would be sufficient in
> addressing your concern.
Yes, the perf test modification and data reporting is the only
missing thing at this point. You've helped me better understand the
"integration" between the features during fetches and clones.
Thanks,
-Stolee
* Re: [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Derrick Stolee @ 2026-06-19 14:36 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <ajVPJGXuhugDcT+A@nand.local>
On 6/19/2026 10:16 AM, Taylor Blau wrote:
> On Fri, Jun 12, 2026 at 09:03:41AM -0400, Derrick Stolee wrote:
>> On 6/2/2026 6:21 PM, Taylor Blau wrote:
>>
>>> As a result, we can see significantly reduced pack sizes from p5311
>>> before this commit:
>>
>> I mentioned this before, but the pack _sizes_ aren't changing in this
>> example. We are computing them more quickly, though.
>
> Thanks for pointing this out. The paragraph following the perf output
> below correctly explains the results ("We get the same size of output
> pack, but [...]"), but this one is obviously wrong.
>
>> Since we are testing --path-walk on both sides, the change across this
>> commit is that we are using the bitmaps for the "counting objects" phase
>> and then potentially using the --path-walk algorithm to construct the
>> packfile.
>
> I'm not sure I agree here. Because we are using bitmaps, we're relying
> on pack-reuse to construct the output pack, not --path-walk. I mentioned
> in git-pack-objects(1), but the combination of seeing "--path-walk" and
> "--use-bitmap-index" together only means that we will use a path-walk
> traversal as fallback if we can't get an answer by relying on bitmaps.
I guess my thought was that we'd construct bitmaps when they are
available, but how do we walk objects to get the objects for commits
that are not represented by bitmaps?
But you make a good point: we don't need to do that for functional
use: the bitmap code does an object walk to produce a bitmap, and it's
all in a layer "below" the pack-objects code.
So essentially, this _isn't_ a combined approach: it's "use bitmaps if
we can, and fall back to --path-walk if we can't" which is changing
from our previous behavior of "--path-walk means we don't try to use
bitmaps".
Thanks,
-Stolee