* Re: [GSoC Patch v5 2/4] rev-parse: use append_formatted_path() for path formatting
From: Phillip Wood @ 2026-06-16 18:26 UTC (permalink / raw)
To: K Jayatheerth
Cc: git, jltobler, lucasseikioshiro, gitster, phillip.wood, sandals,
kumarayushjha123, a3205153416
In-Reply-To: <CA+rGoLfhhRNrSReeJ1grhy+2K3BSrikTCNgGpCaGqc4fFp3Lfg@mail.gmail.com>
On 16/06/2026 18:04, K Jayatheerth wrote:
> Hi Phillip,
> Thanks for taking a look!
>
>> On 16/06/2026 05:49, K Jayatheerth wrote:
>>> The path-formatting logic in builtin/rev-parse.c is tightly coupled
>>> to that command and writes directly to stdout, making it impossible
>>> for other builtins to reuse.
>>>
>>> Extract the core algorithm into append_formatted_path() in path.c
>>> and expose a path_format enum in path.h so that any builtin can
>>> format paths consistently without duplicating logic.
>>
>> Sorry I haven't had time to look at this series recently, it is looking
>> much nicer now that we have a single enum. It would be helpful to
>> explain why we need PATH_FORMAT_DEFAULT that acts exactly like
>> PATH_FORMAT_UNMODIFIED. Looking at the next patch it seems this is still
>> a wart in the api due to rev-parse wanting needing to distinguish the
>> unmodified case from the default case.
> t);
>>> +
>>> # ifdef USE_THE_REPOSITORY_VARIABLE
>>> # include "strbuf.h"
>>> # include "repository.h"
>>
>
>
>>> int cmd_rev_parse(int argc,
>>> @@ -717,7 +661,7 @@ int cmd_rev_parse(int argc,
>>> const char *name = NULL;
>>> struct strbuf buf = STRBUF_INIT;
>>> int seen_end_of_options = 0;
>>> - enum format_type format = FORMAT_DEFAULT;
>>> + enum path_format arg_path_format = PATH_FORMAT_DEFAULT;
>>
>> This is the source of the api wart I referred to in the previous patch.
>> Could we keep the existing enums and convert them into the appropriate
>> PATH_FORMAT_* flag in print_path() above? I think we already have the
>> logic to do that in the existing code. That would mean that other users
>> of append_formatted_path() don't have to worry about the extra flag.
>>
>
> That is a much more elegant solution than the current one.
>
> For v6, I will clean this up by keeping the fallback logic
> localized within builtin/rev-parse.c and removing
> PATH_FORMAT_DEFAULT entirely from enum path_format in path.h.
>
> Instead, I'll re-introduce a small local enum (e.g., enum
> rev_parse_format) inside rev-parse.c to handle the
> command-line parsing state (tracking whether the user
> explicitly provided a flag or if we are still in a
> neutral/default state).
I think it is probably simplest to keep the existing enums and modify
print_path() to convert them to the appropriate PATH_FORMAT_*. That way
we can keep the option parsing code as is.
Thanks
Phillip
> As you said, most of the logic is already present. In
> print_path(), we will check that local tracking enum. If it’s
> set to the local default, we can map it directly to the
> path-specific def_format before invoking append_formatted_path().
> This ensures other users of the function don't have to worry
> about the extra flag.
>
> I will send out the v6 series with these fixes shortly.
>
> Regards,
> - K Jayatheerth
^ permalink raw reply
* Re: [PATCH v14 6/6] branch: add --dry-run for --prune-merged
From: Harald Nordgren @ 2026-06-16 18:28 UTC (permalink / raw)
To: Phillip Wood
Cc: Harald Nordgren via GitGitGadget, git, Kristoffer Haugsbakk,
Johannes Sixt
In-Reply-To: <7b43a0f1-32a0-40f0-8c82-d2ee78809cc2@gmail.com>
> > With --dry-run, --prune-merged prints the local branches it would
> > delete, one "Would delete branch <name>" line each, and exits
> > without touching any ref. The same filtering applies, so the output
> > is exactly the set that the real run would delete.
>
> I can see this being very useful.
Great to hear and thanks for taking the time to review this! Much appreciated!
> > static int prune_merged_branches(int argc, const char **argv,
> > - int quiet)
> > + int quiet, int dry_run)
>
> Let's not start adding multiple boolean augments - use a flags argument
> like we do for delete_branches() - if you get feedback on one patch you
> should think about whether it applies later in the series as well. The
> rest of the implementation looks good.
I'm trying to generalize all feedback, but sometimes I miss things.
Thanks for pointing it out!
Harald
^ permalink raw reply
* Re: [PATCH v14 4/6] branch: add --prune-merged <branch>
From: Harald Nordgren @ 2026-06-16 19:15 UTC (permalink / raw)
To: Phillip Wood
Cc: Harald Nordgren via GitGitGadget, git, Kristoffer Haugsbakk,
Johannes Sixt
In-Reply-To: <78b6dfdd-df61-4c44-96eb-b527cb26243c@gmail.com>
> > diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> > index 4e7deddc04..27ea1319bb 100755
> > --- a/t/t3200-branch.sh
> > +++ b/t/t3200-branch.sh
> > @@ -1809,4 +1809,205 @@ test_expect_success '--forked requires a value' '
> > test_grep "requires a value" err
> > '
> >
> > +test_expect_success '--prune-merged: setup' '
> > + test_create_repo pm-upstream &&
>
> The rest of this test would be easier to read if we did
>
> (
> cd pm-upstream &&
> ...
> )
>
> rather than prefixing every command with "-C pm-upstream"
I feel like the discussion to nest or not to nest has come up many
times in other topics as well. I don't feel strongly about either way,
but I just want to flag that if I change it now, another reviewer
might ask me to change it back later.
Should the rules be to nest inside of setup functions (and helpers?)
but not inside the actual tests?
> > + test_commit -C pm-upstream base &&
> > + git -C pm-upstream checkout -b next &&
> > + test_commit -C pm-upstream one-commit &&
> > + test_commit -C pm-upstream two-commit &&
> > + git -C pm-upstream branch one HEAD~ &&
> > + git -C pm-upstream branch two HEAD &&
> > + git -C pm-upstream branch wip main &&
> > + git -C pm-upstream checkout main &&
> > + test_create_repo pm-fork
> > +'
> > +
> > +test_expect_success '--prune-merged deletes branches integrated into upstream' '
> > + test_when_finished "rm -rf pm-merged" &&
> > + git clone pm-upstream pm-merged &&
> > + git -C pm-merged remote add fork ../pm-fork &&
> > + test_config -C pm-merged remote.pushDefault fork &&
> > + test_config -C pm-merged push.default current &&
>
> So we clone upstream and add fork as the default push remote. I find the
> pm- prefixes rather distracting. It would be clearer to me if we just
> called the repositories "upstream", "fork" and "repo"
Good point.
> > + test_must_fail git -C pm-local rev-parse --verify refs/heads/one
> > +'
> > +
> > +test_expect_success '--prune-merged warns instead of erroring on un-integrated commits' '
> > + test_when_finished "rm -rf pm-unmerged" &&
> > + git clone pm-upstream pm-unmerged &&
> > + git -C pm-unmerged remote add fork ../pm-fork &&
> > + test_config -C pm-unmerged remote.pushDefault fork &&
> > + test_config -C pm-unmerged push.default current &&
> > + git -C pm-unmerged checkout -b wip origin/wip &&
> > + git -C pm-unmerged branch --set-upstream-to=origin/next wip &&
> > + test_commit -C pm-unmerged local-only &&
> > + git -C pm-unmerged checkout - &&
> > +
> > + git -C pm-unmerged branch --prune-merged "origin/*" 2>err &&
> > + test_grep "not fully merged" err &&
> > + test_grep ! "If you are sure you want to delete it" err &&
>
> I'm always suspicious of test_grep when we know what the output should
> look like - it might be better to use test_cmp. This test does not check
> that we also delete branches that are merged when we see one that isn't.
>
> I'm going to stop here - the tests I've read seem to me to be too much
> like unit tests checking one aspect of the implementation in isolation
> rather than checking that the whole feature works as expected.
I'll respond to the rest here: Excellent points regarding the testing
aboce, I will take a look at doing this.
Harald
^ permalink raw reply
* Assisted-by tag
From: Marius Spix @ 2026-06-16 19:25 UTC (permalink / raw)
To: git
Hi there,
as the Linux kernel requires the new Assisted-by tag for AI-assisted
commits, I was researching how git handles such tags. Thereby I
observed the following behaviour:
git commit --signoff
* adds an empty line before the Signed-off-by tag
* ignores the Signed-off-by tag by checking for an empty commit message
git commit --trailer "\nAssisted-by: OpenAI"
* does not add an empty line (the "\n" is not converted to a newline)
* does not ignore the tag by checking for empty commit message
Since there will be more and more AI-assisted commits in projects like
the Linux kernel in the future, this should be taken in account.
When merging or squashing commits, that tag should also be
automatically applied to the new commit message to make it clear that
the commit is tainted by AI.
Your opinion?
Best regards
Marius
^ permalink raw reply
* Re: Assisted-by tag
From: Kristoffer Haugsbakk @ 2026-06-16 19:53 UTC (permalink / raw)
To: Marius Spix, git
In-Reply-To: <20260616212553.31ddea83@rockhopper>
On Tue, Jun 16, 2026, at 21:25, Marius Spix wrote:
> as the Linux kernel requires the new Assisted-by tag for AI-assisted
> commits, I was researching how git handles such tags. Thereby I
> observed the following behaviour:
>
> git commit --signoff
> * adds an empty line before the Signed-off-by tag
> * ignores the Signed-off-by tag by checking for an empty commit message
That a bare message which is just `Signed-off-by` is considered an
“empty” commit message seems like a historical quirk. It checks
specifically for that tag/trailer.
>
> git commit --trailer "\nAssisted-by: OpenAI"
> * does not add an empty line (the "\n" is not converted to a newline)
> * does not ignore the tag by checking for empty commit message
This is just a regular trailer. I don’t know why you have a `\n`.
git commit --trailer "Assisted-by: OpenAI"
Any number of these will populate the trailer block.
> Since there will be more and more AI-assisted commits in projects like
> the Linux kernel in the future, this should be taken in account.
>
> When merging or squashing commits, that tag should also be
> automatically applied to the new commit message to make it clear that
> the commit is tainted by AI.
That `Signed-off-by` has dedicated options and logic is historical
baggage at this point.
A 2013 [patch] to add `git commit --fixes` because the Linux Kernel uses
`Fixes` was rejected because the Git project considers tags/trailers
project-specific. Instead git-interpret-trailers(1) was born which
eventually provided the code base for `git commit --trailer` and similar
options.
[patch]: https://lore.kernel.org/all/20131027013402.GA7146@leaf/
Handling how trailers are added is also deemed project-specific. There
are only the configurations and options that git-interpret-trailers(1)
provides. And handling that commits are rewritten (combined and so on)
in such a way that trailer-taint sticks is beyond any discussion I’ve
seen on the subject.
^ permalink raw reply
* Re: SHA-1/SHA-256 interoperability work is functional
From: Junio C Hamano @ 2026-06-16 20:01 UTC (permalink / raw)
To: brian m. carlson; +Cc: git
In-Reply-To: <ajCWBG9RHBrm8jMZ@fruit.crustytoothpaste.net>
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> I'm pleased to announce that I have Git fully passing the testsuite and
> CI in interoperability mode, both with SHA-256 and SHA-1 as the main
> algorithm. While this is very exciting, the work is not ready to send
> to the list and is effectively a draft, since there is still cleanup
> and efficiency work to be done.
Great to hear about a great milestone.
> Features which are currently unsupported (and which may or may not be
> supported in the future):
>
> * Filtered bundles are unsupported because there is currently no way to provide
> a mapping.
> * Multi-pack index cannot be used as the sole pack index format because it does
> not yet provide mappings.
> * Pack index v1 and v2 cannot be used because they do not provide object
> mappings. Git automatically uses pack index v3 instead when necessary, which
> does handle mappings.
> * Packfile URIs are not supported because the protocol-provided packfile is not
> complete and its objects cannot be mapped.
Not that I specifically care about packfile URI, but this one is
curious. How would regular "fetch" and "push" traffic work under
the new world order? Presumably we will keep one characteristic of
the protocol, that the packdata stream is the only thing that is
given to the other side and no object names are given, because the
receiving end would not want to blindly trust the object name the
sending end _claims_ to have sent and instead recomputes the object
name out of the packed objects in the data stream ("if we rehash
and recompute the object names from the datastream, the other side
cannot lie to us" IIRC was a security measure).
For a regular "fetch" and "push" to work, we would need to recompute
the native object names and also somehow compute the compatibility
object names if we are in interoperability mode, no?
If we download *.pack files from a packfile URI, wouldn't it be the
same story?
> * Large object promisors cannot be used if the server does not actually have
> the entire history, since the server must have a complete history in order to
> provide object mappings.
Again, this one worries me a bit, but perhaps I am not reading it
correctly. Does this mean that the server side says "this is the
data for object whose name is X in the SHA-1 world, which translates
to X256 in the SHA-256 world", the receiving end blindly trusts
without having a way to verify?
> There is new documentation in `Documentation/gitformat-hash.adoc` that
> outlines the requirements for using the protocol. The protocol
> restrictions described there are hard technical limitations that cannot
> be avoided; I've intentionally made things as featureful as they can be.
> This imposes real restrictions on using protocol interoperability with many
> projects, including Git and Linux[0]. Interested parties may wish to look
> at t1017 to see what's tested vis-à-vis the protocol and
> interoperability.
> ...
> If you're interested in testing or perusing the work, you may get it
> from the `sha256-interop` branch of https://github.com/bk2204/git.git.
> Please note that it may be rebased, rewound, or otherwise folded,
> spindled, or mutilated at any time.
Sounds exciting.
Thanks.
^ permalink raw reply
* Re: [PATCH 2/6] SubmittingPatches: discuss non-ident trailers
From: Kristoffer Haugsbakk @ 2026-06-16 20:02 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
In-Reply-To: <aivvE6gVMGWhRbCB@pks.im>
On Fri, Jun 12, 2026, at 13:35, Patrick Steinhardt wrote:
> On Thu, Jun 11, 2026 at 12:22:45AM +0200,
> kristofferhaugsbakk@fastmail.com wrote:
>> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
>> index 0b12badf86d..51c308a89a8 100644
>> --- a/Documentation/SubmittingPatches
>> +++ b/Documentation/SubmittingPatches
>> @@ -474,7 +474,10 @@ These are the common trailers in use:
>>
>> While you can also create your own trailer if the situation warrants it, we
>> encourage you to instead use one of the common trailers in this project
>> -highlighted above.
>> +highlighted above. A trailer that credits someone might be more likely
>> +to be accepted since these are the most common ones. But another kind of
>> +trailer might be relevant, for example to link to an issue tracker
>> +belonging to a downstream project that is affected by a bug in Git.
>
> Hm, I wonder whether this is a bit too vague to really be helpful for a
> newcomer. Instead of alluding to such trailers, wouldn't it be
> preferable if we added those as actual examples to the list of known
> trailers and then tell folks that they can invent their own ones if
> there is a good reason to do so?
Honestly there are so few non-ident trailers that I don’t think they can
be listed as common trailers:
1. The Git project doesn’t need them (e.g. no bug tracker)
2. They seem mostly for use by other projects (bug trackers again)
With this list:
git log --format='%(trailers:only,keyonly)' | sort | uniq
If you filter out the ident-looking ones:
grep -v --extended-regexp -- '-[Bb]y$'
There are few left. And some can be discarded:
• Change-Id
• Message-ID
• Fixes (pointing to a commit)
So to address your point:
1. Maybe this is so niche that it is not worth mentioning; or
2. Maybe give a concrete example like `Closes: <bug link>`?
^ permalink raw reply
* Re: [PATCH 4/6] SubmittingPatches: document Based-on-patch-by trailer
From: Kristoffer Haugsbakk @ 2026-06-16 20:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqse6tnho1.fsf@gitster.g>
On Thu, Jun 11, 2026, at 18:52, Junio C Hamano wrote:
> kristofferhaugsbakk@fastmail.com writes:
>
>> +. `Based-on-patch-by:` can be used when someone else authored parts of
>> + the patch that you are submitting. This might be relevant if someone
>> + sent a patch to the mailing list without a commit message or a
>> + `Signed-off-by:` and you have picked it up.
>
> Hmph, this seems to encourage pick up material that come outside of
> the usual DCO process, which should not be the intention of this
> document.
Oh, I have misread the room on this subject. It would be better to drop
the mention of signoff here.
>
> Unless the changes are trivial enough to not be copyrightable, it
> may be better to say "... if someone submitted a preliminary patch or
> a detailed code snippet with their sign-off", plus encourage asking
> the original author to sign-off if it initially came without, or
> something like that?
Okay, since they provided something concrete to copy (cf. Helped-by
where they did not provide precise changes in patch form, according to
the below context), it’s best to mention that signoff is relevant here.
>
>> . `Helped-by:` is used to credit someone who suggested ideas for
>> changes without providing the precise changes in patch form.
>> . `Mentored-by:` is used to credit someone with helping develop a
^ permalink raw reply
* Re: [PATCH 1/6] SubmittingPatches: encourage trailer use for substantial help
From: Kristoffer Haugsbakk @ 2026-06-16 20:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq1pedowl2.fsf@gitster.g>
On Thu, Jun 11, 2026, at 18:44, Junio C Hamano wrote:
> kristofferhaugsbakk@fastmail.com writes:
>
>> Let’s replace “If you like” with outright encouragment in this section
>
> "encouragement"?
Yep.
>
>> At the same, it is important to temper this recommendation to a sign-
>> ificant enough contribution; in my experience beginners can be eager
>
> "At the same time"?
Yep.
>
> It is a bit unusual to see a long word split at the end of a line
> to line-wrap in our documentation and commit log messages.
A bit unusual is an understatement. I cannot find any other commit log
message writers that have split a word on a syllable. All linebreaks
that I’ve found are on existing hyphens. Like
... multi-pack-
indexes
I’ll avoid this in the future.
>
>> ---
>> Documentation/SubmittingPatches | 14 +++++++++++---
>> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> The patch text itself looks great. Thanks.
Thanks for going over these.
^ permalink raw reply
* Re: [PATCH 6/6] SubmittingPatches: note that trailer order matters
From: Kristoffer Haugsbakk @ 2026-06-16 20:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq8q8mt4eo.fsf@gitster.g>
On Thu, Jun 11, 2026, at 00:30, Junio C Hamano wrote:
>>[snip]
>> Only capitalize the very first letter of the trailer, i.e. favor
>> `Signed-off-by:` over `Signed-Off-By:` and `Acked-by:` over `Acked-By:`.
>>
>> +Note that these trailers should come before your `Signed-off-by:`
>> +trailer. You are signing off to the patch as well as the message. This
>> +also makes it clear who added trailers when multiple people have signed
>> +off on a patch.
>
> Perhaps first mention the underlying rule that they are added in the
> order that helps us to understand the chronological order of events.
> That would avoid giving a wrong impression that the nature of each
> trailer keys determine the order of these lines.
You’re right. It’s best to lead with the time-based order. That
naturally leads into the implication that we can easily read what
trailers that any given person added.
I’ve seen discussions in the past about whether trailers ought to be
sorted by *kind*, and maybe if first e.g. you should first have
Tested-by, then Reviewed-by, then Signed-off-by... best to not give such
an impression by accident.
Thanks.
^ permalink raw reply
* Re: [PATCH v3 02/11] doc: interpret-trailers: replace “lines” with “metadata”
From: Kristoffer Haugsbakk @ 2026-06-16 20:32 UTC (permalink / raw)
To: Matt Hunter, git; +Cc: Christian Couder, jackmanb, Linus Arver, D. Ben Knoble
In-Reply-To: <DJ5W2I8UYXAA.3O4JQUHFMKP5X@lfurio.us>
On Thu, Jun 11, 2026, at 05:10, Matt Hunter wrote:
> On Wed Jun 10, 2026 at 5:21 PM EDT, kristofferhaugsbakk wrote:
>>[snip]
>> DESCRIPTION
>> -----------
>> -Add or parse _trailer_ lines at the end of the otherwise
>> +Add or parse trailers metadata at the end of the otherwise
>
> fwiw, I think "trailer metadata" reads more naturally.
You’re right that your version reads more naturally. I went back and
forth on this.
1. We’re introducing the jargon, and the format is often discussed as
plural “trailers”, with its constituent parts being singular
“trailer”
2. What this replaces uses “trailer”, but it rescues the plural mood
with “lines”
3. This is very soon going to go into the constituent parts, including
each trailer, so we’re contrasting the concept name (trailers) with
its parts
But I think your version is overall better. It reads better and there is
no way to confuse “trailer metadata” (trailers as a collection) with
just a single “trailer”.
^ permalink raw reply
* Re: [PATCH GSoC RFC v12 09/12] transport: add client support for object-info
From: Junio C Hamano @ 2026-06-16 20:35 UTC (permalink / raw)
To: Pablo Sabater
Cc: eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519
In-Reply-To: <20260608-ps-eric-work-rebase-v12-9-5338b766e658@gmail.com>
Pablo Sabater <pabloosabaterr@gmail.com> writes:
[jc: removed recipients from Cc: list whose addresses bounce]
> From: Calvin Wan <calvinwan@google.com>
>
> Sometimes, it is beneficial to retrieve information about an object
> without downloading it entirely. The server-side logic for this
> functionality was implemented in commit "a2ba162cda (object-info:
> ...
> diff --git a/fetch-object-info.c b/fetch-object-info.c
> ...
> +int fetch_object_info(const enum protocol_version version, struct object_info_args *args,
> + struct packet_reader *reader, struct object_info *object_info_data,
> + const int stateless_rpc, const int fd_out)
> +{
> ...
> + for (size_t i = 0; packet_reader_read(reader) == PACKET_READ_NORMAL && i < args->oids->nr; i++) {
> + struct string_list object_info_values = STRING_LIST_INIT_DUP;
> +
> + string_list_split(&object_info_values, reader->line, " ", -1);
> + if (0 <= size_index) {
> + if (!strcmp(object_info_values.items[1 + size_index].string, ""))
> + die("object-info: server does not recognize object %s",
> + object_info_values.items[0].string);
> +
> + if (strtoul_ul(object_info_values.items[1 + size_index].string, 10, object_info_data[i].sizep))
Overly long lines need to be fixed, by using a shorter and crisper
variable name in such a short scope, and line wrapping if needed.
More importantly, on this line (wrapped):
if (strtoul_ul(object_info_values.items[1 + size_index].string,
10, object_info_data[i].sizep))
we notice object_info_data[i] is of type "struct object_info", which
is
struct object_info {
/* Request */
enum object_type *typep;
size_t *sizep;
off_t *disk_sizep;
...
but the last parameter strtoul_ul() takes is unsurprisingly a
pointer to "unsigned long", not a pointer to "size_t".
Which will break on 32-bit boxes where size_t is "unsigned int"
that is 32-bit and different from "unsigned long".
Perhaps something along this line?
diff --git a/fetch-object-info.c b/fetch-object-info.c
index 425929a269..5210e7d954 100644
--- a/fetch-object-info.c
+++ b/fetch-object-info.c
@@ -75,14 +75,17 @@ int fetch_object_info(const enum protocol_version version, struct object_info_ar
string_list_split(&object_info_values, reader->line, " ", -1);
if (0 <= size_index) {
+ unsigned long sz;
if (!strcmp(object_info_values.items[1 + size_index].string, ""))
die("object-info: server does not recognize object %s",
object_info_values.items[0].string);
- if (strtoul_ul(object_info_values.items[1 + size_index].string, 10, object_info_data[i].sizep))
+ if (strtoul_ul(object_info_values.items[1 + size_index].string,
+ 10, &sz))
die("object-info: ref %s has invalid size %s",
object_info_values.items[0].string,
object_info_values.items[1 + size_index].string);
+ *object_info_data[i].sizep = sz;
}
string_list_clear(&object_info_values, 0);
^ permalink raw reply related
* Re: [PATCH v2 01/17] packfile: rename `struct packfile_store` to `odb_source_packed`
From: Justin Tobler @ 2026-06-16 21:01 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <20260609-pks-odb-source-packed-v2-1-839089132c8b@pks.im>
On 26/06/09 10:50AM, Patrick Steinhardt wrote:
> Not too long ago, we have introduced the packfile store in b7983adb51
> (packfile: introduce a new `struct packfile_store`, 2025-09-23). This
> struct is responsible for managing all of our access to packfiles and is
> used as one of the two sources of objects for the "files" source.
>
> Back when I introduced this structure I didn't have the clear vision yet
> that it will eventually also turn into a proper object database source,
> and how exactly that infrastructure will look like. Now though it's
> becoming increasingly clear that it does make sense to treat it just the
> same as any of our other ODB sources.
We already have `struct odb_source_loose` which is used in `struct
odb_source_files` which is a proper ODB source. Since it seems accessing
packed objects is moving in the same direction, the renaming here makes
sense to me.
> The consequence is that the naming is now a bit out-of-date: it's just
> another source and will be turned into a proper `struct odb_source` over
> the next couple of commits, but it's not named accordingly.
>
> Rename the structure to `odb_source_packed` to align it with this goal
> and to bring it in line with the other sources we already have.
The rest of this patch is just trivial renames and looks good to me.
-Justin
^ permalink raw reply
* Re: [PATCH v2 04/17] odb/source-packed: store pointer to "files" instead of generic source
From: Justin Tobler @ 2026-06-16 21:14 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <20260609-pks-odb-source-packed-v2-4-839089132c8b@pks.im>
On 26/06/09 10:50AM, Patrick Steinhardt wrote:
> The `struct odb_source_packed` holds a pointer to its owning parent
> source. The way that Git is currently structured, this parent is always
> the "files" source. In subsequent commits we're going to detangle that
> so that the "packed" source doesn't have any owning parent source at
> all, which makes it usable as a completely standalone source.
Out of curiousity, `struct odb_source_loose` also stores a similar to
pointer to its owning parent. Is the plan to also eventually do the same
there?
> Detangling this mess is somewhat intricate though, and is made even more
> intricate because it's not always clear which kind of source one is
> holding at a specific point in time -- either the parent "files" source,
> or the child "packed" source.
>
> Make this relationship more explicit by storing a pointer to the "files"
> source instead of storing a pointer to a generic `struct odb_source`.
> This will help make subsequent steps a bit clearer.
>
> Note that this is a temporary step, only. At the end of this series
> we will have dropped the parent pointer completely.
Ok, so IIUC the eventual goal is to get rid of the pointer entirely, but
for now we are just making its concrete type explicit without having to
downcast. It's not immediately obvious to me how this step gets us
closer to that goal, but that may become more obvious in the next
patches. :)
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> odb/source-files.c | 2 +-
> odb/source-packed.c | 4 ++--
> odb/source-packed.h | 4 ++--
> packfile.c | 12 ++++++------
> 4 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/odb/source-files.c b/odb/source-files.c
> index 191562f316..e04525fb08 100644
> --- a/odb/source-files.c
> +++ b/odb/source-files.c
> @@ -269,7 +269,7 @@ struct odb_source_files *odb_source_files_new(struct object_database *odb,
> CALLOC_ARRAY(files, 1);
> odb_source_init(&files->base, odb, ODB_SOURCE_FILES, path, local);
> files->loose = odb_source_loose_new(odb, path, local);
> - files->packed = odb_source_packed_new(&files->base);
> + files->packed = odb_source_packed_new(files);
>
> files->base.free = odb_source_files_free;
> files->base.close = odb_source_files_close;
> diff --git a/odb/source-packed.c b/odb/source-packed.c
> index 1e94b47ea0..12e785be48 100644
> --- a/odb/source-packed.c
> +++ b/odb/source-packed.c
> @@ -1,11 +1,11 @@
> #include "git-compat-util.h"
> #include "odb/source-packed.h"
>
> -struct odb_source_packed *odb_source_packed_new(struct odb_source *source)
> +struct odb_source_packed *odb_source_packed_new(struct odb_source_files *parent)
> {
> struct odb_source_packed *store;
> CALLOC_ARRAY(store, 1);
> - store->source = source;
> + store->files = parent;
> strmap_init(&store->packs_by_path);
> return store;
> }
> diff --git a/odb/source-packed.h b/odb/source-packed.h
> index 327be4ad65..3c2d229a17 100644
> --- a/odb/source-packed.h
> +++ b/odb/source-packed.h
> @@ -9,7 +9,7 @@
> * A store that manages packfiles for a given object database.
> */
> struct odb_source_packed {
> - struct odb_source *source;
> + struct odb_source_files *files;
The rest of this patch updates the callsites accordingly and looks
correct.
-Justin
^ permalink raw reply
* Re: [PATCH v2 2/7] patch-delta: use size_t for sizes
From: Junio C Hamano @ 2026-06-16 21:26 UTC (permalink / raw)
To: Johannes Schindelin via GitGitGadget
Cc: git, Kristofer Karlsson, Patrick Steinhardt, Johannes Schindelin
In-Reply-To: <66a642c39e7755755fe388af7612ac8c9bf41a5a.1781524349.git.gitgitgadget@gmail.com>
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> Widen `patch_delta()`'s three size parameters to `size_t` and switch
> its internal use of `get_delta_hdr_size()` to the `_sz` variant.
> Then propagate the wider type through the callers.
Makes sense.
^ permalink raw reply
* Re: [PATCH v2 04/17] odb/source-packed: store pointer to "files" instead of generic source
From: Justin Tobler @ 2026-06-16 21:28 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <ajG69JZHx_u2mt7q@denethor>
On 26/06/16 04:14PM, Justin Tobler wrote:
> On 26/06/09 10:50AM, Patrick Steinhardt wrote:
> > The `struct odb_source_packed` holds a pointer to its owning parent
> > source. The way that Git is currently structured, this parent is always
> > the "files" source. In subsequent commits we're going to detangle that
> > so that the "packed" source doesn't have any owning parent source at
> > all, which makes it usable as a completely standalone source.
>
> Out of curiousity, `struct odb_source_loose` also stores a similar to
> pointer to its owning parent. Is the plan to also eventually do the same
> there?
Ok, I think I got myself mixed up. IIUC the `struct odb_source` pointer
currently stored in `struct odb_source_loose` is not to the parent
source, but to its "base". And the eventual goal here is to move towards
that same direction which makes sense.
-Justin
^ permalink raw reply
* Re: [PATCH GSoC RFC v12 09/12] transport: add client support for object-info
From: Junio C Hamano @ 2026-06-16 21:31 UTC (permalink / raw)
To: Pablo Sabater
Cc: eric.peijian, chriscool, git, jltobler, karthik.188, toon,
chandrapratap3519
In-Reply-To: <xmqq1pe62pgo.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>
> [jc: removed recipients from Cc: list whose addresses bounce]
>
>> From: Calvin Wan <calvinwan@google.com>
>>
>> Sometimes, it is beneficial to retrieve information about an object
>> without downloading it entirely. The server-side logic for this
>> functionality was implemented in commit "a2ba162cda (object-info:
>> ...
>> diff --git a/fetch-object-info.c b/fetch-object-info.c
>> ...
>> +int fetch_object_info(const enum protocol_version version, struct object_info_args *args,
>> + struct packet_reader *reader, struct object_info *object_info_data,
>> + const int stateless_rpc, const int fd_out)
>> +{
>> ...
>> + for (size_t i = 0; packet_reader_read(reader) == PACKET_READ_NORMAL && i < args->oids->nr; i++) {
>> + struct string_list object_info_values = STRING_LIST_INIT_DUP;
>> +
>> + string_list_split(&object_info_values, reader->line, " ", -1);
>> + if (0 <= size_index) {
>> + if (!strcmp(object_info_values.items[1 + size_index].string, ""))
>> + die("object-info: server does not recognize object %s",
>> + object_info_values.items[0].string);
>> +
>> + if (strtoul_ul(object_info_values.items[1 + size_index].string, 10, object_info_data[i].sizep))
>
>
> Overly long lines need to be fixed, by using a shorter and crisper
> variable name in such a short scope, and line wrapping if needed.
>
> More importantly, on this line (wrapped):
>
> if (strtoul_ul(object_info_values.items[1 + size_index].string,
> 10, object_info_data[i].sizep))
>
> we notice object_info_data[i] is of type "struct object_info", which
> is
>
> struct object_info {
> /* Request */
> enum object_type *typep;
> size_t *sizep;
> off_t *disk_sizep;
> ...
>
> but the last parameter strtoul_ul() takes is unsurprisingly a
> pointer to "unsigned long", not a pointer to "size_t".
>
> Which will break on 32-bit boxes where size_t is "unsigned int"
> that is 32-bit and different from "unsigned long".
>
> Perhaps something along this line?
Not quite. This "size_t *sizep" has been very recently introduced
by Dscho in a topic that is in-flight.
The ps/cat-file-remote-object-info topic alone does not have this
type-mismatch problem. Below needs to be addressed as an evil merge
at the integration side, so you have nothing to do. I'll have to
tweak the merges.
Sorry for a false alarm.
> diff --git a/fetch-object-info.c b/fetch-object-info.c
> index 425929a269..5210e7d954 100644
> --- a/fetch-object-info.c
> +++ b/fetch-object-info.c
> @@ -75,14 +75,17 @@ int fetch_object_info(const enum protocol_version version, struct object_info_ar
>
> string_list_split(&object_info_values, reader->line, " ", -1);
> if (0 <= size_index) {
> + unsigned long sz;
> if (!strcmp(object_info_values.items[1 + size_index].string, ""))
> die("object-info: server does not recognize object %s",
> object_info_values.items[0].string);
>
> - if (strtoul_ul(object_info_values.items[1 + size_index].string, 10, object_info_data[i].sizep))
> + if (strtoul_ul(object_info_values.items[1 + size_index].string,
> + 10, &sz))
> die("object-info: ref %s has invalid size %s",
> object_info_values.items[0].string,
> object_info_values.items[1 + size_index].string);
> + *object_info_data[i].sizep = sz;
> }
>
> string_list_clear(&object_info_values, 0);
^ permalink raw reply
* Re: SHA-1/SHA-256 interoperability work is functional
From: brian m. carlson @ 2026-06-16 21:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqldce2r1a.fsf@gitster.g>
[-- Attachment #1: Type: text/plain, Size: 6352 bytes --]
On 2026-06-16 at 20:01:37, Junio C Hamano wrote:
> Not that I specifically care about packfile URI, but this one is
> curious. How would regular "fetch" and "push" traffic work under
> the new world order? Presumably we will keep one characteristic of
> the protocol, that the packdata stream is the only thing that is
> given to the other side and no object names are given, because the
> receiving end would not want to blindly trust the object name the
> sending end _claims_ to have sent and instead recomputes the object
> name out of the packed objects in the data stream ("if we rehash
> and recompute the object names from the datastream, the other side
> cannot lie to us" IIRC was a security measure).
>
> For a regular "fetch" and "push" to work, we would need to recompute
> the native object names and also somehow compute the compatibility
> object names if we are in interoperability mode, no?
>
> If we download *.pack files from a packfile URI, wouldn't it be the
> same story?
Let me explain how conversion works. Say you have an empty local
repository with SHA-256 as the main algorithm and SHA-1 as the
compatibility algorithm, plus a SHA-1 remote. When you do `git fetch`,
you get a SHA-1 pack. You cannot write this into the repository because
your repository doesn't use SHA-1 as the main algorithm, so `git
index-pack` takes the pack and maps any objects. If the objects are in
the new pack, they get rewritten based on the dependencies; otherwise,
Git uses the existing maps in the repository to rewrite the objects.
`git index-pack` then writes a completely new SHA-256 pack with an index
containing the SHA-1 mapping, using the corresponding deltas[0].
However, `git index-pack` can only index and map objects for one pack at
a time. We therefore need any pack that we get to be connected to our
existing history so that we can rely on our existing maps to remap
objects that are not in the pack. For instance, if we get a commit
without its parents, then we'll simply die because those objects cannot
be mapped and we can't write the mapping in the index.
The problem is that that packfile URIs result in multiple packs (one of
which is the dynamically generated protocol pack) that, _in total_,
provide a complete history with what we have, but are not necessarily
individually connected to our existing history. Moreover, the
dynamically generated protocol pack is sent _first_, so if we have
packfile URIs, that pack is almost certainly guaranteed _not_ to connect
to our existing history. We would therefore have to pause index-pack,
download all the packfile URIs, index those packfiles (which would have
to be connected to our history), and then unpause index-pack to rewrite
the history. This is not impossible, but it's tricky, and it has yet to
be implemented. Someone may decide that this is a valuable feature and
implement it, but it's not on my to-do list.
This doesn't pose a problem with single-algorithm repositories because
if you have unreferenced and unconnected objects, no big deal. They
just don't get used and will eventually get GC'd. But since we can't
map those objects in a multi-algorithm world, that's fatal and those
packs can't be indexed.
> > * Large object promisors cannot be used if the server does not actually have
> > the entire history, since the server must have a complete history in order to
> > provide object mappings.
>
> Again, this one worries me a bit, but perhaps I am not reading it
> correctly. Does this mean that the server side says "this is the
> data for object whose name is X in the SHA-1 world, which translates
> to X256 in the SHA-256 world", the receiving end blindly trusts
> without having a way to verify?
The server provides algorithm mappings for for submodules, shallow
clones, and partial clones. For shallow clones and partial clones, you
have to trust the server anyway because you're already getting a
truncated history. If you complete the history by fetching the missing
objects and run `git fsck`, then it will detect if the mapping is
invalid because the server was dishonest and complain. You will have a
corrupt set of mappings to the compatibility algorithm, but those could
theoretically be repaired.
However, in order for the server to produce those mappings, it has to
know the entire history. If there are objects that are outside the
repository in a secondary location, the server will not have mappings
for those objects and so it will abort the protocol.
The server does not normally provide mappings for non-submodules if
you're doing a regular fetch or clone, since the client has a
self-contained history and does not need those objects to compute the
mapping. That means that regular clones and fetches work just fine
against existing servers as long as no submodules are involved[1].
The tricky part is submodules. Because the data is in a separate
repository, we cannot be certain of the mapping. The documentation says
this:
There is a potential security problem with providing mappings of
submodules over the protocol. Namely, there is no way to guarantee
that the SHA-1 object ID and the SHA-256 object ID correspond to the
same commit. This means that, for example, a malicious server could
provide a SHA-256 object ID for a submodule that was up to date with
all security fixes, but map that to a SHA-1 object ID for an older
commit with security problems.
We therefore reject submodule mappings if fsck verification for
transferred objects is enabled unless the user has explicitly enabled
submodule mappings.
[0] If A deltas against B in SHA-1, then when those are rewritten into
SHA-256, we delta the SHA-256 A against the SHA-256 B. This does not
guarantee the best possible delta, but it is much cheaper than
redeltifying and because we expect remapped objects to have the same
shape, it should delta well enough in most cases.
[1] For instance, if you build my branch, you can do
`git clone --object-format=sha256:sha1 https://github.com/bk2204/lawn.git`
and it just works since there are no submodules. My dotfiles, on the
other hand, have submodules and will not work without protocol support.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 325 bytes --]
^ permalink raw reply
* Re: [PATCH v2 05/17] odb/source-packed: start converting to a proper `struct odb_source`
From: Justin Tobler @ 2026-06-16 21:36 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <20260609-pks-odb-source-packed-v2-5-839089132c8b@pks.im>
On 26/06/09 10:50AM, Patrick Steinhardt wrote:
> Start converting `struct odb_source_packed` into a proper pluggable
> `struct odb_source` by embedding the base struct and assigning it the
> new `ODB_SOURCE_PACKED` type. Furthermore, wire up lifecycle management
> of this source by implementing the `free` callback and taking ownership
> of the chdir notifications.
>
> Note that the packed source is not yet functional as a standalone `struct
> odb_source`, as it's missing all of the callback implementations. These
> will be wired up in subsequent commits.
Ok.
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
[snip]
> struct odb_source_packed *odb_source_packed_new(struct odb_source_files *parent)
> {
> - struct odb_source_packed *store;
> - CALLOC_ARRAY(store, 1);
> - store->files = parent;
> - strmap_init(&store->packs_by_path);
> - return store;
> + struct odb_source_packed *packed;
> +
> + CALLOC_ARRAY(packed, 1);
> + odb_source_init(&packed->base, parent->base.odb, ODB_SOURCE_PACKED,
> + parent->base.path, parent->base.local);
> + packed->files = parent;
> + strmap_init(&packed->packs_by_path);
> +
> + packed->base.free = odb_source_packed_free;
> +
> + if (!is_absolute_path(parent->base.path))
> + chdir_notify_register(NULL, odb_source_packed_reparent, packed);
Out of curiousity, did the packfile store previously not have to worry
about changing directories?
> +
> + return packed;
> }
> diff --git a/odb/source-packed.h b/odb/source-packed.h
> index 3c2d229a17..68e64cabab 100644
> --- a/odb/source-packed.h
> +++ b/odb/source-packed.h
> @@ -9,6 +9,7 @@
> * A store that manages packfiles for a given object database.
> */
> struct odb_source_packed {
> + struct odb_source base;
> struct odb_source_files *files;
At first I got confused as to why we were adding back a `struct
odb_source` pointer, but this is for the "base" not the parent ODB
source. We need to embed a `struct odb_source` here to make this a
proper ODB source.
Looking good.
-Justin
^ permalink raw reply
* Re: [PATCH v3 02/11] doc: interpret-trailers: replace “lines” with “metadata”
From: Matt Hunter @ 2026-06-16 21:39 UTC (permalink / raw)
To: Kristoffer Haugsbakk, git
Cc: Christian Couder, jackmanb, Linus Arver, D. Ben Knoble
In-Reply-To: <26c2fcb2-2618-4bb9-a6e4-a6135934556d@app.fastmail.com>
On Tue Jun 16, 2026 at 4:32 PM EDT, Kristoffer Haugsbakk wrote:
> On Thu, Jun 11, 2026, at 05:10, Matt Hunter wrote:
>> On Wed Jun 10, 2026 at 5:21 PM EDT, kristofferhaugsbakk wrote:
>>>[snip]
>>> DESCRIPTION
>>> -----------
>>> -Add or parse _trailer_ lines at the end of the otherwise
>>> +Add or parse trailers metadata at the end of the otherwise
>>
>> fwiw, I think "trailer metadata" reads more naturally.
>
> You’re right that your version reads more naturally. I went back and
> forth on this.
After reading your points, I started debating the grammatical
correctness of it to myself too. Though I think I stand by my original
knee-jerk response.
>
> 1. We’re introducing the jargon, and the format is often discussed as
> plural “trailers”, with its constituent parts being singular
> “trailer”
> 2. What this replaces uses “trailer”, but it rescues the plural mood
> with “lines”
> 3. This is very soon going to go into the constituent parts, including
> each trailer, so we’re contrasting the concept name (trailers) with
> its parts
>
> But I think your version is overall better. It reads better and there is
> no way to confuse “trailer metadata” (trailers as a collection) with
> just a single “trailer”.
Yes, "metadata" reads as a hint to me that interpret-trailers works with
the overall trailer block too.
Another thought I had after seeing this again is that the text could
just say "Add or parse trailers at the end ...", but "metadata" is a lot
more useful, since it's an introduction to what trailers _are_. So the
patch is an improvement over the original context imo.
^ permalink raw reply
* Re: [PATCH v2 07/17] odb/source-packed: wire up `reprepare()` callback
From: Justin Tobler @ 2026-06-16 21:53 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <20260609-pks-odb-source-packed-v2-7-839089132c8b@pks.im>
On 26/06/09 10:51AM, Patrick Steinhardt wrote:
> Move the logic to prepare and reprepare the "packed" source into
> "odb/source-packed.c" and wire it up as the `reprepare()` callback.
>
> Note that "preparing" a source is not yet generic. Eventually, it would
> probably make sense to turn the existing `reprepare()` callback into a
> `prepare()` callback with an optional flag to force re-preparing. But
> this step will be handled in a separate patch series.
I do find the prepare vs reprepare semantics a bit confusing. The
mentioned change above would be nice to see in the future. :)
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
[snip]
> static void odb_source_packed_reparent(const char *name UNUSED,
> const char *old_cwd,
> const char *new_cwd,
> @@ -58,6 +214,7 @@ struct odb_source_packed *odb_source_packed_new(struct odb_source_files *parent)
>
> packed->base.free = odb_source_packed_free;
> packed->base.close = odb_source_packed_close;
> + packed->base.reprepare = odb_source_packed_reprepare;
We moved `packfile_store_reprepare()` logic into "odb/source-packed.c"
and setup the callback making it accessible via `odb_source_reprepare()`
instead. Makes sense.
> if (!is_absolute_path(parent->base.path))
> chdir_notify_register(NULL, odb_source_packed_reparent, packed);
> diff --git a/odb/source-packed.h b/odb/source-packed.h
> index 68e64cabab..9d4796261a 100644
> --- a/odb/source-packed.h
> +++ b/odb/source-packed.h
> @@ -81,4 +81,13 @@ static inline struct odb_source_packed *odb_source_packed_downcast(struct odb_so
> return container_of(source, struct odb_source_packed, base);
> }
>
> +/*
> + * Prepare the source by loading packfiles and multi-pack indices for
> + * all alternates. This becomes a no-op if the source is already prepared.
> + *
> + * It shouldn't typically be necessary to call this function directly, as
> + * functions that access the source know to prepare it.
> + */
> +void odb_source_packed_prepare(struct odb_source_packed *source);
The logic for `packfile_store_prepare()` is also moved into
"odb/source-packed.c" and made accessible this function since there
isn't a matching callback.
The other changes are mostly 1:1 moves of associated logic and callsite
updates. Looks good.
-Justin
^ permalink raw reply
* Re: [PATCH v2 11/17] odb/source-packed: wire up `for_each_object()` callback
From: Justin Tobler @ 2026-06-16 22:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <20260609-pks-odb-source-packed-v2-11-839089132c8b@pks.im>
On 26/06/09 10:51AM, Patrick Steinhardt wrote:
> Move `packfile_store_for_each_object()` and its associated helpers from
> "packfile.c" into "odb/source-packed.c" and wire it up as the
> `for_each_object()` callback of the "packed" source.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
[snip]
> @@ -291,6 +548,7 @@ struct odb_source_packed *odb_source_packed_new(struct odb_source_files *parent)
> packed->base.reprepare = odb_source_packed_reprepare;
> packed->base.read_object_info = odb_source_packed_read_object_info;
> packed->base.read_object_stream = odb_source_packed_read_object_stream;
> + packed->base.for_each_object = odb_source_packed_for_each_object;
Wiring up the callback. Looks good.
> if (!is_absolute_path(parent->base.path))
> chdir_notify_register(NULL, odb_source_packed_reparent, packed);
> diff --git a/packfile.c b/packfile.c
> index 42c84397eb..b8d6054c16 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1362,8 +1362,8 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
> hashmap_add(&delta_base_cache, &ent->ent);
> }
>
> -static int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_offset,
> - uint32_t *maybe_index_pos, struct object_info *oi)
> +int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_offset,
> + uint32_t *maybe_index_pos, struct object_info *oi)
Looks like we are also exposing `packed_object_info_with_index_pos()`
now. Not sure yet if this is also intended to be temporary like
`find_pack_entry()` in a previous patch though though.
-Justin
^ permalink raw reply
* Re: [PATCH] cat-file: speed up default format
From: René Scharfe @ 2026-06-16 22:14 UTC (permalink / raw)
To: Jeff King; +Cc: Git List
In-Reply-To: <20260616111237.GA687438@coredump.intra.peff.net>
On 6/16/26 1:12 PM, Jeff King wrote:
> On Mon, Jun 15, 2026 at 11:53:07PM +0200, René Scharfe wrote:
>
>> We can also store literal characters in there. An opcode plus with a
>> payload char incurs an overhead of 50%, which sounds high, but at least
>> the default format only has two of them and it's much better than
>> storing pointer plus size for an overhead of more than 90% in case of a
>> single char.
>
> True, and it's a size win if the literal portions tend to be small
> (fewer than 15 bytes). You do lose out on the ability to strbuf_add()
> them in one go, though. So lots more strbuf_grow() checks, etc. If you
> really wanted to get fancy, you could follow the opcode with a length
> represented as a variable-sized integer, followed by the literal bytes.
Or an opcode that shovels a fixed-size string. Depends on how much
literal text people include in their formats.
> I'm not sure that Git's formatting code needs to squeeze out quite that
> much performance, though.
Good point. Near-native performance would be necessary to make peephole
optimizations like the special handling of the default format
unnecessary, which I understand exists to speed up Gitaly [1], but I
guess most users don't have such high demands. And there's no point in
removing a few lines of duplicate code if the necessary machinery adds a
lot of complexity. Though the code discussed so far was not too crazy
IMHO.
[1] https://gitlab.com/gitlab-org/gitaly/-/blob/master/internal/git/gitpipe/catfile_info.go
> OK, so we managed another 1%. But I'm skeptical that this linear opcode
> technique is where we want to go in the long run, if we're ever going to
> unify formatters.
Agreed.
René
^ permalink raw reply
* Re: [PATCH] cat-file: speed up default format
From: Junio C Hamano @ 2026-06-16 22:21 UTC (permalink / raw)
To: René Scharfe; +Cc: Jeff King, Git List
In-Reply-To: <47e3cf16-217e-45d4-91e2-5a1abb4ee49e@web.de>
René Scharfe <l.s.r@web.de> writes:
> On 6/16/26 1:12 PM, Jeff King wrote:
>> On Mon, Jun 15, 2026 at 11:53:07PM +0200, René Scharfe wrote:
>>
>>> We can also store literal characters in there. An opcode plus with a
>>> payload char incurs an overhead of 50%, which sounds high, but at least
>>> the default format only has two of them and it's much better than
>>> storing pointer plus size for an overhead of more than 90% in case of a
>>> single char.
>>
>> True, and it's a size win if the literal portions tend to be small
>> (fewer than 15 bytes). You do lose out on the ability to strbuf_add()
>> them in one go, though. So lots more strbuf_grow() checks, etc. If you
>> really wanted to get fancy, you could follow the opcode with a length
>> represented as a variable-sized integer, followed by the literal bytes.
>
> Or an opcode that shovels a fixed-size string. Depends on how much
> literal text people include in their formats.
>
>> I'm not sure that Git's formatting code needs to squeeze out quite that
>> much performance, though.
>
> Good point. Near-native performance would be necessary to make peephole
> optimizations like the special handling of the default format
> unnecessary, which I understand exists to speed up Gitaly [1], but I
> guess most users don't have such high demands. And there's no point in
> removing a few lines of duplicate code if the necessary machinery adds a
> lot of complexity. Though the code discussed so far was not too crazy
> IMHO.
>
> [1] https://gitlab.com/gitlab-org/gitaly/-/blob/master/internal/git/gitpipe/catfile_info.go
>
>> OK, so we managed another 1%. But I'm skeptical that this linear opcode
>> technique is where we want to go in the long run, if we're ever going to
>> unify formatters.
>
> Agreed.
>
> René
Obviously I agree with the conclusion, but it was fun to watch cute
experiments from the sidelines. Thanks for entertainment ;-)
^ permalink raw reply
* [PATCH v2 0/7] Introduce fetch.followRemoteHEAD config variable
From: Matt Hunter @ 2026-06-16 22:25 UTC (permalink / raw)
To: git; +Cc: Bence Ferdinandy, Jeff King, Junio C Hamano
In-Reply-To: <20260612055947.1499497-1-m@lfurio.us>
git-fetch presently offers some useful ways to control how remote HEAD
symbolic-refs are (or aren't) updated when fetching from remote
repositories. Namely this is done via the
'remote.<name>.followRemoteHEAD' configuration variable.
However, this setting can be somewhat painful to use if you prefer a
default other than "create" and often work with multiple different
remote repositories.
This series introduces the variable 'fetch.followRemoteHEAD', which
provides a configurable default in place of per-remote settings.
'fetch.followRemoteHEAD' functions exactly the same as the original
variable, except that it doesn't allow warning suppression via
'warn-if-not-$branch'. Given that different remotes will vary their
HEAD and set of branches independently, setting a false-positive
globally in this way doesn't make logical sense.
While it is not mentioned by any of the patches in this series, note
also that the behavior introduced by 012bc566bad7 (remote set-head: set
followRemoteHEAD to "warn" if "always") is unaffected by this series,
and this feature continues to work for only the
'remote.<name>.followRemoteHEAD' variable.
---
Changes in v2:
- Don't die() if the value of fetch.followRemoteHEAD is unrecognized.
- Use case-sensitive matching for fetch.followRemoteHEAD values.
- Avoid the phrase "configuration option".
- Minor documentation wording changes.
- Link to v1: https://patch.msgid.link/20260612055947.1499497-1-m@lfurio.us
Matt Hunter (7):
fetch: fixup set_head advice for warn-if-not-branch
doc: explain fetchRemoteHEADWarn advice
t5510: cleanup remote in followRemoteHEAD dangling ref test
fetch: rename function report_set_head
fetch: refactor do_fetch handling of followRemoteHEAD
fetch: add configuration variable fetch.followRemoteHEAD
fetch: fixup a misaligned comment
Documentation/config/advice.adoc | 4 ++
Documentation/config/fetch.adoc | 19 ++++++
Documentation/config/remote.adoc | 21 +++---
builtin/fetch.c | 49 ++++++++++----
remote.h | 14 ++--
t/t5510-fetch.sh | 106 +++++++++++++++++++++++++++++++
6 files changed, 183 insertions(+), 30 deletions(-)
Range-diff against v1:
1: 779fb9bfc59f = 1: 2106228f7b98 fetch: fixup set_head advice for warn-if-not-branch
2: aacc2856bc77 = 2: b1c58c06e0c7 doc: explain fetchRemoteHEADWarn advice
3: f5272eaafbcc = 3: c1d11e8883e6 t5510: cleanup remote in followRemoteHEAD dangling ref test
4: 43a17027c13e = 4: 6306c8212fc0 fetch: rename function report_set_head
5: c719435d9675 ! 5: 3c7257094686 fetch: refactor do_fetch handling of followRemoteHEAD
@@ Commit message
Update enum follow_remote_head_settings to include the value
FOLLOW_REMOTE_UNCONFIGURED as the new zero-initialized value for
- followRemoteHEAD. This will allow us to distinguish between the option
- being unset vs. explicitly set to 'create', which is ultimately the
- system default. The unnecessary indentation is removed.
+ followRemoteHEAD. This will allow us to distinguish between the
+ variable being unset vs. explicitly set to 'create', which is ultimately
+ the system default. The unnecessary indentation is removed.
The do_fetch function is likewise updated to perform its own decision
making to determine the effective followRemoteHEAD mode, falling back to
the system default if necessary. This will enable the next patch to
- introduce a user-configurable fallback default option.
+ introduce a user-configurable default.
- Function set_head now accepts this value as an argument rather than only
+ Function set_head now accepts the mode as an argument rather than only
considering the value defined by the remote.
The use of the 'warn-if-not-$branch' value is awkward in the context of
- a global default option, since the branches will differ between
- individual remotes. For this reason, it's left out of this scheme and
- handling of the no_warn_branch variable is untouched. Since a
- remote-specific setting for followRemoteHEAD takes priority, we can
- assume that if remote->no_warn_branch is set, then the remote is also
- asserting FOLLOW_REMOTE_WARN as the effective operating mode, and it
- will be honored by do_fetch.
+ a global default, since the branches will differ between individual
+ remotes. For this reason, it's left out of this scheme and handling of
+ the no_warn_branch variable is untouched. Since a remote-specific
+ value for followRemoteHEAD takes priority, we can assume that if
+ remote->no_warn_branch is set, then the remote is also asserting
+ FOLLOW_REMOTE_WARN as the effective operating mode, and it will be
+ honored by do_fetch.
Signed-off-by: Matt Hunter <m@lfurio.us>
6: 56f6fc8ded2d ! 6: af9f99b1ceb2 fetch: add configuration option fetch.followRemoteHEAD
@@ Metadata
Author: Matt Hunter <m@lfurio.us>
## Commit message ##
- fetch: add configuration option fetch.followRemoteHEAD
+ fetch: add configuration variable fetch.followRemoteHEAD
- 'fetch.followRemoteHEAD' is added as a generic option used by all
+ 'fetch.followRemoteHEAD' is added as a generic setting used by all
remotes for which 'remote.<name>.followRemoteHEAD' is undefined. If
- both options are undefined, a builtin default of "create" is in effect,
- matching the previous behavior.
+ both variables are undefined, a builtin default of "create" is in
+ effect, matching the previous behavior.
As mentioned in the previous patch, 'fetch.followRemoteHEAD' supports
all of the values that its 'remote' counterpart does _except_
@@ Commit message
repositories.
Documentation and advice messages for both of the followRemoteHEAD
- options are reworded to better capture the relationship between the two.
+ variables are reworded to better capture the relationship between the
+ two.
The added tests assert feature parity between the two followRemoteHEAD
- options, as well as the fact that 'remote.<name>.followRemoteHEAD'
+ variables, as well as the fact that 'remote.<name>.followRemoteHEAD'
always supersedes this new configurable default.
+ Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matt Hunter <m@lfurio.us>
## Documentation/config/fetch.adoc ##
@@ Documentation/config/fetch.adoc: the new bundle URI.
remove the value for the `fetch.bundleCreationToken` value before fetching.
+
+`fetch.followRemoteHEAD`::
-+ When fetching using a default refspec, this option determines how to handle
++ When fetching using a default refspec, this setting determines how to handle
+ differences between a fetched remote's `HEAD` and the local
+ `remotes/<name>/HEAD` symbolic-ref. Its value is one of
++
@@ Documentation/config/remote.adoc: Blank values signal to ignore all previous val
- Setting it to "always" will silently update `remotes/<name>/HEAD` to
- the value on the remote. Finally, setting it to "never" will never
- change or create the local reference.
-+ When fetching this remote using its default refspec, this option determines
++ When fetching this remote using its default refspec, this setting determines
+ how to handle differences between the remote's `HEAD` and the local
-+ `remotes/<name>/HEAD` symbolic-ref. Overrides the setting for
++ `remotes/<name>/HEAD` symbolic-ref. Overrides the value of
+ `fetch.followRemoteHEAD`. See `fetch.followRemoteHEAD` for a description of
+ accepted values.
++
-+In addition to the values supported by `fetch.followRemoteHEAD`, this option may
-+also take on the value "warn-if-not-`$branch`", which behaves like "warn", but
-+ignores the warning if the remote's `HEAD` is `remotes/<name>/$branch`.
++In addition to the values supported by `fetch.followRemoteHEAD`, this setting
++may also take on the value "warn-if-not-`$branch`", which behaves like "warn",
++but ignores the warning if the remote's `HEAD` is `remotes/<name>/$branch`.
## builtin/fetch.c ##
@@ builtin/fetch.c: static struct string_list negotiation_include = STRING_LIST_INIT_NODUP;
@@ builtin/fetch.c: static int git_fetch_config(const char *k, const char *v,
+ if (!strcmp(k, "fetch.followremotehead")) {
+ if (!v)
+ return config_error_nonbool(k);
-+ else if (!strcasecmp(v, "never"))
++ else if (!strcmp(v, "never"))
+ fetch_config->follow_remote_head = FOLLOW_REMOTE_NEVER;
-+ else if (!strcasecmp(v, "create"))
++ else if (!strcmp(v, "create"))
+ fetch_config->follow_remote_head = FOLLOW_REMOTE_CREATE;
-+ else if (!strcasecmp(v, "warn"))
++ else if (!strcmp(v, "warn"))
+ fetch_config->follow_remote_head = FOLLOW_REMOTE_WARN;
-+ else if (!strcasecmp(v, "always"))
++ else if (!strcmp(v, "always"))
+ fetch_config->follow_remote_head = FOLLOW_REMOTE_ALWAYS;
-+ else
-+ die(_("invalid value for '%s': '%s'"),
-+ "fetch.followRemoteHEAD", v);
+ }
+
return git_default_config(k, v, ctx, cb);
@@ builtin/fetch.c: static const char *strip_refshead(const char *name){
- "will disable the warning until the remote changes HEAD to something else.");
+ N_("Run 'git remote set-head %s %s' to follow the change, or modify\n"
+ "either of the 'remote.%s.followRemoteHEAD' or 'fetch.followRemoteHEAD'\n"
-+ "configuration options to handle the situation differently.\n\n"
++ "configuration variables to handle the situation differently.\n\n"
+
-+ "Using this specific option\n\n"
++ "Using this specific setting\n\n"
+ " git config set remote.%s.followRemoteHEAD warn-if-not-%s\n\n"
+ "will suppress the warning until the remote changes HEAD to something else.");
7: 5e0bdd0f00b4 = 7: 5c80107f6488 fetch: fixup a misaligned comment
base-commit: 0fae78c9d55efe705877ea537fe42c59164ccd94
--
2.54.0
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox