Git development

Git development
 help / color / mirror / Atom feed

* [PATCH 2/3] revision: introduce rev_walk_mode to clarify get_revision_1()
From: Kristofer Karlsson via GitGitGadget @ 2026-05-27 15:50 UTC (permalink / raw)
  To: git; +Cc: Kristofer Karlsson, Kristofer Karlsson
In-Reply-To: <pull.2127.git.1779897003.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

get_revision_1() dispatches to different walk strategies based on a
combination of rev_info flags: reflog_info, topo_walk_info, and
limited.  These conditions are checked in multiple places within
the function -- once to select the next commit, and again to decide
how to expand parents -- and the two chains must stay in sync.

Extract the mode selection into a rev_walk_mode enum and a small
get_walk_mode() helper, resolved once at the top of get_revision_1().
Both dispatch sites now switch on the same mode variable, making it
obvious that they agree and easier to verify that all modes are
handled.

No functional change.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 revision.c | 62 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/revision.c b/revision.c
index e1970b9c5d..9d0fc696d0 100644
--- a/revision.c
+++ b/revision.c
@@ -4327,22 +4327,48 @@ static void track_linear(struct rev_info *revs, struct commit *commit)
 	revs->previous_parents = commit_list_copy(commit->parents);
 }
 
+enum rev_walk_mode {
+	REV_WALK_REFLOG,
+	REV_WALK_TOPO,
+	REV_WALK_LIMITED,
+	REV_WALK_STREAMING,
+};
+
+static enum rev_walk_mode get_walk_mode(struct rev_info *revs)
+{
+	if (revs->reflog_info)
+		return REV_WALK_REFLOG;
+	if (revs->topo_walk_info)
+		return REV_WALK_TOPO;
+	if (revs->limited)
+		return REV_WALK_LIMITED;
+	return REV_WALK_STREAMING;
+}
+
 static struct commit *get_revision_1(struct rev_info *revs)
 {
+	enum rev_walk_mode mode = get_walk_mode(revs);
+
 	while (1) {
 		struct commit *commit;
 
-		if (revs->reflog_info)
+		switch (mode) {
+		case REV_WALK_REFLOG:
 			commit = next_reflog_entry(revs->reflog_info);
-		else if (revs->topo_walk_info)
+			break;
+		case REV_WALK_TOPO:
 			commit = next_topo_commit(revs);
-		else
+			break;
+		case REV_WALK_LIMITED:
+		case REV_WALK_STREAMING:
 			commit = pop_commit(&revs->commits);
+			break;
+		}
 
 		if (!commit)
 			return NULL;
 
-		if (revs->reflog_info)
+		if (mode == REV_WALK_REFLOG)
 			commit->object.flags &= ~(ADDED | SEEN | SHOWN);
 
 		/*
@@ -4350,20 +4376,28 @@ static struct commit *get_revision_1(struct rev_info *revs)
 		 * the parents here. We also need to do the date-based limiting
 		 * that we'd otherwise have done in limit_list().
 		 */
-		if (!revs->limited) {
-			if (revs->max_age != -1 &&
-			    comparison_date(revs, commit) < revs->max_age)
-				continue;
+		if (mode != REV_WALK_LIMITED &&
+		    revs->max_age != -1 &&
+		    comparison_date(revs, commit) < revs->max_age)
+			continue;
 
-			if (revs->reflog_info)
-				try_to_simplify_commit(revs, commit);
-			else if (revs->topo_walk_info)
-				expand_topo_walk(revs, commit);
-			else if (process_parents(revs, commit, &revs->commits, NULL) < 0) {
+		switch (mode) {
+		case REV_WALK_REFLOG:
+			try_to_simplify_commit(revs, commit);
+			break;
+		case REV_WALK_TOPO:
+			expand_topo_walk(revs, commit);
+			break;
+		case REV_WALK_STREAMING:
+			if (process_parents(revs, commit,
+					    &revs->commits, NULL) < 0) {
 				if (!revs->ignore_missing_links)
 					die("Failed to traverse parents of commit %s",
-						oid_to_hex(&commit->object.oid));
+					    oid_to_hex(&commit->object.oid));
 			}
+			break;
+		case REV_WALK_LIMITED:
+			break;
 		}
 
 		switch (simplify_commit(revs, commit)) {
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 1/3] pack-objects: call release_revisions() after cruft traversal
From: Kristofer Karlsson via GitGitGadget @ 2026-05-27 15:50 UTC (permalink / raw)
  To: git; +Cc: Kristofer Karlsson, Kristofer Karlsson
In-Reply-To: <pull.2127.git.1779897003.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

enumerate_and_traverse_cruft_objects() initializes a rev_info on the
stack but never calls release_revisions() afterwards.  This is not
visible on master but becomes a leak once the revision walking
machinery uses dynamically allocated structures.

Add the missing release_revisions() call.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 builtin/pack-objects.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 480cc0bd8c..67025e8625 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4275,6 +4275,7 @@ static void enumerate_and_traverse_cruft_objects(struct string_list *fresh_packs
 	traverse_commit_list(&revs, show_cruft_commit, show_cruft_object, NULL);
 
 	stop_progress(&progress_state);
+	release_revisions(&revs);
 }
 
 static void read_cruft_objects(void)
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 0/3] revision: use priority queue for streaming walks
From: Kristofer Karlsson via GitGitGadget @ 2026-05-27 15:49 UTC (permalink / raw)
  To: git; +Cc: Kristofer Karlsson

This is a follow-up to kk/limit-list-optim (now in master), which replaced
the O(N) sorted linked-list insertion in limit_list() with a priority queue.
In the review thread for that patch, I mentioned that the same approach
could be applied to the streaming (non-limited) walk in get_revision_1().
Junio suggested doing it as a separate topic, and Peff noted he had a local
branch (jk/revs-commits-prio-queue) doing a broader conversion of
revs->commits to prio_queue entirely.

This series takes a lighter-weight approach: it keeps the linked list for
setup and external callers, and adds a separate commit_queue field that the
streaming walk drains into on first use. This avoids touching bisect,
line-log, and list-objects code, at the cost of a "only one should be
non-empty" invariant between the two fields.

Together with the limit_list() change already in master, this eliminates all
commit_list_insert_by_date() callers from revision.c.

Patch 1 is a small leak fix -- a missing release_revisions() call in
pack-objects that becomes visible once the commit queue uses a dynamically
allocated prio_queue array.

Patch 2 introduces a rev_walk_mode enum to replace the repeated if/else
chains in get_revision_1(). The function dispatches on walk mode in multiple
places (next commit, expand parents, flag clearing) and these chains must
stay in sync. The enum resolves the mode once and both dispatch sites switch
on the same variable. This is a lighter alternative to the vtable-based
refactoring I mentioned before. No functional change.

Patch 3 is the actual conversion of the streaming walk to use a priority
queue.

== Why this helps ==

The streaming walk in get_revision_1() inserts newly discovered parent
commits into a date-sorted queue. On master, this uses
commit_list_insert_by_date(), which walks the linked list to find the
insertion point -- O(w) per insert, where w is the queue width (active walk
frontier).

In merge-heavy repositories, the walk frontier stays wide:

Repository Commits Peak width Avg width
=======================================

monorepo (2.4M) 2,420K 2,653 1,700 linux.git 1,445K 581 235 git.git 82K 188
82

On the monorepo, each of the 2.4M commits requires scanning an average of
1,700 list entries to find the insertion point. With the priority queue,
this drops to ~11 heap comparisons.

== Benchmarks ==

All benchmarks: best of 3 runs, same machine, commit-graph present.

Streaming walks (affected by this series):

git rev-list --count HEAD (monorepo, 2.4M commits)

  master:   17.94s
  patched:   3.38s   (5.3x faster)

git rev-list HEAD (monorepo, full output)

  master:   27.72s
  patched:   8.61s   (2.8x faster, I/O-bound fraction unchanged)

Regression checks -- non-merge-heavy repos (streaming path, but frontier
stays narrow so O(w) insertion was never the bottleneck):

git rev-list --count HEAD (linux.git, 1.4M commits)

  master:    1.76s
  patched:   1.81s   (no change)

git rev-list HEAD (linux.git, full output)

  master:    4.46s
  patched:   4.52s   (no change)

git rev-list --count HEAD (git.git, 82K commits)

  master:     83ms
  patched:    86ms   (no change)

Regression checks -- other walk modes (not affected by this series):

git rev-list --count HEAD~5000...HEAD (monorepo, limited path)

  master:    7.36s
  patched:   7.02s   (no change)

== Profile breakdown ==

perf profiling of rev-list --count HEAD on the monorepo shows where the time
goes:

master (17.94s): commit_list_insert_by_date 79% 14.25s fixed overhead
(parse/lookup) 21% 3.69s

patched (3.38s): heap ops (compare + sift) 16% 0.53s fixed overhead
(parse/lookup) 84% 2.85s

The queue maintenance itself sped up 27x (14.25s to 0.53s). The overall 5.3x
is lower because the fixed costs -- object lookup (17%), commit-graph
parsing (14%), memory allocation (10%) -- are roughly constant between the
two versions at ~3s.

This means the patch removes the dominant bottleneck entirely. After the
patch, the walk cost is dominated by irreducible per-commit work (parsing
and object lookup) which scales linearly with commit count regardless of
frontier width.

Kristofer Karlsson (3):
  pack-objects: call release_revisions() after cruft traversal
  revision: introduce rev_walk_mode to clarify get_revision_1()
  revision: use priority queue for non-limited streaming walks

 builtin/pack-objects.c |   1 +
 commit.c               |  13 -----
 commit.h               |   2 -
 revision.c             | 113 +++++++++++++++++++++++++++--------------
 revision.h             |  12 ++++-
 5 files changed, 88 insertions(+), 53 deletions(-)

base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2127%2Fspkrka%2Fstreaming-prio-queue-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2127/spkrka/streaming-prio-queue-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2127
-- 
gitgitgadget

^ permalink raw reply

* Re: [PATCH v3 6/8] promisor-remote: trust known remotes matching acceptFromServerUrl
From: Christian Couder @ 2026-05-27 15:37 UTC (permalink / raw)
  To: Kristoffer Haugsbakk
  Cc: git, Junio C Hamano, Patrick Steinhardt, Taylor Blau,
	Karthik Nayak, Elijah Newren, Toon Claes, Christian Couder
In-Reply-To: <97b9f2cd-7c82-4d4c-b574-31176074e566@app.fastmail.com>

On Sat, May 23, 2026 at 5:17 PM Kristoffer Haugsbakk
<kristofferhaugsbakk@fastmail.com> wrote:
>
> On Tue, May 19, 2026, at 17:38, Christian Couder wrote:
> >[snip]
> >
> > Let's then use this helper in should_accept_remote() so that, a known
> > remote whose URL matches the allowlist is accepted.
>
> I don’t understand this comma break?

I have removed it in the v4 I just sent. Sorry for the confusion.

> > ++
> > +Before matching, both the advertised URL and the pattern are
> > +normalized: the scheme and host are lowercased, percent-encoded
>
> This next paragraph seems to go back to describing how things work. But
> this paragraph as well as all of the following ones belong to this list
> item:
>
>       4.   Be careful using globs [...]
>
>            Before matching, [...]
>
>            The glob pattern can [...]
>
>            If a remote with the [...]
>
>            For the security implications [...]
>
>     promisor.checkFields
>     [...]
>
> I don’t know what the intent is. But using an open block will delimit
> the ordered list.
>
>     diff --git Documentation/config/promisor.adoc Documentation/config/promisor.adoc
>     index cc728bb0b5e..f07a2e883bd 100644
>     --- Documentation/config/promisor.adoc
>     +++ Documentation/config/promisor.adoc
>     @@ -109,6 +109,7 @@ and to update fields (such as authentication tokens) on known remotes
>      without further confirmation. To minimize security risks, follow these
>      guidelines:
>      +
>     +--
>      1. Start with a secure protocol scheme, like `https://` or `ssh://`.
>      +
>      2. Only allow domain names or paths where you control and trust _ALL_
>     @@ -130,6 +131,7 @@ guidelines:
>         subdomain. This is extremely dangerous on shared hosting platforms
>         (e.g., `https://*.github.io/*` trusts every user's site on the
>         entire platform).
>     +--
>      +
>      Before matching, both the advertised URL and the pattern are
>      normalized: the scheme and host are lowercased, percent-encoded

Thanks for the suggestion, it is indeed much better this way, and this
is what is used in v4.

^ permalink raw reply

* Re: [PATCH v3 4/8] promisor-remote: add 'local_name' to 'struct promisor_info'
From: Christian Couder @ 2026-05-27 15:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Christian Couder
In-Reply-To: <87a4tvq6pr.fsf@gitster.g>

On Wed, May 20, 2026 at 2:12 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:

> > +static const char *promisor_info_internal_name(struct promisor_info *p)
> > +{
> > +     return p->local_name ? p->local_name : p->name;
> > +}
>
> Hmph.
>
> > @@ -829,7 +836,7 @@ static bool promisor_store_advertised_fields(struct promisor_info *advertised,
> >  {
> >       struct promisor_info *p;
> >       struct string_list_item *item;
> > -     const char *remote_name = advertised->name;
> > +     const char *remote_name = promisor_info_internal_name(advertised);
>
> Is this really a "remote_name", though?  As ...
>
> > @@ -937,7 +944,8 @@ static void filter_promisor_remote(struct repository *repo,
> >       /* Apply accepted remotes to the stable repo state */
> >       for_each_string_list_item(item, accepted_remotes) {
> >               struct promisor_info *info = item->util;
> > -             struct promisor_remote *r = repo_promisor_remote_find(repo, info->name);
> > +             const char *local = promisor_info_internal_name(info);
>
> ... this name "local" is "the name the thing is locally known to
> us", promisor_info_local_name() might be a better name?  I dunno.
> I jsut found it odd that the return value of the same function is
> stored in variables named "remote" and "local" at the same time ;-)

In the v4 I just sent, I renamed promisor_info_internal_name() to
promisor_info_local_name(), and "remote_name" is the name of the local
variable in both places to be more consistent.

Thanks.

^ permalink raw reply

* Re: [PATCH 5/8] pack-bitmap: cache object positions during fill
From: Taylor Blau @ 2026-05-27 14:46 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee
In-Reply-To: <20260527094535.GF981444@coredump.intra.peff.net>

On Wed, May 27, 2026 at 05:45:35AM -0400, Jeff King wrote:
> > Combat this by adding a small, direct-mapped cache to the bitmap writer
> > which maps object IDs to their corresponding bit positions. Size the
> > cache according to the number of objects being written, with fixed lower
> > and upper bounds so small repositories do not pay for a large table and
> > large repositories can avoid most repeated packlist and MIDX lookups.
>
> Introducing another layer of data structure feels so dirty, but it's
> hard to argue with the numbers. We are looking up oids in the packlist,
> so it's already O(lg n). Your cache here is essentially a hash lookup,
> which is O(1)-ish (with collisions causing eviction rather than growth).
> And it presumably works because there's a lot of locality in lookups
> (between commits X and X^1, their top-level trees will be almost
> identical but we have to resolve the bits to find out which entries are
> new).
>
> It does make me wonder if we'd see similar improvements if we just
> turned the packlist into a regular hash table. Or maybe not, because
> then we'd have to do actual probing.

I haven't run that experiment directly, but I share your suspicion. I
wrote a 2- and 4-way associative cache implementation as alternatives
before settling on the direct-mapped approach in this patch. I found
that associative caches regardless of cache lines nearly nuked any of
the performance gains that we got as a result of this patch.

> So this really is a somewhat unique situation. It _might_ be applicable
> for the reading side of bitmaps, though. When we do fill-in traversal we
> end up with this same "read a tree, find the bit for each entry, and 99%
> of the time find that it is already in the bitmap".

I think it's certainly likely. In my experience, many object to
bit-position queries are extremely cache-friendly. And in practice, many
large repositories have MIDXs with many tens of millions of objects. So
even on a O(log n) lookup, caching seems to help a lot.

> > In our example repository from above and in earlier commits, this
> > results in a ~9.4% reduction in runtime relative to the previous commit:
> >
> >     +------------------+-------------+-------------+---------------------+
> >     |                  | HEAD^       | HEAD        | Delta               |
> >     +------------------+-------------+-------------+---------------------+
> >     | elapsed          |   324.8 s   |   294.1 s   |    -30.7 s  (-9.4%) |
> >     | cycles           | 1,508.6 B   | 1,365.5 B   |   -143.0 B  (-9.5%) |
> >     | instructions     | 1,436.6 B   | 1,389.8 B   |    -46.9 B  (-3.3%) |
> >     | CPI              |     1.050   |     0.983   |   -0.068    (-6.4%) |
> >     +------------------+-------------+-------------+---------------------+
>
> I show a 26% speed up on linux.git (1m37 down to 1m12). Very cool.

Glad it reproduces ;-).

> > +static uint32_t store_cached_object_pos(struct bitmap_writer *writer,
> > +					const struct object_id *oid,
> > +					uint32_t pos)
> > +{
> > +	size_t slot;
> > +
> > +	if (pos & BITMAP_POS_CACHE_VALID)
> > +		return pos; /* too large to cache */
>
> Cute, I wondered what would happen if we went past 2^31. I suspect there
> are other parts of the code that do not behave that well around that
> size, but it is good that we are not introducing any new surprises.

Yeah, I suspect that there are many breakages past the 32-bit unsigned
maximum number of objects. I figure the easiest thing to do in this
patch is to avoid making that situation worse by simply not caching
objects that are near that limit.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps
From: Taylor Blau @ 2026-05-27 14:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee
In-Reply-To: <20260527092412.GD981444@coredump.intra.peff.net>

On Wed, May 27, 2026 at 05:24:12AM -0400, Jeff King wrote:
> On Tue, May 19, 2026 at 12:12:41PM -0400, Taylor Blau wrote:
>
> > Building bitmaps from scratch on the same test repository from the
> > previous commits yields a significant speed-up:
> >
> >     +------------------+-------------+-------------+---------------------+
> >     |                  | HEAD^       | HEAD        | Delta               |
> >     +------------------+-------------+-------------+---------------------+
> >     | elapsed          |   562.8 s   |   324.8 s   |   -237.9 s (-42.3%) |
> >     | cycles           | 2,621.3 B   | 1,508.6 B   | -1,112.7 B (-42.4%) |
> >     | instructions     | 2,348.9 B   | 1,436.6 B   |   -912.3 B (-38.8%) |
> >     | CPI              |     1.116   |     1.050   |   -0.066    (-5.9%) |
> >     +------------------+-------------+-------------+---------------------+
>
> Oh my, that's a rather nice speedup. I can reproduce here on linux.git
> (~47% improvement).
>
> > When `fill_bitmap_commit()` reaches an ancestor that was selected for
> > its own bitmap and processed earlier, its object closure is already
> > stored in `writer->bitmaps` as an EWAH bitmap. As a result, walking
> > through that commit's tree and parents again is redundant.
> >
> > Teach `fill_bitmap_commit()` to notice that case. For non-root commits in
> > the walk, look for a stored selected bitmap and OR it into the bitmap
> > being built. If one exists, skip the commit, its tree, and its parents.
>
> I feel like this _shouldn't_ be necessary, because the idea of the
> current writing code is to go from the roots up, following inverted
> parent pointers, and passing the bitmap up as we go. So whenever we
> visit a commit we should in theory have all of the ancestor's bits set
> in that bitmap. But I remember that the simple-and-stupid approach ended
> up being too memory hungry, so we pick some focal points in the graph
> and then fill them independently.

It's sharing within the non-first parent history that is killing us
here. I think what you said is true in a completely linear repository
with no merges. But since we only pass commit masks from commits to
their first parents, we don't reuse any already-generated bitmaps for
common points in history not shared between commits' first parents.

> I wondered about "c != commit" here. "c" is the commit we're traversing,
> and "commit" is the one for which we're trying to build the bitmap. So
> we would not expect to ever have an entry in writer->bitmaps for "c"
> yet, but the conditional is just short-circuiting the hash lookup.

Exactly.

> The rest of the patch looks obviously correct. The trace2 bits aren't
> strictly necessary, of course, but some metrics might help with further
> tuning.

Yeah, these were for my own curiosity as much as anything. I had written
them as a temporary measure in order to write the "[...] there are 1,261
commits selected for bitmap coverage, and 1,382 maximal commits induced
[...]" portion of the commit message above.

Once I had written it, I found the result useful enough to keep around.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 2/8] pack-bitmap: check subtree bits before recursing
From: Taylor Blau @ 2026-05-27 14:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee
In-Reply-To: <20260527090348.GC981444@coredump.intra.peff.net>

On Wed, May 27, 2026 at 05:03:48AM -0400, Jeff King wrote:
> On Tue, May 19, 2026 at 12:12:39PM -0400, Taylor Blau wrote:
>
> > In the previous commit, we adjusted the callers of `fill_bitmap_tree()`
> > to pass in the bit position of the tree they wish to fill.
> >
> > This commit makes use of that information at the call site to avoid
> > setting up a stack frame for fill_bitmap_tree() entirely whenever a
> > tree's bit position is already set.
>
> OK, this one at least has a plausible explanation. ;)
>
> I can reproduce your speedup on linux.git (~5% again). I don't love that
> we have to duplicate the logic in each of the callers, but there are
> only two sites (and unlikely to ever be more). And it is only one line,
> the comment notwithstanding. That seems like a good tradeoff for a
> multiple-second speedup.

Yup, exactly. There are naturally only two callers: one to handle a
commit's root tree, and another for recursive calls to handle subtrees.
As a result, I'm OK with the duplication here.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()`
From: Taylor Blau @ 2026-05-27 14:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee
In-Reply-To: <20260527085740.GB981444@coredump.intra.peff.net>

On Wed, May 27, 2026 at 04:57:40AM -0400, Jeff King wrote:
> It is indeed surprising. There's a possible candidate for the speedup
> here:
>
> > @@ -482,8 +479,12 @@ static int fill_bitmap_tree(struct bitmap_writer *writer,
> >  	while (tree_entry(&desc, &entry)) {
> >  		switch (object_type(entry.mode)) {
> >  		case OBJ_TREE:
> > +			pos = find_object_pos(writer, &entry.oid, &found);
> > +			if (!found)
> > +				return -1;
> >  			if (fill_bitmap_tree(writer, bitmap,
> > -					     lookup_tree(writer->repo, &entry.oid)) < 0)
> > +					     lookup_tree(writer->repo,
> > +							 &entry.oid), pos) < 0)
> >  				return -1;
> >  			break;
>
> Whenever "found" is false, we cut out early and skip the hash lookup in
> lookup_tree() entirely. But that should almost never happen! It implies
> that a reachable object is not in the pack/midx, and thus the bitmaps is
> not closed (and we'll refuse to generate it).

That's right, and I had actually written something like the following
while developing this patch:

--- 8< ---
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 2d5ff8fd406..328e1c13df3 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -481,7 +481,7 @@ static int fill_bitmap_tree(struct bitmap_writer *writer,
 		case OBJ_TREE:
 			pos = find_object_pos(writer, &entry.oid, &found);
 			if (!found)
-				return -1;
+				BUG("huh??");
 			if (fill_bitmap_tree(writer, bitmap,
 					     lookup_tree(writer->repo,
 							 &entry.oid), pos) < 0)
--- >8 ---

, but couldn't trigger it in either the test suite nor in my sample
repository. I left it in there as a sanity measure.

> So it really is the case that we do the same operations in a different
> order. Weird.

Yeah, I puzzled over this for quite a while myself. I really think that
this is reordering produces more favorable cache behavior or codegen
that results in a meaningful speedup.

> But the patch itself looks correct to me, and I get ~6% speedup on a
> from-scratch bitmap generation of linux.git. I guess it could vary
> between architectures and compilers (I'm using gcc on x86), but since
> the reorg is setting us up for further optimizations in the next patch,
> I suppose there's no need to look a gift horse in the mouth.

Good, I'm glad that it was reproducible on your machine. And I agree
;-).

Thanks,
Taylor

^ permalink raw reply related

* [PATCH v4 8/8] doc: promisor: improve acceptFromServer entry
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder,
	Christian Couder
In-Reply-To: <20260527140820.1438165-1-christian.couder@gmail.com>

The entry for the `promisor.acceptFromServer` in
"Documentation/config/promisor.adoc" has a number of issues:

- it's not clear if new remotes and URLs can be created,
- it looks like a big block of text,
- it's not easy to see all the options,
- it's not easy to see which option is the default one,
- for "knownName", it says "advertised by the client" instead of
  "advertised by the server",
- it doesn't refer to the new related `acceptFromServerUrl`
  option.

Let's address all these issues by rewording large parts of it
and using bullet points for the different options.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.adoc | 53 ++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index 455ce40be8..f07a2e883b 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -32,24 +32,41 @@ variable is set to "true", and the "name" and "url" fields are always
 advertised regardless of this setting.
 
 promisor.acceptFromServer::
-	If set to "all", a client will accept all the promisor remotes
-	a server might advertise using the "promisor-remote"
-	capability. If set to "knownName" the client will accept
-	promisor remotes which are already configured on the client
-	and have the same name as those advertised by the client. This
-	is not very secure, but could be used in a corporate setup
-	where servers and clients are trusted to not switch name and
-	URLs. If set to "knownUrl", the client will accept promisor
-	remotes which have both the same name and the same URL
-	configured on the client as the name and URL advertised by the
-	server. This is more secure than "all" or "knownName", so it
-	should be used if possible instead of those options. Default
-	is "none", which means no promisor remote advertised by a
-	server will be accepted. By accepting a promisor remote, the
-	client agrees that the server might omit objects that are
-	lazily fetchable from this promisor remote from its responses
-	to "fetch" and "clone" requests from the client. Name and URL
-	comparisons are case sensitive. See linkgit:gitprotocol-v2[5].
+	Controls which promisor remotes advertised by a server (using the
+	"promisor-remote" protocol capability) a client will accept. By
+	accepting a promisor remote, the client agrees that the server
+	might omit objects that are lazily fetchable from this promisor
+	remote from its responses to "fetch" and "clone" requests.
++
+Note that this option does not cause new remotes to be automatically
+created in the client's configuration. It only allows remotes which
+are somehow already configured to be trusted for the current
+operation, or their fields to be updated (if `promisor.storeFields` is
+set and the remote already exists locally). To allow Git to
+automatically create and persist new remotes from server
+advertisements, use `promisor.acceptFromServerUrl`.
++
+The available options are:
++
+* `none` (default): No promisor remote advertised by a server will be
+  accepted.
++
+* `knownUrl`: The client will accept promisor remotes that are already
+  configured on the client and have both the same name and the same URL
+  as advertised by the server. This is more secure than `all` or
+  `knownName`, and should be used if possible instead of those options.
++
+* `knownName`: The client will accept promisor remotes that are already
+  configured on the client and have the same name as those advertised
+  by the server. This is not very secure, but could be used in a corporate
+  setup where servers and clients are trusted to not switch names and URLs.
++
+* `all`: The client will accept all the promisor remotes a server might
+  advertise. This is the least secure option and should only be used in
+  fully trusted environments.
++
+Name and URL comparisons are case-sensitive. See linkgit:gitprotocol-v2[5]
+for protocol details.
 
 promisor.acceptFromServerUrl::
 	A glob pattern to specify which server-advertised URLs a
-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply related

* [PATCH v4 7/8] promisor-remote: auto-configure unknown remotes
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder,
	Christian Couder
In-Reply-To: <20260527140820.1438165-1-christian.couder@gmail.com>

Previous commits have introduced the `promisor.acceptFromServerUrl`
config variable to allowlist some URLs advertised by a server through
the "promisor-remote" protocol capability.

However the new `promisor.acceptFromServerUrl` mechanism, like the old
`promisor.acceptFromServer` mechanism, still requires a remote to
already exist in the client's local configuration before it can be
accepted. This places a significant manual burden on users to
pre-configure these remotes, and creates friction for administrators
who have to troubleshoot or manually provision these setups for their
teams.

To eliminate this burden, let's automatically create a new `[remote]`
section in the client's config when a server advertises an unknown
remote whose URL matches a `promisor.acceptFromServerUrl` glob pattern.

Concretely, let's add four helpers:

 - sanitize_remote_name(): turn an arbitrary URL-derived string into a
   valid remote name by replacing non-alphanumeric characters,
   collapsing runs of '-', and prepending "promisor-auto-".

 - promisor_remote_name_from_url(): normalize the URL and extract
   host+port+path to build a human-readable base name, then pass it
   through sanitize_remote_name().

 - configure_auto_promisor_remote(): write the remote.*.url,
   remote.*.promisor and remote.*.advertisedAs keys to the repo
   config.

 - handle_matching_allowed_url(): pick the final name (user-supplied
   alias or auto-generated), handle collisions by appending "-1",
   "-2", etc., then call configure_auto_promisor_remote().

Let's also add should_accept_new_remote_url() which reuses the
url_matches_accept_list() helper introduced in a previous commit to
find a matching pattern, then delegates to handle_matching_allowed_url()
to create the remote.

And then let's call should_accept_new_remote_url() from the '!item'
(unknown remote) branch of should_accept_remote(), setting
`reload_config` so that the newly-written config is picked up.

Finally let's document all that by:

 - expanding the `promisor.acceptFromServerUrl` entry to describe
   auto-creation, the optional "name=" prefix syntax, the
   "promisor-auto-*" generation rules, and numeric-suffix collision
   handling, and by
 - adding a "remote.<name>.advertisedAs" entry to "remote.adoc".

Also let's extend the precedence paragraph added by a previous commit
to mention this new acceptance path: until now, the only way for
`promisor.acceptFromServerUrl` to trigger acceptance was to allow
field updates for a known remote. With this commit, it can also trigger
auto-creation of a previously-unknown remote whose advertised URL
matches the allowlist.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.adoc    |  39 +++--
 Documentation/config/remote.adoc      |   9 ++
 promisor-remote.c                     | 201 +++++++++++++++++++++++++-
 t/t5710-promisor-remote-capability.sh | 104 +++++++++++++
 4 files changed, 340 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index 605473c82f..455ce40be8 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -54,7 +54,8 @@ promisor.acceptFromServer::
 promisor.acceptFromServerUrl::
 	A glob pattern to specify which server-advertised URLs a
 	client is allowed to act on. When a URL matches, the client
-	will accept the advertised remote as a promisor remote and may
+	will accept the advertised remote as a promisor remote, may
+	automatically create a new remote configuration for it and may
 	automatically accept field updates (such as authentication
 	tokens) from the server, even if `promisor.acceptFromServer`
 	is set to `none` (the default).
@@ -65,12 +66,13 @@ this option in _ANY_ config file read by Git.
 +
 When both `promisor.acceptFromServer` and `promisor.acceptFromServerUrl`
 are set, `promisor.acceptFromServerUrl` is consulted first and takes
-precedence: if a matching pattern leads to acceptance (by accepting
-field updates for a known remote whose URL matches both the local
-configuration and the allowlist), the advertised remote is accepted
-regardless of the `promisor.acceptFromServer` setting. If no pattern
-in `promisor.acceptFromServerUrl` triggers acceptance, the decision
-is left to `promisor.acceptFromServer`.
+precedence: if a matching pattern leads to acceptance (either by
+auto-configuring an unknown remote or by accepting field updates for
+a known remote whose URL matches both the local configuration and the
+allowlist), the advertised remote is accepted regardless of the
+`promisor.acceptFromServer` setting. If no pattern in
+`promisor.acceptFromServerUrl` triggers acceptance, the decision is
+left to `promisor.acceptFromServer`.
 +
 Note however that, even when an advertised URL matches a pattern in
 `promisor.acceptFromServerUrl`, an already-existing remote on the
@@ -85,9 +87,10 @@ documentation of that option.)
 Be _VERY_ careful with these patterns: `*` matches any sequence of
 characters within the 'host' and 'path' parts of a URL (but cannot
 cross part boundaries). An overly broad pattern is a major security
-risk, as a matching URL allows a server to update fields (such as
-authentication tokens) on known remotes without further confirmation.
-To minimize security risks, follow these guidelines:
+risk, as a matching URL allows a server to auto-configure new remotes
+and to update fields (such as authentication tokens) on known remotes
+without further confirmation. To minimize security risks, follow these
+guidelines:
 +
 --
 1. Start with a secure protocol scheme, like `https://` or `ssh://`.
@@ -123,6 +126,22 @@ ignored during matching. Note that embedding credentials in URLs is
 discouraged. Passing authentication tokens via the `token` field of
 the `promisor-remote` capability is strongly preferred.
 +
+The glob pattern can optionally be prefixed with a remote name and an
+equals sign (e.g., `cdn=https://cdn.example.com/*`). If such a prefix
+is provided, accepted remotes will be saved under that name. If no
+such prefix is provided, a safe remote name will be automatically
+generated by sanitizing the URL and prefixing it with
+`promisor-auto-`.
++
+If a remote with the chosen name already exists but points to a
+different URL, Git will append a numeric suffix (e.g., `-1`, `-2`) to
+the name to prevent overwriting existing configurations. You should
+make sure that this doesn't happen often though, as remotes will be
+rejected if the numeric suffix increases too much. In all cases, the
+original name advertised by the server is recorded in the
+`remote.<name>.advertisedAs` configuration variable for tracing and
+debugging purposes.
++
 For the security implications of accepting a promisor remote, see the
 documentation of `promisor.acceptFromServer`. For details on the
 protocol, see linkgit:gitprotocol-v2[5].
diff --git a/Documentation/config/remote.adoc b/Documentation/config/remote.adoc
index 91e46f66f5..6e2bbdf457 100644
--- a/Documentation/config/remote.adoc
+++ b/Documentation/config/remote.adoc
@@ -91,6 +91,15 @@ remote.<name>.promisor::
 	When set to true, this remote will be used to fetch promisor
 	objects.
 
+remote.<name>.advertisedAs::
+	When a promisor remote is automatically configured using
+	information advertised by a server through the
+	`promisor-remote` protocol capability (see
+	`promisor.acceptFromServerUrl`), the server's originally
+	advertised name is saved in this variable. This is for
+	information, tracing and debugging purposes. Users should not
+	typically modify or create such configuration entries.
+
 remote.<name>.partialclonefilter::
 	The filter that will be applied when fetching from this	promisor remote.
 	Changing or clearing this value will only affect fetches for new commits.
diff --git a/promisor-remote.c b/promisor-remote.c
index 04a5bb9939..8fb5e40f67 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -813,10 +813,197 @@ static struct allowed_url *url_matches_accept_list(
 	return NULL;
 }
 
-static int should_accept_remote(enum accept_promisor accept,
+/*
+ * Sanitize the buffer to make it a valid remote name coming from the
+ * server by:
+ *
+ * - replacing any non alphanumeric character with a '-'
+ * - stripping any leading '-',
+ * - condensing multiple '-' into one,
+ * - prepending "promisor-auto-",
+ * - validating the result.
+ */
+static int sanitize_remote_name(struct strbuf *buf, const char *url)
+{
+	char prev = '-';
+	for (size_t i = 0; i < buf->len; ) {
+		if (!isalnum(buf->buf[i]))
+			buf->buf[i] = '-';
+		if (prev == '-' && buf->buf[i] == '-') {
+			strbuf_remove(buf, i, 1);
+		} else {
+			prev = buf->buf[i];
+			i++;
+		}
+	}
+
+	strbuf_strip_suffix(buf, "-");
+
+	if (!buf->len) {
+		warning(_("couldn't generate a valid remote name from "
+			  "advertised url '%s', ignoring this remote"), url);
+		return -1;
+	}
+
+	strbuf_insertstr(buf, 0, "promisor-auto-");
+
+	if (!valid_remote_name(buf->buf)) {
+		warning(_("generated remote name '%s' from advertised url '%s' "
+			  "is invalid, ignoring this remote"), buf->buf, url);
+		return -1;
+	}
+
+	return 0;
+}
+
+static char *promisor_remote_name_from_url(const char *url)
+{
+	struct url_info url_info = { 0 };
+	char *normalized = url_normalize(url, &url_info);
+	struct strbuf buf = STRBUF_INIT;
+
+	if (!normalized) {
+		warning(_("couldn't normalize advertised url '%s', "
+			  "ignoring this remote"), url);
+		return NULL;
+	}
+
+	if (url_info.host_len) {
+		strbuf_add(&buf, normalized + url_info.host_off, url_info.host_len);
+		strbuf_addch(&buf, '-');
+	}
+
+	if (url_info.port_len) {
+		strbuf_add(&buf, normalized + url_info.port_off, url_info.port_len);
+		strbuf_addch(&buf, '-');
+	}
+
+	if (url_info.path_len) {
+		strbuf_add(&buf, normalized + url_info.path_off, url_info.path_len);
+		strbuf_trim_trailing_dir_sep(&buf);
+		strbuf_strip_suffix(&buf, ".git");
+	}
+
+	free(normalized);
+
+	if (sanitize_remote_name(&buf, url)) {
+		strbuf_release(&buf);
+		return NULL;
+	}
+
+	return strbuf_detach(&buf, NULL);
+}
+
+static void configure_auto_promisor_remote(struct repository *repo,
+					   const char *name,
+					   const char *url,
+					   const char *advertised_as,
+					   bool reuse)
+{
+	char *key;
+
+	if (!reuse) {
+		fprintf(stderr, _("Auto-creating promisor remote '%s' for URL '%s'\n"),
+			name, url);
+
+		key = xstrfmt("remote.%s.url", name);
+		repo_config_set_gently(repo, key, url);
+		free(key);
+	}
+
+	/* NB: when reusing, this promotes an existing non-promisor remote */
+	key = xstrfmt("remote.%s.promisor", name);
+	repo_config_set_gently(repo, key, "true");
+	free(key);
+
+	if (advertised_as) {
+		key = xstrfmt("remote.%s.advertisedAs", name);
+		repo_config_set_gently(repo, key, advertised_as);
+		free(key);
+	}
+}
+
+#define MAX_REMOTES_WITH_SIMILAR_NAMES 20
+
+/* Return the allocated local name, or NULL on failure */
+static char *handle_matching_allowed_url(struct repository *repo,
+					 char *allowed_name,
+					 const char *remote_url,
+					 const char *remote_name)
+{
+	char *name;
+	char *basename = allowed_name ?
+		xstrdup(allowed_name) :
+		promisor_remote_name_from_url(remote_url);
+	int i = 0;
+	bool reuse = false;
+
+	if (!basename)
+		return NULL;
+
+	name = xstrdup(basename);
+
+	while (i < MAX_REMOTES_WITH_SIMILAR_NAMES) {
+		char *url_key = xstrfmt("remote.%s.url", name);
+		const char *existing_url;
+		int exists = !repo_config_get_string_tmp(repo, url_key, &existing_url);
+
+		free(url_key);
+
+		if (!exists)
+			break; /* Free to use */
+
+		if (!strcmp(existing_url, remote_url)) {
+			reuse = true;
+			break; /* Same URL, so safe to reuse */
+		}
+
+		i++;
+		free(name);
+		name = xstrfmt("%s-%d", basename, i);
+	}
+
+	if (i < MAX_REMOTES_WITH_SIMILAR_NAMES) {
+		configure_auto_promisor_remote(repo, name,
+					       remote_url, remote_name,
+					       reuse);
+	} else {
+		warning(_("too many remotes accepted with name like '%s-X', "
+			  "ignoring this remote"), basename);
+		FREE_AND_NULL(name);
+	}
+
+	free(basename);
+	return name;
+}
+
+static int should_accept_new_remote_url(struct repository *repo,
+					struct string_list *accept_urls,
+					struct promisor_info *advertised)
+{
+	struct allowed_url *allowed = url_matches_accept_list(accept_urls,
+							     advertised->url);
+	if (allowed) {
+		char *name = handle_matching_allowed_url(repo,
+							 allowed->remote_name,
+							 advertised->url,
+							 advertised->name);
+		if (name) {
+			free((char *)advertised->local_name);
+			advertised->local_name = name;
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+static int should_accept_remote(struct repository *repo,
+				enum accept_promisor accept,
 				struct promisor_info *advertised,
 				struct string_list *accept_urls,
-				struct string_list *config_info)
+				struct string_list *config_info,
+				bool *reload_config)
 {
 	struct promisor_info *p;
 	struct string_list_item *item;
@@ -833,6 +1020,13 @@ static int should_accept_remote(enum accept_promisor accept,
 
 	if (!item) {
 		/* We don't know about that remote */
+
+		int res = should_accept_new_remote_url(repo, accept_urls, advertised);
+		if (res) {
+			*reload_config = true;
+			return res;
+		}
+
 		if (accept == ACCEPT_ALL)
 			return all_fields_match(advertised, config_info, NULL);
 		return 0;
@@ -1093,7 +1287,8 @@ static void filter_promisor_remote(struct repository *repo,
 			string_list_sort(&config_info);
 		}
 
-		if (should_accept_remote(accept, advertised, &accept_urls, &config_info)) {
+		if (should_accept_remote(repo, accept, advertised, &accept_urls,
+					 &config_info, &reload_config)) {
 			if (!store_info)
 				store_info = store_info_new(repo);
 			if (promisor_store_advertised_fields(advertised, store_info))
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index 0659b2ac15..549acff23f 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -458,6 +458,107 @@ test_expect_success "clone with 'None', URL allowlisted, but client has differen
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with URL allowlisted and no remote already configured" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+	test_when_finished "rm -f full_names" &&
+
+	GIT_NO_LAZY_FETCH=0 git clone \
+		-c promisor.acceptfromserver=None \
+		-c promisor.acceptFromServerUrl="$ENCODED_TRASH_DIRECTORY_URL/*" \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that exactly one remote has been auto-created, identified
+	# by "remote.<name>.advertisedAs" == "lop".
+	git -C client config get --all --show-names --regexp \
+		"remote\..*\.advertisedas" >full_names &&
+	test_line_count = 1 full_names &&
+	REMOTE_NAME=$(sed "s/^remote\.\(.*\)\.advertisedas .*$/\1/" full_names) &&
+
+	# Check ".url" and ".promisor" values
+	printf "%s\n" "$TRASH_DIRECTORY_URL/lop" "true" >expect &&
+	git -C client config "remote.$REMOTE_NAME.url" >actual &&
+	git -C client config "remote.$REMOTE_NAME.promisor" >>actual &&
+	test_cmp expect actual &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with named URL allowlisted and no pre-configured remote" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	GIT_NO_LAZY_FETCH=0 git clone \
+		-c promisor.acceptfromserver=None \
+		-c promisor.acceptFromServerUrl="cdn=$ENCODED_TRASH_DIRECTORY_URL/*" \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that a remote has been auto-created with the right "cdn" name and fields.
+	printf "%s\n" "$TRASH_DIRECTORY_URL/lop" "true" "lop" >expect &&
+	git -C client config "remote.cdn.url" >actual &&
+	git -C client config "remote.cdn.promisor" >>actual &&
+	git -C client config "remote.cdn.advertisedAs" >>actual &&
+	test_cmp expect actual &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with URL allowlisted but colliding name" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.cdn.promisor=true \
+		-c remote.cdn.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.cdn.url="https://example.com/cdn" \
+		-c promisor.acceptfromserver=None \
+		-c promisor.acceptFromServerUrl="cdn=$ENCODED_TRASH_DIRECTORY_URL/*" \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that a remote has been auto-created with the right "cdn-1" name and fields.
+	printf "%s\n" "$TRASH_DIRECTORY_URL/lop" "true" "lop" >expect &&
+	git -C client config "remote.cdn-1.url" >actual &&
+	git -C client config "remote.cdn-1.promisor" >>actual &&
+	git -C client config "remote.cdn-1.advertisedAs" >>actual &&
+	test_cmp expect actual &&
+
+	# Check that the original "cdn" remote was not overwritten.
+	printf "%s\n" "https://example.com/cdn" "true" >expect &&
+	git -C client config "remote.cdn.url" >actual &&
+	git -C client config "remote.cdn.promisor" >>actual &&
+	test_cmp expect actual &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with URL allowlisted and reusable remote" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	GIT_NO_LAZY_FETCH=0 git clone \
+		-c remote.cdn.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.cdn.url="$TRASH_DIRECTORY_URL/lop" \
+		-c promisor.acceptfromserver=None \
+		-c promisor.acceptFromServerUrl="cdn=$ENCODED_TRASH_DIRECTORY_URL/*" \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the existing "cdn" remote has been properly updated.
+	printf "%s\n" "$TRASH_DIRECTORY_URL/lop" "true" "lop" "+refs/heads/*:refs/remotes/lop/*" >expect &&
+	git -C client config "remote.cdn.url" >actual &&
+	git -C client config "remote.cdn.promisor" >>actual &&
+	git -C client config "remote.cdn.advertisedAs" >>actual &&
+	git -C client config "remote.cdn.fetch" >>actual &&
+	test_cmp expect actual &&
+
+	# Check that no new "cdn-1" remote has been created.
+	test_must_fail git -C client config "remote.cdn-1.url" &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with invalid promisor.acceptFromServerUrl" '
 	git -C server config promisor.advertise true &&
 	test_when_finished "rm -rf client" &&
@@ -472,6 +573,9 @@ test_expect_success "clone with invalid promisor.acceptFromServerUrl" '
 	# Check that a warning was emitted
 	test_grep "invalid remote name '\''bad name'\''" err &&
 
+	# Check that no remote was auto-created
+	test_must_fail git -C client config get --regexp "remote\..*\.advertisedas" &&
+
 	# Check that the largest object is not missing on the server
 	check_missing_objects server 0 "" &&
 
-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply related

* [PATCH v4 6/8] promisor-remote: trust known remotes matching acceptFromServerUrl
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder,
	Christian Couder
In-Reply-To: <20260527140820.1438165-1-christian.couder@gmail.com>

A previous commit introduced the `promisor.acceptFromServerUrl` config
variable along with the machinery to parse and validate the URL glob
patterns and optional remote name prefixes it contains. However, these
URL patterns are not yet tied into the client's acceptance logic.

When a promisor remote is already configured locally, its fields (like
authentication tokens) may occasionally need to be refreshed by the
server. If `promisor.acceptFromServer` is set to the secure default
("None"), these updates are rejected, potentially causing future
fetches to fail.

To enable such targeted updates for trusted URLs, let's use the URL
patterns from `promisor.acceptFromServerUrl` as an additional URL
based allowlist.

Concretely, let's check the advertised URLs against the URL glob
patterns by introducing a new small helper function called
url_matches_accept_list(), which iterates over the glob patterns and
returns the first matching allowed_url entry (or NULL).

The URL matching is done component by component: scheme and port are
compared exactly, the host and path are matched with wildmatch().
Before matching, the advertised URL is passed through url_normalize()
so that case variations in the scheme/host, percent-encoding tricks,
and ".." path segments cannot bypass the allowlist.

The username and password components of the URL are intentionally
ignored during matching to allow servers to rotate them, though using
the 'token' field of the capability is preferred over embedding
credentials in the URL.

Let's then use this helper in should_accept_remote() so that a known
remote whose URL matches the allowlist is accepted.

To prepare for this new logic, let's also:

 - Add an 'accept_urls' parameter to should_accept_remote().

 - Replace the BUG() guard in the ACCEPT_KNOWN_URL case with an
   explicit 'if (accept == ACCEPT_KNOWN_URL) return' and a new
   BUG() guard in the ACCEPT_NONE case.

 - Call accept_from_server_url() from filter_promisor_remote()
   and relax its early return so that the function is entered when
   `accept_urls` has entries even if `accept == ACCEPT_NONE`.

With this, many organizations may only need something like:

  git config set --global \
          promisor.acceptFromServerUrl "https://my-org.com/*"

to accept only their own remotes. And if they need to accept additional
remotes in some specific repos, they can also set:

  git config set promisor.acceptFromServer knownUrl

and configure the additional remote manually only in the repos where
they are needed.

Let's then properly document `promisor.acceptFromServerUrl` in
"promisor.adoc" as an additive security allowlist for known remotes,
including the URL normalization behavior and the component-wise
matching, and let's mention it in "gitprotocol-v2.adoc".

Also let's clarify in the documentation how
`promisor.acceptFromServerUrl` interacts with
`promisor.acceptFromServer`:

 - Precedence: when both options are set,
   `promisor.acceptFromServerUrl` is consulted first. If a matching
   pattern leads to acceptance, the remote is accepted regardless of
   `promisor.acceptFromServer`. Otherwise the decision is left to
   `promisor.acceptFromServer`.

 - URL-mismatch guard: even when the advertised URL matches the
   allowlist, an already-existing client-side remote whose configured
   URL differs from the advertised one is not accepted through
   `promisor.acceptFromServerUrl`. `promisor.acceptFromServer=all` and
   `=knownName` keep their pre-existing, looser semantics.

The precedence paragraph is intentionally scoped here to known remotes
only (field updates). A following commit that introduces auto-creation
of unknown remotes will extend it to cover that case as well.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.adoc    |  76 +++++++++++++++++++
 Documentation/gitprotocol-v2.adoc     |   9 ++-
 promisor-remote.c                     | 102 +++++++++++++++++++++++---
 t/t5710-promisor-remote-capability.sh |  71 ++++++++++++++++++
 4 files changed, 244 insertions(+), 14 deletions(-)

diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index b0fa43b839..605473c82f 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -51,6 +51,82 @@ promisor.acceptFromServer::
 	to "fetch" and "clone" requests from the client. Name and URL
 	comparisons are case sensitive. See linkgit:gitprotocol-v2[5].
 
+promisor.acceptFromServerUrl::
+	A glob pattern to specify which server-advertised URLs a
+	client is allowed to act on. When a URL matches, the client
+	will accept the advertised remote as a promisor remote and may
+	automatically accept field updates (such as authentication
+	tokens) from the server, even if `promisor.acceptFromServer`
+	is set to `none` (the default).
++
+This option can appear multiple times in config files. An advertised
+URL will be accepted if it matches _ANY_ glob pattern specified by
+this option in _ANY_ config file read by Git.
++
+When both `promisor.acceptFromServer` and `promisor.acceptFromServerUrl`
+are set, `promisor.acceptFromServerUrl` is consulted first and takes
+precedence: if a matching pattern leads to acceptance (by accepting
+field updates for a known remote whose URL matches both the local
+configuration and the allowlist), the advertised remote is accepted
+regardless of the `promisor.acceptFromServer` setting. If no pattern
+in `promisor.acceptFromServerUrl` triggers acceptance, the decision
+is left to `promisor.acceptFromServer`.
++
+Note however that, even when an advertised URL matches a pattern in
+`promisor.acceptFromServerUrl`, an already-existing remote on the
+client whose name matches the advertised name but whose configured URL
+differs from the advertised one will _NOT_ be accepted through
+`promisor.acceptFromServerUrl`. This prevents a server from silently
+re-pointing an existing client-side remote at a different URL. (Such a
+remote may still be accepted through `promisor.acceptFromServer=all`
+or `=knownName`, which have their own, looser semantics; see the
+documentation of that option.)
++
+Be _VERY_ careful with these patterns: `*` matches any sequence of
+characters within the 'host' and 'path' parts of a URL (but cannot
+cross part boundaries). An overly broad pattern is a major security
+risk, as a matching URL allows a server to update fields (such as
+authentication tokens) on known remotes without further confirmation.
+To minimize security risks, follow these guidelines:
++
+--
+1. Start with a secure protocol scheme, like `https://` or `ssh://`.
++
+2. Only allow domain names or paths where you control and trust _ALL_
+   the content. Be especially careful with shared hosting platforms
+   like `github.com` or `gitlab.com`. A broad pattern like
+   `https://gitlab.com/*` is dangerous because it trusts every
+   repository on the entire platform. Always restrict such patterns to
+   your specific organization or namespace (e.g.,
+   `https://gitlab.com/your-org/*`).
++
+3. Never use globs at the end of domain names. For example,
+   `https://cdn.your-org.com/*` might be safe, but
+   `https://cdn.your-org.com*/*` is a major security risk because
+   the latter matches `https://cdn.your-org.com.hacker.net/repo`.
++
+4. Be careful using globs at the beginning of domain names. While the
+   code ensures a `*` in the host cannot cross into the path, a
+   pattern like `https://*.example.com/*` will still match any
+   subdomain. This is extremely dangerous on shared hosting platforms
+   (e.g., `https://*.github.io/*` trusts every user's site on the
+   entire platform).
+--
++
+Before matching, both the advertised URL and the pattern are
+normalized: the scheme and host are lowercased, percent-encoded
+characters are decoded where possible, and path segments like `..`
+are resolved. The port must also match exactly (e.g.,
+`https://example.com:8080/*` will not match a URL advertised on
+port 9999). The username and password components of the URL are
+ignored during matching. Note that embedding credentials in URLs is
+discouraged. Passing authentication tokens via the `token` field of
+the `promisor-remote` capability is strongly preferred.
++
+For the security implications of accepting a promisor remote, see the
+documentation of `promisor.acceptFromServer`. For details on the
+protocol, see linkgit:gitprotocol-v2[5].
+
 promisor.checkFields::
 	A comma or space separated list of additional remote related
 	field names. A client checks if the values of these fields
diff --git a/Documentation/gitprotocol-v2.adoc b/Documentation/gitprotocol-v2.adoc
index befa697d21..2beb70595f 100644
--- a/Documentation/gitprotocol-v2.adoc
+++ b/Documentation/gitprotocol-v2.adoc
@@ -866,10 +866,11 @@ the server advertised, the client shouldn't advertise the
 
 On the server side, the "promisor.advertise" and "promisor.sendFields"
 configuration options can be used to control what it advertises. On
-the client side, the "promisor.acceptFromServer" configuration option
-can be used to control what it accepts, and the "promisor.storeFields"
-option, to control what it stores. See the documentation of these
-configuration options in linkgit:git-config[1] for more information.
+the client side, the "promisor.acceptFromServer" and
+"promisor.acceptFromServerUrl" configuration options can be used to
+control what it accepts, and the "promisor.storeFields" option, to
+control what it stores. See the documentation of these configuration
+options in linkgit:git-config[1] for more information.
 
 Note that in the future it would be nice if the "promisor-remote"
 protocol capability could be used by the server, when responding to
diff --git a/promisor-remote.c b/promisor-remote.c
index 8d4f6e0a72..04a5bb9939 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -14,6 +14,7 @@
 #include "url.h"
 #include "urlmatch.h"
 #include "version.h"
+#include "wildmatch.h"
 
 struct promisor_remote_config {
 	struct promisor_remote *promisors;
@@ -742,8 +743,79 @@ static void load_accept_from_server_url(struct repository *repo,
 	}
 }
 
+static bool match_pattern_url(const char *pat, size_t pat_len,
+			      const char *url, size_t url_len)
+{
+	char *p_str = xstrndup(pat, pat_len);
+	char *u_str = xstrndup(url, url_len);
+	bool res = !wildmatch(p_str, u_str, 0);
+
+	free(p_str);
+	free(u_str);
+
+	return res;
+}
+
+static bool match_one_url(const struct url_info *pi, const struct url_info *ui)
+{
+	const char *pat = pi->url;
+	const char *url = ui->url;
+
+	/*
+	 * Schemes must match exactly. They are case-folded by
+	 * url_normalize(), so strncmp() suffices.
+	 */
+	if (pi->scheme_len != ui->scheme_len || strncmp(pat, url, pi->scheme_len))
+		return false;
+
+	/*
+	 * Ports must match exactly. url_normalize() strips default
+	 * ports (like 443 for https), so length and content
+	 * comparisons are sufficient.
+	 */
+	if (pi->port_len != ui->port_len ||
+	    strncmp(pat + pi->port_off, url + ui->port_off, pi->port_len))
+		return false;
+
+	/*
+	 * Match host and path separately to prevent a '*' in the host
+	 * portion of the pattern from matching across the '/'
+	 * boundary into the path.
+	 */
+
+	return match_pattern_url(pat + pi->host_off, pi->host_len,
+				 url + ui->host_off, ui->host_len) &&
+		match_pattern_url(pat + pi->path_off, pi->path_len,
+				  url + ui->path_off, ui->path_len);
+}
+
+static struct allowed_url *url_matches_accept_list(
+		struct string_list *accept_urls, const char *url)
+{
+	struct string_list_item *item;
+	struct url_info url_info;
+
+	url_info.url = url_normalize(url, &url_info);
+
+	if (!url_info.url)
+		return NULL;
+
+	for_each_string_list_item(item, accept_urls) {
+		struct allowed_url *allowed = item->util;
+
+		if (match_one_url(&allowed->pattern_info, &url_info)) {
+			free(url_info.url);
+			return allowed;
+		}
+	}
+
+	free(url_info.url);
+	return NULL;
+}
+
 static int should_accept_remote(enum accept_promisor accept,
 				struct promisor_info *advertised,
+				struct string_list *accept_urls,
 				struct string_list *config_info)
 {
 	struct promisor_info *p;
@@ -756,23 +828,27 @@ static int should_accept_remote(enum accept_promisor accept,
 		    "this remote should have been rejected earlier",
 		    remote_name);
 
-	if (accept == ACCEPT_ALL)
-		return all_fields_match(advertised, config_info, NULL);
-
 	/* Get config info for that promisor remote */
 	item = string_list_lookup(config_info, remote_name);
 
-	if (!item)
+	if (!item) {
 		/* We don't know about that remote */
+		if (accept == ACCEPT_ALL)
+			return all_fields_match(advertised, config_info, NULL);
 		return 0;
+	}
 
 	p = item->util;
 
-	if (accept == ACCEPT_KNOWN_NAME)
+	/* Known remote in the allowlist? */
+	if (!strcmp(p->url, remote_url) && url_matches_accept_list(accept_urls, remote_url))
 		return all_fields_match(advertised, config_info, p);
 
-	if (accept != ACCEPT_KNOWN_URL)
-		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
+	if (accept == ACCEPT_ALL)
+		return all_fields_match(advertised, config_info, NULL);
+
+	if (accept == ACCEPT_KNOWN_NAME)
+		return all_fields_match(advertised, config_info, p);
 
 	if (strcmp(p->url, remote_url)) {
 		warning(_("known remote named '%s' but with URL '%s' instead of '%s', "
@@ -781,7 +857,13 @@ static int should_accept_remote(enum accept_promisor accept,
 		return 0;
 	}
 
-	return all_fields_match(advertised, config_info, p);
+	if (accept == ACCEPT_KNOWN_URL)
+		return all_fields_match(advertised, config_info, p);
+
+	if (accept != ACCEPT_NONE)
+		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
+
+	return 0;
 }
 
 static int skip_field_name_prefix(const char *elem, const char *field_name, const char **value)
@@ -991,7 +1073,7 @@ static void filter_promisor_remote(struct repository *repo,
 	/* Load and validate the acceptFromServerUrl config */
 	load_accept_from_server_url(repo, &accept_urls);
 
-	if (accept == ACCEPT_NONE)
+	if (accept == ACCEPT_NONE && !accept_urls.nr)
 		return;
 
 	/* Parse remote info received */
@@ -1011,7 +1093,7 @@ static void filter_promisor_remote(struct repository *repo,
 			string_list_sort(&config_info);
 		}
 
-		if (should_accept_remote(accept, advertised, &config_info)) {
+		if (should_accept_remote(accept, advertised, &accept_urls, &config_info)) {
 			if (!store_info)
 				store_info = store_info_new(repo);
 			if (promisor_store_advertised_fields(advertised, store_info))
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index 3b39505380..0659b2ac15 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -387,6 +387,77 @@ test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
 	check_missing_objects server 1 "$oid"
 '
 
+test_expect_success "clone with 'None' but URL allowlisted" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="$TRASH_DIRECTORY_URL/lop" \
+		-c promisor.acceptfromserver=None \
+		-c promisor.acceptFromServerUrl="$ENCODED_TRASH_DIRECTORY_URL/*" \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'None' but URL not in allowlist" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="$TRASH_DIRECTORY_URL/lop" \
+		-c promisor.acceptfromserver=None \
+		-c promisor.acceptFromServerUrl="https://example.com/*" \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
+test_expect_success "clone with 'None' but URL allowlisted in one pattern out of two" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="$TRASH_DIRECTORY_URL/lop" \
+		-c promisor.acceptfromserver=None \
+		-c promisor.acceptFromServerUrl="https://example.com/*" \
+		-c promisor.acceptFromServerUrl="$ENCODED_TRASH_DIRECTORY_URL/*" \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'None', URL allowlisted, but client has different URL" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	# The client configures "lop" with a different URL (serverTwo) than
+	# what the server advertises (lop). Even though the advertised URL
+	# matches the allowlist, the remote is rejected because the
+	# configured URL does not match the advertised one.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="$TRASH_DIRECTORY_URL/serverTwo" \
+		-c promisor.acceptfromserver=None \
+		-c promisor.acceptFromServerUrl="$ENCODED_TRASH_DIRECTORY_URL/*" \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with invalid promisor.acceptFromServerUrl" '
 	git -C server config promisor.advertise true &&
 	test_when_finished "rm -rf client" &&
-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply related

* [PATCH v4 5/8] promisor-remote: introduce promisor.acceptFromServerUrl
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder,
	Christian Couder
In-Reply-To: <20260527140820.1438165-1-christian.couder@gmail.com>

The "promisor-remote" protocol capability allows servers to advertise
promisor remotes, but doesn't allow these remotes to be automatically
configured on the client.

Let's introduce a new `promisor.acceptFromServerUrl` config variable
which contains a glob pattern, so that advertised remotes with a URL
matching that pattern will be automatically configured.

The glob pattern can optionally be prefixed with a remote name which
will be used as the name of the new local remote.

For now though, let's only introduce the functions to read and validate
the glob patterns and the optional prefixes.

Checking if the URLs of the advertised remotes match the glob patterns
and taking the appropriate action is left for a following commit.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c                     | 90 +++++++++++++++++++++++++++
 t/t5710-promisor-remote-capability.sh | 21 +++++++
 2 files changed, 111 insertions(+)

diff --git a/promisor-remote.c b/promisor-remote.c
index 138a412893..8d4f6e0a72 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "environment.h"
 #include "url.h"
+#include "urlmatch.h"
 #include "version.h"
 
 struct promisor_remote_config {
@@ -657,6 +658,90 @@ static bool has_control_char(const char *s)
 	return false;
 }
 
+struct allowed_url {
+	char *remote_name;
+	char *url_pattern;
+	struct url_info pattern_info;
+};
+
+static void allowed_url_free(void *util, const char *str UNUSED)
+{
+	struct allowed_url *allowed = util;
+
+	if (!allowed)
+		return;
+
+	/* Depending on prefix, free either remote_name or url_pattern */
+	free(allowed->remote_name ? allowed->remote_name : allowed->url_pattern);
+	free(allowed->pattern_info.url);
+	free(allowed);
+}
+
+static struct allowed_url *valid_accept_url(const char *url)
+{
+	char *dup, *p;
+	struct allowed_url *allowed;
+
+	if (!url)
+		return NULL;
+
+	dup = xstrdup(url);
+	p = strchr(dup, '=');
+	if (p) {
+		*p = '\0';
+		if (!valid_remote_name(dup)) {
+			warning(_("invalid remote name '%s' before '=' sign "
+				  "in '%s' from promisor.acceptFromServerUrl config"),
+				dup, url);
+			free(dup);
+			return NULL;
+		}
+		p++;
+	} else {
+		p = dup;
+	}
+
+	if (has_control_char(p)) {
+		warning(_("invalid url pattern '%s' "
+			  "in '%s' from promisor.acceptFromServerUrl config"), p, url);
+		free(dup);
+		return NULL;
+	}
+
+	allowed = xmalloc(sizeof(*allowed));
+	allowed->remote_name = (p == dup) ? NULL : dup;
+	allowed->url_pattern = p;
+	allowed->pattern_info.url = url_normalize_pattern(p, &allowed->pattern_info);
+	if (!allowed->pattern_info.url) {
+		warning(_("invalid url pattern '%s' "
+			  "in '%s' from promisor.acceptFromServerUrl config"), p, url);
+		free(dup);
+		free(allowed);
+		return NULL;
+	}
+
+	return allowed;
+}
+
+static void load_accept_from_server_url(struct repository *repo,
+					struct string_list *accept_urls)
+{
+	const struct string_list *config_urls;
+
+	if (!repo_config_get_string_multi(repo, "promisor.acceptfromserverurl", &config_urls)) {
+		struct string_list_item *item;
+
+		for_each_string_list_item(item, config_urls) {
+			struct allowed_url *allowed = valid_accept_url(item->string);
+			if (allowed) {
+				struct string_list_item *new;
+				new = string_list_append(accept_urls, item->string);
+				new->util = allowed;
+			}
+		}
+	}
+}
+
 static int should_accept_remote(enum accept_promisor accept,
 				struct promisor_info *advertised,
 				struct string_list *config_info)
@@ -901,6 +986,10 @@ static void filter_promisor_remote(struct repository *repo,
 	struct string_list_item *item;
 	bool reload_config = false;
 	enum accept_promisor accept = accept_from_server(repo);
+	struct string_list accept_urls = STRING_LIST_INIT_DUP;
+
+	/* Load and validate the acceptFromServerUrl config */
+	load_accept_from_server_url(repo, &accept_urls);
 
 	if (accept == ACCEPT_NONE)
 		return;
@@ -934,6 +1023,7 @@ static void filter_promisor_remote(struct repository *repo,
 		}
 	}
 
+	string_list_clear_func(&accept_urls, allowed_url_free);
 	promisor_info_list_clear(&config_info);
 	string_list_clear(&remote_info, 0);
 	store_info_free(store_info);
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index bf1cc54605..3b39505380 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -387,6 +387,27 @@ test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
 	check_missing_objects server 1 "$oid"
 '
 
+test_expect_success "clone with invalid promisor.acceptFromServerUrl" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	# As "bad name" contains a space, which is not a valid remote name,
+	# the pattern should be rejected with a warning and no remote created.
+	GIT_NO_LAZY_FETCH=0 git clone \
+		-c promisor.acceptfromserver=None \
+		-c "promisor.acceptFromServerUrl=bad name=https://example.com/*" \
+		--no-local --filter="blob:limit=5k" server client 2>err &&
+
+	# Check that a warning was emitted
+	test_grep "invalid remote name '\''bad name'\''" err &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.sendFields" '
 	git -C server config promisor.advertise true &&
 	test_when_finished "rm -rf client" &&
-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply related

* [PATCH v4 4/8] promisor-remote: add 'local_name' to 'struct promisor_info'
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder,
	Christian Couder
In-Reply-To: <20260527140820.1438165-1-christian.couder@gmail.com>

In a following commit, we will store promisor remote information under
a remote name different than the one the server advertised.

To prepare for this change, let's add a new 'char *local_name' member
to 'struct promisor_info', and let's update the related functions.

While at it, let's also add a small promisor_info_local_name() helper
that returns `local_name` when set, `name` otherwise, and let's use
this small helper in promisor_store_advertised_fields() and in the
post-loop of filter_promisor_remote() so that lookups against the local
repo configuration use the right name.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 38fa050542..138a412893 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -434,13 +434,14 @@ static struct string_list *fields_stored(void)
  * Struct for promisor remotes involved in the "promisor-remote"
  * protocol capability.
  *
- * Except for "name", each <member> in this struct and its <value>
- * should correspond (either on the client side or on the server side)
- * to a "remote.<name>.<member>" config variable set to <value> where
- * "<name>" is a promisor remote name.
+ * Except for "name" and "local_name", each <member> in this struct
+ * and its <value> should correspond (either on the client side or on
+ * the server side) to a "remote.<name>.<member>" config variable set
+ * to <value> where "<name>" is a promisor remote name.
  */
 struct promisor_info {
-	const char *name;
+	const char *name;	/* name the server advertised */
+	const char *local_name;	/* name used locally (may be auto-generated) */
 	const char *url;
 	const char *filter;
 	const char *token;
@@ -449,6 +450,7 @@ struct promisor_info {
 static void promisor_info_free(struct promisor_info *p)
 {
 	free((char *)p->name);
+	free((char *)p->local_name);
 	free((char *)p->url);
 	free((char *)p->filter);
 	free((char *)p->token);
@@ -462,6 +464,11 @@ static void promisor_info_list_clear(struct string_list *list)
 	string_list_clear(list, 0);
 }
 
+static const char *promisor_info_local_name(struct promisor_info *p)
+{
+	return p->local_name ? p->local_name : p->name;
+}
+
 static void set_one_field(struct promisor_info *p,
 			  const char *field, const char *value)
 {
@@ -829,7 +836,7 @@ static bool promisor_store_advertised_fields(struct promisor_info *advertised,
 {
 	struct promisor_info *p;
 	struct string_list_item *item;
-	const char *remote_name = advertised->name;
+	const char *remote_name = promisor_info_local_name(advertised);
 	bool reload_config = false;
 
 	if (!(store_info->store_filter || store_info->store_token))
@@ -937,7 +944,8 @@ static void filter_promisor_remote(struct repository *repo,
 	/* Apply accepted remotes to the stable repo state */
 	for_each_string_list_item(item, accepted_remotes) {
 		struct promisor_info *info = item->util;
-		struct promisor_remote *r = repo_promisor_remote_find(repo, info->name);
+		const char *remote_name = promisor_info_local_name(info);
+		struct promisor_remote *r = repo_promisor_remote_find(repo, remote_name);
 
 		if (r) {
 			r->accepted = 1;
-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply related

* [PATCH v4 3/8] urlmatch: add url_normalize_pattern() helper
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder,
	Christian Couder
In-Reply-To: <20260527140820.1438165-1-christian.couder@gmail.com>

In a following commit, we will need to normalize a URL glob pattern
(which may contain '*' in the host portion) and extract its component
offsets (host, path, etc.) for separate matching. Let's export a
dedicated helper function url_normalize_pattern() for that purpose.

It works like url_normalize(), but passes allow_globs=true to the
internal url_normalize_1(), so that '*' characters in the host are
accepted rather than rejected.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 urlmatch.c |  5 +++++
 urlmatch.h | 12 ++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/urlmatch.c b/urlmatch.c
index b2d88a5289..20bc2d009c 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -441,6 +441,11 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return url_normalize_1(url, out_info, false);
 }
 
+char *url_normalize_pattern(const char *url, struct url_info *out_info)
+{
+	return url_normalize_1(url, out_info, true);
+}
+
 char *url_parse(const char *url_orig, struct url_info *out_info)
 {
 	struct strbuf url;
diff --git a/urlmatch.h b/urlmatch.h
index 6b3ce42858..db1a335e72 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -37,6 +37,18 @@ struct url_info {
 char *url_normalize(const char *, struct url_info *);
 char *url_parse(const char *, struct url_info *);
 
+/*
+ * Like url_normalize(), but also allows '*' glob characters in the host
+ * portion. Use this when normalizing URL patterns from user configuration.
+ *
+ * Note that '*' is a valid path character per RFC 3986 (as a sub-delim),
+ * so glob patterns using '*' in the path are also accepted.
+ *
+ * Returns a newly allocated normalized string and fills out_info if
+ * non-NULL, or NULL if the pattern is invalid.
+ */
+char *url_normalize_pattern(const char *url, struct url_info *out_info);
+
 struct urlmatch_item {
 	size_t hostmatch_len;
 	size_t pathmatch_len;
-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply related

* [PATCH v4 2/8] urlmatch: change 'allow_globs' arg to bool
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder,
	Christian Couder
In-Reply-To: <20260527140820.1438165-1-christian.couder@gmail.com>

The last argument of url_normalize_1() is `char allow_globs` but it is
used as a boolean, not as a char.

Let's convert it to a `bool`, and while at it convert the two calls to
url_normalize_1() so they pass 'true' or 'false' instead of '1' or '0'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 urlmatch.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index bf8cce6de9..b2d88a5289 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -112,7 +112,7 @@ static int match_host(const struct url_info *url_info,
 	return (!url_len && !pat_len);
 }
 
-static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
+static char *url_normalize_1(const char *url, struct url_info *out_info, bool allow_globs)
 {
 	/*
 	 * Normalize NUL-terminated url using the following rules:
@@ -438,7 +438,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 
 char *url_normalize(const char *url, struct url_info *out_info)
 {
-	return url_normalize_1(url, out_info, 0);
+	return url_normalize_1(url, out_info, false);
 }
 
 char *url_parse(const char *url_orig, struct url_info *out_info)
@@ -704,7 +704,7 @@ int urlmatch_config_entry(const char *var, const char *value,
 		struct url_info norm_info;
 
 		config_url = xmemdupz(key, dot - key);
-		norm_url = url_normalize_1(config_url, &norm_info, 1);
+		norm_url = url_normalize_1(config_url, &norm_info, true);
 		if (norm_url)
 			retval = match_urls(url, &norm_info, &matched);
 		else if (collect->fallback_match_fn)
-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply related

* [PATCH v4 1/8] t5710: simplify 'mkdir X' followed by 'git -C X init'
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder,
	Christian Couder
In-Reply-To: <20260527140820.1438165-1-christian.couder@gmail.com>

It's simpler and more efficient to just use `git init client` instead
of `mkdir client && git -C client init`.

So let's replace the latter with the former.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t5710-promisor-remote-capability.sh | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index b404ad9f0a..bf1cc54605 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -177,8 +177,7 @@ test_expect_success "init + fetch with promisor.advertise set to 'true'" '
 	git -C server config promisor.advertise true &&
 	test_when_finished "rm -rf client" &&
 
-	mkdir client &&
-	git -C client init &&
+	git init client &&
 	git -C client config remote.lop.promisor true &&
 	git -C client config remote.lop.fetch "+refs/heads/*:refs/remotes/lop/*" &&
 	git -C client config remote.lop.url "$TRASH_DIRECTORY_URL/lop" &&
@@ -231,8 +230,7 @@ test_expect_success "init + fetch two promisors but only one advertised" '
 	# Create a promisor that will be configured but not be used
 	git init --bare unused_lop &&
 
-	mkdir client &&
-	git -C client init &&
+	git init client &&
 	git -C client config remote.unused_lop.promisor true &&
 	git -C client config remote.unused_lop.fetch "+refs/heads/*:refs/remotes/unused_lop/*" &&
 	git -C client config remote.unused_lop.url "$TRASH_DIRECTORY_URL/unused_lop" &&
-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply related

* [PATCH v4 0/8] Auto-configure advertised remotes via URL allowlist
From: Christian Couder @ 2026-05-27 14:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
	Elijah Newren, Toon Claes, Kristoffer Haugsbakk, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>

Currently, the "promisor-remote" protocol capability allows a server
to advertise promisor remotes (and their tokens/filters), but the
client's `promisor.acceptFromServer` mechanism requires these remotes
to already exist in the config.

This is a significant burden for users and administrators who have to
pre-configure remotes.

This patch series improves on this by introducing a new
`promisor.acceptFromServerUrl` config option, which provides an
additive, URL-based security allowlist.

Multiple `promisor.acceptFromServerUrl` config options can be provided
in different config files. Each one should contain a URL glob pattern
which can optionally be prefixed with a remote name in the
"[<name>=]<pattern>" format.

The goal is for something like a simple:

  git config set --global promisor.acceptFromServerUrl "https://my-org.com/*"

to be all that is needed for internal work in many organizations. 

With this new config option:

 - The server can update fields (like tokens) for known remotes,
   provided their URL matches the allowlist, even if
   `acceptFromServer` is set to `None`.

 - Unknown remotes advertised by the server can be automatically
   configured on the client if their URL matches the allowlist.

 - If there is no `<name>` prefix before the glob pattern matched, the
   auto-configured remote is named using the
   "promisor-auto-<sanitized-url>" format. So the same auto-configured
   remote config entry will be reused for the same URL.

 - If a `<name>` prefix is provided, it will be used for the
   auto-configured remote config entry.

 - If the chosen name (auto-generated or prefixed) already exists but
   points to a different URL, overwriting the existing config is
   prevented by appending a numeric suffix (e.g., -1, -2) to the name
   and auto-configuring using that name.

 - The server's originally advertised name is always saved in the
   `remote.<name>.advertisedAs` config variable of the auto-configured
   remote for tracing and debugging.

Security considerations:

 - Advertised URLs and glob patterns are routed through
   url_normalize() / url_normalize_pattern() before matching, to
   prevent percent-encoding, case variation, or path-traversal (..)
   bypasses.

 - URL matching is done component by component: scheme and port
   must match exactly (no wildcards), the host is matched with
   WM_PATHNAME so a '*' cannot cross the '/' boundary into the
   path, and the path is matched without WM_PATHNAME so '*' can
   still span multi-level paths.

 - Auto-generated remote names are sanitized (non-alphanumeric
   characters are replaced with '-', runs of '-' are collapsed)
   and prefixed with 'promisor-auto-'. User-supplied names (from
   the 'name=<pattern>' syntax) are validated with
   valid_remote_name(). Together, these prevent a server from
   maliciously overwriting standard remotes (like 'origin').

 - If the auto-generated or user-supplied name collides with an
   existing remote configured to a different URL, a numeric
   suffix ('-1', '-2', ...) is appended, up to a bounded limit,
   so a server cannot hijack an existing remote by name.

 - Known remotes are still subject to URL consistency checks:
   even if an advertised URL matches the allowlist, it is only
   accepted for a known remote if it matches the URL already
   configured locally for that remote.

 - The documentation explains in detail how to write secure glob
   patterns in `promisor.acceptFromServerUrl`, and highlights the
   risks of overly broad patterns on shared hosting platforms.

High level description of the patches
=====================================

 - Patch 1/8 is a very small preparatory patch that simplifies some
   tests a bit.

 - Patches 2/8 and 3/8 expose and adapt a url_normalize_pattern()
   helper function in the urlmatch API.

 - Patch 4/8 adapts `struct promisor_info` by adding a new
   `local_name` member to it to prepare for the next patches.

 - Patches 5/8 to 7/8 implement the core feature. They introduce the
   parsing machinery, add the additive allowlist for known remotes
   (with url_normalize() security), and finally implement the
   auto-creation and collision resolution for unknown remotes.

 - Patch 8/8 cleans up and modernizes the existing
   `promisor.acceptFromServer` documentation.

Changes compared to v3
======================

Thanks to Toon, Kristoffer, Patrick and Junio for reviewing the
previous versions of this series and of the preparatory series.

This has been rebased onto master @ 56a4f3c3a2 (The 8th batch,
2026-05-25) to avoid a trivial conflict in "urlmatch.c".

Only minor changes have been made since v3, in the following patches:

 - Patch 4/8 ("promisor-remote: add 'local_name' to 'struct
   promisor_info'"):

   - The promisor_info_internal_name() function has been renamed
     promisor_info_local_name() for clarity.

   - A `const char *local` local variable has been renamed
     `remote_name` for consistency with another similar variable.

 - Patch 6/8 ("promisor-remote: trust known remotes matching
   acceptFromServerUrl"):

   - A spurious comma in the commit message has been deleted.

   - The `promisor.acceptFromServerUrl` documentation in
     "Documentation/config/promisor.adoc" now uses `--` to separate a
     numbered list from the surrounding paragraphs.

CI tests
========

They all pass, see:

https://github.com/chriscool/git/actions/runs/26514308470

Range diff since v3
===================

1:  ab231c0896 = 1:  9fcc7d9d5e t5710: simplify 'mkdir X' followed by 'git -C X init'
2:  b3e66f329f ! 2:  ec558c2b9c urlmatch: change 'allow_globs' arg to bool
    @@ urlmatch.c: static char *url_normalize_1(const char *url, struct url_info *out_i
     +	return url_normalize_1(url, out_info, false);
      }
      
    - static size_t url_match_prefix(const char *url,
    + char *url_parse(const char *url_orig, struct url_info *out_info)
     @@ urlmatch.c: int urlmatch_config_entry(const char *var, const char *value,
      		struct url_info norm_info;
      
3:  813d748dd6 ! 3:  79ee353449 urlmatch: add url_normalize_pattern() helper
    @@ urlmatch.c: char *url_normalize(const char *url, struct url_info *out_info)
     +	return url_normalize_1(url, out_info, true);
     +}
     +
    - static size_t url_match_prefix(const char *url,
    - 			       const char *url_prefix,
    - 			       size_t url_prefix_len)
    + char *url_parse(const char *url_orig, struct url_info *out_info)
    + {
    + 	struct strbuf url;
     
      ## urlmatch.h ##
     @@ urlmatch.h: struct url_info {
    - 
      char *url_normalize(const char *, struct url_info *);
    + char *url_parse(const char *, struct url_info *);
      
     +/*
     + * Like url_normalize(), but also allows '*' glob characters in the host
4:  e92863bee8 ! 4:  037fd46ac7 promisor-remote: add 'local_name' to 'struct promisor_info'
    @@ Commit message
         To prepare for this change, let's add a new 'char *local_name' member
         to 'struct promisor_info', and let's update the related functions.
     
    -    While at it, let's also add a small promisor_info_internal_name()
    -    helper that returns `local_name` when set, `name` otherwise, and let's
    -    use this small helper in promisor_store_advertised_fields() and in the
    +    While at it, let's also add a small promisor_info_local_name() helper
    +    that returns `local_name` when set, `name` otherwise, and let's use
    +    this small helper in promisor_store_advertised_fields() and in the
         post-loop of filter_promisor_remote() so that lookups against the local
         repo configuration use the right name.
     
    @@ promisor-remote.c: static void promisor_info_list_clear(struct string_list *list
      	string_list_clear(list, 0);
      }
      
    -+static const char *promisor_info_internal_name(struct promisor_info *p)
    ++static const char *promisor_info_local_name(struct promisor_info *p)
     +{
     +	return p->local_name ? p->local_name : p->name;
     +}
    @@ promisor-remote.c: static bool promisor_store_advertised_fields(struct promisor_
      	struct promisor_info *p;
      	struct string_list_item *item;
     -	const char *remote_name = advertised->name;
    -+	const char *remote_name = promisor_info_internal_name(advertised);
    ++	const char *remote_name = promisor_info_local_name(advertised);
      	bool reload_config = false;
      
      	if (!(store_info->store_filter || store_info->store_token))
    @@ promisor-remote.c: static void filter_promisor_remote(struct repository *repo,
      	for_each_string_list_item(item, accepted_remotes) {
      		struct promisor_info *info = item->util;
     -		struct promisor_remote *r = repo_promisor_remote_find(repo, info->name);
    -+		const char *local = promisor_info_internal_name(info);
    -+		struct promisor_remote *r = repo_promisor_remote_find(repo, local);
    ++		const char *remote_name = promisor_info_local_name(info);
    ++		struct promisor_remote *r = repo_promisor_remote_find(repo, remote_name);
      
      		if (r) {
      			r->accepted = 1;
5:  7e1b106404 = 5:  532adb7ca9 promisor-remote: introduce promisor.acceptFromServerUrl
6:  f00eed4bf2 ! 6:  b970f5647c promisor-remote: trust known remotes matching acceptFromServerUrl
    @@ Commit message
         the 'token' field of the capability is preferred over embedding
         credentials in the URL.
     
    -    Let's then use this helper in should_accept_remote() so that, a known
    +    Let's then use this helper in should_accept_remote() so that a known
         remote whose URL matches the allowlist is accepted.
     
         To prepare for this new logic, let's also:
    @@ Documentation/config/promisor.adoc: promisor.acceptFromServer::
     +authentication tokens) on known remotes without further confirmation.
     +To minimize security risks, follow these guidelines:
     ++
    ++--
     +1. Start with a secure protocol scheme, like `https://` or `ssh://`.
     ++
     +2. Only allow domain names or paths where you control and trust _ALL_
    @@ Documentation/config/promisor.adoc: promisor.acceptFromServer::
     +   subdomain. This is extremely dangerous on shared hosting platforms
     +   (e.g., `https://*.github.io/*` trusts every user's site on the
     +   entire platform).
    ++--
     ++
     +Before matching, both the advertised URL and the pattern are
     +normalized: the scheme and host are lowercased, percent-encoded
7:  af06fb31db ! 7:  1875228a7b promisor-remote: auto-configure unknown remotes
    @@ Documentation/config/promisor.adoc: documentation of that option.)
     +without further confirmation. To minimize security risks, follow these
     +guidelines:
      +
    + --
      1. Start with a secure protocol scheme, like `https://` or `ssh://`.
    - +
     @@ Documentation/config/promisor.adoc: ignored during matching. Note that embedding credentials in URLs is
      discouraged. Passing authentication tokens via the `token` field of
      the `promisor-remote` capability is strongly preferred.
8:  92075d88d8 = 8:  351ece0b90 doc: promisor: improve acceptFromServer entry


Christian Couder (8):
  t5710: simplify 'mkdir X' followed by 'git -C X init'
  urlmatch: change 'allow_globs' arg to bool
  urlmatch: add url_normalize_pattern() helper
  promisor-remote: add 'local_name' to 'struct promisor_info'
  promisor-remote: introduce promisor.acceptFromServerUrl
  promisor-remote: trust known remotes matching acceptFromServerUrl
  promisor-remote: auto-configure unknown remotes
  doc: promisor: improve acceptFromServer entry

 Documentation/config/promisor.adoc    | 148 +++++++--
 Documentation/config/remote.adoc      |   9 +
 Documentation/gitprotocol-v2.adoc     |   9 +-
 promisor-remote.c                     | 413 ++++++++++++++++++++++++--
 t/t5710-promisor-remote-capability.sh | 202 ++++++++++++-
 urlmatch.c                            |  11 +-
 urlmatch.h                            |  12 +
 7 files changed, 756 insertions(+), 48 deletions(-)

-- 
2.54.0.275.g96c817d129.dirty


^ permalink raw reply

* [PATCH 2/2] commit: remove deprecated functions
From: kristofferhaugsbakk @ 2026-05-27 13:59 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Patrick Steinhardt
In-Reply-To: <CV_commit.h_remove_deprecated.714@msgid.xyz>

From: Kristoffer Haugsbakk <code@khaugsbakk.name>

These functions were deprecated in a series of commits merged in
52882024 (Merge branch 'ps/commit-list-functions-renamed', 2026-02-13).

The compatibility was for in-flight topics at the time.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---
 commit.h | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/commit.h b/commit.h
index 58150045afa..5352056f87a 100644
--- a/commit.h
+++ b/commit.h
@@ -203,25 +203,6 @@ struct commit_list *commit_list_reverse(struct commit_list *list);
 
 void commit_list_free(struct commit_list *list);
 
-/*
- * Deprecated compatibility functions for `struct commit_list`, to be removed
- * once Git 2.53 is released.
- */
-static inline struct commit_list *copy_commit_list(struct commit_list *l)
-{
-	return commit_list_copy(l);
-}
-
-static inline struct commit_list *reverse_commit_list(struct commit_list *l)
-{
-	return commit_list_reverse(l);
-}
-
-static inline void free_commit_list(struct commit_list *l)
-{
-	commit_list_free(l);
-}
-
 struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
 
 const char *repo_logmsg_reencode(struct repository *r,
-- 
2.54.0.6.gf6fa7dd4140


^ permalink raw reply related

* [PATCH 1/2] *: replace deprecated free_commit_list
From: kristofferhaugsbakk @ 2026-05-27 13:59 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Patrick Steinhardt
In-Reply-To: <CV_commit.h_remove_deprecated.714@msgid.xyz>

From: Kristoffer Haugsbakk <code@khaugsbakk.name>

Replace `free_commit_list` with `commit_list_free`. The former was
deprecated in 9f18d089 (commit: rename `free_commit_list()` to conform
to coding guidelines, 2026-01-15).

This allows us to remove all the deprecated functions in the
next commit:

• `copy_commit_list`
• `reverse_commit_list`
• `free_commit_list`

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---
 builtin/history.c | 4 ++--
 replay.c          | 2 +-
 upload-pack.c     | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/history.c b/builtin/history.c
index 0fc06fb2045..091465a59e2 100644
--- a/builtin/history.c
+++ b/builtin/history.c
@@ -284,7 +284,7 @@ static int setup_revwalk(struct repository *repo,
 
 		commit_list_insert(original, &from_list);
 		ret = repo_is_descendant_of(repo, head, from_list);
-		free_commit_list(from_list);
+		commit_list_free(from_list);
 
 		if (ret < 0) {
 			ret = error(_("cannot determine descendance"));
@@ -892,7 +892,7 @@ static int split_commit(struct repository *repo,
 	if (index_file.len)
 		unlink(index_file.buf);
 	strbuf_release(&index_file);
-	free_commit_list(parents);
+	commit_list_free(parents);
 	release_index(&index);
 	return ret;
 }
diff --git a/replay.c b/replay.c
index 4ef8abb6077..da531d5bc68 100644
--- a/replay.c
+++ b/replay.c
@@ -120,7 +120,7 @@ static struct commit *create_commit(struct repository *repo,
 out:
 	repo_unuse_commit_buffer(repo, based_on, message);
 	free_commit_extra_headers(extra);
-	free_commit_list(parents);
+	commit_list_free(parents);
 	strbuf_release(&msg);
 	free(author);
 	return (struct commit *)obj;
diff --git a/upload-pack.c b/upload-pack.c
index 9f6d6fe48c8..2bf450ab288 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -886,7 +886,7 @@ static void deepen(struct upload_pack_data *data, int depth)
 					     data->deepen_relative, depth,
 					     SHALLOW, NOT_SHALLOW);
 		send_shallow(data, result);
-		free_commit_list(result);
+		commit_list_free(result);
 	}
 
 	send_unshallow(data);
@@ -900,7 +900,7 @@ static void deepen_by_rev_list(struct upload_pack_data *data,
 	disable_commit_graph(the_repository);
 	result = get_shallow_commits_by_rev_list(argv, SHALLOW, NOT_SHALLOW);
 	send_shallow(data, result);
-	free_commit_list(result);
+	commit_list_free(result);
 	send_unshallow(data);
 }
 
-- 
2.54.0.6.gf6fa7dd4140


^ permalink raw reply related

* [PATCH 0/2] commit: remove deprecated functions
From: kristofferhaugsbakk @ 2026-05-27 13:59 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Patrick Steinhardt

From: Kristoffer Haugsbakk <code@khaugsbakk.name>

Topic name: kh/commit-deprecated

Topic summary: Remove deprecated comments that were slated for removal
after Git 2.53.0.

See the comment:

    /*
     * Deprecated compatibility functions for `struct commit_list`, to be removed
     * once Git 2.53 is released.
     */

I merged in `seen` and `next` yesterday and found no new in-flight usages
of these functions.

I commented on this patch but apparently it hasn’t hit any of these
integration branches yet:

Patch: replay: support replaying 2-parent merges
Link: https://lore.kernel.org/git/920cc022-8b63-4dbb-a41d-957ee01a5efd@app.fastmail.com/

[1/2] *: replace deprecated free_commit_list
[2/2] commit: remove deprecated functions

 builtin/history.c |  4 ++--
 commit.h          | 19 -------------------
 replay.c          |  2 +-
 upload-pack.c     |  4 ++--
 4 files changed, 5 insertions(+), 24 deletions(-)


base-commit: 56a4f3c3a221adf1df9b39da69b8a6890f803157
-- 
2.54.0.6.gf6fa7dd4140


^ permalink raw reply

* git-maintenance detach timing, was Re: I discovered a minor issue with `git fetch`.
From: Jeff King @ 2026-05-27 10:56 UTC (permalink / raw)
  To: SURA; +Cc: git
In-Reply-To: <CAD6AYr9YmcnkdW=Nx=HUKcuaNbv1ukrAbXRnKyGibCQDy8N3hQ@mail.gmail.com>

On Fri, May 22, 2026 at 03:45:25PM +0800, SURA wrote:

> A zombie process has appeared. It appears to originate from a `fetch`
> subprocess that terminates very quickly; despite several attempts, I
> have been unable to successfully capture it.
> 
> This issue was discovered within a legacy service. A few days after
> upgrading to Git 2.53.0, the system's PID resources were exhausted by
> zombie processes. This is likely the result of recent changes, as this
> problem did not exist in earlier versions (2.4x).
> 
> To be honest, this is not an urgent matter; I have already deployed
> `tini` as the init process (PID 1) to prevent the service from
> becoming unavailable.

Yeah, I agree that you need some kind of zombie-reaping init process.
But I did wonder if we might have started generating more zombies here.

I did a little poking around with "strace -p 1". I didn't see extra
fetch processes, but I did see a lot of zombie git-maintenance processes
getting reaped. Which makes sense; by default we run background
maintenance with --detach. We can't ever reap that ourselves, since the
whole point is that it might outlive the parent fetch.

Once upon a time, we used to run "git gc --auto", and it would check
whether gc was needed (using a simple count of objects and packs) before
detaching. So in most cases it would realize there was nothing to be
done and exit immediately without daemonizing, and would get reaped by
git-fetch.

We switched to running "git maintenance" in v2.29. But it didn't yet
have a detached mode; it just run "git gc --detach" under the hood, so
the behavior was roughly the same (gc was reaped by maintenance which
was reaped by fetch).

Later, git-maintenance learned its own --detach flag, as of v2.47. But
unlike gc, it detaches immediately, and then each sub-task decides if it
needs to be run or not. So every "git fetch" will generate a detached
maintenance process that then gets reaped by init.

And if you moved from a pre-v2.47 version, then you'd see an increase in
such processes.

I think this is probably OK in practice. It is an extra fork that git-gc
never incurred, but as long as you have a functioning init process, they
won't accumulate.

I do wonder if git-maintenance could be more like git-gc here. Its
notion of tasks is more abstract, but it could in theory ask each task
"do you need to run?" and if they all say "no", then it can quit without
detaching. That would save an extra fork() for every noop
auto-maintenance call. I don't know how measurable that is in practice.
Or even how easy it is for each task to do such a check. Something like
"prefetch" is kind of all-or-nothing; you find out whether it needs
doing by doing it.

-Peff

^ permalink raw reply

* Re: [PATCH] fetch: pass transport to post-fetch connectivity check
From: Jeff King @ 2026-05-27 10:39 UTC (permalink / raw)
  To: Kristofer Karlsson; +Cc: Kristofer Karlsson via GitGitGadget, git
In-Reply-To: <CAL71e4MrVqC1=AR6x0_8S=8kVqPdDkhgCZRb4etFsxTzd6s_8Q@mail.gmail.com>

On Wed, May 27, 2026 at 12:04:19PM +0200, Kristofer Karlsson wrote:

> You're right. I dug into this further and realized the problem is deeper
> than just the flag not being set in builtin/fetch.c.
> 
> Even if we add:
> transport->smart_options->check_self_contained_and_connected = 1;
> to prepare_transport(), the optimization still won't work for fetches.
> 
> The optimization is fundamentally clone-only.

I have wondered if the transport could do the same thing for:

  git init
  git fetch ...

When we do not send any "want", then we'd expect the pack we receive to
be self-contained.

But in practice that is not that exciting, as it is a special case that
does not come up that often.

I suspect there's some hybrid mode where we could save some work. It is
easy for index-pack to come up with a list of "edges" from the pack it
got that point outside of the pack. We just need to know that those
edges are reachable from existing refs. So really we could be checking
the connectivity of those edges, rather than the actual ref tips.

Would that be less work? I'm not sure. It saves walking over the
newly-fetched history, but in practice that is probably not that
expensive. It potentially saves a lot when the edge is a ref tip; for a
true fast-forward we'd see a ref going from A..B, and if index-pack
tells us that it just needs A, we can skip the traversal entirely.

But index-pack isn't really thinking in terms of commits, but rather the
whole object graph. So you're going to find that commit A is needed, but
also all of the tree entries in the existing history that weren't
touched by the new history (e.g., B touched path "foo" but not "bar", so
it gets a new top-level tree, a new blob for "foo", but still references
the existing blob for "bar"). I guess you could speculatively load A^{tree}
to cull the list.

So I dunno. I think there is some room for speedup here, and in many
common cases you could skip the rev-list invocation entirely. But it's
not trivial, and I think is far afield from what your patch was
originally trying to do. ;)

> I was unable to reproduce the benchmark numbers from my original commit
> message.

Yeah, I wondered where the numbers came from. It is very easy to fool
yourself with fetch benchmarks, because even "fetch --dry-run" will
transfer objects, and under the hood we try to optimize out as much
object transfer as possible. So you really have to start from the exact
same on-disk state for each trial.

> The patch as submitted is indeed inert for non-clone fetches.
> It looked like a simple improvement, but it's clear that it was incorrect.
> I'll drop it, and I apologize for the noise here.

No problem. You've been generating some interesting optimization work
lately, so I can't complain. :)

I'll probably have more comments on your other topics, but I'm out of
time for tonight.

-Peff

^ permalink raw reply

* Re: [PATCH 0/8] pack-bitmap-write: speed up bitmap generation
From: Jeff King @ 2026-05-27 10:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee
In-Reply-To: <cover.1779207127.git.me@ttaylorr.com>

On Tue, May 19, 2026 at 12:12:33PM -0400, Taylor Blau wrote:

> This series improves the performance of reachability bitmap generation,
> focusing on very large repositories and the penalty to generate
> pseudo-merge reachability bitmaps.

Very nice numbers. I'm especially excited about patches 1-6, which are
really just speeding up bitmap generation in general, pseudo-merges
aside. I think you could even have split those off into their own
series, and then built the final two as a separate topic, but I am happy
either way.

I looked through the patches and had nothing to complain about.

-Peff

^ permalink raw reply

* Re: [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps
From: Jeff King @ 2026-05-27 10:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Derrick Stolee
In-Reply-To: <30ce254312cfee2a2a82f08246c3a2546ae32578.1779207127.git.me@ttaylorr.com>

On Tue, May 19, 2026 at 12:12:55PM -0400, Taylor Blau wrote:

> Each selected commit starts with one commit_mask bit in its "commit
> mask" bitmap. Then, we walk the first-parent history in topological
> order and OR each commit's mask into its (first) parent. Whenever that
> OR results in the parent having more bits set, the child is deemed to be
> non-maximal, and the frontier is pushed further back along the first
> parent history.
> 
> That approach works extremely well for ordinary selected commits, whose
> first-parent histories often describe real sharing between the bitmaps
> we are going to write.
> 
> It struggles, however, to efficiently generate pseudo-merge bitmaps.
> Unlike ordinary commits for which the above algorithm is designed,
> pseudo-merges don't represent any "real" commit in history, just a
> grouping of non-bitmapped reference tips. In that sense, their first
> parent is just a part of a larger set, and treating them like ordinary
> selected commits imposes a significant slow-down when generating bitmaps
> with pseudo-merges enabled.

This is a great explanation of the problem, and especially this:

> In other words, we pay a nearly ~5 minute penalty to generate
> pseudo-merge bitmaps, but only save ~50 seconds during traversal.

makes it clear that we're doing something sub-optimal. And it points us
in the right direction, since that traversal should be able to generate
the pseudo-merge bitmap we need in the first place! So that should be
our goal to work towards.

> Instead, build the regular selected commit bitmaps first, considering
> only non-pseudo-merge commits in `bitmap_builder_init()`. Once those
> bitmaps have been stored, build each pseudo-merge bitmap separately and
> attach its parent and object bitmaps to the corresponding pseudo-merge
> entry before writing the extension.

And then this solution follows naturally from the earlier explanations.
Good.

In some ways this goes back to the pre-v2.31 way of generating bitmaps,
which is to just traverse for each bitmap independently. But as you
note, the whole idea of pseudo-merge bitmaps is that they aren't
overlapping in any meaningful way. So doing one fill-in traversal per
pseudo-merge makes sense, and hopefully we hit enough real bitmaps that
it's not too costly.

> As a result, the overhead cost for generating pseudo-merges in the above
> configuration is much smaller:
> 
>     +------------------+-----------------+---------------+-------------------+
>     |                  | no pseudo-merge | pseudo-merges | Delta             |
>     |                  |                 | (HEAD)        |                   |
>     +------------------+-----------------+---------------+-------------------+
>     | elapsed          |   294.1 s       |   328.4 s     |  +34.3 s (+11.7%) |
>     | cycles           | 1,365.5 B       | 1,529.3 B     | +163.7 B (+12.0%) |
>     | instructions     | 1,389.8 B       | 1,552.8 B     | +163.0 B (+11.7%) |
>     | CPI              |     0.983       |     0.985     |  +0.002   (+0.2%) |
>     +------------------+-----------------+---------------+-------------------+

Nice. The time savings are going to depend on how many pseudo-merges we
generate, I think. And I'd guess that the numbers above come from making
one big pseudo-merge bitmap, per the config you showed earlier. But you
probably only want a handful of them in any repo, so hopefully it
doesn't scale _too_ badly.

> Recall that at the start of this series, generating reachability bitmaps
> took 612.5 seconds *without* pseudo-merges. With this commit, it is
> still ~46.38% *faster* to generate reachability bitmaps *with*
> pseudo-merges than it was to generate bitmaps wihtout them at the
> beginning of this series.

Sure, though 612.5 seconds is all in the distant past. We only care
about 294.1 seconds now. ;)

More seriously, I do think the interesting question here is how the time
scales for various pseudo-merge configurations. I don't know if we have
any real operational experience with them yet. The original idea is that
you might slice up the ref space into a few chunks. I'd guess that the
old code performed badly-ish overall, but the time did not grow all that
much as you increased the number of chunks. But with the new code, I
suspect that the cost grows more linearly with number of chunks. That's
just a guess, though.

The other thing we hope for with pseudo-merges is that the chunks are
selected such that most of the chunks don't change (because they are
composed of old, stable refs). So in subsequent bitmap generations, we
can either reuse them either verbatim or as a starting point (if there
were only additions). But all of that is going to be heuristic and
depend on your config, the changes the repo sees over time, and so on.

So I don't know if we'd really have good numbers on that.

> Now that we have decoupled how we generate pseudo-merges from their
> representation, the following commits will improve the API around
> specifying pseudo-merge groupings during bitmap generation.

I think we're at patch 8/8 here. I guess you have more to come
eventually, but for now this part is just misleading. ;)

>  pack-bitmap-write.c | 210 ++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 174 insertions(+), 36 deletions(-)

The patch looks reasonable, though I'm not all that familiar with the
ins and outs of the pseudo-merge code. I'd trust the tests here more
than my review.

> @@ -696,12 +700,32 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
>  		 * walk ensures we cover all parents.
>  		 */
>  		if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) {
> +			struct tree *tree;
> +
> +			if (from_pseudo_merge && !c->object.parsed) {
> +				/*
> +				 * Commits reachable from selected
> +				 * non-pseudo-merges are already parsed
> +				 * by the regular bitmap build.
> +				 *
> +				 * However, pseudo-merge fills can also
> +				 * reach commits that were not covered
> +				 * there, so parse any such leftovers
> +				 * before reading their tree or parents.
> +				 */
> +				if (repo_parse_commit(writer->repo, c))
> +					return -1;
> +			}

Makes sense. This should be a quick noop for the regular bitmap build,
since we'll have the parsed flag set. And it should even allow use of
the commit-graph if it's available.

-Peff

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox