Git development
 help / color / mirror / Atom feed
* [PATCH] http: handle absolute-path alternates from server root
@ 2026-05-12 16:26 Jeff King
  2026-05-13  1:10 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2026-05-12 16:26 UTC (permalink / raw)
  To: git; +Cc: slonkazoid

When a dumb http server reports alternates with an absolute path, we try
to paste that onto the root of the URL we're trying to fetch from. So if
we go to "http://example.com/path/to/child.git" and it tells us about an
alternate at "/parent.git", we'll hit "http://example.com/parent.git".

But there's a bug in computing the base when the URL does not have any
path component at all, like "http://example.com". When looking for the
first slash after the host, strchr() returns NULL, and we compute a
nonsense value for the length of the host portion. And then when we use
that length to copy the base of the URL into a strbuf, we're likely to
fail.

The security implications are minimal here. We store the nonsense length
("serverlen") as an int, so on a 64-bit system it may effectively be
anything (it is zero minus a 64-bit heap pointer, then truncated to
32-bits and stuffed into a signed value). When we feed that length to
strbuf_add(), it is cast into a size_t and one of four things will
happen:

  1. If serverlen was negative, it will turn into a very large positive
     value and strbuf_add() will fail to allocate, ending the program.
     Ditto if serverlen was positive but just very large.

     This doesn't really get an attacker anything; the victim will just
     fail to clone their evil repo.

  2. If serverlen was small enough, we'll successfully extend the target
     strbuf, and then copy an arbitrary set of bytes from "base". And
     then one of these is true:

       a. That set of bytes is much larger than the length of the "base"
          string. This is an out-of-bounds read, but there's no
          out-of-bounds write, since the strbuf code both allocates and
          copies using the same size_t. This is likely to cause a
          segfault as we try to read unmapped pages of memory.

       b. Like (2a), but if the set of bytes is small enough we might
          not segfault. We might read random memory from the process and
          copy it into the "target" strbuf.

          What happens then? We know that "base" ends with a NUL
          terminator, which will be copied into "target" as well. So
          even though target.len might be 1000 bytes (or whatever), when
          interpreted as a NUL-terminated string, target.buf is still
          the exact same string as "base".

          And that's all we ever do with target: pass it around as a C
          string, and then eventually strbuf_detach() it to become a C
          string. So even though there was arbitrary memory copied into
          the strbuf, we never access it.

       c. The other interesting case is when serverlen is actually
          _shorter_ than the length of base. And there we truncate the
          string. Probably in a way that makes it totally invalid, but
          if you were very unlucky you could turn something like:

             http://victim.com.evil.domain:8000

          into:

            http://victim.com

	  Which looks like the start of a redirect attack, except that
	  the attacker could just have written "http://victim.com" in
	  the first place! Either way we feed it to
	  is_alternate_allowed(), which is where we check redirect and
	  protocol rules.

I think we can just treat this like a regular bug.

And it's quite a weird setup in the first place, as it implies that the
root of the web server is serving a repository (i.e., that you can get
something useful from "http://example.com/info/refs"). The bug has been
there since b3661567cf ([PATCH] Add support for alternates in HTTP,
2005-09-14) without anybody noticing.

I kind of doubt anybody really cares about making this work, but it's
easy enough to do so: the host-portion of the URL ends at either the
first slash or the end-of-string. So we can just replace strchr() with
strchrnul().

The test setup is a little gross, as we take over the httpd document
root by shoving our bare-repo components into it. But it demonstrates
the problem and shows that our solution actually allows the alternate to
function, if the server is configured to allow it.

Reported-by: slonkazoid <slonkazoid@slonk.ing>
Signed-off-by: Jeff King <peff@peff.net>
---
 http-walker.c              |  2 +-
 t/t5550-http-fetch-dumb.sh | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/http-walker.c b/http-walker.c
index 1b6d496548..f252de089f 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -268,7 +268,7 @@ static void process_alternates_response(void *callback_data)
 				 */
 				const char *colon_ss = strstr(base,"://");
 				if (colon_ss) {
-					serverlen = (strchr(colon_ss + 3, '/')
+					serverlen = (strchrnul(colon_ss + 3, '/')
 						     - base);
 					okay = 1;
 				}
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 9d0a7f5c4b..b0080bf204 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -555,4 +555,24 @@ test_expect_success 'dumb http can fetch index v1' '
 	git -C idx-v1 fsck
 '
 
+test_expect_success 'absolute-path alternate when url has no path' '
+	src=$HTTPD_DOCUMENT_ROOT_PATH/repo.git &&
+	alt=absolute-alt.git &&
+	git clone --bare --shared "$src" "$alt" &&
+
+	# Our repo has an alternate pointing to the absolute filesystem path,
+	# but that will not make any sense to an http client. So we will
+	# manually give it the equivalent path that the http server will
+	# understand.
+	echo "/dumb/repo.git/objects" >"$alt/objects/info/http-alternates" &&
+
+	# Now make our alt repository available at the root of the http
+	# server without any path (i.e., just http://localhost:1234).
+	git -C "$alt" update-server-info &&
+	mv absolute-alt.git/* "$HTTPD_DOCUMENT_ROOT_PATH" &&
+
+	git -c http.followRedirects=true clone "$HTTPD_URL" alt-clone.git 2>err &&
+	test_grep "adding alternate object store: $HTTPD_URL/dumb/repo.git" err
+'
+
 test_done
-- 
2.54.0.420.gf0bcdff42b

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] http: handle absolute-path alternates from server root
  2026-05-12 16:26 [PATCH] http: handle absolute-path alternates from server root Jeff King
@ 2026-05-13  1:10 ` Junio C Hamano
  2026-05-13 18:58   ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2026-05-13  1:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git, slonkazoid

Jeff King <peff@peff.net> writes:

>           ... Probably in a way that makes it totally invalid, but
>           if you were very unlucky you could turn something like:
>
>              http://victim.com.evil.domain:8000
>
>           into:
>
>             http://victim.com
>
> 	  Which looks like the start of a redirect attack, except that
> 	  the attacker could just have written "http://victim.com" in
> 	  the first place! Either way we feed it to
> 	  is_alternate_allowed(), which is where we check redirect and
> 	  protocol rules.

Yuck.  I know I am the guilty party who introduced the dumb HTTP
walker but I wish we could kill it off after all these years. I did
not even recall that we supported the alternate object store in the
"protocol" until I saw this patch X-<.

> I think we can just treat this like a regular bug.

Absolutely.  Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] http: handle absolute-path alternates from server root
  2026-05-13  1:10 ` Junio C Hamano
@ 2026-05-13 18:58   ` Jeff King
  2026-05-15  7:41     ` Patrick Steinhardt
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2026-05-13 18:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, slonkazoid

On Wed, May 13, 2026 at 10:10:54AM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >           ... Probably in a way that makes it totally invalid, but
> >           if you were very unlucky you could turn something like:
> >
> >              http://victim.com.evil.domain:8000
> >
> >           into:
> >
> >             http://victim.com
> >
> > 	  Which looks like the start of a redirect attack, except that
> > 	  the attacker could just have written "http://victim.com" in
> > 	  the first place! Either way we feed it to
> > 	  is_alternate_allowed(), which is where we check redirect and
> > 	  protocol rules.
> 
> Yuck.  I know I am the guilty party who introduced the dumb HTTP
> walker but I wish we could kill it off after all these years. I did
> not even recall that we supported the alternate object store in the
> "protocol" until I saw this patch X-<.

Me too. It's been the source of many obscure bugs, and I think a couple
of vulnerabilities (even though clients never intend to use dumb clones
in the first place).

We talked about dropping it a few years ago, but Eric countered that
dumb clones are easier on the server in some cases (like gigantic
public-inbox repos that are packed to keep most of the old history in
one big pack that is never updated). The verbatim pack-reuse feature
tries to get smart clones closer to that, but it's hard to beat serving
a static file from the server's perspective. I haven't measured anything
in that area in a while, though.

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] http: handle absolute-path alternates from server root
  2026-05-13 18:58   ` Jeff King
@ 2026-05-15  7:41     ` Patrick Steinhardt
  2026-05-15 17:01       ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Patrick Steinhardt @ 2026-05-15  7:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, slonkazoid

On Wed, May 13, 2026 at 02:58:25PM -0400, Jeff King wrote:
> On Wed, May 13, 2026 at 10:10:54AM +0900, Junio C Hamano wrote:
> 
> > Jeff King <peff@peff.net> writes:
> > 
> > >           ... Probably in a way that makes it totally invalid, but
> > >           if you were very unlucky you could turn something like:
> > >
> > >              http://victim.com.evil.domain:8000
> > >
> > >           into:
> > >
> > >             http://victim.com
> > >
> > > 	  Which looks like the start of a redirect attack, except that
> > > 	  the attacker could just have written "http://victim.com" in
> > > 	  the first place! Either way we feed it to
> > > 	  is_alternate_allowed(), which is where we check redirect and
> > > 	  protocol rules.
> > 
> > Yuck.  I know I am the guilty party who introduced the dumb HTTP
> > walker but I wish we could kill it off after all these years. I did
> > not even recall that we supported the alternate object store in the
> > "protocol" until I saw this patch X-<.
> 
> Me too. It's been the source of many obscure bugs, and I think a couple
> of vulnerabilities (even though clients never intend to use dumb clones
> in the first place).
> 
> We talked about dropping it a few years ago, but Eric countered that
> dumb clones are easier on the server in some cases (like gigantic
> public-inbox repos that are packed to keep most of the old history in
> one big pack that is never updated). The verbatim pack-reuse feature
> tries to get smart clones closer to that, but it's hard to beat serving
> a static file from the server's perspective. I haven't measured anything
> in that area in a while, though.

In theory we can get much closer with packfile URIs, too, can't we? If
the packfiles are directly accessible anyway the server could just
announce these directly and have the client fetch them. That should
significantly reduce the load on the server even further.

Of course, the big downside is that "fetch.uriProtocols" is empty by
default, so Git will not use them. Makes me wonder whether this is
something we want to eventually change, but I guess the current default
behaviour is somewhat insecure as it would allow the server to redirect
clients to arbitrary locations. It would be great if we had a mechanism
that only allowed packfile URIs that use the same host, which would make
this a lot more reasonable to enable by default.

Patrick

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] http: handle absolute-path alternates from server root
  2026-05-15  7:41     ` Patrick Steinhardt
@ 2026-05-15 17:01       ` Jeff King
  2026-05-21 11:50         ` Patrick Steinhardt
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2026-05-15 17:01 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, slonkazoid

On Fri, May 15, 2026 at 09:41:06AM +0200, Patrick Steinhardt wrote:

> > We talked about dropping it a few years ago, but Eric countered that
> > dumb clones are easier on the server in some cases (like gigantic
> > public-inbox repos that are packed to keep most of the old history in
> > one big pack that is never updated). The verbatim pack-reuse feature
> > tries to get smart clones closer to that, but it's hard to beat serving
> > a static file from the server's perspective. I haven't measured anything
> > in that area in a while, though.
> 
> In theory we can get much closer with packfile URIs, too, can't we? If
> the packfiles are directly accessible anyway the server could just
> announce these directly and have the client fetch them. That should
> significantly reduce the load on the server even further.

Packfile URIs help with the actual pack generation (even if we're
blitting out bits from the disk with verbatim packfile reuse, we still
have to handle gaps and compute the checksum over the output pack).

But it doesn't help with the server computing the set of objects the
client needs in the first place. IIRC, packfile URIs work by the server
saying "oh, I was going to send you object XYZ, but you can get it from
this stable pack instead". So the server still has to compute the set of
objects (and send any that are not mentioned in URI packs). Bitmaps
help, but there's still non-trivial computation and storage on the
server.

Contrast that with a client that instead pulls a packfile over dumb
storage on its own, and then comes to the server for a top-off fetch.
The server still has to do some computation, but it's usually quite
small, because both sides agree quickly that there's no need to dig down
further than the tips in that dumb packfile.

> Of course, the big downside is that "fetch.uriProtocols" is empty by
> default, so Git will not use them. Makes me wonder whether this is
> something we want to eventually change, but I guess the current default
> behaviour is somewhat insecure as it would allow the server to redirect
> clients to arbitrary locations. It would be great if we had a mechanism
> that only allowed packfile URIs that use the same host, which would make
> this a lot more reasonable to enable by default.

It's been a while since I've looked at it, but I seem to recall that the
server-side tools for specifying which packfile URIs to use were not
that mature. Maybe that has changed, though (I'm probably 5 years out of
date since the last time I really thought about these things).

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] http: handle absolute-path alternates from server root
  2026-05-15 17:01       ` Jeff King
@ 2026-05-21 11:50         ` Patrick Steinhardt
  2026-05-22  4:55           ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Patrick Steinhardt @ 2026-05-21 11:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, slonkazoid

On Fri, May 15, 2026 at 01:01:34PM -0400, Jeff King wrote:
> On Fri, May 15, 2026 at 09:41:06AM +0200, Patrick Steinhardt wrote:
> 
> > > We talked about dropping it a few years ago, but Eric countered that
> > > dumb clones are easier on the server in some cases (like gigantic
> > > public-inbox repos that are packed to keep most of the old history in
> > > one big pack that is never updated). The verbatim pack-reuse feature
> > > tries to get smart clones closer to that, but it's hard to beat serving
> > > a static file from the server's perspective. I haven't measured anything
> > > in that area in a while, though.
> > 
> > In theory we can get much closer with packfile URIs, too, can't we? If
> > the packfiles are directly accessible anyway the server could just
> > announce these directly and have the client fetch them. That should
> > significantly reduce the load on the server even further.
> 
> Packfile URIs help with the actual pack generation (even if we're
> blitting out bits from the disk with verbatim packfile reuse, we still
> have to handle gaps and compute the checksum over the output pack).
> 
> But it doesn't help with the server computing the set of objects the
> client needs in the first place. IIRC, packfile URIs work by the server
> saying "oh, I was going to send you object XYZ, but you can get it from
> this stable pack instead". So the server still has to compute the set of
> objects (and send any that are not mentioned in URI packs). Bitmaps
> help, but there's still non-trivial computation and storage on the
> server.

I guess it depends on the actual server-side implementation, but in the
general case this is of course true. A server could decide to for
example overserve objects in case the client does a full clone, or it
could arrange packfiles in a special way that allows it to serve at
least some kinds of requests efficiently.

> Contrast that with a client that instead pulls a packfile over dumb
> storage on its own, and then comes to the server for a top-off fetch.
> The server still has to do some computation, but it's usually quite
> small, because both sides agree quickly that there's no need to dig down
> further than the tips in that dumb packfile.

So this here is in theory possible with packfile URIs, as well, by
computing the top-off fetch depending on the packfile layout.

But this requires quite a bunch of server-side logic and very specific
layouts, I guess.

> > Of course, the big downside is that "fetch.uriProtocols" is empty by
> > default, so Git will not use them. Makes me wonder whether this is
> > something we want to eventually change, but I guess the current default
> > behaviour is somewhat insecure as it would allow the server to redirect
> > clients to arbitrary locations. It would be great if we had a mechanism
> > that only allowed packfile URIs that use the same host, which would make
> > this a lot more reasonable to enable by default.
> 
> It's been a while since I've looked at it, but I seem to recall that the
> server-side tools for specifying which packfile URIs to use were not
> that mature. Maybe that has changed, though (I'm probably 5 years out of
> date since the last time I really thought about these things).

Packfile URIs definitely need some love to become feasible, yes, and I
don't think they have evolved much since their introduction. I still
feel like they are the better mechanism for offloading traffic compared
to bundle URIs though, as we already have packfiles around anyway.

Patrick

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] http: handle absolute-path alternates from server root
  2026-05-21 11:50         ` Patrick Steinhardt
@ 2026-05-22  4:55           ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2026-05-22  4:55 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, slonkazoid

On Thu, May 21, 2026 at 01:50:06PM +0200, Patrick Steinhardt wrote:

> > Packfile URIs help with the actual pack generation (even if we're
> > blitting out bits from the disk with verbatim packfile reuse, we still
> > have to handle gaps and compute the checksum over the output pack).
> > 
> > But it doesn't help with the server computing the set of objects the
> > client needs in the first place. IIRC, packfile URIs work by the server
> > saying "oh, I was going to send you object XYZ, but you can get it from
> > this stable pack instead". So the server still has to compute the set of
> > objects (and send any that are not mentioned in URI packs). Bitmaps
> > help, but there's still non-trivial computation and storage on the
> > server.
> 
> I guess it depends on the actual server-side implementation, but in the
> general case this is of course true. A server could decide to for
> example overserve objects in case the client does a full clone, or it
> could arrange packfiles in a special way that allows it to serve at
> least some kinds of requests efficiently.

True, though you still have to receive the client wants/haves before
getting to the packfile-uri phase. The alternative is for the server
send URIs during the ref advertisement. But we have that, too, these
days: the bundle-uri feature. (Which I completely forgot about while
writing my earlier email).

So I do think that bundle-uris can probably be an adequate substitute
for dumb-http in terms of reducing server load. Though...

> Packfile URIs definitely need some love to become feasible, yes, and I
> don't think they have evolved much since their introduction. I still
> feel like they are the better mechanism for offloading traffic compared
> to bundle URIs though, as we already have packfiles around anyway.

...yeah, I agree that storing both bundles and packs can be annoying for
a server, depending on your setup. In theory it would not be hard for a
slightly-clever server endpoint to store packfiles for regular Git to
use, and then generate the bundles on the fly by cat-ing the bundle
header and the packfile, both of which can be sent out as raw bytes
without further processing.

Anyway, we are far afield from the patch that started this thread. ;) I
do agree with the general notion that we _should_ be able to get
smart-http close to the server-side expense of dumb-http with a few
tricks like these.

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-05-22  4:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-12 16:26 [PATCH] http: handle absolute-path alternates from server root Jeff King
2026-05-13  1:10 ` Junio C Hamano
2026-05-13 18:58   ` Jeff King
2026-05-15  7:41     ` Patrick Steinhardt
2026-05-15 17:01       ` Jeff King
2026-05-21 11:50         ` Patrick Steinhardt
2026-05-22  4:55           ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox