Git dumb HTTP protocol should work without update-server-info

git.vger.kernel.org archive mirror

* Git dumb HTTP protocol should work without update-server-info
@ 2025-09-07 11:24 Milan Hauth
  2025-09-07 15:07 ` brian m. carlson
  2025-09-08  0:05 ` Jeff King
  0 siblings, 2 replies; 11+ messages in thread
From: Milan Hauth @ 2025-09-07 11:24 UTC (permalink / raw)
  To: git

this works:

git ls-remote /path/to/repo2/
git ls-remote /path/to/repo2/.git/
git ls-remote file:///path/to/repo2/
git ls-remote file:///path/to/repo2/.git/

this fails:

python -m http.server -d /path/to/repo2/ &
git ls-remote http://localhost:8000/
git ls-remote http://localhost:8000/.git/

workaround:

pushd /path/to/repo2/.git/
git --bare update-server-info
mv hooks/post-update.sample hooks/post-update
popd
git ls-remote http://localhost:8000/

expected:
dumb http remotes should behave like file remotes



> git --bare update-server-info

that command creates the file
/path/to/repo2/.git/info/refs
but that is just an optimization
for http servers with high latency

my "dumb" http server
is smart enough to handle http range requests
so there is no need
to download all the files from .git/

but also without http range requests
this should "just work"
and the user should be responsible for optimizations



as another workaround
i tried to mount the .git/ directory with httpdirfs
but httpdirfs fails to mount git repos, see
https://github.com/fangfufu/httpdirfs/issues/183



related:

https://stackoverflow.com/questions/2085402/what-does-git-update-server-info-do

https://stackoverflow.com/questions/2278888/private-git-repository-over-http


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-07 11:24 Git dumb HTTP protocol should work without update-server-info Milan Hauth
@ 2025-09-07 15:07 ` brian m. carlson
  2025-09-07 17:23   ` Milan Hauth
  2025-09-08  9:40   ` Patrick Steinhardt
  2025-09-08  0:05 ` Jeff King
  1 sibling, 2 replies; 11+ messages in thread
From: brian m. carlson @ 2025-09-07 15:07 UTC (permalink / raw)
  To: Milan Hauth; +Cc: git


On 2025-09-07 at 11:24:13, Milan Hauth wrote:
> this works:
> 
> git ls-remote /path/to/repo2/
> git ls-remote /path/to/repo2/.git/
> git ls-remote file:///path/to/repo2/
> git ls-remote file:///path/to/repo2/.git/
> 
> this fails:
> 
> python -m http.server -d /path/to/repo2/ &
> git ls-remote http://localhost:8000/
> git ls-remote http://localhost:8000/.git/
> 
> workaround:
> 
> pushd /path/to/repo2/.git/
> git --bare update-server-info
> mv hooks/post-update.sample hooks/post-update
> popd
> git ls-remote http://localhost:8000/
> 
> expected:
> dumb http remotes should behave like file remotes

In general, that's not possible, because HTTP doesn't support native
atomic operations.  An HTTP push locks the remote with DAV by preventing
other changes to `info/refs`, which makes the operation atomic.  If
that file isn't there, then there's no way to guarantee that the ref
update isn't competing with others, which might cause data loss.

In addition, HTTP also doesn't support native machine-readable directory
listings except with DAV.  However, we don't require DAV for fetches, so
we need a list of the refs and the packs in order to be able to download
objects correctly.
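
To illustrate what "a list of the refs" looks like to a client, here is a minimal sketch of parsing the body of `info/refs` as written by update-server-info: one object ID, a tab, and a ref name per line, with peeled tags carrying a `^{}` suffix. The function name `parse_info_refs` and the sample object IDs are made up for illustration; this is not Git's own parser.

```python
from typing import Dict


def parse_info_refs(text: str) -> Dict[str, str]:
    """Map ref name -> object ID from the body of an info/refs file."""
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate stray blank lines
        oid, sep, name = line.partition("\t")
        if not sep or not name:
            raise ValueError(f"malformed info/refs line: {line!r}")
        refs[name] = oid
    return refs


# Hypothetical sample; real object IDs are hex hashes of actual objects.
sample = (
    "1111111111111111111111111111111111111111\trefs/heads/main\n"
    "2222222222222222222222222222222222222222\trefs/tags/v1.0\n"
    "3333333333333333333333333333333333333333\trefs/tags/v1.0^{}\n"
)
refs = parse_info_refs(sample)
for name, oid in refs.items():
    print(oid, name)
```

Because the whole ref list arrives in one file, the client needs no directory listing to discover ref names.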

> > git --bare update-server-info
> 
> that command creates the file
> /path/to/repo2/.git/info/refs
> but that is just an optimization
> for http servers with high latency
> 
> my "dumb" http server
> is smart enough to handle http range requests
> so there is no need
> to download all the files from .git/
> 
> but also without http range requests
> this should "just work"
> and the user should be responsible for optimizations

As I said above, I don't think that's the concern here.

I will also note that the dumb HTTP protocol doesn't work with reftable
and there was some suggestion of removing it for Git 3.0.  It certainly
will not work out of the box with Git 3.0, since the default is
reftable.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-07 15:07 ` brian m. carlson
@ 2025-09-07 17:23   ` Milan Hauth
  2025-09-07 17:42     ` brian m. carlson
  2025-09-08  9:40   ` Patrick Steinhardt
  1 sibling, 1 reply; 11+ messages in thread
From: Milan Hauth @ 2025-09-07 17:23 UTC (permalink / raw)
  To: brian m. carlson, Milan Hauth, git

> HTTP push

I'm only talking about "read" operations:

git ls-remote
git fetch
git pull


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-07 17:23   ` Milan Hauth
@ 2025-09-07 17:42     ` brian m. carlson
  0 siblings, 0 replies; 11+ messages in thread
From: brian m. carlson @ 2025-09-07 17:42 UTC (permalink / raw)
  To: Milan Hauth; +Cc: git


On 2025-09-07 at 17:23:28, Milan Hauth wrote:
> > HTTP push
> 
> I'm only talking about "read" operations:
> 
> git ls-remote
> git fetch
> git pull

Yes, as I said, reading directories is only possible with WebDAV since
HTTP doesn't offer native directory listing.  However, we don't use
WebDAV for fetches and other read operations and not all web servers
support it.  We get better web server support in many cases by requiring
that the server side do the work of updating the lists of packs and
refs.

Without some way to list directories, you cannot in the general case
iterate over the refs and packs in the file system without a manifest,
so creating a manifest is what update-server-info does.
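
As a rough sketch of the pack half of that manifest: `objects/info/packs` holds one `P <pack file name>` line per pack, followed by a blank line. The helper name `build_packs_manifest`, the alphabetical ordering, and the pack names below are assumptions for illustration; Git's own ordering may differ, and in practice you would simply run `git update-server-info`.

```python
import os
import tempfile


def build_packs_manifest(pack_dir: str) -> str:
    """Return text in the style of objects/info/packs for one pack directory."""
    # Only *.pack files are listed; *.idx files are not part of the manifest.
    names = sorted(n for n in os.listdir(pack_dir) if n.endswith(".pack"))
    # One "P <name>" line per pack, then a terminating blank line.
    return "".join(f"P {name}\n" for name in names) + "\n"


# Demo against a throwaway directory with made-up pack names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("pack-aaa.pack", "pack-aaa.idx", "pack-bbb.pack"):
        open(os.path.join(tmp, name), "wb").close()
    manifest = build_packs_manifest(tmp)

print(manifest, end="")
```

The server side can enumerate its own directory cheaply; the client cannot, which is why the listing is produced where the files live.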

This is also why most tools which provide HTTP access in the file system
require WebDAV, since it isn't very useful to have a file system where
you can't list directories.  (We abandoned directory-less file systems
in the early DOS and Macintosh days.)
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-07 15:07 ` brian m. carlson
  2025-09-07 17:23   ` Milan Hauth
@ 2025-09-08  9:40   ` Patrick Steinhardt
  2025-09-08 14:43     ` Junio C Hamano
  1 sibling, 1 reply; 11+ messages in thread
From: Patrick Steinhardt @ 2025-09-08  9:40 UTC (permalink / raw)
  To: brian m. carlson, Milan Hauth, git

On Sun, Sep 07, 2025 at 03:07:11PM +0000, brian m. carlson wrote:
> I will also note that the dumb HTTP protocol doesn't work with reftable
> and there was some suggestion of removing it for Git 3.0.  It certainly
> will not work out of the box with Git 3.0, since the default is
> reftable.

Yes, indeed. In theory though reftables could also be the solution to
the underlying issue: the client can be taught to read the "tables.list"
file and then fetch all tables listed therein. The result would be fully
consistent, unless any of the tables gets garbage collected. The client
would notice and abort the operation, after which it could restart the
operation.

In that case there would be no need for git-update-server-info(1)
anymore. The "tables.list" file sits in a well-known location,
identifies all other tables we have to download, and there are no
atomicity issues anymore.
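
A hypothetical sketch of that client loop, covering refs only: read "tables.list", fetch every table it names, and start over if a table has disappeared in the meantime. Nothing like this exists in Git today; the `fetch` callback, the `reftable/` path prefix relative to the repository URL, the retry limit, and the table names are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# fetch(path) returns the file body, or None when the server answers "not found".
Fetch = Callable[[str], Optional[bytes]]


def fetch_reftable_stack(fetch: Fetch, max_attempts: int = 3) -> Dict[str, bytes]:
    """Download every table named in tables.list, restarting on a vanished table."""
    for _ in range(max_attempts):
        listing = fetch("reftable/tables.list")
        if listing is None:
            raise FileNotFoundError("reftable/tables.list")
        names = [n for n in listing.decode().splitlines() if n]
        tables: Dict[str, bytes] = {}
        for name in names:
            data = fetch(f"reftable/{name}")
            if data is None:
                break  # table was removed after we read the list; retry
            tables[name] = data
        else:
            return tables  # every listed table was fetched: consistent snapshot
    raise RuntimeError("reftable stack kept changing; giving up")


# In-memory stand-in for a dumb HTTP server (made-up table names).
server = {
    "reftable/tables.list": b"0001.ref\n0002.ref\n",
    "reftable/0001.ref": b"A",
    "reftable/0002.ref": b"B",
}
result = fetch_reftable_stack(server.get)
print(sorted(result))
```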

The catch of course is that somebody would have to implement this :)

Patrick


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-08  9:40   ` Patrick Steinhardt
@ 2025-09-08 14:43     ` Junio C Hamano
  2025-09-09  5:26       ` Patrick Steinhardt
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2025-09-08 14:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: brian m. carlson, Milan Hauth, git

Patrick Steinhardt <ps@pks.im> writes:

> On Sun, Sep 07, 2025 at 03:07:11PM +0000, brian m. carlson wrote:
>> I will also note that the dumb HTTP protocol doesn't work with reftable
>> and there was some suggestion of removing it for Git 3.0.  It certainly
>> will not work out of the box with Git 3.0, since the default is
>> reftable.
>
> Yes, indeed. In theory though reftables could also be the solution to
> the underlying issue: the client can be taught to read the "tables.list"
> file and then fetch all tables listed therein. The result would be fully
> consistent, unless any of the tables gets garbage collected. The client
> would notice and abort the operation, after which it could restart the
> operation.
>
> In that case there would be no need for git-update-server-info(1)
> anymore. The "tables.list" file sits in a well-known location,
> identifies all other tables we have to download, and there are no
> atomicity issues anymore.

Does tables.list list what pack files there are in the repository?  
I somehow doubt it.

The dumb HTTP transport was meant to be able to operate with a truly
dumb HTTP server, one that does not have to support WebDAV at all,
so there needs to be some table at a known name that lists _all_ the
files that cloners are expected to be able to download.  We still need
the output from update-server-info [*] to tell what packs are there
even if tables.list is stored at the known path.

[Footnote]

* ... or its equivalent generated offline and uploaded manually to
  the repository, which was what I did before I got an account at
  kernel.org.  My local ISP provided a small web space for its
  subscribers, so my "push" was to ftp upload the loose objects,
  packs, refs, and the info/refs + objects/info/packs files there X-<.


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-08 14:43     ` Junio C Hamano
@ 2025-09-09  5:26       ` Patrick Steinhardt
  0 siblings, 0 replies; 11+ messages in thread
From: Patrick Steinhardt @ 2025-09-09  5:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, Milan Hauth, git

On Mon, Sep 08, 2025 at 07:43:20AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > On Sun, Sep 07, 2025 at 03:07:11PM +0000, brian m. carlson wrote:
> >> I will also note that the dumb HTTP protocol doesn't work with reftable
> >> and there was some suggestion of removing it for Git 3.0.  It certainly
> >> will not work out of the box with Git 3.0, since the default is
> >> reftable.
> >
> > Yes, indeed. In theory though reftables could also be the solution to
> > the underlying issue: the client can be taught to read the "tables.list"
> > file and then fetch all tables listed therein. The result would be fully
> > consistent, unless any of the tables gets garbage collected. The client
> > would notice and abort the operation, after which it could restart the
> > operation.
> >
> > In that case there would be no need for git-update-server-info(1)
> > anymore. The "tables.list" file sits in a well-known location,
> > identifies all other tables we have to download, and there are no
> > atomicity issues anymore.
> 
> Does tables.list list what pack files there are in the repository?  
> I somehow doubt it.
> 
> The dumb HTTP transport was meant to be able to operate with a truly
> dumb HTTP server, one that does not have to support WebDAV at all,
> so there needs to be some table at a known name that lists _all_ the
> files that cloners are expected to be able to download.  We still need
> the output from update-server-info [*] to tell what packs are there
> even if tables.list is stored at the known path.

Oh, you're right. I only remembered that we need it for refs, but of
course we also need it for packs.

Patrick


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-07 11:24 Git dumb HTTP protocol should work without update-server-info Milan Hauth
  2025-09-07 15:07 ` brian m. carlson
@ 2025-09-08  0:05 ` Jeff King
  2025-09-08  4:14   ` Junio C Hamano
  2025-09-08 21:27   ` brian m. carlson
  1 sibling, 2 replies; 11+ messages in thread
From: Jeff King @ 2025-09-08  0:05 UTC (permalink / raw)
  To: Milan Hauth; +Cc: git

On Sun, Sep 07, 2025 at 01:24:13PM +0200, Milan Hauth wrote:

> expected:
> dumb http remotes should behave like file remotes

File remotes are running a local git-upload-pack to act as the server.
A dumb http remote can't run anything on the server side. So we are
stuck pretending that the http endpoint is a filesystem. Besides
performing terribly, as brian noted it is not even portable to do,
since there is no readdir() equivalent (and things like httpdirfs have
to resort to scraping auto-generated directory listings).
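
A small sketch of what "pretending the endpoint is a filesystem" means for objects: a loose object lives at `objects/<first two hex digits>/<remaining digits>`, so a client that already knows an object ID can compute its URL without listing anything. The helper name `loose_object_url` and the sample ID are made up for illustration.

```python
def loose_object_url(base_url: str, oid: str) -> str:
    """Build the URL of a loose object under a repository's base URL."""
    if len(oid) < 3 or any(c not in "0123456789abcdef" for c in oid):
        raise ValueError(f"not a lowercase hex object ID: {oid!r}")
    # Same fan-out layout as the on-disk object store: objects/ab/cdef...
    return f"{base_url.rstrip('/')}/objects/{oid[:2]}/{oid[2:]}"


# Hypothetical object ID; a real one comes from a ref or a parent object.
url = loose_object_url("http://localhost:8000/.git/", "ab" + "c" * 38)
print(url)
```

Known IDs map straight to paths. Ref names and pack names cannot be guessed this way, which is where the missing readdir() equivalent hurts.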

If you want to go that route, I think you're better off using a fuse
wrapper and just letting Git work against the mounted filesystem, as you
tried here:

> as another workaround
> i tried to mount the .git/ directory with httpdirfs
> but httpdirfs fails to mount git repos, see
> https://github.com/fangfufu/httpdirfs/issues/183

I think this does work. The instructions you gave there won't do it,
though, because python's http.server module doesn't support range
requests.
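
One way to check whether a given server honors range requests is to ask for a single byte and see whether it answers 206 Partial Content with a Content-Range header. This is only a sketch; the function names and the URL in the comment are placeholders, and the probe assumes the file exists and is not empty.

```python
import urllib.error
import urllib.request


def is_partial_response(status: int, headers) -> bool:
    """True when a response looks like a proper answer to a Range request."""
    return status == 206 and headers.get("Content-Range") is not None


def supports_range(url: str, timeout: float = 5.0) -> bool:
    """Request the first byte of url and report whether the server honored it."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return is_partial_response(resp.status, resp.headers)
    except urllib.error.HTTPError:
        return False  # e.g. 404 or 416; treat as "no usable range support"


# Example call (needs a running server, so it is left commented out):
# print(supports_range("http://localhost:8000/.git/HEAD"))
print(is_partial_response(200, {}))
```

A server that ignores the Range header typically replies 200 with the full body, which this check reports as unsupported.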

Something like:

  sudo apt install python3-rangehttpserver
  # run the server in the background so the following commands can proceed
  (cd /path/to/repo && python -m RangeHTTPServer) &
  mkdir mnt
  sudo httpdirfs http://localhost:8000/.git mnt

got me a mount that worked with:

  # you could probably avoid sudo here with better mount perms above
  sudo git clone mnt foo

It's painfully slow, though.

Possibly dumb-http could learn to do the same scraping that httpdirfs
does to get the refs and pack listings (though this might be quite slow
for unpacked refs, if the ref tree is deep). But I doubt you will find
anybody that enthused about working on or reviewing dumb-http patches
these days. The code is not very well maintained, IMO.

-Peff


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-08  0:05 ` Jeff King
@ 2025-09-08  4:14   ` Junio C Hamano
  2025-09-08 21:27   ` brian m. carlson
  1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2025-09-08  4:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Milan Hauth, git

Jeff King <peff@peff.net> writes:

> Possibly dumb-http could learn to do the same scraping that httpdirfs
> does to get the refs and pack listings (though this might be quite slow
> for unpacked refs, if the ref tree is deep).

Please don't.  Once you go that route, that is no longer "dumb http"
at all.



* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-08  0:05 ` Jeff King
  2025-09-08  4:14   ` Junio C Hamano
@ 2025-09-08 21:27   ` brian m. carlson
  2025-09-09  1:35     ` Jeff King
  1 sibling, 1 reply; 11+ messages in thread
From: brian m. carlson @ 2025-09-08 21:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Milan Hauth, git


On 2025-09-08 at 00:05:43, Jeff King wrote:
> Possibly dumb-http could learn to do the same scraping that httpdirfs
> does to get the refs and pack listings (though this might be quite slow
> for unpacked refs, if the ref tree is deep). But I doubt you will find
> anybody that enthused about working on or reviewing dumb-http patches
> these days. The code is not very well maintained, IMO.

That kind of scraping is really not a good idea.

It's the equivalent of trying to parse the FTP LIST output, which is
customarily the equivalent of `ls -l`, but doesn't have to be, and often
isn't on Windows.  I can tell you from experience how painful doing that
is and how many bug reports come in when you try to use that (because
the FTP server also doesn't support either variant of the
machine-readable format that solves this problem).

It's also prone to breaking things because some HTTP servers have weird
redirect loops due to sorting entries in the index pages that you have
to be careful not to trigger.  And, of course, that depends on the
server having index pages turned on, which many do not.

And then you have to do all of this in C with pointer arithmetic, with
all of the terrifying security properties that has, especially because
you can't actually use an XML parser to do it (unlike in DAV) since many
servers have index pages that are (possibly invalid) autogenerated HTML
3.2 or something.

I've seen exactly one piece of software (lftp) that hasn't done a
terribly awful implementation of this and that happens to have worked
every time I've tried it (which is extremely infrequently, so I probably
just haven't found a server which breaks it yet).  Every other piece of
software I've used that's done this kind of thing has just been broken
and I anticipate Git would be no different.

I do want to be very clear that this is a bad idea and I hope never to
see such patches come into Git.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA


* Re: Git dumb HTTP protocol should work without update-server-info
  2025-09-08 21:27   ` brian m. carlson
@ 2025-09-09  1:35     ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2025-09-09  1:35 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Milan Hauth, git

On Mon, Sep 08, 2025 at 09:27:23PM +0000, brian m. carlson wrote:

> And then you have to do all of this in C with pointer arithmetic, with
> all of the terrifying security properties that has, especially because
> you can't actually use an XML parser to do it (unlike in DAV) since many
> servers have index pages that are (possibly invalid) autogenerated HTML
> 3.2 or something.

I hear you could do it in rust these days. ;) But more seriously, yes, I
don't think it's a good idea. Maybe I should have been more emphatic in
my original message about that. But there is nothing stopping somebody
who wanted to implement such a thing for themselves, which is probably
more productive than asking somebody on this list to work on it.

-Peff


Thread overview: 11+ messages
2025-09-07 11:24 Git dumb HTTP protocol should work without update-server-info Milan Hauth
2025-09-07 15:07 ` brian m. carlson
2025-09-07 17:23   ` Milan Hauth
2025-09-07 17:42     ` brian m. carlson
2025-09-08  9:40   ` Patrick Steinhardt
2025-09-08 14:43     ` Junio C Hamano
2025-09-09  5:26       ` Patrick Steinhardt
2025-09-08  0:05 ` Jeff King
2025-09-08  4:14   ` Junio C Hamano
2025-09-08 21:27   ` brian m. carlson
2025-09-09  1:35     ` Jeff King