* Git dumb HTTP protocol should work without update-server-info
@ 2025-09-07 11:24 Milan Hauth
2025-09-07 15:07 ` brian m. carlson
2025-09-08 0:05 ` Jeff King
0 siblings, 2 replies; 11+ messages in thread
From: Milan Hauth @ 2025-09-07 11:24 UTC (permalink / raw)
To: git
this works:

    git ls-remote /path/to/repo2/
    git ls-remote /path/to/repo2/.git/
    git ls-remote file:///path/to/repo2/
    git ls-remote file:///path/to/repo2/.git/

this fails:

    python -m http.server -d /path/to/repo2/ &
    git ls-remote http://localhost:8000/
    git ls-remote http://localhost:8000/.git/

workaround:

    pushd /path/to/repo2/.git/
    git --bare update-server-info
    mv hooks/post-update.sample hooks/post-update
    popd
    git ls-remote http://localhost:8000/

expected:
dumb http remotes should behave like file remotes
> git --bare update-server-info
that command creates the file
/path/to/repo2/.git/info/refs
but that is just an optimization
for http servers with high latency
my "dumb" http server
is smart enough to handle http range requests
so there is no need
to download all the files from .git/
but also without http range requests
this should "just work"
and the user should be responsible for optimizations
as another workaround
i tried to mount the .git/ directory with httpdirfs
but httpdirfs fails to mount git repos, see
https://github.com/fangfufu/httpdirfs/issues/183
related:
https://stackoverflow.com/questions/2085402/what-does-git-update-server-info-do
https://stackoverflow.com/questions/2278888/private-git-repository-over-http

* Re: Git dumb HTTP protocol should work without update-server-info
From: brian m. carlson @ 2025-09-07 15:07 UTC
To: Milan Hauth; +Cc: git

On 2025-09-07 at 11:24:13, Milan Hauth wrote:
> this works:
>
>     git ls-remote /path/to/repo2/
>     git ls-remote /path/to/repo2/.git/
>     git ls-remote file:///path/to/repo2/
>     git ls-remote file:///path/to/repo2/.git/
>
> this fails:
>
>     python -m http.server -d /path/to/repo2/ &
>     git ls-remote http://localhost:8000/
>     git ls-remote http://localhost:8000/.git/
>
> workaround:
>
>     pushd /path/to/repo2/.git/
>     git --bare update-server-info
>     mv hooks/post-update.sample hooks/post-update
>     popd
>     git ls-remote http://localhost:8000/
>
> expected:
> dumb http remotes should behave like file remotes

In general, that's not possible, because HTTP doesn't support native
atomic operations. An HTTP push locks the remote with DAV by preventing
other changes to `info/refs`, which makes the operation atomic. If that
file isn't there, then there's no way to guarantee that the ref update
isn't competing with others, which might cause data loss.

In addition, HTTP also doesn't support native machine-readable directory
listings except with DAV. However, we don't require DAV for fetches, so
we need a list of the refs and the packs in order to be able to download
objects correctly.

> > git --bare update-server-info
>
> that command creates the file
> /path/to/repo2/.git/info/refs
> but that is just an optimization
> for http servers with high latency
>
> my "dumb" http server
> is smart enough to handle http range requests
> so there is no need
> to download all the files from .git/
>
> but also without http range requests
> this should "just work"
> and the user should be responsible for optimizations

As I said above, I don't think that's the concern here.

I will also note that the dumb HTTP protocol doesn't work with reftable
and there was some suggestion of removing it for Git 3.0. It certainly
will not work out of the box with Git 3.0, since the default is
reftable.

--
brian m. carlson (they/them)
Toronto, Ontario, CA
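
To illustrate the point about needing "a list of the refs": a dumb HTTP
client can only fetch files at known paths, so its starting point is the
static `info/refs` file. The following is a minimal sketch, not Git's
actual client code. It assumes the plain dumb-protocol layout of one
`<object id><TAB><ref name>` record per line; the function names
`parse_info_refs` and `ls_remote_dumb` are made up for this example.

```python
import urllib.request


def parse_info_refs(text: str) -> dict[str, str]:
    """Parse a dumb-protocol info/refs body into {refname: object id}."""
    refs: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Each record is "<object id>\t<ref name>".
        oid, name = line.split("\t", 1)
        refs[name] = oid
    return refs


def ls_remote_dumb(base_url: str) -> dict[str, str]:
    """Hypothetical helper: fetch <base_url>/info/refs and parse it.

    Raises urllib.error.HTTPError (e.g. 404) when the server side never
    ran update-server-info, which is the failure described in this thread.
    """
    url = base_url.rstrip("/") + "/info/refs"
    with urllib.request.urlopen(url) as resp:
        return parse_info_refs(resp.read().decode("utf-8"))
```

Against the setup from the original report, `ls_remote_dumb("http://localhost:8000/.git/")`
would be expected to fail with a 404 until `info/refs` has been generated.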

* Re: Git dumb HTTP protocol should work without update-server-info
From: Milan Hauth @ 2025-09-07 17:23 UTC
To: brian m. carlson, Milan Hauth, git

> HTTP push

im only talking about "read" operations:

    git ls-remote
    git fetch
    git pull

* Re: Git dumb HTTP protocol should work without update-server-info
From: brian m. carlson @ 2025-09-07 17:42 UTC
To: Milan Hauth; +Cc: git

On 2025-09-07 at 17:23:28, Milan Hauth wrote:
> > HTTP push
>
> im only talking about "read" operations:
>
>     git ls-remote
>     git fetch
>     git pull

Yes, as I said, reading directories is only possible with WebDAV since
HTTP doesn't offer native directory listing. However, we don't use
WebDAV for fetches and other read operations and not all web servers
support it. We get better web server support in many cases by requiring
that the server side do the work of updating the lists of packs and
refs.

Without some way to list directories, you cannot in the general case
iterate over the refs and packs in the file system without a manifest,
so creating a manifest is what update-server-info does.

This is also why most tools which provide HTTP access in the file system
require WebDAV, since it isn't very useful to have a file system where
you can't list directories. (We abandoned directory-less file systems
in the early DOS and Macintosh days.)

--
brian m. carlson (they/them)
Toronto, Ontario, CA
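
The "manifest" idea can be made concrete with a small sketch. The code
below is not how git-update-server-info is implemented. It is a
deliberately simplified illustration that walks the loose ref files
under `refs/` on the server side, where listing directories is possible,
and emits `info/refs`-style lines. It assumes loose refs only: packed
refs, peeled tag entries and the separate `objects/info/packs` list are
all ignored here, and `build_info_refs` is a hypothetical name.

```python
from pathlib import Path


def build_info_refs(git_dir: str) -> str:
    """Return info/refs-style text for the loose refs found under git_dir/refs.

    Simplified sketch: packed-refs and peeled tags are not handled.
    """
    root = Path(git_dir)
    lines = []
    for path in sorted((root / "refs").rglob("*")):
        if not path.is_file():
            continue  # directories such as refs/heads are not refs
        value = path.read_text().strip()
        if value.startswith("ref:"):
            continue  # symbolic ref; skipped in this sketch
        name = path.relative_to(root).as_posix()
        lines.append(f"{value}\t{name}\n")
    return "".join(lines)
```

The server can do this walk because it has a real file system; the
client cannot, which is why the result has to be published at a fixed
path.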

* Re: Git dumb HTTP protocol should work without update-server-info
From: Patrick Steinhardt @ 2025-09-08 9:40 UTC
To: brian m. carlson, Milan Hauth, git

On Sun, Sep 07, 2025 at 03:07:11PM +0000, brian m. carlson wrote:
> I will also note that the dumb HTTP protocol doesn't work with reftable
> and there was some suggestion of removing it for Git 3.0. It certainly
> will not work out of the box with Git 3.0, since the default is
> reftable.

Yes, indeed. In theory though reftables could also be the solution to
the underlying issue: the client can be taught to read the "tables.list"
file and then fetch all tables listed therein. The result would be fully
consistent, unless any of the tables gets garbage collected. The client
would notice and abort the operation, after which it could restart the
operation.

In that case there would be no need for git-update-server-info(1)
anymore. The "tables.list" file sits in a well-known location,
identifies all other tables we have to download, and there are no
atomicity issues anymore.

The catch of course is that somebody would have to implement this :)

Patrick
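
The read-then-restart loop described above could look roughly like the
following. Nothing like this exists in Git today; it is only a sketch of
the proposal. It assumes `tables.list` lives at `reftable/tables.list`
and holds one table file name per line, and it takes a hypothetical
`fetch` callable (path in, bytes out) that raises `FileNotFoundError`
when the server answers 404. It covers refs only and says nothing about
packs.

```python
from typing import Callable


def fetch_ref_tables(
    fetch: Callable[[str], bytes], max_attempts: int = 3
) -> dict[str, bytes]:
    """Fetch every table named in tables.list, restarting if one vanishes."""
    for _ in range(max_attempts):
        listing = fetch("reftable/tables.list").decode("utf-8")
        names = [line.strip() for line in listing.splitlines() if line.strip()]
        try:
            # All listed tables must be present for a consistent snapshot.
            return {name: fetch("reftable/" + name) for name in names}
        except FileNotFoundError:
            # A listed table was removed in the meantime; re-read the list.
            continue
    raise RuntimeError("tables.list kept changing; giving up")
```

The retry bound is an arbitrary choice for the sketch; a real client
would need a policy for repositories that are compacted very often.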

* Re: Git dumb HTTP protocol should work without update-server-info
From: Junio C Hamano @ 2025-09-08 14:43 UTC
To: Patrick Steinhardt; +Cc: brian m. carlson, Milan Hauth, git

Patrick Steinhardt <ps@pks.im> writes:

> On Sun, Sep 07, 2025 at 03:07:11PM +0000, brian m. carlson wrote:
>> I will also note that the dumb HTTP protocol doesn't work with reftable
>> and there was some suggestion of removing it for Git 3.0. It certainly
>> will not work out of the box with Git 3.0, since the default is
>> reftable.
>
> Yes, indeed. In theory though reftables could also be the solution to
> the underlying issue: the client can be taught to read the "tables.list"
> file and then fetch all tables listed therein. The result would be fully
> consistent, unless any of the tables gets garbage collected. The client
> would notice and abort the operation, after which it could restart the
> operation.
>
> In that case there would be no need for git-update-server-info(1)
> anymore. The "tables.list" file sits in a well-known location,
> identifies all other tables we have to download, and there are no
> atomicity issues anymore.

Does tables.list list what pack files there are in the repository?
I somehow doubt it.

The dumb HTTP transport was meant to be able to operate with a truly
dumb HTTP server, that does not even have to support WebDAV at all,
so there needs some tables at known name that lists _all_ the files
the cloners are expected to be able to download from. We still need
the output from update-server-info [*] to tell what packs are there
even if tables.list is stored at the known path.

[Footnote]

 * ... or its equivalent generated offline and uploaded manually to
   the repository, which was what I did before I got an account at
   kernel.org. There was a small web space at local ISP provided
   for its subscribers, so my "push" was to ftp upload the loose
   objects, packs, refs, and the info/refs + objects/info/packs
   files there X-<.
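
The pack side of the manifest is the `objects/info/packs` file mentioned
in the footnote. As a rough sketch of why a client needs it: each pack
is named on a line of the form `P pack-<hash>.pack`, and those names are
the only way to derive the URLs of the pack and index files without a
directory listing. The helper names `parse_packs_list` and `pack_urls`
are invented for this example, and the code is an illustration rather
than Git's implementation.

```python
def parse_packs_list(text: str) -> list[str]:
    """Return the pack file names listed in an objects/info/packs body."""
    packs = []
    for line in text.splitlines():
        if line.startswith("P "):
            packs.append(line[2:].strip())
    return packs


def pack_urls(base_url: str, text: str) -> list[tuple[str, str]]:
    """Map each listed pack to its (pack URL, index URL) pair."""
    base = base_url.rstrip("/") + "/objects/pack/"
    urls = []
    for name in parse_packs_list(text):
        # The index sits next to the pack, with .idx instead of .pack.
        idx = name.removesuffix(".pack") + ".idx"
        urls.append((base + name, base + idx))
    return urls
```

Without this file the client would have to guess pack names, which is
not feasible because they are derived from the pack contents.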

* Re: Git dumb HTTP protocol should work without update-server-info
From: Patrick Steinhardt @ 2025-09-09 5:26 UTC
To: Junio C Hamano; +Cc: brian m. carlson, Milan Hauth, git

On Mon, Sep 08, 2025 at 07:43:20AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > On Sun, Sep 07, 2025 at 03:07:11PM +0000, brian m. carlson wrote:
> >> I will also note that the dumb HTTP protocol doesn't work with reftable
> >> and there was some suggestion of removing it for Git 3.0. It certainly
> >> will not work out of the box with Git 3.0, since the default is
> >> reftable.
> >
> > Yes, indeed. In theory though reftables could also be the solution to
> > the underlying issue: the client can be taught to read the "tables.list"
> > file and then fetch all tables listed therein. The result would be fully
> > consistent, unless any of the tables gets garbage collected. The client
> > would notice and abort the operation, after which it could restart the
> > operation.
> >
> > In that case there would be no need for git-update-server-info(1)
> > anymore. The "tables.list" file sits in a well-known location,
> > identifies all other tables we have to download, and there are no
> > atomicity issues anymore.
>
> Does tables.list list what pack files there are in the repository?
> I somehow doubt it.
>
> The dumb HTTP transport was meant to be able to operate with a truly
> dumb HTTP server, that does not even have to support WebDAV at all,
> so there needs some tables at known name that lists _all_ the files
> the cloners are expected to be able to download from. We still need
> the output from update-server-info [*] to tell what packs are there
> even if tables.list is stored at the known path.

Oh, you're right. I only remembered that we need it for refs, but of
course we also need it for packs.

Patrick

* Re: Git dumb HTTP protocol should work without update-server-info
From: Jeff King @ 2025-09-08 0:05 UTC
To: Milan Hauth; +Cc: git

On Sun, Sep 07, 2025 at 01:24:13PM +0200, Milan Hauth wrote:

> expected:
> dumb http remotes should behave like file remotes

File remotes are running a local git-upload-pack to act as the server. A
dumb http remote can't run anything on the server side. So we are stuck
pretending that the http endpoint is a filesystem. Besides performing
terribly, as brian noted it is not even portable to do, since there is
no readdir() equivalent (and things like httpdirfs have to resort to
scraping auto-generated directory listings).

If you want to go that route, I think you're better off using a fuse
wrapper and just letting Git work against the mounted filesystem, as you
tried here:

> as another workaround
> i tried to mount the .git/ directory with httpdirfs
> but httpdirfs fails to mount git repos, see
> https://github.com/fangfufu/httpdirfs/issues/183

I think this does work. The instructions you gave there won't do it,
though, because python's http.server module doesn't support range
requests. Something like:

    sudo apt install python3-rangehttpserver
    (cd /path/to/repo && python -m RangeHTTPServer)
    mkdir mnt
    sudo httpdirfs http://localhost:8000/.git mnt

got me a mount that worked with:

    # you could probably avoid sudo here with better mount perms above
    sudo git clone mnt foo

It's painfully slow, though.

Possibly dumb-http could learn to do the same scraping that httpdirfs
does to get the refs and pack listings (though this might be quite slow
for unpacked refs, if the ref tree is deep). But I doubt you will find
anybody that enthused about working on or reviewing dumb-http patches
these days. The code is not very well maintained, IMO.

-Peff
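
Since the httpdirfs approach hinges on range requests, it can help to
check whether a given server honors them before trying to mount. The
sketch below sends a one-byte `Range` request and treats a
`206 Partial Content` status as support; a server that ignores the
header normally answers `200` with the full body. The function names
`range_probe` and `supports_range` are made up for this example, and the
URL in the usage note is only the one from this thread.

```python
import urllib.request


def range_probe(url: str) -> urllib.request.Request:
    """Build a GET request asking for only the first byte of url."""
    return urllib.request.Request(url, headers={"Range": "bytes=0-0"})


def supports_range(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers the probe with 206 Partial Content."""
    with urllib.request.urlopen(range_probe(url), timeout=timeout) as resp:
        return resp.status == 206
```

For instance, `supports_range("http://localhost:8000/.git/HEAD")` could
be run once against each of the two servers mentioned above to compare
them.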

* Re: Git dumb HTTP protocol should work without update-server-info
From: Junio C Hamano @ 2025-09-08 4:14 UTC
To: Jeff King; +Cc: Milan Hauth, git

Jeff King <peff@peff.net> writes:

> Possibly dumb-http could learn to do the same scraping that httpdirfs
> does to get the refs and pack listings (though this might be quite slow
> for unpacked refs, if the ref tree is deep).

Please don't. Once you go that route, that is no longer "dumb http"
at all.

* Re: Git dumb HTTP protocol should work without update-server-info
From: brian m. carlson @ 2025-09-08 21:27 UTC
To: Jeff King; +Cc: Milan Hauth, git

On 2025-09-08 at 00:05:43, Jeff King wrote:
> Possibly dumb-http could learn to do the same scraping that httpdirfs
> does to get the refs and pack listings (though this might be quite slow
> for unpacked refs, if the ref tree is deep). But I doubt you will find
> anybody that enthused about working on or reviewing dumb-http patches
> these days. The code is not very well maintained, IMO.

That kind of scraping is really not a good idea. It's the equivalent of
trying to parse the FTP LIST output, which is customarily the equivalent
of `ls -l`, but doesn't have to be, and often isn't on Windows. I can
tell you from experience how painful doing that is and how many bug
reports come in when you try to use that (because the FTP server also
doesn't support either variant of the machine-readable format that
solves this problem).

It's also prone to breaking things because some HTTP servers have weird
redirect loops due to sorting entries in the index pages that you have
to be careful not to trigger. And, of course, that depends on the
server having index pages turned on, which many do not.

And then you have to do all of this in C with pointer arithmetic, with
all of the terrifying security properties that has, especially because
you can't actually use an XML parser to do it (unlike in DAV) since many
servers have index pages that are (possibly invalid) autogenerated HTML
3.2 or something.

I've seen exactly one piece of software (lftp) that hasn't done a
terribly awful implementation of this and that happens to have worked
every time I've tried it (which is extremely infrequently, so I probably
just haven't found a server which breaks it yet). Every other piece of
software I've used that's done this kind of thing has just been broken
and I anticipate Git would be no different.

I do want to be very clear that this is a bad idea and I hope never to
see such patches come into Git.

--
brian m. carlson (they/them)
Toronto, Ontario, CA

* Re: Git dumb HTTP protocol should work without update-server-info
From: Jeff King @ 2025-09-09 1:35 UTC
To: brian m. carlson; +Cc: Milan Hauth, git

On Mon, Sep 08, 2025 at 09:27:23PM +0000, brian m. carlson wrote:

> And then you have to do all of this in C with pointer arithmetic, with
> all of the terrifying security properties that has, especially because
> you can't actually use an XML parser to do it (unlike in DAV) since many
> servers have index pages that are (possibly invalid) autogenerated HTML
> 3.2 or something.

I hear you could do it in rust these days. ;)

But more seriously, yes, I don't think it's a good idea. Maybe I should
have been more emphatic in my original message about that. But there is
nothing stopping somebody who wanted to implement such a thing for
themselves, which is probably more productive than asking somebody on
this list to work on it.

-Peff