* git-fetch takes forever on a slow network link. Can parallel mode help? @ 2026-03-06 20:13 R. Diez 2026-03-06 20:54 ` brian m. carlson 0 siblings, 1 reply; 9+ messages in thread From: R. Diez @ 2026-03-06 20:13 UTC (permalink / raw) To: git Hi all: I have an SMB/CIFS connection to a file server over a slow link of about 1 Mbps download, and a faster upload of about 10 Mbps. My smallish Git repository has its single origin on that file server. Unfortunately, I cannot set up any sort of Git server on the remote host. git fetch takes a long time. If the repository is up to date, it takes about 25 seconds to realise that there is nothing to do. If there are changes to download, it can take half an hour, even if the new commit history is rather small. The network link is slow, but not that slow. I wonder what may be causing the long delays. The first question is: how come it takes so long to determine that nothing has changed? Does git-fetch need to download a biggish file every time? Perhaps latency is more of an issue than bandwidth. I saw that git-fetch can work in parallel with --jobs=n . Doing parallel requests may help against round trip latency. However, the git-fetch documentation does not clearly state whether the parallel mode only helps if you have multiple remotes and/or multiple submodules. In my case, I just have a single repository with a single origin and no submodules. Adding --jobs=10 does not help in the 25-second case with no new commits to download. Does anybody have any ideas about how to improve performance in this scenario? Thanks in advance, rdiez ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git-fetch takes forever on a slow network link. Can parallel mode help? 2026-03-06 20:13 git-fetch takes forever on a slow network link. Can parallel mode help? R. Diez @ 2026-03-06 20:54 ` brian m. carlson 2026-03-07 21:28 ` R. Diez 0 siblings, 1 reply; 9+ messages in thread From: brian m. carlson @ 2026-03-06 20:54 UTC (permalink / raw) To: R. Diez; +Cc: git [-- Attachment #1: Type: text/plain, Size: 3484 bytes --]

On 2026-03-06 at 20:13:58, R. Diez wrote:
> Hi all:

Hey,

> I have an SMB/CIFS connection to a file server over a slow link of about 1 Mbps download, and a faster upload of about 10 Mbps.
>
> My smallish Git repository has its single origin on that file server. Unfortunately, I cannot set up any sort of Git server on the remote host.
>
> git fetch takes a long time. If the repository is up to date, it takes about 25 seconds to realise that there is nothing to do.
>
> If there are changes to download, it can take half an hour, even if the new commit history is rather small.
>
> The network link is slow, but not that slow. I wonder what may be causing the long delays.
>
> The first question is: how come it takes so long to determine that nothing has changed? Does git-fetch need to download a biggish file every time?

1 Mbps is extremely slow compared to a modern disk. A floppy disk was 250 kbps[0], so your link is only about four times as fast as a floppy disk. Hard disks in 1998 were about 10 MB/s[1], roughly 80 times the speed of your link. That's definitely a big part of the problem.

Since this is presumably a bare repository, Git will first read the remote references to determine what's available, so if you're using the default files backend, it will read each of the refs, which may involve many small network requests. This performance could be improved with `git pack-refs` or by converting to the reftable backend, which will open fewer files.
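For instance, here is what packing the refs does, sketched on a throwaway bare repository (all paths, names, and tag names below are illustrative, not from this thread):

```shell
# Sketch: consolidate loose ref files into one packed-refs file.
# Everything runs in a throwaway repository created just for the demo.
set -e
dir=$(mktemp -d)
git init -q --bare "$dir/repo.git"
cd "$dir/repo.git"

# Create one commit and a few refs so there are loose ref files to pack.
tree=$(git mktree </dev/null)
commit=$(git -c user.name=example -c user.email=example@example.com \
    commit-tree -m init "$tree")
git update-ref refs/heads/master "$commit"
git tag v0.1 "$commit"
git tag v0.2 "$commit"

# Each loose ref is a separate small file -- and, over a network mount,
# a separate round trip. pack-refs folds them into a single file.
echo "loose refs before: $(find refs -type f | wc -l | tr -d ' ')"   # -> 3
git pack-refs --all
echo "loose refs after:  $(find refs -type f | wc -l | tr -d ' ')"   # -> 0
```

With dozens of tags, the difference between opening that many loose files and reading one packed-refs file over a high-latency mount is exactly the kind of delay described above.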
reftable also uses some simple compression for ref names, which will help as well, but it requires a relatively recent Git. `git refs migrate` can be used to convert to reftable if you like. Once Git knows what the remote repository's refs are, it will need to walk the history to find out what it does and doesn't have. If there are many lines of development, then Git will do more work; if there is just one main branch to fetch, then there will be less. This will involve opening every loose commit or tag object or reading every packed commit or tag object in the history path to determine what needs to be copied. If there's nothing to copy, then Git can determine that from the refs and won't walk any history or copy any objects. If you _do_ have to transfer data, I'm not sure whether having the data packed or loose will be more efficient in your case due to the slow speed. You can try packing the repository with `git gc` and see how that affects future transfers. If latency is the cost, then packing will almost certainly be more efficient. You can also see how long various operations take by using `GIT_TRACE2=1`, which will give some detailed timing information that will help you see what the expensive parts are. If you have some trace output showing timings, we can advise on what you might do to help us address performance. > However, the git-fetch documentation does not clearly state whether the parallel mode only helps if you have multiple remotes and/or multiple submodules. In my case, I just have a single repository with a single origin and no submodules. Parallel mode does not help with a single remote. All the data for a single remote comes in one job. [0] https://stackoverflow.com/questions/52841124/how-fast-could-you-read-write-to-floppy-disks-both-3-1-4-and-5-1-2 [1] https://goughlui.com/the-hard-disk-corner/hard-drive-performance-over-the-years/ -- brian m. 
carlson (they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git-fetch takes forever on a slow network link. Can parallel mode help? 2026-03-06 20:54 ` brian m. carlson @ 2026-03-07 21:28 ` R. Diez 2026-03-08 1:44 ` brian m. carlson 0 siblings, 1 reply; 9+ messages in thread From: R. Diez @ 2026-03-07 21:28 UTC (permalink / raw) To: brian m. carlson; +Cc: git Hallo Brian: First of all, thanks for your quick feedback. > Since this is presumably a bare repository, Yes, the remote repository is bare. > [...] > This performance could be improved with `git pack-refs` After looking around, it turns out that the documentation of "git gc" says that "packing refs" is one of the things it already does. I'll check when it was the last time I did a "git gc" on the remote bare repository, when I'm there again. > or by converting to the reftable backend, which will open fewer files. The documentation states: "reftable for the reftable format. This format is experimental and its internals are subject to change.". I am not ready to risk it yet on my precious Git repository. 8-) > [...] > You can also see how long various operations take by using > `GIT_TRACE2=1`, which will give some detailed timing information that > will help you see what the expensive parts are. That didn't help much. Most of the time (23.7 from 24 seconds) is spent in a single child process: child_start[0] 'git-upload-pack '\''/home/rdiez/MountPoints/blah/blah'\''' The log talks about "upload pack", but I gather this is actually a download operation. It wouldn't be the first confusing item in Git. Or have I got it wrong? I added "export GIT_TRACE_PACKET=true", and then I got a more useful breakdown: This takes around 13 seconds: pkt-line.c:85 packet: upload-pack< 0000 I don't know what 0000 means. All other similar "upload-pack" lines have a hash there. 
About 2 seconds are spent here: pkt-line.c:85 packet: upload-pack> [some hash] HEAD symref-target:refs/heads/master pkt-line.c:85 packet: upload-pack> [some hash] refs/heads/master 7 seconds are spent with "upload-pack" and "fetch" operations, mainly for single "refs/tags". I'll check whether that improves after the next "git gc" on the server. >> However, the git-fetch documentation does not clearly state whether the parallel mode only helps if you have multiple remotes and/or multiple submodules. In my case, I just have a single repository with a single origin and no submodules. > > Parallel mode does not help with a single remote. All the data for a single remote comes in one job. Is this due to a simple implementation in Git? Could Git download such "refs/tags" files in parallel? Best regards, rdiez ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git-fetch takes forever on a slow network link. Can parallel mode help? 2026-03-07 21:28 ` R. Diez @ 2026-03-08 1:44 ` brian m. carlson 2026-03-08 21:08 ` R. Diez 0 siblings, 1 reply; 9+ messages in thread From: brian m. carlson @ 2026-03-08 1:44 UTC (permalink / raw) To: R. Diez; +Cc: git [-- Attachment #1: Type: text/plain, Size: 6440 bytes --] On 2026-03-07 at 21:28:10, R. Diez wrote: > Hallo Brian: Hey, > > This performance could be improved with `git pack-refs` > > After looking around, it turns out that the documentation of "git gc" says that "packing refs" is one of the things it already does. > > I'll check when it was the last time I did a "git gc" on the remote bare repository, when I'm there again. Yes, this is part of a gc. However, packing refs is much lighter than a full GC and will therefore be much faster to complete. > > or by converting to the reftable backend, which will open fewer files. > > The documentation states: "reftable for the reftable format. This format is experimental and its internals are subject to change.". I am not ready to risk it yet on my precious Git repository. 8-) It will be the default on Git 3.0 and it's in use on major forges. I also use it on several of my development repositories. It's stable and functional. I'll try to send a patch to fix that text. I would definitely recommend at the very least Git 2.51 for this and ideally the latest stable version, 2.53. Git has had a lot of work on this format to improve performance and stability over the past few releases. > That didn't help much. Most of the time (23.7 from 24 seconds) is spent in a single child process: > child_start[0] 'git-upload-pack '\''/home/rdiez/MountPoints/blah/blah'\''' > > The log talks about "upload pack", but I gather this is actually a download operation. It wouldn't be the first confusing item in Git. Or have I got it wrong? upload-pack refers to what's happening on the server. 
If you contact a Git server over something like HTTPS or SSH, then it will use git-upload-pack to send data to you (a fetch or clone from your perspective) or git-receive-pack to receive data from you (a push from your perspective). When you perform a local fetch, upload-pack is spawned in the remote repository to serve data.

> I added "export GIT_TRACE_PACKET=true", and then I got a more useful breakdown:
>
> This takes around 13 seconds:
>
> pkt-line.c:85 packet: upload-pack< 0000

Is it just that line that takes 13 seconds, or is it the listing of references altogether that takes 13 seconds? That particular line should not take 13 seconds because it's literally just writing and flushing 4 bytes.

It would be helpful if you can include the entire trace output so we can see and analyze it ourselves. It's very hard to analyze data from the different sections in isolation if one is not intimately familiar with the protocol.

> I don't know what 0000 means. All other similar "upload-pack" lines have a hash there.

Git uses a pkt-line format where each line or chunk of data is preceded by the total length of the data (including the length itself) encoded as four hex characters. So a single byte of data with the value A plus a newline would be `0006A\n` (four bytes for the length, plus two bytes of data). The special code 0000 is a flush packet and means that the end of a command or a section has been reached. That's how Git knows the advertisement has finished. `GIT_TRACE_PACKET` does not normally print the pkt-line length prefix unless it's a flush (0000) packet or a delimiter (0001) packet, since it would just be noise.

> About 2 seconds are spent here:
>
> pkt-line.c:85 packet: upload-pack> [some hash] HEAD symref-target:refs/heads/master
> pkt-line.c:85 packet: upload-pack> [some hash] refs/heads/master

That's sending references, which is expected.

> 7 seconds are spent with "upload-pack" and "fetch" operations, mainly for single "refs/tags".
> I'll check whether that improves after the next "git gc" on the server.

Okay, this is helpful. You probably have the `peel` capability, which means that when you have a tag, you get a line like this:

4a76996b9c60ca3f21e644d78e1e5089a06c6fb3 refs/tags/v0.1.0 peeled:b4c993704e90881bec9c217749be813c70ae2bb6

That `peeled` directive tells us what object the tag points to, but it means that the tag object has to be opened and read, which makes things much more expensive. Unfortunately, there's no way to turn that capability off, since Git doesn't usually have capability control options for the protocol.

_However_, if you pack references with `git pack-refs` or you use reftable, then Git will store the references both peeled and unpeeled, so it doesn't need to compute that. reftable is better because _all_ tags are stored both peeled and unpeeled, but as long as you're writing new references into a files-style repository, the new references are unpacked (and therefore contain no peeling information). reftable is also a binary format, which means that it's smaller than a packed-refs file, and since your read speed is the limiting factor, that should make reads faster.
This shared understanding includes _all_ of the objects necessary for everything the client wants but doesn't have, and then those are all sent as part of one pack.

Parallelization would not help here because the limiting factor is the speed of the connection (and in your case, literally the speed of reading data off the file system). A different design with parallelization might help if one had a very fast connection and deltification and compression were too slow to saturate it, but that crossover point is around 50 MB/s in a typical situation, and it wouldn't matter here because the server component reads from the same slow file system as well.

-- brian m. carlson (they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
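The pkt-line framing described in the message above is simple enough to reproduce by hand. A sketch (the payloads here are arbitrary examples, not real protocol traffic):

```shell
# pkt-line framing as described above: four hex digits giving the total
# length (the four length bytes themselves included), then the payload.
pkt_line() {
    local payload=$1
    printf '%04x%s' $(( 4 + ${#payload} )) "$payload"
}

pkt_line $'A\n'     # one byte of data plus a newline -> "0006A" and a newline
printf '0000'       # a flush packet carries no payload, just the code 0000
```

This is why the `0000` lines in the trace have no hash: they are section terminators, not refs.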
* Re: git-fetch takes forever on a slow network link. Can parallel mode help? 2026-03-08 1:44 ` brian m. carlson @ 2026-03-08 21:08 ` R. Diez 2026-03-08 22:52 ` brian m. carlson 0 siblings, 1 reply; 9+ messages in thread From: R. Diez @ 2026-03-08 21:08 UTC (permalink / raw) To: brian m. carlson; +Cc: git

Hi again:

>> The log talks about "upload pack", but I gather this is actually a download operation. It wouldn't be the first confusing item in Git. Or have I got it wrong?
>
> upload-pack refers to what's happening on the server. If you contact a Git server over something like HTTPS or SSH, then it will use git-upload-pack to send data to you (a fetch or clone from your perspective) or git-receive-pack to receive data from you (a push from your perspective).
>
> When you perform a local fetch, upload-pack is spawned in the remote repository to serve data.

My client computer has an SMB/CIFS connection to the remote file server. That means the client has mounted the file share with "mount.cifs", so in this scenario nothing is happening on the server, as the connection is not HTTPS or SSH. No process will be spawned on the remote server.

That is the reason why I am getting confused. From my point of view, my client computer is not "uploading" anything when doing a "git pull".

But I guess Git is designed for all scenarios and will probably not use the correct terminology in my case.

In case it helps, I am using Git version 2.53.0.

>> I added "export GIT_TRACE_PACKET=true", and then I got a more useful breakdown:
>>
>> This takes around 13 seconds:
>>
>> pkt-line.c:85 packet: upload-pack< 0000
>
> Is it just that line that takes 13 seconds or is the listing of references altogether that takes 13 seconds? That particular line should not take 13 seconds because it's literally just writing and flushing 4 bytes.
>
> It would be helpful if you can include the entire trace output so we can see and analyze it ourselves.
> It's very hard to analyze data from the different sections in isolation if one is not intimately familiar with the protocol.

The log does not really say which operation is taking how long. It does not say when the listing of references starts or finishes, which files it is reading and how many bytes it is reading from each file, or whether the files are read sequentially or in parallel.

Thanks for your feedback. I know it is hard to help without the whole log, but I would have to ask for permission to upload a log with file paths, hashes and tag names. Or clean them all manually.

>> 7 seconds are spent with "upload-pack" and "fetch" operations, mainly for single "refs/tags". I'll check whether that improves after the next "git gc" on the server.
>
> Okay, this is helpful. You probably have the `peel` capability, which means that when you have a tag, you get a line like this:
>
> 4a76996b9c60ca3f21e644d78e1e5089a06c6fb3 refs/tags/v0.1.0 peeled:b4c993704e90881bec9c217749be813c70ae2bb6

Yes, that is the case.

> That `peeled` directive tells us what object the tag points to, but it means that the tag object has to be opened and read, which makes things much more expensive. Unfortunately, there's no way to turn that capability off, since Git doesn't usually have capability control options for the protocol.

OK, but there is no protocol here; Git is accessing the files over the mount.

> _However_, if you pack references with `git pack-refs` or you use
> [...]

OK, I'll try with "git gc" on the remote server the next time I can.

> Git is already downloading them as efficiently as possible. The protocol has both sides advertise the references (branches, tags, etc.) that they have and then, in a fetch or clone, the client sends a list of what it has and what it wants, and the two sides negotiate to come to an agreement on what needs to be sent.
> This shared understanding includes _all_ of the objects necessary for everything the client wants but doesn't have, and then those are all sent as part of one pack.
>
> Parallelization would not help here because the limiting factor is the speed of the connection (and in your case, literally the speed of reading data off the file system).
> [...]

I don't think that is the case. Git is accessing the remote repository over a mount (a file share), so there is no protocol or negotiation, although I am guessing it is happening virtually with the current Git implementation.

If I understand it correctly, without "packed references", Git will have to access a number of small files on the remote server. Even with packed references, there will probably still be a few small files to access, in addition to some biggish packed-refs file.

In the past, on rotational hard disks, issuing many such read requests in parallel wasn't beneficial to performance, because of the disk head seek times. That is, jumping around would thrash the disk instead of increasing performance. But that is not true anymore with SSDs, and especially with file mounts over a network connection with a high latency. In that scenario, issuing parallel requests (with multiple threads or async I/O) should actually increase performance. Is my reasoning correct?

Another question: would it help if I only fetched the 'master' branch? Something like "git fetch origin master". Most of the time, I am only interested in the main branch. I am guessing that "git fetch" will download all other branches by default, because of this:

[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*

I read the "git fetch" documentation, but I didn't understand whether it will by default fetch everything or just the current branch.

Thanks again, rdiez ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git-fetch takes forever on a slow network link. Can parallel mode help? 2026-03-08 21:08 ` R. Diez @ 2026-03-08 22:52 ` brian m. carlson 2026-03-09 21:08 ` R. Diez 0 siblings, 1 reply; 9+ messages in thread From: brian m. carlson @ 2026-03-08 22:52 UTC (permalink / raw) To: R. Diez; +Cc: git [-- Attachment #1: Type: text/plain, Size: 5533 bytes --] On 2026-03-08 at 21:08:41, R. Diez wrote: > My client computer has an SMB/CIFS connection to the remote file server. That means the client has mounted the file share with "mount.cifs", so in this scenario nothing is happening on the server, as the connection is not HTTPS or SSH. No process will be spawned on the remote server. > > That is the reason why I am getting confused. From my point of view, my client computer is not "uploading" anything when doing a "git pull". > > But I guess Git is designed for all scenarios and will probably not use the correct terminology in my case. For an initial clone on a local file system, Git may shortcut spawning an upload-pack helper and simply copy or hard link files, but otherwise, all fetches require the use of upload-pack. There are a couple reasons for this. First, upload-pack is specifically designed to deal with untrusted data without executing code or honouring configuration values, which is important for security reasons. Second, when you're doing a fetch, Git wants to copy only the necessary objects and it can only do that with a helper that can read the objects. Simply copying every pack and loose object would lead to enormous bloating of your client repository because you'd end up with several copies of each object. > The log does not really say which operation is taking how long. It does not say when the listing of references starts or finishes, which files it is reading and how many bytes it is reading from each file, or whether the files are read sequentially or in parallel. The log includes timestamps, which allow us to infer that information. 
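For instance, one quick way to turn those timestamps into per-step durations (a sketch; the two sample lines below are fabricated stand-ins for a real `GIT_TRACE_PACKET` log captured with something like `GIT_TRACE_PACKET=true git fetch 2>trace.log`):

```shell
# A fabricated two-line sample stands in for a real trace log here.
printf '%s\n' \
    '20:00:01.000000 pkt-line.c:85 packet: upload-pack< 0000' \
    '20:00:14.500000 pkt-line.c:85 packet: fetch> done' > trace.log

# Print the time elapsed before each line, assuming every line starts
# with an HH:MM:SS.microseconds timestamp as GIT_TRACE* output does.
# Here the second line is printed with a 13.500s delta.
awk '{
    split($1, t, /[:.]/)
    now = t[1]*3600 + t[2]*60 + t[3] + t[4]/1000000
    if (started) printf "%.3fs  %s\n", now - prev, $0
    prev = now; started = 1
}' trace.log
```

Sorting that output by the first column is usually enough to spot which phase of the fetch is eating the time.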
> Thanks for your feedback. I know it is hard to help without the whole log, but I would have to ask for permission to upload a log with file paths, hashes and tag names. Or clean them all manually. I'm afraid that without more information, it's going to be difficult for me or anyone else to give you accurate answers about how to improve this. The trace data is specifically designed to allow us to troubleshoot problems and most forges and Git-adjacent projects would require you to provide a full trace output before even investigating further. > OK, but there is no protocol here, Git is accessing the files over the mount. As mentioned above, there is a protocol because Git always uses one for fetches. > I don't think that is the case. Git is accessing the remote repository over a mount (a file share), so there is no protocol or negotiation, although I am guessing it is happening virtually with the current Git implementation. `git fetch` from a remote repository on a file system spawns an upload-pack process in the remote repository to handle the transfer. `git fetch` then speaks to it over standard input and standard output. So the normal protocol is being used. > If I understand it correctly, without "packed references", Git will have to access a number of small files on the remote server. Even with packet references, there will probably still be a few small files to access, in addition to some biggish packed references file. Correct. > In the past, on rotational hard disks, issuing many such read requests in parallel wasn't beneficial to performance, because of the disk head seek times. That is, jumping around would thrash the disk instead of increasing performance. > > But that is not true anymore with SSDs, and especially with file mounts over a network connection with a high latency. In that scenario, issuing parallel requests (with multiple threads or async I/O) should actually increase performance. 
Git, like virtually every other Unix program, is not designed for high latency file systems. Yes, in theory it could be faster to issue multiple requests, but that would increase the need to buffer large amounts of data in memory, increasing memory usage, and in the general case, the fact is that the file system is much lower latency and much faster than the network connection over which data is being sent, so that's the case that Git optimizes for. rsync would also perform poorly in your case because it's again optimized for sending less data over the network than it receives from the file system. Similarly with tar over a network pipe. So it's certainly the case that Git could handle this case better, but it also optimizes for the common case like virtually every other modern Unix program. If you think it might be faster, you could try rsyncing the remote repository to a separate directory on your local machine and then fetching from that. That does require that both directories are completely quiescent at the moment with no modification at all. > Another question: Would it help if I only fetched the 'master' branch? Something like "git fetch origin master". Most of the time, I am only interested in the main branch. That would likely be faster. You may also want `--no-tags`, which prevents downloading tags that would point into the main branch. > I am guessing that "git fetch" will download all other branches by default, because of this: > > [remote "origin"] > fetch = +refs/heads/*:refs/remotes/origin/* > > I read the "git fetch" documentation, but I didn't understand whether it will fetch by default everything or just the current branch. A `git fetch origin` with that configuration will fetch every branch and every tag that points into one of those branches. -- brian m. carlson (they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
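A concrete sketch of the narrower fetch suggested above, played out on a pair of throwaway local repositories (all names are illustrative; `-b master` needs a reasonably recent Git):

```shell
# Sketch: fetch only one branch and skip tags, as suggested above.
# Two throwaway local repositories stand in for client and server.
set -e
dir=$(mktemp -d)
git init -q -b master "$dir/server"
git -C "$dir/server" -c user.name=example -c user.email=example@example.com \
    commit -q --allow-empty -m 'initial commit'
git -C "$dir/server" tag v1.0
git -C "$dir/server" branch topic

git init -q -b master "$dir/client"
git -C "$dir/client" remote add origin "$dir/server"

# Fetch just master, and don't follow tags that point into it.
git -C "$dir/client" fetch -q --no-tags origin master

# Only refs/remotes/origin/master was created; neither the v1.0 tag
# nor the topic branch came over.
git -C "$dir/client" for-each-ref --format='%(refname)'
```

With the default `+refs/heads/*:refs/remotes/origin/*` refspec, a plain `git fetch origin` would have transferred `topic` and `v1.0` as well.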
* Re: git-fetch takes forever on a slow network link. Can parallel mode help? 2026-03-08 22:52 ` brian m. carlson @ 2026-03-09 21:08 ` R. Diez 2026-03-10 22:50 ` brian m. carlson 0 siblings, 1 reply; 9+ messages in thread From: R. Diez @ 2026-03-09 21:08 UTC (permalink / raw) To: brian m. carlson; +Cc: git

First of all, thanks for the information about upload-pack etc.

> [...]
> the fact is that the file system is much lower latency and much faster than the network connection over which data is being sent, so that's the case that Git optimizes for.

I wouldn't say that reading sequentially is "optimising". It is just the limitation of a simple implementation. Like I said, with modern SSDs, issuing requests in parallel will be faster even on a local filesystem. That would be a real optimisation then.

Some elderly Unix tools like GNU Make realised a long time ago that parallel operation is the way to go. Git itself has realised this too, and can now work in parallel in certain cases (multiple remote repositories, multiple submodules). So old Unix tools don't count as an excuse!

I think we should clearly point out this deficiency. Git need not be perfect, but I would rather know the limitations upfront. At the very least, that would help me make decisions faster, like investing in some sort of Git server instead of trying to optimise the SMB/CIFS mount. And who knows, maybe someone will see this post in the future and decide to implement parallel file operations (async I/O) inside upload-pack and the like.

> rsync would also perform poorly in your case because it's again optimized for sending less data over the network than it receives from the file system. Similarly with tar over a network pipe.

rsync would probably look at the file dates and sizes and not transfer everything. There are even some parallel rsync variants designed to overcome high network latencies. But I don't think rsync is worth the effort for me.
I'll just wait a while longer every now and then. There is one more thing I am curious about. Git does not document how it uses SSH (or at least I couldn't find it in the standard end-user documentation). Git cannot launch a process on the target host over SSH, unless Git is already installed on the remote system. After all, the local system may have a different architecture (like AMD vs ARM), so you cannot copy a binary across. And I haven't seen the requirement that Git must be installed on the remote host when connecting over SSH. In that case, I would have probably seen somewhere a version compatibility table between client and server. So Git must be accessing files over SSH using the standard SSH file transfer operations. I am guessing that the same latency problem will apply here too, because uploads and downloads over SSH will also be sequential. Is my reasoning correct? Or does Git attempt to find out whether there is a Git on the other side? What happens if there isn't then? > [...] > A `git fetch origin` with that configuration will fetch every branch and > every tag that points into one of those branches. OK, thanks. It turns out my repository has no branches at all, so that wouldn't help me anyway. Best regards, rdiez ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git-fetch takes forever on a slow network link. Can parallel mode help? 2026-03-09 21:08 ` R. Diez @ 2026-03-10 22:50 ` brian m. carlson 2026-03-11 18:05 ` R. Diez 0 siblings, 1 reply; 9+ messages in thread From: brian m. carlson @ 2026-03-10 22:50 UTC (permalink / raw) To: R. Diez; +Cc: git [-- Attachment #1: Type: text/plain, Size: 3287 bytes --] On 2026-03-09 at 21:08:31, R. Diez wrote: > There is one more thing I am curious about. Git does not document how it uses SSH (or at least I couldn't find it in the standard end-user documentation). Git cannot launch a process on the target host over SSH, unless Git is already installed on the remote system. After all, the local system may have a different architecture (like AMD vs ARM), so you cannot copy a binary across. And I haven't seen the requirement that Git must be installed on the remote host when connecting over SSH. In that case, I would have probably seen somewhere a version compatibility table between client and server. > > So Git must be accessing files over SSH using the standard SSH file transfer operations. I am guessing that the same latency problem will apply here too, because uploads and downloads over SSH will also be sequential. Is my reasoning correct? Git doesn't use standard SSH file transfer operations. That would be much slower and it also works poorly when the remote side doesn't grant access to a file system, such as with a forge or gitolite. SSH allows multiple commands to be run over a single connection with `-oControlMaster`, which can improve performance. The benefit to running a single command for each Git operation is that we can do authentication once at the beginning of each command, whereas if we have a long-running SSH connection and attempt to do SFTP, we might interleave requests on different repositories, so each request would have to perform authentication. 
That's not a problem if you're using Unix permissions to control access, but it scales really poorly when your Git data is actually spread across many different file servers and the user is accessing multiple repositories, such as is common on forges.

Using SSH file transfer operations also would not work well because you would effectively have to download every pack file and loose object to be sure you got the data you need, instead of getting a pack with only a few objects if that's all you need. However, you can of course mount a remote file system over SFTP with `sshfs` and use it as a local file system if you actually have a real file system on the remote side. That will send multiple requests over the connection when reading or writing, since `sshfs` does queue those requests.

> Or does Git attempt to find out whether there is a Git on the other side? What happens if there isn't then?

Git invokes git-upload-pack on the remote side and talks to it over standard input and output. If there isn't one, then the operation fails. Here's an example:

----
% GIT_TRACE=1 git ls-remote git@github.com:git/git.git
22:41:36.731673 git.c:502 trace: built-in: git ls-remote git@github.com:git/git.git
22:41:36.731937 run-command.c:673 trace: run_command: unset GIT_PREFIX; GIT_PROTOCOL=version=2 ssh -o SendEnv=GIT_PROTOCOL git@github.com 'git-upload-pack '\''git/git.git'\'''
22:41:36.731952 run-command.c:765 trace: start_command: /usr/bin/ssh -o SendEnv=GIT_PROTOCOL git@github.com 'git-upload-pack '\''git/git.git'\'''
----

I don't have any systems without Git on them, so I can't demonstrate the failure case.

-- brian m. carlson (they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git-fetch takes forever on a slow network link. Can parallel mode help? 2026-03-10 22:50 ` brian m. carlson @ 2026-03-11 18:05 ` R. Diez 0 siblings, 0 replies; 9+ messages in thread From: R. Diez @ 2026-03-11 18:05 UTC (permalink / raw) To: brian m. carlson; +Cc: git

> Git doesn't use standard SSH file transfer operations.
> [...]

OK, thanks for the information.

I have finally done a "git gc" on the server side, and now a "git pull" from the client with no new commits to download takes 4 seconds, a drastic reduction from the 25 seconds it took before. It turns out I hadn't done a "git gc" on the server for over 2 years, so many new references weren't packed.

Therefore, I think that having many small files to read versus one packed-refs file makes a huge difference if you have mounted a remote filesystem over a network with a relatively high latency. My 1 Mbps connection does not actually have such a high latency (around 40 ms measured with ping), but latency seems to have a much greater impact than the low bandwidth, at least with a packed-refs file which weighs only 64 kB.

Best regards, rdiez ^ permalink raw reply [flat|nested] 9+ messages in thread