public inbox for git@vger.kernel.org
* git-fetch takes forever on a slow network link. Can parallel mode help?
@ 2026-03-06 20:13 R. Diez
  2026-03-06 20:54 ` brian m. carlson
  0 siblings, 1 reply; 9+ messages in thread
From: R. Diez @ 2026-03-06 20:13 UTC (permalink / raw)
  To: git

Hi all:

I have an SMB/CIFS connection to a file server over a slow link of about 1 Mbps download, and a faster upload of about 10 Mbps.

My smallish Git repository has its single origin on that file server. Unfortunately, I cannot set up any sort of Git server on the remote host.

git fetch takes a long time. If the repository is up to date, it takes about 25 seconds to realise that there is nothing to do.

If there are changes to download, it can take half an hour, even if the new commit history is rather small.

The network link is slow, but not that slow. I wonder what may be causing the long delays.

The first question is: how come it takes so long to determine that nothing has changed? Does git-fetch need to download a biggish file every time?

Perhaps latency is more of an issue than bandwidth. I saw that git-fetch can work in parallel with --jobs=n. Doing parallel requests may help against round-trip latency.

However, the git-fetch documentation does not clearly state whether the parallel mode only helps if you have multiple remotes and/or multiple submodules. In my case, I just have a single repository with a single origin and no submodules.

Adding --jobs=10 does not help in the 25-second case with no new commits to download.

Does anybody have any ideas about how to improve performance in this scenario?

Thanks in advance,
   rdiez

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git-fetch takes forever on a slow network link. Can parallel mode help?
  2026-03-06 20:13 git-fetch takes forever on a slow network link. Can parallel mode help? R. Diez
@ 2026-03-06 20:54 ` brian m. carlson
  2026-03-07 21:28   ` R. Diez
  0 siblings, 1 reply; 9+ messages in thread
From: brian m. carlson @ 2026-03-06 20:54 UTC (permalink / raw)
  To: R. Diez; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3484 bytes --]

On 2026-03-06 at 20:13:58, R. Diez wrote:
> Hi all:

Hey,

> I have an SMB/CIFS connection to a file server over a slow link of about 1 Mbps download, and a faster upload of about 10 Mbps.
> 
> My smallish Git repository has its single origin on that file server. Unfortunately, I cannot set up any sort of Git server on the remote host.
> 
> git fetch takes a long time. If the repository is up to date, it takes about 25 seconds to realise that there is nothing to do.
> 
> If there are changes to download, it can take half an hour, even if the new commit history is rather small.
> 
> The network link is slow, but not that slow. I wonder what may be causing the long delays.
> 
> The first question is: how come it takes so long to determine that nothing has changed? Does git-fetch need to download a biggish file every time?

1 Mbps is extremely slow by modern storage standards.  A floppy disk
transferred about 250 kbps[0], so your link is only about four times as
fast as a floppy disk.  Hard disks in 1998 managed about 10 MB/s[1],
roughly 80 times your link speed.  That's definitely a big part of the
problem.
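
To make the arithmetic explicit, here is a quick back-of-the-envelope
sketch in Python (the floppy and 1998-disk figures are the approximate
ones cited above):

```python
# Back-of-the-envelope comparison of the 1 Mbps link with the storage
# speeds mentioned above (all figures approximate).
link_bps = 1_000_000            # ~1 Mbps SMB/CIFS download link
floppy_bps = 250_000            # ~250 kbps floppy transfer rate
disk_1998_Bps = 10_000_000      # ~10 MB/s late-1990s hard disk

print(link_bps / floppy_bps)          # the link is ~4x a floppy
print(disk_1998_Bps * 8 / link_bps)   # a 1998 disk is ~80x the link
```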

Since this is presumably a bare repository, Git will first read the
remote references to determine what's available, so if you're using the
default files backend, it will read each of the refs, which may involve
many small network requests.  This performance could be improved with
`git pack-refs` or by converting to the reftable backend, which will
open fewer files.  reftable also uses some simple compression for ref
names, which will help as well, but it requires a relatively recent Git.
`git refs migrate` can be used to convert to reftable if you like.

Once Git knows what the remote repository's refs are, it will need to
walk the history to find out what it does and doesn't have.  If there
are many lines of development, then Git will do more work; if there is
just one main branch to fetch, then there will be less.  This will
involve opening every loose commit or tag object or reading every packed
commit or tag object in the history path to determine what needs to be
copied.  If there's nothing to copy, then Git can determine that from
the refs and won't walk any history or copy any objects.

If you _do_ have to transfer data, I'm not sure whether having the data
packed or loose will be more efficient in your case due to the slow
speed.  You can try packing the repository with `git gc` and see how
that affects future transfers.  If latency is the cost, then packing
will almost certainly be more efficient.

You can also see how long various operations take by using
`GIT_TRACE2=1`, which will give some detailed timing information that
will help you see what the expensive parts are.

If you have some trace output showing timings, we can advise on what you
might do to improve performance.

> However, the git-fetch documentation does not clearly state whether the parallel mode only helps if you have multiple remotes and/or multiple submodules. In my case, I just have a single repository with a single origin and no submodules.

Parallel mode does not help with a single remote.  All the data for a
single remote comes in one job.

[0] https://stackoverflow.com/questions/52841124/how-fast-could-you-read-write-to-floppy-disks-both-3-1-4-and-5-1-2
[1] https://goughlui.com/the-hard-disk-corner/hard-drive-performance-over-the-years/
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]


* Re: git-fetch takes forever on a slow network link. Can parallel mode help?
  2026-03-06 20:54 ` brian m. carlson
@ 2026-03-07 21:28   ` R. Diez
  2026-03-08  1:44     ` brian m. carlson
  0 siblings, 1 reply; 9+ messages in thread
From: R. Diez @ 2026-03-07 21:28 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

Hallo Brian:

First of all, thanks for your quick feedback.


> Since this is presumably a bare repository,

Yes, the remote repository is bare.


> [...]
> This performance could be improved with `git pack-refs`

After looking around, it turns out that the documentation for "git gc" says that packing refs is one of the things it already does.

I'll check when I last ran "git gc" on the remote bare repository, the next time I'm there.


> or by converting to the reftable backend, which will open fewer files.

The documentation states: "reftable for the reftable format. This format is experimental and its internals are subject to change.". I am not ready to risk it yet on my precious Git repository. 8-)


> [...]
> You can also see how long various operations take by using
> `GIT_TRACE2=1`, which will give some detailed timing information that
> will help you see what the expensive parts are.

That didn't help much. Most of the time (23.7 from 24 seconds) is spent in a single child process:
child_start[0] 'git-upload-pack '\''/home/rdiez/MountPoints/blah/blah'\'''

The log talks about "upload pack", but I gather this is actually a download operation. It wouldn't be the first confusing item in Git. Or have I got it wrong?

I added "export GIT_TRACE_PACKET=true", and then I got a more useful breakdown:

This takes around 13 seconds:

   pkt-line.c:85           packet:  upload-pack< 0000

I don't know what 0000 means. All other similar "upload-pack" lines have a hash there.

About 2 seconds are spent here:

  pkt-line.c:85           packet:  upload-pack> [some hash]  HEAD symref-target:refs/heads/master
  pkt-line.c:85           packet:  upload-pack> [some hash]  refs/heads/master

7 seconds are spent with "upload-pack" and "fetch" operations, mainly for single "refs/tags". I'll check whether that improves after the next "git gc" on the server.


>> However, the git-fetch documentation does not clearly state whether the parallel mode only helps if you have multiple remotes and/or multiple submodules. In my case, I just have a single repository with a single origin and no submodules.
> 
> Parallel mode does not help with a single remote.  All the data for a single remote comes in one job.

Is this due to a simple implementation in Git? Could Git download such "refs/tags" files in parallel?

Best regards,
   rdiez


* Re: git-fetch takes forever on a slow network link. Can parallel mode help?
  2026-03-07 21:28   ` R. Diez
@ 2026-03-08  1:44     ` brian m. carlson
  2026-03-08 21:08       ` R. Diez
  0 siblings, 1 reply; 9+ messages in thread
From: brian m. carlson @ 2026-03-08  1:44 UTC (permalink / raw)
  To: R. Diez; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 6440 bytes --]

On 2026-03-07 at 21:28:10, R. Diez wrote:
> Hallo Brian:

Hey,

> > This performance could be improved with `git pack-refs`
> 
> After looking around, it turns out that the documentation of "git gc" says that "packing refs" is one of the things it already does.
> 
> I'll check when it was the last time I did a "git gc" on the remote bare repository, when I'm there again.

Yes, this is part of a gc.  However, packing refs is much lighter than a
full GC and will therefore be much faster to complete.

> > or by converting to the reftable backend, which will open fewer files.
> 
> The documentation states: "reftable for the reftable format. This format is experimental and its internals are subject to change.". I am not ready to risk it yet on my precious Git repository. 8-)

It will be the default on Git 3.0 and it's in use on major forges.  I
also use it on several of my development repositories.  It's stable and
functional.  I'll try to send a patch to fix that text.

I would definitely recommend at the very least Git 2.51 for this and
ideally the latest stable version, 2.53.  Git has had a lot of work on
this format to improve performance and stability over the past few
releases.

> That didn't help much. Most of the time (23.7 from 24 seconds) is spent in a single child process:
> child_start[0] 'git-upload-pack '\''/home/rdiez/MountPoints/blah/blah'\'''
> 
> The log talks about "upload pack", but I gather this is actually a download operation. It wouldn't be the first confusing item in Git. Or have I got it wrong?

upload-pack refers to what's happening on the server.  If you contact a
Git server over something like HTTPS or SSH, then it will use
git-upload-pack to send data to you (a fetch or clone from your
perspective) or git-receive-pack to receive data from you (a push from
your perspective).

When you perform a local fetch, upload-pack is spawned in the remote
repository to serve data.

> I added "export GIT_TRACE_PACKET=true", and then I got a more useful breakdown:
> 
> This takes around 13 seconds:
> 
>   pkt-line.c:85           packet:  upload-pack< 0000

Is it just that line that takes 13 seconds or is the listing of
references altogether that takes 13 seconds?  That particular line
should not take 13 seconds because it's literally just writing and
flushing 4 bytes.

It would be helpful if you could include the entire trace output so we
can see and analyze it ourselves.  It's very hard to analyze data from
the different sections in isolation if one is not intimately familiar
with the protocol.

> I don't know what 0000 means. All other similar "upload-pack" lines have a hash there.

Git uses a pkt-line format where each line or chunk of data is preceded
by the total length of the data (including the length itself) encoded as
four hex characters.  So a single byte of data with the value A plus a
newline would be `0006A\n` (four bytes for the length, plus two bytes of
data).  The special code 0000 is a flush packet and means that the end
of a command or a section has been reached.  That's how Git knows the
advertisement has finished.
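
As an illustration of the framing described above, here is a minimal
sketch in Python (a model of the encoding, not Git's own code):

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a Git pkt-line: four hex digits giving the
    total length (header included), followed by the payload."""
    return b"%04x%s" % (len(payload) + 4, payload)

# The special flush packet marks the end of a section; it is a literal
# "0000", not a zero-length pkt-line (which would be "0004").
FLUSH_PKT = b"0000"

# A single byte 'A' plus a newline becomes '0006A\n':
assert pkt_line(b"A\n") == b"0006A\n"
```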

`GIT_TRACE_PACKET` does not normally print a pkt-line's length header
unless it's a flush (0000) packet or a delimiter (0001) packet, since it
would just be noise.

> About 2 seconds are spent here:
> 
>  pkt-line.c:85           packet:  upload-pack> [some hash]  HEAD symref-target:refs/heads/master
>  pkt-line.c:85           packet:  upload-pack> [some hash]  refs/heads/master

That's sending references, which is expected.

> 7 seconds are spent with "upload-pack" and "fetch" operations, mainly for single "refs/tags". I'll check whether that improves after the next "git gc" on the server.

Okay, this is helpful.  You probably have the `peel` capability, which
means that when you have a tag, you get a line like this:

    4a76996b9c60ca3f21e644d78e1e5089a06c6fb3 refs/tags/v0.1.0 peeled:b4c993704e90881bec9c217749be813c70ae2bb6

That `peeled` directive tells us what object the tag points to, but it
means that the tag object has to be opened and read, which makes things
much more expensive.  Unfortunately, there's no way to turn that
capability off, since Git doesn't usually have capability control
options for the protocol.

_However_, if you pack references with `git pack-refs` or you use
reftable, then Git will store the references both peeled and unpeeled,
so it doesn't need to compute that.  reftable is better because _all_
tags are stored both peeled and unpeeled, but as long as you're writing
new references into a files-style repository, the new references are
unpacked (and therefore contain no peeling information).  reftable is
also a binary format which means that it's smaller than a packed-refs
file and since your read speed is the limiting factor, that should make
reads faster.
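
For illustration, a packed-refs file stores the peeled value on a `^`
line directly after the tag entry, so upload-pack doesn't have to open
the tag object at all (a sketch of the format, using the example hashes
from above):

```
# pack-refs with: peeled fully-peeled sorted 
4a76996b9c60ca3f21e644d78e1e5089a06c6fb3 refs/tags/v0.1.0
^b4c993704e90881bec9c217749be813c70ae2bb6
```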

> > > However, the git-fetch documentation does not clearly state whether the parallel mode only helps if you have multiple remotes and/or multiple submodules. In my case, I just have a single repository with a single origin and no submodules.
> > 
> > Parallel mode does not help with a single remote.  All the data for a single remote comes in one job.
> 
> Is this due to a simple implementation in Git? Could Git download such "refs/tags" files in parallel?

Git is already downloading them as efficiently as possible.  The
protocol has both sides advertise the references (branches, tags, etc.)
that they have and then, in a fetch or clone, the client sends a list of
what it has and what it wants, and the two sides negotiate to come to an
agreement on what needs to be sent.  This shared understanding includes
_all_ of the objects necessary for everything the client wants but
doesn't have, and then those are all sent as part of one pack.

Parallelization would not help here because the limiting factor is the
speed of the connection (and in your case, literally the speed of
reading data off the file system).  A different design with
parallelization might help if one had a very fast connection and
deltification and compression were too slow to saturate it, but that
point is around 50 MB/s in a typical situation, and it wouldn't matter
here anyway because the server component is on the same file system.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA



* Re: git-fetch takes forever on a slow network link. Can parallel mode help?
  2026-03-08  1:44     ` brian m. carlson
@ 2026-03-08 21:08       ` R. Diez
  2026-03-08 22:52         ` brian m. carlson
  0 siblings, 1 reply; 9+ messages in thread
From: R. Diez @ 2026-03-08 21:08 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

Hi again:

>> The log talks about "upload pack", but I gather this is actually a download operation. It wouldn't be the first confusing item in Git. Or have I got it wrong?
> 
> upload-pack refers to what's happening on the server.  If you contact a
> Git server over something like HTTPS or SSH, then it will use
> git-upload-pack to send data to you (a fetch or clone from your
> perspective) or git-receive-pack to receive data from you (a push from
> your perspective).
> 
> When you perform a local fetch, upload-pack is spawned in the remote
> repository to serve data.

My client computer has an SMB/CIFS connection to the remote file server. That means the client has mounted the file share with "mount.cifs", so in this scenario nothing is happening on the server, as the connection is not HTTPS or SSH. No process will be spawned on the remote server.

That is the reason why I am getting confused. From my point of view, my client computer is not "uploading" anything when doing a "git pull".

But I guess Git is designed for all scenarios and will probably not use the correct terminology in my case.

In case it helps, I am using Git version 2.53.0.


>> I added "export GIT_TRACE_PACKET=true", and then I got a more useful breakdown:
>>
>> This takes around 13 seconds:
>>
>>    pkt-line.c:85           packet:  upload-pack< 0000
> 
> Is it just that line that takes 13 seconds or is the listing of
> references altogether that takes 13 seconds?  That particular line
> should not take 13 seconds because it's literally just writing and
> flushing 4 bytes.
> 
> It would be helpful if you can to include the entire trace output so we
> can see and analyze it ourselves.  It's very hard to analyze data from
> the different sections in isolation if one is not intimately familiar
> with the protocol.

The log does not really say which operation is taking how long. It does not say when the listing of references starts or finishes, which files it is reading and how many bytes it is reading from each file, or whether the files are read sequentially or in parallel.

Thanks for your feedback. I know it is hard to help without the whole log, but I would have to ask for permission to upload a log with file paths, hashes and tag names. Or clean them all manually.


>> 7 seconds are spent with "upload-pack" and "fetch" operations, mainly for single "refs/tags". I'll check whether that improves after the next "git gc" on the server.
> 
> Okay, this is helpful.  You probably have the `peel` capability, which
> means that when you have a tag, you get a line like this:
> 
>      4a76996b9c60ca3f21e644d78e1e5089a06c6fb3 refs/tags/v0.1.0 peeled:b4c993704e90881bec9c217749be813c70ae2bb6

Yes, that is the case.


> That `peeled` directive tells us what object the tag points to, but it
> means that the tag object has to be opened and read, which makes things
> much more expensive.  Unfortunately, there's no way to turn that
> capability off, since Git doesn't usually have capability control
> options for the protocol.

OK, but there is no protocol here; Git is accessing the files over the mount.


> _However_, if you pack references with `git pack-refs` or you use
> [...]

OK, I'll try with "git gc" on the remote server the next time I can.


> Git is already downloading them as efficiently as possible.  The
> protocol has both sides advertise the references (branches, tags, etc.)
> that they have and then, in a fetch or clone, the client sends a list of
> what it has and what it wants, and the two sides negotiate to come to an
> agreement on what needs to be sent.  This shared understanding includes
> _all_ of the objects necessary for everything the client wants but
> doesn't have, and then those are all sent as part of one pack.
> 
> Parallelization would not help here because the limiting factor is the
> speed of the connection (and in your case, literally the speed of
> reading data off the file system).
> [...]

I don't think that is the case. Git is accessing the remote repository over a mount (a file share), so there is no protocol or negotiation, although I am guessing it is happening virtually with the current Git implementation.

If I understand it correctly, without "packed references", Git will have to access a number of small files on the remote server. Even with packed references, there will probably still be a few small files to access, in addition to some biggish packed-references file.

In the past, on rotational hard disks, issuing many such read requests in parallel wasn't beneficial to performance, because of the disk head seek times. That is, jumping around would thrash the disk instead of increasing performance.

But that is not true anymore with SSDs, and especially with file mounts over a network connection with a high latency. In that scenario, issuing parallel requests (with multiple threads or async I/O) should actually increase performance.

Is my reasoning correct?


Another question: Would it help if I only fetched the 'master' branch? Something like "git fetch origin master". Most of the time, I am only interested in the main branch.

I am guessing that "git fetch" will download all other branches by default, because of this:

[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*

I read the "git fetch" documentation, but I didn't understand whether it will fetch by default everything or just the current branch.

Thanks again,
  rdiez


* Re: git-fetch takes forever on a slow network link. Can parallel mode help?
  2026-03-08 21:08       ` R. Diez
@ 2026-03-08 22:52         ` brian m. carlson
  2026-03-09 21:08           ` R. Diez
  0 siblings, 1 reply; 9+ messages in thread
From: brian m. carlson @ 2026-03-08 22:52 UTC (permalink / raw)
  To: R. Diez; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 5533 bytes --]

On 2026-03-08 at 21:08:41, R. Diez wrote:
> My client computer has an SMB/CIFS connection to the remote file server. That means the client has mounted the file share with "mount.cifs", so in this scenario nothing is happening on the server, as the connection is not HTTPS or SSH. No process will be spawned on the remote server.
> 
> That is the reason why I am getting confused. From my point of view, my client computer is not "uploading" anything when doing a "git pull".
> 
> But I guess Git is designed for all scenarios and will probably not use the correct terminology in my case.

For an initial clone on a local file system, Git may shortcut spawning
an upload-pack helper and simply copy or hard link files, but otherwise,
all fetches require the use of upload-pack.

There are a couple reasons for this.  First, upload-pack is specifically
designed to deal with untrusted data without executing code or honouring
configuration values, which is important for security reasons.  Second,
when you're doing a fetch, Git wants to copy only the necessary objects
and it can only do that with a helper that can read the objects.  Simply
copying every pack and loose object would lead to enormous bloating of
your client repository because you'd end up with several copies of each
object.

> The log does not really say which operation is taking how long. It does not say when the listing of references starts or finishes, which files it is reading and how many bytes it is reading from each file, or whether the files are read sequentially or in parallel.

The log includes timestamps, which allow us to infer that information.

> Thanks for your feedback. I know it is hard to help without the whole log, but I would have to ask for permission to upload a log with file paths, hashes and tag names. Or clean them all manually.

I'm afraid that without more information, it's going to be difficult for
me or anyone else to give you accurate answers about how to improve
this.  The trace data is specifically designed to allow us to
troubleshoot problems and most forges and Git-adjacent projects would
require you to provide a full trace output before even investigating
further.

> OK, but there is no protocol here, Git is accessing the files over the mount.

As mentioned above, there is a protocol because Git always uses one for
fetches.

> I don't think that is the case. Git is accessing the remote repository over a mount (a file share), so there is no protocol or negotiation, although I am guessing it is happening virtually with the current Git implementation.

`git fetch` from a remote repository on a file system spawns an
upload-pack process in the remote repository to handle the transfer.
`git fetch` then speaks to it over standard input and standard output.
So the normal protocol is being used.

> If I understand it correctly, without "packed references", Git will have to access a number of small files on the remote server. Even with packet references, there will probably still be a few small files to access, in addition to some biggish packed references file.

Correct.

> In the past, on rotational hard disks, issuing many such read requests in parallel wasn't beneficial to performance, because of the disk head seek times. That is, jumping around would thrash the disk instead of increasing performance.
> 
> But that is not true anymore with SSDs, and especially with file mounts over a network connection with a high latency. In that scenario, issuing parallel requests (with multiple threads or async I/O) should actually increase performance.

Git, like virtually every other Unix program, is not designed for high
latency file systems.  Yes, in theory it could be faster to issue
multiple requests, but that would increase the need to buffer large
amounts of data in memory, increasing memory usage, and in the general
case, the fact is that the file system is much lower latency and much
faster than the network connection over which data is being sent, so
that's the case that Git optimizes for.

rsync would also perform poorly in your case because it's again
optimized for sending less data over the network than it receives from
the file system.  Similarly with tar over a network pipe.

So it's certainly the case that Git could handle this case better, but
it also optimizes for the common case like virtually every other modern
Unix program.

If you think it might be faster, you could try rsyncing the remote
repository to a separate directory on your local machine and then
fetching from that.  That does require that both directories are
completely quiescent at the moment with no modification at all.

> Another question: Would it help if I only fetched the 'master' branch? Something like "git fetch origin master". Most of the time, I am only interested in the main branch.

That would likely be faster.  You may also want `--no-tags`, which
prevents downloading tags that would point into the main branch.

> I am guessing that "git fetch" will download all other branches by default, because of this:
> 
> [remote "origin"]
> fetch = +refs/heads/*:refs/remotes/origin/*
> 
> I read the "git fetch" documentation, but I didn't understand whether it will fetch by default everything or just the current branch.

A `git fetch origin` with that configuration will fetch every branch and
every tag that points into one of those branches.
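
As a rough model of what that refspec does, here is a simplified sketch
in Python (illustrative only, supporting a single '*'; this is not
Git's actual matching code):

```python
from fnmatch import fnmatchcase
from typing import Optional

def map_refspec(refspec: str, ref: str) -> Optional[str]:
    """Map a remote ref name through a fetch refspec such as
    '+refs/heads/*:refs/remotes/origin/*' (single '*' only)."""
    spec = refspec.lstrip("+")    # a leading '+' just means force-update
    src, dst = spec.split(":")
    if not fnmatchcase(ref, src):
        return None               # refspec does not match this ref
    prefix = src.split("*")[0]    # the fixed part before the wildcard
    return dst.replace("*", ref[len(prefix):])

assert map_refspec("+refs/heads/*:refs/remotes/origin/*",
                   "refs/heads/master") == "refs/remotes/origin/master"
```

So every ref under refs/heads/ on the remote gets a local tracking ref
under refs/remotes/origin/, and anything else (like refs/tags/) is
handled by the separate tag-following machinery.
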
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA



* Re: git-fetch takes forever on a slow network link. Can parallel mode help?
  2026-03-08 22:52         ` brian m. carlson
@ 2026-03-09 21:08           ` R. Diez
  2026-03-10 22:50             ` brian m. carlson
  0 siblings, 1 reply; 9+ messages in thread
From: R. Diez @ 2026-03-09 21:08 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git


First of all, thanks for the information about upload-pack etc.

> [...]
> the fact is that the file system is much lower latency and much
> faster than the network connection over which data is being sent, so
> that's the case that Git optimizes for.

I wouldn't say that reading sequentially is "optimising". It is just the limitation of a simple implementation. Like I said, with modern SSDs, issuing requests in parallel will be faster even on a local filesystem. That would be a real optimisation then.

Some elderly Unix tools like GNU Make realised a long time ago that parallel operation is the way to go. Git itself has realised this too, so it can now work in parallel in certain cases (multiple remote repositories, multiple submodules). So old Unix tools don't count as an excuse!

I think this deficiency should be clearly pointed out. Git doesn't have to be perfect, but I would rather know the limitations upfront. At the very least, that would help me make decisions faster, like investing in some sort of Git server instead of trying to optimise the SMB/CIFS mount.

And who knows, maybe someone will see this post in the future and decide to implement parallel file operations (async I/O) inside upload-pack and the like.


> rsync would also perform poorly in your case because it's again
> optimized for sending less data over the network than it receives from
> the file system.  Similarly with tar over a network pipe.

rsync would probably look at the file dates and sizes and not transfer everything. There are even some parallel rsync variants designed to overcome high network latencies.

But I don't think rsync is worth the effort for me. I'll just wait a while longer every now and then.


There is one more thing I am curious about. Git does not document how it uses SSH (or at least I couldn't find it in the standard end-user documentation). Git cannot launch a process on the target host over SSH unless Git is already installed on the remote system. After all, the local system may have a different architecture (like x86 vs ARM), so you cannot copy a binary across. And I haven't seen any requirement that Git must be installed on the remote host when connecting over SSH; otherwise, I would probably have seen a version compatibility table between client and server somewhere.

So Git must be accessing files over SSH using the standard SSH file transfer operations. I am guessing that the same latency problem will apply here too, because uploads and downloads over SSH will also be sequential. Is my reasoning correct?

Or does Git attempt to find out whether there is a Git on the other side? What happens if there isn't then?


> [...]
> A `git fetch origin` with that configuration will fetch every branch and
> every tag that points into one of those branches.

OK, thanks. It turns out my repository has no branches at all, so that wouldn't help me anyway.

Best regards,
   rdiez



* Re: git-fetch takes forever on a slow network link. Can parallel mode help?
  2026-03-09 21:08           ` R. Diez
@ 2026-03-10 22:50             ` brian m. carlson
  2026-03-11 18:05               ` R. Diez
  0 siblings, 1 reply; 9+ messages in thread
From: brian m. carlson @ 2026-03-10 22:50 UTC (permalink / raw)
  To: R. Diez; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3287 bytes --]

On 2026-03-09 at 21:08:31, R. Diez wrote:
> There is one more thing I am curious about. Git does not document how it uses SSH (or at least I couldn't find it in the standard end-user documentation). Git cannot launch a process on the target host over SSH, unless Git is already installed on the remote system. After all, the local system may have a different architecture (like AMD vs ARM), so you cannot copy a binary across. And I haven't seen the requirement that Git must be installed on the remote host when connecting over SSH. In that case, I would have probably seen somewhere a version compatibility table between client and server.
> 
> So Git must be accessing files over SSH using the standard SSH file transfer operations. I am guessing that the same latency problem will apply here too, because uploads and downloads over SSH will also be sequential. Is my reasoning correct?

Git doesn't use standard SSH file transfer operations.  That would be
much slower and it also works poorly when the remote side doesn't grant
access to a file system, such as with a forge or gitolite.

SSH allows multiple commands to be run over a single connection with
`-oControlMaster`, which can improve performance.  The benefit to
running a single command for each Git operation is that we can do
authentication once at the beginning of each command, whereas if we have
a long-running SSH connection and attempt to do SFTP, we might
interleave requests on different repositories, so each request would
have to perform authentication.  That's not a problem if you're using
Unix permissions to control access, but it scales really poorly when
your Git data is actually spread across many different file servers and
the user is accessing multiple repositories, such as is common on
forges.
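For reference, connection sharing is configured per host in `~/.ssh/config`; a setup along these lines (the host name and timeout below are placeholders, not something this thread prescribes) lets subsequent Git commands reuse an already-authenticated connection instead of paying the handshake cost every time:

```
# ~/.ssh/config -- "fileserver" is a placeholder host name
Host fileserver
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 10m    # keep the master connection alive for 10 minutes
```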

Using SSH file transfer operations also would not work well because you
would effectively have to download every pack file and loose object to
be sure you got the data you need, instead of getting a pack with only a
few objects if that's all you need.
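To make that contrast concrete, here is a local sketch (throwaway paths, branch name forced to "main"; purely illustrative) showing that a fetch transfers a pack containing only the objects the client is missing, rather than copying whole pack files:

```shell
# Throwaway source repository and clone, used only for this demonstration.
src=$(mktemp -d); dst=$(mktemp -d)
git init -q -b main "$src"
git -C "$src" -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -q -m first
git clone -q "$src" "$dst/clone"
git -C "$src" -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -q -m second
git -C "$dst/clone" fetch -q origin       # transfers only the new commit
git -C "$dst/clone" rev-parse origin/main # now matches the new tip in src
```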

However, you can of course mount a remote file system as SFTP with
`sshfs` and use it as a local file system if you actually have a real
file system on the remote side.  `sshfs` does queue requests, so multiple
requests can be in flight over the connection when reading or writing.

> Or does Git attempt to find out whether there is a Git on the other side? What happens if there isn't then?

Git invokes git-upload-pack on the remote side and talks to it over
standard input and output.  If there isn't one, then the operation
fails.

Here's an example:

----
% GIT_TRACE=1 git ls-remote git@github.com:git/git.git
22:41:36.731673 git.c:502               trace: built-in: git ls-remote git@github.com:git/git.git
22:41:36.731937 run-command.c:673       trace: run_command: unset GIT_PREFIX; GIT_PROTOCOL=version=2 ssh -o SendEnv=GIT_PROTOCOL git@github.com 'git-upload-pack '\''git/git.git'\'''
22:41:36.731952 run-command.c:765       trace: start_command: /usr/bin/ssh -o SendEnv=GIT_PROTOCOL git@github.com 'git-upload-pack '\''git/git.git'\'''
----

I don't have any systems without Git on them, so I can't demonstrate the
failure case.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git-fetch takes forever on a slow network link. Can parallel mode help?
  2026-03-10 22:50             ` brian m. carlson
@ 2026-03-11 18:05               ` R. Diez
  0 siblings, 0 replies; 9+ messages in thread
From: R. Diez @ 2026-03-11 18:05 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git


> Git doesn't use standard SSH file transfer operations.
> [...]

OK, thanks for the information.

I have finally done a "git gc" on the server side, and now a "git pull" from the client with no new commits to download takes 4 seconds, a drastic reduction from the 25 seconds it took before.

It turns out I hadn't done a "git gc" on the server for over 2 years, so many new references weren't packed.

Therefore, I think that having to read many small files, versus one packed-refs file, makes a huge difference if you have mounted a remote filesystem over a network with relatively high latency.
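For anyone hitting the same issue, the ref packing can also be reproduced directly; this is just a sketch against a throwaway repository (paths and identity settings below are placeholders):

```shell
# "git pack-refs --all" consolidates loose refs (one small file per ref
# under .git/refs/) into a single .git/packed-refs file; "git gc" does
# this too as part of its housekeeping.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -q -m init
git -C "$repo" tag demo-tag      # creates loose ref .git/refs/tags/demo-tag
git -C "$repo" pack-refs --all   # moves it into .git/packed-refs
grep demo-tag "$repo/.git/packed-refs"
```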

My 1 Mbps connection does not actually have a particularly high latency (around 40 ms measured with ping), but latency seems to have a much greater impact than the low bandwidth, at least with a packed-refs file that weighs only 64 kB.

Best regards,
   rdiez

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-03-11 18:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-06 20:13 git-fetch takes forever on a slow network link. Can parallel mode help? R. Diez
2026-03-06 20:54 ` brian m. carlson
2026-03-07 21:28   ` R. Diez
2026-03-08  1:44     ` brian m. carlson
2026-03-08 21:08       ` R. Diez
2026-03-08 22:52         ` brian m. carlson
2026-03-09 21:08           ` R. Diez
2026-03-10 22:50             ` brian m. carlson
2026-03-11 18:05               ` R. Diez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox