From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: "R. Diez" <rdiez-2006@rd10.de>
Cc: git@vger.kernel.org
Subject: Re: git-fetch takes forever on a slow network link. Can parallel mode help?
Date: Sun, 8 Mar 2026 22:52:17 +0000
Message-ID: <aa39obsSbk9R1mqu@fruit.crustytoothpaste.net>
In-Reply-To: <0ebf757b-eab5-424a-a58b-e654b1a2942e@rd10.de>
On 2026-03-08 at 21:08:41, R. Diez wrote:
> My client computer has an SMB/CIFS connection to the remote file server. That means the client has mounted the file share with "mount.cifs", so in this scenario nothing is happening on the server, as the connection is not HTTPS or SSH. No process will be spawned on the remote server.
>
> That is the reason why I am getting confused. From my point of view, my client computer is not "uploading" anything when doing a "git pull".
>
> But I guess Git is designed for all scenarios and will probably not use the correct terminology in my case.
For an initial clone on a local file system, Git may shortcut spawning
an upload-pack helper and simply copy or hard link files, but otherwise,
all fetches require the use of upload-pack.
There are a couple of reasons for this. First, upload-pack is specifically
designed to deal with untrusted data without executing code or honouring
configuration values, which is important for security reasons. Second,
when you're doing a fetch, Git wants to copy only the necessary objects
and it can only do that with a helper that can read the objects. Simply
copying every pack and loose object would lead to enormous bloating of
your client repository because you'd end up with several copies of each
object.
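You can see this for yourself with GIT_TRACE; a minimal sketch, where
/mnt/share/project.git is just a placeholder for your CIFS-mounted
repository:

```shell
# GIT_TRACE shows the commands Git runs behind the scenes; even for a
# plain file-system path you should see upload-pack being spawned.
GIT_TRACE=1 git fetch /mnt/share/project.git 2>&1 | grep upload-pack
```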
> The log does not really say which operation is taking how long. It does not say when the listing of references starts or finishes, which files it is reading and how many bytes it is reading from each file, or whether the files are read sequentially or in parallel.
The log includes timestamps, which allow us to infer that information.
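If you want finer-grained timings, the trace2 facility records each
phase of the fetch with elapsed times; the output path below is just an
example:

```shell
# Write a performance trace of the fetch to a file; each region line
# carries elapsed-time columns, so the slow phases stand out.
GIT_TRACE2_PERF=/tmp/fetch-perf.txt git fetch origin
head /tmp/fetch-perf.txt
```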
> Thanks for your feedback. I know it is hard to help without the whole log, but I would have to ask for permission to upload a log with file paths, hashes and tag names. Or clean them all manually.
I'm afraid that without more information, it's going to be difficult for
me or anyone else to give you accurate answers about how to improve
this. The trace data is specifically designed to allow us to
troubleshoot problems and most forges and Git-adjacent projects would
require you to provide a full trace output before even investigating
further.
> OK, but there is no protocol here, Git is accessing the files over the mount.
As mentioned above, there is a protocol because Git always uses one for
fetches.
> I don't think that is the case. Git is accessing the remote repository over a mount (a file share), so there is no protocol or negotiation, although I am guessing it is happening virtually with the current Git implementation.
`git fetch` from a remote repository on a file system spawns an
upload-pack process in the remote repository to handle the transfer.
`git fetch` then speaks to it over standard input and standard output.
So the normal protocol is being used.
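A minimal way to watch that protocol exchange, again with a placeholder
path:

```shell
# GIT_TRACE_PACKET dumps every pkt-line exchanged with upload-pack, so
# you can watch the ref advertisement and negotiation happen even for a
# fetch from a plain directory path.
GIT_TRACE_PACKET=1 git fetch /mnt/share/project.git 2>&1 | head -n 20
```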
> If I understand it correctly, without "packed references", Git will have to access a number of small files on the remote server. Even with packet references, there will probably still be a few small files to access, in addition to some biggish packed references file.
Correct.
> In the past, on rotational hard disks, issuing many such read requests in parallel wasn't beneficial to performance, because of the disk head seek times. That is, jumping around would thrash the disk instead of increasing performance.
>
> But that is not true anymore with SSDs, and especially with file mounts over a network connection with a high latency. In that scenario, issuing parallel requests (with multiple threads or async I/O) should actually increase performance.
Git, like virtually every other Unix program, is not designed for
high-latency file systems. In theory, issuing multiple requests in
parallel could be faster, but it would also mean buffering large
amounts of data in memory, increasing memory usage. More importantly,
in the general case the file system is much lower latency and much
faster than the network connection over which data is being sent, so
that's the case that Git optimizes for.
rsync would also perform poorly in your case because it's again
optimized for sending less data over the network than it receives from
the file system. Similarly with tar over a network pipe.
So Git could certainly handle this scenario better, but like virtually
every other modern Unix program, it optimizes for the common case.
If you think it might be faster, you could try rsyncing the remote
repository to a separate directory on your local machine and then
fetching from that. That does require that both directories be
completely quiescent during the sync, with no modifications at all.
> Another question: Would it help if I only fetched the 'master' branch? Something like "git fetch origin master". Most of the time, I am only interested in the main branch.
That would likely be faster. You may also want `--no-tags`, which
prevents fetching the tags that point into that branch's history.
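For example (assuming the remote's main branch is literally named
master):

```shell
# Fetch only master from origin, and don't auto-follow tags.
# This updates refs/remotes/origin/master and FETCH_HEAD only.
git fetch --no-tags origin master
```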
> I am guessing that "git fetch" will download all other branches by default, because of this:
>
> [remote "origin"]
> fetch = +refs/heads/*:refs/remotes/origin/*
>
> I read the "git fetch" documentation, but I didn't understand whether it will fetch by default everything or just the current branch.
A `git fetch origin` with that configuration will fetch every branch and
every tag that points into one of those branches.
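If you only ever care about master, you can narrow that configured
refspec so that a plain `git fetch` stays cheap (again assuming the
branch is named master):

```shell
# Replace the wildcard refspec with one that tracks only master, and
# disable automatic tag following for this remote.
git config remote.origin.fetch '+refs/heads/master:refs/remotes/origin/master'
git config remote.origin.tagOpt --no-tags
git fetch origin    # now transfers only master
```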
--
brian m. carlson (they/them)
Toronto, Ontario, CA