Re: git-fetch takes forever on a slow network link. Can parallel mode help?

From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: "R. Diez" <rdiez-2006@rd10.de>
Cc: git@vger.kernel.org
Subject: Re: git-fetch takes forever on a slow network link. Can parallel mode help?
Date: Sun, 8 Mar 2026 01:44:52 +0000
Message-ID: <aazUlMBj_IK41Ss2@fruit.crustytoothpaste.net>
In-Reply-To: <1d6a8eec-20b3-4d6e-83f1-d18b7a3c0145@rd10.de>

On 2026-03-07 at 21:28:10, R. Diez wrote:
> Hallo Brian:

Hey,

> > This performance could be improved with `git pack-refs`
> 
> After looking around, it turns out that the documentation of "git gc" says that "packing refs" is one of the things it already does.
> 
> I'll check when it was the last time I did a "git gc" on the remote bare repository, when I'm there again.

Yes, this is part of a gc.  However, packing refs is much lighter than a
full GC and will therefore be much faster to complete.

> > or by converting to the reftable backend, which will open fewer files.
> 
> The documentation states: "reftable for the reftable format. This format is experimental and its internals are subject to change.". I am not ready to risk it yet on my precious Git repository. 8-)

It will be the default on Git 3.0 and it's in use on major forges.  I
also use it on several of my development repositories.  It's stable and
functional.  I'll try to send a patch to fix that text.

I would definitely recommend at the very least Git 2.51 for this and
ideally the latest stable version, 2.53.  Git has had a lot of work on
this format to improve performance and stability over the past few
releases.

> That didn't help much. Most of the time (23.7 from 24 seconds) is spent in a single child process:
> child_start[0] 'git-upload-pack '\''/home/rdiez/MountPoints/blah/blah'\'''
> 
> The log talks about "upload pack", but I gather this is actually a download operation. It wouldn't be the first confusing item in Git. Or have I got it wrong?

upload-pack refers to what's happening on the server.  If you contact a
Git server over something like HTTPS or SSH, then it will use
git-upload-pack to send data to you (a fetch or clone from your
perspective) or git-receive-pack to receive data from you (a push from
your perspective).

When you perform a local fetch, upload-pack is spawned in the remote
repository to serve data.

> I added "export GIT_TRACE_PACKET=true", and then I got a more useful breakdown:
> 
> This takes around 13 seconds:
> 
>   pkt-line.c:85           packet:  upload-pack< 0000

Is it just that line that takes 13 seconds, or is it the listing of
references altogether that takes 13 seconds?  That particular line
should not take 13 seconds because it's literally just writing and
flushing 4 bytes.

It would be helpful if you could include the entire trace output so we
can see and analyze it ourselves.  It's very hard to analyze data from
the different sections in isolation unless one is intimately familiar
with the protocol.

> I don't know what 0000 means. All other similar "upload-pack" lines have a hash there.

Git uses a pkt-line format where each line or chunk of data is preceded
by the total length of the data (including the length itself) encoded as
four hex characters.  So a single byte of data with the value A plus a
newline would be `0006A\n` (four bytes for the length, plus two bytes of
data).  The special code 0000 is a flush packet and means that the end
of a command or a section has been reached.  That's how Git knows the
advertisement has finished.
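
To make the framing concrete, here is a minimal Python sketch of the
pkt-line rule described above.  The names `pkt_line`, `split_pkt_lines`
and `FLUSH_PKT` are made up for this illustration and are not part of
Git, and the sketch deliberately ignores the other special packets such
as the 0001 delimiter:

```python
FLUSH_PKT = b"0000"  # special packet: end of a command or section


def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a pkt-line: four hex digits of total length, then the data."""
    total = len(payload) + 4  # the length counts its own four bytes
    if total > 0xFFFF:
        raise ValueError("payload does not fit a four-hex-digit length")
    return b"%04x" % total + payload


def split_pkt_lines(stream: bytes):
    """Yield each payload from a run of pkt-lines; None stands for a flush packet."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            yield None  # 0000: flush packet, no payload follows
            pos += 4
        elif length < 4:
            # Other special packets (for example 0001) are outside this sketch.
            raise ValueError("special packet not handled by this sketch")
        else:
            yield stream[pos + 4:pos + length]
            pos += length
```

With these definitions, `pkt_line(b"A\n")` gives `b"0006A\n"`, which
matches the example above, and a reader stops collecting a section when
it sees `None`.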

`GIT_TRACE_PACKET` does not normally print the pkt-line length prefix
unless it's a flush (0000) packet or a delimiter (0001) packet, since it
would just be noise.

> About 2 seconds are spent here:
> 
>  pkt-line.c:85           packet:  upload-pack> [some hash]  HEAD symref-target:refs/heads/master
>  pkt-line.c:85           packet:  upload-pack> [some hash]  refs/heads/master

That's sending references, which is expected.

> 7 seconds are spent with "upload-pack" and "fetch" operations, mainly for single "refs/tags". I'll check whether that improves after the next "git gc" on the server.

Okay, this is helpful.  You probably have the `peel` capability, which
means that when you have a tag, you get a line like this:

    4a76996b9c60ca3f21e644d78e1e5089a06c6fb3 refs/tags/v0.1.0 peeled:b4c993704e90881bec9c217749be813c70ae2bb6

That `peeled` directive tells us what object the tag points to, but it
means that the tag object has to be opened and read, which makes things
much more expensive.  Unfortunately, there's no way to turn that
capability off, since Git doesn't usually have capability control
options for the protocol.
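
As a rough illustration of how such a line breaks down, the following
Python sketch splits an advertised reference into its object ID, name
and trailing `key:value` attributes.  `parse_ref_line` is a hypothetical
helper written for this message, not a Git API, and it assumes the
fields are separated by whitespace as in the trace output:

```python
def parse_ref_line(line: str) -> dict:
    """Split '<oid> <refname> [key:value ...]' into its parts."""
    oid, name, *extras = line.split()
    attrs = {}
    for extra in extras:
        # Attributes such as 'peeled:<oid>' or 'symref-target:<ref>'.
        key, _, value = extra.partition(":")
        attrs[key] = value
    return {"oid": oid, "name": name, "attrs": attrs}


ref = parse_ref_line(
    "4a76996b9c60ca3f21e644d78e1e5089a06c6fb3 refs/tags/v0.1.0 "
    "peeled:b4c993704e90881bec9c217749be813c70ae2bb6"
)
print(ref["attrs"].get("peeled"))
```

A branch line has no `peeled` attribute, so `ref["attrs"].get("peeled")`
is `None` there; only annotated tags need the extra object read.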

_However_, if you pack references with `git pack-refs` or you use
reftable, then Git will store the references both peeled and unpeeled,
so it doesn't need to compute that.  reftable is better because _all_
tags are stored both peeled and unpeeled, whereas in a files-style
repository, newly written references stay unpacked (and therefore
contain no peeling information) until they are packed again.  reftable
is also a binary format, which means that it's smaller than a
packed-refs file, and since your read speed is the limiting factor, that
should make reads faster.
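
For the files backend, the peeled value lives in the `packed-refs` file
as a line starting with `^` directly after the tag it belongs to.  Here
is a minimal Python sketch of reading that layout; `parse_packed_refs`
is a made-up name, the branch hash in the sample is a placeholder, and
the sketch assumes a well-formed file:

```python
def parse_packed_refs(text: str) -> dict:
    """Map each ref name to (oid, peeled_oid or None) from packed-refs text."""
    refs = {}
    last_name = None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# pack-refs with: ...' header
        if line.startswith("^"):
            # Peeled value for the ref on the previous line.
            if last_name is None:
                raise ValueError("peeled line without a preceding ref")
            refs[last_name] = (refs[last_name][0], line[1:])
        else:
            oid, name = line.split(" ", 1)
            refs[name] = (oid, None)
            last_name = name
    return refs


SAMPLE = """\
# pack-refs with: peeled fully-peeled sorted
0123456789abcdef0123456789abcdef01234567 refs/heads/master
4a76996b9c60ca3f21e644d78e1e5089a06c6fb3 refs/tags/v0.1.0
^b4c993704e90881bec9c217749be813c70ae2bb6
"""

for name, (oid, peeled) in parse_packed_refs(SAMPLE).items():
    print(name, oid, peeled)
```

The point is that the peeled object ID is read from the same file as the
tag itself, so no tag object has to be opened to answer the
advertisement.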

> > > However, the git-fetch documentation does not clearly state whether the parallel mode only helps if you have multiple remotes and/or multiple submodules. In my case, I just have a single repository with a single origin and no submodules.
> > 
> > Parallel mode does not help with a single remote.  All the data for a single remote comes in one job.
> 
> Is this due to a simple implementation in Git? Could Git download such "refs/tags" files in parallel?

Git is already downloading them as efficiently as possible.  The
protocol has both sides advertise the references (branches, tags, etc.)
that they have and then, in a fetch or clone, the client sends a list of
what it has and what it wants, and the two sides negotiate to come to an
agreement on what needs to be sent.  This shared understanding includes
_all_ of the objects necessary for everything the client wants but
doesn't have, and then those are all sent as part of one pack.
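
A toy model of that agreement, as a hedged Python sketch: treat history
as a graph and send everything reachable from what the client wants,
minus everything reachable from what it already has.  The graph, the
commit names and the functions `reachable` and `commits_to_send` are
invented for illustration; real negotiation works incrementally and
covers trees and blobs too, not only commits:

```python
def reachable(graph: dict, tips) -> set:
    """Return every commit reachable from tips by following parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, ()))  # parents; unknown commits have none
    return seen


def commits_to_send(graph: dict, wants, haves) -> set:
    """Everything the client wants but does not already have."""
    return reachable(graph, wants) - reachable(graph, haves)


# Hypothetical history: c1 <- c2 <- c3, with a tag commit t1 on top of c2.
GRAPH = {"c1": [], "c2": ["c1"], "c3": ["c2"], "t1": ["c2"]}
print(sorted(commits_to_send(GRAPH, wants=["c3", "t1"], haves=["c1"])))
```

The result is a single set, which is why it ends up in one pack rather
than one transfer per branch or tag.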

Parallelization would not help here because the limiting factor is the
speed of the connection (and in your case, literally the speed of
reading data off the file system).  A different design with
parallelization might work if one had a very fast connection and
deltification and compression were too slow to max out that
connection, but that point is around 50 MB/s in a typical situation, and
it wouldn't matter here because the server component is on the same
file system as well.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
