Re: Make `git fetch --all` parallel?

From: Jeff King <peff@peff.net>
To: Stefan Beller <sbeller@google.com>
Cc: Junio C Hamano <gitster@pobox.com>, Ram Rachum <ram@rachum.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Make `git fetch --all` parallel?
Date: Tue, 11 Oct 2016 21:34:28 -0400
Message-ID: <20161012013428.swxmrbyxv2wo37xf@sigill.intra.peff.net>
In-Reply-To: <CAGZ79kaKOiy-HJboaujXXc66P6CLupteDw4JyPOGetREfz_q_Q@mail.gmail.com>

On Tue, Oct 11, 2016 at 04:18:15PM -0700, Stefan Beller wrote:

> >> At the very least we would need a similar thing as Jeff recently sent for the
> >> push case with objects quarantined and then made available in one go?
> >
> > I don't think so. The object database is perfectly happy with multiple
> > simultaneous writers, and nothing impacts the have/wants until actual
> > refs are written. Quarantining objects before the refs are written is an
> > orthogonal concept.
> 
> If a remote advertises its tips, we'd need to look these up (client-side) to
> decide if we have them, and I do not think we'd do that via a reachability
> check, but via direct lookup in the object database? So I do not quite
> understand what we gain from the atomic ref writes in e.g. remote/origin/.

It's been a while since I've dug into the fetch protocol. But I think we
cover the "do we have the objects already" check via quickfetch(), which
does do a reachability check. And then we advertise our "have" commits
by walking backwards from our ref tips, so everything there is
reachable.

Anything else would be questionable, especially under older versions of
git, as we promise only to have a complete graph for objects reachable
from the refs. Older versions of git would happily truncate unreachable
history based on the 2-week prune expiration period.

> > I'm not altogether convinced that parallel fetch would be that much
> > faster, though.
> 
> Ok, time to present data... Let's assume a degenerate case first:
> "up-to-date with all remotes" because that is easy to reproduce.
> 
> I have 14 remotes currently:
> 
> $ time git fetch --all
> real 0m18.016s
> user 0m2.027s
> sys 0m1.235s
> 
> $ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs -P 14 -I % git fetch %
> real 0m5.168s
> user 0m2.312s
> sys 0m1.167s

So first, thank you (and Ævar) for providing real numbers. It's clear
that I was talking nonsense.
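
For anyone wanting to repeat the comparison, here is a rough sketch of the
same idea that fetches by remote name instead of by URL. Fetching a bare URL
with no refspec only records the result in FETCH_HEAD, whereas fetching a
named remote also updates its remote-tracking refs. The function name
fetch_all_parallel and the default of 4 jobs are made up for illustration,
and it assumes an xargs that supports -P (GNU and BSD do).

```shell
# Hypothetical helper: run "git fetch <remote>" for every configured
# remote, with up to N fetches in flight at once (default 4).
# Assumes xargs supports the non-POSIX -P option.
fetch_all_parallel() {
    njobs=${1:-4}
    # "git remote" prints one remote name per line; -I % runs one
    # "git fetch" per name, like the quoted command does per URL.
    git remote | xargs -P "$njobs" -I % git fetch --quiet %
}
```

Running it under "time" next to a plain "git fetch --all" in the same
repository should give numbers comparable to the ones quoted above.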

Second, I wonder where all that time is going. Clearly there's an
end-to-end latency issue, but I'm not sure where it is. Is it startup
time for git-fetch? Is it in getting and processing the ref
advertisement from the other side? What I'm wondering is if there are
opportunities to speed up the serial case (but nobody really cared
before because it doesn't matter unless you're doing 14 of them back to
back).
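
One way to start answering that would be to time each remote on its own in
the serial case. A minimal sketch (the function name time_each_remote is
invented here, and it relies on "date +%s", so it only has one-second
resolution):

```shell
# Hypothetical helper: fetch from each configured remote one at a time
# and print "<remote> TAB <seconds>s TAB exit=<status>" for each.
time_each_remote() {
    for remote in $(git remote); do
        start=$(date +%s)
        status=0
        git fetch --quiet "$remote" || status=$?
        end=$(date +%s)
        printf '%s\t%ss\texit=%s\n' "$remote" "$((end - start))" "$status"
    done
}
```

If one remote stands out, rerunning just that fetch with GIT_TRACE_PACKET=1
or GIT_TRACE_PERFORMANCE=1 set in the environment should show whether the
time goes into the ref advertisement or elsewhere.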

> > I usually just do a one-off fetch of their URL in such a case, exactly
> > because I _don't_ want to end up with a bunch of remotes. You can also
> > mark them with skipDefaultUpdate if you only care about them
> > occasionally (so you can "git fetch sbeller" when you care about it, but
> > it doesn't slow down your daily "git fetch").
> 
> And I assume you don't want the remotes because it takes time to fetch and not
> because your disk space is expensive. ;)

That, and it clogs the ref namespace. You can mostly ignore the extra
refs, but they show up in the "git checkout ..." DWIM, for example.
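
For the skipDefaultUpdate route mentioned in the quoted text, the setting is
a per-remote config key. A small sketch, assuming a remote named "sbeller"
as in the example above (the helper name skip_by_default is invented):

```shell
# Hypothetical helper: mark a remote so that "git fetch --all" and
# "git remote update" skip it unless it is named explicitly.
skip_by_default() {
    git config "remote.$1.skipDefaultUpdate" true
}

# Example (assumes a remote called "sbeller" exists):
#   skip_by_default sbeller
#   git fetch --all      # no longer contacts sbeller
#   git fetch sbeller    # still fetches it on demand
```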

-Peff

Thread overview: 12+ messages
2016-10-11 20:12 Make `git fetch --all` parallel? Ram Rachum
2016-10-11 20:53 ` Stefan Beller
2016-10-11 22:37   ` Junio C Hamano
2016-10-11 22:50     ` Stefan Beller
2016-10-11 22:58       ` Junio C Hamano
2016-10-11 22:58       ` Stefan Beller
2016-10-11 22:59       ` Jeff King
2016-10-11 23:16         ` Ævar Arnfjörð Bjarmason
2016-10-11 23:18         ` Stefan Beller
2016-10-12  1:34           ` Jeff King [this message]
2016-10-12  1:52             ` Jeff King
2016-10-12  6:47               ` Stefan Beller
