git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: David Lang <david@lang.hm>
Cc: Koosha Khajehmoogahi <koosha.khajeh@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: Multi-threaded 'git clone'
Date: Mon, 16 Feb 2015 10:47:45 -0500
Message-ID: <20150216154745.GA10120@peff.net>
In-Reply-To: <alpine.DEB.2.02.1502160727480.23770@nftneq.ynat.uz>

On Mon, Feb 16, 2015 at 07:31:33AM -0800, David Lang wrote:

> >Then the server streams the data to the client. It might do some light
> >work transforming the data as it comes off the disk, but most of it is
> >just blitted straight from disk, and the network is the bottleneck.
> 
> Depending on how close to full the WAN link is, it may be possible to
> improve this with multiple connections (again, referencing bbcp), but
> there's also the question of if it's worth trying to use the entire WAN for
> a single user. The vast majority of the time the server is doing more than
> one thing and would rather let any individual user wait a bit and service
> the other users.

Yeah, I have seen clients that make multiple TCP connections to each
request a chunk of a file in parallel. The short answer is that this is
going to be very hard with git. Each clone generates the pack on the fly
based on what's on disk and streams it out. It should _usually_ be the
same, but there's nothing to guarantee byte-for-byte equality between
invocations. So you'd have to multiplex all of the connections into the
same server process. And even then it's hard; that process knows it's
going to send you the bytes for object X, but it doesn't know at
exactly which offset until it gets there, which makes sending things out
of order tricky. And the whole output is checksummed by a single sha1
over the whole stream that comes at the end.
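
To make the checksum point concrete, here is a minimal sketch in Python. It
is not git's code: `finish_pack` and `verify_stream` are hypothetical names,
and the "pack" is just an opaque byte string followed by a 20-byte SHA-1
trailer computed over everything before it. It only illustrates that the
trailer verifies when the bytes are assembled in the order the server
produced them.

```python
import hashlib

SHA1_LEN = 20  # size of a raw SHA-1 digest in bytes


def finish_pack(body: bytes) -> bytes:
    """Append a SHA-1 trailer computed over all preceding bytes (hypothetical helper)."""
    return body + hashlib.sha1(body).digest()


def verify_stream(chunks) -> bool:
    """Reassemble chunks in the order given and check the trailing SHA-1."""
    data = b"".join(chunks)
    body, trailer = data[:-SHA1_LEN], data[-SHA1_LEN:]
    return hashlib.sha1(body).digest() == trailer


# A toy stream: three "objects" followed by the trailer.
stream = finish_pack(b"object-A" + b"object-B" + b"object-C")
chunks = [stream[0:8], stream[8:16], stream[16:]]

in_order_ok = verify_stream(chunks)
# Simulate two connections delivering their chunks in the wrong order.
swapped_ok = verify_stream([chunks[1], chunks[0], chunks[2]])
```

A receiver pulling chunks over several connections would need to know each
chunk's final offset before it could place it, which is exactly the
information the server does not have until it gets there.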

I think the most feasible thing would be to quickly spool it to a server
on the LAN, and then use an existing fetch-in-parallel tool to grab it
from there over the WAN.
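
As a rough sketch of why spooling first helps: once the pack is a static
file, its bytes have fixed offsets, so a parallel fetcher can split it into
ranges and reassemble them in any arrival order. The Python below is an
illustration under that assumption only; `split_ranges`, `fetch_range`, and
`fetch_out_of_order` are made-up names, and an in-memory byte string stands
in for the spooled file and the network.

```python
def split_ranges(size: int, parts: int):
    """Return inclusive (start, end) byte ranges that together cover [0, size)."""
    if size == 0:
        return []
    step = -(-size // parts)  # ceiling division
    return [(start, min(start + step, size) - 1) for start in range(0, size, step)]


def fetch_range(blob: bytes, start: int, end: int) -> bytes:
    """Stand-in for one connection fetching bytes start..end of the spooled file."""
    return blob[start:end + 1]


def fetch_out_of_order(blob: bytes, parts: int) -> bytes:
    """Fetch ranges in reverse order, then reassemble them by offset."""
    ranges = split_ranges(len(blob), parts)
    pieces = {r: fetch_range(blob, *r) for r in reversed(ranges)}
    return b"".join(pieces[r] for r in ranges)


spooled = bytes(range(256)) * 4  # pretend this is the spooled pack file
ranges = split_ranges(len(spooled), 3)
rebuilt = fetch_out_of_order(spooled, 3)
```

This works only because the file does not change between requests, which is
the property a pack generated on the fly does not guarantee.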

-Peff

