Re: Mercurial 0.4e vs git network pull

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Matt Mackall <mpm@selenic.com>
To: Daniel Barkalow <barkalow@iabervon.org>
Cc: Petr Baudis <pasky@ucw.cz>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	git@vger.kernel.org, mercurial@selenic.com,
	Linus Torvalds <torvalds@osdl.org>
Subject: Re: Mercurial 0.4e vs git network pull
Date: Thu, 12 May 2005 19:44:27 -0700	[thread overview]
Message-ID: <20050513024427.GL5914@waste.org> (raw)
In-Reply-To: <Pine.LNX.4.21.0505122148480.30848-100000@iabervon.org>

On Thu, May 12, 2005 at 10:23:01PM -0400, Daniel Barkalow wrote:
> On Thu, 12 May 2005, Matt Mackall wrote:
> 
> > On Thu, May 12, 2005 at 08:33:56PM -0400, Daniel Barkalow wrote:
> >
> > > Yes, although that also includes pulling the commits, and may be
> > > interleaved with pulling the trees and objects to cover the
> > > latency. (I.e., one round trip gets the new head hash; the second gets
> > > that commit; on the third the tree and the parent(s) can be requested at
> > > once; on the fouth the contents of the tree and the grandparents, at
> > > which point the bandwidth will probably be the limiting factor for the
> > > rest of the operation.)
> > 
> > What if a changeset is smaller than the bandwidth-delay product of
> > your link? As an extreme example, Mercurial is currently at a point
> > where its -entire repo- changegroup (set of all changesets) can be in
> > flight on the wire on a typical link.
> 
> If this is common for the repository in question, then it will be forced
> to wait for the parent to come in, true. If you have a number of merges,
> however, you start using more total bandwidth relative to latency while
> tracking them in parallel.

No, you're missing my point. If you can request all the files in a
changeset in less than a round-trip time, you have a pipeline stall.
Let's say a changeset is 10k and round trip time is 100ms. That means
you'll stall on any pipe with more than 100k/s. You won't know what
changeset to request next as it'll still be in flight.
 
> > > I must be misunderstanding your numbers, because 6 HTTP responses is more
> > > than 1K, ignoring any actual content from the server, and 1K for 800
> > > commits is less than 2 bytes per commit.
> > 
> > 1k of application-level data, sorry. And my whole point is that I
> > don't send those 800 commit identifiers (which are 40 bytes each as
> > hex). I send about 30 or so. It's basically a negotiation to find the
> > earliest commits not known to the client with a minimum of round trips
> > and data exchange.
> 
> Does this rely on the history being entirely linear? I suppose that
> requesting a rev-list from the server (which could have it as a static
> file generated when a new head was pushed) could jumpstart the
> process. The client could request all of the commits it doesn't have in
> rapid succession, and then request trees as the commits started coming
> in. Of course, this would get inefficient if you were, for example,
> pulling a merge with a branch with a long history, since you'd get a ton
> of old mainline (which you already have) interleaved with occasional new
> things.

I don't depend on history being linear (I'm not reinventing CVS here)
and I don't grab a list of all revisions (the point is to be
scalable). In fact, I do something fairly clever, and something I
don't think will work with git, because, yet again, it lacks the
metadata.

> > > I'm also worried about testing on 800 linear commits, since the projects
> > > under consideration tend to have very non-linear histories. 
> > 
> > Not true at all. Dumps from Andrew to Linus via patch bombs will
> > result in runs of hundreds of linear commits on a regular basis.
> > Linear patch series are the preferred way to make changes and series
> > of 30 or 40 small patches are not at all uncommon.
> 
> It has sounded like Andrew had some interest in using git, and a number of
> other developers are using it already. If this becomes still more common,
> it may be the case that, instead of sending patch bombs, Andrew will point
> Linus at authors' original series, in which case the mainline would be
> merges of a hundred linear series of various lengths. I had the
> impression, although I never looked carefully, that this was happening on
> a smaller scale with BK, where work by BK users got included using BK,
> rather than as patches applied out of a bomb.

Andrew already uses git, in a manner much like he used BK. He does a
pull from a repo, generates a patch of that repo vs mainline, and puts
that in -mm. And never passes that stuff on to Linus.

-- 
Mathematics is the supreme nostalgia of our time.

next prev parent reply	other threads:[~2005-05-13  2:44 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-12  9:44 Mercurial 0.4e vs git network pull Matt Mackall
2005-05-12 18:23 ` Petr Baudis
2005-05-12 20:11   ` Matt Mackall
2005-05-12 20:14     ` Petr Baudis
2005-05-12 20:57       ` Matt Mackall
2005-05-12 21:24         ` Daniel Barkalow
2005-05-12 22:29           ` Matt Mackall
2005-05-13  0:33             ` Daniel Barkalow
2005-05-13  1:11               ` Matt Mackall
2005-05-13  2:23                 ` Daniel Barkalow
2005-05-13  2:44                   ` Matt Mackall [this message]
2005-05-13  5:44           ` Petr Baudis
2005-05-15  8:54         ` Petr Baudis
2005-05-15  0:40       ` Christian Kujau
2005-05-15  8:50         ` Petr Baudis
2005-05-15 15:12           ` Christian Kujau
2005-05-15  6:22   ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2005-05-15 11:22 Adam J. Richter
2005-05-15 12:40 ` Petr Baudis
2005-05-16 22:22   ` Tristan Wibberley
2005-05-16 22:22   ` Tristan Wibberley
2005-05-15 17:39 ` Matt Mackall
2005-05-15 18:23   ` Jeff Garzik
2005-05-16  1:12     ` Matt Mackall
2005-05-16  9:29 ` Matthias Urlichs
2005-05-16  9:29   ` Matthias Urlichs
2005-05-15 11:52 Adam J. Richter
2005-05-15 14:23 ` Petr Baudis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050513024427.GL5914@waste.org \
    --to=mpm@selenic.com \
    --cc=barkalow@iabervon.org \
    --cc=git@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mercurial@selenic.com \
    --cc=pasky@ucw.cz \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.