[RFC PATCH v3 0/2] fetch-pack: log(n)-transmission find_common()

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Thomas Rast <trast@student.ethz.ch>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Nicolas Pitre <nico@cam.org>,
	Nanako Shiraishi <nanako3@lavabit.com>
Subject: [RFC PATCH v3 0/2] fetch-pack: log(n)-transmission find_common()
Date: Sun,  7 Dec 2008 00:20:19 +0100	[thread overview]
Message-ID: <1228605621-29685-1-git-send-email-trast@student.ethz.ch> (raw)
In-Reply-To: <alpine.LFD.2.00.0810281034500.13034@xanadu.home>

If you still know what this was about, you can skip ahead to the "=="
marker below.

The mail that started it all was

  http://kerneltrap.org/mailarchive/git/2008/9/17/3328364

  [Abstract:]
  Using two passes, first exponential stride through history, then
  bisection on the "right" windows, we implement the find_common
  handshake in log(history_size) many transmissions.  The followup
  patch partially achieves that, but a few points are open and strict
  backwards compatibility is missing (but its absence is Mostly
  Harmless).

This spawned out of a problem originally reported on IRC where pulling
several disjoint histories together takes ages (due to the load on the
server caused by ok_to_give_up()).  Apparently this is a common
workflow in the Ruby-on-Rails.  The bound on the number of
transmissions holds in other cases too, of course.

The idea attracted some interest, but to my knowledge nobody actually
reviewed the code, and it eventually died because of Nicholas's
finding:

Nicolas Pitre wrote:
> FWIW, I had to back this patch out from my version as things seemed to 
> fall into an infinite loop of ref negotiation while fetching the Linux 
> kernel repository at some point.  Doing a "git fetch -v -v" turned up an 
> endless stream of "got" and "have" lines.  I was in a hurry for $work so 
> didn't think of preserving my local refs for reproduction of the 
> problem.

Unfortunately, I was never able to reliably reproduce this problem.
(IIRC I did once when running an automated hammering script but it
vanished when I tried again manually.)

==

After a long break, I decided to pick this up again.  I'm not sure
this is the best time to do so, but I had the time to spare, and
(unless there are more serious bugs, and Gods and Junio willing) we
might get it into 'next' sometime early in the next cycle for better
testing.

I rewrote the core of the algorithm, though some helpers and most of
the glue to surrounding routines are still the same.  I took the
following design decisions that weren't in v2:

0. Lots of comments.  Well, by my standards in any case.

1. The old code stays.  I mostly want to make it (far) easier to be
   certain that it behaves exactly as before if the server does not
   support disabling ok_to_give_up().  [The idea is that it should be
   easy to verify that 1/2 is a no-change refactoring and 2/2 only
   does something if the server supports it.]

2. The work list now uses a Fibonacci heap to order jobs.  I'm not
   religious about the specific flavour of heap, but I had the
   description of these handy.  (The work list can get rather large.)

3. The bisection is essentially precomputed.  By staring at the binary
   representation of the distance from the starting commit hard
   enough, it can be constructed during the stride pass.  Barring a
   circular bisection "tree", the algorithm is now guaranteed to
   terminate even if it, say, emits too many or too few sha1s due to
   some other bug.

The last point of course means that the main work is now in the
tangled mess that is get_rev_new_stride(), and "staring hard enough"
frequently enough meant grabbing a notepad.  YMMV.

While I did run a lot of tests, including 'make test' and some
automated hammering, it seems quite hard to make sure it _really_
bisects correctly (except manually in toy examples, such as two linear
histories with a common base).  Suggestions for automated tests
welcome.

To see the effects of the patch, try

  git fetch-pack -v -k <url> <head> 2>&1 \
    | git name-rev --stdin refs/heads/master

with a daemon that supports the feature.

next prev parent reply	other threads:[~2008-12-06 23:21 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-17 23:00 [RFC] log(n)-transmissions common commit handshake Thomas Rast
2008-09-17 23:01 ` [RFC PATCH] fetch-pack: log(n)-transmission find_common() Thomas Rast
2008-09-24  0:48   ` [RFC PATCH v2] " Thomas Rast
2008-10-23 19:38     ` Thomas Rast
2008-10-24 15:49       ` Nicolas Pitre
2008-10-27 10:29       ` Nanako Shiraishi
2008-10-28  3:24         ` Junio C Hamano
2008-10-28 14:46           ` Nicolas Pitre
2008-10-28 19:37             ` Thomas Rast
2008-12-06 23:20             ` Thomas Rast [this message]
2008-12-06 23:20               ` [RFC PATCH v3 1/2] fetch-pack: rearrange main loop Thomas Rast
2008-12-06 23:20                 ` [RFC PATCH v3 2/2] fetch-pack: log(n)-transmission find_common() Thomas Rast
2008-09-18  8:18 ` [RFC] log(n)-transmissions common commit handshake Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1228605621-29685-1-git-send-email-trast@student.ethz.ch \
    --to=trast@student.ethz.ch \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=nanako3@lavabit.com \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).