git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Rast <trast@student.ethz.ch>
To: git@vger.kernel.org
Subject: O(#haves*...) behaviour in "have <sha>" processing in upload-pack
Date: Sat, 13 Sep 2008 02:11:07 +0200	[thread overview]
Message-ID: <200809130211.14091.trast@student.ethz.ch> (raw)

[-- Attachment #1: Type: text/plain, Size: 2899 bytes --]

Hi *

evilchelu (Cristi Balan) pointed out on IRC that 'git fetch' takes a
long time when fetching a history-disjoint repository.

For example,

  mkdir test && cd test
  git init
  git remote add -f paperclip git://github.com/thoughtbot/paperclip.git
  git remote add -f hoptoad git://github.com/thoughtbot/hoptoad_notifier.git
  git remote add -f aasm git://github.com/rubyist/aasm.git
  git remote add -f forgot git://github.com/greenisus/forgot_password.git
  git remote add -f restful git://github.com/technoweenie/restful-authentication.git

(taken straight from evilchelu's example, but it should be the same
with random repositories).

Peeking at the transmission with Wireshark, there is a noticeable
pattern of 4-5s delays before _every_ "0008NAK\n" line sent by the
server.

Looking at upload-pack.c, it seems that the server does far too much
work when processing the "have" lines.  I've only just read into this
area of code, but the rough idea seems to be:

  [404: get_common_commits()]
  for (H = every "have" line) {

      [321: got_sha1()]
      flag H and its parents (shallow!) as THEY_HAVE

      [415: get_common_commits()]
      if (we do not have H in our object store) {

          [367: ok_to_give_up()]
          for (W = every "want" object specified earlier) {

              [338: reachable()]
              walk the entire history to see if anything flagged
              THEY_HAVE so far is reachable from W

          }

          [418: get_common_commits()]
          if the innermost test succeeded for all W: ACK this H so the
          client stops walking history from it
      }
  }

The entire loop seems to have O(h*w*n) (n=history) complexity, which
probably is to blame for the delays.

* Isn't this ok_to_give_up() test moot?  If H is not in our object
  store, it cannot be of any use in the transfer (of our history to
  the client).  So if we are going to fake an ACK to stop the client
  digging on this side of his history, we might as well send it right
  away.  (And what does reaching a THEY_HAVE commit from all W's have
  to do with it?)

* Even assuming it is not, it would save some work if the server
  avoided walking the entire history for every H.  For example, it
  could buffer up all H's until a "0000\n" arrives, which currently
  seems to be 32 haves, then check all of them in a single pass over
  history.  (Unfortunately the 'h' factor cannot be removed completely
  unless we buffer all of them, which again defeats the point.)

Much of the code in question was added in 937a515 (upload-pack: stop
the other side when they have more roots than we do., 2006-07-05).

[I hope I'm making some sense, it's far too late here, but it was
either this or trying to understand it again in the morning.]

- Thomas

-- 
Thomas Rast
trast@student.ethz.ch





[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

             reply	other threads:[~2008-09-13  0:12 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-13  0:11 Thomas Rast [this message]
2008-09-13 12:18 ` O(#haves*...) behaviour in "have <sha>" processing in upload-pack Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200809130211.14091.trast@student.ethz.ch \
    --to=trast@student.ethz.ch \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).