From: Jeff King <peff@peff.net>
To: Martin Fick <mfick@codeaurora.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: Git push performance problems with ~100K refs
Date: Fri, 30 Mar 2012 05:32:08 -0400
Message-ID: <20120330093207.GA12298@sigill.intra.peff.net>
In-Reply-To: <60bff12d-544c-4fbd-b48a-0fdf44efaded@email.android.com>

On Thu, Mar 29, 2012 at 08:43:06PM -0600, Martin Fick wrote:

> >It is trying to minimize the transfer cost.  By showing a ref to the
> >sending side, you prove you have chains of commits leading to that
> >commit and the sender knows that it does not have to send objects
> >that are reachable from that ref. One thing you could immediately do
> >is de-dup the 100k refs but we may already do that in the current code.
> 
> I am sorry, I don't quite understand what you are suggesting is taking
> up the CPU time.  It doesn't take that much CPU just to gather 100K refs
> and send them to the other side, that would be i/o bound.  Could you
> explain what is happening on the receiving side that is so time
> consuming?

You said earlier that it is "git rev-list --objects --stdin --not --all"
taking up all the CPU. That is probably called by
check_everything_connected. And that is why it is slow when you push
even a small change, but fast when you push only a deletion (in the
latter case, we skip the check because there are no new objects).

As for why that rev-list is slow, my suspicion is that it may be
quadratic behavior in commit_list_insert_by_date as we process the set
of negative refs. Basically, we keep a priority queue of commits to be
processed in our graph walk, but the queue is stored as a linked list.
So insertion is O(n), and building a list of n items (especially if they
are not in sorted order) is O(n^2).
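
To illustrate the cost described above, here is a minimal Python sketch
of a date-ordered queue kept as a singly linked list. It is not git's
code: Node, insert_by_date, build_by_insertion and build_then_sort are
hypothetical names invented for this example, and dates are plain
integers. It only counts how many list entries each sorted insertion
walks past, and contrasts that with collecting the items first and
sorting once.

```python
class Node:
    """One entry in a singly linked list kept newest-first by date."""

    def __init__(self, date, next=None):
        self.date = date
        self.next = next


def insert_by_date(head, date):
    """Sorted insert into a newest-first list.

    Returns (new_head, steps), where steps is the number of existing
    entries walked past. This walk is the O(n) part of each insertion.
    """
    steps = 0
    prev, cur = None, head
    # Skip every entry that is at least as new as the one being inserted.
    while cur is not None and cur.date >= date:
        prev, cur = cur, cur.next
        steps += 1
    node = Node(date, cur)
    if prev is None:
        return node, steps
    prev.next = node
    return head, steps


def build_by_insertion(dates):
    """Build the list one sorted insert at a time; return (head, total steps)."""
    head, total = None, 0
    for d in dates:
        head, steps = insert_by_date(head, d)
        total += steps
    return head, total


def build_then_sort(dates):
    """Alternative for comparison: sort once, then link (O(n log n) overall)."""
    head = None
    for d in sorted(dates):  # ascending, so prepending yields newest-first
        head = Node(d, head)
    return head


def to_list(head):
    """Flatten the linked list into a Python list of dates."""
    out = []
    while head is not None:
        out.append(head.date)
        head = head.next
    return out


if __name__ == "__main__":
    n = 1000
    # Worst case for this sketch: every new item is older than all
    # existing ones, so each insert walks the entire list.
    _, worst = build_by_insertion(range(n - 1, -1, -1))
    # Best case: every new item is newer than the current head.
    _, best = build_by_insertion(range(n))
    print(worst, best)  # 499500 0
```

In the worst case the total is n(n-1)/2 steps, which is the quadratic
growth in question; with 100K items that would be roughly 5 billion
steps. Whether real ref sets actually hit this pattern is exactly what
the perf run is meant to confirm.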

I've run into this before dealing with repos with many refs (at GitHub,
some of our alternates repositories hit 100K refs, although typically we
have a lot of duplicated refs, as we are storing identical tags from
many repositories).

But that's just a suspicion. I don't have time tonight to work out a
test case. Is it possible for you to run something like:

  # make a new commit on top of HEAD, but not yet referenced by any ref
  sha1=$(git commit-tree HEAD^{tree} -p HEAD </dev/null)

  # now do the same "connected" test that receive-pack would do
  git rev-list --objects "$sha1" --not --all

That should replicate the slow behavior you are seeing. If that works,
try running the latter command under "perf"; my guess is that you will
see commit_list_insert_by_date as a hot-spot.

Even in this simple test on a moderately sized repository (my git.git
has ~1100 refs), commit_list_insert_by_date accounts for 10% of the CPU
time according to perf.

-Peff
