From: Jeff King <peff@peff.net>
To: Martin Fick <mfick@codeaurora.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: Git push performance problems with ~100K refs
Date: Fri, 30 Mar 2012 05:32:08 -0400
Message-ID: <20120330093207.GA12298@sigill.intra.peff.net>
In-Reply-To: <60bff12d-544c-4fbd-b48a-0fdf44efaded@email.android.com>
On Thu, Mar 29, 2012 at 08:43:06PM -0600, Martin Fick wrote:
> >It is trying to minimize the transfer cost. By showing a ref to the
> >sending side, you prove you have chains of commits leading to that
> >commit and the sender knows that it does not have to send objects that
> >are reachable from that ref. One thing you could immediately do is
> >de-dup the 100k refs but we may already do that in the current code.
>
> I'm sorry, I don't quite understand what you are suggesting is taking
> up the CPU time. It doesn't take that much CPU just to gather 100K refs
> and send them to the other side; that would be I/O bound. Could you
> explain what is happening on the receiving side that is so
> time-consuming?

You said earlier that it is "git rev-list --objects --stdin --not --all"
that is taking up all the CPU. That command is probably being run by
check_everything_connected. That is why it is slow when you push even a
small change, but fast when you push only a deletion: in the latter
case, we skip the check because there are no new objects.
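
To make the deletion shortcut concrete, here is a minimal C sketch, assuming a push is modeled as a list of (old, new) object names and that a deletion is signalled by the all-zero object name, as in the push protocol. The struct and function names are hypothetical and are not taken from git's receive-pack; the sketch only shows why a push made up entirely of deletions leaves nothing for the connectivity walk to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one ref update in a push: the ref's old and
 * new object names as 40-character hex strings. */
struct ref_update {
    const char *old_hex;
    const char *new_hex;
};

/* All-zero object name, meaning "no object"; an update whose new
 * value is this name is a ref deletion. */
static const char null_hex[] =
    "0000000000" "0000000000" "0000000000" "0000000000";

static bool is_deletion(const struct ref_update *u)
{
    return strcmp(u->new_hex, null_hex) == 0;
}

/* Returns true if any update points a ref at an object, so there is
 * something for the connectivity walk to check. A push consisting only
 * of deletions returns false, and the expensive walk can be skipped. */
bool needs_connectivity_check(const struct ref_update *updates, size_t n)
{
    assert(updates != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!is_deletion(&updates[i]))
            return true;
    return false;
}
```

A push that deletes one ref and creates another still needs the check, because the created ref brings in an object that has to be verified.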

As for why that rev-list is slow, I suspect quadratic behavior in
commit_list_insert_by_date as we process the set of negative refs.
Basically, we keep a priority queue of commits to be processed in our
graph walk, but the queue is stored as a date-sorted linked list. So
each insertion is O(n), and building a list of n items one insertion at
a time (especially if they are not in sorted order) is O(n^2).
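
A minimal C model of the suspected problem, assuming a commit can be reduced to a date and a next pointer. insert_by_date() is a hypothetical function written in the spirit of commit_list_insert_by_date, not copied from git: it keeps the list sorted newest-first and counts how far each insertion scans. For comparison, sort_by_date() shows one standard alternative, appending entries unsorted and merge-sorting the list once, which needs O(n log n) comparisons rather than O(n^2).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a commit in the walk queue: only its date. */
struct node {
    long date;
    struct node *next;
};

/* Number of queued entries walked past, summed over all insertions. */
static unsigned long steps;

/* Priority queue kept as a linked list sorted newest-first: each insert
 * scans from the head, so it is O(n), and n inserts are O(n^2). */
static void insert_by_date(struct node **head, struct node *item)
{
    struct node **pp = head;

    assert(item != NULL);
    while (*pp && (*pp)->date >= item->date) {
        pp = &(*pp)->next;
        steps++;
    }
    item->next = *pp;
    *pp = item;
}

/* Merge two newest-first lists; on equal dates, entries from 'a' go first. */
static struct node *merge(struct node *a, struct node *b)
{
    struct node dummy = { 0, NULL };
    struct node *tail = &dummy;

    while (a && b) {
        if (a->date >= b->date) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return dummy.next;
}

/* Alternative: append entries unsorted, then sort once with a
 * top-down merge sort on the list: O(n log n) comparisons. */
static struct node *sort_by_date(struct node *list)
{
    struct node *slow, *fast, *right;

    if (!list || !list->next)
        return list;
    /* Find the midpoint: 'fast' moves two nodes per step of 'slow'. */
    slow = list;
    fast = list->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    right = slow->next;
    slow->next = NULL;
    return merge(sort_by_date(list), sort_by_date(right));
}
```

In this model, feeding n commits that each arrive older than everything already queued makes every insertion walk the whole list, n(n-1)/2 steps in total (499,500 for n = 1000).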

I've run into this before when dealing with repos with many refs (at
GitHub, some of our alternates repositories hit 100K refs, although
typically those have a lot of duplicated refs, as we store identical
tags from many repositories).
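
The de-dup suggestion in the quoted text fits this case, since many refs can point at the same object. A minimal C sketch, assuming ref values are available as an array of hex object names: sort them and compact adjacent duplicates, so each distinct tip would be handed to the walk only once. dedup_oids() is a hypothetical helper, not git code, and whether the current code already de-dups is left open in the quoted message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_hex(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcmp(*x, *y);
}

/* Hypothetical helper: sorts 'oids' in place and compacts away
 * duplicates. Returns k, the number of distinct names, which now
 * occupy oids[0..k) in sorted order. */
size_t dedup_oids(const char **oids, size_t n)
{
    size_t k = 1;

    assert(oids != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(oids, n, sizeof(*oids), cmp_hex);
    for (size_t i = 1; i < n; i++)
        if (strcmp(oids[i], oids[k - 1]) != 0)
            oids[k++] = oids[i];
    return k;
}
```

With five names of which three are distinct, dedup_oids() returns 3 and leaves the distinct names in sorted order at the front of the array.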

But that's just a suspicion. I don't have time tonight to work out a
test case. Is it possible for you to run something like:

  # make a new commit on top of HEAD, but not yet referenced
  sha1=$(git commit-tree "HEAD^{tree}" -p HEAD </dev/null)

  # now do the same "connected" test that receive-pack would do
  git rev-list --objects "$sha1" --not --all

That should replicate the slow behavior you are seeing. If it does, try
running the second command under "perf"; my guess is that you will see
commit_list_insert_by_date as a hot spot.

Even for this simple test on a moderate repository (my git.git has
~1100 refs), perf reports that commit_list_insert_by_date accounts for
10% of the CPU time.

-Peff