From: Jeff King <peff@peff.net>
To: Martin Fick <mfick@codeaurora.org>
Cc: git@vger.kernel.org
Subject: Re: How to still kill git fetch with too many refs
Date: Tue, 2 Jul 2013 00:07:58 -0400 [thread overview]
Message-ID: <20130702040758.GA7068@sigill.intra.peff.net> (raw)
In-Reply-To: <201307012102.31384.mfick@codeaurora.org>
On Mon, Jul 01, 2013 at 09:02:31PM -0600, Martin Fick wrote:
> A simple synthetic test case with 1M refs all pointing to the same
> sha1 seems to be easily handled by git these days. However, in our
> experience with our internal git repo, we still have performance
> issues related to having too many refs, in our kernel/msm instance we
> have around 400K.
I'm not too surprised. There's O(n^2) behavior in fetch-pack's
mark_complete function, as it adds each of the N ref tips to a
commit_list, sorting by date (so each insertion is O(n)).
I posted two alternative patches in May of 2011. The first simply
avoids adding duplicate objects, which is simple and covers many
real-world cases (e.g., an "alternates" repository which has a bunch of
copies of the same tags, one per fork). The second one switches the
commit_list out for a heap-based priority queue.
We ended up taking the first (as ea5f220), since it was trivial and
obviously correct, but didn't bother with the second since:
1. There had been no real-world reports of it.
2. While in theory a priority queue implementation would be used in
other spots, too, it ended up being a pain to use it, as most of
the callers wanted list-like splicing.
You can see the original here:
http://thread.gmane.org/gmane.comp.version-control.git/174003/focus=174005
Though it probably doesn't apply cleanly anymore. However, I've kept it
rebased over the years at:
git://github.com/peff/git.git jk/fast-commit-list
Junio recently added a priority queue implementation in b4b594a
(prio-queue: priority queue of pointers to structs, 2013-06-06), which
is currently in next. So a modern version of that series would build on
top of that, rather than my priority queue.
And yet another alternative would be to keep the list unsorted during
the mark_complete calls, and then sort it at the end. Like this:
diff --git a/fetch-pack.c b/fetch-pack.c
index abe5ffb..4df8abd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -505,7 +505,7 @@ static int mark_complete(const char *refname, const unsigned char *sha1, int fla
struct commit *commit = (struct commit *)o;
if (!(commit->object.flags & COMPLETE)) {
commit->object.flags |= COMPLETE;
- commit_list_insert_by_date(commit, &complete);
+ commit_list_insert(commit, &complete);
}
}
return 0;
@@ -622,6 +622,7 @@ static int everything_local(struct fetch_pack_args *args,
if (!args->depth) {
for_each_ref(mark_complete, NULL);
for_each_alternate_ref(mark_alternate_complete, NULL);
+ commit_list_sort_by_date(&complete);
if (cutoff)
mark_recent_complete_commits(args, cutoff);
}
(If you're wondering why we didn't do this trivial bit at the time, it
was because back then we did not yet have the René's nice linked-list
mergesort that backs commit_list_sort_by_date).
> The result, a copy of linus' repo with a million unique
> valid refs and a git fetch of a single updated ref taking a
> very long time (55mins and it did not complete yet). Note,
> with 100K refs it completes in about 2m40s. It is likely
> not linear since 2m40s * 10 would be ~26m (but the
> difference could also just be how the data in the sha1s are
> ordered).
That sounds like the O(n^2) problem. My timings back then with 100K refs
were 1-2 minutes. Does the patch above fix it for you?
-Peff
next prev parent reply other threads:[~2013-07-02 4:07 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-02 3:02 How to still kill git fetch with too many refs Martin Fick
2013-07-02 4:07 ` Jeff King [this message]
2013-07-02 4:41 ` Jeff King
2013-07-02 5:01 ` Jeff King
2013-07-02 5:19 ` Junio C Hamano
2013-07-02 5:28 ` Jeff King
2013-07-02 6:11 ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Jeff King
2013-07-02 6:16 ` [PATCH 1/3] fetch-pack: avoid quadratic list insertion in mark_complete Jeff King
2013-07-02 6:21 ` [PATCH 2/3] commit.c: make compare_commits_by_commit_date global Jeff King
2013-07-02 6:24 ` [PATCH 3/3] fetch-pack: avoid quadratic behavior in rev_list_push Jeff King
2013-07-02 7:52 ` Eric Sunshine
2013-07-02 17:45 ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Martin Fick
2013-07-02 17:52 ` How to still kill git fetch with too many refs Brandon Casey
2013-07-02 9:24 ` Michael Haggerty
2013-07-02 16:58 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130702040758.GA7068@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=mfick@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).