From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Martin Fick <mfick@codeaurora.org>, git@vger.kernel.org
Subject: [PATCH 1/3] fetch-pack: avoid quadratic list insertion in mark_complete
Date: Tue, 2 Jul 2013 02:16:23 -0400 [thread overview]
Message-ID: <20130702061623.GA27856@sigill.intra.peff.net> (raw)
In-Reply-To: <20130702061149.GB1206@sigill.intra.peff.net>
We insert the commit pointed to by each ref one-by-one into
the "complete" commit_list using insert_by_date. Because
each insertion is O(n), we end up with O(n^2) behavior.
This typically doesn't matter, because the number of refs is
reasonably small. And even if there are a lot of refs, they
often point to a smaller set of objects (in which case the
optimization in commit ea5f220 keeps our "n" small).
However, in pathological repositories (hundreds of thousands
of refs, each pointing to a unique commit), this quadratic
behavior can make a difference. Since we do not care about
the list order until we have finished building it, we can
simply keep it unsorted during the insertion phase, then
sort it afterwards.
On a repository like the one described above, this dropped
the time to do a no-op fetch from 2.0s to 1.7s. On normal
repositories, it probably does not matter at all, but it
does not hurt to protect ourselves from pathological cases.
Signed-off-by: Jeff King <peff@peff.net>
---
A note on the timings. I measured the times above last year when I wrote
the same patch here:
http://article.gmane.org/gmane.comp.version-control.git/194939
And earlier tonight, I did a fetch that showed the same result. But when
I tried to replicate it while writing the commit message, I had trouble,
because either:
1. I was fetching actual commits, in which case the more serious
problem behavior in find_common kicked in, ruining the measurement.
2. It was a no-op fetch, in which case quickfetch() kicked in and we
did not call mark_complete at all.
So I'm rather confused how I managed to get timings earlier (both last
year, and earlier today). But I still think it's an obviously correct
thing to do, and does protect us in case (1) above (actually fetching a
commit) once we fix the problem in find_common.
fetch-pack.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index abe5ffb..4df8abd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -505,7 +505,7 @@ static int mark_complete(const char *refname, const unsigned char *sha1, int fla
struct commit *commit = (struct commit *)o;
if (!(commit->object.flags & COMPLETE)) {
commit->object.flags |= COMPLETE;
- commit_list_insert_by_date(commit, &complete);
+ commit_list_insert(commit, &complete);
}
}
return 0;
@@ -622,6 +622,7 @@ static int everything_local(struct fetch_pack_args *args,
if (!args->depth) {
for_each_ref(mark_complete, NULL);
for_each_alternate_ref(mark_alternate_complete, NULL);
+ commit_list_sort_by_date(&complete);
if (cutoff)
mark_recent_complete_commits(args, cutoff);
}
--
1.8.3.rc2.14.g7eee6b3
next prev parent reply other threads:[~2013-07-02 6:16 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-02 3:02 How to still kill git fetch with too many refs Martin Fick
2013-07-02 4:07 ` Jeff King
2013-07-02 4:41 ` Jeff King
2013-07-02 5:01 ` Jeff King
2013-07-02 5:19 ` Junio C Hamano
2013-07-02 5:28 ` Jeff King
2013-07-02 6:11 ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Jeff King
2013-07-02 6:16 ` Jeff King [this message]
2013-07-02 6:21 ` [PATCH 2/3] commit.c: make compare_commits_by_commit_date global Jeff King
2013-07-02 6:24 ` [PATCH 3/3] fetch-pack: avoid quadratic behavior in rev_list_push Jeff King
2013-07-02 7:52 ` Eric Sunshine
2013-07-02 17:45 ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Martin Fick
2013-07-02 17:52 ` How to still kill git fetch with too many refs Brandon Casey
2013-07-02 9:24 ` Michael Haggerty
2013-07-02 16:58 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130702061623.GA27856@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mfick@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).