git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Martin Fick <mfick@codeaurora.org>, git@vger.kernel.org
Subject: [PATCH 1/3] fetch-pack: avoid quadratic list insertion in mark_complete
Date: Tue, 2 Jul 2013 02:16:23 -0400	[thread overview]
Message-ID: <20130702061623.GA27856@sigill.intra.peff.net> (raw)
In-Reply-To: <20130702061149.GB1206@sigill.intra.peff.net>

We insert the commit pointed to by each ref one-by-one into
the "complete" commit_list using insert_by_date. Because
each insertion is O(n), we end up with O(n^2) behavior.

This typically doesn't matter, because the number of refs is
reasonably small. And even if there are a lot of refs, they
often point to a smaller set of objects (in which case the
optimization in commit ea5f220 keeps our "n" small).

However, in pathological repositories (hundreds of thousands
of refs, each pointing to a unique commit), this quadratic
behavior can make a difference. Since we do not care about
the list order until we have finished building it, we can
simply keep it unsorted during the insertion phase, then
sort it afterwards.

On a repository like the one described above, this dropped
the time to do a no-op fetch from 2.0s to 1.7s. On normal
repositories, it probably does not matter at all, but it
does not hurt to protect ourselves from pathological cases.

Signed-off-by: Jeff King <peff@peff.net>
---
A note on the timings. I measured the times above last year when I wrote
the same patch here:

  http://article.gmane.org/gmane.comp.version-control.git/194939

And earlier tonight, I did a fetch that showed the same result. But when
I tried to replicate it while writing the commit message, I had trouble,
because either:

  1. I was fetching actual commits, in which case the more serious
     problem behavior in find_common kicked in, ruining the measurement.

  2. It was a no-op fetch, in which case quickfetch() kicked in and we
     did not call mark_complete at all.

So I'm rather confused how I managed to get timings earlier (both last
year, and earlier today). But I still think it's an obviously correct
thing to do, and does protect us in case (1) above (actually fetching a
commit) once we fix the problem in find_common.

 fetch-pack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index abe5ffb..4df8abd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -505,7 +505,7 @@ static int mark_complete(const char *refname, const unsigned char *sha1, int fla
 		struct commit *commit = (struct commit *)o;
 		if (!(commit->object.flags & COMPLETE)) {
 			commit->object.flags |= COMPLETE;
-			commit_list_insert_by_date(commit, &complete);
+			commit_list_insert(commit, &complete);
 		}
 	}
 	return 0;
@@ -622,6 +622,7 @@ static int everything_local(struct fetch_pack_args *args,
 	if (!args->depth) {
 		for_each_ref(mark_complete, NULL);
 		for_each_alternate_ref(mark_alternate_complete, NULL);
+		commit_list_sort_by_date(&complete);
 		if (cutoff)
 			mark_recent_complete_commits(args, cutoff);
 	}
-- 
1.8.3.rc2.14.g7eee6b3

  reply	other threads:[~2013-07-02  6:16 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-02  3:02 How to still kill git fetch with too many refs Martin Fick
2013-07-02  4:07 ` Jeff King
2013-07-02  4:41   ` Jeff King
2013-07-02  5:01     ` Jeff King
2013-07-02  5:19       ` Junio C Hamano
2013-07-02  5:28         ` Jeff King
2013-07-02  6:11           ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Jeff King
2013-07-02  6:16             ` Jeff King [this message]
2013-07-02  6:21             ` [PATCH 2/3] commit.c: make compare_commits_by_commit_date global Jeff King
2013-07-02  6:24             ` [PATCH 3/3] fetch-pack: avoid quadratic behavior in rev_list_push Jeff King
2013-07-02  7:52               ` Eric Sunshine
2013-07-02 17:45             ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Martin Fick
2013-07-02 17:52       ` How to still kill git fetch with too many refs Brandon Casey
2013-07-02  9:24 ` Michael Haggerty
2013-07-02 16:58   ` Martin Fick

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130702061623.GA27856@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mfick@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).