git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
Cc: Martin Fick <mfick@codeaurora.org>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 3/3] revision: insert unsorted, then sort in prepare_revision_walk()
Date: Mon, 2 Apr 2012 16:14:33 -0400	[thread overview]
Message-ID: <20120402201432.GA26503@sigill.intra.peff.net> (raw)
In-Reply-To: <4F7780F5.3060306@lsrfire.ath.cx>

On Sun, Apr 01, 2012 at 12:11:01AM +0200, René Scharfe wrote:

> Speed up prepare_revision_walk() by adding commits without sorting
> to the commit_list and at the end sort the list in one go.  Thanks
> to mergesort() working behind the scenes, this is a lot faster for
> large numbers of commits than the current insert sort.

I think this is probably a sane thing to do, but I have two slight
misgivings:

  1. Is it worth the complexity of the linked-list mergesort? I was
     planning to just build an array, qsort it, and then put the results
     into a linked list. The patch for that is below for reference.

     It's a lot less code and complexity for the same performance
     (actually, I measured it at 1% faster, but that is probably
     negligible). The downside is that it is not nicely encapsulated in
     commit_list_sort_by_date(). We call the latter from two other
     places; I don't know if they can be fed with enough commits to
     actually benefit from the performance gain or not.

  2. I'm not super happy about fixing this one spot. This quadratic
     behavior comes up in a lot of places, and we're slowly hacking them
     one by one. E.g., this does nothing to help the same case in
     fetch-pack.c:mark_complete[1]. Nor does it help the fact that
     when we follow parents, we will do an O(n) insert_by_date for each
     commit we insert. The latter is largely saved by the locality of
     timestamps (i.e., timestamps of the parents of recently popped
     commits tend to be near the front of the list), as well as the hack
     in fce87ae (Fix quadratic performance in rewrite_one., 2008-07-12).

     So I wonder if in the long term we would benefit from a better data
     structure, which would make these problems just go away. That being
     said, there is a lot of code to be updated with such a change, so
     even if we do want to do that eventually, a quick fix like this is
     probably still a good thing.

-Peff

[1] I fixed the mark_complete thing in ea5f220 (fetch: avoid repeated
    commits in mark_complete, 2011-05-19), but only for exact-duplicate
    commits. The real-world case where it came up was an "alternates"
    repository that held refs for many clones (so we had hundreds or
    thousands of copies of each tag). But on a repository like the one
    we are testing on, I think it would be similarly slow.

---
Here's the qsort-in-array patch, for reference.

diff --git a/revision.c b/revision.c
index b3554ed..22c26d0 100644
--- a/revision.c
+++ b/revision.c
@@ -2062,10 +2062,24 @@ static void set_children(struct rev_info *revs)
 	}
 }
 
+static int commit_compare_by_date(const void *va, const void *vb)
+{
+	const struct commit *a = va;
+	const struct commit *b = vb;
+	if (a->date < b->date)
+		return -1;
+	if (b->date < a->date)
+		return 1;
+	return 0;
+}
+
 int prepare_revision_walk(struct rev_info *revs)
 {
 	int nr = revs->pending.nr;
 	struct object_array_entry *e, *list;
+	struct commit **commits = NULL;
+	int commits_nr = 0, commits_alloc = 0;
+	int i;
 
 	e = list = revs->pending.objects;
 	revs->pending.nr = 0;
@@ -2076,11 +2090,17 @@ int prepare_revision_walk(struct rev_info *revs)
 		if (commit) {
 			if (!(commit->object.flags & SEEN)) {
 				commit->object.flags |= SEEN;
-				commit_list_insert_by_date(commit, &revs->commits);
+				ALLOC_GROW(commits, commits_nr + 1, commits_alloc);
+				commits[commits_nr++] = commit;
 			}
 		}
 		e++;
 	}
+	qsort(commits, commits_nr, sizeof(*commits), commit_compare_by_date);
+	for (i = commits_nr - 1; i >= 0; i--)
+		commit_list_insert(commits[i], &revs->commits);
+	free(commits);
+
 	if (!revs->leak_pending)
 		free(list);
 

  parent reply	other threads:[~2012-04-02 20:15 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-30  0:18 Git push performance problems with ~100K refs Martin Fick
2012-03-30  2:12 ` Junio C Hamano
2012-03-30  2:43   ` Martin Fick
2012-03-30  9:32     ` Jeff King
2012-03-30  9:40       ` Jeff King
2012-03-30 14:22         ` Martin Fick
2012-03-31 22:10         ` [PATCH 1/3] add mergesort() for linked lists René Scharfe
2012-04-05 19:17           ` Junio C Hamano
2012-04-08 20:32             ` René Scharfe
2012-04-09 18:26               ` Junio C Hamano
2012-04-11  6:19           ` Stephen Boyd
2012-04-11 16:44             ` Junio C Hamano
2012-03-31 22:10         ` [PATCH 2/3] commit: use mergesort() in commit_list_sort_by_date() René Scharfe
2012-03-31 22:11         ` [PATCH 3/3] revision: insert unsorted, then sort in prepare_revision_walk() René Scharfe
2012-03-31 22:36           ` Martin Fick
2012-03-31 23:45           ` Junio C Hamano
2012-04-02 16:24           ` Martin Fick
2012-04-02 16:39             ` Shawn Pearce
2012-04-02 16:49               ` Martin Fick
2012-04-02 16:51                 ` Shawn Pearce
2012-04-02 20:37                   ` Jeff King
2012-04-02 20:51                     ` Jeff King
2012-04-02 23:16                     ` Martin Fick
2012-04-03  3:49                     ` Nguyen Thai Ngoc Duy
2012-04-03  5:55                       ` Martin Fick
2012-04-03  6:55                         ` [PATCH 0/3] Commit cache Nguyễn Thái Ngọc Duy
2012-04-03  6:55                         ` [PATCH 1/3] parse_commit_buffer: rename a confusing variable name Nguyễn Thái Ngọc Duy
2012-04-03  6:55                         ` [PATCH 2/3] Add commit cache to help speed up commit traversal Nguyễn Thái Ngọc Duy
2012-04-03  6:55                         ` [PATCH 3/3] Add parse_commit_for_rev() to take advantage of sha1-cache Nguyễn Thái Ngọc Duy
2012-04-05 13:02                       ` [PATCH 3/3] revision: insert unsorted, then sort in prepare_revision_walk() Nguyen Thai Ngoc Duy
2012-04-06 19:21                         ` Shawn Pearce
2012-04-07  4:20                           ` Nguyen Thai Ngoc Duy
2012-04-03  3:44                   ` Nguyen Thai Ngoc Duy
2012-04-02 20:14           ` Jeff King [this message]
2012-04-02 22:54             ` René Scharfe
2012-04-03  8:40               ` Jeff King
2012-04-03  9:19                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120402201432.GA26503@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mfick@codeaurora.org \
    --cc=rene.scharfe@lsrfire.ath.cx \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).