git.vger.kernel.org archive mirror
From: mhagger@alum.mit.edu
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Drew Northup <drew.northup@maine.edu>,
	Jakub Narebski <jnareb@gmail.com>,
	Heiko Voigt <hvoigt@hvoigt.net>,
	Johan Herland <johan@herland.net>,
	Julian Phillips <julian@quantumfyre.co.uk>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH 07/28] sort_ref_dir(): do not sort if already sorted
Date: Fri, 28 Oct 2011 14:28:20 +0200
Message-ID: <1319804921-17545-8-git-send-email-mhagger@alum.mit.edu>
In-Reply-To: <1319804921-17545-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

Keep track of how many entries in a ref_dir are already sorted.  In
sort_ref_dir(), only call qsort() if the dir contains unsorted
entries.

We could store a binary "sorted" value instead of an integer, but
storing the number of sorted entries leaves the way open for a couple
of possible future optimizations:

* In sort_ref_dir(), sort *only* the unsorted entries, then merge them
  with the sorted entries.  This should be faster if most of the
  entries are already sorted.

* Teach search_ref_dir() to do a binary search of any sorted entries,
  and if unsuccessful do a linear search of any unsorted entries.
  This would avoid the need to sort the list every time that
  search_ref_dir() is called, and (given some intelligence about how
  often to sort) could significantly improve the speed in certain
  hypothetical usage patterns.
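
To make the two ideas above concrete, here is a rough standalone
sketch.  It is not part of this patch and not how refs.c is written:
struct name_dir, name_cmp(), search_name_dir() and sort_name_dir() are
hypothetical stand-ins that hold plain strings instead of ref_entries,
and the sketch skips the de-duplication that sort_ref_dir() performs.
The only assumption is the invariant this patch introduces, namely that
the first "sorted" entries are in order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ref_dir: names[0..sorted) is sorted. */
struct name_dir {
	int nr, sorted;
	const char **names;
};

static int name_cmp(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Second idea: binary-search the sorted prefix and, if that fails,
 * scan the unsorted tail linearly.  Returns the index or -1.
 */
static int search_name_dir(const struct name_dir *dir, const char *name)
{
	int i;

	assert(dir->sorted >= 0 && dir->sorted <= dir->nr);
	if (dir->sorted > 0) {
		const char **hit = bsearch(&name, dir->names, dir->sorted,
					   sizeof(*dir->names), name_cmp);
		if (hit)
			return (int)(hit - dir->names);
	}
	for (i = dir->sorted; i < dir->nr; i++)
		if (!strcmp(dir->names[i], name))
			return i;
	return -1;
}

/*
 * First idea: sort only the unsorted tail, then merge it with the
 * sorted prefix.  Returns 0 on success, -1 if allocation fails.
 */
static int sort_name_dir(struct name_dir *dir)
{
	int i = 0, j = dir->sorted, k = 0;
	const char **merged;

	if (dir->sorted == dir->nr)
		return 0;	/* already sorted; nothing to do */
	qsort(dir->names + dir->sorted, dir->nr - dir->sorted,
	      sizeof(*dir->names), name_cmp);
	merged = malloc(dir->nr * sizeof(*merged));
	if (!merged)
		return -1;
	while (i < dir->sorted && j < dir->nr)
		merged[k++] = strcmp(dir->names[i], dir->names[j]) <= 0
			? dir->names[i++] : dir->names[j++];
	while (i < dir->sorted)
		merged[k++] = dir->names[i++];
	while (j < dir->nr)
		merged[k++] = dir->names[j++];
	memcpy(dir->names, merged, dir->nr * sizeof(*merged));
	free(merged);
	dir->sorted = dir->nr;
	return 0;
}
```

A caller would append new names without touching "sorted", use
search_name_dir() for lookups in the meantime, and call
sort_name_dir() only once the unsorted tail has grown large enough to
make the linear scan expensive.  Where that threshold lies is exactly
the "intelligence about how often to sort" mentioned above and is left
open here.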

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   29 ++++++++++++++++++++++++-----
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 6b4d5d5..8733a08 100644
--- a/refs.c
+++ b/refs.c
@@ -145,6 +145,10 @@ struct ref_value {
 
 struct ref_dir {
 	int nr, alloc;
+
+	/* How many of the entries in this directory are sorted? */
+	int sorted;
+
 	struct ref_entry **entries;
 };
 
@@ -245,7 +249,7 @@ static void clear_ref_dir(struct ref_dir *dir)
 	for (i = 0; i < dir->nr; i++)
 		free_ref_entry(dir->entries[i]);
 	free(dir->entries);
-	dir->nr = dir->alloc = 0;
+	dir->sorted = dir->nr = dir->alloc = 0;
 	dir->entries = NULL;
 }
 
@@ -294,8 +298,9 @@ static struct ref_entry *search_ref_dir(struct ref_dir *dir, const char *refname
 
 	/*
 	 * We need dir to be sorted so that binary search works.
-	 * FIXME: Sorting the array each time is terribly inefficient,
-	 * and has to be changed.
+	 * Calling sort_ref_dir() here is not quite as terribly
+	 * inefficient as it looks, because directories that are
+	 * already sorted are not re-sorted.
 	 */
 	sort_ref_dir(dir);
 
@@ -400,13 +405,27 @@ static int is_dup_ref(const struct ref_entry *ref1, const struct ref_entry *ref2
 	return 1;
 }
 
+/*
+ * Sort the entries in dir and its subdirectories (if they are not
+ * already sorted).
+ */
 static void sort_ref_dir(struct ref_dir *dir)
 {
 	int i, j;
 	struct ref_entry *last = NULL;
 
-	if (!dir->nr)
+	if (dir->sorted == dir->nr) {
+		/*
+		 * This directory is already sorted and de-duped, but
+		 * we still have to sort subdirectories.
+		 */
+		for (i = 0; i < dir->nr; i++) {
+			struct ref_entry *entry = dir->entries[i];
+			if (entry->flag & REF_DIR)
+				sort_ref_dir(&entry->u.subdir);
+		}
 		return;
+	}
 
 	qsort(dir->entries, dir->nr, sizeof(*dir->entries), ref_entry_cmp);
 
@@ -423,7 +442,7 @@ static void sort_ref_dir(struct ref_dir *dir)
 			last = dir->entries[i++] = entry;
 		}
 	}
-	dir->nr = i;
+	dir->sorted = dir->nr = i;
 }
 
 #define DO_FOR_EACH_INCLUDE_BROKEN 01
-- 
1.7.7

Thread overview: 36+ messages
2011-10-28 12:28 [PATCH 00/28] Store references hierarchically in cache mhagger
2011-10-28 12:28 ` [PATCH 01/28] refs.c: reorder definitions more logically mhagger
2011-10-28 12:28 ` [PATCH 02/28] free_ref_entry(): new function mhagger
2011-10-28 12:28 ` [PATCH 03/28] check_refname_component(): return 0 for zero-length components mhagger
2011-10-28 12:28 ` [PATCH 04/28] struct ref_entry: nest the value part in a union mhagger
2011-10-28 12:28 ` [PATCH 05/28] refs.c: rename ref_array -> ref_dir mhagger
2011-10-28 12:28 ` [PATCH 06/28] refs: store references hierarchically mhagger
2011-10-28 12:28 ` mhagger [this message]
2011-10-28 12:28 ` [PATCH 08/28] refs: sort ref_dirs lazily mhagger
2011-10-28 12:28 ` [PATCH 09/28] do_for_each_ref(): only iterate over the subtree that was requested mhagger
2011-10-28 12:28 ` [PATCH 10/28] get_ref_dir(): keep track of the current ref_dir mhagger
2011-10-28 12:28 ` [PATCH 11/28] refs: wrap top-level ref_dirs in ref_entries mhagger
2011-10-28 12:28 ` [PATCH 12/28] get_packed_refs(): return (ref_entry *) instead of (ref_dir *) mhagger
2011-10-28 12:28 ` [PATCH 13/28] get_loose_refs(): " mhagger
2011-10-28 12:28 ` [PATCH 14/28] is_refname_available(): take " mhagger
2011-10-28 12:28 ` [PATCH 15/28] find_ref(): " mhagger
2011-10-28 12:28 ` [PATCH 16/28] read_packed_refs(): " mhagger
2011-10-28 12:28 ` [PATCH 17/28] add_ref(): " mhagger
2011-10-28 12:28 ` [PATCH 18/28] find_containing_direntry(): use " mhagger
2011-10-28 12:28 ` [PATCH 19/28] search_ref_dir(): take " mhagger
2011-10-28 12:28 ` [PATCH 20/28] add_entry(): " mhagger
2011-10-28 12:28 ` [PATCH 21/28] do_for_each_ref_in_dir*(): " mhagger
2011-10-28 12:28 ` [PATCH 22/28] sort_ref_dir(): " mhagger
2011-10-28 12:28 ` [PATCH 23/28] struct ref_dir: store a reference to the enclosing ref_cache mhagger
2011-10-28 12:28 ` [PATCH 24/28] read_loose_refs(): take a (ref_entry *) as argument mhagger
2011-10-28 12:28 ` [PATCH 25/28] refs: read loose references lazily mhagger
2011-10-28 12:28 ` [PATCH 26/28] is_refname_available(): query only possibly-conflicting references mhagger
2011-11-15  5:55   ` [PATCH] Fix "is_refname_available(): query only possibly-conflicting references" mhagger
2011-11-15  7:24     ` Junio C Hamano
2011-11-15 16:19       ` Michael Haggerty
2011-11-15 19:19         ` Junio C Hamano
2011-10-28 12:28 ` [PATCH 27/28] read_packed_refs(): keep track of the directory being worked in mhagger
2011-10-28 12:28 ` [PATCH 28/28] repack_without_ref(): call clear_packed_ref_cache() mhagger
2011-10-28 13:07 ` [PATCH 00/28] Store references hierarchically in cache Ramkumar Ramachandra
2011-10-28 18:45   ` Michael Haggerty
2011-11-16 12:51 ` [PATCH 00/28] Store references hierarchically in cache -- benchmark results Michael Haggerty