git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH 3/5] combine_notes_cat_sort_uniq(): sort and dedup lines all at once
Date: Sun,  4 Nov 2012 08:07:08 +0100	[thread overview]
Message-ID: <1352012830-13591-4-git-send-email-mhagger@alum.mit.edu> (raw)
In-Reply-To: <1352012830-13591-1-git-send-email-mhagger@alum.mit.edu>

Instead of reading lines one by one and insertion-sorting them into a
string_list, read all of the lines, sort them, then remove duplicates.
Aside from being less code, this reduces the complexity from O(N^2) to
O(N lg N) in the total number of lines.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 notes.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/notes.c b/notes.c
index 8652f8f..62f8f6f 100644
--- a/notes.c
+++ b/notes.c
@@ -848,15 +848,16 @@ int combine_notes_ignore(unsigned char *cur_sha1,
 	return 0;
 }
 
-static int string_list_add_note_lines(struct string_list *sort_uniq_list,
+/*
+ * Add the lines from the named object to list, with trailing
+ * newlines removed.
+ */
+static int string_list_add_note_lines(struct string_list *list,
 				      const unsigned char *sha1)
 {
 	char *data;
 	unsigned long len;
 	enum object_type t;
-	struct strbuf buf = STRBUF_INIT;
-	struct strbuf **lines = NULL;
-	int i, list_index;
 
 	if (is_null_sha1(sha1))
 		return 0;
@@ -868,24 +869,14 @@ static int string_list_add_note_lines(struct string_list *sort_uniq_list,
 		return t != OBJ_BLOB || !data;
 	}
 
-	strbuf_attach(&buf, data, len, len + 1);
-	lines = strbuf_split(&buf, '\n');
-
-	for (i = 0; lines[i]; i++) {
-		if (lines[i]->buf[lines[i]->len - 1] == '\n')
-			strbuf_setlen(lines[i], lines[i]->len - 1);
-		if (!lines[i]->len)
-			continue; /* skip empty lines */
-		list_index = string_list_find_insert_index(sort_uniq_list,
-							   lines[i]->buf, 0);
-		if (list_index < 0)
-			continue; /* skip duplicate lines */
-		string_list_insert_at_index(sort_uniq_list, list_index,
-					    lines[i]->buf);
-	}
-
-	strbuf_list_free(lines);
-	strbuf_release(&buf);
+	/*
+	 * If the last line of the file is EOL-terminated, this will
+	 * add an empty string to the list.  But it will be removed
+	 * later, along with any empty strings that came from empty
+	 * lines within the file.
+	 */
+	string_list_split(list, data, '\n', -1);
+	free(data);
 	return 0;
 }
 
@@ -910,6 +901,9 @@ int combine_notes_cat_sort_uniq(unsigned char *cur_sha1,
 		goto out;
 	if (string_list_add_note_lines(&sort_uniq_list, new_sha1))
 		goto out;
+	string_list_remove_empty_items(&sort_uniq_list, 0);
+	sort_string_list(&sort_uniq_list);
+	string_list_remove_duplicates(&sort_uniq_list, 0);
 
 	/* create a new blob object from sort_uniq_list */
 	if (for_each_string_list(&sort_uniq_list,
-- 
1.8.0

  parent reply	other threads:[~2012-11-04  7:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-04  7:07 [PATCH 0/5] Use string_lists when processing notes Michael Haggerty
2012-11-04  7:07 ` [PATCH 1/5] string_list: add a function string_list_remove_empty_items() Michael Haggerty
2012-11-04  7:07 ` [PATCH 2/5] Initialize sort_uniq_list using named constant Michael Haggerty
2012-11-04  7:07 ` Michael Haggerty [this message]
2012-11-04  7:07 ` [PATCH 4/5] notes: fix handling of colon-separated values Michael Haggerty
2012-11-04  7:07 ` [PATCH 5/5] string_list_add_refs_from_colon_sep(): use string_list_split() Michael Haggerty
2012-11-04 11:57   ` Jeff King
     [not found] ` <CALKQrgebzH5vJUQVNxTks0Nq_3OZBWrb-cLDkABxnGJJqfB7gQ@mail.gmail.com>
     [not found]   ` <5098C29A.4010901@alum.mit.edu>
2012-11-06 14:20     ` [PATCH 0/5] Use string_lists when processing notes Johan Herland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1352012830-13591-4-git-send-email-mhagger@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).