git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Really slow 'git gc'
Date: Thu, 19 Feb 2009 13:34:32 -0800
Message-ID: <7vk57mkt5j.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <7vr61uku2f.fsf@gitster.siamese.dyndns.org> (Junio C. Hamano's message of "Thu, 19 Feb 2009 13:14:48 -0800")

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> The real reason _seems_ to be the "--unpacked=pack-....pack" arguments. I 
>> literally had 232 pack-files, and it looks like a lot of the time was 
>> spent in that silly loop over 'ignore_packed' in find_pack_entry(), when 
>> revision.c does that "has_sha1_pack()" thing. You get a O(n**2) effect in 
>> number of pack-files: for each commit we look over every pack-file, and 
>> for every pack-file we look at, we look over each ignore_pack entry.
>
> I think we can add a single bit to "struct packed_git" and in the middle
> of setup_revisions() perform the O(N**2) once, so that find_pack_entry()
> can check the bit without looping.

Roughly like this, although we probably should change the API, because most
of the callers pass NULL as ignore_packed.  We may also need an API to say
"I am done with ignoring, please clear the pack_ignore bits from all of them".
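
The "clear the bits" call mentioned above could look roughly like the
following.  This is only a self-contained sketch, not part of the patch:
"struct pack" is a cut-down stand-in for "struct packed_git", and
pack_name_matches(), mark_ignored(), clear_ignored(), count_searchable()
and ignore_bits_demo() are hypothetical names made up for illustration.
The exact-name comparison is an assumption; the real matches_pack_name()
is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for git's "struct packed_git" (assumption). */
struct pack {
	struct pack *next;
	unsigned pack_ignore:1;
	const char *pack_name;
};

/* Hypothetical exact-name match used only by this sketch. */
int pack_name_matches(const struct pack *p, const char *name)
{
	return !strcmp(p->pack_name, name);
}

/* Same shape as mark_ignore_packed(): the nested loop runs once. */
void mark_ignored(struct pack *packs, const char **ignore_packed)
{
	struct pack *p;

	if (!ignore_packed)
		return;
	for (p = packs; p; p = p->next) {
		const char **ig;
		for (ig = ignore_packed; *ig; ig++)
			if (pack_name_matches(p, *ig)) {
				p->pack_ignore = 1;
				break;
			}
	}
}

/* Hypothetical "I am done with ignoring" call: reset every bit. */
void clear_ignored(struct pack *packs)
{
	struct pack *p;

	for (p = packs; p; p = p->next)
		p->pack_ignore = 0;
}

/* Count the packs a lookup would still consult: one bit test per pack. */
int count_searchable(const struct pack *packs)
{
	int n = 0;

	for (; packs; packs = packs->next)
		if (!packs->pack_ignore)
			n++;
	return n;
}

/* Walk-through: mark one pack, check, clear, check again. */
void ignore_bits_demo(void)
{
	struct pack c = { NULL, 0, "pack-c.pack" };
	struct pack b = { &c, 0, "pack-b.pack" };
	struct pack a = { &b, 0, "pack-a.pack" };
	const char *ignore[] = { "pack-b.pack", NULL };

	mark_ignored(&a, ignore);
	assert(count_searchable(&a) == 2);
	clear_ignored(&a);
	assert(count_searchable(&a) == 3);
}
```

Without such a reset, a bit set by one traversal would stay set for any
later lookup in the same process that passes a different ignore list.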

---

 cache.h     |    4 +++-
 revision.c  |    2 ++
 sha1_file.c |   31 +++++++++++++++++++++++--------
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/cache.h b/cache.h
index 37dfb1c..7e8c008 100644
--- a/cache.h
+++ b/cache.h
@@ -759,7 +759,8 @@ extern struct packed_git {
 	time_t mtime;
 	int pack_fd;
 	unsigned pack_local:1,
-		 pack_keep:1;
+		 pack_keep:1,
+		 pack_ignore:1;
 	unsigned char sha1[20];
 	/* something like ".git/objects/pack/xxxxx.pack" */
 	char pack_name[FLEX_ARRAY]; /* more */
@@ -817,6 +818,7 @@ extern struct packed_git *parse_pack_index(unsigned char *sha1);
 extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
+extern void mark_ignore_packed(const char **);
 
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 					 struct packed_git *packs);
diff --git a/revision.c b/revision.c
index 286e416..86f80da 100644
--- a/revision.c
+++ b/revision.c
@@ -1342,6 +1342,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 		object = get_reference(revs, revs->def, sha1, 0);
 		add_pending_object_with_mode(revs, object, revs->def, mode);
 	}
+	if (revs->ignore_packed)
+		mark_ignore_packed(revs->ignore_packed);
 
 	/* Did the user ask for any diff output? Run the diff! */
 	if (revs->diffopt.output_format & ~DIFF_FORMAT_NO_OUTPUT)
diff --git a/sha1_file.c b/sha1_file.c
index 5b6e0f6..4a804c7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1937,6 +1937,27 @@ int matches_pack_name(struct packed_git *p, const char *name)
 	return 0;
 }
 
+void mark_ignore_packed(const char **ignore_packed)
+{
+	struct packed_git *p;
+
+	if (!ignore_packed || !*ignore_packed)
+		return;
+
+	prepare_packed_git();
+	if (!packed_git)
+		return;
+
+	for (p = packed_git; p; p = p->next) {
+		const char **ig;
+		for (ig = ignore_packed; *ig; ig++)
+			if (matches_pack_name(p, *ig)) {
+				p->pack_ignore = 1;
+				break;
+			}
+	}
+}
+
 static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e, const char **ignore_packed)
 {
 	static struct packed_git *last_found = (void *)1;
@@ -1949,14 +1970,8 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e, cons
 	p = (last_found == (void *)1) ? packed_git : last_found;
 
 	do {
-		if (ignore_packed) {
-			const char **ig;
-			for (ig = ignore_packed; *ig; ig++)
-				if (matches_pack_name(p, *ig))
-					break;
-			if (*ig)
-				goto next;
-		}
+		if (ignore_packed && p->pack_ignore)
+			goto next;
 
 		if (p->num_bad_objects) {
 			unsigned i;
