From: Junio C Hamano <gitster@pobox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Really slow 'git gc'
Date: Thu, 19 Feb 2009 13:34:32 -0800
Message-ID: <7vk57mkt5j.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <7vr61uku2f.fsf@gitster.siamese.dyndns.org> (Junio C. Hamano's message of "Thu, 19 Feb 2009 13:14:48 -0800")

Junio C Hamano <gitster@pobox.com> writes:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> The real reason _seems_ to be the "--unpacked=pack-....pack" arguments. I
>> literally had 232 pack-files, and it looks like a lot of the time was
>> spent in that silly loop over 'ignore_packed' in find_pack_entry(), when
>> revision.c does that "has_sha1_pack()" thing. You get an O(n**2) effect in
>> the number of pack-files: for each commit we look over every pack-file, and
>> for every pack-file we look at, we look over each ignore_packed entry.
>
> I think we can add a single bit to "struct packed_git" and in the middle
> of setup_revisions() perform the O(N**2) once, so that find_pack_entry()
> can check the bit without looping.

Roughly like this, although we should probably change the API, because most
of the callers pass NULL to it. We may also need an API to say "I am done
with ignoring; please clear the pack_ignore bits from all of them".
---
cache.h | 4 +++-
revision.c | 2 ++
sha1_file.c | 31 +++++++++++++++++++++++--------
3 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/cache.h b/cache.h
index 37dfb1c..7e8c008 100644
--- a/cache.h
+++ b/cache.h
@@ -759,7 +759,8 @@ extern struct packed_git {
 	time_t mtime;
 	int pack_fd;
 	unsigned pack_local:1,
-		 pack_keep:1;
+		 pack_keep:1,
+		 pack_ignore:1;
 	unsigned char sha1[20];
 	/* something like ".git/objects/pack/xxxxx.pack" */
 	char pack_name[FLEX_ARRAY]; /* more */
@@ -817,6 +818,7 @@ extern struct packed_git *parse_pack_index(unsigned char *sha1);
 extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
+extern void mark_ignore_packed(const char **);
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 					 struct packed_git *packs);
diff --git a/revision.c b/revision.c
index 286e416..86f80da 100644
--- a/revision.c
+++ b/revision.c
@@ -1342,6 +1342,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 		object = get_reference(revs, revs->def, sha1, 0);
 		add_pending_object_with_mode(revs, object, revs->def, mode);
 	}
+	if (revs->ignore_packed)
+		mark_ignore_packed(revs->ignore_packed);
 	/* Did the user ask for any diff output? Run the diff! */
 	if (revs->diffopt.output_format & ~DIFF_FORMAT_NO_OUTPUT)
diff --git a/sha1_file.c b/sha1_file.c
index 5b6e0f6..4a804c7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1937,6 +1937,27 @@ int matches_pack_name(struct packed_git *p, const char *name)
 	return 0;
 }
+void mark_ignore_packed(const char **ignore_packed)
+{
+	struct packed_git *p;
+
+	if (!ignore_packed || !*ignore_packed)
+		return;
+
+	prepare_packed_git();
+	if (!packed_git)
+		return;
+
+	for (p = packed_git; p; p = p->next) {
+		const char **ig;
+		for (ig = ignore_packed; *ig; ig++)
+			if (matches_pack_name(p, *ig)) {
+				p->pack_ignore = 1;
+				break;
+			}
+	}
+}
+
 static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e, const char **ignore_packed)
 {
 	static struct packed_git *last_found = (void *)1;
@@ -1949,14 +1970,8 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e, cons
 	p = (last_found == (void *)1) ? packed_git : last_found;
 	do {
-		if (ignore_packed) {
-			const char **ig;
-			for (ig = ignore_packed; *ig; ig++)
-				if (matches_pack_name(p, *ig))
-					break;
-			if (*ig)
-				goto next;
-		}
+		if (ignore_packed && p->pack_ignore)
+			goto next;
 		if (p->num_bad_objects) {
 			unsigned i;
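
The idea discussed here (pay the O(N**2) name matching once, then test a
single bit per pack on every lookup) can be sketched outside git as
follows. This is an illustrative sketch only: struct pack, mark_ignored(),
find_object() and clear_ignored() are hypothetical stand-ins for struct
packed_git, mark_ignore_packed() and find_pack_entry(); objects are plain
ints kept in unsorted arrays; and pack names are compared with strcmp()
rather than git's matches_pack_name(), which also accepts a match on the
final path component. Assuming all 232 packs from Linus's report were
named on the ignore list, the old loop could make up to 232 x 232 = 53,824
name comparisons for a lookup that walks every pack; with the bit, each
such lookup makes at most 232 bit tests, and the up-to-53,824 comparisons
are paid only once, in the marking pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for git's struct packed_git. */
struct pack {
	const char *name;     /* e.g. "pack-a.pack" */
	const int *objects;   /* object ids stored in this pack (unsorted) */
	size_t nr_objects;
	unsigned ignore:1;    /* precomputed "pretend this pack is absent" bit */
	struct pack *next;
};

/*
 * Done once per traversal: O(packs * ignore entries) string compares.
 * Resets every bit first, so calling it again with a new list is safe.
 * 'ignore' is a NULL-terminated array of names, or NULL for "none".
 */
static void mark_ignored(struct pack *packs, const char **ignore)
{
	struct pack *p;
	for (p = packs; p; p = p->next) {
		const char **ig;
		p->ignore = 0;
		for (ig = ignore; ig && *ig; ig++)
			if (!strcmp(p->name, *ig)) {
				p->ignore = 1;
				break;
			}
	}
}

/* Per lookup: one bit test per pack, no string comparisons. */
static const struct pack *find_object(const struct pack *packs, int id)
{
	const struct pack *p;
	for (p = packs; p; p = p->next) {
		size_t i;
		if (p->ignore)
			continue;
		for (i = 0; i < p->nr_objects; i++)
			if (p->objects[i] == id)
				return p;
	}
	return NULL;
}

/* The "I am done with ignoring" operation: clear every bit. */
static void clear_ignored(struct pack *packs)
{
	for (; packs; packs = packs->next)
		packs->ignore = 0;
}
```

In this sketch mark_ignored() would be called once before the traversal,
much as the patch calls mark_ignore_packed() from setup_revisions(), and
clear_ignored() plays the role of the clean-up API the message suggests
may be needed.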