From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>,
	Davide Libenzi <davidel@xmailserver.org>
Cc: Daniel Berlin <dberlin@dberlin.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: git annotate runs out of memory
Date: Tue, 11 Dec 2007 16:02:45 -0800 (PST)
Message-ID: <alpine.LFD.0.9999.0712111548200.25032@woody.linux-foundation.org>
In-Reply-To: <alpine.LFD.0.9999.0712111523210.25032@woody.linux-foundation.org>

On Tue, 11 Dec 2007, Linus Torvalds wrote:
>
> and while I suspect xdiff could be optimized a bit more for the cases
> where we have no changes at the end, that's beyond my skills.

Ok, I lied.

Nothing is beyond my skills. My mad k0der skillz are unbeatable.

This speeds up git-blame on ChangeLog-style files by a big amount, by just
ignoring the common end that we don't care about, since we don't want any
context anyway at that point. So I now get:

	[torvalds@woody gcc]$ time git blame gcc/ChangeLog > /dev/null

	real	0m7.031s
	user	0m6.852s
	sys	0m0.180s

which seems quite reasonable, and is about three times faster than trying
to diff those big files.

Davide: this really _does_ make a huge difference. Maybe xdiff itself
should do this optimization on its own, rather than have the caller hack
around the fact that xdiff doesn't handle this common case all that well?

The same thing obviously works for the beginning-of-file too, but then you
have to play games with line numbers being affected etc, so the end is the
rather much easier case and is the case that a ChangeLog-style file cares
about.
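
To make the "games with line numbers" concrete, here is a minimal sketch of what a beginning-of-file variant might look like. It is not part of the patch and not anything in git or xdiff: `struct buf` and `trim_common_head` are made-up names, and the sketch assumes the caller is willing to add the returned line count back onto every line number the diff reports for the trimmed buffers.

```c
#include <assert.h>

/* Hypothetical stand-in for xdiff's mmfile_t: just a pointer and a length. */
struct buf {
	const char *ptr;
	long size;
};

/*
 * Drop the complete lines that both buffers share at their start.
 * Returns how many lines were dropped; a caller would have to add
 * that to the line numbers of any diff made on the trimmed buffers.
 */
static long trim_common_head(struct buf *a, struct buf *b)
{
	long limit, i = 0, line_start = 0, lines = 0;

	assert(a && b);
	limit = a->size < b->size ? a->size : b->size;

	/* Walk the common prefix, remembering the last line boundary. */
	while (i < limit && a->ptr[i] == b->ptr[i]) {
		if (a->ptr[i] == '\n') {
			lines++;
			line_start = i + 1;
		}
		i++;
	}

	/* Only whole lines are skipped; a partly matching line stays. */
	a->ptr += line_start;
	a->size -= line_start;
	b->ptr += line_start;
	b->size -= line_start;
	return lines;
}
```

The end-of-file case needs no such bookkeeping, because dropping trailing data does not renumber anything that comes before it.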

Daniel, this is obviously on top of the patches that fix the memory leak.

		Linus
---
diff --git a/builtin-blame.c b/builtin-blame.c
index c158d31..677188c 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -543,6 +551,20 @@ static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
 	return state.ret;
 }
 
+#define BLOCK 1024
+
+static void truncate_common_data(mmfile_t *a, mmfile_t *b)
+{
+	long l1 = a->size, l2 = b->size;
+
+	while ((l1 -= BLOCK) > 0 && (l2 -= BLOCK) > 0) {
+		if (memcmp(a->ptr + l1, b->ptr + l2, BLOCK))
+			break;
+		a->size = l1;
+		b->size = l2;
+	}
+}
+
 /*
  * Run diff between two origins and grab the patch output, so that
  * we can pass blame for lines origin is currently suspected for
@@ -557,6 +579,7 @@ static struct patch *get_patch(struct origin *parent, struct origin *origin)
 	fill_origin_blob(origin, &file_o);
 	if (!file_p.ptr || !file_o.ptr)
 		return NULL;
+	truncate_common_data(&file_p, &file_o);
 	patch = compare_buffer(&file_p, &file_o, 0);
 	num_get_patch++;
 	return patch;
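
The loop in the patch cuts at 1024-byte block boundaries, so the cut can land in the middle of a line. As a rough sketch only, and assuming a caller that wants both buffers to keep ending on a complete line, the same idea could be followed by a step that hands back the partial line at the front of the common tail. `struct buf` and `trim_common_tail_lines` are hypothetical names used for illustration, not git or xdiff API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for xdiff's mmfile_t: just a pointer and a length. */
struct buf {
	const char *ptr;
	long size;
};

#define BLOCK 1024

/*
 * Drop the identical tail of both buffers, BLOCK bytes at a time, then
 * give back the leading partial line of that tail so each buffer still
 * ends right after a newline.
 */
static void trim_common_tail_lines(struct buf *a, struct buf *b)
{
	long smaller, trimmed = 0;
	const char *a_end, *b_end, *tail, *nl;

	assert(a && b);
	smaller = a->size < b->size ? a->size : b->size;
	a_end = a->ptr + a->size;
	b_end = b->ptr + b->size;

	/* Grow the common tail one whole block at a time. */
	while (trimmed + BLOCK <= smaller &&
	       !memcmp(a_end - trimmed - BLOCK, b_end - trimmed - BLOCK, BLOCK))
		trimmed += BLOCK;
	if (!trimmed)
		return;

	/* Move the cut forward to just past the first newline in the tail. */
	tail = a_end - trimmed;
	nl = memchr(tail, '\n', trimmed);
	if (!nl)
		return;		/* no line boundary in the tail: trim nothing */
	trimmed -= (nl + 1 - tail);

	a->size -= trimmed;
	b->size -= trimmed;
}
```

Because the bytes handed back come from the part already known to be identical, the cut lands at the same line boundary in both buffers, at the cost of trimming at most one line less.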