From: Junio C Hamano <gitster@pobox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: three-way diff performance problem
Date: Tue, 21 Jul 2009 12:47:33 -0700 [thread overview]
Message-ID: <7vd47tes2y.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <7v7hy1g7vg.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Tue\, 21 Jul 2009 12\:21\:07 -0700")
Junio C Hamano <gitster@pobox.com> writes:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> In fact, on the instruction-level profile, it looks like it's all spent on
>> four instructions:
>>
>> 3.51 : 45aec0: 4c 85 72 10 test %r14,0x10(%rdx)
>> 86.27 : 45aec4: 48 0f 45 ca cmovne %rdx,%rcx
>> : if (sline->lost_head) {
>> : struct lline *last_one = NULL;
>> : /* We cannot squash it with earlier one */
>> : for (lline = sline->lost_head;
>> : lline;
>> : lline = lline->next)
>> 6.69 : 45aec8: 48 8b 12 mov (%rdx),%rdx
>> :
>> : /* Check to see if we can squash things */
>> : if (sline->lost_head) {
>> : struct lline *last_one = NULL;
>> : /* We cannot squash it with earlier one */
>> : for (lline = sline->lost_head;
>> 3.51 : 45aecb: 48 85 d2 test %rdx,%rdx
>> 0.01 : 45aece: 75 f0 jne 45aec0 <consume_line+0xc0>
>> ...
> After feeding the first diff (forget the one with +m1,n1 example above),
> if the second diff looks like this:
>
> @@ -l,k +m2,n2 @@
> -another lost line
> -lost line
> common line
> +added line
>
> we match "-another lost line" with the second element in lost_head list,
> and when processing the next line "-lost line", we try to avoid matching
> it with the first one in the list, by stupidly scanning from the front of
> the list. I think we should add a member to struct sline to remember the
> last element in lost_head list for the current parent to skip that scan.
> Perhaps that scan is the one that is costing, not memcmp()?
Here is a patch to do that. I haven't tested it yet though.
combine-diff.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/combine-diff.c b/combine-diff.c
index bbf74fc..8194104 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -80,6 +80,7 @@ struct lline {
/* Lines surviving in the merge result */
struct sline {
struct lline *lost_head, **lost_tail;
+ struct lline *lost_last_for_current_parent;
char *bol;
int len;
/* bit 0 up to (N-1) are on if the parent has this line (i.e.
@@ -121,18 +122,15 @@ static void append_lost(struct sline *sline, int n, const char *line, int len)
/* Check to see if we can squash things */
if (sline->lost_head) {
- struct lline *last_one = NULL;
- /* We cannot squash it with earlier one */
- for (lline = sline->lost_head;
- lline;
- lline = lline->next)
- if (lline->parent_map & this_mask)
- last_one = lline;
- lline = last_one ? last_one->next : sline->lost_head;
+ if (sline->lost_last_for_current_parent)
+ lline = sline->lost_last_for_current_parent;
+ else
+ lline = sline->lost_head;
while (lline) {
if (lline->len == len &&
!memcmp(lline->line, line, len)) {
lline->parent_map |= this_mask;
+ sline->lost_last_for_current_parent = lline;
return;
}
lline = lline->next;
@@ -147,6 +145,7 @@ static void append_lost(struct sline *sline, int n, const char *line, int len)
lline->line[len] = 0;
*sline->lost_tail = lline;
sline->lost_tail = &lline->next;
+ sline->lost_last_for_current_parent = lline;
}
struct combine_diff_state {
@@ -187,6 +186,7 @@ static void consume_line(void *state_, char *line, unsigned long len)
xcalloc(state->num_parent,
sizeof(unsigned long));
state->sline[state->nb-1].p_lno[state->n] = state->ob;
+ state->lost_bucket->lost_last_for_current_parent = NULL;
return;
}
if (!state->lost_bucket)
next prev parent reply other threads:[~2009-07-21 19:47 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-21 18:10 three-way diff performance problem Linus Torvalds
2009-07-21 18:16 ` Linus Torvalds
2009-07-21 19:21 ` Junio C Hamano
2009-07-21 19:31 ` Linus Torvalds
2009-07-21 19:47 ` Junio C Hamano [this message]
2009-07-21 20:34 ` Linus Torvalds
2009-07-21 20:46 ` Junio C Hamano
2009-07-21 21:01 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7vd47tes2y.fsf@alter.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).