From: Linus Torvalds <torvalds@osdl.org>
To: Jim Meyering <jim@meyering.net>,
Davide Libenzi <davidel@xmailserver.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: git-diff-tree inordinately (O(M*N)) slow on files with many changes
Date: Mon, 16 Oct 2006 08:47:31 -0700 (PDT) [thread overview]
Message-ID: <Pine.LNX.4.64.0610160838200.3962@g5.osdl.org> (raw)
In-Reply-To: <87slhopcws.fsf@rho.meyering.net>
Davide? I'm quoting the whole report, because I suspect you don't follow
the git lists, and this is all original libxdiff code.
Jim: the annotation failure _may_ just be due to a "valid" diff change
(there is not always a unique correct answer for a diff, and so two
different diff algorithms can validly give two different answers, which
will also mean that git-annotate/blame would give different explanations).
But it could certainly also be that you just broke the diffs entirely, so
I would like to wait for Davide to comment on your diff before Junio
should apply it.
Others may be intimately familiar with the diff algorithms too, of course.
It just scares me personally ;)
Linus
On Mon, 16 Oct 2006, Jim Meyering wrote:
>
> [using very latest code, built an hour ago:
> git version 1.4.3.rc3.gb32db-dirty ]
>
> I found that git-diff-tree is *very* slow when processing files with
> many changes. The offending example involves comparing the configure
> file from coreutils-6.3 with that from the latest coreutils development
> sources. Both are over 50k lines long, and the diff -u output is almost
> 50k-lines long, divided into ~700 hunks.
>
> http://meyering.net/code/git-perf/configure.gz
> http://meyering.net/code/git-perf/configure-curr.gz
>
> Comparing them with "diff -u" takes about 0.3s.
> Putting them in a git repo (uncompressed, and with the same name,
> of course) and comparing with git-diff-tree takes over a minute.
> That's 200 times slower.
>
> I traced the problem to xdiff/xprepare.c's xdl_cleanup_records function.
> Each of its two doubly-nested loops ends up running the inner-loop
> code more than 2 *billion* times.
>
> That seems to be due to the two typos fixed by this patch:
> With this patch, my "git-diff-tree --no-commit-id -r -p 2c2172"
> command completes in just 2 seconds.
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 1be7b31..e5438a9 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -381,7 +381,7 @@ static int xdl_cleanup_records(xdfile_t
> hav = (*recs)->ha;
> rhi = (long) XDL_HASHLONG(hav, xdf2->hbits);
> for (nm = 0, rec = xdf2->rhash[rhi]; rec; rec = rec->next)
> - if (rec->ha == hav && ++nm == mlim)
> + if (rec->ha == hav || ++nm == mlim)
> break;
> dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
> }
> @@ -392,7 +392,7 @@ static int xdl_cleanup_records(xdfile_t
> hav = (*recs)->ha;
> rhi = (long) XDL_HASHLONG(hav, xdf1->hbits);
> for (nm = 0, rec = xdf1->rhash[rhi]; rec; rec = rec->next)
> - if (rec->ha == hav && ++nm == mlim)
> + if (rec->ha == hav || ++nm == mlim)
> break;
> dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
> }
>
> However, that change causes the t800*.sh (14,16,18) annotate/blame
> tests to fail, so take it with a grain of salt.
>
> I.e., running one of them manually gave this:
>
> $ sh t8001-annotate.sh --immediate --verbose
> * expecting success: check_count A 2 B 1 B1 2 B2 1 "A U Thor" 1
> Author A (expected 2, attributed 2) good
> Author B1 (expected 2, attributed 1) bad
> Author A U Thor (expected 1, attributed 2) bad
> Author B2 (expected 1, attributed 1) good
> Author B (expected 1, attributed 1) good
> * FAIL 14: Two lines blamed on A, one on B, two on B1, one on B2, one on A U Thor
> check_count A 2 B 1 B1 2 B2 1 "A U Thor" 1
> [Exit 1]
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2006-10-16 15:47 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-10-16 14:12 git-diff-tree inordinately (O(M*N)) slow on files with many changes Jim Meyering
2006-10-16 15:47 ` Linus Torvalds [this message]
2006-10-16 16:12 ` Linus Torvalds
2006-10-16 16:33 ` Jim Meyering
2006-10-16 16:42 ` Davide Libenzi
2006-10-16 16:50 ` Jim Meyering
2006-10-16 16:54 ` Davide Libenzi
2006-10-16 16:57 ` Jim Meyering
2006-10-16 17:02 ` Davide Libenzi
2006-10-16 17:56 ` Linus Torvalds
2006-10-16 18:03 ` Linus Torvalds
2006-10-16 18:41 ` Davide Libenzi
2006-10-16 18:18 ` Davide Libenzi
2006-10-16 18:51 ` Linus Torvalds
2006-10-16 19:44 ` Davide Libenzi
2006-10-16 20:29 ` Jakub Narebski
2006-10-16 22:53 ` Junio C Hamano
2006-10-16 23:24 ` Linus Torvalds
2006-10-16 23:52 ` Davide Libenzi
2006-10-16 18:24 ` Jim Meyering
2006-10-16 18:30 ` Davide Libenzi
2006-10-16 18:43 ` Jim Meyering
2006-10-16 16:54 ` Linus Torvalds
2006-10-16 16:36 ` Davide Libenzi
2006-10-16 16:57 ` Linus Torvalds
2006-10-16 16:24 ` Davide Libenzi
2006-10-16 16:54 ` Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0610160838200.3962@g5.osdl.org \
--to=torvalds@osdl.org \
--cc=davidel@xmailserver.org \
--cc=git@vger.kernel.org \
--cc=jim@meyering.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).