git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git-diff-tree inordinately (O(M*N)) slow on files with many changes
@ 2006-10-16 14:12 Jim Meyering
  2006-10-16 15:47 ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Jim Meyering @ 2006-10-16 14:12 UTC (permalink / raw)
  To: git

[using very latest code, built an hour ago:
 git version 1.4.3.rc3.gb32db-dirty ]

I found that git-diff-tree is *very* slow when processing files with
many changes.  The offending example involves comparing the configure
file from coreutils-6.3 with that from the latest coreutils development
sources.  Both are over 50k lines long, and the diff -u output is almost
50k-lines long, divided into ~700 hunks.

  http://meyering.net/code/git-perf/configure.gz
  http://meyering.net/code/git-perf/configure-curr.gz

Comparing them with "diff -u" takes about 0.3s.
Putting them in a git repo (uncompressed, and with the same name,
of course) and comparing with git-diff-tree takes over a minute.
That's 200 times slower.

I traced the problem to xdiff/xprepare.c's xdl_cleanup_records function.
Each of its two doubly-nested loops ends up running the inner-loop
code more than 2 *billion* times.

That seems to be due to the two typos fixed by this patch:
With this patch, my "git-diff-tree --no-commit-id -r -p 2c2172"
command completes in just 2 seconds.

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 1be7b31..e5438a9 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -381,7 +381,7 @@ static int xdl_cleanup_records(xdfile_t
 		hav = (*recs)->ha;
 		rhi = (long) XDL_HASHLONG(hav, xdf2->hbits);
 		for (nm = 0, rec = xdf2->rhash[rhi]; rec; rec = rec->next)
-			if (rec->ha == hav && ++nm == mlim)
+			if (rec->ha == hav || ++nm == mlim)
 				break;
 		dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
 	}
@@ -392,7 +392,7 @@ static int xdl_cleanup_records(xdfile_t
 		hav = (*recs)->ha;
 		rhi = (long) XDL_HASHLONG(hav, xdf1->hbits);
 		for (nm = 0, rec = xdf1->rhash[rhi]; rec; rec = rec->next)
-			if (rec->ha == hav && ++nm == mlim)
+			if (rec->ha == hav || ++nm == mlim)
 				break;
 		dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
 	}

However, that change causes the t800*.sh (14,16,18) annotate/blame
tests to fail, so take it with a grain of salt.

I.e., running one of them manually gave this:

    $ sh t8001-annotate.sh --immediate --verbose
    * expecting success: check_count A 2 B 1 B1 2 B2 1 "A U Thor" 1
    Author A (expected 2, attributed 2) good
    Author B1 (expected 2, attributed 1) bad
    Author A U Thor (expected 1, attributed 2) bad
    Author B2 (expected 1, attributed 1) good
    Author B (expected 1, attributed 1) good
    * FAIL 14: Two lines blamed on A, one on B, two on B1, one on B2, one on A U Thor
            check_count A 2 B 1 B1 2 B2 1 "A U Thor" 1
    [Exit 1]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2006-10-16 23:52 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-16 14:12 git-diff-tree inordinately (O(M*N)) slow on files with many changes Jim Meyering
2006-10-16 15:47 ` Linus Torvalds
2006-10-16 16:12   ` Linus Torvalds
2006-10-16 16:33     ` Jim Meyering
2006-10-16 16:42       ` Davide Libenzi
2006-10-16 16:50         ` Jim Meyering
2006-10-16 16:54           ` Davide Libenzi
2006-10-16 16:57             ` Jim Meyering
2006-10-16 17:02               ` Davide Libenzi
2006-10-16 17:56           ` Linus Torvalds
2006-10-16 18:03             ` Linus Torvalds
2006-10-16 18:41               ` Davide Libenzi
2006-10-16 18:18             ` Davide Libenzi
2006-10-16 18:51               ` Linus Torvalds
2006-10-16 19:44                 ` Davide Libenzi
2006-10-16 20:29                   ` Jakub Narebski
2006-10-16 22:53                 ` Junio C Hamano
2006-10-16 23:24                   ` Linus Torvalds
2006-10-16 23:52                     ` Davide Libenzi
2006-10-16 18:24             ` Jim Meyering
2006-10-16 18:30               ` Davide Libenzi
2006-10-16 18:43                 ` Jim Meyering
2006-10-16 16:54       ` Linus Torvalds
2006-10-16 16:36     ` Davide Libenzi
2006-10-16 16:57       ` Linus Torvalds
2006-10-16 16:24   ` Davide Libenzi
2006-10-16 16:54     ` Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).