From: Linus Torvalds <torvalds@linux-foundation.org>
To: Abhijit Menon-Sen <ams@toroid.org>,
Pierre Habouzit <madcoder@debian.org>,
Davide Libenzi <davidel@xmailserver.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: absurdly slow git-diff
Date: Fri, 7 Nov 2008 13:37:29 -0800 (PST) [thread overview]
Message-ID: <alpine.LFD.2.00.0811071335010.3468@nehalem.linux-foundation.org> (raw)
In-Reply-To: <20081107200126.GA20284@toroid.org>
On Sat, 8 Nov 2008, Abhijit Menon-Sen wrote:
>
> If anyone's interested, the files are http://toroid.org/misc/1 and
> http://toroid.org/misc/2
Btw, you can see this by just doing
git diff 1 2
without even doing "git init" or doing any actual git repository.
> Does anyone understand why this slowdown might happen or have
> suggestions about where I should look for it?
Sure. It's actually fairly simple. You're hitting a O(n^2) thing (possibly
higher), and it's triggered by the fact that almost all your lines are
identical, ie you have a file that basically has 40,000 lines of each of
xxxx: xxx, xx xxx xxxx xx:xx:xx +xxxx
xx: xxxx xxxxxxxxxxx <xxxx@xxxx.xxx>
xxxx: xxxx xxxxxxxxxxx <xxxx@xxxx.xxx>
and 30,000 of
* xxxxx xxxxx (xxxxxx.xxxx xxx xxxxx (\Xxxxxx) xxxxxx.xxxxxx {xxx}
with a smattering of others. And this is a case where the internal git
implementation does really badly. And nobody has really cared before,
because nobody has ever had a case that mattered.
There's a number of different 'diff' algorithms, and it looks like GNU
diff has one that avoids the O(n^2) case for this case.
I'm adding Davide as the original author of the diff library to the cc.
I'm also adding Pierre, since he was talking about trying to implement
another diff algorithm (although I'm not at all sure that the patience
diff really would help this case at all).
Linus
next prev parent reply other threads:[~2008-11-07 21:39 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-07 20:01 absurdly slow git-diff Abhijit Menon-Sen
2008-11-07 21:28 ` Mike Hommey
2008-11-07 21:37 ` Linus Torvalds [this message]
2008-11-07 23:04 ` Davide Libenzi
2008-11-07 23:18 ` Davide Libenzi
2008-11-07 23:42 ` Linus Torvalds
2008-11-07 23:48 ` Davide Libenzi
2008-11-07 23:57 ` Linus Torvalds
2008-11-08 4:57 ` Abhijit Menon-Sen
2008-11-08 21:02 ` Junio C Hamano
2008-11-08 5:30 ` Junio C Hamano
2008-11-08 16:27 ` Davide Libenzi
2008-11-08 0:14 ` Pierre Habouzit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.00.0811071335010.3468@nehalem.linux-foundation.org \
--to=torvalds@linux-foundation.org \
--cc=ams@toroid.org \
--cc=davidel@xmailserver.org \
--cc=git@vger.kernel.org \
--cc=madcoder@debian.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox