Git development
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Abhijit Menon-Sen <ams@toroid.org>,
	Pierre Habouzit <madcoder@debian.org>,
	Davide Libenzi <davidel@xmailserver.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: absurdly slow git-diff
Date: Fri, 7 Nov 2008 13:37:29 -0800 (PST)	[thread overview]
Message-ID: <alpine.LFD.2.00.0811071335010.3468@nehalem.linux-foundation.org> (raw)
In-Reply-To: <20081107200126.GA20284@toroid.org>


On Sat, 8 Nov 2008, Abhijit Menon-Sen wrote:
>
> If anyone's interested, the files are http://toroid.org/misc/1 and
> http://toroid.org/misc/2

Btw, you can see this by just doing

	git diff 1 2

without even doing "git init" or doing any actual git repository.

> Does anyone understand why this slowdown might happen or have
> suggestions about where I should look for it?

Sure. It's actually fairly simple. You're hitting a O(n^2) thing (possibly 
higher), and it's triggered by the fact that almost all your lines are 
identical, ie you have a file that basically has 40,000 lines of each of

	xxxx: xxx, xx xxx xxxx xx:xx:xx +xxxx
	xx: xxxx xxxxxxxxxxx <xxxx@xxxx.xxx>
	xxxx: xxxx xxxxxxxxxxx <xxxx@xxxx.xxx>

and 30,000 of 

	* xxxxx xxxxx (xxxxxx.xxxx xxx xxxxx (\Xxxxxx) xxxxxx.xxxxxx {xxx}

with a smattering of others. And this is a case where the internal git 
implementation does really badly. And nobody has really cared before, 
because nobody has ever had a case that mattered.

There's a number of different 'diff' algorithms, and it looks like GNU 
diff has one that avoids the O(n^2) case for this case.

I'm adding Davide as the original author of the diff library to the cc. 
I'm also adding Pierre, since he was talking about trying to implement
another diff algorithm (although I'm not at all sure that the patience 
diff really would help this case at all).

		Linus

  parent reply	other threads:[~2008-11-07 21:39 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-07 20:01 absurdly slow git-diff Abhijit Menon-Sen
2008-11-07 21:28 ` Mike Hommey
2008-11-07 21:37 ` Linus Torvalds [this message]
2008-11-07 23:04   ` Davide Libenzi
2008-11-07 23:18     ` Davide Libenzi
2008-11-07 23:42       ` Linus Torvalds
2008-11-07 23:48         ` Davide Libenzi
2008-11-07 23:57           ` Linus Torvalds
2008-11-08  4:57             ` Abhijit Menon-Sen
2008-11-08 21:02             ` Junio C Hamano
2008-11-08  5:30           ` Junio C Hamano
2008-11-08 16:27             ` Davide Libenzi
2008-11-08  0:14   ` Pierre Habouzit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.00.0811071335010.3468@nehalem.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=ams@toroid.org \
    --cc=davidel@xmailserver.org \
    --cc=git@vger.kernel.org \
    --cc=madcoder@debian.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox