From: Davide Libenzi <davidel@xmailserver.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Jim Meyering <jim@meyering.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: git-diff-tree inordinately (O(M*N)) slow on files with many changes
Date: Mon, 16 Oct 2006 09:36:06 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0610160932100.7697@alien.or.mcafeemobile.com>
In-Reply-To: <Pine.LNX.4.64.0610160904400.3962@g5.osdl.org>

On Mon, 16 Oct 2006, Linus Torvalds wrote:

> On Mon, 16 Oct 2006, Linus Torvalds wrote:
> > 
> > But it could certainly also be that you just broke the diffs entirely, so 
> > I would like to wait for Davide to comment on your diff before Junio 
> > should apply it. 
> 
> I think you broke it. 
> 
> If the "&& vs ||" makes a difference (and it clearly does), that implies 
> that you have lots of different hash values on the same hash chain, and 
> you end up considering those _different_ hash values to be all equivalent 
> for the counting, even though they obviously aren't.
> 
> I think the real problem is that with big input, the hash tables are too 
> small, making the hash chains too long - even though the values on the 
> chains are different (ie we're not hashing different records with the same 
> hash value over and over again - if that was true, the "&& vs ||" change 
> wouldn't make any difference).
> 
> So I think xdiff has chosen too small a hash. Can you try what happens if 
> you change xdl_hashbits() (in xdiff/xutil.c) instead? Try making it return 
> a bigger value (for example, by initializing "bits" to 2 instead of 0), 
> and see if that makes a difference.

I think xdl_hashbits() picks the hash table size "almost" 
correctly. I think we're looking at some bad hash *collisions* (not 
records with the same hash value; those would be stopped by the mlim 
check). Send me the files and I'll take a look ...
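
To make the two cases above concrete, here is a minimal sketch. It is not
the xdiff source: sketch_hashbits() and sketch_bucket() are hypothetical
names, and the plain bit mask used to fold a hash into a table index is an
assumption made for illustration. The first function shows how a table
size in bits can be derived from the record count, and what starting
"bits" at 2 instead of 0 would do (a table four times larger). The second
shows how two *different* hash values can still land on the same chain
when the table is small.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical sizing helper: count how many doublings of 1 are needed
 * to reach `size`, starting the bit counter at `extra_bits`.
 * extra_bits == 0 gives a table of roughly `size` slots; extra_bits == 2
 * gives one about four times larger.
 */
static unsigned int sketch_hashbits(unsigned int size, unsigned int extra_bits)
{
	const unsigned int max_bits = CHAR_BIT * sizeof(unsigned int) - 1;
	unsigned int val = 1, bits = extra_bits;

	while (val < size && bits < max_bits) {
		val <<= 1;
		bits++;
	}
	return bits ? bits : 1;
}

/*
 * Hypothetical folding step (an assumption, a plain mask): keep the low
 * `bits` bits of the full hash to select a chain in the table.
 */
static unsigned int sketch_bucket(unsigned int hash, unsigned int bits)
{
	assert(bits >= 1 && bits <= CHAR_BIT * sizeof(unsigned int) - 1);
	return hash & ((1u << bits) - 1u);
}
```

With 1000 records, sketch_hashbits(1000, 0) gives 10 bits and
sketch_hashbits(1000, 2) gives 12. The distinct hash values 0x1234 and
0x1634 share a chain at 10 bits but separate at 12, which is the kind of
collision a bigger table can fix. Records whose full hash values are
equal stay on the same chain at any table size.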

> But again, I'm not actually all _that_ familiar with the libxdiff 
> algorithms, _especially_ the line-based ones (I can follow the regular 
> binary delta code, but the line-based one just makes my head hurt). So 
> take anything I say with a pinch of salt.

That's my revenge for having to follow your code in the kernel  :D

- Davide

