From: Linus Torvalds <torvalds@osdl.org>
To: Jim Meyering <jim@meyering.net>
Cc: Davide Libenzi <davidel@xmailserver.org>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: git-diff-tree inordinately (O(M*N)) slow on files with many changes
Date: Mon, 16 Oct 2006 09:54:11 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0610160948450.3962@g5.osdl.org>
In-Reply-To: <87mz7wp6ek.fsf@rho.meyering.net>
On Mon, 16 Oct 2006, Jim Meyering wrote:
> Linus Torvalds <torvalds@osdl.org> wrote:
> > On Mon, 16 Oct 2006, Linus Torvalds wrote:
> ...
> > So I think xdiff has chosen too small a hash. Can you try what happens if
> > you change xdl_hashbits() (in xdiff/xutil.c) instead? Try making it return
> > a bigger value (for example, by initializing "bits" to 2 instead of 0),
> > and see if that makes a difference.
>
> It makes no difference.
>
> Bear in mind that there are a *lot* of duplicate lines in the files
> being compared: filtering each through "sort -u" removes 40-50k lines.
It can't be due to duplicate lines. If the lines are truly duplicate, then
they'd get the same 32-bit hash value, and then the first conditional in
the expression would always be true, and then it wouldn't _matter_ if it's
a "&&" or a "||".
See?
So as far as I can tell it has to be some kind of collision on the hash
queue, with _different_ hash values being queued on the same hash queue.
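
To make that distinction concrete, here is a minimal sketch of two lines whose full 32-bit hashes differ but which still land on the same queue. The fold below (keep the low bits) and both function names are assumptions made up for illustration; this is not xdiff's XDL_HASHLONG().

```c
#include <stdint.h>

/*
 * Hypothetical fold: keep the low `bits` bits of a 32-bit line hash.
 * This is NOT xdiff's XDL_HASHLONG(); it only illustrates the idea.
 * Assumes 1 <= bits <= 31.
 */
static unsigned long low_bits_bucket(uint32_t hash, unsigned int bits)
{
	return (unsigned long)hash & ((1UL << bits) - 1);
}

/*
 * True when two lines have different full hashes but would still be
 * chained on the same hash queue. Identical hashes (true duplicates)
 * return 0: they share a queue, but the full-hash compare succeeds.
 */
static int collides_on_queue(uint32_t a, uint32_t b, unsigned int bits)
{
	return a != b && low_bits_bucket(a, bits) == low_bits_bucket(b, bits);
}
```

With an 8-bit table, collides_on_queue(0x12345678, 0xABCDEF78, 8) is true because both values end in 0x78; with 16 bits the same pair separates.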
Now, it could be that there's a bad hash algorithm somewhere (eg if
XDL_HASHLONG() just does horribly badly in distributing the hash values
onto the hash queues, you'd see this _regardless_ of how many bits you
have, just because it clumps).
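
A sketch of what "clumps regardless of how many bits" could look like. Both folds here are hypothetical stand-ins, not the real XDL_HASHLONG(), and the input (hashes that differ only in their upper 16 bits) is contrived to make the effect visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned long (*fold_fn)(uint32_t hash, unsigned int bits);

/* Hypothetical fold #1: keep the low bits. Assumes 1 <= bits <= 31. */
static unsigned long fold_low(uint32_t h, unsigned int bits)
{
	return (unsigned long)h & ((1UL << bits) - 1);
}

/*
 * Hypothetical fold #2: multiply by an odd constant and keep the top
 * bits of the 32-bit product, so every input bit can influence the
 * queue index. Assumes 1 <= bits <= 31.
 */
static unsigned long fold_mul(uint32_t h, unsigned int bits)
{
	return (uint32_t)(h * 2654435769u) >> (32 - bits);
}

/*
 * Count how many of the 2^bits queues receive at least one hash.
 * Returns 0 if the scratch allocation fails.
 */
static size_t queues_used(const uint32_t *hashes, size_t n,
			  unsigned int bits, fold_fn fold)
{
	size_t nqueues = (size_t)1 << bits, used = 0, i;
	unsigned char *seen = calloc(nqueues, 1);

	if (!seen)
		return 0;
	for (i = 0; i < n; i++) {
		unsigned long q = fold(hashes[i], bits);

		if (!seen[q]) {
			seen[q] = 1;
			used++;
		}
	}
	free(seen);
	return used;
}

/* Feed 64 distinct hashes that differ only in their upper 16 bits. */
static size_t queues_used_by_clumped_input(unsigned int bits, fold_fn fold)
{
	uint32_t hashes[64];
	size_t i;

	for (i = 0; i < 64; i++)
		hashes[i] = (uint32_t)(i + 1) << 16;
	return queues_used(hashes, 64, bits, fold);
}
```

With fold_low, all 64 distinct hashes share a single queue whether the table has 8, 12 or 16 bits, so growing the table buys nothing. With fold_mul the same input spreads over more than one queue.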
Or there could be something else that I'm just missing..
It would probably be nice to just get a sampling of what the hash-queue
looks like for the bad case? Maybe it would be obvious that certain
different hash values then get the same XDL_HASHLONG() thing..
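
One way such a sampling could be done, as a rough sketch only. It assumes the full 32-bit line hashes have already been dumped into an array; the struct, the function names and the low-bits fold are all hypothetical, and the fold would have to be swapped for whatever the real code uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct queue_sample {
	unsigned long queue;	/* index of the longest queue */
	size_t length;		/* number of entries chained on it */
	size_t distinct;	/* distinct full hash values among them */
};

/* Hypothetical stand-in for the real fold. Assumes 1 <= bits <= 31. */
static unsigned long sample_fold(uint32_t h, unsigned int bits)
{
	return (unsigned long)h & ((1UL << bits) - 1);
}

static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * Find the longest queue and count how many different full hashes sit
 * on it. Returns 0 on success, -1 if an allocation fails.
 */
static int sample_longest_queue(const uint32_t *hashes, size_t n,
				unsigned int bits, struct queue_sample *out)
{
	size_t nqueues = (size_t)1 << bits, i, k = 0;
	size_t *len = calloc(nqueues, sizeof(*len));
	uint32_t *members = malloc((n ? n : 1) * sizeof(*members));

	if (!len || !members) {
		free(len);
		free(members);
		return -1;
	}
	out->queue = 0;
	out->length = 0;
	out->distinct = 0;

	/* Pass 1: per-queue lengths, remembering the longest. */
	for (i = 0; i < n; i++) {
		unsigned long q = sample_fold(hashes[i], bits);

		if (++len[q] > out->length) {
			out->length = len[q];
			out->queue = q;
		}
	}
	/* Pass 2: collect the full hashes queued on the longest one. */
	for (i = 0; i < n; i++)
		if (sample_fold(hashes[i], bits) == out->queue)
			members[k++] = hashes[i];

	/* Sort so that equal hashes are adjacent, then count the runs. */
	qsort(members, k, sizeof(*members), cmp_u32);
	for (i = 0; i < k; i++)
		if (i == 0 || members[i] != members[i - 1])
			out->distinct++;

	free(len);
	free(members);
	return 0;
}
```

If the longest queue shows a large `length` but `distinct` close to 1, the queue is long because of duplicate lines. If `distinct` is also large, different hash values are piling onto the same queue, which would point at the fold.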
Linus