git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Fredrik Kuivinen <freku045@student.liu.se>, git@vger.kernel.org
Subject: Re: git-diff-tree -M performance regression in 'next'
Date: Sun, 12 Mar 2006 17:39:20 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0603121733350.3618@g5.osdl.org>
In-Reply-To: <7vhd63w33n.fsf@assigned-by-dhcp.cox.net>



On Sun, 12 Mar 2006, Junio C Hamano wrote:
> 
> The code uses close to 16-bit hash and I had 65k flat array as a
> hashtable.  That one was what you commented as "4-times as many
> page misses".

Ahh. That explains the limited bits in the hash function too. I only 
looked at the current sources, not at the historic ones.

Btw, the page misses may come from the fact that you allocated and 
re-allocated the flat array all the time. That can be very expensive for 
big allocations, since most libraries may decide that it's a big enough 
area that it should be map/unmap'ed in order to give memory back to the 
system (without realizing that there's another allocation coming soon 
afterwards of the same size).

If you map/unmap, the kernel will end up having to not just use new pages, 
but obviously also clear them for security reasons. So it ends up sucking 
on many levels. In contrast, if you just have a 64k-entry array of "int", 
and allocate it _once_ (instead of once per file-pair) you'll still have 
to clear it in between file-pairs, but at least you won't have the 
overhead of mapping/unmapping it.
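
[Illustration: the allocate-once approach described above could look roughly
like the following. This is only a sketch under assumptions; the names
HASH_ENTRIES, hash_table and get_cleared_table are hypothetical and are not
taken from the git sources.]

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical size: 64k slots, so a 16-bit hash can index directly. */
#define HASH_ENTRIES (1u << 16)

/* Allocated on first use and then reused for every file-pair. */
static int *hash_table;

/*
 * Return a zeroed table, or NULL if the first allocation fails.
 * The first call allocates (calloc already zeroes the memory);
 * later calls only pay for the memset, not for a fresh map/unmap.
 */
static int *get_cleared_table(void)
{
	if (!hash_table) {
		hash_table = calloc(HASH_ENTRIES, sizeof(*hash_table));
		return hash_table;
	}
	memset(hash_table, 0, HASH_ENTRIES * sizeof(*hash_table));
	return hash_table;
}
```

[A caller would invoke get_cleared_table() once per file-pair and index the
result with the 16-bit hash. The memset cost of 256kB per pair remains.]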

The clearing can still be pretty expensive (64k "int" entries is 256kB, 
and since most _files_ are just in the ~4-8kB range, you're spending a 
_lot_ of time just memset'ing). Which is why it's probably a good idea to 
instead default to having just "filesize / 8" entries, but then you can 
obviously not use the hash as the index any more.
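
[Illustration: one way to size the table from the file and still find
entries is to mask the hash down to the table size and probe linearly on
collisions. The sketch below assumes a power-of-two size with a floor of 16
slots; struct small_table and the table_* helpers are hypothetical names,
not the code that went into git.]

```c
#include <stddef.h>
#include <stdlib.h>

struct slot {
	unsigned int hash;	/* full hash, kept because index != hash now */
	int count;		/* 0 means the slot is empty */
};

struct small_table {
	size_t size;		/* always a power of two */
	struct slot *slots;
};

/* Roughly filesize / 8 slots, rounded up to a power of two (minimum 16). */
static size_t table_size_for(size_t filesize)
{
	size_t want = filesize / 8, size = 16;

	while (size < want)
		size <<= 1;
	return size;
}

/* Allocate a zeroed table for one file; returns 0 on success, -1 on failure. */
static int table_init(struct small_table *t, size_t filesize)
{
	t->size = table_size_for(filesize);
	t->slots = calloc(t->size, sizeof(*t->slots));
	return t->slots ? 0 : -1;
}

/* Count one occurrence of hash; returns -1 if every slot is taken. */
static int table_add(struct small_table *t, unsigned int hash)
{
	size_t i = hash & (t->size - 1);
	size_t probes;

	for (probes = 0; probes < t->size; probes++) {
		struct slot *s = &t->slots[i];

		if (!s->count) {
			s->hash = hash;
			s->count = 1;
			return 0;
		}
		if (s->hash == hash) {
			s->count++;
			return 0;
		}
		i = (i + 1) & (t->size - 1);	/* linear probing */
	}
	return -1;
}

/* Return how many times hash was added, or 0 if it was never seen. */
static int table_count(const struct small_table *t, unsigned int hash)
{
	size_t i = hash & (t->size - 1);
	size_t probes;

	for (probes = 0; probes < t->size; probes++) {
		const struct slot *s = &t->slots[i];

		if (!s->count)
			return 0;
		if (s->hash == hash)
			return s->count;
		i = (i + 1) & (t->size - 1);
	}
	return 0;
}
```

[For a 4kB file this gives 512 slots (4kB of memory with these 8-byte
slots) instead of a fixed 256kB array, at the price of comparing the stored
hash on every lookup.]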

		Linus

Thread overview: 17+ messages
2006-03-11 17:28 git-diff-tree -M performance regression in 'next' Fredrik Kuivinen
2006-03-12  3:10 ` Junio C Hamano
2006-03-12 12:28 ` Junio C Hamano
2006-03-12 17:00   ` Linus Torvalds
2006-03-12 19:34     ` Junio C Hamano
2006-03-13  0:42       ` Junio C Hamano
2006-03-13  1:09         ` Linus Torvalds
2006-03-13  1:22           ` Junio C Hamano
2006-03-13  1:39             ` Linus Torvalds [this message]
2006-03-13  1:29           ` Linus Torvalds
2006-03-13  1:31             ` Linus Torvalds
2006-03-13  2:29             ` Linus Torvalds
2006-03-13  2:53               ` Linus Torvalds
2006-03-13  4:14             ` Junio C Hamano
2006-03-14  2:55               ` Junio C Hamano
2006-03-14  3:47                 ` Linus Torvalds
2006-03-14 10:26                   ` Junio C Hamano
