From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Carlos R. Mafra" <crmafra2@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: Performance issue of 'git branch'
Date: Wed, 22 Jul 2009 20:21:01 -0700 (PDT) [thread overview]
Message-ID: <alpine.LFD.2.01.0907222009340.21520@localhost.localdomain> (raw)
In-Reply-To: <alpine.LFD.2.01.0907221959330.21520@localhost.localdomain>
On Wed, 22 Jul 2009, Linus Torvalds wrote:
>
> The GIT_DEBUG_LOOKUP debug output probably does match the number of
> cold-cache IO's fairly well for something like this (at least to a first
> approximation), so I really hope my patch will fix your problem.
Side note: the object lookup binary search we do is simple and reasonably
efficient, but it is _not_ very cache-friendly (where "cache-friendly"
also in this case means IO caches).
There are more cache-friendly ways of searching, although the really
clever ones would require us to switch the format of the pack-file index
around. Which would be a fairly big pain (in addition to making the lookup
a lot more complex).
The _simpler_ cache-friendly alternative is likely to try the "guess
location by assuming the SHA1's are evenly spread out" thing doesn't jump
back-and-forth like a binary search does.
We tried it a few years ago, but didn't do cold-cache numbers. And
repositories were smaller too.
With something like the kernel repo, with 1.2+ million objects, a binary
search needs about 21 comparisons for each object we look up. The index
has a first-level fan-out of 256, so that takes away 8 of them, but we're
still talking about 13 comparisons. With bad locality except for the very
last ones.
Assuming a 4kB page-size, and about 170 index entries per page (~7 binary
search levels), that's 6 pages we have to page-fault in for each search.
And we probably won't start seeing lots of cache reuse until we hit
hundreds or thousands of objects searched for.
With soemthing like "three iterations of newton-raphson + linear search",
we might end up with more index entries looked at, but we'd quite possibly
get much better locality.
I suspect the old newton-raphson patches we had (Discussions and patches
back in April 2007 on this list) could be resurrected pretty easily.
Linus
next prev parent reply other threads:[~2009-07-23 3:21 UTC|newest]
Thread overview: 74+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-22 23:59 Performance issue of 'git branch' Carlos R. Mafra
2009-07-23 0:21 ` Linus Torvalds
2009-07-23 0:51 ` Linus Torvalds
2009-07-23 0:55 ` Linus Torvalds
2009-07-23 2:02 ` Carlos R. Mafra
2009-07-23 2:28 ` Linus Torvalds
2009-07-23 12:42 ` Jakub Narebski
2009-07-23 14:45 ` Carlos R. Mafra
2009-07-23 16:25 ` Linus Torvalds
2009-07-23 1:22 ` Carlos R. Mafra
2009-07-23 2:20 ` Linus Torvalds
2009-07-23 2:23 ` Linus Torvalds
2009-07-23 3:08 ` Linus Torvalds
2009-07-23 3:21 ` Linus Torvalds [this message]
2009-07-23 17:47 ` Tony Finch
2009-07-23 18:57 ` Linus Torvalds
2009-07-23 22:48 ` Newton-Raphson, was " Tony Finch
2009-07-23 23:24 ` Johannes Schindelin
2009-07-23 23:50 ` Tony Finch
2009-07-24 0:43 ` Johannes Schindelin
2009-07-23 3:18 ` Carlos R. Mafra
2009-07-23 3:27 ` Carlos R. Mafra
2009-07-23 3:40 ` Carlos R. Mafra
2009-07-23 3:47 ` Linus Torvalds
2009-07-23 4:10 ` Linus Torvalds
2009-07-23 5:13 ` Junio C Hamano
2009-07-23 5:17 ` Carlos R. Mafra
2009-07-23 4:40 ` Junio C Hamano
2009-07-23 5:36 ` Linus Torvalds
2009-07-23 5:52 ` Junio C Hamano
2009-07-23 6:04 ` Junio C Hamano
2009-07-23 17:19 ` Linus Torvalds
2009-07-23 16:07 ` Carlos R. Mafra
2009-07-23 16:19 ` Linus Torvalds
2009-07-23 16:53 ` Carlos R. Mafra
2009-07-23 19:05 ` Linus Torvalds
2009-07-23 19:13 ` Linus Torvalds
2009-07-23 19:55 ` Carlos R. Mafra
2009-07-24 20:36 ` Linus Torvalds
2009-07-24 20:47 ` Linus Torvalds
2009-07-24 21:21 ` Linus Torvalds
2009-07-24 22:13 ` Linus Torvalds
2009-07-24 22:18 ` david
2009-07-24 22:42 ` Linus Torvalds
2009-07-24 22:46 ` david
2009-07-25 2:39 ` Linus Torvalds
2009-07-25 2:53 ` Daniel Barkalow
2009-08-07 4:21 ` Jeff King
2009-07-24 22:54 ` Theodore Tso
2009-07-24 22:59 ` Shawn O. Pearce
2009-07-24 23:28 ` Junio C Hamano
2009-07-26 17:07 ` Avi Kivity
2009-07-26 17:16 ` Johannes Schindelin
2009-07-24 23:46 ` Carlos R. Mafra
2009-07-25 0:41 ` Carlos R. Mafra
2009-07-25 18:04 ` Linus Torvalds
2009-07-25 18:57 ` Timo Hirvonen
2009-07-25 19:06 ` Reece Dunn
2009-07-25 20:31 ` Mike Hommey
2009-07-25 21:02 ` Linus Torvalds
2009-07-25 21:13 ` Linus Torvalds
2009-07-25 23:23 ` Johannes Schindelin
2009-07-26 4:49 ` Linus Torvalds
2009-07-26 16:29 ` Theodore Tso
2009-07-26 7:54 ` Mike Hommey
2009-07-26 10:16 ` Johannes Schindelin
2009-07-26 10:23 ` demerphq
2009-07-26 10:27 ` demerphq
2009-07-25 21:04 ` Carlos R. Mafra
2009-07-23 16:48 ` Anders Kaseorg
2009-07-23 19:03 ` Carlos R. Mafra
2009-07-23 0:23 ` SZEDER Gábor
2009-07-23 2:25 ` Carlos R. Mafra
-- strict thread matches above, loose matches on Subject: below --
2009-07-26 23:21 George Spelvin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.01.0907222009340.21520@localhost.localdomain \
--to=torvalds@linux-foundation.org \
--cc=crmafra2@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).