From: Linus Torvalds <torvalds@linux-foundation.org>
To: linux@horizon.com
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/3] Lazily open pack index files on demand
Date: Mon, 28 May 2007 20:26:42 -0700 (PDT) [thread overview]
Message-ID: <alpine.LFD.0.98.0705282014480.26602@woody.linux-foundation.org> (raw)
In-Reply-To: <20070529000949.28007.qmail@science.horizon.com>
On Mon, 28 May 2007, linux@horizon.com wrote:
>
> Even losing a constant factor of 2, it still seems like it might offer a
> factor-of-2 speedup for large repositories.
Well, not so much according to the numbers I had.
Yes, SHA-1's are very nicely distributed on a larger scale, but on a
_small_ scale they aren't.
So you end up being able to get very good initial guesses for the first
few iterations, but once you get "close enough", you can't do anything at
all, and then you're actually worse off.
Also, please do realize that the binary search is actually a *smart*
binary search, which does a radix-256 fan-out at the beginning, getting
rid of the first 8 levels of searching for free.
The Newton-Raphson thing can do that too (and in my trial patch, did), but
it means that you actually get rid of just the initial guess (the one that
worked the best!) and you still have the problem that once you get close
enough, the distribution at a local level is not at all nice and linear.
So what I did with my patch (you can find it in the archives - search for
Newton-Raphson, I'd say about 3 months ago) was to do three cycles of
approximating, and then a linear search. The linear search has good cache
behaviour, so it's not as horrid as it might sound, but for some numbers I
did, my approximate thing would hit on the exact first entry about half
the time, but would have to walk up to ~20 entries occasionally.
Which meant that the binary search (with the initial radix-256 fan-out)
actually mostly outperformed the Newton-Raphson thing.
Also, object lookup is certainly noticeable, but the biggest cost of it is
the cache misses, and even then it's not really _that_ noticeable. And
it's really neither slow nor stupid as it is.
So I'd love to see somebody try to do a _proper_ apprixmation of Newton-
Raphson, not the quick hack I did. But I think factor-of-two is actually
optimistic, even for pretty large repos. And it would have to be no worse
than what we have now for average-sized ones.
(And object lookup time is generally not the dominant cost, so while it's
noticeable, cutting it down by even a whole 50% isn't going to speed up
any normal git operations by more than a couple of percent.. Generating
diffs in particular is a much more costly op for things like "git blame")
Linus
next prev parent reply other threads:[~2007-05-29 3:26 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-29 0:09 [PATCH 1/3] Lazily open pack index files on demand linux
2007-05-29 3:26 ` Linus Torvalds [this message]
-- strict thread matches above, loose matches on Subject: below --
2007-05-27 10:46 Martin Koegler
2007-05-27 15:36 ` Nicolas Pitre
2007-05-26 5:24 Shawn O. Pearce
2007-05-26 8:29 ` Junio C Hamano
2007-05-26 17:30 ` Shawn O. Pearce
2007-05-26 17:31 ` Dana How
2007-05-27 2:43 ` Nicolas Pitre
2007-05-27 4:31 ` Dana How
2007-05-27 14:41 ` Nicolas Pitre
2007-05-27 3:34 ` Shawn O. Pearce
2007-05-27 4:40 ` Dana How
2007-05-27 15:29 ` Nicolas Pitre
2007-05-27 21:35 ` Shawn O. Pearce
2007-05-28 1:35 ` Dana How
2007-05-28 2:30 ` A Large Angry SCM
2007-05-28 18:31 ` Nicolas Pitre
2007-05-28 2:18 ` Nicolas Pitre
2007-05-27 15:26 ` Nicolas Pitre
2007-05-27 16:06 ` Dana How
2007-05-27 21:52 ` Shawn O. Pearce
2007-05-27 23:35 ` Nicolas Pitre
2007-05-28 16:22 ` Linus Torvalds
2007-05-28 17:13 ` Nicolas Pitre
2007-05-28 17:40 ` Karl Hasselström
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.0.98.0705282014480.26602@woody.linux-foundation.org \
--to=torvalds@linux-foundation.org \
--cc=git@vger.kernel.org \
--cc=linux@horizon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).