From: Nicolas Pitre <nico@cam.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Some git performance measurements..
Date: Wed, 28 Nov 2007 22:59:37 -0500 (EST)
Message-ID: <alpine.LFD.0.99999.0711282244190.9605@xanadu.home>
In-Reply-To: <alpine.LFD.0.9999.0711281852160.8458@woody.linux-foundation.org>
On Wed, 28 Nov 2007, Linus Torvalds wrote:
> - the index accesses are much more "random": the initial 256-way fan-out
> followed by the binary search causes the access patterns to look very
> different:
>
> 0: 28367707
> 136: 18867574
> 140: 221280
> 141: 745890
> 142: 284427
> 143: 338
> 381: 9787459
> 377: 394
> 375: 255
> 376: 248
> 3344: 29885989
> 3347: 334
> 3346: 255
> 3684: 7251911
> 1055: 12954064
> 1052: 386
> 1050: 251
> 1049: 240
> 1947: 10501455
> 1944: 382
> 1946: 262
>
> where it doesn't even read-ahead at all in the beginning (because it
> looks entirely random), but the kernel eventually *does* actually go
> into read-ahead mode pretty soon simply because once it gets into the
> binary search thing, the data entries are close enough to be in
> adjacent pages, and it all looks ok.
Did you try with version 2 of the pack index? It should have somewhat
better locality, since the object SHA1s and their offsets are split
into separate tables.
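
A rough sketch of why the split layout should help, assuming 4096-byte
pages and the usual on-disk layouts (v1: a 256-entry fan-out table followed
by 24-byte entries, each a 4-byte offset plus the 20-byte SHA1; v2: an
8-byte header, the same fan-out table, then a dense table of 20-byte
SHA1s). The function names and the example bucket are made up for
illustration; this is not git's code, only arithmetic on the layout:

```python
PAGE = 4096            # assumed page size
FANOUT_BYTES = 256 * 4  # 256 four-byte fan-out counters


def v1_sha1_pos(i):
    """Byte position of the i-th SHA1 in a v1 index (offset and SHA1 interleaved)."""
    return FANOUT_BYTES + i * 24 + 4


def v2_sha1_pos(i):
    """Byte position of the i-th SHA1 in a v2 index (SHA1s in their own table)."""
    return 8 + FANOUT_BYTES + i * 20


def pages_spanned(pos_fn, lo, hi):
    """Number of pages covered by the SHA1s of entries lo..hi-1.

    This is the range a binary search inside one fan-out bucket can touch.
    """
    first = pos_fn(lo) // PAGE
    last = (pos_fn(hi - 1) + 19) // PAGE
    return last - first + 1


if __name__ == "__main__":
    # Hypothetical fan-out bucket: 4000 objects starting at entry 100000.
    lo, hi = 100000, 104000
    print("v1 pages:", pages_spanned(v1_sha1_pos, lo, hi))
    print("v2 pages:", pages_spanned(v2_sha1_pos, lo, hi))
```

The same bucket covers fewer pages in v2 because the search only walks
20-byte SHA1s instead of 24-byte records; the gain is modest, which is
consistent with "somewhat better" rather than dramatically better.
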
> That said, I think there's something subtly wrong in our pack-file
> sorting, and it should be more contiguous when we just do tree object
> accesses on the top commit. I was really hoping that all the top-level
> trees should be written entirely together, but I wonder if the "write out
> deltas first" thing causes us to have those big gaps in between.
Tree objects aren't all together. Related blob objects are interlaced
with those tree objects. But for a checkout that should actually
correspond to a nice linear access.
And deltas aren't written first; their base objects are. And
because deltas are based on newer objects, in theory the top commit
shouldn't have any deltas at all, and the second commit should have all
the base objects for its deltas already written out as part of the first
commit. At least that's what a perfect data set would produce. Last
time I checked, about 20% of the deltas went in the other direction,
i.e. the deltified object was younger than its base object, most
probably because the new version of the file shrank instead of growing,
which goes against the assumption built into the object sort used by
the delta search. But again, because the base object is needed to
resolve the delta, it will be read anyway.
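
The "base before delta" rule can be pictured with a toy model. This is a
minimal sketch, not git's pack-objects logic: `write_order`, the object
names, and the `base_of` mapping are all hypothetical, and it assumes
objects are visited newest first with no cycles among delta bases.

```python
def write_order(objects, base_of):
    """Return the order in which objects would be written.

    objects: object names, newest first (toy recency order).
    base_of: maps a deltified object to its base object.
    A delta's base is always emitted before the delta itself.
    """
    written = set()
    out = []

    def emit(name):
        if name in written:
            return
        base = base_of.get(name)
        if base is not None:
            emit(base)  # write the base first, wherever it sits in recency order
        written.add(name)
        out.append(name)

    for name in objects:
        emit(name)
    return out


if __name__ == "__main__":
    # a2 and a1 delta against newer versions (the expected direction);
    # b2 is newer than b1 but is deltified against it (the "other direction").
    objects = ["a3", "b2", "a2", "b1", "a1"]
    base_of = {"a2": "a3", "a1": "a2", "b2": "b1"}
    print(write_order(objects, base_of))
```

In this model the older base `b1` gets pulled forward to sit right before
`b2`, so reading `b2` still finds its base nearby even though the delta
points the "wrong" way.
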
Nicolas