From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>, git@vger.kernel.org
Subject: Re: Show of hands, how many set USE_NSEC
Date: Fri, 8 Aug 2008 10:42:06 -0700 (PDT)
Message-ID: <alpine.LFD.1.10.0808081027590.3462@nehalem.linux-foundation.org>
In-Reply-To: <20080808165718.GG9152@spearce.org>
On Fri, 8 Aug 2008, Shawn O. Pearce wrote:
>
> Really I'd just like to scrap the entire DIRC file format and do
> it over again. Having the flat namespace is nuts. Linus and I
> really disagree here, and since I have never produced code for C
> Git to replace it (and prove why it's better) I think he has me in
> his kill file now. :)
I really disagree, because you have no clue about performance.
The flat file format is absolutely _critical_ for performance.
Try this:

	time cat .git/index > /dev/null
	time git ls-tree -r HEAD > /dev/null

which isn't quite fair (but see how it's unfair both ways later), but
gives you an idea of the cost of recursively reading lots of small files.
Notice how the latter is about an order-and-a-half slower?
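Without a large repository at hand, the same effect can be approximated with plain files. What follows is a minimal sketch and not something posted in this thread: the helper names make_layouts, read_flat and read_tree are made up for illustration, and it models only the cost of opening many small files, not zlib or pack-files.

```shell
#!/bin/sh
# Hypothetical helper: write the same payload twice, once as many small
# files spread over subdirectories and once as a single flat file.
make_layouts() {
    root=$1 dirs=$2 files=$3
    mkdir -p "$root/tree"
    : > "$root/flat"
    d=0
    while [ "$d" -lt "$dirs" ]; do
        mkdir -p "$root/tree/d$d"
        f=0
        while [ "$f" -lt "$files" ]; do
            printf 'entry %s/%s\n' "$d" "$f" > "$root/tree/d$d/f$f"
            printf 'entry %s/%s\n' "$d" "$f" >> "$root/flat"
            f=$((f + 1))
        done
        d=$((d + 1))
    done
}

# One open() and a sequential read.
read_flat() { cat "$1/flat"; }

# One open() per file, plus the directory walk.
read_tree() { find "$1/tree" -type f -exec cat {} +; }
```

Under bash, running `make_layouts /tmp/flat-vs-tree 200 50` and then `time read_flat /tmp/flat-vs-tree > /dev/null` and `time read_tree /tmp/flat-vs-tree > /dev/null` should show the gap. The exact ratio depends on the filesystem and on cache state.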
Then try it with a cold cache. If you thought the "ls-tree" was unfair
because it used things like zlib and pack-files, realize that the ls-tree
actually almost certainly has _better_ IO patterns than doing individual
files. So when you do the cold-cache case, we're actually giving an unfair
advantage to ls-tree (assuming fully packed repo like I have).
And it still loses in a big big way. If it was actually one file per
directory, it would be _horrible_. You could kind of approximate the IO
patterns for that case by doing

	time find . -name don-t-exist > /dev/null

(which obviously uses "readdir()" instead of read(), but the IO patterns
should be similar).
The point is, the index file absolutely *HAS* to be a single file in order
to perform well in a big project. Otherwise there's no point, and you
might as well just use a git "tree" object for everything.
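To make "single file" concrete: going by the index format documented with git, the file starts with a 12-byte header (the "DIRC" signature, a 32-bit big-endian version number, and a 32-bit big-endian entry count), followed by all entries in one flat, sorted run. A small sketch that prints that header; the function name index_header is hypothetical, and it assumes `head -c` and `od` are available.

```shell
#!/bin/sh
# Hypothetical helper: print signature, version and entry count of an
# index file (defaults to .git/index in the current directory).
index_header() {
    idx=${1:-.git/index}
    # Bytes 0-3: the signature, expected to be "DIRC".
    sig=$(head -c 4 "$idx")
    # Bytes 4-11: two big-endian 32-bit integers, read as 8 decimal bytes.
    set -- $(od -An -tu1 -j4 -N8 "$idx")
    version=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    entries=$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))
    printf '%s version=%s entries=%s\n' "$sig" "$version" "$entries"
}
```

Run as `index_header` from the top of a work tree. Everything git needs to know about the work tree then comes from one sequential read of that file.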
Now, if you talk about the _sorting_ order (as opposed to the flatness of
the file), I could probably agree. The sort order was probably a mistake.
That said, we're stuck with it. You can't change it without changing the
tree object format, so it's not just a "local index file" format issue.
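The coupling can be seen with nothing more than sort. This is a sketch under two assumptions that match git's documented behaviour: the index compares full paths bytewise, and a tree object orders a subdirectory as if its name ended in "/". The file names are made up for illustration.

```shell
#!/bin/sh
# Byte values that matter here: '-' (0x2d) < '.' (0x2e) < '/' (0x2f).

# Flat index order: full paths compared bytewise.
flat_order() { printf '%s\n' foo/bar foo.c foo-x | LC_ALL=C sort; }

# Top-level tree entries if the directory name were compared as plain "foo".
naive_tree_order() { printf '%s\n' foo foo.c foo-x | LC_ALL=C sort; }

# Top-level tree entries with the directory compared as "foo/".
slash_tree_order() { printf '%s\n' foo/ foo.c foo-x | LC_ALL=C sort; }
```

Only slash_tree_order puts the directory in the same position as flat_order does, so walking the tree recursively yields paths in index order. In a real repository, `git ls-files` and `git ls-tree -r --name-only HEAD` should list paths in the same order, which is why one ordering cannot change without the other.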
Linus