From: Linus Torvalds <torvalds@osdl.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Willy Tarreau <w@1wt.eu>, "H. Peter Anvin" <hpa@zytor.com>,
git@vger.kernel.org, nigel@nigel.suspend2.net,
"J.H." <warthog9@kernel.org>,
Randy Dunlap <randy.dunlap@oracle.com>,
Andrew Morton <akpm@osdl.org>, Pavel Machek <pavel@ucw.cz>,
kernel list <linux-kernel@vger.kernel.org>,
webmaster@kernel.org
Subject: Re: How git affects kernel.org performance
Date: Sun, 7 Jan 2007 11:13:06 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0701071028450.3661@woody.osdl.org>
In-Reply-To: <Pine.LNX.4.64.0701070957080.3661@woody.osdl.org>
On Sun, 7 Jan 2007, Linus Torvalds wrote:
>
> A year or two ago I did a totally half-assed code for the non-hashed
> readdir that improved performance by an order of magnitude for ext3 for a
> test-case of mine, but it was subtly buggy and didn't do the hashed case
> AT ALL.
Btw, this isn't the test-case, but it's a half-way re-creation of
something like it. It's _really_ stupid, but here's what you can do:
- compile and run this idiotic program. It creates a directory called
"throwaway" that is ~44kB in size, and if I did things right, it should
not be totally contiguous on disk with the current ext3 allocation
logic.
- as root, do "echo 3 > /proc/sys/vm/drop_caches" to get a cache-cold
scenario.
- do "time ls throwaway > /dev/null".
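If you want to time just the directory-reading part of that "ls" from inside a program, something like the following works. It is a minimal sketch, not the original test-case: the helper name time_readdir() is made up for illustration, and it only walks the directory with opendir()/readdir(), with no sorting or output. It does not drop caches itself, so run the drop_caches step above first to get a cold-cache number.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <time.h>

/*
 * Time one full readdir() pass over a directory (hypothetical helper).
 * Stores the number of entries seen, including "." and "..", in *count.
 * Returns elapsed wall-clock seconds, or -1.0 if the directory can't be
 * opened.
 */
static double time_readdir(const char *path, long *count)
{
	struct timespec start, end;
	DIR *dir;

	*count = 0;
	clock_gettime(CLOCK_MONOTONIC, &start);
	dir = opendir(path);
	if (!dir)
		return -1.0;
	while (readdir(dir) != NULL)
		(*count)++;
	closedir(dir);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_readdir("throwaway", &n) right after dropping caches gives the cold-cache figure; calling it a second time gives the cached one.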
I don't know what people consider to be reasonable performance, but for
me, it takes about half a second to do a simple "ls". NOTE! This is _not_
reading inode stat information or anything like that. It literally takes
0.3-0.4 seconds to read ~44kB off the disk. That's a whopping 125kB/s
throughput on a reasonably fast modern disk.
That's what we in the industry call "sad".
And that's on a totally unloaded machine. There was _nothing_ else going
on. No IO congestion, no nothing. Just the cost of synchronously doing
ten or eleven disk reads.
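The arithmetic behind "sad" is easy to redo. This is only a calculator over the numbers quoted above (~44kB, 0.3-0.4 seconds, ten or eleven reads), taking 0.35 s as an assumed midpoint; the function names are made up. 44kB / 0.35 s is roughly 126kB/s, and 0.35 s spread over 11 synchronous reads is roughly 32 ms per read, far more than it takes to actually transfer a 4kB block, so nearly all of that time is waiting on individual requests.

```c
/* Throughput in kB/s for a given amount of data and elapsed time. */
static double throughput_kb_per_s(double kb, double seconds)
{
	return kb / seconds;
}

/* Average milliseconds per request if the reads were fully serialized. */
static double ms_per_read(double seconds, int reads)
{
	return seconds * 1000.0 / reads;
}
```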
The fix?
- proper read-ahead. Right now, even if the directory is totally
contiguous on disk (just remove the thing that writes data to the
files, so that you'll have empty files instead of 8kB files), I think
we do those reads totally synchronously if the filesystem was mounted
with directory hashing enabled.
Without hashing, the directory will be much smaller too, so readdir()
will have less data to read. And it _should_ do some readahead,
although in my testing, the best I could do was still 0.185s for a (now
shrunken) 28kB directory.
- better directory block allocation patterns would likely help a lot:
allocate directory blocks in larger contiguous chunks rather than one
block at a time. That's true even without any read-ahead (at least the
disk wouldn't need to seek, and any on-disk track buffers etc. would
work better), but with read-ahead and contiguous blocks it should be
just a couple of IOs (the indirect stuff means that it's more than
one), so you should see much better IO patterns because the elevator
can try to help too.
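Why both fixes matter can be shown with a toy model. This is not ext3 code: count_requests() and its "window" parameter are hypothetical, and the model ignores the indirect/index blocks mentioned above. It counts how many separate disk requests a sequential walk over the directory blocks needs when each request may pull in the missing block plus any physically adjacent following blocks, up to a read-ahead window.

```c
#include <stddef.h>

/*
 * blocks[i] is the on-disk block number of the i-th directory block.
 * window is the read-ahead size in blocks; window == 1 means no
 * read-ahead at all. Returns the number of separate requests issued.
 */
static size_t count_requests(const long *blocks, size_t n, size_t window)
{
	size_t requests = 0;
	size_t i = 0;

	while (i < n) {
		size_t run = 1;

		/* Grow this request while the next directory block is
		 * physically adjacent and the window is not yet full. */
		while (run < window && i + run < n &&
		       blocks[i + run] == blocks[i + run - 1] + 1)
			run++;

		requests++;
		i += run;
	}
	return requests;
}
```

With eleven contiguous blocks, no read-ahead gives eleven requests and a 16-block window gives one; with eleven scattered blocks the window does not help at all and you are back to eleven. Read-ahead only pays off once the allocation is contiguous.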
Maybe I just have unrealistic expectations, but I really don't like how a
fairly small 50kB directory takes an appreciable fraction of a second to
read.
Once it's cached, it still takes too long, but at least at that point the
individual getdents calls take just tens of microseconds.
Here are the cold-cache numbers. Notice: 34 msec for the first call, and
17 msec for one in the middle. The 5-6ms range indicates a single IO for
the intermediate ones, which basically says that each call does roughly
one IO, except the first one, which does ~5 (probably the indirect index
blocks), and two in the middle that are able to fill up the buffer from
the IO done by the previous one (4kB buffers, so if the previous
getdents() happened to read just the beginning of a block, the next one
might be able to fill everything from that block without having to do IO):
getdents(3, /* 103 entries */, 4096) = 4088 <0.034830>
getdents(3, /* 102 entries */, 4096) = 4080 <0.006703>
getdents(3, /* 102 entries */, 4096) = 4080 <0.006719>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000354>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000017>
getdents(3, /* 102 entries */, 4096) = 4080 <0.005302>
getdents(3, /* 102 entries */, 4096) = 4080 <0.016957>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000017>
getdents(3, /* 102 entries */, 4096) = 4080 <0.003530>
getdents(3, /* 83 entries */, 4096) = 3320 <0.000296>
getdents(3, /* 0 entries */, 4096) = 0 <0.000006>
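Per-call numbers like these can also be collected without strace, by issuing the system call directly with the same 4096-byte buffer. This is a Linux-only sketch: trace_getdents() is a made-up name, and it uses getdents64 rather than the getdents shown in the trace, since plain getdents does not exist on every architecture.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/*
 * Read a directory with raw getdents64 calls into a 4096-byte buffer
 * and print the byte count and wall-clock time of each call. Returns
 * the number of calls made (including the final one returning 0), or
 * -1 on error.
 */
static int trace_getdents(const char *path)
{
	char buf[4096];
	int calls = 0;
	long nread;
	int fd = open(path, O_RDONLY | O_DIRECTORY);

	if (fd < 0)
		return -1;
	do {
		double start = now_sec();

		nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));
		double elapsed = now_sec() - start;

		if (nread < 0) {
			close(fd);
			return -1;
		}
		calls++;
		printf("getdents64(%d, ..., %zu) = %ld <%.6f>\n",
		       fd, sizeof(buf), nread, elapsed);
	} while (nread > 0);
	close(fd);
	return calls;
}
```

Running trace_getdents("throwaway") once after drop_caches and once more afterwards should give the cold and warm columns respectively, although the exact timings will of course depend on the disk and machine.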
Here's the warm-cache run, i.e. the pure CPU overhead: still pretty high
(200 usec! For a single system call! That's disgusting! In contrast, a 4kB
read() call takes 7 usec on this machine, so the overhead of doing things
one dentry at a time, and calling down through several layers of
filesystem code, is quite high):
getdents(3, /* 103 entries */, 4096) = 4088 <0.000204>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000122>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000112>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000153>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000018>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000103>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000217>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000018>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000095>
getdents(3, /* 83 entries */, 4096) = 3320 <0.000089>
getdents(3, /* 0 entries */, 4096) = 0 <0.000006>
but you can see the difference. The real cost is obviously the IO.
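For the read() side of that comparison, a rough sketch that averages cached 4kB reads is below. The helper name avg_read_4k() is made up; it uses pread() at offset 0 so every call hits the same already-cached page, and the 7 usec figure above is specific to the machine in this mail, so expect different absolute numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Average wall-clock seconds per 4kB pread() at offset 0 of an open
 * file. The first (untimed) read warms the page cache, so the timed
 * loop approximates syscall-plus-copy overhead. Returns -1.0 on error.
 */
static double avg_read_4k(int fd, int iterations)
{
	char buf[4096];
	struct timespec a, b;

	if (iterations <= 0 || pread(fd, buf, sizeof(buf), 0) < 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (int i = 0; i < iterations; i++)
		if (pread(fd, buf, sizeof(buf), 0) < 0)
			return -1.0;
	clock_gettime(CLOCK_MONOTONIC, &b);

	return ((b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9)
	       / iterations;
}
```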
Linus
----
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

static char buffer[8192];

/* Create (or truncate) a file and fill it with 8kB of data. */
static int create_file(const char *name)
{
	ssize_t written;
	int fd = open(name, O_RDWR | O_CREAT | O_TRUNC, 0666);

	if (fd < 0)
		return fd;
	/*
	 * Real file data gets allocated in between the directory
	 * blocks. Drop this write() to get empty files instead.
	 */
	written = write(fd, buffer, sizeof(buffer));
	if (close(fd) < 0 || written != (ssize_t)sizeof(buffer))
		return -1;
	return 0;
}

int main(void)
{
	int i;
	char name[256];

	/* Fill up the buffer with some random garbage */
	for (i = 0; i < (int)sizeof(buffer); i++)
		buffer[i] = "abcdefghijklmnopqrstuvwxyz\n"[i % 27];

	if (mkdir("throwaway", 0777) < 0 || chdir("throwaway") < 0) {
		perror("throwaway");
		exit(1);
	}

	/*
	 * Create a reasonably big directory by having a number
	 * of files with non-trivial filenames, and with some
	 * real content to fragment the directory blocks..
	 */
	for (i = 0; i < 1000; i++) {
		snprintf(name, sizeof(name),
			 "file-name-%d-%d-%d-%d",
			 i / 1000,
			 (i / 100) % 10,
			 (i / 10) % 10,
			 (i / 1) % 10);
		if (create_file(name) < 0) {
			perror(name);
			exit(1);
		}
	}
	return 0;
}