git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Willy Tarreau <w@1wt.eu>, "H. Peter Anvin" <hpa@zytor.com>,
	git@vger.kernel.org, nigel@nigel.suspend2.net,
	"J.H." <warthog9@kernel.org>,
	Randy Dunlap <randy.dunlap@oracle.com>,
	Andrew Morton <akpm@osdl.org>, Pavel Machek <pavel@ucw.cz>,
	kernel list <linux-kernel@vger.kernel.org>,
	webmaster@kernel.org
Subject: Re: How git affects kernel.org performance
Date: Sun, 7 Jan 2007 11:13:06 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0701071028450.3661@woody.osdl.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0701070957080.3661@woody.osdl.org>



On Sun, 7 Jan 2007, Linus Torvalds wrote:
> 
> A year or two ago I did a totally half-assed code for the non-hashed 
> readdir that improved performance by an order of magnitude for ext3 for a 
> test-case of mine, but it was subtly buggy and didn't do the hashed case 
> AT ALL.

Btw, this isn't the test-case, but it's a half-way re-creation of 
something like it. It's _really_ stupid, but here's what you can do:

 - compile and run this idiotic program. It creates a directory called 
   "throwaway" that is ~44kB in size, and if I did things right, it should 
   not be totally contiguous on disk with the current ext3 allocation 
   logic.

 - as root, do "echo 3 > /proc/sys/vm/drop_caches" to get a cache-cold 
   schenario.

 - do "time ls throwaway > /dev/null".

I don't know what people consider to be reasonable performance, but for 
me, it takes about half a second to do a simple "ls". NOTE! This is _not_ 
reading inode stat information or anything like that. It literally takes 
0.3-0.4 seconds to read ~44kB off the disk. That's a whopping 125kB/s 
throughput on a reasonably fast modern disk.

That's what we in the industry call  "sad".

And that's on a totally unloaded machine. There was _nothing_ else going 
on. No IO congestion, no nothing. Just the cost of synchronously doing 
ten or eleven disk reads.

The fix?

 - proper read-ahead. Right now, even if the directory is totally 
   contiguous on disk (just remove the thing that writes data to the 
   files, so that you'll have empty files instead of 8kB files), I think 
   we do those reads totally synchronously if the filesystem was mounted 
   with directory hashing enabled.

   Without hashing, the directory will be much smaller too, so readdir() 
   will have less data to read. And it _should_ do some readahead, 
   although in my testing, the best I could do was still 0.185s for a (now 
   shrunken) 28kB directory. 

 - better directory block allocation patterns would likely help a lot, 
   rather than single blocks. That's true even without any read-ahead (at 
   least the disk wouldn't need to seek, and any on-disk track buffers etc 
   would work better), but with read-ahead and contiguous blocks it should 
   be just a couple of IO's (the indirect stuff means that it's more than 
   one), and so you should see much better IO patterns because the 
   elevator can try to help too.

Maybe I just have unrealistic expectations, but I really don't like how a 
fairly small 50kB directory takes an appreciable fraction of a second to 
read.

Once it's cached, it still takes too long, but at least at that point the 
individual getdents calls take just tens of microseconds.

Here's cold-cache numbers (notice: 34 msec for the first one, and 17 msec 
in the middle.. The 5-6ms range indicates a single IO for the intermediate 
ones, which basically says that each call does roughly one IO, except the 
first one that does ~5 (probably the indirect index blocks), and two in 
the middle who are able to fill up the buffer from the IO done by the 
previous one (4kB buffers, so if the previous getdents() happened to just 
read the beginning of a block, the next one might be able to fill 
everything from that block without having to do IO).

	getdents(3, /* 103 entries */, 4096)    = 4088 <0.034830>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.006703>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.006719>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000354>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000017>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.005302>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.016957>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000017>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.003530>
	getdents(3, /* 83 entries */, 4096)     = 3320 <0.000296>
	getdents(3, /* 0 entries */, 4096)      = 0 <0.000006>

Here's the pure CPU overhead: still pretty high (200 usec! For a single 
system call! That's disgusting! In contrast, a 4kB read() call takes 7 
usec on this machine, so the overhead of doing things one dentry at a 
time, and calling down to several layers of filesystem is quite high):

	getdents(3, /* 103 entries */, 4096)    = 4088 <0.000204>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000122>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000112>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000153>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000018>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000103>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000217>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000018>
	getdents(3, /* 102 entries */, 4096)    = 4080 <0.000095>
	getdents(3, /* 83 entries */, 4096)     = 3320 <0.000089>
	getdents(3, /* 0 entries */, 4096)      = 0 <0.000006>

but you can see the difference.. The real cost is obviously the IO.

		Linus

----
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

static char buffer[8192];

static int create_file(const char *name)
{
	int fd = open(name, O_RDWR | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return fd;

	write(fd, buffer, sizeof(buffer));
	close(fd);
	return 0;
}

int main(int argc, char **argv)
{
	int i;
	char name[256];

	/* Fill up the buffer with some random garbage */
	for (i = 0; i < sizeof(buffer); i++)
		buffer[i] = "abcdefghijklmnopqrstuvwxyz\n"[i % 27];

	if (mkdir("throwaway", 0777) < 0 || chdir("throwaway") < 0) {
		perror("throwaway");
		exit(1);
	}

	/*
	 * Create a reasonably big directory by having a number
	 * of files with non-trivial filenames, and with some
	 * real content to fragment the directory blocks..
	 */
	for (i = 0; i < 1000; i++) {
		snprintf(name, sizeof(name),
			"file-name-%d-%d-%d-%d",
			i / 1000,
			(i / 100) % 10,
			(i / 10) % 10,
			(i / 1) % 10);
		create_file(name);
	}
	return 0;
}

  reply	other threads:[~2007-01-07 19:15 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20061214223718.GA3816@elf.ucw.cz>
     [not found] ` <20061216094421.416a271e.randy.dunlap@oracle.com>
     [not found]   ` <20061216095702.3e6f1d1f.akpm@osdl.org>
     [not found]     ` <458434B0.4090506@oracle.com>
     [not found]       ` <1166297434.26330.34.camel@localhost.localdomain>
     [not found]         ` <1166304080.13548.8.camel@nigel.suspend2.net>
     [not found]           ` <459152B1.9040106@zytor.com>
     [not found]             ` <1168140954.2153.1.camel@nigel.suspend2.net>
2007-01-07  4:22               ` [KORG] Re: kernel.org lies about latest -mm kernel Jeff Garzik
2007-01-07  4:29                 ` Linus Torvalds
2007-01-07 20:11                 ` Greg KH
2007-01-07 21:30                   ` H. Peter Anvin
2007-01-07 21:54                   ` Junio C Hamano
2007-01-07 22:21                     ` Jeff Garzik
2007-01-07 22:53                       ` Linus Torvalds
2007-01-07 23:32                         ` Martin Langhoff
     [not found]               ` <45A08269.4050504@zytor.com>
2007-01-07  5:24                 ` How git affects kernel.org performance H. Peter Anvin
2007-01-07  5:39                   ` Linus Torvalds
2007-01-07  8:55                     ` Willy Tarreau
2007-01-07  8:58                       ` H. Peter Anvin
2007-01-07  9:03                         ` Willy Tarreau
2007-01-07 10:28                           ` Christoph Hellwig
2007-01-07 10:52                             ` Willy Tarreau
2007-01-07 18:17                             ` Linus Torvalds
2007-01-07 19:13                               ` Linus Torvalds [this message]
     [not found]                                 ` <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com>
2007-01-07 19:35                                   ` Linus Torvalds
2007-01-07 10:50                           ` Jan Engelhardt
2007-01-07 18:49                             ` Randy Dunlap
2007-01-07 19:07                               ` Jan Engelhardt
2007-01-07 19:28                                 ` Randy Dunlap
2007-01-07 19:37                                   ` Linus Torvalds
2007-01-07  9:15                       ` Andrew Morton
2007-01-07  9:38                         ` Rene Herman
2007-01-08  3:05                         ` Suparna Bhattacharya
2007-01-08 12:58                           ` Theodore Tso
2007-01-08 13:41                             ` Johannes Stezenbach
2007-01-08 13:56                               ` Theodore Tso
2007-01-08 13:59                                 ` Pavel Machek
2007-01-08 14:17                                   ` Theodore Tso
2007-01-08 13:43                             ` Jeff Garzik
2007-01-09  1:09                               ` Paul Jackson
2007-01-09  2:18                                 ` Jeremy Higdon
     [not found]                             ` <20070109075945.GA8799@mail.ustc.edu.cn>
2007-01-09  7:59                               ` Fengguang Wu
2007-01-09  7:59                               ` Fengguang Wu
2007-01-09 16:23                                 ` Linus Torvalds
     [not found]                                   ` <20070110015739.GA26978@mail.ustc.edu.cn>
2007-01-10  1:57                                     ` Fengguang Wu
2007-01-10  1:57                                     ` Fengguang Wu
2007-01-10  1:57                                     ` Fengguang Wu
2007-01-10  3:20                                     ` Nigel Cunningham
     [not found]                                       ` <20070110140730.GA986@mail.ustc.edu.cn>
2007-01-10 14:07                                         ` Fengguang Wu
2007-01-10 14:07                                         ` Fengguang Wu
2007-01-10 14:07                                         ` Fengguang Wu
2007-01-12 10:54                                         ` Nigel Cunningham
2007-01-09  7:59                               ` Fengguang Wu
2007-01-07 14:57                   ` Robert Fitzsimons
2007-01-07 19:12                     ` J.H.
2007-01-08  1:51                     ` Jakub Narebski
2007-01-07 15:06                   ` Krzysztof Halasa
2007-01-07 20:31                     ` Shawn O. Pearce
2007-01-08 14:46                       ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0701071028450.3661@woody.osdl.org \
    --to=torvalds@osdl.org \
    --cc=akpm@osdl.org \
    --cc=git@vger.kernel.org \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nigel@nigel.suspend2.net \
    --cc=pavel@ucw.cz \
    --cc=randy.dunlap@oracle.com \
    --cc=w@1wt.eu \
    --cc=warthog9@kernel.org \
    --cc=webmaster@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).