* Re: [KORG] Re: kernel.org lies about latest -mm kernel [not found] ` <1168140954.2153.1.camel@nigel.suspend2.net> @ 2007-01-07 4:22 ` Jeff Garzik 2007-01-07 4:29 ` Linus Torvalds 2007-01-07 20:11 ` Greg KH [not found] ` <45A08269.4050504@zytor.com> 1 sibling, 2 replies; 52+ messages in thread From: Jeff Garzik @ 2007-01-07 4:22 UTC (permalink / raw) To: nigel, H. Peter Anvin, Andrew Morton, Greg KH, Linus Torvalds Cc: J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, Git Mailing List > On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: >> Not really. In fact, it would hardly help at all. >> >> The two things git users can do to help are: >> >> 1. Make sure your alternatives file is set up correctly; >> 2. Keep your trees packed and pruned, to keep the file count down. >> >> If you do this, the load imposed by a single git tree is fairly negligible. Would kernel hackers be amenable to having their trees auto-repacked, and linked via alternatives to Linus's linux-2.6.git? Looking through kernel.org, we have a ton of repositories, however packed, that are carrying their own copies of the linux-2.6.git repo. Also, I wonder if "git push" will push only the non-linux-2.6.git objects, if both local and remote sides have the proper alternatives set up? Jeff ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 4:22 ` [KORG] Re: kernel.org lies about latest -mm kernel Jeff Garzik @ 2007-01-07 4:29 ` Linus Torvalds 2007-01-07 20:11 ` Greg KH 1 sibling, 0 replies; 52+ messages in thread From: Linus Torvalds @ 2007-01-07 4:29 UTC (permalink / raw) To: Jeff Garzik Cc: nigel, H. Peter Anvin, Andrew Morton, Greg KH, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, Git Mailing List On Sat, 6 Jan 2007, Jeff Garzik wrote: > > Also, I wonder if "git push" will push only the non-linux-2.6.git objects, if > both local and remote sides have the proper alternatives set up? Yes. Linus ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 4:22 ` [KORG] Re: kernel.org lies about latest -mm kernel Jeff Garzik 2007-01-07 4:29 ` Linus Torvalds @ 2007-01-07 20:11 ` Greg KH 2007-01-07 21:30 ` H. Peter Anvin 2007-01-07 21:54 ` Junio C Hamano 1 sibling, 2 replies; 52+ messages in thread From: Greg KH @ 2007-01-07 20:11 UTC (permalink / raw) To: Jeff Garzik Cc: nigel, H. Peter Anvin, Andrew Morton, Linus Torvalds, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, Git Mailing List On Sat, Jan 06, 2007 at 11:22:31PM -0500, Jeff Garzik wrote: > >On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: > >>Not really. In fact, it would hardly help at all. > >> > >>The two things git users can do to help are: > >> > >>1. Make sure your alternatives file is set up correctly; > >>2. Keep your trees packed and pruned, to keep the file count down. > >> > >>If you do this, the load imposed by a single git tree is fairly negligible. > > > Would kernel hackers be amenable to having their trees auto-repacked, > and linked via alternatives to Linus's linux-2.6.git? > > Looking through kernel.org, we have a ton of repositories, however > packed, that are carrying their own copies of the linux-2.6.git repo. Well, I create my repos by doing a: git clone -l --bare which makes a hardlink from Linus's tree. But then it gets copied over to the public server, which probably severs that hardlink :( Any shortcut to clone or set up a repo using "alternatives" so that we don't have this issue at all? thanks, greg k-h ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 20:11 ` Greg KH @ 2007-01-07 21:30 ` H. Peter Anvin 2007-01-07 21:54 ` Junio C Hamano 1 sibling, 0 replies; 52+ messages in thread From: H. Peter Anvin @ 2007-01-07 21:30 UTC (permalink / raw) To: Greg KH Cc: Jeff Garzik, nigel, Andrew Morton, Linus Torvalds, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, Git Mailing List Greg KH wrote: > > Well, I create my repos by doing a: > git clone -l --bare > which makes a hardlink from Linus's tree. > > But then it gets copied over to the public server, which probably severs > that hardlink :( > > Any shortcut to clone or set up a repo using "alternatives" so that we > don't have this issue at all? > Use the -s option to git clone. -hpa ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 20:11 ` Greg KH 2007-01-07 21:30 ` H. Peter Anvin @ 2007-01-07 21:54 ` Junio C Hamano 2007-01-07 22:21 ` Jeff Garzik 1 sibling, 1 reply; 52+ messages in thread From: Junio C Hamano @ 2007-01-07 21:54 UTC (permalink / raw) To: Greg KH; +Cc: git Greg KH <gregkh@suse.de> writes: > Any shortcut to clone or set up a repo using "alternatives" so that we > don't have this issue at all? "clone -l -s" has been there for quite a long time (since mid Aug 2005). Because -s implies -l since end of November 2005, you should be able to say git clone --bare -s ..../torvalds/linux-2.6.git stable-queue.git ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 21:54 ` Junio C Hamano @ 2007-01-07 22:21 ` Jeff Garzik 2007-01-07 22:53 ` Linus Torvalds 0 siblings, 1 reply; 52+ messages in thread From: Jeff Garzik @ 2007-01-07 22:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: Greg KH, git Junio C Hamano wrote: > Greg KH <gregkh@suse.de> writes: > >> Any shortcut to clone or set up a repo using "alternatives" so that we >> don't have this issue at all? > > "clone -l -s" has been there for quite a long time (since mid Aug > 2005). Because -s implies -l since end of November 2005, you > should be able to say > > git clone --bare -s ..../torvalds/linux-2.6.git stable-queue.git Yes but what about existing trees? Can you add an alternatives file, then prune, and get the same result as if you had done a clone -s ? Jeff ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 22:21 ` Jeff Garzik @ 2007-01-07 22:53 ` Linus Torvalds 2007-01-07 23:32 ` Martin Langhoff 0 siblings, 1 reply; 52+ messages in thread From: Linus Torvalds @ 2007-01-07 22:53 UTC (permalink / raw) To: Jeff Garzik; +Cc: Junio C Hamano, Greg KH, git On Sun, 7 Jan 2007, Jeff Garzik wrote: > > Yes but what about existing trees? > > Can you add an alternatives file, then prune, and get the same result as if > you had done a clone -s ? Yes. Also do git repack -a -d -l where the "-l" flag is the magic (it says to repack only objects that aren't already packed in the alternate repository) Linus ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 22:53 ` Linus Torvalds @ 2007-01-07 23:32 ` Martin Langhoff 0 siblings, 0 replies; 52+ messages in thread From: Martin Langhoff @ 2007-01-07 23:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff Garzik, Junio C Hamano, Greg KH, git On 1/8/07, Linus Torvalds <torvalds@osdl.org> wrote: > On Sun, 7 Jan 2007, Jeff Garzik wrote: > > Yes but what about existing trees? > > Can you add an alternatives file, then prune, and get the same result as if > > you had done a clone -s ? > Yes. Also do > git repack -a -d -l > > where the "-l" flag is the magic (it says to repack only objects that > aren't already packed in the alternate repository) If all kernel.org repos get git-repack -a -d -l and git-pack-refs, gitweb will see a significant speedup, as some up-to-date checks become extremely cheap. cheers martin ^ permalink raw reply [flat|nested] 52+ messages in thread
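The steps discussed in this sub-thread for an existing tree (add an alternates file, repack with -l, prune, pack the refs) could be sketched roughly as follows. This is only a sketch under assumptions: share_objects is a hypothetical helper name, both repositories are assumed to be bare and given as absolute paths, and the file the thread calls the "alternatives file" is taken to be objects/info/alternates.

```shell
#!/bin/sh
# Sketch only: make an existing bare repository borrow objects from a
# reference repository, then drop the local copies it no longer needs.
# share_objects is a hypothetical helper, not a git command.

share_objects() {
    repo=$1        # existing bare repository (absolute path, assumed)
    reference=$2   # bare repository to borrow from (absolute path, assumed)

    # One object-directory path per line; git looks in these stores
    # when an object is not found locally.
    mkdir -p "$repo/objects/info" || return 1
    printf '%s\n' "$reference/objects" >> "$repo/objects/info/alternates" || return 1

    # -a: repack everything into one pack, -d: delete redundant packs,
    # -l: leave out objects that are borrowed from the alternate.
    git --git-dir="$repo" repack -a -d -l || return 1

    # Remove unreachable loose objects, then pack the refs as well.
    git --git-dir="$repo" prune || return 1
    git --git-dir="$repo" pack-refs --all || return 1
}

# Usage (paths are placeholders):
#   share_objects /path/to/stable-queue.git /path/to/linux-2.6.git
```

Once a tree borrows objects this way it depends on the reference repository, so objects must not be pruned or deleted there while borrowers still need them.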
[parent not found: <45A08269.4050504@zytor.com>]
* How git affects kernel.org performance [not found] ` <45A08269.4050504@zytor.com> @ 2007-01-07 5:24 ` H. Peter Anvin 2007-01-07 5:39 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 52+ messages in thread From: H. Peter Anvin @ 2007-01-07 5:24 UTC (permalink / raw) To: H. Peter Anvin, git Cc: nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Some more data on how git affects kernel.org... During extremely high load, it appears that what slows kernel.org down more than anything else is the time that each individual getdents() call takes. When I've looked at this I've observed times from 200 ms to almost 2 seconds! Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly packed tree, you can do the math yourself. I have tried reducing vm.vfs_cache_pressure down to 1 on the kernel.org machines in order to improve the situation, but even at that point it appears the kernel doesn't readily hold the entire directory hierarchy in memory, even though there is space to do so. I have suggested that we might want to add a sysctl to change the denominator from the default 100. The one thing that we need done locally is to have a smart uploader, instead of relying on rsync. That, unfortunately, is a fairly sizable project. -hpa ^ permalink raw reply [flat|nested] 52+ messages in thread
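The "256 directories" above presumably refers to the two-hex-digit fan-out directories (objects/00 through objects/ff) that hold loose objects. A rough way to spot trees that still carry them, as a sketch only: count_fanout_dirs is a hypothetical helper, and its argument is assumed to be a repository's objects directory.

```shell
#!/bin/sh
# Sketch only: count the loose-object fan-out directories (00..ff) in a
# git object store. A cleanly packed and pruned tree should report 0;
# an unpacked one can report up to 256.
# count_fanout_dirs is a hypothetical helper name.

count_fanout_dirs() {
    objects_dir=$1
    count=0
    # Only two-character hex names match, so "pack" and "info" are skipped.
    for d in "$objects_dir"/[0-9a-f][0-9a-f]; do
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Usage (path is a placeholder):
#   count_fanout_dirs /path/to/some-tree.git/objects
```

Every such directory is one more getdents() target for anything that walks the tree, which is why the count is a reasonable first thing to check.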
* Re: How git affects kernel.org performance 2007-01-07 5:24 ` How git affects kernel.org performance H. Peter Anvin @ 2007-01-07 5:39 ` Linus Torvalds 2007-01-07 8:55 ` Willy Tarreau 2007-01-07 14:57 ` Robert Fitzsimons 2007-01-07 15:06 ` Krzysztof Halasa 2 siblings, 1 reply; 52+ messages in thread From: Linus Torvalds @ 2007-01-07 5:39 UTC (permalink / raw) To: H. Peter Anvin Cc: git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sat, 6 Jan 2007, H. Peter Anvin wrote: > > During extremely high load, it appears that what slows kernel.org down more > than anything else is the time that each individual getdents() call takes. > When I've looked this I've observed times from 200 ms to almost 2 seconds! > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly > packed tree, you can do the math yourself. "getdents()" is totally serialized by the inode semaphore. It's one of the most expensive system calls in Linux, partly because of that, and partly because it has to call all the way down into the filesystem in a way that almost no other common system call has to (99% of all filesystem calls can be handled basically at the VFS layer with generic caches - but not getdents()). So if there are concurrent readdirs on the same directory, they get serialized. If there is any file creation/deletion activity in the directory, it serializes getdents(). To make matters worse, I don't think it has any read-ahead at all when you use hashed directory entries. So if you have cold-cache case, you'll read every single block totally individually, and serialized. One block at a time (I think the non-hashed case is likely also suspect, but that's a separate issue) In other words, I'm not at all surprised it hits on filldir time. Especially on ext3. Linus ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 5:39 ` Linus Torvalds @ 2007-01-07 8:55 ` Willy Tarreau 2007-01-07 8:58 ` H. Peter Anvin 2007-01-07 9:15 ` Andrew Morton 0 siblings, 2 replies; 52+ messages in thread From: Willy Tarreau @ 2007-01-07 8:55 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote: > > > On Sat, 6 Jan 2007, H. Peter Anvin wrote: > > > > During extremely high load, it appears that what slows kernel.org down more > > than anything else is the time that each individual getdents() call takes. > > When I've looked this I've observed times from 200 ms to almost 2 seconds! > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly > > packed tree, you can do the math yourself. > > "getdents()" is totally serialized by the inode semaphore. It's one of the > most expensive system calls in Linux, partly because of that, and partly > because it has to call all the way down into the filesystem in a way that > almost no other common system call has to (99% of all filesystem calls can > be handled basically at the VFS layer with generic caches - but not > getdents()). > > So if there are concurrent readdirs on the same directory, they get > serialized. If there is any file creation/deletion activity in the > directory, it serializes getdents(). > > To make matters worse, I don't think it has any read-ahead at all when you > use hashed directory entries. So if you have cold-cache case, you'll read > every single block totally individually, and serialized. One block at a > time (I think the non-hashed case is likely also suspect, but that's a > separate issue) > > In other words, I'm not at all surprised it hits on filldir time. > Especially on ext3. At work, we had the same problem on a file server with ext3. 
We use rsync to make backups to a local IDE disk, and we noticed that getdents() took about the same time as Peter reports (0.2 to 2 seconds), especially in maildir directories. We tried many things to fix it with no result, including enabling dirindexes. Finally, we made a full backup, and switched over to XFS and the problem totally disappeared. So it seems that the filesystem matters a lot here when there are lots of entries in a directory, and that ext3 is not suitable for usages with thousands of entries in directories with millions of files on disk. I'm not certain it would be that easy to try other filesystems on kernel.org though :-/ Willy ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 8:55 ` Willy Tarreau @ 2007-01-07 8:58 ` H. Peter Anvin 2007-01-07 9:03 ` Willy Tarreau 2007-01-07 9:15 ` Andrew Morton 1 sibling, 1 reply; 52+ messages in thread From: H. Peter Anvin @ 2007-01-07 8:58 UTC (permalink / raw) To: Willy Tarreau Cc: Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Willy Tarreau wrote: > > At work, we had the same problem on a file server with ext3. We use rsync > to make backups to a local IDE disk, and we noticed that getdents() took > about the same time as Peter reports (0.2 to 2 seconds), especially in > maildir directories. We tried many things to fix it with no result, > including enabling dirindexes. Finally, we made a full backup, and switched > over to XFS and the problem totally disappeared. So it seems that the > filesystem matters a lot here when there are lots of entries in a > directory, and that ext3 is not suitable for usages with thousands > of entries in directories with millions of files on disk. I'm not > certain it would be that easy to try other filesystems on kernel.org > though :-/ > Changing filesystems would mean about a week of downtime for a server. It's painful, but it's doable; however, if we get a traffic spike during that time it'll hurt like hell. However, if there are credible reasons to believe XFS will help, I'd be inclined to try it out. -hpa ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 8:58 ` H. Peter Anvin @ 2007-01-07 9:03 ` Willy Tarreau 2007-01-07 10:28 ` Christoph Hellwig 2007-01-07 10:50 ` Jan Engelhardt 0 siblings, 2 replies; 52+ messages in thread From: Willy Tarreau @ 2007-01-07 9:03 UTC (permalink / raw) To: H. Peter Anvin Cc: Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: > Willy Tarreau wrote: > > > >At work, we had the same problem on a file server with ext3. We use rsync > >to make backups to a local IDE disk, and we noticed that getdents() took > >about the same time as Peter reports (0.2 to 2 seconds), especially in > >maildir directories. We tried many things to fix it with no result, > >including enabling dirindexes. Finally, we made a full backup, and switched > >over to XFS and the problem totally disappeared. So it seems that the > >filesystem matters a lot here when there are lots of entries in a > >directory, and that ext3 is not suitable for usages with thousands > >of entries in directories with millions of files on disk. I'm not > >certain it would be that easy to try other filesystems on kernel.org > >though :-/ > > > > Changing filesystems would mean about a week of downtime for a server. > It's painful, but it's doable; however, if we get a traffic spike during > that time it'll hurt like hell. > > However, if there is credible reasons to believe XFS will help, I'd be > inclined to try it out. The problem is that I have no sufficient FS knowledge to argument why it helps here. It was a desperate attempt to fix the problem for us and it definitely worked well. Hmmm I'm thinking about something very dirty : would it be possible to reduce the current FS size to get more space to create another FS ? 
Supposing you create a XX GB/TB XFS after the current ext3, you would be able to mount it in some directories with --bind and slowly switch some parts to it. The problem with this approach is that it will never be 100% converted, but as an experiment it might be worth it, no ? Willy ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 9:03 ` Willy Tarreau @ 2007-01-07 10:28 ` Christoph Hellwig 2007-01-07 10:52 ` Willy Tarreau 2007-01-07 18:17 ` Linus Torvalds 2007-01-07 10:50 ` Jan Engelhardt 1 sibling, 2 replies; 52+ messages in thread From: Christoph Hellwig @ 2007-01-07 10:28 UTC (permalink / raw) To: Willy Tarreau Cc: H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote: > The problem is that I have no sufficient FS knowledge to argument why > it helps here. It was a desperate attempt to fix the problem for us > and it definitely worked well. XFS does rather efficient btree directories, and it does sophisticated readahead for directories. I suspect that's what is helping you there. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 10:28 ` Christoph Hellwig @ 2007-01-07 10:52 ` Willy Tarreau 2007-01-07 18:17 ` Linus Torvalds 1 sibling, 0 replies; 52+ messages in thread From: Willy Tarreau @ 2007-01-07 10:52 UTC (permalink / raw) To: Christoph Hellwig, H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, Jan 07, 2007 at 10:28:53AM +0000, Christoph Hellwig wrote: > On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote: > > The problem is that I have no sufficient FS knowledge to argument why > > it helps here. It was a desperate attempt to fix the problem for us > > and it definitely worked well. > > XFS does rather efficient btree directories, and it does sophisticated > readahead for directories. I suspect that's what is helping you there. Ok. Do you too think it might help (or even solve) the problem on kernel.org ? Willy ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 10:28 ` Christoph Hellwig 2007-01-07 10:52 ` Willy Tarreau @ 2007-01-07 18:17 ` Linus Torvalds 2007-01-07 19:13 ` Linus Torvalds 1 sibling, 1 reply; 52+ messages in thread From: Linus Torvalds @ 2007-01-07 18:17 UTC (permalink / raw) To: Christoph Hellwig Cc: Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007, Christoph Hellwig wrote: > > On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote: > > The problem is that I have no sufficient FS knowledge to argument why > > it helps here. It was a desperate attempt to fix the problem for us > > and it definitely worked well. > > XFS does rather efficient btree directories, and it does sophisticated > readahead for directories. I suspect that's what is helping you there. The sad part is that this is a long-standing issue, and the directory reading code in ext3 really _should_ be able to do ok. A year or two ago I did a totally half-assed code for the non-hashed readdir that improved performance by an order of magnitude for ext3 for a test-case of mine, but it was subtly buggy and didn't do the hashed case AT ALL. Andrew fixed it up so that it at least wasn't subtly buggy any more, but in the process it also lost all capability of doing fragmented directories (so it doesn't help very much any more under exactly the situation that is the worst case), and it still doesn't do the hashed directory case. It's my personal pet peeve with ext3 (as Andrew can attest). And it's really sad, because I don't think it is fundamental per se, but the way the directory handling and jbd are done, it's apparently very hard to fix. (It's clearly not _impossible_ to do: I think that it should be possible to treat ext3 directories the same way we treat files, except they would always be in "data=journal" mode. 
But I understand ext2, not ext3 (and absolutely not jbd), so I'm not going to be able to do anything about it personally). Anyway, I think that disabling hashing can actually help. And I suspect that even with hashing enabled, there should be some quick hack for making the directory reading at least be able to do multiple outstanding reads in parallel, instead of reading the blocks totally synchronously ("read five blocks, then wait for the one we care" rather than the current "read one block at a time, wait for it, read the next one, wait for it.." situation). Linus ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 18:17 ` Linus Torvalds @ 2007-01-07 19:13 ` Linus Torvalds [not found] ` <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com> 0 siblings, 1 reply; 52+ messages in thread From: Linus Torvalds @ 2007-01-07 19:13 UTC (permalink / raw) To: Christoph Hellwig Cc: Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007, Linus Torvalds wrote: > > A year or two ago I did a totally half-assed code for the non-hashed > readdir that improved performance by an order of magnitude for ext3 for a > test-case of mine, but it was subtly buggy and didn't do the hashed case > AT ALL. Btw, this isn't the test-case, but it's a half-way re-creation of something like it. It's _really_ stupid, but here's what you can do: - compile and run this idiotic program. It creates a directory called "throwaway" that is ~44kB in size, and if I did things right, it should not be totally contiguous on disk with the current ext3 allocation logic. - as root, do "echo 3 > /proc/sys/vm/drop_caches" to get a cache-cold scenario. - do "time ls throwaway > /dev/null". I don't know what people consider to be reasonable performance, but for me, it takes about half a second to do a simple "ls". NOTE! This is _not_ reading inode stat information or anything like that. It literally takes 0.3-0.4 seconds to read ~44kB off the disk. That's a whopping 125kB/s throughput on a reasonably fast modern disk. That's what we in the industry call "sad". And that's on a totally unloaded machine. There was _nothing_ else going on. No IO congestion, no nothing. Just the cost of synchronously doing ten or eleven disk reads. The fix? - proper read-ahead. 
Right now, even if the directory is totally contiguous on disk (just remove the thing that writes data to the files, so that you'll have empty files instead of 8kB files), I think we do those reads totally synchronously if the filesystem was mounted with directory hashing enabled. Without hashing, the directory will be much smaller too, so readdir() will have less data to read. And it _should_ do some readahead, although in my testing, the best I could do was still 0.185s for a (now shrunken) 28kB directory. - better directory block allocation patterns would likely help a lot, rather than single blocks. That's true even without any read-ahead (at least the disk wouldn't need to seek, and any on-disk track buffers etc would work better), but with read-ahead and contiguous blocks it should be just a couple of IO's (the indirect stuff means that it's more than one), and so you should see much better IO patterns because the elevator can try to help too. Maybe I just have unrealistic expectations, but I really don't like how a fairly small 50kB directory takes an appreciable fraction of a second to read. Once it's cached, it still takes too long, but at least at that point the individual getdents calls take just tens of microseconds. Here's cold-cache numbers (notice: 34 msec for the first one, and 17 msec in the middle.. The 5-6ms range indicates a single IO for the intermediate ones, which basically says that each call does roughly one IO, except the first one that does ~5 (probably the indirect index blocks), and two in the middle who are able to fill up the buffer from the IO done by the previous one (4kB buffers, so if the previous getdents() happened to just read the beginning of a block, the next one might be able to fill everything from that block without having to do IO). 
getdents(3, /* 103 entries */, 4096) = 4088 <0.034830>
getdents(3, /* 102 entries */, 4096) = 4080 <0.006703>
getdents(3, /* 102 entries */, 4096) = 4080 <0.006719>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000354>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000017>
getdents(3, /* 102 entries */, 4096) = 4080 <0.005302>
getdents(3, /* 102 entries */, 4096) = 4080 <0.016957>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000017>
getdents(3, /* 102 entries */, 4096) = 4080 <0.003530>
getdents(3, /* 83 entries */, 4096) = 3320 <0.000296>
getdents(3, /* 0 entries */, 4096) = 0 <0.000006>

Here's the pure CPU overhead: still pretty high (200 usec! For a single system call! That's disgusting! In contrast, a 4kB read() call takes 7 usec on this machine, so the overhead of doing things one dentry at a time, and calling down to several layers of filesystem is quite high):

getdents(3, /* 103 entries */, 4096) = 4088 <0.000204>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000122>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000112>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000153>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000018>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000103>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000217>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000018>
getdents(3, /* 102 entries */, 4096) = 4080 <0.000095>
getdents(3, /* 83 entries */, 4096) = 3320 <0.000089>
getdents(3, /* 0 entries */, 4096) = 0 <0.000006>

but you can see the difference.. The real cost is obviously the IO.
Linus

----
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

static char buffer[8192];

static int create_file(const char *name)
{
	int fd = open(name, O_RDWR | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return fd;
	write(fd, buffer, sizeof(buffer));
	close(fd);
	return 0;
}

int main(int argc, char **argv)
{
	int i;
	char name[256];

	/* Fill up the buffer with some random garbage */
	for (i = 0; i < sizeof(buffer); i++)
		buffer[i] = "abcdefghijklmnopqrstuvwxyz\n"[i % 27];

	if (mkdir("throwaway", 0777) < 0 || chdir("throwaway") < 0) {
		perror("throwaway");
		exit(1);
	}

	/*
	 * Create a reasonably big directory by having a number
	 * of files with non-trivial filenames, and with some
	 * real content to fragment the directory blocks..
	 */
	for (i = 0; i < 1000; i++) {
		snprintf(name, sizeof(name),
			"file-name-%d-%d-%d-%d",
			i / 1000,
			(i / 100) % 10,
			(i / 10) % 10,
			(i / 1) % 10);
		create_file(name);
	}
	return 0;
}

^ permalink raw reply [flat|nested] 52+ messages in thread
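For readers without a compiler at hand, the same experiment can be approximated in shell. This is a sketch, not Linus's test case: make_throwaway is a hypothetical helper that mimics the C program above (1000 files named file-name-N-N-N-N with 8kB of filler each), and the measurement commands in the trailing comment assume root access and that the per-call timings quoted above came from something like "strace -T".

```shell
#!/bin/sh
# Sketch only: shell approximation of the C test program above.
# make_throwaway is a hypothetical helper name.

make_throwaway() {
    dir=${1:-throwaway}
    mkdir "$dir" || return 1

    # 8192 bytes of filler (spaces), built once with printf.
    payload=$(printf '%8192s' '')

    i=0
    while [ "$i" -lt 1000 ]; do
        # Same naming scheme as the C program: one digit per field.
        name="file-name-$((i / 1000))-$((i / 100 % 10))-$((i / 10 % 10))-$((i % 10))"
        printf '%s' "$payload" > "$dir/$name" || return 1
        i=$((i + 1))
    done
}

# Measurement, to be run by hand (dropping caches needs root):
#   make_throwaway throwaway
#   sync; echo 3 > /proc/sys/vm/drop_caches
#   time ls throwaway > /dev/null
#   strace -T ls throwaway 2>&1 >/dev/null | grep getdents
```

Running the last two commands once right after dropping caches and once again immediately afterwards should give the cold-cache and cached pictures respectively; the absolute numbers will depend on the filesystem and disk.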
[parent not found: <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com>]
* Re: How git affects kernel.org performance [not found] ` <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com> @ 2007-01-07 19:35 ` Linus Torvalds 0 siblings, 0 replies; 52+ messages in thread From: Linus Torvalds @ 2007-01-07 19:35 UTC (permalink / raw) To: Jon Smirl Cc: Christoph Hellwig, Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007, Jon Smirl wrote: > > > > - proper read-ahead. Right now, even if the directory is totally > > contiguous on disk (just remove the thing that writes data to the > > files, so that you'll have empty files instead of 8kB files), I think > > we do those reads totally synchronously if the filesystem was mounted > > with directory hashing enabled. > > What's the status on the Adaptive Read-ahead patch from Wu Fengguang > <wfg@mail.ustc.edu.cn> ? That patch really helped with read ahead > problems I was having with mmap. It was in mm forever and I've lost > track of it. Won't help. ext3 does NO readahead at all. It doesn't use the general VFS helper routines to read data (because it doesn't use the page cache), it just does the raw buffer-head IO directly. (In the non-indexed case, it does do some read-ahead, and it uses the generic routines for it, but because it does everything by physical address, even the generic routines will decide that it's just doing random reading if the directory isn't physically contiguous - and stop reading ahead). (I may have missed some case where it does do read-ahead in the index routines, so don't take my word as being unquestionably true. I'm _fairly_ sure, but..) Linus ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 9:03 ` Willy Tarreau 2007-01-07 10:28 ` Christoph Hellwig @ 2007-01-07 10:50 ` Jan Engelhardt 2007-01-07 18:49 ` Randy Dunlap 1 sibling, 1 reply; 52+ messages in thread From: Jan Engelhardt @ 2007-01-07 10:50 UTC (permalink / raw) To: Willy Tarreau Cc: H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Jan 7 2007 10:03, Willy Tarreau wrote: >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: >> >[..] >> >entries in directories with millions of files on disk. I'm not >> >certain it would be that easy to try other filesystems on >> >kernel.org though :-/ >> >> Changing filesystems would mean about a week of downtime for a server. >> It's painful, but it's doable; however, if we get a traffic spike during >> that time it'll hurt like hell. Then make sure no one releases a kernel ;-) >> However, if there is credible reasons to believe XFS will help, I'd be >> inclined to try it out. > >Hmmm I'm thinking about something very dirty : would it be possible >to reduce the current FS size to get more space to create another >FS ? Supposing you create a XX GB/TB XFS after the current ext3, >you would be able to mount it in some directories with --bind and >slowly switch some parts to it. The problem with this approach is >that it will never be 100% converted, but as an experiment it might >be worth it, no ? Much better: rsync from /oldfs to /newfs, stop all ftp uploads, rsync again to catch any new files that have been added until the ftp upload was closed, then do _one_ (technically two) mountpoint moves (as opposed to Willy's idea of "some directories") in a mere second along the lines of mount --move /oldfs /older; mount --move /newfs /oldfs. let old transfers that still use files in /older complete (lsof or fuser -m), then disconnect the old volume. 
In case /newfs (now /oldfs) is a volume you borrowed from someone and need to return it, well, I guess you need to rsync back somehow. -`J' -- ^ permalink raw reply [flat|nested] 52+ messages in thread
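Jan's cut-over sequence, written out as a dry run. A sketch only: print_migration_plan is a hypothetical helper that merely prints the commands rather than running them, /oldfs, /newfs and /older are the mount points named above, the "rsync -a" flags and the final umount are assumptions, and how uploads get stopped is site-specific and left as a comment line.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the cut-over steps described above.
# print_migration_plan is a hypothetical helper name.

print_migration_plan() {
    old=${1:-/oldfs}      # current filesystem mount point
    new=${2:-/newfs}      # new filesystem, already created and mounted
    parked=${3:-/older}   # where the old filesystem is parked afterwards

    cat <<EOF
# 1. bulk copy while the service is still live
rsync -a $old/ $new/
# 2. stop all uploads here (site-specific)
# 3. catch files added since the first pass
rsync -a $old/ $new/
# 4. swap the mount points
mount --move $old $parked
mount --move $new $old
# 5. wait until nothing uses the old volume any more, then detach it
fuser -m $parked
umount $parked
EOF
}

print_migration_plan
```

Printing the plan first makes it easy to review the order of the two mount moves before anything is run as root.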
* Re: How git affects kernel.org performance
  2007-01-07 10:50 ` Jan Engelhardt
@ 2007-01-07 18:49 ` Randy Dunlap
  2007-01-07 19:07   ` Jan Engelhardt
  0 siblings, 1 reply; 52+ messages in thread
From: Randy Dunlap @ 2007-01-07 18:49 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Willy Tarreau, H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster

On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote:

>
> On Jan 7 2007 10:03, Willy Tarreau wrote:
> >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
> >> >[..]
> >> >entries in directories with millions of files on disk. I'm not
> >> >certain it would be that easy to try other filesystems on
> >> >kernel.org though :-/
> >>
> >> Changing filesystems would mean about a week of downtime for a server.
> >> It's painful, but it's doable; however, if we get a traffic spike during
> >> that time it'll hurt like hell.
>
> Then make sure no one releases a kernel ;-)

maybe the week of LCA ?

> >> However, if there are credible reasons to believe XFS will help, I'd be
> >> inclined to try it out.
> >
> >Hmmm I'm thinking about something very dirty : would it be possible
> >to reduce the current FS size to get more space to create another
> >FS ? Supposing you create a XX GB/TB XFS after the current ext3,
> >you would be able to mount it in some directories with --bind and
> >slowly switch some parts to it. The problem with this approach is
> >that it will never be 100% converted, but as an experiment it might
> >be worth it, no ?
>
> Much better: rsync from /oldfs to /newfs, stop all ftp uploads, rsync
> again to catch any new files that have been added until the ftp
> upload was closed, then do _one_ (technically two) mountpoint moves
> (as opposed to Willy's idea of "some directories") in a mere second
> along the lines of
>
>   mount --move /oldfs /older; mount --move /newfs /oldfs.
>
> let old transfers that still use files in /older complete (lsof or
> fuser -m), then disconnect the old volume. In case /newfs (now
> /oldfs) is a volume you borrowed from someone and need to return it,
> well, I guess you need to rsync back somehow.

---
~Randy

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 18:49 ` Randy Dunlap @ 2007-01-07 19:07 ` Jan Engelhardt 2007-01-07 19:28 ` Randy Dunlap 0 siblings, 1 reply; 52+ messages in thread From: Jan Engelhardt @ 2007-01-07 19:07 UTC (permalink / raw) To: Randy Dunlap Cc: Willy Tarreau, H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster On Jan 7 2007 10:49, Randy Dunlap wrote: >On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote: >> On Jan 7 2007 10:03, Willy Tarreau wrote: >> >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: >> >> >[..] >> >> >entries in directories with millions of files on disk. I'm not >> >> >certain it would be that easy to try other filesystems on >> >> >kernel.org though :-/ >> >> >> >> Changing filesystems would mean about a week of downtime for a server. >> >> It's painful, but it's doable; however, if we get a traffic spike during >> >> that time it'll hurt like hell. >> >> Then make sure noone releases a kernel ;-) > >maybe the week of LCA ? I don't know that acronym, but if you ask me when it should happen: _Before_ the next big thing is released, e.g. before 2.6.20-final. Reason: You never know how long they're chewing [downloading] on 2.6.20. Excluding other projects on kernel.org from my hypothesis, I'd suppose the lowest bandwidth usage the longer no new files have been released. (Because everyone has them then more or less.) -`J' -- ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 19:07 ` Jan Engelhardt @ 2007-01-07 19:28 ` Randy Dunlap 2007-01-07 19:37 ` Linus Torvalds 0 siblings, 1 reply; 52+ messages in thread From: Randy Dunlap @ 2007-01-07 19:28 UTC (permalink / raw) To: Jan Engelhardt Cc: Willy Tarreau, H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007 20:07:43 +0100 (MET) Jan Engelhardt wrote: > > On Jan 7 2007 10:49, Randy Dunlap wrote: > >On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote: > >> On Jan 7 2007 10:03, Willy Tarreau wrote: > >> >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: > >> >> >[..] > >> >> >entries in directories with millions of files on disk. I'm not > >> >> >certain it would be that easy to try other filesystems on > >> >> >kernel.org though :-/ > >> >> > >> >> Changing filesystems would mean about a week of downtime for a server. > >> >> It's painful, but it's doable; however, if we get a traffic spike during > >> >> that time it'll hurt like hell. > >> > >> Then make sure noone releases a kernel ;-) > > > >maybe the week of LCA ? Sorry, it means Linux.conf.au (Australia): http://lca2007.linux.org.au/ Jan. 15-20, 2007 > I don't know that acronym, but if you ask me when it should happen: > _Before_ the next big thing is released, e.g. before 2.6.20-final. > Reason: You never know how long they're chewing [downloading] on 2.6.20. > Excluding other projects on kernel.org from my hypothesis, I'd suppose the > lowest bandwidth usage the longer no new files have been released. (Because > everyone has them then more or less.) ISTM that Linus is trying to make 2.6.20-final before LCA. We'll see. --- ~Randy ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance
  2007-01-07 19:28 ` Randy Dunlap
@ 2007-01-07 19:37 ` Linus Torvalds
  0 siblings, 0 replies; 52+ messages in thread
From: Linus Torvalds @ 2007-01-07 19:37 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Jan Engelhardt, Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster

On Sun, 7 Jan 2007, Randy Dunlap wrote:
>
> ISTM that Linus is trying to make 2.6.20-final before LCA. We'll see.

No. Hopefully "final -rc" before LCA, but I'll do the actual 2.6.20 release
afterwards. I don't want to have a merge window during LCA, as I and many
others will all be out anyway. So it's much better to have LCA happen
during the end of the stabilization phase when there's hopefully not a lot
going on.

(Of course, often at the end of the stabilization phase there is all the
"ok, what about regression XyZ?" panic)

		Linus

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance
  2007-01-07  8:55 ` Willy Tarreau
  2007-01-07  8:58   ` H. Peter Anvin
@ 2007-01-07  9:15   ` Andrew Morton
  2007-01-07  9:38     ` Rene Herman
  2007-01-08  3:05     ` Suparna Bhattacharya
  1 sibling, 2 replies; 52+ messages in thread
From: Andrew Morton @ 2007-01-07 9:15 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org

On Sun, 7 Jan 2007 09:55:26 +0100
Willy Tarreau <w@1wt.eu> wrote:

> On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
> >
> >
> > On Sat, 6 Jan 2007, H. Peter Anvin wrote:
> > >
> > > During extremely high load, it appears that what slows kernel.org down more
> > > than anything else is the time that each individual getdents() call takes.
> > > When I've looked at this I've observed times from 200 ms to almost 2 seconds!
> > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
> > > packed tree, you can do the math yourself.
> >
> > "getdents()" is totally serialized by the inode semaphore. It's one of the
> > most expensive system calls in Linux, partly because of that, and partly
> > because it has to call all the way down into the filesystem in a way that
> > almost no other common system call has to (99% of all filesystem calls can
> > be handled basically at the VFS layer with generic caches - but not
> > getdents()).
> >
> > So if there are concurrent readdirs on the same directory, they get
> > serialized. If there is any file creation/deletion activity in the
> > directory, it serializes getdents().
> >
> > To make matters worse, I don't think it has any read-ahead at all when you
> > use hashed directory entries. So if you have cold-cache case, you'll read
> > every single block totally individually, and serialized. One block at a
> > time (I think the non-hashed case is likely also suspect, but that's a
> > separate issue)
> >
> > In other words, I'm not at all surprised it hits on filldir time.
> > Especially on ext3.
>
> At work, we had the same problem on a file server with ext3. We use rsync
> to make backups to a local IDE disk, and we noticed that getdents() took
> about the same time as Peter reports (0.2 to 2 seconds), especially in
> maildir directories. We tried many things to fix it with no result,
> including enabling dirindexes. Finally, we made a full backup, and switched
> over to XFS and the problem totally disappeared. So it seems that the
> filesystem matters a lot here when there are lots of entries in a
> directory, and that ext3 is not suitable for usages with thousands
> of entries in directories with millions of files on disk. I'm not
> certain it would be that easy to try other filesystems on kernel.org
> though :-/
>

Yeah, slowly-growing directories will get splattered all over the disk.

Possible short-term fixes would be to just allocate up to (say) eight
blocks when we grow a directory by one block. Or teach the
directory-growth code to use ext3 reservations.

Longer-term people are talking about things like on-disk reservations.
But I expect directories are being forgotten about in all of that.

^ permalink raw reply [flat|nested] 52+ messages in thread
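Andrew's "allocate up to eight blocks" idea can be illustrated with a toy model. The sketch below is a simulation under loud assumptions and is not ext3's allocator: a hypothetical linear allocator always hands out the next free block, and unrelated files take a fixed number of blocks after each directory growth event. The function name `directory_extents` and its parameters are invented for this illustration.

```python
def directory_extents(dir_blocks, chunk, other_per_growth=1):
    """Return how many contiguous extents a directory ends up in.

    Toy model (NOT ext3): blocks are handed out linearly. Each time the
    directory runs out of space it grabs `chunk` blocks at once, after
    which `other_per_growth` blocks go to unrelated files.
    Assumes dir_blocks >= 1.
    """
    next_free = 0   # next unallocated block number
    reserved = []   # blocks grabbed for the directory but not yet used
    owned = []      # directory blocks, in logical order

    while len(owned) < dir_blocks:
        if not reserved:
            reserved = list(range(next_free, next_free + chunk))
            next_free += chunk
            next_free += other_per_growth  # competing allocations
        owned.append(reserved.pop(0))

    # A new extent starts wherever two consecutive logical blocks are
    # not physically adjacent.
    breaks = sum(1 for a, b in zip(owned, owned[1:]) if b != a + 1)
    return breaks + 1


if __name__ == "__main__":
    for chunk in (1, 8):
        print("chunk", chunk, "->", directory_extents(64, chunk), "extents")
```

In this model a 64-block directory grown one block at a time lands in 64 separate extents, while growing it eight blocks at a time leaves 8. Real allocators are far less adversarial, so the numbers only show the direction of the effect.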
* Re: How git affects kernel.org performance 2007-01-07 9:15 ` Andrew Morton @ 2007-01-07 9:38 ` Rene Herman 2007-01-08 3:05 ` Suparna Bhattacharya 1 sibling, 0 replies; 52+ messages in thread From: Rene Herman @ 2007-01-07 9:38 UTC (permalink / raw) To: Andrew Morton Cc: Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org On 01/07/2007 10:15 AM, Andrew Morton wrote: > Yeah, slowly-growing directories will get splattered all over the > disk. > > Possible short-term fixes would be to just allocate up to (say) eight > blocks when we grow a directory by one block. Or teach the > directory-growth code to use ext3 reservations. > > Longer-term people are talking about things like on-disk > rerservations. But I expect directories are being forgotten about in > all of that. I wish people would just talk about de2fsrag... ;-\ Rene ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 9:15 ` Andrew Morton 2007-01-07 9:38 ` Rene Herman @ 2007-01-08 3:05 ` Suparna Bhattacharya 2007-01-08 12:58 ` Theodore Tso 1 sibling, 1 reply; 52+ messages in thread From: Suparna Bhattacharya @ 2007-01-08 3:05 UTC (permalink / raw) To: Andrew Morton Cc: Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org On Sun, Jan 07, 2007 at 01:15:42AM -0800, Andrew Morton wrote: > On Sun, 7 Jan 2007 09:55:26 +0100 > Willy Tarreau <w@1wt.eu> wrote: > > > On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote: > > > > > > > > > On Sat, 6 Jan 2007, H. Peter Anvin wrote: > > > > > > > > During extremely high load, it appears that what slows kernel.org down more > > > > than anything else is the time that each individual getdents() call takes. > > > > When I've looked this I've observed times from 200 ms to almost 2 seconds! > > > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly > > > > packed tree, you can do the math yourself. > > > > > > "getdents()" is totally serialized by the inode semaphore. It's one of the > > > most expensive system calls in Linux, partly because of that, and partly > > > because it has to call all the way down into the filesystem in a way that > > > almost no other common system call has to (99% of all filesystem calls can > > > be handled basically at the VFS layer with generic caches - but not > > > getdents()). > > > > > > So if there are concurrent readdirs on the same directory, they get > > > serialized. If there is any file creation/deletion activity in the > > > directory, it serializes getdents(). > > > > > > To make matters worse, I don't think it has any read-ahead at all when you > > > use hashed directory entries. So if you have cold-cache case, you'll read > > > every single block totally individually, and serialized. 
One block at a > > > time (I think the non-hashed case is likely also suspect, but that's a > > > separate issue) > > > > > > In other words, I'm not at all surprised it hits on filldir time. > > > Especially on ext3. > > > > At work, we had the same problem on a file server with ext3. We use rsync > > to make backups to a local IDE disk, and we noticed that getdents() took > > about the same time as Peter reports (0.2 to 2 seconds), especially in > > maildir directories. We tried many things to fix it with no result, > > including enabling dirindexes. Finally, we made a full backup, and switched > > over to XFS and the problem totally disappeared. So it seems that the > > filesystem matters a lot here when there are lots of entries in a > > directory, and that ext3 is not suitable for usages with thousands > > of entries in directories with millions of files on disk. I'm not > > certain it would be that easy to try other filesystems on kernel.org > > though :-/ > > > > Yeah, slowly-growing directories will get splattered all over the disk. > > Possible short-term fixes would be to just allocate up to (say) eight > blocks when we grow a directory by one block. Or teach the > directory-growth code to use ext3 reservations. > > Longer-term people are talking about things like on-disk rerservations. > But I expect directories are being forgotten about in all of that. By on-disk reservations, do you mean persistent file preallocation ? (that is explicit preallocation of blocks to a given file) If so, you are right, we haven't really given any thought to the possibility of directories needing that feature. Regards Suparna > > - > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Suparna Bhattacharya (suparna@in.ibm.com) Linux Technology Center IBM Software Lab, India ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance
  2007-01-08  3:05 ` Suparna Bhattacharya
@ 2007-01-08 12:58 ` Theodore Tso
  2007-01-08 13:41   ` Johannes Stezenbach
  ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Theodore Tso @ 2007-01-08 12:58 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org

On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > Yeah, slowly-growing directories will get splattered all over the disk.
> >
> > Possible short-term fixes would be to just allocate up to (say) eight
> > blocks when we grow a directory by one block. Or teach the
> > directory-growth code to use ext3 reservations.
> >
> > Longer-term people are talking about things like on-disk reservations.
> > But I expect directories are being forgotten about in all of that.
>
> By on-disk reservations, do you mean persistent file preallocation ? (that
> is explicit preallocation of blocks to a given file) If so, you are
> right, we haven't really given any thought to the possibility of directories
> needing that feature.

The fastest and probably most important thing to add is some readahead
smarts to directories --- both to the htree and non-htree cases. If
you're using some kind of b-tree structure, such as XFS does for
directories, preallocation doesn't help you much. Delayed allocation
can save you if your delayed allocator knows how to structure disk
blocks so that a btree-traversal is efficient, but I'm guessing the
biggest reason why we are losing is because we don't have sufficient
readahead. This also has the advantage that it will help without
needing to do a backup/restore to improve layout.

Allocating some number of empty blocks when we grow the directory
would be a quick hack that I'd probably do as a 2nd priority. It
won't help pre-existing directories, but combined with readahead
logic, should help us out greatly in the non-btree case.

						- Ted

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 12:58 ` Theodore Tso @ 2007-01-08 13:41 ` Johannes Stezenbach 2007-01-08 13:56 ` Theodore Tso 2007-01-08 13:43 ` Jeff Garzik [not found] ` <20070109075945.GA8799@mail.ustc.edu.cn> 2 siblings, 1 reply; 52+ messages in thread From: Johannes Stezenbach @ 2007-01-08 13:41 UTC (permalink / raw) To: Theodore Tso Cc: Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote: > > The fastest and probably most important thing to add is some readahead > smarts to directories --- both to the htree and non-htree cases. If > you're using some kind of b-tree structure, such as XFS does for > directories, preallocation doesn't help you much. Delayed allocation > can save you if your delayed allocator knows how to structure disk > blocks so that a btree-traversal is efficient, but I'm guessing the > biggest reason why we are losing is because we don't have sufficient > readahead. This also has the advantage that it will help without > needing to doing a backup/restore to improve layout. Would e2fsck -D help? What kind of optimization does it perform? Thanks, Johannes ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance
  2007-01-08 13:41 ` Johannes Stezenbach
@ 2007-01-08 13:56 ` Theodore Tso
  2007-01-08 13:59   ` Pavel Machek
  0 siblings, 1 reply; 52+ messages in thread
From: Theodore Tso @ 2007-01-08 13:56 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org

On Mon, Jan 08, 2007 at 02:41:47PM +0100, Johannes Stezenbach wrote:
>
> Would e2fsck -D help? What kind of optimization
> does it perform?

It will help a little; e2fsck -D compresses the logical view of the
directory, but it doesn't optimize the physical layout on disk at all,
and of course, it won't help with the lack of readahead logic. It's
possible to improve how e2fsck -D works; at the moment, it's not
trying to make the directory be contiguous on disk. What it should
probably do is to pull a list of all of the blocks used by the
directory, sort them, and then try to see if it can improve on the
list by allocating some new blocks that would make the directory more
contiguous on disk. I suspect any improvements that would be seen by
doing this would be second order effects at most, though.

						- Ted

^ permalink raw reply [flat|nested] 52+ messages in thread
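The improvement Ted describes for e2fsck -D (collect the directory's blocks, sort them, then look for new blocks that would be more contiguous) can be sketched in a few lines. This is only an illustration of the idea, not code from e2fsprogs: the helper names `count_extents` and `find_free_run`, and the representation of the free-block map as a plain set of block numbers, are assumptions made for the example.

```python
def count_extents(blocks):
    """Number of physically contiguous runs in a directory's block list."""
    ordered = sorted(blocks)
    if not ordered:
        return 0
    breaks = sum(1 for a, b in zip(ordered, ordered[1:]) if b != a + 1)
    return breaks + 1


def find_free_run(free_blocks, length):
    """Return the first block of the lowest run of `length` consecutive
    free blocks, or None if no such run exists."""
    run_start = None
    run_len = 0
    prev = None
    for block in sorted(set(free_blocks)):
        if prev is not None and block == prev + 1:
            run_len += 1
        else:
            run_start, run_len = block, 1   # a new run begins here
        if run_len >= length:
            return run_start
        prev = block
    return None


if __name__ == "__main__":
    dir_blocks = [40, 12, 13, 90]          # hypothetical directory blocks
    free = {100, 101, 102, 103, 200}       # hypothetical free-block map
    print(count_extents(dir_blocks), find_free_run(free, len(dir_blocks)))
```

If `find_free_run` returns a start block, relocating the directory there would bring it down to a single extent. Whether that is worth doing is exactly the "second order effects" question Ted raises.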
* Re: How git affects kernel.org performance 2007-01-08 13:56 ` Theodore Tso @ 2007-01-08 13:59 ` Pavel Machek 2007-01-08 14:17 ` Theodore Tso 0 siblings, 1 reply; 52+ messages in thread From: Pavel Machek @ 2007-01-08 13:59 UTC (permalink / raw) To: Theodore Tso, Johannes Stezenbach, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, kernel list, webmaster, linux-ext4@vger.kernel.org Hi! > > Would e2fsck -D help? What kind of optimization > > does it perform? > > It will help a little; e2fsck -D compresses the logical view of the > directory, but it doesn't optimize the physical layout on disk at all, > and of course, it won't help with the lack of readahead logic. It's > possible to improve how e2fsck -D works, at the moment, it's not > trying to make the directory be contiguous on disk. What it should > probably do is to pull a list of all of the blocks used by the > directory, sort them, and then try to see if it can improve on the > list by allocating some new blocks that would make the directory more > contiguous on disk. I suspect any improvements that would be seen by > doing this would be second order effects at most, though. ...sounds like a job for e2defrag, not e2fsck... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 13:59 ` Pavel Machek @ 2007-01-08 14:17 ` Theodore Tso 0 siblings, 0 replies; 52+ messages in thread From: Theodore Tso @ 2007-01-08 14:17 UTC (permalink / raw) To: Pavel Machek Cc: Johannes Stezenbach, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, kernel list, webmaster, linux-ext4@vger.kernel.org On Mon, Jan 08, 2007 at 02:59:52PM +0100, Pavel Machek wrote: > Hi! > > > > Would e2fsck -D help? What kind of optimization > > > does it perform? > > > > It will help a little; e2fsck -D compresses the logical view of the > > directory, but it doesn't optimize the physical layout on disk at all, > > and of course, it won't help with the lack of readahead logic. It's > > possible to improve how e2fsck -D works, at the moment, it's not > > trying to make the directory be contiguous on disk. What it should > > probably do is to pull a list of all of the blocks used by the > > directory, sort them, and then try to see if it can improve on the > > list by allocating some new blocks that would make the directory more > > contiguous on disk. I suspect any improvements that would be seen by > > doing this would be second order effects at most, though. > > ...sounds like a job for e2defrag, not e2fsck... I wasn't proposing to move other data blocks around in order make the directory be contiguous, but just a "quick and dirty" try to make things better. But yes, in order to really fix layout issues you would have to do a full defrag, and it's probably more important that we try to fix things so that defragmentation runs aren't necessary in the first place.... - Ted ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 12:58 ` Theodore Tso 2007-01-08 13:41 ` Johannes Stezenbach @ 2007-01-08 13:43 ` Jeff Garzik 2007-01-09 1:09 ` Paul Jackson [not found] ` <20070109075945.GA8799@mail.ustc.edu.cn> 2 siblings, 1 reply; 52+ messages in thread From: Jeff Garzik @ 2007-01-08 13:43 UTC (permalink / raw) To: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org Theodore Tso wrote: > The fastest and probably most important thing to add is some readahead > smarts to directories --- both to the htree and non-htree cases. If > you're using some kind of b-tree structure, such as XFS does for > directories, preallocation doesn't help you much. Delayed allocation > can save you if your delayed allocator knows how to structure disk > blocks so that a btree-traversal is efficient, but I'm guessing the > biggest reason why we are losing is because we don't have sufficient > readahead. This also has the advantage that it will help without > needing to doing a backup/restore to improve layout. Something I just thought of: ATA and SCSI hard disks do their own read-ahead. Seeking all over the place to pick up bits of directory will hurt even more with the disk reading and throwing away data (albeit in its internal elevator and cache). Jeff ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 13:43 ` Jeff Garzik @ 2007-01-09 1:09 ` Paul Jackson 2007-01-09 2:18 ` Jeremy Higdon 0 siblings, 1 reply; 52+ messages in thread From: Paul Jackson @ 2007-01-09 1:09 UTC (permalink / raw) To: Jeff Garzik Cc: tytso, suparna, akpm, w, torvalds, hpa, git, nigel, warthog9, randy.dunlap, pavel, linux-kernel, webmaster, linux-ext4 Jeff wrote: > Something I just thought of: ATA and SCSI hard disks do their own > read-ahead. Probably this is wishful thinking on my part, but I would have hoped that most of the read-ahead they did was for stuff that happened to be on the cylinder they were reading anyway. So long as their read-ahead doesn't cause much extra or delayed disk head motion, what does it matter? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance 2007-01-09 1:09 ` Paul Jackson @ 2007-01-09 2:18 ` Jeremy Higdon 0 siblings, 0 replies; 52+ messages in thread From: Jeremy Higdon @ 2007-01-09 2:18 UTC (permalink / raw) To: Paul Jackson Cc: Jeff Garzik, tytso, suparna, akpm, w, torvalds, hpa, git, nigel, warthog9, randy.dunlap, pavel, linux-kernel, webmaster, linux-ext4 On Mon, Jan 08, 2007 at 05:09:34PM -0800, Paul Jackson wrote: > Jeff wrote: > > Something I just thought of: ATA and SCSI hard disks do their own > > read-ahead. > > Probably this is wishful thinking on my part, but I would have hoped > that most of the read-ahead they did was for stuff that happened to be > on the cylinder they were reading anyway. So long as their read-ahead > doesn't cause much extra or delayed disk head motion, what does it > matter? And they usually won't readahead if there is another command to process, though they can be set up to read unrequested data in spite of outstanding commands. When they are reading ahead, they'll only fetch LBAs beyond the last request until a buffer fills or the readahead gets interrupted. jeremy ^ permalink raw reply [flat|nested] 52+ messages in thread
[parent not found: <20070109075945.GA8799@mail.ustc.edu.cn>]
* Re: How git affects kernel.org performance
  [not found] ` <20070109075945.GA8799@mail.ustc.edu.cn>
@ 2007-01-09  7:59 ` Fengguang Wu
  2007-01-09  7:59 ` Fengguang Wu
  2007-01-09  7:59 ` Fengguang Wu
  2 siblings, 0 replies; 52+ messages in thread
From: Fengguang Wu @ 2007-01-09 7:59 UTC (permalink / raw)
  To: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org

On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
> On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > > Yeah, slowly-growing directories will get splattered all over the disk.
> > >
> > > Possible short-term fixes would be to just allocate up to (say) eight
> > > blocks when we grow a directory by one block. Or teach the
> > > directory-growth code to use ext3 reservations.
> > >
> > > Longer-term people are talking about things like on-disk reservations.
> > > But I expect directories are being forgotten about in all of that.
> >
> > By on-disk reservations, do you mean persistent file preallocation ? (that
> > is explicit preallocation of blocks to a given file) If so, you are
> > right, we haven't really given any thought to the possibility of directories
> > needing that feature.
>
> The fastest and probably most important thing to add is some readahead
> smarts to directories --- both to the htree and non-htree cases. If

Here is a quick hack to practice the directory readahead idea.
Comments are welcome, it's a freshman's work :)

Regards,
Wu

---
 fs/ext3/dir.c   |   22 ++++++++++++++++++++++
 fs/ext3/inode.c |    2 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,25 @@ int ext3_check_dir_entry (const char * f
 	return error_msg == NULL ? 1 : 0;
 }
 
+int ext3_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh_result, int create);
+
+static void ext3_dir_readahead(struct file * filp)
+{
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+	unsigned long sector;
+	unsigned long blk;
+	pgoff_t offset;
+
+	for (blk = 0; blk < inode->i_blocks; blk++) {
+		sector = blk << (inode->i_blkbits - 9);
+		sector = generic_block_bmap(inode->i_mapping, sector, ext3_get_block);
+		offset = sector >> (PAGE_CACHE_SHIFT - 9);
+		do_page_cache_readahead(mapping, filp, offset, 1);
+	}
+}
+
 static int ext3_readdir(struct file * filp,
 			 void * dirent, filldir_t filldir)
 {
@@ -108,6 +127,9 @@ static int ext3_readdir(struct file * fi
 
 	sb = inode->i_sb;
 
+	if (!filp->f_pos)
+		ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
 	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
 				    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();

^ permalink raw reply [flat|nested] 52+ messages in thread
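The two shift expressions in ext3_dir_readahead() are unit conversions between filesystem blocks, 512-byte sectors and page-cache pages. The sketch below works through just that arithmetic; it is not a review of the patch and does not model generic_block_bmap(). The function names are invented for the example, and the default page shift of 12 (4 KiB pages) is an assumption about the machine.

```python
SECTOR_SHIFT = 9  # 512-byte sectors: the constant 9 used in the patch


def block_to_sector(block, blkbits):
    """First 512-byte sector of filesystem block `block`,
    where the block size is 2**blkbits bytes."""
    return block << (blkbits - SECTOR_SHIFT)


def sector_to_page_index(sector, page_shift=12):
    """Index of the page (2**page_shift bytes) that contains `sector`."""
    return sector >> (page_shift - SECTOR_SHIFT)


if __name__ == "__main__":
    # 4 KiB blocks, 4 KiB pages: block 10 -> sector 80 -> page 10
    print(block_to_sector(10, 12), sector_to_page_index(80))
    # 1 KiB blocks, 4 KiB pages: block 10 -> sector 20 -> page 2
    print(block_to_sector(10, 10), sector_to_page_index(20))
```

With 1 KiB blocks, four consecutive blocks share one page, so a per-block loop would ask for the same page index several times.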
* Re: How git affects kernel.org performance
  2007-01-09  7:59 ` Fengguang Wu
@ 2007-01-09 16:23 ` Linus Torvalds
  [not found]       ` <20070110015739.GA26978@mail.ustc.edu.cn>
  0 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2007-01-09 16:23 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org

On Tue, 9 Jan 2007, Fengguang Wu wrote:
> >
> > The fastest and probably most important thing to add is some readahead
> > smarts to directories --- both to the htree and non-htree cases. If
>
> Here is a quick hack to practice the directory readahead idea.
> Comments are welcome, it's a freshman's work :)

Well, I'd probably have done it differently, but more important is whether
this actually makes a difference performance-wise. Have you benchmarked it
at all?

Doing an

	echo 3 > /proc/sys/vm/drop_caches

is your friend for testing things like this, to force cold-cache
behaviour..

		Linus

^ permalink raw reply [flat|nested] 52+ messages in thread
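The cold-cache test Linus suggests can be wrapped in a small harness. What follows is a minimal sketch, assuming Linux and root privileges for the cache drop; the helper names `drop_caches` and `time_tree_walk` are made up here, and the walk only approximates what `find` does.

```python
import os
import sys
import time


def drop_caches():
    """Ask the kernel to drop clean page cache, dentries and inodes.

    Equivalent to `echo 3 > /proc/sys/vm/drop_caches`; Linux only and
    needs root. Dirty data is flushed first so more of it can be dropped.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def time_tree_walk(root):
    """Walk `root` recursively; return (entries_seen, elapsed_seconds).

    Counts the files and subdirectories below `root`, not `root` itself.
    """
    start = time.perf_counter()
    entries = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
    return entries, time.perf_counter() - start


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    if os.geteuid() == 0:
        drop_caches()          # cold-cache run
    else:
        print("not root: caches not dropped, this is a warm-cache run")
    count, seconds = time_tree_walk(root)
    print(f"{count} entries in {seconds:.3f}s")
```

Running it once as root and once more without dropping caches gives a cold and a warm number for the same tree, which is the comparison that matters for a readahead change.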
[parent not found: <20070110015739.GA26978@mail.ustc.edu.cn>]
* Re: How git affects kernel.org performance
  [not found] ` <20070110015739.GA26978@mail.ustc.edu.cn>
@ 2007-01-10  1:57 ` Fengguang Wu
  2007-01-10  1:57 ` Fengguang Wu
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Fengguang Wu @ 2007-01-10 1:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau,
	H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek,
	kernel list, webmaster, linux-ext4@vger.kernel.org

On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
>
>
> On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > >
> > > The fastest and probably most important thing to add is some readahead
> > > smarts to directories --- both to the htree and non-htree cases. If
> >
> > Here is a quick hack to practice the directory readahead idea.
> > Comments are welcome, it's a freshman's work :)
>
> Well, I'd probably have done it differently, but more important is whether
> this actually makes a difference performance-wise. Have you benchmarked it
> at all?

Yes, a trivial test shows a marginal improvement, on a minimal debian system:

# find / | wc -l
13641

# time find / > /dev/null

real    0m10.000s
user    0m0.210s
sys     0m4.370s

# time find / > /dev/null

real    0m9.890s
user    0m0.160s
sys     0m3.270s

> Doing an
>
>	echo 3 > /proc/sys/vm/drop_caches
>
> is your friend for testing things like this, to force cold-cache
> behaviour..

Thanks, I'll work out numbers on large/concurrent dir accesses soon.

Regards,
Wu

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: How git affects kernel.org performance
  [not found] ` <20070110015739.GA26978@mail.ustc.edu.cn>
  ` (2 preceding siblings ...)
  2007-01-10  1:57 ` Fengguang Wu
@ 2007-01-10  3:20 ` Nigel Cunningham
  [not found] ` <20070110140730.GA986@mail.ustc.edu.cn>
  3 siblings, 1 reply; 52+ messages in thread
From: Nigel Cunningham @ 2007-01-10 3:20 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Linus Torvalds, Theodore Tso, Suparna Bhattacharya, Andrew Morton,
	Willy Tarreau, H. Peter Anvin, git, J.H., Randy Dunlap,
	Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org

Hi.

On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote:
> On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
> >
> >
> > On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > > >
> > > > The fastest and probably most important thing to add is some readahead
> > > > smarts to directories --- both to the htree and non-htree cases. If
> > >
> > > Here is a quick hack to practice the directory readahead idea.
> > > Comments are welcome, it's a freshman's work :)
> >
> > Well, I'd probably have done it differently, but more important is whether
> > this actually makes a difference performance-wise. Have you benchmarked it
> > at all?
>
> Yes, a trivial test shows a marginal improvement, on a minimal debian system:
>
> # find / | wc -l
> 13641
>
> # time find / > /dev/null
>
> real    0m10.000s
> user    0m0.210s
> sys     0m4.370s
>
> # time find / > /dev/null
>
> real    0m9.890s
> user    0m0.160s
> sys     0m3.270s
>
> > Doing an
> >
> >	echo 3 > /proc/sys/vm/drop_caches
> >
> > is your friend for testing things like this, to force cold-cache
> > behaviour..
>
> Thanks, I'll work out numbers on large/concurrent dir accesses soon.

I gave it a try, and I'm afraid the results weren't pretty.

I did:

time find /usr/src | wc -l

on current git with (3 times) and without (5 times) the patch, and got

with:
real 54.306, 54.327, 53.742s
usr 0.324, 0.284, 0.234s
sys 2.432, 2.484, 2.592s

without:
real 24.413, 24.616, 24.080s
usr 0.208, 0.316, 0.312s
sys: 2.496, 2.440, 2.540s

Subsequent runs without dropping caches did give a significant
improvement in both cases (1.821/.188/1.632 is one result I wrote with
the patch applied).

Regards,

Nigel

^ permalink raw reply [flat|nested] 52+ messages in thread
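For anyone following the numbers in Nigel's report, here is a minimal sketch (not part of the thread) that averages the reported wall-clock times and computes how much slower the patched kernel was. Nigel mentions five runs without the patch but lists three values, so only those three are used. The function and array names are made up for illustration.

```c
/*
 * Illustrative arithmetic only: averages the "real" times reported
 * above. Names (mean, slowdown, real_with, real_without) are
 * hypothetical, not from the thread.
 */
#include <assert.h>
#include <stddef.h>

/* Wall-clock seconds for "time find /usr/src | wc -l", cold cache. */
const double real_with[]    = { 54.306, 54.327, 53.742 };
const double real_without[] = { 24.413, 24.616, 24.080 };

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Ratio of the patched mean to the unpatched mean. */
double slowdown(void)
{
	return mean(real_with, 3) / mean(real_without, 3);
}
```

The means come out at roughly 54.1s with the patch and 24.4s without, so the first version of the patch made this cold-cache traversal about 2.2 times slower.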
[parent not found: <20070110140730.GA986@mail.ustc.edu.cn>]
* Re: How git affects kernel.org performance
  [not found] ` <20070110140730.GA986@mail.ustc.edu.cn>
@ 2007-01-10 14:07 ` Fengguang Wu
  2007-01-10 14:07 ` Fengguang Wu
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Fengguang Wu @ 2007-01-10 14:07 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Linus Torvalds, Theodore Tso, Suparna Bhattacharya, Andrew Morton,
	Willy Tarreau, H. Peter Anvin, git, J.H., Randy Dunlap,
	Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org

On Wed, Jan 10, 2007 at 02:20:49PM +1100, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote:
> > On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
> > >
> > >
> > > On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > > > >
> > > > > The fastest and probably most important thing to add is some readahead
> > > > > smarts to directories --- both to the htree and non-htree cases. If
> > > >
> > > > Here is a quick hack to practice the directory readahead idea.
> > > > Comments are welcome, it's a freshman's work :)
> > >
> > > Well, I'd probably have done it differently, but more important is whether
> > > this actually makes a difference performance-wise. Have you benchmarked it
> > > at all?
> >
> > Yes, a trivial test shows a marginal improvement, on a minimal debian system:
> >
> > # find / | wc -l
> > 13641
> >
> > # time find / > /dev/null
> >
> > real    0m10.000s
> > user    0m0.210s
> > sys     0m4.370s
> >
> > # time find / > /dev/null
> >
> > real    0m9.890s
> > user    0m0.160s
> > sys     0m3.270s
> >
> > > Doing an
> > >
> > >	echo 3 > /proc/sys/vm/drop_caches
> > >
> > > is your friend for testing things like this, to force cold-cache
> > > behaviour..
> >
> > Thanks, I'll work out numbers on large/concurrent dir accesses soon.
>
> I gave it a try, and I'm afraid the results weren't pretty.
>
> I did:
>
> time find /usr/src | wc -l
>
> on current git with (3 times) and without (5 times) the patch, and got
>
> with:
> real 54.306, 54.327, 53.742s
> usr 0.324, 0.284, 0.234s
> sys 2.432, 2.484, 2.592s
>
> without:
> real 24.413, 24.616, 24.080s
> usr 0.208, 0.316, 0.312s
> sys: 2.496, 2.440, 2.540s
>
> Subsequent runs without dropping caches did give a significant
> improvement in both cases (1.821/.188/1.632 is one result I wrote with
> the patch applied).

Thanks, Nigel.
But I'm very sorry that the calculation in the patch was wrong.

Would you give this new patch a run?

It produced pretty numbers here:

#!/bin/zsh

ROOT=/mnt/mnt
TIMEFMT="%E clock %S kernel %U user %w+%c cs %J"

echo 3 > /proc/sys/vm/drop_caches

# 49: enable dir readahead
# 50: disable
echo ${1:-50} > /proc/sys/vm/readahead_ratio

# time find $ROOT/a > /dev/null

time find /etch > /dev/null

# time find $ROOT/a > /dev/null&
# time grep -r asdf $ROOT/b > /dev/null&
# time cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null&

exit 0

# collected results on a SATA disk:
# ./test-parallel-dir-reada.sh 49
4.18s clock 0.08s kernel 0.04s user 418+0 cs find $ROOT/a > /dev/null
4.09s clock 0.10s kernel 0.02s user 410+1 cs find $ROOT/a > /dev/null

# ./test-parallel-dir-reada.sh 50
12.18s clock 0.15s kernel 0.07s user 1520+4 cs find $ROOT/a > /dev/null
11.99s clock 0.13s kernel 0.04s user 1558+6 cs find $ROOT/a > /dev/null

# ./test-parallel-dir-reada.sh 49
4.01s clock 0.06s kernel 0.01s user 1567+2 cs find /etch > /dev/null
4.08s clock 0.07s kernel 0.00s user 1568+0 cs find /etch > /dev/null

# ./test-parallel-dir-reada.sh 50
4.10s clock 0.09s kernel 0.01s user 1578+1 cs find /etch > /dev/null
4.19s clock 0.08s kernel 0.03s user 1578+0 cs find /etch > /dev/null

# ./test-parallel-dir-reada.sh 49
7.73s clock 0.11s kernel 0.06s user 438+2 cs find $ROOT/a > /dev/null
18.92s clock 0.43s kernel 0.02s user 1246+13 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
32.91s clock 4.20s kernel 1.55s user 103564+51 cs grep -r asdf $ROOT/b > /dev/null

8.47s clock 0.10s kernel 0.02s user 442+4 cs find $ROOT/a > /dev/null
19.24s clock 0.53s kernel 0.03s user 1250+23 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
29.93s clock 4.18s kernel 1.61s user 100425+47 cs grep -r asdf $ROOT/b > /dev/null

# ./test-parallel-dir-reada.sh 50
17.87s clock 0.57s kernel 0.02s user 1244+21 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
21.30s clock 0.08s kernel 0.05s user 1517+5 cs find $ROOT/a > /dev/null
49.68s clock 3.94s kernel 1.67s user 101520+57 cs grep -r asdf $ROOT/b > /dev/null

15.66s clock 0.51s kernel 0.00s user 1248+25 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
22.15s clock 0.15s kernel 0.04s user 1520+5 cs find $ROOT/a > /dev/null
46.14s clock 4.08s kernel 1.68s user 101517+63 cs grep -r asdf $ROOT/b > /dev/null

Thanks,
Wu
---

Subject: ext3 readdir readahead

Do readahead for ext3_readdir().

Reasons to be aggressive:

	- readdir() users are likely to traverse the whole directory,
	  so readahead miss is not a concern.
	- most dirs are small, so slow start is not good
	- the htree indexing introduces some randomness,
	  which can be helped by the aggressiveness.

So we do 128K sized readaheads, at twice the speed of reads.

The following actual readahead pages are collected for a dir with
110000 entries:
32 31 30 31 28 29 29 28 27 25 29 22 25 30 24 15 19
That means a readahead hit ratio of
	454/541 = 84%

The performance is marginally better for a minimal debian system:
	command:	find /
	baseline:	4.10s	4.19s
	patched:	4.01s	4.08s

And considerably better for 100 directories, each with 1000 8K files:
	command:	find /throwaways
	baseline:	12.18s	11.99s
	patched:	4.18s	4.09s

And also noticeably better for parallel operations:
					baseline		patched
	find /throwaways &		21.30s	22.15s		7.73s	8.47s
	grep -r asdf /throwaways2 &	49.68s	46.14s		32.91s	29.93s
	cp /KNOPPIX_CD.iso /dev/null &	17.87s	15.66s		18.92s	19.24s

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/ext3/dir.c           |   33 +++++++++++++++++++++++++++++++++
 fs/ext3/inode.c         |    2 +-
 include/linux/ext3_fs.h |    2 ++
 3 files changed, 36 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,28 @@ int ext3_check_dir_entry (const char * f
 	return error_msg == NULL ? 1 : 0;
 }
 
+#define DIR_READAHEAD_BYTES	(128*1024)
+#define DIR_READAHEAD_PGMASK	((DIR_READAHEAD_BYTES >> PAGE_CACHE_SHIFT) - 1)
+
+static void ext3_dir_readahead(struct file * filp)
+{
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+	int bbits = inode->i_blkbits;
+	unsigned long blk, end;
+
+	blk = filp->f_ra.prev_page << (PAGE_CACHE_SHIFT - bbits);
+	end = min(inode->i_blocks >> (bbits - 9),
+		  blk + (DIR_READAHEAD_BYTES >> bbits));
+
+	for (; blk < end; blk++) {
+		pgoff_t phy;
+		phy = generic_block_bmap(inode->i_mapping, blk, ext3_get_block)
+					>> (PAGE_CACHE_SHIFT - bbits);
+		do_page_cache_readahead(mapping, filp, phy, 1);
+	}
+}
+
 static int ext3_readdir(struct file * filp,
 			 void * dirent, filldir_t filldir)
 {
@@ -108,6 +130,17 @@ static int ext3_readdir(struct file * fi
 
 	sb = inode->i_sb;
 
+	/*
+	 * Reading-ahead at 2x the page fault rate, in hope of reducing
+	 * readahead misses caused by the partially random htree order.
+	 */
+	filp->f_ra.prev_page += 2;
+	filp->f_ra.prev_page &= ~1;
+
+	if (!(filp->f_ra.prev_page & DIR_READAHEAD_PGMASK) &&
+	    filp->f_ra.prev_page < (inode->i_blocks >> (PAGE_CACHE_SHIFT-9)))
+		ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
 	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
 				    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();
--- linux.orig/include/linux/ext3_fs.h
+++ linux/include/linux/ext3_fs.h
@@ -814,6 +814,8 @@ struct buffer_head * ext3_bread (handle_
 int ext3_get_blocks_handle(handle_t *handle, struct inode *inode,
 	sector_t iblock, unsigned long maxblocks, struct buffer_head *bh_result,
 	int create, int extend_disksize);
+extern int ext3_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh_result, int create);
 
 extern void ext3_read_inode (struct inode *);
 extern int ext3_write_inode (struct inode *, int);

^ permalink raw reply [flat|nested] 52+ messages in thread
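Wu only says that the calculation in the first patch was wrong; comparing the two versions, what changed is the conversion between 512-byte sectors, filesystem blocks and pages, plus the condition that decides when to read ahead. The following is a small userspace model of that arithmetic as it appears in the revised patch. It is a sketch, not kernel code: it assumes 4 KiB pages (PAGE_CACHE_SHIFT = 12), and the helper names are hypothetical. It relies on i_blocks counting 512-byte sectors, which is why the patch shifts by (bbits - 9).

```c
/*
 * Userspace model of the arithmetic in the revised readahead patch.
 * Not kernel code. Assumption: 4 KiB pages. Helper names are
 * hypothetical and exist only for this illustration.
 */
#include <assert.h>

#define PAGE_CACHE_SHIFT	12	/* assumption: 4 KiB pages */
#define DIR_READAHEAD_BYTES	(128 * 1024)
#define DIR_READAHEAD_PGMASK	((DIR_READAHEAD_BYTES >> PAGE_CACHE_SHIFT) - 1)

/* i_blocks is in 512-byte sectors; convert it to filesystem blocks. */
unsigned long sectors_to_fs_blocks(unsigned long i_blocks, int bbits)
{
	assert(bbits >= 9);
	return i_blocks >> (bbits - 9);
}

/* Convert a filesystem block number to the page index that holds it. */
unsigned long fs_block_to_page(unsigned long blk, int bbits)
{
	assert(bbits >= 9 && bbits <= PAGE_CACHE_SHIFT);
	return blk >> (PAGE_CACHE_SHIFT - bbits);
}

/* Move the cursor as the patched ext3_readdir() does: +2, rounded to even. */
unsigned long advance_cursor(unsigned long prev_page)
{
	prev_page += 2;
	prev_page &= ~1UL;
	return prev_page;
}

/*
 * Nonzero when the cursor sits on a 128K window boundary and is still
 * inside the directory (size given as i_blocks, in 512-byte sectors).
 */
int should_readahead(unsigned long prev_page, unsigned long i_blocks)
{
	return !(prev_page & DIR_READAHEAD_PGMASK) &&
	       prev_page < (i_blocks >> (PAGE_CACHE_SHIFT - 9));
}
```

With 4 KiB pages the mask is 31, so should_readahead() fires once every 32 pages, that is, once per 128K window. Because each ext3_readdir() call moves the cursor by two pages, that works out to one readahead batch every 16 calls.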
* Re: How git affects kernel.org performance
       [not found] ` <20070110140730.GA986@mail.ustc.edu.cn>
                     ` (2 preceding siblings ...)
  2007-01-10 14:07 ` Fengguang Wu
@ 2007-01-12 10:54 ` Nigel Cunningham
  3 siblings, 0 replies; 52+ messages in thread
From: Nigel Cunningham @ 2007-01-12 10:54 UTC
To: Fengguang Wu
Cc: Linus Torvalds, Theodore Tso, Suparna Bhattacharya, Andrew Morton,
    Willy Tarreau, H. Peter Anvin, git, J.H., Randy Dunlap, Pavel Machek,
    kernel list, webmaster, linux-ext4@vger.kernel.org

Hi.

On Wed, 2007-01-10 at 22:07 +0800, Fengguang Wu wrote:
> Thanks, Nigel.
> But I'm very sorry that the calculation in the patch was wrong.
>
> Would you give this new patch a run?

Sorry for my slowness. I just did

	time find /usr/src | wc -l

again:

Without patch: 35.137, 35.104, 35.351 seconds
With patch: 34.518, 34.376, 34.489 seconds

So there's about .8 seconds saved.

Regards,

Nigel
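The saving can be checked from the six timings quoted above. This is a small illustrative sketch; the function and array names are hypothetical. Averaging each set of three runs gives roughly 35.20 s without the patch and 34.46 s with it, a mean difference of about 0.74 s (around 2%), which is larger than the spread between runs in either set and in line with the rough figure given in the message.

```c
#include <stddef.h>

/* Arithmetic mean of n timing samples, in seconds. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Timings reported in the message for `time find /usr/src | wc -l`. */
static const double without_patch[] = { 35.137, 35.104, 35.351 };
static const double with_patch[]    = { 34.518, 34.376, 34.489 };

/* Mean number of seconds saved with the patch applied. */
static double seconds_saved(void)
{
	return mean(without_patch, 3) - mean(with_patch, 3);
}
```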
* Re: How git affects kernel.org performance
       [not found] ` <20070109075945.GA8799@mail.ustc.edu.cn>
  2007-01-09  7:59 ` Fengguang Wu
  2007-01-09  7:59 ` Fengguang Wu
@ 2007-01-09  7:59 ` Fengguang Wu
  2 siblings, 0 replies; 52+ messages in thread
From: Fengguang Wu @ 2007-01-09 7:59 UTC
To: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau,
    Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap,
    Pavel Machek, kernel list, webmaster, linux-ext4@vger.kernel.org

On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
> On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > > Yeah, slowly-growing directories will get splattered all over the disk.
> > >
> > > Possible short-term fixes would be to just allocate up to (say) eight
> > > blocks when we grow a directory by one block. Or teach the
> > > directory-growth code to use ext3 reservations.
> > >
> > > Longer-term people are talking about things like on-disk reservations.
> > > But I expect directories are being forgotten about in all of that.
> >
> > By on-disk reservations, do you mean persistent file preallocation ? (that
> > is explicit preallocation of blocks to a given file) If so, you are
> > right, we haven't really given any thought to the possibility of directories
> > needing that feature.
>
> The fastest and probably most important thing to add is some readahead
> smarts to directories --- both to the htree and non-htree cases. If

Here is a quick hack to practice the directory readahead idea.
Comments are welcome, it's a freshman's work :)

Regards,
Wu
---
 fs/ext3/dir.c   |   22 ++++++++++++++++++++++
 fs/ext3/inode.c |    2 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,25 @@ int ext3_check_dir_entry (const char * f
 	return error_msg == NULL ? 1 : 0;
 }
 
+int ext3_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh_result, int create);
+
+static void ext3_dir_readahead(struct file * filp)
+{
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+	unsigned long sector;
+	unsigned long blk;
+	pgoff_t offset;
+
+	for (blk = 0; blk < inode->i_blocks; blk++) {
+		sector = blk << (inode->i_blkbits - 9);
+		sector = generic_block_bmap(inode->i_mapping, sector, ext3_get_block);
+		offset = sector >> (PAGE_CACHE_SHIFT - 9);
+		do_page_cache_readahead(mapping, filp, offset, 1);
+	}
+}
+
 static int ext3_readdir(struct file * filp,
 			 void * dirent, filldir_t filldir)
 {
@@ -108,6 +127,9 @@ static int ext3_readdir(struct file * fi
 
 	sb = inode->i_sb;
 
+	if (!filp->f_pos)
+		ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
 	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
 				    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();
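The author later described the calculation in this first version as wrong, and the revised patch elsewhere in the thread converts between units differently. The sketch below illustrates only the unit conversions used in that revision: i_blocks counts 512-byte sectors, shifting right by (blkbits - 9) gives filesystem blocks, and shifting a block number right by (PAGE_CACHE_SHIFT - blkbits) gives a page index. The 4 KiB page size and the helper names are assumptions made for illustration.

```c
/* Assumption: 4 KiB pages; the kernel supplies the real PAGE_CACHE_SHIFT. */
#define PAGE_SHIFT_ASSUMED	12
#define SECTOR_SHIFT		9	/* i_blocks is counted in 512-byte units */

/* Convert a size in 512-byte sectors to a count of filesystem blocks. */
static unsigned long sectors_to_fs_blocks(unsigned long i_blocks,
					  unsigned int blkbits)
{
	return i_blocks >> (blkbits - SECTOR_SHIFT);
}

/* Convert a filesystem block number to the index of the page holding it. */
static unsigned long fs_block_to_page(unsigned long blk, unsigned int blkbits)
{
	return blk >> (PAGE_SHIFT_ASSUMED - blkbits);
}
```

With 1 KiB blocks (blkbits of 10), a directory of 64 sectors has 32 filesystem blocks, and block 7 lives in page 1; with 4 KiB blocks the block number and page index coincide.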
* Re: How git affects kernel.org performance
  2007-01-07  5:24 ` How git affects kernel.org performance H. Peter Anvin
  2007-01-07  5:39   ` Linus Torvalds
@ 2007-01-07 14:57   ` Robert Fitzsimons
  2007-01-07 19:12     ` J.H.
  2007-01-08  1:51     ` Jakub Narebski
  2007-01-07 15:06   ` Krzysztof Halasa
  2 siblings, 2 replies; 52+ messages in thread
From: Robert Fitzsimons @ 2007-01-07 14:57 UTC
To: H. Peter Anvin
Cc: git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek,
    kernel list, webmaster

> Some more data on how git affects kernel.org...

I have a quick question about the gitweb configuration: does the
$projects_list config entry point to a directory or a file?

When it is a directory, gitweb ends up doing the equivalent of a 'find
$projects_list' to find all the available projects, so it really should
be changed to a projects list file.

Robert
* Re: How git affects kernel.org performance
  2007-01-07 14:57 ` Robert Fitzsimons
@ 2007-01-07 19:12   ` J.H.
  2007-01-08  1:51   ` Jakub Narebski
  1 sibling, 0 replies; 52+ messages in thread
From: J.H. @ 2007-01-07 19:12 UTC
To: Robert Fitzsimons
Cc: H. Peter Anvin, git, nigel, Randy Dunlap, Andrew Morton, Pavel Machek,
    kernel list, webmaster

With my gitweb caching changes this isn't as big of a deal, as the front
page is only generated once every 10 minutes or so (and with the changes
I'm working on today that timeout will be variable).

- John

On Sun, 2007-01-07 at 14:57 +0000, Robert Fitzsimons wrote:
> > Some more data on how git affects kernel.org...
>
> I have a quick question about the gitweb configuration, does the
> $projects_list config entry point to a directory or a file?
>
> When it is a directory gitweb ends up doing the equivalent of a 'find
> $projects_list' to find all the available projects, so it really should
> be changed to a projects list file.
>
> Robert
* Re: How git affects kernel.org performance
  2007-01-07 14:57 ` Robert Fitzsimons
  2007-01-07 19:12   ` J.H.
@ 2007-01-08  1:51   ` Jakub Narebski
  1 sibling, 0 replies; 52+ messages in thread
From: Jakub Narebski @ 2007-01-08 1:51 UTC
To: git; +Cc: linux-kernel

Robert Fitzsimons wrote:

>> Some more data on how git affects kernel.org...
>
> I have a quick question about the gitweb configuration, does the
> $projects_list config entry point to a directory or a file?

It can point to either. Usually it is either unset, and then we do a find
over $projectroot, or it is a file (URI escaped path relative to
$projectroot, SPACE, and URI escaped owner of a project; you can get
the file by clicking on TXT on the projects_list page).

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
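The projects list line format described above (URI-escaped path, a single space, URI-escaped owner) can be parsed as in the following sketch. It is a minimal illustration under assumptions: it handles only %XX escapes, expects no trailing newline, and the function names and the sample data used with it are hypothetical rather than taken from gitweb.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hexadecimal digit; the caller has checked isxdigit(). */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower(c) - 'a' + 10;
}

/* Decode %XX escapes from src into dst; dst needs strlen(src) + 1 bytes. */
static void uri_unescape(const char *src, char *dst)
{
	while (*src) {
		if (src[0] == '%' && isxdigit((unsigned char)src[1]) &&
		    isxdigit((unsigned char)src[2])) {
			*dst++ = (char)(hexval((unsigned char)src[1]) * 16 +
					hexval((unsigned char)src[2]));
			src += 3;
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
}

/*
 * Split one "path SPACE owner" line and decode both fields into path
 * and owner, each of capacity cap. Returns 0 on success, -1 if the
 * line is too long or has no space separator.
 */
static int parse_project_line(const char *line, char *path, char *owner,
			      size_t cap)
{
	char buf[1024];
	char *sp;
	size_t len = strlen(line);

	if (len >= sizeof(buf) || len >= cap)
		return -1;
	memcpy(buf, line, len + 1);
	sp = strchr(buf, ' ');
	if (!sp)
		return -1;
	*sp = '\0';
	uri_unescape(buf, path);
	uri_unescape(sp + 1, owner);
	return 0;
}
```

Because a literal space inside either field is escaped, the first space on the line is always the separator, which is why a single strchr() is enough here.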
* Re: How git affects kernel.org performance
  2007-01-07  5:24 ` How git affects kernel.org performance H. Peter Anvin
  2007-01-07  5:39   ` Linus Torvalds
  2007-01-07 14:57   ` Robert Fitzsimons
@ 2007-01-07 15:06   ` Krzysztof Halasa
  2007-01-07 20:31     ` Shawn O. Pearce
  2 siblings, 1 reply; 52+ messages in thread
From: Krzysztof Halasa @ 2007-01-07 15:06 UTC
To: H. Peter Anvin
Cc: git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek,
    kernel list, webmaster

"H. Peter Anvin" <hpa@zytor.com> writes:

> During extremely high load, it appears that what slows kernel.org down
> more than anything else is the time that each individual getdents()
> call takes. When I've looked at this I've observed times from 200 ms to
> almost 2 seconds! Since an unpacked *OR* unpruned git tree adds 256
> directories to a cleanly packed tree, you can do the math yourself.

Hmm... Perhaps it should be possible to push git updates as a pack
file only? I mean, the pack file would stay packed = never individual
files and never 256 directories?

People aren't doing commit/etc. activity there, right?
--
Krzysztof Halasa
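The arithmetic hinted at in the quoted paragraph can be written out. This is a deliberately simplified sketch: it assumes one getdents() call per extra directory and no overlap between calls, neither of which the message states, and the function name is hypothetical. Under that model, 256 extra directories at 200 ms each add about 51 seconds to a full scan of one tree, and at 2 seconds each about 512 seconds.

```c
/*
 * Extra seconds for scanning n_dirs additional directories, assuming
 * (as a simplification) one getdents() call per directory, each
 * taking secs_per_call seconds, executed one after another.
 */
static double extra_scan_seconds(unsigned int n_dirs, double secs_per_call)
{
	return n_dirs * secs_per_call;
}
```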
* Re: How git affects kernel.org performance
  2007-01-07 15:06 ` Krzysztof Halasa
@ 2007-01-07 20:31   ` Shawn O. Pearce
  2007-01-08 14:46     ` Nicolas Pitre
  0 siblings, 1 reply; 52+ messages in thread
From: Shawn O. Pearce @ 2007-01-07 20:31 UTC
To: Krzysztof Halasa
Cc: H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton,
    Pavel Machek, kernel list, webmaster

Krzysztof Halasa <khc@pm.waw.pl> wrote:
> Hmm... Perhaps it should be possible to push git updates as a pack
> file only? I mean, the pack file would stay packed = never individual
> files and never 256 directories?

Latest Git does this. If the server is later than 1.4.3.3 then
the receive-pack process can actually store the pack file rather
than unpacking it into loose objects. The downside is that it will
copy any missing base objects onto the end of a thin pack to make
it not-thin.

There's actually a limit that controls when to keep the pack and when
not to (receive.unpackLimit). In 1.4.3.3 this defaulted to 5000
objects, which meant all but the largest pushes will be exploded into
loose objects. In 1.5.0-rc0 that limit changed from 5000 to 100,
though Nico did a lot of study and discovered that the optimum is
likely 3. But that tends to create too many pack files so 100 was
arbitrarily chosen.

So if the user pushes <100 objects to a 1.5.0-rc0 server we unpack
to loose; >= 100 we keep the pack file. Perhaps this would help
kernel.org.

--
Shawn.
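The threshold rule described in the message above can be stated as a one-line predicate. This sketch models only the comparison as the message describes it (fewer objects than the limit are unpacked to loose objects, otherwise the pack is kept); it is not git's receive-pack code, and the function name is hypothetical.

```c
/*
 * Returns 1 if a push carrying n_objects would be kept as a pack file,
 * 0 if it would be exploded into loose objects, for a given value of
 * the limit described in the message (receive.unpackLimit).
 */
static int keep_pack(unsigned long n_objects, unsigned long unpack_limit)
{
	return n_objects >= unpack_limit;
}
```

With the older default of 5000, a 150-object push is unpacked into loose objects; with the newer default of 100, the same push stays packed.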
* Re: How git affects kernel.org performance
  2007-01-07 20:31 ` Shawn O. Pearce
@ 2007-01-08 14:46   ` Nicolas Pitre
  0 siblings, 0 replies; 52+ messages in thread
From: Nicolas Pitre @ 2007-01-08 14:46 UTC
To: Shawn O. Pearce
Cc: Krzysztof Halasa, H. Peter Anvin, git, nigel, J.H., Randy Dunlap,
    Andrew Morton, Pavel Machek, kernel list, webmaster

On Sun, 7 Jan 2007, Shawn O. Pearce wrote:

> Krzysztof Halasa <khc@pm.waw.pl> wrote:
> > Hmm... Perhaps it should be possible to push git updates as a pack
> > file only? I mean, the pack file would stay packed = never individual
> > files and never 256 directories?
>
> Latest Git does this. If the server is later than 1.4.3.3 then
> the receive-pack process can actually store the pack file rather
> than unpacking it into loose objects. The downside is that it will
> copy any missing base objects onto the end of a thin pack to make
> it not-thin.

No. There are no thin packs for pushes. And IMHO it should stay that
way exactly to avoid this little inconvenience on servers.

The fetch case is a different story of course.

Nicolas
Thread overview: 52+ messages
[not found] <20061214223718.GA3816@elf.ucw.cz>
[not found] ` <20061216094421.416a271e.randy.dunlap@oracle.com>
[not found] ` <20061216095702.3e6f1d1f.akpm@osdl.org>
[not found] ` <458434B0.4090506@oracle.com>
[not found] ` <1166297434.26330.34.camel@localhost.localdomain>
[not found] ` <1166304080.13548.8.camel@nigel.suspend2.net>
[not found] ` <459152B1.9040106@zytor.com>
[not found] ` <1168140954.2153.1.camel@nigel.suspend2.net>
2007-01-07 4:22 ` [KORG] Re: kernel.org lies about latest -mm kernel Jeff Garzik
2007-01-07 4:29 ` Linus Torvalds
2007-01-07 20:11 ` Greg KH
2007-01-07 21:30 ` H. Peter Anvin
2007-01-07 21:54 ` Junio C Hamano
2007-01-07 22:21 ` Jeff Garzik
2007-01-07 22:53 ` Linus Torvalds
2007-01-07 23:32 ` Martin Langhoff
[not found] ` <45A08269.4050504@zytor.com>
2007-01-07 5:24 ` How git affects kernel.org performance H. Peter Anvin
2007-01-07 5:39 ` Linus Torvalds
2007-01-07 8:55 ` Willy Tarreau
2007-01-07 8:58 ` H. Peter Anvin
2007-01-07 9:03 ` Willy Tarreau
2007-01-07 10:28 ` Christoph Hellwig
2007-01-07 10:52 ` Willy Tarreau
2007-01-07 18:17 ` Linus Torvalds
2007-01-07 19:13 ` Linus Torvalds
[not found] ` <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com>
2007-01-07 19:35 ` Linus Torvalds
2007-01-07 10:50 ` Jan Engelhardt
2007-01-07 18:49 ` Randy Dunlap
2007-01-07 19:07 ` Jan Engelhardt
2007-01-07 19:28 ` Randy Dunlap
2007-01-07 19:37 ` Linus Torvalds
2007-01-07 9:15 ` Andrew Morton
2007-01-07 9:38 ` Rene Herman
2007-01-08 3:05 ` Suparna Bhattacharya
2007-01-08 12:58 ` Theodore Tso
2007-01-08 13:41 ` Johannes Stezenbach
2007-01-08 13:56 ` Theodore Tso
2007-01-08 13:59 ` Pavel Machek
2007-01-08 14:17 ` Theodore Tso
2007-01-08 13:43 ` Jeff Garzik
2007-01-09 1:09 ` Paul Jackson
2007-01-09 2:18 ` Jeremy Higdon
[not found] ` <20070109075945.GA8799@mail.ustc.edu.cn>
2007-01-09 7:59 ` Fengguang Wu
2007-01-09 7:59 ` Fengguang Wu
2007-01-09 16:23 ` Linus Torvalds
[not found] ` <20070110015739.GA26978@mail.ustc.edu.cn>
2007-01-10 1:57 ` Fengguang Wu
2007-01-10 1:57 ` Fengguang Wu
2007-01-10 1:57 ` Fengguang Wu
2007-01-10 3:20 ` Nigel Cunningham
[not found] ` <20070110140730.GA986@mail.ustc.edu.cn>
2007-01-10 14:07 ` Fengguang Wu
2007-01-10 14:07 ` Fengguang Wu
2007-01-10 14:07 ` Fengguang Wu
2007-01-12 10:54 ` Nigel Cunningham
2007-01-09 7:59 ` Fengguang Wu
2007-01-07 14:57 ` Robert Fitzsimons
2007-01-07 19:12 ` J.H.
2007-01-08 1:51 ` Jakub Narebski
2007-01-07 15:06 ` Krzysztof Halasa
2007-01-07 20:31 ` Shawn O. Pearce
2007-01-08 14:46 ` Nicolas Pitre