* Random file I/O regressions in 2.6
@ 2004-05-02 19:57 Alexey Kopytov
  2004-05-03 11:14 ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Alexey Kopytov @ 2004-05-02 19:57 UTC (permalink / raw)
  To: linux-kernel

Hello!

I tried to compare random file I/O performance in the 2.4 and 2.6 kernels and
found some regressions that I failed to explain. I tested 2.4.25, 2.6.5-bk2
and 2.6.6-rc3 with my own utility SysBench, which was written to generate
workloads similar to a database under intensive load.

For the 2.6.x kernels the anticipatory, deadline, CFQ and noop I/O schedulers
were tested, with AS giving the best results for this workload, but it is
still about 1.5 times worse than the result for the 2.4.25 kernel.

The SysBench 'fileio' test was configured to generate the following workload:
16 worker threads are created, each issuing random read/write file requests in
blocks of 16 KB with a read/write ratio of 1.5. All I/O operations are evenly
distributed over 128 files with a total size of 3 GB. Every 100 requests, an
fsync() operation is performed sequentially on each file. The total number of
requests is limited to 10,000.

The FS used for the test was ext3 with data=ordered.

Here are the results (values are the number of seconds to complete the test):

2.4.25:                   77.5377

2.6.5-bk2(noop):         165.3393
2.6.5-bk2(anticipatory): 118.7450
2.6.5-bk2(deadline):     130.3254
2.6.5-bk2(CFQ):          146.4286

2.6.6-rc3(noop):         164.9486
2.6.6-rc3(anticipatory): 125.1776
2.6.6-rc3(deadline):     131.8903
2.6.6-rc3(CFQ):          152.9280

I have published the results as well as the hardware and kernel setups at the
SysBench home page: http://sysbench.sourceforge.net/results/fileio/

Any comments or suggestions would be highly appreciated.

-- 
Alexey Kopytov, Software Developer
MySQL AB, www.mysql.com

Are you MySQL certified? www.mysql.com/certification

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-02 19:57 Random file I/O regressions in 2.6 Alexey Kopytov
@ 2004-05-03 11:14 ` Nick Piggin
  2004-05-03 18:08   ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-05-03 11:14 UTC (permalink / raw)
  To: Alexey Kopytov; +Cc: linux-kernel, Jens Axboe, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 3200 bytes --]

Alexey Kopytov wrote:
> Hello!
>
> I tried to compare random file I/O performance in 2.4 and 2.6 kernels and
> found some regressions that I failed to explain. I tested 2.4.25, 2.6.5-bk2
> and 2.6.6-rc3 with my own utility SysBench which was written to generate
> workloads similar to a database under intensive load.
>
> For 2.6.x kernels anticipatory, deadline, CFQ and noop I/O schedulers were
> tested with AS giving the best results for this workload, but it's still about
> 1.5 times worse than the results for 2.4.25 kernel.
>
> The SysBench 'fileio' test was configured to generate the following workload:
> 16 worker threads are created, each running random read/write file requests in
> blocks of 16 KB with a read/write ratio of 1.5. All I/O operations are evenly
> distributed over 128 files with a total size of 3 GB. Each 100 requests, an
> fsync() operations is performed sequentially on each file. The total number of
> requests is limited by 10000.
>
> The FS used for the test was ext3 with data=ordered.
>

I am able to reproduce this here. 2.6 isn't improved by increasing
nr_requests, relaxing IO scheduler deadlines, or turning off readahead.
It looks like 2.6 is submitting a lot of the IO in 4KB sized requests...

Hmm, oh dear. It looks like the readahead logic shat itself and/or
do_generic_mapping_read doesn't know how to handle multipage reads
properly.

What ends up happening is that readahead gets turned off, then the
16K read ends up being done in 4 synchronous 4K chunks. Because they
are synchronous, they have no chance of being merged with one another
either.
I have attached a proof of concept hack... I think what should really
happen is that page_cache_readahead should be taught about the size
of the requested read, and ensures that a decent amount of reading is
done while within the read request window, even if
beyond-request-window-readahead has been previously unsuccessful.

Numbers with an IDE disk, 256MB RAM:

2.4.24:        81s
2.6.6-rc3-mm1: 126s
rc3-mm1+patch: 87s

The small remaining regression might be explained by 2.6's smaller
nr_requests, IDE driver, io scheduler tuning, etc.

> Here are the results (values are number of seconds to complete the test):
>
> 2.4.25: 77.5377
>
> 2.6.5-bk2(noop): 165.3393
> 2.6.5-bk2(anticipatory): 118.7450
> 2.6.5-bk2(deadline): 130.3254
> 2.6.5-bk2(CFQ): 146.4286
>
> 2.6.6-rc3(noop): 164.9486
> 2.6.6-rc3(anticipatory): 125.1776
> 2.6.6-rc3(deadline): 131.8903
> 2.6.6-rc3(CFQ): 152.9280
>
> I have published the results as well as the hardware and kernel setups at the
> SysBench home page: http://sysbench.sourceforge.net/results/fileio/
>
> Any comments or suggestions would be highly appreciated.
>

From your website: "Another interesting fact is that AS gives the best
results for this workload, though it's believed to give worse results
for this kind of workloads as compared to other I/O schedulers
available in 2.6.x kernels."

The anticipatory scheduler is actually in a fairly good state of tune,
and can often beat deadline even for random read/write/fsync tests.
The infamous database regression problem is when this sort of workload
is combined with TCQ disk drives.
Nick

[-- Attachment #2: read-populate.patch --]
[-- Type: text/x-patch, Size: 1010 bytes --]

 include/linux/mm.h             |    0
 linux-2.6-npiggin/mm/filemap.c |    5 ++++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff -puN mm/readahead.c~read-populate mm/readahead.c
diff -puN mm/filemap.c~read-populate mm/filemap.c
--- linux-2.6/mm/filemap.c~read-populate	2004-05-03 19:56:00.000000000 +1000
+++ linux-2.6-npiggin/mm/filemap.c	2004-05-03 20:51:37.000000000 +1000
@@ -627,6 +627,9 @@ void do_generic_mapping_read(struct addr
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
+	force_page_cache_readahead(mapping, filp, index,
+			max_sane_readahead(desc->count >> PAGE_CACHE_SHIFT));
+
 	for (;;) {
 		struct page *page;
 		unsigned long end_index, nr, ret;
@@ -644,7 +647,7 @@ void do_generic_mapping_read(struct addr
 		}
 		cond_resched();
-		page_cache_readahead(mapping, ra, filp, index);
+		page_cache_readahead(mapping, ra, filp, index + desc->count);
 		nr = nr - offset;
 
 find_page:
diff -puN include/linux/mm.h~read-populate include/linux/mm.h
_

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 11:14 ` Nick Piggin
@ 2004-05-03 18:08   ` Andrew Morton
  2004-05-03 20:22     ` Ram Pai
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-05-03 18:08 UTC (permalink / raw)
  To: Nick Piggin; +Cc: alexeyk, linux-kernel, axboe

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> What ends up happening is that readahead gets turned off, then the
> 16K read ends up being done in 4 synchronous 4K chunks. Because they
> are synchronous, they have no chance of being merged with one another
> either.

yup.

> I have attached a proof of concept hack... I think what should really
> happen is that page_cache_readahead should be taught about the size
> of the requested read, and ensures that a decent amount of reading is
> done while within the read request window, even if
> beyond-request-window-readahead has been previously unsuccessful.

The "readahead turned itself off" thing is there to avoid doing lots of
pagecache lookups in the very common case where the file is fully cached.

The place which needs attention is handle_ra_miss().  But first I'd like to
reacquaint myself with the intent behind the lazy-readahead patch.  Was
never happy with the complexity and special-cases which that introduced.

> 	cond_resched();
> -	page_cache_readahead(mapping, ra, filp, index);
> +	page_cache_readahead(mapping, ra, filp, index + desc->count);
>

`index' is a pagecache index and desc->count is a byte counter.

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 18:08 ` Andrew Morton
@ 2004-05-03 20:22   ` Ram Pai
  2004-05-03 20:57     ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Ram Pai @ 2004-05-03 20:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, alexeyk, linux-kernel, axboe

On Mon, 2004-05-03 at 11:08, Andrew Morton wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >
> > What ends up happening is that readahead gets turned off, then the
> > 16K read ends up being done in 4 synchronous 4K chunks. Because they
> > are synchronous, they have no chance of being merged with one another
> > either.
>
> yup.
>
> > I have attached a proof of concept hack... I think what should really
> > happen is that page_cache_readahead should be taught about the size
> > of the requested read, and ensures that a decent amount of reading is
> > done while within the read request window, even if
> > beyond-request-window-readahead has been previously unsuccessful.
>
> The "readahead turned itself off" thing is there to avoid doing lots of
> pagecache lookups in the very common case where the file is fully cached.
>
> The place which needs attention is handle_ra_miss().  But first I'd like to
> reacquaint myself with the intent behind the lazy-readahead patch.  Was
> never happy with the complexity and special-cases which that introduced.

lazy-readahead has no role to play here.  The readahead window got closed
because the I/O pattern was totally random.  My guess is that multiple
threads are generating 16K I/O on the same fd.  In such a case the I/Os
can get interleaved, and the readahead window size goes for a toss (which
is the expected behavior).

Well, if this is in fact the case, the question is:

 1. does the I/O pattern really have some sequentiality to
    deserve a readahead?
 2. or should we ensure that the interleaved case be somehow
    handled, by including the size parameter?
I know Nick has implied option (2), but I think from the readahead's
point of view it is (1).

RP

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 20:22 ` Ram Pai
@ 2004-05-03 20:57   ` Andrew Morton
  2004-05-03 21:37     ` Peter Zaitsev
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-05-03 20:57 UTC (permalink / raw)
  To: Ram Pai; +Cc: nickpiggin, alexeyk, linux-kernel, axboe

Ram Pai <linuxram@us.ibm.com> wrote:
>
> > The place which needs attention is handle_ra_miss().  But first I'd like to
> > reacquaint myself with the intent behind the lazy-readahead patch.  Was
> > never happy with the complexity and special-cases which that introduced.
>
> lazy-readahead has no role to play here.

Sure.  But lazy-readahead is bolted on the side and is generally not to my
liking.  I'd like to find a solution to the sysbench problem which also
solves the thing which lazy-readahead addressed.

> The readahead window got closed
> because the i/o pattern was totally random. My guess is multiple threads
> are generating 16k i/o on the same fd. In such a case the i/os can get
> interleaved and the readahead window size goes for a toss (which is
> expected behavior)

I don't think it's that.  The app is doing well-aligned 16k reads and
writes.  If we get enough pagecache hits on the reads, readahead turns
itself off (fair enough) but fails to turn itself on again.

The readahead logic _should_ be able to adapt to the fixed-sized I/Os and
issue correct-sized reads immediately after each seek.  I _think_ this will
fix the problem which lazy-readahead addressed, but as usual we don't have
a rigorous description of that problem :(

> Well if this is infact the case: the question is
> 1. does the i/o pattern really has some sequentiality to
>    deserve a readahead?
> 2. or should we ensure that the interleaved case be somehow
>    handled, by including the size parameter?
>
> I know Nick has implied option (2) but I think from the readahead's
> point of view it is (1),

Readahead has got too complex and is getting band-aidy.  I'd prefer to
tear it down and rethink things.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 20:57 ` Andrew Morton
@ 2004-05-03 21:37   ` Peter Zaitsev
  2004-05-03 21:50     ` Ram Pai
  2004-05-03 21:59     ` Andrew Morton
  0 siblings, 2 replies; 56+ messages in thread
From: Peter Zaitsev @ 2004-05-03 21:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ram Pai, nickpiggin, alexeyk, linux-kernel, axboe

On Mon, 2004-05-03 at 13:57, Andrew Morton wrote:
> Ram Pai <linuxram@us.ibm.com> wrote:
> >
> > > The place which needs attention is handle_ra_miss().  But first I'd like to
> > > reacquaint myself with the intent behind the lazy-readahead patch.  Was
> > > never happy with the complexity and special-cases which that introduced.
> >
> > lazy-readahead has no role to play here.
>

Andrew,

Could you please clarify how these things came to depend on read-ahead
at all?

In my understanding, read-ahead is there to catch sequential (or other)
access patterns and do some advance reading, so that instead of a 16K
request we do a 128K request, or something similar.

But how could disabled read-ahead end up converting a 16K request into
several sequential synchronous 4K requests?  It all looks pretty strange.

-- 
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 21:37 ` Peter Zaitsev
@ 2004-05-03 21:50   ` Ram Pai
  2004-05-03 22:01     ` Peter Zaitsev
  2004-05-03 21:59   ` Andrew Morton
  1 sibling, 1 reply; 56+ messages in thread
From: Ram Pai @ 2004-05-03 21:50 UTC (permalink / raw)
  To: Peter Zaitsev; +Cc: Andrew Morton, nickpiggin, alexeyk, linux-kernel, axboe

On Mon, 2004-05-03 at 14:37, Peter Zaitsev wrote:
> On Mon, 2004-05-03 at 13:57, Andrew Morton wrote:
> > Ram Pai <linuxram@us.ibm.com> wrote:
> > >
> > > > The place which needs attention is handle_ra_miss().  But first I'd like to
> > > > reacquaint myself with the intent behind the lazy-readahead patch.  Was
> > > > never happy with the complexity and special-cases which that introduced.
> > >
> > > lazy-readahead has no role to play here.
> >
> Andrew,
>
> Could you please clarify how this things become to be dependent on
> read-ahead at all.
>
> At my understanding read-ahead it to catch sequential (or other) access
> pattern and do some advance reading, so instead of 16K request we do
> 128K request, or something similar.
>
> But how could read-ahead disabled end up in 16K request converted to
> several sequential synchronous 4K requests ?

When the readahead window gets closed, the code goes into slow-read mode.
In this mode, all requests are broken up into page-sized pieces, so a 16K
request gets broken into 4 4K requests.  This continues until enough
sequential I/O has been requested (i.e., around ra->ra_pages pages), at
which point the readahead window gets re-activated.

Looking at it the other way: without the readahead code, all requests are
satisfied through 4K I/Os.  Readahead is what generates larger I/Os.

RP

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 21:50 ` Ram Pai
@ 2004-05-03 22:01   ` Peter Zaitsev
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zaitsev @ 2004-05-03 22:01 UTC (permalink / raw)
  To: Ram Pai; +Cc: Andrew Morton, nickpiggin, alexeyk, linux-kernel, axboe

On Mon, 2004-05-03 at 14:50, Ram Pai wrote:
>
> Looking at it the other way, without readahead code, all requests
> satisfied through 4k i/os. Readahead helps in generating larger size
> i/os.

Huh, this is really rather strange.  In the database world, random I/O
is quite frequent, and database page sizes are normally larger than the
OS page size.  Furthermore, even if a request is split into 4K blocks,
why are they not submitted in parallel and merged at a lower level?

Anyway, we all seem to agree this is not very good behavior and it
should be fixed :)

-- 
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 21:37 ` Peter Zaitsev
  2004-05-03 21:50   ` Ram Pai
@ 2004-05-03 21:59 ` Andrew Morton
  2004-05-03 22:07   ` Ram Pai
  2004-05-03 23:58   ` Nick Piggin
  1 sibling, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2004-05-03 21:59 UTC (permalink / raw)
  To: Peter Zaitsev; +Cc: linuxram, nickpiggin, alexeyk, linux-kernel, axboe

Peter Zaitsev <peter@mysql.com> wrote:
>
> On Mon, 2004-05-03 at 13:57, Andrew Morton wrote:
> > Ram Pai <linuxram@us.ibm.com> wrote:
> > >
> > > > The place which needs attention is handle_ra_miss().  But first I'd like to
> > > > reacquaint myself with the intent behind the lazy-readahead patch.  Was
> > > > never happy with the complexity and special-cases which that introduced.
> > >
> > > lazy-readahead has no role to play here.
> >
> Andrew,
>
> Could you please clarify how this things become to be dependent on
> read-ahead at all.

readahead is currently the only means by which we build up nice large
multi-page BIOs.

> At my understanding read-ahead it to catch sequential (or other) access
> pattern and do some advance reading, so instead of 16K request we do
> 128K request, or something similar.

That's one of its usage patterns.  It's also supposed to detect the
fixed-sized-reads-seeking-all-over-the-place situation.  In which case it's
supposed to submit correctly-sized multi-page BIOs.  But it's not working
right for this workload.

A naive solution would be to add special-case code which always does the
fixed-size readahead after a seek.  Basically that's

	if (ra->next_size == -1UL)
		force_page_cache_readahead(...)

in filemap.c.  But this means that the kernel does lots of pointless
pagecache lookups when everything is in pagecache.  We should detect this
situation and stop doing readahead completely, until we start getting
pagecache lookup misses again.

> But how could read-ahead disabled end up in 16K request converted to
> several sequential synchronous 4K requests ?
Readahead got itself turned off because of pagecache hits and didn't turn
itself on again.

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 21:59 ` Andrew Morton
@ 2004-05-03 22:07   ` Ram Pai
  0 siblings, 0 replies; 56+ messages in thread
From: Ram Pai @ 2004-05-03 22:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Zaitsev, nickpiggin, alexeyk, linux-kernel, axboe

On Mon, 2004-05-03 at 14:59, Andrew Morton wrote:
>
> > But how could read-ahead disabled end up in 16K request converted to
> > several sequential synchronous 4K requests ?
>
> Readahead got itself turned off because of pagecache hits and didn't turn
> itself on again.

Andrew,

In the slow-read path, every contiguous access increases ra->size by 1
and every non-contiguous access decreases ra->size by 1.  Now, in the
case of a random 16K request, we have 1 non-contiguous page access and
3 contiguous ones.  As a result, ra->size should have been incremented
by -1+1+1+1 = 2 per request.  So at the end of 16 such 16K requests we
should have had ra->size at 32, and from that point onwards readahead
should get turned on.  Right?

I strongly feel the readahead got closed because of misses and not
because of hits.  Moreover, if we are closing the readahead window
because of hits, that implies we have pretty good caching going on,
which implies I/O should rarely hit the disk, and hence performance
should not degrade.  Agree?

RP

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 21:59 ` Andrew Morton
  2004-05-03 22:07   ` Ram Pai
@ 2004-05-03 23:58 ` Nick Piggin
  2004-05-04  0:10   ` Andrew Morton
  1 sibling, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-05-03 23:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Zaitsev, linuxram, alexeyk, linux-kernel, axboe

Andrew Morton wrote:
> Peter Zaitsev <peter@mysql.com> wrote:
>
>>On Mon, 2004-05-03 at 13:57, Andrew Morton wrote:
>>
>>>Ram Pai <linuxram@us.ibm.com> wrote:
>>>
>>>>>The place which needs attention is handle_ra_miss().  But first I'd like to
>>>>>reacquaint myself with the intent behind the lazy-readahead patch.  Was
>>>>>never happy with the complexity and special-cases which that introduced.
>>>>
>>>>lazy-readahead has no role to play here.
>>>
>>Andrew,
>>
>>Could you please clarify how this things become to be dependent on
>>read-ahead at all.
>
> readahead is currently the only means by which we build up nice large
> multi-page BIOs.
>
>>At my understanding read-ahead it to catch sequential (or other) access
>>pattern and do some advance reading, so instead of 16K request we do
>>128K request, or something similar.
>
> That's one of its usage patterns.  It's also supposed to detect the
> fixed-sized-reads-seeking-all-over-the-place situation.  In which case it's
> supposed to submit correctly-sized multi-page BIOs.  But it's not working
> right for this workload.
>
> A naive solution would be to add special-case code which always does the
> fixed-size readahead after a seek.  Basically that's
>
> 	if (ra->next_size == -1UL)
> 		force_page_cache_readahead(...)
>

I think a better solution to this case would be to ensure the
readahead window is always min(size of read, some large number).
The size of the read is basically a free and accurate "hint" to the
minimum size of the required readahead.

Either that or do a simple "preread" while you're still in the read
request window, and run readahead when that completes.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-03 23:58 ` Nick Piggin
@ 2004-05-04  0:10   ` Andrew Morton
  2004-05-04  0:19     ` Nick Piggin
  2004-05-04  8:27     ` Arjan van de Ven
  0 siblings, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2004-05-04 0:10 UTC (permalink / raw)
  To: Nick Piggin; +Cc: peter, linuxram, alexeyk, linux-kernel, axboe

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > That's one of its usage patterns.  It's also supposed to detect the
> > fixed-sized-reads-seeking-all-over-the-place situation.  In which case it's
> > supposed to submit correctly-sized multi-page BIOs.  But it's not working
> > right for this workload.
> >
> > A naive solution would be to add special-case code which always does the
> > fixed-size readahead after a seek.  Basically that's
> >
> > 	if (ra->next_size == -1UL)
> > 		force_page_cache_readahead(...)
> >
>
> I think a better solution to this case would be to ensure the
> readahead window is always min(size of read, some large number);
>

That would cause the kernel to perform lots of pointless pagecache lookups
when the file is already 100% cached.

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-04  0:10 ` Andrew Morton
@ 2004-05-04  0:19   ` Nick Piggin
  2004-05-04  0:50     ` Ram Pai
  2004-05-04  1:15     ` Andrew Morton
  1 sibling, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-05-04 0:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: peter, linuxram, alexeyk, linux-kernel, axboe

Andrew Morton wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>>>That's one of its usage patterns.  It's also supposed to detect the
>>>fixed-sized-reads-seeking-all-over-the-place situation.  In which case it's
>>>supposed to submit correctly-sized multi-page BIOs.  But it's not working
>>>right for this workload.
>>>
>>>A naive solution would be to add special-case code which always does the
>>>fixed-size readahead after a seek.  Basically that's
>>>
>>>	if (ra->next_size == -1UL)
>>>		force_page_cache_readahead(...)
>>>
>>
>>I think a better solution to this case would be to ensure the
>>readahead window is always min(size of read, some large number);
>>
>
> That would cause the kernel to perform lots of pointless pagecache lookups
> when the file is already 100% cached.
>

That's pretty sad. You need a "preread" or something which
sends the pages back... or uses the actor itself. readahead
would then have to be reworked to only run off the end of
the read window, but that is what it should be doing anyway.

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-04  0:19 ` Nick Piggin
@ 2004-05-04  0:50   ` Ram Pai
  2004-05-04  6:29     ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Ram Pai @ 2004-05-04 0:50 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, peter, alexeyk, linux-kernel, axboe

On Mon, 2004-05-03 at 17:19, Nick Piggin wrote:
> Andrew Morton wrote:
> > Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >
> >>>That's one of its usage patterns.  It's also supposed to detect the
> >>>fixed-sized-reads-seeking-all-over-the-place situation.  In which case it's
> >>>supposed to submit correctly-sized multi-page BIOs.  But it's not working
> >>>right for this workload.
> >>>
> >>>A naive solution would be to add special-case code which always does the
> >>>fixed-size readahead after a seek.  Basically that's
> >>>
> >>>	if (ra->next_size == -1UL)
> >>>		force_page_cache_readahead(...)
> >>>
> >>
> >>I think a better solution to this case would be to ensure the
> >>readahead window is always min(size of read, some large number);
> >>
> >
> > That would cause the kernel to perform lots of pointless pagecache lookups
> > when the file is already 100% cached.
> >
>
> That's pretty sad. You need a "preread" or something which
> sends the pages back... or uses the actor itself. readahead
> would then have to be reworked to only run off the end of
> the read window, but that is what it should be doing anyway.

Sorry if I am repeating myself.  I have checked the behaviour of the
readahead code using my user-level simulator, as well as by running a
DSS benchmark and the iozone benchmark.  It generates a steady stream
of large I/Os for large random reads and should not exhibit the bad
behavior that we are seeing.  I feel this bad behavior is caused by
interleaved access by multiple threads.
To illustrate with an example: t1 requests reads from page 100 to 104;
simultaneously t2 requests reads on the same fd from page 200 to 204.
So do_page_cache_readahead() can be called in the following pattern:

	100, 200, 101, 201, 102, 202, 103, 203, 104, 204

Because of this pattern, the readahead code assumes that the read
pattern is absolutely random and hence closes the readahead window.

I think I should generate a patch to validate this behavior; I will.
How about having some /proc counters that keep track of the number of
window closes caused by cache hits and by cache misses?

RP

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-04  0:50 ` Ram Pai
@ 2004-05-04  6:29   ` Andrew Morton
  2004-05-04 15:03     ` Ram Pai
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-05-04 6:29 UTC (permalink / raw)
  To: Ram Pai; +Cc: nickpiggin, peter, alexeyk, linux-kernel, axboe

Ram Pai <linuxram@us.ibm.com> wrote:
>
> Sorry, If I am saying this again. I have checked the behaviour of the
> readahead code using my user level simulator as well as running some
> DSS benchmark and iozone benchmark. It generates a steady stream of
> large i/o for large-random-reads and should not exhibit the bad behavior
> that we are seeing. I feel this bad behavior is because of interleaved
> access by multiple thread.

you're right - the benchmark has multiple threads issuing concurrent
pread()s against the same fd.  For some reason this mucks up the 2.6
readahead state more than 2.4's.

Putting a semaphore around do_generic_file_read() or maintaining the state
as below fixes it up.

I wonder if we should bother fixing this?  I guess as long as the app is
using pread() it is a legitimate thing to be doing, so I guess we should...
--- 25/mm/filemap.c~readahead-seralisation	2004-05-03 23:14:43.399947720 -0700
+++ 25-akpm/mm/filemap.c	2004-05-03 23:14:43.404946960 -0700
@@ -612,7 +612,7 @@ EXPORT_SYMBOL(grab_cache_page_nowait);
  * - note the struct file * is only passed for the use of readpage
  */
 void do_generic_mapping_read(struct address_space *mapping,
-			struct file_ra_state *ra,
+			struct file_ra_state *_ra,
 			struct file * filp,
 			loff_t *ppos,
 			read_descriptor_t * desc,
@@ -622,6 +622,7 @@ void do_generic_mapping_read(struct addr
 	unsigned long index, offset;
 	struct page *cached_page;
 	int error;
+	struct file_ra_state ra = *_ra;
 
 	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
@@ -644,13 +645,13 @@ void do_generic_mapping_read(struct addr
 		}
 		cond_resched();
-		page_cache_readahead(mapping, ra, filp, index);
+		page_cache_readahead(mapping, &ra, filp, index);
 		nr = nr - offset;
 
 find_page:
 		page = find_get_page(mapping, index);
 		if (unlikely(page == NULL)) {
-			handle_ra_miss(mapping, ra, index);
+			handle_ra_miss(mapping, &ra, index);
 			goto no_cached_page;
 		}
 		if (!PageUptodate(page))
@@ -752,6 +753,8 @@ no_cached_page:
 		goto readpage;
 	}
 
+	*_ra = ra;
+
 	*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
 	if (cached_page)
 		page_cache_release(cached_page);
_

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6
  2004-05-04  6:29 ` Andrew Morton
@ 2004-05-04 15:03   ` Ram Pai
  2004-05-04 19:39     ` Ram Pai
  0 siblings, 1 reply; 56+ messages in thread
From: Ram Pai @ 2004-05-04 15:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, peter, alexeyk, linux-kernel, axboe

On Mon, 2004-05-03 at 23:29, Andrew Morton wrote:
>
> Putting a semaphore around do_generic_file_read() or maintaining the state
> as below fixes it up.
>
> I wonder if we should bother fixing this?  I guess as long as the app is
> using pread() it is a legitimate thing to be doing, so I guess we should...

Yes, this patch makes sense.  I have set up sysbench on my lab machine;
let me see how much improvement the patch provides.

RP

^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 15:03 ` Ram Pai @ 2004-05-04 19:39 ` Ram Pai 2004-05-04 19:48 ` Andrew Morton 2004-05-04 23:01 ` Alexey Kopytov 0 siblings, 2 replies; 56+ messages in thread From: Ram Pai @ 2004-05-04 19:39 UTC (permalink / raw) To: Andrew Morton; +Cc: nickpiggin, peter, alexeyk, linux-kernel, axboe [-- Attachment #1: Type: text/plain, Size: 2111 bytes --] On Tue, 2004-05-04 at 08:03, Ram Pai wrote: > On Mon, 2004-05-03 at 23:29, Andrew Morton wrote: > > > > > Putting a semaphore around do_generic_file_read() or maintaining the state > > as below fixes it up. > > > > I wonder if we should bother fixing this? I guess as long as the app is > > using pread() it is a legitimate thing to be doing, so I guess we should... > > > > > > > Yes this patch makes sense. I have setup sysbench on my lab machine. Let > me see how much improvement the patch provides. I ran the following command: /root/sysbench-0.2.5/sysbench/sysbench --num-threads=256 --test=fileio --file-total-size=2800M --file-test-mode=rndrw run Without the patch: ------------------ Operations performed: 5959 Read, 4041 Write, 10752 Other = 20752 Total Read 93Mb Written 63Mb Total Transferred 156Mb 7.549Mb/sec Transferred 483.89 Requests/sec executed Test execution Statistics summary: Time spent for test: 20.6661s no of times window reset because of hits: 0 no of times window reset because of misses: 7 no of times window was shrunk because of hits: 6716 no of times the page request was non-contiguous: 5880 no of times the page request was contiguous : 19639 With the patch: -------------- Operations performed: 5960 Read, 4040 Write, 10880 Other = 20880 Total Read 93Mb Written 63Mb Total Transferred 156Mb 7.985Mb/sec Transferred 511.85 Requests/sec executed Test execution Statistics summary: Time spent for test: 19.5370s no of times window got reset because of hits: 0 no of times window got reset because of misses: 0 no of times window was shrunk because of hits: 5844 no of 
times the page request was non-contiguous: 5830 no of times the page request was contiguous : 20232 I have enclosed the patch that collects the hit/miss related counts. In general I am not seeing any major difference with or without Andrew's ra-copy patch, except for the readahead window getting closed because of misses when run without the patch. It would be nice if Alexey tried the patch on his machine and checked for any major difference. RP [-- Attachment #2: ra_instrumentation.patch --] [-- Type: text/x-patch, Size: 4017 bytes --] diff -urNp linux-2.6.6-rc3/include/linux/sysctl.h linux-2.6.6-rc3.new/include/linux/sysctl.h --- linux-2.6.6-rc3/include/linux/sysctl.h 2004-04-27 18:35:49.000000000 -0700 +++ linux-2.6.6-rc3.new/include/linux/sysctl.h 2004-05-04 18:26:37.911973080 -0700 @@ -643,6 +643,11 @@ enum FS_XFS=17, /* struct: control xfs parameters */ FS_AIO_NR=18, /* current system-wide number of aio requests */ FS_AIO_MAX_NR=19, /* system-wide maximum number of aio requests */ + FS_READ_MISS_RESET=20, + FS_READ_HIT_RESET=21, + FS_CONTIGUOUS_CNT=22, + FS_NON_CONTIGUOUS_CNT=23, + FS_HIT_COUNT=24, }; /* /proc/sys/fs/quota/ */ diff -urNp linux-2.6.6-rc3/kernel/sysctl.c linux-2.6.6-rc3.new/kernel/sysctl.c --- linux-2.6.6-rc3/kernel/sysctl.c 2004-04-27 18:35:08.000000000 -0700 +++ linux-2.6.6-rc3.new/kernel/sysctl.c 2004-05-04 18:38:58.774344880 -0700 @@ -64,6 +64,11 @@ extern int sysctl_lower_zone_protection; extern int min_free_kbytes; extern int printk_ratelimit_jiffies; extern int printk_ratelimit_burst; +extern atomic_t hit_reset; +extern atomic_t miss_reset; +extern atomic_t hit_count; +extern atomic_t contiguous_cnt; +extern atomic_t non_contiguous_cnt; /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static int maxolduid = 65535; @@ -897,6 +902,46 @@ static ctl_table fs_table[] = { .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = FS_READ_MISS_RESET,
+ .procname = "read-miss-reset", + .data = &miss_reset, + .maxlen = sizeof(miss_reset), + .mode = 0444, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = FS_READ_HIT_RESET, + .procname = "read-hit-reset", + .data = &hit_reset, + .maxlen = sizeof(hit_reset), + .mode = 0444, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = FS_CONTIGUOUS_CNT, + .procname = "read-contiguous-cnt", + .data = &contiguous_cnt, + .maxlen = sizeof(contiguous_cnt), + .mode = 0444, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = FS_NON_CONTIGUOUS_CNT, + .procname = "read-non-contiguous-cnt", + .data = &non_contiguous_cnt, + .maxlen = sizeof(non_contiguous_cnt), + .mode = 0444, + .proc_handler = &proc_dointvec, + }, + { + .ctl_name = FS_HIT_COUNT, + .procname = "read-hit-count", + .data = &hit_count, + .maxlen = sizeof(hit_count), + .mode = 0444, + .proc_handler = &proc_dointvec, + }, { .ctl_name = 0 } }; diff -urNp linux-2.6.6-rc3/mm/readahead.c linux-2.6.6-rc3.new/mm/readahead.c --- linux-2.6.6-rc3/mm/readahead.c 2004-04-27 18:35:06.000000000 -0700 +++ linux-2.6.6-rc3.new/mm/readahead.c 2004-05-04 18:37:20.681257296 -0700 @@ -316,6 +316,12 @@ int do_page_cache_readahead(struct addre return 0; } +atomic_t hit_reset= ATOMIC_INIT(0); +atomic_t miss_reset= ATOMIC_INIT(0); +atomic_t hit_count= ATOMIC_INIT(0); +atomic_t contiguous_cnt = ATOMIC_INIT(0); +atomic_t non_contiguous_cnt= ATOMIC_INIT(0); + /* * Check how effective readahead is being. If the amount of started IO is * less than expected then the file is partly or fully in pagecache and @@ -331,11 +337,13 @@ check_ra_success(struct file_ra_state *r if (actual == 0) { if (orig_next_size > 1) { ra->next_size = orig_next_size - 1; + atomic_inc(&hit_count); if (ra->ahead_size) ra->ahead_size = ra->next_size; } else { ra->next_size = -1UL; ra->size = 0; + atomic_inc(&hit_reset); } } } @@ -406,17 +414,20 @@ page_cache_readahead(struct address_spac * page beyond the end. Expand the next readahead size. 
*/ ra->next_size += 2; + atomic_inc(&contiguous_cnt); } else { /* * A miss - lseek, pagefault, pread, etc. Shrink the readahead * window. */ ra->next_size -= 2; + atomic_inc(&non_contiguous_cnt); } if ((long)ra->next_size > (long)max) ra->next_size = max; if ((long)ra->next_size <= 0L) { + atomic_inc(&miss_reset); ra->next_size = -1UL; ra->size = 0; goto out; /* Readahead is off */ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 19:39 ` Ram Pai @ 2004-05-04 19:48 ` Andrew Morton 2004-05-04 19:58 ` Ram Pai 2004-05-04 23:01 ` Alexey Kopytov 1 sibling, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-04 19:48 UTC (permalink / raw) To: Ram Pai; +Cc: nickpiggin, peter, alexeyk, linux-kernel, axboe Ram Pai <linuxram@us.ibm.com> wrote: > > I ran the following command: > > /root/sysbench-0.2.5/sysbench/sysbench --num-threads=256 --test=fileio > --file-total-size=2800M --file-test-mode=rndrw run > Alexey and I have been using 16 threads. You don't tell us how much memory your lab machine has. The above command only makes sense if it is less than 400 megabytes. Otherwise many or all of the reads are satisfied from pagecache. I've been testing with mem=256M, --file-total-size=2G. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 19:48 ` Andrew Morton @ 2004-05-04 19:58 ` Ram Pai 2004-05-04 21:51 ` Ram Pai 0 siblings, 1 reply; 56+ messages in thread From: Ram Pai @ 2004-05-04 19:58 UTC (permalink / raw) To: Andrew Morton; +Cc: nickpiggin, peter, alexeyk, linux-kernel, axboe On Tue, 2004-05-04 at 12:48, Andrew Morton wrote: > Ram Pai <linuxram@us.ibm.com> wrote: > > > > I ran the following command: > > > > /root/sysbench-0.2.5/sysbench/sysbench --num-threads=256 --test=fileio > > --file-total-size=2800M --file-test-mode=rndrw run > > > > Alexey and I have been using 16 threads. > > You don't tell us how much memory your lab machine has. It has 8GB but only 4GB is being used. I will try with 256MB and 16 threads. RP ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 19:58 ` Ram Pai @ 2004-05-04 21:51 ` Ram Pai 2004-05-04 22:29 ` Ram Pai 0 siblings, 1 reply; 56+ messages in thread From: Ram Pai @ 2004-05-04 21:51 UTC (permalink / raw) To: Andrew Morton; +Cc: nickpiggin, peter, alexeyk, linux-kernel, axboe On Tue, 2004-05-04 at 12:58, Ram Pai wrote: > On Tue, 2004-05-04 at 12:48, Andrew Morton wrote: > > Ram Pai <linuxram@us.ibm.com> wrote: > > > > > > I ran the following command: > > > > > > /root/sysbench-0.2.5/sysbench/sysbench --num-threads=256 --test=fileio > > > --file-total-size=2800M --file-test-mode=rndrw run > > > > > > > Alexey and I have been using 16 threads. > > /root/sysbench-0.2.5/sysbench/sysbench --num-threads=16 --test=fileio --file-total-size=2800M --file-test-mode=rndrw run Without the patch: ------------------ Operations performed: 6002 Read, 3998 Write, 12800 Other = 22800 Total Read 93Mb Written 62Mb Total Transferred 156Mb 1.967Mb/sec Transferred 126.11 Requests/sec executed Test execution Statistics summary: Time spent for test: 79.2986s no of times window reset because of hits: 0 no of times window reset because of misses: 119 no of times window was shrunk because of hits: 417 no of times the page request was non-contiguous: 3809 no of times the page request was contiguous : 12745 With the patch: -------------- Operations performed: 6002 Read, 3999 Write, 12672 Other = 22673 Total Read 93Mb Written 62Mb Total Transferred 156Mb 2.927Mb/sec Transferred 187.65 Requests/sec executed Test execution Statistics summary: Time spent for test: 53.2949s no of times window reset because of hits: 0 no of times window reset because of misses: 0 no of times window was shrunk because of hits: 360 no of times the page request was non-contiguous: 5860 no of times the page request was contiguous : 20378 Impressive results. Would be nice to get a confirmation from Alexey. RP ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 21:51 ` Ram Pai @ 2004-05-04 22:29 ` Ram Pai 0 siblings, 0 replies; 56+ messages in thread From: Ram Pai @ 2004-05-04 22:29 UTC (permalink / raw) To: Andrew Morton; +Cc: nickpiggin, peter, alexeyk, linux-kernel, axboe On Tue, 2004-05-04 at 14:51, Ram Pai wrote: memory used is ***256MB***. > /root/sysbench-0.2.5/sysbench/sysbench --num-threads=16 --test=fileio > --file-total-size=2800M --file-test-mode=rndrw run > RP ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 19:39 ` Ram Pai 2004-05-04 19:48 ` Andrew Morton @ 2004-05-04 23:01 ` Alexey Kopytov 2004-05-04 23:20 ` Andrew Morton 1 sibling, 1 reply; 56+ messages in thread From: Alexey Kopytov @ 2004-05-04 23:01 UTC (permalink / raw) To: Ram Pai; +Cc: Andrew Morton, nickpiggin, peter, linux-kernel, axboe Ram Pai wrote: >Without the patch: >------------------ >Time spent for test: 20.6661s > >no of times window reset because of hits: 0 >no of times window reset because of misses: 7 >no of times window was shrunk because of hits: 6716 >no of times the page request was non-contiguous: 5880 >no of times the page request was contiguous : 19639 > >With the patch: >-------------- >Time spent for test: 19.5370s > >no of times window got reset because of hits: 0 >no of times window got reset because of misses: 0 >no of times window was shrunk because of hits: 5844 >no of times the page request was non-contiguous: 5830 >no of times the page request was contiguous : 20232 > >Would be nice if Alexey tries the patch on his machine and sees any >major difference. Here's what I have (same hardware and test setups): Without the patch (but with Ram's patch applied): ------------------ Time spent for test: 125.4429s no of times window reset because of hits: 0 no of times window reset because of misses: 127 no of times window was shrunk because of hits: 1153 no of times the page request was non-contiguous: 3968 no of times the page request was contiguous : 10686 With the patch: --------------- Time spent for test: 86.5459s no of times window reset because of hits: 0 no of times window reset because of misses: 0 no of times window was shrunk because of hits: 1066 no of times the page request was non-contiguous: 5860 no of times the page request was contiguous : 18099 I wonder if there are some plans to further improve 2.6 behavior on this workload to match that of 2.4? 
Is the remaining regression a result of the different readahead handling, or might it be caused by the IDE driver or I/O scheduler tuning? -- Alexey Kopytov, Software Developer MySQL AB, www.mysql.com Are you MySQL certified? www.mysql.com/certification ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 23:01 ` Alexey Kopytov @ 2004-05-04 23:20 ` Andrew Morton 2004-05-05 22:04 ` Alexey Kopytov 0 siblings, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-04 23:20 UTC (permalink / raw) To: Alexey Kopytov; +Cc: linuxram, nickpiggin, peter, linux-kernel, axboe Alexey Kopytov <alexeyk@mysql.com> wrote: > > Without the patch (but with Ram's patch applied): > ------------------ > Time spent for test: 125.4429s > > no of times window reset because of hits: 0 > no of times window reset because of misses: 127 > no of times window was shrunk because of hits: 1153 > no of times the page request was non-contiguous: 3968 > no of times the page request was contiguous : 10686 > > With the patch: > --------------- > Time spent for test: 86.5459s > > no of times window reset because of hits: 0 > no of times window reset because of misses: 0 > no of times window was shrunk because of hits: 1066 > no of times the page request was non-contiguous: 5860 > no of times the page request was contiguous : 18099 > The patch brought my test box to the same speed as 2.4. With the deadline scheduler it was a bit faster than 2.4. I didn't do a lot of testing though. I was using ext2. Please try deadline. > I wonder if there are some plans to further improve 2.6 behavior on this > workload to match that of 2.4? Of course... Tuning work is being done on the anticipatory scheduler which we hope will bring it up to deadline throughput for this sort of workload. > Is the remaing regression a result of the > different readahead handling, or it might be caused by IDE driver or I/O > scheduler tuning? Don't know yet. On 2.6 the test actually does about 5% fewer reads than under 2.4, so the VM page replacement is working a bit better in this case. And 2.6 does about 40% fewer context switches for some reason. So we should be a little bit faster - it's a matter of finding where the additional seeks or idle time are coming from. 
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 23:20 ` Andrew Morton @ 2004-05-05 22:04 ` Alexey Kopytov 2004-05-06 8:43 ` Andrew Morton 0 siblings, 1 reply; 56+ messages in thread From: Alexey Kopytov @ 2004-05-05 22:04 UTC (permalink / raw) To: Andrew Morton; +Cc: linuxram, nickpiggin, peter, linux-kernel, axboe Andrew Morton wrote: >Alexey Kopytov <alexeyk@mysql.com> wrote: >> With the patch: >> --------------- >> Time spent for test: 86.5459s >> >> no of times window reset because of hits: 0 >> no of times window reset because of misses: 0 >> no of times window was shrunk because of hits: 1066 >> no of times the page request was non-contiguous: 5860 >> no of times the page request was contiguous : 18099 > >The patch brought my test box to the same speed as 2.4. With the deadline >scheduler it was a bit faster than 2.4. I didn't do a lot of testing >though. I was using ext2. Please try deadline. > Results with the deadline scheduler on my hardware: Time spent for test: 92.8340s no of times window reset because of hits: 0 no of times window reset because of misses: 0 no of times window was shrunk because of hits: 1108 no of times the page request was non-contiguous: 5860 no of times the page request was contiguous : 18091 I have updated the results on the SysBench home page with 2.6.6-rc3 with the patch applied. -- Alexey Kopytov, Software Developer MySQL AB, www.mysql.com Are you MySQL certified? www.mysql.com/certification ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-05 22:04 ` Alexey Kopytov @ 2004-05-06 8:43 ` Andrew Morton 2004-05-06 18:13 ` Peter Zaitsev 2004-05-10 19:50 ` Ram Pai 0 siblings, 2 replies; 56+ messages in thread From: Andrew Morton @ 2004-05-06 8:43 UTC (permalink / raw) To: Alexey Kopytov; +Cc: linuxram, nickpiggin, peter, linux-kernel, axboe Alexey Kopytov <alexeyk@mysql.com> wrote: > > Results with the deadline scheduler on my hardware: > > Time spent for test: 92.8340s Now we're into unreproducible results, alas. On a 256MB uniprocessor machine: ext3: sysbench --num-threads=16 --test=fileio --file-total-size=2G --file-test-mode=rndrw run 2.6.6-rc3-mm2, deadline: Time spent for test: 66.7536s Time spent for test: 67.9000s 0.04s user 6.41s system 4% cpu 2:14.74 total 2.6.6-rc2-mm2, as: Time spent for test: 66.7576s 0.07s user 6.68s system 5% cpu 2:14.18 total Time spent for test: 66.3216s 0.06s user 6.28s system 4% cpu 2:12.25 total 2.4.27-pre2: Time spent for test: 64.9766s 0.09s user 11.57s system 8% cpu 2:17.43 total Time spent for test: 64.2852s 0.11s user 11.18s system 8% cpu 2:14.63 total so 2.6 is a shade slower. 2.6 has tons less system CPU time, probably due to ext3 improvements. The reason for the difference appears to be the thing which Ram added to readahead which causes it to usually read one page too many. With this exciting patch: --- 25/mm/readahead.c~a 2004-05-06 01:24:26.230330464 -0700 +++ 25-akpm/mm/readahead.c 2004-05-06 01:24:26.234329856 -0700 @@ -475,7 +475,7 @@ do_io: ra->ahead_start = 0; /* Invalidate these */ ra->ahead_size = 0; actual = do_page_cache_readahead(mapping, filp, offset, - ra->size); + ra->size == 5 ? 4 : ra->size); if(!first_access) { /* * do not adjust the readahead window size the first _ I get: Time spent for test: 63.9435s 0.07s user 6.69s system 5% cpu 2:11.02 total which is a good result. Ram, can you take a look at fixing that up please? 
Something clean, not more hacks ;) I'd also be interested in an explanation of what the extra page is for. The little comment in there doesn't really help. One thing I note about this test is that it generates a huge number of inode writes: atime updates from the reads and mtime updates from the writes. Suppressing them doesn't actually make a lot of performance difference, but that is with writeback caching enabled. I expect that with a writethrough cache these will really hurt. The test uses 128 files, which seems excessive. I assume that four or eight files is a more likely real-life setup, and in this case the atime/mtime update volume will be proportionately less. Alexey, I do not know why you're seeing such a disparity. I assume that IDE DMA is enabled - the difference seems too small for that to be an explanation, but please check it. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-06 8:43 ` Andrew Morton @ 2004-05-06 18:13 ` Peter Zaitsev 2004-05-06 21:49 ` Andrew Morton 2004-05-10 19:50 ` Ram Pai 1 sibling, 1 reply; 56+ messages in thread From: Peter Zaitsev @ 2004-05-06 18:13 UTC (permalink / raw) To: Andrew Morton; +Cc: Alexey Kopytov, linuxram, nickpiggin, linux-kernel, axboe On Thu, 2004-05-06 at 01:43, Andrew Morton wrote: > > One thing I note about this test is that it generates a huge number of > inode writes. atime updates from the reads and mtime updates from the > writes. Suppressing them doesn't actually make a lot of performance > difference, but that is with writeback caching enabled. I expect that with > a writethrough cache these will really hurt. Perhaps. By the way, is there a way to disable modification-time updates as well? It would make quite good sense for a partition used for database needs - you do not need the last modification time in most cases. > > The test uses 128 files, which seems excessive. I assume that four or > eight files is a more likely real-life setup, and in this case the > atime/mtime update volume will be proportionately less. Actually, both a single file (or very few files) and a large number of files are practical setups. In MySQL 4.1 we have the option to store each Innodb table in its own file, which will mean scattered random IO to many files for OLTP workloads. You might think 128 actively used tables is still too much, but in practice we see even larger numbers - some customers partition data, creating a huge number of tables with the same structure, for example a table per customer. -- Peter Zaitsev, Senior Support Engineer MySQL AB, www.mysql.com ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-06 18:13 ` Peter Zaitsev @ 2004-05-06 21:49 ` Andrew Morton 2004-05-06 23:49 ` Nick Piggin 0 siblings, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-06 21:49 UTC (permalink / raw) To: Peter Zaitsev; +Cc: alexeyk, linuxram, nickpiggin, linux-kernel, axboe Peter Zaitsev <peter@mysql.com> wrote: > > On Thu, 2004-05-06 at 01:43, Andrew Morton wrote: > > > > > One thing I note about this test is that it generates a huge number of > > inode writes. atime updates from the reads and mtime updates from the > > writes. Suppressing them doesn't actually make a lot of performance > > difference, but that is with writeback caching enabled. I expect that with > > a writethrough cache these will really hurt. > > Perhaps. By the way is there a way to disable update time modification > as well ? No, there is not. > It would make quite a good sense for partition used for > Database needs - you do not need last modification time in most cases. First up, one needs to remove the inode_update_time() call from generic_file_aio_write_nolock() and run the tests. If this (and noatime) indeed makes a significant difference (probably on writethrough-caching disks) then yup, we should do something. `nomtime' would be simple enough. But another option would be to arrange for a/m/ctime dirtiness to not cause an inode writeout in fsync(). Instead, only sync the a/m/ctime-dirty inodes via sync, umount and pdflush. That way, the inodes get written every thirty seconds rather than once per second. It's probably not standards-compliant, but shoot me. Who cares if the mtimes come up 30 seconds out of date after a system crash? `nomtime' would be simpler and safer to implement, but not as nice. But we need those numbers first. I'll take a look. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-06 21:49 ` Andrew Morton @ 2004-05-06 23:49 ` Nick Piggin 2004-05-07 1:29 ` Peter Zaitsev 0 siblings, 1 reply; 56+ messages in thread From: Nick Piggin @ 2004-05-06 23:49 UTC (permalink / raw) To: Andrew Morton; +Cc: Peter Zaitsev, alexeyk, linuxram, linux-kernel, axboe Andrew Morton wrote: > Peter Zaitsev <peter@mysql.com> wrote: > >>On Thu, 2004-05-06 at 01:43, Andrew Morton wrote: >> >> >>>One thing I note about this test is that it generates a huge number of >>>inode writes. atime updates from the reads and mtime updates from the >>>writes. Suppressing them doesn't actually make a lot of performance >>>difference, but that is with writeback caching enabled. I expect that with >>>a writethrough cache these will really hurt. >> >>Perhaps. By the way is there a way to disable update time modification >>as well ? > > > No, there is not. > > >>It would make quite a good sense for partition used for >>Database needs - you do not need last modification time in most cases. > > > First up, one needs to remove the inode_update_time() call from > generic_file_aio_write_nolock() and run the tests. If this (and noatime) > indeed makes a significant difference (probably on writethrough-caching > disks) then yup, we should do something. > > `nomtime' would be simple enough. But another option would be to arrange > for a/m/ctime dirtiness to not cause an inode writeout in fsync(). > Instead, only sync the a/m/ctime-dirty inodes via sync, umount and pdflush. > > That way, the inodes get written every thirty seconds rather than once per > second. > > It's probably not standards-compliant, but shoot me. Who cares if the > mtimes come up 30 seconds out of date after a system crash? > > `nomtime' would be simpler and safer to implement, but not as nice. > > But we need those numbers first. I'll take a look. > Can they use fdatasync? Does it do the right thing on Linux? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-06 23:49 ` Nick Piggin @ 2004-05-07 1:29 ` Peter Zaitsev 0 siblings, 0 replies; 56+ messages in thread From: Peter Zaitsev @ 2004-05-07 1:29 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, alexeyk, linuxram, linux-kernel, axboe On Thu, 2004-05-06 at 16:49, Nick Piggin wrote: > > > > `nomtime' would be simpler and safer to implement, but not as nice. > > > > But we need those numbers first. I'll take a look. > > > > Can they use fdatasync? Does it do the right thing on Linux? Nick, You're right. fdatasync is supposed to be the solution in this case, and the test actually supports this mode, as does MySQL :) On the other hand, if you'd rather use O_DSYNC, it does not seem to work, being mapped to O_SYNC. But the thing I'm mostly interested in is O_DIRECT. It seems to be the best solution for many database needs, especially when used together with asynchronous IO. There is, however, no matching option which does not flush metadata. -- Peter Zaitsev, Senior Support Engineer MySQL AB, www.mysql.com ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-06 8:43 ` Andrew Morton 2004-05-06 18:13 ` Peter Zaitsev @ 2004-05-10 19:50 ` Ram Pai 2004-05-10 20:21 ` Andrew Morton 1 sibling, 1 reply; 56+ messages in thread From: Ram Pai @ 2004-05-10 19:50 UTC (permalink / raw) To: Andrew Morton; +Cc: Alexey Kopytov, nickpiggin, peter, linux-kernel, axboe On Thu, 2004-05-06 at 01:43, Andrew Morton wrote: Sorry, I am out for 10 days, hence the late replies. > The reason for the difference appears to be the thing which Ram added to > readahead which causes it to usually read one page too many. With this > exciting patch: > > --- 25/mm/readahead.c~a 2004-05-06 01:24:26.230330464 -0700 > +++ 25-akpm/mm/readahead.c 2004-05-06 01:24:26.234329856 -0700 > @@ -475,7 +475,7 @@ do_io: > ra->ahead_start = 0; /* Invalidate these */ > ra->ahead_size = 0; > actual = do_page_cache_readahead(mapping, filp, offset, > - ra->size); > + ra->size == 5 ? 4 : ra->size); > if(!first_access) { > /* > * do not adjust the readahead window size the first > > _ > > > I get: > > Time spent for test: 63.9435s > 0.07s user 6.69s system 5% cpu 2:11.02 total > > which is a good result. > > Ram, can you take a look at fixing that up please? Something clean, not > more hacks ;) I'd also be interested in an explanation of what the extra > page is for. The little comment in there doesn't really help. The reason for the extra page read is as follows: Consider random 16k read I/Os. Reads are generated 4 pages at a time. The readahead is triggered when the 4th page in the 'current-window' is touched. However, the data which is read in through the 'readahead window' gets thrown away, because the next 16k read I/O will not access anything read in the readahead window. As a result I put in that optimization to handle these wasted readahead pages. The idea is: when we miss the current-window, read one more page than the number of pages accessed in the current-window.
Here is an example scenario of random 16k I/Os and with Andrew's code actual = do_page_cache_readahead(mapping, filp, offset, - ra->size); + ra->size == 5 ? 4 : ra->size); Consider that the application accesses pages {1,2,3,4} {100,101,102,103} {200,201,202,203} Consider that the current-window holds 4 pages, i.e. pages 1,2,3,4. When the application asks for {1,2,3,4} we happily satisfy them through the current-window. However, when the application touches page 4, the lazy-readahead kicks in and brings in pages {5,6,7,8,9,10,11,12}, but now the application wants to access {100,101,102,103}. This waste of effort is probably bearable as long as we don't commit the same mistake in the future. When the application tries to access {100,101,102,103} the code then scraps both the current-window and the readahead-window and reads in a new current-window of size 4, i.e. {100,101,102,103}. However, when the application touches page 103, the lazy-readahead gets triggered and brings in 8 more pages {104,105,106,107,108,109,110,111}, and as always all these pages go wasted. This wastage continues forever. My optimization [I mean hack ;)] was meant to avoid this bad behavior. Instead of reading in 'the number of pages accessed in the current-window', I read in 'one more page than the number of pages accessed in the current-window'. With this optimization the behavior changes as follows: when the application asks for {1,2,3,4} we happily satisfy them through the current-window. However, when the application touches page 4, the lazy-readahead triggers and brings in pages {5,6,7,8,9,10,11,12}, but now the application wants to access {100,101,102,103}. This bad behavior is probably OK since the optimization ensures that we do not commit the same mistake in the future. When the application tries to access {100,101,102,103} the code then scraps both the current window and the readahead window and reads in a new current window of size 4+1, i.e. {100,101,102,103,104}.
However, since the application does not touch page 104, lazy-readahead does not get triggered and we do not waste effort bringing in pages. And this nice behavior continues forever. We may see marginal degradation from this optimization with 16k I/O, but the amount of wastage avoided by this optimization (hack) is great when the random I/O is of larger size. I think it was 4% better performance on a DSS workload with 64k random reads. Do you still think it's a hack? Also, I think with the sysbench workload and Andrew's ra-copy patch, we might be losing some benefits of the optimization, because if two threads simultaneously work with copies of the same ra structure and update it, the optimization effect reflected in one of the ra structures is lost, depending on which ra structure gets copied back last. RP ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-10 19:50 ` Ram Pai @ 2004-05-10 20:21 ` Andrew Morton 2004-05-10 22:39 ` Ram Pai 0 siblings, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-10 20:21 UTC (permalink / raw) To: Ram Pai; +Cc: alexeyk, nickpiggin, peter, linux-kernel, axboe Ram Pai <linuxram@us.ibm.com> wrote: > > > Ram, can you take a look at fixing that up please? Something clean, not > > more hacks ;) I'd also be interested in an explanation of what the extra > > page is for. The little comment in there doesn't really help. > > > The reason for the extra page read is as follows: > > Consider 16k random reads i/os. Reads are generated 4pages at a time. > > the readahead is triggered when the 4th page in the 'current-window' is > touched. Right. We've added two whole unsigned longs to the file_struct to track the access patterns. That should be sufficient for us to detect when the access pattern is random, and to then not perform readahead due to a current-window miss *at all*. So that extra page can go away, and: --- 25/mm/readahead.c~a Mon May 10 13:16:59 2004 +++ 25-akpm/mm/readahead.c Mon May 10 13:17:22 2004 @@ -492,21 +492,17 @@ do_io: */ if (ra->ahead_start == 0) { /* - * if the average io-size is less than maximum + * If the average io-size is less than maximum * readahead size of the file the io pattern is * sequential. Hence bring in the readahead window * immediately. - * Else the i/o pattern is random. Bring - * in the readahead window only if the last page of - * the current window is accessed (lazy readahead). 
*/ unsigned long average = ra->average; if (ra->serial_cnt > average) average = (ra->serial_cnt + ra->average) / 2; - if ((average >= max) || (offset == (ra->start + - ra->size - 1))) { + if (average >= max) { ra->ahead_start = ra->start + ra->size; ra->ahead_size = ra->next_size; actual = do_page_cache_readahead(mapping, filp, _ That way, we read the correct amount of data, and we only start I/O when we know the application is going to actually use the data. This may cause problems when the application transitions from seeky-access to linear-access. Does it sound feasible? > > Probably we may see marginal degradation of this optimization with 16k > i/o but the amount of wastage avoided by this optimization (hack) > is great when random i/o is of larger size. I think it was 4% better > performance on DSS workload with 64k random reads. 64k sounds unusually large. We need top performance at 8k too. > Do you still think its a hack? yup ;) > Also I think with sysbench workload and Andrew's ra-copy patch, we > might be loosing some benefits of some of the optimization because > if two threads simulteously work with copies of the same ra structure > and update it, the optimization effect reflected in one of the > ra-structure is lost depending on which ra structure gets copied back > last. hm, maybe. That only makes a difference if two threads are accessing the same fd at the same time, and it was really bad before the patch. The IO patterns seemed OK to me with the patch. Except it's reading one page too many. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-10 20:21 ` Andrew Morton @ 2004-05-10 22:39 ` Ram Pai 2004-05-10 23:07 ` Andrew Morton 0 siblings, 1 reply; 56+ messages in thread From: Ram Pai @ 2004-05-10 22:39 UTC (permalink / raw) To: Andrew Morton; +Cc: alexeyk, nickpiggin, peter, linux-kernel, axboe On Mon, 2004-05-10 at 13:21, Andrew Morton wrote: > Ram Pai <linuxram@us.ibm.com> wrote: > > > > > Ram, can you take a look at fixing that up please? Something clean, not > > > more hacks ;) I'd also be interested in an explanation of what the extra > > > page is for. The little comment in there doesn't really help. > > > > > > The reason for the extra page read is as follows: > > > > Consider 16k random reads i/os. Reads are generated 4pages at a time. > > > > the readahead is triggered when the 4th page in the 'current-window' is > > touched. > > Right. We've added two whole unsigned longs to the file_struct to track > the access patterns. That should be sufficient for us to detect when the > access pattern is random, and to then not perform readahead due to a > current-window miss *at all*. > > So that extra page can go away, and: > > --- 25/mm/readahead.c~a Mon May 10 13:16:59 2004 > +++ 25-akpm/mm/readahead.c Mon May 10 13:17:22 2004 > @@ -492,21 +492,17 @@ do_io: > */ > if (ra->ahead_start == 0) { > /* > - * if the average io-size is less than maximum > + * If the average io-size is less than maximum > * readahead size of the file the io pattern is > * sequential. Hence bring in the readahead window > * immediately. > - * Else the i/o pattern is random. Bring > - * in the readahead window only if the last page of > - * the current window is accessed (lazy readahead). 
> */ > unsigned long average = ra->average; > > if (ra->serial_cnt > average) > average = (ra->serial_cnt + ra->average) / 2; > > - if ((average >= max) || (offset == (ra->start + > - ra->size - 1))) { > + if (average >= max) { > ra->ahead_start = ra->start + ra->size; > ra->ahead_size = ra->next_size; > actual = do_page_cache_readahead(mapping, filp, > > _ > > > That way, we read the correct amount of data, and we only start I/O when we > know the application is going to actually use the data. > > This may cause problems when the application transitions from seeky-access > to linear-access. > > Does it sound feasible? I am nervous about this change. You are totally getting rid of lazy-readahead and that was the optimization which gave the best possible boost in performance. Let me see how this patch does with a DSS benchmark. > > > > > Probably we may see marginal degradation of this optimization with 16k > > i/o but the amount of wastage avoided by this optimization (hack) > > is great when random i/o is of larger size. I think it was 4% better > > performance on DSS workload with 64k random reads. > > 64k sounds unusually large. We need top performance at 8k too. > > > Do you still think its a hack? > > yup ;) > :-( > > Also I think with sysbench workload and Andrew's ra-copy patch, we > > might be loosing some benefits of some of the optimization because > > if two threads simulteously work with copies of the same ra structure > > and update it, the optimization effect reflected in one of the > > ra-structure is lost depending on which ra structure gets copied back > > last. > > hm, maybe. That only makes a difference if two threads are accessing the > same fd at the same time, and it was really bad before the patch. The IO > patterns seemed OK to me with the patch. Except it's reading one page too > many. In the normal large random workload this extra page would have compensated for all the wasted readaheads.
However in the case of sysbench with Andrew's ra-copy patch the readahead calculation is not happening quite right. Is it worth trying to get a marginal gain with sysbench at the cost of getting a big hit on DSS benchmarks, aio-tests, iozone and probably others? Or am I making an unsubstantiated claim? I will get back with results. RP ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-10 22:39 ` Ram Pai @ 2004-05-10 23:07 ` Andrew Morton 2004-05-11 20:51 ` Ram Pai 2004-05-11 22:26 ` Random file I/O regressions in 2.6 Bill Davidsen 0 siblings, 2 replies; 56+ messages in thread From: Andrew Morton @ 2004-05-10 23:07 UTC (permalink / raw) To: Ram Pai; +Cc: alexeyk, nickpiggin, peter, linux-kernel, axboe Ram Pai <linuxram@us.ibm.com> wrote: > > I am nervous about this change. You are totally getting rid of > lazy-readahead and that was the optimization which gave the best > possible boost in performance. Because it disabled the large readahead outside the area which the app is reading. But it's still reading too much. > Let me see how this patch does with a DSS benchmark. That was not a real patch. More work is surely needed to get that right. > In the normal large random workload this extra page would have > compesated for all the wasted readaheads. I disagree that 64k is "normal"! > However in the case of > sysbench with Andrew's ra-copy patch the readahead calculation is not > happening quiet right. Is it worth trying to get a marginal gain > with sysbench at the cost of getting a big hit on DSS benchmarks, > aio-tests,iozone and probably others. Or am I making an unsubstantiated > claim? I will get back with results. It shouldn't hurt at all - the app does a seek, we perform the correctly-sized read. As I say, my main concern is that we correctly transition from seeky access to linear access and resume readahead. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-10 23:07 ` Andrew Morton @ 2004-05-11 20:51 ` Ram Pai 2004-05-11 21:17 ` Andrew Morton 2004-05-11 22:26 ` Random file I/O regressions in 2.6 Bill Davidsen 1 sibling, 1 reply; 56+ messages in thread From: Ram Pai @ 2004-05-11 20:51 UTC (permalink / raw) To: Andrew Morton; +Cc: alexeyk, nickpiggin, peter, linux-kernel, axboe [-- Attachment #1: Type: text/plain, Size: 1634 bytes --] On Mon, 2004-05-10 at 16:07, Andrew Morton wrote: > Ram Pai <linuxram@us.ibm.com> wrote: > > > > I am nervous about this change. You are totally getting rid of > > lazy-readahead and that was the optimization which gave the best > > possible boost in performance. > > Because it disabled the large readahead outside the area which the app is > reading. But it's still reading too much. > > Let me see how this patch does with a DSS benchmark. > > That was not a real patch. More work is surely needed to get that right. > > > In the normal large random workload this extra page would have > > compesated for all the wasted readaheads. > > I disagree that 64k is "normal"! > > > However in the case of > > sysbench with Andrew's ra-copy patch the readahead calculation is not > > happening quiet right. Is it worth trying to get a marginal gain > > with sysbench at the cost of getting a big hit on DSS benchmarks, > > aio-tests,iozone and probably others. Or am I making an unsubstantiated > > claim? I will get back with results. > > It shouldn't hurt at all - the app does a seek, we perform the > correctly-sized read. Looks like you are right on all counts! I did some modifications to your patch and did a preliminary run with my user-level simulator. With these changes I am able to get rid of that extra page. Also code looks much simpler and adapts well to sequential and random patterns. However I have to run this under some benchmarks and see how it fares. Its a pre-alpha level patch. Can you take a quick look at the changes and see if you like it? 
I am sure you won't consider these changes a hack ;) RP [-- Attachment #2: readahead_trim.patch --] [-- Type: text/x-patch, Size: 3130 bytes --] diff -urNp linux-2.6.6/mm/readahead.c linux-2.6.6.new/mm/readahead.c --- linux-2.6.6/mm/readahead.c 2004-05-09 19:32:00.000000000 -0700 +++ linux-2.6.6.new/mm/readahead.c 2004-05-11 20:26:51.288797696 -0700 @@ -353,7 +353,7 @@ page_cache_readahead(struct address_spac unsigned orig_next_size; unsigned actual; int first_access=0; - unsigned long preoffset=0; + unsigned long average=0; /* * Here we detect the case where the application is performing @@ -394,10 +394,17 @@ page_cache_readahead(struct address_spac if (ra->serial_cnt <= (max * 2)) ra->serial_cnt++; } else { - ra->average = (ra->average + ra->serial_cnt) / 2; + /* to avoid rounding errors, ensure that 'average' + * tends towards the value of ra->serial_cnt. + */ + if(ra->average > ra->serial_cnt) { + average = ra->average - 1; + } else { + average = ra->average + 1; + } + ra->average = (average + ra->serial_cnt) / 2; ra->serial_cnt = 1; } - preoffset = ra->prev_page; ra->prev_page = offset; if (offset >= ra->start && offset <= (ra->start + ra->size)) { @@ -457,18 +464,14 @@ do_io: * ahead window and get some I/O underway for the new * current window. */ - if (!first_access && preoffset >= ra->start && - preoffset < (ra->start + ra->size)) { - /* Heuristic: If 'n' pages were - * accessed in the current window, there - * is a high probability that around 'n' pages - * shall be used in the next current window. - * - * To minimize lazy-readahead triggered - * in the next current window, read in - * an extra page. + if (!first_access) { + /* Heuristic: there is a high probability + * that around ra->average number of + * pages shall be accessed in the next + * current window. */ - ra->next_size = preoffset - ra->start + 2; + ra->next_size = (ra->average > max ? 
+ max : ra->average); } ra->start = offset; ra->size = ra->next_size; @@ -492,21 +495,19 @@ do_io: */ if (ra->ahead_start == 0) { /* - * if the average io-size is less than maximum + * If the average io-size is more than maximum * readahead size of the file the io pattern is * sequential. Hence bring in the readahead window - * immediately. - * Else the i/o pattern is random. Bring - * in the readahead window only if the last page of - * the current window is accessed (lazy readahead). + * immediately. + * If the average io-size is less than maximum + * readahead size of the file the io pattern is + * random. Hence don't bother to readahead. */ - unsigned long average = ra->average; - + average = ra->average; if (ra->serial_cnt > average) - average = (ra->serial_cnt + ra->average) / 2; + average = (ra->serial_cnt + ra->average + 1) / 2; - if ((average >= max) || (offset == (ra->start + - ra->size - 1))) { + if (average > max) { ra->ahead_start = ra->start + ra->size; ra->ahead_size = ra->next_size; actual = do_page_cache_readahead(mapping, filp, ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-11 20:51 ` Ram Pai @ 2004-05-11 21:17 ` Andrew Morton 2004-05-13 20:41 ` Ram Pai 0 siblings, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-11 21:17 UTC (permalink / raw) To: Ram Pai; +Cc: alexeyk, nickpiggin, peter, linux-kernel, axboe Ram Pai <linuxram@us.ibm.com> wrote: > > Looks like you are right on all counts! It's a probabilistic thing. > I did some modifications to your > patch and did a preliminary run with my user-level simulator. With these > changes I am able to get rid of that extra page. Also code looks much > simpler and adapts well to sequential and random patterns. That is good news. > However I have to run this under some benchmarks and see how it fares. > Its a pre-alpha level patch. It is nicer, thanks. I'll add it to -mm and hopefully Meredith and co will include it in regular performance testing. > Can you take a quick look at the changes and see if you like it? I am > sure you won't consider these changes a hack ;) Couple of minor things: > - unsigned long preoffset=0; yay! > + unsigned long average=0; Please add spaces around '='. But I don't think this needs to be initialised at all. > /* > * Here we detect the case where the application is performing > @@ -394,10 +394,17 @@ page_cache_readahead(struct address_spac > if (ra->serial_cnt <= (max * 2)) > ra->serial_cnt++; > } else { > - ra->average = (ra->average + ra->serial_cnt) / 2; > + /* to avoid rounding errors, ensure that 'average' > + * tends towards the value of ra->serial_cnt. > + */ multiline comment layout: /* * To avoid rounding errors, ensure that 'average' tends * towards the value of ra->serial_cnt. */ (I said "minor"). I can't say that I immediately understand what is the issue here with rounding errors? > + if(ra->average > ra->serial_cnt) { space between "if" and "(" > + ra->next_size = (ra->average > max ? > + max : ra->average); min(max, ra->average) ? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-11 21:17 ` Andrew Morton @ 2004-05-13 20:41 ` Ram Pai 2004-05-17 17:30 ` Random file I/O regressions in 2.6 [patch+results] Ram Pai 0 siblings, 1 reply; 56+ messages in thread From: Ram Pai @ 2004-05-13 20:41 UTC (permalink / raw) To: Andrew Morton; +Cc: alexeyk, nickpiggin, peter, linux-kernel, axboe On Tue, 2004-05-11 at 14:17, Andrew Morton wrote: > Ram Pai <linuxram@us.ibm.com> wrote: I am yet to get my machine fully set up to run a DSS benchmark. But thought I will update you on the following comment. > > multiline comment layout: > > /* > * To avoid rounding errors, ensure that 'average' tends > * towards the value of ra->serial_cnt. > */ > > (I said "minor"). > > I can't say that I immediately understand what is the issue here with > rounding errors? Say the i/o size is 20 pages. Our algorithm starts with an initial average i/o size of 'ra_pages/2', which is typically, say, 16. Now every time we take an average, the 'average' progresses as follows: (16+20)/2=18 (18+20)/2=19 (19+20)/2=19 (19+20)/2=19..... and the rounding error makes it never touch 20. However the code can be further optimized to: /* * to avoid rounding errors, ensure that 'average' * tends towards the value of ra->serial_cnt. */ if (ra->average < ra->serial_cnt) { average = ra->average + 1; } I will send an updated patch with all your comments incorporated as soon as I see good benchmark numbers (probably by tomorrow). RP > > > > + if(ra->average > ra->serial_cnt) { > > space between "if" and "(" > > > + ra->next_size = (ra->average > max ? > > + max : ra->average); > > min(max, ra->average) ? > > > ^ permalink raw reply [flat|nested] 56+ messages in thread
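Ram's arithmetic above is easy to reproduce. The following user-space sketch (illustrative Python, not the kernel code; the function name is ours) shows the truncating running average stalling one page short of the true I/O size, and the one-step nudge letting it converge:

```python
# Demonstrates the rounding error Ram describes: with integer division,
# a running average of the form  average = (average + sample) / 2
# started below the true value never reaches it, because every step
# truncates. Nudging 'average' one step toward the sample first fixes it.

def running_average(samples, start, nudge=False):
    average = start
    for s in samples:
        if nudge and average < s:
            average += 1          # the proposed fix
        average = (average + s) // 2  # integer division, as in the kernel
    return average

# An application reads 20 pages at a time; the initial guess is 16.
print(running_average([20] * 10, 16))              # stalls at 19
print(running_average([20] * 10, 16, nudge=True))  # reaches 20
```

Since `ra->next_size` is later derived from this average, the stalled value is what forced the "read one extra page" workaround in the earlier patch.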
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-13 20:41 ` Ram Pai @ 2004-05-17 17:30 ` Ram Pai 2004-05-20 1:06 ` Alexey Kopytov 0 siblings, 1 reply; 56+ messages in thread From: Ram Pai @ 2004-05-17 17:30 UTC (permalink / raw) To: Andrew Morton; +Cc: alexeyk, nickpiggin, peter, linux-kernel, axboe [-- Attachment #1: Type: text/plain, Size: 651 bytes --] On Thu, 2004-05-13 at 13:41, Ram Pai wrote: > On Tue, 2004-05-11 at 14:17, Andrew Morton wrote: > > Ram Pai <linuxram@us.ibm.com> wrote: > > I am yet to get my machine fully set up to run a DSS benchmark. But > thought I will update you on the following comment. Attached the cleaned-up patch and the performance results of the patch. Overall Observation: 1. Small improvement with iozone with the patch, and overall much better performance than 2.4 2. Small/negligible improvement with DSS workload. 3. Negligible impact with sysbench, but results worse than 2.4 kernels RP [-- Attachment #2: seeky-readahead-speedups.patch --] [-- Type: text/plain, Size: 7487 bytes --] Results of iozone, sysbench and DSS workload with the seeky-readahead-speedups.patch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Overall Observation: 1. Small improvement with iozone with the patch, and overall much better performance than 2.4 2. Small/negligible improvement with DSS workload. 3. Negligible impact with sysbench, but results worse than 2.4 kernels The cleaned-up patch is included towards the end of this report.
Details:
**********************************************************************
IOZONE run on a nfs mounted filesystem:
  client machine: 2proc, 733MHz, 2GB memory
  server machine: 8proc, 700Mhz, 8GB memory

  ./iozone -c -t1 -s 4096m -r 128k

 ---------------------------------------------------------
 |               | throughput | throughput | throughput  |
 |               | KB/sec     | KB/sec     | KB/sec      |
 |               | 266        | 266+patch  | 2.4.20      |
 ---------------------------------------------------------
 |sequential read|  11697.55  |  11700.98  |  10846.87   |
 |re-read        |  11698.39  |  11691.84  |  10865.39   |
 |reverse read   |  20002.71  |  20099.86  |  10340.34   |
 |stride read    |  13813.01  |  13850.28  |  10193.87   |
 |random read    |  19705.06  |  19978.00  |  10839.57   |
 |random mix     |  28465.68  |  29964.38  |  10779.17   |
 |pread          |  11692.95  |  11697.29  |  10863.56   |
 ---------------------------------------------------------
**************************************************************
SYSBENCH run on machine 2proc, 733MHz, 256MB memory
 ---------------------------------------------------------
 |              | 266         | 266+patch   | 2.4.21      |
 ---------------------------------------------------------
 |time spent    | 79.6253s    | 79.8176s    | 73.2605s    |
 |Mb/sec        | 1.959Mb/sec | 1.954Mb/sec | 2.129Mb/sec |
 |requests/sec  | 125.59      | 125.29      | 136.54      |
 |no of Reads   | 6001        | 6001        | 6008        |
 |no of Writes  | 3999        | 3999        | 3995        |
 ---------------------------------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
266 sysbench output:
Operations performed: 6001 Read, 3999 Write, 12800 Other = 22800 Total
Read 93Mb Written 62Mb Total Transferred 156Mb
1.959Mb/sec Transferred
125.59 Requests/sec executed
Test execution Statistics summary:
    Time spent for test: 79.6253s
Per Request statistics:
    Min: 0.0000s Avg: 0.0467s Max: 0.9802s
Events tracked: 10000
Total time taken by event execution: 467.1493s
Threads fairness: 87.41/94.20 distribution,
88.68/94.45 execution ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 266+patch sysbench output: Operations performed: 6001 Read, 3999 Write, 12800 Other = 22800 Total Read 93Mb Written 62Mb Total Transferred 156Mb 1.954Mb/sec Transferred 125.29 Requests/sec executed Test execution Statistics summary: Time spent for test: 79.8176s Per Request statistics: Min: 0.0000s Avg: 0.0482s Max: 0.8481s Events tracked: 10000 Total time taken by event execution: 481.7572s Threads fairness: 85.27/93.25 distribution, 85.15/94.91 execution ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.4.21 sysbench output: Operations performed: 6008 Read, 3995 Write, 12800 Other = 22803 Total Read 93Mb Written 62Mb Total Transferred 156Mb 2.129Mb/sec Transferred 136.54 Requests/sec executed Test execution Statistics summary: Time spent for test: 73.2605s Per Request statistics: Min: 0.0000s Avg: 0.0380s Max: 0.3712s Events tracked: 10003 Total time taken by event execution: 380.4081s Threads fairness: 79.04/91.95 distribution, 82.52/92.44 execution ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ************************************************************** DSS WORKLOAD Got 1% improvement with the patch ************************************************************** diff -urNp linux-2.6.6/mm/readahead.c linux-2.6.6.new/mm/readahead.c --- linux-2.6.6/mm/readahead.c 2004-05-11 20:41:28.000000000 -0700 +++ linux-2.6.6.new/mm/readahead.c 2004-05-17 17:33:51.145040472 -0700 @@ -353,7 +353,7 @@ page_cache_readahead(struct address_spac unsigned orig_next_size; unsigned actual; int first_access=0; - unsigned long preoffset=0; + unsigned long average; /* * Here we detect the case where the application is performing @@ -394,10 +394,17 @@ page_cache_readahead(struct address_spac if (ra->serial_cnt <= (max * 2)) ra->serial_cnt++; } else { - ra->average = (ra->average 
+ ra->serial_cnt) / 2; + /* + * to avoid rounding errors, ensure that 'average' + * tends towards the value of ra->serial_cnt. + */ + average = ra->average; + if (average < ra->serial_cnt) { + average++; + } + ra->average = (average + ra->serial_cnt) / 2; ra->serial_cnt = 1; } - preoffset = ra->prev_page; ra->prev_page = offset; if (offset >= ra->start && offset <= (ra->start + ra->size)) { @@ -457,18 +464,13 @@ do_io: * ahead window and get some I/O underway for the new * current window. */ - if (!first_access && preoffset >= ra->start && - preoffset < (ra->start + ra->size)) { - /* Heuristic: If 'n' pages were - * accessed in the current window, there - * is a high probability that around 'n' pages - * shall be used in the next current window. - * - * To minimize lazy-readahead triggered - * in the next current window, read in - * an extra page. + if (!first_access) { + /* Heuristic: there is a high probability + * that around ra->average number of + * pages shall be accessed in the next + * current window. */ - ra->next_size = preoffset - ra->start + 2; + ra->next_size = min(ra->average , (unsigned long)max); } ra->start = offset; ra->size = ra->next_size; @@ -492,21 +494,19 @@ do_io: */ if (ra->ahead_start == 0) { /* - * if the average io-size is less than maximum + * If the average io-size is more than maximum * readahead size of the file the io pattern is * sequential. Hence bring in the readahead window - * immediately. - * Else the i/o pattern is random. Bring - * in the readahead window only if the last page of - * the current window is accessed (lazy readahead). + * immediately. + * If the average io-size is less than maximum + * readahead size of the file the io pattern is + * random. Hence don't bother to readahead. 
*/ - unsigned long average = ra->average; - + average = ra->average; if (ra->serial_cnt > average) - average = (ra->serial_cnt + ra->average) / 2; + average = (ra->serial_cnt + ra->average + 1) / 2; - if ((average >= max) || (offset == (ra->start + - ra->size - 1))) { + if (average > max) { ra->ahead_start = ra->start + ra->size; ra->ahead_size = ra->next_size; actual = do_page_cache_readahead(mapping, filp, ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-17 17:30 ` Random file I/O regressions in 2.6 [patch+results] Ram Pai @ 2004-05-20 1:06 ` Alexey Kopytov 2004-05-20 1:31 ` Ram Pai ` (2 more replies) 0 siblings, 3 replies; 56+ messages in thread From: Alexey Kopytov @ 2004-05-20 1:06 UTC (permalink / raw) To: Ram Pai; +Cc: Andrew Morton, nickpiggin, peter, linux-kernel, axboe Ram Pai wrote: >Attached the cleaned up patch and the performance results of the patch. > >Overall Observation: > 1.Small improvement with iozone with the patch, and overall > much better performance than 2.4 > 2.Small/neglegible improvement with DSS workload. > 3.Negligible impact with sysbench, but results worser than > 2.4 kernels Ram, can you clarify the status of this patch please? I ran the same sysbench test on my hardware with patched 2.6.6 and got 122.2348s execution time, i.e. almost the same results as in the original tests. Is this patch an intermediate step to improve the sysbench workload on 2.6, or does it just address another problem? -- Alexey Kopytov, Software Developer MySQL AB, www.mysql.com Are you MySQL certified? www.mysql.com/certification ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-20 1:06 ` Alexey Kopytov @ 2004-05-20 1:31 ` Ram Pai 2004-05-21 19:32 ` Alexey Kopytov 2004-05-20 5:49 ` Andrew Morton 2004-05-20 21:59 ` Andrew Morton 2 siblings, 1 reply; 56+ messages in thread From: Ram Pai @ 2004-05-20 1:31 UTC (permalink / raw) To: Alexey Kopytov; +Cc: Andrew Morton, nickpiggin, peter, linux-kernel, axboe On Wed, 2004-05-19 at 18:06, Alexey Kopytov wrote: > Ram Pai wrote: > > >Attached the cleaned up patch and the performance results of the patch. > > > >Overall Observation: > > 1.Small improvement with iozone with the patch, and overall > > much better performance than 2.4 > > 2.Small/neglegible improvement with DSS workload. > > 3.Negligible impact with sysbench, but results worser than > > 2.4 kernels > > Ram, can you clarify the status of this patch please? > > I ran the same sysbench test on my hardware with patched 2.6.6 and got > 122.2348s execution time, i.e. almost the same results as in the original > tests. Is this patch an intermediate step to improve the sysbench workload on > 2.6, or it just addresses another problem? this patch by itself does not address your problem. Your problem is better addressed by Andrew's 'readahead-private' patch. However; this patch applied on top of Andrew's 'readahead-private' patch may get you some extra performance. Can you confirm this please? RP ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-20 1:31 ` Ram Pai @ 2004-05-21 19:32 ` Alexey Kopytov 0 siblings, 0 replies; 56+ messages in thread From: Alexey Kopytov @ 2004-05-21 19:32 UTC (permalink / raw) To: Ram Pai; +Cc: Andrew Morton, nickpiggin, peter, linux-kernel, axboe On Thursday 20 May 2004 05:31, Ram Pai wrote: >On Wed, 2004-05-19 18:06, Alexey Kopytov wrote: >> Ram, can you clarify the status of this patch please? >> >> I ran the same sysbench test on my hardware with patched 2.6.6 and got >> 122.2348s execution time, i.e. almost the same results as in the original >> tests. Is this patch an intermediate step to improve the sysbench workload >> on 2.6, or it just addresses another problem? > >this patch by itself does not address your problem. Your problem is >better addressed by Andrew's 'readahead-private' patch. > >However; this patch applied on top of Andrew's 'readahead-private' patch >may get you some extra performance. > >Can you confirm this please? Yes. 2.6.6-rc3 + Andrew's patch: Time spent for test: 86.5459s 2.6.6-bk: Time spent for test: 83.1929s Thanks for clarifying! -- Alexey Kopytov, Software Developer MySQL AB, www.mysql.com Are you MySQL certified? www.mysql.com/certification ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-20 1:06 ` Alexey Kopytov 2004-05-20 1:31 ` Ram Pai @ 2004-05-20 5:49 ` Andrew Morton 2004-05-20 21:59 ` Andrew Morton 2 siblings, 0 replies; 56+ messages in thread From: Andrew Morton @ 2004-05-20 5:49 UTC (permalink / raw) To: Alexey Kopytov; +Cc: linuxram, nickpiggin, peter, linux-kernel, axboe Alexey Kopytov <alexeyk@mysql.com> wrote: > > Ram Pai wrote: > > >Attached the cleaned up patch and the performance results of the patch. > > > >Overall Observation: > > 1.Small improvement with iozone with the patch, and overall > > much better performance than 2.4 > > 2.Small/neglegible improvement with DSS workload. > > 3.Negligible impact with sysbench, but results worser than > > 2.4 kernels > > Ram, can you clarify the status of this patch please? Everything we have is now in Linus's tree. And in 2.6.6-mm4. > I ran the same sysbench test on my hardware with patched 2.6.6 and got > 122.2348s execution time, i.e. almost the same results as in the original > tests. Is this patch an intermediate step to improve the sysbench workload on > 2.6, or it just addresses another problem? The patches in Linus's tree improve sysbench significantly here. 
It's a 256MB 2-way with IDE disks, writeback caching enabled: sysbench --num-threads=16 --test=fileio --file-total-size=2G --file-test-mode=rndrw run 2.4.27-pre2, ext2: Time spent for test: 61.0240s 0.06s user 6.03s system 4% cpu 2:05.95 total Time spent for test: 60.8456s 0.11s user 5.49s system 4% cpu 2:04.94 total 2.6.6, CFQ, ext2: Time spent for test: 85.6614s 0.05s user 5.66s system 3% cpu 2:26.75 total Time spent for test: 85.2090s 0.06s user 5.32s system 3% cpu 2:24.75 total 2.6.6-bk, CFQ, ext2: Time spent for test: 66.7717s 0.04s user 5.54s system 4% cpu 2:06.19 total Time spent for test: 67.5666s 0.04s user 5.10s system 4% cpu 2:06.72 total 2.6.6, as, ext2: Time spent for test: 83.8358s 0.07s user 5.89s system 4% cpu 2:22.92 total Time spent for test: 83.8068s 0.06s user 5.34s system 3% cpu 2:21.33 total 2.6.6-bk, AS, ext2: Time spent for test: 62.5316s 0.05s user 5.27s system 4% cpu 2:01.28 total Time spent for test: 62.7401s 0.04s user 5.17s system 4% cpu 2:00.50 total 2.6.6, deadline, ext2: Time spent for test: 103.0084s 0.06s user 5.76s system 3% cpu 2:40.74 total Time spent for test: 101.9648s 0.07s user 5.35s system 3% cpu 2:38.83 total 2.6.6-bk, deadline, ext2: Time spent for test: 63.3405s 0.03s user 5.49s system 4% cpu 2:01.05 total Time spent for test: 63.5288s 0.03s user 5.05s system 4% cpu 2:00.78 total There's still something wrong here. 2.6.6-bk+deadline is pretty equivalent to 2.4 from an IO scheduler point of view in this test. Yet it's a couple of percent slower. I don't know why you're still seeing significant discrepancies. What sort of disk+controller system are you using? If scsi, what is the tag queue depth set to? Is writeback caching enabled on the disk? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-20 1:06 ` Alexey Kopytov 2004-05-20 1:31 ` Ram Pai 2004-05-20 5:49 ` Andrew Morton @ 2004-05-20 21:59 ` Andrew Morton 2004-05-20 22:23 ` Andrew Morton 2004-05-21 21:13 ` Alexey Kopytov 2 siblings, 2 replies; 56+ messages in thread From: Andrew Morton @ 2004-05-20 21:59 UTC (permalink / raw) To: Alexey Kopytov; +Cc: linuxram, nickpiggin, peter, linux-kernel, axboe (Resend due to osdl<->vger smtp bunfight) Alexey Kopytov <alexeyk@mysql.com> wrote: > > Ram Pai wrote: > > >Attached the cleaned up patch and the performance results of the patch. > > > >Overall Observation: > > 1.Small improvement with iozone with the patch, and overall > > much better performance than 2.4 > > 2.Small/neglegible improvement with DSS workload. > > 3.Negligible impact with sysbench, but results worser than > > 2.4 kernels > > Ram, can you clarify the status of this patch please? Everything we have is now in Linus's tree. And in 2.6.6-mm4. > I ran the same sysbench test on my hardware with patched 2.6.6 and got > 122.2348s execution time, i.e. almost the same results as in the original > tests. Is this patch an intermediate step to improve the sysbench workload on > 2.6, or it just addresses another problem? The patches in Linus's tree improve sysbench significantly here. 
It's a 256MB 2-way with IDE disks, writeback caching enabled: sysbench --num-threads=16 --test=fileio --file-total-size=2G --file-test-mode=rndrw run 2.4.27-pre2, ext2: Time spent for test: 61.0240s 0.06s user 6.03s system 4% cpu 2:05.95 total Time spent for test: 60.8456s 0.11s user 5.49s system 4% cpu 2:04.94 total 2.6.6, CFQ, ext2: Time spent for test: 85.6614s 0.05s user 5.66s system 3% cpu 2:26.75 total Time spent for test: 85.2090s 0.06s user 5.32s system 3% cpu 2:24.75 total 2.6.6-bk, CFQ, ext2: Time spent for test: 66.7717s 0.04s user 5.54s system 4% cpu 2:06.19 total Time spent for test: 67.5666s 0.04s user 5.10s system 4% cpu 2:06.72 total 2.6.6, as, ext2: Time spent for test: 83.8358s 0.07s user 5.89s system 4% cpu 2:22.92 total Time spent for test: 83.8068s 0.06s user 5.34s system 3% cpu 2:21.33 total 2.6.6-bk, AS, ext2: Time spent for test: 62.5316s 0.05s user 5.27s system 4% cpu 2:01.28 total Time spent for test: 62.7401s 0.04s user 5.17s system 4% cpu 2:00.50 total 2.6.6, deadline, ext2: Time spent for test: 103.0084s 0.06s user 5.76s system 3% cpu 2:40.74 total Time spent for test: 101.9648s 0.07s user 5.35s system 3% cpu 2:38.83 total 2.6.6-bk, deadline, ext2: Time spent for test: 63.3405s 0.03s user 5.49s system 4% cpu 2:01.05 total Time spent for test: 63.5288s 0.03s user 5.05s system 4% cpu 2:00.78 total There's still something wrong here. 2.6.6-bk+deadline is pretty equivalent to 2.4 from an IO scheduler point of view in this test. Yet it's a couple of percent slower. I don't know why you're still seeing significant discrepancies. What sort of disk+controller system are you using? If scsi, what is the tag queue depth set to? Is writeback caching enabled on the disk? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-20 21:59 ` Andrew Morton @ 2004-05-20 22:23 ` Andrew Morton 2004-05-21 7:31 ` Nick Piggin 2004-05-21 21:13 ` Alexey Kopytov 1 sibling, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-20 22:23 UTC (permalink / raw) To: alexeyk, linuxram, nickpiggin, peter, linux-kernel, axboe Andrew Morton <akpm@osdl.org> wrote: > > There's still something wrong here. 2.6.6-bk+deadline is pretty equivalent > to 2.4 from an IO scheduler point of view in this test. Yet it's a couple > of percent slower. > > I don't know why you're still seeing significant discrepancies. > > What sort of disk+controller system are you using? If scsi, what is the > tag queue depth set to? Is writeback caching enabled on the disk? If the 2.4 and 2.6 disk accounting statistics are to be believed, they show something interesting. Workload is one run of sysbench --num-threads=16 --test=fileio --file-total-size=2G --file-test-mode=rndrw run on ext2.

2.4.27-pre2:
  rio: 5549 (Read requests issued)
  rblk: 259680 (Total sectors read)
  wio: 42398 (Write requests issued)
  wblk: 4368056 (Total sectors written)

2.6.6-bk, as:
  reads: 5983
  readsectors: 201192
  writes: 22548
  writesectors: 4343184

- Note that 2.6 read 20% less data from the disk. We observed this before. It appears that 2.6 page replacement decisions are working better for this workload.
- Despite that, 2.6 issued *more* read requests. So it is submitting more, and smaller, I/Os.
- Both kernels wrote basically the same amount of data. 2.6 a little less, perhaps because of fsync() optimisations.
- But 2.6 issued far fewer write requests. Half as many as 2.4 - a huge difference. There are a number of reasons why this could happen but frankly, I don't have a clue what's going on in there.

Given that 2.6 is issuing fewer IO requests it should be performing faster than 2.4.
The reason that the two kernels are achieving about the same throughput despite this is that the disk is performing writeback caching and is absorbing 2.4's smaller write requests. I set the IDE disk to do writethrough (hdparm -W0): 2.6.6-bk, as: Time spent for test: 89.9427s 0.04s user 5.24s system 1% cpu 4:51.62 total 2.4.27-pre2: Time spent for test: 107.8293s 0.04s user 6.00s system 1% cpu 7:26.47 total as expected. Open questions are: a) Why is 2.6 write coalescing so superior to 2.4? b) Why is 2.6 issuing more read requests, for less data? c) Why is Alexey seeing dissimilar results? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-20 22:23 ` Andrew Morton @ 2004-05-21 7:31 ` Nick Piggin 2004-05-21 7:50 ` Jens Axboe 0 siblings, 1 reply; 56+ messages in thread From: Nick Piggin @ 2004-05-21 7:31 UTC (permalink / raw) To: Andrew Morton; +Cc: alexeyk, linuxram, peter, linux-kernel, axboe Andrew Morton wrote: > Open questions are: > > a) Why is 2.6 write coalescing so superior to 2.4? > > b) Why is 2.6 issuing more read requests, for less data? > > c) Why is Alexey seeing dissimilar results? > Interesting. I am not too familiar with 2.4's IO scheduler, but 2.6's have pretty comprehensive merging systems. Could that be helping, Jens? Or is 2.4 pretty equivalent? What about things like maximum request size for 2.4 vs 2.6 for example? This is another thing that can have an impact, especially for writes. I'll take a guess at b, and say it could be as-iosched.c. Another thing might be that 2.6 has smaller nr_requests than 2.4, although you are unlikely to hit the read side limit with only 16 threads if they are doing sync IO. As for question c, has Alexey confirmed that it is indeed 2.6-bk which has problems? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-21 7:31 ` Nick Piggin @ 2004-05-21 7:50 ` Jens Axboe 2004-05-21 8:40 ` Nick Piggin 2004-05-21 8:56 ` Spam: " Andrew Morton 0 siblings, 2 replies; 56+ messages in thread From: Jens Axboe @ 2004-05-21 7:50 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, alexeyk, linuxram, peter, linux-kernel On Fri, May 21 2004, Nick Piggin wrote: > Andrew Morton wrote: > > >Open questions are: > > > >a) Why is 2.6 write coalescing so superior to 2.4? > > > >b) Why is 2.6 issuing more read requests, for less data? > > > >c) Why is Alexey seeing dissimilar results? > > > > > Interesting. I am not too familiar with 2.4's IO scheduler, > but 2.6's have pretty comprehensive merging systems. Could > that be helping, Jens? Or is 2.4 pretty equivalent? 2.4 will give up merging faster than 2.6, elevator_linus will stop looking for a merge point if the sequence drops to zero. 2.6 will always merge. So that could explain the fewer writes. > What about things like maximum request size for 2.4 vs 2.6 > for example? This is another thing that can have an impact, > especially for writes. I think that's pretty similar. Andrew didn't say what device he was testing on, but 2.4 ide defaults to max 64k where 2.6 defaults to 128k. > I'll take a guess at b, and say it could be as-iosched.c. > Another thing might be that 2.6 has smaller nr_requests than > 2.4, although you are unlikely to hit the read side limit > with only 16 threads if they are doing sync IO. Andrew, you did numbers for deadline previously as well, but no rq statistics there? As for nr_requests that's true, would be worth a shot to bump available requests in 2.6. -- Jens Axboe ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-21 7:50 ` Jens Axboe @ 2004-05-21 8:40 ` Nick Piggin 2004-05-21 8:56 ` Spam: " Andrew Morton 1 sibling, 0 replies; 56+ messages in thread From: Nick Piggin @ 2004-05-21 8:40 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrew Morton, alexeyk, linuxram, peter, linux-kernel Jens Axboe wrote: > On Fri, May 21 2004, Nick Piggin wrote: > >>Andrew Morton wrote: >> >> >>>Open questions are: >>> >>>a) Why is 2.6 write coalescing so superior to 2.4? >>> >>>b) Why is 2.6 issuing more read requests, for less data? >>> >>>c) Why is Alexey seeing dissimilar results? >>> >> >> >>Interesting. I am not too familiar with 2.4's IO scheduler, >>but 2.6's have pretty comprehensive merging systems. Could >>that be helping, Jens? Or is 2.4 pretty equivalent? > > > 2.4 will give up merging faster than 2.6, elevator_linus will stop > looking for a merge point if the sequence drops to zero. 2.6 will always > merge. So that could explain the fewer writes. > Yep OK, that could be one thing. > >>What about things like maximum request size for 2.4 vs 2.6 >>for example? This is another thing that can have an impact, >>especially for writes. > > > I think that's pretty similar. Andrew didn't say what device he was > testing on, but 2.4 ide defaults to max 64k where 2.6 defaults to 128k. > This could be another. If Andrew's using IDE, this alone could make up the entire difference *if* writes are nicely sequential. I guess they probably aren't, but it could still help. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Spam: Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-21 7:50 ` Jens Axboe 2004-05-21 8:40 ` Nick Piggin @ 2004-05-21 8:56 ` Andrew Morton 2004-05-21 22:24 ` Alexey Kopytov 1 sibling, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-21 8:56 UTC (permalink / raw) To: Jens Axboe; +Cc: nickpiggin, alexeyk, linuxram, peter, linux-kernel Jens Axboe <axboe@suse.de> wrote: > > I think that's pretty similar. Andrew didn't say what device he was > testing on, but 2.4 ide defaults to max 64k where 2.6 defaults to 128k. IDE. I was being silly, sorry. Those I/O stats include the (huge linear) initial write of the "database" files, so the larger IDE request size will be dominating. What I need is a way of getting sysbench to create and remove the database files in separate invocations, but the syntax for that is defeating me at present. > > I'll take a guess at b, and say it could be as-iosched.c. > > Another thing might be that 2.6 has smaller nr_requests than > > 2.4, although you are unlikely to hit the read side limit > > with only 16 threads if they are doing sync IO. > > Andrew, you did numbers for deadline previously as well, but no rq > statistics there? As for nr_requests that's true, would be worth a shot > to bump available requests in 2.6. Doubling the request queue size makes no difference. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Spam: Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-21 8:56 ` Spam: " Andrew Morton @ 2004-05-21 22:24 ` Alexey Kopytov 0 siblings, 0 replies; 56+ messages in thread From: Alexey Kopytov @ 2004-05-21 22:24 UTC (permalink / raw) To: Andrew Morton; +Cc: Jens Axboe, nickpiggin, linuxram, peter, linux-kernel On Friday 21 May 2004 12:56, Andrew Morton wrote: > >What I need is a way of getting sysbench to create and remove the database >files in separate invocations, but the syntax for that is defeating me at >present. > I have changed the syntax to allow creating/removing test files and test running in separate stages: sysbench --test=fileio --file-total-size=3G prepare sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw run sysbench --test=fileio cleanup The updated version is available from the SysBench page at http://sourceforge.net/projects/sysbench/ -- Alexey Kopytov, Software Developer MySQL AB, www.mysql.com Are you MySQL certified? www.mysql.com/certification ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-20 21:59 ` Andrew Morton 2004-05-20 22:23 ` Andrew Morton @ 2004-05-21 21:13 ` Alexey Kopytov 2004-05-26 4:43 ` Alexey Kopytov 1 sibling, 1 reply; 56+ messages in thread From: Alexey Kopytov @ 2004-05-21 21:13 UTC (permalink / raw) To: Andrew Morton; +Cc: linuxram, nickpiggin, peter, linux-kernel, axboe On Friday 21 May 2004 01:59, Andrew Morton wrote: >The patches in Linus's tree improve sysbench significantly here. It's a >256MB 2-way with IDE disks, writeback caching enabled: > >sysbench --num-threads=16 --test=fileio --file-total-size=2G > --file-test-mode=rndrw run > >2.4.27-pre2, ext2: > > Time spent for test: 61.0240s > 0.06s user 6.03s system 4% cpu 2:05.95 total > Time spent for test: 60.8456s > 0.11s user 5.49s system 4% cpu 2:04.94 total > >2.6.6-bk, AS, ext2: > > Time spent for test: 62.5316s > 0.05s user 5.27s system 4% cpu 2:01.28 total > Time spent for test: 62.7401s > 0.04s user 5.17s system 4% cpu 2:00.50 total I ran the tests with a configuration as close to yours as possible. Here are the results for mem=256M, 2G total file size (ext3): 2.4.25: Time spent for test: 79.4146s 0.20user 16.08system 3:20.29elapsed 8%CPU Time spent for test: 78.9797s 0.11user 15.84system 3:19.76elapsed 7%CPU 2.6.6-bk, AS: Time spent for test: 81.2208s 0.13user 17.97system 3:13.30elapsed 9%CPU Time spent for test: 82.5538s 0.14user 18.00system 3:14.88elapsed 9%CPU This correlates very well with your results. But when I returned to my original configuration (mem=640M, 3G total file size), I got the following: 2.4.25: Time spent for test: 77.5377s 2.6.6-bk, AS: Time spent for test: 83.1929s It seems like the smaller file size just hides the regression, but I have to run some more tests to confirm this. >I don't know why you're still seeing significant discrepancies. > >What sort of disk+controller system are you using? If scsi, what is the >tag queue depth set to? Is writeback caching enabled on the disk? 
It's an IDE disk without TCQ support, with writeback caching enabled. -- Alexey Kopytov, Software Developer MySQL AB, www.mysql.com Are you MySQL certified? www.mysql.com/certification ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 [patch+results] 2004-05-21 21:13 ` Alexey Kopytov @ 2004-05-26 4:43 ` Alexey Kopytov 0 siblings, 0 replies; 56+ messages in thread From: Alexey Kopytov @ 2004-05-26 4:43 UTC (permalink / raw) To: Andrew Morton; +Cc: linuxram, nickpiggin, peter, linux-kernel, axboe On Saturday 22 May 2004 01:13, Alexey Kopytov wrote: >I ran the tests with a configuration as close to yours as possible. Here are >the results for mem=256M, 2G total file size (ext3): > >2.4.25: > Time spent for test: 79.4146s > 0.20user 16.08system 3:20.29elapsed 8%CPU > Time spent for test: 78.9797s > 0.11user 15.84system 3:19.76elapsed 7%CPU > >2.6.6-bk, AS: > Time spent for test: 81.2208s > 0.13user 17.97system 3:13.30elapsed 9%CPU > Time spent for test: 82.5538s > 0.14user 18.00system 3:14.88elapsed 9%CPU > >This correlates very well your results. But when I returned back to my >original configuration (mem=640M, 3G total file size), I got the following: > >2.4.25: > Time spent for test: 77.5377s > >2.6.6-bk, AS: > Time spent for test: 83.1929s > >It seems like the smaller file size just hides the regression, but I have to >run some more tests to ensure this. > The assumption appears to be true. I tried to vary the total file size and got the following results (tests were done on another IDE disk): 2.4.27-pre3: 2 GB: 58.2707s 4 GB: 72.3313s 8 GB: 83.082s 2.6.7-rc1, AS: 2 GB: 60.6792s 4 GB: 82.8023s 8 GB: 99.4398s Varying the number of files while keeping the total file size constant also gives some interesting results: 2.4.27-pre3, 4 GB total file size: 1 file: 71.7288s 128 files: 72.3313s 256 files: 73.9268 2.6.7-rc1, AS, 4 GB total file size: 1 file: 76.443 128 files: 82.8023 256 files: 81.9618 -- Alexey Kopytov, Software Developer MySQL AB, www.mysql.com Are you MySQL certified? www.mysql.com/certification ^ permalink raw reply [flat|nested] 56+ messages in thread
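[Editorial aside, not part of the original message: expressed as ratios, the file-size results above show the gap widening as the working set outgrows the page cache. A tiny sketch of that arithmetic:]

```python
# 2.4.27-pre3 vs 2.6.7-rc1(AS) test times from the runs above,
# keyed by total file size: (2.4 seconds, 2.6 seconds).
times = {
    "2 GB": (58.2707, 60.6792),
    "4 GB": (72.3313, 82.8023),
    "8 GB": (83.082, 99.4398),
}

for size, (t24, t26) in times.items():
    print(f"{size}: 2.6 takes {t26 / t24:.2f}x as long as 2.4")
# The ratio grows from ~1.04x at 2 GB to ~1.20x at 8 GB, i.e. the
# regression scales with how much of the workload misses the page cache.
```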
* Re: Random file I/O regressions in 2.6 2004-05-10 23:07 ` Andrew Morton 2004-05-11 20:51 ` Ram Pai @ 2004-05-11 22:26 ` Bill Davidsen 0 siblings, 0 replies; 56+ messages in thread From: Bill Davidsen @ 2004-05-11 22:26 UTC (permalink / raw) To: linux-kernel Andrew Morton wrote: > Ram Pai <linuxram@us.ibm.com> wrote: > >>I am nervous about this change. You are totally getting rid of >>lazy-readahead and that was the optimization which gave the best >>possible boost in performance. > > > Because it disabled the large readahead outside the area which the app is > reading. But it's still reading too much. > > >>Let me see how this patch does with a DSS benchmark. > > > That was not a real patch. More work is surely needed to get that right. > > >>In the normal large random workload this extra page would have >>compensated for all the wasted readaheads. > > > I disagree that 64k is "normal"! > > >> However in the case of >>sysbench with Andrew's ra-copy patch the readahead calculation is not >>happening quite right. Is it worth trying to get a marginal gain >>with sysbench at the cost of getting a big hit on DSS benchmarks, >>aio-tests, iozone and probably others. Or am I making an unsubstantiated >>claim? I will get back with results. > > > It shouldn't hurt at all - the app does a seek, we perform the > correctly-sized read. > > As I say, my main concern is that we correctly transition from seeky access > to linear access and resume readahead. One real problem is that you are trying to do in the kernel what would be best done in the application and better done in glibc... Because the benefit of readahead varies based on fd rather than device. Consider a program reading data from a file and putting it in a database. The benefit of readahead for the sequentially accessed data file is higher than for seek-read combinations. The library could do readahead based on the bytes read since the last seek on a per-file basis, something the kernel can't. 
This is not to say the kernel work hasn't been a benefit, but note that with all the patches 2.4 still seems to outperform 2.6. And that's a problem since other parts of 2.6 scale so well. I do see that 2.4 seems to outperform 2.6 for usenet news, where you have small reads against a modest database, a few TB or so, and 400-2000 processes doing random reads against the data. Settings and schedulers seem to have only modest effect there. -- -bill davidsen (davidsen@tmr.com) "The secret to procrastination is to put things off until the last possible moment - but no longer" -me ^ permalink raw reply [flat|nested] 56+ messages in thread
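[Editorial aside, not part of the original thread: the per-file-descriptor idea Bill raises has a standard interface today — posix_fadvise(), which lets an application that knows its own access pattern tell the kernel to expect random access (and hence skip readahead) on a single descriptor. A minimal hypothetical sketch, using Python's os.posix_fadvise wrapper as an assumption of convenience; the temp file and sizes are purely illustrative:]

```python
import os
import tempfile

# An application that knows its accesses are random can hint the kernel
# per descriptor instead of relying on the kernel's readahead heuristics.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 65536)  # 64 KB scratch file for illustration
    # POSIX_FADV_RANDOM: expect random access on this fd; on Linux this
    # disables readahead for the descriptor.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    os.lseek(fd, 16384, os.SEEK_SET)  # seek to a "random" offset
    chunk = os.read(fd, 16384)        # a 16 KB read, as in the benchmark
    print(len(chunk))
finally:
    os.close(fd)
    os.unlink(path)
```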
* Re: Random file I/O regressions in 2.6 2004-05-04 0:19 ` Nick Piggin 2004-05-04 0:50 ` Ram Pai @ 2004-05-04 1:15 ` Andrew Morton 2004-05-04 11:39 ` Nick Piggin 1 sibling, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-04 1:15 UTC (permalink / raw) To: Nick Piggin; +Cc: peter, linuxram, alexeyk, linux-kernel, axboe Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > Andrew Morton wrote: > > Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > > >>>That's one of its usage patterns. It's also supposed to detect the > >>>fixed-sized-reads-seeking-all-over-the-place situation. In which case it's > >>>supposed to submit correctly-sized multi-page BIOs. But it's not working > >>>right for this workload. > >>> > >>>A naive solution would be to add special-case code which always does the > >>>fixed-size readahead after a seek. Basically that's > >>> > >>> if (ra->next_size == -1UL) > >>> force_page_cache_readahead(...) > >>> > >> > >>I think a better solution to this case would be to ensure the > >>readahead window is always min(size of read, some large number); > >> > > > > > > That would cause the kernel to perform lots of pointless pagecache lookups > > when the file is already 100% cached. > > > > > That's pretty sad. You need a "preread" or something which > sends the pages back... or uses the actor itself. readahead > would then have to be reworked to only run off the end of > the read window, but that is what it should be doing anyway. Sorry, I do not understand that paragraph at all. All forms of pagecache population need to examine the pagecache to find out if the page is already there. This involves pagecache lookups. We want the read code to "learn" that the requested pages are all coming from cache and to stop doing those lookups altogether. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 1:15 ` Andrew Morton @ 2004-05-04 11:39 ` Nick Piggin 0 siblings, 0 replies; 56+ messages in thread From: Nick Piggin @ 2004-05-04 11:39 UTC (permalink / raw) To: Andrew Morton; +Cc: peter, linuxram, alexeyk, linux-kernel, axboe Andrew Morton wrote: > > > Sorry, I do not understand that paragraph at all. > > All forms of pagecache population need to examine the pagecache to find out > if the page is already there. This involves pagecache lookups. We want > the read code to "learn" that the requested pages are all coming from cache > and to stop doing those lookups altogether. Yeah I think I have an idea of what the basic problems are, but I'd have to understand things better before I know if I am really on the right track. My idea would probably also involve redoing some of the code too, so at the moment I don't think I have time. If your simple fix works though, then that sounds good. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 0:10 ` Andrew Morton 2004-05-04 0:19 ` Nick Piggin @ 2004-05-04 8:27 ` Arjan van de Ven 2004-05-04 8:47 ` Andrew Morton 1 sibling, 1 reply; 56+ messages in thread From: Arjan van de Ven @ 2004-05-04 8:27 UTC (permalink / raw) To: Andrew Morton; +Cc: Nick Piggin, peter, linuxram, alexeyk, linux-kernel, axboe [-- Attachment #1: Type: text/plain, Size: 305 bytes --] > > That would cause the kernel to perform lots of pointless pagecache lookups > when the file is already 100% cached. well surely the read itself will do those AGAIN anyway, so in the fully cached case this is just warming up the cpu cache ;) (and thus really cheap as nett cost I suspect) [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 8:27 ` Arjan van de Ven @ 2004-05-04 8:47 ` Andrew Morton 2004-05-04 8:50 ` Arjan van de Ven 0 siblings, 1 reply; 56+ messages in thread From: Andrew Morton @ 2004-05-04 8:47 UTC (permalink / raw) To: arjanv; +Cc: nickpiggin, peter, linuxram, alexeyk, linux-kernel, axboe Arjan van de Ven <arjanv@redhat.com> wrote: > > > > > > That would cause the kernel to perform lots of pointless pagecache lookups > > when the file is already 100% cached. > > well surely the read itself will do those AGAIN anyway, so in the fully > cached case this is just warming up the cpu cache ;) (and thus really > cheap as nett cost I suspect) Probably true for x86, but the cost is noticeable on ppc64, for example. Anton fixed some things in there shortly after it went in, but it's still apparent on profiles. We could perhaps speed things up a little bit by using gang lookup in both __do_page_cache_readahead() and in do_generic_file_read(). ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Random file I/O regressions in 2.6 2004-05-04 8:47 ` Andrew Morton @ 2004-05-04 8:50 ` Arjan van de Ven 0 siblings, 0 replies; 56+ messages in thread From: Arjan van de Ven @ 2004-05-04 8:50 UTC (permalink / raw) To: Andrew Morton; +Cc: nickpiggin, peter, linuxram, alexeyk, linux-kernel, axboe [-- Attachment #1: Type: text/plain, Size: 1050 bytes --] On Tue, May 04, 2004 at 01:47:29AM -0700, Andrew Morton wrote: > Arjan van de Ven <arjanv@redhat.com> wrote: > > > > > > > > > > That would cause the kernel to perform lots of pointless pagecache lookups > > > when the file is already 100% cached. > > > > well surely the read itself will do those AGAIN anyway, so in the fully > > cached case this is just warming up the cpu cache ;) (and thus really > > cheap as nett cost I suspect) > > Probably true for x86, but the cost is noticeable on ppc64, for example. > Anton fixed some things in there shortly after it went in, but it's still > apparent on profiles. well do the profiles also show that the actual later lookup becomes near free due to a warm cpu cache? > > We could perhaps speed things up a little bit by using gang lookup in both > __do_page_cache_readahead() and in do_generic_file_read(). or go into the readahead path only when the first miss occurs; for the fully cached case you can then avoid the cost while when you're doing IO, well, a few premature cache misses... [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
end of thread, other threads:[~2004-05-26 4:43 UTC | newest] Thread overview: 56+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-05-02 19:57 Random file I/O regressions in 2.6 Alexey Kopytov 2004-05-03 11:14 ` Nick Piggin 2004-05-03 18:08 ` Andrew Morton 2004-05-03 20:22 ` Ram Pai 2004-05-03 20:57 ` Andrew Morton 2004-05-03 21:37 ` Peter Zaitsev 2004-05-03 21:50 ` Ram Pai 2004-05-03 22:01 ` Peter Zaitsev 2004-05-03 21:59 ` Andrew Morton 2004-05-03 22:07 ` Ram Pai 2004-05-03 23:58 ` Nick Piggin 2004-05-04 0:10 ` Andrew Morton 2004-05-04 0:19 ` Nick Piggin 2004-05-04 0:50 ` Ram Pai 2004-05-04 6:29 ` Andrew Morton 2004-05-04 15:03 ` Ram Pai 2004-05-04 19:39 ` Ram Pai 2004-05-04 19:48 ` Andrew Morton 2004-05-04 19:58 ` Ram Pai 2004-05-04 21:51 ` Ram Pai 2004-05-04 22:29 ` Ram Pai 2004-05-04 23:01 ` Alexey Kopytov 2004-05-04 23:20 ` Andrew Morton 2004-05-05 22:04 ` Alexey Kopytov 2004-05-06 8:43 ` Andrew Morton 2004-05-06 18:13 ` Peter Zaitsev 2004-05-06 21:49 ` Andrew Morton 2004-05-06 23:49 ` Nick Piggin 2004-05-07 1:29 ` Peter Zaitsev 2004-05-10 19:50 ` Ram Pai 2004-05-10 20:21 ` Andrew Morton 2004-05-10 22:39 ` Ram Pai 2004-05-10 23:07 ` Andrew Morton 2004-05-11 20:51 ` Ram Pai 2004-05-11 21:17 ` Andrew Morton 2004-05-13 20:41 ` Ram Pai 2004-05-17 17:30 ` Random file I/O regressions in 2.6 [patch+results] Ram Pai 2004-05-20 1:06 ` Alexey Kopytov 2004-05-20 1:31 ` Ram Pai 2004-05-21 19:32 ` Alexey Kopytov 2004-05-20 5:49 ` Andrew Morton 2004-05-20 21:59 ` Andrew Morton 2004-05-20 22:23 ` Andrew Morton 2004-05-21 7:31 ` Nick Piggin 2004-05-21 7:50 ` Jens Axboe 2004-05-21 8:40 ` Nick Piggin 2004-05-21 8:56 ` Spam: " Andrew Morton 2004-05-21 22:24 ` Alexey Kopytov 2004-05-21 21:13 ` Alexey Kopytov 2004-05-26 4:43 ` Alexey Kopytov 2004-05-11 22:26 ` Random file I/O regressions in 2.6 Bill Davidsen 2004-05-04 1:15 ` Andrew Morton 2004-05-04 11:39 ` Nick Piggin 2004-05-04 8:27 ` Arjan van de Ven 2004-05-04 8:47 ` 
Andrew Morton 2004-05-04 8:50 ` Arjan van de Ven