From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Alexey Kopytov <alexeyk@mysql.com>
Cc: linux-kernel@vger.kernel.org, Jens Axboe <axboe@suse.de>,
Andrew Morton <akpm@osdl.org>
Subject: Re: Random file I/O regressions in 2.6
Date: Mon, 03 May 2004 21:14:45 +1000 [thread overview]
Message-ID: <409629A5.8070201@yahoo.com.au> (raw)
In-Reply-To: <200405022357.59415.alexeyk@mysql.com>
[-- Attachment #1: Type: text/plain, Size: 3200 bytes --]
Alexey Kopytov wrote:
> Hello!
>
> I tried to compare random file I/O performance in 2.4 and 2.6 kernels and
> found some regressions that I failed to explain. I tested 2.4.25, 2.6.5-bk2
> and 2.6.6-rc3 with my own utility SysBench which was written to generate
> workloads similar to a database under intensive load.
>
> For 2.6.x kernels anticipatory, deadline, CFQ and noop I/O schedulers were
> tested with AS giving the best results for this workload, but it's still about
> 1.5 times worse than the results for 2.4.25 kernel.
>
> The SysBench 'fileio' test was configured to generate the following workload:
> 16 worker threads are created, each running random read/write file requests in
> blocks of 16 KB with a read/write ratio of 1.5. All I/O operations are evenly
> distributed over 128 files with a total size of 3 GB. Each 100 requests, an
> fsync() operations is performed sequentially on each file. The total number of
> requests is limited by 10000.
>
> The FS used for the test was ext3 with data=ordered.
>
I am able to reproduce this here. 2.6 isn't improved by increasing
nr_requests, relaxing IO scheduler deadlines, or turning off readahead.
It looks like 2.6 is submitting a lot of the IO in 4KB sized requests...
Hmm, oh dear. It looks like the readahead logic shat itself and/or
do_generic_mapping_read doesn't know how to handle multipage reads
properly.
What ends up happening is that readahead gets turned off, then the
16K read ends up being done in 4 synchronous 4K chunks. Because they
are synchronous, they have no chance of being merged with one another
either.
I have attached a proof of concept hack... I think what should really
happen is that page_cache_readahead should be taught about the size
of the requested read, and ensures that a decent amount of reading is
done while within the read request window, even if
beyond-request-window-readahead has been previously unsuccessful.
Numbers with an IDE disk, 256MB ram
2.4.24: 81s
2.6.6-rc3-mm1: 126s
rc3-mm1+patch: 87s
The small remaining regression might be explained by 2.6's smaller
nr_requests, IDE driver, io scheduler tuning, etc.
> Here are the results (values are number of seconds to complete the test):
>
> 2.4.25: 77.5377
>
> 2.6.5-bk2(noop): 165.3393
> 2.6.5-bk2(anticipatory): 118.7450
> 2.6.5-bk2(deadline): 130.3254
> 2.6.5-bk2(CFQ): 146.4286
>
> 2.6.6-rc3(noop): 164.9486
> 2.6.6-rc3(anticipatory): 125.1776
> 2.6.6-rc3(deadline): 131.8903
> 2.6.6-rc3(CFQ): 152.9280
>
> I have published the results as well as the hardware and kernel setups at the
> SysBench home page: http://sysbench.sourceforge.net/results/fileio/
>
> Any comments or suggestions would be highly appreciated.
>
From your website:
"Another interesting fact is that AS gives the best results for this
workload, though it's believed to give worse results for this kind of
workloads as compared to other I/O schedulers available in 2.6.x
kernels."
The anticipatory scheduler is actually in a fairly good state of tune,
and can often beat deadline even for random read/write/fsync tests. The
infamous database regression problem is when this sort of workload is
combined with TCQ disk drives.
Nick
[-- Attachment #2: read-populate.patch --]
[-- Type: text/x-patch, Size: 1010 bytes --]
include/linux/mm.h | 0
linux-2.6-npiggin/mm/filemap.c | 5 ++++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff -puN mm/readahead.c~read-populate mm/readahead.c
diff -puN mm/filemap.c~read-populate mm/filemap.c
--- linux-2.6/mm/filemap.c~read-populate 2004-05-03 19:56:00.000000000 +1000
+++ linux-2.6-npiggin/mm/filemap.c 2004-05-03 20:51:37.000000000 +1000
@@ -627,6 +627,9 @@ void do_generic_mapping_read(struct addr
index = *ppos >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
+ force_page_cache_readahead(mapping, filp, index,
+ max_sane_readahead(desc->count >> PAGE_CACHE_SHIFT));
+
for (;;) {
struct page *page;
unsigned long end_index, nr, ret;
@@ -644,7 +647,7 @@ void do_generic_mapping_read(struct addr
}
cond_resched();
- page_cache_readahead(mapping, ra, filp, index);
+ page_cache_readahead(mapping, ra, filp, index + desc->count);
nr = nr - offset;
find_page:
diff -puN include/linux/mm.h~read-populate include/linux/mm.h
_
next prev parent reply other threads:[~2004-05-03 11:14 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-05-02 19:57 Random file I/O regressions in 2.6 Alexey Kopytov
2004-05-03 11:14 ` Nick Piggin [this message]
2004-05-03 18:08 ` Andrew Morton
2004-05-03 20:22 ` Ram Pai
2004-05-03 20:57 ` Andrew Morton
2004-05-03 21:37 ` Peter Zaitsev
2004-05-03 21:50 ` Ram Pai
2004-05-03 22:01 ` Peter Zaitsev
2004-05-03 21:59 ` Andrew Morton
2004-05-03 22:07 ` Ram Pai
2004-05-03 23:58 ` Nick Piggin
2004-05-04 0:10 ` Andrew Morton
2004-05-04 0:19 ` Nick Piggin
2004-05-04 0:50 ` Ram Pai
2004-05-04 6:29 ` Andrew Morton
2004-05-04 15:03 ` Ram Pai
2004-05-04 19:39 ` Ram Pai
2004-05-04 19:48 ` Andrew Morton
2004-05-04 19:58 ` Ram Pai
2004-05-04 21:51 ` Ram Pai
2004-05-04 22:29 ` Ram Pai
2004-05-04 23:01 ` Alexey Kopytov
2004-05-04 23:20 ` Andrew Morton
2004-05-05 22:04 ` Alexey Kopytov
2004-05-06 8:43 ` Andrew Morton
2004-05-06 18:13 ` Peter Zaitsev
2004-05-06 21:49 ` Andrew Morton
2004-05-06 23:49 ` Nick Piggin
2004-05-07 1:29 ` Peter Zaitsev
2004-05-10 19:50 ` Ram Pai
2004-05-10 20:21 ` Andrew Morton
2004-05-10 22:39 ` Ram Pai
2004-05-10 23:07 ` Andrew Morton
2004-05-11 20:51 ` Ram Pai
2004-05-11 21:17 ` Andrew Morton
2004-05-13 20:41 ` Ram Pai
2004-05-17 17:30 ` Random file I/O regressions in 2.6 [patch+results] Ram Pai
2004-05-20 1:06 ` Alexey Kopytov
2004-05-20 1:31 ` Ram Pai
2004-05-21 19:32 ` Alexey Kopytov
2004-05-20 5:49 ` Andrew Morton
2004-05-20 21:59 ` Andrew Morton
2004-05-20 22:23 ` Andrew Morton
2004-05-21 7:31 ` Nick Piggin
2004-05-21 7:50 ` Jens Axboe
2004-05-21 8:40 ` Nick Piggin
2004-05-21 8:56 ` Spam: " Andrew Morton
2004-05-21 22:24 ` Alexey Kopytov
2004-05-21 21:13 ` Alexey Kopytov
2004-05-26 4:43 ` Alexey Kopytov
2004-05-11 22:26 ` Random file I/O regressions in 2.6 Bill Davidsen
2004-05-04 1:15 ` Andrew Morton
2004-05-04 11:39 ` Nick Piggin
2004-05-04 8:27 ` Arjan van de Ven
2004-05-04 8:47 ` Andrew Morton
2004-05-04 8:50 ` Arjan van de Ven
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=409629A5.8070201@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=akpm@osdl.org \
--cc=alexeyk@mysql.com \
--cc=axboe@suse.de \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox