public inbox for linux-kernel@vger.kernel.org
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Alexey Kopytov <alexeyk@mysql.com>
Cc: linux-kernel@vger.kernel.org, Jens Axboe <axboe@suse.de>,
	Andrew Morton <akpm@osdl.org>
Subject: Re: Random file I/O regressions in 2.6
Date: Mon, 03 May 2004 21:14:45 +1000
Message-ID: <409629A5.8070201@yahoo.com.au>
In-Reply-To: <200405022357.59415.alexeyk@mysql.com>

[-- Attachment #1: Type: text/plain, Size: 3200 bytes --]

Alexey Kopytov wrote:
> Hello!
> 
> I tried to compare random file I/O performance in 2.4 and 2.6 kernels and 
> found some regressions that I failed to explain. I tested 2.4.25, 2.6.5-bk2 
> and 2.6.6-rc3 with my own utility SysBench which was written to generate 
> workloads similar to a database under intensive load. 
> 
> For the 2.6.x kernels, the anticipatory, deadline, CFQ and noop I/O schedulers
> were tested, with AS giving the best results for this workload, but it is still
> about 1.5 times slower than the 2.4.25 kernel.
> 
> The SysBench 'fileio' test was configured to generate the following workload:
> 16 worker threads are created, each issuing random read/write file requests in
> blocks of 16 KB with a read/write ratio of 1.5. All I/O operations are evenly
> distributed over 128 files with a total size of 3 GB. After every 100 requests,
> an fsync() operation is performed sequentially on each file. The total number
> of requests is limited to 10000.
> 
> The FS used for the test was ext3 with data=ordered.
> 
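
As a rough illustration of the request mix described above, here is a C sketch that only counts operations and performs no real I/O. It assumes the 100-request fsync interval counts requests across all threads (the description does not say), and every name in it (fileio_simulate, xorshift32, the macros) is hypothetical, not SysBench's actual code:

```c
#include <stdint.h>

/* Parameters from the SysBench 'fileio' description above. */
#define NUM_FILES      128
#define TOTAL_BYTES    (3ULL << 30)      /* 3 GB spread over all files */
#define BLOCK_SIZE     (16u * 1024u)     /* 16 KB per request */
#define TOTAL_REQUESTS 10000ul
#define FSYNC_EVERY    100ul             /* assumption: global request count */

struct fileio_mix {
	uint64_t blocks_per_file;        /* 16 KB slots per file */
	unsigned long reads;
	unsigned long writes;
	unsigned long fsyncs;
};

/* xorshift32: small deterministic PRNG; the seed must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Hypothetical helper: count the operations the workload would issue. */
static struct fileio_mix fileio_simulate(uint32_t seed)
{
	struct fileio_mix m = { TOTAL_BYTES / NUM_FILES / BLOCK_SIZE, 0, 0, 0 };

	for (unsigned long i = 1; i <= TOTAL_REQUESTS; i++) {
		uint32_t file = xorshift32(&seed) % NUM_FILES;
		uint64_t offset = (uint64_t)(xorshift32(&seed) % m.blocks_per_file)
				  * BLOCK_SIZE;

		/* A real tool would pread()/pwrite() BLOCK_SIZE bytes of
		 * 'file' at 'offset' here; this sketch only counts. */
		(void)file;
		(void)offset;

		/* read:write = 1.5 means 3 reads per 2 writes, i.e. 60% reads. */
		if (xorshift32(&seed) % 5 < 3)
			m.reads++;
		else
			m.writes++;

		/* Every FSYNC_EVERY requests, fsync() each file in turn. */
		if (i % FSYNC_EVERY == 0)
			m.fsyncs += NUM_FILES;
	}
	return m;
}
```

Calling fileio_simulate(12345) reports 1536 blocks of 16 KB per 24 MB file, 10000 read plus write requests split roughly 60/40, and 100 fsync passes over 128 files (12800 fsync() calls) under the stated assumption.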

I am able to reproduce this here. 2.6 isn't improved by increasing
nr_requests, relaxing IO scheduler deadlines, or turning off readahead.
It looks like 2.6 is submitting a lot of the IO in 4KB sized requests...

Hmm, oh dear. It looks like the readahead logic shat itself and/or
do_generic_mapping_read doesn't know how to handle multipage reads
properly.

What ends up happening is that readahead gets turned off, then the
16K read ends up being done in 4 synchronous 4K chunks. Because they
are synchronous, they have no chance of being merged with one another
either.
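
Why synchronous page reads cannot merge can be shown with a toy model in C. This is not the kernel's elevator code: struct io_req and submit_pages() are hypothetical, and the model assumes 4 KB pages and that a request can only be back-merged while the previous one is still queued:

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u

/* Hypothetical model of a block request: byte offset and length. */
struct io_req {
	unsigned long start;
	unsigned long len;
};

/* Queue 'pages' contiguous page reads starting at byte 'start' and
 * record the requests that reach the disk in 'out'. When 'sync' is
 * non-zero each page is waited on before the next is queued, so the
 * queue never holds two requests at once and nothing can merge.
 * Returns the number of requests dispatched. */
static size_t submit_pages(unsigned long start, size_t pages, int sync,
			   struct io_req *out)
{
	size_t n = 0;

	for (size_t i = 0; i < pages; i++) {
		unsigned long off = start + i * PAGE_SIZE_BYTES;

		if (!sync && n > 0 && out[n - 1].start + out[n - 1].len == off) {
			/* Previous request still queued and adjacent: extend it. */
			out[n - 1].len += PAGE_SIZE_BYTES;
		} else {
			/* Already dispatched (sync) or not adjacent: new request. */
			out[n].start = off;
			out[n].len = PAGE_SIZE_BYTES;
			n++;
		}
	}
	return n;
}
```

With sync set, reading four pages from offset 0 produces four 4 KB requests; with it clear, the same pages collapse into one 16 KB request, mirroring the lost merging opportunity described above.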

I have attached a proof of concept hack... I think what should really
happen is that page_cache_readahead is taught about the size of the
requested read, and ensures that a decent amount of reading is done
within the read request window, even if readahead beyond the request
window has previously been unsuccessful.
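
Teaching readahead about the read size comes down to counting the pages the read touches. The attached hack uses desc->count >> PAGE_CACHE_SHIFT, which rounds down and ignores the offset within the first page. A sketch that accounts for both might look like this; it assumes 4 KB pages, and read_span_pages() and its max_pages cap (standing in for a limit such as max_sane_readahead()) are hypothetical, not kernel API:

```c
#define PAGE_SHIFT_BITS 12   /* assumption: 4 KB pages */

/* Hypothetical helper: number of pages touched by a read of 'count'
 * bytes at file position 'pos', capped at 'max_pages'. */
static unsigned long read_span_pages(unsigned long long pos,
				     unsigned long count,
				     unsigned long max_pages)
{
	unsigned long long first, last;
	unsigned long pages;

	if (count == 0)
		return 0;
	first = pos >> PAGE_SHIFT_BITS;               /* first page index */
	last = (pos + count - 1) >> PAGE_SHIFT_BITS;  /* last page index */
	pages = (unsigned long)(last - first + 1);
	return pages < max_pages ? pages : max_pages;
}
```

For a page-aligned 16 KB read this gives 4 pages, the same as count >> 12; for a 16 KB read starting 2 KB into a page it gives 5, and for a 2 KB read it gives 1 rather than 0.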

Numbers with an IDE disk, 256MB RAM:
2.4.24:          81s
2.6.6-rc3-mm1:  126s
rc3-mm1+patch:   87s

The small remaining regression might be explained by 2.6's smaller
nr_requests, IDE driver, IO scheduler tuning, etc.
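
To put the table in proportion, a small C sketch computing each run relative to the 2.4.24 baseline and the share of the gap the patch closes (the struct and helper names are illustrative only):

```c
#include <stddef.h>

/* Timings from the table above (IDE disk, 256MB RAM), in seconds. */
struct timing {
	const char *kernel;
	double seconds;
};

static const struct timing runs[] = {
	{ "2.4.24",         81.0 },
	{ "2.6.6-rc3-mm1", 126.0 },
	{ "rc3-mm1+patch",  87.0 },
};

/* Run time relative to the 2.4.24 baseline (1.0 = no regression). */
static double slowdown(size_t i)
{
	return runs[i].seconds / runs[0].seconds;
}

/* Share of the 2.4 -> 2.6 gap closed by the patch:
 * (unpatched - patched) / (unpatched - baseline). */
static double gap_closed(void)
{
	return (runs[1].seconds - runs[2].seconds) /
	       (runs[1].seconds - runs[0].seconds);
}
```

This works out to roughly 1.56x for the unpatched 2.6.6-rc3-mm1 and 1.07x with the patch, so the patch recovers 39 of the 45 seconds (about 87%) of the gap.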

> Here are the results (values are number of seconds to complete the test):
> 
> 2.4.25: 77.5377
> 
> 2.6.5-bk2(noop): 165.3393
> 2.6.5-bk2(anticipatory): 118.7450
> 2.6.5-bk2(deadline): 130.3254
> 2.6.5-bk2(CFQ): 146.4286
> 
> 2.6.6-rc3(noop): 164.9486
> 2.6.6-rc3(anticipatory): 125.1776
> 2.6.6-rc3(deadline): 131.8903
> 2.6.6-rc3(CFQ): 152.9280
> 
> I have published the results as well as the hardware and kernel setups at the
> SysBench home page: http://sysbench.sourceforge.net/results/fileio/
> 
> Any comments or suggestions would be highly appreciated.
> 

From your website:
"Another interesting fact is that AS gives the best results for this
workload, though it's believed to give worse results for this kind of
workloads as compared to other I/O schedulers available in 2.6.x
kernels."

The anticipatory scheduler is actually in a fairly good state of tune,
and can often beat deadline even for random read/write/fsync tests. The
infamous database regression problem arises when this sort of workload
is combined with TCQ (tagged command queueing) disk drives.

Nick

[-- Attachment #2: read-populate.patch --]
[-- Type: text/x-patch, Size: 1010 bytes --]

 include/linux/mm.h             |    0 
 linux-2.6-npiggin/mm/filemap.c |    5 ++++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff -puN mm/readahead.c~read-populate mm/readahead.c
diff -puN mm/filemap.c~read-populate mm/filemap.c
--- linux-2.6/mm/filemap.c~read-populate	2004-05-03 19:56:00.000000000 +1000
+++ linux-2.6-npiggin/mm/filemap.c	2004-05-03 20:51:37.000000000 +1000
@@ -627,6 +627,9 @@ void do_generic_mapping_read(struct addr
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
+	force_page_cache_readahead(mapping, filp, index,
+			max_sane_readahead(desc->count >> PAGE_CACHE_SHIFT));
+
 	for (;;) {
 		struct page *page;
 		unsigned long end_index, nr, ret;
@@ -644,7 +647,7 @@ void do_generic_mapping_read(struct addr
 		}
 
 		cond_resched();
-		page_cache_readahead(mapping, ra, filp, index);
+		page_cache_readahead(mapping, ra, filp, index + desc->count);
 
 		nr = nr - offset;
 find_page:
diff -puN include/linux/mm.h~read-populate include/linux/mm.h

_
