From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Alexey Kopytov <alexeyk@mysql.com>
Cc: linux-kernel@vger.kernel.org, Jens Axboe <axboe@suse.de>,
	Andrew Morton <akpm@osdl.org>
Subject: Re: Random file I/O regressions in 2.6
Date: Mon, 03 May 2004 21:14:45 +1000
Message-ID: <409629A5.8070201@yahoo.com.au>
In-Reply-To: <200405022357.59415.alexeyk@mysql.com>


Alexey Kopytov wrote:
> Hello!
> 
> I tried to compare random file I/O performance in 2.4 and 2.6 kernels and 
> found some regressions that I failed to explain. I tested 2.4.25, 2.6.5-bk2 
> and 2.6.6-rc3 with my own utility SysBench which was written to generate 
> workloads similar to a database under intensive load. 
> 
> For 2.6.x kernels anticipatory, deadline, CFQ and noop I/O schedulers were
> tested with AS giving the best results for this workload, but it's still about 
> 1.5 times worse than the results for 2.4.25 kernel.
> 
> The SysBench 'fileio' test was configured to generate the following workload:
> 16 worker threads are created, each running random read/write file requests in
> blocks of 16 KB with a read/write ratio of 1.5. All I/O operations are evenly
> distributed over 128 files with a total size of 3 GB. Every 100 requests, an
> fsync() operation is performed sequentially on each file. The total number of
> requests is limited to 10000.
> 
> The FS used for the test was ext3 with data=ordered.
> 

I am able to reproduce this here. 2.6 isn't improved by increasing
nr_requests, relaxing IO scheduler deadlines, or turning off readahead.
It looks like 2.6 is submitting a lot of the IO in 4KB sized requests...

Hmm, oh dear. It looks like the readahead logic shat itself and/or
do_generic_mapping_read doesn't know how to handle multipage reads
properly.

What ends up happening is that readahead gets turned off, then the
16K read ends up being done in 4 synchronous 4K chunks. Because they
are synchronous, they have no chance of being merged with one another
either.
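
To make the arithmetic concrete, here is a minimal user-space sketch (not
kernel code). It assumes 4 KB pages and that the pages of one read are
contiguous on disk, so that a batched submission can merge into a single
request. The names SKETCH_PAGE_SHIFT, pages_spanned and disk_requests are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KB pages; the kernel code uses PAGE_CACHE_SHIFT instead. */
#define SKETCH_PAGE_SHIFT 12

/* Number of page cache pages touched by a read of `count` bytes at `pos`. */
unsigned long pages_spanned(unsigned long long pos, size_t count)
{
	unsigned long long first, last;

	assert(count > 0);
	first = pos >> SKETCH_PAGE_SHIFT;
	last = (pos + count - 1) >> SKETCH_PAGE_SHIFT;
	return (unsigned long)(last - first + 1);
}

/*
 * Requests the disk sees for one read(): one per page when each page is
 * read synchronously, or a single request when all pages are submitted
 * together and merged (assumes the pages are contiguous on disk).
 */
unsigned long disk_requests(unsigned long long pos, size_t count, int batched)
{
	unsigned long pages = pages_spanned(pos, count);

	return batched ? 1 : pages;
}
```

For a page-aligned 16K read, pages_spanned() gives 4, so the unbatched case
issues 4 separate requests where the batched case issues 1.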

I have attached a proof of concept hack... I think what should really
happen is that page_cache_readahead should be taught about the size
of the requested read, and should ensure that a decent amount of reading
is done while within the read request window, even if
beyond-request-window-readahead has been previously unsuccessful.
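
A rough user-space sketch of that idea, under stated assumptions: struct
ra_sketch and pages_to_submit are hypothetical names, and this is neither
the kernel's struct file_ra_state nor the attached patch. The point is only
that the pages inside the request are always submitted as one batch, while
the speculative part beyond the request depends on the readahead state.

```c
/* Hypothetical readahead state; not the kernel's struct file_ra_state. */
struct ra_sketch {
	int enabled;           /* 0 once speculative readahead was turned off */
	unsigned long window;  /* speculative pages to read past the request */
};

/*
 * Pages to submit in one batch for a read covering `req_pages` pages.
 * The request itself is always covered; the speculative window is added
 * only while readahead is enabled. The total is capped at `max_pages`.
 */
unsigned long pages_to_submit(const struct ra_sketch *ra,
			      unsigned long req_pages,
			      unsigned long max_pages)
{
	unsigned long n = req_pages;

	if (ra->enabled)
		n += ra->window;
	if (n > max_pages)
		n = max_pages;
	return n;
}
```

With readahead disabled and a 4-page (16K) request, this still returns 4
instead of falling back to one page at a time.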

Numbers with an IDE disk, 256MB RAM:
2.4.24:          81s
2.6.6-rc3-mm1:  126s
rc3-mm1+patch:   87s

The small remaining regression might be explained by 2.6's smaller
nr_requests, IDE driver, io scheduler tuning, etc.

> Here are the results (values are number of seconds to complete the test):
> 
> 2.4.25: 77.5377
> 
> 2.6.5-bk2(noop): 165.3393
> 2.6.5-bk2(anticipatory): 118.7450
> 2.6.5-bk2(deadline): 130.3254
> 2.6.5-bk2(CFQ): 146.4286
> 
> 2.6.6-rc3(noop): 164.9486
> 2.6.6-rc3(anticipatory): 125.1776
> 2.6.6-rc3(deadline): 131.8903
> 2.6.6-rc3(CFQ): 152.9280
> 
> I have published the results as well as the hardware and kernel setups at the
> SysBench home page: http://sysbench.sourceforge.net/results/fileio/
> 
> Any comments or suggestions would be highly appreciated.
> 

From your website:
"Another interesting fact is that AS gives the best results for this
workload, though it's believed to give worse results for this kind of
workloads as compared to other I/O schedulers available in 2.6.x
kernels."

The anticipatory scheduler is actually in a fairly good state of tune,
and can often beat deadline even for random read/write/fsync tests. The
infamous database regression problem is when this sort of workload is
combined with TCQ disk drives.

Nick

[-- Attachment #2: read-populate.patch --]
[-- Type: text/x-patch, Size: 1010 bytes --]

 include/linux/mm.h             |    0 
 linux-2.6-npiggin/mm/filemap.c |    5 ++++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff -puN mm/readahead.c~read-populate mm/readahead.c
diff -puN mm/filemap.c~read-populate mm/filemap.c
--- linux-2.6/mm/filemap.c~read-populate	2004-05-03 19:56:00.000000000 +1000
+++ linux-2.6-npiggin/mm/filemap.c	2004-05-03 20:51:37.000000000 +1000
@@ -627,6 +627,9 @@ void do_generic_mapping_read(struct addr
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
+	force_page_cache_readahead(mapping, filp, index,
+			max_sane_readahead(desc->count >> PAGE_CACHE_SHIFT));
+
 	for (;;) {
 		struct page *page;
 		unsigned long end_index, nr, ret;
@@ -644,7 +647,7 @@ void do_generic_mapping_read(struct addr
 		}
 
 		cond_resched();
-		page_cache_readahead(mapping, ra, filp, index);
+		page_cache_readahead(mapping, ra, filp, index + desc->count);
 
 		nr = nr - offset;
 find_page:
diff -puN include/linux/mm.h~read-populate include/linux/mm.h

_
