From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>,
Chris Mason <chris.mason@oracle.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Paul Gortmaker <paul.gortmaker@windriver.com>,
Matt Mackall <mpm@selenic.com>,
David Woodhouse <dwmw2@infradead.org>,
Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>,
Wu Fengguang <fengguang.wu@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Olivier Galibert <galibert@pobox.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 03/15] readahead: bump up the default readahead size
Date: Wed, 24 Feb 2010 11:10:04 +0800
Message-ID: <20100224031054.032435626@intel.com>
In-Reply-To: <20100224031001.026464755@intel.com>
[-- Attachment #1: readahead-enlarge-default-size.patch --]
[-- Type: text/plain, Size: 7038 bytes --]
Use a 512KB max readahead size and a 32KB min readahead size.
The former helps IO performance for common workloads.
The latter will be used by the thrashing safe context readahead.
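For illustration only (not part of this patch): assuming the usual 4KB
PAGE_CACHE_SIZE, the default readahead window derived from
VM_MAX_READAHEAD grows from 32 to 128 pages, and the minimum from 4 to
8 pages. A trivial user-space sketch of that arithmetic:

	#include <stdio.h>

	#define PAGE_CACHE_SIZE	4096	/* assumption: 4KB pages */

	int main(void)
	{
		/* VM_MAX_READAHEAD: 128KB -> 512KB */
		printf("max: %d -> %d pages\n",
		       128 * 1024 / PAGE_CACHE_SIZE,
		       512 * 1024 / PAGE_CACHE_SIZE);
		/* VM_MIN_READAHEAD: 16KB -> 32KB */
		printf("min: %d -> %d pages\n",
		       16 * 1024 / PAGE_CACHE_SIZE,
		       32 * 1024 / PAGE_CACHE_SIZE);
		return 0;
	}

The per-device value can of course still be tuned at run time through
/sys/block/<dev>/queue/read_ahead_kb or blockdev --setra.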
====== Rationale for the 512KB size ======
I believe it yields more I/O throughput without noticeably increasing
I/O latency on today's HDDs.
For example, for an HDD with 100MB/s bandwidth and 8ms access time,
random IO or highly concurrent sequential IO would in theory behave as
follows:
io_size(KB)  access(ms)  transfer(ms)  latency(ms)   util%   throughput(KB/s)
        4         8          0.04          8.04      0.49%          497.57
        8         8          0.08          8.08      0.97%          990.33
       16         8          0.16          8.16      1.92%         1961.69
       32         8          0.31          8.31      3.76%         3849.62
       64         8          0.62          8.62      7.25%         7420.29
      128         8          1.25          9.25     13.51%        13837.84
      256         8          2.50         10.50     23.81%        24380.95
      512         8          5.00         13.00     38.46%        39384.62
     1024         8         10.00         18.00     55.56%        56888.89
     2048         8         20.00         28.00     71.43%        73142.86
     4096         8         40.00         48.00     83.33%        85333.33
Raising the readahead size from 128KB to 512KB boosts IO throughput
from ~13MB/s to ~39MB/s, while only increasing the (minimal) IO latency
from 9.25ms to 13ms.
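The numbers above follow from a simple model (access time plus transfer
time), not from measurements of a particular drive. A small user-space
sketch that reproduces the table, assuming an 8ms access time and a
100MB/s transfer rate (i.e. 102.4KB/ms with 1MB = 1024KB, which matches
the figures above):

	#include <stdio.h>

	int main(void)
	{
		const double access_ms = 8.0;	/* assumed seek + rotational delay */
		const double kb_per_ms = 102.4;	/* assumed 100MB/s transfer rate */
		int io_kb;

		for (io_kb = 4; io_kb <= 4096; io_kb *= 2) {
			double transfer_ms = io_kb / kb_per_ms;
			double latency_ms  = access_ms + transfer_ms;

			printf("%5dKB  latency %6.2fms  util %6.2f%%  %9.2f KB/s\n",
			       io_kb, latency_ms,
			       100 * transfer_ms / latency_ms,
			       1000 * io_kb / latency_ms);
		}
		return 0;
	}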
As for SSDs, I find that the Intel X25-M SSD benefits from a large
readahead size even for sequential reads:
rasize    1st run     2nd run
------------------------------
   4k    123 MB/s    122 MB/s
  16k    153 MB/s    153 MB/s
  32k    161 MB/s    162 MB/s
  64k    167 MB/s    168 MB/s
 128k    197 MB/s    197 MB/s
 256k    217 MB/s    217 MB/s
 512k    238 MB/s    234 MB/s
   1M    251 MB/s    248 MB/s
   2M    259 MB/s    257 MB/s
   4M    269 MB/s    264 MB/s
   8M    266 MB/s    266 MB/s
The two other impacts of an enlarged readahead size are
- memory footprint (caused by readahead miss)
Sequential readahead hit ratio is pretty high regardless of max
readahead size; the extra memory footprint is mainly caused by
enlarged mmap read-around.
I measured this on my desktop:
- under Xwindow:
128KB readahead hit ratio = 143MB/230MB = 62%
512KB readahead hit ratio = 138MB/248MB = 55%
1MB readahead hit ratio = 130MB/253MB = 51%
- under console: (seems more stable than the Xwindow data)
128KB readahead hit ratio = 30MB/56MB = 53%
1MB readahead hit ratio = 30MB/59MB = 51%
So the impact to memory footprint looks acceptable.
- readahead thrashing
It will now cost a 1MB readahead buffer per stream. Memory-tight
systems typically do not run multiple streams; but if they do,
it should help I/O performance as long as we can avoid
thrashing, which can be achieved with the following patches.
I also booted the system into the console with different readahead
sizes, and found that both the io_count and readahead_hit_ratio were
reduced by ~10% when increasing readahead_size from 128k to 512k. I
guess typical desktop users would prefer the reduced number of IOs
(for a faster boot) at the cost of a dozen MB of memory.
readahead_size  io_count  avg_io_pages  total_readahead_pages  readahead_hit_ratio
      4k          6765         1                 6765                   -
    128k          1077         8                 8616                 78.5%
    512k           897        11                 9867                 68.6%
   1024k           867        12                10404                 65.0%
total_readahead_pages = io_count * avg_io_pages
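For reference, assuming 4KB pages, those totals work out to roughly:

	 8616 pages * 4KB ~= 33.7MB   (128k)
	 9867 pages * 4KB ~= 38.5MB   (512k)
	10404 pages * 4KB ~= 40.6MB   (1024k)

i.e. the 128k => 512k bump reads about 5MB more during boot.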
====== Remarks by Christian Ehrhardt ======
- 512 is by far superior to 128 for sequential reads
- up to +35% improvement with iozone sequential reads when scaling
  from 1 to 64 parallel processes
- readahead sizes larger than 512 turned out not to be "more useful",
  but increased the chance of thrashing on low-memory systems
====== Benchmarks by Vivek Goyal ======
I have two paths to the HP EVA and a multipath device set up (dm-3).
I ran an increasing number of sequential readers. The file system is
ext3 and the file size is 1G.
I ran the tests 3 times (3 sets) and took the average.
Workload=bsr iosched=cfq Filesz=1G bs=32K
======================================================================
                    2.6.33-rc5                 2.6.33-rc5-readahead
job  Set  NR   ReadBW(KB/s)  MaxClat(us)    ReadBW(KB/s)  MaxClat(us)
---  ---  --   ------------  -----------    ------------  -----------
bsr   3    1      141768       130965          190302       97937.3
bsr   3    2      131979       135402          185636       223286
bsr   3    4      132351       420733          185986       363658
bsr   3    8      133152       455434          184352       428478
bsr   3   16      130316       674499          185646       594311
I ran the same test on a different piece of hardware. There are a few
SATA disks (5-6) in a striped configuration behind a hardware RAID
controller.
Workload=bsr iosched=cfq Filesz=1G bs=32K
======================================================================
                    2.6.33-rc5                 2.6.33-rc5-readahead
job  Set  NR   ReadBW(KB/s)  MaxClat(us)    ReadBW(KB/s)  MaxClat(us)
---  ---  --   ------------  -----------    ------------  -----------
bsr   3    1      147569      14369.7          160191        22752
bsr   3    2      124716       243932          149343       184698
bsr   3    4      123451       327665          147183       430875
bsr   3    8      122486       455102          144568       484045
bsr   3   16      117645   1.03957e+06         137485   1.06257e+06
CC: Jens Axboe <jens.axboe@oracle.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Paul Gortmaker <paul.gortmaker@windriver.com>
CC: Matt Mackall <mpm@selenic.com>
CC: David Woodhouse <dwmw2@infradead.org>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
include/linux/mm.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- linux.orig/include/linux/mm.h 2010-02-24 10:44:26.000000000 +0800
+++ linux/include/linux/mm.h 2010-02-24 10:44:41.000000000 +0800
@@ -1186,8 +1186,8 @@ int write_one_page(struct page *page, in
void task_dirty_inc(struct task_struct *tsk);
/* readahead.c */
-#define VM_MAX_READAHEAD 128 /* kbytes */
-#define VM_MIN_READAHEAD 16 /* kbytes (includes current page) */
+#define VM_MAX_READAHEAD 512 /* kbytes */
+#define VM_MIN_READAHEAD 32 /* kbytes (includes current page) */
int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
pgoff_t offset, unsigned long nr_to_read);
--