linux-fsdevel.vger.kernel.org archive mirror
From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jens Axboe <jens.axboe@oracle.com>,
	Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Clemens Ladisch <clemens@ladisch.de>,
	Olivier Galibert <galibert@pobox.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 03/11] readahead: bump up the default readahead size
Date: Mon, 08 Feb 2010 08:20:31 +0100
Message-ID: <4B6FBB3F.4010701@linux.vnet.ibm.com>
In-Reply-To: <20100207041043.147345346@intel.com>

This is related to our discussion from October 2009, e.g.:
http://lkml.indiana.edu/hypermail/linux/kernel/0910.1/01468.html

I work on s390 where, being a mainframe, we only have environments that
benefit from 512k readahead, but I still expect that some embedded devices
won't. While my idea of making it configurable was not liked in the past,
it may still be useful when introducing this default change, so that small
devices can choose without patching the source (a numeric config field
defaulting to 512, with help text explaining the history of that value,
would be really nice).

On the question of 512 vs. 128, I can add the following from my
measurements:
- 512 is by far superior to 128 for sequential reads
- iozone sequential read, scaling from 1 to 64 parallel processes,
  improved by up to +35%
- readahead sizes larger than 512 turned out not to be "more useful", but
  increased the chance of thrashing on low-memory systems

So I appreciate this change, with the small note that I would prefer a
config option.
-> tested & acked-by Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>

Wu Fengguang wrote:
 >
 > Use 512kb max readahead size, and 32kb min readahead size.
 >
 > The former helps io performance for common workloads.
 > The latter will be used in the thrashing safe context readahead.
 >
 > -- Rationale for the 512kb size --
 >
 > I believe it yields more I/O throughput without noticeably increasing
 > I/O latency for today's HDD.
 >
 > For example, for a 100MB/s and 8ms access time HDD, its random IO or
 > highly concurrent sequential IO would in theory be:
 >
 > io_size KB  access_time  transfer_time  io_latency   util%   throughput KB/s
 > 4           8             0.04           8.04        0.49%    497.57 
 > 8           8             0.08           8.08        0.97%    990.33 
 > 16          8             0.16           8.16        1.92%   1961.69
 > 32          8             0.31           8.31        3.76%   3849.62
 > 64          8             0.62           8.62        7.25%   7420.29
 > 128         8             1.25           9.25       13.51%  13837.84
 > 256         8             2.50          10.50       23.81%  24380.95
 > 512         8             5.00          13.00       38.46%  39384.62
 > 1024        8            10.00          18.00       55.56%  56888.89
 > 2048        8            20.00          28.00       71.43%  73142.86
 > 4096        8            40.00          48.00       83.33%  85333.33
 >
 > The 128KB => 512KB readahead size boosts IO throughput from ~13MB/s to
 > ~39MB/s, while merely increasing the (minimal) IO latency from 9.25ms
 > to 13ms.
 >
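To make the arithmetic behind that table explicit, here is a minimal
sketch in Python (not part of the patch). It assumes the same idealized
model as the table: a fixed 8 ms access time, a 100 MB/s transfer rate
with 1 MB = 1024 KB, and one seek per I/O. The function and parameter
names are my own.

```python
def hdd_io_model(io_size_kb, access_ms=8.0, bandwidth_mb_s=100.0):
    """Return (transfer_ms, latency_ms, util_pct, throughput_kb_s) for one I/O.

    Idealized model: every I/O pays one full access time, then transfers
    io_size_kb at the sequential bandwidth.
    """
    bandwidth_kb_per_ms = bandwidth_mb_s * 1024 / 1000.0
    transfer_ms = io_size_kb / bandwidth_kb_per_ms
    latency_ms = access_ms + transfer_ms
    # Utilization: share of the I/O time spent actually moving data.
    util_pct = 100.0 * transfer_ms / latency_ms
    throughput_kb_s = io_size_kb / latency_ms * 1000.0
    return transfer_ms, latency_ms, util_pct, throughput_kb_s


if __name__ == "__main__":
    for size_kb in (4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096):
        t, lat, util, tput = hdd_io_model(size_kb)
        print(f"{size_kb:5d} {t:6.2f} {lat:6.2f} {util:6.2f}% {tput:10.2f}")
```

Changing access_ms or bandwidth_mb_s shows how the trade-off shifts for
other devices; a device with a much shorter access time gains far less
from larger requests under this model.
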
 > As for SSD, I find that Intel X25-M SSD desires large readahead size
 > even for sequential reads:
 >
 >     rasize    1st run     2nd run
 >     ----------------------------------
 >       4k      123 MB/s    122 MB/s
 >      16k      153 MB/s    153 MB/s
 >      32k      161 MB/s    162 MB/s
 >      64k      167 MB/s    168 MB/s
 >     128k      197 MB/s    197 MB/s
 >     256k      217 MB/s    217 MB/s
 >     512k      238 MB/s    234 MB/s
 >       1M      251 MB/s    248 MB/s
 >       2M      259 MB/s    257 MB/s
 >       4M      269 MB/s    264 MB/s
 >       8M      266 MB/s    266 MB/s
 >
 > The two other impacts of an enlarged readahead size are
 >
 > - memory footprint (caused by readahead miss)
 >     Sequential readahead hit ratio is pretty high regardless of max
 >     readahead size; the extra memory footprint is mainly caused by
 >     enlarged mmap read-around.
 >     I measured my desktop:
 >     - under Xwindow:
 >         128KB readahead hit ratio = 143MB/230MB = 62%
 >         512KB readahead hit ratio = 138MB/248MB = 55%
 >           1MB readahead hit ratio = 130MB/253MB = 51%
 >     - under console: (seems more stable than the Xwindow data)
 >         128KB readahead hit ratio = 30MB/56MB   = 53%
 >           1MB readahead hit ratio = 30MB/59MB   = 51%
 >     So the impact to memory footprint looks acceptable.
 >
 > - readahead thrashing
 >     It will now cost 1MB readahead buffer per stream.  Memory tight
 >     systems typically do not run multiple streams; but if they do
 >     so, it should help I/O performance as long as we can avoid
 >     thrashing, which can be achieved with the following patches.
 >
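As I read the hit ratio figures, each one is the amount of readahead data
that was actually accessed divided by the total amount read ahead, so the
difference is memory spent on pages nobody used. A small sketch of that
reading (names are mine; the MB figures are the Xwindow numbers quoted
above):

```python
def readahead_hit_stats(hit_mb, readahead_mb):
    """Return (hit ratio in percent, MB read ahead but never accessed)."""
    hit_pct = 100.0 * hit_mb / readahead_mb
    unused_mb = readahead_mb - hit_mb
    return hit_pct, unused_mb


# (accessed MB, total readahead MB) per max readahead size, Xwindow case
xwindow = {"128KB": (143, 230), "512KB": (138, 248), "1MB": (130, 253)}

for rasize, (hit_mb, ra_mb) in xwindow.items():
    hit_pct, unused_mb = readahead_hit_stats(hit_mb, ra_mb)
    print(f"{rasize:>6}: hit ratio {hit_pct:.1f}%, {unused_mb} MB unused")
```

On these numbers the unused readahead grows from 87MB at 128KB to 110MB
at 512KB and 123MB at 1MB.
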
 > -- Benchmarks by Vivek Goyal --
 >
 > I have two paths to the HP EVA and a multipath device set up (dm-3).
 > I ran an increasing number of sequential readers. The file system is
 > ext3 and the file size is 1G.
 > I ran the tests 3 times (3 sets) and took the average.
 >
 > Workload=bsr      iosched=cfq     Filesz=1G   bs=32K
 > ======================================================================
 >                     2.6.33-rc5                2.6.33-rc5-readahead
 > job   Set NR  ReadBW(KB/s)   MaxClat(us)    ReadBW(KB/s)   MaxClat(us)
 > ---   --- --  ------------   -----------    ------------   -----------
 > bsr   3   1   141768         130965         190302         97937.3   
 > bsr   3   2   131979         135402         185636         223286    
 > bsr   3   4   132351         420733         185986         363658    
 > bsr   3   8   133152         455434         184352         428478    
 > bsr   3   16  130316         674499         185646         594311    
 >
 > I ran the same test on a different piece of hardware. There are a few
 > SATA disks (5-6) in a striped configuration behind a hardware RAID
 > controller.
 >
 > Workload=bsr      iosched=cfq     Filesz=1G   bs=32K
 > ======================================================================
 >                     2.6.33-rc5                2.6.33-rc5-readahead
 > job   Set NR  ReadBW(KB/s)   MaxClat(us)    ReadBW(KB/s)   MaxClat(us)
 > ---   --- --  ------------   -----------    ------------   -----------
 > bsr   3   1   147569         14369.7        160191         22752
 > bsr   3   2   124716         243932         149343         184698
 > bsr   3   4   123451         327665         147183         430875
 > bsr   3   8   122486         455102         144568         484045
 > bsr   3   16  117645         1.03957e+06    137485         1.06257e+06
 >
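For easier comparison, the relative bandwidth change in the two benchmark
tables can be computed as below. This is only a convenience sketch; the
ReadBW numbers are copied from the tables above and the names are mine.

```python
def gain_pct(baseline, patched):
    """Relative change of patched over baseline, in percent."""
    return 100.0 * (patched - baseline) / baseline


# (number of readers, ReadBW KB/s on 2.6.33-rc5, ReadBW KB/s with readahead)
eva = [(1, 141768, 190302), (2, 131979, 185636), (4, 132351, 185986),
       (8, 133152, 184352), (16, 130316, 185646)]
raid = [(1, 147569, 160191), (2, 124716, 149343), (4, 123451, 147183),
        (8, 122486, 144568), (16, 117645, 137485)]

for name, rows in (("HP EVA multipath", eva), ("SATA hardware RAID", raid)):
    print(name)
    for readers, base, patched in rows:
        print(f"  NR={readers:2d}: {gain_pct(base, patched):+.1f}% ReadBW")
```

That works out to roughly +34% to +42% read bandwidth on the EVA setup
and roughly +9% to +20% on the RAID setup.
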
 > Tested-by: Vivek Goyal <vgoyal@redhat.com>
 > CC: Jens Axboe <jens.axboe@oracle.com>
 > CC: Chris Mason <chris.mason@oracle.com>
 > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
 > CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
 > CC: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
 > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
 > ---
 >  include/linux/mm.h |    4 ++--
 >  1 file changed, 2 insertions(+), 2 deletions(-)
 >
 > --- linux.orig/include/linux/mm.h    2010-01-30 17:38:49.000000000 +0800
 > +++ linux/include/linux/mm.h    2010-01-30 18:09:58.000000000 +0800
 > @@ -1184,8 +1184,8 @@ int write_one_page(struct page *page, in
 >  void task_dirty_inc(struct task_struct *tsk);
 >
 >  /* readahead.c */
 > -#define VM_MAX_READAHEAD    128    /* kbytes */
 > -#define VM_MIN_READAHEAD    16    /* kbytes (includes current page) */
 > +#define VM_MAX_READAHEAD    512    /* kbytes */
 > +#define VM_MIN_READAHEAD    32    /* kbytes (includes current page) */
 >
 >  int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 >              pgoff_t offset, unsigned long nr_to_read);
 >
 >
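Since the macros are given in kbytes while readahead itself works on
pages, here is what the old and new values mean in pages. This is a
sketch assuming the plain conversion (bytes divided by page size) and a
4096-byte page; other page sizes give other counts, and the helper name
is mine.

```python
def readahead_pages(kbytes, page_size=4096):
    """Convert a readahead size in kbytes to whole pages."""
    return kbytes * 1024 // page_size


for label, old_kb, new_kb in (("VM_MAX_READAHEAD", 128, 512),
                              ("VM_MIN_READAHEAD", 16, 32)):
    print(f"{label}: {old_kb} KB = {readahead_pages(old_kb)} pages"
          f" -> {new_kb} KB = {readahead_pages(new_kb)} pages")
```
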

-- 

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization 


