public inbox for linux-kernel@vger.kernel.org
From: Ram Pai <linuxram@us.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Alexey Kopytov <alexeyk@mysql.com>,
	nickpiggin@yahoo.com.au, peter@mysql.com,
	linux-kernel@vger.kernel.org, axboe@suse.de
Subject: Re: Random file I/O regressions in 2.6
Date: 10 May 2004 12:50:59 -0700	[thread overview]
Message-ID: <1084218659.6140.459.camel@localhost.localdomain> (raw)
In-Reply-To: <20040506014307.1a97d23b.akpm@osdl.org>

On Thu, 2004-05-06 at 01:43, Andrew Morton wrote:

Sorry, I was out for 10 days, hence the late reply.

> The reason for the difference appears to be the thing which Ram added to
> readahead which causes it to usually read one page too many.  With this
> exciting patch:
> 
> --- 25/mm/readahead.c~a	2004-05-06 01:24:26.230330464 -0700
> +++ 25-akpm/mm/readahead.c	2004-05-06 01:24:26.234329856 -0700
> @@ -475,7 +475,7 @@ do_io:
>  		ra->ahead_start = 0;		/* Invalidate these */
>  		ra->ahead_size = 0;
>  		actual = do_page_cache_readahead(mapping, filp, offset,
> -						 ra->size);
> +				ra->size == 5 ? 4 : ra->size);
>  		if(!first_access) {
>  			/*
>  			 * do not adjust the readahead window size the first
> 
> _
> 
> 
> I get:
> 
> 	Time spent for test:  63.9435s
> 		0.07s user 6.69s system 5% cpu 2:11.02 total
> 
> which is a good result.
> 
> Ram, can you take a look at fixing that up please?  Something clean, not
> more hacks ;) I'd also be interested in an explanation of what the extra
> page is for.  The little comment in there doesn't really help.


The reason for the extra page read is as follows:

Consider random 16k read I/Os, i.e. reads issued 4 pages at a time.

Readahead is triggered when the 4th page in the current window is
touched. However, the data read in through the readahead window gets
thrown away, because the next 16k read will not touch anything in that
window. That is why I put in the optimization that avoids these wasted
readahead pages.

The idea is: when we miss the current window, read one more page than
the number of pages accessed in the current window.

Here is an example scenario with random 16k I/Os and Andrew's patch:
		actual = do_page_cache_readahead(mapping, filp, offset,
-						 ra->size);
+				ra->size == 5 ? 4 : ra->size);


Suppose the application accesses pages {1,2,3,4},
{100,101,102,103}, {200,201,202,203}.

Suppose also that the current window holds 4 pages, i.e. pages 1,2,3,4.

When the application asks for {1,2,3,4}, we happily satisfy it from the
current window. However, when the application touches page 4, lazy
readahead kicks in and brings in pages {5,6,7,8,9,10,11,12}; but the
application next wants {100,101,102,103}. This wasted effort is
probably bearable as long as we don't repeat the mistake. When the
application accesses {100,101,102,103}, the code scraps both the
current window and the readahead window and reads in a new current
window of size 4, i.e. {100,101,102,103}. But when the application
touches page 103, lazy readahead is triggered again and brings in 8
more pages {104,105,106,107,108,109,110,111}, and as always all these
pages go to waste. This wastage continues forever.

My optimization [I mean hack ;)] was meant to avoid this bad behavior.
Instead of reading in 'the number of pages accessed in the
current window', I read in 'one more page than the number of pages
accessed in the current window'. With this optimization the behavior
changes as follows:

When the application asks for {1,2,3,4}, we happily satisfy it from the
current window. When the application touches page 4, lazy readahead
triggers and brings in pages {5,6,7,8,9,10,11,12}; but the application
next wants {100,101,102,103}. This bad behavior is probably OK, since
the optimization ensures that we do not repeat the mistake. When the
application accesses {100,101,102,103}, the code scraps both the
current window and the readahead window and reads in a new current
window of size 4+1, i.e. {100,101,102,103,104}.
Since the application never touches page 104, lazy readahead is not
triggered and we waste no effort bringing in pages. This nice behavior
continues forever.


We may see marginal degradation from this optimization with 16k I/O,
but the amount of wastage it avoids is substantial when the random I/O
is larger. I believe it gave about 4% better performance on a DSS
workload with 64k random reads.

Do you still think it's a hack?

Also, I think that with the sysbench workload and Andrew's ra-copy
patch, we might be losing some benefit of the optimization: if two
threads simultaneously work with copies of the same ra structure and
update them, the adjustment reflected in one copy is lost, depending
on which copy gets written back last.

RP



Thread overview: 56+ messages
2004-05-02 19:57 Random file I/O regressions in 2.6 Alexey Kopytov
2004-05-03 11:14 ` Nick Piggin
2004-05-03 18:08   ` Andrew Morton
2004-05-03 20:22     ` Ram Pai
2004-05-03 20:57       ` Andrew Morton
2004-05-03 21:37         ` Peter Zaitsev
2004-05-03 21:50           ` Ram Pai
2004-05-03 22:01             ` Peter Zaitsev
2004-05-03 21:59           ` Andrew Morton
2004-05-03 22:07             ` Ram Pai
2004-05-03 23:58             ` Nick Piggin
2004-05-04  0:10               ` Andrew Morton
2004-05-04  0:19                 ` Nick Piggin
2004-05-04  0:50                   ` Ram Pai
2004-05-04  6:29                     ` Andrew Morton
2004-05-04 15:03                       ` Ram Pai
2004-05-04 19:39                         ` Ram Pai
2004-05-04 19:48                           ` Andrew Morton
2004-05-04 19:58                             ` Ram Pai
2004-05-04 21:51                               ` Ram Pai
2004-05-04 22:29                                 ` Ram Pai
2004-05-04 23:01                           ` Alexey Kopytov
2004-05-04 23:20                             ` Andrew Morton
2004-05-05 22:04                               ` Alexey Kopytov
2004-05-06  8:43                                 ` Andrew Morton
2004-05-06 18:13                                   ` Peter Zaitsev
2004-05-06 21:49                                     ` Andrew Morton
2004-05-06 23:49                                       ` Nick Piggin
2004-05-07  1:29                                         ` Peter Zaitsev
2004-05-10 19:50                                   ` Ram Pai [this message]
2004-05-10 20:21                                     ` Andrew Morton
2004-05-10 22:39                                       ` Ram Pai
2004-05-10 23:07                                         ` Andrew Morton
2004-05-11 20:51                                           ` Ram Pai
2004-05-11 21:17                                             ` Andrew Morton
2004-05-13 20:41                                               ` Ram Pai
2004-05-17 17:30                                                 ` Random file I/O regressions in 2.6 [patch+results] Ram Pai
2004-05-20  1:06                                                   ` Alexey Kopytov
2004-05-20  1:31                                                     ` Ram Pai
2004-05-21 19:32                                                       ` Alexey Kopytov
2004-05-20  5:49                                                     ` Andrew Morton
2004-05-20 21:59                                                     ` Andrew Morton
2004-05-20 22:23                                                       ` Andrew Morton
2004-05-21  7:31                                                         ` Nick Piggin
2004-05-21  7:50                                                           ` Jens Axboe
2004-05-21  8:40                                                             ` Nick Piggin
2004-05-21  8:56                                                             ` Spam: " Andrew Morton
2004-05-21 22:24                                                               ` Alexey Kopytov
2004-05-21 21:13                                                       ` Alexey Kopytov
2004-05-26  4:43                                                         ` Alexey Kopytov
2004-05-11 22:26                                           ` Random file I/O regressions in 2.6 Bill Davidsen
2004-05-04  1:15                   ` Andrew Morton
2004-05-04 11:39                     ` Nick Piggin
2004-05-04  8:27                 ` Arjan van de Ven
2004-05-04  8:47                   ` Andrew Morton
2004-05-04  8:50                     ` Arjan van de Ven
