public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix readahead breakage for sequential after random reads
@ 2004-07-18 22:30 Miklos Szeredi
  2004-07-26 23:29 ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2004-07-18 22:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Hi Andrew,

Current readahead logic is broken when a random read pattern is
followed by a long sequential read.  The cause is that on a window
miss ra->next_size is set to ra->average, but ra->average is only
updated at the end of a sequence, so window size will remain 1 until
the end of the sequential read.

This patch fixes this by taking the current sequence length into
account (code taken from towards end of page_cache_readahead()), and
also setting ra->average to a decent value in handle_ra_miss() when
sequential access is detected.

Please apply!

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>

==============================================================================
--- linux-2.6.8-rc2/mm/readahead.c.orig	2004-06-16 07:18:57.000000000 +0200
+++ linux-2.6.8-rc2/mm/readahead.c	2004-07-18 23:52:31.000000000 +0200
@@ -470,7 +470,11 @@ do_io:
 			  * pages shall be accessed in the next
 			  * current window.
 			  */
-			ra->next_size = min(ra->average , (unsigned long)max);
+			average = ra->average;
+			if (ra->serial_cnt > average)
+				average = (ra->serial_cnt + ra->average + 1) / 2;
+
+			ra->next_size = min(average , (unsigned long)max);
 		}
 		ra->start = offset;
 		ra->size = ra->next_size;
@@ -552,6 +556,7 @@ void handle_ra_miss(struct address_space
 				ra->size = max;
 				ra->ahead_start = 0;
 				ra->ahead_size = 0;
+				ra->average = max / 2;
 			}
 		}
 		ra->prev_page = offset;




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix readahead breakage for sequential after random reads
  2004-07-18 22:30 [PATCH] fix readahead breakage for sequential after random reads Miklos Szeredi
@ 2004-07-26 23:29 ` Andrew Morton
  2004-07-26 23:56   ` Ram Pai
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2004-07-26 23:29 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, Ram Pai

Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> Current readahead logic is broken when a random read pattern is
>  followed by a long sequential read.  The cause is that on a window
>  miss ra->next_size is set to ra->average, but ra->average is only
>  updated at the end of a sequence, so window size will remain 1 until
>  the end of the sequential read.
> 
>  This patch fixes this by taking the current sequence length into
>  account (code taken from towards end of page_cache_readahead()), and
>  also setting ra->average to a decent value in handle_ra_miss() when
>  sequential access is detected.

Thanks.   Do you have any performance testing results from this patch?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix readahead breakage for sequential after random reads
  2004-07-26 23:29 ` Andrew Morton
@ 2004-07-26 23:56   ` Ram Pai
  2004-07-27  0:08     ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Ram Pai @ 2004-07-26 23:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miklos Szeredi, linux-kernel

Andrew,
	Yes the patch fixes a valid bug.

	
RP

On Mon, 2004-07-26 at 16:29, Andrew Morton wrote:
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > Current readahead logic is broken when a random read pattern is
> >  followed by a long sequential read.  The cause is that on a window
> >  miss ra->next_size is set to ra->average, but ra->average is only
> >  updated at the end of a sequence, so window size will remain 1 until
> >  the end of the sequential read.
> > 
> >  This patch fixes this by taking the current sequence length into
> >  account (code taken from towards end of page_cache_readahead()), and
> >  also setting ra->average to a decent value in handle_ra_miss() when
> >  sequential access is detected.
> 
> Thanks.   Do you have any performance testing results from this patch?
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix readahead breakage for sequential after random reads
  2004-07-26 23:56   ` Ram Pai
@ 2004-07-27  0:08     ` Andrew Morton
  2004-07-27  4:18       ` Ram Pai
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2004-07-27  0:08 UTC (permalink / raw)
  To: Ram Pai; +Cc: miklos, linux-kernel

Ram Pai <linuxram@us.ibm.com> wrote:
>
> Andrew,
> 	Yes the patch fixes a valid bug.
> 

Please don't top-post :(

> RP
> 
> On Mon, 2004-07-26 at 16:29, Andrew Morton wrote:
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > Current readahead logic is broken when a random read pattern is
> > >  followed by a long sequential read.  The cause is that on a window
> > >  miss ra->next_size is set to ra->average, but ra->average is only
> > >  updated at the end of a sequence, so window size will remain 1 until
> > >  the end of the sequential read.
> > > 
> > >  This patch fixes this by taking the current sequence length into
> > >  account (code taken from towards end of page_cache_readahead()), and
> > >  also setting ra->average to a decent value in handle_ra_miss() when
> > >  sequential access is detected.
> > 
> > Thanks.   Do you have any performance testing results from this patch?
> > 
> Ram Pai <linuxram@us.ibm.com> wrote:
>
> Andrew,
> 	Yes the patch fixes a valid bug.

Fine, but the readahead code is performance-sensitive, and it takes quite
some time for any regressions to be discovered.  So I'm going to need to
either sit on this patch for a very long time, or extensively test it
myself, or await convincing test results from someone else.

Can you help with that?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix readahead breakage for sequential after random reads
  2004-07-27  0:08     ` Andrew Morton
@ 2004-07-27  4:18       ` Ram Pai
  2004-07-27  7:40         ` Miklos Szeredi
  2004-08-04 17:38         ` Ram Pai
  0 siblings, 2 replies; 9+ messages in thread
From: Ram Pai @ 2004-07-27  4:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: miklos, linux-kernel

On Mon, 2004-07-26 at 17:08, Andrew Morton wrote:
> Ram Pai <linuxram@us.ibm.com> wrote:
> >
> > Andrew,
> > 	Yes the patch fixes a valid bug.
> > 
> 
> Please don't top-post :(
> > RP
> > 
> > On Mon, 2004-07-26 at 16:29, Andrew Morton wrote:
> > > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > Current readahead logic is broken when a random read pattern is
> > > >  followed by a long sequential read.  The cause is that on a window
> > > >  miss ra->next_size is set to ra->average, but ra->average is only
> > > >  updated at the end of a sequence, so window size will remain 1 until
> > > >  the end of the sequential read.
> > > > 
> > > >  This patch fixes this by taking the current sequence length into
> > > >  account (code taken from towards end of page_cache_readahead()), and
> > > >  also setting ra->average to a decent value in handle_ra_miss() when
> > > >  sequential access is detected.
> > > 
> > > Thanks.   Do you have any performance testing results from this patch?
> > > 
> > Ram Pai <linuxram@us.ibm.com> wrote:
> >
> > Andrew,
> > 	Yes the patch fixes a valid bug.
> 
> Fine, but the readahead code is performance-sensitive, and it takes quite
> some time for any regressions to be discovered.  So I'm going to need to
> either sit on this patch for a very long time, or extensively test it
> myself, or await convincing test results from someone else.
> 
> Can you help with that?

yes I will run all my standard testsuites before we take this patch.
(DSS workload, iozone, sysbench). I will get back with some results
sooon. Probably by the end of this week.

Also I think the bug that Miklos, found is really hard to reproduce. Did
he find this bug by code inspection? Its really really hard to get into
a state where the current window is of size 1 page with zero pages in
the readahead window, and then the sequential read pattern to just right
then. 

RP

> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix readahead breakage for sequential after random reads
  2004-07-27  4:18       ` Ram Pai
@ 2004-07-27  7:40         ` Miklos Szeredi
  2004-07-27 15:34           ` Ram Pai
  2004-08-04 17:38         ` Ram Pai
  1 sibling, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2004-07-27  7:40 UTC (permalink / raw)
  To: linuxram; +Cc: akpm, linux-kernel


Ram Pai <linuxram@us.ibm.com> wrote:
>
> Also I think the bug that Miklos, found is really hard to reproduce. Did
> he find this bug by code inspection? Its really really hard to get into
> a state where the current window is of size 1 page with zero pages in
> the readahead window, and then the sequential read pattern to just right
> then. 

I found it by accident. I did my testing with the following read
sequence:

page offset  num pages
7            1
0            1
35           1
23           1
42           1
33           1
29           1
100          200

I did not measure actual performance, but only looked at the length of
the page vector passed to my filesystem's readpages() method.

Miklos

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix readahead breakage for sequential after random reads
  2004-07-27  7:40         ` Miklos Szeredi
@ 2004-07-27 15:34           ` Ram Pai
  0 siblings, 0 replies; 9+ messages in thread
From: Ram Pai @ 2004-07-27 15:34 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel

On Tue, 2004-07-27 at 00:40, Miklos Szeredi wrote:
> Ram Pai <linuxram@us.ibm.com> wrote:
> >
> > Also I think the bug that Miklos, found is really hard to reproduce. Did
> > he find this bug by code inspection? Its really really hard to get into
> > a state where the current window is of size 1 page with zero pages in
> > the readahead window, and then the sequential read pattern to just right
> > then. 
> 
> I found it by accident. I did my testing with the following read
> sequence:
> 
> page offset  num pages
> 7            1
> 0            1
> 35           1
> 23           1
> 42           1
> 33           1
> 29           1
> 100          200
> 
> I did not measure actual performance, but only looked at the length of
> the page vector passed to my filesystem's readpages() method.

right. this pattern is just about right to get into that bad state. Had
you had some more random reads then readahead algorithm will go into the
slow-read mode(readahead-off mode) and you will not bump into this bug.

RP

> 
> Miklos
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix readahead breakage for sequential after random reads
  2004-07-27  4:18       ` Ram Pai
  2004-07-27  7:40         ` Miklos Szeredi
@ 2004-08-04 17:38         ` Ram Pai
  2004-08-04 19:39           ` Shane Shrybman
  1 sibling, 1 reply; 9+ messages in thread
From: Ram Pai @ 2004-08-04 17:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: miklos, linux-kernel, shrybman

[-- Attachment #1: Type: text/plain, Size: 2323 bytes --]

On Mon, 2004-07-26 at 21:18, Ram Pai wrote:
> On Mon, 2004-07-26 at 17:08, Andrew Morton wrote:
> > Ram Pai <linuxram@us.ibm.com> wrote:
> > >
> > > Andrew,
> > > 	Yes the patch fixes a valid bug.
> > > 
> > 
> > Please don't top-post :(
> > > RP
> > > 
> > > On Mon, 2004-07-26 at 16:29, Andrew Morton wrote:
> > > > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > >
> > > > > Current readahead logic is broken when a random read pattern is
> > > > >  followed by a long sequential read.  The cause is that on a window
> > > > >  miss ra->next_size is set to ra->average, but ra->average is only
> > > > >  updated at the end of a sequence, so window size will remain 1 until
> > > > >  the end of the sequential read.
> > > > > 
> > > > >  This patch fixes this by taking the current sequence length into
> > > > >  account (code taken from towards end of page_cache_readahead()), and
> > > > >  also setting ra->average to a decent value in handle_ra_miss() when
> > > > >  sequential access is detected.
> > > > 
> > > > Thanks.   Do you have any performance testing results from this patch?
> > > > 
> > > Ram Pai <linuxram@us.ibm.com> wrote:
> > >
> > > Andrew,
> > > 	Yes the patch fixes a valid bug.
> > 
> > Fine, but the readahead code is performance-sensitive, and it takes quite
> > some time for any regressions to be discovered.  So I'm going to need to
> > either sit on this patch for a very long time, or extensively test it
> > myself, or await convincing test results from someone else.
> > 
> > Can you help with that?
> 
> yes I will run all my standard testsuites before we take this patch.
> (DSS workload, iozone, sysbench). I will get back with some results
> sooon. Probably by the end of this week.

Ok I have enclosed the results. The summary is: there is no significant
improvement or decrease in performance of (DSS workload, iozone,
sysbench) The increase or decrease is in the margin of errors.

I have also enclosed a patch that partially backs off Miklos's fix. 
Shane Shrybman correctly pointed out that the real fix is to set
ra->average value to max/2 when we move from readahead-off mode to
readahead-on mode. The other part of Miklos's fix becomes irrelevent.

 
Sorry it took some time to get back on this. Its almost automated so
turnaround time should be quick now-on-wards.

RP

[-- Attachment #2: miklos_partial_backoff.patch --]
[-- Type: text/plain, Size: 531 bytes --]

--- linux-2.6.8-rc3/mm/readahead.c	2004-08-03 14:26:46.000000000 -0700
+++ mypatchdir/mm/readahead.c	2004-08-04 17:08:47.665196424 -0700
@@ -468,11 +468,7 @@ do_io:
 			  * pages shall be accessed in the next
 			  * current window.
 			  */
-			average = ra->average;
-			if (ra->serial_cnt > average)
-				average = (ra->serial_cnt + ra->average + 1) / 2;
-
-			ra->next_size = min(average , (unsigned long)max);
+			ra->next_size = min(ra->average , (unsigned long)max);
 		}
 		ra->start = offset;
 		ra->size = ra->next_size;

[-- Attachment #3: report --]
[-- Type: text/plain, Size: 4049 bytes --]

			DSS WORKLOAD
			----------------

---------------------------------------------------------------------
|   2.6.8-rc2	|	miklos_patch	|	miklos_mod_patch     |
|               |                       |                            |
----------------------------------------------------------------------
|               |                       |                            |
|     X		|	0.9001% increase|	0.73649% increase    |
|		| 	w.r.t 	X	|  	w.r.t	X	     |
|               |                       |                            |
---------------------------------------------------------------------


			iozone 
			-------
		iozone -c -t1 -s 4096m -r 128k -w
--------------------------------------------------------------------
|	      |	2.6.8-rc2	| miklos_patch	|miklos_mod_patch  | 
---------------------------------------------------------------------
|	      |                 |               |                  |
|Sequential   | 29773.95KB/sec  | 29777.38KB/sec|  29968.74KB/sec  |
|	      |                 |               |                  |
|reverse read | 17905.65KB/sec  | 17897.85KB/sec|  17726.55KB/sec  |
|	      |                 |               |                  |
|stride read  | 19927.43KB/sec  | 19778.90KB/sec|  19625.75KB/sec  |
|	      |                 |               |                  |
|random read  | 18454.49KB/sec  | 18384.28KB/sec|  18666.47KB/sec  |
|	      |                 |               |                  |
|random mix   | 19434.58KB/sec  | 19480.43KB/sec|  18755.05KB/sec  |
|	      |                 |               |                  |
|pread	      | 29763.56KB/sec  | 29780.05KB/sec|  29961.16KB/sec  |
|	      |                 |               |                  |
--------------------------------------------------------------------



			iozone NFS
			-------
		iozone -c -t1 -s 4096m -r 128k -w

---------------------------------------------------------------------------
|		| 2.6.8-rc2	|	miklos_patch	| miklos_mod_patch|
|---------------|---------------|-----------------------|-----------------|
|               |               |                       |           	  |
|Sequential	| 2272.77KB/sec |	2266.45KB/sec 	|   2269.98KB/sec |
|               |               |                       |           	  |
|reverse read	| 3078.14KB/sec |	3066.73KB/sec 	|   3079.87KB/sec |
|               |               |                       |           	  |
|stride read	| 3253.49KB/sec |	3220.70KB/sec 	|   3264.41KB/sec |
|               |               |                       |           	  |
|random read	| 5225.44KB/sec |	5231.68KB/sec 	|   5274.32KB/sec |
|               |               |                       |           	  |
|random mix	| 8389.64KB/sec |	8458.40KB/sec 	|   7722.72KB/sec |
|               |               |                       |           	  |
|pread		| 2264.64KB/sec |	2264.35KB/sec 	|   2256.15KB/sec |
--------------------------------------------------------------------------




			sysbench
			--------

sysbench --num-threads=256 --test=fileio --file-total-size=2800M --file-test-mode=rndrw run

-------------------------------------------------------------------------
|		|	2.6.8-rc2 	| miklos_patch	|miklos_mod_patch|
-------------------------------------------------------------------------
|               |                       |               |                |
| Throughput	|	2.745MB/sec     | 2.721MB/sec	| 2.607MB/sec    |
|---------------|-----------------------|---------------|----------------|
|               |                       |               |                |
|               |                       |               |                |
|Requests/sec	|	175.99		| 174.40MB/sec	| 167.08MB/sec   |
|---------------|-----------------------|---------------|----------------|
|               |                       |               |                |
|               |                       |               |                |
|Execution time	|	56.8213s	| 57.3400s	| 59.8498s       |
--------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix readahead breakage for sequential after random reads
  2004-08-04 17:38         ` Ram Pai
@ 2004-08-04 19:39           ` Shane Shrybman
  0 siblings, 0 replies; 9+ messages in thread
From: Shane Shrybman @ 2004-08-04 19:39 UTC (permalink / raw)
  To: Ram Pai; +Cc: Andrew Morton, miklos, linux-kernel

On Wed, 2004-08-04 at 13:38, Ram Pai wrote:
> On Mon, 2004-07-26 at 21:18, Ram Pai wrote:
> > On Mon, 2004-07-26 at 17:08, Andrew Morton wrote:
> > > Ram Pai <linuxram@us.ibm.com> wrote:
> > > >
> > > > Andrew,
> > > > 	Yes the patch fixes a valid bug.
> > > > 
> > > 
> > > Please don't top-post :(
> > > > RP
> > > > 
> > > > On Mon, 2004-07-26 at 16:29, Andrew Morton wrote:
> > > > > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > > >
> > > > > > Current readahead logic is broken when a random read pattern is
> > > > > >  followed by a long sequential read.  The cause is that on a window
> > > > > >  miss ra->next_size is set to ra->average, but ra->average is only
> > > > > >  updated at the end of a sequence, so window size will remain 1 until
> > > > > >  the end of the sequential read.
> > > > > > 
> > > > > >  This patch fixes this by taking the current sequence length into
> > > > > >  account (code taken from towards end of page_cache_readahead()), and
> > > > > >  also setting ra->average to a decent value in handle_ra_miss() when
> > > > > >  sequential access is detected.
> > > > > 
> > > > > Thanks.   Do you have any performance testing results from this patch?
> > > > > 
> > > > Ram Pai <linuxram@us.ibm.com> wrote:
> > > >
> > > > Andrew,
> > > > 	Yes the patch fixes a valid bug.
> > > 
> > > Fine, but the readahead code is performance-sensitive, and it takes quite
> > > some time for any regressions to be discovered.  So I'm going to need to
> > > either sit on this patch for a very long time, or extensively test it
> > > myself, or await convincing test results from someone else.
> > > 
> > > Can you help with that?
> > 
> > yes I will run all my standard testsuites before we take this patch.
> > (DSS workload, iozone, sysbench). I will get back with some results
> > sooon. Probably by the end of this week.
> 
> Ok I have enclosed the results. The summary is: there is no significant
> improvement or decrease in performance of (DSS workload, iozone,
> sysbench) The increase or decrease is in the margin of errors.
> 
> I have also enclosed a patch that partially backs off Miklos's fix. 
> Shane Shrybman correctly pointed out that the real fix is to set
> ra->average value to max/2 when we move from readahead-off mode to
> readahead-on mode. The other part of Miklos's fix becomes irrelevent.
> 

The patch looks fine to me.
>  
> Sorry it took some time to get back on this. Its almost automated so
> turnaround time should be quick now-on-wards.
> 

For these types of bugs the difference is not in the IO performance
numbers but in the processing overhead. Does it make sense to track
processor statistics (sys/user/io-wait...) during the benchmarks. So we
could say that this patch doesn't change IO performance but uses 1% less
system time?

Maybe use /usr/bin/time -v sysbench ...

> RP

Regards,

Shane


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-08-04 19:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-18 22:30 [PATCH] fix readahead breakage for sequential after random reads Miklos Szeredi
2004-07-26 23:29 ` Andrew Morton
2004-07-26 23:56   ` Ram Pai
2004-07-27  0:08     ` Andrew Morton
2004-07-27  4:18       ` Ram Pai
2004-07-27  7:40         ` Miklos Szeredi
2004-07-27 15:34           ` Ram Pai
2004-08-04 17:38         ` Ram Pai
2004-08-04 19:39           ` Shane Shrybman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox