Re: [PATCH/RFC] Simplified Readahead

From: Steven Pratt <slpratt@austin.ibm.com>
To: linuxram@us.ibm.com
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH/RFC] Simplified Readahead
Date: Fri, 01 Oct 2004 16:02:10 -0500
Message-ID: <415DC5D2.8000405@austin.ibm.com>
In-Reply-To: <Pine.LNX.4.44.0409291113580.4449-600000@localhost.localdomain>

Ram Pai wrote:

snip...

>>>>>To summarize you noticed 3 problems:
>>>>>
>>>>>1. page cache hits not handled properly.
>>>>>2. readahead thrashing not accounted.
>>>>>3. read congestion not accounted.
>>>>>          
>>>>>
>
>
>I have enclosed 5 patches that address each of the issues.
>
>1 . Code is obtuse and hard to maintain
>
>	The best I could do is update the comments to reflect the
>	current code. Hopefully that should help. 
>	
>	attached patch 1_comment.patch takes care of that part to
>	some extent.
>
>
>2. page cache hits not handled properly.
>
>	I fixed this by decrementing the size of the next readahead window
>	by the number of pages hit in the page cache. Now it slowly
>	accommodates the page cache hits. 
>
>	attached patch 2_cachehits.patch takes care of this issue.
>
>3. queue congestion not handled.
>
>	The fix is: call force_page_cache_readahead() if we are 
>	populating pages in the current window,
>	and call do_page_cache_readahead() if we are populating
>	pages in the ahead window. However, if do_page_cache_readahead()
>	returns with congestion, the readahead window is collapsed back 
>	to size zero. This ensures that populating the ahead window
>	is attempted again the next time.
>
>	attached patch 3_queuecongestion.patch handles this issue.
>
>4. page thrash handled ineffectively.
>
>	The fix is: on page thrash detection, shut down readahead.
>
>	attached patch 4_pagethrash.patch handles this issue.
>
>5. slow read path is too slow.
>
>	I could not figure out a way to at-least-read-the-requested-
>	number-of-pages if readahead is shut down, without adding
>	the readsize parameter to page_cache_readahead(). So I had
>	to pick some of your code in filemap.c to do that. Thanks!
>	
>	attached patch 5_fixedslowread.patch handles this issue.
>
>
>Apart from this you have noticed other issues
>
>6.  cache lookup done unnecessarily twice for pagecache_hits.
>
>	I have not handled this issue currently. But it should be doable
>	if I introduce a flag that notes when readahead is
>	shut down by pagecache hits, and hence attempts to look up
>	the page only once.
>	
>
>And you have other features in your patch which will be the real
>differentiating factors.
>
>7.  exponential expand and shrink of window sizes.
>
>8.  overlapped read of current window and ahead window. 
>
>	( I think both are desirable features )
>
>I did run some preliminary tests using your patch and the above patches
>and found 
>
>your patch was doing slightly better on iozone and sysbench.
>however the above patches were doing slightly better with DSS workload.
>  
>
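
To make the quoted items 2 to 4 concrete, here is a toy model of the window behaviour they describe. It is a sketch in Python, not kernel code: the class, its method names, and the 32-page default are all hypothetical and only illustrate the policy as worded in the quoted list.

```python
class ToyReadahead:
    """Toy model of quoted fixes 2-4; every name here is hypothetical."""

    def __init__(self, max_pages=32):
        self.max_pages = max_pages    # assumed upper bound, in pages
        self.next_size = max_pages    # size of the next readahead window
        self.ahead_size = max_pages   # size of the ahead window
        self.enabled = True

    def note_cache_hits(self, hits):
        # Fix 2: shrink the next window by the number of pages that were
        # already found in the page cache, never going below zero.
        self.next_size = max(0, self.next_size - hits)

    def note_ahead_congestion(self):
        # Fix 3: if filling the ahead window ran into queue congestion,
        # collapse it to zero so that filling it is attempted again later.
        self.ahead_size = 0

    def note_thrash(self):
        # Fix 4: on thrash detection, shut readahead down.
        self.enabled = False


ra = ToyReadahead(max_pages=32)
ra.note_cache_hits(8)
print(ra.next_size)  # 24
```

The model deliberately leaves out how a collapsed or shut-down window grows back, since the quoted text does not say.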

OK, I have re-run the Tiobench tests.  On a single-CPU, IDE-based system 
your new patches have no noticeable effect on sequential read performance 
(a good thing), but on random I/O things went bad :-(.

Here are the random read results for 16k I/O with a 4GB fileset on 256MB 
of memory, single-CPU IDE:

               Stock      w/ patches

  Threads      MBs/sec      MBs/sec    %diff         diff  
---------- ------------ ------------ -------- ------------ 
         1         1.73         1.72    -0.58        -0.01  
         4         1.70         1.56    -8.24        -0.14  
        16         1.66         0.81   -51.20        -0.85  
        64         1.49         0.68   -54.36        -0.81 

As you can see, somewhere between 4 and 16 threads the new patches cause performance to tank.  
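
For reference, the diff and %diff columns are plain arithmetic on the two throughput columns. A minimal Python sketch that recomputes them for the 16k IDE table; the function names are made up for illustration, and the rounding to two decimals is assumed from the table's format.

```python
def diff(stock, patched):
    # Absolute change in MB/s, rounded as in the table.
    return round(patched - stock, 2)


def pct_diff(stock, patched):
    # Change relative to the stock kernel, in percent.
    return round((patched - stock) / stock * 100, 2)


# (threads, stock MB/s, patched MB/s) copied from the 16k IDE table.
rows = [(1, 1.73, 1.72), (4, 1.70, 1.56), (16, 1.66, 0.81), (64, 1.49, 0.68)]

for threads, stock, patched in rows:
    print(f"{threads:>10} {stock:>12.2f} {patched:>12.2f} "
          f"{pct_diff(stock, patched):>8.2f} {diff(stock, patched):>12.2f}")
```

A negative %diff means the patched kernel is slower than stock.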

With 512k I/Os the problem kicks in somewhere between 1 and 4 threads.

               Stock      w/ patches
  Threads      MBs/sec      MBs/sec    %diff         diff  
---------- ------------ ------------ -------- ------------ 
         1        18.50        18.55     0.27         0.05 
         4         8.55         6.59   -22.92        -1.96  
        16         8.40         5.18   -38.33        -3.22 
        64         7.34         4.76   -35.15        -2.58 


Unfortunately, this is the _good_ news.  The bad news is that this is much worse on SCSI.
We lose a few percent on sequential reads for all block sizes, and random is just totally screwed.

Here is the same 16k I/O request size with a 4GB fileset and 1GB of memory on an 8-way system with a single SCSI disk.

               Stock      w/ patches
  Threads      MBs/sec      MBs/sec    %diff         diff  
---------- ------------ ------------ -------- ------------ 
         1         3.43         3.03   -11.66        -0.40   
         4         4.51         1.06   -76.50        -3.45 
        16         5.86         1.43   -75.60        -4.43   
        64         6.13         1.66   -72.92        -4.47 

That is a 12% degradation even with 1 thread, and roughly 75% with 4 threads and above!  This is horribly broken. 


Steve
