public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@osdl.org>
To: Steven Pratt <slpratt@austin.ibm.com>
Cc: linux-kernel@vger.kernel.org, Jens Axboe <axboe@suse.de>
Subject: Re: [PATCH/RFC] Simplified Readahead
Date: Mon, 27 Sep 2004 12:26:06 -0700	[thread overview]
Message-ID: <20040927122606.78feb424.akpm@osdl.org> (raw)
In-Reply-To: <4158342B.4020505@austin.ibm.com>

Steven Pratt <slpratt@austin.ibm.com> wrote:
>
> >yup.  POSIX_FADV_WILLNEED should just populate pagecache and should launch
>  >asynchronous I/O.
>  >
> 
>  Well then this could cause problems (other than congestion) on both the 
>  current and new code since both will effectivly see 2 reads, the second 
>  of which may appear to be a seek backwards thus confusing the code 
>  slightly.  Would it be best to just special case the POSIX_FADV_WILLNEED 
>  and issue the I/O required (via do_page_cache_readahead) without 
>  updating any of the window or current page offset  information ?

That's what we do at present.  do_page_cache_readahead() and
force_page_cache_readahead() are low-level functions which do not affect
file->ra_state.

Except whoops.  POSIX_FADV_WILLNEED is using force_page_cache_readahead(),
which bypasses the congested check.  Wonder how that happened.

<digs out the changeset>

ChangeSet 1.1046.589.14 2003/08/01 10:02:32 akpm@osdl.org
  [PATCH] rework readahead for congested queues
  
  Since Jens changed the block layer to fail readahead if the queue has no
  requests free, a few changes suggest themselves.
  
  - It's a bit silly to go and alocate a bunch of pages, build BIOs for them,
    submit the IO only to have it fail, forcing us to free the pages again.
  
    So the patch changes do_page_cache_readahead() to peek at the queue's
    read_congested state.  If the queue is read-congested we abandon the entire
    readahead up-front without doing all that work.
  
  - If the queue is not read-congested, we go ahead and do the readahead,
    after having set PF_READAHEAD.
  
    The backing_dev_info's read-congested threshold cuts in when 7/8ths of
    the queue's requests are in flight, so it is probable that the readahead
    abandonment code in __make_request will now almost never trigger.
  
  - The above changes make do_page_cache_readahead() "unreliable", in that it
    may do nothing at all.
  
    However there are some system calls:
  
  	- fadvise(POSIX_FADV_WILLNEED)
  	- madvise(MADV_WILLNEED)
  	- sys_readahead()
  
    In which the user has an expectation that the kernel will actually
    perform the IO.
  
    So the patch creates a new "force_page_cache_readahead()" which will
    perform the IO regardless of the queue's congestion state.
  
    Arguably, this is the wrong thing to do: even though the application
    requested readahead it could be that the kernel _should_ abandon the user's
    request because the disk is so busy.
  
    I don't know.  But for now, let's keep the above syscalls behaviour
    unchanged.  It is trivial to switch back to do_page_cache_readahead()
    later.


So there we have it.  The reason why normal readahead skips congested
queues is because the block layer will drop the I/O on the floor *anyway*
because it also skips congested queues for readahead I/O requests.

And fadvise() was switched to force_page_cache_readahead() because that was
the old behaviour.

But PF_READAHEAD isn't there any more, and BIO_RW_AHEAD and BIO_RW_BLOCK
are not used anywhere, so we broke that.  Jens, do you remember what
happened there?


  reply	other threads:[~2004-09-27 19:28 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-23 16:06 [PATCH/RFC] Simplified Readahead Steven Pratt
2004-09-23 22:14 ` Joel Schopp
2004-09-24  0:21 ` Nick Piggin
2004-09-24  2:42 ` Andrew Morton
2004-09-24 15:40   ` Steven Pratt
2004-09-24 16:16     ` Nick Piggin
2004-09-24 16:48       ` Steven Pratt
2004-09-24 22:05     ` Andrew Morton
2004-09-24 22:43       ` Steven Pratt
2004-09-24 23:01         ` Andrew Morton
2004-09-27 15:39           ` Steven Pratt
2004-09-27 19:26             ` Andrew Morton [this message]
2004-09-28 10:13               ` Jens Axboe
2004-09-24 22:55       ` Steven Pratt
2004-09-27 20:29         ` Ray Bryant
2004-09-27 21:04           ` Steven Pratt
2004-09-25  0:45       ` Nick Piggin
2004-09-25  1:01 ` Ram Pai
2004-09-25  6:07   ` Ram Pai
2004-09-27 15:30     ` Steven Pratt
2004-09-27 18:42       ` Ram Pai
2004-09-27 20:07         ` Steven Pratt
2004-09-29 18:46           ` Ram Pai
2004-09-29 22:33             ` Steven Pratt
2004-09-29 23:13               ` Andreas Dilger
2004-09-30  2:26                 ` Ram Pai
2004-09-30  5:29                   ` Andrew Morton
2004-09-30 20:20                 ` Stephen C. Tweedie
2004-09-30  1:12               ` Ram Pai
2004-10-01 21:02             ` Steven Pratt
2004-10-05 17:52               ` Ram Pai
     [not found] <372479081@toto.iv>
2004-09-24  5:00 ` Peter Chubb
2004-09-24 22:57   ` Steven Pratt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040927122606.78feb424.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=axboe@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=slpratt@austin.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox