From: Andrew Morton <akpm@osdl.org>
To: Steven Pratt <slpratt@austin.ibm.com>
Cc: linux-kernel@vger.kernel.org, Jens Axboe <axboe@suse.de>
Subject: Re: [PATCH/RFC] Simplified Readahead
Date: Mon, 27 Sep 2004 12:26:06 -0700 [thread overview]
Message-ID: <20040927122606.78feb424.akpm@osdl.org> (raw)
In-Reply-To: <4158342B.4020505@austin.ibm.com>
Steven Pratt <slpratt@austin.ibm.com> wrote:
>
> >yup. POSIX_FADV_WILLNEED should just populate pagecache and should launch
> >asynchronous I/O.
> >
>
> Well then this could cause problems (other than congestion) on both the
> current and new code since both will effectivly see 2 reads, the second
> of which may appear to be a seek backwards thus confusing the code
> slightly. Would it be best to just special case the POSIX_FADV_WILLNEED
> and issue the I/O required (via do_page_cache_readahead) without
> updating any of the window or current page offset information ?
That's what we do at present. do_page_cache_readahead() and
force_page_cache_readahead() are low-level functions which do not affect
file->ra_state.
Except whoops. POSIX_FADV_WILLNEED is using force_page_cache_readahead(),
which bypasses the congested check. Wonder how that happened.
<digs out the changeset>
ChangeSet 1.1046.589.14 2003/08/01 10:02:32 akpm@osdl.org
[PATCH] rework readahead for congested queues
Since Jens changed the block layer to fail readahead if the queue has no
requests free, a few changes suggest themselves.
- It's a bit silly to go and alocate a bunch of pages, build BIOs for them,
submit the IO only to have it fail, forcing us to free the pages again.
So the patch changes do_page_cache_readahead() to peek at the queue's
read_congested state. If the queue is read-congested we abandon the entire
readahead up-front without doing all that work.
- If the queue is not read-congested, we go ahead and do the readahead,
after having set PF_READAHEAD.
The backing_dev_info's read-congested threshold cuts in when 7/8ths of
the queue's requests are in flight, so it is probable that the readahead
abandonment code in __make_request will now almost never trigger.
- The above changes make do_page_cache_readahead() "unreliable", in that it
may do nothing at all.
However there are some system calls:
- fadvise(POSIX_FADV_WILLNEED)
- madvise(MADV_WILLNEED)
- sys_readahead()
In which the user has an expectation that the kernel will actually
perform the IO.
So the patch creates a new "force_page_cache_readahead()" which will
perform the IO regardless of the queue's congestion state.
Arguably, this is the wrong thing to do: even though the application
requested readahead it could be that the kernel _should_ abandon the user's
request because the disk is so busy.
I don't know. But for now, let's keep the above syscalls behaviour
unchanged. It is trivial to switch back to do_page_cache_readahead()
later.
So there we have it. The reason why normal readahead skips congested
queues is because the block layer will drop the I/O on the floor *anyway*
because it also skips congested queues for readahead I/O requests.
And fadvise() was switched to force_page_cache_readahead() because that was
the old behaviour.
But PF_READAHEAD isn't there any more, and BIO_RW_AHEAD and BIO_RW_BLOCK
are not used anywhere, so we broke that. Jens, do you remember what
happened there?
next prev parent reply other threads:[~2004-09-27 19:28 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-09-23 16:06 [PATCH/RFC] Simplified Readahead Steven Pratt
2004-09-23 22:14 ` Joel Schopp
2004-09-24 0:21 ` Nick Piggin
2004-09-24 2:42 ` Andrew Morton
2004-09-24 15:40 ` Steven Pratt
2004-09-24 16:16 ` Nick Piggin
2004-09-24 16:48 ` Steven Pratt
2004-09-24 22:05 ` Andrew Morton
2004-09-24 22:43 ` Steven Pratt
2004-09-24 23:01 ` Andrew Morton
2004-09-27 15:39 ` Steven Pratt
2004-09-27 19:26 ` Andrew Morton [this message]
2004-09-28 10:13 ` Jens Axboe
2004-09-24 22:55 ` Steven Pratt
2004-09-27 20:29 ` Ray Bryant
2004-09-27 21:04 ` Steven Pratt
2004-09-25 0:45 ` Nick Piggin
2004-09-25 1:01 ` Ram Pai
2004-09-25 6:07 ` Ram Pai
2004-09-27 15:30 ` Steven Pratt
2004-09-27 18:42 ` Ram Pai
2004-09-27 20:07 ` Steven Pratt
2004-09-29 18:46 ` Ram Pai
2004-09-29 22:33 ` Steven Pratt
2004-09-29 23:13 ` Andreas Dilger
2004-09-30 2:26 ` Ram Pai
2004-09-30 5:29 ` Andrew Morton
2004-09-30 20:20 ` Stephen C. Tweedie
2004-09-30 1:12 ` Ram Pai
2004-10-01 21:02 ` Steven Pratt
2004-10-05 17:52 ` Ram Pai
[not found] <372479081@toto.iv>
2004-09-24 5:00 ` Peter Chubb
2004-09-24 22:57 ` Steven Pratt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20040927122606.78feb424.akpm@osdl.org \
--to=akpm@osdl.org \
--cc=axboe@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=slpratt@austin.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.