From: Steven Pratt <slpratt@austin.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH/RFC] Simplified Readahead
Date: Fri, 24 Sep 2004 17:43:03 -0500
Message-ID: <4154A2F7.1050909@austin.ibm.com>
In-Reply-To: <20040924150523.4853465b.akpm@osdl.org>
Andrew Morton wrote:
>Steven Pratt <slpratt@austin.ibm.com> wrote:
>
>
>>>The advantage of the current page-at-a-time code is that the readahead code
>>>behaves exactly the same, whether the application is doing 256 4k reads or
>>>one 1M read. Plus it fits the old pagefault requirement.
>>>
>>>
>>>
>>Yes, but it accomplishes this by possibly making the 1M read slower. And I
>>must admit that I don't know what the "old pagefault requirement" is.
>>Is that something we still need to worry about?
>>
>>
>
>The "old pagefault requirement": the code in there used to perform
>readaround at pagefault time as well as readahead at read() time. Hence it
>had to work well for single-page requests. That requirement isn't there
>any more but some of the code to support it is still there, perhaps.
>
>
>
>>>>1. pages already in cache
>>>>
>>>>
>>>>
>>>>
>>>Yes, we need to handle this. All pages in cache with lots of CPUs
>>>hammering the same file is a common case.
>>>
>>>Maybe not so significant on small x86, but on large power4 with a higher
>>>lock-versus-memcpy cost ratio, that extra locking will hurt.
>>>
>>>
>>>
>>Ok, we have some data from larger machines. I will collect it all and
>>summarize separately.
>>
>>
>
>SDET would be interesting, as well as explicit testing of lots of processes
>reading the same fully-cached file.
>
>
We don't have SDET, but we have been working on the multiple-processes-
reading-the-same-file case on a large (16-way POWER4 with 128GB) machine.
We had to apply some fdrcu patches to get past problems in fget_light,
which were causing 80% spin time on the file_lock. We then end up with
about 45% spin time on mapping->tree_lock. This is on vanilla rc2. I know
you changed that to a rwlock in your tree, and we need to try that as
well. The data is not consistent enough to draw any conclusions, but I
don't see a dramatic change from turning off readahead. I need to do more
testing on this to get better results.
In any case, I agree that we should handle this.
>>>>cache we should just immediately turn off readahead. What is this
>>>>trigger point? 4 I/Os in a row? 400?
>>>>
>>>>
>>>>
>>>Hard call.
>>>
>>>
>>>
>>I know, but we have to come up with something if we really want to avoid
>>the double lookup.
>>
>>
>
>As long as readahead gets fully disabled at some stage, we should be OK.
>
>
I am attaching a reworked patch which now shuts off readahead once 10MB
(an arbitrary value for now) of consecutive I/O has been satisfied from
the page cache. Any actual I/O will restart readahead.
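The core of the logic is roughly the following (a simplified sketch, not
the patch itself; cache_hit_bytes, default_ra_pages and ra_account_page
are illustrative names, assuming a byte counter added to struct
file_ra_state):

/*
 * Sketch only -- illustrative names, not the actual patch.
 * Assumes a cache_hit_bytes counter added to struct file_ra_state.
 * Called once per page of a read request.
 */
#define RA_CACHE_HIT_MAX	(10 * 1024 * 1024)	/* 10MB, arbitrary */

static void ra_account_page(struct file_ra_state *ra, int page_was_cached)
{
	if (page_was_cached) {
		ra->cache_hit_bytes += PAGE_CACHE_SIZE;
		if (ra->cache_hit_bytes >= RA_CACHE_HIT_MAX)
			ra->ra_pages = 0;		/* readahead off */
	} else {
		/* real I/O happened: reset and re-enable readahead */
		ra->cache_hit_bytes = 0;
		if (ra->ra_pages == 0)
			ra->ra_pages = default_ra_pages; /* illustrative */
	}
}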
>We should probably compare i_size with mapping->nrpages at open() time,
>too. No point in enabling readahead if it's all cached. But doing that
>would make performance testing harder, so do it later.
>
>
Ok. Sounds good.
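Just so we're on the same page, I assume you mean something along these
lines at open() time (untested sketch):

/* Untested sketch: if the whole file already appears to be in the
 * page cache at open() time, don't bother enabling readahead.
 * Note nrpages matching the file size is only a heuristic -- the
 * cached pages aren't necessarily the ones a reader will want. */
loff_t isize = i_size_read(inode);
pgoff_t end_index = (isize + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;

if (mapping->nrpages >= end_index)
	file->f_ra.ra_pages = 0;	/* fully cached: readahead off */

As you say, that would skew any benchmark that primes the cache first,
so it can wait.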
>
>
>>>I do think we should skip the I/O for POSIX_FADV_WILLNEED against a
>>>congested queue. I can't immediately think of a good reason for skipping
>>>the I/O for normal readahead.
>>>
>>>
>>>
>>>
>>Can you expand on the POSIX_FADV_WILLNEED.
>>
>>
>
>It's an application-specified readahead hint. It should ideally be
>asynchronous so the application can get some I/O underway while it's
>crunching on something else. If the queue is congested then the
>application will accidentally block when launching the readahead, which
>kinda defeats the purpose.
>
>
Well, if the app really does this asynchronously, does it matter that we
block?
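For reference, the pattern in question is something like this from user
space (sketch only; next_off, CHUNK_SIZE and process_chunk() are
illustrative):

#include <fcntl.h>

/* Overlap I/O with computation: hint the kernel to start reading
 * the next chunk, then crunch on the current one. */
posix_fadvise(fd, next_off, CHUNK_SIZE, POSIX_FADV_WILLNEED);
process_chunk(buf);	/* compute while the readahead I/O proceeds */

If the WILLNEED call itself blocks on a congested queue, that overlap is
lost, which I take to be your point. But an app could also issue the
hint from a separate thread.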
>Yes, the application will block when it does the subsequent read() anyway,
>but applications expect to block in read(). Seems saner this way.
>
Just to be sure I have this correct: the readahead code will be invoked
once on the POSIX_FADV_WILLNEED request, which looks mostly like a
regular read to it, and then invoked again for the same pages on the
real read?
Steve