From: Steven Pratt <slpratt@austin.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH/RFC] Simplified Readahead
Date: Fri, 24 Sep 2004 17:43:03 -0500
Message-ID: <4154A2F7.1050909@austin.ibm.com>
In-Reply-To: <20040924150523.4853465b.akpm@osdl.org>

Andrew Morton wrote:
>Steven Pratt <slpratt@austin.ibm.com> wrote:
>
>
>>>The advantage of the current page-at-a-time code is that the readahead code
>>>behaves exactly the same, whether the application is doing 256 4k reads or
>>>one 1M read. Plus it fits the old pagefault requirement.
>>>
>>>
>>>
>>Yes, but it accomplishes this by possibly making the 1M slower. And I
>>must admit that I don't know what the "old pagefault requirement" is.
>>Is that something we still need to worry about?
>>
>>
>
>The "old pagefault requirement": the code in there used to perform
>readaround at pagefault time as well as readahead at read() time. Hence it
>had to work well for single-page requests. That requirement isn't there
>any more but some of the code to support it is still there, perhaps.
>
>
>
>>>>1. pages already in cache
>>>>
>>>>
>>>>
>>>>
>>>Yes, we need to handle this. All pages in cache with lots of CPUs
>>>hammering the same file is a common case.
>>>
>>>Maybe not so significant on small x86, but on large power4 with a higher
>>>lock-versus-memcpy cost ratio, that extra locking will hurt.
>>>
>>>
>>>
>>Ok, we have some data from larger machines. I will collect it all and
>>summarize separately.
>>
>>
>
>SDET would be interesting, as well as explicit testing of lots of processes
>reading the same fully-cached file.
>
>
We don't have SDET, but we have been working on the case of multiple
processes reading the same file on a large machine (16-way POWER4 with
128GB). We had to apply some fdrcu patches to get past problems in
fget_light, which were causing 80% spin on the file_lock. We then end up
with about 45% spin on mapping->tree_lock. This is on vanilla rc2. I know
you changed that to a rw lock in your tree, and we need to try that as well.
The data is not consistent enough to draw any conclusions, but I don't see a
dramatic change from turning off readahead. I need to do more testing on
this to get better results.

In any case, I agree that we should deal with this case.

>>>>cache we should just immediately turn off readahead. What is this
>>>>trigger point? 4 I/Os in a row? 400?
>>>>
>>>>
>>>>
>>>Hard call.
>>>
>>>
>>>
>>I know, but we have to come up with something if we really want to avoid
>>the double lookup.
>>
>>
>
>As long as readahead gets fully disabled at some stage, we should be OK.
>
>
I am attaching a reworked patch which now shuts off readahead if 10M
(arbitrary value for now) of I/O comes from page cache in a row. Any
actual I/O will restart readahead.
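
To make the rule concrete, here is a minimal user-space sketch of the cutoff
described above. It is not the attached patch: the struct, the function
names, and the per-request accounting are all hypothetical, and only the 10M
threshold and the "any actual I/O restarts readahead" behavior come from this
mail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: 10MB of consecutive page-cache hits (arbitrary). */
#define RA_CACHE_HIT_LIMIT (10UL * 1024 * 1024)

/* Hypothetical per-file state; this is not the kernel's file_ra_state. */
struct ra_sketch {
    unsigned long cache_hit_bytes; /* bytes served from page cache in a row */
    bool enabled;                  /* is readahead currently active? */
};

static void ra_sketch_init(struct ra_sketch *ra)
{
    ra->cache_hit_bytes = 0;
    ra->enabled = true;
}

/*
 * Account for one request of 'bytes'. 'did_io' is true if any page in the
 * request had to be read from disk rather than found in the page cache.
 */
static void ra_sketch_account(struct ra_sketch *ra, size_t bytes, bool did_io)
{
    if (did_io) {
        /* Any actual I/O resets the streak and restarts readahead. */
        ra->cache_hit_bytes = 0;
        ra->enabled = true;
        return;
    }
    if (!ra->enabled)
        return; /* already off; stop counting so the counter cannot wrap */

    ra->cache_hit_bytes += bytes;
    if (ra->cache_hit_bytes >= RA_CACHE_HIT_LIMIT)
        ra->enabled = false;
}
```

With this accounting, ten consecutive fully cached 1M requests turn
readahead off, and the first request that misses the cache turns it back on.
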
>We should probably compare i_size with mapping->nrpages at open() time,
>too. No point in enabling readahead if it's all cached. But doing that
>would make performance testing harder, so do it later.
>
>
Ok. Sounds good.
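
For reference, the open()-time check could be as small as the sketch below.
This is only an illustration under stated assumptions: the function name is
made up, a 4k page size is assumed, and the two arguments stand in for
i_size and mapping->nrpages rather than reading the real kernel structures.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed 4k pages */

/*
 * Hypothetical helper: returns true if the number of cached pages covers
 * the whole file, in which case enabling readahead would be pointless.
 */
static bool file_fully_cached(unsigned long long i_size, unsigned long nrpages)
{
    /* Round the file size up to whole pages without risking overflow. */
    unsigned long long pages_needed = i_size / SKETCH_PAGE_SIZE +
                                      (i_size % SKETCH_PAGE_SIZE != 0);

    return nrpages >= pages_needed;
}
```
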
>
>
>>>I do think we should skip the I/O for POSIX_FADV_WILLNEED against a
>>>congested queue. I can't immediately think of a good reason for skipping
>>>the I/O for normal readahead.
>>>
>>>
>>>
>>>
>>Can you expand on POSIX_FADV_WILLNEED?
>>
>>
>
>It's an application-specified readahead hint. It should ideally be
>asynchronous so the application can get some I/O underway while it's
>crunching on something else. If the queue is congested then the
>application will accidentally block when launching the readahead, which
>kinda defeats the purpose.
>
>
Well, if the app really does this asynchronously, does it matter that we
block?
>Yes, the application will block when it does the subsequent read() anyway,
>but applications expect to block in read(). Seems saner this way.
>
Just to be sure I have this correct: the readahead code will be invoked
once for the POSIX_FADV_WILLNEED request (which looks mostly like a
regular read), and then again for the same pages on the real read?
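
The application pattern I have in mind looks roughly like the sketch below.
The helper name is hypothetical and the "other work" is left as a comment;
the only real interfaces used are open(), posix_fadvise(), read() and
close().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint that the first 'len' bytes will be needed,
 * then read them. Returns the number of bytes read, or -1 on error.
 */
static ssize_t hint_then_read(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /*
     * First pass through the readahead code. The hint is advisory, so a
     * failure (reported as an error number, not via errno) is ignored.
     */
    (void)posix_fadvise(fd, 0, (off_t)len, POSIX_FADV_WILLNEED);

    /* ... the application would crunch on something else here ... */

    /* Second pass over the same pages; blocking here is expected. */
    ssize_t n = read(fd, buf, len);

    close(fd);
    return n;
}
```
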
Steve