From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"npiggin@suse.de" <npiggin@suse.de>,
"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
"yinghan@google.com" <yinghan@google.com>
Subject: Re: [PATCH 9/9] readahead: record mmap read-around states in file_ra_state
Date: Sat, 11 Apr 2009 12:24:52 +0800
Message-ID: <20090411042452.GB6613@localhost>
In-Reply-To: <20090410163853.0e1b8f7c.akpm@linux-foundation.org>
On Sat, Apr 11, 2009 at 07:38:53AM +0800, Andrew Morton wrote:
> On Fri, 10 Apr 2009 14:10:06 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > Mmap read-around now shares the same code style and data structure
> > with readahead code.
> >
> > This also removes do_page_cache_readahead().
> > Its last user, mmap read-around, has been changed to call ra_submit().
> >
> > The no-readahead-if-congested logic is dumped by the way.
> > Users will be pretty sensitive about the slow loading of executables.
> > So it's unfavorable to disable mmap read-around on a congested queue.
>
> Did you verify that the read-congested code ever triggers?
No.
Sorry, the case I described is an imagined one. There could be other
counter-cases, however.
> It used to be (and probably still is) the case that
> bdi_read_congested() is very very rare, because the read queue is long
> and the kernel rarely puts many read requests into it. You can of
> course create this condition with a fake workload with many
> threads/processes, but it _is_ fake.
The major workloads that could trigger read congestion:
1) file servers running highly concurrent IO streams
2) concurrent sys_readahead()s on a desktop, for fast booting
3) mysql is also able to issue concurrent sys_readahead()s
4) more workloads beyond my imagination
For 1) the change is favorable or irrelevant.
For 2) and 3) the change is irrelevant. The user space readahead
process must make sure the degree of parallelism is kept under
control, so that read congestion _never_ happens. Otherwise normal
reads will be blocked.
> Some real-world workloads (databases?) will of course trigger
> bdi_read_congested(). But they're usually doing fixed-sized reads, and
> if we're doing _any_ readahead/readaround in that case, readahead is
> busted.
Hmm, I didn't have this possibility in mind: the read congestion is
caused exactly by a pool of concurrent mmap readers _themselves_.
Let's assume they are doing N-page sized _sparse_ random reads
(otherwise this patch will be favorable).
The current mmap_miss accounting works so that if N >= 2, the
readaround will stay enabled forever; if N == 1, the readaround will
be disabled quickly. The designer of this logic must be a master!
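To make the N >= 2 versus N == 1 claim concrete, here is a toy model of
the accounting. It is a sketch under stated assumptions, not the kernel
code: the function names are made up for illustration, the threshold
value 100 is assumed to match MMAP_LOTSAMISS in mm/filemap.c, and it
assumes that every page after the first one of a read is already cached
by the read-around the first fault triggered.

```c
#include <assert.h>

/* Assumed threshold; mm/filemap.c has a constant of this name. */
#define MMAP_LOTSAMISS 100

/*
 * Toy model, not kernel code: replay nr_reads sparse random reads of
 * pages_per_read pages each and return the final miss counter.
 */
static int simulate_mmap_miss(int pages_per_read, int nr_reads)
{
	int mmap_miss = 0;

	assert(pages_per_read >= 1 && nr_reads >= 0);

	for (int i = 0; i < nr_reads; i++) {
		/* The first page of a sparse random read is never cached. */
		mmap_miss++;
		/* Read-around is issued only while the counter is low. */
		int around = mmap_miss <= MMAP_LOTSAMISS;

		for (int p = 1; p < pages_per_read; p++) {
			if (!around)
				mmap_miss++;	/* nothing was read around: miss */
			else if (mmap_miss > 0)
				mmap_miss--;	/* read-around cached it: hit */
		}
	}
	return mmap_miss;
}

/* Read-around stays on while the counter has not passed the threshold. */
static int readaround_enabled(int mmap_miss)
{
	return mmap_miss <= MMAP_LOTSAMISS;
}
```

In this model simulate_mmap_miss(1, nr_reads) grows by one per read and
crosses the threshold after 101 reads, while for 2 or more pages per
read the hit on the second page cancels the miss on the first, so the
counter stays at 0 and read-around is never turned off.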
Why? If an application is to do 2-page random reads, the best option
will be the read() syscall, because the size info is _lost_ when
doing mmap reads. If the application author cares about performance
(e.g. for a big DBMS), he will find out that truth either by
theorizing or through experiments.
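As an illustration of how the read() family keeps the size info, here
is a minimal sketch using pread(). The helper names pread_exact() and
read_pages() are hypothetical and only serve the example; the point is
that one call hands the kernel both the offset and the full request
length, which a page fault on an mmap'ed region cannot do.

```c
#define _XOPEN_SOURCE 700 /* for pread() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes at offset. Returns 0 on success, -1 otherwise. */
static int pread_exact(int fd, void *buf, size_t len, off_t offset)
{
	char *dst = buf;

	while (len > 0) {
		ssize_t n = pread(fd, dst, len, offset);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted: retry */
		if (n <= 0)
			return -1;	/* I/O error or unexpected end of file */
		dst += n;
		offset += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Hypothetical helper: fetch nr_pages pages starting at page index. */
static int read_pages(const char *path, void *buf, long index, int nr_pages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	int fd, ret;

	assert(index >= 0 && nr_pages >= 1);
	if (page_size <= 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* One call carries both the offset and the whole request size. */
	ret = pread_exact(fd, buf, (size_t)nr_pages * (size_t)page_size,
			  (off_t)index * page_size);
	close(fd);
	return ret;
}
```

A caller doing 2-page random reads would pass nr_pages = 2 and a buffer
of at least two pages, so the kernel sees the exact request size
instead of having to guess it from a single faulting page.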
IMHO the kernel mmap readaround algorithm shall only be optimized for
- enabled: sequential reads
- enabled: large random reads
- enabled: clustered random reads
- disabled: 1-page random reads
and should perform badly on, and thereby discourage,
- enabled and discouraged: small(2-page) and sparse random reads
It will do undesirable readaround for small sparse random mmap readers,
and I do think that we want this bad behavior to push such workloads
into using the more optimal read() syscall.
Therefore the change introduced by this patch is in line with the
above principles: either the readaround is favorable and should be
insisted on even under read congestion, or the readaround is already
unfavorable and we keep it that way. If there are user complaints,
good. We _helped_ them discover performance bugs in their application
and can point them to the optimal solution. If that's not viable in
the short term, there are workarounds like reducing the readahead size.
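One way to reduce the readahead size is the per-device sysfs attribute
/sys/block/<dev>/queue/read_ahead_kb. The sketch below assumes that
attribute is what you want to tune; the function name
set_read_ahead_kb() is hypothetical, the device name is left to the
caller, and writing the real attribute normally needs root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a readahead size in KB to the given attribute file, for example
 * /sys/block/<dev>/queue/read_ahead_kb. Returns 0 on success, -1 on error.
 */
static int set_read_ahead_kb(const char *attr_path, unsigned int kb)
{
	FILE *f;
	int ok;

	assert(attr_path != NULL);

	f = fopen(attr_path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", kb) > 0;
	if (fclose(f) != 0)	/* the write may only fail at flush time */
		ok = 0;
	return ok ? 0 : -1;
}
```

The setting applies to the whole block device, so it also shrinks the
window for workloads that do benefit from readahead; that is why it is
only a workaround.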
Thanks,
Fengguang