From: Minchan Kim <minchan.kim@gmail.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>,
Andrew Morton <akpm@linux-foundation.org>,
Quentin Barnes <qbarnes+nfs@yahoo-inc.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Nick Piggin <npiggin@suse.de>,
Steven Whitehouse <swhiteho@redhat.com>,
David Howells <dhowells@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Jonathan Corbet <corbet@lwn.net>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC][PATCH v3] readahead: introduce O_RANDOM for POSIX_FADV_RANDOM
Date: Mon, 4 Jan 2010 14:20:49 +0900
Message-ID: <28c262361001032120v284e92b5ub1211f3d1fca6140@mail.gmail.com>
In-Reply-To: <20100104045020.GA21021@localhost>
Hi, Wu.
On Mon, Jan 4, 2010 at 1:50 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.
>
> POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor
> performance: a 16K read will be carried out in 4 _sync_ 1-page reads.
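To make the arithmetic in the quoted description concrete: assuming 4 KiB pages (the patch description does not state the page size), a 16K read spans 4 pages, so with ra_pages=0 it is carried out as 4 synchronous single-page reads. A minimal userspace sketch of that count follows; `pages_spanned` and `SKETCH_PAGE_SIZE` are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 4 KiB pages, which is what makes a 16K read span 4 pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of page-cache pages touched by a read of len bytes at byte offset pos. */
static unsigned long pages_spanned(unsigned long pos, unsigned long len)
{
    unsigned long first, last;

    if (len == 0)
        return 0;
    /* Guard against wrap-around in pos + len. */
    assert(len <= ULONG_MAX - pos);

    first = pos / SKETCH_PAGE_SIZE;             /* index of first page */
    last = (pos + len - 1) / SKETCH_PAGE_SIZE;  /* index of last page */
    return last - first + 1;
}
```

An aligned 16K read gives pages_spanned(0, 16384) == 4. An unaligned read of the same size touches one more page.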
>
> In other places, ra_pages==0 means
> - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
> - some IO error happened
> where multi-page read IO won't help or should be avoided.
>
> POSIX_FADV_RANDOM actually wants different semantics: to disable the
> *heuristic* readahead algorithm, and to use a dumb one which faithfully
> submits read IO for whatever the application requests.
>
> So introduce a flag O_RANDOM for POSIX_FADV_RANDOM.
> It will be visible to fcntl(F_GETFL).
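For reference, the userspace side of this hint might look like the sketch below: set POSIX_FADV_RANDOM on a descriptor, then issue one large read. It uses only posix_fadvise() and pread(). The helper name `read_with_random_hint` is hypothetical, and the O_RANDOM bit from this RFC is deliberately not queried because it is only a proposal here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hint random access on fd, then issue a single read of len bytes at
 * offset off. Returns the byte count from pread(), or -1 on error.
 * The hint is advisory, so its status is reported through *hint_err
 * (if non-NULL) instead of failing the read.
 */
static ssize_t read_with_random_hint(int fd, void *buf, size_t len,
                                     off_t off, int *hint_err)
{
    int err;

    assert(buf != NULL);

    /* offset 0 with len 0 applies the advice to the whole file. */
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
    if (hint_err)
        *hint_err = err;

    return pread(fd, buf, len, off);
}
```

With the patch applied, a 16K request made this way would be submitted as one multi-page read instead of page by page. The sketch itself cannot show that, since the I/O pattern is only visible from inside the kernel or with block tracing.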
>
> Note that the random hint is unlikely to help random read performance
> noticeably. It may also be too permissive for huge requests (their IO
> size is not limited by read_ahead_kb).
>
> In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
> (NFS read) performance of the application increased by 313%!
>
> v3: use O_RANDOM to indicate both read/write access pattern as in
> posix_fadvise(), although it only takes effect for read() now
> (proposed by Quentin)
> v2: use O_RANDOM_READ to avoid race conditions (pointed out by Andi)
>
> CC: Nick Piggin <npiggin@suse.de>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Steven Whitehouse <swhiteho@redhat.com>
> CC: David Howells <dhowells@redhat.com>
> CC: Al Viro <viro@zeniv.linux.org.uk>
> CC: Jonathan Corbet <corbet@lwn.net>
> CC: Christoph Hellwig <hch@infradead.org>
> Tested-by: Quentin Barnes <qbarnes+nfs@yahoo-inc.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
> include/asm-generic/fcntl.h | 4 ++++
> mm/fadvise.c | 10 +++++++++-
> mm/readahead.c | 6 ++++++
> 3 files changed, 19 insertions(+), 1 deletion(-)
>
> --- linux.orig/include/asm-generic/fcntl.h 2010-01-04 12:39:29.000000000 +0800
> +++ linux/include/asm-generic/fcntl.h 2010-01-04 12:40:11.000000000 +0800
> @@ -80,6 +80,10 @@
> #define O_NDELAY O_NONBLOCK
> #endif
>
> +#ifndef O_RANDOM
> +#define O_RANDOM 010000000 /* random access pattern hint */
> +#endif
> +
> #define F_DUPFD 0 /* dup */
> #define F_GETFD 1 /* get close_on_exec */
> #define F_SETFD 2 /* set/clear close_on_exec */
> --- linux.orig/mm/fadvise.c 2010-01-04 12:39:29.000000000 +0800
> +++ linux/mm/fadvise.c 2010-01-04 12:39:30.000000000 +0800
> @@ -77,12 +77,20 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, lof
> switch (advice) {
> case POSIX_FADV_NORMAL:
> file->f_ra.ra_pages = bdi->ra_pages;
> + spin_lock(&file->f_lock);
> + file->f_flags &= ~O_RANDOM;
> + spin_unlock(&file->f_lock);
> break;
> case POSIX_FADV_RANDOM:
> - file->f_ra.ra_pages = 0;
> + spin_lock(&file->f_lock);
> + file->f_flags |= O_RANDOM;
> + spin_unlock(&file->f_lock);
> break;
> case POSIX_FADV_SEQUENTIAL:
> file->f_ra.ra_pages = bdi->ra_pages * 2;
> + spin_lock(&file->f_lock);
> + file->f_flags &= ~O_RANDOM;
> + spin_unlock(&file->f_lock);
> break;
> case POSIX_FADV_WILLNEED:
> if (!mapping->a_ops->readpage) {
> --- linux.orig/mm/readahead.c 2010-01-04 12:39:29.000000000 +0800
> +++ linux/mm/readahead.c 2010-01-04 12:39:30.000000000 +0800
> @@ -501,6 +501,12 @@ void page_cache_sync_readahead(struct ad
> if (!ra->ra_pages)
> return;
>
> + /* be dumb */
> + if (filp->f_flags & O_RANDOM) {
> + force_page_cache_readahead(mapping, filp, offset, req_size);
> + return;
> + }
> +
Let me ask a dumb question. :)
How about testing O_RANDOM before the ra_pages test?
My point is that even when readahead is turned off, it would be better to
read the contiguous blocks all at once than to have the readpage() callback
do I/O one page at a time.
Would that break some semantics or cause a problem in on-demand readahead?
> /* do read-ahead */
> ondemand_readahead(mapping, ra, filp, false, offset, req_size);
> }
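The ordering question above can be modelled in userspace. This is a toy sketch, not kernel code: `struct sketch_file`, `enum ra_path` and both functions are hypothetical stand-ins for the decisions in page_cache_sync_readahead(), and `SKETCH_O_RANDOM` just reuses the value from the patch. It only shows where the two orderings diverge. It does not say which behaviour is right.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_O_RANDOM 010000000   /* same value as O_RANDOM in the patch */

/* Which path the sync readahead entry point would take. */
enum ra_path { RA_NONE, RA_FORCED, RA_ONDEMAND };

/* Only the two fields the decision depends on. */
struct sketch_file {
    unsigned int ra_pages;
    unsigned int f_flags;
};

/* Order used by the v3 patch: the ra_pages test comes first. */
static enum ra_path sync_ra_patch_order(const struct sketch_file *f)
{
    assert(f != NULL);
    if (!f->ra_pages)
        return RA_NONE;
    if (f->f_flags & SKETCH_O_RANDOM)
        return RA_FORCED;
    return RA_ONDEMAND;
}

/* Order asked about in this reply: the O_RANDOM test comes first. */
static enum ra_path sync_ra_proposed_order(const struct sketch_file *f)
{
    assert(f != NULL);
    if (f->f_flags & SKETCH_O_RANDOM)
        return RA_FORCED;
    if (!f->ra_pages)
        return RA_NONE;
    return RA_ONDEMAND;
}
```

The two functions differ only when ra_pages is 0 and the random hint is set. Since the patch no longer zeroes ra_pages for POSIX_FADV_RANDOM, that combination arises only where ra_pages is 0 for another reason, such as the filesystems and IO-error case listed in the patch description.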
--
Kind regards,
Minchan Kim