From: Wu Fengguang <fengguang.wu@intel.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Quentin Barnes <qbarnes+nfs@yahoo-inc.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Nick Piggin <npiggin@suse.de>,
	Steven Whitehouse <swhiteho@redhat.com>
Subject: Re: [RFC][PATCH 2/2] readahead: avoid page-by-page reads on POSIX_FADV_RANDOM
Date: Thu, 31 Dec 2009 10:53:43 +0800
Message-ID: <20091231025343.GA11327@localhost>
In-Reply-To: <20091231013935.GA6570@localhost>

On Thu, Dec 31, 2009 at 09:39:36AM +0800, Wu Fengguang wrote:
> On Thu, Dec 31, 2009 at 02:02:38AM +0800, Andi Kleen wrote:
> > Wu Fengguang <fengguang.wu@intel.com> writes:
> > >   * the ra fields can be accessed concurrently in a racy way.
> > > --- linux.orig/mm/fadvise.c	2009-12-30 13:02:03.000000000 +0800
> > > +++ linux/mm/fadvise.c	2009-12-30 13:23:05.000000000 +0800
> > > @@ -77,12 +77,14 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, lof
> > >  	switch (advice) {
> > >  	case POSIX_FADV_NORMAL:
> > >  		file->f_ra.ra_pages = bdi->ra_pages;
> > > +		file->f_ra.flags &= ~RA_FLAG_RANDOM;
> > >  		break;
> > >  	case POSIX_FADV_RANDOM:
> > > -		file->f_ra.ra_pages = 0;
> > > +		file->f_ra.flags |= RA_FLAG_RANDOM;
> > 
> > What prevents this from racing with a parallel readahead
> > state modification, losing the bits?
> 
> Oh, I pretended that the problem doesn't exist..
> 
> To be serious, the race only exists inside a multithreaded application
> where a single fd is shared between two threads, one doing fadvise and
> the other doing readahead.
> 
> A sane application won't call fadvise(POSIX_FADV_RANDOM) while active
> reads are going on concurrently: that leads to indeterminate behavior
> by itself. From which request does the random hint take effect?
> 
> fadvise() should always be serialized with all reads on the fd.
> 
> In real workloads, perhaps 1% of applications use POSIX_FADV_RANDOM, and
> perhaps 1% of those are broken in this way. And if the race does happen,
> the impact is very small. So I chose to just ignore the race and use
> non-atomic operations.
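
To make the concern concrete, here is a minimal userspace sketch of the
lost update. It is not kernel code: the bit layout (RA_MISS_MASK and the
value of RA_FLAG_RANDOM) is assumed purely for illustration, the helper
names are made up, and C11 atomics stand in for whatever primitive the
kernel would actually use. The racy variant replays one unlucky
interleaving by hand so that the outcome is deterministic.

```c
#include <stdatomic.h>

/* Assumed layout, for illustration only: the low bits count mmap
 * misses and one higher bit carries the random-access hint. */
#define RA_MISS_MASK   0x0fffu
#define RA_FLAG_RANDOM 0x1000u

/*
 * Two plain read-modify-write sequences on the same word, interleaved
 * by hand: the "reader" loads flags, the "fadvise" side sets the hint,
 * then the reader stores its stale copy back.
 */
unsigned int racy_update(unsigned int flags)
{
	unsigned int stale = flags;		/* reader: load */

	flags |= RA_FLAG_RANDOM;		/* fadvise: set the hint */

	/* reader: bump the miss count in its private copy */
	stale = (stale & ~RA_MISS_MASK) | ((stale + 1) & RA_MISS_MASK);
	flags = stale;				/* reader: store, hint is lost */

	return flags;
}

/* The same two updates as indivisible operations: neither is lost. */
unsigned int atomic_update(unsigned int initial)
{
	atomic_uint flags;

	atomic_init(&flags, initial);
	atomic_fetch_or(&flags, RA_FLAG_RANDOM);	/* fadvise side */
	atomic_fetch_add(&flags, 1);	/* reader side; assumes the count
					 * stays below RA_MISS_MASK */
	return atomic_load(&flags);
}
```

Starting from 0, racy_update() returns a word with the miss count bumped
but RA_FLAG_RANDOM cleared, while atomic_update() keeps both changes.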

OK, while updating the man page as follows, I would feel ashamed to add
a sentence like "make sure there are no concurrent reads on the same
file descriptor...otherwise your advice will be lost".

So how about adding a FMODE_HINT_RANDOM_READ bit to file->f_mode?
Modifying it at fadvise() time at least won't disturb the existing
f_mode bits, which are only modified at open time.
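
For reference, the application-side pattern that stays clear of the race
whichever way the kernel stores the hint is to give the advice before
the first read on the descriptor. A minimal sketch follows; the helper
name pread_random_hint() is made up for illustration, and only standard
POSIX calls are used.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path, declare random access before any read is issued on the
 * new descriptor, then read len bytes at offset off.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t pread_random_hint(const char *path, void *buf, size_t len, off_t off)
{
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	/*
	 * offset 0 with len 0 covers the whole file. The advice is only
	 * a hint, so a failure here is ignored. Note that posix_fadvise()
	 * returns an error number instead of setting errno.
	 */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

	n = pread(fd, buf, len, off);
	close(fd);
	return n;
}
```

Because the descriptor is private to the call and the hint precedes the
read, there is no concurrent readahead state to race with.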

Thanks,
Fengguang

---
 man2/posix_fadvise.2 |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- manpages-3.23.orig/man2/posix_fadvise.2	2009-12-31 09:46:13.000000000 +0800
+++ manpages-3.23/man2/posix_fadvise.2	2009-12-31 10:28:58.000000000 +0800
@@ -104,7 +104,8 @@ in POSIX.1-2003 TC1.
 .SH NOTES
 Under Linux, \fBPOSIX_FADV_NORMAL\fP sets the readahead window to the
 default size for the backing device; \fBPOSIX_FADV_SEQUENTIAL\fP doubles
-this size, and \fBPOSIX_FADV_RANDOM\fP disables file readahead entirely.
+this size. \fBPOSIX_FADV_RANDOM\fP ignores the readahead size, and will
+submit IO for the read requests as-is.
 These changes affect the entire file, not just the specified region
 (but other open file handles to the same file are unaffected).
 

