linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lukas Senger <lukas@fridolin.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthias Wirth <matthias.wirth@gmail.com>,
	i4passt@lists.cs.fau.de,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Matthew Wilcox <matthew@wil.cx>, Jeff Layton <jlayton@redhat.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Rik van Riel <riel@redhat.com>,
	Lisa Du <cldu@marvell.com>, Jan Kara <jack@suse.cz>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Steven Whitehouse <swhiteho@redhat.com>,
	Fengguang Wu <fengguang.wu@intel.com>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Lukas Czerner <lczerner@redhat.com>,
	Damien Ramonda <damien.ramonda@intel.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCHv2] mm: implement POSIX_FADV_NOREUSE
Date: Fri, 14 Mar 2014 13:34:20 +0100	[thread overview]
Message-ID: <1394800460.3853.8.camel@dinghy> (raw)
In-Reply-To: <20140313130115.e5abf7da216e6a7610d4cd36@linux-foundation.org>

On Thu, 2014-03-13 at 13:01 -0700, Andrew Morton wrote:
> On Thu, 13 Mar 2014 19:43:41 +0100 Matthias Wirth <matthias.wirth@gmail.com> wrote:
> 
> > Backups, logrotation and indexers don't need files they read to remain
> > in the page cache. Their pages can be reclaimed early and should not
> > displace useful pages. POSIX specifices the POSIX_FADV_NOREUSE flag for
> > these use cases but it's currently a noop.
> 
> As far as I can tell, POSIX_FADV_DONTNEED suits these applications
> quite well.  Why is this patch happening?

Using DONTNEED means the application will throw out its pages even if
they are used by other processes. If the application wants to be more
polite it needs a way to find out whether thats the case. One way is to
use mincore to get a snapshot of pages before mmaping the file and then
keeping pages that were already cached before we accessed them. This of
course ignores all accesses by other processes occuring while we use the
file and doesn't work with read. Apart from those flaws, does that kind
of page cache management belong into userspace?

> My proposal to deactivate the pages within the fadvise() call violates
> that, because the spec wants us to act *after* the app has touched the
> pages.
> 
> Your proposed implementation violates it because it affects data
> outside the specified range.
> 
> It would be interesting to know what the *bsd guys chose to do, but I
> don't understand it from the amount of context in
> http://lists.freebsd.org/pipermail/svn-src-stable-9/2012-August/002608.html
> 
> Ignoring the range and impacting the entire file (for this fd) is a
> bit lame.  Alternatives include:
> 
> a) Implement a per-fd tree of (start,len) ranges and maintain and
>    search that.  blah.
> 
> b) violate the spec in a different fashion and implement NOREUSE
>    synchronously within fadvise.
> 
> From a practical point of view, I'm currently inclining toward b). 
> Yes, we require NOREUSE be run *after* the read() instead of before it,
> but what's wrong with that?  It's just as easy to implement from
> userspace.  Perhaps we should call it POSIX_FADV_NOREUSE_LINUX to make
> it clear that we went our own way.

The problem with calling fadvise with NOREUSE_LINUX after read is that
it makes writing applications in a portable way harder. As you point
out, our version doesn't adhere to the spec perfectly either, but I'd
wager it covers the most common use case. And a) would at least allow a
spec faithful implementation in the future.

> I don't think that per-cpu page thing is suitable, really.  If this
> task context-switches to a different CPU then we get the wrong page. 
> This will happen pretty often as the task is performing physical IO. 
> This can be fixed by putting the page* into the task_struct instead,
> but passing function args via current-> is a bit of a hack.  Why not
> create add_to_page_cache_lru_tail()?

We agree and will send a new version with add_to_page_cache_lru_tail.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-03-14 12:35 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-11 10:25 [PATCH] mm: implement POSIX_FADV_NOREUSE Matthias Wirth
2014-03-11 14:06 ` Michal Hocko
2014-03-11 15:24   ` Dave Hansen
2014-03-11 21:27     ` Andrew Morton
2014-03-12 11:59       ` Lukas Senger
2014-03-12 14:46         ` Michal Hocko
2014-03-12 16:05         ` Dave Hansen
2014-03-13 12:40           ` Lukas Senger
2014-03-13 18:43 ` [PATCHv2] " Matthias Wirth
2014-03-13 20:01   ` Andrew Morton
2014-03-14 12:34     ` Lukas Senger [this message]
2014-03-14 15:52 ` [PATCHv3] " Matthias Wirth
2014-03-18 15:14   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1394800460.3853.8.camel@dinghy \
    --to=lukas@fridolin.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=cldu@marvell.com \
    --cc=damien.ramonda@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=i4passt@lists.cs.fau.de \
    --cc=jack@suse.cz \
    --cc=jlayton@redhat.com \
    --cc=khalid.aziz@oracle.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=lczerner@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=matthew@wil.cx \
    --cc=matthias.wirth@gmail.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=minchan@kernel.org \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=sasha.levin@oracle.com \
    --cc=swhiteho@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).