public inbox for linux-kernel@vger.kernel.org
From: John Stultz <john.stultz@linaro.org>
To: Rik van Riel <riel@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Robert Love <rlove@google.com>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Mel Gorman <mel@csn.ul.ie>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Eric Anholt <eric@anholt.net>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Johannes Weiner <jweiner@redhat.com>,
	Jon Masters <jcm@redhat.com>
Subject: Re: [PATCH] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags
Date: Tue, 22 Nov 2011 11:48:58 -0800
Message-ID: <1321991338.6445.70.camel@work-vm>
In-Reply-To: <4ECB6D60.1010702@redhat.com>

On Tue, 2011-11-22 at 04:37 -0500, Rik van Riel wrote:
> On 11/21/2011 10:33 PM, John Stultz wrote:
> > This patch provides new fadvise flags that can be used to mark
> > file pages as volatile, which will allow it to be discarded if the
> > kernel wants to reclaim memory.
> >
> > This is useful for userspace to allocate things like caches, and lets
> > the kernel destructively (but safely) reclaim them when there's memory
> > pressure.
> >
> > Right now, we can simply throw away pages if they are clean (backed
> > by a current on-disk copy).  That only happens for anonymous/tmpfs/shmfs
> > pages when they're swapped out.  This patch lets userspace select
> > dirty pages which can be simply thrown away instead of writing them
> > to disk first.  See the mm/shmem.c for this bit of code.  It's
> > different from FADV_DONTNEED since the pages are not immediately
> > discarded; they are only discarded under pressure.
> 
> I've got a few questions:
> 
> 1) How do you tell userspace some of its data got
>     discarded?

When you mark the range non-volatile again, the return code tells you
whether any of it was discarded in the meantime. This follows the ashmem
style that Robert described in his other mail.
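To make the ashmem-style contract concrete, here is a minimal user-space model. It assumes only the semantics described above: a volatile range may be discarded under pressure, and marking it non-volatile reports whether that happened. The `vrange_*` names are hypothetical. This is a simulation, not the proposed fadvise interface or the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one volatile range; not a kernel API. */
struct vrange {
	char  *data;        /* backing bytes of the cached object */
	size_t len;         /* length of the range in bytes */
	bool   is_volatile; /* true while the "kernel" may discard it */
	bool   purged;      /* true once the contents were thrown away */
};

/* Unpin: the application is done with the object for now. */
void vrange_mark_volatile(struct vrange *r)
{
	r->is_volatile = true;
}

/* Simulated reclaim: only volatile, not-yet-purged ranges are dropped. */
bool vrange_reclaim(struct vrange *r)
{
	assert(r->data != NULL);
	if (!r->is_volatile || r->purged)
		return false;
	memset(r->data, 0, r->len);	/* the contents are simply lost */
	r->purged = true;
	return true;
}

/*
 * Re-pin before use. Returns 1 if the contents were discarded while the
 * range was volatile (the caller must regenerate them), 0 otherwise.
 */
int vrange_mark_nonvolatile(struct vrange *r)
{
	int was_purged = r->purged;

	r->is_volatile = false;
	r->purged = false;	/* pinned again; the caller refills it if needed */
	return was_purged;
}
```

A caller would bracket each use of a cached object with `vrange_mark_nonvolatile()` and `vrange_mark_volatile()`, regenerating the object whenever the former returns 1.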

> 2) How do you prevent the situation where every
>     volatile object gets a few pages discarded, making
>     them all unusable?
>     (better to throw away an entire object at once)

Indeed. One of the issues folks raised about the ashmem code was that it
manages its own LRU. This attempt simplifies the code by using the
kernel's own LRU, but it has the drawback of being page-based instead of
object- or range-based.

We could try to zap the entire range when one of its pages is written
out, or we could go back to using a range-based LRU, like ashmem does.
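The difference between the current page-based behaviour and the "zap the entire range" option can be sketched with a toy model of a single file. Everything here (`struct file_model`, `reclaim_page()`, the 16-page file) is a hypothetical illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16	/* hypothetical file size, in pages */

struct range {
	size_t start, end;	/* page indices, half-open [start, end) */
	bool   is_volatile;
	bool   purged;
};

struct file_model {
	bool          present[NPAGES];	/* page still holds data */
	struct range *ranges;
	size_t        nranges;
};

/* Find the volatile range that contains @page, or NULL. */
static struct range *find_volatile_range(struct file_model *f, size_t page)
{
	for (size_t i = 0; i < f->nranges; i++) {
		struct range *r = &f->ranges[i];
		if (r->is_volatile && page >= r->start && page < r->end)
			return r;
	}
	return NULL;
}

/*
 * Reclaim picks @page. Page-based policy: drop only that page.
 * Range-based policy: drop every page of the enclosing range, so an
 * object is either fully intact or fully gone. Returns pages freed.
 */
size_t reclaim_page(struct file_model *f, size_t page, bool whole_range)
{
	assert(page < NPAGES);
	struct range *r = find_volatile_range(f, page);
	if (!r)
		return 0;	/* not volatile: must be swapped, not discarded */

	size_t lo = whole_range ? r->start : page;
	size_t hi = whole_range ? r->end : page + 1;
	size_t freed = 0;
	for (size_t p = lo; p < hi; p++) {
		if (f->present[p]) {
			f->present[p] = false;
			freed++;
		}
	}
	r->purged = true;
	return freed;
}
```

With `whole_range` false, reclaiming one page leaves the rest of the object resident but useless, which is the situation Rik's second question describes; with it true, no half-purged objects are left behind.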


> 3) Isn't it too slow for something like Firefox to
>     create a new tmpfs object for every single throw-away
>     cache object?

So, if you mean creating a new file for every cache object, that doesn't
seem necessary, as you could map a number of objects into the same file
and mark the ranges as volatile or not as needed. 
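One way to map many objects into a single file is a bump allocator that hands out page-aligned slices. This is a sketch under assumptions: `struct pack` and `pack_alloc()` are hypothetical, and page alignment is chosen because the patch works on whole pages, so two objects sharing a page could not be marked volatile independently.

```c
#include <assert.h>
#include <stddef.h>

struct pack {
	size_t page_size;	/* e.g. from sysconf(_SC_PAGESIZE) */
	size_t next;		/* next free, page-aligned offset */
	size_t capacity;	/* file size in bytes */
};

struct slice {
	size_t offset;	/* byte offset of the object in the file */
	size_t len;	/* object size rounded up to whole pages */
};

static size_t round_up(size_t n, size_t align)
{
	return (n + align - 1) / align * align;
}

/* Returns 0 and fills @out on success, -1 if the file is full. */
int pack_alloc(struct pack *p, size_t len, struct slice *out)
{
	assert(p->page_size > 0 && len > 0);
	size_t need = round_up(len, p->page_size);
	if (need > p->capacity - p->next)	/* next never exceeds capacity */
		return -1;
	out->offset = p->next;
	out->len = need;
	p->next += need;
	return 0;
}
```

Each slice's `offset` and `len` are what an application would pass as the offset and length arguments when marking that one object volatile or non-volatile; freeing and reusing slices is left out.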

Or are you worried about the allocation of the range structure when we
mark a region as volatile?

Either way, I'd defer to Robert on real-world usage. 


> Johannes, Jon and I have looked at an alternative way to
> allow the kernel and userspace to cooperate in throwing
> out cached data.  This alternative way does not touch
> the alloc/free fast path at all, but does require some
> cooperation at "shrink cache" time.
> 
> The idea is quite simple:
> 
> 1) Every program that we are interested in already has
>     some kind of main loop where it polls on file descriptors.
>     It is easy for such programs to add an additional file,
>     which would be a device or sysfs file that wakes up the
>     program from its poll/select loop when memory is getting
>     full to the point that userspace needs to shrink its
>     caches.
> 
>     The kernel can be smart here and wake up just one process
>     at a time, targeting specific NUMA nodes or cgroups. Such
>     kernel smarts do not require additional userspace changes.
> 
> 2) When userspace gets such a "please shrink your caches"
>     event, it can do various things.  A program like firefox
>     could throw away several cached objects, eg. uncompressed
>     images or entire pre-rendered tabs, while a JVM can shrink
>     its heap size and a database could shrink its internal
>     cache.

Like Robert, I don't see this approach as mutually exclusive with the
volatile flags. The two approaches just make different tradeoffs.

The upside of your approach is that applications don't have to remember
to re-pin the cache before using it and unpin it when they're done.

The downside is that the "please shrink your caches" message is likely
to arrive when the system is already low on resources. Once applications
have been asked to "be nice and get small!", having to wait for them to
act might not be great. With volatile regions, by contrast, the kernel
already has additional easily freeable pages available whenever it needs
them.
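For comparison, the application side of Rik's "please shrink your caches" proposal might look like the loop below. The notification file is hypothetical (the proposal only says it would be a device or sysfs file), so in this sketch a pipe stands in for it, and the cache is reduced to an entry count.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

struct cache {
	size_t entries;	/* number of cached objects */
};

/* Drop half of the cached objects in response to memory pressure. */
static void cache_shrink(struct cache *c)
{
	c->entries /= 2;
}

/*
 * One iteration of the application's main loop: wait up to @timeout_ms
 * for the notification fd and shrink the cache if it fired.
 * Returns 1 if a shrink request was handled, 0 if none arrived,
 * -1 on error.
 */
int main_loop_once(int notify_fd, struct cache *c, int timeout_ms)
{
	struct pollfd pfd = { .fd = notify_fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	assert(c != NULL);
	if (n < 0)
		return -1;
	if (n == 0 || !(pfd.revents & POLLIN))
		return 0;

	char buf[64];
	if (read(notify_fd, buf, sizeof buf) < 0)	/* consume the event */
		return -1;
	cache_shrink(c);
	return 1;
}
```

The cache only shrinks once the process is scheduled and reaches this point in its loop, which is the latency concern raised above.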

thanks
-john



