public inbox for linux-kernel@vger.kernel.org
From: Rik van Riel <riel@redhat.com>
To: John Stultz <john.stultz@linaro.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Robert Love <rlove@google.com>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Mel Gorman <mel@csn.ul.ie>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Eric Anholt <eric@anholt.net>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Johannes Weiner <jweiner@redhat.com>,
	Jon Masters <jcm@redhat.com>
Subject: Re: [PATCH] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags
Date: Tue, 22 Nov 2011 04:37:36 -0500
Message-ID: <4ECB6D60.1010702@redhat.com>
In-Reply-To: <1321932788-18043-1-git-send-email-john.stultz@linaro.org>

On 11/21/2011 10:33 PM, John Stultz wrote:
> This patch provides new fadvise flags that can be used to mark
> file pages as volatile, which will allow it to be discarded if the
> kernel wants to reclaim memory.
>
> This is useful for userspace to allocate things like caches, and lets
> the kernel destructively (but safely) reclaim them when there's memory
> pressure.
>
> Right now, we can simply throw away pages if they are clean (backed
> by a current on-disk copy).  That only happens for anonymous/tmpfs/shmfs
> pages when they're swapped out.  This patch lets userspace select
> dirty pages which can be simply thrown away instead of writing them
> to disk first.  See the mm/shmem.c for this bit of code.  It's
> different from FADV_DONTNEED since the pages are not immediately
> discarded; they are only discarded under pressure.

I've got a few questions:

1) How do you tell userspace some of its data got
    discarded?

2) How do you prevent the situation where every
    volatile object gets a few pages discarded, making
    them all unusable?
    (better to throw away an entire object at once)

3) Isn't it too slow for something like Firefox to
    create a new tmpfs object for every single throw-away
    cache object?

Johannes, Jon and I have looked at an alternative way to
allow the kernel and userspace to cooperate in throwing
out cached data.  This alternative way does not touch
the alloc/free fast path at all, but does require some
cooperation at "shrink cache" time.

The idea is quite simple:

1) Every program that we are interested in already has
    some kind of main loop where it polls on file descriptors.
    It is easy for such programs to add an additional file,
    which would be a device or sysfs file that wakes up the
    program from its poll/select loop when memory is getting
    full to the point that userspace needs to shrink its
    caches.

    The kernel can be smart here and wake up just one process
    at a time, targeting specific NUMA nodes or cgroups. Such
    kernel smarts do not require additional userspace changes.

2) When userspace gets such a "please shrink your caches"
    event, it can do various things.  A program like Firefox
    could throw away several cached objects, e.g. uncompressed
    images or entire pre-rendered tabs, while a JVM can shrink
    its heap size and a database could shrink its internal
    cache.

3) After doing that, they could all call the same glibc
    function that walks across program-internal free memory
    and calls MADV_FREE on all free regions that span
    multiple pages, which gives the pages back to the kernel,
    without needing to move VMA boundaries.  This is relatively
    lightweight and allows for the nuking of pages right in
    the middle of a heap VMA.

4) In some GUI libraries, like gtk/glib, we could open the
    memory pressure device node (or sysfs file) by default,
    hooking it up to the glibc function from (3) by default,
    which would give all gtk/glib programs the ability to
    give free()d memory back to the kernel on request, without
    needing to even modify the program.

    Program modification would only be needed in order to
    free cached objects, etc.  The modification of programs
    running under those libraries would consist of overriding
    the "shrink caches" hook with their own function, which
    first does program-specific stuff and then calls the
    default hook to take care of the glibc side.
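
A rough sketch of how steps 1 to 3 might fit together in C, under
several assumptions.  The memory pressure file does not exist yet, so
pressure_fd is just any readable descriptor (a pipe works for testing).
The names shrink_hook_fn, release_free_range(), release_range_hook()
and poll_pressure_once() are made up for illustration and are not
glibc or glib APIs.  A real allocator would walk its own free lists;
here a single known-free range stands in for that walk.  MADV_FREE may
be missing from older headers or rejected by older kernels, so the
sketch falls back to MADV_DONTNEED, which is a heavier operation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE MADV_DONTNEED   /* assumed fallback for older headers */
#endif

/* Hypothetical "shrink caches" hook type; not an existing library API. */
typedef void (*shrink_hook_fn)(void *ctx);

/*
 * Hand the whole pages inside [start, start + len) back to the kernel.
 * Partial pages at either end are left alone.  Returns the number of
 * bytes released, or 0 if the range holds no full page or madvise() fails.
 */
static size_t release_free_range(void *start, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t lo = ((uintptr_t)start + page - 1) & ~(page - 1);
    uintptr_t hi = ((uintptr_t)start + len) & ~(page - 1);

    if (hi <= lo)
        return 0;
    if (madvise((void *)lo, hi - lo, MADV_FREE) != 0 &&
        madvise((void *)lo, hi - lo, MADV_DONTNEED) != 0)
        return 0;
    return hi - lo;
}

/* One known-free region; a stand-in for the allocator's free-list walk. */
struct free_range {
    void *start;
    size_t len;
    size_t released;
};

/* Example default hook: give the free region back to the kernel. */
static void release_range_hook(void *ctx)
{
    struct free_range *r = ctx;

    r->released = release_free_range(r->start, r->len);
}

/*
 * One iteration of a main loop that also watches the pressure fd.
 * Returns 1 if an event arrived and the hook ran, 0 on timeout,
 * -1 on error.
 */
static int poll_pressure_once(int pressure_fd, int timeout_ms,
                              shrink_hook_fn hook, void *ctx)
{
    struct pollfd pfd = { .fd = pressure_fd, .events = POLLIN };
    char buf[64];
    int n;

    assert(hook != NULL);
    n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    if (!(pfd.revents & POLLIN))
        return -1;
    if (read(pressure_fd, buf, sizeof buf) < 0)   /* consume the event */
        return -1;
    hook(ctx);
    return 1;
}
```

A program that wants to drop its own cached objects first would pass
its own hook, have it free those objects, and then call
release_range_hook() (or whatever the library default turns out to be)
to cover the allocator side, as described in (4).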

We considered the same approach you are proposing as well, but
we did not come up with satisfactory answers to the questions I
asked above, which is why we came up with this scheme.

Unfortunately we have not gotten around to implementing it yet,
but I'd be happy to work on it with you guys if you are
interested.


