From: John Stultz <john.stultz@linaro.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Android Kernel Team <kernel-team@android.com>,
Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
Hugh Dickins <hughd@google.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 2/2] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags
Date: Mon, 13 Feb 2012 21:55:32 -0800 [thread overview]
Message-ID: <1329198932.2753.62.camel@work-vm> (raw)
In-Reply-To: <20120214051659.GH14132@dastard>
On Tue, 2012-02-14 at 16:16 +1100, Dave Chinner wrote:
> On Thu, Feb 09, 2012 at 04:16:33PM -0800, John Stultz wrote:
> > This patch provides new fadvise flags that can be used to mark
> > file pages as volatile, which will allow it to be discarded if the
> > kernel wants to reclaim memory.
> >
> > This is useful for userspace to allocate things like caches, and lets
> > the kernel destructively (but safely) reclaim them when there's memory
> > pressure.
> .....
> > @@ -655,6 +656,8 @@ struct address_space {
> > spinlock_t private_lock; /* for use by the address_space */
> > struct list_head private_list; /* ditto */
> > struct address_space *assoc_mapping; /* ditto */
> > + struct range_tree_node *volatile_root; /* volatile range list */
> > + struct mutex vlist_mutex; /* protect volatile_list */
> > } __attribute__((aligned(sizeof(long))));
>
> So you're adding roughly 32 bytes to every cached inode in the
> system? This will increasing the memory footprint of the inode cache
> by 2-5% (depending on the filesystem). Almost no-one will be using
> this functionality on most inodes that are cached in the system, so
> that seems like a pretty bad trade-off to me...
Yea. Bloating the address_space is a concern I'm aware of, but for the
initial passes I left it to see where folks would rather I keep it.
Pushing the mutex into a range_tree_root structure or something could
cut this down, but I still suspect it won't be loved. Another idea would
be to manage the mapping -> range tree separately via something like a
hash. Do you have any preferences or suggestions here?
> > +static int volatile_shrink(struct shrinker *ignored, struct shrink_control *sc)
> > +{
> > + struct volatile_range *range, *next;
> > + unsigned long nr_to_scan = sc->nr_to_scan;
> > + const gfp_t gfp_mask = sc->gfp_mask;
> > +
> > + /* We might recurse into filesystem code, so bail out if necessary */
> > + if (nr_to_scan && !(gfp_mask & __GFP_FS))
> > + return -1;
> > + if (!nr_to_scan)
> > + return lru_count;
> > +
> > + mutex_lock(&volatile_lru_mutex);
> > + list_for_each_entry_safe(range, next, &volatile_lru_list, lru) {
> > + struct inode *inode = range->mapping->host;
> > + loff_t start, end;
> > +
> > +
> > + start = range->range_node.start * PAGE_SIZE;
> > + end = (range->range_node.end + 1) * PAGE_SIZE - 1;
> > +
> > + /*
> > + * XXX - calling vmtruncate_range from a shrinker causes
> > + * lockdep warnings. Revisit this!
> > + */
> > + vmtruncate_range(inode, start, end);
>
> That function vmtruncate_range, I don't think it does what you think
> it does.
>
> Firstly, it's only implemented for shmfs/tmpfs, so this can't have
> been tested for caching files on any real filesystem. If it's only
> for shm/tmpfs, then the applications cwcan just as easily use their
> own memory for caching their volatile data...
Yep you're right, this started as being shm only, and has only been
tested on tmpfs mounts. In this verison, I had left the shm checks off
so that it could be possibly more generic, but I admittedly haven't
thought that through enough.
> Secondly, vmtruncate_range() is actually a hole punching function,
> not a page cache invalidation function. You should be using
> invalidate_inode_pages2_range() to invalidate and tear down the page
> cache. If you really want to punch holes in files, then you should
> be using the fallocate syscall with direct application control, not
> trying to hide it until memory pressure occurs via fadvise because
> hole punching requires memory for the transactions necessary to run
> extent freeing operations.
Thanks for the tip on invalidate_inode_pages2_range()! I'll look it over
and rework the patch using that.
Thanks so much for the review!
-john
next prev parent reply other threads:[~2012-02-14 5:56 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 0:16 [PATCH 1/2] [RFC] Range tree implementation John Stultz
2012-02-10 0:16 ` [PATCH 2/2] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags John Stultz
2012-02-12 14:08 ` Dmitry Adamushko
2012-02-17 3:49 ` John Stultz
2012-02-14 5:16 ` Dave Chinner
2012-02-14 5:55 ` John Stultz [this message]
2012-02-14 23:51 ` Dave Chinner
2012-02-15 0:29 ` John Stultz
2012-02-15 1:37 ` NeilBrown
2012-02-17 4:45 ` Dave Chinner
2012-02-17 5:27 ` NeilBrown
2012-02-17 5:38 ` John Stultz
2012-02-17 5:21 ` John Stultz
2012-02-20 7:34 ` NeilBrown
2012-02-20 23:25 ` Dave Hansen
[not found] ` <CAO6Zf6B6nGqsz5zpT3ixbO-+JWxMsScABasnwo-CVHuMKPqpLQ@mail.gmail.com>
2012-02-12 12:54 ` Fwd: " Dmitry Adamushko
2012-02-17 3:43 ` John Stultz
2012-02-17 5:24 ` John Stultz
-- strict thread matches above, loose matches on Subject: below --
2012-03-16 22:51 [PATCH 0/2] [RFC] Volatile ranges (v4) John Stultz
2012-03-16 22:51 ` [PATCH 2/2] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags John Stultz
2012-03-17 16:21 ` Dmitry Adamushko
2012-03-18 9:13 ` Dmitry Adamushko
2012-03-20 0:18 ` John Stultz
2012-03-21 4:15 [PATCH 0/2] [RFC] fadivse volatile & range tree (v5) John Stultz
2012-03-21 4:15 ` [PATCH 2/2] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags John Stultz
2012-04-07 0:08 [PATCH 0/2] [RFC] Volatile Ranges (v6) John Stultz
2012-04-07 0:08 ` [PATCH 2/2] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags John Stultz
2012-04-14 1:07 [PATCH 0/2][RFC] Volatile Ranges (v7) John Stultz
2012-04-14 1:08 ` [PATCH 2/2] [RFC] fadvise: Add _VOLATILE,_ISVOLATILE, and _NONVOLATILE flags John Stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1329198932.2753.62.camel@work-vm \
--to=john.stultz@linaro.org \
--cc=akpm@linux-foundation.org \
--cc=dave@linux.vnet.ibm.com \
--cc=david@fromorbit.com \
--cc=hughd@google.com \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=riel@redhat.com \
--cc=rlove@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.