From: John Stultz <john.stultz@linaro.org>
To: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Android Kernel Team <kernel-team@android.com>,
	Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
	Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Dave Chinner <david@fromorbit.com>, Neil Brown <neilb@suse.de>,
	Andrea Righi <andrea@betterlinux.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Taras Glek <tgek@mozilla.com>, Mike Hommey <mh@glandium.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers
Date: Wed, 06 Jun 2012 16:56:38 -0700
Message-ID: <4FCFEE36.3010902@linaro.org>
In-Reply-To: <4FCFB4F6.6070308@gmail.com>

On 06/06/2012 12:52 PM, KOSAKI Motohiro wrote:
>> The key point is we want volatile ranges to be purged in the order they
>> were marked volatile.
>> If we use the page lru via shmem_writeout to trigger range purging, we
>> wouldn't necessarily get this desired behavior.
> Ok, so can you please explain your ideal order to reclaim. your last mail
> described old and new volatiled region. but I'm not sure regular tmpfs pages
> vs volatile pages vs regular file cache order. That said, when using shrink_slab(),
> we choose random order to drop against page cache. I'm not sure why you sure
> it is ideal.

So I'm not totally sure it's ideal, but I can tell you what makes sense to
me. If there is a more ideal order, I'm open to suggestions.

So volatile ranges should be purged first-in-first-out. So the first
range marked volatile should be purged first. Since volatile ranges
might have different costs depending on what filesystem the file is
backed by, this LRU order is per-filesystem.
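
To make the ordering concrete, here is a minimal user-space sketch of a
per-filesystem FIFO of volatile ranges. It is an illustration only, not the
code from the patch: the names (struct vrange, vrange_mark, vrange_unmark,
vrange_purge_oldest) are hypothetical, it uses a plain linked list rather
than the interval tree from patch 1/3, and unmarking only handles an exact
(start, len) match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One volatile byte range [start, start + len) in a file (hypothetical). */
struct vrange {
	size_t start;
	size_t len;
	struct vrange *next;	/* next range, marked later than this one */
};

/* Per-filesystem FIFO: head is the oldest marked range, tail the newest. */
struct vrange_fifo {
	struct vrange *head;
	struct vrange *tail;
};

/* Mark a range volatile by appending it. Returns 0, or -1 if out of memory. */
int vrange_mark(struct vrange_fifo *q, size_t start, size_t len)
{
	struct vrange *r;

	assert(q != NULL && len > 0);
	r = malloc(sizeof(*r));
	if (!r)
		return -1;
	r->start = start;
	r->len = len;
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	return 0;
}

/* Unmark an exactly matching range. Returns 1 if it was found, else 0. */
int vrange_unmark(struct vrange_fifo *q, size_t start, size_t len)
{
	struct vrange *prev = NULL, *r;

	for (r = q->head; r; prev = r, r = r->next) {
		if (r->start != start || r->len != len)
			continue;
		if (prev)
			prev->next = r->next;
		else
			q->head = r->next;
		if (q->tail == r)
			q->tail = prev;
		free(r);
		return 1;
	}
	return 0;
}

/* Purge the oldest range. Returns 1 and fills *start and *len, or 0 if empty. */
int vrange_purge_oldest(struct vrange_fifo *q, size_t *start, size_t *len)
{
	struct vrange *r = q->head;

	if (!r)
		return 0;
	*start = r->start;
	*len = r->len;
	q->head = r->next;
	if (!q->head)
		q->tail = NULL;
	free(r);
	return 1;
}
```

With three ranges marked in order A, B, C and B unmarked again, successive
purges in this model return A and then C.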

It seems that if we have tmpfs volatile ranges, we should purge them
before we swap out any regular tmpfs pages. That is why I now purge any
available ranges in shmem_writepage before swapping, rather than using a
shrinker (I'm hoping you saw the updated patchset I sent out Friday).
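
The "purge before swap" policy can be sketched as a tiny user-space model.
This is not shmem_writepage itself: struct tmpfs_model and reclaim_one_page
are hypothetical names, and the model counts single pages where the patch
purges whole ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one tmpfs instance under memory pressure. */
struct tmpfs_model {
	size_t volatile_pages;	/* pages inside ranges marked volatile */
	size_t regular_pages;	/* resident pages that are not volatile */
	size_t purged_pages;	/* volatile pages dropped without I/O */
	size_t swapped_pages;	/* regular pages written to swap */
};

enum reclaim_action { RECLAIM_NONE, RECLAIM_PURGED, RECLAIM_SWAPPED };

/*
 * Reclaim one page: volatile data is dropped first, and a regular page
 * is swapped out only once no volatile pages remain.
 */
enum reclaim_action reclaim_one_page(struct tmpfs_model *m)
{
	assert(m != NULL);
	if (m->volatile_pages > 0) {
		m->volatile_pages--;
		m->purged_pages++;
		return RECLAIM_PURGED;
	}
	if (m->regular_pages > 0) {
		m->regular_pages--;
		m->swapped_pages++;
		return RECLAIM_SWAPPED;
	}
	return RECLAIM_NONE;
}
```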

Does that make sense?

> And, now I guess you think nobody touch volatiled page, yes? because otherwise
> volatile marking order is silly choice. If yes, what's happen if anyone touch
> a patch which volatiled. no-op? SIGBUS? 

More of a no-op. If you read a page that has been marked volatile, it
may return the data that was there, or it may return a zero-filled page.

I guess we could throw a signal to help developers avoid making
programming mistakes, but I'm not sure what the extra cost would be to
set that up and tear it down each time. One important aspect of this is
that in order to make it attractive for an application to mark ranges as
volatile, it has to be very cheap to mark and unmark ranges.



> Which worklord didn't work. Usually, anon pages reclaim are only
> happen when 1) tmpfs streaming io workload or 2) heavy vm pressure.
> So, this scenario are not so inaccurate to me.

So it was more of a theoretical issue in my discussions, but once it was
brought up, ashmem's global range LRU made more sense.

I think the workload we're mostly concerned with here is heavy vm pressure.



>> That's when I added the LRU tracking at the volatile range level (which
>> reverted back to the behavior ashmem has always used), and have been
>> using that model sense.
>>
>> Hopefully this clarifies things. My apologies if I don't always use the
>> correct terminology, as I'm still a newbie when it comes to VM code.
> I think your code is enough clean. But I'm still not sure your background
> design. Please help me to understand clearly.

Hopefully the above helps. But let me know where you'd like more
clarification.


> btw, Why do you choice fallocate instead of fadvise? As far as I skimmed,
> fallocate() is an operation of a disk layout, not of a cache. And, why
> did you choice fadvise() instead of madvise() at initial version. vma
> hint might be useful than fadvise() because it can be used for anonymous
> pages too.

I actually started with madvise, but quickly moved to fadvise because
fd-based ranges seemed to make more sense. With ashmem, fds are
often shared, and coordinating volatile ranges on a shared fd made more
sense with a (fd, offset, len) tuple than with an offset and length
on an mmapped region.

I moved to fallocate at Dave Chinner's request. In short, it allows
non-tmpfs filesystems to implement volatile range semantics, allowing
them to zap rather than write out dirty volatile pages. And since
volatile ranges are very similar to a delayed, cancelable hole-punch, it
made sense to use an interface similar to FALLOC_FL_PUNCH_HOLE.
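
For comparison, this is roughly what the existing immediate hole-punch
looks like from user space. The sketch uses only fallocate(2) with
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. The helper names
punch_hole_now and punch_hole_demo and the /tmp path are made up for the
example, and the proposed MARK_VOLATILE/UNMARK_VOLATILE flags are
deliberately left out since their values exist only in this patch series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for fallocate() and the FALLOC_FL_* flags */
#endif
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Immediately drop the data in [offset, offset + len) while keeping the
 * file size. Returns 0 on success, or -1 with errno set.
 */
int punch_hole_now(int fd, off_t offset, off_t len)
{
	assert(fd >= 0 && offset >= 0 && len > 0);
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}

/*
 * Demo: write 8 KiB of 'x', punch the first 4 KiB, and check the result.
 * Returns 1 if the punched part reads back as zeros and the rest is
 * intact, 0 if the punch failed (for example, the filesystem does not
 * support it), and -1 on any other failure.
 */
int punch_hole_demo(void)
{
	char path[] = "/tmp/punch-demo-XXXXXX";
	char buf[8192];
	int result = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the file disappears once fd is closed */
	memset(buf, 'x', sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;
	if (punch_hole_now(fd, 0, 4096) != 0) {
		result = 0;
		goto out;
	}
	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;
	if (buf[0] == 0 && buf[4095] == 0 && buf[4096] == 'x')
		result = 1;
out:
	close(fd);
	return result;
}
```

A volatile range differs from this in that the kernel may do the punch
later, under memory pressure, and the application can cancel it by
unmarking the range before that happens.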

You can read the details of DaveC's suggestion here:
https://lkml.org/lkml/2012/4/30/441

thanks
-john





