Re: [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers

From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: John Stultz <john.stultz@linaro.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Android Kernel Team <kernel-team@android.com>,
	Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
	Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Dave Chinner <david@fromorbit.com>, Neil Brown <neilb@suse.de>,
	Andrea Righi <andrea@betterlinux.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Taras Glek <tgek@mozilla.com>, Mike Hommey <mh@glandium.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers
Date: Fri, 08 Jun 2012 02:39:51 -0400
Message-ID: <4FD19E37.3020309@gmail.com>
In-Reply-To: <4FCFEE36.3010902@linaro.org>

(6/6/12 7:56 PM), John Stultz wrote:
> On 06/06/2012 12:52 PM, KOSAKI Motohiro wrote:
>>> The key point is we want volatile ranges to be purged in the order they
>>> were marked volatile.
>>> If we use the page lru via shmem_writepage to trigger range purging, we
>>> wouldn't necessarily get this desired behavior.
>> Ok, so can you please explain your ideal reclaim order? Your last mail
>> described old and new volatile regions, but I'm not sure about the order
>> among regular tmpfs pages, volatile pages, and regular file cache. That
>> said, when using shrink_slab(), we drop objects in an effectively random
>> order relative to the page cache. I'm not sure why you are sure it is ideal.
> 
> So I'm not totally sure it's ideal, but I can tell you what makes sense to
> me. If there is a more ideal order, I'm open to suggestions.
> 
> So volatile ranges should be purged first-in-first-out. So the first
> range marked volatile should be purged first. Since volatile ranges
> might have different costs depending on what filesystem the file is
> backed by, this LRU order is per-filesystem.
> 
> It seems that if we have tmpfs volatile ranges, we should purge them
> before we swap out any regular tmpfs pages. That's why I'm purging any
> available ranges on shmem_writepage before swapping, rather than using a
> shrinker now (I'm hoping you saw the updated patchset I sent out Friday).
> 
> Does that make sense?
> 
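To check my understanding of the ordering described above, here is a toy
Python model (not kernel code). The class and method names are mine, ranges
are matched exactly rather than coalesced through an interval tree, and
"pages" are just labels. It only shows the two rules: ranges are purged
oldest-marked first, and reclaim purges a volatile range before it swaps any
regular tmpfs page.

```python
from collections import deque


class TmpfsModel:
    """Toy model of one filesystem: a FIFO of volatile ranges plus regular pages."""

    def __init__(self, regular_pages):
        self.volatile = deque()              # oldest-marked range at the left
        self.regular = deque(regular_pages)  # stand-in for the page LRU
        self.log = []                        # what reclaim did, in order

    def mark_volatile(self, inode, start, end):
        # Newly marked ranges go to the tail, so they are purged last.
        self.volatile.append((inode, start, end))

    def unmark_volatile(self, inode, start, end):
        # Returns True if the range is no longer tracked, i.e. it was purged.
        # The model assumes the caller marked this exact range earlier.
        try:
            self.volatile.remove((inode, start, end))
            return False
        except ValueError:
            return True

    def reclaim_one(self):
        # Purge the oldest volatile range if any; only then fall back to swap.
        if self.volatile:
            self.log.append(("purge", self.volatile.popleft()))
        elif self.regular:
            self.log.append(("swap", self.regular.popleft()))


fs = TmpfsModel(regular_pages=["page-A", "page-B"])
fs.mark_volatile(1, 0, 4096)      # marked first
fs.mark_volatile(2, 0, 8192)      # marked second
for _ in range(3):
    fs.reclaim_one()
print(fs.log)
```

With two volatile ranges and three reclaim steps, both ranges are purged in
marking order before the first regular page is touched.
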
>> And now I guess you assume nobody touches a volatile page, yes? Because
>> otherwise volatile marking order is a silly choice. If yes, what happens if
>> anyone touches a page which has been marked volatile? No-op? SIGBUS?
> 
> So more of a noop. If you read a page that has been marked volatile, it
> may return the data that was there, or it may return an empty nulled page.
> 
> I guess we could throw a signal to help avoid developers making
> programming mistakes, but I'm not sure what the extra cost would be to
> set up and tear that down each time. One important aspect of this is
> that in order to make it attractive for an application to mark ranges as
> volatile, it has to be very cheap to mark and unmark ranges.

OK, I agree we don't need to pay any extra cost.
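
The read semantics described above can be written down as a small Python
model. This is only my sketch: `VolatileFile`, `purge()` and the integer
"pages" are invented for illustration, and the purge is triggered by hand
instead of by memory pressure. Reading a volatile page never faults; it
returns the old data until the range is purged and zeros afterwards, and
unmarking reports whether a purge happened.

```python
class VolatileFile:
    """Toy model: a list of page contents plus a set of volatile page indexes."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.volatile = set()
        self.purged = set()

    def mark_volatile(self, first, last):
        # Cheap: only records the range, the data is left alone.
        self.volatile.update(range(first, last + 1))

    def purge(self):
        # Stand-in for the kernel reclaiming under memory pressure.
        for i in self.volatile:
            self.pages[i] = 0
        self.purged |= self.volatile
        self.volatile.clear()

    def unmark_volatile(self, first, last):
        # Returns True if any page in the range lost its data.
        wanted = set(range(first, last + 1))
        self.volatile -= wanted
        lost = bool(self.purged & wanted)
        self.purged -= wanted
        return lost

    def read(self, index):
        # No signal, no error: old data or a zeroed page.
        return self.pages[index]


f = VolatileFile([11, 22, 33])
f.mark_volatile(1, 2)
before = f.read(1)                      # still the old data
f.purge()
after = f.read(1)                       # zeroed by the purge
was_purged = f.unmark_volatile(1, 2)    # caller learns the data is gone
```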

>> Which workload didn't work? Usually, anon page reclaim only happens
>> with 1) a tmpfs streaming I/O workload or 2) heavy VM pressure.
>> So this scenario is not so inaccurate to me.
> 
> So it was more of a theoretical issue in my discussions, but once it was
> brought up, ashmem's global range LRU made more sense.

No. Every global LRU is evil. Please don't introduce NUMA-unaware code for
a new feature. That's a legacy design and it performs poorly.


> I think the workload we're mostly concerned with here is heavy vm pressure.

I don't agree. But note that under heavy workload, shrink_slab() behaves
seriously badly.



>>> That's when I added the LRU tracking at the volatile range level (which
>>> reverted back to the behavior ashmem has always used), and have been
>>> using that model since.
>>>
>>> Hopefully this clarifies things. My apologies if I don't always use the
>>> correct terminology, as I'm still a newbie when it comes to VM code.
>> I think your code is clean enough. But I'm still not sure about your
>> background design. Please help me to understand it clearly.
> Hopefully the above helps. But let me know where you'd like more
> clarification.
> 
> 
>> Btw, why did you choose fallocate instead of fadvise? As far as I skimmed,
>> fallocate() is an operation on disk layout, not on the cache. And why
>> did you choose fadvise() instead of madvise() in the initial version? A vma
>> hint might be more useful than fadvise() because it can be used for
>> anonymous pages too.
> I actually started with madvise, but quickly moved to fadvise when
> feeling that the fd based ranges made more sense. With ashmem, fds are
> often shared, and coordinating volatile ranges on a shared fd made more
> sense on a (fd, offset, len) tuple, rather than on an offset and length
> on an mmapped region.
> 
> I moved to fallocate at Dave Chinner's request. In short, it allows
> non-tmpfs filesystems to implement volatile range semantics allowing
> them to zap rather than write out dirty volatile pages. And since the
> volatile ranges are very similar to a delayed/cancel-able hole-punch, it
> made sense to use a similar interface to FALLOC_FL_PUNCH_HOLE.
> 
> You can read the details of DaveC's suggestion here:
> https://lkml.org/lkml/2012/4/30/441
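
The point above about shared fds can be shown with a little arithmetic. In
this Python sketch the mapping addresses are made up and `to_file_range()` is
a hypothetical helper, not an existing API. Two processes map the same file
at different addresses, so an address-based range differs per process, while
the (offset, len) pair on the fd is identical.

```python
def to_file_range(vaddr, length, map_base, map_file_offset=0):
    """Translate an address range inside a mapping to (file offset, length)."""
    if vaddr < map_base:
        raise ValueError("address is below the mapping")
    return (map_file_offset + (vaddr - map_base), length)


PAGE = 4096

# Hypothetical mapping bases of the same shared file in two processes.
base_a = 0x7F0000000000
base_b = 0x7F1234560000

# Each process wants to mark the third and fourth page of the file.
range_a = to_file_range(base_a + 2 * PAGE, 2 * PAGE, base_a)
range_b = to_file_range(base_b + 2 * PAGE, 2 * PAGE, base_b)
```

The virtual addresses passed in differ, but both calls come out as the same
file range, which is the value the processes would have to agree on anyway.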

Hmmm...

I'm sorry, I can't imagine how to integrate FALLOCATE_VOLATILE into regular
file systems. Do you have any idea?
