From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: John Stultz <john.stultz@linaro.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Android Kernel Team <kernel-team@android.com>,
Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
Hugh Dickins <hughd@google.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Dave Chinner <david@fromorbit.com>, Neil Brown <neilb@suse.de>,
Andrea Righi <andrea@betterlinux.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Taras Glek <tgek@mozilla.com>, Mike Hommey <mh@glandium.org>,
Jan Kara <jack@suse.cz>
Subject: Re: [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers
Date: Fri, 08 Jun 2012 02:39:51 -0400 [thread overview]
Message-ID: <4FD19E37.3020309@gmail.com> (raw)
In-Reply-To: <4FCFEE36.3010902@linaro.org>
(6/6/12 7:56 PM), John Stultz wrote:
> On 06/06/2012 12:52 PM, KOSAKI Motohiro wrote:
>>> The key point is we want volatile ranges to be purged in the order they
>>> were marked volatile.
>>> If we use the page lru via shmem_writeout to trigger range purging, we
>>> wouldn't necessarily get this desired behavior.
>> Ok, so can you please explain your ideal order to reclaim. your last mail
>> described old and new volatiled region. but I'm not sure regular tmpfs pages
>> vs volatile pages vs regular file cache order. That said, when using shrink_slab(),
>> we choose random order to drop against page cache. I'm not sure why you sure
>> it is ideal.
>
> So I'm not totally sure its ideal, but I can tell you what make sense to
> me. If there is a more ideal order, I'm open to suggestions.
>
> So volatile ranges should be purged first-in-first-out. So the first
> range marked volatile should be purged first. Since volatile ranges
> might have different costs depending on what filesystem the file is
> backed by, this LRU order is per-filesystem.
>
> It seems that if we have tmpfs volatile ranges, we should purge them
> before we swap out any regular tmpfs pages. Thus why I'm purging any
> available ranges on shmem_writepage before swapping, rather then using a
> shrinker now (I'm hoping you saw the updated patchset I sent out friday).
>
> Does that make sense?
>
>> And, now I guess you think nobody touch volatiled page, yes? because otherwise
>> volatile marking order is silly choice. If yes, what's happen if anyone touch
>> a patch which volatiled. no-op? SIGBUS?
>
> So more of a noop. If you read a page that has been marked volatile, it
> may return the data that was there, or it may return an empty nulled page.
>
> I guess we could throw a signal to help avoid developers making
> programming mistakes, but I'm not sure what the extra cost would be to
> set up and tare that down each time. One important aspect of this is
> that in order to make it attractive for an application to mark ranges as
> volatile, it has to be very cheap to mark and unmark ranges.
ok, i agree we don't need to pay any extra cost.
>> Which worklord didn't work. Usually, anon pages reclaim are only
>> happen when 1) tmpfs streaming io workload or 2) heavy vm pressure.
>> So, this scenario are not so inaccurate to me.
>
> So it was more of a theoretical issue in my discussions, but once it was
> brought up, ashmems' global range lru made more sense.
No. Every global lru is evil. Please don't introduce numa unaware code for
a new feature. That's a legacy and poor performance.
> I think the workload we're mostly concerned with here is heavy vm pressure.
I don't admit it. but note, when under heavy workload, shrink_slab() behave
stupid seriously.
>>> That's when I added the LRU tracking at the volatile range level (which
>>> reverted back to the behavior ashmem has always used), and have been
>>> using that model sense.
>>>
>>> Hopefully this clarifies things. My apologies if I don't always use the
>>> correct terminology, as I'm still a newbie when it comes to VM code.
>> I think your code is enough clean. But I'm still not sure your background
>> design. Please help me to understand clearly.
> Hopefully the above helps. But let me know where you'd like more
> clarification.
>
>
>> btw, Why do you choice fallocate instead of fadvise? As far as I skimmed,
>> fallocate() is an operation of a disk layout, not of a cache. And, why
>> did you choice fadvise() instead of madvise() at initial version. vma
>> hint might be useful than fadvise() because it can be used for anonymous
>> pages too.
> I actually started with madvise, but quickly moved to fadvise when
> feeling that the fd based ranges made more sense. With ashmem, fds are
> often shared, and coordinating volatile ranges on a shared fd made more
> sense on a (fd, offset, len) tuple, rather then on an offset and length
> on an mmapped region.
>
> I moved to fallocate at Dave Chinner's request. In short, it allows
> non-tmpfs filesystems to implement volatile range semantics allowing
> them to zap rather then writeout dirty volatile pages. And since the
> volatile ranges are very similar to a delayed/cancel-able hole-punch, it
> made sense to use a similar interface to FALLOC_FL_HOLE_PUNCH.
>
> You can read the details of DaveC's suggestion here:
> https://lkml.org/lkml/2012/4/30/441
Hmmm...
I'm sorry. I can't imagine how to integrate FALLOCATE_VOLATILE into regular
file systems. do you have any idea?
next prev parent reply other threads:[~2012-06-08 6:39 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-01 18:29 [PATCH 0/3] [RFC] Fallocate Volatile Ranges v2 John Stultz
2012-06-01 18:29 ` [PATCH 1/3] [RFC] Interval tree implementation John Stultz
2012-06-01 18:29 ` [PATCH 2/3] [RFC] Add volatile range management code John Stultz
2012-06-01 18:29 ` [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers John Stultz
2012-06-01 20:17 ` KOSAKI Motohiro
2012-06-01 21:03 ` John Stultz
2012-06-01 21:37 ` KOSAKI Motohiro
2012-06-01 21:44 ` John Stultz
2012-06-01 22:34 ` KOSAKI Motohiro
2012-06-01 23:25 ` John Stultz
2012-06-06 19:52 ` KOSAKI Motohiro
2012-06-06 23:56 ` John Stultz
2012-06-07 10:55 ` Dmitry Adamushko
2012-06-07 23:41 ` Dave Hansen
2012-06-08 3:03 ` John Stultz
2012-06-08 4:50 ` KOSAKI Motohiro
2012-06-09 3:45 ` John Stultz
2012-06-10 6:35 ` Dmitry Adamushko
2012-06-10 21:47 ` Rik van Riel
2012-06-11 18:35 ` John Stultz
2012-06-12 1:21 ` John Stultz
2012-06-12 7:16 ` Minchan Kim
2012-06-12 16:03 ` KOSAKI Motohiro
2012-06-12 19:35 ` John Stultz
2012-06-13 0:10 ` Minchan Kim
2012-06-13 1:21 ` John Stultz
2012-06-13 4:42 ` Minchan Kim
2012-06-08 6:39 ` KOSAKI Motohiro [this message]
-- strict thread matches above, loose matches on Subject: below --
2012-06-01 23:38 [PATCH 0/3] [RFC] Fallocate Volatile Ranges v3 John Stultz
2012-06-01 23:38 ` [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers John Stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FD19E37.3020309@gmail.com \
--to=kosaki.motohiro@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andrea@betterlinux.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=dave@linux.vnet.ibm.com \
--cc=david@fromorbit.com \
--cc=dmitry.adamushko@gmail.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=john.stultz@linaro.org \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=mh@glandium.org \
--cc=neilb@suse.de \
--cc=riel@redhat.com \
--cc=rlove@google.com \
--cc=tgek@mozilla.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox