From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: John Stultz <john.stultz@linaro.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Android Kernel Team <kernel-team@android.com>,
Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
Hugh Dickins <hughd@google.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Dave Chinner <david@fromorbit.com>, Neil Brown <neilb@suse.de>,
Andrea Righi <andrea@betterlinux.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Taras Glek <tgek@mozilla.com>, Mike Hommey <mh@glandium.org>,
Jan Kara <jack@suse.cz>
Subject: Re: [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers
Date: Fri, 08 Jun 2012 02:39:51 -0400 [thread overview]
Message-ID: <4FD19E37.3020309@gmail.com> (raw)
In-Reply-To: <4FCFEE36.3010902@linaro.org>
(6/6/12 7:56 PM), John Stultz wrote:
> On 06/06/2012 12:52 PM, KOSAKI Motohiro wrote:
>>> The key point is we want volatile ranges to be purged in the order they
>>> were marked volatile.
>>> If we use the page lru via shmem_writeout to trigger range purging, we
>>> wouldn't necessarily get this desired behavior.
>> Ok, so can you please explain your ideal order to reclaim. your last mail
>> described old and new volatiled region. but I'm not sure regular tmpfs pages
>> vs volatile pages vs regular file cache order. That said, when using shrink_slab(),
>> we choose random order to drop against page cache. I'm not sure why you sure
>> it is ideal.
>
> So I'm not totally sure its ideal, but I can tell you what make sense to
> me. If there is a more ideal order, I'm open to suggestions.
>
> So volatile ranges should be purged first-in-first-out. So the first
> range marked volatile should be purged first. Since volatile ranges
> might have different costs depending on what filesystem the file is
> backed by, this LRU order is per-filesystem.
>
> It seems that if we have tmpfs volatile ranges, we should purge them
> before we swap out any regular tmpfs pages. Thus why I'm purging any
> available ranges on shmem_writepage before swapping, rather then using a
> shrinker now (I'm hoping you saw the updated patchset I sent out friday).
>
> Does that make sense?
>
>> And, now I guess you think nobody touch volatiled page, yes? because otherwise
>> volatile marking order is silly choice. If yes, what's happen if anyone touch
>> a patch which volatiled. no-op? SIGBUS?
>
> So more of a noop. If you read a page that has been marked volatile, it
> may return the data that was there, or it may return an empty nulled page.
>
> I guess we could throw a signal to help avoid developers making
> programming mistakes, but I'm not sure what the extra cost would be to
> set up and tare that down each time. One important aspect of this is
> that in order to make it attractive for an application to mark ranges as
> volatile, it has to be very cheap to mark and unmark ranges.
ok, i agree we don't need to pay any extra cost.
>> Which worklord didn't work. Usually, anon pages reclaim are only
>> happen when 1) tmpfs streaming io workload or 2) heavy vm pressure.
>> So, this scenario are not so inaccurate to me.
>
> So it was more of a theoretical issue in my discussions, but once it was
> brought up, ashmems' global range lru made more sense.
No. Every global lru is evil. Please don't introduce numa unaware code for
a new feature. That's a legacy and poor performance.
> I think the workload we're mostly concerned with here is heavy vm pressure.
I don't admit it. but note, when under heavy workload, shrink_slab() behave
stupid seriously.
>>> That's when I added the LRU tracking at the volatile range level (which
>>> reverted back to the behavior ashmem has always used), and have been
>>> using that model sense.
>>>
>>> Hopefully this clarifies things. My apologies if I don't always use the
>>> correct terminology, as I'm still a newbie when it comes to VM code.
>> I think your code is enough clean. But I'm still not sure your background
>> design. Please help me to understand clearly.
> Hopefully the above helps. But let me know where you'd like more
> clarification.
>
>
>> btw, Why do you choice fallocate instead of fadvise? As far as I skimmed,
>> fallocate() is an operation of a disk layout, not of a cache. And, why
>> did you choice fadvise() instead of madvise() at initial version. vma
>> hint might be useful than fadvise() because it can be used for anonymous
>> pages too.
> I actually started with madvise, but quickly moved to fadvise when
> feeling that the fd based ranges made more sense. With ashmem, fds are
> often shared, and coordinating volatile ranges on a shared fd made more
> sense on a (fd, offset, len) tuple, rather then on an offset and length
> on an mmapped region.
>
> I moved to fallocate at Dave Chinner's request. In short, it allows
> non-tmpfs filesystems to implement volatile range semantics allowing
> them to zap rather then writeout dirty volatile pages. And since the
> volatile ranges are very similar to a delayed/cancel-able hole-punch, it
> made sense to use a similar interface to FALLOC_FL_HOLE_PUNCH.
>
> You can read the details of DaveC's suggestion here:
> https://lkml.org/lkml/2012/4/30/441
Hmmm...
I'm sorry. I can't imagine how to integrate FALLOCATE_VOLATILE into regular
file systems. do you have any idea?
next prev parent reply other threads:[~2012-06-08 6:39 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-01 18:29 [PATCH 0/3] [RFC] Fallocate Volatile Ranges v2 John Stultz
2012-06-01 18:29 ` [PATCH 1/3] [RFC] Interval tree implementation John Stultz
2012-06-01 18:29 ` [PATCH 2/3] [RFC] Add volatile range management code John Stultz
2012-06-01 18:29 ` [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers John Stultz
2012-06-01 20:17 ` KOSAKI Motohiro
2012-06-01 21:03 ` John Stultz
2012-06-01 21:37 ` KOSAKI Motohiro
2012-06-01 21:44 ` John Stultz
2012-06-01 22:34 ` KOSAKI Motohiro
2012-06-01 23:25 ` John Stultz
2012-06-06 19:52 ` KOSAKI Motohiro
2012-06-06 23:56 ` John Stultz
2012-06-07 10:55 ` Dmitry Adamushko
2012-06-07 23:41 ` Dave Hansen
2012-06-08 3:03 ` John Stultz
2012-06-08 4:50 ` KOSAKI Motohiro
2012-06-09 3:45 ` John Stultz
2012-06-10 6:35 ` Dmitry Adamushko
2012-06-10 21:47 ` Rik van Riel
2012-06-11 18:35 ` John Stultz
2012-06-12 1:21 ` John Stultz
2012-06-12 7:16 ` Minchan Kim
2012-06-12 7:16 ` Minchan Kim
2012-06-12 16:03 ` KOSAKI Motohiro
2012-06-12 16:03 ` KOSAKI Motohiro
2012-06-12 19:35 ` John Stultz
2012-06-12 19:35 ` John Stultz
2012-06-13 0:10 ` Minchan Kim
2012-06-13 0:10 ` Minchan Kim
2012-06-13 1:21 ` John Stultz
2012-06-13 1:21 ` John Stultz
2012-06-13 4:42 ` Minchan Kim
2012-06-13 4:42 ` Minchan Kim
2012-06-08 6:39 ` KOSAKI Motohiro [this message]
-- strict thread matches above, loose matches on Subject: below --
2012-06-01 23:38 [PATCH 0/3] [RFC] Fallocate Volatile Ranges v3 John Stultz
2012-06-01 23:38 ` [PATCH 3/3] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers John Stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FD19E37.3020309@gmail.com \
--to=kosaki.motohiro@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andrea@betterlinux.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=dave@linux.vnet.ibm.com \
--cc=david@fromorbit.com \
--cc=dmitry.adamushko@gmail.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=john.stultz@linaro.org \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=mh@glandium.org \
--cc=neilb@suse.de \
--cc=riel@redhat.com \
--cc=rlove@google.com \
--cc=tgek@mozilla.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.