Re: [RFC PATCH 0/4] Support vranges on files

From: Minchan Kim <minchan@kernel.org>
To: John Stultz <john.stultz@linaro.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Arun Sharma <asharma@fb.com>, Mel Gorman <mel@csn.ul.ie>,
	Hugh Dickins <hughd@google.com>, Dave Hansen <dave@sr71.net>,
	Rik van Riel <riel@redhat.com>, Neil Brown <neilb@suse.de>,
	Mike Hommey <mh@glandium.org>, Taras Glek <tglek@mozilla.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Jason Evans <je@fb.com>,
	sanjay@google.com, Paul Turner <pjt@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michel Lespinasse <walken@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 0/4] Support vranges on files
Date: Thu, 4 Apr 2013 15:55:09 +0900
Message-ID: <20130404065509.GE7675@blaptop>
In-Reply-To: <1365033144-15156-1-git-send-email-john.stultz@linaro.org>

Hey John,

First of all, I should confess that I have only glanced at your code,
and several questions popped up. If I am missing something, please
slap me.

On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
> This patchset is against Minchan's vrange work here:
> 	https://lkml.org/lkml/2013/3/12/105
> 
> Extending it to support volatile ranges on files. In effect
> providing the same functionality of my earlier file based
> volatile range patches on-top of Minchan's anonymous volatile
> range work.
> 
> Volatile ranges on files are different than on anonymous memory,
> because the volatility state can be shared between multiple
> applications. This makes storing the volatile ranges exclusively
> in the mm_struct (or in vmas as in Minchan's earlier work)
> inappropriate.
> 
> The patchset starts with some minor cleanup.
> 
> Then we introduce the idea of a vrange_root, which provides an
> interval-tree root and a lock to protect the tree. This structure
> can then be stored in the mm_struct or in an address_space. Then the
> same infrastructure can be used to manage volatile ranges on both
> anonymous and file backed memory.

Thanks for the above two patches. It is a nice cleanup.
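
To make the vrange_root idea concrete, here is a minimal userspace
sketch of a root object that owns a set of ranges and answers overlap
queries. It is not the code from the patch: every name ending in
_model is made up for illustration, a linked list stands in for the
interval tree, and a plain flag stands in for the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One volatile range covering [start, end], inclusive (illustrative). */
struct vrange_model {
	unsigned long start, end;
	struct vrange_model *next;	/* list stand-in for tree linkage */
};

/* Root object: the container plus a stand-in for the lock guarding it. */
struct vrange_root_model {
	struct vrange_model *head;
	bool locked;			/* toy flag, not a real lock */
};

static void vrange_root_init(struct vrange_root_model *root)
{
	root->head = NULL;
	root->locked = false;
}

/* Insert a caller-allocated range; no allocation happens here. */
static void vrange_root_add(struct vrange_root_model *root,
			    struct vrange_model *range)
{
	assert(range->start <= range->end);
	root->locked = true;		/* "take" the lock */
	range->next = root->head;
	root->head = range;
	root->locked = false;		/* "release" the lock */
}

/* Does any stored range intersect [start, end]? */
static bool vrange_root_overlaps(const struct vrange_root_model *root,
				 unsigned long start, unsigned long end)
{
	const struct vrange_model *v;

	for (v = root->head; v; v = v->next)
		if (v->start <= end && start <= v->end)
			return true;
	return false;
}
```

Because the root is self-contained, the same object could be embedded
in a per-process structure or in a per-file one without the helpers
caring which.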

> 
> Next we introduce a parallel fvrange() syscall for creating
> volatile ranges directly against files.

Okay. It seems you want to replace the ashmem interface with fvrange.
I doubt we need to eat a system call slot for that. Can't we add an
"int fd" argument to the vrange system call instead of reinventing
the wheel?

> 
> And finally, we change the range purging logic to be able to
> handle both anonymous and file volatile ranges.

Okay. Then what are the semantics of a file vrange?

There is a file F. Process A has mapped some part of the file into its
address space. Then process B calls fvrange on the same part.
As far as I can tell from your code, it purges the range even though
process A is still using it. Right? Is that your intention? Probably
not.

Let's define fvrange's semantics to be the same as anon-vrange's:
if at least one process is using the range as non-volatile,
we shouldn't purge it at all.

So your [4/4] should atomically check all processes that map the
page. You could do that with i_mmap_mutex and vrange_lock,
and fold the logic into try_to_discard_vpage.
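
As a rough illustration of that rule, the sketch below decides whether
a page may be discarded by looking at every process that maps it. It
is a userspace toy under stated assumptions, not the kernel code:
struct mapper_model and page_may_be_discarded() are hypothetical
names, and the locking is only indicated in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One process's view of the page (illustrative model only). */
struct mapper_model {
	bool maps_page;		/* the process has the page mapped */
	bool range_volatile;	/* it marked the covering range volatile */
};

/*
 * A page may be discarded only if no process that maps it still
 * treats the range as non-volatile. In the kernel this walk would
 * have to run with i_mmap_mutex and vrange_lock held so that the
 * answer cannot change underneath us; the toy version has no locks.
 */
static bool page_may_be_discarded(const struct mapper_model *mappers,
				  size_t nr)
{
	size_t i;

	assert(mappers != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (mappers[i].maps_page && !mappers[i].range_volatile)
			return false;
	return true;
}
```

In the A/B example above, A maps the page without marking it volatile,
so the check fails and the page stays, no matter what B has marked.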

> 
> Now there are some quirks still to be resolved with the approach
> used here. The biggest one being the vrange() call can't be used to
> create volatile ranges against mmapped files. Instead only the

Why?

> fvrange() can be used to create file backed volatile ranges.

I couldn't understand your point. It would be better for me to
explain my thinking first; then you can point out what I am missing.
See below.

> 
> This could be overcome by iterating across all the process VMAs to
> determine if they're anonymous or file based, and if file-based,
> create a VMA sized volatile range on the mapping pointed to by the
> VMA.

That is needed only when we start to discard pages. In other words,
it belongs to the reclaim path, NOT the system call path, so it's not
a problem.

> 
> But this would have downsides, as Minchan has been clear that he wants
> to optimize the vrange() calls so that it is very cheap to create and
> destroy volatile ranges. Having simple per-process ranges be created
> means we don't have to iterate across the vmas in the range to
> determine if they're anonymous or file backed. Instead the current
> vrange() code just creates per process ranges (which may or may not
> cover mmapped file data), but will only purge anonymous pages in
> that range. This keeps the vrange() call cheap.

Right.

> 
> Additionally, just creating or destroying a single range is very
> simple to do, and requires a fixed amount of memory known up front.
> Thus we can allocate needed data prior to making any modifications.
> 
> But if we were to create a range that crosses anonymous and file
> backed pages, it must create or destroy multiple per-process or
> per-file ranges. This could require an unknown number of allocations,

This is the part where I fail to follow your point.

> opening the possibility of getting an ENOMEM half-way through the
> operation, leaving the volatile range partially created or destroyed.
> 
> So to keep this simple for this first pass, for now we have two
> syscalls for two types of volatile ranges.


My idea is as follows:

        vrange(fd, start, len, mode, behavior)

A) fd = 0

1) system call context - the vrange system call registers a new vrange
   in the mm_struct.
2) Add the new vrange to the LRU.
3) reclaim context - walk the rmap to confirm that all processes have
   marked the range volatile -> discard

B) fd = 1

1) system call context - the vrange system call registers a new vrange
   in the address_space.
2) Add the new vrange to the LRU.
3) reclaim context - walk the rmap to confirm that all processes have
   marked the range volatile -> discard

What's the problem with this logic?
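
Steps 1) and 2) of both cases can be sketched in userspace as below.
This is a toy under stated assumptions, not kernel code. It reads
case A as "no file given" and case B as "a file given"; how that is
encoded in fd is left open above, so VRANGE_NO_FD is a made-up
sentinel, and all the *_model names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VRANGE_NO_FD	(-1)	/* hypothetical "anonymous memory" marker */

struct vrange_root_model { int nr_ranges; };

/* Stand-ins for mm_struct and address_space, each holding a root. */
struct mm_model { struct vrange_root_model vroot; };
struct mapping_model { struct vrange_root_model vroot; };

static int lru_len;	/* stand-in for the vrange LRU: counts entries */

/* Case A registers in the process, case B in the file's mapping. */
static struct vrange_root_model *
vrange_pick_root(int fd, struct mm_model *mm, struct mapping_model *mapping)
{
	if (fd == VRANGE_NO_FD)
		return &mm->vroot;
	assert(mapping != NULL);
	return &mapping->vroot;
}

/* System call context: the same two steps for either kind of range. */
static int vrange_call_model(int fd, struct mm_model *mm,
			     struct mapping_model *mapping)
{
	struct vrange_root_model *root = vrange_pick_root(fd, mm, mapping);

	root->nr_ranges++;	/* step 1: register the new vrange */
	lru_len++;		/* step 2: add it to the LRU */
	return 0;
}
```

The point of the sketch is that the system call path does a fixed
amount of work either way; only the choice of root differs, and the
per-process checking is deferred to step 3) in reclaim.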

> 
> Let me know if you have any thoughts or comments. I'm sure there's
> plenty of room for improvement here.
> 
> In the meantime I'll be playing with some different approaches to
> try to handle single volatile ranges that cross file and anonymous
> vmas.
> 
> The entire queue, both Minchan's changes and mine can be found here:
> git://git.linaro.org/people/jstultz/android-dev.git dev/vrange-minchan
> 
> thanks
> -john
> 
-- 
Kind regards,
Minchan Kim

