From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Daniel Phillips <phillips@arcor.de>
Cc: Andrew Morton <akpm@osdl.org>,
	sct@redhat.com, hch@infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [RFC] Distributed mmap API
Date: Thu, 4 Mar 2004 10:55:01 -0800
Message-ID: <20040304185501.GH1384@us.ibm.com>
In-Reply-To: <200403030800.35612.phillips@arcor.de>

This matches what we are after here!

Thanx, Paul

On Wed, Mar 03, 2004 at 08:06:20AM -0500, Daniel Phillips wrote:
> On Tuesday 02 March 2004 22:15, Andrew Morton wrote:
> > Daniel Phillips <phillips@arcor.de> wrote:
> > > Here is a rearranged zap_pte_range that avoids any operations for
> > > out-of-range pfns.
> >
> > Please remind us why Linux needs this patch?
>
> This is purely to support mmap, including MAP_PRIVATE, accurately on
> distributed filesystems, where "accurately" is defined as "with local
> filesystem semantics".
>
> If the same file region is mmapped by more than one node, only one of them is
> allowed to have a given page of the mmap valid in the page tables at any
> time. When a memory write occurs on one of the other nodes, it must fault so
> that the distributed filesystem can arrange for exclusive ownership of the
> file page (or as GFS currently implements it, the whole file) to change from
> one node to the other. At this time, any pages already faulted in must be
> unmapped so that future memory accesses will properly fault. This unmapping
> is done by zap_page_range, which has nearly the semantics we want except that
> it will also unmap private pages of a MAP_PRIVATE mapping, destroying the
> only copy of that data. A user would observe the privately written data
> spontaneously revert to the current file contents. The purpose of this patch
> is to fix that.
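
The single-owner rule described above can be sketched as a toy userspace model in C. This is not kernel or GFS code: `struct toy_shared_page`, `toy_write_fault()`, `toy_remote_unmap()` and the three-node cluster are hypothetical, and ownership is tracked per page rather than per whole file. It only shows the invariant that taking ownership on one node must first unmap the page on every other node, so that their next access faults.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 3  /* assumption: a three-node cluster */

/* Hypothetical toy model (not kernel code): one page of a shared file
 * mapping, plus whether each node currently has it valid in its page
 * tables. */
struct toy_shared_page {
	int owner;               /* node with exclusive ownership, -1 if none */
	bool mapped[NR_NODES];   /* true if that node's PTE is valid */
};

/* Stand-in for the dfs asking a remote node to run zap_page_range() over
 * this page, so that the node's next access faults back into the dfs. */
static void toy_remote_unmap(struct toy_shared_page *pg, int node)
{
	pg->mapped[node] = false;
}

/* Hypothetical write-fault handler on `node`: take exclusive ownership,
 * unmapping the page on every other node first, then map it locally. */
static void toy_write_fault(struct toy_shared_page *pg, int node)
{
	assert(node >= 0 && node < NR_NODES);
	if (pg->owner != node) {
		for (int n = 0; n < NR_NODES; n++)
			if (n != node && pg->mapped[n])
				toy_remote_unmap(pg, n);
		pg->owner = node;
	}
	pg->mapped[node] = true;
}

/* Counts how many nodes have the page mapped; the rule says at most one. */
static int toy_nr_mapped(const struct toy_shared_page *pg)
{
	int count = 0;
	for (int n = 0; n < NR_NODES; n++)
		count += pg->mapped[n];
	return count;
}
```

In a real cluster the ownership transfer is a lock-manager round trip rather than a loop over an array; the model keeps only the ordering (unmap elsewhere, then map locally).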
>
> This patch allows a distributed filesystem to unmap file-backed memory without
> unmapping anonymous pages or deleting swap cache, avoiding the above data
> destruction. Since zap_page_range is the only function that knows how to
> unmap memory, it needs to be taught how to skip anonymous pages.
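
The effect of the proposed `all` argument can be sketched with a toy page array in C. This is a userspace model under stated assumptions, not the patch itself: `struct toy_pte`, `toy_zap_range()` and `toy_read()` are hypothetical, swap cache is not modelled, and "zapping" only clears a present bit. With `all == 0` a privately written (anonymous) page survives, while with `all == 1` it is dropped and the next access reads the file contents again, which is the spontaneous reversion described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the PTEs covering one vma. */
struct toy_pte {
	bool present;  /* page currently valid in the page tables */
	bool anon;     /* private COW copy: the only copy of the data */
	char data;     /* contents a load would return */
};

/* Models zap_page_range(vma, address, size, all) over pages
 * [first, first + count): with all == 0, anonymous pages are skipped.
 * Returns the number of pages unmapped. */
static size_t toy_zap_range(struct toy_pte *ptes, size_t first,
			    size_t count, int all)
{
	size_t zapped = 0;

	assert(ptes != NULL);
	for (size_t i = first; i < first + count; i++) {
		if (!ptes[i].present)
			continue;
		if (!all && ptes[i].anon)
			continue;  /* keep the private data */
		ptes[i].present = false;
		ptes[i].anon = false;
		zapped++;
	}
	return zapped;
}

/* A later access: a non-present page faults and is refilled from the file. */
static char toy_read(struct toy_pte *ptes, size_t i, const char *file)
{
	if (!ptes[i].present) {
		ptes[i].present = true;
		ptes[i].anon = false;
		ptes[i].data = file[i];
	}
	return ptes[i].data;
}
```

For example, with page 1 privately written as `'X'` over a file containing `"abcd"`, zapping with `all == 0` leaves `'X'` readable, whereas `all == 1` makes the same read return `'b'`.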
>
> An alternative to this patch is simply to export zap_page_range, so that the
> distributed filesystem can walk the lists of mmapped vmas itself, skipping
> any that are MAP_PRIVATE. This achieves POSIX local filesystem semantics,
> but not Linux local filesystem semantics, because updates to the mmap from
> other nodes become visible unpredictably. Earlier this year, Linus said that
> he wants tighter semantics for distributed MAP_PRIVATE.
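
One way the unpredictability shows up can be sketched with a toy model in C. The names `struct toy_mapping`, `toy_invalidate_skip_private()` and `toy_load()` are hypothetical, and each "vma" holds a single page. Skipping MAP_PRIVATE vmas wholesale also skips pages that were never written privately, so such a page keeps returning the old contents after another node updates the file. On a local Linux filesystem, a not-yet-written MAP_PRIVATE page still maps the page cache page and would see the update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each entry is one vma mapping the same file page. */
struct toy_mapping {
	bool map_private;  /* vma was created with MAP_PRIVATE */
	bool present;      /* page currently valid in this vma's page tables */
	char data;         /* what a load from the page returns */
};

/* The exported-zap_page_range alternative: unmap the page in every shared
 * vma, but skip MAP_PRIVATE vmas entirely, even where the page was never
 * written and still holds this node's old copy of the file data. */
static void toy_invalidate_skip_private(struct toy_mapping *maps, size_t n)
{
	assert(maps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (maps[i].map_private)
			continue;
		maps[i].present = false;
	}
}

/* A load: a non-present page faults and is refilled from the current file. */
static char toy_load(struct toy_mapping *m, char file_byte)
{
	if (!m->present) {
		m->present = true;
		m->data = file_byte;
	}
	return m->data;
}
```

Here a shared and a private vma both start out showing `'a'`; after another node writes `'b'` and the invalidation runs, the shared vma sees `'b'` but the untouched private vma still sees `'a'`.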
>
> This patch presses zap_page_range into service in a way that was not
> originally intended, that is, for invalidation as opposed to destruction of
> memory regions. The requirements are identical except for the MAP_PRIVATE
> detail. Forking the whole zap_ chain would be even more distasteful than
> grafting on this option flag. It's also impractical to implement a zap_
> variant within a dfs module because of the heavy use of per-arch APIs. As
> far as I can see, this patch is the minimum cost of having accurate semantics
> for distributed MAP_PRIVATE mmap.
>
> I'll take the opportunity to beat my chest once again about the fact that
> this doesn't benefit anything other than distributed filesystems. On the
> other hand, the cost is minuscule: 54 bytes, a little stack, and likely no
> measurable CPU cost.
>
> > I forget what `all' does? anon+swapcache as well as pagecache?
>
> Yes
>
> > A bit of API documentation here would be appropriate.
>
> Oops, sorry:
>
> /**
>  * zap_page_range - remove user pages in a given range
>  * @vma: vm_area_struct holding the applicable pages
>  * @address: starting address of pages to zap
>  * @size: number of bytes to zap
>  * @all: nonzero to also unmap anonymous pages and swap cache; zero to
>  *       unmap only file-backed pages
>  */
> void zap_page_range(struct vm_area_struct *vma,
> 		    unsigned long address, unsigned long size, int all)
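
A distributed filesystem calling this API typically starts from a range of file pages, not virtual addresses, so for each vma mapping the file it has to translate that range into the `address` and `size` arguments. A minimal sketch of that arithmetic, roughly what a helper like invalidate_mmap_range has to do, assuming 4 KiB pages; `struct toy_vma` (carrying only the `vm_start`, `vm_end` and `vm_pgoff` fields) and `file_range_to_zap_args()` are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct toy_vma {
	unsigned long vm_start;  /* first virtual address (page aligned) */
	unsigned long vm_end;    /* one past the last virtual address */
	unsigned long vm_pgoff;  /* file offset of vm_start, in pages */
};

/* Given file pages [first, last] (inclusive), compute the address and size
 * to pass to zap_page_range(vma, address, size, 0) for this vma. Returns
 * false if the vma maps none of those pages. Assumes vm_end > vm_start. */
static bool file_range_to_zap_args(const struct toy_vma *vma,
				   unsigned long first, unsigned long last,
				   unsigned long *address, unsigned long *size)
{
	unsigned long nr_pages = (vma->vm_end - vma->vm_start) >> TOY_PAGE_SHIFT;
	unsigned long vma_first = vma->vm_pgoff;
	unsigned long vma_last = vma->vm_pgoff + nr_pages - 1;

	assert(first <= last);
	if (last < vma_first || first > vma_last)
		return false;  /* no overlap with this vma */
	if (first < vma_first)
		first = vma_first;  /* clamp to the part this vma maps */
	if (last > vma_last)
		last = vma_last;

	*address = vma->vm_start + ((first - vma->vm_pgoff) << TOY_PAGE_SHIFT);
	*size = (last - first + 1) << TOY_PAGE_SHIFT;
	return true;
}
```

For a vma spanning 0x10000 to 0x14000 that maps file pages 8 to 11, invalidating file pages 9 to 20 yields `address == 0x11000` and `size == 0x3000`; pages 0 to 7 do not touch this vma at all.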
>
> Regards,
>
> Daniel