From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Daniel Phillips <phillips@arcor.de>
Cc: Andrew Morton <akpm@osdl.org>,
sct@redhat.com, hch@infradead.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC] Distributed mmap API
Date: Thu, 4 Mar 2004 10:55:01 -0800
Message-ID: <20040304185501.GH1384@us.ibm.com>
In-Reply-To: <200403030800.35612.phillips@arcor.de>

This matches what we are after here!

Thanx, Paul

On Wed, Mar 03, 2004 at 08:06:20AM -0500, Daniel Phillips wrote:
> On Tuesday 02 March 2004 22:15, Andrew Morton wrote:
> > Daniel Phillips <phillips@arcor.de> wrote:
> > > Here is a rearranged zap_pte_range that avoids any operations for
> > > out-of-range pfns.
> >
> > Please remind us why Linux needs this patch?
>
> This is purely to support mmap, including MAP_PRIVATE, accurately on
> distributed filesystems, where "accurately" is defined as "with local
> filesystem semantics".
>
> If the same file region is mmapped by more than one node, only one of them is
> allowed to have a given page of the mmap valid in the page tables at any
> time. When a memory write occurs on one of the other nodes, it must fault so
> that the distributed filesystem can arrange for exclusive ownership of the
> file page (or as GFS currently implements it, the whole file) to change from
> one node to the other. At this time, any pages already faulted in must be
> unmapped so that future memory accesses will properly fault. This unmapping
> is done by zap_page_range, which has nearly the semantics we want except that
> it will also unmap private pages of a MAP_PRIVATE mapping, destroying the
> only copy of that data. A user would observe the privately written data
> spontaneously revert to the current file contents. The purpose of this patch
> is to fix that.
>
> This patch allows a distributed filesystem to unmap file-backed memory without
> unmapping anonymous pages or deleting swap cache, avoiding the above data
> destruction. Since zap_page_range is the only function that knows how to
> unmap memory, it needs to be taught how to skip anonymous pages.
>
> An alternative to this patch is simply to export zap_page_range, then the
> distributed filesystem can walk the lists of mmapped vmas itself, skipping
> any that are MAP_PRIVATE. This achieves POSIX local filesystem semantics,
> but not Linux local filesystem semantics, because updates to the mmap from
> other nodes become visible unpredictably. Earlier this year, Linus said that
> he wants tighter semantics for distributed MAP_PRIVATE.
>
> This patch presses zap_page_range into service in a way that was not
> originally intended, that is, for invalidation as opposed to destruction of
> memory regions. The requirements are identical except for the MAP_PRIVATE
> detail. Forking the whole zap_ chain would be even more distasteful than
> grafting on this option flag. It's also impractical to implement a zap_
> variant within a dfs module because of the heavy use of per-arch APIs. As
> far as I can see, this patch is the minimum cost of having accurate semantics
> for distributed MAP_PRIVATE mmap.
>
> I'll take the opportunity to beat my chest once again about the fact that
> this doesn't benefit anything other than distributed filesystems. On the
> other hand, the cost is minuscule: 54 bytes, a little stack and likely no
> measurable cpu.
>
> > I forget what `all' does? anon+swapcache as well as pagecache?
>
> Yes
>
> > A bit of API documentation here would be appropriate.
>
> Oops, sorry:
>
> /**
>  * zap_page_range - remove user pages in a given range
>  * @vma: vm_area_struct holding the applicable pages
>  * @address: starting address of pages to zap
>  * @size: number of bytes to zap
>  * @all: also unmap anonymous pages
>  */
> void zap_page_range(struct vm_area_struct *vma,
> 		unsigned long address, unsigned long size, int all)
>
> Regards,
>
> Daniel
>
>
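
For readers following along, the effect of the proposed @all flag can be
sketched in a small user-space model. This is only an illustration under
stated assumptions, not the patch itself: struct model_pte, enum page_kind
and model_zap_range() are hypothetical names invented here, and a flat
array stands in for the real page tables and per-arch PTE helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum page_kind { PAGE_NONE, PAGE_FILE, PAGE_ANON };

struct model_pte {
	enum page_kind kind;	/* what the entry maps; PAGE_NONE if unmapped */
};

/*
 * Clear the entries in [start, start + count), clamped to the table size.
 * With all == 0 only file-backed entries are cleared, so the private
 * (anonymous) copies of a MAP_PRIVATE mapping survive an invalidation.
 * With all != 0 every mapped entry is cleared, as when a region is
 * destroyed. Returns the number of entries cleared.
 */
static size_t model_zap_range(struct model_pte *ptes, size_t nptes,
			      size_t start, size_t count, int all)
{
	size_t end, i, zapped = 0;

	assert(ptes != NULL);

	end = (count > nptes || start > nptes - count) ? nptes : start + count;

	for (i = start; i < end; i++) {
		if (ptes[i].kind == PAGE_NONE)
			continue;	/* nothing mapped here */
		if (!all && ptes[i].kind == PAGE_ANON)
			continue;	/* keep the only copy of private data */
		ptes[i].kind = PAGE_NONE;
		zapped++;
	}
	return zapped;
}
```

In this model, a distributed filesystem giving up ownership of a file page
would call model_zap_range(..., 0), while the ordinary unmap path would pass
a nonzero value to get the existing behaviour.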