From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Daniel Phillips <phillips@arcor.de>
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
	Andrew Morton <akpm@osdl.org>,
	Christoph Hellwig <hch@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: Non-GPL export of invalidate_mmap_range
Date: Thu, 19 Feb 2004 11:47:51 -0800
Message-ID: <20040219194751.GN1269@us.ibm.com>
In-Reply-To: <200402192106.02086.phillips@arcor.de>

On Thu, Feb 19, 2004 at 09:06:55PM -0500, Daniel Phillips wrote:
> On Thursday 19 February 2004 11:42, Paul E. McKenney wrote:
> > GPFS supports MAP_PRIVATE, but does not specify the behavior if you
> > change the underlying file.  There are a number of things one can do,
> > but one must keep in mind that different processes can MAP_PRIVATE the
> > same file at different times, and that some processes might MAP_SHARED it
> > at the same time that others MAP_PRIVATE it.  Here are the alternatives
> > I can imagine:
> >
> > 1.	Any time a file changes, create a copy of the old version
> > 	for any MAP_PRIVATE vmas.  This would essentially create
> > 	a point-in-time copy of any file that a process mapped
> > 	MAP_PRIVATE.  This is arguably the most intuitive from the
> > 	user's standpoint, but (a) it would not be a small change and
> > 	(b) I haven't heard of anyone coming up with a good use for it.
> > 	Please enlighten me if I am missing a simple implementation or
> > 	compelling uses.
> 
> This is MAP_COPY I think.  Even if somebody did manage to sneak it by Linus 
> one day it would certainly not be under the guise of MAP_PRIVATE.

Whew!  That is a relief!!!  ;-)

> > 2.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> > 	as suggested by Daniel.
> 
> I did not suggest that, rather I described the existing practice in OpenGFS 
> and Sistina GFS, which at least does not destroy anonymous data.  The correct 
> behaviour is the one you describe in option 3, and we are perfectly willing 
> to change GFS to obtain that behaviour.  To be precise: I suggest we change 
> invalidate_mmap_range to skip anon pages, and change vmtruncate to use 
> something else, having the current semantics.
> 
> As a historical note: the behavior GFS obtains from option 2 is 
> Posix-compliant, but falls short of Linus-compliance, who insists on 
> completely accurate invalidation behavior as is right and proper.

OK, this is the OpenGFS zap_inode_mapping(), right?

> > 	This would mean that a
> > 	process that had mapped a file MAP_PRIVATE and faulted
> > 	in parts of it would see different versions of the file
> > 	in different pages.  This should be straightforward to
> > 	implement, but in what situation is this skewed view of
> > 	the file useful?
> 
> You've got me there ;)  However, Posix explicitly blesses this sloppy 
> behaviour.  I suppose that with additional user space locking, applications 
> could make it work reliably.  But it's still sloppy, and worse, it's 
> different from Linux's local filesystem behaviour.

;-)

> > 3.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> > 	but invalidate those pages in the vma that have not yet been
> > 	modified (that are not anonymous) as suggested by Stephen.
> > 	This would mean that a process that had mapped a file MAP_PRIVATE
> > 	and written on parts of it would see different versions of the
> > 	file in different pages.
> 
> This is the correct behaviour and is the current behaviour for local 
> filesystems.  In particular, all processes on all nodes will see the current 
> contents of any file page that they have not yet faulted in, as of the last 
> time any process wrote that file page via mmap or otherwise.
> 
> Our goal for GFS, and the goal I'd like to hold up as definitive for any 
> distributed filesystem, is to imitate local filesystem semantics exactly, 
> even across the cluster.

OK, I surrender.  I got some private email agreeing with this
viewpoint.  Any dissenters, speak soon, or...
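
The local-filesystem behaviour that option 3 imitates can be observed on
a single node from userspace.  What follows is a minimal sketch and not
code from this thread: map_private_demo() and struct demo_result are
hypothetical names, and the temporary file under /tmp is an assumption.
It maps a two-page file MAP_PRIVATE, writes to the first page through the
mapping, only reads the second page, then overwrites the whole file
through the descriptor and reports what each view contains.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result record, for illustration only. */
struct demo_result {
	char cowed_byte;	/* page 0: written through the private mapping */
	char untouched_byte;	/* page 1: only read through the mapping */
	char file_byte;		/* byte 0 of the file itself, via pread() */
};

/* Returns 0 on success, -1 on any failure. */
int map_private_demo(struct demo_result *out)
{
	char path[] = "/tmp/map_private_demo_XXXXXX";	/* assumed location */
	long page = sysconf(_SC_PAGESIZE);
	volatile char *map;
	char *buf = NULL;
	size_t len;
	int fd;

	if (page <= 0)
		return -1;
	len = 2 * (size_t)page;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives on through fd */

	buf = malloc(len);
	if (!buf)
		goto fail;

	/* Initial file contents: two pages of 'A'. */
	memset(buf, 'A', len);
	if (pwrite(fd, buf, len, 0) != (ssize_t)len)
		goto fail;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto fail;

	map[0] = 'P';		/* page 0 becomes a private (CoW) copy */
	(void)map[page];	/* page 1 is faulted in but never written */

	/* Someone changes the underlying file: two pages of 'B'. */
	memset(buf, 'B', len);
	if (pwrite(fd, buf, len, 0) != (ssize_t)len ||
	    pread(fd, &out->file_byte, 1, 0) != 1) {
		munmap((void *)map, len);
		goto fail;
	}

	out->cowed_byte = map[0];
	out->untouched_byte = map[page];

	munmap((void *)map, len);
	free(buf);
	close(fd);
	return 0;

fail:
	free(buf);
	close(fd);
	return -1;
}
```

On Linux the written page keeps its private 'P', and the file itself
never contains it.  POSIX leaves unspecified what the page that was only
read shows after the file changes; on a local Linux filesystem it
typically shows the new 'B' contents, which is the mixed view of the
file being debated here.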

> > Again, in what situation is this skewed view of the file useful?
> 
> It's not skewed in any way that I can see.  Though I am no linker expert, I 
> dimly recall that these are precisely the semantics ld relies on.

I thought that the linker relied on people refraining (or being
prevented) from updating executables while they are in use.
But I am also no linker expert.

> > 5.	The current behavior, where the process's writes do not
> > 	flow through to the file, but all changes to the file are
> > 	visible to the writing process.
> 
> We all agree that's broken, I hope.

I can buy DFSes implementing semantics that are the same as local
filesystems.  But no one has yet shown me anything that it breaks!

> > 6.	Requiring that MAP_PRIVATE be applied only to unchanging
> > 	files, so that (for example) any change to the underlying
> > 	file removes that file from any MAP_PRIVATE address spaces.
> > 	Subsequent accesses would get a SEGV, rather than a
> > 	surprise from silently changing data.
> 
> Creative :)  Well, data that changes "silently" is a fact of life whenever 
> data is shared.  It's up to applications to ensure that shared data changes 
> predictably.

Glad you liked it.  ;-)

I think that predictability when using MAP_PRIVATE requires that one
refrain from modifying the underlying file while someone has it mmap()ed
with MAP_PRIVATE.  I would welcome an example proving me wrong.

> > So, please help me out here...  What do applications that MAP_PRIVATE
> > changing files really expect to happen?
> 
> Number 3, is that ok with you?  Incidentally, your list doesn't include the 
> semantics we'd get by just exporting and using invalidate_mmap_range.  I 
> presume that is because you agree it's not correct (it will clobber CoWed 
> anonymous pages).

I will give it a shot, though I would still like to hear about examples
where the difference in semantics affects a real application.
BTW, my list didn't include exporting and using the current
invalidate_mmap_range() because I didn't say what I meant to say.
Hate it when that happens!  ;-)
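
To make the difference between these invalidation choices concrete, here
is a toy userspace model.  It is only a sketch under stated assumptions:
struct toy_vma, enum page_state, and toy_invalidate() are hypothetical
names and are not kernel structures or kernel code.  It compares zapping
everything (the current invalidate_mmap_range() as described in this
thread), skipping MAP_PRIVATE vmas entirely (option 2), and skipping only
CoWed anonymous pages (option 3).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy per-page state; hypothetical, not a kernel type. */
enum page_state {
	PAGE_NOT_PRESENT,	/* next access re-reads the file */
	PAGE_FILE_BACKED,	/* faulted in, still the file's page */
	PAGE_ANON_COW		/* privately written anonymous copy */
};

enum invalidate_policy {
	POLICY_ZAP_ALL,			/* current behaviour per the thread */
	POLICY_SKIP_PRIVATE_VMA,	/* option 2 */
	POLICY_SKIP_ANON_PAGES		/* option 3 */
};

/* Toy mapping; hypothetical, not struct vm_area_struct. */
struct toy_vma {
	bool is_private;		/* mapped MAP_PRIVATE? */
	size_t npages;
	enum page_state *pages;
};

/*
 * Invalidate every page of the toy mapping under the given policy.
 * Returns the number of CoWed pages destroyed, i.e. how much private
 * data the process lost.
 */
size_t toy_invalidate(struct toy_vma *vma, enum invalidate_policy policy)
{
	size_t lost = 0;
	size_t i;

	/* Option 2: a MAP_PRIVATE mapping is left completely alone. */
	if (vma->is_private && policy == POLICY_SKIP_PRIVATE_VMA)
		return 0;

	for (i = 0; i < vma->npages; i++) {
		if (vma->pages[i] == PAGE_ANON_COW) {
			/* Option 3: keep the process's private copy. */
			if (policy == POLICY_SKIP_ANON_PAGES)
				continue;
			lost++;
		}
		vma->pages[i] = PAGE_NOT_PRESENT;
	}
	return lost;
}
```

In this model only POLICY_ZAP_ALL ever returns nonzero, and only
POLICY_SKIP_ANON_PAGES both preserves private data and forces unmodified
pages to be re-read from the file.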

						Thanx, Paul
