public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix remap of shared read only mappings
@ 2003-09-04 14:49 James Bottomley
  2003-09-04 21:48 ` Jamie Lokier
  0 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2003-09-04 14:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel

When mmap MAP_SHARED is done on a file, it gets marked with VM_MAYSHARE
and, if it's read/write, VM_SHARED.  However, if it is remapped with
mremap(), the MAP_SHARED is only passed into the new mapping based on
VM_SHARED.  This means that remapped read only MAP_SHARED mappings lose
VM_MAYSHARE.  This is causing us a problem on parisc because we have to
align all shared mappings carefully to mitigate cache aliasing problems.

The fix is to key passing the MAP_SHARED flag back into the remapped are
off VM_MAYSHARE not VM_SHARED.

James

===== mremap.c 1.32 vs 1.33 =====
--- 1.32/mm/mremap.c	Thu Aug  7 12:29:10 2003
+++ 1.33/mm/mremap.c	Sun Aug 24 06:50:10 2003
@@ -420,7 +420,7 @@
 	if (flags & MREMAP_MAYMOVE) {
 		if (!(flags & MREMAP_FIXED)) {
 			unsigned long map_flags = 0;
-			if (vma->vm_flags & VM_SHARED)
+			if (vma->vm_flags & VM_MAYSHARE)
 				map_flags |= MAP_SHARED;
 
 			new_addr = get_unmapped_area(vma->vm_file, 0, new_len,


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-04 14:49 [PATCH] fix remap of shared read only mappings James Bottomley
@ 2003-09-04 21:48 ` Jamie Lokier
  2003-09-04 22:33   ` James Bottomley
  0 siblings, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2003-09-04 21:48 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linus Torvalds, Linux Kernel

James Bottomley wrote:
> When mmap MAP_SHARED is done on a file, it gets marked with VM_MAYSHARE
> and, if it's read/write, VM_SHARED.  However, if it is remapped with
> mremap(), the MAP_SHARED is only passed into the new mapping based on
> VM_SHARED.  This means that remapped read only MAP_SHARED mappings lose
> VM_MAYSHARE.  This is causing us a problem on parisc because we have to
> align all shared mappings carefully to mitigate cache aliasing problems.
> 
> The fix is to key passing the MAP_SHARED flag back into the remapped are
> off VM_MAYSHARE not VM_SHARED.

At first I disagreed with your patch.  I was thinking that special
alignment is only really needed to avoid aliasing problems for
_writable_ shared mappings, and VM_SHARED is right for that.  (Which
would indicate that mmap is faulty, not mremap).

But after some thought, I agree with you.

A read-only mapping of a shared segment must be coherent with other,
possibly writable mappings, so far as the architecture can guarantee
that.

That coherence should not disappear if the file handle used for the
mapping was opened O_RDONLY.

One last thought: this is what PROT_SEM is for.  Linux doesn't use
this in any useful way.  But, technically, mmap with MAP_SHARED ad
PROT_SEM should enable cache coherence, and that might include
aligning the address.  Without PROT_SEM an application should not rely
on cache coherence.

-- Jamie

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-04 21:48 ` Jamie Lokier
@ 2003-09-04 22:33   ` James Bottomley
  2003-09-04 22:56     ` Jamie Lokier
  2003-09-05  0:49     ` Daniel Phillips
  0 siblings, 2 replies; 13+ messages in thread
From: James Bottomley @ 2003-09-04 22:33 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linus Torvalds, Linux Kernel

On Thu, 2003-09-04 at 17:48, Jamie Lokier wrote:
> One last thought: this is what PROT_SEM is for.  Linux doesn't use
> this in any useful way.  But, technically, mmap with MAP_SHARED ad
> PROT_SEM should enable cache coherence, and that might include
> aligning the address.  Without PROT_SEM an application should not rely
> on cache coherence.

I'm not familiar with the PROT_SEM flag.  I don't believe it to be
defined in POSIX.

However, POSIX does imply levels of cache coherence for both MAP_SHARED
and MAP_PRIVATE:

With MAP_SHARED, any change to the underlying object after the mapping
must become visible to the mapper (although the change may be delayed by
local caching of the changer's implementation until it is explicitly
flushed).

With MAP_PRIVATE it is undefined what happens if the underlying object
is changed after mapping.

So regardless of PROT_SEM we have no choice but to worry about cache
coherence issues on MAP_SHARED mappings.

James



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-04 22:33   ` James Bottomley
@ 2003-09-04 22:56     ` Jamie Lokier
  2003-09-04 23:23       ` James Bottomley
  2003-09-05  0:49     ` Daniel Phillips
  1 sibling, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2003-09-04 22:56 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linus Torvalds, Linux Kernel

James Bottomley wrote:
> However, POSIX does imply levels of cache coherence for both MAP_SHARED
> and MAP_PRIVATE:
> 
> With MAP_SHARED, any change to the underlying object after the mapping
> must become visible to the mapper (although the change may be delayed by
> local caching of the changer's implementation until it is explicitly
> flushed).
...
> So regardless of PROT_SEM we have no choice but to worry about cache
> coherence issues on MAP_SHARED mappings.

You have just argued _against_ worrying about cache coherence by
aligning mapping addresses.

Basically, POSIX says shared mappings aren't guanteed to be coherent
until you call msync().  Then you can just do whatever is needed to
make different views coherent.  That's easier now that we have rmap.

-- Jamie

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-04 22:56     ` Jamie Lokier
@ 2003-09-04 23:23       ` James Bottomley
  2003-09-04 23:40         ` Jamie Lokier
  0 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2003-09-04 23:23 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linus Torvalds, Linux Kernel

On Thu, 2003-09-04 at 18:56, Jamie Lokier wrote:
> You have just argued _against_ worrying about cache coherence by
> aligning mapping addresses.
> 
> Basically, POSIX says shared mappings aren't guanteed to be coherent
> until you call msync().  Then you can just do whatever is needed to
> make different views coherent.  That's easier now that we have rmap.

I think you may misunderstand what I mean by coherence:  Our problem is
the VIVT processor caches.  Once one mapper does an msync, that data
must be visible to all the other mappers, so at that point we have to
flush the cache lines of all the other mappers.  On PA, we only need to
flush one correctly aligned address to get the VIVT cache to flush all
the others.  However, the kernel page cache usually holds an unaligned
reference so we need to do the extra aligned flush when this data
changes.  If we didn't do the alignment, we'd need to flush every
virtual address in the current CPU translation for that page.

If you mean PROT_SEM requires immediate coherence without an msync, then
those semantics would be very tricky to achieve on parisc since we'd
need the kernel virtual address of the page in the page cache correctly
aligned as well.

rmap isn't really necessary for this, that's what the
page->mapping->i_mmap and i_mmap_shared lists are for.

James



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-04 23:23       ` James Bottomley
@ 2003-09-04 23:40         ` Jamie Lokier
  0 siblings, 0 replies; 13+ messages in thread
From: Jamie Lokier @ 2003-09-04 23:40 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linus Torvalds, Linux Kernel

James Bottomley wrote:
> I think you may misunderstand what I mean by coherence:  Our problem is
> the VIVT processor caches.  Once one mapper does an msync, that data
> must be visible to all the other mappers, so at that point we have to
> flush the cache lines of all the other mappers.  On PA, we only need to
> flush one correctly aligned address to get the VIVT cache to flush all
> the others.  However, the kernel page cache usually holds an unaligned
> reference so we need to do the extra aligned flush when this data
> changes.  If we didn't do the alignment, we'd need to flush every
> virtual address in the current CPU translation for that page.

Ok, I understand why you want matching alignments now. :)
(So that MS_INVALIDATE doesn't have to do anything).

> If you mean PROT_SEM requires immediate coherence without an msync, then
> those semantics would be very tricky to achieve on parisc since we'd
> need the kernel virtual address of the page in the page cache correctly
> aligned as well.

Linux hasn't ever done anything useful with PROT_SEM.  As far as I
know, on some other systems PROT_SEM has meant that you mark pages
uncacheable or similar, but only on systems where that is needed to
implement IPC through the shared memory.  For some definition IPC.

-- JAmie

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-04 22:33   ` James Bottomley
  2003-09-04 22:56     ` Jamie Lokier
@ 2003-09-05  0:49     ` Daniel Phillips
  2003-09-05  0:49       ` Linus Torvalds
  2003-09-05  0:52       ` James Bottomley
  1 sibling, 2 replies; 13+ messages in thread
From: Daniel Phillips @ 2003-09-05  0:49 UTC (permalink / raw)
  To: James Bottomley, Jamie Lokier; +Cc: Linus Torvalds, Linux Kernel

On Friday 05 September 2003 00:33, James Bottomley wrote:
> On Thu, 2003-09-04 at 17:48, Jamie Lokier wrote:
> However, POSIX does imply levels of cache coherence for both MAP_SHARED
> and MAP_PRIVATE:
>
> With MAP_SHARED, any change to the underlying object after the mapping
> must become visible to the mapper (although the change may be delayed by
> local caching of the changer's implementation until it is explicitly
> flushed).

This an interesting tidbit, as I'm busy working on a DFS mmap for OpenGFS, and 
I want to be sure I'm implementing true-blue Posix semantics.  But trawling 
through the Posix/SUS specification at:

   http://www.unix-systems.org/version3/online.html

all it says is that for MAP_SHARED "write references shall change the 
underlying object."  I don't see anything about when those changes become 
visible to other mappers, much less any discussion of local caching.  Am I 
looking at the wrong document?

Regards,

Daniel



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-05  0:49     ` Daniel Phillips
@ 2003-09-05  0:49       ` Linus Torvalds
  2003-09-05  1:07         ` Alan Cox
  2003-09-05  0:52       ` James Bottomley
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2003-09-05  0:49 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: James Bottomley, Jamie Lokier, Linux Kernel


On Fri, 5 Sep 2003, Daniel Phillips wrote:
>
> This an interesting tidbit, as I'm busy working on a DFS mmap for OpenGFS, and 
> I want to be sure I'm implementing true-blue Posix semantics.

Please don't.

POSIX semantics are weak enough not to be interesting. Exactly because a 
number of old hardware platforms simply _cannot_ give you good coherency. 
And a number of old UNIXes couldn't either, for that matter.

What really matters is that mmap() under Linux is 100% coherent, as far as 
the hardware just allows. We haven't taken the easy way out. We shouldn't 
start now.

		Linus


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-05  0:49     ` Daniel Phillips
  2003-09-05  0:49       ` Linus Torvalds
@ 2003-09-05  0:52       ` James Bottomley
  2003-09-05  1:21         ` Daniel Phillips
  1 sibling, 1 reply; 13+ messages in thread
From: James Bottomley @ 2003-09-05  0:52 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Jamie Lokier, Linus Torvalds, Linux Kernel

On Thu, 2003-09-04 at 20:49, Daniel Phillips wrote:
> This an interesting tidbit, as I'm busy working on a DFS mmap for OpenGFS, and 
> I want to be sure I'm implementing true-blue Posix semantics.  But trawling 
> through the Posix/SUS specification at:
> 
>    http://www.unix-systems.org/version3/online.html
> 
> all it says is that for MAP_SHARED "write references shall change the 
> underlying object."  I don't see anything about when those changes become 
> visible to other mappers, much less any discussion of local caching.  Am I 
> looking at the wrong document?

Not sure which is "correct", but the one I'm looking at is the POSIX
update from the open group:

http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html

And that's where I was quoting from.

James



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-05  0:49       ` Linus Torvalds
@ 2003-09-05  1:07         ` Alan Cox
  2003-09-05  1:31           ` Daniel Phillips
  2003-09-05  4:35           ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread
From: Alan Cox @ 2003-09-05  1:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Phillips, James Bottomley, Jamie Lokier,
	Linux Kernel Mailing List

On Gwe, 2003-09-05 at 01:49, Linus Torvalds wrote:
> What really matters is that mmap() under Linux is 100% coherent, as far as 
> the hardware just allows. We haven't taken the easy way out. We shouldn't 
> start now.

NFS ?

The problem with OpenGFS is that it is a network file system so
implementing "perfect" shared mmap semantics might actually reduce it
from handy to useless. Right now the worst we have to do is mark pages
uncached in some weird shared map cases, with pages being bounced across
firewire its a bit different.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-05  0:52       ` James Bottomley
@ 2003-09-05  1:21         ` Daniel Phillips
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Phillips @ 2003-09-05  1:21 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jamie Lokier, Linus Torvalds, Linux Kernel

On Friday 05 September 2003 02:52, James Bottomley wrote:
> On Thu, 2003-09-04 at 20:49, Daniel Phillips wrote:
> > This an interesting tidbit, as I'm busy working on a DFS mmap for
> > OpenGFS, and I want to be sure I'm implementing true-blue Posix
> > semantics.  But trawling through the Posix/SUS specification at:
> >
> >    http://www.unix-systems.org/version3/online.html
> >
> > all it says is that for MAP_SHARED "write references shall change the
> > underlying object."  I don't see anything about when those changes become
> > visible to other mappers, much less any discussion of local caching.  Am
> > I looking at the wrong document?
>
> Not sure which is "correct", but the one I'm looking at is the POSIX
> update from the open group:
>
> http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html
>
> And that's where I was quoting from.

It appears to be the same text.  Either I'm blind, or the part about the 
caching was your editorial commentaryk.  Anyway, I'm going with Linus' ruling 
on semantics, I agree totally.

Regards,

Daniel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-05  1:07         ` Alan Cox
@ 2003-09-05  1:31           ` Daniel Phillips
  2003-09-05  4:35           ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Daniel Phillips @ 2003-09-05  1:31 UTC (permalink / raw)
  To: Alan Cox, Linus Torvalds
  Cc: James Bottomley, Jamie Lokier, Linux Kernel Mailing List

On Friday 05 September 2003 03:07, Alan Cox wrote:
> On Gwe, 2003-09-05 at 01:49, Linus Torvalds wrote:
> > What really matters is that mmap() under Linux is 100% coherent, as far
> > as the hardware just allows. We haven't taken the easy way out. We
> > shouldn't start now.
>
> NFS ?

NFS doesn't attempt to implement local Posix semantics, but a DFS should.  
Anyway, Linus has ruled we're being held to the higher standard of "Linux 
semantics".

> The problem with OpenGFS is that it is a network file system so
> implementing "perfect" shared mmap semantics might actually reduce it
> from handy to useless. Right now the worst we have to do is mark pages
> uncached in some weird shared map cases, with pages being bounced across
> firewire its a bit different.

Sistina has been doing it the "perfect" way for some time now, and it's worked 
out OK.  This relies on writes being rare.

Things will definitely slow to a crawl if several nodes keep writing little 
bits into the same mmap, but that's a case of "doctor it hurts" I think.  If 
there's a common application that does this, I'd appreciate knowing about it.

Regards,

Daniel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] fix remap of shared read only mappings
  2003-09-05  1:07         ` Alan Cox
  2003-09-05  1:31           ` Daniel Phillips
@ 2003-09-05  4:35           ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2003-09-05  4:35 UTC (permalink / raw)
  To: Alan Cox
  Cc: Daniel Phillips, James Bottomley, Jamie Lokier,
	Linux Kernel Mailing List


On Fri, 5 Sep 2003, Alan Cox wrote:
> 
> NFS ?

On a local machine, yes.

Clearly NFS will _not_ be cache-coherent over a network. But even NFS is
supposed to be cache-coherent on a per-client basis. 

Networking is _not_ an excuse for not doing that right.

		Linus


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2003-09-05  4:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-09-04 14:49 [PATCH] fix remap of shared read only mappings James Bottomley
2003-09-04 21:48 ` Jamie Lokier
2003-09-04 22:33   ` James Bottomley
2003-09-04 22:56     ` Jamie Lokier
2003-09-04 23:23       ` James Bottomley
2003-09-04 23:40         ` Jamie Lokier
2003-09-05  0:49     ` Daniel Phillips
2003-09-05  0:49       ` Linus Torvalds
2003-09-05  1:07         ` Alan Cox
2003-09-05  1:31           ` Daniel Phillips
2003-09-05  4:35           ` Linus Torvalds
2003-09-05  0:52       ` James Bottomley
2003-09-05  1:21         ` Daniel Phillips

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox