public inbox for linux-kernel@vger.kernel.org
* msync() needed before munmap() when writing to shared mapping?
@ 2004-04-16 22:02 Jamie Lokier
  2004-04-16 22:46 ` Andrew Morton
  2004-04-17 20:23 ` H. Peter Anvin
  0 siblings, 2 replies; 6+ messages in thread
From: Jamie Lokier @ 2004-04-16 22:02 UTC
  To: linux-kernel

I'm verifying that writing to a shared mapping and then calling
munmap() or exit() definitely propagates the dirty bits from the page
tables to the file.

This has been asked before, 1 year ago, in a thread called "Memory
mapped files question", and hpa said:

> munmap() and fsync() or msync() will flush it to disk; there is no
> reason munmap() should unless perhaps the file was opened O_SYNC.

That was talking about flushing data all the way to disk.  The
implication of hpa's response is that munmap() does propagate the
dirty bits from the page table to the file.  That is the obvious
behaviour, and what I've always assumed.

I've followed the logic from do_munmap() and it looks good:
unmap_vmas->zap_pte_range->page_remove_rmap->set_page_dirty.

Can someone confirm this is correct, please?
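A small user-space check of the behaviour being asked about: write through a MAP_SHARED mapping, munmap() without msync(), then read the file back through the descriptor. This is a sketch assuming Linux and a writable /tmp; the helper name write_via_mapping_then_unmap is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copies len bytes of msg into a fresh temporary file
 * through a MAP_SHARED mapping, unmaps it WITHOUT calling msync(), then
 * reads the file back with pread() into out (which must hold len bytes).
 * Returns 0 if the read-back succeeded, -1 on any failure. */
int write_via_mapping_then_unmap(const char *msg, char *out, size_t len)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* file lives until fd is closed */

    if (ftruncate(fd, (off_t)len) < 0) {   /* give the mapping backing bytes */
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, msg, len);                   /* only the PTEs are dirty now */
    if (munmap(p, len) < 0) {              /* no msync(): teardown carries the dirt */
        close(fd);
        return -1;
    }

    ssize_t n = pread(fd, out, len, 0);    /* read back through the page cache */
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

Calling it as write_via_mapping_then_unmap("hello", buf, 5) should leave "hello" in buf. This checks visibility through the page cache only, not that the data has reached the disk.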

Also, I recall a mention on lkml that some flavour of BSD doesn't
propagate the dirty bits when unmapping, and msync is needed to ensure
data is properly written to the file.  I haven't been able to find the
message, though; perhaps I imagined it.

Can someone confirm or refute that, and similarly does anyone know of
any other OS that needs msync before munmap or exit, to ensure written
data reaches the file?
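For code that has to run on systems whose munmap() behaviour is in doubt, a conservative pattern is to msync(MS_SYNC) the range before unmapping, so correctness does not depend on the unmap path at all. A minimal sketch; the helper name sync_and_unmap is hypothetical, and addr is assumed to be page-aligned as returned by mmap().

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical helper: synchronously write back a shared mapping, then
 * unmap it. Returns 0 on success, or -1 with errno set from whichever
 * call failed. The mapping is released even if msync() fails. */
int sync_and_unmap(void *addr, size_t len)
{
    if (msync(addr, len, MS_SYNC) < 0) {
        int saved = errno;
        munmap(addr, len);      /* still release the mapping */
        errno = saved;
        return -1;
    }
    return munmap(addr, len);
}
```

The cost is a synchronous write-back on every unmap, which is exactly what the Linux unmap path avoids.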

Thanks,
-- Jamie

* Re: msync() needed before munmap() when writing to shared mapping?
  2004-04-16 22:02 msync() needed before munmap() when writing to shared mapping? Jamie Lokier
@ 2004-04-16 22:46 ` Andrew Morton
  2004-04-16 23:10   ` Jamie Lokier
  2004-04-16 23:55   ` Hugh Dickins
  2004-04-17 20:23 ` H. Peter Anvin
  1 sibling, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2004-04-16 22:46 UTC
  To: Jamie Lokier; +Cc: linux-kernel

Jamie Lokier <jamie@shareable.org> wrote:
>
> I've followed the logic from do_munmap() and it looks good:
> unmap_vmas->zap_pte_range->page_remove_rmap->set_page_dirty.
> 
> Can someone confirm this is correct, please?

yup, zap_pte_range() transfers pte dirtiness into pagecache dirtiness when
tearing down the mapping, leaving the dirty page floating about in
pagecache for kupdate/kswapd/fsync to catch.  Longstanding behaviour.
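Since munmap() only leaves the dirty page in the page cache, getting the data onto stable storage still takes an fsync() on a descriptor for the file (or waiting for background write-back). A minimal sketch; the helper name store_unmap_fsync is hypothetical, and map is assumed to be a MAP_SHARED mapping of fd.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: store len bytes from src at the start of an
 * already-mapped shared region, drop the mapping, then force the file's
 * dirty page-cache pages to stable storage with fsync(fd).
 * Returns 0 on success, -1 on failure. */
int store_unmap_fsync(int fd, void *map, size_t maplen,
                      const void *src, size_t len)
{
    if (len > maplen)
        return -1;
    memcpy(map, src, len);            /* dirty the PTEs */
    if (munmap(map, maplen) < 0)      /* dirtiness moves to the page cache */
        return -1;
    return fsync(fd);                 /* page cache -> disk */
}
```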


* Re: msync() needed before munmap() when writing to shared mapping?
  2004-04-16 22:46 ` Andrew Morton
@ 2004-04-16 23:10   ` Jamie Lokier
  2004-04-16 23:59     ` Andrew Morton
  2004-04-16 23:55   ` Hugh Dickins
  1 sibling, 1 reply; 6+ messages in thread
From: Jamie Lokier @ 2004-04-16 23:10 UTC
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

Andrew Morton wrote:
> Jamie Lokier <jamie@shareable.org> wrote:
> > I've followed the logic from do_munmap() and it looks good:
> > unmap_vmas->zap_pte_range->page_remove_rmap->set_page_dirty.
> > 
> > Can someone confirm this is correct, please?
> 
> yup, zap_pte_range() transfers pte dirtiness into pagecache dirtiness when
> tearing down the mapping, leaving the dirty page floating about in
> pagecache for kupdate/kswapd/fsync to catch.  Longstanding behaviour.

Thanks.

A related question.  The comment for MADV_DONTNEED says:

 * NB: This interface discards data rather than pushes it out to swap,
 * as some implementations do.  This has performance implications for
 * applications like large transactional databases which want to discard
 * pages in anonymous maps after committing to backing store the data
 * that was kept in them.  There is no reason to write this data out to
 * the swap area if the application is discarding it.
 *
 * An interface that causes the system to free clean pages and flush
 * dirty pages is already available as msync(MS_INVALIDATE).

MADV_DONTNEED calls zap_page_range().
That propagates dirtiness into the pagecache.

So it *doesn't* "discard data rather than push it out to swap", if the
same dirty data is mapped elsewhere e.g. as a shared anonymous
mapping, does it?
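The distinction can be observed from user space on current Linux: after madvise(MADV_DONTNEED), a MAP_SHARED anonymous mapping refaults its old contents from the shared backing object, while a MAP_PRIVATE anonymous mapping comes back zero-filled. This sketch reflects present-day madvise(2) behaviour rather than the 2004 code under discussion; the helper name byte_after_dontneed is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page with the given sharing
 * flag (MAP_SHARED or MAP_PRIVATE), write a marker byte, discard the
 * range with MADV_DONTNEED, and return the byte seen on the next access.
 * Returns -1 if any call fails. */
int byte_after_dontneed(int share_flag)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            share_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 'x';                          /* dirty the page */
    if (madvise(p, len, MADV_DONTNEED) < 0) {
        munmap(p, len);
        return -1;
    }

    int seen = p[0];                     /* refault after the discard */
    munmap(p, len);
    return seen;
}
```

On Linux, byte_after_dontneed(MAP_SHARED) should return 'x' and byte_after_dontneed(MAP_PRIVATE) should return 0.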

The comment also mentions MS_INVALIDATE, but MS_INVALIDATE doesn't do
what the comment says and doesn't implement anything like POSIX
either.  (Linux's MS_INVALIDATE is practically equivalent to MS_ASYNC).

Is there a call which does what the comment about MS_INVALIDATE says,
i.e. free clean pages and flush dirty ones?

Thanks,
-- Jamie

* Re: msync() needed before munmap() when writing to shared mapping?
  2004-04-16 22:46 ` Andrew Morton
  2004-04-16 23:10   ` Jamie Lokier
@ 2004-04-16 23:55   ` Hugh Dickins
  1 sibling, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2004-04-16 23:55 UTC
  To: Andrew Morton; +Cc: Jamie Lokier, linux-kernel

On Fri, 16 Apr 2004, Andrew Morton wrote:
> Jamie Lokier <jamie@shareable.org> wrote:
> >
> > I've followed the logic from do_munmap() and it looks good:
> > unmap_vmas->zap_pte_range->page_remove_rmap->set_page_dirty.
> > 
> > Can someone confirm this is correct, please?
> 
> yup, zap_pte_range() transfers pte dirtiness into pagecache dirtiness when
> tearing down the mapping, leaving the dirty page floating about in
> pagecache for kupdate/kswapd/fsync to catch.  Longstanding behaviour.

May I add a clarification?  Jamie has focussed on the set_page_dirty
in page_remove_rmap: that's a special case for s390; on everything else the
"page_test_and_clear_dirty" preceding it evaluates to 0.  For most
arches it is indeed the set_page_dirty actually in zap_pte_range
which smears the dirt from pte to page.  (Please don't ask me
to explain the s390 case, I'm no expert.)
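The common (non-s390) path described here can be pictured with a toy user-space model. The structures and function below are hypothetical stand-ins loosely named after the kernel ones in this thread; they are not the kernel's definitions, and the s390 storage-key path is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT kernel types. */
struct toy_page {
    bool dirty;              /* models the page-cache dirty flag */
};

struct toy_pte {
    struct toy_page *page;   /* page this entry maps, or NULL */
    bool dirty;              /* models the hardware dirty bit in the PTE */
};

/* Models the teardown loop: a dirty PTE marks its page dirty in the
 * page cache (standing in for set_page_dirty()), then the PTE is
 * cleared. Returns how many dirty PTEs were transferred. */
size_t toy_zap_pte_range(struct toy_pte *ptes, size_t n)
{
    size_t marked = 0;
    for (size_t i = 0; i < n; i++) {
        struct toy_pte *pte = &ptes[i];
        if (pte->page == NULL)
            continue;                 /* nothing mapped here */
        if (pte->dirty) {
            pte->page->dirty = true;  /* PTE dirt -> page dirt */
            marked++;
        }
        pte->page = NULL;             /* entry torn down */
        pte->dirty = false;
    }
    return marked;
}
```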

Hugh


* Re: msync() needed before munmap() when writing to shared mapping?
  2004-04-16 23:10   ` Jamie Lokier
@ 2004-04-16 23:59     ` Andrew Morton
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2004-04-16 23:59 UTC
  To: Jamie Lokier; +Cc: linux-kernel, linux-mm

Jamie Lokier <jamie@shareable.org> wrote:
>
> ...
> A related question.  The comment for MADV_DONTNEED says:
> 
>  * NB: This interface discards data rather than pushes it out to swap,
>  * as some implementations do.  This has performance implications for
>  * applications like large transactional databases which want to discard
>  * pages in anonymous maps after committing to backing store the data
>  * that was kept in them.  There is no reason to write this data out to
>  * the swap area if the application is discarding it.
>  *
>  * An interface that causes the system to free clean pages and flush
>  * dirty pages is already available as msync(MS_INVALIDATE).
> 
> MADV_DONTNEED calls zap_page_range().
> That propagates dirtiness into the pagecache.
> 
> So it *doesn't* "discard data rather than push it out to swap", if the
> same dirty data is mapped elsewhere e.g. as a shared anonymous
> mapping, does it?

Sure.  If some other process is using the same pages we don't go toss them
away.

> The comment also mentions MS_INVALIDATE, but MS_INVALIDATE doesn't do
> what the comment says and doesn't implement anything like POSIX
> either.  (Linux's MS_INVALIDATE is practically equivalent to MS_ASYNC).

Seems that way - MS_INVALIDATE will simply propagate pte dirtiness into
page dirtiness.  For non-file-backed mappings it is a no-op.
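The no-op case for non-file-backed mappings can be checked with a short sketch: on current Linux, msync(MS_INVALIDATE) over an unlocked private anonymous mapping returns success and leaves the contents untouched (on an mlock()ed range it reports EBUSY instead). The helper name anon_survives_ms_invalidate is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: write a marker into a private anonymous page,
 * call msync(MS_INVALIDATE) on it, and report whether the marker is
 * still there. Returns 1 if preserved, 0 if lost, -1 on error. */
int anon_survives_ms_invalidate(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    strcpy(p, "marker");
    int rc = msync(p, len, MS_INVALIDATE);   /* no file behind this range */
    int kept = (strcmp(p, "marker") == 0);
    munmap(p, len);
    return rc == 0 ? kept : -1;
}
```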

> Is there a call which does what the comment about MS_INVALIDATE says,
> i.e. free clean pages and flush dirty ones?

Not really.  What is a clean anonymous page?  If it's ever been written to,
it's conceptually dirty, whether or not it is physically dirty.  i.e. if you
invalidate it, you've lost your data.

I guess you could get a similar result by munmap() and then mmapping it
again.


* Re: msync() needed before munmap() when writing to shared mapping?
  2004-04-16 22:02 msync() needed before munmap() when writing to shared mapping? Jamie Lokier
  2004-04-16 22:46 ` Andrew Morton
@ 2004-04-17 20:23 ` H. Peter Anvin
  1 sibling, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2004-04-17 20:23 UTC
  To: linux-kernel

Followup to:  <20040416220223.GA27084@mail.shareable.org>
By author:    Jamie Lokier <jamie@shareable.org>
In newsgroup: linux.dev.kernel
> 
> > munmap() and fsync() or msync() will flush it to disk; there is no
> > reason munmap() should unless perhaps the file was opened O_SYNC.
> 
> That was talking about flushing data all the way to disk.  The
> implication of hpa's response is that munmap() does propagate the
> dirty bits from the page table to the file.  That is the obvious
> behaviour, and what I've always assumed.
> 

Obvious behaviour, and required by POSIX.

	-hpa

Thread overview: 6+ messages
2004-04-16 22:02 msync() needed before munmap() when writing to shared mapping? Jamie Lokier
2004-04-16 22:46 ` Andrew Morton
2004-04-16 23:10   ` Jamie Lokier
2004-04-16 23:59     ` Andrew Morton
2004-04-16 23:55   ` Hugh Dickins
2004-04-17 20:23 ` H. Peter Anvin