linux-fsdevel.vger.kernel.org archive mirror
From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: mtk.manpages@gmail.com, Heinrich Schuchardt <xypron.glpk@gmx.de>,
	linux-man@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Theodore T'so <tytso@mit.edu>,
	Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	jamie@shareable.org
Subject: Re: munmap, msync: synchronization
Date: Mon, 21 Apr 2014 21:54:16 +0200
Message-ID: <53557768.5070905@gmail.com>
In-Reply-To: <20140421181431.GA17125@infradead.org>

Christoph,

On 04/21/2014 08:14 PM, Christoph Hellwig wrote:
> On Mon, Apr 21, 2014 at 12:16:46PM +0200, Michael Kerrisk (man-pages) wrote:
>> 1. In the bad old days (even on Linux, AFAIK, but that was in days
>>    before I looked closely at what goes on), the page cache and
>>    the buffer cache were not unified. That meant that a page from 
>>    a file might both be in the buffer cache (because of file I/O
>>    syscalls) and in the page cache (because of mmap()).
> 
> Correct.
> 
>> 2. In a non-unified cache system, pages can naturally get out of
>>    synch in the two locations. Before it had a unified cache, Linux 
>>    used to jump through some hoops to ensure that contents in the two
>>    locations remained consistent.
> 
> Yeah.
> 
>> 3. Nowadays Linux--like most (all?) UNIX systems--has a 
>>    unified cache: file I/O, mmap(), and the paging system all 
>>    use the same cache. If a file is mmap()-ed and also subject
>>    to file I/O, there will be only one copy of each file page
>>    in the cache. Ergo, the inconsistency problem goes away.
> 
> Mostly true, except for FreeBSD and Solaris when they use ZFS, which has
> its own file cache that is not coherent with the VM cache at the
> implementation level.  Not sure how much of this leaks to userspace,
> though.

Thanks for that detail.
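
To make point 3 concrete, here is a minimal sketch (not from the thread)
that stores bytes through a MAP_SHARED mapping and immediately reads them
back with pread(), with no msync() in between. The function name and the
scratch path used to call it are hypothetical. On a unified-cache system
such as current Linux it should return 1; on a non-unified implementation
it could legitimately return 0.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store `msg` through a MAP_SHARED mapping of `path`, then read the same
 * bytes back with pread() without calling msync() first.
 * Returns 1 if pread() saw the new bytes, 0 if it did not, -1 on error.
 */
int mapped_write_visible_to_read(const char *path, const char *msg)
{
    char buf[256];
    size_t len;
    ssize_t n;
    char *map;
    int fd, result = -1;

    assert(path != NULL && msg != NULL);
    len = strlen(msg);
    assert(len > 0 && len <= sizeof(buf));

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, (off_t)len) == 0) {
        map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, msg, len);          /* store via the mapping */
            n = pread(fd, buf, len, 0);     /* read via file I/O, no msync() */
            if (n == (ssize_t)len)
                result = (memcmp(buf, msg, len) == 0) ? 1 : 0;
            munmap(map, len);
        }
    }
    close(fd);
    return result;
}
```

Calling it as mapped_write_visible_to_read("/tmp/demo.bin", "hello") is
enough to try it; the path is only an example.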

>> 4. IIUC, the pieces like msync(MS_ASYNC) and msync(MS_INVALIDATE)
>>    exist only because of the bad old non-unified cache days.
>>    MS_INVALIDATE was a way of saying: make sure that writes
>>    to the file by other processes are visible in this mapping.
>>    msync() without the MS_INVALIDATE flags was a way of saying:
>>    make sure that read()s from the file see the changes made
>>    via this mapping. Using either MS_SYNC or MS_ASYNC
>>    was the way of saying: "I either want to wait until the file
>>    updates have been completed", or "please start the updates
>>    now, but I don't want to wait until they're completed".
> 
> Right.
> 
>> 5. On systems with a unified cache, msync(MS_INVALIDATE)
>>    is a no-op. (That is so on Linux.)
> 
> Almost.  It returns EBUSY if it hits any mlock()ed region.  Don't ask me
> why, though..

Ahhh yes, I was aware of that detail, but overlooked it in the point 
above.
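
The EBUSY case can be probed with a small sketch like the following. It
is an illustration only: the function name is hypothetical, and the
return convention (errno value from msync(), or -1 if the setup failed)
is my own. On Linux I would expect it to return EBUSY; other systems, or
environments where mlock() is not permitted, may give 0 or -1.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map one page of `path` MAP_SHARED, mlock() it, and call
 * msync(MS_ASYNC | MS_INVALIDATE) on the locked page.
 * Returns the errno value from msync() (0 if it succeeded), or -1 if
 * the setup (open, ftruncate, mmap, mlock) failed.
 */
int invalidate_locked_page(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    void *map;
    int fd, result = -1;

    assert(path != NULL);
    if (page <= 0)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, (off_t)page) == 0) {
        map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            if (mlock(map, (size_t)page) == 0) {
                /* The call under test: invalidate a locked region. */
                if (msync(map, (size_t)page, MS_ASYNC | MS_INVALIDATE) == 0)
                    result = 0;
                else
                    result = errno;
                munlock(map, (size_t)page);
            }
            munmap(map, (size_t)page);
        }
    }
    close(fd);
    return result;
}
```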

>> 6. On Linux, MS_ASYNC is also a no-op. That's fine on a unified 
>>    cache system. Filesystem I/O always sees a consistent view,
>>    and MS_ASYNC never undertook to give a guarantee about *when*
>>    the update would occur. (The Linux buffer cache logic will 
>>    ensure that it is flushed out sometime in the near future.)
> 
> Right.  It's a fairly inefficient noop, though - it actually loops
> over all vmas to do nothing with them.
> 
>> 7. On Linux (and probably many other modern systems), the only
>>    call that has any real use is msync(MS_SYNC), meaning
>>    "flush the buffers *now*, and I want to wait for that to 
>>    complete, so that I can then continue safe in the knowledge
>>    that my data has landed on a device". That's useful if we
>>    want insurance for our data in the event of a system crash.
> 
> Right.  It's basically another way to call fsync, which is used to
> implement it underneath.  It actually should be a ranged-fdatasync
> but right now it's implemented horribly inefficiently in that it
> does a fsync call for each vma that it encounters in the range
> specified.
> 
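
As an illustration of the msync(MS_SYNC) call discussed in point 7, here
is a minimal sketch of flushing an arbitrary sub-range of a shared file
mapping. msync() requires a page-aligned start address, so the helper
rounds the start down and extends the length to match. Both function
names are hypothetical. Given the per-vma fsync behaviour described
above, narrowing the range should not be assumed to reduce the amount of
I/O on Linux.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round `addr` down to the start of its page; `page` must be a power of two. */
uintptr_t page_floor(uintptr_t addr, uintptr_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return addr & ~(page - 1);
}

/*
 * Synchronously flush the pages covering [addr, addr + len) of a shared
 * file mapping. The start is rounded down to a page boundary and the
 * length is extended by the same amount.
 * Returns 0 on success, -1 with errno set on failure.
 */
int sync_mapped_range(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start, slack;

    if (page <= 0)
        return -1;

    start = page_floor((uintptr_t)addr, (uintptr_t)page);
    slack = (uintptr_t)addr - start;    /* bytes added in front of addr */
    return msync((void *)start, len + slack, MS_SYNC);
}
```
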
>> 8. POSIX makes no mandate for a unified cache system. Thus,
>>    we have MS_ASYNC and MS_INVALIDATE in the standard, and
>>    the standard says nothing (AFAIK) about whether munmap() 
>>    will flush data. On Linux (and probably most modern systems),
>>    we're fine, but portable applications that care about
>>    standards and nonunified caches need to use msync().
>>
>>    My advice: To ensure that the contents of a shared file
>>    mapping are written to the underlying file--even on bad old
>>    implementations--a call to msync() should be made before 
>>    unmapping a mapping with munmap().
> 
> Agreed.
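
A minimal sketch of that advice, assuming a freshly created file and a
hypothetical helper name: the data is stored through a shared mapping,
flushed with msync(MS_SYNC), and only then unmapped, so nothing depends
on munmap() writing anything back.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes of `data` to `path` through a shared mapping,
 * calling msync(MS_SYNC) before munmap().
 * Returns 0 on success, -1 on failure.
 */
int write_via_mapping(const char *path, const void *data, size_t len)
{
    void *map;
    int fd, rc = -1;

    assert(path != NULL && data != NULL && len > 0);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, (off_t)len) == 0) {
        map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, data, len);
            /* Flush before unmapping; do not rely on munmap() to do it. */
            rc = msync(map, len, MS_SYNC);
            if (munmap(map, len) == -1)
                rc = -1;
        }
    }
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```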

Thanks for checking all of this over and thanks also
for confirming that I learned my lessons well in the
"Jamie Lokier school of tough technical reviewing" ;-).

>> 9. The mmap() man page says this:
>>
>>        MAP_SHARED 
>>            Share this mapping.  Updates to the mapping are vis-
>>            ible to other processes that map this file, and  are
>>            carried  through  to  the underlying file.  The file
>>            may not actually be updated until msync(2)  or  mun-
>>            map() is called.
>>
>>    I believe the piece "or munmap()" is misleading. It implies
>>    that munmap() must trigger a sync action. I don't think this
>>    is true. All that it is required to do is remove some range
>>    of pages from the process's virtual address space. I'm
>>    inclined to remove those words, but I'd like to see if any
>>    FS person has a correction to my understanding first.
> 
> I would expect non-coherent systems to update their caches on munmap,
> but POSIX does not seem to require this, and I can't find any language
> towards that in the HP-UX man page, which was a system that I remember
> as non-coherent until the end.

Yes, that's how I read it too. POSIX seems to have no requirements here,
so I assume it was catering to the lowest common denominator.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
