From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: mtk.manpages@gmail.com, Heinrich Schuchardt <xypron.glpk@gmx.de>,
	linux-man@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Theodore Ts'o <tytso@mit.edu>,
	Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	jamie@shareable.org
Subject: Re: munmap, msync: synchronization
Date: Mon, 21 Apr 2014 21:54:16 +0200
Message-ID: <53557768.5070905@gmail.com>
In-Reply-To: <20140421181431.GA17125@infradead.org>

Christoph,

On 04/21/2014 08:14 PM, Christoph Hellwig wrote:
> On Mon, Apr 21, 2014 at 12:16:46PM +0200, Michael Kerrisk (man-pages) wrote:
>> 1. In the bad old days (even on Linux, AFAIK, but that was in days
>>    before I looked closely at what goes on), the page cache and
>>    the buffer cache were not unified. That meant that a page from 
>>    a file might both be in the buffer cache (because of file I/O
>>    syscalls) and in the page cache (because of mmap()).
> 
> Correct.
> 
>> 2. In a non-unified cache system, pages can naturally get out of
>>    sync in the two locations. Before it had a unified cache, Linux
>>    used to jump through some hoops to ensure that contents in the two
>>    locations remained consistent.
> 
> Yeah.
> 
>> 3. Nowadays Linux--like most (all?) UNIX systems--has a 
>>    unified cache: file I/O, mmap(), and the paging system all 
>>    use the same cache. If a file is mmap()-ed and also subject
>>    to file I/O, there will be only one copy of each file page
>>    in the cache. Ergo, the inconsistency problem goes away.
> 
> Mostly true, except for FreeBSD and Solaris when they use ZFS, which has
> its own file cache that is not coherent with the VM cache at the
> implementation level.  Not sure how much of this leaks to userspace,
> though.

Thanks for that detail.
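
A minimal sketch of what point 3 means in practice, assuming Linux
(or another unified-cache system), glibc, and a writable /tmp;
views_are_coherent() is a hypothetical name, not an existing API.
Neither direction below involves any msync() call:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if write()/read() on a file and a
 * MAP_SHARED mapping of it see each other's updates with no msync()
 * call at all, 0 if they do not, and -1 if setup fails. */
int views_are_coherent(void)
{
    char path[] = "/tmp/coherent-XXXXXX";   /* assumes a writable /tmp */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                 /* file persists until fd is closed */

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int result = 1;
    char buf[10];

    /* File I/O -> mapping: pwrite() at offset 0, then look at map[0..9]. */
    if (pwrite(fd, "from-write", 10, 0) != 10)
        result = -1;
    else if (memcmp(map, "from-write", 10) != 0)
        result = 0;

    /* Mapping -> file I/O: store at offset 100, then pread() it back. */
    if (result == 1) {
        memcpy(map + 100, "from-mmap!", 10);
        if (pread(fd, buf, 10, 100) != 10)
            result = -1;
        else if (memcmp(buf, "from-mmap!", 10) != 0)
            result = 0;
    }

    munmap(map, len);
    close(fd);
    return result;
}
```

On a unified-cache system I would expect views_are_coherent() to
return 1. On an old non-unified implementation, these two steps are
exactly where msync() (with and without MS_INVALIDATE) would have
been needed.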

>> 4. IIUC, the pieces like msync(MS_ASYNC) and msync(MS_INVALIDATE)
>>    exist only because of the bad old non-unified cache days.
>>    MS_INVALIDATE was a way of saying: make sure that writes
>>    to the file by other processes are visible in this mapping.
>>    msync() without the MS_INVALIDATE flag was a way of saying:
>>    make sure that read()s from the file see the changes made
>>    via this mapping. The choice of MS_SYNC or MS_ASYNC was the
>>    way of saying either "I want to wait until the file updates
>>    have been completed" or "please start the updates now, but
>>    I don't want to wait until they're completed".
> 
> Right.
> 
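
The modes in point 4 correspond to msync() flags roughly as below. A
minimal sketch, assuming Linux, glibc, and a writable /tmp;
try_msync() is a hypothetical test harness. The one combination the
interface rejects outright is MS_SYNC together with MS_ASYNC, which
fails with EINVAL:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one page of a temporary file MAP_SHARED,
 * dirty it, call msync() with the given flags, and return 0 on success,
 * the errno value on msync() failure, or -1 if setup fails. */
int try_msync(int flags)
{
    char path[] = "/tmp/msync-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping remains valid */
    if (map == MAP_FAILED)
        return -1;

    map[0] = 'x';                   /* give msync() a dirty page to act on */

    /* MS_SYNC:       write the changes back and wait for completion.
     * MS_ASYNC:      start the update but do not wait (a no-op on Linux).
     * MS_INVALIDATE: per point 4, make other writers' changes visible in
     *                this mapping (a no-op on Linux, except for mlock()). */
    int err = (msync(map, len, flags) == 0) ? 0 : errno;
    munmap(map, len);
    return err;
}
```

On Linux I would expect try_msync(MS_SYNC), try_msync(MS_ASYNC) and
try_msync(MS_ASYNC | MS_INVALIDATE) to return 0, and
try_msync(MS_SYNC | MS_ASYNC) to return EINVAL.
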
>> 5. On systems with a unified cache, msync(MS_INVALIDATE)
>>    is a no-op. (That is so on Linux.)
> 
> Almost.  It returns EBUSY if it hits any mlock()ed region.  Don't ask me
> why, though.

Ahhh yes, I was aware of that detail, but overlooked it in the point 
above.
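
The EBUSY case can be reproduced with something along these lines.
A minimal sketch, assuming Linux, glibc, a writable /tmp, and an
RLIMIT_MEMLOCK large enough for one page; invalidate_locked_page() is
a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical check: lock one page of a MAP_SHARED file mapping with
 * mlock(), then call msync(MS_INVALIDATE) on it. Returns the errno from
 * msync() (EBUSY is expected on Linux), 0 if msync() succeeded, or -1
 * if setup failed, e.g. mlock() refused because of RLIMIT_MEMLOCK. */
int invalidate_locked_page(void)
{
    char path[] = "/tmp/mlock-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    if (mlock(map, len) == -1) {    /* may be refused by resource limits */
        munmap(map, len);
        return -1;
    }

    int err = (msync(map, len, MS_ASYNC | MS_INVALIDATE) == 0) ? 0 : errno;
    munmap(map, len);               /* also removes the memory lock */
    return err;
}
```

If mlock() is refused, the function returns -1 and the check says
nothing either way.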

>> 6. On Linux, MS_ASYNC is also a no-op. That's fine on a unified 
>>    cache system. Filesystem I/O always sees a consistent view,
>>    and MS_ASYNC never undertook to give a guarantee about *when*
>>    the update would occur. (The Linux buffer cache logic will 
>>    ensure that it is flushed out sometime in the near future.)
> 
> Right.  It's a fairly inefficient no-op, though - it actually loops
> over all vmas to do nothing with them.
> 
>> 7. On Linux (and probably many other modern systems), the only
>>    call that has any real use is msync(MS_SYNC), meaning
>>    "flush the buffers *now*, and I want to wait for that to 
>>    complete, so that I can then continue safe in the knowledge
>>    that my data has landed on a device". That's useful if we
>>    want insurance for our data in the event of a system crash.
> 
> Right.  It's basically another way to call fsync, which is used to
> implement it underneath.  It actually should be a ranged fdatasync,
> but right now it's implemented horribly inefficiently, in that it
> does an fsync call for each vma that it encounters in the range
> specified.
> 
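
Since MS_SYNC is the call that matters, a small wrapper helps with its
one sharp edge: msync() rejects a start address that is not
page-aligned (EINVAL on Linux). A minimal sketch, assuming Linux,
glibc, and a writable /tmp; persist_range() and persist_demo() are
hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: make bytes [addr, addr + len) of a MAP_SHARED
 * mapping durable. msync() insists on a page-aligned start address, so
 * round addr down to a page boundary and grow len by the same amount.
 * Returns 0 on success, -1 with errno set on failure. */
int persist_range(void *addr, size_t len)
{
    uintptr_t pagesz = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(pagesz - 1);
    return msync((void *)start, len + ((uintptr_t)addr - start), MS_SYNC);
}

/* Demo: dirty 5 bytes at an unaligned offset and persist just them.
 * Returns 0 if the raw unaligned msync() fails with EINVAL and
 * persist_range() succeeds; -1 otherwise. */
int persist_demo(void)
{
    char path[] = "/tmp/persist-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *map = MAP_FAILED;
    if (ftruncate(fd, (off_t)len) == 0)
        map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map + 123, "hello", 5);
    int unaligned_einval =
        (msync(map + 123, 5, MS_SYNC) == -1 && errno == EINVAL);
    int ok = (persist_range(map + 123, 5) == 0);
    munmap(map, len);
    return (unaligned_einval && ok) ? 0 : -1;
}
```

Given the implementation described above (an fsync per vma in the
range), when the file descriptor is still at hand a single
fdatasync(fd) may be the simpler way to get the same guarantee.
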
>> 8. POSIX does not mandate a unified cache system. Thus,
>>    we have MS_ASYNC and MS_INVALIDATE in the standard, and
>>    the standard says nothing (AFAIK) about whether munmap()
>>    will flush data. On Linux (and probably most modern systems),
>>    we're fine. But portable applications that care about
>>    standards and non-unified caches need to use msync().
>>
>>    My advice: To ensure that the contents of a shared file
>>    mapping are written to the underlying file--even on bad old
>>    implementations--a call to msync() should be made before 
>>    unmapping a mapping with munmap().
> 
> Agreed.
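
The advice above, written out as a helper. A minimal sketch, assuming
POSIX mmap()/msync() semantics, glibc, and a writable /tmp;
sync_and_unmap() and sync_and_unmap_demo() are hypothetical names, and
unmapping even when msync() fails is a design choice of this sketch,
not something POSIX prescribes:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Flush a shared file mapping with msync(MS_SYNC) before removing it
 * with munmap(). addr must be the page-aligned address returned by
 * mmap(). The mapping is removed even if msync() fails, so it is not
 * leaked; the msync() error is then reported. Returns 0 on success,
 * -1 with errno set on failure. */
int sync_and_unmap(void *addr, size_t len)
{
    int rc = msync(addr, len, MS_SYNC);
    int saved_errno = errno;
    if (munmap(addr, len) == -1)
        return -1;
    if (rc == -1) {
        errno = saved_errno;
        return -1;
    }
    return 0;
}

/* Hypothetical demo: write through a mapping, sync and unmap it, then
 * read the bytes back with pread(). Returns 0 if they match, -1 otherwise. */
int sync_and_unmap_demo(void)
{
    char path[] = "/tmp/unmap-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *map = MAP_FAILED;
    if (ftruncate(fd, (off_t)len) == 0)
        map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, "durable", 7);
    char buf[7];
    int ok = sync_and_unmap(map, len) == 0
             && pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf
             && memcmp(buf, "durable", 7) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

On Linux the pread() would see the bytes even without the msync(),
thanks to the unified cache; the msync(MS_SYNC) is what buys
durability and portability to non-unified implementations.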

Thanks for checking all of this over and thanks also
for confirming that I learned my lessons well in the
"Jamie Lokier school of tough technical reviewing" ;-).

>> 9. The mmap() man page says this:
>>
>>        MAP_SHARED
>>            Share this mapping.  Updates to the mapping are
>>            visible to other processes that map this file, and
>>            are carried through to the underlying file.  The
>>            file may not actually be updated until msync(2) or
>>            munmap() is called.
>>
>>    I believe the piece "or munmap()" is misleading. It implies
>>    that munmap() must trigger a sync action. I don't think this
>>    is true. All that it is required to do is remove some range
>>    of pages from the process's virtual address space. I'm
>>    inclined to remove those words, but I'd like to see if any
>>    FS person has a correction to my understanding first.
> 
> I would expect non-coherent systems to update their caches on munmap,
> but POSIX does not seem to require this, and I can't find any language
> to that effect in the HP-UX man page, even though HP-UX is a system I
> remember as non-coherent until the end.

Yes, that's how I read it too. POSIX seems to have no requirements here,
so I assume it was catering to the lowest common denominator.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
