linux-mm.kvack.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: "Rudoff, Andy" <andy.rudoff@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Jeff Moyer <jmoyer@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	Oleg Nesterov <oleg@redhat.com>, linux-mm <linux-mm@kvack.org>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
Date: Tue, 23 Feb 2016 23:06:44 +1100
Message-ID: <20160223120644.GL25832@dastard>
In-Reply-To: <7168B635-938B-44A0-BECD-C0774207B36D@intel.com>

On Tue, Feb 23, 2016 at 10:07:07AM +0000, Rudoff, Andy wrote:
> 
> > [Hi Andy - care to properly line break after ~75 characters, that makes
> > reading the message a lot easier, thanks!]
> 
> My bad. 
> 
> >> The instructions give you very fine-grain flushing control, but the
> >> downside is that the app must track what it changes at that fine
> >> granularity.  Both models work, but there's a trade-off.
> > 
> > No, the cache flush model simply does not work without a lot of hard
> > work to enable it first.
> 
> It's working well enough to pass tests that simulate crashes and
> various workload tests for the apps involved. And I agree there
> has been a lot of hard work behind it. I guess I'm not sure why you're
> saying it is impossible or not working.
> 
> Let's take an example: an app uses fallocate() to create a DAX file,
> mmap() to map it, msync() to flush changes. The app follows POSIX
> meaning it doesn't expect file metadata to be flushed magically, etc.
> The app is tested carefully and it works correctly.  Now the msync()
> call used to flush stores is replaced by flushing instructions.
> What's broken?

You haven't told the filesystem to flush any dirty metadata required
to access the user data to persistent storage.  If the zeroing and
unwritten extent conversion that is run by the filesystem during
write faults into preallocated blocks isn't persistent, then after a
crash the file will read back as unwritten extents, returning zeros
rather than the data that was written.

msync() calls fsync() on file-backed pages, which makes file metadata
changes persistent.  Indeed, if you read the fdatasync man page, you
might have noticed that it explicitly states that the filesystem
must flush the metadata needed to access the data that is being
synced. IOWs, the filesystem knows about this dirty metadata that
needs to be flushed to ensure data integrity, userspace doesn't.
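
To make the sequence under discussion concrete, here is a minimal
sketch of the preallocate/mmap/store/msync pattern. The helper name
store_and_sync(), the 4 KiB region size and the use of
posix_fallocate() in place of fallocate() (chosen only for
portability) are assumptions made for illustration; nothing here is
DAX-specific, and it will run on any filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE 4096 /* hypothetical region size: one 4 KiB page */

/*
 * Hypothetical helper: store msg into a preallocated, mmap()ed file and
 * ask the kernel to make it durable. Returns 0 on success, -1 on failure.
 */
int store_and_sync(const char *path, const char *msg)
{
    size_t len = strlen(msg) + 1;
    assert(len <= REGION_SIZE);

    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1;

    /* Preallocate the blocks; a filesystem may track them as unwritten
     * extents until the first write fault touches them. */
    if (posix_fallocate(fd, 0, REGION_SIZE) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The store triggers a write fault; any zeroing and extent
     * conversion happens inside the filesystem, out of our sight. */
    memcpy(p, msg, len);

    /* msync() goes through the kernel, so the filesystem gets the
     * chance to flush whatever it needs alongside the data. */
    int rc = msync(p, REGION_SIZE, MS_SYNC);

    munmap(p, REGION_SIZE);
    close(fd);
    return rc;
}
```

Replacing the msync() call in this sketch with userspace cache flush
instructions would remove the only point at which the filesystem is
told that anything needs to be made persistent.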

Not to mention that the filesystem will convert and zero much more
than just a single cacheline (whole pages at minimum, could be 2MB
extents for large pages, etc) so the filesystem may require CPU
cache flushes over a much wider range of cachelines than the
application realises are dirty and require flushing for data
integrity purposes. The filesystem knows about these dirty cache
lines, userspace doesn't.
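
To put rough numbers on that gap, the sketch below counts the cache
lines covered by a region. The 64-byte cache line size is an
assumption (typical for x86-64), and cachelines_in() is a
hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_SIZE 64u /* assumption: 64-byte cache lines */

/* Number of cache lines needed to cover a region of `bytes` bytes,
 * rounding up for a partially covered final line. */
size_t cachelines_in(size_t bytes)
{
    assert(bytes > 0);
    return (bytes + CACHELINE_SIZE - 1) / CACHELINE_SIZE;
}
```

Under those assumptions, an application that dirtied and flushed a
single cache line has covered 1 of the 64 lines in a 4096-byte page,
or 1 of the 32768 lines in a 2MB extent, that the filesystem may have
zeroed on its behalf.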

IOWs, your userspace library may have made sure the data it modifies
has reached its physical location via your userspace CPU cache
flushes, but there can be a lot of stuff it doesn't know about
internal to the filesystem that also needs to be flushed to ensure
data integrity is maintained.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
