From: Dave Chinner <david@fromorbit.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>,
linux-nvdimm <linux-nvdimm@ml01.01.org>,
Oleg Nesterov <oleg@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
linux-mm <linux-mm@kvack.org>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
Date: Fri, 26 Feb 2016 09:27:06 +1100
Message-ID: <20160225222705.GD30721@dastard>
In-Reply-To: <x49io1cik45.fsf@segfault.boston.devel.redhat.com>
On Thu, Feb 25, 2016 at 03:57:14PM -0500, Jeff Moyer wrote:
> Good morning, Dave,
>
> Dave Chinner <david@fromorbit.com> writes:
>
> > On Thu, Feb 25, 2016 at 02:11:49PM -0500, Jeff Moyer wrote:
> >> Jeff Moyer <jmoyer@redhat.com> writes:
> >>
> >> >> The big issue we have right now is that we haven't made the DAX/pmem
> >> >> infrastructure work correctly and reliably for general use. Hence
> >> >> adding new APIs to work around cases where we haven't yet provided
> >> >> correct behaviour, let alone optimised for performance is, quite
> >> >> frankly, a clear case of premature optimisation.
> >> >
> >> > Again, I see the two things as separate issues. You need both.
> >> > Implementing MAP_SYNC doesn't mean we don't have to solve the bigger
> >> > issue of making existing applications work safely.
> >>
> >> I want to add one more thing to this discussion, just for the sake of
> >> clarity. When I talk about existing applications and pmem, I mean
> >> applications that already know how to detect and recover from torn
> >> sectors. Any application that assumes hardware does not tear sectors
> >> should be run on a file system layered on top of the btt.
> >
> > Which turns off DAX, and hence makes this a moot discussion because
>
> You're missing the point. You can't take applications that don't know
> how to deal with torn sectors and put them on a block device that does
> not provide power fail write atomicity of a single sector.
Very few applications actually care about atomic sector writes.
Databases are probably the only class of application that really does
care about both single sector and multi-sector atomic write
behaviour, and many of them can be configured to assume single
sector writes can be torn.

Torn user data writes have always been possible, and so pmem does
not introduce any new semantics that applications have to handle.
> > Keep in mind that existing storage technologies tear filesystem data
> > writes, too, because user data writes are filesystem block sized and
> > not atomic at the device level (i.e. typical is 512 byte sector, 4k
> > filesystem block size, so there are 7 points in a single write where
> > a tear can occur on a crash).
>
> You are conflating torn pages (pages being a generic term for anything
> greater than a sector) and torn sectors.
No, I'm not. I'm pointing out that applications that really care
about data integrity already have the capability to recover from
torn sectors in the event of a crash. pmem+DAX does not introduce
any new way of corrupting user data for these applications.
> > IOWs existing storage already has the capability of tearing user
> > data on crash and has been doing so for at least the last 30 years.
>
> And yet applications assume that this doesn't happen. Have a look at
> this:
> https://www.sqlite.org/psow.html
Quote:
"All versions of SQLite up to and including version 3.7.9 assume
that the filesystem does not provide powersafe overwrite. [...]
Hence it seems reasonable to assume powersafe overwrite for modern
disks. [...] Caution is advised though. As Roger Binns noted on the
SQLite developers mailing list: "'poorly written' should be the main
assumption about drive firmware."
IOWs, SQLite used to always assume that single sector overwrites can
be torn, and now that it is optional it recommends that users should
assume this is the way their storage behaves in order to be safe. In
this config, it uses the write ahead log even for single sector
writes, and hence can recover from torn sector writes without having
to detect that the write was torn.
Quote:
"SQLite never assumes that database page writes are atomic,
regardless of the PSOW setting.(1) And hence SQLite is always able
to automatically recover from torn pages induced by a crash."
This is because multi-sector writes are always staged through the
write ahead log and hence are cleanly recoverable after a crash
without having to detect whether a torn write occurred or not.
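
As a rough illustration of that point, here is a toy model of log-then-overwrite recovery. It is not SQLite's actual journal or WAL format: the function names and the 0xAA/0x55 fill patterns are made up for the example, and it assumes the log record itself reached stable storage intact before the in-place overwrite started. It uses the 512 byte sector / 4k block geometry mentioned earlier, crashes the overwrite after every possible number of sectors, and replays the log unconditionally each time.

```python
SECTOR = 512
SECTORS_PER_BLOCK = 8  # 4k filesystem block / 512 byte sector


def torn_write(disk, offset, data, sectors_landed):
    """Overwrite in place, but 'crash' after only sectors_landed sectors."""
    n = sectors_landed * SECTOR
    disk[offset:offset + n] = data[:n]


def recover(disk, log):
    """Replay every logged record over the main file, torn or not."""
    for offset, data in log:
        disk[offset:offset + len(data)] = data


def simulate():
    """Return (sectors_landed, was_torn, recovered_ok) for each crash point."""
    old = bytes([0xAA]) * (SECTOR * SECTORS_PER_BLOCK)
    new = bytes([0x55]) * (SECTOR * SECTORS_PER_BLOCK)
    outcomes = []
    for landed in range(SECTORS_PER_BLOCK + 1):
        disk = bytearray(old)
        log = [(0, new)]  # assumption: the log is durable before the overwrite
        torn_write(disk, 0, new, landed)
        was_torn = bytes(disk) not in (old, new)
        recover(disk, log)  # no torn-write detection, just replay
        outcomes.append((landed, was_torn, bytes(disk) == new))
    return outcomes


if __name__ == "__main__":
    for landed, was_torn, ok in simulate():
        print(f"landed={landed} torn={was_torn} recovered={ok}")
```

Recovery never inspects the main file to decide whether the write was torn. It replays the log regardless, which is why the partially written cases come out the same as the clean ones.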
IOWs, you've just pointed to an application that demonstrates
pmem-safe behaviour - just configure the database files with
"file:somefile.db?psow=0" and it will assume that individual sector
writes can be torn, and it will always recover.
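
A minimal sketch of that configuration using Python's standard sqlite3 module, assuming the linked SQLite build and its VFS honour the psow URI parameter. The helper name and the table are invented for illustration.

```python
import os
import sqlite3
import tempfile


def roundtrip_without_psow(db_path):
    """Open db_path with psow=0, write one row, and read it back."""
    # uri=True makes sqlite3 parse the "file:" URI and its query parameters.
    conn = sqlite3.connect(f"file:{db_path}?psow=0", uri=True)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)"
        )
        conn.execute("INSERT INTO t (v) VALUES (?)", ("hello",))
        conn.commit()
        return conn.execute("SELECT id, v FROM t").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip_without_psow(os.path.join(tmp, "somefile.db")))
```

Nothing visible to the application changes with psow=0. It only changes what SQLite is willing to assume about the storage underneath.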
Hence I'm not sure exactly what point you are trying to make with
this example.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com