From: Dave Chinner <david@fromorbit.com>
To: Boaz Harrosh <boaz@plexistor.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	"Rudoff, Andy" <andy.rudoff@intel.com>,
	Jeff Moyer <jmoyer@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	Oleg Nesterov <oleg@redhat.com>, linux-mm <linux-mm@kvack.org>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
Date: Wed, 24 Feb 2016 11:08:08 +1100
Message-ID: <20160224000808.GJ14668@dastard>
In-Reply-To: <56CCEE09.7070204@plexistor.com>

On Wed, Feb 24, 2016 at 01:40:57AM +0200, Boaz Harrosh wrote:
> On 02/24/2016 01:23 AM, Dan Williams wrote:
> > On Tue, Feb 23, 2016 at 3:07 PM, Boaz Harrosh <boaz@plexistor.com> wrote:
> >> On 02/24/2016 12:33 AM, Dan Williams wrote:
> > 
> >>> The crux of the problem, in my opinion, is that we're asking for an "I
> >>> know what I'm doing" flag, and I expect that's an impossible statement
> >>> for a filesystem to trust generically.  If you can get MAP_PMEM_AWARE
> >>> in, great, but I'm more and more of the opinion that the "I know what
> >>> I'm doing" interface should be something separate from today's trusted
> >>> filesystems.
> >>>
> >>
> >> I disagree. I'm not saying any "trust me I know what I'm doing" flag.
> >> the FS reveals nothing and trusts nothing.
> >> All I'm saying is that the libc library I'm using has the new pmem_memcpy()
> >> and I'm using that instead of the old memcpy(). So the FS does not need to
> >> wipe my face after I eat. Failing to do so just means a bug in the application
> > 
> > "just means a bug in the application"
> > 
> > Who gets the bug report when an app gets its cache syncing wrong and
> > data corruption ensues, and why isn't the fix for that bug that the
> > filesystem simply stops trusting MAP_PMEM_AWARE and syncs
> > cachelines on behalf of the app when it calls sync, as it must for
> > metadata consistency?  Problem solved globally for all broken usages
> > of MAP_PMEM_AWARE, and the flag loses all meaning as a result.
> > 
> 
> Because this will not fix the application's bugs. Because if the application
> is broken then you do not know that this will fix it. It is broken; it failed
> to uphold the contract it had with the Kernel.

That's not the point Dan was making. Data corruption bugs are going
to get reported to the filesystem developers, not the application
developers, because users think that data corruption is always the
fault of the filesystem. How is the filesystem developer going to
know that a) the app is using DAX, b) the app has set some special
"I know what I'm doing" flag, and c) the app doesn't actually know
what it is doing?

We are simply going to assume c) - from long experience I don't
trust any application developer to understand how data integrity
works. App developers who say they understand how filesystems
provide data integrity are almost always completely wrong.

Hell, this thread has made me understand that most pmem developers
don't understand how filesystems provide data integrity guarantees.
Why should we trust application developers to do better?

> It is like saying let's call fsync on file close because broken apps keep
> forgetting to call fsync(). And file close is called even if the app crashes.
> Will Dave do that?

/me points to XFS_ITRUNCATE and xfs_release().

Yes, we already flush data on close in situations where data loss is
common due to stupid application developers refusing to use fsync
because "it's too slow".

ext4 has similar flush on close behaviours for the same reasons.
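
For contrast, here is a minimal sketch of what an application can do
instead of relying on flush-on-close heuristics: sync explicitly before
the new data becomes visible under its final name. This is not taken
from XFS or ext4; the helper name durable_write() is hypothetical, and
the sketch assumes a POSIX system where a directory can be opened and
fsync'd.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace path with data; the update is durable once this returns."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Write into a temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # drain Python's userspace buffer
            os.fsync(f.fileno())  # ask the kernel to persist the file data
        os.replace(tmp, path)     # atomically swap the new file into place
    except BaseException:
        # Best-effort cleanup of the temporary file on failure.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # Persist the rename itself by syncing the containing directory.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

The cost is two fsync calls per update, which is exactly the overhead
that the "it's too slow" argument objects to.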

> No. If an app has a bug like this, failing to call the proper pmem_xxx routine
> in the proper work flow, it might just as well have forgotten to call fsync, or
> maybe is still modifying memory after fsync was called. And your babysitting
> the app will not help.

History tells us otherwise. Users always blame the filesystem first,
and then app developers will refuse to fix their applications
because it would either make their app slow or they think it's a
filesystem problem to solve because they tested on some other
filesystem and it didn't display that behaviour. The result is we
end up working around such problems in the filesystem so that users
don't end up losing data due to shit applications.

The same will happen here - filesystems will end up ignoring this
special "I know what I'm doing" flag because the vast majority of
app developers don't know enough to even realise that they don't
know what they are doing.
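
As a point of reference, the contract that already exists without any
special flag looks roughly like the sketch below: store into a shared
mapping, then explicitly sync before treating the data as durable. The
function name update_mapped() is hypothetical and this is only an
illustration, assuming a Unix system; Python's mmap.flush() is used
here to write the mapping back, and fsync() on the descriptor follows
it.

```python
import mmap
import os


def update_mapped(path, offset, payload):
    """Store payload into a shared file mapping, then sync it."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        if offset < 0 or offset + len(payload) > size:
            raise ValueError("write would fall outside the mapped file")
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            # A plain store into the mapping; nothing is durable yet.
            m[offset:offset + len(payload)] = payload
            # Write the dirty pages of the mapping back to the file.
            m.flush()
        # Sync the file as well, so the filesystem can persist whatever
        # metadata it needs alongside the data.
        os.fsync(fd)
    finally:
        os.close(fd)
```

An application that skips the flush and fsync steps gets no durability
guarantee for its stores, which is the class of bug under discussion.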

I *really* don't care about speed and performance here. I care about
reliability, resilience and data integrity. Speed comes from the
storage hardware being fast, not from filesystems ignoring
reliability, resilience and data integrity.

> > This is the takeaway I've internalized from Dave's pushback of these
> > new mmap flags.
> > 
> 
> We are already used to telling the Firefox guys, you did not call fsync and
> you lost data on a crash.
> 
> We will have a new mantra, "You did not use pmem_memcpy() but used MAP_PMEM_AWARE"
> We have contracts like that between Kernel and apps all the time. I fail to see why
> this one crossed the line for you?

So, you prefer to repeat past mistakes rather than learning from
them. I prefer that we don't make the same mistakes again and then
have to live with them for the next 20 years.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
