From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Rudoff, Andy" <andy.rudoff@intel.com>,
	Dave Chinner <david@fromorbit.com>,
	Jeff Moyer <jmoyer@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	Oleg Nesterov <oleg@redhat.com>, linux-mm <linux-mm@kvack.org>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
Date: Tue, 23 Feb 2016 10:25:12 -0700
Message-ID: <20160223172512.GC15877@linux.intel.com>
In-Reply-To: <56CC686A.9040909@plexistor.com>

On Tue, Feb 23, 2016 at 04:10:50PM +0200, Boaz Harrosh wrote:
> On 02/23/2016 11:52 AM, Christoph Hellwig wrote:
> <>
> > 
> > And this is BS.  Using msync or fsync might not perform as well as not
> > actually using them, but without them you do not get persistence.  If
> > you use your pmem as a throw away cache that's fine, but for most people
> > that is not the case.
> > 
> 
> Hi Christoph
> 
> That is exactly my suggestion. My approach is *not* that we stop calling
> m/fsync to let the FS clean up.
> 
> In my model we still do that; we only eliminate the m/fsync slowness
> and the page-fault overhead, because the application tells us that we
> do not need to track the modified data cachelines, since it will do so
> itself.
> 
> In my model the job is split:
>  The app takes care of data persistence by requesting MAP_PMEM_AWARE
>  and doing its own cache-line flushing (clflush) / movnt stores,
>  which is the heavy cost.
> 
>  The FS keeps track of metadata persistence as it already does, via the
>  call to m/fsync, whose cost is marginal compared to the heavy IO
>  above.
> 
> Note that the FS is still free to move blocks around, as Dave said:
> lock out page faults, unmap from user space, and let the app fault again
> on a new block. This still works as before; with COW we already flush the
> old block, so no persistence is lost.
> 
> This whole thread started with my patches, and my patches do not say
> "no m/fsync"; they say: make this 3-8 times faster than today if the app
> participates in the heavy lifting.
> 
> Please tell me what you find wrong with my approach.
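
The application-side half of the split Boaz describes can be sketched in userspace C. This is a minimal x86-64 sketch under stated assumptions, not code from the RFC patches: it assumes 64-byte cache lines and uses the compiler intrinsics `_mm_clflush` and `_mm_sfence`; a real pmem library would prefer CLFLUSHOPT/CLWB or non-temporal (movnt) stores where the CPU has them. The helper names `persist_range` and `pmem_store_and_persist` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <immintrin.h>   /* _mm_clflush, _mm_sfence */
#endif

#define CACHELINE_SIZE 64  /* assumption: 64-byte cache lines */

/* Write back every cache line overlapping [addr, addr + len), then fence
 * so the write-backs are ordered before any later stores. */
static void persist_range(const void *addr, size_t len)
{
    assert(addr != NULL || len == 0);
#if defined(__x86_64__)
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE_SIZE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    for (; p < end; p += CACHELINE_SIZE)
        _mm_clflush((const void *)p);
    _mm_sfence();
#else
    /* No portable cache write-back exists; this is only a full barrier. */
    (void)addr;
    (void)len;
    __sync_synchronize();
#endif
}

/* Copy data into a (supposedly DAX-mapped) region and make it durable from
 * userspace, without relying on the kernel's dirty tracking. */
static void pmem_store_and_persist(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    persist_range(dst, len);
}
```

In this model the application would still call msync() or fsync() after a batch of such stores so that the filesystem can persist its own metadata; that call is the cheap half of the split.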

It seems like we are trying to solve a couple of different problems:

1) Make page faults faster by skipping any radix tree insertions, tag updates,
etc.

2) Make fsync/msync faster by not flushing data that the application says it
is already making durable from userspace.

I agree that your approach seems to address both of these problems, but I
would argue that it is an incomplete solution for problem #2, because an
fsync/msync from the PMEM-aware application would still flush any radix tree
entries from *other* threads that were writing to the same file.
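The point about other threads can be made concrete with a toy model. This is a hypothetical sketch, not the DAX implementation: a per-file boolean array stands in for the radix-tree dirty tags, and `toy_write_fault`, `toy_fsync` and the `pmem_aware` field are invented names. A fault through a PMEM-aware mapping skips the tagging, but fsync still has to write back whatever non-aware mappings of the same file have tagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16

/* Toy model: one dirty flag per page stands in for the radix-tree tags. */
struct toy_file {
    bool dirty[NPAGES];
};

struct toy_mapping {
    struct toy_file *file;
    bool pmem_aware;         /* models the proposed MAP_PMEM_AWARE flag */
};

/* Write fault: a normal mapping records the page for a later flush;
 * a PMEM-aware mapping skips the bookkeeping (problem #1). */
static void toy_write_fault(struct toy_mapping *m, size_t pgoff)
{
    assert(pgoff < NPAGES);
    if (!m->pmem_aware)
        m->file->dirty[pgoff] = true;
}

/* fsync/msync: flushes every tagged page of the file, no matter which
 * mapping (or thread) dirtied it. Returns the number of pages flushed. */
static size_t toy_fsync(struct toy_file *f)
{
    size_t flushed = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        if (f->dirty[i]) {
            /* a real implementation would write back the cache lines here */
            f->dirty[i] = false;
            flushed++;
        }
    }
    return flushed;
}
```

For example, if a PMEM-aware writer faults page 1 while another thread faults pages 2 and 3 through a normal mapping of the same file, the aware application's fsync still flushes two pages it never asked the kernel to track.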

It seems like a more direct solution for #2 above would be to have a
metadata-only equivalent of fsync/fdatasync, say "fmetasync", which says "I'll
make the writes I do to my mmaps durable from userspace, but I need you to
sync all filesystem metadata for me, please".

This would allow a complete separation of data synchronization in userspace
from metadata synchronization in kernel space by the filesystem code.
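The proposed division of labour can be sketched as a userspace flow. `fmetasync()` does not exist; it is only the name floated in this message, so the stub below is hypothetical and conservatively falls back to fsync(), which is correct but also writes back data. `append_record` is an invented helper, and the user-space data flush is left as a comment because ordinary page-cache memory cannot be made durable that way.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL: fmetasync() was only proposed in this thread and is not a
 * real syscall. This stand-in falls back to fsync(), which is correct but
 * also writes back data, so it does not deliver the intended saving. */
static int fmetasync(int fd)
{
    return fsync(fd);
}

/* Append len bytes at old_size by growing the file and storing through a
 * shared mapping; only metadata durability is requested from the kernel. */
static int append_record(int fd, off_t old_size, const void *rec, size_t len)
{
    assert(old_size >= 0);
    if (len == 0)
        return fmetasync(fd);
    size_t new_size = (size_t)old_size + len;

    /* Extending i_size is a metadata change; on a DAX filesystem the first
     * write fault on the new range may also allocate blocks. */
    if (ftruncate(fd, (off_t)new_size) != 0)
        return -1;

    char *base = mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    memcpy(base + old_size, rec, len);
    /* On real pmem the application would flush the stored cache lines here
     * (e.g. clflush + sfence); this sketch omits that step. */

    if (munmap(base, new_size) != 0)
        return -1;

    return fmetasync(fd);  /* ask the filesystem for metadata only */
}
```

If such a call existed, the kernel would no longer need per-page dirty tracking for this file at all; the fsync() fallback keeps the sketch correct on today's kernels at the cost of the optimization.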

By itself, an fmetasync()-type solution would of course do nothing for issue
#1; if that were a compelling issue, you'd need something like the mmap flag
you're proposing to skip work on page faults.

All that being said, though, I agree with others in the thread that we should
still be focused on correctness, as we have a lot of correctness issues
remaining.  When we eventually get to the place where we are trying to do
performance optimizations, those optimizations should be measurement-driven.

- Ross



Thread overview: 70+ messages
2016-02-21 17:03 [RFC 0/2] New MAP_PMEM_AWARE mmap flag Boaz Harrosh
2016-02-21 17:04 ` [RFC 1/2] mmap: Define a new " Boaz Harrosh
2016-02-21 17:06 ` [RFC 2/2] dax: Support " Boaz Harrosh
2016-02-21 19:51 ` [RFC 0/2] New " Dan Williams
2016-02-21 20:24   ` Boaz Harrosh
2016-02-21 20:57     ` Dan Williams
2016-02-21 21:23       ` Boaz Harrosh
2016-02-21 22:03         ` Dan Williams
2016-02-21 22:31           ` Dave Chinner
2016-02-22  9:57             ` Boaz Harrosh
2016-02-22 15:34             ` Jeff Moyer
2016-02-22 17:44               ` Christoph Hellwig
2016-02-22 17:58                 ` Jeff Moyer
2016-02-22 18:03                   ` Christoph Hellwig
2016-02-22 18:52                     ` Jeff Moyer
2016-02-23  9:45                       ` Christoph Hellwig
2016-02-22 20:05                 ` Rudoff, Andy
2016-02-23  9:52                   ` Christoph Hellwig
2016-02-23 10:07                     ` Rudoff, Andy
2016-02-23 12:06                       ` Dave Chinner
2016-02-23 17:10                         ` Ross Zwisler
2016-02-23 21:47                           ` Dave Chinner
2016-02-23 22:15                             ` Boaz Harrosh
2016-02-23 23:28                               ` Dave Chinner
2016-02-24  0:08                                 ` Boaz Harrosh
2016-02-23 14:10                     ` Boaz Harrosh
2016-02-23 16:56                       ` Dan Williams
2016-02-23 17:05                         ` Ross Zwisler
2016-02-23 17:26                           ` Dan Williams
2016-02-23 21:55                         ` Boaz Harrosh
2016-02-23 22:33                           ` Dan Williams
2016-02-23 23:07                             ` Boaz Harrosh
2016-02-23 23:23                               ` Dan Williams
2016-02-23 23:40                                 ` Boaz Harrosh
2016-02-24  0:08                                   ` Dave Chinner
2016-02-23 23:28                             ` Jeff Moyer
2016-02-23 23:34                               ` Dan Williams
2016-02-23 23:43                                 ` Jeff Moyer
2016-02-23 23:56                                   ` Dan Williams
2016-02-24  4:09                                     ` Ross Zwisler
2016-02-24 19:30                                       ` Ross Zwisler
2016-02-25  9:46                                         ` Jan Kara
2016-02-25  7:44                                       ` Boaz Harrosh
2016-02-24 15:02                                     ` Jeff Moyer
2016-02-24 22:56                                       ` Dave Chinner
2016-02-25 16:24                                         ` Jeff Moyer
2016-02-25 19:11                                           ` Jeff Moyer
2016-02-25 20:15                                             ` Dave Chinner
2016-02-25 20:57                                               ` Jeff Moyer
2016-02-25 22:27                                                 ` Dave Chinner
2016-02-26  4:02                                                   ` Dan Williams
2016-02-26 10:04                                                     ` Thanumalayan Sankaranarayana Pillai
2016-02-28 10:17                                                       ` Boaz Harrosh
2016-02-28 10:17                                                         ` Boaz Harrosh
2016-03-03 17:38                                                         ` Howard Chu
2016-02-29 20:25                                                   ` Jeff Moyer
2016-02-25 21:08                                               ` Phil Terry
2016-02-25 21:39                                                 ` Dave Chinner
2016-02-25 21:20                                           ` Dave Chinner
2016-02-29 20:32                                             ` Jeff Moyer
2016-02-23 17:25                       ` Ross Zwisler [this message]
2016-02-23 22:47                         ` Boaz Harrosh
2016-02-22 21:50               ` Dave Chinner
2016-02-23 13:51               ` Boaz Harrosh
2016-02-23 14:22                 ` Jeff Moyer
2016-02-22 11:05           ` Boaz Harrosh
2016-03-11  6:44 ` Andy Lutomirski
2016-03-11 19:07   ` Dan Williams
2016-03-11 19:10     ` Andy Lutomirski
2016-03-11 23:02       ` Rudoff, Andy
