From: Boaz Harrosh <boaz@plexistor.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
"Rudoff, Andy" <andy.rudoff@intel.com>,
Dave Chinner <david@fromorbit.com>,
Jeff Moyer <jmoyer@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
linux-nvdimm <linux-nvdimm@ml01.01.org>,
Oleg Nesterov <oleg@redhat.com>, linux-mm <linux-mm@kvack.org>,
Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
Date: Wed, 24 Feb 2016 00:47:44 +0200
Message-ID: <56CCE190.4060408@plexistor.com>
In-Reply-To: <20160223172512.GC15877@linux.intel.com>
On 02/23/2016 07:25 PM, Ross Zwisler wrote:
<>
>
> It seems like we are trying to solve a couple of different problems:
>
> 1) Make page faults faster by skipping any radix tree insertions, tag updates,
> etc.
>
> 2) Make fsync/msync faster by not flushing data that the application says it
> is already making durable from userspace.
>
I fail to see how these are separate issues. The reason you are keeping track
of pages in [1] is exactly because you want to know which ones they are in [2].
Only [2] matters, and [1] is what you thought was a necessary cost.
If you remember, I wanted to solve [1] differently, by iterating over the
extent lists of the mmap range and cl_flushing all the "written" pages in the
range, not only the dirtied ones. In our tests we found that for most real-world
applications and benchmarks this works better than your approach, because
page faults are fast.
There are, however, workloads that are much worse. In any case, your way was easier
because it had a generic solution instead of an FS-specific implementation.
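To make the extent-walking idea concrete, here is a rough userspace sketch in C.
It is an illustration only, not the code we tested: struct extent, flush_line()
and flush_written_extents() are hypothetical names, and the flush is a counter
standing in for a real cache-line flush such as clflush.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64u

/* Hypothetical extent descriptor: a byte range of the file and whether
 * the filesystem marks it as written (allocated and initialized). */
struct extent {
    uint64_t start;   /* file offset of the extent, in bytes */
    uint64_t len;     /* length of the extent, in bytes */
    int written;      /* nonzero if the extent holds written data */
};

/* Stand-in for a real cache-line flush; it only counts calls. */
static size_t flushed_lines;

static void flush_line(uint64_t off)
{
    (void)off;
    flushed_lines++;
}

/* Flush every cache line of every written extent that overlaps
 * [start, end). No per-page dirty tracking is consulted, so clean
 * lines inside a written extent are flushed as well.
 * Returns the number of cache lines flushed by this call. */
static size_t flush_written_extents(const struct extent *ext, size_t n,
                                    uint64_t start, uint64_t end)
{
    size_t before = flushed_lines;

    assert(ext != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!ext[i].written)
            continue;   /* holes and unwritten extents have nothing to flush */

        uint64_t ext_end = ext[i].start + ext[i].len;
        uint64_t lo = ext[i].start > start ? ext[i].start : start;
        uint64_t hi = ext_end < end ? ext_end : end;
        if (lo >= hi)
            continue;   /* extent does not overlap the requested range */

        /* Round down to a cache-line boundary, then walk line by line. */
        for (uint64_t off = lo & ~(uint64_t)(CACHELINE - 1); off < hi;
             off += CACHELINE)
            flush_line(off);
    }
    return flushed_lines - before;
}
```

The trade-off shows up directly in the loop: page faults do no bookkeeping, but
the sync cost grows with the written size of the range rather than with the
amount of data actually dirtied.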
> I agree that your approach seems to improve both of these problems, but I
> would argue that it is an incomplete solution for problem #2 because a
> fsync/msync from the PMEM aware application would still flush any radix tree
> entries from *other* threads that were writing to the same file.
>
No!! You meant applications, because threads belong to the same application. If
a programmer is dumb enough to upgrade one mmap call site to the new flag and keep
all other sites legacy, without the flag and pmem_memcpy, then he can suffer. I do
not care for dumb programmers.
As for two applications, one new and one legacy, writing to the same file, each written
by a different team of programmers: for one, they do not exist. But for two,
this is an administrator issue. Yes, if he allows such a setup he knows that the
performance will not be as good as if both apps were upgraded, but it will still be better than
two legacy apps, because at least all the pages from the new app will not slow-sync.
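For reference, the write path of an upgraded application can be sketched roughly
as below. This is a minimal illustration under assumptions, not a real library
API: sketch_pmem_memcpy(), flush_line() and store_fence() are hypothetical
names, a 64-byte cache line is assumed, and on non-x86 builds the flush is a
no-op. Mapping the file with the proposed MAP_PMEM_AWARE flag is left out so
the sketch builds on any kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_clflush, _mm_sfence */
#endif

#define CACHELINE 64u

/* Flush one cache line; a no-op placeholder on non-x86 builds. */
static void flush_line(const void *p)
{
#if defined(__x86_64__)
    _mm_clflush(p);
#else
    (void)p;
#endif
}

/* Order the flushes before any later stores. */
static void store_fence(void)
{
#if defined(__x86_64__)
    _mm_sfence();
#else
    __sync_synchronize();
#endif
}

/* Copy len bytes into a persistent mapping and flush the touched cache
 * lines from userspace, so a later fsync/msync has no data to flush.
 * Returns the number of cache lines flushed. */
static size_t sketch_pmem_memcpy(void *dst, const void *src, size_t len)
{
    size_t lines = 0;

    assert(dst != NULL && src != NULL);
    if (len == 0)
        return 0;

    memcpy(dst, src, len);

    /* Start at the cache line containing dst and cover [dst, dst + len). */
    uintptr_t p = (uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)dst + len;
    for (; p < end; p += CACHELINE) {
        flush_line((const void *)p);
        lines++;
    }
    store_fence();
    return lines;
}
```

A call site that keeps using plain memcpy on the same mapping never flushes its
own stores, which is the mixed legacy/new case described above.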
> It seems like a more direct solution for #2 above would be to have a
> metadata-only equivalent of fsync/fdatasync, say "fmetasync", which says "I'll
> make the writes I do to my mmaps durable from userspace, but I need you to
> sync all filesystem metadata for me, please".
>
> This would allow a complete separation of data synchronization in userspace
> from metadata synchronization in kernel space by the filesystem code.
>
> By itself a fmetasync() type solution of course would do nothing for issue #1
> - if that was a compelling issue you'd need something like the mmap tag you're
> proposing to skip work on page faults.
>
Again, a novelty solution to a theoretical-only problem. With only very marginal
performance gains. And no users that I can see. And lots of work, including
FS-specific work.
> All that being said, though, I agree with others in the thread that we should
> still be focused on correctness, as we have a lot of correctness issues
> remaining. When we eventually get to the place where we are trying to do
> performance optimizations, those optimizations should be measurement driven.
>
What I'm hoping to do is establish a good practice for pmem-aware apps
that everyone can agree on and that will give us ground to optimize for,
so that pmem apps can start to be written and experimented with.
The patch I sent is so simple and non-intrusive that it can easily
be carried in the noise, and I cannot see how it breaks anything. And yes,
I am measurement driven, which is why I even bother.
Hence the RFC: let us establish a programming model first.
> - Ross
>
Thanks
Boaz