From: Boaz Harrosh <boaz@plexistor.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
"Rudoff, Andy" <andy.rudoff@intel.com>,
Dave Chinner <david@fromorbit.com>,
Jeff Moyer <jmoyer@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
linux-nvdimm <linux-nvdimm@ml01.01.org>,
Oleg Nesterov <oleg@redhat.com>, linux-mm <linux-mm@kvack.org>,
Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
Date: Tue, 23 Feb 2016 23:55:24 +0200
Message-ID: <56CCD54C.3010600@plexistor.com>
In-Reply-To: <CAPcyv4gTaikkXCG1fPBVT-0DE8Wst3icriUH5cbQH3thuEe-ow@mail.gmail.com>
On 02/23/2016 06:56 PM, Dan Williams wrote:
> On Tue, Feb 23, 2016 at 6:10 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
>> On 02/23/2016 11:52 AM, Christoph Hellwig wrote:
> [..]
>> Please tell me what you find wrong with my approach?
>
> Setting aside fs interactions you didn't respond to my note about
> architectures where the pmem-aware app needs to flush caches due to
> other non-pmem aware apps sharing the mapping. Non-temporal stores
> guaranteeing persistence on their own is an architecture specific
> feature. I don't see how we can have a generic support for mixed
> MAP_PMEM_AWARE / unaware shared mappings when the architecture
> dependency exists [1].
>
I thought I did. Your Pentium M example below is just fine.
Or I'm missing something really big here, so you will need
to step me through it real slow.
Let's say we have a very silly system
[which BTW will never exist because, again, of the precedent of NFS:
applications are written to work over NFS as well].
So say in this system there are two applications: one writes to all the
even-addressed longs and the second writes to all the odd-addressed longs in
a given page. Then app 1 syncs and so does app 2. Only after both syncs is
the system stable at a known checkpoint, because before the union of the
two syncs we do not know what will persist to harddisk, right?
Now say we are dax and app 1 is MAP_PMEM_AWARE and app 2 is old.
app 1] faults in page X; does its "evens" stores, Pentium M movnt style, directly
to memory; the new values for all the odd addresses are still in cache.
app 2] faults in page X; the page is in the radix tree because it is "old-style";
does its cached "odds" stores; calls a sync that does clflush.
Let's look at a single cacheline.
- If app 2's sync came before app 1's movnt, then in memory we have a zebra of zeros and app 2 values.
But once app 1 came along and did its movnt, all expected values are there, persistent.
- If app 1's stores came before app 2's sync, then we have a zebra of app 1 values + zeros.
But once the sync came we have both values persistent.
In either case we are guaranteed persistence when both apps have finished their
run. If we interrupt the run at any point before, we will have zebra cachelines
even if we are talking about a regular harddisk with a regular volatile page cache.
So I fail to see what is broken, please explain. What broken scenario are you
seeing that would have worked before, with dax or non-dax?
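The single-cacheline walk-through above can be sketched as a toy model. This is
only an illustration under stated assumptions, not a description of any real CPU:
struct line_model, the helper names and the values 1 and 2 are made up for the
sketch, the non-temporal stores are assumed to go straight to memory, and
dirtiness is tracked per word where real hardware tracks it per line.

```c
#include <assert.h>
#include <string.h>

#define WORDS_PER_LINE 8  /* one 64-byte cacheline viewed as 8-byte longs */

/* Toy model of one cacheline (hypothetical, for illustration only). */
struct line_model {
    long mem[WORDS_PER_LINE];    /* what would survive a power loss */
    long cache[WORDS_PER_LINE];  /* values that so far live only in cache */
    int  dirty[WORDS_PER_LINE];  /* 1 if cache[i] is newer than mem[i] */
};

/* App 1: movnt-style stores to the even words.
 * ASSUMPTION: they bypass the cache and land directly in memory. */
static void app1_nt_store_evens(struct line_model *l, long val)
{
    for (int i = 0; i < WORDS_PER_LINE; i += 2)
        l->mem[i] = val;
}

/* App 2: ordinary cached stores to the odd words. */
static void app2_cached_store_odds(struct line_model *l, long val)
{
    for (int i = 1; i < WORDS_PER_LINE; i += 2) {
        l->cache[i] = val;
        l->dirty[i] = 1;
    }
}

/* App 2's sync: write the dirty words back to memory. */
static void app2_sync(struct line_model *l)
{
    for (int i = 0; i < WORDS_PER_LINE; i++) {
        if (l->dirty[i]) {
            l->mem[i] = l->cache[i];
            l->dirty[i] = 0;
        }
    }
}

/* 1 if memory holds app 1's value (1) in every even word and
 * app 2's value (2) in every odd word, 0 otherwise. */
static int line_complete(const struct line_model *l)
{
    for (int i = 0; i < WORDS_PER_LINE; i++)
        if (l->mem[i] != ((i % 2 == 0) ? 1 : 2))
            return 0;
    return 1;
}

/* Run both apps in one of the two orders discussed above and report
 * whether memory is complete once both have finished. */
int run_order(int app2_sync_first)
{
    struct line_model l;
    memset(&l, 0, sizeof(l));

    if (app2_sync_first) {
        app2_cached_store_odds(&l, 2);
        app2_sync(&l);
        assert(!line_complete(&l));  /* zebra of zeros and app 2 values */
        app1_nt_store_evens(&l, 1);
    } else {
        app1_nt_store_evens(&l, 1);
        app2_cached_store_odds(&l, 2);
        assert(!line_complete(&l));  /* zebra of app 1 values and zeros */
        app2_sync(&l);
    }
    return line_complete(&l);
}
```

In this model both run_order(1) and run_order(0) end with a complete line. The
behaviour quoted in [1], where the non-temporal store may be applied in the
cache instead, is exactly the case the model assumes away.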
(For me BTW the two applications that intimately share a single cacheline are one
multi-process application, and for me they need to understand what they are doing.
If the admin upgrades the one he should also upgrade the other. Look at the real
world: who are the heavy users of MAP_SHARED? Can you imagine the gcc linker sharing
the same file with another concurrent application? The only one I know of that remotely
does that is git. And git makes sure to take file locks when it writes such shared records.
Git works over NFS as well.)
But seriously please explain the problem. I do not see one.
> I think Christoph has already pointed out the roadmap. Get the
> existing crop of DAX bugs squashed
Sure, that's always true. I'm a stability freak through and through, ask
the guys who work with me. I like to sleep at night ;-)
> and then *maybe* look at something
> like a MAP_SYNC to opt-out of userspace needing to call *sync.
>
MAP_SYNC is another novelty, which as Dave showed will not be implemented
by such a legacy filesystem as xfs any time soon. sync is needed not only
for memory stores. For me this is a superset of what I proposed, because
again any file write's persistence is built of two parts: durable data and
durable meta-data. My flag says the app takes care of the data; then the other
part can be done another way. For performance's sake, which is what I care about,
the heavy lifting is done in the data path; the meta-data is marginal.
If you want it for completeness' sake then fine, have another flag.
Any newly written app will need to do its new pmem_memcpy magic anyway.
Then we are asking "do we need to call fsync() or not?"
I hate it that you postpone that to never because it would be nice, for
philosophical sake, to not have the app call sync at all, and that all these
years we suffer the performance penalty, instead of putting in a 10-line
patch today that has no risks and, yes, forces new apps to keep the ugly
fsync() call, but delivers the targeted performance today instead of *maybe* never.
My path is a nice intermediate progression to yours. Yours blocks my needs
indefinitely?
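As a rough sketch of the split described above (the app makes its data durable,
then still calls fsync() for the meta-data), something like the following.
MAP_PMEM_AWARE is only the flag proposed in this RFC and is not in mainline, so
it is defined here as 0, a placeholder that turns the mapping into a plain
MAP_SHARED one. app_persist_copy(), write_record() and MAP_LEN are hypothetical
names, and the msync() inside app_persist_copy() is just a portable stand-in for
the app's own "pmem_memcpy magic".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL: the flag proposed by this RFC. Defined as 0 (a no-op
 * placeholder value) so the sketch builds and runs on current kernels. */
#ifndef MAP_PMEM_AWARE
#define MAP_PMEM_AWARE 0
#endif

#define MAP_LEN 4096  /* size of the mapped region, chosen for the sketch */

/* HYPOTHETICAL helper standing in for the app's pmem-aware copy.
 * ASSUMPTION: a real pmem-aware app would use non-temporal stores or
 * explicit cache flushes here; msync() is only a portable substitute. */
static int app_persist_copy(char *base, size_t off, const void *src, size_t len)
{
    memcpy(base + off, src, len);
    return msync(base, MAP_LEN, MS_SYNC);
}

/* Write one record at offset 0 of the file. Returns 0 on success, -1 on error. */
int write_record(const char *path, const void *rec, size_t len)
{
    assert(path != NULL && rec != NULL);
    if (len > MAP_LEN)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    int rc = -1;
    if (ftruncate(fd, MAP_LEN) == 0) {
        char *base = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_PMEM_AWARE, fd, 0);
        if (base != MAP_FAILED) {
            /* Step 1: the app takes care of the data itself.
             * Step 2: the "ugly" fsync() is kept for the meta-data. */
            if (app_persist_copy(base, 0, rec, len) == 0 && fsync(fd) == 0)
                rc = 0;
            munmap(base, MAP_LEN);
        }
    }
    close(fd);
    return rc;
}
```

A caller would do something like write_record("/tmp/example.dat", "hello", 5).
Under the proposal the point is only that step 1 no longer needs the kernel to
track and flush dirty pages for this mapping, while step 2 stays as it is today.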
> [1]: 10.4.6.2 Caching of Temporal vs. Non-Temporal Data
> "Some older CPU implementations (e.g., Pentium M) allowed addresses
> being written with a non-temporal store instruction to be updated
> in-place if the memory type was not WC and line was already in the
> cache."
>
> I wouldn't be surprised if other architectures had similar constraints.
>
Perhaps you are looking at this from the wrong perspective. Pentium M
can do this because the two cores shared the same cache. But we are talking
about POSIX file semantics, not CPU memory semantics. Some of our problems
go away.
Or am I missing something and am completely clueless? Please explain
slowly.
Thanks
Boaz