From: Boaz Harrosh <boaz@plexistor.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@lst.de>,
linux-kernel@vger.kernel.org,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@osdl.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>, Hugh Dickins <hughd@google.com>,
Ingo Molnar <mingo@redhat.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-nvdimm@lists.01.org, Matthew Wilcox <willy@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
x86@kernel.org
Subject: Re: [PATCH] dax, pmem: add support for msync
Date: Wed, 02 Sep 2015 13:04:01 +0300 [thread overview]
Message-ID: <55E6C991.5010006@plexistor.com> (raw)
In-Reply-To: <20150902031945.GA8916@linux.intel.com>
On 09/02/2015 06:19 AM, Ross Zwisler wrote:
> On Wed, Sep 02, 2015 at 08:21:20AM +1000, Dave Chinner wrote:
>> Which means applications that should "just work" without
>> modification on DAX are now subtly broken and don't actually
>> guarantee data is safe after a crash. That's a pretty nasty
>> landmine, and goes against *everything* we've claimed about using
>> DAX with existing applications.
>>
>> That's wrong, and needs fixing.
>
> I agree that we need to fix fsync as well, and that the fsync solution could
> be used to implement msync if we choose to go that route. I think we might
> want to consider keeping the msync and fsync implementations separate, though,
> for two reasons.
>
> 1) The current msync implementation is much more efficient than what will be
> needed for fsync. Fsync will need to call into the filesystem, traverse all
> the blocks, get kernel virtual addresses from those and then call
> wb_cache_pmem() on those kernel addresses.
I was thinking about this some more, and no this is not what we need to do
because of the virtual-based-cache ARCHs. And what we do for these systems
will also work for physical-based-cache ARCHs.
What we need to do, is dig into the mapping structure and pic up the current
VMA on the call to fsync. Then just flush that one on that virtual address,
(since it is current at the context of the fsync sys call)
And of course we need to do like I wrote, we must call fsync on vm_operations->close
before the VMA mappings goes away. Then an fsync after unmap is a no-op.
> I think this is a necessary evil
> for fsync since you don't have a VMA, but for msync we do and we can just
> flush using the user addresses without any fs lookups.
>
right see above
> 2) I believe that the near-term fsync code will rely on struct pages for
> PMEM, which I believe are possible but optional as of Dan's last patch set:
>
> https://lkml.org/lkml/2015/8/25/841
>
> I believe that this means that if we don't have struct pages for PMEM (becuase
> ZONE_DEVICE et al. are turned off) fsync won't work. I'd be nice not to lose
> msync as well.
Please see above it can be made to work. Actually what we do is the
traversal-kernel-ptr thing, and the fsync-on-unmap. And it works we have heavy
persistence testing and it is all very good.
So no, without pages it can all work very-well. There is only the sync problem
that I intend to fix soon, is only a matter of keeping a dax-dirty inode-list
per sb.
So no this is not an excuse.
Cheers
Boaz
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Boaz Harrosh <boaz@plexistor.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@lst.de>,
linux-kernel@vger.kernel.org,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@osdl.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>, Hugh Dickins <hughd@google.com>,
Ingo Molnar <mingo@redhat.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-nvdimm@ml01.01.org, Matthew Wilcox <willy@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
x86@kernel.org
Subject: Re: [PATCH] dax, pmem: add support for msync
Date: Wed, 02 Sep 2015 13:04:01 +0300 [thread overview]
Message-ID: <55E6C991.5010006@plexistor.com> (raw)
In-Reply-To: <20150902031945.GA8916@linux.intel.com>
On 09/02/2015 06:19 AM, Ross Zwisler wrote:
> On Wed, Sep 02, 2015 at 08:21:20AM +1000, Dave Chinner wrote:
>> Which means applications that should "just work" without
>> modification on DAX are now subtly broken and don't actually
>> guarantee data is safe after a crash. That's a pretty nasty
>> landmine, and goes against *everything* we've claimed about using
>> DAX with existing applications.
>>
>> That's wrong, and needs fixing.
>
> I agree that we need to fix fsync as well, and that the fsync solution could
> be used to implement msync if we choose to go that route. I think we might
> want to consider keeping the msync and fsync implementations separate, though,
> for two reasons.
>
> 1) The current msync implementation is much more efficient than what will be
> needed for fsync. Fsync will need to call into the filesystem, traverse all
> the blocks, get kernel virtual addresses from those and then call
> wb_cache_pmem() on those kernel addresses.
I was thinking about this some more, and no this is not what we need to do
because of the virtual-based-cache ARCHs. And what we do for these systems
will also work for physical-based-cache ARCHs.
What we need to do, is dig into the mapping structure and pic up the current
VMA on the call to fsync. Then just flush that one on that virtual address,
(since it is current at the context of the fsync sys call)
And of course we need to do like I wrote, we must call fsync on vm_operations->close
before the VMA mappings goes away. Then an fsync after unmap is a no-op.
> I think this is a necessary evil
> for fsync since you don't have a VMA, but for msync we do and we can just
> flush using the user addresses without any fs lookups.
>
right see above
> 2) I believe that the near-term fsync code will rely on struct pages for
> PMEM, which I believe are possible but optional as of Dan's last patch set:
>
> https://lkml.org/lkml/2015/8/25/841
>
> I believe that this means that if we don't have struct pages for PMEM (becuase
> ZONE_DEVICE et al. are turned off) fsync won't work. I'd be nice not to lose
> msync as well.
Please see above it can be made to work. Actually what we do is the
traversal-kernel-ptr thing, and the fsync-on-unmap. And it works we have heavy
persistence testing and it is all very good.
So no, without pages it can all work very-well. There is only the sync problem
that I intend to fix soon, is only a matter of keeping a dax-dirty inode-list
per sb.
So no this is not an excuse.
Cheers
Boaz
next prev parent reply other threads:[~2015-09-02 10:04 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-31 18:59 [PATCH] dax, pmem: add support for msync Ross Zwisler
2015-08-31 18:59 ` Ross Zwisler
2015-08-31 18:59 ` Ross Zwisler
2015-08-31 19:06 ` Christoph Hellwig
2015-08-31 19:06 ` Christoph Hellwig
2015-08-31 19:06 ` Christoph Hellwig
2015-08-31 19:26 ` Ross Zwisler
2015-08-31 19:26 ` Ross Zwisler
2015-08-31 19:34 ` Christoph Hellwig
2015-08-31 19:34 ` Christoph Hellwig
2015-08-31 23:38 ` Dave Chinner
2015-08-31 23:38 ` Dave Chinner
2015-09-01 7:06 ` Christoph Hellwig
2015-09-01 7:06 ` Christoph Hellwig
2015-09-01 12:18 ` Boaz Harrosh
2015-09-01 12:18 ` Boaz Harrosh
2015-09-02 19:04 ` Ross Zwisler
2015-09-02 19:04 ` Ross Zwisler
2015-09-02 20:17 ` Kirill A. Shutemov
2015-09-02 20:17 ` Kirill A. Shutemov
2015-09-03 6:32 ` Boaz Harrosh
2015-09-03 6:32 ` Boaz Harrosh
2015-09-03 6:32 ` Boaz Harrosh
2015-09-03 16:44 ` Ross Zwisler
2015-09-03 16:44 ` Ross Zwisler
2015-09-01 22:21 ` Dave Chinner
2015-09-01 22:21 ` Dave Chinner
2015-09-02 3:19 ` Ross Zwisler
2015-09-02 3:19 ` Ross Zwisler
2015-09-02 5:17 ` Dave Chinner
2015-09-02 5:17 ` Dave Chinner
2015-09-02 10:27 ` Boaz Harrosh
2015-09-02 10:27 ` Boaz Harrosh
2015-09-02 14:23 ` Dave Hansen
2015-09-02 14:23 ` Dave Hansen
2015-09-02 14:23 ` Dave Hansen
2015-09-02 15:18 ` Boaz Harrosh
2015-09-02 15:18 ` Boaz Harrosh
2015-09-02 15:39 ` Dave Hansen
2015-09-02 15:39 ` Dave Hansen
2015-09-02 15:39 ` Dave Hansen
2015-09-02 16:00 ` Boaz Harrosh
2015-09-02 16:00 ` Boaz Harrosh
2015-09-02 16:00 ` Boaz Harrosh
2015-09-02 16:19 ` Dave Hansen
2015-09-02 16:19 ` Dave Hansen
2015-09-02 16:19 ` Dave Hansen
2015-09-03 6:41 ` Boaz Harrosh
2015-09-03 6:41 ` Boaz Harrosh
2015-09-02 10:04 ` Boaz Harrosh [this message]
2015-09-02 10:04 ` Boaz Harrosh
2015-09-01 10:08 ` Kirill A. Shutemov
2015-09-01 10:08 ` Kirill A. Shutemov
2015-09-01 11:27 ` Boaz Harrosh
2015-09-01 11:27 ` Boaz Harrosh
2015-09-01 22:49 ` Dave Chinner
2015-09-01 22:49 ` Dave Chinner
2015-09-02 9:13 ` Kirill A. Shutemov
2015-09-02 9:13 ` Kirill A. Shutemov
2015-09-02 9:37 ` Boaz Harrosh
2015-09-02 9:37 ` Boaz Harrosh
2015-09-02 9:37 ` Boaz Harrosh
2015-09-02 9:41 ` Boaz Harrosh
2015-09-02 9:41 ` Boaz Harrosh
2015-09-02 9:41 ` Boaz Harrosh
2015-09-02 9:47 ` Kirill A. Shutemov
2015-09-02 9:47 ` Kirill A. Shutemov
2015-09-02 10:28 ` Boaz Harrosh
2015-09-02 10:28 ` Boaz Harrosh
2015-09-03 0:57 ` Dave Chinner
2015-09-03 0:57 ` Dave Chinner
2015-09-01 13:12 ` Boaz Harrosh
2015-09-01 13:12 ` Boaz Harrosh
2015-09-02 17:47 ` Ross Zwisler
2015-09-02 17:47 ` Ross Zwisler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55E6C991.5010006@plexistor.com \
--to=boaz@plexistor.com \
--cc=akpm@osdl.org \
--cc=dave.hansen@linux.intel.com \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nvdimm@lists.01.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=ross.zwisler@linux.intel.com \
--cc=tglx@linutronix.de \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@linux.intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.