From: Dave Chinner <david@fromorbit.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@osdl.org>, Christoph Hellwig <hch@lst.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Hugh Dickins <hughd@google.com>,
	Ingo Molnar <mingo@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org, Matthew Wilcox <willy@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org
Subject: Re: [PATCH] dax, pmem: add support for msync
Date: Wed, 2 Sep 2015 08:49:22 +1000
Message-ID: <20150901224922.GR3902@dastard>
In-Reply-To: <20150901100804.GA7045@node.dhcp.inet.fi>

On Tue, Sep 01, 2015 at 01:08:04PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
> > On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> > Even for DAX, msync has to call vfs_fsync_range() for the
> > filesystem to commit the backing store allocations to stable
> > storage, so there's no getting around the fact that msync is the
> > wrong place to be flushing DAX mappings to persistent storage.
> 
> Why?
> IIUC, msync() doesn't have any requirements wrt metadata, right?

Of course it does. If the backing store allocation has not been
committed, then after a crash there will be a hole in the file, and
so it will read as zeroes regardless of what data was written and
flushed.
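
To make the sequence concrete, here is a minimal userspace sketch of
the case being described: a store into a hole through a shared
mapping, followed by msync(MS_SYNC). The helper name and its
read-back return value are made up for illustration, and nothing here
is DAX-specific or simulates a crash. It only shows the calls after
which an application expects its data to be durable, which in turn
requires the block allocation triggered by the write fault to be
committed.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch under discussion): store
 * into a hole through a shared mapping, then msync the range.
 * Returns the first byte read back from the file, or -1 on error.
 */
int store_into_hole_and_msync(const char *path, size_t len,
                              unsigned char value)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Extend the file without writing any data. On filesystems with
     * sparse-file support the range is now a hole. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The first store takes a write fault; the filesystem has to
     * allocate backing store for the page at this point. */
    memset(p, value, len);

    /* The application expects the data to survive a crash once this
     * returns, so the allocation must be made stable as well. */
    int rc = msync(p, len, MS_SYNC);

    unsigned char readback = 0;
    if (rc == 0 && pread(fd, &readback, 1, 0) != 1)
        rc = -1;

    munmap(p, len);
    close(fd);
    return rc == 0 ? (int)readback : -1;
}
```

If only the data were flushed and the allocation were lost in a
crash, reading the same offset after recovery would return zeroes
rather than the stored value.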

> > I pointed this out almost 6 months ago (i.e. that fsync was broken)
> > and hinted at how to solve it. Fix fsync, and msync gets fixed for
> > free:
> > 
> > https://lists.01.org/pipermail/linux-nvdimm/2015-March/000341.html
> > 
> > I've also reported to Willy that DAX write page faults don't work
> > correctly, either. xfstests generic/080 exposes this: a read
> > from a page followed immediately by a write to that page does not
> > result in ->page_mkwrite being called on the write and so
> > backing store is not allocated for the page, nor are the timestamps
> > for the file updated. This will also result in fsync (and msync)
> > not working properly.
> 
> Is that because XFS doesn't provide vm_ops->pfn_mkwrite?

I didn't know that had been committed. I don't recall seeing a pull
request with that in it, none of the XFS DAX patches conflicted
against it, and there have been no runtime errors. I'll fix it up.

As such, shouldn't there be a check in the VM (in ->mmap callers)
that, if the vma is returned with VM_MIXEDMAP set, ->pfn_mkwrite
is not NULL?  It's now clear to me that any filesystem that sets
VM_MIXEDMAP needs to support both page_mkwrite and pfn_mkwrite,
and such a check would have caught this immediately...
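
Roughly, the check I have in mind looks like the sketch below. It is
a userspace mock so it can be compiled and run on its own: the struct
layouts, the flag value, the handler signatures and all of the mock_*
names are stand-ins I made up, not the kernel's definitions (the real
flag is spelled VM_MIXEDMAP in the kernel source). Where exactly such
a check would live in the ->mmap call path is left open.

```c
#include <errno.h>
#include <stddef.h>

/* Mock stand-ins for illustration only; NOT the kernel definitions. */
#define MOCK_VM_MIXEDMAP 0x1UL

struct mock_vm_ops {
    int (*page_mkwrite)(void);   /* simplified signature */
    int (*pfn_mkwrite)(void);    /* simplified signature */
};

struct mock_vma {
    unsigned long flags;
    const struct mock_vm_ops *ops;
};

/*
 * Hypothetical post-->mmap sanity check: a mixed-map vma without a
 * pfn_mkwrite handler is rejected instead of silently skipping the
 * write-fault notification.
 */
int mock_check_mixedmap_ops(const struct mock_vma *vma)
{
    if (!(vma->flags & MOCK_VM_MIXEDMAP))
        return 0;               /* not a mixed map, nothing to check */
    if (vma->ops == NULL || vma->ops->pfn_mkwrite == NULL)
        return -EINVAL;         /* write faults on pfns would be missed */
    return 0;
}

/* Placeholder handler so the check can be exercised. */
int mock_mkwrite_handler(void)
{
    return 0;
}
```

A filesystem that sets the flag but only fills in page_mkwrite would
fail this check the first time a file is mapped, rather than showing
up later as a missed allocation or timestamp update.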

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
