* [PATCH v7 00/14] dax: fix dma vs truncate/hole-punch
@ 2018-03-21 22:57 Dan Williams
[not found] ` <152167302988.5268.4370226749268662682.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Dan Williams @ 2018-03-21 22:57 UTC (permalink / raw)
To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
Cc: Michal Hocko, jack-AlSwsSmVLrQ, kbuild test robot,
Darrick J. Wong, Dave Hansen, Dave Chinner,
Jérôme Glisse, Andrew Morton,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-xfs-u79uwXL29TY76Z2rM5mHXA, Andreas Dilger, Alexander Viro,
Jan Kara, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
linux-ext4-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
Matthew Wilcox
Changes since v6 [1]:
* Collect some more reviewed-by's from Christoph, thanks Christoph!
* Rework XFS_MMAPLOCK_EXCL handling to *permit* rather than require
XFS_MMAPLOCK_EXCL to be held over calls to xfs_break_layouts()
(Christoph)
* Clean up kbuild robot reports against "ext2, dax: introduce ext2_dax_aops"
and "fs, dax: use page->mapping to warn if truncate collides with a
busy page".
* Squash the xfs_break_leased_layouts() 'did_unlock' rework into "xfs:
prepare xfs_break_layouts() for another layout type" (Christoph)
* Communicate the 'did_unlock' condition with an out parameter rather
than a positive error code (Christoph). A few other small / welcome
clean ups fell out as a result.
* Rename BREAK_TRUNCATE to BREAK_UNMAPI to make it clear the
implementation is concerned with any and all inode extent unmap
events, not just truncate(2). (Darrick)
* Rebase the branch on commit 6b2bb7265f0b "sched/wait: Introduce
wait_var_event()" from tip.git/sched/wait. Thanks Peter!
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-March/014806.html
---
Background:
get_user_pages() in the filesystem pins file backed memory pages for
access by devices performing dma. However, it only pins the memory pages,
not the page-to-file offset association. If a file is truncated the
pages are mapped out of the file and dma may continue indefinitely into
a page that is owned by a device driver. This breaks coherency of the
file vs dma, but the assumption is that if userspace wants the
file-space truncated it does not matter what data is inbound from the
device, it is not relevant anymore. The only expectation is that dma can
safely continue while the filesystem reallocates the block(s).
Problem:
This expectation that dma can safely continue while the filesystem
changes the block map is broken by dax. With dax the target dma page
*is* the filesystem block. The model of leaving the page pinned for dma,
but truncating the file block out of the file, means that the filesystem
is free to reallocate a block under active dma to another file and now
the expected data-incoherency situation has turned into active
data-corruption.
Solution:
Defer all filesystem operations that change the block map (fallocate(),
truncate()) on a dax mode file while any page/block in the file is under
active dma. This solution
assumes that dma is transient. Cases where dma operations are known to
not be transient, like RDMA, have been explicitly disabled via
commits like 5f1d43de5416 "IB/core: disable memory registration of
filesystem-dax vmas".
The dax_layout_busy_page() routine is called by filesystems with a lock
held against mm faults (i_mmap_lock) to find pinned / busy dax pages.
The process of looking up a busy page invalidates all mappings so that
any subsequent get_user_pages() blocks on i_mmap_lock.
The filesystem continues to call dax_layout_busy_page() until it finally
returns no more active pages. This approach assumes that the page
pinning is transient; if that assumption is violated, the system would
likely have hung from the uncompleted I/O.
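
The retry loop described above can be modelled in userspace roughly as
follows. This is only an illustrative sketch, not the code in this
series: every model_* name is hypothetical, model_busy_block() merely
stands in for dax_layout_busy_page(), and dma completion is simulated by
dropping one pin per iteration instead of dropping the lock and sleeping.

```c
#include <stddef.h>

/* Hypothetical userspace model: one dma pin counter per file block. */
struct model_file {
	int pins[8];		/* outstanding dma pins per block */
	size_t nblocks;		/* number of valid entries in pins[] */
};

/*
 * Stand-in for dax_layout_busy_page(): return the index of the first
 * block that still has a pin, or -1 when nothing is busy.
 */
static int model_busy_block(const struct model_file *f)
{
	for (size_t i = 0; i < f->nblocks; i++)
		if (f->pins[i] > 0)
			return (int)i;
	return -1;
}

/* Stand-in for a dma completion dropping one pin on a block. */
static void model_dma_complete(struct model_file *f, int block)
{
	if (f->pins[block] > 0)
		f->pins[block]--;
}

/*
 * Keep asking for a busy block until none is left. Returns how many
 * times the caller had to wait. The loop only terminates because pins
 * are assumed to be transient.
 */
static int model_break_layouts(struct model_file *f)
{
	int waits = 0;
	int busy;

	while ((busy = model_busy_block(f)) >= 0) {
		/* a real implementation would unlock and sleep here */
		model_dma_complete(f, busy);
		waits++;
	}
	return waits;
}
```

With pins of 2 and 1 on two blocks, the model waits three times before
the truncate-like operation may proceed.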
---
Dan Williams (14):
dax: store pfns in the radix
fs, dax: prepare for dax-specific address_space_operations
block, dax: remove dead code in blkdev_writepages()
xfs, dax: introduce xfs_dax_aops
ext4, dax: introduce ext4_dax_aops
ext2, dax: introduce ext2_dax_aops
fs, dax: use page->mapping to warn if truncate collides with a busy page
mm, dax: enable filesystems to trigger dev_pagemap ->page_free callbacks
mm, dev_pagemap: introduce CONFIG_DEV_PAGEMAP_OPS
memremap: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
mm, fs, dax: handle layout changes to pinned dax mappings
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
xfs: prepare xfs_break_layouts() for another layout type
xfs, dax: introduce xfs_break_dax_layouts()
drivers/dax/super.c | 96 +++++++++++++++++--
drivers/nvdimm/pmem.c | 3 -
fs/Kconfig | 1
fs/block_dev.c | 5 -
fs/dax.c | 231 ++++++++++++++++++++++++++++++++++++----------
fs/ext2/ext2.h | 1
fs/ext2/inode.c | 43 +++++----
fs/ext2/namei.c | 18 ----
fs/ext2/super.c | 6 +
fs/ext4/inode.c | 38 ++++++--
fs/ext4/super.c | 6 +
fs/libfs.c | 27 +++++
fs/xfs/xfs_aops.c | 21 +++-
fs/xfs/xfs_aops.h | 1
fs/xfs/xfs_file.c | 76 ++++++++++++++-
fs/xfs/xfs_inode.h | 16 +++
fs/xfs/xfs_ioctl.c | 8 --
fs/xfs/xfs_iops.c | 21 +++-
fs/xfs/xfs_pnfs.c | 16 ++-
fs/xfs/xfs_pnfs.h | 6 +
fs/xfs/xfs_super.c | 20 ++--
include/linux/dax.h | 46 ++++++++-
include/linux/fs.h | 3 +
include/linux/memremap.h | 28 ++----
include/linux/mm.h | 61 +++++++++---
kernel/memremap.c | 32 +++++-
mm/Kconfig | 5 +
mm/gup.c | 5 +
mm/hmm.c | 13 ---
mm/swap.c | 3 -
30 files changed, 638 insertions(+), 218 deletions(-)
* [PATCH v7 05/14] ext4, dax: introduce ext4_dax_aops
[not found] ` <152167302988.5268.4370226749268662682.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2018-03-21 22:57 ` Dan Williams
[not found] ` <152167305782.5268.13485258587227210521.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Dan Williams @ 2018-03-21 22:57 UTC (permalink / raw)
To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
Cc: jack-AlSwsSmVLrQ, david-FqsqvQoI3Ljby3iVrkZq2A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-xfs-u79uwXL29TY76Z2rM5mHXA, Andreas Dilger,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
linux-ext4-u79uwXL29TY76Z2rM5mHXA, hch-jcswGhMUV9g
In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Otherwise, direct-I/O
triggers incorrect page cache assumptions and warnings.
Cc: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
Cc: Andreas Dilger <adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
Cc: linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Signed-off-by: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
fs/ext4/inode.c | 38 +++++++++++++++++++++++++++++++-------
1 file changed, 31 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c94780075b04..f9884e41cb39 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2725,12 +2725,6 @@ static int ext4_writepages(struct address_space *mapping,
percpu_down_read(&sbi->s_journal_flag_rwsem);
trace_ext4_writepages(inode, wbc);
- if (dax_mapping(mapping)) {
- ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev,
- wbc);
- goto out_writepages;
- }
-
/*
* No pages to write? This is mainly a kludge to avoid starting
* a transaction for special inodes like journal inode on last iput()
@@ -2955,6 +2949,27 @@ static int ext4_writepages(struct address_space *mapping,
return ret;
}
+static int ext4_dax_writepages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ int ret;
+ long nr_to_write = wbc->nr_to_write;
+ struct inode *inode = mapping->host;
+ struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
+
+ if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+ return -EIO;
+
+ percpu_down_read(&sbi->s_journal_flag_rwsem);
+ trace_ext4_writepages(inode, wbc);
+
+ ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, wbc);
+ trace_ext4_writepages_result(inode, wbc, ret,
+ nr_to_write - wbc->nr_to_write);
+ percpu_up_read(&sbi->s_journal_flag_rwsem);
+ return ret;
+}
+
static int ext4_nonda_switch(struct super_block *sb)
{
s64 free_clusters, dirty_clusters;
@@ -3946,6 +3961,13 @@ static const struct address_space_operations ext4_da_aops = {
.error_remove_page = generic_error_remove_page,
};
+static const struct address_space_operations ext4_dax_aops = {
+ .direct_IO = ext4_direct_IO,
+ .writepages = ext4_dax_writepages,
+ .set_page_dirty = noop_set_page_dirty,
+ .invalidatepage = noop_invalidatepage,
+};
+
void ext4_set_aops(struct inode *inode)
{
switch (ext4_inode_journal_mode(inode)) {
@@ -3958,7 +3980,9 @@ void ext4_set_aops(struct inode *inode)
default:
BUG();
}
- if (test_opt(inode->i_sb, DELALLOC))
+ if (IS_DAX(inode))
+ inode->i_mapping->a_ops = &ext4_dax_aops;
+ else if (test_opt(inode->i_sb, DELALLOC))
inode->i_mapping->a_ops = &ext4_da_aops;
else
inode->i_mapping->a_ops = &ext4_aops;
* Re: [PATCH v7 05/14] ext4, dax: introduce ext4_dax_aops
[not found] ` <152167305782.5268.13485258587227210521.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2018-03-29 15:40 ` Jan Kara
[not found] ` <20180329154035.lvsepjvt6vcplshw-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2018-03-29 15:40 UTC (permalink / raw)
To: Dan Williams
Cc: Jan Kara, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
david-FqsqvQoI3Ljby3iVrkZq2A, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-xfs-u79uwXL29TY76Z2rM5mHXA, Andreas Dilger,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
linux-ext4-u79uwXL29TY76Z2rM5mHXA, hch-jcswGhMUV9g
On Wed 21-03-18 15:57:37, Dan Williams wrote:
> In preparation for the dax implementation to start associating dax pages
> to inodes via page->mapping, we need to provide a 'struct
> address_space_operations' instance for dax. Otherwise, direct-I/O
> triggers incorrect page cache assumptions and warnings.
>
> Cc: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
> Cc: Andreas Dilger <adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
> Cc: linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> Signed-off-by: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Looks good, just one nit below.
> @@ -3946,6 +3961,13 @@ static const struct address_space_operations ext4_da_aops = {
> .error_remove_page = generic_error_remove_page,
> };
>
> +static const struct address_space_operations ext4_dax_aops = {
> + .direct_IO = ext4_direct_IO,
So ext4_direct_IO() for IS_DAX() files will just bail out. So could you
just provide ext4_dax_direct_IO() which will bail out and use it here? With
a similar comment as in xfs_vm_direct_IO() that open still needs this
method set... Thanks!
Honza
--
Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
SUSE Labs, CR
* Re: [PATCH v7 05/14] ext4, dax: introduce ext4_dax_aops
[not found] ` <20180329154035.lvsepjvt6vcplshw-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
@ 2018-03-29 18:09 ` Christoph Hellwig
[not found] ` <20180329180927.GA16055-jcswGhMUV9g@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2018-03-29 18:09 UTC (permalink / raw)
To: Jan Kara
Cc: Theodore Ts'o, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
david-FqsqvQoI3Ljby3iVrkZq2A, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-xfs-u79uwXL29TY76Z2rM5mHXA, Andreas Dilger,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-ext4-u79uwXL29TY76Z2rM5mHXA, hch-jcswGhMUV9g
On Thu, Mar 29, 2018 at 05:40:35PM +0200, Jan Kara wrote:
> So ext4_direct_IO() for IS_DAX() files will just bail out. So could you
> just provide ext4_dax_direct_IO() which will bail out and use it here? With
> a similar comment as in xfs_vm_direct_IO() that open still needs this
> method set... Thanks!
In fact a common noop_direct_IO might make sense.
* Re: [PATCH v7 05/14] ext4, dax: introduce ext4_dax_aops
[not found] ` <20180329180927.GA16055-jcswGhMUV9g@public.gmane.org>
@ 2018-03-29 22:47 ` Dan Williams
0 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2018-03-29 22:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Theodore Ts'o, linux-nvdimm, david, Linux Kernel Mailing List,
linux-xfs, Andreas Dilger, linux-fsdevel, Jan Kara, linux-ext4
On Thu, Mar 29, 2018 at 11:09 AM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
> On Thu, Mar 29, 2018 at 05:40:35PM +0200, Jan Kara wrote:
>> So ext4_direct_IO() for IS_DAX() files will just bail out. So could you
>> just provide ext4_dax_direct_IO() which will bail out and use it here? With
>> a similar comment as in xfs_vm_direct_IO() that open still needs this
>> method set... Thanks!
>
> In fact a common noop_direct_IO might make sense.
Ok, I introduced noop_direct_IO() in "fs, dax: prepare for
dax-specific address_space_operations", and cleaned up xfs, ext4, and
ext2 accordingly. Let me know if you want to see a resend of the
series with those changes. Otherwise this will appear in -next
shortly.
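
For readers following along, the reason a stub direct_IO method is
useful can be modelled in userspace as below. This is a hedged sketch
and not the code from the series: the model_* names are hypothetical,
the -EINVAL return value of the stub is an assumption, and
model_open_direct() only imitates the "open still needs this method
set" check that Jan refers to.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical cut-down operations table with a single method. */
struct model_aops {
	long (*direct_IO)(void *iocb, void *iter);
};

/*
 * Stub in the spirit of noop_direct_IO(): it is never expected to do
 * I/O, it only needs to exist. Returning -EINVAL is an assumption.
 */
static long model_noop_direct_IO(void *iocb, void *iter)
{
	(void)iocb;
	(void)iter;
	return -EINVAL;
}

/*
 * Imitates an open-time O_DIRECT check: direct I/O is refused unless
 * the operations table provides a direct_IO method at all.
 */
static int model_open_direct(const struct model_aops *aops)
{
	if (!aops || !aops->direct_IO)
		return -EINVAL;
	return 0;
}
```

In this model, a table that sets .direct_IO to the stub passes the
open-time check, while a table that leaves it NULL is rejected.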