* [PATCH 0/2] Add updated DAX locking to ext2
@ 2015-10-09 22:02 Ross Zwisler
2015-10-09 22:02 ` [PATCH 1/2] dax: dax_pfn_mkwrite() truncate race check Ross Zwisler
0 siblings, 1 reply; 2+ messages in thread
From: Ross Zwisler @ 2015-10-09 22:02 UTC (permalink / raw)
To: linux-kernel
Cc: Ross Zwisler, Alexander Viro, Jan Kara, Matthew Wilcox,
linux-ext4, linux-fsdevel, Andrew Morton, Dan Williams,
Dave Chinner, linux-nvdimm, Matthew Wilcox, Andreas Dilger
The first patch in this series is a somewhat related bug fix. The second patch
adds new locking to ext2 to isolate DAX faults (page faults, PMD faults, page
mkwrite and pfn mkwrite) from ext2 operations that modify a given inode's data
block allocations.
In my first attempt at this fix I attempted to expand the protection offered by
ext2_inode_info->truncate_mutex so that we took truncate_mutex before our DAX
fault paths were entered. The DAX fault handlers would then call a version of
ext2_get_block() that assumed truncate_mutex was already held.
This *almost* works, but it turns out that it introduces a lock ordering
inversion between truncate_mutex and a given page's page_lock. With my series
as it is, the lock ordering looks like this:
mmap_sem (MM)
ext2_inode_info->dax_sem
sb_start_pagefault (vfs, freeze - taken in DAX)
address_space->i_mmap_rwsem or page_lock (mutually exclusive in DAX)
ext2_inode_info->truncate_mutex
With truncate_mutex being taken before the page fault handler, as in my initial
attempt, our lock ordering is instead:
mmap_sem (MM)
ext2_inode_info->truncate_mutex
sb_start_pagefault (vfs, freeze - taken in DAX)
address_space->i_mmap_rwsem or page_lock (mutually exclusive in DAX)
This is fine for DAX, but it turns out that you can't actually take
truncate_mutex before a page_lock because some operations call ext2_get_block()
while already holding page_lock from calls higher in the stack. This means
that whatever locks are taken in ext2_get_block() must be below page_lock in
the locking order, else you can deadlock. To solve this I had to introduce a
new lock that would sit above page_lock in the locking order. This is the new
"dax_sem" semaphore.
Folks with ext2 experience, have I missed any other cases that need to be
protected by dax_sem? I took a hard look at all the cases in XFS that are
protected by i_mmaplock and convinced myself that truncate was the only one of
them present in ext2.
I've tested this series using xfstests with DAX both enabled and disabled.
Lockdep was on and working, and didn't have any complaints.
Ross Zwisler (2):
dax: dax_pfn_mkwrite() truncate race check
ext2: Add locking for DAX faults
fs/dax.c | 13 +++++++++++--
fs/ext2/ext2.h | 1 +
fs/ext2/file.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
fs/ext2/inode.c | 9 +++++++++
fs/ext2/super.c | 1 +
5 files changed, 69 insertions(+), 6 deletions(-)
--
2.1.0
^ permalink raw reply [flat|nested] 2+ messages in thread* [PATCH 1/2] dax: dax_pfn_mkwrite() truncate race check
2015-10-09 22:02 [PATCH 0/2] Add updated DAX locking to ext2 Ross Zwisler
@ 2015-10-09 22:02 ` Ross Zwisler
0 siblings, 0 replies; 2+ messages in thread
From: Ross Zwisler @ 2015-10-09 22:02 UTC (permalink / raw)
To: linux-kernel
Cc: Ross Zwisler, Alexander Viro, Matthew Wilcox, linux-fsdevel,
Andrew Morton, Dan Williams, Dave Chinner, Jan Kara, linux-nvdimm,
Matthew Wilcox, Andreas Dilger
Update dax_pfn_mkwrite() so that it validates i_size before returning.
This is necessary to ensure that the page fault has not raced with truncate
and is now pointing to a region beyond the end of the current file.
This change is based on a similar outstanding patch for XFS from Dave
Chinner entitled "xfs: add ->pfn_mkwrite support for DAX".
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
fs/dax.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 131fd35a..82be6e4 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -693,12 +693,21 @@ EXPORT_SYMBOL_GPL(dax_pmd_fault);
*/
int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
{
- struct super_block *sb = file_inode(vma->vm_file)->i_sb;
+ struct inode *inode = file_inode(vma->vm_file);
+ struct super_block *sb = inode->i_sb;
+ int ret = VM_FAULT_NOPAGE;
+ loff_t size;
sb_start_pagefault(sb);
file_update_time(vma->vm_file);
+
+ /* check that the faulting page hasn't raced with truncate */
+ size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ if (vmf->pgoff >= size)
+ ret = VM_FAULT_SIGBUS;
+
sb_end_pagefault(sb);
- return VM_FAULT_NOPAGE;
+ return ret;
}
EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
--
2.1.0
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2015-10-09 22:02 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-09 22:02 [PATCH 0/2] Add updated DAX locking to ext2 Ross Zwisler
2015-10-09 22:02 ` [PATCH 1/2] dax: dax_pfn_mkwrite() truncate race check Ross Zwisler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).