From: Jan Kara <jack@suse.cz>
To: "Wilcox, Matthew R" <matthew.r.wilcox@intel.com>
Cc: Jan Kara <jack@suse.cz>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
NeilBrown <neilb@suse.com>
Subject: Re: [PATCH 05/12] dax: Remove synchronization using i_mmap_lock
Date: Mon, 14 Mar 2016 11:01:28 +0100 [thread overview]
Message-ID: <20160314100128.GB6801@quack.suse.cz> (raw)
In-Reply-To: <100D68C7BA14664A8938383216E40DE042207AA9@FMSMSX114.amr.corp.intel.com>
On Thu 10-03-16 20:10:09, Wilcox, Matthew R wrote:
> Here's the race:
>
> CPU 0 CPU 1
> do_cow_fault()
> __do_fault()
> takes sem
> dax_fault()
> releases sem
> truncate()
> unmap_mapping_range()
> i_mmap_lock_write()
> unmap_mapping_range_tree()
> i_mmap_unlock_write()
> do_set_pte()
>
> Holding i_mmap_lock_read() from inside __do_fault() prevents the truncate
> from proceeding until the page is inseted with do_set_pte().
Ah, right. Thanks for reminding me. I was hoping to get rid of this
i_mmap_lock abuse in DAX code but obviously it needs more work :).
Honza
> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz]
> Sent: Thursday, March 10, 2016 12:05 PM
> To: Wilcox, Matthew R
> Cc: Jan Kara; linux-fsdevel@vger.kernel.org; Ross Zwisler; Williams, Dan J; linux-nvdimm@lists.01.org; NeilBrown
> Subject: Re: [PATCH 05/12] dax: Remove synchronization using i_mmap_lock
>
> On Thu 10-03-16 19:55:21, Wilcox, Matthew R wrote:
> > This locking's still necessary. i_mmap_sem has already been released by
> > the time we're back in do_cow_fault(), so it doesn't protect that page,
> > and truncate can have whizzed past and thinks there's nothing to unmap.
> > So a task can have a MAP_PRIVATE page still in its address space after
> > it's supposed to have been unmapped.
>
> I don't think this is possible. Filesystem holds its inode->i_mmap_sem for
> reading when handling the fault. That synchronizes against truncate...
>
> Honza
>
> > -----Original Message-----
> > From: Jan Kara [mailto:jack@suse.cz]
> > Sent: Thursday, March 10, 2016 11:19 AM
> > To: linux-fsdevel@vger.kernel.org
> > Cc: Wilcox, Matthew R; Ross Zwisler; Williams, Dan J; linux-nvdimm@lists.01.org; NeilBrown; Jan Kara
> > Subject: [PATCH 05/12] dax: Remove synchronization using i_mmap_lock
> >
> > At one point DAX used i_mmap_lock so synchronize page faults with page
> > table invalidation during truncate. However these days DAX uses
> > filesystem specific RW semaphores to protect against these races
> > (i_mmap_sem in ext2 & ext4 cases, XFS_MMAPLOCK in xfs case). So remove
> > the unnecessary locking.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/dax.c | 19 -------------------
> > mm/memory.c | 14 --------------
> > 2 files changed, 33 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 9c4d697fb6fc..e409e8fc13b7 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -563,8 +563,6 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
> > pgoff_t size;
> > int error;
> >
> > - i_mmap_lock_read(mapping);
> > -
> > /*
> > * Check truncate didn't happen while we were allocating a block.
> > * If it did, this block may or may not be still allocated to the
> > @@ -597,8 +595,6 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
> > error = vm_insert_mixed(vma, vaddr, dax.pfn);
> >
> > out:
> > - i_mmap_unlock_read(mapping);
> > -
> > return error;
> > }
> >
> > @@ -695,17 +691,6 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
> > if (error)
> > goto unlock_page;
> > vmf->page = page;
> > - if (!page) {
> > - i_mmap_lock_read(mapping);
> > - /* Check we didn't race with truncate */
> > - size = (i_size_read(inode) + PAGE_SIZE - 1) >>
> > - PAGE_SHIFT;
> > - if (vmf->pgoff >= size) {
> > - i_mmap_unlock_read(mapping);
> > - error = -EIO;
> > - goto out;
> > - }
> > - }
> > return VM_FAULT_LOCKED;
> > }
> >
> > @@ -895,8 +880,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > truncate_pagecache_range(inode, lstart, lend);
> > }
> >
> > - i_mmap_lock_read(mapping);
> > -
> > /*
> > * If a truncate happened while we were allocating blocks, we may
> > * leave blocks allocated to the file that are beyond EOF. We can't
> > @@ -1013,8 +996,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > }
> >
> > out:
> > - i_mmap_unlock_read(mapping);
> > -
> > if (buffer_unwritten(&bh))
> > complete_unwritten(&bh, !(result & VM_FAULT_ERROR));
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8132787ae4d5..13f76eb08f33 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2430,8 +2430,6 @@ void unmap_mapping_range(struct address_space *mapping,
> > if (details.last_index < details.first_index)
> > details.last_index = ULONG_MAX;
> >
> > -
> > - /* DAX uses i_mmap_lock to serialise file truncate vs page fault */
> > i_mmap_lock_write(mapping);
> > if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
> > unmap_mapping_range_tree(&mapping->i_mmap, &details);
> > @@ -3019,12 +3017,6 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > if (fault_page) {
> > unlock_page(fault_page);
> > page_cache_release(fault_page);
> > - } else {
> > - /*
> > - * The fault handler has no page to lock, so it holds
> > - * i_mmap_lock for read to protect against truncate.
> > - */
> > - i_mmap_unlock_read(vma->vm_file->f_mapping);
> > }
> > goto uncharge_out;
> > }
> > @@ -3035,12 +3027,6 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > if (fault_page) {
> > unlock_page(fault_page);
> > page_cache_release(fault_page);
> > - } else {
> > - /*
> > - * The fault handler has no page to lock, so it holds
> > - * i_mmap_lock for read to protect against truncate.
> > - */
> > - i_mmap_unlock_read(vma->vm_file->f_mapping);
> > }
> > return ret;
> > uncharge_out:
> > --
> > 2.6.2
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2016-03-14 10:01 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-10 19:18 [RFC] [PATCH 0/12] DAX page fault locking Jan Kara
2016-03-10 19:18 ` [PATCH 01/12] DAX: move RADIX_DAX_ definitions to dax.c Jan Kara
2016-03-11 22:54 ` Ross Zwisler
2016-03-10 19:18 ` [PATCH 02/12] radix-tree: make 'indirect' bit available to exception entries Jan Kara
2016-03-10 19:18 ` [PATCH 03/12] mm: Remove VM_FAULT_MINOR Jan Kara
2016-03-10 19:38 ` Wilcox, Matthew R
2016-03-10 19:48 ` Jan Kara
2016-03-10 19:18 ` [PATCH 04/12] ocfs2: Fix return value from ocfs2_page_mkwrite() Jan Kara
2016-03-10 19:18 ` [PATCH 05/12] dax: Remove synchronization using i_mmap_lock Jan Kara
2016-03-10 19:55 ` Wilcox, Matthew R
2016-03-10 20:05 ` Jan Kara
2016-03-10 20:10 ` Wilcox, Matthew R
2016-03-14 10:01 ` Jan Kara [this message]
2016-03-14 14:51 ` Wilcox, Matthew R
2016-03-15 9:50 ` Jan Kara
2016-03-10 19:18 ` [PATCH 06/12] dax: Remove complete_unwritten argument Jan Kara
2016-03-10 19:18 ` [PATCH 07/12] dax: Fix data corruption for written and mmapped files Jan Kara
2016-03-10 19:18 ` [PATCH 08/12] dax: Fix bogus fault return value on cow faults Jan Kara
2016-03-10 19:18 ` [PATCH 09/12] dax: Allow DAX code to replace exceptional entries Jan Kara
2016-03-10 19:18 ` [PATCH 10/12] dax: Remove redundant inode size checks Jan Kara
2016-03-10 19:18 ` [PATCH 11/12] dax: Disable huge page handling Jan Kara
2016-03-10 19:34 ` Dan Williams
2016-03-10 19:52 ` Jan Kara
2016-03-10 19:18 ` [PATCH 12/12] dax: New fault locking Jan Kara
2016-03-10 23:54 ` NeilBrown
2016-03-15 21:34 ` NeilBrown
2016-03-18 14:16 ` Jan Kara
2016-03-18 15:39 ` Jan Kara
2016-03-22 21:10 ` NeilBrown
2016-03-23 11:00 ` Jan Kara
2016-03-31 4:20 ` NeilBrown
2016-03-31 8:54 ` Jan Kara
2016-04-01 0:34 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160314100128.GB6801@quack.suse.cz \
--to=jack@suse.cz \
--cc=dan.j.williams@intel.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nvdimm@lists.01.org \
--cc=matthew.r.wilcox@intel.com \
--cc=neilb@suse.com \
--cc=ross.zwisler@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).