public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Matthew Wilcox <willy@linux.intel.com>,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-nvdimm@lists.01.org
Subject: Re: [PATCH] dax: fix deadlock in __dax_fault
Date: Thu, 24 Sep 2015 12:52:25 +1000	[thread overview]
Message-ID: <20150924025225.GT3902@dastard> (raw)
In-Reply-To: <1443040800-5460-1-git-send-email-ross.zwisler@linux.intel.com>

On Wed, Sep 23, 2015 at 02:40:00PM -0600, Ross Zwisler wrote:
> Fix the deadlock exposed by xfstests generic/075.  Here is the sequence
> that was causing us to deadlock:
> 
> 1) enter __dax_fault()
> 2) page = find_get_page() gives us a page, so skip
> 	i_mmap_lock_write(mapping)
> 3) if (!buffer_mapped(&bh) && !buffer_unwritten(&bh) && !vmf->cow_page)
> 	passes, enter this block
> 4) if (vmf->flags & FAULT_FLAG_WRITE) fails, so do the else case and
> 	i_mmap_unlock_write(mapping);
> 	return dax_load_hole(mapping, page, vmf);
> 
> This causes us to up_write() a semaphore that we weren't holding.
> 
> The up_write() on a semaphore we didn't down_write() happens twice in
> a row, and then the next time we try and i_mmap_lock_write(), we hang.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reported-by: Dave Chinner <david@fromorbit.com>
> ---
>  fs/dax.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 7ae6df7..df1b0ac 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -405,7 +405,8 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
>  			if (error)
>  				goto unlock;
>  		} else {
> -			i_mmap_unlock_write(mapping);
> +			if (!page)
> +				i_mmap_unlock_write(mapping);
>  			return dax_load_hole(mapping, page, vmf);
>  		}
>  	}

I can't review this properly because I can't work out how this
locking is supposed to work.  Captain, we have a Charlie Foxtrot
situation here:

	page = find_get_page(mapping, vmf->pgoff)
	if (page) {
		....
	} else {
		i_mmap_lock_write(mapping);
	}

So if there's no page in the page cache, we lock the i_mmap_lock.
The we have the case the above patch fixes. Then later:

	if (vmf->cow_page) {
		.....
		if (!page) {
			/* can fall through */
		}
		return VM_FAULT_LOCKED;
	}

Which means __dax_fault() can also return here with the
i_mmap_lock_write() held. There's no documentation to indicate why
this is valid, and only by looking about 4 function calls higher up
the stack can I see that there's some attempt to handle this
*specific return condition* (in do_cow_fault()). That also is
lacking in documentation explaining the circumstances where we might
have the i_mmap_lock_write() held and have to release it. (Not to
mention the beautiful copy-n-waste of the unlock code, either.)

The above code in __dax_fault() is then followed by this gem:

	/* Check we didn't race with a read fault installing a new page */
        if (!page && major)
                page = find_lock_page(mapping, vmf->pgoff);

	if (page) {
		/* mapping invalidation .... */
	}
	.....

	if (!page)
		i_mmap_unlock_write(mapping);

Which means that if we had a race with another read fault, we'll
remove the page from the page cache, insert the new direct mapped
pfn into the mapping, and *then fail to unlock the i_mmap lock*.

Is this supposed to work this way? Or is it another bug?

Another difficult question this change of locking raised that I
can't answer: is it valid to call into the filesystem via getblock()
or complete_unwritten() while holding the i_mmap_rwsem? This puts
filesystem transactions and locks inside the scope of i_mmap_rwsem,
which may have impact on the fact that we already have an inode lock
order dependency w.r.t. i_mmap_rwsem through truncate (and probably
other paths, too).

So, please document the locking model, explain the corner cases and
the intricacies like why *unbalanced, return value conditional
locking* is necessary, and update the charts of lock order
dependencies in places like mm/filemap.c, and then we might have
some idea of how much of a train-wreck this actually is....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2015-09-24  2:54 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-23 20:40 [PATCH] dax: fix deadlock in __dax_fault Ross Zwisler
2015-09-24  2:52 ` Dave Chinner [this message]
2015-09-24  9:03   ` Boaz Harrosh
2015-09-24 15:50   ` Ross Zwisler
2015-09-25  2:53     ` Dave Chinner
2015-09-25 18:23       ` Ross Zwisler
2015-09-25 23:30         ` Dave Chinner
2015-09-26  3:17       ` Ross Zwisler
2015-09-28  0:59         ` Dave Chinner
2015-09-28 10:12           ` Dave Chinner
2015-09-28 10:23             ` kbuild test robot
2015-09-28 10:23             ` kbuild test robot
2015-09-28 12:13           ` Dan Williams
2015-09-28 21:35             ` Dave Chinner
2015-09-28 22:57               ` Dan Williams
2015-09-29  2:18                 ` Dave Chinner
2015-09-29  3:08                   ` Dan Williams
2015-09-29  4:19                     ` Dave Chinner
2015-09-28 22:40           ` Ross Zwisler
2015-09-29  2:44             ` Dave Chinner
2015-09-30  1:57               ` Dave Chinner
2015-09-30  2:04               ` Ross Zwisler
2015-09-30  3:22                 ` Dave Chinner
2015-10-02 12:55                 ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150924025225.GT3902@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox