linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4 10/12] dax: add struct iomap based DAX PMD support
Date: Thu, 6 Oct 2016 15:34:24 -0600	[thread overview]
Message-ID: <20161006213424.GA4569@linux.intel.com> (raw)
In-Reply-To: <20161003210557.GA28177@linux.intel.com>

On Mon, Oct 03, 2016 at 03:05:57PM -0600, Ross Zwisler wrote:
> On Mon, Oct 03, 2016 at 12:59:49PM +0200, Jan Kara wrote:
> > On Thu 29-09-16 16:49:28, Ross Zwisler wrote:
<>
> > > +int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > > +		pmd_t *pmd, unsigned int flags, struct iomap_ops *ops)
> > > +{
> > > +	struct address_space *mapping = vma->vm_file->f_mapping;
> > > +	unsigned long pmd_addr = address & PMD_MASK;
> > > +	bool write = flags & FAULT_FLAG_WRITE;
> > > +	struct inode *inode = mapping->host;
> > > +	struct iomap iomap = { 0 };
> > > +	int error, result = 0;
> > > +	pgoff_t size, pgoff;
> > > +	struct vm_fault vmf;
> > > +	void *entry;
> > > +	loff_t pos;
> > > +
> > > +	/* Fall back to PTEs if we're going to COW */
> > > +	if (write && !(vma->vm_flags & VM_SHARED)) {
> > > +		split_huge_pmd(vma, pmd, address);
> > > +		return VM_FAULT_FALLBACK;
> > > +	}
> > > +
> > > +	/* If the PMD would extend outside the VMA */
> > > +	if (pmd_addr < vma->vm_start)
> > > +		return VM_FAULT_FALLBACK;
> > > +	if ((pmd_addr + PMD_SIZE) > vma->vm_end)
> > > +		return VM_FAULT_FALLBACK;
> > > +
> > > +	/*
> > > +	 * Check whether offset isn't beyond end of file now. Caller is
> > > +	 * supposed to hold locks serializing us with truncate / punch hole so
> > > +	 * this is a reliable test.
> > > +	 */
> > > +	pgoff = linear_page_index(vma, pmd_addr);
> > > +	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +
> > > +	if (pgoff >= size)
> > > +		return VM_FAULT_SIGBUS;
> > > +
> > > +	/* If the PMD would extend beyond the file size */
> > > +	if ((pgoff | PG_PMD_COLOUR) >= size)
> > > +		return VM_FAULT_FALLBACK;
> > > +
> > > +	/*
> > > +	 * grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
> > > +	 * PMD or a HZP entry.  If it can't (because a 4k page is already in
> > > +	 * the tree, for instance), it will return -EEXIST and we just fall
> > > +	 * back to 4k entries.
> > > +	 */
> > > +	entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
> > > +	if (IS_ERR(entry))
> > > +		return VM_FAULT_FALLBACK;
> > > +
> > > +	/*
> > > +	 * Note that we don't use iomap_apply here.  We aren't doing I/O, only
> > > +	 * setting up a mapping, so really we're using iomap_begin() as a way
> > > +	 * to look up our filesystem block.
> > > +	 */
> > > +	pos = (loff_t)pgoff << PAGE_SHIFT;
> > > +	error = ops->iomap_begin(inode, pos, PMD_SIZE, write ? IOMAP_WRITE : 0,
> > > +			&iomap);
> > 
> > I'm not quite sure if it is OK to call ->iomap_begin() without ever calling
> > ->iomap_end. Specifically the comment before iomap_apply() says:
> > 
> > "It is assumed that the filesystems will lock whatever resources they
> > require in the iomap_begin call, and release them in the iomap_end call."
> > 
> > so what you do could result in unbalanced allocations / locks / whatever.
> > Christoph?
> 
> I'll add the iomap_end() calls to both the PTE and PMD iomap fault handlers.

Interesting - adding iomap_end() calls to the DAX PTE fault handler causes an
AA deadlock because we try and retake ei->dax_sem.  We take dax_sem in
ext2_dax_fault() before calling into the DAX code, then if we end up going
through the error path in ext2_iomap_end(), we call 
  ext2_write_failed()
    ext2_truncate_blocks()
      dax_sem_down_write()

Where we try and take dax_sem again.  This error path is really only valid for
I/O operations, but we happen to call it for page faults because 'written' in
ext2_iomap_end() is just 0.

So...how should we handle this?  A few ideas:

1) Just continue to omit the calls to iomap_end() in the DAX page fault
handlers for now, and add them when there is useful work to be done in one of
the filesystems.

2) Add an IOMAP_FAULT flag to the flags passed into iomap_begin() and
iomap_end() so make it explicit that we are calling as part of a fault handler
and not an I/O operation, and use this to adjust the error handling in
ext2_iomap_end().

3) Just work around the existing error handling in ext2_iomap_end() by either
unsetting IOMAP_WRITE or by setting 'written' to the size of the fault.

For #2 or #3, probably add a comment explaining the deadlock and why we need
to never call ext2_write_failed() while handling a page fault.

Thoughts?

  parent reply	other threads:[~2016-10-06 21:34 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-29 22:49 [PATCH v4 00/12] re-enable DAX PMD support Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 01/12] ext4: allow DAX writeback for hole punch Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 02/12] ext4: tell DAX the size of allocation holes Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 03/12] dax: remove buffer_size_valid() Ross Zwisler
2016-09-30  8:49   ` Christoph Hellwig
2016-09-29 22:49 ` [PATCH v4 04/12] ext2: remove support for DAX PMD faults Ross Zwisler
2016-09-30  8:49   ` Christoph Hellwig
2016-10-03  9:35   ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 05/12] dax: make 'wait_table' global variable static Ross Zwisler
2016-09-30  8:50   ` Christoph Hellwig
2016-10-03  9:36   ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 06/12] dax: consistent variable naming for DAX entries Ross Zwisler
2016-09-30  8:50   ` Christoph Hellwig
2016-10-03  9:37   ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 07/12] dax: coordinate locking for offsets in PMD range Ross Zwisler
2016-09-30  9:44   ` Christoph Hellwig
2016-10-03  9:55   ` Jan Kara
2016-10-03 18:40     ` Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 08/12] dax: remove dax_pmd_fault() Ross Zwisler
2016-09-30  8:50   ` Christoph Hellwig
2016-10-03  9:56   ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 09/12] dax: correct dax iomap code namespace Ross Zwisler
2016-09-30  8:51   ` Christoph Hellwig
2016-10-03  9:57   ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 10/12] dax: add struct iomap based DAX PMD support Ross Zwisler
2016-09-30  9:56   ` Christoph Hellwig
2016-10-03 21:16     ` Ross Zwisler
2016-10-03 10:59   ` Jan Kara
2016-10-03 16:37     ` Christoph Hellwig
2016-10-03 21:05     ` Ross Zwisler
2016-10-04  5:55       ` Jan Kara
2016-10-04 15:39         ` Ross Zwisler
2016-10-05  5:50           ` Jan Kara
2016-10-06 21:34       ` Ross Zwisler [this message]
2016-10-07  2:58         ` Ross Zwisler
2016-10-07  7:24           ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 11/12] xfs: use struct iomap based DAX PMD fault path Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 12/12] dax: remove "depends on BROKEN" from FS_DAX_PMD Ross Zwisler
2016-09-29 23:43 ` [PATCH v4 00/12] re-enable DAX PMD support Dave Chinner
2016-09-30  3:03   ` Ross Zwisler
2016-09-30  4:00     ` Darrick J. Wong
2016-10-03 18:54       ` Ross Zwisler
2016-09-30  6:48     ` Dave Chinner
2016-10-03 21:11       ` Ross Zwisler
2016-10-03 23:05   ` Ross Zwisler
2016-09-30 11:46 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161006213424.GA4569@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).