linux-fsdevel.vger.kernel.org archive mirror
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org,
	ross.zwisler@linux.intel.com, willy@linux.intel.com,
	dan.j.williams@intel.com, kirill.shutemov@linux.intel.com,
	linux-nvdimm@lists.01.org, jack@suse.cz,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/7] xfs, dax: fix the page fault/allocation mess
Date: Thu, 1 Oct 2015 14:31:21 -0600
Message-ID: <20151001203121.GB23495@linux.intel.com>
In-Reply-To: <1443685599-4843-1-git-send-email-david@fromorbit.com>

On Thu, Oct 01, 2015 at 05:46:32PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> As discussed in the recent thread about problems with DAX locking:
> 
> http://www.gossamer-threads.com/lists/linux/kernel/2264090?do=post_view_threaded
> 
> I said that I'd post the patch set that fixed the problems for XFS
> as soon as I had something sane and workable. That's what this
> series is.
> 
> To start with, it passes the xfstests "auto" group, with the only
> failures being expected failures or failures due to unexpected
> allocation patterns or trying to use unsupported block sizes. That
> makes it better than any previous version of the XFS/DAX code.
> 
> The patchset starts by reverting the two patches that were
> introduced in 4.3-rc1 to try to fix the fault vs fault and fault vs
> truncate races that caused deadlocks. This fixes the hangs in
> generic/075 that these patches introduced.
> 
> Patch 3 enables XFS to handle the behaviour of DAX and DIO when
> asking to allocate the block at (2^63 - 1FSB), where the offset +
> count is technically illegal (larger than sb->s_maxbytes) and
> overflows an s64 variable. This is currently hidden by the fact that
> all DAX and DIO allocation is currently unwritten, but patch 5
> exposes it for DAX.
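
A minimal sketch of the overflow described for patch 3, not code from the
series: the names range_fits(), last_block_offset() and max_bytes are
hypothetical, with max_bytes standing in for sb->s_maxbytes. It assumes the
problem is that offset + count is formed in a signed 64-bit variable, and
shows one way to test the range without ever computing that sum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true when the byte
 * range [offset, offset + count) ends at or below max_bytes. The sum
 * offset + count is never formed, so it cannot wrap an s64.
 */
static bool range_fits(int64_t offset, int64_t count, int64_t max_bytes)
{
	assert(max_bytes >= 0);
	if (offset < 0 || count < 0)
		return false;
	if (offset > max_bytes)
		return false;
	/* Compare against the remaining room instead of adding. */
	return count <= max_bytes - offset;
}

/*
 * Byte offset of the last block below 2^63 for a power-of-two block
 * size, i.e. 2^63 - fsb_size written so that it stays within s64.
 */
static int64_t last_block_offset(int64_t fsb_size)
{
	assert(fsb_size > 0 && (fsb_size & (fsb_size - 1)) == 0);
	return INT64_MAX - (fsb_size - 1);
}
```

With a 4096-byte block, last_block_offset(4096) plus a count of 4096 would
equal 2^63, which an s64 cannot hold; range_fits() rejects that request
without evaluating the sum.
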
> 
> Patch 4 introduces the ability for XFS to allocate physically zeroed
> data blocks. This is done for each physical extent that is
> allocated, deep inside the allocator itself and guaranteed to be
> atomic with the allocation transaction and hence has no
> crash+recovery exposure issues.
> 
> This is necessary because the BMAPI layer merges allocated extents
> in the BMBT before it returns the mapped extent back to the high
> level get_blocks() code. Hence the high level code can have a single
> extent presented that is made of merged new and existing extents,
> and so zeroing can't be done at this layer.
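
To illustrate the merging point above, here is a simplified sketch. The
struct and function are hypothetical and are not the XFS BMBT record format
or the BMAPI interface; they only show why a caller that sees the merged
result can no longer tell which blocks were newly allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified extent record (not the XFS format). */
struct extent {
	uint64_t start;	/* first block of the extent */
	uint64_t len;	/* length in blocks */
};

/*
 * Merge b into a when b begins exactly where a ends. After a successful
 * merge only the combined range is left; the boundary between the old
 * blocks and the newly allocated ones is gone.
 */
static bool extent_merge(struct extent *a, const struct extent *b)
{
	assert(a && b);
	if (a->start + a->len != b->start)
		return false;
	a->len += b->len;
	return true;
}
```

If an existing extent {100, 8} is merged with a fresh allocation {108, 4},
the caller gets back {100, 12} and has no record that only blocks 108-111
need zeroing, which is the case for doing the zeroing at allocation time.
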
> 
> The advantage of driving the zeroing deep into the allocator is the
> functionality is now available to all XFS code. Hence we can
> allocate pre-zeroed blocks on any type of storage, and we can
> utilise storage-based hardware acceleration (e.g. discard to zero,
> WRITE_SAME, etc) to do the zeroing. From this POV, DAX is just
> another hardware accelerated physical zeroing mechanism for XFS. :)
> 
> [ This is an example of the mantra I repeat a lot: solve the problem
>   properly the first time and it will make everything simpler! Sure,
>   it took me three attempts to work out how to solve it in a sane
>   manner, but that's pretty much par for the course with anything
>   non-trivial. ]
> 
> Patch 5 makes __xfs_get_blocks() aware that it is being called from
> the DAX fault path and makes sure it returns zeroed blocks rather
> than unwritten extents via XFS_BMAPI_ZERO. It also now sets
> XFS_BMAPI_CONVERT, which tells it to convert unwritten extents to
> written, zeroed blocks. This is the major change of behaviour.
> 
> Patch 6 removes the IO completion callbacks from the XFS DAX code as
> they are no longer necessary after patch 5.
> 
> Patch 7 adds pfn_mkwrite support to XFS. This is needed to fix
> generic/080, which detects a failure to update the inode timestamp
> on a pfn fault. It also adds the same locking as the XFS
> implementation of ->fault and ->page_mkwrite and hence provides
> correct serialisation against truncate, hole punching, etc that
> doesn't currently exist.
> 
> The next steps that are needed are to do the same "block zeroing
> during allocation" to ext4, and then the block zeroing and
> complete_unwritten callbacks can be removed from the DAX API and
> code. I've had a brief look at the ext4 code - the block zeroing
> should be able to be done by overloading the existing zeroout code
> that ext4 has in the unwritten extent allocation code. I'd much
> prefer that an ext4 expert does this work, and then we can clean up
> the DAX code...

Thank you for working on this, and for documenting your thinking so clearly.

One thing I noticed is that in my test setup XFS+DAX is now failing
generic/274:

	# diff -u tests/generic/274.out /root/xfstests/results//generic/274.out.bad
	--- tests/generic/274.out	2015-08-24 11:05:41.490926305 -0600
	+++ /root/xfstests/results//generic/274.out.bad	2015-10-01 13:53:50.498354091 -0600
	@@ -2,4 +2,5 @@
	 ------------------------------
	 preallocation test
	 ------------------------------
	-done
	+failed to write to test file
	+(see /root/xfstests/results//generic/274.full for details)

I've verified that the test passes 100% of the time with my baseline
(v4.3-rc3), and with the set applied but without the DAX mount option.  With
the series and with DAX it fails 100% of the time.  I haven't looked into the
details of the failure yet; I just wanted to let you know that it was
happening.
