From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org,
ross.zwisler@linux.intel.com, willy@linux.intel.com,
dan.j.williams@intel.com, kirill.shutemov@linux.intel.com,
linux-nvdimm@lists.01.org, jack@suse.cz,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/7] xfs, dax: fix the page fault/allocation mess
Date: Thu, 1 Oct 2015 14:31:21 -0600
Message-ID: <20151001203121.GB23495@linux.intel.com>
In-Reply-To: <1443685599-4843-1-git-send-email-david@fromorbit.com>
On Thu, Oct 01, 2015 at 05:46:32PM +1000, Dave Chinner wrote:
> Hi folks,
>
> As discussed in the recent thread about problems with DAX locking:
>
> http://www.gossamer-threads.com/lists/linux/kernel/2264090?do=post_view_threaded
>
> I said that I'd post the patch set that fixed the problems for XFS
> as soon as I had something sane and workable. That's what this
> series is.
>
> To start with, it passes the xfstests "auto" group, with the only
> failures being expected failures or failures due to unexpected
> allocation patterns or trying to use unsupported block sizes. That
> makes it better than any previous version of the XFS/DAX code.
>
> The patchset starts by reverting the two patches that were
> introduced in 4.3-rc1 to try to fix the fault vs fault and fault vs
> truncate races that caused deadlocks. This fixes the hangs in
> generic/075 that these patches introduced.
>
> Patch 3 enables XFS to handle the behaviour of DAX and DIO when
> asking to allocate the block at (2^63 - 1FSB), where the offset +
> count is technically illegal (larger than sb->s_maxbytes) and
> overflows an s64 variable. This is currently hidden by the fact that
> all DAX and DIO allocation is currently unwritten, but patch 5
> exposes it for DAX.
>
> Patch 4 introduces the ability for XFS to allocate physically zeroed
> data blocks. This is done for each physical extent that is
> allocated, deep inside the allocator itself and guaranteed to be
> atomic with the allocation transaction and hence has no
> crash+recovery exposure issues.
>
> This is necessary because the BMAPI layer merges allocated extents
> in the BMBT before it returns the mapped extent back to the high
> level get_blocks() code. Hence the high level code can have a single
> extent presented that is made of merged new and existing extents,
> and so zeroing can't be done at this layer.
>
> The advantage of driving the zeroing deep into the allocator is the
> functionality is now available to all XFS code. Hence we can
> allocate pre-zeroed blocks on any type of storage, and we can
> utilise storage-based hardware acceleration (e.g. discard to zero,
> WRITE_SAME, etc) to do the zeroing. From this POV, DAX is just
> another hardware accelerated physical zeroing mechanism for XFS. :)
>
> [ This is an example of the mantra I repeat a lot: solve the problem
> properly the first time and it will make everything simpler! Sure,
> it took me three attempts to work out how to solve it in a sane
> manner, but that's pretty much par for the course with anything
> non-trivial. ]
>
> Patch 5 makes __xfs_get_blocks() aware that it is being called from
> the DAX fault path and makes sure it returns zeroed blocks rather
> than unwritten extents via XFS_BMAPI_ZERO. It also now sets
> XFS_BMAPI_CONVERT, which tells it to convert unwritten extents to
> written, zeroed blocks. This is the major change of behaviour.
>
> Patch 6 removes the IO completion callbacks from the XFS DAX code as
> they are no longer necessary after patch 5.
>
> Patch 7 adds pfn_mkwrite support to XFS. This is needed to fix
> generic/080, which detects a failure to update the inode timestamp
> on a pfn fault. It also adds the same locking as the XFS
> implementation of ->fault and ->page_mkwrite and hence provides
> correct serialisation against truncate, hole punching, etc that
> doesn't currently exist.
>
> The next steps that are needed are to do the same "block zeroing
> during allocation" to ext4, and then the block zeroing and
> complete_unwritten callbacks can be removed from the DAX API and
> code. I've had a brief look at the ext4 code - the block zeroing
> should be able to be done by overloading the existing zeroout code
> that ext4 has in the unwritten extent allocation code. I'd much
> prefer that an ext4 expert does this work, and then we can clean up
> the DAX code...
Thank you for working on this, and for documenting your thinking so clearly.
One thing I noticed is that in my test setup XFS+DAX is now failing
generic/274:
# diff -u tests/generic/274.out /root/xfstests/results//generic/274.out.bad
--- tests/generic/274.out 2015-08-24 11:05:41.490926305 -0600
+++ /root/xfstests/results//generic/274.out.bad 2015-10-01 13:53:50.498354091 -0600
@@ -2,4 +2,5 @@
------------------------------
preallocation test
------------------------------
-done
+failed to write to test file
+(see /root/xfstests/results//generic/274.full for details)
I've verified that the test passes 100% of the time with my baseline
(v4.3-rc3), and with the series applied but without the DAX mount option.
With the series applied and DAX enabled it fails 100% of the time. I haven't
looked into the details of the failure yet; I just wanted to let you know
that it was happening.