From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dave Chinner Subject: Re: [PATCH v3 3/6] ext4: Online defrag not supported with DAX Date: Thu, 18 Feb 2016 11:12:23 +1100 Message-ID: <20160218001223.GJ19486@dastard> References: <1455680059-20126-1-git-send-email-ross.zwisler@linux.intel.com> <1455680059-20126-4-git-send-email-ross.zwisler@linux.intel.com> <20160217215037.GB30126@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii To: Ross Zwisler , Jan Kara , linux-kernel@vger.kernel.org, "J. Bruce Fields" , Theodore Ts'o , Alexander Viro , Andreas Dilger , Andrew Morton , Dan Williams , Jan Kara , Jeff Layton , Jens Axboe , Matthew Wilcox , linux-block@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org, xfs@oss.sgi.com Return-path: Content-Disposition: inline In-Reply-To: <20160217215037.GB30126@linux.intel.com> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org On Wed, Feb 17, 2016 at 02:50:37PM -0700, Ross Zwisler wrote: > On Tue, Feb 16, 2016 at 08:34:16PM -0700, Ross Zwisler wrote: > > Online defrag operations for ext4 are hard coded to use the page cache. > > See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page() > > > > When combined with DAX I/O, which circumvents the page cache, this can > > result in data corruption. This was observed with xfstests ext4/307 and > > ext4/308. > > > > Fix this by only allowing online defrag for non-DAX files. > > Jan, > > Thinking about this a bit more, it's probably the case that the data > corruption I was observing was due to us skipping the writeback of the dirty > page cache pages because S_DAX was set. > > I do think we have a problem with defrag because it is doing the extent > swapping using the page cache, and we won't flush the dirty pages due to > S_DAX being set. > > This patch is the quick and easy answer, and is perhaps appropriate for v4.5. > > Looking forward, though, what do you think the correct solution is? Making an > extent swapper that doesn't use the page cache (as I believe XFS has? see > xfs_swap_extents()), XFS does the data copy in userspace using direct IO so we don't care about whether DAX is enabled or not on either the source or destination inode. i.e. xfs_swap_extents() is a pure metadata operation, swapping the entire extent tree between two inodes if the source data has not changed while the copy was in progress. Cheers, Dave. -- Dave Chinner david@fromorbit.com