From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris J Arges Subject: Re: regression in ext4_ind_remove_space Date: Sun, 01 Feb 2015 15:46:06 -0600 Message-ID: <54CE9E9E.2040700@canonical.com> References: <54CBD3F9.3030301@canonical.com> <54CBF434.1030708@canonical.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Cc: linux-fsdevel@vger.kernel.org To: Lukas Czerner , Theodore Ts'o , Jan Kara Return-path: Received: from youngberry.canonical.com ([91.189.89.112]:40402 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752476AbbBAVqI (ORCPT ); Sun, 1 Feb 2015 16:46:08 -0500 In-Reply-To: <54CBF434.1030708@canonical.com> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On 01/30/2015 03:14 PM, Chris J Arges wrote: > > > On 01/30/2015 12:56 PM, Chris J Arges wrote: >> Hi, >> >> Users of non-extent ext4 filesystems (ext4 ^extents, or ext3 w/ >> CONFIG_EXT4_USE_FOR_EXT23=y) can encounter data corruption when using >> fallocate with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE flags. This >> seems to be a regression in ext4_ind_remove_space introduced in >> 4f579ae7, whereas commit 77ea2a4b passes the following test case. >> >> To reproduce this issue do the following: >> 1) Setup ext4 ^extents, or ext3 filesystem with CONFIG_EXT4_USE_FOR_EXT23=y >> 2) Create and install a VM using a qcow2 image and store the file on the >> filesystem >> 3) Snapshot the image with qemu-img >> 4) Boot the image and do some disk operations (fio,etc) >> 5) Shutdown image and delete snapshot >> 6) Repeat 3-5 until VM no longer boots due to image corruption, >> generally this takes a few iterations depending on disk operations. >> >> In addition, I've tested this with a single vCPU and single host CPU and >> the problem persists. Running the same test on ext4 w/ extents exhibits >> no failures. >> >> Any ideas for bug hunting here? Commit 4f579ae7 completely re-writes >> ext4_ind_remove_space. A revert of that commit from master fixes the >> issue, but I'm unsure if that un-fixes other things. > > Well, after a bit more extensive testing a revert of this patch still > has failures. This may just be a novel bug. > > I'll run longer tests on 77ea2a4b as well. > --chris 77ea2a4b eventually fails (~20 iterations) but seems to be a bit more resilient. So this does not seem like a regression; but still ext4_ind_remove_space is at the heart of the issue. If I use the following code snippet below, I can run the test case for days (~400 iterations so far) without failure: diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 5653fa4..e14cdfe 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3367,6 +3367,10 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) unsigned int credits; int ret = 0; + /* EXTENTS required */ + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) + return -EOPNOTSUPP; + if (!S_ISREG(inode->i_mode)) return -EOPNOTSUPP; --chris > > I'm happy to >> continue debugging, or run any tests as necessary. >> >> Thanks, >> --chris j arges >>