From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.mpynet.fi ([82.197.21.85]:16561 "EHLO mx2.mpynet.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751538AbdLNMLh (ORCPT ); Thu, 14 Dec 2017 07:11:37 -0500 Subject: Re: [PATCH] xfs_io: fix exitcode handling (was Re: generic/399 and xfs_io pwrite command) References: <7235f31f-5427-ff12-11f9-cba98431e71c@tuxera.com> <20171204222842.GX4094@dastard> <35d2712c-b036-d017-2d42-f74bbc4444a9@tuxera.com> <20171205211850.GA5858@dastard> <20171206002638.GB5858@dastard> From: Ari Sundholm Message-ID: <27b82a44-b889-212f-af74-cad35c103c86@tuxera.com> Date: Thu, 14 Dec 2017 14:11:32 +0200 MIME-Version: 1.0 In-Reply-To: <20171206002638.GB5858@dastard> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-xfs-owner@vger.kernel.org List-ID: List-Id: xfs To: Dave Chinner Cc: fstests@vger.kernel.org, Emil Karlson , linux-xfs@vger.kernel.org On 12/06/2017 02:26 AM, Dave Chinner wrote: > On Wed, Dec 06, 2017 at 08:18:50AM +1100, Dave Chinner wrote: >> On Tue, Dec 05, 2017 at 04:23:43PM +0200, Ari Sundholm wrote: >>> On 12/05/2017 12:28 AM, Dave Chinner wrote: >>>> On Thu, Nov 30, 2017 at 04:21:47PM +0200, Ari Sundholm wrote: >>>>> Hi! >>>>> >>>>> While debugging an issue, we found out that generic/399 seems to >>>>> rely on a behavior that is specific to ext4 in the following section >>>>> of code: >>>>> ----------------8<--------snip--------8<------------------------------ >>>>> # >>>>> # Write files of 1 MB of all the same byte until we hit ENOSPC. >>>>> Note that we >>>>> # must not create sparse files, since the contents of sparse files are not >>>>> # stored on-disk. Also, we create multiple files rather than one big file >>>>> # because we want to test for reuse of per-file keys. >>>>> # >>>>> total_file_size=0 >>>>> i=1 >>>>> while true; do >>>>> file=$SCRATCH_MNT/encrypted_dir/file$i >>>>> if ! $XFS_IO_PROG -f $file -c 'pwrite 0 1M' &> $tmp.out; then >>>>> if ! grep -q 'No space left on device' $tmp.out; then >>>>> echo "FAIL: unexpected pwrite failure" >>>>> cat $tmp.out >>>>> elif [ -e $file ]; then >>>>> total_file_size=$((total_file_size + $(stat -c %s $file))) >>>>> fi >>>>> break >>>>> fi >>>>> total_file_size=$((total_file_size + $(stat -c %s $file))) >>>>> i=$((i + 1)) >>>>> if [ $i -gt $fs_size_in_mb ]; then >>>>> echo "FAIL: filesystem never filled up!" >>>>> break >>>>> fi >>>>> done >>>>> ----------------8<--------snip--------8<------------------------------ >>>>> >>>>> What happens with ext4 is that the xfs_io command gives a nonzero >>>>> exit value not when the pwrite command fails with ENOSPC but during >>>>> the *next* iteration when opening the file fails with ENOSPC. Turns >>>>> out the pwrite command failing does not cause xfs_io to give a >>>>> nonzero exit value. >>>> >>>> That implies ext4 is returning zero bytes written to the pwrite() >>>> call rather than ENOSPC. i.e.: >>>> >>>> bytes = do_pwrite(file->fd, off, cnt, buffersize, >>>> pwritev2_flags); >>>> if (bytes == 0) >>>> break; >>>> if (bytes < 0) { >>>> perror("pwrite"); >>>> return -1; >>>> } >>>> >>>> So if it's exiting with no error, then we can't have got an error >>> >from ext4 at ENOSPC. If that's the case, it probably should be >>>> considered an ext4 bug, not an issue with xfs_io... >>>> >>> >>> No, according to what we've observed, that is not what happens. The >>> pwrite() call does fail and errno is ENOSPC after the call. The >>> immediate problem is that xfs_io does not reflect this failure in >>> its exit value and thus the check in generic/399 does not work in >>> this case. Only when open() fails during the next iteration does >>> xfs_io give a nonzero exit value and cause the check in the test >>> case to allow the test case to end successfully. >> >> Ah, io/pwrite.c fails to set exitcode on failure iappropriately >> before returning. Looks like there is a bunch of xfs_io commands >> that fail to do this properly. >> >>> What is specific to ext4 here is, as stated in my original message, >>> that open() fails. >> >> Because there are no more inodes to allocate (i.e. inode table is >> full) and so attempts to create more fail with ENOSPC. The xfs_io >> open command set exitcode appropriately, and that's why it finally >> triggers a test abort. >> >> Looks like xfs_io does need fixing to set exitcode appropriately on >> all failures... > > Try the patch below. > Thanks, I'll try it. Sorry for the late response (was on vacation). > Cheers, > > Dave. >