From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Sun, 09 Mar 2008 15:59:33 -0700 (PDT)
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
	by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m29Mx80D003705
	for ; Sun, 9 Mar 2008 15:59:12 -0700
Date: Mon, 10 Mar 2008 09:59:25 +1100
From: David Chinner
Subject: Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
Message-ID: <20080309225925.GT155407@sgi.com>
References: <1a4a774c0802130251h657a52f7lb97942e7afdf6e3f@mail.gmail.com>
	<20080213214551.GR155407@sgi.com>
	<1a4a774c0803050553h7f6294cfq41c38f34ea92ceae@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1a4a774c0803050553h7f6294cfq41c38f34ea92ceae@mail.gmail.com>
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Christian Røsnes
Cc: David Chinner , xfs@oss.sgi.com

On Wed, Mar 05, 2008 at 02:53:18PM +0100, Christian Røsnes wrote:
> On Wed, Feb 13, 2008 at 10:45 PM, David Chinner wrote:
> After being hit several times by the problem mentioned above (running
> kernel 2.6.17.7),
> I upgraded the kernel to version 2.6.24.3. I then ran a rsync test to
> a 99% full partition:
>
> df -k:
> /dev/sdb1 286380096 282994528 3385568 99% /data
>
> The rsync application will probably fail because it will most likely
> run out of space,
> but I got another xfs_trans_cancel kernel message:
>
> Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
> file fs/xfs/xfs_trans.c. Caller 0xc021a010
> Pid: 11642, comm: rsync Not tainted 2.6.24.3FC #1
>  [] xfs_trans_cancel+0x5d/0xe6
>  [] xfs_mkdir+0x45a/0x493
>  [] xfs_mkdir+0x45a/0x493
>  [] xfs_acl_vhasacl_default+0x33/0x44
>  [] xfs_vn_mknod+0x165/0x243
>  [] xfs_access+0x2f/0x35
>  [] xfs_vn_mkdir+0x12/0x14
>  [] vfs_mkdir+0xa3/0xe2
>  [] sys_mkdirat+0x8a/0xc3
>  [] sys_mkdir+0x1f/0x23
>  [] syscall_call+0x7/0xb
>  =======================
> xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
> fs/xfs/xfs_trans.c. Return address = 0xc0212690
> Filesystem "sdb1": Corruption of in-memory data detected. Shutting
> down filesystem: sdb1
> Please umount the filesystem, and rectify the problem(s)

Ok, so the problem still exists.

> Trying to umount /dev/sdb1 fails (umount just hangs) .

That shouldn't happen. Any output in the log when it hung? What were
the blocked process stack traces (/proc/sysrq-trigger is your friend)?

> Rebooting the system seems to hang also - and I believe the kernel
> outputs this message
> when trying to umount /dev/sdb1:
>
> xfs_force_shutdown(sdb1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.
> Return address = 0xc021cb21

It's already been shut down, right? An unmount should not trigger more
of these warnings...

>
> After waiting 5 minutes I power-cycle the system to bring it back up.
>
> After the restart, I ran:
>
> xfs_check /dev/sdb1
>
> (there was no output from xfs_check).
>
> Could this be the same problem I experienced with 2.6.17.7 ?

Yes, it likely is. Can you apply the patch below and reproduce the
problem? I can't reproduce the problem locally, so I'll need you to
apply test patches to isolate the error. I suspect a
xfs_dir_canenter()/xfs_dir_createname() with resblks == 0 issue, and
the patch below will tell us if this is the case. It annotates the
error paths for both create and mkdir (the two places I've seen this
error occur), and what I am expecting to see is something like:

xfs_create: dir_enter w/ 0 resblks ok.
xfs_create: dir_createname error 28

Cheers,

Dave.
---
 fs/xfs/xfs_vnodeops.c |   23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c	2008-02-22 17:40:04.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c	2008-03-10 09:53:43.658179381 +1100
@@ -1886,12 +1886,17 @@ xfs_create(
 	if (error)
 		goto error_return;
 
-	if (resblks == 0 && (error = xfs_dir_canenter(tp, dp, name, namelen)))
-		goto error_return;
+	if (!resblks) {
+		error = xfs_dir_canenter(tp, dp, name, namelen);
+		if (error)
+			goto error_return;
+		printk(KERN_WARNING "xfs_create: dir_enter w/ 0 resblks ok.\n");
+	}
 	error = xfs_dir_ialloc(&tp, dp, mode, 1,
 			rdev, credp, prid, resblks > 0,
 			&ip, &committed);
 	if (error) {
+		printk(KERN_WARNING "xfs_create: dir_ialloc error %d\n", error);
 		if (error == ENOSPC)
 			goto error_return;
 		goto abort_return;
@@ -1921,6 +1926,7 @@ xfs_create(
 					resblks - XFS_IALLOC_SPACE_RES(mp) : 0);
 	if (error) {
 		ASSERT(error != ENOSPC);
+		printk(KERN_WARNING "xfs_create: dir_createname error %d\n", error);
 		goto abort_return;
 	}
 	xfs_ichgtime(dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
@@ -1955,6 +1961,7 @@ xfs_create(
 	error = xfs_bmap_finish(&tp, &free_list, &committed);
 	if (error) {
 		xfs_bmap_cancel(&free_list);
+		printk(KERN_WARNING "xfs_create: xfs_bmap_finish error %d\n", error);
 		goto abort_rele;
 	}
 
@@ -2727,9 +2734,12 @@ xfs_mkdir(
 	if (error)
 		goto error_return;
 
-	if (resblks == 0 &&
-	    (error = xfs_dir_canenter(tp, dp, dir_name, dir_namelen)))
-		goto error_return;
+	if (!resblks) {
+		error = xfs_dir_canenter(tp, dp, dir_name, dir_namelen);
+		if (error)
+			goto error_return;
+		printk(KERN_WARNING "xfs_mkdir: dir_enter w/ 0 resblks ok.\n");
+	}
 	/*
 	 * create the directory inode.
 	 */
@@ -2737,6 +2747,7 @@ xfs_mkdir(
 			0, credp, prid, resblks > 0,
 		&cdp, NULL);
 	if (error) {
+		printk(KERN_WARNING "xfs_mkdir: dir_ialloc error %d\n", error);
 		if (error == ENOSPC)
 			goto error_return;
 		goto abort_return;
@@ -2761,6 +2772,7 @@ xfs_mkdir(
 				   &first_block, &free_list, resblks ?
 				   resblks - XFS_IALLOC_SPACE_RES(mp) : 0);
 	if (error) {
+		printk(KERN_WARNING "xfs_mkdir: dir_createname error %d\n", error);
 		ASSERT(error != ENOSPC);
 		goto error1;
 	}
@@ -2805,6 +2817,7 @@ xfs_mkdir(
 
 	error = xfs_bmap_finish(&tp, &free_list, &committed);
 	if (error) {
+		printk(KERN_WARNING "xfs_mkdir: bmap_finish error %d\n", error);
 		IRELE(cdp);
 		goto error2;
 	}