From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Mon, 04 Aug 2008 17:18:55 -0700 (PDT)
Received: from cuda.sgi.com (cuda2.sgi.com [192.48.168.29])
	by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m750Iph3018189
	for ; Mon, 4 Aug 2008 17:18:51 -0700
Received: from ipmail04.adl2.internode.on.net (localhost [127.0.0.1])
	by cuda.sgi.com (Spam Firewall) with ESMTP id 158F535AEE4
	for ; Mon, 4 Aug 2008 17:20:04 -0700 (PDT)
Received: from ipmail04.adl2.internode.on.net (ipmail04.adl2.internode.on.net [203.16.214.57])
	by cuda.sgi.com with ESMTP id MaLyzP4k8JB52PxN
	for ; Mon, 04 Aug 2008 17:20:04 -0700 (PDT)
Date: Tue, 5 Aug 2008 10:19:52 +1000
From: Dave Chinner
Subject: Re: Corruption of in-memory data detected - on heavy hard linking
Message-ID: <20080805001952.GI6119@disturbed>
References: <48876D03.8010804@stepping-stone.ch> <20080725052051.GA26367@infradead.org> <489732B2.7000201@stepping-stone.ch>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <489732B2.7000201@stepping-stone.ch>
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Christian Affolter
Cc: xfs@oss.sgi.com

On Mon, Aug 04, 2008 at 06:47:46PM +0200, Christian Affolter wrote:
> Hi
>
>> On Wed, Jul 23, 2008 at 07:40:19PM +0200, Christian Affolter wrote:
>>> Kernel-Error:
>>> Filesystem "sdc1": XFS internal error xfs_trans_cancel at line 1163
>>> of file fs/xfs/xfs_trans.c. Caller 0xffffffff803a4fcf
>>> Pid: 22816, comm: cp Not tainted 2.6.24-gentoo-r8 #1
>>
>> 2.6.24 is pretty old. Did you try with a recent kernel? We had some
>> fixes for in-core memory corruption although I don't remember one in
>> this area.
>
> I finally found the time to update the kernel to a recent 2.6.26 version.
>
> Unfortunately the problem still exists:
> Filesystem "dm-3": XFS internal error xfs_trans_cancel at line 1163 of
> file fs/xfs/xfs_trans.c. Caller 0xffffffff803a6672
> Pid: 12584, comm: cp Not tainted 2.6.26-gentoo #1

Ok, what we need is the following.

First, try to reproduce the problem on a small filesystem (say a
few GB). Once you've reproduced the problem, unmount and remount
the filesystem to get the log replayed, then take a xfs_metadump
image of the filesystem. Put the metadump image somewhere that
can be downloaded (ftp/web site) and let us know where it is.

If this is anything like the previous problem I found and fixed,
then it will be a corner-case bug that is only triggered by a
specific layout of free space and we need the filesystem image to
be able to work out exactly what corner case is broken....

> Before the shutdown happens the copy command receives a
> "No space left on device" error:
> cp: cannot create regular file `[file name snipped': No space left on device
> cp: cannot create regular file `[file name snipped]': Input/output error
>
> Although the device has more than 50% free space as well as free inodes.

It will be an AG that is out of space, not the entire filesystem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
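
The point about one AG running out of space can be illustrated with a
toy model. This is only a sketch under loud assumptions: the
AllocationGroup class and the free_fraction and demo functions are
hypothetical names made up for illustration, not XFS code, and real XFS
allocation is far more involved. It shows only how an allocation that
has to be satisfied from one particular AG can fail with ENOSPC while
the filesystem as a whole is mostly free.

```python
import errno
import os
from dataclasses import dataclass


@dataclass
class AllocationGroup:
    """Hypothetical stand-in for an AG: a fixed pool of blocks."""
    size: int
    used: int = 0

    @property
    def free(self) -> int:
        return self.size - self.used

    def allocate(self, blocks: int) -> None:
        # The request must be met from this AG alone; free space
        # in other AGs does not help.
        if blocks > self.free:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.used += blocks


def free_fraction(ags) -> float:
    """Free space across all AGs, as a fraction of total size."""
    return sum(ag.free for ag in ags) / sum(ag.size for ag in ags)


def demo():
    # Assumed layout: four equally sized AGs, purely illustrative.
    ags = [AllocationGroup(size=1000) for _ in range(4)]
    ags[0].allocate(1000)  # fill AG 0 completely
    try:
        ags[0].allocate(1)  # one more block from the full AG
        got_enospc = False
    except OSError as exc:
        got_enospc = exc.errno == errno.ENOSPC
    return free_fraction(ags), got_enospc


if __name__ == "__main__":
    fraction, got_enospc = demo()
    print(f"filesystem free: {fraction:.0%}, ENOSPC in AG 0: {got_enospc}")
```

In this model the filesystem reports most of its space free, yet the
request against the full AG still fails, which is the shape of the
symptom described in the report.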