From: "Patrick Shirkey"
To: xfs@oss.sgi.com
Date: Tue, 15 May 2012 02:58:42 +0200 (CEST)
Subject: Re: file corruption issue
Message-ID: <64776.110.174.53.110.1337043522.squirrel@boosthardware.com>
In-Reply-To: <20120514142948.GS3963@sgi.com>
List-Id: XFS Filesystem from SGI

On Mon, May 14, 2012 4:29 pm, Ben Myers wrote:
> Hey Patrick,
>
> On Mon, May 14, 2012 at 03:45:06AM +0200, Patrick Shirkey wrote:
>>
>> On Fri, May 11, 2012 6:50 pm, Ben Myers wrote:
>> > On Fri, May 11, 2012 at 03:27:02AM +0200, Patrick Shirkey wrote:
>> >> I have some HP machines running centos:
>> >>
>> >> kernel 2.6.32-042stab049.6
>> >> AMD Opteron(tm) Processor 6180 SE
>> >> RAM: 528 GB
>> >> RAID bus controller: Hewlett-Packard Company Smart Array G6 controllers
>> >>
>> >> We have experienced some kernel crashes due to a kernel bug with
>> >> interleaving ram on this hardware which require hard reset of the
>> >> machines.
>> >>
>> >> After reboot we are finding that there is severe file corruption on the
>> >> xfs file system where TBs of readonly databases are getting partially or
>> >> fully truncated.
>> >>
>> >> Has anyone come across this or similar?
>> >
>> > This rings a bell for me but I can't be certain. Could you provide a
>> > metadump?
>> >
>>
>> The machines are live so we have already restored the data several times.
>> Will a metadump from the existing file system be useful or do you need it
>> post crash?
>
> Well... one of each would be best. It might be helpful to compare the block
> map from before the crash with the block map after the crash for one of the
> read-only corrupted databases.
>

Unfortunately, I cannot unmount the partitions to run xfs_metadump because
they are in use.

I have found some files that were truncated during a recent crash. Is there a
tool I can run on those files to gather information that might be useful?

--
Patrick Shirkey
Boost Hardware Ltd
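Since the partitions cannot be unmounted for xfs_metadump, one way to follow Ben's suggestion of comparing block maps is to record per-file extent maps while the filesystem stays mounted, then compare them after the next crash. Below is a minimal Python sketch, assuming `xfs_bmap` (from xfsprogs) is on PATH. The helper names `snapshot`, `compare`, and `run_xfs_bmap` and the dictionary layout are hypothetical illustrations, not part of any XFS tool.

```python
"""Snapshot and compare XFS file block maps (hypothetical helper, not an xfsprogs tool)."""
import difflib
import os
import subprocess
from typing import Callable, Dict, Iterable


def run_xfs_bmap(path: str) -> str:
    """Return the extent map of `path` as printed by `xfs_bmap -v` (requires xfsprogs)."""
    result = subprocess.run(
        ["xfs_bmap", "-v", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def snapshot(
    paths: Iterable[str], bmap: Callable[[str], str] = run_xfs_bmap
) -> Dict[str, dict]:
    """Record size, allocated blocks, and extent map for each file.

    `bmap` is injectable so the logic can be exercised without XFS.
    """
    snap: Dict[str, dict] = {}
    for path in paths:
        st = os.stat(path)
        snap[path] = {
            "size": st.st_size,
            # On Linux st_blocks counts 512-byte units.
            "blocks": st.st_blocks,
            "bmap": bmap(path),
        }
    return snap


def compare(before: Dict[str, dict], after: Dict[str, dict]) -> Dict[str, str]:
    """Report files whose size, block count, or extent map changed."""
    report: Dict[str, str] = {}
    for path, old in before.items():
        new = after.get(path)
        if new is None:
            report[path] = "missing after crash"
            continue
        if new == old:
            continue  # unchanged: nothing to report
        lines = [
            f"size {old['size']} -> {new['size']}, "
            f"512-byte blocks {old['blocks']} -> {new['blocks']}"
        ]
        # Line-by-line diff of the xfs_bmap output taken before and after.
        lines.extend(
            difflib.unified_diff(
                old["bmap"].splitlines(),
                new["bmap"].splitlines(),
                "before",
                "after",
                lineterm="",
            )
        )
        report[path] = "\n".join(lines)
    return report
```

A possible workflow: while the databases are healthy, save `snapshot(db_files)` to disk with `json.dump`; after the next crash, load it and print `compare(before, snapshot(db_files))` to see which files shrank and how their extents changed. Recording `st_size` alongside the extent map means a truncation shows up even if the extent list itself looks plausible.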