Date: Thu, 3 Apr 2008 08:07:50 +1000
From: David Chinner
To: Emmanuel Florac
Cc: David Chinner, xfs@oss.sgi.com
Subject: Re: Serious XFS crash
Message-ID: <20080402220750.GJ103491721@sgi.com>
In-Reply-To: <20080402133003.4bb043e4@galadriel.home>
References: <20080325185453.3a1957dd@galadriel.home> <20080325233611.GW103491721@sgi.com> <20080401140035.46470306@galadriel.home> <20080402055831.GG103491721@sgi.com> <20080402133003.4bb043e4@galadriel.home>

On Wed, Apr 02, 2008 at 01:30:03PM +0200, Emmanuel Florac wrote:
> On Wed, 2 Apr 2008 15:58:31 +1000 you wrote:
>
> > The log is rather garbled - can you repost? Also, XFS usually outputs
> > an error message before the stack trace; can you make sure you
> > paste that as well (if it exists)?
>
> Well I attached the relevant part of kern.log; the message just before
> the crash is not very clear... You can see the other messages relevant
> to the disk error too.

Like the fact reiser is also complaining about corrupted blocks?

> Mar 6 06:25:04 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6E2A.
> Mar 6 06:25:04 system3 kernel: ReiserFS: warning: is_tree_node: node level 28784 does not match to the expected one 1
> Mar 6 06:25:04 system3 kernel: ReiserFS: sda1: warning: vs-5150: search_by_key: invalid format found in block 753671. Fsck?
> Mar 6 06:25:04 system3 kernel: ReiserFS: sda1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [18404 18463 0x0 SD]

and:

> Mar 6 10:42:46 system3 kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Mar 6 10:42:46 system3 kernel: Filesystem "md0": XFS internal error xfs_alloc_read_agf at line 2190 of file fs/xfs/xfs_alloc.c. Caller 0xc01f4b88

That's an AGF made up of zeros instead of real metadata. Something has
trashed it - perhaps a "sector repair"?

> Mar 6 10:42:46 system3 kernel: Please umount the filesystem, and rectify the problem(s)
> Mar 6 10:51:19 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6E00.
> Mar 6 10:51:20 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6DCA.

I'd go and find whatever disk is located at LBA 0xE6DCA-0xE6E2A and
replace it - if there are that many repairs needed on it, it's likely
to be failing....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
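A minimal Python sketch for grouping the 3w-9xxx "Sector repair completed" events from a kern.log by controller port, to see which drive the repairs cluster on and over what LBA span. It assumes the exact message format shown in the quoted log; other controller firmware may word it differently. `REPAIR_RE` and `summarize_repairs` are hypothetical names, and the LBA values are taken as the controller reports them, without translating them into md0 or sda1 offsets.

```python
import re
from collections import defaultdict

# Assumed format, copied from the quoted log:
#   "... Sector repair completed:port=6, LBA=0xE6E2A."
REPAIR_RE = re.compile(r"Sector repair completed:port=(\d+), LBA=0x([0-9A-Fa-f]+)")


def summarize_repairs(lines):
    """Group sector-repair events by controller port.

    Returns a dict mapping port -> (repair_count, lowest_lba, highest_lba).
    Lines that are not repair events are ignored.
    """
    lbas = defaultdict(list)
    for line in lines:
        m = REPAIR_RE.search(line)
        if m:
            lbas[int(m.group(1))].append(int(m.group(2), 16))
    return {port: (len(v), min(v), max(v)) for port, v in lbas.items()}


if __name__ == "__main__":
    log = [
        "Mar 6 06:25:04 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6E2A.",
        "Mar 6 06:25:04 system3 kernel: ReiserFS: warning: is_tree_node: node level 28784 does not match to the expected one 1",
        "Mar 6 10:51:19 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6E00.",
        "Mar 6 10:51:20 system3 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0023): Sector repair completed:port=6, LBA=0xE6DCA.",
    ]
    for port, (count, lo, hi) in sorted(summarize_repairs(log).items()):
        # Prints: port 6: 3 repairs, LBA 0xE6DCA-0xE6E2A
        print(f"port {port}: {count} repairs, LBA 0x{lo:X}-0x{hi:X}")
```

On the excerpt quoted in this thread, every repair lands on port 6, within LBA 0xE6DCA-0xE6E2A. Repeated repairs confined to one port point at a single drive rather than at the controller or the filesystems on top of it.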