Message-ID: <5058314F.7000102@cape-horn-eng.com>
Date: Tue, 18 Sep 2012 10:31:11 +0200
From: Richard Ems
Subject: Re: XFS (sdd1): Internal error xfs_da_do_buf(2) at line 2097 of file /usr/src/packages/BUILD/kernel-default-3.3.6/linux-3.3/fs/xfs/xfs_da_btree.c.
References: <50573A13.7000206@cape-horn-eng.com> <20120917234926.GJ13691@dastard>
In-Reply-To: <20120917234926.GJ13691@dastard>
To: Dave Chinner
Cc: xfs@oss.sgi.com

On 09/18/2012 01:49 AM, Dave Chinner wrote:
> On Mon, Sep 17, 2012 at 04:56:19PM +0200, Richard Ems wrote:
>> Hi all,
>>
>> saturday morning one hard disc on our RAID6 failed. About one hour later,
>> the XFS running on that device reported the following error:
>>
>> XFS (sdd1): Internal error xfs_da_do_buf(2) at line 2097 of file /usr/src/packages/BUILD/kernel-default-3.3.6/linux-3.3/fs/xfs/xfs_da_btree.c.
> .....
>> Sep 15 07:30:51 fs1 kernel: [7369085.792619] XFS (sdd1): Corruption detected. Unmount and run xfs_repair
>>
>>
>> And this repeating again and again ...
>>
>> This system has been running fine for 87 days, no power outages or such.
>> It's connected to an UPS, and the H800 Raid Controller has a BBU installed.
> .....
>> Why could this have happened?
>
> Something went wrong at the RAID level (i.e. your hardware) in
> handling the disk failure and recovering the array. It corrupted
> blocks in the volume rather than recovering them cleanly without
> errors. The corrupted blocks happened to be in a directory block,
> and a frequently accessed one according to the errors in the log.
>
> What you found in lost+found was the recoverable fragments of the
> directory and whatever else was corrupted during the disk failure
> incident.
>
>> What more info can I provide to understand this issue and avoid
>> this to happen again?
>
> I'd be asking your hardware vendor about why it corrupted the
> volume on a single disk failure when it is supposed to be able to
> transparently handle double disk failures without losing/corrupting
> data.
>
> Cheers,
>
> Dave.
>

Ok, many thanks Dave.
I will forward this conversation to the DELL guys ...

Thanks again,
Richard

-- 
Richard Ems       mail: Richard.Ems@Cape-Horn-Eng.com

Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
46011 Valencia
Tel : +34 96 3242923 / Fax 924
http://www.cape-horn-eng.com