From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Chinner
Subject: Re: xfs and raid5 - "Structure needs cleaning for directory open"
Date: Tue, 18 May 2010 07:45:32 +1000
Message-ID: <20100517214532.GL8120@dastard>
References: <20100510022033.GB7165@dastard> <4BF1B4FE.7020503@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <4BF1B4FE.7020503@redhat.com>
Sender: linux-raid-owner@vger.kernel.org
To: Doug Ledford
Cc: Rainer Fuegenstein, xfs@oss.sgi.com, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On Mon, May 17, 2010 at 05:28:30PM -0400, Doug Ledford wrote:
> On 05/09/2010 10:20 PM, Dave Chinner wrote:
> > On Sun, May 09, 2010 at 08:48:00PM +0200, Rainer Fuegenstein wrote:
> >>
> >> today in the morning some daemon processes terminated because of
> >> errors in the xfs file system on top of a software raid5, consisting
> >> of 4*1.5TB WD caviar green SATA disks.
> >
> > Reminds me of a recent(-ish) md/dm readahead cancellation fix - that
> > would fit the symptoms of (btree corruption showing up under heavy IO
> > load but no corruption on disk. However, I can't seem to find any
> > references to it at the moment (can't remember the bug title), but
> > perhaps your distro doesn't have the fix in it?
> >
> > Cheers,
> >
> > Dave.
>
> That sounds plausible, as does hardware error.  A memory bit flip under
> heavy load would cause the in memory data to be corrupt while the on
> disk data is good.

The data dumps from the bad blocks weren't wrong by a single bit -
they were unrecognisable garbage - so that is very unlikely to be
a memory error causing the problem.

> By waiting to check it until later, the bad memory
> was flushed at some point and when the data was reloaded it came in ok
> this time.

Yup - XFS needs to do a better job of catching this case - the
prototype metadata checksumming patch caught most of these cases...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com