From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jon Hardcastle
Subject: Re: Fw: Why does one get mismatches?
Date: Mon, 25 Jan 2010 02:07:11 -0800 (PST)
Message-ID: <644690.69223.qm@web51302.mail.re2.yahoo.com>
References: <878wbn85bu.fsf@frosties.localdomain>
Reply-To: Jon@eHardcastle.com
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <878wbn85bu.fsf@frosties.localdomain>
Sender: linux-raid-owner@vger.kernel.org
To: Jon@eHardcastle.com
Cc: Goswin von Brederlow, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

--- On Sun, 24/1/10, Goswin von Brederlow wrote:

> From: Goswin von Brederlow
> Subject: Re: Fw: Why does one get mismatches?
> To: Jon@eHardcastle.com
> Cc: "Goswin von Brederlow", linux-raid@vger.kernel.org
> Date: Sunday, 24 January, 2010, 23:13
> Jon Hardcastle writes:
>
> > --- On Fri, 22/1/10, Goswin von Brederlow wrote:
> >
> >> From: Goswin von Brederlow
> >> Subject: Re: Fw: Why does one get mismatches?
> >> To: Jon@eHardcastle.com
> >> Cc: linux-raid@vger.kernel.org
> >> Date: Friday, 22 January, 2010, 18:13
> >> Jon Hardcastle writes:
> >>
> >> > --- On Tue, 19/1/10, Jon Hardcastle wrote:
> >> >
> >> >> From: Jon Hardcastle
> >> >> Subject: Why does one get mismatches?
> >> >> To: linux-raid@vger.kernel.org
> >> >> Date: Tuesday, 19 January, 2010, 10:04
> >> >> Hi,
> >> >>
> >> >> I kicked off a check/repair cycle on my machine after I
> >> >> moved the physical ordering of my drives around, and I am now
> >> >> on my second check/repair cycle and it has kept finding
> >> >> mismatches.
> >> >>
> >> >> Is it correct that the mismatch value after a repair was
> >> >> needed should equal the value present after a check? What if
> >> >> it doesn't? What does it mean if another check STILL reveals
> >> >> mismatches?
> >> >>
> >> >> I had something similar after I reshaped from raid 5 to 6: I
> >> >> had to run check/repair/check/repair several times before I
> >> >> got my 0.
> >> >>
> >> >
> >> > Guys,
> >> >
> >> > Anyone got any suggestions here? I am now on my ~5th
> >> > check/repair and after a reboot the first check is still
> >> > returning 8.
> >> >
> >> > All I have done is move the drives around. It is the
> >> > same controllers/cables/etc.
> >> >
> >> > I really don't like the seemingly random nature of what
> >> > can/does/has caused the mismatches.
> >>
> >> There is some unknown corruption going on with raid1 that causes
> >> mismatches but it is believed that it will never occur on any used
> >> block. Swapping is a likely cause.
> >>
> >> Any swap device on the raid? Try turning that off.
> >> If that doesn't help try umounting filesystems or
> >> remounting RO.
> >>
> >> MfG
> >>         Goswin
> >
> > Hello, my usual savior Goswin!
> >
> > The deal is it is a 7 drive raid 6 array. It has LVM on it and is
> > not used for swapping. I have umounted all LVs and still got
> > mismatches. I ran smartctl --test=long on all drives: nothing. I
> > have now dismantled the array and am 3/4 of the way through
> > 'badblocks -svn' on each of the component drives. I have a hunch
> > that it may be a dodgy SATA cable but have no evidence. No errors
> > in the log, nothing in dmesg.
> >
> > Is there any way to get more information? I am starting to think
> > this has happened more since I changed from raid 5 to 6, which I
> > did < 1 month ago.
> >
> > The only lead I have is that while running badblocks, 1 drive ran
> > at ~10-15MB/s whereas the rest are going at ~30. I have another
> > identical model drive coming up, so I will see if that one is slow
> > too. But the lack of logging info is unhelpful and worrying, and
> > the prospect of silent corruption is a big worry!
>
> You did run a repair pass and not just repeated check passes, right?
> Check itself only counts the mismatches but does not correct them.
> If the raid is unused (vgchange -a n) and you do first repair and then
> check then that definitely should not find any mismatches.
>
> MfG
>
>         Goswin
>

Hello!

Yes, I have a simple script that first does a check, then if there are
mismatches it does a repair. I have then been manually rerunning a check
and I keep getting mismatches. It goes like this: 232, 8, 24, 8, 8, 16,
16, 24, 24, 8, 16, 24. But I have also done this manually and run several
repairs in a row (assuming that a repair will return 0 if no work is to
be done).

Now the array is completely dismantled and I am running badblocks on the
drives, but I am on the last 2 of the 7 drives and I still have no leads.
No bad blocks, no offline uncorrectable, no pending sectors, no dmesg
errors, no nothing. I have absolutely no leads whatsoever.

The only things I have left to try are a full memory test and
disconnecting and reseating the additional SATA controllers, oh and
buying 7 new SATA cables in case 1 is bad.

But it would be REALLY helpful to know on what drive the mismatches have
occurred.

Any help here would be gratefully received! I might even try converting
the array back to raid 5, as I remember I had mismatches immediately
after I converted from 5 to 6.
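
For reference, the check, repair, check cycle discussed in this thread maps onto two md sysfs files, `sync_action` and `mismatch_cnt`. The Python below is a minimal sketch of that cycle, not the script mentioned in this message: the function names, the injectable `wait` parameter and the polling interval are assumptions made for illustration. The page conversion assumes 4 KiB pages and 512-byte sectors. md reports `mismatch_cnt` in sectors while most RAID levels compare whole pages, so under that assumption counts such as 8, 16 and 24 would correspond to 1, 2 and 3 pages.

```python
import time
from pathlib import Path

SECTORS_PER_PAGE = 8  # assumption: 4 KiB pages, 512-byte sectors


def read_mismatch_cnt(md_dir):
    """Return the sector count md reports in <md_dir>/mismatch_cnt."""
    return int((Path(md_dir) / "mismatch_cnt").read_text().strip())


def wait_until_idle(md_dir, poll_seconds=30.0):
    """Poll <md_dir>/sync_action until md reports 'idle' again."""
    sync_action = Path(md_dir) / "sync_action"
    while sync_action.read_text().strip() != "idle":
        time.sleep(poll_seconds)


def run_pass(md_dir, action, wait=wait_until_idle):
    """Start a 'check' or 'repair' pass, wait for it, return mismatch_cnt."""
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported action: {action!r}")
    (Path(md_dir) / "sync_action").write_text(action + "\n")
    wait(md_dir)
    return read_mismatch_cnt(md_dir)


def check_repair_check(md_dir, wait=wait_until_idle):
    """Run check; if it finds mismatches, run repair and a second check.

    Returns the list of (action, mismatch_cnt) pairs in the order run.
    """
    results = [("check", run_pass(md_dir, "check", wait))]
    if results[0][1] > 0:
        results.append(("repair", run_pass(md_dir, "repair", wait)))
        results.append(("check", run_pass(md_dir, "check", wait)))
    return results


def sectors_to_pages(mismatch_cnt):
    """Convert a sector count to whole pages (under SECTORS_PER_PAGE)."""
    return mismatch_cnt // SECTORS_PER_PAGE
```

On a real array this would be called as root, for example `check_repair_check("/sys/block/md0/md")`, where `md0` is a placeholder device name, and with the array otherwise unused (for instance after `vgchange -a n`). As noted in the quoted reply, the final check on an unused array should then report 0.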