From: Tom Walsh
Subject: Re: array always resyncs on boot
Date: Sat, 29 Nov 2008 21:21:03 -0500
Message-ID: <4931F88F.9000300@openhardware.net>
References: <4931B2C5.5060403@openhardware.net> <4931E75D.5080905@openhardware.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-raid-owner@vger.kernel.org
To: Justin Piszcz
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Justin Piszcz wrote:
>
>
> On Sat, 29 Nov 2008, Tom Walsh wrote:
>
>> Justin Piszcz wrote:
>>>
>
> [ .. snip .. ]
>
> From your dmesg, you need to figure out why it is assembling only 5
> out of the 6 disks, is it kicking a non-fresh one out of the array, or..?
> [...]
>
> 1. Remove quiet from the boot options (in lilo.conf or grub.conf)
> 2. Send an updated e-mail with the boot message.
> 3. Try to capture all of the dmesg/syslog when you shutdown and send
> that as well.
>
>
> Also see below, are all of these drives on the -SAME- controller?
> http://www.newegg.com/Product/Product.aspx?Item=N82E16813128347
>

Yes, they are.

Something keeps nagging at me. I think that about 3 months ago I did a urpmi update and it replaced the kernel, or I allowed it to. The system I had back then, which was working with the ICH10 chip, was running a 2.6.22 kernel (Mandriva customized).
I've noticed in the dmesg output that 'md:' does not finish scanning the SATA drives before the raid10 module starts, as this dmesg snippet shows:

ACPI: Processor [CPU2] (supports 8 throttling states)
ACPI: SSDT 7FEE8700, 0152 (r1 PmRef Cpu3Ist 3000 INTL 20040311)
ACPI: CPU3 (power states: C1[C1] C2[C2])
processor ACPI0007:03: registered as cooling_device3
ACPI: Processor [CPU3] (supports 8 throttling states)
md: bind
md: bind
md: bind
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
HDA Intel 0000:00:1b.0: setting latency timer to 64
md: bind
md: bind
md: raid10 personality registered for level 10
raid10: raid set md0 active with 5 out of 6 devices
md: bind
RAID10 conf printout:
 --- wd:5 rd:6
 disk 0, wo:0, o:1, dev:sda5
 disk 1, wo:0, o:1, dev:sdb5
 disk 2, wo:0, o:1, dev:sdc5
 disk 3, wo:1, o:1, dev:sdd5
 disk 4, wo:0, o:1, dev:sde5
 disk 5, wo:0, o:1, dev:sdf5
md: recovery of RAID array md0
================================================

Notice that the '^raid10: ...' announcement precedes the final '^md: bind' announcement. Then the RAID10 conf printout shows a different status for dev:sdd5 (wo:1, while every other disk shows wo:0).

I googled this message, and it seems to tie in with what may be happening here; see:
http://archive.netbsd.se/?ml=linux-ide&a=2008-08&t=8302899

In that message, the poster talks about "hard resetting link" problems and the array failing to assemble properly.

In all the madness of the past 5 or 6 days, I'd forgotten that I had put the drive array onto an old AMD Sempron system here and installed Mandriva 2008 on it. That seemed to work without a problem, and it runs a 2.6.22 kernel.

I'm going to focus on the kernel version for now, since I see that some work was done on the ICH10 driver sometime around 2.6.24 or 2.6.26. Prior to that version, ide_generic was used with the ICH10, and you had to enable the AHCI BIOS on the motherboard.

This may be a driver issue. I'll let you know...
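For checking other boot logs for the same pattern, here is a rough Python sketch. It is only an illustration: the names (analyze, MdReport, SNIPPET) are made up, and it assumes the message formats shown in the dmesg snippet above. It also assumes, from my reading of the raid10 conf printout, that wo:1 flags a member that is not in sync (and so gets rebuilt) while o:1 flags one that is not faulty. It only counts 'md: bind' lines; it does not try to tell which device each one binds.

```python
import re
from dataclasses import dataclass, field

# Message formats as they appear in the dmesg snippet above (assumed stable).
ACTIVE_RE = re.compile(r"raid set (\S+) active with (\d+) out of (\d+) devices")
DISK_RE = re.compile(r"disk (\d+), wo:(\d), o:(\d), dev:(\S+)")
BIND_RE = re.compile(r"md: bind")


@dataclass
class MdReport:  # hypothetical summary type, for illustration only
    array: str = ""
    active: int = 0
    total: int = 0
    binds_after_start: int = 0  # "md: bind" lines seen after the array went active
    not_in_sync: list = field(default_factory=list)  # members printed with wo:1


def analyze(lines):
    """Summarize md/raid10 boot messages from an iterable of dmesg lines."""
    report = MdReport()
    started = False
    for line in lines:
        m = ACTIVE_RE.search(line)
        if m:
            report.array = m.group(1)
            report.active, report.total = int(m.group(2)), int(m.group(3))
            started = True
        elif started and BIND_RE.search(line):
            report.binds_after_start += 1
        else:
            d = DISK_RE.search(line)
            if d and d.group(2) == "1":  # wo:1 -> assumed "not in sync"
                report.not_in_sync.append(d.group(4))
    return report


# Subset of the lines from the snippet above.
SNIPPET = """\
md: bind
md: bind
md: bind
md: bind
md: bind
md: raid10 personality registered for level 10
raid10: raid set md0 active with 5 out of 6 devices
md: bind
RAID10 conf printout:
 --- wd:5 rd:6
 disk 0, wo:0, o:1, dev:sda5
 disk 1, wo:0, o:1, dev:sdb5
 disk 2, wo:0, o:1, dev:sdc5
 disk 3, wo:1, o:1, dev:sdd5
 disk 4, wo:0, o:1, dev:sde5
 disk 5, wo:0, o:1, dev:sdf5
md: recovery of RAID array md0
"""

r = analyze(SNIPPET.splitlines())
print(r.array, r.active, r.total, r.binds_after_start, r.not_in_sync)
# prints: md0 5 6 1 ['sdd5']
```

Run against a full boot log, a nonzero binds_after_start together with a wo:1 member would suggest the same ordering problem described above.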
TomW

-- 
Tom Walsh - WN3L - Embedded Systems Consultant
http://openhardware.net
http://cyberiansoftware.com
http://openzipit.org
"Windows? No thanks, I have work to do..."