* Continuing problems with RAID arrays starting at boot - devices not found
@ 2010-05-14 1:31 Mark Knecht
2010-05-15 23:37 ` Leslie Rhorer
0 siblings, 1 reply; 2+ messages in thread
From: Mark Knecht @ 2010-05-14 1:31 UTC
To: Linux-RAID
Hi,
I have a continuing problem with all my RAID arrays. I'm
moderately confident that it's not a mdadm problem but I hope I can
ask here for any tests or ideas to test it before I try reporting it
up to the LKML. Thanks in advance.
I have a new high-end home server/desktop machine built using an
Asus Rampage II Extreme motherboard, an Intel Core i7-980x and
(currently) 12GB DRAM. The machine has five 500GB WD RAID Edition
drives:
RAID 1: /dev/sda, /dev/sdb & /dev/sdc - 3 partitions on each drive
creating three 3-drive RAID1 arrays
RAID 0: /dev/sdd & /dev/sde - currently 1 partition on each drive
creating a RAID0.
Every time I boot all 5 drives are recognized by system BIOS. There
is a BIOS device table printing on the screen and it __always__ shows
all 5 drives. If I enter BIOS and look at the storage page all drives
are shown. If I do nothing then the system waits 10 seconds, then
boots into grub. Grub boots the kernel, the boot process rolls along,
gets to where it starts mdadm, and then 50%-75% of the time one or
more of the partitions isn't found and mdadm doesn't start the RAID
correctly.
Now, after booting and RAID not starting correctly, maybe half the
time I can look for the drive (ls /dev/sde1 for instance) find it and
add it back to the RAID array. Half the time the drive isn't found
until I reboot the machine. If I look in dmesg I don't see the missing
drive. It's just like it isn't there even though BIOS said it was
before booting Linux. The missing drive is not always found on a warm
reboot, but is often found on a cold reboot.
The problem has been consistent across all the kernels I've tried
over the last 2 months.
My question is whether this is in any way related to mdadm? I
suspect it isn't but thought I'd try to get some ideas on how to test
for the root cause of this problem. If it was purely a mdadm problem
then even if the RAID wasn't correctly started then wouldn't I still
find the drive partitions?
I can send along whatever info is needed. I don't know what to
supply at this point.
Thanks,
Mark
c2stable ~ # uname -a
Linux c2stable 2.6.34-rc5 #1 SMP PREEMPT Mon Apr 26 12:04:14 PDT 2010
x86_64 Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz GenuineIntel GNU/Linux
c2stable ~ #
c2stable ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md6 : active raid1 sda6[0] sdc6[2] sdb6[1]
247416933 blocks super 1.1 [3/3] [UUU]
md11 : active raid0 sdd1[0] sde1[1]
104871936 blocks super 1.1 512k chunks
md3 : active raid1 sdc3[2] sda3[0] sdb3[1]
52436096 blocks [3/3] [UUU]
md5 : active raid1 sdc5[2] sda5[0] sdb5[1]
52436032 blocks [3/3] [UUU]
unused devices: <none>
c2stable ~ #
c2stable ~ # ls /dev/sd*
/dev/sda /dev/sda4 /dev/sdb1 /dev/sdb5 /dev/sdc2 /dev/sdc6 /dev/sde1
/dev/sda1 /dev/sda5 /dev/sdb2 /dev/sdb6 /dev/sdc3 /dev/sdd
/dev/sda2 /dev/sda6 /dev/sdb3 /dev/sdc /dev/sdc4 /dev/sdd1
/dev/sda3 /dev/sdb /dev/sdb4 /dev/sdc1 /dev/sdc5 /dev/sde
c2stable ~ #
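The /proc/mdstat listing above shows a healthy boot. On a bad boot, a RAID1 array that lost a member would be expected to show an underscore in its status field (for example [UU_]), and an array that could not be started may be listed as inactive or not at all. A minimal sketch for spotting those cases follows. The function name degraded_arrays is hypothetical, and the parsing assumes the mdstat layout shown in this thread.

```shell
# Print the names of md arrays that look degraded or inactive.
# Reads mdstat-format text from the file given as $1 (default: /proc/mdstat).
# Assumption: array names have the form mdN, as in the listing above.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ {
            name = $1
            if ($3 == "inactive") print name
            next
        }
        # Status line such as "52436096 blocks [3/2] [UU_]".
        # An underscore in the last bracket group marks a missing member.
        name != "" && /\[[U_]+\]/ {
            if ($0 ~ /\[[U_]*_[U_]*\]/) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Example: degraded_arrays            (reads /proc/mdstat)
#          degraded_arrays saved.txt  (reads a saved copy)
```

A RAID0 array has no [UUU] status field, so a RAID0 with a missing member is only reported here if the kernel lists it as inactive.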
^ permalink raw reply [flat|nested] 2+ messages in thread
* RE: Continuing problems with RAID arrays starting at boot - devices not found
From: Leslie Rhorer @ 2010-05-15 23:37 UTC
To: 'Mark Knecht', 'Linux-RAID'
> Hi,
> I have a continuing problem with all my RAID arrays. I'm
> moderately confident that it's not a mdadm problem but I hope I can
> ask here for any tests or ideas to test it before I try reporting it
> up to the LKML. Thanks in advance.
I was having a similar problem with one of my systems. Because I
thought it might be an issue with udev, I tried to update the kernel.
Because of that, I am now in a situation where the system is unbootable.
> Every time I boot all 5 drives are recognized by system BIOS. There
> is a BIOS device table printing on the screen and it __always__ shows
> all 5 drives. If I enter BIOS and look at the storage page all drives
> are shown. If I do nothing then the system waits 10 seconds, then
> boots into grub. Grub boots the kernel, the boot process rolls along,
> gets to where it starts mdadm, and then 50%-75% of the time one or
> more of the partitions isn't found and mdadm doesn't start the RAID
> correctly.
Except that the drives on the controller are not recognized by the
BIOS (and never will be), I was having very much the same symptoms - at a high
level, anyway - as you.
> Now, after booting and RAID not starting correctly, maybe half the
> time I can look for the drive (ls /dev/sde1 for instance) find it and
> add it back to the RAID array. Half the time the drive isn't found
> until I reboot the machine. If I look in dmesg I don't see the missing
> drive. It's just like it isn't there even though BIOS said it was
> before booting Linux. The missing drive is not always found on a warm
> reboot, but is often found on a cold reboot.
>
> The problem has been consistent across all the kernels I've tried
> over the last 2 months.
>
> My question is whether this is in any way related to mdadm? I
Pretty unlikely. Mdadm doesn't fiddle with block devices created
by udev. If the block device for the hard drive isn't there, then for
whatever reason udev isn't creating it, and if udev doesn't create it, mdadm
can't use it as a member in an array. I think the udev problem went away
when I upgraded to 2.6.32-3-amd64 (or at least every time I have looked,
now, all 8 eSATA targets seem to be there), but now I have much bigger
problems.
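The point above, that mdadm can only use members whose device nodes exist, can be checked directly before blaming mdadm. A minimal sketch, assuming the member partitions are known in advance; the function name missing_block_devices is hypothetical.

```shell
# Print every path from the argument list that is not a block device node.
missing_block_devices() {
    for dev in "$@"; do
        [ -b "$dev" ] || printf '%s\n' "$dev"
    done
}

# Example, using the RAID0 members named in this thread:
#   missing_block_devices /dev/sdd1 /dev/sde1
```

If this prints a partition that the array needs, the failure happened before mdadm ran, at device detection or node creation.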
> suspect it isn't but thought I'd try to get some ideas on how to test
> for the root cause of this problem. If it was purely a mdadm problem
> then even if the RAID wasn't correctly started then wouldn't I still
> find the drive partitions?
Yes, you would. Mdadm is failing because the block devices are not
in /dev, not the other way around. You might look at the boot logs for
reports concerning failing SATA devices. Try `man udevadm`.
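One way to follow the suggestion about boot logs is to filter the kernel log for SATA link trouble. The sketch below is only a starting point: the function name sata_problems is hypothetical, and the message fragments it matches are an assumed, incomplete sample of what the libata driver may print, not a full list.

```shell
# Filter kernel log text on stdin for lines that suggest SATA link trouble.
# The message fragments are an illustrative, incomplete selection.
sata_problems() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: .*(link down|link is slow to respond|COMRESET failed|softreset failed|hardreset failed|failed to IDENTIFY)'
}

# Example: dmesg | sata_problems
# Once a node exists, its udev properties can be inspected with:
#   udevadm info --query=all --name=/dev/sde
```

If a drive never appears in the kernel log at all, as described in the original report, no udev rule can create a node for it, which would point at the controller, cabling, or drive spin-up timing rather than at udev or mdadm.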