* And then there was Bryce...
From: Bryce @ 2006-06-08 0:41 UTC
To: linux-raid
Gosh, where to start,..
Ok general setup
I'm using kernel version 2.6.17-rc5 and RAID 5 over five 500 GB SATA disks
(boring dump)
-----------------------------------------------------------------------
[root@emerald ~]# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sat May 27 20:49:13 2006
Raid Level : raid5
Array Size : 1953533952 (1863.04 GiB 2000.42 GB)
Device Size : 488383488 (465.76 GiB 500.10 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Jun 8 01:05:24 2006
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 1024K
UUID : d8d7cacb:24db29e6:46ace8ec:49547cc4
Events : 0.143369
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
-----------------------------------------------------------------------
Anyway, I happen to have a 512MB USB pen drive that I was playing with
earlier that I left attached over a reboot
What follows is horrifying.
From the syslog...
Jun 7 18:47:10 Emerald syslogd 1.4.1: restart.
Jun 7 18:47:10 Emerald kernel: klogd 1.4.1, log source = /proc/kmsg
started.
Jun 7 18:47:10 Emerald kernel: Linux version 2.6.17-rc5 (root@emerald)
(gcc version 4.1.0 20060304 (Red Hat 4.1.0-3)) #2 SMP Sun May 28
15:29:46 BST 2006
...
everything going ok,.. normal boot
and then it all goes horribly wrong,...
Jun 7 18:52:30 Emerald kernel: raid5: Disk failure on sde1, disabling
device. Operation continuing on 3 devices
Jun 7 18:52:30 Emerald kernel: RAID5 conf printout:
Jun 7 18:52:30 Emerald kernel: --- rd:5 wd:3 fd:2
Jun 7 18:52:30 Emerald kernel: disk 0, o:1, dev:sdb1
Jun 7 18:52:30 Emerald kernel: disk 1, o:1, dev:sdd1
Jun 7 18:52:30 Emerald kernel: disk 2, o:0, dev:sde1
Jun 7 18:52:30 Emerald kernel: disk 4, o:1, dev:sdg1
Jun 7 18:52:30 Emerald kernel: RAID5 conf printout:
Jun 7 18:52:30 Emerald kernel: --- rd:5 wd:3 fd:2
Jun 7 18:52:30 Emerald kernel: disk 0, o:1, dev:sdb1
Jun 7 18:52:30 Emerald kernel: disk 1, o:1, dev:sdd1
Jun 7 18:52:30 Emerald kernel: disk 4, o:1, dev:sdg1
Jun 7 18:54:37 Emerald kernel: Buffer I/O error on device dm-2, logical
block 0
Jun 7 18:54:37 Emerald kernel: lost page write due to I/O error on dm-2
Jun 7 18:57:11 Emerald kernel: Buffer I/O error on device md0, logical
block 488383472
Jun 7 18:57:11 Emerald kernel: Buffer I/O error on device md0, logical
block 488383472
Jun 7 18:57:11 Emerald kernel: Buffer I/O error on device md0, logical
block 488383486
Jun 7 18:57:11 Emerald kernel: Buffer I/O error on device md0, logical
block 488383486
Jun 7 19:05:10 Emerald kernel: md: unbind<sde1>
Jun 7 19:05:10 Emerald kernel: md: export_rdev(sde1)
Jun 7 19:05:15 Emerald kernel: md: bind<sde1>
but wait a sec,.. WTF is this sdg1 in the raid printout?....
reading back in the syslog, I see
Jun 7 18:47:26 Emerald kernel: SCSI device sdg: 976773168 512-byte hdwr
sectors (500108 MB)
Jun 7 18:47:26 Emerald kernel: sdg: Write Protect is off
Jun 7 18:47:26 Emerald kernel: SCSI device sdg: drive cache: write back
Jun 7 18:47:26 Emerald kernel: SCSI device sdg: 976773168 512-byte hdwr
sectors (500108 MB)
Jun 7 18:47:26 Emerald kernel: sdg: Write Protect is off
Jun 7 18:47:26 Emerald kernel: SCSI device sdg: drive cache: write back
Jun 7 18:47:26 Emerald kernel: sdg: sdg1
Jun 7 18:47:26 Emerald kernel: sd 6:0:0:0: Attached scsi disk sdg
Well, that's nice, that's my pendrive! So what happened when it set up the
array?
Jun 7 18:47:30 Emerald kernel: md: Autodetecting RAID arrays.
Jun 7 18:47:30 Emerald kernel: md: autorun ...
Jun 7 18:47:30 Emerald kernel: md: considering sdg1 ...
Jun 7 18:47:30 Emerald kernel: md: adding sdg1 ...
Jun 7 18:47:30 Emerald kernel: md: adding sdf1 ...
Jun 7 18:47:30 Emerald kernel: md: adding sde1 ...
Jun 7 18:47:30 Emerald kernel: md: adding sdd1 ...
Jun 7 18:47:30 Emerald kernel: md: adding sdb1 ...
Jun 7 18:47:30 Emerald kernel: md: created md0
Jun 7 18:47:30 Emerald kernel: md: bind<sdb1>
Jun 7 18:47:31 Emerald kernel: md: bind<sdd1>
Jun 7 18:47:31 Emerald kernel: md: bind<sde1>
Jun 7 18:47:31 Emerald kernel: md: bind<sdf1>
Jun 7 18:47:31 Emerald kernel: md: bind<sdg1>
Jun 7 18:47:31 Emerald kernel: md: running: <sdg1><sdf1><sde1><sdd1><sdb1>
Jun 7 18:47:31 Emerald kernel: md: kicking non-fresh sdf1 from array!
Jun 7 18:47:31 Emerald kernel: md: unbind<sdf1>
Jun 7 18:47:31 Emerald kernel: md: export_rdev(sdf1)
Jun 7 18:47:31 Emerald kernel: raid5: automatically using best
checksumming function: pIII_sse
Jun 7 18:47:31 Emerald kernel: pIII_sse : 4203.000 MB/sec
Jun 7 18:47:31 Emerald kernel: raid5: using function: pIII_sse
(4203.000 MB/sec)
Jun 7 18:47:31 Emerald kernel: md: raid5 personality registered for level 5
Jun 7 18:47:31 Emerald kernel: md: raid4 personality registered for level 4
Jun 7 18:47:31 Emerald kernel: raid5: device sdg1 operational as raid
disk 4
Jun 7 18:47:31 Emerald kernel: raid5: device sde1 operational as raid
disk 2
Jun 7 18:47:31 Emerald kernel: raid5: device sdd1 operational as raid
disk 1
Jun 7 18:47:31 Emerald kernel: raid5: device sdb1 operational as raid
disk 0
Jun 7 18:47:31 Emerald kernel: raid5: allocated 5248kB for md0
Jun 7 18:47:31 Emerald kernel: raid5: raid level 5 set md0 active with
4 out of 5 devices, algorithm 2
Jun 7 18:47:31 Emerald kernel: RAID5 conf printout:
Jun 7 18:47:31 Emerald kernel: --- rd:5 wd:4 fd:1
Jun 7 18:47:31 Emerald kernel: disk 0, o:1, dev:sdb1
Jun 7 18:47:31 Emerald kernel: disk 1, o:1, dev:sdd1
Jun 7 18:47:31 Emerald kernel: disk 2, o:1, dev:sde1
Jun 7 18:47:31 Emerald kernel: disk 4, o:1, dev:sdg1
Jun 7 18:47:31 Emerald kernel: md: ... autorun DONE.
WHAT THE HELL?!??
*considering sdg1* ?!?! then deciding it was fair game to use?!??
it's a FAT16 FS pendrive with NO UUID stuff on it...
suddenly the RAID5 gets very unhappy and becomes a RID5 and I spend the
next few hours rebuilding it (fortunately all data was preserved but it
wasn't a pleasant evening I can tell you)
Hum ho,.. I survived the horror but umm, well, I'll leave the above as a
story to frighten young sysadmins with.
Phil
=--=
* Re: And then there was Bryce...
From: Henrik Holst @ 2006-06-08 6:38 UTC
To: Bryce; +Cc: linux-raid
Bryce wrote:
>
> Gosh, where to start,..
>
> Ok general setup
>
> I'm using kernel version 2.6.17-rc5 and RAID 5 over five 500 GB SATA disks
You have just upgraded to udev, haven't you? :-)
[snip!]
>
> Hum ho,.. I survived the horror but umm, well, I'll leave the above as
> a story to frighten young sysadmins with.
The same happened to me with eth0-2. I _could_ not for the life of me
understand why I couldn't get my internet connection to work. But then I
realized that eth0 and eth1 had been swapped after I upgraded to udev.
Please consult your distribution's udev documentation on how to "lock
down" SCSI devices and network cards to specific kernel names.
Regards,
Henrik Holst
* Re: And then there was Bryce...
From: Bryce @ 2006-06-08 10:36 UTC
To: Henrik Holst; +Cc: linux-raid
Henrik Holst wrote:
> Bryce wrote:
>
>> Gosh, where to start,..
>>
>> Ok general setup
>>
>> I'm using kernel version 2.6.17-rc5 and RAID 5 over five 500 GB SATA disks
>>
>
> You have just upgraded to udev, haven't you? :-)
>
> [snip!]
>
>
>> Hum ho,.. I survived the horror but umm, well, I'll leave the above as
>> a story to frighten young sysadmins with.
>>
>
> The same happened to me with eth0-2. I _could_ not for the life of me
> understand why I couldn't get my internet connection to work. But then I
> realized that eth0 and eth1 had been swapped after I upgraded to udev.
> Please consult your distribution's udev documentation on how to "lock
> down" SCSI devices and network cards to specific kernel names.
>
> Regards,
> Henrik Holst
>
Ah,.. yes,, udev has helpfully remapped where all the drives I have
were,.. and of course I've misread the log because my brain is so
fixated on expecting drives to be where they should
curse you UDEV!!
Phil
=--=
Move along, nothing to see here except an overstressed worker...
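On the misread log mentioned above: the size reported for sdg in the first message can be checked with a little arithmetic. A quick Python sketch, using only the sector count and sector size that the kernel printed:

```python
# Figures taken from the syslog line in the first message:
# "SCSI device sdg: 976773168 512-byte hdwr sectors (500108 MB)"
SECTORS = 976773168
SECTOR_BYTES = 512

size_bytes = SECTORS * SECTOR_BYTES
size_mb = round(size_bytes / 10**6)  # decimal megabytes, the unit the kernel prints

print(size_bytes)  # 500107862016
print(size_mb)     # 500108
```

That is roughly 500 GB, so the device named sdg on that boot was one of the array disks under a shifted name, not the 512MB pen drive.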
* Re: And then there was Bryce...
From: John Stoffel @ 2006-06-08 15:59 UTC
To: Bryce; +Cc: Henrik Holst, linux-raid
>>>>> "Bryce" == Bryce <bryce@zeniv.linux.org.uk> writes:
Bryce> Ah,.. yes,, udev has helpfully remapped where all the drives I
Bryce> have were,.. and of course I've misread the log because my
Bryce> brain is so fixated on expecting drives to be where they should
Bryce> curse you UDEV!!
The problem is more likely that your /etc/mdadm/mdadm.conf file is
specifying exactly which partitions to use, instead of just doing
something like the following:
DEVICE partitions
ARRAY /dev/md0 level=raid1 auto=yes num-devices=2 UUID=2e078443:42b63ef5:cc179492:aecf0094
Which should do the trick for you. Can you post your mdadm.conf file
so we can look it over?
John
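Applied to the array in this thread, a line of that shape can be derived from the `mdadm -D /dev/md0` dump in the first message. A minimal Python sketch, assuming the field layout shown in that dump; `array_line` is a hypothetical helper and not part of mdadm (on a live system, `mdadm --detail --scan` prints a comparable line):

```python
import re

# Excerpt of the `mdadm -D /dev/md0` output quoted in the first message.
DETAIL = """\
/dev/md0:
Raid Level : raid5
Raid Devices : 5
UUID : d8d7cacb:24db29e6:46ace8ec:49547cc4
"""

def array_line(detail: str) -> str:
    """Build an mdadm.conf ARRAY line from `mdadm -D` text (hypothetical helper)."""
    # The first line names the array device, e.g. "/dev/md0:".
    device = detail.splitlines()[0].strip().rstrip(":")
    # Collect "Key : value" pairs; keys consist of letters and spaces only.
    pairs = re.findall(r"^[ \t]*([A-Za-z ]+?)[ \t]*:[ \t]*(\S+)[ \t]*$", detail, re.M)
    fields = dict(pairs)
    return (f"ARRAY {device} level={fields['Raid Level']} "
            f"num-devices={fields['Raid Devices']} UUID={fields['UUID']}")

print("DEVICE partitions")
print(array_line(DETAIL))
```

The point of identifying the array by UUID is that the ARRAY line no longer depends on which sdX name each member receives at boot.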
* Re: And then there was Bryce...
From: H. Peter Anvin @ 2006-06-08 17:01 UTC
To: linux-raid
Followup to: <17544.18790.382198.453845@smtp.charter.net>
By author: "John Stoffel" <john@stoffel.org>
In newsgroup: linux.dev.raid
>
> The problem is more likely that your /etc/mdadm/mdadm.conf file is
> specifying exactly which partitions to use, instead of just doing
> something like the following:
>
> DEVICE partitions
> ARRAY /dev/md0 level=raid1 auto=yes num-devices=2 UUID=2e078443:42b63ef5:cc179492:aecf0094
>
> Which should do the trick for you. Can you post your mdadm.conf file
> so we can look it over?
Hey guys, look at the syslog output again. He's using kernel autorun.
-hpa
* Re: And then there was Bryce...
From: Bill Davidsen @ 2006-06-13 18:38 UTC
To: Bryce; +Cc: Henrik Holst, linux-raid
Bryce wrote:
> Henrik Holst wrote:
>
>> Bryce wrote:
>>
>>
>>> Gosh, where to start,..
>>>
>>> Ok general setup
>>>
>>> I'm using kernel version 2.6.17-rc5 and RAID 5 over five 500 GB SATA
>>> disks
>>>
>>
>>
>> You have just upgraded to udev, haven't you? :-)
>>
>> [snip!]
>>
>>
>>
>>> Hum ho,.. I survived the horror but umm, well, I'll leave the above as
>>> a story to frighten young sysadmins with.
>>>
>>
>>
>> The same happened to me with eth0-2. I _could_ not for the life of me
>> understand why I couldn't get my internet connection to work. But then I
>> realized that eth0 and eth1 had been swapped after I upgraded to udev.
>> Please consult your distribution's udev documentation on how to "lock
>> down" SCSI devices and network cards to specific kernel names.
>>
>> Regards,
>> Henrik Holst
>>
>
> Ah,.. yes,, udev has helpfully remapped where all the drives I have
> were,.. and of course I've misread the log because my brain is so
> fixated on expecting drives to be where they should
>
> curse you UDEV!!
If you were using "DEVICE partitions" and letting mdadm assemble the RAID it
wouldn't matter. Using names is dangerous even without udev. I have a
system on (mostly) FC1, using a 2.6.15 kernel, and if I boot with a
drive in the removable bay it calls that controller (VIA something) hde
and hdf; if there's no drive it drops the module for the controller and
everything else moves up by two.
Using mdadm I haven't been bitten by this in several years.
I have similar problems on a RH8.0 system which needs to run the burner
on ide-scsi; depending on which USB devices are plugged in at boot, names
are negotiable. At boot time the "right" names are found and symlinks
created as needed.
Finally, there's a command which allows you to set names of NICs by MAC
address. Haven't needed it in years, I *think* it's called ifname from
the iproute2 stuff. That's from memory.
Hope some of this is useful, I over-answered the question.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* Re: And then there was Bryce...
From: H. Peter Anvin @ 2006-06-08 15:54 UTC
To: linux-raid
Followup to: <4487C5F6.2080107@idgmail.se>
By author: Henrik Holst <henrik.holst@idgmail.se>
In newsgroup: linux.dev.raid
>
> The same happened to me with eth0-2. I _could_ not for the life of me
> understand why I couldn't get my internet connection to work. But then I
> realized that eth0 and eth1 had been swapped after I upgraded to udev.
> Please consult your distribution's udev documentation on how to "lock
> down" SCSI devices and network cards to specific kernel names.
>
This doesn't explain how come it bound drives without superblocks.
It should only bind drives with the correct superblock UUID, EVER.
Udev doesn't actually matter here, since the kernel, not udev, assigns
the numbers to the drives.
-hpa
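The two listings in the first message can be lined up by RAID role rather than by kernel name to see how the names moved. A rough Python sketch, assuming exactly the line formats quoted in the thread; the helper names are hypothetical, and the wrapped log lines are rejoined here:

```python
import re

# Member table from the `mdadm -D /dev/md0` dump in the first message.
DETAIL = """\
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
"""

# Kernel messages from the boot with the pen drive attached.
BOOT_LOG = """\
raid5: device sdg1 operational as raid disk 4
raid5: device sde1 operational as raid disk 2
raid5: device sdd1 operational as raid disk 1
raid5: device sdb1 operational as raid disk 0
"""

def roles_from_detail(text):
    """Map RaidDevice number -> partition name from the mdadm -D table."""
    roles = {}
    for line in text.splitlines():
        cols = line.split()
        if not cols:
            continue
        # Columns: Number, Major, Minor, RaidDevice, state words..., device path
        roles[int(cols[3])] = cols[-1].split("/")[-1]
    return roles

def roles_from_log(text):
    """Map raid disk number -> partition name from the raid5 kernel messages."""
    pattern = r"device (\w+) operational as raid disk (\d+)"
    return {int(role): dev for dev, role in re.findall(pattern, text)}

normal = roles_from_detail(DETAIL)
boot = roles_from_log(BOOT_LOG)
for role in sorted(normal):
    print(role, normal[role], "->", boot.get(role, "missing"))
```

Under these assumptions, roles 1, 2 and 4 each show up one drive letter later than in the `mdadm -D` table, and role 3 is absent, which matches the "kicking non-fresh sdf1" message.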