linux-raid.vger.kernel.org archive mirror
* And then there was Bryce...
@ 2006-06-08  0:41 Bryce
  2006-06-08  6:38 ` Henrik Holst
  0 siblings, 1 reply; 7+ messages in thread
From: Bryce @ 2006-06-08  0:41 UTC (permalink / raw)
  To: linux-raid


Gosh, where to start,..

Ok general setup

I'm using kernel version 2.6.17-rc5 and RAID 5 over five 500GB SATA disks

(boring dump)
-----------------------------------------------------------------------
[root@emerald ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat May 27 20:49:13 2006
     Raid Level : raid5
     Array Size : 1953533952 (1863.04 GiB 2000.42 GB)
    Device Size : 488383488 (465.76 GiB 500.10 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Jun  8 01:05:24 2006
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 1024K

           UUID : d8d7cacb:24db29e6:46ace8ec:49547cc4
         Events : 0.143369

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
-----------------------------------------------------------------------

Anyway, I happen to have a 512MB USB pen drive that I was playing with 
earlier and left attached over a reboot.

What follows is horrifying.

From the syslog...

Jun  7 18:47:10 Emerald syslogd 1.4.1: restart.
Jun  7 18:47:10 Emerald kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun  7 18:47:10 Emerald kernel: Linux version 2.6.17-rc5 (root@emerald) (gcc version 4.1.0 20060304 (Red Hat 4.1.0-3)) #2 SMP Sun May 28 15:29:46 BST 2006
...
everything going ok,.. normal boot
and then it all goes horribly wrong,...


Jun  7 18:52:30 Emerald kernel: raid5: Disk failure on sde1, disabling device. Operation continuing on 3 devices
Jun  7 18:52:30 Emerald kernel: RAID5 conf printout:
Jun  7 18:52:30 Emerald kernel:  --- rd:5 wd:3 fd:2
Jun  7 18:52:30 Emerald kernel:  disk 0, o:1, dev:sdb1
Jun  7 18:52:30 Emerald kernel:  disk 1, o:1, dev:sdd1
Jun  7 18:52:30 Emerald kernel:  disk 2, o:0, dev:sde1
Jun  7 18:52:30 Emerald kernel:  disk 4, o:1, dev:sdg1
Jun  7 18:52:30 Emerald kernel: RAID5 conf printout:
Jun  7 18:52:30 Emerald kernel:  --- rd:5 wd:3 fd:2
Jun  7 18:52:30 Emerald kernel:  disk 0, o:1, dev:sdb1
Jun  7 18:52:30 Emerald kernel:  disk 1, o:1, dev:sdd1
Jun  7 18:52:30 Emerald kernel:  disk 4, o:1, dev:sdg1
Jun  7 18:54:37 Emerald kernel: Buffer I/O error on device dm-2, logical block 0
Jun  7 18:54:37 Emerald kernel: lost page write due to I/O error on dm-2
Jun  7 18:57:11 Emerald kernel: Buffer I/O error on device md0, logical block 488383472
Jun  7 18:57:11 Emerald kernel: Buffer I/O error on device md0, logical block 488383472
Jun  7 18:57:11 Emerald kernel: Buffer I/O error on device md0, logical block 488383486
Jun  7 18:57:11 Emerald kernel: Buffer I/O error on device md0, logical block 488383486
Jun  7 19:05:10 Emerald kernel: md: unbind<sde1>
Jun  7 19:05:10 Emerald kernel: md: export_rdev(sde1)
Jun  7 19:05:15 Emerald kernel: md: bind<sde1>

but wait a sec,.. WTF is this sdg1 in the raid printout?....
reading back in the syslog, I see

Jun  7 18:47:26 Emerald kernel: SCSI device sdg: 976773168 512-byte hdwr sectors (500108 MB)
Jun  7 18:47:26 Emerald kernel: sdg: Write Protect is off
Jun  7 18:47:26 Emerald kernel: SCSI device sdg: drive cache: write back
Jun  7 18:47:26 Emerald kernel: SCSI device sdg: 976773168 512-byte hdwr sectors (500108 MB)
Jun  7 18:47:26 Emerald kernel: sdg: Write Protect is off
Jun  7 18:47:26 Emerald kernel: SCSI device sdg: drive cache: write back
Jun  7 18:47:26 Emerald kernel:  sdg: sdg1
Jun  7 18:47:26 Emerald kernel: sd 6:0:0:0: Attached scsi disk sdg

Well, that's nice, that's my pendrive! So what happened when it set up the 
array?

Jun  7 18:47:30 Emerald kernel: md: Autodetecting RAID arrays.
Jun  7 18:47:30 Emerald kernel: md: autorun ...
Jun  7 18:47:30 Emerald kernel: md: considering sdg1 ...
Jun  7 18:47:30 Emerald kernel: md:  adding sdg1 ...
Jun  7 18:47:30 Emerald kernel: md:  adding sdf1 ...
Jun  7 18:47:30 Emerald kernel: md:  adding sde1 ...
Jun  7 18:47:30 Emerald kernel: md:  adding sdd1 ...
Jun  7 18:47:30 Emerald kernel: md:  adding sdb1 ...
Jun  7 18:47:30 Emerald kernel: md: created md0
Jun  7 18:47:30 Emerald kernel: md: bind<sdb1>
Jun  7 18:47:31 Emerald kernel: md: bind<sdd1>
Jun  7 18:47:31 Emerald kernel: md: bind<sde1>
Jun  7 18:47:31 Emerald kernel: md: bind<sdf1>
Jun  7 18:47:31 Emerald kernel: md: bind<sdg1>
Jun  7 18:47:31 Emerald kernel: md: running: <sdg1><sdf1><sde1><sdd1><sdb1>
Jun  7 18:47:31 Emerald kernel: md: kicking non-fresh sdf1 from array!
Jun  7 18:47:31 Emerald kernel: md: unbind<sdf1>
Jun  7 18:47:31 Emerald kernel: md: export_rdev(sdf1)
Jun  7 18:47:31 Emerald kernel: raid5: automatically using best checksumming function: pIII_sse
Jun  7 18:47:31 Emerald kernel:    pIII_sse  :  4203.000 MB/sec
Jun  7 18:47:31 Emerald kernel: raid5: using function: pIII_sse (4203.000 MB/sec)
Jun  7 18:47:31 Emerald kernel: md: raid5 personality registered for level 5
Jun  7 18:47:31 Emerald kernel: md: raid4 personality registered for level 4
Jun  7 18:47:31 Emerald kernel: raid5: device sdg1 operational as raid disk 4
Jun  7 18:47:31 Emerald kernel: raid5: device sde1 operational as raid disk 2
Jun  7 18:47:31 Emerald kernel: raid5: device sdd1 operational as raid disk 1
Jun  7 18:47:31 Emerald kernel: raid5: device sdb1 operational as raid disk 0
Jun  7 18:47:31 Emerald kernel: raid5: allocated 5248kB for md0
Jun  7 18:47:31 Emerald kernel: raid5: raid level 5 set md0 active with 4 out of 5 devices, algorithm 2
Jun  7 18:47:31 Emerald kernel: RAID5 conf printout:
Jun  7 18:47:31 Emerald kernel:  --- rd:5 wd:4 fd:1
Jun  7 18:47:31 Emerald kernel:  disk 0, o:1, dev:sdb1
Jun  7 18:47:31 Emerald kernel:  disk 1, o:1, dev:sdd1
Jun  7 18:47:31 Emerald kernel:  disk 2, o:1, dev:sde1
Jun  7 18:47:31 Emerald kernel:  disk 4, o:1, dev:sdg1
Jun  7 18:47:31 Emerald kernel: md: ... autorun DONE.

WHAT THE HELL?!??
*considering sdg1* ?!?! then deciding it was fair game to use?!??
it's a FAT16 FS pendrive with NO UUID stuff on it...
suddenly the RAID5 gets very unhappy and becomes a RID5 and I spend the 
next few hours rebuilding it (fortunately all data was preserved but it 
wasn't a pleasant evening I can tell you)

Hum ho,.. I survived the horror but umm, well, I'll leave the above as a 
story to frighten young sysadmins with.

Phil
=--=




* Re: And then there was Bryce...
  2006-06-08  0:41 And then there was Bryce Bryce
@ 2006-06-08  6:38 ` Henrik Holst
  2006-06-08 10:36   ` Bryce
  2006-06-08 15:54   ` H. Peter Anvin
  0 siblings, 2 replies; 7+ messages in thread
From: Henrik Holst @ 2006-06-08  6:38 UTC (permalink / raw)
  To: Bryce; +Cc: linux-raid

Bryce wrote:
>
> Gosh, where to start,..
>
> Ok general setup
>
> I'm using kernel version 2.6.17-rc5 and RAID 5 over five 500GB SATA disks

You have just upgraded to udev, haven't you? :-)

[snip!]

>
> Hum ho,.. I survived the horror but umm, well, I'll leave the above as
> a story to frighten young sysadmins with.

The same happened to me with eth0-2. I _could_ not for the life of me
understand why I couldn't get my internet connection to work. But then I
realized that eth0 and eth1 had been swapped after I upgraded to udev.
Please consult your distribution's udev documentation on how to "lock down"
SCSI and network cards to specific kernel names.

Regards,
Henrik Holst


* Re: And then there was Bryce...
  2006-06-08  6:38 ` Henrik Holst
@ 2006-06-08 10:36   ` Bryce
  2006-06-08 15:59     ` John Stoffel
  2006-06-13 18:38     ` Bill Davidsen
  2006-06-08 15:54   ` H. Peter Anvin
  1 sibling, 2 replies; 7+ messages in thread
From: Bryce @ 2006-06-08 10:36 UTC (permalink / raw)
  To: Henrik Holst; +Cc: linux-raid

Henrik Holst wrote:
> Bryce wrote:
>   
>> Gosh, where to start,..
>>
>> Ok general setup
>>
>> I'm using kernel version 2.6.17-rc5 and RAID 5 over five 500GB SATA disks
>>
>
> You have just upgraded to udev, haven't you? :-)
>
> [snip!]
>
>   
>> Hum ho,.. I survived the horror but umm, well, I'll leave the above as
>> a story to frighten young sysadmins with.
>>     
>
> The same happened to me with eth0-2. I _could_ not for the life of me
> understand why I couldn't get my internet connection to work. But then I
> realized that eth0 and eth1 had been swapped after I upgraded to udev.
> Please consult your distribution's udev documentation on how to "lock down"
> SCSI and network cards to specific kernel names.
>
> Regards,
> Henrik Holst
>   
Ah, yes. udev has helpfully remapped where all my drives were, and of 
course I've misread the log because my brain is so fixated on 
expecting drives to be where they should be.

curse you UDEV!!
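
For anyone retracing the misreading, here is a rough Python sketch (not part of the original mail) that compares the device names md bound at boot with the member names `mdadm -D` reported afterwards. The log lines and member names are copied from the first message in this thread; the function names are made up for illustration.

```python
import re

# Lines copied from the boot syslog earlier in the thread (timestamps trimmed).
BOOT_LOG = """\
md: bind<sdb1>
md: bind<sdd1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdg1>
md: kicking non-fresh sdf1 from array!
md: unbind<sdf1>
"""

# Member names as listed by `mdadm -D /dev/md0` in the first message.
DETAIL_MEMBERS = {"sdb1", "sdc1", "sdd1", "sde1", "sdf1"}


def bound_devices(log_text):
    """Return the set of devices named in 'md: bind<dev>' lines."""
    return set(re.findall(r"md: bind<(\w+)>", log_text))


def kicked_devices(log_text):
    """Return the set of devices rejected as non-fresh."""
    return set(re.findall(r"md: kicking non-fresh (\w+) from array", log_text))


bound = bound_devices(BOOT_LOG)
print("bound at boot:      ", sorted(bound))
print("kicked as non-fresh:", sorted(kicked_devices(BOOT_LOG)))
print("only in boot log:   ", sorted(bound - DETAIL_MEMBERS))
print("only in mdadm -D:   ", sorted(DETAIL_MEMBERS - bound))
```

The two difference sets come out as sdg1 and sdc1, which is consistent with the member names having shifted by one letter at that boot rather than a foreign device joining the array.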

Phil
=--=


Move along, nothing to see here except an overstressed worker...


* Re: And then there was Bryce...
  2006-06-08  6:38 ` Henrik Holst
  2006-06-08 10:36   ` Bryce
@ 2006-06-08 15:54   ` H. Peter Anvin
  1 sibling, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2006-06-08 15:54 UTC (permalink / raw)
  To: linux-raid

Followup to:  <4487C5F6.2080107@idgmail.se>
By author:    Henrik Holst <henrik.holst@idgmail.se>
In newsgroup: linux.dev.raid
> 
> The same happened to me with eth0-2. I _could_ not for the life of me
> understand why I couldn't get my internet connection to work. But then I
> realized that eth0 and eth1 had been swapped after I upgraded to udev.
> Please consult your distribution's udev documentation on how to "lock down"
> SCSI and network cards to specific kernel names.
> 

This doesn't explain how come it bound drives without superblocks.
It should only bind drives with the correct superblock UUID, EVER.

Udev doesn't actually matter here, since the kernel, not udev, assigns
the numbers to the drives.
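
As an illustration of the rule stated above (members are matched by superblock UUID, never by name), here is a deliberately simplified toy model in Python. It is not the kernel's actual code: the `Candidate` type, the `select_members` function, the event counts and the device name "sdX1" are all hypothetical. Only the UUID is taken from the `mdadm -D` output earlier in the thread. The "kicked" rule models the "kicking non-fresh" message as "event counter lower than the freshest member".

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str             # kernel name at this boot, e.g. "sdg1"
    uuid: Optional[str]   # array UUID from the md superblock, None if absent
    events: int           # superblock event counter


def select_members(candidates):
    """Group candidates by UUID; in each group keep only the freshest ones."""
    groups = defaultdict(list)
    for c in candidates:
        if c.uuid is None:
            continue  # no md superblock: never eligible for any array
        groups[c.uuid].append(c)

    result = {}
    for uuid, members in groups.items():
        newest = max(m.events for m in members)
        result[uuid] = {
            "kept": [m.name for m in members if m.events == newest],
            "kicked": [m.name for m in members if m.events < newest],
        }
    return result


UUID = "d8d7cacb:24db29e6:46ace8ec:49547cc4"
CANDIDATES = [
    Candidate("sdb1", UUID, 1000),   # event counts are invented
    Candidate("sdd1", UUID, 1000),
    Candidate("sde1", UUID, 1000),
    Candidate("sdf1", UUID, 990),    # stale member
    Candidate("sdg1", UUID, 1000),
    Candidate("sdX1", None, 0),      # hypothetical device without a superblock
]

print(select_members(CANDIDATES))
```

In this model a device without a superblock can never end up in the array, whatever name it is given at boot.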

	-hpa


* Re: And then there was Bryce...
  2006-06-08 10:36   ` Bryce
@ 2006-06-08 15:59     ` John Stoffel
  2006-06-08 17:01       ` H. Peter Anvin
  2006-06-13 18:38     ` Bill Davidsen
  1 sibling, 1 reply; 7+ messages in thread
From: John Stoffel @ 2006-06-08 15:59 UTC (permalink / raw)
  To: Bryce; +Cc: Henrik Holst, linux-raid

>>>>> "Bryce" == Bryce  <bryce@zeniv.linux.org.uk> writes:

Bryce> Ah, yes. udev has helpfully remapped where all my drives
Bryce> were, and of course I've misread the log because my brain is
Bryce> so fixated on expecting drives to be where they should be.

Bryce> curse you UDEV!!

The problem is more likely that your /etc/mdadm/mdadm.conf file is
specifying exactly which partitions to use, instead of just doing
something like the following:

  DEVICE partitions
  ARRAY /dev/md0 level=raid1 auto=yes num-devices=2 UUID=2e078443:42b63ef5:cc179492:aecf0094

Which should do the trick for you.  Can you post your mdadm.conf file
so we can look it over?
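
The ARRAY line above is from a different array (RAID1, two devices). As a rough sketch of what the equivalent would look like for the array in this thread, the following Python builds the line from the fields of the `mdadm -D /dev/md0` output posted in the first message. The helper names `parse_detail` and `array_line` are made up for illustration; on a live system `mdadm --detail --scan` prints comparable ARRAY lines directly.

```python
# Excerpt of the `mdadm -D /dev/md0` output from the first message.
DETAIL = """\
     Raid Level : raid5
   Raid Devices : 5
           UUID : d8d7cacb:24db29e6:46ace8ec:49547cc4
"""


def parse_detail(text):
    """Turn 'Key : value' lines into a dict, splitting on the first ' : '."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def array_line(device, fields):
    """Format an mdadm.conf ARRAY line that identifies the array by UUID."""
    return "ARRAY {} level={} num-devices={} UUID={}".format(
        device, fields["Raid Level"], fields["Raid Devices"], fields["UUID"]
    )


print("DEVICE partitions")
print(array_line("/dev/md0", parse_detail(DETAIL)))
```

The point of the UUID form is that no sdX name appears anywhere in the configuration, so it does not matter which letter a disk gets at boot.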

John


* Re: And then there was Bryce...
  2006-06-08 15:59     ` John Stoffel
@ 2006-06-08 17:01       ` H. Peter Anvin
  0 siblings, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2006-06-08 17:01 UTC (permalink / raw)
  To: linux-raid

Followup to:  <17544.18790.382198.453845@smtp.charter.net>
By author:    "John Stoffel" <john@stoffel.org>
In newsgroup: linux.dev.raid
> 
> The problem is more likely that your /etc/mdadm/mdadm.conf file is
> specifying exactly which partitions to use, instead of just doing
> something like the following:
> 
>   DEVICE partitions
>   ARRAY /dev/md0 level=raid1 auto=yes num-devices=2 UUID=2e078443:42b63ef5:cc179492:aecf0094
> 
> Which should do the trick for you.  Can you post your mdadm.conf file
> so we can look it over?

Hey guys, look at the syslog output again.  He's using kernel autorun.

	-hpa


* Re: And then there was Bryce...
  2006-06-08 10:36   ` Bryce
  2006-06-08 15:59     ` John Stoffel
@ 2006-06-13 18:38     ` Bill Davidsen
  1 sibling, 0 replies; 7+ messages in thread
From: Bill Davidsen @ 2006-06-13 18:38 UTC (permalink / raw)
  To: Bryce; +Cc: Henrik Holst, linux-raid

Bryce wrote:

> Henrik Holst wrote:
>
>> Bryce wrote:
>>  
>>
>>> Gosh, where to start,..
>>>
>>> Ok general setup
>>>
>>> I'm using kernel version 2.6.17-rc5 and RAID 5 over five 500GB SATA disks
>>>
>>
>>
>> You have just upgraded to udev, haven't you? :-)
>>
>> [snip!]
>>
>>  
>>
>>> Hum ho,.. I survived the horror but umm, well, I'll leave the above as
>>> a story to frighten young sysadmins with.
>>>     
>>
>>
>> The same happened to me with eth0-2. I _could_ not for the life of me
>> understand why I couldn't get my internet connection to work. But then I
>> realized that eth0 and eth1 had been swapped after I upgraded to udev.
>> Please consult your distribution's udev documentation on how to "lock down"
>> SCSI and network cards to specific kernel names.
>>
>> Regards,
>> Henrik Holst
>>   
>
> Ah, yes. udev has helpfully remapped where all my drives were, and of 
> course I've misread the log because my brain is so fixated on 
> expecting drives to be where they should be.
>
> curse you UDEV!! 

If you were using PARTITIONS and letting mdadm assemble the RAID, it 
wouldn't matter. Using names is dangerous even without udev. I have a 
system on (mostly) FC1, using a 2.6.15 kernel. If I boot with a drive 
in the removable bay, it calls that controller (VIA something) hde and 
hdf; if there's no drive, it drops the module for the controller and 
everything else moves up by two.

Using mdadm I haven't been bitten by this in several years.
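
To make the PARTITIONS point concrete: with `DEVICE partitions` in mdadm.conf, mdadm reads /proc/partitions and treats every device listed there as a candidate, then matches members by superblock rather than by name. A minimal Python sketch of that first step follows. The sample text is illustrative rather than taken from the machine in this thread, and `candidate_devices` is a made-up helper name.

```python
# Illustrative /proc/partitions content (not captured from a real system).
SAMPLE = """\
major minor  #blocks  name

   8    16  488386584 sdb
   8    17  488383488 sdb1
   8    96  488386584 sdg
   8    97  488383488 sdg1
"""


def candidate_devices(proc_partitions_text):
    """List /dev paths for every device row in /proc/partitions-style text."""
    names = []
    for line in proc_partitions_text.splitlines():
        parts = line.split()
        # Data rows have four columns and start with a numeric major number;
        # this skips the header row and the blank line after it.
        if len(parts) == 4 and parts[0].isdigit():
            names.append("/dev/" + parts[3])
    return names


print(candidate_devices(SAMPLE))
```

Because the candidate list is rebuilt from whatever the kernel sees at that boot, a drive that moves from one name to another is still found.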

I have similar problems on a RH8.0 system which needs to run the burner 
on ide-scsi: depending on which USB devices are plugged in at boot, names 
are negotiable. At boot time the "right" names are found and symlinks 
created as needed.

Finally, there's a command which allows you to set names of NICs by MAC 
address. Haven't needed it in years, I *think* it's called ifname from 
the iproute2 stuff. That's from memory.

Hope some of this is useful, I over-answered the question.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

