* Software RAID Stopped Working With Aurora Kernel
@ 2002-05-19 23:38 Calvin Webster
2002-05-20 11:26 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: Calvin Webster @ 2002-05-19 23:38 UTC (permalink / raw)
To: linux-raid
Looking for help from list members in resolving software RAID problems.
With the help of Tom "Spot" Callaway, I was able to upgrade my UltraSparc
IIi to Aurora. However, once the new kernel was booted, RAID stopped
working. We have been unable to determine what is causing it since Tom
doesn't have a SCSI software RAID box to duplicate my problem. His IDE RAID
works fine. After trying several things, he suggested that I post to this
list.
My reason for looking to Aurora was to fix the excruciatingly slow
performance of the "Happy Meal" driver on RHL 6.2 (see the sparc-list
thread '"Happy Meal Ethernet": Extremely slow and "MAX Packet size"
errors'). Dave Miller advised me to upgrade to a 2.4 kernel. Aurora
seemed the most logical
choice since it is based upon RedHat Linux. So, I joined the
aurora-sparc-user mailing list and Tom Callaway helped me get the resources
together to upgrade my system. See thread: "[aurora-sparc-user] I really
need a 2.4 kernel" and more in [aurora-sparc-devel]. What we've done so far
on the RAID issue is documented in the thread: "[aurora-sparc-devel] RAID5:
What's the story?"
The RAID was fully functional under RHL 6.2, providing Samba and FTP file
services, albeit extremely slowly. The upgrade to Aurora fixed the slow
Ethernet transfers very well. I'm now getting up to 7 MBytes/sec throughput
on a 100 Mbps half-duplex connection. However, the software RAID seems to be
broken. From the log messages ("no chunksize specified"), it doesn't appear
to be reading the raidtab at all. It also looks as though "md" may be probing
the wrong devices at startup. I can't be sure, though, because when "dmesg"
entries refer to "[dev 00:08]", for example, I don't know whether that means
major:minor or minor:major device numbers.
The system boots from an IDE disk. The RAID disks are in an external SCSI
disk array. The boot IDE disk is an IBM "Death Star" (DeskStar) drive,
making noises like it wants to join the majority of its ill-fated brethren
on the scrap heap. I'd like to get the RAID issues resolved as soon as
possible. Please let me know what I can do to help.
I've included two sections below. The first shows most of the error messages
I'm getting relating to RAID/SCSI. The second gives a brief system profile
containing relevant information. If you need further information, please
let me know.
Thank you for any help or advice you can offer to help resolve this.
Cal Webster
Network Manager
NAWCTSD ISEO CPNC
Email: cwebster@ec.rr.com
###########################
# Begin Error Indications #
###########################
## Console messages after initial boot to Aurora kernel (with md0 devices in
/etc/fstab):
------------------------------------------------------------------
/lib/raid5.o: unresolved symbol md_unregister_thread_R4ba824f9
(about 12 more like this)
Note: modules without a GPL compatible license cannot use GPLONLY_symbols
ERROR:/bin/insmod exited abnormally!
------------------------------------------------------------------
## System dropped to single-user mode. I commented out the lines referring to
the "md0" filesystems in /etc/fstab and renamed /etc/raidtab. Then,
rebooted system.
## The system came up very cleanly. Aside from the RAID issues, the system
is very smooth and lightning fast!
## "lsmod" after clean boot:
------------------------------------------------------------------------
Module Size Used by Tainted: P
sunhme 25272 1
openprom 4964 0 (autoclean)
ide-cd 30832 0 (autoclean)
cdrom 28304 0 (autoclean) [ide-cd]
md 66264 0
xor 2680 0
sym53c8xx 71240 0
------------------------------------------------------------------------
## Then "modprobe raid5"
------------------------------------------------------------------------
Warning: loading /lib/modules/2.4.18-0.92sparc/kernel/drivers/md/raid5.o
will taint the kernel: no license
------------------------------------------------------------------------
## Another "lsmod"
------------------------------------------------------------------------
Module Size Used by Tainted: P
raid5 17536 0 (unused)
sunhme 25272 1
openprom 4964 0 (autoclean)
ide-cd 30832 0 (autoclean)
cdrom 28304 0 (autoclean) [ide-cd]
md 66264 0 [raid5]
xor 2680 0 [raid5]
sym53c8xx 71240 0
------------------------------------------------------------------------
## RAID/SCSI related messages from "dmesg":
It looks like the device names that "md" is using (e.g. [dev 00:08]) are
incorrect. When it finally checks sdb1 it imports it, but later kicks it
out because it thinks the name has changed and considers it faulty.
-----------------------------------------------------------------------------
sym53c8xx: at PCI bus 3, device 0, function 0
sym53c8xx: 53c875 detected
sym53c8xx: at PCI bus 3, device 1, function 0
sym53c8xx: 53c875 detected
sym53c875-0: rev 0x4 on pci bus 3 device 0 function 0 irq 4,7d8
sym53c875-0: ID 7, Fast-20, Parity Checking
sym53c875-1: rev 0x4 on pci bus 3 device 1 function 0 irq 4,7d9
sym53c875-1: ID 7, Fast-20, Parity Checking
scsi0 : sym53c8xx-1.7.3c-20010512
scsi1 : sym53c8xx-1.7.3c-20010512
Vendor: FUJITSU Model: MAA3182S SUN18G Rev: 2107
Type: Direct-Access ANSI SCSI revision: 02
Vendor: FUJITSU Model: MAA3182S SUN18G Rev: 2107
Type: Direct-Access ANSI SCSI revision: 02
Vendor: FUJITSU Model: MAA3182S SUN18G Rev: 2107
Type: Direct-Access ANSI SCSI revision: 02
Vendor: FUJITSU Model: MAA3182S SUN18G Rev: 2107
Type: Direct-Access ANSI SCSI revision: 02
Vendor: FUJITSU Model: MAA3182S SUN18G Rev: 2107
Type: Direct-Access ANSI SCSI revision: 02
Vendor: FUJITSU Model: MAA3182S SUN18G Rev: 2107
Type: Direct-Access ANSI SCSI revision: 02
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi1, channel 0, id 1, lun 0
Attached scsi disk sdc at scsi1, channel 0, id 2, lun 0
Attached scsi disk sdd at scsi1, channel 0, id 3, lun 0
Attached scsi disk sde at scsi1, channel 0, id 4, lun 0
Attached scsi disk sdf at scsi1, channel 0, id 5, lun 0
sym53c875-1-<0,*>: FAST-10 WIDE SCSI 20.0 MB/s (100.0 ns, offset 16)
SCSI device sda: 35378533 512-byte hdwr sectors (18114 MB)
sda: sda1 sda3
sym53c875-1-<1,*>: FAST-10 WIDE SCSI 20.0 MB/s (100.0 ns, offset 16)
SCSI device sdb: 35378533 512-byte hdwr sectors (18114 MB)
sdb: sdb1 sdb3
sym53c875-1-<2,*>: FAST-10 WIDE SCSI 20.0 MB/s (100.0 ns, offset 16)
SCSI device sdc: 35378533 512-byte hdwr sectors (18114 MB)
sdc: sdc1 sdc3
sym53c875-1-<3,*>: FAST-10 WIDE SCSI 20.0 MB/s (100.0 ns, offset 16)
SCSI device sdd: 35378533 512-byte hdwr sectors (18114 MB)
sdd: sdd1 sdd3
sym53c875-1-<4,*>: FAST-10 WIDE SCSI 20.0 MB/s (100.0 ns, offset 16)
SCSI device sde: 35378533 512-byte hdwr sectors (18114 MB)
sde: sde1 sde3
sym53c875-1-<5,*>: FAST-10 WIDE SCSI 20.0 MB/s (100.0 ns, offset 16)
SCSI device sdf: 35378533 512-byte hdwr sectors (18114 MB)
raid5: using function: VIS (113.600 MB/sec)
kmod: failed to exec /sbin/modprobe -s -k block-major-9, errno = 2
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
[events: 00000000]
md: could not lock [dev 00:08], zero-size? Marking faulty.
md: could not import [dev 00:08], trying to run array nevertheless.
[events: 76692d66]
md: invalid raid superblock magic on [dev 01:08]
md: [dev 01:08] has invalid sb, not importing!
md: could not import [dev 01:08], trying to run array nevertheless.
md: could not lock [dev 02:08], zero-size? Marking faulty.
md: could not import [dev 02:08], trying to run array nevertheless.
md: hda8 has zero size, marking faulty!
md: could not import hda8, trying to run array nevertheless.
md: could not lock [dev 04:08], zero-size? Marking faulty.
md: could not import [dev 04:08], trying to run array nevertheless.
md: could not lock [dev 05:08], zero-size? Marking faulty.
md: could not import [dev 05:08], trying to run array nevertheless.
md: autorun ...
md: considering sdb1 ...
md: adding sdb1 ...
md: created md0
md: bind<sdb1,1>
md: running: <sdb1>
md: sdb1's event counter: 00000000
md: device name has changed from [dev 00:08] to sdb1 since last import!
md0: kicking faulty sdb1!
md: unbind<sdb1,0>
md: export_rdev(sdb1)
md0: former device [dev 01:08] is unavailable, removing from array!
md0: removing former faulty [dev 02:08]!
md0: former device hda8 is unavailable, removing from array!
md0: former device [dev 04:08] is unavailable, removing from array!
md0: removing former faulty [dev 05:08]!
no chunksize specified, see 'man raidtab'
md :do_md_run() returned -22
md: md0 stopped.
md: ... autorun DONE.
-----------------------------------------------------------------------------
## Raid related entries from /var/log/messages:
-----------------------------------------------------------------------------
May 7 18:52:06 winggear kernel: raid5: measuring checksumming speed
May 7 18:52:06 winggear kernel: VIS : 113.600 MB/sec
May 7 18:52:06 winggear kernel: raid5: using function: VIS (113.600
MB/sec)
May 7 18:52:06 winggear kernel: kmod: failed to exec /sbin/modprobe -s
-k block-major-9, errno = 2
May 7 18:52:06 winggear kernel: md: md driver 0.90.0 MAX_MD_DEVS=256,
MD_SB_DISKS=27
May 7 18:52:06 winggear kernel: [events: 00000000]
May 7 18:52:06 winggear kernel: md: could not lock [dev 00:08],
zero-size? Marking faulty.
May 7 18:52:06 winggear kernel: md: could not import [dev 00:08],
trying to run array nevertheless.
May 7 18:52:06 winggear kernel: [events: 76692d66]
May 7 18:52:06 winggear kernel: md: invalid raid superblock magic on
[dev 01:08]
May 7 18:52:06 winggear kernel: md: [dev 01:08] has invalid sb, not
importing!
May 7 18:52:06 winggear kernel: md: could not import [dev 01:08],
trying to run array nevertheless.
May 7 18:52:06 winggear kernel: md: could not lock [dev 02:08],
zero-size? Marking faulty.
May 7 18:52:06 winggear kernel: md: could not import [dev 02:08],
trying to run array nevertheless.
May 7 18:52:06 winggear kernel: md: hda8 has zero size, marking faulty!
May 7 18:52:06 winggear kernel: md: could not import hda8, trying to
run array nevertheless.
May 7 18:52:06 winggear kernel: md: could not lock [dev 04:08],
zero-size? Marking faulty.
May 7 18:52:06 winggear kernel: md: could not import [dev 04:08],
trying to run array nevertheless.
May 7 18:52:06 winggear kernel: md: could not lock [dev 05:08],
zero-size? Marking faulty.
May 7 18:52:06 winggear kernel: md: could not import [dev 05:08],
trying to run
array nevertheless.
May 7 18:52:06 winggear kernel: md: autorun ...
May 7 18:52:06 winggear kernel: md: considering sdb1 ...
May 7 18:52:06 winggear kernel: md: adding sdb1 ...
May 7 18:52:06 winggear kernel: md: created md0
May 7 18:52:06 winggear kernel: md: bind<sdb1,1>
May 7 18:52:06 winggear kernel: md: running: <sdb1>
May 7 18:52:06 winggear kernel: md: sdb1's event counter: 00000000
May 7 18:52:06 winggear kernel: md: device name has changed from [dev
00:08] to sdb1 since last import!
May 7 18:52:06 winggear kernel: md0: kicking faulty sdb1!
May 7 18:52:06 winggear kernel: md: unbind<sdb1,0>
May 7 18:52:06 winggear kernel: md: export_rdev(sdb1)
May 7 18:52:06 winggear kernel: md0: former device [dev 01:08] is
unavailable, removing from array!
May 7 18:52:06 winggear kernel: md0: removing former faulty [dev
02:08]!
May 7 18:52:06 winggear kernel: md0: former device hda8 is unavailable,
removing from array!
May 7 18:52:06 winggear kernel: md0: former device [dev 04:08] is
unavailable, removing from array!
May 7 18:52:06 winggear kernel: md0: removing former faulty [dev
05:08]!
May 7 18:52:06 winggear kernel: no chunksize specified, see 'man
raidtab'
May 7 18:52:06 winggear kernel: md :do_md_run() returned -22
May 7 18:52:06 winggear kernel: md: md0 stopped.
May 7 18:52:06 winggear kernel: md: ... autorun DONE.
-----------------------------------------------------------------------------
#########################
# End Error Indications #
#########################
########################
# Begin System Profile #
########################
A brief system profile:
cpu : TI UltraSparc IIi
fpu : UltraSparc IIi integrated FPU
promlib : Version 3 Revision 14
prom : 3.14.0
type : sun4u
ncpus probed : 1
ncpus active : 1
Cpu0Bogo : 599.65
Cpu0ClkTck : 0000000011e1a3cb
MMU Type : Spitfire
## Contents of /etc/silo.conf
-------------------------------------------------
partition=5
timeout=50
root=/dev/hda5
read-only
default=linux
image=/boot/vmlinuz-2.4.18-0.92sparc
label=linux
initrd=/boot/initrd-2.4.18-0.92sparc.img
image=/boot/vmlinuz-2.4.18-0.91sparc
label=linux.aurora
initrd=/boot/initrd-2.4.18-0.91sparc.img
image=/boot/vmlinuz-2.2.19-6.2.16
label=linux.old
initrd=/boot/initrd-2.2.19-6.2.16.img
image=/boot/vmlinuz-2.2.19-6.2.12
label=linux.old2
initrd=/boot/initrd-2.2.19-6.2.12.img
-------------------------------------------------
## Listing of RAID devices:
------------------------------------------------------------
brw-rw---- 1 root disk 8, 1 May 5 1998 sda1
brw-rw---- 1 root disk 8, 17 May 5 1998 sdb1
brw-rw---- 1 root disk 8, 33 May 5 1998 sdc1
brw-rw---- 1 root disk 8, 49 May 5 1998 sdd1
brw-rw---- 1 root disk 8, 65 Apr 16 1999 sde1
brw-rw---- 1 root disk 8, 81 Apr 16 1999 sdf1
------------------------------------------------------------
## Contents of /etc/raidtab:
-------------------------------------------------
#
# 'persistent' RAID5 setup, with one spare disk:
#
raiddev /dev/md0
raid-level 5
nr-raid-disks 5
nr-spare-disks 1
persistent-superblock 1
chunk-size 128
device /dev/sdb1
raid-disk 1
device /dev/sdc1
raid-disk 2
device /dev/sdd1
raid-disk 3
device /dev/sde1
raid-disk 4
device /dev/sda1
raid-disk 0
device /dev/sdf1
spare-disk 0
-------------------------------------------------
## Partition Table of Boot IDE Disk:
----------------------------------------------------------------------
Disk /dev/hda (Sun disk label): 15 heads, 63 sectors, 42526 cylinders
Units = cylinders of 945 * 512 bytes
Device Flag Start End Blocks Id System
/dev/hda1 0 19488 9208048+ 83 Linux native
/dev/hda2 19488 38977 9208552+ 83 Linux native
/dev/hda3 0 42526 20093535 5 Whole disk
/dev/hda4 u 38977 39532 262237+ 83 Linux native
/dev/hda5 39532 40087 262237+ 83 Linux native
/dev/hda6 40087 40642 262237+ 82 Linux swap
----------------------------------------------------------------------
## Partition Tables on RAID disks:
----------------------------------------------------------------------
Disk /dev/sda (Sun disk label): 19 heads, 248 sectors, 7506 cylinders
Units = cylinders of 4712 * 512 bytes
Device Flag Start End Blocks Id System
/dev/sda1 1 7506 17681780 fd Linux raid autodetect
/dev/sda3 0 7506 17684136 5 Whole disk
----------------------------------------------------------------------
Disk /dev/sdb (Sun disk label): 19 heads, 248 sectors, 7506 cylinders
Units = cylinders of 4712 * 512 bytes
Device Flag Start End Blocks Id System
/dev/sdb1 1 7506 17681780 fd Linux raid autodetect
/dev/sdb3 0 7506 17684136 5 Whole disk
----------------------------------------------------------------------
Disk /dev/sdc (Sun disk label): 19 heads, 248 sectors, 7506 cylinders
Units = cylinders of 4712 * 512 bytes
Device Flag Start End Blocks Id System
/dev/sdc1 1 7506 17681780 fd Linux raid autodetect
/dev/sdc3 0 7506 17684136 5 Whole disk
----------------------------------------------------------------------
Disk /dev/sdd (Sun disk label): 19 heads, 248 sectors, 7506 cylinders
Units = cylinders of 4712 * 512 bytes
Device Flag Start End Blocks Id System
/dev/sdd1 1 7506 17681780 fd Linux raid autodetect
/dev/sdd3 0 7506 17684136 5 Whole disk
----------------------------------------------------------------------
Disk /dev/sde (Sun disk label): 19 heads, 248 sectors, 7506 cylinders
Units = cylinders of 4712 * 512 bytes
Device Flag Start End Blocks Id System
/dev/sde1 1 7506 17681780 fd Linux raid autodetect
/dev/sde3 0 7506 17684136 5 Whole disk
----------------------------------------------------------------------
Disk /dev/sdf (Sun disk label): 19 heads, 248 sectors, 7506 cylinders
Units = cylinders of 4712 * 512 bytes
Device Flag Start End Blocks Id System
/dev/sdf1 1 7506 17681780 fd Linux raid autodetect
/dev/sdf3 0 7506 17684136 5 Whole disk
----------------------------------------------------------------------
Upgraded the following packages from Aurora Scratch tree:
initscripts-6.67-3sparc.sparc.rpm
raidtools-1.00.2-1.3.sparc.rpm
kernel-source-2.4.18-0.92sparc.sparc.rpm
kernel-doc-2.4.18-0.92sparc.sparc.rpm
Installed kernel-2.4.18-0.92sparc.sparc64.rpm
######################
# End System Profile #
######################
* Re: Software RAID Stopped Working With Aurora Kernel
2002-05-19 23:38 Software RAID Stopped Working With Aurora Kernel Calvin Webster
@ 2002-05-20 11:26 ` Neil Brown
2002-05-20 19:25 ` Cal Webster
2002-05-21 11:24 ` Calvin D. Webster
0 siblings, 2 replies; 15+ messages in thread
From: Neil Brown @ 2002-05-20 11:26 UTC (permalink / raw)
To: cwebster; +Cc: linux-raid
On Sunday May 19, cwebster@ec.rr.com wrote:
> Looking for help from list members in resolving software RAID problems.
Here is the help:
1/ You seem to have two copies of raid5.o, which appears to be confusing
modprobe.
I don't know why, but in one instance it tried to load
/lib/raid5.o
and gets: unresolved symbol md_unregister_thread_R4ba824f9
The other time it correctly loads
/lib/modules/2.4.18-0.92sparc/kernel/drivers/md/raid5.o
I suggest removing /lib/raid5.o
2/ 2.4 raid is *not* upwards compatible with patched 2.2 raid on Sparc64
or any architecture that requires a 64bit integer to be 64bit
aligned instead of just 32bit aligned.
I'm sorry, but this was a bug in the raid patches and it is not
practical to reliably detect raid devices that were built
badly because of this bug. The bug results in the on-disc
superblock being laid out wrongly.
You will need to remake the array for 2.4.
If you remake the array correctly, you will not lose any
data.
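The layout skew described above can be reproduced with a toy struct (purely illustrative; the field names are invented and the real md superblock has many more members). A 32-bit field followed by a 64-bit field packs differently depending on whether 64-bit members get 4-byte or 8-byte alignment, so everything after the 64-bit field lands at a different on-disk offset:

```c
#include <assert.h>
#include <stdio.h>
#include <stdint.h>

/* 2.2-sparc-patch style: 64-bit members effectively 32-bit aligned,
 * so no padding is inserted after the 32-bit field. */
struct sb_22 {
    uint32_t state;
    uint64_t ctime;
} __attribute__((packed, aligned(4)));

/* 2.4 / sparc64 style: the 64-bit member is 8-byte aligned, which
 * inserts 4 bytes of padding and shifts every later field. */
struct sb_24 {
    uint32_t state;
    uint64_t ctime __attribute__((aligned(8)));
};

int main(void)
{
    printf("2.2 layout: %zu bytes, 2.4 layout: %zu bytes\n",
           sizeof(struct sb_22), sizeof(struct sb_24));
    assert(sizeof(struct sb_22) == 12);
    assert(sizeof(struct sb_24) == 16);
    return 0;
}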
Alternately, if you are feeling adventurous, you could download
and compile mdadm 1.0.1 from
http://www.cse.unsw.edu.au/~neilb/source/mdadm/
For each device that should be part of the array (e.g. /dev/sda1)
run
mdadm --examine /dev/sda1
This should show you some sensible information up to the Checksum,
and then bad information after that.
Then run
mdadm --examine --sparc2.2 /dev/sda1
This should show you the same sensible information, and then
sensible information after the Checksum as well.
If it does, then run
mdadm --examine --sparc2.2update /dev/sda1
and this should show the same sensible information, and then update
the superblock on the device to have this information too.
After this,
mdadm --examine --sparc2.2 /dev/sda1
should show the correct information.
If this works, you can then run the same command on the other
devices, and then assemble the array with
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 ...
or whatever your actual device names are.
Note: I have not tested this code, but it looks OK.
If you try it and something goes wrong, then give me as much
information as you can and I will do my best to make sure you get
your data back.
Note2: --sparc2.2 and --sparc2.2update are not yet documented,
except in this mail item.
If you confirm that they work, I will add some documentation.
NeilBrown
* RE: Software RAID Stopped Working With Aurora Kernel
2002-05-20 11:26 ` Neil Brown
@ 2002-05-20 19:25 ` Cal Webster
2002-05-20 21:09 ` Neil Brown
2002-05-21 11:24 ` Calvin D. Webster
1 sibling, 1 reply; 15+ messages in thread
From: Cal Webster @ 2002-05-20 19:25 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Thank you for the excellent information and suggestions. I would like to try
using mdadm, but had some trouble compiling it. Also, please see notes on
raid5 modules below.
> -----Original Message-----
> From: Neil Brown [mailto:neilb@notabene]On Behalf Of Neil Brown
> Sent: Monday, May 20, 2002 5:27 AM
> To: cwebster@ec.rr.com
> Cc: linux-raid
> Subject: Re: Software RAID Stopped Working With Aurora Kernel
>
>
> On Sunday May 19, cwebster@ec.rr.com wrote:
> > Looking for help from list members in resolving software RAID problems.
>
> Here is the help:
>
> 1/ You seem to have two copies of raid5.o which seems to be confusing
> modprobe.
> I don't know why, but in one instance it tried to load
> /lib/raid5.o
> and gets: unresolved symbol md_unregister_thread_R4ba824f9
>
> The other time it correctly loads
> /lib/modules/2.4.18-0.92sparc/kernel/drivers/md/raid5.o
>
> I suggest removing /lib/raid5.o
Strangely, the only time this error occurred was when I tried to boot with
"md0" devices listed in /etc/fstab. I don't know why it reported this path.
There are no modules directly under /lib. Here's where all the modules are.
I will remove all but the current kernel modules. In fact, I'll be removing
all the 2.2 kernels since I can't boot them any more anyway.
[root@winggear redhat]# find /lib -name raid5.o
/lib/modules/2.2.14-5.0/block/raid5.o
/lib/modules/2.2.19-6.2.1/block/raid5.o
/lib/modules/2.2.19-6.2.12/block/raid5.o
/lib/modules/2.2.19-6.2.16/block/raid5.o
/lib/modules/2.4.18-0.91sparc/kernel/drivers/md/raid5.o
/lib/modules/2.4.18-0.92sparc/kernel/drivers/md/raid5.o
> Alternately, if you are feeling adventurous, you could download
> and compile mdadm 1.0.1 from
> http://www.cse.unsw.edu.au/~neilb/source/mdadm/
>
Using your source RPM:
[root@winggear redhat]# rpm -bc --target sparc-redhat-linux SPECS/mdadm.spec
Building target platforms: sparc-redhat-linux
Building for target sparc-redhat-linux
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.90707
+ umask 022
+ cd /usr/src/redhat/BUILD
+ cd /usr/src/redhat/BUILD
+ rm -rf mdadm-1.0.1
+ /bin/gzip -dc /usr/src/redhat/SOURCES/mdadm-1.0.1.tgz
+ tar -xf -
gzip: /usr/src/redhat/SOURCES/mdadm-1.0.1.tgz: decompression OK, trailing garbage ignored
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd mdadm-1.0.1
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chown -Rhf root .
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chgrp -Rhf root .
+ /bin/chmod -Rf a+rX,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.90717
+ umask 022
+ cd /usr/src/redhat/BUILD
+ cd mdadm-1.0.1
+ make 'CXFLAGS=-O2 -m32 -mtune=ultrasparc' SYSCONFDIR=/etc
gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -O2 -m32 -mtune=ultrasparc -c -o mdadm.o mdadm.c
cc1: warnings being treated as errors
In file included from mdadm.h:57,
from mdadm.c:30:
md_p.h: In function `md_event':
md_p.h:168: warning: left shift count >= width of type
make: *** [mdadm.o] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.90717 (%build)
RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.90717 (%build)
============= Compilers and Libraries ================
gcc-2.96-101
gcc-sparc32-2.96-101
gcc-c++-2.96-101
gcc-g77-2.96-101
gcc3-g77-3.0.1-3
gcc-c++-sparc32-2.96-101
gcc3-3.0.1-3
libgcc-3.0.1-3
gcc3-c++-3.0.1-3
gcc-objc-2.96-101
gcc-chill-2.96-101
glib-1.2.10-5
glibc-2.2.4-19
glib-devel-1.2.10-5
compat-glibc-6.2-2.1.3.2
glibc-profile-2.2.4-19
glib10-1.0.6-10
glibc-devel-2.2.4-19
glibc-common-2.2.4-19
* RE: Software RAID Stopped Working With Aurora Kernel
2002-05-20 19:25 ` Cal Webster
@ 2002-05-20 21:09 ` Neil Brown
2002-05-20 21:33 ` Calvin D. Webster
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2002-05-20 21:09 UTC (permalink / raw)
To: cwebster; +Cc: linux-raid
On Monday May 20, kc130iseo@coastalnet.com wrote:
> > I don't know why, but in one instance it tried to load
> > /lib/raid5.o
> > and gets: unresolved symbol md_unregister_thread_R4ba824f9
> >
> > The other time it correctly loads
> > /lib/modules/2.4.18-0.92sparc/kernel/drivers/md/raid5.o
> >
> > I suggest removing /lib/raid5.o
>
> Strangely, the only time this error occurred was when I tried to boot with
> "md0" devices listed in /etc/fstab. I don't know why it reported this path.
> There are no modules directly under /lib. Here's where all the modules are.
> I will remove all but the current kernel modules. In fact, I'll be removing
> all the 2.2 kernels since I can't boot them any more anyway.
Odd.... you could do an "nm" of each raid5.o and see which one
mentions
md_unregister_thread_R4ba824f9
>
> Using your source RPM:
...
> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -O2 -m32 -mtune=ultrasparc -c -o mdadm.o mdadm.c
> cc1: warnings being treated as errors
> In file included from mdadm.h:57,
> from mdadm.c:30:
> md_p.h: In function `md_event':
> md_p.h:168: warning: left shift count >= width of type
> make: *** [mdadm.o] Error 1
> error: Bad exit status from /var/tmp/rpm-tmp.90717 (%build)
Even odder. It seems to think that __u64 is only 32 bits wide.
Sounds like a compiler error, or an include file error.
I suspect __u64 is defined in /usr/include/asm/types.h to be
typedef unsigned long __u64;
where it must need to be "unsigned long long", but I don't know enough
about the different flavours of sparc or the different compilers to
figure out the most likely problem.
Maybe just put
#define __u64 unsigned long long
in mdadm.h and see if that helps.
NeilBrown
* Re: Software RAID Stopped Working With Aurora Kernel
2002-05-20 21:09 ` Neil Brown
@ 2002-05-20 21:33 ` Calvin D. Webster
0 siblings, 0 replies; 15+ messages in thread
From: Calvin D. Webster @ 2002-05-20 21:33 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
>
> On Monday May 20, kc130iseo@coastalnet.com wrote:
> > > I don't know why, but in one instance it tried to load
> > > /lib/raid5.o
> > > and gets: unresolved symbol md_unregister_thread_R4ba824f9
> > >
> > > The other time it correctly loads
> > > /lib/modules/2.4.18-0.92sparc/kernel/drivers/md/raid5.o
> > >
> > > I suggest removing /lib/raid5.o
> >
> > Strangely, the only time this error occurred was when I tried to boot with
> > "md0" devices listed in /etc/fstab. I don't know why it reported this path.
> > There are no modules directly under /lib. Here's where all the modules are.
> > I will remove all but the current kernel modules. In fact, I'll be removing
> > all the 2.2 kernels since I can't boot them any more anyway.
>
> Odd.... you could do an "nm" of each raid5.o and see which one
> mentions
> md_unregister_thread_R4ba824f9
Both of the 2.4 raid5.o modules show this. None of the 2.2 modules do.
> > Using your source RPM:
> ...
> > gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -O2 -m32 -mtune=ultrasparc -c -o mdadm.o mdadm.c
> > cc1: warnings being treated as errors
> > In file included from mdadm.h:57,
> > from mdadm.c:30:
> > md_p.h: In function `md_event':
> > md_p.h:168: warning: left shift count >= width of type
> > make: *** [mdadm.o] Error 1
> > error: Bad exit status from /var/tmp/rpm-tmp.90717 (%build)
>
> Even odder. It seems to think that __u64 is only 32bits wide.
> Sounds like a compiler error, or an include file error.
> I suspect __u64 is defined in /usr/include/asm/types.h to be
> typedef unsigned long __u64;
>
This is probably my fault. I've had to rearrange the "asm" links in the
include directories to get some packages to build while upgrading to
Aurora. The RHL 6.2 "include/asm*" links and directories were different.
I think I lost track of which were supposed to be where in the 2.4
kernel.
Tom "Spot" Callaway was good enough to build mdadm for me on his machine
and upload it to the Aurora site. So, I've got a good package to use
until this gets fixed.
Here's what I've got now. What do they need to look like? I know, it's a
mess.
[root@winggear rpms]# ls -l /usr/include/ | grep asm
lrwxrwxrwx 1 root root 34 May 20 14:09 asm ->
/usr/src/linux/include/asm-sparc64
lrwxrwxrwx 1 root root 34 May 11 16:54 asm-generic ->
/usr/src/linux/include/asm-generic
drwxr-xr-x 2 root root 4096 May 20 14:09 asm.sav
lrwxrwxrwx 1 root root 32 May 11 16:56 asm-sparc ->
/usr/src/linux/include/asm-sparc
lrwxrwxrwx 1 root root 34 May 11 16:56 asm-sparc64 ->
/usr/src/linux/include/asm-sparc64
[root@winggear rpms]# ls -l /usr/include/asm.sav/
total 0
lrwxrwxrwx 1 root root 34 May 11 17:01 asm-generic ->
/usr/src/linux/include/asm-generic
lrwxrwxrwx 1 root root 32 Apr 5 17:38 asm-sparc ->
/usr/src/linux/include/asm-sparc
lrwxrwxrwx 1 root root 34 Apr 5 17:39 asm-sparc64 ->
/usr/src/linux/include/asm-sparc64
[root@winggear rpms]# ls -l /usr/src/ | grep linux
lrwxrwxrwx 1 root root 22 May 11 14:36 linux ->
linux-2.4.18-0.92sparc
drwxr-xr-x 5 root root 4096 May 6 02:47 linux-2.2.19
lrwxrwxrwx 1 root root 22 May 7 18:41 linux-2.4 ->
linux-2.4.18-0.92sparc
drwxr-xr-x 15 root root 4096 May 15 12:57
linux-2.4.18-0.92sparc
[root@winggear rpms]# ls -l /usr/src/linux-2.4.18-0.92sparc/include/ |
grep asm
lrwxrwxrwx 1 root root 11 May 15 12:25 asm ->
asm-sparc64
drwxr-xr-x 2 root root 4096 May 7 18:41 asm-generic
drwxr-xr-x 2 root root 4096 May 7 18:41 asm-sparc
drwxr-xr-x 2 root root 4096 May 11 14:12 asm-sparc64
/usr/src/linux-2.4.18-0.92sparc/include/asm-generic
/usr/src/linux-2.4.18-0.92sparc/include/asm
/usr/src/linux-2.4.18-0.92sparc/include/asm-sparc
/usr/src/linux-2.4.18-0.92sparc/include/asm-sparc64
Thanks!
I'm working on the RAID now, using your mdadm tool.
--Cal Webster
* Re: Software RAID Stopped Working With Aurora Kernel
2002-05-20 11:26 ` Neil Brown
2002-05-20 19:25 ` Cal Webster
@ 2002-05-21 11:24 ` Calvin D. Webster
2002-05-21 12:10 ` Neil Brown
1 sibling, 1 reply; 15+ messages in thread
From: Calvin D. Webster @ 2002-05-21 11:24 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
>
> On Sunday May 19, cwebster@ec.rr.com wrote:
> > Looking for help from list members in resolving software RAID problems.
>
> Here is the help:
> For each device that should be part of the array (e.g. /dev/sda1)
> run
> mdadm --examine /dev/sda1
> This should show you some sensible information up to the Checksum,
> and then bad information after that.
Yup, just like you said... sensible before, nonsense after.
> Then run
> mdadm --examine --sparc2.2 /dev/sda1
> This should show you the same sensible information, and then
> sensible information after the Checksum as well.
This looks much better, but not quite right. I would expect that the list
following "Chunk Size" would be close to the /etc/raidtab that was being
used prior to the Aurora upgrade when it was last working. At any rate,
the "this" device and the last device listed are definitely not correct.
I'm not sure what "this" should be, but I assume it should list
parameters for the device being queried.
What I'm getting with "mdadm --examine --sparc2.2":
The first device listed ("this") always has a minor number of "8". Major
numbers for the "this" device on each raid drive, a through f, are
4,5,0,2,3,1, respectively. I don't know if this is random or somehow
correlates to the original raid device sequence. None of these is an
actual raid device, though.
The last device listed is always:
5 0 5 8 33 faulty sync
The remaining devices in the middle of each list, although out of
sequence in terms of /etc/raidtab, have the correct major/minor device
numbers. To be honest, though, I don't remember if the device list in
/proc/mdstat prior to the Aurora upgrade actually matched /etc/raidtab
in terms of raid device numbers either. If I remember correctly, they
seemed to be reversed but it worked so I left it alone. I do distinctly
remember that, prior to the Aurora upgrade, all raid devices were
operational and there was no reconstruction in progress. The system was
virtually quiescent.
How should the spare drive show up in these lists? /dev/sdc1 is missing
from all the lists. Could this have been the actual spare?
I've included below the complete output from "mdadm --examine
--sparc2.2" for each raid device. First, I've listed the contents of
/etc/raidtab and a long listing of the actual raid devices. Please let
me know if this makes any sense to you.
Thanks for the help!
--Cal Webster
==============================================================================================
## Contents of /etc/raidtab
------------------------------------------------
#
# 'persistent' RAID5 setup, with one spare disk:
#
raiddev /dev/md0
raid-level 5
nr-raid-disks 5
nr-spare-disks 1
persistent-superblock 1
chunk-size 128
device /dev/sdb1
raid-disk 1
device /dev/sdc1
raid-disk 2
device /dev/sdd1
raid-disk 3
device /dev/sde1
raid-disk 4
device /dev/sda1
raid-disk 0
device /dev/sdf1
spare-disk 0
------------------------------------------------
## Raid devices:
----------------------------------------------------------------------
brw-rw---- 1 root disk 8, 1 May 5 1998 sda1
brw-rw---- 1 root disk 8, 17 May 5 1998 sdb1
brw-rw---- 1 root disk 8, 33 May 5 1998 sdc1
brw-rw---- 1 root disk 8, 49 May 5 1998 sdd1
brw-rw---- 1 root disk 8, 65 Apr 16 1999 sde1
brw-rw---- 1 root disk 8, 81 Apr 16 1999 sdf1
----------------------------------------------------------------------
----------------[ mdadm --examine --sparc2.2 /dev/sda1 ]----------------
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abae - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 4 8 1
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 0 5 8 33 faulty sync
----------------[ mdadm --examine --sparc2.2 /dev/sdb1 ]----------------
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abbb - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 0 8 17 faulty
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 0 5 8 33 faulty sync
----------------[ mdadm --examine --sparc2.2 /dev/sdc1 ]----------------
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abce - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 5 8 33 faulty sync
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 0 5 8 33 faulty sync
----------------[ mdadm --examine --sparc2.2 /dev/sdd1 ]----------------
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abdf - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 2 8 49 faulty active /dev/fd0h1200
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 0 5 8 33 faulty sync
----------------[ mdadm --examine --sparc2.2 /dev/sde1 ]----------------
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abf1 - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 3 8 65 sync /dev/hda8
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 0 5 8 33 faulty sync
----------------[ mdadm --examine --sparc2.2 /dev/sdf1 ]----------------
/dev/sdf1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abfd - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 1 8 81 active /dev/ram8
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 0 5 8 33 faulty sync
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Software RAID Stopped Working With Aurora Kernel
2002-05-21 11:24 ` Calvin D. Webster
@ 2002-05-21 12:10 ` Neil Brown
2002-05-21 17:30 ` Cal Webster
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2002-05-21 12:10 UTC (permalink / raw)
To: cwebster; +Cc: linux-raid
On Tuesday May 21, cwebster@ec.rr.com wrote:
>
> This looks much better, but not just right. I would expect that the list
> following "Chunk Size" would be close to the /etc/raidtab that was being
> used prior to the Aurora upgrade when it was last working. At any rate,
> the "this" device and the last device listed are definitely not correct.
> I'm not sure what "this" should be, but I assume it should list
> parameters for the device being queried.
Thanks for the detail. I see my mistake.
Please apply this patch and try again. (memcpy counts in bytes, even
when I give it pointers to __u32's, of course!)
NeilBrown
--- Examine.c 2002/05/21 12:07:44 1.1
+++ Examine.c 2002/05/21 12:08:01
@@ -170,7 +170,7 @@
__u32 *sb32 = (__u32*)&super;
memcpy(sb32+MD_SB_GENERIC_CONSTANT_WORDS+7,
sb32+MD_SB_GENERIC_CONSTANT_WORDS+7+1,
- MD_SB_WORDS - (MD_SB_GENERIC_CONSTANT_WORDS+7+1));
+ (MD_SB_WORDS - (MD_SB_GENERIC_CONSTANT_WORDS+7+1))*4);
printf (" --- adjusting superblock for 2.2/sparc compatability ---\n");
}
printf(" Events : %d.%d\n", super.events_hi, super.events_lo);
* RE: Software RAID Stopped Working With Aurora Kernel
2002-05-21 12:10 ` Neil Brown
@ 2002-05-21 17:30 ` Cal Webster
2002-05-21 20:46 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: Cal Webster @ 2002-05-21 17:30 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org]On Behalf Of Neil Brown
> Sent: Tuesday, May 21, 2002 6:10 AM
> To: cwebster@ec.rr.com
> Cc: linux-raid
> Subject: Re: Software RAID Stopped Working With Aurora Kernel
>
Before I could react, Tom Callaway had applied the patch and placed a
rebuilt binary rpm on the Aurora site.
This looks much more normal. The device sequence, by raid device
number, looks right now. The reason it doesn't match /etc/raidtab exactly
is that I conducted failure/fail-over tests after placing this RAID in
service. The tests involved rendering one, then two, of the devices
inoperable (/dev/sdc1) and verifying that the raid would pick up the
spare (/dev/sdf1) on the first failure and run in degraded mode following
the second. After bringing the simulated failures back on-line, I didn't
bother to re-order the devices.
I've again appended the output from "mdadm --examine --sparc2.2" for each
device to the end of this message for your reference.
Just a couple of questions before I proceed:
1. Is there anything I need to change in "/etc/raidtab"?
- Should it match the actual configuration?
- Differences between raidtools 0.90-6 and 1.00.2-1.3
2. Will the "Version" number in the superblock automatically be updated when
I start the RAID?
Thank you very much for seeing this through with me!
--Cal Webster
P.S. You need not cc: my email address. I'm a list member, so cc'ing me
just produces duplicate messages.
==================
Things to do next:
==================
## Update the superblock:
mdadm --examine --sparc2.2update /dev/sda1
## Re-check after updating:
mdadm --examine --sparc2.2 /dev/sda1
## Assemble the array (use raid device order):
## Only assemble active devices, not spare.
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdf1 /dev/sdd1 /dev/sde1
## Start the RAID
raidstart /dev/md0
## Verify that raid is running
cat /proc/mdstat
## Verify that data is intact
=================================================================
--------[ mdadm --examine --sparc2.2 /dev/sda1 ]--------
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abae - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 4 8 1 0 active sync /dev/sda1
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 5 8 33 5 /dev/sdc1
--------[ mdadm --examine --sparc2.2 /dev/sdb1 ]--------
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abbb - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 8 17 1 active sync /dev/sdb1
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 5 8 33 5 /dev/sdc1
--------[ mdadm --examine --sparc2.2 /dev/sdc1 ]--------
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abce - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 5 8 33 5 /dev/sdc1
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 5 8 33 5 /dev/sdc1
--------[ mdadm --examine --sparc2.2 /dev/sdd1 ]--------
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abdf - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 2 8 49 3 active sync /dev/sdd1
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 5 8 33 5 /dev/sdc1
--------[ mdadm --examine --sparc2.2 /dev/sde1 ]--------
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abf1 - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 3 8 65 4 active sync /dev/sde1
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 5 8 33 5 /dev/sdc1
--------[ mdadm --examine --sparc2.2 /dev/sdf1 ]--------
/dev/sdf1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 0
Update Time : Mon May 6 18:41:41 2002
State : clean, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Checksum : 6515abfd - correct
--- adjusting superblock for 2.2/sparc compatability ---
Events : 0.171
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 1 8 81 2 active sync /dev/sdf1
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
5 5 8 33 5 /dev/sdc1
=================================================================
* RE: Software RAID Stopped Working With Aurora Kernel
2002-05-21 17:30 ` Cal Webster
@ 2002-05-21 20:46 ` Neil Brown
2002-05-21 22:12 ` Calvin D. Webster
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2002-05-21 20:46 UTC (permalink / raw)
To: linux-raid
On Tuesday May 21, kc130iseo@coastalnet.com wrote:
>
> 1. Is there anything I need to change in "/etc/raidtab"?
> - Should it match the actual configuration?
raidtab is only really used by mkraid.
raidstart uses it to find one drive in the array, and ignores the rest
of raidtab. Given that one drive, it reads the superblock to find
other drives. You shouldn't need to change raidtab.
> - Differences between raidtools 0.90-6 and 1.00.2-1.3
Don't know. Not much I think.
>
> 2. Will the "Version" number in the superblock automatically be updated when
> I start the RAID?
No, the version of the superblock is still 0.90.0. This number
describes the format.
> ==================
> Things to do next:
> ==================
>
> ## Update the superblock:
>
> mdadm --examine --sparc2.2update /dev/sda1
>
> ## Re-check after updating:
>
> mdadm --examine --sparc2.2 /dev/sda1
This should show (different) garbage.
After you have updated, your superblock no longer has the
sparc-on-2.2 breakage, and
mdadm --examine /dev/sda1
should show the correct superblock.
>
> ## Assemble the array (use raid device order):
> ## Only assemble active devices, not spare.
>
> mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdf1 /dev/sdd1 /dev/sde1
>
> ## Start the RAID
>
> raidstart /dev/md0
No, you don't need to do this. "mdadm --assemble" assembles the raid
and starts it.
>
> ## Verify that raid is running
>
> cat /proc/mdstat
>
> ## Verify that data is intact
>
NeilBrown
* Re: Software RAID Stopped Working With Aurora Kernel
2002-05-21 20:46 ` Neil Brown
@ 2002-05-21 22:12 ` Calvin D. Webster
2002-05-21 23:51 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: Calvin D. Webster @ 2002-05-21 22:12 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
>
> On Tuesday May 21, kc130iseo@coastalnet.com wrote:
> >
> > mdadm --examine --sparc2.2update /dev/sda1
> >
> mdadm --examine /dev/sda1
> should show the correct superblock.
Very Slick! All went well.
> > ## Assemble the array (use raid device order):
> > ## Only assemble active devices, not spare.
> >
> > mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdf1 /dev/sdd1 /dev/sde1
mdadm: /dev/md0 has been started with 5 drives.
> > ## Verify that raid is running
> >
> > cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sde1[3] sdd1[2] sdf1[1] sdb1[0] sda1[4]
70726656 blocks level 5, 128k chunk, algorithm 0 [5/5] [UUUUU]
unused devices: <none>
## There's something fishy here.^
> > ## Verify that data is intact
## Excerpt from /etc/fstab
----------------------------------------------------------------
/dev/md0 /usr/local/archive ext2 defaults 1 2
/usr/local/archive/shares/redhat/redhat-7.2/enigma-i386-disc1.iso /home/ftp/pub/redhat/redhat-7.2/disc1 iso9660 defaults,loop,ro 0 0
/usr/local/archive/shares/redhat/redhat-7.2/enigma-i386-disc2.iso /home/ftp/pub/redhat/redhat-7.2/disc2 iso9660 defaults,loop,ro 0 0
----------------------------------------------------------------
## Mount RAID filesystems:
----------------------------------------------------------------
[root@winggear root]# mount /usr/local/archive/
[root@winggear root]# mount /home/ftp/pub/redhat/redhat-7.2/disc1
[root@winggear root]# mount /home/ftp/pub/redhat/redhat-7.2/disc2
----------------------------------------------------------------
## See that we can list mounted filesystems
-------------------------------------------------------------------------------
[root@winggear /]# ls /usr/local/archive/shares/
admin  applications  Av8bigtd-Gold  Development  Pvcs Archive  redhat  users
[root@winggear /]# ls /home/ftp/pub/redhat/redhat-7.2/disc1
autorun   images     README.fr  RedHat            RELEASE-NOTES.fr  RPM-GPG-KEY
boot.cat  README     README.it  RELEASE-NOTES     RELEASE-NOTES.it  TRANS.TBL
COPYING   README.de  README.ja  RELEASE-NOTES.de  RELEASE-NOTES.ja
dosutils  README.es  README.ko  RELEASE-NOTES.es  RELEASE-NOTES.ko
[root@winggear /]# ls /usr/local/archive/shares/
admin  applications  Av8bigtd-Gold  Development  Pvcs Archive  redhat  users
-------------------------------------------------------------------------------
### This is fabulous!!! ###
Thank you so.............. much!
There is only one thing that doesn't look right to me. That is the
contents of "/proc/mdstat".
The device numbers seem mismatched to their device names.
Also, different from what I'm used to seeing, is that the spare drive
(sdc1) is not shown in the "active" list as it was before.
## /proc/mdstat before (with 2.2 kernel):
----------------------------------------------------------------------------------
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
70726656 blocks level 5, 128k chunk, algorithm 0 [5/5] [UUUUU]
unused devices: <none>
----------------------------------------------------------------------------------
## /proc/mdstat after (with 2.4 kernel, fixed with mdadm):
----------------------------------------------------------------------------------
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sda1[4] sde1[3] sdd1[2] sdf1[1] sdb1[0]
70726656 blocks level 5, 128k chunk, algorithm 0 [5/5] [UUUUU]
unused devices: <none>
----------------------------------------------------------------------------------
## Superblocks look fine
------------------[ mdadm --examine /dev/sda1 ]------------------
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : 494dd54f:15e7b548:90bae5c3:532ac910
Creation Time : Thu Apr 26 12:29:38 2001
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Update Time : Tue May 21 18:05:13 2002
State : dirty, no-errors
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : 652969c1 - correct
Events : 0.178
Layout : left-asymmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 4 8 1 0 active sync /dev/sda1
0 0 8 17 1 active sync /dev/sdb1
1 1 8 81 2 active sync /dev/sdf1
2 2 8 49 3 active sync /dev/sdd1
3 3 8 65 4 active sync /dev/sde1
4 4 8 1 0 active sync /dev/sda1
-----------------------------------------------------------------
Other than these minor things, IT'S PERFECT!
--Cal Webster
* Re: Software RAID Stopped Working With Aurora Kernel
2002-05-21 22:12 ` Calvin D. Webster
@ 2002-05-21 23:51 ` Neil Brown
2002-05-22 12:00 ` Calvin D. Webster
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2002-05-21 23:51 UTC (permalink / raw)
To: linux-raid
On Tuesday May 21, cwebster@ec.rr.com wrote:
>
> There is only one thing that doesn't look right to me. That is the
> contents of "/proc/mdstat".
> The device numbers seem mismatched to their device names.
>
> Also, different from what I'm used to seeing, is that the spare drive
> (sdc1) is not shown in the "active" list as it was before.
It is not there because you didn't tell "mdadm --assemble" about it.
mdadm will only assemble the drives that you give it. It doesn't
matter what order you give them, but you should list all the drives
that are part of the array (live or spare).
You can hot add /dev/sdc1 back in with
mdadm /dev/md0 --add /dev/sdc1
or
raidhotadd /dev/md0 /dev/sdc1
>
> ## /proc/mdstat before (with 2.2 kernel):
>
> ----------------------------------------------------------------------------------
> Personalities : [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
> 70726656 blocks level 5, 128k chunk, algorithm 0 [5/5] [UUUUU]
> unused devices: <none>
> ----------------------------------------------------------------------------------
>
> ## /proc/mdstat after (with 2.4 kernel, fixed with mdadm):
>
> ----------------------------------------------------------------------------------
> Personalities : [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 sda1[4] sde1[3] sdd1[2] sdf1[1] sdb1[0]
> 70726656 blocks level 5, 128k chunk, algorithm 0 [5/5] [UUUUU]
>
> unused devices: <none>
> ----------------------------------------------------------------------------------
I don't know what caused the different numbering. However as the raid
drive is clearly putting the right disks in the right places I
wouldn't worry about it.
I'm glad all went well.
NeilBrown
* Re: Software RAID Stopped Working With Aurora Kernel
2002-05-21 23:51 ` Neil Brown
@ 2002-05-22 12:00 ` Calvin D. Webster
2002-05-27 4:59 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: Calvin D. Webster @ 2002-05-22 12:00 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
>
> On Tuesday May 21, cwebster@ec.rr.com wrote:
> >
> > There is only one thing that doesn't look right to me. That is the
> > contents of "/proc/mdstat".
> > The device numbers seem mismatched to their device names.
>
> I don't know what caused the different numbering. However as the raid
> drive is clearly putting the right disks in the right places I
> wouldn't worry about it.
There were a few other minor inconsistencies that I thought you might be
interested in between the output of "mdadm" and that of "lsraid" from
the new raidtools.
"mdadm --examine /dev/sda1" shows "State : dirty, no-errors". All other
RAID drives show the same.
"lsraid -D -a /dev/md0" shows "state = good". All other RAID devices
show the same.
## Possibly related:
## Because of this inconsistency, I ran "e2fsck /dev/md0":
-------------------------------------------------------------------------
[root@winggear root]# e2fsck /dev/md0
e2fsck 1.23, 15-Aug-2001 for EXT2 FS 0.5b, 95/08/09
/dev/md0: clean, 69960/8847360 files, 9835985/17681664 blocks
-------------------------------------------------------------------------
## Okay, it says "clean" but I'm not buying it. Try "force"
-------------------------------------------------------------------------
[root@winggear root]# e2fsck -f /dev/md0
e2fsck 1.23, 15-Aug-2001 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Filesystem contains large files, but lacks LARGE_FILE flag in superblock.
Fix<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md0: 69960/8847360 files (1.0% non-contiguous), 9835985/17681664 blocks
-------------------------------------------------------------------------
## I wonder why the LARGE_FILE flag was not set?
## I wonder if there is something else not quite right in the superblock
## causing "mdadm" to report "dirty".
Everything seems to be running okay, though. The system will be getting
some exercise today. I should know soon if there are any problems. Let
me know if you'd like for me to check anything.
--Cal Webster
* Re: Software RAID Stopped Working With Aurora Kernel
2002-05-22 12:00 ` Calvin D. Webster
@ 2002-05-27 4:59 ` Neil Brown
2002-05-29 13:07 ` Calvin Webster
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2002-05-27 4:59 UTC (permalink / raw)
To: cwebster; +Cc: linux-raid
On Wednesday May 22, cwebster@ec.rr.com wrote:
>
> There were a few other minor inconsistencies that I thought you might be
> interested in between the output of "mdadm" and that of "lsraid" from
> the new raidtools.
>
>
> "mdadm --examine /dev/sda1" shows "State : dirty, no-errors". All other
> RAID drives show the same.
>
> "lsraid -D -a /dev/md0" shows "state = good". All other RAID devices
> show the same.
I have never looked at "lsraid". My guess is that "good" is
equivalent to "no-errors" and that lsraid doesn't bother to report
"dirty". This shouldn't relate to the filesystem superblock though...
still, a fsck every few months is a good idea.
NeilBrown
* RE: Software RAID Stopped Working With Aurora Kernel
2002-05-27 4:59 ` Neil Brown
@ 2002-05-29 13:07 ` Calvin Webster
2002-05-29 23:16 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: Calvin Webster @ 2002-05-29 13:07 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org]On Behalf Of Neil Brown
> Sent: Monday, May 27, 2002 12:59 AM
> To: cwebster@ec.rr.com
> Cc: linux-raid
> Subject: Re: Software RAID Stopped Working With Aurora Kernel
>
>
> On Wednesday May 22, cwebster@ec.rr.com wrote:
> >
> > There were a few other minor inconsistencies that I thought you might be
> > interested in between the output of "mdadm" and that of "lsraid" from
> > the new raidtools.
> >
> >
> > "mdadm --examine /dev/sda1" shows "State : dirty, no-errors". All other
> > RAID drives show the same.
> >
> > "lsraid -D -a /dev/md0" shows "state = good".
> All other RAID devices
> > show the same.
>
> I have never looked at "lsraid". My guess is that "good" is
> equivalent to "no-errors" and that lsraid doesn't bother to report
> "dirty". This shouldn't relate to the filesystem superblock though...
> still, a fsck every few months is a good idea.
What made this observation noteworthy, however, was that it persisted even
_after_ e2fsck was run on /dev/md0.
--Cal Webster
* RE: Software RAID Stopped Working With Aurora Kernel
2002-05-29 13:07 ` Calvin Webster
@ 2002-05-29 23:16 ` Neil Brown
0 siblings, 0 replies; 15+ messages in thread
From: Neil Brown @ 2002-05-29 23:16 UTC (permalink / raw)
To: cwebster; +Cc: linux-raid
On Wednesday May 29, cwebster@ec.rr.com wrote:
> > > "mdadm --examine /dev/sda1" shows "State : dirty, no-errors". All other
> > > RAID drives show the same.
> > >
> > > "lsraid -D -a /dev/md0" shows "state = good".
> > All other RAID devices
> > > show the same.
> >
> > I have never looked at "lsraid". My guess is that "good" is
> > equivalent to "no-errors" and that lsraid doesn't bother to report
> > "dirty". This shouldn't relate to the filesystem superblock though...
> > still, a fsck every few months is a good idea.
>
> What made this observation noteworthy, however, was that it persisted even
> _after_ e2fsck was run on /dev/md0.
The raid "dirty" or "error" flags are completely separate from any
filesystem "dirty" or "error" flags. So fsck will not affect the raid
flags.
The raid "dirty" flag simply means that the array has been started and
not yet stopped. Either it is active or you had an unclean shutdown
and will need to regenerate redundancy.
The raid "error" flag is meaningless. It is never set by anything,
and I cannot think of any use that it might be put to. But it is
there in the definition of the superblock, so mdadm reports it.
NeilBrown
end of thread, other threads:[~2002-05-29 23:16 UTC | newest]
Thread overview: 15+ messages
2002-05-19 23:38 Software RAID Stopped Working With Aurora Kernel Calvin Webster
2002-05-20 11:26 ` Neil Brown
2002-05-20 19:25 ` Cal Webster
2002-05-20 21:09 ` Neil Brown
2002-05-20 21:33 ` Calvin D. Webster
2002-05-21 11:24 ` Calvin D. Webster
2002-05-21 12:10 ` Neil Brown
2002-05-21 17:30 ` Cal Webster
2002-05-21 20:46 ` Neil Brown
2002-05-21 22:12 ` Calvin D. Webster
2002-05-21 23:51 ` Neil Brown
2002-05-22 12:00 ` Calvin D. Webster
2002-05-27 4:59 ` Neil Brown
2002-05-29 13:07 ` Calvin Webster
2002-05-29 23:16 ` Neil Brown