* Half of RAID1 array missing on 2.6.7-rc3
@ 2004-08-05 14:08 John Stoffel
From: John Stoffel @ 2004-08-05 14:08 UTC (permalink / raw)
To: linux-raid; +Cc: stoffel
Hi folks,
I've run into a problem on my Debian SMP system, running kernels
2.6.7-rc3 (as well as 2.6.8-rc2-mm2 and 2.6.8-rc3), where I can't seem
to add or remove devices from my /dev/md0 array. The system is a
dual-processor 550MHz Xeon running Debian unstable, fairly
aggressively updated.
The root filesystems are all on SCSI disks, and I have a pair of WD
120GB drives, mirrored, on a HighPoint HPT302 controller. These
are /dev/hde and /dev/hdg respectively. The other day, while I was
mucking around with getting a third 120GB drive working in a
USB2.0/FireWire external case, I noticed that /dev/md0 had lost one of
its two disks, /dev/hdg. I've been trying to add it back in, but
I can't.
What I'm doing is setting up the two disks mirrored as /dev/md0, using
/dev/hde1 and /dev/hdg1. Then I set up a volume group using
device-mapper to hold a pair of filesystems on there, so that I can
grow/shrink them as needed down the line. So far so good. The data
is all there and I can still access it without a problem, but I can't
get my data mirrored again!
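For anyone wanting to reproduce the layout, the stack was built along
these lines (the VG and LV names match what device-mapper reports
further down; the sizes and create options here are illustrative
reconstructions, not the exact commands I originally ran):

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hde1 /dev/hdg1
# pvcreate /dev/md0                     # make the mirror an LVM2 physical volume
# vgcreate data_vg /dev/md0             # one volume group on top of the mirror
# lvcreate -L 50G -n home_lv data_vg    # sizes approximate
# lvcreate -L 35G -n local_lv data_vg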
I've run a complete badblocks scan on /dev/hdg and it passes without
any problems. I suspect that the two UUIDs apparently associated with
/dev/md0 mean that something is screwed up somewhere. I really don't
want to lose this data if I can help it.
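(For the record, the badblocks run was a plain non-destructive
read-only scan, something like:

# badblocks -sv /dev/hdg    # -s shows progress, -v verbose; read-only by default

so it shouldn't have touched the data.)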
Here's some info on versions and setup.
# mdadm --version
mdadm - v1.6.0 - 4 June 2004
I had been using 1.4.0-3 before, but I upgraded in case there was
something wrong. I can drop back if need be.
# cat /proc/partitions
major minor    #blocks  name
   33     0  117220824  hde
   33     1  117218241  hde1
   34     0  117220824  hdg
   34     1  117218241  hdg1
    8     0   17783000  sda
    8     1     248976  sda1
    8     2    4000185  sda2
    8     3     996030  sda3
    8     4          1  sda4
    8     5    4000153  sda5
    8     6    8000338  sda6
    8    16   17782540  sdb
    8    17     248976  sdb1
    8    18     996030  sdb2
    8    19   16530885  sdb3
    9     0  117218176  md0
    8    32  117220824  sdc
    8    33   58593496  sdc1
    8    34   48828024  sdc2
  253     0   53477376  dm-0
  253     1   36700160  dm-1
  253     2  117218241  dm-2
  253     3     248976  dm-3
  253     4     996030  dm-4
  253     5   16530885  dm-5
  253     6   58593496  dm-6
  253     7   48828024  dm-7
# mdadm -QE --scan
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=2e078443:42b63ef5:cc179492:aecf0094
devices=/dev/hde1
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=9835ebd0:5d02ebf0:907edc91:c4bf97b2
devices=/dev/hde
This bothers me: why am I seeing two different UUIDs here?
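To dig into that, one can ask mdadm to print the two superblocks
separately; examining the whole-disk device and the partition should
show which UUID lives where (illustrative, using the devices above):

# mdadm -E /dev/hde     # superblock mdadm sees via the whole-disk device
# mdadm -E /dev/hde1    # superblock on the actual array member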
# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Oct 24 19:23:41 2003
     Raid Level : raid1
     Array Size : 117218176 (111.79 GiB 120.03 GB)
    Device Size : 117218176 (111.79 GiB 120.03 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Aug 5 09:33:35 2004
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice   State
       0       33       1        0        active sync   /dev/hde1
       1        0       0       -1        removed

           UUID : 2e078443:42b63ef5:cc179492:aecf0094
         Events : 0.990424
Here's another strange thing. I have Raid Devices = 2, but the Active
and Working Devices are both 1.
I've unmounted both filesystems, deactivated the volume group
(vgchange -an), and stopped the /dev/md0 device with:
mdadm --stop --scan
Then I reassembled it with:
# mdadm --assemble /dev/md0 --auto --scan --update=summaries --verbose
mdadm: looking for devices for /dev/md0
mdadm: /dev/hde has wrong uuid.
mdadm: /dev/hde1 is identified as a member of /dev/md0, slot 0.
mdadm: no RAID superblock on /dev/hdg
mdadm: /dev/hdg has wrong uuid.
mdadm: no RAID superblock on /dev/hdg1
mdadm: /dev/hdg1 has wrong uuid.
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sda has wrong uuid.
mdadm: no RAID superblock on /dev/sda1
mdadm: /dev/sda1 has wrong uuid.
mdadm: no RAID superblock on /dev/sda2
mdadm: /dev/sda2 has wrong uuid.
mdadm: no RAID superblock on /dev/sda3
mdadm: /dev/sda3 has wrong uuid.
mdadm: no RAID superblock on /dev/sda4
mdadm: /dev/sda4 has wrong uuid.
mdadm: no RAID superblock on /dev/sda5
mdadm: /dev/sda5 has wrong uuid.
mdadm: no RAID superblock on /dev/sda6
mdadm: /dev/sda6 has wrong uuid.
mdadm: no RAID superblock on /dev/sdb
mdadm: /dev/sdb has wrong uuid.
mdadm: no RAID superblock on /dev/sdb1
mdadm: /dev/sdb1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdb2
mdadm: /dev/sdb2 has wrong uuid.
mdadm: no RAID superblock on /dev/sdb3
mdadm: /dev/sdb3 has wrong uuid.
mdadm: no RAID superblock on /dev/sdc
mdadm: /dev/sdc has wrong uuid.
mdadm: no RAID superblock on /dev/sdc1
mdadm: /dev/sdc1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdc2
mdadm: /dev/sdc2 has wrong uuid.
mdadm: no RAID superblock on /dev/evms/.nodes/hdg1
mdadm: /dev/evms/.nodes/hdg1 has wrong uuid.
mdadm: no RAID superblock on /dev/evms/.nodes/sdb1
mdadm: /dev/evms/.nodes/sdb1 has wrong uuid.
mdadm: no RAID superblock on /dev/evms/.nodes/sdb2
mdadm: /dev/evms/.nodes/sdb2 has wrong uuid.
mdadm: no RAID superblock on /dev/evms/.nodes/sdb3
mdadm: /dev/evms/.nodes/sdb3 has wrong uuid.
mdadm: no RAID superblock on /dev/evms/.nodes/sdc1
mdadm: /dev/evms/.nodes/sdc1 has wrong uuid.
mdadm: no RAID superblock on /dev/evms/.nodes/sdc2
mdadm: /dev/evms/.nodes/sdc2 has wrong uuid.
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/hde1 to /dev/md0 as 0
mdadm: /dev/md0 has been started with 1 drive (out of 2).
Which is great; I can still see it without a problem.
jfsnew:/etc/init.d# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Oct 24 19:23:41 2003
     Raid Level : raid1
     Array Size : 117218176 (111.79 GiB 120.03 GB)
    Device Size : 117218176 (111.79 GiB 120.03 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Aug 5 09:33:35 2004
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice   State
       0       33       1        0        active sync   /dev/hde1
       1        0       0       -1        removed

           UUID : 2e078443:42b63ef5:cc179492:aecf0094
         Events : 0.990424
Well, no change there.
jfsnew:/etc/init.d# mdadm /dev/md0 -a /dev/hdg1
mdadm: hot add failed for /dev/hdg1: Invalid argument
And this just fails. I get the following error in /var/log/syslog.
Aug 5 09:58:09 jfsnew kernel: md: trying to hot-add hdg1 to md0 ...
Aug 5 09:58:09 jfsnew kernel: md: could not lock hdg1.
Aug 5 09:58:09 jfsnew kernel: md: error, md_import_device() returned -16
Which doesn't seem to make any sense. Can someone tell me what the
heck is going on here?
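The only decoding I can offer: -16 looks like -EBUSY ("Device or
resource busy"), which would mean something else in the kernel already
holds hdg1 open, though I don't know what that would be. Two possible
checks, purely as suggestions (the header path varies by distro):

# grep -rw EBUSY /usr/include/asm*/errno*.h   # confirms EBUSY == 16
# dmsetup table | grep hdg                    # is device-mapper holding a mapping over hdg?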
Thanks,
John
John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
stoffel@lucent.com - http://www.lucent.com - 978-952-7548
* Re: Half of RAID1 array missing on 2.6.7-rc3
From: Alvin Oga @ 2004-08-05 15:09 UTC (permalink / raw)
To: John Stoffel; +Cc: linux-raid
hi ya john
On Thu, 5 Aug 2004, John Stoffel wrote:
> The root filesystems are all on SCSI disks, and I have a pair of WD
> 120GB drives, mirrored, on a HighPoint HPT302 controller. These
if it was me, i'd throw away the highpoint controller ... it aint worth
the risk of losing your data
- i prefer sw raid and its flexibility over expensive hw raid
isn't hpt (rocketraid) hardware raid ??
- why are we using mdadm tools on a hw raid controller ??
> .... I noticed that /dev/md0 had lost one of
> its two disks, /dev/hdg. I've been trying to add it back in, but
> I can't.
you should monitor the raid so that you know within a few hours if a
disk crashed .. otherwise, if the second disk dies too, you lose all
data on the entire raid array (see the sketch below)
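something like mdadm's monitor mode works for that ... (the mail
address is a placeholder, and double-check that your mdadm version
supports these flags):

# mdadm --monitor --scan --delay=300 --mail=root@localhost --daemonise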
> What I'm doing is setting up the two disks mirrored as /dev/md0, using
> /dev/hde1 and /dev/hdg1. Then I set up a volume group using
> device-mapper to hold a pair of filesystems on there, so that I can
> grow/shrink them as needed down the line. So far so good. The data
> is all there and I can still access it without a problem, but I can't
> get my data mirrored again!
then it's NOT "so good" so far ... ( raid is broken )
> I've run a complete badblocks scan on /dev/hdg and it passes without
> any problems.
good
> # mdadm -QE --scan
> ARRAY /dev/md0 level=raid1 num-devices=2 UUID=2e078443:42b63ef5:cc179492:aecf0094
> devices=/dev/hde1
> ARRAY /dev/md0 level=raid1 num-devices=2 UUID=9835ebd0:5d02ebf0:907edc91:c4bf97b2
> devices=/dev/hde
>
> This bothers me, why am I seeing two different UUIDs here?
one is the entire disk ... the other is a partition
> # mdadm --detail /dev/md0
> Update Time : Thu Aug 5 09:33:35 2004
> State : clean, degraded
degraded is good... if you lost one disk
> Number Major Minor RaidDevice State
> 0 33 1 0 active sync /dev/hde1
> 1 0 0 -1 removed
good ... one removed
> # mdadm --assemble /dev/md0 --auto --scan --update=summaries --verbose
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/hde has wrong uuid.
> mdadm: /dev/hde1 is identified as a member of /dev/md0, slot 0.
fun times w/ hw raid..
>
> jfsnew:/etc/init.d# mdadm /dev/md0 -a /dev/hdg1
> mdadm: hot add failed for /dev/hdg1: Invalid argument
how about a simple "raid stop" and "raid start", or at least the
commands that came with the (possibly non-hw-raid) hpt302 ...
> And this just fails. I get the following error in /var/log/syslog.
>
> Aug 5 09:58:09 jfsnew kernel: md: trying to hot-add hdg1 to md0 ...
> Aug 5 09:58:09 jfsnew kernel: md: could not lock hdg1.
> Aug 5 09:58:09 jfsnew kernel: md: error, md_import_device() returned -16
>
> Which doesn't seem to make any sense. Can someone tell me what the
> heck is going on here?
i think you're using mdadm ( sw raid tools ) on a hardware raid controller
c ya
alvin
* Re: Half of RAID1 array missing on 2.6.7-rc3
From: John Stoffel @ 2004-08-05 15:17 UTC (permalink / raw)
To: Alvin Oga; +Cc: John Stoffel, linux-raid
Alvin> if it was me, i'd throw away the highpoint controller ... it
Alvin> aint worth the risk of losing your data - i prefer sw raid and
Alvin> its flexibility over expensive hw raid
It's purely an IDE controller, four ports on two channels, supposedly
ATA/133. I'm only using software RAID on it.
Alvin> isn't hpt (rocketraid) hardware raid ??
Nope, at least not this one.
Thanks for the input.
John
* Re: Half of RAID1 array missing on 2.6.7-rc3
From: John Stoffel @ 2004-08-05 19:38 UTC (permalink / raw)
To: John Stoffel; +Cc: linux-raid, stoffel
Hi folks,
I think I've found the problem. At least there are a couple of
problems here.
1. When the md code fails to hot-add a device, it gives back a fairly
useless error. It should instead tell you that the device is locked
by some other user, and ideally tell you WHAT that user is. (In the
meantime, see the sketch just below.)
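Lacking that, device-mapper can at least tell you which underlying
devices its tables reference, so you can hunt the mystery user down
yourself (a sketch; compare the major:minor pairs it prints against
/proc/partitions):

# dmsetup deps    # for each dm device, lists the (major, minor) pairs it sits on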
I finally started poking around at device mapper stuff as well, and I
ran the command:
# dmsetup status
sdb3: 0 33061770 linear
sdb2: 0 1992060 linear
data_vg-local_lv: 0 62914560 linear
data_vg-local_lv: 62914560 10485760 linear
sdb1: 0 497952 linear
data_vg-home_lv: 0 83886080 linear
data_vg-home_lv: 83886080 23068672 linear
sdc2: 0 97656048 linear
sdc1: 0 117186993 linear
hdg1: 0 234436482 linear
Notice how hdg1 is listed as a LINEAR device. I certainly didn't set
that up myself; god knows how it got picked up. But once I did:
# dmsetup remove hdg1
It was removed!!
# dmsetup status
sdb3: 0 33061770 linear
sdb2: 0 1992060 linear
data_vg-local_lv: 0 62914560 linear
data_vg-local_lv: 62914560 10485760 linear
sdb1: 0 497952 linear
data_vg-home_lv: 0 83886080 linear
data_vg-home_lv: 83886080 23068672 linear
sdc2: 0 97656048 linear
sdc1: 0 117186993 linear
So now I was able to do:
# mdadm /dev/md0 --force -a /dev/hdg1
mdadm: hot added /dev/hdg1
Which was great to see.
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
md0 : active raid1 hdg1[2] hde1[0]
117218176 blocks [2/1] [U_]
[>....................] recovery = 0.5% (673664/117218176) finish=49.0min speed=39627K/sec
unused devices: <none>
And now it's rebuilding the mirror properly. So now I need to see
how I can stop the device-mapper stuff from taking over and
controlling various devices. The /dev/sdb? and /dev/sdc? entries are
also problematic, since those are just a second SCSI disk and a USB
storage device. Hmm... I wonder whether, if I remove the usb-storage
device from device mapper, I'll finally be able to write all the data
to it without locking up the system. Gotta try that out.
I do appreciate the people who gave suggestions, even though they
didn't turn out to be the right solution in the end.
Now to figure out how to make device mapper only look at /dev/md*
devices in the future.
John
* Re: Half of RAID1 array missing on 2.6.7-rc3
From: Luca Berra @ 2004-08-05 19:52 UTC (permalink / raw)
To: linux-raid
On Thu, Aug 05, 2004 at 03:38:07PM -0400, John Stoffel wrote:
>Now to figure out how to make device mapper only look at /dev/md*
>devices in the future.
how are you using device mapper?
recent versions of lvm2 ignore md components by default, or you can
specify a filter in the configuration file.
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: Half of RAID1 array missing on 2.6.7-rc3
From: John Stoffel @ 2004-08-05 20:05 UTC (permalink / raw)
To: linux-raid, Luca Berra
Luca> how are you using device mapper?
To set up some volume groups on top of an MD array of two mirrored
disks. I think. All I know currently is that LVM2 requires it to be
set up and around. I have /dev/mapper/data_vg-*_lv and some other
devices currently. All I really want is for the LVM2 devices to be
covered. And since those devices are built on top of /dev/md0 (for
now; I might add more later), I don't need other devices looked at.
Period.
Can you tell me what I'm doing wrong here?
Luca> recent versions of lvm2 ignore md components by default or you
Luca> can specify a filter in the configuration file.
Which file is that? There's nothing in /etc/dm/... that I can see.
There's some stuff in /etc/lvm/lvm.conf, but I haven't touched that at
all.
Basically, I don't want Device Mapper to scan and take over all my
disks. From my dmesg file, I see a bunch of these on startup now:
device-mapper: error adding target to table
device-mapper: : dm-linear: Device lookup failed
device-mapper: error adding target to table
device-mapper: : dm-linear: Device lookup failed
device-mapper: error adding target to table
device-mapper: : dm-linear: Device lookup failed
device-mapper: error adding target to table
device-mapper: : dm-linear: Device lookup failed
device-mapper: error adding target to table
device-mapper: : dm-linear: Device lookup failed
And I don't know where they are coming from.
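One hypothesis: the earlier mdadm scan showed /dev/evms/.nodes
entries, so maybe an EVMS tool is what builds those tables at boot.
Something to check, purely as a guess (Debian commands):

# dpkg -l | grep -i evms       # is an EVMS package installed?
# grep -rl evms /etc/init.d    # does any init script invoke it?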
* Re: Half of RAID1 array missing on 2.6.7-rc3
From: Luca Berra @ 2004-08-05 20:34 UTC (permalink / raw)
To: linux-raid
On Thu, Aug 05, 2004 at 04:05:16PM -0400, John Stoffel wrote:
>Luca> recent versions of lvm2 ignore md components by default or you
>Luca> can specify a filter in the configuration file.
>
>Which file is that? There's nothing in /etc/dm/... that I can see.
>There's some stuff in /etc/lvm/lvm.conf, but I haven't touched that at
>all.
/etc/lvm/lvm.conf
if you have a recent version you will find a
md_component_detection = 1
line,
if not you can change the
filter = [ "a/.*/" ]
line to accept only md devices and reject everything else:
filter = [ "a|^/dev/md|", "r|.*|" ]
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
Thread overview: 7 messages
2004-08-05 14:08 Half of RAID1 array missing on 2.6.7-rc3 John Stoffel
2004-08-05 15:09 ` Alvin Oga
2004-08-05 15:17 ` John Stoffel
2004-08-05 19:38 ` John Stoffel
2004-08-05 19:52 ` Luca Berra
2004-08-05 20:05 ` John Stoffel
2004-08-05 20:34 ` Luca Berra