* md: could not bd_claim hde1
From: Jon Lewis @ 2009-10-11 3:39 UTC
To: linux-raid
I've got a CentOS 4.8 system running software RAID1 on an identical pair
of PATA drives. At some point in the recent past, the system rebooted
unexpectedly, and the two RAID1 devices degraded. When trying to
mdadm /dev/md0 -a /dev/hde1
I get
mdadm: hot add failed for /dev/hde1: Invalid argument
and from the kernel,
md: could not bd_claim hde1.
md: error, md_import_device() returned -16
The same happens when trying to add hde3 back to md1. The googling I've
done suggests this happens because something has the hde devices open.
They're not mounted. LVM is not in use. Nothing I can find is using
these devices. The system has been properly shut down a number of times
since the arrays went degraded, but the errors we get when trying to
re-add the missing devices have not changed.
# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Wed Mar 23 15:49:58 2005
Raid Level : raid1
Array Size : 38547840 (36.76 GiB 39.47 GB)
Device Size : 38547840 (36.76 GiB 39.47 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sat Oct 10 23:31:50 2009
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
    Number   Major   Minor   RaidDevice   State
       0       0       0        -1        removed
       1      34       1         1        active sync   /dev/hdg1
UUID : e5f38941:e40733ec:4b761dd4:cdc8f141
Events : 0.60067904
Short of booting from some form of rescue image, re-adding the missing
devices, and then waiting for synchronization before putting the machine
back in service, is there anything worth trying to get the arrays back
in sync?
Disk /dev/hde: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot   Start    End      Blocks   Id  System
/dev/hde1   *        1   4799   38547936   fd  Linux raid autodetect
/dev/hde2         4800   4930    1052257+  82  Linux swap
/dev/hde3         4931   9729   38547967+  fd  Linux raid autodetect
Disk /dev/hdg: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot   Start    End      Blocks   Id  System
/dev/hdg1   *        1   4799   38547936   fd  Linux raid autodetect
/dev/hdg2         4800   4930    1052257+  82  Linux swap
/dev/hdg3         4931   9729   38547967+  fd  Linux raid autodetect
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
* Re: md: could not bd_claim hde1
From: NeilBrown @ 2009-10-11 3:55 UTC
To: Jon Lewis; +Cc: linux-raid
On Sun, October 11, 2009 2:39 pm, Jon Lewis wrote:
> I've got a CentOS 4.8 system running software RAID1 on an identical pair
> of PATA drives. At some point in the recent past, the system rebooted
> unexpectedly, and the two RAID1 devices degraded. When trying to
> mdadm /dev/md0 -a /dev/hde1
> I get
> mdadm: hot add failed for /dev/hde1: Invalid argument
> and from the kernel,
> md: could not bd_claim hde1.
> md: error, md_import_device() returned -16
Either hde1 or hde is definitely in use by something else.
Try:
cat /proc/mounts
cat /proc/swaps
ls -l /sys/block/hde1/holders
ls -l /sys/block/hde/holders
cat /proc/mdstat
lsof /dev/hde /dev/hde1
one of those should give some pointer.
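For convenience, the same checks can be rolled into one rough pass (a
sketch only: hde and its partition names are this system's, and on older
kernels the holders/ directories may be absent):

#!/bin/sh
# Probe everything that commonly holds a disk or its partitions.
grep -wE '/dev/hde[0-9]*' /proc/mounts /proc/swaps   # mounted, or in use as swap?
ls /sys/block/hde/holders /sys/block/hde/hde*/holders 2>/dev/null   # in-kernel claims (md, dm)
cat /proc/mdstat                        # partitions already in another array?
lsof /dev/hde /dev/hde1 /dev/hde3       # open from userspace?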
NeilBrown
* Re: md: could not bd_claim hde1
From: Jon Lewis @ 2009-10-11 4:58 UTC
To: NeilBrown; +Cc: linux-raid
On Sun, 11 Oct 2009, NeilBrown wrote:
> Either hde1 or hde is definitely in use by something else.
> Try:
> cat /proc/mounts
# cat /proc/mounts
rootfs / rootfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
none /dev tmpfs rw 0 0
/dev/root / ext3 rw 0 0
none /dev tmpfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
/sys /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
/dev/md1 /var ext3 rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
[there's a remote nfs mount here that I've removed]
> cat /proc/swaps
# cat /proc/swaps
Filename Type Size Used Priority
/dev/hdg2 partition 1052248 160 -1
> ls -l /sys/block/hde1/holders
> ls -l /sys/block/hde/holders
# ls -ld /sys/block/hd*
drwxr-xr-x 3 root root 0 Oct 11 00:53 /sys/block/hda
drwxr-xr-x 6 root root 0 Oct 11 00:53 /sys/block/hde
drwxr-xr-x 6 root root 0 Oct 11 00:53 /sys/block/hdg
# ls -l /sys/block/hde
total 0
-r--r--r-- 1 root root 4096 Oct 11 00:53 dev
lrwxrwxrwx 1 root root 0 Oct 11 00:53 device -> ../../devices/pci0000:00/0000:00:02.0/0000:02:1f.0/0000:03:03.0/0000:04:0c.0/ide2/2.0
drwxr-xr-x 2 root root 0 Oct 9 23:52 hde1
drwxr-xr-x 2 root root 0 Oct 9 23:52 hde2
drwxr-xr-x 2 root root 0 Oct 9 23:52 hde3
drwxr-xr-x 3 root root 0 Oct 9 23:52 queue
-r--r--r-- 1 root root 4096 Oct 11 00:53 range
-r--r--r-- 1 root root 4096 Oct 11 00:53 removable
-r--r--r-- 1 root root 4096 Oct 11 00:53 size
-r--r--r-- 1 root root 4096 Oct 11 00:53 stat
# ls -l /sys/block/hde/hde1/
total 0
-r--r--r-- 1 root root 4096 Oct 11 00:54 dev
-r--r--r-- 1 root root 4096 Oct 11 00:54 size
-r--r--r-- 1 root root 4096 Oct 11 00:54 start
-r--r--r-- 1 root root 4096 Oct 11 00:54 stat
# ls -l /sys/block/hde/hde3/
total 0
-r--r--r-- 1 root root 4096 Oct 11 00:54 dev
-r--r--r-- 1 root root 4096 Oct 11 00:54 size
-r--r--r-- 1 root root 4096 Oct 11 00:54 start
-r--r--r-- 1 root root 4096 Oct 11 00:54 stat
> cat /proc/mdstat
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hdg3[1]
38547840 blocks [2/1] [_U]
md0 : active raid1 hdg1[1]
38547840 blocks [2/1] [_U]
unused devices: <none>
> lsof /dev/hde /dev/hde1
# lsof /dev/hde /dev/hde1
#
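(One additional probe, not in the list above, that works even where this
2.6.9 kernel exposes no holders/ directories in /sys: asking the kernel
to re-read the partition table fails with EBUSY while anything still
holds the disk or one of its partitions open:)

blockdev --rereadpt /dev/hde    # "BLKRRPART: ... busy" would mean
                                # something in-kernel still claims hde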
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
* Re: md: could not bd_claim hde1
From: Michael Tokarev @ 2009-10-11 8:35 UTC
To: Jon Lewis; +Cc: NeilBrown, linux-raid
Jon Lewis wrote:
> On Sun, 11 Oct 2009, NeilBrown wrote:
>
>> Either hde1 or hde is definitely in use by something else.
>> Try:
>> cat /proc/mounts
>
> # cat /proc/mounts
> rootfs / rootfs rw 0 0
> /dev/root / ext3 rw 0 0
That's what you get for using rh and their nash ;)
What IS /dev/root? How is your root device
specified in the initramfs -- by uuid/label?
I bet it is running off /dev/hde1, not /dev/md0.
But this does not explain why it can't add
/dev/hde3 to md1.
Speaking of "LVM is not in use" -- I understand
that as "you didn't set it up", but are you sure
it isn't getting in the way? Like the dm modules --
how about rmmod'ing them?
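Something along these lines would show whether device-mapper has quietly
claimed the disks (a rough sketch; note that "dmsetup remove_all" tears
down every dm device, so only run it when none are legitimately in use):

lsmod | grep '^dm_'      # which dm modules are loaded?
dmsetup ls               # any active device-mapper devices?
dmsetup table            # ...and which block devices they map
dmsetup remove_all       # tear them all down (careful!)
rmmod dm_mirror dm_mod   # afterwards the modules should unload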
In any case, if nothing obvious comes up, your best
bet is to boot into single-user mode, stop
all "extra" processes, unmount everything that's
possible, and try mdadm from there.
Or from the initramfs itself, if that has any tools in
there.
/mjt
* Re: md: could not bd_claim hde1
From: Jon Lewis @ 2009-10-11 12:55 UTC
To: Michael Tokarev; +Cc: NeilBrown, linux-raid
On Sun, 11 Oct 2009, Michael Tokarev wrote:
>> # cat /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/root / ext3 rw 0 0
>
> That's what you get for using rh and their nash ;)
>
> What IS /dev/root? How is your root device
> specified in the initramfs -- by uuid/label?
lrwxrwxrwx 1 root root 8 Oct 9 23:52 /dev/root -> /dev/md0
title CentOS (2.6.9-34.ELsmp)
root (hd0,0)
kernel /boot/vmlinuz-2.6.9-34.ELsmp ro root=/dev/md0
initrd /boot/initrd-2.6.9-34.ELsmp.img
Pulling apart the initrd, the end of its init script runs these commands:
raidautorun /dev/md0
raidautorun /dev/md1
echo Creating root device
mkrootdev /dev/root
umount /sys
echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
mount -t tmpfs --bind /dev /sysroot/dev
echo Switching to new root
switchroot /sysroot
umount /initrd/dev
So, mkrootdev (nash) should be using /dev/md0 as the root fs device.
I finally figured this out, though. The drives are on a Promise FastTrak
controller, but we don't use its softraid feature...we only use it for
additional PATA ports. Apparently dmraid decided otherwise and had
activated the devices. I was able to unload dm_raid, but dm_mod was "in
use", I suspect by the two devices. dmraid -a n fixed that, and the
devices were free again and are going back into sync.
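For the archives, the diagnosis and fix boil down to roughly this (a
sketch; option spellings per dmraid(8), device names from this system):

dmraid -r                      # list disks carrying fakeraid (here: Promise) metadata
dmraid -s                      # show the RAID sets dmraid discovered/activated
dmraid -a n                    # deactivate all sets, releasing the disks
mdadm /dev/md0 -a /dev/hde1    # the re-adds now succeed
mdadm /dev/md1 -a /dev/hde3
# To keep dmraid from grabbing the disks at every boot, "dmraid -rE" can
# erase the on-disk metadata -- destructive, so double-check first.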
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
* Re: md: could not bd_claim hde1
From: Gabor Gombas @ 2009-10-11 19:23 UTC
To: Michael Tokarev; +Cc: Jon Lewis, NeilBrown, linux-raid
On Sun, Oct 11, 2009 at 12:35:52PM +0400, Michael Tokarev wrote:
> ># cat /proc/mounts
> >rootfs / rootfs rw 0 0
> >/dev/root / ext3 rw 0 0
>
> That's what you get for using rh and their nash ;)
>
> What IS /dev/root? How does your root device
> is specified in initramfs -- by uuid/label?
Hmm. cat /proc/mounts:
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,noatime,errors=remount-ro,data=ordered 0 0
Vanilla 2.6.31.3 with no initramfs in sight, no UUID, no label thingy,
just good old "root=/dev/md0".
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------