* md: could not bd_claim hde1
From: Jon Lewis @ 2009-10-11 3:39 UTC
To: linux-raid
I've got a CentOS 4.8 system running software RAID1 on an identical pair
of PATA drives. At some point in the recent past, the system rebooted
unexpectedly, and the two RAID1 devices degraded. When trying to
mdadm /dev/md0 -a /dev/hde1
I get
mdadm: hot add failed for /dev/hde1: Invalid argument
and from the kernel,
md: could not bd_claim hde1.
md: error, md_import_device() returned -16
The same happens when trying to add hde3 back to md1. The googling I've
done suggests this happens because something has the hde devices open.
They're not mounted. LVM is not in use. Nothing I can find is using
these devices. The system has been properly shut down a number of times
since the arrays went degraded, but the errors we get when trying to
re-add the missing devices have not changed.
# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Wed Mar 23 15:49:58 2005
Raid Level : raid1
Array Size : 38547840 (36.76 GiB 39.47 GB)
Device Size : 38547840 (36.76 GiB 39.47 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sat Oct 10 23:31:50 2009
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
    Number   Major   Minor   RaidDevice   State
       0       0       0        -1        removed
       1      34       1         1        active sync   /dev/hdg1
UUID : e5f38941:e40733ec:4b761dd4:cdc8f141
Events : 0.60067904
Short of booting from some form of rescue image, re-adding the missing
devices, and then waiting for synchronization before putting the machine
back in service, is there anything worth trying to get the arrays back
in sync?
Disk /dev/hde: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot   Start    End      Blocks   Id  System
/dev/hde1   *        1   4799   38547936   fd  Linux raid autodetect
/dev/hde2         4800   4930    1052257+  82  Linux swap
/dev/hde3         4931   9729   38547967+  fd  Linux raid autodetect
Disk /dev/hdg: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot   Start    End      Blocks   Id  System
/dev/hdg1   *        1   4799   38547936   fd  Linux raid autodetect
/dev/hdg2         4800   4930    1052257+  82  Linux swap
/dev/hdg3         4931   9729   38547967+  fd  Linux raid autodetect
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
* Re: md: could not bd_claim hde1
From: NeilBrown @ 2009-10-11 3:55 UTC
To: Jon Lewis; +Cc: linux-raid
On Sun, October 11, 2009 2:39 pm, Jon Lewis wrote:
> I've got a CentOS 4.8 system running software RAID1 on an identical pair
> of PATA drives. At some point in the recent past, the system rebooted
> unexpectedly, and the two RAID1 devices degraded. When trying to
> mdadm /dev/md0 -a /dev/hde1
> I get
> mdadm: hot add failed for /dev/hde1: Invalid argument
> and from the kernel,
> md: could not bd_claim hde1.
> md: error, md_import_device() returned -16
Either hde1 or hde is definitely in use by something else.
Try:
cat /proc/mounts
cat /proc/swaps
ls -l /sys/block/hde1/holders
ls -l /sys/block/hde/holders
cat /proc/mdstat
lsof /dev/hde /dev/hde1
one of those should give some pointer.
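For convenience, the same checks can be rolled into one rough pass (a
sketch only: hde and its partition names are this system's, and on older
kernels the holders/ directories may be absent):

#!/bin/sh
# Probe everything that commonly holds a disk or its partitions.
grep -wE '/dev/hde[0-9]*' /proc/mounts /proc/swaps   # mounted, or in use as swap?
ls /sys/block/hde/holders /sys/block/hde/hde*/holders 2>/dev/null   # in-kernel claims (md, dm)
cat /proc/mdstat                        # partitions already in another array?
lsof /dev/hde /dev/hde1 /dev/hde3       # open from userspace?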
NeilBrown
* Re: md: could not bd_claim hde1
From: Jon Lewis @ 2009-10-11 4:58 UTC
To: NeilBrown; +Cc: linux-raid
On Sun, 11 Oct 2009, NeilBrown wrote:
> Either hde1 or hde is definitely in use by something else.
> Try:
> cat /proc/mounts
# cat /proc/mounts
rootfs / rootfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
none /dev tmpfs rw 0 0
/dev/root / ext3 rw 0 0
none /dev tmpfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
/sys /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
/dev/md1 /var ext3 rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
[there's a remote nfs mount here that I've removed]
> cat /proc/swaps
# cat /proc/swaps
Filename Type Size Used Priority
/dev/hdg2 partition 1052248 160 -1
> ls -l /sys/block/hde1/holders
> ls -l /sys/block/hde/holders
# ls -ld /sys/block/hd*
drwxr-xr-x 3 root root 0 Oct 11 00:53 /sys/block/hda
drwxr-xr-x 6 root root 0 Oct 11 00:53 /sys/block/hde
drwxr-xr-x 6 root root 0 Oct 11 00:53 /sys/block/hdg
# ls -l /sys/block/hde
total 0
-r--r--r-- 1 root root 4096 Oct 11 00:53 dev
lrwxrwxrwx 1 root root 0 Oct 11 00:53 device -> ../../devices/pci0000:00/0000:00:02.0/0000:02:1f.0/0000:03:03.0/0000:04:0c.0/ide2/2.0
drwxr-xr-x 2 root root 0 Oct 9 23:52 hde1
drwxr-xr-x 2 root root 0 Oct 9 23:52 hde2
drwxr-xr-x 2 root root 0 Oct 9 23:52 hde3
drwxr-xr-x 3 root root 0 Oct 9 23:52 queue
-r--r--r-- 1 root root 4096 Oct 11 00:53 range
-r--r--r-- 1 root root 4096 Oct 11 00:53 removable
-r--r--r-- 1 root root 4096 Oct 11 00:53 size
-r--r--r-- 1 root root 4096 Oct 11 00:53 stat
# ls -l /sys/block/hde/hde1/
total 0
-r--r--r-- 1 root root 4096 Oct 11 00:54 dev
-r--r--r-- 1 root root 4096 Oct 11 00:54 size
-r--r--r-- 1 root root 4096 Oct 11 00:54 start
-r--r--r-- 1 root root 4096 Oct 11 00:54 stat
# ls -l /sys/block/hde/hde3/
total 0
-r--r--r-- 1 root root 4096 Oct 11 00:54 dev
-r--r--r-- 1 root root 4096 Oct 11 00:54 size
-r--r--r-- 1 root root 4096 Oct 11 00:54 start
-r--r--r-- 1 root root 4096 Oct 11 00:54 stat
> cat /proc/mdstat
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hdg3[1]
38547840 blocks [2/1] [_U]
md0 : active raid1 hdg1[1]
38547840 blocks [2/1] [_U]
unused devices: <none>
> lsof /dev/hde /dev/hde1
# lsof /dev/hde /dev/hde1
#
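(One additional probe, not in the list above, that works even where this
2.6.9 kernel exposes no holders/ directories in /sys: asking the kernel
to re-read the partition table fails with EBUSY while anything still
holds the disk or one of its partitions open:)

blockdev --rereadpt /dev/hde    # "BLKRRPART: ... busy" would mean
                                # something in-kernel still claims hde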
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
* Re: md: could not bd_claim hde1
From: Michael Tokarev @ 2009-10-11 8:35 UTC
To: Jon Lewis; +Cc: NeilBrown, linux-raid
Jon Lewis wrote:
> On Sun, 11 Oct 2009, NeilBrown wrote:
>
>> Either hde1 or hde is definitely in use by something else.
>> Try:
>> cat /proc/mounts
>
> # cat /proc/mounts
> rootfs / rootfs rw 0 0
> /dev/root / ext3 rw 0 0
That's what you get for using rh and their nash ;)
What IS /dev/root? How is your root device
specified in the initramfs -- by uuid/label?
I bet it is running off /dev/hde1, not /dev/md0.
But this does not explain why it can't add
/dev/hde3 to md1.
Speaking of "LVM is not in use" -- I understand
that as "you didn't set it up", but are you sure
it isn't getting in the way? Like the dm modules --
how about rmmod'ing them?
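Something along these lines would show whether device-mapper has quietly
claimed the disks (a rough sketch; note that "dmsetup remove_all" tears
down every dm device, so only run it when none are legitimately in use):

lsmod | grep '^dm_'      # which dm modules are loaded?
dmsetup ls               # any active device-mapper devices?
dmsetup table            # ...and which block devices they map
dmsetup remove_all       # tear them all down (careful!)
rmmod dm_mirror dm_mod   # afterwards the modules should unload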
In any case, if nothing obvious comes up, your best
bet is to boot into single-user mode, stop
all "extra" processes, unmount everything that's
possible, and try mdadm from there.
Or from the initramfs itself, if that has any tools in
there.
/mjt
* Re: md: could not bd_claim hde1
From: Jon Lewis @ 2009-10-11 12:55 UTC
To: Michael Tokarev; +Cc: NeilBrown, linux-raid
On Sun, 11 Oct 2009, Michael Tokarev wrote:
>> # cat /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/root / ext3 rw 0 0
>
> That's what you get for using rh and their nash ;)
>
> What IS /dev/root? How is your root device
> specified in the initramfs -- by uuid/label?
lrwxrwxrwx 1 root root 8 Oct 9 23:52 /dev/root -> /dev/md0
title CentOS (2.6.9-34.ELsmp)
root (hd0,0)
kernel /boot/vmlinuz-2.6.9-34.ELsmp ro root=/dev/md0
initrd /boot/initrd-2.6.9-34.ELsmp.img
Pulling apart the initrd, the end of its init script runs these commands:
raidautorun /dev/md0
raidautorun /dev/md1
echo Creating root device
mkrootdev /dev/root
umount /sys
echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
mount -t tmpfs --bind /dev /sysroot/dev
echo Switching to new root
switchroot /sysroot
umount /initrd/dev
So, mkrootdev (nash) should be using /dev/md0 as the root fs device.
I finally figured this out, though. The drives are on a Promise FastTrak
controller, but we don't use its softraid feature...we only use it for
additional PATA ports. Apparently dmraid decided otherwise and had
activated the devices. I was able to unload dm_raid, but dm_mod was "in
use", I suspect by the two devices. dmraid -a n fixed that, and the
devices were free again and are going back into sync.
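For the archives, the diagnosis and fix boil down to roughly this (a
sketch; option spellings per dmraid(8), device names from this system):

dmraid -r                      # list disks carrying fakeraid (here: Promise) metadata
dmraid -s                      # show the RAID sets dmraid discovered/activated
dmraid -a n                    # deactivate all sets, releasing the disks
mdadm /dev/md0 -a /dev/hde1    # the re-adds now succeed
mdadm /dev/md1 -a /dev/hde3
# To keep dmraid from grabbing the disks at every boot, "dmraid -rE" can
# erase the on-disk metadata -- destructive, so double-check first.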
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
* Re: md: could not bd_claim hde1
From: Gabor Gombas @ 2009-10-11 19:23 UTC
To: Michael Tokarev; +Cc: Jon Lewis, NeilBrown, linux-raid
On Sun, Oct 11, 2009 at 12:35:52PM +0400, Michael Tokarev wrote:
> ># cat /proc/mounts
> >rootfs / rootfs rw 0 0
> >/dev/root / ext3 rw 0 0
>
> That's what you get for using rh and their nash ;)
>
> What IS /dev/root? How does your root device
> is specified in initramfs -- by uuid/label?
Hmm. cat /proc/mounts:
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,noatime,errors=remount-ro,data=ordered 0 0
Vanilla 2.6.31.3 with no initramfs in sight, no UUID, no label thingy,
just good old "root=/dev/md0".
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------