unable to remove failed drive

linux-raid.vger.kernel.org archive mirror

* unable to remove failed drive
From: Jeff Breidenbach @ 2007-12-08 1:55 UTC
  To: linux-raid list

... and all access to the array hangs indefinitely, resulting in unkillable zombie
processes. I have to hard-reboot the machine. Any thoughts on the matter?

===

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sde1[6](F) sdg1[1] sdb1[4] sdd1[3] sdc1[2]
      488383936 blocks [6/4] [_UUUU_]

unused devices: <none>
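
The status line `[6/4] [_UUUU_]` is compact: `6/4` says the array is configured for six member slots and four are currently working, and the bracketed map has one character per slot (`U` = in sync, `_` = missing, failed or not yet in sync). A minimal sketch for decoding it, assuming the line format shown in this thread; `decode_mdstat_status` is a hypothetical helper, not an mdadm command. The 0-based slots it reports as missing (0 and 5) correspond to the two `removed` rows of the `mdadm -D` table, while the faulty `sde1[6](F)` is listed separately as device number 6.

```shell
#!/bin/sh
# Sketch: decode the "[n/m] [U_...]" fields of an md array's status line
# from /proc/mdstat. decode_mdstat_status is a hypothetical helper name.
decode_mdstat_status() {
    printf '%s\n' "$1" | awk '{
        # Last field, e.g. "[_UUUU_]": strip the brackets to get one char per slot.
        map = substr($NF, 2, length($NF) - 2)
        # Field before it, e.g. "[6/4]": raid_disks/working_disks.
        counts = substr($(NF - 1), 2, length($(NF - 1)) - 2)
        split(counts, c, "/")
        missing = ""
        for (i = 1; i <= length(map); i++) {
            if (substr(map, i, 1) == "_") {
                # Report 0-based slots to match the RaidDevice column of mdadm -D.
                missing = missing (missing == "" ? "" : ",") (i - 1)
            }
        }
        printf "slots=%s working=%s missing_slots=%s\n", c[1], c[2], missing
    }'
}

decode_mdstat_status "      488383936 blocks [6/4] [_UUUU_]"
# prints: slots=6 working=4 missing_slots=0,5
```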

# mdadm --fail /dev/md1 /dev/sde1
mdadm: set /dev/sde1 faulty in /dev/md1

# mdadm --remove /dev/md1 /dev/sde1
mdadm: hot remove failed for /dev/sde1: Device or resource busy

# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Sun Dec 25 16:12:34 2005
     Raid Level : raid1
     Array Size : 488383936 (465.76 GiB 500.11 GB)
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 6
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Dec  7 11:37:46 2007
          State : active, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0

           UUID : f3ee6aa3:2f1d5767:f443dfc0:c23e80af
         Events : 0.22331500

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       97        1      active sync   /dev/sdg1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
       4       8       17        4      active sync   /dev/sdb1
       5       0        0        -      removed

       6       8       65        0      faulty   /dev/sde1


# dmesg
sd 4:0:0:0: SCSI error: return code = 0x8000002
sde: Current: sense key: Aborted Command
    Additional sense: Scsi parity error
end_request: I/O error, dev sde, sector 594882271
raid1: sde1: rescheduling sector 594882208
ata5: command timeout
ata5: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata5: status=0xff { Busy }
sd 4:0:0:0: SCSI error: return code = 0x8000002
sde: Current: sense key: Aborted Command
    Additional sense: Scsi parity error
end_request: I/O error, dev sde, sector 528737607
raid1: sde1: rescheduling sector 528737544
ata5: command timeout
ata5: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata5: status=0xff { Busy }
sd 4:0:0:0: SCSI error: return code = 0x8000002
sde: Current: sense key: Aborted Command
    Additional sense: Scsi parity error
end_request: I/O error, dev sde, sector 814377071

[...]

md: cannot remove active disk sde1 from md1 ...
md: could not bd_claim sde1.
md: error, md_import_device() returned -16
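
The excerpt repeats one pattern: a command timeout on `ata5`, a SCSI parity error, an `end_request: I/O error` on `sde`, and raid1 rescheduling the sector. When sifting a longer log, tallying these errors per device is a quick way to confirm that only one disk is involved. A minimal sketch, assuming the 2.6-era message wording quoted here (newer kernels phrase block-layer errors differently); `count_io_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count "end_request: I/O error" lines per device in kernel log text.
# Assumes the 2.6-era wording shown in this thread.
count_io_errors() {
    awk '
        /end_request: I\/O error, dev / {
            # The device name follows "dev " and ends at the next comma.
            dev = $0
            sub(/.*, dev /, "", dev)
            sub(/,.*/, "", dev)
            errors[dev]++
        }
        END { for (d in errors) printf "%s %d\n", d, errors[d] }
    ' | sort
}

# Demo on the three end_request lines from the excerpt; on a live system
# the input would be: dmesg | count_io_errors
printf '%s\n' \
    'end_request: I/O error, dev sde, sector 594882271' \
    'raid1: sde1: rescheduling sector 594882208' \
    'end_request: I/O error, dev sde, sector 528737607' \
    'end_request: I/O error, dev sde, sector 814377071' |
    count_io_errors
# prints: sde 3
```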

# cat /proc/version
Linux version 2.6.17-2-amd64 (Debian 2.6.17-7) (waldi@debian.org) (gcc version 4.1.2 20060814 (prerelease) (Debian 4.1.1-11)) #1 SMP Thu Aug 24 16:13:57 UTC 2006

* Re: unable to remove failed drive
From: Bill Davidsen @ 2007-12-10 19:12 UTC
  To: Jeff Breidenbach; +Cc: linux-raid list

Jeff Breidenbach wrote:
> ... and all access to array hangs indefinitely, resulting in unkillable zombie
> processes. Have to hard reboot the machine. Any thoughts on the matter?
>
> [...]
>
> # mdadm -D /dev/md1
> /dev/md1:
>         Version : 00.90.03
>   Creation Time : Sun Dec 25 16:12:34 2005
>      Raid Level : raid1
>      Array Size : 488383936 (465.76 GiB 500.11 GB)
>     Device Size : 488383936 (465.76 GiB 500.11 GB)
>    Raid Devices : 6
>   Total Devices : 5
> Preferred Minor : 1
>     Persistence : Superblock is persistent
>
>     Update Time : Fri Dec  7 11:37:46 2007
>           State : active, degraded
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 0
>
>            UUID : f3ee6aa3:2f1d5767:f443dfc0:c23e80af
>          Events : 0.22331500
>
>     Number   Major   Minor   RaidDevice State
>        0       0        0        -      removed
>        1       8       97        1      active sync   /dev/sdg1
>        2       8       33        2      active sync   /dev/sdc1
>        3       8       49        3      active sync   /dev/sdd1
>        4       8       17        4      active sync   /dev/sdb1
>        5       0        0        -      removed
>
>        6       8       65        0      faulty   /dev/sde1
>
>   
This is without doubt really messed up! You have four active devices,
four working devices, five total devices, and six(!) raid devices. And
at the end of the output there are seven(!!) devices: four active, two
removed, and one faulty. I wouldn't even venture a guess at how you got to
this point, but I would guess that some system administration was involved.
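
The tally above can be reproduced mechanically from saved `mdadm -D` output. A minimal sketch, assuming the "Label : value" layout and device table quoted in this thread (other mdadm versions may format fields differently); `md_counts` is a hypothetical helper name, not an mdadm feature. On the output in this thread it should report six raid devices, five total, four active, four working, one failed, and seven table rows (four active, two removed, one faulty).

```shell
#!/bin/sh
# Sketch: summarise the device counters and the device table from saved
# `mdadm -D` output. md_counts is a hypothetical helper name.
md_counts() {
    awk '
        # Header counters look like "   Raid Devices : 6".
        /^ *(Raid|Total|Active|Working|Failed|Spare) Devices : / {
            split($0, kv, " : ")
            label = kv[1]
            sub(/^ +/, "", label)
            gsub(/ /, "_", label)
            printf "%s=%s\n", label, kv[2]
            next
        }
        # Table rows: Number, Major, Minor, RaidDevice ("-" if none), State.
        /^ +[0-9]+ +[0-9]+ +[0-9]+ +(-|[0-9]+) +[a-z]/ {
            state[$5]++
            rows++
        }
        END {
            printf "table_rows=%d active=%d removed=%d faulty=%d\n",
                rows, state["active"], state["removed"], state["faulty"]
        }
    '
}

# Usage: mdadm -D /dev/md1 | md_counts
```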

If this is an array you can live without and still have a working system,
I do have a thought, however. If you can unmount everything on this
device and then stop it, you may be able to assemble (-A) it with just
the four working drives. If that succeeds, you may be able to remove
sde1, although I suspect that the two removed drives shown are really
the result of partial removals of sde1 in the past. Either that or you have
a serious problem with reliability...
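
The stop-and-reassemble suggestion amounts to a short command sequence. A minimal, untested sketch, assuming a single hypothetical mount point `/srv/md1` and taking the member list from the `active sync` rows of the `mdadm -D` output; `run` and `reassemble_md1` are hypothetical helper names, and by default the script only prints the commands (`DRY_RUN=1`).

```shell
#!/bin/sh
# Sketch of the suggested procedure: unmount, stop md1, then reassemble it
# from the four working members only. Untested; review before use.
set -eu

MD=/dev/md1
MOUNTPOINT=/srv/md1                             # hypothetical mount point
GOOD="/dev/sdg1 /dev/sdc1 /dev/sdd1 /dev/sdb1"  # "active sync" rows of mdadm -D
DRY_RUN=${DRY_RUN:-1}                           # 1 = print only, 0 = execute

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

reassemble_md1() {
    run umount "$MOUNTPOINT"
    run mdadm --stop "$MD"
    # $GOOD is deliberately unquoted so each member becomes its own argument.
    run mdadm --assemble "$MD" $GOOD
    run mdadm --detail "$MD"
}

reassemble_md1
```

Setting `DRY_RUN=0` executes the commands; if more than one filesystem lives on the array, each needs its own `umount`.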

I'm sure others will have some ideas on this. If it were mine, a backup
would be my first order of business.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck



* Re: unable to remove failed drive
From: Jeff Breidenbach @ 2007-12-10 21:46 UTC
  To: Bill Davidsen; +Cc: linux-raid list

> This is without doubt really messed up!

Thanks for the comments. Fortunately, a reboot cleared the problem (the
array did resync afterwards). It may also have helped that I decided sde is
too dangerous to keep in the array, so I took it completely out of action.
In any case, this wasn't so much a cry for help as a report for bug
hunters. I'm happy to provide any follow-up information if that is helpful.
