linux-raid.vger.kernel.org archive mirror
* bug/race in md causing device to wedge in busy state
From: Brett Russ @ 2009-12-17  1:04 UTC (permalink / raw)
  To: linux-raid

I'm seeing cases where an attempted remove of a manually faulted disk 
from an existing RAID unit can fail with mdadm reporting "Device or 
resource busy".  I've reduced the problem down to the smallest set that 
reliably reproduces the issue:

Starting with 2 drives (a,b), each with at least 3 partitions:
1) create 3 raid1 md's on the drives using the 3 partitions
2) fault & remove drive b from each of the 3 md's
3) zero the superblock on b so it forgets where it came from (or use a 
third drive c...) and add drive b back to each of the 3 md's
4) fault & remove drive b from each of the 3 md's
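
For reference, the steps above boil down to roughly the following mdadm 
commands.  This is only a sketch: the drive names (/dev/sda, /dev/sdb) 
and the md/partition pairing (md0:1, md1:3, md2:4) are placeholders; the 
full script at the end of this mail takes the real names as arguments.

> #!/bin/sh
> pairs="0:1 1:3 2:4"            # mdN:partition pairs (placeholders)
> for x in $pairs; do md=${x%:*}; p=${x#*:}
>     # step 1: create the three raid1 arrays
>     mdadm --create /dev/md$md --force --run --level=1 --raid-devices=2 \
>         --assume-clean /dev/sda$p /dev/sdb$p
> done
> for x in $pairs; do md=${x%:*}; p=${x#*:}
>     # step 2: fault & remove drive b from each array
>     mdadm --manage /dev/md$md --fail   /dev/sdb$p
>     mdadm --manage /dev/md$md --remove /dev/sdb$p
> done
> for x in $pairs; do md=${x%:*}; p=${x#*:}
>     # step 3: forget the old superblock, then re-add (recovery starts)
>     mdadm --zero-superblock /dev/sdb$p
>     mdadm --manage /dev/md$md --add /dev/sdb$p
> done
> for x in $pairs; do md=${x%:*}; p=${x#*:}
>     # step 4: fault & remove again while recovery I/O is in flight
>     mdadm --manage /dev/md$md --fail   /dev/sdb$p
>     sleep 2
>     mdadm --manage /dev/md$md --remove /dev/sdb$p
> done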

The problem was originally seen sporadically during the remove part of 
step 2, but is *very* reproducible in the remove part of step 4.  I 
attribute this to the fact that there's guaranteed I/O happening during 
this step.

Now here's the catch.  If I change step 4 to:
4a) fault drive b from each of the 3 md's
4b) remove drive b from each of the 3 md's
then the removes have not yet been seen to fail with BUSY (i.e. no 
issues).

But my scripts currently do this instead for each md:
4a) fault drive b from md
4b) sleep 0-10 seconds
4c) remove drive b from md
which will fail on the remove from one of the md's, almost guaranteed. 
It seems odd to me that no amount of sleeping in between these steps can 
allow me to reliably remove a faulted member of an array.
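
Put differently (same placeholder names as the sketch above), this 
ordering has not been seen to fail:

> for x in $pairs; do mdadm --manage /dev/md${x%:*} --fail   /dev/sdb${x#*:}; done
> for x in $pairs; do mdadm --manage /dev/md${x%:*} --remove /dev/sdb${x#*:}; done

while this one almost always trips over the busy remove, no matter what 
the sleep is:

> for x in $pairs; do
>     mdadm --manage /dev/md${x%:*} --fail   /dev/sdb${x#*:}
>     sleep 5     # anywhere from 0 to 10 seconds; the length doesn't help
>     mdadm --manage /dev/md${x%:*} --remove /dev/sdb${x#*:}
> done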

Some other info:
-I haven't yet seen the problem if I use < 3 partitions.
-I've tried different create options to no avail
-I'm running a Fedora 2.6.31.6 kernel
-I've hooked up to this machine with kgdb and may poke around more if needed

I will use the "fault all" then "remove all" as my workaround if it 
proves reliable, but I thought it'd be a good idea to report this bug.

To that end, below find 3 things:

1) the simplest possible run of a script that reliably triggers the bug
2) the system logs from this run
3) the script used to simulate this bug


1) console output of running the simplest use case of the script (note: 
md110 is not part of this exercise):

> mxmx# ./sim_md_busy.sh -dg1 /dev/sdg -dg2 /dev/sdp -df /dev/sdp -dr /dev/sdp -create -m bug3part
> mdadm --create /dev/md0  --force --run --level=1 --raid-devices=2 /dev/sdg1  /dev/sdp1 --assume-clean
> mdadm: /dev/sdg1 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Wed Dec 16 19:43:50 2009
> mdadm: /dev/sdp1 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Wed Dec 16 19:43:50 2009
> mdadm: array /dev/md0 started.
> Succeeded after 0 retries
> mdadm --create /dev/md1  --force --run --level=1 --raid-devices=2 /dev/sdg3  /dev/sdp3 --assume-clean
> mdadm: /dev/sdg3 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Wed Dec 16 19:43:50 2009
> mdadm: /dev/sdp3 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Wed Dec 16 19:43:50 2009
> mdadm: array /dev/md1 started.
> Succeeded after 0 retries
> mdadm --create /dev/md2  --force --run --level=1 --raid-devices=2 /dev/sdg4  /dev/sdp4 --assume-clean
> mdadm: /dev/sdg4 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Wed Dec 16 19:43:50 2009
> mdadm: /dev/sdp4 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Wed Dec 16 19:43:50 2009
> mdadm: array /dev/md2 started.
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md0 --fail   /dev/sdp1
> mdadm: set /dev/sdp1 faulty in /dev/md0
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md0 --remove   /dev/sdp1
> mdadm: hot removed /dev/sdp1
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md1 --fail   /dev/sdp3
> mdadm: set /dev/sdp3 faulty in /dev/md1
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md1 --remove   /dev/sdp3
> mdadm: hot removed /dev/sdp3
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md2 --fail   /dev/sdp4
> mdadm: set /dev/sdp4 faulty in /dev/md2
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md2 --remove   /dev/sdp4
> mdadm: hot removed /dev/sdp4
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md0 --add   /dev/sdp1
> mdadm: added /dev/sdp1
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md1 --add   /dev/sdp3
> mdadm: added /dev/sdp3
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md2 --add   /dev/sdp4
> mdadm: added /dev/sdp4
> Succeeded after 0 retries
> Personalities : [raid1] [raid10] [raid0] [linear]
> md2 : active raid1 sdp4[2] sdg4[0]
>       4200896 blocks [2/1] [U_]
>         resync=DELAYED
>
> md1 : active raid1 sdp3[2] sdg3[0]
>       4200896 blocks [2/1] [U_]
>         resync=DELAYED
>
> md0 : active raid1 sdp1[2] sdg1[0]
>       4200896 blocks [2/1] [U_]
>       [>....................]  recovery =  1.5% (65472/4200896) finish=1.0min speed=65472K/sec
>
> md110 : active raid1 sda6[0] sdn6[1]
>       1060248 blocks super 1.0 [2/2] [UU]
>       bitmap: 1/1 pages [4KB], 1024KB chunk
>
> unused devices: <none>
> mdadm --verbose  --manage /dev/md0 --fail   /dev/sdp1
> mdadm: set /dev/sdp1 faulty in /dev/md0
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md0 --remove   /dev/sdp1
> mdadm: hot removed /dev/sdp1
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md1 --fail   /dev/sdp3
> mdadm: set /dev/sdp3 faulty in /dev/md1
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md1 --remove   /dev/sdp3
> mdadm: hot remove failed for /dev/sdp3: Device or resource busy
> mdadm: hot remove failed for /dev/sdp3: Device or resource busy
> mdadm: hot remove failed for /dev/sdp3: Device or resource busy
> ERROR: Command failed after 3 retries (mdadm --verbose  --manage /dev/md1 --remove   /dev/sdp3)
> mdadm --verbose  --manage /dev/md2 --fail   /dev/sdp4
> mdadm: set /dev/sdp4 faulty in /dev/md2
> Succeeded after 0 retries
> mdadm --verbose  --manage /dev/md2 --remove   /dev/sdp4
> mdadm: hot removed /dev/sdp4



2) the logs from that run:

> Dec 16 19:44:34 mxmx user.info kernel: md: bind<sdg1>
> Dec 16 19:44:34 mxmx user.info kernel: md: bind<sdp1>
> Dec 16 19:44:34 mxmx user.info kernel: raid1: raid set md0 active with 2 out of 2 mirrors
> Dec 16 19:44:34 mxmx user.info kernel: md0: detected capacity change from 0 to 4301717504
> Dec 16 19:44:34 mxmx user.info kernel: md: bind<sdg3>
> Dec 16 19:44:34 mxmx user.info kernel: md: bind<sdp3>
> Dec 16 19:44:34 mxmx user.info kernel: raid1: raid set md1 active with 2 out of 2 mirrors
> Dec 16 19:44:34 mxmx user.info kernel: md1: detected capacity change from 0 to 4301717504
> Dec 16 19:44:34 mxmx user.info kernel: md: bind<sdg4>
> Dec 16 19:44:34 mxmx user.info kernel: md: bind<sdp4>
> Dec 16 19:44:34 mxmx user.info kernel: raid1: raid set md2 active with 2 out of 2 mirrors
> Dec 16 19:44:34 mxmx user.info kernel: md2: detected capacity change from 0 to 4301717504
> Dec 16 19:44:35 mxmx user.info kernel:  md0:
> Dec 16 19:44:35 mxmx user.info kernel:  unknown partition table
> Dec 16 19:44:36 mxmx user.alert kernel: raid1: Disk failure on sdp1, disabling device.
> Dec 16 19:44:36 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Dec 16 19:44:36 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:36 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:36 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Dec 16 19:44:36 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp1
> Dec 16 19:44:36 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:36 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:36 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Dec 16 19:44:38 mxmx user.info kernel: md: unbind<sdp1>
> Dec 16 19:44:38 mxmx user.info kernel: md: export_rdev(sdp1)
> Dec 16 19:44:38 mxmx user.info kernel:  md1:
> Dec 16 19:44:38 mxmx user.info kernel:  unknown partition table
> Dec 16 19:44:38 mxmx user.alert kernel: raid1: Disk failure on sdp3, disabling device.
> Dec 16 19:44:38 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Dec 16 19:44:38 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:38 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:38 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Dec 16 19:44:38 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp3
> Dec 16 19:44:38 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:38 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:38 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Dec 16 19:44:40 mxmx user.info kernel: md: unbind<sdp3>
> Dec 16 19:44:40 mxmx user.info kernel: md: export_rdev(sdp3)
> Dec 16 19:44:40 mxmx user.info kernel:  md2:
> Dec 16 19:44:40 mxmx user.info kernel:  unknown partition table
> Dec 16 19:44:40 mxmx user.alert kernel: raid1: Disk failure on sdp4, disabling device.
> Dec 16 19:44:40 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Dec 16 19:44:40 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:40 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:40 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Dec 16 19:44:40 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp4
> Dec 16 19:44:40 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:40 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:40 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Dec 16 19:44:42 mxmx user.info kernel: md: unbind<sdp4>
> Dec 16 19:44:42 mxmx user.info kernel: md: export_rdev(sdp4)
> Dec 16 19:44:42 mxmx user.info kernel: md: bind<sdp1>
> Dec 16 19:44:42 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:42 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:42 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Dec 16 19:44:42 mxmx user.warn kernel:  disk 1, wo:1, o:1, dev:sdp1
> Dec 16 19:44:42 mxmx user.info kernel: md: recovery of RAID array md0
> Dec 16 19:44:42 mxmx user.info kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Dec 16 19:44:42 mxmx user.info kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> Dec 16 19:44:42 mxmx user.info kernel: md: using 128k window, over a total of 4200896 blocks.
> Dec 16 19:44:42 mxmx user.info kernel: md: bind<sdp3>
> Dec 16 19:44:42 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:42 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:42 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Dec 16 19:44:42 mxmx user.warn kernel:  disk 1, wo:1, o:1, dev:sdp3
> Dec 16 19:44:42 mxmx user.info kernel: md: delaying recovery of md1 until md0 has finished (they share one or more physical units)
> Dec 16 19:44:42 mxmx user.info kernel: md: bind<sdp4>
> Dec 16 19:44:42 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:42 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:42 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Dec 16 19:44:42 mxmx user.warn kernel:  disk 1, wo:1, o:1, dev:sdp4
> Dec 16 19:44:42 mxmx user.info kernel: md: delaying recovery of md2 until md0 has finished (they share one or more physical units)
> Dec 16 19:44:42 mxmx user.info kernel: md: delaying recovery of md1 until md2 has finished (they share one or more physical units)
> Dec 16 19:44:43 mxmx user.alert kernel: raid1: Disk failure on sdp1, disabling device.
> Dec 16 19:44:43 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Dec 16 19:44:43 mxmx user.info kernel: md: md0: recovery done.
> Dec 16 19:44:44 mxmx user.info kernel: md: delaying recovery of md1 until md2 has finished (they share one or more physical units)
> Dec 16 19:44:44 mxmx user.info kernel: md: recovery of RAID array md2
> Dec 16 19:44:44 mxmx user.info kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Dec 16 19:44:44 mxmx user.info kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> Dec 16 19:44:44 mxmx user.info kernel: md: using 128k window, over a total of 4200896 blocks.
> Dec 16 19:44:44 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:44 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:44 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Dec 16 19:44:44 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp1
> Dec 16 19:44:44 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:44 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:44 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Dec 16 19:44:45 mxmx user.info kernel: md: unbind<sdp1>
> Dec 16 19:44:45 mxmx user.info kernel: md: export_rdev(sdp1)
> Dec 16 19:44:45 mxmx user.alert kernel: raid1: Disk failure on sdp3, disabling device.
> Dec 16 19:44:45 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Dec 16 19:44:48 mxmx user.warn kernel: md: cannot remove active disk sdp3 from md1 ...
> Dec 16 19:44:49 mxmx user.warn kernel: md: cannot remove active disk sdp3 from md1 ...
> Dec 16 19:44:50 mxmx user.warn kernel: md: cannot remove active disk sdp3 from md1 ...
> Dec 16 19:44:51 mxmx user.alert kernel: raid1: Disk failure on sdp4, disabling device.
> Dec 16 19:44:51 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Dec 16 19:44:51 mxmx user.info kernel: md: md2: recovery done.
> Dec 16 19:44:51 mxmx user.info kernel: md: recovery of RAID array md1
> Dec 16 19:44:51 mxmx user.info kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Dec 16 19:44:51 mxmx user.info kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> Dec 16 19:44:51 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:51 mxmx user.info kernel: md: using 128k window, over a total of 4200896 blocks.
> Dec 16 19:44:51 mxmx user.info kernel: md: resuming recovery of md1 from checkpoint.
> Dec 16 19:44:51 mxmx user.info kernel: md: md1: recovery done.
> Dec 16 19:44:51 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:51 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Dec 16 19:44:51 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:51 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:51 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Dec 16 19:44:51 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp3
> Dec 16 19:44:51 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp4
> Dec 16 19:44:51 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:51 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:51 mxmx user.warn kernel: RAID1 conf printout:
> Dec 16 19:44:51 mxmx user.warn kernel:  --- wd:1 rd:2
> Dec 16 19:44:51 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Dec 16 19:44:51 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Dec 16 19:44:53 mxmx user.info kernel: md: unbind<sdp4>
> Dec 16 19:44:53 mxmx user.info kernel: md: export_rdev(sdp4)



3) the script (note: if/when running it, adjust the partition (and, if 
needed, md) numbers to match your system; actual drive names are passed 
in as arguments -- see the usage text and the example invocation above). 
The "modes" which fail are "bug" and "bug3part".  The mode which 
reliably works with any number of partitions is "fault_all_first".

> #!/bin/sh
>
> do_md()
> {
>     md=$1
>     shift
>     op=$1
>     shift
>     opdisk=$1
>     shift
>     opdisk2=$1
>
>     # customize the 'pp' partitions used for each of the 'md' devices here:
>     case $md in
>         0) pp=1 ;;
>         1) pp=3 ;;
>         2) pp=4 ;;
>         3) pp=5 ;;
>         *)
>             echo "unsupported md dev: $md"
>             return
>             ;;
>     esac
>
>     retry="y"
>     case $op in
>         "stop")
> 	    retry="n"
>             cmd="mdadm --verbose  --stop /dev/md$md"
>             ;;
>         "create")
> 	    retry="n"
>             cmd="mdadm --create /dev/md$md  --force --run --level=1 \
> --raid-devices=2 ${opdisk}$pp  ${opdisk2}$pp --assume-clean "
>             ;;
>         *)
>             if [ "$op" = "add" ]; then
>                 mdadm --zero-superblock $opdisk$pp
>             fi
>             cmd="mdadm --verbose  --manage /dev/md$md --$op   ${opdisk}$pp"
>             ;;
>     esac
>     echo "$cmd"
>     retries=3
>     i=$retries
>     while [ $i -gt 0 ]; do
>         $cmd
> 	cmdstatus=$?
> 	if [ $cmdstatus -eq 0 ]; then
> 	    break
> 	fi
> 	[ "$retry" != "y" ] && break
>         i=$((i-1))
> 	sleep 1
>     done
>     if [ $cmdstatus -eq 0 ]; then
>         echo "Succeeded after $((retries-i)) retries"
>     else
>         echo "ERROR: Command failed after $retries retries ($cmd)"
>     fi
> }
>
> do_md_all()
> {
>     op=$1
>     shift
>     opdisk=$1
>     shift
>     opdisk2=$1
>
>     for md in $mdlist ; do
> 	do_md $md $op $opdisk $opdisk2
>     done
> }
>
> do_mode_bug()
> {
>     for md in $mdlist ; do
> 	do_md $md fail $fdisk
> 	sleep 2
> 	do_md $md remove $fdisk
>     done
>
>     for md in $mdlist ; do
> 	do_md $md add $rdisk
>     done
>
>     sleep 1
>     cat /proc/mdstat
>
>     for md in $mdlist ; do
> 	do_md $md fail $rdisk
> 	sleep 2
> 	do_md $md remove $rdisk
>     done
> }
>
> do_mode_fault_all_first()
> {
>     do_md_all "fail" $fdisk
>     do_md_all "remove" $fdisk
>
>     # regen to spare:
>
>     do_md_all "add" $rdisk
>
>     # failover spare:
>
>     sleep 1
>     cat /proc/mdstat
>
>     do_md_all "fail" $rdisk
> #    sleep 1
>     cat /proc/mdstat
> #    sleep 1
>     do_md_all "remove" $rdisk
> }
>
> usage()
> {
>     cat <<EOF
> Usage: $0 [OPTIONS]
> -m <arg>  : run in mode <arg>; supported are:
>             "bug", "fault_all_first", "bug1part",
>             "bug2part", "bug3part"
> -stop     : first stop any running arrays previously setup by
>             this program on the given disks
> -create   : create needed running md's on the given disks (otherwise
>             we assume they're already running)
> -dg1 <arg> : use <arg> as the 1st "good" disk
> -dg2 <arg> : use <arg> as the 2nd "good" disk
> -df <arg>  : use <arg> as the "failed" disk
> -dr <arg>  : use <arg> as the "replacement" disk
> EOF
> }
>
> mdlist="0 1 2 3"
>
> while [ $# -gt 0 ]; do
>     case "$1" in
> 	-m)
> 	    case "$2" in
> 		bug1part)
> 		    mdlist="0"
> 		    mode=$2 ;;
> 		bug2part)
> 		    mdlist="0 1"
> 		    mode=$2 ;;
> 		bug3part)
> 		    mdlist="0 1 2"
> 		    mode=$2 ;;
> 		bug|\
> 		fault_all_first)
> 		    mode=$2 ;;
> 		*)
> 		    usage
> 		    exit 1 ;;
> 	    esac
>             shift
>             ;;
> 	-dg1)
>             gdisk1="$2"
>             shift
>             ;;
> 	-dg2)
>             gdisk2="$2"
>             shift
>             ;;
> 	-df)
>             fdisk="$2"
>             shift
>             ;;
> 	-dr)
>             rdisk="$2"
>             shift
>             ;;
> 	-h|--help|-help)
>             usage
>             exit 0
>             ;;
> 	-stop)
>             v_stop=1
>             ;;
> 	-create)
>             v_create=1
>             ;;
> 	*)
>             usage
> 	    echo "Unknown argument: $1"
>             exit 1
>             ;;
>     esac
>     shift;
> done
>
>
> [ -n "$gdisk1" -a -n "$gdisk2" -a -n "$fdisk" -a -n "$rdisk" ] || {
>     usage
>     echo "Need all 4 disks supplied (good[12]/failed/replacement)"
>     exit 1
> }
>
> if [ -n "$v_stop" ]; then
>     for md in $mdlist ; do
> 	do_md $md "stop"
>     done
> fi
>
> sleep 1
>
> if [ -n "$v_create" ]; then
>     for md in $mdlist ; do
> 	do_md $md "create" $gdisk1 $gdisk2
>     done
> fi
>
> sleep 1
>
> if [ -z "$mode" ]; then
>     echo "No mode provided; exiting"
>     exit 0
> fi
>
> case $mode in
>     bug*) do_mode_bug ;;
>     *) do_mode_$mode ;;
> esac



Thanks,
Brett


* Re: bug/race in md causing device to wedge in busy state
From: Brett Russ @ 2009-12-22 21:48 UTC (permalink / raw)
  To: linux-raid

On 12/16/2009 08:04 PM, Brett Russ wrote:
> I'm seeing cases where an attempted remove of a manually faulted disk
> from an existing RAID unit can fail with mdadm reporting "Device or
> resource busy". I've reduced the problem down to the smallest set that
> reliably reproduces the issue:
>
> Starting with 2 drives (a,b), each with at least 3 partitions:
> 1) create 3 raid1 md's on the drives using the 3 partitions
> 2) fault & remove drive b from each of the 3 md's
> 3) zero the superblock on b so it forgets where it came from (or use a
> third drive c...) and add drive b back to each of the 3 md's
> 4) fault & remove drive b from each of the 3 md's
>
> The problem was originally seen sporadically during the remove part of
> step 2, but is *very* reproducible in the remove part of step 4. I
> attribute this to the fact that there's guaranteed I/O happening during
> this step.
>
> Now here's the catch. If I change step 4 to:
> 4a) fault drive b from each of the 3 md's
> 4b) remove drive b from each of the 3 md's
> then the removes have not yet been seen to fail with BUSY (i.e. no
> issues).
>
> But my scripts currently do this instead for each md:
> 4a) fault drive b from md
> 4b) sleep 0-10 seconds
> 4c) remove drive b from md
> which will fail on the remove from one of the md's, almost guaranteed.
> It seems odd to me that no amount of sleeping in between these steps can
> allow me to reliably remove a faulted member of an array.

Neil et al,

Would you expect to see a dependency across md devices on the same 
spindle which would affect a device remove like this?

I have to assume it's a bug since the condition doesn't clear up even 
after removing the rest of the devices on the spindle, i.e. the 
partition permanently reports busy.
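
For what it's worth, this is roughly what I poke at while a remove is 
wedged (device names as in the logs above; the sysfs files are the 
standard per-array md attributes):

    cat /proc/mdstat
    mdadm --detail /dev/md1
    cat /sys/block/md1/md/dev-sdp3/state    # per-member state as md sees it
    cat /sys/block/md1/md/sync_action       # what md thinks it is doing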

-Brett



* Re: bug/race in md causing device to wedge in busy state
From: Neil Brown @ 2009-12-23 23:12 UTC (permalink / raw)
  To: Brett Russ; +Cc: linux-raid

On Wed, 16 Dec 2009 20:04:46 -0500
Brett Russ <bruss@netezza.com> wrote:

> I'm seeing cases where an attempted remove of a manually faulted disk 
> from an existing RAID unit can fail with mdadm reporting "Device or 
> resource busy".  I've reduced the problem down to the smallest set that 
> reliably reproduces the issue:

Thanks for the very detailed report.

Can you please see if the following patch fixes the problem.

When an array wants to resync but is waiting for other arrays
on the same devices to finish their resync, it does not abort the
resync attempt properly when an error is reported.
This should fix that.

Thanks,
NeilBrown



diff --git a/drivers/md/md.c b/drivers/md/md.c
index d2aff72..42fa446 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6504,6 +6504,8 @@ void md_do_sync(mddev_t *mddev)
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 			goto skip;
 		}
+		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			goto skip;
 		for_each_mddev(mddev2, tmp) {
 			if (mddev2 == mddev)
 				continue;



* Re: bug/race in md causing device to wedge in busy state
From: Brett Russ @ 2010-01-08 15:18 UTC (permalink / raw)
  To: linux-raid

On 12/23/2009 06:12 PM, Neil Brown wrote:
> Can you please see if the following patch fixes the problem.
>
> When an array wants to resync but is waiting for other arrays
> on the same devices to finish their resync, it does not abort the
> resync attempt properly when an error is reported.
> This should fix that.

Neil,

Sorry for the delay--holidays, etc.  I tried the patch and, while the 
newly added code is getting hit, I'm still hitting the issue.  See below:

> mxmx# ./sim_md_busy.sh -dg1 /dev/sdg -dg2 /dev/sdp -df /dev/sdp -dr /dev/sdp -create -m bug3part 2>&1 | tee md_busy_run_bug3part.6
> mdadm --create /dev/md0  --force --run --level=1 --raid-devices=2 /dev/sdg1  /dev/sdp1 --assume-clean
> mdadm: /dev/sdg1 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Fri Jan  8 10:11:09 2010
> mdadm: /dev/sdp1 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Fri Jan  8 10:11:09 2010
> mdadm: array /dev/md0 started.
> mdadm --create /dev/md1  --force --run --level=1 --raid-devices=2 /dev/sdg3  /dev/sdp3 --assume-clean
> mdadm: /dev/sdg3 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Fri Jan  8 10:11:09 2010
> mdadm: /dev/sdp3 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Fri Jan  8 10:11:09 2010
> mdadm: array /dev/md1 started.
> mdadm --create /dev/md2  --force --run --level=1 --raid-devices=2 /dev/sdg4  /dev/sdp4 --assume-clean
> mdadm: /dev/sdg4 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Fri Jan  8 10:11:09 2010
> mdadm: /dev/sdp4 appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Fri Jan  8 10:11:09 2010
> mdadm: array /dev/md2 started.
> mdadm --verbose  --manage /dev/md0 --fail   /dev/sdp1
> mdadm: set /dev/sdp1 faulty in /dev/md0
> mdadm --verbose  --manage /dev/md0 --remove   /dev/sdp1
> mdadm: hot removed /dev/sdp1
> mdadm --verbose  --manage /dev/md1 --fail   /dev/sdp3
> mdadm: set /dev/sdp3 faulty in /dev/md1
> mdadm --verbose  --manage /dev/md1 --remove   /dev/sdp3
> mdadm: hot removed /dev/sdp3
> mdadm --verbose  --manage /dev/md2 --fail   /dev/sdp4
> mdadm: set /dev/sdp4 faulty in /dev/md2
> mdadm --verbose  --manage /dev/md2 --remove   /dev/sdp4
> mdadm: hot removed /dev/sdp4
> mdadm --verbose  --manage /dev/md0 --add   /dev/sdp1
> mdadm: added /dev/sdp1
> mdadm --verbose  --manage /dev/md1 --add   /dev/sdp3
> mdadm: added /dev/sdp3
> mdadm --verbose  --manage /dev/md2 --add   /dev/sdp4
> mdadm: added /dev/sdp4
> Personalities : [raid1] [raid10]
> md2 : active raid1 sdp4[2] sdg4[0]
>       4200896 blocks [2/1] [U_]
>         resync=DELAYED
>
> md1 : active raid1 sdp3[2] sdg3[0]
>       4200896 blocks [2/1] [U_]
>         resync=DELAYED
>
> md0 : active raid1 sdp1[2] sdg1[0]
>       4200896 blocks [2/1] [U_]
>       [>....................]  recovery =  1.9% (80256/4200896) finish=0.8min speed=80256K/sec
>
> unused devices: <none>
> mdadm --verbose  --manage /dev/md0 --fail   /dev/sdp1
> mdadm: set /dev/sdp1 faulty in /dev/md0
> mdadm --verbose  --manage /dev/md0 --remove   /dev/sdp1
> mdadm: hot removed /dev/sdp1
> mdadm --verbose  --manage /dev/md1 --fail   /dev/sdp3
> mdadm: set /dev/sdp3 faulty in /dev/md1
> mdadm --verbose  --manage /dev/md1 --remove   /dev/sdp3
> mdadm: hot remove failed for /dev/sdp3: Device or resource busy
> mdadm: hot remove failed for /dev/sdp3: Device or resource busy
> mdadm: hot remove failed for /dev/sdp3: Device or resource busy
> ERROR: Command failed after 3 retries (mdadm --verbose  --manage /dev/md1 --remove   /dev/sdp3)
> mdadm --verbose  --manage /dev/md2 --fail   /dev/sdp4
> mdadm: set /dev/sdp4 faulty in /dev/md2
> mdadm --verbose  --manage /dev/md2 --remove   /dev/sdp4
> mdadm: hot removed /dev/sdp4


And the log from this run:

> Jan  8 10:11:41 mxmx user.notice root: chkpt
> Jan  8 10:11:46 mxmx user.info kernel: md: bind<sdg1>
> Jan  8 10:11:46 mxmx user.info kernel: md: bind<sdp1>
> Jan  8 10:11:46 mxmx user.info kernel: raid1: raid set md0 active with 2 out of 2 mirrors
> Jan  8 10:11:46 mxmx user.info kernel: md0: detected capacity change from 0 to 4301717504
> Jan  8 10:11:46 mxmx user.info kernel: md: bind<sdg3>
> Jan  8 10:11:46 mxmx user.info kernel: md: bind<sdp3>
> Jan  8 10:11:46 mxmx user.info kernel: raid1: raid set md1 active with 2 out of 2 mirrors
> Jan  8 10:11:46 mxmx user.info kernel: md1: detected capacity change from 0 to 4301717504
> Jan  8 10:11:47 mxmx user.info kernel: md: bind<sdg4>
> Jan  8 10:11:47 mxmx user.info kernel: md: bind<sdp4>
> Jan  8 10:11:47 mxmx user.info kernel: raid1: raid set md2 active with 2 out of 2 mirrors
> Jan  8 10:11:47 mxmx user.info kernel: md2: detected capacity change from 0 to 4301717504
> Jan  8 10:11:48 mxmx user.info kernel:  md0:
> Jan  8 10:11:48 mxmx user.info kernel:  unknown partition table
> Jan  8 10:11:48 mxmx user.alert kernel: raid1: Disk failure on sdp1, disabling device.
> Jan  8 10:11:48 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Jan  8 10:11:48 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:48 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:48 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Jan  8 10:11:48 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp1
> Jan  8 10:11:48 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:48 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:48 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Jan  8 10:11:50 mxmx user.info kernel: md: unbind<sdp1>
> Jan  8 10:11:50 mxmx user.info kernel: md: export_rdev(sdp1)
> Jan  8 10:11:50 mxmx user.info kernel:  md1:
> Jan  8 10:11:50 mxmx user.info kernel:  unknown partition table
> Jan  8 10:11:50 mxmx user.alert kernel: raid1: Disk failure on sdp3, disabling device.
> Jan  8 10:11:50 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Jan  8 10:11:50 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:50 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:50 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Jan  8 10:11:50 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp3
> Jan  8 10:11:50 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:50 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:50 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Jan  8 10:11:52 mxmx user.info kernel: md: unbind<sdp3>
> Jan  8 10:11:52 mxmx user.info kernel: md: export_rdev(sdp3)
> Jan  8 10:11:52 mxmx user.info kernel:  md2:
> Jan  8 10:11:52 mxmx user.info kernel:  unknown partition table
> Jan  8 10:11:52 mxmx user.alert kernel: raid1: Disk failure on sdp4, disabling device.
> Jan  8 10:11:52 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Jan  8 10:11:52 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:52 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:52 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Jan  8 10:11:52 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp4
> Jan  8 10:11:52 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:52 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:52 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Jan  8 10:11:54 mxmx user.info kernel: md: unbind<sdp4>
> Jan  8 10:11:54 mxmx user.info kernel: md: export_rdev(sdp4)
> Jan  8 10:11:54 mxmx user.info kernel: md: bind<sdp1>
> Jan  8 10:11:54 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:54 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:54 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Jan  8 10:11:54 mxmx user.warn kernel:  disk 1, wo:1, o:1, dev:sdp1
> Jan  8 10:11:54 mxmx user.info kernel: md: recovery of RAID array md0
> Jan  8 10:11:54 mxmx user.info kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Jan  8 10:11:54 mxmx user.info kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> Jan  8 10:11:54 mxmx user.info kernel: md: using 128k window, over a total of 4200896 blocks.
> Jan  8 10:11:54 mxmx user.info kernel: md: bind<sdp3>
> Jan  8 10:11:54 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:54 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:54 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Jan  8 10:11:54 mxmx user.warn kernel:  disk 1, wo:1, o:1, dev:sdp3
> Jan  8 10:11:54 mxmx user.info kernel: md: delaying recovery of md1 until md0 has finished (they share one or more physical units)
> Jan  8 10:11:55 mxmx user.info kernel: md: bind<sdp4>
> Jan  8 10:11:55 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:55 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:55 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Jan  8 10:11:55 mxmx user.warn kernel:  disk 1, wo:1, o:1, dev:sdp4
> Jan  8 10:11:55 mxmx user.info kernel: md: delaying recovery of md2 until md1 has finished (they share one or more physical units)
> Jan  8 10:11:56 mxmx user.alert kernel: raid1: Disk failure on sdp1, disabling device.
> Jan  8 10:11:56 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Jan  8 10:11:56 mxmx user.info kernel: md: md0: recovery done.
> Jan  8 10:11:56 mxmx user.info kernel: md: delaying recovery of md2 until md1 has finished (they share one or more physical units)
> Jan  8 10:11:56 mxmx user.info kernel: md: delaying recovery of md1 until md2 has finished (they share one or more physical units)
> Jan  8 10:11:56 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:56 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:56 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Jan  8 10:11:56 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp1
> Jan  8 10:11:56 mxmx user.info kernel: md: recovery of RAID array md2
> Jan  8 10:11:56 mxmx user.info kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Jan  8 10:11:56 mxmx user.info kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> Jan  8 10:11:56 mxmx user.info kernel: md: using 128k window, over a total of 4200896 blocks.
> Jan  8 10:11:56 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:11:56 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:11:56 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg1
> Jan  8 10:11:58 mxmx user.info kernel: md: unbind<sdp1>
> Jan  8 10:11:58 mxmx user.info kernel: md: export_rdev(sdp1)
> Jan  8 10:11:58 mxmx user.alert kernel: raid1: Disk failure on sdp3, disabling device.
> Jan  8 10:11:58 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Jan  8 10:12:00 mxmx user.warn kernel: md: cannot remove active disk sdp3 from md1 ...
> Jan  8 10:12:01 mxmx user.warn kernel: md: cannot remove active disk sdp3 from md1 ...
> Jan  8 10:12:02 mxmx user.warn kernel: md: cannot remove active disk sdp3 from md1 ...
> Jan  8 10:12:03 mxmx user.alert kernel: raid1: Disk failure on sdp4, disabling device.
> Jan  8 10:12:03 mxmx user.alert kernel: raid1: Operation continuing on 1 devices.
> Jan  8 10:12:03 mxmx user.info kernel: md: md2: recovery done.
> Jan  8 10:12:03 mxmx user.err kernel: md: running bug fix for md1
> Jan  8 10:12:03 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:12:03 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:12:03 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:12:03 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Jan  8 10:12:03 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp4
> Jan  8 10:12:03 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:12:03 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Jan  8 10:12:03 mxmx user.warn kernel:  disk 1, wo:1, o:0, dev:sdp3
> Jan  8 10:12:03 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:12:03 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:12:03 mxmx user.warn kernel: RAID1 conf printout:
> Jan  8 10:12:03 mxmx user.warn kernel:  --- wd:1 rd:2
> Jan  8 10:12:03 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg4
> Jan  8 10:12:03 mxmx user.warn kernel:  disk 0, wo:0, o:1, dev:sdg3
> Jan  8 10:12:05 mxmx user.info kernel: md: unbind<sdp4>
> Jan  8 10:12:05 mxmx user.info kernel: md: export_rdev(sdp4)


Note the line "md: running bug fix for md1" at 10:12:03; that's a printk 
I added to your patch inside the 'if', just before the 'goto'.  Looks 
like it hit that code *after* the busy...

Thanks,
Brett



* Re: bug/race in md causing device to wedge in busy state
From: Brett Russ @ 2010-01-29 14:50 UTC (permalink / raw)
  To: linux-raid

Hi Neil,
Looks like you're back from traveling?  Just wanted to make sure we 
don't forget about the issue below.
Thanks,
Brett

On 01/08/2010 10:18 AM, Brett Russ wrote:
> On 12/23/2009 06:12 PM, Neil Brown wrote:
>> Can you please see if the following patch fixes the problem.
>>
>> When an array wants to resync but is waiting for other arrays
>> on the same devices to finish their resync, it does not abort the
>> resync attempt properly when an error is reported.
>> This should fix that.
>
> Neil,
>
> Sorry for the delay--holidays, etc. I tried the patch and, while the
> newly added code is getting hit, I'm still hitting the issue. See below:
>
>> [console output and syslog quoted verbatim from the previous message trimmed]
>
> Note the line "md: running bug fix for md1" at 10:12:03; that's a printk
> I added to your patch inside the 'if', just before the 'goto'. Looks
> like it hit that code *after* the busy...



