From: Alexander Lyakas <alex.bolshoy@gmail.com>
To: linux-raid@vger.kernel.org
Subject: RAID5: failing an active component during spare rebuild - arrays hangs
Date: Sun, 5 Jun 2011 22:41:55 +0300
Message-ID: <BANLkTinwr9UE_B+MSXfbE2nAv0wLrTvhXg@mail.gmail.com>
In-Reply-To: <BANLkTikkeoCsr3-UBSPEDrYwh4jGSn=MaA@mail.gmail.com>
Hello everybody,

I am testing a scenario in which I create a RAID5 array with three devices:
/dev/sd{a,b,c}. Since I don't supply --force to mdadm during creation,
it treats the array as degraded and starts rebuilding sdc as a
spare. This is as documented.

Then I do --fail on /dev/sda. I understand that at this point my data
is gone, but I think I should still be able to tear down the array.

Sometimes I see that /dev/sda is kicked from the array as faulty, and
/dev/sdc is also removed and marked as a spare. Then I am able to tear
down the array.
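
For reference, the scenario above can be sketched as a short command sequence. This is only a guess at the invocations: the exact create command is not given in this report, so the --level and --raid-devices values and the function name repro_commands are assumptions, and /dev/md1123 is taken from the transcripts further down. The sketch only prints the commands, since running them destroys data on the listed disks.

```shell
#!/bin/sh
# Prints the assumed reproduction commands instead of running them,
# because they destroy data on the listed devices.
MD=${MD:-/dev/md1123}                          # array node used later in this report
DISKS=${DISKS:-"/dev/sda /dev/sdb /dev/sdc"}   # the three member disks

repro_commands() {
    set -- $DISKS
    # No --force: mdadm starts the array degraded and rebuilds the last disk as a spare.
    echo "mdadm --create $MD --level=5 --raid-devices=$# $*"
    # Fail an active member while the rebuild is still running.
    echo "mdadm $MD --fail $1"
    # Teardown that is expected to work even though the data is lost.
    echo "mdadm --stop $MD"
}

repro_commands
```

To actually run the sequence, the printed lines could be piped to a root shell on a disposable test machine.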
But sometimes it looks like the system hits some kind of deadlock.
mdadm --detail produces:
    Update Time : Sun Jun  5 21:54:34 2011
          State : active, FAILED
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ubuntu:zvp_1123
           UUID : 48a15fb6:b6410bb9:a2ca173e:0092032c
         Events : 67

    Number   Major   Minor   RaidDevice State
       0       8        0        0      faulty spare rebuilding   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       3       8       32        2      spare rebuilding   /dev/sdc
So the faulty device and the spare are not kicked out of the array. At
this point I am unable to do anything with the array:
root@ubuntu:~# sudo mdadm --stop /dev/md1123
mdadm: failed to stop array /dev/md1123: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sda
mdadm: hot remove failed for /dev/sda: Device or resource busy
root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdb
mdadm: hot remove failed for /dev/sdb: Device or resource busy
root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdc
mdadm: hot remove failed for /dev/sdc: Device or resource busy
This is happening on ubuntu-natty, with mdadm - v3.1.4 - 31st August 2010.
Looking at some code in mdadm/Detail.c, it looks like /dev/sda has
been marked only as MD_DISK_FAULTY, but has not yet been kicked out of
the array. The "spare" and "rebuilding" prints also result from that.
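
The condition described here can be spotted with a small filter over the `mdadm --detail` output. This is a minimal sketch, not something taken from mdadm itself: the function name faulty_in_slot is hypothetical, and it assumes that a numeric RaidDevice column means the member still occupies a slot in the array.

```shell
#!/bin/sh
# Hypothetical helper: list members flagged faulty that still occupy a RAID slot.
# Reads `mdadm --detail` output on stdin.
faulty_in_slot() {
    awk '
        # Device rows start with four fields: Number Major Minor RaidDevice.
        # Assumption: a numeric RaidDevice means the member still holds a slot.
        $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && NF >= 6 {
            state = $5
            for (i = 6; i < NF; i++) state = state " " $i
            if (state ~ /faulty/) print $NF ": " state
        }
    '
}
```

It would be used as `mdadm --detail /dev/md1123 | faulty_in_slot`; any line it prints names a device in the state shown for /dev/sda above.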
The same thing also happens (sometimes) when I manually initiate a resync
(by writing 'repair' to 'sync_action') and later manually fail one
of the devices. Then I also saw messages like this in the syslog:
Jun 5 21:42:00 ubuntu kernel: [ 2280.350454] INFO: task md1123_resync:7993 blocked for more than 120 seconds.
Jun 5 21:42:00 ubuntu kernel: [ 2280.350552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 5 21:42:00 ubuntu kernel: [ 2280.350644] md1123_resync D 0000000000000000 0 7993 2 0x00000004
Jun 5 21:42:00 ubuntu kernel: [ 2280.350647] ffff8800b56b1cd0 0000000000000046 ffff8800b56b1fd8 ffff8800b56b0000
Jun 5 21:42:00 ubuntu kernel: [ 2280.350649] 0000000000013d00 ffff880036c09a98 ffff8800b56b1fd8 0000000000013d00
Jun 5 21:42:00 ubuntu kernel: [ 2280.350652] ffff8800b7f1adc0 ffff880036c096e0 ffff8800b56b1cb0 ffff880036c56610
Jun 5 21:42:00 ubuntu kernel: [ 2280.350654] Call Trace:
Jun 5 21:42:00 ubuntu kernel: [ 2280.350657] [<ffffffff81492885>] md_do_sync+0xb45/0xc90
Jun 5 21:42:00 ubuntu kernel: [ 2280.350660] [<ffffffff81087940>] ? autoremove_wake_function+0x0/0x40
Jun 5 21:42:00 ubuntu kernel: [ 2280.350663] [<ffffffff8107861b>] ? recalc_sigpending+0x1b/0x50
Jun 5 21:42:00 ubuntu kernel: [ 2280.350665] [<ffffffff8148c516>] md_thread+0x116/0x150
Jun 5 21:42:00 ubuntu kernel: [ 2280.350667] [<ffffffff8148c400>] ? md_thread+0x0/0x150
Jun 5 21:42:00 ubuntu kernel: [ 2280.350669] [<ffffffff810871f6>] kthread+0x96/0xa0
Jun 5 21:42:00 ubuntu kernel: [ 2280.350672] [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10
Jun 5 21:42:00 ubuntu kernel: [ 2280.350674] [<ffffffff81087160>] ? kthread+0x0/0xa0
Jun 5 21:42:00 ubuntu kernel: [ 2280.350676] [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
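
Reports like the one above can be pulled out of a log with a one-line filter. This is a minimal sketch: the function name hung_tasks is hypothetical, and it assumes each kernel message sits on a single line, as it does in the syslog file itself.

```shell
#!/bin/sh
# Hypothetical helper: print "task pid seconds" for each hung-task report.
# Reads kernel log text on stdin; assumes one kernel message per line.
hung_tasks() {
    sed -n 's/.*INFO: task \([^:]*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}
```

For example, `hung_tasks < /var/log/syslog` would list every task the hung-task detector has reported, one per line, so a stuck resync thread is easy to spot.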
This is pretty easy for me to reproduce.
Basically, I would like to know what the user is expected to do when
more than one RAID5 array component fails during rebuild/resync.
Thanks,
Alex.