From: Alexander Lyakas <alex.bolshoy@gmail.com>
To: Nagilum <nagilum@nagilum.org>, linux-raid@vger.kernel.org
Subject: Re: RAID5: failing an active component during spare rebuild - arrays hangs
Date: Mon, 6 Jun 2011 21:19:40 +0300
Message-ID: <BANLkTinr_1GcbypCpxFKPXoid4DxTKvCag@mail.gmail.com>
In-Reply-To: <20110605230014.14822hd7b50rcqww@cakebox.homeunix.net>
Hello,

The kernel version is:
root@ubuntu:~# uname -a
Linux ubuntu 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
The mdadm version is:
root@ubuntu:~# mdadm -V
mdadm - v3.1.4 - 31st August 2010
Examining the three array components:
root@ubuntu:~# mdadm -E /dev/sd{a,b,c}
/dev/sda:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
Name : vc:zvp_1123
Creation Time : Mon Jun 6 21:10:38 2011
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
Array Size : 83879936 (40.00 GiB 42.95 GB)
Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 8db90071:be80216e:09468262:1f5046b1
Internal Bitmap : 8 sectors from superblock
Update Time : Mon Jun 6 21:10:46 2011
Checksum : 2e424556 - correct
Events : 10
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : A.A ('A' == active, '.' == missing)
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
Name : vc:zvp_1123
Creation Time : Mon Jun 6 21:10:38 2011
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
Array Size : 83879936 (40.00 GiB 42.95 GB)
Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 9f41313b:b1aa70f8:6cf0ca2f:c6ea0a64
Internal Bitmap : 8 sectors from superblock
Update Time : Mon Jun 6 21:10:44 2011
Checksum : 2d23c61 - correct
Events : 8
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAA ('A' == active, '.' == missing)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x3
Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
Name : vc:zvp_1123
Creation Time : Mon Jun 6 21:10:38 2011
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
Array Size : 83879936 (40.00 GiB 42.95 GB)
Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 999424 sectors
State : active
Device UUID : 61189a9d:ec082cea:a3ba32fb:800fe84b
Internal Bitmap : 8 sectors from superblock
Update Time : Mon Jun 6 21:10:46 2011
Checksum : a47a059 - correct
Events : 10
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : A.A ('A' == active, '.' == missing)
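
One way to read the three superblocks above is to compare their Events counters: /dev/sdb stopped at 8 while /dev/sda and /dev/sdc reached 10, and sdb's own Array State still shows AAA. Below is a minimal Python sketch that pulls those two fields out of `mdadm -E` text. It assumes the `Key : value` layout shown above (formatting may differ in other mdadm versions), and `parse_examine` / `stale_devices` are hypothetical helper names, not part of mdadm.

```python
import re

# Trimmed copy of the `mdadm -E /dev/sd{a,b,c}` output above (only the fields used).
SAMPLE_EXAMINE = """\
/dev/sda:
 Events : 10
 Array State : A.A ('A' == active, '.' == missing)
/dev/sdb:
 Events : 8
 Array State : AAA ('A' == active, '.' == missing)
/dev/sdc:
 Events : 10
 Array State : A.A ('A' == active, '.' == missing)
"""


def parse_examine(text):
    """Map each device in `mdadm -E` text to its Events count and Array State."""
    info = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"(/dev/\S+):", line)
        if header:
            current = header.group(1)
            info[current] = {}
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only: values such as UUIDs contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Events":
            info[current]["events"] = int(value)
        elif key == "Array State":
            # Keep the state string (e.g. "A.A") and drop the trailing legend.
            info[current]["array_state"] = value.split()[0]
    return info


def stale_devices(info):
    """Devices whose superblock Events count lags behind the newest one."""
    newest = max(d["events"] for d in info.values())
    return sorted(dev for dev, d in info.items() if d["events"] < newest)


if __name__ == "__main__":
    info = parse_examine(SAMPLE_EXAMINE)
    for dev, fields in sorted(info.items()):
        print(dev, fields["events"], fields["array_state"])
    print("stale:", stale_devices(info))
```

On this sample the sketch reports /dev/sdb as the only stale superblock, which matches its out-of-date AAA view of the array.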
Details about the array:
root@ubuntu:~# mdadm -Q --detail /dev/md1123
/dev/md1123:
Version : 1.2
Creation Time : Mon Jun 6 21:10:38 2011
Raid Level : raid5
Array Size : 41939968 (40.00 GiB 42.95 GB)
Used Dev Size : 20969984 (20.00 GiB 21.47 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Jun 6 21:10:46 2011
State : active, FAILED
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 512K
Name : vc:zvp_1123
UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
Events : 10
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 8 16 1 faulty spare rebuilding /dev/sdb
3 8 32 2 spare rebuilding /dev/sdc
In short, neither the faulty component nor the rebuilding spare is
kicked out of the array, and the array remains stuck in this state.
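
The device table from `mdadm --detail` can be checked mechanically for members that are still attached while marked faulty or rebuilding, even though the array reports FAILED (here /dev/sdb and /dev/sdc). A minimal Python sketch, assuming the whitespace-separated column layout shown above; `array_state`, `parse_detail_devices` and `stuck_members` are hypothetical names used only for illustration.

```python
import re

# Trimmed copy of the `mdadm -Q --detail /dev/md1123` output above.
SAMPLE_DETAIL = """\
 State : active, FAILED
 Active Devices : 1
 Working Devices : 2
 Number Major Minor RaidDevice State
 0 8 0 0 active sync /dev/sda
 1 8 16 1 faulty spare rebuilding /dev/sdb
 3 8 32 2 spare rebuilding /dev/sdc
"""


def array_state(text):
    """Return the value of the top-level 'State :' line, e.g. 'active, FAILED'."""
    m = re.search(r"^[ \t]*State[ \t]*:[ \t]*(.+)$", text, re.MULTILINE)
    return m.group(1).strip() if m else None


def parse_detail_devices(text):
    """Parse the device table that follows the 'Number Major Minor ...' header."""
    rows, in_table = [], False
    for line in text.splitlines():
        tokens = line.split()
        if tokens[:4] == ["Number", "Major", "Minor", "RaidDevice"]:
            in_table = True
            continue
        if not in_table or len(tokens) < 5:
            continue
        # The last column is normally the member's device node; a slot with
        # no device attached has no path, so don't assume one is there.
        path = tokens[-1] if tokens[-1].startswith("/dev/") else None
        state = tokens[4:-1] if path else tokens[4:]
        rows.append({"raid_device": tokens[3], "state": set(state), "path": path})
    return rows


def stuck_members(rows):
    """Members still listed in the array while flagged faulty or mid-rebuild."""
    return [r["path"] for r in rows if r["state"] & {"faulty", "rebuilding"}]


if __name__ == "__main__":
    print("array state:", array_state(SAMPLE_DETAIL))
    print("stuck members:", stuck_members(parse_detail_devices(SAMPLE_DETAIL)))
```

A check like this only describes the symptom; it does not explain why md never completes the removal.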
Thanks,
Alex.
2011/6/6 Nagilum <nagilum@nagilum.org>:
> Make sure you provide all relevant details, such as the kernel version, the mdadm
> version and maybe also mdadm -E /dev/sd{a,b,c}, mdadm -Q --detail /dev/md0,
> ...
>
> ----- Message from alex.bolshoy@gmail.com ---------
> Date: Sun, 5 Jun 2011 22:41:55 +0300
> From: Alexander Lyakas <alex.bolshoy@gmail.com>
> Subject: RAID5: failing an active component during spare rebuild - arrays
> hangs
> To: linux-raid@vger.kernel.org
>
>
>> Hello everybody,
>> I am testing a scenario in which I create a RAID5 with three devices:
>> /dev/sd{a,b,c}. Since I don't supply --force to mdadm during creation,
>> it treats the array as degraded and starts rebuilding sdc as a
>> spare. This is as documented.
>>
>> Then I do --fail on /dev/sda. I understand that at this point my data
>> is gone, but I think I should still be able to tear down the array.
>>
>> Sometimes I see that /dev/sda is kicked from the array as faulty, and
>> /dev/sdc is also removed and marked as a spare. Then I am able to tear
>> down the array.
>>
>> But sometimes it looks like the system hits some kind of deadlock.
>> mdadm --detail produces:
>>
>> Update Time : Sun Jun 5 21:54:34 2011
>> State : active, FAILED
>> Active Devices : 1
>> Working Devices : 2
>> Failed Devices : 1
>> Spare Devices : 1
>>
>> Layout : left-symmetric
>> Chunk Size : 512K
>>
>> Name : ubuntu:zvp_1123
>> UUID : 48a15fb6:b6410bb9:a2ca173e:0092032c
>> Events : 67
>>
>> Number Major Minor RaidDevice State
>> 0 8 0 0 faulty spare rebuilding /dev/sda
>> 1 8 16 1 active sync /dev/sdb
>> 3 8 32 2 spare rebuilding /dev/sdc
>>
>> So the faulty device and the spare are not kicked out of the array. At
>> this point I am unable to do anything with the array:
>>
>> root@ubuntu:~# sudo mdadm --stop /dev/md1123
>> mdadm: failed to stop array /dev/md1123: Device or resource busy
>> Perhaps a running process, mounted filesystem or active volume group?
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sda
>> mdadm: hot remove failed for /dev/sda: Device or resource busy
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdb
>> mdadm: hot remove failed for /dev/sdb: Device or resource busy
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdc
>> mdadm: hot remove failed for /dev/sdc: Device or resource busy
>>
>> This is happening on ubuntu-natty, with mdadm - v3.1.4 - 31st August 2010.
>> Looking at some code in mdadm/Detail.c, it looks like /dev/sda has
>> been marked only as MD_DISK_FAULTY but has not yet been kicked out of
>> the array. The "spare" and "rebuilding" labels in the output also result
>> from that.
>>
>> The same thing also happens (sometimes) when I manually initiate a resync
>> (by writing 'repair' to 'sync_action') and later manually fail one
>> of the devices. In that case I also saw messages like this in the syslog:
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350454] INFO: task md1123_resync:7993 blocked for more than 120 seconds.
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350644] md1123_resync D 0000000000000000 0 7993 2 0x00000004
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350647] ffff8800b56b1cd0 0000000000000046 ffff8800b56b1fd8 ffff8800b56b0000
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350649] 0000000000013d00 ffff880036c09a98 ffff8800b56b1fd8 0000000000013d00
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350652] ffff8800b7f1adc0 ffff880036c096e0 ffff8800b56b1cb0 ffff880036c56610
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350654] Call Trace:
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350657] [<ffffffff81492885>] md_do_sync+0xb45/0xc90
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350660] [<ffffffff81087940>] ? autoremove_wake_function+0x0/0x40
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350663] [<ffffffff8107861b>] ? recalc_sigpending+0x1b/0x50
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350665] [<ffffffff8148c516>] md_thread+0x116/0x150
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350667] [<ffffffff8148c400>] ? md_thread+0x0/0x150
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350669] [<ffffffff810871f6>] kthread+0x96/0xa0
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350672] [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350674] [<ffffffff81087160>] ? kthread+0x0/0xa0
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350676] [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
>>
>> This is pretty easy for me to reproduce.
>>
>> Basically, I would like to know what the user is expected to do when
>> more than one RAID5 array component fails during rebuild/resync.
>>
>> Thanks,
>> Alex.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
> ----- End message from alex.bolshoy@gmail.com -----
>
>
>
> ========================================================================
> # _ __ _ __ http://www.nagilum.org/ \n icq://69646724 #
> # / |/ /__ ____ _(_) /_ ____ _ nagilum@nagilum.org \n +491776461165 #
> # / / _ `/ _ `/ / / // / ' \ Amiga (68k/PPC): AOS/NetBSD/Linux #
> # /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/ Mac (PPC): MacOS-X / NetBSD /Linux #
> # /___/ x86: FreeBSD/Linux/Solaris/Win2k ARM9: EPOC EV6 #
> ========================================================================
>
>
>
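
The reproduction described in the quoted report (create the RAID5 without --force so recovery of the last device starts, fail an active member, then try to stop the array) can be written down as a dry-run Python sketch. The device names mirror the report, and the helper names (`repro_steps`, `sync_action_path`, `run`) are hypothetical. The report does not say how long it waited between steps, so no delay is modelled. Only run it against scratch disks: `--create` destroys their contents.

```python
import shlex
import subprocess

# Hypothetical defaults mirroring the report; these disks' contents are destroyed.
MD = "/dev/md1123"
DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]


def repro_steps(md=MD, devices=DEVICES):
    """The reported sequence as argv lists (no --force, so RAID5 starts degraded)."""
    return [
        # mdadm creates the RAID5 degraded and recovers the last device as a spare.
        ["mdadm", "--create", md, "--level=5",
         f"--raid-devices={len(devices)}", *devices],
        # Fail an active member while that recovery is still running.
        ["mdadm", md, "--fail", devices[0]],
        # Try to tear the array down; in the report this returned EBUSY.
        ["mdadm", "--stop", md],
    ]


def sync_action_path(md=MD):
    """sysfs file the 'repair' variant writes to, e.g. /sys/block/md1123/md/sync_action."""
    return f"/sys/block/{md.rsplit('/', 1)[-1]}/md/sync_action"


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for argv in steps:
        print("+", shlex.join(argv))
        if not dry_run:
            # check=False: the final --stop is expected to fail when the hang occurs.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run(repro_steps())
    print("# repair variant: echo repair >", sync_action_path())
```

Keeping `dry_run=True` as the default means the script only prints the commands unless the caller opts in explicitly.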