From: NeilBrown <neilb@suse.de>
To: "Jörg Habenicht" <j.habenicht@gmx.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: Array died during grow; now resync stopped
Date: Tue, 3 Feb 2015 06:40:56 +1100
Message-ID: <20150203064056.5dbba8c5@notabene.brown>
In-Reply-To: <loom.20150202T100454-579@post.gmane.org>
On Mon, 2 Feb 2015 09:41:02 +0000 (UTC) Jörg Habenicht <j.habenicht@gmx.de>
wrote:
> Hi all,
>
> I had a server crash during an array grow.
> The command line was "mdadm --grow /dev/md0 --raid-devices=6 --chunk=1M"
>
> Now the sync is stuck at 27% and won't continue.
> $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sde1[0] sdg1[9] sdc1[6] sdb1[7] sdd1[8] sdf1[5]
>       5860548608 blocks super 1.0 level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
>       [=====>...............]  reshape = 27.9% (410229760/1465137152) finish=8670020128.0min speed=0K/sec
>
> unused devices: <none>
>
>
> $ mdadm -D /dev/md0
> /dev/md0:
> Version : 1.0
> Creation Time : Thu Oct 7 09:28:04 2010
> Raid Level : raid5
> Array Size : 5860548608 (5589.05 GiB 6001.20 GB)
> Used Dev Size : 1465137152 (1397.26 GiB 1500.30 GB)
> Raid Devices : 6
> Total Devices : 6
> Persistence : Superblock is persistent
>
> Update Time : Sun Feb 1 13:30:05 2015
> State : clean, reshaping
> Active Devices : 6
> Working Devices : 6
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Reshape Status : 27% complete
> Delta Devices : 1, (5->6)
> New Chunksize : 1024K
>
> Name : stelli:3 (local to host stelli)
> UUID : 52857d77:3806e446:477d4865:d711451e
> Events : 2254869
>
>     Number   Major   Minor   RaidDevice State
>        0       8       65        0      active sync   /dev/sde1
>        5       8       81        1      active sync   /dev/sdf1
>        8       8       49        2      active sync   /dev/sdd1
>        7       8       17        3      active sync   /dev/sdb1
>        6       8       33        4      active sync   /dev/sdc1
>        9       8       97        5      active sync   /dev/sdg1
>
>
> smartctl reports the disks are OK. No remapped sectors, no pending writes, etc.
>
> The system load stays at 2.0:
> $ cat /proc/loadavg
> 2.00 2.00 1.95 1/140 2937
> which may be caused by udevd and md0_reshape
> $ ps fax
>   PID TTY      STAT   TIME COMMAND
>     2 ?        S      0:00 [kthreadd]
> ...
>  1671 ?        D      0:00  \_ [md0_reshape]
> ...
>  1289 ?        Ss     0:01 /sbin/udevd --daemon
>  1672 ?        D      0:00  \_ /sbin/udevd --daemon
>
>
> Could this be caused by a software lock?
Some sort of software problem, I suspect.

What does

   cat /proc/1671/stack
   cat /proc/1672/stack

show?

Alternatively,

   echo w > /proc/sysrq-trigger

and see what appears in 'dmesg'.
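
For example, something like this (just a sketch, run as root -- the
PIDs 1671 and 1672 come from your 'ps fax' output above, so substitute
whatever is current, and the sysrq write requires kernel.sysrq to be
enabled):

   # dump the kernel stack of each stuck task
   for pid in 1671 1672; do
       echo "=== /proc/$pid/stack ==="
       cat /proc/$pid/stack
   done

   # ask the kernel to log all uninterruptible (D-state) tasks,
   # then read the blocked-task report from the kernel log
   echo w > /proc/sysrq-trigger
   dmesg | tail -n 100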
>
> The system has 2G RAM and 2G swap. Is this sufficient for the reshape to complete?
Memory shouldn't be a problem.

However, it wouldn't hurt to see what value is in

   /sys/block/md0/md/stripe_cache_size

and double it.
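
Something along these lines (a sketch; run as root, with 'md0' as in
your report -- the value counts stripe-cache entries, and each entry
costs roughly one page per member device, so doubling it is cheap):

   cur=$(cat /sys/block/md0/md/stripe_cache_size)
   echo "stripe_cache_size is $cur"
   # double it; memory use grows by about cur * 4KiB * 6 devices
   echo $((cur * 2)) > /sys/block/md0/md/stripe_cache_size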
If all else fails, a reboot should be safe and will probably restart the
reshape properly. md is very careful about surviving reboots.
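
After the reboot you can confirm the reshape picked up where it left
off with, for example:

   cat /proc/mdstat
   mdadm --detail /dev/md0 | grep -i reshape

and check that the speed shown in /proc/mdstat is no longer 0K/sec.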
NeilBrown
> $ free
>              total       used       free     shared    buffers     cached
> Mem:       1799124     351808    1447316        540      14620     286216
> -/+ buffers/cache:      50972    1748152
> Swap:      2104508          0    2104508
>
>
> And in dmesg I found this:
> $ dmesg | less
> [ 5.456941] md: bind<sdg1>
> [ 11.015014] xor: measuring software checksum speed
> [ 11.051384] prefetch64-sse: 3291.000 MB/sec
> [ 11.091375] generic_sse: 3129.000 MB/sec
> [ 11.091378] xor: using function: prefetch64-sse (3291.000 MB/sec)
> [ 11.159365] raid6: sse2x1 1246 MB/s
> [ 11.227343] raid6: sse2x2 2044 MB/s
> [ 11.295327] raid6: sse2x4 2487 MB/s
> [ 11.295331] raid6: using algorithm sse2x4 (2487 MB/s)
> [ 11.295334] raid6: using intx1 recovery algorithm
> [ 11.328771] md: raid6 personality registered for level 6
> [ 11.328776] md: raid5 personality registered for level 5
> [ 11.328779] md: raid4 personality registered for level 4
> [ 19.840890] bio: create slab <bio-1> at 1
> [ 159.701406] md: md0 stopped.
> [ 159.701413] md: unbind<sdg1>
> [ 159.709902] md: export_rdev(sdg1)
> [ 159.709980] md: unbind<sdd1>
> [ 159.721856] md: export_rdev(sdd1)
> [ 159.721955] md: unbind<sdb1>
> [ 159.733883] md: export_rdev(sdb1)
> [ 159.733991] md: unbind<sdc1>
> [ 159.749856] md: export_rdev(sdc1)
> [ 159.749954] md: unbind<sdf1>
> [ 159.769885] md: export_rdev(sdf1)
> [ 159.769985] md: unbind<sde1>
> [ 159.781873] md: export_rdev(sde1)
> [ 160.471460] md: md0 stopped.
> [ 160.490329] md: bind<sdf1>
> [ 160.490478] md: bind<sdd1>
> [ 160.490689] md: bind<sdb1>
> [ 160.490911] md: bind<sdc1>
> [ 160.491164] md: bind<sdg1>
> [ 160.491408] md: bind<sde1>
> [ 160.492616] md/raid:md0: reshape will continue
> [ 160.492638] md/raid:md0: device sde1 operational as raid disk 0
> [ 160.492640] md/raid:md0: device sdg1 operational as raid disk 5
> [ 160.492641] md/raid:md0: device sdc1 operational as raid disk 4
> [ 160.492642] md/raid:md0: device sdb1 operational as raid disk 3
> [ 160.492644] md/raid:md0: device sdd1 operational as raid disk 2
> [ 160.492645] md/raid:md0: device sdf1 operational as raid disk 1
> [ 160.493187] md/raid:md0: allocated 0kB
> [ 160.493253] md/raid:md0: raid level 5 active with 6 out of 6 devices, algorithm 2
> [ 160.493256] RAID conf printout:
> [ 160.493257] --- level:5 rd:6 wd:6
> [ 160.493259] disk 0, o:1, dev:sde1
> [ 160.493261] disk 1, o:1, dev:sdf1
> [ 160.493262] disk 2, o:1, dev:sdd1
> [ 160.493263] disk 3, o:1, dev:sdb1
> [ 160.493264] disk 4, o:1, dev:sdc1
> [ 160.493266] disk 5, o:1, dev:sdg1
> [ 160.493336] md0: detected capacity change from 0 to 6001201774592
> [ 160.493340] md: reshape of RAID array md0
> [ 160.493342] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> [ 160.493343] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> [ 160.493351] md: using 128k window, over a total of 1465137152k.
> [ 160.951404] md0: unknown partition table
> [ 190.984871] udevd[1289]: worker [1672] /devices/virtual/block/md0 timeout; kill it
> [ 190.984901] udevd[1289]: seq 2259 '/devices/virtual/block/md0' killed
>
>
>
> $ mdadm --version
> mdadm - v3.3.1 - 5th June 2014
>
$ uname -a
> Linux XXXXX 3.14.14-gentoo #3 SMP Sat Jan 31 18:45:04 CET 2015 x86_64 AMD
> Athlon(tm) II X2 240e Processor AuthenticAMD GNU/Linux
>
>
> Currently I can't access the array to read the remaining data, nor can I
> continue the grow operation.
> Can you help me get it running again?
>
>
> best regards
> Jörg
>