* --grow RAID6 gives: md: md_do_sync() got signal ... exiting + hang
From: Ole Tange @ 2013-05-07 11:36 UTC (permalink / raw)
To: linux-raid
I am expanding my 9 harddisk RAID6 to 10 harddisk RAID6:
md1 : active raid6 sdg[0] sdi[12](S) sdt[15](S) sdy[17](S) sdx[16](S)
sdh[8] sdw[13] sdo[14] sdk[5] sdd[11] sdc[3] sdv[9] sdn[10]
27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2
[9/9] [UUUUUUUUU]
bitmap: 2/2 pages [8KB], 1048576KB chunk
It is, however, hanging the system.
# remove the bitmap
mdadm -v --grow /dev/md1 -b none
# Do the reshape
mdadm -v --grow /dev/md1 --raid-devices=10 --backup-file=/root/back-md1
mdadm: Need to backup 7168K of critical section..
cat /proc/mdstat
<<hangs>>
dmesg says:
[4328128.021614] md: reshape of RAID array md1
[4328128.021618] md: minimum _guaranteed_ speed: 10000 KB/sec/disk.
[4328128.021621] md: using maximum available idle IO bandwidth (but not more than 30000 KB/sec) for reshape.
[4328128.021783] md: using 128k window, over a total of 3907017344k.
[4328128.312637] md: md_do_sync() got signal ... exiting
Disk I/O to the RAID is blocked.
What should I do?
/Ole
* Re: --grow RAID6 gives: md: md_do_sync() got signal ... exiting + hang
From: NeilBrown @ 2013-05-07 11:54 UTC (permalink / raw)
To: Ole Tange; +Cc: linux-raid
On Tue, 7 May 2013 13:36:56 +0200 Ole Tange <tange@binf.ku.dk> wrote:
> I am expanding my 9 harddisk RAID6 to 10 harddisk RAID6:
>
> md1 : active raid6 sdg[0] sdi[12](S) sdt[15](S) sdy[17](S) sdx[16](S)
> sdh[8] sdw[13] sdo[14] sdk[5] sdd[11] sdc[3] sdv[9] sdn[10]
> 27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2
> [9/9] [UUUUUUUUU]
> bitmap: 2/2 pages [8KB], 1048576KB chunk
>
> It is, however, hanging the system.
>
> # remove the bitmap
> mdadm -v --grow /dev/md1 -b none
>
> # Do the reshape
> mdadm -v --grow /dev/md1 --raid-devices=10 --backup-file=/root/back-md1
> mdadm: Need to backup 7168K of critical section..
>
> cat /proc/mdstat
> <<hangs>>
>
> dmesg says:
>
> [4328128.021614] md: reshape of RAID array md1
> [4328128.021618] md: minimum _guaranteed_ speed: 10000 KB/sec/disk.
> [4328128.021621] md: using maximum available idle IO bandwidth (but not more than 30000 KB/sec) for reshape.
> [4328128.021783] md: using 128k window, over a total of 3907017344k.
> [4328128.312637] md: md_do_sync() got signal ... exiting
>
> Disk I/O is blocked to the RAID.
>
> What to do?
What does
grep . /sys/block/md1/md/*
show? Or does it hang?
What about "mdadm --examine /dev/sd*"?
Did the "mdadm --grow" appear to complete, and return to the shell prompt?
What kernel version? What mdadm version?
A hanging /proc/mdstat is definitely not a good sign. The "got signal ...
exiting" isn't good either. I would expect more messages with that.
You didn't just "grep md" in dmesg, did you? Is that the complete dmesg output
for the entire time period that could possibly be relevant?
NeilBrown
* Re: --grow RAID6 gives: md: md_do_sync() got signal ... exiting + hang
From: Ole Tange @ 2013-05-07 11:56 UTC (permalink / raw)
To: linux-raid
On Tue, May 7, 2013 at 1:36 PM, Ole Tange <tange@binf.ku.dk> wrote:
> I am expanding my 9 harddisk RAID6 to 10 harddisk RAID6:
:
> It is, however, hanging the system.
I can mdadm -E:
/dev/sdi:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 242d6530:e2562ecb:1dcd2a97:15a1a868
Name : lemaitre:1 (local to host lemaitre)
Creation Time : Mon Nov 5 16:27:45 2012
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
Array Size : 31256138752 (29808.18 GiB 32006.29 GB)
Used Dev Size : 7814034688 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 4b8de95b:90a2aed7:c0ae092b:a056dd95
Reshape pos'n : 8192 (8.00 MiB 8.39 MB)
Delta Devices : 1 (9->10)
Update Time : Tue May 7 13:12:19 2013
Checksum : a4f483fd - correct
Events : 298792
Layout : left-symmetric
Chunk Size : 128K
Device Role : Active device 9
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
So it seems stuck on the first 8 MB. Is it safe to reboot?
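As a cross-check on the figures above, the sizes reported by /proc/mdstat, dmesg and mdadm -E can be tied together with a little shell arithmetic. This is only a sketch: the variable names are made up for illustration. The last calculation merely observes that the 7168K critical section equals old data disks times new data disks times the chunk size; that mdadm derives the figure this way is an assumption, not something stated in this thread.

```shell
#!/bin/sh
# Figures copied from the mdstat, dmesg and "mdadm -E" output in this thread.
used_dev_sectors=7814034688   # "Used Dev Size", in 512-byte sectors
reshape_pos_kib=8192          # "Reshape pos'n", in KiB
chunk_kib=128                 # "Chunk Size"
old_disks=9                   # "Delta Devices : 1 (9->10)"
new_disks=10
parity=2                      # RAID6 stores two parity blocks per stripe

per_dev_kib=$((used_dev_sectors / 2))      # sectors -> KiB
old_data=$((old_disks - parity))           # data disks before the reshape
new_data=$((new_disks - parity))           # data disks after the reshape

old_array_kib=$((per_dev_kib * old_data))  # compare with the mdstat block count
new_array_kib=$((per_dev_kib * new_data))  # compare with "Array Size"
reshape_pos_mib=$((reshape_pos_kib / 1024))

# Assumption: the critical section spans old_data * new_data chunks.
critical_kib=$((old_data * new_data * chunk_kib))

echo "per-device size  : ${per_dev_kib} KiB"
echo "9-disk array     : ${old_array_kib} KiB"
echo "10-disk array    : ${new_array_kib} KiB"
echo "reshape position : ${reshape_pos_mib} MiB"
echo "critical section : ${critical_kib} KiB"
```

The per-device figure agrees with the "over a total of 3907017344k" line in dmesg, and the two array sizes agree with the mdstat block count and the new "Array Size", so the superblock already describes the 10-device layout even though the reshape position has not moved past 8 MiB.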
This hangs:
grep . /sys/block/md1/md/*
$ mdadm --version
mdadm - v3.2.5 - 18th May 2012
$ uname -r
3.2.0-0.bpo.1-amd64
/Ole
* Re: --grow RAID6 gives: md: md_do_sync() got signal ... exiting + hang
From: Ole Tange @ 2013-05-07 12:08 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Tue, May 7, 2013 at 1:54 PM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 7 May 2013 13:36:56 +0200 Ole Tange <tange@binf.ku.dk> wrote:
>
>> I am expanding my 9 harddisk RAID6 to 10 harddisk RAID6:
:
>> It is, however, hanging the system.
:
>> # Do the reshape
>> mdadm -v --grow /dev/md1 --raid-devices=10 --backup-file=/root/back-md1
>> mdadm: Need to backup 7168K of critical section..
This completed - did not hang.
> What does
> grep . /sys/block/md1/md/*
> show? Or does it hang?
Hangs (ctrl-c works).
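Since the grep over the whole directory hangs but can still be interrupted, one way to narrow things down is to read each attribute on its own with a time limit and note which ones block. A minimal sketch, assuming GNU coreutils `timeout` is installed; `probe_attrs` is a hypothetical helper name, not part of mdadm. If a read blocks uninterruptibly, the signal sent by `timeout` will not end it, so this only helps in the interruptible case described here.

```shell
#!/bin/sh
# probe_attrs DIR [SECONDS]
# Read every regular file in DIR separately, giving up after SECONDS
# (default 5), and report which files were readable and which timed out.
probe_attrs() {
    dir=$1
    limit=${2:-5}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        out=$(timeout "$limit" cat "$f" 2>/dev/null)
        status=$?
        case $status in
            0)   printf '%s: %s\n' "$f" "$out" ;;
            124) printf '%s: TIMED OUT after %ss\n' "$f" "$limit" ;;  # GNU timeout's exit code
            *)   printf '%s: unreadable (exit %s)\n' "$f" "$status" ;;
        esac
    done
}

# Example call for the array in this thread:
#   probe_attrs /sys/block/md1/md 5
```

Write-only attributes show up as "unreadable", which is expected; the lines of interest are the ones that time out.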
> What about "mdadm --examine /dev/sd*"
https://gist.github.com/anonymous/5532063
The disk box contains more drives than just the array in question. The
interesting array is: 242d6530:e2562ecb:1dcd2a97:15a1a868
> Did the "mdadm --grow" appear to complete, and return to the shell prompt?
Yes.
> What kernel version? What mdadm version?
$ mdadm --version
mdadm - v3.2.5 - 18th May 2012
$ uname -r
3.2.0-0.bpo.1-amd64
> A hanging /proc/mdstat is definitely not a good sign. The "got signal ...
> exiting" isn't good either. I would expect more messages with that.
> You didn't just "grep md" in dmesg did you? That is a complete dmesg output
> for the entire time period that could possibly be relevant?
This is the dmesg from the controller upgrade (after which everything worked
fine), followed by the --grow at 4328065.432267:
https://gist.github.com/anonymous/5532093
/Ole
* Re: --grow RAID6 gives: md: md_do_sync() got signal ... exiting + hang
From: NeilBrown @ 2013-05-07 12:14 UTC (permalink / raw)
To: Ole Tange; +Cc: linux-raid
On Tue, 7 May 2013 13:56:55 +0200 Ole Tange <tange@binf.ku.dk> wrote:
> On Tue, May 7, 2013 at 1:36 PM, Ole Tange <tange@binf.ku.dk> wrote:
>
> > I am expanding my 9 harddisk RAID6 to 10 harddisk RAID6:
> :
> > It is, however, hanging the system.
>
> I can mdadm -E:
>
> /dev/sdi:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x4
> Array UUID : 242d6530:e2562ecb:1dcd2a97:15a1a868
> Name : lemaitre:1 (local to host lemaitre)
> Creation Time : Mon Nov 5 16:27:45 2012
> Raid Level : raid6
> Raid Devices : 10
>
> Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
> Array Size : 31256138752 (29808.18 GiB 32006.29 GB)
> Used Dev Size : 7814034688 (3726.02 GiB 4000.79 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 4b8de95b:90a2aed7:c0ae092b:a056dd95
>
> Reshape pos'n : 8192 (8.00 MiB 8.39 MB)
> Delta Devices : 1 (9->10)
>
> Update Time : Tue May 7 13:12:19 2013
> Checksum : a4f483fd - correct
> Events : 298792
>
> Layout : left-symmetric
> Chunk Size : 128K
>
> Device Role : Active device 9
> Array State : AAAAAAAAAA ('A' == active, '.' == missing)
>
> So it seems stuck on the first 8 MB. Is it safe to reboot?
>
> This hangs:
>
> grep . /sys/block/md1/md/*
>
> $ mdadm --version
> mdadm - v3.2.5 - 18th May 2012
>
> $ uname -r
> 3.2.0-0.bpo.1-amd64
>
It should be safe to reboot, though until we know why it is hanging I cannot
promise it won't hang again straight away.
You didn't answer my question about dmesg output: did you leave anything out?
NeilBrown
* Re: --grow RAID6 gives: md: md_do_sync() got signal ... exiting + hang
From: Ole Tange @ 2013-05-07 12:16 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Tue, May 7, 2013 at 2:14 PM, NeilBrown <neilb@suse.de> wrote:
> You didn't answer my question about dmesg output: did you leave anything out?
Nothing left out on:
https://gist.github.com/anonymous/5532093
/Ole
* Re: --grow RAID6 gives: md: md_do_sync() got signal ... exiting + hang
From: NeilBrown @ 2013-05-07 12:40 UTC (permalink / raw)
To: Ole Tange; +Cc: linux-raid
On Tue, 7 May 2013 14:08:14 +0200 Ole Tange <tange@binf.ku.dk> wrote:
> On Tue, May 7, 2013 at 1:54 PM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 7 May 2013 13:36:56 +0200 Ole Tange <tange@binf.ku.dk> wrote:
> >
> >> I am expanding my 9 harddisk RAID6 to 10 harddisk RAID6:
> :
> >> It is, however, hanging the system.
> :
> >> # Do the reshape
> >> mdadm -v --grow /dev/md1 --raid-devices=10 --backup-file=/root/back-md1
> >> mdadm: Need to backup 7168K of critical section..
>
> This completed - did not hang.
>
> > What does
> > grep . /sys/block/md1/md/*
> > show? Or does it hang?
>
> Hangs (ctrl-c works).
>
> > What about "mdadm --examine /dev/sd*"
>
> https://gist.github.com/anonymous/5532063
>
> The disk box contains more drives than just the array in question. The
> interesting array is: 242d6530:e2562ecb:1dcd2a97:15a1a868
>
> > Did the "mdadm --grow" appear to complete, and return to the shell prompt?
>
> Yes.
>
> > What kernel version? What mdadm version?
>
> $ mdadm --version
> mdadm - v3.2.5 - 18th May 2012
>
> $ uname -r
> 3.2.0-0.bpo.1-amd64
>
> > A hanging /proc/mdstat is definitely not a good sign. The "got signal ...
> > exiting" isn't good either. I would expect more messages with that.
> > You didn't just "grep md" in dmesg did you? That is a complete dmesg output
> > for the entire time period that could possibly be relevant?
>
> dmesg of controller upgrade (after which everything worked fine)
> followed by --grow at 4328065.432267
>
> https://gist.github.com/anonymous/5532093
>
> /Ole
Thanks for the extra info. I can't find any smoking gun unfortunately.
What does "ps axgu" show? I'm particularly looking for processes in 'D'
state.
If there are any, particularly if they are md related, try
cat /proc/$PID/stack
for appropriate values of $PID
Maybe also try
echo t > /proc/sysrq-trigger
and see what gets into 'dmesg' - hopefully your dmesg buffer is big enough to
hold the important stack traces.
If you get anything from either of those, please post.
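The first two steps above can be combined into one small script. This is a rough sketch, not a tested diagnostic: `d_state_pids` and `dump_blocked_stacks` are hypothetical helper names, and it assumes a procps-style `ps` and a kernel that exposes /proc/PID/stack (normally readable by root only).

```shell
#!/bin/sh
# d_state_pids: read "PID STAT COMMAND" lines on stdin and print the PIDs
# whose state field starts with D (uninterruptible sleep).
d_state_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# dump_blocked_stacks: print the kernel stack of every D-state process.
dump_blocked_stacks() {
    ps -eo pid=,stat=,comm= | d_state_pids | while read -r pid; do
        name=$(cat "/proc/$pid/comm" 2>/dev/null)
        echo "== PID $pid ($name) =="
        cat "/proc/$pid/stack" 2>/dev/null ||
            echo "(stack not readable; run as root, or the process has exited)"
    done
}

# Example call, as root:
#   dump_blocked_stacks > /root/blocked-stacks.txt
```

Redirecting the output to a file on a filesystem that is not on the hung array keeps the traces available for posting.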
NeilBrown