From: Larkin Lowrey <llowrey@nuclearwinter.com>
To: linux-raid@vger.kernel.org
Subject: Re: Raid5 device hangs in active state
Date: Sun, 11 Mar 2012 17:39:59 -0500
Message-ID: <4F5D29BF.1050500@nuclearwinter.com>
In-Reply-To: <20120109112644.489e4a48@notabene.brown>

[-- Attachment #1: Type: text/plain, Size: 5772 bytes --]

I either have more info or a totally different scenario. I initiated a
raid5->raid6 reshape on a different machine. At the same time I (perhaps
stupidly) ran resize2fs to shrink the ext4 fs on the array being
reshaped. The reshape is going slowly (as I would expect) but the resize
is nearly dead. It is only able to write a single 4k block to the array
about every 5-6 seconds.

If that's expected then sorry for the noise and please ignore the rest.
Otherwise...

When I reduced sync_speed_min to 1000, the resize2fs write interval
increased to once every 8-10s. When I lowered sync_speed_max to 1000, I
saw no more writes until I set min and max back to 500000, at which
point iostat reported a w_await roughly equal to the length of time the
array had been running with the lower max.
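
For anyone repeating the experiment above, here is a minimal sketch of how the two limits can be changed from a script. It assumes the standard md sysfs attributes `sync_speed_min` and `sync_speed_max` under `/sys/block/<md>/md/`, with values in KiB/s. The helper name `set_sync_speed` and its `sysfs_root` parameter are hypothetical and exist only for illustration; writing to the real sysfs tree needs root.

```python
from pathlib import Path


def set_sync_speed(md_name, min_kib_s=None, max_kib_s=None, sysfs_root="/sys/block"):
    """Write per-array resync/reshape speed limits (KiB/s) for one md device.

    `sysfs_root` is a hypothetical parameter so the helper can be pointed
    at a scratch directory instead of the live /sys tree.
    Returns a dict of the attributes that were actually written.
    """
    md_dir = Path(sysfs_root) / md_name / "md"
    written = {}
    for attr, value in (("sync_speed_min", min_kib_s),
                        ("sync_speed_max", max_kib_s)):
        if value is None:
            continue  # leave this limit untouched
        (md_dir / attr).write_text(f"{value}\n")
        written[attr] = value
    return written


# Example (as root), restoring the values mentioned in the message:
#   set_sync_speed("md2", min_kib_s=500000, max_kib_s=500000)
```

Passing `None` for either limit leaves it unchanged, so min and max can be adjusted one at a time as was done in the test described above.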

The output of '# echo t > /proc/sysrq-trigger' seems to indicate that
resize2fs is stuck doing an fsync. The full dump is attached.

Here's an iostat showing 100% utilization of the LVM volume and the 4k
block writes.

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    1.00     0.00     4.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    1.00     0.00     4.00     8.00     1.00 6200.00    0.00 6200.00 1000.00 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    1.00     0.00     4.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    1.00     0.00     4.00     8.00     1.00 5450.00    0.00 5450.00 1000.00 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00
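
To pull the interesting numbers out of samples like these, a small parser helps. The sketch below is illustrative only: `parse_iostat` is a hypothetical helper, and it assumes the extended-statistics layout shown above (a `Device:` header followed by one row per device). It works on whitespace-separated tokens rather than lines, so output that a mail client has wrapped is still read correctly.

```python
def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_iostat(text):
    """Parse `iostat -x` style samples into a list of per-device dicts."""
    tokens = text.split()
    rows, columns, i = [], [], 0
    while i < len(tokens):
        if tokens[i] == "Device:":
            i += 1
            columns = []
            # Column names run until the first device name, i.e. the first
            # token that is immediately followed by a number.
            while i < len(tokens) and not (
                i + 1 < len(tokens) and _is_number(tokens[i + 1])
            ):
                columns.append(tokens[i])
                i += 1
            continue
        # Data row: device name followed by one value per column.
        values = [float(v) for v in tokens[i + 1 : i + 1 + len(columns)]]
        rows.append({"device": tokens[i], **dict(zip(columns, values))})
        i += 1 + len(columns)
    return rows


# One sample copied from the message, in its wrapped form.
SAMPLE = """
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
md2               0.00     0.00    0.00    1.00     0.00     4.00
8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    1.00     0.00     4.00
8.00     1.00 6200.00    0.00 6200.00 1000.00 100.00
"""

rows = parse_iostat(SAMPLE)
# Rows where a write completed, with how long it waited (milliseconds).
slow_writes = [(r["device"], r["w_await"]) for r in rows if r["w/s"] > 0]
```

In this sample the dm-2 row shows a single 4 kB write (wkB/s of 4.00) with a w_await of 6200 ms, which lines up with the observation of one 4k block every 5-6 seconds.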

/proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md10 : active raid5 md11[3] sdo1[1] sdn1[0]
      1250259200 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/3] [UUU]

md11 : active raid0 sdm1[1] sdad1[0]
      625138176 blocks super 1.2 128k chunks

md5 : active raid5 sdq1[6] sdp1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
      1220981760 blocks super 1.2 level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU]

md3 : active raid6 sdy2[4] md4[2] sdx3[1] sdt3[0]
      1875411968 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/3] [UUU_]
        resync=DELAYED

md2 : active raid6 sdy1[11] sdz2[0] sdt2[10] sdx2[9] sdu2[8] sds2[7] sdw2[6] sdv2[5] sdr2[4] sdac2[3] sdaa2[2] sdab2[1]
      6641912320 blocks super 0.91 level 6, 128k chunk, algorithm 18 [12/11] [UUUUUUUUUUU_]
      [===============>.....]  reshape = 77.4% (514551296/664191232) finish=1030.3min speed=2420K/sec

md1 : active raid5 sdj1[0] sdu1[14] sdx1[13] sdt1[12] sdv1[11] sdw1[10] sdh1[9] sdac1[8] sdaa1[7] sdr1[6] sdab1[5] sdz1[4] sdk1[3] sds1[2] sdl1[1]
      4375960064 blocks level 5, 64k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]

md4 : active raid0 sdi1[1] sdg1[0]
      976769024 blocks super 1.2 128k chunks

md0 : active raid1 sdb1[1] sda1[0]
      41941944 blocks super 1.2 [2/2] [UU]

unused devices: <none>
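
As a sanity check on the md2 reshape line, the finish estimate can be recomputed from the other numbers on that line. This is a rough sketch, assuming the two counts in parentheses are in 1 KiB blocks and the speed is in KiB/s. The helper name `reshape_progress` is hypothetical, and the kernel's own estimate may differ slightly because of how it averages the speed.

```python
import re

# Matches the progress line for a reshape, tolerating a line wrap
# between the block counts and the finish/speed fields.
RESHAPE_RE = re.compile(
    r"reshape\s*=\s*[\d.]+%\s*\((?P<done>\d+)/(?P<total>\d+)\)"
    r"\s*finish=[\d.]+min\s*speed=(?P<speed>\d+)K/sec"
)


def reshape_progress(mdstat_text):
    """Return (percent_done, eta_minutes) for the first reshape found.

    Returns None if no reshape line is present or the speed is zero.
    """
    m = RESHAPE_RE.search(mdstat_text)
    if m is None:
        return None
    done, total, speed = int(m["done"]), int(m["total"]), int(m["speed"])
    if speed == 0:
        return None
    percent = 100.0 * done / total
    eta_minutes = (total - done) / speed / 60.0
    return percent, eta_minutes


# The md2 progress line from the message, in its wrapped form.
LINE = """      [===============>.....]  reshape = 77.4% (514551296/664191232)
finish=1030.3min speed=2420K/sec"""

percent, eta = reshape_progress(LINE)
```

With the figures above, about 149639936 KiB remain at 2420 KiB/s, which works out to roughly 1030 minutes and agrees with the reported finish time.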

Kernel: 3.2.9-1.fc16.x86_64
mdadm:  v3.2.3
resize2fs: 1.41.14

--Larkin

On 1/8/2012 6:26 PM, NeilBrown wrote:
> On Sun, 08 Jan 2012 16:03:10 -0600 Larkin Lowrey <llowrey@nuclearwinter.com>
> wrote:
>
>> Suggestions?
>
> # echo t > /proc/sysrq-trigger
>
> and capture the messages that go to 'dmesg'. Post them.
>
> Hopefully your message ring buffer is big enough to collect the entire
> output. If it isn't you might need to boot with
> log_buf_len=1M
> or similar.
>
> That should show what process is blocking on what.
>
> NeilBrown


[-- Attachment #2: resize2fs hang.zip --]
[-- Type: application/octet-stream, Size: 24641 bytes --]

