Linux RAID subsystem development
From: NeilBrown <neilb@suse.de>
To: Sam Bingner <sam@bingner.com>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Reshape Shrink Hung Again
Date: Mon, 22 Apr 2013 07:24:14 +1000
Message-ID: <20130422072414.3e30882c@notabene.brown>
In-Reply-To: <96A49024-A8A7-4ED9-82B1-5AE430374EBE@bingner.com>

On Fri, 19 Apr 2013 08:29:37 +0000 Sam Bingner <sam@bingner.com> wrote:

> I'll start this off by saying that no data is in jeopardy, but I would like to track down the cause of this problem and fix it.  When this happened to me last time, I thought it must have been due to an incorrect backup-file size, with the RAID array shrunk to smaller than the final size, but this time that was not the case.
> 
> I initiated a shrink from a 4-drive RAID5 to a 3-drive RAID5.  The shrink had no problems except that a drive failed right at the end of the reshape; it then hung at 99.9% and does not allow me to remove the failed drive from the array because it is "rebuilding".  I am not sure whether the drive failed at the end or after the reshape had reached 99.9%, because it ran overnight and I didn't see this until the next morning.
> 
> Sam
> 
> root@fs:/var/log# uname -a
> Linux fs 2.6.32-5-686 #1 SMP Mon Jan 16 16:04:25 UTC 2012 i686 GNU/Linux
> 
> Apr 17 22:37:41 fs kernel: [25860779.639762] md1: detected capacity change from 749122093056 to 499414728704
> Apr 17 22:38:40 fs kernel: [25860837.912441] md: reshape of RAID array md1
> Apr 17 22:38:40 fs kernel: [25860837.912447] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Apr 17 22:38:40 fs kernel: [25860837.912452] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> Apr 17 22:38:40 fs kernel: [25860837.912459] md: using 128k window, over a total of 243854848 blocks.
> Apr 18 07:51:09 fs kernel: [25893987.273813] raid5: Disk failure on sda2, disabling device.
> Apr 18 07:51:09 fs kernel: [25893987.273815] raid5: Operation continuing on 2 devices.
> Apr 18 07:51:09 fs kernel: [25893987.287168] md: super_written gets error=-5, uptodate=0
> Apr 18 07:51:10 fs kernel: [25893987.657039] md: md1: reshape done.
> Apr 18 07:51:10 fs kernel: [25893987.781599] md: reshape of RAID array md1
> Apr 18 07:51:10 fs kernel: [25893987.781607] md: minimum _guaranteed_  speed: 100 KB/sec/disk.
> Apr 18 07:51:10 fs kernel: [25893987.781613] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> Apr 18 07:51:10 fs kernel: [25893987.781620] md: using 128k window, over a total of 243854848 blocks.
> 
> 
> md1 : active raid5 sdd2[3] sda2[0](F) sdc2[2] sdb2[4]
>       487709696 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
>       [===================>.]  reshape = 99.9% (243853824/243854848) finish=343.6min speed=0K/sec
> 

Looks like a bug - probably in mdadm.
mdadm needs to help the reshape over the last little bit, and md is probably
waiting for it to do that.  This will be the only time in the whole process
when the backup file is used.
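
To put "the last little bit" in numbers, here is a minimal Python sketch that
parses the progress line from the quoted /proc/mdstat output and reports how
much of the reshape is left and whether it looks stalled.  The function name
reshape_status, the regular expression, and the "stalled" rule (speed of 0
with work remaining) are illustrative assumptions, not anything mdadm or md
provides.

```python
import re

# Matches the progress line md prints in /proc/mdstat during a reshape, e.g.
#   [===>.]  reshape = 99.9% (243853824/243854848) finish=343.6min speed=0K/sec
# The pattern is an assumption based on the single line quoted in this thread.
PROGRESS = re.compile(
    r"reshape\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>\S+)\s+speed=(?P<speed>\d+)K/sec"
)


def reshape_status(line):
    """Return progress figures for a reshape line, or None if it does not match."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    done, total = int(m["done"]), int(m["total"])
    speed = int(m["speed"])
    return {
        "done": done,
        "total": total,
        "remaining": total - done,
        "speed_kps": speed,
        # Hypothetical rule of thumb: no throughput while work remains.
        "stalled": speed == 0 and done < total,
    }


# The line reported in the quoted message.
line = ("      [===================>.]  reshape = 99.9% "
        "(243853824/243854848) finish=343.6min speed=0K/sec")
status = reshape_status(line)
print(status)
```

For the reported line this gives remaining = 1024 blocks.  If those are the
usual 1 KiB mdstat blocks, that is two of the array's 512k chunks per device
still to be reshaped.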

I would try stopping the array and re-assembling it.  That might require a
reboot.  If that doesn't fix it, let me know and I'll prioritise this.
Otherwise - I've put it on my to-do list.  I'll try to reproduce and fix it
in due course.
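
The stop-and-re-assemble suggestion above can be sketched as a small Python
helper that builds the two mdadm command lines and, by default, only prints
them.  This is a dry-run sketch under stated assumptions: the array name and
member devices are taken from the quoted mdstat output, leaving out the
failed sda2 is a guess that the message does not address, and the backup-file
path is purely hypothetical.  The helper names recovery_commands and run are
made up for illustration.

```python
import subprocess


def recovery_commands(array, members, backup_file=None):
    """Build argv lists for stopping an md array and assembling it again."""
    stop = ["mdadm", "--stop", array]
    assemble = ["mdadm", "--assemble", array]
    if backup_file:
        # Only meaningful if a backup file was given when the reshape started.
        assemble.append(f"--backup-file={backup_file}")
    assemble += list(members)
    return [stop, assemble]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


cmds = recovery_commands(
    "/dev/md1",
    ["/dev/sdb2", "/dev/sdc2", "/dev/sdd2"],   # working members per the mdstat output
    backup_file="/root/md1-reshape.backup",    # hypothetical path, not from the report
)
run(cmds)  # dry run: prints the commands without touching the array
```

Keeping dry_run=True as the default means nothing is executed until the
printed commands have been checked against the actual system.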

Thanks for the report,
NeilBrown
