From mboxrd@z Thu Jan  1 00:00:00 1970
From: Bill Davidsen <davidsen@tmr.com>
Subject: Re: mdadm --grow failed
Date: Sat, 17 Feb 2007 13:27:26 -0500
Message-ID: <45D7490E.6080309@tmr.com>
References: <20070217030514.M74974@liquid-nexus.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <linux-raid-owner@vger.kernel.org>
In-Reply-To: <20070217030514.M74974@liquid-nexus.net>
Sender: linux-raid-owner@vger.kernel.org
To: Marc Marais <marcm@liquid-nexus.net>
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Marc Marais wrote:
> I'm trying to grow my raid 5 array as I've just added a new disk. The array 
> was originally 3 drives; I've added a fourth using:
>
> mdadm -a /dev/md6 /dev/sda1
>
> Which added the new drive as a spare. I then did:
>
> mdadm --grow /dev/md6 -n 4
>
> Which started the reshape operation. 
>
> Feb 16 23:51:40 xerces kernel: RAID5 conf printout:
> Feb 16 23:51:40 xerces kernel:  --- rd:4 wd:4
> Feb 16 23:51:40 xerces kernel:  disk 0, o:1, dev:sdb1
> Feb 16 23:51:40 xerces kernel:  disk 1, o:1, dev:sdc1
> Feb 16 23:51:40 xerces kernel:  disk 2, o:1, dev:sdd1
> Feb 16 23:51:40 xerces kernel:  disk 3, o:1, dev:sda1
> Feb 16 23:51:40 xerces kernel: md: reshape of RAID array md6
> Feb 16 23:51:40 xerces kernel: md: minimum _guaranteed_  speed: 1000 
> KB/sec/disk.
> Feb 16 23:51:40 xerces kernel: md: using maximum available idle IO bandwidth 
> (but not more than 200000 KB/sec) for reshape.
> Feb 16 23:51:40 xerces kernel: md: using 128k window, over a total of 
> 156288256 blocks.
>
> Unfortunately one of the drives timed out during the operation (not a read 
> error - just a timeout - which I would've thought would be retried but 
> anyway...):
>
> Feb 17 00:19:16 xerces kernel: ata3: command timeout
> Feb 17 00:19:16 xerces kernel: ata3: no sense translation for status: 0x40
> Feb 17 00:19:16 xerces kernel: ata3: translated ATA stat/err 0x40/00 to SCSI 
> SK/ASC/ASCQ 0xb/00/00
> Feb 17 00:19:16 xerces kernel: ata3: status=0x40 { DriveReady }
> Feb 17 00:19:16 xerces kernel: sd 3:0:0:0: SCSI error: return code = 
> 0x08000002
> Feb 17 00:19:16 xerces kernel: sdc: Current [descriptor]: sense key: Aborted 
> Command
> Feb 17 00:19:16 xerces kernel:     Additional sense: No additional sense 
> information
> Feb 17 00:19:16 xerces kernel: Descriptor sense data with sense descriptors 
> (in hex):
> Feb 17 00:19:16 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 
> 00 00 00 00 
> Feb 17 00:19:16 xerces kernel:         00 00 00 01 
> Feb 17 00:19:16 xerces kernel: end_request: I/O error, dev sdc, sector 
> 24065423
> Feb 17 00:19:16 xerces kernel: raid5: Disk failure on sdc1, disabling 
> device. Operation continuing on 3 devices
>
> Which then unfortunately aborted the reshape operation:
>
> Feb 17 00:19:16 xerces kernel: md: md6: reshape done.
> Feb 17 00:19:17 xerces kernel: RAID5 conf printout:
> Feb 17 00:19:17 xerces kernel:  --- rd:4 wd:3
> Feb 17 00:19:17 xerces kernel:  disk 0, o:1, dev:sdb1
> Feb 17 00:19:17 xerces kernel:  disk 1, o:0, dev:sdc1
> Feb 17 00:19:17 xerces kernel:  disk 2, o:1, dev:sdd1
> Feb 17 00:19:17 xerces kernel:  disk 3, o:1, dev:sda1
> Feb 17 00:19:17 xerces kernel: RAID5 conf printout:
> Feb 17 00:19:17 xerces kernel:  --- rd:4 wd:3
> Feb 17 00:19:17 xerces kernel:  disk 0, o:1, dev:sdb1
> Feb 17 00:19:17 xerces kernel:  disk 2, o:1, dev:sdd1
> Feb 17 00:19:17 xerces kernel:  disk 3, o:1, dev:sda1
>
> I re-added the failed disk (sdc) (which btw is a brand new disk - seems this 
> is a controller issue - high IO load?) which then resynced the array.
>
> At this point I'm confused as to the state of the array.
>
> mdadm -D /dev/md6 gives:
>
> /dev/md6:
>         Version : 00.91.03
>   Creation Time : Tue Aug  1 23:31:54 2006
>      Raid Level : raid5
>      Array Size : 312576512 (298.10 GiB 320.08 GB)
>   Used Dev Size : 156288256 (149.05 GiB 160.04 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 6
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Feb 17 12:14:22 2007
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 128K
>
>   Delta Devices : 1, (3->4)
>
>            UUID : 603e7ac0:de4df2d1:d44c6b9b:3d20ad32
>          Events : 0.7215890
>
>     Number   Major   Minor   RaidDevice State
>        0       8       17        0      active sync   /dev/sdb1
>        1       8       33        1      active sync   /dev/sdc1
>        2       8       49        2      active sync   /dev/sdd1
>        3       8        1        3      active sync   /dev/sda1
>
> Although it previously (before issuing the command below) mentioned 
> something about reshape 1% or something to that effect.
>
> I've attempted to continue the reshape by issuing:
>
> mdadm --grow /dev/md6 -n 4 
>
> Which gives the error that the array can't be reshaped without increasing 
> its size!
>
> Is my array destroyed? Seeing as the sda disk wasn't completely synced I'm 
> wondering how it was used to resync the array when sdc went offline. I've 
> got a bad feeling about this :|
>
> Help appreciated. (I do have a full backup of course, but that's a last 
> resort; with my luck I'd get a read error from the tape drive.)

I suspect running a 'check' before the grow would have been a good idea, 
but since Neil didn't suggest one, please don't run it now unless he 
agrees that it's worth trying.

However, you can certainly run 'df' and see whether the filesystem size 
has changed.
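
The 'mdadm -D' output quoted above also gives a hint on its own. Here is a
rough sketch in Python of the arithmetic; the function names are made up
for illustration, and it assumes that mdadm reports sizes in KiB and that
RAID5 usable space is (devices - 1) times the per-device size:

```python
# Figures copied from the quoted 'mdadm -D /dev/md6' output (assumed KiB).
USED_DEV_SIZE_KIB = 156288256
REPORTED_ARRAY_SIZE_KIB = 312576512


def raid5_capacity_kib(used_dev_size_kib: int, raid_devices: int) -> int:
    """Usable RAID5 capacity: one device's worth of space holds parity."""
    if raid_devices < 2:
        raise ValueError("RAID5 needs at least 2 devices")
    return used_dev_size_kib * (raid_devices - 1)


def raid5_devices_for(array_size_kib: int, used_dev_size_kib: int) -> int:
    """Infer how many devices a reported RAID5 array size corresponds to."""
    data_devices, remainder = divmod(array_size_kib, used_dev_size_kib)
    if remainder:
        raise ValueError("array size is not a multiple of the device size")
    return data_devices + 1  # add the parity device back


before = raid5_capacity_kib(USED_DEV_SIZE_KIB, 3)  # original 3-drive layout
after = raid5_capacity_kib(USED_DEV_SIZE_KIB, 4)   # layout after a full reshape
layout = raid5_devices_for(REPORTED_ARRAY_SIZE_KIB, USED_DEV_SIZE_KIB)

print(f"3-drive capacity: {before} KiB")
print(f"4-drive capacity: {after} KiB")
print(f"reported size corresponds to a {layout}-drive layout")
```

By this arithmetic the reported Array Size still matches the 3-drive
layout, which fits with the "Delta Devices : 1, (3->4)" line showing a
reshape that has not finished. Keep in mind that 'df' shows the size of the
filesystem, which does not grow on its own after a reshape, so the mdadm
figure is the more direct thing to compare.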

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979