From: Glen Dragon
Subject: Re: raid5 reshape failure - restart?
Date: Sun, 15 May 2011 17:45:34 -0400
To: NeilBrown
Cc: linux-raid@vger.kernel.org

On Sun, May 15, 2011 at 5:37 PM, NeilBrown wrote:
> On Sun, 15 May 2011 13:33:28 -0400 Glen Dragon wrote:
>
>> In trying to reshape a raid5 array, I encountered some problems.
>> I was trying to reshape from raid5 3->4 devices. The reshape process
>> started with seemingly no problems, however I noticed in the kernel log
>> a number of "ata3.00: failed command: WRITE FPDMA QUEUED" errors.
>> In trying to determine if this was going to be bad for me, I disabled
>> NCQ on this device. Looking at the log, I noticed around the same time
>> /dev/sdd reported problems and took itself offline.
>> At this point the reshape seemed to be continuing without issue, even
>> though one of the drives was offline. I wasn't sure that this made
>> sense.
>>
>> Shortly after, I noticed that the progress on the reshape had stalled.
>> I tried changing the stripe_cache_size from 256 to [1024|2048|4096],
>> but the reshape did not resume. top reported that the reshape process
>> was using 100% of one core, and the load average was climbing into the
>> 50s.
>>
>> At this point I rebooted. The array does not start.
>>
>> Can the reshape be restarted? I cannot figure out where the backup
>> file ended up. It does not seem to be where I thought I saved it.
>
> When a reshape is increasing the size of the array, the backup file is only
> needed for the first few stripes. After that it is irrelevant and is removed.
>
> You should be able to simply reassemble the array and it should continue the
> reshape.
>
> What happens when you try:
>
>   mdadm -S /dev/md_d2
>   mdadm -A /dev/md_d2 /dev/sd[abc]5 -vv
>
> Please report both the messages from mdadm and any new messages in "dmesg" at
> the time.
>
> NeilBrown

# mdadm -S /dev/md_d2
mdadm: stopped /dev/md_d2
# mdadm -A /dev/md_d2 /dev/sd[abcd]5 -vv
mdadm: looking for devices for /dev/md_d2
mdadm: /dev/sda5 is identified as a member of /dev/md_d2, slot 0.
mdadm: /dev/sdb5 is identified as a member of /dev/md_d2, slot 1.
mdadm: /dev/sdc5 is identified as a member of /dev/md_d2, slot 3.
mdadm: /dev/sdd5 is identified as a member of /dev/md_d2, slot 2.
mdadm: /dev/md_d2 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on device-3
mdadm: added /dev/sdb5 to /dev/md_d2 as 1
mdadm: added /dev/sdd5 to /dev/md_d2 as 2
mdadm: added /dev/sdc5 to /dev/md_d2 as 3
mdadm: added /dev/sda5 to /dev/md_d2 as 0
mdadm: /dev/md_d2 assembled from 3 drives - not enough to start the array while not clean - consider --force.

# mdadm -D /dev/md_d2
mdadm: md device /dev/md_d2 does not appear to be active.
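
(For what it's worth, the stripe_cache_size changes I mentioned above were
done through sysfs -- I assume that's the right knob, something like:

# echo 2048 > /sys/block/md_d2/md/stripe_cache_size

None of the values I tried got the reshape moving again.)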
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [multipath] [raid1]
md_d2 : inactive sda5[0](S) sdc5[3](S) sdd5[2](S) sdb5[1](S)
      2799357952 blocks super 0.91

md8 : active raid5 sdh1[0] sdg1[4] sdf1[1] sdi1[3] sde1[2]
      5860542464 blocks level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid5 sdd3[2] sdb3[1] sda3[0]
      62926336 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sdb1[1] sda1[0] sdd1[2]
      208704 blocks [3/3] [UUU]

kernel log:
md: md_d2 stopped.
md: unbind<sda5>
md: export_rdev(sda5)
md: unbind<sdc5>
md: export_rdev(sdc5)
md: unbind<sdd5>
md: export_rdev(sdd5)
md: unbind<sdb5>
md: export_rdev(sdb5)
md: md_d2 stopped.
md: bind<sdb5>
md: bind<sdd5>
md: bind<sdc5>
md: bind<sda5>
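
Given the "consider --force" message above, is the next step simply:

# mdadm -A /dev/md_d2 /dev/sd[abcd]5 -vv --force

or is that risky with sdd having dropped out mid-reshape? I haven't run
it yet -- just guessing at the invocation.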