From: Glen Dragon <glen.dragon@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid5 reshape failure - restart?
Date: Sun, 15 May 2011 17:45:34 -0400
Message-ID: <BANLkTi=7oHaQYbOA+qWHHzUzP5fp6UB=9A@mail.gmail.com>
In-Reply-To: <20110516073702.6b6b9bb2@notabene.brown>

On Sun, May 15, 2011 at 5:37 PM, NeilBrown <neilb@suse.de> wrote:
> On Sun, 15 May 2011 13:33:28 -0400 Glen Dragon <glen.dragon@gmail.com> wrote:
>
>> In trying to reshape a raid5 array, I encountered some problems.
>> I was trying to reshape from raid5 3->4 devices.  The reshape
>> started with seemingly no problems; however, I noticed a number of
>> ata3.00: failed command: WRITE FPDMA QUEUED errors in the kernel log.
>> While trying to determine whether this was going to be bad for me, I
>> disabled NCQ on this device. Looking at the log, I noticed that around
>> the same time /dev/sdd reported problems and took itself offline.
>> At this point the reshape seemed to be continuing without issue, even
>> though one of the drives was offline.  I wasn't sure that this made
>> sense.
>>
>> Shortly after, I noticed that the progress on the reshape had stalled.
>> I tried raising stripe_cache_size from 256 to 1024, 2048, and then 4096
>> (via sysfs; see the sketch after the quoted reply), but the reshape did
>> not resume.  top reported that the reshape process was using 100% of
>> one core, and the load average was climbing into the 50s.
>>
>> At this point I rebooted.  The array does not start.
>>
>> Can the reshape be restarted?  I cannot figure out where the backup
>> file ended up.  It does not seem to be where I thought I saved it.
>
> When a reshape is increasing the size of the array the backup file is only
> needed for the first few stripes.  After that it is irrelevant and is removed.
>
> You should be able to simply reassemble the array and it should continue the
> reshape.
>
> What happens when you try:
>
>  mdadm -S /dev/md_d2
>  mdadm -A /dev/md_d2 /dev/sd[abc]5 -vv
>
> Please report both the messages from mdadm and any new messages in "dmesg" at
> the time.
>
> NeilBrown
>
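
For reference, stripe_cache_size (mentioned in the quote above) is a
per-array sysfs knob measured in pages per device. The change described
would have looked something like this sketch, assuming the array node
/dev/md_d2 used throughout this thread:

 # cat /sys/block/md_d2/md/stripe_cache_size
256
 # echo 4096 > /sys/block/md_d2/md/stripe_cache_size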

 # mdadm -S /dev/md_d2
mdadm: stopped /dev/md_d2


 # mdadm -A /dev/md_d2  /dev/sd[abcd]5 -vv
mdadm: looking for devices for /dev/md_d2
mdadm: /dev/sda5 is identified as a member of /dev/md_d2, slot 0.
mdadm: /dev/sdb5 is identified as a member of /dev/md_d2, slot 1.
mdadm: /dev/sdc5 is identified as a member of /dev/md_d2, slot 3.
mdadm: /dev/sdd5 is identified as a member of /dev/md_d2, slot 2.
mdadm: /dev/md_d2 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on device-3
mdadm: added /dev/sdb5 to /dev/md_d2 as 1
mdadm: added /dev/sdd5 to /dev/md_d2 as 2
mdadm: added /dev/sdc5 to /dev/md_d2 as 3
mdadm: added /dev/sda5 to /dev/md_d2 as 0
mdadm: /dev/md_d2 assembled from 3 drives - not enough to start the array while not clean - consider --force.
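
The forced assembly that mdadm suggests would presumably look like the
sketch below; treat it with care rather than as a recommendation, since
--force tells mdadm to accept the not-clean state and assemble from the
freshest devices. Checking the per-device event counts first (see the
mdadm -E sketch further down) is prudent:

 # mdadm -S /dev/md_d2
 # mdadm -A /dev/md_d2 /dev/sd[abcd]5 --force -vv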

 # mdadm -D /dev/md_d2
mdadm: md device /dev/md_d2 does not appear to be active.

 # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [multipath] [raid1]
md_d2 : inactive sda5[0](S) sdc5[3](S) sdd5[2](S) sdb5[1](S)
      2799357952 blocks super 0.91

md8 : active raid5 sdh1[0] sdg1[4] sdf1[1] sdi1[3] sde1[2]
      5860542464 blocks level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid5 sdd3[2] sdb3[1] sda3[0]
      62926336 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sdb1[1] sda1[0] sdd1[2]
      208704 blocks [3/3] [UUU]
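
To check how far the reshape got on each member, and whether sdd5's event
count fell behind the others, the superblocks can be read directly. A
sketch; for a 0.91 superblock mid-reshape, mdadm -E should report the
reshape position alongside the event count and state of each device:

 # mdadm -E /dev/sda5 | egrep 'Events|State|Reshape'
 # mdadm -E /dev/sdd5 | egrep 'Events|State|Reshape'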


kernel log:
md: md_d2 stopped.
md: unbind<sda5>
md: export_rdev(sda5)
md: unbind<sdc5>
md: export_rdev(sdc5)
md: unbind<sdd5>
md: export_rdev(sdd5)
md: unbind<sdb5>
md: export_rdev(sdb5)
md: md_d2 stopped.
md: bind<sdb5>
md: bind<sdd5>
md: bind<sdc5>
md: bind<sda5>