Re: Fwd: Help with failed RAID-5 -> 6 migration

From: Phil Turmel <philip@turmel.org>
To: Keith Phillips <spootsy.ootsy@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Fwd: Help with failed RAID-5 -> 6 migration
Date: Tue, 11 Jun 2013 06:44:14 -0400
Message-ID: <51B6FF7E.1000707@turmel.org>
In-Reply-To: <CAASLJ=52pzLpk179U=SY+5aGmEe+XcSfrEuBLC_o1ThPvAXfew@mail.gmail.com>

On 06/10/2013 10:08 PM, Keith Phillips wrote:
> Hi Phil,
> 
>> A big stack trace suggests other problems in your system.  Not that you
>> don't have potential I/O error issues, but there might be a kernel problem.
>>
>> Please show "uname -a" and "mdadm --version".
> 
> These are the versions I currently have, which the migration was
> attempted with. The array was originally constructed years ago,
> probably with older kernel/mdadm versions:
> 
> Linux muncher 3.0.0-32-server #51-Ubuntu SMP Thu Mar 21 16:09:49 UTC
> 2013 x86_64 x86_64 x86_64 GNU/Linux
> 
> mdadm - v3.1.4 - 31st August 2010

If the recommendations below don't help, consider using a modern liveCD
to complete the reshape.  I use SystemRescueCD myself, but I'm sure
others would do fine, too.

>> The key thing to look for is a nonzero mismatch count in sysfs for that
>> array.  I'm not familiar with Ubuntu's script, so you might want to look
>> by hand at some future point.
> 
> I'll have a look in future. I do also have mdadm running daily via
> cron with "--monitor --oneshot" - do you know if this checks the
> "mismatch_cnt" file and reports errors?

I don't think so.
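
To look by hand, something along these lines should work. It is only a sketch: it assumes the array is md0 and that the kernel exposes the usual md attribute at /sys/block/md0/md/mismatch_cnt. The helper name report_mismatch is made up for illustration, and the count only means something after a "check" or "repair" pass has run on the array.

```shell
#!/bin/sh
# Warn when an md array's mismatch count is nonzero.
# report_mismatch is a hypothetical helper; the default path assumes md0.
report_mismatch() {
    file="${1:-/sys/block/md0/md/mismatch_cnt}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    count=$(cat "$file")
    if [ "$count" -ne 0 ]; then
        echo "WARNING: $file reports $count mismatches"
        return 1
    fi
    echo "OK: $file reports 0 mismatches"
}
```

Calling report_mismatch with no argument from the same cron job would print one line per run; the nonzero return status on a mismatch lets a wrapper decide whether to send mail.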

>>> Also, while poking yesterday I noticed I was getting warnings of the
>>> form "Device has wrong state in superblock but /dev/sde seems ok", so
>>> I tried a forced assemble:
>>> mdadm --assemble /dev/md0 --force
>>>
>>> Looks like it updated some info in the superblocks (and yes, I forgot
>>> to save the original output first!), but the array remains inactive. I
>>> have now sworn off poking around by myself, because I've no idea what
>>> to do from here.
>>
>> Please show /proc/mdstat again, along with "mdadm -D /dev/md0".
> 
> ---------------------------
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md0 : inactive sde[4] sdc[1] sdb[0] sdd[3]
>       7814054240 blocks super 1.2
> 
> unused devices: <none>
> ---------------------------
> /dev/md0:
>         Version : 1.2
>   Creation Time : Sun Jul 17 00:41:57 2011
>      Raid Level : raid6
>   Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
> 
>     Update Time : Sat Jun  8 11:00:43 2013
>           State : active, degraded, Not Started
>  Active Devices : 3
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 1
> 
>          Layout : left-symmetric-6
>      Chunk Size : 512K
> 
>      New Layout : left-symmetric
> 
>            Name : muncher:0  (local to host muncher)
>            UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
>          Events : 50599
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       16        0      active sync   /dev/sdb
>        1       8       32        1      active sync   /dev/sdc
>        3       8       48        2      active sync   /dev/sdd
>        4       8       64        3      spare rebuilding   /dev/sde
> ---------------------------
> 
>>> for x in /sys/block/sd[bcde]/device/timeout ; do echo $x $(< $x) ; done
>>> ----------------------------
>>> /sys/block/sdb/device/timeout 30
>>> /sys/block/sdc/device/timeout 30
>>> /sys/block/sdd/device/timeout 30
>>> /sys/block/sde/device/timeout 30
>>
>> Due to your green drives, you cannot leave these timeouts at 30 seconds.
>>  I recommend 180 seconds:
>>
>> for x in /sys/block/sd[bcde]/device/timeout ; do echo 180 >$x ; done
>>
>> (You should do this ASAP.  Doing it on the running system is fine.)
>>
>> You will need your system to do this at every boot.  Most distros have
>> rc.local or a similar scripting mechanism you can use.
>>
>> Phil
> 
> Done - thanks for the tip.
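
For the every-boot part, something like the following could go in rc.local. Treat it as a sketch: set_timeouts is a made-up helper name, the sd[bcde] glob assumes the member disks keep those names across reboots (they may not), and it has to run as root.

```shell
#!/bin/sh
# Write a SCSI command timeout (in seconds) to each sysfs file given.
# set_timeouts is a hypothetical helper; 180 is the value recommended above.
set_timeouts() {
    seconds="$1"
    shift
    for f in "$@"; do
        if [ -w "$f" ]; then
            echo "$seconds" > "$f"
        else
            echo "skipping $f (not writable)" >&2
        fi
    done
}

# In rc.local, assuming the array members are still sdb..sde:
# set_timeouts 180 /sys/block/sd[bcde]/device/timeout
```

Skipping unwritable paths rather than aborting means one missing disk does not stop the remaining drives from getting the longer timeout.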

Given the above data, I believe you should be able to just do "mdadm
/dev/md0 --run" and watch it recover.

If it still gives you trouble, stop the array and reassemble with "-vv"
and show what it reports.

Also report any dmesg errors.
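
Spelled out, the steps above look roughly like this. It is a sketch, not a tested recovery script: run_step, try_run, try_reassemble, DRY_RUN and MD are illustrative names, and by default it only prints the commands so nothing touches the array until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print (default) or execute the recovery steps discussed in this message.
# run_step, DRY_RUN and MD are hypothetical names; set DRY_RUN=0 to really run.
DRY_RUN="${DRY_RUN:-1}"
MD="${MD:-/dev/md0}"

run_step() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# First attempt: start the inactive array and let it recover.
try_run() {
    run_step mdadm "$MD" --run
}

# Fallback: stop, reassemble verbosely, then dump kernel messages.
try_reassemble() {
    run_step mdadm --stop "$MD"
    run_step mdadm --assemble "$MD" -vv
    run_step dmesg
}
```

Run try_run first and watch /proc/mdstat; only if that fails move on to try_reassemble and post its full output.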

Phil

Thread overview (10+ messages):
2013-06-08  3:02 Help with failed RAID-5 -> 6 migration Keith Phillips
2013-06-08 22:43 ` Phil Turmel
2013-06-08 23:02 ` Phil Turmel
     [not found]   ` <CAASLJ=5JkQ8L9fbrOSUKH8Y-a7PZgkTcCsi6PW=rhzsUPRF6ow@mail.gmail.com>
2013-06-10 16:16     ` Fwd: " Keith Phillips
2013-06-10 19:35       ` Phil Turmel
2013-06-11  2:08         ` Keith Phillips
2013-06-11 10:44           ` Phil Turmel [this message]
2013-06-11 12:42             ` Vanhorn, Mike
     [not found]             ` <CAASLJ=6eEVY6DeZ=+9Aw6yXmqNSc5mygqtD_8y+MaUid6B_TcQ@mail.gmail.com>
2013-06-12 14:51               ` Fwd: " Phil Turmel
     [not found]               ` <51B88AB2.5060303@turmel.org>
     [not found]                 ` <CAASLJ=7=hnez3udgc4Voa_i7drZq_Y-8FkOgxt02_ROL5eD3qg@mail.gmail.com>
2013-06-13 14:09                   ` Phil Turmel
