Re: RAID5 - Disk failed during re-shape

From: Phil Turmel <philip@turmel.org>
To: Sam Clark <sclark_77@hotmail.com>
Cc: linux-raid@vger.kernel.org, NeilBrown <neilb@suse.de>
Subject: Re: RAID5 - Disk failed during re-shape
Date: Fri, 10 Aug 2012 18:36:26 -0400
Message-ID: <50258CEA.10100@turmel.org>
In-Reply-To: <BLU153-ds1688A6A36A974526D12FDB94CC0@phx.gbl>

Hi Sam,

On 08/09/2012 04:38 AM, Sam Clark wrote:
> Hi All, 
> 
> Hoping you can help recover my data!
> 
> I have (had?) a software RAID 5 volume, created on Ubuntu 10.04 a few years
> back consisting of 4 x 1500GB drives.  Was running great until the
> motherboard died last week.   Purchased new motherboard, CPU & RAM,
> installed Ubuntu 12.04, and got everything assembled fine, and working for
> around 48 hours.  

Uh-oh.  Stock 12.04 has a buggy kernel.  See here:
http://neil.brown.name/blog/20120615073245

> After that I added a 2000GB drive to increase capacity, and ran mdadm --add
> /dev/md0 /dev/sdf.  The Re-configuration started to run, and then around
> 11.4% of the reshaping I saw that the server had some errors:

And you reshaped and got media errors ...

> Aug  8 22:17:41 nas kernel: [ 5927.453434] Buffer I/O error on device md0,
> logical block 715013760
> Aug  8 22:17:41 nas kernel: [ 5927.453439] EXT4-fs warning (device md0):
> ext4_end_bio:251: I/O error writing to inode 224003641 (offset 157810688
> size 4096 starting block 715013760)
> Aug  8 22:17:41 nas kernel: [ 5927.453448] JBD2: Detected IO errors while
> flushing file data on md0-8
> Aug  8 22:17:41 nas kernel: [ 5927.453467] Aborting journal on device md0-8.
> Aug  8 22:17:41 nas kernel: [ 5927.453642] Buffer I/O error on device md0,
> logical block 548962304
> Aug  8 22:17:41 nas kernel: [ 5927.453643] lost page write due to I/O error
> on md0
> Aug  8 22:17:41 nas kernel: [ 5927.453656] JBD2: I/O error detected when
> updating journal superblock for md0-8.
> Aug  8 22:17:41 nas kernel: [ 5927.453688] Buffer I/O error on device md0,
> logical block 0
> Aug  8 22:17:41 nas kernel: [ 5927.453690] lost page write due to I/O error
> on md0
> Aug  8 22:17:41 nas kernel: [ 5927.453697] EXT4-fs error (device md0):
> ext4_journal_start_sb:327: Detected aborted journal
> Aug  8 22:17:41 nas kernel: [ 5927.453700] EXT4-fs (md0): Remounting
> filesystem read-only
> Aug  8 22:17:41 nas kernel: [ 5927.453703] EXT4-fs (md0): previous I/O error
> to superblock detected
> Aug  8 22:17:41 nas kernel: [ 5927.453826] Buffer I/O error on device md0,
> logical block 715013760
> Aug  8 22:17:41 nas kernel: [ 5927.453828] lost page write due to I/O error
> on md0
> Aug  8 22:17:41 nas kernel: [ 5927.453842] JBD2: Detected IO errors while
> flushing file data on md0-8
> Aug  8 22:17:41 nas kernel: [ 5927.453848] Buffer I/O error on device md0,
> logical block 0
> Aug  8 22:17:41 nas kernel: [ 5927.453850] lost page write due to I/O error
> on md0
> Aug  8 22:20:54 nas kernel: [ 6120.964129] INFO: task md0_reshape:297
> blocked for more than 120 seconds.
> 
> On checking the progress of /proc/mdstat, I found that 2 drives were listed
> as failed (__UUU), and the finish time was simply growing by hundreds of
> minutes at a time.
> 
> I was able to browse some data on the Raid set (incl my Home folder), but
> couldn't browse some other sections - shell simply hung when I tried to
> issue "ls /raidmount".  I tried to add one of the failed disks back in, but
> got the response that there was no superblock on it.  rebooted it at that
> time.

Poof.  The bug wiped your active device's metadata.

> During boot I was given the option to manually recover, or skip mounting - I
> chose the latter. 

Good instincts, but probably not any help.

> Now that the system is running, I tried to assemble, but keeps failing. 
> Have tried:
> mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
> /dev/sdf
> 
> I am able to see all the drives, but can see the UUID is incorrect and the
> Raid Level states -unknown-, as below... does this mean the data can't be
> recovered?  

If you weren't in the middle of a reshape, you could recover using the
instructions in the blog entry above.

[trim /]

> I guess the 'invalid argument' is the -unknown- in the raid level.. but it's
> only a guess. 
> 
> I'm at the extent of my knowledge - would appreciate some expert assistance
> in recovering this array, if it's possible!

I think you are toast, as I saw nothing in the metadata that would give
you a precise reshape restart position, even if you got Neil to work up
a custom mdadm that could use it.  The 11.4% could be converted into an
approximate restart position, perhaps.
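
To make "approximate" concrete, here is a rough sketch of that conversion. It is not anything mdadm provides. It assumes the mdstat percentage is a fraction of the per-device size, that positions are counted in 512-byte sectors, and that the restart point falls on a chunk boundary. The function names, the device size, and the chunk size are all hypothetical; real values would have to come from the array itself.

```python
SECTOR_BYTES = 512  # assumed sector size for positions


def approx_restart_sector(progress_permille, device_sectors, chunk_sectors):
    """Estimate a per-device restart offset from an mdstat-style percentage.

    progress_permille: progress in tenths of a percent (11.4% -> 114).
    device_sectors:    per-device size in sectors (hypothetical input).
    chunk_sectors:     chunk size in sectors (hypothetical input).
    """
    raw = device_sectors * progress_permille // 1000
    # Round down to a chunk boundary (assumption: reshape moves whole chunks).
    return raw - raw % chunk_sectors


def rounding_window_sectors(device_sectors):
    """Sectors covered by one step of a one-decimal-place percentage."""
    return device_sectors // 1000


if __name__ == "__main__":
    # Hypothetical figures: a nominal 1500 GB member and a 64 KiB chunk.
    device_sectors = 1500 * 10**9 // SECTOR_BYTES
    chunk_sectors = 64 * 1024 // SECTOR_BYTES
    print(approx_restart_sector(114, device_sectors, chunk_sectors))
    print(rounding_window_sectors(device_sectors))
```

With those made-up figures, one step of the displayed percentage spans roughly 1.5 GB per device, so the 11.4% can narrow the restart position but not pin it down.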

Neil, is there any way to do some combination of "create
--assume-clean", start a reshape held at zero, then skip 11.4%?

Phil
