From: NeilBrown <neilb@suse.com>
To: Fabian Fischer <raid@fabianfischer.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: degraded raid array with bad blocks
Date: Wed, 22 Jul 2015 08:48:12 +1000
Message-ID: <20150722084812.55016c7c@noble>
In-Reply-To: <55A7F47D.1020004@fabianfischer.org>

On Thu, 16 Jul 2015 20:14:21 +0200 Fabian Fischer
<raid@fabianfischer.org> wrote:

> Hi,
> today I had some problems with my mdadm raid5 (4 disks). First I will
> try to explain what happened and what the result is:
> 
> One disk in my array has some bad blocks. After some hardware changes,
> one of the intact disks was thrown out of the array due to a faulty
> SATA cable.
> I shut down the server and replaced the cable.
> After booting, the removed disk wasn't re-added to the array (maybe
> because of a different event count). --re-add didn't work,
> so I used --add.

You need a bitmap configured for --re-add to be useful.
(It is generally a good idea anyway).
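
For example (a sketch, untested here - add the bitmap once the array is
healthy again, and adjust the device name to match yours):

  mdadm --grow --bitmap=internal /dev/md127

With an internal write-intent bitmap, a device that drops out and comes
back can usually be re-added with only a short partial resync.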

> 
> Because of the bad blocks on one of the remaining disks, the rebuild
> stopped when it reached the first bad block. The re-added disk is
> declared a spare, 2 disks are active, and the disk with bad blocks is
> marked faulty.

That shouldn't happen.
Your devices all have a bad block log present, so when rebuilding hits
a bad block it should just record that as a bad block on the recovering
disk and keep going.  That is the whole point of the bad block log.
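
You can inspect what each member's bad block log actually contains with
something like (device name here is just taken from your report; check
each member in turn):

  mdadm --examine-badblocks /dev/sdg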

What kernel are you running?  And what version of mdadm?
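
For reference, the output of:

  uname -r
  mdadm --version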


> 
> /dev/md127:
>         Version : 1.2
>   Creation Time : Tue Apr 19 08:51:32 2011
>      Raid Level : raid5
>      Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
>   Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
> 
>     Update Time : Thu Jul 16 19:02:09 2015
>           State : clean, FAILED
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>            Name : FiFa-Server:0
>            UUID : 839fb405:d0b1f13a:5a55ee42:fc8a2061
>          Events : 107223
> 
>     Number   Major   Minor   RaidDevice State
>        0       0        0        0      removed
>        1       8       80        1      active sync   /dev/sdf
>        5       8       32        2      active sync   /dev/sdc
>        6       0        0        6      removed
> 
>        4       8       96        -      faulty   /dev/sdg
>        6       8       64        -      spare   /dev/sde
> 
> 
> In my opinion there are 3 possibilities to get the array working
> again. I am not sure whether all of these possibilities really exist,
> or which one is the most promising.
> 	- Using the 'spare' disk as an active disk. The data on the
> 	  disk should still be there.
> 	- Ignoring the bad blocks and losing the information stored in
> 	  those blocks.
> 	- Force-starting the array without the 'spare' disk and copying
> 	  the data to backup storage. Or will a bad block cause the
> 	  array to fail when it is read?

If you have somewhere to store a backup of the data, and if you can
assemble the array with "--assemble --force", then taking that approach
and copying all the data somewhere else is the safest option.
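
Something along these lines (untested sketch - use the three devices
whose --examine output shows an active "Device Role"; from your
--detail output these are presumably /dev/sdf, /dev/sdc and /dev/sdg):

  mdadm --stop /dev/md127
  mdadm --assemble --force /dev/md127 /dev/sdf /dev/sdc /dev/sdg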

To do anything else would require a clear understanding of the history
of the array.  Maybe re-creating the array using the 3 "best" devices
would help, but you would want to be really sure what you were doing.

The data in the recorded bad blocks is probably lost already anyway -
hopefully there is nothing critical there.

Given that the update times on the superblocks are very close, the
array is probably quite consistent.

I think "--assemble --force" listing the three devices was "Device
Role: Active device ..." should work and give you a degraded array.
Then 'fsck' that and copy data off.
Then maybe recreate the array from scratch.
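
A possible sequence, assuming a filesystem (e.g. ext4) sits directly on
the array - adjust to your actual filesystem and paths:

  fsck.ext4 -n /dev/md127      # read-only check first
  mount -o ro /dev/md127 /mnt
  rsync -a /mnt/ /path/to/backup/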

NeilBrown



> 
> In the attachment you can find the output of --examine.
> I cannot explain why 3 disks have a Bad Block Log. According to the
> SMART values, only sdg has Reallocated_Sector_Ct > 0.
> Another thing I can't explain is why sdg (which is the disk with known
> bad blocks) has a lower event count.
> 
> 
> I hope I can get some good ideas on how to fix my array.
> 
> Fabian
> 

