All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Fabian Fischer <raid@fabianfischer.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: degraded raid array with bad blocks
Date: Wed, 22 Jul 2015 08:48:12 +1000	[thread overview]
Message-ID: <20150722084812.55016c7c@noble> (raw)
In-Reply-To: <55A7F47D.1020004@fabianfischer.org>

On Thu, 16 Jul 2015 20:14:21 +0200 Fabian Fischer
<raid@fabianfischer.org> wrote:

> Hi,
> today I had some problems with my mdadm raid5 (4disks). Firstly I try to
> explaine what happened and what the result is:
> 
> One disk in my array has some bad blocks. After some hardware-changes
> one of the intact disks was thrown out of the array due to a faulty
> sata-cable.
> I shut down the server and replaced the cable.
> After booting, the removed disk wasn't re added to the array (maybe
> because of different event count). --re-add doesn't work.
> So I used --add.

You need a bitmap configured for --re-add to be useful.
(It is generally a good idea anyway).

> 
> Because of the bad blocks on one of the remaining disks, the rebuild
> stops when reaching the first bad block. The re added disk is declared
> as spare, 2 disks active and the disk with bad blocks as faulty.

That shouldn't happen.
Your devices all have a bad block log present, so when rebuilding hits
a bad block it should just record that as a bad block on the recovering
disk and keep going.  That is the whole point of the bad block log.

What kernel are you running?  And what version of mdadm?


> 
> /dev/md127:
>         Version : 1.2
>   Creation Time : Tue Apr 19 08:51:32 2011
>      Raid Level : raid5
>      Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
>   Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
> 
>     Update Time : Thu Jul 16 19:02:09 2015
>           State : clean, FAILED
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>            Name : FiFa-Server:0
>            UUID : 839fb405:d0b1f13a:5a55ee42:fc8a2061
>          Events : 107223
> 
>     Number   Major   Minor   RaidDevice State
>        0       0        0        0      removed
>        1       8       80        1      active sync   /dev/sdf
>        5       8       32        2      active sync   /dev/sdc
>        6       0        0        6      removed
> 
>        4       8       96        -      faulty   /dev/sdg
>        6       8       64        -      spare   /dev/sde
> 
> 
> In my opinion there a 3 possibilities to get the array back working. I
> am not sure whether both possibilities really exist and which one is the
> most promising.
> 	- Using the 'spare'-disk as active disk. The data on the disk
> 	  should be still there.
> 	- Ignoring the bad blocks and loose information stored in this
> 	  blocks
> 	- force start the array without the 'spare' disk and copy the
> 	  data to backup-storage, or does the bad block will cause the
> 	  array to fail when reaching a bad block?

If you have somewhere to store backed up data, and if you can assemble
the array with "--assemble --force", then taking that approach and
copying all the data to somewhere else is the safest option.

To do anything else would require a clear understanding of the history
of the array.  Maybe re-creating the array using the 3 "best" devices
would help, but you would want to be really sure what you were doing.

The data in the record bad blocks is probably lost already anyway -
hopefully there is nothing critical there.

Given the update times on the superblocks are very close the array is
probably quite consistent.

I think "--assemble --force" listing the three devices was "Device
Role: Active device ..." should work and give you a degraded array.
Then 'fsck' that and copy data off.
Then maybe recreate the array from scratch.

NeilBrown



> 
> In the attachment you can find the output of --examine.
> In can not explain why 3 disk have a Bad Block Log. According to
> smart-values only sdg has Reallocated_Sector_Ct >0
> Another thing I can't explain is why sdg (which is the disk with known
> bad blocks) has a lower event count.
> 
> 
> I hope I can get some great ideas how to fix my array.
> 
> Fabian
> 


      parent reply	other threads:[~2015-07-21 22:48 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-16 18:14 degraded raid array with bad blocks Fabian Fischer
2015-07-17  2:09 ` Roman Mamedov
2015-07-21 22:48 ` NeilBrown [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150722084812.55016c7c@noble \
    --to=neilb@suse.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=raid@fabianfischer.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.