From: Wols Lists <antlists@youngman.org.uk>
To: Nicolas Karolak <nicolas.karolak@ubicast.eu>,
Reindl Harald <h.reindl@thelounge.net>,
linux-raid@vger.kernel.org
Subject: Re: Recover RAID6 with 4 disks removed
Date: Thu, 6 Feb 2020 18:11:45 +0000
Message-ID: <5E3C56E1.1070100@youngman.org.uk>
In-Reply-To: <20200206162250.GA32172@cthulhu.home.robinhill.me.uk>
On 06/02/20 16:22, Robin Hill wrote:
> On Thu Feb 06, 2020 at 03:07:00PM +0100, Reindl Harald wrote:
>
>> On 06.02.20 at 14:46, Nicolas Karolak wrote:
>>> I have (had...) a RAID6 array with 8 disks and tried to remove 4 disks
>>> from it, and obviously I messed up. Here are the commands I issued (I
>>> do not have their output):
>>
>> Didn't you realize that RAID6 has redundancy to survive *exactly two*
>> failing disks, no matter how many disks the array has, and that the data
>> and redundancy information are spread over the disks?
>>
>>> mdadm --manage /dev/md1 --fail /dev/sdh
>>> mdadm --manage /dev/md1 --fail /dev/sdg
>>> mdadm --detail /dev/md1
>>> cat /proc/mdstat
>>> mdadm --manage /dev/md1 --fail /dev/sdf
>>> mdadm --manage /dev/md1 --fail /dev/sde
>>> mdadm --detail /dev/md1
>>> cat /proc/mdstat
>>> mdadm --manage /dev/md1 --remove /dev/sdh
>>> mdadm --manage /dev/md1 --remove /dev/sdg
>>> mdadm --manage /dev/md1 --remove /dev/sde
>>> mdadm --manage /dev/md1 --remove /dev/sdf
>>> mdadm --detail /dev/md1
>>> cat /proc/mdstat
>>> mdadm --grow /dev/md1 --raid-devices=4
>>> mdadm --grow /dev/md1 --array-size 7780316160 # from here it started
>>> going wrong on the system
>>
>> Because mdadm didn't prevent you from shooting yourself in the foot -
>> likely for cases when one needs a hammer to restore from an uncommon
>> state as a last resort.
>>
>> Setting more than one disk to "fail" at the same time is asking for
>> trouble, no matter what.
>>
>> What happens when one drive starts to puke after you have removed all
>> redundancy and happily started a reshape that implies heavy IO?
>>
>>> I began to get "input/output" errors; `ls`, `cat` and almost every
>>> other command stopped working (something like "/usr/sbin/ls not
>>> found"). The `mdadm` command was still working, so I did this:
>>>
>>> ```
>>> mdadm --manage /dev/md1 --re-add /dev/sde
>>> mdadm --manage /dev/md1 --re-add /dev/sdf
>>> mdadm --manage /dev/md1 --re-add /dev/sdg
>>> mdadm --manage /dev/md1 --re-add /dev/sdh
>>> mdadm --grow /dev/md1 --raid-devices=8
>>> ```
>>>
>>> The disks were re-added, but as "spares". After that I powered down
>>> the server and made backups of the disks with `dd`.
>>>
>>> Is there any hope to retrieve the data? If yes, then how?
>>
>> Unlikely - the reshape that was started did writes.
>
> I don't think it'll have written anything as the array was in a failed
> state.
That was my reaction, too ...
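Worth dumping the superblocks before anything else - it's read-only, and works
just as well on the dd images as on the raw disks - to see whether a reshape
position got recorded and what state each member thinks it's in. Something
like this, using the four drive names from the original post (the other four
members aren't named, so add those too):

```
# Read-only: print the md superblock of each former array member and
# compare the event counts and any reshape-related fields across them.
for d in /dev/sde /dev/sdf /dev/sdg /dev/sdh; do
    echo "=== $d ==="
    mdadm --examine "$d"
done
```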
> You'll have lost the metadata on the original disks though as
> they were removed & re-added (unless you have anything recording these
> before the above operations?)
Will you?
> so that means doing a create --assume-clean
> and "fsck -n" loop with all combinations until you find the correct
> order (and assumes they were added at the same time and so share the
> same offset). At least you know the positions of 4 of the array members,
> so that reduces the number of combinations you'll need.
I'm not sure about that ... BUT DO NOT try anything that may be
destructive without making sure you've got backups !!!
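For the record, if it does come to the create-and-check route Robin describes,
the loop is along these lines - but ONLY ever pointed at overlays, and the
chunk size, metadata version and device order below are placeholders you'd
have to iterate over (sda-sdd stand in for the four members that were never
named):

```
# SKETCH ONLY - run against overlay devices, never the real disks.
# Level, device count, chunk size and metadata version must match the
# original array; the device order is just one permutation to try.
mdadm --create /dev/md1 --assume-clean --level=6 --raid-devices=8 \
      --chunk=512 --metadata=1.2 \
      /dev/mapper/sda-overlay /dev/mapper/sdb-overlay \
      /dev/mapper/sdc-overlay /dev/mapper/sdd-overlay \
      /dev/mapper/sde-overlay /dev/mapper/sdf-overlay \
      /dev/mapper/sdg-overlay /dev/mapper/sdh-overlay

fsck -n /dev/md1        # read-only check; a clean result suggests the right order
mdadm --stop /dev/md1   # stop and try the next permutation if it wasn't
```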
What I would try (there've been plenty of reports of disks being added
back as spares) is to take out sdh and sdg (the first two disks to be
removed) which will give you a degraded 6-drive array that SHOULD have
all the data on it. Do a forced assembly and run - it will hopefully work!
If it does, then you need to re-add the other two drives back, and hope
nothing else goes wrong while the array sorts itself out ...
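Something along these lines (sde-sdh as per the original post, sda-sdd
standing in for the four members that were never named; do it on overlays
first if at all possible):

```
# Stop whatever half-assembled state is left, then force-assemble the six
# drives - everything except sdg and sdh, the first two to be failed.
mdadm --stop /dev/md1
mdadm --assemble --force --run /dev/md1 \
      /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

cat /proc/mdstat    # should show md1 running, degraded, with 6 of 8 devices

# Only once you've confirmed the data looks sane:
mdadm --manage /dev/md1 --re-add /dev/sdg
mdadm --manage /dev/md1 --re-add /dev/sdh
```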
>
> Check the wiki - there should be instructions on there regarding use of
> overlays to prevent further accidental damage. There may even be scripts
> to help with automating the create/fsck process.
>
https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
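The overlay trick boils down to putting a device-mapper snapshot in front of
each disk so that any writes land in a throwaway file and the disk itself is
never touched. Roughly this, per disk - the wiki has a proper script that does
the whole set, and the overlay size here is just a guess:

```
# Rough sketch for a single disk: writes go to the sparse COW file,
# the real /dev/sde is never modified.
truncate -s 10G /tmp/overlay-sde                # sparse copy-on-write file
loop=$(losetup -f --show /tmp/overlay-sde)      # attach it to a loop device
size=$(blockdev --getsz /dev/sde)               # origin size in 512-byte sectors
dmsetup create sde-overlay \
        --table "0 $size snapshot /dev/sde $loop P 8"
# ...then assemble/create against /dev/mapper/*-overlay instead of /dev/sd*
```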
> Cheers,
> Robin
>
Cheers,
Wol