From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: Adam Niescierowicz <adam.niescierowicz@justnet.pl>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID6 12 device assemble force failure
Date: Wed, 3 Jul 2024 09:42:53 +0200	[thread overview]
Message-ID: <20240703094253.00007a94@linux.intel.com> (raw)
In-Reply-To: <347003bc-28f1-41e9-b5c4-a2cba5a4475c@justnet.pl>

On Tue, 2 Jul 2024 19:47:52 +0200
Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:

> >>>> What can I do to start this array?  
> >>>    You may try to add them manually. I know that there is
> >>> --re-add functionality but I've never used it. Maybe something like
> >>> this would work:
> >>> #mdadm --remove /dev/md126 <failed_drive>
> >>> #mdadm --re-add /dev/md126 <failed_drive>
> >> I tried this but it didn't help.
> > Please provide the logs then (possibly with -vvvvv); maybe I or someone
> > else can help.
> 
> Logs
> ---
> 
> # mdadm --run -vvvvv /dev/md126
> mdadm: failed to start array /dev/md/card1pport2chassis1: Input/output error
> 
> # mdadm --stop /dev/md126
> mdadm: stopped /dev/md126
> 
> # mdadm --assemble --force -vvvvv /dev/md126 /dev/sdq1 /dev/sdv1 
> /dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 /dev/sdk1 /dev/sds1 /dev/sdm1 
> /dev/sdn1 /dev/sdw1 /dev/sdt1
> mdadm: looking for devices for /dev/md126
> mdadm: /dev/sdq1 is identified as a member of /dev/md126, slot -1.
> mdadm: /dev/sdv1 is identified as a member of /dev/md126, slot 1.
> mdadm: /dev/sdr1 is identified as a member of /dev/md126, slot 6.
> mdadm: /dev/sdu1 is identified as a member of /dev/md126, slot -1.
> mdadm: /dev/sdz1 is identified as a member of /dev/md126, slot 11.
> mdadm: /dev/sdx1 is identified as a member of /dev/md126, slot 9.
> mdadm: /dev/sdk1 is identified as a member of /dev/md126, slot -1.
> mdadm: /dev/sds1 is identified as a member of /dev/md126, slot 7.
> mdadm: /dev/sdm1 is identified as a member of /dev/md126, slot 3.
> mdadm: /dev/sdn1 is identified as a member of /dev/md126, slot 2.
> mdadm: /dev/sdw1 is identified as a member of /dev/md126, slot 4.
> mdadm: /dev/sdt1 is identified as a member of /dev/md126, slot 0.
> mdadm: added /dev/sdv1 to /dev/md126 as 1
> mdadm: added /dev/sdn1 to /dev/md126 as 2
> mdadm: added /dev/sdm1 to /dev/md126 as 3
> mdadm: added /dev/sdw1 to /dev/md126 as 4
> mdadm: no uptodate device for slot 5 of /dev/md126
> mdadm: added /dev/sdr1 to /dev/md126 as 6
> mdadm: added /dev/sds1 to /dev/md126 as 7
> mdadm: no uptodate device for slot 8 of /dev/md126
> mdadm: added /dev/sdx1 to /dev/md126 as 9
> mdadm: no uptodate device for slot 10 of /dev/md126
> mdadm: added /dev/sdz1 to /dev/md126 as 11
> mdadm: added /dev/sdq1 to /dev/md126 as -1
> mdadm: added /dev/sdu1 to /dev/md126 as -1
> mdadm: added /dev/sdk1 to /dev/md126 as -1
> mdadm: added /dev/sdt1 to /dev/md126 as 0
> mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to 
> start the array.
> ---

Could you please share the logs from the --re-add attempt? In the meantime I
will try to simulate this scenario.
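
For reference, something along these lines would give the most useful output
(the /dev/sdX1 below is a placeholder for one of the dropped drives; the
kernel log from around the same moment is also worth capturing):
---
# mdadm -vvvvv /dev/md126 --remove /dev/sdX1
# mdadm -vvvvv /dev/md126 --re-add /dev/sdX1
# dmesg | tail -n 50
---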
> 
> Can somebody explain the behavior of the array to me? (in theory)
> 
> This is RAID-6, so after two disks are disconnected it still works fine.
> Then, when a third disk disconnects, the array should stop as faulty, yes?
> If the array stops as faulty, the data on the array and on the third
> disconnected disk should be the same, yes?

If you recover only one drive (and start the array doubly degraded), it may
lead to the RAID write hole (RWH).

If there were writes during the disk failures, we don't know which in-flight
requests completed. The XOR-based calculations may lead to improper results
for some sectors (we need to read all the surviving disks and XOR the data to
reconstruct the data for the two missing drives).
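
To illustrate, here is a toy sketch in Python (not mdadm code; it uses single
XOR parity only, ignoring RAID-6's second Q syndrome, and all the chunk
values are made up):
---
# Toy model of one stripe: 4 data chunks plus XOR parity.
def xor_chunks(*chunks):
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)

data = [bytes([i]) * 4 for i in range(4)]  # d0..d3
parity = xor_chunks(*data)                 # parity consistent with the data

# A write updates d1, but the failure hits before the matching parity
# update reaches the disk: the stripe is now inconsistent ("torn").
data[1] = bytes([9]) * 4

# d2's disk is one of the missing ones; rebuilding it from the survivors
# plus the stale parity returns the wrong bytes.
rebuilt = xor_chunks(data[0], data[1], data[3], parity)
print(rebuilt == bytes([2]) * 4)           # False -> silently wrong data
---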

But.. if you add all the disks back, in the worst case we will read outdated
data, and your filesystem should be able to recover from that.

So yes, it should be fine if you start the array with all drives.

Mariusz
