From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: Adam Niescierowicz <adam.niescierowicz@justnet.pl>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID6 12 device assemble force failure
Date: Wed, 3 Jul 2024 12:16:10 +0200
Message-ID: <20240703121610.00001041@linux.intel.com>
In-Reply-To: <20240703094253.00007a94@linux.intel.com>
On Wed, 3 Jul 2024 09:42:53 +0200
Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote:
> On Tue, 2 Jul 2024 19:47:52 +0200
> Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:
>
> > >>>> What can I do to start this array?
> > >>> You may try to add them manually. I know that there is --re-add
> > >>> functionality but I've never used it. Maybe something like this
> > >>> would work:
> > >>> #mdadm --remove /dev/md126 <failed drive>
> > >>> #mdadm --re-add /dev/md126 <failed_drive>
> > >> I tried this but it didn't help.
> > > Please provide the logs then (possibly with -vvvvv); maybe I or someone
> > > else can help.
> >
> > Logs
> > ---
> >
> > # mdadm --run -vvvvv /dev/md126
> > mdadm: failed to start array /dev/md/card1pport2chassis1: Input/output error
> >
> > # mdadm --stop /dev/md126
> > mdadm: stopped /dev/md126
> >
> > # mdadm --assemble --force -vvvvv /dev/md126 /dev/sdq1 /dev/sdv1
> > /dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 /dev/sdk1 /dev/sds1 /dev/sdm1
> > /dev/sdn1 /dev/sdw1 /dev/sdt1
> > mdadm: looking for devices for /dev/md126
> > mdadm: /dev/sdq1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sdv1 is identified as a member of /dev/md126, slot 1.
> > mdadm: /dev/sdr1 is identified as a member of /dev/md126, slot 6.
> > mdadm: /dev/sdu1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sdz1 is identified as a member of /dev/md126, slot 11.
> > mdadm: /dev/sdx1 is identified as a member of /dev/md126, slot 9.
> > mdadm: /dev/sdk1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sds1 is identified as a member of /dev/md126, slot 7.
> > mdadm: /dev/sdm1 is identified as a member of /dev/md126, slot 3.
> > mdadm: /dev/sdn1 is identified as a member of /dev/md126, slot 2.
> > mdadm: /dev/sdw1 is identified as a member of /dev/md126, slot 4.
> > mdadm: /dev/sdt1 is identified as a member of /dev/md126, slot 0.
> > mdadm: added /dev/sdv1 to /dev/md126 as 1
> > mdadm: added /dev/sdn1 to /dev/md126 as 2
> > mdadm: added /dev/sdm1 to /dev/md126 as 3
> > mdadm: added /dev/sdw1 to /dev/md126 as 4
> > mdadm: no uptodate device for slot 5 of /dev/md126
> > mdadm: added /dev/sdr1 to /dev/md126 as 6
> > mdadm: added /dev/sds1 to /dev/md126 as 7
> > mdadm: no uptodate device for slot 8 of /dev/md126
> > mdadm: added /dev/sdx1 to /dev/md126 as 9
> > mdadm: no uptodate device for slot 10 of /dev/md126
> > mdadm: added /dev/sdz1 to /dev/md126 as 11
> > mdadm: added /dev/sdq1 to /dev/md126 as -1
> > mdadm: added /dev/sdu1 to /dev/md126 as -1
> > mdadm: added /dev/sdk1 to /dev/md126 as -1
> > mdadm: added /dev/sdt1 to /dev/md126 as 0
> > mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to
> > start the array.
> > ---
>
> Could you please share the logs from the --re-add attempt? In the meantime
> I will try to simulate this scenario.
> >
> > Can somebody explain the behavior of the array to me? (theory)
> >
> > This is RAID-6, so after two disks are disconnected it still works fine.
> > Next, when a third disk disconnects, the array should stop as faulty, yes?
> > If the array stops as faulty, the data on the array and on the third
> > disconnected disk should be the same, yes?
>
> If you recover only one drive (and start a doubly degraded array), it may
> lead to RWH (RAID write hole).
>
> If there were writes during the disk failures, we don't know which in-flight
> requests completed. The XOR-based parity calculations may lead us to
> incorrect results for some sectors (we need to read all remaining disks and
> XOR the data to reconstruct the data for the 2 missing drives).
>
> But.. if you add all disks back again, in the worst case we will read
> outdated data, and your filesystem should be able to recover from it.
>
> So yes, it should be fine if you start the array with all drives.
>
> Mariusz
>
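To make the quoted parity argument concrete, here is a simplified,
single-parity sketch (real RAID-6 also has a second Q parity computed over a
Galois field, but the failure mode is the same). For a stripe with data
chunks D0..Dn and parity

    P = D0 xor D1 xor ... xor Dn

a missing chunk is reconstructed as

    D_missing = P xor (XOR of all surviving data chunks)

If a write updated some data chunks but the matching parity update was still
in flight when the drives dropped, P no longer matches the data on disk, so
the reconstructed D_missing can be silently stale for those sectors. That is
the write hole; re-adding all original drives avoids relying on that
reconstruction at all.
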
I was able to achieve a similar state:
mdadm -E /dev/nvme2n1
/dev/nvme2n1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 8fd2cf1a:65a58b8d:0c9a9e2e:4684fb88
Name : gklab-localhost:my_r6 (local to host gklab-localhost)
Creation Time : Wed Jul 3 09:43:32 2024
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
Array Size : 10485760 KiB (10.00 GiB 10.74 GB)
Used Dev Size : 10485760 sectors (5.00 GiB 5.37 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=1942775216 sectors
State : clean
Device UUID : b26bef3c:51813f3f:e0f1a194:c96c4367
Update Time : Wed Jul 3 11:49:34 2024
Bad Block Log : 512 entries available at offset 16 sectors
Checksum : a96eaa64 - correct
Events : 6
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : ..A. ('A' == active, '.' == missing, 'R' == replacing)
In my case, the Events value was different and /dev/nvme3n1 had a different
Array State:
Device Role : Active device 3
Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)
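
For reference, a quick read-only way to compare these fields across all
members of your array (device names taken from your earlier assemble command;
adjust the list as needed) would be something like:

# for d in /dev/sd{k,m,n,q,r,s,t,u,v,w,x,z}1; do \
      mdadm -E "$d" | grep -E '^/dev/|Update Time|Events|Device Role|Array State'; \
  done

The members with the lowest Events count are the ones that dropped out first
and therefore hold the most outdated data.
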
And I failed to start it, sorry. It is possible, but it requires working with
sysfs and ioctls directly, so it is much safer to recreate the array with
--assume-clean, especially since it is a fresh array.
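
For your 12-drive array a recreate would look roughly like the sketch below.
This is only a sketch, not a command to copy: --assume-clean makes mdadm
write new metadata without resyncing, so --level, --raid-devices, --metadata,
--chunk, --layout, --data-offset and above all the device order must match
the original array exactly (take the real values from mdadm -E of a member;
1.2 and left-symmetric below are just the usual defaults). The slot order
below comes from your assemble log; the three devices now reported with slot
-1 (sdq1, sdu1, sdk1) are shown as placeholders because their original slots
(5, 8 and 10, in some order) are not visible in the logs you posted and would
have to be confirmed first. Getting the order or offsets wrong destroys data.

# mdadm --stop /dev/md126
# mdadm --create /dev/md126 --assume-clean --level=6 --raid-devices=12 \
        --metadata=1.2 --layout=left-symmetric \
        --chunk=<original chunk> --data-offset=<original data offset> \
        /dev/sdt1 /dev/sdv1 /dev/sdn1 /dev/sdm1 /dev/sdw1 <slot 5 device> \
        /dev/sdr1 /dev/sds1 <slot 8 device> /dev/sdx1 <slot 10 device> /dev/sdz1

After creating, verify the data read-only (fsck -n, or a read-only mount)
before letting anything write to the array.
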
Thanks,
Mariusz