* RAID6 recovery, event count mismatch
@ 2022-12-18 18:18 Linus Lüssing
From: Linus Lüssing @ 2022-12-18 18:18 UTC (permalink / raw)
To: linux-raid
Hi,
I recently had a disk failure in my Linux RAID6 array with 6
devices. The badblocks tool confirmed some broken sectors. I tried
to remove the faulty drive, but that seems to have caused more
issues (2.5" "low power" drives connected via USB-SATA
adapters over an externally powered USB hub to a Pi 4... which
had run fine for more than a year, but seems to be prone to
physical disk reconnects).
I removed the faulty drive, rebooted the whole system, and the
RAID is now inactive. The event count is a little old on 3 of the
5 remaining disks (off by 3 to 7 events).
Question 1)
Is it safe and still recommended to use the command that is
suggested here?
https://raid.wiki.kernel.org/index.php/RAID_Recovery#Trying_to_assemble_using_--force
-> "mdadm /dev/mdX --assemble --force <list of devices>"
Or should I do a forced --re-add of the three devices that have
the lower event counts and a "Device Role : spare"?
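For concreteness, this is roughly what I have in mind for the forced
assemble, with the device names from the output below (untested, just
a sketch):
$ mdadm --stop /dev/md127
$ mdadm --assemble --force /dev/md127 \
      /dev/dm-9 /dev/dm-10 /dev/dm-11 /dev/dm-12 /dev/dm-13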
Question 2)
If a forced re-add/assemble works and a RAID check / rebuild runs
through fine, is everything fine again then? Or are there additional
steps I should follow to ensure the data and filesystems are
not corrupted? (Below I'm using LVM with normal and thinly
provisioned volumes with LXD for containers; other than that, the
volumes are formatted with ext4.)
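Concretely, the kind of verification I have in mind would be roughly
the following (untested; the VG/LV names are just placeholders for my
actual volumes):
$ echo check > /sys/block/md127/md/sync_action   # full parity scrub
$ cat /sys/block/md127/md/mismatch_cnt           # ideally 0 once the check has finished
$ fsck.ext4 -fn /dev/<vg>/<lv>                   # forced read-only check of each ext4 volume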
Question 3)
Would the "new" Linux RAID write journal feature with a dedicated
SSD have prevented such an inconsistency?
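From the mdadm man page I would guess the journal is set up roughly
like this, either at creation time or added to an existing array
(untested sketch, the SSD partition name is a placeholder):
$ mdadm --create /dev/mdX --level=6 --raid-devices=6 \
      --write-journal /dev/<ssd-partition> <member devices>
$ mdadm /dev/md127 --add-journal /dev/<ssd-partition>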
Question 4)
"mdadm -D /dev/md127" says "Raid Level : raid0", which is wrong
and luckily the disks themselves each individually still know
it's a raid6 according to mdadm. Is this just a displaying bug
of mdadm and nothing to worry about?
System/OS:
$ uname -a
Linux treehouse 5.18.9-v8+ #4 SMP PREEMPT Mon Jul 11 02:47:28 CEST 2022 aarch64 GNU/Linux
$ mdadm --version
mdadm - v4.1 - 2018-10-01
$ cat /etc/debian_version
11.5
-> Debian bullseye
More detailed mdadm output below.
Regards, Linus
==========
$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive dm-13[6](S) dm-12[5](S) dm-11[3](S) dm-10[2](S) dm-9[0](S)
9762371240 blocks super 1.2
unused devices: <none>
$ mdadm -D /dev/md127
/dev/md127:
Version : 1.2
Raid Level : raid0
Total Devices : 5
Persistence : Superblock is persistent
State : inactive
Working Devices : 5
Name : treehouse:raid (local to host treehouse)
UUID : cc2852b8:aca4bdf8:761739d6:0ca5c3bb
Events : 2554495
Number Major Minor RaidDevice
- 254 13 - /dev/dm-13
- 254 11 - /dev/dm-11
- 254 12 - /dev/dm-12
- 254 10 - /dev/dm-10
- 254 9 - /dev/dm-9
$ mdadm -E /dev/dm-9
/dev/dm-9:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : cc2852b8:aca4bdf8:761739d6:0ca5c3bb
Name : treehouse:raid (local to host treehouse)
Creation Time : Mon Jan 29 02:48:26 2018
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3904948496 (1862.02 GiB 1999.33 GB)
Array Size : 7809878016 (7448.08 GiB 7997.32 GB)
Used Dev Size : 3904939008 (1862.02 GiB 1999.33 GB)
Data Offset : 252928 sectors
Super Offset : 8 sectors
Unused Space : before=252840 sectors, after=9488 sectors
State : clean
Device UUID : 5fa00c38:e4069502:d4013eeb:08801a9b
Internal Bitmap : 8 sectors from superblock
Update Time : Sat Nov 26 09:59:17 2022
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 1a214e3c - correct
Events : 2554492
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : ...A.A ('A' == active, '.' == missing, 'R' == replacing)
$ mdadm -E /dev/dm-10
/dev/dm-10:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : cc2852b8:aca4bdf8:761739d6:0ca5c3bb
Name : treehouse:raid (local to host treehouse)
Creation Time : Mon Jan 29 02:48:26 2018
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3904948496 (1862.02 GiB 1999.33 GB)
Array Size : 7809878016 (7448.08 GiB 7997.32 GB)
Used Dev Size : 3904939008 (1862.02 GiB 1999.33 GB)
Data Offset : 252928 sectors
Super Offset : 8 sectors
Unused Space : before=252840 sectors, after=9488 sectors
State : clean
Device UUID : 7edd1414:e610975a:fbe4a253:7ff9d404
Internal Bitmap : 8 sectors from superblock
Update Time : Sat Nov 26 09:35:16 2022
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 204aec57 - correct
Events : 2554488
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : ...A.A ('A' == active, '.' == missing, 'R' == replacing)
$ mdadm -E /dev/dm-11
/dev/dm-11:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : cc2852b8:aca4bdf8:761739d6:0ca5c3bb
Name : treehouse:raid (local to host treehouse)
Creation Time : Mon Jan 29 02:48:26 2018
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3904948496 (1862.02 GiB 1999.33 GB)
Array Size : 7809878016 (7448.08 GiB 7997.32 GB)
Used Dev Size : 3904939008 (1862.02 GiB 1999.33 GB)
Data Offset : 252928 sectors
Super Offset : 8 sectors
Unused Space : before=252840 sectors, after=9488 sectors
State : clean
Device UUID : e8620025:d7cfec3d:a580f07d:9b7b5e11
Internal Bitmap : 8 sectors from superblock
Update Time : Sat Nov 26 09:47:17 2022
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 4b64514b - correct
Events : 2554490
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : ...A.A ('A' == active, '.' == missing, 'R' == replacing)
$ mdadm -E /dev/dm-12
/dev/dm-12:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : cc2852b8:aca4bdf8:761739d6:0ca5c3bb
Name : treehouse:raid (local to host treehouse)
Creation Time : Mon Jan 29 02:48:26 2018
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3904948496 (1862.02 GiB 1999.33 GB)
Array Size : 7809878016 (7448.08 GiB 7997.32 GB)
Used Dev Size : 3904939008 (1862.02 GiB 1999.33 GB)
Data Offset : 252928 sectors
Super Offset : 8 sectors
Unused Space : before=252840 sectors, after=9488 sectors
State : clean
Device UUID : 02cd8021:ece5f701:777c1d5e:1f19449a
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Dec 4 00:57:01 2022
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 750b7a8f - correct
Events : 2554495
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : ...A.A ('A' == active, '.' == missing, 'R' == replacing)
$ mdadm -E /dev/dm-13
/dev/dm-13:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : cc2852b8:aca4bdf8:761739d6:0ca5c3bb
Name : treehouse:raid (local to host treehouse)
Creation Time : Mon Jan 29 02:48:26 2018
Raid Level : raid6
Raid Devices : 6
Avail Dev Size : 3904948496 (1862.02 GiB 1999.33 GB)
Array Size : 7809878016 (7448.08 GiB 7997.32 GB)
Used Dev Size : 3904939008 (1862.02 GiB 1999.33 GB)
Data Offset : 252928 sectors
Super Offset : 8 sectors
Unused Space : before=252848 sectors, after=9488 sectors
State : clean
Device UUID : c7e94388:5d5020e9:51fe2079:9f6a989d
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Dec 4 00:57:01 2022
Bad Block Log : 512 entries available at offset 16 sectors
Checksum : e14ed4e9 - correct
Events : 2554495
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : ...A.A ('A' == active, '.' == missing, 'R' == replacing)
* Re: RAID6 recovery, event count mismatch
From: John Stoffel @ 2022-12-19 20:22 UTC (permalink / raw)
To: Linus Lüssing; +Cc: linux-raid
>>>>> "Linus" == Linus Lüssing <linus.luessing@c0d3.blue> writes:
> I recently had a disk failure in my Linux RAID6 array with 6
> devices. The badblocks tool confirmed some broken sectors. I tried
> to remove the faulty drive but that seems to have caused more
> issues (2.5" inch "low power" drives connected via USB-SATA
> adapters over an externally powered USB hub to a Pi 4... which
> ran fine for more than a year now, but seems to be prone to
> physical disk reconnects).
Yikes! I'm amazed you haven't had more problems with this setup. It
must be pretty darn slow...
> I removed the faulty drive, rebooted the whole system and the RAID
> is now inactive. The event count is a little old on 3 of the 5
> remaining disks (off by 3 to 7 events).
That implies to me that when you removed the faulty drive, the array
(USB bus, etc) went south at the same time.
> Question 1)
> Is it safe and still recommended to use the command that is
> suggested here?
> https://raid.wiki.kernel.org/index.php/RAID_Recovery#Trying_to_assemble_using_--force
-> "mdadm /dev/mdX --assemble --force <list of devices>"
I would try this first. Do you have any details of the individual
drives and their event counts as well?
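Something like this one-liner should show them per member, assuming
the dm-9..dm-13 names from your mail:
$ for d in /dev/dm-{9..13}; do echo "== $d"; mdadm -E "$d" | grep -E 'Events|Device Role'; done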
> Or should I do a forced --re-add of the three devices that have
> the lower event counts and a "Device Role : spare"?
> Question 2)
> If a forced re-add/assemble works and a RAID check / rebuild runs
> through fine, is everything fine again then? Or are there additional
> steps I should follow to ensure the data and filesystems are
> not corrupted? (below I'm using LVM with normal and thinly
> provisioned volumes with LXD for containers, and other than that
> volumes are formatted with ext4)
> Question 3)
> Would the "new" Linux RAID write journal feature with a dedicated
> SSD have prevented such an inconsistency?
> Question 4)
> "mdadm -D /dev/md127" says "Raid Level : raid0", which is wrong
> and luckily the disks themselves each individually still know
> it's a raid6 according to mdadm. Is this just a displaying bug
> of mdadm and nothing to worry about?
> System/OS:
> $ uname -a
> Linux treehouse 5.18.9-v8+ #4 SMP PREEMPT Mon Jul 11 02:47:28 CEST 2022 aarch64 GNU/Linux
> $ mdadm --version
> mdadm - v4.1 - 2018-10-01
> $ cat /etc/debian_version
> 11.5
> -> Debian bullseye
> More detailed mdadm output below.
> Regards, Linus
> [...]
* Re: RAID6 recovery, event count mismatch
From: Linus Lüssing @ 2022-12-28 18:51 UTC (permalink / raw)
To: John Stoffel; +Cc: linux-raid
On Mon, Dec 19, 2022 at 03:22:45PM -0500, John Stoffel wrote:
> [...]
>
> Yikes! I'm amazed you haven't had more problems with this setup. It
> must be pretty darn slow...
Speed is actually decent for my use cases. It's just that a full RAID
check or replacing a disk can take a bit longer. It was a different
story with my first try on a Pi 3 and its USB2 ports :D.
> [...]
>
> > https://raid.wiki.kernel.org/index.php/RAID_Recovery#Trying_to_assemble_using_--force
> -> "mdadm /dev/mdX --assemble --force <list of devices>"
>
> I would try this first. Do you have any details of the individual
> drives and their event counts as well?
The event counts are as follows (as can be seen in the full mdadm
outputs at the bottom of my previous email):
dm-9: 2554492
dm-10: 2554488
dm-11: 2554490
dm-12: 2554495
dm-13: 2554495
Are these the counters you were asking for?
Regards, Linus