* Reshape Failure
@ 2023-09-03 21:39 Jason Moss
2023-09-04 1:41 ` Yu Kuai
0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-03 21:39 UTC (permalink / raw)
To: linux-raid
Hello,
I recently attempted to add a new drive to my 8-drive RAID 6 array,
growing it to 9 drives. I've done similar before with the same array,
having previously grown it from 6 drives to 7 and then from 7 to 8
with no issues. Drives are WD Reds, most older than 2019, some
(including the newest) newer, but all confirmed CMR and not SMR.
Process used to expand the array:
mdadm --add /dev/md0 /dev/sdb1
mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
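For reference, reshape progress can be followed through the standard md
interfaces; a minimal sketch (assuming the array is /dev/md0, as above):

cat /proc/mdstat                      # overall state and percent complete
mdadm --detail /dev/md0               # reshape status and member roles
cat /sys/block/md0/md/sync_completed  # sectors done / total for the running reshape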
The reshape started off fine, the process was underway, and the volume
was still usable as expected. However, 15-30 minutes into the reshape,
I lost access to the contents of the array. Checking /proc/mdstat, the
reshape was stopped at 0.6% with the counter not incrementing at all.
Any process accessing the array would just hang until killed. I waited
a half hour and there was still no further change to the counter. At
this point, I restarted the server and found that when it came back up
it would begin reshaping again, but only very briefly, under 30
seconds, though the counter did increase during that time.
I searched furiously for ideas and tried stopping and reassembling the
array, assembling with an invalid-backup flag, echoing "frozen" then
"reshape" to the sync_action file, and echoing "max" to the sync_max
file. Nothing ever seemed to make a difference.
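Roughly, those attempts amounted to the following (a sketch; the
member-device list is illustrative, and the sysfs paths are the standard
md ones):

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --backup-file=/root/grow_md0.bak --invalid-backup /dev/sd[a-i]1
echo frozen > /sys/block/md0/md/sync_action   # pause the sync/reshape thread
echo reshape > /sys/block/md0/md/sync_action  # ask it to resume the reshape
echo max > /sys/block/md0/md/sync_max         # lift any upper bound on the sync position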
Here is where I slightly panicked, worried that I'd borked my array. I
powered off the server again and disconnected the newly added drive,
assuming that since it was the one change, it might be the problem
despite having been burn-in tested, and figuring that, so long as the
reshape continued, I could rush-order a replacement and rebuild onto it
once the reshape finished. However, this made no difference and the
reshape still would not continue.
Much searching later, I'd found nothing substantially different from
what I'd already tried, and one of the common threads in other people's
issues was bad drives, so I ran a self-test against each of the
existing drives and found one drive that failed the read test.
Thinking I had the culprit now, I dropped that drive out of the array
and assembled the array again, but the same behavior persists. The
array reshapes very briefly, then completely stops.
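For reference, the per-drive self-tests were along these lines (a
sketch; /dev/sdX stands in for each member device in turn):

smartctl -t long /dev/sdX      # start an extended (read) self-test
smartctl -l selftest /dev/sdX  # check the self-test log once it finishes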
Down to 0 drives of redundancy (in the reshaped section at least), not
finding any new ideas on any of the forums, the mailing list, the wiki, etc.,
and very frustrated, I took a break, bought all new drives to build a
new array in another server and restored from a backup. However, there
is still some data not captured by the most recent backup that I would
like to recover, and I'd also like to solve the problem purely to
understand what happened and how to recover in the future.
Is there anything else I should try to recover this array, or is this
a lost cause?
The details requested by the wiki follow, and I'm happy to collect any
further data that would assist. /dev/sdb is the new drive that was
added, then disconnected. /dev/sdh is the drive that failed a
self-test and was removed from the array.
Thank you in advance for any help provided!
$ uname -a
Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
2023 x86_64 x86_64 x86_64 GNU/Linux
$ mdadm --version
mdadm - v4.2 - 2021-12-30
$ sudo smartctl -H -i -l scterc /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4N7AT7R7X
LU WWN Device Id: 5 0014ee 268545f93
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:27:55 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo smartctl -H -i -l scterc /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WXG1A8UGLS42
LU WWN Device Id: 5 0014ee 2b75ef53b
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:28:19 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo smartctl -H -i -l scterc /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4N4HYL32Y
LU WWN Device Id: 5 0014ee 2630752f8
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:28:20 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo smartctl -H -i -l scterc /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68N32N0
Serial Number: WD-WCC7K1FF6DYK
LU WWN Device Id: 5 0014ee 2ba952a30
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:28:21 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo smartctl -H -i -l scterc /dev/sde
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4N5ZHTRJF
LU WWN Device Id: 5 0014ee 2b88b83bb
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:28:22 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo smartctl -H -i -l scterc /dev/sdf
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68AX9N0
Serial Number: WD-WMC1T3804790
LU WWN Device Id: 5 0014ee 6036b6826
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:28:23 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo smartctl -H -i -l scterc /dev/sdg
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WMC4N0H692Z9
LU WWN Device Id: 5 0014ee 65af39740
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:28:24 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo smartctl -H -i -l scterc /dev/sdh
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WMC4N0K5S750
LU WWN Device Id: 5 0014ee 6b048d9ca
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:28:24 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo smartctl -H -i -l scterc /dev/sdi
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68AX9N0
Serial Number: WD-WMC1T1502475
LU WWN Device Id: 5 0014ee 058d2e5cb
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 3 13:28:27 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
$ sudo mdadm --examine /dev/sda
/dev/sda:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0xd
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247728 sectors, after=14336 sectors
State : clean
Device UUID : 8ca60ad5:60d19333:11b24820:91453532
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 23:12:08 2023
Bad Block Log : 512 entries available at offset 24 sectors - bad
blocks present.
Checksum : b6d8f4d1 - correct
Events : 181105
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdb
/dev/sdb:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247728 sectors, after=14336 sectors
State : clean
Device UUID : 386d3001:16447e43:4d2a5459:85618d11
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 00:02:59 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : b544a39 - correct
Events : 181077
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdc
/dev/sdc:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0xd
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247720 sectors, after=14336 sectors
State : clean
Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 23:12:08 2023
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 88d8b8fc - correct
Events : 181105
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdd
/dev/sdd:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247728 sectors, after=14336 sectors
State : clean
Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 23:12:08 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : d1471d9d - correct
Events : 181105
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sde
/dev/sde:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247720 sectors, after=14336 sectors
State : clean
Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 23:12:08 2023
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : e05d0278 - correct
Events : 181105
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdf
/dev/sdf:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247720 sectors, after=14336 sectors
State : clean
Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 23:12:08 2023
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 26792cc0 - correct
Events : 181105
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdg
/dev/sdg:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247720 sectors, after=14336 sectors
State : clean
Device UUID : 74476ce7:4edc23f6:08120711:ba281425
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 23:12:08 2023
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 6f67d179 - correct
Events : 181105
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdh
/dev/sdh:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sdh1
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0xd
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247720 sectors, after=14336 sectors
State : clean
Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 20:09:14 2023
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : b7696b68 - correct
Events : 181089
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdi
/dev/sdi:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
$ sudo mdadm --examine /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 440dc11e:079308b1:131eda79:9a74c670
Name : Blyth:0 (local to host Blyth)
Creation Time : Tue Aug 4 23:47:57 2015
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
Data Offset : 247808 sectors
Super Offset : 8 sectors
Unused Space : before=247720 sectors, after=14336 sectors
State : clean
Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
Delta Devices : 1 (8->9)
Update Time : Tue Jul 11 23:12:08 2023
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 23b6d024 - correct
Events : 181105
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Raid Level : raid6
Total Devices : 9
Persistence : Superblock is persistent
State : inactive
Working Devices : 9
Delta Devices : 1, (-1->0)
New Level : raid6
New Layout : left-symmetric
New Chunksize : 512K
Name : Blyth:0 (local to host Blyth)
UUID : 440dc11e:079308b1:131eda79:9a74c670
Events : 181105
Number Major Minor RaidDevice
- 8 1 - /dev/sda1
- 8 129 - /dev/sdi1
- 8 113 - /dev/sdh1
- 8 97 - /dev/sdg1
- 8 81 - /dev/sdf1
- 8 65 - /dev/sde1
- 8 49 - /dev/sdd1
- 8 33 - /dev/sdc1
- 8 17 - /dev/sdb1
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
26353689600 blocks super 1.2
unused devices: <none>
* Re: Reshape Failure
2023-09-03 21:39 Reshape Failure Jason Moss
@ 2023-09-04 1:41 ` Yu Kuai
2023-09-04 16:38   ` Jason Moss
0 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-09-04 1:41 UTC (permalink / raw)
To: Jason Moss, linux-raid; +Cc: yangerkun@huawei.com, yukuai (C)

Hi,

On 2023/09/04 5:39, Jason Moss wrote:
> Hello,
>
> I recently attempted to add a new drive to my 8-drive RAID 6 array,
> growing it to 9 drives.
> [...]
> The reshape started off fine, the process was underway, and the volume
> was still usable as expected. However, 15-30 minutes into the reshape,
> I lost access to the contents of the array. Checking /proc/mdstat, the
> reshape was stopped at 0.6% with the counter not incrementing at all.
> Any process accessing the array would just hang until killed. I waited

What kernel version are you using? And it'll be very helpful if you can
collect the stack of all stuck threads. There is a known deadlock for
raid5 related to reshape, and it's fixed in v6.5:

https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com

> a half hour and there was still no further change to the counter. At
> this point, I restarted the server and found that when it came back up
> it would begin reshaping again, but only very briefly, under 30
> seconds, though the counter did increase during that time.
>
> I searched furiously for ideas and tried stopping and reassembling the
> array, assembling with an invalid-backup flag, echoing "frozen" then
> "reshape" to the sync_action file, and echoing "max" to the sync_max
> file. Nothing ever seemed to make a difference.

Don't do this before v6.5; echoing "reshape" while a reshape is still in
progress will corrupt your data:

https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com

Thanks,
Kuai

> [...]
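(One standard way to capture the stacks of all blocked tasks at once,
assuming the sysrq facility is available, is the sysrq 'w' trigger; the
traces land in the kernel log:)

echo 1 > /proc/sys/kernel/sysrq  # enable sysrq if it isn't already
echo w > /proc/sysrq-trigger     # dump stacks of all uninterruptible (blocked) tasks
dmesg | tail -n 200              # the traces appear in the kernel log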
* Re: Reshape Failure
2023-09-04 1:41 ` Yu Kuai
@ 2023-09-04 16:38 ` Jason Moss
2023-09-05 1:07   ` Yu Kuai
0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-04 16:38 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi Kuai,

Thank you for the suggestion. I was previously on 5.15.0. I've built an
environment with 6.5.0.1 now and assembled the array there, but the same
problem happens. It reshaped for 20-30 seconds, then completely stopped.

Processes and /proc/<PID>/stack output:

root     24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
root     24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
root     24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]

[root@arch ~]# cat /proc/24593/stack
[<0>] rescuer_thread+0x2b0/0x3b0
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x34/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

[root@arch ~]# cat /proc/24594/stack

[root@arch ~]# cat /proc/24595/stack
[<0>] reshape_request+0x416/0x9f0 [raid456]
[<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
[<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
[<0>] md_thread+0xae/0x190 [md_mod]
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x34/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

Please let me know if there's a better way to provide the stack info.

Thank you

On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/09/04 5:39, Jason Moss wrote:
> > Hello,
> >
> > I recently attempted to add a new drive to my 8-drive RAID 6 array,
> > growing it to 9 drives.
> > [...]
>
> What kernel version are you using? And it'll be very helpful if you can
> collect the stack of all stuck threads. There is a known deadlock for
> raid5 related to reshape, and it's fixed in v6.5:
>
> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
>
> > I searched furiously for ideas and tried stopping and reassembling the
> > array, assembling with an invalid-backup flag, echoing "frozen" then
> > "reshape" to the sync_action file, and echoing "max" to the sync_max
> > file. Nothing ever seemed to make a difference.
>
> Don't do this before v6.5; echoing "reshape" while a reshape is still in
> progress will corrupt your data:
>
> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
>
> Thanks,
> Kuai
>
> > [...]
> > SMART support is: Enabled > > > > === START OF READ SMART DATA SECTION === > > SMART overall-health self-assessment test result: PASSED > > > > SCT Error Recovery Control: > > Read: 70 (7.0 seconds) > > Write: 70 (7.0 seconds) > > > > > > $ sudo mdadm --examine /dev/sda > > /dev/sda: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sda1 > > /dev/sda1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0xd > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247728 sectors, after=14336 sectors > > State : clean > > Device UUID : 8ca60ad5:60d19333:11b24820:91453532 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 23:12:08 2023 > > Bad Block Log : 512 entries available at offset 24 sectors - bad > > blocks present. > > Checksum : b6d8f4d1 - correct > > Events : 181105 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 7 > > Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > > > > $ sudo mdadm --examine /dev/sdb > > /dev/sdb: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sdb1 > > /dev/sdb1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0x5 > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247728 sectors, after=14336 sectors > > State : clean > > Device UUID : 386d3001:16447e43:4d2a5459:85618d11 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 00:02:59 2023 > > Bad Block Log : 512 entries available at offset 24 sectors > > Checksum : b544a39 - correct > > Events : 181077 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 8 > > Array State : AAAAAAAAA ('A' == active, '.' 
== missing, 'R' == replacing) > > > > $ sudo mdadm --examine /dev/sdc > > /dev/sdc: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sdc1 > > /dev/sdc1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0xd > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247720 sectors, after=14336 sectors > > State : clean > > Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 23:12:08 2023 > > Bad Block Log : 512 entries available at offset 72 sectors - bad > > blocks present. > > Checksum : 88d8b8fc - correct > > Events : 181105 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 4 > > Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > > > > $ sudo mdadm --examine /dev/sdd > > /dev/sdd: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sdd1 > > /dev/sdd1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0x5 > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247728 sectors, after=14336 sectors > > State : clean > > Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 23:12:08 2023 > > Bad Block Log : 512 entries available at offset 24 sectors > > Checksum : d1471d9d - correct > > Events : 181105 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 6 > > Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > > > > $ sudo mdadm --examine /dev/sde > > /dev/sde: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sde1 > > /dev/sde1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0x5 > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247720 sectors, after=14336 sectors > > State : clean > > Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 23:12:08 2023 > > Bad Block Log : 512 entries available at offset 72 sectors > > Checksum : e05d0278 - correct > > Events : 181105 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 5 > > Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > > > > $ sudo mdadm --examine /dev/sdf > > /dev/sdf: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sdf1 > > /dev/sdf1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0x5 > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247720 sectors, after=14336 sectors > > State : clean > > Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 23:12:08 2023 > > Bad Block Log : 512 entries available at offset 72 sectors > > Checksum : 26792cc0 - correct > > Events : 181105 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 0 > > Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > > > > $ sudo mdadm --examine /dev/sdg > > /dev/sdg: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sdg1 > > /dev/sdg1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0x5 > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247720 sectors, after=14336 sectors > > State : clean > > Device UUID : 74476ce7:4edc23f6:08120711:ba281425 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 23:12:08 2023 > > Bad Block Log : 512 entries available at offset 72 sectors > > Checksum : 6f67d179 - correct > > Events : 181105 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 1 > > Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > > > > $ sudo mdadm --examine /dev/sdh > > /dev/sdh: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sdh1 > > /dev/sdh1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0xd > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247720 sectors, after=14336 sectors > > State : clean > > Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 20:09:14 2023 > > Bad Block Log : 512 entries available at offset 72 sectors - bad > > blocks present. > > Checksum : b7696b68 - correct > > Events : 181089 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 2 > > Array State : AAAAAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > > > > $ sudo mdadm --examine /dev/sdi > > /dev/sdi: > > MBR Magic : aa55 > > Partition[0] : 4294967295 sectors at 1 (type ee) > > $ sudo mdadm --examine /dev/sdi1 > > /dev/sdi1: > > Magic : a92b4efc > > Version : 1.2 > > Feature Map : 0x5 > > Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Name : Blyth:0 (local to host Blyth) > > Creation Time : Tue Aug 4 23:47:57 2015 > > Raid Level : raid6 > > Raid Devices : 9 > > > > Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > > Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > > Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > > Data Offset : 247808 sectors > > Super Offset : 8 sectors > > Unused Space : before=247720 sectors, after=14336 sectors > > State : clean > > Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483 > > > > Internal Bitmap : 8 sectors from superblock > > Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > > Delta Devices : 1 (8->9) > > > > Update Time : Tue Jul 11 23:12:08 2023 > > Bad Block Log : 512 entries available at offset 72 sectors > > Checksum : 23b6d024 - correct > > Events : 181105 > > > > Layout : left-symmetric > > Chunk Size : 512K > > > > Device Role : Active device 3 > > Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > > > > $ sudo mdadm --detail /dev/md0 > > /dev/md0: > > Version : 1.2 > > Raid Level : raid6 > > Total Devices : 9 > > Persistence : Superblock is persistent > > > > State : inactive > > Working Devices : 9 > > > > Delta Devices : 1, (-1->0) > > New Level : raid6 > > New Layout : left-symmetric > > New Chunksize : 512K > > > > Name : Blyth:0 (local to host Blyth) > > UUID : 440dc11e:079308b1:131eda79:9a74c670 > > Events : 181105 > > > > Number Major Minor RaidDevice > > > > - 8 1 - /dev/sda1 > > - 8 129 - /dev/sdi1 > > - 8 113 - /dev/sdh1 > > - 8 97 - /dev/sdg1 > > - 8 81 - /dev/sdf1 > > - 8 65 - /dev/sde1 > > - 8 49 - /dev/sdd1 > > - 8 33 - /dev/sdc1 > > - 8 17 - /dev/sdb1 > > > > $ cat /proc/mdstat > > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] > > [raid4] [raid10] > > md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S) > > sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S) > > 26353689600 blocks super 1.2 > > > > unused devices: <none> > > > > . > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
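For reference, reshape progress can be checked read-only through /proc/mdstat and the md
sysfs attributes, without writing to sync_action (which, as noted above, can corrupt data
if done while a reshape is still in progress on a pre-v6.5 kernel). A minimal sketch,
assuming the array is md0; the paths below are the standard md sysfs attributes and should
exist on the kernels discussed here:

$ cat /proc/mdstat
$ cat /sys/block/md0/md/sync_action        # should read "reshape" while reshaping
$ cat /sys/block/md0/md/sync_completed     # "sectors done / total", or "none" when idle
$ cat /sys/block/md0/md/reshape_position   # reshape checkpoint in sectors, or "none"

If sync_completed stops advancing while sync_action still reports "reshape", collecting
the kernel stacks of the md threads, as done in the next message, is the useful next step.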
* Re: Reshape Failure 2023-09-04 16:38 ` Jason Moss @ 2023-09-05 1:07 ` Yu Kuai 2023-09-06 14:05 ` Jason Moss 0 siblings, 1 reply; 21+ messages in thread From: Yu Kuai @ 2023-09-05 1:07 UTC (permalink / raw) To: Jason Moss, Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C) Hi, 在 2023/09/05 0:38, Jason Moss 写道: > Hi Kuai, > > Thank you for the suggestion, I was previously on 5.15.0. I've built > an environment with 6.5.0.1 now and assembled the array there, but the > same problem happens. It reshaped for 20-30 seconds, then completely > stopped. > > Processes and /proc/<PID>/stack output: > root 24593 0.0 0.0 0 0 ? I< 09:22 0:00 [raid5wq] > root 24594 96.5 0.0 0 0 ? R 09:22 2:29 [md0_raid6] > root 24595 0.3 0.0 0 0 ? D 09:22 0:00 [md0_reshape] > > [root@arch ~]# cat /proc/24593/stack > [<0>] rescuer_thread+0x2b0/0x3b0 > [<0>] kthread+0xe8/0x120 > [<0>] ret_from_fork+0x34/0x50 > [<0>] ret_from_fork_asm+0x1b/0x30 > > [root@arch ~]# cat /proc/24594/stack > > [root@arch ~]# cat /proc/24595/stack > [<0>] reshape_request+0x416/0x9f0 [raid456] Can you provide the addr2line result? Let's see where reshape_request() is stuck first. Thanks, Kuai > [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456] > [<0>] md_do_sync+0x7d6/0x11d0 [md_mod] > [<0>] md_thread+0xae/0x190 [md_mod] > [<0>] kthread+0xe8/0x120 > [<0>] ret_from_fork+0x34/0x50 > [<0>] ret_from_fork_asm+0x1b/0x30 > > Please let me know if there's a better way to provide the stack info. > > Thank you > > On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> Hi, >> >> 在 2023/09/04 5:39, Jason Moss 写道: >>> Hello, >>> >>> I recently attempted to add a new drive to my 8-drive RAID 6 array, >>> growing it to 9 drives. I've done similar before with the same array, >>> having previously grown it from 6 drives to 7 and then from 7 to 8 >>> with no issues. Drives are WD Reds, most older than 2019, some >>> (including the newest) newer, but all confirmed CMR and not SMR. >>> >>> Process used to expand the array: >>> mdadm --add /dev/md0 /dev/sdb1 >>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0 >>> >>> The reshape started off fine, the process was underway, and the volume >>> was still usable as expected. However, 15-30 minutes into the reshape, >>> I lost access to the contents of the drive. Checking /proc/mdstat, the >>> reshape was stopped at 0.6% with the counter not incrementing at all. >>> Any process accessing the array would just hang until killed. I waited >> >> What kernel version are you using? And it'll be very helpful if you can >> collect the stack of all stuck thread. There is a known deadlock for >> raid5 related to reshape, and it's fixed in v6.5: >> >> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com >> >>> a half hour and there was still no further change to the counter. At >>> this point, I restarted the server and found that when it came back up >>> it would begin reshaping again, but only very briefly, under 30 >>> seconds, but the counter would be increasing during that time. >>> >>> I searched furiously for ideas and tried stopping and reassembling the >>> array, assembling with an invalid-backup flag, echoing "frozen" then >>> "reshape" to the sync_action file, and echoing "max" to the sync_max >>> file. Nothing ever seemed to make a difference. 
>>> >> >> Don't do this before v6.5, echo "reshape" while reshape is still in >> progress will corrupt your data: >> >> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com >> >> Thanks, >> Kuai >> >>> Here is where I slightly panicked, worried that I'd borked my array, >>> and powered off the server again and disconnected the new drive that >>> was just added, assuming that since it was the change, it may be the >>> problem despite having burn-in tested it, and figuring that I'll rush >>> order a new drive, so long as the reshape continues and I can just >>> rebuild onto a new drive once the reshape finishes. However, this made >>> no difference and the array continued to not rebuild. >>> >>> Much searching later, I'd found nothing substantially different then >>> I'd already tried and one of the common threads in other people's >>> issues was bad drives, so I ran a self-test against each of the >>> existing drives and found one drive that failed the read test. >>> Thinking I had the culprit now, I dropped that drive out of the array >>> and assembled the array again, but the same behavior persists. The >>> array reshapes very briefly, then completely stops. >>> >>> Down to 0 drives of redundancy (in the reshaped section at least), not >>> finding any new ideas on any of the forums, mailing list, wiki, etc, >>> and very frustrated, I took a break, bought all new drives to build a >>> new array in another server and restored from a backup. However, there >>> is still some data not captured by the most recent backup that I would >>> like to recover, and I'd also like to solve the problem purely to >>> understand what happened and how to recover in the future. >>> >>> Is there anything else I should try to recover this array, or is this >>> a lost cause? >>> >>> Details requested by the wiki to follow and I'm happy to collect any >>> further data that would assist. /dev/sdb is the new drive that was >>> added, then disconnected. /dev/sdh is the drive that failed a >>> self-test and was removed from the array. >>> >>> Thank you in advance for any help provided! >>> >>> >>> $ uname -a >>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC >>> 2023 x86_64 x86_64 x86_64 GNU/Linux >>> >>> $ mdadm --version >>> mdadm - v4.2 - 2021-12-30 >>> >>> >>> $ sudo smartctl -H -i -l scterc /dev/sda >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68EUZN0 >>> Serial Number: WD-WCC4N7AT7R7X >>> LU WWN Device Id: 5 0014ee 268545f93 >>> Firmware Version: 82.00A82 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Rotation Rate: 5400 rpm >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:27:55 2023 PDT >>> SMART support is: Available - device has SMART capability. 
>>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sda >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68EUZN0 >>> Serial Number: WD-WCC4N7AT7R7X >>> LU WWN Device Id: 5 0014ee 268545f93 >>> Firmware Version: 82.00A82 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Rotation Rate: 5400 rpm >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:16 2023 PDT >>> SMART support is: Available - device has SMART capability. >>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sdb >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68EUZN0 >>> Serial Number: WD-WXG1A8UGLS42 >>> LU WWN Device Id: 5 0014ee 2b75ef53b >>> Firmware Version: 80.00A80 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Rotation Rate: 5400 rpm >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:19 2023 PDT >>> SMART support is: Available - device has SMART capability. >>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sdc >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68EUZN0 >>> Serial Number: WD-WCC4N4HYL32Y >>> LU WWN Device Id: 5 0014ee 2630752f8 >>> Firmware Version: 82.00A82 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Rotation Rate: 5400 rpm >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:20 2023 PDT >>> SMART support is: Available - device has SMART capability. 
>>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sdd >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68N32N0 >>> Serial Number: WD-WCC7K1FF6DYK >>> LU WWN Device Id: 5 0014ee 2ba952a30 >>> Firmware Version: 82.00A82 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Rotation Rate: 5400 rpm >>> Form Factor: 3.5 inches >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-3 T13/2161-D revision 5 >>> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:21 2023 PDT >>> SMART support is: Available - device has SMART capability. >>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sde >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68EUZN0 >>> Serial Number: WD-WCC4N5ZHTRJF >>> LU WWN Device Id: 5 0014ee 2b88b83bb >>> Firmware Version: 82.00A82 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Rotation Rate: 5400 rpm >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:22 2023 PDT >>> SMART support is: Available - device has SMART capability. >>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sdf >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68AX9N0 >>> Serial Number: WD-WMC1T3804790 >>> LU WWN Device Id: 5 0014ee 6036b6826 >>> Firmware Version: 80.00A80 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:23 2023 PDT >>> SMART support is: Available - device has SMART capability. 
>>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sdg >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68EUZN0 >>> Serial Number: WD-WMC4N0H692Z9 >>> LU WWN Device Id: 5 0014ee 65af39740 >>> Firmware Version: 82.00A82 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Rotation Rate: 5400 rpm >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:24 2023 PDT >>> SMART support is: Available - device has SMART capability. >>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sdh >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68EUZN0 >>> Serial Number: WD-WMC4N0K5S750 >>> LU WWN Device Id: 5 0014ee 6b048d9ca >>> Firmware Version: 82.00A82 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Rotation Rate: 5400 rpm >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:24 2023 PDT >>> SMART support is: Available - device has SMART capability. >>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> $ sudo smartctl -H -i -l scterc /dev/sdi >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>> >>> === START OF INFORMATION SECTION === >>> Model Family: Western Digital Red >>> Device Model: WDC WD30EFRX-68AX9N0 >>> Serial Number: WD-WMC1T1502475 >>> LU WWN Device Id: 5 0014ee 058d2e5cb >>> Firmware Version: 80.00A80 >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>> Device is: In smartctl database [for details use: -P show] >>> ATA Version is: ACS-2 (minor revision not indicated) >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>> Local Time is: Sun Sep 3 13:28:27 2023 PDT >>> SMART support is: Available - device has SMART capability. 
>>> SMART support is: Enabled >>> >>> === START OF READ SMART DATA SECTION === >>> SMART overall-health self-assessment test result: PASSED >>> >>> SCT Error Recovery Control: >>> Read: 70 (7.0 seconds) >>> Write: 70 (7.0 seconds) >>> >>> >>> $ sudo mdadm --examine /dev/sda >>> /dev/sda: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sda1 >>> /dev/sda1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0xd >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247728 sectors, after=14336 sectors >>> State : clean >>> Device UUID : 8ca60ad5:60d19333:11b24820:91453532 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 23:12:08 2023 >>> Bad Block Log : 512 entries available at offset 24 sectors - bad >>> blocks present. >>> Checksum : b6d8f4d1 - correct >>> Events : 181105 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 7 >>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>> >>> $ sudo mdadm --examine /dev/sdb >>> /dev/sdb: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sdb1 >>> /dev/sdb1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0x5 >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247728 sectors, after=14336 sectors >>> State : clean >>> Device UUID : 386d3001:16447e43:4d2a5459:85618d11 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 00:02:59 2023 >>> Bad Block Log : 512 entries available at offset 24 sectors >>> Checksum : b544a39 - correct >>> Events : 181077 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 8 >>> Array State : AAAAAAAAA ('A' == active, '.' 
== missing, 'R' == replacing) >>> >>> $ sudo mdadm --examine /dev/sdc >>> /dev/sdc: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sdc1 >>> /dev/sdc1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0xd >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247720 sectors, after=14336 sectors >>> State : clean >>> Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 23:12:08 2023 >>> Bad Block Log : 512 entries available at offset 72 sectors - bad >>> blocks present. >>> Checksum : 88d8b8fc - correct >>> Events : 181105 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 4 >>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>> >>> $ sudo mdadm --examine /dev/sdd >>> /dev/sdd: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sdd1 >>> /dev/sdd1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0x5 >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247728 sectors, after=14336 sectors >>> State : clean >>> Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 23:12:08 2023 >>> Bad Block Log : 512 entries available at offset 24 sectors >>> Checksum : d1471d9d - correct >>> Events : 181105 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 6 >>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>> >>> $ sudo mdadm --examine /dev/sde >>> /dev/sde: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sde1 >>> /dev/sde1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0x5 >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247720 sectors, after=14336 sectors >>> State : clean >>> Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 23:12:08 2023 >>> Bad Block Log : 512 entries available at offset 72 sectors >>> Checksum : e05d0278 - correct >>> Events : 181105 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 5 >>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>> >>> $ sudo mdadm --examine /dev/sdf >>> /dev/sdf: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sdf1 >>> /dev/sdf1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0x5 >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247720 sectors, after=14336 sectors >>> State : clean >>> Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 23:12:08 2023 >>> Bad Block Log : 512 entries available at offset 72 sectors >>> Checksum : 26792cc0 - correct >>> Events : 181105 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 0 >>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>> >>> $ sudo mdadm --examine /dev/sdg >>> /dev/sdg: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sdg1 >>> /dev/sdg1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0x5 >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247720 sectors, after=14336 sectors >>> State : clean >>> Device UUID : 74476ce7:4edc23f6:08120711:ba281425 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 23:12:08 2023 >>> Bad Block Log : 512 entries available at offset 72 sectors >>> Checksum : 6f67d179 - correct >>> Events : 181105 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 1 >>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>> >>> $ sudo mdadm --examine /dev/sdh >>> /dev/sdh: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sdh1 >>> /dev/sdh1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0xd >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247720 sectors, after=14336 sectors >>> State : clean >>> Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 20:09:14 2023 >>> Bad Block Log : 512 entries available at offset 72 sectors - bad >>> blocks present. >>> Checksum : b7696b68 - correct >>> Events : 181089 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 2 >>> Array State : AAAAAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>> >>> $ sudo mdadm --examine /dev/sdi >>> /dev/sdi: >>> MBR Magic : aa55 >>> Partition[0] : 4294967295 sectors at 1 (type ee) >>> $ sudo mdadm --examine /dev/sdi1 >>> /dev/sdi1: >>> Magic : a92b4efc >>> Version : 1.2 >>> Feature Map : 0x5 >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Name : Blyth:0 (local to host Blyth) >>> Creation Time : Tue Aug 4 23:47:57 2015 >>> Raid Level : raid6 >>> Raid Devices : 9 >>> >>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>> Data Offset : 247808 sectors >>> Super Offset : 8 sectors >>> Unused Space : before=247720 sectors, after=14336 sectors >>> State : clean >>> Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483 >>> >>> Internal Bitmap : 8 sectors from superblock >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>> Delta Devices : 1 (8->9) >>> >>> Update Time : Tue Jul 11 23:12:08 2023 >>> Bad Block Log : 512 entries available at offset 72 sectors >>> Checksum : 23b6d024 - correct >>> Events : 181105 >>> >>> Layout : left-symmetric >>> Chunk Size : 512K >>> >>> Device Role : Active device 3 >>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>> >>> $ sudo mdadm --detail /dev/md0 >>> /dev/md0: >>> Version : 1.2 >>> Raid Level : raid6 >>> Total Devices : 9 >>> Persistence : Superblock is persistent >>> >>> State : inactive >>> Working Devices : 9 >>> >>> Delta Devices : 1, (-1->0) >>> New Level : raid6 >>> New Layout : left-symmetric >>> New Chunksize : 512K >>> >>> Name : Blyth:0 (local to host Blyth) >>> UUID : 440dc11e:079308b1:131eda79:9a74c670 >>> Events : 181105 >>> >>> Number Major Minor RaidDevice >>> >>> - 8 1 - /dev/sda1 >>> - 8 129 - /dev/sdi1 >>> - 8 113 - /dev/sdh1 >>> - 8 97 - /dev/sdg1 >>> - 8 81 - /dev/sdf1 >>> - 8 65 - /dev/sde1 >>> - 8 49 - /dev/sdd1 >>> - 8 33 - /dev/sdc1 >>> - 8 17 - /dev/sdb1 >>> >>> $ cat /proc/mdstat >>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] >>> [raid4] [raid10] >>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S) >>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S) >>> 26353689600 blocks super 1.2 >>> >>> unused devices: <none> >>> >>> . >>> >> > > . > ^ permalink raw reply [flat|nested] 21+ messages in thread
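To turn a stack frame such as "reshape_request+0x416/0x9f0 [raid456]" into a source line,
one option is to point gdb at a raid456 module built with debug info and let it resolve the
symbol+offset expression; a sketch, where the module path is only an illustration of where
distro debug symbols are commonly installed:

$ gdb /usr/lib/debug/lib/modules/$(uname -r)/kernel/drivers/md/raid456.ko.debug
(gdb) list *(reshape_request+0x416)

addr2line expects an absolute offset into the object file rather than a symbol plus offset,
which is why gdb is used in the reply that follows.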
* Re: Reshape Failure 2023-09-05 1:07 ` Yu Kuai @ 2023-09-06 14:05 ` Jason Moss 2023-09-07 1:38 ` Yu Kuai 0 siblings, 1 reply; 21+ messages in thread From: Jason Moss @ 2023-09-06 14:05 UTC (permalink / raw) To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C) Hi Kuai, I ended up using gdb rather than addr2line, as that output didn't give me the global offset. Maybe there's a better way, but this seems to be similar to what I expected. (gdb) list *(reshape_request+0x416) 0x11566 is in reshape_request (drivers/md/raid5.c:6396). 6391 if ((mddev->reshape_backwards 6392 ? (safepos > writepos && readpos < writepos) 6393 : (safepos < writepos && readpos > writepos)) || 6394 time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) { 6395 /* Cannot proceed until we've updated the superblock... */ 6396 wait_event(conf->wait_for_overlap, 6397 atomic_read(&conf->reshape_stripes)==0 6398 || test_bit(MD_RECOVERY_INTR, &mddev->recovery)); 6399 if (atomic_read(&conf->reshape_stripes) != 0) 6400 return 0; Thanks On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2023/09/05 0:38, Jason Moss 写道: > > Hi Kuai, > > > > Thank you for the suggestion, I was previously on 5.15.0. I've built > > an environment with 6.5.0.1 now and assembled the array there, but the > > same problem happens. It reshaped for 20-30 seconds, then completely > > stopped. > > > > Processes and /proc/<PID>/stack output: > > root 24593 0.0 0.0 0 0 ? I< 09:22 0:00 [raid5wq] > > root 24594 96.5 0.0 0 0 ? R 09:22 2:29 [md0_raid6] > > root 24595 0.3 0.0 0 0 ? D 09:22 0:00 [md0_reshape] > > > > [root@arch ~]# cat /proc/24593/stack > > [<0>] rescuer_thread+0x2b0/0x3b0 > > [<0>] kthread+0xe8/0x120 > > [<0>] ret_from_fork+0x34/0x50 > > [<0>] ret_from_fork_asm+0x1b/0x30 > > > > [root@arch ~]# cat /proc/24594/stack > > > > [root@arch ~]# cat /proc/24595/stack > > [<0>] reshape_request+0x416/0x9f0 [raid456] > Can you provide the addr2line result? Let's see where reshape_request() > is stuck first. > > Thanks, > Kuai > > > [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456] > > [<0>] md_do_sync+0x7d6/0x11d0 [md_mod] > > [<0>] md_thread+0xae/0x190 [md_mod] > > [<0>] kthread+0xe8/0x120 > > [<0>] ret_from_fork+0x34/0x50 > > [<0>] ret_from_fork_asm+0x1b/0x30 > > > > Please let me know if there's a better way to provide the stack info. > > > > Thank you > > > > On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> > >> Hi, > >> > >> 在 2023/09/04 5:39, Jason Moss 写道: > >>> Hello, > >>> > >>> I recently attempted to add a new drive to my 8-drive RAID 6 array, > >>> growing it to 9 drives. I've done similar before with the same array, > >>> having previously grown it from 6 drives to 7 and then from 7 to 8 > >>> with no issues. Drives are WD Reds, most older than 2019, some > >>> (including the newest) newer, but all confirmed CMR and not SMR. > >>> > >>> Process used to expand the array: > >>> mdadm --add /dev/md0 /dev/sdb1 > >>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0 > >>> > >>> The reshape started off fine, the process was underway, and the volume > >>> was still usable as expected. However, 15-30 minutes into the reshape, > >>> I lost access to the contents of the drive. Checking /proc/mdstat, the > >>> reshape was stopped at 0.6% with the counter not incrementing at all. > >>> Any process accessing the array would just hang until killed. I waited > >> > >> What kernel version are you using? 
And it'll be very helpful if you can > >> collect the stack of all stuck thread. There is a known deadlock for > >> raid5 related to reshape, and it's fixed in v6.5: > >> > >> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com > >> > >>> a half hour and there was still no further change to the counter. At > >>> this point, I restarted the server and found that when it came back up > >>> it would begin reshaping again, but only very briefly, under 30 > >>> seconds, but the counter would be increasing during that time. > >>> > >>> I searched furiously for ideas and tried stopping and reassembling the > >>> array, assembling with an invalid-backup flag, echoing "frozen" then > >>> "reshape" to the sync_action file, and echoing "max" to the sync_max > >>> file. Nothing ever seemed to make a difference. > >>> > >> > >> Don't do this before v6.5, echo "reshape" while reshape is still in > >> progress will corrupt your data: > >> > >> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com > >> > >> Thanks, > >> Kuai > >> > >>> Here is where I slightly panicked, worried that I'd borked my array, > >>> and powered off the server again and disconnected the new drive that > >>> was just added, assuming that since it was the change, it may be the > >>> problem despite having burn-in tested it, and figuring that I'll rush > >>> order a new drive, so long as the reshape continues and I can just > >>> rebuild onto a new drive once the reshape finishes. However, this made > >>> no difference and the array continued to not rebuild. > >>> > >>> Much searching later, I'd found nothing substantially different then > >>> I'd already tried and one of the common threads in other people's > >>> issues was bad drives, so I ran a self-test against each of the > >>> existing drives and found one drive that failed the read test. > >>> Thinking I had the culprit now, I dropped that drive out of the array > >>> and assembled the array again, but the same behavior persists. The > >>> array reshapes very briefly, then completely stops. > >>> > >>> Down to 0 drives of redundancy (in the reshaped section at least), not > >>> finding any new ideas on any of the forums, mailing list, wiki, etc, > >>> and very frustrated, I took a break, bought all new drives to build a > >>> new array in another server and restored from a backup. However, there > >>> is still some data not captured by the most recent backup that I would > >>> like to recover, and I'd also like to solve the problem purely to > >>> understand what happened and how to recover in the future. > >>> > >>> Is there anything else I should try to recover this array, or is this > >>> a lost cause? > >>> > >>> Details requested by the wiki to follow and I'm happy to collect any > >>> further data that would assist. /dev/sdb is the new drive that was > >>> added, then disconnected. /dev/sdh is the drive that failed a > >>> self-test and was removed from the array. > >>> > >>> Thank you in advance for any help provided! 
> >>> > >>> > >>> $ uname -a > >>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC > >>> 2023 x86_64 x86_64 x86_64 GNU/Linux > >>> > >>> $ mdadm --version > >>> mdadm - v4.2 - 2021-12-30 > >>> > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sda > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68EUZN0 > >>> Serial Number: WD-WCC4N7AT7R7X > >>> LU WWN Device Id: 5 0014ee 268545f93 > >>> Firmware Version: 82.00A82 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Rotation Rate: 5400 rpm > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:27:55 2023 PDT > >>> SMART support is: Available - device has SMART capability. > >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sda > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68EUZN0 > >>> Serial Number: WD-WCC4N7AT7R7X > >>> LU WWN Device Id: 5 0014ee 268545f93 > >>> Firmware Version: 82.00A82 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Rotation Rate: 5400 rpm > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:16 2023 PDT > >>> SMART support is: Available - device has SMART capability. > >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sdb > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68EUZN0 > >>> Serial Number: WD-WXG1A8UGLS42 > >>> LU WWN Device Id: 5 0014ee 2b75ef53b > >>> Firmware Version: 80.00A80 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Rotation Rate: 5400 rpm > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:19 2023 PDT > >>> SMART support is: Available - device has SMART capability. 
> >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sdc > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68EUZN0 > >>> Serial Number: WD-WCC4N4HYL32Y > >>> LU WWN Device Id: 5 0014ee 2630752f8 > >>> Firmware Version: 82.00A82 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Rotation Rate: 5400 rpm > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:20 2023 PDT > >>> SMART support is: Available - device has SMART capability. > >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sdd > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68N32N0 > >>> Serial Number: WD-WCC7K1FF6DYK > >>> LU WWN Device Id: 5 0014ee 2ba952a30 > >>> Firmware Version: 82.00A82 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Rotation Rate: 5400 rpm > >>> Form Factor: 3.5 inches > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-3 T13/2161-D revision 5 > >>> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:21 2023 PDT > >>> SMART support is: Available - device has SMART capability. > >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sde > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68EUZN0 > >>> Serial Number: WD-WCC4N5ZHTRJF > >>> LU WWN Device Id: 5 0014ee 2b88b83bb > >>> Firmware Version: 82.00A82 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Rotation Rate: 5400 rpm > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:22 2023 PDT > >>> SMART support is: Available - device has SMART capability. 
> >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sdf > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68AX9N0 > >>> Serial Number: WD-WMC1T3804790 > >>> LU WWN Device Id: 5 0014ee 6036b6826 > >>> Firmware Version: 80.00A80 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:23 2023 PDT > >>> SMART support is: Available - device has SMART capability. > >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sdg > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68EUZN0 > >>> Serial Number: WD-WMC4N0H692Z9 > >>> LU WWN Device Id: 5 0014ee 65af39740 > >>> Firmware Version: 82.00A82 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Rotation Rate: 5400 rpm > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:24 2023 PDT > >>> SMART support is: Available - device has SMART capability. > >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sdh > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68EUZN0 > >>> Serial Number: WD-WMC4N0K5S750 > >>> LU WWN Device Id: 5 0014ee 6b048d9ca > >>> Firmware Version: 82.00A82 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Rotation Rate: 5400 rpm > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:24 2023 PDT > >>> SMART support is: Available - device has SMART capability. 
> >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> $ sudo smartctl -H -i -l scterc /dev/sdi > >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>> > >>> === START OF INFORMATION SECTION === > >>> Model Family: Western Digital Red > >>> Device Model: WDC WD30EFRX-68AX9N0 > >>> Serial Number: WD-WMC1T1502475 > >>> LU WWN Device Id: 5 0014ee 058d2e5cb > >>> Firmware Version: 80.00A80 > >>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>> Device is: In smartctl database [for details use: -P show] > >>> ATA Version is: ACS-2 (minor revision not indicated) > >>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>> Local Time is: Sun Sep 3 13:28:27 2023 PDT > >>> SMART support is: Available - device has SMART capability. > >>> SMART support is: Enabled > >>> > >>> === START OF READ SMART DATA SECTION === > >>> SMART overall-health self-assessment test result: PASSED > >>> > >>> SCT Error Recovery Control: > >>> Read: 70 (7.0 seconds) > >>> Write: 70 (7.0 seconds) > >>> > >>> > >>> $ sudo mdadm --examine /dev/sda > >>> /dev/sda: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sda1 > >>> /dev/sda1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0xd > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247728 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : 8ca60ad5:60d19333:11b24820:91453532 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 23:12:08 2023 > >>> Bad Block Log : 512 entries available at offset 24 sectors - bad > >>> blocks present. > >>> Checksum : b6d8f4d1 - correct > >>> Events : 181105 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 7 > >>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --examine /dev/sdb > >>> /dev/sdb: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sdb1 > >>> /dev/sdb1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0x5 > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247728 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : 386d3001:16447e43:4d2a5459:85618d11 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 00:02:59 2023 > >>> Bad Block Log : 512 entries available at offset 24 sectors > >>> Checksum : b544a39 - correct > >>> Events : 181077 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 8 > >>> Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --examine /dev/sdc > >>> /dev/sdc: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sdc1 > >>> /dev/sdc1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0xd > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247720 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 23:12:08 2023 > >>> Bad Block Log : 512 entries available at offset 72 sectors - bad > >>> blocks present. > >>> Checksum : 88d8b8fc - correct > >>> Events : 181105 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 4 > >>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --examine /dev/sdd > >>> /dev/sdd: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sdd1 > >>> /dev/sdd1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0x5 > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247728 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 23:12:08 2023 > >>> Bad Block Log : 512 entries available at offset 24 sectors > >>> Checksum : d1471d9d - correct > >>> Events : 181105 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 6 > >>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --examine /dev/sde > >>> /dev/sde: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sde1 > >>> /dev/sde1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0x5 > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247720 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 23:12:08 2023 > >>> Bad Block Log : 512 entries available at offset 72 sectors > >>> Checksum : e05d0278 - correct > >>> Events : 181105 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 5 > >>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --examine /dev/sdf > >>> /dev/sdf: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sdf1 > >>> /dev/sdf1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0x5 > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247720 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 23:12:08 2023 > >>> Bad Block Log : 512 entries available at offset 72 sectors > >>> Checksum : 26792cc0 - correct > >>> Events : 181105 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 0 > >>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --examine /dev/sdg > >>> /dev/sdg: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sdg1 > >>> /dev/sdg1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0x5 > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247720 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : 74476ce7:4edc23f6:08120711:ba281425 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 23:12:08 2023 > >>> Bad Block Log : 512 entries available at offset 72 sectors > >>> Checksum : 6f67d179 - correct > >>> Events : 181105 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 1 > >>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --examine /dev/sdh > >>> /dev/sdh: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sdh1 > >>> /dev/sdh1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0xd > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247720 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 20:09:14 2023 > >>> Bad Block Log : 512 entries available at offset 72 sectors - bad > >>> blocks present. > >>> Checksum : b7696b68 - correct > >>> Events : 181089 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 2 > >>> Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --examine /dev/sdi > >>> /dev/sdi: > >>> MBR Magic : aa55 > >>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>> $ sudo mdadm --examine /dev/sdi1 > >>> /dev/sdi1: > >>> Magic : a92b4efc > >>> Version : 1.2 > >>> Feature Map : 0x5 > >>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Name : Blyth:0 (local to host Blyth) > >>> Creation Time : Tue Aug 4 23:47:57 2015 > >>> Raid Level : raid6 > >>> Raid Devices : 9 > >>> > >>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>> Data Offset : 247808 sectors > >>> Super Offset : 8 sectors > >>> Unused Space : before=247720 sectors, after=14336 sectors > >>> State : clean > >>> Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483 > >>> > >>> Internal Bitmap : 8 sectors from superblock > >>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>> Delta Devices : 1 (8->9) > >>> > >>> Update Time : Tue Jul 11 23:12:08 2023 > >>> Bad Block Log : 512 entries available at offset 72 sectors > >>> Checksum : 23b6d024 - correct > >>> Events : 181105 > >>> > >>> Layout : left-symmetric > >>> Chunk Size : 512K > >>> > >>> Device Role : Active device 3 > >>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>> > >>> $ sudo mdadm --detail /dev/md0 > >>> /dev/md0: > >>> Version : 1.2 > >>> Raid Level : raid6 > >>> Total Devices : 9 > >>> Persistence : Superblock is persistent > >>> > >>> State : inactive > >>> Working Devices : 9 > >>> > >>> Delta Devices : 1, (-1->0) > >>> New Level : raid6 > >>> New Layout : left-symmetric > >>> New Chunksize : 512K > >>> > >>> Name : Blyth:0 (local to host Blyth) > >>> UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>> Events : 181105 > >>> > >>> Number Major Minor RaidDevice > >>> > >>> - 8 1 - /dev/sda1 > >>> - 8 129 - /dev/sdi1 > >>> - 8 113 - /dev/sdh1 > >>> - 8 97 - /dev/sdg1 > >>> - 8 81 - /dev/sdf1 > >>> - 8 65 - /dev/sde1 > >>> - 8 49 - /dev/sdd1 > >>> - 8 33 - /dev/sdc1 > >>> - 8 17 - /dev/sdb1 > >>> > >>> $ cat /proc/mdstat > >>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] > >>> [raid4] [raid10] > >>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S) > >>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S) > >>> 26353689600 blocks super 1.2 > >>> > >>> unused devices: <none> > >>> > >>> . > >>> > >> > > > > . > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
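The stack collection requested above can be scripted roughly as follows. This is only a sketch: it assumes the array is md0 (so the kernel threads are named md0_raid6 and md0_reshape, alongside the raid5wq workqueue rescuer), that it is run as root, and that the raid456.ko path handed to gdb is adjusted to wherever the debug-info module lives on the affected system.

  # Dump the kernel stack of every md thread serving /dev/md0.
  # /proc/<pid>/stack is only readable by root.
  for pid in $(pgrep 'md0_|raid5wq'); do
      echo "=== $(cat /proc/$pid/comm) (pid $pid) ==="
      cat /proc/$pid/stack
  done

  # To turn an entry such as reshape_request+0x416 into a source line, one
  # option (the approach taken later in this thread) is gdb against the
  # module object built with debug info; the path below is illustrative:
  #   gdb -batch -ex 'list *(reshape_request+0x416)' raid456.ko
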
* Re: Reshape Failure 2023-09-06 14:05 ` Jason Moss @ 2023-09-07 1:38 ` Yu Kuai 2023-09-07 5:44 ` Jason Moss 0 siblings, 1 reply; 21+ messages in thread From: Yu Kuai @ 2023-09-07 1:38 UTC (permalink / raw) To: Jason Moss, Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C) Hi, 在 2023/09/06 22:05, Jason Moss 写道: > Hi Kuai, > > I ended up using gdb rather than addr2line, as that output didn't give > me the global offset. Maybe there's a better way, but this seems to be > similar to what I expected. It's ok. > > (gdb) list *(reshape_request+0x416) > 0x11566 is in reshape_request (drivers/md/raid5.c:6396). > 6391 if ((mddev->reshape_backwards > 6392 ? (safepos > writepos && readpos < writepos) > 6393 : (safepos < writepos && readpos > writepos)) || > 6394 time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) { > 6395 /* Cannot proceed until we've updated the > superblock... */ > 6396 wait_event(conf->wait_for_overlap, > 6397 atomic_read(&conf->reshape_stripes)==0 > 6398 || test_bit(MD_RECOVERY_INTR, If reshape is stuck here, which means: 1) Either reshape io is stuck somewhere and never complete; 2) Or the counter reshape_stripes is broken; Can you read following debugfs files to verify if io is stuck in underlying disk? /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch} Furthermore, echo frozen should break above wait_event() because 'MD_RECOVERY_INTR' will be set, however, based on your description, the problem still exist. Can you collect stack and addr2line result of stuck thread after echo frozen? Thanks, Kuai > &mddev->recovery)); > 6399 if (atomic_read(&conf->reshape_stripes) != 0) > 6400 return 0; > > Thanks > > On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> Hi, >> >> 在 2023/09/05 0:38, Jason Moss 写道: >>> Hi Kuai, >>> >>> Thank you for the suggestion, I was previously on 5.15.0. I've built >>> an environment with 6.5.0.1 now and assembled the array there, but the >>> same problem happens. It reshaped for 20-30 seconds, then completely >>> stopped. >>> >>> Processes and /proc/<PID>/stack output: >>> root 24593 0.0 0.0 0 0 ? I< 09:22 0:00 [raid5wq] >>> root 24594 96.5 0.0 0 0 ? R 09:22 2:29 [md0_raid6] >>> root 24595 0.3 0.0 0 0 ? D 09:22 0:00 [md0_reshape] >>> >>> [root@arch ~]# cat /proc/24593/stack >>> [<0>] rescuer_thread+0x2b0/0x3b0 >>> [<0>] kthread+0xe8/0x120 >>> [<0>] ret_from_fork+0x34/0x50 >>> [<0>] ret_from_fork_asm+0x1b/0x30 >>> >>> [root@arch ~]# cat /proc/24594/stack >>> >>> [root@arch ~]# cat /proc/24595/stack >>> [<0>] reshape_request+0x416/0x9f0 [raid456] >> Can you provide the addr2line result? Let's see where reshape_request() >> is stuck first. >> >> Thanks, >> Kuai >> >>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456] >>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod] >>> [<0>] md_thread+0xae/0x190 [md_mod] >>> [<0>] kthread+0xe8/0x120 >>> [<0>] ret_from_fork+0x34/0x50 >>> [<0>] ret_from_fork_asm+0x1b/0x30 >>> >>> Please let me know if there's a better way to provide the stack info. >>> >>> Thank you >>> >>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >>>> >>>> Hi, >>>> >>>> 在 2023/09/04 5:39, Jason Moss 写道: >>>>> Hello, >>>>> >>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array, >>>>> growing it to 9 drives. I've done similar before with the same array, >>>>> having previously grown it from 6 drives to 7 and then from 7 to 8 >>>>> with no issues. Drives are WD Reds, most older than 2019, some >>>>> (including the newest) newer, but all confirmed CMR and not SMR. 
>>>>> >>>>> Process used to expand the array: >>>>> mdadm --add /dev/md0 /dev/sdb1 >>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0 >>>>> >>>>> The reshape started off fine, the process was underway, and the volume >>>>> was still usable as expected. However, 15-30 minutes into the reshape, >>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the >>>>> reshape was stopped at 0.6% with the counter not incrementing at all. >>>>> Any process accessing the array would just hang until killed. I waited >>>> >>>> What kernel version are you using? And it'll be very helpful if you can >>>> collect the stack of all stuck thread. There is a known deadlock for >>>> raid5 related to reshape, and it's fixed in v6.5: >>>> >>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com >>>> >>>>> a half hour and there was still no further change to the counter. At >>>>> this point, I restarted the server and found that when it came back up >>>>> it would begin reshaping again, but only very briefly, under 30 >>>>> seconds, but the counter would be increasing during that time. >>>>> >>>>> I searched furiously for ideas and tried stopping and reassembling the >>>>> array, assembling with an invalid-backup flag, echoing "frozen" then >>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max >>>>> file. Nothing ever seemed to make a difference. >>>>> >>>> >>>> Don't do this before v6.5, echo "reshape" while reshape is still in >>>> progress will corrupt your data: >>>> >>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com >>>> >>>> Thanks, >>>> Kuai >>>> >>>>> Here is where I slightly panicked, worried that I'd borked my array, >>>>> and powered off the server again and disconnected the new drive that >>>>> was just added, assuming that since it was the change, it may be the >>>>> problem despite having burn-in tested it, and figuring that I'll rush >>>>> order a new drive, so long as the reshape continues and I can just >>>>> rebuild onto a new drive once the reshape finishes. However, this made >>>>> no difference and the array continued to not rebuild. >>>>> >>>>> Much searching later, I'd found nothing substantially different then >>>>> I'd already tried and one of the common threads in other people's >>>>> issues was bad drives, so I ran a self-test against each of the >>>>> existing drives and found one drive that failed the read test. >>>>> Thinking I had the culprit now, I dropped that drive out of the array >>>>> and assembled the array again, but the same behavior persists. The >>>>> array reshapes very briefly, then completely stops. >>>>> >>>>> Down to 0 drives of redundancy (in the reshaped section at least), not >>>>> finding any new ideas on any of the forums, mailing list, wiki, etc, >>>>> and very frustrated, I took a break, bought all new drives to build a >>>>> new array in another server and restored from a backup. However, there >>>>> is still some data not captured by the most recent backup that I would >>>>> like to recover, and I'd also like to solve the problem purely to >>>>> understand what happened and how to recover in the future. >>>>> >>>>> Is there anything else I should try to recover this array, or is this >>>>> a lost cause? >>>>> >>>>> Details requested by the wiki to follow and I'm happy to collect any >>>>> further data that would assist. /dev/sdb is the new drive that was >>>>> added, then disconnected. 
/dev/sdh is the drive that failed a >>>>> self-test and was removed from the array. >>>>> >>>>> Thank you in advance for any help provided! >>>>> >>>>> >>>>> $ uname -a >>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC >>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux >>>>> >>>>> $ mdadm --version >>>>> mdadm - v4.2 - 2021-12-30 >>>>> >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sda >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>> Serial Number: WD-WCC4N7AT7R7X >>>>> LU WWN Device Id: 5 0014ee 268545f93 >>>>> Firmware Version: 82.00A82 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Rotation Rate: 5400 rpm >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:27:55 2023 PDT >>>>> SMART support is: Available - device has SMART capability. >>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sda >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>> Serial Number: WD-WCC4N7AT7R7X >>>>> LU WWN Device Id: 5 0014ee 268545f93 >>>>> Firmware Version: 82.00A82 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Rotation Rate: 5400 rpm >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:16 2023 PDT >>>>> SMART support is: Available - device has SMART capability. 
>>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdb >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>> Serial Number: WD-WXG1A8UGLS42 >>>>> LU WWN Device Id: 5 0014ee 2b75ef53b >>>>> Firmware Version: 80.00A80 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Rotation Rate: 5400 rpm >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:19 2023 PDT >>>>> SMART support is: Available - device has SMART capability. >>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdc >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>> Serial Number: WD-WCC4N4HYL32Y >>>>> LU WWN Device Id: 5 0014ee 2630752f8 >>>>> Firmware Version: 82.00A82 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Rotation Rate: 5400 rpm >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:20 2023 PDT >>>>> SMART support is: Available - device has SMART capability. >>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdd >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68N32N0 >>>>> Serial Number: WD-WCC7K1FF6DYK >>>>> LU WWN Device Id: 5 0014ee 2ba952a30 >>>>> Firmware Version: 82.00A82 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Rotation Rate: 5400 rpm >>>>> Form Factor: 3.5 inches >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-3 T13/2161-D revision 5 >>>>> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:21 2023 PDT >>>>> SMART support is: Available - device has SMART capability. 
>>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sde >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>> Serial Number: WD-WCC4N5ZHTRJF >>>>> LU WWN Device Id: 5 0014ee 2b88b83bb >>>>> Firmware Version: 82.00A82 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Rotation Rate: 5400 rpm >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:22 2023 PDT >>>>> SMART support is: Available - device has SMART capability. >>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdf >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68AX9N0 >>>>> Serial Number: WD-WMC1T3804790 >>>>> LU WWN Device Id: 5 0014ee 6036b6826 >>>>> Firmware Version: 80.00A80 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:23 2023 PDT >>>>> SMART support is: Available - device has SMART capability. >>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdg >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>> Serial Number: WD-WMC4N0H692Z9 >>>>> LU WWN Device Id: 5 0014ee 65af39740 >>>>> Firmware Version: 82.00A82 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Rotation Rate: 5400 rpm >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT >>>>> SMART support is: Available - device has SMART capability. 
>>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdh >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>> Serial Number: WD-WMC4N0K5S750 >>>>> LU WWN Device Id: 5 0014ee 6b048d9ca >>>>> Firmware Version: 82.00A82 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Rotation Rate: 5400 rpm >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT >>>>> SMART support is: Available - device has SMART capability. >>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdi >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>> >>>>> === START OF INFORMATION SECTION === >>>>> Model Family: Western Digital Red >>>>> Device Model: WDC WD30EFRX-68AX9N0 >>>>> Serial Number: WD-WMC1T1502475 >>>>> LU WWN Device Id: 5 0014ee 058d2e5cb >>>>> Firmware Version: 80.00A80 >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> Device is: In smartctl database [for details use: -P show] >>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>> Local Time is: Sun Sep 3 13:28:27 2023 PDT >>>>> SMART support is: Available - device has SMART capability. 
>>>>> SMART support is: Enabled >>>>> >>>>> === START OF READ SMART DATA SECTION === >>>>> SMART overall-health self-assessment test result: PASSED >>>>> >>>>> SCT Error Recovery Control: >>>>> Read: 70 (7.0 seconds) >>>>> Write: 70 (7.0 seconds) >>>>> >>>>> >>>>> $ sudo mdadm --examine /dev/sda >>>>> /dev/sda: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sda1 >>>>> /dev/sda1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0xd >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247728 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : 8ca60ad5:60d19333:11b24820:91453532 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>> Bad Block Log : 512 entries available at offset 24 sectors - bad >>>>> blocks present. >>>>> Checksum : b6d8f4d1 - correct >>>>> Events : 181105 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 7 >>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --examine /dev/sdb >>>>> /dev/sdb: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sdb1 >>>>> /dev/sdb1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0x5 >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247728 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : 386d3001:16447e43:4d2a5459:85618d11 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 00:02:59 2023 >>>>> Bad Block Log : 512 entries available at offset 24 sectors >>>>> Checksum : b544a39 - correct >>>>> Events : 181077 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 8 >>>>> Array State : AAAAAAAAA ('A' == active, '.' 
== missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --examine /dev/sdc >>>>> /dev/sdc: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sdc1 >>>>> /dev/sdc1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0xd >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad >>>>> blocks present. >>>>> Checksum : 88d8b8fc - correct >>>>> Events : 181105 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 4 >>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --examine /dev/sdd >>>>> /dev/sdd: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sdd1 >>>>> /dev/sdd1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0x5 >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247728 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>> Bad Block Log : 512 entries available at offset 24 sectors >>>>> Checksum : d1471d9d - correct >>>>> Events : 181105 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 6 >>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --examine /dev/sde >>>>> /dev/sde: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sde1 >>>>> /dev/sde1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0x5 >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>> Bad Block Log : 512 entries available at offset 72 sectors >>>>> Checksum : e05d0278 - correct >>>>> Events : 181105 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 5 >>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --examine /dev/sdf >>>>> /dev/sdf: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sdf1 >>>>> /dev/sdf1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0x5 >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>> Bad Block Log : 512 entries available at offset 72 sectors >>>>> Checksum : 26792cc0 - correct >>>>> Events : 181105 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 0 >>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --examine /dev/sdg >>>>> /dev/sdg: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sdg1 >>>>> /dev/sdg1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0x5 >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : 74476ce7:4edc23f6:08120711:ba281425 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>> Bad Block Log : 512 entries available at offset 72 sectors >>>>> Checksum : 6f67d179 - correct >>>>> Events : 181105 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 1 >>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --examine /dev/sdh >>>>> /dev/sdh: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sdh1 >>>>> /dev/sdh1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0xd >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 20:09:14 2023 >>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad >>>>> blocks present. >>>>> Checksum : b7696b68 - correct >>>>> Events : 181089 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 2 >>>>> Array State : AAAAAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --examine /dev/sdi >>>>> /dev/sdi: >>>>> MBR Magic : aa55 >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>> $ sudo mdadm --examine /dev/sdi1 >>>>> /dev/sdi1: >>>>> Magic : a92b4efc >>>>> Version : 1.2 >>>>> Feature Map : 0x5 >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Name : Blyth:0 (local to host Blyth) >>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>> Raid Level : raid6 >>>>> Raid Devices : 9 >>>>> >>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>> Data Offset : 247808 sectors >>>>> Super Offset : 8 sectors >>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>> State : clean >>>>> Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483 >>>>> >>>>> Internal Bitmap : 8 sectors from superblock >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>> Delta Devices : 1 (8->9) >>>>> >>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>> Bad Block Log : 512 entries available at offset 72 sectors >>>>> Checksum : 23b6d024 - correct >>>>> Events : 181105 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 512K >>>>> >>>>> Device Role : Active device 3 >>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>> >>>>> $ sudo mdadm --detail /dev/md0 >>>>> /dev/md0: >>>>> Version : 1.2 >>>>> Raid Level : raid6 >>>>> Total Devices : 9 >>>>> Persistence : Superblock is persistent >>>>> >>>>> State : inactive >>>>> Working Devices : 9 >>>>> >>>>> Delta Devices : 1, (-1->0) >>>>> New Level : raid6 >>>>> New Layout : left-symmetric >>>>> New Chunksize : 512K >>>>> >>>>> Name : Blyth:0 (local to host Blyth) >>>>> UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>> Events : 181105 >>>>> >>>>> Number Major Minor RaidDevice >>>>> >>>>> - 8 1 - /dev/sda1 >>>>> - 8 129 - /dev/sdi1 >>>>> - 8 113 - /dev/sdh1 >>>>> - 8 97 - /dev/sdg1 >>>>> - 8 81 - /dev/sdf1 >>>>> - 8 65 - /dev/sde1 >>>>> - 8 49 - /dev/sdd1 >>>>> - 8 33 - /dev/sdc1 >>>>> - 8 17 - /dev/sdb1 >>>>> >>>>> $ cat /proc/mdstat >>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] >>>>> [raid4] [raid10] >>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S) >>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S) >>>>> 26353689600 blocks super 1.2 >>>>> >>>>> unused devices: <none> >>>>> >>>>> . >>>>> >>>> >>> >>> . >>> >> > > . > ^ permalink raw reply [flat|nested] 21+ messages in thread
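The two checks requested above (per-disk blk-mq state from debugfs, then freezing the reshape and re-capturing the stack) can be scripted roughly as follows. This is only a sketch: it assumes the array is /dev/md0, that debugfs is mounted at /sys/kernel/debug, that it is run as root, and that the member disks are sda through sdi; substitute the device names reported by mdadm --detail.

  # 1) Look for requests stuck below md in the member disks' blk-mq queues.
  #    A "busy" count that stays non-zero suggests I/O stuck in a lower layer.
  for disk in sda sdb sdc sdd sde sdf sdg sdh sdi; do
      echo "=== $disk ==="
      cat /sys/kernel/debug/block/$disk/hctx*/{sched_tags,tags,busy,dispatch}
  done

  # 2) Try to interrupt the reshape (this should set MD_RECOVERY_INTR and
  #    break the wait_event discussed above), then record where the reshape
  #    thread is waiting afterwards.
  echo frozen > /sys/block/md0/md/sync_action
  cat /proc/$(pgrep md0_reshape)/stack
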
* Re: Reshape Failure 2023-09-07 1:38 ` Yu Kuai @ 2023-09-07 5:44 ` Jason Moss [not found] ` <79aa3cf3-78d4-cfc6-8d3b-eb8704ffaba1@huaweicloud.com> 0 siblings, 1 reply; 21+ messages in thread From: Jason Moss @ 2023-09-07 5:44 UTC (permalink / raw) To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C) Hi, On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2023/09/06 22:05, Jason Moss 写道: > > Hi Kuai, > > > > I ended up using gdb rather than addr2line, as that output didn't give > > me the global offset. Maybe there's a better way, but this seems to be > > similar to what I expected. > > It's ok. > > > > (gdb) list *(reshape_request+0x416) > > 0x11566 is in reshape_request (drivers/md/raid5.c:6396). > > 6391 if ((mddev->reshape_backwards > > 6392 ? (safepos > writepos && readpos < writepos) > > 6393 : (safepos < writepos && readpos > writepos)) || > > 6394 time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) { > > 6395 /* Cannot proceed until we've updated the > > superblock... */ > > 6396 wait_event(conf->wait_for_overlap, > > 6397 atomic_read(&conf->reshape_stripes)==0 > > 6398 || test_bit(MD_RECOVERY_INTR, > > If reshape is stuck here, which means: > > 1) Either reshape io is stuck somewhere and never complete; > 2) Or the counter reshape_stripes is broken; > > Can you read following debugfs files to verify if io is stuck in > underlying disk? > > /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch} > I'll attach this below. > Furthermore, echo frozen should break above wait_event() because > 'MD_RECOVERY_INTR' will be set, however, based on your description, > the problem still exist. Can you collect stack and addr2line result > of stuck thread after echo frozen? > I echo'd frozen to /sys/block/md0/md/sync_action, however the echo call has been sitting for about 30 minutes, maybe longer, and has not returned. Here's the current state: root 454 0.0 0.0 0 0 ? I< Sep05 0:00 [raid5wq] root 455 0.0 0.0 34680 5988 ? D Sep05 0:00 (udev-worker) root 456 99.9 0.0 0 0 ? R Sep05 1543:40 [md0_raid6] root 457 0.0 0.0 0 0 ? D Sep05 0:00 [md0_reshape] [jason@arch md]$ sudo cat /proc/457/stack [<0>] md_do_sync+0xef2/0x11d0 [md_mod] [<0>] md_thread+0xae/0x190 [md_mod] [<0>] kthread+0xe8/0x120 [<0>] ret_from_fork+0x34/0x50 [<0>] ret_from_fork_asm+0x1b/0x30 Reading symbols from md-mod.ko... (gdb) list *(md_do_sync+0xef2) 0xb3a2 is in md_do_sync (drivers/md/md.c:9035). 9030 ? 
"interrupted" : "done"); 9031 /* 9032 * this also signals 'finished resyncing' to md_stop 9033 */ 9034 blk_finish_plug(&plug); 9035 wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); 9036 9037 if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && 9038 !test_bit(MD_RECOVERY_INTR, &mddev->recovery) && 9039 mddev->curr_resync >= MD_RESYNC_ACTIVE) { The debugfs info: [root@arch ~]# cat /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch} nr_tags=64 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=64 busy=1 cleared=55 bits_per_word=16 map_nr=4 alloc_hint={40, 20, 46, 0} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=1 min_shallow_depth=48 nr_tags=32 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=32 busy=0 cleared=27 bits_per_word=8 map_nr=4 alloc_hint={19, 26, 5, 21} wake_batch=4 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=1 min_shallow_depth=4294967295 [root@arch ~]# cat /sys/kernel/debug/block/sdb/hctx* /{sched_tags,tags,busy,dispatch} nr_tags=64 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=64 busy=1 cleared=56 bits_per_word=16 map_nr=4 alloc_hint={57, 43, 14, 19} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=1 min_shallow_depth=48 nr_tags=32 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=32 busy=0 cleared=24 bits_per_word=8 map_nr=4 alloc_hint={17, 13, 23, 17} wake_batch=4 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=1 min_shallow_depth=4294967295 [root@arch ~]# cat /sys/kernel/debug/block/sdd/hctx*/{sched_tags,tags,busy,dispatch} nr_tags=64 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=64 busy=1 cleared=51 bits_per_word=16 map_nr=4 alloc_hint={36, 43, 15, 7} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=1 min_shallow_depth=48 nr_tags=32 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=32 busy=0 cleared=31 bits_per_word=8 map_nr=4 alloc_hint={0, 15, 1, 22} wake_batch=4 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=1 min_shallow_depth=4294967295 [root@arch ~]# cat /sys/kernel/debug/block/sdf/hctx*/{sched_tags,tags,busy,dispatch} nr_tags=256 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=256 busy=1 cleared=131 bits_per_word=64 map_nr=4 alloc_hint={125, 46, 83, 205} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=0 min_shallow_depth=192 nr_tags=10104 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=10104 busy=0 cleared=235 bits_per_word=64 map_nr=158 alloc_hint={503, 2913, 9827, 9851} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, 
{.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=0 min_shallow_depth=4294967295 [root@arch ~]# cat /sys/kernel/debug/block/sdh/hctx*/{sched_tags,tags,busy,dispatch} nr_tags=256 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=256 busy=1 cleared=97 bits_per_word=64 map_nr=4 alloc_hint={144, 144, 127, 254} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=0 min_shallow_depth=192 nr_tags=10104 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=10104 busy=0 cleared=235 bits_per_word=64 map_nr=158 alloc_hint={503, 2913, 9827, 9851} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=0 min_shallow_depth=4294967295 [root@arch ~]# cat /sys/kernel/debug/block/sdi/hctx*/{sched_tags,tags,busy,dispatch} nr_tags=256 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=256 busy=1 cleared=34 bits_per_word=64 map_nr=4 alloc_hint={197, 20, 1, 230} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=0 min_shallow_depth=192 nr_tags=10104 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=10104 busy=0 cleared=235 bits_per_word=64 map_nr=158 alloc_hint={503, 2913, 9827, 9851} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=0 min_shallow_depth=4294967295 [root@arch ~]# cat /sys/kernel/debug/block/sdj/hctx*/{sched_tags,tags,busy,dispatch} nr_tags=256 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=256 busy=1 cleared=27 bits_per_word=64 map_nr=4 alloc_hint={132, 74, 129, 76} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=0 min_shallow_depth=192 nr_tags=10104 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=10104 busy=0 cleared=235 bits_per_word=64 map_nr=158 alloc_hint={503, 2913, 9827, 9851} wake_batch=8 wake_index=0 ws_active=0 ws={ {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, {.wait=inactive}, } round_robin=0 min_shallow_depth=4294967295 Thanks for your continued assistance with this! Jason > Thanks, > Kuai > > > &mddev->recovery)); > > 6399 if (atomic_read(&conf->reshape_stripes) != 0) > > 6400 return 0; > > > > Thanks > > > > On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> > >> Hi, > >> > >> 在 2023/09/05 0:38, Jason Moss 写道: > >>> Hi Kuai, > >>> > >>> Thank you for the suggestion, I was previously on 5.15.0. I've built > >>> an environment with 6.5.0.1 now and assembled the array there, but the > >>> same problem happens. It reshaped for 20-30 seconds, then completely > >>> stopped. > >>> > >>> Processes and /proc/<PID>/stack output: > >>> root 24593 0.0 0.0 0 0 ? I< 09:22 0:00 [raid5wq] > >>> root 24594 96.5 0.0 0 0 ? R 09:22 2:29 [md0_raid6] > >>> root 24595 0.3 0.0 0 0 ? 
D 09:22 0:00 [md0_reshape] > >>> > >>> [root@arch ~]# cat /proc/24593/stack > >>> [<0>] rescuer_thread+0x2b0/0x3b0 > >>> [<0>] kthread+0xe8/0x120 > >>> [<0>] ret_from_fork+0x34/0x50 > >>> [<0>] ret_from_fork_asm+0x1b/0x30 > >>> > >>> [root@arch ~]# cat /proc/24594/stack > >>> > >>> [root@arch ~]# cat /proc/24595/stack > >>> [<0>] reshape_request+0x416/0x9f0 [raid456] > >> Can you provide the addr2line result? Let's see where reshape_request() > >> is stuck first. > >> > >> Thanks, > >> Kuai > >> > >>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456] > >>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod] > >>> [<0>] md_thread+0xae/0x190 [md_mod] > >>> [<0>] kthread+0xe8/0x120 > >>> [<0>] ret_from_fork+0x34/0x50 > >>> [<0>] ret_from_fork_asm+0x1b/0x30 > >>> > >>> Please let me know if there's a better way to provide the stack info. > >>> > >>> Thank you > >>> > >>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >>>> > >>>> Hi, > >>>> > >>>> 在 2023/09/04 5:39, Jason Moss 写道: > >>>>> Hello, > >>>>> > >>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array, > >>>>> growing it to 9 drives. I've done similar before with the same array, > >>>>> having previously grown it from 6 drives to 7 and then from 7 to 8 > >>>>> with no issues. Drives are WD Reds, most older than 2019, some > >>>>> (including the newest) newer, but all confirmed CMR and not SMR. > >>>>> > >>>>> Process used to expand the array: > >>>>> mdadm --add /dev/md0 /dev/sdb1 > >>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0 > >>>>> > >>>>> The reshape started off fine, the process was underway, and the volume > >>>>> was still usable as expected. However, 15-30 minutes into the reshape, > >>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the > >>>>> reshape was stopped at 0.6% with the counter not incrementing at all. > >>>>> Any process accessing the array would just hang until killed. I waited > >>>> > >>>> What kernel version are you using? And it'll be very helpful if you can > >>>> collect the stack of all stuck thread. There is a known deadlock for > >>>> raid5 related to reshape, and it's fixed in v6.5: > >>>> > >>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com > >>>> > >>>>> a half hour and there was still no further change to the counter. At > >>>>> this point, I restarted the server and found that when it came back up > >>>>> it would begin reshaping again, but only very briefly, under 30 > >>>>> seconds, but the counter would be increasing during that time. > >>>>> > >>>>> I searched furiously for ideas and tried stopping and reassembling the > >>>>> array, assembling with an invalid-backup flag, echoing "frozen" then > >>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max > >>>>> file. Nothing ever seemed to make a difference. 
> >>>>> > >>>> > >>>> Don't do this before v6.5, echo "reshape" while reshape is still in > >>>> progress will corrupt your data: > >>>> > >>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com > >>>> > >>>> Thanks, > >>>> Kuai > >>>> > >>>>> Here is where I slightly panicked, worried that I'd borked my array, > >>>>> and powered off the server again and disconnected the new drive that > >>>>> was just added, assuming that since it was the change, it may be the > >>>>> problem despite having burn-in tested it, and figuring that I'll rush > >>>>> order a new drive, so long as the reshape continues and I can just > >>>>> rebuild onto a new drive once the reshape finishes. However, this made > >>>>> no difference and the array continued to not rebuild. > >>>>> > >>>>> Much searching later, I'd found nothing substantially different then > >>>>> I'd already tried and one of the common threads in other people's > >>>>> issues was bad drives, so I ran a self-test against each of the > >>>>> existing drives and found one drive that failed the read test. > >>>>> Thinking I had the culprit now, I dropped that drive out of the array > >>>>> and assembled the array again, but the same behavior persists. The > >>>>> array reshapes very briefly, then completely stops. > >>>>> > >>>>> Down to 0 drives of redundancy (in the reshaped section at least), not > >>>>> finding any new ideas on any of the forums, mailing list, wiki, etc, > >>>>> and very frustrated, I took a break, bought all new drives to build a > >>>>> new array in another server and restored from a backup. However, there > >>>>> is still some data not captured by the most recent backup that I would > >>>>> like to recover, and I'd also like to solve the problem purely to > >>>>> understand what happened and how to recover in the future. > >>>>> > >>>>> Is there anything else I should try to recover this array, or is this > >>>>> a lost cause? > >>>>> > >>>>> Details requested by the wiki to follow and I'm happy to collect any > >>>>> further data that would assist. /dev/sdb is the new drive that was > >>>>> added, then disconnected. /dev/sdh is the drive that failed a > >>>>> self-test and was removed from the array. > >>>>> > >>>>> Thank you in advance for any help provided! > >>>>> > >>>>> > >>>>> $ uname -a > >>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC > >>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux > >>>>> > >>>>> $ mdadm --version > >>>>> mdadm - v4.2 - 2021-12-30 > >>>>> > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sda > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>> Serial Number: WD-WCC4N7AT7R7X > >>>>> LU WWN Device Id: 5 0014ee 268545f93 > >>>>> Firmware Version: 82.00A82 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Rotation Rate: 5400 rpm > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:27:55 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. 
> >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sda > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>> Serial Number: WD-WCC4N7AT7R7X > >>>>> LU WWN Device Id: 5 0014ee 268545f93 > >>>>> Firmware Version: 82.00A82 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Rotation Rate: 5400 rpm > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:16 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. > >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sdb > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>> Serial Number: WD-WXG1A8UGLS42 > >>>>> LU WWN Device Id: 5 0014ee 2b75ef53b > >>>>> Firmware Version: 80.00A80 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Rotation Rate: 5400 rpm > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:19 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. 
> >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sdc > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>> Serial Number: WD-WCC4N4HYL32Y > >>>>> LU WWN Device Id: 5 0014ee 2630752f8 > >>>>> Firmware Version: 82.00A82 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Rotation Rate: 5400 rpm > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:20 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. > >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sdd > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68N32N0 > >>>>> Serial Number: WD-WCC7K1FF6DYK > >>>>> LU WWN Device Id: 5 0014ee 2ba952a30 > >>>>> Firmware Version: 82.00A82 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Rotation Rate: 5400 rpm > >>>>> Form Factor: 3.5 inches > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-3 T13/2161-D revision 5 > >>>>> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:21 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. 
> >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sde > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>> Serial Number: WD-WCC4N5ZHTRJF > >>>>> LU WWN Device Id: 5 0014ee 2b88b83bb > >>>>> Firmware Version: 82.00A82 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Rotation Rate: 5400 rpm > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:22 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. > >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sdf > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68AX9N0 > >>>>> Serial Number: WD-WMC1T3804790 > >>>>> LU WWN Device Id: 5 0014ee 6036b6826 > >>>>> Firmware Version: 80.00A80 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:23 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. 
> >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sdg > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>> Serial Number: WD-WMC4N0H692Z9 > >>>>> LU WWN Device Id: 5 0014ee 65af39740 > >>>>> Firmware Version: 82.00A82 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Rotation Rate: 5400 rpm > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. > >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sdh > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>> Serial Number: WD-WMC4N0K5S750 > >>>>> LU WWN Device Id: 5 0014ee 6b048d9ca > >>>>> Firmware Version: 82.00A82 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Rotation Rate: 5400 rpm > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. 
> >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> $ sudo smartctl -H -i -l scterc /dev/sdi > >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>> > >>>>> === START OF INFORMATION SECTION === > >>>>> Model Family: Western Digital Red > >>>>> Device Model: WDC WD30EFRX-68AX9N0 > >>>>> Serial Number: WD-WMC1T1502475 > >>>>> LU WWN Device Id: 5 0014ee 058d2e5cb > >>>>> Firmware Version: 80.00A80 > >>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>> Device is: In smartctl database [for details use: -P show] > >>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>> Local Time is: Sun Sep 3 13:28:27 2023 PDT > >>>>> SMART support is: Available - device has SMART capability. > >>>>> SMART support is: Enabled > >>>>> > >>>>> === START OF READ SMART DATA SECTION === > >>>>> SMART overall-health self-assessment test result: PASSED > >>>>> > >>>>> SCT Error Recovery Control: > >>>>> Read: 70 (7.0 seconds) > >>>>> Write: 70 (7.0 seconds) > >>>>> > >>>>> > >>>>> $ sudo mdadm --examine /dev/sda > >>>>> /dev/sda: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sda1 > >>>>> /dev/sda1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0xd > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247728 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : 8ca60ad5:60d19333:11b24820:91453532 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>> Bad Block Log : 512 entries available at offset 24 sectors - bad > >>>>> blocks present. > >>>>> Checksum : b6d8f4d1 - correct > >>>>> Events : 181105 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 7 > >>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --examine /dev/sdb > >>>>> /dev/sdb: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sdb1 > >>>>> /dev/sdb1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0x5 > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247728 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : 386d3001:16447e43:4d2a5459:85618d11 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 00:02:59 2023 > >>>>> Bad Block Log : 512 entries available at offset 24 sectors > >>>>> Checksum : b544a39 - correct > >>>>> Events : 181077 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 8 > >>>>> Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --examine /dev/sdc > >>>>> /dev/sdc: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sdc1 > >>>>> /dev/sdc1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0xd > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad > >>>>> blocks present. > >>>>> Checksum : 88d8b8fc - correct > >>>>> Events : 181105 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 4 > >>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --examine /dev/sdd > >>>>> /dev/sdd: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sdd1 > >>>>> /dev/sdd1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0x5 > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247728 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>> Bad Block Log : 512 entries available at offset 24 sectors > >>>>> Checksum : d1471d9d - correct > >>>>> Events : 181105 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 6 > >>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --examine /dev/sde > >>>>> /dev/sde: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sde1 > >>>>> /dev/sde1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0x5 > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>> Bad Block Log : 512 entries available at offset 72 sectors > >>>>> Checksum : e05d0278 - correct > >>>>> Events : 181105 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 5 > >>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --examine /dev/sdf > >>>>> /dev/sdf: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sdf1 > >>>>> /dev/sdf1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0x5 > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>> Bad Block Log : 512 entries available at offset 72 sectors > >>>>> Checksum : 26792cc0 - correct > >>>>> Events : 181105 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 0 > >>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --examine /dev/sdg > >>>>> /dev/sdg: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sdg1 > >>>>> /dev/sdg1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0x5 > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : 74476ce7:4edc23f6:08120711:ba281425 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>> Bad Block Log : 512 entries available at offset 72 sectors > >>>>> Checksum : 6f67d179 - correct > >>>>> Events : 181105 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 1 > >>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --examine /dev/sdh > >>>>> /dev/sdh: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sdh1 > >>>>> /dev/sdh1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0xd > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 20:09:14 2023 > >>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad > >>>>> blocks present. > >>>>> Checksum : b7696b68 - correct > >>>>> Events : 181089 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 2 > >>>>> Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --examine /dev/sdi > >>>>> /dev/sdi: > >>>>> MBR Magic : aa55 > >>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>> $ sudo mdadm --examine /dev/sdi1 > >>>>> /dev/sdi1: > >>>>> Magic : a92b4efc > >>>>> Version : 1.2 > >>>>> Feature Map : 0x5 > >>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>> Raid Level : raid6 > >>>>> Raid Devices : 9 > >>>>> > >>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>> Data Offset : 247808 sectors > >>>>> Super Offset : 8 sectors > >>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>> State : clean > >>>>> Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483 > >>>>> > >>>>> Internal Bitmap : 8 sectors from superblock > >>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>> Delta Devices : 1 (8->9) > >>>>> > >>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>> Bad Block Log : 512 entries available at offset 72 sectors > >>>>> Checksum : 23b6d024 - correct > >>>>> Events : 181105 > >>>>> > >>>>> Layout : left-symmetric > >>>>> Chunk Size : 512K > >>>>> > >>>>> Device Role : Active device 3 > >>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>> > >>>>> $ sudo mdadm --detail /dev/md0 > >>>>> /dev/md0: > >>>>> Version : 1.2 > >>>>> Raid Level : raid6 > >>>>> Total Devices : 9 > >>>>> Persistence : Superblock is persistent > >>>>> > >>>>> State : inactive > >>>>> Working Devices : 9 > >>>>> > >>>>> Delta Devices : 1, (-1->0) > >>>>> New Level : raid6 > >>>>> New Layout : left-symmetric > >>>>> New Chunksize : 512K > >>>>> > >>>>> Name : Blyth:0 (local to host Blyth) > >>>>> UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>> Events : 181105 > >>>>> > >>>>> Number Major Minor RaidDevice > >>>>> > >>>>> - 8 1 - /dev/sda1 > >>>>> - 8 129 - /dev/sdi1 > >>>>> - 8 113 - /dev/sdh1 > >>>>> - 8 97 - /dev/sdg1 > >>>>> - 8 81 - /dev/sdf1 > >>>>> - 8 65 - /dev/sde1 > >>>>> - 8 49 - /dev/sdd1 > >>>>> - 8 33 - /dev/sdc1 > >>>>> - 8 17 - /dev/sdb1 > >>>>> > >>>>> $ cat /proc/mdstat > >>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] > >>>>> [raid4] [raid10] > >>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S) > >>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S) > >>>>> 26353689600 blocks super 1.2 > >>>>> > >>>>> unused devices: <none> > >>>>> > >>>>> . > >>>>> > >>>> > >>> > >>> . > >>> > >> > > > > . > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
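For reference: the function+offset entries in the stack dumps above can be mapped back to source lines as shown in this thread. A minimal sketch, assuming module objects from a build tree with matching debug info; the paths below are examples, not the exact ones used here:

$ gdb drivers/md/raid456.ko
(gdb) list *(reshape_request+0x416)    # prints the file:line the offset falls in

$ gdb drivers/md/md-mod.ko
(gdb) list *(md_do_sync+0xef2)

The kernel tree's scripts/faddr2line accepts the func+offset/size form directly, e.g.:

$ ./scripts/faddr2line drivers/md/md-mod.ko md_do_sync+0xef2/0x11d0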
[parent not found: <79aa3cf3-78d4-cfc6-8d3b-eb8704ffaba1@huaweicloud.com>]
* Re: Reshape Failure
[not found] ` <79aa3cf3-78d4-cfc6-8d3b-eb8704ffaba1@huaweicloud.com>
@ 2023-09-07 6:19 ` Jason Moss
2023-09-10 2:45 ` Yu Kuai
0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-07 6:19 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi,

On Wed, Sep 6, 2023 at 11:13 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/09/07 13:44, Jason Moss 写道:
> > Hi,
> >
> > On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> 在 2023/09/06 22:05, Jason Moss 写道:
> >>> Hi Kuai,
> >>>
> >>> I ended up using gdb rather than addr2line, as that output didn't give
> >>> me the global offset. Maybe there's a better way, but this seems to be
> >>> similar to what I expected.
> >>
> >> It's ok.
> >>>
> >>> (gdb) list *(reshape_request+0x416)
> >>> 0x11566 is in reshape_request (drivers/md/raid5.c:6396).
> >>> 6391 if ((mddev->reshape_backwards
> >>> 6392 ? (safepos > writepos && readpos < writepos)
> >>> 6393 : (safepos < writepos && readpos > writepos)) ||
> >>> 6394 time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
> >>> 6395 /* Cannot proceed until we've updated the
> >>> superblock... */
> >>> 6396 wait_event(conf->wait_for_overlap,
> >>> 6397 atomic_read(&conf->reshape_stripes)==0
> >>> 6398 || test_bit(MD_RECOVERY_INTR,
> >>
> >> If reshape is stuck here, which means:
> >>
> >> 1) Either reshape io is stuck somewhere and never complete;
> >> 2) Or the counter reshape_stripes is broken;
> >>
> >> Can you read following debugfs files to verify if io is stuck in
> >> underlying disk?
> >>
> >> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch}
> >>
> >
> > I'll attach this below.
> >
> >> Furthermore, echo frozen should break above wait_event() because
> >> 'MD_RECOVERY_INTR' will be set, however, based on your description,
> >> the problem still exist. Can you collect stack and addr2line result
> >> of stuck thread after echo frozen?
> >>
> >
> > I echo'd frozen to /sys/block/md0/md/sync_action, however the echo
> > call has been sitting for about 30 minutes, maybe longer, and has not
> > returned. Here's the current state:
> >
> > root 454 0.0 0.0 0 0 ? I< Sep05 0:00 [raid5wq]
> > root 455 0.0 0.0 34680 5988 ? D Sep05 0:00 (udev-worker)
>
> Can you also show the stack of udev-worker? And any other thread with
> 'D' state, I think above "echo frozen" is probably also stuck in D
> state.
>

As requested:

ps aux | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 455 0.0 0.0 34680 5988 ? D Sep05 0:00 (udev-worker)
root 457 0.0 0.0 0 0 ? D Sep05 0:00 [md0_reshape]
root 45507 0.0 0.0 8272 4736 pts/1 Ds+ Sep05 0:00 -bash
jason 279169 0.0 0.0 6976 2560 pts/0 S+ 23:16 0:00 grep --color=auto D

[jason@arch md]$ sudo cat /proc/455/stack
[<0>] wait_woken+0x54/0x60
[<0>] raid5_make_request+0x5fe/0x12f0 [raid456]
[<0>] md_handle_request+0x135/0x220 [md_mod]
[<0>] __submit_bio+0xb3/0x170
[<0>] submit_bio_noacct_nocheck+0x159/0x370
[<0>] block_read_full_folio+0x21c/0x340
[<0>] filemap_read_folio+0x40/0xd0
[<0>] filemap_get_pages+0x475/0x630
[<0>] filemap_read+0xd9/0x350
[<0>] blkdev_read_iter+0x6b/0x1b0
[<0>] vfs_read+0x201/0x350
[<0>] ksys_read+0x6f/0xf0
[<0>] do_syscall_64+0x60/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

[jason@arch md]$ sudo cat /proc/45507/stack
[<0>] kthread_stop+0x6a/0x180
[<0>] md_unregister_thread+0x29/0x60 [md_mod]
[<0>] action_store+0x168/0x320 [md_mod]
[<0>] md_attr_store+0x86/0xf0 [md_mod]
[<0>] kernfs_fop_write_iter+0x136/0x1d0
[<0>] vfs_write+0x23e/0x420
[<0>] ksys_write+0x6f/0xf0
[<0>] do_syscall_64+0x60/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Please let me know if you'd like me to identify the lines for any of those.

Thanks,
Jason

> > root 456 99.9 0.0 0 0 ? R Sep05 1543:40 [md0_raid6]
> > root 457 0.0 0.0 0 0 ? D Sep05 0:00 [md0_reshape]
> >
> > [jason@arch md]$ sudo cat /proc/457/stack
> > [<0>] md_do_sync+0xef2/0x11d0 [md_mod]
> > [<0>] md_thread+0xae/0x190 [md_mod]
> > [<0>] kthread+0xe8/0x120
> > [<0>] ret_from_fork+0x34/0x50
> > [<0>] ret_from_fork_asm+0x1b/0x30
> >
> > Reading symbols from md-mod.ko...
> > (gdb) list *(md_do_sync+0xef2)
> > 0xb3a2 is in md_do_sync (drivers/md/md.c:9035).
> > 9030 ? "interrupted" : "done");
> > 9031 /*
> > 9032 * this also signals 'finished resyncing' to md_stop
> > 9033 */
> > 9034 blk_finish_plug(&plug);
> > 9035 wait_event(mddev->recovery_wait,
> > !atomic_read(&mddev->recovery_active));
>
> That's also wait for reshape io to be done from common layer.
>
> > 9036
> > 9037 if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> > 9038 !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
> > 9039 mddev->curr_resync >= MD_RESYNC_ACTIVE) {
> >
> >
> > The debugfs info:
> >
> > [root@arch ~]# cat
> > /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch}
>
> Only sched_tags is read, sorry that I didn't mean to use this exact cmd.
>
> Perhaps you can using following cmd:
>
> find /sys/kernel/debug/block/sda/ -type f | xargs grep .
>
> > nr_tags=64
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=64
> > busy=1
>
> This means there is one IO in sda, however, I need more information to
> make sure where is this IO. And please make sure don't run any other
> thread that can read/write from sda. You can use "iostat -dmx 1" and
> observe for a while to confirm that there is no new io.
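A concrete version of the collection being suggested here, as a sketch only: it assumes debugfs is mounted at /sys/kernel/debug and uses the member-disk names that appear in the dumps above.

# dump every blk-mq debugfs file for each member disk, not just sched_tags
for dev in sda sdb sdd sdf sdh sdi sdj; do
    echo "=== $dev ==="
    find /sys/kernel/debug/block/$dev/ -type f | xargs grep .
done

# in another terminal, confirm nothing else is issuing I/O to the members
iostat -dmx 1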
> > Thanks, > Kuai > > > cleared=55 > > bits_per_word=16 > > map_nr=4 > > alloc_hint={40, 20, 46, 0} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=1 > > min_shallow_depth=48 > > nr_tags=32 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=32 > > busy=0 > > cleared=27 > > bits_per_word=8 > > map_nr=4 > > alloc_hint={19, 26, 5, 21} > > wake_batch=4 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=1 > > min_shallow_depth=4294967295 > > > > > > > > [root@arch ~]# cat /sys/kernel/debug/block/sdb/hctx* > > /{sched_tags,tags,busy,dispatch} > > nr_tags=64 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=64 > > busy=1 > > cleared=56 > > bits_per_word=16 > > map_nr=4 > > alloc_hint={57, 43, 14, 19} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=1 > > min_shallow_depth=48 > > nr_tags=32 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=32 > > busy=0 > > cleared=24 > > bits_per_word=8 > > map_nr=4 > > alloc_hint={17, 13, 23, 17} > > wake_batch=4 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=1 > > min_shallow_depth=4294967295 > > > > > > [root@arch ~]# cat > > /sys/kernel/debug/block/sdd/hctx*/{sched_tags,tags,busy,dispatch} > > nr_tags=64 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=64 > > busy=1 > > cleared=51 > > bits_per_word=16 > > map_nr=4 > > alloc_hint={36, 43, 15, 7} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=1 > > min_shallow_depth=48 > > nr_tags=32 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=32 > > busy=0 > > cleared=31 > > bits_per_word=8 > > map_nr=4 > > alloc_hint={0, 15, 1, 22} > > wake_batch=4 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=1 > > min_shallow_depth=4294967295 > > > > > > [root@arch ~]# cat > > /sys/kernel/debug/block/sdf/hctx*/{sched_tags,tags,busy,dispatch} > > nr_tags=256 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=256 > > busy=1 > > cleared=131 > > bits_per_word=64 > > map_nr=4 > > alloc_hint={125, 46, 83, 205} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=0 > > min_shallow_depth=192 > > nr_tags=10104 > > 
nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=10104 > > busy=0 > > cleared=235 > > bits_per_word=64 > > map_nr=158 > > alloc_hint={503, 2913, 9827, 9851} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=0 > > min_shallow_depth=4294967295 > > > > > > [root@arch ~]# cat > > /sys/kernel/debug/block/sdh/hctx*/{sched_tags,tags,busy,dispatch} > > nr_tags=256 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=256 > > busy=1 > > cleared=97 > > bits_per_word=64 > > map_nr=4 > > alloc_hint={144, 144, 127, 254} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=0 > > min_shallow_depth=192 > > nr_tags=10104 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=10104 > > busy=0 > > cleared=235 > > bits_per_word=64 > > map_nr=158 > > alloc_hint={503, 2913, 9827, 9851} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=0 > > min_shallow_depth=4294967295 > > > > > > [root@arch ~]# cat > > /sys/kernel/debug/block/sdi/hctx*/{sched_tags,tags,busy,dispatch} > > nr_tags=256 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=256 > > busy=1 > > cleared=34 > > bits_per_word=64 > > map_nr=4 > > alloc_hint={197, 20, 1, 230} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=0 > > min_shallow_depth=192 > > nr_tags=10104 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=10104 > > busy=0 > > cleared=235 > > bits_per_word=64 > > map_nr=158 > > alloc_hint={503, 2913, 9827, 9851} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=0 > > min_shallow_depth=4294967295 > > > > > > [root@arch ~]# cat > > /sys/kernel/debug/block/sdj/hctx*/{sched_tags,tags,busy,dispatch} > > nr_tags=256 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=256 > > busy=1 > > cleared=27 > > bits_per_word=64 > > map_nr=4 > > alloc_hint={132, 74, 129, 76} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=0 > > min_shallow_depth=192 > > nr_tags=10104 > > nr_reserved_tags=0 > > active_queues=0 > > > > bitmap_tags: > > depth=10104 > > busy=0 > > cleared=235 > > bits_per_word=64 > > map_nr=158 > > alloc_hint={503, 2913, 9827, 9851} > > wake_batch=8 > > wake_index=0 > > ws_active=0 > > ws={ > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > 
{.wait=inactive}, > > {.wait=inactive}, > > {.wait=inactive}, > > } > > round_robin=0 > > min_shallow_depth=4294967295 > > > > > > Thanks for your continued assistance with this! > > Jason > > > > > >> Thanks, > >> Kuai > >> > >>> &mddev->recovery)); > >>> 6399 if (atomic_read(&conf->reshape_stripes) != 0) > >>> 6400 return 0; > >>> > >>> Thanks > >>> > >>> On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >>>> > >>>> Hi, > >>>> > >>>> 在 2023/09/05 0:38, Jason Moss 写道: > >>>>> Hi Kuai, > >>>>> > >>>>> Thank you for the suggestion, I was previously on 5.15.0. I've built > >>>>> an environment with 6.5.0.1 now and assembled the array there, but the > >>>>> same problem happens. It reshaped for 20-30 seconds, then completely > >>>>> stopped. > >>>>> > >>>>> Processes and /proc/<PID>/stack output: > >>>>> root 24593 0.0 0.0 0 0 ? I< 09:22 0:00 [raid5wq] > >>>>> root 24594 96.5 0.0 0 0 ? R 09:22 2:29 [md0_raid6] > >>>>> root 24595 0.3 0.0 0 0 ? D 09:22 0:00 [md0_reshape] > >>>>> > >>>>> [root@arch ~]# cat /proc/24593/stack > >>>>> [<0>] rescuer_thread+0x2b0/0x3b0 > >>>>> [<0>] kthread+0xe8/0x120 > >>>>> [<0>] ret_from_fork+0x34/0x50 > >>>>> [<0>] ret_from_fork_asm+0x1b/0x30 > >>>>> > >>>>> [root@arch ~]# cat /proc/24594/stack > >>>>> > >>>>> [root@arch ~]# cat /proc/24595/stack > >>>>> [<0>] reshape_request+0x416/0x9f0 [raid456] > >>>> Can you provide the addr2line result? Let's see where reshape_request() > >>>> is stuck first. > >>>> > >>>> Thanks, > >>>> Kuai > >>>> > >>>>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456] > >>>>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod] > >>>>> [<0>] md_thread+0xae/0x190 [md_mod] > >>>>> [<0>] kthread+0xe8/0x120 > >>>>> [<0>] ret_from_fork+0x34/0x50 > >>>>> [<0>] ret_from_fork_asm+0x1b/0x30 > >>>>> > >>>>> Please let me know if there's a better way to provide the stack info. > >>>>> > >>>>> Thank you > >>>>> > >>>>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> 在 2023/09/04 5:39, Jason Moss 写道: > >>>>>>> Hello, > >>>>>>> > >>>>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array, > >>>>>>> growing it to 9 drives. I've done similar before with the same array, > >>>>>>> having previously grown it from 6 drives to 7 and then from 7 to 8 > >>>>>>> with no issues. Drives are WD Reds, most older than 2019, some > >>>>>>> (including the newest) newer, but all confirmed CMR and not SMR. > >>>>>>> > >>>>>>> Process used to expand the array: > >>>>>>> mdadm --add /dev/md0 /dev/sdb1 > >>>>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0 > >>>>>>> > >>>>>>> The reshape started off fine, the process was underway, and the volume > >>>>>>> was still usable as expected. However, 15-30 minutes into the reshape, > >>>>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the > >>>>>>> reshape was stopped at 0.6% with the counter not incrementing at all. > >>>>>>> Any process accessing the array would just hang until killed. I waited > >>>>>> > >>>>>> What kernel version are you using? And it'll be very helpful if you can > >>>>>> collect the stack of all stuck thread. There is a known deadlock for > >>>>>> raid5 related to reshape, and it's fixed in v6.5: > >>>>>> > >>>>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com > >>>>>> > >>>>>>> a half hour and there was still no further change to the counter. 
At > >>>>>>> this point, I restarted the server and found that when it came back up > >>>>>>> it would begin reshaping again, but only very briefly, under 30 > >>>>>>> seconds, but the counter would be increasing during that time. > >>>>>>> > >>>>>>> I searched furiously for ideas and tried stopping and reassembling the > >>>>>>> array, assembling with an invalid-backup flag, echoing "frozen" then > >>>>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max > >>>>>>> file. Nothing ever seemed to make a difference. > >>>>>>> > >>>>>> > >>>>>> Don't do this before v6.5, echo "reshape" while reshape is still in > >>>>>> progress will corrupt your data: > >>>>>> > >>>>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com > >>>>>> > >>>>>> Thanks, > >>>>>> Kuai > >>>>>> > >>>>>>> Here is where I slightly panicked, worried that I'd borked my array, > >>>>>>> and powered off the server again and disconnected the new drive that > >>>>>>> was just added, assuming that since it was the change, it may be the > >>>>>>> problem despite having burn-in tested it, and figuring that I'll rush > >>>>>>> order a new drive, so long as the reshape continues and I can just > >>>>>>> rebuild onto a new drive once the reshape finishes. However, this made > >>>>>>> no difference and the array continued to not rebuild. > >>>>>>> > >>>>>>> Much searching later, I'd found nothing substantially different then > >>>>>>> I'd already tried and one of the common threads in other people's > >>>>>>> issues was bad drives, so I ran a self-test against each of the > >>>>>>> existing drives and found one drive that failed the read test. > >>>>>>> Thinking I had the culprit now, I dropped that drive out of the array > >>>>>>> and assembled the array again, but the same behavior persists. The > >>>>>>> array reshapes very briefly, then completely stops. > >>>>>>> > >>>>>>> Down to 0 drives of redundancy (in the reshaped section at least), not > >>>>>>> finding any new ideas on any of the forums, mailing list, wiki, etc, > >>>>>>> and very frustrated, I took a break, bought all new drives to build a > >>>>>>> new array in another server and restored from a backup. However, there > >>>>>>> is still some data not captured by the most recent backup that I would > >>>>>>> like to recover, and I'd also like to solve the problem purely to > >>>>>>> understand what happened and how to recover in the future. > >>>>>>> > >>>>>>> Is there anything else I should try to recover this array, or is this > >>>>>>> a lost cause? > >>>>>>> > >>>>>>> Details requested by the wiki to follow and I'm happy to collect any > >>>>>>> further data that would assist. /dev/sdb is the new drive that was > >>>>>>> added, then disconnected. /dev/sdh is the drive that failed a > >>>>>>> self-test and was removed from the array. > >>>>>>> > >>>>>>> Thank you in advance for any help provided! 
> >>>>>>> > >>>>>>> > >>>>>>> $ uname -a > >>>>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC > >>>>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux > >>>>>>> > >>>>>>> $ mdadm --version > >>>>>>> mdadm - v4.2 - 2021-12-30 > >>>>>>> > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>> Serial Number: WD-WCC4N7AT7R7X > >>>>>>> LU WWN Device Id: 5 0014ee 268545f93 > >>>>>>> Firmware Version: 82.00A82 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Rotation Rate: 5400 rpm > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:27:55 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>> Serial Number: WD-WCC4N7AT7R7X > >>>>>>> LU WWN Device Id: 5 0014ee 268545f93 > >>>>>>> Firmware Version: 82.00A82 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Rotation Rate: 5400 rpm > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:16 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdb > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>> Serial Number: WD-WXG1A8UGLS42 > >>>>>>> LU WWN Device Id: 5 0014ee 2b75ef53b > >>>>>>> Firmware Version: 80.00A80 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Rotation Rate: 5400 rpm > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:19 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdc > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>> Serial Number: WD-WCC4N4HYL32Y > >>>>>>> LU WWN Device Id: 5 0014ee 2630752f8 > >>>>>>> Firmware Version: 82.00A82 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Rotation Rate: 5400 rpm > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:20 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdd > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68N32N0 > >>>>>>> Serial Number: WD-WCC7K1FF6DYK > >>>>>>> LU WWN Device Id: 5 0014ee 2ba952a30 > >>>>>>> Firmware Version: 82.00A82 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Rotation Rate: 5400 rpm > >>>>>>> Form Factor: 3.5 inches > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-3 T13/2161-D revision 5 > >>>>>>> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:21 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sde > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>> Serial Number: WD-WCC4N5ZHTRJF > >>>>>>> LU WWN Device Id: 5 0014ee 2b88b83bb > >>>>>>> Firmware Version: 82.00A82 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Rotation Rate: 5400 rpm > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:22 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdf > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68AX9N0 > >>>>>>> Serial Number: WD-WMC1T3804790 > >>>>>>> LU WWN Device Id: 5 0014ee 6036b6826 > >>>>>>> Firmware Version: 80.00A80 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:23 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdg > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>> Serial Number: WD-WMC4N0H692Z9 > >>>>>>> LU WWN Device Id: 5 0014ee 65af39740 > >>>>>>> Firmware Version: 82.00A82 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Rotation Rate: 5400 rpm > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdh > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>> Serial Number: WD-WMC4N0K5S750 > >>>>>>> LU WWN Device Id: 5 0014ee 6b048d9ca > >>>>>>> Firmware Version: 82.00A82 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Rotation Rate: 5400 rpm > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdi > >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>> > >>>>>>> === START OF INFORMATION SECTION === > >>>>>>> Model Family: Western Digital Red > >>>>>>> Device Model: WDC WD30EFRX-68AX9N0 > >>>>>>> Serial Number: WD-WMC1T1502475 > >>>>>>> LU WWN Device Id: 5 0014ee 058d2e5cb > >>>>>>> Firmware Version: 80.00A80 > >>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>> Local Time is: Sun Sep 3 13:28:27 2023 PDT > >>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>> SMART support is: Enabled > >>>>>>> > >>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>> > >>>>>>> SCT Error Recovery Control: > >>>>>>> Read: 70 (7.0 seconds) > >>>>>>> Write: 70 (7.0 seconds) > >>>>>>> > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sda > >>>>>>> /dev/sda: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sda1 > >>>>>>> /dev/sda1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0xd > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247728 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : 8ca60ad5:60d19333:11b24820:91453532 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 24 sectors - bad > >>>>>>> blocks present. > >>>>>>> Checksum : b6d8f4d1 - correct > >>>>>>> Events : 181105 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 7 > >>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sdb > >>>>>>> /dev/sdb: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sdb1 > >>>>>>> /dev/sdb1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0x5 > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247728 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : 386d3001:16447e43:4d2a5459:85618d11 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 00:02:59 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 24 sectors > >>>>>>> Checksum : b544a39 - correct > >>>>>>> Events : 181077 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 8 > >>>>>>> Array State : AAAAAAAAA ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sdc > >>>>>>> /dev/sdc: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sdc1 > >>>>>>> /dev/sdc1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0xd > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad > >>>>>>> blocks present. > >>>>>>> Checksum : 88d8b8fc - correct > >>>>>>> Events : 181105 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 4 > >>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sdd > >>>>>>> /dev/sdd: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sdd1 > >>>>>>> /dev/sdd1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0x5 > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247728 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 24 sectors > >>>>>>> Checksum : d1471d9d - correct > >>>>>>> Events : 181105 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 6 > >>>>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sde > >>>>>>> /dev/sde: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sde1 > >>>>>>> /dev/sde1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0x5 > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 72 sectors > >>>>>>> Checksum : e05d0278 - correct > >>>>>>> Events : 181105 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 5 > >>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sdf > >>>>>>> /dev/sdf: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sdf1 > >>>>>>> /dev/sdf1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0x5 > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 72 sectors > >>>>>>> Checksum : 26792cc0 - correct > >>>>>>> Events : 181105 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 0 > >>>>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sdg > >>>>>>> /dev/sdg: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sdg1 > >>>>>>> /dev/sdg1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0x5 > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : 74476ce7:4edc23f6:08120711:ba281425 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 72 sectors > >>>>>>> Checksum : 6f67d179 - correct > >>>>>>> Events : 181105 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 1 > >>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sdh > >>>>>>> /dev/sdh: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sdh1 > >>>>>>> /dev/sdh1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0xd > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 20:09:14 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad > >>>>>>> blocks present. > >>>>>>> Checksum : b7696b68 - correct > >>>>>>> Events : 181089 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 2 > >>>>>>> Array State : AAAAAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --examine /dev/sdi > >>>>>>> /dev/sdi: > >>>>>>> MBR Magic : aa55 > >>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>> $ sudo mdadm --examine /dev/sdi1 > >>>>>>> /dev/sdi1: > >>>>>>> Magic : a92b4efc > >>>>>>> Version : 1.2 > >>>>>>> Feature Map : 0x5 > >>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>> Raid Level : raid6 > >>>>>>> Raid Devices : 9 > >>>>>>> > >>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) > >>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>> Data Offset : 247808 sectors > >>>>>>> Super Offset : 8 sectors > >>>>>>> Unused Space : before=247720 sectors, after=14336 sectors > >>>>>>> State : clean > >>>>>>> Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483 > >>>>>>> > >>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>>>> Delta Devices : 1 (8->9) > >>>>>>> > >>>>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>>>> Bad Block Log : 512 entries available at offset 72 sectors > >>>>>>> Checksum : 23b6d024 - correct > >>>>>>> Events : 181105 > >>>>>>> > >>>>>>> Layout : left-symmetric > >>>>>>> Chunk Size : 512K > >>>>>>> > >>>>>>> Device Role : Active device 3 > >>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) > >>>>>>> > >>>>>>> $ sudo mdadm --detail /dev/md0 > >>>>>>> /dev/md0: > >>>>>>> Version : 1.2 > >>>>>>> Raid Level : raid6 > >>>>>>> Total Devices : 9 > >>>>>>> Persistence : Superblock is persistent > >>>>>>> > >>>>>>> State : inactive > >>>>>>> Working Devices : 9 > >>>>>>> > >>>>>>> Delta Devices : 1, (-1->0) > >>>>>>> New Level : raid6 > >>>>>>> New Layout : left-symmetric > >>>>>>> New Chunksize : 512K > >>>>>>> > >>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>> UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>> Events : 181105 > >>>>>>> > >>>>>>> Number Major Minor RaidDevice > >>>>>>> > >>>>>>> - 8 1 - /dev/sda1 > >>>>>>> - 8 129 - /dev/sdi1 > >>>>>>> - 8 113 - /dev/sdh1 > >>>>>>> - 8 97 - /dev/sdg1 > >>>>>>> - 8 81 - /dev/sdf1 > >>>>>>> - 8 65 - /dev/sde1 > >>>>>>> - 8 49 - /dev/sdd1 > >>>>>>> - 8 33 - /dev/sdc1 > >>>>>>> - 8 17 - /dev/sdb1 > >>>>>>> > >>>>>>> $ cat /proc/mdstat > >>>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] > >>>>>>> [raid4] [raid10] > >>>>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S) > >>>>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S) > >>>>>>> 26353689600 blocks super 1.2 > >>>>>>> > >>>>>>> unused devices: <none> > >>>>>>> > >>>>>>> . > >>>>>>> > >>>>>> > >>>>> > >>>>> . > >>>>> > >>>> > >>> > >>> . > >>> > >> > > > > . > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
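Given the per-member superblocks quoted above, a quick way to compare what each member records is sketched below. This is only an illustration, not a command from the thread; it assumes the same /dev/sd[a-i]1 names used in the listings above.

$ for d in /dev/sd[a-i]1; do echo "== $d"; sudo mdadm --examine "$d" | grep -E "Reshape pos'n|Events|Device Role|Array State"; done

Members whose Events count or "Reshape pos'n" lags behind the rest (here /dev/sdb1 and /dev/sdh1) are the ones mdadm treats as out of date when deciding which devices it can assemble.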
* Re: Reshape Failure 2023-09-07 6:19 ` Jason Moss @ 2023-09-10 2:45 ` Yu Kuai 2023-09-10 4:58 ` Jason Moss 0 siblings, 1 reply; 21+ messages in thread From: Yu Kuai @ 2023-09-10 2:45 UTC (permalink / raw) To: Jason Moss, Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C) Hi, 在 2023/09/07 14:19, Jason Moss 写道: > Hi, > > On Wed, Sep 6, 2023 at 11:13 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> Hi, >> >> 在 2023/09/07 13:44, Jason Moss 写道: >>> Hi, >>> >>> On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >>>> >>>> Hi, >>>> >>>> 在 2023/09/06 22:05, Jason Moss 写道: >>>>> Hi Kuai, >>>>> >>>>> I ended up using gdb rather than addr2line, as that output didn't give >>>>> me the global offset. Maybe there's a better way, but this seems to be >>>>> similar to what I expected. >>>> >>>> It's ok. >>>>> >>>>> (gdb) list *(reshape_request+0x416) >>>>> 0x11566 is in reshape_request (drivers/md/raid5.c:6396). >>>>> 6391 if ((mddev->reshape_backwards >>>>> 6392 ? (safepos > writepos && readpos < writepos) >>>>> 6393 : (safepos < writepos && readpos > writepos)) || >>>>> 6394 time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) { >>>>> 6395 /* Cannot proceed until we've updated the >>>>> superblock... */ >>>>> 6396 wait_event(conf->wait_for_overlap, >>>>> 6397 atomic_read(&conf->reshape_stripes)==0 >>>>> 6398 || test_bit(MD_RECOVERY_INTR, >>>> >>>> If reshape is stuck here, which means: >>>> >>>> 1) Either reshape io is stuck somewhere and never complete; >>>> 2) Or the counter reshape_stripes is broken; >>>> >>>> Can you read following debugfs files to verify if io is stuck in >>>> underlying disk? >>>> >>>> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch} >>>> >>> >>> I'll attach this below. >>> >>>> Furthermore, echo frozen should break above wait_event() because >>>> 'MD_RECOVERY_INTR' will be set, however, based on your description, >>>> the problem still exist. Can you collect stack and addr2line result >>>> of stuck thread after echo frozen? >>>> >>> >>> I echo'd frozen to /sys/block/md0/md/sync_action, however the echo >>> call has been sitting for about 30 minutes, maybe longer, and has not >>> returned. Here's the current state: >>> >>> root 454 0.0 0.0 0 0 ? I< Sep05 0:00 [raid5wq] >>> root 455 0.0 0.0 34680 5988 ? D Sep05 0:00 (udev-worker) >> >> Can you also show the stack of udev-worker? And any other thread with >> 'D' state, I think above "echo frozen" is probably also stuck in D >> state. >> > > As requested: > > ps aux | grep D > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > root 455 0.0 0.0 34680 5988 ? D Sep05 0:00 (udev-worker) > root 457 0.0 0.0 0 0 ? 
D Sep05 0:00 [md0_reshape]
> root 45507 0.0 0.0 8272 4736 pts/1 Ds+ Sep05 0:00 -bash
> jason 279169 0.0 0.0 6976 2560 pts/0 S+ 23:16 0:00 grep --color=auto D
>
> [jason@arch md]$ sudo cat /proc/455/stack
> [<0>] wait_woken+0x54/0x60
> [<0>] raid5_make_request+0x5fe/0x12f0 [raid456]
> [<0>] md_handle_request+0x135/0x220 [md_mod]
> [<0>] __submit_bio+0xb3/0x170
> [<0>] submit_bio_noacct_nocheck+0x159/0x370
> [<0>] block_read_full_folio+0x21c/0x340
> [<0>] filemap_read_folio+0x40/0xd0
> [<0>] filemap_get_pages+0x475/0x630
> [<0>] filemap_read+0xd9/0x350
> [<0>] blkdev_read_iter+0x6b/0x1b0
> [<0>] vfs_read+0x201/0x350
> [<0>] ksys_read+0x6f/0xf0
> [<0>] do_syscall_64+0x60/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> [jason@arch md]$ sudo cat /proc/45507/stack
> [<0>] kthread_stop+0x6a/0x180
> [<0>] md_unregister_thread+0x29/0x60 [md_mod]
> [<0>] action_store+0x168/0x320 [md_mod]
> [<0>] md_attr_store+0x86/0xf0 [md_mod]
> [<0>] kernfs_fop_write_iter+0x136/0x1d0
> [<0>] vfs_write+0x23e/0x420
> [<0>] ksys_write+0x6f/0xf0
> [<0>] do_syscall_64+0x60/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> Please let me know if you'd like me to identify the lines for any of those.
>
That's enough.

> Thanks,
> Jason
>
>>> root 456 99.9 0.0 0 0 ? R Sep05 1543:40 [md0_raid6]
>>> root 457 0.0 0.0 0 0 ? D Sep05 0:00 [md0_reshape]
>>>
>>> [jason@arch md]$ sudo cat /proc/457/stack
>>> [<0>] md_do_sync+0xef2/0x11d0 [md_mod]
>>> [<0>] md_thread+0xae/0x190 [md_mod]
>>> [<0>] kthread+0xe8/0x120
>>> [<0>] ret_from_fork+0x34/0x50
>>> [<0>] ret_from_fork_asm+0x1b/0x30
>>>
>>> Reading symbols from md-mod.ko...
>>> (gdb) list *(md_do_sync+0xef2)
>>> 0xb3a2 is in md_do_sync (drivers/md/md.c:9035).
>>> 9030 ? "interrupted" : "done");
>>> 9031 /*
>>> 9032 * this also signals 'finished resyncing' to md_stop
>>> 9033 */
>>> 9034 blk_finish_plug(&plug);
>>> 9035 wait_event(mddev->recovery_wait,
>>> !atomic_read(&mddev->recovery_active));
>>
>> That is also waiting for the reshape io to be completed by the common layer.
>>
>>> 9036
>>> 9037 if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>>> 9038 !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>> 9039 mddev->curr_resync >= MD_RESYNC_ACTIVE) {
>>>
>>>
>>> The debugfs info:
>>>
>>> [root@arch ~]# cat
>>> /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch}
>>
>> Only sched_tags was read; sorry, I didn't mean for you to use that exact cmd.
>>
>> Perhaps you can use the following cmd:
>>
>> find /sys/kernel/debug/block/sda/ -type f | xargs grep .
>>
>>> nr_tags=64
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=64
>>> busy=1
>>
>> This means there is one IO in sda; however, I need more information to
>> determine where this IO is. And please make sure not to run any other
>> thread that can read/write from sda. You can use "iostat -dmx 1" and
>> observe for a while to confirm that there is no new io.

And can you help with this? Please confirm there is no new io and collect the debugfs data.
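For reference, a minimal sketch of gathering what is asked for here: dump the full blk-mq debugfs state of every member with the find/grep command above, then watch iostat in a second terminal. The member names (sda sdb sdd sdf sdh sdi sdj, taken from the listings below) and the output file md0-debugfs.txt are placeholders; adjust them to the actual array.

[root@arch ~]# for d in sda sdb sdd sdf sdh sdi sdj; do echo "== $d"; find /sys/kernel/debug/block/$d/ -type f | xargs grep . ; done > md0-debugfs.txt
[root@arch ~]# iostat -dmx 1

With nothing else reading or writing the array, any tag still reported as busy=1 in sched_tags should belong to the stuck reshape I/O, which is the confirmation being requested.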
Thanks, Kuai >> >> Thanks, >> Kuai >> >>> cleared=55 >>> bits_per_word=16 >>> map_nr=4 >>> alloc_hint={40, 20, 46, 0} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=1 >>> min_shallow_depth=48 >>> nr_tags=32 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=32 >>> busy=0 >>> cleared=27 >>> bits_per_word=8 >>> map_nr=4 >>> alloc_hint={19, 26, 5, 21} >>> wake_batch=4 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=1 >>> min_shallow_depth=4294967295 >> >> >>> >>> >>> [root@arch ~]# cat /sys/kernel/debug/block/sdb/hctx* >>> /{sched_tags,tags,busy,dispatch} >>> nr_tags=64 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=64 >>> busy=1 >>> cleared=56 >>> bits_per_word=16 >>> map_nr=4 >>> alloc_hint={57, 43, 14, 19} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=1 >>> min_shallow_depth=48 >>> nr_tags=32 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=32 >>> busy=0 >>> cleared=24 >>> bits_per_word=8 >>> map_nr=4 >>> alloc_hint={17, 13, 23, 17} >>> wake_batch=4 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=1 >>> min_shallow_depth=4294967295 >>> >>> >>> [root@arch ~]# cat >>> /sys/kernel/debug/block/sdd/hctx*/{sched_tags,tags,busy,dispatch} >>> nr_tags=64 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=64 >>> busy=1 >>> cleared=51 >>> bits_per_word=16 >>> map_nr=4 >>> alloc_hint={36, 43, 15, 7} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=1 >>> min_shallow_depth=48 >>> nr_tags=32 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=32 >>> busy=0 >>> cleared=31 >>> bits_per_word=8 >>> map_nr=4 >>> alloc_hint={0, 15, 1, 22} >>> wake_batch=4 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=1 >>> min_shallow_depth=4294967295 >>> >>> >>> [root@arch ~]# cat >>> /sys/kernel/debug/block/sdf/hctx*/{sched_tags,tags,busy,dispatch} >>> nr_tags=256 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=256 >>> busy=1 >>> cleared=131 >>> bits_per_word=64 >>> map_nr=4 >>> alloc_hint={125, 46, 83, 205} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=0 >>> min_shallow_depth=192 >>> 
nr_tags=10104 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=10104 >>> busy=0 >>> cleared=235 >>> bits_per_word=64 >>> map_nr=158 >>> alloc_hint={503, 2913, 9827, 9851} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=0 >>> min_shallow_depth=4294967295 >>> >>> >>> [root@arch ~]# cat >>> /sys/kernel/debug/block/sdh/hctx*/{sched_tags,tags,busy,dispatch} >>> nr_tags=256 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=256 >>> busy=1 >>> cleared=97 >>> bits_per_word=64 >>> map_nr=4 >>> alloc_hint={144, 144, 127, 254} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=0 >>> min_shallow_depth=192 >>> nr_tags=10104 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=10104 >>> busy=0 >>> cleared=235 >>> bits_per_word=64 >>> map_nr=158 >>> alloc_hint={503, 2913, 9827, 9851} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=0 >>> min_shallow_depth=4294967295 >>> >>> >>> [root@arch ~]# cat >>> /sys/kernel/debug/block/sdi/hctx*/{sched_tags,tags,busy,dispatch} >>> nr_tags=256 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=256 >>> busy=1 >>> cleared=34 >>> bits_per_word=64 >>> map_nr=4 >>> alloc_hint={197, 20, 1, 230} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=0 >>> min_shallow_depth=192 >>> nr_tags=10104 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=10104 >>> busy=0 >>> cleared=235 >>> bits_per_word=64 >>> map_nr=158 >>> alloc_hint={503, 2913, 9827, 9851} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=0 >>> min_shallow_depth=4294967295 >>> >>> >>> [root@arch ~]# cat >>> /sys/kernel/debug/block/sdj/hctx*/{sched_tags,tags,busy,dispatch} >>> nr_tags=256 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=256 >>> busy=1 >>> cleared=27 >>> bits_per_word=64 >>> map_nr=4 >>> alloc_hint={132, 74, 129, 76} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=0 >>> min_shallow_depth=192 >>> nr_tags=10104 >>> nr_reserved_tags=0 >>> active_queues=0 >>> >>> bitmap_tags: >>> depth=10104 >>> busy=0 >>> cleared=235 >>> bits_per_word=64 >>> map_nr=158 >>> alloc_hint={503, 2913, 9827, 9851} >>> wake_batch=8 >>> wake_index=0 >>> ws_active=0 >>> ws={ >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> 
{.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> {.wait=inactive}, >>> } >>> round_robin=0 >>> min_shallow_depth=4294967295 >>> >>> >>> Thanks for your continued assistance with this! >>> Jason >>> >>> >>>> Thanks, >>>> Kuai >>>> >>>>> &mddev->recovery)); >>>>> 6399 if (atomic_read(&conf->reshape_stripes) != 0) >>>>> 6400 return 0; >>>>> >>>>> Thanks >>>>> >>>>> On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> 在 2023/09/05 0:38, Jason Moss 写道: >>>>>>> Hi Kuai, >>>>>>> >>>>>>> Thank you for the suggestion, I was previously on 5.15.0. I've built >>>>>>> an environment with 6.5.0.1 now and assembled the array there, but the >>>>>>> same problem happens. It reshaped for 20-30 seconds, then completely >>>>>>> stopped. >>>>>>> >>>>>>> Processes and /proc/<PID>/stack output: >>>>>>> root 24593 0.0 0.0 0 0 ? I< 09:22 0:00 [raid5wq] >>>>>>> root 24594 96.5 0.0 0 0 ? R 09:22 2:29 [md0_raid6] >>>>>>> root 24595 0.3 0.0 0 0 ? D 09:22 0:00 [md0_reshape] >>>>>>> >>>>>>> [root@arch ~]# cat /proc/24593/stack >>>>>>> [<0>] rescuer_thread+0x2b0/0x3b0 >>>>>>> [<0>] kthread+0xe8/0x120 >>>>>>> [<0>] ret_from_fork+0x34/0x50 >>>>>>> [<0>] ret_from_fork_asm+0x1b/0x30 >>>>>>> >>>>>>> [root@arch ~]# cat /proc/24594/stack >>>>>>> >>>>>>> [root@arch ~]# cat /proc/24595/stack >>>>>>> [<0>] reshape_request+0x416/0x9f0 [raid456] >>>>>> Can you provide the addr2line result? Let's see where reshape_request() >>>>>> is stuck first. >>>>>> >>>>>> Thanks, >>>>>> Kuai >>>>>> >>>>>>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456] >>>>>>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod] >>>>>>> [<0>] md_thread+0xae/0x190 [md_mod] >>>>>>> [<0>] kthread+0xe8/0x120 >>>>>>> [<0>] ret_from_fork+0x34/0x50 >>>>>>> [<0>] ret_from_fork_asm+0x1b/0x30 >>>>>>> >>>>>>> Please let me know if there's a better way to provide the stack info. >>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> 在 2023/09/04 5:39, Jason Moss 写道: >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array, >>>>>>>>> growing it to 9 drives. I've done similar before with the same array, >>>>>>>>> having previously grown it from 6 drives to 7 and then from 7 to 8 >>>>>>>>> with no issues. Drives are WD Reds, most older than 2019, some >>>>>>>>> (including the newest) newer, but all confirmed CMR and not SMR. >>>>>>>>> >>>>>>>>> Process used to expand the array: >>>>>>>>> mdadm --add /dev/md0 /dev/sdb1 >>>>>>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0 >>>>>>>>> >>>>>>>>> The reshape started off fine, the process was underway, and the volume >>>>>>>>> was still usable as expected. However, 15-30 minutes into the reshape, >>>>>>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the >>>>>>>>> reshape was stopped at 0.6% with the counter not incrementing at all. >>>>>>>>> Any process accessing the array would just hang until killed. I waited >>>>>>>> >>>>>>>> What kernel version are you using? And it'll be very helpful if you can >>>>>>>> collect the stack of all stuck thread. There is a known deadlock for >>>>>>>> raid5 related to reshape, and it's fixed in v6.5: >>>>>>>> >>>>>>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com >>>>>>>> >>>>>>>>> a half hour and there was still no further change to the counter. 
At >>>>>>>>> this point, I restarted the server and found that when it came back up >>>>>>>>> it would begin reshaping again, but only very briefly, under 30 >>>>>>>>> seconds, but the counter would be increasing during that time. >>>>>>>>> >>>>>>>>> I searched furiously for ideas and tried stopping and reassembling the >>>>>>>>> array, assembling with an invalid-backup flag, echoing "frozen" then >>>>>>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max >>>>>>>>> file. Nothing ever seemed to make a difference. >>>>>>>>> >>>>>>>> >>>>>>>> Don't do this before v6.5, echo "reshape" while reshape is still in >>>>>>>> progress will corrupt your data: >>>>>>>> >>>>>>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Kuai >>>>>>>> >>>>>>>>> Here is where I slightly panicked, worried that I'd borked my array, >>>>>>>>> and powered off the server again and disconnected the new drive that >>>>>>>>> was just added, assuming that since it was the change, it may be the >>>>>>>>> problem despite having burn-in tested it, and figuring that I'll rush >>>>>>>>> order a new drive, so long as the reshape continues and I can just >>>>>>>>> rebuild onto a new drive once the reshape finishes. However, this made >>>>>>>>> no difference and the array continued to not rebuild. >>>>>>>>> >>>>>>>>> Much searching later, I'd found nothing substantially different then >>>>>>>>> I'd already tried and one of the common threads in other people's >>>>>>>>> issues was bad drives, so I ran a self-test against each of the >>>>>>>>> existing drives and found one drive that failed the read test. >>>>>>>>> Thinking I had the culprit now, I dropped that drive out of the array >>>>>>>>> and assembled the array again, but the same behavior persists. The >>>>>>>>> array reshapes very briefly, then completely stops. >>>>>>>>> >>>>>>>>> Down to 0 drives of redundancy (in the reshaped section at least), not >>>>>>>>> finding any new ideas on any of the forums, mailing list, wiki, etc, >>>>>>>>> and very frustrated, I took a break, bought all new drives to build a >>>>>>>>> new array in another server and restored from a backup. However, there >>>>>>>>> is still some data not captured by the most recent backup that I would >>>>>>>>> like to recover, and I'd also like to solve the problem purely to >>>>>>>>> understand what happened and how to recover in the future. >>>>>>>>> >>>>>>>>> Is there anything else I should try to recover this array, or is this >>>>>>>>> a lost cause? >>>>>>>>> >>>>>>>>> Details requested by the wiki to follow and I'm happy to collect any >>>>>>>>> further data that would assist. /dev/sdb is the new drive that was >>>>>>>>> added, then disconnected. /dev/sdh is the drive that failed a >>>>>>>>> self-test and was removed from the array. >>>>>>>>> >>>>>>>>> Thank you in advance for any help provided! 
>>>>>>>>> >>>>>>>>> >>>>>>>>> $ uname -a >>>>>>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC >>>>>>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux >>>>>>>>> >>>>>>>>> $ mdadm --version >>>>>>>>> mdadm - v4.2 - 2021-12-30 >>>>>>>>> >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>>>>>> Serial Number: WD-WCC4N7AT7R7X >>>>>>>>> LU WWN Device Id: 5 0014ee 268545f93 >>>>>>>>> Firmware Version: 82.00A82 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Rotation Rate: 5400 rpm >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:27:55 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. >>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>>>>>> Serial Number: WD-WCC4N7AT7R7X >>>>>>>>> LU WWN Device Id: 5 0014ee 268545f93 >>>>>>>>> Firmware Version: 82.00A82 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Rotation Rate: 5400 rpm >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:16 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. 
>>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdb >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>>>>>> Serial Number: WD-WXG1A8UGLS42 >>>>>>>>> LU WWN Device Id: 5 0014ee 2b75ef53b >>>>>>>>> Firmware Version: 80.00A80 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Rotation Rate: 5400 rpm >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:19 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. >>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdc >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>>>>>> Serial Number: WD-WCC4N4HYL32Y >>>>>>>>> LU WWN Device Id: 5 0014ee 2630752f8 >>>>>>>>> Firmware Version: 82.00A82 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Rotation Rate: 5400 rpm >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:20 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. 
>>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdd >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68N32N0 >>>>>>>>> Serial Number: WD-WCC7K1FF6DYK >>>>>>>>> LU WWN Device Id: 5 0014ee 2ba952a30 >>>>>>>>> Firmware Version: 82.00A82 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Rotation Rate: 5400 rpm >>>>>>>>> Form Factor: 3.5 inches >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-3 T13/2161-D revision 5 >>>>>>>>> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:21 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. >>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sde >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>>>>>> Serial Number: WD-WCC4N5ZHTRJF >>>>>>>>> LU WWN Device Id: 5 0014ee 2b88b83bb >>>>>>>>> Firmware Version: 82.00A82 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Rotation Rate: 5400 rpm >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:22 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. 
>>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdf >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68AX9N0 >>>>>>>>> Serial Number: WD-WMC1T3804790 >>>>>>>>> LU WWN Device Id: 5 0014ee 6036b6826 >>>>>>>>> Firmware Version: 80.00A80 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:23 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. >>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdg >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>>>>>> Serial Number: WD-WMC4N0H692Z9 >>>>>>>>> LU WWN Device Id: 5 0014ee 65af39740 >>>>>>>>> Firmware Version: 82.00A82 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Rotation Rate: 5400 rpm >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. 
>>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdh >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 >>>>>>>>> Serial Number: WD-WMC4N0K5S750 >>>>>>>>> LU WWN Device Id: 5 0014ee 6b048d9ca >>>>>>>>> Firmware Version: 82.00A82 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Rotation Rate: 5400 rpm >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. >>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdi >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org >>>>>>>>> >>>>>>>>> === START OF INFORMATION SECTION === >>>>>>>>> Model Family: Western Digital Red >>>>>>>>> Device Model: WDC WD30EFRX-68AX9N0 >>>>>>>>> Serial Number: WD-WMC1T1502475 >>>>>>>>> LU WWN Device Id: 5 0014ee 058d2e5cb >>>>>>>>> Firmware Version: 80.00A80 >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>>>>>> Device is: In smartctl database [for details use: -P show] >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) >>>>>>>>> Local Time is: Sun Sep 3 13:28:27 2023 PDT >>>>>>>>> SMART support is: Available - device has SMART capability. 
>>>>>>>>> SMART support is: Enabled >>>>>>>>> >>>>>>>>> === START OF READ SMART DATA SECTION === >>>>>>>>> SMART overall-health self-assessment test result: PASSED >>>>>>>>> >>>>>>>>> SCT Error Recovery Control: >>>>>>>>> Read: 70 (7.0 seconds) >>>>>>>>> Write: 70 (7.0 seconds) >>>>>>>>> >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sda >>>>>>>>> /dev/sda: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sda1 >>>>>>>>> /dev/sda1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0xd >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247728 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : 8ca60ad5:60d19333:11b24820:91453532 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 24 sectors - bad >>>>>>>>> blocks present. >>>>>>>>> Checksum : b6d8f4d1 - correct >>>>>>>>> Events : 181105 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 7 >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sdb >>>>>>>>> /dev/sdb: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sdb1 >>>>>>>>> /dev/sdb1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0x5 >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247728 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : 386d3001:16447e43:4d2a5459:85618d11 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 00:02:59 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 24 sectors >>>>>>>>> Checksum : b544a39 - correct >>>>>>>>> Events : 181077 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 8 >>>>>>>>> Array State : AAAAAAAAA ('A' == active, '.' 
== missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sdc >>>>>>>>> /dev/sdc: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sdc1 >>>>>>>>> /dev/sdc1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0xd >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad >>>>>>>>> blocks present. >>>>>>>>> Checksum : 88d8b8fc - correct >>>>>>>>> Events : 181105 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 4 >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sdd >>>>>>>>> /dev/sdd: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sdd1 >>>>>>>>> /dev/sdd1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0x5 >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247728 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 24 sectors >>>>>>>>> Checksum : d1471d9d - correct >>>>>>>>> Events : 181105 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 6 >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sde >>>>>>>>> /dev/sde: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sde1 >>>>>>>>> /dev/sde1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0x5 >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors >>>>>>>>> Checksum : e05d0278 - correct >>>>>>>>> Events : 181105 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 5 >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sdf >>>>>>>>> /dev/sdf: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sdf1 >>>>>>>>> /dev/sdf1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0x5 >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors >>>>>>>>> Checksum : 26792cc0 - correct >>>>>>>>> Events : 181105 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 0 >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sdg >>>>>>>>> /dev/sdg: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sdg1 >>>>>>>>> /dev/sdg1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0x5 >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : 74476ce7:4edc23f6:08120711:ba281425 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors >>>>>>>>> Checksum : 6f67d179 - correct >>>>>>>>> Events : 181105 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 1 >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sdh >>>>>>>>> /dev/sdh: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sdh1 >>>>>>>>> /dev/sdh1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0xd >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 20:09:14 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad >>>>>>>>> blocks present. >>>>>>>>> Checksum : b7696b68 - correct >>>>>>>>> Events : 181089 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 2 >>>>>>>>> Array State : AAAAAAAA. ('A' == active, '.' 
== missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --examine /dev/sdi >>>>>>>>> /dev/sdi: >>>>>>>>> MBR Magic : aa55 >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) >>>>>>>>> $ sudo mdadm --examine /dev/sdi1 >>>>>>>>> /dev/sdi1: >>>>>>>>> Magic : a92b4efc >>>>>>>>> Version : 1.2 >>>>>>>>> Feature Map : 0x5 >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Raid Devices : 9 >>>>>>>>> >>>>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) >>>>>>>>> Data Offset : 247808 sectors >>>>>>>>> Super Offset : 8 sectors >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors >>>>>>>>> State : clean >>>>>>>>> Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483 >>>>>>>>> >>>>>>>>> Internal Bitmap : 8 sectors from superblock >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) >>>>>>>>> Delta Devices : 1 (8->9) >>>>>>>>> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023 >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors >>>>>>>>> Checksum : 23b6d024 - correct >>>>>>>>> Events : 181105 >>>>>>>>> >>>>>>>>> Layout : left-symmetric >>>>>>>>> Chunk Size : 512K >>>>>>>>> >>>>>>>>> Device Role : Active device 3 >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing) >>>>>>>>> >>>>>>>>> $ sudo mdadm --detail /dev/md0 >>>>>>>>> /dev/md0: >>>>>>>>> Version : 1.2 >>>>>>>>> Raid Level : raid6 >>>>>>>>> Total Devices : 9 >>>>>>>>> Persistence : Superblock is persistent >>>>>>>>> >>>>>>>>> State : inactive >>>>>>>>> Working Devices : 9 >>>>>>>>> >>>>>>>>> Delta Devices : 1, (-1->0) >>>>>>>>> New Level : raid6 >>>>>>>>> New Layout : left-symmetric >>>>>>>>> New Chunksize : 512K >>>>>>>>> >>>>>>>>> Name : Blyth:0 (local to host Blyth) >>>>>>>>> UUID : 440dc11e:079308b1:131eda79:9a74c670 >>>>>>>>> Events : 181105 >>>>>>>>> >>>>>>>>> Number Major Minor RaidDevice >>>>>>>>> >>>>>>>>> - 8 1 - /dev/sda1 >>>>>>>>> - 8 129 - /dev/sdi1 >>>>>>>>> - 8 113 - /dev/sdh1 >>>>>>>>> - 8 97 - /dev/sdg1 >>>>>>>>> - 8 81 - /dev/sdf1 >>>>>>>>> - 8 65 - /dev/sde1 >>>>>>>>> - 8 49 - /dev/sdd1 >>>>>>>>> - 8 33 - /dev/sdc1 >>>>>>>>> - 8 17 - /dev/sdb1 >>>>>>>>> >>>>>>>>> $ cat /proc/mdstat >>>>>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] >>>>>>>>> [raid4] [raid10] >>>>>>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S) >>>>>>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S) >>>>>>>>> 26353689600 blocks super 1.2 >>>>>>>>> >>>>>>>>> unused devices: <none> >>>>>>>>> >>>>>>>>> . >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> . >>>>>>> >>>>>> >>>>> >>>>> . >>>>> >>>> >>> >>> . >>> >> > > . > ^ permalink raw reply [flat|nested] 21+ messages in thread
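The reply below collects kernel stacks and block-layer debug state with a handful of shell commands. As a condensed sketch of that triage sequence (the <PID> placeholder, the reshape_request+0x416 offset, and the sda device name are taken from this thread; the gdb step assumes a raid456.ko built with debug symbols and the debugfs step assumes debugfs is mounted at /sys/kernel/debug):

$ ps aux | awk '$8 ~ /^D/'            # list threads stuck in uninterruptible (D) sleep
$ sudo cat /proc/<PID>/stack          # kernel stack of one stuck thread
$ gdb -batch -ex 'list *(reshape_request+0x416)' raid456.ko   # map a stack offset to a source line
$ iostat -dmx 1                       # watch the sd* lines to confirm the disks are otherwise idle
$ sudo find /sys/kernel/debug/block/sda/ -type f | xargs grep .   # dump block-layer tag/queue state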
* Re: Reshape Failure 2023-09-10 2:45 ` Yu Kuai @ 2023-09-10 4:58 ` Jason Moss 2023-09-10 6:10 ` Yu Kuai 0 siblings, 1 reply; 21+ messages in thread From: Jason Moss @ 2023-09-10 4:58 UTC (permalink / raw) To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C) Hi, On Sat, Sep 9, 2023 at 7:45 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2023/09/07 14:19, Jason Moss 写道: > > Hi, > > > > On Wed, Sep 6, 2023 at 11:13 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> > >> Hi, > >> > >> 在 2023/09/07 13:44, Jason Moss 写道: > >>> Hi, > >>> > >>> On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >>>> > >>>> Hi, > >>>> > >>>> 在 2023/09/06 22:05, Jason Moss 写道: > >>>>> Hi Kuai, > >>>>> > >>>>> I ended up using gdb rather than addr2line, as that output didn't give > >>>>> me the global offset. Maybe there's a better way, but this seems to be > >>>>> similar to what I expected. > >>>> > >>>> It's ok. > >>>>> > >>>>> (gdb) list *(reshape_request+0x416) > >>>>> 0x11566 is in reshape_request (drivers/md/raid5.c:6396). > >>>>> 6391 if ((mddev->reshape_backwards > >>>>> 6392 ? (safepos > writepos && readpos < writepos) > >>>>> 6393 : (safepos < writepos && readpos > writepos)) || > >>>>> 6394 time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) { > >>>>> 6395 /* Cannot proceed until we've updated the > >>>>> superblock... */ > >>>>> 6396 wait_event(conf->wait_for_overlap, > >>>>> 6397 atomic_read(&conf->reshape_stripes)==0 > >>>>> 6398 || test_bit(MD_RECOVERY_INTR, > >>>> > >>>> If reshape is stuck here, which means: > >>>> > >>>> 1) Either reshape io is stuck somewhere and never complete; > >>>> 2) Or the counter reshape_stripes is broken; > >>>> > >>>> Can you read following debugfs files to verify if io is stuck in > >>>> underlying disk? > >>>> > >>>> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch} > >>>> > >>> > >>> I'll attach this below. > >>> > >>>> Furthermore, echo frozen should break above wait_event() because > >>>> 'MD_RECOVERY_INTR' will be set, however, based on your description, > >>>> the problem still exist. Can you collect stack and addr2line result > >>>> of stuck thread after echo frozen? > >>>> > >>> > >>> I echo'd frozen to /sys/block/md0/md/sync_action, however the echo > >>> call has been sitting for about 30 minutes, maybe longer, and has not > >>> returned. Here's the current state: > >>> > >>> root 454 0.0 0.0 0 0 ? I< Sep05 0:00 [raid5wq] > >>> root 455 0.0 0.0 34680 5988 ? D Sep05 0:00 (udev-worker) > >> > >> Can you also show the stack of udev-worker? And any other thread with > >> 'D' state, I think above "echo frozen" is probably also stuck in D > >> state. > >> > > > > As requested: > > > > ps aux | grep D > > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > > root 455 0.0 0.0 34680 5988 ? D Sep05 0:00 (udev-worker) > > root 457 0.0 0.0 0 0 ? 
D Sep05 0:00 [md0_reshape] > > root 45507 0.0 0.0 8272 4736 pts/1 Ds+ Sep05 0:00 -bash > > jason 279169 0.0 0.0 6976 2560 pts/0 S+ 23:16 0:00 > > grep --color=auto D > > > > [jason@arch md]$ sudo cat /proc/455/stack > > [<0>] wait_woken+0x54/0x60 > > [<0>] raid5_make_request+0x5fe/0x12f0 [raid456] > > [<0>] md_handle_request+0x135/0x220 [md_mod] > > [<0>] __submit_bio+0xb3/0x170 > > [<0>] submit_bio_noacct_nocheck+0x159/0x370 > > [<0>] block_read_full_folio+0x21c/0x340 > > [<0>] filemap_read_folio+0x40/0xd0 > > [<0>] filemap_get_pages+0x475/0x630 > > [<0>] filemap_read+0xd9/0x350 > > [<0>] blkdev_read_iter+0x6b/0x1b0 > > [<0>] vfs_read+0x201/0x350 > > [<0>] ksys_read+0x6f/0xf0 > > [<0>] do_syscall_64+0x60/0x90 > > [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > > > > > [jason@arch md]$ sudo cat /proc/45507/stack > > [<0>] kthread_stop+0x6a/0x180 > > [<0>] md_unregister_thread+0x29/0x60 [md_mod] > > [<0>] action_store+0x168/0x320 [md_mod] > > [<0>] md_attr_store+0x86/0xf0 [md_mod] > > [<0>] kernfs_fop_write_iter+0x136/0x1d0 > > [<0>] vfs_write+0x23e/0x420 > > [<0>] ksys_write+0x6f/0xf0 > > [<0>] do_syscall_64+0x60/0x90 > > [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > > > Please let me know if you'd like me to identify the lines for any of those. > > > > That's enough. > > Thanks, > > Jason > > > > > >>> root 456 99.9 0.0 0 0 ? R Sep05 1543:40 [md0_raid6] > >>> root 457 0.0 0.0 0 0 ? D Sep05 0:00 [md0_reshape] > >>> > >>> [jason@arch md]$ sudo cat /proc/457/stack > >>> [<0>] md_do_sync+0xef2/0x11d0 [md_mod] > >>> [<0>] md_thread+0xae/0x190 [md_mod] > >>> [<0>] kthread+0xe8/0x120 > >>> [<0>] ret_from_fork+0x34/0x50 > >>> [<0>] ret_from_fork_asm+0x1b/0x30 > >>> > >>> Reading symbols from md-mod.ko... > >>> (gdb) list *(md_do_sync+0xef2) > >>> 0xb3a2 is in md_do_sync (drivers/md/md.c:9035). > >>> 9030 ? "interrupted" : "done"); > >>> 9031 /* > >>> 9032 * this also signals 'finished resyncing' to md_stop > >>> 9033 */ > >>> 9034 blk_finish_plug(&plug); > >>> 9035 wait_event(mddev->recovery_wait, > >>> !atomic_read(&mddev->recovery_active)); > >> > >> That's also wait for reshape io to be done from common layer. > >> > >>> 9036 > >>> 9037 if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && > >>> 9038 !test_bit(MD_RECOVERY_INTR, &mddev->recovery) && > >>> 9039 mddev->curr_resync >= MD_RESYNC_ACTIVE) { > >>> > >>> > >>> The debugfs info: > >>> > >>> [root@arch ~]# cat > >>> /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch} > >> > >> Only sched_tags is read, sorry that I didn't mean to use this exact cmd. > >> > >> Perhaps you can using following cmd: > >> > >> find /sys/kernel/debug/block/sda/ -type f | xargs grep . > >> > >>> nr_tags=64 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=64 > >>> busy=1 > >> > >> This means there is one IO in sda, however, I need more information to > >> make sure where is this IO. And please make sure don't run any other > >> thread that can read/write from sda. You can use "iostat -dmx 1" and > >> observe for a while to confirm that there is no new io. > > And can you help for this? Confirm no new io and collect debugfs. As instructed, I confirmed there is no active IO to sda1 via iostat. I then ran the provided command [root@arch ~]# find /sys/kernel/debug/block/sda/ -type f | xargs grep . 
/sys/kernel/debug/block/sda/rqos/wbt/wb_background:6
/sys/kernel/debug/block/sda/rqos/wbt/wb_normal:12
/sys/kernel/debug/block/sda/rqos/wbt/unknown_cnt:4
/sys/kernel/debug/block/sda/rqos/wbt/min_lat_nsec:75000000
/sys/kernel/debug/block/sda/rqos/wbt/inflight:0: inflight 1
/sys/kernel/debug/block/sda/rqos/wbt/inflight:1: inflight 0
/sys/kernel/debug/block/sda/rqos/wbt/inflight:2: inflight 0
/sys/kernel/debug/block/sda/rqos/wbt/id:0
/sys/kernel/debug/block/sda/rqos/wbt/enabled:1
/sys/kernel/debug/block/sda/rqos/wbt/curr_win_nsec:100000000
/sys/kernel/debug/block/sda/hctx0/type:default
/sys/kernel/debug/block/sda/hctx0/dispatch_busy:0
/sys/kernel/debug/block/sda/hctx0/active:0
/sys/kernel/debug/block/sda/hctx0/run:2583
/sys/kernel/debug/block/sda/hctx0/sched_tags_bitmap:00000000: 0000 0000 8000 0000
/sys/kernel/debug/block/sda/hctx0/sched_tags:nr_tags=64
/sys/kernel/debug/block/sda/hctx0/sched_tags:nr_reserved_tags=0
/sys/kernel/debug/block/sda/hctx0/sched_tags:active_queues=0
/sys/kernel/debug/block/sda/hctx0/sched_tags:bitmap_tags:
/sys/kernel/debug/block/sda/hctx0/sched_tags:depth=64
/sys/kernel/debug/block/sda/hctx0/sched_tags:busy=1
/sys/kernel/debug/block/sda/hctx0/sched_tags:cleared=57
/sys/kernel/debug/block/sda/hctx0/sched_tags:bits_per_word=16
/sys/kernel/debug/block/sda/hctx0/sched_tags:map_nr=4
/sys/kernel/debug/block/sda/hctx0/sched_tags:alloc_hint={40, 20, 48, 0}
/sys/kernel/debug/block/sda/hctx0/sched_tags:wake_batch=8
/sys/kernel/debug/block/sda/hctx0/sched_tags:wake_index=0
/sys/kernel/debug/block/sda/hctx0/sched_tags:ws_active=0
/sys/kernel/debug/block/sda/hctx0/sched_tags:ws={
/sys/kernel/debug/block/sda/hctx0/sched_tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:}
/sys/kernel/debug/block/sda/hctx0/sched_tags:round_robin=1
/sys/kernel/debug/block/sda/hctx0/sched_tags:min_shallow_depth=48
/sys/kernel/debug/block/sda/hctx0/tags_bitmap:00000000: 0000 0000
/sys/kernel/debug/block/sda/hctx0/tags:nr_tags=32
/sys/kernel/debug/block/sda/hctx0/tags:nr_reserved_tags=0
/sys/kernel/debug/block/sda/hctx0/tags:active_queues=0
/sys/kernel/debug/block/sda/hctx0/tags:bitmap_tags:
/sys/kernel/debug/block/sda/hctx0/tags:depth=32
/sys/kernel/debug/block/sda/hctx0/tags:busy=0
/sys/kernel/debug/block/sda/hctx0/tags:cleared=21
/sys/kernel/debug/block/sda/hctx0/tags:bits_per_word=8
/sys/kernel/debug/block/sda/hctx0/tags:map_nr=4
/sys/kernel/debug/block/sda/hctx0/tags:alloc_hint={19, 26, 7, 21}
/sys/kernel/debug/block/sda/hctx0/tags:wake_batch=4
/sys/kernel/debug/block/sda/hctx0/tags:wake_index=0
/sys/kernel/debug/block/sda/hctx0/tags:ws_active=0
/sys/kernel/debug/block/sda/hctx0/tags:ws={
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags:
{.wait=inactive}, /sys/kernel/debug/block/sda/hctx0/tags:} /sys/kernel/debug/block/sda/hctx0/tags:round_robin=1 /sys/kernel/debug/block/sda/hctx0/tags:min_shallow_depth=4294967295 /sys/kernel/debug/block/sda/hctx0/ctx_map:00000000: 00 /sys/kernel/debug/block/sda/hctx0/flags:alloc_policy=RR SHOULD_MERGE /sys/kernel/debug/block/sda/sched/queued:0 0 0 /sys/kernel/debug/block/sda/sched/owned_by_driver:0 0 0 /sys/kernel/debug/block/sda/sched/async_depth:48 /sys/kernel/debug/block/sda/sched/starved:0 /sys/kernel/debug/block/sda/sched/batching:2 /sys/kernel/debug/block/sda/state:SAME_COMP|IO_STAT|ADD_RANDOM|INIT_DONE|WC|STATS|REGISTERED|NOWAIT|SQ_SCHED /sys/kernel/debug/block/sda/pm_only:0 Let me know if there's anything further I can provide to assist in troubleshooting. Thanks, Jason > > Thanks, > Kuai > > >> > >> Thanks, > >> Kuai > >> > >>> cleared=55 > >>> bits_per_word=16 > >>> map_nr=4 > >>> alloc_hint={40, 20, 46, 0} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=1 > >>> min_shallow_depth=48 > >>> nr_tags=32 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=32 > >>> busy=0 > >>> cleared=27 > >>> bits_per_word=8 > >>> map_nr=4 > >>> alloc_hint={19, 26, 5, 21} > >>> wake_batch=4 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=1 > >>> min_shallow_depth=4294967295 > >> > >> > >>> > >>> > >>> [root@arch ~]# cat /sys/kernel/debug/block/sdb/hctx* > >>> /{sched_tags,tags,busy,dispatch} > >>> nr_tags=64 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=64 > >>> busy=1 > >>> cleared=56 > >>> bits_per_word=16 > >>> map_nr=4 > >>> alloc_hint={57, 43, 14, 19} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=1 > >>> min_shallow_depth=48 > >>> nr_tags=32 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=32 > >>> busy=0 > >>> cleared=24 > >>> bits_per_word=8 > >>> map_nr=4 > >>> alloc_hint={17, 13, 23, 17} > >>> wake_batch=4 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=1 > >>> min_shallow_depth=4294967295 > >>> > >>> > >>> [root@arch ~]# cat > >>> /sys/kernel/debug/block/sdd/hctx*/{sched_tags,tags,busy,dispatch} > >>> nr_tags=64 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=64 > >>> busy=1 > >>> cleared=51 > >>> bits_per_word=16 > >>> map_nr=4 > >>> alloc_hint={36, 43, 15, 7} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=1 > >>> 
min_shallow_depth=48 > >>> nr_tags=32 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=32 > >>> busy=0 > >>> cleared=31 > >>> bits_per_word=8 > >>> map_nr=4 > >>> alloc_hint={0, 15, 1, 22} > >>> wake_batch=4 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=1 > >>> min_shallow_depth=4294967295 > >>> > >>> > >>> [root@arch ~]# cat > >>> /sys/kernel/debug/block/sdf/hctx*/{sched_tags,tags,busy,dispatch} > >>> nr_tags=256 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=256 > >>> busy=1 > >>> cleared=131 > >>> bits_per_word=64 > >>> map_nr=4 > >>> alloc_hint={125, 46, 83, 205} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=0 > >>> min_shallow_depth=192 > >>> nr_tags=10104 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=10104 > >>> busy=0 > >>> cleared=235 > >>> bits_per_word=64 > >>> map_nr=158 > >>> alloc_hint={503, 2913, 9827, 9851} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=0 > >>> min_shallow_depth=4294967295 > >>> > >>> > >>> [root@arch ~]# cat > >>> /sys/kernel/debug/block/sdh/hctx*/{sched_tags,tags,busy,dispatch} > >>> nr_tags=256 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=256 > >>> busy=1 > >>> cleared=97 > >>> bits_per_word=64 > >>> map_nr=4 > >>> alloc_hint={144, 144, 127, 254} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=0 > >>> min_shallow_depth=192 > >>> nr_tags=10104 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=10104 > >>> busy=0 > >>> cleared=235 > >>> bits_per_word=64 > >>> map_nr=158 > >>> alloc_hint={503, 2913, 9827, 9851} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=0 > >>> min_shallow_depth=4294967295 > >>> > >>> > >>> [root@arch ~]# cat > >>> /sys/kernel/debug/block/sdi/hctx*/{sched_tags,tags,busy,dispatch} > >>> nr_tags=256 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=256 > >>> busy=1 > >>> cleared=34 > >>> bits_per_word=64 > >>> map_nr=4 > >>> alloc_hint={197, 20, 1, 230} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=0 > >>> min_shallow_depth=192 > >>> 
nr_tags=10104 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=10104 > >>> busy=0 > >>> cleared=235 > >>> bits_per_word=64 > >>> map_nr=158 > >>> alloc_hint={503, 2913, 9827, 9851} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=0 > >>> min_shallow_depth=4294967295 > >>> > >>> > >>> [root@arch ~]# cat > >>> /sys/kernel/debug/block/sdj/hctx*/{sched_tags,tags,busy,dispatch} > >>> nr_tags=256 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=256 > >>> busy=1 > >>> cleared=27 > >>> bits_per_word=64 > >>> map_nr=4 > >>> alloc_hint={132, 74, 129, 76} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=0 > >>> min_shallow_depth=192 > >>> nr_tags=10104 > >>> nr_reserved_tags=0 > >>> active_queues=0 > >>> > >>> bitmap_tags: > >>> depth=10104 > >>> busy=0 > >>> cleared=235 > >>> bits_per_word=64 > >>> map_nr=158 > >>> alloc_hint={503, 2913, 9827, 9851} > >>> wake_batch=8 > >>> wake_index=0 > >>> ws_active=0 > >>> ws={ > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> {.wait=inactive}, > >>> } > >>> round_robin=0 > >>> min_shallow_depth=4294967295 > >>> > >>> > >>> Thanks for your continued assistance with this! > >>> Jason > >>> > >>> > >>>> Thanks, > >>>> Kuai > >>>> > >>>>> &mddev->recovery)); > >>>>> 6399 if (atomic_read(&conf->reshape_stripes) != 0) > >>>>> 6400 return 0; > >>>>> > >>>>> Thanks > >>>>> > >>>>> On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> 在 2023/09/05 0:38, Jason Moss 写道: > >>>>>>> Hi Kuai, > >>>>>>> > >>>>>>> Thank you for the suggestion, I was previously on 5.15.0. I've built > >>>>>>> an environment with 6.5.0.1 now and assembled the array there, but the > >>>>>>> same problem happens. It reshaped for 20-30 seconds, then completely > >>>>>>> stopped. > >>>>>>> > >>>>>>> Processes and /proc/<PID>/stack output: > >>>>>>> root 24593 0.0 0.0 0 0 ? I< 09:22 0:00 [raid5wq] > >>>>>>> root 24594 96.5 0.0 0 0 ? R 09:22 2:29 [md0_raid6] > >>>>>>> root 24595 0.3 0.0 0 0 ? D 09:22 0:00 [md0_reshape] > >>>>>>> > >>>>>>> [root@arch ~]# cat /proc/24593/stack > >>>>>>> [<0>] rescuer_thread+0x2b0/0x3b0 > >>>>>>> [<0>] kthread+0xe8/0x120 > >>>>>>> [<0>] ret_from_fork+0x34/0x50 > >>>>>>> [<0>] ret_from_fork_asm+0x1b/0x30 > >>>>>>> > >>>>>>> [root@arch ~]# cat /proc/24594/stack > >>>>>>> > >>>>>>> [root@arch ~]# cat /proc/24595/stack > >>>>>>> [<0>] reshape_request+0x416/0x9f0 [raid456] > >>>>>> Can you provide the addr2line result? Let's see where reshape_request() > >>>>>> is stuck first. 
> >>>>>> > >>>>>> Thanks, > >>>>>> Kuai > >>>>>> > >>>>>>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456] > >>>>>>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod] > >>>>>>> [<0>] md_thread+0xae/0x190 [md_mod] > >>>>>>> [<0>] kthread+0xe8/0x120 > >>>>>>> [<0>] ret_from_fork+0x34/0x50 > >>>>>>> [<0>] ret_from_fork_asm+0x1b/0x30 > >>>>>>> > >>>>>>> Please let me know if there's a better way to provide the stack info. > >>>>>>> > >>>>>>> Thank you > >>>>>>> > >>>>>>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> 在 2023/09/04 5:39, Jason Moss 写道: > >>>>>>>>> Hello, > >>>>>>>>> > >>>>>>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array, > >>>>>>>>> growing it to 9 drives. I've done similar before with the same array, > >>>>>>>>> having previously grown it from 6 drives to 7 and then from 7 to 8 > >>>>>>>>> with no issues. Drives are WD Reds, most older than 2019, some > >>>>>>>>> (including the newest) newer, but all confirmed CMR and not SMR. > >>>>>>>>> > >>>>>>>>> Process used to expand the array: > >>>>>>>>> mdadm --add /dev/md0 /dev/sdb1 > >>>>>>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0 > >>>>>>>>> > >>>>>>>>> The reshape started off fine, the process was underway, and the volume > >>>>>>>>> was still usable as expected. However, 15-30 minutes into the reshape, > >>>>>>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the > >>>>>>>>> reshape was stopped at 0.6% with the counter not incrementing at all. > >>>>>>>>> Any process accessing the array would just hang until killed. I waited > >>>>>>>> > >>>>>>>> What kernel version are you using? And it'll be very helpful if you can > >>>>>>>> collect the stack of all stuck thread. There is a known deadlock for > >>>>>>>> raid5 related to reshape, and it's fixed in v6.5: > >>>>>>>> > >>>>>>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com > >>>>>>>> > >>>>>>>>> a half hour and there was still no further change to the counter. At > >>>>>>>>> this point, I restarted the server and found that when it came back up > >>>>>>>>> it would begin reshaping again, but only very briefly, under 30 > >>>>>>>>> seconds, but the counter would be increasing during that time. > >>>>>>>>> > >>>>>>>>> I searched furiously for ideas and tried stopping and reassembling the > >>>>>>>>> array, assembling with an invalid-backup flag, echoing "frozen" then > >>>>>>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max > >>>>>>>>> file. Nothing ever seemed to make a difference. > >>>>>>>>> > >>>>>>>> > >>>>>>>> Don't do this before v6.5, echo "reshape" while reshape is still in > >>>>>>>> progress will corrupt your data: > >>>>>>>> > >>>>>>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Kuai > >>>>>>>> > >>>>>>>>> Here is where I slightly panicked, worried that I'd borked my array, > >>>>>>>>> and powered off the server again and disconnected the new drive that > >>>>>>>>> was just added, assuming that since it was the change, it may be the > >>>>>>>>> problem despite having burn-in tested it, and figuring that I'll rush > >>>>>>>>> order a new drive, so long as the reshape continues and I can just > >>>>>>>>> rebuild onto a new drive once the reshape finishes. However, this made > >>>>>>>>> no difference and the array continued to not rebuild. 
> >>>>>>>>> > >>>>>>>>> Much searching later, I'd found nothing substantially different then > >>>>>>>>> I'd already tried and one of the common threads in other people's > >>>>>>>>> issues was bad drives, so I ran a self-test against each of the > >>>>>>>>> existing drives and found one drive that failed the read test. > >>>>>>>>> Thinking I had the culprit now, I dropped that drive out of the array > >>>>>>>>> and assembled the array again, but the same behavior persists. The > >>>>>>>>> array reshapes very briefly, then completely stops. > >>>>>>>>> > >>>>>>>>> Down to 0 drives of redundancy (in the reshaped section at least), not > >>>>>>>>> finding any new ideas on any of the forums, mailing list, wiki, etc, > >>>>>>>>> and very frustrated, I took a break, bought all new drives to build a > >>>>>>>>> new array in another server and restored from a backup. However, there > >>>>>>>>> is still some data not captured by the most recent backup that I would > >>>>>>>>> like to recover, and I'd also like to solve the problem purely to > >>>>>>>>> understand what happened and how to recover in the future. > >>>>>>>>> > >>>>>>>>> Is there anything else I should try to recover this array, or is this > >>>>>>>>> a lost cause? > >>>>>>>>> > >>>>>>>>> Details requested by the wiki to follow and I'm happy to collect any > >>>>>>>>> further data that would assist. /dev/sdb is the new drive that was > >>>>>>>>> added, then disconnected. /dev/sdh is the drive that failed a > >>>>>>>>> self-test and was removed from the array. > >>>>>>>>> > >>>>>>>>> Thank you in advance for any help provided! > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> $ uname -a > >>>>>>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC > >>>>>>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux > >>>>>>>>> > >>>>>>>>> $ mdadm --version > >>>>>>>>> mdadm - v4.2 - 2021-12-30 > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>>>> Serial Number: WD-WCC4N7AT7R7X > >>>>>>>>> LU WWN Device Id: 5 0014ee 268545f93 > >>>>>>>>> Firmware Version: 82.00A82 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Rotation Rate: 5400 rpm > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:27:55 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>>>> Serial Number: WD-WCC4N7AT7R7X > >>>>>>>>> LU WWN Device Id: 5 0014ee 268545f93 > >>>>>>>>> Firmware Version: 82.00A82 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Rotation Rate: 5400 rpm > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:16 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdb > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>>>> Serial Number: WD-WXG1A8UGLS42 > >>>>>>>>> LU WWN Device Id: 5 0014ee 2b75ef53b > >>>>>>>>> Firmware Version: 80.00A80 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Rotation Rate: 5400 rpm > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:19 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdc > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>>>> Serial Number: WD-WCC4N4HYL32Y > >>>>>>>>> LU WWN Device Id: 5 0014ee 2630752f8 > >>>>>>>>> Firmware Version: 82.00A82 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Rotation Rate: 5400 rpm > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:20 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdd > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68N32N0 > >>>>>>>>> Serial Number: WD-WCC7K1FF6DYK > >>>>>>>>> LU WWN Device Id: 5 0014ee 2ba952a30 > >>>>>>>>> Firmware Version: 82.00A82 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Rotation Rate: 5400 rpm > >>>>>>>>> Form Factor: 3.5 inches > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-3 T13/2161-D revision 5 > >>>>>>>>> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:21 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sde > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>>>> Serial Number: WD-WCC4N5ZHTRJF > >>>>>>>>> LU WWN Device Id: 5 0014ee 2b88b83bb > >>>>>>>>> Firmware Version: 82.00A82 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Rotation Rate: 5400 rpm > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:22 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdf > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68AX9N0 > >>>>>>>>> Serial Number: WD-WMC1T3804790 > >>>>>>>>> LU WWN Device Id: 5 0014ee 6036b6826 > >>>>>>>>> Firmware Version: 80.00A80 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:23 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdg > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>>>> Serial Number: WD-WMC4N0H692Z9 > >>>>>>>>> LU WWN Device Id: 5 0014ee 65af39740 > >>>>>>>>> Firmware Version: 82.00A82 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Rotation Rate: 5400 rpm > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdh > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68EUZN0 > >>>>>>>>> Serial Number: WD-WMC4N0K5S750 > >>>>>>>>> LU WWN Device Id: 5 0014ee 6b048d9ca > >>>>>>>>> Firmware Version: 82.00A82 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Rotation Rate: 5400 rpm > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:24 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. 
> >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdi > >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build) > >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org > >>>>>>>>> > >>>>>>>>> === START OF INFORMATION SECTION === > >>>>>>>>> Model Family: Western Digital Red > >>>>>>>>> Device Model: WDC WD30EFRX-68AX9N0 > >>>>>>>>> Serial Number: WD-WMC1T1502475 > >>>>>>>>> LU WWN Device Id: 5 0014ee 058d2e5cb > >>>>>>>>> Firmware Version: 80.00A80 > >>>>>>>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB] > >>>>>>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical > >>>>>>>>> Device is: In smartctl database [for details use: -P show] > >>>>>>>>> ATA Version is: ACS-2 (minor revision not indicated) > >>>>>>>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > >>>>>>>>> Local Time is: Sun Sep 3 13:28:27 2023 PDT > >>>>>>>>> SMART support is: Available - device has SMART capability. > >>>>>>>>> SMART support is: Enabled > >>>>>>>>> > >>>>>>>>> === START OF READ SMART DATA SECTION === > >>>>>>>>> SMART overall-health self-assessment test result: PASSED > >>>>>>>>> > >>>>>>>>> SCT Error Recovery Control: > >>>>>>>>> Read: 70 (7.0 seconds) > >>>>>>>>> Write: 70 (7.0 seconds) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> $ sudo mdadm --examine /dev/sda > >>>>>>>>> /dev/sda: > >>>>>>>>> MBR Magic : aa55 > >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee) > >>>>>>>>> $ sudo mdadm --examine /dev/sda1 > >>>>>>>>> /dev/sda1: > >>>>>>>>> Magic : a92b4efc > >>>>>>>>> Version : 1.2 > >>>>>>>>> Feature Map : 0xd > >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670 > >>>>>>>>> Name : Blyth:0 (local to host Blyth) > >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015 > >>>>>>>>> Raid Level : raid6 > >>>>>>>>> Raid Devices : 9 > >>>>>>>>> > >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB) > >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB) > >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB) > >>>>>>>>> Data Offset : 247808 sectors > >>>>>>>>> Super Offset : 8 sectors > >>>>>>>>> Unused Space : before=247728 sectors, after=14336 sectors > >>>>>>>>> State : clean > >>>>>>>>> Device UUID : 8ca60ad5:60d19333:11b24820:91453532 > >>>>>>>>> > >>>>>>>>> Internal Bitmap : 8 sectors from superblock > >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB) > >>>>>>>>> Delta Devices : 1 (8->9) > >>>>>>>>> > >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023 > >>>>>>>>> Bad Block Log : 512 entries available at offset 24 sectors - bad > >>>>>>>>> blocks present. > >>>>>>>>> Checksum : b6d8f4d1 - correct > >>>>>>>>> Events : 181105 > >>>>>>>>> > >>>>>>>>> Layout : left-symmetric > >>>>>>>>> Chunk Size : 512K > >>>>>>>>> > >>>>>>>>> Device Role : Active device 7 > >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' 
== missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdb
> >>>>>>>>> /dev/sdb:
> >>>>>>>>> MBR Magic : aa55
> >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdb1
> >>>>>>>>> /dev/sdb1:
> >>>>>>>>> Magic : a92b4efc
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Feature Map : 0x5
> >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Data Offset : 247808 sectors
> >>>>>>>>> Super Offset : 8 sectors
> >>>>>>>>> Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>>>>> State : clean
> >>>>>>>>> Device UUID : 386d3001:16447e43:4d2a5459:85618d11
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>>>>>>> Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>> Update Time : Tue Jul 11 00:02:59 2023
> >>>>>>>>> Bad Block Log : 512 entries available at offset 24 sectors
> >>>>>>>>> Checksum : b544a39 - correct
> >>>>>>>>> Events : 181077
> >>>>>>>>>
> >>>>>>>>> Layout : left-symmetric
> >>>>>>>>> Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>> Device Role : Active device 8
> >>>>>>>>> Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdc
> >>>>>>>>> /dev/sdc:
> >>>>>>>>> MBR Magic : aa55
> >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdc1
> >>>>>>>>> /dev/sdc1:
> >>>>>>>>> Magic : a92b4efc
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Feature Map : 0xd
> >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Data Offset : 247808 sectors
> >>>>>>>>> Super Offset : 8 sectors
> >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>> State : clean
> >>>>>>>>> Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>> Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
> >>>>>>>>> Checksum : 88d8b8fc - correct
> >>>>>>>>> Events : 181105
> >>>>>>>>>
> >>>>>>>>> Layout : left-symmetric
> >>>>>>>>> Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>> Device Role : Active device 4
> >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdd
> >>>>>>>>> /dev/sdd:
> >>>>>>>>> MBR Magic : aa55
> >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdd1
> >>>>>>>>> /dev/sdd1:
> >>>>>>>>> Magic : a92b4efc
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Feature Map : 0x5
> >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Data Offset : 247808 sectors
> >>>>>>>>> Super Offset : 8 sectors
> >>>>>>>>> Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>>>>> State : clean
> >>>>>>>>> Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>> Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>> Bad Block Log : 512 entries available at offset 24 sectors
> >>>>>>>>> Checksum : d1471d9d - correct
> >>>>>>>>> Events : 181105
> >>>>>>>>>
> >>>>>>>>> Layout : left-symmetric
> >>>>>>>>> Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>> Device Role : Active device 6
> >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sde
> >>>>>>>>> /dev/sde:
> >>>>>>>>> MBR Magic : aa55
> >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sde1
> >>>>>>>>> /dev/sde1:
> >>>>>>>>> Magic : a92b4efc
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Feature Map : 0x5
> >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>> Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Data Offset : 247808 sectors
> >>>>>>>>> Super Offset : 8 sectors
> >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>> State : clean
> >>>>>>>>> Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>> Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>>> Checksum : e05d0278 - correct
> >>>>>>>>> Events : 181105
> >>>>>>>>>
> >>>>>>>>> Layout : left-symmetric
> >>>>>>>>> Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>> Device Role : Active device 5
> >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdf
> >>>>>>>>> /dev/sdf:
> >>>>>>>>> MBR Magic : aa55
> >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdf1
> >>>>>>>>> /dev/sdf1:
> >>>>>>>>> Magic : a92b4efc
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Feature Map : 0x5
> >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Data Offset : 247808 sectors
> >>>>>>>>> Super Offset : 8 sectors
> >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>> State : clean
> >>>>>>>>> Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>> Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>>> Checksum : 26792cc0 - correct
> >>>>>>>>> Events : 181105
> >>>>>>>>>
> >>>>>>>>> Layout : left-symmetric
> >>>>>>>>> Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>> Device Role : Active device 0
> >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdg
> >>>>>>>>> /dev/sdg:
> >>>>>>>>> MBR Magic : aa55
> >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdg1
> >>>>>>>>> /dev/sdg1:
> >>>>>>>>> Magic : a92b4efc
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Feature Map : 0x5
> >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Data Offset : 247808 sectors
> >>>>>>>>> Super Offset : 8 sectors
> >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>> State : clean
> >>>>>>>>> Device UUID : 74476ce7:4edc23f6:08120711:ba281425
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>> Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>>> Checksum : 6f67d179 - correct
> >>>>>>>>> Events : 181105
> >>>>>>>>>
> >>>>>>>>> Layout : left-symmetric
> >>>>>>>>> Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>> Device Role : Active device 1
> >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdh
> >>>>>>>>> /dev/sdh:
> >>>>>>>>> MBR Magic : aa55
> >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdh1
> >>>>>>>>> /dev/sdh1:
> >>>>>>>>> Magic : a92b4efc
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Feature Map : 0xd
> >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Data Offset : 247808 sectors
> >>>>>>>>> Super Offset : 8 sectors
> >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>> State : clean
> >>>>>>>>> Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>> Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>>>>>>> Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>> Update Time : Tue Jul 11 20:09:14 2023
> >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
> >>>>>>>>> Checksum : b7696b68 - correct
> >>>>>>>>> Events : 181089
> >>>>>>>>>
> >>>>>>>>> Layout : left-symmetric
> >>>>>>>>> Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>> Device Role : Active device 2
> >>>>>>>>> Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdi
> >>>>>>>>> /dev/sdi:
> >>>>>>>>> MBR Magic : aa55
> >>>>>>>>> Partition[0] : 4294967295 sectors at 1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdi1
> >>>>>>>>> /dev/sdi1:
> >>>>>>>>> Magic : a92b4efc
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Feature Map : 0x5
> >>>>>>>>> Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> Creation Time : Tue Aug 4 23:47:57 2015
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>> Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>> Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>> Data Offset : 247808 sectors
> >>>>>>>>> Super Offset : 8 sectors
> >>>>>>>>> Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>> State : clean
> >>>>>>>>> Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>> Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>> Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>> Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>> Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>>> Checksum : 23b6d024 - correct
> >>>>>>>>> Events : 181105
> >>>>>>>>>
> >>>>>>>>> Layout : left-symmetric
> >>>>>>>>> Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>> Device Role : Active device 3
> >>>>>>>>> Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --detail /dev/md0
> >>>>>>>>> /dev/md0:
> >>>>>>>>> Version : 1.2
> >>>>>>>>> Raid Level : raid6
> >>>>>>>>> Total Devices : 9
> >>>>>>>>> Persistence : Superblock is persistent
> >>>>>>>>>
> >>>>>>>>> State : inactive
> >>>>>>>>> Working Devices : 9
> >>>>>>>>>
> >>>>>>>>> Delta Devices : 1, (-1->0)
> >>>>>>>>> New Level : raid6
> >>>>>>>>> New Layout : left-symmetric
> >>>>>>>>> New Chunksize : 512K
> >>>>>>>>>
> >>>>>>>>> Name : Blyth:0 (local to host Blyth)
> >>>>>>>>> UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>> Events : 181105
> >>>>>>>>>
> >>>>>>>>> Number Major Minor RaidDevice
> >>>>>>>>>
> >>>>>>>>> - 8 1 - /dev/sda1
> >>>>>>>>> - 8 129 - /dev/sdi1
> >>>>>>>>> - 8 113 - /dev/sdh1
> >>>>>>>>> - 8 97 - /dev/sdg1
> >>>>>>>>> - 8 81 - /dev/sdf1
> >>>>>>>>> - 8 65 - /dev/sde1
> >>>>>>>>> - 8 49 - /dev/sdd1
> >>>>>>>>> - 8 33 - /dev/sdc1
> >>>>>>>>> - 8 17 - /dev/sdb1
> >>>>>>>>>
> >>>>>>>>> $ cat /proc/mdstat
> >>>>>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> >>>>>>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S) sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
> >>>>>>>>> 26353689600 blocks super 1.2
> >>>>>>>>>
> >>>>>>>>> unused devices: <none>
> >>>>>>>>>

^ permalink raw reply	[flat|nested] 21+ messages in thread
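The per-device Events counters and "Reshape pos'n" values in the superblocks above are the quickest way to see which members fell behind. A small convenience loop over the same mdadm --examine calls already shown (a sketch only, assuming the /dev/sd[b-i]1 member names used in this report) lines the fields up side by side:

    for d in /dev/sd[b-i]1; do
        echo "== $d"
        sudo mdadm --examine "$d" | grep -E "Events|Reshape pos|Update Time|Device Role"
    done

With the dumps above this makes the split obvious: sdb1 and sdh1 stopped at events 181077 and 181089 with the older reshape position, while the remaining members all agree on 181105.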
* Re: Reshape Failure
  2023-09-10  4:58 ` Jason Moss
@ 2023-09-10  6:10   ` Yu Kuai
  0 siblings, 0 replies; 21+ messages in thread
From: Yu Kuai @ 2023-09-10 6:10 UTC (permalink / raw)
  To: Jason Moss, Yu Kuai
  Cc: linux-raid, yangerkun@huawei.com, linux-block, Jens Axboe, yukuai (C)

Hi,

[cc linux-block]

On 2023/09/10 12:58, Jason Moss wrote:
> Hi,
>
> On Sat, Sep 9, 2023 at 7:45 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/09/07 14:19, Jason Moss wrote:
>>> Hi,
>>>
>>> On Wed, Sep 6, 2023 at 11:13 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/09/07 13:44, Jason Moss wrote:
>>>>> Hi,
>>>>>
>>>>> On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2023/09/06 22:05, Jason Moss wrote:
>>>>>>> Hi Kuai,
>>>>>>>
>>>>>>> I ended up using gdb rather than addr2line, as that output didn't give
>>>>>>> me the global offset. Maybe there's a better way, but this seems to be
>>>>>>> similar to what I expected.
>>>>>>
>>>>>> It's ok.
>>>>>>>
>>>>>>> (gdb) list *(reshape_request+0x416)
>>>>>>> 0x11566 is in reshape_request (drivers/md/raid5.c:6396).
>>>>>>> 6391        if ((mddev->reshape_backwards
>>>>>>> 6392             ? (safepos > writepos && readpos < writepos)
>>>>>>> 6393             : (safepos < writepos && readpos > writepos)) ||
>>>>>>> 6394            time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
>>>>>>> 6395                /* Cannot proceed until we've updated the superblock... */
>>>>>>> 6396                wait_event(conf->wait_for_overlap,
>>>>>>> 6397                           atomic_read(&conf->reshape_stripes)==0
>>>>>>> 6398                           || test_bit(MD_RECOVERY_INTR,
>>>>>>
>>>>>> If reshape is stuck here, which means:
>>>>>>
>>>>>> 1) Either reshape io is stuck somewhere and never complete;
>>>>>> 2) Or the counter reshape_stripes is broken;
>>>>>>
>>>>>> Can you read following debugfs files to verify if io is stuck in
>>>>>> underlying disk?
>>>>>>
>>>>>> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch}
>>>>>>
>>>>>
>>>>> I'll attach this below.
>>>>>
>>>>>> Furthermore, echo frozen should break above wait_event() because
>>>>>> 'MD_RECOVERY_INTR' will be set, however, based on your description,
>>>>>> the problem still exist. Can you collect stack and addr2line result
>>>>>> of stuck thread after echo frozen?
>>>>>>
>>>>>
>>>>> I echo'd frozen to /sys/block/md0/md/sync_action, however the echo
>>>>> call has been sitting for about 30 minutes, maybe longer, and has not
>>>>> returned. Here's the current state:
>>>>>
>>>>> root       454  0.0  0.0      0     0 ?   I<   Sep05   0:00 [raid5wq]
>>>>> root       455  0.0  0.0  34680  5988 ?   D    Sep05   0:00 (udev-worker)
>>>>
>>>> Can you also show the stack of udev-worker? And any other thread with
>>>> 'D' state, I think above "echo frozen" is probably also stuck in D
>>>> state.
>>>>
>>>
>>> As requested:
>>>
>>> ps aux | grep D
>>> USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND
>>> root       455  0.0  0.0  34680  5988 ?     D    Sep05   0:00 (udev-worker)
>>> root       457  0.0  0.0      0     0 ?     D    Sep05   0:00 [md0_reshape]
>>> root     45507  0.0  0.0   8272  4736 pts/1 Ds+  Sep05   0:00 -bash
>>> jason   279169  0.0  0.0   6976  2560 pts/0 S+   23:16   0:00 grep --color=auto D
>>>
>>> [jason@arch md]$ sudo cat /proc/455/stack
>>> [<0>] wait_woken+0x54/0x60
>>> [<0>] raid5_make_request+0x5fe/0x12f0 [raid456]
>>> [<0>] md_handle_request+0x135/0x220 [md_mod]
>>> [<0>] __submit_bio+0xb3/0x170
>>> [<0>] submit_bio_noacct_nocheck+0x159/0x370
>>> [<0>] block_read_full_folio+0x21c/0x340
>>> [<0>] filemap_read_folio+0x40/0xd0
>>> [<0>] filemap_get_pages+0x475/0x630
>>> [<0>] filemap_read+0xd9/0x350
>>> [<0>] blkdev_read_iter+0x6b/0x1b0
>>> [<0>] vfs_read+0x201/0x350
>>> [<0>] ksys_read+0x6f/0xf0
>>> [<0>] do_syscall_64+0x60/0x90
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>
>>>
>>> [jason@arch md]$ sudo cat /proc/45507/stack
>>> [<0>] kthread_stop+0x6a/0x180
>>> [<0>] md_unregister_thread+0x29/0x60 [md_mod]
>>> [<0>] action_store+0x168/0x320 [md_mod]
>>> [<0>] md_attr_store+0x86/0xf0 [md_mod]
>>> [<0>] kernfs_fop_write_iter+0x136/0x1d0
>>> [<0>] vfs_write+0x23e/0x420
>>> [<0>] ksys_write+0x6f/0xf0
>>> [<0>] do_syscall_64+0x60/0x90
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>
>>> Please let me know if you'd like me to identify the lines for any of those.
>>>
>>
>> That's enough.
>>> Thanks,
>>> Jason
>>>
>>>
>>>>> root       456 99.9  0.0      0     0 ?   R    Sep05 1543:40 [md0_raid6]
>>>>> root       457  0.0  0.0      0     0 ?   D    Sep05   0:00 [md0_reshape]
>>>>>
>>>>> [jason@arch md]$ sudo cat /proc/457/stack
>>>>> [<0>] md_do_sync+0xef2/0x11d0 [md_mod]
>>>>> [<0>] md_thread+0xae/0x190 [md_mod]
>>>>> [<0>] kthread+0xe8/0x120
>>>>> [<0>] ret_from_fork+0x34/0x50
>>>>> [<0>] ret_from_fork_asm+0x1b/0x30
>>>>>
>>>>> Reading symbols from md-mod.ko...
>>>>> (gdb) list *(md_do_sync+0xef2)
>>>>> 0xb3a2 is in md_do_sync (drivers/md/md.c:9035).
>>>>> 9030            ? "interrupted" : "done");
>>>>> 9031            /*
>>>>> 9032             * this also signals 'finished resyncing' to md_stop
>>>>> 9033             */
>>>>> 9034            blk_finish_plug(&plug);
>>>>> 9035            wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
>>>>
>>>> That's also wait for reshape io to be done from common layer.
>>>>
>>>>> 9036
>>>>> 9037            if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>>>>> 9038                !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>>>> 9039                mddev->curr_resync >= MD_RESYNC_ACTIVE) {
>>>>>
>>>>>
>>>>> The debugfs info:
>>>>>
>>>>> [root@arch ~]# cat
>>>>> /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch}
>>>>
>>>> Only sched_tags is read, sorry that I didn't mean to use this exact cmd.
>>>>
>>>> Perhaps you can using following cmd:
>>>>
>>>> find /sys/kernel/debug/block/sda/ -type f | xargs grep .
>>>>
>>>>> nr_tags=64
>>>>> nr_reserved_tags=0
>>>>> active_queues=0
>>>>>
>>>>> bitmap_tags:
>>>>> depth=64
>>>>> busy=1
>>>>
>>>> This means there is one IO in sda, however, I need more information to
>>>> make sure where is this IO. And please make sure don't run any other
>>>> thread that can read/write from sda. You can use "iostat -dmx 1" and
>>>> observe for a while to confirm that there is no new io.
>>
>> And can you help for this? Confirm no new io and collect debugfs.
>
> As instructed, I confirmed there is no active IO to sda1 via iostat. I
> then ran the provided command
>
> [root@arch ~]# find /sys/kernel/debug/block/sda/ -type f | xargs grep .
> /sys/kernel/debug/block/sda/rqos/wbt/wb_background:6
> /sys/kernel/debug/block/sda/rqos/wbt/wb_normal:12
> /sys/kernel/debug/block/sda/rqos/wbt/unknown_cnt:4
> /sys/kernel/debug/block/sda/rqos/wbt/min_lat_nsec:75000000
> /sys/kernel/debug/block/sda/rqos/wbt/inflight:0: inflight 1
> /sys/kernel/debug/block/sda/rqos/wbt/inflight:1: inflight 0
> /sys/kernel/debug/block/sda/rqos/wbt/inflight:2: inflight 0
> /sys/kernel/debug/block/sda/rqos/wbt/id:0
> /sys/kernel/debug/block/sda/rqos/wbt/enabled:1
> /sys/kernel/debug/block/sda/rqos/wbt/curr_win_nsec:100000000
> /sys/kernel/debug/block/sda/hctx0/type:default
> /sys/kernel/debug/block/sda/hctx0/dispatch_busy:0
> /sys/kernel/debug/block/sda/hctx0/active:0
> /sys/kernel/debug/block/sda/hctx0/run:2583
> /sys/kernel/debug/block/sda/hctx0/sched_tags_bitmap:00000000: 0000 0000 8000 0000
> /sys/kernel/debug/block/sda/hctx0/sched_tags:nr_tags=64
> /sys/kernel/debug/block/sda/hctx0/sched_tags:nr_reserved_tags=0
> /sys/kernel/debug/block/sda/hctx0/sched_tags:active_queues=0
> /sys/kernel/debug/block/sda/hctx0/sched_tags:bitmap_tags:
> /sys/kernel/debug/block/sda/hctx0/sched_tags:depth=64
> /sys/kernel/debug/block/sda/hctx0/sched_tags:busy=1

sched_tags:busy is 1, which indicates this io made it to the elevator.
Which means this problem is not related to raid; io issued to sda never
returned.

> /sys/kernel/debug/block/sda/hctx0/sched_tags:cleared=57
> /sys/kernel/debug/block/sda/hctx0/sched_tags:bits_per_word=16
> /sys/kernel/debug/block/sda/hctx0/sched_tags:map_nr=4
> /sys/kernel/debug/block/sda/hctx0/sched_tags:alloc_hint={40, 20, 48, 0}
> /sys/kernel/debug/block/sda/hctx0/sched_tags:wake_batch=8
> /sys/kernel/debug/block/sda/hctx0/sched_tags:wake_index=0
> /sys/kernel/debug/block/sda/hctx0/sched_tags:ws_active=0
> /sys/kernel/debug/block/sda/hctx0/sched_tags:ws={
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:}
> /sys/kernel/debug/block/sda/hctx0/sched_tags:round_robin=1
> /sys/kernel/debug/block/sda/hctx0/sched_tags:min_shallow_depth=48
> /sys/kernel/debug/block/sda/hctx0/tags_bitmap:00000000: 0000 0000
> /sys/kernel/debug/block/sda/hctx0/tags:nr_tags=32
> /sys/kernel/debug/block/sda/hctx0/tags:nr_reserved_tags=0
> /sys/kernel/debug/block/sda/hctx0/tags:active_queues=0
> /sys/kernel/debug/block/sda/hctx0/tags:bitmap_tags:
> /sys/kernel/debug/block/sda/hctx0/tags:depth=32
> /sys/kernel/debug/block/sda/hctx0/tags:busy=0

tags:busy is 0, which indicates this io didn't make it to the driver. So
io is still in the block layer, likely still in the elevator. Which
elevator are you using? You can confirm by:

cat /sys/block/sda/queue/scheduler

It's likely mq-deadline; anyway, can you switch to another elevator
before assembling the array and retry, to test whether you can still
reproduce the problem?

Thanks,
Kuai

> /sys/kernel/debug/block/sda/hctx0/tags:cleared=21
> /sys/kernel/debug/block/sda/hctx0/tags:bits_per_word=8
> /sys/kernel/debug/block/sda/hctx0/tags:map_nr=4
> /sys/kernel/debug/block/sda/hctx0/tags:alloc_hint={19, 26, 7, 21}
> /sys/kernel/debug/block/sda/hctx0/tags:wake_batch=4
> /sys/kernel/debug/block/sda/hctx0/tags:wake_index=0
> /sys/kernel/debug/block/sda/hctx0/tags:ws_active=0
> /sys/kernel/debug/block/sda/hctx0/tags:ws={
> /sys/kernel/debug/block/sda/hctx0/tags:  {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:  {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:  {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:  {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:  {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:  {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:  {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:  {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:}
> /sys/kernel/debug/block/sda/hctx0/tags:round_robin=1
> /sys/kernel/debug/block/sda/hctx0/tags:min_shallow_depth=4294967295
> /sys/kernel/debug/block/sda/hctx0/ctx_map:00000000: 00
> /sys/kernel/debug/block/sda/hctx0/flags:alloc_policy=RR SHOULD_MERGE
> /sys/kernel/debug/block/sda/sched/queued:0 0 0
> /sys/kernel/debug/block/sda/sched/owned_by_driver:0 0 0
> /sys/kernel/debug/block/sda/sched/async_depth:48
> /sys/kernel/debug/block/sda/sched/starved:0
> /sys/kernel/debug/block/sda/sched/batching:2
> /sys/kernel/debug/block/sda/state:SAME_COMP|IO_STAT|ADD_RANDOM|INIT_DONE|WC|STATS|REGISTERED|NOWAIT|SQ_SCHED
> /sys/kernel/debug/block/sda/pm_only:0
>
> Let me know if there's anything further I can provide to assist in
> troubleshooting.

^ permalink raw reply	[flat|nested] 21+ messages in thread
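For anyone following along, the active scheduler can be checked and changed per disk through sysfs; a minimal sketch (device name assumed, and "none" is only one possible alternative to test with):

    cat /sys/block/sda/queue/scheduler
    echo none | sudo tee /sys/block/sda/queue/scheduler

The currently selected elevator is shown in brackets by the first command, and the change is not persistent across reboots, so it only affects the retry being asked for here.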
* reshape failure
@ 2011-02-16 15:46 Tobias McNulty
  2011-02-16 20:32 ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Tobias McNulty @ 2011-02-16 15:46 UTC (permalink / raw)
  To: linux-raid

Hi,

I tried to start a reshape over the weekend (RAID6 -> RAID5) and was
dismayed to see that it was going to take roughly 2 weeks to complete:

md0 : active raid6 sdc[0] sdh[5](S) sdg[4] sdf[3] sde[2] sdd[1]
      5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [>....................]  reshape =  0.0% (245760/1953514496)
      finish=21189.7min speed=1536K/sec

The disk that contained the backup file began experiencing SATA errors
several days into the reshape, due to what turned out to be a faulty
SATA card. The card has since been replaced and the RAID1 device that
contains the backup file successfully resync'ed.

However, when I try to re-start the reshape now, I get the following error:

nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
/dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: Failed to restore critical section for reshape, sorry.

Is my data lost for good? Is there anything else I can do?

Thanks,
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: reshape failure
  2011-02-16 15:46 reshape failure Tobias McNulty
@ 2011-02-16 20:32 ` NeilBrown
  2011-02-16 20:41   ` Tobias McNulty
  0 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2011-02-16 20:32 UTC (permalink / raw)
  To: Tobias McNulty; +Cc: linux-raid

On Wed, 16 Feb 2011 10:46:32 -0500 Tobias McNulty <tobias@caktusgroup.com> wrote:

> Hi,
>
> I tried to start a reshape over the weekend (RAID6 -> RAID5) and was
> dismayed to see that it was going to take roughly 2 weeks to complete:
>
> md0 : active raid6 sdc[0] sdh[5](S) sdg[4] sdf[3] sde[2] sdd[1]
>       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5]
>       [UUUUU] [>....................]  reshape =  0.0% (245760/1953514496)
>       finish=21189.7min speed=1536K/sec
>
> The disk that contained the backup file began experiencing SATA errors
> several days into the reshape, due to what turned out to be a faulty
> SATA card. The card has since been replaced and the RAID1 device that
> contains the backup file successfully resync'ed.
>
> However, when I try to re-start the reshape now, I get the following error:
>
> nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
> /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
> mdadm: Failed to restore critical section for reshape, sorry.
>
> Is my data lost for good? Is there anything else I can do?

Try above command with --verbose.
If a message about "too-old timestamp" appears, run

   export MDADM_GROW_ALLOW_OLD=1

and run the command again.

In either case, post the output.

NeilBrown

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: reshape failure
  2011-02-16 20:32 ` NeilBrown
@ 2011-02-16 20:41   ` Tobias McNulty
  2011-02-16 21:06     ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Tobias McNulty @ 2011-02-16 20:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Feb 16, 2011 at 3:32 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 16 Feb 2011 10:46:32 -0500 Tobias McNulty <tobias@caktusgroup.com>
>> nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
>> /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
>> mdadm: Failed to restore critical section for reshape, sorry.
>>
>> Is my data lost for good? Is there anything else I can do?
>
> Try above command with --verbose.
> If a message about "too-old timestamp" appears, run
>
> export MDADM_GROW_ALLOW_OLD=1
>
> and run the command again.
>
> In either case, post the output.

Wow - it looks like that might have done the trick:

nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4.
mdadm:/dev/md0 has an active reshape - checking if critical section
needs to be restored
mdadm: too-old timestamp on backup-metadata on md0.backup
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
nas:~# export MDADM_GROW_ALLOW_OLD=1
nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4.
mdadm:/dev/md0 has an active reshape - checking if critical section
needs to be restored
mdadm: accepting backup with timestamp 1297624561 for array with
timestamp 1297692473
mdadm: restoring critical section
mdadm: added /dev/sde to /dev/md0 as 1
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sdc to /dev/md0 as 3
mdadm: added /dev/sdh to /dev/md0 as 4
mdadm: added /dev/sdg to /dev/md0 as 5
mdadm: added /dev/sdf to /dev/md0 as 0
mdadm: /dev/md0 has been started with 5 drives and 1 spare.

Now I see this in /etc/mdstat:

md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
      5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=>...................]  reshape =  9.9% (193691648/1953514496)
      finish=97156886.4min speed=0K/sec

Is the 0K/sec something I need to worry about?

Thanks!
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread
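One way to tell whether a reshape that reports speed=0K/sec is actually moving is to watch the md sysfs counters directly rather than /proc/mdstat; a small sketch, assuming the array is /dev/md0:

    watch -n 60 'cat /sys/block/md0/md/sync_completed /sys/block/md0/md/sync_speed'

sync_completed reports sectors done out of the total, so two readings a minute apart are enough to confirm whether the position is advancing at all.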
* Re: reshape failure 2011-02-16 20:41 ` Tobias McNulty @ 2011-02-16 21:06 ` NeilBrown 2011-02-17 21:39 ` Tobias McNulty 0 siblings, 1 reply; 21+ messages in thread From: NeilBrown @ 2011-02-16 21:06 UTC (permalink / raw) To: Tobias McNulty; +Cc: linux-raid On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com> wrote: > On Wed, Feb 16, 2011 at 3:32 PM, NeilBrown <neilb@suse.de> wrote: > > On Wed, 16 Feb 2011 10:46:32 -0500 Tobias McNulty <tobias@caktusgroup.com> > >> nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc > >> /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh > >> mdadm: Failed to restore critical section for reshape, sorry. > >> > >> Is my data lost for good? Is there anything else I can do? > > > > Try above command with --verbose. > > If a message about "too-old timestamp" appears, run > > > > export MDADM_GROW_ALLOW_OLD=1 > > > > and run the command again. > > > > In either case, post the output. > > Wow - it looks like that might have done the trick: > > nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup > /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh > mdadm: looking for devices for /dev/md0 > mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3. > mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2. > mdadm: /dev/sde is identified as a member of /dev/md0, slot 1. > mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0. > mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5. > mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4. > mdadm:/dev/md0 has an active reshape - checking if critical section > needs to be restored > mdadm: too-old timestamp on backup-metadata on md0.backup > mdadm: Failed to find backup of critical section > mdadm: Failed to restore critical section for reshape, sorry. > nas:~# export MDADM_GROW_ALLOW_OLD=1 > nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup > /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh > mdadm: looking for devices for /dev/md0 > mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3. > mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2. > mdadm: /dev/sde is identified as a member of /dev/md0, slot 1. > mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0. > mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5. > mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4. > mdadm:/dev/md0 has an active reshape - checking if critical section > needs to be restored > mdadm: accepting backup with timestamp 1297624561 for array with > timestamp 1297692473 > mdadm: restoring critical section > mdadm: added /dev/sde to /dev/md0 as 1 > mdadm: added /dev/sdd to /dev/md0 as 2 > mdadm: added /dev/sdc to /dev/md0 as 3 > mdadm: added /dev/sdh to /dev/md0 as 4 > mdadm: added /dev/sdg to /dev/md0 as 5 > mdadm: added /dev/sdf to /dev/md0 as 0 > mdadm: /dev/md0 has been started with 5 drives and 1 spare. That is what I expected.. > > Now I see this in /etc/mdstat: > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1] > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] > [=>...................] reshape = 9.9% (193691648/1953514496) > finish=97156886.4min speed=0K/sec > > Is the 0K/sec something I need to worry about? Maybe. If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes. It is something to worry about. Is there an 'mdadm' running in the background? Can you 'strace' it for a few seconds? What does grep . /sys/block/md0/md/* show? 
Maybe do it twice, 1 minute apart. NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: reshape failure 2011-02-16 21:06 ` NeilBrown @ 2011-02-17 21:39 ` Tobias McNulty 2011-05-11 18:06 ` Tobias McNulty 0 siblings, 1 reply; 21+ messages in thread From: Tobias McNulty @ 2011-02-17 21:39 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote: > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com> > wrote: > > > > Now I see this in /etc/mdstat: > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1] > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] > > [=>...................] reshape = 9.9% (193691648/1953514496) > > finish=97156886.4min speed=0K/sec > > > > Is the 0K/sec something I need to worry about? > > Maybe. If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes. It is > something to worry about. It seems like it was another buggy SATA HBA?? I moved everything back to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic this time): md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] [==>..................] reshape = 10.0% (196960192/1953514496) finish=11376.9min speed=2572K/sec Is it really possible that I had two buggy SATA cards, from different manufacturers? Perhaps the motherboard is at fault? Or am I missing something very basic about connecting SATA drives to something other than the on-board ports? Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze (2.6.32-5-amd64). Tobias [1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y [2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm -- Tobias McNulty, Managing Partner Caktus Consulting Group, LLC http://www.caktusgroup.com -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
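Cabling and HBA problems like the ones suspected here often show up as interface CRC errors counted by the drives themselves; a quick, non-destructive check (a sketch only, device names assumed) is:

    for d in /dev/sd[a-h]; do
        echo "== $d"
        sudo smartctl -A "$d" | grep -i crc
    done

A climbing UDMA_CRC_Error_Count (SMART attribute 199) points at the link (cable, backplane, or controller) rather than at the disk medium.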
* Re: reshape failure 2011-02-17 21:39 ` Tobias McNulty @ 2011-05-11 18:06 ` Tobias McNulty 2011-05-11 21:12 ` NeilBrown 0 siblings, 1 reply; 21+ messages in thread From: Tobias McNulty @ 2011-05-11 18:06 UTC (permalink / raw) To: linux-raid On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com> wrote: > > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote: > > > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com> > > wrote: > > > > > > Now I see this in /etc/mdstat: > > > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1] > > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] > > > [=>...................] reshape = 9.9% (193691648/1953514496) > > > finish=97156886.4min speed=0K/sec > > > > > > Is the 0K/sec something I need to worry about? > > > > Maybe. If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes. It is > > something to worry about. > > It seems like it was another buggy SATA HBA?? I moved everything back > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic > this time): > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] > [==>..................] reshape = 10.0% (196960192/1953514496) > finish=11376.9min speed=2572K/sec > > Is it really possible that I had two buggy SATA cards, from different > manufacturers? Perhaps the motherboard is at fault? Or am I missing > something very basic about connecting SATA drives to something other > than the on-board ports? > > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze > (2.6.32-5-amd64). > > Tobias > > [1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y > [2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm So, after figuring out the hardware issues, the reshape appears to have completed successfully (hurray!), but /proc/mdstat still says that the array is level 6. Is there another command I have to run to put the finishing touches on the conversion? md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] 5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU] Thank you! Tobias -- Tobias McNulty, Managing Partner Caktus Consulting Group, LLC http://www.caktusgroup.com -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: reshape failure 2011-05-11 18:06 ` Tobias McNulty @ 2011-05-11 21:12 ` NeilBrown 2011-05-11 21:19 ` Tobias McNulty [not found] ` <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com> 0 siblings, 2 replies; 21+ messages in thread From: NeilBrown @ 2011-05-11 21:12 UTC (permalink / raw) To: Tobias McNulty; +Cc: linux-raid On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty <tobias@caktusgroup.com> wrote: > On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com> wrote: > > > > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com> > > > wrote: > > > > > > > > Now I see this in /etc/mdstat: > > > > > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1] > > > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] > > > > [=>...................] reshape = 9.9% (193691648/1953514496) > > > > finish=97156886.4min speed=0K/sec > > > > > > > > Is the 0K/sec something I need to worry about? > > > > > > Maybe. If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes. It is > > > something to worry about. > > > > It seems like it was another buggy SATA HBA?? I moved everything back > > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device > > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's > > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic > > this time): > > > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] > > [==>..................] reshape = 10.0% (196960192/1953514496) > > finish=11376.9min speed=2572K/sec > > > > Is it really possible that I had two buggy SATA cards, from different > > manufacturers? Perhaps the motherboard is at fault? Or am I missing > > something very basic about connecting SATA drives to something other > > than the on-board ports? > > > > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a > > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze > > (2.6.32-5-amd64). > > > > Tobias > > > > [1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y > > [2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm > > So, after figuring out the hardware issues, the reshape appears to > have completed successfully (hurray!), but /proc/mdstat still says > that the array is level 6. Is there another command I have to run to > put the finishing touches on the conversion? > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] > 5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU] > Just mdadm --grow /dev/md0 --level=5 should complete instantly. (assuming I'm correct in thinking that you want this to be a raid5 array - I don't really remember the details anymore :-) NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: reshape failure 2011-05-11 21:12 ` NeilBrown @ 2011-05-11 21:19 ` Tobias McNulty [not found] ` <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com> 1 sibling, 0 replies; 21+ messages in thread From: Tobias McNulty @ 2011-05-11 21:19 UTC (permalink / raw) To: linux-raid On Wed, May 11, 2011 at 5:12 PM, NeilBrown <neilb@suse.de> wrote: > > On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty <tobias@caktusgroup.com> > wrote: > > > On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com> wrote: > > > > > > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com> > > > > wrote: > > > > > > > > > > Now I see this in /etc/mdstat: > > > > > > > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1] > > > > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] > > > > > [=>...................] reshape = 9.9% (193691648/1953514496) > > > > > finish=97156886.4min speed=0K/sec > > > > > > > > > > Is the 0K/sec something I need to worry about? > > > > > > > > Maybe. If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes. It is > > > > something to worry about. > > > > > > It seems like it was another buggy SATA HBA?? I moved everything back > > > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device > > > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's > > > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic > > > this time): > > > > > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] > > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] > > > [==>..................] reshape = 10.0% (196960192/1953514496) > > > finish=11376.9min speed=2572K/sec > > > > > > Is it really possible that I had two buggy SATA cards, from different > > > manufacturers? Perhaps the motherboard is at fault? Or am I missing > > > something very basic about connecting SATA drives to something other > > > than the on-board ports? > > > > > > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a > > > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze > > > (2.6.32-5-amd64). > > > > > > Tobias > > > > > > [1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y > > > [2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm > > > > So, after figuring out the hardware issues, the reshape appears to > > have completed successfully (hurray!), but /proc/mdstat still says > > that the array is level 6. Is there another command I have to run to > > put the finishing touches on the conversion? > > > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] > > 5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU] > > > > Just > mdadm --grow /dev/md0 --level=5 > > should complete instantly. (assuming I'm correct in thinking that you want > this to be a raid5 array - I don't really remember the details anymore :-) Bingo! Thanks. md0 : active raid5 sda[0] sde[4](S) sdd[3] sdc[2] sdb[1] 5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] And I even ended up with a spare disk (wasn't sure how that part was going to work). Do you always have to run that command twice, or only if the reshape is interrupted? At least, I thought that was the same command I ran originally to kick it off. Thanks again. 
Tobias -- Tobias McNulty, Managing Partner Caktus Consulting Group, LLC http://www.caktusgroup.com -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
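After the takeover completes it is worth confirming that the array really ended up with a native RAID5 layout; a short check, assuming the array name /dev/md0 used throughout this thread:

    sudo mdadm --detail /dev/md0 | grep -E "Raid Level|Layout|State"
    cat /proc/mdstat

The intermediate state quoted earlier showed "algorithm 18", which appears to be the left-symmetric-6 layout (a RAID6 arrangement with the Q parity confined to one device); the finished array should report level 5 with algorithm 2, as in the mdstat output above.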
[parent not found: <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com>]
* Re: reshape failure [not found] ` <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com> @ 2011-05-11 21:34 ` NeilBrown 2011-05-12 0:46 ` Tobias McNulty 0 siblings, 1 reply; 21+ messages in thread From: NeilBrown @ 2011-05-11 21:34 UTC (permalink / raw) To: Tobias McNulty; +Cc: linux-raid On Wed, 11 May 2011 17:18:14 -0400 Tobias McNulty <tobias@caktusgroup.com> wrote: > On Wed, May 11, 2011 at 5:12 PM, NeilBrown <neilb@suse.de> wrote: > > > On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty <tobias@caktusgroup.com> > > wrote: > > > > > On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com> > > wrote: > > > > > > > > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty < > > tobias@caktusgroup.com> > > > > > wrote: > > > > > > > > > > > > Now I see this in /etc/mdstat: > > > > > > > > > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1] > > > > > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 > > [5/5] [UUUUU] > > > > > > [=>...................] reshape = 9.9% > > (193691648/1953514496) > > > > > > finish=97156886.4min speed=0K/sec > > > > > > > > > > > > Is the 0K/sec something I need to worry about? > > > > > > > > > > Maybe. If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes. > > It is > > > > > something to worry about. > > > > > > > > It seems like it was another buggy SATA HBA?? I moved everything back > > > > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device > > > > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's > > > > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic > > > > this time): > > > > > > > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] > > > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 > > [5/5] [UUUUU] > > > > [==>..................] reshape = 10.0% (196960192/1953514496) > > > > finish=11376.9min speed=2572K/sec > > > > > > > > Is it really possible that I had two buggy SATA cards, from different > > > > manufacturers? Perhaps the motherboard is at fault? Or am I missing > > > > something very basic about connecting SATA drives to something other > > > > than the on-board ports? > > > > > > > > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a > > > > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze > > > > (2.6.32-5-amd64). > > > > > > > > Tobias > > > > > > > > [1] > > http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y > > > > [2] > > http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm > > > > > > So, after figuring out the hardware issues, the reshape appears to > > > have completed successfully (hurray!), but /proc/mdstat still says > > > that the array is level 6. Is there another command I have to run to > > > put the finishing touches on the conversion? > > > > > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] > > > 5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU] > > > > > > > Just > > mdadm --grow /dev/md0 --level=5 > > > > should complete instantly. (assuming I'm correct in thinking that you want > > this to be a raid5 array - I don't really remember the details anymore :-) > > > Bingo! Thanks. > > md0 : active raid5 sda[0] sde[4](S) sdd[3] sdc[2] sdb[1] > 5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] > > And I even ended up with a spare disk (wasn't sure how that part was going > to work). 
> > Do you always have to run that command twice, or only if the reshape is > interrupted? At least, I thought that was the same command I ran originally > to kick it off. Only if it is interrupted. The array doesn't know that a level change is needed after the layout change is completed, only the mdadm process knows that. And it has died. I could probably get the array itself to 'know' this... one day. NeilBrown > > Thanks again. > > Tobias ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: reshape failure 2011-05-11 21:34 ` NeilBrown @ 2011-05-12 0:46 ` Tobias McNulty 0 siblings, 0 replies; 21+ messages in thread From: Tobias McNulty @ 2011-05-12 0:46 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid On Wed, May 11, 2011 at 5:34 PM, NeilBrown <neilb@suse.de> wrote: > On Wed, 11 May 2011 17:18:14 -0400 Tobias McNulty <tobias@caktusgroup.com> > wrote: > >> On Wed, May 11, 2011 at 5:12 PM, NeilBrown <neilb@suse.de> wrote: >> >> > On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty <tobias@caktusgroup.com> >> > wrote: >> > >> > > On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com> >> > wrote: >> > > > >> > > > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote: >> > > > > >> > > > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty < >> > tobias@caktusgroup.com> >> > > > > wrote: >> > > > > > >> > > > > > Now I see this in /etc/mdstat: >> > > > > > >> > > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1] >> > > > > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 >> > [5/5] [UUUUU] >> > > > > > [=>...................] reshape = 9.9% >> > (193691648/1953514496) >> > > > > > finish=97156886.4min speed=0K/sec >> > > > > > >> > > > > > Is the 0K/sec something I need to worry about? >> > > > > >> > > > > Maybe. If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes. >> > It is >> > > > > something to worry about. >> > > > >> > > > It seems like it was another buggy SATA HBA?? I moved everything back >> > > > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device >> > > > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's >> > > > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic >> > > > this time): >> > > > >> > > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] >> > > > 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 >> > [5/5] [UUUUU] >> > > > [==>..................] reshape = 10.0% (196960192/1953514496) >> > > > finish=11376.9min speed=2572K/sec >> > > > >> > > > Is it really possible that I had two buggy SATA cards, from different >> > > > manufacturers? Perhaps the motherboard is at fault? Or am I missing >> > > > something very basic about connecting SATA drives to something other >> > > > than the on-board ports? >> > > > >> > > > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a >> > > > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze >> > > > (2.6.32-5-amd64). >> > > > >> > > > Tobias >> > > > >> > > > [1] >> > http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y >> > > > [2] >> > http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm >> > > >> > > So, after figuring out the hardware issues, the reshape appears to >> > > have completed successfully (hurray!), but /proc/mdstat still says >> > > that the array is level 6. Is there another command I have to run to >> > > put the finishing touches on the conversion? >> > > >> > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1] >> > > 5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU] >> > > >> > >> > Just >> > mdadm --grow /dev/md0 --level=5 >> > >> > should complete instantly. (assuming I'm correct in thinking that you want >> > this to be a raid5 array - I don't really remember the details anymore :-) >> >> >> Bingo! Thanks. 
>> >> md0 : active raid5 sda[0] sde[4](S) sdd[3] sdc[2] sdb[1] >> 5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] >> >> And I even ended up with a spare disk (wasn't sure how that part was going >> to work). >> >> Do you always have to run that command twice, or only if the reshape is >> interrupted? At least, I thought that was the same command I ran originally >> to kick it off. > > Only if it is interrupted. The array doesn't know that a level change is > needed after the layout change is completed, only the mdadm process knows > that. And it has died. > > I could probably get the array itself to 'know' this... one day. > > NeilBrown Hey, it makes perfect sense to me know that I know it's the expected behavior. I might have even tried it myself if I wasn't worried about screwing up the array, again. :-) Thanks Tobias -- Tobias McNulty, Managing Partner Caktus Consulting Group, LLC http://www.caktusgroup.com -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
Thread overview: 21+ messages
-- links below jump to the message on this page --
2023-09-03 21:39 Reshape Failure Jason Moss
2023-09-04 1:41 ` Yu Kuai
2023-09-04 16:38 ` Jason Moss
2023-09-05 1:07 ` Yu Kuai
2023-09-06 14:05 ` Jason Moss
2023-09-07 1:38 ` Yu Kuai
2023-09-07 5:44 ` Jason Moss
[not found] ` <79aa3cf3-78d4-cfc6-8d3b-eb8704ffaba1@huaweicloud.com>
2023-09-07 6:19 ` Jason Moss
2023-09-10 2:45 ` Yu Kuai
2023-09-10 4:58 ` Jason Moss
2023-09-10 6:10 ` Yu Kuai
-- strict thread matches above, loose matches on Subject: below --
2011-02-16 15:46 reshape failure Tobias McNulty
2011-02-16 20:32 ` NeilBrown
2011-02-16 20:41 ` Tobias McNulty
2011-02-16 21:06 ` NeilBrown
2011-02-17 21:39 ` Tobias McNulty
2011-05-11 18:06 ` Tobias McNulty
2011-05-11 21:12 ` NeilBrown
2011-05-11 21:19 ` Tobias McNulty
[not found] ` <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com>
2011-05-11 21:34 ` NeilBrown
2011-05-12 0:46 ` Tobias McNulty