linux-raid.vger.kernel.org archive mirror
* RAID6 12 device assemble force failure
@ 2024-06-29 15:17 Adam Niescierowicz
  2024-07-01  8:51 ` Mariusz Tkaczyk
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Niescierowicz @ 2024-06-29 15:17 UTC (permalink / raw)
  To: linux-raid


Hi,

I have a RAID 6 array of 12 disks attached via an external SAS backplane,
connected to the server by 4 LUNs. After some problems with the backplane,
3 disks went offline (within one second) and the array stopped.

When the disks came back online we stopped the array with mdadm --stop /dev/md126;
unfortunately we can't assemble it:
---

mdadm --assemble /dev/sd{q,p,o,n,m,z,y,z,w,t,s,r}1

mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to 
start the array.

---

At system boot mdadm recognises the array:

---

#cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] 
[raid1] [raid10]
md126 : inactive sdt1[8](S) sdn1[5](S) sdw1[2] sdy1[6] sdm1[9] sdp1[0] 
sdq1[7] sds1[4] sdo1[11] sdx1[1] sdz1[10](S) sdr1[3]
       234380292096 blocks super 1.2

# mdadm --detail /dev/md126
/dev/md126:
            Version : 1.2
         Raid Level : raid6
      Total Devices : 12
        Persistence : Superblock is persistent

              State : inactive
    Working Devices : 12

               Name : backup:card1port1chassis2
               UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
             Events : 48640

     Number   Major   Minor   RaidDevice

        -      65        1        -        /dev/sdq1
        -       8      241        -        /dev/sdp1
        -       8      225        -        /dev/sdo1
        -       8      209        -        /dev/sdn1
        -       8      193        -        /dev/sdm1
        -      65      145        -        /dev/sdz1
        -      65      129        -        /dev/sdy1
        -      65      113        -        /dev/sdx1
        -      65       97        -        /dev/sdw1
        -      65       49        -        /dev/sdt1
        -      65       33        -        /dev/sds1
        -      65       17        -        /dev/sdr1
---


But it can't be started; after mdadm --run:

---

# mdadm --detail /dev/md126
/dev/md126:
            Version : 1.2
      Creation Time : Tue Jun 18 20:07:19 2024
         Raid Level : raid6
      Used Dev Size : 18446744073709551615
       Raid Devices : 12
      Total Devices : 12
        Persistence : Superblock is persistent

        Update Time : Fri Jun 28 22:21:57 2024
              State : active, FAILED, Not Started
     Active Devices : 9
    Working Devices : 12
     Failed Devices : 0
      Spare Devices : 3

             Layout : left-symmetric
         Chunk Size : 512K

Consistency Policy : unknown

               Name : backup:card1port1chassis2
               UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
             Events : 48640

     Number   Major   Minor   RaidDevice State
        -       0        0        0      removed
        -       0        0        1      removed
        -       0        0        2      removed
        -       0        0        3      removed
        -       0        0        4      removed
        -       0        0        5      removed
        -       0        0        6      removed
        -       0        0        7      removed
        -       0        0        8      removed
        -       0        0        9      removed
        -       0        0       10      removed
        -       0        0       11      removed

        -      65        1        7      sync   /dev/sdq1
        -       8      241        0      sync   /dev/sdp1
        -       8      225       11      sync   /dev/sdo1
        -       8      209        -      spare   /dev/sdn1
        -       8      193        9      sync   /dev/sdm1
        -      65      145        -      spare   /dev/sdz1
        -      65      129        6      sync   /dev/sdy1
        -      65      113        1      sync   /dev/sdx1
        -      65       97        2      sync   /dev/sdw1
        -      65       49        -      spare   /dev/sdt1
        -      65       33        4      sync   /dev/sds1
        -      65       17        3      sync   /dev/sdr1

---

I think the problem is that the disks are recognised as spares, but why?

After examining the disks, all of them have the same Events count (48640),
but three of them show:
---
Bad Block Log : 512 entries available at offset 88 sectors - bad blocks 
present.
Device role: spare

---

Full mdadm -E output for one of those disks:
---
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x9
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264088 sectors, after=0 sectors
           State : clean
     Device UUID : e726c6bc:11415fcc:49e8e0a5:041b69e4

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
   Bad Block Log : 512 entries available at offset 88 sectors - bad 
blocks present.
        Checksum : 9ad955ac - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
---


I tried with `mdadm --assemble --force --update=force-no-bbl
/dev/sd{q,p,o,n,m,z,y,z,w,t,s,r}1` and now mdadm -E shows:


---

           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : e726c6bc:11415fcc:49e8e0a5:041b69e4

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 9ad1554c - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
---


What can I do to start this array?


Below is the full mdadm -E output for all disks in the array.

```
# mdadm -E /dev/sd{q,p,o,n,m,z,y,z,w,t,s,r}1
/dev/sdq1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : eef85925:fbdeb293:703eade7:04725a02

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 2a767262 - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 7
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdp1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : 892b5f88:e0233f4a:0e509ffd:c72375f6

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 4e14ad3d - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdo1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : 7765576b:6e58ad14:c1ae0b5e:90d36937

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 9cdc2a3e - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 11
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdn1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : 0c7d6e27:e4356ca7:556ba2d1:5632cbf6

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 1eaa3a9f - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdm1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : 49b1e45d:719a504f:9450e6b8:649c1cd5

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : c29a22b9 - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 9
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdz1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : a4437981:be1db310:47aaf5ab:5e325a68

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 2dde280f - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdy1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : 2c02425e:b4e2ceb2:a83c5656:bda51c7b

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 69e5b149 - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 6
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdz1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : a4437981:be1db310:47aaf5ab:5e325a68

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 2dde280f - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdw1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : 0ab85136:9bb9481f:97cce69a:56966f09

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 8152be91 - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdt1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : e726c6bc:11415fcc:49e8e0a5:041b69e4

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 9ad1554c - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sds1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : 6fadf5ed:db4aeaf4:5713a2df:a5e70ff5

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : 3ef3dd4a - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 4
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
/dev/sdr1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
            Name : backup:card1port1chassis2
   Creation Time : Tue Jun 18 20:07:19 2024
      Raid Level : raid6
    Raid Devices : 12

  Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
      Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
     Data Offset : 264192 sectors
    Super Offset : 8 sectors
    Unused Space : before=264104 sectors, after=0 sectors
           State : clean
     Device UUID : b923a958:3fca3a18:03f93cd3:e4ff4203

Internal Bitmap : 8 sectors from superblock
     Update Time : Fri Jun 28 22:21:57 2024
        Checksum : cec5d0df - correct
          Events : 48640

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
replacing)
```



-- 
---
Thanks
Adam Nieścierowicz


* Re: RAID6 12 device assemble force failure
  2024-06-29 15:17 RAID6 12 device assemble force failure Adam Niescierowicz
@ 2024-07-01  8:51 ` Mariusz Tkaczyk
  2024-07-01  9:33   ` Adam Niescierowicz
  0 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2024-07-01  8:51 UTC (permalink / raw)
  To: Adam Niescierowicz; +Cc: linux-raid

Hello Adam,
I hope you have a backup! A quote from the RAID wiki linked below:

"Remember, RAID is not a backup! If you lose redundancy, you need to take a
backup!"

I'm not a native-RAID expert, but I will try to give you some clues.

On Sat, 29 Jun 2024 17:17:54 +0200
Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:

> Hi,
> 
> i have raid 6 array on 12 disk attached via external SAS backplane 
> connected by 4 luns to the server. After some problems with backplane 
> when 3 disk went offline (in one second) and array stop.
> 

So the RAID is considered failed by mdadm, and this is persistent in the device
states recorded in the metadata.

>     Device Role : spare
>     Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
> replacing)

3 missing = failed raid 6 array.

> I think the problem is that disk are recognised as spare, but why?

Because mdadm cannot trust them: they are reported as "missing", so they are not
configured as raid devices (spare is the default state).

> I tried with `mdadm --assemble --force --update=force-no-bbl 

It removes the bad block log but does not revert devices from "missing" to "active".

> /dev/sd{q,p,o,n,m,z,y,z,w,t,s,r}1` and now mdam -E shows
> 
> 
> ---
> 
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x1
>       Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
>             Name : backup:card1port1chassis2
>    Creation Time : Tue Jun 18 20:07:19 2024
>       Raid Level : raid6
>     Raid Devices : 12
> 
>   Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
>       Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
>      Data Offset : 264192 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=264104 sectors, after=0 sectors
>            State : clean
>      Device UUID : e726c6bc:11415fcc:49e8e0a5:041b69e4
> 
> Internal Bitmap : 8 sectors from superblock
>      Update Time : Fri Jun 28 22:21:57 2024
>         Checksum : 9ad1554c - correct
>           Events : 48640
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : spare
>     Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' == 
> replacing)
> ---
> 
> 
> What can I do to start this array?

You may try to add them manually. I know there is --re-add functionality, but
I've never used it. Maybe something like this would work:
#mdadm --remove /dev/md126 <failed drive>
#mdadm --re-add /dev/md126 <failed_drive>

If you recover one drive this way, the array should start, but the data might
not be consistent; please be aware of that!

The drive should be restored to the sync state in the --detail output.
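
As a concrete sketch only (assuming /dev/sdn1, /dev/sdz1 and /dev/sdt1 really are
the three dropped members, since they are the partitions that report
"Device Role : spare" in your -E output, and assuming the array is assembled but
not started):

#mdadm --remove /dev/md126 /dev/sdn1
#mdadm --re-add /dev/md126 /dev/sdn1

and then the same for the other two, one at a time, checking
mdadm --detail /dev/md126 after each step.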

I highly advise you to simulate this scenario first on a setup that is not
infrastructure-critical. As I said, I'm not a native-RAID expert.

For more suggestions see:
https://raid.wiki.kernel.org/index.php/Replacing_a_failed_drive

Thanks,
Mariusz


* Re: RAID6 12 device assemble force failure
  2024-07-01  8:51 ` Mariusz Tkaczyk
@ 2024-07-01  9:33   ` Adam Niescierowicz
  2024-07-02  8:47     ` Mariusz Tkaczyk
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Niescierowicz @ 2024-07-01  9:33 UTC (permalink / raw)
  To: linux-raid


> On Sat, 29 Jun 2024 17:17:54 +0200
> Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:
>
>> Hi,
>>
>> i have raid 6 array on 12 disk attached via external SAS backplane
>> connected by 4 luns to the server. After some problems with backplane
>> when 3 disk went offline (in one second) and array stop.
>>
> And raid is considered as failed by mdadm and it is persistent with state of
> the devices in metadata.
>
>>      Device Role : spare
>>      Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' ==
>> replacing)
> 3 missing = failed raid 6 array.
>
>> I think the problem is that disk are recognised as spare, but why?
> Because mdadm cannot trust them because they are reported as "missing" so they
> are not configured as raid devices (spare is default state).
>> I tried with `mdadm --assemble --force --update=force-no-bbl
> It remove badblocks but not revert devices from "missing" to "active".
Is there a way to force state=active in the metadata?
From what I saw, each drive has exactly the same Events count (48640) and
Update Time, so the data on the drives should be the same.
>> /dev/sd{q,p,o,n,m,z,y,z,w,t,s,r}1` and now mdam -E shows
>>
>>
>> ---
>>
>>             Magic : a92b4efc
>>           Version : 1.2
>>       Feature Map : 0x1
>>        Array UUID : f8fb0d5d:5cacae2e:12bf1656:18264fb5
>>              Name : backup:card1port1chassis2
>>     Creation Time : Tue Jun 18 20:07:19 2024
>>        Raid Level : raid6
>>      Raid Devices : 12
>>
>>    Avail Dev Size : 39063382016 sectors (18.19 TiB 20.00 TB)
>>        Array Size : 195316910080 KiB (181.90 TiB 200.00 TB)
>>       Data Offset : 264192 sectors
>>      Super Offset : 8 sectors
>>      Unused Space : before=264104 sectors, after=0 sectors
>>             State : clean
>>       Device UUID : e726c6bc:11415fcc:49e8e0a5:041b69e4
>>
>> Internal Bitmap : 8 sectors from superblock
>>       Update Time : Fri Jun 28 22:21:57 2024
>>          Checksum : 9ad1554c - correct
>>            Events : 48640
>>
>>            Layout : left-symmetric
>>        Chunk Size : 512K
>>
>>      Device Role : spare
>>      Array State : AAAAA.AA.A.A ('A' == active, '.' == missing, 'R' ==
>> replacing)
>> ---
>>
>>
>> What can I do to start this array?
>   You may try to add them manually. I know that there is
> --re-add functionality but I've never used it. Maybe something like that would
> work:
> #mdadm --remove /dev/md126 <failed drive>
> #mdadm --re-add /dev/md126 <failed_drive>
I tried this but didn't help.

-- 
---
Thanks
Adam Nieścierowicz


* Re: RAID6 12 device assemble force failure
  2024-07-01  9:33   ` Adam Niescierowicz
@ 2024-07-02  8:47     ` Mariusz Tkaczyk
  2024-07-02 17:47       ` Adam Niescierowicz
  0 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2024-07-02  8:47 UTC (permalink / raw)
  To: Adam Niescierowicz; +Cc: linux-raid

On Mon, 1 Jul 2024 11:33:16 +0200
Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:

> Is there a way to force state=active in the metadata?
>  From what I saw each drive have exactly the same Events: 48640 and 
> Update Time so data on the drive should be the same.

Most important: I advise you to clone the disks so you have a safe space for
practicing. Whatever you do now is risky, and we don't want to make the
situation worse. My suggestions might be destructive, and I don't want to take
responsibility for making things worse.

We have --dump and --restore functionality. I've never used it
(I'm mainly IMSM-focused), so I can only point out that it exists and that it is
an option for cloning the metadata.

Native metadata keeps both spares and data devices in the same array, and we can
see that the spare state for those 3 devices is consistently reported on every
drive.

It means that at some point metadata with the missing disks updated to spares
was written to all array members (including the spares), but it does not mean
that the data is consistent. You are recovering from an error scenario, and
whatever is there, you need to be ready for the worst case.

The brute-force method would be to recreate the array with the same creation
parameters and the --assume-clean flag, but this is risky. Your array was
probably created a few years (and mdadm versions) ago, so there could be small
differences in the array parameters mdadm sets now. Anyway, I see it as the
simplest option.
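
If you go that way, a rough sketch (not a recipe, and only on clones if at all
possible): every parameter must match what -E reports (metadata version, level,
chunk, layout, data offset), and the devices must be passed in the original
Device Role order; the placeholders below stand for roles we can no longer read
from the metadata:

#mdadm --create /dev/md126 --assume-clean --metadata=1.2 --level=6 \
       --raid-devices=12 --chunk=512 --layout=left-symmetric \
       --data-offset=<value from -E> \
       /dev/<role 0> /dev/<role 1> ... /dev/<role 11>

Double-check everything against the saved -E output before running anything.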

We could try to start the array manually by setting sysfs values; however, that
would require getting well acquainted with the mdadm code, so it would be
time-consuming.

>>> What can I do to start this array?  
>>   You may try to add them manually. I know that there is
>> --re-add functionality but I've never used it. Maybe something like that
>> would
>> work:
>> #mdadm --remove /dev/md126 <failed drive>
>> #mdadm --re-add /dev/md126 <failed_drive>  
>I tried this but didn't help.

Please provide the logs then (possibly with -vvvvv); maybe I or someone else can
help.

Thanks,
Mariusz


* Re: RAID6 12 device assemble force failure
  2024-07-02  8:47     ` Mariusz Tkaczyk
@ 2024-07-02 17:47       ` Adam Niescierowicz
  2024-07-03  7:42         ` Mariusz Tkaczyk
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Niescierowicz @ 2024-07-02 17:47 UTC (permalink / raw)
  To: linux-raid; +Cc: Mariusz Tkaczyk




On 2.07.2024 o 10:47, Mariusz Tkaczyk wrote:
> On Mon, 1 Jul 2024 11:33:16 +0200
> Adam Niescierowicz<adam.niescierowicz@justnet.pl>  wrote:
>
>> Is there a way to force state=active in the metadata?
>>   From what I saw each drive have exactly the same Events: 48640 and
>> Update Time so data on the drive should be the same.
> The most important: I advice you to clone disks to have a safe space for
> practicing. Whatever you will do is risky now, we don't want to make
> situation worse. My suggestions might be destructible and I don't want to take
> responsibility of making it worse.
I am aware of the danger. Thank you also for your help.
> We have --dump and --restore functionality, I've never used it
> (I mainly IMSM focused) so I can just point you that it is there and it is an
> option to clone metadata.
>
> native metadata keep both spares and data in the same array, and we can see
> that spare states for those 3 devices are consistently reported on every drive.
>
> It means that at some point metadata with missing disk states updated to
> spares  has been written to the all array members (including spares) but it does
> not mean that the data is consistent. You are recovering from error scenario and
> whatever is there, you need to be read for the worst case.
>
> The brute-force method would be to recreate an array with same startup
> parameters and --assume-clean flag but this is risky. Probably your array
> was initially created few years (and mdadm versions) so there could be small
> differences in the array parameters mdadm sets now. Anyway, I see it as the
> simplest option.

In my situation this is a fresh configuration, about 2 weeks old, and I can
install exactly the same mdadm version.

If there is no other way to proceed, I will try to recreate the array with
--force --assume-clean.

> We can try to start array manually by setting sysfs values, however it will
> require well familiarize with mdadm code so would be time consuming.
>
>>>> What can I do to start this array?
>>>    You may try to add them manually. I know that there is
>>> --re-add functionality but I've never used it. Maybe something like that
>>> would
>>> work:
>>> #mdadm --remove /dev/md126 <failed drive>
>>> #mdadm --re-add /dev/md126 <failed_drive>
>> I tried this but didn't help.
> Please provide a logs then (possibly with -vvvvv) maybe I or someone else would
> help.

Logs
---

# mdadm --run -vvvvv /dev/md126
mdadm: failed to start array /dev/md/card1pport2chassis1: Input/output error

# mdadm --stop /dev/md126
mdadm: stopped /dev/md126

# mdadm --assemble --force -vvvvv /dev/md126 /dev/sdq1 /dev/sdv1 
/dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 /dev/sdk1 /dev/sds1 /dev/sdm1 
/dev/sdn1 /dev/sdw1 /dev/sdt1
mdadm: looking for devices for /dev/md126
mdadm: /dev/sdq1 is identified as a member of /dev/md126, slot -1.
mdadm: /dev/sdv1 is identified as a member of /dev/md126, slot 1.
mdadm: /dev/sdr1 is identified as a member of /dev/md126, slot 6.
mdadm: /dev/sdu1 is identified as a member of /dev/md126, slot -1.
mdadm: /dev/sdz1 is identified as a member of /dev/md126, slot 11.
mdadm: /dev/sdx1 is identified as a member of /dev/md126, slot 9.
mdadm: /dev/sdk1 is identified as a member of /dev/md126, slot -1.
mdadm: /dev/sds1 is identified as a member of /dev/md126, slot 7.
mdadm: /dev/sdm1 is identified as a member of /dev/md126, slot 3.
mdadm: /dev/sdn1 is identified as a member of /dev/md126, slot 2.
mdadm: /dev/sdw1 is identified as a member of /dev/md126, slot 4.
mdadm: /dev/sdt1 is identified as a member of /dev/md126, slot 0.
mdadm: added /dev/sdv1 to /dev/md126 as 1
mdadm: added /dev/sdn1 to /dev/md126 as 2
mdadm: added /dev/sdm1 to /dev/md126 as 3
mdadm: added /dev/sdw1 to /dev/md126 as 4
mdadm: no uptodate device for slot 5 of /dev/md126
mdadm: added /dev/sdr1 to /dev/md126 as 6
mdadm: added /dev/sds1 to /dev/md126 as 7
mdadm: no uptodate device for slot 8 of /dev/md126
mdadm: added /dev/sdx1 to /dev/md126 as 9
mdadm: no uptodate device for slot 10 of /dev/md126
mdadm: added /dev/sdz1 to /dev/md126 as 11
mdadm: added /dev/sdq1 to /dev/md126 as -1
mdadm: added /dev/sdu1 to /dev/md126 as -1
mdadm: added /dev/sdk1 to /dev/md126 as -1
mdadm: added /dev/sdt1 to /dev/md126 as 0
mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to 
start the array.
---

Can somebody explain the behaviour of the array to me? (In theory.)

This is RAID-6, so after two disks are disconnected it still works fine.
Then, when the third disk disconnects, the array should stop as faulty, yes?
If the array stops as faulty, the data on the array and on the third
disconnected disk should be the same, yes?


-- 
---
Thanks
Adam Nieścierowicz


* Re: RAID6 12 device assemble force failure
  2024-07-02 17:47       ` Adam Niescierowicz
@ 2024-07-03  7:42         ` Mariusz Tkaczyk
  2024-07-03 10:16           ` Mariusz Tkaczyk
  0 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2024-07-03  7:42 UTC (permalink / raw)
  To: Adam Niescierowicz; +Cc: linux-raid

On Tue, 2 Jul 2024 19:47:52 +0200
Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:

> >>>> What can I do to start this array?  
> >>>    You may try to add them manually. I know that there is
> >>> --re-add functionality but I've never used it. Maybe something like that
> >>> would
> >>> work:
> >>> #mdadm --remove /dev/md126 <failed drive>
> >>> #mdadm --re-add /dev/md126 <failed_drive>  
> >> I tried this but didn't help.  
> > Please provide a logs then (possibly with -vvvvv) maybe I or someone else
> > would help.  
> 
> Logs
> ---
> 
> # mdadm --run -vvvvv /dev/md126
> mdadm: failed to start array /dev/md/card1pport2chassis1: Input/output error
> 
> # mdadm --stop /dev/md126
> mdadm: stopped /dev/md126
> 
> # mdadm --assemble --force -vvvvv /dev/md126 /dev/sdq1 /dev/sdv1 
> /dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 /dev/sdk1 /dev/sds1 /dev/sdm1 
> /dev/sdn1 /dev/sdw1 /dev/sdt1
> mdadm: looking for devices for /dev/md126
> mdadm: /dev/sdq1 is identified as a member of /dev/md126, slot -1.
> mdadm: /dev/sdv1 is identified as a member of /dev/md126, slot 1.
> mdadm: /dev/sdr1 is identified as a member of /dev/md126, slot 6.
> mdadm: /dev/sdu1 is identified as a member of /dev/md126, slot -1.
> mdadm: /dev/sdz1 is identified as a member of /dev/md126, slot 11.
> mdadm: /dev/sdx1 is identified as a member of /dev/md126, slot 9.
> mdadm: /dev/sdk1 is identified as a member of /dev/md126, slot -1.
> mdadm: /dev/sds1 is identified as a member of /dev/md126, slot 7.
> mdadm: /dev/sdm1 is identified as a member of /dev/md126, slot 3.
> mdadm: /dev/sdn1 is identified as a member of /dev/md126, slot 2.
> mdadm: /dev/sdw1 is identified as a member of /dev/md126, slot 4.
> mdadm: /dev/sdt1 is identified as a member of /dev/md126, slot 0.
> mdadm: added /dev/sdv1 to /dev/md126 as 1
> mdadm: added /dev/sdn1 to /dev/md126 as 2
> mdadm: added /dev/sdm1 to /dev/md126 as 3
> mdadm: added /dev/sdw1 to /dev/md126 as 4
> mdadm: no uptodate device for slot 5 of /dev/md126
> mdadm: added /dev/sdr1 to /dev/md126 as 6
> mdadm: added /dev/sds1 to /dev/md126 as 7
> mdadm: no uptodate device for slot 8 of /dev/md126
> mdadm: added /dev/sdx1 to /dev/md126 as 9
> mdadm: no uptodate device for slot 10 of /dev/md126
> mdadm: added /dev/sdz1 to /dev/md126 as 11
> mdadm: added /dev/sdq1 to /dev/md126 as -1
> mdadm: added /dev/sdu1 to /dev/md126 as -1
> mdadm: added /dev/sdk1 to /dev/md126 as -1
> mdadm: added /dev/sdt1 to /dev/md126 as 0
> mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to 
> start the array.
> ---

Could you please share the logs from the --re-add attempt? In the meantime I
will try to simulate this scenario.
> 
> Can somebody explain me behavior of the array? (theory)
> 
> This is RAID-6 so after two disk are disconnected it still works fine. 
> Next when third disk disconnect the array should stop as faulty, yes?
> If array stop as faulty the data on array and third disconnected disk 
> should be the same, yes?

If you recover only one drive (and start a doubly degraded array), it may lead
to the RWH (RAID write hole).

If there were writes during the disk failures, we don't know which in-flight
requests completed. The XOR-based calculations may lead to improper results for
some sectors (we need to read all the remaining disks and XOR the data to
reconstruct the data for the 2 missing drives).

But if you add all the disks back, in the worst case we will read outdated data,
and your filesystem should be able to recover from that.

So yes, it should be fine if you start the array with all the drives.

Mariusz


* Re: RAID6 12 device assemble force failure
  2024-07-03  7:42         ` Mariusz Tkaczyk
@ 2024-07-03 10:16           ` Mariusz Tkaczyk
  2024-07-03 21:10             ` Adam Niescierowicz
  0 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2024-07-03 10:16 UTC (permalink / raw)
  To: Adam Niescierowicz; +Cc: linux-raid

On Wed, 3 Jul 2024 09:42:53 +0200
Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote:

> On Tue, 2 Jul 2024 19:47:52 +0200
> Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:
> 
> > >>>> What can I do to start this array?    
> > >>>    You may try to add them manually. I know that there is
> > >>> --re-add functionality but I've never used it. Maybe something like that
> > >>> would
> > >>> work:
> > >>> #mdadm --remove /dev/md126 <failed drive>
> > >>> #mdadm --re-add /dev/md126 <failed_drive>    
> > >> I tried this but didn't help.    
> > > Please provide a logs then (possibly with -vvvvv) maybe I or someone else
> > > would help.    
> > 
> > Logs
> > ---
> > 
> > # mdadm --run -vvvvv /dev/md126
> > mdadm: failed to start array /dev/md/card1pport2chassis1: Input/output error
> > 
> > # mdadm --stop /dev/md126
> > mdadm: stopped /dev/md126
> > 
> > # mdadm --assemble --force -vvvvv /dev/md126 /dev/sdq1 /dev/sdv1 
> > /dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 /dev/sdk1 /dev/sds1 /dev/sdm1 
> > /dev/sdn1 /dev/sdw1 /dev/sdt1
> > mdadm: looking for devices for /dev/md126
> > mdadm: /dev/sdq1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sdv1 is identified as a member of /dev/md126, slot 1.
> > mdadm: /dev/sdr1 is identified as a member of /dev/md126, slot 6.
> > mdadm: /dev/sdu1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sdz1 is identified as a member of /dev/md126, slot 11.
> > mdadm: /dev/sdx1 is identified as a member of /dev/md126, slot 9.
> > mdadm: /dev/sdk1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sds1 is identified as a member of /dev/md126, slot 7.
> > mdadm: /dev/sdm1 is identified as a member of /dev/md126, slot 3.
> > mdadm: /dev/sdn1 is identified as a member of /dev/md126, slot 2.
> > mdadm: /dev/sdw1 is identified as a member of /dev/md126, slot 4.
> > mdadm: /dev/sdt1 is identified as a member of /dev/md126, slot 0.
> > mdadm: added /dev/sdv1 to /dev/md126 as 1
> > mdadm: added /dev/sdn1 to /dev/md126 as 2
> > mdadm: added /dev/sdm1 to /dev/md126 as 3
> > mdadm: added /dev/sdw1 to /dev/md126 as 4
> > mdadm: no uptodate device for slot 5 of /dev/md126
> > mdadm: added /dev/sdr1 to /dev/md126 as 6
> > mdadm: added /dev/sds1 to /dev/md126 as 7
> > mdadm: no uptodate device for slot 8 of /dev/md126
> > mdadm: added /dev/sdx1 to /dev/md126 as 9
> > mdadm: no uptodate device for slot 10 of /dev/md126
> > mdadm: added /dev/sdz1 to /dev/md126 as 11
> > mdadm: added /dev/sdq1 to /dev/md126 as -1
> > mdadm: added /dev/sdu1 to /dev/md126 as -1
> > mdadm: added /dev/sdk1 to /dev/md126 as -1
> > mdadm: added /dev/sdt1 to /dev/md126 as 0
> > mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to 
> > start the array.
> > ---  
> 
> Could you please share the logs with from --re-add attempt? In a meantime I
> will try to simulate this scenario.
> > 
> > Can somebody explain me behavior of the array? (theory)
> > 
> > This is RAID-6 so after two disk are disconnected it still works fine. 
> > Next when third disk disconnect the array should stop as faulty, yes?
> > If array stop as faulty the data on array and third disconnected disk 
> > should be the same, yes?  
> 
> If you will recover only one drive (and start double degraded array), it may
> lead to RWH (raid write hole).
> 
> If there were writes during disks failure, we don't know which in flight
> requests completed. The XOR based calculations may leads us to improper
> results for some sectors (we need to read all disks and XOR the data to get
> the data for missing 2 drives).
> 
> But.. if you will add again all disks, in worst case we will read outdated
> data and your filesystem should be able to recover from it.
> 
> So yes, it should be fine if you will start array with all drives.
> 
> Mariusz
> 


I was able to achieve a similar state:

mdadm -E /dev/nvme2n1
/dev/nvme2n1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 8fd2cf1a:65a58b8d:0c9a9e2e:4684fb88
           Name : gklab-localhost:my_r6  (local to host gklab-localhost)
  Creation Time : Wed Jul  3 09:43:32 2024
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
     Array Size : 10485760 KiB (10.00 GiB 10.74 GB)
  Used Dev Size : 10485760 sectors (5.00 GiB 5.37 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=1942775216 sectors
          State : clean
    Device UUID : b26bef3c:51813f3f:e0f1a194:c96c4367

    Update Time : Wed Jul  3 11:49:34 2024
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : a96eaa64 - correct
         Events : 6

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : ..A. ('A' == active, '.' == missing, 'R' == replacing)


In my case the Events value was different, and /dev/nvme3n1 had a different
Array State:
 Device Role : Active device 3
   Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)

And I failed to start it, sorry. It is possible, but it requires working with
sysfs and ioctls directly, so it is much safer to recreate the array with
--assume-clean, especially since it is a fresh array.

Thanks,
Mariusz


* Re: RAID6 12 device assemble force failure
  2024-07-03 10:16           ` Mariusz Tkaczyk
@ 2024-07-03 21:10             ` Adam Niescierowicz
  2024-07-04 11:06               ` Mariusz Tkaczyk
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Niescierowicz @ 2024-07-03 21:10 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: linux-raid



On 3.07.2024 o 12:16, Mariusz Tkaczyk wrote:
> On Wed, 3 Jul 2024 09:42:53 +0200
> Mariusz Tkaczyk<mariusz.tkaczyk@linux.intel.com>  wrote:
>
> I was able to achieve similar state:
>
> mdadm -E /dev/nvme2n1
> /dev/nvme2n1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x0
>       Array UUID : 8fd2cf1a:65a58b8d:0c9a9e2e:4684fb88
>             Name : gklab-localhost:my_r6  (local to host gklab-localhost)
>    Creation Time : Wed Jul  3 09:43:32 2024
>       Raid Level : raid6
>     Raid Devices : 4
>
>   Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
>       Array Size : 10485760 KiB (10.00 GiB 10.74 GB)
>    Used Dev Size : 10485760 sectors (5.00 GiB 5.37 GB)
>      Data Offset : 264192 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=264112 sectors, after=1942775216 sectors
>            State : clean
>      Device UUID : b26bef3c:51813f3f:e0f1a194:c96c4367
>
>      Update Time : Wed Jul  3 11:49:34 2024
>    Bad Block Log : 512 entries available at offset 16 sectors
>         Checksum : a96eaa64 - correct
>           Events : 6
>
>           Layout : left-symmetric
>       Chunk Size : 512K
>
>     Device Role : Active device 2
>     Array State : ..A. ('A' == active, '.' == missing, 'R' == replacing)
>
>
> In my case, events value was different and /dev/nvme3n1 had different Array
> State:
>   Device Role : Active device 3
>     Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)

Is this array behaviour as it should be?

Why I'm asking, in theory:

We have a bitmap, so when the third drive disconnects from the array we should
have time to stop the array in a faulty state before the bitmap space runs out,
yes?
Next, the array sends a notification to the FS (filesystem) that there is a
problem and discards all write operations.

Data that can't be stored on the faulty device should be kept in the bitmap.
Then, when we reattach the missing third drive and write the missing data from
the bitmap to the disk, everything should be good, yes?

Am I thinking correctly?


> And I failed to start it, sorry. It is possible but it requires to work with
> sysfs and ioctls directly so much safer is to recreate an array with
> --assume-clean, especially that it is fresh array.

I recreated the array; LVM detected the PV and works fine, but the XFS on top of
the LVM is missing data from the recreated array.

-- 
---
Regards
Adam Nieścierowicz


* Re: RAID6 12 device assemble force failure
  2024-07-03 21:10             ` Adam Niescierowicz
@ 2024-07-04 11:06               ` Mariusz Tkaczyk
  2024-07-04 12:35                 ` Adam Niescierowicz
  0 siblings, 1 reply; 11+ messages in thread
From: Mariusz Tkaczyk @ 2024-07-04 11:06 UTC (permalink / raw)
  To: Adam Niescierowicz; +Cc: linux-raid

On Wed, 3 Jul 2024 23:10:52 +0200
Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:

> On 3.07.2024 o 12:16, Mariusz Tkaczyk wrote:
> > On Wed, 3 Jul 2024 09:42:53 +0200
> > Mariusz Tkaczyk<mariusz.tkaczyk@linux.intel.com>  wrote:
> >
> > I was able to achieve similar state:
> >
> > mdadm -E /dev/nvme2n1
> > /dev/nvme2n1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0x0
> >       Array UUID : 8fd2cf1a:65a58b8d:0c9a9e2e:4684fb88
> >             Name : gklab-localhost:my_r6  (local to host gklab-localhost)
> >    Creation Time : Wed Jul  3 09:43:32 2024
> >       Raid Level : raid6
> >     Raid Devices : 4
> >
> >   Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
> >       Array Size : 10485760 KiB (10.00 GiB 10.74 GB)
> >    Used Dev Size : 10485760 sectors (5.00 GiB 5.37 GB)
> >      Data Offset : 264192 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=264112 sectors, after=1942775216 sectors
> >            State : clean
> >      Device UUID : b26bef3c:51813f3f:e0f1a194:c96c4367
> >
> >      Update Time : Wed Jul  3 11:49:34 2024
> >    Bad Block Log : 512 entries available at offset 16 sectors
> >         Checksum : a96eaa64 - correct
> >           Events : 6
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 2
> >     Array State : ..A. ('A' == active, '.' == missing, 'R' == replacing)
> >
> >
> > In my case, events value was different and /dev/nvme3n1 had different Array
> > State:
> >   Device Role : Active device 3
> >     Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)  
> 
> This kind of array behavior is like it should be?
> 
> Why I'm asking, in theory.
> 
> We have bitmap so when third drive disconnect from the array we should 
> have time to stop the array in foulty state before bitmap space is over, 
> yes?.
> Next array send notification to FS(filesystem) that there is a problem 
> and will discard all write operation.

At the moment the failure of the 3rd disk is recorded, the array is moved to the
BROKEN state, which means that every new write is automatically failed. Only
reads are allowed.
Thus there is no possibility for the bitmap to become fully occupied (there is
no bitmap update on the read path); if the array is broken we abort before the
bitmap is updated for a new write.

No notification is sent to the filesystem, but the filesystem may discover it
after receiving an error on every write.

> 
> Data that can't be store on the foulty device should be keep in the bitmap.
> Next when we reatach missing third drive when we write missing data from 
> bitmap to disk everything should be good, yes?
> 
> I'm thinking correctly?
> 

The bitmap doesn't record writes. Please read:
https://man7.org/linux/man-pages/man4/md.4.html
The bitmap is used to optimize resync and recovery in the re-add case (but we
know that it won't work in your case).
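
If you want to see what the bitmap actually tracks on a member, you can look at
it directly, for example (just an illustration; the output format may vary
between mdadm versions):

#mdadm -X /dev/sdq1

-X (--examine-bitmap) prints the bitmap header from the member: sync size,
bitmap chunk size and how many bits are dirty.
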

> 
> > And I failed to start it, sorry. It is possible but it requires to work with
> > sysfs and ioctls directly so much safer is to recreate an array with
> > --assume-clean, especially that it is fresh array.  
> 
> I recreated the array, LVM detected PV and works fine but XFS above the 
> LVM is missing data from recreate array.
> 

Well, it looks like you did it right, because LVM is up. Please compare whether
the disks are ordered the same way in the new array (the indexes of the drives
in the mdadm -D output), just to be doubly sure.
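
For example, something like this could be checked against the
"Device Role : Active device N" lines from the -E output you posted before the
re-create (just a sketch; adjust the partition list to the names your members
have now):

#for d in /dev/sd{m,n,o,p,q,r,s,t,w,x,y,z}1; do printf '%s: ' "$d"; mdadm -E "$d" | grep 'Device Role'; done

Each partition should report the same active-device number as before.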

As for the filesystem issues, I cannot help with that; I hope you can recover at
least part of the data :(

Thanks,
Mariusz


* Re: RAID6 12 device assemble force failure
  2024-07-04 11:06               ` Mariusz Tkaczyk
@ 2024-07-04 12:35                 ` Adam Niescierowicz
  2024-07-05 11:02                   ` Mariusz Tkaczyk
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Niescierowicz @ 2024-07-04 12:35 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: linux-raid



On 4.07.2024 o 13:06, Mariusz Tkaczyk wrote:
>> Data that can't be store on the foulty device should be keep in the bitmap.
>> Next when we reatach missing third drive when we write missing data from
>> bitmap to disk everything should be good, yes?
>>
>> I'm thinking correctly?
>>
> Bitmap doesn't record writes. Please read:
> https://man7.org/linux/man-pages/man4/md.4.html
> bitmap is used to optimize resync and recovery in case of re-add (but we
> know that it won't work in your case).

Is there a way to make the storage more fault tolerant?

From what I have seen so far: one array = one PV (LVM) = one LV (LVM) = one FS.

Mixing two arrays under one LVM and one FS isn't good practice.


But what about the RAID configuration?
I have 4 external backplanes with 12 disks each. Each backplane is attached by
four external SAS LUNs.
In a scenario where I attach three disks to one LUN, and one LUN crashes or
hangs and then restarts or ... the data on the array will be damaged, yes?

I think I can create a RAID 5 array from the three disks on one LUN, so when a
LUN freezes, disconnects, hangs, etc., that one array will stop, like a server
crashing on power loss, and this should be recoverable (until now I haven't had
problems with array rebuilds in this kind of situation).

The problem is disk usage: each 12-disk backplane will use 4 disks for parity
(12 disks = 4 LUNs = 4 RAID 5 arrays).

Is there a better way to do this?


> And I failed to start it, sorry. It is possible but it requires to 
> work with
>>> sysfs and ioctls directly so much safer is to recreate an array with
>>> --assume-clean, especially that it is fresh array.
>> I recreated the array, LVM detected PV and works fine but XFS above the
>> LVM is missing data from recreate array.
>>
> Well, it looks like you did it right because LVM is up. Please compare if disks
> are ordered same way in new array (indexes of the drives in mdadm -D output).
> Just do be double sure.

How can I assign the RAID device number (role) to each disk?


-- 
---
Regards
Adam Nieścierowicz


* Re: RAID6 12 device assemble force failure
  2024-07-04 12:35                 ` Adam Niescierowicz
@ 2024-07-05 11:02                   ` Mariusz Tkaczyk
  0 siblings, 0 replies; 11+ messages in thread
From: Mariusz Tkaczyk @ 2024-07-05 11:02 UTC (permalink / raw)
  To: Adam Niescierowicz; +Cc: linux-raid

On Thu, 4 Jul 2024 14:35:26 +0200
Adam Niescierowicz <adam.niescierowicz@justnet.pl> wrote:

> On 4.07.2024 o 13:06, Mariusz Tkaczyk wrote:
> >> Data that can't be store on the foulty device should be keep in the bitmap.
> >> Next when we reatach missing third drive when we write missing data from
> >> bitmap to disk everything should be good, yes?
> >>
> >> I'm thinking correctly?
> >>  
> > Bitmap doesn't record writes. Please read:
> > https://man7.org/linux/man-pages/man4/md.4.html
> > bitmap is used to optimize resync and recovery in case of re-add (but we
> > know that it won't work in your case).  
> 
> Is there a way to make storage more fault tolerant?
> 
>  From what I saw till now one array=one PV(LVM)=LV(LVM)=one FS.
> 
> Mixing two array in LVM and FS isn't good practice.

I don't have the expertise to advise about FS and LVM.

We (MD) can offer you RAID6, RAID1 and RAID10, so please choose wisely what fits
your needs best. RAID1 is the most fault tolerant, but its capacity is the lowest.

> 
> 
> But what about raid configuration?
> I have 4 external backplane, 12 disk each. Each backplane is attached by 
> external four SAS LUNs.
> In scenario where I attache three disk to one LUN and one LUN crash or 
> hang and next restart or ... data on the array will be damaged, yes?

Yes, that could happen. RAID6 cannot save you from that: it tolerates up to 2
failures, not more. That is why backups are important.

Leading an array to the failed state may cause data damage; any recovery from
something like that is recovery from an error scenario, so the data might be
damaged. I cannot say yes or no because it varies. Generally, you should always
be ready for the worst case.

We *should* not record the failed state in the metadata, to give the user a
chance to recover from such a scenario, so I don't get why it happened (maybe a
bug?). I will try to find time to work on it in the next weeks.

> 
> I think that I can create raid5 array for three disk in one LUN so when 
> LUN freeze, disconect, hungs or etc one array will stop like server 
> crash without power and this should be recovable(until now I didn't have 
> problem with array rebuild in this kind of situation)

We cannot record any failure because we lose all the drives at the same moment.
It is a kind of workaround; it will save you from going to the failed or
degraded state. There could still be filesystem errors, but probably correctable
ones (if the array wasn't degraded; otherwise RWH may happen).

> 
> Problem is with disk usage, each 12 pcs backplane will use 4 disk for 
> parity( 12 disk=4 luns = 4 raid 5 array).
> 
> Is there better way to do this?

It depends what you mean by better :) This is always a compromise between
performance, capacity and redundancy. If you are satisfied with RAID5
performance, and you think that the redundancy offered by this approach is
enough for your needs, this is fine. If you need a more fault-tolerant array (or
arrays), please consider RAID1 and RAID10.

> 
> 
> > And I failed to start it, sorry. It is possible but it requires to 
> > work with  
> >>> sysfs and ioctls directly so much safer is to recreate an array with
> >>> --assume-clean, especially that it is fresh array.  
> >> I recreated the array, LVM detected PV and works fine but XFS above the
> >> LVM is missing data from recreate array.
> >>  
> > Well, it looks like you did it right because LVM is up. Please compare if
> > disks are ordered same way in new array (indexes of the drives in mdadm -D
> > output). Just do be double sure.  
> 
> How can I assigne raid disk number to each disk?
> 
> 

The order in the create command matters. You must pass the devices in the same
order as they were, starting from the lowest role, i.e.:
mdadm -CR volume -n 12 /dev/disk1 /dev/disk2 ...

If you are using bash completion, please be aware that it may order the devices
differently.

Mariusz


end of thread

Thread overview: 11+ messages
2024-06-29 15:17 RAID6 12 device assemble force failure Adam Niescierowicz
2024-07-01  8:51 ` Mariusz Tkaczyk
2024-07-01  9:33   ` Adam Niescierowicz
2024-07-02  8:47     ` Mariusz Tkaczyk
2024-07-02 17:47       ` Adam Niescierowicz
2024-07-03  7:42         ` Mariusz Tkaczyk
2024-07-03 10:16           ` Mariusz Tkaczyk
2024-07-03 21:10             ` Adam Niescierowicz
2024-07-04 11:06               ` Mariusz Tkaczyk
2024-07-04 12:35                 ` Adam Niescierowicz
2024-07-05 11:02                   ` Mariusz Tkaczyk
