linux-raid.vger.kernel.org archive mirror
* Can't replace drive in imsm RAID 5 array, spare not shown
@ 2024-10-06  6:00 19 Devices
  2024-10-09 10:09 ` Mariusz Tkaczyk
  0 siblings, 1 reply; 4+ messages in thread
From: 19 Devices @ 2024-10-06  6:00 UTC (permalink / raw)
  To: linux-raid

Hi, I have a 4 drive imsm RAID 5 array which is working fine.  I want to remove one of the drives, sda, and replace it with a spare, sdc.  From man mdadm I understand that add - fail - remove is the way to go but this does not work.

Before:
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md124 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
      2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]

md125 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
      99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]

md126 : inactive sda[3](S) sdb[2](S) sdd[1](S) sde[0](S)
      14681 blocks super external:imsm

unused devices: <none>


I can add (or add-spare) which increases the size of the container and though I can't see any spare drives listed by mdadm, it appears as SPARE DISK in the Intel option ROM after a reboot.

$ sudo mdadm --zero-superblock /dev/sdc

$ sudo mdadm /dev/md/imsm1 --add-spare /dev/sdc
mdadm: added /dev/sdc

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md124 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
      2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]

md125 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
      99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]

md126 : inactive sdc[4](S) sda[3](S) sdb[2](S) sdd[1](S) sde[0](S)
      15786 blocks super external:imsm

unused devices: <none>
$


No spare devices listed here:

$ sudo mdadm -D /dev/md/imsm1
/dev/md/imsm1:
           Version : imsm
        Raid Level : container
     Total Devices : 5

   Working Devices : 5


              UUID : bdb7f495:21b8c189:e496c216:6f2d6c4c
     Member Arrays : /dev/md/md1_0 /dev/md/md0_0

    Number   Major   Minor   RaidDevice

       -       8       64        -        /dev/sde
       -       8       32        -        /dev/sdc
       -       8        0        -        /dev/sda
       -       8       48        -        /dev/sdd
       -       8       16        -        /dev/sdb
$


Trying to remove sda fails.

$ sudo mdadm --fail /dev/md126 /dev/sda
mdadm: Cannot remove /dev/sda from /dev/md126, array will be failed.

sda is 2TB, the others are 1TB - is that a problem?

smartctl shows 2 drives don't support  SCT and it's disabled on the other 3.

There's a very similar question here from Edwin in 2017:
https://unix.stackexchange.com/questions/372908/add-hot-spare-drive-to-intel-rst-onboard-raid#372920

The only reply points to an Intel doc which uses the standard command to add a drive but doesn't show the result.

$ uname -a
Linux Intel 6.9.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 26 May 2024 01:30:29 +0000 x86_64 GNU/Linux

$ mdadm --version
mdadm - v4.3 - 2024-02-15

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Can't replace drive in imsm RAID 5 array, spare not shown
  2024-10-06  6:00 Can't replace drive in imsm RAID 5 array, spare not shown 19 Devices
@ 2024-10-09 10:09 ` Mariusz Tkaczyk
  2024-10-10  8:25   ` 19 Devices
  0 siblings, 1 reply; 4+ messages in thread
From: Mariusz Tkaczyk @ 2024-10-09 10:09 UTC (permalink / raw)
  To: 19 Devices; +Cc: linux-raid

On Sun, 06 Oct 2024 07:00:18 +0100
19 Devices <19devices@gmail.com> wrote:

> Hi, I have a 4 drive imsm RAID 5 array which is working fine.  I want to
> remove one of the drives, sda, and replace it with a spare, sdc.  From man
> mdadm I understand that add - fail - remove is the way to go but this does
> not work.
> 
> Before:
> $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md124 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
>       2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
> 
> md125 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
>       99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
> 
> md126 : inactive sda[3](S) sdb[2](S) sdd[1](S) sde[0](S)
>       14681 blocks super external:imsm
> 
> unused devices: <none>
> 
> 
> I can add (or add-spare) which increases the size of the container and though
> I can't see any spare drives listed by mdadm, it appears as SPARE DISK in the
> Intel option ROM after a reboot.
> 
> $ sudo mdadm --zero-superblock /dev/sdc
> 
> $ sudo mdadm /dev/md/imsm1 --add-spare /dev/sdc
> mdadm: added /dev/sdc
> 
> $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md124 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
>       2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
> 
> md125 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
>       99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
> 
> md126 : inactive sdc[4](S) sda[3](S) sdb[2](S) sdd[1](S) sde[0](S)
>       15786 blocks super external:imsm
> 
> unused devices: <none>
> $
> 
> 
> No spare devices listed here:
> 
> $ sudo mdadm -D /dev/md/imsm1
> /dev/md/imsm1:
>            Version : imsm
>         Raid Level : container
>      Total Devices : 5
> 
>    Working Devices : 5
> 
> 
>               UUID : bdb7f495:21b8c189:e496c216:6f2d6c4c
>      Member Arrays : /dev/md/md1_0 /dev/md/md0_0
> 
>     Number   Major   Minor   RaidDevice
> 
>        -       8       64        -        /dev/sde
>        -       8       32        -        /dev/sdc
>        -       8        0        -        /dev/sda
>        -       8       48        -        /dev/sdd
>        -       8       16        -        /dev/sdb
> $
> 
Hello,

I know. It is fine. From the container's point of view, these are all spares.
Nobody ever complained about that, so we did not fix it :)
The most important thing is that all the drives are there.

To identify spares, compare this list with the list from # mdadm --detail
/dev/md124 (the member array). Drives that are not used in the member array are spares.
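That comparison can be sketched as a set difference. The helper below is hypothetical and not part of mdadm. It reads array membership from the /proc/mdstat text posted above, with the wrapped lines rejoined, instead of from mdadm --detail. It also assumes the container is md126 and its member arrays are md124 and md125.

```python
import re

# /proc/mdstat as posted in the thread after "--add-spare /dev/sdc".
MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md124 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
      2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]

md125 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
      99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]

md126 : inactive sdc[4](S) sda[3](S) sdb[2](S) sdd[1](S) sde[0](S)
      15786 blocks super external:imsm

unused devices: <none>
"""

ARRAY_LINE = re.compile(r"^(md\d+) : (\w+) (.*)$")  # "md124 : active raid5 sdd[3] ..."
MEMBER = re.compile(r"(\w+)\[\d+\]")                # "sdd[3]" or "sdc[4](S)" -> "sdd", "sdc"


def parse_mdstat(text):
    """Map each md array name to (state, set of component device names)."""
    arrays = {}
    for line in text.splitlines():
        m = ARRAY_LINE.match(line)
        if m:
            name, state, rest = m.groups()
            arrays[name] = (state, set(MEMBER.findall(rest)))
    return arrays


def container_spares(arrays, container, members):
    """Devices in the container that none of the listed member arrays use."""
    in_use = set().union(*(arrays[m][1] for m in members))
    return arrays[container][1] - in_use


arrays = parse_mdstat(MDSTAT)
print(sorted(container_spares(arrays, "md126", ["md124", "md125"])))  # ['sdc']
```

Both member arrays are checked because sda belongs to md124 and md125. A disk counts as a spare only if neither array uses it.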
> 
> Trying to remove sda fails.
> 
> $ sudo mdadm --fail /dev/md126 /dev/sda
> mdadm: Cannot remove /dev/sda from /dev/md126, array will be failed.

It might be an issue in mdadm. We added this check, and later we added fixes:

Commit:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=fc6fd4063769f4194c3fb8f77b32b2819e140fb9

Fixes:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=b3e7b7eb1dfedd7cbd9a3800e884941f67d94c96
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=461fae7e7809670d286cc19aac5bfa861c29f93a

but your release is mdadm-4.3, so all the fixes should be there. It might be a new bug.

Try:
# mdadm -If sda
but please do not abuse it (use it only once, because it may fail your
array). According to mdstat, it should be safe in this case.

If you can do some investigation, I would be thankful. I expect the issue
is in the enough() function.
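The refusal is surprising given the [4/4] [UUUU] status. After losing sda, each RAID 5 member array would still have 3 of its 4 disks, and RAID 5 tolerates exactly that one-disk loss. The model below illustrates that condition. The function names are hypothetical, and this is not mdadm's C enough() function, which covers more levels and layouts.

```python
def raid5_enough(raid_disks: int, working: int) -> bool:
    # Simplified model (assumption): RAID 4/5 stays usable while at most
    # one member is missing. Not mdadm's actual enough() implementation.
    return working >= raid_disks - 1


def can_fail_one(status: str) -> bool:
    # status is the [UUUU]-style field from /proc/mdstat without brackets:
    # 'U' marks an in-sync member, '_' a missing one.
    raid_disks = len(status)
    working = status.count("U")
    # Failing one more member is safe only if one is in sync and the rest suffice.
    return working > 0 and raid5_enough(raid_disks, working - 1)


# md124 and md125 were both [4/4] [UUUU] before sda was failed.
print(can_fail_one("UUUU"))  # True
# Once an array is degraded to [UU_U], failing another disk would break it.
print(can_fail_one("UU_U"))  # False
```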

Thanks,
Mariusz




* Re: Can't replace drive in imsm RAID 5 array, spare not shown
  2024-10-09 10:09 ` Mariusz Tkaczyk
@ 2024-10-10  8:25   ` 19 Devices
  2024-10-10 10:42     ` Mariusz Tkaczyk
  0 siblings, 1 reply; 4+ messages in thread
From: 19 Devices @ 2024-10-10  8:25 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: linux-raid



---------------------------------------

Thank you Mariusz, that (--incremental --fail) worked:


# mdadm -If sda
mdadm: set sda faulty in md124
mdadm: set sda faulty in md125
mdadm: hot removed sda from md126

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md124 : active raid5 sdc[4] sdd[3] sdb[2] sde[0]
      2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/3] [UU_U]
      [>....................]  recovery =  0.2% (2275456/943718400) finish=222.5min speed=70515K/sec

md125 : active raid5 sdc[4] sdd[3] sdb[2] sde[0]
      99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/3] [UU_U]
        resync=DELAYED

md126 : inactive sdc[4](S) sdb[2](S) sdd[1](S) sde[0](S)
      10585 blocks super external:imsm

unused devices: <none>
#
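The recovery line can be checked by hand. The sketch below assumes md's usual units: progress counters in 1K blocks and speed in K/sec. It recomputes the percentage and the remaining time from those figures. The helper name is hypothetical.

```python
import re

# md124 recovery line from the mdstat output above, rejoined onto one line.
LINE = ("[>....................]  recovery =  0.2% (2275456/943718400) "
        "finish=222.5min speed=70515K/sec")

PATTERN = re.compile(
    r"recovery =\s*([\d.]+)% \((\d+)/(\d+)\) finish=([\d.]+)min speed=(\d+)K/sec"
)


def recovery_estimate(line):
    """Return (percent done, minutes left) recomputed from the raw counters."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("no recovery progress in line")
    done, total, speed = int(m.group(2)), int(m.group(3)), int(m.group(5))
    percent = 100.0 * done / total
    # done/total in 1K blocks and speed in K/sec, so remaining/speed is seconds.
    eta_min = (total - done) / speed / 60
    return percent, eta_min


pct, eta = recovery_estimate(LINE)
print(f"{pct:.1f}% done, about {eta:.1f} min left")  # 0.2% done, about 222.5 min left
```

The total of 943718400 blocks is the per-disk share of md124. That is 2831155200 blocks spread over the 3 data disks of a 4-disk RAID 5. As the journal below shows, md125's rebuild waits until this one finishes.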


# journalctl -f
kernel: md/raid:md124: Disk failure on sda, disabling device.
kernel: md/raid:md124: Operation continuing on 3 devices.
kernel: md/raid:md125: Disk failure on sda, disabling device.
kernel: md/raid:md125: Operation continuing on 3 devices.
kernel: md: recovery of RAID array md124
kernel: md: delaying recovery of md125 until md124 has finished (they share one or more physical units)
mdadm[628]: mdadm: Fail event detected on md device /dev/md125, component device /dev/sda
mdadm[628]: mdadm: RebuildStarted event detected on md device /dev/md124
Intel mdadm[628]: mdadm: Fail event detected on md device /dev/md124, component device /dev/sda

---------------------------------------

ps. Belated thanks too for your solution to my previous problem here on 2021/08/02.  That fix showed no sign it had succeeded until reboot but after that all was fine.


* Re: Can't replace drive in imsm RAID 5 array, spare not shown
  2024-10-10  8:25   ` 19 Devices
@ 2024-10-10 10:42     ` Mariusz Tkaczyk
  0 siblings, 0 replies; 4+ messages in thread
From: Mariusz Tkaczyk @ 2024-10-10 10:42 UTC (permalink / raw)
  To: 19 Devices; +Cc: linux-raid

On Thu, 10 Oct 2024 09:25:58 +0100
19 Devices <19devices@gmail.com> wrote:

> ps. Belated thanks too for your solution to my previous problem here on
> 2021/08/02.  That fix showed no sign it had succeeded until reboot but after
> that all was fine.


Cool. I'm glad to hear that.

Mariusz

