* Is it possible to restart --add?
From: Chris Dunlop @ 2022-12-10 19:59 UTC
To: linux-raid
Hi,
When replacing a failed disk with a new one using --add, is it possible to
restart a partially-complete --add, e.g. after a reboot?
I have a raid-6 with a failed disk, and used --add to add a new disk as a
replacement. From /proc/mdstat, "finish" told me it would take around 24
hours to complete the add.
The machine was rebooted some hours into the add, and on restart the md
was missing the new disk (and the failed disk). I tried to --re-add the
new disk again, but mdadm told me it's "not possible":
mdadm: --re-add for /dev/sdh1 to /dev/md0 is not possible
I ended up --add'ing the disk again, so the 24 hours to complete started
again.
Is this expected, and/or is there a way to restart the --add rather than
starting from the beginning again?
$ mdadm --version
mdadm - v4.1 - 2018-10-01
Thanks,
Chris
* Re: Is it possible to restart --add?
From: Wols Lists @ 2022-12-11 13:55 UTC
To: Chris Dunlop, linux-raid
On 10/12/2022 19:59, Chris Dunlop wrote:
> Hi,
>
> When replacing a failed disk with a new one using --add, is it possible
> to restart a partially-complete --add, e.g. after a reboot?
>
> I have a raid-6 with a failed disk, and used --add to add a new disk as
> a replacement. From /proc/mdstat, "finish" told me it would take around
> 24 hours to complete the add.
>
> The machine was rebooted some hours into the add, and on restart the md
> was missing the new disk (and the failed disk). I tried to --re-add the
> new disk again, but mdadm told me it's "not possible":
>
> mdadm: --re-add for /dev/sdh1 to /dev/md0 is not possible
>
> I ended up --add'ing the disk again, so the 24 hours to complete started
> again.
>
> Is this expected, and/or is there a way to restart the --add rather than
> starting from the beginning again?
>
Raid is supposed to be robust, so this surprises me. When it rebooted it
should have known it was part-way through a rebuild. Was it a controlled
reboot, or a crash and restart?
What I would expect is that the array would be rebuilt including sdh1,
and the rebuild would just carry on. So I suspect that whatever went
wrong, it was a bit further back than that - somehow md forgot that sdh1
was now part of the array.
Weird.
Cheers,
Wol
* Re: Is it possible to restart --add?
From: Chris Dunlop @ 2022-12-12 7:05 UTC
To: Wols Lists; +Cc: linux-raid
On Sun, Dec 11, 2022 at 01:55:54PM +0000, Wols Lists wrote:
> On 10/12/2022 19:59, Chris Dunlop wrote:
>> Hi,
>>
>> When replacing a failed disk with a new one using --add, is it
>> possible to restart a partially-complete --add, e.g. after a reboot?
>>
>> I have a raid-6 with a failed disk, and used --add to add a new disk
>> as a replacement. From /proc/mdstat, "finish" told me it would take
>> around 24 hours to complete the add.
>>
>> The machine was rebooted some hours into the add, and on restart the
>> md was missing the new disk (and the failed disk). I tried to
>> --re-add the new disk again, but mdadm told me it's "not possible":
>>
>> mdadm: --re-add for /dev/sdh1 to /dev/md0 is not possible
>>
>> I ended up --add'ing the disk again, so the 24 hours to complete
>> started again.
>>
>> Is this expected, and/or is there a way to restart the --add rather
>> than starting from the beginning again?
>
> Raid is supposed to be robust, so this surprises me. When it rebooted
> it should have known it was part-way through a rebuild. Was it a
> controlled reboot, or a crash and restart?
Controlled reboot.
> What I would expect is that the array would be rebuilt including sdh1,
> and the rebuild would just carry on. So I suspect that whatever went
> wrong, it was a bit further back than that - somehow md forgot that
> sdh1 was now part of the array.
Yes, I was expecting that the --add would periodically record its current
"synced to" block or offset, so that on restart it could pick up where it
left off (or a little before).
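A toy model of the checkpoint-and-resume behaviour described above may make the expectation concrete. This is a minimal Python sketch under stated assumptions, not md's actual implementation: `Superblock`, `resume_block`, `rebuild` and `checkpoint_every` are hypothetical names, and blocks are only counted rather than reconstructed from parity and written.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Superblock:
    """Hypothetical stand-in for on-disk metadata that survives a reboot."""
    resume_block: int = 0  # first block not yet known to be rebuilt


def rebuild(sb: Superblock, total_blocks: int, checkpoint_every: int,
            stop_after: Optional[int] = None) -> int:
    """Rebuild the new disk, starting from the last recorded checkpoint.

    Returns how many blocks were written in this run. If stop_after is
    set, the run is cut short after that many blocks, as a reboot would.
    """
    block = sb.resume_block  # resume point read back from "disk"
    written = 0
    while block < total_blocks:
        if stop_after is not None and written >= stop_after:
            # Interrupted: anything after the last checkpoint is redone later.
            return written
        # (A real rebuild would compute this block from the surviving
        # data/parity and write it to the replacement disk here.)
        block += 1
        written += 1
        if block % checkpoint_every == 0 or block == total_blocks:
            sb.resume_block = block  # periodically persist progress
    return written


sb = Superblock()
first = rebuild(sb, total_blocks=1000, checkpoint_every=100, stop_after=350)
print(first, sb.resume_block)   # 350 300
second = rebuild(sb, total_blocks=1000, checkpoint_every=100)
print(second, sb.resume_block)  # 700 1000
```

With a checkpoint every 100 blocks, the interrupted run loses only blocks 300-349, which the second run redoes, instead of starting again from block 0 as happened here with the second --add.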
> Weird.
Yup.
Tks,
Chris