linux-raid.vger.kernel.org archive mirror
* [RFC PATCH] mdadm: add --fast-initialize
@ 2024-05-28 14:33 Mariusz Tkaczyk
  2024-06-04 12:46 ` Xiao Ni
From: Mariusz Tkaczyk @ 2024-05-28 14:33 UTC
  To: linux-raid; +Cc: Mariusz Tkaczyk, Logan Gunthorpe

This is not a complete change, but I would like to get feedback on the
proposed concept. There are a few features for optimized space zeroing.
We already support --write-zeroes, but Intel would like to add support for
the deallocate command (discard) in the future. There is also SATA TRIM,
which could potentially be used.

The goal of this RFC is to get feedback on proposing a single option that
checks for a few features which can be used to perform a smarter
initialization instead of a resync. With that, the user may just type
--fast-initialize and mdadm will determine what can be used, or abort if
nothing suitable is available.
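
As a rough illustration of the selection logic described above (a sketch
only, not code from mdadm): each member disk reports a bitmask of supported
methods, the masks are intersected, and the first common method is chosen.
If nothing is common to all disks, the create would abort. The enum names,
the function name and the preference order are all assumptions made for
this example.

```c
#include <stddef.h>

/* Hypothetical capability flags; these names are illustrative, not mdadm's. */
enum init_cap {
	INIT_CAP_NONE         = 0,
	INIT_CAP_WRITE_ZEROES = 1 << 0,	/* write-zeroes request */
	INIT_CAP_DEALLOCATE   = 1 << 1,	/* NVMe deallocate (discard) */
	INIT_CAP_SATA_TRIM    = 1 << 2,	/* SATA TRIM */
};

/*
 * Return one method supported by every disk, or INIT_CAP_NONE if the disks
 * have no method in common (the caller would then abort the create).
 * The preference order below is an assumption, not something the RFC states.
 */
enum init_cap pick_init_method(const unsigned int *caps, size_t ndisks)
{
	static const enum init_cap preference[] = {
		INIT_CAP_WRITE_ZEROES,
		INIT_CAP_DEALLOCATE,
		INIT_CAP_SATA_TRIM,
	};
	unsigned int common = ~0u;
	size_t i;

	if (ndisks == 0)
		return INIT_CAP_NONE;

	/* Keep only the bits that every disk advertises. */
	for (i = 0; i < ndisks; i++)
		common &= caps[i];

	/* Pick the most preferred method among the common ones. */
	for (i = 0; i < sizeof(preference) / sizeof(preference[0]); i++)
		if (common & (unsigned int)preference[i])
			return preference[i];

	return INIT_CAP_NONE;
}
```

With two disks where only one supports deallocate but both support
write-zeroes, this returns INIT_CAP_WRITE_ZEROES; with no common method it
returns INIT_CAP_NONE.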

This won't be merged.

Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

---
 mdadm.8.in | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mdadm.8.in b/mdadm.8.in
index aa0c540399f6..be592d70ac9b 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -849,6 +849,17 @@ each disk is zeroed in parallel with the others.
 .IP
 This is only meaningful with --create.
 
+.TP
+.BR \-\-fast\-initialize
+When creating an array, check the disks for optional features that allow optimized initialization
+instead of a resync. These features are NVMe write-zeroes or deallocate, and SATA TRIM. If a feature
+is supported by all drives, it is executed; otherwise an error is returned. This option invokes
+.BR \-\-assume\-clean .
+This is intended for use with devices that have hardware offload for zeroing, but despite this,
+zeroing can still take several minutes to complete on large disks.
+.IP
+This is only meaningful with --create.
+
 .TP
 .BR \-\-backup\-file=
 This is needed when
-- 
2.35.3



* Re: [RFC PATCH] mdadm: add --fast-initialize
  2024-05-28 14:33 [RFC PATCH] mdadm: add --fast-initialize Mariusz Tkaczyk
@ 2024-06-04 12:46 ` Xiao Ni
  2024-06-04 16:19   ` Logan Gunthorpe
From: Xiao Ni @ 2024-06-04 12:46 UTC
  To: Mariusz Tkaczyk; +Cc: linux-raid, Logan Gunthorpe

Hi Mariusz,

Discard can't guarantee that zeroes are written to NVMe disks, right? If
so, we can't use it instead of a resync, because it can't ensure that the
RAID is in a synced state.

Best Regards
Xiao




* Re: [RFC PATCH] mdadm: add --fast-initialize
  2024-06-04 12:46 ` Xiao Ni
@ 2024-06-04 16:19   ` Logan Gunthorpe
  2024-06-10  8:57     ` Mariusz Tkaczyk
From: Logan Gunthorpe @ 2024-06-04 16:19 UTC
  To: Xiao Ni, Mariusz Tkaczyk; +Cc: linux-raid



On 2024-06-04 06:46, Xiao Ni wrote:
> Hi Mariusz,
>
> Discard can't guarantee that zeroes are written to NVMe disks, right? If
> so, we can't use it instead of a resync, because it can't ensure that the
> RAID is in a synced state.

Yes, discard requests are best effort, and the drive is free to ignore
some or all of a request. See [1] for more information from Martin
Petersen.

I think that if we have a device with a fast zero operation that we know
guarantees zeroing, then the kernel's write-zeroes operation should be
changed to use it. We shouldn't add fast-but-dangerous options to mdadm.
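
To make the distinction concrete, here is a minimal sketch (not mdadm or
kernel code) that reads the block queue limit the kernel exports in sysfs
for write-zeroes. A non-zero queue/write_zeroes_max_bytes means the device
advertises a write-zeroes offload; discard has its own
queue/discard_max_bytes limit and, per the above, carries no zeroing
guarantee. The function names are made up for this example.

```c
#include <stdio.h>

/*
 * Parse a single unsigned decimal value from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_queue_limit(const char *path, unsigned long long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Returns 1 if the disk (e.g. "nvme0n1") advertises write-zeroes offload,
 * 0 if it does not, and -1 if the attribute cannot be read.
 */
int has_write_zeroes_offload(const char *disk)
{
	char path[256];
	unsigned long long max_bytes;
	int n;

	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/write_zeroes_max_bytes", disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (read_queue_limit(path, &max_bytes) < 0)
		return -1;
	return max_bytes > 0;
}
```

A caller would run has_write_zeroes_offload() on every member disk and
treat anything other than 1 as "not available".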

Thanks,

Logan


[1] https://lore.kernel.org/all/yq1fsgwbijv.fsf@ca-mkp.ca.oracle.com/T/#u


* Re: [RFC PATCH] mdadm: add --fast-initialize
  2024-06-04 16:19   ` Logan Gunthorpe
@ 2024-06-10  8:57     ` Mariusz Tkaczyk
From: Mariusz Tkaczyk @ 2024-06-10  8:57 UTC
  To: Logan Gunthorpe; +Cc: Xiao Ni, linux-raid

On Tue, 4 Jun 2024 10:19:59 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> On 2024-06-04 06:46, Xiao Ni wrote:
> > Hi Mariusz,
> >
> > Discard can't guarantee that zeroes are written to NVMe disks, right? If
> > so, we can't use it instead of a resync, because it can't ensure that the
> > RAID is in a synced state.
>
> Yes, discard requests are best effort, and the drive is free to ignore
> some or all of a request. See [1] for more information from Martin
> Petersen.
>
> I think that if we have a device with a fast zero operation that we know
> guarantees zeroing, then the kernel's write-zeroes operation should be
> changed to use it. We shouldn't add fast-but-dangerous options to mdadm.
> 
> Thanks,
> 
> Logan
> 
> 
> [1] https://lore.kernel.org/all/yq1fsgwbijv.fsf@ca-mkp.ca.oracle.com/T/#u

Thanks for the valuable feedback. I'm not directly involved in the technical
details of this implementation, and in fact I haven't read the previous
discussion yet. You pointed out a serious problem, and I will make sure that
it is addressed.

My question was about the mdadm API, independent of the technical
implementation. I would like to propose one command that integrates the
existing method (--write-zeroes) and potentially new ones (if any other
fast-initialization capability turns out to be safe to add).

Do you see this as the right approach, or should we keep them separate?

Mariusz

