public inbox for linux-raid@vger.kernel.org
 help / color / mirror / Atom feed
From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: linux-raid@vger.kernel.org, Jes Sorensen <jes@trained-monkey.org>,
	Guoqing Jiang <guoqing.jiang@linux.dev>, Xiao Ni <xni@redhat.com>,
	Coly Li <colyli@suse.de>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Jonmichael Hands <jm@chia.net>,
	Stephen Bates <sbates@raithlin.com>,
	Martin Oliveira <Martin.Oliveira@eideticom.com>,
	David Sloan <David.Sloan@eideticom.com>
Subject: Re: [PATCH mdadm v2 1/2] mdadm: Add --discard option for Create
Date: Wed, 14 Sep 2022 14:01:06 +0200	[thread overview]
Message-ID: <20220914140106.00000b36@linux.intel.com> (raw)
In-Reply-To: <856072cb-ebdf-52fe-1a13-857763077bca@deltatee.com>

On Tue, 13 Sep 2022 09:43:52 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> On 2022-09-13 01:35, Mariusz Tkaczyk wrote:
> > On Fri, 9 Sep 2022 09:47:21 -0600
> > Logan Gunthorpe <logang@deltatee.com> wrote:  
> >>>> +{
> >>>> +	uint64_t range[2] = {offset, size};    
> >>> Probably you don't need to specify [2] but it is not an issue I think.
> >>>     
> >>>> +	unsigned long buf[4096 / sizeof(unsigned long)];    
> >>>
> >>> Can you use any define for 4096?     
> >>
> >> I don't see any appropriate defines in the code base. It really just
> >> needs to be bigger than any O_DIRECT restrictions. 4096 bytes is usually
> >> the worst case.  
> > 
> > See comment bellow.  
> 
> I don't see how the comment below relates to this at all. This 4k has
> nothing to do with anything related to the array, it wos only a
> convenient size to read and check the result. But per Martin's points,
> this code will go away in v3 seeing it's more appropriate to use
> WRITE_ZEROS and that interface guarantees that there will be zeros, thus
> no need to check.

I suggested to skip verification at all in next comment but as you said with
WRITE_ZEROS it is not needed anyway. Sorry for being inaccurate.

> 
> 
> >> That's correct. I discussed this in the cover letter. That's why this
> >> check is here. Per some of the discussion from others I still think the
> >> best course of action is to just check what the discard did and fail if
> >> it is non-zero. Even though many NVMe and ATA devices have the ability
> >> to control or query the behaviour, the kernel doesn't support this and
> >> I don't think it can be relied upon.  
> > 
> > It is also controversial approach[1].
> > 
> > I think that the best we can do here is to warn user, for example:
> > "Please ensure that drive you used return zeros after discard."
> > We should ask for confirmation (it should be possible to skip with --run).
> > I would like to leave discard feature validation on user side, it is not
> > mdadm task. Simply, if you want to use it, then it is done on your own risk
> > and hopefully you know what you want to achieve.
> > That will simplify implementation on mdadm side. What do you think?  
> 
> I think the better approach is to just use WRITE_ZEROS.

Agree.

> 
> >>>> @@ -945,6 +983,15 @@ int Create(struct supertype *st, char *mddev,
> >>>>  				}
> >>>>  				if (fd >= 0)
> >>>>  					remove_partitions(fd);
> >>>> +
> >>>> +				if (s->discard &&
> >>>> +				    discard_device(c, fd, dv->devname,
> >>>> +						   dv->data_offset << 9,
> >>>> +						   s->size << 10)) {
> >>>> +					ioctl(mdfd, STOP_ARRAY, NULL);
> >>>> +					goto abort_locked;
> >>>> +				}
> >>>> +    
> >>> Feel free to use up to 100 char in one line it is allowed now.
> >>> Why we need dv->data_offset << 9 and  s->size << 10 here?
> >>> How this applies to zoned raid0?    
> >>
> >> As I understand it the offset and size will give the bounds of the
> >> data region on the disk. Do you not think it works for zoned raid0?  
> > 
> > mdadm operates on 512B, so using 4K data regions could be destructive.
> > Also left shift causes that size value is increasing. We can't clear more
> > that user requested. We need to use 512b sectors as mdadm does.  
> 
> I don't really follow this.

I understand that you want left shit is used to round size to data region
and I assumed that data_region is 4K and that is probably wrong.
You are right I has no sense, my apologizes.

Let's imagine that our size is for example, 2687 sectors. Left shit will
cause that we will get 2751488 and that will be passed as a size to function.
Similar for data_offset. That is much more than we want to clear.
Do I miss something? I guess that ioctl operates on sectors too but please
correct me if that is wrong.

> 
> > I don't know how native raid0 size is passed and how zones are created but I
> > suspect that s->size may not be correct for all drives. It it a global
> > property but for zoned raid member size could be different. You need to
> > check how it applies.  
> 
> > Also, I'm not sure if this feature is needed for raid0, because there is no
> > resync. Maybe we can exclude raid0 from discard?  
> 
> I'll check raid0 size. If possible I'd rather not have the restriction
> to avoid raid0 as it becomes complicated and users may have reason to
> zero besides avoiding resync.

Agree, thanks.

> >>
> >> Well it was my opinion that it was clearer in the code to just
> >> explicitly include discard in the conditionals instead of making discard
> >> also set assume-clean, but if you think otherwise I can change it for v3.
> >>
> >> What kind of user message are you thinking is necessary here?  
> > 
> > In my convention, all shape attributes should be set in mdadm.c, later they
> > should be considered as const, we should not overwrite them. This structure
> > represents user settings.
> > This is why I consider updating assume_clean in Create.c as wrong. When
> > discard is set then we are assuming that user want to skip resync too,
> > otherwise it doesn't have sense. Also, if discard of any drive fails we are
> > returning error and create operation will fail anyway.  
> 
> The v2 version of this patch did not modify the shape attributes in
> Create.c; that was only in v1.

Ok, I missed that, thanks.
> 
> > I would expected something like: "Discard requested, setting --assume-clean
> > to skip resync".
> > Also, will be great if you can add some tests, at least for command-line.  
> 
> Ok.
> 

Mariusz

  reply	other threads:[~2022-09-14 12:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-08 23:08 [PATCH mdadm v2 0/2] Discard Option for Creating Arrays Logan Gunthorpe
2022-09-08 23:08 ` [PATCH mdadm v2 1/2] mdadm: Add --discard option for Create Logan Gunthorpe
2022-09-09  9:57   ` Mariusz Tkaczyk
2022-09-09 11:54     ` Roman Mamedov
     [not found]       ` <CABdXBANrJNWjq4237k9DPRoxLVmiAUoKMZxaaLUrcMHsODwvmA@mail.gmail.com>
2022-09-09 15:31         ` Roman Mamedov
2022-09-12 17:43       ` Martin K. Petersen
2022-09-09 15:47     ` Logan Gunthorpe
2022-09-13  7:35       ` Mariusz Tkaczyk
2022-09-13 15:43         ` Logan Gunthorpe
2022-09-14 12:01           ` Mariusz Tkaczyk [this message]
2022-09-14 16:29             ` Logan Gunthorpe
2022-09-14 17:39               ` Mariusz Tkaczyk
2022-09-19  8:41   ` Xiao Ni
2022-09-21 18:45     ` Logan Gunthorpe
2022-09-08 23:08 ` [PATCH mdadm v2 2/2] manpage: Add --discard option to manpage Logan Gunthorpe
2022-09-12 17:40 ` [PATCH mdadm v2 0/2] Discard Option for Creating Arrays Martin K. Petersen
     [not found]   ` <CABdXBAP0LeQMmhSLUMZ_TmnSp5xmZ4xJBkNa7HUm7094m_x9xA@mail.gmail.com>
2022-09-13  3:47     ` Martin K. Petersen
2022-09-13 15:38   ` Logan Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220914140106.00000b36@linux.intel.com \
    --to=mariusz.tkaczyk@linux.intel.com \
    --cc=David.Sloan@eideticom.com \
    --cc=Martin.Oliveira@eideticom.com \
    --cc=chaitanyak@nvidia.com \
    --cc=colyli@suse.de \
    --cc=guoqing.jiang@linux.dev \
    --cc=jes@trained-monkey.org \
    --cc=jm@chia.net \
    --cc=linux-raid@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=sbates@raithlin.com \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox