public inbox for linux-raid@vger.kernel.org
From: Logan Gunthorpe <logang@deltatee.com>
To: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Cc: linux-raid@vger.kernel.org, Jes Sorensen <jes@trained-monkey.org>,
	Guoqing Jiang <guoqing.jiang@linux.dev>, Xiao Ni <xni@redhat.com>,
	Coly Li <colyli@suse.de>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Jonmichael Hands <jm@chia.net>,
	Stephen Bates <sbates@raithlin.com>,
	Martin Oliveira <Martin.Oliveira@eideticom.com>,
	David Sloan <David.Sloan@eideticom.com>
Subject: Re: [PATCH mdadm v2 1/2] mdadm: Add --discard option for Create
Date: Fri, 9 Sep 2022 09:47:21 -0600
Message-ID: <6dd46583-05ef-12e7-8a37-b732cbe79f23@deltatee.com>
In-Reply-To: <20220909115749.00007431@linux.intel.com>



On 2022-09-09 03:57, Mariusz Tkaczyk wrote:
>> If all the discard requests are successful and there are no missing
>> disks then it is safe to set assume_clean as we know the array is clean.
> 
> Please update the message. We agreed in v1 that missing disks and discard features
> are not related, right?

Oops, yes, I'll update the commit message for v3.


>> +static int discard_device(struct context *c, int fd, const char *devname,
>> +			  unsigned long long offset, unsigned long long size)
> 
> It would be great if you could add a description.

Ok, will do for v3.

>> +{
>> +	uint64_t range[2] = {offset, size};
> Probably you don't need to specify [2] but it is not an issue I think.
> 
>> +	unsigned long buf[4096 / sizeof(unsigned long)];
> 
> Can you use a define for 4096?

I don't see any appropriate defines in the code base. It really just
needs to be at least as large as any O_DIRECT size restriction; 4096
bytes is usually the worst case.
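
A minimal sketch of the sizing point above, assuming the readback is done
on a descriptor opened with O_DIRECT: allocate a buffer whose address and
length are both 4096 bytes, which should satisfy the usual alignment
constraints. READBACK_SIZE and alloc_readback_buf() are hypothetical names
used only for illustration; they are not part of the patch or of mdadm.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign() */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical name; 4096 matches the literal used in the patch. */
#define READBACK_SIZE 4096

/*
 * Return a READBACK_SIZE-byte buffer aligned to READBACK_SIZE,
 * or NULL on allocation failure. The caller releases it with free().
 */
void *alloc_readback_buf(void)
{
	void *buf = NULL;

	if (posix_memalign(&buf, READBACK_SIZE, READBACK_SIZE) != 0)
		return NULL;
	return buf;
}
```

A caller would pass the returned pointer to pread() with a length of
READBACK_SIZE and free() it afterwards.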

>> +	unsigned long i;
>> +
>> +	if (c->verbose)
>> +		printf("discarding data from %lld to %lld on: %s\n",
>> +		       offset, size, devname);
>> +
>> +	if (ioctl(fd, BLKDISCARD, &range)) {
>> +		pr_err("discard failed on '%s': %m\n", devname);
>> +		return 1;
>> +	}
>> +
>> +	if (pread(fd, buf, sizeof(buf), offset) != sizeof(buf)) {
>> +		pr_err("failed to readback '%s' after discard: %m\n", devname);
>> +		return 1;
>> +	}
>> +
>> +	for (i = 0; i < ARRAY_SIZE(buf); i++) {
>> +		if (buf[i]) {
>> +			pr_err("device did not read back zeros after discard on '%s': %lx\n",
>> +			       devname, buf[i]);
> In the previous version I wanted to leave the message on stderr, but just move the
> data (buf[i]) to debug, or print it only if (verbose > 0).
> I think that printing binary data in an error message is not necessary.

I added the hex because it might be informative to know what a discard
did to the device (all FFs or random data).

> BTW, I'm not sure that discard ensures the data will be all zeros. It causes the
> drive to drop all references, but that doesn't mean the data is zeroed. Could you
> please check the documentation? Should we expect zeroes?

That's correct. I discussed this in the cover letter. That's why this
check is here. Per some of the discussion from others I still think the
best course of action is to just check what the discard did and fail if
it is non-zero. Even though many NVMe and ATA devices have the ability
to control or query the behaviour, the kernel doesn't support this and
I don't think it can be relied upon.


>> @@ -945,6 +983,15 @@ int Create(struct supertype *st, char *mddev,
>>  				}
>>  				if (fd >= 0)
>>  					remove_partitions(fd);
>> +
>> +				if (s->discard &&
>> +				    discard_device(c, fd, dv->devname,
>> +						   dv->data_offset << 9,
>> +						   s->size << 10)) {
>> +					ioctl(mdfd, STOP_ARRAY, NULL);
>> +					goto abort_locked;
>> +				}
>> +
> Feel free to use up to 100 chars in one line; it is allowed now.
> Why do we need dv->data_offset << 9 and s->size << 10 here?
> How does this apply to zoned raid0?

As I understand it, the offset and size give the bounds of the
data region on the disk. Do you think it doesn't work for zoned raid0?
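
For reference, a small sketch of what the two shifts compute, assuming
data_offset is counted in 512-byte sectors and size in KiB (that is a
reading of the shifts, not something stated in this message). The helper
names are hypothetical and do not exist in mdadm.

```c
#include <stdint.h>

/* Sectors are assumed to be 512 bytes, so sectors << 9 gives bytes. */
uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors << 9;
}

/* Size is assumed to be in KiB, so kib << 10 gives bytes. */
uint64_t kib_to_bytes(uint64_t kib)
{
	return kib << 10;
}
```

Under those assumptions, a data offset of 2048 sectors and a size of
1024 KiB both come out to 1048576 bytes, so the discard range would start
1 MiB into the device and cover 1 MiB.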

>> diff --git a/mdadm.c b/mdadm.c
>> index 972adb524dfb..049cdce1cdd2 100644
>> --- a/mdadm.c
>> +++ b/mdadm.c
>> @@ -602,6 +602,10 @@ int main(int argc, char *argv[])
>>  			s.assume_clean = 1;
>>  			continue;
>>  
>> +		case O(CREATE, Discard):
>> +			s.discard = true;
>> +			continue;
>> +
> 
> I would like to set s.assume_clean=true along with discard. Then there will be no need
> to modify other conditions. If we are assuming that after discard everything is zeros,
> then we can skip resync, right? According to the message, it should be.
> Please add a message for the user and set assume_clean too.


Well, in my opinion it was clearer in the code to just explicitly
include discard in the conditionals instead of making discard also set
assume-clean, but if you think otherwise I can change it for v3.
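
To make the two alternatives concrete, a rough sketch follows. The struct
and function names are hypothetical stand-ins for illustration only; they
are not mdadm's real types, and the actual conditionals in the patch may
look different.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two flags under discussion. */
struct create_flags {
	bool assume_clean;
	bool discard;
};

/* Option A: keep the flags separate and test both at the decision point. */
bool skip_resync_explicit(const struct create_flags *f)
{
	return f->assume_clean || f->discard;
}

/*
 * Option B: the discard option implies assume_clean when it is parsed,
 * so later checks only need to look at assume_clean.
 */
void set_discard_implies_clean(struct create_flags *f)
{
	f->discard = true;
	f->assume_clean = true;
}
```

Option A keeps the reason for skipping the resync visible at each check;
Option B touches fewer conditionals.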

What kind of user message are you thinking is necessary here?

Logan
