From: Logan Gunthorpe <logang@deltatee.com>
To: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Cc: linux-raid@vger.kernel.org, Jes Sorensen <jes@trained-monkey.org>,
Guoqing Jiang <guoqing.jiang@linux.dev>, Xiao Ni <xni@redhat.com>,
Coly Li <colyli@suse.de>,
Chaitanya Kulkarni <chaitanyak@nvidia.com>,
Jonmichael Hands <jm@chia.net>,
Stephen Bates <sbates@raithlin.com>,
Martin Oliveira <Martin.Oliveira@eideticom.com>,
David Sloan <David.Sloan@eideticom.com>
Subject: Re: [PATCH mdadm v2 1/2] mdadm: Add --discard option for Create
Date: Fri, 9 Sep 2022 09:47:21 -0600
Message-ID: <6dd46583-05ef-12e7-8a37-b732cbe79f23@deltatee.com>
In-Reply-To: <20220909115749.00007431@linux.intel.com>
On 2022-09-09 03:57, Mariusz Tkaczyk wrote:
>> If all the discard requests are successful and there are no missing
>> disks then it is safe to set assume_clean as we know the array is clean.
>
> Please update the message. We agreed in v1 that missing disks and the discard
> feature are not related, right?
Oops, yes, I'll update the commit message for v3.
>> +static int discard_device(struct context *c, int fd, const char *devname,
>> + unsigned long long offset, unsigned long long size)
>
> It would be great if you could add a description.
Ok, will do for v3.
>> +{
>> + uint64_t range[2] = {offset, size};
> Probably you don't need to specify [2] but it is not an issue I think.
>
>> + unsigned long buf[4096 / sizeof(unsigned long)];
>
> Can you use any define for 4096?
I don't see any appropriate defines in the code base. It really just
needs to be bigger than any O_DIRECT restrictions. 4096 bytes is usually
the worst case.
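
For illustration only, a minimal sketch of how the bare 4096 could be given a name. DISCARD_READBACK_SIZE, DISCARD_READBACK_WORDS and discard_readback_bytes() are hypothetical names that do not exist in mdadm; the 4096-byte value is taken from the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical define: bytes read back after a discard (4096 in the patch). */
#define DISCARD_READBACK_SIZE 4096

/* Number of unsigned long words needed to cover the readback buffer. */
#define DISCARD_READBACK_WORDS (DISCARD_READBACK_SIZE / sizeof(unsigned long))

/* The size must divide evenly into words so a word-wise scan covers every byte. */
static_assert(DISCARD_READBACK_SIZE % sizeof(unsigned long) == 0,
	      "readback size must be a multiple of the word size");

/* Return the size in bytes of a buffer declared with the word count above. */
size_t discard_readback_bytes(void)
{
	unsigned long buf[DISCARD_READBACK_WORDS];

	return sizeof(buf);
}
```

The buffer declaration then reads `unsigned long buf[DISCARD_READBACK_WORDS];` and the pread() length stays `sizeof(buf)`.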
>> + unsigned long i;
>> +
>> + if (c->verbose)
>> + printf("discarding data from %lld to %lld on: %s\n",
>> + offset, size, devname);
>> +
>> + if (ioctl(fd, BLKDISCARD, &range)) {
>> + pr_err("discard failed on '%s': %m\n", devname);
>> + return 1;
>> + }
>> +
>> + if (pread(fd, buf, sizeof(buf), offset) != sizeof(buf)) {
>> + pr_err("failed to readback '%s' after discard: %m\n", devname);
>> + return 1;
>> + }
>> +
>> + for (i = 0; i < ARRAY_SIZE(buf); i++) {
>> + if (buf[i]) {
>> + pr_err("device did not read back zeros after discard on '%s': %lx\n",
>> + devname, buf[i]);
> In the previous version I wanted to leave the message on stderr, but just move
> the data (buf[i]) to debug output, or print it only if (verbose > 0).
> I think that printing binary data in an error message is not necessary.
I added the hex because it might be informative to know what a discard
did to the device (all FFs or random data).
> BTW, I'm not sure that discard ensures the data will be all zeros. It causes the
> drive to drop all references, but that doesn't mean that the data is zeroed. Could
> you please check the documentation? Should we expect zeroes?
That's correct. I discussed this in the cover letter. That's why this
check is here. Per some of the discussion from others I still think the
best course of action is to just check what the discard did and fail if
it is non-zero. Even though many NVMe and ATA devices have the ability
to control or query the behaviour, the kernel doesn't support this and
I don't think it can be relied upon.
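
To make the readback check concrete, here is a rough sketch of the zero scan pulled out into a helper. words_all_zero() is a hypothetical name and not part of the patch; it only mirrors the loop in discard_device() above and reports the first non-zero word so the caller can decide whether to print it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if every word in buf is zero. Otherwise return false and,
 * if bad is not NULL, store the first non-zero word in *bad so the caller
 * can report what the device returned (all FFs, random data, ...).
 */
bool words_all_zero(const unsigned long *buf, size_t nwords, unsigned long *bad)
{
	size_t i;

	assert(buf != NULL);

	for (i = 0; i < nwords; i++) {
		if (buf[i]) {
			if (bad)
				*bad = buf[i];
			return false;
		}
	}

	return true;
}
```

Note that this only inspects the bytes that were actually read back, so like the patch it samples the start of the discarded region rather than proving the whole range is zero.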
>> @@ -945,6 +983,15 @@ int Create(struct supertype *st, char *mddev,
>> }
>> if (fd >= 0)
>> remove_partitions(fd);
>> +
>> + if (s->discard &&
>> + discard_device(c, fd, dv->devname,
>> + dv->data_offset << 9,
>> + s->size << 10)) {
>> + ioctl(mdfd, STOP_ARRAY, NULL);
>> + goto abort_locked;
>> + }
>> +
> Feel free to use up to 100 characters per line; it is allowed now.
> Why do we need dv->data_offset << 9 and s->size << 10 here?
> How does this apply to zoned raid0?
As I understand it the offset and size will give the bounds of the
data region on the disk. Do you not think it works for zoned raid0?
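
To spell out the shifts, here is a small sketch of the conversion. It assumes data_offset is counted in 512-byte sectors and size in KiB, which is my reading of the code rather than something stated in the patch. BLKDISCARD takes a {start, length} pair in bytes. discard_range_bytes() and the two shift macros are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed units: data_offset in 512-byte sectors, size in KiB. */
#define SECTOR_SHIFT 9	/* 1 sector = 2^9 = 512 bytes */
#define KIB_SHIFT 10	/* 1 KiB = 2^10 = 1024 bytes */

/*
 * Hypothetical helper: fill the {start, length} pair that BLKDISCARD
 * expects, both expressed in bytes.
 */
void discard_range_bytes(unsigned long long data_offset_sectors,
			 unsigned long long size_kib, uint64_t range[2])
{
	/* Guard against the shifts overflowing 64 bits. */
	assert(data_offset_sectors <= (UINT64_MAX >> SECTOR_SHIFT));
	assert(size_kib <= (UINT64_MAX >> KIB_SHIFT));

	range[0] = (uint64_t)data_offset_sectors << SECTOR_SHIFT;
	range[1] = (uint64_t)size_kib << KIB_SHIFT;
}
```

For example, a data offset of 2048 sectors becomes 1 MiB (2048 * 512 bytes), and a size of 1048576 KiB becomes 1 GiB.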
>> diff --git a/mdadm.c b/mdadm.c
>> index 972adb524dfb..049cdce1cdd2 100644
>> --- a/mdadm.c
>> +++ b/mdadm.c
>> @@ -602,6 +602,10 @@ int main(int argc, char *argv[])
>> s.assume_clean = 1;
>> continue;
>>
>> + case O(CREATE, Discard):
>> + s.discard = true;
>> + continue;
>> +
>
> I would like to set s.assume_clean=true along with discard. Then there will be
> no need to modify other conditions. If we are assuming that after discard all is
> zeros then we can skip resync, right? According to the message, it should be.
> Please add a message for the user and set assume_clean too.
Well it was my opinion that it was clearer in the code to just
explicitly include discard in the conditionals instead of making discard
also set assume-clean, but if you think otherwise I can change it for v3.
What kind of user message are you thinking is necessary here?
Logan