From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Jonmichael Hands <jm@chia.net>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	linux-raid@vger.kernel.org, Jes Sorensen <jes@trained-monkey.org>,
	Guoqing Jiang <guoqing.jiang@linux.dev>, Xiao Ni <xni@redhat.com>,
	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>,
	Coly Li <colyli@suse.de>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Stephen Bates <sbates@raithlin.com>,
	Martin Oliveira <Martin.Oliveira@eideticom.com>,
	David Sloan <David.Sloan@eideticom.com>
Subject: Re: [PATCH mdadm v2 0/2] Discard Option for Creating Arrays
Date: Mon, 12 Sep 2022 23:47:47 -0400
Message-ID: <yq1sfkw7yqi.fsf@ca-mkp.ca.oracle.com>
In-Reply-To: <CABdXBAP0LeQMmhSLUMZ_TmnSp5xmZ4xJBkNa7HUm7094m_x9xA@mail.gmail.com> (Jonmichael Hands's message of "Mon, 12 Sep 2022 11:01:48 -0700")
Jonmichael,
> are there capabilities of REQ_OP_WRITE_ZEROES for detection of NVMe
> DLFEAT in the identify namespace information? The purpose of this
> capability is for operating systems to detect it, precisely for use
> cases like we have identified where deterministic read zero is
> required to save a tremendous amount of time and NAND endurance.
I don't believe DEAC/DLFEAT are currently wired up in the NVMe driver
but it would be trivial to match what SCSI does in that department.
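
For reference, a minimal sketch of what decoding DLFEAT from userspace
could look like. It assumes DLFEAT is byte 33 of the 4096-byte Identify
Namespace data, with bits 2:0 giving the read behavior of deallocated
blocks, bit 3 indicating Deallocate support in Write Zeroes, and bit 4
the guard/CRC behavior; check the layout against the NVMe spec revision
you care about. The names decode_dlfeat and deallocated_reads_zero are
made up for illustration and are not part of any driver or tool.

```python
from dataclasses import dataclass

# Assumed byte offset of DLFEAT in the Identify Namespace data structure.
DLFEAT_OFFSET = 33

# Assumed encoding of DLFEAT bits 2:0 (read behavior of deallocated blocks).
READ_BEHAVIOR = {
    0b000: "not reported",
    0b001: "all bytes 0x00",
    0b010: "all bytes 0xFF",
}


@dataclass(frozen=True)
class Dlfeat:
    read_behavior: str
    write_zeroes_deallocate: bool  # bit 3: Deallocate bit supported in Write Zeroes
    guard_is_crc: bool             # bit 4: guard field of deallocated blocks is CRC


def decode_dlfeat(identify_ns: bytes) -> Dlfeat:
    """Decode the DLFEAT byte from a raw Identify Namespace buffer."""
    if len(identify_ns) <= DLFEAT_OFFSET:
        raise ValueError("Identify Namespace buffer too short")
    value = identify_ns[DLFEAT_OFFSET]
    return Dlfeat(
        read_behavior=READ_BEHAVIOR.get(value & 0x07, "reserved"),
        write_zeroes_deallocate=bool(value & 0x08),
        guard_is_crc=bool(value & 0x10),
    )


def deallocated_reads_zero(identify_ns: bytes) -> bool:
    """True only if the namespace reports zeroes for deallocated blocks."""
    return decode_dlfeat(identify_ns).read_behavior == "all bytes 0x00"
```

How the buffer is obtained is left open here; a raw binary dump of the
Identify Namespace data from a tool such as nvme-cli would be one option.
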
The intent of the REQ_OP_WRITE_ZEROES interface is to provide the choice
between deallocate semantics (think discard) and allocate semantics
(think write same) for zeroing. See the BLKDEV_ZERO_NOUNMAP flag for
more info.
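
From userspace, the same choice can be sketched with fallocate(2). This
is a rough illustration, assuming 64-bit Linux and assuming that, on a
block device, FALLOC_FL_ZERO_RANGE requests zeroing without unmapping
(the BLKDEV_ZERO_NOUNMAP case) while FALLOC_FL_PUNCH_HOLE together with
FALLOC_FL_KEEP_SIZE permits the deallocating variant. The helper names
zeroing_mode and zero_range are hypothetical.

```python
import ctypes
import os

# Mode bits as defined in <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02
FALLOC_FL_ZERO_RANGE = 0x10


def zeroing_mode(allow_deallocate: bool) -> int:
    """Pick a fallocate(2) mode: deallocate semantics or allocate semantics."""
    if allow_deallocate:
        return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    return FALLOC_FL_ZERO_RANGE


def zero_range(fd: int, offset: int, length: int, allow_deallocate: bool) -> None:
    """Zero [offset, offset + length) on fd, raising OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    # off_t is assumed to be 64 bits wide here.
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    if libc.fallocate(fd, zeroing_mode(allow_deallocate), offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Either mode is expected to leave the range reading back as zeroes; the
difference is only whether the blocks may end up deallocated.
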
The important distinction between REQ_OP_DISCARD and REQ_OP_WRITE_ZEROES
is that the latter is a data integrity operation that produces
deterministic results, i.e. it guarantees that all blocks will return
zeroes on subsequent reads. REQ_OP_DISCARD, by contrast, is a hint: the
device can, and often will, skip portions of the request sent.
It was a mistake to conflate deallocation and zeroing in our initial
implementation of discards in Linux. We have painstakingly removed that
conflation and now provide two distinct interfaces. REQ_OP_DISCARD tells
a device that a block range is no longer in use and that we don't care
about block contents for future reads. REQ_OP_WRITE_ZEROES, on the other
hand, aims to provide an optimal interface for clearing block ranges
given the reported characteristics of a given device.
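
The reported characteristics are visible from userspace as queue limits
in sysfs. A small sketch of reading them, assuming the usual attribute
names under /sys/block/<disk>/queue; read_queue_limits and the two
supports_* helpers are invented for this example.

```python
from pathlib import Path

# Queue attributes describing discard and write-zeroes offload limits.
ATTRS = ("discard_granularity", "discard_max_bytes", "write_zeroes_max_bytes")


def read_queue_limits(disk: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the discard/write-zeroes queue limits for a disk as integers."""
    queue = Path(sysfs_root) / disk / "queue"
    limits = {}
    for name in ATTRS:
        try:
            limits[name] = int((queue / name).read_text().strip())
        except (FileNotFoundError, ValueError):
            # Treat a missing or unparsable attribute as "not supported".
            limits[name] = 0
    return limits


def supports_discard(limits: dict) -> bool:
    return limits["discard_max_bytes"] > 0


def supports_write_zeroes(limits: dict) -> bool:
    return limits["write_zeroes_max_bytes"] > 0
```

Note that a zero write_zeroes_max_bytes only says the device offers no
offloaded zeroing command; a zeroing request may still be satisfied by
writing zero-filled pages.
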
Note that I am careful about using REQ_OP_DISCARD and
REQ_OP_WRITE_ZEROES terminology to describe the block layer primitives
for deallocating and zeroing block ranges here. At the bottom of the
stack, a REQ_OP_WRITE_ZEROES operation could very well end up issuing
what people would think of as a "discard" operation (DSM TRIM, WRITE
SAME w/UNMAP) assuming the device has been identified as doing the right
thing.
Anything operating at the block device level should be using the
REQ_OP_DISCARD/REQ_OP_WRITE_ZEROES primitives (or their corresponding
ioctls or fallocate flags). And if there is a need to address how those
primitives are translated into commands for a given device, then we
should handle that in the relevant device driver.
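
As an illustration of the corresponding ioctls, here is a minimal sketch
using BLKDISCARD and BLKZEROOUT from <linux/fs.h>, both of which take a
uint64_t[2] of {start, length} in bytes. The 512-byte alignment check is
an assumption about the minimum the kernel accepts (the real requirement
depends on the device's logical block size), and pack_range,
discard_range and zeroout_range are hypothetical helper names.

```python
import fcntl
import struct

# _IO(0x12, 119) and _IO(0x12, 127) from <linux/fs.h>.
BLKDISCARD = 0x1277
BLKZEROOUT = 0x127F


def pack_range(start: int, length: int, block_size: int = 512) -> bytes:
    """Build the uint64_t[2] {start, length} argument, checking alignment."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    if start % block_size or length % block_size:
        raise ValueError("range must be aligned to the block size")
    return struct.pack("=QQ", start, length)


def discard_range(fd: int, start: int, length: int) -> None:
    """Hint that the range is unused; later reads are not defined."""
    fcntl.ioctl(fd, BLKDISCARD, pack_range(start, length))


def zeroout_range(fd: int, start: int, length: int) -> None:
    """Zero the range; later reads are expected to return zeroes."""
    fcntl.ioctl(fd, BLKZEROOUT, pack_range(start, length))
```

Both calls need a file descriptor opened for writing on a block device
and will destroy data in the given range, so try them on a scratch
device only.
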
--
Martin K. Petersen Oracle Linux Engineering