public inbox for linux-raid@vger.kernel.org
From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Jonmichael Hands <jm@chia.net>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	linux-raid@vger.kernel.org, Jes Sorensen <jes@trained-monkey.org>,
	Guoqing Jiang <guoqing.jiang@linux.dev>, Xiao Ni <xni@redhat.com>,
	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>,
	Coly Li <colyli@suse.de>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Stephen Bates <sbates@raithlin.com>,
	Martin Oliveira <Martin.Oliveira@eideticom.com>,
	David Sloan <David.Sloan@eideticom.com>
Subject: Re: [PATCH mdadm v2 0/2] Discard Option for Creating Arrays
Date: Mon, 12 Sep 2022 23:47:47 -0400
Message-ID: <yq1sfkw7yqi.fsf@ca-mkp.ca.oracle.com>
In-Reply-To: <CABdXBAP0LeQMmhSLUMZ_TmnSp5xmZ4xJBkNa7HUm7094m_x9xA@mail.gmail.com> (Jonmichael Hands's message of "Mon, 12 Sep 2022 11:01:48 -0700")


Jonmichael,

> are there capabilities of REQ_OP_WRITE_ZEROES for detection of NVMe
> DLFEAT in the identify namespace information? The purpose of this
> capability is for operating systems to detect it, precisely for use
> cases like we have identified where deterministic read zero is
> required to save a tremendous amount of time and NAND endurance.

I don't believe DEAC/DLFEAT are currently wired up in the NVMe driver
but it would be trivial to match what SCSI does in that department.
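
For reference, a minimal sketch of decoding DLFEAT from a raw Identify
Namespace buffer. The byte offset (33) and bit layout follow the NVMe
base specification as commonly documented; the struct, enum and function
names are hypothetical and this is not how the Linux NVMe driver does
it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of DLFEAT within the Identify Namespace data structure. */
#define NVME_ID_NS_DLFEAT_OFFSET 33

/* DLFEAT bits 2:0: what a read of a deallocated logical block returns. */
enum dlfeat_read {
	DLFEAT_READ_NOT_REPORTED = 0,
	DLFEAT_READ_ZEROES       = 1,	/* all bytes 00h */
	DLFEAT_READ_ONES         = 2,	/* all bytes FFh */
};

/* Hypothetical container for the decoded fields. */
struct dlfeat_info {
	enum dlfeat_read read_behavior;
	bool write_zeroes_deac;	/* bit 3: Write Zeroes honors the Deallocate bit */
	bool guard_is_crc;	/* bit 4: guard field of deallocated blocks is the CRC */
};

static inline struct dlfeat_info decode_dlfeat(uint8_t dlfeat)
{
	struct dlfeat_info info;

	info.read_behavior = (enum dlfeat_read)(dlfeat & 0x7);
	info.write_zeroes_deac = (dlfeat & (1u << 3)) != 0;
	info.guard_is_crc = (dlfeat & (1u << 4)) != 0;
	return info;
}

/*
 * Hypothetical policy check: deallocating via Write Zeroes is only a
 * safe way to zero if deallocated blocks read back as zeroes AND the
 * controller supports the Deallocate bit in Write Zeroes.
 */
static inline bool can_zero_by_deallocate(const uint8_t *id_ns)
{
	struct dlfeat_info info = decode_dlfeat(id_ns[NVME_ID_NS_DLFEAT_OFFSET]);

	return info.read_behavior == DLFEAT_READ_ZEROES && info.write_zeroes_deac;
}
```

The buffer passed to can_zero_by_deallocate() is assumed to be the full
4096-byte Identify Namespace payload, obtained by whatever means the
caller has available.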

The intent of the REQ_OP_WRITE_ZEROES interface is to provide the choice
between deallocate semantics (think discard) and allocate semantics
(think write same) for zeroing. See the BLKDEV_ZERO_NOUNMAP flag for
more info.

The important distinction between REQ_OP_DISCARD and REQ_OP_WRITE_ZEROES
is that the latter is a data integrity operation that produces
deterministic results, i.e. it guarantees that all blocks will return
zeroes on subsequent reads. REQ_OP_DISCARD, by contrast, is a hint: a
device can, and often will, skip portions of the request sent.
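
From userspace the same split is visible in the block device ioctls. A
minimal sketch, assuming the BLKDISCARD and BLKZEROOUT ioctls from
<linux/fs.h>, both of which take a { start, length } pair in bytes. The
wrapper names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKDISCARD, BLKZEROOUT */

/* Issue a byte-range ioctl; returns 0 on success or -errno on failure. */
static inline int blk_range_op(int fd, unsigned long request,
			       uint64_t start, uint64_t len)
{
	uint64_t range[2] = { start, len };

	return ioctl(fd, request, range) == 0 ? 0 : -errno;
}

/* Hint only: contents of the range are undefined afterwards. */
static inline int blk_discard(int fd, uint64_t start, uint64_t len)
{
	return blk_range_op(fd, BLKDISCARD, start, len);
}

/* Data integrity operation: the range reads back as zeroes afterwards. */
static inline int blk_zeroout(int fd, uint64_t start, uint64_t len)
{
	return blk_range_op(fd, BLKZEROOUT, start, len);
}
```

Both calls destroy data in the given range, so fd should only ever refer
to a scratch device. A caller that needs zeroes on a later read must use
blk_zeroout(); a successful blk_discard() says nothing about what a
later read returns.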

It was a mistake to conflate deallocation and zeroing in our initial
implementation of discards in Linux. We have painstakingly removed that
and now provide two distinct interfaces. REQ_OP_DISCARD tells a device
that a block range is no longer in use and that we don't care about
block contents for future reads. REQ_OP_WRITE_ZEROES aims to provide an
optimal interface for clearing block ranges given the reported
characteristics of a given device.
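
The limits the block layer has derived for a device can be inspected
through sysfs. A minimal sketch, assuming the discard_max_bytes and
write_zeroes_max_bytes attributes under /sys/block/<disk>/queue/; the
helper name is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read a numeric queue attribute for a disk, e.g.
 * /sys/block/nvme0n1/queue/write_zeroes_max_bytes.
 * Returns 0 and stores the value on success, -1 on any failure.
 */
static inline int read_queue_limit(const char *disk, const char *attr,
				   uint64_t *value)
{
	char path[256];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = fscanf(f, "%" SCNu64, value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A value of 0 in discard_max_bytes or write_zeroes_max_bytes is generally
taken to mean the corresponding operation is not offloaded to the
device; for write zeroes the block layer may still fall back to writing
zero pages.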

Note that I am careful about using REQ_OP_DISCARD and
REQ_OP_WRITE_ZEROES terminology to describe the block layer primitives
for deallocating and zeroing block ranges here. At the bottom of the
stack, a REQ_OP_WRITE_ZEROES operation could very well end up issuing
what people would think of as a "discard" operation (DSM TRIM, WRITE
SAME w/UNMAP) assuming the device has been identified as doing the right
thing.

Anything operating at the block device level should be using the
REQ_OP_DISCARD/REQ_OP_WRITE_ZEROES primitives (or their corresponding
ioctls or fallocate flags). And if there is a need to address how those
primitives are translated into commands for a given device, then we
should handle that in the relevant device driver.
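
For the fallocate route, a minimal sketch of how the three intents map
to fallocate(2) mode flags on a block device. The mapping reflects the
block device fallocate handler (block/fops.c) as understood at the time
of this thread and should be checked against the kernel in use; the enum
and function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for fallocate() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/* Hypothetical names for what the caller wants done to a block range. */
enum range_intent {
	RANGE_DISCARD,		/* hint only, contents undefined afterwards */
	RANGE_ZERO_DEALLOCATE,	/* zero, letting the device unmap; no fallback */
	RANGE_ZERO_ALLOCATE,	/* zero, keeping blocks provisioned (NOUNMAP) */
};

/* Returns the fallocate() mode for an intent, or -1 if unknown. */
static inline int fallocate_mode_for(enum range_intent intent)
{
	switch (intent) {
	case RANGE_DISCARD:
		return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE |
		       FALLOC_FL_NO_HIDE_STALE;
	case RANGE_ZERO_DEALLOCATE:
		return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
	case RANGE_ZERO_ALLOCATE:
		return FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE;
	}
	return -1;
}

/* Apply an intent to [offset, offset + len); returns 0 or -errno. */
static inline int blk_range_fallocate(int fd, enum range_intent intent,
				      off_t offset, off_t len)
{
	int mode = fallocate_mode_for(intent);

	if (mode < 0)
		return -EINVAL;
	return fallocate(fd, mode, offset, len) == 0 ? 0 : -errno;
}
```

Keeping the intent separate from the flag combination means the caller
states what it needs (a hint versus guaranteed zeroes) and leaves the
choice of device command to the block layer and the driver.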

-- 
Martin K. Petersen	Oracle Linux Engineering
