From: Hannes Reinecke <hare@suse.de>
To: John Garry <john.g.garry@oracle.com>,
axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
jejb@linux.ibm.com, martin.petersen@oracle.com,
viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com,
jack@suse.cz
Cc: djwong@kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
linux-scsi@vger.kernel.org, ojaswin@linux.ibm.com,
linux-aio@kvack.org, linux-btrfs@vger.kernel.org,
io-uring@vger.kernel.org, nilay@linux.ibm.com,
ritesh.list@gmail.com, willy@infradead.org, agk@redhat.com,
snitzer@kernel.org, mpatocka@redhat.com,
dm-devel@lists.linux.dev, Alan Adamson <alan.adamson@oracle.com>
Subject: Re: [Patch v9 10/10] nvme: Atomic write support
Date: Fri, 21 Jun 2024 08:17:21 +0200
Message-ID: <e2574365-cb5b-4376-aa8e-adf05b788337@suse.de>
In-Reply-To: <20240620125359.2684798-11-john.g.garry@oracle.com>
On 6/20/24 14:53, John Garry wrote:
> From: Alan Adamson <alan.adamson@oracle.com>
>
> Add support to set block layer request_queue atomic write limits. The
> limits will be derived from either the namespace or controller atomic
> parameters.
>
> NVMe atomic-related parameters are grouped into "normal" and "power-fail"
> (or PF) classes of parameter. For atomic write support, only PF parameters
> are of interest. The "normal" parameters are concerned with racing reads
> and writes (which also applies to PF). See NVM Command Set Specification
> Revision 1.0d section 2.1.4 for reference.
>
> Whether to use per namespace or controller atomic parameters is decided by
> NSFEAT bit 1 - see Figure 97: Identify – Identify Namespace Data
> Structure, NVM Command Set.
>
> NVMe namespaces may define an atomic boundary, whereby no atomic guarantees
> are provided for a write which straddles this per-lba space boundary. The
> block layer merging policy is such that no merges may occur in which the
> resultant request would straddle such a boundary.
>
> Unlike SCSI, NVMe specifies no granularity or alignment rules, apart from
> the atomic boundary rule. In addition, again unlike SCSI, there is no
> dedicated atomic write command - a write which adheres to the atomic size
> limit and boundary is implicitly atomic.
>
> If NSFEAT bit 1 is set, the following parameters are of interest:
> - NAWUPF (Namespace Atomic Write Unit Power Fail)
> - NABSPF (Namespace Atomic Boundary Size Power Fail)
> - NABO (Namespace Atomic Boundary Offset)
>
> and we set request_queue limits as follows:
> - atomic_write_unit_max = rounddown_pow_of_two(NAWUPF)
> - atomic_write_max_bytes = NAWUPF
> - atomic_write_boundary = NABSPF
>
> In the unlikely scenario that NABO is non-zero, atomic writes will not be
> supported at all, as dealing with this adds extra complexity. This policy
> may change in future.
>
> In all cases, atomic_write_unit_min is set to the logical block size.
>
> If NSFEAT bit 1 is unset, the following parameter is of interest:
> - AWUPF (Atomic Write Unit Power Fail)
>
> and we set request_queue limits as follows:
> - atomic_write_unit_max = rounddown_pow_of_two(AWUPF)
> - atomic_write_max_bytes = AWUPF
> - atomic_write_boundary = 0
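
The two mappings above can be sketched in plain userspace C. This is only an illustration of the rules in the commit message, not the code from the patch: the struct names, field names and derive_atomic_limits() are made up for the example, all sizes are assumed to have been converted to bytes by the caller, and the NABO check is assumed to apply only when NSFEAT bit 1 is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs, already converted to bytes by the caller. */
struct nvme_atomic_params {
	bool     nsfeat_bit1;   /* NSFEAT bit 1: use per-namespace values */
	uint32_t nawupf_bytes;  /* NAWUPF */
	uint32_t nabspf_bytes;  /* NABSPF */
	uint32_t nabo;          /* NABO; non-zero disables atomic writes here */
	uint32_t awupf_bytes;   /* AWUPF (controller) */
	uint32_t lba_size;      /* logical block size */
};

/* Hypothetical container for the resulting limits; all zero = unsupported. */
struct atomic_limits {
	uint32_t unit_min;
	uint32_t unit_max;
	uint32_t max_bytes;
	uint32_t boundary;      /* 0 = no boundary */
};

/* Largest power of two that is <= v; v must be non-zero. */
static uint32_t rounddown_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p <= v / 2)
		p *= 2;
	return p;
}

static struct atomic_limits
derive_atomic_limits(const struct nvme_atomic_params *p)
{
	struct atomic_limits lim = { 0 };
	uint32_t size, boundary;

	if (p->nsfeat_bit1) {
		/* Assumption: a non-zero NABO is only rejected in this branch. */
		if (p->nabo)
			return lim;
		size = p->nawupf_bytes;
		boundary = p->nabspf_bytes;
	} else {
		size = p->awupf_bytes;
		boundary = 0;
	}

	if (size == 0)
		return lim;

	lim.unit_min  = p->lba_size;            /* same in all cases */
	lim.unit_max  = rounddown_pow2(size);
	lim.max_bytes = size;
	lim.boundary  = boundary;
	return lim;
}
```

For instance, a namespace reporting a 24 KiB NAWUPF with a 64 KiB NABSPF would end up with a 16 KiB atomic_write_unit_max, since 24 KiB is not a power of two.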
>
> A new function, nvme_valid_atomic_write(), is also called from the
> submission path to verify that a request which has been submitted to the
> driver will actually be executed atomically. As mentioned, there is no
> dedicated NVMe atomic write command (which could error for a command which
> exceeds the controller atomic write limits).
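
The kind of check described here can be sketched as follows. This is a standalone illustration under assumed semantics, not the body of nvme_valid_atomic_write(): atomic_write_ok() and its parameters are hypothetical, offsets and lengths are in bytes, and a boundary of 0 means no boundary applies.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a write of len bytes at byte offset start fits within
 * limit_bytes and does not straddle a multiple of boundary_bytes.
 */
static bool atomic_write_ok(uint64_t start, uint32_t len,
			    uint32_t limit_bytes, uint32_t boundary_bytes)
{
	/* Too large (or empty): the device gives no atomic guarantee. */
	if (len == 0 || len > limit_bytes)
		return false;

	if (boundary_bytes) {
		uint64_t end = start + len - 1;

		/* First and last byte must sit in the same boundary window. */
		if (start / boundary_bytes != end / boundary_bytes)
			return false;
	}

	return true;
}
```

With a 64 KiB boundary, an 8 KiB write starting at offset 60 KiB is rejected because it crosses the 64 KiB mark, while the same write starting at 64 KiB passes.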
>
> Note on NABSPF:
> There seems to be some vagueness in the spec as to whether NABSPF applies
> for NSFEAT bit 1 being unset. Figure 97 does not explicitly mention NABSPF
> and how it is affected by bit 1. However, Figure 4 does say to check Figure
> 97 for info about per-namespace parameters, which NABSPF is, so it is
> implied. However, currently nvme_update_disk_info() does check namespace
> parameter NABO regardless of this bit.
>
> Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> jpg: total rewrite
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> drivers/nvme/host/core.c | 52 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 52 insertions(+)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Thread overview: 34+ messages
2024-06-20 12:53 [Patch v9 00/10] block atomic writes John Garry
2024-06-20 12:53 ` [Patch v9 01/10] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2024-06-20 14:12 ` Hannes Reinecke
2024-06-20 12:53 ` [Patch v9 02/10] block: Generalize chunk_sectors support as boundary support John Garry
2024-06-20 14:14 ` Hannes Reinecke
2024-06-20 12:53 ` [Patch v9 03/10] fs: Initial atomic write support John Garry
2024-06-21 5:56 ` Hannes Reinecke
2024-06-20 12:53 ` [Patch v9 04/10] fs: Add initial atomic write support info to statx John Garry
2024-06-21 5:57 ` Hannes Reinecke
2024-06-20 12:53 ` [Patch v9 05/10] block: Add core atomic write support John Garry
2024-06-20 19:34 ` Keith Busch
2024-06-21 6:09 ` Hannes Reinecke
2024-06-21 7:41 ` John Garry
2024-06-20 12:53 ` [Patch v9 06/10] block: Add atomic write support for statx John Garry
2024-06-20 19:46 ` Keith Busch
2024-06-21 6:10 ` Hannes Reinecke
2024-06-20 12:53 ` [Patch v9 07/10] block: Add fops atomic write support John Garry
2024-06-20 19:46 ` Keith Busch
2024-06-21 6:13 ` Hannes Reinecke
2024-06-21 12:02 ` John Garry
2024-06-21 21:23 ` Darrick J. Wong
2024-06-21 9:41 ` Kanchan Joshi
2024-06-20 12:53 ` [Patch v9 08/10] scsi: sd: Atomic " John Garry
2024-06-21 6:15 ` Hannes Reinecke
2024-06-20 12:53 ` [Patch v9 09/10] scsi: scsi_debug: " John Garry
2024-06-21 6:15 ` Hannes Reinecke
2024-06-20 12:53 ` [Patch v9 10/10] nvme: " John Garry
2024-06-20 20:36 ` Keith Busch
2024-06-21 6:17 ` Hannes Reinecke [this message]
2024-06-21 9:40 ` Kanchan Joshi
2024-06-20 21:23 ` [Patch v9 00/10] block atomic writes Jens Axboe
2024-06-21 7:59 ` John Garry
2024-06-21 14:28 ` Jens Axboe
2024-06-21 14:41 ` John Garry