From: Cornelia Huck <cohuck@redhat.com>
To: Dmitry Fomichev <dmitry.fomichev@wdc.com>,
virtio-dev@lists.oasis-open.org,
virtio-comment@lists.oasis-open.org
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>,
Stefan Hajnoczi <stefanha@gmail.com>,
Hannes Reinecke <hare@suse.de>,
Matias Bjorling <matias.bjorling@wdc.com>,
Niklas Cassel <niklas.cassel@wdc.com>,
Hans Holmberg <Hans.Holmberg@wdc.com>,
Klaus Jensen <its@irrelevant.dk>,
Sam Li <faithilikerun@gmail.com>,
Dmitry Fomichev <dmitry.fomichev@wdc.com>
Subject: Re: [virtio-dev] [PATCH v5] virtio-blk: add zoned block device specification
Date: Mon, 29 Aug 2022 16:46:50 +0200 [thread overview]
Message-ID: <87v8qbds4l.fsf@redhat.com> (raw)
In-Reply-To: <20220829005818.12532-1-dmitry.fomichev@wdc.com>
On Sun, Aug 28 2022, Dmitry Fomichev <dmitry.fomichev@wdc.com> wrote:
> Introduce support for Zoned Block Devices to virtio.
>
> Zoned Block Devices (ZBDs) aim to achieve a better capacity, latency
> and/or cost characteristics compared to commonly available block
> devices by getting the entire LBA space of the device divided to block
> regions that are much larger than the LBA size. These regions are
> called zones and they can only be written sequentially. More details
> about ZBDs can be found at
>
> https://zonedstorage.io/docs/introduction/zoned-storage .
>
> In its current form, the virtio protocol for block devices (virtio-blk)
> is not aware of ZBDs but it allows the driver to successfully scan a
> host-managed drive provided by the virtio block device. As the result,
> the host-managed drive is recognized by virtio driver as a regular,
> non-zoned drive that will operate erroneously under the most common
> write workloads. Host-aware ZBDs are currently usable, but their
> performance may not be optimal because the driver can only see them as
> non-zoned block devices.
>
> To fix this, the virtio-blk protocol needs to be extended to add the
> capabilities to convey the zone characteristics of ZBDs at the device
> side to the driver and to provide support for ZBD-specific commands -
> Report Zones, four zone operations (Open, Close, Finish and Reset) and
> (optionally) Zone Append. The proposed standard extension aims to
> define this new functionality.
>
> This patch extends the virtio-blk section of virtio specification with
> the minimum set of requirements that are necessary to support ZBDs.
> The resulting device model is a subset of the models defined in ZAC/ZBC
> and ZNS standards documents. The included functionality mirrors
> the existing Linux kernel block layer ZBD support and should be
> sufficient to handle the host-managed and host-aware HDDs that are on
> the market today as well as ZNS SSDs that are entering the market at
> the time of submission of this patch.
>
> I would like to thank the following people for their useful feedback
> and suggestions while working on the initial iterations of this patch.
>
> Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Matias Bjørling <Matias.Bjorling@wdc.com>
> Niklas Cassel <Niklas.Cassel@wdc.com>
> Hans Holmberg <Hans.Holmberg@wdc.com>
>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/143
> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
(...)
I'm commenting from a more-or-less pure spec point of view; I'll leave
any block-related details to others more familiar with the matter.
> @@ -4589,6 +4596,75 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device /
> \field{max_secure_erase_sectors} \field{secure_erase_sector_alignment} are expressed
> in 512-byte units if the VIRTIO_BLK_F_SECURE_ERASE feature bit is negotiated.
>
> +If the VIRTIO_BLK_F_ZONED feature is negotiated, then in
> +\field{virtio_blk_zoned_characteristics},
> +\begin{itemize}
> +\item \field{zone_sectors} value is expressed in 512-byte sectors.
> +\item \field{max_append_sectors} value is expressed in 512-byte sectors.
> +\item \field{write_granularity} value is expressed in bytes.
> +\end{itemize}
> +
> +The \field{model} field in \field{zoned} may have the following values:
> +VIRTIO_BLK_Z_NONE(0), VIRTIO_BLK_Z_HM(1) and VIRTIO_BLK_Z_HA(2).
I think you can drop that line, as you define the values below.
> +
> +\begin{lstlisting}
> +#define VIRTIO_BLK_Z_NONE 0
> +#define VIRTIO_BLK_Z_HM 1
> +#define VIRTIO_BLK_Z_HA 2
> +\end{lstlisting}
> +
(...)
> @@ -4701,6 +4790,30 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
> The driver MUST NOT read \field{writeback} before setting
> the FEATURES_OK \field{device status} bit.
>
> +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_ZONED feature if they are incapable
> +of supporting devices with the VIRTIO_BLK_Z_HM or VIRTIO_BLK_Z_HA zoned model.
> +
> +If the VIRTIO_BLK_F_ZONED feature is offered by the device with the
> +VIRTIO_BLK_Z_HM zone model, then
> +\begin{itemize}
> +\item the VIRTIO_BLK_F_DISCARD feature must not be offered by the driver.
s/must not/MUST NOT/
Also, s/driver/device/, I guess? And it needs to go into the device
normative section?
> +
> +\item if the driver sets both VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_ZONED
> + feature bits, the device must fail by not setting FEATURES_OK bit.
Hm, shouldn't the device fail already if the driver wants to negotiate a
feature that it had not offered in the first place? (Also, s/must/MUST/
if we decide to keep it, and needs to go into the device normative
section. The correct terminology would be "the device MUST fail setting
the FEATURES_OK bit.")
> +\end{itemize}
Is it ok for the device to offer both F_DISCARD and F_ZONED with Z_HA or
Z_NONE? In that case, is it ok for the driver to negotiate both
F_DISCARD and F_ZONED, or does it need to pick at most one of them?
> +
> +If the VIRTIO_BLK_F_ZONED feature is negotiated, then
> +\begin{itemize}
> +\item if the driver that can not support host-managed zoned devices
> + reads VIRTIO_BLK_Z_HM from the \field{model} field of \field{zoned}, the
> + driver MUST NOT set FEATURES_OK flag and instead set the FAILED bit.
[I'll skip commenting on "negotiated" vs "offered", as we need to clarify
that comprehensively in the whole document.]
Could the driver simply reject F_ZONED in that case? The device can
always fail setting FEATURES_OK if the driver does not accept a feature
that is needed. Maybe make this a "SHOULD NOT"?
> +
> +\item if the driver that can not support support zoned devices reads
> + VIRTIO_BLK_Z_HA from the \field{model} field of \field{zoned}, the driver
> + MAY handle the device as a non-zoned device. In this case, the
> + driver SHOULD ignore all other fields in \field{zoned}.
I would expect a driver that does not support zoned devices at all to
always reject F_ZONED (including older drivers). Should we mandate that
the driver does not accept F_ZONED if it cannot handle it?
> +\end{itemize}
> +
> \devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
>
> Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it
> @@ -4712,6 +4825,74 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
> The device MUST initialize padding bytes \field{unused0} and
> \field{unused1} to 0.
>
> +If the device that is being initialized is a not a zoned device, the device
> +SHOULD NOT offer the VIRTIO_BLK_F_ZONED feature.
> +
> +If the VIRTIO_BLK_F_ZONED bit is not set by the driver,
If it is not accepted, I guess?
> +\begin{itemize}
> +\item the device with the VIRTIO_BLK_Z_HA zone model SHOULD proceed with the
> + initialization while setting all zoned characteristics fields to zero.
> +
> +\item the device with the VIRTIO_BLK_Z_HM zone model MUST fail to set the
> + FEATURES_OK device status bit when the driver writes the Device Status
> + field.
> +\end{itemize}
> +
> +If the VIRTIO_BLK_F_ZONED feature is negotiated, then the \field{model} field in
> +\field{zoned} struct in the configuration space MUST be set by the device
> +\begin{itemize}
> +\item to the value of VIRTIO_BLK_Z_NONE if it operates as a drive-managed
> + zoned block device or a non-zoned block device.
> +
> +\item to the value of VIRTIO_BLK_Z_HM if it operates as a host-managed zoned
> + block device.
> +
> +\item to the value of VIRTIO_BLK_Z_HA if it operates as a host-aware zoned
> + block device.
> +\end{itemize}
> +
> +If the VIRTIO_BLK_F_ZONED feature is negotiated,
> +\begin{itemize}
> +\item the \field{zone_sectors} field of \field{zoned} MUST be set by the device
> + to the size of a single zone on the device. All zones of the device have the
> + same size indicated by \field{zone_sectors} except for the last zone that
> + MAY be smaller than all other zones. The driver can calculate the number of
> + zones on the device as
> + \begin{lstlisting}
> + nr_zones = (capacity + zone_sectors - 1) / zone_sectors;
> + \end{lstlisting}
> + and the size of the last zone as
> + \begin{lstlisting}
> + zs_last = capacity - (nr_zones - 1) * zone_sectors;
> + \end{lstlisting}
> +
> +\item The \field{max_open_zones} field of the \field{zoned} structure MUST be
> + set by the device to the maximum number of zones that can be open on the
> + device (zones in the implicit open or explicit open state). A value
> + of zero indicates that the device does not have any limit on the number of
> + open zones.
> +
> +\item The \field{max_active_zones} field of the \field{zoned} structure MUST
> + be set by the device to the maximum number zones that can be active on the
> + device (zones in the implicit open, explicit open or closed state). A value
> + of zero indicates that the device does not have any limit on the number of
> + active zones.
> +
> +\item the \field{max_append_sectors} field of \field{zoned} MUST be set by
> + the device to the maximum data size of a VIRTIO_BLK_T_ZONE_APPEND request
> + that can be successfully issued to the device. The value of this field MUST
> + NOT exceed the \field{seg_max} * \field{size_max} value. A device MAY set
> + the \field{max_append_sectors} to zero if it doesn't support
> + VIRTIO_BLK_T_ZONE_APPEND requests.
> +
> +\item the \field{write_granularity} field of \field{zoned} MUST be set by the
> + device to the offset and size alignment constraint for VIRTIO_BLK_T_OUT
> + and VIRTIO_BLK_T_ZONE_APPEND requests issued to a sequential zone of the
> + device.
> +
> +\item the device MUST initialize padding bytes \field{unused2} to 0.
> +\end{itemize}
As this depends on FEATURES_OK a lot, I guess that F_ZONED cannot be
implemented for legacy devices. Do we need to be explicit about that?
> +
> \subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
>
> Because legacy devices do not have FEATURES_OK, transitional devices
(...)
> +The zone descriptor field \field{z_type} \field{virtio_blk_zone_descriptor}
> +indicates the type of the zone. The available types are VIRTIO_BLK_ZT_CONV(1),
> +VIRTIO_BLK_ZT_SWR(2) or VIRTIO_BLK_ZT_SWP(3).
"The following types are available:" ?
> +
> +\begin{lstlisting}
> +#define VIRTIO_BLK_ZT_CONV 1
> +#define VIRTIO_BLK_ZT_SWR 2
> +#define VIRTIO_BLK_ZT_SWP 3
> +\end{lstlisting}
> +
> +Read and write operations into zones with the VIRTIO_BLK_ZT_CONV (Conventional)
> +type have the same behavior as read and write operations on a regular block
> +device. Any block in a conventional zone can be read or written at any time and
> +in any order.
> +
> +Zones with VIRTIO_BLK_ZT_SWR can be read randomly, but must be written
> +sequentially at a certain point in the zone called the Write Pointer (WP). With
> +every write, the Write Pointer is incremented by the number of sectors written.
> +
> +Zones with VIRTIO_BLK_ZT_SWP can be read randomly and should be written
> +sequentially, similarly to SWR zones. However, SWP zones can accept random write
> +operations, that is, VIRTIO_BLK_T_OUT requests with a start sector different
> +from the zone write pointer position.
> +
> +The field \field{z_state} of \field{virtio_blk_zone_descriptor} indicates the
> +state of the device zone. The available zone states are VIRTIO_BLK_ZS_NOT_WP(0),
> +VIRTIO_BLK_ZS_EMPTY(1), VIRTIO_BLK_ZS_IOPEN(2), VIRTIO_BLK_ZS_EOPEN(3),
> +VIRTIO_BLK_ZS_CLOSED(4), VIRTIO_BLK_ZS_RDONLY(13), VIRTIO_BLK_ZS_FULL(14) and
> +VIRTIO_BLK_ZS_OFFLINE(15).
"The following zone states are available:" ?
> +
> +\begin{lstlisting}
> +#define VIRTIO_BLK_ZS_NOT_WP 0
> +#define VIRTIO_BLK_ZS_EMPTY 1
> +#define VIRTIO_BLK_ZS_IOPEN 2
> +#define VIRTIO_BLK_ZS_EOPEN 3
> +#define VIRTIO_BLK_ZS_CLOSED 4
> +#define VIRTIO_BLK_ZS_RDONLY 13
> +#define VIRTIO_BLK_ZS_FULL 14
> +#define VIRTIO_BLK_ZS_OFFLINE 15
> +\end{lstlisting}
(...)
> +The device SHALL report all other error conditions related to zoned block model
s/SHALL report/reports/ ? This is outside of a normative section if I'm
not confused.
> +operation by setting the VIRTIO_BLK_S_ZONE_INVALID_CMD value in
> +\field{status} of \field{virtio_blk_req} structure.
> +
> \drivernormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
>
> A driver MUST NOT submit a request which would cause a read or write
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
next prev parent reply other threads:[~2022-08-29 14:46 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-29 0:58 [virtio-dev] [PATCH v5] virtio-blk: add zoned block device specification Dmitry Fomichev
2022-08-29 14:46 ` Cornelia Huck [this message]
2022-09-04 16:55 ` Dmitry Fomichev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v8qbds4l.fsf@redhat.com \
--to=cohuck@redhat.com \
--cc=Hans.Holmberg@wdc.com \
--cc=damien.lemoal@opensource.wdc.com \
--cc=dmitry.fomichev@wdc.com \
--cc=faithilikerun@gmail.com \
--cc=hare@suse.de \
--cc=its@irrelevant.dk \
--cc=matias.bjorling@wdc.com \
--cc=niklas.cassel@wdc.com \
--cc=stefanha@gmail.com \
--cc=virtio-comment@lists.oasis-open.org \
--cc=virtio-dev@lists.oasis-open.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox