From: Alberto Faria <afaria@redhat.com>
To: virtio-comment@lists.linux.dev, mst@redhat.com,
stefanha@redhat.com, dverkamp@chromium.org
Cc: Alberto Faria <afaria@redhat.com>
Subject: [PATCH v4] virtio-blk: Add support for "Force Unit Access" writes
Date: Wed, 12 Nov 2025 05:51:13 +0000
Message-ID: <20251112055113.62207-1-afaria@redhat.com>

Add a VIRTIO_BLK_F_REQ_FLAGS feature bit converting the current
`virtio_blk_req::reserved` field into a `flags` bit field, which can be
used to modify the behavior of an entire request. The meaning of each
bit depends on the request type.

Define a single VIRTIO_BLK_REQ_FLAG_OUT_FUA bit as signaling that a
VIRTIO_BLK_T_OUT request should be a "Force Unit Access" (FUA) write,
i.e., should become stable once the request completes. FUA writes enable
better performance compared to the alternative of waiting for a write to
complete and subsequently submitting a flush.

Also add a VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA feature bit indicating device
support for the aforementioned FUA bit.

This approach allows for future expansion to other request-level flags
and allows the same flag bit to be used for different purposes on
different request types. The VIRTIO_BLK_F_REQ_FLAGS feature bit ensures
compatibility with legacy devices/drivers that interpret the
previously-`reserved` field as a priority indicator.
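
For illustration only (this is not part of the spec change), here is a
rough driver-side sketch of how the two feature bits and the flag bit
might be used together. The feature bit numbers and the flag bit index
are the ones proposed in this patch; VIRTIO_BLK_T_OUT is the existing
spec value. The `blk_req_hdr` struct and the `has_feature()` and
`fill_write_hdr()` helpers are hypothetical names, and endianness
conversion is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values proposed in this patch. */
#define VIRTIO_BLK_F_REQ_FLAGS          18  /* feature bit */
#define VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA  19  /* feature bit */
#define VIRTIO_BLK_REQ_FLAG_OUT_FUA     0   /* bit index in flags */

/* Existing request type from the spec. */
#define VIRTIO_BLK_T_OUT                1

/*
 * Hypothetical in-memory copy of the fixed request header. On the wire
 * these fields are le32/le32/le64; byte swapping is omitted here.
 */
struct blk_req_hdr {
    uint32_t type;
    uint32_t flags;   /* formerly "reserved" */
    uint64_t sector;
};

/* Hypothetical helper: test one bit of the negotiated feature set. */
static bool has_feature(uint64_t negotiated, unsigned bit)
{
    assert(bit < 64);
    return (negotiated >> bit) & 1u;
}

/*
 * Fill a write header. The FUA bit is set only when both feature bits
 * were negotiated. Returns true if the write was marked FUA; false
 * means the caller has to fall back to write + flush if it needs the
 * data to be stable.
 */
static bool fill_write_hdr(struct blk_req_hdr *hdr, uint64_t negotiated,
                           uint64_t sector, bool want_fua)
{
    assert(hdr != NULL);

    hdr->type = VIRTIO_BLK_T_OUT;
    hdr->flags = 0;
    hdr->sector = sector;

    if (want_fua &&
        has_feature(negotiated, VIRTIO_BLK_F_REQ_FLAGS) &&
        has_feature(negotiated, VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA)) {
        hdr->flags |= UINT32_C(1) << VIRTIO_BLK_REQ_FLAG_OUT_FUA;
        return true;
    }
    return false;
}
```

A caller would check the return value and, when it is false, submit a
VIRTIO_BLK_T_FLUSH after the write completes, as it would today.
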
Signed-off-by: Alberto Faria <afaria@redhat.com>
---
v4:
- Have the semantics of each request flag depend on the request type, as
  suggested by Stefan.
- Some other smaller rewordings suggested by Stefan.

v3:
- Changed to a more future-proof approach somewhat similar to what was
  suggested by Stefan.
- Included a brief rationale for the introduction of FUA write requests,
  as suggested by Michael.

v2:
- Redefine VIRTIO_BLK_T_OUT_FUA to 27 since 15 is already in use.
- Clarify that the cache mode has no impact on VIRTIO_BLK_T_OUT_FUA
  semantics.
- Allow drivers to negotiate VIRTIO_BLK_F_OUT_FUA even if they are
  incapable of sending VIRTIO_BLK_T_OUT_FUA commands.
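
Illustration (not part of the patch): a rough device-side sketch of how
the `flags` handling and the new stable-write condition in the patch
below might be evaluated. It assumes the device keeps the negotiated
features in a 64-bit mask and tracks whether `writeback` stayed 0 for
the lifetime of the write. `feature()`, `write_is_fua()` and
`stable_on_completion()` are hypothetical names. Only the conditions
visible in the hunks below are modeled; the first item of the spec's
list and the flush case are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Existing spec values. */
#define VIRTIO_BLK_F_CONFIG_WCE         11
#define VIRTIO_BLK_T_OUT                1

/* Values proposed in this patch. */
#define VIRTIO_BLK_F_REQ_FLAGS          18
#define VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA  19
#define VIRTIO_BLK_REQ_FLAG_OUT_FUA     0   /* bit index in flags */

/* Hypothetical helper: test one bit of the negotiated feature set. */
static bool feature(uint64_t negotiated, unsigned bit)
{
    assert(bit < 64);
    return (negotiated >> bit) & 1u;
}

/*
 * flags is ignored unless VIRTIO_BLK_F_REQ_FLAGS was negotiated, and
 * bit 0 means FUA only for VIRTIO_BLK_T_OUT requests.
 */
static bool write_is_fua(uint64_t negotiated, uint32_t type, uint32_t flags)
{
    if (!feature(negotiated, VIRTIO_BLK_F_REQ_FLAGS))
        return false;
    if (type != VIRTIO_BLK_T_OUT)
        return false;
    if (!feature(negotiated, VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA))
        return false;
    return (flags >> VIRTIO_BLK_REQ_FLAG_OUT_FUA) & 1u;
}

/*
 * True if the write has to be stable when its completion is reported:
 * either it is a FUA write (independent of the cache mode), or
 * VIRTIO_BLK_F_CONFIG_WCE was negotiated and writeback stayed 0 from
 * submission to completion.
 */
static bool stable_on_completion(uint64_t negotiated, uint32_t type,
                                 uint32_t flags, bool writeback_was_zero)
{
    bool wce = feature(negotiated, VIRTIO_BLK_F_CONFIG_WCE);

    return write_is_fua(negotiated, type, flags) ||
           (wce && writeback_was_zero);
}
```

Note that a set FUA bit has no effect in this sketch when
VIRTIO_BLK_F_REQ_FLAGS was not negotiated, which matches the "ignored by
the device" wording for the field.
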
device-types/blk/description.tex | 43 +++++++++++++++++++++++++++++---
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex
index 2712ada..3b3a4e7 100644
--- a/device-types/blk/description.tex
+++ b/device-types/blk/description.tex
@@ -66,6 +66,13 @@ \subsection{Feature bits}\label{sec:Device Types / Block Device / Feature bits}
(ZNS). For brevity, these standard documents are referred as "ZBD standards"
from this point on in the text.
+\item[VIRTIO_BLK_F_REQ_FLAGS (18)] Device can interpret the \field{flags}
+ bitfield in the \field{virtio_blk_req} structure.
+
+\item[VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA (19)] Device supports the
+ VIRTIO_BLK_REQ_FLAG_OUT_FUA flag in the \field{flags} bitfield of the
+ \field{virtio_blk_req} structure for VIRTIO_BLK_T_OUT requests.
+
\end{description}
\subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
@@ -317,6 +324,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
driver SHOULD ignore all other fields in \field{zoned}.
\end{itemize}
+The driver MUST NOT negotiate VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA without
+VIRTIO_BLK_F_REQ_FLAGS.
+
\devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it
@@ -402,6 +412,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
\item the device MUST initialize padding bytes \field{unused2} to 0.
\end{itemize}
+The device MUST NOT acknowledge FEATURES_OK if the driver sets
+VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA without VIRTIO_BLK_F_REQ_FLAGS.
+
\subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
Because legacy devices do not have FEATURES_OK, transitional devices
@@ -434,7 +447,7 @@ \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Ope
\begin{lstlisting}
struct virtio_blk_req {
le32 type;
- le32 reserved;
+ le32 flags;
le64 sector;
u8 data[];
u8 status;
@@ -459,6 +472,16 @@ \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Ope
#define VIRTIO_BLK_T_SECURE_ERASE 14
\end{lstlisting}
+The \field{flags} bitfield is ignored by the device unless
+VIRTIO_BLK_F_REQ_FLAGS is negotiated, in which case each bit's meaning depends
+on the request type. The following flags are currently defined (the numeric
+value is the bit index in the \field{flags} bitfield):
+
+\begin{description}
+\item[VIRTIO_BLK_REQ_FLAG_OUT_FUA (0) for VIRTIO_BLK_T_OUT requests] Force Unit
+ Access (FUA) flag.
+\end{description}
+
The \field{sector} number indicates the offset (multiplied by 512) where
the read or write is to occur. This field is unused and set to 0 for
commands other than read, write and some zone operations.
@@ -873,6 +896,11 @@ \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Ope
A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
+If VIRTIO_BLK_F_REQ_FLAGS is negotiated, a driver MUST NOT set a bit in
+\field{flags} (e.g., VIRTIO_BLK_REQ_FLAG_OUT_FUA) unless the corresponding
+feature (e.g., VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA) for the request type in question
+(e.g., VIRTIO_BLK_T_OUT) is negotiated.
+
A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request.
A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
@@ -1000,14 +1028,21 @@ \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Ope
\field{writeback} field in configuration space was 0 \textbf{all the time between
the submission of the write and its completion};
-\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is
+\item\label{item:flush3} the VIRTIO_BLK_F_REQ_FLAGS_OUT_FUA feature was
+ negotiated and the VIRTIO_BLK_REQ_FLAG_OUT_FUA bit in \field{flags} was set in
+ the write request (regardless of whether the VIRTIO_BLK_F_FLUSH or
+ VIRTIO_BLK_F_CONFIG_WCE features were negotiated, and regardless of the
+ current cache mode as expressed by the value of the \field{writeback} field in
+ configuration space).
+
+\item\label{item:flush4} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is
completed} and is completed itself.
\end{enumerate}
If the device is backed by persistent storage, the device MUST ensure that
stable writes are committed to it, before reporting completion of the write
-(cases~\ref{item:flush1} and~\ref{item:flush2}) or the flush
-(case~\ref{item:flush3}). Failure to do so can cause data loss
+(cases~\ref{item:flush1}, \ref{item:flush2} and~\ref{item:flush3}) or the flush
+(case~\ref{item:flush4}). Failure to do so can cause data loss
in case of a crash.
If the driver changes \field{writeback} between the submission of the write
--
2.51.1