Discussion of the implementations of VIRTIO specification
 help / color / mirror / Atom feed
* [PATCH] virtio-blk: Define dev cfg layout before its fields
From: Parav Pandit @ 2023-02-23 13:52 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck
  Cc: virtio-comment, shahafs, Parav Pandit, Max Gurtovoy

Define device configuration layout structure before describing its
individual fields.

This is an editorial change.

Suggested-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 device-types/blk/description.tex | 95 ++++++++++++++++----------------
 1 file changed, 48 insertions(+), 47 deletions(-)

diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex
index 20007e3..517b012 100644
--- a/device-types/blk/description.tex
+++ b/device-types/blk/description.tex
@@ -83,6 +83,54 @@ \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block De
 
 \subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout}
 
+The block device has the following device configuration layout.
+
+\begin{lstlisting}
+struct virtio_blk_config {
+        le64 capacity;
+        le32 size_max;
+        le32 seg_max;
+        struct virtio_blk_geometry {
+                le16 cylinders;
+                u8 heads;
+                u8 sectors;
+        } geometry;
+        le32 blk_size;
+        struct virtio_blk_topology {
+                // # of logical blocks per physical block (log2)
+                u8 physical_block_exp;
+                // offset of first aligned logical block
+                u8 alignment_offset;
+                // suggested minimum I/O size in blocks
+                le16 min_io_size;
+                // optimal (suggested maximum) I/O size in blocks
+                le32 opt_io_size;
+        } topology;
+        u8 writeback;
+        u8 unused0;
+        u16 num_queues;
+        le32 max_discard_sectors;
+        le32 max_discard_seg;
+        le32 discard_sector_alignment;
+        le32 max_write_zeroes_sectors;
+        le32 max_write_zeroes_seg;
+        u8 write_zeroes_may_unmap;
+        u8 unused1[3];
+        le32 max_secure_erase_sectors;
+        le32 max_secure_erase_seg;
+        le32 secure_erase_sector_alignment;
+        struct virtio_blk_zoned_characteristics {
+                le32 zone_sectors;
+                le32 max_open_zones;
+                le32 max_active_zones;
+                le32 max_append_sectors;
+                le32 write_granularity;
+                u8 model;
+                u8 unused2[3];
+        } zoned;
+};
+\end{lstlisting}
+
 The \field{capacity} of the device (expressed in 512-byte sectors) is always
 present. The availability of the others all depend on various feature
 bits as indicated above.
@@ -167,53 +215,6 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device /
 terminated by the device with a "zone resources exceeded" error as defined for
 specific commands later.
 
-\begin{lstlisting}
-struct virtio_blk_config {
-        le64 capacity;
-        le32 size_max;
-        le32 seg_max;
-        struct virtio_blk_geometry {
-                le16 cylinders;
-                u8 heads;
-                u8 sectors;
-        } geometry;
-        le32 blk_size;
-        struct virtio_blk_topology {
-                // # of logical blocks per physical block (log2)
-                u8 physical_block_exp;
-                // offset of first aligned logical block
-                u8 alignment_offset;
-                // suggested minimum I/O size in blocks
-                le16 min_io_size;
-                // optimal (suggested maximum) I/O size in blocks
-                le32 opt_io_size;
-        } topology;
-        u8 writeback;
-        u8 unused0;
-        u16 num_queues;
-        le32 max_discard_sectors;
-        le32 max_discard_seg;
-        le32 discard_sector_alignment;
-        le32 max_write_zeroes_sectors;
-        le32 max_write_zeroes_seg;
-        u8 write_zeroes_may_unmap;
-        u8 unused1[3];
-        le32 max_secure_erase_sectors;
-        le32 max_secure_erase_seg;
-        le32 secure_erase_sector_alignment;
-        struct virtio_blk_zoned_characteristics {
-                le32 zone_sectors;
-                le32 max_open_zones;
-                le32 max_active_zones;
-                le32 max_append_sectors;
-                le32 write_granularity;
-                u8 model;
-                u8 unused2[3];
-        } zoned;
-};
-\end{lstlisting}
-
-
 \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout / Legacy Interface: Device configuration layout}
 When using the legacy interface, transitional devices and drivers
 MUST format the fields in struct virtio_blk_config
-- 
2.26.2


^ permalink raw reply related

* Re: [virtio-comment] Re: [PATCH v7] virtio-net: support the virtqueue coalescing moderation
From: Michael S. Tsirkin @ 2023-02-23 13:20 UTC (permalink / raw)
  To: David Edmondson
  Cc: Heng Qi, virtio-dev, Parav Pandit, Alvaro Karsz, Jason Wang,
	Xuan Zhuo, Cornelia Huck, virtio-comment
In-Reply-To: <m27cw8d2k6.fsf@oracle.com>

On Thu, Feb 23, 2023 at 11:43:29AM +0000, David Edmondson wrote:
> 
> On Thursday, 2023-02-23 at 18:52:14 +08, Heng Qi wrote:
> > Hi, David.
> >
> > 在 2023/2/23 下午6:05, David Edmondson 写道:
> >> On Wednesday, 2023-02-22 at 22:06:32 +08, Heng Qi wrote:
> >>> Currently, coalescing parameters are grouped for all transmit and receive
> >>> virtqueues. This patch supports setting or getting the parameters for a
> >>> specified virtqueue, and a typical application of this function is netdim[1].
> >>>
> >>> When the traffic between virtqueues is unbalanced, for example, one virtqueue
> >>> is busy and another virtqueue is idle, then it will be very useful to
> >>> control coalescing parameters at the virtqueue granularity.
> >>>
> >>> [1] https://docs.kernel.org/networking/net_dim.html
> >>>
> >>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> >>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >>> ---
> >>> This patch is on top of Alvaro's latest v7 patch: https://lists.oasis-open.org/archives/virtio-dev/202302/msg00431.html .
> >>>
> >>> v6->v7:
> >>>         1. Clarify the relationship of VIRTIO_NET_CTRL_NOTF_COAL_TX/RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET. @Alvaro Karsz, @Michael S. Tsirkin
> >>>         2. Remove formula for vqn range. @Parav Pandit
> >>>         3. Some expressions are clearer. @Parav Pandit, @Michael S. Tsirkin
> >>>
> >>> v5->v6:
> >>>         1. Explain that the device may set a different value than the one passed in by the driver. @David Edmondson
> >> A couple of things about this:
> >> - why say "a value close to a power of 2" - couldn't the device pick any
> >>    value it chooses?
> >
> > This is just a hint from the spec, it is "MAY", not "MUST" in the
> > conformance of the device, the device can still set any value it
> > receives.
> 
> Okay.
> 
> > And, since "virtqueue notification coalescing" feature will be used in
> > the netdim [1] algorithm,
> > and the coalescing moderation of netdim is roughly as follows, so it
> > is allowed to give the hint in the spec:
> > "
> > #define NET_DIM_RX_EQE_PROFILES { \
> > {.usec = 1, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> > {.usec = 8, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> > {.usec = 64, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> > {.usec = 128, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> > {.usec = 256, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,} \
> > }
> >
> > #define NET_DIM_RX_CQE_PROFILES { \
> > {.usec = 2, .pkts = 256,}, \
> > {.usec = 8, .pkts = 128,}, \
> > {.usec = 16, .pkts = 64,}, \
> > {.usec = 32, .pkts = 64,}, \
> > {.usec = 64, .pkts = 64,} \
> > }
> >
> > #define NET_DIM_TX_EQE_PROFILES { \
> > {.usec = 1, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> > {.usec = 8, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> > {.usec = 32, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> > {.usec = 64, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> > {.usec = 128, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,} \
> > }
> >
> > #define NET_DIM_TX_CQE_PROFILES { \
> > {.usec = 5, .pkts = 128,}, \
> > {.usec = 8, .pkts = 64,}, \
> > {.usec = 16, .pkts = 32,}, \
> > {.usec = 32, .pkts = 32,}, \
> > {.usec = 64, .pkts = 32,} \
> > }
> > "
> > [1]  https://docs.kernel.org/networking/net_dim.html
> >
> >> - I think that we need to be more explicit that the values passed in the
> >>    SET request may not be honoured exactly.
> >
> > Yes, there are already examples in the current spec:
> > "
> > +When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL
> > class to set a coalescing parameter,
> > +it may set the parameter to a value close to a power of 2. For example:
> > +If the device receives \field{max_usecs} = 7 from the
> > VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs}
> > = 8 for a given enabled virtqueue.
> > "
> > If you find this unclear, do you need more examples or clarification,
> > or do you have a better way?
> 
> Explicit is good:
> 
> When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL class
> to set a coalescing parameter it may choose to use a value different to
> that specified in the command, for example a power of two value close to
> the specified parameter.


Hmm. This clarification actually belongs in Alvaro's patch btw, no?
Basically with the best-effort sentence.


> The value chosen by the device can be retrieved
> using the VIRTIO_NET_CTRL_NOTF_VQ_GET command.

I do however feel that maybe we should not allow any value.
Values lower than what is specified are definitely ok.


> >> - should the chosen value be returned in the SET call? (Not too fussed
> >>    about this, though it may result in an implementation immediately
> >>    calling GET after SET to see what actually happened.)
> >
> > As you said, I think we can just call GET to view.
> >
> >> - the example which shows how the global and per-VQ set operations
> >>    interact is reasonably worded ("the device responds with coalescing
> >>    parameters of virtqueue1 set by command5"), so that seems okay.
> >
> > Yeah.
> >
> > Thanks.
> >
> >>> v4->v5:
> >>>         1. Add the correspondence between virtio_net_ctrl_coal and virtio_net_ctrl_coal_vq and control commands. @Michael S. Tsirkin
> >>>         2. Add read and write attributes for each field. @Michael S. Tsirkin
> >>>         3. A clearer description of how to set coalescing parameters for vq reset. @Michael S. Tsirkin
> >>>         4. Fix some syntax errors. @Michael S. Tsirkin, @David Edmondson
> >>>
> >>> v3->v4:
> >>>         1. Include virtio_net_ctrl_coal in the virtio_net_ctrl_coal_vq structure. @Alvaro Karsz
> >>>         2. Add consideration of vq reset. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> >>>         3. Avoid too many examples by giving a comprehensive example. @Michael S. Tsirkin
> >>>         4. Fix typos and streamline clarifications. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> >>>
> >>> v2->v3:
> >>>         1. Add the netdim link. @Parav Pandit
> >>>         2. VIRTIO_NET_F_VQ_NOTF_COAL no longer depends on VIRTIO_NET_F_NOTF_COAL. @Michael S. Tsirkin, @Alvaro Karsz
> >>>         3. _VQ_GET is explained more. @Michael S. Tsirkin
> >>>         4. Add more examples to avoid misunderstandings. @Michael S. Tsirkin
> >>>         5. Clarify some statements. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> >>>         6. Adjust the virtio_net_ctrl_coal_vq structure. @Michael S. Tsirkin
> >>>         7. Fix some typos. @Michael S. Tsirkin
> >>>
> >>> v1->v2:
> >>>         1. Rename VIRTIO_NET_F_PERQUEUE_NOTF_COAL to VIRTIO_NET_F_VQ_NOTF_COAL. @Michael S. Tsirkin
> >>>         2. Use the \field{vqn} instead of the qid. @Michael S. Tsirkin
> >>>         3. Unify tx and rx control structres into one structure virtio_net_ctrl_coal_vq. @Michael S. Tsirkin
> >>>         4. Add a new control command VIRTIO_NET_CTRL_NOTF_COAL_VQ. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> >>>         5. The special value 0xFFF is removed because VIRTIO_NET_CTRL_NOTF_COAL can be used. @Alvaro Karsz
> >>>         6. Clarify some special scenarios. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> >>>
> >>>   device-types/net/description.tex | 99 ++++++++++++++++++++++++++++++--
> >>>   1 file changed, 94 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> >>> index e71e33b..745e4d9 100644
> >>> --- a/device-types/net/description.tex
> >>> +++ b/device-types/net/description.tex
> >>> @@ -83,6 +83,8 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> >>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> >>>       channel.
> >>>   +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue
> >>> notification coalescing.
> >>> +
> >>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> >>>     \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4
> >>> packets.
> >>> @@ -139,6 +141,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> >>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> >>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> >>> +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >>>   \end{description}
> >>>     \subsubsection{Legacy Interface: Feature bits}\label{sec:Device
> >>> Types / Network Device / Feature bits / Legacy Interface: Feature
> >>> bits}
> >>> @@ -1508,6 +1511,14 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >>>   If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
> >>>   send control commands for dynamically changing the coalescing parameters.
> >>>   +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
> >>> +\begin{itemize}
> >>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set coalescing parameters of a given
> >>> +      enabled transmit/receive virtqueue.
> >>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a device, and the device responds with
> >>> +      coalescing parameters of a given enabled transmit/receive virtqueue.
> >>> +\end{itemize}
> >>> +
> >>>   \begin{note}
> >>>   The behavior of the device in response to these commands is best-effort:
> >>>   the device may generate notifications more or less frequently than specified.
> >>> @@ -1519,25 +1530,76 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >>>       le32 max_usecs;
> >>>   };
> >>>   +struct virtio_net_ctrl_coal_vq {
> >>> +    le16 vqn;
> >>> +    le16 reserved;
> >>> +    struct virtio_net_ctrl_coal coal;
> >>> +};
> >>> +
> >>>   #define VIRTIO_NET_CTRL_NOTF_COAL 6
> >>>    #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
> >>>    #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
> >>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
> >>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
> >>>   \end{lstlisting}
> >>>   +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
> >>> VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands use the
> >>> +virtio_net_ctrl_coal structure to set \field{max_usecs} and \field{max_packets} for all
> >>> +transmit/receive virtqueues.
> >>> +
> >>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the virtio_net_ctrl_coal_vq structure
> >>> +to set \field{max_usecs} and \field{max_packets} for the supplied virtqueue number \field{vqn}.
> >>> +
> >>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of \field{max_usecs} and
> >>> +\field{max_packets} of the specified virtqueue from the device by setting \field{vqn}
> >>> +in the virtio_net_ctrl_coal_vq structure.
> >>> +
> >>> +# Read/Write attributes for coalescing parameters
> >>> +\begin{itemize}
> >>> +\item For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs}
> >>> +      and \field{max_packets} are write-only for a driver.
> >>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn}, \field{reserved}, \field{max_usecs}
> >>> +      and \field{max_packets} are write-only for a driver.
> >>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET, \field{vqn} and \field{reserved} are write-only
> >>> +      for a driver, and, \field{max_usecs} and \field{max_packets} are read-only for the driver.
> >>> +\end{itemize}
> >>> +
> >>>   Coalescing parameters:
> >>>   \begin{itemize}
> >>> +\item \field{vqn}: The virtqueue number of an enabled transmit or receive virtqueue.
> >>>   \item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX notification.
> >>>   \item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX notification.
> >>>   \item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification.
> >>>   \item \field{max_packets} for TX: Maximum number of packets to send before a TX notification.
> >>>   \end{itemize}
> >>>   -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
> >>> +\field{reserved} is reserved and it is ignored by a device.
> >>> +
> >>> +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
> >>>   \begin{enumerate}
> >>> -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all transmit virtqueues.
> >>> -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all receive virtqueues.
> >>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_usecs} and \field{max_packets} parameters for an enabled transmit/receive
> >>> +                                        virtqueue whose number is \field{vqn}.
> >>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device returns the \field{max_usecs} and \field{max_packets} parameters for an enabled
> >>> +                                        transmit/receive virtqueue whose number is \field{vqn}.
> >>> +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
> >>> +                                        each virtqueue of transmitq1\ldots transmitqN.
> >>> +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
> >>> +                                        each virtqueue of receiveq1\ldots receiveqN.
> >>>   \end{enumerate}
> >>>   +If coalescing parameters are being set, the device applies the
> >>> last coalescing parameters received for a
> >>> +virtqueue, regardless of the command used to set the parameters. For example with 2 pairs of virtqueues:
> >>> +# Command sequence
> >>> +Each of the following commands sets \field{max_usecs} and \field{max_packets} parameters for virtqueues.
> >>> +\begin{itemize}
> >>> +\item Command1: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets coalescing
> > parameters for virtqueue0 and virtqueue2, and, virtqueue1 and
> > virtqueue3 retain their previous parameter values.
> >>> +\item Command2: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 0 sets coalescing parameters for virtqueue0, and virtqueue2 retains the values from command1.
> >>> +\item Command3: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 0, the device responds with coalescing parameters of virtqueue0 set by command2.
> >>> +\item Command4: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 1 sets coalescing parameters for virtqueue1, and virtqueue3 retains its previous values.
> >>> +\item Command5: VIRTIO_NET_CTRL_NOTF_COAL_TX_SET sets coalescing parameters for virtqueue1 and virtqueue3, and overrides the values set by command4.
> >>> +\item Command6: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 1, the device responds with coalescing parameters of virtqueue1 set by command5.
> >>> +\end{itemize}
> >>> +
> >>>   \subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation}
> >>>     The device sends a used buffer notification once the
> >>> notification conditions are met and if the notifications are not
> >>> suppressed as explained in \ref{sec:Basic Facilities of a Virtio
> >>> Device / Virtqueues / Used Buffer Notification Suppression}.
> >>> @@ -1549,6 +1611,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >>>     When the device has \field{max_usecs} = 0 or
> >>> \field{max_packets} = 0, the notification conditions are met after
> >>> every packet received/sent.
> >>>   +When a device receives a command of the
> >>> VIRTIO_NET_CTRL_NOTF_COAL class to set a coalescing parameter,
> >>> +it may set the parameter to a value close to a power of 2. For example:
> >>> +If the device receives \field{max_usecs} = 7 from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs} = 8 for a given enabled virtqueue.
> >>> +
> >>> +When the device receives the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands,
> >>> +it saves the values of coalescing parameters as global values, and the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command
> >>> +does not change the global values. If the device is reset, the global values will be set to 0.
> >>> +When a virtqueue is enabled after virtqueue reset, its coalescing parameters are set to global values.
> >>> +
> >>>   \subparagraph{RX Example}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Example}
> >>>     If, for example:
> >>> @@ -1585,11 +1656,29 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >>>     \drivernormative{\subparagraph}{Notifications
> >>> Coalescing}{Device Types / Network Device / Device Operation /
> >>> Control Virtqueue / Notifications Coalescing}
> >>>   -If the VIRTIO_NET_F_NOTF_COAL feature has not been negotiated,
> >>> the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
> >>> +If neither the VIRTIO_NET_F_NOTF_COAL nor the VIRTIO_NET_F_VQ_NOTF_COAL feature
> >>> +has been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
> >>> +
> >>> +A driver MUST ignore the values of coalescing parameters received from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command if a device responds with VIRTIO_NET_ERR.
> >>>     \devicenormative{\subparagraph}{Notifications
> >>> Coalescing}{Device Types / Network Device / Device Operation /
> >>> Control Virtqueue / Notifications Coalescing}
> >>>   -A device SHOULD respond to the VIRTIO_NET_CTRL_NOTF_COAL
> >>> commands with VIRTIO_NET_ERR if it was not able to change the
> >>> parameters.
> >>> +A device SHOULD respond to VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands with VIRTIO_NET_ERR if it was not able to change the parameters.
> >>> +
> >>> +A device MUST respond to the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command with VIRTIO_NET_ERR if it was not able to change the parameters.
> >>> +
> >>> +A device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the given virtqueue is disabled.
> >>> +
> >>> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands set coalescing parameters for all transmit/receive
> >>> +virtqueues respectively and values of coalescing parameters are recorded as global values by a device.
> >>> +The device MUST set the global values of coalescing parameters to 0 after being reset.
> >>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command sets the coalescing parameters for a given enabled virtqueue without changing the global values.
> >>> +
> >>> +After disabling and re-enabling a virtqueue, the device MUST revert coalescing parameters of the virtqueue to the global values.
> >>> +
> >>> +A device MAY set the coalescing parameter to a value close to a power of 2 value.
> >>> +
> >>> +A device MUST ignore \field{reserved}.
> >>>     A device SHOULD NOT send used buffer notifications to the
> >>> driver if the notifications are suppressed, even if the
> >>> notification conditions are met.
> >
> >
> > This publicly archived list offers a means to provide input to the
> > OASIS Virtual I/O Device (VIRTIO) TC.
> >
> > In order to verify user consent to the Feedback License terms and
> > to minimize spam in the list archive, subscription is required
> > before posting.
> >
> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > List help: virtio-comment-help@lists.oasis-open.org
> > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > Committee: https://www.oasis-open.org/committees/virtio/
> > Join OASIS: https://www.oasis-open.org/join/
> -- 
> Come down, come talk to me.


^ permalink raw reply

* Re: [PATCH v9] virtio-net: support inner header hash
From: Michael S. Tsirkin @ 2023-02-23 13:13 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo, ailan
In-Reply-To: <20230218143715.841-1-hengqi@linux.alibaba.com>

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> +\subparagraph{Security risks between encapsulated packets and RSS}
> +There may be potential security risks when encapsulated packets using RSS to
> +select queues for placement.

Is this just with RSS? I assume hash calculation is also used for
something like queueing so there's a similar risk even just
with hash reporting, no?


> When a user inside a tunnel tries to control the
> +enqueuing of encapsulated packets, then the user can flood the device with invaild
> +packets, and the flooded packets may be hashed into the same queue as packets in
> +other normal tunnels, which causing the queue to overflow.
> +
> +This can pose several security risks:
> +\begin{itemize}
> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
> +       overflow, resulting in a large amount of packet loss.
> +\item  The delay and retransmission of packets in the normal tunnels are extremely increased.
> +\item  The user can observe the traffic information and enqueue information of other normal
> +       tunnels, and conduct targeted DoS attacks.
> +\end{\itemize}
> +


So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came
up with an idea: RSS indirection table entries are 16 bit but
onlu 15 bits are used to indentify an RX queue.
We can use the remaining bit as a "tunnel bit" to signal whether to use the
inner or the outer hash for queue selection.

The lookup will work like this then:

calculate outer hash
if (rss[outer hash] & tunnel bit)
then
	calculate inner hash
	return rss[inner hash] & ~tunnel bit
else
	return rss[outer hash]


this fixes the security issue returning us back to
status quo : specific tunnels can be directed to separate queues.


This is for RSS.


For hash reporting indirection table is not used.
Maybe it is enough to signal to driver that inner hash was used.
We do need that signalling though.

My question would be whether it's practical to implement in hardware.

-- 
MST


^ permalink raw reply

* Re: [PATCH v9] virtio-net: support inner header hash
From: Michael S. Tsirkin @ 2023-02-23 13:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo
In-Reply-To: <1047920c-5dd5-8f31-0c4c-a108f36155f8@redhat.com>

On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> Hi:
> 
> 在 2023/2/22 14:46, Heng Qi 写道:
> > Hi, Jason. Long time no see. :)
> > 
> > 在 2023/2/22 上午11:22, Jason Wang 写道:
> > > 
> > > 在 2023/2/22 01:50, Michael S. Tsirkin 写道:
> > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > > +There may be potential security risks when encapsulated
> > > > > packets using RSS to
> > > > > +select queues for placement. When a user inside a tunnel
> > > > > tries to control the
> > > 
> > > 
> > > What do you mean by "user" here? Is it a remote or local one?
> > > 
> > 
> > I mean a remote attacker who is not under the control of the tunnel
> > owner.
> 
> 
> Anything may the tunnel different? I think this can happen even without
> tunnel (and even with single queue).

I think you are missing the fact that tunnel is normally a
security boundary: users within the tunnel can not control
what is happening outside.
The feature breaks the encapsulation somewhat.

For example without tunneling it is possible
to create a special "bad guy queue" and direct specific tunnels
there by playing with key and indirection table.

> How to mitigate those attackers seems more like a implementation details
> where might require fair queuing or other QOS technology which has been well
> studied.
> 
> It seems out of the scope of the spec (unless we want to let driver
> manageable QOS).
> 
> Thanks
> 
> 
> > 
> > Thanks.
> > 
> > > 
> > > > > +enqueuing of encapsulated packets, then the user can flood
> > > > > the device with invaild
> > > > > +packets, and the flooded packets may be hashed into the
> > > > > same queue as packets in
> > > > > +other normal tunnels, which causing the queue to overflow.
> > > > > +
> > > > > +This can pose several security risks:
> > > > > +\begin{itemize}
> > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > enqueued due to queue
> > > > > +       overflow, resulting in a large amount of packet loss.
> > > > > +\item  The delay and retransmission of packets in the
> > > > > normal tunnels are extremely increased.
> > > > > +\item  The user can observe the traffic information and
> > > > > enqueue information of other normal
> > > > > +       tunnels, and conduct targeted DoS attacks.
> > > > > +\end{\itemize}
> > > > > +
> > > > Hmm with this all written out it sounds pretty severe.
> > > 
> > > 
> > > I think we need first understand whether or not it's a problem that
> > > we need to solve at spec level:
> > > 
> > > 1) anything make encapsulated packets different or why we can't hit
> > > this problem without encapsulation
> > > 
> > > 2) whether or not it's the implementation details that the spec
> > > doesn't need to care (or how it is solved in real NIC)
> > > 
> > > Thanks
> > > 
> > > 
> > > > At this point with no ways to mitigate, I don't feel this is something
> > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > others  find this somehow useful e.g. for dpdk.
> > > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > > and asking them for the opinion?
> > > > 
> > > > 
> > 


^ permalink raw reply

* Re: [virtio-comment] Re: [PATCH v7] virtio-net: support the virtqueue coalescing moderation
From: Michael S. Tsirkin @ 2023-02-23 12:52 UTC (permalink / raw)
  To: Heng Qi
  Cc: David Edmondson, virtio-dev, virtio-comment, Parav Pandit,
	Alvaro Karsz, Jason Wang, Xuan Zhuo, Cornelia Huck
In-Reply-To: <3e0d0bd0-b3e8-1616-7fd6-8a4a5a35e6db@linux.alibaba.com>

On Thu, Feb 23, 2023 at 06:52:14PM +0800, Heng Qi wrote:
> Hi, David.
> 
> 在 2023/2/23 下午6:05, David Edmondson 写道:
> > On Wednesday, 2023-02-22 at 22:06:32 +08, Heng Qi wrote:
> > > Currently, coalescing parameters are grouped for all transmit and receive
> > > virtqueues. This patch supports setting or getting the parameters for a
> > > specified virtqueue, and a typical application of this function is netdim[1].
> > > 
> > > When the traffic between virtqueues is unbalanced, for example, one virtqueue
> > > is busy and another virtqueue is idle, then it will be very useful to
> > > control coalescing parameters at the virtqueue granularity.
> > > 
> > > [1] https://docs.kernel.org/networking/net_dim.html
> > > 
> > > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > > This patch is on top of Alvaro's latest v7 patch: https://lists.oasis-open.org/archives/virtio-dev/202302/msg00431.html .
> > > 
> > > v6->v7:
> > >         1. Clarify the relationship of VIRTIO_NET_CTRL_NOTF_COAL_TX/RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET. @Alvaro Karsz, @Michael S. Tsirkin
> > >         2. Remove formula for vqn range. @Parav Pandit
> > >         3. Some expressions are clearer. @Parav Pandit, @Michael S. Tsirkin
> > > 
> > > v5->v6:
> > >         1. Explain that the device may set a different value than the one passed in by the driver. @David Edmondson
> > A couple of things about this:
> > - why say "a value close to a power of 2" - couldn't the device pick any
> >    value it chooses?
> 
> This is just a hint from the spec, it is "MAY", not "MUST" in the
> conformance of the device, the device can still set any value it receives.

I think a power of 2 is an example that confuses more than it clarifies.

> And, since "virtqueue notification coalescing" feature will be used in the
> netdim [1] algorithm,
> and the coalescing moderation of netdim is roughly as follows, so it is
> allowed to give the hint in the spec:
> "
> #define NET_DIM_RX_EQE_PROFILES { \
> {.usec = 1, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 8, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 64, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 128, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 256, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,} \
> }
> 
> #define NET_DIM_RX_CQE_PROFILES { \
> {.usec = 2, .pkts = 256,}, \
> {.usec = 8, .pkts = 128,}, \
> {.usec = 16, .pkts = 64,}, \
> {.usec = 32, .pkts = 64,}, \
> {.usec = 64, .pkts = 64,} \
> }
> 
> #define NET_DIM_TX_EQE_PROFILES { \
> {.usec = 1, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 8, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 32, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 64, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 128, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,} \
> }
> 
> #define NET_DIM_TX_CQE_PROFILES { \
> {.usec = 5, .pkts = 128,}, \
> {.usec = 8, .pkts = 64,}, \
> {.usec = 16, .pkts = 32,}, \
> {.usec = 32, .pkts = 32,}, \
> {.usec = 64, .pkts = 32,} \
> }
> "
> [1]  https://docs.kernel.org/networking/net_dim.html
> 
> > - I think that we need to be more explicit that the values passed in the
> >    SET request may not be honoured exactly.
> 
> Yes, there are already examples in the current spec:
> "
> +When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL class to
> set a coalescing parameter,
> +it may set the parameter to a value close to a power of 2. For example:
> +If the device receives \field{max_usecs} = 7 from the
> VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs} = 8
> for a given enabled virtqueue.
> "
> If you find this unclear, do you need more examples or clarification, or do
> you have a better way?
> 
> > - should the chosen value be returned in the SET call? (Not too fussed
> >    about this, though it may result in an implementation immediately
> >    calling GET after SET to see what actually happened.)
> 
> As you said, I think we can just call GET to view.
> 
> > - the example which shows how the global and per-VQ set operations
> >    interact is reasonably worded ("the device responds with coalescing
> >    parameters of virtqueue1 set by command5"), so that seems okay.
> 
> Yeah.
> 
> Thanks.
> 
> > > v4->v5:
> > >         1. Add the correspondence between virtio_net_ctrl_coal and virtio_net_ctrl_coal_vq and control commands. @Michael S. Tsirkin
> > >         2. Add read and write attributes for each field. @Michael S. Tsirkin
> > >         3. A clearer description of how to set coalescing parameters for vq reset. @Michael S. Tsirkin
> > >         4. Fix some syntax errors. @Michael S. Tsirkin, @David Edmondson
> > > 
> > > v3->v4:
> > >         1. Include virtio_net_ctrl_coal in the virtio_net_ctrl_coal_vq structure. @Alvaro Karsz
> > >         2. Add consideration of vq reset. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> > >         3. Avoid too many examples by giving a comprehensive example. @Michael S. Tsirkin
> > >         4. Fix typos and streamline clarifications. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> > > 
> > > v2->v3:
> > >         1. Add the netdim link. @Parav Pandit
> > >         2. VIRTIO_NET_F_VQ_NOTF_COAL no longer depends on VIRTIO_NET_F_NOTF_COAL. @Michael S. Tsirkin, @Alvaro Karsz
> > >         3. _VQ_GET is explained more. @Michael S. Tsirkin
> > >         4. Add more examples to avoid misunderstandings. @Michael S. Tsirkin
> > >         5. Clarify some statements. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> > >         6. Adjust the virtio_net_ctrl_coal_vq structure. @Michael S. Tsirkin
> > >         7. Fix some typos. @Michael S. Tsirkin
> > > 
> > > v1->v2:
> > >         1. Rename VIRTIO_NET_F_PERQUEUE_NOTF_COAL to VIRTIO_NET_F_VQ_NOTF_COAL. @Michael S. Tsirkin
> > >         2. Use the \field{vqn} instead of the qid. @Michael S. Tsirkin
> > >         3. Unify tx and rx control structres into one structure virtio_net_ctrl_coal_vq. @Michael S. Tsirkin
> > >         4. Add a new control command VIRTIO_NET_CTRL_NOTF_COAL_VQ. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> > >         5. The special value 0xFFF is removed because VIRTIO_NET_CTRL_NOTF_COAL can be used. @Alvaro Karsz
> > >         6. Clarify some special scenarios. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
> > > 
> > >   device-types/net/description.tex | 99 ++++++++++++++++++++++++++++++--
> > >   1 file changed, 94 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> > > index e71e33b..745e4d9 100644
> > > --- a/device-types/net/description.tex
> > > +++ b/device-types/net/description.tex
> > > @@ -83,6 +83,8 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > >   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > >       channel.
> > > +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue notification coalescing.
> > > +
> > >   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > >   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > @@ -139,6 +141,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > >   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > >   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > >   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > >   \end{description}
> > >   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > @@ -1508,6 +1511,14 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >   If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
> > >   send control commands for dynamically changing the coalescing parameters.
> > > +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
> > > +\begin{itemize}
> > > +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set coalescing parameters of a given
> > > +      enabled transmit/receive virtqueue.
> > > +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a device, and the device responds with
> > > +      coalescing parameters of a given enabled transmit/receive virtqueue.
> > > +\end{itemize}
> > > +
> > >   \begin{note}
> > >   The behavior of the device in response to these commands is best-effort:
> > >   the device may generate notifications more or less frequently than specified.
> > > @@ -1519,25 +1530,76 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >       le32 max_usecs;
> > >   };
> > > +struct virtio_net_ctrl_coal_vq {
> > > +    le16 vqn;
> > > +    le16 reserved;
> > > +    struct virtio_net_ctrl_coal coal;
> > > +};
> > > +
> > >   #define VIRTIO_NET_CTRL_NOTF_COAL 6
> > >    #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
> > >    #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
> > > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
> > > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
> > >   \end{lstlisting}
> > > +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands use the
> > > +virtio_net_ctrl_coal structure to set \field{max_usecs} and \field{max_packets} for all
> > > +transmit/receive virtqueues.
> > > +
> > > +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the virtio_net_ctrl_coal_vq structure
> > > +to set \field{max_usecs} and \field{max_packets} for the supplied virtqueue number \field{vqn}.
> > > +
> > > +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of \field{max_usecs} and
> > > +\field{max_packets} of the specified virtqueue from the device by setting \field{vqn}
> > > +in the virtio_net_ctrl_coal_vq structure.
> > > +
> > > +# Read/Write attributes for coalescing parameters
> > > +\begin{itemize}
> > > +\item For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs}
> > > +      and \field{max_packets} are write-only for a driver.
> > > +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn}, \field{reserved}, \field{max_usecs}
> > > +      and \field{max_packets} are write-only for a driver.
> > > +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET, \field{vqn} and \field{reserved} are write-only
> > > +      for a driver, and, \field{max_usecs} and \field{max_packets} are read-only for the driver.
> > > +\end{itemize}
> > > +
> > >   Coalescing parameters:
> > >   \begin{itemize}
> > > +\item \field{vqn}: The virtqueue number of an enabled transmit or receive virtqueue.
> > >   \item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX notification.
> > >   \item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX notification.
> > >   \item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification.
> > >   \item \field{max_packets} for TX: Maximum number of packets to send before a TX notification.
> > >   \end{itemize}
> > > -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
> > > +\field{reserved} is reserved and it is ignored by a device.
> > > +
> > > +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
> > >   \begin{enumerate}
> > > -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all transmit virtqueues.
> > > -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all receive virtqueues.
> > > +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_usecs} and \field{max_packets} parameters for an enabled transmit/receive
> > > +                                        virtqueue whose number is \field{vqn}.
> > > +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device returns the \field{max_usecs} and \field{max_packets} parameters for an enabled
> > > +                                        transmit/receive virtqueue whose number is \field{vqn}.
> > > +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
> > > +                                        each virtqueue of transmitq1\ldots transmitqN.
> > > +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
> > > +                                        each virtqueue of receiveq1\ldots receiveqN.
> > >   \end{enumerate}
> > > +If coalescing parameters are being set, the device applies the last coalescing parameters received for a
> > > +virtqueue, regardless of the command used to set the parameters. For example with 2 pairs of virtqueues:
> > > +# Command sequence
> > > +Each of the following commands sets \field{max_usecs} and \field{max_packets} parameters for virtqueues.
> > > +\begin{itemize}
> > > +\item Command1: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets coalescing parameters for virtqueue0 and virtqueue2, and, virtqueue1 and virtqueue3 retain their previous parameter values.
> > > +\item Command2: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 0 sets coalescing parameters for virtqueue0, and virtqueue2 retains the values from command1.
> > > +\item Command3: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 0, the device responds with coalescing parameters of virtqueue0 set by command2.
> > > +\item Command4: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 1 sets coalescing parameters for virtqueue1, and virtqueue3 retains its previous values.
> > > +\item Command5: VIRTIO_NET_CTRL_NOTF_COAL_TX_SET sets coalescing parameters for virtqueue1 and virtqueue3, and overrides the values set by command4.
> > > +\item Command6: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 1, the device responds with coalescing parameters of virtqueue1 set by command5.
> > > +\end{itemize}
> > > +
> > >   \subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation}
> > >   The device sends a used buffer notification once the notification conditions are met and if the notifications are not suppressed as explained in \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}.
> > > @@ -1549,6 +1611,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >   When the device has \field{max_usecs} = 0 or \field{max_packets} = 0, the notification conditions are met after every packet received/sent.
> > > +When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL class to set a coalescing parameter,
> > > +it may set the parameter to a value close to a power of 2. For example:
> > > +If the device receives \field{max_usecs} = 7 from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs} = 8 for a given enabled virtqueue.
> > > +
> > > +When the device receives the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands,
> > > +it saves the values of coalescing parameters as global values, and the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command
> > > +does not change the global values. If the device is reset, the global values will be set to 0.
> > > +When a virtqueue is enabled after virtqueue reset, its coalescing parameters are set to global values.
> > > +
> > >   \subparagraph{RX Example}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Example}
> > >   If, for example:
> > > @@ -1585,11 +1656,29 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >   \drivernormative{\subparagraph}{Notifications Coalescing}{Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > -If the VIRTIO_NET_F_NOTF_COAL feature has not been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
> > > +If neither the VIRTIO_NET_F_NOTF_COAL nor the VIRTIO_NET_F_VQ_NOTF_COAL feature
> > > +has been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
> > > +
> > > +A driver MUST ignore the values of coalescing parameters received from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command if a device responds with VIRTIO_NET_ERR.
> > >   \devicenormative{\subparagraph}{Notifications Coalescing}{Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > -A device SHOULD respond to the VIRTIO_NET_CTRL_NOTF_COAL commands with VIRTIO_NET_ERR if it was not able to change the parameters.
> > > +A device SHOULD respond to VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands with VIRTIO_NET_ERR if it was not able to change the parameters.
> > > +
> > > +A device MUST respond to the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command with VIRTIO_NET_ERR if it was not able to change the parameters.
> > > +
> > > +A device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the given virtqueue is disabled.
> > > +
> > > +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands set coalescing parameters for all transmit/receive
> > > +virtqueues respectively and values of coalescing parameters are recorded as global values by a device.
> > > +The device MUST set the global values of coalescing parameters to 0 after being reset.
> > > +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command sets the coalescing parameters for a given enabled virtqueue without changing the global values.
> > > +
> > > +After disabling and re-enabling a virtqueue, the device MUST revert coalescing parameters of the virtqueue to the global values.
> > > +
> > > +A device MAY set the coalescing parameter to a value close to a power of 2 value.
> > > +
> > > +A device MUST ignore \field{reserved}.
> > >   A device SHOULD NOT send used buffer notifications to the driver if the notifications are suppressed, even if the notification conditions are met.


^ permalink raw reply

* Re: [virtio-comment] Re: [PATCH v7] virtio-net: support the virtqueue coalescing moderation
From: Heng Qi @ 2023-02-23 11:56 UTC (permalink / raw)
  To: David Edmondson
  Cc: virtio-dev, Michael S . Tsirkin, Parav Pandit, Alvaro Karsz,
	Jason Wang, Xuan Zhuo, Cornelia Huck, virtio-comment
In-Reply-To: <m27cw8d2k6.fsf@oracle.com>



在 2023/2/23 下午7:43, David Edmondson 写道:
> On Thursday, 2023-02-23 at 18:52:14 +08, Heng Qi wrote:
>> Hi, David.
>>
>> 在 2023/2/23 下午6:05, David Edmondson 写道:
>>> On Wednesday, 2023-02-22 at 22:06:32 +08, Heng Qi wrote:
>>>> Currently, coalescing parameters are grouped for all transmit and receive
>>>> virtqueues. This patch supports setting or getting the parameters for a
>>>> specified virtqueue, and a typical application of this function is netdim[1].
>>>>
>>>> When the traffic between virtqueues is unbalanced, for example, one virtqueue
>>>> is busy and another virtqueue is idle, then it will be very useful to
>>>> control coalescing parameters at the virtqueue granularity.
>>>>
>>>> [1] https://docs.kernel.org/networking/net_dim.html
>>>>
>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>> ---
>>>> This patch is on top of Alvaro's latest v7 patch: https://lists.oasis-open.org/archives/virtio-dev/202302/msg00431.html .
>>>>
>>>> v6->v7:
>>>>          1. Clarify the relationship of VIRTIO_NET_CTRL_NOTF_COAL_TX/RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET. @Alvaro Karsz, @Michael S. Tsirkin
>>>>          2. Remove formula for vqn range. @Parav Pandit
>>>>          3. Some expressions are clearer. @Parav Pandit, @Michael S. Tsirkin
>>>>
>>>> v5->v6:
>>>>          1. Explain that the device may set a different value than the one passed in by the driver. @David Edmondson
>>> A couple of things about this:
>>> - why say "a value close to a power of 2" - couldn't the device pick any
>>>     value it chooses?
>> This is just a hint from the spec, it is "MAY", not "MUST" in the
>> conformance of the device, the device can still set any value it
>> receives.
> Okay.
>
>> And, since "virtqueue notification coalescing" feature will be used in
>> the netdim [1] algorithm,
>> and the coalescing moderation of netdim is roughly as follows, so it
>> is allowed to give the hint in the spec:
>> "
>> #define NET_DIM_RX_EQE_PROFILES { \
>> {.usec = 1, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
>> {.usec = 8, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
>> {.usec = 64, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
>> {.usec = 128, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
>> {.usec = 256, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,} \
>> }
>>
>> #define NET_DIM_RX_CQE_PROFILES { \
>> {.usec = 2, .pkts = 256,}, \
>> {.usec = 8, .pkts = 128,}, \
>> {.usec = 16, .pkts = 64,}, \
>> {.usec = 32, .pkts = 64,}, \
>> {.usec = 64, .pkts = 64,} \
>> }
>>
>> #define NET_DIM_TX_EQE_PROFILES { \
>> {.usec = 1, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
>> {.usec = 8, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
>> {.usec = 32, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
>> {.usec = 64, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
>> {.usec = 128, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,} \
>> }
>>
>> #define NET_DIM_TX_CQE_PROFILES { \
>> {.usec = 5, .pkts = 128,}, \
>> {.usec = 8, .pkts = 64,}, \
>> {.usec = 16, .pkts = 32,}, \
>> {.usec = 32, .pkts = 32,}, \
>> {.usec = 64, .pkts = 32,} \
>> }
>> "
>> [1]  https://docs.kernel.org/networking/net_dim.html
>>
>>> - I think that we need to be more explicit that the values passed in the
>>>     SET request may not be honoured exactly.
>> Yes, there are already examples in the current spec:
>> "
>> +When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL
>> class to set a coalescing parameter,
>> +it may set the parameter to a value close to a power of 2. For example:
>> +If the device receives \field{max_usecs} = 7 from the
>> VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs}
>> = 8 for a given enabled virtqueue.
>> "
>> If you find this unclear, do you need more examples or clarification,
>> or do you have a better way?
> Explicit is good:
>
> When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL class
> to set a coalescing parameter it may choose to use a value different to
> that specified in the command, for example a power of two value close to
> the specified parameter. The value chosen by the device can be retrieved
> using the VIRTIO_NET_CTRL_NOTF_VQ_GET command.

Good advice, thanks! :)

>
>>> - should the chosen value be returned in the SET call? (Not too fussed
>>>     about this, though it may result in an implementation immediately
>>>     calling GET after SET to see what actually happened.)
>> As you said, I think we can just call GET to view.
>>
>>> - the example which shows how the global and per-VQ set operations
>>>     interact is reasonably worded ("the device responds with coalescing
>>>     parameters of virtqueue1 set by command5"), so that seems okay.
>> Yeah.
>>
>> Thanks.
>>
>>>> v4->v5:
>>>>          1. Add the correspondence between virtio_net_ctrl_coal and virtio_net_ctrl_coal_vq and control commands. @Michael S. Tsirkin
>>>>          2. Add read and write attributes for each field. @Michael S. Tsirkin
>>>>          3. A clearer description of how to set coalescing parameters for vq reset. @Michael S. Tsirkin
>>>>          4. Fix some syntax errors. @Michael S. Tsirkin, @David Edmondson
>>>>
>>>> v3->v4:
>>>>          1. Include virtio_net_ctrl_coal in the virtio_net_ctrl_coal_vq structure. @Alvaro Karsz
>>>>          2. Add consideration of vq reset. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>>          3. Avoid too many examples by giving a comprehensive example. @Michael S. Tsirkin
>>>>          4. Fix typos and streamline clarifications. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>>
>>>> v2->v3:
>>>>          1. Add the netdim link. @Parav Pandit
>>>>          2. VIRTIO_NET_F_VQ_NOTF_COAL no longer depends on VIRTIO_NET_F_NOTF_COAL. @Michael S. Tsirkin, @Alvaro Karsz
>>>>          3. _VQ_GET is explained more. @Michael S. Tsirkin
>>>>          4. Add more examples to avoid misunderstandings. @Michael S. Tsirkin
>>>>          5. Clarify some statements. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>>          6. Adjust the virtio_net_ctrl_coal_vq structure. @Michael S. Tsirkin
>>>>          7. Fix some typos. @Michael S. Tsirkin
>>>>
>>>> v1->v2:
>>>>          1. Rename VIRTIO_NET_F_PERQUEUE_NOTF_COAL to VIRTIO_NET_F_VQ_NOTF_COAL. @Michael S. Tsirkin
>>>>          2. Use the \field{vqn} instead of the qid. @Michael S. Tsirkin
>>>>          3. Unify tx and rx control structres into one structure virtio_net_ctrl_coal_vq. @Michael S. Tsirkin
>>>>          4. Add a new control command VIRTIO_NET_CTRL_NOTF_COAL_VQ. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>>          5. The special value 0xFFF is removed because VIRTIO_NET_CTRL_NOTF_COAL can be used. @Alvaro Karsz
>>>>          6. Clarify some special scenarios. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>>
>>>>    device-types/net/description.tex | 99 ++++++++++++++++++++++++++++++--
>>>>    1 file changed, 94 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
>>>> index e71e33b..745e4d9 100644
>>>> --- a/device-types/net/description.tex
>>>> +++ b/device-types/net/description.tex
>>>> @@ -83,6 +83,8 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>>    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>>        channel.
>>>>    +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue
>>>> notification coalescing.
>>>> +
>>>>    \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>>>      \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4
>>>> packets.
>>>> @@ -139,6 +141,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>>>    \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>>    \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>>>    \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>>>> +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>>    \end{description}
>>>>      \subsubsection{Legacy Interface: Feature bits}\label{sec:Device
>>>> Types / Network Device / Feature bits / Legacy Interface: Feature
>>>> bits}
>>>> @@ -1508,6 +1511,14 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>    If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
>>>>    send control commands for dynamically changing the coalescing parameters.
>>>>    +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
>>>> +\begin{itemize}
>>>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set coalescing parameters of a given
>>>> +      enabled transmit/receive virtqueue.
>>>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a device, and the device responds with
>>>> +      coalescing parameters of a given enabled transmit/receive virtqueue.
>>>> +\end{itemize}
>>>> +
>>>>    \begin{note}
>>>>    The behavior of the device in response to these commands is best-effort:
>>>>    the device may generate notifications more or less frequently than specified.
>>>> @@ -1519,25 +1530,76 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>        le32 max_usecs;
>>>>    };
>>>>    +struct virtio_net_ctrl_coal_vq {
>>>> +    le16 vqn;
>>>> +    le16 reserved;
>>>> +    struct virtio_net_ctrl_coal coal;
>>>> +};
>>>> +
>>>>    #define VIRTIO_NET_CTRL_NOTF_COAL 6
>>>>     #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
>>>>     #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
>>>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
>>>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
>>>>    \end{lstlisting}
>>>>    +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
>>>> VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands use the
>>>> +virtio_net_ctrl_coal structure to set \field{max_usecs} and \field{max_packets} for all
>>>> +transmit/receive virtqueues.
>>>> +
>>>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the virtio_net_ctrl_coal_vq structure
>>>> +to set \field{max_usecs} and \field{max_packets} for the supplied virtqueue number \field{vqn}.
>>>> +
>>>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of \field{max_usecs} and
>>>> +\field{max_packets} of the specified virtqueue from the device by setting \field{vqn}
>>>> +in the virtio_net_ctrl_coal_vq structure.
>>>> +
>>>> +# Read/Write attributes for coalescing parameters
>>>> +\begin{itemize}
>>>> +\item For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs}
>>>> +      and \field{max_packets} are write-only for a driver.
>>>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn}, \field{reserved}, \field{max_usecs}
>>>> +      and \field{max_packets} are write-only for a driver.
>>>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET, \field{vqn} and \field{reserved} are write-only
>>>> +      for a driver, and, \field{max_usecs} and \field{max_packets} are read-only for the driver.
>>>> +\end{itemize}
>>>> +
>>>>    Coalescing parameters:
>>>>    \begin{itemize}
>>>> +\item \field{vqn}: The virtqueue number of an enabled transmit or receive virtqueue.
>>>>    \item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX notification.
>>>>    \item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX notification.
>>>>    \item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification.
>>>>    \item \field{max_packets} for TX: Maximum number of packets to send before a TX notification.
>>>>    \end{itemize}
>>>>    -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
>>>> +\field{reserved} is reserved and it is ignored by a device.
>>>> +
>>>> +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
>>>>    \begin{enumerate}
>>>> -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all transmit virtqueues.
>>>> -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all receive virtqueues.
>>>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_usecs} and \field{max_packets} parameters for an enabled transmit/receive
>>>> +                                        virtqueue whose number is \field{vqn}.
>>>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device returns the \field{max_usecs} and \field{max_packets} parameters for an enabled
>>>> +                                        transmit/receive virtqueue whose number is \field{vqn}.
>>>> +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
>>>> +                                        each virtqueue of transmitq1\ldots transmitqN.
>>>> +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
>>>> +                                        each virtqueue of receiveq1\ldots receiveqN.
>>>>    \end{enumerate}
>>>>    +If coalescing parameters are being set, the device applies the
>>>> last coalescing parameters received for a
>>>> +virtqueue, regardless of the command used to set the parameters. For example with 2 pairs of virtqueues:
>>>> +# Command sequence
>>>> +Each of the following commands sets \field{max_usecs} and \field{max_packets} parameters for virtqueues.
>>>> +\begin{itemize}
>>>> +\item Command1: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets coalescing
>> parameters for virtqueue0 and virtqueue2, and, virtqueue1 and
>> virtqueue3 retain their previous parameter values.
>>>> +\item Command2: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 0 sets coalescing parameters for virtqueue0, and virtqueue2 retains the values from command1.
>>>> +\item Command3: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 0, the device responds with coalescing parameters of virtqueue0 set by command2.
>>>> +\item Command4: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 1 sets coalescing parameters for virtqueue1, and virtqueue3 retains its previous values.
>>>> +\item Command5: VIRTIO_NET_CTRL_NOTF_COAL_TX_SET sets coalescing parameters for virtqueue1 and virtqueue3, and overrides the values set by command4.
>>>> +\item Command6: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 1, the device responds with coalescing parameters of virtqueue1 set by command5.
>>>> +\end{itemize}
>>>> +
>>>>    \subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation}
>>>>      The device sends a used buffer notification once the
>>>> notification conditions are met and if the notifications are not
>>>> suppressed as explained in \ref{sec:Basic Facilities of a Virtio
>>>> Device / Virtqueues / Used Buffer Notification Suppression}.
>>>> @@ -1549,6 +1611,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>      When the device has \field{max_usecs} = 0 or
>>>> \field{max_packets} = 0, the notification conditions are met after
>>>> every packet received/sent.
>>>>    +When a device receives a command of the
>>>> VIRTIO_NET_CTRL_NOTF_COAL class to set a coalescing parameter,
>>>> +it may set the parameter to a value close to a power of 2. For example:
>>>> +If the device receives \field{max_usecs} = 7 from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs} = 8 for a given enabled virtqueue.
>>>> +
>>>> +When the device receives the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands,
>>>> +it saves the values of coalescing parameters as global values, and the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command
>>>> +does not change the global values. If the device is reset, the global values will be set to 0.
>>>> +When a virtqueue is enabled after virtqueue reset, its coalescing parameters are set to global values.
>>>> +
>>>>    \subparagraph{RX Example}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Example}
>>>>      If, for example:
>>>> @@ -1585,11 +1656,29 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>      \drivernormative{\subparagraph}{Notifications
>>>> Coalescing}{Device Types / Network Device / Device Operation /
>>>> Control Virtqueue / Notifications Coalescing}
>>>>    -If the VIRTIO_NET_F_NOTF_COAL feature has not been negotiated,
>>>> the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
>>>> +If neither the VIRTIO_NET_F_NOTF_COAL nor the VIRTIO_NET_F_VQ_NOTF_COAL feature
>>>> +has been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
>>>> +
>>>> +A driver MUST ignore the values of coalescing parameters received from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command if a device responds with VIRTIO_NET_ERR.
>>>>      \devicenormative{\subparagraph}{Notifications
>>>> Coalescing}{Device Types / Network Device / Device Operation /
>>>> Control Virtqueue / Notifications Coalescing}
>>>>    -A device SHOULD respond to the VIRTIO_NET_CTRL_NOTF_COAL
>>>> commands with VIRTIO_NET_ERR if it was not able to change the
>>>> parameters.
>>>> +A device SHOULD respond to VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands with VIRTIO_NET_ERR if it was not able to change the parameters.
>>>> +
>>>> +A device MUST respond to the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command with VIRTIO_NET_ERR if it was not able to change the parameters.
>>>> +
>>>> +A device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the given virtqueue is disabled.
>>>> +
>>>> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands set coalescing parameters for all transmit/receive
>>>> +virtqueues respectively and values of coalescing parameters are recorded as global values by a device.
>>>> +The device MUST set the global values of coalescing parameters to 0 after being reset.
>>>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command sets the coalescing parameters for a given enabled virtqueue without changing the global values.
>>>> +
>>>> +After disabling and re-enabling a virtqueue, the device MUST revert coalescing parameters of the virtqueue to the global values.
>>>> +
>>>> +A device MAY set the coalescing parameter to a value close to a power of 2 value.
>>>> +
>>>> +A device MUST ignore \field{reserved}.
>>>>      A device SHOULD NOT send used buffer notifications to the
>>>> driver if the notifications are suppressed, even if the
>>>> notification conditions are met.
>>
>> This publicly archived list offers a means to provide input to the
>> OASIS Virtual I/O Device (VIRTIO) TC.
>>
>> In order to verify user consent to the Feedback License terms and
>> to minimize spam in the list archive, subscription is required
>> before posting.
>>
>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> List help: virtio-comment-help@lists.oasis-open.org
>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>> Committee: https://www.oasis-open.org/committees/virtio/
>> Join OASIS: https://www.oasis-open.org/join/


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply

* Re: [virtio-comment] Re: [PATCH v7] virtio-net: support the virtqueue coalescing moderation
From: David Edmondson @ 2023-02-23 11:43 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-dev, Michael S . Tsirkin, Parav Pandit, Alvaro Karsz,
	Jason Wang, Xuan Zhuo, Cornelia Huck, virtio-comment
In-Reply-To: <3e0d0bd0-b3e8-1616-7fd6-8a4a5a35e6db@linux.alibaba.com>


On Thursday, 2023-02-23 at 18:52:14 +08, Heng Qi wrote:
> Hi, David.
>
> 在 2023/2/23 下午6:05, David Edmondson 写道:
>> On Wednesday, 2023-02-22 at 22:06:32 +08, Heng Qi wrote:
>>> Currently, coalescing parameters are grouped for all transmit and receive
>>> virtqueues. This patch supports setting or getting the parameters for a
>>> specified virtqueue, and a typical application of this function is netdim[1].
>>>
>>> When the traffic between virtqueues is unbalanced, for example, one virtqueue
>>> is busy and another virtqueue is idle, then it will be very useful to
>>> control coalescing parameters at the virtqueue granularity.
>>>
>>> [1] https://docs.kernel.org/networking/net_dim.html
>>>
>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>> This patch is on top of Alvaro's latest v7 patch: https://lists.oasis-open.org/archives/virtio-dev/202302/msg00431.html .
>>>
>>> v6->v7:
>>>         1. Clarify the relationship of VIRTIO_NET_CTRL_NOTF_COAL_TX/RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET. @Alvaro Karsz, @Michael S. Tsirkin
>>>         2. Remove formula for vqn range. @Parav Pandit
>>>         3. Some expressions are clearer. @Parav Pandit, @Michael S. Tsirkin
>>>
>>> v5->v6:
>>>         1. Explain that the device may set a different value than the one passed in by the driver. @David Edmondson
>> A couple of things about this:
>> - why say "a value close to a power of 2" - couldn't the device pick any
>>    value it chooses?
>
> This is just a hint from the spec, it is "MAY", not "MUST" in the
> conformance of the device, the device can still set any value it
> receives.

Okay.

> And, since "virtqueue notification coalescing" feature will be used in
> the netdim [1] algorithm,
> and the coalescing moderation of netdim is roughly as follows, so it
> is allowed to give the hint in the spec:
> "
> #define NET_DIM_RX_EQE_PROFILES { \
> {.usec = 1, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 8, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 64, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 128, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 256, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,} \
> }
>
> #define NET_DIM_RX_CQE_PROFILES { \
> {.usec = 2, .pkts = 256,}, \
> {.usec = 8, .pkts = 128,}, \
> {.usec = 16, .pkts = 64,}, \
> {.usec = 32, .pkts = 64,}, \
> {.usec = 64, .pkts = 64,} \
> }
>
> #define NET_DIM_TX_EQE_PROFILES { \
> {.usec = 1, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 8, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 32, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 64, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
> {.usec = 128, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,} \
> }
>
> #define NET_DIM_TX_CQE_PROFILES { \
> {.usec = 5, .pkts = 128,}, \
> {.usec = 8, .pkts = 64,}, \
> {.usec = 16, .pkts = 32,}, \
> {.usec = 32, .pkts = 32,}, \
> {.usec = 64, .pkts = 32,} \
> }
> "
> [1]  https://docs.kernel.org/networking/net_dim.html
>
>> - I think that we need to be more explicit that the values passed in the
>>    SET request may not be honoured exactly.
>
> Yes, there are already examples in the current spec:
> "
> +When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL
> class to set a coalescing parameter,
> +it may set the parameter to a value close to a power of 2. For example:
> +If the device receives \field{max_usecs} = 7 from the
> VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs}
> = 8 for a given enabled virtqueue.
> "
> If you find this unclear, do you need more examples or clarification,
> or do you have a better way?

Explicit is good:

When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL class
to set a coalescing parameter it may choose to use a value different to
that specified in the command, for example a power of two value close to
the specified parameter. The value chosen by the device can be retrieved
using the VIRTIO_NET_CTRL_NOTF_VQ_GET command.

>> - should the chosen value be returned in the SET call? (Not too fussed
>>    about this, though it may result in an implementation immediately
>>    calling GET after SET to see what actually happened.)
>
> As you said, I think we can just call GET to view.
>
>> - the example which shows how the global and per-VQ set operations
>>    interact is reasonably worded ("the device responds with coalescing
>>    parameters of virtqueue1 set by command5"), so that seems okay.
>
> Yeah.
>
> Thanks.
>
>>> v4->v5:
>>>         1. Add the correspondence between virtio_net_ctrl_coal and virtio_net_ctrl_coal_vq and control commands. @Michael S. Tsirkin
>>>         2. Add read and write attributes for each field. @Michael S. Tsirkin
>>>         3. A clearer description of how to set coalescing parameters for vq reset. @Michael S. Tsirkin
>>>         4. Fix some syntax errors. @Michael S. Tsirkin, @David Edmondson
>>>
>>> v3->v4:
>>>         1. Include virtio_net_ctrl_coal in the virtio_net_ctrl_coal_vq structure. @Alvaro Karsz
>>>         2. Add consideration of vq reset. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>         3. Avoid too many examples by giving a comprehensive example. @Michael S. Tsirkin
>>>         4. Fix typos and streamline clarifications. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>
>>> v2->v3:
>>>         1. Add the netdim link. @Parav Pandit
>>>         2. VIRTIO_NET_F_VQ_NOTF_COAL no longer depends on VIRTIO_NET_F_NOTF_COAL. @Michael S. Tsirkin, @Alvaro Karsz
>>>         3. _VQ_GET is explained more. @Michael S. Tsirkin
>>>         4. Add more examples to avoid misunderstandings. @Michael S. Tsirkin
>>>         5. Clarify some statements. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>         6. Adjust the virtio_net_ctrl_coal_vq structure. @Michael S. Tsirkin
>>>         7. Fix some typos. @Michael S. Tsirkin
>>>
>>> v1->v2:
>>>         1. Rename VIRTIO_NET_F_PERQUEUE_NOTF_COAL to VIRTIO_NET_F_VQ_NOTF_COAL. @Michael S. Tsirkin
>>>         2. Use the \field{vqn} instead of the qid. @Michael S. Tsirkin
>>>         3. Unify tx and rx control structres into one structure virtio_net_ctrl_coal_vq. @Michael S. Tsirkin
>>>         4. Add a new control command VIRTIO_NET_CTRL_NOTF_COAL_VQ. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>         5. The special value 0xFFF is removed because VIRTIO_NET_CTRL_NOTF_COAL can be used. @Alvaro Karsz
>>>         6. Clarify some special scenarios. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>>
>>>   device-types/net/description.tex | 99 ++++++++++++++++++++++++++++++--
>>>   1 file changed, 94 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
>>> index e71e33b..745e4d9 100644
>>> --- a/device-types/net/description.tex
>>> +++ b/device-types/net/description.tex
>>> @@ -83,6 +83,8 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>       channel.
>>>   +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue
>>> notification coalescing.
>>> +
>>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>>     \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4
>>> packets.
>>> @@ -139,6 +141,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>>> +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>   \end{description}
>>>     \subsubsection{Legacy Interface: Feature bits}\label{sec:Device
>>> Types / Network Device / Feature bits / Legacy Interface: Feature
>>> bits}
>>> @@ -1508,6 +1511,14 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>   If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
>>>   send control commands for dynamically changing the coalescing parameters.
>>>   +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
>>> +\begin{itemize}
>>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set coalescing parameters of a given
>>> +      enabled transmit/receive virtqueue.
>>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a device, and the device responds with
>>> +      coalescing parameters of a given enabled transmit/receive virtqueue.
>>> +\end{itemize}
>>> +
>>>   \begin{note}
>>>   The behavior of the device in response to these commands is best-effort:
>>>   the device may generate notifications more or less frequently than specified.
>>> @@ -1519,25 +1530,76 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>       le32 max_usecs;
>>>   };
>>>   +struct virtio_net_ctrl_coal_vq {
>>> +    le16 vqn;
>>> +    le16 reserved;
>>> +    struct virtio_net_ctrl_coal coal;
>>> +};
>>> +
>>>   #define VIRTIO_NET_CTRL_NOTF_COAL 6
>>>    #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
>>>    #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
>>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
>>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
>>>   \end{lstlisting}
>>>   +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
>>> VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands use the
>>> +virtio_net_ctrl_coal structure to set \field{max_usecs} and \field{max_packets} for all
>>> +transmit/receive virtqueues.
>>> +
>>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the virtio_net_ctrl_coal_vq structure
>>> +to set \field{max_usecs} and \field{max_packets} for the supplied virtqueue number \field{vqn}.
>>> +
>>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of \field{max_usecs} and
>>> +\field{max_packets} of the specified virtqueue from the device by setting \field{vqn}
>>> +in the virtio_net_ctrl_coal_vq structure.
>>> +
>>> +# Read/Write attributes for coalescing parameters
>>> +\begin{itemize}
>>> +\item For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs}
>>> +      and \field{max_packets} are write-only for a driver.
>>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn}, \field{reserved}, \field{max_usecs}
>>> +      and \field{max_packets} are write-only for a driver.
>>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET, \field{vqn} and \field{reserved} are write-only
>>> +      for a driver, and, \field{max_usecs} and \field{max_packets} are read-only for the driver.
>>> +\end{itemize}
>>> +
>>>   Coalescing parameters:
>>>   \begin{itemize}
>>> +\item \field{vqn}: The virtqueue number of an enabled transmit or receive virtqueue.
>>>   \item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX notification.
>>>   \item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX notification.
>>>   \item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification.
>>>   \item \field{max_packets} for TX: Maximum number of packets to send before a TX notification.
>>>   \end{itemize}
>>>   -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
>>> +\field{reserved} is reserved and it is ignored by a device.
>>> +
>>> +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
>>>   \begin{enumerate}
>>> -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all transmit virtqueues.
>>> -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all receive virtqueues.
>>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_usecs} and \field{max_packets} parameters for an enabled transmit/receive
>>> +                                        virtqueue whose number is \field{vqn}.
>>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device returns the \field{max_usecs} and \field{max_packets} parameters for an enabled
>>> +                                        transmit/receive virtqueue whose number is \field{vqn}.
>>> +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
>>> +                                        each virtqueue of transmitq1\ldots transmitqN.
>>> +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
>>> +                                        each virtqueue of receiveq1\ldots receiveqN.
>>>   \end{enumerate}
>>>   +If coalescing parameters are being set, the device applies the
>>> last coalescing parameters received for a
>>> +virtqueue, regardless of the command used to set the parameters. For example with 2 pairs of virtqueues:
>>> +# Command sequence
>>> +Each of the following commands sets \field{max_usecs} and \field{max_packets} parameters for virtqueues.
>>> +\begin{itemize}
>>> +\item Command1: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets coalescing
> parameters for virtqueue0 and virtqueue2, and, virtqueue1 and
> virtqueue3 retain their previous parameter values.
>>> +\item Command2: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 0 sets coalescing parameters for virtqueue0, and virtqueue2 retains the values from command1.
>>> +\item Command3: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 0, the device responds with coalescing parameters of virtqueue0 set by command2.
>>> +\item Command4: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 1 sets coalescing parameters for virtqueue1, and virtqueue3 retains its previous values.
>>> +\item Command5: VIRTIO_NET_CTRL_NOTF_COAL_TX_SET sets coalescing parameters for virtqueue1 and virtqueue3, and overrides the values set by command4.
>>> +\item Command6: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 1, the device responds with coalescing parameters of virtqueue1 set by command5.
>>> +\end{itemize}
>>> +
>>>   \subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation}
>>>     The device sends a used buffer notification once the
>>> notification conditions are met and if the notifications are not
>>> suppressed as explained in \ref{sec:Basic Facilities of a Virtio
>>> Device / Virtqueues / Used Buffer Notification Suppression}.
>>> @@ -1549,6 +1611,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>     When the device has \field{max_usecs} = 0 or
>>> \field{max_packets} = 0, the notification conditions are met after
>>> every packet received/sent.
>>>   +When a device receives a command of the
>>> VIRTIO_NET_CTRL_NOTF_COAL class to set a coalescing parameter,
>>> +it may set the parameter to a value close to a power of 2. For example:
>>> +If the device receives \field{max_usecs} = 7 from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs} = 8 for a given enabled virtqueue.
>>> +
>>> +When the device receives the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands,
>>> +it saves the values of coalescing parameters as global values, and the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command
>>> +does not change the global values. If the device is reset, the global values will be set to 0.
>>> +When a virtqueue is enabled after virtqueue reset, its coalescing parameters are set to global values.
>>> +
>>>   \subparagraph{RX Example}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Example}
>>>     If, for example:
>>> @@ -1585,11 +1656,29 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>     \drivernormative{\subparagraph}{Notifications
>>> Coalescing}{Device Types / Network Device / Device Operation /
>>> Control Virtqueue / Notifications Coalescing}
>>>   -If the VIRTIO_NET_F_NOTF_COAL feature has not been negotiated,
>>> the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
>>> +If neither the VIRTIO_NET_F_NOTF_COAL nor the VIRTIO_NET_F_VQ_NOTF_COAL feature
>>> +has been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
>>> +
>>> +A driver MUST ignore the values of coalescing parameters received from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command if a device responds with VIRTIO_NET_ERR.
>>>     \devicenormative{\subparagraph}{Notifications
>>> Coalescing}{Device Types / Network Device / Device Operation /
>>> Control Virtqueue / Notifications Coalescing}
>>>   -A device SHOULD respond to the VIRTIO_NET_CTRL_NOTF_COAL
>>> commands with VIRTIO_NET_ERR if it was not able to change the
>>> parameters.
>>> +A device SHOULD respond to VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands with VIRTIO_NET_ERR if it was not able to change the parameters.
>>> +
>>> +A device MUST respond to the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command with VIRTIO_NET_ERR if it was not able to change the parameters.
>>> +
>>> +A device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the given virtqueue is disabled.
>>> +
>>> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands set coalescing parameters for all transmit/receive
>>> +virtqueues respectively and values of coalescing parameters are recorded as global values by a device.
>>> +The device MUST set the global values of coalescing parameters to 0 after being reset.
>>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command sets the coalescing parameters for a given enabled virtqueue without changing the global values.
>>> +
>>> +After disabling and re-enabling a virtqueue, the device MUST revert coalescing parameters of the virtqueue to the global values.
>>> +
>>> +A device MAY set the coalescing parameter to a value close to a power of 2 value.
>>> +
>>> +A device MUST ignore \field{reserved}.
>>>     A device SHOULD NOT send used buffer notifications to the
>>> driver if the notifications are suppressed, even if the
>>> notification conditions are met.
>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
-- 
Come down, come talk to me.


^ permalink raw reply

* Re: [virtio-comment] Re: [PATCH v7] virtio-net: support the virtqueue coalescing moderation
From: Heng Qi @ 2023-02-23 10:52 UTC (permalink / raw)
  To: David Edmondson
  Cc: virtio-dev, virtio-comment, Michael S . Tsirkin, Parav Pandit,
	Alvaro Karsz, Jason Wang, Xuan Zhuo, Cornelia Huck
In-Reply-To: <m2sfewd76r.fsf@oracle.com>

Hi, David.

在 2023/2/23 下午6:05, David Edmondson 写道:
> On Wednesday, 2023-02-22 at 22:06:32 +08, Heng Qi wrote:
>> Currently, coalescing parameters are grouped for all transmit and receive
>> virtqueues. This patch supports setting or getting the parameters for a
>> specified virtqueue, and a typical application of this function is netdim[1].
>>
>> When the traffic between virtqueues is unbalanced, for example, one virtqueue
>> is busy and another virtqueue is idle, then it will be very useful to
>> control coalescing parameters at the virtqueue granularity.
>>
>> [1] https://docs.kernel.org/networking/net_dim.html
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>> This patch is on top of Alvaro's latest v7 patch: https://lists.oasis-open.org/archives/virtio-dev/202302/msg00431.html .
>>
>> v6->v7:
>>         1. Clarify the relationship of VIRTIO_NET_CTRL_NOTF_COAL_TX/RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET. @Alvaro Karsz, @Michael S. Tsirkin
>>         2. Remove formula for vqn range. @Parav Pandit
>>         3. Some expressions are clearer. @Parav Pandit, @Michael S. Tsirkin
>>
>> v5->v6:
>>         1. Explain that the device may set a different value than the one passed in by the driver. @David Edmondson
> A couple of things about this:
> - why say "a value close to a power of 2" - couldn't the device pick any
>    value it chooses?

This is just a hint from the spec, it is "MAY", not "MUST" in the 
conformance of the device, the device can still set any value it receives.

And, since "virtqueue notification coalescing" feature will be used in 
the netdim [1] algorithm,
and the coalescing moderation of netdim is roughly as follows, so it is 
allowed to give the hint in the spec:
"
#define NET_DIM_RX_EQE_PROFILES { \
{.usec = 1, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
{.usec = 8, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
{.usec = 64, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
{.usec = 128, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,}, \
{.usec = 256, .pkts = NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE,} \
}

#define NET_DIM_RX_CQE_PROFILES { \
{.usec = 2, .pkts = 256,}, \
{.usec = 8, .pkts = 128,}, \
{.usec = 16, .pkts = 64,}, \
{.usec = 32, .pkts = 64,}, \
{.usec = 64, .pkts = 64,} \
}

#define NET_DIM_TX_EQE_PROFILES { \
{.usec = 1, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
{.usec = 8, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
{.usec = 32, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
{.usec = 64, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,}, \
{.usec = 128, .pkts = NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE,} \
}

#define NET_DIM_TX_CQE_PROFILES { \
{.usec = 5, .pkts = 128,}, \
{.usec = 8, .pkts = 64,}, \
{.usec = 16, .pkts = 32,}, \
{.usec = 32, .pkts = 32,}, \
{.usec = 64, .pkts = 32,} \
}
"
[1]  https://docs.kernel.org/networking/net_dim.html

> - I think that we need to be more explicit that the values passed in the
>    SET request may not be honoured exactly.

Yes, there are already examples in the current spec:
"
+When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL class 
to set a coalescing parameter,
+it may set the parameter to a value close to a power of 2. For example:
+If the device receives \field{max_usecs} = 7 from the 
VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs} = 
8 for a given enabled virtqueue.
"
If you find this unclear, do you need more examples or clarification, or 
do you have a better way?

> - should the chosen value be returned in the SET call? (Not too fussed
>    about this, though it may result in an implementation immediately
>    calling GET after SET to see what actually happened.)

As you said, I think we can just call GET to view.

> - the example which shows how the global and per-VQ set operations
>    interact is reasonably worded ("the device responds with coalescing
>    parameters of virtqueue1 set by command5"), so that seems okay.

Yeah.

Thanks.

>> v4->v5:
>>         1. Add the correspondence between virtio_net_ctrl_coal and virtio_net_ctrl_coal_vq and control commands. @Michael S. Tsirkin
>>         2. Add read and write attributes for each field. @Michael S. Tsirkin
>>         3. A clearer description of how to set coalescing parameters for vq reset. @Michael S. Tsirkin
>>         4. Fix some syntax errors. @Michael S. Tsirkin, @David Edmondson
>>
>> v3->v4:
>>         1. Include virtio_net_ctrl_coal in the virtio_net_ctrl_coal_vq structure. @Alvaro Karsz
>>         2. Add consideration of vq reset. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>         3. Avoid too many examples by giving a comprehensive example. @Michael S. Tsirkin
>>         4. Fix typos and streamline clarifications. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>
>> v2->v3:
>>         1. Add the netdim link. @Parav Pandit
>>         2. VIRTIO_NET_F_VQ_NOTF_COAL no longer depends on VIRTIO_NET_F_NOTF_COAL. @Michael S. Tsirkin, @Alvaro Karsz
>>         3. _VQ_GET is explained more. @Michael S. Tsirkin
>>         4. Add more examples to avoid misunderstandings. @Michael S. Tsirkin
>>         5. Clarify some statements. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>         6. Adjust the virtio_net_ctrl_coal_vq structure. @Michael S. Tsirkin
>>         7. Fix some typos. @Michael S. Tsirkin
>>
>> v1->v2:
>>         1. Rename VIRTIO_NET_F_PERQUEUE_NOTF_COAL to VIRTIO_NET_F_VQ_NOTF_COAL. @Michael S. Tsirkin
>>         2. Use the \field{vqn} instead of the qid. @Michael S. Tsirkin
>>         3. Unify tx and rx control structres into one structure virtio_net_ctrl_coal_vq. @Michael S. Tsirkin
>>         4. Add a new control command VIRTIO_NET_CTRL_NOTF_COAL_VQ. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>         5. The special value 0xFFF is removed because VIRTIO_NET_CTRL_NOTF_COAL can be used. @Alvaro Karsz
>>         6. Clarify some special scenarios. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>>
>>   device-types/net/description.tex | 99 ++++++++++++++++++++++++++++++--
>>   1 file changed, 94 insertions(+), 5 deletions(-)
>>
>> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
>> index e71e33b..745e4d9 100644
>> --- a/device-types/net/description.tex
>> +++ b/device-types/net/description.tex
>> @@ -83,6 +83,8 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>       channel.
>>   
>> +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue notification coalescing.
>> +
>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>   
>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -139,6 +141,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \end{description}
>>   
>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>> @@ -1508,6 +1511,14 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>   If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
>>   send control commands for dynamically changing the coalescing parameters.
>>   
>> +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
>> +\begin{itemize}
>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set coalescing parameters of a given
>> +      enabled transmit/receive virtqueue.
>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a device, and the device responds with
>> +      coalescing parameters of a given enabled transmit/receive virtqueue.
>> +\end{itemize}
>> +
>>   \begin{note}
>>   The behavior of the device in response to these commands is best-effort:
>>   the device may generate notifications more or less frequently than specified.
>> @@ -1519,25 +1530,76 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>       le32 max_usecs;
>>   };
>>   
>> +struct virtio_net_ctrl_coal_vq {
>> +    le16 vqn;
>> +    le16 reserved;
>> +    struct virtio_net_ctrl_coal coal;
>> +};
>> +
>>   #define VIRTIO_NET_CTRL_NOTF_COAL 6
>>    #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
>>    #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
>>   \end{lstlisting}
>>   
>> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands use the
>> +virtio_net_ctrl_coal structure to set \field{max_usecs} and \field{max_packets} for all
>> +transmit/receive virtqueues.
>> +
>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the virtio_net_ctrl_coal_vq structure
>> +to set \field{max_usecs} and \field{max_packets} for the supplied virtqueue number \field{vqn}.
>> +
>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of \field{max_usecs} and
>> +\field{max_packets} of the specified virtqueue from the device by setting \field{vqn}
>> +in the virtio_net_ctrl_coal_vq structure.
>> +
>> +# Read/Write attributes for coalescing parameters
>> +\begin{itemize}
>> +\item For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs}
>> +      and \field{max_packets} are write-only for a driver.
>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn}, \field{reserved}, \field{max_usecs}
>> +      and \field{max_packets} are write-only for a driver.
>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET, \field{vqn} and \field{reserved} are write-only
>> +      for a driver, and, \field{max_usecs} and \field{max_packets} are read-only for the driver.
>> +\end{itemize}
>> +
>>   Coalescing parameters:
>>   \begin{itemize}
>> +\item \field{vqn}: The virtqueue number of an enabled transmit or receive virtqueue.
>>   \item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX notification.
>>   \item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX notification.
>>   \item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification.
>>   \item \field{max_packets} for TX: Maximum number of packets to send before a TX notification.
>>   \end{itemize}
>>   
>> -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
>> +\field{reserved} is reserved and it is ignored by a device.
>> +
>> +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
>>   \begin{enumerate}
>> -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all transmit virtqueues.
>> -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all receive virtqueues.
>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_usecs} and \field{max_packets} parameters for an enabled transmit/receive
>> +                                        virtqueue whose number is \field{vqn}.
>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device returns the \field{max_usecs} and \field{max_packets} parameters for an enabled
>> +                                        transmit/receive virtqueue whose number is \field{vqn}.
>> +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
>> +                                        each virtqueue of transmitq1\ldots transmitqN.
>> +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
>> +                                        each virtqueue of receiveq1\ldots receiveqN.
>>   \end{enumerate}
>>   
>> +If coalescing parameters are being set, the device applies the last coalescing parameters received for a
>> +virtqueue, regardless of the command used to set the parameters. For example with 2 pairs of virtqueues:
>> +# Command sequence
>> +Each of the following commands sets \field{max_usecs} and \field{max_packets} parameters for virtqueues.
>> +\begin{itemize}
>> +\item Command1: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets coalescing parameters for virtqueue0 and virtqueue2, and, virtqueue1 and virtqueue3 retain their previous parameter values.
>> +\item Command2: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 0 sets coalescing parameters for virtqueue0, and virtqueue2 retains the values from command1.
>> +\item Command3: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 0, the device responds with coalescing parameters of virtqueue0 set by command2.
>> +\item Command4: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 1 sets coalescing parameters for virtqueue1, and virtqueue3 retains its previous values.
>> +\item Command5: VIRTIO_NET_CTRL_NOTF_COAL_TX_SET sets coalescing parameters for virtqueue1 and virtqueue3, and overrides the values set by command4.
>> +\item Command6: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 1, the device responds with coalescing parameters of virtqueue1 set by command5.
>> +\end{itemize}
>> +
>>   \subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation}
>>   
>>   The device sends a used buffer notification once the notification conditions are met and if the notifications are not suppressed as explained in \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}.
>> @@ -1549,6 +1611,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>   
>>   When the device has \field{max_usecs} = 0 or \field{max_packets} = 0, the notification conditions are met after every packet received/sent.
>>   
>> +When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL class to set a coalescing parameter,
>> +it may set the parameter to a value close to a power of 2. For example:
>> +If the device receives \field{max_usecs} = 7 from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs} = 8 for a given enabled virtqueue.
>> +
>> +When the device receives the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands,
>> +it saves the values of coalescing parameters as global values, and the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command
>> +does not change the global values. If the device is reset, the global values will be set to 0.
>> +When a virtqueue is enabled after virtqueue reset, its coalescing parameters are set to global values.
>> +
>>   \subparagraph{RX Example}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Example}
>>   
>>   If, for example:
>> @@ -1585,11 +1656,29 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>   
>>   \drivernormative{\subparagraph}{Notifications Coalescing}{Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>   
>> -If the VIRTIO_NET_F_NOTF_COAL feature has not been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
>> +If neither the VIRTIO_NET_F_NOTF_COAL nor the VIRTIO_NET_F_VQ_NOTF_COAL feature
>> +has been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
>> +
>> +A driver MUST ignore the values of coalescing parameters received from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command if a device responds with VIRTIO_NET_ERR.
>>   
>>   \devicenormative{\subparagraph}{Notifications Coalescing}{Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>   
>> -A device SHOULD respond to the VIRTIO_NET_CTRL_NOTF_COAL commands with VIRTIO_NET_ERR if it was not able to change the parameters.
>> +A device SHOULD respond to VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands with VIRTIO_NET_ERR if it was not able to change the parameters.
>> +
>> +A device MUST respond to the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command with VIRTIO_NET_ERR if it was not able to change the parameters.
>> +
>> +A device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the given virtqueue is disabled.
>> +
>> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands set coalescing parameters for all transmit/receive
>> +virtqueues respectively and values of coalescing parameters are recorded as global values by a device.
>> +The device MUST set the global values of coalescing parameters to 0 after being reset.
>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command sets the coalescing parameters for a given enabled virtqueue without changing the global values.
>> +
>> +After disabling and re-enabling a virtqueue, the device MUST revert coalescing parameters of the virtqueue to the global values.
>> +
>> +A device MAY set the coalescing parameter to a value close to a power of 2 value.
>> +
>> +A device MUST ignore \field{reserved}.
>>   
>>   A device SHOULD NOT send used buffer notifications to the driver if the notifications are suppressed, even if the notification conditions are met.


^ permalink raw reply

* [virtio-comment] Re: [PATCH v4 2/2] virtio-net: Define cfg fields before description
From: David Edmondson @ 2023-02-23 10:12 UTC (permalink / raw)
  To: Parav Pandit; +Cc: mst, virtio-dev, cohuck, virtio-comment, shahafs
In-Reply-To: <20230223023521.159959-3-parav@nvidia.com>


On Thursday, 2023-02-23 at 04:35:21 +02, Parav Pandit wrote:
> Currently some fields of the virtio_net_config structure are defined
> before introducing the structure and some are defined after
> introducing virtio_net_config.

Don't need the trailing "introducing virtio_net_config".

> Better to define the configuration layout first followed by
> description of all the fields.
>
> Device configuration fields are described in the section. Change wording
> from 'listed' to 'described' as suggested in patch [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-dev/202302/msg00004.html
>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/161
> Reviewed-by: David Edmondson <david.edmondson@oracle.com>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
> changelog:
> v2->v3:
> - split the patch for read only description as prepration patch
> - rebased
> v1->v2:
> - remove read-only wording from multiple places
> v0->v1:
> - Change wording about device configuration field introduction
> - remove duplicate read-only wording for status field
> - reword sentence to read it better
> ---
>  device-types/net/description.tex | 42 +++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 20 deletions(-)
>
> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> index 821f7b0..7033594 100644
> --- a/device-types/net/description.tex
> +++ b/device-types/net/description.tex
> @@ -156,14 +156,29 @@ \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network
>  \subsection{Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout}
>  \label{sec:Device Types / Block Device / Feature bits / Device configuration layout}
>  
> -Device configuration fields are listed below. All the device
> -configuration fields are read-only for the driver.
> +The network device has the following device configuration layout. All the
> +device configuration fields are read-only for the driver.
>  
> -The \field{mac} address field always exists (though is only
> -valid if VIRTIO_NET_F_MAC is set), and \field{status} only
> -exists if VIRTIO_NET_F_STATUS is set. Two bits are currently
> -defined for the status field: VIRTIO_NET_S_LINK_UP and
> -VIRTIO_NET_S_ANNOUNCE.
> +\begin{lstlisting}
> +struct virtio_net_config {
> +        u8 mac[6];
> +        le16 status;
> +        le16 max_virtqueue_pairs;
> +        le16 mtu;
> +        le32 speed;
> +        u8 duplex;
> +        u8 rss_max_key_size;
> +        le16 rss_max_indirection_table_length;
> +        le32 supported_hash_types;
> +};
> +\end{lstlisting}
> +
> +The \field{mac} address field always exists (although it is only
> +valid if VIRTIO_NET_F_MAC is set).
> +
> +The \field{status} only exists if VIRTIO_NET_F_STATUS is set.
> +Two bits are currently defined for the status field: VIRTIO_NET_S_LINK_UP
> +and VIRTIO_NET_S_ANNOUNCE.
>  
>  \begin{lstlisting}
>  #define VIRTIO_NET_S_LINK_UP     1
> @@ -193,19 +208,6 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>  is expected to re-read these values after receiving a
>  configuration change notification.
>  
> -\begin{lstlisting}
> -struct virtio_net_config {
> -        u8 mac[6];
> -        le16 status;
> -        le16 max_virtqueue_pairs;
> -        le16 mtu;
> -        le32 speed;
> -        u8 duplex;
> -        u8 rss_max_key_size;
> -        le16 rss_max_indirection_table_length;
> -        le32 supported_hash_types;
> -};
> -\end{lstlisting}
>  The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
>  It specifies the maximum supported length of RSS key in bytes.
-- 
I used to get mad at my school, the teachers who taught me weren't cool.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply

* Re: [PATCH v4 1/2] virtio-net: Describe dev cfg fields read only
From: David Edmondson @ 2023-02-23 10:10 UTC (permalink / raw)
  To: Parav Pandit; +Cc: mst, virtio-dev, cohuck, virtio-comment, shahafs
In-Reply-To: <20230223023521.159959-2-parav@nvidia.com>


On Thursday, 2023-02-23 at 04:35:20 +02, Parav Pandit wrote:
> Device configuration fields are read only. Avoid duplicating this
> description for multiple fields.
>
> Instead describe it one time and do it in the driver requirements
> section.
>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/161
> Reviewed-by: David Edmondson <david.edmondson@oracle.com>
> Signed-off-by: Parav Pandit <parav@nvidia.com>

Minor comment below.

Reviewed-by: David Edmondson <david.edmondson@oracle.com>

> ---
> changelog:
> v3->v4:
> - write driver requirement as normative statement
> - add back read only wording in the description
> v2->v3:
> - split as new patch
> ---
>  device-types/net/description.tex | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> index 73501b6..821f7b0 100644
> --- a/device-types/net/description.tex
> +++ b/device-types/net/description.tex
> @@ -156,25 +156,28 @@ \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network
>  \subsection{Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout}
>  \label{sec:Device Types / Block Device / Feature bits / Device configuration layout}
>  
> -Device configuration fields are listed below, they are read-only for a driver. The \field{mac} address field
> -always exists (though is only valid if VIRTIO_NET_F_MAC is set), and
> -\field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
> -read-only bits (for the driver) are currently defined for the status field:
> -VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> +Device configuration fields are listed below. All the device

"All of the"

> +configuration fields are read-only for the driver.
> +
> +The \field{mac} address field always exists (though is only
> +valid if VIRTIO_NET_F_MAC is set), and \field{status} only
> +exists if VIRTIO_NET_F_STATUS is set. Two bits are currently
> +defined for the status field: VIRTIO_NET_S_LINK_UP and
> +VIRTIO_NET_S_ANNOUNCE.
>  
>  \begin{lstlisting}
>  #define VIRTIO_NET_S_LINK_UP     1
>  #define VIRTIO_NET_S_ANNOUNCE    2
>  \end{lstlisting}
>  
> -The following driver-read-only field, \field{max_virtqueue_pairs} only exists if
> +The following field, \field{max_virtqueue_pairs} only exists if
>  VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS is set. This field specifies the maximum number
>  of each of transmit and receive virtqueues (receiveq1\ldots receiveqN
>  and transmitq1\ldots transmitqN respectively) that can be configured once at least one of these features
>  is negotiated.
>  
> -The following driver-read-only field, \field{mtu} only exists if
> -VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
> +The following field, \field{mtu} only exists if VIRTIO_NET_F_MTU
> +is set. This field specifies the maximum MTU for the driver to
>  use.
>  
>  The following two fields, \field{speed} and \field{duplex}, only
> @@ -264,6 +267,8 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>  
>  \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>  
> +The driver MUST NOT write to any of the device configuration fields.
> +
>  A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
>  If the driver negotiates the VIRTIO_NET_F_MAC feature, the driver MUST set
>  the physical address of the NIC to \field{mac}.  Otherwise, it SHOULD
-- 
What did you learn today? I learnt nothing.


^ permalink raw reply

* [virtio-comment] Re: [PATCH v7] virtio-net: support the virtqueue coalescing moderation
From: David Edmondson @ 2023-02-23 10:05 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-dev, virtio-comment, Michael S . Tsirkin, Parav Pandit,
	Alvaro Karsz, Jason Wang, Xuan Zhuo, Cornelia Huck
In-Reply-To: <20230222140632.10253-1-hengqi@linux.alibaba.com>


On Wednesday, 2023-02-22 at 22:06:32 +08, Heng Qi wrote:
> Currently, coalescing parameters are grouped for all transmit and receive
> virtqueues. This patch supports setting or getting the parameters for a
> specified virtqueue, and a typical application of this function is netdim[1].
>
> When the traffic between virtqueues is unbalanced, for example, one virtqueue
> is busy and another virtqueue is idle, then it will be very useful to
> control coalescing parameters at the virtqueue granularity.
>
> [1] https://docs.kernel.org/networking/net_dim.html
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> This patch is on top of Alvaro's latest v7 patch: https://lists.oasis-open.org/archives/virtio-dev/202302/msg00431.html .
>
> v6->v7:
>        1. Clarify the relationship of VIRTIO_NET_CTRL_NOTF_COAL_TX/RX_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET. @Alvaro Karsz, @Michael S. Tsirkin
>        2. Remove formula for vqn range. @Parav Pandit
>        3. Some expressions are clearer. @Parav Pandit, @Michael S. Tsirkin
>
> v5->v6:
>        1. Explain that the device may set a different value than the one passed in by the driver. @David Edmondson

A couple of things about this:
- why say "a value close to a power of 2" - couldn't the device pick any
  value it chooses?
- I think that we need to be more explicit that the values passed in the
  SET request may not be honoured exactly.
- should the chosen value be returned in the SET call? (Not too fussed
  about this, though it may result in an implementation immediately
  calling GET after SET to see what actually happened.)
- the example which shows how the global and per-VQ set operations
  interact is reasonably worded ("the device responds with coalescing
  parameters of virtqueue1 set by command5"), so that seems okay.

> v4->v5:
>        1. Add the correspondence between virtio_net_ctrl_coal and virtio_net_ctrl_coal_vq and control commands. @Michael S. Tsirkin
>        2. Add read and write attributes for each field. @Michael S. Tsirkin
>        3. A clearer description of how to set coalescing parameters for vq reset. @Michael S. Tsirkin
>        4. Fix some syntax errors. @Michael S. Tsirkin, @David Edmondson
>
> v3->v4:
>        1. Include virtio_net_ctrl_coal in the virtio_net_ctrl_coal_vq structure. @Alvaro Karsz
>        2. Add consideration of vq reset. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>        3. Avoid too many examples by giving a comprehensive example. @Michael S. Tsirkin
>        4. Fix typos and streamline clarifications. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>
> v2->v3:
>        1. Add the netdim link. @Parav Pandit
>        2. VIRTIO_NET_F_VQ_NOTF_COAL no longer depends on VIRTIO_NET_F_NOTF_COAL. @Michael S. Tsirkin, @Alvaro Karsz
>        3. _VQ_GET is explained more. @Michael S. Tsirkin
>        4. Add more examples to avoid misunderstandings. @Michael S. Tsirkin
>        5. Clarify some statements. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>        6. Adjust the virtio_net_ctrl_coal_vq structure. @Michael S. Tsirkin
>        7. Fix some typos. @Michael S. Tsirkin
>
> v1->v2:
>        1. Rename VIRTIO_NET_F_PERQUEUE_NOTF_COAL to VIRTIO_NET_F_VQ_NOTF_COAL. @Michael S. Tsirkin
>        2. Use the \field{vqn} instead of the qid. @Michael S. Tsirkin
>        3. Unify tx and rx control structres into one structure virtio_net_ctrl_coal_vq. @Michael S. Tsirkin
>        4. Add a new control command VIRTIO_NET_CTRL_NOTF_COAL_VQ. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>        5. The special value 0xFFF is removed because VIRTIO_NET_CTRL_NOTF_COAL can be used. @Alvaro Karsz
>        6. Clarify some special scenarios. @Michael S. Tsirkin, @Parav Pandit, @Alvaro Karsz
>
>  device-types/net/description.tex | 99 ++++++++++++++++++++++++++++++--
>  1 file changed, 94 insertions(+), 5 deletions(-)
>
> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> index e71e33b..745e4d9 100644
> --- a/device-types/net/description.tex
> +++ b/device-types/net/description.tex
> @@ -83,6 +83,8 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>      channel.
>  
> +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue notification coalescing.
> +
>  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>  
>  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -139,6 +141,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>  \end{description}
>  
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -1508,6 +1511,14 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
>  send control commands for dynamically changing the coalescing parameters.
>  
> +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
> +\begin{itemize}
> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set coalescing parameters of a given
> +      enabled transmit/receive virtqueue.
> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a device, and the device responds with
> +      coalescing parameters of a given enabled transmit/receive virtqueue.
> +\end{itemize}
> +
>  \begin{note}
>  The behavior of the device in response to these commands is best-effort:
>  the device may generate notifications more or less frequently than specified.
> @@ -1519,25 +1530,76 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>      le32 max_usecs;
>  };
>  
> +struct virtio_net_ctrl_coal_vq {
> +    le16 vqn;
> +    le16 reserved;
> +    struct virtio_net_ctrl_coal coal;
> +};
> +
>  #define VIRTIO_NET_CTRL_NOTF_COAL 6
>   #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
>   #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
>  \end{lstlisting}
>  
> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands use the
> +virtio_net_ctrl_coal structure to set \field{max_usecs} and \field{max_packets} for all
> +transmit/receive virtqueues.
> +
> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the virtio_net_ctrl_coal_vq structure
> +to set \field{max_usecs} and \field{max_packets} for the supplied virtqueue number \field{vqn}.
> +
> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of \field{max_usecs} and
> +\field{max_packets} of the specified virtqueue from the device by setting \field{vqn}
> +in the virtio_net_ctrl_coal_vq structure.
> +
> +# Read/Write attributes for coalescing parameters
> +\begin{itemize}
> +\item For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs}
> +      and \field{max_packets} are write-only for a driver.
> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn}, \field{reserved}, \field{max_usecs}
> +      and \field{max_packets} are write-only for a driver.
> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET, \field{vqn} and \field{reserved} are write-only
> +      for a driver, and, \field{max_usecs} and \field{max_packets} are read-only for the driver.
> +\end{itemize}
> +
>  Coalescing parameters:
>  \begin{itemize}
> +\item \field{vqn}: The virtqueue number of an enabled transmit or receive virtqueue.
>  \item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX notification.
>  \item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX notification.
>  \item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification.
>  \item \field{max_packets} for TX: Maximum number of packets to send before a TX notification.
>  \end{itemize}
>  
> -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
> +\field{reserved} is reserved and it is ignored by a device.
> +
> +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
>  \begin{enumerate}
> -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all transmit virtqueues.
> -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all receive virtqueues.
> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_usecs} and \field{max_packets} parameters for an enabled transmit/receive
> +                                        virtqueue whose number is \field{vqn}.
> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device returns the \field{max_usecs} and \field{max_packets} parameters for an enabled
> +                                        transmit/receive virtqueue whose number is \field{vqn}.
> +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
> +                                        each virtqueue of transmitq1\ldots transmitqN.
> +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: have the same effect of setting coalescing parameters as the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for
> +                                        each virtqueue of receiveq1\ldots receiveqN.
>  \end{enumerate}
>  
> +If coalescing parameters are being set, the device applies the last coalescing parameters received for a
> +virtqueue, regardless of the command used to set the parameters. For example with 2 pairs of virtqueues:
> +# Command sequence
> +Each of the following commands sets \field{max_usecs} and \field{max_packets} parameters for virtqueues.
> +\begin{itemize}
> +\item Command1: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets coalescing parameters for virtqueue0 and virtqueue2, and, virtqueue1 and virtqueue3 retain their previous parameter values.
> +\item Command2: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 0 sets coalescing parameters for virtqueue0, and virtqueue2 retains the values from command1.
> +\item Command3: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 0, the device responds with coalescing parameters of virtqueue0 set by command2.
> +\item Command4: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} = 1 sets coalescing parameters for virtqueue1, and virtqueue3 retains its previous values.
> +\item Command5: VIRTIO_NET_CTRL_NOTF_COAL_TX_SET sets coalescing parameters for virtqueue1 and virtqueue3, and overrides the values set by command4.
> +\item Command6: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} = 1, the device responds with coalescing parameters of virtqueue1 set by command5.
> +\end{itemize}
> +
>  \subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation}
>  
>  The device sends a used buffer notification once the notification conditions are met and if the notifications are not suppressed as explained in \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}.
> @@ -1549,6 +1611,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  
>  When the device has \field{max_usecs} = 0 or \field{max_packets} = 0, the notification conditions are met after every packet received/sent.
>  
> +When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL class to set a coalescing parameter,
> +it may set the parameter to a value close to a power of 2. For example:
> +If the device receives \field{max_usecs} = 7 from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set \field{max_usecs} = 8 for a given enabled virtqueue.
> +
> +When the device receives the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands,
> +it saves the values of coalescing parameters as global values, and the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command
> +does not change the global values. If the device is reset, the global values will be set to 0.
> +When a virtqueue is enabled after virtqueue reset, its coalescing parameters are set to global values.
> +
>  \subparagraph{RX Example}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Example}
>  
>  If, for example:
> @@ -1585,11 +1656,29 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  
>  \drivernormative{\subparagraph}{Notifications Coalescing}{Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>  
> -If the VIRTIO_NET_F_NOTF_COAL feature has not been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
> +If neither the VIRTIO_NET_F_NOTF_COAL nor the VIRTIO_NET_F_VQ_NOTF_COAL feature
> +has been negotiated, the driver MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
> +
> +A driver MUST ignore the values of coalescing parameters received from the VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command if a device responds with VIRTIO_NET_ERR.
>  
>  \devicenormative{\subparagraph}{Notifications Coalescing}{Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>  
> -A device SHOULD respond to the VIRTIO_NET_CTRL_NOTF_COAL commands with VIRTIO_NET_ERR if it was not able to change the parameters.
> +A device SHOULD respond to VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands with VIRTIO_NET_ERR if it was not able to change the parameters.
> +
> +A device MUST respond to the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command with VIRTIO_NET_ERR if it was not able to change the parameters.
> +
> +A device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the given virtqueue is disabled.
> +
> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands set coalescing parameters for all transmit/receive
> +virtqueues respectively and values of coalescing parameters are recorded as global values by a device.
> +The device MUST set the global values of coalescing parameters to 0 after being reset.
> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command sets the coalescing parameters for a given enabled virtqueue without changing the global values.
> +
> +After disabling and re-enabling a virtqueue, the device MUST revert coalescing parameters of the virtqueue to the global values.
> +
> +A device MAY set the coalescing parameter to a value close to a power of 2 value.
> +
> +A device MUST ignore \field{reserved}.
>  
>  A device SHOULD NOT send used buffer notifications to the driver if the notifications are suppressed, even if the notification conditions are met.
-- 
I might throw out a curve ball but I never throw a lob.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply

* [virtio-comment] Re: [PATCH v6] virtio-net: support the virtqueue coalescing moderation
From: David Edmondson @ 2023-02-23 10:01 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, virtio-dev@lists.oasis-open.org,
	virtio-comment@lists.oasis-open.org, Michael S . Tsirkin,
	Alvaro Karsz, Jason Wang, Xuan Zhuo, Cornelia Huck
In-Reply-To: <PH0PR12MB5481457D8EE37A39198592C1DCAA9@PH0PR12MB5481.namprd12.prod.outlook.com>


On Wednesday, 2023-02-22 at 04:13:06 UTC, Parav Pandit wrote:
>> From: Heng Qi <hengqi@linux.alibaba.com>
>> Sent: Tuesday, February 21, 2023 10:22 PM
>
>> +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue
>> notification coalescing.
>> +
> s/notification/notifications 
> should be plural as multiple notifications are coalesced like the below description of VIRTIO_NET_F_NOTF_COAL.

Using the plural sounds weird, and isn't it implied by "coalescing"? If
there were not multiple, there would be nothing to coalesce :-)

>>  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications
>> coalescing.
>> 
>>  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -139,6 +141,7 @@ \subsubsection{Feature bit
>> requirements}\label{sec:Device Types / Network Device
>> \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or
>> VIRTIO_NET_F_HOST_TSO6.
>>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>  \end{description}
>> 
>>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types /
>> Network Device / Feature bits / Legacy Interface: Feature bits} @@ -1508,6
>> +1511,14 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types /
>> Network Device / Devi  If the VIRTIO_NET_F_NOTF_COAL feature is negotiated,
>> the driver can  send control commands for dynamically changing the coalescing
>> parameters.
>> 
>> +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
>> +\begin{itemize}
>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command
>> to set coalescing parameters of a given
>> +      enabled transmit/receive virtqueue.
>> +\item a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command
>> to a device, and the device responds with
>> +      coalescing parameters of a given enabled transmit/receive virtqueue.
>> +\end{itemize}
>> +
>>  \begin{note}
>>  The behavior of the device in response to these commands is best-effort:
>>  the device may generate notifications more or less frequently than specified.
>> @@ -1519,25 +1530,76 @@ \subsubsection{Control
>> Virtqueue}\label{sec:Device Types / Network Device / Devi
>>      le32 max_usecs;
>>  };
>> 
>> +struct virtio_net_ctrl_coal_vq {
>> +    le16 vqn;
>> +    le16 reserved;
>> +    struct virtio_net_ctrl_coal coal;
>> +};
>> +
>>  #define VIRTIO_NET_CTRL_NOTF_COAL 6
>>   #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
>>   #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
>> + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2 #define
>> + VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
>>  \end{lstlisting}
>> 
>> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
>> +VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands use the
>> virtio_net_ctrl_coal
>> +structure to set \field{max_usecs} and \field{max_packets} for all
>> transmit/receive virtqueues.
>> +
>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command uses the
>> +virtio_net_ctrl_coal_vq structure to set \field{max_usecs} and
>> \field{max_packets} for the supplied virtqueue number \field{vqn}.
>> +
>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command gets the values of
>> +\field{max_usecs} and \field{max_packets} of the specified virtqueue
>> +from the device by setting \field{vqn} in the virtio_net_ctrl_coal_vq structure.
>> +
>
>> +# Read/Write attributes for coalescing parameters \begin{itemize} \item
>> +For commands VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
>> VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, \field{max_usecs}
>> +      and \field{max_packets} are write-only for a driver.
>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, \field{vqn},
>
> Virtio spec is using vq number and vq index terminology interchangeably.
> For example, a new patch about admin vq, uses aq_start_index.
>
> MMIO device has vq_index register too.
> I am inclined to vq_index given current state of spec and new additions by Michael.
>
> Michael,
> Can you please suggest vqn or vq_index to use?
>
>> \field{reserved}, \field{max_usecs}
>> +      and \field{max_packets} are write-only for a driver.
>> +\item For the command VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET, \field{vqn}
>> and \field{reserved} are write-only
>
>
>> +      for a driver, and, \field{max_usecs} and \field{max_packets} are read-only
>> for a driver.
> Remove this trailing "for a driver", it is a duplicate.
>
>> +\end{itemize}
>> +
>>  Coalescing parameters:
>>  \begin{itemize}
>> +\item \field{vqn}: The virtqueue number of an enabled transmit or receive
>> virtqueue.
>>  \item \field{max_usecs} for RX: Maximum number of microseconds to delay a
>> RX notification.
>>  \item \field{max_usecs} for TX: Maximum number of microseconds to delay a
>> TX notification.
>>  \item \field{max_packets} for RX: Maximum number of packets to receive
>> before a RX notification.
>>  \item \field{max_packets} for TX: Maximum number of packets to send before
>> a TX notification.
>>  \end{itemize}
>> 
>> -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
>> +\field{vqn} points to an enabled transmit/receive virtqueue, and its value
>> satisfies $ 0 \leq vqn < max_virtqueue_pairs \ast 2 $.
>> +
> This calculation description is not needed here. It is covered somewhere else.
>
>> +\field{reserved} is reserved and it is ignored by a device.
>> +
>> +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
>>  \begin{enumerate}
>> -\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and
>> \field{max_packets} parameters for all transmit virtqueues.
>> -\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and
>> \field{max_packets} parameters for all receive virtqueues.
>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_usecs} and
>> \field{max_packets} parameters for an enabled transmit/receive
>> +                                        virtqueue whose number is \field{vqn}.
>> +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device returns the
>> \field{max_usecs} and \field{max_packets} parameters for an enabled
>> +                                        transmit/receive virtqueue whose number is \field{vqn}.
>> +\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: have the same effect as the
>> VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for each
>> virtqueue of transmitq1\ldots transmitqN.
>> +\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: have the same effect as the
>> VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command repeated for each
>> virtqueue of receiveq1\ldots receiveqN.
>>  \end{enumerate}
>> 
>> +If coalescing parameters are being set, the device applies the last
>> +coalescing parameters received for a virtqueue, regardless of the command
>> used to set the parameters. For example with 2 pairs of virtqueues:
>> +# Command sequence
>> +Each of the following commands sets \field{max_usecs} and
>> \field{max_packets} parameters for virtqueues.
>> +\begin{itemize}
>> +\item Command1: VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets coalescing
>> parameters for virtqueue0 and virtqueue2, and, virtqueue1 and virtqueue3
>> retain their previous parameter values.
>> +\item Command2: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} =
>> 0 sets coalescing parameters for virtqueue0, and virtqueue2 retains the values
>> from command1.
>> +\item Command3: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} =
>> 0, the device responds with coalescing parameters of virtqueue0 set by
>> command2.
>> +\item Command4: VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with \field{vqn} =
>> 1 sets coalescing parameters for virtqueue1, and virtqueue3 retains its previous
>> values.
>> +\item Command5: VIRTIO_NET_CTRL_NOTF_COAL_TX_SET sets coalescing
>> parameters for virtqueue1 and virtqueue3, and overrides the values set by
>> command4.
>> +\item Command6: VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET with \field{vqn} =
>> 1, the device responds with coalescing parameters of virtqueue1 set by
>> command5.
>> +\end{itemize}
>> +
>>  \subparagraph{Operation}\label{sec:Device Types / Network Device / Device
>> Operation / Control Virtqueue / Notifications Coalescing / Operation}
>> 
>>  The device sends a used buffer notification once the notification conditions are
>> met and if the notifications are not suppressed as explained in \ref{sec:Basic
>> Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}.
>> @@ -1549,6 +1611,16 @@ \subsubsection{Control
>> Virtqueue}\label{sec:Device Types / Network Device / Devi
>> 
>>  When the device has \field{max_usecs} = 0 or \field{max_packets} = 0, the
>> notification conditions are met after every packet received/sent.
>> 
>> +When a device receives a command of the VIRTIO_NET_CTRL_NOTF_COAL
>> class
>> +to set a coalescing parameter, it may set the parameter to a value close to a
>> power of 2. For example:
>> +If the device receives \field{max_usecs} = 7 from the
>> VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command, it may set
>> \field{max_usecs} = 8 for a given enabled virtqueue.
>> +
>> +When the device receives the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
>> +VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands, it saves the values of
>> +coalescing parameters as global values, and the
>> VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command does not change the global
>> values. If the device is reset, the global values will be set to 0.
>> +
>> +When a virtqueue is reset, its coalescing parameters are set to the global
>> values.
>> +
> A VQ reset operation disables (or destroys) the VQ in the device.
> Hence, a vq under reset doesn't have any valid parameters.
> Therefore, above wording should be,
>
> When a virtqueue is enabled after it is reset, its coalescing
> parameters are set to global values as configured by
> VIRTIO_NET_CTRL_NOTF_COAL_TX_SET or VIRTIO_NET_CTRL_NOTF_COAL_RX_SET.
>
>>  \subparagraph{RX Example}\label{sec:Device Types / Network Device / Device
>> Operation / Control Virtqueue / Notifications Coalescing / RX Example}
>> 
>>  If, for example:
>> @@ -1585,11 +1657,29 @@ \subsubsection{Control
>> Virtqueue}\label{sec:Device Types / Network Device / Devi
>> 
>>  \drivernormative{\subparagraph}{Notifications Coalescing}{Device Types /
>> Network Device / Device Operation / Control Virtqueue / Notifications
>> Coalescing}
>> 
>> -If the VIRTIO_NET_F_NOTF_COAL feature has not been negotiated, the driver
>> MUST NOT issue VIRTIO_NET_CTRL_NOTF_COAL commands.
>> +If neither the VIRTIO_NET_F_NOTF_COAL nor the
>> VIRTIO_NET_F_VQ_NOTF_COAL
>> +feature has been negotiated, the driver MUST NOT issue
>> VIRTIO_NET_CTRL_NOTF_COAL commands.
>> +
>> +A driver MUST ignore the values of coalescing parameters received from the
>> VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command if a device responds with
>> VIRTIO_NET_ERR.
>> 
>>  \devicenormative{\subparagraph}{Notifications Coalescing}{Device Types /
>> Network Device / Device Operation / Control Virtqueue / Notifications
>> Coalescing}
>> 
>> -A device SHOULD respond to the VIRTIO_NET_CTRL_NOTF_COAL commands
>> with VIRTIO_NET_ERR if it was not able to change the parameters.
>> +A device SHOULD respond to VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
>> VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands with VIRTIO_NET_ERR if it
>> was not able to change the parameters.
>> +
>> +A device MUST respond to the VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET
>> command with VIRTIO_NET_ERR if it was not able to change the parameters.
>> +
>> +A device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and
>> VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if
>> the given virtqueue is disabled.
>> +
>> +The VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
>> +VIRTIO_NET_CTRL_NOTF_COAL_RX_SET commands set coalescing parameters
>> for all transmit/receive virtqueues respectively and values of coalescing
>> parameters are recorded as global values by a device.
>> +The device MUST set the global values of coalescing parameters to 0 after
>> being reset.
>> +The VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command sets the coalescing
>> parameters for a given enabled virtqueue without changing the global values.
>> +
>> +After disabling and re-enabling a virtqueue, the device MUST revert coalescing
>> parameters of the virtqueue to the global values.
>> +
>> +A device MAY set the coalescing parameter to a value close to a power of 2
>> value.
>> +
>> +A device MUST ignore \field{reserved}.
>> 
>>  A device SHOULD NOT send used buffer notifications to the driver if the
>> notifications are suppressed, even if the notification conditions are met.
>> 
>> --
>> 2.19.1.6.gb485710b
-- 
Tonight I think I'll walk alone, I'll find my soul as I go home.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply

* RE: [PATCH v3 1/2] virtio-net: Describe dev cfg fields read only
From: Parav Pandit @ 2023-02-23  5:50 UTC (permalink / raw)
  To: Cornelia Huck, Michael S. Tsirkin
  Cc: virtio-dev@lists.oasis-open.org,
	virtio-comment@lists.oasis-open.org, Shahaf Shuler
In-Reply-To: <87ttzdal0p.fsf@redhat.com>



> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Wednesday, February 22, 2023 8:27 AM

> > We can just skip adding a new requirement completely - we'll never get
> > there with a compliant driver.  This is what we do e.g. for MMIO.
> > Why not?
> 
> That would be fine with me as well.
> 
> > This has an advantage as this allows backing config with regular RAM.
> > Also I feel that since it always said "read only for driver" then this
> > implies a restriction on driver not the device.
> 
> Indeed. The normative statement below should be enough to make that "read-
> only for the driver" thing obvious.
> 
> >
> >> >
> >> >
> >> >> >
> >> >> >>
> >> >> >> Driver section:
> >> >> >> Driver must not write to read-only fields.
> >> >>
> >> >> "The driver MUST NOT write to any config space field."

I addressed the comments in v4 at [1].

[1] https://lists.oasis-open.org/archives/virtio-dev/202302/msg00523.html
Please review.


^ permalink raw reply

* [PATCH 3/3] transport-mmio: Refer to the vq by its number
From: Parav Pandit @ 2023-02-23  5:46 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223054624.168042-1-parav@nvidia.com>

Currently specification uses virtqueue index and
number interchangeably to refer to the virtqueue.

Instead refer to it by its number.

This patch is on top of [1].

[1] https://lists.oasis-open.org/archives/virtio-dev/202302/msg00527.html

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 transport-mmio.tex | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/transport-mmio.tex b/transport-mmio.tex
index c59975e..324cecf 100644
--- a/transport-mmio.tex
+++ b/transport-mmio.tex
@@ -96,7 +96,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     bits accessible by writing to \field{DriverFeatures}.
   }
   \hline
-  \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
+  \mmioreg{QueueSel}{Virtual queue number}{0x030}{W}{%
     Writing to this register selects the virtual queue that the
     following operations on \field{QueueNumMax}, \field{QueueNum}, \field{QueueReady},
     \field{QueueDescLow}, \field{QueueDescHigh}, \field{QueueDriverlLow}, \field{QueueDriverHigh},
@@ -130,7 +130,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     there are new buffers to process in a queue.
 
     When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
-    the value written is the queue index.
+    the value written is the queue number.
 
     When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
     the \field{Notification data} value has the following format:
@@ -373,7 +373,7 @@ \subsubsection{Available Buffer Notifications}\label{sec:Virtio Transport Option
 
 When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
 the driver sends an available buffer notification to the device by writing
-the 16-bit virtqueue index
+the 16-bit virtqueue number
 of the queue to be notified to \field{QueueNotify}.
 
 When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
@@ -451,7 +451,7 @@ \subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over M
     (see QueuePFN).
   }
   \hline
-  \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
+  \mmioreg{QueueSel}{Virtual queue number}{0x030}{W}{%
     Writing to this register selects the virtual queue that the
     following operations on the \field{QueueNumMax}, \field{QueueNum}, \field{QueueAlign}
     and \field{QueuePFN} registers apply to. The index
-- 
2.26.2


^ permalink raw reply related

* [PATCH 2/3] transport-mmio: Rename QueueNum register
From: Parav Pandit @ 2023-02-23  5:46 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223054624.168042-1-parav@nvidia.com>

Currently, the specification uses virtqueue index and number
interchangeably to refer to the virtqueue.

It is better to identify it using one terminology.

Two registers QueueNumMax and QueueNum actually reflect the queue size
or queue depth indicating max and actual number of entries in the queue.
Equivalent register in PCI transport is named differently as queue_size.

To bring consistency between pci and mmio transport, and to avoid
confusion between number and index, rename the QueueNumMax and QueueNum
registers to QueueSizeMax and QueueSize respectively.

[1] https://lists.oasis-open.org/archives/virtio-dev/202302/msg00527.html

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 transport-mmio.tex | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/transport-mmio.tex b/transport-mmio.tex
index 65bae54..c59975e 100644
--- a/transport-mmio.tex
+++ b/transport-mmio.tex
@@ -104,14 +104,14 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     number of the first queue is zero (0x0).
   }
   \hline
-  \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
+  \mmioreg{QueueSizeMax}{Maximum virtual queue size}{0x034}{R}{%
     Reading from the register returns the maximum size (number of
     elements) of the queue the device is ready to process or
     zero (0x0) if the queue is not available. This applies to the
     queue selected by writing to \field{QueueSel}.
   }
   \hline
-  \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
+  \mmioreg{QueueSize}{Virtual queue size}{0x038}{W}{%
     Queue size is the number of elements in the queue.
     Writing to this register notifies the device what size of the
     queue the driver will use. This applies to the queue selected by
@@ -459,7 +459,7 @@ \subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over M
 .
   }
   \hline
-  \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
+  \mmioreg{QueueSizeMax}{Maximum virtual queue size}{0x034}{R}{%
     Reading from the register returns the maximum size of the queue
     the device is ready to process or zero (0x0) if the queue is not
     available. This applies to the queue selected by writing to
@@ -467,7 +467,7 @@ \subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over M
     (0x0), so when the queue is not actively used.
   }
   \hline
-  \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
+  \mmioreg{QueueSize}{Virtual queue size}{0x038}{W}{%
     Queue size is the number of elements in the queue.
     Writing to this register notifies the device what size of the
     queue the driver will use. This applies to the queue selected by
-- 
2.26.2


^ permalink raw reply related

* [PATCH 1/3] transport-pci: Refer to the vq by its number
From: Parav Pandit @ 2023-02-23  5:46 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223054624.168042-1-parav@nvidia.com>

Currently specification uses virtqueue index and
number interchangeably to refer to the virtqueue.

Instead refer to it by its number.

This patch is on top of [1].

[1] https://lists.oasis-open.org/archives/virtio-dev/202302/msg00527.html

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 transport-pci.tex | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/transport-pci.tex b/transport-pci.tex
index da1486a..c9e112a 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -1005,7 +1005,7 @@ \subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virti
 The driver typically does this as follows, for each virtqueue a device has:
 
 \begin{enumerate}
-\item Write the virtqueue index (first queue is 0) to \field{queue_select}.
+\item Write the virtqueue number (first queue is 0) to \field{queue_select}.
 
 \item Read the virtqueue size from \field{queue_size}. This controls how big the virtqueue is
   (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues}). If this field is 0, the virtqueue does not exist.
@@ -1035,7 +1035,7 @@ \subsubsection{Available Buffer Notifications}\label{sec:Virtio Transport Option
 
 When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
 the driver sends an available buffer notification to the device by writing
-the 16-bit virtqueue index
+the 16-bit virtqueue number
 of this virtqueue to the Queue Notify address.
 
 When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
@@ -1053,7 +1053,7 @@ \subsubsection{Available Buffer Notifications}\label{sec:Virtio Transport Option
 If VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated:
 \begin{itemize}
 \item If VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the driver MUST use the
-\field{queue_notify_data} value instead of the virtqueue index.
+\field{queue_notify_data} value instead of the virtqueue number.
 \item If VIRTIO_F_NOTIFICATION_DATA has been negotiated, the driver MUST set the
 \field{vqn} field to the \field{queue_notify_data} value.
 \end{itemize}
-- 
2.26.2


^ permalink raw reply related

* [PATCH 0/3] Rename queue index to queue number
From: Parav Pandit @ 2023-02-23  5:46 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit

1. Currently, virtqueue is identified between driver and device
interchangeably using either number of index terminology.

2. Between PCI and MMIO transport the queue size (depth) is
defined as queue_size and QueueNum respectively.

To avoid confusion and to have consistency, unify them to use as Number.

Solution:
Use virtqueue number description, and rename MMIO register as QueueSize.

Patch summary:
patch-1 renames index to number for pci transport
patch-2 renames mmio register from Num to Size
patch-3 renames index to number for mmio transport

Please review.
This series fixes the issue [1].

This series is on top of [2].

[1] https://github.com/oasis-tcs/virtio-spec/issues/163
[2] https://lists.oasis-open.org/archives/virtio-dev/202302/msg00527.html

---
Cornelia:
I was not sure about ccw for vq_config_block and vq_info_block structures
index field refers to the queue number or not.
Can you please clarify?

If it vqn, I will send v1 by replacing index to vqn to be
consistent with other part of the spec which also uses vqn.

Parav Pandit (3):
  transport-pci: Refer to the vq by its number
  transport-mmio: Rename QueueNum register
  transport-mmio: Refer to the vq by its number

 transport-mmio.tex | 16 ++++++++--------
 transport-pci.tex  |  6 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

-- 
2.26.2


^ permalink raw reply

* [virtio-comment] RE: [virtio-dev] [PATCH v2] virtio-net: define the VIRTIO_NET_F_CTRL_RX_EXTRA feature bit
From: Parav Pandit @ 2023-02-23  5:28 UTC (permalink / raw)
  To: Alvaro Karsz, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org
  Cc: jasowang@redhat.com, mst@redhat.com
In-Reply-To: <20230222083014.2375727-1-alvaro.karsz@solid-run.com>


> From: virtio-dev@lists.oasis-open.org <virtio-dev@lists.oasis-open.org> On
> Behalf Of Alvaro Karsz
> 
> The VIRTIO_NET_F_CTRL_RX_EXTRA feature bit is mentioned in the spec since
> version 1.0, but it's not properly defined.
> 
> This patch defines the feature bit and defines the dependency on
> VIRTIO_NET_F_CTRL_VQ.
> 
> Since this dependency is missing in previous versions, we add it now as a
> "SHOULD".
> 
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/162
> 
> Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
> ---
> v2:
> 	- Rephrase commit log, no changes to patch body.
> 
>  device-types/net/description.tex | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> index 8487ccd..cf37da5 100644
> --- a/device-types/net/description.tex
> +++ b/device-types/net/description.tex
> @@ -74,6 +74,8 @@ \subsection{Feature bits}\label{sec:Device Types /
> Network Device / Feature bits
> 
>  \item[VIRTIO_NET_F_CTRL_VLAN (19)] Control channel VLAN filtering.
> 
> +\item[VIRTIO_NET_F_CTRL_RX_EXTRA (20)]	Control channel RX extra
> mode support.
> +
>  \item[VIRTIO_NET_F_GUEST_ANNOUNCE(21)] Driver can send gratuitous
>      packets.
> 
> @@ -259,6 +261,9 @@ \subsection{Device configuration
> layout}\label{sec:Device Types / Network Device  The device SHOULD NOT
> offer VIRTIO_NET_F_HASH_REPORT if it  does not offer
> VIRTIO_NET_F_CTRL_VQ.
> 
> +The device SHOULD NOT offer VIRTIO_NET_F_CTRL_RX_EXTRA if it does not
> +offer VIRTIO_NET_F_CTRL_VQ.
> +
>  \drivernormative{\subsubsection}{Device configuration layout}{Device Types /
> Network Device / Device configuration layout}
> 
>  A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
> @@ -295,6 +300,9 @@ \subsection{Device configuration
> layout}\label{sec:Device Types / Network Device  A driver SHOULD NOT
> negotiate VIRTIO_NET_F_HASH_REPORT if it  does not negotiate
> VIRTIO_NET_F_CTRL_VQ.
> 
> +A driver SHOULD NOT negotiate VIRTIO_NET_F_CTRL_RX_EXTRA if it does not
> +negotiate VIRTIO_NET_F_CTRL_VQ.
> +
>  \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device
> Types / Network Device / Device configuration layout / Legacy Interface: Device
> configuration layout}  \label{sec:Device Types / Block Device / Feature bits /
> Device configuration layout / Legacy Interface: Device configuration layout}
> When using the legacy interface, transitional devices and drivers
> --
> 2.34.1

Reviewed-by: Parav Pandit <parav@nvidia.com>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
From: Heng Qi @ 2023-02-23  4:41 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo
In-Reply-To: <1047920c-5dd5-8f31-0c4c-a108f36155f8@redhat.com>



在 2023/2/23 上午10:50, Jason Wang 写道:
> Hi:
>
> 在 2023/2/22 14:46, Heng Qi 写道:
>> Hi, Jason. Long time no see. :)
>>
>> 在 2023/2/22 上午11:22, Jason Wang 写道:
>>>
>>> 在 2023/2/22 01:50, Michael S. Tsirkin 写道:
>>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>>> +\subparagraph{Security risks between encapsulated packets and RSS}
>>>>> +There may be potential security risks when encapsulated packets 
>>>>> using RSS to
>>>>> +select queues for placement. When a user inside a tunnel tries to 
>>>>> control the
>>>
>>>
>>> What do you mean by "user" here? Is it a remote or local one?
>>>
>>
>> I mean a remote attacker who is not under the control of the tunnel 
>> owner.
>
>
> Anything may the tunnel different? I think this can happen even 
> without tunnel (and even with single queue).

I agree.

>
> How to mitigate those attackers seems more like a implementation 
> details where might require fair queuing or other QOS technology which 
> has been well studied.

I am also not sure whether this point needs to be focused on in the 
spec, and I see that the protection against tunnel DoS is more protected 
outside the device,
but it seems to be okay to give some attack reminders.

Thanks.

>
> It seems out of the scope of the spec (unless we want to let driver 
> manageable QOS).
>
> Thanks
>
>
>>
>> Thanks.
>>
>>>
>>>>> +enqueuing of encapsulated packets, then the user can flood the 
>>>>> device with invaild
>>>>> +packets, and the flooded packets may be hashed into the same 
>>>>> queue as packets in
>>>>> +other normal tunnels, which causing the queue to overflow.
>>>>> +
>>>>> +This can pose several security risks:
>>>>> +\begin{itemize}
>>>>> +\item  Encapsulated packets in the normal tunnels cannot be 
>>>>> enqueued due to queue
>>>>> +       overflow, resulting in a large amount of packet loss.
>>>>> +\item  The delay and retransmission of packets in the normal 
>>>>> tunnels are extremely increased.
>>>>> +\item  The user can observe the traffic information and enqueue 
>>>>> information of other normal
>>>>> +       tunnels, and conduct targeted DoS attacks.
>>>>> +\end{\itemize}
>>>>> +
>>>> Hmm with this all written out it sounds pretty severe.
>>>
>>>
>>> I think we need first understand whether or not it's a problem that 
>>> we need to solve at spec level:
>>>
>>> 1) anything make encapsulated packets different or why we can't hit 
>>> this problem without encapsulation
>>>
>>> 2) whether or not it's the implementation details that the spec 
>>> doesn't need to care (or how it is solved in real NIC)
>>>
>>> Thanks
>>>
>>>
>>>> At this point with no ways to mitigate, I don't feel this is something
>>>> e.g. Linux can enable.  I am not going to nack the spec patch if
>>>> others  find this somehow useful e.g. for dpdk.
>>>> How about CC e.g. dpdk devs or whoever else is going to use this
>>>> and asking them for the opinion?
>>>>
>>>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply

* [PATCH v4 6/6] transport-ccw: Fix spellings and white spaces
From: Parav Pandit @ 2023-02-23  4:09 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223040919.166617-1-parav@nvidia.com>

Now that we have individual files, fix reported spelling errors.

While at it, remove extra white spaces.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/157
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v1->v2:
- remove trailing white spaces
---
 transport-ccw.tex | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/transport-ccw.tex b/transport-ccw.tex
index 93401a4..c492cb9 100644
--- a/transport-ccw.tex
+++ b/transport-ccw.tex
@@ -56,13 +56,13 @@ \subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over cha
 \end{tabular}
 
 A virtio-ccw proxy device facilitates:
-\begin{itemize} 
+\begin{itemize}
 \item Discovery and attachment of virtio devices (as described above).
 \item Initialization of virtqueues and transport-specific facilities (using
       virtio-specific channel commands).
 \item Notifications (via hypercall and a combination of I/O interrupts
       and indicator bits).
-\end{itemize} 
+\end{itemize}
 
 \subsubsection{Channel Commands for Virtio}\label{sec:Virtio Transport Options / Virtio
 over channel I/O / Basic Concepts/ Channel Commands for Virtio}
@@ -107,7 +107,7 @@ \subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
 Host->Guest Notification / Notification via Classic I/O Interrupts} and
 \ref{sec:Virtio Transport Options / Virtio over channel I/O / Device
 Operation / Host->Guest Notification / Notification via Adapter I/O
-Interrupts} respectively. 
+Interrupts} respectively.
 
 Configuration change notifications are done using so-called classic I/O
 interrupts. The initialization is described in section \ref{sec:Virtio
@@ -413,7 +413,7 @@ \subsubsection{Setting Up Indicators}\label{sec:Virtio Transport Options / Virti
 \begin{itemize}
 \item a summary indicator byte covering the virtqueues for one or more
   virtio-ccw proxy devices
-\item a set of contigous indicator bits for the virtqueues for a
+\item a set of contiguous indicator bits for the virtqueues for a
   virtio-ccw proxy device
 \end{itemize}
 
@@ -563,7 +563,7 @@ \subsubsection{Guest->Host Notification}\label{sec:Virtio Transport Options / Vi
 
 \drivernormative{\paragraph}{Guest->Host Notification}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
 
-For each notification, the driver SHOULD use GPR4 to pass the host cookie received in GPR2 from the previous notication.
+For each notification, the driver SHOULD use GPR4 to pass the host cookie received in GPR2 from the previous notification.
 
 \begin{note}
 For example:
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 5/6] transport-mmio: Fix spellings and white spaces
From: Parav Pandit @ 2023-02-23  4:09 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223040919.166617-1-parav@nvidia.com>

Now that we have individual files, fix reported spelling errors.

While at it, remove trailing white spaces.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/157
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v0->v1:
- removed many trailing white spaces
---
 transport-mmio.tex | 90 +++++++++++++++++++++++-----------------------
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/transport-mmio.tex b/transport-mmio.tex
index 7f2e0c3..65bae54 100644
--- a/transport-mmio.tex
+++ b/transport-mmio.tex
@@ -18,7 +18,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
 
 MMIO virtio devices provide a set of memory mapped control
 registers followed by a device-specific configuration space,
-described in the table~\ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
+described in the table~\ref{tab:Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}.
 
 All register values are organized as Little Endian.
 
@@ -32,23 +32,23 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
 
 \begin{longtable}{p{0.2\textwidth}p{0.7\textwidth}}
   \caption {MMIO Device Register Layout}
-  \label{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout} \\
+  \label{tab:Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout} \\
+  \hline
+  \mmioreg{Name}{Function}{Offset from base}{Direction}{Description}
+  \hline
   \hline
-  \mmioreg{Name}{Function}{Offset from base}{Direction}{Description} 
-  \hline 
-  \hline 
   \endfirsthead
   \hline
-  \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description} 
-  \hline 
-  \hline 
+  \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description}
+  \hline
+  \hline
   \endhead
   \endfoot
   \endlastfoot
   \mmioreg{MagicValue}{Magic value}{0x000}{R}{%
     0x74726976
     (a Little Endian equivalent of the ``virt'' string).
-  } 
+  }
   \hline
   \mmioreg{Version}{Device version number}{0x004}{R}{%
     0x2.
@@ -56,7 +56,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
       Legacy devices (see \ref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}) used 0x1.
     \end{note}
   }
-  \hline 
+  \hline
   \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{%
     See \ref{sec:Device Types}~\nameref{sec:Device Types} for possible values.
     Value zero (0x0) is used to
@@ -64,9 +64,9 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     well known addresses, assigning functions to them depending
     on user's needs.
   }
-  \hline 
+  \hline
   \mmioreg{VendorID}{Virtio Subsystem Vendor ID}{0x00c}{R}{}
-  \hline 
+  \hline
   \mmioreg{DeviceFeatures}{Flags representing features the device supports}{0x010}{R}{%
     Reading from this register returns 32 consecutive flag bits,
     the least significant bit depending on the last value written to
@@ -76,12 +76,12 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     features bits 32 to 63 if \field{DeviceFeaturesSel} is set to 1.
     Also see \ref{sec:Basic Facilities of a Virtio Device / Feature Bits}~\nameref{sec:Basic Facilities of a Virtio Device / Feature Bits}.
   }
-  \hline 
+  \hline
   \mmioreg{DeviceFeaturesSel}{Device (host) features word selection.}{0x014}{W}{%
     Writing to this register selects a set of 32 device feature bits
     accessible by reading from \field{DeviceFeatures}.
   }
-  \hline 
+  \hline
   \mmioreg{DriverFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{%
     Writing to this register sets 32 consecutive flag bits, the least significant
     bit depending on the last value written to \field{DriverFeaturesSel}.
@@ -90,41 +90,41 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     \field{DriverFeaturesSel} is set to 0 and features bits 32 to 63 if
     \field{DriverFeaturesSel} is set to 1. Also see \ref{sec:Basic Facilities of a Virtio Device / Feature Bits}~\nameref{sec:Basic Facilities of a Virtio Device / Feature Bits}.
   }
-  \hline 
+  \hline
   \mmioreg{DriverFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{%
     Writing to this register selects a set of 32 activated feature
     bits accessible by writing to \field{DriverFeatures}.
   }
-  \hline 
+  \hline
   \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
     Writing to this register selects the virtual queue that the
     following operations on \field{QueueNumMax}, \field{QueueNum}, \field{QueueReady},
     \field{QueueDescLow}, \field{QueueDescHigh}, \field{QueueDriverlLow}, \field{QueueDriverHigh},
     \field{QueueDeviceLow}, \field{QueueDeviceHigh} and \field{QueueReset} apply to. The index
-    number of the first queue is zero (0x0). 
+    number of the first queue is zero (0x0).
   }
-  \hline 
+  \hline
   \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
     Reading from the register returns the maximum size (number of
     elements) of the queue the device is ready to process or
     zero (0x0) if the queue is not available. This applies to the
     queue selected by writing to \field{QueueSel}.
   }
-  \hline 
+  \hline
   \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
     Queue size is the number of elements in the queue.
     Writing to this register notifies the device what size of the
     queue the driver will use. This applies to the queue selected by
     writing to \field{QueueSel}.
   }
-  \hline 
+  \hline
   \mmioreg{QueueReady}{Virtual queue ready bit}{0x044}{RW}{%
     Writing one (0x1) to this register notifies the device that it can
     execute requests from this virtual queue. Reading from this register
     returns the last value written to it. Both read and write
     accesses apply to the queue selected by writing to \field{QueueSel}.
   }
-  \hline 
+  \hline
   \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{%
     Writing a value to this register notifies the device that
     there are new buffers to process in a queue.
@@ -140,7 +140,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
     for the definition of the components.
   }
-  \hline 
+  \hline
   \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{%
     Reading from this register returns a bit mask of events that
     caused the device interrupt to be asserted.
@@ -153,49 +153,49 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
         asserted because the configuration of the device has changed.
     \end{description}
   }
-  \hline 
+  \hline
   \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{%
     Writing a value with bits set as defined in \field{InterruptStatus}
     to this register notifies the device that events causing
     the interrupt have been handled.
   }
-  \hline 
+  \hline
   \mmioreg{Status}{Device status}{0x070}{RW}{%
     Reading from this register returns the current device status
     flags.
     Writing non-zero values to this register sets the status flags,
     indicating the driver progress. Writing zero (0x0) to this
-    register triggers a device reset. 
+    register triggers a device reset.
     See also p. \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}.
   }
-  \hline 
+  \hline
   \mmiodreg{QueueDescLow}{QueueDescHigh}{Virtual queue's Descriptor Area 64 bit long physical address}{0x080}{0x084}{W}{%
     Writing to these two registers (lower 32 bits of the address
     to \field{QueueDescLow}, higher 32 bits to \field{QueueDescHigh}) notifies
     the device about location of the Descriptor Area of the queue
     selected by writing to \field{QueueSel} register.
   }
-  \hline 
+  \hline
   \mmiodreg{QueueDriverLow}{QueueDriverHigh}{Virtual queue's Driver Area 64 bit long physical address}{0x090}{0x094}{W}{%
     Writing to these two registers (lower 32 bits of the address
     to \field{QueueDriverLow}, higher 32 bits to \field{QueueDriverHigh}) notifies
     the device about location of the Driver Area of the queue
     selected by writing to \field{QueueSel}.
   }
-  \hline 
+  \hline
   \mmiodreg{QueueDeviceLow}{QueueDeviceHigh}{Virtual queue's Device Area 64 bit long physical address}{0x0a0}{0x0a4}{W}{%
     Writing to these two registers (lower 32 bits of the address
     to \field{QueueDeviceLow}, higher 32 bits to \field{QueueDeviceHigh}) notifies
     the device about location of the Device Area of the queue
     selected by writing to \field{QueueSel}.
   }
-  \hline 
+  \hline
   \mmioreg{SHMSel}{Shared memory id}{0x0ac}{W}{%
     Writing to this register selects the shared memory region \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
     following operations on \field{SHMLenLow}, \field{SHMLenHigh},
     \field{SHMBaseLow} and \field{SHMBaseHigh} apply to.
   }
-  \hline 
+  \hline
   \mmiodreg{SHMLenLow}{SHMLenHigh}{Shared memory region 64 bit long length}{0x0b0}{0x0b4}{R}{%
     These registers return the length of the shared memory
     region in bytes, as defined by the device for the region selected by
@@ -205,7 +205,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     region (i.e. where the ID written to \field{SHMSel} is unused)
     results in a length of -1.
   }
-  \hline 
+  \hline
   \mmiodreg{SHMBaseLow}{SHMBaseHigh}{Shared memory region 64 bit long physical address}{0x0b8}{0x0bc}{R}{%
     The driver reads these registers to discover the base address
     of the region in physical address space.  This address is
@@ -216,7 +216,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     \field{SHMSel} is unused) results in a base address of
     0xffffffffffffffff.
   }
-  \hline 
+  \hline
   \mmioreg{QueueReset}{Virtual queue reset bit}{0x0c0}{RW}{%
     If VIRTIO_F_RING_RESET has been negotiated, writing one (0x1) to this
     register selectively resets the queue. Both read and write accesses
@@ -230,7 +230,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     If the values are different, the configuration space accesses were not atomic and the driver has to perform the operations again.
     See also \ref {sec:Basic Facilities of a Virtio Device / Device Configuration Space}.
   }
-  \hline 
+  \hline
   \mmioreg{Config}{Configuration space}{0x100+}{RW}{
     Device-specific configuration space starts at the offset 0x100
     and is accessed with byte alignment. Its meaning and size
@@ -272,13 +272,13 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
 
 \drivernormative{\subsubsection}{MMIO Device Register Layout}{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
 The driver MUST NOT access memory locations not described in the
-table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}
+table \ref{tab:Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
 (or, in case of the configuration space, described in the device specification),
 MUST NOT write to the read-only registers (direction R) and
 MUST NOT read from the write-only registers (direction W).
 
 The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers
-described in table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
+described in table \ref{tab:Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}.
 For the device-specific configuration space, the driver MUST use 8 bit wide accesses for
 8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and
 aligned accesses for 32 and 64 bit wide fields.
@@ -407,23 +407,23 @@ \subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over M
 in a slightly different control register layout, the device
 initialization and the virtual queue configuration procedure.
 
-Table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout} 
+Table \ref{tab:Virtio Transport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout}
 presents control registers layout, omitting
 descriptions of registers which did not change their function
 nor behaviour:
 
 \begin{longtable}{p{0.2\textwidth}p{0.7\textwidth}}
   \caption {MMIO Device Legacy Register Layout}
-  \label{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout} \\
+  \label{tab:Virtio Transport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout} \\
+  \hline
+  \mmioreg{Name}{Function}{Offset from base}{Direction}{Description}
+  \hline
   \hline
-  \mmioreg{Name}{Function}{Offset from base}{Direction}{Description} 
-  \hline 
-  \hline 
   \endfirsthead
   \hline
-  \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description} 
-  \hline 
-  \hline 
+  \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description}
+  \hline
+  \hline
   \endhead
   \endfoot
   \endlastfoot
@@ -442,7 +442,7 @@ \subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over M
   \mmioreg{GuestFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{}
   \hline
   \mmioreg{GuestFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{}
-  \hline 
+  \hline
   \mmioreg{GuestPageSize}{Guest page size}{0x028}{W}{%
     The driver writes the guest page size in bytes to the
     register during initialization, before any queues are used.
@@ -455,7 +455,7 @@ \subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over M
     Writing to this register selects the virtual queue that the
     following operations on the \field{QueueNumMax}, \field{QueueNum}, \field{QueueAlign}
     and \field{QueuePFN} registers apply to. The index
-    number of the first queue is zero (0x0). 
+    number of the first queue is zero (0x0).
 .
   }
   \hline
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 4/6] transport-pci: Fix spellings and white spaces
From: Parav Pandit @ 2023-02-23  4:09 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223040919.166617-1-parav@nvidia.com>

Now that we have individual files, fix reported spelling errors.

While at it, remove trailing white spaces.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/157
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v0->v1:
- removed many trailing white spaces
---
 transport-pci.tex | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/transport-pci.tex b/transport-pci.tex
index 49c35bd..da1486a 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -5,7 +5,7 @@ \section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over P
 A Virtio device can be implemented as any kind of PCI device:
 a Conventional PCI device or a PCI Express
 device.  To assure designs meet the latest level
-requirements, see 
+requirements, see
 the PCI-SIG home page at \url{http://www.pcisig.com} for any
 approved changes.
 
@@ -14,7 +14,7 @@ \section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over P
 guest an interface that meets the specification requirements of
 the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
 and \hyperref[intro:PCIe]{[PCIe]}
-respectively. 
+respectively.
 
 \subsection{PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
 
@@ -586,7 +586,7 @@ \subsubsection{ISR status capability}\label{sec:Virtio Transport Options / Virti
 
 \devicenormative{\paragraph}{ISR status capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
 
-The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability.  
+The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability.
 
 The device MUST set the Device Configuration Interrupt bit
 in \field{ISR status} before sending a device configuration
@@ -945,7 +945,7 @@ \subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virti
 \end{lstlisting}
 
 Note that mapping an event to vector might require device to
-allocate internal device resources, and thus could fail. 
+allocate internal device resources, and thus could fail.
 
 \devicenormative{\subparagraph}{MSI-X Vector Configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
 
@@ -973,7 +973,7 @@ \subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virti
 unless it is impossible for the device to satisfy the mapping
 request.  Devices MUST report mapping
 failures by returning the NO_VECTOR value when the relevant
-\field{config_msix_vector}/\field{queue_msix_vector} field is read. 
+\field{config_msix_vector}/\field{queue_msix_vector} field is read.
 
 \drivernormative{\subparagraph}{MSI-X Vector Configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
 
@@ -981,7 +981,7 @@ \subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virti
 Driver MAY fall back on using INT\#x interrupts for a device
 which only supports one MSI-X vector (MSI-X Table Size = 0).
 
-Driver MAY intepret the Table Size as a hint from the device
+Driver MAY interpret the Table Size as a hint from the device
 for the suggested number of MSI-X vectors to use.
 
 Driver MUST NOT attempt to map an event to a vector
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 3/6] transport-ccw: Split Channel IO transport to its own file
From: Parav Pandit @ 2023-02-23  4:09 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223040919.166617-1-parav@nvidia.com>

Place Channel IO transport specification in its own file to
better maintain it.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/157
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v1->v2:
- renamed file to transport-ccw.tex
---
 content.tex       | 603 +---------------------------------------------
 transport-ccw.tex | 601 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 602 insertions(+), 602 deletions(-)
 create mode 100644 transport-ccw.tex

diff --git a/content.tex b/content.tex
index 80c28df..cff548a 100644
--- a/content.tex
+++ b/content.tex
@@ -581,608 +581,7 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
 
 \input{transport-pci.tex}
 \input{transport-mmio.tex}
-
-\section{Virtio Over Channel I/O}\label{sec:Virtio Transport Options / Virtio Over Channel I/O}
-
-S/390 based virtual machines support neither PCI nor MMIO, so a
-different transport is needed there.
-
-virtio-ccw uses the standard channel I/O based mechanism used for
-the majority of devices on S/390. A virtual channel device with a
-special control unit type acts as proxy to the virtio device
-(similar to the way virtio-pci uses a PCI device) and
-configuration and operation of the virtio device is accomplished
-(mostly) via channel commands. This means virtio devices are
-discoverable via standard operating system algorithms, and adding
-virtio support is mainly a question of supporting a new control
-unit type.
-
-As the S/390 is a big endian machine, the data structures transmitted
-via channel commands are big-endian: this is made clear by use of
-the types be16, be32 and be64.
-
-\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
-
-As a proxy device, virtio-ccw uses a channel-attached I/O control
-unit with a special control unit type (0x3832) and a control unit
-model corresponding to the attached virtio device's subsystem
-device ID, accessed via a virtual I/O subchannel and a virtual
-channel path of type 0x32. This proxy device is discoverable via
-normal channel subsystem device discovery (usually a STORE
-SUBCHANNEL loop) and answers to the basic channel commands:
-
-\begin{itemize}
-\item NO-OPERATION (0x03)
-\item BASIC SENSE (0x04)
-\item TRANSFER IN CHANNEL (0x08)
-\item SENSE ID (0xe4)
-\end{itemize}
-
-For a virtio-ccw proxy device, SENSE ID will return the following
-information:
-
-\begin{tabular}{ |l|l|l| }
-\hline
-Bytes & Description & Contents \\
-\hline \hline
-0     & reserved              & 0xff \\
-\hline
-1-2   & control unit type     & 0x3832 \\
-\hline
-3     & control unit model    & <virtio device id> \\
-\hline
-4-5   & device type           & zeroes (unset) \\
-\hline
-6     & device model          & zeroes (unset) \\
-\hline
-7-255 & extended SenseId data & zeroes (unset) \\
-\hline
-\end{tabular}
-
-A virtio-ccw proxy device facilitates:
-\begin{itemize} 
-\item Discovery and attachment of virtio devices (as described above).
-\item Initialization of virtqueues and transport-specific facilities (using
-      virtio-specific channel commands).
-\item Notifications (via hypercall and a combination of I/O interrupts
-      and indicator bits).
-\end{itemize} 
-
-\subsubsection{Channel Commands for Virtio}\label{sec:Virtio Transport Options / Virtio
-over channel I/O / Basic Concepts/ Channel Commands for Virtio}
-
-In addition to the basic channel commands, virtio-ccw defines a
-set of channel commands related to configuration and operation of
-virtio:
-
-\begin{lstlisting}
-#define CCW_CMD_SET_VQ 0x13
-#define CCW_CMD_VDEV_RESET 0x33
-#define CCW_CMD_SET_IND 0x43
-#define CCW_CMD_SET_CONF_IND 0x53
-#define CCW_CMD_SET_IND_ADAPTER 0x73
-#define CCW_CMD_READ_FEAT 0x12
-#define CCW_CMD_WRITE_FEAT 0x11
-#define CCW_CMD_READ_CONF 0x22
-#define CCW_CMD_WRITE_CONF 0x21
-#define CCW_CMD_WRITE_STATUS 0x31
-#define CCW_CMD_READ_VQ_CONF 0x32
-#define CCW_CMD_SET_VIRTIO_REV 0x83
-#define CCW_CMD_READ_STATUS 0x72
-\end{lstlisting}
-
-\subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
-over channel I/O / Basic Concepts/ Notifications}
-
-Available buffer notifications are realized as a hypercall. No additional
-setup by the driver is needed. The operation of available buffer
-notifications is described in section \ref{sec:Virtio Transport Options /
-Virtio over channel I/O / Device Operation / Guest->Host Notification}.
-
-Used buffer notifications are realized either as so-called classic or
-adapter I/O interrupts depending on a transport level negotiation. The
-initialization is described in sections \ref{sec:Virtio Transport Options
-/ Virtio over channel I/O / Device Initialization / Setting Up Indicators
-/ Setting Up Classic Queue Indicators} and \ref{sec:Virtio Transport
-Options / Virtio over channel I/O / Device Initialization / Setting Up
-Indicators / Setting Up Two-Stage Queue Indicators} respectively.  The
-operation of each flavor is described in sections \ref{sec:Virtio
-Transport Options / Virtio over channel I/O / Device Operation /
-Host->Guest Notification / Notification via Classic I/O Interrupts} and
-\ref{sec:Virtio Transport Options / Virtio over channel I/O / Device
-Operation / Host->Guest Notification / Notification via Adapter I/O
-Interrupts} respectively. 
-
-Configuration change notifications are done using so-called classic I/O
-interrupts. The initialization is described in section \ref{sec:Virtio
-Transport Options / Virtio over channel I/O / Device Initialization /
-Setting Up Indicators / Setting Up Configuration Change Indicators} and
-the operation in section \ref{sec:Virtio Transport Options / Virtio over
-channel I/O / Device Operation / Host->Guest Notification / Notification
-via Classic I/O Interrupts}.
-
-\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
-
-The virtio-ccw device acts like a normal channel device, as specified
-in \hyperref[intro:S390 PoP]{[S390 PoP]} and \hyperref[intro:S390 Common I/O]{[S390 Common I/O]}. In particular:
-
-\begin{itemize}
-\item A device MUST post a unit check with command reject for any command
-  it does not support.
-
-\item If a driver did not suppress length checks for a channel command,
-  the device MUST present a subchannel status as detailed in the
-  architecture when the actual length did not match the expected length.
-
-\item If a driver did suppress length checks for a channel command, the
-  device MUST present a check condition if the transmitted data does
-  not contain enough data to process the command. If the driver submitted
-  a buffer that was too long, the device SHOULD accept the command.
-\end{itemize}
-
-\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
-
-A driver for virtio-ccw devices MUST check for a control unit
-type of 0x3832 and MUST ignore the device type and model.
-
-A driver SHOULD attempt to provide the correct length in a channel
-command even if it suppresses length checks for that command.
-
-\subsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization}
-
-virtio-ccw uses several channel commands to set up a device.
-
-\subsubsection{Setting the Virtio Revision}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
-
-CCW_CMD_SET_VIRTIO_REV is issued by the driver to set the revision of
-the virtio-ccw transport it intends to drive the device with. It uses the
-following communication structure:
-
-\begin{lstlisting}
-struct virtio_rev_info {
-        be16 revision;
-        be16 length;
-        u8 data[];
-};
-\end{lstlisting}
-
-\field{revision} contains the desired revision id, \field{length} the length of the
-data portion and \field{data} revision-dependent additional desired options.
-
-The following values are supported:
-
-\begin{tabular}{ |l|l|l|l| }
-\hline
-\field{revision} & \field{length} & \field{data}      & remarks \\
-\hline \hline
-0        & 0      & <empty>   & legacy interface; transitional devices only \\
-\hline
-1        & 0      & <empty>   & Virtio 1 \\
-\hline
-2        & 0      & <empty>   & CCW_CMD_READ_STATUS support \\
-\hline
-3-n      &        &           & reserved for later revisions \\
-\hline
-\end{tabular}
-
-Note that a change in the virtio standard does not necessarily
-correspond to a change in the virtio-ccw revision.
-
-\devicenormative{\paragraph}{Setting the Virtio Revision}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
-
-A device MUST post a unit check with command reject for any \field{revision}
-it does not support. For any invalid combination of \field{revision}, \field{length}
-and \field{data}, it MUST post a unit check with command reject as well. A
-non-transitional device MUST reject revision id 0.
-
-A device SHOULD answer with command reject to any virtio-ccw specific
-channel command that is not contained in the revision selected by the
-driver.
-
-A device MUST answer with command reject to any attempt to select a different revision
-after a revision has been successfully selected by the driver.
-
-A device MUST treat the revision as unset from the time the associated
-subchannel has been enabled until a revision has been successfully set
-by the driver. This implies that revisions are not persistent across
-disabling and enabling of the associated subchannel.
-
-\drivernormative{\paragraph}{Setting the Virtio Revision}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
-
-A driver SHOULD start with trying to set the highest revision it
-supports and continue with lower revisions if it gets a command reject.
-
-A driver MUST NOT issue any other virtio-ccw specific channel commands
-prior to setting the revision.
-
-After a revision has been successfully selected by the driver, it
-MUST NOT attempt to select a different revision.
-
-\paragraph{Legacy Interfaces: A Note on Setting the Virtio Revision}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision / Legacy Interfaces: A Note on Setting the Virtio Revision}
-
-A legacy device will not support the CCW_CMD_SET_VIRTIO_REV and answer
-with a command reject. A non-transitional driver MUST stop trying to
-operate this device in that case. A transitional driver MUST operate
-the device as if it had been able to set revision 0.
-
-A legacy driver will not issue the CCW_CMD_SET_VIRTIO_REV prior to
-issuing other virtio-ccw specific channel commands. A non-transitional
-device therefore MUST answer any such attempts with a command reject.
-A transitional device MUST assume in this case that the driver is a
-legacy driver and continue as if the driver selected revision 0. This
-implies that the device MUST reject any command not valid for revision
-0, including a subsequent CCW_CMD_SET_VIRTIO_REV.
-
-\subsubsection{Configuring a Virtqueue}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue}
-
-CCW_CMD_READ_VQ_CONF is issued by the driver to obtain information
-about a queue. It uses the following structure for communicating:
-
-\begin{lstlisting}
-struct vq_config_block {
-        be16 index;
-        be16 max_num;
-};
-\end{lstlisting}
-
-The requested number of buffers for queue \field{index} is returned in
-\field{max_num}.
-
-Afterwards, CCW_CMD_SET_VQ is issued by the driver to inform the
-device about the location used for its queue. The transmitted
-structure is
-
-\begin{lstlisting}
-struct vq_info_block {
-        be64 desc;
-        be32 res0;
-        be16 index;
-        be16 num;
-        be64 driver;
-        be64 device;
-};
-\end{lstlisting}
-
-\field{desc}, \field{driver} and \field{device} contain the guest
-addresses for the descriptor area,
-available area and used area for queue \field{index}, respectively. The actual
-virtqueue size (number of allocated buffers) is transmitted in \field{num}.
-
-\devicenormative{\paragraph}{Configuring a Virtqueue}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue}
-
-\field{res0} is reserved and MUST be ignored by the device.
-
-\paragraph{Legacy Interface: A Note on Configuring a Virtqueue}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue / Legacy Interface: A Note on Configuring a Virtqueue}
-
-For a legacy driver or for a driver that selected revision 0,
-CCW_CMD_SET_VQ uses the following communication block:
-
-\begin{lstlisting}
-struct vq_info_block_legacy {
-        be64 queue;
-        be32 align;
-        be16 index;
-        be16 num;
-};
-\end{lstlisting}
-
-\field{queue} contains the guest address for queue \field{index}, \field{num} the number of buffers
-and \field{align} the alignment. The queue layout follows \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}.
-
-\subsubsection{Communicating Status Information}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
-
-The driver changes the status of a device via the
-CCW_CMD_WRITE_STATUS command, which transmits an 8 bit status
-value.
-
-As described in
-\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits},
-a device sometimes fails to set the \field{device status} field: For example, it
-might fail to accept the FEATURES_OK status bit during device initialization.
-
-With revision 2, CCW_CMD_READ_STATUS is defined: It reads an 8 bit status
-value from the device and acts as a reverse operation to CCW_CMD_WRITE_STATUS.
-
-\drivernormative{\paragraph}{Communicating Status Information}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
-
-If the device posts a unit check with command reject in response to the
-CCW_CMD_WRITE_STATUS command, the driver MUST assume that the device failed
-to set the status and the \field{device status} field retained
-its previous value.
-
-If at least revision 2 has been negotiated, the driver SHOULD use the
-CCW_CMD_READ_STATUS command to retrieve the \field{device status} field after
-a configuration change has been detected.
-
-If not at least revision 2 has been negotiated, the driver MUST NOT attempt
-to issue the CCW_CMD_READ_STATUS command.
-
-\devicenormative{\paragraph}{Communicating Status Information}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
-
-If the device fails to set the \field{device status} field
-to the value written by the driver, the device MUST assure
-that the \field{device status} field is left unchanged and
-MUST post a unit check with command reject.
-
-If at least revision 2 has been negotiated, the device MUST return the
-current \field{device status} field if the CCW_CMD_READ_STATUS
-command is issued.
-
-\subsubsection{Handling Device Features}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Device Features}
-
-Feature bits are arranged in an array of 32 bit values, making
-for a total of 8192 feature bits. Feature bits are in
-little-endian byte order.
-
-The CCW commands dealing with features use the following
-communication block:
-
-\begin{lstlisting}
-struct virtio_feature_desc {
-        le32 features;
-        u8 index;
-};
-\end{lstlisting}
-
-\field{features} are the 32 bits of features currently accessed, while
-\field{index} describes which of the feature bit values is to be
-accessed. No padding is added at the end of the structure, it is
-exactly 5 bytes in length.
-
-The guest obtains the device's device feature set via the
-CCW_CMD_READ_FEAT command. The device stores the features at \field{index}
-to \field{features}.
-
-For communicating its supported features to the device, the driver
-uses the CCW_CMD_WRITE_FEAT command, denoting a \field{features}/\field{index}
-combination.
-
-\subsubsection{Device Configuration}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Device Configuration}
-
-The device's configuration space is located in host memory.
-
-To obtain information from the configuration space, the driver
-uses CCW_CMD_READ_CONF, specifying the guest memory for the device
-to write to.
-
-For changing configuration information, the driver uses
-CCW_CMD_WRITE_CONF, specifying the guest memory for the device to
-read from.
-
-In both cases, the complete configuration space is transmitted.  This
-allows the driver to compare the new configuration space with the old
-version, and keep a generation count internally whenever it changes.
-
-\subsubsection{Setting Up Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators}
-
-In order to set up the indicator bits for host->guest notification,
-the driver uses different channel commands depending on whether it
-wishes to use traditional I/O interrupts tied to a subchannel or
-adapter I/O interrupts for virtqueue notifications. For any given
-device, the two mechanisms are mutually exclusive.
-
-For the configuration change indicators, only a mechanism using
-traditional I/O interrupts is provided, regardless of whether
-traditional or adapter I/O interrupts are used for virtqueue
-notifications.
-
-\paragraph{Setting Up Classic Queue Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Classic Queue Indicators}
-
-Indicators for notification via classic I/O interrupts are contained
-in a 64 bit value per virtio-ccw proxy device.
-
-To communicate the location of the indicator bits for host->guest
-notification, the driver uses the CCW_CMD_SET_IND command,
-pointing to a location containing the guest address of the
-indicators in a 64 bit value.
-
-If the driver has already set up two-staged queue indicators via the
-CCW_CMD_SET_IND_ADAPTER command, the device MUST post a unit check
-with command reject to any subsequent CCW_CMD_SET_IND command.
-
-\paragraph{Setting Up Configuration Change Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Configuration Change Indicators}
-
-Indicators for configuration change host->guest notification are
-contained in a 64 bit value per virtio-ccw proxy device.
-
-To communicate the location of the indicator bits used in the
-configuration change host->guest notification, the driver issues the
-CCW_CMD_SET_CONF_IND command, pointing to a location containing the
-guest address of the indicators in a 64 bit value.
-
-\paragraph{Setting Up Two-Stage Queue Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Two-Stage Queue Indicators}
-
-Indicators for notification via adapter I/O interrupts consist of
-two stages:
-\begin{itemize}
-\item a summary indicator byte covering the virtqueues for one or more
-  virtio-ccw proxy devices
-\item a set of contigous indicator bits for the virtqueues for a
-  virtio-ccw proxy device
-\end{itemize}
-
-To communicate the location of the summary and queue indicator bits,
-the driver uses the CCW_CMD_SET_IND_ADAPTER command with the following
-payload:
-
-\begin{lstlisting}
-struct virtio_thinint_area {
-        be64 summary_indicator;
-        be64 indicator;
-        be64 bit_nr;
-        u8 isc;
-} __attribute__ ((packed));
-\end{lstlisting}
-
-\field{summary_indicator} contains the guest address of the 8 bit summary
-indicator.
-\field{indicator} contains the guest address of an area wherein the indicators
-for the devices are contained, starting at \field{bit_nr}, one bit per
-virtqueue of the device. Bit numbers start at the left, i.e. the most
-significant bit in the first byte is assigned the bit number 0.
-\field{isc} contains the I/O interruption subclass to be used for the adapter
-I/O interrupt. It MAY be different from the isc used by the proxy
-virtio-ccw device's subchannel.
-No padding is added at the end of the structure, it is exactly 25 bytes
-in length.
-
-
-\devicenormative{\subparagraph}{Setting Up Two-Stage Queue Indicators}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Two-Stage Queue Indicators}
-If the driver has already set up classic queue indicators via the
-CCW_CMD_SET_IND command, the device MUST post a unit check with
-command reject to any subsequent CCW_CMD_SET_IND_ADAPTER command.
-
-\paragraph{Legacy Interfaces: A Note on Setting Up Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Legacy Interfaces: A Note on Setting Up Indicators}
-
-In some cases, legacy devices will only support classic queue indicators;
-in that case, they will reject CCW_CMD_SET_IND_ADAPTER as they don't know that
-command. Some legacy devices will support two-stage queue indicators, though,
-and a driver will be able to successfully use CCW_CMD_SET_IND_ADAPTER to set
-them up.
-
-\subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation}
-
-\subsubsection{Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification}
-
-There are two modes of operation regarding host->guest notification,
-classic I/O interrupts and adapter I/O interrupts. The mode to be
-used is determined by the driver by using CCW_CMD_SET_IND respectively
-CCW_CMD_SET_IND_ADAPTER to set up queue indicators.
-
-For configuration changes, the driver always uses classic I/O
-interrupts.
-
-\paragraph{Notification via Classic I/O Interrupts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Classic I/O Interrupts}
-
-If the driver used the CCW_CMD_SET_IND command to set up queue
-indicators, the device will use classic I/O interrupts for
-host->guest notification about virtqueue activity.
-
-For notifying the driver of virtqueue buffers, the device sets the
-corresponding bit in the guest-provided indicators. If an
-interrupt is not already pending for the subchannel, the device
-generates an unsolicited I/O interrupt.
-
-If the device wants to notify the driver about configuration
-changes, it sets bit 0 in the configuration indicators and
-generates an unsolicited I/O interrupt, if needed. This also
-applies if adapter I/O interrupts are used for queue notifications.
-
-\paragraph{Notification via Adapter I/O Interrupts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
-
-If the driver used the CCW_CMD_SET_IND_ADAPTER command to set up
-queue indicators, the device will use adapter I/O interrupts for
-host->guest notification about virtqueue activity.
-
-For notifying the driver of virtqueue buffers, the device sets the
-bit in the guest-provided indicator area at the corresponding offset.
-The guest-provided summary indicator is set to 0x01. An adapter I/O
-interrupt for the corresponding interruption subclass is generated.
-
-The recommended way to process an adapter I/O interrupt by the driver
-is as follows:
-
-\begin{itemize}
-\item Process all queue indicator bits associated with the summary indicator.
-\item Clear the summary indicator, performing a synchronization (memory
-barrier) afterwards.
-\item Process all queue indicator bits associated with the summary indicator
-again.
-\end{itemize}
-
-\devicenormative{\subparagraph}{Notification via Adapter I/O Interrupts}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
-
-The device SHOULD only generate an adapter I/O interrupt if the
-summary indicator had not been set prior to notification.
-
-\drivernormative{\subparagraph}{Notification via Adapter I/O Interrupts}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
-The driver
-MUST clear the summary indicator after receiving an adapter I/O
-interrupt before it processes the queue indicators.
-
-\paragraph{Legacy Interfaces: A Note on Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Legacy Interfaces: A Note on Host->Guest Notification}
-
-As legacy devices and drivers support only classic queue indicators,
-host->guest notification will always be done via classic I/O interrupts.
-
-\subsubsection{Guest->Host Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
-
-For notifying the device of virtqueue buffers, the driver
-unfortunately can't use a channel command (the asynchronous
-characteristics of channel I/O interact badly with the host block
-I/O backend). Instead, it uses a diagnose 0x500 call with subcode
-3 specifying the queue, as follows:
-
-\begin{tabular}{ |l|l|l| }
-\hline
-GPR  &   Input Value     & Output Value \\
-\hline \hline
-  1   &       0x3         &              \\
-\hline
-  2   &  Subchannel ID    & Host Cookie  \\
-\hline
-  3   & Notification data &              \\
-\hline
-  4   &   Host Cookie     &              \\
-\hline
-\end{tabular}
-
-When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
-the \field{Notification data} contains the Virtqueue number.
-
-When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
-the value has the following format:
-\lstinputlisting{notifications-be.c}
-
-See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
-for the definition of the components.
-
-\devicenormative{\paragraph}{Guest->Host Notification}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
-The device MUST ignore bits 0-31 (counting from the left) of GPR2.
-This aligns passing the subchannel ID with the way it is passed
-for the existing I/O instructions.
-
-The device MAY return a 64-bit host cookie in GPR2 to speed up the
-notification execution.
-
-\drivernormative{\paragraph}{Guest->Host Notification}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
-
-For each notification, the driver SHOULD use GPR4 to pass the host cookie received in GPR2 from the previous notication.
-
-\begin{note}
-For example:
-\begin{lstlisting}
-info->cookie = do_notify(schid,
-                         virtqueue_get_queue_index(vq),
-                         info->cookie);
-\end{lstlisting}
-\end{note}
-
-\subsubsection{Resetting Devices}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Resetting Devices}
-
-In order to reset a device, a driver sends the
-CCW_CMD_VDEV_RESET command. This command does not carry any payload.
-
-The device signals completion of the virtio reset operation through successful
-conclusion of the CCW_CMD_VDEV_RESET channel command. In particular, the
-command not only triggers the reset operation, but the reset operation is
-already completed when the operation concludes successfully.
-
-\devicenormative{\paragraph}{Resetting Devices}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Resetting Devices}
-
-The device MUST finish the virtio reset operation and reinitialize
-\field{device status} to zero before it concludes the CCW_CMD_VDEV_RESET
-command successfully.
-
-The device MUST NOT send notifications or interact with the queues after
-it signaled successful conclusion of the CCW_CMD_VDEV_RESET command.
-
-\drivernormative{\paragraph}{Resetting Devices}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Resetting Devices}
-
-The driver MAY consider the virtio reset operation to be complete already after
-successful conclusion of the CCW_CMD_VDEV_RESET channel command, although it
-MAY also choose to verify reset completion by reading \field{device status} via
-CCW_CMD_READ_STATUS and checking whether it is 0 afterwards.
+\input{transport-ccw.tex}
 
 \chapter{Device Types}\label{sec:Device Types}
 
diff --git a/transport-ccw.tex b/transport-ccw.tex
new file mode 100644
index 0000000..93401a4
--- /dev/null
+++ b/transport-ccw.tex
@@ -0,0 +1,601 @@
+\section{Virtio Over Channel I/O}\label{sec:Virtio Transport Options / Virtio Over Channel I/O}
+
+S/390 based virtual machines support neither PCI nor MMIO, so a
+different transport is needed there.
+
+virtio-ccw uses the standard channel I/O based mechanism used for
+the majority of devices on S/390. A virtual channel device with a
+special control unit type acts as proxy to the virtio device
+(similar to the way virtio-pci uses a PCI device) and
+configuration and operation of the virtio device is accomplished
+(mostly) via channel commands. This means virtio devices are
+discoverable via standard operating system algorithms, and adding
+virtio support is mainly a question of supporting a new control
+unit type.
+
+As the S/390 is a big endian machine, the data structures transmitted
+via channel commands are big-endian: this is made clear by use of
+the types be16, be32 and be64.
+
+\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
+
+As a proxy device, virtio-ccw uses a channel-attached I/O control
+unit with a special control unit type (0x3832) and a control unit
+model corresponding to the attached virtio device's subsystem
+device ID, accessed via a virtual I/O subchannel and a virtual
+channel path of type 0x32. This proxy device is discoverable via
+normal channel subsystem device discovery (usually a STORE
+SUBCHANNEL loop) and answers to the basic channel commands:
+
+\begin{itemize}
+\item NO-OPERATION (0x03)
+\item BASIC SENSE (0x04)
+\item TRANSFER IN CHANNEL (0x08)
+\item SENSE ID (0xe4)
+\end{itemize}
+
+For a virtio-ccw proxy device, SENSE ID will return the following
+information:
+
+\begin{tabular}{ |l|l|l| }
+\hline
+Bytes & Description & Contents \\
+\hline \hline
+0     & reserved              & 0xff \\
+\hline
+1-2   & control unit type     & 0x3832 \\
+\hline
+3     & control unit model    & <virtio device id> \\
+\hline
+4-5   & device type           & zeroes (unset) \\
+\hline
+6     & device model          & zeroes (unset) \\
+\hline
+7-255 & extended SenseId data & zeroes (unset) \\
+\hline
+\end{tabular}
+
+A virtio-ccw proxy device facilitates:
+\begin{itemize} 
+\item Discovery and attachment of virtio devices (as described above).
+\item Initialization of virtqueues and transport-specific facilities (using
+      virtio-specific channel commands).
+\item Notifications (via hypercall and a combination of I/O interrupts
+      and indicator bits).
+\end{itemize} 
+
+\subsubsection{Channel Commands for Virtio}\label{sec:Virtio Transport Options / Virtio
+over channel I/O / Basic Concepts/ Channel Commands for Virtio}
+
+In addition to the basic channel commands, virtio-ccw defines a
+set of channel commands related to configuration and operation of
+virtio:
+
+\begin{lstlisting}
+#define CCW_CMD_SET_VQ 0x13
+#define CCW_CMD_VDEV_RESET 0x33
+#define CCW_CMD_SET_IND 0x43
+#define CCW_CMD_SET_CONF_IND 0x53
+#define CCW_CMD_SET_IND_ADAPTER 0x73
+#define CCW_CMD_READ_FEAT 0x12
+#define CCW_CMD_WRITE_FEAT 0x11
+#define CCW_CMD_READ_CONF 0x22
+#define CCW_CMD_WRITE_CONF 0x21
+#define CCW_CMD_WRITE_STATUS 0x31
+#define CCW_CMD_READ_VQ_CONF 0x32
+#define CCW_CMD_SET_VIRTIO_REV 0x83
+#define CCW_CMD_READ_STATUS 0x72
+\end{lstlisting}
+
+\subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
+over channel I/O / Basic Concepts/ Notifications}
+
+Available buffer notifications are realized as a hypercall. No additional
+setup by the driver is needed. The operation of available buffer
+notifications is described in section \ref{sec:Virtio Transport Options /
+Virtio over channel I/O / Device Operation / Guest->Host Notification}.
+
+Used buffer notifications are realized either as so-called classic or
+adapter I/O interrupts depending on a transport level negotiation. The
+initialization is described in sections \ref{sec:Virtio Transport Options
+/ Virtio over channel I/O / Device Initialization / Setting Up Indicators
+/ Setting Up Classic Queue Indicators} and \ref{sec:Virtio Transport
+Options / Virtio over channel I/O / Device Initialization / Setting Up
+Indicators / Setting Up Two-Stage Queue Indicators} respectively.  The
+operation of each flavor is described in sections \ref{sec:Virtio
+Transport Options / Virtio over channel I/O / Device Operation /
+Host->Guest Notification / Notification via Classic I/O Interrupts} and
+\ref{sec:Virtio Transport Options / Virtio over channel I/O / Device
+Operation / Host->Guest Notification / Notification via Adapter I/O
+Interrupts} respectively. 
+
+Configuration change notifications are done using so-called classic I/O
+interrupts. The initialization is described in section \ref{sec:Virtio
+Transport Options / Virtio over channel I/O / Device Initialization /
+Setting Up Indicators / Setting Up Configuration Change Indicators} and
+the operation in section \ref{sec:Virtio Transport Options / Virtio over
+channel I/O / Device Operation / Host->Guest Notification / Notification
+via Classic I/O Interrupts}.
+
+\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
+
+The virtio-ccw device acts like a normal channel device, as specified
+in \hyperref[intro:S390 PoP]{[S390 PoP]} and \hyperref[intro:S390 Common I/O]{[S390 Common I/O]}. In particular:
+
+\begin{itemize}
+\item A device MUST post a unit check with command reject for any command
+  it does not support.
+
+\item If a driver did not suppress length checks for a channel command,
+  the device MUST present a subchannel status as detailed in the
+  architecture when the actual length did not match the expected length.
+
+\item If a driver did suppress length checks for a channel command, the
+  device MUST present a check condition if the transmitted data does
+  not contain enough data to process the command. If the driver submitted
+  a buffer that was too long, the device SHOULD accept the command.
+\end{itemize}
+
+\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
+
+A driver for virtio-ccw devices MUST check for a control unit
+type of 0x3832 and MUST ignore the device type and model.
+
+A driver SHOULD attempt to provide the correct length in a channel
+command even if it suppresses length checks for that command.
+
+\subsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization}
+
+virtio-ccw uses several channel commands to set up a device.
+
+\subsubsection{Setting the Virtio Revision}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
+
+CCW_CMD_SET_VIRTIO_REV is issued by the driver to set the revision of
+the virtio-ccw transport it intends to drive the device with. It uses the
+following communication structure:
+
+\begin{lstlisting}
+struct virtio_rev_info {
+        be16 revision;
+        be16 length;
+        u8 data[];
+};
+\end{lstlisting}
+
+\field{revision} contains the desired revision id, \field{length} the length of the
+data portion and \field{data} revision-dependent additional desired options.
+
+The following values are supported:
+
+\begin{tabular}{ |l|l|l|l| }
+\hline
+\field{revision} & \field{length} & \field{data}      & remarks \\
+\hline \hline
+0        & 0      & <empty>   & legacy interface; transitional devices only \\
+\hline
+1        & 0      & <empty>   & Virtio 1 \\
+\hline
+2        & 0      & <empty>   & CCW_CMD_READ_STATUS support \\
+\hline
+3-n      &        &           & reserved for later revisions \\
+\hline
+\end{tabular}
+
+Note that a change in the virtio standard does not necessarily
+correspond to a change in the virtio-ccw revision.
+
+\devicenormative{\paragraph}{Setting the Virtio Revision}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
+
+A device MUST post a unit check with command reject for any \field{revision}
+it does not support. For any invalid combination of \field{revision}, \field{length}
+and \field{data}, it MUST post a unit check with command reject as well. A
+non-transitional device MUST reject revision id 0.
+
+A device SHOULD answer with command reject to any virtio-ccw specific
+channel command that is not contained in the revision selected by the
+driver.
+
+A device MUST answer with command reject to any attempt to select a different revision
+after a revision has been successfully selected by the driver.
+
+A device MUST treat the revision as unset from the time the associated
+subchannel has been enabled until a revision has been successfully set
+by the driver. This implies that revisions are not persistent across
+disabling and enabling of the associated subchannel.
+
+\drivernormative{\paragraph}{Setting the Virtio Revision}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
+
+A driver SHOULD start with trying to set the highest revision it
+supports and continue with lower revisions if it gets a command reject.
+
+A driver MUST NOT issue any other virtio-ccw specific channel commands
+prior to setting the revision.
+
+After a revision has been successfully selected by the driver, it
+MUST NOT attempt to select a different revision.
+
+\paragraph{Legacy Interfaces: A Note on Setting the Virtio Revision}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision / Legacy Interfaces: A Note on Setting the Virtio Revision}
+
+A legacy device will not support the CCW_CMD_SET_VIRTIO_REV and answer
+with a command reject. A non-transitional driver MUST stop trying to
+operate this device in that case. A transitional driver MUST operate
+the device as if it had been able to set revision 0.
+
+A legacy driver will not issue the CCW_CMD_SET_VIRTIO_REV prior to
+issuing other virtio-ccw specific channel commands. A non-transitional
+device therefore MUST answer any such attempts with a command reject.
+A transitional device MUST assume in this case that the driver is a
+legacy driver and continue as if the driver selected revision 0. This
+implies that the device MUST reject any command not valid for revision
+0, including a subsequent CCW_CMD_SET_VIRTIO_REV.
+
+\subsubsection{Configuring a Virtqueue}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue}
+
+CCW_CMD_READ_VQ_CONF is issued by the driver to obtain information
+about a queue. It uses the following structure for communicating:
+
+\begin{lstlisting}
+struct vq_config_block {
+        be16 index;
+        be16 max_num;
+};
+\end{lstlisting}
+
+The requested number of buffers for queue \field{index} is returned in
+\field{max_num}.
+
+Afterwards, CCW_CMD_SET_VQ is issued by the driver to inform the
+device about the location used for its queue. The transmitted
+structure is
+
+\begin{lstlisting}
+struct vq_info_block {
+        be64 desc;
+        be32 res0;
+        be16 index;
+        be16 num;
+        be64 driver;
+        be64 device;
+};
+\end{lstlisting}
+
+\field{desc}, \field{driver} and \field{device} contain the guest
+addresses for the descriptor area,
+available area and used area for queue \field{index}, respectively. The actual
+virtqueue size (number of allocated buffers) is transmitted in \field{num}.
+
+\devicenormative{\paragraph}{Configuring a Virtqueue}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue}
+
+\field{res0} is reserved and MUST be ignored by the device.
+
+\paragraph{Legacy Interface: A Note on Configuring a Virtqueue}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue / Legacy Interface: A Note on Configuring a Virtqueue}
+
+For a legacy driver or for a driver that selected revision 0,
+CCW_CMD_SET_VQ uses the following communication block:
+
+\begin{lstlisting}
+struct vq_info_block_legacy {
+        be64 queue;
+        be32 align;
+        be16 index;
+        be16 num;
+};
+\end{lstlisting}
+
+\field{queue} contains the guest address for queue \field{index}, \field{num} the number of buffers
+and \field{align} the alignment. The queue layout follows \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}.
+
+\subsubsection{Communicating Status Information}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
+
+The driver changes the status of a device via the
+CCW_CMD_WRITE_STATUS command, which transmits an 8 bit status
+value.
+
+As described in
+\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits},
+a device sometimes fails to set the \field{device status} field: For example, it
+might fail to accept the FEATURES_OK status bit during device initialization.
+
+With revision 2, CCW_CMD_READ_STATUS is defined: It reads an 8 bit status
+value from the device and acts as a reverse operation to CCW_CMD_WRITE_STATUS.
+
+\drivernormative{\paragraph}{Communicating Status Information}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
+
+If the device posts a unit check with command reject in response to the
+CCW_CMD_WRITE_STATUS command, the driver MUST assume that the device failed
+to set the status and the \field{device status} field retained
+its previous value.
+
+If at least revision 2 has been negotiated, the driver SHOULD use the
+CCW_CMD_READ_STATUS command to retrieve the \field{device status} field after
+a configuration change has been detected.
+
+If not at least revision 2 has been negotiated, the driver MUST NOT attempt
+to issue the CCW_CMD_READ_STATUS command.
+
+\devicenormative{\paragraph}{Communicating Status Information}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
+
+If the device fails to set the \field{device status} field
+to the value written by the driver, the device MUST assure
+that the \field{device status} field is left unchanged and
+MUST post a unit check with command reject.
+
+If at least revision 2 has been negotiated, the device MUST return the
+current \field{device status} field if the CCW_CMD_READ_STATUS
+command is issued.
+
+\subsubsection{Handling Device Features}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Device Features}
+
+Feature bits are arranged in an array of 32 bit values, making
+for a total of 8192 feature bits. Feature bits are in
+little-endian byte order.
+
+The CCW commands dealing with features use the following
+communication block:
+
+\begin{lstlisting}
+struct virtio_feature_desc {
+        le32 features;
+        u8 index;
+};
+\end{lstlisting}
+
+\field{features} are the 32 bits of features currently accessed, while
+\field{index} describes which of the feature bit values is to be
+accessed. No padding is added at the end of the structure, it is
+exactly 5 bytes in length.
+
+The guest obtains the device's device feature set via the
+CCW_CMD_READ_FEAT command. The device stores the features at \field{index}
+to \field{features}.
+
+For communicating its supported features to the device, the driver
+uses the CCW_CMD_WRITE_FEAT command, denoting a \field{features}/\field{index}
+combination.
+
+\subsubsection{Device Configuration}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Device Configuration}
+
+The device's configuration space is located in host memory.
+
+To obtain information from the configuration space, the driver
+uses CCW_CMD_READ_CONF, specifying the guest memory for the device
+to write to.
+
+For changing configuration information, the driver uses
+CCW_CMD_WRITE_CONF, specifying the guest memory for the device to
+read from.
+
+In both cases, the complete configuration space is transmitted.  This
+allows the driver to compare the new configuration space with the old
+version, and keep a generation count internally whenever it changes.
+
+\subsubsection{Setting Up Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators}
+
+In order to set up the indicator bits for host->guest notification,
+the driver uses different channel commands depending on whether it
+wishes to use traditional I/O interrupts tied to a subchannel or
+adapter I/O interrupts for virtqueue notifications. For any given
+device, the two mechanisms are mutually exclusive.
+
+For the configuration change indicators, only a mechanism using
+traditional I/O interrupts is provided, regardless of whether
+traditional or adapter I/O interrupts are used for virtqueue
+notifications.
+
+\paragraph{Setting Up Classic Queue Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Classic Queue Indicators}
+
+Indicators for notification via classic I/O interrupts are contained
+in a 64 bit value per virtio-ccw proxy device.
+
+To communicate the location of the indicator bits for host->guest
+notification, the driver uses the CCW_CMD_SET_IND command,
+pointing to a location containing the guest address of the
+indicators in a 64 bit value.
+
+If the driver has already set up two-staged queue indicators via the
+CCW_CMD_SET_IND_ADAPTER command, the device MUST post a unit check
+with command reject to any subsequent CCW_CMD_SET_IND command.
+
+\paragraph{Setting Up Configuration Change Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Configuration Change Indicators}
+
+Indicators for configuration change host->guest notification are
+contained in a 64 bit value per virtio-ccw proxy device.
+
+To communicate the location of the indicator bits used in the
+configuration change host->guest notification, the driver issues the
+CCW_CMD_SET_CONF_IND command, pointing to a location containing the
+guest address of the indicators in a 64 bit value.
+
+\paragraph{Setting Up Two-Stage Queue Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Two-Stage Queue Indicators}
+
+Indicators for notification via adapter I/O interrupts consist of
+two stages:
+\begin{itemize}
+\item a summary indicator byte covering the virtqueues for one or more
+  virtio-ccw proxy devices
+\item a set of contigous indicator bits for the virtqueues for a
+  virtio-ccw proxy device
+\end{itemize}
+
+To communicate the location of the summary and queue indicator bits,
+the driver uses the CCW_CMD_SET_IND_ADAPTER command with the following
+payload:
+
+\begin{lstlisting}
+struct virtio_thinint_area {
+        be64 summary_indicator;
+        be64 indicator;
+        be64 bit_nr;
+        u8 isc;
+} __attribute__ ((packed));
+\end{lstlisting}
+
+\field{summary_indicator} contains the guest address of the 8 bit summary
+indicator.
+\field{indicator} contains the guest address of an area wherein the indicators
+for the devices are contained, starting at \field{bit_nr}, one bit per
+virtqueue of the device. Bit numbers start at the left, i.e. the most
+significant bit in the first byte is assigned the bit number 0.
+\field{isc} contains the I/O interruption subclass to be used for the adapter
+I/O interrupt. It MAY be different from the isc used by the proxy
+virtio-ccw device's subchannel.
+No padding is added at the end of the structure, it is exactly 25 bytes
+in length.
+
+
+\devicenormative{\subparagraph}{Setting Up Two-Stage Queue Indicators}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Two-Stage Queue Indicators}
+If the driver has already set up classic queue indicators via the
+CCW_CMD_SET_IND command, the device MUST post a unit check with
+command reject to any subsequent CCW_CMD_SET_IND_ADAPTER command.
+
+\paragraph{Legacy Interfaces: A Note on Setting Up Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Legacy Interfaces: A Note on Setting Up Indicators}
+
+In some cases, legacy devices will only support classic queue indicators;
+in that case, they will reject CCW_CMD_SET_IND_ADAPTER as they don't know that
+command. Some legacy devices will support two-stage queue indicators, though,
+and a driver will be able to successfully use CCW_CMD_SET_IND_ADAPTER to set
+them up.
+
+\subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation}
+
+\subsubsection{Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification}
+
+There are two modes of operation regarding host->guest notification,
+classic I/O interrupts and adapter I/O interrupts. The mode to be
+used is determined by the driver by using CCW_CMD_SET_IND respectively
+CCW_CMD_SET_IND_ADAPTER to set up queue indicators.
+
+For configuration changes, the driver always uses classic I/O
+interrupts.
+
+\paragraph{Notification via Classic I/O Interrupts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Classic I/O Interrupts}
+
+If the driver used the CCW_CMD_SET_IND command to set up queue
+indicators, the device will use classic I/O interrupts for
+host->guest notification about virtqueue activity.
+
+For notifying the driver of virtqueue buffers, the device sets the
+corresponding bit in the guest-provided indicators. If an
+interrupt is not already pending for the subchannel, the device
+generates an unsolicited I/O interrupt.
+
+If the device wants to notify the driver about configuration
+changes, it sets bit 0 in the configuration indicators and
+generates an unsolicited I/O interrupt, if needed. This also
+applies if adapter I/O interrupts are used for queue notifications.
+
+\paragraph{Notification via Adapter I/O Interrupts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
+
+If the driver used the CCW_CMD_SET_IND_ADAPTER command to set up
+queue indicators, the device will use adapter I/O interrupts for
+host->guest notification about virtqueue activity.
+
+For notifying the driver of virtqueue buffers, the device sets the
+bit in the guest-provided indicator area at the corresponding offset.
+The guest-provided summary indicator is set to 0x01. An adapter I/O
+interrupt for the corresponding interruption subclass is generated.
+
+The recommended way to process an adapter I/O interrupt by the driver
+is as follows:
+
+\begin{itemize}
+\item Process all queue indicator bits associated with the summary indicator.
+\item Clear the summary indicator, performing a synchronization (memory
+barrier) afterwards.
+\item Process all queue indicator bits associated with the summary indicator
+again.
+\end{itemize}
+
+\devicenormative{\subparagraph}{Notification via Adapter I/O Interrupts}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
+
+The device SHOULD only generate an adapter I/O interrupt if the
+summary indicator had not been set prior to notification.
+
+\drivernormative{\subparagraph}{Notification via Adapter I/O Interrupts}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
+The driver
+MUST clear the summary indicator after receiving an adapter I/O
+interrupt before it processes the queue indicators.
+
+\paragraph{Legacy Interfaces: A Note on Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Legacy Interfaces: A Note on Host->Guest Notification}
+
+As legacy devices and drivers support only classic queue indicators,
+host->guest notification will always be done via classic I/O interrupts.
+
+\subsubsection{Guest->Host Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
+
+For notifying the device of virtqueue buffers, the driver
+unfortunately can't use a channel command (the asynchronous
+characteristics of channel I/O interact badly with the host block
+I/O backend). Instead, it uses a diagnose 0x500 call with subcode
+3 specifying the queue, as follows:
+
+\begin{tabular}{ |l|l|l| }
+\hline
+GPR  &   Input Value     & Output Value \\
+\hline \hline
+  1   &       0x3         &              \\
+\hline
+  2   &  Subchannel ID    & Host Cookie  \\
+\hline
+  3   & Notification data &              \\
+\hline
+  4   &   Host Cookie     &              \\
+\hline
+\end{tabular}
+
+When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
+the \field{Notification data} contains the Virtqueue number.
+
+When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
+the value has the following format:
+\lstinputlisting{notifications-be.c}
+
+See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
+for the definition of the components.
+
+\devicenormative{\paragraph}{Guest->Host Notification}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
+The device MUST ignore bits 0-31 (counting from the left) of GPR2.
+This aligns passing the subchannel ID with the way it is passed
+for the existing I/O instructions.
+
+The device MAY return a 64-bit host cookie in GPR2 to speed up the
+notification execution.
+
+\drivernormative{\paragraph}{Guest->Host Notification}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
+
+For each notification, the driver SHOULD use GPR4 to pass the host cookie received in GPR2 from the previous notication.
+
+\begin{note}
+For example:
+\begin{lstlisting}
+info->cookie = do_notify(schid,
+                         virtqueue_get_queue_index(vq),
+                         info->cookie);
+\end{lstlisting}
+\end{note}
+
+\subsubsection{Resetting Devices}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Resetting Devices}
+
+In order to reset a device, a driver sends the
+CCW_CMD_VDEV_RESET command. This command does not carry any payload.
+
+The device signals completion of the virtio reset operation through successful
+conclusion of the CCW_CMD_VDEV_RESET channel command. In particular, the
+command not only triggers the reset operation, but the reset operation is
+already completed when the operation concludes successfully.
+
+\devicenormative{\paragraph}{Resetting Devices}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Resetting Devices}
+
+The device MUST finish the virtio reset operation and reinitialize
+\field{device status} to zero before it concludes the CCW_CMD_VDEV_RESET
+command successfully.
+
+The device MUST NOT send notifications or interact with the queues after
+it signaled successful conclusion of the CCW_CMD_VDEV_RESET command.
+
+\drivernormative{\paragraph}{Resetting Devices}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Resetting Devices}
+
+The driver MAY consider the virtio reset operation to be complete already after
+successful conclusion of the CCW_CMD_VDEV_RESET channel command, although it
+MAY also choose to verify reset completion by reading \field{device status} via
+CCW_CMD_READ_STATUS and checking whether it is 0 afterwards.
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 2/6] transport-mmio: Split MMIO transport to its own file
From: Parav Pandit @ 2023-02-23  4:09 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223040919.166617-1-parav@nvidia.com>

Place MMIO transport specification in its own file to better maintain it.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/157
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 content.tex        | 554 +--------------------------------------------
 transport-mmio.tex | 552 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 553 insertions(+), 553 deletions(-)
 create mode 100644 transport-mmio.tex

diff --git a/content.tex b/content.tex
index be911e6..80c28df 100644
--- a/content.tex
+++ b/content.tex
@@ -580,559 +580,7 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
 into virtio general and bus-specific sections.
 
 \input{transport-pci.tex}
-
-\subsection{MMIO Device Discovery}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO Device Discovery}
-
-Unlike PCI, MMIO provides no generic device discovery mechanism.  For each
-device, the guest OS will need to know the location of the registers
-and interrupt(s) used.  The suggested binding for systems using
-flattened device trees is shown in this example:
-
-\begin{lstlisting}
-// EXAMPLE: virtio_block device taking 512 bytes at 0x1e000, interrupt 42.
-virtio_block@1e000 {
-        compatible = "virtio,mmio";
-        reg = <0x1e000 0x200>;
-        interrupts = <42>;
-}
-\end{lstlisting}
-
-\subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
-
-MMIO virtio devices provide a set of memory mapped control
-registers followed by a device-specific configuration space,
-described in the table~\ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
-
-All register values are organized as Little Endian.
-
-\newcommand{\mmioreg}[5]{% Name Function Offset Direction Description
-  {\field{#1}} \newline #3 \newline #4 & {\bf#2} \newline #5 \\
-}
-
-\newcommand{\mmiodreg}[7]{% NameHigh NameLow Function OffsetHigh OffsetLow Direction Description
-  {\field{#1}} \newline #4 \newline {\field{#2}} \newline #5 \newline #6 & {\bf#3} \newline #7 \\
-}
-
-\begin{longtable}{p{0.2\textwidth}p{0.7\textwidth}}
-  \caption {MMIO Device Register Layout}
-  \label{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout} \\
-  \hline
-  \mmioreg{Name}{Function}{Offset from base}{Direction}{Description} 
-  \hline 
-  \hline 
-  \endfirsthead
-  \hline
-  \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description} 
-  \hline 
-  \hline 
-  \endhead
-  \endfoot
-  \endlastfoot
-  \mmioreg{MagicValue}{Magic value}{0x000}{R}{%
-    0x74726976
-    (a Little Endian equivalent of the ``virt'' string).
-  } 
-  \hline
-  \mmioreg{Version}{Device version number}{0x004}{R}{%
-    0x2.
-    \begin{note}
-      Legacy devices (see \ref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}) used 0x1.
-    \end{note}
-  }
-  \hline 
-  \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{%
-    See \ref{sec:Device Types}~\nameref{sec:Device Types} for possible values.
-    Value zero (0x0) is used to
-    define a system memory map with placeholder devices at static,
-    well known addresses, assigning functions to them depending
-    on user's needs.
-  }
-  \hline 
-  \mmioreg{VendorID}{Virtio Subsystem Vendor ID}{0x00c}{R}{}
-  \hline 
-  \mmioreg{DeviceFeatures}{Flags representing features the device supports}{0x010}{R}{%
-    Reading from this register returns 32 consecutive flag bits,
-    the least significant bit depending on the last value written to
-    \field{DeviceFeaturesSel}. Access to this register returns
-    bits $\field{DeviceFeaturesSel}*32$ to $(\field{DeviceFeaturesSel}*32)+31$, eg.
-    feature bits 0 to 31 if \field{DeviceFeaturesSel} is set to 0 and
-    features bits 32 to 63 if \field{DeviceFeaturesSel} is set to 1.
-    Also see \ref{sec:Basic Facilities of a Virtio Device / Feature Bits}~\nameref{sec:Basic Facilities of a Virtio Device / Feature Bits}.
-  }
-  \hline 
-  \mmioreg{DeviceFeaturesSel}{Device (host) features word selection.}{0x014}{W}{%
-    Writing to this register selects a set of 32 device feature bits
-    accessible by reading from \field{DeviceFeatures}.
-  }
-  \hline 
-  \mmioreg{DriverFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{%
-    Writing to this register sets 32 consecutive flag bits, the least significant
-    bit depending on the last value written to \field{DriverFeaturesSel}.
-     Access to this register sets bits $\field{DriverFeaturesSel}*32$
-    to $(\field{DriverFeaturesSel}*32)+31$, eg. feature bits 0 to 31 if
-    \field{DriverFeaturesSel} is set to 0 and features bits 32 to 63 if
-    \field{DriverFeaturesSel} is set to 1. Also see \ref{sec:Basic Facilities of a Virtio Device / Feature Bits}~\nameref{sec:Basic Facilities of a Virtio Device / Feature Bits}.
-  }
-  \hline 
-  \mmioreg{DriverFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{%
-    Writing to this register selects a set of 32 activated feature
-    bits accessible by writing to \field{DriverFeatures}.
-  }
-  \hline 
-  \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
-    Writing to this register selects the virtual queue that the
-    following operations on \field{QueueNumMax}, \field{QueueNum}, \field{QueueReady},
-    \field{QueueDescLow}, \field{QueueDescHigh}, \field{QueueDriverlLow}, \field{QueueDriverHigh},
-    \field{QueueDeviceLow}, \field{QueueDeviceHigh} and \field{QueueReset} apply to. The index
-    number of the first queue is zero (0x0). 
-  }
-  \hline 
-  \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
-    Reading from the register returns the maximum size (number of
-    elements) of the queue the device is ready to process or
-    zero (0x0) if the queue is not available. This applies to the
-    queue selected by writing to \field{QueueSel}.
-  }
-  \hline 
-  \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
-    Queue size is the number of elements in the queue.
-    Writing to this register notifies the device what size of the
-    queue the driver will use. This applies to the queue selected by
-    writing to \field{QueueSel}.
-  }
-  \hline 
-  \mmioreg{QueueReady}{Virtual queue ready bit}{0x044}{RW}{%
-    Writing one (0x1) to this register notifies the device that it can
-    execute requests from this virtual queue. Reading from this register
-    returns the last value written to it. Both read and write
-    accesses apply to the queue selected by writing to \field{QueueSel}.
-  }
-  \hline 
-  \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{%
-    Writing a value to this register notifies the device that
-    there are new buffers to process in a queue.
-
-    When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
-    the value written is the queue index.
-
-    When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
-    the \field{Notification data} value has the following format:
-
-    \lstinputlisting{notifications-le.c}
-
-    See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
-    for the definition of the components.
-  }
-  \hline 
-  \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{%
-    Reading from this register returns a bit mask of events that
-    caused the device interrupt to be asserted.
-    The following events are possible:
-    \begin{description}
-      \item[Used Buffer Notification] - bit 0 - the interrupt was asserted
-        because the device has used a buffer
-        in at least one of the active virtual queues.
-      \item [Configuration Change Notification] - bit 1 - the interrupt was
-        asserted because the configuration of the device has changed.
-    \end{description}
-  }
-  \hline 
-  \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{%
-    Writing a value with bits set as defined in \field{InterruptStatus}
-    to this register notifies the device that events causing
-    the interrupt have been handled.
-  }
-  \hline 
-  \mmioreg{Status}{Device status}{0x070}{RW}{%
-    Reading from this register returns the current device status
-    flags.
-    Writing non-zero values to this register sets the status flags,
-    indicating the driver progress. Writing zero (0x0) to this
-    register triggers a device reset. 
-    See also p. \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}.
-  }
-  \hline 
-  \mmiodreg{QueueDescLow}{QueueDescHigh}{Virtual queue's Descriptor Area 64 bit long physical address}{0x080}{0x084}{W}{%
-    Writing to these two registers (lower 32 bits of the address
-    to \field{QueueDescLow}, higher 32 bits to \field{QueueDescHigh}) notifies
-    the device about location of the Descriptor Area of the queue
-    selected by writing to \field{QueueSel} register.
-  }
-  \hline 
-  \mmiodreg{QueueDriverLow}{QueueDriverHigh}{Virtual queue's Driver Area 64 bit long physical address}{0x090}{0x094}{W}{%
-    Writing to these two registers (lower 32 bits of the address
-    to \field{QueueDriverLow}, higher 32 bits to \field{QueueDriverHigh}) notifies
-    the device about location of the Driver Area of the queue
-    selected by writing to \field{QueueSel}.
-  }
-  \hline 
-  \mmiodreg{QueueDeviceLow}{QueueDeviceHigh}{Virtual queue's Device Area 64 bit long physical address}{0x0a0}{0x0a4}{W}{%
-    Writing to these two registers (lower 32 bits of the address
-    to \field{QueueDeviceLow}, higher 32 bits to \field{QueueDeviceHigh}) notifies
-    the device about location of the Device Area of the queue
-    selected by writing to \field{QueueSel}.
-  }
-  \hline 
-  \mmioreg{SHMSel}{Shared memory id}{0x0ac}{W}{%
-    Writing to this register selects the shared memory region \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
-    following operations on \field{SHMLenLow}, \field{SHMLenHigh},
-    \field{SHMBaseLow} and \field{SHMBaseHigh} apply to.
-  }
-  \hline 
-  \mmiodreg{SHMLenLow}{SHMLenHigh}{Shared memory region 64 bit long length}{0x0b0}{0x0b4}{R}{%
-    These registers return the length of the shared memory
-    region in bytes, as defined by the device for the region selected by
-    the \field{SHMSel} register.  The lower 32 bits of the length
-    are read from \field{SHMLenLow} and the higher 32 bits from
-    \field{SHMLenHigh}.  Reading from a non-existent
-    region (i.e. where the ID written to \field{SHMSel} is unused)
-    results in a length of -1.
-  }
-  \hline 
-  \mmiodreg{SHMBaseLow}{SHMBaseHigh}{Shared memory region 64 bit long physical address}{0x0b8}{0x0bc}{R}{%
-    The driver reads these registers to discover the base address
-    of the region in physical address space.  This address is
-    chosen by the device (or other part of the VMM).
-    The lower 32 bits of the address are read from \field{SHMBaseLow}
-    with the higher 32 bits from \field{SHMBaseHigh}.  Reading
-    from a non-existent region (i.e. where the ID written to
-    \field{SHMSel} is unused) results in a base address of
-    0xffffffffffffffff.
-  }
-  \hline 
-  \mmioreg{QueueReset}{Virtual queue reset bit}{0x0c0}{RW}{%
-    If VIRTIO_F_RING_RESET has been negotiated, writing one (0x1) to this
-    register selectively resets the queue. Both read and write accesses
-    apply to the queue selected by writing to \field{QueueSel}.
-  }
-  \hline
-  \mmioreg{ConfigGeneration}{Configuration atomicity value}{0x0fc}{R}{
-    Reading from this register returns a value describing a version of the device-specific configuration space (see \field{Config}).
-    The driver can then access the configuration space and, when finished, read \field{ConfigGeneration} again.
-    If no part of the configuration space has changed between these two \field{ConfigGeneration} reads, the returned values are identical.
-    If the values are different, the configuration space accesses were not atomic and the driver has to perform the operations again.
-    See also \ref {sec:Basic Facilities of a Virtio Device / Device Configuration Space}.
-  }
-  \hline 
-  \mmioreg{Config}{Configuration space}{0x100+}{RW}{
-    Device-specific configuration space starts at the offset 0x100
-    and is accessed with byte alignment. Its meaning and size
-    depend on the device and the driver.
-  }
-  \hline
-\end{longtable}
-
-\devicenormative{\subsubsection}{MMIO Device Register Layout}{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
-
-The device MUST return 0x74726976 in \field{MagicValue}.
-
-The device MUST return value 0x2 in \field{Version}.
-
-The device MUST present each event by setting the corresponding bit in \field{InterruptStatus} from the
-moment it takes place, until the driver acknowledges the interrupt
-by writing a corresponding bit mask to the \field{InterruptACK} register.  Bits which
-do not represent events which took place MUST be zero.
-
-Upon reset, the device MUST clear all bits in \field{InterruptStatus} and ready bits in the
-\field{QueueReady} register for all queues in the device.
-
-The device MUST change value returned in \field{ConfigGeneration} if there is any risk of a
-driver seeing an inconsistent configuration state.
-
-The device MUST NOT access virtual queue contents when \field{QueueReady} is zero (0x0).
-
-If VIRTIO_F_RING_RESET has been negotiated, the device MUST present a 0 in
-\field{QueueReset} on reset.
-
-If VIRTIO_F_RING_RESET has been negotiated, The device MUST present a 0 in
-\field{QueueReset} after the virtqueue is enabled with \field{QueueReady}.
-
-The device MUST reset the queue when 1 is written to \field{QueueReset}. The
-device MUST continue to present 1 in \field{QueueReset} as long as the queue reset
-is ongoing. The device MUST present 0 in both \field{QueueReset} and \field{QueueReady}
-when queue reset has completed.
-(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
-
-\drivernormative{\subsubsection}{MMIO Device Register Layout}{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
-The driver MUST NOT access memory locations not described in the
-table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}
-(or, in case of the configuration space, described in the device specification),
-MUST NOT write to the read-only registers (direction R) and
-MUST NOT read from the write-only registers (direction W).
-
-The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers
-described in table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
-For the device-specific configuration space, the driver MUST use 8 bit wide accesses for
-8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and
-aligned accesses for 32 and 64 bit wide fields.
-
-The driver MUST ignore a device with \field{MagicValue} which is not 0x74726976,
-although it MAY report an error.
-
-The driver MUST ignore a device with \field{Version} which is not 0x2,
-although it MAY report an error.
-
-The driver MUST ignore a device with \field{DeviceID} 0x0,
-but MUST NOT report any error.
-
-Before reading from \field{DeviceFeatures}, the driver MUST write a value to \field{DeviceFeaturesSel}.
-
-Before writing to the \field{DriverFeatures} register, the driver MUST write a value to the \field{DriverFeaturesSel} register.
-
-The driver MUST write a value to \field{QueueNum} which is less than
-or equal to the value presented by the device in \field{QueueNumMax}.
-
-When \field{QueueReady} is not zero, the driver MUST NOT access
-\field{QueueNum}, \field{QueueDescLow}, \field{QueueDescHigh},
-\field{QueueDriverLow}, \field{QueueDriverHigh}, \field{QueueDeviceLow}, \field{QueueDeviceHigh}.
-
-To stop using the queue the driver MUST write zero (0x0) to this
-\field{QueueReady} and MUST read the value back to ensure
-synchronization.
-
-The driver MUST ignore undefined bits in \field{InterruptStatus}.
-
-The driver MUST write a value with a bit mask describing events it handled into \field{InterruptACK} when
-it finishes handling an interrupt and MUST NOT set any of the undefined bits in the value.
-
-If VIRTIO_F_RING_RESET has been negotiated, after the driver writes 1 to
-\field{QueueReset} to reset the queue, the driver MUST NOT consider queue
-reset to be complete until it reads back 0 in \field{QueueReset}. The driver
-MAY re-enable the queue by writing 1 to \field{QueueReady} after ensuring
-that other virtqueue fields have been set up correctly. The driver MAY set
-driver-writeable queue configuration values to different values than those
-that were used before the queue reset.
-(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
-
-\subsection{MMIO-specific Initialization And Device Operation}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation}
-
-\subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
-
-\drivernormative{\paragraph}{Device Initialization}{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
-
-The driver MUST start the device initialization by reading and
-checking values from \field{MagicValue} and \field{Version}.
-If both values are valid, it MUST read \field{DeviceID}
-and if its value is zero (0x0) MUST abort initialization and
-MUST NOT access any other register.
-
-Drivers not expecting shared memory MUST NOT use the shared
-memory registers.
-
-Further initialization MUST follow the procedure described in
-\ref{sec:General Initialization And Device Operation / Device Initialization}~\nameref{sec:General Initialization And Device Operation / Device Initialization}.
-
-\subsubsection{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Virtqueue Configuration}
-
-The driver will typically initialize the virtual queue in the following way:
-
-\begin{enumerate}
-\item Select the queue writing its index (first queue is 0) to
-   \field{QueueSel}.
-
-\item Check if the queue is not already in use: read \field{QueueReady},
-   and expect a returned value of zero (0x0).
-
-\item Read maximum queue size (number of elements) from
-   \field{QueueNumMax}. If the returned value is zero (0x0) the
-   queue is not available.
-
-\item Allocate and zero the queue memory, making sure the memory
-   is physically contiguous.
-
-\item Notify the device about the queue size by writing the size to
-   \field{QueueNum}.
-
-\item Write physical addresses of the queue's Descriptor Area,
-   Driver Area and Device Area to (respectively) the
-   \field{QueueDescLow}/\field{QueueDescHigh},
-   \field{QueueDriverLow}/\field{QueueDriverHigh} and
-   \field{QueueDeviceLow}/\field{QueueDeviceHigh} register pairs.
-
-\item Write 0x1 to \field{QueueReady}.
-\end{enumerate}
-
-\subsubsection{Available Buffer Notifications}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Available Buffer Notifications}
-
-When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
-the driver sends an available buffer notification to the device by writing
-the 16-bit virtqueue index
-of the queue to be notified to \field{QueueNotify}.
-
-When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
-the driver sends an available buffer notification to the device by writing
-the following 32-bit value to \field{QueueNotify}:
-\lstinputlisting{notifications-le.c}
-
-See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
-for the definition of the components.
-
-\subsubsection{Notifications From The Device}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
-
-The memory mapped virtio device is using a single, dedicated
-interrupt signal, which is asserted when at least one of the
-bits described in the description of \field{InterruptStatus}
-is set. This is how the device sends a used buffer notification
-or a configuration change notification to the device.
-
-\drivernormative{\paragraph}{Notifications From The Device}{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
-After receiving an interrupt, the driver MUST read
-\field{InterruptStatus} to check what caused the interrupt (see the
-register description).  The used buffer notification bit being set
-SHOULD be interpreted as a used buffer notification for each active
-virtqueue.  After the interrupt is handled, the driver MUST acknowledge
-it by writing a bit mask corresponding to the handled events to the
-InterruptACK register.
-
-\subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}
-
-The legacy MMIO transport used page-based addressing, resulting
-in a slightly different control register layout, the device
-initialization and the virtual queue configuration procedure.
-
-Table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout} 
-presents control registers layout, omitting
-descriptions of registers which did not change their function
-nor behaviour:
-
-\begin{longtable}{p{0.2\textwidth}p{0.7\textwidth}}
-  \caption {MMIO Device Legacy Register Layout}
-  \label{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout} \\
-  \hline
-  \mmioreg{Name}{Function}{Offset from base}{Direction}{Description} 
-  \hline 
-  \hline 
-  \endfirsthead
-  \hline
-  \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description} 
-  \hline 
-  \hline 
-  \endhead
-  \endfoot
-  \endlastfoot
-  \mmioreg{MagicValue}{Magic value}{0x000}{R}{}
-  \hline
-  \mmioreg{Version}{Device version number}{0x004}{R}{Legacy device returns value 0x1.}
-  \hline
-  \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{}
-  \hline
-  \mmioreg{VendorID}{Virtio Subsystem Vendor ID}{0x00c}{R}{}
-  \hline
-  \mmioreg{HostFeatures}{Flags representing features the device supports}{0x010}{R}{}
-  \hline
-  \mmioreg{HostFeaturesSel}{Device (host) features word selection.}{0x014}{W}{}
-  \hline
-  \mmioreg{GuestFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{}
-  \hline
-  \mmioreg{GuestFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{}
-  \hline 
-  \mmioreg{GuestPageSize}{Guest page size}{0x028}{W}{%
-    The driver writes the guest page size in bytes to the
-    register during initialization, before any queues are used.
-    This value should be a power of 2 and is used by the device to
-    calculate the Guest address of the first queue page
-    (see QueuePFN).
-  }
-  \hline
-  \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
-    Writing to this register selects the virtual queue that the
-    following operations on the \field{QueueNumMax}, \field{QueueNum}, \field{QueueAlign}
-    and \field{QueuePFN} registers apply to. The index
-    number of the first queue is zero (0x0). 
-.
-  }
-  \hline
-  \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
-    Reading from the register returns the maximum size of the queue
-    the device is ready to process or zero (0x0) if the queue is not
-    available. This applies to the queue selected by writing to
-    \field{QueueSel} and is allowed only when \field{QueuePFN} is set to zero
-    (0x0), so when the queue is not actively used.
-  }
-  \hline
-  \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
-    Queue size is the number of elements in the queue.
-    Writing to this register notifies the device what size of the
-    queue the driver will use. This applies to the queue selected by
-    writing to \field{QueueSel}.
-  }
-  \hline
-  \mmioreg{QueueAlign}{Used Ring alignment in the virtual queue}{0x03c}{W}{%
-    Writing to this register notifies the device about alignment
-    boundary of the Used Ring in bytes. This value should be a power
-    of 2 and applies to the queue selected by writing to \field{QueueSel}.
-  }
-  \hline
-  \mmioreg{QueuePFN}{Guest physical page number of the virtual queue}{0x040}{RW}{%
-    Writing to this register notifies the device about location of the
-    virtual queue in the Guest's physical address space. This value
-    is the index number of a page starting with the queue
-    Descriptor Table. Value zero (0x0) means physical address zero
-    (0x00000000) and is illegal. When the driver stops using the
-    queue it writes zero (0x0) to this register.
-    Reading from this register returns the currently used page
-    number of the queue, therefore a value other than zero (0x0)
-    means that the queue is in use.
-    Both read and write accesses apply to the queue selected by
-    writing to \field{QueueSel}.
-  }
-  \hline
-  \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{}
-  \hline
-  \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{}
-  \hline
-  \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{}
-  \hline
-  \mmioreg{Status}{Device status}{0x070}{RW}{%
-    Reading from this register returns the current device status
-    flags.
-    Writing non-zero values to this register sets the status flags,
-    indicating the OS/driver progress. Writing zero (0x0) to this
-    register triggers a device reset. The device
-    sets \field{QueuePFN} to zero (0x0) for all queues in the device.
-    Also see \ref{sec:General Initialization And Device Operation / Device Initialization}~\nameref{sec:General Initialization And Device Operation / Device Initialization}.
-  }
-  \hline
-  \mmioreg{Config}{Configuration space}{0x100+}{RW}{}
-  \hline
-\end{longtable}
-
-The virtual queue page size is defined by writing to \field{GuestPageSize},
-as written by the guest. The driver does this before the
-virtual queues are configured.
-
-The virtual queue layout follows
-p. \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout},
-with the alignment defined in \field{QueueAlign}.
-
-The virtual queue is configured as follows:
-\begin{enumerate}
-\item Select the queue writing its index (first queue is 0) to
-   \field{QueueSel}.
-
-\item Check if the queue is not already in use: read \field{QueuePFN},
-   expecting a returned value of zero (0x0).
-
-\item Read maximum queue size (number of elements) from
-   \field{QueueNumMax}. If the returned value is zero (0x0) the
-   queue is not available.
-
-\item Allocate and zero the queue pages in contiguous virtual
-   memory, aligning the Used Ring to an optimal boundary (usually
-   page size). The driver should choose a queue size smaller than or
-   equal to \field{QueueNumMax}.
-
-\item Notify the device about the queue size by writing the size to
-   \field{QueueNum}.
-
-\item Notify the device about the used alignment by writing its value
-   in bytes to \field{QueueAlign}.
-
-\item Write the physical number of the first page of the queue to
-   the \field{QueuePFN} register.
-\end{enumerate}
-
-Notification mechanisms did not change.
+\input{transport-mmio.tex}
 
 \section{Virtio Over Channel I/O}\label{sec:Virtio Transport Options / Virtio Over Channel I/O}
 
diff --git a/transport-mmio.tex b/transport-mmio.tex
new file mode 100644
index 0000000..7f2e0c3
--- /dev/null
+++ b/transport-mmio.tex
@@ -0,0 +1,552 @@
+\subsection{MMIO Device Discovery}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO Device Discovery}
+
+Unlike PCI, MMIO provides no generic device discovery mechanism.  For each
+device, the guest OS will need to know the location of the registers
+and interrupt(s) used.  The suggested binding for systems using
+flattened device trees is shown in this example:
+
+\begin{lstlisting}
+// EXAMPLE: virtio_block device taking 512 bytes at 0x1e000, interrupt 42.
+virtio_block@1e000 {
+        compatible = "virtio,mmio";
+        reg = <0x1e000 0x200>;
+        interrupts = <42>;
+}
+\end{lstlisting}
+
+\subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
+
+MMIO virtio devices provide a set of memory mapped control
+registers followed by a device-specific configuration space,
+described in the table~\ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
+
+All register values are organized as Little Endian.
+
+\newcommand{\mmioreg}[5]{% Name Function Offset Direction Description
+  {\field{#1}} \newline #3 \newline #4 & {\bf#2} \newline #5 \\
+}
+
+\newcommand{\mmiodreg}[7]{% NameHigh NameLow Function OffsetHigh OffsetLow Direction Description
+  {\field{#1}} \newline #4 \newline {\field{#2}} \newline #5 \newline #6 & {\bf#3} \newline #7 \\
+}
+
+\begin{longtable}{p{0.2\textwidth}p{0.7\textwidth}}
+  \caption {MMIO Device Register Layout}
+  \label{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout} \\
+  \hline
+  \mmioreg{Name}{Function}{Offset from base}{Direction}{Description} 
+  \hline 
+  \hline 
+  \endfirsthead
+  \hline
+  \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description} 
+  \hline 
+  \hline 
+  \endhead
+  \endfoot
+  \endlastfoot
+  \mmioreg{MagicValue}{Magic value}{0x000}{R}{%
+    0x74726976
+    (a Little Endian equivalent of the ``virt'' string).
+  } 
+  \hline
+  \mmioreg{Version}{Device version number}{0x004}{R}{%
+    0x2.
+    \begin{note}
+      Legacy devices (see \ref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}) used 0x1.
+    \end{note}
+  }
+  \hline 
+  \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{%
+    See \ref{sec:Device Types}~\nameref{sec:Device Types} for possible values.
+    Value zero (0x0) is used to
+    define a system memory map with placeholder devices at static,
+    well known addresses, assigning functions to them depending
+    on user's needs.
+  }
+  \hline 
+  \mmioreg{VendorID}{Virtio Subsystem Vendor ID}{0x00c}{R}{}
+  \hline 
+  \mmioreg{DeviceFeatures}{Flags representing features the device supports}{0x010}{R}{%
+    Reading from this register returns 32 consecutive flag bits,
+    the least significant bit depending on the last value written to
+    \field{DeviceFeaturesSel}. Access to this register returns
+    bits $\field{DeviceFeaturesSel}*32$ to $(\field{DeviceFeaturesSel}*32)+31$, eg.
+    feature bits 0 to 31 if \field{DeviceFeaturesSel} is set to 0 and
+    features bits 32 to 63 if \field{DeviceFeaturesSel} is set to 1.
+    Also see \ref{sec:Basic Facilities of a Virtio Device / Feature Bits}~\nameref{sec:Basic Facilities of a Virtio Device / Feature Bits}.
+  }
+  \hline 
+  \mmioreg{DeviceFeaturesSel}{Device (host) features word selection.}{0x014}{W}{%
+    Writing to this register selects a set of 32 device feature bits
+    accessible by reading from \field{DeviceFeatures}.
+  }
+  \hline 
+  \mmioreg{DriverFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{%
+    Writing to this register sets 32 consecutive flag bits, the least significant
+    bit depending on the last value written to \field{DriverFeaturesSel}.
+     Access to this register sets bits $\field{DriverFeaturesSel}*32$
+    to $(\field{DriverFeaturesSel}*32)+31$, eg. feature bits 0 to 31 if
+    \field{DriverFeaturesSel} is set to 0 and features bits 32 to 63 if
+    \field{DriverFeaturesSel} is set to 1. Also see \ref{sec:Basic Facilities of a Virtio Device / Feature Bits}~\nameref{sec:Basic Facilities of a Virtio Device / Feature Bits}.
+  }
+  \hline 
+  \mmioreg{DriverFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{%
+    Writing to this register selects a set of 32 activated feature
+    bits accessible by writing to \field{DriverFeatures}.
+  }
+  \hline 
+  \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
+    Writing to this register selects the virtual queue that the
+    following operations on \field{QueueNumMax}, \field{QueueNum}, \field{QueueReady},
+    \field{QueueDescLow}, \field{QueueDescHigh}, \field{QueueDriverlLow}, \field{QueueDriverHigh},
+    \field{QueueDeviceLow}, \field{QueueDeviceHigh} and \field{QueueReset} apply to. The index
+    number of the first queue is zero (0x0). 
+  }
+  \hline 
+  \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
+    Reading from the register returns the maximum size (number of
+    elements) of the queue the device is ready to process or
+    zero (0x0) if the queue is not available. This applies to the
+    queue selected by writing to \field{QueueSel}.
+  }
+  \hline 
+  \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
+    Queue size is the number of elements in the queue.
+    Writing to this register notifies the device what size of the
+    queue the driver will use. This applies to the queue selected by
+    writing to \field{QueueSel}.
+  }
+  \hline 
+  \mmioreg{QueueReady}{Virtual queue ready bit}{0x044}{RW}{%
+    Writing one (0x1) to this register notifies the device that it can
+    execute requests from this virtual queue. Reading from this register
+    returns the last value written to it. Both read and write
+    accesses apply to the queue selected by writing to \field{QueueSel}.
+  }
+  \hline 
+  \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{%
+    Writing a value to this register notifies the device that
+    there are new buffers to process in a queue.
+
+    When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
+    the value written is the queue index.
+
+    When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
+    the \field{Notification data} value has the following format:
+
+    \lstinputlisting{notifications-le.c}
+
+    See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
+    for the definition of the components.
+  }
+  \hline 
+  \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{%
+    Reading from this register returns a bit mask of events that
+    caused the device interrupt to be asserted.
+    The following events are possible:
+    \begin{description}
+      \item[Used Buffer Notification] - bit 0 - the interrupt was asserted
+        because the device has used a buffer
+        in at least one of the active virtual queues.
+      \item [Configuration Change Notification] - bit 1 - the interrupt was
+        asserted because the configuration of the device has changed.
+    \end{description}
+  }
+  \hline 
+  \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{%
+    Writing a value with bits set as defined in \field{InterruptStatus}
+    to this register notifies the device that events causing
+    the interrupt have been handled.
+  }
+  \hline 
+  \mmioreg{Status}{Device status}{0x070}{RW}{%
+    Reading from this register returns the current device status
+    flags.
+    Writing non-zero values to this register sets the status flags,
+    indicating the driver progress. Writing zero (0x0) to this
+    register triggers a device reset. 
+    See also p. \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}.
+  }
+  \hline 
+  \mmiodreg{QueueDescLow}{QueueDescHigh}{Virtual queue's Descriptor Area 64 bit long physical address}{0x080}{0x084}{W}{%
+    Writing to these two registers (lower 32 bits of the address
+    to \field{QueueDescLow}, higher 32 bits to \field{QueueDescHigh}) notifies
+    the device about location of the Descriptor Area of the queue
+    selected by writing to \field{QueueSel} register.
+  }
+  \hline 
+  \mmiodreg{QueueDriverLow}{QueueDriverHigh}{Virtual queue's Driver Area 64 bit long physical address}{0x090}{0x094}{W}{%
+    Writing to these two registers (lower 32 bits of the address
+    to \field{QueueDriverLow}, higher 32 bits to \field{QueueDriverHigh}) notifies
+    the device about location of the Driver Area of the queue
+    selected by writing to \field{QueueSel}.
+  }
+  \hline 
+  \mmiodreg{QueueDeviceLow}{QueueDeviceHigh}{Virtual queue's Device Area 64 bit long physical address}{0x0a0}{0x0a4}{W}{%
+    Writing to these two registers (lower 32 bits of the address
+    to \field{QueueDeviceLow}, higher 32 bits to \field{QueueDeviceHigh}) notifies
+    the device about location of the Device Area of the queue
+    selected by writing to \field{QueueSel}.
+  }
+  \hline 
+  \mmioreg{SHMSel}{Shared memory id}{0x0ac}{W}{%
+    Writing to this register selects the shared memory region \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
+    following operations on \field{SHMLenLow}, \field{SHMLenHigh},
+    \field{SHMBaseLow} and \field{SHMBaseHigh} apply to.
+  }
+  \hline 
+  \mmiodreg{SHMLenLow}{SHMLenHigh}{Shared memory region 64 bit long length}{0x0b0}{0x0b4}{R}{%
+    These registers return the length of the shared memory
+    region in bytes, as defined by the device for the region selected by
+    the \field{SHMSel} register.  The lower 32 bits of the length
+    are read from \field{SHMLenLow} and the higher 32 bits from
+    \field{SHMLenHigh}.  Reading from a non-existent
+    region (i.e. where the ID written to \field{SHMSel} is unused)
+    results in a length of -1.
+  }
+  \hline 
+  \mmiodreg{SHMBaseLow}{SHMBaseHigh}{Shared memory region 64 bit long physical address}{0x0b8}{0x0bc}{R}{%
+    The driver reads these registers to discover the base address
+    of the region in physical address space.  This address is
+    chosen by the device (or other part of the VMM).
+    The lower 32 bits of the address are read from \field{SHMBaseLow}
+    with the higher 32 bits from \field{SHMBaseHigh}.  Reading
+    from a non-existent region (i.e. where the ID written to
+    \field{SHMSel} is unused) results in a base address of
+    0xffffffffffffffff.
+  }
+  \hline 
+  \mmioreg{QueueReset}{Virtual queue reset bit}{0x0c0}{RW}{%
+    If VIRTIO_F_RING_RESET has been negotiated, writing one (0x1) to this
+    register selectively resets the queue. Both read and write accesses
+    apply to the queue selected by writing to \field{QueueSel}.
+  }
+  \hline
+  \mmioreg{ConfigGeneration}{Configuration atomicity value}{0x0fc}{R}{
+    Reading from this register returns a value describing a version of the device-specific configuration space (see \field{Config}).
+    The driver can then access the configuration space and, when finished, read \field{ConfigGeneration} again.
+    If no part of the configuration space has changed between these two \field{ConfigGeneration} reads, the returned values are identical.
+    If the values are different, the configuration space accesses were not atomic and the driver has to perform the operations again.
+    See also \ref {sec:Basic Facilities of a Virtio Device / Device Configuration Space}.
+  }
+  \hline 
+  \mmioreg{Config}{Configuration space}{0x100+}{RW}{
+    Device-specific configuration space starts at the offset 0x100
+    and is accessed with byte alignment. Its meaning and size
+    depend on the device and the driver.
+  }
+  \hline
+\end{longtable}
+
+\devicenormative{\subsubsection}{MMIO Device Register Layout}{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
+
+The device MUST return 0x74726976 in \field{MagicValue}.
+
+The device MUST return value 0x2 in \field{Version}.
+
+The device MUST present each event by setting the corresponding bit in \field{InterruptStatus} from the
+moment it takes place, until the driver acknowledges the interrupt
+by writing a corresponding bit mask to the \field{InterruptACK} register.  Bits which
+do not represent events which took place MUST be zero.
+
+Upon reset, the device MUST clear all bits in \field{InterruptStatus} and ready bits in the
+\field{QueueReady} register for all queues in the device.
+
+The device MUST change value returned in \field{ConfigGeneration} if there is any risk of a
+driver seeing an inconsistent configuration state.
+
+The device MUST NOT access virtual queue contents when \field{QueueReady} is zero (0x0).
+
+If VIRTIO_F_RING_RESET has been negotiated, the device MUST present a 0 in
+\field{QueueReset} on reset.
+
+If VIRTIO_F_RING_RESET has been negotiated, The device MUST present a 0 in
+\field{QueueReset} after the virtqueue is enabled with \field{QueueReady}.
+
+The device MUST reset the queue when 1 is written to \field{QueueReset}. The
+device MUST continue to present 1 in \field{QueueReset} as long as the queue reset
+is ongoing. The device MUST present 0 in both \field{QueueReset} and \field{QueueReady}
+when queue reset has completed.
+(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
+
+\drivernormative{\subsubsection}{MMIO Device Register Layout}{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
+The driver MUST NOT access memory locations not described in the
+table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}
+(or, in case of the configuration space, described in the device specification),
+MUST NOT write to the read-only registers (direction R) and
+MUST NOT read from the write-only registers (direction W).
+
+The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers
+described in table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
+For the device-specific configuration space, the driver MUST use 8 bit wide accesses for
+8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and
+aligned accesses for 32 and 64 bit wide fields.
+
+The driver MUST ignore a device with \field{MagicValue} which is not 0x74726976,
+although it MAY report an error.
+
+The driver MUST ignore a device with \field{Version} which is not 0x2,
+although it MAY report an error.
+
+The driver MUST ignore a device with \field{DeviceID} 0x0,
+but MUST NOT report any error.
+
+Before reading from \field{DeviceFeatures}, the driver MUST write a value to \field{DeviceFeaturesSel}.
+
+Before writing to the \field{DriverFeatures} register, the driver MUST write a value to the \field{DriverFeaturesSel} register.
+
+The driver MUST write a value to \field{QueueNum} which is less than
+or equal to the value presented by the device in \field{QueueNumMax}.
+
+When \field{QueueReady} is not zero, the driver MUST NOT access
+\field{QueueNum}, \field{QueueDescLow}, \field{QueueDescHigh},
+\field{QueueDriverLow}, \field{QueueDriverHigh}, \field{QueueDeviceLow}, \field{QueueDeviceHigh}.
+
+To stop using the queue the driver MUST write zero (0x0) to this
+\field{QueueReady} and MUST read the value back to ensure
+synchronization.
+
+The driver MUST ignore undefined bits in \field{InterruptStatus}.
+
+The driver MUST write a value with a bit mask describing events it handled into \field{InterruptACK} when
+it finishes handling an interrupt and MUST NOT set any of the undefined bits in the value.
+
+If VIRTIO_F_RING_RESET has been negotiated, after the driver writes 1 to
+\field{QueueReset} to reset the queue, the driver MUST NOT consider queue
+reset to be complete until it reads back 0 in \field{QueueReset}. The driver
+MAY re-enable the queue by writing 1 to \field{QueueReady} after ensuring
+that other virtqueue fields have been set up correctly. The driver MAY set
+driver-writeable queue configuration values to different values than those
+that were used before the queue reset.
+(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
+
+\subsection{MMIO-specific Initialization And Device Operation}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation}
+
+\subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
+
+\drivernormative{\paragraph}{Device Initialization}{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
+
+The driver MUST start the device initialization by reading and
+checking values from \field{MagicValue} and \field{Version}.
+If both values are valid, it MUST read \field{DeviceID}
+and if its value is zero (0x0) MUST abort initialization and
+MUST NOT access any other register.
+
+Drivers not expecting shared memory MUST NOT use the shared
+memory registers.
+
+Further initialization MUST follow the procedure described in
+\ref{sec:General Initialization And Device Operation / Device Initialization}~\nameref{sec:General Initialization And Device Operation / Device Initialization}.
+
+\subsubsection{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Virtqueue Configuration}
+
+The driver will typically initialize the virtual queue in the following way:
+
+\begin{enumerate}
+\item Select the queue writing its index (first queue is 0) to
+   \field{QueueSel}.
+
+\item Check if the queue is not already in use: read \field{QueueReady},
+   and expect a returned value of zero (0x0).
+
+\item Read maximum queue size (number of elements) from
+   \field{QueueNumMax}. If the returned value is zero (0x0) the
+   queue is not available.
+
+\item Allocate and zero the queue memory, making sure the memory
+   is physically contiguous.
+
+\item Notify the device about the queue size by writing the size to
+   \field{QueueNum}.
+
+\item Write physical addresses of the queue's Descriptor Area,
+   Driver Area and Device Area to (respectively) the
+   \field{QueueDescLow}/\field{QueueDescHigh},
+   \field{QueueDriverLow}/\field{QueueDriverHigh} and
+   \field{QueueDeviceLow}/\field{QueueDeviceHigh} register pairs.
+
+\item Write 0x1 to \field{QueueReady}.
+\end{enumerate}
+
+\subsubsection{Available Buffer Notifications}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Available Buffer Notifications}
+
+When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
+the driver sends an available buffer notification to the device by writing
+the 16-bit virtqueue index
+of the queue to be notified to \field{QueueNotify}.
+
+When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
+the driver sends an available buffer notification to the device by writing
+the following 32-bit value to \field{QueueNotify}:
+\lstinputlisting{notifications-le.c}
+
+See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
+for the definition of the components.
+
+\subsubsection{Notifications From The Device}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
+
+The memory mapped virtio device is using a single, dedicated
+interrupt signal, which is asserted when at least one of the
+bits described in the description of \field{InterruptStatus}
+is set. This is how the device sends a used buffer notification
+or a configuration change notification to the device.
+
+\drivernormative{\paragraph}{Notifications From The Device}{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
+After receiving an interrupt, the driver MUST read
+\field{InterruptStatus} to check what caused the interrupt (see the
+register description).  The used buffer notification bit being set
+SHOULD be interpreted as a used buffer notification for each active
+virtqueue.  After the interrupt is handled, the driver MUST acknowledge
+it by writing a bit mask corresponding to the handled events to the
+InterruptACK register.
+
+\subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}
+
+The legacy MMIO transport used page-based addressing, resulting
+in a slightly different control register layout, the device
+initialization and the virtual queue configuration procedure.
+
+Table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout} 
+presents control registers layout, omitting
+descriptions of registers which did not change their function
+nor behaviour:
+
+\begin{longtable}{p{0.2\textwidth}p{0.7\textwidth}}
+  \caption {MMIO Device Legacy Register Layout}
+  \label{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout} \\
+  \hline
+  \mmioreg{Name}{Function}{Offset from base}{Direction}{Description} 
+  \hline 
+  \hline 
+  \endfirsthead
+  \hline
+  \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description} 
+  \hline 
+  \hline 
+  \endhead
+  \endfoot
+  \endlastfoot
+  \mmioreg{MagicValue}{Magic value}{0x000}{R}{}
+  \hline
+  \mmioreg{Version}{Device version number}{0x004}{R}{Legacy device returns value 0x1.}
+  \hline
+  \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{}
+  \hline
+  \mmioreg{VendorID}{Virtio Subsystem Vendor ID}{0x00c}{R}{}
+  \hline
+  \mmioreg{HostFeatures}{Flags representing features the device supports}{0x010}{R}{}
+  \hline
+  \mmioreg{HostFeaturesSel}{Device (host) features word selection.}{0x014}{W}{}
+  \hline
+  \mmioreg{GuestFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{}
+  \hline
+  \mmioreg{GuestFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{}
+  \hline 
+  \mmioreg{GuestPageSize}{Guest page size}{0x028}{W}{%
+    The driver writes the guest page size in bytes to the
+    register during initialization, before any queues are used.
+    This value should be a power of 2 and is used by the device to
+    calculate the Guest address of the first queue page
+    (see QueuePFN).
+  }
+  \hline
+  \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
+    Writing to this register selects the virtual queue that the
+    following operations on the \field{QueueNumMax}, \field{QueueNum}, \field{QueueAlign}
+    and \field{QueuePFN} registers apply to. The index
+    number of the first queue is zero (0x0). 
+.
+  }
+  \hline
+  \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
+    Reading from the register returns the maximum size of the queue
+    the device is ready to process or zero (0x0) if the queue is not
+    available. This applies to the queue selected by writing to
+    \field{QueueSel} and is allowed only when \field{QueuePFN} is set to zero
+    (0x0), so when the queue is not actively used.
+  }
+  \hline
+  \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
+    Queue size is the number of elements in the queue.
+    Writing to this register notifies the device what size of the
+    queue the driver will use. This applies to the queue selected by
+    writing to \field{QueueSel}.
+  }
+  \hline
+  \mmioreg{QueueAlign}{Used Ring alignment in the virtual queue}{0x03c}{W}{%
+    Writing to this register notifies the device about alignment
+    boundary of the Used Ring in bytes. This value should be a power
+    of 2 and applies to the queue selected by writing to \field{QueueSel}.
+  }
+  \hline
+  \mmioreg{QueuePFN}{Guest physical page number of the virtual queue}{0x040}{RW}{%
+    Writing to this register notifies the device about location of the
+    virtual queue in the Guest's physical address space. This value
+    is the index number of a page starting with the queue
+    Descriptor Table. Value zero (0x0) means physical address zero
+    (0x00000000) and is illegal. When the driver stops using the
+    queue it writes zero (0x0) to this register.
+    Reading from this register returns the currently used page
+    number of the queue, therefore a value other than zero (0x0)
+    means that the queue is in use.
+    Both read and write accesses apply to the queue selected by
+    writing to \field{QueueSel}.
+  }
+  \hline
+  \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{}
+  \hline
+  \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{}
+  \hline
+  \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{}
+  \hline
+  \mmioreg{Status}{Device status}{0x070}{RW}{%
+    Reading from this register returns the current device status
+    flags.
+    Writing non-zero values to this register sets the status flags,
+    indicating the OS/driver progress. Writing zero (0x0) to this
+    register triggers a device reset. The device
+    sets \field{QueuePFN} to zero (0x0) for all queues in the device.
+    Also see \ref{sec:General Initialization And Device Operation / Device Initialization}~\nameref{sec:General Initialization And Device Operation / Device Initialization}.
+  }
+  \hline
+  \mmioreg{Config}{Configuration space}{0x100+}{RW}{}
+  \hline
+\end{longtable}
+
+The virtual queue page size is defined by writing to \field{GuestPageSize},
+as written by the guest. The driver does this before the
+virtual queues are configured.
+
+The virtual queue layout follows
+p. \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout},
+with the alignment defined in \field{QueueAlign}.
+
+The virtual queue is configured as follows:
+\begin{enumerate}
+\item Select the queue writing its index (first queue is 0) to
+   \field{QueueSel}.
+
+\item Check if the queue is not already in use: read \field{QueuePFN},
+   expecting a returned value of zero (0x0).
+
+\item Read maximum queue size (number of elements) from
+   \field{QueueNumMax}. If the returned value is zero (0x0) the
+   queue is not available.
+
+\item Allocate and zero the queue pages in contiguous virtual
+   memory, aligning the Used Ring to an optimal boundary (usually
+   page size). The driver should choose a queue size smaller than or
+   equal to \field{QueueNumMax}.
+
+\item Notify the device about the queue size by writing the size to
+   \field{QueueNum}.
+
+\item Notify the device about the used alignment by writing its value
+   in bytes to \field{QueueAlign}.
+
+\item Write the physical number of the first page of the queue to
+   the \field{QueuePFN} register.
+\end{enumerate}
+
+Notification mechanisms did not change.
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 1/6] transport-pci: Split PCI transport to its own file
From: Parav Pandit @ 2023-02-23  4:09 UTC (permalink / raw)
  To: mst, virtio-dev, cohuck; +Cc: virtio-comment, shahafs, Parav Pandit
In-Reply-To: <20230223040919.166617-1-parav@nvidia.com>

Place PCI transport specification in its own file to better maintain it.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/157
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 content.tex       | 1161 +--------------------------------------------
 transport-pci.tex | 1160 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1161 insertions(+), 1160 deletions(-)
 create mode 100644 transport-pci.tex

diff --git a/content.tex b/content.tex
index 0c7cdf8..be911e6 100644
--- a/content.tex
+++ b/content.tex
@@ -579,1166 +579,7 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
 Virtio can use various different buses, thus the standard is split
 into virtio general and bus-specific sections.
 
-\section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over PCI Bus}
-
-Virtio devices are commonly implemented as PCI devices.
-
-A Virtio device can be implemented as any kind of PCI device:
-a Conventional PCI device or a PCI Express
-device.  To assure designs meet the latest level
-requirements, see 
-the PCI-SIG home page at \url{http://www.pcisig.com} for any
-approved changes.
-
-\devicenormative{\subsection}{Virtio Over PCI Bus}{Virtio Transport Options / Virtio Over PCI Bus}
-A Virtio device using Virtio Over PCI Bus MUST expose to
-guest an interface that meets the specification requirements of
-the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
-and \hyperref[intro:PCIe]{[PCIe]}
-respectively. 
-
-\subsection{PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
-
-Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device ID 0x1000 through
-0x107F inclusive is a virtio device. The actual value within this range
-indicates which virtio device is supported by the device.
-The PCI Device ID is calculated by adding 0x1040 to the Virtio Device ID,
-as indicated in section \ref{sec:Device Types}.
-Additionally, devices MAY utilize a Transitional PCI Device ID range,
-0x1000 to 0x103F depending on the device type.
-
-\devicenormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
-
-Devices MUST have the PCI Vendor ID 0x1AF4.
-Devices MUST either have the PCI Device ID calculated by adding 0x1040
-to the Virtio Device ID, as indicated in section \ref{sec:Device
-Types} or have the Transitional PCI Device ID depending on the device type,
-as follows:
-
-\begin{tabular}{|l|c|}
-\hline
-Transitional PCI Device ID  &  Virtio Device    \\
-\hline \hline
-0x1000      &   network device     \\
-\hline
-0x1001     &   block device     \\
-\hline
-0x1002     & memory ballooning (traditional)  \\
-\hline
-0x1003     &      console       \\
-\hline
-0x1004     &     SCSI host      \\
-\hline
-0x1005     &  entropy source    \\
-\hline
-0x1009     &   9P transport     \\
-\hline
-\end{tabular}
-
-For example, the network device with the Virtio Device ID 1
-has the PCI Device ID 0x1041 or the Transitional PCI Device ID 0x1000.
-
-The PCI Subsystem Vendor ID and the PCI Subsystem Device ID MAY reflect
-the PCI Vendor and Device ID of the environment (for informational purposes by the driver).
-
-Non-transitional devices SHOULD have a PCI Device ID in the range
-0x1040 to 0x107f.
-Non-transitional devices SHOULD have a PCI Revision ID of 1 or higher.
-Non-transitional devices SHOULD have a PCI Subsystem Device ID of 0x40 or higher.
-
-This is to reduce the chance of a legacy driver attempting
-to drive the device.
-
-\drivernormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
-Drivers MUST match devices with the PCI Vendor ID 0x1AF4 and
-the PCI Device ID in the range 0x1040 to 0x107f,
-calculated by adding 0x1040 to the Virtio Device ID,
-as indicated in section \ref{sec:Device Types}.
-Drivers for device types listed in section \ref{sec:Virtio
-Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
-MUST match devices with the PCI Vendor ID 0x1AF4 and
-the Transitional PCI Device ID indicated in section
- \ref{sec:Virtio
-Transport Options / Virtio Over PCI Bus / PCI Device Discovery}.
-
-Drivers MUST match any PCI Revision ID value.
-Drivers MAY match any PCI Subsystem Vendor ID and any
-PCI Subsystem Device ID value.
-
-\subsubsection{Legacy Interfaces: A Note on PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery / Legacy Interfaces: A Note on PCI Device Discovery}
-Transitional devices MUST have a PCI Revision ID of 0.
-Transitional devices MUST have the PCI Subsystem Device ID
-matching the Virtio Device ID, as indicated in section \ref{sec:Device Types}.
-Transitional devices MUST have the Transitional PCI Device ID in
-the range 0x1000 to 0x103f.
-
-This is to match legacy drivers.
-
-\subsection{PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
-
-The device is configured via I/O and/or memory regions (though see
-\ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
-for access via the PCI configuration space), as specified by Virtio
-Structure PCI Capabilities.
-
-Fields of different sizes are present in the device
-configuration regions.
-All 64-bit, 32-bit and 16-bit fields are little-endian.
-64-bit fields are to be treated as two 32-bit fields,
-with low 32 bit part followed by the high 32 bit part.
-
-\drivernormative{\subsubsection}{PCI Device Layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
-
-For device configuration access, the driver MUST use 8-bit wide
-accesses for 8-bit wide fields, 16-bit wide and aligned accesses
-for 16-bit wide fields and 32-bit wide and aligned accesses for
-32-bit and 64-bit wide fields. For 64-bit fields, the driver MAY
-access each of the high and low 32-bit parts of the field
-independently.
-
-\devicenormative{\subsubsection}{PCI Device Layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
-
-For 64-bit device configuration fields, the device MUST allow driver
-independent access to high and low 32-bit parts of the field.
-
-\subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
-
-The virtio device configuration layout includes several structures:
-\begin{itemize}
-\item Common configuration
-\item Notifications
-\item ISR Status
-\item Device-specific configuration (optional)
-\item PCI configuration access
-\end{itemize}
-
-Each structure can be mapped by a Base Address register (BAR) belonging to
-the function, or accessed via the special VIRTIO_PCI_CAP_PCI_CFG field in the PCI configuration space.
-
-The location of each structure is specified using a vendor-specific PCI capability located
-on the capability list in PCI configuration space of the device.
-This virtio structure capability uses little-endian format; all fields are
-read-only for the driver unless stated otherwise:
-
-\begin{lstlisting}
-struct virtio_pci_cap {
-        u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR */
-        u8 cap_next;    /* Generic PCI field: next ptr. */
-        u8 cap_len;     /* Generic PCI field: capability length */
-        u8 cfg_type;    /* Identifies the structure. */
-        u8 bar;         /* Where to find it. */
-        u8 id;          /* Multiple capabilities of the same type */
-        u8 padding[2];  /* Pad to full dword. */
-        le32 offset;    /* Offset within bar. */
-        le32 length;    /* Length of the structure, in bytes. */
-};
-\end{lstlisting}
-
-This structure can be followed by extra data, depending on
-\field{cfg_type}, as documented below.
-
-The fields are interpreted as follows:
-
-\begin{description}
-\item[\field{cap_vndr}]
-        0x09; Identifies a vendor-specific capability.
-
-\item[\field{cap_next}]
-        Link to next capability in the capability list in the PCI configuration space.
-
-\item[\field{cap_len}]
-        Length of this capability structure, including the whole of
-        struct virtio_pci_cap, and extra data if any.
-        This length MAY include padding, or fields unused by the driver.
-
-\item[\field{cfg_type}]
-        identifies the structure, according to the following table:
-
-\begin{lstlisting}
-/* Common configuration */
-#define VIRTIO_PCI_CAP_COMMON_CFG        1
-/* Notifications */
-#define VIRTIO_PCI_CAP_NOTIFY_CFG        2
-/* ISR Status */
-#define VIRTIO_PCI_CAP_ISR_CFG           3
-/* Device specific configuration */
-#define VIRTIO_PCI_CAP_DEVICE_CFG        4
-/* PCI configuration access */
-#define VIRTIO_PCI_CAP_PCI_CFG           5
-/* Shared memory region */
-#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
-/* Vendor-specific data */
-#define VIRTIO_PCI_CAP_VENDOR_CFG        9
-\end{lstlisting}
-
-        Any other value is reserved for future use.
-
-        Each structure is detailed individually below.
-
-        The device MAY offer more than one structure of any type - this makes it
-        possible for the device to expose multiple interfaces to drivers.  The order of
-        the capabilities in the capability list specifies the order of preference
-        suggested by the device.  A device may specify that this ordering mechanism be
-        overridden by the use of the \field{id} field.
-        \begin{note}
-          For example, on some hypervisors, notifications using IO accesses are
-        faster than memory accesses. In this case, the device would expose two
-        capabilities with \field{cfg_type} set to VIRTIO_PCI_CAP_NOTIFY_CFG:
-        the first one addressing an I/O BAR, the second one addressing a memory BAR.
-        In this example, the driver would use the I/O BAR if I/O resources are available, and fall back on
-        memory BAR when I/O resources are unavailable.
-        \end{note}
-
-\item[\field{bar}]
-        values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
-        the function located beginning at 10h in PCI Configuration Space
-        and used to map the structure into Memory or I/O Space.
-        The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
-        or I/O Space.
-
-        Any other value is reserved for future use.
-
-\item[\field{id}]
-        Used by some device types to uniquely identify multiple capabilities
-        of a certain type. If the device type does not specify the meaning of
-        this field, its contents are undefined.
-
-
-\item[\field{offset}]
-        indicates where the structure begins relative to the base address associated
-        with the BAR.  The alignment requirements of \field{offset} are indicated
-        in each structure-specific section below.
-
-\item[\field{length}]
-        indicates the length of the structure.
-
-        \field{length} MAY include padding, or fields unused by the driver, or
-        future extensions.
-
-        \begin{note}
-        For example, a future device might present a large structure size of several
-        MBytes.
-        As current devices never utilize structures larger than 4KBytes in size,
-        driver MAY limit the mapped structure size to e.g.
-        4KBytes (thus ignoring parts of structure after the first
-        4KBytes) to allow forward compatibility with such devices without loss of
-        functionality and without wasting resources.
-        \end{note}
-\end{description}
-
-A variant of this type, struct virtio_pci_cap64, is defined for
-those capabilities that require offsets or lengths larger than
-4GiB:
-
-\begin{lstlisting}
-struct virtio_pci_cap64 {
-        struct virtio_pci_cap cap;
-        u32 offset_hi;
-        u32 length_hi;
-};
-\end{lstlisting}
-
-Given that the \field{cap.length} and \field{cap.offset} fields
-are only 32 bit, the additional \field{offset_hi} and \field{length_hi}
-fields provide the most significant 32 bits of a total 64 bit offset and
-length within the BAR specified by \field{cap.bar}.
-
-\drivernormative{\subsubsection}{Virtio Structure PCI Capabilities}{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
-
-The driver MUST ignore any vendor-specific capability structure which has
-a reserved \field{cfg_type} value.
-
-The driver SHOULD use the first instance of each virtio structure type they can
-support.
-
-The driver MUST accept a \field{cap_len} value which is larger than specified here.
-
-The driver MUST ignore any vendor-specific capability structure which has
-a reserved \field{bar} value.
-
-        The drivers SHOULD only map part of configuration structure
-        large enough for device operation.  The drivers MUST handle
-        an unexpectedly large \field{length}, but MAY check that \field{length}
-        is large enough for device operation.
-
-The driver MUST NOT write into any field of the capability structure,
-with the exception of those with \field{cap_type} VIRTIO_PCI_CAP_PCI_CFG as
-detailed in \ref{drivernormative:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}.
-
-\devicenormative{\subsubsection}{Virtio Structure PCI Capabilities}{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
-
-The device MUST include any extra data (from the beginning of the \field{cap_vndr} field
-through end of the extra data fields if any) in \field{cap_len}.
-The device MAY append extra data
-or padding to any structure beyond that.
-
-If the device presents multiple structures of the same type, it SHOULD order
-them from optimal (first) to least-optimal (last).
-
-\subsubsection{Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
-
-The common configuration structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_COMMON_CFG capability; its layout is below.
-
-\begin{lstlisting}
-struct virtio_pci_common_cfg {
-        /* About the whole device. */
-        le32 device_feature_select;     /* read-write */
-        le32 device_feature;            /* read-only for driver */
-        le32 driver_feature_select;     /* read-write */
-        le32 driver_feature;            /* read-write */
-        le16 config_msix_vector;        /* read-write */
-        le16 num_queues;                /* read-only for driver */
-        u8 device_status;               /* read-write */
-        u8 config_generation;           /* read-only for driver */
-
-        /* About a specific virtqueue. */
-        le16 queue_select;              /* read-write */
-        le16 queue_size;                /* read-write */
-        le16 queue_msix_vector;         /* read-write */
-        le16 queue_enable;              /* read-write */
-        le16 queue_notify_off;          /* read-only for driver */
-        le64 queue_desc;                /* read-write */
-        le64 queue_driver;              /* read-write */
-        le64 queue_device;              /* read-write */
-        le16 queue_notify_data;         /* read-only for driver */
-        le16 queue_reset;               /* read-write */
-};
-\end{lstlisting}
-
-\begin{description}
-\item[\field{device_feature_select}]
-        The driver uses this to select which feature bits \field{device_feature} shows.
-        Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
-
-\item[\field{device_feature}]
-        The device uses this to report which feature bits it is
-        offering to the driver: the driver writes to
-        \field{device_feature_select} to select which feature bits are presented.
-
-\item[\field{driver_feature_select}]
-        The driver uses this to select which feature bits \field{driver_feature} shows.
-        Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
-
-\item[\field{driver_feature}]
-        The driver writes this to accept feature bits offered by the device.
-        Driver Feature Bits selected by \field{driver_feature_select}.
-
-\item[\field{config_msix_vector}]
-        The driver sets the Configuration Vector for MSI-X.
-
-\item[\field{num_queues}]
-        The device specifies the maximum number of virtqueues supported here.
-
-\item[\field{device_status}]
-        The driver writes the device status here (see \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}). Writing 0 into this
-        field resets the device.
-
-\item[\field{config_generation}]
-        Configuration atomicity value.  The device changes this every time the
-        configuration noticeably changes.
-
-\item[\field{queue_select}]
-        Queue Select. The driver selects which virtqueue the following
-        fields refer to.
-
-\item[\field{queue_size}]
-        Queue Size.  On reset, specifies the maximum queue size supported by
-        the device. This can be modified by the driver to reduce memory requirements.
-        A 0 means the queue is unavailable.
-
-\item[\field{queue_msix_vector}]
-        The driver uses this to specify the queue vector for MSI-X.
-
-\item[\field{queue_enable}]
-        The driver uses this to selectively prevent the device from executing requests from this virtqueue.
-        1 - enabled; 0 - disabled.
-
-\item[\field{queue_notify_off}]
-        The driver reads this to calculate the offset from start of Notification structure at
-        which this virtqueue is located.
-        \begin{note} this is \em{not} an offset in bytes.
-        See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} below.
-        \end{note}
-
-\item[\field{queue_desc}]
-        The driver writes the physical address of Descriptor Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
-
-\item[\field{queue_driver}]
-        The driver writes the physical address of Driver Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
-
-\item[\field{queue_device}]
-        The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
-
-\item[\field{queue_notify_data}]
-        This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
-        The driver will use this value to put it in the 'virtqueue number' field
-        in the available buffer notification structure.
-        See section \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}.
-        \begin{note}
-        This field provides the device with flexibility to determine how virtqueues
-        will be referred to in available buffer notifications.
-        In a trivial case the device can set \field{queue_notify_data}=vqn. Some devices
-        may benefit from providing another value, for example an internal virtqueue
-        identifier, or an internal offset related to the virtqueue number.
-        \end{note}
-
-\item[\field{queue_reset}]
-        The driver uses this to selectively reset the queue.
-        This field exists only if VIRTIO_F_RING_RESET has been
-        negotiated. (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
-
-\end{description}
-
-\devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
-\field{offset} MUST be 4-byte aligned.
-
-The device MUST present at least one common configuration capability.
-
-The device MUST present the feature bits it is offering in \field{device_feature}, starting at bit \field{device_feature_select} $*$ 32 for any \field{device_feature_select} written by the driver.
-\begin{note}
-  This means that it will present 0 for any \field{device_feature_select} other than 0 or 1, since no feature defined here exceeds 63.
-\end{note}
-
-The device MUST present any valid feature bits the driver has written in \field{driver_feature}, starting at bit \field{driver_feature_select} $*$ 32 for any \field{driver_feature_select} written by the driver.  Valid feature bits are those which are subset of the corresponding \field{device_feature} bits.  The device MAY present invalid bits written by the driver.
-
-\begin{note}
-  This means that a device can ignore writes for feature bits it never
-  offers, and simply present 0 on reads.  Or it can just mirror what the driver wrote
-  (but it will still have to check them when the driver sets FEATURES_OK).
-\end{note}
-
-\begin{note}
-  A driver shouldn't write invalid bits anyway, as per \ref{drivernormative:General Initialization And Device Operation / Device Initialization}, but this attempts to handle it.
-\end{note}
-
-The device MUST present a changed \field{config_generation} after the
-driver has read a device-specific configuration value which has
-changed since any part of the device-specific configuration was last
-read.
-\begin{note}
-As \field{config_generation} is an 8-bit value, simply incrementing it
-on every configuration change could violate this requirement due to wrap.
-Better would be to set an internal flag when it has changed,
-and if that flag is set when the driver reads from the device-specific
-configuration, increment \field{config_generation} and clear the flag.
-\end{note}
-
-The device MUST reset when 0 is written to \field{device_status}, and
-present a 0 in \field{device_status} once that is done.
-
-The device MUST present a 0 in \field{queue_enable} on reset.
-
-If VIRTIO_F_RING_RESET has been negotiated, the device MUST present a 0 in
-\field{queue_reset} on reset.
-
-If VIRTIO_F_RING_RESET has been negotiated, the device MUST present a 0 in
-\field{queue_reset} after the virtqueue is enabled with \field{queue_enable}.
-
-The device MUST reset the queue when 1 is written to \field{queue_reset}. The
-device MUST continue to present 1 in \field{queue_reset} as long as the queue reset
-is ongoing. The device MUST present 0 in both \field{queue_reset} and \field{queue_enable}
-when queue reset has completed.
-(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
-
-The device MUST present a 0 in \field{queue_size} if the virtqueue
-corresponding to the current \field{queue_select} is unavailable.
-
-If VIRTIO_F_RING_PACKED has not been negotiated, the device MUST
-present either a value of 0 or a power of 2 in
-\field{queue_size}.
-
-\drivernormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
-
-The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation}, \field{queue_notify_off} or \field{queue_notify_data}.
-
-If VIRTIO_F_RING_PACKED has been negotiated,
-the driver MUST NOT write the value 0 to \field{queue_size}.
-If VIRTIO_F_RING_PACKED has not been negotiated,
-the driver MUST NOT write a value which is not a power of 2 to \field{queue_size}.
-
-The driver MUST configure the other virtqueue fields before enabling the virtqueue
-with \field{queue_enable}.
-
-After writing 0 to \field{device_status}, the driver MUST wait for a read of
-\field{device_status} to return 0 before reinitializing the device.
-
-The driver MUST NOT write a 0 to \field{queue_enable}.
-
-If VIRTIO_F_RING_RESET has been negotiated, after the driver writes 1 to
-\field{queue_reset} to reset the queue, the driver MUST NOT consider queue
-reset to be complete until it reads back 0 in \field{queue_reset}. The driver
-MAY re-enable the queue by writing 1 to \field{queue_enable} after ensuring
-that other virtqueue fields have been set up correctly. The driver MAY set
-driver-writeable queue configuration values to different values than those that
-were used before the queue reset.
-(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
-
-\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
-
-The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
-capability.  This capability is immediately followed by an additional
-field, like so:
-
-\begin{lstlisting}
-struct virtio_pci_notify_cap {
-        struct virtio_pci_cap cap;
-        le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
-};
-\end{lstlisting}
-
-\field{notify_off_multiplier} is combined with the \field{queue_notify_off} to
-derive the Queue Notify address within a BAR for a virtqueue:
-
-\begin{lstlisting}
-        cap.offset + queue_notify_off * notify_off_multiplier
-\end{lstlisting}
-
-The \field{cap.offset} and \field{notify_off_multiplier} are taken from the
-notification capability structure above, and the \field{queue_notify_off} is
-taken from the common configuration structure.
-
-\begin{note}
-For example, if \field{notifier_off_multiplier} is 0, the device uses
-the same Queue Notify address for all queues.
-\end{note}
-
-\devicenormative{\paragraph}{Notification capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
-The device MUST present at least one notification capability.
-
-For devices not offering VIRTIO_F_NOTIFICATION_DATA:
-
-The \field{cap.offset} MUST be 2-byte aligned.
-
-The device MUST either present \field{notify_off_multiplier} as an even power of 2,
-or present \field{notify_off_multiplier} as 0.
-
-The value \field{cap.length} presented by the device MUST be at least 2
-and MUST be large enough to support queue notification offsets
-for all supported queues in all possible configurations.
-
-For all queues, the value \field{cap.length} presented by the device MUST satisfy:
-\begin{lstlisting}
-cap.length >= queue_notify_off * notify_off_multiplier + 2
-\end{lstlisting}
-
-For devices offering VIRTIO_F_NOTIFICATION_DATA:
-
-The device MUST either present \field{notify_off_multiplier} as a
-number that is a power of 2 that is also a multiple 4,
-or present \field{notify_off_multiplier} as 0.
-
-The \field{cap.offset} MUST be 4-byte aligned.
-
-The value \field{cap.length} presented by the device MUST be at least 4
-and MUST be large enough to support queue notification offsets
-for all supported queues in all possible configurations.
-
-For all queues, the value \field{cap.length} presented by the device MUST satisfy:
-\begin{lstlisting}
-cap.length >= queue_notify_off * notify_off_multiplier + 4
-\end{lstlisting}
-
-\subsubsection{ISR status capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
-
-The VIRTIO_PCI_CAP_ISR_CFG capability
-refers to at least a single byte, which contains the 8-bit ISR status field
-to be used for INT\#x interrupt handling.
-
-The \field{offset} for the \field{ISR status} has no alignment requirements.
-
-The ISR bits allow the driver to distinguish between device-specific configuration
-change interrupts and normal virtqueue interrupts:
-
-\begin{tabular}{ |l||l|l|l| }
-\hline
-Bits       & 0                               & 1               &  2 to 31 \\
-\hline
-Purpose    & Queue Interrupt  & Device Configuration Interrupt & Reserved \\
-\hline
-\end{tabular}
-
-To avoid an extra access, simply reading this register resets it to 0 and
-causes the device to de-assert the interrupt.
-
-In this way, driver read of ISR status causes the device to de-assert
-an interrupt.
-
-See sections \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Used Buffer Notifications} and \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} for how this is used.
-
-\devicenormative{\paragraph}{ISR status capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
-
-The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability.  
-
-The device MUST set the Device Configuration Interrupt bit
-in \field{ISR status} before sending a device configuration
-change notification to the driver.
-
-If MSI-X capability is disabled, the device MUST set the Queue
-Interrupt bit in \field{ISR status} before sending a virtqueue
-notification to the driver.
-
-If MSI-X capability is disabled, the device MUST set the Interrupt Status
-bit in the PCI Status register in the PCI Configuration Header of
-the device to the logical OR of all bits in \field{ISR status} of
-the device.  The device then asserts/deasserts INT\#x interrupts unless masked
-according to standard PCI rules \hyperref[intro:PCI]{[PCI]}.
-
-The device MUST reset \field{ISR status} to 0 on driver read.
-
-\drivernormative{\paragraph}{ISR status capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
-
-If MSI-X capability is enabled, the driver SHOULD NOT access
-\field{ISR status} upon detecting a Queue Interrupt.
-
-\subsubsection{Device-specific configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device-specific configuration}
-
-The device MUST present at least one VIRTIO_PCI_CAP_DEVICE_CFG capability for
-any device type which has a device-specific configuration.
-
-\devicenormative{\paragraph}{Device-specific configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device-specific configuration}
-
-The \field{offset} for the device-specific configuration MUST be 4-byte aligned.
-
-\subsubsection{Shared memory capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Shared memory capability}
-
-Shared memory regions \ref{sec:Basic Facilities of a Virtio
-Device / Shared Memory Regions} are enumerated on the PCI transport
-as a sequence of VIRTIO_PCI_CAP_SHARED_MEMORY_CFG capabilities, one per region.
-
-The capability is defined by a struct virtio_pci_cap64 and
-utilises the \field{cap.id} to allow multiple shared memory
-regions per device.
-The identifier in \field{cap.id} does not denote a certain order of
-preference; it is only used to uniquely identify a region.
-
-\devicenormative{\paragraph}{Shared memory capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Shared memory capability}
-
-The region defined by the combination of the \field{cap.offset},
-\field{offset_hi}, and \field{cap.length}, \field{length_hi}
-fields MUST be contained within the BAR specified by
-\field{cap.bar}.
-
-The \field{cap.id} MUST be unique for any one device instance.
-
-\subsubsection{Vendor data capability}\label{sec:Virtio
-Transport Options / Virtio Over PCI Bus / PCI Device Layout /
-Vendor data capability}
-
-The optional Vendor data capability allows the device to present
-vendor-specific data to the driver, without
-conflicts, for debugging and/or reporting purposes,
-and without conflicting with standard functionality.
-
-This capability augments but does not replace the standard
-subsystem ID and subsystem vendor ID fields
-(offsets 0x2C and 0x2E in the PCI configuration space header)
-as specified by \hyperref[intro:PCI]{[PCI]}.
-
-Vendor data capability is enumerated on the PCI transport
-as a VIRTIO_PCI_CAP_VENDOR_CFG capability.
-
-The capability has the following structure:
-\begin{lstlisting}
-struct virtio_pci_vndr_data {
-        u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR */
-        u8 cap_next;    /* Generic PCI field: next ptr. */
-        u8 cap_len;     /* Generic PCI field: capability length */
-        u8 cfg_type;    /* Identifies the structure. */
-        u16 vendor_id;  /* Identifies the vendor-specific format. */
-	/* For Vendor Definition */
-	/* Pads structure to a multiple of 4 bytes */
-	/* Reads must not have side effects */
-};
-\end{lstlisting}
-
-Where \field{vendor_id} identifies the PCI-SIG assigned Vendor ID
-as specified by \hyperref[intro:PCI]{[PCI]}.
-
-Note that the capability size is required to be a multiple of 4.
-
-To make it safe for a generic driver to access the capability,
-reads from this capability MUST NOT have any side effects.
-
-\devicenormative{\paragraph}{Vendor data capability}{Virtio
-Transport Options / Virtio Over PCI Bus / PCI Device Layout /
-Vendor data capability}
-
-Devices CAN present \field{vendor_id} that does not match
-either the PCI Vendor ID or the PCI Subsystem Vendor ID.
-
-Devices CAN present multiple Vendor data capabilities with
-either different or identical \field{vendor_id} values.
-
-The value \field{vendor_id} MUST NOT equal 0x1AF4.
-
-The size of the Vendor data capability MUST be a multiple of 4 bytes.
-
-Reads of the Vendor data capability by the driver MUST NOT have any
-side effects.
-
-\drivernormative{\paragraph}{Vendor data capability}{Virtio
-Transport Options / Virtio Over PCI Bus / PCI Device Layout /
-Vendor data capability}
-
-The driver SHOULD NOT use the Vendor data capability except
-for debugging and reporting purposes.
-
-The driver MUST qualify the \field{vendor_id} before
-interpreting or writing into the Vendor data capability.
-
-\subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
-
-The VIRTIO_PCI_CAP_PCI_CFG capability
-creates an alternative (and likely suboptimal) access method to the
-common configuration, notification, ISR and device-specific configuration regions.
-
-The capability is immediately followed by an additional field like so:
-
-\begin{lstlisting}
-struct virtio_pci_cfg_cap {
-        struct virtio_pci_cap cap;
-        u8 pci_cfg_data[4]; /* Data for BAR access. */
-};
-\end{lstlisting}
-
-The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset} and
-\field{pci_cfg_data} are read-write (RW) for the driver.
-
-To access a device region, the driver writes into the capability
-structure (ie. within the PCI configuration space) as follows:
-
-\begin{itemize}
-\item The driver sets the BAR to access by writing to \field{cap.bar}.
-
-\item The driver sets the size of the access by writing 1, 2 or 4 to
-  \field{cap.length}.
-
-\item The driver sets the offset within the BAR by writing to
-  \field{cap.offset}.
-\end{itemize}
-
-At that point, \field{pci_cfg_data} will provide a window of size
-\field{cap.length} into the given \field{cap.bar} at offset \field{cap.offset}.
-
-\devicenormative{\paragraph}{PCI configuration access capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
-
-The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability.
-
-Upon detecting driver write access
-to \field{pci_cfg_data}, the device MUST execute a write access
-at offset \field{cap.offset} at BAR selected by \field{cap.bar} using the first \field{cap.length}
-bytes from \field{pci_cfg_data}.
-
-Upon detecting driver read access
-to \field{pci_cfg_data}, the device MUST
-execute a read access of length cap.length at offset \field{cap.offset}
-at BAR selected by \field{cap.bar} and store the first \field{cap.length} bytes in
-\field{pci_cfg_data}.
-
-\drivernormative{\paragraph}{PCI configuration access capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
-
-The driver MUST NOT write a \field{cap.offset} which is not
-a multiple of \field{cap.length} (ie. all accesses MUST be aligned).
-
-The driver MUST NOT read or write \field{pci_cfg_data}
-unless \field{cap.bar}, \field{cap.length} and \field{cap.offset}
-address \field{cap.length} bytes within a BAR range
-specified by some other Virtio Structure PCI Capability
-of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
-
-\subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interfaces: A Note on PCI Device Layout}
-
-Transitional devices MUST present part of configuration
-registers in a legacy configuration structure in BAR0 in the first I/O
-region of the PCI device, as documented below.
-When using the legacy interface, transitional drivers
-MUST use the legacy configuration structure in BAR0 in the first
-I/O region of the PCI device, as documented below.
-
-When using the legacy interface the driver MAY access
-the device-specific configuration region using any width accesses, and
-a transitional device MUST present driver with the same results as
-when accessed using the ``natural'' access method (i.e.
-32-bit accesses for 32-bit fields, etc).
-
-Note that this is possible because while the virtio common configuration structure is PCI
-(i.e. little) endian, when using the legacy interface the device-specific
-configuration region is encoded in the native endian of the guest (where such distinction is
-applicable).
-
-When used through the legacy interface, the virtio common configuration structure looks as follows:
-
-\begin{tabularx}{\textwidth}{ |X||X|X|X|X|X|X|X|X| }
-\hline
- Bits & 32 & 32 & 32 & 16 & 16 & 16 & 8 & 8 \\
-\hline
- Read / Write & R & R+W & R+W & R & R+W & R+W & R+W & R \\
-\hline
- Purpose & Device Features bits 0:31 & Driver Features bits 0:31 &
-  Queue Address & \field{queue_size} & \field{queue_select} & Queue Notify &
-  Device Status & ISR \newline Status \\
-\hline
-\end{tabularx}
-
-If MSI-X is enabled for the device, two additional fields
-immediately follow this header:
-
-\begin{tabular}{ |l||l|l| }
-\hline
-Bits       & 16             & 16     \\
-\hline
-Read/Write & R+W            & R+W    \\
-\hline
-Purpose (MSI-X) & \field{config_msix_vector}  & \field{queue_msix_vector} \\
-\hline
-\end{tabular}
-
-Note: When MSI-X capability is enabled, device-specific configuration starts at
-byte offset 24 in virtio common configuration structure structure. When MSI-X capability is not
-enabled, device-specific configuration starts at byte offset 20 in virtio
-header.  ie. once you enable MSI-X on the device, the other fields move.
-If you turn it off again, they move back!
-
-Any device-specific configuration space immediately follows
-these general headers:
-
-\begin{tabular}{|l||l|l|}
-\hline
-Bits & Device Specific & \multirow{3}{*}{\ldots} \\
-\cline{1-2}
-Read / Write & Device Specific & \\
-\cline{1-2}
-Purpose & Device Specific & \\
-\hline
-\end{tabular}
-
-When accessing the device-specific configuration space
-using the legacy interface, transitional
-drivers MUST access the device-specific configuration space
-at an offset immediately following the general headers.
-
-When using the legacy interface, transitional
-devices MUST present the device-specific configuration space
-if any at an offset immediately following the general headers.
-
-Note that only Feature Bits 0 to 31 are accessible through the
-Legacy Interface. When used through the Legacy Interface,
-Transitional Devices MUST assume that Feature Bits 32 to 63
-are not acknowledged by Driver.
-
-As legacy devices had no \field{config_generation} field,
-see \ref{sec:Basic Facilities of a Virtio Device / Device
-Configuration Space / Legacy Interface: Device Configuration
-Space}~\nameref{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: Device Configuration Space} for workarounds.
-
-\subsubsection{Non-transitional Device With Legacy Driver: A Note
-on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio
-Over PCI Bus / PCI Device Layout / Non-transitional Device With
-Legacy Driver: A Note on PCI Device Layout}
-
-All known legacy drivers check either the PCI Revision or the
-Device and Vendor IDs, and thus won't attempt to drive a
-non-transitional device.
-
-A buggy legacy driver might mistakenly attempt to drive a
-non-transitional device. If support for such drivers is required
-(as opposed to fixing the bug), the following would be the
-recommended way to detect and handle them.
-\begin{note}
-Such buggy drivers are not currently known to be used in
-production.
-\end{note}
-
-\subparagraph{Device Requirements: Non-transitional Device With Legacy Driver}
-\label{drivernormative:Virtio Transport Options / Virtio Over PCI
-Bus / PCI-specific Initialization And Device Operation /
-Device Initialization / Non-transitional Device With Legacy
-Driver}
-\label{devicenormative:Virtio Transport Options / Virtio Over PCI
-Bus / PCI-specific Initialization And Device Operation /
-Device Initialization / Non-transitional Device With Legacy
-Driver}
-
-Non-transitional devices, on a platform where a legacy driver for
-a legacy device with the same ID (including PCI Revision, Device
-and Vendor IDs) is known to have previously existed,
-SHOULD take the following steps to cause the legacy driver to
-fail gracefully when it attempts to drive them:
-
-\begin{enumerate}
-\item Present an I/O BAR in BAR0, and
-\item Respond to a single-byte zero write to offset 18
-   (corresponding to Device Status register in the legacy layout)
-   of BAR0 by presenting zeroes on every BAR and ignoring writes.
-\end{enumerate}
-
-\subsection{PCI-specific Initialization And Device Operation}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation}
-
-\subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization}
-
-This documents PCI-specific steps executed during Device Initialization.
-
-\paragraph{Virtio Device Configuration Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection}
-
-As a prerequisite to device initialization, the driver scans the
-PCI capability list, detecting virtio configuration layout using Virtio
-Structure PCI capabilities as detailed in \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
-
-\subparagraph{Legacy Interface: A Note on Device Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection / Legacy Interface: A Note on Device Layout Detection}
-
-Legacy drivers skipped the Device Layout Detection step, assuming legacy
-device configuration space in BAR0 in I/O space unconditionally.
-
-Legacy devices did not have the Virtio PCI Capability in their
-capability list.
-
-Therefore:
-
-Transitional devices MUST expose the Legacy Interface in I/O
-space in BAR0.
-
-Transitional drivers MUST look for the Virtio PCI
-Capabilities on the capability list.
-If these are not present, driver MUST assume a legacy device,
-and use it through the legacy interface.
-
-Non-transitional drivers MUST look for the Virtio PCI
-Capabilities on the capability list.
-If these are not present, driver MUST assume a legacy device,
-and fail gracefully.
-
-\paragraph{MSI-X Vector Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
-
-When MSI-X capability is present and enabled in the device
-(through standard PCI configuration space) \field{config_msix_vector} and \field{queue_msix_vector} are used to map configuration change and queue
-interrupts to MSI-X vectors. In this case, the ISR Status is unused.
-
-Writing a valid MSI-X Table entry number, 0 to 0x7FF, to
-\field{config_msix_vector}/\field{queue_msix_vector} maps interrupts triggered
-by the configuration change/selected queue events respectively to
-the corresponding MSI-X vector. To disable interrupts for an
-event type, the driver unmaps this event by writing a special NO_VECTOR
-value:
-
-\begin{lstlisting}
-/* Vector value used to disable MSI for queue */
-#define VIRTIO_MSI_NO_VECTOR            0xffff
-\end{lstlisting}
-
-Note that mapping an event to vector might require device to
-allocate internal device resources, and thus could fail. 
-
-\devicenormative{\subparagraph}{MSI-X Vector Configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
-
-A device that has an MSI-X capability SHOULD support at least 2
-and at most 0x800 MSI-X vectors.
-Device MUST report the number of vectors supported in
-\field{Table Size} in the MSI-X Capability as specified in
-\hyperref[intro:PCI]{[PCI]}.
-The device SHOULD restrict the reported MSI-X Table Size field
-to a value that might benefit system performance.
-\begin{note}
-For example, a device which does not expect to send
-interrupts at a high rate might only specify 2 MSI-X vectors.
-\end{note}
-Device MUST support mapping any event type to any valid
-vector 0 to MSI-X \field{Table Size}.
-Device MUST support unmapping any event type.
-
-The device MUST return vector mapped to a given event,
-(NO_VECTOR if unmapped) on read of \field{config_msix_vector}/\field{queue_msix_vector}.
-The device MUST have all queue and configuration change
-events are unmapped upon reset.
-
-Devices SHOULD NOT cause mapping an event to vector to fail
-unless it is impossible for the device to satisfy the mapping
-request.  Devices MUST report mapping
-failures by returning the NO_VECTOR value when the relevant
-\field{config_msix_vector}/\field{queue_msix_vector} field is read. 
-
-\drivernormative{\subparagraph}{MSI-X Vector Configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
-
-Driver MUST support device with any MSI-X Table Size 0 to 0x7FF.
-Driver MAY fall back on using INT\#x interrupts for a device
-which only supports one MSI-X vector (MSI-X Table Size = 0).
-
-Driver MAY intepret the Table Size as a hint from the device
-for the suggested number of MSI-X vectors to use.
-
-Driver MUST NOT attempt to map an event to a vector
-outside the MSI-X Table supported by the device,
-as reported by \field{Table Size} in the MSI-X Capability.
-
-After mapping an event to vector, the
-driver MUST verify success by reading the Vector field value: on
-success, the previously written value is returned, and on
-failure, NO_VECTOR is returned. If a mapping failure is detected,
-the driver MAY retry mapping with fewer vectors, disable MSI-X
-or report device failure.
-
-\paragraph{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtqueue Configuration}
-
-As a device can have zero or more virtqueues for bulk data
-transport\footnote{For example, the simplest network device has two virtqueues.}, the driver
-needs to configure them as part of the device-specific
-configuration.
-
-The driver typically does this as follows, for each virtqueue a device has:
-
-\begin{enumerate}
-\item Write the virtqueue index (first queue is 0) to \field{queue_select}.
-
-\item Read the virtqueue size from \field{queue_size}. This controls how big the virtqueue is
-  (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues}). If this field is 0, the virtqueue does not exist.
-
-\item Optionally, select a smaller virtqueue size and write it to \field{queue_size}.
-
-\item Allocate and zero Descriptor Table, Available and Used rings for the
-   virtqueue in contiguous physical memory.
-
-\item Optionally, if MSI-X capability is present and enabled on the
-  device, select a vector to use to request interrupts triggered
-  by virtqueue events. Write the MSI-X Table entry number
-  corresponding to this vector into \field{queue_msix_vector}. Read
-  \field{queue_msix_vector}: on success, previously written value is
-  returned; on failure, NO_VECTOR value is returned.
-\end{enumerate}
-
-\subparagraph{Legacy Interface: A Note on Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtqueue Configuration / Legacy Interface: A Note on Virtqueue Configuration}
-When using the legacy interface, the queue layout follows \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout} with an alignment of 4096.
-Driver writes the physical address, divided
-by 4096 to the Queue Address field\footnote{The 4096 is based on the x86 page size, but it's also large
-enough to ensure that the separate parts of the virtqueue are on
-separate cache lines.
-}.  There was no mechanism to negotiate the queue size.
-
-\subsubsection{Available Buffer Notifications}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}
-
-When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
-the driver sends an available buffer notification to the device by writing
-the 16-bit virtqueue index
-of this virtqueue to the Queue Notify address.
-
-When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
-the driver sends an available buffer notification to the device by writing
-the following 32-bit value to the Queue Notify address:
-\lstinputlisting{notifications-le.c}
-
-See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
-for the definition of the components.
-
-See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
-for how to calculate the Queue Notify address.
-
-\drivernormative{\paragraph}{Available Buffer Notifications}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}
-If VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated:
-\begin{itemize}
-\item If VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the driver MUST use the
-\field{queue_notify_data} value instead of the virtqueue index.
-\item If VIRTIO_F_NOTIFICATION_DATA has been negotiated, the driver MUST set the
-\field{vqn} field to the \field{queue_notify_data} value.
-\end{itemize}
-
-\subsubsection{Used Buffer Notifications}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Used Buffer Notifications}
-
-If a used buffer notification is necessary for a virtqueue, the device would typically act as follows:
-
-\begin{itemize}
-  \item If MSI-X capability is disabled:
-    \begin{enumerate}
-    \item Set the lower bit of the ISR Status field for the device.
-
-    \item Send the appropriate PCI interrupt for the device.
-    \end{enumerate}
-
-  \item If MSI-X capability is enabled:
-    \begin{enumerate}
-    \item If \field{queue_msix_vector} is not NO_VECTOR,
-      request the appropriate MSI-X interrupt message for the
-      device, \field{queue_msix_vector} sets the MSI-X Table entry
-      number.
-    \end{enumerate}
-\end{itemize}
-
-\devicenormative{\paragraph}{Used Buffer Notifications}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Used Buffer Notifications}
-
-If MSI-X capability is enabled and \field{queue_msix_vector} is
-NO_VECTOR for a virtqueue, the device MUST NOT deliver an interrupt
-for that virtqueue.
-
-\subsubsection{Notification of Device Configuration Changes}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
-
-Some virtio PCI devices can change the device configuration
-state, as reflected in the device-specific configuration region of the device. In this case:
-
-\begin{itemize}
-  \item If MSI-X capability is disabled:
-    \begin{enumerate}
-    \item Set the second lower bit of the ISR Status field for the device.
-
-    \item Send the appropriate PCI interrupt for the device.
-    \end{enumerate}
-
-  \item If MSI-X capability is enabled:
-    \begin{enumerate}
-    \item If \field{config_msix_vector} is not NO_VECTOR,
-      request the appropriate MSI-X interrupt message for the
-      device, \field{config_msix_vector} sets the MSI-X Table entry
-      number.
-    \end{enumerate}
-\end{itemize}
-
-A single interrupt MAY indicate both that one or more virtqueue has
-been used and that the configuration space has changed.
-
-\devicenormative{\paragraph}{Notification of Device Configuration Changes}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
-
-If MSI-X capability is enabled and \field{config_msix_vector} is
-NO_VECTOR, the device MUST NOT deliver an interrupt
-for device configuration space changes.
-
-\drivernormative{\paragraph}{Notification of Device Configuration Changes}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
-
-A driver MUST handle the case where the same interrupt is used to indicate
-both device configuration space change and one or more virtqueues being used.
-
-\subsubsection{Driver Handling Interrupts}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Driver Handling Interrupts}
-The driver interrupt handler would typically:
-
-\begin{itemize}
-  \item If MSI-X capability is disabled:
-    \begin{itemize}
-      \item Read the ISR Status field, which will reset it to zero.
-      \item If the lower bit is set:
-        look through all virtqueues for the
-        device, to see if any progress has been made by the device
-        which requires servicing.
-      \item If the second lower bit is set:
-        re-examine the configuration space to see what changed.
-    \end{itemize}
-  \item If MSI-X capability is enabled:
-    \begin{itemize}
-      \item
-        Look through all virtqueues mapped to that MSI-X vector for the
-        device, to see if any progress has been made by the device
-        which requires servicing.
-      \item
-        If the MSI-X vector is equal to \field{config_msix_vector},
-        re-examine the configuration space to see what changed.
-    \end{itemize}
-\end{itemize}
-
-\section{Virtio Over MMIO}\label{sec:Virtio Transport Options / Virtio Over MMIO}
-
-Virtual environments without PCI support (a common situation in
-embedded devices models) might use simple memory mapped device
-(``virtio-mmio'') instead of the PCI device.
-
-The memory mapped virtio device behaviour is based on the PCI
-device specification. Therefore most operations including device
-initialization, queues configuration and buffer transfers are
-nearly identical. Existing differences are described in the
-following sections.
+\input{transport-pci.tex}
 
 \subsection{MMIO Device Discovery}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO Device Discovery}
 
diff --git a/transport-pci.tex b/transport-pci.tex
new file mode 100644
index 0000000..49c35bd
--- /dev/null
+++ b/transport-pci.tex
@@ -0,0 +1,1160 @@
+\section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over PCI Bus}
+
+Virtio devices are commonly implemented as PCI devices.
+
+A Virtio device can be implemented as any kind of PCI device:
+a Conventional PCI device or a PCI Express
+device.  To assure designs meet the latest level
+requirements, see 
+the PCI-SIG home page at \url{http://www.pcisig.com} for any
+approved changes.
+
+\devicenormative{\subsection}{Virtio Over PCI Bus}{Virtio Transport Options / Virtio Over PCI Bus}
+A Virtio device using Virtio Over PCI Bus MUST expose to
+guest an interface that meets the specification requirements of
+the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
+and \hyperref[intro:PCIe]{[PCIe]}
+respectively. 
+
+\subsection{PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+
+Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device ID 0x1000 through
+0x107F inclusive is a virtio device. The actual value within this range
+indicates which virtio device is supported by the device.
+The PCI Device ID is calculated by adding 0x1040 to the Virtio Device ID,
+as indicated in section \ref{sec:Device Types}.
+Additionally, devices MAY utilize a Transitional PCI Device ID range,
+0x1000 to 0x103F depending on the device type.
+
+\devicenormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+
+Devices MUST have the PCI Vendor ID 0x1AF4.
+Devices MUST either have the PCI Device ID calculated by adding 0x1040
+to the Virtio Device ID, as indicated in section \ref{sec:Device
+Types} or have the Transitional PCI Device ID depending on the device type,
+as follows:
+
+\begin{tabular}{|l|c|}
+\hline
+Transitional PCI Device ID  &  Virtio Device    \\
+\hline \hline
+0x1000      &   network device     \\
+\hline
+0x1001     &   block device     \\
+\hline
+0x1002     & memory ballooning (traditional)  \\
+\hline
+0x1003     &      console       \\
+\hline
+0x1004     &     SCSI host      \\
+\hline
+0x1005     &  entropy source    \\
+\hline
+0x1009     &   9P transport     \\
+\hline
+\end{tabular}
+
+For example, the network device with the Virtio Device ID 1
+has the PCI Device ID 0x1041 or the Transitional PCI Device ID 0x1000.
+
+The PCI Subsystem Vendor ID and the PCI Subsystem Device ID MAY reflect
+the PCI Vendor and Device ID of the environment (for informational purposes by the driver).
+
+Non-transitional devices SHOULD have a PCI Device ID in the range
+0x1040 to 0x107f.
+Non-transitional devices SHOULD have a PCI Revision ID of 1 or higher.
+Non-transitional devices SHOULD have a PCI Subsystem Device ID of 0x40 or higher.
+
+This is to reduce the chance of a legacy driver attempting
+to drive the device.
+
+\drivernormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+Drivers MUST match devices with the PCI Vendor ID 0x1AF4 and
+the PCI Device ID in the range 0x1040 to 0x107f,
+calculated by adding 0x1040 to the Virtio Device ID,
+as indicated in section \ref{sec:Device Types}.
+Drivers for device types listed in section \ref{sec:Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+MUST match devices with the PCI Vendor ID 0x1AF4 and
+the Transitional PCI Device ID indicated in section
+ \ref{sec:Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Discovery}.
+
+Drivers MUST match any PCI Revision ID value.
+Drivers MAY match any PCI Subsystem Vendor ID and any
+PCI Subsystem Device ID value.
+
+\subsubsection{Legacy Interfaces: A Note on PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery / Legacy Interfaces: A Note on PCI Device Discovery}
+Transitional devices MUST have a PCI Revision ID of 0.
+Transitional devices MUST have the PCI Subsystem Device ID
+matching the Virtio Device ID, as indicated in section \ref{sec:Device Types}.
+Transitional devices MUST have the Transitional PCI Device ID in
+the range 0x1000 to 0x103f.
+
+This is to match legacy drivers.
+
+\subsection{PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
+
+The device is configured via I/O and/or memory regions (though see
+\ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+for access via the PCI configuration space), as specified by Virtio
+Structure PCI Capabilities.
+
+Fields of different sizes are present in the device
+configuration regions.
+All 64-bit, 32-bit and 16-bit fields are little-endian.
+64-bit fields are to be treated as two 32-bit fields,
+with low 32 bit part followed by the high 32 bit part.
+
+\drivernormative{\subsubsection}{PCI Device Layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
+
+For device configuration access, the driver MUST use 8-bit wide
+accesses for 8-bit wide fields, 16-bit wide and aligned accesses
+for 16-bit wide fields and 32-bit wide and aligned accesses for
+32-bit and 64-bit wide fields. For 64-bit fields, the driver MAY
+access each of the high and low 32-bit parts of the field
+independently.
+
+\devicenormative{\subsubsection}{PCI Device Layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
+
+For 64-bit device configuration fields, the device MUST allow driver
+independent access to high and low 32-bit parts of the field.
+
+\subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The virtio device configuration layout includes several structures:
+\begin{itemize}
+\item Common configuration
+\item Notifications
+\item ISR Status
+\item Device-specific configuration (optional)
+\item PCI configuration access
+\end{itemize}
+
+Each structure can be mapped by a Base Address register (BAR) belonging to
+the function, or accessed via the special VIRTIO_PCI_CAP_PCI_CFG field in the PCI configuration space.
+
+The location of each structure is specified using a vendor-specific PCI capability located
+on the capability list in PCI configuration space of the device.
+This virtio structure capability uses little-endian format; all fields are
+read-only for the driver unless stated otherwise:
+
+\begin{lstlisting}
+struct virtio_pci_cap {
+        u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR */
+        u8 cap_next;    /* Generic PCI field: next ptr. */
+        u8 cap_len;     /* Generic PCI field: capability length */
+        u8 cfg_type;    /* Identifies the structure. */
+        u8 bar;         /* Where to find it. */
+        u8 id;          /* Multiple capabilities of the same type */
+        u8 padding[2];  /* Pad to full dword. */
+        le32 offset;    /* Offset within bar. */
+        le32 length;    /* Length of the structure, in bytes. */
+};
+\end{lstlisting}
+
+This structure can be followed by extra data, depending on
+\field{cfg_type}, as documented below.
+
+The fields are interpreted as follows:
+
+\begin{description}
+\item[\field{cap_vndr}]
+        0x09; Identifies a vendor-specific capability.
+
+\item[\field{cap_next}]
+        Link to next capability in the capability list in the PCI configuration space.
+
+\item[\field{cap_len}]
+        Length of this capability structure, including the whole of
+        struct virtio_pci_cap, and extra data if any.
+        This length MAY include padding, or fields unused by the driver.
+
+\item[\field{cfg_type}]
+        identifies the structure, according to the following table:
+
+\begin{lstlisting}
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG        1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG        2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG           3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG        4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG           5
+/* Shared memory region */
+#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
+/* Vendor-specific data */
+#define VIRTIO_PCI_CAP_VENDOR_CFG        9
+\end{lstlisting}
+
+        Any other value is reserved for future use.
+
+        Each structure is detailed individually below.
+
+        The device MAY offer more than one structure of any type - this makes it
+        possible for the device to expose multiple interfaces to drivers.  The order of
+        the capabilities in the capability list specifies the order of preference
+        suggested by the device.  A device may specify that this ordering mechanism be
+        overridden by the use of the \field{id} field.
+        \begin{note}
+          For example, on some hypervisors, notifications using IO accesses are
+        faster than memory accesses. In this case, the device would expose two
+        capabilities with \field{cfg_type} set to VIRTIO_PCI_CAP_NOTIFY_CFG:
+        the first one addressing an I/O BAR, the second one addressing a memory BAR.
+        In this example, the driver would use the I/O BAR if I/O resources are available, and fall back on
+        memory BAR when I/O resources are unavailable.
+        \end{note}
+
+\item[\field{bar}]
+        values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
+        the function located beginning at 10h in PCI Configuration Space
+        and used to map the structure into Memory or I/O Space.
+        The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
+        or I/O Space.
+
+        Any other value is reserved for future use.
+
+\item[\field{id}]
+        Used by some device types to uniquely identify multiple capabilities
+        of a certain type. If the device type does not specify the meaning of
+        this field, its contents are undefined.
+
+
+\item[\field{offset}]
+        indicates where the structure begins relative to the base address associated
+        with the BAR.  The alignment requirements of \field{offset} are indicated
+        in each structure-specific section below.
+
+\item[\field{length}]
+        indicates the length of the structure.
+
+        \field{length} MAY include padding, or fields unused by the driver, or
+        future extensions.
+
+        \begin{note}
+        For example, a future device might present a large structure size of several
+        MBytes.
+        As current devices never utilize structures larger than 4KBytes in size,
+        driver MAY limit the mapped structure size to e.g.
+        4KBytes (thus ignoring parts of structure after the first
+        4KBytes) to allow forward compatibility with such devices without loss of
+        functionality and without wasting resources.
+        \end{note}
+\end{description}
+
+A variant of this type, struct virtio_pci_cap64, is defined for
+those capabilities that require offsets or lengths larger than
+4GiB:
+
+\begin{lstlisting}
+struct virtio_pci_cap64 {
+        struct virtio_pci_cap cap;
+        u32 offset_hi;
+        u32 length_hi;
+};
+\end{lstlisting}
+
+Given that the \field{cap.length} and \field{cap.offset} fields
+are only 32 bit, the additional \field{offset_hi} and \field{length_hi}
+fields provide the most significant 32 bits of a total 64 bit offset and
+length within the BAR specified by \field{cap.bar}.
+
+\drivernormative{\subsubsection}{Virtio Structure PCI Capabilities}{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The driver MUST ignore any vendor-specific capability structure which has
+a reserved \field{cfg_type} value.
+
+The driver SHOULD use the first instance of each virtio structure type they can
+support.
+
+The driver MUST accept a \field{cap_len} value which is larger than specified here.
+
+The driver MUST ignore any vendor-specific capability structure which has
+a reserved \field{bar} value.
+
+        The drivers SHOULD only map part of configuration structure
+        large enough for device operation.  The drivers MUST handle
+        an unexpectedly large \field{length}, but MAY check that \field{length}
+        is large enough for device operation.
+
+The driver MUST NOT write into any field of the capability structure,
+with the exception of those with \field{cap_type} VIRTIO_PCI_CAP_PCI_CFG as
+detailed in \ref{drivernormative:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}.
+
+\devicenormative{\subsubsection}{Virtio Structure PCI Capabilities}{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The device MUST include any extra data (from the beginning of the \field{cap_vndr} field
+through end of the extra data fields if any) in \field{cap_len}.
+The device MAY append extra data
+or padding to any structure beyond that.
+
+If the device presents multiple structures of the same type, it SHOULD order
+them from optimal (first) to least-optimal (last).
+
+\subsubsection{Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
+
+The common configuration structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_COMMON_CFG capability; its layout is below.
+
+\begin{lstlisting}
+struct virtio_pci_common_cfg {
+        /* About the whole device. */
+        le32 device_feature_select;     /* read-write */
+        le32 device_feature;            /* read-only for driver */
+        le32 driver_feature_select;     /* read-write */
+        le32 driver_feature;            /* read-write */
+        le16 config_msix_vector;        /* read-write */
+        le16 num_queues;                /* read-only for driver */
+        u8 device_status;               /* read-write */
+        u8 config_generation;           /* read-only for driver */
+
+        /* About a specific virtqueue. */
+        le16 queue_select;              /* read-write */
+        le16 queue_size;                /* read-write */
+        le16 queue_msix_vector;         /* read-write */
+        le16 queue_enable;              /* read-write */
+        le16 queue_notify_off;          /* read-only for driver */
+        le64 queue_desc;                /* read-write */
+        le64 queue_driver;              /* read-write */
+        le64 queue_device;              /* read-write */
+        le16 queue_notify_data;         /* read-only for driver */
+        le16 queue_reset;               /* read-write */
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{device_feature_select}]
+        The driver uses this to select which feature bits \field{device_feature} shows.
+        Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
+
+\item[\field{device_feature}]
+        The device uses this to report which feature bits it is
+        offering to the driver: the driver writes to
+        \field{device_feature_select} to select which feature bits are presented.
+
+\item[\field{driver_feature_select}]
+        The driver uses this to select which feature bits \field{driver_feature} shows.
+        Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
+
+\item[\field{driver_feature}]
+        The driver writes this to accept feature bits offered by the device.
+        Driver Feature Bits selected by \field{driver_feature_select}.
+
+\item[\field{config_msix_vector}]
+        The driver sets the Configuration Vector for MSI-X.
+
+\item[\field{num_queues}]
+        The device specifies the maximum number of virtqueues supported here.
+
+\item[\field{device_status}]
+        The driver writes the device status here (see \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}). Writing 0 into this
+        field resets the device.
+
+\item[\field{config_generation}]
+        Configuration atomicity value.  The device changes this every time the
+        configuration noticeably changes.
+
+\item[\field{queue_select}]
+        Queue Select. The driver selects which virtqueue the following
+        fields refer to.
+
+\item[\field{queue_size}]
+        Queue Size.  On reset, specifies the maximum queue size supported by
+        the device. This can be modified by the driver to reduce memory requirements.
+        A 0 means the queue is unavailable.
+
+\item[\field{queue_msix_vector}]
+        The driver uses this to specify the queue vector for MSI-X.
+
+\item[\field{queue_enable}]
+        The driver uses this to selectively prevent the device from executing requests from this virtqueue.
+        1 - enabled; 0 - disabled.
+
+\item[\field{queue_notify_off}]
+        The driver reads this to calculate the offset from start of Notification structure at
+        which this virtqueue is located.
+        \begin{note} this is \em{not} an offset in bytes.
+        See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} below.
+        \end{note}
+
+\item[\field{queue_desc}]
+        The driver writes the physical address of Descriptor Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+
+\item[\field{queue_driver}]
+        The driver writes the physical address of Driver Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+
+\item[\field{queue_device}]
+        The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+
+\item[\field{queue_notify_data}]
+        This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
+        The driver will use this value to put it in the 'virtqueue number' field
+        in the available buffer notification structure.
+        See section \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}.
+        \begin{note}
+        This field provides the device with flexibility to determine how virtqueues
+        will be referred to in available buffer notifications.
+        In a trivial case the device can set \field{queue_notify_data}=vqn. Some devices
+        may benefit from providing another value, for example an internal virtqueue
+        identifier, or an internal offset related to the virtqueue number.
+        \end{note}
+
+\item[\field{queue_reset}]
+        The driver uses this to selectively reset the queue.
+        This field exists only if VIRTIO_F_RING_RESET has been
+        negotiated. (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
+
+\end{description}
+
+\devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
+\field{offset} MUST be 4-byte aligned.
+
+The device MUST present at least one common configuration capability.
+
+The device MUST present the feature bits it is offering in \field{device_feature}, starting at bit \field{device_feature_select} $*$ 32 for any \field{device_feature_select} written by the driver.
+\begin{note}
+  This means that it will present 0 for any \field{device_feature_select} other than 0 or 1, since no feature defined here exceeds 63.
+\end{note}
+
+The device MUST present any valid feature bits the driver has written in \field{driver_feature}, starting at bit \field{driver_feature_select} $*$ 32 for any \field{driver_feature_select} written by the driver.  Valid feature bits are those which are subset of the corresponding \field{device_feature} bits.  The device MAY present invalid bits written by the driver.
+
+\begin{note}
+  This means that a device can ignore writes for feature bits it never
+  offers, and simply present 0 on reads.  Or it can just mirror what the driver wrote
+  (but it will still have to check them when the driver sets FEATURES_OK).
+\end{note}
+
+\begin{note}
+  A driver shouldn't write invalid bits anyway, as per \ref{drivernormative:General Initialization And Device Operation / Device Initialization}, but this attempts to handle it.
+\end{note}
+
+The device MUST present a changed \field{config_generation} after the
+driver has read a device-specific configuration value which has
+changed since any part of the device-specific configuration was last
+read.
+\begin{note}
+As \field{config_generation} is an 8-bit value, simply incrementing it
+on every configuration change could violate this requirement due to wrap.
+Better would be to set an internal flag when it has changed,
+and if that flag is set when the driver reads from the device-specific
+configuration, increment \field{config_generation} and clear the flag.
+\end{note}
+
+The device MUST reset when 0 is written to \field{device_status}, and
+present a 0 in \field{device_status} once that is done.
+
+The device MUST present a 0 in \field{queue_enable} on reset.
+
+If VIRTIO_F_RING_RESET has been negotiated, the device MUST present a 0 in
+\field{queue_reset} on reset.
+
+If VIRTIO_F_RING_RESET has been negotiated, the device MUST present a 0 in
+\field{queue_reset} after the virtqueue is enabled with \field{queue_enable}.
+
+The device MUST reset the queue when 1 is written to \field{queue_reset}. The
+device MUST continue to present 1 in \field{queue_reset} as long as the queue reset
+is ongoing. The device MUST present 0 in both \field{queue_reset} and \field{queue_enable}
+when queue reset has completed.
+(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
+
+The device MUST present a 0 in \field{queue_size} if the virtqueue
+corresponding to the current \field{queue_select} is unavailable.
+
+If VIRTIO_F_RING_PACKED has not been negotiated, the device MUST
+present either a value of 0 or a power of 2 in
+\field{queue_size}.
+
+\drivernormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
+
+The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation}, \field{queue_notify_off} or \field{queue_notify_data}.
+
+If VIRTIO_F_RING_PACKED has been negotiated,
+the driver MUST NOT write the value 0 to \field{queue_size}.
+If VIRTIO_F_RING_PACKED has not been negotiated,
+the driver MUST NOT write a value which is not a power of 2 to \field{queue_size}.
+
+The driver MUST configure the other virtqueue fields before enabling the virtqueue
+with \field{queue_enable}.
+
+After writing 0 to \field{device_status}, the driver MUST wait for a read of
+\field{device_status} to return 0 before reinitializing the device.
+
+The driver MUST NOT write a 0 to \field{queue_enable}.
+
+If VIRTIO_F_RING_RESET has been negotiated, after the driver writes 1 to
+\field{queue_reset} to reset the queue, the driver MUST NOT consider queue
+reset to be complete until it reads back 0 in \field{queue_reset}. The driver
+MAY re-enable the queue by writing 1 to \field{queue_enable} after ensuring
+that other virtqueue fields have been set up correctly. The driver MAY set
+driver-writeable queue configuration values to different values than those that
+were used before the queue reset.
+(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
+
+\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
+
+The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
+capability.  This capability is immediately followed by an additional
+field, like so:
+
+\begin{lstlisting}
+struct virtio_pci_notify_cap {
+        struct virtio_pci_cap cap;
+        le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
+};
+\end{lstlisting}
+
+\field{notify_off_multiplier} is combined with the \field{queue_notify_off} to
+derive the Queue Notify address within a BAR for a virtqueue:
+
+\begin{lstlisting}
+        cap.offset + queue_notify_off * notify_off_multiplier
+\end{lstlisting}
+
+The \field{cap.offset} and \field{notify_off_multiplier} are taken from the
+notification capability structure above, and the \field{queue_notify_off} is
+taken from the common configuration structure.
+
+\begin{note}
+For example, if \field{notifier_off_multiplier} is 0, the device uses
+the same Queue Notify address for all queues.
+\end{note}
+
+\devicenormative{\paragraph}{Notification capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
+The device MUST present at least one notification capability.
+
+For devices not offering VIRTIO_F_NOTIFICATION_DATA:
+
+The \field{cap.offset} MUST be 2-byte aligned.
+
+The device MUST either present \field{notify_off_multiplier} as an even power of 2,
+or present \field{notify_off_multiplier} as 0.
+
+The value \field{cap.length} presented by the device MUST be at least 2
+and MUST be large enough to support queue notification offsets
+for all supported queues in all possible configurations.
+
+For all queues, the value \field{cap.length} presented by the device MUST satisfy:
+\begin{lstlisting}
+cap.length >= queue_notify_off * notify_off_multiplier + 2
+\end{lstlisting}
+
+For devices offering VIRTIO_F_NOTIFICATION_DATA:
+
+The device MUST either present \field{notify_off_multiplier} as a
+number that is a power of 2 that is also a multiple 4,
+or present \field{notify_off_multiplier} as 0.
+
+The \field{cap.offset} MUST be 4-byte aligned.
+
+The value \field{cap.length} presented by the device MUST be at least 4
+and MUST be large enough to support queue notification offsets
+for all supported queues in all possible configurations.
+
+For all queues, the value \field{cap.length} presented by the device MUST satisfy:
+\begin{lstlisting}
+cap.length >= queue_notify_off * notify_off_multiplier + 4
+\end{lstlisting}
+
+\subsubsection{ISR status capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
+
+The VIRTIO_PCI_CAP_ISR_CFG capability
+refers to at least a single byte, which contains the 8-bit ISR status field
+to be used for INT\#x interrupt handling.
+
+The \field{offset} for the \field{ISR status} has no alignment requirements.
+
+The ISR bits allow the driver to distinguish between device-specific configuration
+change interrupts and normal virtqueue interrupts:
+
+\begin{tabular}{ |l||l|l|l| }
+\hline
+Bits       & 0                               & 1               &  2 to 31 \\
+\hline
+Purpose    & Queue Interrupt  & Device Configuration Interrupt & Reserved \\
+\hline
+\end{tabular}
+
+To avoid an extra access, simply reading this register resets it to 0 and
+causes the device to de-assert the interrupt.
+
+In this way, driver read of ISR status causes the device to de-assert
+an interrupt.
+
+See sections \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Used Buffer Notifications} and \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} for how this is used.
+
+\devicenormative{\paragraph}{ISR status capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
+
+The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability.  
+
+The device MUST set the Device Configuration Interrupt bit
+in \field{ISR status} before sending a device configuration
+change notification to the driver.
+
+If MSI-X capability is disabled, the device MUST set the Queue
+Interrupt bit in \field{ISR status} before sending a virtqueue
+notification to the driver.
+
+If MSI-X capability is disabled, the device MUST set the Interrupt Status
+bit in the PCI Status register in the PCI Configuration Header of
+the device to the logical OR of all bits in \field{ISR status} of
+the device.  The device then asserts/deasserts INT\#x interrupts unless masked
+according to standard PCI rules \hyperref[intro:PCI]{[PCI]}.
+
+The device MUST reset \field{ISR status} to 0 on driver read.
+
+\drivernormative{\paragraph}{ISR status capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
+
+If MSI-X capability is enabled, the driver SHOULD NOT access
+\field{ISR status} upon detecting a Queue Interrupt.
+
+\subsubsection{Device-specific configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device-specific configuration}
+
+The device MUST present at least one VIRTIO_PCI_CAP_DEVICE_CFG capability for
+any device type which has a device-specific configuration.
+
+\devicenormative{\paragraph}{Device-specific configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device-specific configuration}
+
+The \field{offset} for the device-specific configuration MUST be 4-byte aligned.
+
+\subsubsection{Shared memory capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Shared memory capability}
+
+Shared memory regions \ref{sec:Basic Facilities of a Virtio
+Device / Shared Memory Regions} are enumerated on the PCI transport
+as a sequence of VIRTIO_PCI_CAP_SHARED_MEMORY_CFG capabilities, one per region.
+
+The capability is defined by a struct virtio_pci_cap64 and
+utilises the \field{cap.id} to allow multiple shared memory
+regions per device.
+The identifier in \field{cap.id} does not denote a certain order of
+preference; it is only used to uniquely identify a region.
+
+\devicenormative{\paragraph}{Shared memory capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Shared memory capability}
+
+The region defined by the combination of the \field{cap.offset},
+\field{offset_hi}, and \field{cap.length}, \field{length_hi}
+fields MUST be contained within the BAR specified by
+\field{cap.bar}.
+
+The \field{cap.id} MUST be unique for any one device instance.
+
+\subsubsection{Vendor data capability}\label{sec:Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Vendor data capability}
+
+The optional Vendor data capability allows the device to present
+vendor-specific data to the driver, without
+conflicts, for debugging and/or reporting purposes,
+and without conflicting with standard functionality.
+
+This capability augments but does not replace the standard
+subsystem ID and subsystem vendor ID fields
+(offsets 0x2C and 0x2E in the PCI configuration space header)
+as specified by \hyperref[intro:PCI]{[PCI]}.
+
+Vendor data capability is enumerated on the PCI transport
+as a VIRTIO_PCI_CAP_VENDOR_CFG capability.
+
+The capability has the following structure:
+\begin{lstlisting}
+struct virtio_pci_vndr_data {
+        u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR */
+        u8 cap_next;    /* Generic PCI field: next ptr. */
+        u8 cap_len;     /* Generic PCI field: capability length */
+        u8 cfg_type;    /* Identifies the structure. */
+        u16 vendor_id;  /* Identifies the vendor-specific format. */
+	/* For Vendor Definition */
+	/* Pads structure to a multiple of 4 bytes */
+	/* Reads must not have side effects */
+};
+\end{lstlisting}
+
+Where \field{vendor_id} identifies the PCI-SIG assigned Vendor ID
+as specified by \hyperref[intro:PCI]{[PCI]}.
+
+Note that the capability size is required to be a multiple of 4.
+
+To make it safe for a generic driver to access the capability,
+reads from this capability MUST NOT have any side effects.
+
+\devicenormative{\paragraph}{Vendor data capability}{Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Vendor data capability}
+
+Devices CAN present \field{vendor_id} that does not match
+either the PCI Vendor ID or the PCI Subsystem Vendor ID.
+
+Devices CAN present multiple Vendor data capabilities with
+either different or identical \field{vendor_id} values.
+
+The value \field{vendor_id} MUST NOT equal 0x1AF4.
+
+The size of the Vendor data capability MUST be a multiple of 4 bytes.
+
+Reads of the Vendor data capability by the driver MUST NOT have any
+side effects.
+
+\drivernormative{\paragraph}{Vendor data capability}{Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Vendor data capability}
+
+The driver SHOULD NOT use the Vendor data capability except
+for debugging and reporting purposes.
+
+The driver MUST qualify the \field{vendor_id} before
+interpreting or writing into the Vendor data capability.
+
+\subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The VIRTIO_PCI_CAP_PCI_CFG capability
+creates an alternative (and likely suboptimal) access method to the
+common configuration, notification, ISR and device-specific configuration regions.
+
+The capability is immediately followed by an additional field like so:
+
+\begin{lstlisting}
+struct virtio_pci_cfg_cap {
+        struct virtio_pci_cap cap;
+        u8 pci_cfg_data[4]; /* Data for BAR access. */
+};
+\end{lstlisting}
+
+The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset} and
+\field{pci_cfg_data} are read-write (RW) for the driver.
+
+To access a device region, the driver writes into the capability
+structure (ie. within the PCI configuration space) as follows:
+
+\begin{itemize}
+\item The driver sets the BAR to access by writing to \field{cap.bar}.
+
+\item The driver sets the size of the access by writing 1, 2 or 4 to
+  \field{cap.length}.
+
+\item The driver sets the offset within the BAR by writing to
+  \field{cap.offset}.
+\end{itemize}
+
+At that point, \field{pci_cfg_data} will provide a window of size
+\field{cap.length} into the given \field{cap.bar} at offset \field{cap.offset}.
+
+\devicenormative{\paragraph}{PCI configuration access capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability.
+
+Upon detecting driver write access
+to \field{pci_cfg_data}, the device MUST execute a write access
+at offset \field{cap.offset} at BAR selected by \field{cap.bar} using the first \field{cap.length}
+bytes from \field{pci_cfg_data}.
+
+Upon detecting driver read access
+to \field{pci_cfg_data}, the device MUST
+execute a read access of length cap.length at offset \field{cap.offset}
+at BAR selected by \field{cap.bar} and store the first \field{cap.length} bytes in
+\field{pci_cfg_data}.
+
+\drivernormative{\paragraph}{PCI configuration access capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The driver MUST NOT write a \field{cap.offset} which is not
+a multiple of \field{cap.length} (ie. all accesses MUST be aligned).
+
+The driver MUST NOT read or write \field{pci_cfg_data}
+unless \field{cap.bar}, \field{cap.length} and \field{cap.offset}
+address \field{cap.length} bytes within a BAR range
+specified by some other Virtio Structure PCI Capability
+of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
+
+\subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interfaces: A Note on PCI Device Layout}
+
+Transitional devices MUST present part of configuration
+registers in a legacy configuration structure in BAR0 in the first I/O
+region of the PCI device, as documented below.
+When using the legacy interface, transitional drivers
+MUST use the legacy configuration structure in BAR0 in the first
+I/O region of the PCI device, as documented below.
+
+When using the legacy interface the driver MAY access
+the device-specific configuration region using any width accesses, and
+a transitional device MUST present driver with the same results as
+when accessed using the ``natural'' access method (i.e.
+32-bit accesses for 32-bit fields, etc).
+
+Note that this is possible because while the virtio common configuration structure is PCI
+(i.e. little) endian, when using the legacy interface the device-specific
+configuration region is encoded in the native endian of the guest (where such distinction is
+applicable).
+
+When used through the legacy interface, the virtio common configuration structure looks as follows:
+
+\begin{tabularx}{\textwidth}{ |X||X|X|X|X|X|X|X|X| }
+\hline
+ Bits & 32 & 32 & 32 & 16 & 16 & 16 & 8 & 8 \\
+\hline
+ Read / Write & R & R+W & R+W & R & R+W & R+W & R+W & R \\
+\hline
+ Purpose & Device Features bits 0:31 & Driver Features bits 0:31 &
+  Queue Address & \field{queue_size} & \field{queue_select} & Queue Notify &
+  Device Status & ISR \newline Status \\
+\hline
+\end{tabularx}
+
+If MSI-X is enabled for the device, two additional fields
+immediately follow this header:
+
+\begin{tabular}{ |l||l|l| }
+\hline
+Bits       & 16             & 16     \\
+\hline
+Read/Write & R+W            & R+W    \\
+\hline
+Purpose (MSI-X) & \field{config_msix_vector}  & \field{queue_msix_vector} \\
+\hline
+\end{tabular}
+
+Note: When MSI-X capability is enabled, device-specific configuration starts at
+byte offset 24 in virtio common configuration structure structure. When MSI-X capability is not
+enabled, device-specific configuration starts at byte offset 20 in virtio
+header.  ie. once you enable MSI-X on the device, the other fields move.
+If you turn it off again, they move back!
+
+Any device-specific configuration space immediately follows
+these general headers:
+
+\begin{tabular}{|l||l|l|}
+\hline
+Bits & Device Specific & \multirow{3}{*}{\ldots} \\
+\cline{1-2}
+Read / Write & Device Specific & \\
+\cline{1-2}
+Purpose & Device Specific & \\
+\hline
+\end{tabular}
+
+When accessing the device-specific configuration space
+using the legacy interface, transitional
+drivers MUST access the device-specific configuration space
+at an offset immediately following the general headers.
+
+When using the legacy interface, transitional
+devices MUST present the device-specific configuration space
+if any at an offset immediately following the general headers.
+
+Note that only Feature Bits 0 to 31 are accessible through the
+Legacy Interface. When used through the Legacy Interface,
+Transitional Devices MUST assume that Feature Bits 32 to 63
+are not acknowledged by Driver.
+
+As legacy devices had no \field{config_generation} field,
+see \ref{sec:Basic Facilities of a Virtio Device / Device
+Configuration Space / Legacy Interface: Device Configuration
+Space}~\nameref{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: Device Configuration Space} for workarounds.
+
+\subsubsection{Non-transitional Device With Legacy Driver: A Note
+on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio
+Over PCI Bus / PCI Device Layout / Non-transitional Device With
+Legacy Driver: A Note on PCI Device Layout}
+
+All known legacy drivers check either the PCI Revision or the
+Device and Vendor IDs, and thus won't attempt to drive a
+non-transitional device.
+
+A buggy legacy driver might mistakenly attempt to drive a
+non-transitional device. If support for such drivers is required
+(as opposed to fixing the bug), the following would be the
+recommended way to detect and handle them.
+\begin{note}
+Such buggy drivers are not currently known to be used in
+production.
+\end{note}
+
+\subparagraph{Device Requirements: Non-transitional Device With Legacy Driver}
+\label{drivernormative:Virtio Transport Options / Virtio Over PCI
+Bus / PCI-specific Initialization And Device Operation /
+Device Initialization / Non-transitional Device With Legacy
+Driver}
+\label{devicenormative:Virtio Transport Options / Virtio Over PCI
+Bus / PCI-specific Initialization And Device Operation /
+Device Initialization / Non-transitional Device With Legacy
+Driver}
+
+Non-transitional devices, on a platform where a legacy driver for
+a legacy device with the same ID (including PCI Revision, Device
+and Vendor IDs) is known to have previously existed,
+SHOULD take the following steps to cause the legacy driver to
+fail gracefully when it attempts to drive them:
+
+\begin{enumerate}
+\item Present an I/O BAR in BAR0, and
+\item Respond to a single-byte zero write to offset 18
+   (corresponding to Device Status register in the legacy layout)
+   of BAR0 by presenting zeroes on every BAR and ignoring writes.
+\end{enumerate}
+
+\subsection{PCI-specific Initialization And Device Operation}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation}
+
+\subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization}
+
+This documents PCI-specific steps executed during Device Initialization.
+
+\paragraph{Virtio Device Configuration Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection}
+
+As a prerequisite to device initialization, the driver scans the
+PCI capability list, detecting virtio configuration layout using Virtio
+Structure PCI capabilities as detailed in \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+\subparagraph{Legacy Interface: A Note on Device Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection / Legacy Interface: A Note on Device Layout Detection}
+
+Legacy drivers skipped the Device Layout Detection step, assuming legacy
+device configuration space in BAR0 in I/O space unconditionally.
+
+Legacy devices did not have the Virtio PCI Capability in their
+capability list.
+
+Therefore:
+
+Transitional devices MUST expose the Legacy Interface in I/O
+space in BAR0.
+
+Transitional drivers MUST look for the Virtio PCI
+Capabilities on the capability list.
+If these are not present, driver MUST assume a legacy device,
+and use it through the legacy interface.
+
+Non-transitional drivers MUST look for the Virtio PCI
+Capabilities on the capability list.
+If these are not present, driver MUST assume a legacy device,
+and fail gracefully.
+
+\paragraph{MSI-X Vector Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
+
+When MSI-X capability is present and enabled in the device
+(through standard PCI configuration space) \field{config_msix_vector} and \field{queue_msix_vector} are used to map configuration change and queue
+interrupts to MSI-X vectors. In this case, the ISR Status is unused.
+
+Writing a valid MSI-X Table entry number, 0 to 0x7FF, to
+\field{config_msix_vector}/\field{queue_msix_vector} maps interrupts triggered
+by the configuration change/selected queue events respectively to
+the corresponding MSI-X vector. To disable interrupts for an
+event type, the driver unmaps this event by writing a special NO_VECTOR
+value:
+
+\begin{lstlisting}
+/* Vector value used to disable MSI for queue */
+#define VIRTIO_MSI_NO_VECTOR            0xffff
+\end{lstlisting}
+
+Note that mapping an event to vector might require device to
+allocate internal device resources, and thus could fail. 
+
+\devicenormative{\subparagraph}{MSI-X Vector Configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
+
+A device that has an MSI-X capability SHOULD support at least 2
+and at most 0x800 MSI-X vectors.
+Device MUST report the number of vectors supported in
+\field{Table Size} in the MSI-X Capability as specified in
+\hyperref[intro:PCI]{[PCI]}.
+The device SHOULD restrict the reported MSI-X Table Size field
+to a value that might benefit system performance.
+\begin{note}
+For example, a device which does not expect to send
+interrupts at a high rate might only specify 2 MSI-X vectors.
+\end{note}
+Device MUST support mapping any event type to any valid
+vector 0 to MSI-X \field{Table Size}.
+Device MUST support unmapping any event type.
+
+The device MUST return vector mapped to a given event,
+(NO_VECTOR if unmapped) on read of \field{config_msix_vector}/\field{queue_msix_vector}.
+The device MUST have all queue and configuration change
+events are unmapped upon reset.
+
+Devices SHOULD NOT cause mapping an event to vector to fail
+unless it is impossible for the device to satisfy the mapping
+request.  Devices MUST report mapping
+failures by returning the NO_VECTOR value when the relevant
+\field{config_msix_vector}/\field{queue_msix_vector} field is read. 
+
+\drivernormative{\subparagraph}{MSI-X Vector Configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
+
+Driver MUST support device with any MSI-X Table Size 0 to 0x7FF.
+Driver MAY fall back on using INT\#x interrupts for a device
+which only supports one MSI-X vector (MSI-X Table Size = 0).
+
+Driver MAY intepret the Table Size as a hint from the device
+for the suggested number of MSI-X vectors to use.
+
+Driver MUST NOT attempt to map an event to a vector
+outside the MSI-X Table supported by the device,
+as reported by \field{Table Size} in the MSI-X Capability.
+
+After mapping an event to vector, the
+driver MUST verify success by reading the Vector field value: on
+success, the previously written value is returned, and on
+failure, NO_VECTOR is returned. If a mapping failure is detected,
+the driver MAY retry mapping with fewer vectors, disable MSI-X
+or report device failure.
+
+\paragraph{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtqueue Configuration}
+
+As a device can have zero or more virtqueues for bulk data
+transport\footnote{For example, the simplest network device has two virtqueues.}, the driver
+needs to configure them as part of the device-specific
+configuration.
+
+The driver typically does this as follows, for each virtqueue a device has:
+
+\begin{enumerate}
+\item Write the virtqueue index (first queue is 0) to \field{queue_select}.
+
+\item Read the virtqueue size from \field{queue_size}. This controls how big the virtqueue is
+  (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues}). If this field is 0, the virtqueue does not exist.
+
+\item Optionally, select a smaller virtqueue size and write it to \field{queue_size}.
+
+\item Allocate and zero Descriptor Table, Available and Used rings for the
+   virtqueue in contiguous physical memory.
+
+\item Optionally, if MSI-X capability is present and enabled on the
+  device, select a vector to use to request interrupts triggered
+  by virtqueue events. Write the MSI-X Table entry number
+  corresponding to this vector into \field{queue_msix_vector}. Read
+  \field{queue_msix_vector}: on success, previously written value is
+  returned; on failure, NO_VECTOR value is returned.
+\end{enumerate}
+
+\subparagraph{Legacy Interface: A Note on Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtqueue Configuration / Legacy Interface: A Note on Virtqueue Configuration}
+When using the legacy interface, the queue layout follows \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout} with an alignment of 4096.
+Driver writes the physical address, divided
+by 4096 to the Queue Address field\footnote{The 4096 is based on the x86 page size, but it's also large
+enough to ensure that the separate parts of the virtqueue are on
+separate cache lines.
+}.  There was no mechanism to negotiate the queue size.
+
+\subsubsection{Available Buffer Notifications}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}
+
+When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
+the driver sends an available buffer notification to the device by writing
+the 16-bit virtqueue index
+of this virtqueue to the Queue Notify address.
+
+When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
+the driver sends an available buffer notification to the device by writing
+the following 32-bit value to the Queue Notify address:
+\lstinputlisting{notifications-le.c}
+
+See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications}~\nameref{sec:Basic Facilities of a Virtio Device / Driver notifications}
+for the definition of the components.
+
+See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
+for how to calculate the Queue Notify address.
+
+\drivernormative{\paragraph}{Available Buffer Notifications}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}
+If VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated:
+\begin{itemize}
+\item If VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the driver MUST use the
+\field{queue_notify_data} value instead of the virtqueue index.
+\item If VIRTIO_F_NOTIFICATION_DATA has been negotiated, the driver MUST set the
+\field{vqn} field to the \field{queue_notify_data} value.
+\end{itemize}
+
+\subsubsection{Used Buffer Notifications}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Used Buffer Notifications}
+
+If a used buffer notification is necessary for a virtqueue, the device would typically act as follows:
+
+\begin{itemize}
+  \item If MSI-X capability is disabled:
+    \begin{enumerate}
+    \item Set the lower bit of the ISR Status field for the device.
+
+    \item Send the appropriate PCI interrupt for the device.
+    \end{enumerate}
+
+  \item If MSI-X capability is enabled:
+    \begin{enumerate}
+    \item If \field{queue_msix_vector} is not NO_VECTOR,
+      request the appropriate MSI-X interrupt message for the
+      device, \field{queue_msix_vector} sets the MSI-X Table entry
+      number.
+    \end{enumerate}
+\end{itemize}
+
+\devicenormative{\paragraph}{Used Buffer Notifications}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Used Buffer Notifications}
+
+If MSI-X capability is enabled and \field{queue_msix_vector} is
+NO_VECTOR for a virtqueue, the device MUST NOT deliver an interrupt
+for that virtqueue.
+
+\subsubsection{Notification of Device Configuration Changes}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
+
+Some virtio PCI devices can change the device configuration
+state, as reflected in the device-specific configuration region of the device. In this case:
+
+\begin{itemize}
+  \item If MSI-X capability is disabled:
+    \begin{enumerate}
+    \item Set the second lower bit of the ISR Status field for the device.
+
+    \item Send the appropriate PCI interrupt for the device.
+    \end{enumerate}
+
+  \item If MSI-X capability is enabled:
+    \begin{enumerate}
+    \item If \field{config_msix_vector} is not NO_VECTOR,
+      request the appropriate MSI-X interrupt message for the
+      device, \field{config_msix_vector} sets the MSI-X Table entry
+      number.
+    \end{enumerate}
+\end{itemize}
+
+A single interrupt MAY indicate both that one or more virtqueue has
+been used and that the configuration space has changed.
+
+\devicenormative{\paragraph}{Notification of Device Configuration Changes}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
+
+If MSI-X capability is enabled and \field{config_msix_vector} is
+NO_VECTOR, the device MUST NOT deliver an interrupt
+for device configuration space changes.
+
+\drivernormative{\paragraph}{Notification of Device Configuration Changes}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
+
+A driver MUST handle the case where the same interrupt is used to indicate
+both device configuration space change and one or more virtqueues being used.
+
+\subsubsection{Driver Handling Interrupts}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Driver Handling Interrupts}
+The driver interrupt handler would typically:
+
+\begin{itemize}
+  \item If MSI-X capability is disabled:
+    \begin{itemize}
+      \item Read the ISR Status field, which will reset it to zero.
+      \item If the lower bit is set:
+        look through all virtqueues for the
+        device, to see if any progress has been made by the device
+        which requires servicing.
+      \item If the second lower bit is set:
+        re-examine the configuration space to see what changed.
+    \end{itemize}
+  \item If MSI-X capability is enabled:
+    \begin{itemize}
+      \item
+        Look through all virtqueues mapped to that MSI-X vector for the
+        device, to see if any progress has been made by the device
+        which requires servicing.
+      \item
+        If the MSI-X vector is equal to \field{config_msix_vector},
+        re-examine the configuration space to see what changed.
+    \end{itemize}
+\end{itemize}
+
+\section{Virtio Over MMIO}\label{sec:Virtio Transport Options / Virtio Over MMIO}
+
+Virtual environments without PCI support (a common situation in
+embedded devices models) might use simple memory mapped device
+(``virtio-mmio'') instead of the PCI device.
+
+The memory mapped virtio device behaviour is based on the PCI
+device specification. Therefore most operations including device
+initialization, queues configuration and buffer transfers are
+nearly identical. Existing differences are described in the
+following sections.
-- 
2.26.2


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox