* Re: [virtio-dev] [PATCH v11 04/10] admin: introduce virtio admin virtqueues
From: Zhu Lingshan @ 2023-04-06 7:17 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, Zhu Lingshan, pasic, Shahaf Shuler,
Parav Pandit, Max Gurtovoy
In-Reply-To: <9cbecf20548a7adf4d97d07b17e40eb3c1c50209.1680533760.git.mst@redhat.com>
On 4/3/2023 11:03 PM, Michael S. Tsirkin wrote:
> The admin virtqueues will be the first interface used to issue admin commands.
>
> Currently the virtio specification defines control virtqueue to manipulate
> features and configuration of the device it operates on:
> virtio-net, virtio-scsi, etc all have existing control virtqueues. However,
> control virtqueue commands are device type specific, which makes it very
> difficult to extend for device agnostic commands.
>
> Keeping the device-specific virtqueue separate from the admin virtqueue
> is simpler and has fewer potential problems. I don't think creating
> common infrastructure for device-specific control virtqueues across
> device types worthwhile or within the scope of this patch series.
>
> To support this requirement in a more generic way, this patch introduces
> a new admin virtqueue interface.
> The admin virtqueue can be seen as the virtqueue analog to a transport.
> The admin queue thus does nothing device type-specific (net, scsi, etc)
> and instead focuses on transporting the admin commands.
>
> We also support more than one admin virtqueue, for QoS and
> scalability requirements.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ---
> explain ordering of commands as suggested by Stefan
> dropped Max's S.O.B
> reword commit log as suggested by David
> minor wording fixes suggested by David
> ---
> admin.tex | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> content.tex | 7 +++--
> 2 files changed, 80 insertions(+), 2 deletions(-)
>
> diff --git a/admin.tex b/admin.tex
> index c869a1a..f201bcd 100644
> --- a/admin.tex
> +++ b/admin.tex
> @@ -182,3 +182,78 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
> \field{command_specific_data} and \field{command_specific_result}
> depends on these structures and is described separately or is
> implicit in the structure description.
> +
> +\section{Administration Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Administration Virtqueues}
> +
> +An administration virtqueue of an owner device is used to submit
> +group administration commands. An owner device can have more
> +than one administration virtqueue.
> +
> +If VIRTIO_F_ADMIN_VQ has been negotiated, an owner device exposes one
> +or more adminstration virtqueues. The number and locations of the
> +administration virtqueues are exposed by the owner device in a transport
> +specific manner.
> +
> +The driver queues requests to an arbitrary administration
> +virtqueue, and they are used by the device on that same
> +virtqueue. It is the responsibility of the driver to ensure
> +strict request ordering for commands, because they will be
> +consumed with no order constraints. For example, if consistency
> +is required then the driver can wait for the processing of a
> +first command by the device to be completed before submitting
> +another command depending on the first one.
> +
> +Administration virtqueues are used as follows:
> +\begin{itemize}
> +\item The driver submits the command using the \field{struct virtio_admin_cmd}
> +structure using a buffer consisting of two parts: a device-readable one followed by a
> +device-writable one.
> +\item the device-readable part includes fields from \field{opcode}
> +through \field{command_specific_data}.
> +\item the device-writeable buffer includes fields from \field{status}
> +through \field{command_specific_result} inclusive.
> +\end{itemize}
> +
> +For each command, this specification describes a distinct
> +format structure used for \field{command_specific_data} and
> +\field{command_specific_result}, the length of these fields
> +depends on the command.
> +
> +However, to ensure forward compatibility
> +\begin{itemize}
> +\item drivers are allowed to submit buffers that are longer
> +than the device expects
> +(that is, longer than the length of
> +\field{opcode} through \field{command_specific_data}).
> +This allows the driver to maintain
> +a single format structure even if some structure fields are
> +unused by the device.
> +\item drivers are allowed to submit buffers that are shorter
> +than what the device expects
> +(that is, shorter than the length of \field{status} through
> +\field{command_specific_result}). This allows the device to maintain
> +a single format structure even if some structure fields are
> +unused by the driver.
> +\end{itemize}
> +
> +The device compares the length of each part (device-readable and
> +device-writeable) of the buffer as submitted by driver to what it
> +expects and then silently truncates the structures to either the
> +length submitted by the driver, or the length described in this
> +specification, whichever is shorter. The device silently ignores
> +any data falling outside the shorter of the two lengths. Any
> +missing fields are interpreted as set to zero.
> +
> +Similarly, the driver compares the used buffer length
> +of the buffer to what it expects and then silently
> +truncates the structure to the used buffer length.
> +The driver silently ignores any data falling outside
> +the used buffer length reported by the device. Any missing
> +fields are interpreted as set to zero.
> +
> +This simplifies driver and device implementations since the
> +driver/device can simply maintain a single large structure (such
> +as a C structure) for a command and its result. As new versions
> +of the specification are designed, new fields can be added to the
> +tail of a structure, with the driver/device using the full
> +structure without concern for versioning.
> diff --git a/content.tex b/content.tex
> index 04592fb..2eb15fa 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
> \begin{description}
> \item[0 to 23, and 50 to 127] Feature bits for the specific device type
>
> -\item[24 to 40] Feature bits reserved for extensions to the queue and
> +\item[24 to 41] Feature bits reserved for extensions to the queue and
> feature negotiation mechanisms
>
> -\item[41 to 49, and 128 and above] Feature bits reserved for future extensions.
> +\item[42 to 49, and 128 and above] Feature bits reserved for future extensions.
> \end{description}
>
> \begin{note}
> @@ -7684,6 +7684,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> that the driver can reset a queue individually.
> See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
>
> + \item[VIRTIO_F_ADMIN_VQ(41)] This feature indicates that the device exposes one or more
> + administration virtqueues.
> +
> \end{description}
>
> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [PATCH v11 03/10] admin: introduce group administration commands
From: Zhu, Lingshan @ 2023-04-06 7:16 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, pasic, Shahaf Shuler, Parav Pandit,
Max Gurtovoy
In-Reply-To: <93b34ea59bedbff0671ede2b171cacc1078f30ff.1680533760.git.mst@redhat.com>
On 4/3/2023 11:03 PM, Michael S. Tsirkin wrote:
> This introduces a general structure for group administration commands,
> used to control device groups through their owner.
>
> Following patches will introduce specific commands and an interface for
> submitting these commands to the owner.
>
> Note that the commands are focused on controlling device groups:
> this is why group related fields are in the generic part of
> the structure.
> Without this the admin vq would become a "whatever" vq which does not do
> anything specific at all, just a general transport like thing.
> I feel going this way opens the design space to the point where
> we no longer know what belongs in e.g. config space
> what in the control q and what in the admin q.
> As it is, whatever deals with groups is in the admin q; other
> things not in the admin q.
>
> There are specific exceptions such as query but that's an exception that
> proves the rule ;)
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> changes since v10:
> explain the role of status vs status_qualifier, addresses
> multiple comments
> add more status values and appropriate qualifiers,
> as suggested by Parav
> clarify reserved command opcodes as suggested by Stefan
> better formatting for ranges, comment by Jiri
> make sure command-specific-data is a multiple of qword,
> so things are aligned, suggested by Jiri
> add Q_OK qualifier for success
> ---
> admin.tex | 121 +++++++++++++++++++++++++++++++++++++++++++++++
> introduction.tex | 3 ++
> 2 files changed, 124 insertions(+)
>
> diff --git a/admin.tex b/admin.tex
> index 04d5498..c869a1a 100644
> --- a/admin.tex
> +++ b/admin.tex
> @@ -60,4 +60,125 @@ \section{Device groups}\label{sec:Basic Facilities of a Virtio Device / Device g
> PCI transport (see \ref{sec:Virtio Transport Options / Virtio Over PCI Bus}).
> \end{description}
>
> +\subsection{Group administration commands}\label{sec:Basic Facilities of a Virtio Device / Device groups / Group administration commands}
>
> +The driver sends group administration commands to the owner device of
> +a group to control member devices of the group.
> +This mechanism can
> +be used, for example, to configure a member device before it is
> +initialized by its driver.
> +\footnote{The term "administration" is intended to be interpreted
> +widely to include any kind of control. See specific commands
> +for detail.}
> +
> +All the group administration commands are of the following form:
> +
> +\begin{lstlisting}
> +struct virtio_admin_cmd {
> + /* Device-readable part */
> + le16 opcode;
> + /*
> + * 1 - SR-IOV
> + * 2-65535 - reserved
> + */
> + le16 group_type;
> + /* unused, reserved for future extensions */
> + u8 reserved1[12];
> + le64 group_member_id;
> + le64 command_specific_data[];
> +
> + /* Device-writable part */
> + le16 status;
> + le16 status_qualifier;
> + /* unused, reserved for future extensions */
> + u8 reserved2[4];
> + u8 command_specific_result[];
> +};
> +\end{lstlisting}
> +
> +For all commands, \field{opcode}, \field{group_type} and if
> +necessary \field{group_member_id} and \field{command_specific_data} are
> +set by the driver, and the owner device sets \field{status} and if
> +needed \field{status_qualifier} and
> +\field{command_specific_result}.
> +
> +Generally, any unused device-readable fields are set to zero by the driver
> +and ignored by the device. Any unused device-writeable fields are set to zero
> +by the device and ignored by the driver.
> +
> +\field{opcode} specifies the command. The valid
> +values for \field{opcode} can be found in the following table:
> +
> +\begin{tabular}{|l|l|}
> +\hline
> +opcode & Name & Command Description \\
> +\hline \hline
> +0x0000 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\
> +\hline
> +0x8000 - 0xFFFF & - & Reserved for future commands (possibly using a different structure) \\
> +\hline
> +\end{tabular}
> +
> +The \field{group_type} specifies the group type identifier.
> +The \field{group_member_id} specifies the member identifier within the group.
> +See section \ref{sec:Introduction / Terminology / Device group}
> +for the definition of the group type identifier and group member
> +identifier.
> +
> +The \field{status} describes the command result and possibly
> +failure reason at an abstract level, this is appropriate for
> +forwarding to applications. The \field{status_qualifier} describes
> +failures at a low virtio specific level, as appropriate for debugging.
> +The following table describes possible \field{status} values;
> +to simplify common implementations, they are intentionally
> +matching common \hyperref[intro:errno]{Linux error names and numbers}:
> +
> +\begin{tabular}{|l|l|l|}
> +\hline
> +Status (decimal) & Name & Description \\
> +\hline \hline
> +00 & VIRTIO_ADMIN_STATUS_OK & successful completion \\
> +\hline
> +11 & VIRTIO_ADMIN_STATUS_EAGAIN & try again \\
> +\hline
> +12 & VIRTIO_ADMIN_STATUS_ENOMEM & insufficient resources \\
> +\hline
> +22 & VIRTIO_ADMIN_STATUS_EINVAL & invalid command \\
> +\hline
> +other & - & group administration command error \\
> +\hline
> +\end{tabular}
> +
> +When \field{status} is VIRTIO_ADMIN_STATUS_OK, \field{status_qualifier}
> +is reserved and set to zero by the device.
> +
> +The following table describes possible \field{status_qualifier} values:
> +\begin{tabular}{|l|l|l|}
> +\hline
> +Status & Name & Description \\
> +\hline \hline
> +0x00 & VIRTIO_ADMIN_STATUS_Q_OK & used with VIRTIO_ADMIN_STATUS_OK \\
> +\hline
> +0x01 & VIRTIO_ADMIN_STATUS_Q_INVALID_COMMAND & command error: no additional information \\
> +\hline
> +0x02 & VIRTIO_ADMIN_STATUS_Q_INVALID_OPCODE & unsupported or invalid \field{opcode} \\
> +\hline
> +0x03 & VIRTIO_ADMIN_STATUS_Q_INVALID_FIELD & unsupported or invalid field within \field{command_specific_data} \\
> +\hline
> +0x04 & VIRTIO_ADMIN_STATUS_Q_INVALID_GROUP & unsupported or invalid \field{group_type} \\
> +\hline
> +0x05 & VIRTIO_ADMIN_STATUS_Q_INVALID_MEMBER & unsupported or invalid \field{group_member_id} \\
> +\hline
> +0x06 & VIRTIO_ADMIN_STATUS_Q_NORESOURCE & out of internal resources: ok to retry \\
> +\hline
> +0x07 & VIRTIO_ADMIN_STATUS_Q_TRYAGAIN & command blocks for too long: should retry \\
> +\hline
> +0x08-0xFFFF & - & reserved for future use \\
> +\hline
> +\end{tabular}
> +
> +Each command uses a different \field{command_specific_data} and
> +\field{command_specific_result} structures and the length of
> +\field{command_specific_data} and \field{command_specific_result}
> +depends on these structures and is described separately or is
> +implicit in the structure description.
> diff --git a/introduction.tex b/introduction.tex
> index 287c5fc..0d849a9 100644
> --- a/introduction.tex
> +++ b/introduction.tex
> @@ -68,6 +68,9 @@ \section{Normative References}\label{sec:Normative References}
> \phantomsection\label{intro:FUSE}\textbf{[FUSE]} &
> Linux FUSE interface,
> \newline\url{https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h}\\
> + \phantomsection\label{intro:errno}\textbf{[errno]} &
> + Linux error names and numbers,
> + \newline\url{https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/uapi/asm-generic/errno-base.h}\\
> \phantomsection\label{intro:eMMC}\textbf{[eMMC]} &
> eMMC Electrical Standard (5.1), JESD84-B51,
> \newline\url{http://www.jedec.org/sites/default/files/docs/JESD84-B51.pdf}\\
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio] [PATCH v11 05/10] pci: add admin vq registers to virtio over pci
From: Zhu, Lingshan @ 2023-04-06 7:17 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, pasic, Shahaf Shuler, Parav Pandit,
Max Gurtovoy
In-Reply-To: <fbf6f659974884f7430ec7ef68028305d7acee0c.1680533760.git.mst@redhat.com>
On 4/3/2023 11:03 PM, Michael S. Tsirkin wrote:
> Add new registers to the PCI common configuration structure.
>
> These registers will be used for querying the indices of the admin
> virtqueues of the owner device. To configure, reset or enable the admin
> virtqueues, the driver should follow existing queue configuration/setup
> sequence.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ---
> dropped Max's S.O.B
> make queue_num not 0 based
> ---
> content.tex | 34 ++++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/content.tex b/content.tex
> index 2eb15fa..5057df2 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -948,6 +948,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> le64 queue_device; /* read-write */
> le16 queue_notify_data; /* read-only for driver */
> le16 queue_reset; /* read-write */
> +
> + /* About the administration virtqueue. */
> + le16 admin_queue_index; /* read-only for driver */
> + le16 admin_queue_num; /* read-only for driver */
> };
> \end{lstlisting}
>
> @@ -1033,6 +1037,19 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> This field exists only if VIRTIO_F_RING_RESET has been
> negotiated. (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
>
> +\item[\field{admin_queue_index}]
> + The device uses this to report the index of the first administration virtqueue.
> + This field is valid only if VIRTIO_F_ADMIN_VQ has been negotiated.
> +\item[\field{admin_queue_num}]
> + The device uses this to report the number of the
> + supported administration virtqueues.
> + Virtqueues with index
> + between \field{admin_queue_index} and (\field{admin_queue_index} +
> + \field{admin_queue_num} - 1) inclusive serve as administration
> + virtqueues.
> + The value 0 indicates no supported administration virtqueues.
> + This field is valid only if VIRTIO_F_ADMIN_VQ has been
> + negotiated.
> \end{description}
>
> \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
> @@ -1119,6 +1136,14 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> were used before the queue reset.
> (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}).
>
> +If VIRTIO_F_ADMIN_VQ has been negotiated, and if the driver
> +configures any administration virtqueues, the driver MUST
> +configure the administration virtqueues using the index
> +in the range \field{admin_queue_index} to
> +\field{admin_queue_index} + \field{admin_queue_num} - 1 inclusive.
> +The driver MAY configure less administration virtqueues than
> +supported by the device.
> +
> \subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
>
> The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
> @@ -7686,6 +7711,15 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>
> \item[VIRTIO_F_ADMIN_VQ(41)] This feature indicates that the device exposes one or more
> administration virtqueues.
> + At the moment this feature is only supported for devices using
> + \ref{sec:Virtio Transport Options / Virtio Over PCI
> + Bus}~\nameref{sec:Virtio Transport Options / Virtio Over PCI Bus}
> + as the transport and is reserved for future use for
> + devices using other transports (see
> + \ref{drivernormative:Basic Facilities of a Virtio Device / Feature Bits}
> + and
> + \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
> + handling features reserved for future use.
>
> \end{description}
>
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* Re: [virtio-dev] [PATCH v11 02/10] admin: introduce device group and related concepts
From: Zhu, Lingshan @ 2023-04-06 7:16 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, pasic, Shahaf Shuler, Parav Pandit,
Max Gurtovoy
In-Reply-To: <f83a6a225a28909eea927575b6783c0fe238fdbd.1680533760.git.mst@redhat.com>
On 4/3/2023 11:03 PM, Michael S. Tsirkin wrote:
> Each device group has a type. For now, define one initial group type:
>
> SR-IOV type - PCI SR-IOV virtual functions (VFs) of a given
> PCI SR-IOV physical function (PF). This group may contain zero or more
> virtio devices according to NumVFs configured.
>
> Each device within a group has a unique identifier. This identifier
> is the group member identifier.
>
> Note: one can argue both ways whether the new device group handling
> functionality (this and following patches) is closer
> to a new device type or a new transport type.
>
> However, I expect that we will add more features in the near future. To
> facilitate this as much as possible of the text is located in the new
> admin chapter.
>
> I did my best to minimize transport-specific text.
>
> There's a bit of duplication with 0x1 repeated twice and
> no special section for group type identifiers.
> I think it's ok to defer adding these until we have more group
> types.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> changes in v11:
> dropped Max's S.O.B.
> dropped an unmatched ) as reported by Parav
> document that member id is 1 to numvfs
> document that vf enable must be set (will also be reflected in
> the conformance section)
> document that we deferred generalizing text as requested by Parav -
> I think we can do it later
> minor wording fixes to address comments by David
> add concepts of owner and member driver. unused currently
> but will come in handy later, as suggested by Stefan
> ---
> admin.tex | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> content.tex | 2 ++
> 2 files changed, 65 insertions(+)
> create mode 100644 admin.tex
>
> diff --git a/admin.tex b/admin.tex
> new file mode 100644
> index 0000000..04d5498
> --- /dev/null
> +++ b/admin.tex
> @@ -0,0 +1,63 @@
> +\section{Device groups}\label{sec:Basic Facilities of a Virtio Device / Device groups}
> +
> +It is occasionally useful to have a device control a group of
> +other devices. Terminology used in such cases:
> +
> +\begin{description}
> +\item[Device group]
> + or just group, includes zero or more devices.
> +\item[Owner device]
> + or owner, the device controlling the group.
> +\item[Member device]
> + a device within a group. The owner device itself is not
> + a member of the group.
> +\item[Member identifier]
> + each member has this identifier, unique within the group
> + and used to address it through the owner device.
> +\item[Group type identifier]
> + specifies what kind of member devices there are in a
> + group, how the member identifier is interpreted
> + and what kind of control the owner has.
> + A given owner can control multiple groups
> + of different types but only a single group of a given type,
> + thus the type and the owner together identify the group.
> + \footnote{Even though some group types only support
> + specific transports, group type identifiers
> + are global rather than transport-specific -
> + we don't expect a flood of new group types.}
> +\end{description}
> +
> +\begin{note}
> +Each device only has a single driver, thus for the purposes of
> +this section, "the driver" is usually unambiguous and refers to
> +the driver of the owner device. When there's ambiguity, "owner
> +driver" refers to the driver of the owner device, while "member
> +driver" refers to the driver of a member device.
> +\end{note}
> +
> +The following group types, and their identifiers, are currently specified:
> +\begin{description}
> +\item[SR-IOV group type (0x1)]
> +This device group has a PCI Single Root I/O Virtualization
> +(SR-IOV) physical function (PF) device as the owner and includes
> +all its SR-IOV virtual functions (VFs) as members (see
> +\hyperref[intro:PCIe]{[PCIe]}).
> +
> +The PF device itself is not a member of the group.
> +
> +The group type identifier for this group is 0x1.
> +
> +A member identifier for this group can have a value from 0x1 to
> +\field{NumVFs} as specified in the
> +SR-IOV Extended Capability of the owner device
> +and equals the SR-IOV VF number of the member device;
> +the group only exists when the \field{VF Enable} bit
> +in the SR-IOV Control Register within the
> +SR-IOV Extended Capability of the owner device is set
> +(see \hyperref[intro:PCIe]{[PCIe]}).
> +
> +Both owner and member devices for this group type use the Virtio
> +PCI transport (see \ref{sec:Virtio Transport Options / Virtio Over PCI Bus}).
> +\end{description}
> +
> +
> diff --git a/content.tex b/content.tex
> index 8098988..04592fb 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -493,6 +493,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
> types. It is RECOMMENDED that devices generate version 4
> UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>
> +\input{admin.tex}
> +
> \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>
> We start with an overview of device initialization, then expand on the
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] [PATCH v11 06/10] mmio: document ADMIN_VQ as reserved
From: Zhu Lingshan @ 2023-04-06 7:18 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, Zhu Lingshan, pasic, Shahaf Shuler,
Parav Pandit, Max Gurtovoy
In-Reply-To: <aa6e3f39c4db34f3cf50a650cffeb0180d89e616.1680533760.git.mst@redhat.com>
On 4/3/2023 11:03 PM, Michael S. Tsirkin wrote:
> Adding relevant registers needs more work and it's not
> clear what the use-case will be as currently only
> the PCI transport is supported. But let's keep the
> door open on this.
> We already say it's reserved in a central place, but it
> does not hurt to remind implementers to mask it.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> content.tex | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/content.tex b/content.tex
> index 5057df2..f7446bf 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2364,6 +2364,18 @@ \subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over M
>
> Notification mechanisms did not change.
>
> +\subsection{Features reserved for future use}\label{sec:Virtio Transport Options / Virtio Over MMIO / Features reserved for future use}
> +
> +At this time, devices and drivers utilizing Virtio Over MMIO
> +do not support the following features:
> +\begin{itemize}
> +
> +\item VIRTIO_F_ADMIN_VQ
> +
> +\end{itemize}
> +
> +These features are reserved for future use.
> +
> \section{Virtio Over Channel I/O}\label{sec:Virtio Transport Options / Virtio Over Channel I/O}
>
> S/390 based virtual machines support neither PCI nor MMIO, so a
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* Re: [virtio-dev] [PATCH v11 07/10] ccw: document ADMIN_VQ as reserved
From: Zhu Lingshan @ 2023-04-06 7:18 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, Zhu Lingshan, pasic, Shahaf Shuler,
Parav Pandit, Max Gurtovoy
In-Reply-To: <71c1c5f86e19651585426160fc4cc15dc2da6ee6.1680533760.git.mst@redhat.com>
On 4/3/2023 11:03 PM, Michael S. Tsirkin wrote:
> Adding relevant registers needs more work and it's not
> clear what the use-case will be as currently only
> the PCI transport is supported. But let's keep the
> door open on this.
> We already say it's reserved in a central place, but it
> does not hurt to remind implementers to mask it.
>
> Note: there are more features to add to this list.
> Will be done later with a patch on top.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> content.tex | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/content.tex b/content.tex
> index f7446bf..1213d48 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2978,6 +2978,18 @@ \subsubsection{Resetting Devices}\label{sec:Virtio Transport Options / Virtio ov
> MAY also choose to verify reset completion by reading \field{device status} via
> CCW_CMD_READ_STATUS and checking whether it is 0 afterwards.
>
> +\subsection{Features reserved for future use}\label{sec:Virtio Transport Options / Virtio over channel I/O / Features reserved for future use}
> +
> +At this time, devices and drivers utilizing Virtio over channel I/O
> +do not support the following features:
> +\begin{itemize}
> +
> +\item VIRTIO_F_ADMIN_VQ
> +
> +\end{itemize}
> +
> +These features are reserved for future use.
> +
> \chapter{Device Types}\label{sec:Device Types}
>
> On top of the queues, config space and feature negotiation facilities
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* Re: [virtio-dev] [PATCH v11 08/10] admin: command list discovery
From: Zhu Lingshan @ 2023-04-06 7:20 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, Zhu Lingshan, pasic, Shahaf Shuler,
Parav Pandit, Max Gurtovoy
In-Reply-To: <84cf421ad95111b37cddb52b2a7bf625b7955ae6.1680533760.git.mst@redhat.com>
On 4/3/2023 11:03 PM, Michael S. Tsirkin wrote:
> Add commands to find out which commands does each group support,
> as well as enable their use by driver.
> This will be especially useful once we have multiple group types.
>
> An alternative is per-type VQs. This is possible but will
> require more per-transport work. Discovery through the vq
> helps keep things contained.
>
> e.g. lack of support for some command can switch to a legacy mode
>
> note that commands are expected to be avolved by adding new
> fields to command specific data at the tail, so
> we generally do not need feature bits for compatibility.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ---
> dropped S.O.B by Max
> explain in commit log how commands will evolve, comment by Stefan
> replace will use with is capable of use, comment by Stefan
> typo fix reported by David and Lingshan
> rename cmds to cmd_opcodes suggested by Jiri
> list group_type explicitly in command description suggested by Jiri
> clarify how can device not support some commands
> always say group administration commands, comment by Jiri
> ---
> admin.tex | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/admin.tex b/admin.tex
> index f201bcd..9552797 100644
> --- a/admin.tex
> +++ b/admin.tex
> @@ -113,7 +113,9 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
> \hline
> opcode & Name & Command Description \\
> \hline \hline
> -0x0000 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\
> +0x0000 & VIRTIO_ADMIN_CMD_LIST_QUERY & Provides to driver list of commands supported for this group type \\
> +0x0001 & VIRTIO_ADMIN_CMD_LIST_USE & Provides to device list of commands used for this group type \\
> +0x0002 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\
> \hline
> 0x8000 - 0xFFFF & - & Reserved for future commands (possibly using a different structure) \\
> \hline
> @@ -183,6 +185,104 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
> depends on these structures and is described separately or is
> implicit in the structure description.
>
> +Before sending any group administration commands to the device, the driver
> +needs to communicate to the device which commands it is going to
> +use. Initially (after reset), only two commands are assumed to be used:
> +VIRTIO_ADMIN_CMD_LIST_QUERY and VIRTIO_ADMIN_CMD_LIST_USE.
> +
> +Before sending any other commands for any member of a specific group to
> +the device, the driver queries the supported commands via
> +VIRTIO_ADMIN_CMD_LIST_QUERY and sends the commands it is
> +capable of using via VIRTIO_ADMIN_CMD_LIST_USE.
> +
> +Commands VIRTIO_ADMIN_CMD_LIST_QUERY and
> +VIRTIO_ADMIN_CMD_LIST_USE
> +both use the following structure describing the
> +command opcodes:
> +
> +\begin{lstlisting}
> +struct virtio_admin_cmd_list {
> + /* Indicates which of the below fields were returned
> + le64 device_admin_cmd_opcodes[];
> +};
> +\end{lstlisting}
> +
> +This structure is an array of 64 bit values in little-endian byte
> +order, in which a bit is set if the specific command opcode
> +is supported. Thus, \field{device_admin_cmd_opcodes[0]} refers to the
> +first 64-bit value in this array corresponding to opcodes 0 to
> +63, \field{device_admin_cmd_opcodes[1]} is the second 64-bit value
> +corresponding to opcodes 64 to 127, etc.
> +For example, the array of size 2 including
> +the values 0x3 in \field{device_admin_cmd_opcodes[0]}
> +and 0x1 in \field{device_admin_cmd_opcodes[1]} indicates that only
> +opcodes 0, 1 and 64 are supported.
> +The length of the array depends on the supported opcodes - it is
> +large enough to include bits set for all supported opcodes,
> +that is the length can be calculated by starting with the largest
> +supported opcode adding one, dividing by 64 and rounding up.
> +In other words, for
> +VIRTIO_ADMIN_CMD_LIST_QUERY and VIRTIO_ADMIN_CMD_LIST_USE the
> +length of \field{command_specific_result} and
> +\field{command_specific_data} respectively will be
> +$DIV_ROUND_UP(max_cmd, 64) * 8$ where DIV_ROUND_UP is integer division
> +with round up and \field{max_cmd} is the largest available command opcode.
> +
> +The array is also allowed to be larger and to additionally
> +include an arbitrary number of all-zero entries.
> +
> +Accordingly, bits 0 and 1 corresponding to opcode 0
> +(VIRTIO_ADMIN_CMD_LIST_QUERY) and 1
> +(VIRTIO_ADMIN_CMD_LIST_USE) are
> +always set in \field{device_admin_cmd_opcodes[0]} returned by VIRTIO_ADMIN_CMD_LIST_QUERY.
> +
> +For the command VIRTIO_ADMIN_CMD_LIST_QUERY, \field{opcode} is set to 0x0.
> +The \field{group_member_id} is unused. It is set to zero by driver.
> +This command has no command specific data.
> +The device, upon success, returns a result in
> +\field{command_specific_result} in the format
> +\field{struct virtio_admin_cmd_list} describing the
> +list of group administration commands supported for the group type
> +specified by \field{group_type}.
> +
> +For the command VIRTIO_ADMIN_CMD_LIST_USE, \field{opcode}
> +is set to 0x1.
> +The \field{group_member_id} is unused. It is set to zero by driver.
> +The \field{command_specific_data} is in the format
> +\field{struct virtio_admin_cmd_list} describing the
> +list of group administration commands used by the driver
> +with the group type specified by \field{group_type}.
> +
> +This command has no command specific result.
> +
> +The driver issues the command VIRTIO_ADMIN_CMD_LIST_QUERY to
> +query the list of commands valid for this group and before sending
> +any commands for any member of a group.
> +
> +The driver then enables use of some of the opcodes by sending to
> +the device the command VIRTIO_ADMIN_CMD_LIST_USE with a subset
> +of the list returned by VIRTIO_ADMIN_CMD_LIST_QUERY that is
> +both understood and used by the driver.
> +
> +If the device supports the command list used by the driver, the
> +device completes the command with status VIRTIO_ADMIN_STATUS_OK.
> +If the device does not support the command list
> +(for example, if the driver is not capable to use
> +some required commands), the device
> +completes the command with status
> +VIRTIO_ADMIN_STATUS_INVALID_FIELD.
> +
> +Note: drivers are assumed not to set bits in
> +device_admin_cmd_opcodes
> +if they are not familiar with how the command opcode
> +is used, since devices could have dependencies between
> +command opcodes.
> +
> +It is assumed that all members in a group support and are used
> +with the same list of commands. However, for owner devices
> +supporting multiple group types, the list of supported commands
> +might differ between different group types.
> +
> \section{Administration Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Administration Virtqueues}
>
> An administration virtqueue of an owner device is used to submit
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio] [PATCH v11 09/10] admin: conformance clauses
From: Zhu, Lingshan @ 2023-04-06 7:30 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, pasic, Shahaf Shuler, Parav Pandit,
Max Gurtovoy
In-Reply-To: <f254f6eb0b7ade1aa9892b09539711171a41a02b.1680533760.git.mst@redhat.com>
On 4/3/2023 11:03 PM, Michael S. Tsirkin wrote:
> Add conformance clauses for admin commands and admin virtqueues.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ---
> typo and wording fixes reported by David
> clarify cmd list use can be repeated but not in parallel with other
> commands, and returning same value across uses - comment by Lingshan
> add conformance clauses for new error codes
> ---
> admin.tex | 227 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 227 insertions(+)
>
> diff --git a/admin.tex b/admin.tex
> index 9552797..a398c51 100644
> --- a/admin.tex
> +++ b/admin.tex
> @@ -283,6 +283,150 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
> supporting multiple group types, the list of supported commands
> might differ between different group types.
>
> +\devicenormative{\paragraph}{Group administration commands}{Basic Facilities of a Virtio Device / Device groups / Group administration commands}
> +
> +The device MUST validate \field{opcode}, \field{group_type} and
> +\field{group_member_id}, and if any of these has an invalid or
> +unsupported value, set \field{status} to
> +VIRTIO_ADMIN_STATUS_EINVAL and set \field{status_qualifier}
> +accordingly:
> +\begin{itemize}
> +\item if \field{group_type} is invalid, \field{status_qualifier}
> + MUST be set to VIRTIO_ADMIN_STATUS_Q_INVALID_GROUP;
> +\item otherwise, if \field{opcode} is invalid,
> + \field{status_qualifier} MUST be set to
> + VIRTIO_ADMIN_STATUS_Q_INVALID_OPCODE;
> +\item otherwise, if \field{group_member_id} is used by the
> + specific command and is invalid, \field{status_qualifier} MUST be
> + set to VIRTIO_ADMIN_STATUS_Q_INVALID_MEMBER.
> +\end{itemize}
> +
> +If a command completes successfully, the device MUST set
> +\field{status} to VIRTIO_ADMIN_STATUS_OK.
> +
> +If a command fails, the device MUST set
> +\field{status} to a value different from VIRTIO_ADMIN_STATUS_OK.
> +
> +If \field{status} is set to VIRTIO_ADMIN_STATUS_EINVAL, the
> +device state MUST NOT change, that is the command MUST NOT have
> +any side effects on the device, in particular the device MUST NOT
> +enter an error state as a result of this command.
> +
> +If a command fails, the device state generally SHOULD NOT change,
> +as far as possible.
> +
> +The device MAY enforce additional restrictions and dependencies on
> +opcodes used by the driver and MAY fail the command
> +VIRTIO_ADMIN_CMD_LIST_USE with \field{status} set to VIRTIO_ADMIN_STATUS_EINVAL
> +and \field{status_qualifier} set to VIRTIO_ADMIN_STATUS_Q_INVALID_FIELD
> +if the list of commands used violate internal device dependencies.
> +
> +If the device supports multiple group types, commands for each group
> +type MUST operate independently of each other, in particular,
> +the device MAY return different results for VIRTIO_ADMIN_CMD_LIST_QUERY
> +for different group types.
> +
> +After reset, if the device supports a given group type
> +and before receiving VIRTIO_ADMIN_CMD_LIST_USE for this group type
> +the device MUST assume
> +that the list of legal commands used by the driver consists of
> +the two commands VIRTIO_ADMIN_CMD_LIST_QUERY and VIRTIO_ADMIN_CMD_LIST_USE.
> +
> +After completing VIRTIO_ADMIN_CMD_LIST_USE successfully,
> +the device MUST set the list of legal commands used by the driver
> +to the one supplied in \field{command_specific_data}.
> +
> +The device MUST validate commands against the list used by
> +the driver and MUST fail any commands not in the list with
> +\field{status} set to VIRTIO_ADMIN_STATUS_EINVAL
> +and \field{status_qualifier} set to
> +VIRTIO_ADMIN_STATUS_Q_INVALID_OPCODE.
> +
> +The list of supported commands reported by the device MUST NOT
> +shrink (but MAY expand): after reporting a given command as
> +supported through VIRTIO_ADMIN_CMD_LIST_QUERY the device MUST NOT
> +later report it as unsupported. Further, after a given set of
> +commands has been used (via a successful
> +VIRTIO_ADMIN_CMD_LIST_USE), then after a device or system reset
> +the device SHOULD complete successfully any following calls to
> +VIRTIO_ADMIN_CMD_LIST_USE with the same list of commands; if this
> +command VIRTIO_ADMIN_CMD_LIST_USE fails after a device or system
> +reset, the device MUST not fail it solely because of the command
> +list used. Failure to do so would interfere with resuming from
> +suspend and error recovery. Exceptions MAY apply if the system
> +configuration assures, in some way, that the driver does not
> +cache the previous value of VIRTIO_ADMIN_CMD_LIST_USE,
> +such as in the case of a firmware upgrade or downgrade.
> +
> +When processing a command with the SR-IOV group type,
> +if the device does not have an SR-IOV Extended Capability or
> +if \field{VF Enable} is clear
> +then the device MUST fail all commands with
> +\field{status} set to VIRTIO_ADMIN_STATUS_EINVAL and
> +\field{status_qualifier} set to
> +VIRTIO_ADMIN_STATUS_Q_INVALID_GROUP;
> +otherwise, if \field{group_member_id} is not
> +between $1$ and \field{NumVFs} inclusive,
> +the device MUST fail all commands with
> +\field{status} set to VIRTIO_ADMIN_STATUS_EINVAL and
> +\field{status_qualifier} set to
> +VIRTIO_ADMIN_STATUS_Q_INVALID_MEMBER;
> +\field{NumVFs}, \field{VF Migration Capable} and
> +\field{VF Enable} refer to registers within the SR-IOV Extended
> +Capability as specified by \hyperref[intro:PCIe]{[PCIe]}.
> +
> +\drivernormative{\paragraph}{Group administration commands}{Basic Facilities of a Virtio Device / Device groups / Group administration commands}
> +
> +The driver MAY discover whether device supports a specific group type
> +by issuing VIRTIO_ADMIN_CMD_LIST_QUERY with the matching
> +\field{group_type}.
> +
> +The driver MUST issue VIRTIO_ADMIN_CMD_LIST_USE
> +and wait for it to be completed with status
> +VIRTIO_ADMIN_STATUS_OK before issuing any commands
> +(except for the initial VIRTIO_ADMIN_CMD_LIST_QUERY
> +and VIRTIO_ADMIN_CMD_LIST_USE).
> +
> +The driver MAY issue VIRTIO_ADMIN_CMD_LIST_USE any number
> +of times but MUST NOT issue VIRTIO_ADMIN_CMD_LIST_USE commands
> +if any other command is currently being processed by the device.
Please correct me if I misunderstood this,
IMHO when the driver re-negotiates the features by issuing _USE command,
the driver can choose to shrink the feature bits and there
may be some already queued descriptors require the previous negotiated
feature bits. So I think the driver must make sure the admin queues are
empty(like all descriptors are marked as used) before it re-negotiating the
features.
Others look good to me.
> +
> +The driver SHOULD NOT set bits in device_admin_cmd_opcodes
> +if it is not familiar with how the command opcode
> +is used, as dependencies between command opcodes might exist.
> +
> +The driver MUST NOT request (via VIRTIO_ADMIN_CMD_LIST_USE)
> +the use of any commands not previously reported as
> +supported for the same group type by VIRTIO_ADMIN_CMD_LIST_QUERY.
> +
> +The driver MUST NOT use any commands for a given group type
> +before sending VIRTIO_ADMIN_CMD_LIST_USE with the correct
> +list of command opcodes and group type.
> +
> +The driver MAY block use of VIRTIO_ADMIN_CMD_LIST_QUERY and
> +VIRTIO_ADMIN_CMD_LIST_USE by issuing VIRTIO_ADMIN_CMD_LIST_USE
> +with respective bits cleared in \field{command_specific_data}.
> +
> +The driver MUST handle a command error with a reserved \field{status}
> +value in the same way as \field{status} set to VIRTIO_ADMIN_STATUS_EINVAL
> +(except possibly for different error reporting/diagnostic messages).
> +
> +The driver MUST handle a command error with a reserved
> +\field{status_qualifier} value in the same way as
> +\field{status_qualifier} set to
> +VIRTIO_ADMIN_STATUS_Q_INVALID_COMMAND (except possibly for
> +different error reporting/diagnostic messages).
> +
> +When sending commands with the SR-IOV group type,
> +the driver specify a value for \field{group_member_id}
> +between $1$ and \field{NumVFs} inclusive,
> +the driver MUST also make sure that as long as any such command
> +is outstanding, \field{VF Migration Capable} is clear and
> +\field{VF Enable} is set;
> +\field{NumVFs}, \field{VF Migration Capable} and
> +\field{VF Enable} refer to registers within the SR-IOV Extended
> +Capability as specified by \hyperref[intro:PCIe]{[PCIe]}.
> +
> \section{Administration Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Administration Virtqueues}
>
> An administration virtqueue of an owner device is used to submit
> @@ -357,3 +501,86 @@ \section{Administration Virtqueues}\label{sec:Basic Facilities of a Virtio Devic
> of the specification are designed, new fields can be added to the
> tail of a structure, with the driver/device using the full
> structure without concern for versioning.
> +
> +\devicenormative{\paragraph}{Group administration commands}{Basic Facilities of a Virtio Device / Administration virtqueues}
> +
> +The device MUST support device-readable and device-writeable buffers
> +shorter than described in this specification, by
> +\begin{enumerate}
> +\item acting as if any data that would be read outside the
> +device-readable buffers is set to zero, and
> +\item discarding data that would be written outside the
> +specified device-writeable buffers.
> +\end{enumerate}
> +
> +The device MUST support device-readable and device-writeable buffers
> +longer than described in this specification, by
> +\begin{enumerate}
> +\item ignoring any data in device-readable buffers outside
> +the expected length, and
> +\item only writing the expected structure to the device-writeable
> +buffers, ignoring any extra buffers, and reporting the
> +actual length of data written, in bytes,
> +as buffer used length.
> +\end{enumerate}
> +
> +The device SHOULD initialize the device-writeable buffer
> +up to the length of the structure described by this specification or
> +the length of the buffer supplied by the driver (even if the buffer is
> +all set to zero), whichever is shorter.
> +
> +The device MUST NOT fail a command solely because the buffers
> +provided are shorter or longer than described in this
> +specification.
> +
> +The device MUST initialize the device-writeable part of
> +\field{struct virtio_admin_cmd} that is a multiple of 64 bit in
> +size.
> +
> +The device MUST initialize \field{status} and
> +\field{status_qualifier} in \field{struct virtio_admin_cmd}.
> +
> +The device MUST process commands on a given administration virtqueue
> +in the order in which they are queued.
> +
> +If multiple administration virtqueues have been configured,
> +device MAY process commands on distinct virtqueues with
> +no order constraints.
> +
> +If the device sets \field{status} to either VIRTIO_ADMIN_STATUS_EAGAIN
> +or VIRTIO_ADMIN_STATUS_ENOMEM, then the command MUST NOT
> +have any side effects, making it safe to retry.
> +
> +\drivernormative{\paragraph}{Group administration commands}{Basic Facilities of a Virtio Device / Administration virtqueues}
> +
> +The driver MAY supply device-readable or device-writeable parts
> +of \field{struct virtio_admin_cmd} that are longer than described in
> +this specification.
> +
> +The driver SHOULD supply device-readable part of
> +\field{struct virtio_admin_cmd} that is at least as
> +large as the structure described by this specification
> +(even if the structure is all set to zero).
> +
> +The driver MUST supply both device-readable or device-writeable parts
> +of \field{struct virtio_admin_cmd} that are a multiple of 64 bit
> +in length.
> +
> +The device MUST supply both device-readable or device-writeable parts
> +of \field{struct virtio_admin_cmd} that are larger than zero in
> +length. However, \field{command_specific_data} and
> +\field{command_specific_result} MAY be zero in length, unless
> +specified otherwise for the command.
> +
> +The driver MUST NOT assume that the device will initialize the whole
> +device-writeable part of \field{struct virtio_admin_cmd} as described in the specification; instead,
> +the driver MUST act as if the structure
> +outside the part of the buffer used by the device
> +is set to zero.
> +
> +If multiple administration virtqueues have been configured,
> +the driver MUST ensure ordering for commands
> +placed on different administration virtqueues.
> +
> +The driver SHOULD retry a command that completed with
> +\field{status} VIRTIO_ADMIN_STATUS_EAGAIN.
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] [PATCH v11 10/10] ccw: document more reserved features
From: Zhu Lingshan @ 2023-04-06 7:31 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-comment, virtio-dev, jasowang, cohuck,
sgarzare, stefanha, nrupal.jani, Piotr.Uminski, hang.yuan
Cc: virtio, Jiri Pirko, Zhu Lingshan, pasic, Shahaf Shuler,
Parav Pandit, Max Gurtovoy
In-Reply-To: <b1445cf4f4702f4dbea74a3498e02a85b6cc702b.1680533760.git.mst@redhat.com>
On 4/3/2023 11:04 PM, Michael S. Tsirkin wrote:
> vq reset and shared memory are unsupported, too.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/160
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> content.tex | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/content.tex b/content.tex
> index 1213d48..fca3827 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2985,6 +2985,8 @@ \subsection{Features reserved for future use}\label{sec:Virtio Transport Options
> \begin{itemize}
>
> \item VIRTIO_F_ADMIN_VQ
> +\item VIRTIO_F_RING_RESET
> +\item Shared memory regions including VIRTIO_PMEM_F_SHMEM_REGION
>
> \end{itemize}
>
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* RE: [virtio-dev] [PATCH v11 08/10] admin: command list discovery
From: Parav Pandit @ 2023-04-06 14:05 UTC (permalink / raw)
To: Zhu Lingshan, Michael S. Tsirkin,
virtio-comment@lists.oasis-open.org,
virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
cohuck@redhat.com, sgarzare@redhat.com, stefanha@redhat.com,
nrupal.jani@intel.com, Piotr.Uminski@intel.com,
hang.yuan@intel.com
Cc: virtio@lists.oasis-open.org, Jiri Pirko, Zhu Lingshan,
pasic@linux.ibm.com, Shahaf Shuler, Max Gurtovoy
In-Reply-To: <0149d88a-196b-0bdc-32b4-1e01eb562310@linux.intel.com>
> From: Zhu Lingshan <lingshan.zhu@linux.intel.com>
> Sent: Thursday, April 6, 2023 3:20 AM
I have a dumb question.
Zhu responded to these patches.
Some patches have prefix appended as "virtio", some has "virtio-dev", and some have "virtio-comment".
Whichever tool is adding it prefix, why it cannot keep the simple consistency?
Do we need to ask OASIS about it?
^ permalink raw reply
* Re: [virtio-dev] [PATCH v11 08/10] admin: command list discovery
From: Zhu, Lingshan @ 2023-04-07 2:01 UTC (permalink / raw)
To: Parav Pandit, Zhu Lingshan, Michael S. Tsirkin,
virtio-comment@lists.oasis-open.org,
virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
cohuck@redhat.com, sgarzare@redhat.com, stefanha@redhat.com,
nrupal.jani@intel.com, Piotr.Uminski@intel.com,
hang.yuan@intel.com
Cc: virtio@lists.oasis-open.org, Jiri Pirko, pasic@linux.ibm.com,
Shahaf Shuler, Max Gurtovoy
In-Reply-To: <PH0PR12MB548174BC1333E7107B731108DC919@PH0PR12MB5481.namprd12.prod.outlook.com>
On 4/6/2023 10:05 PM, Parav Pandit wrote:
>
>> From: Zhu Lingshan <lingshan.zhu@linux.intel.com>
>> Sent: Thursday, April 6, 2023 3:20 AM
> I have a dumb question.
> Zhu responded to these patches.
> Some patches have prefix appended as "virtio", some has "virtio-dev", and some have "virtio-comment".
> Whichever tool is adding it prefix, why it cannot keep the simple consistency?
>
> Do we need to ask OASIS about it?
one of the reasons is that I subscribed virtio, virtio-comment and
virtio-dev, and these threads list are all in the to or cc recipients.
I agree OASIS mailing server sometimes behave oddly, it is reliable most
of the time, but I didn't receive any direct emails from you about
the virtio-net Sub-TC proposal, but I can receive emails from Stefan.
That's odd, I should see all your emails via virtio@list.oasis-open.org.
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* Re: [virtio-dev] Re: [virtio-comment] [PATCH v3 1/1] virtio-ism: introduce new device virtio-ism
From: Xuan Zhuo @ 2023-04-07 3:22 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Wen Gu, Halil Pasic, virtio-comment, hans, herongguang, zmlcc,
dust.li, tonylu, zhenzao, helinguo, gerry, cohuck, jasowang,
Jan Kiszka, kgraul, wenjia, jaka, hca, twinkler, raspl,
virtio-dev, Alexandra Winter
In-Reply-To: <20230405084632-mutt-send-email-mst@kernel.org>
On Wed, 5 Apr 2023 08:52:14 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Apr 05, 2023 at 02:39:53PM +0200, Alexandra Winter wrote:
> >
> >
> > On 24.03.23 05:03, Wen Gu wrote:
> > >
> > >
> > > On 2023/3/23 22:46, Halil Pasic wrote:
> > >
> > >> On Thu, 9 Feb 2023 11:30:56 +0800
> > >> Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >>
> >
> > ...
> >
> > >
> > >> To get back to the things proposed here: the cdid is IMHO
> > >> a nice thing, and is functionally corresponding to the
> > >> (S)EID. But it is 16 byte wide, and I have no idea how
> > >> is it supposed to be used in the CLC handshake.
> > >>
> > >
> > > CLC handshake carry one SEID for all the SMC-D device. Considering
> > > coexistence with ISM, I am not sure whether we can change or increase
> > > the SEID.. cc Alexandra
> > >
> > > Thanks!
> > > Wen Gu
> >
> > As mentioned by others, discussions are ongoing.
> > It would be great, if we can agree on a way to use the existing CLC handshake
> > for SMC-D via virtio-ism and ism-loopback.
> > In that case SEID needs to be unique per hardware instance, cannot be increased and
> > can only be changed for x86 in a non-colliding way.
> >
> > An alternative would be to define new a SMC-D(?) protocol variant/version, where we
> > are free to define new fields (e.g. UUIDs).
> >
> > Alexandra
>
> Problem with tying to hardware is that it is blocking
> migration (which is a challenge with ism anyway, but still).
We don't want to support migration. At least we don't want to support it for the
time being. Because there are indeed many problems. I think Migration is not
necessary for a new Virtio device.
Thanks.
>
>
> > >
> > >> If this is really supposed to work with SMC and not just take
> > >> inspiration from it, I would like some insight from our
> > >> SMC experts (they are already on copy).
> > >>
> > >> Regards,
> > >> Halil
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > >> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* Re: [virtio-dev] [PATCH v11 08/10] admin: command list discovery
From: Michael S. Tsirkin @ 2023-04-07 7:34 UTC (permalink / raw)
To: Parav Pandit
Cc: Zhu Lingshan, virtio-comment@lists.oasis-open.org,
virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
cohuck@redhat.com, sgarzare@redhat.com, stefanha@redhat.com,
nrupal.jani@intel.com, Piotr.Uminski@intel.com,
hang.yuan@intel.com, virtio@lists.oasis-open.org, Jiri Pirko,
Zhu Lingshan, pasic@linux.ibm.com, Shahaf Shuler, Max Gurtovoy
In-Reply-To: <PH0PR12MB548174BC1333E7107B731108DC919@PH0PR12MB5481.namprd12.prod.outlook.com>
On Thu, Apr 06, 2023 at 02:05:07PM +0000, Parav Pandit wrote:
>
>
> > From: Zhu Lingshan <lingshan.zhu@linux.intel.com>
> > Sent: Thursday, April 6, 2023 3:20 AM
>
> I have a dumb question.
> Zhu responded to these patches.
> Some patches have prefix appended as "virtio", some has "virtio-dev", and some have "virtio-comment".
> Whichever tool is adding it prefix, why it cannot keep the simple consistency?
>
> Do we need to ask OASIS about it?
>
Guess: are you using gmail? Each list adds its own prefix. Which is
admittedly a bad idea but there you are, OASIS outsources this and can't
change it. Adding a footer is an even worse idea but they won't change
this either.
So if you are subscribed to multiple lists you are getting
multiple copies, which is handy if each list goes to its
own folder. However, gmail is unusual in that it deduplicates
email discarding all copies but one. There are work-arounds
with filters but the end result in any case will be that
you will get one of the copies, selected at random.
As they have different prefixes and footers added, there you go,
a mix of different prefixes in a single thread.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [PATCH 08/11] transport-pci: Introduce virtio extended capability
From: Michael S. Tsirkin @ 2023-04-07 8:15 UTC (permalink / raw)
To: Parav Pandit
Cc: virtio-dev@lists.oasis-open.org, cohuck@redhat.com,
virtio-comment@lists.oasis-open.org, Shahaf Shuler,
Satananda Burla
In-Reply-To: <PH0PR12MB5481B5F4F8797F0F795E7794DC909@PH0PR12MB5481.namprd12.prod.outlook.com>
On Wed, Apr 05, 2023 at 01:16:31PM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, April 5, 2023 1:11 AM
>
> > > > > +struct virtio_pcie_ext_cap {
> > > > > + struct pcie_ext_cap pcie_ecap;
> > > > > + u8 cfg_type; /* Identifies the structure. */
> > > > > + u8 bar; /* Index of the BAR where its located */
> > > > > + u8 id; /* Multiple capabilities of the same type */
> > > > > + u8 zero_padding[1];
> > > > > + le64 offset; /* Offset with the bar */
> > > > > + le64 length; /* Length of the structure, in bytes. */
> > > > > + u8 data[]; /* Optional variable length data */
> > > >
> > > > Maybe le64 data[], for alignment?
> > > >
> > > It gets harder to decode (typecasting ..) if its string with le64 data type.
> >
> > In what language? In C you have to cast anyway, string is char *, often signed,
> > not u8.
> >
> > > I will extend the comment,
> > >
> > > + u8 data[]; /* Optional variable length data, must be aligned
> > > + to 8 bytes */
> >
> > I'd keep it le64 or u64, it is highly unlikely we'll pass strings through this
> > interface anyway.
>
> Ok. will change.
>
> What about rest of the patches? If we proceed using MMR interface, rest of the patches are fine?
Biggest problem is 7/11 - the new IDs, breaking all existing drivers.
I thought I replied on that but don't see it on that specific patch.
Let me repost my thoughts now I had time to think it over.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [PATCH 07/11] transport-pci: Introduce transitional MMR device id
From: Michael S. Tsirkin @ 2023-04-07 8:37 UTC (permalink / raw)
To: Parav Pandit; +Cc: virtio-dev, cohuck, virtio-comment, shahafs, Satananda Burla
In-Reply-To: <20230330225834.506969-8-parav@nvidia.com>
On Fri, Mar 31, 2023 at 01:58:30AM +0300, Parav Pandit wrote:
> Transitional MMR device PCI Device IDs are unique. Hence,
> any of the existing drivers do not bind to it.
> This further maintains the backward compatibility with
> existing drivers.
>
> Co-developed-by: Satananda Burla <sburla@marvell.com>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
This is IMO just too big a change: fundamentally this is a completely
new transport since no existing drivers can use this device at all. Not
a thing we should do lightly. And we have options on the table
such as AQ, which address most of the issue. Yes they do not
work for passthrough of a PF but that seems like a minor issue.
If we want to save the MMR idea, let's find a way to cut
down this patchset's size significantly.
On a more positive side, I do not really see a reason to
have a new ID at all. Device can have a normal device ID
and additionally the new capability.
> ---
> transport-pci.tex | 45 +++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 41 insertions(+), 4 deletions(-)
>
> diff --git a/transport-pci.tex b/transport-pci.tex
> index ee11ba5..665448e 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -19,12 +19,14 @@ \section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over P
> \subsection{PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
>
> Any PCI device with PCI Vendor ID 0x1af4, and PCI Device ID 0x1000 through
> -0x107f inclusive is a virtio device. The actual value within this range
> -indicates which virtio device is supported by the device.
> +0x107f inclusive and DeviceID 0x10f9 through 0x10ff is a virtio device.
> +The actual value within this range indicates which virtio device
> +type it is.
> The PCI Device ID is calculated by adding 0x1040 to the Virtio Device ID,
> as indicated in section \ref{sec:Device Types}.
> -Additionally, devices MAY utilize a Transitional PCI Device ID range,
> -0x1000 to 0x103f depending on the device type.
> +Additionally, devices MAY utilize a Transitional PCI Device ID range
> +0x1000 to 0x103f inclusive or a Transitional MMR PCI Device ID range
> +0x10f9 to 0x10ff inclusive, depending on the device type.
>
> \devicenormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
>
> @@ -95,6 +97,41 @@ \subsubsection{Legacy Interfaces: A Note on PCI Device Discovery}\label{sec:Virt
>
> This is to match legacy drivers.
>
> +\subsubsection{Transitional MMR Interface: A Note on PCI Device
> +Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI
> +Bus / PCI Device Discovery / Transitional MMR Interface: A Note on PCI Device Discovery}
> +
> +The transitional MMR device has one of the following PCI Device ID
> +depending on the device type:
> +
> +\begin{tabular}{|l|c|}
> +\hline
> +Transitional PCI Device ID & Virtio Device \\
> +\hline \hline
> +0x10f9 & network device \\
> +\hline
> +0x10fa & block device \\
> +\hline
> +0x10fb & memory ballooning (traditional) \\
> +\hline
> +0x10fc & console \\
> +\hline
> +0x10fd & SCSI host \\
> +\hline
> +0x10fe & entropy source \\
> +\hline
> +0x10ff & 9P transport \\
> +\hline
> +\end{tabular}
> +
> +The PCI Subsystem Vendor ID and the PCI Subsystem Device ID MAY
> +reflect the PCI Vendor and Device ID of the environment.
> +
> +The transitional MMR driver MUST match any PCI Revision ID value.
> +
> +The transitional MMR driver MAY match any PCI Subsystem Vendor ID and
> +any PCI Subsystem Device ID value.
> +
> \subsection{PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
>
> The device is configured via I/O and/or memory regions (though see
> --
> 2.26.2
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [PATCH 09/11] transport-pci: Describe PCI MMR dev config registers
From: Michael S. Tsirkin @ 2023-04-07 8:55 UTC (permalink / raw)
To: Parav Pandit; +Cc: virtio-dev, cohuck, virtio-comment, shahafs, Satananda Burla
In-Reply-To: <20230330225834.506969-10-parav@nvidia.com>
On Fri, Mar 31, 2023 at 01:58:32AM +0300, Parav Pandit wrote:
> Legacy virtio configuration registers and adjacent
> device configuration registers are located somewhere
> in a memory BAR.
>
> A new capability supplies the location of these registers
> which a driver can use to map I/O access to legacy
> memory mapped registers.
>
> This gives the ability to locate legacy registers in either
> the existing memory BAR or as completely new BAR at BAR 0.
>
> A below example diagram attempts to depicts it in an existing
> memory BAR.
>
> +------------------------------+
> |Transitional |
> |MMR SRIOV VF |
> | |
> ++---------------+ |
> ||dev_id = | |
> ||{0x10f9-0x10ff}| |
> |+---------------+ |
> | |
> ++--------------------+ |
> || PCIe ext cap = 0xB | |
> || cfg_type = 10 | |
> || offset = 0x1000 | |
> || bar = A {0..5}| |
> |+--|-----------------+ |
> | | |
> | | |
> | | +-------------------+ |
> | | | Memory BAR = A | |
> | | | | |
> | +------>+--------------+ | |
> | | |legacy virtio | | |
> | | |+ dev cfg | | |
> | | |registers | | |
> | | +--------------+ | |
> | +-----------------+ | |
> +------------------------------+
>
> Co-developed-by: Satananda Burla <sburla@marvell.com>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
I am split about using the extended capability for this, since in
practice this makes for more code in hypervisors. How about just using
an existing capability as opposed to the extended capability? Less work
for existing hypervisors, no? And let's begin to use the extended
capability for something more important than legacy access.
> ---
> transport-pci.tex | 33 +++++++++++++++++++++++++++++++--
> 1 file changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/transport-pci.tex b/transport-pci.tex
> index aeda4a1..55a6aa0 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -168,6 +168,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> \item ISR Status
> \item Device-specific configuration (optional)
> \item PCI configuration access
> +\item Legacy memory mapped configuration registers (optional)
> \end{itemize}
>
> Each structure can be mapped by a Base Address register (BAR) belonging to
> @@ -228,6 +229,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
> /* Vendor-specific data */
> #define VIRTIO_PCI_CAP_VENDOR_CFG 9
> +/* Legacy configuration registers capability */
> +#define VIRTIO_PCI_CAP_LEGACY_MMR_CFG 10
> \end{lstlisting}
>
> Any other value is reserved for future use.
> @@ -682,6 +685,18 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> Configuration Space / Legacy Interface: Device Configuration
> Space}~\nameref{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: Device Configuration Space} for workarounds.
>
> +\paragraph{Transitional MMR Interface: A Note on Configuration Registers}
> +\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities / Common configuration structure layout / Transitional MMR Interface: A Note on Configuration Registers}
> +
> +The transitional MMR device MUST present legacy virtio registers
> +consisting of legacy common configuration registers followed by
> +legacy device specific configuration registers described in section
> +\ref{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities / Common configuration structure layout / Legacy Interfaces: A Note on Configuration Registers}
> +in a memory region PCI BAR.
> +
> +The transitional MMR device MUST provide the location of the
> +legacy virtio configuration registers using a legacy memory mapped
> +registers capability described in section \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities / Transitional MMR Interface: Legacy Memory Mapped Configuration Registers Capability}.
>
> \subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
>
> @@ -956,9 +971,23 @@ \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport O
> specified by some other Virtio Structure PCI Capability
> of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
>
> +\subsubsection{Transitional MMR Interface: Legacy Memory Mapped Configuration Registers Capability}
> +\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities / Transitional MMR Interface: Legacy Memory Mapped Configuration Registers Capability}
> +
> +The optional VIRTIO_PCI_CAP_LEGACY_MMR_CFG capability defines
> +the location of the legacy virtio configuration registers
> +followed by legacy device specific configuration registers in
> +the memory region BAR for the transitional MMR device.
> +
> +The \field{cap.offset} MUST be 4-byte aligned.
> +The \field{cap.offset} SHOULD be 4KBytes aligned and
what's the point of this?
> +\field{cap.length} SHOULD be 4KBytes.
Why is length 4KBytes? Why not the actual length?
> +
> +The transitional MMR device MUST present a legacy configuration
> +memory mapped registers capability using \field{virtio_pcie_ext_cap}.
> +
> \subsubsection{Legacy Interface: A Note on Feature Bits}
> -\label{sec:Virtio Transport Options / Virtio Over PCI Bus /
> -Virtio Structure PCI Capabilities / Legacy Interface: A Note on Feature Bits}
> +\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities / Legacy Interface: A Note on Feature Bits}
>
So it is not by chance that we abadoned the legacy interface,
it had some fundamental issues.
Let me list some off the top of my mind:
- no atomicity across accesses, so if a register changes
while it is read driver gets trash. solved using generation id
- no defined endian-ness. solved using le
- no way to reject driver configuration
we just did a new thing instead, but I feel if you are reviving the
legacy interface yet again, it is worth thinking about solving the worst
of the warts. For example, I can see an endian-ness register that
hypervisor can either read to check device is compatible with guest, or
maybe even write to force device to specific endian-ness.
A generation counter that hypervisor can check to
verify value is consistent? Can work if hypervisor
caches configuration.
A bad state flag that device can set to make hypervisor
stop guest? Better than corrupting it ...
> Only Feature Bits 0 to 31 are accessible through the
> Legacy Interface. When used through the Legacy Interface,
> --
> 2.26.2
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [PATCH 06/11] introduction: Introduce transitional MMR interface
From: Michael S. Tsirkin @ 2023-04-07 9:17 UTC (permalink / raw)
To: Parav Pandit; +Cc: virtio-dev, cohuck, virtio-comment, shahafs, Satananda Burla
In-Reply-To: <20230330225834.506969-7-parav@nvidia.com>
On Fri, Mar 31, 2023 at 01:58:29AM +0300, Parav Pandit wrote:
> Introduce terminology for the transitional MMR device and transitional
> MMR driver.
>
> Add description of the transitional MMR device. It is a PCI
> device that implements legacy virtio common configuration registers
> followed by legacy device specific registers in a memory region at
> an offset.
>
> This enables hypervisor such as vfio driver to emulate
> I/O region towards the guest at BAR0. By doing so VFIO driver can
> translate read/write accesses on I/O region from the guest
> to the device memory region.
>
> High level comparison of 1.x, transitional & transitional MMR
> sriov vf device:
>
> +------------------+ +--------------------+ +--------------------+
> |virtio 1.x | |Transitional | |Transitional |
> |SRIOV VF | |SRIOV VF | |MMR SRIOV VF |
> | | | | | |
> ++---------------+ | ++---------------+ | ++---------------+ |
> ||dev_id = | | ||dev_id = | | ||dev_id = | |
> ||{0x1040-0x106C}| | ||{0x1000-0x103f}| | ||{0x10f9-0x10ff}| |
> |+---------------+ | |+---------------+ | |+---------------+ |
> | | | | | |
> |+------------+ | |+------------+ | |+-----------------+ |
> ||Memory BAR | | ||Memory BAR | | ||Memory BAR | |
> |+------------+ | |+------------+ | || | |
> | | | | || +--------------+| |
> | | |+-----------------+ | || |legacy virtio || |
> | | ||IOBAR impossible | | || |+ dev cfg || |
> | | |+-----------------+ | || |registers || |
> | | | | || +--------------+| |
> | | | | |+-----------------+ |
> +------------------+ +--------------------+ +--------------------+
>
> Motivation and background:
> PCIe and system limitations:
> 1. PCIe VFs do not support IOBAR cited at [1].
>
> Perhaps the PCIe spec could be extended, however it would be only
> useful for virtio transitional devices. Even if such an extension
> is present, there are other system limitations described below in (2)
> and (3).
>
> 2. cpu io port space limit and fragmentation
> x86_64 is limited to only 64K worth of IO port space at [2],
> which is shared with many other onboard system peripherals which
> are behind PCIe bridge; such I/O region also needs to be aligned
> to 4KB at PCIe bridge level cited at [3]. This can lead to a I/O space
> fragmentation. Due to this fragmentation and alignment need,
> actual usable range is small.
>
> 3. IO space access of PCI device is done through non-posted message
> which requires higher completion time in the PCIe fabric for
> round trip travel.
>
> [1] PCIe spec citation:
> VFs do not support I/O Space and thus VF BARs shall not indicate I/O Space.
>
> [2] cpu arch citiation:
> Intel 64 and IA-32 Architectures Software Developer’s Manual
> The processor’s I/O address space is separate and distinct from
> the physical-memory address space. The I/O address space consists
> of 64K individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
>
> [3] PCIe spec citation:
> If a bridge implements an I/O address range,...I/O address range
> will be aligned to a 4 KB boundary.
>
> Co-developed-by: Satananda Burla <sburla@marvell.com>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
> introduction.tex | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/introduction.tex b/introduction.tex
> index e8b34e3..9a0f96a 100644
> --- a/introduction.tex
> +++ b/introduction.tex
> @@ -161,6 +161,20 @@ \subsection{Legacy Interface: Terminology}\label{intro:Legacy
> have a need for backwards compatibility!
> \end{note}
>
> +\begin{description}
> +\item[Transitional MMR Device]
> + is a PCI device which exposes legacy virtio configuration
> + registers followed by legacy device configuration registers as
> + memory mapped registers (MMR) at an offset in a memory region
> + BAR, has no I/O region BAR,
in fact, most of existing pci interface can be in either IO or memory bar,
I don't think we specify that it's in a memory bar in the spec.
For example IIRC qemu has a flag that uses IO for signalling
kicks for modern VQs, this is slightly faster on some architectures.
> having its own PCI Device ID range,
> + and follows the rest of the functionalities
this setence isn't grammatical.
not sure what exactly you are trying to say here, can not
help you rewrite.
> of the transitional device.
a transitional device probably.
> +\end{description}
> +
> +\begin{description}
> +\item[Transitional MMR Driver]
> + is a PCI device driver that supports the Transitional MMR device.
> +\end{description}
> +
> Devices or drivers with no legacy compatibility are referred to as
> non-transitional devices and drivers, respectively.
>
> @@ -174,6 +188,22 @@ \subsection{Transition from earlier specification drafts}\label{sec:Transition f
> sections tagged "Legacy Interface" in the section title.
> These highlight the changes made since the earlier drafts.
>
> +\subsection{Transitional MMR interface: specification drafts}\label{sec:Transitional MMR interface: specification drafts}
> +
> +The transitional MMR device and driver differs from the
> +transitional device and driver respectively in few areas. Such
a few
> +differences are contained in sections named
> +'Transitional MMR interface', like this one. When no differences
> +are mentioned explicitly, the transitional MMR device and driver
> +follow exactly the same functionalities as that of the
> +transitional device and driver respectively.
Ugh, we called these transitional because they were there for
the transition period :)
The thing that I feel you miss here is that
transitional driver using transitional device MUST NOT
use the legacy interface.
The new thing with this MMR is that it's a memory mapped
access to legacy registers.
Going back to my idea of just adding legacy MMR capability to existing
modern and transitional devices, as opposed to a completely new type of
device, we would basically have a modern driver access the new
capability, and forward accesses from a legacy driver to there.
So "memory mapped legacy interface" would be a better name I think.
> +
> +\begin{note}
> +Transitional MMR interface is only required to support backward
> +compatibility. It should not be implemented unless there is a need
> +for the backward compatibility.
> +\end{note}
> +
> \section{Structure Specifications}\label{sec:Structure Specifications}
>
> Many device and driver in-memory structure layouts are documented using
> --
> 2.26.2
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device
From: Michael S. Tsirkin @ 2023-04-07 9:35 UTC (permalink / raw)
To: Parav Pandit
Cc: virtio-dev@lists.oasis-open.org, cohuck@redhat.com,
virtio-comment@lists.oasis-open.org, Shahaf Shuler
In-Reply-To: <25d15176-042a-f579-0b59-d08f7eb7eafb@nvidia.com>
On Mon, Apr 03, 2023 at 06:00:13PM -0400, Parav Pandit wrote:
>
>
> On 4/3/2023 5:04 PM, Michael S. Tsirkin wrote:
> > On Mon, Apr 03, 2023 at 08:25:02PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Monday, April 3, 2023 2:02 PM
> > >
> > > > > Because vqs involve DMA operations.
> > > > > It is left to the device implementation to do it, but a generic wisdom
> > > > > is not implement such slow work in the data path engines.
> > > > > So such register access vqs can/may be through firmware.
> > > > > Hence it can involve a lot higher latency.
> > > >
> > > > Then that wisdom is wrong? tens of microseconds is not workable even for
> > > > ethtool operations, you are killing boot time.
> > > >
> > > Huh.
> > > What ethtool latencies have you experienced? Number?
> >
> > I know an order of tens of eth calls happens during boot.
> > If as you said each takes tens of ms then we are talking close to a second.
> > That is measureable.
> I said it can take, doesn't have to be always same for all the commands.
> Better to work with real numbers. :)
>
> Let me take an example to walk through.
>
> If a cvq or aq command takes 0.5msec, total of 100 such commands will take
> 50msec.
>
> Once a while if two of commands say take 5msec, will result in 50 -> 60
> msec.
Not too bad. then it seems it should not be a problem to tunnel config
over AQ then?
>
> > OK then. Then if it is a dead end then it looks weird to add a whole new
> > config space as memory mapped.
> >
> I am aligned with you to not add any new register as memory mapped for 1.x.
> Or access through device own's tvq is fine if such q can be initialized
> before during device reset (init) phase.
>
> I explained that legacy registers are sub-set of existing 1.x.
> They should not consume extra memory.
>
> Lets walk through the merits and negatives of both to conclude.
>
> > > > Let me try again.
>
> > If hardware vendors do not want to bear the costs of registers then they
> > will not implement devices with registers, and then the whole thing will
> > become yet another legacy thing we need to support. If legacy emulation
> > without IO is useful, then can we not find a way to do it that will
> > survive the test of time?
> legacy_register_transport_vq for VF can be a option, but not for PF
> emulation.
OK. Do we really care? Are you guys selling lots of high end cards
without SRIOV that it matters?
> More below.
>
> >
> > > Again, I want to emphasize that register read/write over tvq has merits with trade-off.
> > > And so the mmr has merits with trade-off too.
> > >
> > > Better to list them and proceed forward.
> > >
> > > Method-1: VF's register read/write via PF based transport VQ
> > > Pros:
> > > a. Light weight registers implementation in device for new memory region window
> >
> > Is that all? I mentioned more.
> >
> b. device reset is more optimal with transport VQ
> c. a hypervisor may want to check (but not necessary) register content
> d. Some unknown guest VM driver which modifies mac address and still expect
> atomicity can benefit if hypervisor wants to do extra checks
It's not hard to be more specific.
Old Linux kernels are like this, this was fixed with:
commit 7e58d5aea8abb993983a3f3088fd4a3f06180a1c
Author: Amos Kong <akong@redhat.com>
Date: Mon Jan 21 01:17:23 2013 +0000
Currently we write MAC address to pci config space byte by byte,
this means that we have an intermediate step where mac is wrong.
This patch introduced a new control command to set MAC address,
it's atomic.
about 10 years ago.
> > > Cons:
> > > a. Higher DMA read/write latency
> > > b. Device requires synchronization between non legacy memory mapped registers and legacy regs access via tvq
> >
> > Same as a separate mmemory bar really. Just don't do it. Either access
> > legacy or non legacy.
> >
> It is really not same to treat them equally as tvq encapsulation is
> different, and hw wouldn't prefer to treat them equally like regular memory
> writes.
I think yoiu missunderstand what I said. You listed a problem:
the same device can be accessed through both
a modern and a legacy interface.
I said that it is not a problem at all, there is no reason
to use both.
> Transitional device exposed by hypervisor contains both legacy I/O bar and
> also the memory mapped registers. So a guest vm can access both.
But it must not, and some devices break if you do.
> > > c. Can only work with the VF. Cannot work for thin hypervisor, which can map transitional PF to bare metal OS
> > > (also listed in cover letter)
> >
> > Is that a significant limitation? Why?
> It is a functional limitation for the PF, as PF has no parent.
> and PF can also utilize memory BAR.
Yes it's a limitation, I just don't see why we care.
> >
> > > Method-2: VF's register read/write via MMR (current proposal)
> > > Pros:
> > > a. Device utilizes the same legacy and non-legacy registers.
> >
> > > b. an order of magnitude lower latency due to avoidance of DMA on register accesses
> > > (Important but not critical)
> >
> > And no cons? Even if you could not see them yourself did I fail to express myself to such
> > an extent?
> >
> Method-1 pros covered the advantage of it over method-2, but yes worth to
> list here for completeness.
>
> Cons:
> requires creating new memory region window in the device for configuration
> access
Parav please take a look at the discussion so far as collect more cons
that were mentioned for the proposal, I definitely listed some and I
don't really want to repeat myself. I expect a proposal to be balanced,
not a sales pitch.
> > > > > No. Interrupt latency is in usec range.
> > > > > The major latency contributors in msec range can arise from the device side.
> > > >
> > > > So you are saying there are devices out there already with this MMR hack
> > > > baked in, and in hardware not firmware, so it works reasonably?
> > > It is better to not assert a solution a "hack",
> >
> > Sorry if that sounded offensive. a hack is not necessary a bad thing.
> > It's a quick solution to a very local problem, though.
> >
> It is a solution because device can do at near to zero extra memory for
> existing registers.
> Anyways, we have better technical details to resolve. :)
> Lets focus on it.
>
> > Yes motivation is one of the things I'm trying to work out here.
> > It does however not help that it's an 11 patch strong patchset
> > adding 500 lines of text for what is supposedly a small change.
> >
> Many of the patches are rework and incorrect to attribute to the specific
> feature.
>
> Like others it could have been one giant patch... but we see value in
> smaller patches..
>
> Using tvq is even bigger change than this.
The main thing is that there's no new ID so the PF device itself will
stay usable with existing drivers.
> So we shouldn't be afraid of
> making transitional device actually work using it with larger spec patch.
>
> > > Regarding tvq, I have some idea on how to improve the register read/writes so that its optimal for devices to implement.
> >
> > Sounds useful, and maybe if tvq addresses legacy need then focus on
> > that?
> >
>
> tvq specific for legacy register access make sense.
> Some generic tvq is abstract and dont see any relation here.
>
> So better to name it as legacy_reg_transport_vq (lrt_vq).
Again this assumes tvq will be rewritten on top of AQ.
I guess legacy can then become a new type of AQ command?
And maybe you want a memory mapped register for AQ commands? I know
Jason really wanted that.
> How about having below format?
>
> /* Format of 16B descriptors for lrt_vq
> * lrt_vq = legacy register tranport vq.
> */
> struct legacy_reg_req_vf {
> union {
> struct {
> le32 reg_wr_data;
> le32 reserved;
> } write;
> struct {
> le64 reg_read_addr;
> };
> };
> le8 rd_wr : 1; /* rd=0, wr=1 */
> le8 reg_byte_offset : 7;
> le8 req_tag; /* unique request tag on this vq */
> le16 vf_num;
>
> le16 flags; /* new flag below */
> le16 next;
> };
>
> #define VIRTQ_DESC_F_Q_DEFINED 8
> /* Content of the VQ descriptor other than flags field is VQ
> * specific and defined by the VQ type.
> */
Any way to allow accesses of arbitrary length?
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v3 1/1] virtio-ism: introduce new device virtio-ism
From: Michael S. Tsirkin @ 2023-04-07 11:13 UTC (permalink / raw)
To: Xuan Zhuo
Cc: Wen Gu, Halil Pasic, virtio-comment, hans, herongguang, zmlcc,
dust.li, tonylu, zhenzao, helinguo, gerry, cohuck, jasowang,
Jan Kiszka, kgraul, wenjia, jaka, hca, twinkler, raspl,
virtio-dev, Alexandra Winter
In-Reply-To: <1680837735.392044-2-xuanzhuo@linux.alibaba.com>
On Fri, Apr 07, 2023 at 11:22:15AM +0800, Xuan Zhuo wrote:
> On Wed, 5 Apr 2023 08:52:14 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Apr 05, 2023 at 02:39:53PM +0200, Alexandra Winter wrote:
> > >
> > >
> > > On 24.03.23 05:03, Wen Gu wrote:
> > > >
> > > >
> > > > On 2023/3/23 22:46, Halil Pasic wrote:
> > > >
> > > >> On Thu, 9 Feb 2023 11:30:56 +0800
> > > >> Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >>
> > >
> > > ...
> > >
> > > >
> > > >> To get back to the things proposed here: the cdid is IMHO
> > > >> a nice thing, and is functionally corresponding to the
> > > >> (S)EID. But it is 16 byte wide, and I have no idea how
> > > >> is it supposed to be used in the CLC handshake.
> > > >>
> > > >
> > > > CLC handshake carry one SEID for all the SMC-D device. Considering
> > > > coexistence with ISM, I am not sure whether we can change or increase
> > > > the SEID.. cc Alexandra
> > > >
> > > > Thanks!
> > > > Wen Gu
> > >
> > > As mentioned by others, discussions are ongoing.
> > > It would be great, if we can agree on a way to use the existing CLC handshake
> > > for SMC-D via virtio-ism and ism-loopback.
> > > In that case SEID needs to be unique per hardware instance, cannot be increased and
> > > can only be changed for x86 in a non-colliding way.
> > >
> > > An alternative would be to define new a SMC-D(?) protocol variant/version, where we
> > > are free to define new fields (e.g. UUIDs).
> > >
> > > Alexandra
> >
> > Problem with tying to hardware is that it is blocking
> > migration (which is a challenge with ism anyway, but still).
>
>
> We don't want to support migration. At least we don't want to support it for the
> time being. Because there are indeed many problems. I think Migration is not
> necessary for a new Virtio device.
>
> Thanks.
The specific implementation does not matter much. At the spec level we
strive to make interfaces generic so they can be reused down the road,
rather than having to invent new ones for each use-case.
Maybe we can come up with a way that let devices choose either
an existing one with a SEID or a new one with a UUID?
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] [RFC] virtio 1.3 schedule
From: Michael S. Tsirkin @ 2023-04-07 11:24 UTC (permalink / raw)
To: Cornelia Huck; +Cc: virtio, virtio-comment, virtio-dev
In-Reply-To: <87edoyqvnu.fsf@redhat.com>
On Wed, Apr 05, 2023 at 03:57:09PM +0200, Cornelia Huck wrote:
> The virtio 1.2 spec has been released on July 1st, 2022, so I think we
> should come up with a plan for the 1.3 release. Given that it took us
> several months last time, we need to think about declaring freeze during
> (northern hemisphere) summer if we want to have a release ready before
> the end-of-year holiday season starts.
>
> We have some things currently under discussion/review that we should
> still consider for 1.3:
>
> - vq index etc. naming cleanup
> - admin vq
> - virtio-video
> - some things we deferred for 1.2 (see
> https://github.com/oasis-tcs/virtio-spec/issues?q=is%3Aissue+is%3Aopen+label%3Adeferred-in-v1.2-csprd01)
> - other things that are not on my personal list (please mention whatever
> you think is important)
>
> Please also keep in mind that 1.3 will not be the last virtio release,
> and many projects like Linux or QEMU consider features stable and ready
> for inclusion once the spec has hit the git tree, but not neccessarily a
> virtio spec release.
>
> That said, let me propose the following timeline:
>
> - July 1st 2023: enter feature freeze; all proposed changes must at
> least have an open github issue that refers to a proposal on-list
> - August 1st 2023: enter change freeze; everything needs to have been
> voted upon (i.e. we can start with preparations for a release like
> compiling the change log)
> - at the same time, fork virtio-next, which can be used for development
> while we're working on the release
> - August/September 2023: prepare draft, initiate initial voting etc.
Hmm looks like you forgot a 30 day the public review period (or is that
included in the "etc"? ). During this period we will likely receive
review comments which we have to address. Should the changes turn out to
be material, another public review period will be required.
Let's spell it out please.
> - by September/October 2023: have the final 1.3 spec ready, voted upon,
> and released
> - virtio-next can be merged into mainline, and we continue with business
> as usual
>
> Thoughts? We might want to move things a tad earlier, but I don't think
> we should push the schedule out much further.
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* Re: [virtio-dev] RE: [PATCH v12 01/10] content: Add vq index text
From: Michael S. Tsirkin @ 2023-04-07 11:30 UTC (permalink / raw)
To: Parav Pandit
Cc: virtio-dev@lists.oasis-open.org, cohuck@redhat.com,
sgarzare@redhat.com, pasic@linux.ibm.com,
virtio-comment@lists.oasis-open.org, Shahaf Shuler
In-Reply-To: <PH0PR12MB54810141129464569ECCC825DC909@PH0PR12MB5481.namprd12.prod.outlook.com>
On Wed, Apr 05, 2023 at 01:03:29PM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, April 5, 2023 5:19 AM
> >
> > In fact can we just say "virtqueue index" everywhere? The less slang reader has
> > to learn, the better.
>
> Wow, didn't know that "vq" is a slang. :)
And so is "virtqueue". It is certainly not in my dictionary of standard english.
One term to learn is enough I think.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] RE: [PATCH v12 10/10] virtio-net: Describe RSS using rss rq id
From: Michael S. Tsirkin @ 2023-04-07 11:32 UTC (permalink / raw)
To: Parav Pandit
Cc: virtio-dev@lists.oasis-open.org, cohuck@redhat.com,
sgarzare@redhat.com, pasic@linux.ibm.com,
virtio-comment@lists.oasis-open.org, Shahaf Shuler
In-Reply-To: <PH0PR12MB5481035B8C5D113CA76CB6D9DC909@PH0PR12MB5481.namprd12.prod.outlook.com>
On Wed, Apr 05, 2023 at 01:02:24PM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, April 5, 2023 5:18 AM
>
> > following format for \field{command-specific-data}:
> > > \begin{lstlisting}
> > > +struct rss_rq_id {
> > > + le16 vqn_1_16: 15; /* Bits 1 to 16 of the vq index */
> >
> > Do you want to then call this vq_index_1_16 ?
> Yes. will change.
>
> > And in the comment I would just write virtqueue index.
> >
> Why do you prefer longer to name here instead of abbreviation?
Eschew abbreviation is standard advice for writers.
Full names reduce the cognitive load - reader has less terms to
remember to be able to understand the spec.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] RE: [virtio-dev] RE: [PATCH v12 03/10] content: Rename confusing queue_notify_data and vqn names
From: Michael S. Tsirkin @ 2023-04-07 11:43 UTC (permalink / raw)
To: Parav Pandit
Cc: Halil Pasic, virtio-dev@lists.oasis-open.org, cohuck@redhat.com,
sgarzare@redhat.com, virtio-comment@lists.oasis-open.org,
Shahaf Shuler
In-Reply-To: <PH0PR12MB54819AFEDBFEC41CACA0CF92DC909@PH0PR12MB5481.namprd12.prod.outlook.com>
On Wed, Apr 05, 2023 at 03:58:55PM +0000, Parav Pandit wrote:
> > From: Halil Pasic <pasic@linux.ibm.com>
> > Sent: Wednesday, April 5, 2023 11:28 AM
> >
> > On Wed, 5 Apr 2023 13:21:40 +0000
> > Parav Pandit <parav@nvidia.com> wrote:
> >
> > > > VIRTIO_F_NOTIF_CONFIG_DATA is such a narrow usecase, I don't like
> > > > burning "vq identifier" on this. How about we just say something along the
> > lines of:
> > > >
> > > Ok.
> > > >
> > > > When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, this
> > > > notification involves sending either the virtqueue index or the
> > > > virtqueue config data to the device (method depending on the transport).
> > > >
> > > > And then "the data sent is a device supplied virtqueue config data".
> > > >
> > > Sounds fine. I will reword it.
> >
> > FYI in an other thread I proposed calling this a "cookie". Sorry for being late to
> > the party. Yet again.
>
> If we spend (waste) more time, we will find many examples where "identifier" and "cookie" both are used in things associated with computer science.
>
> That too when same set of people has accepted text " internal virtqueue identifier" for same feature of CONFIG_DATA even though somehow it was not "id"!
because that's just an example:
In a trivial case the device can set \field{queue_notify_data}=vqn. Some devices
may benefit from providing another value, for example an internal virtqueue
identifier, or an internal offset related to the virtqueue number.
so the cookie can either be an identifier or something else.
> And when this spec refers to an RFC of UUID, session id (not "session cookie", even though session id is opaque and not meaningful to the recipient as per Wikipedia usage desc that you listed).
>
> For sure "cookie" is better than "config_data" and I don't have objection to "cookie".
>
> But I disagree to the claim that "identifier" is less good than "cookie".
>
> It is pointless debate of "identifier" vs "cookie".
>
> The union format is very useful to describe this crisply, I will use it.
I guess I'm fine with "cookie" in that in CS it is by now widely
understood to mean "some opaque data".
identifier comes from "identical", so implies a 1:1 mapping IMO.
The logic behind using a cookie is that it's a bit similar
to host cookie from ccw.
However, with ccw host cookie is used unconditionally, as
opposed to depending on the flag.
> Do you prefer to rename F_CONFIG_DATA To F_CONFIG_COOKIE?
It should all be consistent, but I worry about ccw which uses cookies
unconditionally. I am fine with leaving it as F_CONFIG_DATA for now
unless we see a good way to avoid confusion for ccw.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] RE: [PATCH v12 03/10] content: Rename confusing queue_notify_data and vqn names
From: Michael S. Tsirkin @ 2023-04-07 11:44 UTC (permalink / raw)
To: Parav Pandit
Cc: virtio-dev@lists.oasis-open.org, cohuck@redhat.com,
sgarzare@redhat.com, pasic@linux.ibm.com,
virtio-comment@lists.oasis-open.org, Shahaf Shuler
In-Reply-To: <PH0PR12MB5481C1EC27F79A87218C3767DC909@PH0PR12MB5481.namprd12.prod.outlook.com>
On Wed, Apr 05, 2023 at 01:21:40PM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, April 5, 2023 1:30 AM
>
>
> > VIRTIO_F_NOTIF_CONFIG_DATA is such a narrow usecase, I don't like burning
> > "vq identifier" on this. How about we just say something along the lines of:
> >
> Ok.
> >
> > When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, this
> > notification involves sending either the virtqueue index or the virtqueue config
> > data to the device (method depending on the transport).
> >
> > And then "the data sent is a device supplied virtqueue config data".
> >
> Sounds fine. I will reword it.
FWIW I like this more than coming up with a cookie term, it is definitely a
more conservative approach.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] [PATCH v14] virtio-net: support the virtqueue coalescing moderation
From: Michael S. Tsirkin @ 2023-04-07 11:45 UTC (permalink / raw)
To: Halil Pasic
Cc: Heng Qi, virtio-dev, virtio-comment, Cornelia Huck, Parav Pandit,
Alvaro Karsz, David Edmondson, Jason Wang, Xuan Zhuo
In-Reply-To: <20230404182925.349d402e.pasic@linux.ibm.com>
On Tue, Apr 04, 2023 at 06:29:25PM +0200, Halil Pasic wrote:
> On Thu, 23 Mar 2023 23:24:22 +0800
> Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> > +struct virtio_net_ctrl_coal_vq {
> > + le16 vqn;
> > + le16 reserved;
> > + struct virtio_net_ctrl_coal coal;
> > +};
> > +
> > #define VIRTIO_NET_CTRL_NOTF_COAL 6
> > #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET 0
> > #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
> > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
> > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
> > \end{lstlisting}
> >
> > Coalescing parameters:
> > \begin{itemize}
> > +\item \field{vqn}: The virtqueue number of an enabled transmit or receive virtqueue.
>
> Just to be on the safe side: VIRTIO_F_NOTIF_CONFIG_DATA has been
> negotiated, and queue_select != queue_notify_data, is vqn supposed
> to contain queue_notify_data or the number/index that is used for
> queue_select (I'm talking about the PCI transport case)?
>
> Regards,
> Halil
I think it's the vq number as it says here, this is not transport specific.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox