[PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF

Discussion of the VIRTIO specification

* [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF
@ 2022-01-13 14:50 Max Gurtovoy
  2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
                   ` (5 more replies)
  0 siblings, 6 replies; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:50 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

Hi,

In a PCI SR-IOV configuration, the MSI-X vectors of a device are a
precious resource. Making efficient use of them, based on a use case
that matches the VM configuration, is therefore desirable for best
system performance.

For example, today's static assignment of the number of MSI-X vectors
doesn't allow sophisticated utilization of resources.

A typical cloud provider SR-IOV use case is to create many VFs for
use by guest VMs. Each VM might have a different purpose and,
accordingly, a different amount of resources (e.g. number of CPUs). A
driver's use of the device's MSI-X vectors is commonly proportional to
the number of CPUs in the VM. Since the system administrator might know
the number of CPUs in the requested VM, they can also configure the
number of MSI-X vectors of the VF in proportion to it. In this way, the
utilization of the physical hardware will be improved.
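
As a rough illustration of the sizing described above, the following C
sketch derives a per-VF vector count from the CPU count of the VM. The
function name vf_msix_vectors, the policy of one vector per CPU plus one
for configuration changes, and the clamp to a per-VF maximum are all
assumptions made for illustration; neither this series nor the
specification defines them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sizing policy: one vector per vCPU plus one vector for
 * configuration-change notifications, clamped to the maximum number of
 * vectors the VF can be given.
 */
static inline uint16_t vf_msix_vectors(uint16_t vm_cpus, uint16_t vf_max_vectors)
{
    assert(vf_max_vectors >= 1);

    /* Widen before adding so that vm_cpus == UINT16_MAX cannot wrap. */
    uint32_t wanted = (uint32_t)vm_cpus + 1u;

    return wanted > vf_max_vectors ? vf_max_vectors : (uint16_t)wanted;
}
```

An administrator tool could call such a helper once per VF before the VF
is handed to its VM; a real policy would depend on the device type and
on the driver in the guest.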

Today we have some operating systems that support provisioning MSI-X
vectors for PCI VFs.

Update the specification to have a method to change the number of MSI-X
vectors supported by a VF using the PF admin virtqueue interface. For that,
create a generic infrastructure for managing PCI resources of the managed
VF by its parent PF.

Patches (1/5)-(2/5) introduce the admin virtqueue concept and feature bits.
Patches (3/5)-(4/5) add the admin virtq to virtio-blk and virtio-net
devices.
Patch (5/5) introduces MSI-X management support.

Max Gurtovoy (5):
  Add virtio Admin Virtqueue specification
  Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  virtio-blk: add support for VIRTIO_F_ADMIN_VQ
  virtio-net: add support for VIRTIO_F_ADMIN_VQ
  Add support for dynamic MSI-X vector mgmt for VFs

 admin-virtq.tex | 145 ++++++++++++++++++++++++++++++++++++++++++++++++
 content.tex     |  91 +++++++++++++++++++++++++++---
 packed-ring.tex |  26 ++++-----
 split-ring.tex  |  35 ++++++++----
 4 files changed, 263 insertions(+), 34 deletions(-)
 create mode 100644 admin-virtq.tex

-- 
2.21.0


* [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
@ 2022-01-13 14:50 ` Max Gurtovoy
  2022-01-13 17:53   ` Michael S. Tsirkin
  2022-01-13 14:51 ` [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER Max Gurtovoy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:50 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

In one of many use cases, a user wants to manipulate the features and
configuration of virtio devices regardless of the device type
(net/block/console). Some of this configuration is generic, e.g. the
number of MSI-X vectors of a virtio PCI VF device. Such feature queries
and manipulation need to be done by the parent PCI PF.

Currently the virtio specification defines a control virtqueue to
manipulate the features and configuration of the device it operates on.
However, control virtqueue commands are device type specific, which
makes it very difficult to extend them with device-agnostic commands.
The control virtqueue is also required to complete commands in order on
a device that negotiates the VIRTIO_F_IN_ORDER feature, which prevents
unrelated feature manipulation commands from completing out of order.

To overcome these two limitations, this patch introduces a new admin
virtqueue. The admin virtqueue uses the same command format for all
types of virtio devices.

Subsequent patches make use of this admin virtqueue.
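
To make the common command format more concrete, here is a minimal C
sketch of how a driver might lay out the two halves of a struct
virtio_admin_cmd as proposed in this patch. The field names in the
proposed listing contain hyphens and so are not valid C identifiers,
which is why the sketch works on raw byte buffers. The helper names
admin_cmd_fill_readable and admin_cmd_succeeded are hypothetical, and
how the buffers are posted to the virtqueue is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Status values as listed in the proposed admin-virtq.tex. */
#define VIRTIO_ADMIN_STATUS_OK 0
#define VIRTIO_ADMIN_STATUS_ERR 1
#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2

/*
 * Hypothetical helper: write the device-readable part (the command
 * byte followed by command-specific data) into 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t admin_cmd_fill_readable(uint8_t *out, size_t out_len,
                                      uint8_t command,
                                      const uint8_t *data, size_t data_len)
{
    assert(out != NULL);

    if (out_len < 1 + data_len)
        return 0;

    out[0] = command;
    if (data_len)
        memcpy(out + 1, data, data_len);

    return 1 + data_len;
}

/*
 * Hypothetical helper: the device-writable part starts with the status
 * byte, followed by any command-specific result.
 */
static int admin_cmd_succeeded(const uint8_t *writable, size_t len)
{
    return len >= 1 && writable[0] == VIRTIO_ADMIN_STATUS_OK;
}
```

Keeping the readable and writable parts in separate buffers follows the
usual virtqueue convention of device-readable descriptors preceding
device-writable ones.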

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 content.tex     |  9 +++++++--
 2 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 admin-virtq.tex

diff --git a/admin-virtq.tex b/admin-virtq.tex
new file mode 100644
index 0000000..ad20f89
--- /dev/null
+++ b/admin-virtq.tex
@@ -0,0 +1,49 @@
+\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
+
+Admin virtqueue is used to send administrative commands to manipulate
+various features of the device which would not easily map into the
+configuration space.
+
+Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
+feature bit.
+
+Admin virtqueue index may vary among different device types.
+
+The Admin command set defines the commands that may be issued only to the admin
+virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature, MUST
+support all the mandatory admin commands. A device MAY support also one or more
+optional admin commands. All commands are of the following form:
+
+\begin{lstlisting}
+struct virtio_admin_cmd {
+        /* Device-readable part */
+        u8 command;
+        u8 command-specific-data[];
+
+        /* Device-writable part */
+        u8 status;
+        u8 command-specific-result[];
+};
+
+/* status values */
+#define VIRTIO_ADMIN_STATUS_OK 0
+#define VIRTIO_ADMIN_STATUS_ERR 1
+#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
+\end{lstlisting}
+
+The \field{command} and \field{command-specific-data} are
+set by the driver, and the device sets the \field{status} and the
+\field{command-specific-result}, if needed.
+
+The following table describes the Admin command set:
+
+\begin{tabular}{|l|l|l|l|}
+\hline
+Opcode (bits) & Opcode (hex) & Command & M/O \\
+\hline \hline
+ -  & 00h - 7Fh   & Generic admin cmds    & -  \\
+\hline
+ -  & 80h - FFh   & Reserved    & - \\
+\hline
+\end{tabular}
+
diff --git a/content.tex b/content.tex
index 32de668..c524fab 100644
--- a/content.tex
+++ b/content.tex
@@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
 \begin{description}
 \item[0 to 23] Feature bits for the specific device type
 
-\item[24 to 40] Feature bits reserved for extensions to the queue and
+\item[24 to 41] Feature bits reserved for extensions to the queue and
   feature negotiation mechanisms
 
-\item[41 and above] Feature bits reserved for future extensions.
+\item[42 and above] Feature bits reserved for future extensions.
 \end{description}
 
 \begin{note}
@@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
 types. It is RECOMMENDED that devices generate version 4
 UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
 
+\input{admin-virtq.tex}
+
 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
 
 We start with an overview of device initialization, then expand on the
@@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   that the driver can reset a queue individually.
   See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
 
+  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
+  the device supports administration virtqueue negotiation.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
-- 
2.21.0



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
@ 2022-01-13 17:53   ` Michael S. Tsirkin
  2022-01-17  9:56     ` Max Gurtovoy
  2022-01-17 14:12     ` Parav Pandit
  0 siblings, 2 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 17:53 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:50:59PM +0200, Max Gurtovoy wrote:
> In one of the many use cases a user wants to manipulate features and
> configuration of the virtio devices regardless of the device type
> (net/block/console). Some of this configuration is generic enough. i.e
> Number of MSI-X vectors of a virtio PCI VF device. There is a need to do
> such features query and manipulation by its parent PCI PF.
> 
> Currently virtio specification defines control virtqueue to manipulate
> features and configuration of the device it operates on. However,
> control virtqueue commands are device type specific, which makes it very
> difficult to extend for device agnostic commands. Control virtqueue is
> also limited to follow in order completion for the device which
> negotiates VIRTIO_F_IN_ORDER feature. This feature limits the use of
> control virtqueue for feature manipulation in out of order manner for
> unrelated commands.
> 
> To support these requirements which overcome above two limitations in
> elegant way, this patch introduces a new admin virtqueue. Admin
> virtqueue will use the same command format for all types of virtio
> devices.
> 
> Subsequent patches make use of this admin virtqueue.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  content.tex     |  9 +++++++--
>  2 files changed, 56 insertions(+), 2 deletions(-)
>  create mode 100644 admin-virtq.tex
> 
> diff --git a/admin-virtq.tex b/admin-virtq.tex
> new file mode 100644
> index 0000000..ad20f89
> --- /dev/null
> +++ b/admin-virtq.tex
> @@ -0,0 +1,49 @@
> +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
> +
> +Admin virtqueue is used to send administrative commands to manipulate
> +various features of the device which would not easily map into the
> +configuration space.

IMHO this is too vague to be useful. E.g. I don't really see
why the commands specified in the next patch would not map to config space.


We had an off-list meeting where I proposed addressing one device
from another or grouping multiple devices as a more specific
scope. That would be one way to address this.

Following this idea, all commands would then gain fields for addressing
one device from another.

Not everything maps well to a queue. E.g. it would be great to have
a list of available commands in memory.
Figuring out max vectors also looks like a good
example for memory rather than a command.
The VQ # of the admin VQ could also be made more discoverable.
How about an SRIOV capability describing this stuff then?




> +Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
> +feature bit.
> +
> +Admin virtqueue index may vary among different device types.
> +
> +The Admin command set defines the commands that may be issued only to the admin
> +virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature, MUST
> +support all the mandatory admin commands. A device MAY support also one or more
> +optional admin commands. All commands are of the following form:
> +
> +\begin{lstlisting}
> +struct virtio_admin_cmd {
> +        /* Device-readable part */
> +        u8 command;
> +        u8 command-specific-data[];
> +
> +        /* Device-writable part */
> +        u8 status;
> +        u8 command-specific-result[];
> +};
> +
> +/* status values */
> +#define VIRTIO_ADMIN_STATUS_OK 0
> +#define VIRTIO_ADMIN_STATUS_ERR 1
> +#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
> +\end{lstlisting}
> +
> +The \field{command} and \field{command-specific-data} are
> +set by the driver, and the device sets the \field{status} and the
> +\field{command-specific-result}, if needed.
> +
> +The following table describes the Admin command set:
> +
> +\begin{tabular}{|l|l|l|l|}
> +\hline
> +Opcode (bits) & Opcode (hex) & Command & M/O \\
> +\hline \hline
> + -  & 00h - 7Fh   & Generic admin cmds    & -  \\
> +\hline
> + -  & 80h - FFh   & Reserved    & - \\
> +\hline
> +\end{tabular}
> +

Add conformance clauses pls. If this section is too generic to have any then
this functionality is too generic to be useful ;)

> diff --git a/content.tex b/content.tex
> index 32de668..c524fab 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23] Feature bits for the specific device type
>  
> -\item[24 to 40] Feature bits reserved for extensions to the queue and
> +\item[24 to 41] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms
>  
> -\item[41 and above] Feature bits reserved for future extensions.
> +\item[42 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>  types. It is RECOMMENDED that devices generate version 4
>  UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>  
> +\input{admin-virtq.tex}
> +
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>  
>  We start with an overview of device initialization, then expand on the
> @@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    that the driver can reset a queue individually.
>    See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
>  
> +  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
> +  the device supports administration virtqueue negotiation.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> -- 
> 2.21.0



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-13 17:53   ` Michael S. Tsirkin
@ 2022-01-17  9:56     ` Max Gurtovoy
  2022-01-17 21:30       ` Michael S. Tsirkin
  2022-01-17 14:12     ` Parav Pandit
  1 sibling, 1 reply; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-17  9:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha


On 1/13/2022 7:53 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 13, 2022 at 04:50:59PM +0200, Max Gurtovoy wrote:
>> In one of the many use cases a user wants to manipulate features and
>> configuration of the virtio devices regardless of the device type
>> (net/block/console). Some of this configuration is generic enough. i.e
>> Number of MSI-X vectors of a virtio PCI VF device. There is a need to do
>> such features query and manipulation by its parent PCI PF.
>>
>> Currently virtio specification defines control virtqueue to manipulate
>> features and configuration of the device it operates on. However,
>> control virtqueue commands are device type specific, which makes it very
>> difficult to extend for device agnostic commands. Control virtqueue is
>> also limited to follow in order completion for the device which
>> negotiates VIRTIO_F_IN_ORDER feature. This feature limits the use of
>> control virtqueue for feature manipulation in out of order manner for
>> unrelated commands.
>>
>> To support these requirements which overcome above two limitations in
>> elegant way, this patch introduces a new admin virtqueue. Admin
>> virtqueue will use the same command format for all types of virtio
>> devices.
>>
>> Subsequent patches make use of this admin virtqueue.
>>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   content.tex     |  9 +++++++--
>>   2 files changed, 56 insertions(+), 2 deletions(-)
>>   create mode 100644 admin-virtq.tex
>>
>> diff --git a/admin-virtq.tex b/admin-virtq.tex
>> new file mode 100644
>> index 0000000..ad20f89
>> --- /dev/null
>> +++ b/admin-virtq.tex
>> @@ -0,0 +1,49 @@
>> +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
>> +
>> +Admin virtqueue is used to send administrative commands to manipulate
>> +various features of the device which would not easily map into the
>> +configuration space.
> IMHO this is too vague to be useful. E.g. I don't really see
> why would not commands specified in the next patch map to config space.

Well I took this sentence from the current spec :)


>
>
> We had an off-list meeting where I proposed addressing one device
> from another or grouping multiple devices as a more specific
> scope. That would be one way to address this.

Are you suggesting the creation of a virtio subsystem or a virtio group
definition?

Devices will be part of this subsystem: one primary/manager device and 
many secondary/managed devices ?

Each subsystem will have a unique UUID and each device will have a 
unique vdev_id within this subsystem.

If this is the direction, I can prepare something..

>
> Following this idea, all commands would then gain fields for addressing
> one device from another.
>
> Not everything maps well to a queue. E.g. it would be great to have
> list of available commands in memory.

I'm not sure I agree. Why can't it map to a queue ?


> Figuring out max vectors also looks like a good
> example for memory and not through a command.

Can you explain why it looks good, or better?

> VQ # of the admin VQ could also be made more discoverable.
> How about an SRIOV capability describing this stuff then?
>
>
>
>
>> +Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
>> +feature bit.
>> +
>> +Admin virtqueue index may vary among different device types.
>> +
>> +The Admin command set defines the commands that may be issued only to the admin
>> +virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature, MUST
>> +support all the mandatory admin commands. A device MAY support also one or more
>> +optional admin commands. All commands are of the following form:
>> +
>> +\begin{lstlisting}
>> +struct virtio_admin_cmd {
>> +        /* Device-readable part */
>> +        u8 command;
>> +        u8 command-specific-data[];
>> +
>> +        /* Device-writable part */
>> +        u8 status;
>> +        u8 command-specific-result[];
>> +};
>> +
>> +/* status values */
>> +#define VIRTIO_ADMIN_STATUS_OK 0
>> +#define VIRTIO_ADMIN_STATUS_ERR 1
>> +#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
>> +\end{lstlisting}
>> +
>> +The \field{command} and \field{command-specific-data} are
>> +set by the driver, and the device sets the \field{status} and the
>> +\field{command-specific-result}, if needed.
>> +
>> +The following table describes the Admin command set:
>> +
>> +\begin{tabular}{|l|l|l|l|}
>> +\hline
>> +Opcode (bits) & Opcode (hex) & Command & M/O \\
>> +\hline \hline
>> + -  & 00h - 7Fh   & Generic admin cmds    & -  \\
>> +\hline
>> + -  & 80h - FFh   & Reserved    & - \\
>> +\hline
>> +\end{tabular}
>> +
> Add conformance clauses pls. If this section is too generic to have any then
> this functionality is too generic to be useful ;)
>
>> diff --git a/content.tex b/content.tex
>> index 32de668..c524fab 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>>   \begin{description}
>>   \item[0 to 23] Feature bits for the specific device type
>>   
>> -\item[24 to 40] Feature bits reserved for extensions to the queue and
>> +\item[24 to 41] Feature bits reserved for extensions to the queue and
>>     feature negotiation mechanisms
>>   
>> -\item[41 and above] Feature bits reserved for future extensions.
>> +\item[42 and above] Feature bits reserved for future extensions.
>>   \end{description}
>>   
>>   \begin{note}
>> @@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>   types. It is RECOMMENDED that devices generate version 4
>>   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>   
>> +\input{admin-virtq.tex}
>> +
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>   
>>   We start with an overview of device initialization, then expand on the
>> @@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>     that the driver can reset a queue individually.
>>     See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
>>   
>> +  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
>> +  the device supports administration virtqueue negotiation.
>> +
>>   \end{description}
>>   
>>   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>> -- 
>> 2.21.0



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17  9:56     ` Max Gurtovoy
@ 2022-01-17 21:30       ` Michael S. Tsirkin
  2022-01-18  3:22         ` Parav Pandit
  2022-01-19  3:04         ` Jason Wang
  0 siblings, 2 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 21:30 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Mon, Jan 17, 2022 at 11:56:09AM +0200, Max Gurtovoy wrote:
> 
> On 1/13/2022 7:53 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 13, 2022 at 04:50:59PM +0200, Max Gurtovoy wrote:
> > > In one of the many use cases a user wants to manipulate features and
> > > configuration of the virtio devices regardless of the device type
> > > (net/block/console). Some of this configuration is generic enough. i.e
> > > Number of MSI-X vectors of a virtio PCI VF device. There is a need to do
> > > such features query and manipulation by its parent PCI PF.
> > > 
> > > Currently virtio specification defines control virtqueue to manipulate
> > > features and configuration of the device it operates on. However,
> > > control virtqueue commands are device type specific, which makes it very
> > > difficult to extend for device agnostic commands. Control virtqueue is
> > > also limited to follow in order completion for the device which
> > > negotiates VIRTIO_F_IN_ORDER feature. This feature limits the use of
> > > control virtqueue for feature manipulation in out of order manner for
> > > unrelated commands.
> > > 
> > > To support these requirements which overcome above two limitations in
> > > elegant way, this patch introduces a new admin virtqueue. Admin
> > > virtqueue will use the same command format for all types of virtio
> > > devices.
> > > 
> > > Subsequent patches make use of this admin virtqueue.
> > > 
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > ---
> > >   admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >   content.tex     |  9 +++++++--
> > >   2 files changed, 56 insertions(+), 2 deletions(-)
> > >   create mode 100644 admin-virtq.tex
> > > 
> > > diff --git a/admin-virtq.tex b/admin-virtq.tex
> > > new file mode 100644
> > > index 0000000..ad20f89
> > > --- /dev/null
> > > +++ b/admin-virtq.tex
> > > @@ -0,0 +1,49 @@
> > > +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
> > > +
> > > +Admin virtqueue is used to send administrative commands to manipulate
> > > +various features of the device which would not easily map into the
> > > +configuration space.
> > IMHO this is too vague to be useful. E.g. I don't really see
> > why would not commands specified in the next patch map to config space.
> 
> Well I took this sentence from the current spec :)

Well in current spec it applies to things like MAC address filtering,
which does not easily map into config space because number of MACs
varies.


> 
> > 
> > 
> > We had an off-list meeting where I proposed addressing one device
> > from another or grouping multiple devices as a more specific
> > scope. That would be one way to address this.
> 
> Are you suggestion a creation of a virtio subsystem or a virtio group
> definition ?
> 
> Devices will be part of this subsystem: one primary/manager device and many
> secondary/managed devices ?
> 
> Each subsystem will have a unique UUID and each device will have a unique
> vdev_id within this subsystem.
> 
> If this is the direction, I can prepare something..

I was merely saying that what is special about admin queue is that it
allows controlling one device from another within some group.
Or maybe that it allows grouping multiple devices.
*Not* that these are things that do not map to config space.

Let me give you another example, imagine that you want to handle
pagefaults from device.  Clearly a generic thing that does not map to
config space.  It could be a good candidate for the admin queue, however
it would require that lots of buffers are pre-added to the queue. So it
looks like it will need another distinct fault queue.  Further it is
possible that you want to handle faults within guest, by the driver. In
that case you do not want it in the admin queue since that is controlled
by hypervisor, you want it in a separate queue controlled by driver.


I don't recall discussion about UUID so I can't really say what
I think about that. Do we need a UUID? I'm not sure I understand why.
It can't hurt to abstract things a bit so it's not all tied to
PFs/VFs since we know we'll want subfunctions down the road, too,
if that is what you mean.



> > 
> > Following this idea, all commands would then gain fields for addressing
> > one device from another.
> > 
> > Not everything maps well to a queue. E.g. it would be great to have
> > list of available commands in memory.
> 
> I'm not sure I agree. Why can't it map to a queue ?

You can map it to a queue, yes. But something static
and read only such as a list of commands maps well to
config space. And it's not controlling one device from
another, so does not really seem to belong in the admin queue.

> 
> > Figuring out max vectors also looks like a good
> > example for memory and not through a command.
> 
> Any explanation why is it looks good ? or better ?

Why is memory easier to operate than a VQ?
It's much simpler and so less error prone.  You can have multiple actors
read such a field at the same time without races, so e.g.  there could
be a sysfs attribute that reads from the device on each access, and no
special error handling is needed.

> > VQ # of the admin VQ could also be made more discoverable.
> > How about an SRIOV capability describing this stuff then?
> > 
> > 
> > 
> > 
> > > +Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
> > > +feature bit.
> > > +
> > > +Admin virtqueue index may vary among different device types.
> > > +
> > > +The Admin command set defines the commands that may be issued only to the admin
> > > +virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature, MUST
> > > +support all the mandatory admin commands. A device MAY support also one or more
> > > +optional admin commands. All commands are of the following form:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd {
> > > +        /* Device-readable part */
> > > +        u8 command;
> > > +        u8 command-specific-data[];
> > > +
> > > +        /* Device-writable part */
> > > +        u8 status;
> > > +        u8 command-specific-result[];
> > > +};
> > > +
> > > +/* status values */
> > > +#define VIRTIO_ADMIN_STATUS_OK 0
> > > +#define VIRTIO_ADMIN_STATUS_ERR 1
> > > +#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
> > > +\end{lstlisting}
> > > +
> > > +The \field{command} and \field{command-specific-data} are
> > > +set by the driver, and the device sets the \field{status} and the
> > > +\field{command-specific-result}, if needed.
> > > +
> > > +The following table describes the Admin command set:
> > > +
> > > +\begin{tabular}{|l|l|l|l|}
> > > +\hline
> > > +Opcode (bits) & Opcode (hex) & Command & M/O \\
> > > +\hline \hline
> > > + -  & 00h - 7Fh   & Generic admin cmds    & -  \\
> > > +\hline
> > > + -  & 80h - FFh   & Reserved    & - \\
> > > +\hline
> > > +\end{tabular}
> > > +
> > Add conformance clauses pls. If this section is too generic to have any then
> > this functionality is too generic to be useful ;)
> > 
> > > diff --git a/content.tex b/content.tex
> > > index 32de668..c524fab 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
> > >   \begin{description}
> > >   \item[0 to 23] Feature bits for the specific device type
> > > -\item[24 to 40] Feature bits reserved for extensions to the queue and
> > > +\item[24 to 41] Feature bits reserved for extensions to the queue and
> > >     feature negotiation mechanisms
> > > -\item[41 and above] Feature bits reserved for future extensions.
> > > +\item[42 and above] Feature bits reserved for future extensions.
> > >   \end{description}
> > >   \begin{note}
> > > @@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
> > >   types. It is RECOMMENDED that devices generate version 4
> > >   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
> > > +\input{admin-virtq.tex}
> > > +
> > >   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> > >   We start with an overview of device initialization, then expand on the
> > > @@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >     that the driver can reset a queue individually.
> > >     See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
> > > +  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
> > > +  the device supports administration virtqueue negotiation.
> > > +
> > >   \end{description}
> > >   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > > -- 
> > > 2.21.0



* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17 21:30       ` Michael S. Tsirkin
@ 2022-01-18  3:22         ` Parav Pandit
  2022-01-18  6:17           ` Michael S. Tsirkin
  2022-01-19  3:04         ` Jason Wang
  1 sibling, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-18  3:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment@lists.oasis-open.org, cohuck@redhat.com,
	virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
	Shahaf Shuler, Oren Duer, stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 3:01 AM

> On Mon, Jan 17, 2022 at 11:56:09AM +0200, Max Gurtovoy wrote:
> > > > +Admin virtqueue is used to send administrative commands to
> > > > +manipulate various features of the device which would not easily
> > > > +map into the configuration space.
> > > IMHO this is too vague to be useful. E.g. I don't really see why
> > > would not commands specified in the next patch map to config space.
> >
> > Well I took this sentence from the current spec :)
> 
> Well in current spec it applies to things like MAC address filtering, which does
> not easily map into config space because number of MACs varies.

It doesn't map well to the config space for the primary reason that it is read+write access that the driver should be able to perform asynchronously.
Yes, we will improve this part of the commit log to describe that going through the AQ keeps the driver from being blocked by a previous outstanding command.

> 
> 
> >
> > >
> > >
> > > We had an off-list meeting where I proposed addressing one device
> > > from another or grouping multiple devices as a more specific scope.
> > > That would be one way to address this.
> >
> > Are you suggestion a creation of a virtio subsystem or a virtio group
> > definition ?
> >
> > Devices will be part of this subsystem: one primary/manager device and
> > many secondary/managed devices ?
> >
> > Each subsystem will have a unique UUID and each device will have a
> > unique vdev_id within this subsystem.
> >
> > If this is the direction, I can prepare something..
> 
> I was merely saying that what is special about admin queue is that it allows
> controlling one device from another within some group.
> Or maybe that it allows grouping multiple devices.
> *Not* that these are things that do not map to config space.
> 
> Let me give you another example, imagine that you want to handle pagefaults
> from device.  Clearly a generic thing that does not map to config space.  It
> could be a good candidate for the admin queue, however it would require that
> lots of buffers are pre-added to the queue. So it looks like it will need another
> distinct fault queue.  
Right, a page fault queue is an async queue located in the hv and/or the guest, more like a net device rq.
The AQ serves as a request-response queue.
A page fault queue would likely need multiple instances to have any reasonable bw; per cpu is one option.

> Further it is possible that you want to handle faults
> within guest, by the driver. In that case you do not want it in the admin queue
> since that is controlled by hypervisor, you want it in a separate queue
> controlled by driver.
> 
Yes, so it is better not to merge the page fault queue with the admin queue.

> 
> I don't recall discussion about UUID so I can't really say what I think about that.
> Do we need a UUID? I'm not sure I understand why.
> It can't hurt to abstract things a bit so it's not all tied to PFs/VFs since we know
> we'll want subfunctions down the road, too, if that is what you mean.
>
I still didn't find any reason in the discussion why grouping devices is needed.
The current AQ proposal implicitly indicates that VFs of a PF are managed by their parent PF.
And if for some reason this work is to be done by one of the VFs, this role assignment can certainly be a new command on the AQ, as a group command or some other command.
 
> 
> 
> > >
> > > Following this idea, all commands would then gain fields for
> > > addressing one device from another.
> > >
> > > Not everything maps well to a queue. E.g. it would be great to have
> > > list of available commands in memory.
> >
> > I'm not sure I agree. Why can't it map to a queue ?
> 
> You can map it to a queue, yes. But something static and read only such as list
> of commands maps well to config space. And it's not controlling one device
> from another, so does not really seem to belong in the admin queue.
> 
The AQ serves writing device config too, in patch-5 of this patchset.

> >
> > > Figuring out max vectors also looks like a good example for memory
> > > and not through a command.
> >
> > Any explanation why is it looks good ? or better ?
> 
> why is memory easier to operate than a VQ?
> It's much simpler and so less error prone.  you can have multiple actors read
> such a field at the same time without races, so e.g.  there could be a sysfs
> attribute that reads from device on each access, and not special error handling
> is needed.
>
Writing fields without getting blocked on previous writes is an inherent part of the AQ.
I see you acked that the AQ is fine in the cover letter patch, as below, so we are in sync on the motivation now.
Yes, will update the commit log as you suggested.

 " if the answer is "commands A,B,C do not fit in config space, we placed commands D,E in a VQ for consistency"
then that is an ok answer, but it's something to be mentioned in the commit log"



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  3:22         ` Parav Pandit
@ 2022-01-18  6:17           ` Michael S. Tsirkin
  0 siblings, 0 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  6:17 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 03:22:27AM +0000, Parav Pandit wrote:
> > I don't recall discussion about UUID so I can't really say what I think about that.
> > Do we need a UUID? I'm not sure I understand why.
> > It can't hurt to abstract things a bit so it's not all tied to PFs/VFs since we know
> > we'll want subfunctions down the road, too, if that is what you mean.
> >
> I still didn't find any reason in the discussion to find out why grouping device is needed.

VFs are already grouped with their PF. However we should spell this out
as the motivation for the admin queue.

> Current AQ proposal implicitly indicates that VFs of a PF are managed by its parent PF.
> And for some reason this work by one of the VF, this role assignment
> can be certainly a new command on AQ as group command or some other
> command.

> > 
> > 
> > > >
> > > > Following this idea, all commands would then gain fields for
> > > > addressing one device from another.
> > > >
> > > > Not everything maps well to a queue. E.g. it would be great to have
> > > > list of available commands in memory.
> > >
> > > I'm not sure I agree. Why can't it map to a queue ?
> > 
> > You can map it to a queue, yes. But something static and read only such as list
> > of commands maps well to config space. And it's not controlling one device
> > from another, so does not really seem to belong in the admin queue.
> > 
> Aq serves the writing device config too in patch-5 in this patchset.

List of available admin commands does not need to be written.

> > >
> > > > Figuring out max vectors also looks like a good example for memory
> > > > and not through a command.
> > >
> > > Any explanation why is it looks good ? or better ?
> > 
> > why is memory easier to operate than a VQ?
> > It's much simpler and so less error prone.  you can have multiple actors read
> > such a field at the same time without races, so e.g.  there could be a sysfs
> > attribute that reads from device on each access, and not special error handling
> > is needed.
> >
> Writing fields is inherent part of the aq without getting blocked on previous writes.
> I see you acked that AQ is fine in cover letter patch as below, so we are sync on the motivation now.
> Yes, will update the commit log as you suggested.
> 
>  " if the answer is "commands A,B,C do not fit in config space, we placed commands D,E in a VQ for consistency"
> then that is an ok answer, but it's something to be mentioned in the commit log"



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17 21:30       ` Michael S. Tsirkin
  2022-01-18  3:22         ` Parav Pandit
@ 2022-01-19  3:04         ` Jason Wang
  2022-01-19  8:11           ` Michael S. Tsirkin
  1 sibling, 1 reply; 75+ messages in thread
From: Jason Wang @ 2022-01-19  3:04 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, parav, shahafs, oren,
	stefanha


On 2022/1/18 5:30 AM, Michael S. Tsirkin wrote:
> On Mon, Jan 17, 2022 at 11:56:09AM +0200, Max Gurtovoy wrote:
>> On 1/13/2022 7:53 PM, Michael S. Tsirkin wrote:
>>> On Thu, Jan 13, 2022 at 04:50:59PM +0200, Max Gurtovoy wrote:
>>>> In one of the many use cases a user wants to manipulate features and
>>>> configuration of the virtio devices regardless of the device type
>>>> (net/block/console). Some of this configuration is generic enough. i.e
>>>> Number of MSI-X vectors of a virtio PCI VF device. There is a need to do
>>>> such features query and manipulation by its parent PCI PF.
>>>>
>>>> Currently virtio specification defines control virtqueue to manipulate
>>>> features and configuration of the device it operates on. However,
>>>> control virtqueue commands are device type specific, which makes it very
>>>> difficult to extend for device agnostic commands. Control virtqueue is
>>>> also limited to follow in order completion for the device which
>>>> negotiates VIRTIO_F_IN_ORDER feature. This feature limits the use of
>>>> control virtqueue for feature manipulation in out of order manner for
>>>> unrelated commands.
>>>>
>>>> To support these requirements which overcome above two limitations in
>>>> elegant way, this patch introduces a new admin virtqueue. Admin
>>>> virtqueue will use the same command format for all types of virtio
>>>> devices.
>>>>
>>>> Subsequent patches make use of this admin virtqueue.
>>>>
>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>> ---
>>>>    admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    content.tex     |  9 +++++++--
>>>>    2 files changed, 56 insertions(+), 2 deletions(-)
>>>>    create mode 100644 admin-virtq.tex
>>>>
>>>> diff --git a/admin-virtq.tex b/admin-virtq.tex
>>>> new file mode 100644
>>>> index 0000000..ad20f89
>>>> --- /dev/null
>>>> +++ b/admin-virtq.tex
>>>> @@ -0,0 +1,49 @@
>>>> +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
>>>> +
>>>> +Admin virtqueue is used to send administrative commands to manipulate
>>>> +various features of the device which would not easily map into the
>>>> +configuration space.
>>> IMHO this is too vague to be useful. E.g. I don't really see
>>> why would not commands specified in the next patch map to config space.
>> Well I took this sentence from the current spec :)
> Well in current spec it applies to things like MAC address filtering,
> which does not easily map into config space because number of MACs
> varies.
>
>
>>>
>>> We had an off-list meeting where I proposed addressing one device
>>> from another or grouping multiple devices as a more specific
>>> scope. That would be one way to address this.
>> Are you suggestion a creation of a virtio subsystem or a virtio group
>> definition ?
>>
>> Devices will be part of this subsystem: one primary/manager device and many
>> secondary/managed devices ?
>>
>> Each subsystem will have a unique UUID and each device will have a unique
>> vdev_id within this subsystem.
>>
>> If this is the direction, I can prepare something..
> I was merely saying that what is special about admin queue is that it
> allows controlling one device from another within some group.
> Or maybe that it allows grouping multiple devices.
> *Not* that these are things that do not map to config space.
>
> Let me give you another example, imagine that you want to handle
> pagefaults from device.  Clearly a generic thing that does not map to
> config space.  It could be a good candidate for the admin queue, however
> it would require that lots of buffers are pre-added to the queue. So it
> looks like it will need another distinct fault queue.


That seems to duplicate the PRS queue which is implemented in the
AMD/Intel IOMMUs, and I'm not sure it's worth it.


>   Further it is
> possible that you want to handle faults within guest, by the driver. In
> that case you do not want it in the admin queue since that is controlled
> by hypervisor, you want it in a separate queue controlled by driver.


Exactly, another call for using the PRS queue instead. But generally
speaking, the admin virtqueue limits or complicates the functions that can be
exported to the guest. That's why I suggest decoupling all the possible
features from the admin virtqueue, and making them available via both the admin
virtqueue and a transport specific method (e.g. a capability).

Thanks


>
>
> I don't recall discussion about UUID so I can't really say what
> I think about that. Do we need a UUID? I'm not sure I understand why.
> It can't hurt to abstract things a bit so it's not all tied to
> PFs/VFs since we know we'll want subfunctions down the road, too,
> if that is what you mean.
>
>
>
>>> Following this idea, all commands would then gain fields for addressing
>>> one device from another.
>>>
>>> Not everything maps well to a queue. E.g. it would be great to have
>>> list of available commands in memory.
>> I'm not sure I agree. Why can't it map to a queue ?
> You can map it to a queue, yes. But something static
> and read only such as list of commands maps well to
> config space. And it's not controlling one device from
> another, so does not really seem to belong in the admin queue.
>
>>> Figuring out max vectors also looks like a good
>>> example for memory and not through a command.
>> Any explanation why is it looks good ? or better ?
> why is memory easier to operate than a VQ?
> It's much simpler and so less error prone.  you can have multiple actors
> read such a field at the same time without races, so e.g.  there could
> be a sysfs attribute that reads from device on each access, and not
> special error handling is needed.
>
>>> VQ # of the admin VQ could also be made more discoverable.
>>> How about an SRIOV capability describing this stuff then?
>>>
>>>
>>>
>>>
>>>> +Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
>>>> +feature bit.
>>>> +
>>>> +Admin virtqueue index may vary among different device types.
>>>> +
>>>> +The Admin command set defines the commands that may be issued only to the admin
>>>> +virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature, MUST
>>>> +support all the mandatory admin commands. A device MAY support also one or more
>>>> +optional admin commands. All commands are of the following form:
>>>> +
>>>> +\begin{lstlisting}
>>>> +struct virtio_admin_cmd {
>>>> +        /* Device-readable part */
>>>> +        u8 command;
>>>> +        u8 command-specific-data[];
>>>> +
>>>> +        /* Device-writable part */
>>>> +        u8 status;
>>>> +        u8 command-specific-result[];
>>>> +};
>>>> +
>>>> +/* status values */
>>>> +#define VIRTIO_ADMIN_STATUS_OK 0
>>>> +#define VIRTIO_ADMIN_STATUS_ERR 1
>>>> +#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
>>>> +\end{lstlisting}
>>>> +
>>>> +The \field{command} and \field{command-specific-data} are
>>>> +set by the driver, and the device sets the \field{status} and the
>>>> +\field{command-specific-result}, if needed.
>>>> +
>>>> +The following table describes the Admin command set:
>>>> +
>>>> +\begin{tabular}{|l|l|l|l|}
>>>> +\hline
>>>> +Opcode (bits) & Opcode (hex) & Command & M/O \\
>>>> +\hline \hline
>>>> + -  & 00h - 7Fh   & Generic admin cmds    & -  \\
>>>> +\hline
>>>> + -  & 80h - FFh   & Reserved    & - \\
>>>> +\hline
>>>> +\end{tabular}
>>>> +
>>> Add conformance clauses pls. If this section is too generic to have any then
>>> this functionality is too generic to be useful ;)
>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 32de668..c524fab 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>>>>    \begin{description}
>>>>    \item[0 to 23] Feature bits for the specific device type
>>>> -\item[24 to 40] Feature bits reserved for extensions to the queue and
>>>> +\item[24 to 41] Feature bits reserved for extensions to the queue and
>>>>      feature negotiation mechanisms
>>>> -\item[41 and above] Feature bits reserved for future extensions.
>>>> +\item[42 and above] Feature bits reserved for future extensions.
>>>>    \end{description}
>>>>    \begin{note}
>>>> @@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>>>    types. It is RECOMMENDED that devices generate version 4
>>>>    UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>>> +\input{admin-virtq.tex}
>>>> +
>>>>    \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>>>    We start with an overview of device initialization, then expand on the
>>>> @@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>>>      that the driver can reset a queue individually.
>>>>      See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
>>>> +  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
>>>> +  the device supports administration virtqueue negotiation.
>>>> +
>>>>    \end{description}
>>>>    \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>>>> -- 
>>>> 2.21.0



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19  3:04         ` Jason Wang
@ 2022-01-19  8:11           ` Michael S. Tsirkin
  2022-01-25  3:35             ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  8:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, parav, shahafs,
	oren, stefanha

On Wed, Jan 19, 2022 at 11:04:50AM +0800, Jason Wang wrote:
> Exactly, another call for the using the PRS queue instead. But generally
> speaking, admin virtqueue limit or complicate the functions that can be
> exported to guest. That's why I suggest to decouple all the possible
> features out of admin virtqueue, and make it available by both the admin
> virtqueue and the transport specific method (e.g capability).

I'm not exactly sure what's wrong with starting with a queue. If there's a
need to also allow passing that over another transport, we can add that.
In particular, I think it's useful to have a capability to inject
requests as if they have been passed through a VQ.
Such a capability would address this need, wouldn't it?

-- 
MST



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19  8:11           ` Michael S. Tsirkin
@ 2022-01-25  3:35             ` Jason Wang
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2022-01-25  3:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, Cornelia Huck, Virtio-Dev,
	Parav Pandit, Shahaf Shuler, Oren Duer, Stefan Hajnoczi

On Wed, Jan 19, 2022 at 4:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 19, 2022 at 11:04:50AM +0800, Jason Wang wrote:
> > Exactly, another call for the using the PRS queue instead. But generally
> > speaking, admin virtqueue limit or complicate the functions that can be
> > exported to guest. That's why I suggest to decouple all the possible
> > features out of admin virtqueue, and make it available by both the admin
> > virtqueue and the transport specific method (e.g capability).
>
> I'm not exactly sure what's wrong with starting with a queue, if there's
> need to also allow passing that over another transport we can add that.

Nothing wrong, but I think we can't mandate that the features be
implemented solely via the admin virtqueue. Each transport has its own use
cases. Making the admin virtqueue visible in a nested environment
will be a challenge.

> In particular, I think it's useful to have a capability to inject
> requests as if they have been passed through a VQ.
> Such a capability would address this need, won't it?

It really depends on the requirement. For simple requests like MSI
support in virtio-mmio, introducing such a large change seems
less optimal than an ad-hoc MSI interface.

Thanks


>
> --
> MST
>



* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-13 17:53   ` Michael S. Tsirkin
  2022-01-17  9:56     ` Max Gurtovoy
@ 2022-01-17 14:12     ` Parav Pandit
  2022-01-17 22:03       ` Michael S. Tsirkin
  1 sibling, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-17 14:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment@lists.oasis-open.org, cohuck@redhat.com,
	virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
	Shahaf Shuler, Oren Duer, stefanha@redhat.com


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, January 13, 2022 11:24 PM

> 
> We had an off-list meeting where I proposed addressing one device from
> another or grouping multiple devices as a more specific scope. That would be
> one way to address this.
> 
> Following this idea, all commands would then gain fields for addressing one
> device from another.
> 
Can you please explain your idea more, and the need for grouping?
What do you want to group? VFs of a parent PCI device?
How would each VF within a group be referred to?

If you have notes of the off-list meeting, it would be useful for us to read through them.



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17 14:12     ` Parav Pandit
@ 2022-01-17 22:03       ` Michael S. Tsirkin
  2022-01-18  3:36         ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 22:03 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Mon, Jan 17, 2022 at 02:12:33PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, January 13, 2022 11:24 PM
> 
> > 
> > We had an off-list meeting where I proposed addressing one device from
> > another or grouping multiple devices as a more specific scope. That would be
> > one way to address this.
> > 
> > Following this idea, all commands would then gain fields for addressing one
> > device from another.
> > 
> Can you please explain your idea more and a need for grouping?
> What do you want to group? VFs of parent pci device?
> How to refer to each VF within a group?

So for example, VFs of a PF are a group right? And they are all
controlled by a PF.

I can think of setups like nesting where we might want to
create a group of VFs and pass them to L1, with one of the
VFs acting as an admin for the rest of them for purposes
of L2.  Subfunctions with PASID etc. are another
example. I am not asking you to add such mechanisms straight away,
but the current proposal kind of obscures this to the point
where I don't see how we would extend it with these things
down the road.


> If you have notes of the off-list meeting, it will be useful for us to read through.

Sorry didn't take notes.

-- 
MST



* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17 22:03       ` Michael S. Tsirkin
@ 2022-01-18  3:36         ` Parav Pandit
  2022-01-18  7:07           ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-18  3:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 3:33 AM
> 
> On Mon, Jan 17, 2022 at 02:12:33PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Thursday, January 13, 2022 11:24 PM
> >
> > >
> > > We had an off-list meeting where I proposed addressing one device
> > > from another or grouping multiple devices as a more specific scope.
> > > That would be one way to address this.
> > >
> > > Following this idea, all commands would then gain fields for
> > > addressing one device from another.
> > >
> > Can you please explain your idea more and a need for grouping?
> > What do you want to group? VFs of parent pci device?
> > How to refer to each VF within a group?
> 
> So for example, VFs of a PF are a group right? And they are all controlled by a
> PF.
> 
> I can think of setups like nesting where we might want to create a group of VFs
> and pass them to L1, one of the VFs to act as an admin for the rest of them for
> purposes of L2.  subfunctions with PASID etc are another example. 

Subfunctions with PASID can be managed similarly, by extending the device identification and its MSIX/IMS vector details.
Maybe vf_number should be put in a union, as:

union device_id {
	struct pci_vf vf_id; /* current */
	struct pci_sf sf_id; /* future */
};

So that they can both use the same command opcode.

> I am not
> asking you to add such mechanisms straight away but the current proposal
> kind of obscures this to the point where I don't see how would we extend it
> with these things down the road.
> 
Which part specifically makes it obscure? A new device type can be identified by the above union.

Maybe a better structure in patch-5 would be something like the below:

struct virtio_admin_pci_virt_property_set {
	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction */
	union virtio_device_identifier {
		struct virtio_pci_dev_id pf_vf; /* current */
		struct virtio_subfunction sf; /* future */
	};
	enum virtio_interrupt_type interrupt_type; /* msix, ims=device specific, intx, something else */
	union virtio_interrupt_config {
		struct virtio_pci_msix_config msix_config;
	};
};

struct virtio_pci_msix_config {
	le16 msix_count;
};
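
The layout proposed above could be sketched with explicit field widths, since C enums and unions have no fixed wire size. This is only an illustration: the vdev_ident_* values, the msix_set_cmd layout and the helper names are hypothetical, and the little-endian encoding is assumed to follow the le16 convention used above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical identifier types; these values are not defined anywhere. */
enum vdev_ident_type {
	VDEV_IDENT_PCI_VF = 1, /* current: a VF addressed by its VF number */
	VDEV_IDENT_SF = 2      /* future: a subfunction */
};

/* Hypothetical wire layout, built only from bytes so there is no padding. */
struct msix_set_cmd {
	uint8_t ident_type;           /* one of enum vdev_ident_type */
	uint8_t reserved[3];
	union {
		uint8_t vf_number[2]; /* little-endian, for VDEV_IDENT_PCI_VF */
		uint8_t sf_id[4];     /* little-endian, for VDEV_IDENT_SF */
	} ident;
	uint8_t msix_count[2];        /* little-endian */
};

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Fill in a command that targets a PCI VF. */
void msix_set_cmd_for_vf(struct msix_set_cmd *cmd, uint16_t vf_number,
			 uint16_t msix_count)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->ident_type = VDEV_IDENT_PCI_VF;
	put_le16(cmd->ident.vf_number, vf_number);
	put_le16(cmd->msix_count, msix_count);
}

/* Read back the VF number; returns -1 if the command targets something else. */
int msix_set_cmd_vf_number(const struct msix_set_cmd *cmd, uint16_t *vf_number)
{
	if (cmd->ident_type != VDEV_IDENT_PCI_VF)
		return -1;
	*vf_number = get_le16(cmd->ident.vf_number);
	return 0;
}
```

The type tag is what lets a receiver reject an identifier kind it does not understand, at the cost of every command carrying the union.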



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  3:36         ` Parav Pandit
@ 2022-01-18  7:07           ` Michael S. Tsirkin
  2022-01-18  7:14             ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:07 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 03:36:19AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 3:33 AM
> > 
> > On Mon, Jan 17, 2022 at 02:12:33PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Thursday, January 13, 2022 11:24 PM
> > >
> > > >
> > > > We had an off-list meeting where I proposed addressing one device
> > > > from another or grouping multiple devices as a more specific scope.
> > > > That would be one way to address this.
> > > >
> > > > Following this idea, all commands would then gain fields for
> > > > addressing one device from another.
> > > >
> > > Can you please explain your idea more and a need for grouping?
> > > What do you want to group? VFs of parent pci device?
> > > How to refer to each VF within a group?
> > 
> > So for example, VFs of a PF are a group right? And they are all controlled by a
> > PF.
> > 
> > I can think of setups like nesting where we might want to create a group of VFs
> > and pass them to L1, one of the VFs to act as an admin for the rest of them for
> > purposes of L2.  subfunctions with PASID etc are another example. 
> 
> Subfunctions with PASID can be similarly managed by extending device identification and its MSIX/IMS vector details.
> May be vf_number should be put in the union as,
> 
> union device_id {
> 	struct pci_vf vf_id; /* current */
> 	struct pci_sf sf_id; /* future */
> };
> 
> So that they both can use command opcode.

device id is not a good name, but yes. However this is why I think we
should have a slightly more generic terminology, and more space for
these IDs, and then we'd have a specific binding for VFs.


> > I am not
> > asking you to add such mechanisms straight away but the current proposal
> > kind of obscures this to the point where I don't see how would we extend it
> > with these things down the road.
> > 
> Which part in specific make it obscure?

Just that the text is not generic. It would be nicer if adding
new types involved changing only one or two places.

> New device type can be identifiable by above union.
> 
> May be a better structure would be in patch-5 is:
> Something like below,
> 
> struct virtio_admin_pci_virt_property_set {
> 	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction */
> 	union virtio_device_identifier {
> 		struct virtio_pci_dev_id pf_vf; /* current */
> 		struct virtio_subfunction sf; /* future */
> 	};
> 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device specific, intx, something else */
> 	union virtio_interrupt_config {
> 		struct virtio_pci_msix_config msix_config;
> 	};
> };
> 
> struct virtio_pci_interrupt_config {
> 	le16 msix_count;
> };

You do not need a union straight away. Simply use something like this "device
identifier" everywhere and then add some text explaining that currently
it is a VF number and that the admin device is a PF.

However, we need better names: device ID is already used in the spec
for enumeration/discovery. Come up with something else please.
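
One way to read this suggestion is a single wide, opaque identifier used by every command, plus a per-transport binding that says how to fill it in. A minimal sketch, assuming a 64-bit identifier and 1-based VF numbering; the names and both assumptions are illustrative only and do not come from the patchset or the spec:

```c
#include <stdint.h>

/* Hypothetical: one opaque identifier type shared by all admin commands. */
typedef uint64_t vdev_member_id;

#define VDEV_MEMBER_ID_INVALID UINT64_MAX

/*
 * PCI SR-IOV binding (assumption): the identifier is simply the VF number,
 * counted from 1, and the admin device is the parent PF.
 * Returns VDEV_MEMBER_ID_INVALID for a VF number outside 1..num_vfs.
 */
vdev_member_id member_id_from_vf(uint16_t vf_number, uint16_t num_vfs)
{
	if (vf_number == 0 || vf_number > num_vfs)
		return VDEV_MEMBER_ID_INVALID;
	return vf_number;
}
```

With this shape, supporting a new kind of managed device would mean adding one more binding function, while the command structures that carry vdev_member_id stay unchanged.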



* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  7:07           ` Michael S. Tsirkin
@ 2022-01-18  7:14             ` Parav Pandit
  2022-01-18  7:20               ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-18  7:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 12:38 PM

> > Subfunctions with PASID can be similarly managed by extending device
> > identification and its MSIX/IMS vector details.
> > May be vf_number should be put in the union as,
> >
> > union device_id {
> > 	struct pci_vf vf_id; /* current */
> > 	struct pci_sf sf_id; /* future */
> > };
> >
> > So that they both can use command opcode.
> 
> device id is not a good name, but yes. However this is why I think we should
> have a slightly more generic terminology, and more space for these IDs, and
> then we'd have a specific binding for VFs.
> 
I couldn't think of a better name for identifying a PCI VF. But yes, we have to think of a better name for sure.

> > > I am not
> > > asking you to add such mechanisms straight away but the current
> > > proposal kind of obscures this to the point where I don't see how
> > > would we extend it with these things down the road.
> > >
> > Which part in specific make it obscure?
> 
> just that the text is not generic. would be nicer if adding new types would
> involve only changing one or two places
> 
> > New device type can be identifiable by above union.
> >
> > May be a better structure would be in patch-5 is:
> > Something like below,
> >
> > struct virtio_admin_pci_virt_property_set {
> > 	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction */
> > 	union virtio_device_identifier {
> > 		struct virtio_pci_dev_id pf_vf; /* current */
> > 		struct virtio_subfunction sf; /* future */
> > 	};
> > 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device specific, intx, something else */
> > 	union virtio_interrupt_config {
> > 		struct virtio_pci_msix_config msix_config;
> > 	};
> > };
> >
> > struct virtio_pci_interrupt_config {
> > 	le16 msix_count;
> > };
> 
> you do not need a union straight away, Simply use something like this "device
> identifier" everywhere and then add some text explaining that currently it is a
> VF number and that admin device is a PF.
Unless we reserve some bytes, I fail to see how it can be future compatible with an as yet unknown device id type for subfunctions.
pci_vf_number is very crisp for a PCI VF specific command.
So I am ruling out reserving an arbitrary number of bytes,
and instead splitting the command into two pieces:
1. command opcode
2. command content (pci vf specific). This will be a different structure for a subfunction or for a non-pci device.

Would that be more elegant?

> 
> However we need better names, device ID is already used in the spec for
> enumeration/discovery. come up with something else please.
Yes.
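
A minimal sketch of the two-piece split described above (a command opcode followed by opcode-specific content). The opcode values, struct names, and the helper are hypothetical and are not taken from the patch series; fields are shown in host order for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the values are illustrative only. */
enum admin_opcode {
	ADMIN_OP_PCI_VF_MSIX_SET = 1, /* content is PCI VF specific */
	ADMIN_OP_SF_MSIX_SET     = 2, /* possible future subfunction command */
};

/* Piece 1: the part common to every command. */
struct admin_cmd_hdr {
	uint16_t opcode;
};

/* Piece 2: content whose layout depends on the opcode. */
struct admin_pci_vf_msix {
	uint16_t vf_number;
	uint16_t msix_count;
};

struct admin_sf_msix {
	uint32_t sf_number;
	uint16_t msix_count;
	uint16_t reserved;
};

static_assert(sizeof(struct admin_pci_vf_msix) == 4, "unexpected padding");
static_assert(sizeof(struct admin_sf_msix) == 8, "unexpected padding");

/* Return the expected content length for an opcode, or 0 if it is unknown. */
size_t admin_content_len(uint16_t opcode)
{
	switch (opcode) {
	case ADMIN_OP_PCI_VF_MSIX_SET:
		return sizeof(struct admin_pci_vf_msix);
	case ADMIN_OP_SF_MSIX_SET:
		return sizeof(struct admin_sf_msix);
	default:
		return 0;
	}
}
```

With this shape, a new device type would add an opcode and a content struct, leaving the existing VF content untouched.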


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  7:14             ` Parav Pandit
@ 2022-01-18  7:20               ` Michael S. Tsirkin
  2022-01-19 11:33                 ` Max Gurtovoy
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:20 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 07:14:56AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 12:38 PM
> 
> > > Subfunctions with PASID can be similarly managed by extending device
> > identification and its MSIX/IMS vector details.
> > > May be vf_number should be put in the union as,
> > >
> > > union device_id {
> > > 	struct pci_vf vf_id; /* current */
> > > 	struct pci_sf sf_id; /* future */
> > > };
> > >
> > > So that they both can use command opcode.
> > 
> > device id is not a good name, but yes. However this is why I think we should
> > have a slightly more generic terminology, and more space for these IDs, and
> > then we'd have a specific binding for VFs.
> > 
> I couldn't think of better name for identifying a PCI VF. But yes have to think better name for sure.
> 
> > > > I am not
> > > > asking you to add such mechanisms straight away but the current
> > > > proposal kind of obscures this to the point where I don't see how
> > > > would we extend it with these things down the road.
> > > >
> > > Which part in specific make it obscure?
> > 
> > just that the text is not generic. would be nicer if adding new types would
> > involve only changing one or two places
> > 
> > > New device type can be identifiable by above union.
> > >
> > > May be a better structure would be in patch-5 is:
> > > Something like below,
> > >
> > > struct virtio_admin_pci_virt_property_set {
> > > 	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction
> > */
> > > 	union virtio_device_identifier {
> > > 		struct virtio_pci_dev_id pf_vf; /* current */
> > > 		struct virtio_subfunction sf; /* future */
> > > 	};
> > > 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
> > specific, intx, something else */
> > > 	union virtio_interrupt_config {
> > > 		struct virtio_pci_msix_config msix_config;
> > > 	};
> > > };
> > >
> > > struct virtio_pci_interrupt_config {
> > > 	le16 msix_count;
> > > };
> > 
> > you do not need a union straight away, Simply use something like this "device
> > identifier" everywhere and then add some text explaining that currently it is a
> > VF number and that admin device is a PF.
> Unless we reserve some bytes, I fail to see how can it be future compatible for unknown device id type for subfunction.

So reserve some bytes then. 4 should be plenty.

> pci_vf_number is very crisp for the pci device for a PCI VF specific command.
> So I am ruling out arbitrary number of bytes reservation.

we already know we'll need subfunctions. so I would say make it 4 bytes.

> And split the command to two pieces.
> 1. command opcode
> 2. command content (pci vf specific). This will be different structure for subfunction or for non pci device
> 
> Would that be more elegant?

no idea about non pci. we do know about subfunctions so let us not
pretend they are this unknown entity that is very hard to reason about,
it's something that's just around the corner.

> > 
> > However we need better names, device ID is already used in the spec for
> > enumeration/discovery. come up with something else please.
> Yes.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  7:20               ` Michael S. Tsirkin
@ 2022-01-19 11:33                 ` Max Gurtovoy
  2022-01-19 12:21                   ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-19 11:33 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: virtio-comment@lists.oasis-open.org, cohuck@redhat.com,
	virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
	Shahaf Shuler, Oren Duer, stefanha@redhat.com


On 1/18/2022 9:20 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2022 at 07:14:56AM +0000, Parav Pandit wrote:
>>
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Tuesday, January 18, 2022 12:38 PM
>>>> Subfunctions with PASID can be similarly managed by extending device
>>> identification and its MSIX/IMS vector details.
>>>> May be vf_number should be put in the union as,
>>>>
>>>> union device_id {
>>>> 	struct pci_vf vf_id; /* current */
>>>> 	struct pci_sf sf_id; /* future */
>>>> };
>>>>
>>>> So that they both can use command opcode.
>>> device id is not a good name, but yes. However this is why I think we should
>>> have a slightly more generic terminology, and more space for these IDs, and
>>> then we'd have a specific binding for VFs.
>>>
>> I couldn't think of better name for identifying a PCI VF. But yes have to think better name for sure.
>>
>>>>> I am not
>>>>> asking you to add such mechanisms straight away but the current
>>>>> proposal kind of obscures this to the point where I don't see how
>>>>> would we extend it with these things down the road.
>>>>>
>>>> Which part in specific make it obscure?
>>> just that the text is not generic. would be nicer if adding new types would
>>> involve only changing one or two places
>>>
>>>> New device type can be identifiable by above union.
>>>>
>>>> May be a better structure would be in patch-5 is:
>>>> Something like below,
>>>>
>>>> struct virtio_admin_pci_virt_property_set {
>>>> 	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction
>>> */
>>>> 	union virtio_device_identifier {
>>>> 		struct virtio_pci_dev_id pf_vf; /* current */
>>>> 		struct virtio_subfunction sf; /* future */
>>>> 	};
>>>> 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
>>> specific, intx, something else */
>>>> 	union virtio_interrupt_config {
>>>> 		struct virtio_pci_msix_config msix_config;
>>>> 	};
>>>> };
>>>>
>>>> struct virtio_pci_interrupt_config {
>>>> 	le16 msix_count;
>>>> };
>>> you do not need a union straight away, Simply use something like this "device
>>> identifier" everywhere and then add some text explaining that currently it is a
>>> VF number and that admin device is a PF.
>> Unless we reserve some bytes, I fail to see how can it be future compatible for unknown device id type for subfunction.
> So reserve some bytes then. 4 should be plenty.

Ok so in V2 we'll use 4 bytes as device identifier to be generic.

We can call it lid (local id) or vlid (virtio local id)?

Are we ok with one of the above names ?
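
A rough sketch of what a generic 4-byte identifier could look like when it carries a VF number. The type name `virtio_vlid` and both helpers are hypothetical; the only facts taken from the thread are the 4-byte size and that, for now, the value would be a VF number. The byte order is assumed to be little-endian, as for other virtio fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic identifier: 4 bytes, little-endian on the wire. */
struct virtio_vlid {
	uint8_t bytes[4];
};

static_assert(sizeof(struct virtio_vlid) == 4, "identifier must be 4 bytes");

/* Assumed PCI VF binding: the identifier simply carries the VF number. */
struct virtio_vlid vlid_from_vf_number(uint16_t vf_number)
{
	struct virtio_vlid id = {
		.bytes = { (uint8_t)(vf_number & 0xff), (uint8_t)(vf_number >> 8), 0, 0 },
	};
	return id;
}

/* Decode the identifier into a host-order 32-bit value. */
uint32_t vlid_to_u32(struct virtio_vlid id)
{
	return (uint32_t)id.bytes[0] |
	       ((uint32_t)id.bytes[1] << 8) |
	       ((uint32_t)id.bytes[2] << 16) |
	       ((uint32_t)id.bytes[3] << 24);
}
```

The upper two bytes stay zero for VFs, which is the space a later subfunction binding could use.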



^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19 11:33                 ` Max Gurtovoy
@ 2022-01-19 12:21                   ` Parav Pandit
  2022-01-19 14:47                     ` Max Gurtovoy
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-19 12:21 UTC (permalink / raw)
  To: Max Gurtovoy, Michael S. Tsirkin
  Cc: virtio-comment@lists.oasis-open.org, cohuck@redhat.com,
	virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
	Shahaf Shuler, Oren Duer, stefanha@redhat.com


> From: Max Gurtovoy <mgurtovoy@nvidia.com>
> Sent: Wednesday, January 19, 2022 5:04 PM

[..]
> >>>> struct virtio_admin_pci_virt_property_set {
> >>>> 	enum virtio_device_identifier_type type; /* pci pf, pci vf,
> >>>> subfunction
> >>> */
> >>>> 	union virtio_device_identifier {
> >>>> 		struct virtio_pci_dev_id pf_vf; /* current */
> >>>> 		struct virtio_subfunction sf; /* future */
> >>>> 	};
> >>>> 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
> >>> specific, intx, something else */
> >>>> 	union virtio_interrupt_config {
> >>>> 		struct virtio_pci_msix_config msix_config;
> >>>> 	};
> >>>> };
> >>>>
> >>>> struct virtio_pci_interrupt_config {
> >>>> 	le16 msix_count;
> >>>> };
> >>> you do not need a union straight away, Simply use something like
> >>> this "device identifier" everywhere and then add some text
> >>> explaining that currently it is a VF number and that admin device is a PF.
> >> Unless we reserve some bytes, I fail to see how can it be future compatible
> for unknown device id type for subfunction.
> > So reserve some bytes then. 4 should be plenty.
> 
I am not comfortable reserving 4 bytes for sf, though it is a good option and has already been in use in one OS for more than a year now.

> Ok so in V2 we'll use 4 bytes as device identifier to be generic.
> 
> We can call it lid (local id) of vlid (virtio local id) ?
> 
> Are we ok with one of the above names ?
> 
I went back to rethink the structure, and I don't see a need to abstract something which is so well defined.

I see a need for the structures below. How should they be made more abstract without breaking backward compatibility and without defining them as TLVs?

struct virtio_admin_pci_vf_interrupt_config {
	/* v1 current */
	le64 property_mask; /* bit 0 valid */
	le16 vf_number;
	le16 msix_count;
};

struct virtio_admin_pci_vf_interrupt_config {
	/* v2 near future, backward compatible */
	le64 property_mask; /* bit 0, 1 valid */
	le16 vf_number;
	le16 msix_count;
	le16 ims_count;
};

struct virtio_admin_pci_sf_interrupt_config {
	/* v3 future, new struct, no need of backward compat */
	le64 property_mask; /* bit 0,1,2, valid */
	le32 sf_number;
	/* is 4 bytes enough to describe sf,
	 * what if community decides uuid to identify sf?
	 * How about we take out device identifier outside of this struct?
	 */
	le16 msix_count;
	le16 ims_count;
	le16 pci_caps; /* pci atomics enable */
};

struct virtio_unknown_transport_interrupt_config {
	/* vX future */
	le64 property_mask;
	<unknown len> device identifier;
	le16 msix_count;
	le16 ims_count;
};
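
One possible reading of how `property_mask` keeps the v1 and v2 layouts compatible, sketched under assumptions: bit 0 is taken to mean "msix_count is valid" and bit 1 "ims_count is valid" (the structs above only say which bits are valid, not what each bit means). The macro and function names are hypothetical, and fields are shown in host order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PROP_MSIX_COUNT (1ULL << 0) /* assumed meaning of bit 0 */
#define PROP_IMS_COUNT  (1ULL << 1) /* assumed meaning of bit 1 */

/* Host-order view of the v2 layout; a v1 sender never sets bit 1. */
struct vf_interrupt_config {
	uint64_t property_mask;
	uint16_t vf_number;
	uint16_t msix_count;
	uint16_t ims_count;
};

/* Properties that a receiver of the given version understands. */
uint64_t supported_props(int version)
{
	return version >= 2 ? (PROP_MSIX_COUNT | PROP_IMS_COUNT) : PROP_MSIX_COUNT;
}

/* Accept a command only if every requested property is understood. */
bool config_is_acceptable(const struct vf_interrupt_config *cfg, int version)
{
	return (cfg->property_mask & ~supported_props(version)) == 0;
}
```

Under this reading, a v2 receiver accepts everything a v1 sender can produce, while a v1 receiver can detect and reject a request that relies on `ims_count`.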

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19 12:21                   ` Parav Pandit
@ 2022-01-19 14:47                     ` Max Gurtovoy
  2022-01-19 15:38                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-19 14:47 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: virtio-comment@lists.oasis-open.org, cohuck@redhat.com,
	virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
	Shahaf Shuler, Oren Duer, stefanha@redhat.com


On 1/19/2022 2:21 PM, Parav Pandit wrote:
>> From: Max Gurtovoy <mgurtovoy@nvidia.com>
>> Sent: Wednesday, January 19, 2022 5:04 PM
> [..]
>>>>>> struct virtio_admin_pci_virt_property_set {
>>>>>> 	enum virtio_device_identifier_type type; /* pci pf, pci vf,
>>>>>> subfunction
>>>>> */
>>>>>> 	union virtio_device_identifier {
>>>>>> 		struct virtio_pci_dev_id pf_vf; /* current */
>>>>>> 		struct virtio_subfunction sf; /* future */
>>>>>> 	};
>>>>>> 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
>>>>> specific, intx, something else */
>>>>>> 	union virtio_interrupt_config {
>>>>>> 		struct virtio_pci_msix_config msix_config;
>>>>>> 	};
>>>>>> };
>>>>>>
>>>>>> struct virtio_pci_interrupt_config {
>>>>>> 	le16 msix_count;
>>>>>> };
>>>>> you do not need a union straight away, Simply use something like
>>>>> this "device identifier" everywhere and then add some text
>>>>> explaining that currently it is a VF number and that admin device is a PF.
>>>> Unless we reserve some bytes, I fail to see how can it be future compatible
>> for unknown device id type for subfunction.
>>> So reserve some bytes then. 4 should be plenty.
> I am not comfortable reserving 4 bytes for sf, though it is good option and already in use in one OS for more a year now.
>
>> Ok so in V2 we'll use 4 bytes as device identifier to be generic.
>>
>> We can call it lid (local id) of vlid (virtio local id) ?
>>
>> Are we ok with one of the above names ?
>>
> I go back to rethink the structure, and don’t see a need to abstract something which is so well defined.
>
> I see need of below structures, how should it be made more abstract without breaking backward compat and without defining as TLV.

I agree, I don't see why it's not possible to use a different command 
opcode for vf interrupt configuration and sf interrupt configuration.

We're not short of opcodes, and it's very elegant and extensible IMO.

I think the order should be:

1. add adminq to virtio spec with one simple example (say MSIX config 
for VFs)

2. in parallel, submit further admin commands: S-IOV support, num VQs
config, feature bits config and more...

The example below shows that the adminq protocol is flexible and
easily extensible.

>
> struct virtio_admin_pci_vf_interrupt_config {
> 	/* v1 current */
> 	le64 property_mask; /* bit 0 valid */
> 	le16 vf_number;
> 	le16 msix_count;
> };
>
> struct virtio_admin_pci_vf_interrupt_config {
> 	/* v2 near future, backward compatible */
> 	le64 property_mask; /* bit 0, 1 valid */
> 	le16 vf_number;
> 	le16 msix_count;
> 	le16 ims_count;
> };
>
> struct virtio_admin_pci_sf_interrupt_config {
> 	/* v3 future, new struct, no need of backward compat */
> 	le64 property_mask; /* bit 0,1,2, valid */
> 	le32 sf_number;
> 	/* is 4 bytes enough to describe sf,
> 	 * what if community decides uuid to identify sf?
> 	 * How about we take out device identifier outside of this struct?
> 	 */
> 	le16 msix_count;
> 	le16 ims_count;
> 	le16 pci_caps; /* pci atomics enable */
> };
>
> virtio_unknown_transport_interrupt_config {
> 	/* vX future */
> 	le64 property_mask;
> 	<unknown len> device identifier;
> 	le16 msix_count;
> 	le16 ims_count;
> };


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19 14:47                     ` Max Gurtovoy
@ 2022-01-19 15:38                       ` Michael S. Tsirkin
  2022-01-19 15:47                         ` Max Gurtovoy
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19 15:38 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Parav Pandit, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Wed, Jan 19, 2022 at 04:47:10PM +0200, Max Gurtovoy wrote:
> I agree, I don't see why it's not possible to use a different command opcode
> for vf interrupt configuration and sf interrupt configuration.
> 
> We're not short in opcodes and it's very elegant and extendable IMO.
> 
> I think the order should be:
> 
> 1. add adminq to virtio spec with one simple example (say MSIX config for
> VFs)
> 
> 2. in parallel submission for admin commands: S-IOV support, num VQs config,
> feature bits config and more...

Up to you for sure but didn't you guys try this already?  I think there
are concerns such as how this will be extended to support subfunctions.
I can't say what you want to do about that in v2, but ignoring them
completely is probably not a good way to get more support in the TC.
Just my two cents, hope this helps.

-- 
MST


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19 15:38                       ` Michael S. Tsirkin
@ 2022-01-19 15:47                         ` Max Gurtovoy
  0 siblings, 0 replies; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-19 15:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com


On 1/19/2022 5:38 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 19, 2022 at 04:47:10PM +0200, Max Gurtovoy wrote:
>> I agree, I don't see why it's not possible to use a different command opcode
>> for vf interrupt configuration and sf interrupt configuration.
>>
>> We're not short in opcodes and it's very elegant and extendable IMO.
>>
>> I think the order should be:
>>
>> 1. add adminq to virtio spec with one simple example (say MSIX config for
>> VFs)
>>
>> 2. in parallel submission for admin commands: S-IOV support, num VQs config,
>> feature bits config and more...
> Up to you for sure but didn't you guys try this already?  I think there
> are concerns such as how this will be extended to support subfunctions.
> I can't say what do you want to do about that in v2, ignoring them
> completely is probably not a good way to get more support in the TC.
> Just my two cents, hope this helps.

I think that Parav demonstrated the flexibility and extensibility of
this interface.

Also, I think you mentioned that you don't expect us to implement
this solution in this series.

During this discussion we agreed that using admin commands one can
manage SRIOV and SIOV, didn't we?


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
  2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
@ 2022-01-13 14:51 ` Max Gurtovoy
  2022-01-13 15:33   ` Michael S. Tsirkin
  2022-01-13 14:51 ` [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ Max Gurtovoy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:51 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

These new features are parallel to VIRTIO_F_INDIRECT_DESC and
VIRTIO_F_IN_ORDER. Some devices might support these features only for
admin virtqueues and some might support them for both admin virtqueues
and request virtqueues or only for non-admin virtqueues. Some
optimization can be made for each type of virtqueue, thus we separate
these features for the different virtqueue types.

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 content.tex     | 47 +++++++++++++++++++++++++++++++++++++++--------
 packed-ring.tex | 26 +++++++++++++-------------
 split-ring.tex  | 35 +++++++++++++++++++++++------------
 3 files changed, 75 insertions(+), 33 deletions(-)

diff --git a/content.tex b/content.tex
index c524fab..cc3e648 100644
--- a/content.tex
+++ b/content.tex
@@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
 \begin{description}
 \item[0 to 23] Feature bits for the specific device type
 
-\item[24 to 41] Feature bits reserved for extensions to the queue and
+\item[24 to 43] Feature bits reserved for extensions to the queue and
   feature negotiation mechanisms
 
-\item[42 and above] Feature bits reserved for future extensions.
+\item[44 and above] Feature bits reserved for future extensions.
 \end{description}
 
 \begin{note}
@@ -318,8 +318,9 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
 
 Some devices always use descriptors in the same order in which
 they have been made available. These devices can offer the
-VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
-might allow optimizations or simplify driver and/or device code.
+VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
+If negotiated, this knowledge might allow optimizations or
+simplify driver and/or device code.
 
 Each virtqueue can consist of up to 3 parts:
 \begin{itemize}
@@ -6768,7 +6769,7 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
 Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
 Virtqueues / The Virtqueue Descriptor Table / Indirect
-Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}.
+Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This excludes the descriptors sent via the admin virtqueue.
   \item[VIRTIO_F_EVENT_IDX(29)] This feature enables the \field{used_event}
   and the \field{avail_event} fields as described in
 \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}, \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} and \ref{sec:Packed Virtqueues / Driver and Device Event Suppression}.
@@ -6800,8 +6801,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   support for the packed virtqueue layout as described in
   \ref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}.
   \item[VIRTIO_F_IN_ORDER(35)] This feature indicates
-  that all buffers are used by the device in the same
-  order in which they have been made available.
+  that all buffers are used by the device, excluding buffers used by
+  the admin virtqueue, in the same order in which they have been made
+  available.
   \item[VIRTIO_F_ORDER_PLATFORM(36)] This feature indicates
   that memory accesses by the driver and the device are ordered
   in a way described by the platform.
@@ -6852,6 +6854,18 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
   the device supports administration virtqueue negotiation.
 
+  \item[VIRTIO_F_ADMIN_VQ_INDIRECT_DESC (42)] Negotiating this feature
+  indicates that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
+  flag set, as described in \ref{sec:Basic Facilities of a Virtio
+Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
+Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
+Virtqueues / The Virtqueue Descriptor Table / Indirect
+Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This refers to descriptors sent via the admin
+  virtqueue and excludes the descriptors that are sent via other virtqueues.
+  \item[VIRTIO_F_ADMIN_VQ_IN_ORDER (43)] This feature indicates
+  that all buffers are used by the admin virtqueue of the device in
+  the same order in which they have been made available.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -6888,6 +6902,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 A driver SHOULD accept VIRTIO_F_NOTIF_CONFIG_DATA if it is offered.
 
+A driver MAY accept VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it accepts
+VIRTIO_F_ADMIN_VQ.
+
+A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
+VIRTIO_F_ADMIN_VQ.
+
 \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
@@ -6902,7 +6922,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 accepted.
 
 If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
-buffers in the same order in which they have been available.
+buffers in the same order in which they have been available. This refers
+to buffers that are used by any virtqueue other than the admin virtqueue.
+
+If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, a device MUST use
+buffers in the same order in which they have been available. This refers
+only to buffers that are used by the admin virtqueue.
 
 A device MAY fail to operate further if
 VIRTIO_F_ORDER_PLATFORM is offered but not accepted.
@@ -6917,6 +6942,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 and presents a PCI SR-IOV capability structure, otherwise
 it MUST NOT offer VIRTIO_F_SR_IOV.
 
+A device MAY offer VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it offers
+VIRTIO_F_ADMIN_VQ.
+
+A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
+VIRTIO_F_ADMIN_VQ.
+
 \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
 
 Transitional devices MAY offer the following:
diff --git a/packed-ring.tex b/packed-ring.tex
index a9e6c16..ef1dbc2 100644
--- a/packed-ring.tex
+++ b/packed-ring.tex
@@ -240,13 +240,12 @@ \subsection{Indirect Flag: Scatter-Gather Support}
 \label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}
 
 Some devices benefit by concurrently dispatching a large number
-of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase
-ring capacity the driver can store a (read-only by the device) table of indirect
-descriptors anywhere in memory, and insert a descriptor in the main
-virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to
-a buffer element
-containing this indirect descriptor table; \field{addr} and \field{len}
-refer to the indirect table address and length in bytes,
+of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
+features allow this. To increase ring capacity the driver can store a (read-only
+by the device) table of indirect descriptors anywhere in memory, and insert a
+descriptor in the main virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on)
+that refers to a buffer element containing this indirect descriptor table;
+\field{addr} and \field{len} refer to the indirect table address and length in bytes,
 respectively.
 \begin{lstlisting}
 /* This means the element contains a table of descriptors. */
@@ -279,10 +278,11 @@ \subsection{In-order use of descriptors}
 
 Some devices always use descriptors in the same order in which
 they have been made available. These devices can offer the
-VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
-devices to notify the use of a batch of buffers to the driver by
-only writing out a single used descriptor with the Buffer ID
-corresponding to the last descriptor in the batch.
+VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
+If negotiated, this knowledge allows devices to notify the use of
+a batch of buffers to the driver by only writing out a single used
+descriptor with the Buffer ID corresponding to the last descriptor
+in the batch.
 
 The device then skips forward in the ring according to the size of
 the batch. The driver needs to look up the used Buffer ID and
@@ -500,8 +500,8 @@ \subsection{Event Suppression Structure Format}\label{sec:Basic
 
 \drivernormative{\subsection}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 The driver MUST NOT set the DESC_F_INDIRECT flag unless the
-VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
-set any flags except DESC_F_WRITE within an indirect descriptor.
+VIRTIO_F_INDIRECT_DESC or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
+The driver MUST NOT set any flags except DESC_F_WRITE within an indirect descriptor.
 
 A driver MUST NOT create a descriptor chain longer than allowed
 by the device.
diff --git a/split-ring.tex b/split-ring.tex
index de94038..cd53840 100644
--- a/split-ring.tex
+++ b/split-ring.tex
@@ -208,6 +208,10 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
 descriptors in ring order: starting from offset 0 in the table,
 and wrapping around at the end of the table.
 
+If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, driver uses
+descriptors in admin virtqueue ring order: starting from offset 0 in the
+table, and wrapping around at the end of the table.
+
 \begin{note}
 The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
 referred to this structure as vring_desc, and the constants as
@@ -223,16 +227,18 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
 Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
 this implies that loops in the descriptor chain are forbidden!
 
-If VIRTIO_F_IN_ORDER has been negotiated, and when making a
-descriptor with VRING_DESC_F_NEXT set in \field{flags} at offset
-$x$ in the table available to the device, driver MUST set
+If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
+and when making a descriptor with VRING_DESC_F_NEXT set in \field{flags} at
+offset $x$ in the table available to the device, driver MUST set
 \field{next} to $0$ for the last descriptor in the table
 (where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
+This applies to admin virtqueue descriptors and to the descriptors of all other virtqueue types, respectively.
 
 \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 
 Some devices benefit by concurrently dispatching a large number
-of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
+of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
+features allow this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
 ring capacity the driver can store a table of indirect
 descriptors anywhere in memory, and insert a descriptor in main
 virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to memory buffer
@@ -258,15 +264,19 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
 A single indirect descriptor
 table can include both device-readable and device-writable descriptors.
 
-If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
-use sequential indices, in-order: index 0 followed by index 1
+If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors,
+for non admin virtqueue, use sequential indices, in-order: index 0 followed
+by index 1 followed by index 2, etc.
+
+If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, admin virtqueue indirect
+descriptors use sequential indices, in-order: index 0 followed by index 1
 followed by index 2, etc.
 
 \drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
-VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
-set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (ie. only
-one table per descriptor).
+VIRTIO_F_INDIRECT_DESC and/or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
+The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag within an indirect
+descriptor (ie. only one table per descriptor).
 
 A driver MUST NOT create a descriptor chain longer than the Queue Size of
 the device.
@@ -274,9 +284,10 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
 A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
 in \field{flags}.
 
-If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
-MUST appear sequentially, with \field{next} taking the value
-of 1 for the 1st descriptor, 2 for the 2nd one, etc.
+If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
+indirect descriptors MUST appear sequentially, with \field{next} taking the
+value of 1 for the 1st descriptor, 2 for the 2nd one, etc., for the admin virtqueue
+and all other virtqueue types, respectively.
 
 \devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
-- 
2.21.0
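
For readers following the last hunks, here is a minimal C sketch of the driver-side rules for indirect tables in a split ring: entries are linked sequentially when the relevant in-order feature applies, and no entry inside the table carries VIRTQ_DESC_F_INDIRECT. The flag values are the ones defined for split virtqueue descriptors. The struct is a host-endian stand-in, and the helper names `indirect_link_in_order` and `indirect_table_ok` are hypothetical, not part of the patch. The caller is assumed to pass `in_order` according to the queue type (VIRTIO_F_ADMIN_VQ_IN_ORDER for the admin virtqueue, VIRTIO_F_IN_ORDER otherwise).

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined for split virtqueue descriptors. */
#define VIRTQ_DESC_F_NEXT     1
#define VIRTQ_DESC_F_WRITE    2
#define VIRTQ_DESC_F_INDIRECT 4

/* Host-endian stand-in for struct virtq_desc (the spec uses le64/le32/le16). */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Hypothetical helper: chain the n entries of an indirect table
 * sequentially, so entry i points at entry i + 1 and the last
 * entry terminates the chain. */
static void indirect_link_in_order(struct desc *tbl, uint16_t n)
{
    for (uint16_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            tbl[i].flags |= VIRTQ_DESC_F_NEXT;
            tbl[i].next = (uint16_t)(i + 1);
        } else {
            tbl[i].flags &= (uint16_t)~VIRTQ_DESC_F_NEXT;
            tbl[i].next = 0;
        }
    }
}

/* Hypothetical check: no nested indirect tables, every next index
 * stays inside the table, and, if in_order is set, each chained
 * entry points at its immediate successor. Loop detection is out
 * of scope for this sketch. */
static bool indirect_table_ok(const struct desc *tbl, uint16_t n, bool in_order)
{
    if (n == 0)
        return false;
    for (uint16_t i = 0; i < n; i++) {
        bool has_next = (tbl[i].flags & VIRTQ_DESC_F_NEXT) != 0;

        if (tbl[i].flags & VIRTQ_DESC_F_INDIRECT)
            return false;               /* only one table per descriptor */
        if (has_next && tbl[i].next >= n)
            return false;               /* next must stay inside the table */
        if (in_order && has_next && tbl[i].next != i + 1)
            return false;               /* sequential: 1, 2, 3, ... */
    }
    return true;
}
```

A table that chains entry 0 directly to entry 2 passes the check when `in_order` is false and fails it when `in_order` is true.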



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 14:51 ` [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER Max Gurtovoy
@ 2022-01-13 15:33   ` Michael S. Tsirkin
  2022-01-13 17:07     ` Max Gurtovoy
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 15:33 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
> These new features are parallel to VIRTIO_F_INDIRECT_DESC and
> VIRTIO_F_IN_ORDER. Some devices might support these features only for
> admin virtqueues and some might support them for both admin virtqueues
> and request virtqueues or only for non-admin virtqueues. Some
> optimization can be made for each type of virtqueue, thus we separate
> these features for the different virtqueue types.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

That seems vague as motivation.
Why do we need to optimize admin queues? Aren't they
fundamentally a control path feature?
Why would we want to special-case these features specifically?
Should we allow control of features per VQ generally?


> ---
>  content.tex     | 47 +++++++++++++++++++++++++++++++++++++++--------
>  packed-ring.tex | 26 +++++++++++++-------------
>  split-ring.tex  | 35 +++++++++++++++++++++++------------
>  3 files changed, 75 insertions(+), 33 deletions(-)
> 
> diff --git a/content.tex b/content.tex
> index c524fab..cc3e648 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23] Feature bits for the specific device type
>  
> -\item[24 to 41] Feature bits reserved for extensions to the queue and
> +\item[24 to 43] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms
>  
> -\item[42 and above] Feature bits reserved for future extensions.
> +\item[44 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -318,8 +318,9 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
>  
>  Some devices always use descriptors in the same order in which
>  they have been made available. These devices can offer the
> -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> -might allow optimizations or simplify driver and/or device code.
> +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
> +If negotiated, this knowledge might allow optimizations or
> +simplify driver and/or device code.
>  
>  Each virtqueue can consist of up to 3 parts:
>  \begin{itemize}
> @@ -6768,7 +6769,7 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
>  Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
>  Virtqueues / The Virtqueue Descriptor Table / Indirect
> -Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}.
> +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This excluding the descriptors sent via the admin virtqueue.
>    \item[VIRTIO_F_EVENT_IDX(29)] This feature enables the \field{used_event}
>    and the \field{avail_event} fields as described in
>  \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}, \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} and \ref{sec:Packed Virtqueues / Driver and Device Event Suppression}.
> @@ -6800,8 +6801,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    support for the packed virtqueue layout as described in
>    \ref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}.
>    \item[VIRTIO_F_IN_ORDER(35)] This feature indicates
> -  that all buffers are used by the device in the same
> -  order in which they have been made available.
> +  that all buffers are used by the device, excluding buffers used by
> +  the admin virtqueue, in the same order in which they have been made
> +  available.
>    \item[VIRTIO_F_ORDER_PLATFORM(36)] This feature indicates
>    that memory accesses by the driver and the device are ordered
>    in a way described by the platform.
> @@ -6852,6 +6854,18 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
>    the device supports administration virtqueue negotiation.
>  
> +  \item[VIRTIO_F_ADMIN_VQ_INDIRECT_DESC (42)] Negotiating this feature
> +  indicates that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> +  flag set, as described in \ref{sec:Basic Facilities of a Virtio
> +Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
> +Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
> +Virtqueues / The Virtqueue Descriptor Table / Indirect
> +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This refers to descriptors sent via the admin
> +  virtqueue and excluding the descriptors that sent via other virtqueues.
> +  \item[VIRTIO_F_ADMIN_VQ_IN_ORDER (43)] This feature indicates
> +  that all buffers are used by the admin virtqueue of the device in
> +  the same order in which they have been made available.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> @@ -6888,6 +6902,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
>  A driver SHOULD accept VIRTIO_F_NOTIF_CONFIG_DATA if it is offered.
>  
> +A driver MAY accept VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it accepts
> +VIRTIO_F_ADMIN_VQ.
> +
> +A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
> +VIRTIO_F_ADMIN_VQ.
> +
>  \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>  
>  A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
> @@ -6902,7 +6922,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  accepted.
>  
>  If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> -buffers in the same order in which they have been available.
> +buffers in the same order in which they have been available. This refers
> +to buffers that are used by virtqueue that is not the admin virtqueue.
> +
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, a device MUST use
> +buffers in the same order in which they have been available. This refers
> +only for buffers that are used by the admin virtqueue.
>  
>  A device MAY fail to operate further if
>  VIRTIO_F_ORDER_PLATFORM is offered but not accepted.
> @@ -6917,6 +6942,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  and presents a PCI SR-IOV capability structure, otherwise
>  it MUST NOT offer VIRTIO_F_SR_IOV.
>  
> +A device MAY offer VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it offers
> +VIRTIO_F_ADMIN_VQ.
> +
> +A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
> +VIRTIO_F_ADMIN_VQ.
> +
>  \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
>  
>  Transitional devices MAY offer the following:
> diff --git a/packed-ring.tex b/packed-ring.tex
> index a9e6c16..ef1dbc2 100644
> --- a/packed-ring.tex
> +++ b/packed-ring.tex
> @@ -240,13 +240,12 @@ \subsection{Indirect Flag: Scatter-Gather Support}
>  \label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}
>  
>  Some devices benefit by concurrently dispatching a large number
> -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase
> -ring capacity the driver can store a (read-only by the device) table of indirect
> -descriptors anywhere in memory, and insert a descriptor in the main
> -virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to
> -a buffer element
> -containing this indirect descriptor table; \field{addr} and \field{len}
> -refer to the indirect table address and length in bytes,
> +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
> +features allows this. To increase ring capacity the driver can store a (read-only
> +by the device) table of indirect descriptors anywhere in memory, and insert a
> +descriptor in the main virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on)
> +that refers to a buffer element containing this indirect descriptor table;
> +\field{addr} and \field{len} refer to the indirect table address and length in bytes,
>  respectively.
>  \begin{lstlisting}
>  /* This means the element contains a table of descriptors. */
> @@ -279,10 +278,11 @@ \subsection{In-order use of descriptors}
>  
>  Some devices always use descriptors in the same order in which
>  they have been made available. These devices can offer the
> -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
> -devices to notify the use of a batch of buffers to the driver by
> -only writing out a single used descriptor with the Buffer ID
> -corresponding to the last descriptor in the batch.
> +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
> +If negotiated, this knowledge allows devices to notify the use of
> +a batch of buffers to the driver by only writing out a single used
> +descriptor with the Buffer ID corresponding to the last descriptor
> +in the batch.
>  
>  The device then skips forward in the ring according to the size of
>  the batch. The driver needs to look up the used Buffer ID and
> @@ -500,8 +500,8 @@ \subsection{Event Suppression Structure Format}\label{sec:Basic
>  
>  \drivernormative{\subsection}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>  The driver MUST NOT set the DESC_F_INDIRECT flag unless the
> -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
> -set any flags except DESC_F_WRITE within an indirect descriptor.
> +VIRTIO_F_INDIRECT_DESC or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
> +The driver MUST NOT set any flags except DESC_F_WRITE within an indirect descriptor.
>  
>  A driver MUST NOT create a descriptor chain longer than allowed
>  by the device.
> diff --git a/split-ring.tex b/split-ring.tex
> index de94038..cd53840 100644
> --- a/split-ring.tex
> +++ b/split-ring.tex
> @@ -208,6 +208,10 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
>  descriptors in ring order: starting from offset 0 in the table,
>  and wrapping around at the end of the table.
>  
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, driver uses
> +descriptors in admin virtqueue ring order: starting from offset 0 in the
> +table, and wrapping around at the end of the table.
> +
>  \begin{note}
>  The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
>  referred to this structure as vring_desc, and the constants as
> @@ -223,16 +227,18 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
>  Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
>  this implies that loops in the descriptor chain are forbidden!
>  
> -If VIRTIO_F_IN_ORDER has been negotiated, and when making a
> -descriptor with VRING_DESC_F_NEXT set in \field{flags} at offset
> -$x$ in the table available to the device, driver MUST set
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
> +and when making a descriptor with VRING_DESC_F_NEXT set in \field{flags} at
> +offset $x$ in the table available to the device, driver MUST set
>  \field{next} to $0$ for the last descriptor in the table
>  (where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
> +This refers to admin virtqueue descriptors and rest other virtqueues types descriptors respectively.
>  
>  \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>  
>  Some devices benefit by concurrently dispatching a large number
> -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
> +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
> +features allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
>  ring capacity the driver can store a table of indirect
>  descriptors anywhere in memory, and insert a descriptor in main
>  virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to memory buffer
> @@ -258,15 +264,19 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
>  A single indirect descriptor
>  table can include both device-readable and device-writable descriptors.
>  
> -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
> -use sequential indices, in-order: index 0 followed by index 1
> +If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors,
> +for non admin virtqueue, use sequential indices, in-order: index 0 followed
> +by index 1 followed by index 2, etc.
> +
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, admin virtqueue indirect
> +descriptors use sequential indices, in-order: index 0 followed by index 1
>  followed by index 2, etc.
>  
>  \drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>  The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
> -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
> -set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (ie. only
> -one table per descriptor).
> +VIRTIO_F_INDIRECT_DESC and/or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
> +The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag within an indirect
> +descriptor (ie. only one table per descriptor).
>  
>  A driver MUST NOT create a descriptor chain longer than the Queue Size of
>  the device.
> @@ -274,9 +284,10 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
>  A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
>  in \field{flags}.
>  
> -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
> -MUST appear sequentially, with \field{next} taking the value
> -of 1 for the 1st descriptor, 2 for the 2nd one, etc.
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
> +indirect descriptors MUST appear sequentially, with \field{next} taking the
> +value of 1 for the 1st descriptor, 2 for the 2nd one, etc for admin virtqueue
> +and rest other virtqueues types respectively.
>  
>  \devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>  The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
> -- 
> 2.21.0



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 15:33   ` Michael S. Tsirkin
@ 2022-01-13 17:07     ` Max Gurtovoy
  2022-01-13 17:25       ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-13 17:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha


On 1/13/2022 5:33 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
>> These new features are parallel to VIRTIO_F_INDIRECT_DESC and
>> VIRTIO_F_IN_ORDER. Some devices might support these features only for
>> admin virtqueues and some might support them for both admin virtqueues
>> and request virtqueues or only for non-admin virtqueues. Some
>> optimization can be made for each type of virtqueue, thus we separate
>> these features for the different virtqueue types.
>>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> That seems vague as motivation.
> Why do we need to optimize admin queues? Aren't they
> fundamentally a control path feature?
> Why would we want to special-case these features specifically?
> Should we allow control of features per VQ generally?

We would like to allow executing admin commands out of order and IO 
requests in order for efficiency.

And also the other way around.

IO cmds and admin cmds have different considerations in many cases.
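
To make the split concrete, the following is a rough C sketch of how a driver could pick the governing feature bit per queue type under this proposal. Bit numbers 41 to 43 are the ones in this patch, and 28 and 35 are the existing INDIRECT_DESC and IN_ORDER bits. The helper names (`has_feature`, `features_consistent`, `vq_in_order`, `vq_may_use_indirect`) are made up for illustration and do not come from any existing driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_INDIRECT_DESC          28
#define VIRTIO_F_IN_ORDER               35
#define VIRTIO_F_ADMIN_VQ               41
#define VIRTIO_F_ADMIN_VQ_INDIRECT_DESC 42 /* proposed in this patch */
#define VIRTIO_F_ADMIN_VQ_IN_ORDER      43 /* proposed in this patch */

/* Test a single bit in a 64-bit feature set. */
static bool has_feature(uint64_t features, unsigned bit)
{
    return ((features >> bit) & 1) != 0;
}

/* The admin-VQ-specific bits are only meaningful together with
 * VIRTIO_F_ADMIN_VQ, per the normative statements in the patch. */
static bool features_consistent(uint64_t features)
{
    bool admin = has_feature(features, VIRTIO_F_ADMIN_VQ);

    if (has_feature(features, VIRTIO_F_ADMIN_VQ_INDIRECT_DESC) && !admin)
        return false;
    if (has_feature(features, VIRTIO_F_ADMIN_VQ_IN_ORDER) && !admin)
        return false;
    return true;
}

/* Does in-order buffer use apply to this queue? */
static bool vq_in_order(uint64_t features, bool is_admin_vq)
{
    return has_feature(features, is_admin_vq ? VIRTIO_F_ADMIN_VQ_IN_ORDER
                                             : VIRTIO_F_IN_ORDER);
}

/* May the driver set the INDIRECT flag on this queue? */
static bool vq_may_use_indirect(uint64_t features, bool is_admin_vq)
{
    return has_feature(features, is_admin_vq ? VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
                                             : VIRTIO_F_INDIRECT_DESC);
}
```

With VIRTIO_F_IN_ORDER and VIRTIO_F_ADMIN_VQ negotiated but VIRTIO_F_ADMIN_VQ_IN_ORDER not, `vq_in_order()` returns true for IO queues and false for the admin queue, which is the "IO in order, admin out of order" case.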




* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 17:07     ` Max Gurtovoy
@ 2022-01-13 17:25       ` Michael S. Tsirkin
  2022-01-17 13:59         ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 17:25 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 07:07:53PM +0200, Max Gurtovoy wrote:
> 
> On 1/13/2022 5:33 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
> > > These new features are parallel to VIRTIO_F_INDIRECT_DESC and
> > > VIRTIO_F_IN_ORDER. Some devices might support these features only for
> > > admin virtqueues and some might support them for both admin virtqueues
> > > and request virtqueues or only for non-admin virtqueues. Some
> > > optimization can be made for each type of virtqueue, thus we separate
> > > these features for the different virtqueue types.
> > > 
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > That seems vague as motivation.
> > Why do we need to optimize admin queues? Aren't they
> > fundamentally a control path feature?
> > Why would we want to special-case these features specifically?
> > Should we allow control of features per VQ generally?
> 
> We would like to allow executing admins commands out of order and IO
> requests in order for efficiency.

It's a control queue. Why do we worry?


> 
> And also the other way around.

What exactly does this mean?

> IO cmds and admin cmds have different considerations in many cases.

That's still pretty vague. So do other types of VQ, such as RX/TX.

E.g. I can see how a hardware vendor might want to avoid supporting
indirect with RX for virtio net with mergeable buffers, but still
support it for TX.


I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
I think you want to reorder admin commands dealing with
unrelated VFs but keep IO VQs in order for speed.
I'm just guessing; you should spell the real motivation out.
However, I think a better way to do that is to finalize the
VIRTIO_F_PARTIAL_ORDER proposal from August.
Please review it and let me know. If there's finally a use for it,
I'll prioritize finalizing that idea.
I don't see much point in tweaking INDIRECT at all.



> > 
> > 
> > > ---
> > >   content.tex     | 47 +++++++++++++++++++++++++++++++++++++++--------
> > >   packed-ring.tex | 26 +++++++++++++-------------
> > >   split-ring.tex  | 35 +++++++++++++++++++++++------------
> > >   3 files changed, 75 insertions(+), 33 deletions(-)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index c524fab..cc3e648 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
> > >   \begin{description}
> > >   \item[0 to 23] Feature bits for the specific device type
> > > -\item[24 to 41] Feature bits reserved for extensions to the queue and
> > > +\item[24 to 43] Feature bits reserved for extensions to the queue and
> > >     feature negotiation mechanisms
> > > -\item[42 and above] Feature bits reserved for future extensions.
> > > +\item[44 and above] Feature bits reserved for future extensions.
> > >   \end{description}
> > >   \begin{note}
> > > @@ -318,8 +318,9 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
> > >   Some devices always use descriptors in the same order in which
> > >   they have been made available. These devices can offer the
> > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > -might allow optimizations or simplify driver and/or device code.
> > > +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
> > > +If negotiated, this knowledge might allow optimizations or
> > > +simplify driver and/or device code.
> > >   Each virtqueue can consist of up to 3 parts:
> > >   \begin{itemize}
> > > @@ -6768,7 +6769,7 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
> > >   Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
> > >   Virtqueues / The Virtqueue Descriptor Table / Indirect
> > > -Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}.
> > > +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This excluding the descriptors sent via the admin virtqueue.
> > >     \item[VIRTIO_F_EVENT_IDX(29)] This feature enables the \field{used_event}
> > >     and the \field{avail_event} fields as described in
> > >   \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}, \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} and \ref{sec:Packed Virtqueues / Driver and Device Event Suppression}.
> > > @@ -6800,8 +6801,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >     support for the packed virtqueue layout as described in
> > >     \ref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}.
> > >     \item[VIRTIO_F_IN_ORDER(35)] This feature indicates
> > > -  that all buffers are used by the device in the same
> > > -  order in which they have been made available.
> > > +  that all buffers are used by the device, excluding buffers used by
> > > +  the admin virtqueue, in the same order in which they have been made
> > > +  available.
> > >     \item[VIRTIO_F_ORDER_PLATFORM(36)] This feature indicates
> > >     that memory accesses by the driver and the device are ordered
> > >     in a way described by the platform.
> > > @@ -6852,6 +6854,18 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >     \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
> > >     the device supports administration virtqueue negotiation.
> > > +  \item[VIRTIO_F_ADMIN_VQ_INDIRECT_DESC (42)] Negotiating this feature
> > > +  indicates that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> > > +  flag set, as described in \ref{sec:Basic Facilities of a Virtio
> > > +Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
> > > +Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
> > > +Virtqueues / The Virtqueue Descriptor Table / Indirect
> > > +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This refers to descriptors sent via the admin
> > > +  virtqueue and excluding the descriptors that sent via other virtqueues.
> > > +  \item[VIRTIO_F_ADMIN_VQ_IN_ORDER (43)] This feature indicates
> > > +  that all buffers are used by the admin virtqueue of the device in
> > > +  the same order in which they have been made available.
> > > +
> > >   \end{description}
> > >   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > > @@ -6888,6 +6902,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   A driver SHOULD accept VIRTIO_F_NOTIF_CONFIG_DATA if it is offered.
> > > +A driver MAY accept VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it accepts
> > > +VIRTIO_F_ADMIN_VQ.
> > > +
> > > +A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
> > > +VIRTIO_F_ADMIN_VQ.
> > > +
> > >   \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > >   A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
> > > @@ -6902,7 +6922,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   accepted.
> > >   If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> > > -buffers in the same order in which they have been available.
> > > +buffers in the same order in which they have been available. This refers
> > > +to buffers that are used by virtqueue that is not the admin virtqueue.
> > > +
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, a device MUST use
> > > +buffers in the same order in which they have been available. This refers
> > > +only for buffers that are used by the admin virtqueue.
> > >   A device MAY fail to operate further if
> > >   VIRTIO_F_ORDER_PLATFORM is offered but not accepted.
> > > @@ -6917,6 +6942,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   and presents a PCI SR-IOV capability structure, otherwise
> > >   it MUST NOT offer VIRTIO_F_SR_IOV.
> > > +A device MAY offer VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it offers
> > > +VIRTIO_F_ADMIN_VQ.
> > > +
> > > +A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
> > > +VIRTIO_F_ADMIN_VQ.
> > > +
> > >   \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
> > >   Transitional devices MAY offer the following:
> > > diff --git a/packed-ring.tex b/packed-ring.tex
> > > index a9e6c16..ef1dbc2 100644
> > > --- a/packed-ring.tex
> > > +++ b/packed-ring.tex
> > > @@ -240,13 +240,12 @@ \subsection{Indirect Flag: Scatter-Gather Support}
> > >   \label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}
> > >   Some devices benefit by concurrently dispatching a large number
> > > -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase
> > > -ring capacity the driver can store a (read-only by the device) table of indirect
> > > -descriptors anywhere in memory, and insert a descriptor in the main
> > > -virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to
> > > -a buffer element
> > > -containing this indirect descriptor table; \field{addr} and \field{len}
> > > -refer to the indirect table address and length in bytes,
> > > +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
> > > +features allows this. To increase ring capacity the driver can store a (read-only
> > > +by the device) table of indirect descriptors anywhere in memory, and insert a
> > > +descriptor in the main virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on)
> > > +that refers to a buffer element containing this indirect descriptor table;
> > > +\field{addr} and \field{len} refer to the indirect table address and length in bytes,
> > >   respectively.
> > >   \begin{lstlisting}
> > >   /* This means the element contains a table of descriptors. */
> > > @@ -279,10 +278,11 @@ \subsection{In-order use of descriptors}
> > >   Some devices always use descriptors in the same order in which
> > >   they have been made available. These devices can offer the
> > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
> > > -devices to notify the use of a batch of buffers to the driver by
> > > -only writing out a single used descriptor with the Buffer ID
> > > -corresponding to the last descriptor in the batch.
> > > +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
> > > +If negotiated, this knowledge allows devices to notify the use of
> > > +a batch of buffers to the driver by only writing out a single used
> > > +descriptor with the Buffer ID corresponding to the last descriptor
> > > +in the batch.
> > >   The device then skips forward in the ring according to the size of
> > >   the batch. The driver needs to look up the used Buffer ID and
> > > @@ -500,8 +500,8 @@ \subsection{Event Suppression Structure Format}\label{sec:Basic
> > >   \drivernormative{\subsection}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
> > >   The driver MUST NOT set the DESC_F_INDIRECT flag unless the
> > > -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
> > > -set any flags except DESC_F_WRITE within an indirect descriptor.
> > > +VIRTIO_F_INDIRECT_DESC or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
> > > +The driver MUST NOT set any flags except DESC_F_WRITE within an indirect descriptor.
> > >   A driver MUST NOT create a descriptor chain longer than allowed
> > >   by the device.
> > > diff --git a/split-ring.tex b/split-ring.tex
> > > index de94038..cd53840 100644
> > > --- a/split-ring.tex
> > > +++ b/split-ring.tex
> > > @@ -208,6 +208,10 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
> > >   descriptors in ring order: starting from offset 0 in the table,
> > >   and wrapping around at the end of the table.
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, driver uses
> > > +descriptors in admin virtqueue ring order: starting from offset 0 in the
> > > +table, and wrapping around at the end of the table.
> > > +
> > >   \begin{note}
> > >   The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
> > >   referred to this structure as vring_desc, and the constants as
> > > @@ -223,16 +227,18 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
> > >   Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
> > >   this implies that loops in the descriptor chain are forbidden!
> > > -If VIRTIO_F_IN_ORDER has been negotiated, and when making a
> > > -descriptor with VRING_DESC_F_NEXT set in \field{flags} at offset
> > > -$x$ in the table available to the device, driver MUST set
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
> > > +and when making a descriptor with VRING_DESC_F_NEXT set in \field{flags} at
> > > +offset $x$ in the table available to the device, driver MUST set
> > >   \field{next} to $0$ for the last descriptor in the table
> > >   (where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
> > > +This refers to admin virtqueue descriptors and rest other virtqueues types descriptors respectively.
> > >   \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
> > >   Some devices benefit by concurrently dispatching a large number
> > > -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
> > > +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
> > > +features allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
> > >   ring capacity the driver can store a table of indirect
> > >   descriptors anywhere in memory, and insert a descriptor in main
> > >   virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to memory buffer
> > > @@ -258,15 +264,19 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
> > >   A single indirect descriptor
> > >   table can include both device-readable and device-writable descriptors.
> > > -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
> > > -use sequential indices, in-order: index 0 followed by index 1
> > > +If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors,
> > > +for non admin virtqueue, use sequential indices, in-order: index 0 followed
> > > +by index 1 followed by index 2, etc.
> > > +
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, admin virtqueue indirect
> > > +descriptors use sequential indices, in-order: index 0 followed by index 1
> > >   followed by index 2, etc.
> > >   \drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
> > >   The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
> > > -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
> > > -set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (ie. only
> > > -one table per descriptor).
> > > +VIRTIO_F_INDIRECT_DESC and/or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
> > > +The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag within an indirect
> > > +descriptor (ie. only one table per descriptor).
> > >   A driver MUST NOT create a descriptor chain longer than the Queue Size of
> > >   the device.
> > > @@ -274,9 +284,10 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
> > >   A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
> > >   in \field{flags}.
> > > -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
> > > -MUST appear sequentially, with \field{next} taking the value
> > > -of 1 for the 1st descriptor, 2 for the 2nd one, etc.
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
> > > +indirect descriptors MUST appear sequentially, with \field{next} taking the
> > > +value of 1 for the 1st descriptor, 2 for the 2nd one, etc for admin virtqueue
> > > +and rest other virtqueues types respectively.
> > >   \devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
> > >   The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
> > > -- 
> > > 2.21.0



* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 17:25       ` Michael S. Tsirkin
@ 2022-01-17 13:59         ` Parav Pandit
  2022-01-17 22:14           ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-17 13:59 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment@lists.oasis-open.org, cohuck@redhat.com,
	virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
	Shahaf Shuler, Oren Duer, stefanha@redhat.com


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, January 13, 2022 10:56 PM
> 
> On Thu, Jan 13, 2022 at 07:07:53PM +0200, Max Gurtovoy wrote:
> >
> > On 1/13/2022 5:33 PM, Michael S. Tsirkin wrote:
> > > On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
> > > > These new features are parallel to VIRTIO_F_INDIRECT_DESC and
> > > > VIRTIO_F_IN_ORDER. Some devices might support these features only
> > > > for admin virtqueues and some might support them for both admin
> > > > virtqueues and request virtqueues or only for non-admin
> > > > virtqueues. Some optimization can be made for each type of
> > > > virtqueue, thus we separate these features for the different virtqueue
> types.
> > > >
> > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > That seems vague as motivation.
> > > Why do we need to optimize admin queues? Aren't they fundamentally a
> > > control path feature?
> > > Why would we want to special-case these features specifically?
> > > Should we allow control of features per VQ generally?
> >
> > We would like to allow executing admins commands out of order and IO
> > requests in order for efficiency.
> 
> It's a control queue. Why do we worry?
It is used to control/manage the resources of a VF, which is usually deployed to a VM.
So the higher the latency, the longer it takes to deploy and start the VM.
Hence, it is better to have this basic functionality in place, since it is useful beyond MSI-X config.
It is not a functional must. But tying AQ command ordering to VIRTIO_F_IN_ORDER for now, and later driving it from a new field, requires dual handling.
It is better to start with the AQ's own ordering and a single scheme.

> 
> 
> >
> > And also the other way around.
> 
> what exactly does this mean?
> 
IO commands out of order (say, for a block device), but AQ commands in order.
Maybe AQ command execution can always be treated as out of order, even when VIRTIO_F_IN_ORDER is negotiated.
That way the design will be even simpler for the driver and the device.

> > IO cmds and admin cmds have different considerations in many cases.
> 
> That's still pretty vague.  so do other types of VQ, such as RX/TX.
> 
> E.g. I can see how a hardware vendor might want to avoid supporting indirect
> with RX for virtio net with mergeable buffers, but still support it for TX.
> 
> 
> I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
I agree. It only assures the driver that AQ commands are processed in order, so the driver doesn't need to serialize them.
But yes, the driver can always serialize them if needed when the AQ is always out of order.
I think we should word it so that the AQ is always out of order.

> I think you want to reorder admin commands dealing with unrelated VFs but
> keep io vqs in order for speed.
> Just guessing, you should spell the real motivation out.
> However, I think a better way to do that is with finalizing the
> VIRTIO_F_PARTIAL_ORDER proposal from august.
I read the partial order proposal at [1].
It still appears IN_ORDER from the driver's POV.
So I am not sure if driver can complete AQ commands out of order. Can it?
I think the data path needs more plumbing than just a PARTIAL_ORDER flag, to process descriptors differently on the tx and rx sides.
I am not sure merging the AQ into it is useful, given that we agree the AQ should behave as out of order from the beginning.

[1] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html

> Pls review and let me know. If there's finally a use for it I'll prioritize finalizing
> that idea.
> Don't see much point in tweaking INDIRECT at all.
Common negotiation of INDIRECT for the AQ and the other queues forces the data path to handle it as well.
It is better not to force the device to handle indirect descriptors on non-AQ queues just because the AQ prefers to use them.
Often the AQ and the data path queues are not handled by the same set of processing engines, given that they do different tasks.



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-17 13:59         ` Parav Pandit
@ 2022-01-17 22:14           ` Michael S. Tsirkin
  2022-01-18  4:44             ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 22:14 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Mon, Jan 17, 2022 at 01:59:29PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, January 13, 2022 10:56 PM
> > 
> > On Thu, Jan 13, 2022 at 07:07:53PM +0200, Max Gurtovoy wrote:
> > >
> > > On 1/13/2022 5:33 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
> > > > > These new features are parallel to VIRTIO_F_INDIRECT_DESC and
> > > > > VIRTIO_F_IN_ORDER. Some devices might support these features only
> > > > > for admin virtqueues and some might support them for both admin
> > > > > virtqueues and request virtqueues or only for non-admin
> > > > > virtqueues. Some optimization can be made for each type of
> > > > > virtqueue, thus we separate these features for the different virtqueue
> > types.
> > > > >
> > > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > That seems vague as motivation.
> > > > Why do we need to optimize admin queues? Aren't they fundamentally a
> > > > control path feature?
> > > > Why would we want to special-case these features specifically?
> > > > Should we allow control of features per VQ generally?
> > >
> > > We would like to allow executing admins commands out of order and IO
> > > requests in order for efficiency.
> > 
> > It's a control queue. Why do we worry?
> It is used to control/manage the resource of a VF which is deployed usually to a VM.
> So higher the latency, higher the time it takes to deploy start the VM.

What are the savings here, in real terms? Boot times for smallest VMs
are in 10s of milliseconds. Is reordering of a queue somehow
going to save more than microseconds?

> Hence, it is better to have this basic functionality in place, being useful beyond MSI-X config.
> It is not functionally must. But riding AQ command ordering on VIRTIO_F_IN_ORDER for now and later on driving based on new field requires dual handling.
> Better to start with its AQ's own ordering and one scheme.

Sorry I'm still scratching my head.


> > 
> > 
> > >
> > > And also the other way around.
> > 
> > what exactly does this mean?
> > 
> IO commands out of order (for say block device), but AQ commands in order.
> May be AQ command execution can be always treated as out of order, even when VIRTIO_F_IN_ORDER is negotiated.
> This way it will be even more simpler design for driver and device.
> 
> > > IO cmds and admin cmds have different considerations in many cases.
> > 
> > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > 
> > E.g. I can see how a hardware vendor might want to avoid supporting indirect
> > with RX for virtio net with mergeable buffers, but still support it for TX.
> > 
> > 
> > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> I agree. It only helps driver to ensure that AQ commands are processed in order, so it doesn't need to serialize it.
> But yes, driver can always serialize it if needed when AQ is always out of order.
> I think we should word it that AQ is always out of order.
> 
> > I think you want to reorder admin commands dealing with unrelated VFs but
> > keep io vqs in order for speed.
> > Just guessing, you should spell the real motivation out.
> > However, I think a better way to do that is with finalizing the
> > VIRTIO_F_PARTIAL_ORDER proposal from august.
> I read the partial order proposal at [1].
> It still appears IN_ORDER from driver POV.
> So I am not sure if driver can complete AQ commands out of order. Can it?

complete commands == use buffers?
drivers do not use buffers.

> I think data path needs more plumbing that just PARTIAL_ORDER flag, for descriptor processing differently on tx and rx side.
> Not sure merging AQ to it is useful, given that we agree that AQ should always behave as out of order from beginning.
> 
> [1] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html

You mean *device*. The driver does not control the order.
The point of PARTIAL_ORDER is basically that some
descriptors are in order and some are out of order, and it's up to the device. So it is
an even finer resolution.


> > Pls review and let me know. If there's finally a use for it I'll prioritize finalizing
> > that idea.
> > Don't see much point in tweaking INDIRECT at all.
> Common negotiation of INDIRECT on AQ and other queues forces data path also to handle that.

I don't see why admin queue needs indirect descriptors.

> It is better to not impact the device to handler indirect descriptors on non AQ queues, just because AQ prefers to handle it.
> Often AQ and data path queues are not handled by same set of processing engines given they both do different tasks.

So for example, many guests want to use indirect for tx but not for rx.
If you are worrying about things like that, maybe a per-VQ control
over indirect support makes sense.
Adding complexity like that should really be much better motivated,
and maybe come with some PoC code or back-of-the-napkin math
showing the expected gains.


-- 
MST



* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-17 22:14           ` Michael S. Tsirkin
@ 2022-01-18  4:44             ` Parav Pandit
  2022-01-18  6:23               ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-18  4:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 3:44 AM

> > > It's a control queue. Why do we worry?
> > It is used to control/manage the resource of a VF which is deployed usually
> to a VM.
> > So higher the latency, higher the time it takes to deploy start the VM.
> 
> What are the savings here, in real terms? Boot times for smallest VMs are in
> 10s of milliseconds. Is reordering of a queue somehow going to save more than
> microseconds?
>
It is probably better not to pick on a specific vendor implementation.
But for real numbers, I see that one implementation takes in the range of 54 usec to 500 usec for a simple configuration.

It is better that a small VM's 4-vector configuration not take longer just because there was a previous AQ command for 64 vectors.
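
As a rough illustration of that head-of-line effect, the sketch below compares when each command's completion becomes visible to the driver under in-order and out-of-order use. It assumes, purely for illustration, that all commands are submitted at time 0 and that the device works on them concurrently; which command costs 500 usec and which costs 54 usec is also an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: all n commands are submitted at t = 0 and the device works
 * on them concurrently, so command i finishes at finish_us[i].
 *
 * With in-order use, buffer i cannot be marked used before buffer i - 1,
 * so its completion becomes visible at the running maximum.
 */
static void visible_in_order(const unsigned *finish_us, size_t n,
                             unsigned *visible_us)
{
    unsigned latest = 0;

    assert(finish_us != NULL && visible_us != NULL);
    for (size_t i = 0; i < n; i++) {
        if (finish_us[i] > latest)
            latest = finish_us[i];
        visible_us[i] = latest;
    }
}

/* With out-of-order use, each completion is visible as soon as it is done. */
static void visible_out_of_order(const unsigned *finish_us, size_t n,
                                 unsigned *visible_us)
{
    assert(finish_us != NULL && visible_us != NULL);
    for (size_t i = 0; i < n; i++)
        visible_us[i] = finish_us[i];
}
```

Under these assumptions, a 54 usec command queued behind a 500 usec command is reported at 500 usec with in-order use and at 54 usec without it.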
 
> > Hence, it is better to have this basic functionality in place, being useful
> beyond MSI-X config.
> > It is not functionally must. But riding AQ command ordering on
> VIRTIO_F_IN_ORDER for now and later on driving based on new field requires
> dual handling.
> > Better to start with its AQ's own ordering and one scheme.
> 
> Sorry I'm still scratching my head.

if (DEV.IN_ORDER_NEGOTIATED /*current */ ||
    AQ.IN_ORDERED_NEGOTIATED /* new */) {
	/* handle AQ descriptors in-order way */
} else { 
	/* handle AQ desc out-of order way */
}

By always treating AQ commands as out of order, we simplify the driver and the device, because neither needs an in-order path for the AQ.

> > > >
> > > > And also the other way around.
> > >
> > > what exactly does this mean?
> > >
> > IO commands out of order (for say block device), but AQ commands in order.
> > May be AQ command execution can be always treated as out of order, even
> when VIRTIO_F_IN_ORDER is negotiated.
> > This way it will be even more simpler design for driver and device.
> >
> > > > IO cmds and admin cmds have different considerations in many cases.
> > >
> > > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > >
> > > E.g. I can see how a hardware vendor might want to avoid supporting
> > > indirect with RX for virtio net with mergeable buffers, but still support it for
> TX.
> > >
> > >
> > > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> > I agree. It only helps driver to ensure that AQ commands are processed in
> order, so it doesn't need to serialize it.
> > But yes, driver can always serialize it if needed when AQ is always out of
> order.
> > I think we should word it that AQ is always out of order.
> >
> > > I think you want to reorder admin commands dealing with unrelated
> > > VFs but keep io vqs in order for speed.
> > > Just guessing, you should spell the real motivation out.
> > > However, I think a better way to do that is with finalizing the
> > > VIRTIO_F_PARTIAL_ORDER proposal from august.
> > I read the partial order proposal at [1].
> > It still appears IN_ORDER from driver POV.
> > So I am not sure if driver can complete AQ commands out of order. Can it?
> 
> complete commands == use buffers?
Complete descriptors out of order.
I used the term "command" because AQ descriptors carry commands.
Will rephrase.

> drivers do not use buffers.
>
 
> > I think data path needs more plumbing that just PARTIAL_ORDER flag, for
> descriptor processing differently on tx and rx side.
> > Not sure merging AQ to it is useful, given that we agree that AQ should
> always behave as out of order from beginning.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html
> 
> You mean *device*. Driver does not control the order.
By data path I meant both the device and the driver.
The driver doesn't control the order, but it should be ready to handle used descriptors out of order when PARTIAL is negotiated.
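
What being ready for out-of-order used descriptors means on the driver side can be sketched as follows: rather than assuming the oldest outstanding command has completed, the driver looks up the command by the id reported in the used element. This is a minimal sketch with hypothetical names (aq_cmd, aq_submit, aq_complete) and a hypothetical fixed queue size; it is not taken from any existing driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AQ_SIZE 8 /* hypothetical queue size */

struct aq_cmd {
    bool in_flight;
    int  opcode; /* hypothetical command identifier */
};

/* In-flight commands, indexed by the head descriptor id of each buffer. */
static struct aq_cmd aq_inflight[AQ_SIZE];

/* Record a command as submitted under head descriptor id `id`. */
static void aq_submit(uint16_t id, int opcode)
{
    assert(id < AQ_SIZE && !aq_inflight[id].in_flight);
    aq_inflight[id].in_flight = true;
    aq_inflight[id].opcode = opcode;
}

/*
 * Handle one used element. The id comes from the device, so it is
 * validated rather than assumed to be the oldest outstanding command.
 * Returns the opcode of the completed command, or -1 for a bogus id.
 */
static int aq_complete(uint16_t used_id)
{
    if (used_id >= AQ_SIZE || !aq_inflight[used_id].in_flight)
        return -1;
    aq_inflight[used_id].in_flight = false;
    return aq_inflight[used_id].opcode;
}
```

With in-order use the lookup could be replaced by a simple FIFO of outstanding commands; the table is what lets a later command complete before an earlier one.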

> The point of PARTIAL_ORDER is basically that some descriptors are in order
> some out of order and its up to device. So it is even finer resolution.
> 
> 
> > > Pls review and let me know. If there's finally a use for it I'll
> > > prioritize finalizing that idea.
> > > Don't see much point in tweaking INDIRECT at all.
> > Common negotiation of INDIRECT on AQ and other queues forces data path
> also to handle that.
> 
> I don't see why admin queue needs indirect descriptors.
> 
Probably yes. The simple idea is not to impose indirect descriptors on the AQ just because the txq/rxq prefer to use them.
Not that the AQ needs them.
At the same time, you don't want the AQ object in the spec to be limited to always operating without indirect descriptors.

> > It is better to not impact the device to handler indirect descriptors on non
> AQ queues, just because AQ prefers to handle it.
> > Often AQ and data path queues are not handled by same set of processing
> engines given they both do different tasks.
> 
> so for example, many guests want to use indirect for tx but not for rx.
> if you are worrying about things like that, maybe a per-vq control over indirect
> support makes sense.
> adding complexity like that should really be much better motivated, and
> maybe have some PoC code or back of the napkin math showing the expected
> gains.
I do not have the gains handy for the tx vs rx queues. That was your example, in the partial order page fault thread.
So maybe you can share those results and/or PoC code?

The motivation for the AQ is simple, as I explained above, i.e. the txq/rxq indirect descriptor capability should not be imposed on the AQ.



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  4:44             ` Parav Pandit
@ 2022-01-18  6:23               ` Michael S. Tsirkin
  2022-01-18  6:32                 ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  6:23 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 3:44 AM
> 
> > > > It's a control queue. Why do we worry?
> > > It is used to control/manage the resource of a VF which is deployed usually
> > to a VM.
> > > So higher the latency, higher the time it takes to deploy start the VM.
> > 
> > What are the savings here, in real terms? Boot times for smallest VMs are in
> > 10s of milliseconds. Is reordering of a queue somehow going to save more than
> > microseconds?
> >
> It is probably better not to pick on a specific vendor implementation.
> But for real numbers, I see that an implementation takes 54usec to 500 usec range for simple configuration.
> 
> It is better to not small VM 4 vector configuration to take longer because there was previous AQ command for 64 vectors.

So virtio discovery on boot includes multiple vmexits, each costing ~1000
cycles.  And people do not seem to worry about it.
You want a compelling argument for working on performance of config.
I frankly think it's not really useful but I especially think
you should cut this out of the current proposal, it's too big as it is.

> > > Hence, it is better to have this basic functionality in place, being useful
> > beyond MSI-X config.
> > > It is not functionally must. But riding AQ command ordering on
> > VIRTIO_F_IN_ORDER for now and later on driving based on new field requires
> > dual handling.
> > > Better to start with its AQ's own ordering and one scheme.
> > 
> > Sorry I'm still scratching my head.
> 
> if (DEV.IN_ORDER_NEGOTIATED /*current */ ||
>     AQ.IN_ORDERED_NEGOTIATED /* new */) {
> 	/* handle AQ descriptors in-order way */
> } else { 
> 	/* handle AQ desc out-of order way */
> }
> 
> By always doing AQ commands out of order, we simplify the driver and device to avoid in-order execution.

No idea what this means. Needs much more motivational discussion, and
more thought about using generic infrastructure.
How about making this a separate proposal?


> > > > >
> > > > > And also the other way around.
> > > >
> > > > what exactly does this mean?
> > > >
> > > IO commands out of order (for say block device), but AQ commands in order.
> > > May be AQ command execution can be always treated as out of order, even
> > when VIRTIO_F_IN_ORDER is negotiated.
> > > This way it will be even more simpler design for driver and device.
> > >
> > > > > IO cmds and admin cmds have different considerations in many cases.
> > > >
> > > > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > > >
> > > > E.g. I can see how a hardware vendor might want to avoid supporting
> > > > indirect with RX for virtio net with mergeable buffers, but still support it for
> > TX.
> > > >
> > > >
> > > > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> > > I agree. It only helps driver to ensure that AQ commands are processed in
> > order, so it doesn't need to serialize it.
> > > But yes, driver can always serialize it if needed when AQ is always out of
> > order.
> > > I think we should word it that AQ is always out of order.
> > >
> > > > I think you want to reorder admin commands dealing with unrelated
> > > > VFs but keep io vqs in order for speed.
> > > > Just guessing, you should spell the real motivation out.
> > > > However, I think a better way to do that is with finalizing the
> > > > VIRTIO_F_PARTIAL_ORDER proposal from august.
> > > I read the partial order proposal at [1].
> > > It still appears IN_ORDER from driver POV.
> > > So I am not sure if driver can complete AQ commands out of order. Can it?
> > 
> > complete commands == use buffers?
> Complete descriptors out of order.
> I used term command as AQ descriptor used commands.
> Will rephase.

So in that case just work on VIRTIO_F_PARTIAL_ORDER please. I think
there's a way to make it work for your usecase.

> > drivers do not use buffers.
> >
>  
> > > I think data path needs more plumbing that just PARTIAL_ORDER flag, for
> > descriptor processing differently on tx and rx side.
> > > Not sure merging AQ to it is useful, given that we agree that AQ should
> > always behave as out of order from beginning.
> > >
> > > [1]
> > > https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html
> > 
> > You mean *device*. Driver does not control the order.
> Data path I meant device and driver both.
> Driver doesn't control the order, but should be ready to handle used descriptors out of order when PARTIAL is negotiated.
> 
> > The point of PARTIAL_ORDER is basically that some descriptors are in order
> > some out of order and its up to device. So it is even finer resolution.
> > 
> > 
> > > > Pls review and let me know. If there's finally a use for it I'll
> > > > prioritize finalizing that idea.
> > > > Don't see much point in tweaking INDIRECT at all.
> > > Common negotiation of INDIRECT on AQ and other queues forces data path
> > also to handle that.
> > 
> > I don't see why admin queue needs indirect descriptors.
> > 
> Probably yes. the simple idea is, not to impose indirect descriptors on AQ because txq/rxq prefers to use it.
> Not that AQ needs it.
> At the same time, you don't want AQ object in spec to be limited to always operate without indirect descriptor.
> 
> > > It is better to not impact the device to handler indirect descriptors on non
> > AQ queues, just because AQ prefers to handle it.
> > > Often AQ and data path queues are not handled by same set of processing
> > engines given they both do different tasks.
> > 
> > so for example, many guests want to use indirect for tx but not for rx.
> > if you are worrying about things like that, maybe a per-vq control over indirect
> > support makes sense.
> > adding complexity like that should really be much better motivated, and
> > maybe have some PoC code or back of the napkin math showing the expected
> > gains.
> I do not have gains handy for tx vs rx q. It was in your example of partial order page fault thread.
> So may be you can share those results and/or poc code?
> 
> The motivation for AQ is simple as I explained about, i.e. txq/rxq indirect descriptor capability should not be imposed on AQ.


If I were you I would defer this, the AQ patch is already too big.

-- 
MST



* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  6:23               ` Michael S. Tsirkin
@ 2022-01-18  6:32                 ` Parav Pandit
  2022-01-18  6:54                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-18  6:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 11:54 AM
> 
> On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, January 18, 2022 3:44 AM
> >
> > > > > It's a control queue. Why do we worry?
> > > > It is used to control/manage the resource of a VF which is
> > > > deployed usually
> > > to a VM.
> > > > So higher the latency, higher the time it takes to deploy start the VM.
> > >
> > > What are the savings here, in real terms? Boot times for smallest
> > > VMs are in 10s of milliseconds. Is reordering of a queue somehow
> > > going to save more than microseconds?
> > >
> > It is probably better not to pick on a specific vendor implementation.
> > But for real numbers, I see that an implementation takes 54usec to 500 usec
> range for simple configuration.
> >
> > It is better to not small VM 4 vector configuration to take longer because
> there was previous AQ command for 64 vectors.
> 
> So virtio discovery on boot includes multiple of vmexits, each costs ~1000
> cycles.  And people do not seem to worry about it.
It is not the vector configuration done by the guest VM.
It is the AQ command that provisions the number of msix vectors for the VF that takes tens to hundreds of usecs.
These are the commands in patch 5 of this proposal.

> You want a compelling argument for working on performance of config.
> I frankly think it's not really useful but I especially think you should cut this out
> of the current proposal, it's too big as it is.
> 
OK. We can do a follow-on proposal after the AQ.
We already see the need for an out-of-order AQ in internal performance tests we are running.
But fine, we can defer.

> > > > Hence, it is better to have this basic functionality in place,
> > > > being useful
> > > beyond MSI-X config.
> > > > It is not functionally must. But riding AQ command ordering on
> > > VIRTIO_F_IN_ORDER for now and later on driving based on new field
> > > requires dual handling.
> > > > Better to start with its AQ's own ordering and one scheme.
> > >
> > > Sorry I'm still scratching my head.
> >
> > if (DEV.IN_ORDER_NEGOTIATED /*current */ ||
> >     AQ.IN_ORDERED_NEGOTIATED /* new */) {
> > 	/* handle AQ descriptors in-order way */
> > } else {
> > 	/* handle AQ desc out-of order way */
> > }
> >
> > By always doing AQ commands out of order, we simplify the driver and
> device to avoid in-order execution.
> 
> No idea what this means. Needs much more motivational discussion, and
> more thought about using generic infrastructure.
> How about making this a separate proposal?
>
Got it. Will drive it in a separate follow-on proposal.
 
> 
> > > > > >
> > > > > > And also the other way around.
> > > > >
> > > > > what exactly does this mean?
> > > > >
> > > > IO commands out of order (for say block device), but AQ commands in
> order.
> > > > May be AQ command execution can be always treated as out of order,
> > > > even
> > > when VIRTIO_F_IN_ORDER is negotiated.
> > > > This way it will be even more simpler design for driver and device.
> > > >
> > > > > > IO cmds and admin cmds have different considerations in many cases.
> > > > >
> > > > > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > > > >
> > > > > E.g. I can see how a hardware vendor might want to avoid
> > > > > supporting indirect with RX for virtio net with mergeable
> > > > > buffers, but still support it for
> > > TX.
> > > > >
> > > > >
> > > > > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> > > > I agree. It only helps driver to ensure that AQ commands are
> > > > processed in
> > > order, so it doesn't need to serialize it.
> > > > But yes, driver can always serialize it if needed when AQ is
> > > > always out of
> > > order.
> > > > I think we should word it that AQ is always out of order.
> > > >
> > > > > I think you want to reorder admin commands dealing with
> > > > > unrelated VFs but keep io vqs in order for speed.
> > > > > Just guessing, you should spell the real motivation out.
> > > > > However, I think a better way to do that is with finalizing the
> > > > > VIRTIO_F_PARTIAL_ORDER proposal from august.
> > > > I read the partial order proposal at [1].
> > > > It still appears IN_ORDER from driver POV.
> > > > So I am not sure if driver can complete AQ commands out of order. Can
> it?
> > >
> > > complete commands == use buffers?
> > Complete descriptors out of order.
> > I used term command as AQ descriptor used commands.
> > Will rephase.
> 
> So in that case just work on VIRTIO_F_PARTIAL_ORDER please. I think there's a
> way to make it work for your usecase.
> 
> > > drivers do not use buffers.
> > >
> >
> > > > I think data path needs more plumbing that just PARTIAL_ORDER
> > > > flag, for
> > > descriptor processing differently on tx and rx side.
> > > > Not sure merging AQ to it is useful, given that we agree that AQ
> > > > should
> > > always behave as out of order from beginning.
> > > >
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html
> > >
> > > You mean *device*. Driver does not control the order.
> > Data path I meant device and driver both.
> > Driver doesn't control the order, but should be ready to handle used
> descriptors out of order when PARTIAL is negotiated.
> >
> > > The point of PARTIAL_ORDER is basically that some descriptors are in
> > > order some out of order and its up to device. So it is even finer resolution.
> > >
> > >
> > > > > Pls review and let me know. If there's finally a use for it I'll
> > > > > prioritize finalizing that idea.
> > > > > Don't see much point in tweaking INDIRECT at all.
> > > > Common negotiation of INDIRECT on AQ and other queues forces data
> > > > path
> > > also to handle that.
> > >
> > > I don't see why admin queue needs indirect descriptors.
> > >
> > Probably yes. the simple idea is, not to impose indirect descriptors on AQ
> because txq/rxq prefers to use it.
> > Not that AQ needs it.
> > At the same time, you don't want AQ object in spec to be limited to always
> operate without indirect descriptor.
> >
> > > > It is better to not impact the device to handler indirect
> > > > descriptors on non
> > > AQ queues, just because AQ prefers to handle it.
> > > > Often AQ and data path queues are not handled by same set of
> > > > processing
> > > engines given they both do different tasks.
> > >
> > > so for example, many guests want to use indirect for tx but not for rx.
> > > if you are worrying about things like that, maybe a per-vq control
> > > over indirect support makes sense.
> > > adding complexity like that should really be much better motivated,
> > > and maybe have some PoC code or back of the napkin math showing the
> > > expected gains.
> > I do not have gains handy for tx vs rx q. It was in your example of partial
> order page fault thread.
> > So may be you can share those results and/or poc code?
> >
> > The motivation for AQ is simple as I explained about, i.e. txq/rxq indirect
> descriptor capability should not be imposed on AQ.
> 
> 
> If I were you I would defer this, the AQ patch is already too big.
OK, we can defer.
But looking at it from the other side, the AQ always following the INDIRECT_DESC feature bit forces the device implementation to support indirect descriptors on it.
Doesn't that make the device implementation big from the beginning, even though it may not be needed?
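
A small sketch of why the device-wide bit has this effect, assuming the usual negotiation where the result is the intersection of what the device offers and what the driver accepts. The bit numbers 28 and 35 are the ones the virtio spec assigns to VIRTIO_F_INDIRECT_DESC and VIRTIO_F_IN_ORDER; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as assigned in the virtio spec. */
#define VIRTIO_F_INDIRECT_DESC 28
#define VIRTIO_F_IN_ORDER      35

/* Hypothetical helper: turn a feature bit number into a 64-bit mask. */
static uint64_t feature_mask(unsigned int bit)
{
	assert(bit < 64);
	return UINT64_C(1) << bit;
}

/* Negotiated features are those both sides agree on. */
static uint64_t negotiate(uint64_t device_offered, uint64_t driver_accepted)
{
	return device_offered & driver_accepted;
}

/*
 * These checks take no queue index: the bits are device-wide, so the
 * answer is the same for the txq, the rxq and the AQ alike.
 */
static bool queue_may_use_indirect(uint64_t negotiated)
{
	return (negotiated & feature_mask(VIRTIO_F_INDIRECT_DESC)) != 0;
}

static bool queue_is_in_order(uint64_t negotiated)
{
	return (negotiated & feature_mask(VIRTIO_F_IN_ORDER)) != 0;
}
```

Because neither check can distinguish one queue from another, a device that offers INDIRECT_DESC for the benefit of the txq has to be prepared to see indirect descriptors on the AQ as well.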



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  6:32                 ` Parav Pandit
@ 2022-01-18  6:54                   ` Michael S. Tsirkin
  2022-01-18  7:07                     ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  6:54 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 06:32:50AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 11:54 AM
> > 
> > On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, January 18, 2022 3:44 AM
> > >
> > > > > > It's a control queue. Why do we worry?
> > > > > It is used to control/manage the resource of a VF which is
> > > > > deployed usually
> > > > to a VM.
> > > > > So higher the latency, higher the time it takes to deploy start the VM.
> > > >
> > > > What are the savings here, in real terms? Boot times for smallest
> > > > VMs are in 10s of milliseconds. Is reordering of a queue somehow
> > > > going to save more than microseconds?
> > > >
> > > It is probably better not to pick on a specific vendor implementation.
> > > But for real numbers, I see that an implementation takes 54usec to 500 usec
> > range for simple configuration.
> > >
> > > It is better to not small VM 4 vector configuration to take longer because
> > there was previous AQ command for 64 vectors.
> > 
> > So virtio discovery on boot includes multiple of vmexits, each costs ~1000
> > cycles.  And people do not seem to worry about it.
> It is not the vector configuration by guest VM.
> It is the AQ command that provisions number of msix vectors for the VF that takes tens to hundreds of usecs.
> These are the command in patch-5 in this proposal.

Hundreds of usecs is negligible compared to VM boot time.
Sorry I don't really see why we worry about indirect in that case.


> > You want a compelling argument for working on performance of config.
> > I frankly think it's not really useful but I especially think you should cut this out
> > of the current proposal, it's too big as it is.
> > 
> Ok. We can do follow on proposal after AQ.
> We already see need of out of order AQ in internal performance tests we are running.

OK so first of all you can avoid declaring IN_ORDER.  If you see that
IN_ORDER improves performance for you so you need it, then look at
PARTIAL_ORDER pls. And if that does not address your needs then let's
discuss, I'd rather have a generic solution since the requirement does
not seem to be specific to AQ.

> But fine, we can differ.
> 
> > > > > Hence, it is better to have this basic functionality in place,
> > > > > being useful
> > > > beyond MSI-X config.
> > > > > It is not functionally must. But riding AQ command ordering on
> > > > VIRTIO_F_IN_ORDER for now and later on driving based on new field
> > > > requires dual handling.
> > > > > Better to start with its AQ's own ordering and one scheme.
> > > >
> > > > Sorry I'm still scratching my head.
> > >
> > > if (DEV.IN_ORDER_NEGOTIATED /*current */ ||
> > >     AQ.IN_ORDERED_NEGOTIATED /* new */) {
> > > 	/* handle AQ descriptors in-order way */
> > > } else {
> > > 	/* handle AQ desc out-of order way */
> > > }
> > >
> > > By always doing AQ commands out of order, we simplify the driver and
> > device to avoid in-order execution.
> > 
> > No idea what this means. Needs much more motivational discussion, and
> > more thought about using generic infrastructure.
> > How about making this a separate proposal?
> >
> Got it. Will drive it in follow on separate proposal.
>  
> > 
> > > > > > >
> > > > > > > And also the other way around.
> > > > > >
> > > > > > what exactly does this mean?
> > > > > >
> > > > > IO commands out of order (for say block device), but AQ commands in
> > order.
> > > > > May be AQ command execution can be always treated as out of order,
> > > > > even
> > > > when VIRTIO_F_IN_ORDER is negotiated.
> > > > > This way it will be even more simpler design for driver and device.
> > > > >
> > > > > > > IO cmds and admin cmds have different considerations in many cases.
> > > > > >
> > > > > > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > > > > >
> > > > > > E.g. I can see how a hardware vendor might want to avoid
> > > > > > supporting indirect with RX for virtio net with mergeable
> > > > > > buffers, but still support it for
> > > > TX.
> > > > > >
> > > > > >
> > > > > > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> > > > > I agree. It only helps driver to ensure that AQ commands are
> > > > > processed in
> > > > order, so it doesn't need to serialize it.
> > > > > But yes, driver can always serialize it if needed when AQ is
> > > > > always out of
> > > > order.
> > > > > I think we should word it that AQ is always out of order.
> > > > >
> > > > > > I think you want to reorder admin commands dealing with
> > > > > > unrelated VFs but keep io vqs in order for speed.
> > > > > > Just guessing, you should spell the real motivation out.
> > > > > > However, I think a better way to do that is with finalizing the
> > > > > > VIRTIO_F_PARTIAL_ORDER proposal from august.
> > > > > I read the partial order proposal at [1].
> > > > > It still appears IN_ORDER from driver POV.
> > > > > So I am not sure if driver can complete AQ commands out of order. Can
> > it?
> > > >
> > > > complete commands == use buffers?
> > > Complete descriptors out of order.
> > > I used term command as AQ descriptor used commands.
> > > Will rephase.
> > 
> > So in that case just work on VIRTIO_F_PARTIAL_ORDER please. I think there's a
> > way to make it work for your usecase.
> > 
> > > > drivers do not use buffers.
> > > >
> > >
> > > > > I think data path needs more plumbing that just PARTIAL_ORDER
> > > > > flag, for
> > > > descriptor processing differently on tx and rx side.
> > > > > Not sure merging AQ to it is useful, given that we agree that AQ
> > > > > should
> > > > always behave as out of order from beginning.
> > > > >
> > > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html
> > > >
> > > > You mean *device*. Driver does not control the order.
> > > Data path I meant device and driver both.
> > > Driver doesn't control the order, but should be ready to handle used
> > descriptors out of order when PARTIAL is negotiated.
> > >
> > > > The point of PARTIAL_ORDER is basically that some descriptors are in
> > > > order some out of order and its up to device. So it is even finer resolution.
> > > >
> > > >
> > > > > > Pls review and let me know. If there's finally a use for it I'll
> > > > > > prioritize finalizing that idea.
> > > > > > Don't see much point in tweaking INDIRECT at all.
> > > > > Common negotiation of INDIRECT on AQ and other queues forces data
> > > > > path
> > > > also to handle that.
> > > >
> > > > I don't see why admin queue needs indirect descriptors.
> > > >
> > > Probably yes. the simple idea is, not to impose indirect descriptors on AQ
> > because txq/rxq prefers to use it.
> > > Not that AQ needs it.
> > > At the same time, you don't want AQ object in spec to be limited to always
> > operate without indirect descriptor.
> > >
> > > > > It is better to not impact the device to handler indirect
> > > > > descriptors on non
> > > > AQ queues, just because AQ prefers to handle it.
> > > > > Often AQ and data path queues are not handled by same set of
> > > > > processing
> > > > engines given they both do different tasks.
> > > >
> > > > so for example, many guests want to use indirect for tx but not for rx.
> > > > if you are worrying about things like that, maybe a per-vq control
> > > > over indirect support makes sense.
> > > > adding complexity like that should really be much better motivated,
> > > > and maybe have some PoC code or back of the napkin math showing the
> > > > expected gains.
> > > I do not have gains handy for tx vs rx q. It was in your example of partial
> > order page fault thread.
> > > So may be you can share those results and/or poc code?
> > >
> > > The motivation for AQ is simple as I explained about, i.e. txq/rxq indirect
> > descriptor capability should not be imposed on AQ.
> > 
> > 
> > If I were you I would defer this, the AQ patch is already too big.
> o.k. we can differ.
> But if you see on the other side, AQ always following INDIRECT_DESC feature bit, forces device implementation to support indirect descriptors.
> Isn't that make device implementation big from beginning, even though it may not be needed?

The problem is not unique to AQ though. RX queues for virtio net have
the same issue.
If it's there but not used, you can punt it to a slow path in firmware;
that was always our approach. If not, it's worth thinking of a generic solution.

-- 
MST



* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  6:54                   ` Michael S. Tsirkin
@ 2022-01-18  7:07                     ` Parav Pandit
  2022-01-18  7:12                       ` Michael S. Tsirkin
                                         ` (2 more replies)
  0 siblings, 3 replies; 75+ messages in thread
From: Parav Pandit @ 2022-01-18  7:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 12:25 PM
> 
> On Tue, Jan 18, 2022 at 06:32:50AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, January 18, 2022 11:54 AM
> > >
> > > On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, January 18, 2022 3:44 AM
> > > >
> > > > > > > It's a control queue. Why do we worry?
> > > > > > It is used to control/manage the resource of a VF which is
> > > > > > deployed usually
> > > > > to a VM.
> > > > > > So higher the latency, higher the time it takes to deploy start the VM.
> > > > >
> > > > > What are the savings here, in real terms? Boot times for
> > > > > smallest VMs are in 10s of milliseconds. Is reordering of a
> > > > > queue somehow going to save more than microseconds?
> > > > >
> > > > It is probably better not to pick on a specific vendor implementation.
> > > > But for real numbers, I see that an implementation takes 54usec to
> > > > 500 usec
> > > range for simple configuration.
> > > >
> > > > It is better to not small VM 4 vector configuration to take longer
> > > > because
> > > there was previous AQ command for 64 vectors.
> > >
> > > So virtio discovery on boot includes multiple of vmexits, each costs
> > > ~1000 cycles.  And people do not seem to worry about it.
> > It is not the vector configuration by guest VM.
> > It is the AQ command that provisions number of msix vectors for the VF that
> takes tens to hundreds of usecs.
> > These are the command in patch-5 in this proposal.
> 
> Hundreds of usecs is negligeable compared to VM boot time.
> Sorry I don't really see why we worry about indirect in that case.
> 
> 
OK. We will do an incremental proposal after this for the wider use case.

> > > You want a compelling argument for working on performance of config.
> > > I frankly think it's not really useful but I especially think you
> > > should cut this out of the current proposal, it's too big as it is.
> > >
> > Ok. We can do follow on proposal after AQ.
> > We already see need of out of order AQ in internal performance tests we are
> running.
> 
> OK so first of all you can avoid declaring IN_ORDER.  
This will force non-IN_ORDER operation on the txq and rxq too, which causes higher latency.
But fine, the initial implementation can start without it.

> If you see that IN_ORDER
> improves performance for you so you need it, then look at PARTIAL_ORDER
> pls. 
OK. Will consider PARTIAL_ORDER more in a future proposal.

> And if that does not address your needs then let's discuss, I'd rather have a
> generic solution since the requirement does not seem to be specific to AQ.
> 
> > But fine, we can differ.

So far I gather the summary below of what needs to be addressed in v2.

1. Use AQ for msix query and config
2. AQ follows IN_ORDER and INDIRECT_DESC negotiation like the rest of the queues
3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all AQ cmds)
4. Improve documentation around msix config to link to the SR-IOV section of the virtio spec
5. Describe the error case: if the VF is bound to the device, admin commands targeting the VF can fail; describe this error code

Did I miss anything?

I have yet to receive your feedback on the group: if/why it is needed, why/if it must be in this proposal, and what prevents doing it as a follow-on.

Cornelia, Jason,
Can you please review current proposal as well before we revise v2?



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:07                     ` Parav Pandit
@ 2022-01-18  7:12                       ` Michael S. Tsirkin
  2022-01-18  7:30                         ` Parav Pandit
  2022-01-18  7:13                       ` Michael S. Tsirkin
  2022-01-19  4:03                       ` Jason Wang
  2 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> 1. Use AQ for msix query and config
> 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of the queues
> 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)
> 4. Improve documentation around msix config to link to sriov section of virtio spec
> 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
> 
> Did I miss anything?

Better document in spec text just what is the scope for AQ.


> Yet to receive your feedback on group, if/why is it needed and, why/if it must be in this proposal, what pieces prevents it do as follow-on.

I think this is related to the subfunction use case or other future
use cases. In the case of PF/VF, grouping is implicit through the SRIOV
capability. It would be nice to have things somewhat generic in
most of the text though, since we already know this will be needed.
E.g. Jason sent a proposal for commands to add/delete subfunctions,
take a look at it; somehow AQ needs to be extendable to support that
functionality too.

> Cornelia, Jason,
> Can you please review current proposal as well before we revise v2?



* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:12                       ` Michael S. Tsirkin
@ 2022-01-18  7:30                         ` Parav Pandit
  2022-01-18  7:40                           ` Michael S. Tsirkin
  2022-01-18 10:38                           ` Michael S. Tsirkin
  0 siblings, 2 replies; 75+ messages in thread
From: Parav Pandit @ 2022-01-18  7:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 12:42 PM
> 
> On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > 1. Use AQ for msix query and config
> > 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of the queues
> > 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)
> > 4. Improve documentation around msix config to link to sriov section of virtio spec
> > 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
> >
> > Did I miss anything?
> 
> Better document in spec text just what is the scope for AQ.
>
Yes, will improve this spec.
 
> 
> > Yet to receive your feedback on group, if/why is it needed and, why/if it must
> be in this proposal, what pieces prevents it do as follow-on.
> 
> I think this is related to the subfunction usecase or other future usecase. In
> case of PF/VF grouping is implicit through the SRIOV capability. It would be
> nice to have things somewhat generic in most of the text though since we
> already know this will be needed.
> E.g. jason sent a proposal for commands to add/delete subfunctions, take a
> look at it, somehow AQ needs to be extendable to support that functionality
> too.
I looked at it briefly. The AQ can be used for that purpose. The current proposal adds only the MSI-X config piece,
but more commands can be added in the future.

What I wanted to check with you and others: is a 7-bit command opcode enough?
127 is a lot of admin commands. 😊
But given the diversity of transports and device types in the virtio spec, I was thinking of keeping it 15-bit for future proofing.
What do you think?
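
To make the opcode-width question concrete, here is a minimal sketch in C. A 7-bit field encodes 128 distinct opcodes (0 to 127), a 15-bit field 32768. The helper names and the idea of a 16-bit field with a reserved top bit are assumptions for illustration only; nothing here is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct opcodes an opcode field of the given width can encode. */
static inline uint32_t aq_opcode_count(unsigned bits)
{
    assert(bits < 32);
    return (uint32_t)1 << bits;
}

/*
 * Extract a 15-bit opcode from a 16-bit field.
 * Hypothetical layout: the top bit is assumed to be reserved.
 */
static inline uint16_t aq_opcode_from_field(uint16_t field)
{
    return (uint16_t)(field & 0x7fffu);
}
```

With this layout, moving from 7 to 15 bits costs one extra byte per command and leaves room for transport-specific and device-specific opcode ranges.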

One part of Jason's proposal [1] that is unrelated to the AQ is "The management driver MUST create a managed device by allocating".
We see that the creator of the subfunction is often not the only entity managing it.
Setups where the two are the same entity are finding fewer and fewer users these days.
So this piece needs more discussion whenever we address that.

[1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:30                         ` Parav Pandit
@ 2022-01-18  7:40                           ` Michael S. Tsirkin
  2022-01-19  4:21                             ` Jason Wang
  2022-01-18 10:38                           ` Michael S. Tsirkin
  1 sibling, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:40 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 12:42 PM
> > 
> > On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > > 1. Use AQ for msix query and config
> > > 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of the queues
> > > 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)
> > > 4. Improve documentation around msix config to link to sriov section of virtio spec
> > > 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
> > >
> > > Did I miss anything?
> > 
> > Better document in spec text just what is the scope for AQ.
> >
> Yes, will improve this spec.
>  
> > 
> > > Yet to receive your feedback on group, if/why is it needed and, why/if it must
> > be in this proposal, what pieces prevents it do as follow-on.
> > 
> > I think this is related to the subfunction usecase or other future usecase. In
> > case of PF/VF grouping is implicit through the SRIOV capability. It would be
> > nice to have things somewhat generic in most of the text though since we
> > already know this will be needed.
> > E.g. jason sent a proposal for commands to add/delete subfunctions, take a
> > look at it, somehow AQ needs to be extendable to support that functionality
> > too.
> I looked briefly to it. AQ can be used for such purpose. Current proposal adds only msix config piece.
> But more commands can be added in future.
> 
> What I wanted to check with you and other is, do we want command opcode to be 7-bit enough? 
> #127 is lot of admin commands. 😊
> But given virtio spec diversity of transport and device types, I was thinking to keep it 15-bit for future proofing.
> What do you think?

I agree, we are not short on bits.

> An unrelated command to AQ in Jason's proposal [1] is about " The management driver MUST create a managed device by allocating".
> We see that creator of the subfunction is often not the only entity managing it.

I think whoever does it can go through the main function driver.

> They being same in new era finding less and less users.
> So this piece needs more discussion whenever we address that.
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:40                           ` Michael S. Tsirkin
@ 2022-01-19  4:21                             ` Jason Wang
  2022-01-19  9:30                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2022-01-19  4:21 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com


On 2022/1/18 3:40 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
>>
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Tuesday, January 18, 2022 12:42 PM
>>>
>>> On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
>>>> 1. Use AQ for msix query and config
>>>> 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of the queues
>>>> 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)
>>>> 4. Improve documentation around msix config to link to sriov section of virtio spec
>>>> 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
>>>>
>>>> Did I miss anything?
>>> Better document in spec text just what is the scope for AQ.
>>>
>> Yes, will improve this spec.
>>   
>>>> Yet to receive your feedback on group, if/why is it needed and, why/if it must
>>> be in this proposal, what pieces prevents it do as follow-on.
>>>
>>> I think this is related to the subfunction usecase or other future usecase. In
>>> case of PF/VF grouping is implicit through the SRIOV capability. It would be
>>> nice to have things somewhat generic in most of the text though since we
>>> already know this will be needed.
>>> E.g. jason sent a proposal for commands to add/delete subfunctions, take a
>>> look at it, somehow AQ needs to be extendable to support that functionality
>>> too.
>> I looked briefly to it. AQ can be used for such purpose. Current proposal adds only msix config piece.
>> But more commands can be added in future.
>>
>> What I wanted to check with you and other is, do we want command opcode to be 7-bit enough?
>> #127 is lot of admin commands. 😊
>> But given virtio spec diversity of transport and device types, I was thinking to keep it 15-bit for future proofing.
>> What do you think?
> I agree, we are not short on bits.
>
>> An unrelated command to AQ in Jason's proposal [1] is about " The management driver MUST create a managed device by allocating".
>> We see that creator of the subfunction is often not the only entity managing it.
> I think whoever does it can go through the main function driver.
>
>> They being same in new era finding less and less users.
>> So this piece needs more discussion whenever we address that.
>>
>> [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html


Yes, I do that for dynamic provisioning, which seems to be a requirement
(or at least nice to have) for the SIOV spec. We can extend or tweak it
for static provisioning.

Thanks



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  4:21                             ` Jason Wang
@ 2022-01-19  9:30                               ` Michael S. Tsirkin
  2022-01-25  3:39                                 ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  9:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Wed, Jan 19, 2022 at 12:21:36PM +0800, Jason Wang wrote:
> 
> On 2022/1/18 3:40 PM, Michael S. Tsirkin wrote:
> > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > 
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, January 18, 2022 12:42 PM
> > > > 
> > > > On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > > > > 1. Use AQ for msix query and config
> > > > > 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of the queues
> > > > > 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)
> > > > > 4. Improve documentation around msix config to link to sriov section of virtio spec
> > > > > 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
> > > > > 
> > > > > Did I miss anything?
> > > > Better document in spec text just what is the scope for AQ.
> > > > 
> > > Yes, will improve this spec.
> > > > > Yet to receive your feedback on group, if/why is it needed and, why/if it must
> > > > be in this proposal, what pieces prevents it do as follow-on.
> > > > 
> > > > I think this is related to the subfunction usecase or other future usecase. In
> > > > case of PF/VF grouping is implicit through the SRIOV capability. It would be
> > > > nice to have things somewhat generic in most of the text though since we
> > > > already know this will be needed.
> > > > E.g. jason sent a proposal for commands to add/delete subfunctions, take a
> > > > look at it, somehow AQ needs to be extendable to support that functionality
> > > > too.
> > > I looked briefly to it. AQ can be used for such purpose. Current proposal adds only msix config piece.
> > > But more commands can be added in future.
> > > 
> > > What I wanted to check with you and other is, do we want command opcode to be 7-bit enough?
> > > #127 is lot of admin commands. 😊
> > > But given virtio spec diversity of transport and device types, I was thinking to keep it 15-bit for future proofing.
> > > What do you think?
> > I agree, we are not short on bits.
> > 
> > > An unrelated command to AQ in Jason's proposal [1] is about " The management driver MUST create a managed device by allocating".
> > > We see that creator of the subfunction is often not the only entity managing it.
> > I think whoever does it can go through the main function driver.
> > 
> > > They being same in new era finding less and less users.
> > > So this piece needs more discussion whenever we address that.
> > > 
> > > [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> 
> 
> Yes, I do that for dynamic provisioning which seems a requirement (or better
> to have) for SIOV spec. We can extend or tweak it for static provisioning.
> 
> Thanks
> 

So you are basically saying that since with scalable iov we need
commands to create subfunctions, let's straight away teach
people to use them to manage VFs.
So before a VF can be used, you are asking that people "allocate" it
through a PF.  Is that right?

I have to say that addresses one concern I just had, which is that
it's unclear what is the status of a VF before any commands are
issued.


-- 
MST


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  9:30                               ` Michael S. Tsirkin
@ 2022-01-25  3:39                                 ` Jason Wang
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2022-01-25  3:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Wed, Jan 19, 2022 at 5:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 19, 2022 at 12:21:36PM +0800, Jason Wang wrote:
> >
> > On 2022/1/18 3:40 PM, Michael S. Tsirkin wrote:
> > > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, January 18, 2022 12:42 PM
> > > > >
> > > > > On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > > > > > 1. Use AQ for msix query and config
> > > > > > 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of the queues
> > > > > > 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)
> > > > > > 4. Improve documentation around msix config to link to sriov section of virtio spec
> > > > > > 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
> > > > > >
> > > > > > Did I miss anything?
> > > > > Better document in spec text just what is the scope for AQ.
> > > > >
> > > > Yes, will improve this spec.
> > > > > > Yet to receive your feedback on group, if/why is it needed and, why/if it must
> > > > > be in this proposal, what pieces prevents it do as follow-on.
> > > > >
> > > > > I think this is related to the subfunction usecase or other future usecase. In
> > > > > case of PF/VF grouping is implicit through the SRIOV capability. It would be
> > > > > nice to have things somewhat generic in most of the text though since we
> > > > > already know this will be needed.
> > > > > E.g. jason sent a proposal for commands to add/delete subfunctions, take a
> > > > > look at it, somehow AQ needs to be extendable to support that functionality
> > > > > too.
> > > > I looked briefly to it. AQ can be used for such purpose. Current proposal adds only msix config piece.
> > > > But more commands can be added in future.
> > > >
> > > > What I wanted to check with you and other is, do we want command opcode to be 7-bit enough?
> > > > #127 is lot of admin commands. 😊
> > > > But given virtio spec diversity of transport and device types, I was thinking to keep it 15-bit for future proofing.
> > > > What do you think?
> > > I agree, we are not short on bits.
> > >
> > > > An unrelated command to AQ in Jason's proposal [1] is about " The management driver MUST create a managed device by allocating".
> > > > We see that creator of the subfunction is often not the only entity managing it.
> > > I think whoever does it can go through the main function driver.
> > >
> > > > They being same in new era finding less and less users.
> > > > So this piece needs more discussion whenever we address that.
> > > >
> > > > [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> >
> >
> > Yes, I do that for dynamic provisioning which seems a requirement (or better
> > to have) for SIOV spec. We can extend or tweak it for static provisioning.
> >
> > Thanks
> >
>
> So you are basically saying that since with scalable iov we need
> commands to create subfunctions, let's straight away teach
> people to use them to manage VFs.
> So before a VF can be used, you are asking that people "allocate" it
> through a PF.  Is that right?

Right.

>
> I have to say that addresses one concern I just had, which is that
> it's unclear what is the status of a VF before any commands are
> issued.

I'm not even sure it's possible; my understanding is that most vendors
choose static provisioning via sriov_numvfs, so such dynamic on-demand
provisioning might be tricky.

SR-IOV also has another subtle limitation: it mandates that all VFs
have the same device type.

Thanks

>
>
> --
> MST
>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:30                         ` Parav Pandit
  2022-01-18  7:40                           ` Michael S. Tsirkin
@ 2022-01-18 10:38                           ` Michael S. Tsirkin
  2022-01-18 10:50                             ` Parav Pandit
  1 sibling, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18 10:38 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> An unrelated command to AQ in Jason's proposal [1] is about " The management driver MUST create a managed device by allocating".
> We see that creator of the subfunction is often not the only entity managing it.
> They being same in new era finding less and less users.
> So this piece needs more discussion whenever we address that.
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html

This reminds me. How do AQ commands interact with VF lifecycle?
E.g. can one change number of vectors for an active VF?
Need to specify this.

Also, I started worrying about compatibility here.
Let's say the msix capability in a VF specifies 16 vectors.
Can PF specify 32? If yes how will driver program them?
Can PF specify 8? If yes how do we make sure driver does not
attempt to use 16? And what happens if it does?
Again, something to address.
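
For reference, the vector count in these questions is what the PCI MSI-X capability advertises in the Table Size field of its Message Control register (bits 10:0, encoded as N-1). A small sketch of that encoding in C follows; the macro and function names are illustrative and not taken from any particular driver.

```c
#include <assert.h>
#include <stdint.h>

/* Table Size occupies bits 10:0 of the MSI-X Message Control register. */
#define MSIX_MSG_CTRL_TABLE_SIZE_MASK 0x07ffu

/* Number of vectors advertised by a Message Control value (field is N-1). */
static inline unsigned msix_num_vectors(uint16_t msg_ctrl)
{
    return (unsigned)(msg_ctrl & MSIX_MSG_CTRL_TABLE_SIZE_MASK) + 1u;
}

/* Table Size field value for a requested vector count in 1..2048. */
static inline uint16_t msix_table_size_field(unsigned num_vectors)
{
    assert(num_vectors >= 1 && num_vectors <= 2048);
    return (uint16_t)(num_vectors - 1u);
}
```

In these terms, a VF advertising 16 vectors reports a Table Size field of 15, and the questions above are about what happens when the PF changes that field to 31 or 7.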


-- 
MST


^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18 10:38                           ` Michael S. Tsirkin
@ 2022-01-18 10:50                             ` Parav Pandit
  2022-01-18 15:09                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-18 10:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 4:09 PM
> 
> On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > An unrelated command to AQ in Jason's proposal [1] is about " The
> management driver MUST create a managed device by allocating".
> > We see that creator of the subfunction is often not the only entity managing
> it.
> > They being same in new era finding less and less users.
> > So this piece needs more discussion whenever we address that.
> >
> > [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> 
> This reminds me. How do AQ commands interact with VF lifecycle?
VF device usage is controlled by the same system that configures the VF via its parent PF device,
so the VF device shouldn't be in use at that point. Any configuration change attempted while the VF device is in use will cause the AQ command to fail.

> E.g. can one change number of vectors for an active VF?
> Need to specify this.
> 
> Also, I started worrying about compatibility here.
> Let's say the msix capability in a VF specifies 16 vectors.
> Can PF specify 32? If yes how will driver program them?
Yes, the PF can change it to 32. When the VF driver queries the PCI capability, it will see 32 instead of 16.
> Can PF specify 8? If yes how do we make sure driver does not attempt to use
> 16? And what happens if it does?
The PF programs the VF's MSI-X capability in the device. So the virtio PCI driver operating the PCI VF device cannot access vectors beyond the maximum value programmed by the PF driver.

> Again, something to address.
Yes, this has to be described in the spec text. Will add more clearly in v2.
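
The behavior described above (reject the change while the VF is in use, otherwise update the count reflected by the VF's MSI-X capability) could be sketched roughly as follows. This is only an illustration: the status codes, the struct, and the function name are hypothetical, since the error code is still to be defined in the spec text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the spec text has yet to define the real ones. */
enum aq_status { AQ_OK = 0, AQ_ERR_VF_IN_USE = 1, AQ_ERR_INVALID = 2 };

/* Hypothetical device-side view of one VF. */
struct vf_state {
    bool     in_use;        /* a driver is currently operating the VF */
    uint16_t msix_vectors;  /* count reflected by the VF's MSI-X capability */
};

/* Handle a PF request to set the VF's MSI-X vector count. */
static enum aq_status aq_set_vf_msix(struct vf_state *vf, uint16_t requested)
{
    assert(vf != NULL);
    if (requested == 0 || requested > 2048)   /* MSI-X allows 1..2048 vectors */
        return AQ_ERR_INVALID;
    if (vf->in_use)                           /* no changes under an active VF */
        return AQ_ERR_VF_IN_USE;
    vf->msix_vectors = requested;
    return AQ_OK;
}
```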

> 
> 
> --
> MST


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18 10:50                             ` Parav Pandit
@ 2022-01-18 15:09                               ` Michael S. Tsirkin
  2022-01-18 17:17                                 ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18 15:09 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 10:50:52AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 4:09 PM
> > 
> > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > An unrelated command to AQ in Jason's proposal [1] is about " The
> > management driver MUST create a managed device by allocating".
> > > We see that creator of the subfunction is often not the only entity managing
> > it.
> > > They being same in new era finding less and less users.
> > > So this piece needs more discussion whenever we address that.
> > >
> > > [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> > 
> > This reminds me. How do AQ commands interact with VF lifecycle?
> VF device usage is controlled by the same system which is configuring the VF via its parent PF device.
> So VF device shouldn't be in use. Any configuration change while VF device is in use will result in failing the AQ command.
> 
> > E.g. can one change number of vectors for an active VF?
> > Need to specify this.
> > 
> > Also, I started worrying about compatibility here.
> > Let's say the msix capability in a VF specifies 16 vectors.
> > Can PF specify 32? If yes how will driver program them?
> Yes, PF can change to 32. When VF driver queries the PCI capability, it will reflect 32 instead of 16.
> > Can PF specify 8? If yes how do we make sure driver does not attempt to use
> > 16? And what happens if it does?
> PF is programming the VF msix capability in the device. So virtio pci driver operating the PCI VF device cannot access vectors beyond the max value programmed by the PF driver.

Um. Interesting. This means that the msix capability of the VF changes?
Is that in fact spec compliant? Could some OSes cache the value of the
capability even if the device is not in active use? E.g. I can see how
this might happen in order to map the MSIX tables even before loading
the driver.

The spec says:
	Depending upon system software policy, system software, device driver software, or each at
	different times or environments may configure a function’s MSI-X capability and table
	structures with suitable vectors.

So MSIX configuration might not be up to the driver.

We actually ask driver to read back any vector assigned to a VQ
so it's possible to fail vector assignment. Maybe that's better.
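
The read-back mentioned here works roughly as follows: the driver writes the vector to queue_msix_vector, reads it back, and treats VIRTIO_MSI_NO_VECTOR (0xffff) as a failed assignment. The sketch below simulates the register with a plain struct so that it is self-contained; the struct, the max_vectors limit, and the function names are stand-ins for illustration, and a real driver would use MMIO accessors on the common configuration structure instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_MSI_NO_VECTOR 0xffff  /* defined by the virtio PCI transport */

/* Simulated per-queue register state (stand-in for real MMIO). */
struct fake_queue_regs {
    uint16_t queue_msix_vector;
    uint16_t max_vectors;   /* hypothetical device-side limit, simulation only */
};

/* Simulated device behavior: refuse vectors it cannot back. */
static void fake_write_vector(struct fake_queue_regs *r, uint16_t vec)
{
    r->queue_msix_vector = (vec < r->max_vectors) ? vec : VIRTIO_MSI_NO_VECTOR;
}

/* Driver side: write the vector, read it back, report whether it stuck. */
static bool assign_vq_vector(struct fake_queue_regs *r, uint16_t vec)
{
    assert(r != NULL);
    fake_write_vector(r, vec);
    return r->queue_msix_vector != VIRTIO_MSI_NO_VECTOR;
}
```

With this pattern, a driver that tries to use more vectors than the device currently backs sees the assignment fail rather than silently losing interrupts.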

> > Again, something to address.
> Yes, this has to be described in the spec text. Will add more clearly in v2.
> 
> > 
> > 
> > --
> > MST


^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18 15:09                               ` Michael S. Tsirkin
@ 2022-01-18 17:17                                 ` Parav Pandit
  2022-01-19  7:20                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-18 17:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 8:39 PM
> 
> On Tue, Jan 18, 2022 at 10:50:52AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, January 18, 2022 4:09 PM
> > >
> > > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > > An unrelated command to AQ in Jason's proposal [1] is about " The
> > > management driver MUST create a managed device by allocating".
> > > > We see that creator of the subfunction is often not the only
> > > > entity managing
> > > it.
> > > > They being same in new era finding less and less users.
> > > > So this piece needs more discussion whenever we address that.
> > > >
> > > > [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> > >
> > > This reminds me. How do AQ commands interact with VF lifecycle?
> > VF device usage is controlled by the same system which is configuring the VF
> via its parent PF device.
> > So VF device shouldn't be in use. Any configuration change while VF device is
> in use will result in failing the AQ command.
> >
> > > E.g. can one change number of vectors for an active VF?
> > > Need to specify this.
> > >
> > > Also, I started worrying about compatibility here.
> > > Let's say the msix capability in a VF specifies 16 vectors.
> > > Can PF specify 32? If yes how will driver program them?
> > Yes, PF can change to 32. When VF driver queries the PCI capability, it will
> reflect 32 instead of 16.
> > > Can PF specify 8? If yes how do we make sure driver does not attempt
> > > to use 16? And what happens if it does?
> > PF is programming the VF msix capability in the device. So virtio pci driver
> operating the PCI VF device cannot access vectors beyond the max value
> programmed by the PF driver.
> 
> Um. Interesting. This means that the msix capability of the VF changes?
Yes.
> Is that in fact spec compliant? Could some OSes cache the value of the
> capability even if the device is not in active use? E.g. I can see how this might
> happen in order to map the MSIX tables even before loading the driver.
> 
The PCI subsystem can cache the value before the device driver loads.
Generally a device supports INTx/MSI-X or INTx/MSI, so the PCI subsystem is not aware of which will be used by its upper-layer device drivers.
So it usually defers such initialization to a later stage, until it is actually used.

Whichever OS driver implements MSI-X configuration will have to either not cache it or flush and rebuild the cache.

> The spec says:
> 	Depending upon system software policy, system software, device driver
> software, or each at
> 	different times or environments may configure a function’s MSI-X
> capability and table
> 	structures with suitable vectors.
> 
> So MSIX configuration might not be up to the driver.
> 
> We actually ask driver to read back any vector assigned to a VQ so it's possible
> to fail vector assignment. Maybe that's better.
> 
The virtio driver should not incur any additional complexity from re-reading vectors etc.
All the MSI-X config should happen well before drivers get loaded for the VF.
It is up to the PCI layer of the HV to provide the virtio device driver with a stable device, one that is not undergoing MSI-X table changes while the virtio device driver is operating on it.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18 17:17                                 ` Parav Pandit
@ 2022-01-19  7:20                                   ` Michael S. Tsirkin
  2022-01-19  8:15                                     ` [virtio-dev] " Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  7:20 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 05:17:06PM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 8:39 PM
> > 
> > On Tue, Jan 18, 2022 at 10:50:52AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, January 18, 2022 4:09 PM
> > > >
> > > > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > > > An unrelated command to AQ in Jason's proposal [1] is about " The
> > > > management driver MUST create a managed device by allocating".
> > > > > We see that creator of the subfunction is often not the only
> > > > > entity managing
> > > > it.
> > > > > They being same in new era finding less and less users.
> > > > > So this piece needs more discussion whenever we address that.
> > > > >
> > > > > [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> > > >
> > > > This reminds me. How do AQ commands interact with VF lifecycle?
> > > VF device usage is controlled by the same system which is configuring the VF
> > via its parent PF device.
> > > So VF device shouldn't be in use. Any configuration change while VF device is
> > in use will result in failing the AQ command.
> > >
> > > > E.g. can one change number of vectors for an active VF?
> > > > Need to specify this.
> > > >
> > > > Also, I started worrying about compatibility here.
> > > > Let's say the msix capability in a VF specifies 16 vectors.
> > > > Can PF specify 32? If yes how will driver program them?
> > > Yes, PF can change to 32. When VF driver queries the PCI capability, it will
> > reflect 32 instead of 16.
> > > > Can PF specify 8? If yes how do we make sure driver does not attempt
> > > > to use 16? And what happens if it does?
> > > PF is programming the VF msix capability in the device. So virtio pci driver
> > operating the PCI VF device cannot access vectors beyond the max value
> > programmed by the PF driver.
> > 
> > Um. Interesting. This means that the msix capability of the VF changes?
> Yes.
> > Is that in fact spec compliant? Could some OSes cache the value of the
> > capability even if the device is not in active use? E.g. I can see how this might
> > happen in order to map the MSIX tables even before loading the driver.
> > 
> The PCI subsystem can cache the value before the device driver loads.
> Generally a device supports INTx/MSI-X or INTx/MSI, so the PCI subsystem is not aware of which will be used by its upper-layer device drivers.
> So it usually defers such initialization to a later stage, until it is actually used.
> 
> Whichever OS driver implements MSI-X configuration will have to either not cache it or flush and rebuild the cache.

Seems to contradict what the spec says (below).

> > The spec says:
> > 	Depending upon system software policy, system software, device driver
> > 	software, or each at different times or environments may configure a
> > 	function’s MSI-X capability and table structures with suitable vectors.
> > 
> > So MSIX configuration might not be up to the driver.
> > 
> > We actually ask driver to read back any vector assigned to a VQ so it's possible
> > to fail vector assignment. Maybe that's better.
> > 
> Virtio driver should not incur any additional complexity in re-reading vector etc.

I think it does this already.

> All the msix config should happen well before the driver gets loaded for the VF.
> It is up to the PCI layer of the HV to provide the virtio device driver with a stable device that is not undergoing msix table changes while the driver is operating on it.


Problem is in the guest though. I'm not sure we can rely on this part
being part of the driver and not part of the OS.

-- 
MST



* RE: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  7:20                                   ` Michael S. Tsirkin
@ 2022-01-19  8:15                                     ` Parav Pandit
  2022-01-19  8:21                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-19  8:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com


> From: virtio-dev@lists.oasis-open.org <virtio-dev@lists.oasis-open.org> On
> Behalf Of Michael S. Tsirkin
> Sent: Wednesday, January 19, 2022 12:51 PM

> > > Um. Interesting. This means that the msix capability of the VF changes?
> > Yes.
> > > Is that in fact spec compliant? Could some OSes cache the value of
> > > the capability even if the device is not in active use? E.g. I can
> > > see how this might happen in order to map the MSIX tables even before
> loading the driver.
> > >
> > PCI subsystem can cache the value before the device driver can load.
> > Generally a device supports intx/msix or intx/msi. So PCI subsystem is not
> > aware what will be used by its upper layer device drivers.
> > So it usually defers such initialization to a later stage until it is actually used.
> >
> > Whichever OS driver implements msix configuration will have to
> > either not cache it or flush + rebuild the cache.
> 
> Seems to contradict what the spec says (below).
No, it doesn't. The spec says that this depends on system software policy, system software and driver software.
So the system that comprises this policy, system software and driver software will implement the virtio extension.

In the sentence above, "whichever OS driver" covers all the software components involved in this functionality.
> 
> > > The spec says:
> > > 	Depending upon system software policy, system software, device
> > > 	driver software, or each at different times or environments may
> > > 	configure a function’s MSI-X capability and table structures with
> > > 	suitable vectors.
> > >
> > > So MSIX configuration might not be up to the driver.
> > >
> > > We actually ask driver to read back any vector assigned to a VQ so
> > > it's possible to fail vector assignment. Maybe that's better.
> > >
> > Virtio driver should not incur any additional complexity in re-reading vector
> etc.
> 
> I think it does this already.
When does it re-read?
I do not follow your point of "ask driver to read back any vector". When do you want to do this?

> 
> > All the msix config should happen much before drivers gets loaded for the VF.
> > It is PCI layer of the HV to provide a stable device to virtio device driver which
> is not undergoing msix table changes, when virtio device driver is operating on
> it.
> 
> 
> Problem is in the guest though. I'm not sure we can rely on this part being part
> of the driver and not part of the OS.
It is part of the system software, which consists of the virtio driver, the PCI subsystem and the user interface.
I do not follow your comment about "problem is in guest though".
Can you please explain?
The VF is simply not available to the guest when the HV has not given it. And when it is given, the HV doesn't modify the msix in some random manner.


* Re: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  8:15                                     ` [virtio-dev] " Parav Pandit
@ 2022-01-19  8:21                                       ` Michael S. Tsirkin
  2022-01-19 10:10                                         ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  8:21 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Wed, Jan 19, 2022 at 08:15:50AM +0000, Parav Pandit wrote:
> > > Virtio driver should not incur any additional complexity in re-reading vector
> > etc.
> > 
> > I think it does this already.
> When does it re-read?
> I do not follow your point of "ask driver to read back any vector". When do you want to do this?


After mapping an event to vector, the
driver MUST verify success by reading the Vector field value: on
success, the previously written value is returned, and on
failure, NO_VECTOR is returned. If a mapping failure is detected,
the driver MAY retry mapping with fewer vectors, disable MSI-X
or report device failure.
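
As a rough illustration of the read-back rule quoted above, here is a
minimal Python sketch. It is a model under stated assumptions, not driver
code: FakeDevice, map_queue_vector and map_all_queues are hypothetical
names, and halving the vector count on failure is only one possible way to
"retry mapping with fewer vectors". Only the NO_VECTOR value (0xffff) is
taken from the virtio spec.

```python
VIRTIO_MSI_NO_VECTOR = 0xFFFF  # value the virtio spec defines for "no vector"


class FakeDevice:
    """Hypothetical stand-in for a device exposing per-queue Vector fields."""

    def __init__(self, num_vectors, num_queues):
        self.num_vectors = num_vectors
        self.queue_msix_vector = [VIRTIO_MSI_NO_VECTOR] * num_queues

    def write_queue_vector(self, queue, vector):
        # A device that cannot back the vector latches NO_VECTOR instead.
        if vector < self.num_vectors:
            self.queue_msix_vector[queue] = vector
        else:
            self.queue_msix_vector[queue] = VIRTIO_MSI_NO_VECTOR

    def read_queue_vector(self, queue):
        return self.queue_msix_vector[queue]


def map_queue_vector(dev, queue, vector):
    """Write the Vector field, then verify success by reading it back."""
    dev.write_queue_vector(queue, vector)
    return dev.read_queue_vector(queue) == vector


def map_all_queues(dev, num_queues, wanted_vectors):
    """Map every queue, retrying with fewer (shared) vectors on failure.

    Returns the number of vectors in use, or 0 if no mapping succeeded.
    """
    n = wanted_vectors
    while n > 0:
        if all(map_queue_vector(dev, q, q % n) for q in range(num_queues)):
            return n
        n //= 2  # assumed retry policy: halve the vector count
    return 0  # caller would disable MSI-X or report device failure
```

In this model, a device that backs only 8 vectors turns a request for 16
into 8 shared vectors. A return value of 0 is the point where a driver would
disable MSI-X or report device failure.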



> > 
> > > All the msix config should happen much before drivers gets loaded for the VF.
> > > It is PCI layer of the HV to provide a stable device to virtio device driver which
> > is not undergoing msix table changes, when virtio device driver is operating on
> > it.
> > 
> > 
> > Problem is in the guest though. I'm not sure we can rely on this part being part
> > of the driver and not part of the OS.
> It is part of the system software that consist of virtio driver, pci subsystem and user interface.
> I do not follow your comment about "problem is in guest though".

Sorry, I meant the host of course.

> Can you please explain?
> VF is simply not available to the guest, when HV has not given it. And
> when its given, HV doesn’t modify the msix in some random manner.

I am concerned that we cannot be sure that changing the MSIX capability
while the device is present is safe, since the spec does not promise that
the capability is not read by the host at boot. However, given that the
device can instead fail to map events to vectors, even if it is not safe we
have other ways to fail gracefully. It's probably a good idea to mention all
this in the spec text.

-- 
MST



* RE: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  8:21                                       ` Michael S. Tsirkin
@ 2022-01-19 10:10                                         ` Parav Pandit
  2022-01-19 16:40                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-19 10:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, January 19, 2022 1:51 PM
> 
> On Wed, Jan 19, 2022 at 08:15:50AM +0000, Parav Pandit wrote:
> > > > Virtio driver should not incur any additional complexity in
> > > > re-reading vector
> > > etc.
> > >
> > > I think it does this already.
> > When does it re-read?
> > I do not follow your point of "ask driver to read back any vector". When do
> you want to do this?
> 
> 
> After mapping an event to vector, the
> driver MUST verify success by reading the Vector field value: on success, the
> previously written value is returned, and on failure, NO_VECTOR is returned. If
> a mapping failure is detected, the driver MAY retry mapping with fewer
> vectors, disable MSI-X or report device failure.
OK, I got it now.
But an insane HV can attempt to change the value of this vector even after this read was successful.
And it will obviously break the VM.
This isn't the usage model.
The PF (admin device) user giving the VF to the VM (= system software) has to ensure that it doesn't give the VF to the VM while in the middle of configuration.
We must add it to the spec in v2.

> 
> > >
> > > > All the msix config should happen much before drivers gets loaded for the
> VF.
> > > > It is PCI layer of the HV to provide a stable device to virtio
> > > > device driver which
> > > is not undergoing msix table changes, when virtio device driver is
> > > operating on it.
> > >
> > >
> > > Problem is in the guest though. I'm not sure we can rely on this
> > > part being part of the driver and not part of the OS.
> > It is part of the system software that consist of virtio driver, pci subsystem
> and user interface.
> > I do not follow your comment about "problem is in guest though".
> 
> sorry I meant host of course.
> 
> > Can you please explain?
> > VF is simply not available to the guest, when HV has not given it. And
> > when its given, HV doesn’t modify the msix in some random manner.
> 
> I am concerned that we can not be sure that changing MSIX capability while
> device is present is safe since spec does not promise the capability is not read
> by host at boot. However, given device can instead fail to map events to
> vectors, even if it is not safe we have other ways to fail gracefully. It's probably
> a good idea to mention all this in the spec text.
It is the system that implements the virtio spec that has to ensure that it doesn't change the msix capability while the device is in use.
The virtio spec should define minimum expectations from the system, such as flushing the cache, not caching, or not using the device while it is undergoing configuration.

For sure, this will be added to v2.
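
To make the caching concern concrete, here is a small Python model. It
assumes only the PCI-defined encoding of the MSI-X Table Size field (bits
10:0 of Message Control, holding N-1). Everything else is hypothetical and
for illustration only: VfConfigSpace, HostPciLayer and message_control_for
are made-up names, not an existing API.

```python
MSIX_FLAGS_QSIZE = 0x07FF   # Table Size field: bits 10:0 of Message Control
MSIX_FLAGS_ENABLE = 0x8000  # MSI-X Enable: bit 15 of Message Control


def msix_table_size(message_control):
    """Decode the vector count; the field is encoded as N-1."""
    return (message_control & MSIX_FLAGS_QSIZE) + 1


def message_control_for(num_vectors, enable=False):
    """Hypothetical helper: build a Message Control value for the model."""
    value = (num_vectors - 1) & MSIX_FLAGS_QSIZE
    if enable:
        value |= MSIX_FLAGS_ENABLE
    return value


class VfConfigSpace:
    """Hypothetical VF whose MSI-X capability the PF can reprovision."""

    def __init__(self, num_vectors):
        self.message_control = message_control_for(num_vectors)


class HostPciLayer:
    """Hypothetical host PCI layer that snapshots the size at enumeration."""

    def __init__(self, vf):
        self.vf = vf
        self.cached_size = msix_table_size(vf.message_control)

    def fresh_size(self):
        # Re-reading the capability avoids acting on a stale snapshot.
        return msix_table_size(self.vf.message_control)

    def flush_cache(self):
        self.cached_size = self.fresh_size()
```

If the PF changes the VF from 16 to 8 vectors after enumeration,
cached_size still reports 16 until flush_cache() runs, while fresh_size()
reports 8. That gap is what a "no cache or flush the cache" requirement
would close.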


* Re: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19 10:10                                         ` Parav Pandit
@ 2022-01-19 16:40                                           ` Michael S. Tsirkin
  2022-01-19 17:07                                             ` Parav Pandit
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19 16:40 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Wed, Jan 19, 2022 at 10:10:38AM +0000, Parav Pandit wrote:
> Virtio spec should define a minimum expectations from the system such
> as flushing the cache or no cache

Well, one of the things virtio is trying to do is be compatible with a
wide range of hypervisors/OSes. It might be tricky to change how they
work internally. If we are relying on tricks like this, it might be
necessary to poke at some popular systems to see what they do.
Lots of work ...

-- 
MST



* RE: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19 16:40                                           ` Michael S. Tsirkin
@ 2022-01-19 17:07                                             ` Parav Pandit
  0 siblings, 0 replies; 75+ messages in thread
From: Parav Pandit @ 2022-01-19 17:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, January 19, 2022 10:10 PM
> 
> On Wed, Jan 19, 2022 at 10:10:38AM +0000, Parav Pandit wrote:
> > Virtio spec should define a minimum expectations from the system such
> > as flushing the cache or no cache
> 
> well one of the things virtio is trying to do is being compatible with a wide
> range of hypervisors/OSes. 
We are not breaking any compatibility with this optional enhancement.

> it might be tricky to change how they work
> internally. if we are relying on tricks like this it might be necessary to poke at
> some popular systems to see what they do.
Sure, it should work on a wide range of hypervisors/OSes. It's a new feature, so those will implement it when scale is critical for them.

If we consider Linux a popular system, then the Linux PCI subsystem and the mlx5 driver already implement it in upstream kernel 5.13.
(It doesn't cache it).

A similar implementation for non-virtio devices also exists in _other_ popular OS, which I should avoid naming here.



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:07                     ` Parav Pandit
  2022-01-18  7:12                       ` Michael S. Tsirkin
@ 2022-01-18  7:13                       ` Michael S. Tsirkin
  2022-01-18  7:21                         ` Parav Pandit
  2022-01-19  4:03                       ` Jason Wang
  2 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:13 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> Can you please review current proposal as well before we revise v2?

I think what you listed amounts to a significant rework and will make
things easier to review. Not 100% sure you need more feedback at this
point.

-- 
MST



* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:13                       ` Michael S. Tsirkin
@ 2022-01-18  7:21                         ` Parav Pandit
  2022-01-18  7:37                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-18  7:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 12:44 PM
> 
> On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > Can you please review current proposal as well before we revise v2?
> 
> I think what you listed amounts to a significant rework and will make things
> easier to review. Not 100% sure you need more feedback at this point.

With 
(a) the motivation that Jason mentioned for config vqs, vectors etc,
(b) the msix config/query of this proposal
(c) your description to handle them in uniform way,
(d) understanding the scale inefficiency, on-die resources, multiple outstanding cmds discussion in the thread,

I would like to receive feedback that we all agree to configure these values via AQ.

The rest of the AQ plumbing, addressing the comments, is to be completed in v2 once this looks OK.



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:21                         ` Parav Pandit
@ 2022-01-18  7:37                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:37 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Tue, Jan 18, 2022 at 07:21:02AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 12:44 PM
> > 
> > On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > > Can you please review current proposal as well before we revise v2?
> > 
> > I think what you listed amounts to a significant rework and will make things
> > easier to review. Not 100% sure you need more feedback at this point.
> 
> With 
> (a) the motivation that Jason mentioned for config vqs, vectors etc,
> (b) the msix config/query of this proposal
> (c) your description to handle them in uniform way,
> (d) understanding the scale inefficiency, on-die resources, multiple outstanding cmds discussion in the thread,
> 
> I would like to receive feedback that we all agree to configure these values via AQ.
> Rest of the plumbing on AQ etc to address comments to complete in v2, once this looks ok.

Go ahead and wait if you like, that was just my advice because
personally if I see a mega-thread like this one on the list I just wait
for the next version. Review time has to be viewed as more precious than
developer time, otherwise things do not scale.

Or to put it more succinctly, iterating quickly is a recipe for success.
-- 
MST



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:07                     ` Parav Pandit
  2022-01-18  7:12                       ` Michael S. Tsirkin
  2022-01-18  7:13                       ` Michael S. Tsirkin
@ 2022-01-19  4:03                       ` Jason Wang
  2 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2022-01-19  4:03 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com


On 2022/1/18 3:07 PM, Parav Pandit wrote:
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Tuesday, January 18, 2022 12:25 PM
>>
>> On Tue, Jan 18, 2022 at 06:32:50AM +0000, Parav Pandit wrote:
>>>
>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>> Sent: Tuesday, January 18, 2022 11:54 AM
>>>>
>>>> On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
>>>>>
>>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>>> Sent: Tuesday, January 18, 2022 3:44 AM
>>>>>>>> It's a control queue. Why do we worry?
>>>>>>> It is used to control/manage the resource of a VF which is
>>>>>>> deployed usually
>>>>>> to a VM.
>>>>>>> So higher the latency, higher the time it takes to deploy start the VM.
>>>>>> What are the savings here, in real terms? Boot times for
>>>>>> smallest VMs are in 10s of milliseconds. Is reordering of a
>>>>>> queue somehow going to save more than microseconds?
>>>>>>
>>>>> It is probably better not to pick on a specific vendor implementation.
>>>>> But for real numbers, I see that an implementation takes in the range of
>>>>> 54 usec to 500 usec for a simple configuration.
>>>>> It is better that a small VM's 4-vector configuration does not take longer
>>>>> because there was a previous AQ command for 64 vectors.
>>>>
>>>> So virtio discovery on boot includes multiple of vmexits, each costs
>>>> ~1000 cycles.  And people do not seem to worry about it.
>>> It is not the vector configuration by guest VM.
>>> It is the AQ command that provisions number of msix vectors for the VF that
>> takes tens to hundreds of usecs.
>>> These are the command in patch-5 in this proposal.
>> Hundreds of usecs is negligible compared to VM boot time.
>> Sorry I don't really see why we worry about indirect in that case.
>>
>>
> Ok. we will do incremental proposal after this for wider use case.
>
>>>> You want a compelling argument for working on performance of config.
>>>> I frankly think it's not really useful but I especially think you
>>>> should cut this out of the current proposal, it's too big as it is.
>>>>
>>> Ok. We can do follow on proposal after AQ.
>>> We already see need of out of order AQ in internal performance tests we are
>> running.
>>
>> OK so first of all you can avoid declaring IN_ORDER.
> This will force non IN_ORDER on other txq and rxq too that causes higher latency.
> But fine, initial implementation can start without it.
>
>> If you see that IN_ORDER
>> improves performance for you so you need it, then look at PARTIAL_ORDER
>> pls.
> Ok. will consider PARTIAL_ORDER more in future proposal.
>
>> And if that does not address your needs then let's discuss, I'd rather have a
>> generic solution since the requirement does not seem to be specific to AQ.
>>
>>> But fine, we can defer.
> So far I gather below summary that needs to be addressed in v2.
>
> 1. Use AQ for msix query and config


If it means IMS, there's already a proposal [1] that introduces MSI 
commands via the admin virtqueue. We also had a similar requirement for 
virtio-MMIO [2] and for managed devices or SFs [3], so I would rather 
introduce IMS (it needs a better name, though) as a basic facility instead 
of tying it to any specific transport.


> 2. AQ follows IN_ORDER and INDIRECT_DESC negotiation like the rest of the queues
> 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)


I fail to understand the scale/register issues. With one of my 
previous proposals (device selector), technically we don't even need any 
config space or BAR for a VF or SF, by multiplexing the registers of the PF.

I do see one advantage: the admin virtqueue is transport 
independent (or it could be used as a transport).


> 4. Improve documentation around msix config to link to sriov section of virtio spec
> 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
>
> Did I miss anything?
>
> Yet to receive your feedback on group, if/why is it needed and, why/if it must be in this proposal, what pieces prevents it do as follow-on.
>
> Cornelia, Jason,
> Can you please review current proposal as well before we revise v2?


If I understand correctly, most of the features (except for the admin 
virtqueue in_order stuff) are not specific to the admin virtqueue. As 
discussed in the previous versions, I still think it's better to:

1) add sections in the basic device facility or data structure for 
provisioning and MSI
2) introduce the admin virtqueue on top as a device interface for those 
features

This leaves the chance for future extensions to allow those features to 
be used by a transport specific interface, which will benefit:

1) vendor that doesn't want to transport specific method (MMIO or PCIe 
capability) [4]
2) features that can be used by guest or nesting environment (L1)

Thanks

[1] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html

[2] https://lkml.org/lkml/2020/1/21/31

[3] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00134.html

[4] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html


>



* [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
  2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
  2022-01-13 14:51 ` [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER Max Gurtovoy
@ 2022-01-13 14:51 ` Max Gurtovoy
  2022-01-13 18:24   ` Michael S. Tsirkin
  2022-01-13 14:51 ` [PATCH 4/5] virtio-net: " Max Gurtovoy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:51 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 content.tex | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/content.tex b/content.tex
index cc3e648..0ae4b68 100644
--- a/content.tex
+++ b/content.tex
@@ -4518,10 +4518,19 @@ \subsection{Device ID}\label{sec:Device Types / Block Device / Device ID}
   2
 
 \subsection{Virtqueues}\label{sec:Device Types / Block Device / Virtqueues}
+ If VIRTIO_F_ADMIN_VQ is not negotiated, the request queues layout is as follows:
 \begin{description}
 \item[0] requestq1
 \item[\ldots]
 \item[N-1] requestqN
+\end{description}
+
+ If VIRTIO_F_ADMIN_VQ is negotiated, the queues layout is as follows:
+\begin{description}
+\item[0] requestq1
+\item[\ldots]
+\item[N-1] requestqN
+\item[N] adminq
 \end{description}
 
  N=1 if VIRTIO_BLK_F_MQ is not negotiated, otherwise N is set by
@@ -4590,7 +4599,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device /
 bits as indicated above.
 
 The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies
-the number of queues.
+the number of request queues. This field doesn't account for the admin virtqueue.
 
 The parameters in the configuration space of the device \field{max_discard_sectors}
 \field{discard_sector_alignment} are expressed in 512-byte units if the
-- 
2.21.0



* Re: [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 14:51 ` [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ Max Gurtovoy
@ 2022-01-13 18:24   ` Michael S. Tsirkin
  0 siblings, 0 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 18:24 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:01PM +0200, Max Gurtovoy wrote:
> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

Ugh. Needing to update each and every device just so it can get
generic commands seems very annoying.
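
For reference, the block-device layout in the quoted patch reduces to a
single per-device-type rule, sketched below in Python. The function name
blk_queue_layout and its parameters are hypothetical. The rule itself (N
request queues, then adminq at index N, with N=1 unless VIRTIO_BLK_F_MQ is
negotiated) is read straight from the diff.

```python
def blk_queue_layout(mq_negotiated, num_queues, admin_vq):
    """Return {virtqueue index: name} for a virtio-blk device.

    num_queues counts request queues only (it does not include adminq)
    and is ignored when VIRTIO_BLK_F_MQ is not negotiated.
    """
    n = num_queues if mq_negotiated else 1
    layout = {i: f"requestq{i + 1}" for i in range(n)}
    if admin_vq:
        layout[n] = "adminq"  # adminq follows the last request queue
    return layout
```

A similar rule would have to be written for every device type that gains
an admin queue under this scheme.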

> ---
>  content.tex | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/content.tex b/content.tex
> index cc3e648..0ae4b68 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -4518,10 +4518,19 @@ \subsection{Device ID}\label{sec:Device Types / Block Device / Device ID}
>    2
>  
>  \subsection{Virtqueues}\label{sec:Device Types / Block Device / Virtqueues}
> + if VIRTIO_F_ADMIN_VQ is not negotiated, the request queues layout is as follows:
>  \begin{description}
>  \item[0] requestq1
>  \item[\ldots]
>  \item[N-1] requestqN
> +\end{description}
> +
> + If VIRTIO_F_ADMIN_VQ is negotiated, the queues layout is as follows:
> +\begin{description}
> +\item[0] requestq1
> +\item[\ldots]
> +\item[N-1] requestqN
> +\item[N] adminq
>  \end{description}
>  
>   N=1 if VIRTIO_BLK_F_MQ is not negotiated, otherwise N is set by
> @@ -4590,7 +4599,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device /
>  bits as indicated above.
>  
>  The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies
> -the number of queues.
> +the number of request queues. This field doesn't account admin virtqueue.
>  
>  The parameters in the configuration space of the device \field{max_discard_sectors}
>  \field{discard_sector_alignment} are expressed in 512-byte units if the
> -- 
> 2.21.0



* [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
                   ` (2 preceding siblings ...)
  2022-01-13 14:51 ` [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ Max Gurtovoy
@ 2022-01-13 14:51 ` Max Gurtovoy
  2022-01-13 17:56   ` Michael S. Tsirkin
  2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
  2022-01-13 18:32 ` [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Michael S. Tsirkin
  5 siblings, 1 reply; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:51 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 content.tex | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/content.tex b/content.tex
index 0ae4b68..e9c2383 100644
--- a/content.tex
+++ b/content.tex
@@ -3034,6 +3034,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
 \item[2(N-1)] receiveqN
 \item[2(N-1)+1] transmitqN
 \item[2N] controlq
+\item[2N + 1] adminq (or \textbf{2N} in case VIRTIO_NET_F_CTRL_VQ is not set)
 \end{description}
 
  N=1 if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS are negotiated, otherwise N is set by
@@ -3041,6 +3042,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
 
  controlq only exists if VIRTIO_NET_F_CTRL_VQ set.
 
+ adminq only exists if VIRTIO_F_ADMIN_VQ set.
+
 \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
 
 \begin{description}
-- 
2.21.0



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 14:51 ` [PATCH 4/5] virtio-net: " Max Gurtovoy
@ 2022-01-13 17:56   ` Michael S. Tsirkin
  2022-01-16  9:47     ` Max Gurtovoy
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 17:56 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

So admin VQ # is only known when all features are negotiated.
Which is quite annoying if hypervisor wants to partition
things e.g. handling admin q in process and handling vqs
by an external process or by hardware.

I think we can allow devices to set the VQ# for the admin queue
instead. Would that work?


> ---
>  content.tex | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0ae4b68..e9c2383 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3034,6 +3034,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
>  \item[2(N-1)] receiveqN
>  \item[2(N-1)+1] transmitqN
>  \item[2N] controlq
> +\item[2N + 1] adminq (or \textbf{2N} in case VIRTIO_NET_F_CTRL_VQ is not set)
>  \end{description}
>  
>   N=1 if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS are negotiated, otherwise N is set by
> @@ -3041,6 +3042,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
>  
>   controlq only exists if VIRTIO_NET_F_CTRL_VQ set.
>  
> + adminq only exists if VIRTIO_F_ADMIN_VQ set.
> +
>  \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
>  
>  \begin{description}
> -- 
> 2.21.0



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 17:56   ` Michael S. Tsirkin
@ 2022-01-16  9:47     ` Max Gurtovoy
  2022-01-16 16:45       ` Michael S. Tsirkin
  2022-01-17 14:07       ` Parav Pandit
  0 siblings, 2 replies; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-16  9:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha


On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
>> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
>>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> So admin VQ # is only known when all features are negotiated.

No. The driver should check whether VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ are
set by the device.

Negotiation is not required.

Let's say CTRL_VQ is supported by the device, driver A would like to
use it and driver B would not - in both cases the adminq VQ
# would be 2N + 1.
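
A minimal sketch of the index rule described above, in Python for illustration only. The function name and its parameters are hypothetical; the rule itself follows the patch text: the adminq sits at 2N + 1 when the device offers VIRTIO_NET_F_CTRL_VQ and at 2N otherwise, independent of what a given driver chooses to use.

```python
def adminq_index(n_pairs: int, device_offers_ctrl_vq: bool) -> int:
    """Return the adminq index for a virtio-net device with n_pairs queue pairs.

    Assumption: the index depends on the features the device offers, not on
    what a particular driver accepts.
    """
    if n_pairs < 1:
        raise ValueError("a virtio-net device has at least one queue pair")
    # Indices 0 .. 2N-1 are the receive/transmit queues. The controlq, if the
    # device offers it, takes 2N and pushes the adminq to 2N + 1.
    return 2 * n_pairs + (1 if device_offers_ctrl_vq else 0)


# Drivers A and B see the same index whether or not they use the controlq.
print(adminq_index(1, True))   # 3
print(adminq_index(4, False))  # 8
```

Note that N is still an input here, so the index is only as stable as N is.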

> Which is quite annoying if hypervisor wants to partition
> things e.g. handling admin q in process and handling vqs
> by an external process or by hardware.
>
> I think we can allow devices to set the VQ# for the admin queue
> instead. Would that work?
>
>
>> ---
>>   content.tex | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0ae4b68..e9c2383 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -3034,6 +3034,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
>>   \item[2(N-1)] receiveqN
>>   \item[2(N-1)+1] transmitqN
>>   \item[2N] controlq
>> +\item[2N + 1] adminq (or \textbf{2N} in case VIRTIO_NET_F_CTRL_VQ is not set)
>>   \end{description}
>>   
>>    N=1 if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS are negotiated, otherwise N is set by
>> @@ -3041,6 +3042,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
>>   
>>    controlq only exists if VIRTIO_NET_F_CTRL_VQ set.
>>   
>> + adminq only exists if VIRTIO_F_ADMIN_VQ set.
>> +
>>   \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
>>   
>>   \begin{description}
>> -- 
>> 2.21.0



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-16  9:47     ` Max Gurtovoy
@ 2022-01-16 16:45       ` Michael S. Tsirkin
  2022-01-17 14:07       ` Parav Pandit
  1 sibling, 0 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-16 16:45 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Sun, Jan 16, 2022 at 11:47:30AM +0200, Max Gurtovoy wrote:
> 
> On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> > > Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> > > 
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > So admin VQ # is only known when all features are negotiated.
> 
> No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ are set
> by the device.
> 
> Negotiation is not a must.
> 
> Lets say CTRL_VQ is supported by the device and driver A would like to use
> it and driver B wouldn't like to use it - in both cases the admiq VQ # would
> be 2N + 1.

What's N here though?

> > Which is quite annoying if hypervisor wants to partition
> > things e.g. handling admin q in process and handling vqs
> > by an external process or by hardware.


This part stands.

> > 
> > I think we can allow devices to set the VQ# for the admin queue
> > instead. Would that work?
> > 
> > 
> > > ---
> > >   content.tex | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index 0ae4b68..e9c2383 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -3034,6 +3034,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
> > >   \item[2(N-1)] receiveqN
> > >   \item[2(N-1)+1] transmitqN
> > >   \item[2N] controlq
> > > +\item[2N + 1] adminq (or \textbf{2N} in case VIRTIO_NET_F_CTRL_VQ is not set)
> > >   \end{description}
> > >    N=1 if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS are negotiated, otherwise N is set by
> > > @@ -3041,6 +3042,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
> > >    controlq only exists if VIRTIO_NET_F_CTRL_VQ set.
> > > + adminq only exists if VIRTIO_F_ADMIN_VQ set.
> > > +
> > >   \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
> > >   \begin{description}
> > > -- 
> > > 2.21.0



* RE: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-16  9:47     ` Max Gurtovoy
  2022-01-16 16:45       ` Michael S. Tsirkin
@ 2022-01-17 14:07       ` Parav Pandit
  2022-01-17 22:22         ` Michael S. Tsirkin
  1 sibling, 1 reply; 75+ messages in thread
From: Parav Pandit @ 2022-01-17 14:07 UTC (permalink / raw)
  To: Max Gurtovoy, Michael S. Tsirkin
  Cc: virtio-comment@lists.oasis-open.org, cohuck@redhat.com,
	virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
	Shahaf Shuler, Oren Duer, stefanha@redhat.com


> From: Max Gurtovoy <mgurtovoy@nvidia.com>
> Sent: Sunday, January 16, 2022 3:18 PM
> 
> 
> On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> >> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> >>
> >> Reviewed-by: Parav Pandit <parav@nvidia.com>
> >> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > So admin VQ # is only known when all features are negotiated.
> 
> No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
> are set by the device.
> 
> Negotiation is not a must.
> 
> Lets say CTRL_VQ is supported by the device and driver A would like to use it
> and driver B wouldn't like to use it - in both cases the admiq VQ # would be 2N
> + 1.
> 
> > Which is quite annoying if hypervisor wants to partition things e.g.
> > handling admin q in process and handling vqs by an external process or
> > by hardware.
> >
> > I think we can allow devices to set the VQ# for the admin queue
> > instead. Would that work?
The number of MSI-X vectors and the number of VQs are two different configurations, though they are strongly correlated.
Configuring the number of queues seems very device specific (even though num_queues is a generic field in struct virtio_pci_common_cfg).

So configuring the number of VQs is a different command, likely combined with other device-specific config such as MAC, RSS or others.


* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-17 14:07       ` Parav Pandit
@ 2022-01-17 22:22         ` Michael S. Tsirkin
  2022-01-18  2:18           ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 22:22 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Shahaf Shuler, Oren Duer,
	stefanha@redhat.com

On Mon, Jan 17, 2022 at 02:07:51PM +0000, Parav Pandit wrote:
> 
> > From: Max Gurtovoy <mgurtovoy@nvidia.com>
> > Sent: Sunday, January 16, 2022 3:18 PM
> > 
> > 
> > On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> > >> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> > >>
> > >> Reviewed-by: Parav Pandit <parav@nvidia.com>
> > >> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > So admin VQ # is only known when all features are negotiated.
> > 
> > No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
> > are set by the device.
> > 
> > Negotiation is not a must.
> > 
> > Lets say CTRL_VQ is supported by the device and driver A would like to use it
> > and driver B wouldn't like to use it - in both cases the admiq VQ # would be 2N
> > + 1.
> > 
> > > Which is quite annoying if hypervisor wants to partition things e.g.
> > > handling admin q in process and handling vqs by an external process or
> > > by hardware.
> > >
> > > I think we can allow devices to set the VQ# for the admin queue
> > > instead. Would that work?
> Number of MSI-X configuration and number of VQs config are two different,


I was talking about the number of the VQ used for admin commands. Not
about the number of VQs.

> though it has strong correlation.
> Configuring number of queues seems a very device specific configuration (even though num_queues is generic field in struct virtio_pci_common_cfg).
> 
> So num VQ configuration is a different command likely combined with other device specific config such as mac or rss or others.

I was not talking about that at all, but since you mention it,
to me it looks like something that many device types can support.
It's not necessarily RSS related, MQ config would benefit too,
so I am not sure why we should not have a command for controlling
the number of queues. It looks like it could be quite generic.

Since current guests only support two modes, a vector
per VQ and a shared vector for all VQs, it follows that
when configuring vectors per VF it is important to also configure
VQs per VF. This makes me wonder whether the ability to configure
vectors per VF in isolation, without the ability to configure or
at least query VQs per VF, even has value.
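
To illustrate why the two numbers are coupled, here is a rough Python sketch of how a guest might pick between the two modes. The function name is hypothetical, and reserving one vector for configuration-change interrupts is an assumption of this sketch, not something stated in this thread.

```python
def pick_vector_mode(num_vqs: int, num_vectors: int) -> str:
    """Choose an interrupt mode for a VF from its VQ and MSI-X vector counts.

    Assumption: one vector is reserved for configuration changes, so the
    per-VQ mode needs num_vqs + 1 vectors and the shared mode needs 2.
    """
    if num_vectors >= num_vqs + 1:
        return "per-vq"
    if num_vectors >= 2:
        return "shared"
    return "no-msix"


print(pick_vector_mode(9, 10))  # per-vq
print(pick_vector_mode(9, 8))   # shared
print(pick_vector_mode(9, 1))   # no-msix
```

Under these assumptions a VF with 9 VQs and 8 vectors falls back to the shared mode and leaves 6 of its 8 vectors idle, which is the waste that provisioning vectors without knowing the VQ count can cause.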


-- 
MST



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-17 22:22         ` Michael S. Tsirkin
@ 2022-01-18  2:18           ` Jason Wang
  2022-01-18  5:25             ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2022-01-18  2:18 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com


On 2022/1/18 6:22 AM, Michael S. Tsirkin wrote:
> On Mon, Jan 17, 2022 at 02:07:51PM +0000, Parav Pandit wrote:
>>> From: Max Gurtovoy <mgurtovoy@nvidia.com>
>>> Sent: Sunday, January 16, 2022 3:18 PM
>>>
>>>
>>> On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
>>>> On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
>>>>> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
>>>>>
>>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>> So admin VQ # is only known when all features are negotiated.
>>> No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
>>> are set by the device.
>>>
>>> Negotiation is not a must.
>>>
>>> Lets say CTRL_VQ is supported by the device and driver A would like to use it
>>> and driver B wouldn't like to use it - in both cases the admiq VQ # would be 2N
>>> + 1.
>>>
>>>> Which is quite annoying if hypervisor wants to partition things e.g.
>>>> handling admin q in process and handling vqs by an external process or
>>>> by hardware.
>>>>
>>>> I think we can allow devices to set the VQ# for the admin queue
>>>> instead. Would that work?
>> Number of MSI-X configuration and number of VQs config are two different,
>
> I was talking about the number of the VQ used for admin commands. Not
> about the number of VQs.
>
>> though it has strong correlation.
>> Configuring number of queues seems a very device specific configuration (even though num_queues is generic field in struct virtio_pci_common_cfg).
>>
>> So num VQ configuration is a different command likely combined with other device specific config such as mac or rss or others.
> I was not talking about that at all, but since you mention that,
> to me it looks like something that many device types can support.
> It's not necessarily rss related, MQ config would benefit too,
> so I am not sure why not have a command for controlling number
> of queues. Looks like it could quite be generic.
>
> Since current guests only support two modes: a vector
> per VQ and a shared vector for all VQs, it follows that
> it is important when configuring vectors per VF to also configure
> VQs per VF. This makes me wonder whether ability to configure
> vectors per VF in isolation without ability to configure or
> at least query VQs per VF even has value.


I had some thoughts on this in the past: it looks to me like we need a generic
provisioning interface that contains all the necessary attributes:

1) #queues
2) device_features
3) #msi_vectors
4) device specific configurations

It could be either an admin virtqueue interface[1] or a dedicated
capability[2] (the latter seems easier).
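
As a rough illustration of the attribute set listed above, a Python sketch follows. The class name, field names and validation rules are all hypothetical; neither [1] nor [2] defines this layout.

```python
from dataclasses import dataclass, field


@dataclass
class VfProvision:
    """Hypothetical container for the four attribute groups listed above."""
    num_queues: int          # 1) #queues
    device_features: int     # 2) device_features, as a feature bitmap
    num_msix_vectors: int    # 3) #msi_vectors
    device_config: dict = field(default_factory=dict)  # 4) device specific

    def validate(self) -> None:
        """Reject obviously inconsistent values before sending them anywhere."""
        if self.num_queues < 1:
            raise ValueError("at least one queue is required")
        if self.num_msix_vectors < 0:
            raise ValueError("vector count cannot be negative")
        if self.device_features < 0:
            raise ValueError("feature bitmap must be non-negative")


# Bit 32 is VIRTIO_F_VERSION_1; the MAC value is an arbitrary example.
p = VfProvision(num_queues=4, device_features=1 << 32, num_msix_vectors=5,
                device_config={"mac": "52:54:00:12:34:56"})
p.validate()
```

Keeping the four groups in one structure makes it possible to check them against each other, for example queues against vectors, before the device is handed to a guest.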

Thanks

[1] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html
[2] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html


>
>



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-18  2:18           ` Jason Wang
@ 2022-01-18  5:25             ` Michael S. Tsirkin
  2022-01-19  4:16               ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  5:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Tue, Jan 18, 2022 at 10:18:29AM +0800, Jason Wang wrote:
> 
> On 2022/1/18 6:22 AM, Michael S. Tsirkin wrote:
> > On Mon, Jan 17, 2022 at 02:07:51PM +0000, Parav Pandit wrote:
> > > > From: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > Sent: Sunday, January 16, 2022 3:18 PM
> > > > 
> > > > 
> > > > On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > > > > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> > > > > > Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> > > > > > 
> > > > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > So admin VQ # is only known when all features are negotiated.
> > > > No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
> > > > are set by the device.
> > > > 
> > > > Negotiation is not a must.
> > > > 
> > > > Lets say CTRL_VQ is supported by the device and driver A would like to use it
> > > > and driver B wouldn't like to use it - in both cases the admiq VQ # would be 2N
> > > > + 1.
> > > > 
> > > > > Which is quite annoying if hypervisor wants to partition things e.g.
> > > > > handling admin q in process and handling vqs by an external process or
> > > > > by hardware.
> > > > > 
> > > > > I think we can allow devices to set the VQ# for the admin queue
> > > > > instead. Would that work?
> > > Number of MSI-X configuration and number of VQs config are two different,
> > 
> > I was talking about the number of the VQ used for admin commands. Not
> > about the number of VQs.
> > 
> > > though it has strong correlation.
> > > Configuring number of queues seems a very device specific configuration (even though num_queues is generic field in struct virtio_pci_common_cfg).
> > > 
> > > So num VQ configuration is a different command likely combined with other device specific config such as mac or rss or others.
> > I was not talking about that at all, but since you mention that,
> > to me it looks like something that many device types can support.
> > It's not necessarily rss related, MQ config would benefit too,
> > so I am not sure why not have a command for controlling number
> > of queues. Looks like it could quite be generic.
> > 
> > Since current guests only support two modes: a vector
> > per VQ and a shared vector for all VQs, it follows that
> > it is important when configuring vectors per VF to also configure
> > VQs per VF. This makes me wonder whether ability to configure
> > vectors per VF in isolation without ability to configure or
> > at least query VQs per VF even has value.
> 
> 
> So I had some thought in the past, it looks to me we need a generic
> provision interface that contains all the necessary attributes:
> 
> 1) #queues
> 2) device_features
> 3) #msi_vectors
> 4) device specific configurations
> 
> It could be either an admin virtqueue interface[1] or a dedicated
> capability[2], (the latter seems easier).
> 
> Thanks
> 
> [1]
> https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html
> [2]
> https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> 

We also need:
- something like injecting cvq commands to control rx mode from the admin device
- page fault / dirty page handling

These two seem to call for a vq.


> > 
> > 



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-18  5:25             ` Michael S. Tsirkin
@ 2022-01-19  4:16               ` Jason Wang
  2022-01-19  9:26                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2022-01-19  4:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Tue, Jan 18, 2022 at 1:25 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 18, 2022 at 10:18:29AM +0800, Jason Wang wrote:
> >
> > On 2022/1/18 6:22 AM, Michael S. Tsirkin wrote:
> > > On Mon, Jan 17, 2022 at 02:07:51PM +0000, Parav Pandit wrote:
> > > > > From: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > Sent: Sunday, January 16, 2022 3:18 PM
> > > > >
> > > > >
> > > > > On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> > > > > > > Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> > > > > > >
> > > > > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > > So admin VQ # is only known when all features are negotiated.
> > > > > No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
> > > > > are set by the device.
> > > > >
> > > > > Negotiation is not a must.
> > > > >
> > > > > Lets say CTRL_VQ is supported by the device and driver A would like to use it
> > > > > and driver B wouldn't like to use it - in both cases the admiq VQ # would be 2N
> > > > > + 1.
> > > > >
> > > > > > Which is quite annoying if hypervisor wants to partition things e.g.
> > > > > > handling admin q in process and handling vqs by an external process or
> > > > > > by hardware.
> > > > > >
> > > > > > I think we can allow devices to set the VQ# for the admin queue
> > > > > > instead. Would that work?
> > > > Number of MSI-X configuration and number of VQs config are two different,
> > >
> > > I was talking about the number of the VQ used for admin commands. Not
> > > about the number of VQs.
> > >
> > > > though it has strong correlation.
> > > > Configuring number of queues seems a very device specific configuration (even though num_queues is generic field in struct virtio_pci_common_cfg).
> > > >
> > > > So num VQ configuration is a different command likely combined with other device specific config such as mac or rss or others.
> > > I was not talking about that at all, but since you mention that,
> > > to me it looks like something that many device types can support.
> > > It's not necessarily rss related, MQ config would benefit too,
> > > so I am not sure why not have a command for controlling number
> > > of queues. Looks like it could quite be generic.
> > >
> > > Since current guests only support two modes: a vector
> > > per VQ and a shared vector for all VQs, it follows that
> > > it is important when configuring vectors per VF to also configure
> > > VQs per VF. This makes me wonder whether ability to configure
> > > vectors per VF in isolation without ability to configure or
> > > at least query VQs per VF even has value.
> >
> >
> > So I had some thought in the past, it looks to me we need a generic
> > provision interface that contains all the necessary attributes:
> >
> > 1) #queues
> > 2) device_features
> > 3) #msi_vectors
> > 4) device specific configurations
> >
> > It could be either an admin virtqueue interface[1] or a dedicated
> > capability[2], (the latter seems easier).
> >
> > Thanks
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html
> > [2]
> > https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> >
>
> We also need
> - something like injecting cvq commands to control rx mode from the admin device
> - page fault / dirty page handling
>
> these two seem to call for a vq.

Right, but the vq does not necessarily have to belong to the PF if we
have PASID. And with PASID we don't even need a dedicated new cvq.

Thanks

>
>
> > >
> > >
>



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-19  4:16               ` Jason Wang
@ 2022-01-19  9:26                 ` Michael S. Tsirkin
  2022-01-25  3:53                   ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  9:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > We also need
> > - something like injecting cvq commands to control rx mode from the admin device
> > - page fault / dirty page handling
> >
> > these two seem to call for a vq.
> 
> Right, but vq is not necessarily for PF if we had PASID. And with
> PASID we don't even need a dedicated new cvq.

I don't think it's a good idea to mix transactions from
multiple PASIDs on the same vq.

Attaching a PASID to a queue seems more reasonable.
cvq is under guest control, so yes I think a separate
vq is preferable.

What is true is that with subfunctions you would have
PASID per subfunction and then one subfunction for control.

I think a sketch of how things will work with scalable IOV can't hurt as
part of this proposal.  And I'm not sure we should have so much
flexibility: if there's an interface that works for both SRIOV and SIOV,
then that seems preferable to having distinct transports for SRIOV and
SIOV.

-- 
MST


* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-19  9:26                 ` Michael S. Tsirkin
@ 2022-01-25  3:53                   ` Jason Wang
  2022-01-25  7:19                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2022-01-25  3:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > We also need
> > > - something like injecting cvq commands to control rx mode from the admin device
> > > - page fault / dirty page handling
> > >
> > > these two seem to call for a vq.
> >
> > Right, but vq is not necessarily for PF if we had PASID. And with
> > PASID we don't even need a dedicated new cvq.
>
> I don't think it's a good idea to mix transactions from
> multiple PASIDs on the same vq.

To be clear, I don't mean to let a single vq use multiple PASIDs.

>
> Attaching a PASID to a queue seems more reasonable.
> cvq is under guest control, so yes I think a separate
> vq is preferable.

Sorry, I don't follow here. E.g. in the case of virtio-net, it's more than
sufficient to assign a dedicated PASID to the cvq; is there any reason for
yet another one?

>
> What is true is that with subfunctions you would have
> PASID per subfunction and then one subfunction for control.

Well, it's possible, but it's also possible to have everything
self-contained in a single subfunction. Then the cvq can be assigned to a
PASID that is used only for the hypervisor.

>
> I think a sketch of how things will work with scalable iov can't hurt as
> part of this proposal.  And, I'm not sure we should have so much
> flexibility: if there's an interface that works for SRIOV and SIOV then
> that seems preferable than having distinct transports for SRIOV and
> SIOV.

Some of my understanding of SR-IOV vs SIOV:

1) SR-IOV doesn't require a transport, since the VF uses PCI config space,
but SIOV requires one
2) SR-IOV doesn't support dynamic on-demand provisioning, whereas SIOV does

So I'm not sure how hard it would be to unify the management
plane of the two.

Thanks


>
>
> --
> MST
>



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-25  3:53                   ` Jason Wang
@ 2022-01-25  7:19                     ` Michael S. Tsirkin
  2022-01-26  5:49                       ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-25  7:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Tue, Jan 25, 2022 at 11:53:35AM +0800, Jason Wang wrote:
> On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > > We also need
> > > > - something like injecting cvq commands to control rx mode from the admin device
> > > > - page fault / dirty page handling
> > > >
> > > > these two seem to call for a vq.
> > >
> > > Right, but vq is not necessarily for PF if we had PASID. And with
> > > PASID we don't even need a dedicated new cvq.
> >
> > I don't think it's a good idea to mix transactions from
> > multiple PASIDs on the same vq.
> 
> To be clear, I don't mean to let a single vq use multiple PASIDs.
> 
> >
> > Attaching a PASID to a queue seems more reasonable.
> > cvq is under guest control, so yes I think a separate
> > vq is preferable.
> 
> Sorry, I don't get here. E.g in the case of virtio-net, it's more than
> sufficient to assign a dedicated PASID to cvq, any reason for yet
> another one?

Well, I'm not sure how cheap it is to have an extra PASID.
In theory you can share page tables, making it not that
expensive. In practice, is it hard for the MMU to do so?
If page tables are not shared, extra PASIDs become expensive.


> >
> > What is true is that with subfunctions you would have
> > PASID per subfunction and then one subfunction for control.
> 
> Well, it's possible, but it's also possible to have everything self
> contained in a single subfucntion. Then cvq can be assigned to a PASID
> that is used only for the hypervisor.
> 
> >
> > I think a sketch of how things will work with scalable iov can't hurt as
> > part of this proposal.  And, I'm not sure we should have so much
> > flexibility: if there's an interface that works for SRIOV and SIOV then
> > that seems preferable than having distinct transports for SRIOV and
> > SIOV.
> 
> Some of my understanding of SR-IOV vs SIOV:
> 
> 1) SR-IOV doesn't requires a transport, VF use PCI config space; But
> SIOV requires one
> 2) SR-IOV doesn't support dynamic on demand provisioning where SIOV can
> 
> So I'm not sure how hard it is if we want to unify the management
> plane of the above two.
> 
> Thanks

Interesting. So are you fine with a proposal which ignores the PASID
question completely then? If yes, can we take that discussion to
a different thread? This one is already too long ...


> 
> >
> >
> > --
> > MST
> >



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-25  7:19                     ` Michael S. Tsirkin
@ 2022-01-26  5:49                       ` Jason Wang
  2022-01-26  7:02                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2022-01-26  5:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Tue, Jan 25, 2022 at 3:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 25, 2022 at 11:53:35AM +0800, Jason Wang wrote:
> > On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > > > We also need
> > > > > - something like injecting cvq commands to control rx mode from the admin device
> > > > > - page fault / dirty page handling
> > > > >
> > > > > these two seem to call for a vq.
> > > >
> > > > Right, but vq is not necessarily for PF if we had PASID. And with
> > > > PASID we don't even need a dedicated new cvq.
> > >
> > > I don't think it's a good idea to mix transactions from
> > > multiple PASIDs on the same vq.
> >
> > To be clear, I don't mean to let a single vq use multiple PASIDs.
> >
> > >
> > > Attaching a PASID to a queue seems more reasonable.
> > > cvq is under guest control, so yes I think a separate
> > > vq is preferable.
> >
> > Sorry, I don't get here. E.g in the case of virtio-net, it's more than
> > sufficient to assign a dedicated PASID to cvq, any reason for yet
> > another one?
>
> Well I'm not sure how cheap it is to have an extra PASID.
> In theory you can share page tables making it not that
> expensive.

I think it should not be expensive since PASID is per RID according to
the PCIe spec.

> In practice is it hard for the MMU to do so?
> If page tables are not shared extra PASIDs become expensive.

Why? For the CVQ, we don't need to share page tables; maintaining one
dedicated buffer for command forwarding is sufficient.

>
>
> > >
> > > What is true is that with subfunctions you would have
> > > PASID per subfunction and then one subfunction for control.
> >
> > Well, it's possible, but it's also possible to have everything self
> > contained in a single subfucntion. Then cvq can be assigned to a PASID
> > that is used only for the hypervisor.
> >
> > >
> > > I think a sketch of how things will work with scalable iov can't hurt as
> > > part of this proposal.  And, I'm not sure we should have so much
> > > flexibility: if there's an interface that works for SRIOV and SIOV then
> > > that seems preferable than having distinct transports for SRIOV and
> > > SIOV.
> >
> > Some of my understanding of SR-IOV vs SIOV:
> >
> > 1) SR-IOV doesn't requires a transport, VF use PCI config space; But
> > SIOV requires one
> > 2) SR-IOV doesn't support dynamic on demand provisioning where SIOV can
> >
> > So I'm not sure how hard it is if we want to unify the management
> > plane of the above two.
> >
> > Thanks
>
> Interesting. So are you fine with a proposal which ignores the PASID
> things completely then?

I'm fine with that, just a note:

The main advantage of using an admin virtqueue in another device (PF) is
that the DMA is isolated, but with the help of PASID there's no need
to do that, and we will have a better interface for nesting.

Thanks

> If yes can we take that discussion to
> a different thread then? This one is already too long ...
>
>
> >
> > >
> > >
> > > --
> > > MST
> > >
>



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-26  5:49                       ` Jason Wang
@ 2022-01-26  7:02                         ` Michael S. Tsirkin
  2022-01-26  7:10                           ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-26  7:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Wed, Jan 26, 2022 at 01:49:05PM +0800, Jason Wang wrote:
> On Tue, Jan 25, 2022 at 3:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jan 25, 2022 at 11:53:35AM +0800, Jason Wang wrote:
> > > On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > > > > We also need
> > > > > > - something like injecting cvq commands to control rx mode from the admin device
> > > > > > - page fault / dirty page handling
> > > > > >
> > > > > > these two seem to call for a vq.
> > > > >
> > > > > Right, but vq is not necessarily for PF if we had PASID. And with
> > > > > PASID we don't even need a dedicated new cvq.
> > > >
> > > > I don't think it's a good idea to mix transactions from
> > > > multiple PASIDs on the same vq.
> > >
> > > To be clear, I don't mean to let a single vq use multiple PASIDs.
> > >
> > > >
> > > > Attaching a PASID to a queue seems more reasonable.
> > > > cvq is under guest control, so yes I think a separate
> > > > vq is preferable.
> > >
> > > Sorry, I don't get here. E.g in the case of virtio-net, it's more than
> > > sufficient to assign a dedicated PASID to cvq, any reason for yet
> > > another one?
> >
> > Well I'm not sure how cheap it is to have an extra PASID.
> > In theory you can share page tables making it not that
> > expensive.
> 
> I think it should not be expensive since PASID is per RID according to
> the PCIe spec.
> 
> > In practice is it hard for the MMU to do so?
> > If page tables are not shared extra PASIDs become expensive.
> 
> Why? For CVQ, we don't need sharing page tables, just maintaining one
> dedicated buffer for command forwarding is sufficient.

I am talking about the IOMMU page tables; these are not part of the PCIe
spec. You need to map all of guest memory to the device, and this needs a
set of PTEs. If two PASIDs map the same memory you might be able to share
PTEs, but I am guessing that this will need some kind of reference
counting to track their usage. I am not sure how complex/expensive that
will turn out to be. In the absence of that, we are doubling the number of
PTEs by using two PASIDs for the same device.
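
To put rough numbers on the doubling, here is a minimal sketch. It assumes
4 KiB pages, one leaf PTE per mapped page, no huge pages and no sharing of
page tables between PASIDs. The helper names are made up for illustration
and do not come from any IOMMU driver.

```c
#include <stdint.h>

/* Hypothetical helper: leaf PTEs needed to map guest memory,
 * assuming one PTE per page and no huge pages. */
static uint64_t leaf_ptes_needed(uint64_t guest_mem_bytes, uint64_t page_size)
{
    return (guest_mem_bytes + page_size - 1) / page_size;
}

/* Hypothetical helper: total leaf PTEs when every PASID gets its own,
 * unshared page table covering the same guest memory. */
static uint64_t total_ptes_unshared(uint64_t guest_mem_bytes,
                                    uint64_t page_size,
                                    unsigned int nr_pasids)
{
    return leaf_ptes_needed(guest_mem_bytes, page_size) * nr_pasids;
}
```

Under these assumptions a 4 GiB guest needs 2^20 leaf PTEs for one PASID
and 2^21 for two. Upper-level tables are ignored here.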


> >
> >
> > > >
> > > > What is true is that with subfunctions you would have
> > > > PASID per subfunction and then one subfunction for control.
> > >
> > > Well, it's possible, but it's also possible to have everything self
> > > contained in a single subfucntion. Then cvq can be assigned to a PASID
> > > that is used only for the hypervisor.
> > >
> > > >
> > > > I think a sketch of how things will work with scalable iov can't hurt as
> > > > part of this proposal.  And, I'm not sure we should have so much
> > > > flexibility: if there's an interface that works for SRIOV and SIOV then
> > > > that seems preferable than having distinct transports for SRIOV and
> > > > SIOV.
> > >
> > > Some of my understanding of SR-IOV vs SIOV:
> > >
> > > 1) SR-IOV doesn't requires a transport, VF use PCI config space; But
> > > SIOV requires one
> > > 2) SR-IOV doesn't support dynamic on demand provisioning where SIOV can
> > >
> > > So I'm not sure how hard it is if we want to unify the management
> > > plane of the above two.
> > >
> > > Thanks
> >
> > Interesting. So are you fine with a proposal which ignores the PASID
> > things completely then?
> 
> I'm fine, just a note that:
> 
> The main advantages of using admin virtqueue in another device (PF) is
> that the DMA is isolated,

Right

> but with the help of PASID, there's no need
> to do that

In that you can make the AQ part of the VF itself?

> and we will have a better interface for nesting.
> 
> Thanks

In fact, nesting is an interesting use case. I have not
thought about this too much; it is worth thinking about
how this interface will virtualize.

> > If yes can we take that discussion to
> > a different thread then? This one is already too long ...
> >
> >
> > >
> > > >
> > > >
> > > > --
> > > > MST
> > > >
> >



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-26  7:02                         ` Michael S. Tsirkin
@ 2022-01-26  7:10                           ` Jason Wang
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2022-01-26  7:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org, Shahaf Shuler,
	Oren Duer, stefanha@redhat.com

On Wed, Jan 26, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 26, 2022 at 01:49:05PM +0800, Jason Wang wrote:
> > On Tue, Jan 25, 2022 at 3:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Jan 25, 2022 at 11:53:35AM +0800, Jason Wang wrote:
> > > > On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > > > > > We also need
> > > > > > > - something like injecting cvq commands to control rx mode from the admin device
> > > > > > > - page fault / dirty page handling
> > > > > > >
> > > > > > > these two seem to call for a vq.
> > > > > >
> > > > > > Right, but vq is not necessarily for PF if we had PASID. And with
> > > > > > PASID we don't even need a dedicated new cvq.
> > > > >
> > > > > I don't think it's a good idea to mix transactions from
> > > > > multiple PASIDs on the same vq.
> > > >
> > > > To be clear, I don't mean to let a single vq use multiple PASIDs.
> > > >
> > > > >
> > > > > Attaching a PASID to a queue seems more reasonable.
> > > > > cvq is under guest control, so yes I think a separate
> > > > > vq is preferable.
> > > >
> > > > Sorry, I don't get here. E.g in the case of virtio-net, it's more than
> > > > sufficient to assign a dedicated PASID to cvq, any reason for yet
> > > > another one?
> > >
> > > Well I'm not sure how cheap it is to have an extra PASID.
> > > In theory you can share page tables making it not that
> > > expensive.
> >
> > I think it should not be expensive since PASID is per RID according to
> > the PCIe spec.
> >
> > > In practice is it hard for the MMU to do so?
> > > If page tables are not shared extra PASIDs become expensive.
> >
> > Why? For CVQ, we don't need sharing page tables, just maintaining one
> > dedicated buffer for command forwarding is sufficient.
>
> I am talking about the IOMMU page tables, these are not part of PCIe
> spec. You need to map all of guest memory to the device, this needs a
> set of PTEs. If two PASIDs map same memory you might be able to share
> PTEs but I am guessing that this will need some kind of reference
> counting to track their usage. I am not sure how complex/expensive that
> will turn out to be. In absence of that, we are doubling the amount of
> PTEs by using two PASIDs for the same device.

So it depends on the migration model

1) save and restore

or

2) trap and emulate

Then:

- If the device provides the facility to sync the state we don't need
a dedicated PASID for CVQ, and CVQ can be assigned to guests.
- If the device doesn't provide the facility to sync the state, we
need to trap CVQ to get the state (what Qemu currently does), so an
emulated CVQ will be presented to guests. We then need a dedicated
PASID for the hardware CVQ, but in this case we don't need to (and
should not) map guest memory to the hardware CVQ, otherwise there will
be security implications. It's sufficient to map a small buffer.


>
>
> > >
> > >
> > > > >
> > > > > What is true is that with subfunctions you would have
> > > > > PASID per subfunction and then one subfunction for control.
> > > >
> > > > Well, it's possible, but it's also possible to have everything self
> > > > contained in a single subfucntion. Then cvq can be assigned to a PASID
> > > > that is used only for the hypervisor.
> > > >
> > > > >
> > > > > I think a sketch of how things will work with scalable iov can't hurt as
> > > > > part of this proposal.  And, I'm not sure we should have so much
> > > > > flexibility: if there's an interface that works for SRIOV and SIOV then
> > > > > that seems preferable than having distinct transports for SRIOV and
> > > > > SIOV.
> > > >
> > > > Some of my understanding of SR-IOV vs SIOV:
> > > >
> > > > 1) SR-IOV doesn't requires a transport, VF use PCI config space; But
> > > > SIOV requires one
> > > > 2) SR-IOV doesn't support dynamic on demand provisioning where SIOV can
> > > >
> > > > So I'm not sure how hard it is if we want to unify the management
> > > > plane of the above two.
> > > >
> > > > Thanks
> > >
> > > Interesting. So are you fine with a proposal which ignores the PASID
> > > things completely then?
> >
> > I'm fine, just a note that:
> >
> > The main advantages of using admin virtqueue in another device (PF) is
> > that the DMA is isolated,
>
> Right
>
> > but with the help of PASID, there's no need
> > to do that
>
> In that you can make the AQ part of the VF itself?

Not sure, but I guess for nesting, a BAR/register interface is much
simpler/better for the case that doesn't need DMA.

>
> > and we will have a better interface for nesting.
> >
> > Thanks
>
> In fact, nesting is an interesting use case. I have not
> thought about this too much, it is worth thinking about
> how this interface will virtualize.

I totally agree.

Thanks

>
> > > If yes can we take that discussion to
> > > a different thread then? This one is already too long ...
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > --
> > > > > MST
> > > > >
> > >
>



* [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
                   ` (3 preceding siblings ...)
  2022-01-13 14:51 ` [PATCH 4/5] virtio-net: " Max Gurtovoy
@ 2022-01-13 14:51 ` Max Gurtovoy
  2022-01-13 18:20   ` Michael S. Tsirkin
  2022-01-18 10:38   ` Michael S. Tsirkin
  2022-01-13 18:32 ` [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Michael S. Tsirkin
  5 siblings, 2 replies; 75+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:51 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

A typical cloud provider SR-IOV use case is to create many VFs for
use by guest VMs. The VFs may not be assigned to a VM until a user
requests a VM of a certain size, e.g., number of CPUs. A VF may need
MSI-X vectors proportional to the number of CPUs in the VM, but there is
no standard way today in the spec to change the number of MSI-X vectors
supported by a VF, although there are some operating systems that
support this.

Introduce new feature bits for a generic PCI virtualization management
mechanism and a specific mechanism to manage the MSI-X vector assignment
process of virtual/managed functions by their parent virtio device via its
admin virtqueue. For now, virtio supports only PCI virtual function
virtualization, thus the virt manager device will be the PF and the
managed device will be the VF.

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 admin-virtq.tex | 98 ++++++++++++++++++++++++++++++++++++++++++++++++-
 content.tex     | 29 ++++++++++++++-
 2 files changed, 124 insertions(+), 3 deletions(-)

diff --git a/admin-virtq.tex b/admin-virtq.tex
index ad20f89..4ee8a32 100644
--- a/admin-virtq.tex
+++ b/admin-virtq.tex
@@ -41,9 +41,105 @@ \section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin
 \hline
 Opcode (bits) & Opcode (hex) & Command & M/O \\
 \hline \hline
- -  & 00h - 7Fh   & Generic admin cmds    & -  \\
+ 00000000b  & 00h   & VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY    & O  \\
+\hline
+ 00000001b  & 01h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET    & O  \\
+\hline
+ 00000010b  & 02h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET    & O  \\
+\hline
+ -  & 03h - 7Fh   & Generic admin cmds    & -  \\
 \hline
  -  & 80h - FFh   & Reserved    & - \\
 \hline
 \end{tabular}
 
+\subsection{VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}
+
+The VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY command has no command specific data set by the driver.
+This command upon success, returns a data buffer that describes information about PCI virtualization
+management attributes. This information is of form:
+\begin{lstlisting}
+struct virtio_admin_pci_virt_mgmt_attr_identify_result {
+        /* For compatibility - indicates which of the below fields are valid (1 means valid):
+         * Bit 0x0 - total_free_vfs_msix_count
+         * Bit 0x1 - per_vf_max_msix_count
+         * Bits 0x2 - 0x3F - reserved for future fields
+         */
+        le64 mask;
+        /* Number of free msix in the global msix pool for VFs */
+        le32 total_free_vfs_msix_count;
+        /* Max number of msix vectors that can be assigned for a single VF */
+        le16 per_vf_max_msix_count;
+};
+\end{lstlisting}
+
+\subsection{VIRTIO ADMIN PCI VIRT PROPERTY SET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY SET command}
+
+The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET command is used to modify the values of VF properties.
+The command specific data set by the driver is of form:
+\begin{lstlisting}
+virtio_admin_pci_virt_property_set_data {
+        /* The virtual function number */
+        le16 vf_number;
+        /* For compatibility - indicates which of the below properties should be
+         * modified (1 means that field should be modified):
+         * Bit 0x0 - msix_count
+         * Bits 0x1 - 0x3F - reserved for future fields
+         */
+        le64 property_mask;
+        /* The amount of MSI-X vectors */
+        le16 msix_count;
+};
+\end{lstlisting}
+
+\begin{note}
+{vf_number can't be greater than NumVFs value as defined in the PCI specification
+or smaller than 1. An error status will be returned otherwise.}
+\end{note}
+
+This command has no command specific result set by the device. Upon success, the device guarantees
+that all the requested properties were modified to the given values. Otherwise, error will be returned.
+
+\begin{note}
+{Before setting msix_count property the virtual/managed device (VF) shall be un-initialized and may not be used by the driver.}
+\end{note}
+
+\subsection{VIRTIO ADMIN PCI VIRT PROPERTY GET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY GET command}
+
+The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET command is used to obtain the values of VF properties.
+The command specific data set by the driver is of form:
+\begin{lstlisting}
+virtio_admin_pci_virt_property_get_data {
+        /* The virtual function number */
+        le16 vf_number;
+        /* For compatibility - indicates which of the below properties should be
+         * queried (1 means that field should be queried):
+         * Bit 0x0 - msix_count
+         * Bits 0x1 - 0x3F - reserved for future fields
+         */
+        le64 property_mask;
+        /* The amount of MSI-X vectors */
+        le16 msix_count;
+};
+\end{lstlisting}
+
+\begin{note}
+{vf_number can't be greater than NumVFs value as defined in the PCI specification
+or smaller than 1. An error status will be returned otherwise.}
+\end{note}
+
+This command, upon success, returns a data buffer that describes the properties that were requested
+and their values for the subject virtio VF device according to the given vf_number.
+This information is of form:
+\begin{lstlisting}
+struct virtio_admin_pci_virt_property_get_result {
+        /* For compatibility - indicates which of the below fields were returned
+         * (1 means that field was returned):
+         * Bit 0x0 - msix_count
+         * Bits 0x1 - 0x3F - reserved for future fields
+         */
+        le64 property_mask;
+        /* The amount of MSI-X vectors */
+        le16 msix_count;
+};
+\end{lstlisting}
diff --git a/content.tex b/content.tex
index e9c2383..64678f0 100644
--- a/content.tex
+++ b/content.tex
@@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
 \begin{description}
 \item[0 to 23] Feature bits for the specific device type
 
-\item[24 to 43] Feature bits reserved for extensions to the queue and
+\item[24 to 45] Feature bits reserved for extensions to the queue and
   feature negotiation mechanisms
 
-\item[44 and above] Feature bits reserved for future extensions.
+\item[46 and above] Feature bits reserved for future extensions.
 \end{description}
 
 \begin{note}
@@ -6878,6 +6878,17 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   that all buffers are used by the admin virtqueue of the device in
   the same order in which they have been made available.
 
+  \item[VIRTIO_F_ADMIN_PCI_VIRT_MANAGER (44)] This feature indicates
+  that the device can manage PCI related capabilities for its managed PCI VF
+  devices and supports VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY,
+  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET
+  admin commands. This feature can be supported only by PCI devices.
+
+  \item[VIRTIO_F_ADMIN_MSIX_MGMT (45)] This feature indicates
+  that the device supports management of the MSI-X vectors for its
+  managed PCI VF devices using VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and
+  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET admin commands.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -6920,6 +6931,14 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
 VIRTIO_F_ADMIN_VQ.
 
+A driver MAY accept VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it accepts
+VIRTIO_F_ADMIN_VQ.
+
+A driver MAY accept VIRTIO_F_ADMIN_MSIX_MGMT only if it accepts
+VIRTIO_F_ADMIN_VQ and VIRTIO_F_ADMIN_PCI_VIRT_MANAGER. Currently only
+MSI-X management of PCI virtual functions is supported, so the driver
+MUST NOT negotiate VIRTIO_F_ADMIN_MSIX_MGMT if VIRTIO_F_SR_IOV is not negotiated.
+
 \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
@@ -6960,6 +6979,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
 VIRTIO_F_ADMIN_VQ.
 
+A PCI device MAY offer VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it
+offers VIRTIO_F_ADMIN_VQ.
+
+A PCI device MAY offer VIRTIO_F_ADMIN_MSIX_MGMT only if it
+offers VIRTIO_F_ADMIN_VQ, VIRTIO_F_ADMIN_PCI_VIRT_MANAGER and VIRTIO_F_SR_IOV.
+
 \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
 
 Transitional devices MAY offer the following:
-- 
2.21.0



* Re: [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs
  2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
@ 2022-01-13 18:20   ` Michael S. Tsirkin
  2022-01-18 10:38   ` Michael S. Tsirkin
  1 sibling, 0 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 18:20 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:03PM +0200, Max Gurtovoy wrote:
> A typical cloud provider SR-IOV use case is to create many VFs for
> use by guest VMs. The VFs may not be assigned to a VM until a user
> requests a VM of a certain size, e.g., number of CPUs. A VF may need
> MSI-X vectors proportional to the number of CPUs in the VM, but there is
> no standard way today in the spec to change the number of MSI-X vectors
> supported by a VF, although there are some operating systems that
> support this.
> 
> Introduce new feature bits for generic PCI virtualization management
> mechanism and a specific mechanism to manage the MSI-X vector assignment
> process of virtual/managed functions by its parent virtio device via its
> admin virtqueue. For now, virtio supports only PCI virtual function
> virtualization, thus the virt manager device will be the PF and the
> managed device will be the VF.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

So, we have the concept of vectors.


> ---
>  admin-virtq.tex | 98 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  content.tex     | 29 ++++++++++++++-
>  2 files changed, 124 insertions(+), 3 deletions(-)
> 
> diff --git a/admin-virtq.tex b/admin-virtq.tex
> index ad20f89..4ee8a32 100644
> --- a/admin-virtq.tex
> +++ b/admin-virtq.tex
> @@ -41,9 +41,105 @@ \section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin
>  \hline
>  Opcode (bits) & Opcode (hex) & Command & M/O \\
>  \hline \hline
> - -  & 00h - 7Fh   & Generic admin cmds    & -  \\
> + 00000000b  & 00h   & VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY    & O  \\
> +\hline
> + 00000001b  & 01h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET    & O  \\
> +\hline
> + 00000010b  & 02h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET    & O  \\
> +\hline
> + -  & 03h - 7Fh   & Generic admin cmds    & -  \\

What are these?

>  \hline
>   -  & 80h - FFh   & Reserved    & - \\
>  \hline
>  \end{tabular}
>  

What are the rules for these commands? Can they be issued when any VFs
are in use? What happens then? I don't exactly understand how this
interacts with existing virtio drivers binding to VFs.
Does the device fail assignment of a vector # out of range,
falling back to a smaller # of vectors?
Generally the # of VQs to use and the # of interrupts can be related,
otherwise performance might suffer - e.g. it's pointless to have many more
interrupts than VQs.
Shouldn't we control the # of per-device VQs with these commands too then?


> +\subsection{VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY command has no command specific data set by the driver.
> +This command upon success, returns a data buffer that describes information about PCI virtualization
> +management attributes. This information is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_mgmt_attr_identify_result {
> +        /* For compatibility - indicates which of the below fields are valid (1 means valid):
> +         * Bit 0x0 - total_free_vfs_msix_count
> +         * Bit 0x1 - per_vf_max_msix_count
> +         * Bits 0x2 - 0x3F - reserved for future fields
> +         */
> +        le64 mask;
> +        /* Number of free msix in the global msix pool for VFs */
> +        le32 total_free_vfs_msix_count;
> +        /* Max number of msix vectors that can be assigned for a single VF */
> +        le16 per_vf_max_msix_count;
> +};
> +\end{lstlisting}

Looks like something that should be memory mapped. In fact
you reinvented a features/capability mask here which would
make memory-mapping this quite easy.

> +
> +\subsection{VIRTIO ADMIN PCI VIRT PROPERTY SET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY SET command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET command is used to modify the values of VF properties.

VF appears completely out of the blue here. You need some description
and to quote relevant specs to introduce this to the reader.
Since this depends on a feature, and that feature depends on VIRTIO_F_SR_IOV,
which in turn is only for PFs, I conclude that this is also
only for PFs. But it would not hurt to spell this out.
Also, can other transport types support partitioning?
Or is that always a PCI thing?


> +The command specific data set by the driver is of form:
> +\begin{lstlisting}
> +virtio_admin_pci_virt_property_set_data {
> +        /* The virtual function number */
> +        le16 vf_number;
> +        /* For compatibility - indicates which of the below properties should be
> +         * modified (1 means that field should be modified):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */

Addressing specific VFs seems like something that should be
a generic capability rather than a command-specific one.
No? Why not?


> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};
> +\end{lstlisting}
> +
> +\begin{note}
> +{vf_number can't be greater than NumVFs value as defined in the PCI specification
> +or smaller than 1. An error status will be returned otherwise.}

Meaning VIRTIO_ADMIN_STATUS_ERR?
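
For reference, the range rule in the note could be sketched as below. This
is only an illustration: the status names and values are placeholders
invented for the example (the actual status encoding is whatever the admin
virtqueue patch defines), and num_vfs stands for the NumVFs value of the
PF's SR-IOV capability.

```c
#include <stdint.h>

/* Placeholder status codes for this sketch only; not the spec encoding. */
enum {
    EXAMPLE_ADMIN_STATUS_OK  = 0,
    EXAMPLE_ADMIN_STATUS_ERR = 1,
};

/* Hypothetical device-side check: vf_number is valid only in the
 * range 1..num_vfs, as stated in the note. */
static int example_check_vf_number(uint16_t vf_number, uint16_t num_vfs)
{
    if (vf_number < 1 || vf_number > num_vfs)
        return EXAMPLE_ADMIN_STATUS_ERR;
    return EXAMPLE_ADMIN_STATUS_OK;
}
```

Note that with NumVFs set to 0 every vf_number is rejected by this check.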



> +\end{note}
> +
> +This command has no command specific result set by the device. Upon success, the device guarantees
> +that all the requested properties were modified to the given values. Otherwise, error will be returned.
> +
> +\begin{note}
> +{Before setting msix_count property the virtual/managed device (VF) shall be un-initialized and may not be used by the driver.}
> +\end{note}
> +
> +\subsection{VIRTIO ADMIN PCI VIRT PROPERTY GET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY GET command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET command is used to obtain the values of VF properties.
> +The command specific data set by the driver is of form:
> +\begin{lstlisting}
> +virtio_admin_pci_virt_property_get_data {
> +        /* The virtual function number */
> +        le16 vf_number;

How will we extend this for things like scalable IOV partitioning?
Defining a completely new set of commands for that expected usecase
seems weird ...

> +        /* For compatibility - indicates which of the below properties should be
> +         * queried (1 means that field should be queried):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};


Pls change the layout adding padding so fields are length-aligned.

Unclear. So why does query send msix_count?
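
To illustrate the padding request, here is one possible length-aligned
layout for the set request. It is a sketch, not a proposal for the final
wire format: the struct name, the field order and the reserved bytes are
assumptions, and plain uintN_t types stand in for le16/le64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: widest field first, explicit padding at the end,
 * so every field sits at an offset that is a multiple of its size. */
struct example_property_set_data {
    uint64_t property_mask;  /* le64 on the wire */
    uint16_t vf_number;      /* le16 on the wire */
    uint16_t msix_count;     /* le16 on the wire */
    uint8_t  reserved[4];    /* explicit padding up to 16 bytes */
};

/* Compile-time checks that the compiler adds no hidden padding. */
static_assert(offsetof(struct example_property_set_data, property_mask) == 0,
              "property_mask must be 8-byte aligned");
static_assert(offsetof(struct example_property_set_data, vf_number) == 8,
              "vf_number must be 2-byte aligned");
static_assert(offsetof(struct example_property_set_data, msix_count) == 10,
              "msix_count must be 2-byte aligned");
static_assert(sizeof(struct example_property_set_data) == 16,
              "no implicit padding expected");
```

In the patch as posted, le64 property_mask follows le16 vf_number directly,
which puts the 64-bit field at offset 2.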


> +\end{lstlisting}
> +
> +\begin{note}
> +{vf_number can't be greater than NumVFs value as defined in the PCI specification
> +or smaller than 1. An error status will be returned otherwise.}
> +\end{note}
> +
> +This command, upon success, returns a data buffer that describes the properties that were requested
> +and their values for the subject virtio VF device according to the given vf_number.
> +This information is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_property_get_result {
> +        /* For compatibility - indicates which of the below fields were returned
> +         * (1 means that field was returned):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};

Seems the same except for VF #. So how about reusing it so reader does
not need to parse this twice?

> +\end{lstlisting}

Please describe the various fields in the document body. The structure formatting
should be of the format similar to

struct virtio_pci_cap {
        u8 cap_vndr;    /* Short description */
	...
};

We should describe this in the introduction; I notice that we do not
currently do this.

Also pls use bitfields, defines etc as explained in introduction.tex
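
As a rough idea of what the defines could look like: the bit positions
below are taken from the comments in the patch, but every name is
hypothetical and only meant to show the shape.

```c
#include <stdint.h>

/* Hypothetical names; bit positions follow the comments in the patch. */
#define EXAMPLE_PCI_VIRT_ATTR_TOTAL_FREE_VFS_MSIX  (1ULL << 0)
#define EXAMPLE_PCI_VIRT_ATTR_PER_VF_MAX_MSIX      (1ULL << 1)
#define EXAMPLE_PCI_VIRT_PROP_MSIX_COUNT           (1ULL << 0)

/* All property bits this sketch knows about; bits 1..63 are reserved. */
#define EXAMPLE_PCI_VIRT_PROP_KNOWN  EXAMPLE_PCI_VIRT_PROP_MSIX_COUNT

/* Returns nonzero if property_mask sets any reserved bit. */
static int example_prop_mask_has_reserved_bits(uint64_t property_mask)
{
    return (property_mask & ~EXAMPLE_PCI_VIRT_PROP_KNOWN) != 0;
}
```

With named bits, a conformance statement can refer to the define instead of
"Bit 0x0".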

Pls also add conformance statements about the use here.



> diff --git a/content.tex b/content.tex
> index e9c2383..64678f0 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23] Feature bits for the specific device type
>  
> -\item[24 to 43] Feature bits reserved for extensions to the queue and
> +\item[24 to 45] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms
>  
> -\item[44 and above] Feature bits reserved for future extensions.
> +\item[46 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -6878,6 +6878,17 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    that all buffers are used by the admin virtqueue of the device in
>    the same order in which they have been made available.
>  
> +  \item[VIRTIO_F_ADMIN_PCI_VIRT_MANAGER (44)] This feature indicates
> +  that the device can manage PCI related capabilities for its managed PCI VF
> +  devices and supports VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY,
> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET
> +  admin commands. This feature can be supported only by PCI devices.


Not sure what _VIRT_ stands for here.

> +
> +  \item[VIRTIO_F_ADMIN_MSIX_MGMT (45)] This feature indicates
> +  that the device supports management of the MSI-X vectors for its
> +  managed PCI VF devices using VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and
> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET admin commands.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> @@ -6920,6 +6931,14 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
>  VIRTIO_F_ADMIN_VQ.
>  
> +A driver MAY accept VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it accepts
> +VIRTIO_F_ADMIN_VQ.
> +
> +A driver MAY accept VIRTIO_F_ADMIN_MSIX_MGMT only if it accepts
> +VIRTIO_F_ADMIN_VQ and VIRTIO_F_ADMIN_PCI_VIRT_MANAGER. Currently only
> +MSI-X management of PCI virtual functions is supported, so the driver
> +MUST NOT negotiate VIRTIO_F_ADMIN_MSIX_MGMT if VIRTIO_F_SR_IOV is not negotiated.
> +
>  \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>  
>  A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
> @@ -6960,6 +6979,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
>  VIRTIO_F_ADMIN_VQ.
>  
> +A PCI device MAY offer VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it
> +offers VIRTIO_F_ADMIN_VQ.
> +
> +A PCI device MAY offer VIRTIO_F_ADMIN_MSIX_MGMT only if it
> +offers VIRTIO_F_ADMIN_VQ, VIRTIO_F_ADMIN_PCI_VIRT_MANAGER and VIRTIO_F_SR_IOV.
> +
>  \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
>  
>  Transitional devices MAY offer the following:
> -- 
> 2.21.0



* Re: [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs
  2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
  2022-01-13 18:20   ` Michael S. Tsirkin
@ 2022-01-18 10:38   ` Michael S. Tsirkin
  1 sibling, 0 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18 10:38 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:03PM +0200, Max Gurtovoy wrote:
> A typical cloud provider SR-IOV use case is to create many VFs for
> use by guest VMs. The VFs may not be assigned to a VM until a user
> requests a VM of a certain size, e.g., number of CPUs.
> A VF may need
> MSI-X vectors proportional to the number of CPUs in the VM,

Problem is, it does not work like that.  A VF needs vectors
proportional to the # of VQs and yes, the # of VQs is proportional to
the # of CPUs.

So I am not sure what control over the # of vectors gets us
without control over the # of VQs. Something to better explain in
the cover letter.
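
A small sketch of the dependency, under two assumptions that are not in
the patch: a virtio-net VF that uses one RX/TX queue pair per CPU plus a
control VQ, and a driver policy of one MSI-X vector per VQ plus one vector
for configuration changes. Function names are made up for the example.

```c
/* Hypothetical: VQ count for a virtio-net device with one RX/TX pair
 * per CPU and a single control VQ. */
static unsigned int example_net_vq_count(unsigned int nr_cpus)
{
    return 2 * nr_cpus + 1;
}

/* Hypothetical: vectors wanted under a "one vector per VQ, plus one
 * for config changes" policy. */
static unsigned int example_msix_needed(unsigned int nr_vqs)
{
    return nr_vqs + 1;
}
```

With these assumptions a 4-CPU guest ends up with 9 VQs and wants 10
vectors, so the vector count follows the VQ count rather than the CPU
count directly.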


> but there is
> no standard way today in the spec to change the number of MSI-X vectors
> supported by a VF, although there are some operating systems that
> support this.
> 
> Introduce new feature bits for generic PCI virtualization management
> mechanism and a specific mechanism to manage the MSI-X vector assignment
> process of virtual/managed functions by its parent virtio device via its
> admin virtqueue. For now, virtio supports only PCI virtual function
> virtualization, thus the virt manager device will be the PF and the
> managed device will be the VF.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  admin-virtq.tex | 98 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  content.tex     | 29 ++++++++++++++-
>  2 files changed, 124 insertions(+), 3 deletions(-)
> 
> diff --git a/admin-virtq.tex b/admin-virtq.tex
> index ad20f89..4ee8a32 100644
> --- a/admin-virtq.tex
> +++ b/admin-virtq.tex
> @@ -41,9 +41,105 @@ \section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin
>  \hline
>  Opcode (bits) & Opcode (hex) & Command & M/O \\
>  \hline \hline
> - -  & 00h - 7Fh   & Generic admin cmds    & -  \\
> + 00000000b  & 00h   & VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY    & O  \\
> +\hline
> + 00000001b  & 01h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET    & O  \\
> +\hline
> + 00000010b  & 02h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET    & O  \\
> +\hline
> + -  & 03h - 7Fh   & Generic admin cmds    & -  \\
>  \hline
>   -  & 80h - FFh   & Reserved    & - \\
>  \hline
>  \end{tabular}
>  
> +\subsection{VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY command has no command specific data set by the driver.
> +This command upon success, returns a data buffer that describes information about PCI virtualization
> +management attributes. This information is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_mgmt_attr_identify_result {
> +        /* For compatibility - indicates which of the below fields are valid (1 means valid):
> +         * Bit 0x0 - total_free_vfs_msix_count
> +         * Bit 0x1 - per_vf_max_msix_count
> +         * Bits 0x2 - 0x3F - reserved for future fields
> +         */
> +        le64 mask;
> +        /* Number of free msix in the global msix pool for VFs */
> +        le32 total_free_vfs_msix_count;
> +        /* Max number of msix vectors that can be assigned for a single VF */
> +        le16 per_vf_max_msix_count;
> +};
> +\end{lstlisting}
> +
> +\subsection{VIRTIO ADMIN PCI VIRT PROPERTY SET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY SET command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET command is used to modify the values of VF properties.
> +The command specific data set by the driver is of form:
> +\begin{lstlisting}
> +virtio_admin_pci_virt_property_set_data {
> +        /* The virtual function number */
> +        le16 vf_number;
> +        /* For compatibility - indicates which of the below properties should be
> +         * modified (1 means that field should be modified):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};
> +\end{lstlisting}
> +
> +\begin{note}
> +{vf_number can't be greater than NumVFs value as defined in the PCI specification
> +or smaller than 1. An error status will be returned otherwise.}
> +\end{note}
> +
> +This command has no command specific result set by the device. Upon success, the device guarantees
> +that all the requested properties were modified to the given values. Otherwise, error will be returned.
> +
> +\begin{note}
> +{Before setting msix_count property the virtual/managed device (VF) shall be un-initialized and may not be used by the driver.}
> +\end{note}
> +
> +\subsection{VIRTIO ADMIN PCI VIRT PROPERTY GET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY GET command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET command is used to obtain the values of VF properties.
> +The command specific data set by the driver is of form:
> +\begin{lstlisting}
> +virtio_admin_pci_virt_property_get_data {
> +        /* The virtual function number */
> +        le16 vf_number;
> +        /* For compatibility - indicates which of the below properties should be
> +         * queried (1 means that field should be queried):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};
> +\end{lstlisting}
> +
> +\begin{note}
> +{vf_number can't be greater than NumVFs value as defined in the PCI specification
> +or smaller than 1. An error status will be returned otherwise.}
> +\end{note}
> +
> +This command, upon success, returns a data buffer that describes the properties that were requested
> +and their values for the subject virtio VF device according to the given vf_number.
> +This information is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_property_get_result {
> +        /* For compatibility - indicates which of the below fields were returned
> +         * (1 means that field was returned):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};
> +\end{lstlisting}
> diff --git a/content.tex b/content.tex
> index e9c2383..64678f0 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23] Feature bits for the specific device type
>  
> -\item[24 to 43] Feature bits reserved for extensions to the queue and
> +\item[24 to 45] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms
>  
> -\item[44 and above] Feature bits reserved for future extensions.
> +\item[46 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -6878,6 +6878,17 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    that all buffers are used by the admin virtqueue of the device in
>    the same order in which they have been made available.
>  
> +  \item[VIRTIO_F_ADMIN_PCI_VIRT_MANAGER (44)] This feature indicates
> +  that the device can manage PCI related capabilities for its managed PCI VF
> +  devices and supports VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY,
> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET
> +  admin commands. This feature can be supported only by PCI devices.
> +
> +  \item[VIRTIO_F_ADMIN_MSIX_MGMT (45)] This feature indicates
> +  that the device supports management of the MSI-X vectors for its
> +  managed PCI VF devices using VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and
> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET admin commands.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> @@ -6920,6 +6931,14 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
>  VIRTIO_F_ADMIN_VQ.
>  
> +A driver MAY accept VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it accepts
> +VIRTIO_F_ADMIN_VQ.
> +
> +A driver MAY accept VIRTIO_F_ADMIN_MSIX_MGMT only if it accepts
> +VIRTIO_F_ADMIN_VQ and VIRTIO_F_ADMIN_PCI_VIRT_MANAGER. Currently only
> +MSI-X management of PCI virtual functions is supported, so the driver
> +MUST NOT negotiate VIRTIO_F_ADMIN_MSIX_MGMT if VIRTIO_F_SR_IOV is not negotiated.
> +
>  \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>  
>  A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
> @@ -6960,6 +6979,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
>  VIRTIO_F_ADMIN_VQ.
>  
> +A PCI device MAY offer VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it
> +offers VIRTIO_F_ADMIN_VQ.
> +
> +A PCI device MAY offer VIRTIO_F_ADMIN_MSIX_MGMT only if it
> +offers VIRTIO_F_ADMIN_VQ, VIRTIO_F_ADMIN_PCI_VIRT_MANAGER and VIRTIO_F_SR_IOV.
> +
>  \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
>  
>  Transitional devices MAY offer the following:
> -- 
> 2.21.0
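
As a rough illustration of the VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET command data quoted above, the sketch below serializes a request that changes only msix_count. It assumes the fields are laid out back to back with no padding (2 + 8 + 2 = 12 bytes), which the patch does not state explicitly; a plain C struct would typically pad after vf_number, so the bytes are written by offset instead. The helper names and the PROPERTY_SET_DATA_LEN constant are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROPERTY_SET_DATA_LEN    12u         /* ASSUMES no padding */
#define PROPERTY_MASK_MSIX_COUNT (1ULL << 0) /* Bit 0x0 - msix_count */

/* Store the low nbytes of v at dst in little-endian order. */
static inline void put_le(uint8_t *dst, uint64_t v, size_t nbytes)
{
    assert(nbytes <= 8);
    for (size_t i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Hypothetical helper: encode a request that sets msix_count for one VF.
 * Returns the number of bytes written, or 0 if vf_number is outside
 * [1, num_vfs], mirroring the note in the patch.
 */
static inline size_t encode_set_msix_count(uint8_t buf[PROPERTY_SET_DATA_LEN],
                                           uint16_t vf_number,
                                           uint16_t num_vfs,
                                           uint16_t msix_count)
{
    if (vf_number < 1 || vf_number > num_vfs)
        return 0;

    put_le(buf + 0, vf_number, 2);                 /* le16 vf_number */
    put_le(buf + 2, PROPERTY_MASK_MSIX_COUNT, 8);  /* le64 property_mask */
    put_le(buf + 10, msix_count, 2);               /* le16 msix_count */
    return PROPERTY_SET_DATA_LEN;
}
```

Writing by offset sidesteps compiler-dependent struct padding; if the final spec text defines explicit padding or reorders the fields, the offsets here would need to change accordingly.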



* Re: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
                   ` (4 preceding siblings ...)
  2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
@ 2022-01-13 18:32 ` Michael S. Tsirkin
  2022-01-17 10:00   ` Shahaf Shuler
  5 siblings, 1 reply; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 18:32 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:50:58PM +0200, Max Gurtovoy wrote:
> Hi,
> 
> In a PCI SR-IOV configuration, MSI-X vectors of the device is precious
> device resource. Hence making efficient use of it based on the use case
> that aligns to the VM configuration is desired for best system
> performance.
> 
> For example, today's static assignment of the amount of MSI-X vectors
> doesn't allow sophisticated utilization of resources.
> 
> A typical cloud provider SR-IOV use case is to create many VFs for
> use by guest VMs. Each VM might have a different purpose and different
> amount of resources accordingly (e.g. number of CPUs). A common driver
> usage of device's MSI-X vectors is proportional to the number of CPUs in
> the VM. Since the system administrator might know the amount of CPUs in
> the requested VM, he can also configure the VF's MSI-X vectors amount
> proportional to the number of CPUs in the VM. In this way, the
> utilization of the physical hardware will be improved.
> 
> Today we have some operating systems that support provisioning MSI-X
> vectors for PCI VFs.
> 
> Update the specification to have a method to change the number of MSI-X
> vectors supported by a VF using the PF admin virtqueue interface. For that,
> create a generic infrastructure for managing PCI resources of the managed
> VF by its parent PF.

Can you describe in the cover letter or the commit log of
the admin VQ patch the motivation for using a VQ and not
memory mapped space for this capability?
In fact I feel at least some commands would be better replaced
with a memory mapped structure.


> Patches (1/5)-(2/5) introduce the admin virtqueue concept and feature bits.
> Patches (3/5)-(4/5) add the admin virtq to virtio-blk and virtio-net
> devices.
> Patch (5/5) introduce MSI-X mgmt support.
> 
> Max Gurtovoy (5):
>   Add virtio Admin Virtqueue specification
>   Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
>   virtio-blk: add support for VIRTIO_F_ADMIN_VQ
>   virtio-net: add support for VIRTIO_F_ADMIN_VQ
>   Add support for dynamic MSI-X vector mgmt for VFs
> 
>  admin-virtq.tex | 145 ++++++++++++++++++++++++++++++++++++++++++++++++
>  content.tex     |  91 +++++++++++++++++++++++++++---
>  packed-ring.tex |  26 ++++-----
>  split-ring.tex  |  35 ++++++++----
>  4 files changed, 263 insertions(+), 34 deletions(-)
>  create mode 100644 admin-virtq.tex
> 
> -- 
> 2.21.0



* RE: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF
  2022-01-13 18:32 ` [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Michael S. Tsirkin
@ 2022-01-17 10:00   ` Shahaf Shuler
  2022-01-17 21:41     ` Michael S. Tsirkin
  0 siblings, 1 reply; 75+ messages in thread
From: Shahaf Shuler @ 2022-01-17 10:00 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment@lists.oasis-open.org, cohuck@redhat.com,
	virtio-dev@lists.oasis-open.org, jasowang@redhat.com,
	Parav Pandit, Oren Duer, stefanha@redhat.com

Thursday, January 13, 2022 8:32 PM, Michael S. Tsirkin:
> Subject: Re: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a
> VF
> 
> On Thu, Jan 13, 2022 at 04:50:58PM +0200, Max Gurtovoy wrote:
> > Hi,
> >
> > In a PCI SR-IOV configuration, MSI-X vectors of the device is precious
> > device resource. Hence making efficient use of it based on the use
> > case that aligns to the VM configuration is desired for best system
> > performance.
> >
> > For example, today's static assignment of the amount of MSI-X vectors
> > doesn't allow sophisticated utilization of resources.
> >
> > A typical cloud provider SR-IOV use case is to create many VFs for use
> > by guest VMs. Each VM might have a different purpose and different
> > amount of resources accordingly (e.g. number of CPUs). A common driver
> > usage of device's MSI-X vectors is proportional to the number of CPUs
> > in the VM. Since the system administrator might know the amount of
> > CPUs in the requested VM, he can also configure the VF's MSI-X vectors
> > amount proportional to the number of CPUs in the VM. In this way, the
> > utilization of the physical hardware will be improved.
> >
> > Today we have some operating systems that support provisioning MSI-X
> > vectors for PCI VFs.
> >
> > Update the specification to have a method to change the number of
> > MSI-X vectors supported by a VF using the PF admin virtqueue
> > interface. For that, create a generic infrastructure for managing PCI
> > resources of the managed VF by its parent PF.
> 
> Can you describe in the cover letter or the commit log of the admin VQ patch
> the motivation for using a VQ and not memory mapped space for this
> capability?
> In fact I feel at least some commands would be better replaced with a
> memory mapped structure.

I am wondering what the motivation is for using memory mapped structures for such control operations.

I can fully understand why data plane related fields should be placed in MMIO structures. However, for control, memory mapped commands:
1. Are more constraining for the device implementor and thus not scalable. Direct MMIO access implies that on-die resources must be allocated. See, for example, the IMS section of the Scalable IOV spec [1], which follows this exact design.
2. Are hard to maintain: each new command may add new MMIO fields, making the device BAR complex.
3. Imply a non-uniform design: some commands are memory mapped and some are VQ based. How do we provide guiding rules to decide? Isn't it simpler to have a single interface for all of the control?

[1]
https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-scalable-io-virtualization.html



* Re: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF
  2022-01-17 10:00   ` Shahaf Shuler
@ 2022-01-17 21:41     ` Michael S. Tsirkin
  0 siblings, 0 replies; 75+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 21:41 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Max Gurtovoy, virtio-comment@lists.oasis-open.org,
	cohuck@redhat.com, virtio-dev@lists.oasis-open.org,
	jasowang@redhat.com, Parav Pandit, Oren Duer, stefanha@redhat.com

On Mon, Jan 17, 2022 at 10:00:21AM +0000, Shahaf Shuler wrote:
> Thursday, January 13, 2022 8:32 PM, Michael S. Tsirkin:
> > Subject: Re: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a
> > VF
> > 
> > On Thu, Jan 13, 2022 at 04:50:58PM +0200, Max Gurtovoy wrote:
> > > Hi,
> > >
> > > In a PCI SR-IOV configuration, MSI-X vectors of the device is precious
> > > device resource. Hence making efficient use of it based on the use
> > > case that aligns to the VM configuration is desired for best system
> > > performance.
> > >
> > > For example, today's static assignment of the amount of MSI-X vectors
> > > doesn't allow sophisticated utilization of resources.
> > >
> > > A typical cloud provider SR-IOV use case is to create many VFs for use
> > > by guest VMs. Each VM might have a different purpose and different
> > > amount of resources accordingly (e.g. number of CPUs). A common driver
> > > usage of device's MSI-X vectors is proportional to the number of CPUs
> > > in the VM. Since the system administrator might know the amount of
> > > CPUs in the requested VM, he can also configure the VF's MSI-X vectors
> > > amount proportional to the number of CPUs in the VM. In this way, the
> > > utilization of the physical hardware will be improved.
> > >
> > > Today we have some operating systems that support provisioning MSI-X
> > > vectors for PCI VFs.
> > >
> > > Update the specification to have a method to change the number of
> > > MSI-X vectors supported by a VF using the PF admin virtqueue
> > > interface. For that, create a generic infrastructure for managing PCI
> > > resources of the managed VF by its parent PF.
> > 
> > Can you describe in the cover letter or the commit log of the admin VQ patch
> > the motivation for using a VQ and not memory mapped space for this
> > capability?
> > In fact I feel at least some commands would be better replaced with a
> > memory mapped structure.
> 
> I am wondering what is the motivation to go for memory mapped structures for such control operations. 
> 
> I can fully understand why data plane related fields should be placed on MMIO structures.

Actually, data plane is usually in a VQ for us, since MMIO accesses
trigger VM exits.

> However for control, memory mapped commands are:
> 1. More constraining for the device implementor and thus not scalable. MMIO direct access implies on-die resources to be allocated. You can see as example the IMS section on Scalable IOV spec[1] that follows this exact design

Oh, it's a PCIe thing, right? A read cannot depend on another read?
So this is one of the reasons we don't put big structures in MMIO.
But a couple of bytes is really no big deal IMHO.

> 2. Hard to maintain - each new command may add new MMIO fields, making the device BAR complex.

Well, actually we have very nice APIs to handle the dependency
between memory and feature bits. It's much harder to abstract
away VQ commands; we don't have anything uniform for that.

> 3. Implies a non-uniform design - some commands are memory mapped,
> some commands are VQ based. How do we provide the guiding rules to
> decide? Isn't it simpler to have a single i/f for all the control? 

newdevice.tex has some guiding principles, see "What Device
Configuration Space Layout?".

But yes, if the answer is "commands A,B,C do not fit in
config space, we placed commands D,E in a VQ for consistency"
then that is an ok answer, but it's something to be mentioned
in the commit log.



> 
> [1]
> https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-scalable-io-virtualization.html

Config space is generally more robust and requires less code
on both the host and guest sides.




Thread overview: 75+ messages
2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
2022-01-13 17:53   ` Michael S. Tsirkin
2022-01-17  9:56     ` Max Gurtovoy
2022-01-17 21:30       ` Michael S. Tsirkin
2022-01-18  3:22         ` Parav Pandit
2022-01-18  6:17           ` Michael S. Tsirkin
2022-01-19  3:04         ` Jason Wang
2022-01-19  8:11           ` Michael S. Tsirkin
2022-01-25  3:35             ` Jason Wang
2022-01-17 14:12     ` Parav Pandit
2022-01-17 22:03       ` Michael S. Tsirkin
2022-01-18  3:36         ` Parav Pandit
2022-01-18  7:07           ` Michael S. Tsirkin
2022-01-18  7:14             ` Parav Pandit
2022-01-18  7:20               ` Michael S. Tsirkin
2022-01-19 11:33                 ` Max Gurtovoy
2022-01-19 12:21                   ` Parav Pandit
2022-01-19 14:47                     ` Max Gurtovoy
2022-01-19 15:38                       ` Michael S. Tsirkin
2022-01-19 15:47                         ` Max Gurtovoy
2022-01-13 14:51 ` [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER Max Gurtovoy
2022-01-13 15:33   ` Michael S. Tsirkin
2022-01-13 17:07     ` Max Gurtovoy
2022-01-13 17:25       ` Michael S. Tsirkin
2022-01-17 13:59         ` Parav Pandit
2022-01-17 22:14           ` Michael S. Tsirkin
2022-01-18  4:44             ` Parav Pandit
2022-01-18  6:23               ` Michael S. Tsirkin
2022-01-18  6:32                 ` Parav Pandit
2022-01-18  6:54                   ` Michael S. Tsirkin
2022-01-18  7:07                     ` Parav Pandit
2022-01-18  7:12                       ` Michael S. Tsirkin
2022-01-18  7:30                         ` Parav Pandit
2022-01-18  7:40                           ` Michael S. Tsirkin
2022-01-19  4:21                             ` Jason Wang
2022-01-19  9:30                               ` Michael S. Tsirkin
2022-01-25  3:39                                 ` Jason Wang
2022-01-18 10:38                           ` Michael S. Tsirkin
2022-01-18 10:50                             ` Parav Pandit
2022-01-18 15:09                               ` Michael S. Tsirkin
2022-01-18 17:17                                 ` Parav Pandit
2022-01-19  7:20                                   ` Michael S. Tsirkin
2022-01-19  8:15                                     ` [virtio-dev] " Parav Pandit
2022-01-19  8:21                                       ` Michael S. Tsirkin
2022-01-19 10:10                                         ` Parav Pandit
2022-01-19 16:40                                           ` Michael S. Tsirkin
2022-01-19 17:07                                             ` Parav Pandit
2022-01-18  7:13                       ` Michael S. Tsirkin
2022-01-18  7:21                         ` Parav Pandit
2022-01-18  7:37                           ` Michael S. Tsirkin
2022-01-19  4:03                       ` Jason Wang
2022-01-13 14:51 ` [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ Max Gurtovoy
2022-01-13 18:24   ` Michael S. Tsirkin
2022-01-13 14:51 ` [PATCH 4/5] virtio-net: " Max Gurtovoy
2022-01-13 17:56   ` Michael S. Tsirkin
2022-01-16  9:47     ` Max Gurtovoy
2022-01-16 16:45       ` Michael S. Tsirkin
2022-01-17 14:07       ` Parav Pandit
2022-01-17 22:22         ` Michael S. Tsirkin
2022-01-18  2:18           ` Jason Wang
2022-01-18  5:25             ` Michael S. Tsirkin
2022-01-19  4:16               ` Jason Wang
2022-01-19  9:26                 ` Michael S. Tsirkin
2022-01-25  3:53                   ` Jason Wang
2022-01-25  7:19                     ` Michael S. Tsirkin
2022-01-26  5:49                       ` Jason Wang
2022-01-26  7:02                         ` Michael S. Tsirkin
2022-01-26  7:10                           ` Jason Wang
2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
2022-01-13 18:20   ` Michael S. Tsirkin
2022-01-18 10:38   ` Michael S. Tsirkin
2022-01-13 18:32 ` [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Michael S. Tsirkin
2022-01-17 10:00   ` Shahaf Shuler
2022-01-17 21:41     ` Michael S. Tsirkin
