Discussion of the VIRTIO specification
From: Max Gurtovoy <mgurtovoy@nvidia.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	virtio-comment@lists.oasis-open.org, mst@redhat.com,
	jasowang@redhat.com, oren@nvidia.com, parav@nvidia.com,
	shahafs@nvidia.com, eperezma@redhat.com, aadam@redhat.com,
	bodong@nvidia.com, amikheev@nvidia.com
Subject: Re: [RFC PATCH v2 1/2] Add virtio Admin Virtqueue specification
Date: Wed, 28 Jul 2021 17:20:29 +0300
Message-ID: <eedd595d-77b9-2921-bbcc-ced2618bccc9@nvidia.com>
In-Reply-To: <YQFev1vXVFLlvW0w@stefanha-x1.localdomain>


On 7/28/2021 4:42 PM, Stefan Hajnoczi wrote:
> On Wed, Jul 28, 2021 at 01:59:26PM +0300, Max Gurtovoy wrote:
>> On 7/28/2021 11:52 AM, Stefan Hajnoczi wrote:
>>> On Tue, Jul 27, 2021 at 06:29:49PM +0300, Max Gurtovoy wrote:
>>>> On 7/27/2021 5:28 PM, Cornelia Huck wrote:
>>>>> On Tue, Jul 27 2021, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>>>
>>>>>> On Mon, Jul 26, 2021 at 07:52:53PM +0300, Max Gurtovoy wrote:
>>>>>>> Admin virtqueues will be used to send administrative commands to
>>>>>>> manipulate various features of the device which would not easily map
>>>>>>> into the configuration space.
>>>>>>>
>>>>>>> The same Admin command format will be used for all virtio devices. The
>>>>>>> Admin command set will include 4 types of command classes:
>>>>>>> 1. The generic common class
>>>>>>> 2. The transport specific class
>>>>>>> 3. The device specific class
>>>>>>> 4. The vendor specific class
>>>>>>>
>>>>>>> The above mechanism will enable adding various features to the virtio
>>>>>>> specification, e.g.:
>>>>>>> 1. Format virtio-blk devices in various configurations (512B block size,
>>>>>>>       512B + 8B T10-DIF, 4K block size, 4k + 8B T10-DIF, etc..).
>>>>>>> 2. Live migration management.
>>>>>>> 3. Encrypt/Decrypt descriptors.
>>>>>>> 4. Virtualization management.
>>>>>>> 5. Get device error logs.
>>>>>>> 6. Implement advanced vendor/device/transport specific features.
>>>>>>> 7. Run device health test.
>>>>>>> 8. More.
>>>>>>>
>>>>>>> As virtio evolves beyond the para-virt/sw-emulated world, it's mandatory
>>>>>>> for the specification to become flexible and allow a wider feature set.
>>>>>>> The current ctrl virtq that is defined for some of the virtio devices is
>>>>>>> device specific and wasn't designed to be a generic virtq for
>>>>>>> administration.
>>>>>>>
>>>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>>>>> ---
>>>>>>>     admin-virtq.tex | 241 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>     content.tex     |   4 +
>>>>>>>     2 files changed, 245 insertions(+)
>>>>>>>     create mode 100644 admin-virtq.tex
>>>>>>>
>>>>>>> diff --git a/admin-virtq.tex b/admin-virtq.tex
>>>>>>> new file mode 100644
>>>>>>> index 0000000..ccec2ca
>>>>>>> --- /dev/null
>>>>>>> +++ b/admin-virtq.tex
>>>>>>> @@ -0,0 +1,241 @@
>>>>>>> +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
>>>>>>> +
>>>>>>> +Admin virtqueues are used to send administrative commands to manipulate
>>>>>>> +various features of the device which would not easily map into the
>>>>>>> +configuration space.
>>>>>>> +
>>>>>>> +Use of Admin virtqueues is negotiated by the VIRTIO_F_ADMIN_VQ
>>>>>>> +feature bit.
>>>>>>> +
>>>>>>> +Admin virtqueue index may vary among different device types.
>>>>>>> +
>>>>>>> +All commands are of the following form:
>>>>>>> +
>>>>>>> +\begin{lstlisting}
>>>>>>> +struct virtio_admin_cmd {
>>>>>>> +        /* Device-readable part */
>>>>>>> +        u8 class;
>>>>>>> +        u8 command;
>>>>>>> +        u8 command-specific-data[];
>>>>>>> +
>>>>>>> +        /* Device-writable part */
>>>>>>> +        u8 command-specific-result[];
>>>>>>> +        u8 status_type : 4;
>>>>>>> +        u8 reserved : 4;
>>>>>>> +        u8 status;
>>>>>>> +};
>>>>>>> +
>>>>>>> +/* Status type values */
>>>>>>> +#define VIRTIO_ADMIN_STATUS_TYPE_GENERIC               0
>>>>>>> +#define VIRTIO_ADMIN_STATUS_TYPE_CLASS_SPECIFIC        1
>>>>>>> +#define VIRTIO_ADMIN_STATUS_TYPE_COMMAND_SPECIFIC      2
>>>>>>> +#define VIRTIO_ADMIN_STATUS_TYPE_TRANSPORT_SPECIFIC    3
>>>>>>> +#define VIRTIO_ADMIN_STATUS_TYPE_DEVICE_SPECIFIC       4
>>>>>>> +#define VIRTIO_ADMIN_STATUS_TYPE_VENDOR_SPECIFIC       5
>>>>>>> +
>>>>>>> +/* Generic status values */
>>>>>>> +#define VIRTIO_ADMIN_STATUS_GENERIC_OK                     0
>>>>>>> +#define VIRTIO_ADMIN_STATUS_GENERIC_ERR                    1
>>>>>>> +#define VIRTIO_ADMIN_STATUS_GENERIC_INVALID_CLASS          2
>>>>>>> +#define VIRTIO_ADMIN_STATUS_GENERIC_INVALID_COMMAND        3
>>>>>>> +#define VIRTIO_ADMIN_STATUS_GENERIC_DATA_TRANSFER_ERR      4
>>>>>>> +#define VIRTIO_ADMIN_STATUS_GENERIC_DEVICE_INTERNAL_ERR    5
>>>>>>> +\end{lstlisting}
>>>>> This is very complex, and it feels like we're overengineering this.
>>>> Do you mean the status type and the status ?
>>>>
>>>>>>> +
>>>>>>> +The \field{class}, \field{command} and \field{command-specific-data} are
>>>>>>> +set by the driver, and the device sets the \field{status_type}, the
>>>>>>> +\field{status} and  the \field{command-specific-result}, if needed.
>>>>>>> +
>>>>>>> +The virtio Admin command class codes are divided in the following form:
>>>>>>> +
>>>>>>> +\begin{lstlisting}
>>>>>>> +/* class values that are transport, device and vendor independent */
>>>>>>> +#define VIRTIO_ADMIN_COMMON_CLASS_START    0
>>>>>>> +#define VIRTIO_ADMIN_COMMON_CLASS_END      63
>>>>>>> +
>>>>>>> +/* class values that are transport specific */
>>>>>>> +#define VIRTIO_ADMIN_TRANSPORT_CLASS_START  64
>>>>>>> +#define VIRTIO_ADMIN_TRANSPORT_CLASS_END    127
>>>>>>> +
>>>>>>> +/* class values that are device specific */
>>>>>>> +#define VIRTIO_ADMIN_DEVICE_CLASS_START     128
>>>>>>> +#define VIRTIO_ADMIN_DEVICE_CLASS_END       191
>>>>>>> +
>>>>>>> +/* class values that are vendor specific */
>>>>>>> +#define VIRTIO_ADMIN_VENDOR_CLASS_START     192
>>>>>>> +#define VIRTIO_ADMIN_VENDOR_CLASS_END       255
>>>>>>> +\end{lstlisting}
>>>>>>> +
>>>>>>> +\subsection{Admin command set}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / Admin command set}
>>>>>>> +
>>>>>>> +Each virtio device that advertise VIRTIO_F_ADMIN_VQ feature, MUST
>>>>>> "advertises the VIRTIO_F_ADMIN_VQ feature"
>>>>>>
>>>>>>> +support all the mandatory admin commands. A device MAY support also
>>>>>>> +one or more optional admin commands.
>>>>>>> +
>>>>>>> +\subsubsection{Common command set}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / Admin command set / Common command set}
>>>>>>> +
>>>>>>> +The Common command set is a group of classes and commands within each
>>>>>>> +of these classes which are transport, device and vendor independent.
>>>>>>> +A mandatory class is a class that has at least one mandatory command.
>>>>>>> +The Common command set is summarized in following table:
>>>>>>> +
>>>>>>> +\begin{tabular}{|l|l|l|}
>>>>>>> +\hline
>>>>>>> +Class  & Description    & M/O \\
>>>>>>> +\hline \hline
>>>>>>> +0  & VIRTIO_ADMIN_DISCOVER_DEVICE    & M \\
>>>>>>> +\hline
>>>>>>> +1  & VIRTIO_ADMIN_DISCOVER_DEVICE_CLASS_COMMANDS    & M \\
>>>>>>> +\hline
>>>>>>> +2-63  & reserved    & - \\
>>>>>>> +\hline
>>>>>>> +\end{tabular}
>>>>>>> +
>>>>>>> +\paragraph{Discover device class}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / Admin command set / Common command set / Discover device class}
>>>>>>> +
>>>>>>> +This class (opcode: 0) of commands is used to query generic device
>>>>>>> +information. The following table describes the commands supported for
>>>>>>> +this class:
>>>>>>> +
>>>>>>> +\begin{tabular}{|l|l|l|}
>>>>>>> +\hline
>>>>>>> +Command  & Description    & M/O \\
>>>>>>> +\hline \hline
>>>>>>> +0  & VIRTIO_ADMIN_DISCOVER_DEVICE_IDENTITY    & M \\
>>>>>>> +\hline
>>>>>>> +1  & VIRTIO_ADMIN_DISCOVER_DEVICE_SUPPORTED_CLASSES    & M \\
>>>>>>> +\hline
>>>>>>> +2-255  & reserved    & - \\
>>>>>>> +\hline
>>>>>>> +\end{tabular}
>>>>>>> +
>>>>>>> +\subparagraph{Device identity command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / Admin command set / Common command set / Discover device class / Device identity command}
>>>>>>> +
>>>>>>> +This mandatory command should return device identity in the following
>>>>>>> +structure:
>>>>>>> +
>>>>>>> +\begin{tabular}{|l|l|l|}
>>>>>>> +\hline
>>>>>>> +Bytes  & Description    & M/O \\
>>>>>>> +\hline \hline
>>>>>>> +03:00  & VIRTIO DEVICE ID    & M \\
>>>>>>> +\hline
>>>>>>> +05:04  & VIRTIO TRANSPORT ID    & M \\
>>>>>> These fields are not defined. I wonder why they are necessary - the
>>>>>> driver should already have this information.
>>>>> Agreed.
>>>> These are initial fields.
>>>>
>>>> We can add also model, serial_number and more in the future.
>>>>
>>>>
>>>>>> In general, I'm a little concerned that this whole infrastructure will
>>>>>> increase the complexity of VIRTIO significantly with little benefit. I
>>>>>> do think an admin virtqueue makes sense, e.g. for migration, but would
>>>>>> prefer it if we focus on actual commands first instead of
>>>>>> infrastructure. That way it will be clear what infrastructure is needed.
>>>> admin virtq is not only for migration.
>>>>
>>>> You'll be able to configure virtio device properties using user space tools
>>>> like: virtio-cli.
>>>>
>>>> For example: format a block device, manage virtual function resources using
>>>> its PF, query for error logs, device health and more.
>>> That sounds good.
>>>
>>>> In the SW world maybe all the above were redundant, but now that you have
>>>> more and more HW virtio devices the protocol should be more flexible and
>>>> adjust.
>>> HW is not special in this regard, I think this will be useful for
>>> software too. In-band admin commands are necessary for nested
>>> virtualization, for example. They also provide a standard admin
>>> interface for out-of-process devices (vhost-user, etc).
>>>
>>>> A few weeks ago I sent concrete commands for live migration, but then I
>>>> was told that new infrastructure (admin virtq) should be developed, and this
>>>> is what I did in this RFC.
>>>>
>>>> If you combine the 2 RFCs you can imagine what is needed here for adding
>>>> live migration support.
>>>>
>>>> But I want to add it step by step.
>>>>
>>>> We need to agree on the infrastructure.
>>>>
>>>>> A concrete example would be good, but I think we can come up with a
>>>>> bare-bones spec to start with.
>>>>>
>>>>> - feature bit for the admin vq, as defined here
>>>>> - location of the admin vq is device specific
>>>>> - I think we can get away with two classes, as for feature bits (not
>>>>>      device specific and device specific); I don't think we need separate
>>>>>      classes for transport or vendor specific
>>>> We need it for live migration probably. It will be a transport class.
>>>>
>>>> Vendor specific is also important to allow vendors to develop their special
>>>> sauce.
>>>>
>>>>> - make the format for the request simple (command + length + payload?)
>>>> I used almost the same format as virtio net ctrl queue.
>>> The virtio_net_ctrl packet format looks good to me, it's close to
>>> Cornelia's command + length + payload suggestion:
>> I guess I didn't understand Cornelia's suggestion.
>>
>>
>>>     struct virtio_net_ctrl {
>>>             u8 class;
>>>             u8 command;
>>>             u8 command-specific-data[];
>>>             u8 ack;
>>>     };
>>>     /* ack values */
>>>     #define VIRTIO_NET_OK     0
>>>     #define VIRTIO_NET_ERR    1
>>>
>>> I'm not sure how vendor commands will be allocated though. Will each
>>> vendor get a unique class id to prevent collisions? If we want to
>>> support cross-implementation migration then it may be necessary to allow
>>> vendor command availability to change while the device is running.
>> Vendor specific commands can collide.
>>
>> Vendor A can implement class 192 to do X and Vendor B can implement class
>> 192 to do Y.
>>
>> What do you mean by "support cross-implementation migration"?
> Migrating from vhost_net to vDPA virtio-net, for example. Or migrating
> between two different vDPA virtio-net implementations.
>
> If vendor commands are all in a single namespace then the guest cannot
> use them without the risk of the command accidentally executing on the
> migration destination (where it has a different effect because the
> vendor has changed!).
>
>>> I prefer the simpler struct virtio_net_ctrl format to the more
>>> complicated one proposed in this patch series.
>> This is the same besides adding status type
>>
>> u8 status_type : 4;
>> u8 reserved : 4;
> I'm not sure why it's needed.

If we can live with 256 status codes, I guess we can drop status_type and
divide the status space into groups:

/* status values that are transport, device and vendor independent */
#define VIRTIO_ADMIN_STATUS_GENERIC_START    0
#define VIRTIO_ADMIN_STATUS_GENERIC_END      63

/* status values that are transport specific */
#define VIRTIO_ADMIN_STATUS_TRANSPORT_START  64
#define VIRTIO_ADMIN_STATUS_TRANSPORT_END    127

/* status values that are device specific */
#define VIRTIO_ADMIN_STATUS_DEVICE_START     128
#define VIRTIO_ADMIN_STATUS_DEVICE_END       191

/* status values that are vendor specific */
#define VIRTIO_ADMIN_STATUS_VENDOR_START     192
#define VIRTIO_ADMIN_STATUS_VENDOR_END       255
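
As a rough sketch of how a driver could map a returned status byte to one of
the groups above (the enum and the helper name are hypothetical and only for
illustration, they are not part of the patch):

```c
#include <stdint.h>

/* Upper bounds of the status groups proposed above */
#define VIRTIO_ADMIN_STATUS_GENERIC_END      63
#define VIRTIO_ADMIN_STATUS_TRANSPORT_END    127
#define VIRTIO_ADMIN_STATUS_DEVICE_END       191

/* Hypothetical names, for illustration only */
enum virtio_admin_status_group {
	VIRTIO_ADMIN_STATUS_GROUP_GENERIC,
	VIRTIO_ADMIN_STATUS_GROUP_TRANSPORT,
	VIRTIO_ADMIN_STATUS_GROUP_DEVICE,
	VIRTIO_ADMIN_STATUS_GROUP_VENDOR,
};

/* Classify a status byte by the 64-code range it falls into */
static inline enum virtio_admin_status_group
virtio_admin_status_to_group(uint8_t status)
{
	if (status <= VIRTIO_ADMIN_STATUS_GENERIC_END)
		return VIRTIO_ADMIN_STATUS_GROUP_GENERIC;
	if (status <= VIRTIO_ADMIN_STATUS_TRANSPORT_END)
		return VIRTIO_ADMIN_STATUS_GROUP_TRANSPORT;
	if (status <= VIRTIO_ADMIN_STATUS_DEVICE_END)
		return VIRTIO_ADMIN_STATUS_GROUP_DEVICE;
	/* 192..255 */
	return VIRTIO_ADMIN_STATUS_GROUP_VENDOR;
}
```

With this layout a single u8 carries both the group and the code, so the
separate status_type field would not be needed.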


>
>> I split "u8 command-specific-data[];"
>> to
>> "u8 command-specific-data[];
>>   u8 command-specific-result[];"
>>
>> to emphasize that there is some data that can be written by the device and some data written by the driver in the same command.
>> And this is also the case in virtio-net-ctrl, right ?
> The split makes sense to me.
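
To illustrate the split, here is a minimal sketch of how the two parts could
be sized, assuming status_type is dropped and only a single status byte
remains. The struct and helper names are hypothetical, not from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Device-readable part that precedes command-specific-data[] */
struct virtio_admin_cmd_hdr {
	uint8_t cls;     /* "class" in the patch; renamed here for illustration */
	uint8_t command;
};

/* Device-writable part that follows command-specific-result[]
 * (assumes the status_type bits are removed) */
struct virtio_admin_cmd_ftr {
	uint8_t status;
};

/* Bytes the driver fills in: header + command-specific-data */
static inline size_t virtio_admin_cmd_readable_len(size_t data_len)
{
	return sizeof(struct virtio_admin_cmd_hdr) + data_len;
}

/* Bytes the device fills in: command-specific-result + status */
static inline size_t virtio_admin_cmd_writable_len(size_t result_len)
{
	return result_len + sizeof(struct virtio_admin_cmd_ftr);
}
```
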
>
>>>>> How many different (groups of) commands can we reasonably expect? Do we
>>>>> need a generic discovery command, or can we get away with a feature bit
>>>>> covering each new group of commands?
>>>> I can't predict the future but IMO we need a discovery command.
>>>>
>>>> We have many devices and more can be added in the future.
>>> A <u8 class, u8 command> space is 65536 bits or 8KB. I think admin
>>> commands would not be included in VIRTIO Feature Bits but instead
>>> reported via a separate admin command that returns up to 8KB of data:
>>>
>>>     struct virtio_admin_report_cmds {
>>>         /* Bitmap of available admin commands [Device->Driver]
>>>          * bool command_present =
>>>          *        command_bits[class * 32 + command / 8] & (command % 8);
>>>          */
>>>         u8 command_bits[8192];
>>>     };
>> Yes, I divided it into multiple commands per class to cover the case where
>> we will need more than 1 bit to describe a command.
>>
>> But I guess we can add it later on.
>>
>> I think the above should be:
>>
>> bool command_present = command_bits[class * 32 + command / 8] & (1 << (command % 8));
>>
>> isn't it ?
> You're right. I forgot to shift the bit :D.
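
For reference, a minimal sketch of the corrected lookup, assuming the 8KB
bitmap layout from your struct (one bit per <class, command> pair, 32 bytes
per class). The helper names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 256 classes * 256 commands = 65536 bits = 8192 bytes */
#define VIRTIO_ADMIN_CMD_BITMAP_SIZE 8192

struct virtio_admin_report_cmds {
	/* Bitmap of available admin commands [Device->Driver] */
	uint8_t command_bits[VIRTIO_ADMIN_CMD_BITMAP_SIZE];
};

/* Hypothetical helper: test the bit for <cls, command> */
static inline bool
virtio_admin_cmd_present(const struct virtio_admin_report_cmds *r,
			 uint8_t cls, uint8_t command)
{
	size_t byte = (size_t)cls * 32 + command / 8;

	return r->command_bits[byte] & (1u << (command % 8));
}

/* Hypothetical helper: what a device would do to advertise a command */
static inline void
virtio_admin_cmd_set(struct virtio_admin_report_cmds *r,
		     uint8_t cls, uint8_t command)
{
	size_t byte = (size_t)cls * 32 + command / 8;

	r->command_bits[byte] |= (uint8_t)(1u << (command % 8));
}
```
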
>
>> Also what do you think about renaming <class, command> to <opcode, opmod> ?
> I need to understand how opcode and opmod values are used. I'm not sure.

Same as class and command just with different naming.

>
> Stefan

