Discussion of the implementations of VIRTIO specification
From: Max Gurtovoy <mgurtovoy@nvidia.com>
To: Jason Wang <jasowang@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Eugenio Perez Martin <eperezma@redhat.com>
Cc: virtio-comment@lists.oasis-open.org,
	Virtio-Dev <virtio-dev@lists.oasis-open.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>, Oren Duer <oren@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>,
	Parav Pandit <parav@nvidia.com>, Bodong Wang <bodong@nvidia.com>,
	Alexander Mikheev <amikheev@nvidia.com>,
	Halil Pasic <pasic@linux.ibm.com>
Subject: Re: [PATCH V2 1/2] virtio: introduce virtqueue state as basic facility
Date: Wed, 7 Jul 2021 15:03:37 +0300
Message-ID: <20bb67da-fd93-b102-6d08-af693d0951dd@nvidia.com>
In-Reply-To: <1973b90f-6a78-ded2-5249-8635ef15e67d@redhat.com>


On 7/7/2021 5:50 AM, Jason Wang wrote:
>
> On 2021/7/7 7:49 AM, Max Gurtovoy wrote:
>>
>> On 7/6/2021 10:08 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jul 06, 2021 at 07:09:10PM +0200, Eugenio Perez Martin wrote:
>>>> On Tue, Jul 6, 2021 at 11:32 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> On Tue, Jul 06, 2021 at 12:33:33PM +0800, Jason Wang wrote:
>>>>>> This patch adds a new device facility to save and restore virtqueue
>>>>>> state. The virtqueue state is split into two parts:
>>>>>>
>>>>>> - The available state: the state that is used to read the next
>>>>>>    available buffer.
>>>>>> - The used state: the state that is used to mark a buffer as used.
>>>>>>
>>>>>> Note that there could be devices that are required to set and get the
>>>>>> requests that are being processed by the device. I leave such an API
>>>>>> to be device specific.
>>>>>>
>>>>>> This facility could be used for both migration and device diagnostics.
>>>>>>
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> Hi Jason!
>>>>> I feel that for use cases such as SR-IOV,
>>>>> the facility to save/restore vq should be part of a PF;
>>>>> that is, there needs to be a way for one virtio device to
>>>>> address the state of another one.
>>>>>
>>>> Hi!
>>>>
>>>> In my opinion we should go the other way around: make features as
>>>> orthogonal/independent as possible, and just make them work together
>>>> if we have to. In this particular case, I think it should be easier to
>>>> decide how to report status, its needs, etc. for a VF, and then open
>>>> the possibility for the PF to query or set them, reusing format,
>>>> behavior, etc. as much as possible.
>>>>
>>>> I think that the most controversial point about doing it the non-SR-IOV
>>>> way is exposing these features/fields to the guest using
>>>> specific transport facilities, like PCI common config. However, I think
>>>> it should not be hard for the hypervisor to intercept them and even to
>>>> expose them conditionally. Please correct me if this guess is not
>>>> right and you have other concerns.
>>>
>>> Possibly. I'd like to see some guidance on how this all will work
>>> in practice then. Maybe make it all part of a non-normative section
>>> for now.
>>> I think that the feature itself is not very useful outside of
>>> migration so we don't really gain much by adding it as is
>>> without all the other missing pieces.
>>> I would say let's see more of the whole picture before we commit.
>>
>> I agree here. I also can't see the whole picture for the SR-IOV case.
>
>
> Again, it's not related to SR-IOV at all. It tries to introduce a basic
> facility at the virtio level which can work for all types of virtio
> devices.
>
> Transports such as PCI need to implement their own way to access this
> state. It's not hard to implement that simply via a capability.
>
> It works like other basic facilities such as device status, features, etc.
>
> For SR-IOV, it doesn't prevent you from implementing that via the
> admin virtqueue.
>
>
>>
>> I'll try to combine the admin control queue suggested in the previous
>> patch set with my proposal of the PF managing the VF migration.
>
>
> Note that the admin virtqueue should be transport independent when
> it is introduced.
>
>
>>
>> Feature negotiation is part of virtio device-driver communication and
>> not part of the migration software that should manage the migration
>> process.
>>
>> To me, it seems like queue state is something that should be internal
>> and not be exposed to guest drivers, which would see it as a new feature.
>
>
> This is not true; we have the case of nested virtualization. As
> mentioned in another thread, it's the hypervisor that needs to choose
> between hiding or shadowing the internal virtqueue state.
>
> Thanks

In the nested environment, do you mean that Level 1 is a real PF with X VFs,
and in Level 2 the X VFs are seen as PFs in the guests and expose another Y VFs?

If so, the guest PF will manage the migration for its Y VFs.


>
>
>>
>>>
>>>
>>>
>>>>> Thoughts?
>>>>>
>>>>>> ---
>>>>>>   content.tex | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>   1 file changed, 117 insertions(+)
>>>>>>
>>>>>> diff --git a/content.tex b/content.tex
>>>>>> index 620c0e2..8877b6f 100644
>>>>>> --- a/content.tex
>>>>>> +++ b/content.tex
>>>>>> @@ -385,6 +385,116 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>>>>>   types. It is RECOMMENDED that devices generate version 4
>>>>>>   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>>>>>
>>>>>> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
>>>>>> +
>>>>>> +When VIRTIO_F_RING_STATE is negotiated, the driver can set and
>>>>>> +get the device internal virtqueue state through the following
>>>>>> +fields. The way to access those fields is transport specific.
>>>>>> +
>>>>>> +\subsection{\field{Available State} Field}
>>>>>> +
>>>>>> +The \field{Available State} field is two bytes for the driver to get
>>>>>> +or set the state that is used by the virtqueue to read for the next
>>>>>> +available buffer.
>>>>>> +
>>>>>> +When VIRTIO_F_RING_PACKED is not negotiated, it contains:
>>>>>> +
>>>>>> +\begin{lstlisting}
>>>>>> +le16 {
>>>>>> +        last_avail_idx : 16;
>>>>>> +} avail_state;
>>>>>> +\end{lstlisting}
>>>>>> +
>>>>>> +The \field{last_avail_idx} field indicates where the device would read
>>>>>> +for the next index from the virtqueue available ring (modulo the queue
>>>>>> +size). This starts at the value set by the driver, and increases.
>>>>>> +
>>>>>> +When VIRTIO_F_RING_PACKED is negotiated, it contains:
>>>>>> +
>>>>>> +\begin{lstlisting}
>>>>>> +le16 {
>>>>>> +        last_avail_idx : 15;
>>>>>> +        last_avail_wrap_counter : 1;
>>>>>> +} avail_state;
>>>>>> +\end{lstlisting}
>>>>>> +
>>>>>> +The \field{last_avail_idx} field indicates where the device would read for
>>>>>> +the next descriptor head from the descriptor ring. This starts at the
>>>>>> +value set by the driver and wraps around when reaching the end of the
>>>>>> +ring.
>>>>>> +
>>>>>> +The \field{last_avail_wrap_counter} field indicates the last Driver Ring
>>>>>> +Wrap Counter that is observed by the device. This starts at the value
>>>>>> +set by the driver, and is flipped when reaching the end of the ring.
>>>>>> +
>>>>>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>>>>>> +
>>>>>> +\subsection{\field{Used State} Field}
>>>>>> +
>>>>>> +The \field{Used State} field is two bytes for the driver to set and
>>>>>> +get the state used by the virtqueue to make buffer used.
>>>>>> +
>>>>>> +When VIRTIO_F_RING_PACKED is not negotiated, the used state contains:
>>>>>> +
>>>>>> +\begin{lstlisting}
>>>>>> +le16 {
>>>>>> +        used_idx : 16;
>>>>>> +} used_state;
>>>>>> +\end{lstlisting}
>>>>>> +
>>>>>> +The \field{used_idx} field indicates where the device would write the next used
>>>>>> +descriptor head to the used ring (modulo the queue size). This starts
>>>>>> +at the value set by the driver, and increases. It is easy to see this
>>>>>> +is the initial value of the \field{idx} in the used ring.
>>>>>> +
>>>>>> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
>>>>>> +
>>>>>> +When VIRTIO_F_RING_PACKED is negotiated, the used state contains:
>>>>>> +
>>>>>> +\begin{lstlisting}
>>>>>> +le16 {
>>>>>> +        used_idx : 15;
>>>>>> +        used_wrap_counter : 1;
>>>>>> +} used_state;
>>>>>> +\end{lstlisting}
>>>>>> +
>>>>>> +The \field{used_idx} indicates where the device would write the next used
>>>>>> +descriptor head to the descriptor ring. This starts at the value set
>>>>>> +by the driver, and wraps around when reaching the end of the ring.
>>>>>> +
>>>>>> +\field{used_wrap_counter} is the Device Ring Wrap Counter. This starts
>>>>>> +at the value set by the driver, and is flipped when reaching the end
>>>>>> +of the ring.
>>>>>> +
>>>>>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>>>>>
>>>>> The above only fully describes the vq state if descriptors
>>>>> are used in order, or at least all out-of-order descriptors are
>>>>> consumed at the time of save.
>>>>>
>>>> I think that the most straightforward solution would be to add
>>>> something similar to VHOST_USER_GET_INFLIGHT_FD, but without the _FD
>>>> part.
>>>>
>>>> Thanks!
>>>>
>>>>> Adding the latter option to devices such as net will need extra spec work.
>>>>>
>>>>>
>>>>>> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>>>> +
>>>>>> +If VIRTIO_F_RING_STATE has been negotiated:
>>>>>> +\begin{itemize}
>>>>>> +\item A driver MUST NOT set the virtqueue state before setting the
>>>>>> +  FEATURES_OK status bit.
>>>>>> +\item A driver MUST NOT set the virtqueue state after setting the
>>>>>> +  DRIVER_OK status bit.
>>>>>> +\end{itemize}
>>>>>> +
>>>>>> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>>>> +
>>>>>> +If VIRTIO_F_RING_STATE has not been negotiated, a device MUST ignore
>>>>>> +the read and write to the virtqueue state.
>>>>>> +
>>>>>> +If VIRTIO_F_RING_STATE has been negotiated:
>>>>>> +\begin{itemize}
>>>>>> +\item A device SHOULD ignore the write to the virtqueue state if the
>>>>>> +FEATURES_OK status bit is not set.
>>>>>> +\item A device SHOULD ignore the write to the virtqueue state if the
>>>>>> +DRIVER_OK status bit is set.
>>>>>> +\end{itemize}
>>>>>> +
>>>>>> +If VIRTIO_F_RING_STATE has been negotiated, a device MAY has its
>>>>>
>>>>> may have?
>>>>> should also go into a normative section
>>>>>
>>>>>> +device-specific way for the driver to set and get extra virtqueue
>>>>>> +states such as in flight requests.
>>>>>> +
>>>>>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>>>>>
>>>>>>   We start with an overview of device initialization, then expand on the
>>>>>> @@ -420,6 +530,9 @@ \section{Device Initialization}\label{sec:General Initialization And Device Oper
>>>>>>      device, optional per-bus setup, reading and possibly writing the
>>>>>>      device's virtio configuration space, and population of virtqueues.
>>>>>>
>>>>>> +\item\label{itm:General Initialization And Device Operation / Device
>>>>>> +  Initialization / Virtqueue State Setup} When VIRTIO_F_RING_STATE has been negotiated, perform virtqueue state setup, including the initialization of the per virtqueue available state, used state and the possible device specific virtqueue state.
>>>>>> +
>>>>>>   \item\label{itm:General Initialization And Device Operation / Device Initialization / Set DRIVER-OK} Set the DRIVER_OK status bit.  At this point the device is
>>>>>>      ``live''.
>>>>>>   \end{enumerate}
>>>>>> @@ -6596,6 +6709,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>>>>>     transport specific.
>>>>>>     For more details about driver notifications over PCI see \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}.
>>>>>>
>>>>>> +  \item[VIRTIO_F_RING_STATE(40)] This feature indicates that the driver
>>>>>> +  can set and get the device internal virtqueue state.
>>>>>> +  See \ref{sec:Virtqueues / Virtqueue State}~\nameref{sec:Virtqueues / Virtqueue State}.
>>>>>> +
>>>>>>   \end{description}
>>>>>>
>>>>>>   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>>>>>> -- 
>>>>>> 2.25.1
>>
>

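For anyone following the packed-ring layout in the quoted patch, here is a rough C sketch of how the 16-bit available/used state word could be packed and unpacked. It is only an illustration: the struct and function names are made up for this mail, and placing the index in bits 0..14 with the wrap counter in bit 15 is an assumption based on the field order in the patch listing, not something the patch states explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of the 16-bit packed-ring state word
 * (avail_state or used_state when VIRTIO_F_RING_PACKED is negotiated). */
struct vq_packed_state {
    uint16_t idx;   /* 15-bit position in the descriptor ring */
    uint8_t  wrap;  /* 1-bit ring wrap counter */
};

/* Assumption: idx lives in bits 0..14 and the wrap counter in bit 15. */
static uint16_t vq_packed_state_encode(struct vq_packed_state s)
{
    return (uint16_t)((s.idx & 0x7fffu) | ((uint16_t)(s.wrap & 1u) << 15));
}

static struct vq_packed_state vq_packed_state_decode(uint16_t raw)
{
    struct vq_packed_state s;

    s.idx = (uint16_t)(raw & 0x7fffu);
    s.wrap = (uint8_t)(raw >> 15);
    return s;
}

/* Advance the position by n descriptors in a ring of queue_size entries.
 * The index wraps to the start of the ring and the wrap counter is
 * flipped when the end of the ring is reached. Requires n <= queue_size. */
static struct vq_packed_state
vq_packed_state_advance(struct vq_packed_state s, uint16_t n,
                        uint16_t queue_size)
{
    uint32_t next;

    assert(queue_size > 0 && n <= queue_size && s.idx < queue_size);
    next = (uint32_t)s.idx + n;
    if (next >= queue_size) {
        next -= queue_size;
        s.wrap ^= 1u;
    }
    s.idx = (uint16_t)next;
    return s;
}
```

With a 256-entry ring, a state of idx 254 and wrap 1 advanced by 4 descriptors ends up at idx 2 with the wrap counter flipped to 0. In the split-ring case none of this is needed, since the state is a plain free-running 16-bit index.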

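The normative part of the quoted patch boils down to a window in which the virtqueue state may be written: after feature negotiation has completed and before the device goes live. A device-side check might look roughly like the sketch below. The helper name and its boolean parameter are hypothetical; the status bit values are the ones defined by the virtio specification (where the bit is named FEATURES_OK).

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_DRIVER_OK   4u
#define VIRTIO_CONFIG_S_FEATURES_OK 8u

/* Hypothetical device-side helper: returns true when a write to the
 * virtqueue state should be honoured, following the rules in the
 * quoted patch:
 *   - ignore it if VIRTIO_F_RING_STATE was not negotiated,
 *   - ignore it if FEATURES_OK is not yet set,
 *   - ignore it if DRIVER_OK is already set. */
static bool vq_state_write_allowed(uint8_t status, bool ring_state_negotiated)
{
    if (!ring_state_negotiated)
        return false;
    if (!(status & VIRTIO_CONFIG_S_FEATURES_OK))
        return false;
    if (status & VIRTIO_CONFIG_S_DRIVER_OK)
        return false;
    return true;
}
```

Note that the patch uses MUST for the not-negotiated case and only SHOULD for the two status checks, so a device is not strictly required to behave exactly like this sketch.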