[virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state

Discussion of the VIRTIO specification

* [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-09-06  8:16 Zhu Lingshan
  2023-09-06  8:16 ` [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility Zhu Lingshan
                   ` (7 more replies)
  0 siblings, 8 replies; 148+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This series introduces:

1) A new SUSPEND bit in the device status, which is used to suspend
the device so that the device state and virtqueue state are
stabilized.

2) Virtqueue state and its accessors, to get and set last_avail_idx
and last_used_idx of virtqueues.

The main use case for these new facilities is live migration.

Future work: dirty page tracking and in-flight descriptors.

This series addresses many comments from Jason, Stefan and Eugenio
on the RFC series.

Zhu Lingshan (5):
  virtio: introduce vq state as basic facility
  virtio: introduce SUSPEND bit in device status
  virtqueue: constraints for virtqueue state
  virtqueue: ignore resetting vqs when SUSPEND
  virtio-pci: implement VIRTIO_F_QUEUE_STATE

 content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
 transport-pci.tex |  18 +++++++
 2 files changed, 136 insertions(+)

-- 
2.35.3

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


* [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
@ 2023-09-06  8:16 ` Zhu Lingshan
  2023-09-06  8:28   ` Michael S. Tsirkin
  2023-09-14 11:25   ` Michael S. Tsirkin
  2023-09-06  8:16 ` [virtio-comment] [PATCH 2/5] virtio: introduce SUSPEND bit in device status Zhu Lingshan
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 148+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This patch adds a new device facility to save and restore virtqueue
state. The virtqueue state is split into two parts:

- The available state: the state that is used to read the next
  available buffer.
- The used state: the state that is used to mark a buffer as used.

This will simplify the transport-specific method implementation (e.g.
two le16 fields could be used instead of a single le32). For a split
virtqueue, we only need the available state, since the used state is
implemented in the virtqueue itself (the used index). For a packed
virtqueue, we need both the available state and the used state.

These states are required to implement live migration support for
virtio devices.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/content.tex b/content.tex
index 0a62dce..0e492cd 100644
--- a/content.tex
+++ b/content.tex
@@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
 types. It is RECOMMENDED that devices generate version 4
 UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
 
+\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
+
+When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
+get the device internal virtqueue state through the following
+fields. The implementation of the interfaces is transport specific.
+
+\subsection{\field{Available State} Field}
+
+The available state field is two bytes of virtqueue state that is used by
+the device to read the next available buffer.
+
+When VIRTIO_RING_F_PACKED is not negotiated, it contains:
+
+\begin{lstlisting}
+le16 last_avail_idx;
+\end{lstlisting}
+
+The \field{last_avail_idx} field is the free-running available ring
+index where the device will read the next available head of a
+descriptor chain.
+
+See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
+
+When VIRTIO_RING_F_PACKED is negotiated, it contains:
+
+\begin{lstlisting}
+le16 {
+  last_avail_idx : 15;
+  last_avail_wrap_counter : 1;
+};
+\end{lstlisting}
+
+The \field{last_avail_idx} field is the free-running location
+where the device reads the next descriptor from the virtqueue descriptor ring.
+
+The \field{last_avail_wrap_counter} field is the last driver ring wrap
+counter that was observed by the device.
+
+See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
+
+\subsection{\field{Used State} Field}
+
+The used state field is two bytes of virtqueue state that is used by
+the device when marking a buffer used.
+
+When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
+
+\begin{lstlisting}
+le16 {
+  used_idx : 15;
+  used_wrap_counter : 1;
+};
+\end{lstlisting}
+
+The \field{used_idx} field is the free-running location where the device writes the next
+used descriptor to the descriptor ring.
+
+The \field{used_wrap_counter} field is the wrap counter that is used
+by the device.
+
+See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
+
+When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
+is always 0.
+
 \input{admin.tex}
 
 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
-- 
2.35.3
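
As an illustration of the packed-ring layout described in the patch
above (not part of the patch itself), here is a minimal C sketch of
packing and unpacking the 16-bit available/used state. It assumes the
first bitfield member (the index) occupies bits 0..14 and the wrap
counter sits in bit 15; the patch text does not spell out the bit
order, so treat that as an assumption. The names vq_packed_state,
vq_state_pack, vq_state_unpack and vq_state_store_le16 are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits 0..14 hold the index, bit 15 the wrap counter. */
#define VQ_STATE_IDX_MASK   0x7fffu
#define VQ_STATE_WRAP_SHIFT 15

/* Hypothetical in-memory view of one packed-ring state field. */
struct vq_packed_state {
    uint16_t idx;  /* 15-bit ring position */
    bool wrap;     /* wrap counter */
};

/* Combine index and wrap counter into one 16-bit value (CPU order). */
static inline uint16_t vq_state_pack(struct vq_packed_state s)
{
    assert(s.idx <= VQ_STATE_IDX_MASK); /* index must fit in 15 bits */
    return (uint16_t)(s.idx | ((uint16_t)s.wrap << VQ_STATE_WRAP_SHIFT));
}

/* Split a 16-bit value (CPU order) back into index and wrap counter. */
static inline struct vq_packed_state vq_state_unpack(uint16_t raw)
{
    struct vq_packed_state s = {
        .idx = (uint16_t)(raw & VQ_STATE_IDX_MASK),
        .wrap = (raw >> VQ_STATE_WRAP_SHIFT) & 1,
    };
    return s;
}

/* Store the value as le16, independent of host endianness. */
static inline void vq_state_store_le16(uint16_t raw, uint8_t out[2])
{
    out[0] = (uint8_t)(raw & 0xff);
    out[1] = (uint8_t)(raw >> 8);
}
```

Because the index and the wrap counter travel in one le16, a
transport only needs a single 16-bit accessor per state field, which
is the simplification the commit message refers to.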



* Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-06  8:16 ` [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility Zhu Lingshan
@ 2023-09-06  8:28   ` Michael S. Tsirkin
  2023-09-06  9:43     ` Zhu, Lingshan
  2023-09-14 11:25   ` Michael S. Tsirkin
  1 sibling, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06  8:28 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
> This patch adds new device facility to save and restore virtqueue
> state. The virtqueue state is split into two parts:
> 
> - The available state: The state that is used for read the next
>   available buffer.
> - The used state: The state that is used for make buffer used.
> 
> This will simply the transport specific method implementation. E.g two
> le16 could be used instead of a single le32). For split virtqueue, we
> only need the available state since the used state is implemented in
> the virtqueue itself (the used index). For packed virtqueue, we need
> both the available state and the used state.
> 
> Those states are required to implement live migration support for
> virtio device.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 65 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0a62dce..0e492cd 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>  types. It is RECOMMENDED that devices generate version 4
>  UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>  
> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
> +
> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
> +get the device internal virtqueue state through the following
> +fields. The implementation of the interfaces is transport specific.


Virtqueue state cannot, in general, be described by two 16-bit
indices.

Consider an example: buffers A, B, C and D are available.
After the device has used descriptors A and C, what is its state, and
how do you describe it using a single index?
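
To make the example above concrete, here is a small C sketch. It is
illustrative only: struct vq_view and summarize() are made-up names,
and the model simply identifies buffers A..D with positions 0..3. It
shows two different device states collapsing to the same pair of
counters while their in-flight sets differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBUF 4 /* buffers A, B, C, D at positions 0..3 */

/* Hypothetical summary of what the device has done so far. */
struct vq_view {
    uint16_t last_avail_idx; /* number of buffers fetched by the device */
    uint16_t used_idx;       /* number of buffers marked used */
    uint8_t inflight;        /* bit i set: buffer i fetched but not yet used */
};

static struct vq_view summarize(const bool fetched[NBUF],
                                const bool used[NBUF])
{
    struct vq_view v = { 0, 0, 0 };

    for (int i = 0; i < NBUF; i++) {
        /* A buffer cannot be used before the device has fetched it. */
        assert(!(used[i] && !fetched[i]));
        if (fetched[i])
            v.last_avail_idx++;
        if (used[i])
            v.used_idx++;
        if (fetched[i] && !used[i])
            v.inflight |= (uint8_t)(1u << i);
    }
    return v;
}
```

With all four buffers fetched, completing {A, C} and completing
{A, B} both yield last_avail_idx = 4 and used_idx = 2, but the
in-flight sets are {B, D} and {C, D} respectively, so the two indices
alone cannot tell the states apart.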


> +
> +\subsection{\field{Available State} Field}
> +
> +The available state field is two bytes of virtqueue state that is used by
> +the device to read the next available buffer.
> +
> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
> +
> +\begin{lstlisting}
> +le16 last_avail_idx;
> +\end{lstlisting}
> +
> +The \field{last_avail_idx} field is the free-running available ring
> +index where the device will read the next available head of a
> +descriptor chain.

next after what?

> +
> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
> +
> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
> +
> +\begin{lstlisting}
> +le16 {
> +  last_avail_idx : 15;
> +  last_avail_wrap_counter : 1;
> +};
> +\end{lstlisting}
> +
> +The \field{last_avail_idx} field is the free-running location
> +where the device read the next descriptor from the virtqueue descriptor ring.
> +
> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
> +counter that was observed by the device.
> +
> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
> +
> +\subsection{\field{Used State} Field}
> +
> +The used state field is two bytes of virtqueue state that is used by
> +the device when marking a buffer used.
> +
> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
> +
> +\begin{lstlisting}
> +le16 {
> +  used_idx : 15;
> +  used_wrap_counter : 1;
> +};
> +\end{lstlisting}
> +
> +The \field{used_idx} field is the free-running location where the device write the next
> +used descriptor to the descriptor ring.

I don't see what good this used_idx does: used descriptors are written in
order, so why not just check which ones are valid?
And the driver of course knows what the used_wrap_counter is;
otherwise it could not work.
Or is this for some kind of
split driver setup where looking at the ring is impossible?


> +
> +The \field{used_wrap_counter} field is the wrap counter that is used
> +by the device.
> +
> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
> +
> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
> +is always 0
> +
>  \input{admin.tex}
>  
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> -- 
> 2.35.3
> 
> 

* Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-06  8:28   ` Michael S. Tsirkin
@ 2023-09-06  9:43     ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-06  9:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 4:28 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
>> This patch adds new device facility to save and restore virtqueue
>> state. The virtqueue state is split into two parts:
>>
>> - The available state: The state that is used for read the next
>>    available buffer.
>> - The used state: The state that is used for make buffer used.
>>
>> This will simply the transport specific method implementation. E.g two
>> le16 could be used instead of a single le32). For split virtqueue, we
>> only need the available state since the used state is implemented in
>> the virtqueue itself (the used index). For packed virtqueue, we need
>> both the available state and the used state.
>>
>> Those states are required to implement live migration support for
>> virtio device.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 65 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0a62dce..0e492cd 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>   types. It is RECOMMENDED that devices generate version 4
>>   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>   
>> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
>> +
>> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
>> +get the device internal virtqueue state through the following
>> +fields. The implementation of the interfaces is transport specific.
>
> virtqueue state can not, generally, be described by two 16 bit
> indices.
>
> Consider an example: these buffers available: A B C D
> After device used descriptors A and C, what is its state and
> how do you describe it using a single index?
<discussed in the 0/5 cover letter>
>
>
>> +
>> +\subsection{\field{Available State} Field}
>> +
>> +The available state field is two bytes of virtqueue state that is used by
>> +the device to read the next available buffer.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 last_avail_idx;
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running available ring
>> +index where the device will read the next available head of a
>> +descriptor chain.
> next after what?
I am not sure I follow.
It works like a pointer, so I assumed it goes without saying that it
refers to the next "address".

Are you suggesting wording such as "next after the ones currently being
processed", or something else?
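
For reference, a minimal C sketch of how a free-running 16-bit index
is commonly interpreted for a split virtqueue: the ring slot is the
index modulo the queue size (a power of 2 for split rings), and the
number of buffers not yet fetched is the wrapping difference between
the driver's avail->idx and last_avail_idx. The helper names are
hypothetical and this is not proposed spec text.

```c
#include <assert.h>
#include <stdint.h>

/* Slot in the available ring that a free-running index refers to. */
static inline uint16_t split_ring_slot(uint16_t last_avail_idx,
                                       uint16_t queue_size)
{
    /* Split virtqueue sizes are powers of 2, so masking equals modulo. */
    assert(queue_size != 0 && (queue_size & (queue_size - 1)) == 0);
    return (uint16_t)(last_avail_idx & (queue_size - 1));
}

/* Buffers made available by the driver but not yet fetched by the device. */
static inline uint16_t split_ring_pending(uint16_t avail_idx,
                                          uint16_t last_avail_idx)
{
    /* Unsigned 16-bit subtraction handles wrap-around at 65536. */
    return (uint16_t)(avail_idx - last_avail_idx);
}
```

For example, with a queue size of 256, last_avail_idx = 65535 refers
to slot 255, and avail_idx = 2 with last_avail_idx = 65534 means four
buffers are pending.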
>
>> +
>> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
>> +
>> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  last_avail_idx : 15;
>> +  last_avail_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running location
>> +where the device read the next descriptor from the virtqueue descriptor ring.
>> +
>> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
>> +counter that was observed by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>> +
>> +\subsection{\field{Used State} Field}
>> +
>> +The used state field is two bytes of virtqueue state that is used by
>> +the device when marking a buffer used.
>> +
>> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  used_idx : 15;
>> +  used_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{used_idx} field is the free-running location where the device write the next
>> +used descriptor to the descriptor ring.
> I don't get what good does this used_idx do - used descriptors are written in
> order so just check which ones are valid?
I am a bit confused; please correct me if I have misunderstood.

Valid to the device? Do you mean avail_idx? How would one infer used_idx
from avail_idx, or by walking through the ring?
> And driver does of course know what the used_wrap_counter is
> otherwise it can't work.
> Or is this for some kind of
> split driver setup where looking at the ring is impossible?
For a split virtqueue, we don't need to migrate used_idx, but for a
packed virtqueue, is it easier if we record and restore it?
>
>
>> +
>> +The \field{used_wrap_counter} field is the wrap counter that is used
>> +by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>> +is always 0
>> +
>>   \input{admin.tex}
>>   
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>> -- 
>> 2.35.3
>>
>>

* Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-06  8:16 ` [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility Zhu Lingshan
  2023-09-06  8:28   ` Michael S. Tsirkin
@ 2023-09-14 11:25   ` Michael S. Tsirkin
  2023-09-15  2:46     ` Zhu, Lingshan
  1 sibling, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:25 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
> This patch adds new device facility to save and restore virtqueue
> state. The virtqueue state is split into two parts:
> 
> - The available state: The state that is used for read the next
>   available buffer.
> - The used state: The state that is used for make buffer used.
> 
> This will simply the transport specific method implementation. E.g two
> le16 could be used instead of a single le32). For split virtqueue, we
> only need the available state since the used state is implemented in
> the virtqueue itself (the used index).

Hmm, no: it is simply because when the ring is not running and
buffers are processed in order, last avail == used.

> For packed virtqueue, we need
> both the available state and the used state.
> 
> Those states are required to implement live migration support for
> virtio device.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>



> ---
>  content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 65 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0a62dce..0e492cd 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>  types. It is RECOMMENDED that devices generate version 4
>  UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>  
> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
> +
> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
> +get the device internal virtqueue state through the following
> +fields. The implementation of the interfaces is transport specific.
> +
> +\subsection{\field{Available State} Field}
> +
> +The available state field is two bytes of virtqueue state that is used by
> +the device to read the next available buffer.
> +
> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
> +
> +\begin{lstlisting}
> +le16 last_avail_idx;
> +\end{lstlisting}
> +
> +The \field{last_avail_idx} field is the free-running available ring
> +index where the device will read the next available head of a
> +descriptor chain.

I dislike how this pokes at split-ring.tex from the outside.
It will make it harder to add new formats down the road.



> +
> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
>
> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
> +
> +\begin{lstlisting}
> +le16 {
> +  last_avail_idx : 15;
> +  last_avail_wrap_counter : 1;
> +};
> +\end{lstlisting}
> +
> +The \field{last_avail_idx} field is the free-running location
> +where the device read the next descriptor from the virtqueue descriptor ring.
> +
> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
> +counter that was observed by the device.
> +
> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.


> +\subsection{\field{Used State} Field}
> +
> +The used state field is two bytes of virtqueue state that is used by
> +the device when marking a buffer used.
> +
> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
> +
> +\begin{lstlisting}
> +le16 {
> +  used_idx : 15;
> +  used_wrap_counter : 1;
> +};
> +\end{lstlisting}
> +
> +The \field{used_idx} field is the free-running location where the device write the next
> +used descriptor to the descriptor ring.
> +
> +The \field{used_wrap_counter} field is the wrap counter that is used
> +by the device.
> +
> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
> +
> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
> +is always 0
> +
>  \input{admin.tex}
>  
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> -- 
> 2.35.3
> 
> 

* Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-14 11:25   ` Michael S. Tsirkin
@ 2023-09-15  2:46     ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  2:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:25 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
>> This patch adds new device facility to save and restore virtqueue
>> state. The virtqueue state is split into two parts:
>>
>> - The available state: The state that is used for read the next
>>    available buffer.
>> - The used state: The state that is used for make buffer used.
>>
>> This will simply the transport specific method implementation. E.g two
>> le16 could be used instead of a single le32). For split virtqueue, we
>> only need the available state since the used state is implemented in
>> the virtqueue itself (the used index).
> hmm no, simply because when ring is not running and when
> buffers are processed in order, last avail == used.
In the ideal case, yes.

I will remove this, since it may be ambiguous.
>
>> For packed virtqueue, we need
>> both the available state and the used state.
>>
>> Those states are required to implement live migration support for
>> virtio device.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
>
>> ---
>>   content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 65 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0a62dce..0e492cd 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>   types. It is RECOMMENDED that devices generate version 4
>>   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>   
>> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
>> +
>> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
>> +get the device internal virtqueue state through the following
>> +fields. The implementation of the interfaces is transport specific.
>> +
>> +\subsection{\field{Available State} Field}
>> +
>> +The available state field is two bytes of virtqueue state that is used by
>> +the device to read the next available buffer.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 last_avail_idx;
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running available ring
>> +index where the device will read the next available head of a
>> +descriptor chain.
> I dislike how this pokes at split-ring.txt externally.
> Will make it harder to add new formats down the road.
I can move this content into split-ring.tex and packed-ring.tex
respectively; however, some of it will have to be duplicated.

Thanks
>
>
>
>> +
>> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
>>
>> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  last_avail_idx : 15;
>> +  last_avail_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running location
>> +where the device read the next descriptor from the virtqueue descriptor ring.
>> +
>> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
>> +counter that was observed by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>
>> +\subsection{\field{Used State} Field}
>> +
>> +The used state field is two bytes of virtqueue state that is used by
>> +the device when marking a buffer used.
>> +
>> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  used_idx : 15;
>> +  used_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{used_idx} field is the free-running location where the device write the next
>> +used descriptor to the descriptor ring.
>> +
>> +The \field{used_wrap_counter} field is the wrap counter that is used
>> +by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>> +is always 0
>> +
>>   \input{admin.tex}
>>   
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>> -- 
>> 2.35.3

^ permalink raw reply	[flat|nested] 148+ messages in thread
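
Editorial sketch for the quoted "Used State" listing above: with
VIRTIO_RING_F_PACKED negotiated, the used state is a single le16. The spec
lays bitfields out from the least significant bit, so used_idx occupies
bits 0..14 and used_wrap_counter occupies bit 15. The helper and macro names
below are hypothetical and do not come from the patch; the values are in CPU
byte order and would still need a little-endian conversion at the transport.

```c
#include <stdint.h>

/* Hypothetical names, chosen for illustration only. */
#define VQ_STATE_IDX_MASK   0x7fffu  /* bits 0..14: used_idx */
#define VQ_STATE_WRAP_SHIFT 15       /* bit 15: used_wrap_counter */

/* Combine a 15-bit index and a 1-bit wrap counter into one 16-bit value. */
static inline uint16_t vq_used_state_pack(uint16_t used_idx, unsigned wrap)
{
    return (uint16_t)((used_idx & VQ_STATE_IDX_MASK) |
                      ((wrap & 1u) << VQ_STATE_WRAP_SHIFT));
}

/* Extract the 15-bit used_idx. */
static inline uint16_t vq_used_state_idx(uint16_t state)
{
    return (uint16_t)(state & VQ_STATE_IDX_MASK);
}

/* Extract the 1-bit used_wrap_counter. */
static inline unsigned vq_used_state_wrap(uint16_t state)
{
    return (state >> VQ_STATE_WRAP_SHIFT) & 1u;
}
```

The same layout would apply to the avail state described earlier in the
patch, with last_avail_idx and last_avail_wrap_counter in place of the used
fields.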

* [virtio-comment] [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
  2023-09-06  8:16 ` [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility Zhu Lingshan
@ 2023-09-06  8:16 ` Zhu Lingshan
  2023-09-14 11:34   ` [virtio-comment] " Michael S. Tsirkin
  2023-09-06  8:16 ` [virtio-comment] [PATCH 3/5] virtqueue: constraints for virtqueue state Zhu Lingshan
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 148+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This patch introduces a new status bit in the device status: SUSPEND.

This SUSPEND bit can be used by the driver to suspend a device,
in order to stabilize the device states and virtqueue states.

Its main use case is live migration.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 content.tex | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/content.tex b/content.tex
index 0e492cd..0fab537 100644
--- a/content.tex
+++ b/content.tex
@@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
   drive the device.
 
+\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
+  device has been suspended by the driver.
+
 \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
   an error from which it can't recover.
 \end{description}
@@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 recover by issuing a reset.
 \end{note}
 
+The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
+
+When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
+
 \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
 
 The device MUST NOT consume buffers or send any used buffer
@@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
 MUST send a device configuration change notification to the driver.
 
+The device MUST ignore SUSPEND if FEATURES_OK is not set.
+
+The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
+
+The device SHOULD allow settings to \field{device status} even when SUSPEND is set.
+
+If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
+and resumes operation upon DRIVER_OK.
+
+If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
+the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
+
+\begin{itemize}
+\item Stop consuming buffers of any virtqueues and mark all finished descritors as used.
+\item Wait until all descriptors that being processed to finish and mark them as used.
+\item Flush all used buffer and send used buffer notifications to the driver.
+\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}
+\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
+\end{itemize}
+
 \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
 
 Each virtio device offers all the features it understands.  During
@@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
 	handling features reserved for future use.
 
+  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
+   SUSPEND the device.
+   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 148+ messages in thread
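
Editorial sketch of the driver-side rules in the patch above: the driver
does not set SUSPEND unless FEATURES_OK is set (and VIRTIO_F_SUSPEND was
negotiated), and after setting SUSPEND it re-reads the device status until
the device presents the bit. This is only an illustration under assumptions:
the accessor table, the bounded polling loop, and the fake device are all
hypothetical, and SUSPEND (16) is the value proposed by this patch, not an
existing status bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_STATUS_FEATURES_OK 8u
#define VIRTIO_STATUS_SUSPEND     16u  /* value proposed in this patch */

/* Hypothetical transport accessors for the device status field. */
struct virtio_status_ops {
    uint8_t (*read_status)(void *dev);
    void (*write_status)(void *dev, uint8_t status);
};

/*
 * Returns true once the device presents SUSPEND. Returns false if the
 * preconditions are not met or the bit is not seen within max_polls reads.
 */
static bool virtio_suspend(const struct virtio_status_ops *ops, void *dev,
                           bool suspend_negotiated, unsigned max_polls)
{
    uint8_t status = ops->read_status(dev);

    if (!suspend_negotiated || !(status & VIRTIO_STATUS_FEATURES_OK))
        return false;

    ops->write_status(dev, (uint8_t)(status | VIRTIO_STATUS_SUSPEND));

    /* Re-read until the device reports that it has finished suspending. */
    for (unsigned i = 0; i < max_polls; i++) {
        if (ops->read_status(dev) & VIRTIO_STATUS_SUSPEND)
            return true;
    }
    return false;
}

/* Fake device: presents SUSPEND only after polls_left further status reads. */
struct fake_dev {
    uint8_t status;
    bool suspend_pending;
    unsigned polls_left;
};

static uint8_t fake_read_status(void *opaque)
{
    struct fake_dev *d = opaque;

    if (d->suspend_pending) {
        if (d->polls_left == 0) {
            d->status |= VIRTIO_STATUS_SUSPEND;
            d->suspend_pending = false;
        } else {
            d->polls_left--;
        }
    }
    return d->status;
}

static void fake_write_status(void *opaque, uint8_t status)
{
    struct fake_dev *d = opaque;

    if (status & VIRTIO_STATUS_SUSPEND)
        d->suspend_pending = true;
    /* SUSPEND becomes visible only when the fake device has "quiesced". */
    d->status = (uint8_t)(status & ~VIRTIO_STATUS_SUSPEND);
}

static const struct virtio_status_ops fake_ops = {
    fake_read_status, fake_write_status
};
```

The delay in the fake device stands in for the device-side work listed in
the patch (draining in-flight descriptors, flushing used buffers, recording
virtqueue state) that has to complete before SUSPEND is presented.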

* [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-06  8:16 ` [virtio-comment] [PATCH 2/5] virtio: introduce SUSPEND bit in device status Zhu Lingshan
@ 2023-09-14 11:34   ` Michael S. Tsirkin
  2023-09-15  2:57     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:34 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
> This patch introduces a new status bit in the device status: SUSPEND.
> 
> This SUSPEND bit can be used by the driver to suspend a device,
> in order to stabilize the device states and virtqueue states.
> 
> Its main use case is live migration.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  content.tex | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0e492cd..0fab537 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>    drive the device.
>  
> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
> +  device has been suspended by the driver.
> +
>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>    an error from which it can't recover.
>  \end{description}
> @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  recover by issuing a reset.
>  \end{note}
>  
> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
> +
> +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
> +
>  \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>  
>  The device MUST NOT consume buffers or send any used buffer
> @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>  MUST send a device configuration change notification to the driver.
>  
> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
> +
> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.

why? let's just forbid driver from setting it.

> +
> +The device SHOULD allow settings to \field{device status} even when SUSPEND is set.
> +
> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
> +and resumes operation upon DRIVER_OK.
> +

sorry what?

> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
> +the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
> +
> +\begin{itemize}
> +\item Stop consuming buffers of any virtqueues and mark all finished descritors as used.
> +\item Wait until all descriptors that being processed to finish and mark them as used.
> +\item Flush all used buffer and send used buffer notifications to the driver.

flush how?

> +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}


record where?

> +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}

pause in what sense? completely?  this does not seem realistic.
e.g. pci express link has to stay active or device will die.


also, presumably here it is except a bunch of other fields.
e.g. what about queue select and all related queue fields?


> +\end{itemize}
> +
>  \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
>  
>  Each virtio device offers all the features it understands.  During
> @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>  	handling features reserved for future use.
>  
> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
> +   SUSPEND the device.
> +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> -- 
> 2.35.3


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-14 11:34   ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-15  2:57     ` Zhu, Lingshan
  2023-09-15 11:10       ` Michael S. Tsirkin
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  2:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
>> This patch introduces a new status bit in the device status: SUSPEND.
>>
>> This SUSPEND bit can be used by the driver to suspend a device,
>> in order to stabilize the device states and virtqueue states.
>>
>> Its main use case is live migration.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   content.tex | 31 +++++++++++++++++++++++++++++++
>>   1 file changed, 31 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0e492cd..0fab537 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>     drive the device.
>>   
>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
>> +  device has been suspended by the driver.
>> +
>>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>     an error from which it can't recover.
>>   \end{description}
>> @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   recover by issuing a reset.
>>   \end{note}
>>   
>> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
>> +
>> +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
>> +
>>   \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>   
>>   The device MUST NOT consume buffers or send any used buffer
>> @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>>   MUST send a device configuration change notification to the driver.
>>   
>> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
>> +
>> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
> why? let's just forbid driver from setting it.
OK
>
>> +
>> +The device SHOULD allow settings to \field{device status} even when SUSPEND is set.
>> +
>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
>> +and resumes operation upon DRIVER_OK.
>> +
> sorry what?
If a live migration fails or is cancelled, the device needs to resume
operation.
However, the spec forbids the driver from clearing a device status bit, so
re-writing DRIVER_OK is expected to clear SUSPEND and make the device
resume operation.
>
>> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
>> +the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
>> +
>> +\begin{itemize}
>> +\item Stop consuming buffers of any virtqueues and mark all finished descritors as used.
>> +\item Wait until all descriptors that being processed to finish and mark them as used.
>> +\item Flush all used buffer and send used buffer notifications to the driver.
> flush how?
This is device-type-specific; we will include tracking of in-flight
descriptors (buffers) in V2.
>
>> +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}
>
> record where?
This is transport-specific. For PCI, patch 5 introduces two new fields
for the avail and used state.
>
>> +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
> pause in what sense? completely?  this does not seem realistic.
> e.g. pci express link has to stay active or device will die.
Only virtio operation is paused; I will rephrase the sentence as "pause its
virtio operation".
Other things, like the PCI link in your example, are outside the scope of
the spec and we don't need to migrate them.
>
>
> also, presumably here it is except a bunch of other fields.
> e.g. what about queue select and all related queue fields?
For now they are forbidden.

As SiWei suggested, we will introduce a new feature bit to control whether
resetting a VQ after SUSPEND is allowed. We can add more feature bits if
there are requirements to perform other operations after SUSPEND. But for
now they are forbidden.
>
>> +\end{itemize}
>> +
>>   \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
>>   
>>   Each virtio device offers all the features it understands.  During
>> @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>   	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>>   	handling features reserved for future use.
>>   
>> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
>> +   SUSPEND the device.
>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
>> +
>>   \end{description}
>>   
>>   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>> -- 
>> 2.35.3


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-15  2:57     ` Zhu, Lingshan
@ 2023-09-15 11:10       ` Michael S. Tsirkin
  2023-09-18  2:56         ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-15 11:10 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Fri, Sep 15, 2023 at 10:57:33AM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
> > > This patch introduces a new status bit in the device status: SUSPEND.
> > > 
> > > This SUSPEND bit can be used by the driver to suspend a device,
> > > in order to stabilize the device states and virtqueue states.
> > > 
> > > Its main use case is live migration.
> > > 
> > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   content.tex | 31 +++++++++++++++++++++++++++++++
> > >   1 file changed, 31 insertions(+)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index 0e492cd..0fab537 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> > >   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
> > >     drive the device.
> > > +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
> > > +  device has been suspended by the driver.
> > > +
> > >   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
> > >     an error from which it can't recover.
> > >   \end{description}
> > > @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> > >   recover by issuing a reset.
> > >   \end{note}
> > > +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
> > > +
> > > +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
> > > +
> > >   \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
> > >   The device MUST NOT consume buffers or send any used buffer
> > > @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> > >   that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
> > >   MUST send a device configuration change notification to the driver.
> > > +The device MUST ignore SUSPEND if FEATURES_OK is not set.
> > > +
> > > +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
> > why? let's just forbid driver from setting it.
> OK
> > 
> > > +
> > > +The device SHOULD allow settings to \field{device status} even when SUSPEND is set.
> > > +
> > > +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
> > > +and resumes operation upon DRIVER_OK.
> > > +
> > sorry what?
> In case of a failed or cancelled Live Migration, the device needs to resume
> operation.
> However the spec forbids the driver to clear a device status bit, so
> re-writing
> DRIVER_OK is expected to clear SUSPEND and the device resume operation.

No, DRIVER_OK is already set. Setting a bit that is already set should
not have side effects. In fact auto-clearing suspend is problematic too.


> > 
> > > +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
> > > +the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
> > > +
> > > +\begin{itemize}
> > > +\item Stop consuming buffers of any virtqueues and mark all finished descritors as used.
> > > +\item Wait until all descriptors that being processed to finish and mark them as used.
> > > +\item Flush all used buffer and send used buffer notifications to the driver.
> > flush how?
> This is device-type-specific, and we will include tracking inflight
> descriptors(buffers) in V2.
> > 
> > > +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}
> > 
> > record where?
> This is transport specific, for PCI, patch 5 introduces two new fields for
> avail and used state

they clearly can't store state for all vqs, these are just two 16 bit fields.

> > 
> > > +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
> > pause in what sense? completely?  this does not seem realistic.
> > e.g. pci express link has to stay active or device will die.
> only pause virtio, I will rephrase the sentence as "pause its virtio
> operation".

that is vague too. for example what happens to link state of
a networking device?

> Others like PCI link in the example is out of the spec and we don't need
> to migrate them.
> > 
> > 
> > also, presumably here it is except a bunch of other fields.
> > e.g. what about queue select and all related queue fields?
> For now they are forbidden.
> 
> As SiWei suggested, we will introduce a new feature bit to control whether
> allowing resetting a VQ after SUSPEND. We can use more feature bits if
> there are requirements to perform anything after SUSPEND. But for now
> they are forbidden.

I don't know what this means, but whatever. You need to make
all this explicit though.

> > 
> > > +\end{itemize}
> > > +
> > >   \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
> > >   Each virtio device offers all the features it understands.  During
> > > @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
> > >   	handling features reserved for future use.
> > > +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
> > > +   SUSPEND the device.
> > > +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
> > > +
> > >   \end{description}
> > >   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > > -- 
> > > 2.35.3


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-15 11:10       ` Michael S. Tsirkin
@ 2023-09-18  2:56         ` Zhu, Lingshan
  2023-09-18  4:42           ` Parav Pandit
  2023-09-18  6:50           ` Zhu, Lingshan
  0 siblings, 2 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  2:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/15/2023 7:10 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 15, 2023 at 10:57:33AM +0800, Zhu, Lingshan wrote:
>>
>> On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
>>>> This patch introduces a new status bit in the device status: SUSPEND.
>>>>
>>>> This SUSPEND bit can be used by the driver to suspend a device,
>>>> in order to stabilize the device states and virtqueue states.
>>>>
>>>> Its main use case is live migration.
>>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>>    content.tex | 31 +++++++++++++++++++++++++++++++
>>>>    1 file changed, 31 insertions(+)
>>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 0e492cd..0fab537 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>>>      drive the device.
>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
>>>> +  device has been suspended by the driver.
>>>> +
>>>>    \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>>>      an error from which it can't recover.
>>>>    \end{description}
>>>> @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    recover by issuing a reset.
>>>>    \end{note}
>>>> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
>>>> +
>>>> +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
>>>> +
>>>>    \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>>>    The device MUST NOT consume buffers or send any used buffer
>>>> @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>>>>    MUST send a device configuration change notification to the driver.
>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
>>>> +
>>>> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
>>> why? let's just forbid driver from setting it.
>> OK
>>>> +
>>>> +The device SHOULD allow settings to \field{device status} even when SUSPEND is set.
>>>> +
>>>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
>>>> +and resumes operation upon DRIVER_OK.
>>>> +
>>> sorry what?
>> In case of a failed or cancelled Live Migration, the device needs to resume
>> operation.
>> However the spec forbids the driver to clear a device status bit, so
>> re-writing
>> DRIVER_OK is expected to clear SUSPEND and the device resume operation.
> No, DRIVER_OK is already set. Setting a bit that is already set should
> not have side effects. In fact auto-clearing suspend is problematic too.
The spec says: Set the DRIVER_OK status bit. At this point the device is 
“live”.

So semantically, DRIVER_OK can bring the device back to "live" even from SUSPEND.

In the implementation, the device can check whether SUSPEND is set, then
decide what to do. Just don't ignore DRIVER_OK if it is already
set, and the driver should not clear a device status bit.

>
>
>>>> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
>>>> +the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
>>>> +
>>>> +\begin{itemize}
>>>> +\item Stop consuming buffers of any virtqueues and mark all finished descritors as used.
>>>> +\item Wait until all descriptors that being processed to finish and mark them as used.
>>>> +\item Flush all used buffer and send used buffer notifications to the driver.
>>> flush how?
>> This is device-type-specific, and we will include tracking inflight
>> descriptors(buffers) in V2.
>>>> +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}
>>> record where?
>> This is transport specific, for PCI, patch 5 introduces two new fields for
>> avail and used state
> they clearly can't store state for all vqs, these are just two 16 bit fields.
The vq state fields can work with queue_select, like other vq fields.
I will document this in the comment.
>
>>>> +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
>>> pause in what sense? completely?  this does not seem realistic.
>>> e.g. pci express link has to stay active or device will die.
>> only pause virtio, I will rephrase the sentence as "pause its virtio
>> operation".
> that is vague too. for example what happens to link state of
> a networking device?
Then how about we say: pause operation in both data-path and control-path?

Or do you have any suggestion?
>
>> Others like PCI link in the example is out of the spec and we don't need
>> to migrate them.
>>>
>>> also, presumably here it is except a bunch of other fields.
>>> e.g. what about queue select and all related queue fields?
>> For now they are forbidden.
>>
>> As SiWei suggested, we will introduce a new feature bit to control whether
>> allowing resetting a VQ after SUSPEND. We can use more feature bits if
>> there are requirements to perform anything after SUSPEND. But for now
>> they are forbidden.
> I don't know how this means, but whatever. you need to make
> all this explicit though.
A new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature
bit has been negotiated, then the device allows resetting a vq after SUSPEND.
>
>>>> +\end{itemize}
>>>> +
>>>>    \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
>>>>    Each virtio device offers all the features it understands.  During
>>>> @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>>>    	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>>>>    	handling features reserved for future use.
>>>> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
>>>> +   SUSPEND the device.
>>>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
>>>> +
>>>>    \end{description}
>>>>    \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>>>> -- 
>>>> 2.35.3


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  2:56         ` Zhu, Lingshan
@ 2023-09-18  4:42           ` Parav Pandit
  2023-09-18  5:14             ` Zhu, Lingshan
  2023-09-18  6:50           ` Zhu, Lingshan
  1 sibling, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-18  4:42 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin
  Cc: jasowang@redhat.com, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Zhu, Lingshan
> Sent: Monday, September 18, 2023 8:27 AM


> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit has been
> negotiated then the device allow reset a vq after SUSPEND.

These are simply the wrong semantics: operating on an individual object after its parent object has been suspended.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  4:42           ` Parav Pandit
@ 2023-09-18  5:14             ` Zhu, Lingshan
  2023-09-18  6:17               ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  5:14 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: jasowang@redhat.com, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/18/2023 12:42 PM, Parav Pandit wrote:
>> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
>> open.org> On Behalf Of Zhu, Lingshan
>> Sent: Monday, September 18, 2023 8:27 AM
>
>> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit has been
>> negotiated then the device allow reset a vq after SUSPEND.
> This is simply a wrong semantics to build to operate individual object after its parent object is suspended.
A device can choose to respond to a set of signals and ignore others, right?

And this is not your admin-vq-based LM solution, therefore there are NO
PARENT objects.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  5:14             ` Zhu, Lingshan
@ 2023-09-18  6:17               ` Parav Pandit
  2023-09-18  6:38                 ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-18  6:17 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin
  Cc: jasowang@redhat.com, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 18, 2023 10:45 AM
> 
> On 9/18/2023 12:42 PM, Parav Pandit wrote:
> >> From: virtio-comment@lists.oasis-open.org
> >> <virtio-comment@lists.oasis- open.org> On Behalf Of Zhu, Lingshan
> >> Sent: Monday, September 18, 2023 8:27 AM
> >
> >> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit
> >> has been negotiated then the device allow reset a vq after SUSPEND.
> > This is simply a wrong semantics to build to operate individual object after its
> parent object is suspended.
> A device can choose to respond to a set of signals and ignore others, right?
> 
> And, This is not your admin vq based LM solution, therefore there is NO PARENT
> objects.
There is a parent object.
The VQ on which you propose to do SUSPEND_RESET belongs to the parent virtio device, which is already SUSPENDED.

Admin commands and the admin vq exist in the spec to do admin work.
The admin vq series is split from its users because it is hard to do everything in one go.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  6:17               ` Parav Pandit
@ 2023-09-18  6:38                 ` Zhu, Lingshan
  2023-09-18  6:46                   ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  6:38 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: jasowang@redhat.com, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/18/2023 2:17 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 10:45 AM
>>
>> On 9/18/2023 12:42 PM, Parav Pandit wrote:
>>>> From: virtio-comment@lists.oasis-open.org
>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Zhu, Lingshan
>>>> Sent: Monday, September 18, 2023 8:27 AM
>>>> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit
>>>> has been negotiated then the device allow reset a vq after SUSPEND.
>>> This is simply a wrong semantics to build to operate individual object after its
>> parent object is suspended.
>> A device can choose to respond to a set of signals and ignore others, right?
>>
>> And, This is not your admin vq based LM solution, therefore there is NO PARENT
>> objects.
> There is parent object.
> There is VQ which you propose to do SUSPEND_RESET of the parent virtio device which is already SUSPENDED.
That is why we planned to implement a new feature bit to control this behavior.

However, in the next version, as MST suggested, I will forbid resetting vqs
after SUSPEND.
>
> Admin commands and vq exists in the spec because to admin work.
> The admin vq series is split from its users because it is hard to do everything in one go.
I failed to process this comment; how is it related to the question?


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  6:38                 ` Zhu, Lingshan
@ 2023-09-18  6:46                   ` Parav Pandit
  2023-09-18  6:49                     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-18  6:46 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin
  Cc: jasowang@redhat.com, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 18, 2023 12:09 PM

> > There is parent object.
> > There is VQ which you propose to do SUSPEND_RESET of the parent virtio
> device which is already SUSPENDED.
> that is why we plan to implement a new feature bit to control this behavior.
> 
Adding a feature bit does not matter when the semantics themselves are wrong.

> However, in next version, as MST suggested, I will forbid resetting vqs after
> SUSPEND.
Ok. That is good.
I got confused by the bit proposed above.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  6:46                   ` Parav Pandit
@ 2023-09-18  6:49                     ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  6:49 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: jasowang@redhat.com, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/18/2023 2:46 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 12:09 PM
>>> There is parent object.
>>> There is VQ which you propose to do SUSPEND_RESET of the parent virtio
>> device which is already SUSPENDED.
>> that is why we plan to implement a new feature bit to control this behavior.
>>
> It does not matter adding a feature bit when the semantics itself is wrong.
>
>> However, in next version, as MST suggested, I will forbid resetting vqs after
>> SUSPEND.
> Ok. That is good.
> I got confused with above proposed bit.
OK, let's at least close this one.
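
The rule agreed on here, that a virtqueue reset is not honored once the device
is suspended, could be sketched on the device side roughly as follows. This is
a minimal illustration and not text from the series: handle_vq_reset(),
struct vq_model and the return convention are hypothetical; only the SUSPEND
value (16) comes from patch 2/5.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_STATUS_SUSPEND 16u /* value proposed in patch 2/5 */

/* Hypothetical per-virtqueue model, just enough state for this sketch. */
struct vq_model {
    bool enabled;
    uint16_t last_avail_idx;
    uint16_t last_used_idx;
};

/*
 * Hypothetical device-side handler for a driver's virtqueue reset request.
 * Returns true if the reset was performed, false if it was ignored because
 * SUSPEND is set in the device status.
 */
static bool handle_vq_reset(uint8_t device_status, struct vq_model *vq)
{
    if (device_status & VIRTIO_STATUS_SUSPEND)
        return false; /* suspended: leave the virtqueue untouched */

    vq->enabled = false;
    vq->last_avail_idx = 0;
    vq->last_used_idx = 0;
    return true;
}
```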


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  2:56         ` Zhu, Lingshan
  2023-09-18  4:42           ` Parav Pandit
@ 2023-09-18  6:50           ` Zhu, Lingshan
  1 sibling, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  6:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/18/2023 10:56 AM, Zhu, Lingshan wrote:
>
>
> On 9/15/2023 7:10 PM, Michael S. Tsirkin wrote:
>> On Fri, Sep 15, 2023 at 10:57:33AM +0800, Zhu, Lingshan wrote:
>>>
>>> On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
>>>> On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
>>>>> This patch introduces a new status bit in the device status: SUSPEND.
>>>>>
>>>>> This SUSPEND bit can be used by the driver to suspend a device,
>>>>> in order to stabilize the device states and virtqueue states.
>>>>>
>>>>> Its main use case is live migration.
>>>>>
>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> ---
>>>>>    content.tex | 31 +++++++++++++++++++++++++++++++
>>>>>    1 file changed, 31 insertions(+)
>>>>>
>>>>> diff --git a/content.tex b/content.tex
>>>>> index 0e492cd..0fab537 100644
>>>>> --- a/content.tex
>>>>> +++ b/content.tex
>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} 
>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>    \item[DRIVER_OK (4)] Indicates that the driver is set up and 
>>>>> ready to
>>>>>      drive the device.
>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, 
>>>>> indicates that the
>>>>> +  device has been suspended by the driver.
>>>>> +
>>>>>    \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has 
>>>>> experienced
>>>>>      an error from which it can't recover.
>>>>>    \end{description}
>>>>> @@ -73,6 +76,10 @@ \section{\field{Device Status} 
>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>    recover by issuing a reset.
>>>>>    \end{note}
>>>>> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
>>>>> +
>>>>> +When setting SUSPEND, the driver MUST re-read \field{device 
>>>>> status} to ensure the SUSPEND bit is set.
>>>>> +
>>>>>    \devicenormative{\subsection}{Device Status Field}{Basic 
>>>>> Facilities of a Virtio Device / Device Status Field}
>>>>>    The device MUST NOT consume buffers or send any used buffer
>>>>> @@ -82,6 +89,26 @@ \section{\field{Device Status} 
>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>    that a reset is needed.  If DRIVER_OK is set, after it sets 
>>>>> DEVICE_NEEDS_RESET, the device
>>>>>    MUST send a device configuration change notification to the 
>>>>> driver.
>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
>>>>> +
>>>>> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not 
>>>>> negotiated.
>>>> why? let's just forbid driver from setting it.
>>> OK
>>>>> +
>>>>> +The device SHOULD allow settings to \field{device status} even 
>>>>> when SUSPEND is set.
>>>>> +
>>>>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device 
>>>>> SHOULD clear SUSPEND
>>>>> +and resumes operation upon DRIVER_OK.
>>>>> +
>>>> sorry what?
>>> In case of a failed or cancelled Live Migration, the device needs to 
>>> resume
>>> operation.
>>> However the spec forbids the driver to clear a device status bit, so
>>> re-writing
>>> DRIVER_OK is expected to clear SUSPEND and the device resume operation.
>> No, DRIVER_OK is already set. Setting a bit that is already set should
>> not have side effects. In fact auto-clearing suspend is problematic too.
> The spec says: Set the DRIVER_OK status bit. At this point the device 
> is “live”.
>
> So semantically DRIVER_OK can bring the device to live even from SUSPEND.
>
> In the implementation, the device can check whether SUSPEND is set, then
> decide what to do. Just don't ignore DRIVER_OK if it is already
> set, and the driver should not clear a device status bit.
>
>>
>>
>>>>> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
>>>>> +the device SHOULD perform the following actions before presenting 
>>>>> SUSPEND bit in the \field{device status}:
>>>>> +
>>>>> +\begin{itemize}
>>>>> +\item Stop consuming buffers of any virtqueues and mark all 
>>>>> finished descritors as used.
>>>>> +\item Wait until all descriptors that being processed to finish 
>>>>> and mark them as used.
>>>>> +\item Flush all used buffer and send used buffer notifications to 
>>>>> the driver.
>>>> flush how?
>>> This is device-type-specific, and we will include tracking inflight
>>> descriptors(buffers) in V2.
>>>>> +\item Record Virtqueue State of each enabled virtqueue, see 
>>>>> section \ref{sec:Virtqueues / Virtqueue State}
>>>> record where?
>>> This is transport specific, for PCI, patch 5 introduces two new 
>>> fields for
>>> avail and used state
>> they clearly can't store state for all vqs, these are just two 16 bit 
>> fields.
> vq states filed can work with queue_select like other vq fields.
> I will document this in the comment.
>>
>>>>> +\item Pause its operation except \field{device status} and 
>>>>> preserve configurations in its Device Configuration Space, see 
>>>>> \ref{sec:Basic Facilities of a Virtio Device / Device 
>>>>> Configuration Space}
>>>> pause in what sense? completely? this does not seem realistic.
>>>> e.g. pci express link has to stay active or device will die.
>>> only pause virtio, I will rephrase the sentence as "pause its virtio
>>> operation".
>> that is vague too. for example what happens to link state of
>> a networking device?
> Then how about we say: pause operation in both data-path and 
> control-path?
>
> Or do you have any suggestion?
>>
>>> Others like PCI link in the example is out of the spec and we don't 
>>> need
>>> to migrate them.
>>>>
>>>> also, presumably here it is except a bunch of other fields.
>>>> e.g. what about queue select and all related queue fields?
>>> For now they are forbidden.
>>>
>>> As SiWei suggested, we will introduce a new feature bit to control 
>>> whether
>>> allowing resetting a VQ after SUSPEND. We can use more feature bits if
>>> there are requirements to perform anything after SUSPEND. But for now
>>> they are forbidden.
>> I don't know how this means, but whatever. you need to make
>> all this explicit though.
> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature
> bit has been negotiated then the device allow reset a vq after SUSPEND.
Hi Michael,

Rethinking this: as you suggested before, in V2 I will forbid resetting
VQs after SUSPEND.

Thanks
>>
>>>>> +\end{itemize}
>>>>> +
>>>>>    \section{Feature Bits}\label{sec:Basic Facilities of a Virtio 
>>>>> Device / Feature Bits}
>>>>>    Each virtio device offers all the features it understands.  During
>>>>> @@ -937,6 +964,10 @@ \chapter{Reserved Feature 
>>>>> Bits}\label{sec:Reserved Feature Bits}
>>>>>        \ref{devicenormative:Basic Facilities of a Virtio Device / 
>>>>> Feature Bits} for
>>>>>        handling features reserved for future use.
>>>>> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the 
>>>>> driver can
>>>>> +   SUSPEND the device.
>>>>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device 
>>>>> Status Field}.
>>>>> +
>>>>>    \end{description}
>>>>>    \drivernormative{\section}{Reserved Feature Bits}{Reserved 
>>>>> Feature Bits}
>>>>> -- 
>>>>> 2.35.3
>


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [virtio-comment] [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
  2023-09-06  8:16 ` [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility Zhu Lingshan
  2023-09-06  8:16 ` [virtio-comment] [PATCH 2/5] virtio: introduce SUSPEND bit in device status Zhu Lingshan
@ 2023-09-06  8:16 ` Zhu Lingshan
  2023-09-14 11:30   ` [virtio-comment] " Michael S. Tsirkin
  2023-09-06  8:16 ` [virtio-comment] [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND Zhu Lingshan
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 148+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This commit specifies the constraints on the virtqueue state
and the actions that should be taken by the device when SUSPEND
and DRIVER_OK are set.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 content.tex | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/content.tex b/content.tex
index 0fab537..9d727ce 100644
--- a/content.tex
+++ b/content.tex
@@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
 When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
 is always 0
 
+\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
+
+If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED has not been negotiated,
+the driver SHOULD NOT access \field{Used State} of any virtqueue; it SHOULD use the
+used index in the used ring.
+
+\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
+
+If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
+Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
+or both of DRIVER_OK and SUSPEND are set in \field{device status}.
+Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
+
+If VIRTIO_F_QUEUE_STATE has been negotiated, when SUSPEND is set,
+the device MUST record the Virtqueue State of every enabled virtqueue
+in \field{Available State} and \field{Used State} respectively,
+and correspondingly restore the Virtqueue State of every enabled virtqueue
+from \field{Available State} and \field{Used State} when DRIVER_OK is set.
+
 \input{admin.tex}
 
 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
-- 
2.35.3
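
The device-side condition for accepting Virtqueue State writes in the patch
above can be sketched as a small predicate. This is a minimal illustration
under stated assumptions: vq_state_write_allowed() and the macro names are
hypothetical; the values 4 (DRIVER_OK) and 16 (SUSPEND) come from the spec and
from patch 2/5 respectively.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_STATUS_DRIVER_OK 4u  /* value from the spec */
#define VIRTIO_STATUS_SUSPEND   16u /* value proposed in patch 2/5 */

/*
 * Hypothetical check a device could run before accepting a write to
 * Virtqueue State: writes are accepted when DRIVER_OK is not set, or when
 * both DRIVER_OK and SUSPEND are set; otherwise they are ignored.
 */
static bool vq_state_write_allowed(uint8_t device_status)
{
    bool driver_ok = (device_status & VIRTIO_STATUS_DRIVER_OK) != 0;
    bool suspended = (device_status & VIRTIO_STATUS_SUSPEND) != 0;

    return !driver_ok || (driver_ok && suspended);
}
```

In other words, the only rejected case is a live device: DRIVER_OK set and
SUSPEND clear.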


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-06  8:16 ` [virtio-comment] [PATCH 3/5] virtqueue: constraints for virtqueue state Zhu Lingshan
@ 2023-09-14 11:30   ` Michael S. Tsirkin
  2023-09-15  2:59     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:30 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
> This commit specifies the constraints of the virtqueue state,
> and the actions should be taken by the device when SUSPEND
> and DRIVER_OK is set
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  content.tex | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0fab537..9d727ce 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>  When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>  is always 0
>  
> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> +
> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
> +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
> +used index in the used ring.
> +
> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> +
> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
> +
> +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
> +the device MUST record the Virtqueue State of every enabled virtqueue
> +in \field{Available State} and \field{Used State} respectively,

record how?

> +and correspondingly restore the Virtqueue State of every enabled virtqueue
> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.

when is that?


> +
>  \input{admin.tex}
>  
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> -- 
> 2.35.3


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-14 11:30   ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-15  2:59     ` Zhu, Lingshan
  2023-09-15 11:16       ` Michael S. Tsirkin
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  2:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
>> This commit specifies the constraints of the virtqueue state,
>> and the actions should be taken by the device when SUSPEND
>> and DRIVER_OK is set
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   content.tex | 19 +++++++++++++++++++
>>   1 file changed, 19 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0fab537..9d727ce 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>>   When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>>   is always 0
>>   
>> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>> +
>> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
>> +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
>> +used index in the used ring.
>> +
>> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>> +
>> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
>> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
>> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
>> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
>> +
>> +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
>> +the device MUST record the Virtqueue State of every enabled virtqueue
>> +in \field{Available State} and \field{Used State} respectively,
> record how?
This is transport specific; for PCI they are recorded in the common
config space, where patch 5 introduces two new fields for them.
>
>> +and correspondingly restore the Virtqueue State of every enabled virtqueue
>> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
> when is that?
When the driver sets DRIVER_OK; the restore is done before the device presents
DRIVER_OK.
>
>
>> +
>>   \input{admin.tex}
>>   
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>> -- 
>> 2.35.3


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-15  2:59     ` Zhu, Lingshan
@ 2023-09-15 11:16       ` Michael S. Tsirkin
  2023-09-18  3:02         ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-15 11:16 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
> > > This commit specifies the constraints of the virtqueue state,
> > > and the actions should be taken by the device when SUSPEND
> > > and DRIVER_OK is set
> > > 
> > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   content.tex | 19 +++++++++++++++++++
> > >   1 file changed, 19 insertions(+)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index 0fab537..9d727ce 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
> > >   When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
> > >   is always 0
> > > +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> > > +
> > > +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
> > > +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
> > > +used index in the used ring.
> > > +
> > > +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> > > +
> > > +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
> > > +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
> > > +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
> > > +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
> > > +
> > > +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
> > > +the device MUST record the Virtqueue State of every enabled virtqueue
> > > +in \field{Available State} and \field{Used State} respectively,
> > record how?
> This is transport specific, for PCI they are recorded in the common config
> space,
> two new fields of them are introduced in patch 5.


that is not enough space to record state for every enabled vq.

> > 
> > > +and correspondingly restore the Virtqueue State of every enabled virtqueue
> > > +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
> > when is that?
> When the DRIVER sets DRIVER_OK and done before the device presents
> DRIVER_OK.

I don't really understand the flow here. does SUSPEND clear DRIVER_OK
then?


> > 
> > 
> > > +
> > >   \input{admin.tex}
> > >   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> > > -- 
> > > 2.35.3



* Re: [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-15 11:16       ` Michael S. Tsirkin
@ 2023-09-18  3:02         ` Zhu, Lingshan
  2023-09-18 17:30           ` Michael S. Tsirkin
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  3:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/15/2023 7:16 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
>>
>> On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
>>>> This commit specifies the constraints of the virtqueue state,
>>>> and the actions should be taken by the device when SUSPEND
>>>> and DRIVER_OK is set
>>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>>    content.tex | 19 +++++++++++++++++++
>>>>    1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 0fab537..9d727ce 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>>>>    When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>>>>    is always 0
>>>> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
>>>> +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
>>>> +used index in the used ring.
>>>> +
>>>> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
>>>> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
>>>> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
>>>> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
>>>> +the device MUST record the Virtqueue State of every enabled virtqueue
>>>> +in \field{Available State} and \field{Used State} respectively,
>>> record how?
>> This is transport specific, for PCI they are recorded in the common config
>> space,
>> two new fields of them are introduced in patch 5.
>
> that is not enough space to record state for every enabled vq.
They can work with queue_select like many other vq configurations.
I will mention this in the comment.
>
>>>> +and correspondingly restore the Virtqueue State of every enabled virtqueue
>>>> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
>>> when is that?
>> When the DRIVER sets DRIVER_OK and done before the device presents
>> DRIVER_OK.
> I don't really understand the flow here. does SUSPEND clear DRIVER_OK
> then?
SUSPEND does not clear DRIVER_OK, I think this is not a must.
>
>
>>>
>>>> +
>>>>    \input{admin.tex}
>>>>    \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>>> -- 
>>>> 2.35.3
>



* Re: [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-18  3:02         ` Zhu, Lingshan
@ 2023-09-18 17:30           ` Michael S. Tsirkin
  2023-09-19  7:56             ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-18 17:30 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Mon, Sep 18, 2023 at 11:02:18AM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/15/2023 7:16 PM, Michael S. Tsirkin wrote:
> > On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
> > > > > This commit specifies the constraints of the virtqueue state,
> > > > > and the actions should be taken by the device when SUSPEND
> > > > > and DRIVER_OK is set
> > > > > 
> > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >    content.tex | 19 +++++++++++++++++++
> > > > >    1 file changed, 19 insertions(+)
> > > > > 
> > > > > diff --git a/content.tex b/content.tex
> > > > > index 0fab537..9d727ce 100644
> > > > > --- a/content.tex
> > > > > +++ b/content.tex
> > > > > @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
> > > > >    When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
> > > > >    is always 0
> > > > > +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> > > > > +
> > > > > +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
> > > > > +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
> > > > > +used index in the used ring.
> > > > > +
> > > > > +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> > > > > +
> > > > > +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
> > > > > +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
> > > > > +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
> > > > > +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
> > > > > +
> > > > > +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
> > > > > +the device MUST record the Virtqueue State of every enabled virtqueue
> > > > > +in \field{Available State} and \field{Used State} respectively,
> > > > record how?
> > > This is transport specific, for PCI they are recorded in the common config
> > > space,
> > > two new fields of them are introduced in patch 5.
> > 
> > that is not enough space to record state for every enabled vq.
> They can work with queue_select like many other vq configurations.

queue select is under driver control.


> I will mention this in the comment.
> > 
> > > > > +and correspondingly restore the Virtqueue State of every enabled virtqueue
> > > > > +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
> > > > when is that?
> > > When the DRIVER sets DRIVER_OK and done before the device presents
> > > DRIVER_OK.
> > I don't really understand the flow here. does SUSPEND clear DRIVER_OK
> > then?
> SUSPEND does not clear DRIVER_OK, I think this is not a must.

then I don't get what "when DRIVER_OK is set" means - it stays
set all the time.


> > 
> > 
> > > > 
> > > > > +
> > > > >    \input{admin.tex}
> > > > >    \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> > > > > -- 
> > > > > 2.35.3
> > 



* Re: [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-18 17:30           ` Michael S. Tsirkin
@ 2023-09-19  7:56             ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-19  7:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/19/2023 1:30 AM, Michael S. Tsirkin wrote:
> On Mon, Sep 18, 2023 at 11:02:18AM +0800, Zhu, Lingshan wrote:
>>
>> On 9/15/2023 7:16 PM, Michael S. Tsirkin wrote:
>>> On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
>>>> On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
>>>>>> This commit specifies the constraints of the virtqueue state,
>>>>>> and the actions should be taken by the device when SUSPEND
>>>>>> and DRIVER_OK is set
>>>>>>
>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>> ---
>>>>>>     content.tex | 19 +++++++++++++++++++
>>>>>>     1 file changed, 19 insertions(+)
>>>>>>
>>>>>> diff --git a/content.tex b/content.tex
>>>>>> index 0fab537..9d727ce 100644
>>>>>> --- a/content.tex
>>>>>> +++ b/content.tex
>>>>>> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>>>>>>     When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>>>>>>     is always 0
>>>>>> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>>>> +
>>>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
>>>>>> +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
>>>>>> +used index in the used ring.
>>>>>> +
>>>>>> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>>>> +
>>>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
>>>>>> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
>>>>>> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
>>>>>> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
>>>>>> +
>>>>>> +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
>>>>>> +the device MUST record the Virtqueue State of every enabled virtqueue
>>>>>> +in \field{Available State} and \field{Used State} respectively,
>>>>> record how?
>>>> This is transport specific, for PCI they are recorded in the common config
>>>> space,
>>>> two new fields of them are introduced in patch 5.
>>> that is not enough space to record state for every enabled vq.
>> They can work with queue_select like many other vq configurations.
> queue select is under driver control.
queue_select works for other fields like queue_size, which is also RW.

I see no difference between queue_size and vq_state in this respect.
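
A minimal sketch of the access pattern described above, assuming a toy
device model in which queue_select picks one bank of per-virtqueue
registers. The names dev_model, vq_regs, read_vq_state, write_vq_state
and NUM_QUEUES are hypothetical and only illustrate the idea; only
queue_select, queue_size, queue_avail_state and queue_used_state come
from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 4 /* hypothetical queue count for this toy model */

/* Per-virtqueue registers exposed through the queue_select window. */
struct vq_regs {
    uint16_t queue_size;
    uint16_t queue_avail_state;
    uint16_t queue_used_state;
};

/* Toy device model: one selector, one register bank per virtqueue. */
struct dev_model {
    uint16_t queue_select;
    struct vq_regs bank[NUM_QUEUES];
};

/* Driver side: select a virtqueue, then read its two state fields. */
int read_vq_state(struct dev_model *d, uint16_t qidx,
                  uint16_t *avail, uint16_t *used)
{
    assert(d != NULL && avail != NULL && used != NULL);
    if (qidx >= NUM_QUEUES)
        return -1; /* no such virtqueue in this model */
    d->queue_select = qidx;
    *avail = d->bank[d->queue_select].queue_avail_state;
    *used = d->bank[d->queue_select].queue_used_state;
    return 0;
}

/* Driver side: select a virtqueue, then write its two state fields. */
int write_vq_state(struct dev_model *d, uint16_t qidx,
                   uint16_t avail, uint16_t used)
{
    assert(d != NULL);
    if (qidx >= NUM_QUEUES)
        return -1; /* no such virtqueue in this model */
    d->queue_select = qidx;
    d->bank[d->queue_select].queue_avail_state = avail;
    d->bank[d->queue_select].queue_used_state = used;
    return 0;
}
```

In this model the two state fields are visible for one virtqueue at a
time, and which one is visible depends on the value the driver last
wrote to queue_select.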
>
>
>> I will mention this in the comment.
>>>>>> +and correspondingly restore the Virtqueue State of every enabled virtqueue
>>>>>> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
>>>>> when is that?
>>>> When the DRIVER sets DRIVER_OK and done before the device presents
>>>> DRIVER_OK.
>>> I don't really understand the flow here. does SUSPEND clear DRIVER_OK
>>> then?
>> SUSPEND does not clear DRIVER_OK, I think this is not a must.
> then I don't get what "when DRIVER_OK is set" means - it stays
> set all the time.
That means the driver sets DRIVER_OK.

I am not a native speaker, but this wording can be found throughout the
spec, e.g.:

2.1.2 Device Requirements: Device Status Field

If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device MUST
send a device configuration change notification to the driver.
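
For reference, the write-acceptance condition in the quoted patch text
(accept writes to Virtqueue State only when DRIVER_OK is not set, or
when both DRIVER_OK and SUSPEND are set) can be sketched as a predicate
over the device status byte. This is only an illustration:
DRIVER_OK = 4 is the value from the spec, while the SUSPEND bit value
and the function name vq_state_write_allowed are assumptions made for
this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_STATUS_DRIVER_OK 0x04 /* value from the virtio spec */
#define VIRTIO_STATUS_SUSPEND   0x10 /* assumed value, not stated in this thread */

static_assert((VIRTIO_STATUS_DRIVER_OK & VIRTIO_STATUS_SUSPEND) == 0,
              "status bits must be distinct");

/* True when the device would accept a write to Virtqueue State. */
bool vq_state_write_allowed(uint8_t status)
{
    bool driver_ok = (status & VIRTIO_STATUS_DRIVER_OK) != 0;
    bool suspended = (status & VIRTIO_STATUS_SUSPEND) != 0;

    /* "DRIVER_OK not set" or "DRIVER_OK and SUSPEND both set". */
    return !driver_ok || suspended;
}
```

Under this reading, a running device (DRIVER_OK set, SUSPEND clear) is
the only case in which writes are ignored.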
>
>
>>>
>>>>>> +
>>>>>>     \input{admin.tex}
>>>>>>     \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>>>>> -- 
>>>>>> 2.35.3



* [virtio-comment] [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND
  2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
                   ` (2 preceding siblings ...)
  2023-09-06  8:16 ` [virtio-comment] [PATCH 3/5] virtqueue: constraints for virtqueue state Zhu Lingshan
@ 2023-09-06  8:16 ` Zhu Lingshan
  2023-09-14 11:09   ` [virtio-comment] " Michael S. Tsirkin
  2023-09-06  8:16 ` [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE Zhu Lingshan
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 148+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

When SUSPEND is set, the device should stabilize the device
states and virtqueue states, therefore the device should
ignore resetting vqs when SUSPEND is set in device status.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 content.tex | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/content.tex b/content.tex
index 9d727ce..cd2b426 100644
--- a/content.tex
+++ b/content.tex
@@ -443,6 +443,9 @@ \subsubsection{Virtqueue Reset}\label{sec:Basic Facilities of a Virtio Device /
 The device MUST reset any state of a virtqueue to the default state,
 including the available state and the used state.
 
+If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in \field{device status},
+the device SHOULD ignore resetting any virtqueues.
+
 \drivernormative{\paragraph}{Virtqueue Reset}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
 
 After the driver tells the device to reset a queue, the driver MUST verify that
-- 
2.35.3

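A minimal sketch of the device behaviour this patch proposes, assuming
a toy single-queue device model. The struct and function names, and the
SUSPEND bit value, are hypothetical; the only rule taken from the patch
is that a virtqueue reset is ignored while VIRTIO_F_SUSPEND is
negotiated and SUSPEND is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_STATUS_SUSPEND 0x10 /* assumed value, not stated in this patch */

/* Hypothetical single-queue device model. */
struct vq_dev {
    bool f_suspend_negotiated; /* VIRTIO_F_SUSPEND negotiated? */
    uint8_t status;            /* device status byte */
    uint16_t avail_state;      /* available state of the virtqueue */
    uint16_t used_state;       /* used state of the virtqueue */
};

/* Returns true if the reset was performed, false if it was ignored. */
bool handle_queue_reset(struct vq_dev *dev)
{
    assert(dev != NULL);
    if (dev->f_suspend_negotiated &&
        (dev->status & VIRTIO_STATUS_SUSPEND) != 0)
        return false; /* proposed rule: leave vq state untouched */

    /* Normal path: virtqueue state goes back to its default. */
    dev->avail_state = 0;
    dev->used_state = 0;
    return true;
}
```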


* [virtio-comment] Re: [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND
  2023-09-06  8:16 ` [virtio-comment] [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND Zhu Lingshan
@ 2023-09-14 11:09   ` Michael S. Tsirkin
  2023-09-15  4:06     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:09 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:36PM +0800, Zhu Lingshan wrote:
> When SUSPEND is set, the device should stabilize the device
> states and virtqueue states, therefore the device should
> ignore resetting vqs when SUSPEND is set in device status.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>

And do what? If you really feel it's important we can prohibit
driver from touching state. But generally this seems
un-orthogonal.


> ---
>  content.tex | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 9d727ce..cd2b426 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -443,6 +443,9 @@ \subsubsection{Virtqueue Reset}\label{sec:Basic Facilities of a Virtio Device /
>  The device MUST reset any state of a virtqueue to the default state,
>  including the available state and the used state.
>  
> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in \field{device status},
> +the device SHOULD ignore resetting any virtqueues.
> +
>  \drivernormative{\paragraph}{Virtqueue Reset}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
>  
>  After the driver tells the device to reset a queue, the driver MUST verify that
> -- 
> 2.35.3



* [virtio-comment] Re: [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND
  2023-09-14 11:09   ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-15  4:06     ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  4:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:09 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:36PM +0800, Zhu Lingshan wrote:
>> When SUSPEND is set, the device should stabilize the device
>> states and virtqueue states, therefore the device should
>> ignore resetting vqs when SUSPEND is set in device status.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> And do what? If you really feel it's important we can prohibit
> driver from touching state. But generally this seems
> un-orthogonal.
As discussed in other threads, we will introduce a new feature bit to control
this.
>
>
>> ---
>>   content.tex | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 9d727ce..cd2b426 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -443,6 +443,9 @@ \subsubsection{Virtqueue Reset}\label{sec:Basic Facilities of a Virtio Device /
>>   The device MUST reset any state of a virtqueue to the default state,
>>   including the available state and the used state.
>>   
>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in \field{device status},
>> +the device SHOULD ignore resetting any virtqueues.
>> +
>>   \drivernormative{\paragraph}{Virtqueue Reset}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
>>   
>>   After the driver tells the device to reset a queue, the driver MUST verify that
>> -- 
>> 2.35.3



* [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
                   ` (3 preceding siblings ...)
  2023-09-06  8:16 ` [virtio-comment] [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND Zhu Lingshan
@ 2023-09-06  8:16 ` Zhu Lingshan
  2023-09-06  8:32   ` Michael S. Tsirkin
  2023-09-14 11:27   ` Michael S. Tsirkin
  2023-09-06  8:29 ` [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Michael S. Tsirkin
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 148+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This patch adds two new le16 fields to common configuration structure
to support VIRTIO_F_QUEUE_STATE in PCI transport layer.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 transport-pci.tex | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..3161519 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
         /* About the administration virtqueue. */
         le16 admin_queue_index;         /* read-only for driver */
         le16 admin_queue_num;         /* read-only for driver */
+
+	/* Virtqueue state */
+        le16 queue_avail_state;         /* read-write */
+        le16 queue_used_state;          /* read-write */
 };
 \end{lstlisting}
 
@@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
 	The value 0 indicates no supported administration virtqueues.
 	This field is valid only if VIRTIO_F_ADMIN_VQ has been
 	negotiated.
+
+\item[\field{queue_avail_state}]
+        This field is valid only if VIRTIO_F_QUEUE_STATE has been
+        negotiated. The driver sets and gets the available state of
+        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
+
+\item[\field{queue_used_state}]
+        This field is valid only if VIRTIO_F_QUEUE_STATE has been
+        negotiated. The driver sets and gets the used state of the
+        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
+
 \end{description}
 
 \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
@@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
 present either a value of 0 or a power of 2 in
 \field{queue_size}.
 
+If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
+any accesses to \field{queue_avail_state} and \field{queue_used_state}.
+
 If VIRTIO_F_ADMIN_VQ has been negotiated, the value
 \field{admin_queue_index} MUST be equal to, or bigger than
 \field{num_queues}; also, \field{admin_queue_num} MUST be
-- 
2.35.3

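A minimal sketch of the device-side rule added by this patch, assuming
a toy model of the two new fields. The struct and function names are
hypothetical, and returning 0 for an ignored read is an assumption of
this sketch: the patch only says that accesses are ignored when
VIRTIO_F_QUEUE_STATE has not been negotiated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two new common configuration fields. */
struct queue_state_cfg {
    bool f_queue_state;         /* VIRTIO_F_QUEUE_STATE negotiated? */
    uint16_t queue_avail_state;
    uint16_t queue_used_state;
};

/* Device side: a write is dropped unless the feature was negotiated. */
void cfg_write_avail_state(struct queue_state_cfg *c, uint16_t val)
{
    assert(c != NULL);
    if (!c->f_queue_state)
        return; /* ignore the access */
    c->queue_avail_state = val;
}

/* Device side: an ignored read returns 0 in this sketch (assumption). */
uint16_t cfg_read_avail_state(const struct queue_state_cfg *c)
{
    assert(c != NULL);
    return c->f_queue_state ? c->queue_avail_state : 0;
}
```

The used-state field would follow the same pattern.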


* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:16 ` [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE Zhu Lingshan
@ 2023-09-06  8:32   ` Michael S. Tsirkin
  2023-09-06  8:37     ` Parav Pandit
                       ` (2 more replies)
  2023-09-14 11:27   ` Michael S. Tsirkin
  1 sibling, 3 replies; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06  8:32 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> This patch adds two new le16 fields to common configuration structure
> to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>


I do not see why this would be PCI specific at all.

Besides, I thought work on live migration would use the
admin queue. This was explicitly one of the motivators.

Poking at the device from the driver to migrate it
is not going to work if the driver lives within the guest.

> ---
>  transport-pci.tex | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..3161519 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>          /* About the administration virtqueue. */
>          le16 admin_queue_index;         /* read-only for driver */
>          le16 admin_queue_num;         /* read-only for driver */
> +
> +	/* Virtqueue state */
> +        le16 queue_avail_state;         /* read-write */
> +        le16 queue_used_state;          /* read-write */
>  };
>  \end{lstlisting}
>  
> @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  	The value 0 indicates no supported administration virtqueues.
>  	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>  	negotiated.
> +
> +\item[\field{queue_avail_state}]
> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> +        negotiated. The driver sets and gets the available state of
> +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> +
> +\item[\field{queue_used_state}]
> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> +        negotiated. The driver sets and gets the used state of the
> +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> +
>  \end{description}
>  
>  \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
> @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  present either a value of 0 or a power of 2 in
>  \field{queue_size}.
>  
> +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
> +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
> +
>  If VIRTIO_F_ADMIN_VQ has been negotiated, the value
>  \field{admin_queue_index} MUST be equal to, or bigger than
>  \field{num_queues}; also, \field{admin_queue_num} MUST be
> -- 
> 2.35.3
> 
> 
> 




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:32   ` Michael S. Tsirkin
@ 2023-09-06  8:37     ` Parav Pandit
  2023-09-06  9:37     ` Zhu, Lingshan
  2023-09-11  3:01     ` Jason Wang
  2 siblings, 0 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-06  8:37 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhu Lingshan
  Cc: jasowang@redhat.com, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, September 6, 2023 2:03 PM
> To: Zhu Lingshan <lingshan.zhu@intel.com>
> Cc: jasowang@redhat.com; eperezma@redhat.com; cohuck@redhat.com;
> stefanha@redhat.com; virtio-comment@lists.oasis-open.org; virtio-
> dev@lists.oasis-open.org
> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
> VIRTIO_F_QUEUE_STATE
> 
> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > This patch adds two new le16 fields to common configuration structure
> > to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> >
> > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> 
> 
> I do not see why this would be pci specific at all.
> 
> But besides I thought work on live migration will use admin queue. This was
> explicitly one of the motivators.
> 
> Poking at the device from the driver to migrate it is not going to work if the
> driver lives within guest.

Exactly.
I was not paying attention to this thread, as we have an AQ-based proposal for passthrough devices.
It leverages many ideas from what Si-Wei presented at KVM Forum 2022.
I will post the first draft in a few days.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:32   ` Michael S. Tsirkin
  2023-09-06  8:37     ` Parav Pandit
@ 2023-09-06  9:37     ` Zhu, Lingshan
  2023-09-11  3:01     ` Jason Wang
  2 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-06  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 4:32 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
>> This patch adds two new le16 fields to common configuration structure
>> to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>
> I do not see why this would be pci specific at all.
It is just that the implementation is transport specific.
>
> But besides I thought work on live migration will use
> admin queue. This was explicitly one of the motivators.
I assume this straightforward solution can work.
>
> Poking at the device from the driver to migrate it
> is not going to work if the driver lives within guest.
The hypervisor can still set SUSPEND and do other things, such as
collecting dirty pages.

The process should be: freeze the guest first, then suspend the device.
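
The ordering described above (freeze the guest, set SUSPEND, then read the virtqueue state) can be sketched in C against a simulated device. This is only an illustration under stated assumptions: the numeric value of the SUSPEND bit, the `sim_device` structure and the helper names are hypothetical and not taken from the patch, and the sketch assumes the two new fields refer to the queue chosen by `queue_select`, like the other `queue_*` fields of the common configuration structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; this thread does not show the bit's actual number. */
#define SIM_STATUS_SUSPEND 0x10u
#define SIM_NUM_QUEUES 2

struct sim_vq_state {
    uint16_t avail_state;   /* mirrors queue_avail_state */
    uint16_t used_state;    /* mirrors queue_used_state */
};

/* Hypothetical stand-in for a device seen through its common config. */
struct sim_device {
    uint8_t device_status;
    uint16_t queue_select;
    struct sim_vq_state vq[SIM_NUM_QUEUES];
};

/* Hypervisor side: refuse to proceed unless the guest is already frozen,
 * then suspend the device and snapshot every virtqueue's state. */
static int sim_suspend_and_save(struct sim_device *dev, int guest_frozen,
                                struct sim_vq_state *out, size_t n)
{
    size_t i;

    if (!guest_frozen || n > SIM_NUM_QUEUES)
        return -1;

    dev->device_status |= SIM_STATUS_SUSPEND;
    if (!(dev->device_status & SIM_STATUS_SUSPEND))
        return -1;  /* a real device might need polling here */

    for (i = 0; i < n; i++) {
        dev->queue_select = (uint16_t)i;        /* pick the queue */
        out[i] = dev->vq[dev->queue_select];    /* read both state fields */
    }
    return 0;
}

/* Destination side: write the saved state back before resuming. */
static int sim_restore(struct sim_device *dev,
                       const struct sim_vq_state *in, size_t n)
{
    size_t i;

    if (n > SIM_NUM_QUEUES)
        return -1;

    for (i = 0; i < n; i++) {
        dev->queue_select = (uint16_t)i;
        dev->vq[dev->queue_select] = in[i];
    }
    return 0;
}
```

The `guest_frozen` argument only encodes the ordering constraint; how a hypervisor actually freezes the guest is outside the scope of this sketch.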
>
>
>
>
>> ---
>>   transport-pci.tex | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/transport-pci.tex b/transport-pci.tex
>> index a5c6719..3161519 100644
>> --- a/transport-pci.tex
>> +++ b/transport-pci.tex
>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>           /* About the administration virtqueue. */
>>           le16 admin_queue_index;         /* read-only for driver */
>>           le16 admin_queue_num;         /* read-only for driver */
>> +
>> +	/* Virtqueue state */
>> +        le16 queue_avail_state;         /* read-write */
>> +        le16 queue_used_state;          /* read-write */
>>   };
>>   \end{lstlisting}
>>   
>> @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   	The value 0 indicates no supported administration virtqueues.
>>   	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>>   	negotiated.
>> +
>> +\item[\field{queue_avail_state}]
>> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
>> +        negotiated. The driver sets and gets the available state of
>> +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
>> +
>> +\item[\field{queue_used_state}]
>> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
>> +        negotiated. The driver sets and gets the used state of the
>> +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
>> +
>>   \end{description}
>>   
>>   \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
>> @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   present either a value of 0 or a power of 2 in
>>   \field{queue_size}.
>>   
>> +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
>> +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
>> +
>>   If VIRTIO_F_ADMIN_VQ has been negotiated, the value
>>   \field{admin_queue_index} MUST be equal to, or bigger than
>>   \field{num_queues}; also, \field{admin_queue_num} MUST be
>> -- 
>> 2.35.3
>>
>>
>>
>
>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:32   ` Michael S. Tsirkin
  2023-09-06  8:37     ` Parav Pandit
  2023-09-06  9:37     ` Zhu, Lingshan
@ 2023-09-11  3:01     ` Jason Wang
  2023-09-11  4:11       ` Parav Pandit
  2 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-11  3:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhu Lingshan, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev

On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > This patch adds two new le16 fields to common configuration structure
> > to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> >
> > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>
>
> I do not see why this would be pci specific at all.

This is the PCI interface for live migration. The facility is not
specific to PCI.

It can choose to reuse the common configuration or not, but the
semantic is general enough to be used by other transports. We can
introduce one for MMIO for sure.

>
> But besides I thought work on live migration will use
> admin queue. This was explicitly one of the motivators.

I think not. Using the admin virtqueue ends up with several problems:

1) the feature is not self-contained, so in the end we need a
transport-specific facility to migrate the admin virtqueue
2) it won't work in a nested environment, or we need complicated SR-IOV
emulation to make it work

>
> Poking at the device from the driver to migrate it
> is not going to work if the driver lives within guest.

This is by design, to allow live migration to work in the nested layer.
And it's the approach we've used for CPU and MMU. Is there anything
that makes virtio different here?

Thanks


>
>
>
>
> > ---
> >  transport-pci.tex | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/transport-pci.tex b/transport-pci.tex
> > index a5c6719..3161519 100644
> > --- a/transport-pci.tex
> > +++ b/transport-pci.tex
> > @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >          /* About the administration virtqueue. */
> >          le16 admin_queue_index;         /* read-only for driver */
> >          le16 admin_queue_num;         /* read-only for driver */
> > +
> > +     /* Virtqueue state */
> > +        le16 queue_avail_state;         /* read-write */
> > +        le16 queue_used_state;          /* read-write */
> >  };
> >  \end{lstlisting}
> >
> > @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >       The value 0 indicates no supported administration virtqueues.
> >       This field is valid only if VIRTIO_F_ADMIN_VQ has been
> >       negotiated.
> > +
> > +\item[\field{queue_avail_state}]
> > +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> > +        negotiated. The driver sets and gets the available state of
> > +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> > +
> > +\item[\field{queue_used_state}]
> > +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> > +        negotiated. The driver sets and gets the used state of the
> > +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> > +
> >  \end{description}
> >
> >  \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
> > @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >  present either a value of 0 or a power of 2 in
> >  \field{queue_size}.
> >
> > +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
> > +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
> > +
> >  If VIRTIO_F_ADMIN_VQ has been negotiated, the value
> >  \field{admin_queue_index} MUST be equal to, or bigger than
> >  \field{num_queues}; also, \field{admin_queue_num} MUST be
> > --
> > 2.35.3
> >
> >
> >
>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  3:01     ` Jason Wang
@ 2023-09-11  4:11       ` Parav Pandit
  2023-09-11  6:30         ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-11  4:11 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Zhu Lingshan, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

Hi Michael,

> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Jason Wang
> Sent: Monday, September 11, 2023 8:31 AM
> 
> On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > This patch adds two new le16 fields to common configuration
> > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > >
> > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> >
> >
> > I do not see why this would be pci specific at all.
> 
> This is the PCI interface for live migration. The facility is not specific to PCI.
> 
> It can choose to reuse the common configuration or not, but the semantic is
> general enough to be used by other transports. We can introduce one for
> MMIO for sure.
> 
> >
> > But besides I thought work on live migration will use admin queue.
> > This was explicitly one of the motivators.
>
Please find the proposal that uses administration commands for device migration at [1] for passthrough devices.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

> I think not. Using admin virtqueue will end up with several problems:
> 
> 1) the feature is not self contained so at the end we need transport specific
> facility to migrate the admin virtqueue

You have mixed these up.
The admin queue of the owner device is not migrated.
The admin queue of the member device is migrated like any other queue, using [1] above.

> 2) won't work in the nested environment, or we need complicated SR-IOV
> emulation in order to work
> 
> >
> > Poking at the device from the driver to migrate it is not going to
> > work if the driver lives within guest.
> 
> This is by design to allow live migration to work in the nested layer.
> And it's the way we've used for CPU and MMU. Anything may virtio different
> here?

Nested and non-nested use cases likely cannot be addressed by a single solution/interface.
So the two are orthogonal requirements to me.

One can define some administration commands to issue on the AQ of the member device itself for the nested case.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  4:11       ` Parav Pandit
@ 2023-09-11  6:30         ` Jason Wang
  2023-09-11  6:47           ` Parav Pandit
                             ` (2 more replies)
  0 siblings, 3 replies; 148+ messages in thread
From: Jason Wang @ 2023-09-11  6:30 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
>
> Hi Michael,
>
> > From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> > open.org> On Behalf Of Jason Wang
> > Sent: Monday, September 11, 2023 8:31 AM
> >
> > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > This patch adds two new le16 fields to common configuration
> > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > >
> > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > >
> > >
> > > I do not see why this would be pci specific at all.
> >
> > This is the PCI interface for live migration. The facility is not specific to PCI.
> >
> > It can choose to reuse the common configuration or not, but the semantic is
> > general enough to be used by other transports. We can introduce one for
> > MMIO for sure.
> >
> > >
> > > But besides I thought work on live migration will use admin queue.
> > > This was explicitly one of the motivators.
> >
> Please find the proposal that uses administration commands for device migration at [1] for passthrough devices.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

This proposal couples live migration with several requirements, and
suffers from the exact issues I mention below.

In some cases, it's even worse (coupling with PCI/SR-IOV, a second state
machine besides the device status).

>
> > > I think not. Using admin virtqueue will end up with several problems:
> >
> > 1) the feature is not self contained so at the end we need transport specific
> > facility to migrate the admin virtqueue
>
> You mixed up.
> Admin queue of the owner device is not migrated.

Why not? Ling Shan's proposal makes everything work, including
migrating the owner, or the case where there is no owner at all.

In this proposal, the facility (suspending, queue state, inflight
descriptors) is decoupled from the transport-specific API. Each
transport can implement one or more types of interfaces. An MMIO-based
interface is proposed, but it doesn't prevent you from adding admin
commands for those facilities on top.
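
The decoupling argued for above can be illustrated with a minimal C sketch: a transport-neutral accessor table for virtqueue state, with one register-style backend and one command-style backend behind it. All names here (`vq_state_ops`, `reg_transport`, `cmd_transport`, `copy_vq_state`) are hypothetical and exist only for illustration; neither backend models the actual PCI layout or any real administration command format.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_VQS 4

/* The facility itself: per-virtqueue state, independent of transport. */
struct vq_state {
    uint16_t last_avail_idx;
    uint16_t last_used_idx;
};

/* Hypothetical transport-neutral accessors. */
struct vq_state_ops {
    struct vq_state (*get)(void *transport, uint16_t vq);
    void (*set)(void *transport, uint16_t vq, struct vq_state s);
};

/* Backend 1: register-style, selecting a queue and then reading fields. */
struct reg_transport {
    uint16_t queue_select;
    uint16_t queue_avail_state[SKETCH_MAX_VQS];
    uint16_t queue_used_state[SKETCH_MAX_VQS];
};

static struct vq_state reg_get(void *transport, uint16_t vq)
{
    struct reg_transport *r = (struct reg_transport *)transport;
    struct vq_state s;

    r->queue_select = vq;
    s.last_avail_idx = r->queue_avail_state[r->queue_select];
    s.last_used_idx = r->queue_used_state[r->queue_select];
    return s;
}

static void reg_set(void *transport, uint16_t vq, struct vq_state s)
{
    struct reg_transport *r = (struct reg_transport *)transport;

    r->queue_select = vq;
    r->queue_avail_state[r->queue_select] = s.last_avail_idx;
    r->queue_used_state[r->queue_select] = s.last_used_idx;
}

/* Backend 2: command-style, counting each request as one command. */
struct cmd_transport {
    struct vq_state table[SKETCH_MAX_VQS];
    unsigned int commands_issued;
};

static struct vq_state cmd_get(void *transport, uint16_t vq)
{
    struct cmd_transport *c = (struct cmd_transport *)transport;

    c->commands_issued++;
    return c->table[vq];
}

static void cmd_set(void *transport, uint16_t vq, struct vq_state s)
{
    struct cmd_transport *c = (struct cmd_transport *)transport;

    c->commands_issued++;
    c->table[vq] = s;
}

static const struct vq_state_ops reg_ops = { reg_get, reg_set };
static const struct vq_state_ops cmd_ops = { cmd_get, cmd_set };

/* Migration logic sees only the facility, never the transport. */
static void copy_vq_state(const struct vq_state_ops *src_ops, void *src,
                          const struct vq_state_ops *dst_ops, void *dst,
                          uint16_t nvqs)
{
    uint16_t vq;

    for (vq = 0; vq < nvqs; vq++)
        dst_ops->set(dst, vq, src_ops->get(src, vq));
}
```

In this sketch `copy_vq_state` can move state from a register-style source to a command-style destination without knowing which is which, which is the sense in which the facility and the interface are separable.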

> Admin queue of the member device is migrated like any other queue using above [1].
>
> > 2) won't work in the nested environment, or we need complicated SR-IOV
> > emulation in order to work
> >
> > >
> > > Poking at the device from the driver to migrate it is not going to
> > > work if the driver lives within guest.
> >
> > This is by design to allow live migration to work in the nested layer.
> > And it's the way we've used for CPU and MMU. Anything may virtio different
> > here?
>
> Nested and non-nested use cases likely cannot be addressed by single solution/interface.

I think Ling Shan's proposal addressed them both.

> So both are orthogonal requirements to me.
>
> One can defined some administration commands to issue on the AQ of the member device itself for nested case.

This is not easy: DMA needs to be isolated, which means you need to
either emulate SR-IOV and use the AQ on a virtual PF in the guest, or
use PASID.

Customers don't want to have admin stuff, SR-IOV or PASID in the guest
in order to migrate a single virtio device in a nested setup.

Thanks




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:30         ` Jason Wang
@ 2023-09-11  6:47           ` Parav Pandit
  2023-09-11  6:58             ` Zhu, Lingshan
  2023-09-12  4:18             ` Jason Wang
  2023-09-11  6:59           ` Parav Pandit
  2023-09-11 10:15           ` Michael S. Tsirkin
  2 siblings, 2 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-11  6:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, September 11, 2023 12:01 PM
> 
> On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > Hi Michael,
> >
> > > From: virtio-comment@lists.oasis-open.org
> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
> > > Sent: Monday, September 11, 2023 8:31 AM
> > >
> > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > This patch adds two new le16 fields to common configuration
> > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > >
> > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >
> > > >
> > > > I do not see why this would be pci specific at all.
> > >
> > > This is the PCI interface for live migration. The facility is not specific to PCI.
> > >
> > > It can choose to reuse the common configuration or not, but the
> > > semantic is general enough to be used by other transports. We can
> > > introduce one for MMIO for sure.
> > >
> > > >
> > > > But besides I thought work on live migration will use admin queue.
> > > > This was explicitly one of the motivators.
> > >
> > Please find the proposal that uses administration commands for device
> > migration at [1] for passthrough devices.
> >
> > [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> 
> This proposal couples live migration with several requirements, and suffers from
> the exact issues I've mentioned below.
>
It does not.
Can you please list which one?
 
> In some cases, it's even worse (coupling with PCI/SR-IOV, second state machine
> other than the device status).
> 
There is no state machine in [1].
It is not coupled with PCI/SR-IOV either.
It supports the PCI/SR-IOV transport now and, in the future, other transports as they evolve.

> >
> >  > I think not. Using admin virtqueue will end up with several problems:
> > >
> > > 1) the feature is not self contained so at the end we need transport
> > > specific facility to migrate the admin virtqueue
> >
> > You mixed up.
> > Admin queue of the owner device is not migrated.
>
If you read further, it is for migrating the member device, not the owner.
Hence, the owner device's admin queue is not migrated.
 
> Why not? Ling Shan's proposal makes everything work including migrating the
> owner or in the case there's even no owner.
> 
I don’t see in his proposal how all the supported features and functionality are achieved.

> In this proposal, the facility (suspending, queue state, inflight
> descriptors) is decoupled from the transport specific API. Each transport can
> implement one or more types of interfaces. A MMIO based interface is
> proposed but It doesn't prevent you from adding admin commands for those
> facilities on top.
>
Even in proposal [1], most things are transport agnostic.
The member device proposal already covers several aspects: downtime, peer-to-peer, dirty page tracking, efficient querying of VQ state, and more.


> > Admin queue of the member device is migrated like any other queue using
> > above [1].
> >
> > > 2) won't work in the nested environment, or we need complicated
> > > SR-IOV emulation in order to work
> > >
> > > >
> > > > Poking at the device from the driver to migrate it is not going to
> > > > work if the driver lives within guest.
> > >
> > > This is by design to allow live migration to work in the nested layer.
> > > And it's the way we've used for CPU and MMU. Anything may virtio
> > > different here?
> >
> > Nested and non-nested use cases likely cannot be addressed by single
> > solution/interface.
> 
> I think Ling Shan's proposal addressed them both.
>
I don’t see how all the above points are covered.

 
> > So both are orthogonal requirements to me.
> >
> > One can defined some administration commands to issue on the AQ of the
> > member device itself for nested case.
> 
> This is not easy, DMA needs to be isolated so this means you need to either
> emulate SR-IOV and use AQ on virtual PF in the guest or using PASID.
>
This is why nested and non-nested cannot be treated equally, and I don’t see all of this covered in Ling Shan's proposal either.
For the passthrough device use case, [1] covers the necessary pieces.

> Customers don't want to have admin stuff, SR-IOV or PASID in the guest in order
> to migrate a single virtio device in the nest.

As proposed in [1] for passthrough devices, no customer needs to do SR-IOV or PASID in the guest for the non-nested case.

Nested is a special case and likely needs a mediation-based scheme using administration commands.

In the best case we can produce common commands, if that fits.
Otherwise the two proposals are orthogonal and address different use cases.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:47           ` Parav Pandit
@ 2023-09-11  6:58             ` Zhu, Lingshan
  2023-09-11  7:07               ` Parav Pandit
  2023-09-12  4:18             ` Jason Wang
  1 sibling, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  6:58 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/11/2023 2:47 PM, Parav Pandit wrote:
>
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Monday, September 11, 2023 12:01 PM
>>
>> On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
>>> Hi Michael,
>>>
>>>> From: virtio-comment@lists.oasis-open.org
>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
>>>> Sent: Monday, September 11, 2023 8:31 AM
>>>>
>>>> On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
>>>>>> This patch adds two new le16 fields to common configuration
>>>>>> structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
>>>>>>
>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>
>>>>> I do not see why this would be pci specific at all.
>>>> This is the PCI interface for live migration. The facility is not specific to PCI.
>>>>
>>>> It can choose to reuse the common configuration or not, but the
>>>> semantic is general enough to be used by other transports. We can
>>>> introduce one for MMIO for sure.
>>>>
>>>>> But besides I thought work on live migration will use admin queue.
>>>>> This was explicitly one of the motivators.
>>> Please find the proposal that uses administration commands for device
>> migration at [1] for passthrough devices.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
>> This proposal couples live migration with several requirements, and suffers from
>> the exact issues I've mentioned below.
>>
> It does not.
> Can you please list which one?
>   
>> In some cases, it's even worse (coupling with PCI/SR-IOV, second state machine
>> other than the device status).
>>
> There is no state machine in [1].
> It is not coupled with PCI/SR-IOV either.
> It supports PCI/SR-IOV transport and in future other transports too when they evolve.
>
>>>> I think not. Using admin virtqueue will end up with several problems:
>>>> 1) the feature is not self contained so at the end we need transport
>>>> specific facility to migrate the admin virtqueue
>>> You mixed up.
>>> Admin queue of the owner device is not migrated.
> If you actually read more, it is for the member device migration and not the owner.
> Hence, owner device admin queue is not migrated.
Then how to serve bare-metal migration? Migrate by itself?
>   
>> Why not? Ling Shan's proposal makes everything work including migrating the
>> owner or in the case there's even no owner.
>>
> I don’t see in his proposal how all the features and functionality supported is achieved.
I will include an in-flight descriptor tracker and dirty-page tracking in V2;
is anything else missing?
It can migrate the device itself. Why don't you think so? Can you name
some issues we can work on for improvements?
>
>> In this proposal, the facility (suspending, queue state, inflight
>> descriptors) is decoupled from the transport specific API. Each transport can
>> implement one or more types of interfaces. A MMIO based interface is
>> proposed but It doesn't prevent you from adding admin commands for those
>> facilities on top.
>>
> Even in proposal [1] most things are transport agnostic.
> Member device proposal covers several aspects already of downtime, peer to peer, dirty page tracking, efficient querying VQ state and more.
If you want to implement LM by admin vq, the facilities in my series can 
be re-used. E.g., forward your suspend to SUSPEND bit.
>
>
>>> Admin queue of the member device is migrated like any other queue using
>> above [1].
>>>> 2) won't work in the nested environment, or we need complicated
>>>> SR-IOV emulation in order to work
>>>>
>>>>> Poking at the device from the driver to migrate it is not going to
>>>>> work if the driver lives within guest.
>>>> This is by design to allow live migration to work in the nested layer.
>>>> And it's the way we've used for CPU and MMU. Anything may virtio
>>>> different here?
>>> Nested and non-nested use cases likely cannot be addressed by single
>> solution/interface.
>>
>> I think Ling Shan's proposal addressed them both.
>>
> I don’t see how all above points are covered.
Why?


And how do you migrate nested VMs by admin vq?

How many admin vqs, and how much bandwidth, are
reserved for migrating all VMs?

Remember CSP migrates all VMs on a host for powersaving or upgrade.
>
>   
>>> So both are orthogonal requirements to me.
>>>
>>> One can defined some administration commands to issue on the AQ of the
>> member device itself for nested case.
>>
>> This is not easy, DMA needs to be isolated so this means you need to either
>> emulate SR-IOV and use AQ on virtual PF in the guest or using PASID.
>>
> This is why nested and non-nested cannot be treated equally and I don’t see this all covered in Ling proposal either.
> For passthrough device use case [1] has covered the necessary pieces.
>
>> Customers don't want to have admin stuff, SR-IOV or PASID in the guest in order
>> to migrate a single virtio device in the nest.
> As proposed in [1] for pass through devices no customer needs to do SR-IOV or PASID in the guest for non-nest.
>
> Nested is some special case and likely need mediated based scheme using administration commands.
>
> In best case we can produce common commands, if that fits.
> Else both proposals are orthogonal addressing different use cases.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:58             ` Zhu, Lingshan
@ 2023-09-11  7:07               ` Parav Pandit
  2023-09-11  7:18                 ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-11  7:07 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 12:28 PM

> > I don’t see in his proposal how all the features and functionality supported is
> achieved.
> I will include in-flight descriptor tracker and dirty-page tracking in V2, anything
> else missed?
> It can migrate the device itself, why don't you think so, can you name some
> issues we can work on for improvements?

I would like to see a proposal similar to [1] that can work without mediation, if you want to combine the two use cases under one.
Otherwise, I don’t see a need to merge the two.

Dirty page tracking, peer to peer, downtime, no mediation and FLRs are all covered in [1] for passthrough cases.

> If you want to implement LM by admin vq, the facilities in my series can be re-
> used. E.g., forward your suspend to SUSPEND bit.
Just VQ suspend is not enough...

> >
> >
> >>> Admin queue of the member device is migrated like any other queue
> >>> using
> >> above [1].
> >>>> 2) won't work in the nested environment, or we need complicated
> >>>> SR-IOV emulation in order to work
> >>>>
> >>>>> Poking at the device from the driver to migrate it is not going to
> >>>>> work if the driver lives within guest.
> >>>> This is by design to allow live migration to work in the nested layer.
> >>>> And it's the way we've used for CPU and MMU. Anything may virtio
> >>>> different here?
> >>> Nested and non-nested use cases likely cannot be addressed by single
> >> solution/interface.
> >>
> >> I think Ling Shan's proposal addressed them both.
> >>
> > I don’t see how all above points are covered.
> Why?
> 
> 
> And how do you migrate nested VMs by admin vq?
>
Hypervisor = level 1.
VM = level 2.
Nested VM = level 3.
The level 2 VM takes care of migrating the level 3 composed device, using its software composition or maybe some kind of mediation like the one you proposed.

> How many admin vqs and the bandwidth are reserved for migrate all VMs?
>
It does not matter, because the number of AQs is configurable and the device and driver can decide how many to use.
I am not sure which BW you are talking about.
There are many BWs in place that one can regulate: at the network level, PCI level, VM level, etc.

> Remember CSP migrates all VMs on a host for powersaving or upgrade.
I am not sure why the migration reason has any influence on the design.

The CSPs that we had discussed care more about performance and hence prefer passthrough instead of mediation, and don’t seem to be doing any nesting.
The CPU doesn’t have support for 3 levels of page table nesting either.
I agree that there could be other users who care about nested functionality.

Anyway, nesting and non-nesting are two different requirements.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  7:07               ` Parav Pandit
@ 2023-09-11  7:18                 ` Zhu, Lingshan
  2023-09-11  7:30                   ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  7:18 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/11/2023 3:07 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 12:28 PM
>>> I don’t see in his proposal how all the features and functionality supported is
>> achieved.
>> I will include in-flight descriptor tracker and dirty-page tracking in V2, anything
>> else missed?
>> It can migrate the device itself, why don't you think so, can you name some
>> issues we can work on for improvements?
> I would like to see a proposal similar to [1] that can work without mediation in case if you want to combine two use cases under one.
> Else, I don’t see a need to merge two things.
>
> Dirty page tracking, peer to peer, downtime, no-mediation, flrs all are covered in [1] for passthrough cases.
We are introducing basic facilities, feel free to re-use them in the 
admin vq solution.
>
>> If you want to implement LM by admin vq, the facilities in my series can be re-
>> used. E.g., forward your suspend to SUSPEND bit.
> Just VQ suspend is not enough...
This series contains device SUSPEND and the queue state accessor.
MST required in-flight descriptor tracking, which will be included in the
next version.
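
A minimal sketch of how these two facilities could fit together, written as a toy Python model rather than spec text. Only SUSPEND, last_avail_idx and last_used_idx come from the cover letter; the bit value, the class and function names, and the rule that queue state is only accessed while the device is suspended are assumptions made for illustration.

```python
# Toy model only: the bit value and all names below are illustrative
# assumptions, not taken from the patch series or the virtio spec.
from dataclasses import dataclass, field

SUSPEND = 0x10  # hypothetical bit position for the proposed SUSPEND status bit


@dataclass
class VqState:
    last_avail_idx: int = 0
    last_used_idx: int = 0


@dataclass
class ToyDevice:
    status: int = 0
    vqs: list = field(default_factory=lambda: [VqState(), VqState()])

    def suspend(self):
        # After this point the model treats vq state as stable.
        self.status |= SUSPEND

    def resume(self):
        self.status &= ~SUSPEND

    def get_vq_state(self, index):
        # Assumption: state is only read once the device is suspended.
        if not self.status & SUSPEND:
            raise RuntimeError("device must be suspended before reading vq state")
        vq = self.vqs[index]
        return VqState(vq.last_avail_idx, vq.last_used_idx)

    def set_vq_state(self, index, state):
        # Assumption: state is only written while the device is suspended.
        if not self.status & SUSPEND:
            raise RuntimeError("device must be suspended before writing vq state")
        self.vqs[index] = VqState(state.last_avail_idx, state.last_used_idx)


def migrate(src, dst):
    """Suspend both ends, copy every vq's indices, then resume the destination."""
    src.suspend()
    dst.suspend()
    for i in range(len(src.vqs)):
        dst.set_vq_state(i, src.get_vq_state(i))
    dst.resume()
```

In this model an admin-vq based flow could call the same suspend() and get_vq_state() helpers underneath, which is the kind of re-use suggested above. In-flight descriptors and dirty pages are deliberately not modelled.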
>
>>>
>>>>> Admin queue of the member device is migrated like any other queue
>>>>> using
>>>> above [1].
>>>>>> 2) won't work in the nested environment, or we need complicated
>>>>>> SR-IOV emulation in order to work
>>>>>>
>>>>>>> Poking at the device from the driver to migrate it is not going to
>>>>>>> work if the driver lives within guest.
>>>>>> This is by design to allow live migration to work in the nested layer.
>>>>>> And it's the way we've used for CPU and MMU. Anything may virtio
>>>>>> different here?
>>>>> Nested and non-nested use cases likely cannot be addressed by single
>>>> solution/interface.
>>>>
>>>> I think Ling Shan's proposal addressed them both.
>>>>
>>> I don’t see how all above points are covered.
>> Why?
>>
>>
>> And how do you migrate nested VMs by admin vq?
>>
> Hypervisor = level 1.
> VM = level 2.
> Nested VM = level 3.
> VM of level 2 to take care of migrating level 3 composed device using its sw composition or may be using some kind of mediation that you proposed.
So, nested VM is not aware of the admin vq or does not have access to 
admin vq, right?
>
>> How many admin vqs and the bandwidth are reserved for migrate all VMs?
>>
> It does not matter because number of AQs is configurable that device and driver can decide to use.
> I am not sure which BW are talking about.
> There are many BW in place that one can regulate, at network level, pci level, VM level etc.
It matters because of QOS and the downtime must converge.

E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the 
number in HW implementation and how
does the driver get informed?
>
>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> I am not sure why the migration reason has any influence on the design.
Because this design is for live migration.
>
> The CSPs that we had discussed, care for performance more and hence prefers passthrough instead or mediation and don’t seem to be doing any nesting.
> CPU doesnt have support for 3 level of page table nesting either.
> I agree that there could be other users who care for nested functionality.
>
> Any ways, nesting and non-nesting are two different requirements.
The LM facility should serve both, or it is far from ready. And it does
not serve bare-metal live migration either.




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  7:18                 ` Zhu, Lingshan
@ 2023-09-11  7:30                   ` Parav Pandit
  2023-09-11  7:58                     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-11  7:30 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 12:48 PM
> 
> On 9/11/2023 3:07 PM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 11, 2023 12:28 PM
> >>> I don’t see in his proposal how all the features and functionality
> >>> supported is
> >> achieved.
> >> I will include in-flight descriptor tracker and dirty-page tracking in
> >> V2, anything else missed?
> >> It can migrate the device itself, why don't you think so, can you
> >> name some issues we can work on for improvements?
> > I would like to see a proposal similar to [1] that can work without mediation
> in case if you want to combine two use cases under one.
> > Else, I don’t see a need to merge two things.
> >
> > Dirty page tracking, peer to peer, downtime, no-mediation, flrs all are covered
> in [1] for passthrough cases.
> We are introducing basic facilities, feel free to re-use them in the admin vq
> solution.
Basic facilities are added in [1] for passthrough devices.
You can leverage them in your v2 for supporting p2p devices, dirty page tracking, passthrough support, shorter downtime and more.

> >
> >> If you want to implement LM by admin vq, the facilities in my series
> >> can be re- used. E.g., forward your suspend to SUSPEND bit.
> > Just VQ suspend is not enough...
> In this series, it contains: device SUSPEND, queue state accessor.
> MST required in-flight descriptor tracking, which will be included in next
> version.
For passthrough more than that is needed.
> >
> >>>
> >>>>> Admin queue of the member device is migrated like any other queue
> >>>>> using
> >>>> above [1].
> >>>>>> 2) won't work in the nested environment, or we need complicated
> >>>>>> SR-IOV emulation in order to work
> >>>>>>
> >>>>>>> Poking at the device from the driver to migrate it is not going
> >>>>>>> to work if the driver lives within guest.
> >>>>>> This is by design to allow live migration to work in the nested layer.
> >>>>>> And it's the way we've used for CPU and MMU. Anything may virtio
> >>>>>> different here?
> >>>>> Nested and non-nested use cases likely cannot be addressed by
> >>>>> single
> >>>> solution/interface.
> >>>>
> >>>> I think Ling Shan's proposal addressed them both.
> >>>>
> >>> I don’t see how all above points are covered.
> >> Why?
> >>
> >>
> >> And how do you migrate nested VMs by admin vq?
> >>
> > Hypervisor = level 1.
> > VM = level 2.
> > Nested VM = level 3.
> > VM of level 2 to take care of migrating level 3 composed device using its sw
> composition or may be using some kind of mediation that you proposed.
> So, nested VM is not aware of the admin vq or does not have access to admin
> vq, right?
Right. It is not aware.

> >

> >> How many admin vqs and the bandwidth are reserved for migrate all VMs?
> >>
> > It does not matter because number of AQs is configurable that device and
> driver can decide to use.
> > I am not sure which BW are talking about.
> > There are many BW in place that one can regulate, at network level, pci level,
> VM level etc.
> It matters because of QOS and the downtime must converge.
QOS is such a broad term that it is hard to debate unless you get to a specific point.
> 
> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the number
> in HW implementation and how does the driver get informed?
Usually just one AQ is enough as proposal [1] is built around inherent downtime reduction.
You can ask a similar question for RSS: how does the hw device know how many RSS queues are needed? 😊
Device exposes number of supported AQs that driver is free to use.

Most sane sys admins do not migrate 1000 VMs at same time for obvious reasons.
But when such requirements arise, a device may support it.
Just like how a net device can support from 1 to 32K txqueues at spec level.

> >
> >> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> > I am not sure why the migration reason has any influence on the design.
> Because this design is for live migration.
> >
> > The CSPs that we had discussed, care for performance more and hence
> prefers passthrough instead or mediation and don’t seem to be doing any
> nesting.
> > CPU doesnt have support for 3 level of page table nesting either.
> > I agree that there could be other users who care for nested functionality.
> >
> > Any ways, nesting and non-nesting are two different requirements.
> The LM facility should server both, 
I don’t see how the PCI spec lets you do it.
The PCI community already handed this over to the SR-PCIM interface, outside of the PCI spec domain.
Hence, it’s done over the admin queue for passthrough devices.

If you can explain, how your proposal addresses passthrough support without mediation and also does DMA, I am very interested to learn that.

> And it does not serve bare-metal live migration either.
A bare-metal migration seems a distant theory, as one needs a side CPU and a memory accessor apart from the device accessor.
But if that somehow exists, there will be a similar admin device to migrate it; maybe TDDISP will own this whole piece one day.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  7:30                   ` Parav Pandit
@ 2023-09-11  7:58                     ` Zhu, Lingshan
  2023-09-11  8:12                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  7:58 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/11/2023 3:30 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 12:48 PM
>>
>> On 9/11/2023 3:07 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 11, 2023 12:28 PM
>>>>> I don’t see in his proposal how all the features and functionality
>>>>> supported is
>>>> achieved.
>>>> I will include in-flight descriptor tracker and dirty-page tracking in
>>>> V2, anything else missed?
>>>> It can migrate the device itself, why don't you think so, can you
>>>> name some issues we can work on for improvements?
>>> I would like to see a proposal similar to [1] that can work without mediation
>> in case if you want to combine two use cases under one.
>>> Else, I don’t see a need to merge two things.
>>>
>>> Dirty page tracking, peer to peer, downtime, no-mediation, flrs all are covered
>> in [1] for passthrough cases.
>> We are introducing basic facilities, feel free to re-use them in the admin vq
>> solution.
> Basic facilities are added in [1] for passthrough devices.
> You can leverage them in your v2 for supporting p2p devices, dirty page tracking, passthrough support, shorter downtime and more.
Basic facilities had better not depend on others, but the admin vq can
re-use the basic facilities.

For P2P, what if the devices are placed in different IOMMU groups?
>
>>>> If you want to implement LM by admin vq, the facilities in my series
>>>> can be re- used. E.g., forward your suspend to SUSPEND bit.
>>> Just VQ suspend is not enough...
>> In this series, it contains: device SUSPEND, queue state accessor.
>> MST required in-flight descriptor tracking, which will be included in next
>> version.
> For passthrough more than that is needed.
Dirty page tracking will be addressed too; what else should we work on?
>>>>>>> Admin queue of the member device is migrated like any other queue
>>>>>>> using
>>>>>> above [1].
>>>>>>>> 2) won't work in the nested environment, or we need complicated
>>>>>>>> SR-IOV emulation in order to work
>>>>>>>>
>>>>>>>>> Poking at the device from the driver to migrate it is not going
>>>>>>>>> to work if the driver lives within guest.
>>>>>>>> This is by design to allow live migration to work in the nested layer.
>>>>>>>> And it's the way we've used for CPU and MMU. Anything may virtio
>>>>>>>> different here?
>>>>>>> Nested and non-nested use cases likely cannot be addressed by
>>>>>>> single
>>>>>> solution/interface.
>>>>>>
>>>>>> I think Ling Shan's proposal addressed them both.
>>>>>>
>>>>> I don’t see how all above points are covered.
>>>> Why?
>>>>
>>>>
>>>> And how do you migrate nested VMs by admin vq?
>>>>
>>> Hypervisor = level 1.
>>> VM = level 2.
>>> Nested VM = level 3.
>>> VM of level 2 to take care of migrating level 3 composed device using its sw
>> composition or may be using some kind of mediation that you proposed.
>> So, nested VM is not aware of the admin vq or does not have access to admin
>> vq, right?
> Right. It is not aware.
>
>>>> How many admin vqs and the bandwidth are reserved for migrate all VMs?
>>>>
>>> It does not matter because number of AQs is configurable that device and
>> driver can decide to use.
>>> I am not sure which BW are talking about.
>>> There are many BW in place that one can regulate, at network level, pci level,
>> VM level etc.
>> It matters because of QOS and the downtime must converge.
> QOS is such a broad term that is hard to debate unless you get to a specific point.
E.g., there can be hundreds or thousands of VMs. How many admin vqs are
required to serve them during LM? It has to converge, without
timing out.
>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the number
>> in HW implementation and how does the driver get informed?
> Usually just one AQ is enough as proposal [1] is built around inherent downtime reduction.
> You can ask similar question for RSS, how does hw device how many RSS queues are needed. 😊
> Device exposes number of supported AQs that driver is free to use.
RSS is not a must for transmission, though there may be performance overhead without it.
But if the host cannot finish Live Migration in due time, then it is
a failed LM.
>
> Most sane sys admins do not migrate 1000 VMs at same time for obvious reasons.
> But when such requirements arise, a device may support it.
> Just like how a net device can support from 1 to 32K txqueues at spec level.
The orchestration layer may do that for host upgrade or power-saving.
And the VMs may be required to migrate together, for example:
a cluster of VMs in the same subnet.

Let's not introduce new frangibility.
>
>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
>>> I am not sure why the migration reason has any influence on the design.
>> Because this design is for live migration.
>>> The CSPs that we had discussed, care for performance more and hence
>> prefers passthrough instead or mediation and don’t seem to be doing any
>> nesting.
>>> CPU doesnt have support for 3 level of page table nesting either.
>>> I agree that there could be other users who care for nested functionality.
>>>
>>> Any ways, nesting and non-nesting are two different requirements.
>> The LM facility should server both,
> I don’t see how PCI spec let you do it.
> PCI community already handed over this to SR-PCIM interface outside of the PCI spec domain.
> Hence, its done over admin queue for passthrough devices.
>
> If you can explain, how your proposal addresses passthrough support without mediation and also does DMA, I am very interested to learn that.
Do you mean nested? Why can this series not support nested?
>
>> And it does not serve bare-metal live migration either.
> A bare-metal migration seems a distance theory as one need side cpu, memory accessor apart from device accessor.
> But somehow if that exists, there will be similar admin device to migrate it may be TDDISP will own this whole piece one day.
Bare metal live migration requires other components like firmware OS and
partitioning; that's why the device live migration should not
be a blocker.
>
>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  7:58                     ` Zhu, Lingshan
@ 2023-09-11  8:12                       ` Parav Pandit
  2023-09-11  8:46                         ` Zhu, Lingshan
  2023-09-12  4:10                         ` Jason Wang
  0 siblings, 2 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-11  8:12 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 1:28 PM


> > Basic facilities are added in [1] for passthrough devices.
> > You can leverage them in your v2 for supporting p2p devices, dirty page
> tracking, passthrough support, shorter downtime and more.
> Basic facilities should be better not depend on others, but admin vq can re-use
> the basic facilities.
> 
> For P2P, what if the devices are placed in different IOMMU group?
IOMMU grouping is an OS-specific notion of an older API.
The hypervisor needs to do the right setup anyway to use the access control and other semantics defined by the PCI spec, which is outside the scope of [1].
It is outside primarily because proposal [1] is not migrating the whole "PCI device".
It is migrating the virtio device, so that we can migrate from a PCI VF member to some software-based device too.
And vice versa.

> > QOS is such a broad term that is hard to debate unless you get to a specific
> point.
> E.g., there can be hundreds or thousands of VMs, how many admin vq are
> required to serve them when LM? To converge, no timeout.
How many RSS queues are required to reach 800 Gb/s NIC performance, at what queue depth, at what interrupt moderation level?
Such details are outside the scope of virtio specification.
Those are implementation details of the device.

Similarly here for AQ too.
The inherent nature of the AQ, to queue commands and execute them out of order in the device, is the fundamental reason the AQ was introduced.
And one can have more AQs to do unrelated work, mainly for the hypervisor owner device that wants to enqueue unrelated commands in parallel.

> >> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
> >> number in HW implementation and how does the driver get informed?
> > Usually just one AQ is enough as proposal [1] is built around inherent
> downtime reduction.
> > You can ask similar question for RSS, how does hw device how many RSS
> > queues are needed. 😊
> > Device exposes number of supported AQs that driver is free to use.
> RSS is not a must for the transition through maybe performance overhead.
> But if the host can not finish Live Migration in the due time, then it is a failed
> LM.
It can abort the LM and restore the device by resuming it.

> >
> > Most sane sys admins do not migrate 1000 VMs at same time for obvious
> reasons.
> > But when such requirements arise, a device may support it.
> > Just like how a net device can support from 1 to 32K txqueues at spec level.
> The orchestration layer may do that for host upgrade or power-saving.
> And the VMs may be required to migrate together, for example:
> a cluster of VMs in the same subnet.
> 
Sure. AQ of depth 1K can support 1K outstanding commands at a time for 1000 member devices.
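
As a rough illustration of that sizing argument only: the helper below is hypothetical and not part of [1]. It assumes one outstanding command per member device and one queue slot per command, and it says nothing about how long the device takes to execute each command.

```python
import math


def aq_rounds(pending_commands: int, aq_depth: int, num_aqs: int = 1) -> int:
    """Number of submission rounds needed if each AQ slot holds one command."""
    if aq_depth <= 0 or num_aqs <= 0:
        raise ValueError("depth and queue count must be positive")
    return math.ceil(pending_commands / (aq_depth * num_aqs))
```

With these assumptions, 1000 member devices fit in a single round on one AQ of depth 1024, while a depth of 256 would need four rounds.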

> Lets do not introduce new frangibility
I don’t see any frangibility added by [1].
If you see one, please let me know.

> >

> >>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> >>> I am not sure why the migration reason has any influence on the design.
> >> Because this design is for live migration.
> >>> The CSPs that we had discussed, care for performance more and hence
> >> prefers passthrough instead or mediation and don’t seem to be doing
> >> any nesting.
> >>> CPU doesnt have support for 3 level of page table nesting either.
> >>> I agree that there could be other users who care for nested functionality.
> >>>
> >>> Any ways, nesting and non-nesting are two different requirements.
> >> The LM facility should server both,
> > I don’t see how PCI spec let you do it.
> > PCI community already handed over this to SR-PCIM interface outside of the
> PCI spec domain.
> > Hence, its done over admin queue for passthrough devices.
> >
> > If you can explain, how your proposal addresses passthrough support without
> mediation and also does DMA, I am very interested to learn that.
> Do you mean nested? 
Before nesting, I would just like to see basic single-level passthrough that is functional and performant like [1].

> Why this series can not support nested?
I don’t see all the aspects that I covered in series [1], ranging from FLR, device context migration, virtio-level reset, dirty page tracking, P2P support, etc., covered by a device and VQ suspend/resume piece.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

> >> And it does not serve bare-metal live migration either.
> > A bare-metal migration seems a distance theory as one need side cpu,
> memory accessor apart from device accessor.
> > But somehow if that exists, there will be similar admin device to migrate it
> may be TDDISP will own this whole piece one day.
> Bare metal live migration require other components like firmware OS and
> partitioning, that's why the device live migration should not be a blocker.
Device migration is not a blocker.
In fact, it facilitates this future, in case it happens, where a side CPU like a DPU or a similar sideband virtio admin device can migrate over its admin vq.

Long ago, when admin commands were discussed, this was discussed too: an admin device may not be an owner device.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  8:12                       ` Parav Pandit
@ 2023-09-11  8:46                         ` Zhu, Lingshan
  2023-09-11  9:05                           ` Parav Pandit
  2023-09-12  4:10                         ` Jason Wang
  1 sibling, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  8:46 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/11/2023 4:12 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 1:28 PM
>
>>> Basic facilities are added in [1] for passthrough devices.
>>> You can leverage them in your v2 for supporting p2p devices, dirty page
>> tracking, passthrough support, shorter downtime and more.
>> Basic facilities should be better not depend on others, but admin vq can re-use
>> the basic facilities.
>>
>> For P2P, what if the devices are placed in different IOMMU group?
> IOMMU grouping is one OS specific notion of an older API.
> Hypervisor needs to do right setup anyway for using PCI spec define access control and other semantics which is outside the scope of [1].
> It is outside primarily because proposal [1] is not migrating the whole "PCI device".
> It is migrating the virtio device, so that we can migrate from PCI VF member to some software based device too.
> And vis-versa.
Since you talked about P2P, IOMMU is basically for address space 
isolation. For security reasons, it is usually
suggested to pass through all devices in one IOMMU group to a single guest.

That means, if you want the VF to perform P2P with the PF where the AQ 
resides, you have to place them in the same
IOMMU group and pass them all through to a guest. So how can this AQ serve 
other purposes?
>
>>> QOS is such a broad term that is hard to debate unless you get to a specific
>> point.
>> E.g., there can be hundreds or thousands of VMs, how many admin vq are
>> required to serve them when LM? To converge, no timeout.
> How many RSS queues are required to reach 800Gbs NIC performance at what q depth at what interrupt moderation level?
> Such details are outside the scope of virtio specification.
> Those are implementation details of the device.
>
> Similarly here for AQ too.
> The inherent nature of AQ to queue commands and execute them out of order in the device is the fundamental reason, AQ is introduced.
> And one can have more AQs to do unrelated work, mainly from the hypervisor owner device who wants to enqueue unrelated commands in parallel.
As pointed out above, insufficient RSS capabilities may cause performance 
overhead, but not a failure; the device still stays functional.
But too few AQs serving too high a volume of VMs may be a problem.
Yes, the number of AQs is negotiable, but how many exactly should the HW 
provide?

Naming a number or an algorithm for the ratio of devices / num_of_AQs is 
beyond this topic, but I made my point clear.
>
>>>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
>>>> number in HW implementation and how does the driver get informed?
>>> Usually just one AQ is enough as proposal [1] is built around inherent
>> downtime reduction.
>>> You can ask similar question for RSS, how does hw device how many RSS
>>> queues are needed. 😊
>>> Device exposes number of supported AQs that driver is free to use.
>> RSS is not a must for the transition through maybe performance overhead.
>> But if the host can not finish Live Migration in the due time, then it is a failed
>> LM.
> It can aborts the LM and restore it back by resuming the device.
Abort means failure.
>
>>> Most sane sys admins do not migrate 1000 VMs at same time for obvious
>> reasons.
>>> But when such requirements arise, a device may support it.
>>> Just like how a net device can support from 1 to 32K txqueues at spec level.
>> The orchestration layer may do that for host upgrade or power-saving.
>> And the VMs may be required to migrate together, for example:
>> a cluster of VMs in the same subnet.
>>
> Sure. AQ of depth 1K can support 1K outstanding commands at a time for 1000 member devices.
PCI transition is FIFO, so can depth = 1K introduce significant latency? 
And a depth of 1K is almost identical to 2 x 500 queue depths, so it is 
still the same problem: how many resources does the HW need to reserve 
to serve the worst case?

Let's forget the numbers, the point is clear.
>
>> Lets do not introduce new frangibility
> I don’t see any frangibility added by [1].
> If you see one, please let me know.
The resource and latency issues explained above.
>
>>>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
>>>>> I am not sure why the migration reason has any influence on the design.
>>>> Because this design is for live migration.
>>>>> The CSPs that we had discussed, care for performance more and hence
>>>> prefers passthrough instead or mediation and don’t seem to be doing
>>>> any nesting.
>>>>> CPU doesnt have support for 3 level of page table nesting either.
>>>>> I agree that there could be other users who care for nested functionality.
>>>>>
>>>>> Any ways, nesting and non-nesting are two different requirements.
>>>> The LM facility should server both,
>>> I don’t see how PCI spec let you do it.
>>> PCI community already handed over this to SR-PCIM interface outside of the
>> PCI spec domain.
>>> Hence, its done over admin queue for passthrough devices.
>>>
>>> If you can explain, how your proposal addresses passthrough support without
>> mediation and also does DMA, I am very interested to learn that.
>> Do you mean nested?
> Before nesting, just like to see basic single level passthrough to see functional and performant like [1].
I think we have discussed this: the nested guest is not aware of 
the admin vq and cannot access it,
because the admin vq is a host facility.
>
>> Why this series can not support nested?
> I don’t see all the aspects that I covered in series [1] ranging from flr, device context migration, virtio level reset, dirty page tracking, p2p support, etc. covered in some device, vq suspend resume piece.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
We have discussed many other issues in this thread.
>
>>>> And it does not serve bare-metal live migration either.
>>> A bare-metal migration seems a distance theory as one need side cpu,
>> memory accessor apart from device accessor.
>>> But somehow if that exists, there will be similar admin device to migrate it
>> may be TDDISP will own this whole piece one day.
>> Bare metal live migration require other components like firmware OS and
>> partitioning, that's why the device live migration should not be a blocker.
> Device migration is not blocker.
> In-fact it facilitates for this future in case if that happens where side cpu like DPU or similar sideband virtio admin device can migrate over its admin vq.
>
> Long ago when admin commands were discussed, this was discussed too where a admin device may not be an owner device.
The admin vq cannot migrate itself; therefore bare metal cannot be 
migrated by the admin vq.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  8:46                         ` Zhu, Lingshan
@ 2023-09-11  9:05                           ` Parav Pandit
  2023-09-11  9:32                             ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-11  9:05 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 2:17 PM

[..]
> > Hypervisor needs to do right setup anyway for using PCI spec define access
> control and other semantics which is outside the scope of [1].
> > It is outside primarily because proposal [1] is not migrating the whole "PCI
> device".
> > It is migrating the virtio device, so that we can migrate from PCI VF member
> to some software based device too.
> > And vis-versa.
> Since you talked about P2P, IOMMU is basically for address space isolation. For
> security reasons, it is usually suggest to passthrough all devices in one IOMMU
> group to a single guest.
> 
An IOMMU group is an OS concept; there is no need to mix it in here.

> That means, if you want the VF to perform P2P with the PF there the AQ
> resides, you have to place them in the same IOMMU group and passthrough
> them all to a guest. So how this AQ serve other purposes?
> >
A PF resides on the hypervisor. One or more VFs are passed through to the VM.
When one wants to do nesting, maybe one of the VFs can take the role of admin for its peer VFs.
Such an extension is only needed for nesting.

For non-nesting, which is the common case known to us, such an extension is not needed.

> >>> QOS is such a broad term that is hard to debate unless you get to a
> >>> specific
> >> point.
> >> E.g., there can be hundreds or thousands of VMs, how many admin vq
> >> are required to serve them when LM? To converge, no timeout.
> > How many RSS queues are required to reach 800Gbs NIC performance at what
> q depth at what interrupt moderation level?
> > Such details are outside the scope of virtio specification.
> > Those are implementation details of the device.
> >
> > Similarly here for AQ too.
> > The inherent nature of AQ to queue commands and execute them out of order
> in the device is the fundamental reason, AQ is introduced.
> > And one can have more AQs to do unrelated work, mainly from the hypervisor
> owner device who wants to enqueue unrelated commands in parallel.
> As pointed above insufficient RSS capabilities may cause performance
> overhead, not not a failure, the device still stay functional.
If UDP packets are dropped, even an application that does not retry can fail.

> But too few AQ to serve too high volume of VMs may be a problem.
It is left to the device implementation to meet the needed scale requirement.

> Yes the number of AQs are negotiable, but how many exactly should the HW
> provide?
Again, it is outside the scope. It is left to the device implementation like many other performance aspects.

> 
> Naming a number or an algorithm for the ratio of devices / num_of_AQs is
> beyond this topic, but I made my point clear.
Sure. It is beyond.
And it is not a concern either.

> >
> >>>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
> >>>> number in HW implementation and how does the driver get informed?
> >>> Usually just one AQ is enough as proposal [1] is built around
> >>> inherent
> >> downtime reduction.
> >>> You can ask similar question for RSS, how does hw device how many
> >>> RSS queues are needed. 😊
> >>> Device exposes number of supported AQs that driver is free to use.
> >> RSS is not a must for the transition through maybe performance overhead.
> >> But if the host can not finish Live Migration in the due time, then
> >> it is a failed LM.
> > It can aborts the LM and restore it back by resuming the device.
> aborts means fail
> >
> >>> Most sane sys admins do not migrate 1000 VMs at same time for
> >>> obvious
> >> reasons.
> >>> But when such requirements arise, a device may support it.
> >>> Just like how a net device can support from 1 to 32K txqueues at spec level.
> >> The orchestration layer may do that for host upgrade or power-saving.
> >> And the VMs may be required to migrate together, for example:
> >> a cluster of VMs in the same subnet.
> >>
> > Sure. AQ of depth 1K can support 1K outstanding commands at a time for
> 1000 member devices.
> PCI transition is FIFO, 
I do not understand what is "PCI transition".

> can depth = 1K introduce significant latency?
AQ command execution is not done serially. There is enough text in the AQ chapter, as I recall.

> And 1K depths is
> almost identical to 2 X 500 queue depths, so still the same problem, how many
> resource does the HW need to reserve to serve the worst case?
> 
You didn't describe the problem.
A virtqueue is generic infrastructure to execute commands, be it an admin command, control command, flow filter command, or SCSI command.
How many to execute in parallel and how many queues to have are device implementation specific.

> Let's forget the numbers, the point is clear.
Ok. I agree with you.
Number of AQs and its depth matter for this discussion, and its performance characterization is outside the spec.
Design-wise, the key thing is to have a queuing interface between driver and device for device migration commands.
This enables both entities to execute things in parallel.
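
As a rough illustration of why a queuing interface allows this parallelism, the sketch below models a driver posting several commands at once and a device completing them in a different order, with completions matched by an identifier. Everything in it is hypothetical: `AdminQueue`, `post`, `complete`, and the `"suspend"` command tuple are illustration-only names, not taken from [1] or from the virtio specification.

```python
class AdminQueue:
    """Hypothetical model of a command queue with a fixed depth."""

    def __init__(self, depth):
        self.depth = depth
        self.outstanding = {}  # command id -> command payload
        self.next_id = 0

    def post(self, cmd):
        """Driver side: enqueue a command, or return None if the queue is full."""
        if len(self.outstanding) >= self.depth:
            return None
        cmd_id = self.next_id
        self.next_id += 1
        self.outstanding[cmd_id] = cmd
        return cmd_id

    def complete(self, cmd_id):
        """Device side: finish any outstanding command, in any order."""
        return self.outstanding.pop(cmd_id)


aq = AdminQueue(depth=4)
# The driver posts one (hypothetical) command per member device without waiting.
ids = [aq.post(("suspend", member)) for member in range(4)]
overflow = aq.post(("suspend", 4))  # queue is full, so nothing is posted
# The device completes in a different order than the driver posted.
done = [aq.complete(i) for i in reversed(ids)]
```

The depth only bounds how many commands are outstanding at once; how many the device actually executes in parallel remains an implementation choice.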

This is fully covered in [1].
So let's improve [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

> >
> >> Lets do not introduce new frangibility
> > I don’t see any frangibility added by [1].
> > If you see one, please let me know.
> The resource and latency explained above.
> >
> >>>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> >>>>> I am not sure why the migration reason has any influence on the design.
> >>>> Because this design is for live migration.
> >>>>> The CSPs that we had discussed, care for performance more and
> >>>>> hence
> >>>> prefers passthrough instead or mediation and don’t seem to be doing
> >>>> any nesting.
> >>>>> CPU doesnt have support for 3 level of page table nesting either.
> >>>>> I agree that there could be other users who care for nested functionality.
> >>>>>
> >>>>> Any ways, nesting and non-nesting are two different requirements.
> >>>> The LM facility should server both,
> >>> I don’t see how PCI spec let you do it.
> >>> PCI community already handed over this to SR-PCIM interface outside
> >>> of the
> >> PCI spec domain.
> >>> Hence, its done over admin queue for passthrough devices.
> >>>
> >>> If you can explain, how your proposal addresses passthrough support
> >>> without
> >> mediation and also does DMA, I am very interested to learn that.
> >> Do you mean nested?
> > Before nesting, just like to see basic single level passthrough to see functional
> and performant like [1].
> I think we have discussed about this, the nested guest is not aware of the admin
> vq and can not access it, because the admin vq is a host facility.

A nested guest VM is not aware of it, and should not be.
The VM hosting the nested VM is aware of how to execute administrative commands using the owner device.

At present, for the PCI transport, the owner device is the PF.

In the future, for nesting, maybe another peer VF can be delegated such a task and can perform administration commands.

For bare metal, maybe some other admin device like a DPU can take that role.

> >

> >> Why this series can not support nested?
> > I don’t see all the aspects that I covered in series [1] ranging from flr, device
> context migration, virtio level reset, dirty page tracking, p2p support, etc.
> covered in some device, vq suspend resume piece.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> We have discussed many other issues in this thread.
> >
> >>>> And it does not serve bare-metal live migration either.
> >>> A bare-metal migration seems a distance theory as one need side cpu,
> >> memory accessor apart from device accessor.
> >>> But somehow if that exists, there will be similar admin device to
> >>> migrate it
> >> may be TDDISP will own this whole piece one day.
> >> Bare metal live migration require other components like firmware OS
> >> and partitioning, that's why the device live migration should not be a
> blocker.
> > Device migration is not blocker.
> > In-fact it facilitates for this future in case if that happens where side cpu like
> DPU or similar sideband virtio admin device can migrate over its admin vq.
> >
> > Long ago when admin commands were discussed, this was discussed too
> where a admin device may not be an owner device.
> The admin vq can not migrate it self therefore baremetal can not be migrated
> by admin vq
Maybe I was not clear. The admin commands are executed by some device other than the PF.
Above I call it the admin device, which can be a DPU, some other dedicated admin device, or something else.
A large part of the non-virtio infrastructure at the platform, BIOS, CPU, and memory level needs to evolve before virtio can utilize it.

We don't need to cook everything now; as long as we have administration commands, it's good.
The real credit for detaching the administration commands from the admin vq goes to Michael. :)
We would like to utilize this in the future for the DPU case, where the admin device is not the PCI PF.
Eswitch, PF migration, etc. may utilize it in the future when needed.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  9:05                           ` Parav Pandit
@ 2023-09-11  9:32                             ` Zhu, Lingshan
  2023-09-11 10:21                               ` Parav Pandit
  2023-09-11 11:50                               ` Parav Pandit
  0 siblings, 2 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  9:32 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/11/2023 5:05 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 2:17 PM
> [..]
>>> Hypervisor needs to do right setup anyway for using PCI spec define access
>> control and other semantics which is outside the scope of [1].
>>> It is outside primarily because proposal [1] is not migrating the whole "PCI
>> device".
>>> It is migrating the virtio device, so that we can migrate from PCI VF member
>> to some software based device too.
>>> And vis-versa.
>> Since you talked about P2P, IOMMU is basically for address space isolation. For
>> security reasons, it is usually suggest to passthrough all devices in one IOMMU
>> group to a single guest.
>>
> IOMMU group is OS concept and no need to mix it here.
>
>> That means, if you want the VF to perform P2P with the PF there the AQ
>> resides, you have to place them in the same IOMMU group and passthrough
>> them all to a guest. So how this AQ serve other purposes?
> A PF resides on the hypervisor. One or more VFs are passthrough to the VM.
> When one wants to do nesting, may be one of the VF can do the role of admin for its peer VF.
> Such extension is only needed for nesting.
>
> For non-nesting being the known common case to us, such extension is not needed.
So implement the AQ on the "admin" VF? Does this require the HW to reserve 
dedicated resources for every VF?
So expensive. Overkill?

And a VF may be managed by both the PF and its admin "VF"?
>
>>>>> QOS is such a broad term that is hard to debate unless you get to a
>>>>> specific
>>>> point.
>>>> E.g., there can be hundreds or thousands of VMs, how many admin vq
>>>> are required to serve them when LM? To converge, no timeout.
>>> How many RSS queues are required to reach 800Gbs NIC performance at what
>> q depth at what interrupt moderation level?
>>> Such details are outside the scope of virtio specification.
>>> Those are implementation details of the device.
>>>
>>> Similarly here for AQ too.
>>> The inherent nature of AQ to queue commands and execute them out of order
>> in the device is the fundamental reason, AQ is introduced.
>>> And one can have more AQs to do unrelated work, mainly from the hypervisor
>> owner device who wants to enqueue unrelated commands in parallel.
>> As pointed above insufficient RSS capabilities may cause performance
>> overhead, not not a failure, the device still stay functional.
> If UDP packets are dropped, even application can fail who do no retry.
UDP is not reliable, and performance overhead does not mean failure.
>
>> But too few AQ to serve too high volume of VMs may be a problem.
> It is left for the device to implement the needed scale requirement.
Yes, so how many HW resources should the HW implementation reserve
to serve the worst case? Half of the board's resources?
>
>> Yes the number of AQs are negotiable, but how many exactly should the HW
>> provide?
> Again, it is outside the scope. It is left to the device implementation like many other performance aspects.
I agree we can skip this issue, but the point is clear. And this is not 
only a performance issue;
this can lead to a failed LM.
>
>> Naming a number or an algorithm for the ratio of devices / num_of_AQs is
>> beyond this topic, but I made my point clear.
> Sure. It is beyond.
> And it is not a concern either.
It is; the user expects the LM process to succeed rather than fail.
>
>>>>>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
>>>>>> number in HW implementation and how does the driver get informed?
>>>>> Usually just one AQ is enough as proposal [1] is built around
>>>>> inherent
>>>> downtime reduction.
>>>>> You can ask similar question for RSS, how does hw device how many
>>>>> RSS queues are needed. 😊
>>>>> Device exposes number of supported AQs that driver is free to use.
>>>> RSS is not a must for the transition through maybe performance overhead.
>>>> But if the host can not finish Live Migration in the due time, then
>>>> it is a failed LM.
>>> It can aborts the LM and restore it back by resuming the device.
>> aborts means fail
>>>>> Most sane sys admins do not migrate 1000 VMs at same time for
>>>>> obvious
>>>> reasons.
>>>>> But when such requirements arise, a device may support it.
>>>>> Just like how a net device can support from 1 to 32K txqueues at spec level.
>>>> The orchestration layer may do that for host upgrade or power-saving.
>>>> And the VMs may be required to migrate together, for example:
>>>> a cluster of VMs in the same subnet.
>>>>
>>> Sure. AQ of depth 1K can support 1K outstanding commands at a time for
>> 1000 member devices.
>> PCI transition is FIFO,
> I do not understand what is "PCI transition".
PCI data flow.
>
>> can depth = 1K introduce significant latency?
> AQ command execution is not done serially. There is enough text on the AQ chapter as I recall.
Then it requires more HW resources; I don't see the difference.
>
>> And 1K depths is
>> almost identical to 2 X 500 queue depths, so still the same problem, how many
>> resource does the HW need to reserve to serve the worst case?
>>
> You didn’t describe the problem.
> Virtqueue is generic infrastructure to execute commands, be it admin command, control command, flow filter command, scsi command.
> How many to execute in parallel, how many queues to have are device implementation specific.
So the question is how many are needed to serve the worst case? Does the HW vendor 
need to reserve half of the board's resources?
>
>> Let's forget the numbers, the point is clear.
> Ok. I agree with you.
> Number of AQs and its depth matter for this discussion, and its performance characterization is outside the spec.
> Design wise, key thing to have the queuing interface between driver and device for device migration commands.
> This enables both entities to execute things in parallel.
>
> This is fully covered in [1].
> So let's improve [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
I am not sure why [1] is a must. Certain issues with [1] discussed in 
this thread remain unsolved.

By the way, do you see anything we need to improve in this series?
>
>>>> Lets do not introduce new frangibility
>>> I don’t see any frangibility added by [1].
>>> If you see one, please let me know.
>> The resource and latency explained above.
>>>>>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
>>>>>>> I am not sure why the migration reason has any influence on the design.
>>>>>> Because this design is for live migration.
>>>>>>> The CSPs that we had discussed, care for performance more and
>>>>>>> hence
>>>>>> prefers passthrough instead or mediation and don’t seem to be doing
>>>>>> any nesting.
>>>>>>> CPU doesnt have support for 3 level of page table nesting either.
>>>>>>> I agree that there could be other users who care for nested functionality.
>>>>>>>
>>>>>>> Any ways, nesting and non-nesting are two different requirements.
>>>>>> The LM facility should server both,
>>>>> I don’t see how PCI spec let you do it.
>>>>> PCI community already handed over this to SR-PCIM interface outside
>>>>> of the
>>>> PCI spec domain.
>>>>> Hence, its done over admin queue for passthrough devices.
>>>>>
>>>>> If you can explain, how your proposal addresses passthrough support
>>>>> without
>>>> mediation and also does DMA, I am very interested to learn that.
>>>> Do you mean nested?
>>> Before nesting, just like to see basic single level passthrough to see functional
>> and performant like [1].
>> I think we have discussed about this, the nested guest is not aware of the admin
>> vq and can not access it, because the admin vq is a host facility.
> A nested guest VM is not aware and should not.
> The VM hosting the nested VM, is aware on how to execute administrative commands using the owner device.
The VM does not talk to the admin vq either; the admin vq is a host 
facility, and the host owns it.
>
> At present for PCI transport, owner device is PF.
>
> In future for nesting, may be another peer VF can be delegated such task and it can perform administration command.
Then it may run into the problems explained above.
>
> For bare metal may be some other admin device like DPU can do that role.
So [1] is not ready
>
>>>> Why this series can not support nested?
>>> I don’t see all the aspects that I covered in series [1] ranging from flr, device
>> context migration, virtio level reset, dirty page tracking, p2p support, etc.
>> covered in some device, vq suspend resume piece.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
>>> tml
>> We have discussed many other issues in this thread.
>>>>>> And it does not serve bare-metal live migration either.
>>>>> A bare-metal migration seems a distance theory as one need side cpu,
>>>> memory accessor apart from device accessor.
>>>>> But somehow if that exists, there will be similar admin device to
>>>>> migrate it
>>>> may be TDDISP will own this whole piece one day.
>>>> Bare metal live migration require other components like firmware OS
>>>> and partitioning, that's why the device live migration should not be a
>> blocker.
>>> Device migration is not blocker.
>>> In-fact it facilitates for this future in case if that happens where side cpu like
>> DPU or similar sideband virtio admin device can migrate over its admin vq.
>>> Long ago when admin commands were discussed, this was discussed too
>> where a admin device may not be an owner device.
>> The admin vq can not migrate it self therefore baremetal can not be migrated
>> by admin vq
> May be I was not clear. The admin commands are executed by some other device than the PF.
From the SW perspective, it should be the admin vq and the device it resides on.
> In above I call it admin device, which can be a DPU may be some other dedicated admin device or something else.
> Large part of non virtio infrastructure at platform, BIOS, cpu, memory level needs to evolve before virtio can utilize it.
A virtio device should be self-contained and not depend on other components.
>
> We don’t need to cook all now, as long as we have administration commands its good.
> The real credit owner for detaching the administration command from the admin vq is Michael. :)
> We like to utilize this in future for DPU case where admin device is not the PCI PF.
> Eswitch, PF migration etc may utilize it in future when needed.
Again, the design should not rely on other host components.

And it is not about the credit, this is reliable work outcome




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  9:32                             ` Zhu, Lingshan
@ 2023-09-11 10:21                               ` Parav Pandit
  2023-09-12  4:06                                 ` Zhu, Lingshan
  2023-09-11 11:50                               ` Parav Pandit
  1 sibling, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-11 10:21 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 3:03 PM

> So implement AQ on the "admin" VF? This require the HW reserve dedicated
> resource for every VF?
> So expensive, Overkill?
> 
> And a VF may be managed by the PF and its admin "vf"?
Yes.

> > If UDP packets are dropped, even application can fail who do no retry.
> UDP is not reliable, and performance overhead does not mean fail.
It largely depends on the application.
I have seen iperf over UDP fail on packet drop and never recover.
A retransmission over UDP can fail.

> >
> >> But too few AQ to serve too high volume of VMs may be a problem.
> > It is left for the device to implement the needed scale requirement.
> Yes, so how many HW resource should the HW implementation reserved to
> serve the worst case? Half of the board resource?
The board designer can decide how to manage the resources.
Administration commands are explicit instructions to the device.
The device knows for how many member devices dirty tracking is ongoing, and which device context is being read or written.

An admin command can even fail with an EAGAIN error code when the device is out of resources, and software can retry the command.
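
A minimal sketch of that retry behaviour, under stated assumptions: `FakeDevice`, `submit_admin_cmd`, `run_admin_cmd`, and the retry and delay parameters are hypothetical names used only for illustration, and `errno.EAGAIN` stands in for whatever status code a device would actually return. None of this is taken from [1] or from an existing driver.

```python
import errno
import time


class FakeDevice:
    """Hypothetical device model: rejects commands while it is out of resources."""

    def __init__(self, busy_attempts):
        # Number of submissions that will be rejected before one succeeds.
        self.busy_attempts = busy_attempts

    def submit_admin_cmd(self, cmd):
        if self.busy_attempts > 0:
            self.busy_attempts -= 1
            return errno.EAGAIN  # device temporarily out of resources
        return 0  # success


def run_admin_cmd(dev, cmd, max_retries=5, delay_s=0.0):
    """Submit cmd, retrying while the device reports EAGAIN.

    Returns the number of attempts used; raises if retries are exhausted
    or the device returns any other error.
    """
    for attempt in range(1, max_retries + 1):
        status = dev.submit_admin_cmd(cmd)
        if status == 0:
            return attempt
        if status != errno.EAGAIN:
            raise OSError(status, "admin command failed")
        time.sleep(delay_s)  # back off before retrying
    raise TimeoutError("device still busy after %d attempts" % max_retries)
```

For example, `run_admin_cmd(FakeDevice(2), "cmd")` succeeds on the third attempt. Because the retries happen while the VM is still live, they lengthen the overall migration rather than the downtime.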

The key part is that all of this happens outside of the VM's downtime.
The majority of the work in proposal [1] is done while the VM is _live_.
Hence, the resource consumption or reservation is significantly less.


> >> Naming a number or an algorithm for the ratio of devices / num_of_AQs
> >> is beyond this topic, but I made my point clear.
> > Sure. It is beyond.
> > And it is not a concern either.
> It is, the user expect the LM process success than fail.
I still fail to understand why the LM process would fail.
In [1], the migration process may be slow, but the downtime is not.

> >> can depth = 1K introduce significant latency?
> > AQ command execution is not done serially. There is enough text on the AQ
> chapter as I recall.
> Then require more HW resource, I don't see difference.
Difference compared to what, multiple AQs?
If so, sure.
A device that prefers to execute only one AQ command at a time can certainly work with fewer resources and do one at a time.
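
The trade-off between device resources and command latency can be illustrated with a rough model. This is a sketch under stated assumptions, not anything defined by the spec: every command is assumed to take the same time, the device is assumed to execute up to `parallelism` commands concurrently, and the function name and numbers are hypothetical.

```python
import math


def batch_completion_time_ms(num_cmds, cmd_time_ms, parallelism):
    """Time to drain num_cmds queued commands under a simplified model.

    Assumes identical per-command execution time and a device that runs
    up to `parallelism` commands at once (1 means strictly one at a time).
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    # Commands are processed in waves of `parallelism` commands each.
    waves = math.ceil(num_cmds / parallelism)
    return waves * cmd_time_ms
```

For example, with 1000 queued commands of 1 ms each, this model gives 1000 ms for a one-at-a-time device and 16 ms for a device that runs 64 in parallel. How much parallelism to build is the device implementation's choice.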

> >
> >> And 1K depths is
> >> almost identical to 2 X 500 queue depths, so still the same problem,
> >> how many resource does the HW need to reserve to serve the worst case?
> >>
> > You didn’t describe the problem.
> > Virtqueue is generic infrastructure to execute commands, be it admin
> command, control command, flow filter command, scsi command.
> > How many to execute in parallel, how many queues to have are device
> implementation specific.
> So the question is how many to serve the worst case? Does the HW vendor need
> to reserve half of the board resource?
No. It does not need to.

> >
> >> Let's forget the numbers, the point is clear.
> > Ok. I agree with you.
> > Number of AQs and its depth matter for this discussion, and its performance
> characterization is outside the spec.
> > Design wise, key thing to have the queuing interface between driver and
> device for device migration commands.
> > This enables both entities to execute things in parallel.
> >
> > This is fully covered in [1].
> > So let's improve [1].
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> I am not sure, why [1] is a must? There are certain issues discussed in this
> thread for [1] stay unsolved.
> 
> By the way, do you see anything we need to improve in this series?
In [1], the device context needs to become richer as we progress through the v1/v2 versions.

[..]

> > A nested guest VM is not aware and should not.
> > The VM hosting the nested VM, is aware on how to execute administrative
> commands using the owner device.
> The VM does not talk to admin vq either, the admin vq is a host facility, host
> owns it.
The admin VQ is owned by whichever device has it.
As I explained before, it is on the owner device.
If needed, it can be placed on more than just the owner device.
When VM_A hosts another VM_B, VM_A can have a peer VF with an AQ act as the admin device or migration manager device.

> >
> > At present for PCI transport, owner device is PF.
> >
> > In future for nesting, may be another peer VF can be delegated such task and
> it can perform administration command.
> Then it may run into the problems explained above.
> >
> > For bare metal may be some other admin device like DPU can do that role.
> So [1] is not ready
> >
> >>>> Why this series can not support nested?
> >>> I don’t see all the aspects that I covered in series [1] ranging
> >>> from flr, device
> >> context migration, virtio level reset, dirty page tracking, p2p support, etc.
> >> covered in some device, vq suspend resume piece.
> >>> [1]
> >>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> >> We have discussed many other issues in this thread.
> >>>>>> And it does not serve bare-metal live migration either.
> >>>>> A bare-metal migration seems a distance theory as one need side
> >>>>> cpu,
> >>>> memory accessor apart from device accessor.
> >>>>> But somehow if that exists, there will be similar admin device to
> >>>>> migrate it
> >>>> may be TDDISP will own this whole piece one day.
> >>>> Bare metal live migration require other components like firmware OS
> >>>> and partitioning, that's why the device live migration should not
> >>>> be a
> >> blocker.
> >>> Device migration is not blocker.
> >>> In-fact it facilitates for this future in case if that happens where
> >>> side cpu like
> >> DPU or similar sideband virtio admin device can migrate over its admin vq.
> >>> Long ago when admin commands were discussed, this was discussed too
> >> where a admin device may not be an owner device.
> >> The admin vq can not migrate it self therefore baremetal can not be
> >> migrated by admin vq
> > May be I was not clear. The admin commands are executed by some other
> device than the PF.
>  From SW perspective, it should be the admin vq and the device it resides.
> > In above I call it admin device, which can be a DPU may be some other
> dedicated admin device or something else.
> > Large part of non virtio infrastructure at platform, BIOS, cpu, memory level
> needs to evolve before virtio can utilize it.
> virito device should be self-contained. Not depend on other components.
> >
> > We don’t need to cook all now, as long as we have administration commands
> its good.
> > The real credit owner for detaching the administration command from
> > the admin vq is Michael. :) We like to utilize this in future for DPU case where
> admin device is not the PCI PF.
> > Eswitch, PF migration etc may utilize it in future when needed.
> Again, the design should not rely on other host components.
It does not. It relies on the administration commands.

> 
> And it is not about the credit, this is reliable work outcome
I didn’t follow the comment.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11 10:21                               ` Parav Pandit
@ 2023-09-12  4:06                                 ` Zhu, Lingshan
  2023-09-12  5:58                                   ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  4:06 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/11/2023 6:21 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 3:03 PM
>> So implement AQ on the "admin" VF? This require the HW reserve dedicated
>> resource for every VF?
>> So expensive, Overkill?
>>
>> And a VF may be managed by the PF and its admin "vf"?
> Yes.
It's a bit chaotic: if the nested (L2 guest) VF can be managed by both
the L1 guest VF and the host PF, that means the L2 VF has two owners.
>
>>> If UDP packets are dropped, even application can fail who do no retry.
>> UDP is not reliable, and performance overhead does not mean fail.
> It largely depends on application.
> I have seen iperf UDP failing on packet drop and never recovered.
> A retransmission over UDP can fail.
That depends on the workload: if it chooses UDP, it is aware of the
possibility of losing packets. But anyway, LM is expected to complete
successfully in due time.
>
>>>> But too few AQ to serve too high volume of VMs may be a problem.
>>> It is left for the device to implement the needed scale requirement.
>> Yes, so how many HW resource should the HW implementation reserved to
>> serve the worst case? Half of the board resource?
> The board designer can decide how to manage the resource.
> Administration commands are explicit instructions to the device.
> It knows how many members device's dirty tracking is ongoing, which device context is being read/written.
Still, does the board designer need to prepare for the worst case? How
can that challenge be met?
>
> Admin command can even fail with EAGAIN error code when device is out of resource and software can retry the command.
As demonstrated, this series is as reliable as the config space
functionality, so maybe it has fewer ways to fail?
>
> They key part is all of these happens outside of the VM's downtime.
> Majority of the work in proposal [1] is done when the VM is _live_.
> Hence, the resource consumption or reservation is significantly less.
It still depends on the number of VMs and devices; the orchestration layer
needs to migrate the last round of dirty pages and states even after the VM
has been suspended.
>
>
>>>> Naming a number or an algorithm for the ratio of devices / num_of_AQs
>>>> is beyond this topic, but I made my point clear.
>>> Sure. It is beyond.
>>> And it is not a concern either.
>> It is, the user expect the LM process success than fail.
> I still fail to understand why LM process fails.
> The migration process is slow, but downtime is not in [1].
If I recall correctly, the downtime is around 300ms, so
don't let the bandwidth or the number of admin vqs become
a bottleneck, which may introduce more ways to fail.
>
>>>> can depth = 1K introduce significant latency?
>>> AQ command execution is not done serially. There is enough text on the AQ
>> chapter as I recall.
>> Then require more HW resource, I don't see difference.
> Difference compared to what, multiple AQs?
> If so, sure.
> The device who prefers to do only one AQ command at a time, sure it can work with less resource and do one at a time.
I think we are discussing the same issue as above: the "resource for the
worst case" problem.
>
>>>> And 1K depths is
>>>> almost identical to 2 X 500 queue depths, so still the same problem,
>>>> how many resource does the HW need to reserve to serve the worst case?
>>>>
>>> You didn’t describe the problem.
>>> Virtqueue is generic infrastructure to execute commands, be it admin
>> command, control command, flow filter command, scsi command.
>>> How many to execute in parallel, how many queues to have are device
>> implementation specific.
>> So the question is how many to serve the worst case? Does the HW vendor need
>> to reserve half of the board resource?
> No. It does not need to.
same as above
>
>>>> Let's forget the numbers, the point is clear.
>>> Ok. I agree with you.
>>> Number of AQs and its depth matter for this discussion, and its performance
>> characterization is outside the spec.
>>> Design wise, key thing to have the queuing interface between driver and
>> device for device migration commands.
>>> This enables both entities to execute things in parallel.
>>>
>>> This is fully covered in [1].
>>> So let's improve [1].
>>>
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
>> I am not sure, why [1] is a must? There are certain issues discussed in this
>> thread for [1] stay unsolved.
>>
>> By the way, do you see anything we need to improve in this series?
> In [1], device context needs to more rich as we progress in v1/v2 versions.
>
> [..]
>
>>> A nested guest VM is not aware and should not.
>>> The VM hosting the nested VM, is aware on how to execute administrative
>> commands using the owner device.
>> The VM does not talk to admin vq either, the admin vq is a host facility, host
>> owns it.
> Admin VQ is owned by the device whichever has it.
> As I explained before, it is on the owner device.
> If needed one can do on more than owner device.
> A VM_A which is hosting another VM_B, a VM_A can have peer VF with AQ to be the admin device or migration manager device.
So two or more owners own the same device. Isn't that a conflict?
>
>>> At present for PCI transport, owner device is PF.
>>>
>>> In future for nesting, may be another peer VF can be delegated such task and
>> it can perform administration command.
>> Then it may run into the problems explained above.
>>> For bare metal may be some other admin device like DPU can do that role.
>> So [1] is not ready
>>>>>> Why this series can not support nested?
>>>>> I don’t see all the aspects that I covered in series [1] ranging
>>>>> from flr, device
>>>> context migration, virtio level reset, dirty page tracking, p2p support, etc.
>>>> covered in some device, vq suspend resume piece.
>>>>> [1]
>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
>>>> We have discussed many other issues in this thread.
>>>>>>>> And it does not serve bare-metal live migration either.
>>>>>>> A bare-metal migration seems a distance theory as one need side
>>>>>>> cpu,
>>>>>> memory accessor apart from device accessor.
>>>>>>> But somehow if that exists, there will be similar admin device to
>>>>>>> migrate it
>>>>>> may be TDDISP will own this whole piece one day.
>>>>>> Bare metal live migration require other components like firmware OS
>>>>>> and partitioning, that's why the device live migration should not
>>>>>> be a
>>>> blocker.
>>>>> Device migration is not blocker.
>>>>> In-fact it facilitates for this future in case if that happens where
>>>>> side cpu like
>>>> DPU or similar sideband virtio admin device can migrate over its admin vq.
>>>>> Long ago when admin commands were discussed, this was discussed too
>>>> where a admin device may not be an owner device.
>>>> The admin vq can not migrate it self therefore baremetal can not be
>>>> migrated by admin vq
>>> May be I was not clear. The admin commands are executed by some other
>> device than the PF.
>>   From SW perspective, it should be the admin vq and the device it resides.
>>> In above I call it admin device, which can be a DPU may be some other
>> dedicated admin device or something else.
>>> Large part of non virtio infrastructure at platform, BIOS, cpu, memory level
>> needs to evolve before virtio can utilize it.
>> virito device should be self-contained. Not depend on other components.
>>> We don’t need to cook all now, as long as we have administration commands
>> its good.
>>> The real credit owner for detaching the administration command from
>>> the admin vq is Michael. :) We like to utilize this in future for DPU case where
>> admin device is not the PCI PF.
>>> Eswitch, PF migration etc may utilize it in future when needed.
>> Again, the design should not rely on other host components.
> It does not. It relies on the administration commands.
I remember you mentioned using DPU infrastructure to
perform bare-metal live migration?
>
>> And it is not about the credit, this is reliable work outcome
> I didn’t follow the comment.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  4:06                                 ` Zhu, Lingshan
@ 2023-09-12  5:58                                   ` Parav Pandit
  2023-09-12  6:33                                     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  5:58 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 9:37 AM
> 
> On 9/11/2023 6:21 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
> >> "admin" VF? This require the HW reserve dedicated resource for every
> >> VF?
> >> So expensive, Overkill?
> >>
> >> And a VF may be managed by the PF and its admin "vf"?
> > Yes.
> it's a bit chaos, as you can see if the nested(L2 guest) VF can be managed by
> both L1 guest VF and the host PF, that means two owners of the L2 VF.
This is nesting.
When you do M-level nesting, does any CPU in the world handle its own page tables in isolation from the next level and still perform equally well?

> >
> >>> If UDP packets are dropped, even application can fail who do no retry.
> >> UDP is not reliable, and performance overhead does not mean fail.
> > It largely depends on application.
> > I have seen iperf UDP failing on packet drop and never recovered.
> > A retransmission over UDP can fail.
> That depends on the workload, if it choose UDP, it is aware of the possibilities of
> losing packets. But anyway, LM are expected to perform successfully in the due
> time
And LM also depends on the workload. :)
It is pointless to discuss performance characteristics as an argument for or against using the AQ.

> >
> >>>> But too few AQ to serve too high volume of VMs may be a problem.
> >>> It is left for the device to implement the needed scale requirement.
> >> Yes, so how many HW resource should the HW implementation reserved to
> >> serve the worst case? Half of the board resource?
> > The board designer can decide how to manage the resource.
> > Administration commands are explicit instructions to the device.
> > It knows how many members device's dirty tracking is ongoing, which device
> context is being read/written.
> Still, does the board designer need to prepare for the worst case? How to meet
> that challenge?
No, the board designer does not need to.
As explained already, if the board wants to support only a single AQ command at a time, it can.

> >
> > Admin command can even fail with EAGAIN error code when device is out of
> resource and software can retry the command.
> As demonstrated, this series is reliable as the config space functionalities, so
> maybe less possibilities to fail?
Huh. Config space has a far higher failure rate on the PCI transport, due to the inherent nature of PCI timeouts, reads, and polling.
For any bulk data transfer, the virtqueue is the spec-defined approach.
This was debated for more than a year; you can check some 2021 emails.

You can see from the patches that the data transfer done in [1] would be snail slow over registers.

> >
> > They key part is all of these happens outside of the VM's downtime.
> > Majority of the work in proposal [1] is done when the VM is _live_.
> > Hence, the resource consumption or reservation is significantly less.
> Still depends on the volume of VMs and devices, the orchestration layer needs
> to migrate the last round of dirty pages and states even when the VM has been
> suspended.
That has nothing to do with the admin virtqueue.
The migration layer already does this, and it is used by multiple devices.

> >
> >
> >>>> Naming a number or an algorithm for the ratio of devices /
> >>>> num_of_AQs is beyond this topic, but I made my point clear.
> >>> Sure. It is beyond.
> >>> And it is not a concern either.
> >> It is, the user expect the LM process success than fail.
> > I still fail to understand why LM process fails.
> > The migration process is slow, but downtime is not in [1].
> If I recall it clear, the downtime is around 300ms, so don't let the bandwidth or
> num of admin vqs become a bottle neck which may introduce more possibilities
> to fail.
> >
> >>>> can depth = 1K introduce significant latency?
> >>> AQ command execution is not done serially. There is enough text on
> >>> the AQ
> >> chapter as I recall.
> >> Then require more HW resource, I don't see difference.
> > Difference compared to what, multiple AQs?
> > If so, sure.
> > The device who prefers to do only one AQ command at a time, sure it can
> work with less resource and do one at a time.
> I think we are discussing the same issue as above "resource for the worst case"
> problem
Frankly, I am not seeing any issue.
The AQ is just another virtqueue, a basic construct in the spec that is used by 30+ device types.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  5:58                                   ` Parav Pandit
@ 2023-09-12  6:33                                     ` Zhu, Lingshan
  2023-09-12  6:47                                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  6:33 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 1:58 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 9:37 AM
>>
>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
>>>> "admin" VF? This require the HW reserve dedicated resource for every
>>>> VF?
>>>> So expensive, Overkill?
>>>>
>>>> And a VF may be managed by the PF and its admin "vf"?
>>> Yes.
>> it's a bit chaos, as you can see if the nested(L2 guest) VF can be managed by
>> both L1 guest VF and the host PF, that means two owners of the L2 VF.
> This is the nesting.
> When you do M level nesting, does any cpu in world handle its own page tables in isolation of next level and also perform equally well?
Not exactly. In nesting, the L1 guest is the host/infrastructure emulator
for L2, so L2 is expected to have nothing to do with the host.
Otherwise, something like an L2 VF managed by both the L1 VF and the host PF
could lead to operational and security issues?
>
>>>>> If UDP packets are dropped, even application can fail who do no retry.
>>>> UDP is not reliable, and performance overhead does not mean fail.
>>> It largely depends on application.
>>> I have seen iperf UDP failing on packet drop and never recovered.
>>> A retransmission over UDP can fail.
>> That depends on the workload, if it choose UDP, it is aware of the possibilities of
>> losing packets. But anyway, LM are expected to perform successfully in the due
>> time
> And LM also depends on the workload. :)
Exactly! That's the point, how to meet the requirements!
> It is pointless to discuss performance characteristics as a point to use AQ or not.
How do we meet the QoS requirement during LM?
>
>>>>>> But too few AQ to serve too high volume of VMs may be a problem.
>>>>> It is left for the device to implement the needed scale requirement.
>>>> Yes, so how many HW resource should the HW implementation reserved to
>>>> serve the worst case? Half of the board resource?
>>> The board designer can decide how to manage the resource.
>>> Administration commands are explicit instructions to the device.
>>> It knows how many members device's dirty tracking is ongoing, which device
>> context is being read/written.
>> Still, does the board designer need to prepare for the worst case? How to meet
>> that challenge?
> No. board designer does not need to.
> As explained already, if board wants to supporting single command of AQ, sure.
Same as above, the QoS question. For example, how do we avoid the situation
where half of the VMs can be migrated and the others time out?
>
>>> Admin command can even fail with EAGAIN error code when device is out of
>> resource and software can retry the command.
>> As demonstrated, this series is reliable as the config space functionalities, so
>> maybe less possibilities to fail?
> Huh. Config space has far higher failure rate for the PCI transport when due to inherent nature of PCI timeouts and reads and polling.
> For any bulk data transfer virtqueue is spec defined approach.
> For more than a year this was debated you can check some 2021 emails.
>
> You can see the patches that data transfer done in [1] over registers is snail slow.
Do you often observe virtio PCI config space failing? Or does the admin vq
not need to transfer data through PCI?
>
>>> They key part is all of these happens outside of the VM's downtime.
>>> Majority of the work in proposal [1] is done when the VM is _live_.
>>> Hence, the resource consumption or reservation is significantly less.
>> Still depends on the volume of VMs and devices, the orchestration layer needs
>> to migrate the last round of dirty pages and states even when the VM has been
>> suspended.
> That has nothing do with admin virtqueue.
> And migration layer already does it and used by multiple devices.
same as above, QOS
>
>>>
>>>>>> Naming a number or an algorithm for the ratio of devices /
>>>>>> num_of_AQs is beyond this topic, but I made my point clear.
>>>>> Sure. It is beyond.
>>>>> And it is not a concern either.
>>>> It is, the user expect the LM process success than fail.
>>> I still fail to understand why LM process fails.
>>> The migration process is slow, but downtime is not in [1].
>> If I recall it clear, the downtime is around 300ms, so don't let the bandwidth or
>> num of admin vqs become a bottle neck which may introduce more possibilities
>> to fail.
>>>>>> can depth = 1K introduce significant latency?
>>>>> AQ command execution is not done serially. There is enough text on
>>>>> the AQ
>>>> chapter as I recall.
>>>> Then require more HW resource, I don't see difference.
>>> Difference compared to what, multiple AQs?
>>> If so, sure.
>>> The device who prefers to do only one AQ command at a time, sure it can
>> work with less resource and do one at a time.
>> I think we are discussing the same issue as above "resource for the worst case"
>> problem
> Frankly I am not seeing any issue.
> AQ is just another virtqueue as basic construct in the spec used by 30+ device types.
As explained above, when migrating a VM, the time consumed has to
converge and the total downtime has a limit; I remember it is less
than 300ms. That is the QoS requirement.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:33                                     ` Zhu, Lingshan
@ 2023-09-12  6:47                                       ` Parav Pandit
  2023-09-12  7:27                                         ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  6:47 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:04 PM
> 
> 
> On 9/12/2023 1:58 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 9:37 AM
> >>
> >> On 9/11/2023 6:21 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
> >>>> "admin" VF? This require the HW reserve dedicated resource for
> >>>> every VF?
> >>>> So expensive, Overkill?
> >>>>
> >>>> And a VF may be managed by the PF and its admin "vf"?
> >>> Yes.
> >> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
> >> managed by both L1 guest VF and the host PF, that means two owners of the
> L2 VF.
> > This is the nesting.
> > When you do M level nesting, does any cpu in world handle its own page
> tables in isolation of next level and also perform equally well?
> Not exactly, in nesting, L1 guest is the host/infrastructure emulator for L2, so L2
> is expect to do nothing with the host, or something like L2 VF managed by both
> L1 VF and host PF can lead to operational and security issues?
> >
> >>>>> If UDP packets are dropped, even application can fail who do no retry.
> >>>> UDP is not reliable, and performance overhead does not mean fail.
> >>> It largely depends on application.
> >>> I have seen iperf UDP failing on packet drop and never recovered.
> >>> A retransmission over UDP can fail.
> >> That depends on the workload, if it choose UDP, it is aware of the
> >> possibilities of losing packets. But anyway, LM are expected to
> >> perform successfully in the due time
> > And LM also depends on the workload. :)
> Exactly! That's the point, how to meet the requirements!
> > It is pointless to discuss performance characteristics as a point to use AQ or
> not.
> How to meet QOS requirement when LM?
By following [1], where a large part of the device context and dirty page tracking work is done while the VM is running.

> > No. board designer does not need to.
> > As explained already, if board wants to supporting single command of AQ,
> sure.
> Same as above, the QOS question. For example, how to avoid the situation that
> half VMs can be migrated and others timeout?
Why would this happen?
If a timeout does happen, it is not related to the AQ.
Timeouts can happen with config registers too. It can be far harder for board designers to complete PCI reads within the timeout when handling 384 reads in parallel.

I am still not able to follow why you are asking these unrelated QoS questions.

> >
> >>> Admin command can even fail with EAGAIN error code when device is
> >>> out of
> >> resource and software can retry the command.
> >> As demonstrated, this series is reliable as the config space
> >> functionalities, so maybe less possibilities to fail?
> > Huh. Config space has far higher failure rate for the PCI transport when due to
> inherent nature of PCI timeouts and reads and polling.
> > For any bulk data transfer virtqueue is spec defined approach.
> > For more than a year this was debated you can check some 2021 emails.
> >
> > You can see the patches that data transfer done in [1] over registers is snail
> slow.
> Do you often observe virtio PCI config space fail? Or does admin vq need to
> transfer data through PCI?
Admin commands need to transfer bulk data for thousands of VFs in parallel, without baking registers into PCI.

> >
> >>> They key part is all of these happens outside of the VM's downtime.
> >>> Majority of the work in proposal [1] is done when the VM is _live_.
> >>> Hence, the resource consumption or reservation is significantly less.
> >> Still depends on the volume of VMs and devices, the orchestration
> >> layer needs to migrate the last round of dirty pages and states even
> >> when the VM has been suspended.
> > That has nothing do with admin virtqueue.
> > And migration layer already does it and used by multiple devices.
> same as above, QOS
> >
> >>>
> >>>>>> Naming a number or an algorithm for the ratio of devices /
> >>>>>> num_of_AQs is beyond this topic, but I made my point clear.
> >>>>> Sure. It is beyond.
> >>>>> And it is not a concern either.
> >>>> It is, the user expect the LM process success than fail.
> >>> I still fail to understand why LM process fails.
> >>> The migration process is slow, but downtime is not in [1].
> >> If I recall it clear, the downtime is around 300ms, so don't let the
> >> bandwidth or num of admin vqs become a bottle neck which may
> >> introduce more possibilities to fail.
> >>>>>> can depth = 1K introduce significant latency?
> >>>>> AQ command execution is not done serially. There is enough text on
> >>>>> the AQ
> >>>> chapter as I recall.
> >>>> Then require more HW resource, I don't see difference.
> >>> Difference compared to what, multiple AQs?
> >>> If so, sure.
> >>> The device who prefers to do only one AQ command at a time, sure it
> >>> can
> >> work with less resource and do one at a time.
> >> I think we are discussing the same issue as above "resource for the worst
> case"
> >> problem
> > Frankly I am not seeing any issue.
> > AQ is just another virtqueue as basic construct in the spec used by 30+ device
> types.
> explained above, when migrate a VM, the time consuming has to convergence
> and the total downtime has a due, I remember it is less than 300ms. That is the
> QOS requirement.
And admin commands can easily serve that, as the majority of the work is done while the VM is running and the member device is in the active state in proposal [1].


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:47                                       ` Parav Pandit
@ 2023-09-12  7:27                                         ` Zhu, Lingshan
  2023-09-12  7:40                                           ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  7:27 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 2:47 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:04 PM
>>
>>
>> On 9/12/2023 1:58 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 9:37 AM
>>>>
>>>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
>>>>>> "admin" VF? This require the HW reserve dedicated resource for
>>>>>> every VF?
>>>>>> So expensive, Overkill?
>>>>>>
>>>>>> And a VF may be managed by the PF and its admin "vf"?
>>>>> Yes.
>>>> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
>>>> managed by both L1 guest VF and the host PF, that means two owners of the
>> L2 VF.
>>> This is the nesting.
>>> When you do M level nesting, does any cpu in world handle its own page
>> tables in isolation of next level and also perform equally well?
>> Not exactly, in nesting, L1 guest is the host/infrastructure emulator for L2, so L2
>> is expect to do nothing with the host, or something like L2 VF managed by both
>> L1 VF and host PF can lead to operational and security issues?
>>>>>>> If UDP packets are dropped, even application can fail who do no retry.
>>>>>> UDP is not reliable, and performance overhead does not mean fail.
>>>>> It largely depends on application.
>>>>> I have seen iperf UDP failing on packet drop and never recovered.
>>>>> A retransmission over UDP can fail.
>>>> That depends on the workload, if it choose UDP, it is aware of the
>>>> possibilities of losing packets. But anyway, LM are expected to
>>>> perform successfully in the due time
>>> And LM also depends on the workload. :)
>> Exactly! That's the point, how to meet the requirements!
>>> It is pointless to discuss performance characteristics as a point to use AQ or
>> not.
>> How to meet QOS requirement when LM?
> By following [1] where large part of device context and dirty page tracking is done when the VM is running.
The last round of dirty pages and device states still needs to be
migrated when the VM freezes. That can still be large if a big number
of VMs is taken into consideration, and that is where the ~300ms due
time rules.
>
>>> No. board designer does not need to.
>>> As explained already, if board wants to supporting single command of AQ,
>> sure.
>> Same as above, the QOS question. For example, how to avoid the situation that
>> half VMs can be migrated and others timeout?
> Why would this happen?
> Timeout is not related to AQ in case if that happens.
explained above
> Timeout can happen to config registers too. And it can be even far more harder for board designers to support PCI reads in a timeout to handle in 384 reads in parallel.
When the VM freezes, the virtio functionality, for example virtio-net
transactions, is suspended as well,
so there are no TLPs for networking traffic buffers.

The on-device Live Migration facility can use the full PCI device 
bandwidth for migration.

That is the difference with the admin vq.
>
> I am still not able to follow your point for asking about unrelated QOS questions.
As explained above, it has to meet the due time requirement, and many
VMs can be migrated simultaneously.
In that situation, they have to race for the admin vq resource/bandwidth.
>
>>>>> Admin command can even fail with EAGAIN error code when device is
>>>>> out of
>>>> resource and software can retry the command.
>>>> As demonstrated, this series is reliable as the config space
>>>> functionalities, so maybe less possibilities to fail?
>>> Huh. Config space has far higher failure rate for the PCI transport when due to
>> inherent nature of PCI timeouts and reads and polling.
>>> For any bulk data transfer virtqueue is spec defined approach.
>>> For more than a year this was debated you can check some 2021 emails.
>>>
>>> You can see the patches that data transfer done in [1] over registers is snail
>> slow.
>> Do you often observe virtio PCI config space fail? Or does admin vq need to
>> transfer data through PCI?
> Admin commands needs to transfer bulk data across thousands of VFs in parallel for many VFs without baking registers in PCI.
So you actually agree that PCI config space is very unlikely to fail?
It is reliable.

Please allow me to provide an extreme example: is one single admin vq
limitless, so that it can
serve the migration of hundreds to thousands of VMs? If not, are two or
three needed, or what number?
>
>>>>> They key part is all of these happens outside of the VM's downtime.
>>>>> Majority of the work in proposal [1] is done when the VM is _live_.
>>>>> Hence, the resource consumption or reservation is significantly less.
>>>> Still depends on the volume of VMs and devices, the orchestration
>>>> layer needs to migrate the last round of dirty pages and states even
>>>> when the VM has been suspended.
>>> That has nothing do with admin virtqueue.
>>> And migration layer already does it and used by multiple devices.
>> same as above, QOS
>>>>>>>> Naming a number or an algorithm for the ratio of devices /
>>>>>>>> num_of_AQs is beyond this topic, but I made my point clear.
>>>>>>> Sure. It is beyond.
>>>>>>> And it is not a concern either.
>>>>>> It is, the user expect the LM process success than fail.
>>>>> I still fail to understand why LM process fails.
>>>>> The migration process is slow, but downtime is not in [1].
>>>> If I recall it clear, the downtime is around 300ms, so don't let the
>>>> bandwidth or num of admin vqs become a bottle neck which may
>>>> introduce more possibilities to fail.
>>>>>>>> can depth = 1K introduce significant latency?
>>>>>>> AQ command execution is not done serially. There is enough text on
>>>>>>> the AQ
>>>>>> chapter as I recall.
>>>>>> Then require more HW resource, I don't see difference.
>>>>> Difference compared to what, multiple AQs?
>>>>> If so, sure.
>>>>> The device who prefers to do only one AQ command at a time, sure it
>>>>> can
>>>> work with less resource and do one at a time.
>>>> I think we are discussing the same issue as above "resource for the worst
>> case"
>>>> problem
>>> Frankly I am not seeing any issue.
>>> AQ is just another virtqueue as basic construct in the spec used by 30+ device
>> types.
>> explained above, when migrate a VM, the time consuming has to convergence
>> and the total downtime has a due, I remember it is less than 300ms. That is the
>> QOS requirement.
> And admin commands can easily serve that as majority of the work is done when the VM is running and member device is in active state in proposal [1].
As explained above, it depends on the number of migrating VMs.
>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:27                                         ` Zhu, Lingshan
@ 2023-09-12  7:40                                           ` Parav Pandit
  2023-09-12  9:02                                             ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  7:40 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:58 PM
> To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
> cohuck@redhat.com; stefanha@redhat.com;
> virtio-comment@lists.oasis-open.org; virtio-dev@lists.oasis-open.org
> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
> VIRTIO_F_QUEUE_STATE
> 
> 
> 
> On 9/12/2023 2:47 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 12:04 PM
> >>
> >>
> >> On 9/12/2023 1:58 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 9:37 AM
> >>>>
> >>>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
> >>>>>> "admin" VF? This require the HW reserve dedicated resource for
> >>>>>> every VF?
> >>>>>> So expensive, Overkill?
> >>>>>>
> >>>>>> And a VF may be managed by the PF and its admin "vf"?
> >>>>> Yes.
> >>>> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
> >>>> managed by both L1 guest VF and the host PF, that means two owners
> >>>> of the
> >> L2 VF.
> >>> This is the nesting.
> >>> When you do M level nesting, does any cpu in world handle its own
> >>> page
> >> tables in isolation of next level and also perform equally well?
> >> Not exactly, in nesting, L1 guest is the host/infrastructure emulator
> >> for L2, so L2 is expect to do nothing with the host, or something
> >> like L2 VF managed by both
> >> L1 VF and host PF can lead to operational and security issues?
> >>>>>>> If UDP packets are dropped, even application can fail who do no retry.
> >>>>>> UDP is not reliable, and performance overhead does not mean fail.
> >>>>> It largely depends on application.
> >>>>> I have seen iperf UDP failing on packet drop and never recovered.
> >>>>> A retransmission over UDP can fail.
> >>>> That depends on the workload, if it choose UDP, it is aware of the
> >>>> possibilities of losing packets. But anyway, LM are expected to
> >>>> perform successfully in the due time
> >>> And LM also depends on the workload. :)
> >> Exactly! That's the point, how to meet the requirements!
> >>> It is pointless to discuss performance characteristics as a point to
> >>> use AQ or
> >> not.
> >> How to meet QOS requirement when LM?
> > By following [1] where large part of device context and dirty page tracking is
> done when the VM is running.
> Still needs to migrate the last round of dirty pages and device states when VM
> freeze. Still can be large if take big amount of VMs into consideration, and that
> is where ~300ms due time rules.
> >
> >>> No. board designer does not need to.
> >>> As explained already, if board wants to supporting single command of
> >>> AQ,
> >> sure.
> >> Same as above, the QOS question. For example, how to avoid the
> >> situation that half VMs can be migrated and others timeout?
> > Why would this happen?
> > Timeout is not related to AQ in case if that happens.
> explained above
> > Timeout can happen to config registers too. And it can be even far more
> harder for board designers to support PCI reads in a timeout to handle in 384
> reads in parallel.
> When the VM freeze, the virtio functionalities, for example virito-net
> transaction is suspended as well, so no TLPs for networking traffic buffers.
The config register mediated operations done by the host itself are also TLPs, flowing for the several hundred VMs in the example you took.
In your example, 1000 VMs freeze simultaneously, and you need to finish their config cycles in some 300 msec.
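
To make the numbers in this exchange concrete, here is a back-of-envelope sketch of the time budget being argued about. It assumes, which neither side states, that the per-device work during the freeze is handled in rounds of `parallelism` devices at a time. The function name and parameters are illustrative only and do not come from either proposal.

```python
def per_device_budget_us(downtime_ms: float, num_devices: int,
                         parallelism: int = 1) -> float:
    """Microseconds each device may take, under the assumption that
    devices are processed in rounds of `parallelism` at a time."""
    if num_devices <= 0 or parallelism <= 0:
        raise ValueError("num_devices and parallelism must be positive")
    rounds = -(-num_devices // parallelism)  # ceiling division
    return downtime_ms * 1000.0 / rounds


# 1000 devices inside a 300 ms downtime window, one device at a time:
print(per_device_budget_us(300, 1000))       # 300.0
# Same window, with 100 devices handled concurrently (assumed figure):
print(per_device_budget_us(300, 1000, 100))  # 30000.0
```

The model says nothing about which interface reaches a given parallelism. It only shows why the achievable concurrency, rather than the raw bandwidth, dominates the per-device budget.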

> 
> The on-device Live Migration facility can use the full PCI device bandwidth for
> migration.
So can admin commands.
However, the big difference is: registers do not scale with a large number of VFs.
Admin commands scale easily.

I probably should not repeat what is already captured in the admin commands commit log and cover letter.

> 
> That is the difference with the admin vq.
I don’t know what difference you are talking about.
PCI device bandwidth for migration is available with both admin commands and config registers.
BW != timeout.

> >
> > I am still not able to follow your point for asking about unrelated QOS
> questions.
> explained above, it has to meet the due time requirement and many VMs can
> be migrated simultaneously, in that situation, they have to race for the admin
> vq resource/bandwidth.
> >
> >>>>> Admin command can even fail with EAGAIN error code when device is
> >>>>> out of
> >>>> resource and software can retry the command.
> >>>> As demonstrated, this series is reliable as the config space
> >>>> functionalities, so maybe less possibilities to fail?
> >>> Huh. Config space has far higher failure rate for the PCI transport
> >>> when due to
> >> inherent nature of PCI timeouts and reads and polling.
> >>> For any bulk data transfer virtqueue is spec defined approach.
> >>> For more than a year this was debated you can check some 2021 emails.
> >>>
> >>> You can see the patches that data transfer done in [1] over
> >>> registers is snail
> >> slow.
> >> Do you often observe virtio PCI config space fail? Or does admin vq
> >> need to transfer data through PCI?
> > Admin commands needs to transfer bulk data across thousands of VFs in
> parallel for many VFs without baking registers in PCI.
> So you agree actually PCI config space are very unlikely to fail? It is reliable.
> 
No. I do not agree. It can fail, and it is very hard for board designers.
AQs are a more reliable way to transport bulk data in a scalable manner for tens of member devices.

> Please allow me to provide an extreme example, is one single admin vq
> limitless, that can serve hundreds to thousands of VMs migration? 
It is left to the device implementation, just like RSS and multi-queue support.
Is one queue enough for every link from 10Mbps to 800Gbps?
The answer is: that is not in the scope of the specification. The spec provides the framework to scale this way, but does not impose it on the device.

> If not, two or
> three or what number?
It really does not matter. It is the wrong point to discuss here.
The number of queues and the command execution depend on the device implementation.
A financial transaction application can time out when the device queuing delay for a virtio-net rx queue is long.
And we don’t put details about such things in the specification.
The spec takes the requirements and provides a driver-device interface to implement and scale.

I still don’t follow the motivation behind the question.
Is your question: How many admin queues are needed to migrate N member devices? If so, it is implementation specific.
It is similar to how such things depend on implementation for 30 virtio device types.

And if you are implying that, because it is implementation specific, the administration queue should not be used and some configuration register should be used instead,
then you should propose a config register interface to post virtqueue descriptors that way for 30 device types!

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:40                                           ` Parav Pandit
@ 2023-09-12  9:02                                             ` Zhu, Lingshan
  2023-09-12  9:21                                               ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  9:02 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 3:40 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:58 PM
>> To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
>> cohuck@redhat.com; stefanha@redhat.com;
>> virtio-comment@lists.oasis-open.org; virtio-dev@lists.oasis-open.org
>> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
>> VIRTIO_F_QUEUE_STATE
>>
>>
>>
>> On 9/12/2023 2:47 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 12:04 PM
>>>>
>>>>
>>>> On 9/12/2023 1:58 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 9:37 AM
>>>>>>
>>>>>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
>>>>>>>> "admin" VF? This require the HW reserve dedicated resource for
>>>>>>>> every VF?
>>>>>>>> So expensive, Overkill?
>>>>>>>>
>>>>>>>> And a VF may be managed by the PF and its admin "vf"?
>>>>>>> Yes.
>>>>>> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
>>>>>> managed by both L1 guest VF and the host PF, that means two owners
>>>>>> of the
>>>> L2 VF.
>>>>> This is the nesting.
>>>>> When you do M level nesting, does any cpu in world handle its own
>>>>> page
>>>> tables in isolation of next level and also perform equally well?
>>>> Not exactly, in nesting, L1 guest is the host/infrastructure emulator
>>>> for L2, so L2 is expect to do nothing with the host, or something
>>>> like L2 VF managed by both
>>>> L1 VF and host PF can lead to operational and security issues?
>>>>>>>>> If UDP packets are dropped, even application can fail who do no retry.
>>>>>>>> UDP is not reliable, and performance overhead does not mean fail.
>>>>>>> It largely depends on application.
>>>>>>> I have seen iperf UDP failing on packet drop and never recovered.
>>>>>>> A retransmission over UDP can fail.
>>>>>> That depends on the workload, if it choose UDP, it is aware of the
>>>>>> possibilities of losing packets. But anyway, LM are expected to
>>>>>> perform successfully in the due time
>>>>> And LM also depends on the workload. :)
>>>> Exactly! That's the point, how to meet the requirements!
>>>>> It is pointless to discuss performance characteristics as a point to
>>>>> use AQ or
>>>> not.
>>>> How to meet QOS requirement when LM?
>>> By following [1] where large part of device context and dirty page tracking is
>> done when the VM is running.
>> Still needs to migrate the last round of dirty pages and device states when VM
>> freeze. Still can be large if take big amount of VMs into consideration, and that
>> is where ~300ms due time rules.
>>>>> No. board designer does not need to.
>>>>> As explained already, if board wants to supporting single command of
>>>>> AQ,
>>>> sure.
>>>> Same as above, the QOS question. For example, how to avoid the
>>>> situation that half VMs can be migrated and others timeout?
>>> Why would this happen?
>>> Timeout is not related to AQ in case if that happens.
>> explained above
>>> Timeout can happen to config registers too. And it can be even far more
>> harder for board designers to support PCI reads in a timeout to handle in 384
>> reads in parallel.
>> When the VM freeze, the virtio functionalities, for example virito-net
>> transaction is suspended as well, so no TLPs for networking traffic buffers.
> The config registers mediated operation done by host itself are TLPs flowing for several hundreds of VM example you took.
> In your example you took 1000 VMs freezing simultaneously for which you need to finish the config cycles in some 300 msec.
These are per-device operations that directly access the device config
space and consume the dedicated device resource & bandwidth, like
other standard virtio operations.
>
>> The on-device Live Migration facility can use the full PCI device bandwidth for
>> migration.
> So does admin commands also.
> However the big difference is: registers do not scale with large number of VFs.
> Admin commands scale easily.
The admin vq requires fixed and dedicated resources to serve the VMs.
The question still remains: does it scale to serve the migration of a
big number of devices? How many admin vqs do you need to serve 10 VMs,
how many for 100, and so on? How does it scale?

If one admin vq can serve 100 VMs, can it migrate 1000 VMs in a
reasonable time?
If not, how many exactly?


And the register does not need to scale: it resides on the VF and only
serves the VF.

It does not reside on the PF to migrate the VFs.
>
> I probably should not repeat what is already captured in the admin commands commit log and cover letter.
>
>> That is the difference with the admin vq.
> I don’t know what difference you are talking about.
> PCI device bandwidth for migration is available with admin commands and some config registers both.
> BW != timeout.
The VF's config space can use the device's dedicated resources, such as bandwidth.

For the AQ, you still need to reserve resources, and how much?
>
>>> I am still not able to follow your point for asking about unrelated QOS
>> questions.
>> explained above, it has to meet the due time requirement and many VMs can
>> be migrated simultaneously, in that situation, they have to race for the admin
>> vq resource/bandwidth.
>>>>>>> Admin command can even fail with EAGAIN error code when device is
>>>>>>> out of
>>>>>> resource and software can retry the command.
>>>>>> As demonstrated, this series is reliable as the config space
>>>>>> functionalities, so maybe less possibilities to fail?
>>>>> Huh. Config space has far higher failure rate for the PCI transport
>>>>> when due to
>>>> inherent nature of PCI timeouts and reads and polling.
>>>>> For any bulk data transfer virtqueue is spec defined approach.
>>>>> For more than a year this was debated you can check some 2021 emails.
>>>>>
>>>>> You can see the patches that data transfer done in [1] over
>>>>> registers is snail
>>>> slow.
>>>> Do you often observe virtio PCI config space fail? Or does admin vq
>>>> need to transfer data through PCI?
>>> Admin commands needs to transfer bulk data across thousands of VFs in
>> parallel for many VFs without baking registers in PCI.
>> So you agree actually PCI config space are very unlikely to fail? It is reliable.
>>
> No. I do not agree. It can fail and very hard for board designers.
> AQs are more reliable way to transport bulk data in scalable manner for tens of member devices.
Really? How often do you observe virtio config space fail?
>
>> Please allow me to provide an extreme example, is one single admin vq
>> limitless, that can serve hundreds to thousands of VMs migration?
> It is left to the device implementation. Just like RSS and multi queue support?
> Is one Q enough for 800Gbps to 10Mbps link?
> Answer is: Not the scope of specification, spec provide the framework to scale this way, but not impose on the device.
Even if it does not support RSS or MQ, the device can still work, with
performance overhead, rather than fail.

A live migration failure caused by insufficient bandwidth & resources
is totally different.
>
>> If not, two or
>> three or what number?
> It really does not matter. Its wrong point to discuss here.
> Number of queues and command execution depends on the device implementation.
> A financial transaction application can timeout when a device queuing delay for virtio net rx queue is long.
> And we don’t put details about such things in specification.
> Spec takes the requirements and provides driver device interface to implement and scale.
>
> I still don’t follow the motivation behind the question.
> Is your question: How many admin queues are needed to migrate N member devices? If so, it is implementation specific.
> It is similar to how such things depend on implementation for 30 virtio device types.
>
> And if are implying that because it is implementation specific, that is why administration queue should not be used, but some configuration register should be used.
> Than you should propose a config register interface to post virtqueue descriptors that way for 30 device types!
If so, leave it undefined? A potential risk for device implementation?
Then why must it be the admin vq?




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:02                                             ` Zhu, Lingshan
@ 2023-09-12  9:21                                               ` Parav Pandit
  2023-09-12 13:03                                                 ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  9:21 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 2:33 PM

> admin vq require fixed and dedicated resource to serve the VMs, the question
> still remains, does is scale to server big amount of devices migration? how many
> admin vqs do you need to serve 10 VMs, how many for 100? and so on? How to
> scale?
>
Yes, it scales within the AQ and across multiple AQs.
Please consult your board designers to know such limits for your device.
 
> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable time?
> If not, how many exactly.
> 
Yes, it can serve both 100 and 1000 VMs in reasonable time.

> 
> And register does not need to scale, it resides on the VF and only serve
> the VF.
>
Since it is per VF, by nature it is a linearly growing entity for which the board design needs to support reads and writes with guaranteed timing.
It clearly scales more poorly than a queue.
 
> It does not reside on the PF to migrate the VFs.
Hence it does not scale and cannot do parallel operation within the VF, unless each register is replicated.

Whether to use a register or a queue for bulk data transfer was a question already settled when the virtio spec was born.
I don’t see a point in discussing it.
Snippet from the spec: "As a device can have zero or more virtqueues for bulk data transport"

> VFs config space can use the device dedicated resource like the bandwidth.
>
> for AQ, still you need to reserve resource and how much?
It depends on your board and its implementation; please consult your board designer.
From spec point of view, it should not be same as any other virtqueue.

> > No. I do not agree. It can fail and very hard for board designers.
> > AQs are more reliable way to transport bulk data in scalable manner for tens
> of member devices.
> Really? How often do you observe virtio config space fail?

On an Intel Icelake server we have seen it failing with 128 VFs.
And a device needs to do very weird things to support a forever-expanding config space for 1000+ VFs, which is not the topic of this discussion anyway.


> >
> >> Please allow me to provide an extreme example, is one single admin vq
> >> limitless, that can serve hundreds to thousands of VMs migration?
> > It is left to the device implementation. Just like RSS and multi queue support?
> > Is one Q enough for 800Gbps to 10Mbps link?
> > Answer is: Not the scope of specification, spec provide the framework to scale
> this way, but not impose on the device.
> Even if not support RSS or MQ, the device still can work with
> performance overhead, not fail.
>
_work_ is subjective. 
The financial transaction (application) failed. The packets worked.
The LM commands were successful, but they were not timely.

Same same..
 
> Insufficient bandwidth & resource caused live migration fail is totally
> different.
Very abstract point and unrelated to administration commands.

> >
> >> If not, two or
> >> three or what number?
> > It really does not matter. Its wrong point to discuss here.
> > Number of queues and command execution depends on the device
> implementation.
> > A financial transaction application can timeout when a device queuing delay
> for virtio net rx queue is long.
> > And we don’t put details about such things in specification.
> > Spec takes the requirements and provides driver device interface to
> implement and scale.
> >
> > I still don’t follow the motivation behind the question.
> > Is your question: How many admin queues are needed to migrate N member
> devices? If so, it is implementation specific.
> > It is similar to how such things depend on implementation for 30 virtio device
> types.
> >
> > And if are implying that because it is implementation specific, that is why
> administration queue should not be used, but some configuration register
> should be used.
> > Than you should propose a config register interface to post virtqueue
> descriptors that way for 30 device types!
> if so, leave it as undefined? A potential risk for device implantation?


> Then why must the admin vq?

Because administration commands and the admin vq do not require devices to implement thousands of registers, each of which must have a time-bound completion guarantee.
A large part of the industry, including SIOV devices led by Intel and others, is moving away from the register access mode.

To summarize, administration commands and the admin queue offer the following benefits.

1. Ability to do bulk data transfer between driver and device

2. Ability to parallelize the work within the driver and within the device, using a single or multiple virtqueues

3. No need to implement PCI read/write MMIO registers, which demand a low-latency response interval

4. Better utilization of the host CPU, as no one needs to poll on a device register for completion

5. Ability to handle variability in command completion by the device and ability to notify the driver

If this does not satisfy you, please refer to some of the past email discussions from the time the administration virtqueue was introduced.
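
As an illustration of points 2 and 5, the following toy model shows a driver keeping several commands outstanding on one queue while the device completes them in an arbitrary order. It is only a sketch: `ToyAdminQueue`, `migrate_all`, the `depth` parameter and the random completion order are all made up for this example and are not the admin virtqueue format or semantics from the specification.

```python
import random
from collections import deque


class ToyAdminQueue:
    """Hypothetical queue: holds up to `depth` outstanding commands."""

    def __init__(self, depth: int):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self.pending = {}   # command id -> member device id
        self.next_id = 0

    def post(self, member_id):
        """Driver side: returns a command id, or None if the queue is full."""
        if len(self.pending) >= self.depth:
            return None     # caller retries later
        cmd_id = self.next_id
        self.next_id += 1
        self.pending[cmd_id] = member_id
        return cmd_id

    def device_complete(self, rng):
        """Device side: completes any one outstanding command."""
        cmd_id = rng.choice(sorted(self.pending))
        return cmd_id, self.pending.pop(cmd_id)


def migrate_all(member_ids, depth, seed=0):
    """Issue one command per member device; return the completion order."""
    rng = random.Random(seed)
    queue = ToyAdminQueue(depth)
    todo = deque(member_ids)
    done = []
    while todo or queue.pending:
        # Keep the queue as full as possible before reaping a completion.
        while todo and queue.post(todo[0]) is not None:
            todo.popleft()
        _, member = queue.device_complete(rng)
        done.append(member)
    return done


order = migrate_all(range(10), depth=4)
print(sorted(order) == list(range(10)))  # every member device was served
```

With `depth=1` the model degenerates to one command at a time, which matches the earlier remark that a device may choose to execute a single AQ command at a time with fewer resources.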

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:21                                               ` Parav Pandit
@ 2023-09-12 13:03                                                 ` Zhu, Lingshan
  2023-09-12 13:43                                                   ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 13:03 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 5:21 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 2:33 PM
>> admin vq require fixed and dedicated resource to serve the VMs, the question
>> still remains, does is scale to server big amount of devices migration? how many
>> admin vqs do you need to serve 10 VMs, how many for 100? and so on? How to
>> scale?
>>
> Yes, it scales within the AQ and across multiple AQs.
> Please consult your board designers to know such limits for your device.
If scaling requires multiple AQs, then how many should a vendor provide for the
worst case?

I am tired of repeating the same questions.
>   
>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable time?
>> If not, how many exactly.
>>
> Yes, it can serve both 100 and 1000 VMs in reasonable time.
I am not sure. Is the AQ limitless? Can it serve thousands of VMs
in a reasonable time, say 300 ms?

If you say that requires multiple AQs, then how many should a vendor provide?

Don't say the board designer owns the risk.
>
>> And register does not need to scale, it resides on the VF and only serve
>> the VF.
>>
> Since its per VF, by nature it is linearly growing entity that the board design needs to support read and write with guaranteed timing.
> It clearly scaled poor than queue.
Please read my series. For example, we introduce a new SUSPEND bit in
the \field{device status}. Are there any scalability issues here?
>   
>> It does not reside on the PF to migrate the VFs.
> Hence it does not scale and cannot do parallel operation within the VF, unless each register is replicated.
Why does it not scale? It is a per-device facility.
Why do you need parallel operation against the LM facility?
That doesn't make a lot of sense.
>
> Using register of a queue for bulk data transfer is solved question when the virtio spec was born.
> I don’t see a point to discuss it.
> Snippet from spec: " As a device can have zero or more virtqueues for bulk data transport"
Where do you see that the series intends to transfer bulk data through registers?
>
>> VFs config space can use the device dedicated resource like the bandwidth.
>>
>> for AQ, still you need to reserve resource and how much?
> It depends on your board, please consult your board designer to know depending on the implementation.
>  From spec point of view, it should not be same as any other virtqueue.
So the vendor owns the risk of implementing AQ LM? Why should they have to?
>>> No. I do not agree. It can fail and very hard for board designers.
>>> AQs are more reliable way to transport bulk data in scalable manner for tens
>> of member devices.
>> Really? How often do you observe virtio config space fail?
> On Intel Icelake server we have seen it failing with 128 VFs.
> And device needs to do very weird things to support 1000+ VFs forever expanding config space, which is not the topic of this discussion anyway.
That is your setup problem.
>
>
>>>> Please allow me to provide an extreme example, is one single admin vq
>>>> limitless, that can serve hundreds to thousands of VMs migration?
>>> It is left to the device implementation. Just like RSS and multi queue support?
>>> Is one Q enough for 800Gbps to 10Mbps link?
>>> Answer is: Not the scope of specification, spec provide the framework to scale
>> this way, but not impose on the device.
>> Even if not support RSS or MQ, the device still can work with
>> performance overhead, not fail.
>>
> _work_ is subjective.
> The financial transaction (application) failed. Packeted worked.
> LM commands were successful, but it was not timely.
>
> Same same..
>   
>> Insufficient bandwidth & resource caused live migration fail is totally
>> different.
> Very abstract point and unrelated to administration commands.
It is your design that faces the problem.
>
>>>> If not, two or
>>>> three or what number?
>>> It really does not matter. Its wrong point to discuss here.
>>> Number of queues and command execution depends on the device
>> implementation.
>>> A financial transaction application can timeout when a device queuing delay
>> for virtio net rx queue is long.
>>> And we don’t put details about such things in specification.
>>> Spec takes the requirements and provides driver device interface to
>> implement and scale.
>>> I still don’t follow the motivation behind the question.
>>> Is your question: How many admin queues are needed to migrate N member
>> devices? If so, it is implementation specific.
>>> It is similar to how such things depend on implementation for 30 virtio device
>> types.
>>> And if are implying that because it is implementation specific, that is why
>> administration queue should not be used, but some configuration register
>> should be used.
>>> Than you should propose a config register interface to post virtqueue
>> descriptors that way for 30 device types!
>> if so, leave it as undefined? A potential risk for device implantation?
>
>> Then why must the admin vq?
> Because administration commands and admin vq does not impose devices to implement thousands of registers which must have time bound completion guarantee.
> The large part of industry including SIOV devices led by Intel and others are moving away from register access mode.
>
> To summarize, administration commands and queue offer following benefits.
>
> 1. Ability to do bulk data transfer between driver and device
>
> 2. Ability to parallelize the work within driver and within device within single or multiple virtqueues
>
> 3. Eliminates implementing PCI read/write MMIO registers which demand low latency response interval
>
> 4. Better utilize host cpu as no one needs to poll on the device register for completion
>
> 5. Ability to handle variability in command completion by device and ability to notify the driver
>
> If this does not satisfy you, please refer to some of the past email discussions during administration virtuqueue time.
I think you mixed up the facility and the implementation in my series;
please read it.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:03                                                 ` Zhu, Lingshan
@ 2023-09-12 13:43                                                   ` Parav Pandit
  2023-09-13  4:01                                                     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12 13:43 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 6:33 PM
> 
> On 9/12/2023 5:21 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 2:33 PM admin vq require fixed and
> >> dedicated resource to serve the VMs, the question still remains, does
> >> is scale to server big amount of devices migration? how many admin
> >> vqs do you need to serve 10 VMs, how many for 100? and so on? How to
> >> scale?
> >>
> > Yes, it scales within the AQ and across multiple AQs.
> > Please consult your board designers to know such limits for your device.
> scales require multiple AQs, then how many should a vendor provide for the
> worst case?
> 
> I am boring for the same repeating questions.
I said it scales within the AQ (and across AQs).
I have answered enough times, so I will stop answering the same repeated question.
Your repeated question is not helping anyone, as it is not in the scope of virtio.

If you think it is, please get it written first for RSS and MQ in the net section and post it for review.

> >
> >> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable
> time?
> >> If not, how many exactly.
> >>
> > Yes, it can serve both 100 and 1000 VMs in reasonable time.
> I am not sure, the aq is limitless? Can serve thousands of VMs in a reasonable
> time? Like in 300ms?
> 
Yes.

> If you say, that require multiple AQ, then how many should a vendor provide?
> 
I didn’t say multiple AQs must be used.
It is the same as NIC RQs.

> Don't say the board designer own the risks.

> >
> >> And register does not need to scale, it resides on the VF and only
> >> serve the VF.
> >>
> > Since its per VF, by nature it is linearly growing entity that the board design
> needs to support read and write with guaranteed timing.
> > It clearly scaled poor than queue.
> Please read my series. For example, we introduce a new bit SUSPEND in the
> \field{device status}, any scalability issues here?
That must behave like queue_reset: the device must acknowledge that it is suspended.
And that brings the scale issue.
On top of that, once the device is SUSPENDED, it cannot accept some other RESET_VQ command.

> >
> >> It does not reside on the PF to migrate the VFs.
> > Hence it does not scale and cannot do parallel operation within the VF, unless
> each register is replicated.
> Why its not scale? It is a per device facility.
Because the device needs to answer for each device out of some large-scale memory in order to fit within a response time.

> Why do you need parallel operation against the LM facility?
Because your downtime was 300msec for 1000 VMs.

> That doesn't make a lot of sense.
> >
> > Using register of a queue for bulk data transfer is solved question when the
> virtio spec was born.
> > I don’t see a point to discuss it.
> > Snippet from spec: " As a device can have zero or more virtqueues for bulk
> data transport"
> Where do you see the series intends to transfer bulk data through registers?
> >
> >> VFs config space can use the device dedicated resource like the bandwidth.
> >>
> >> for AQ, still you need to reserve resource and how much?
> > It depends on your board, please consult your board designer to know
> depending on the implementation.
> >  From spec point of view, it should not be same as any other virtqueue.
> so the vendor own the risk to implement AQ LM? Why they have to?
> >>> No. I do not agree. It can fail and very hard for board designers.
> >>> AQs are more reliable way to transport bulk data in scalable manner
> >>> for tens
> >> of member devices.
> >> Really? How often do you observe virtio config space fail?
> > On Intel Icelake server we have seen it failing with 128 VFs.
> > And device needs to do very weird things to support 1000+ VFs forever
> expanding config space, which is not the topic of this discussion anyway.
> That is your setup problem.
> >
> >
> >>>> Please allow me to provide an extreme example, is one single admin
> >>>> vq limitless, that can serve hundreds to thousands of VMs migration?
> >>> It is left to the device implementation. Just like RSS and multi queue
> support?
> >>> Is one Q enough for 800Gbps to 10Mbps link?
> >>> Answer is: Not the scope of specification, spec provide the
> >>> framework to scale
> >> this way, but not impose on the device.
> >> Even if not support RSS or MQ, the device still can work with
> >> performance overhead, not fail.
> >>
> > _work_ is subjective.
> > The financial transaction (application) failed. Packeted worked.
> > LM commands were successful, but it was not timely.
> >
> > Same same..
> >
> >> Insufficient bandwidth & resource caused live migration fail is
> >> totally different.
> > Very abstract point and unrelated to administration commands.
> It is your design facing the problem.
> >
> >>>> If not, two or
> >>>> three or what number?
> >>> It really does not matter. Its wrong point to discuss here.
> >>> Number of queues and command execution depends on the device
> >> implementation.
> >>> A financial transaction application can timeout when a device
> >>> queuing delay
> >> for virtio net rx queue is long.
> >>> And we don’t put details about such things in specification.
> >>> Spec takes the requirements and provides driver device interface to
> >> implement and scale.
> >>> I still don’t follow the motivation behind the question.
> >>> Is your question: How many admin queues are needed to migrate N
> >>> member
> >> devices? If so, it is implementation specific.
> >>> It is similar to how such things depend on implementation for 30
> >>> virtio device
> >> types.
> >>> And if are implying that because it is implementation specific, that
> >>> is why
> >> administration queue should not be used, but some configuration
> >> register should be used.
> >>> Than you should propose a config register interface to post
> >>> virtqueue
> >> descriptors that way for 30 device types!
> >> if so, leave it as undefined? A potential risk for device implantation?
> >
> >> Then why must the admin vq?
> > Because administration commands and admin vq does not impose devices to
> implement thousands of registers which must have time bound completion
> guarantee.
> > The large part of industry including SIOV devices led by Intel and others are
> moving away from register access mode.
> >
> > To summarize, administration commands and queue offer following benefits.
> >
> > 1. Ability to do bulk data transfer between driver and device
> >
> > 2. Ability to parallelize the work within driver and within device
> > within single or multiple virtqueues
> >
> > 3. Eliminates implementing PCI read/write MMIO registers which demand
> > low latency response interval
> >
> > 4. Better utilize host cpu as no one needs to poll on the device
> > register for completion
> >
> > 5. Ability to handle variability in command completion by device and
> > ability to notify the driver
> >
> > If this does not satisfy you, please refer to some of the past email discussions
> during administration virtuqueue time.
> I think you mixed up the facility and the implementation in my series, please
> read.
I don’t know what you are referring to. You asked "why AQ is must?" I answered above with what the AQ offers over a synchronous register.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:43                                                   ` Parav Pandit
@ 2023-09-13  4:01                                                     ` Zhu, Lingshan
  2023-09-13  4:12                                                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:01 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 9:43 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 6:33 PM
>>
>> On 9/12/2023 5:21 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 2:33 PM admin vq require fixed and
>>>> dedicated resource to serve the VMs, the question still remains, does
>>>> is scale to server big amount of devices migration? how many admin
>>>> vqs do you need to serve 10 VMs, how many for 100? and so on? How to
>>>> scale?
>>>>
>>> Yes, it scales within the AQ and across multiple AQs.
>>> Please consult your board designers to know such limits for your device.
>> scales require multiple AQs, then how many should a vendor provide for the
>> worst case?
>>
>> I am boring for the same repeating questions.
> I said it scales, within the AQ. (and across AQs).
> I have answered enough times, so I will stop on same repeated question.
> Your repeated question is not helping anyone as it is not in the scope of virtio.
>
> If you think it is, please get it written first for RSS and MQ in net section and post for review.
You missed the point of the question, and I agree there is no need to discuss this
anymore.
>
>>>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable
>> time?
>>>> If not, how many exactly.
>>>>
>>> Yes, it can serve both 100 and 1000 VMs in reasonable time.
>> I am not sure, the aq is limitless? Can serve thousands of VMs in a reasonable
>> time? Like in 300ms?
>>
> Yes.
Really? Limitless?
>
>> If you say, that require multiple AQ, then how many should a vendor provide?
>>
> I didn’t say multiple AQs must be used.
> It is same as NIC RQs.
Don't you agree that a single vq has its own performance limitations?
>
>> Don't say the board designer own the risks.
>>>> And register does not need to scale, it resides on the VF and only
>>>> serve the VF.
>>>>
>>> Since its per VF, by nature it is linearly growing entity that the board design
>> needs to support read and write with guaranteed timing.
>>> It clearly scaled poor than queue.
>> Please read my series. For example, we introduce a new bit SUSPEND in the
>> \field{device status}, any scalability issues here?
> That must behave like queue_reset, (it must get acknowledged from the device) that it is suspended.
> And that brings the scale issue.
The series says:
+When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.

And this has nothing to do with scale.
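
For illustration, a minimal sketch of the re-read rule quoted above, written in Python against a mock device. The SUSPEND value of 16, the MockDevice class, the suspend_device helper and the timeout parameters are all assumptions made up for this sketch; they are not taken from the series. The other status bit values are the ones the virtio specification already defines.

```python
import time

# Device status bits already defined by the virtio specification.
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8
# Assumed value, for illustration only; the series defines the real bit.
SUSPEND = 16


class MockDevice:
    """Hypothetical device model: SUSPEND shows up only after a few status reads."""

    def __init__(self, reads_until_suspended=3):
        self._status = ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK
        self._pending = None
        self._reads_left = reads_until_suspended

    def write_status(self, value):
        # A newly requested SUSPEND is held back until the device has "finished".
        if value & SUSPEND and not self._status & SUSPEND:
            self._pending = value
        else:
            self._status = value

    def read_status(self):
        if self._pending is not None:
            self._reads_left -= 1
            if self._reads_left <= 0:
                self._status = self._pending
                self._pending = None
        return self._status


def suspend_device(dev, timeout_s=1.0, poll_interval_s=0.001):
    """Set SUSPEND, then re-read device status until the bit is observed."""
    dev.write_status(dev.read_status() | SUSPEND)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if dev.read_status() & SUSPEND:
            return True
        time.sleep(poll_interval_s)
    return False  # the device never reported SUSPEND within the timeout
```

The loop touches only the status field of the one device being suspended, which is the per-device property argued here; how long a real device takes to report the bit is implementation specific and is not modelled.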
> On top of that once the device is SUSPENDED, it cannot accept some other RESET_VQ command.
So, as SiWei suggested, a new feature bit for vq reset will be introduced
in V2.
>
>>>> It does not reside on the PF to migrate the VFs.
>>> Hence it does not scale and cannot do parallel operation within the VF, unless
>> each register is replicated.
>> Why its not scale? It is a per device facility.
> Because the device needs to answer per device through some large scale memory to fit in a response time.
Again, it is a per-device facility; it is register based and serves
only the device itself.
And we do not plan to log dirty pages in the BAR.
>
>> Why do you need parallel operation against the LM facility?
> Because your downtime was 300msec for 1000 VMs.
The LM facility in this series is per-device; it only serves itself.
>
>> That doesn't make a lot of sense.
>>> Using register of a queue for bulk data transfer is solved question when the
>> virtio spec was born.
>>> I don’t see a point to discuss it.
>>> Snippet from spec: " As a device can have zero or more virtqueues for bulk
>> data transport"
>> Where do you see the series intends to transfer bulk data through registers?
>>>> VFs config space can use the device dedicated resource like the bandwidth.
>>>>
>>>> for AQ, still you need to reserve resource and how much?
>>> It depends on your board, please consult your board designer to know
>> depending on the implementation.
>>>   From spec point of view, it should not be same as any other virtqueue.
>> so the vendor own the risk to implement AQ LM? Why they have to?
>>>>> No. I do not agree. It can fail and very hard for board designers.
>>>>> AQs are more reliable way to transport bulk data in scalable manner
>>>>> for tens
>>>> of member devices.
>>>> Really? How often do you observe virtio config space fail?
>>> On Intel Icelake server we have seen it failing with 128 VFs.
>>> And device needs to do very weird things to support 1000+ VFs forever
>> expanding config space, which is not the topic of this discussion anyway.
>> That is your setup problem.
>>>
>>>>>> Please allow me to provide an extreme example, is one single admin
>>>>>> vq limitless, that can serve hundreds to thousands of VMs migration?
>>>>> It is left to the device implementation. Just like RSS and multi queue
>> support?
>>>>> Is one Q enough for 800Gbps to 10Mbps link?
>>>>> Answer is: Not the scope of specification, spec provide the
>>>>> framework to scale
>>>> this way, but not impose on the device.
>>>> Even if not support RSS or MQ, the device still can work with
>>>> performance overhead, not fail.
>>>>
>>> _work_ is subjective.
>>> The financial transaction (application) failed. Packeted worked.
>>> LM commands were successful, but it was not timely.
>>>
>>> Same same..
>>>
>>>> Insufficient bandwidth & resource caused live migration fail is
>>>> totally different.
>>> Very abstract point and unrelated to administration commands.
>> It is your design facing the problem.
>>>>>> If not, two or
>>>>>> three or what number?
>>>>> It really does not matter. Its wrong point to discuss here.
>>>>> Number of queues and command execution depends on the device
>>>> implementation.
>>>>> A financial transaction application can timeout when a device
>>>>> queuing delay
>>>> for virtio net rx queue is long.
>>>>> And we don’t put details about such things in specification.
>>>>> Spec takes the requirements and provides driver device interface to
>>>> implement and scale.
>>>>> I still don’t follow the motivation behind the question.
>>>>> Is your question: How many admin queues are needed to migrate N
>>>>> member
>>>> devices? If so, it is implementation specific.
>>>>> It is similar to how such things depend on implementation for 30
>>>>> virtio device
>>>> types.
>>>>> And if are implying that because it is implementation specific, that
>>>>> is why
>>>> administration queue should not be used, but some configuration
>>>> register should be used.
>>>>> Than you should propose a config register interface to post
>>>>> virtqueue
>>>> descriptors that way for 30 device types!
>>>> if so, leave it as undefined? A potential risk for device implantation?
>>>> Then why must the admin vq?
>>> Because administration commands and admin vq does not impose devices to
>> implement thousands of registers which must have time bound completion
>> guarantee.
>>> The large part of industry including SIOV devices led by Intel and others are
>> moving away from register access mode.
>>> To summarize, administration commands and queue offer following benefits.
>>>
>>> 1. Ability to do bulk data transfer between driver and device
>>>
>>> 2. Ability to parallelize the work within driver and within device
>>> within single or multiple virtqueues
>>>
>>> 3. Eliminates implementing PCI read/write MMIO registers which demand
>>> low latency response interval
>>>
>>> 4. Better utilize host cpu as no one needs to poll on the device
>>> register for completion
>>>
>>> 5. Ability to handle variability in command completion by device and
>>> ability to notify the driver
>>>
>>> If this does not satisfy you, please refer to some of the past email discussions
>> during administration virtuqueue time.
>> I think you mixed up the facility and the implementation in my series, please
>> read.
> I don’t know what you refer to. You asked "why AQ is must?" I answered above what AQ has to offer than some synchronous register.
Again, we are implementing facilities. V2 will include in-flight
descriptors and dirty page tracking. That works for LM.

>



^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:01                                                     ` Zhu, Lingshan
@ 2023-09-13  4:12                                                       ` Parav Pandit
  2023-09-13  4:20                                                         ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  4:12 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:31 AM
> 
> On 9/12/2023 9:43 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 6:33 PM
> >>
> >> On 9/12/2023 5:21 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 2:33 PM admin vq require fixed
> >>>> and dedicated resource to serve the VMs, the question still
> >>>> remains, does is scale to server big amount of devices migration?
> >>>> how many admin vqs do you need to serve 10 VMs, how many for 100?
> >>>> and so on? How to scale?
> >>>>
> >>> Yes, it scales within the AQ and across multiple AQs.
> >>> Please consult your board designers to know such limits for your device.
> >> scales require multiple AQs, then how many should a vendor provide
> >> for the worst case?
> >>
> >> I am boring for the same repeating questions.
> > I said it scales, within the AQ. (and across AQs).
> > I have answered enough times, so I will stop on same repeated question.
> > Your repeated question is not helping anyone as it is not in the scope of virtio.
> >
> > If you think it is, please get it written first for RSS and MQ in net section and
> post for review.
> You missed the point of the question and I agree no need to discuss this
> anymore.
Ok. thanks.

> >
> >>>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in
> >>>> reasonable
> >> time?
> >>>> If not, how many exactly.
> >>>>
> >>> Yes, it can serve both 100 and 1000 VMs in reasonable time.
> >> I am not sure, the aq is limitless? Can serve thousands of VMs in a
> >> reasonable time? Like in 300ms?
> >>
> > Yes.
> really? limitless?
> >
I answered yes to "Can serve thousands of VMs in a reasonable time? Like in 300ms?".
VQ depth defines the VQ's limit.

> >> If you say, that require multiple AQ, then how many should a vendor
> provide?
> >>
> > I didn’t say multiple AQs must be used.
> > It is same as NIC RQs.
> don't you agree a single vq has its own performance limitations?
For LM I don’t see the limitation.
The finite limit an AQ has is no different from a register write-and-poll that handles one entry at a time per device.

> In this series, it says:
> +When setting SUSPEND, the driver MUST re-read \field{device status} to
> ensure the SUSPEND bit is set.
> 
> And this is nothing to do with scale.
Hence, it brings to the register the same scale and QoS limitation that you claim may be present in the AQ.

And hence, as I responded earlier, when most things are not done through the BAR, there is no need to do suspend/resume via the BAR either.
And hence the mode setting command of [1] is just fine.

> > On top of that once the device is SUSPENDED, it cannot accept some other
> RESET_VQ command.
> so as SiWei suggested, there will be a new feature bit introduced in V2
> for vq reset.
A VQ cannot be RESET after the device reset, as you wrote.

> >
> >>>> It does not reside on the PF to migrate the VFs.
> >>> Hence it does not scale and cannot do parallel operation within the VF,
> unless
> >> each register is replicated.
> >> Why its not scale? It is a per device facility.
> > Because the device needs to answer per device through some large scale
> memory to fit in a response time.
> Again, it is a per-device facility, and it is register based serve the
> only one device itself.
> And we do not plan to log the dirty pages in bar.
Hence, there is no reason to put suspend/resume on the BAR either.
The mode setting admin command is just fine.

> >
> >> Why do you need parallel operation against the LM facility?
> > Because your downtime was 300msec for 1000 VMs.
> the LM facility in this series is per-device, it only severs itself.
And that single threading, including single-threaded per-VQ reset via a single register, won't scale.

> >
> >> That doesn't make a lot of sense.
> >>> Using register of a queue for bulk data transfer is solved question when the
> >> virtio spec was born.
> >>> I don’t see a point to discuss it.
> >>> Snippet from spec: " As a device can have zero or more virtqueues for bulk
> >> data transport"
> >> Where do you see the series intends to transfer bulk data through registers?
> >>>> VFs config space can use the device dedicated resource like the
> bandwidth.
> >>>>
> >>>> for AQ, still you need to reserve resource and how much?
> >>> It depends on your board, please consult your board designer to know
> >> depending on the implementation.
> >>>   From spec point of view, it should not be same as any other virtqueue.
> >> so the vendor own the risk to implement AQ LM? Why they have to?
> >>>>> No. I do not agree. It can fail and very hard for board designers.
> >>>>> AQs are more reliable way to transport bulk data in scalable manner
> >>>>> for tens
> >>>> of member devices.
> >>>> Really? How often do you observe virtio config space fail?
> >>> On Intel Icelake server we have seen it failing with 128 VFs.
> >>> And device needs to do very weird things to support 1000+ VFs forever
> >> expanding config space, which is not the topic of this discussion anyway.
> >> That is your setup problem.
> >>>
> >>>>>> Please allow me to provide an extreme example, is one single admin
> >>>>>> vq limitless, that can serve hundreds to thousands of VMs migration?
> >>>>> It is left to the device implementation. Just like RSS and multi queue
> >> support?
> >>>>> Is one Q enough for 800Gbps to 10Mbps link?
> >>>>> Answer is: Not the scope of specification, spec provide the
> >>>>> framework to scale
> >>>> this way, but not impose on the device.
> >>>> Even if not support RSS or MQ, the device still can work with
> >>>> performance overhead, not fail.
> >>>>
> >>> _work_ is subjective.
> >>> The financial transaction (application) failed. Packeted worked.
> >>> LM commands were successful, but it was not timely.
> >>>
> >>> Same same..
> >>>
> >>>> Insufficient bandwidth & resource caused live migration fail is
> >>>> totally different.
> >>> Very abstract point and unrelated to administration commands.
> >> It is your design facing the problem.
> >>>>>> If not, two or
> >>>>>> three or what number?
> >>>>> It really does not matter. Its wrong point to discuss here.
> >>>>> Number of queues and command execution depends on the device
> >>>> implementation.
> >>>>> A financial transaction application can timeout when a device
> >>>>> queuing delay
> >>>> for virtio net rx queue is long.
> >>>>> And we don’t put details about such things in specification.
> >>>>> Spec takes the requirements and provides driver device interface to
> >>>> implement and scale.
> >>>>> I still don’t follow the motivation behind the question.
> >>>>> Is your question: How many admin queues are needed to migrate N
> >>>>> member
> >>>> devices? If so, it is implementation specific.
> >>>>> It is similar to how such things depend on implementation for 30
> >>>>> virtio device
> >>>> types.
> >>>>> And if are implying that because it is implementation specific, that
> >>>>> is why
> >>>> administration queue should not be used, but some configuration
> >>>> register should be used.
> >>>>> Than you should propose a config register interface to post
> >>>>> virtqueue
> >>>> descriptors that way for 30 device types!
> >>>> if so, leave it as undefined? A potential risk for device implantation?
> >>>> Then why must the admin vq?
> >>> Because administration commands and admin vq does not impose devices
> to
> >> implement thousands of registers which must have time bound completion
> >> guarantee.
> >>> The large part of industry including SIOV devices led by Intel and others are
> >> moving away from register access mode.
> >>> To summarize, administration commands and queue offer following
> benefits.
> >>>
> >>> 1. Ability to do bulk data transfer between driver and device
> >>>
> >>> 2. Ability to parallelize the work within driver and within device
> >>> within single or multiple virtqueues
> >>>
> >>> 3. Eliminates implementing PCI read/write MMIO registers which demand
> >>> low latency response interval
> >>>
> >>> 4. Better utilize host cpu as no one needs to poll on the device
> >>> register for completion
> >>>
> >>> 5. Ability to handle variability in command completion by device and
> >>> ability to notify the driver
> >>>
> >>> If this does not satisfy you, please refer to some of the past email
> discussions
> >> during administration virtuqueue time.
> >> I think you mixed up the facility and the implementation in my series, please
> >> read.
> > I don’t know what you refer to. You asked "why AQ is must?" I answered
> above what AQ has to offer than some synchronous register.
> Again, we are implementing facilities, V2 will include inflgiht
> descriptors and dirty page tracking. That works for LM.

It can be named anything; what matters is how and where it is used.
So "facility" and "implementation" in your comment above are just abstract words.
I answered your question, "Why is the AQ a must?"

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:12                                                       ` Parav Pandit
@ 2023-09-13  4:20                                                         ` Zhu, Lingshan
  2023-09-13  4:36                                                           ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:20 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/13/2023 12:12 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:31 AM
>>
>> On 9/12/2023 9:43 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 6:33 PM
>>>>
>>>> On 9/12/2023 5:21 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 2:33 PM admin vq require fixed
>>>>>> and dedicated resource to serve the VMs, the question still
>>>>>> remains, does is scale to server big amount of devices migration?
>>>>>> how many admin vqs do you need to serve 10 VMs, how many for 100?
>>>>>> and so on? How to scale?
>>>>>>
>>>>> Yes, it scales within the AQ and across multiple AQs.
>>>>> Please consult your board designers to know such limits for your device.
>>>> scales require multiple AQs, then how many should a vendor provide
>>>> for the worst case?
>>>>
>>>> I am boring for the same repeating questions.
>>> I said it scales, within the AQ. (and across AQs).
>>> I have answered enough times, so I will stop on same repeated question.
>>> Your repeated question is not helping anyone as it is not in the scope of virtio.
>>>
>>> If you think it is, please get it written first for RSS and MQ in net section and
>> post for review.
>> You missed the point of the question and I agree no need to discuss this
>> anymore.
> Ok. thanks.
>
>>>>>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in
>>>>>> reasonable
>>>> time?
>>>>>> If not, how many exactly.
>>>>>>
>>>>> Yes, it can serve both 100 and 1000 VMs in reasonable time.
>>>> I am not sure, the aq is limitless? Can serve thousands of VMs in a
>>>> reasonable time? Like in 300ms?
>>>>
>>> Yes.
>> really? limitless?
> I answered yes for " Can serve thousands of VMs in reasonable time? Like in 300ms?"?
> VQ depth defines the VQ's limit.
That still sounds limitless, and I will stop arguing this point. As you
can see, if a queue REALLY could be limitless, we would not even need
multi-queue or RSS.
>
>>>> If you say, that require multiple AQ, then how many should a vendor
>> provide?
>>> I didn’t say multiple AQs must be used.
>>> It is same as NIC RQs.
>> don't you agree a single vq has its own performance limitations?
> For LM I don’t see the limitation.
> The finite limit an AQ has, such limitation is no different than some register write poll with one entry at a time per device.
See above; we are implementing per-device facilities.
>
>> In this series, it says:
>> +When setting SUSPEND, the driver MUST re-read \field{device status} to
>> ensure the SUSPEND bit is set.
>>
>> And this is nothing to do with scale.
> Hence, it is bringing same scale QOS limitation on register too that you claim may be present in the AQ.
>
> And hence, I responded earlier that when most things are not done through BAR, so there is no need to do suspend/resume via BAR either.
> And hence the mode setting command of [1] is just fine.
The BAR registers are mostly just "triggers".
>
>>> On top of that once the device is SUSPENDED, it cannot accept some other
>> RESET_VQ command.
>> so as SiWei suggested, there will be a new feature bit introduced in V2
>> for vq reset.
> VQ cannot be RESET after the device reset as you wrote.
It is device SUSPEND, not reset.
>
>>>>>> It does not reside on the PF to migrate the VFs.
>>>>> Hence it does not scale and cannot do parallel operation within the VF,
>> unless
>>>> each register is replicated.
>>>> Why its not scale? It is a per device facility.
>>> Because the device needs to answer per device through some large scale
>> memory to fit in a response time.
>> Again, it is a per-device facility, and it is register based serve the
>> only one device itself.
>> And we do not plan to log the dirty pages in bar.
> Hence, there is no reason to wrap suspend resume on the BAR either.
> The mode setting admin command is just fine.
They are device status bits.
>
>>>> Why do you need parallel operation against the LM facility?
>>> Because your downtime was 300msec for 1000 VMs.
>> the LM facility in this series is per-device, it only severs itself.
> And that single threading and single threading per VQ reset via single register wont scale.
It is a per-device facility; for example, it resides on the VF, not on the owner PF.
>
>>>> That doesn't make a lot of sense.
>>>>> Using register of a queue for bulk data transfer is solved question when the
>>>> virtio spec was born.
>>>>> I don’t see a point to discuss it.
>>>>> Snippet from spec: " As a device can have zero or more virtqueues for bulk
>>>> data transport"
>>>> Where do you see the series intends to transfer bulk data through registers?
>>>>>> VFs config space can use the device dedicated resource like the
>> bandwidth.
>>>>>> for AQ, still you need to reserve resource and how much?
>>>>> It depends on your board, please consult your board designer to know
>>>> depending on the implementation.
>>>>>    From spec point of view, it should not be same as any other virtqueue.
>>>> so the vendor own the risk to implement AQ LM? Why they have to?
>>>>>>> No. I do not agree. It can fail and very hard for board designers.
>>>>>>> AQs are more reliable way to transport bulk data in scalable manner
>>>>>>> for tens
>>>>>> of member devices.
>>>>>> Really? How often do you observe virtio config space fail?
>>>>> On Intel Icelake server we have seen it failing with 128 VFs.
>>>>> And device needs to do very weird things to support 1000+ VFs forever
>>>> expanding config space, which is not the topic of this discussion anyway.
>>>> That is your setup problem.
>>>>>>>> Please allow me to provide an extreme example, is one single admin
>>>>>>>> vq limitless, that can serve hundreds to thousands of VMs migration?
>>>>>>> It is left to the device implementation. Just like RSS and multi queue
>>>> support?
>>>>>>> Is one Q enough for 800Gbps to 10Mbps link?
>>>>>>> Answer is: Not the scope of specification, spec provide the
>>>>>>> framework to scale
>>>>>> this way, but not impose on the device.
>>>>>> Even if not support RSS or MQ, the device still can work with
>>>>>> performance overhead, not fail.
>>>>>>
>>>>> _work_ is subjective.
>>>>> The financial transaction (application) failed. Packeted worked.
>>>>> LM commands were successful, but it was not timely.
>>>>>
>>>>> Same same..
>>>>>
>>>>>> Insufficient bandwidth & resource caused live migration fail is
>>>>>> totally different.
>>>>> Very abstract point and unrelated to administration commands.
>>>> It is your design facing the problem.
>>>>>>>> If not, two or
>>>>>>>> three or what number?
>>>>>>> It really does not matter. Its wrong point to discuss here.
>>>>>>> Number of queues and command execution depends on the device
>>>>>> implementation.
>>>>>>> A financial transaction application can timeout when a device
>>>>>>> queuing delay
>>>>>> for virtio net rx queue is long.
>>>>>>> And we don’t put details about such things in specification.
>>>>>>> Spec takes the requirements and provides driver device interface to
>>>>>> implement and scale.
>>>>>>> I still don’t follow the motivation behind the question.
>>>>>>> Is your question: How many admin queues are needed to migrate N
>>>>>>> member
>>>>>> devices? If so, it is implementation specific.
>>>>>>> It is similar to how such things depend on implementation for 30
>>>>>>> virtio device
>>>>>> types.
>>>>>>> And if are implying that because it is implementation specific, that
>>>>>>> is why
>>>>>> administration queue should not be used, but some configuration
>>>>>> register should be used.
>>>>>>> Than you should propose a config register interface to post
>>>>>>> virtqueue
>>>>>> descriptors that way for 30 device types!
>>>>>> if so, leave it as undefined? A potential risk for device implantation?
>>>>>> Then why must the admin vq?
>>>>> Because administration commands and admin vq does not impose devices
>> to
>>>> implement thousands of registers which must have time bound completion
>>>> guarantee.
>>>>> The large part of industry including SIOV devices led by Intel and others are
>>>> moving away from register access mode.
>>>>> To summarize, administration commands and queue offer following
>> benefits.
>>>>> 1. Ability to do bulk data transfer between driver and device
>>>>>
>>>>> 2. Ability to parallelize the work within driver and within device
>>>>> within single or multiple virtqueues
>>>>>
>>>>> 3. Eliminates implementing PCI read/write MMIO registers which demand
>>>>> low latency response interval
>>>>>
>>>>> 4. Better utilize host cpu as no one needs to poll on the device
>>>>> register for completion
>>>>>
>>>>> 5. Ability to handle variability in command completion by device and
>>>>> ability to notify the driver
>>>>>
>>>>> If this does not satisfy you, please refer to some of the past email
>> discussions
>>>> during administration virtuqueue time.
>>>> I think you mixed up the facility and the implementation in my series, please
>>>> read.
>>> I don’t know what you refer to. You asked "why AQ is must?" I answered
>> above what AQ has to offer than some synchronous register.
>> Again, we are implementing facilities, V2 will include inflgiht
>> descriptors and dirty page tracking. That works for LM.
> It can be named under anything, what matters is how/where it is used?
> So "facility" and "implementation" in your above comment are just abstract word.
> I answered you "Why AQ is must"?
See above, and please feel free to reuse the basic facilities in your
AQ LM if you like.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:20                                                         ` Zhu, Lingshan
@ 2023-09-13  4:36                                                           ` Parav Pandit
  2023-09-14  8:19                                                             ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  4:36 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:51 AM


> > VQ depth defines the VQ's limit.
> still sounds like limitless and I will stop arguing this as you can see if there is
> REALLY a queue can be limitless, we even don't need Multi-queue or RSS.

If you see some value in a limitless queue, please add one.
I have not seen such a construct until now and don’t see the need for it.

> >
> >>>> If you say, that require multiple AQ, then how many should a vendor
> >> provide?
> >>> I didn’t say multiple AQs must be used.
> >>> It is same as NIC RQs.
> >> don't you agree a single vq has its own performance limitations?
> > For LM I don’t see the limitation.
> > The finite limit an AQ has, such limitation is no different than some register
> write poll with one entry at a time per device.
> see above, and we are implementing per device facilities.
> >
> >> In this series, it says:
> >> +When setting SUSPEND, the driver MUST re-read \field{device status}
> >> +to
> >> ensure the SUSPEND bit is set.
> >>
> >> And this is nothing to do with scale.
> > Hence, it is bringing same scale QOS limitation on register too that you claim
> may be present in the AQ.
> >
> > And hence, I responded earlier that when most things are not done through
> BAR, so there is no need to do suspend/resume via BAR either.
> > And hence the mode setting command of [1] is just fine.
> The bar registers are almost "triggers"
> >
> >>> On top of that once the device is SUSPENDED, it cannot accept some
> >>> other
> >> RESET_VQ command.
> >> so as SiWei suggested, there will be a new feature bit introduced in
> >> V2 for vq reset.
> > VQ cannot be RESET after the device reset as you wrote.
> It is device SUSPEND, not reset.
> >
Suspend means suspend in the plain English sense.
After that, the device cannot accept more synchronous commands and is not supposed to respond.

> >>>>>> It does not reside on the PF to migrate the VFs.
> >>>>> Hence it does not scale and cannot do parallel operation within
> >>>>> the VF,
> >> unless
> >>>> each register is replicated.
> >>>> Why its not scale? It is a per device facility.
> >>> Because the device needs to answer per device through some large
> >>> scale
> >> memory to fit in a response time.
> >> Again, it is a per-device facility, and it is register based serve
> >> the only one device itself.
> >> And we do not plan to log the dirty pages in bar.
> > Hence, there is no reason to wrap suspend resume on the BAR either.
> > The mode setting admin command is just fine.
> They are device status bits.
And they don't have to be.

> >
> >>>> Why do you need parallel operation against the LM facility?
> >>> Because your downtime was 300msec for 1000 VMs.
> >> the LM facility in this series is per-device, it only severs itself.
> > And that single threading and single threading per VQ reset via single register
> wont scale.
> it is per-device facility, for example, on the VF, not the owner PF.
What I have repeatedly asked, and you have never answered, is how such a queue can work after device suspend.
Such a weird device bifurcation is not supported by PCI and should not be done in virtio.

> see above and please feel free to reuse the basic facilities if you like in your AQ
> LM
The whole attitude of "we ..." versus "your" LM is simply wrong.
Please work towards a collaborative design in the technical committee.
What you want to repeat was already posted, so please take some time to review and use it. If you cannot, describe why it is not useful.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:36                                                           ` Parav Pandit
@ 2023-09-14  8:19                                                             ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-14  8:19 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/13/2023 12:36 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:51 AM
>
>>> VQ depth defines the VQ's limit.
>> still sounds like limitless and I will stop arguing this as you can see if there is
>> REALLY a queue can be limitless, we even don't need Multi-queue or RSS.
> If you see some value in limitless queue, please add one.
> I have not seen such construct until now and don’t see the need for it.
It is you who considers the admin vq limitless, not me, and I don't agree
with that.
I will stop arguing this; the point is clear.
>
>>>>>> If you say, that require multiple AQ, then how many should a vendor
>>>> provide?
>>>>> I didn’t say multiple AQs must be used.
>>>>> It is same as NIC RQs.
>>>> don't you agree a single vq has its own performance limitations?
>>> For LM I don’t see the limitation.
>>> The finite limit an AQ has, such limitation is no different than some register
>> write poll with one entry at a time per device.
>> see above, and we are implementing per device facilities.
>>>> In this series, it says:
>>>> +When setting SUSPEND, the driver MUST re-read \field{device status}
>>>> +to
>>>> ensure the SUSPEND bit is set.
>>>>
>>>> And this is nothing to do with scale.
>>> Hence, it is bringing same scale QOS limitation on register too that you claim
>> may be present in the AQ.
>>> And hence, I responded earlier that when most things are not done through
>> BAR, so there is no need to do suspend/resume via BAR either.
>>> And hence the mode setting command of [1] is just fine.
>> The bar registers are almost "triggers"
>>>>> On top of that once the device is SUSPENDED, it cannot accept some
>>>>> other
>>>> RESET_VQ command.
>>>> so as SiWei suggested, there will be a new feature bit introduced in
>>>> V2 for vq reset.
>>> VQ cannot be RESET after the device reset as you wrote.
>> It is device SUSPEND, not reset.
> Suspend means suspend of English language.
> It cannot accept more synchronous commands after that and not supposed to respond.
Please read the series; patch 2/5 describes the SUSPEND behaviors.
>
>>>>>>>> It does not reside on the PF to migrate the VFs.
>>>>>>> Hence it does not scale and cannot do parallel operation within
>>>>>>> the VF,
>>>> unless
>>>>>> each register is replicated.
>>>>>> Why its not scale? It is a per device facility.
>>>>> Because the device needs to answer per device through some large
>>>>> scale
>>>> memory to fit in a response time.
>>>> Again, it is a per-device facility, and it is register based serve
>>>> the only one device itself.
>>>> And we do not plan to log the dirty pages in bar.
>>> Hence, there is no reason to wrap suspend resume on the BAR either.
>>> The mode setting admin command is just fine.
>> They are device status bits.
> And it doesn't have to be.
I don't get your comment.
Do you mean there should not be device status bits?
Are you challenging even DRIVER_OK as unreasonable?
>
>>>>>> Why do you need parallel operation against the LM facility?
>>>>> Because your downtime was 300msec for 1000 VMs.
>>>> the LM facility in this series is per-device, it only severs itself.
>>> And that single threading and single threading per VQ reset via single register
>> wont scale.
>> it is per-device facility, for example, on the VF, not the owner PF.
> And I repeatedly explained you that you never answered, is how such queue can work after device suspend.
> A weird device bifurcation is not supported by pci and not to be done in virtio.
Didn't you find the answer in my comments?

I repeated several times:
1) As described in this series, once SUSPEND is set, the device should
present a stabilized config space.
2) The device freezes both its control path and its data path, so we don't
expect any queue to be functional from SUSPEND until !SUSPEND.
3) But device status remains operational, because we may need to recover
from a failed LM or cancel the LM process.
4) We will introduce a new feature bit to allow resetting vqs after SUSPEND.
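
As a rough illustration of points 1) to 3) and of the re-read rule quoted
earlier ("the driver MUST re-read device status to ensure the SUSPEND bit
is set"), the driver-side handshake could be sketched in C as below. This is
only a sketch against a mock device: the SUSPEND bit value, struct
mock_device, and all helper names are hypothetical and are not taken from
the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* DRIVER_OK is the existing virtio device status bit. */
#define VIRTIO_STATUS_DRIVER_OK 0x04u
/* Assumed value for the proposed SUSPEND bit; illustrative only. */
#define VIRTIO_STATUS_SUSPEND   0x10u

/* Hypothetical mock of one device's status register. */
struct mock_device {
    uint8_t  status;
    unsigned polls_until_suspended; /* status reads before SUSPEND shows up */
    bool     suspend_requested;
};

/* Mock register write: a SUSPEND request is latched, not reported at once. */
static void write_status(struct mock_device *dev, uint8_t val)
{
    if ((val & VIRTIO_STATUS_SUSPEND) &&
        !(dev->status & VIRTIO_STATUS_SUSPEND)) {
        dev->suspend_requested = true;
        val = (uint8_t)(val & ~VIRTIO_STATUS_SUSPEND);
    } else if (!(val & VIRTIO_STATUS_SUSPEND)) {
        dev->suspend_requested = false; /* resume, or cancel a pending suspend */
    }
    dev->status = val;
}

/* Mock register read: the device reports SUSPEND once it has quiesced. */
static uint8_t read_status(struct mock_device *dev)
{
    if (dev->suspend_requested) {
        if (dev->polls_until_suspended == 0) {
            dev->status |= VIRTIO_STATUS_SUSPEND;
            dev->suspend_requested = false;
        } else {
            dev->polls_until_suspended--;
        }
    }
    return dev->status;
}

/* Driver side: set SUSPEND, then re-read status until the bit is visible. */
static bool driver_suspend(struct mock_device *dev, unsigned max_polls)
{
    write_status(dev, (uint8_t)(read_status(dev) | VIRTIO_STATUS_SUSPEND));
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_status(dev) & VIRTIO_STATUS_SUSPEND)
            return true;
    }
    return false; /* caller may cancel by clearing SUSPEND */
}

/* Driver side: clear SUSPEND, e.g. to cancel or recover from a failed LM. */
static void driver_resume(struct mock_device *dev)
{
    write_status(dev, (uint8_t)(read_status(dev) & ~VIRTIO_STATUS_SUSPEND));
}
```

The only register touched is device status, which is why it has to stay
operational while the queues are frozen; a real driver would also bound the
polling by time rather than by a simple read count.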

Where do you see that we expect the queue to work after SUSPEND?

Clear now?
>
>> see above and please feel free to reuse the basic facilities if you like in your AQ
>> LM
> The whole attitude that "We .." and use in "your" LM is just simply wrong.
why wrong?
> Please work towards collaborative design in technical committee.
This is what I am doing now, no? Otherwise, why would I be talking to you?
> What you want to repeat was already posted, so take some time to review and utilize. If not, describe why it is not useful.
EOM
>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  9:32                             ` Zhu, Lingshan
  2023-09-11 10:21                               ` Parav Pandit
@ 2023-09-11 11:50                               ` Parav Pandit
  2023-09-12  3:43                                 ` Jason Wang
  2023-09-12  3:48                                 ` Zhu, Lingshan
  1 sibling, 2 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-11 11:50 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 3:03 PM

> By the way, do you see anything we need to improve in this series?

The admin commands for passthrough devices in [1] are a comprehensive proposal covering all the aspects.

To me, [1] is a superset that covers all the needed functionality and downtime aspects.

I plan to improve [1] with v1 this week by extending device context and addressing other review comments.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11 11:50                               ` Parav Pandit
@ 2023-09-12  3:43                                 ` Jason Wang
  2023-09-12  5:50                                   ` Parav Pandit
  2023-09-12  3:48                                 ` Zhu, Lingshan
  1 sibling, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-12  3:43 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Mon, Sep 11, 2023 at 7:50 PM Parav Pandit <parav@nvidia.com> wrote:
>
> > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > Sent: Monday, September 11, 2023 3:03 PM
>
> > By the way, do you see anything we need to improve in this series?
>
> Admin commands for passthrough devices of [1] is comprehensive proposal covering all the aspects.

What do you mean by "all the aspects"?

Of course it can't handle nesting well: passthrough doesn't work when
your hardware has N levels of abstraction but the nesting is M levels
deep. Trap and emulation is a must.

And exposing the whole device to the guest drivers will have security
implications, your proposal has demonstrated that you need a
workaround for FLR at least.

For non-standard devices we have no choice other than passthrough,
but for standard devices we have other choices.

Thanks

>
> To me [1] is superset work that covers all needed functionality and downtime aspects.
>
> I plan to improve [1] with v1 this week by extending device context and addressing other review comments.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  3:43                                 ` Jason Wang
@ 2023-09-12  5:50                                   ` Parav Pandit
  2023-09-13  4:44                                     ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  5:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 12, 2023 9:14 AM
> 
> On Mon, Sep 11, 2023 at 7:50 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Monday, September 11, 2023 3:03 PM
> >
> > > By the way, do you see anything we need to improve in this series?
> >
> > Admin commands for passthrough devices of [1] is comprehensive proposal
> covering all the aspects.
> 
> What do you mean by "all the aspects"?
They are covered in the proposal cover letter:
state migration, P2P, dirty page tracking, lower downtime, FLR, device reset, and no extra mediation requirement.

> 
> Of course it can't handle nesting well, passthrough doesn't work when your
> hardware has N levels abstractions but nesting is M levels. Trap and emulation
> is a must.
One can delegate the work to another VF for the purpose of nesting.

One can build infinite levels of nesting to avoid passthrough, but in the end user applications remain slow.
So when N and M are > 1, one can use software-based emulation anyway.

> 
> And exposing the whole device to the guest drivers will have security
> implications, your proposal has demonstrated that you need a workaround for
There are no security implications in passthrough.

> FLR at least.
It is actually the opposite.
FLR is supported with the proposal without any workarounds and mediation.

> 
> For non standard device we don't have choices other than passthrough, but for
> standard devices we have other choices.

Passthrough is a basic requirement that we will be fulfilling.
If one wants to do special nesting, maybe, there.
If both sets of commands can converge, good; if not, they are orthogonal requirements.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  5:50                                   ` Parav Pandit
@ 2023-09-13  4:44                                     ` Jason Wang
  2023-09-13  6:05                                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-13  4:44 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Tue, Sep 12, 2023 at 1:50 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> One can delegate the work to another VF for the purpose of nesting.
>
> One can build infinite level of nesting to not do passthrough, at the end user applications remains slow.

We are talking about nested virtualization, not nested emulation. I
won't repeat the definition of virtualization, but no matter how many
levels of nesting there are, the hypervisor will try hard to let the
application run natively most of the time; otherwise it is not nested
virtualization at all.

Nested virtualization is supported by all major cloud vendors;
please read the relevant documentation for the performance
implications. The virtio community is not the right place to debate
whether nesting is useful. We need to make sure the datapath can be
assigned to any nesting layer without losing fundamental facilities
like migration.

> So for such N and M being > 1, one can use software base emulation anyway.

No, only the control path is trapped, the datapath is still passthrough.

>
> >
> > And exposing the whole device to the guest drivers will have security
> > implications, your proposal has demonstrated that you need a workaround for
> There is no security implications in passthrough.

How can you prove this or is it even possible for you to prove this?
You expose all device details to guests (especially the transport
specific details), the attack surface is increased in this way.

What's more, a simple passthrough may lose the chance to work around
hardware errata, and you will eventually get back to trap and
emulation.

>
> > FLR at least.
> It is actually the opposite.
> FLR is supported with the proposal without any workarounds and mediation.

It's an obvious drawback, not an advantage. And it's not a must for
live migration to work. You need to prove that FLR doesn't conflict
with live migration, and not only FLR but also all the other
PCI facilities. Another example is P2P, and what's next? As more
features are added to the PCI spec, you will have endless work
auditing the possible conflicts with passthrough-based live
migration.

>
> >
> > For non standard device we don't have choices other than passthrough, but for
> > standard devices we have other choices.
>
> Passthrough is basic requirement that we will be fulfilling.

It has several drawbacks that I would not like to repeat. We all know
even for VFIO, it requires a trap instead of a complete passthrough.

> If one wants to do special nesting, may be, there.

Nesting is not special. Go and see how it is supported by major cloud
vendors and you will get the answer. Introducing an interface in
virtio that is hard to virtualize is even worse than writing a
compiler that cannot bootstrap itself.

Thanks

> If both commands can converge its good, if not, they are orthogonal requirements.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:44                                     ` Jason Wang
@ 2023-09-13  6:05                                       ` Parav Pandit
  2023-09-14  3:11                                         ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  6:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, September 13, 2023 10:14 AM
> To: Parav Pandit <parav@nvidia.com>

> > One can build infinite level of nesting to not do passthrough, at the end user
> applications remains slow.
> 
> We are talking about nested virtualization but nested emulation. I won't repeat
> the definition of virtualization but no matter how much level of nesting, the
> hypervisor will try hard to let the application run natively for most of the time,
> otherwise it's not the nested virtualization at all.
> 
> Nested virtualization has been supported by all major cloud vendors, please
> read the relevant documentation for the performance implications. Virtio
> community is not the correct place to debate whether a nest is useful. We need
> to make sure the datapath could be assigned to any nest layers without losing
> any fundamental facilities like migration.
> 
I am not debating. You or Lingshan claim or imply that mediation is the only way to progress.
And for sure virtio does not need to always live in the dark shadow of mediation.
For the nesting use case, sure, one can use a mediation-related mode.

So mediation alone is not the direction.

> > So for such N and M being > 1, one can use software base emulation anyway.
> 
> No, only the control path is trapped, the datapath is still passthrough.
> 
Again, it depends on the use case.

> >
> > >
> > > And exposing the whole device to the guest drivers will have
> > > security implications, your proposal has demonstrated that you need
> > > a workaround for
> > There is no security implications in passthrough.
> 
> How can you prove this or is it even possible for you to prove this?
Huh, when you claim that it is not secure, please point out exactly what is not secure.
Please take it up with PCI SIG and file a CVE with PCI SIG.

> You expose all device details to guests (especially the transport specific details),
> the attack surface is increased in this way.
One can say it is the opposite.
Attack surface is increased in hypervisor due to mediation poking at everything controlled by the guest.


> 
> What's more, a simple passthrough may lose the chance to workaround
> hardware erratas and you will finally get back to the trap and emulation.
Hardware errata are not the starting point for building the software stack and spec.
What you imply is that one must never use the vfio stack, one must not use vcpu acceleration, and everything must be emulated.

The same hardware errata argument applies to the data path too.
One should not implement in hw...

I disagree with such argument.

You can say nesting is a requirement for some use cases, so the spec should support it without blocking the passthrough mode.
Then it is a fair discussion.

I will not debate further on passthrough vs control path mediation as an either-or approach.

> 
> >
> > > FLR at least.
> > It is actually the opposite.
> > FLR is supported with the proposal without any workarounds and mediation.
> 
> It's an obvious drawback but not an advantage. And it's not a must for live
> migration to work. You need to prove the FLR doesn't conflict with the live
> migration, and it's not only FLR but also all the other PCI facilities. 
I don’t know what you mean by prove. It is already clear from the proposal that FLR does not interfere with the rest of the device migration infrastructure.
You should read [1].

> one other
> example is P2P and what's the next? As more features were added to the PCI
> spec, you will have endless work in auditing the possible conflict with the
> passthrough based live migration.
> 
This drawback applies equally to the mediation route, where one needs to do more than audit because the mediation layer has to be extended.
So each method has its pros and cons. One suits one use case, the other suits another use case.
Therefore, again, claiming that the mediation approach is the only way to progress is incorrect.

In fact, audit is still better than mediation because most audits are read-only work, as opposed to endlessly extending trapping and adding support in the core stack.
Again, it is a choice that the user makes with the tradeoff.

> >
> > >
> > > For non standard device we don't have choices other than
> > > passthrough, but for standard devices we have other choices.
> >
> > Passthrough is basic requirement that we will be fulfilling.
> 
> It has several drawbacks that I would not like to repeat. We all know even for
> VFIO, it requires a trap instead of a complete passthrough.
> 
Sure. Both have pros and cons.
And both can co-exist.

> > If one wants to do special nesting, may be, there.
> 
> Nesting is not special. Go and see how it is supported by major cloud vendors
> and you will get the answer. Introducing an interface in virtio that is hard to be
> virtualized is even worse than writing a compiler that can not do bootstrap
> compilation.
We checked with more than two major cloud vendors; passthrough suffices for their use cases and they are not doing nesting.
Other virtio vendors would also like to support native devices. So again, please do not portray nesting as the only thing, or claim that passthrough must not be done.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  6:05                                       ` Parav Pandit
@ 2023-09-14  3:11                                         ` Jason Wang
  2023-09-17  5:22                                           ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-14  3:11 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, September 13, 2023 10:14 AM
> > To: Parav Pandit <parav@nvidia.com>
>
> > > One can build infinite level of nesting to not do passthrough, at the end user
> > applications remains slow.
> >
> > We are talking about nested virtualization but nested emulation. I won't repeat
> > the definition of virtualization but no matter how much level of nesting, the
> > hypervisor will try hard to let the application run natively for most of the time,
> > otherwise it's not the nested virtualization at all.
> >
> > Nested virtualization has been supported by all major cloud vendors, please
> > read the relevant documentation for the performance implications. Virtio
> > community is not the correct place to debate whether a nest is useful. We need
> > to make sure the datapath could be assigned to any nest layers without losing
> > any fundamental facilities like migration.
> >
> I am not debating. You or Lingshan claim or imply that mediation is the only way to progress.

Let me correct your terminology again. It's "trap and emulation". It
means the workload runs mostly native but sometimes is trapped by the
hypervisor.
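
As a rough illustration of the split described above (a hypothetical sketch: the class names, the `STATUS_REG` offset, and the trap counter are assumptions made for this example and are not taken from the virtio spec or any hypervisor), control-path accesses can be modeled as trapped while datapath accesses reach the device directly:

```python
# Toy model of "trap and emulation": control path trapped, datapath passthrough.
# All names and values here are hypothetical and only serve the illustration.

STATUS_REG = 0x14  # assumed register offset, chosen arbitrarily


class Device:
    """Stand-in for a device with a status register and one queue."""

    def __init__(self):
        self.status = 0
        self.ring = []


class Hypervisor:
    """Intercepts control-path accesses; never sees datapath traffic."""

    def __init__(self, device):
        self.device = device
        self.trapped = 0

    def write_config(self, offset, value):
        # Control path: the access is trapped and emulated on behalf of the guest.
        self.trapped += 1
        if offset == STATUS_REG:
            self.device.status = value


class Guest:
    def __init__(self, hypervisor, device):
        self.hv = hypervisor
        self.dev = device  # the datapath is assigned directly to the guest

    def set_status(self, value):
        # Goes through the hypervisor (trapped).
        self.hv.write_config(STATUS_REG, value)

    def send(self, buf):
        # Goes straight to the device (not trapped).
        self.dev.ring.append(buf)


dev = Device()
hv = Hypervisor(dev)
guest = Guest(hv, dev)

guest.set_status(0x0F)
for i in range(1000):
    guest.send(i)

print(hv.trapped, len(dev.ring))
```

In this toy model a single status write is trapped, while the 1000 buffer submissions bypass the hypervisor entirely.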

And it's not the only way. It's the starting point, since the entire
current virtio spec is built upon this methodology.

> And for sure virtio do not need to live in the dark shadow of mediation always.

99% of virtio devices are implemented in this way (which is what you
call dark and shadow) now.

> For nesting use case sure one can do mediation related mode.
>
> So only mediation is not the direction.

CPU and MMU virtualization were all built in this way.

>
> > > So for such N and M being > 1, one can use software base emulation anyway.
> >
> > No, only the control path is trapped, the datapath is still passthrough.
> >
> Again, it depends on the use case.

No matter the use case, the definition and methodology of
virtualization still stand.

>
> > >
> > > >
> > > > And exposing the whole device to the guest drivers will have
> > > > security implications, your proposal has demonstrated that you need
> > > > a workaround for
> > > There is no security implications in passthrough.
> >
> > How can you prove this or is it even possible for you to prove this?
> Huh, when you claim that it is not secure, please point out exactly what is not secure.
> Please take with PCI SIG and file CVE to PCI sig.

I am saying it has security implications. That is why you need to
explain why you think it doesn't. What's more, the implications
obviously have nothing to do with PCI SIG; they concern a vendor's
virtio hardware implementation.

>
> > You expose all device details to guests (especially the transport specific details),
> > the attack surface is increased in this way.
> One can say it is the opposite.
> Attack surface is increased in hypervisor due to mediation poking at everything controlled by the guest.
>

We all know such a stack has been widely used for decades. But you
want to say your new stack is much more secure than this?

>
> >
> > What's more, a simple passthrough may lose the chance to workaround
> > hardware erratas and you will finally get back to the trap and emulation.
> Hardware errata's is not the starting point to build the software stack and spec.

It's not the starting point. But it's definitely something that needs
to be considered; go and look at the kernel code (especially the KVM
part) and you will get the answer.

> What you imply is, one must never use vfio stack, one must not use vcpu acceleration and everything must be emulated.

Did I say so? Trap and emulation is the common methodology used in KVM
and VFIO. And if you want to replace it with a complete passthrough,
you need to prove your method can work.

>
> Same argument of hardware errata applied to data path too.

Does anything make the datapath different? Xen used to fall back to
shadow page tables to work around hardware TDP errata in the past.

> One should not implement in hw...
>
> I disagree with such argument.

It's not my argument.

>
> You can say nesting is requirement for some use cases, so spec should support it without blocking the passthrough mode.
> Then it is fair discussion.
>
> I will not debate further on passthrough vs control path mediation as either_or approach.
>
> >
> > >
> > > > FLR at least.
> > > It is actually the opposite.
> > > FLR is supported with the proposal without any workarounds and mediation.
> >
> > It's an obvious drawback but not an advantage. And it's not a must for live
> > migration to work. You need to prove the FLR doesn't conflict with the live
> > migration, and it's not only FLR but also all the other PCI facilities.
> I don’t know what you mean by prove. It is already clear from the proposal FLR is not messing with rest of the device migration infrastructure.
> You should read [1].

I don't think you answered my question in that thread.

>
> > one other
> > example is P2P and what's the next? As more features were added to the PCI
> > spec, you will have endless work in auditing the possible conflict with the
> > passthrough based live migration.
> >
> This drawback equally applies to mediation route where one need to do more than audit where the mediation layer to be extended.

No, for trap and emulation we don't need to do that. We only do
datapath assignments.

> So each method has its pros and cons. One suits one use case, other suits other use case.
> Therefore, again attempting to claim that only mediation approach is the only way to progress is incorrect.

I never said anything like this; it is your proposal that mandates
migration with admin commands. Could you please read what is proposed
in this series carefully?

On top of this series, you can build your admin commands easily. But
there's nothing that can be done on top of your proposal.

>
> In fact audit is still better than mediation because most audits are read only work as opposed to endlessly extending trapping and adding support in core stack.

One reality that you constantly ignore is that such trapping and
device models have been widely used by a lot of cloud vendors for more
than a decade.

> Again, it is a choice that user make with the tradeoff.
>
> > >
> > > >
> > > > For non standard device we don't have choices other than
> > > > passthrough, but for standard devices we have other choices.
> > >
> > > Passthrough is basic requirement that we will be fulfilling.
> >
> > It has several drawbacks that I would not like to repeat. We all know even for
> > VFIO, it requires a trap instead of a complete passthrough.
> >
> Sure. Both has pros and cons.
> And both can co-exist.

I don't see how it can co-exist with your proposal. I can see how
admin commands can co-exist on top of this series.

>
> > > If one wants to do special nesting, may be, there.
> >
> > Nesting is not special. Go and see how it is supported by major cloud vendors
> > and you will get the answer. Introducing an interface in virtio that is hard to be
> > virtualized is even worse than writing a compiler that can not do bootstrap
> > compilation.
> We checked with more than two major cloud vendors and passthrough suffice their use cases and they are not doing nesting.
> And other virtio vendor would also like to support native devices. So again, please do not portray that nesting is the only thing and passthrough must not be done.

Where did I say passthrough must not be done? I'm saying you need to
justify your proposal instead of simply saying "hey, you are wrong".

Again, nesting is not the only issue; the key point is that it's
partial and not self-contained.

Thanks

>



^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14  3:11                                         ` Jason Wang
@ 2023-09-17  5:22                                           ` Parav Pandit
  2023-09-19  4:35                                             ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-17  5:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:41 AM
> 
> On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > To: Parav Pandit <parav@nvidia.com>
> >
> > > > One can build infinite level of nesting to not do passthrough, at
> > > > the end user
> > > applications remains slow.
> > >
> > > We are talking about nested virtualization but nested emulation. I
> > > won't repeat the definition of virtualization but no matter how much
> > > level of nesting, the hypervisor will try hard to let the
> > > application run natively for most of the time, otherwise it's not the nested
> virtualization at all.
> > >
> > > Nested virtualization has been supported by all major cloud vendors,
> > > please read the relevant documentation for the performance
> > > implications. Virtio community is not the correct place to debate
> > > whether a nest is useful. We need to make sure the datapath could be
> > > assigned to any nest layers without losing any fundamental facilities like
> migration.
> > >
> > I am not debating. You or Lingshan claim or imply that mediation is the only
> way to progress.
> 
> Let me correct your temiology again. It's "trap and emulation" . It means the
> workload runs mostly native but sometimes is trapped by the hypervisor.
>
 
> And it's not the only way. It's the start point since all current virtio spec is built
> upon this methodology.
Current spec is not the steering point to define new methods.
So we will build the spec infra to support passthrough.

Mediation/trap-emulation, where the hypervisor is involved, is a second use case, the one you are addressing.

Hence, the two are not mutually exclusive,
and we should not debate that anymore.

> 
> > And for sure virtio do not need to live in the dark shadow of mediation always.
> 
> 99% of virtio devices are implemented in this way (which is what you call dark
> and shadow) now.
> 
What I am saying is that one should not say mediation/trap-emulation is the only way for virtio.
So let passthrough device migration progress.

> > For nesting use case sure one can do mediation related mode.
> >
> > So only mediation is not the direction.
> 
> CPU and MMU virtualization were all built in this way.
> 
Not anymore. Both of them have vcpus and viommu where many things are not trapped.
So as I said, both have pros and cons, and users will pick what fits their needs and use case.

> >
> > > > So for such N and M being > 1, one can use software base emulation
> anyway.
> > >
> > > No, only the control path is trapped, the datapath is still passthrough.
> > >
> > Again, it depends on the use case.
> 
> No matter what use case, the definition and methodology of virtualization
> stands still.
> 
I will stop debating this because the core technical question is not answered.
I don’t see a technology available that virtio can utilize for it.
That is, an interface that can work without interfering with device status and FLR while device migration is ongoing.
Hence, the methodologies for passthrough and mediation/trap-emulation are fundamentally different.
And that is just fine.

> >
> > > >
> > > > >
> > > > > And exposing the whole device to the guest drivers will have
> > > > > security implications, your proposal has demonstrated that you
> > > > > need a workaround for
> > > > There is no security implications in passthrough.
> > >
> > > How can you prove this or is it even possible for you to prove this?
> > Huh, when you claim that it is not secure, please point out exactly what is not
> secure.
> > Please take with PCI SIG and file CVE to PCI sig.
> 
> I am saying it has security implications. That is why you need to explain why you
> think it doesn't. What's more, the implications are obviously nothing related to
> PCI SIG but a vendor virtio hardware implementation.
> 
PCI passthrough for virtio member devices and non-virtio devices with P2P, and their interaction, is already there in the VM.
Device migration is not adding or removing anything, nor touching any security aspect of it,
because it does not need to.
Device migration is making sure that it continues to exist.

> >
> > > You expose all device details to guests (especially the transport
> > > specific details), the attack surface is increased in this way.
> > One can say it is the opposite.
> > Attack surface is increased in hypervisor due to mediation poking at
> everything controlled by the guest.
> >
> 
> We all know such a stack has been widely used for decades. But you want to say
> your new stack is much more secure than this?
> 
It can be, yes, because it exposes everything necessary as defined within the virtio spec boundary today,
and does not involve the hypervisor in core device operation.

> >
> > >
> > > What's more, a simple passthrough may lose the chance to workaround
> > > hardware erratas and you will finally get back to the trap and emulation.
> > Hardware errata's is not the starting point to build the software stack and
> spec.
> 
> It's not the starting point. But it's definitely something that needs to be
> considered, go and see kernel codes (especially the KVM part) and you will get
> the answer.
> 
There are kernels that cannot be updated in the field today in the Nvidia cloud, shipped by Red Hat's OS variant.

So it is an invalid assumption that somehow the data path has no bugs but a large part of the control plane does, and hence should be done in software...

> > What you imply is, one must never use vfio stack, one must not use vcpu
> acceleration and everything must be emulated.
> 
> Do I say so? Trap and emulation is the common methodology used in KVM and
> VFIO. And if you want to replace it with a complete passthrough, you need to
> prove your method can work.
> 
Please review the patches. I do not plan to _replace_ it either.
Those users who want to use passthrough can use passthrough with major traps+emulation on FLR, device_status, cvq, avq and without implementing AQ on every single member device.
And those users who prefer trap+emulation can use that.

> >
> > Same argument of hardware errata applied to data path too.
> 
> Anything makes datapath different? Xen used to fallback to shadow page tables
> to workaround hardware TDP errata in the past.
> 
> > One should not implement in hw...
> >
> > I disagree with such argument.
> 
> It's not my argument.
> 
You claimed that to overcome hw errata, one should use trap+emulation, somehow only for a portion of the functionality.
And the remaining portion of the functionality does not have hw errata, hence hw should be used (for example for the data path). :)

> >
> > You can say nesting is requirement for some use cases, so spec should support
> it without blocking the passthrough mode.
> > Then it is fair discussion.
> >
> > I will not debate further on passthrough vs control path mediation as
> either_or approach.
> >
> > >
> > > >
> > > > > FLR at least.
> > > > It is actually the opposite.
> > > > FLR is supported with the proposal without any workarounds and
> mediation.
> > >
> > > It's an obvious drawback but not an advantage. And it's not a must
> > > for live migration to work. You need to prove the FLR doesn't
> > > conflict with the live migration, and it's not only FLR but also all the other
> PCI facilities.
> > I don’t know what you mean by prove. It is already clear from the proposal
> FLR is not messing with rest of the device migration infrastructure.
> > You should read [1].
> 
> I don't think you answered my question in that thread.
> 
Please ask the question in that series, if any, because in passthrough there is no FLR or device reset interaction between the owner and member device.

> >
> > > one other
> > > example is P2P and what's the next? As more features were added to
> > > the PCI spec, you will have endless work in auditing the possible
> > > conflict with the passthrough based live migration.
> > >
> > This drawback equally applies to mediation route where one need to do more
> than audit where the mediation layer to be extended.
> 
> No, for trap and emulation we don't need to do that. We only do datapath
> assignments.
> 
It is required, because such paths also have to be audited and extended; without that, the feature is not visible to the guest.

> > So each method has its pros and cons. One suits one use case, other suits
> other use case.
> > Therefore, again attempting to claim that only mediation approach is the only
> way to progress is incorrect.
> 
> I never say things like this, it is your proposal that mandates migration with
> admin commands. Could you please read what is proposed in this series
> carefully?
> 
Admin commands are split from the AQ so one can use the admin commands in-band as well.
Though, I don’t see how it can functionally work without mediation.
This is the key technical difference between the two approaches.

> On top of this series, you can build your amd commands easily. But there's
> nothing that can be done on top of your proposal.
> 
I don’t see what more needs to be done on top of our proposal.
If you are hinting at nesting, then it can be done through a peer admin device to delete such admin role.

> >
> > In fact audit is still better than mediation because most audits are read only
> work as opposed to endlessly extending trapping and adding support in core
> stack.
> 
> One reality that you constantly ignore is that such trapping and device models
> have been widely used by a lot of cloud vendors for more than a decade.
> 
It may be, but it is not the only option.

> > Again, it is a choice that user make with the tradeoff.
> >
> > > >
> > > > >
> > > > > For non standard device we don't have choices other than
> > > > > passthrough, but for standard devices we have other choices.
> > > >
> > > > Passthrough is basic requirement that we will be fulfilling.
> > >
> > > It has several drawbacks that I would not like to repeat. We all
> > > know even for VFIO, it requires a trap instead of a complete passthrough.
> > >
> > Sure. Both has pros and cons.
> > And both can co-exist.
> 
> I don't see how it can co-exist with your proposal. I can see how admin
> commands can co-exist on top of this series.
> 
To me, the reason both have difficulty is that they are solving different problems.
And they can co-exist as two different methods for two different problems.

> >
> > > > If one wants to do special nesting, may be, there.
> > >
> > > Nesting is not special. Go and see how it is supported by major
> > > cloud vendors and you will get the answer. Introducing an interface
> > > in virtio that is hard to be virtualized is even worse than writing
> > > a compiler that can not do bootstrap compilation.
> > We checked with more than two major cloud vendors and passthrough suffice
> their use cases and they are not doing nesting.
> > And other virtio vendor would also like to support native devices. So again,
> please do not portray that nesting is the only thing and passthrough must not be
> done.
> 
> Where do I say passthrough must not be done? I'm saying you need to justify
> your proposal instead of simply saying "hey, you are wrong".
> 
I never said you are wrong. I replied to Lingshan that resuming/suspending queues after the device is suspended is wrong, and it should not be done.

> Again, nesting is not the only issue, the key point is that it's partial and not self
> contained.

Admin commands are self-contained to the owner device.
They are not self-contained in the member device, because they cannot be. Self-containment cannot work with device reset, FLR, and the DMA flow.
Self-containment requires mediation, or its renamed form trap+emulation, which is the anti-goal of passthrough.
And I am very interested if you can show how admin commands can work with the device reset and FLR flows WITHOUT the mediation approach.
Lingshan so far didn’t answer this.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-17  5:22                                           ` Parav Pandit
@ 2023-09-19  4:35                                             ` Jason Wang
  2023-09-19  7:33                                               ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-19  4:35 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Sun, Sep 17, 2023 at 1:22 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:41 AM
> >
> > On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > > To: Parav Pandit <parav@nvidia.com>
> > >
> > > > > One can build infinite level of nesting to not do passthrough, at
> > > > > the end user
> > > > applications remains slow.
> > > >
> > > > We are talking about nested virtualization but nested emulation. I
> > > > won't repeat the definition of virtualization but no matter how much
> > > > level of nesting, the hypervisor will try hard to let the
> > > > application run natively for most of the time, otherwise it's not the nested
> > virtualization at all.
> > > >
> > > > Nested virtualization has been supported by all major cloud vendors,
> > > > please read the relevant documentation for the performance
> > > > implications. Virtio community is not the correct place to debate
> > > > whether a nest is useful. We need to make sure the datapath could be
> > > > assigned to any nest layers without losing any fundamental facilities like
> > migration.
> > > >
> > > I am not debating. You or Lingshan claim or imply that mediation is the only
> > way to progress.
> >
> > Let me correct your temiology again. It's "trap and emulation" . It means the
> > workload runs mostly native but sometimes is trapped by the hypervisor.
> >
>
> > And it's not the only way. It's the start point since all current virtio spec is built
> > upon this methodology.
> Current spec is not the steering point to define new methods.
> So we will build the spec infra to support passthrough.
>

Passthrough migration, actually; passthrough itself is already supported now.

> Mediation/trap-emulation where hypervisor is involved is also second use case that you are addressing.
>
> And hence, both are not mutually exclusive.
> Hence we should not debate that anymore.
>
> >
> > > And for sure virtio do not need to live in the dark shadow of mediation always.
> >
> > 99% of virtio devices are implemented in this way (which is what you call dark
> > and shadow) now.
> >
> What I am saying is one should not say mediation/trap-emulation is the only way for virtio.

Then using things like "dark shadow" is not fair.

> So let passthrough device migration to progress.

Then you need to answer or address the concerns.

>
> > > For nesting use case sure one can do mediation related mode.
> > >
> > > So only mediation is not the direction.
> >
> > CPU and MMU virtualization were all built in this way.
> >
> Not anymore. Both of them have vcpus and viommu where many things are not trapped.

We are talking about different things. I'm saying trap is a must but
you say not all are trapped.

> So as I said both has pros and cons and users will pick what fits their need and use case.
>
> > >
> > > > > So for such N and M being > 1, one can use software base emulation
> > anyway.
> > > >
> > > > No, only the control path is trapped, the datapath is still passthrough.
> > > >
> > > Again, it depends on the use case.
> >
> > No matter what use case, the definition and methodology of virtualization
> > stands still.
> >
> I will stop debating this because the core technical question is not answered.
> I don’t see a technology available that virtio can utilize to it.
> That is interface that can work without messing with device status and flr while device migration is ongoing.

Again, you need to justify it. For example, why does it mess up device
status? Why is rest ok but not suspending?

At least so far, I don't see good answers for those.

> Hence, methodology for passthrough and mediation/trap-emulation is fundamentally different.
> And that is just fine.
>
> > >
> > > > >
> > > > > >
> > > > > > And exposing the whole device to the guest drivers will have
> > > > > > security implications, your proposal has demonstrated that you
> > > > > > need a workaround for
> > > > > There is no security implications in passthrough.
> > > >
> > > > How can you prove this or is it even possible for you to prove this?
> > > Huh, when you claim that it is not secure, please point out exactly what is not
> > secure.
> > > Please take with PCI SIG and file CVE to PCI sig.
> >
> > I am saying it has security implications. That is why you need to explain why you
> > think it doesn't. What's more, the implications are obviously nothing related to
> > PCI SIG but a vendor virtio hardware implementation.
> >
> PCI passthough for virtio member devices and non virtio devices with P2P, and their interaction is already there in the VM.
> Device migration is not adding/removing anything, nor touching any security aspect of it.
> Because it does not need to it either.
> Device migration is making sure that it continue to exists.

Since we are discussing in the virtio community, what we care about is
the chance that the guest (driver) can exploit device security
vulnerabilities. In this context, exposing more means increasing
the attack surface, since we (cloud vendors) can control only the
hypervisor, not the guests.

>
> > >
> > > > You expose all device details to guests (especially the transport
> > > > specific details), the attack surface is increased in this way.
> > > One can say it is the opposite.
> > > Attack surface is increased in hypervisor due to mediation poking at
> > everything controlled by the guest.
> > >
> >
> > We all know such a stack has been widely used for decades. But you want to say
> > your new stack is much more secure than this?
> >
> It can be yes, because it exposes all necessary things defined in the virtio spec boundary today.
> And not involving hypervisor in core device operation.

That's perfectly fine if we can do this. But you need to justify this.

>
> > >
> > > >
> > > > What's more, a simple passthrough may lose the chance to workaround
> > > > hardware erratas and you will finally get back to the trap and emulation.
> > > Hardware errata's is not the starting point to build the software stack and
> > spec.
> >
> > It's not the starting point. But it's definitely something that needs to be
> > considered, go and see kernel codes (especially the KVM part) and you will get
> > the answer.
> >
> There are kernels which cannot be updated in field today in Nvidia cloud shipped by Redhat's OS variant.
>
> So it is invalid assumption that somehow data path does not have bug, but large part of the control plane has bug, hence it should be done in software...

Well, for sure there are cases that can't be worked around. But for
the case that it can, trap and emulation gives much more flexibility.

>
> > > What you imply is, one must never use vfio stack, one must not use vcpu
> > acceleration and everything must be emulated.
> >
> > Do I say so? Trap and emulation is the common methodology used in KVM and
> > VFIO. And if you want to replace it with a complete passthrough, you need to
> > prove your method can work.
> >
> Please review patches. I do not plan to _replace_ is either.

You define all the migration stuffs in the admin commands section,
isn't this an implicit coupling?

> Those users who want to use passthrough, can use passthrough with major traps+emulation on FLR, device_status, cvq, avq and without implementing AQ on every single member device.
> And those users who prefer trap+emualation can use that.
>
> > >
> > > Same argument of hardware errata applied to data path too.
> >
> > Anything makes datapath different? Xen used to fallback to shadow page tables
> > to workaround hardware TDP errata in the past.
> >
> > > One should not implement in hw...
> > >
> > > I disagree with such argument.
> >
> > It's not my argument.
> >
> You claimed that to overcome hw errata, one should use trap_emulation, somehow only for portion of the functionality.
> And rest portion of the functionality does not have hw errata, hence hw should be use (for example for data path). :)

I've explained before, we all know there are errata that can't be
worked around in any way.

>
> > >
> > > You can say nesting is requirement for some use cases, so spec should support
> > it without blocking the passthrough mode.
> > > Then it is fair discussion.
> > >
> > > I will not debate further on passthrough vs control path mediation as
> > either_or approach.
> > >
> > > >
> > > > >
> > > > > > FLR at least.
> > > > > It is actually the opposite.
> > > > > FLR is supported with the proposal without any workarounds and
> > mediation.
> > > >
> > > > It's an obvious drawback but not an advantage. And it's not a must
> > > > for live migration to work. You need to prove the FLR doesn't
> > > > conflict with the live migration, and it's not only FLR but also all the other
> > PCI facilities.
> > > I don’t know what you mean by prove. It is already clear from the proposal
> > FLR is not messing with rest of the device migration infrastructure.
> > > You should read [1].
> >
> > I don't think you answered my question in that thread.
> >
> Please ask the question in that series if any, because there is no FLR, device reset interaction in passthrough between owner and member device.
>
> > >
> > > > one other
> > > > example is P2P and what's the next? As more features were added to
> > > > the PCI spec, you will have endless work in auditing the possible
> > > > conflict with the passthrough based live migration.
> > > >
> > > This drawback equally applies to mediation route where one need to do more
> > than audit where the mediation layer to be extended.
> >
> > No, for trap and emulation we don't need to do that. We only do datapath
> > assignments.
> >
> It is required, because also such paths to be audited and extended as without it the feature does not visible to the guest.

You need first answer the following questions:

1) Why FLR is a must for the guest
2) What's wrong with the current Qemu emulation of FLR for virtio-pci device

>
> > > So each method has its pros and cons. One suits one use case, other suits
> > other use case.
> > > Therefore, again attempting to claim that only mediation approach is the only
> > way to progress is incorrect.
> >
> > I never say things like this, it is your proposal that mandates migration with
> > admin commands. Could you please read what is proposed in this series
> > carefully?
> >
> Admin commands are split from the AQ so one can use the admin commands inband as well.

How can it? It couples a lot of concepts like group, owner and
members. All of these have only existed in SR-IOV so far.

I don't know how to define those for MMIO where the design wants to be
as simple as possible.

> Though, I don’t see how it can functionality work without mediation.
> This is the key technical difference between two approaches.
>
> > On top of this series, you can build your admin commands easily. But there's
> > nothing that can be done on top of your proposal.
> >
> I don’t see what more to be done on top of our proposal.

Actually it really has one, that is moving the description/definition
of those states to the basic facility part. But if we do this, why not
do it from the start? This is exactly what Lingshan's proposal did.

> If you hint nesting, than it can be done through a peer admin device to delete such admin role.
>
> > >
> > > In fact audit is still better than mediation because most audits are read only
> > work as opposed to endlessly extending trapping and adding support in core
> > stack.
> >
> > One reality that you constantly ignore is that such trapping and device models
> > have been widely used by a lot of cloud vendors for more than a decade.
> >
> It may be but, it is not the only option.

I don't say it's the only option. If most of the devices were built in
this way, we should first allow any new function to be available to
those devices and then consider other cases. Inventing a mechanism
that can't work for most of the existing devices is sub-optimal.

>
> > > Again, it is a choice that user make with the tradeoff.
> > >
> > > > >
> > > > > >
> > > > > > For non standard device we don't have choices other than
> > > > > > passthrough, but for standard devices we have other choices.
> > > > >
> > > > > Passthrough is basic requirement that we will be fulfilling.
> > > >
> > > > It has several drawbacks that I would not like to repeat. We all
> > > > know even for VFIO, it requires a trap instead of a complete passthrough.
> > > >
> > > Sure. Both has pros and cons.
> > > And both can co-exist.
> >
> > I don't see how it can co-exist with your proposal. I can see how admin
> > commands can co-exist on top of this series.
> >
> The reason to me both has difficulty is because both are solving different problem.
> And they can co-exist as two different methods to two different problems.

It's not hard to demonstrate how admin commands can be built on top.

>
> > >
> > > > > If one wants to do special nesting, may be, there.
> > > >
> > > > Nesting is not special. Go and see how it is supported by major
> > > > cloud vendors and you will get the answer. Introducing an interface
> > > > in virtio that is hard to be virtualized is even worse than writing
> > > > a compiler that can not do bootstrap compilation.
> > > We checked with more than two major cloud vendors and passthrough suffice
> > their use cases and they are not doing nesting.
> > > And other virtio vendor would also like to support native devices. So again,
> > please do not portray that nesting is the only thing and passthrough must not be
> > done.
> >
> > Where do I say passthrough must not be done? I'm saying you need to justify
> > your proposal instead of simply saying "hey, you are wrong".
> >
> I never said you are wrong. I replied to Lingshan that resuming/suspending queues after the device is suspended, is wrong, and it should not be done.
>
> > Again, nesting is not the only issue, the key point is that it's partial and not self
> > contained.
>
> Admin commands are self-contained to the owner device.
> They are not self contained in the member device, because it cannot be.

There are cases where self containment is not required, for example
provisioning. Admin commands/queues fit perfectly there.

> Self containment cannot work with device reset, flr, dma flow.

How do you define self containment? We all know that virtio can't fly
without transport specific things ...

For the context of "self contain" I mean the basic virtio facility
needs to be self contained.

> Self containment requires mediation or renamed trap+emulation; which is the anti-goal of passtrough.
> And I am very interested if you can show how admin commands can work with device reset, flr flow WITHOUT mediation approach.

Why is it the job for me? This proposal doesn't use admin commands at all.

Thanks





> Lingshan so far didn’t answer this.




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-19  4:35                                             ` Jason Wang
@ 2023-09-19  7:33                                               ` Parav Pandit
  0 siblings, 0 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-19  7:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:06 AM

> > Current spec is not the steering point to define new methods.
> > So we will build the spec infra to support passthrough.
> >
> 
> Passthrough migration actually, passthrough is already supported now.
Yes, device migration basic facility for passthrough devices.
My series adds basic facility section extension.

> 
> > Mediation/trap-emulation where hypervisor is involved is also second use
> case that you are addressing.
> >
> > And hence, both are not mutually exclusive.
> > Hence we should not debate that anymore.
> >
> > >
> > > > And for sure virtio do not need to live in the dark shadow of mediation
> always.
> > >
> > > 99% of virtio devices are implemented in this way (which is what you
> > > call dark and shadow) now.
> > >
> > What I am saying is one should not say mediation/trap-emulation is the only
> way for virtio.
> 
> Then using things like "dark shadow" is not fair.
I apologize. Let's work towards supporting device migration for passthrough as well.
The comments I hear from you hint that virtio must live its life through mediation.

> 
> > So let passthrough device migration to progress.
> 
> Then you need to answer or address the concerns.
> 
Sure. Will do once I receive the comments in the patches of [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> >
> > > > For nesting use case sure one can do mediation related mode.
> > > >
> > > > So only mediation is not the direction.
> > >
> > > CPU and MMU virtualization were all built in this way.
> > >
> > Not anymore. Both of them have vcpus and viommu where many things are not
> trapped.
> 
> We are talking about different things. I'm saying trap is a must but you say not
> all are trapped.
> 
To be clear, all I am saying is that a virtio interface level trap is not a must to achieve device migration.

> > So as I said both has pros and cons and users will pick what fits their need and
> use case.
> >
> > > >
> > > > > > So for such N and M being > 1, one can use software base
> > > > > > emulation
> > > anyway.
> > > > >
> > > > > No, only the control path is trapped, the datapath is still passthrough.
> > > > >
> > > > Again, it depends on the use case.
> > >
> > > No matter what use case, the definition and methodology of
> > > virtualization stands still.
> > >
> > I will stop debating this because the core technical question is not answered.
> > I don’t see a technology available that virtio can utilize to it.
> > That is interface that can work without messing with device status and flr
> while device migration is ongoing.
> 
> Again, you need to justify it. For example, why does it mess up device status?
With FLR and device reset in place, many things like queues, device registers etc. do not work,
because they are reset while migration is going on, and the in-flight descriptor info is lost.
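
To make the loss concrete, here is a minimal sketch, assuming a split virtqueue whose migratable state is the pair of free-running 16-bit indices (last_avail_idx, last_used_idx) that this series proposes to expose. The names VqState, inflight and reset are hypothetical and come from neither proposal; the reset model (indices return to zero) is a simplification.

```python
from dataclasses import dataclass

RING_IDX_MASK = 0xFFFF  # split-ring indices are free-running 16-bit counters


@dataclass
class VqState:
    """Hypothetical per-virtqueue state a migration flow would want to save."""
    last_avail_idx: int = 0  # next available-ring entry the device will fetch
    last_used_idx: int = 0   # next used-ring entry the device will write


def inflight(state: VqState) -> int:
    """Buffers fetched by the device but not yet returned as used."""
    return (state.last_avail_idx - state.last_used_idx) & RING_IDX_MASK


def reset(state: VqState) -> None:
    """Simplified model of a device reset or FLR: indices go back to zero."""
    state.last_avail_idx = 0
    state.last_used_idx = 0


vq = VqState(last_avail_idx=3, last_used_idx=0xFFFE)  # avail index has wrapped
before = inflight(vq)  # buffers still owned by the device
reset(vq)
after = inflight(vq)   # the record of those buffers is gone
```

In this model, a reset that lands between saving the state and resuming on the destination leaves a snapshot that no longer describes the device.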

> Why is rest ok but not suspending?
> 
Rest is not ok either to me.
I am not suggesting trapping CVQ or AQ or other virtio registers either.

> At least so far, I don't see good answers for those.
> 
> > Hence, methodology for passthrough and mediation/trap-emulation is
> fundamentally different.
> > And that is just fine.
> >
> > > >
> > > > > >
> > > > > > >
> > > > > > > And exposing the whole device to the guest drivers will have
> > > > > > > security implications, your proposal has demonstrated that
> > > > > > > you need a workaround for
> > > > > > There is no security implications in passthrough.
> > > > >
> > > > > How can you prove this or is it even possible for you to prove this?
> > > > Huh, when you claim that it is not secure, please point out
> > > > exactly what is not
> > > secure.
> > > > Please take with PCI SIG and file CVE to PCI sig.
> > >
> > > I am saying it has security implications. That is why you need to
> > > explain why you think it doesn't. What's more, the implications are
> > > obviously nothing related to PCI SIG but a vendor virtio hardware
> implementation.
> > >
> > PCI passthough for virtio member devices and non virtio devices with P2P, and
> their interaction is already there in the VM.
> > Device migration is not adding/removing anything, nor touching any security
> aspect of it.
> > Because it does not need to it either.
> > Device migration is making sure that it continue to exists.
> 
> Since we are discussing in the virtio community, what we care about is the
> chance that the guest (driver) can exploit device security vulnerabilities. In this
> context, exposing more means increasing the attack surface, since we
> (cloud vendors) can control only the hypervisor, not the guests.
> 
Guest (driver) can exploit device security vulnerabilities in many areas, not just device status and cvq.
So it is not a good answer to me.

> >
> > > >
> > > > > You expose all device details to guests (especially the
> > > > > transport specific details), the attack surface is increased in this way.
> > > > One can say it is the opposite.
> > > > Attack surface is increased in hypervisor due to mediation poking
> > > > at
> > > everything controlled by the guest.
> > > >
> > >
> > > We all know such a stack has been widely used for decades. But you
> > > want to say your new stack is much more secure than this?
> > >
> > It can be yes, because it exposes all necessary things defined in the virtio spec
> boundary today.
> > And not involving hypervisor in core device operation.
> 
> That's perfectly fine if we can do this. But you need to justify this.
> 
We are not inventing any new things here. As you acknowledged, passthrough devices are already there...
> >
> > > >
> > > > >
> > > > > What's more, a simple passthrough may lose the chance to
> > > > > workaround hardware erratas and you will finally get back to the trap
> and emulation.
> > > > Hardware errata's is not the starting point to build the software
> > > > stack and
> > > spec.
> > >
> > > It's not the starting point. But it's definitely something that
> > > needs to be considered, go and see kernel codes (especially the KVM
> > > part) and you will get the answer.
> > >
> > There are kernels which cannot be updated in field today in Nvidia cloud
> shipped by Redhat's OS variant.
> >
> > So it is invalid assumption that somehow data path does not have bug, but
> large part of the control plane has bug, hence it should be done in software...
> 
> Well, for sure there are cases that can't be worked around. But for the case that
> it can, trap and emulation gives much more flexibility.
> 
It can. So it is not your or my decision.
The user will pick what they want to use.
So it is invalid assumption and hence invalid point to discuss.

> >
> > > > What you imply is, one must never use vfio stack, one must not use
> > > > vcpu
> > > acceleration and everything must be emulated.
> > >
> > > Do I say so? Trap and emulation is the common methodology used in
> > > KVM and VFIO. And if you want to replace it with a complete
> > > passthrough, you need to prove your method can work.
> > >
> > Please review patches. I do not plan to _replace_ is either.
> 
> You define all the migration stuffs in the admin commands section, isn't this an
> implicit coupling?
> 
RSS is done in receive packet section. Is this coupling RSS with receive q? Yes, because it is meant for it.

You are questioning, 
How one can receive packets and post descriptors without virtqueues? Oh, descriptors are implicitly tied to virtqueues.. too bad..

If the above mechanism looks like coupling, then it is one connection.
Maybe there is another use case and a more efficient way without admin commands; I didn’t hear it so far.

I don’t see a point of writing some non-practical abstract spec.

> > Those users who want to use passthrough, can use passthrough with major
> traps+emulation on FLR, device_status, cvq, avq and without implementing AQ
> on every single member device.
> > And those users who prefer trap+emualation can use that.
> >
> > > >
> > > > Same argument of hardware errata applied to data path too.
> > >
> > > Anything makes datapath different? Xen used to fallback to shadow
> > > page tables to workaround hardware TDP errata in the past.
> > >
> > > > One should not implement in hw...
> > > >
> > > > I disagree with such argument.
> > >
> > > It's not my argument.
> > >
> > You claimed that to overcome hw errata, one should use trap_emulation,
> somehow only for portion of the functionality.
> > And rest portion of the functionality does not have hw errata, hence
> > hw should be use (for example for data path). :)
> 
> I've explained before, we all know there are errata that can't be worked around in
> any way.
> 
No point in discussing hw errata. As it is not the goal or anti-goal for trap+emulation or passthrough either.

> >
> > > >
> > > > You can say nesting is requirement for some use cases, so spec
> > > > should support
> > > it without blocking the passthrough mode.
> > > > Then it is fair discussion.
> > > >
> > > > I will not debate further on passthrough vs control path mediation
> > > > as
> > > either_or approach.
> > > >
> > > > >
> > > > > >
> > > > > > > FLR at least.
> > > > > > It is actually the opposite.
> > > > > > FLR is supported with the proposal without any workarounds and
> > > mediation.
> > > > >
> > > > > It's an obvious drawback but not an advantage. And it's not a
> > > > > must for live migration to work. You need to prove the FLR
> > > > > doesn't conflict with the live migration, and it's not only FLR
> > > > > but also all the other
> > > PCI facilities.
> > > > I don’t know what you mean by prove. It is already clear from the
> > > > proposal
> > > FLR is not messing with rest of the device migration infrastructure.
> > > > You should read [1].
> > >
> > > I don't think you answered my question in that thread.
> > >
> > Please ask the question in that series if any, because there is no FLR, device
> reset interaction in passthrough between owner and member device.
> >
> > > >
> > > > > one other
> > > > > example is P2P and what's the next? As more features were added
> > > > > to the PCI spec, you will have endless work in auditing the
> > > > > possible conflict with the passthrough based live migration.
> > > > >
> > > > This drawback equally applies to mediation route where one need to
> > > > do more
> > > than audit where the mediation layer to be extended.
> > >
> > > No, for trap and emulation we don't need to do that. We only do
> > > datapath assignments.
> > >
> > It is required, because also such paths to be audited and extended as without
> it the feature does not visible to the guest.
> 
> You need first answer the following questions:
> 
> 1) Why FLR is a must for the guest
Because passthrough device guest does it.

> 2) What's wrong with the current Qemu emulation of FLR for virtio-pci device
> 
I am not sure QEMU discussion is relevant here.

But in general, when passthrough device is given to an example software like QEMU, it will not bisect the FLR differently from other VF.

> >
> > > > So each method has its pros and cons. One suits one use case,
> > > > other suits
> > > other use case.
> > > > Therefore, again attempting to claim that only mediation approach
> > > > is the only
> > > way to progress is incorrect.
> > >
> > > I never say things like this, it is your proposal that mandates
> > > migration with admin commands. Could you please read what is
> > > proposed in this series carefully?
> > >
> > Admin commands are split from the AQ so one can use the admin commands
> inband as well.
> 
> How can it? It couples a lot of concepts like group, owner and members. All of
> these have only existed in SR-IOV so far.
> 
It is drafted nicely by Michael that admin commands have AQ as one transport.
It can be extended for MMIO when one needs it; I will leave it to the creativity of MMIO supporters to have an admin queue on an MMIO device like other virtqueues.

So far no one seems interested in extending MMIO, other than asking theoretical questions.

> I don't know how to define those for MMIO where the design wants to be as
> simple as possible.
> 
What prevents MMIO from having an AQ?

> > Though, I don’t see how it can functionality work without mediation.
> > This is the key technical difference between two approaches.
> >
> > > On top of this series, you can build your admin commands easily. But
> > > there's nothing that can be done on top of your proposal.
> > >
> > I don’t see what more to be done on top of our proposal.
> 
> Actually it really has one, that is moving the description/definition of those
> states to the basic facility part. But if we do this, why not do it from the start?
> This is exactly what Lingshan's proposal did.
> 
Device context is defined in basic facility section. It is under admin commands because reading something large without command doesn't seem possible.
Lingshan is not showing how a giant RSS context, many other fields of the device can be read/written during device migration flow.
So it's incomplete work, which is covered in [1] using admin commands.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> > If you hint nesting, than it can be done through a peer admin device to delete
> such admin role.
> >
> > > >
> > > > In fact audit is still better than mediation because most audits
> > > > are read only
> > > work as opposed to endlessly extending trapping and adding support
> > > in core stack.
> > >
> > > One reality that you constantly ignore is that such trapping and
> > > device models have been widely used by a lot of cloud vendors for more
> than a decade.
> > >
> > It may be but, it is not the only option.
> 
> I don't say it's the only option. If most of the devices were built in this way, we
> should first allow any new function to be available to those devices and then
> consider other cases. Inventing a mechanism that can't work for most of the
> existing devices is sub-optimal.
> 
I don’t agree to "first", "can't work" and "sub-optimal".

It seems to work for more than one vendor.
Proposal [1] should work for passthrough devices that users are using.
[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> >
> > > > Again, it is a choice that user make with the tradeoff.
> > > >
> > > > > >
> > > > > > >
> > > > > > > For non standard device we don't have choices other than
> > > > > > > passthrough, but for standard devices we have other choices.
> > > > > >
> > > > > > Passthrough is basic requirement that we will be fulfilling.
> > > > >
> > > > > It has several drawbacks that I would not like to repeat. We all
> > > > > know even for VFIO, it requires a trap instead of a complete
> passthrough.
> > > > >
> > > > Sure. Both has pros and cons.
> > > > And both can co-exist.
> > >
> > > I don't see how it can co-exist with your proposal. I can see how
> > > admin commands can co-exist on top of this series.
> > >
> > The reason to me both has difficulty is because both are solving different
> problem.
> > And they can co-exist as two different methods to two different problems.
> 
> It's not hard to demonstrate how admin commands can be built on top.
> 
I don’t see a reason why it should be on top.

Look, if you have _real_ interest in both use cases, let's work toward such a definition.
If you don’t have interest, I don’t see a point in objecting and pointing fingers at using trap+emulation.

> >
> > > >
> > > > > > If one wants to do special nesting, may be, there.
> > > > >
> > > > > Nesting is not special. Go and see how it is supported by major
> > > > > cloud vendors and you will get the answer. Introducing an
> > > > > interface in virtio that is hard to be virtualized is even worse
> > > > > than writing a compiler that can not do bootstrap compilation.
> > > > We checked with more than two major cloud vendors and passthrough
> > > > suffice
> > > their use cases and they are not doing nesting.
> > > > And other virtio vendor would also like to support native devices.
> > > > So again,
> > > please do not portray that nesting is the only thing and passthrough
> > > must not be done.
> > >
> > > Where do I say passthrough must not be done? I'm saying you need to
> > > justify your proposal instead of simply saying "hey, you are wrong".
> > >
> > I never said you are wrong. I replied to Lingshan that resuming/suspending
> queues after the device is suspended, is wrong, and it should not be done.
> >
> > > Again, nesting is not the only issue, the key point is that it's
> > > partial and not self contained.
> >
> > Admin commands are self-contained to the owner device.
> > They are not self contained in the member device, because it cannot be.
> 
> There're cases that self contained is not required for example the provisioning.
> Admin commands/queues fit perfectly there.
And it is not limited to it.

> 
> > Self containment cannot work with device reset, flr, dma flow.
> 
> How do you define self containment? We all know that virtio can't fly without
> transporting specific things ...
> 
Self containment means the member device alone drives the following, without needing an owner or peer device:
1. device reset
2. FLR
3. dirty page tracking
4. device context read + write

> For the context of "self contain" I mean the basic virtio facility needs to be self
> contained.
> 
> > Self containment requires mediation or renamed trap+emulation; which is the
> anti-goal of passtrough.
> > And I am very interested if you can show how admin commands can work
> with device reset, flr flow WITHOUT mediation approach.
> 
> Why is it the job for me? This proposal doesn't use admin commands at all.
Because a few days back Lingshan claimed that he wants to see both needs addressed somehow.
And you claimed that the basic facility that you build in this patch can work for _any_ software, not some specific software (otherwise you quoted it as failure).

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11 11:50                               ` Parav Pandit
  2023-09-12  3:43                                 ` Jason Wang
@ 2023-09-12  3:48                                 ` Zhu, Lingshan
  2023-09-12  5:51                                   ` Parav Pandit
  1 sibling, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  3:48 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/11/2023 7:50 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 3:03 PM
>> By the way, do you see anything we need to improve in this series?
> Admin commands for passthrough devices of [1] is comprehensive proposal covering all the aspects.
>
> To me [1] is superset work that covers all needed functionality and downtime aspects.
>
> I plan to improve [1] with v1 this week by extending device context and addressing other review comments.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
I am not sure. We have discussed many potential issues in these threads,
and I think we should resolve them first, e.g., the nested use cases.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  3:48                                 ` Zhu, Lingshan
@ 2023-09-12  5:51                                   ` Parav Pandit
  2023-09-12  6:37                                     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  5:51 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 9:19 AM
> 
> On 9/11/2023 7:50 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
> >> anything we need to improve in this series?
> > Admin commands for passthrough devices of [1] is comprehensive proposal
> covering all the aspects.
> >
> > To me [1] is superset work that covers all needed functionality and downtime
> aspects.
> >
> > I plan to improve [1] with v1 this week by extending device context and
> addressing other review comments.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> I am not sure, we have discussed a lot about the potential issues in the treads. I
> guess we should resolve them first. E.g., nested use cases.
You are using nesting as the _only_ use case and attempting to steer the discussion with it.
That is not right.

If you want to discuss, then let's consider both use cases and attempt to converge; if we can, that is really good.
If we cannot, the two requirements should be handled differently.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  5:51                                   ` Parav Pandit
@ 2023-09-12  6:37                                     ` Zhu, Lingshan
  2023-09-12  6:49                                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  6:37 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 1:51 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 9:19 AM
>>
>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
>>>> anything we need to improve in this series?
>>> Admin commands for passthrough devices of [1] is comprehensive proposal
>> covering all the aspects.
>>> To me [1] is superset work that covers all needed functionality and downtime
>> aspects.
>>> I plan to improve [1] with v1 this week by extending device context and
>> addressing other review comments.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
>>> tml
>> I am not sure, we have discussed a lot about the potential issues in the treads. I
>> guess we should resolve them first. E.g., nested use cases.
> You are using nesting use case as the _only_ use case and attempt to steer using that.
> Not right.
>
> If you want to discuss, then lets have both the use cases, attempt to converge and if we can its really good.
> If we cannot, both requirements should be handled differently.
Isn't nested a clear use case that should be supported?


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:37                                     ` Zhu, Lingshan
@ 2023-09-12  6:49                                       ` Parav Pandit
  2023-09-12  7:29                                         ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  6:49 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:08 PM
> 
> On 9/12/2023 1:51 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 9:19 AM
> >>
> >> On 9/11/2023 7:50 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
> >>>> anything we need to improve in this series?
> >>> Admin commands for passthrough devices of [1] is comprehensive
> >>> proposal
> >> covering all the aspects.
> >>> To me [1] is superset work that covers all needed functionality and
> >>> downtime
> >> aspects.
> >>> I plan to improve [1] with v1 this week by extending device context
> >>> and
> >> addressing other review comments.
> >>> [1]
> >>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061
> >>> .h
> >>> tml
> >> I am not sure, we have discussed a lot about the potential issues in
> >> the treads. I guess we should resolve them first. E.g., nested use cases.
> > You are using nesting use case as the _only_ use case and attempt to steer
> using that.
> > Not right.
> >
> > If you want to discuss, then lets have both the use cases, attempt to converge
> and if we can its really good.
> > If we cannot, both requirements should be handled differently.
> Isn't nested a clear use case that should be supported?

Most users who care about running real applications with real performance have not asked for nesting.
It is not a mandatory case; it may be required for some users.
I don’t know who needs M-level nesting, or how the CPU would also support accelerating it well enough to run a reasonable workload.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:49                                       ` Parav Pandit
@ 2023-09-12  7:29                                         ` Zhu, Lingshan
  2023-09-12  7:53                                           ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  7:29 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 2:49 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:08 PM
>>
>> On 9/12/2023 1:51 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 9:19 AM
>>>>
>>>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
>>>>>> anything we need to improve in this series?
>>>>> Admin commands for passthrough devices of [1] is comprehensive
>>>>> proposal
>>>> covering all the aspects.
>>>>> To me [1] is superset work that covers all needed functionality and
>>>>> downtime
>>>> aspects.
>>>>> I plan to improve [1] with v1 this week by extending device context
>>>>> and
>>>> addressing other review comments.
>>>>> [1]
>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061
>>>>> .h
>>>>> tml
>>>> I am not sure, we have discussed a lot about the potential issues in
>>>> the treads. I guess we should resolve them first. E.g., nested use cases.
>>> You are using nesting use case as the _only_ use case and attempt to steer
>> using that.
>>> Not right.
>>>
>>> If you want to discuss, then lets have both the use cases, attempt to converge
>> and if we can its really good.
>>> If we cannot, both requirements should be handled differently.
>> Isn't nested a clear use case that should be supported?
> Most users who care for running real applications and real performance, have not asked for nesting.
> It is not mandatory case; it may be required for some users.
> I don’t know who needs M level nesting and how cpu also support its acceleration etc to run some reasonable workload.
Nested is a common use case and it is mandatory.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:29                                         ` Zhu, Lingshan
@ 2023-09-12  7:53                                           ` Parav Pandit
  2023-09-12  9:06                                             ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  7:53 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:59 PM
> 
> On 9/12/2023 2:49 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 12:08 PM
> >>
> >> On 9/12/2023 1:51 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 9:19 AM
> >>>>
> >>>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
> >>>>>> anything we need to improve in this series?
> >>>>> Admin commands for passthrough devices of [1] is comprehensive
> >>>>> proposal
> >>>> covering all the aspects.
> >>>>> To me [1] is superset work that covers all needed functionality
> >>>>> and downtime
> >>>> aspects.
> >>>>> I plan to improve [1] with v1 this week by extending device
> >>>>> context and
> >>>> addressing other review comments.
> >>>>> [1]
> >>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
> >>>>> 61
> >>>>> .h
> >>>>> tml
> >>>> I am not sure, we have discussed a lot about the potential issues
> >>>> in the treads. I guess we should resolve them first. E.g., nested use cases.
> >>> You are using nesting use case as the _only_ use case and attempt to
> >>> steer
> >> using that.
> >>> Not right.
> >>>
> >>> If you want to discuss, then lets have both the use cases, attempt
> >>> to converge
> >> and if we can its really good.
> >>> If we cannot, both requirements should be handled differently.
> >> Isn't nested a clear use case that should be supported?
> > Most users who care for running real applications and real performance, have
> not asked for nesting.
> > It is not mandatory case; it may be required for some users.
> > I don’t know who needs M level nesting and how cpu also support its
> acceleration etc to run some reasonable workload.
> Nested is a common use case and it is mandatory.
Maybe it is a common case for the users you interact with; it is required for some complicated modes.
How many levels of nesting: 10, 2, 100?

I don’t see the point of debating that "nesting is the only case and mediation is the only way" to do device migration.

As I have repeatedly acknowledged, we are open to converging on administration commands that can work for both the passthrough and the nested case.

I just don’t see how a nested solution can work without any mediation, as everything you do touches the device reset and FLR flows, and it practically breaks the PCI specification with these side-band registers and with faking device reset and FLR when asked.
This is the primary reason I am less inclined to go with the in-band method.
So far, no one has technically explained how it can even work, in answer to the question from yesterday.

If there is a way, please explain; I am very interested to learn how this is done without hacks when a device reset by the guest _actually_ resets the underlying member device while dirty page tracking is also ongoing.

So my humble request is to work towards letting both methods co-exist if possible, rather than treating it as either/or.



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:53                                           ` Parav Pandit
@ 2023-09-12  9:06                                             ` Zhu, Lingshan
  2023-09-12  9:08                                               ` Zhu, Lingshan
  2023-09-12  9:28                                               ` Parav Pandit
  0 siblings, 2 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  9:06 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 3:53 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:59 PM
>>
>> On 9/12/2023 2:49 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 12:08 PM
>>>>
>>>> On 9/12/2023 1:51 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 9:19 AM
>>>>>>
>>>>>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
>>>>>>>> anything we need to improve in this series?
>>>>>>> Admin commands for passthrough devices of [1] is comprehensive
>>>>>>> proposal
>>>>>> covering all the aspects.
>>>>>>> To me [1] is superset work that covers all needed functionality
>>>>>>> and downtime
>>>>>> aspects.
>>>>>>> I plan to improve [1] with v1 this week by extending device
>>>>>>> context and
>>>>>> addressing other review comments.
>>>>>>> [1]
>>>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
>>>>>>> 61
>>>>>>> .h
>>>>>>> tml
>>>>>> I am not sure, we have discussed a lot about the potential issues
>>>>>> in the treads. I guess we should resolve them first. E.g., nested use cases.
>>>>> You are using nesting use case as the _only_ use case and attempt to
>>>>> steer
>>>> using that.
>>>>> Not right.
>>>>>
>>>>> If you want to discuss, then lets have both the use cases, attempt
>>>>> to converge
>>>> and if we can its really good.
>>>>> If we cannot, both requirements should be handled differently.
>>>> Isn't nested a clear use case that should be supported?
>>> Most users who care for running real applications and real performance, have
>> not asked for nesting.
>>> It is not mandatory case; it may be required for some users.
>>> I don’t know who needs M level nesting and how cpu also support its
>> acceleration etc to run some reasonable workload.
>> Nested is a common use case and it is mandatory.
> Maybe it is common case for the users you interact with, it is required for some complicated mode.
> How many level of nesting 10, 2, 100?
>
> I don’t see a point of debating that "nesting is the only case and mediation is the only way" to do device migration.
>
> As I repeatedly acknowledged,
> We are open to converge on doing administration commands that can work for passthrough and nested way.
>
> I just don’t see how nested solution can work without any mediation, as everything you do touches device reset and FLR flow and it practically breaks the PCI specification with these side band registers and faking device reset and FLR when asked.
> This is the primary reason; I am less inclined to go the in-band method.
> Until now, no one technically explained how it can even work on question from yesterday.
>
> And if there is one, please explain, I am very interested to learn, how is this done without hacks where device reset by guest _actually_ reset the underlying member device while the dirty page tracking is also ongoing.
>
> So, my humble request is, try to work towards co-existing both the methods if possible, rather than doing either or mode.
If you want the AQ to be used for LM, it should support nested anyway;
don't break user logic.

This series (mine) can support nested.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:06                                             ` Zhu, Lingshan
@ 2023-09-12  9:08                                               ` Zhu, Lingshan
  2023-09-12  9:35                                                 ` Parav Pandit
  2023-09-12  9:28                                               ` Parav Pandit
  1 sibling, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  9:08 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 5:06 PM, Zhu, Lingshan wrote:
>
>
> On 9/12/2023 3:53 PM, Parav Pandit wrote:
>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>> Sent: Tuesday, September 12, 2023 12:59 PM
>>>
>>> On 9/12/2023 2:49 PM, Parav Pandit wrote:
>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>> Sent: Tuesday, September 12, 2023 12:08 PM
>>>>>
>>>>> On 9/12/2023 1:51 PM, Parav Pandit wrote:
>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>>> Sent: Tuesday, September 12, 2023 9:19 AM
>>>>>>>
>>>>>>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
>>>>>>>>> anything we need to improve in this series?
>>>>>>>> Admin commands for passthrough devices of [1] is comprehensive
>>>>>>>> proposal
>>>>>>> covering all the aspects.
>>>>>>>> To me [1] is superset work that covers all needed functionality
>>>>>>>> and downtime
>>>>>>> aspects.
>>>>>>>> I plan to improve [1] with v1 this week by extending device
>>>>>>>> context and
>>>>>>> addressing other review comments.
>>>>>>>> [1]
>>>>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
>>>>>>>> 61
>>>>>>>> .h
>>>>>>>> tml
>>>>>>> I am not sure, we have discussed a lot about the potential issues
>>>>>>> in the treads. I guess we should resolve them first. E.g., 
>>>>>>> nested use cases.
>>>>>> You are using nesting use case as the _only_ use case and attempt to
>>>>>> steer
>>>>> using that.
>>>>>> Not right.
>>>>>>
>>>>>> If you want to discuss, then lets have both the use cases, attempt
>>>>>> to converge
>>>>> and if we can its really good.
>>>>>> If we cannot, both requirements should be handled differently.
>>>>> Isn't nested a clear use case that should be supported?
>>>> Most users who care for running real applications and real 
>>>> performance, have
>>> not asked for nesting.
>>>> It is not mandatory case; it may be required for some users.
>>>> I don’t know who needs M level nesting and how cpu also support its
>>> acceleration etc to run some reasonable workload.
>>> Nested is a common use case and it is mandatory.
>> Maybe it is common case for the users you interact with, it is 
>> required for some complicated mode.
>> How many level of nesting 10, 2, 100?
>>
>> I don’t see a point of debating that "nesting is the only case and 
>> mediation is the only way" to do device migration.
>>
>> As I repeatedly acknowledged,
>> We are open to converge on doing administration commands that can 
>> work for passthrough and nested way.
>>
>> I just don’t see how nested solution can work without any mediation, 
>> as everything you do touches device reset and FLR flow and it 
>> practically breaks the PCI specification with these side band 
>> registers and faking device reset and FLR when asked.
>> This is the primary reason; I am less inclined to go the in-band method.
>> Until now, no one technically explained how it can even work on 
>> question from yesterday.
>>
>> And if there is one, please explain, I am very interested to learn, 
>> how is this done without hacks where device reset by guest _actually_ 
>> reset the underlying member device while the dirty page tracking is 
>> also ongoing.
>>
>> So, my humble request is, try to work towards co-existing both the 
>> methods if possible, rather than doing either or mode.
> If you want AQ used for LM, it should support nested anyway, don't 
> break user logic.
>
>
> This (my)series can support nested.
Supplementary: as Jason once pointed out, the two solutions can certainly
co-exist. I am implementing basic facilities; the admin vq can feel free to
reuse them, for example by forwarding messages to them, and this can help
support nested.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:08                                               ` Zhu, Lingshan
@ 2023-09-12  9:35                                                 ` Parav Pandit
  2023-09-12 10:14                                                   ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  9:35 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 2:38 PM

> supplementary: As Jason ever pointed out: the two solution can co-exist for
> sure, I am implementing basic facilities, admin vq can free feel to reuse them like
> forwarding messages to them, and this can help support nested.

Sure. Sounds good.

At least two device vendors, plus other industry bodies including ones led by Intel, are moving away from register-based implementations in the virtualization area.
And the registers that you expose do not support the device reset and FLR sequences. So please add some text about that violation in the PCI transport section.
Please also add a guideline for the driver on how it should not touch them, to make this usable.
This will make the nested solution clearer.

Do you find the administration commands we proposed in [1] useful for the nested case?
If not, the two will likely diverge.

We would like to avoid suspending individual VQs in the passthrough case, as things are controlled at the device level.
It also reduces driver -> device interaction for large queue counts, which range from 1 to 32K.

So at present I see very little overlap between the two. I will look again on 9/13 at whether the passthrough proposal can utilize anything from your series.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:35                                                 ` Parav Pandit
@ 2023-09-12 10:14                                                   ` Zhu, Lingshan
  2023-09-12 10:16                                                     ` Parav Pandit
  2023-09-13  2:23                                                     ` Parav Pandit
  0 siblings, 2 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:14 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 5:35 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 2:38 PM
>> supplementary: As Jason ever pointed out: the two solution can co-exist for
>> sure, I am implementing basic facilities, admin vq can free feel to reuse them like
>> forwarding messages to them, and this can help support nested.
> Sure. Sounds good.
>
> At lest two device vendors + other industry bodies including led by Intel are moving away from the register-based implementation in virtualization area.
This series is self-contained; it is a register-based solution. It
introduces basic facilities and does not depend on others such as the AQ.
> And registers that you expose are not supporting device reset and FLR sequence. So please add some text for that in PCI transport section about violation.
> And guideline for driver on how it should not touch them to make this usable.
> This will make the nested solution more clear.
PCI FLR is out of scope here; for virtio you can still reset the device
by writing 0 to the device status.
>
> Do you find the administration commands we proposed in [1] useful for nested case?
> If not, both will likely diverge.
Not so far.
>
> We would like to avoid suspending individual VQs in the passthrough case, as things are controlled at the device level.
> It also reduces driver -> device interaction for large queue count ranging from 1 to 32K.
>
> So at present I see very little overlap between the two. I will look more again on 9/13 if passthrough proposal can utilize anything from your series.
It does not suspend individual VQs; when the device is suspended, all VQs are stopped.
>
>
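
The device-level behaviour discussed in this message (a single status write
stops every virtqueue, and writing 0 to the device status resets the device)
can be sketched with a toy model. This is only an illustration under stated
assumptions, not code from the series: the SUSPEND bit value of 16 is assumed,
and ModelDevice and write_status are hypothetical names. The other status bit
values follow the existing virtio device status field.

```python
# Toy model only; not taken from the patch series or from any driver.
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8
SUSPEND = 16  # assumed value for the proposed SUSPEND bit


class ModelDevice:
    """Hypothetical device with a status register and per-queue run flags."""

    def __init__(self, num_queues):
        self.status = 0
        self.queue_running = [False] * num_queues

    def write_status(self, value):
        count = len(self.queue_running)
        if value == 0:
            # Writing 0 to the device status resets the device.
            self.status = 0
            self.queue_running = [False] * count
            return
        self.status = value
        if value & SUSPEND:
            # One device-level write stops all virtqueues at once,
            # so no per-queue suspend operation is needed.
            self.queue_running = [False] * count
        elif value & DRIVER_OK:
            self.queue_running = [True] * count


dev = ModelDevice(num_queues=4)
dev.write_status(ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK)
running_before = all(dev.queue_running)   # queues run after DRIVER_OK
dev.write_status(dev.status | SUSPEND)
running_after = any(dev.queue_running)    # no queue runs once suspended
dev.write_status(0)                       # reset clears the status
```

In this model the number of driver -> device writes needed to stop the queues
does not grow with the queue count, which is the point being made about
device-level suspend.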


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:14                                                   ` Zhu, Lingshan
@ 2023-09-12 10:16                                                     ` Parav Pandit
  2023-09-12 10:28                                                       ` Zhu, Lingshan
  2023-09-13  2:23                                                     ` Parav Pandit
  1 sibling, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12 10:16 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 3:45 PM
> 
> On 9/12/2023 5:35 PM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 2:38 PM
> >> supplementary: As Jason ever pointed out: the two solution can
> >> co-exist for sure, I am implementing basic facilities, admin vq can
> >> free feel to reuse them like forwarding messages to them, and this can help
> support nested.
> > Sure. Sounds good.
> >
> > At lest two device vendors + other industry bodies including led by Intel are
> moving away from the register-based implementation in virtualization area.
> This series is self-contained, it is an register based solution. It introduces basic
> facilities, doesn't depend on others like AQ.
> > And registers that you expose are not supporting device reset and FLR
> sequence. So please add some text for that in PCI transport section about
> violation.
> > And guideline for driver on how it should not touch them to make this usable.
> > This will make the nested solution more clear.
> PCI FLR is out of this scope, for virtio you can still reset the device by writing 0.
> >
> > Do you find the administration commands we proposed in [1] useful for
> nested case?
> > If not, both will likely diverge.
> Not till now.
> >
> > We would like to avoid suspending individual VQs in the passthrough case, as
> things are controlled at the device level.
> > It also reduces driver -> device interaction for large queue count ranging from
> 1 to 32K.
> >
> > So at present I see very little overlap between the two. I will look more again
> on 9/13 if passthrough proposal can utilize anything from your series.
> It does not suspending an individual VQ, when suspend, all VQs are STOPPED.

We also need to stop configuration notifications, shared memory updates, etc.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:16                                                     ` Parav Pandit
@ 2023-09-12 10:28                                                       ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:28 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 6:16 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 3:45 PM
>>
>> On 9/12/2023 5:35 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 2:38 PM
>>>> supplementary: As Jason ever pointed out: the two solution can
>>>> co-exist for sure, I am implementing basic facilities, admin vq can
>>>> free feel to reuse them like forwarding messages to them, and this can help
>> support nested.
>>> Sure. Sounds good.
>>>
>>> At lest two device vendors + other industry bodies including led by Intel are
>> moving away from the register-based implementation in virtualization area.
>> This series is self-contained, it is an register based solution. It introduces basic
>> facilities, doesn't depend on others like AQ.
>>> And registers that you expose are not supporting device reset and FLR
>> sequence. So please add some text for that in PCI transport section about
>> violation.
>>> And guideline for driver on how it should not touch them to make this usable.
>>> This will make the nested solution more clear.
>> PCI FLR is out of this scope, for virtio you can still reset the device by writing 0.
>>> Do you find the administration commands we proposed in [1] useful for
>> nested case?
>>> If not, both will likely diverge.
>> Not till now.
>>> We would like to avoid suspending individual VQs in the passthrough case, as
>> things are controlled at the device level.
>>> It also reduces driver -> device interaction for large queue count ranging from
>> 1 to 32K.
>>> So at present I see very little overlap between the two. I will look more again
>> on 9/13 if passthrough proposal can utilize anything from your series.
>> It does not suspending an individual VQ, when suspend, all VQs are STOPPED.
> We need to stop configuration notifications as well and shared memory update etc.
That is already done, if you have read my series.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:14                                                   ` Zhu, Lingshan
  2023-09-12 10:16                                                     ` Parav Pandit
@ 2023-09-13  2:23                                                     ` Parav Pandit
  2023-09-13  4:03                                                       ` Zhu, Lingshan
  1 sibling, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  2:23 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 3:45 PM

> > Do you find the administration commands we proposed in [1] useful for
> nested case?
> > If not, both will likely diverge.
> Not till now.

I don't think you reviewed [1] closely enough.
The following functionality that you want to post in v1 is already covered there.
Why can't you use it from [1]?

a. Dirty page tracking (write recording in [1])
b. Device suspend/resume (mode setting)
c. Inflight descriptors (device context)

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html



^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  2:23                                                     ` Parav Pandit
@ 2023-09-13  4:03                                                       ` Zhu, Lingshan
  2023-09-13  4:15                                                         ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:03 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/13/2023 10:23 AM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 3:45 PM
>>> Do you find the administration commands we proposed in [1] useful for
>> nested case?
>>> If not, both will likely diverge.
>> Not till now.
> I don’t think you reviewed [1] enough.
> Following functionality that you want to post in v1 is already covered.
> Why cannot you use it from [1]?
>
> a. Dirty page tracking (write recording in [1]),
> b. device suspend/resume (mode setting)
> c. inflight descriptors (device context)
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
You cut off the message, so I don't know which conversation you are
replying to.

But anyway, as pointed out many times, we are implementing basic facilities.
>
>


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:03                                                       ` Zhu, Lingshan
@ 2023-09-13  4:15                                                         ` Parav Pandit
  2023-09-13  4:21                                                           ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  4:15 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:33 AM
> To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
> cohuck@redhat.com; stefanha@redhat.com; virtio-comment@lists.oasis-
> open.org; virtio-dev@lists.oasis-open.org
> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
> VIRTIO_F_QUEUE_STATE
> 
> 
> 
> On 9/13/2023 10:23 AM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 3:45 PM
> >>> Do you find the administration commands we proposed in [1] useful
> >>> for
> >> nested case?
> >>> If not, both will likely diverge.
> >> Not till now.
> > I don’t think you reviewed [1] enough.
> > Following functionality that you want to post in v1 is already covered.
> > Why cannot you use it from [1]?
> >
> > a. Dirty page tracking (write recording in [1]), b. device
> > suspend/resume (mode setting) c. inflight descriptors (device context)
> >
> > [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> you cut off the message, I don't know which conversation you are replying to.
> 
> But anyway, as pointed out many times, we are implementing basic facilities.

I asked you which parts of the series [1] you can use for inflight tracking, dirty tracking, and suspend/resume.
You replied that none is useful.
After that, you said you plan to send a v2 that does dirty page tracking and inflight tracking.

So I asked: why can't you use [1], which covers the things you plan to send in the future?

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

Hope this clarifies.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:15                                                         ` Parav Pandit
@ 2023-09-13  4:21                                                           ` Zhu, Lingshan
  2023-09-13  4:37                                                             ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:21 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/13/2023 12:15 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:33 AM
>> To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
>> cohuck@redhat.com; stefanha@redhat.com; virtio-comment@lists.oasis-
>> open.org; virtio-dev@lists.oasis-open.org
>> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
>> VIRTIO_F_QUEUE_STATE
>>
>>
>>
>> On 9/13/2023 10:23 AM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 3:45 PM
>>>>> Do you find the administration commands we proposed in [1] useful
>>>>> for
>>>> nested case?
>>>>> If not, both will likely diverge.
>>>> Not till now.
>>> I don’t think you reviewed [1] enough.
>>> Following functionality that you want to post in v1 is already covered.
>>> Why cannot you use it from [1]?
>>>
>>> a. Dirty page tracking (write recording in [1]), b. device
>>> suspend/resume (mode setting) c. inflight descriptors (device context)
>>>
>>> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
>> you cut off the message, I don't know which conversation you are replying to.
>>
>> But anyway, as pointed out many times, we are implementing basic facilities.
> I asked you what parts of the series [1] can be used by you for inflight tracking, dirty tracking, suspend/resume.
> You replied, none is useful.
> And after that you said you plan to send v2 that does dirty page tracking, inflight tracking.
>
> So I asked why you cannot use [1] that covers things that you plan to send in future?
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
>
> Hope this clarifies.
We plan to implement a self-contained solution.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:21                                                           ` Zhu, Lingshan
@ 2023-09-13  4:37                                                             ` Parav Pandit
  2023-09-14  3:11                                                               ` Jason Wang
  2023-09-14  8:22                                                               ` Zhu, Lingshan
  0 siblings, 2 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  4:37 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:51 AM

> we plan to implement a self-contain solution
Make sure that it works with device reset and FLR.
If it does not, explain that it is for mediation-mode-related tricks.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:37                                                             ` Parav Pandit
@ 2023-09-14  3:11                                                               ` Jason Wang
  2023-09-17  5:25                                                                 ` Parav Pandit
  2023-09-14  8:22                                                               ` Zhu, Lingshan
  1 sibling, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-14  3:11 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Wed, Sep 13, 2023 at 12:37 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > Sent: Wednesday, September 13, 2023 9:51 AM
>
> > we plan to implement a self-contain solution
> Make sure that works with device reset and FLR.

We don't need to do that. It's out of the spec.

> And if not, explain that it is for mediation mode related tricks.

These are not tricks and, again, this is not mediation but trap and
emulation. It is the fundamental methodology used in virtualization, and
in the virtio spec as well.

Thanks


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14  3:11                                                               ` Jason Wang
@ 2023-09-17  5:25                                                                 ` Parav Pandit
  2023-09-19  4:34                                                                   ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-17  5:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:41 AM
> 
> On Wed, Sep 13, 2023 at 12:37 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Wednesday, September 13, 2023 9:51 AM
> >
> > > we plan to implement a self-contain solution
> > Make sure that works with device reset and FLR.
> 
> We don't need to do that. It's out of the spec.
> 
It is not. For a PCI member device, it needs to work reliably.
Not doing so means it relies on trap+emulation, hence it just cannot be complete.
And that is OK with me.
I just won't claim that trap+emulation is a _complete_ method.

> > And if not, explain that it is for mediation mode related tricks.
> 
> It's not the tricks and again, it's not mediation but trap and emulation. It's the
> fundamental methodology used in virtualization, so does the virtio spec.

Not the virtio spec of 2023, and more so for new features.
The base for virtio spec 1.x was 0.9.5, not QEMU or other mediation-based software, AFAIK.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-17  5:25                                                                 ` Parav Pandit
@ 2023-09-19  4:34                                                                   ` Jason Wang
  2023-09-19  7:32                                                                     ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-19  4:34 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Sun, Sep 17, 2023 at 1:25 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:41 AM
> >
> > On Wed, Sep 13, 2023 at 12:37 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > Sent: Wednesday, September 13, 2023 9:51 AM
> > >
> > > > we plan to implement a self-contain solution
> > > Make sure that works with device reset and FLR.
> >
> > We don't need to do that. It's out of the spec.
> >
> It is not. For the PCI member device, it needs to work reliably.

We never mentioned FLR in the PCI transport layer before and vendors
have produced tons of hardware PCI devices for several years.

If it's important, please describe it in detail in your series, but your series doesn't.

> Not doing means it relies on the trap+emulation, hence it just cannot complete.
> And it is ok to me.
> I just wont claim that trap+emulation is _complete_ method.
>
> > > And if not, explain that it is for mediation mode related tricks.
> >
> > It's not the tricks and again, it's not mediation but trap and emulation. It's the
> > fundamental methodology used in virtualization, so does the virtio spec.
>
> Not the virto spec of 2023 and more for new features.
> The base for virtio spec 1.x was 0.9.5, but not the QEMU or other mediation based software AFAIK.

Are you saying those new features will not be suitable for software
devices? If yes, please explain why.

Or are you saying the virtio spec is not capable of supporting hardware devices?

Thanks


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-19  4:34                                                                   ` Jason Wang
@ 2023-09-19  7:32                                                                     ` Parav Pandit
  0 siblings, 0 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:04 AM
> 
> On Sun, Sep 17, 2023 at 1:25 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, September 14, 2023 8:41 AM
> > >
> > > On Wed, Sep 13, 2023 at 12:37 PM Parav Pandit <parav@nvidia.com>
> wrote:
> > > >
> > > >
> > > >
> > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > Sent: Wednesday, September 13, 2023 9:51 AM
> > > >
> > > > > we plan to implement a self-contain solution
> > > > Make sure that works with device reset and FLR.
> > >
> > > We don't need to do that. It's out of the spec.
> > >
> > It is not. For the PCI member device, it needs to work reliably.
> 
> We never mentioned FLR in the PCI transport layer before and vendors have
> produced tons of hardware PCI devices for several years.
It is not mentioned, like many other PCI things, because it's native.
What I was saying is that if you are claiming that suspend, resume, etc. are all basic facilities that can also work with passthrough, please show how they work with FLR in place.

> 
> If it's important, please describe it in detail in your series but it doesn't.
> 
It is mentioned. Please review it there.

> > Not doing means it relies on the trap+emulation, hence it just cannot
> complete.
> > And it is ok to me.
> > I just wont claim that trap+emulation is _complete_ method.
> >
> > > > And if not, explain that it is for mediation mode related tricks.
> > >
> > > It's not the tricks and again, it's not mediation but trap and
> > > emulation. It's the fundamental methodology used in virtualization, so does
> the virtio spec.
> >
> > Not the virto spec of 2023 and more for new features.
> > The base for virtio spec 1.x was 0.9.5, but not the QEMU or other mediation
> based software AFAIK.
> 
> Are you saying those new features will not be suitable for software devices? If
> yes, please explain why.
> 
> Or are you saying the virtio spec is not capable for hardware devices?

No, you were hinting that trap+emulation is the fundamental technology of virtualization and of the virtio spec.

And I replied that for virtio spec 1.x the only baseline was spec 0.9.5; trap+emulation was not the baseline when the 1.x spec was drafted.

The virtio spec, with a few caveats, is capable of supporting hw and new sw, and it needs to continue to build new features that can work without trap+emulation mode.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:37                                                             ` Parav Pandit
  2023-09-14  3:11                                                               ` Jason Wang
@ 2023-09-14  8:22                                                               ` Zhu, Lingshan
  1 sibling, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-14  8:22 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/13/2023 12:37 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:51 AM
>> we plan to implement a self-contain solution
> Make sure that works with device reset and FLR.
> And if not, explain that it is for mediation mode related tricks.
This has also been repeated many times: this is trap and emulate. I don't know why you
keep talking about mediation.

And why FLR? Is it related, or is it out of spec?

And again and again, we are implementing BASIC FACILITIES, which should
NOT introduce unnecessary dependencies.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:06                                             ` Zhu, Lingshan
  2023-09-12  9:08                                               ` Zhu, Lingshan
@ 2023-09-12  9:28                                               ` Parav Pandit
  2023-09-12 10:17                                                 ` Zhu, Lingshan
  1 sibling, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  9:28 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 2:36 PM

> If you want AQ used for LM, it should support nested anyway, don't break user
> logic.
You ignored the other part of my question when you asked above,
i.e., the PCI transport does not allow such weird bifurcation.
> 
> 
> This (my)series can support nested.
Maybe it does, by hacking the device reset and FLR sequence, without dirty tracking, without P2P support, and without passthrough mode.
None of these requirements are addressed.

If you intend to cover both requirements, let's work towards that and see if it can converge. If there are technical challenges,
then there is no point in pushing the claim that in-band VF with mediation is the only way to move forward.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:28                                               ` Parav Pandit
@ 2023-09-12 10:17                                                 ` Zhu, Lingshan
  2023-09-12 10:25                                                   ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:17 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 5:28 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 2:36 PM
>> If you want AQ used for LM, it should support nested anyway, don't break user
>> logic.
> You ignored the other part of my question when you asked above.
> i.e. a PCI transport do not allow such weird bifurcation.
I failed to process your comment. Do you mean the registers don't 
support nested?
>>
>> This (my)series can support nested.
> Maybe it does, with hacking the device reset and FLR sequence, without dirty tracking, without P2P support, without passthrough mode.
> All these requirements are not addressed.
>
> If you intent to cover both requirements, lets work towards it to see if it can converge, if there are technical challenges,
> then there is no point in pushing to claim that in-band VF with mediation is the only way to move forward.
Why do you need P2P? Why are FLR and reset a concern? Why do you think
dirty page tracking is not supported?
>


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:17                                                 ` Zhu, Lingshan
@ 2023-09-12 10:25                                                   ` Parav Pandit
  2023-09-12 10:32                                                     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12 10:25 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 3:47 PM
> 
> On 9/12/2023 5:28 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 2:36 PM If you want AQ used for LM,
> >> it should support nested anyway, don't break user logic.
> > You ignored the other part of my question when you asked above.
> > i.e. a PCI transport do not allow such weird bifurcation.
> I failed to process your comment. 

> Do you mean the registers don't support nested?
No. I mean register access should support the device reset flow and the FLR flow.

> >>

> >> This (my)series can support nested.
> > Maybe it does, with hacking the device reset and FLR sequence, without dirty
> tracking, without P2P support, without passthrough mode.
> > All these requirements are not addressed.
> >
> > If you intent to cover both requirements, lets work towards it to see
> > if it can converge, if there are technical challenges, then there is no point in
> pushing to claim that in-band VF with mediation is the only way to move
> forward.
> Why do you need P2P? 
I answered this a few hours ago.

> Why FLR and reset are concerned? 
Because they must work.
> Why do you think dirty page tracking is not supported?
Because you wrote in the cover letter: "Future work: dirty page tracking and in-flight descriptors."
And you repeatedly resisted administration commands. I don’t see how the above two can be done efficiently without administration commands.
And if one is going to use administration commands in the future for the above two, there is no point in doing the current series over registers.
I don’t see how an administration queue can work after the device is suspended, or how dirty page tracking can continue after FLR.

All these aspects are covered in [1], which can be extended for nesting, if needed, with a sidecar VF.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:25                                                   ` Parav Pandit
@ 2023-09-12 10:32                                                     ` Zhu, Lingshan
  2023-09-12 10:40                                                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:32 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 6:25 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 3:47 PM
>>
>> On 9/12/2023 5:28 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 2:36 PM If you want AQ used for LM,
>>>> it should support nested anyway, don't break user logic.
>>> You ignored the other part of my question when you asked above.
>>> i.e. a PCI transport do not allow such weird bifurcation.
>> I failed to process your comment.
>> Do you mean the registers don't support nested?
> No. I mean registers access should support device reset flow and FLR flow.
Do you see this as a concern? Why do you think there are problems?
>
>>>> This (my)series can support nested.
>>> Maybe it does, with hacking the device reset and FLR sequence, without dirty
>> tracking, without P2P support, without passthrough mode.
>>> All these requirements are not addressed.
>>>
>>> If you intent to cover both requirements, lets work towards it to see
>>> if it can converge, if there are technical challenges, then there is no point in
>> pushing to claim that in-band VF with mediation is the only way to move
>> forward.
>> Why do you need P2P?
> I answered this before few hours back.
Still, why is P2P a blocker for my series?
>
>> Why FLR and reset are concerned?
> Because they must work.
Why are FLR and reset affected? When SUSPEND is set, the device should stop 
operation.
>> Why do you think dirty page tracking is not supported?
> Because you wrote in cover letter " Future work: dirty page tracking and in-flight descriptors."
> And you repeatedly resisted administration commands. I don’t see how above two can be done efficiently without administration commands.
> And if one is going to use administration command in future for above two, there is no point of doing current series over registers.
> I don’t see how and administration queue can work after device is suspended, and after FLR how dirty page tracking can continue.
>
> All these aspects are covered in [1] that can be extended for nesting if needed with side car VF.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
I will post V2 with in-flight descriptor tracking and dirty-page 
tracking. They are not in this series because
I want this series to be focused and small.




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:32                                                     ` Zhu, Lingshan
@ 2023-09-12 10:40                                                       ` Parav Pandit
  2023-09-12 13:04                                                         ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12 10:40 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 4:02 PM
> 
> On 9/12/2023 6:25 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 3:47 PM
> >>
> >> On 9/12/2023 5:28 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 2:36 PM If you want AQ used for
> >>>> LM, it should support nested anyway, don't break user logic.
> >>> You ignored the other part of my question when you asked above.
> >>> i.e. a PCI transport do not allow such weird bifurcation.
> >> I failed to process your comment.
> >> Do you mean the registers don't support nested?
> > No. I mean registers access should support device reset flow and FLR flow.
> DO you see this is a concern? Or why do you think there are problems?
Yes. The administration queue won't answer after SUSPEND is set in the device status.
So how do you plan to support the AQ, in-flight tracking, dirty tracking and the suspended device status together?

Device bifurcation is not supported in the PCI spec, and it is a hack in virtio to do so.

As I asked a few times before, if you have solved this, I am very interested to learn more about it.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:40                                                       ` Parav Pandit
@ 2023-09-12 13:04                                                         ` Zhu, Lingshan
  2023-09-12 13:36                                                           ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 13:04 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 6:40 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 4:02 PM
>>
>> On 9/12/2023 6:25 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 3:47 PM
>>>>
>>>> On 9/12/2023 5:28 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 2:36 PM If you want AQ used for
>>>>>> LM, it should support nested anyway, don't break user logic.
>>>>> You ignored the other part of my question when you asked above.
>>>>> i.e. a PCI transport do not allow such weird bifurcation.
>>>> I failed to process your comment.
>>>> Do you mean the registers don't support nested?
>>> No. I mean registers access should support device reset flow and FLR flow.
>> DO you see this is a concern? Or why do you think there are problems?
> Yes. administration queue wont answer after SUSPEND is done in device status.
> So how do you plan to support AQ, inflight tracking dirty tracking and suspend device status?
>
> Device bifurcation is not supported in the pci spec and its hack in virtio to do so.
>
> As I asked few times before, if you have solved this, I am very interested to learn more about it.
Please read the series




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:04                                                         ` Zhu, Lingshan
@ 2023-09-12 13:36                                                           ` Parav Pandit
  0 siblings, 0 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-12 13:36 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 6:34 PM


> > As I asked few times before, if you have solved this, I am very interested to
> learn more about it.
> Please read the series
As I already promised, I will read it on 9/13.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  8:12                       ` Parav Pandit
  2023-09-11  8:46                         ` Zhu, Lingshan
@ 2023-09-12  4:10                         ` Jason Wang
  2023-09-12  6:05                           ` Parav Pandit
  1 sibling, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-12  4:10 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

>
> > Why this series can not support nested?
> I don’t see all the aspects that I covered in series [1] ranging from flr, device context migration, virtio level reset, dirty page tracking, p2p support, etc. covered in some device, vq suspend resume piece.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

The series works for stateless devices. Before we introduce device
states in the spec, we can't migrate stateful devices. So the device
context doesn't make much sense right now.

Dirty page tracking in virtio is not a must for live migration to
work. It can be done via platform facilities or even software. And to
make it more efficient, it needs to utilize transport facilities
instead of a general one.

FLR and P2P demonstrate the fragility of a simple passthrough method
and how it conflicts with live migration and complicates the device
implementation. And it means you need to audit all PCI features and add
workarounds if there are any possible issues (or use a whitelist).
This is tricky, and we are migrating virtio, not virtio-pci. If we don't
use simple passthrough we don't need to care about this.

Since the functionality proposed in this series focuses on the minimal
set of functionality for migration, it is virtio specific and self
contained, so nothing special is required to work in the nested case.

Thanks

>


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  4:10                         ` Jason Wang
@ 2023-09-12  6:05                           ` Parav Pandit
  2023-09-13  4:45                             ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  6:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 12, 2023 9:40 AM
> >
> > > Why this series can not support nested?
> > I don’t see all the aspects that I covered in series [1] ranging from flr, device
> context migration, virtio level reset, dirty page tracking, p2p support, etc.
> covered in some device, vq suspend resume piece.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> 
> The series works for stateless devices. Before we introduce device states in the
> spec, we can't migrate stateful devices. So the device context doesn't make
> much sense right now.
The series works for stateful devices too. The device context covers it.

> 
> Dirty page tracking in virtio is not a must for live migration to work. It can be
> done via platform facilities or even software. And to make it more efficient, it
> needs to utilize transport facilities instead of a general one.
> 
It is also optional in the spec proposal.
Most platforms reportedly are not able to do it efficiently either, hence the vfio subsystem added support for it.

> The FLR, P2P demonstrates the fragility of a simple passthrough method and
> how it conflicts with live migration and complicates the device implementation.
Huh, it shows the opposite.
It shows that both will seamlessly work.

> And it means you need to audit all PCI features and do workaround if there're
> any possible issues (or using a whitelist).
No need for any of this.

> This is tricky and we are migrating virtio not virtio-pci. If we don't use simple
> passthrough we don't need to care about this.
> 
Exactly, we are migrating a virtio device for the PCI transport.
As usual, if you have to keep arguing about not doing passthrough, we are surely past that point.
Virtio does not need to stay under the weird umbrella of always mediating, etc.

Series [1] will be enhanced further to support virtio passthrough devices for device context and more.
We would like to extend the support even further.

> Since the functionality proposed in this series focus on the minimal set of the
> functionality for migration, it is virtio specific and self contained so nothing
> special is required to work in the nest.

Maybe it is.

Again, I repeat: I would like to converge the admin commands between the passthrough and non-passthrough cases.
If we can converge, that is good.
If not, both modes can expand.
It is not either/or, as the use cases are different.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:05                           ` Parav Pandit
@ 2023-09-13  4:45                             ` Jason Wang
  2023-09-13  6:39                               ` Parav Pandit
  2023-09-13  8:27                               ` Michael S. Tsirkin
  0 siblings, 2 replies; 148+ messages in thread
From: Jason Wang @ 2023-09-13  4:45 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Tue, Sep 12, 2023 at 2:05 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 12, 2023 9:40 AM
> > >
> > > > Why this series can not support nested?
> > > I don’t see all the aspects that I covered in series [1] ranging from flr, device
> > context migration, virtio level reset, dirty page tracking, p2p support, etc.
> > covered in some device, vq suspend resume piece.
> > >
> > > [1]
> > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > > tml
> >
> > The series works for stateless devices. Before we introduce device states in the
> > spec, we can't migrate stateful devices. So the device context doesn't make
> > much sense right now.
> The series works for stateful devices too. The device context covers it.

How? Can it be used for migrating any existing stateful devices? Don't
we need to define what context means for a specific stateful device
before you can introduce things like device context? Please go through
the archives for the relevant discussions (e.g virtio-FS), it's not as
simple as introducing a device context API.

And what's more, how can it handle the migration compatibility?

>
> >
> > Dirty page tracking in virtio is not a must for live migration to work. It can be
> > done via platform facilities or even software. And to make it more efficient, it
> > needs to utilize transport facilities instead of a general one.
> >
> It is also optional in the spec proposal.
> Most platforms claimed are not able to do efficiently either,

Most platforms are working towards an efficient way. But we are
talking about different things, hardware based dirty page logging is
not a must, that is what I'm saying. For example, KVM doesn't use
hardware to log dirty pages.
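
To illustrate what logging dirty pages in software can look like, here is a minimal sketch. It is a toy model under stated assumptions, not how KVM or any particular hypervisor implements it: the class name `DirtyLog`, the 4 KiB page size and the `mark_write`/`fetch_and_clear` methods are all hypothetical names chosen for this example. It assumes the software layer sees every write it needs to log (for example, because it performs or traps the write itself).

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class DirtyLog:
    """Toy software dirty-page log: one bit per guest page."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bitmap = bytearray((num_pages + 7) // 8)

    def mark_write(self, addr, length):
        # Mark every page touched by a write of `length` bytes at `addr`.
        if length <= 0:
            return
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE
        for pfn in range(first, last + 1):
            self.bitmap[pfn // 8] |= 1 << (pfn % 8)

    def fetch_and_clear(self):
        # Return the dirty page numbers and reset the log, as a migration
        # loop would do once per pre-copy iteration.
        dirty = [
            pfn
            for pfn in range(self.num_pages)
            if self.bitmap[pfn // 8] & (1 << (pfn % 8))
        ]
        self.bitmap = bytearray(len(self.bitmap))
        return dirty
```

A migration loop in this model would call `fetch_and_clear()` repeatedly and resend only the returned pages until the set is small enough to stop the guest.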

> hence the vfio subsystem added the support for it.

As an open standard, if it is designed for a specific software
subsystem on a specific OS, it's a failure.

>
> > The FLR, P2P demonstrates the fragility of a simple passthrough method and
> > how it conflicts with live migration and complicates the device implementation.
> Huh, it shows the opposite.
> It shows that both will seamlessly work.

Have you even tried your proposal with a prototype device?

>
> > And it means you need to audit all PCI features and do workaround if there're
> > any possible issues (or using a whitelist).
> No need for any of this.

You need to prove this otherwise it's fragile. It's the duty of the
author to justify not the reviewer.

For example FLR is required to be done in 100ms. How could you achieve
this during the live migration? How does it affect the downtime and
FRS?

>
> > This is tricky and we are migrating virtio not virtio-pci. If we don't use simple
> > passthrough we don't need to care about this.
> >
> Exactly, we are migrating virtio device for the PCI transport.

No, the migration facility is a general requirement for all transports.
Starting from a PCI-specific solution (actually, your proposal does not
even cover everything for PCI) may easily end up with issues
in other transports.

Even if you want to migrate virtio for PCI, please at least read the QEMU
migration code for virtio and PCI, then you will soon realize that a
lot of things are missing in your proposal.

> As usual, if you have to keep arguing about not doing passhthrough, we are surely past that point.

Who is "we"? Has something like what you said here passed a vote and
been written into the spec? We all know the current virtio spec is not built
upon passthrough.

> Virtio does not need to stay in the weird umbrella to always mediate etc.

It's not the mediation, we're not doing vDPA, the device model we had
in hardware and we present to guests are all virtio devices. It's the
trap and emulation which is fundamental in the world of virtualization
for the past decades. It's the model we used to virtualize standard
devices. If you want to debate this methodology, virtio community is
clearly the wrong forum.

>
> Series [1] will be enhanced further to support virtio passthrough device for device context and more.
> Even further we like to extend the support.
>
> > Since the functionality proposed in this series focus on the minimal set of the
> > functionality for migration, it is virtio specific and self contained so nothing
> > special is required to work in the nest.
>
> Maybe it is.
>
> Again, I repeat and like to converge the admin commands between passthrough and non-passthrough cases.

You need to prove at least that your proposal can work for the
passthrough before we can try to converge.

> If we can converge it is good.
> If not both modes can expand.
> It is not either or as use cases are different.

Admin commands are not the cure for all, I've stated drawbacks in
other threads. Not repeating it again here.

Thanks


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:45                             ` Jason Wang
@ 2023-09-13  6:39                               ` Parav Pandit
  2023-09-14  3:08                                 ` Jason Wang
  2023-09-13  8:27                               ` Michael S. Tsirkin
  1 sibling, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  6:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, September 13, 2023 10:15 AM

[..]
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
> > > > 61.h
> > > > tml
> > >
> > > The series works for stateless devices. Before we introduce device
> > > states in the spec, we can't migrate stateful devices. So the device
> > > context doesn't make much sense right now.
> > The series works for stateful devices too. The device context covers it.
> 
> How? Can it be used for migrating any existing stateful devices? Don't we need
> to define what context means for a specific stateful device before you can
> introduce things like device context? Please go through the archives for the
> relevant discussions (e.g virtio-FS), it's not as simple as introducing a device
> context API.
> 
A device will have its own context, for example the RSS definition, or flow filters in the future.
The device context will be extended after the first series.

> And what's more, how can it handle the migration compatibility?
It will be taken care of in a follow-on, as we all know that this needs to be checked.
I will include notes on future follow-up work items in v1, which will be taken care of after this series.

> > > Dirty page tracking in virtio is not a must for live migration to
> > > work. It can be done via platform facilities or even software. And
> > > to make it more efficient, it needs to utilize transport facilities instead of a
> general one.
> > >
> > It is also optional in the spec proposal.
> > Most platforms claimed are not able to do efficiently either,
> 
> Most platforms are working towards an efficient way. But we are talking about
> different things, hardware based dirty page logging is not a must, that is what
> I'm saying. For example, KVM doesn't use hardware to log dirty pages.
> 
I also said the same: hw-based dirty page logging is not a must. :)
One day the hw MMU will be able to track everything efficiently. I have not seen it happening yet.

> > hence the vfio subsystem added the support for it.
> 
> As an open standard, if it is designed for a specific software subsystem on a
> specific OS, it's a failure.
> 
It is not.
One needs to accept that, in certain areas, virtio is following the trail of advancements already made in the sw stack,
so that virtio spec advancement fits in to serve such use cases.
And blocking such advancement of the virtio spec to promote a mediation-only approach is not good either.

BTW: One can say the mediation approach is also designed for a specific software subsystem and is hence a failure.
I will stay away from claiming that, as I don’t see it this way.

> >
> > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > method and how it conflicts with live migration and complicates the device
> implementation.
> > Huh, it shows the opposite.
> > It shows that both will seamlessly work.
> 
> Have you even tried your proposal with a prototype device?
Of course. It was delivered to users 1.5 years ago, with virtio-net and virtio-blk devices, before bringing it to the spec.

> 
> >
> > > And it means you need to audit all PCI features and do workaround if
> > > there're any possible issues (or using a whitelist).
> > No need for any of this.
> 
> You need to prove this otherwise it's fragile. It's the duty of the author to justify
> not the reviewer.
> 
One can neither post nor review a giant series in one go.
Hence the work is to be split on logical boundaries.
Feature provisioning, PCI layout, etc. are secondary tasks to take care of.

> For example FLR is required to be done in 100ms. How could you achieve this
> during the live migration? How does it affect the downtime and FRS?
> 
Good technical question to discuss instead of passthrough vs mediation. :)

Device administration work is separate from the device operational part.
The device context records the current state of the device. When the FLR occurs, the device stops all operations.
And on the next read of the device context, the FLRed context is returned.
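
The flow described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the interface of series [1] or of any real device: `ToyDevice`, `driver_start`, `flr` and `read_device_context` are hypothetical names, and the context is reduced to a status byte plus per-virtqueue indices. The point it shows is that the administrative read path is separate from the operational state, so a read after FLR simply reports the reset state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Toy model: operational state plus a separate admin-side read path."""

    status: int = 0
    vq_last_avail_idx: dict = field(default_factory=dict)
    running: bool = False

    def driver_start(self, status, vq_state):
        # Operational side: the driver brings the device up.
        self.status = status
        self.vq_last_avail_idx = dict(vq_state)
        self.running = True

    def flr(self):
        # FLR: the device stops all operations and returns to reset state.
        self.running = False
        self.status = 0
        self.vq_last_avail_idx = {}

    def read_device_context(self):
        # Hypothetical admin-side read: always reflects the current state,
        # so after an FLR it returns the FLRed (reset) context.
        return {
            "status": self.status,
            "vq_last_avail_idx": dict(self.vq_last_avail_idx),
        }
```

In this model a migration driver that reads the context before and after `flr()` sees two different snapshots, and only the later one is valid to transfer.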

> >
> > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > don't use simple passthrough we don't need to care about this.
> > >
> > Exactly, we are migrating virtio device for the PCI transport.
> 
> No, the migration facility is a general requirement for all transport.
It is for all transports. One can extend it when doing this for MMIO.

> Starting from a PCI specific (actually your proposal does not even cover all even
> for PCI) solution which may easily end up with issues in other transports.
> 
Like?

> Even if you want to migrate virtio for PCI,  please at least read Qemu migration
> codes for virtio and PCI, then you will soon realize that a lot of things are
> missing in your proposal.
> 
Device context is something that will be extended.
VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI transport.

> > As usual, if you have to keep arguing about not doing passhthrough, we are
> surely past that point.
> 
> Who is "we"? 
> 
We = You and me.
Since 2021, you have kept objecting that passthrough must not be done.
And blocking the work done by other technical committee members to improve the virtio spec to make that happen is simply wrong.

> Is something like what you said here passed the vote and written
> to the spec? 
Not only me.
The virtio technical committee has agreed on _both_ nested and hardware-based implementations.

"hardware-based implementations" is part of the virtio specification charter, per the ballot of [1].

[1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html

And a passthrough hardware-based device is within the charter that we strive to support.

> We all know the current virtio spec is not built upon passthrough.

This effort improves the passthrough hw-based implementation, and it should not be blocked.

> > Virtio does not need to stay in the weird umbrella to always mediate etc.
> 
> It's not the mediation, we're not doing vDPA, the device model we had in
> hardware and we present to guests are all virtio devices. It's the trap and
> emulation which is fundamental in the world of virtualization for the past
> decades. It's the model we used to virtualize standard devices. If you want to
> debate this methodology, virtio community is clearly the wrong forum.
> 
I am not debating it at all. You keep bringing up the point of mediation.

The proposal of [1] is clear that it wants to do hardware-based passthrough devices with the least amount of virtio-level mediation.

So one mode of virtualization has been used somewhere; that’s fine. It can continue with full virtualization and mediation,

and also with hardware-based passthrough devices.

> >
> > Series [1] will be enhanced further to support virtio passthrough device for
> device context and more.
> > Even further we like to extend the support.
> >
> > > Since the functionality proposed in this series focus on the minimal
> > > set of the functionality for migration, it is virtio specific and
> > > self contained so nothing special is required to work in the nest.
> >
> > Maybe it is.
> >
> > Again, I repeat and like to converge the admin commands between
> passthrough and non-passthrough cases.
> 
> You need to prove at least that your proposal can work for the passthrough
> before we can try to converge.
> 
What do you mean by "prove"? Virtio specification development is not a proof-based method.

If you want to participate, please review the patches and help the community to improve them.

> > If we can converge it is good.
> > If not both modes can expand.
> > It is not either or as use cases are different.
> 
> Admin commands are not the cure for all, I've stated drawbacks in other
> threads. Not repeating it again here.
He he, sure, I am not attempting to cure all.
One solution does not fit all cases.
Admin commands are used to solve the specific problem for which the AQ is designed.

One could make the argument: take the PCI fabric to a 10 km distance, don't bring in a new virtio TCP transport...

Drawing boundaries around the virtio spec in a certain way only makes it inferior. So please do not block the advancements brought in by [1].
We really would like to make it more robust with your rich experience and inputs, if you care to participate.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  6:39                               ` Parav Pandit
@ 2023-09-14  3:08                                 ` Jason Wang
  2023-09-17  5:22                                   ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-14  3:08 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, September 13, 2023 10:15 AM
>
> [..]
> > > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> > > >
> > > > The series works for stateless devices. Before we introduce device
> > > > states in the spec, we can't migrate stateful devices. So the device
> > > > context doesn't make much sense right now.
> > > The series works for stateful devices too. The device context covers it.
> >
> > How? Can it be used for migrating any existing stateful devices? Don't we need
> > to define what context means for a specific stateful device before you can
> > introduce things like device context? Please go through the archives for the
> > relevant discussions (e.g virtio-FS), it's not as simple as introducing a device
> > context API.
> >
> A device will have its own context for example RSS definition, or flow filters tomorrow.

If you know there are things that are missing when posting the
patches, please use the RFC tag.

> The device context will be extended post the first series.
>
> > And what's more, how can it handle the migration compatibility?
> It will be taken care in follow on as we all know that this to be checked.

You don't even mention it anywhere in your series.

> I will include the notes of future follow up work items in v1, which will be taken care post this series.
>
> > > > Dirty page tracking in virtio is not a must for live migration to
> > > > work. It can be done via platform facilities or even software. And
> > > > to make it more efficient, it needs to utilize transport facilities instead of a
> > general one.
> > > >
> > > It is also optional in the spec proposal.
> > > Most platforms claimed are not able to do efficiently either,
> >
> > Most platforms are working towards an efficient way. But we are talking about
> > different things, hardware based dirty page logging is not a must, that is what
> > I'm saying. For example, KVM doesn't use hardware to log dirty pages.
> >
> I also said same, that hw based dirty page logging is not must. :)
> One day hw mmu will be able to track everything efficiently. I have not seen it happening yet.

How do you define efficiency? KVM uses page faults, and most modern
IOMMUs support PRI now.

>
> > > hence the vfio subsystem added the support for it.
> >
> > As an open standard, if it is designed for a specific software subsystem on a
> > specific OS, it's a failure.
> >
> It is not.
> One need accept that, in certain areas virtio is following the trails of advancement already done in sw stack.
> So that virtio spec advancement fits in to supply such use cases.
> And blocking such advancement of virtio spec to promote only_mediation approach is not good either.
>
> BTW: One can say the mediation approach is also designed for specific software subsystem and hence failure.
> I will stay away from quoting it, as I don’t see it this way.

The proposal is based on well known technology since the birth of
virtualization. I never knew a mainstream hypervisor that doesn't do
trap and emulate, did you?

>
> > >
> > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > method and how it conflicts with live migration and complicates the device
> > implementation.
> > > Huh, it shows the opposite.
> > > It shows that both will seamlessly work.
> >
> > Have you even tried your proposal with a prototype device?
> Of course, it is delivered to user for 1.5 years ago before bringing it to the spec with virtio-net and virtio-blk devices.

I hope this is your serious answer, but it looks like it is not. Your
proposal misses a lot of states as I pointed out in another thread,
how can it work in fact?

>
> >
> > >
> > > > And it means you need to audit all PCI features and do workaround if
> > > > there're any possible issues (or using a whitelist).
> > > No need for any of this.
> >
> > You need to prove this otherwise it's fragile. It's the duty of the author to justify
> > not the reviewer.
> >
> One cannot post patches and nor review giant series in one go.
> Hence the work to be split on a logical boundary.
> Features provisioning, pci layout etc is secondary tasks to take care of.

Again, if you know something is missing, you need to explain it in the
series instead of waiting for some reviewers to point it out and say
it's well-known afterwards.

>
> > For example FLR is required to be done in 100ms. How could you achieve this
> > during the live migration? How does it affect the downtime and FRS?
> >
> Good technical question to discuss instead of passthrough vs mediation. :)
>
> Device administration work is separate from the device operational part.
> The device context records what is the current device context, when the FLR occurs, the device stops all the operations.
> And on next read of the device context the FLRed context is returned.

Firstly, you didn't explain how it affects the live migration, for
example, what happens if we try to migrate while FLR is ongoing.
Secondly, you ignore the other two questions.

Let's save the time of both.

>
> > >
> > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > don't use simple passthrough we don't need to care about this.
> > > >
> > > Exactly, we are migrating virtio device for the PCI transport.
> >
> > No, the migration facility is a general requirement for all transport.
> It is for all transport. One can extend when do for MMIO.

By using admin commands? It can not perform well for registered.

>
> > Starting from a PCI specific (actually your proposal does not even cover all even
> > for PCI) solution which may easily end up with issues in other transports.
> >
> Like?

The admin command/virtqueue itself may not work well for other
transport. That's the drawback of your proposal while this proposal
doesn't do any coupling.

>
> > Even if you want to migrate virtio for PCI,  please at least read Qemu migration
> > codes for virtio and PCI, then you will soon realize that a lot of things are
> > missing in your proposal.
> >
> Device context is something that will be extended.
> VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI transport.

This is just one small item; what about the PCI config space and others?

Again, please read the QEMU code; a lot of things are missing in your
proposal now. If everything is to work for passthrough-based live
migration, I'm pretty sure you need more than what QEMU has, since it
can only do a small fraction of the whole of PCI.

>
> > > As usual, if you have to keep arguing about not doing passhthrough, we are
> > surely past that point.
> >
> > Who is "we"?
> >
> We = You and me.
> From 2021, you keep objecting that passthrough must not be done.

This is a big misunderstanding, you need to justify it or at least
address the concerns from any reviewer.

> And blocking the work done by other technical committee members to improve the virtio spec to make that happen is simply wrong.

It's unrealistic to think that one will be 100% correct. Justify your
proposal or why I was wrong instead of ignoring my questions and
complaining. That is why we need a community. If it doesn't work,
virtio provides another process for convergence.

>
> > Is something like what you said here passed the vote and written
> > to the spec?
> Not only me.
> The virtio technical committee has agreed for nested and hardware-based implementation _both_.
>
> " hardware-based implementations" is part of the virtio specification charter with ballot of [1].
>
> [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html

Let's not do conceptual shifts: I was asking about passthrough, but you
gave me the hardware implementation.

>
> And passthrough hardware-based device is in the charter that we strive to support.
>
> > We all know the current virtio spec is not built upon passthrough.
>
> This efforts improve the passthrough hw based implementation that should not be blocked.

Your proposal has been posted for only several days, and you think I
would block it just because I asked several questions, some of which
are not answered?

>
> > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> >
> > It's not the mediation, we're not doing vDPA, the device model we had in
> > hardware and we present to guests are all virtio devices. It's the trap and
> > emulation which is fundamental in the world of virtualization for the past
> > decades. It's the model we used to virtualize standard devices. If you want to
> > debate this methodology, virtio community is clearly the wrong forum.
> >
> I am not debating it at all. You keep bringing up the point of mediation.
>
> The proposal of [1] is clear that wants to do hardware based passthrough devices with least amount of virtio level mediation.
>
> So somewhere mode of virtualizing has been used, that’s fine, it can continue with full virtualization, mediation,
>
> And also hardware based passthrough device.
>
> > >
> > > Series [1] will be enhanced further to support virtio passthrough device for
> > device context and more.
> > > Even further we like to extend the support.
> > >
> > > > Since the functionality proposed in this series focus on the minimal
> > > > set of the functionality for migration, it is virtio specific and
> > > > self contained so nothing special is required to work in the nest.
> > >
> > > Maybe it is.
> > >
> > > Again, I repeat and like to converge the admin commands between
> > passthrough and non-passthrough cases.
> >
> > You need to prove at least that your proposal can work for the passthrough
> > before we can try to converge.
> >
> What do you mean by "prove"? virtio specification development is not proof based method.

For example, several of my questions were ignored.

>
> If you want to participate, please review the patches and help community to improve.

See above.

>
> > > If we can converge it is good.
> > > If not both modes can expand.
> > > It is not either or as use cases are different.
> >
> > Admin commands are not the cure for all, I've stated drawbacks in other
> > threads. Not repeating it again here.
> He he, sure, I am not attempting to cure all.
> One solution does not fit all cases.

Then why do you want to couple migration with admin commands?

> Admin commands are used to solve the specific problem for which the AQ is designed for.
>
> One can make argument saying take pci fabric to 10 km distance, don’t bring new virtio tcp transport...
>
> Drawing boundaries around virtio spec in certain way only makes it further inferior. So please do not block advancements bring in [1].

As a reviewer, I ask questions but some of them are ignored, do you
expect the reviewer to figure out by themselves?  If yes, then the
virtio community is not the only community that can block you.

> We really would like to make it more robust with your rich experience and inputs, if you care to participate.

We can collaborate for sure. As I pointed out in another thread, from
what I can see in the current versions of both proposals:

I see a good opportunity to build your admin commands proposal on top
of this proposal. In other words, we can focus first on what needs to
be migrated:

1) queue state
2) inflight descriptors
3) dirty pages (optional)
4) device state(context) (optional)

I'd leave 3 and 4 for later since they are very complicated features.
Then we can invent an interface to access those facilities. This is how
this series is structured.
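
To make the ordering above concrete, here is a minimal sketch in C. It assumes the queue state is just the pair last_avail_idx / last_used_idx named in the cover letter; the struct layout, the flag names and the helper functions are hypothetical illustrations, not anything defined by the spec or by either series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for item 1 (queue state); not the spec's layout. */
struct vq_state {
    uint16_t last_avail_idx; /* device's position in the available ring */
    uint16_t last_used_idx;  /* device's position in the used ring */
};

/* Items 1-4 of the list as bit flags; the names are illustrative assumptions. */
enum mig_item {
    MIG_QUEUE_STATE    = 1u << 0,
    MIG_INFLIGHT_DESC  = 1u << 1,
    MIG_DIRTY_PAGES    = 1u << 2, /* optional */
    MIG_DEVICE_CONTEXT = 1u << 3  /* optional */
};

/* The set to define first under this reading: items 1 and 2 only. */
static unsigned mig_first_step(void)
{
    return MIG_QUEUE_STATE | MIG_INFLIGHT_DESC;
}

/* Copy a queue state, e.g. from a saved source snapshot to the destination. */
static void vq_state_copy(struct vq_state *dst, const struct vq_state *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}
```

How such a state is actually read or written (registers, admin commands, or something else) is deliberately left out of the sketch, since that is the interface question discussed separately.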

What's more, the interface could be admin commands or
transport-specific interfaces. And when we invent admin commands, you
may realize you are inventing a new transport, which is the idea of
transport via admin commands.

Thanks


>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14  3:08                                 ` Jason Wang
@ 2023-09-17  5:22                                   ` Parav Pandit
  2023-09-19  4:32                                     ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-17  5:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:39 AM
> 
> On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:15 AM
> >
> > [..]
> > > > > > [1]
> > > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> > > > >
> > > > > The series works for stateless devices. Before we introduce
> > > > > device states in the spec, we can't migrate stateful devices. So
> > > > > the device context doesn't make much sense right now.
> > > > The series works for stateful devices too. The device context covers it.
> > >
> > > How? Can it be used for migrating any existing stateful devices?
> > > Don't we need to define what context means for a specific stateful
> > > device before you can introduce things like device context? Please
> > > go through the archives for the relevant discussions (e.g
> > > virtio-FS), it's not as simple as introducing a device context API.
> > >
> > A device will have its own context for example RSS definition, or flow filters
> tomorrow.
> 
> If you know there are things that are missing when posting the patches, please
> use the RFC tag.
> 
They are not missing. They are optional, which is why they are not needed in this series.

> > The device context will be extended post the first series.
> >
> > > And what's more, how can it handle the migration compatibility?
> > It will be taken care in follow on as we all know that this to be checked.
> 
> You don't even mention it anywhere in your series.
> 
Migration compatibility is a topic in itself, regardless of the device migration series.
It is part of the feature provisioning phase, which is needed regardless.
Just as you and Lingshan wanted to keep the suspend bit series small and logical, the device migration series is also logically split by functionality.
I don't see a need to mention long-known missing functionality that is common to both approaches.

> > I will include the notes of future follow up work items in v1, which will be
> taken care post this series.
> >
> > > > > Dirty page tracking in virtio is not a must for live migration
> > > > > to work. It can be done via platform facilities or even
> > > > > software. And to make it more efficient, it needs to utilize
> > > > > transport facilities instead of a
> > > general one.
> > > > >
> > > > It is also optional in the spec proposal.
> > > > Most platforms claimed are not able to do efficiently either,
> > >
> > > Most platforms are working towards an efficient way. But we are
> > > talking about different things, hardware based dirty page logging is
> > > not a must, that is what I'm saying. For example, KVM doesn't use hardware
> to log dirty pages.
> > >
> > I also said same, that hw based dirty page logging is not must. :) One
> > day hw mmu will be able to track everything efficiently. I have not seen it
> happening yet.
> 
> How do you define efficiency? KVM uses page fault and most modern IOMMU
> support PRI now.
>
One cannot define PRI as a mandatory feature. In our research and experiments we see that PRI is significantly slower at handling page faults.
Yet that is a different topic...

Efficiency is defined by the downtime of the multiple devices in a VM.
And a leading OS allowed device advancements by allowing the device to report dirty pages in a CPU- and platform-agnostic way...

One can use the post-copy approach as well; the current device migration is built around the established pre-copy approach.

> >
> > > > hence the vfio subsystem added the support for it.
> > >
> > > As an open standard, if it is designed for a specific software
> > > subsystem on a specific OS, it's a failure.
> > >
> > It is not.
> > One need accept that, in certain areas virtio is following the trails of
> advancement already done in sw stack.
> > So that virtio spec advancement fits in to supply such use cases.
> > And blocking such advancement of virtio spec to promote only_mediation
> approach is not good either.
> >
> > BTW: One can say the mediation approach is also designed for specific
> software subsystem and hence failure.
> > I will stay away from quoting it, as I don’t see it this way.
> 
> The proposal is based on well known technology since the birth of virtualization.
Sure, but by that argument such a series is also targeted at a specific software subsystem, and hence a failure.

I am not saying that; I said the opposite: since virtio is in catch-up mode, it is defining the interface so that it can fit into these OS platforms.
There are multiple of them, all of which support passthrough devices.

> I never knew a mainstream hypervisor that doesn't do trap and emulate, did
> you?
> 
For passthrough devices, it does trap and emulation for the PCI config space, not for virtio interfaces such as queues, the virtio config space and more.

> >
> > > >
> > > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > > method and how it conflicts with live migration and complicates
> > > > > the device
> > > implementation.
> > > > Huh, it shows the opposite.
> > > > It shows that both will seamlessly work.
> > >
> > > Have you even tried your proposal with a prototype device?
> > Of course, it is delivered to user for 1.5 years ago before bringing it to the
> spec with virtio-net and virtio-blk devices.
> 
> I hope this is your serious answer, but it looks like it is not. Your proposal misses
> a lot of states as I pointed out in another thread, how can it work in fact?
> 
Which states?
What is posted in series [1] is the minimal, base set of required items; the optional ones are omitted as they can be done incrementally.
Lingshan had a hard time digesting the basics of the P2P and dirty page tracking work in this short series.
So there is no point in pushing a large part of the device context and making the series blurry.
It will be done incrementally in follow-up work.

> > > >
> > > > > And it means you need to audit all PCI features and do
> > > > > workaround if there're any possible issues (or using a whitelist).
> > > > No need for any of this.
> > >
> > > You need to prove this otherwise it's fragile. It's the duty of the
> > > author to justify not the reviewer.
> > >
> > One cannot post patches and nor review giant series in one go.
> > Hence the work to be split on a logical boundary.
> > Features provisioning, pci layout etc is secondary tasks to take care of.
> 
> Again, if you know something is missing, you need to explain it in the series
> instead of waiting for some reviewers to point it out and say it's well-known
> afterwards.
> 
The patch set cannot be a laundry list of items missing in virtio spec.
It is short and focused on the device migration.

> >
> > > For example FLR is required to be done in 100ms. How could you
> > > achieve this during the live migration? How does it affect the downtime and
> FRS?
> > >
> > Good technical question to discuss instead of passthrough vs
> > mediation. :)
> >
> > Device administration work is separate from the device operational part.
> > The device context records what is the current device context, when the FLR
> occurs, the device stops all the operations.
> > And on next read of the device context the FLRed context is returned.
> 
> Firstly, you didn't explain how it affects the live migration, for example, what
> happens if we try to migrate while FLR is ongoing.
> Secondly, you ignore the other two questions.
> 
> Let's save the time of both.
> 
There is nothing to explain about device reset and live migration, because there are absolutely no touch points.
device_status is just another register like the rest of them.
One does not need to poke around registers when doing passthrough.

> >
> > > >
> > > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > > don't use simple passthrough we don't need to care about this.
> > > > >
> > > > Exactly, we are migrating virtio device for the PCI transport.
> > >
> > > No, the migration facility is a general requirement for all transport.
> > It is for all transport. One can extend when do for MMIO.
> 
> By using admin commands? It can not perform well for registered.
> 
Yes, admin commands using AQ on MMIO based owner device will also be just fine.

> >
> > > Starting from a PCI specific (actually your proposal does not even
> > > cover all even for PCI) solution which may easily end up with issues in other
> transports.
> > >
> > Like?
> 
> The admin command/virtqueue itself may not work well for other transport.
> That's the drawback of your proposal while this proposal doesn't do any
> coupling.
> 
There is no coupling in the spec between admin commands and virtqueues, as Michael consistently insisted.
And in my proposal there is no such coupling either.

> >
> > > Even if you want to migrate virtio for PCI,  please at least read
> > > Qemu migration codes for virtio and PCI, then you will soon realize
> > > that a lot of things are missing in your proposal.
> > >
> > Device context is something that will be extended.
> > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> transport.
> 
> This is just one mini stuff, how about PCI config space and others?
> 
No need to migrate the PCI config space, because the migration is of the virtio device, and not of the underlying transport.
Therefore, one can migrate from a virtio member device to a fully software-based device as well, and vice versa.

> Again, please read Qemu codes, a lot of things are missing in your proposal
> now. If everything is fine to do passthrough based live migration, I'm pretty sure
> you need more than what Qemu has since it can only do a small fraction of the
> whole PCI.
> 
I will read.
Many of the pieces may be implemented by the device over time following the charter.

> >
> > > > As usual, if you have to keep arguing about not doing
> > > > passhthrough, we are
> > > surely past that point.
> > >
> > > Who is "we"?
> > >
> > We = You and me.
> > From 2021, you keep objecting that passthrough must not be done.
> 
> This is a big misunderstanding, you need to justify it or at least address the
> concerns from any reviewer.
> 
They are getting addressed, if you have comments, please post those comments in the actual series.
I wouldn’t diverge to discuss in different series here.

> > And blocking the work done by other technical committee members to
> improve the virtio spec to make that happen is simply wrong.
> 
> It's unrealistic to think that one will be 100% correct. Justify your proposal or
> why I was wrong instead of ignoring my questions and complaining. That is why
> we need a community. If it doesn't work, virtio provides another process for
> convergence.
> 
I am not expecting you to be 100% correct at all. I totally agree that you may miss something, and I may miss something.
And this is why I repeatedly and humbly ask to converge and jointly address the passthrough mode without the trap+emulation method.
The way I understood your comment is that passthrough for hw-based devices must not be done, and multiple hw vendors disagree with that.

> >
> > > Is something like what you said here passed the vote and written to
> > > the spec?
> > Not only me.
> > The virtio technical committee has agreed for nested and hardware-based
> implementation _both_.
> >
> > " hardware-based implementations" is part of the virtio specification charter
> with ballot of [1].
> >
> > [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> 
> Let's don't do conceptual shifts, I was asking the passthrough but you give me
> the hardware implementation.
> 
Passthrough devices implemented by hw that does dirty tracking and follows the spec.

> >
> > And passthrough hardware-based device is in the charter that we strive to
> support.
> >
> > > We all know the current virtio spec is not built upon passthrough.
> >
> > This efforts improve the passthrough hw based implementation that should
> not be blocked.
> 
> Your proposal was posted only for several days and you think I would block that
> just because I asked several questions and some of them are not answered?
> 
If I misunderstood, then I am sorry.
Let's progress and improve the passthrough use case without trap+emulation.
Trap+emulation=mediation is also a valid solution for the nested case.
And I frankly see a need for both, as they solve different problems.
Trap+emulation cannot achieve passthrough mode, hence my request was not to step on each other.

When both can use the common infra, it is good to do that; when they cannot, due to the technical challenges of the underlying transport, they should evolve differently.

> >
> > > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> > >
> > > It's not the mediation, we're not doing vDPA, the device model we
> > > had in hardware and we present to guests are all virtio devices.
> > > It's the trap and emulation which is fundamental in the world of
> > > virtualization for the past decades. It's the model we used to
> > > virtualize standard devices. If you want to debate this methodology, virtio
> community is clearly the wrong forum.
> > >
> > I am not debating it at all. You keep bringing up the point of mediation.
> >
> > The proposal of [1] is clear that wants to do hardware based passthrough
> devices with least amount of virtio level mediation.
> >
> > So somewhere mode of virtualizing has been used, that’s fine, it can
> > continue with full virtualization, mediation,
> >
> > And also hardware based passthrough device.
> >
> > > >
> > > > Series [1] will be enhanced further to support virtio passthrough
> > > > device for
> > > device context and more.
> > > > Even further we like to extend the support.
> > > >
> > > > > Since the functionality proposed in this series focus on the
> > > > > minimal set of the functionality for migration, it is virtio
> > > > > specific and self contained so nothing special is required to work in the
> nest.
> > > >
> > > > Maybe it is.
> > > >
> > > > Again, I repeat and like to converge the admin commands between
> > > passthrough and non-passthrough cases.
> > >
> > > You need to prove at least that your proposal can work for the
> > > passthrough before we can try to converge.
> > >
> > What do you mean by "prove"? virtio specification development is not proof
> based method.
> 
> For example, several of my questions were ignored.
> 
I didn't ignore them, but if I missed any, I will answer.

> >
> > If you want to participate, please review the patches and help community to
> improve.
> 
> See above.
> 
> >
> > > > If we can converge it is good.
> > > > If not both modes can expand.
> > > > It is not either or as use cases are different.
> > >
> > > Admin commands are not the cure for all, I've stated drawbacks in
> > > other threads. Not repeating it again here.
> > He he, sure, I am not attempting to cure all.
> > One solution does not fit all cases.
> 
> Then why do you want to couple migration with admin commands?
> 
Because of the following.
1. A device migration needs bulk data transfer, which is something that cannot be done with tiny registers.
It cannot be done through registers, because
a. registers are slow for bidirectional communication
b. registers do not scale well with the number of VFs
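
As a rough illustration of points a and b, the following C sketch counts how many register accesses a register-window interface would need to move a device context. The model (one access per register-width chunk, repeated per VF) and every name in it are assumptions made for illustration; no real context sizes or register widths are implied.

```c
#include <stddef.h>

/*
 * Accesses needed to move ctx_bytes of device context through a data
 * register that is reg_bytes wide (hypothetical model; reg_bytes must
 * be non-zero). Uses ceiling division so a partial chunk still counts.
 */
static size_t reg_accesses(size_t ctx_bytes, size_t reg_bytes)
{
    return (ctx_bytes + reg_bytes - 1) / reg_bytes;
}

/* Total accesses when the same transfer is repeated for num_vfs VFs. */
static size_t total_reg_accesses(size_t ctx_bytes, size_t reg_bytes,
                                 size_t num_vfs)
{
    return num_vfs * reg_accesses(ctx_bytes, reg_bytes);
}
```

Under this model the access count grows linearly with both the context size and the number of VFs, whereas a command carrying a buffer descriptor lets the device move the data itself.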

> > Admin commands are used to solve the specific problem for which the AQ is
> designed for.
> >
> > One can make argument saying take pci fabric to 10 km distance, don’t bring
> new virtio tcp transport...
> >
> > Drawing boundaries around virtio spec in certain way only makes it further
> inferior. So please do not block advancements bring in [1].
> 
> As a reviewer, I ask questions but some of them are ignored, do you expect the
> reviewer to figure out by themselves?  
Sure, please review.

Many of them were not questions, but assertions and conclusions that it does not fit nested, is sub-optimal, etc.

> 
> > We really would like to make it more robust with your rich experience and
> inputs, if you care to participate.
> 
> We can collaborate for sure: as I pointed out in another threads, from what I
> can see from the both proposals of the current version:
> 
> I see a good opportunity to build your admin commands proposal on top of this
> proposal. Or it means, we can focus on what needs to be migrated first:
> 
> 1) queue state
This is just one small part of the device context.
So once a device context is read/written, it covers the q state.

> 2) inflight descriptors
Same as the q state, it is part of the device context.

> 3) dirty pages (optional)
> 4) device state(context) (optional)
>
It is same as #1 and #2.
Splitting them from #1 and #2 is not needed.

We can extend the device context to be selectively queried for nested case..
 
> I'd leave 3 or 4 since they are very complicated features. Then we can invent an
> interface to access those facilities? This is how this series is structured.
> 
> And what's more, admin commands or transport specific interfaces. And when
> we invent admin commands, you may realize you are inventing a new transport
> which is the idea of transport via admin commands.

Not really. It is not a new transport at all.
I explained to you before, when you described it as a transport, that a transport must carry the driver notifications as well.
Otherwise it is just a set of commands.

The new commands of [1] are self-contained anyway.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-1-parav@nvidia.com/T/#t

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-17  5:22                                   ` Parav Pandit
@ 2023-09-19  4:32                                     ` Jason Wang
  2023-09-19  7:32                                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-19  4:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Sun, Sep 17, 2023 at 1:22 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:39 AM
> >
> > On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, September 13, 2023 10:15 AM
> > >
> > > [..]
> > > > > > > [1]
> > > > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/ms
> > > > > > > g000
> > > > > > > 61.h
> > > > > > > tml
> > > > > >
> > > > > > The series works for stateless devices. Before we introduce
> > > > > > device states in the spec, we can't migrate stateful devices. So
> > > > > > the device context doesn't make much sense right now.
> > > > > The series works for stateful devices too. The device context covers it.
> > > >
> > > > How? Can it be used for migrating any existing stateful devices?
> > > > Don't we need to define what context means for a specific stateful
> > > > device before you can introduce things like device context? Please
> > > > go through the archives for the relevant discussions (e.g
> > > > virtio-FS), it's not as simple as introducing a device context API.
> > > >
> > > > A device will have its own context for example RSS definition, or flow filters
> > > > tomorrow.
> >
> > If you know there are things that are missing when posting the patches, please
> > use the RFC tag.
> >
> It is not missing. They are optional, which is why it is not needed in this series.
>
> > > The device context will be extended post the first series.
> > >
> > > > And what's more, how can it handle the migration compatibility?
> > > It will be taken care in follow on as we all know that this to be checked.
> >
> > You don't even mention it anywhere in your series.
> >
> Migration compatibility is topic in itself regardless of device migration series.

Why? Without compatibility support, migration can't work in the
production environment.

> It is part of the feature provisioning phase needed regardless.

Definitely not, it is something that must be considered even without
any feature. It's about the robustness of the migration protocol.
Sometimes you need to do that since some states were lost in the
previous version of protocols or formats.

> Like how you and Lingshan wanted to keep the suspend bit series small and logical, device migration series is also logically split for the functionality.
> I don’t see a need to mention the long known missing functionality and common to both approaches.

Again, your proposal needs to describe at least the plan for dealing
with migration compatibility since you want a passthrough based
solution. That's the point.

>
> > > I will include the notes of future follow up work items in v1, which will be
> > > taken care post this series.
> > >
> > > > > > Dirty page tracking in virtio is not a must for live migration
> > > > > > to work. It can be done via platform facilities or even
> > > > > > software. And to make it more efficient, it needs to utilize
> > > > > > transport facilities instead of a general one.
> > > > > >
> > > > > It is also optional in the spec proposal.
> > > > > Most platforms claimed are not able to do efficiently either,
> > > >
> > > > Most platforms are working towards an efficient way. But we are
> > > > talking about different things, hardware based dirty page logging is
> > > > not a must, that is what I'm saying. For example, KVM doesn't use hardware
> > > > to log dirty pages.
> > > >
> > > I also said same, that hw based dirty page logging is not must. :) One
> > > day hw mmu will be able to track everything efficiently. I have not seen it
> > > happening yet.
> >
> > How do you define efficiency? KVM uses page fault and most modern IOMMU
> > support PRI now.
> >
> One cannot define PRI as mandatory feature.

There's no way to mandate PRI, it's a PCI specific facility.

> In our research and experiments we see that PRI is significantly slower to handle page faults.
> Yet different topic...

PRI's performance is definitely another topic, it's just an example
that tracking dirty pages by device is optional and transport (PCI)
can evolve for sure. What's more important, it demonstrates the basic
design of virtio, which is trying to leverage the transport instead of
a mandatory reinvention of everything.

>
> Efficiency is defined by the downtime of the multiple devices in a VM.

Ok, but you tend to ignore my question regarding the downtime.

> And leading OS allowed device advancements by allowing device to report dirty pages in cpu and platform agnostic way...
>

It has many things that I don't see a good answer for. For example,
the QOS raised by Ling Shan.

> One can use post-copy approach as well, current device migration is around established pre-copy approach.

Another drawback of your proposal. With transport specific assistance
like PRI, you can do both pre and post. But the point is we need to
make sure pre-copy downtime can satisfy the requirement instead of
switching to another.

>
> > >
> > > > > hence the vfio subsystem added the support for it.
> > > >
> > > > As an open standard, if it is designed for a specific software
> > > > subsystem on a specific OS, it's a failure.
> > > >
> > > It is not.
> > > One need accept that, in certain areas virtio is following the trails of
> > advancement already done in sw stack.
> > > So that virtio spec advancement fits in to supply such use cases.
> > > And blocking such advancement of virtio spec to promote only_mediation
> > approach is not good either.
> > >
> > > BTW: One can say the mediation approach is also designed for specific
> > software subsystem and hence failure.
> > > I will stay away from quoting it, as I don’t see it this way.
> >
> > The proposal is based on well known technology since the birth of virtualization.
> Sure, but that does not change the fact that such series is also targeted for a specific software subsystem..

How? This series reuses the existing capability by introducing just
two more registers on the existing common cfg structure and you think
it targets a specific software subsystem? If this is true, I think you
are actually challenging the design of the whole modern PCI transport.
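
To make the "two more registers" point concrete, here is a rough Python model. It is an illustration only, not spec text: it assumes the two new common cfg fields are per-queue 16-bit values selected through the existing queue_select and holding last_avail_idx and last_used_idx, and that they are read only while the device is suspended. The class, the helper names and the SUSPEND bit value are all made up for the sketch.

```python
SUSPEND = 0x10  # hypothetical bit value, for illustration only


class CommonCfgModel:
    """Toy model of a device's common cfg; not the real register layout."""

    def __init__(self, num_queues):
        self.num_queues = num_queues
        self.device_status = 0
        self.queue_select = 0
        # One (last_avail_idx, last_used_idx) pair per virtqueue.
        self._state = [(0, 0) for _ in range(num_queues)]

    # The two assumed per-queue registers, indexed by queue_select.
    def read_queue_state(self):
        return self._state[self.queue_select]

    def write_queue_state(self, avail, used):
        # Indices are 16-bit values, so wrap them accordingly.
        self._state[self.queue_select] = (avail & 0xFFFF, used & 0xFFFF)


def save_vq_state(dev):
    """Read every queue's state; only meaningful once the device is suspended."""
    if not dev.device_status & SUSPEND:
        raise RuntimeError("device must be suspended before reading vq state")
    saved = []
    for q in range(dev.num_queues):
        dev.queue_select = q
        saved.append(dev.read_queue_state())
    return saved


def restore_vq_state(dev, saved):
    """Write the saved per-queue state into a destination device."""
    for q, (avail, used) in enumerate(saved):
        dev.queue_select = q
        dev.write_queue_state(avail, used)
```

Nothing in this model depends on a particular OS or software subsystem; it only uses the select-then-access pattern the common cfg already has.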

> And hence failure.

Failure in what sense?

>
> I didn’t say that, I said the opposite that yes, since the virtio is in catch up mode, it is defining the interface so that it can fit into these OS platforms.
> Mostly multiple of them, who all support passthrough devices.

We are talking about different things again.

>
> > I never knew a mainstream hypervisor that doesn't do trap and emulate, did
> > you?
> >
> It does trap and emulation for PCI config space, not for virtio interfaces like queues, config space and more for passthrough devices.

Well, we are in the context of live migration, no? We all know
passthrough just works fine with the existing virtio spec...

>
> > >
> > > > >
> > > > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > > > method and how it conflicts with live migration and complicates
> > > > > > the device
> > > > implementation.
> > > > > Huh, it shows the opposite.
> > > > > It shows that both will seamlessly work.
> > > >
> > > > Have you even tried your proposal with a prototype device?
> > > Of course, it is delivered to user for 1.5 years ago before bringing it to the
> > spec with virtio-net and virtio-blk devices.
> >
> > I hope this is your serious answer, but it looks like it is not. Your proposal misses
> > a lot of states as I pointed out in another thread, how can it work in fact?
> >
> Which states?

Let me repeat it for the third time. You don't even cover all the
functionality of common cfg, how can guests see a consistent common
cfg state?

> What is posted in series [1] is minimal and base required items,

You need to prove it is minimal, instead of ignoring my questions. For
example, dirty page tracking is definitely optional.

> optional one is omitted as it can be done incrementally.
> Lingshan had hard time digesting the basics of P2P and dirty page tracking work in this short series.

You never explain why this series needs to deal with P2P and dirty
page tracking.

> So there is no point in pushing large part of the device context and making the series blurry.

I don't see a good definition of "device context" and most of the
device context has been covered by the existing PCI capabilities.

> It will be done incrementally subsequently.
>
> > > > >
> > > > > > And it means you need to audit all PCI features and do
> > > > > > workaround if there're any possible issues (or using a whitelist).
> > > > > No need for any of this.
> > > >
> > > > You need to prove this otherwise it's fragile. It's the duty of the
> > > > author to justify not the reviewer.
> > > >
> > > One cannot post patches and nor review giant series in one go.
> > > Hence the work to be split on a logical boundary.
> > > Features provisioning, pci layout etc is secondary tasks to take care of.
> >
> > Again, if you know something is missing, you need to explain it in the series
> > instead of waiting for some reviewers to point it out and say it's well-known
> > afterwards.
> >
> The patch set cannot be a laundry list of items missing in virtio spec.
> It is short and focused on the device migration.

You need to mention it in the cover letter, at least to give the big
picture. What's wrong with this? It helps to save time for everyone,
or people will keep asking similar questions. Is this too hard to
understand?

>
> > >
> > > > For example FLR is required to be done in 100ms. How could you
> > > > achieve this during the live migration? How does it affect the downtime and
> > FRS?
> > > >
> > > Good technical question to discuss instead of passthrough vs
> > > mediation. :)
> > >
> > > Device administration work is separate from the device operational part.
> > > The device context records what is the current device context, when the FLR
> > occurs, the device stops all the operations.
> > > And on next read of the device context the FLRed context is returned.
> >
> > Firstly, you didn't explain how it affects the live migration, for example, what
> > happens if we try to migrate while FLR is ongoing.
> > Secondly, you ignore the other two questions.
> >
> > Let's save the time of both.
> >
> There is nothing to explain about device reset and live migration, because there is absolutely there is no touch points.

Do you think this is a valid answer to my above question? Let's not
exhaust the patience of any reviewer.

> device_status is just another registers like rest of them.

I don't see device status itself as anything related to FLR.

> One does not need to poke around registers when doing passthrough.
>
> > >
> > > > >
> > > > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > > > don't use simple passthrough we don't need to care about this.
> > > > > >
> > > > > Exactly, we are migrating virtio device for the PCI transport.
> > > >
> > > > No, the migration facility is a general requirement for all transport.
> > > It is for all transport. One can extend when do for MMIO.
> >
> > By using admin commands? It can not perform well for registered.
> >
> Yes, admin commands using AQ on MMIO based owner device will also be just fine.

Can admin commands be implemented efficiently via registers? I would
like to see how it can work.

MMIO doesn't have the concepts of group owner etc at all or do you
know how to build one?

>
> > >
> > > > Starting from a PCI specific (actually your proposal does not even
> > > > cover all even for PCI) solution which may easily end up with issues in other
> > transports.
> > > >
> > > Like?
> >
> > The admin command/virtqueue itself may not work well for other transport.
> > That's the drawback of your proposal while this proposal doesn't do any
> > coupling.
> >
> There is no coupling in the spec of admin command with virtqueue as Michael consistently insisted.
> And in my proposal also there is no such coupling.

I hope so but I don't think so. We need to at least do this explicitly
by moving all the state definitions to the "basic facility" part.

>
> > >
> > > > Even if you want to migrate virtio for PCI,  please at least read
> > > > Qemu migration codes for virtio and PCI, then you will soon realize
> > > > that a lot of things are missing in your proposal.
> > > >
> > > Device context is something that will be extended.
> > > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> > transport.
> >
> > This is just one mini stuff, how about PCI config space and others?
> >
> No need to migrate the PCI config space, because migration is of the virtio device, and not the underlying transport.

Let me ask you a simple question, if you don't migrate the PCI config
space, how can you guarantee that guests see the same config space
state after migration? What happens if a specific capability exists
only in the src but not the destination? Or do you want to provision
PCI capabilities?
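
One way to picture this question is as a pre-migration check: everything the guest has already seen on the source must also be offered by the destination. The sketch below is only an illustration of that idea; the capability names and the set-based representation are made up and are not part of either proposal.

```python
def missing_on_destination(src_caps, dst_caps):
    """Return capabilities the guest saw on the source that the destination lacks."""
    return sorted(set(src_caps) - set(dst_caps))


def can_migrate(src_caps, dst_caps):
    """Migration is only safe if the destination offers everything the source exposed."""
    return not missing_on_destination(src_caps, dst_caps)
```

A destination with extra capabilities passes this check; a destination missing even one capability that the source exposed does not, which is why the capabilities either have to be migrated or provisioned up front.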

> Therefore, one can migrate from virtio member device to a fully software based device as well and vis versa.

Please answer my question above.

>
> > Again, please read Qemu codes, a lot of things are missing in your proposal
> > now. If everything is fine to do passthrough based live migration, I'm pretty sure
> > you need more than what Qemu has since it can only do a small fraction of the
> > whole PCI.
> >
> I will read.
> Many of the pieces may be implemented by the device over time following the charter.
>
> > >
> > > > > As usual, if you have to keep arguing about not doing
> > > > > passhthrough, we are
> > > > surely past that point.
> > > >
> > > > Who is "we"?
> > > >
> > > We = You and me.
> > > From 2021, you keep objecting that passthrough must not be done.
> >
> > This is a big misunderstanding, you need to justify it or at least address the
> > concerns from any reviewer.
> >
> They are getting addressed, if you have comments, please post those comments in the actual series.
> I wouldn’t diverge to discuss in different series here.

Well, Lingshan's series was posted before yours, and it's you who keeps
referring to your proposal here. What's more, I've asked some
questions but most of them don't have a good answer.  So I need to
stop before I can ask more.

>
> > > And blocking the work done by other technical committee members to
> > improve the virtio spec to make that happen is simply wrong.
> >
> > It's unrealistic to think that one will be 100% correct. Justify your proposal or
> > why I was wrong instead of ignoring my questions and complaining. That is why
> > we need a community. If it doesn't work, virtio provides another process for
> > convergence.
> >
> I am not expecting you to be correct at all. I totally agree that you may miss something, I may miss something.
> And this is why I repeatedly, humbly ask to converge and jointly address the passthrough mode without trap+emulation method.
> The way I understood from your comment is, passthrough for hw based device must not be done and multiple of hw vendors disagree to it.

Again, this is a big misunderstanding. That passthrough can work doesn't
mean your proposal can work. I'm asking questions and want to figure
out if/how it can work correctly. But you keep ignoring them or
raising other unrelated issues.

>
> > >
> > > > Is something like what you said here passed the vote and written to
> > > > the spec?
> > > Not only me.
> > > The virtio technical committee has agreed for nested and hardware-based
> > implementation _both_.
> > >
> > > " hardware-based implementations" is part of the virtio specification charter
> > with ballot of [1].
> > >
> > > [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> >
> > Let's don't do conceptual shifts, I was asking the passthrough but you give me
> > the hardware implementation.
> >
> Passthrough devices implemented by hw which does dirty tracking and following the spec.

Why is passthrough coupled with dirty tracking?

>
> > >
> > > And passthrough hardware-based device is in the charter that we strive to
> > support.
> > >
> > > > We all know the current virtio spec is not built upon passthrough.
> > >
> > > This efforts improve the passthrough hw based implementation that should
> > not be blocked.
> >
> > Your proposal was posted only for several days and you think I would block that
> > just because I asked several questions and some of them are not answered?
> >
> If I misunderstood, then I am sorry.
> Lets progress and improve the passthrough use case without trap+emulation.

Unless any reviewer says no, the comments or concerns are a good
opportunity for you to justify your method. That's what I'm doing
right now and how the community works.

> Trap+emulation=mediation is also a valid solution for nested case.

Again. Not only for the nested case. This method is already used by
cloud vendors.

> And I frankly see a need for both as both are solving a different problem.

Then let's not couple state, suspending, and dirty page tracking with
admin commands.

> Trap+emulation cannot achieve passthrough mode, hence my request was not to step on each other.

It's easy to not step on others, but it would end up with duplications for sure.

>
> When both can use the common infra, it is good to do that, when they cannot, due to the technical challenges of underlying transport, they should evolve differently.
>
> > >
> > > > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> > > >
> > > > It's not the mediation, we're not doing vDPA, the device model we
> > > > had in hardware and we present to guests are all virtio devices.
> > > > It's the trap and emulation which is fundamental in the world of
> > > > virtualization for the past decades. It's the model we used to
> > > > virtualize standard devices. If you want to debate this methodology, virtio
> > community is clearly the wrong forum.
> > > >
> > > I am not debating it at all. You keep bringing up the point of mediation.
> > >
> > > The proposal of [1] is clear that wants to do hardware based passthrough
> > devices with least amount of virtio level mediation.
> > >
> > > So somewhere mode of virtualizing has been used, that’s fine, it can
> > > continue with full virtualization, mediation,
> > >
> > > And also hardware based passthrough device.
> > >
> > > > >
> > > > > Series [1] will be enhanced further to support virtio passthrough
> > > > > device for
> > > > device context and more.
> > > > > Even further we like to extend the support.
> > > > >
> > > > > > Since the functionality proposed in this series focus on the
> > > > > > minimal set of the functionality for migration, it is virtio
> > > > > > specific and self contained so nothing special is required to work in the
> > nest.
> > > > >
> > > > > Maybe it is.
> > > > >
> > > > > Again, I repeat and like to converge the admin commands between
> > > > passthrough and non-passthrough cases.
> > > >
> > > > You need to prove at least that your proposal can work for the
> > > > passthrough before we can try to converge.
> > > >
> > > What do you mean by "prove"? virtio specification development is not proof
> > based method.
> >
> > For example, several of my questions were ignored.
> >
> I didn’t ignore, but if I miss, I will answer.
>
> > >
> > > If you want to participate, please review the patches and help community to
> > improve.
> >
> > See above.
> >
> > >
> > > > > If we can converge it is good.
> > > > > If not both modes can expand.
> > > > > It is not either or as use cases are different.
> > > >
> > > > Admin commands are not the cure for all, I've stated drawbacks in
> > > > other threads. Not repeating it again here.
> > > He he, sure, I am not attempting to cure all.
> > > One solution does not fit all cases.
> >
> > Then why do you want to couple migration with admin commands?
> >
> Because of following.
> 1. A device migration needs to bulk data transfer, this is something cannot be done with tiny registers.
> Cannot be done through registers, because
> a. registers are slow for bidirectional communication
> b. do not scale well with scale of VFs

That's pretty fine, but let's not limit it to a virtqueue. Virtqueue
may not work for all the cases:

I must repeat some of Ling Shan's questions since I don't see a good
answer for them now.

1) If you want to use a virtqueue to do the migration with a downtime
requirement, is the driver required to do some sort of software QoS?
For example, what happens if one wants to migrate but the admin
virtqueue is out of space? And do we need a timeout for a specific
command and, if yes, what happens after the timeout?
2) Assuming one round of the migration requires several commands. Are
they allowed to be submitted in a batch? If yes, how is the ordering
guaranteed or we don't need it at all? If not, why do we even need a
queue?

If you're using an existing transport specific mechanism, you don't
need to care about the above. I'm not saying admin virtqueue can't
work but it definitely has more things to be considered.
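
The two questions above can be sketched with a toy model of a bounded command queue. This is purely hypothetical and is not the admin virtqueue as specified: the command names, the per-command deadline and the FIFO completion order are all assumptions chosen to make the questions visible.

```python
from collections import deque


class AdminQueueModel:
    """Toy bounded command queue; not the virtio admin virtqueue itself."""

    def __init__(self, size):
        self.size = size
        self.pending = deque()

    def submit(self, cmd, now, timeout):
        # Question 1: the queue may already be full when migration starts,
        # so the caller needs a policy for a rejected submission.
        if len(self.pending) >= self.size:
            return False
        self.pending.append((cmd, now + timeout))
        return True

    def complete_next(self, now):
        """Complete the oldest command and report whether it met its deadline."""
        # Question 2: FIFO completion is assumed here; a real queue would
        # need the ordering of batched commands to be defined somewhere.
        cmd, deadline = self.pending.popleft()
        return cmd, now <= deadline
```

Even in this small model the caller has to decide what to do when submit() fails and when a completion arrives late, which is the kind of software QoS being asked about.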

>
> > > Admin commands are used to solve the specific problem for which the AQ is
> > designed for.
> > >
> > > One can make argument saying take pci fabric to 10 km distance, don’t bring
> > new virtio tcp transport...
> > >
> > > Drawing boundaries around virtio spec in certain way only makes it further
> > inferior. So please do not block advancements bring in [1].
> >
> > As a reviewer, I ask questions but some of them are ignored, do you expect the
> > reviewer to figure out by themselves?
> Sure, please review.
>
> Many of them were not questions, but assertion and conclusions that it does not fit nested.. and sub-optional etc.

I think we all agree that your proposal does not fit for nesting, no?
It demonstrates that work needs to be done in the basic facility
first.

What's more, the conclusion is about coupling live migration with admin
commands. This point has been clarified several times before.

>
> >
> > > We really would like to make it more robust with your rich experience and
> > inputs, if you care to participate.
> >
> > We can collaborate for sure: as I pointed out in another threads, from what I
> > can see from the both proposals of the current version:
> >
> > I see a good opportunity to build your admin commands proposal on top of this
> > proposal. Or it means, we can focus on what needs to be migrated first:
> >
> > 1) queue state
> This is just one small part of the device context
> So once a device context is read/written, it covers q.

That's a layer violation. Virtqueue is the basic facility, states need
to be defined there.

>
> > 2) inflight descriptors
> Same a q state, it is part of the device context.

Admin commands are not the only way to access device context. For
example, do you agree the virtqueue address is part of the device
context? If yes, it is available in the common configuration now.

>
> > 3) dirty pages (optional)
> > 4) device state(context) (optional)
> >
> It is same as #1 and #2.
> Splitting them from #1 and #2 is not needed.
>
> We can extend the device context to be selectively queried for nested case..
>
> > I'd leave 3 or 4 since they are very complicated features. Then we can invent an
> > interface to access those facilities? This is how this series is structured.
> >
> > And what's more, admin commands or transport specific interfaces. And when
> > we invent admin commands, you may realize you are inventing a new transport
> > which is the idea of transport via admin commands.
>
> Not really. it is not a new transport at all.
> I explained you before when you quote is as transport, it must carry the driver notifications as well..
> Otherwise it is just set of commands..

I've explained that you need admin commands to save and load all
existing virtio PCI capabilities. This means a driver can just use
those commands to work. If not, please explain why I was wrong.

Thanks





>
> The new commands are self contained anyway of [1].
>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-1-parav@nvidia.com/T/#t




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-19  4:32                                     ` Jason Wang
@ 2023-09-19  7:32                                       ` Parav Pandit
  0 siblings, 0 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:02 AM

> > Migration compatibility is topic in itself regardless of device migration series.
> 
> Why? Without compatibility support, migration can't work in the production
> environment.

As I said, it is part of the future series. We don’t cook all features at once.
Orchestration knows when not to migrate, in an out-of-band manner.

> 
> > It is part of the feature provisioning phase needed regardless.
> 
> Definitely not, it is something that must be considered even without any feature.
I disagree with making it a must.
It is not a must, but it is greatly useful for sure.

> It's about the robustness of the migration protocol.
> Sometimes you need to do that since some states were lost in the previous
> version of protocols or formats .
> 
> > Like how you and Lingshan wanted to keep the suspend bit series small and
> logical, device migration series is also logically split for the functionality.
> > I don’t see a need to mention the long known missing functionality and
> common to both approaches.
> 
> Again, your proposal needs to describe at least the plan for dealing with
> migration compatibility since you want a passthrough based solution. That's the
> point.
> 
No matter passthrough or no-passthrough, migration compatibility cannot be achieved if the device does not provide a way to query and configure.
Migration will fail when features mismatch.

So I will add a note to the commit log that this is to be added in the future.

> >
> > > > I will include the notes of future follow up work items in v1,
> > > > which will be
> > > taken care post this series.
> > > >
> > > > > > > Dirty page tracking in virtio is not a must for live
> > > > > > > migration to work. It can be done via platform facilities or
> > > > > > > even software. And to make it more efficient, it needs to
> > > > > > > utilize transport facilities instead of a
> > > > > general one.
> > > > > > >
> > > > > > It is also optional in the spec proposal.
> > > > > > Most platforms claimed are not able to do efficiently either,
> > > > >
> > > > > Most platforms are working towards an efficient way. But we are
> > > > > talking about different things, hardware based dirty page
> > > > > logging is not a must, that is what I'm saying. For example, KVM
> > > > > doesn't use hardware
> > > to log dirty pages.
> > > > >
> > > > I also said same, that hw based dirty page logging is not must. :)
> > > > One day hw mmu will be able to track everything efficiently. I
> > > > have not seen it
> > > happening yet.
> > >
> > > How do you define efficiency? KVM uses page fault and most modern
> > > IOMMU support PRI now.
> > >
> > One cannot define PRI as mandatory feature.
> 
> There's no way to mandate PRI, it's a PCI specific facility.
> 
You proposed using PRI for migration; it becomes mandatory at that point.

> > In our research and experiments we see that PRI is significantly slower to
> handle page faults.
> > Yet different topic...
> 
> PRI's performance is definitely another topic, it's just an example that tracking
> dirty pages by device is optional and transport (PCI) can evolve for sure. What's
> more important, it demonstrates the basic design of virtio, which is trying to
> leverage the transport instead of a mandatory reveinting of everything.
> 
An example that does not work today is not a dependable technology to rely on now.
Anyway, not everyone will always use PRI.

> >
> > Efficiency is defined by the downtime of the multiple devices in a VM.
> 
> Ok, but you tend to ignore my question regarding the downtime.
> 
What is the question?
Admin commands can achieve the desired downtime, if that is what you are asking.

> > And leading OS allowed device advancements by allowing device to report
> dirty pages in cpu and platform agnostic way...
> >
> 
> It has many things that I don't see a good answer for. For example, the QOS
> raised by Ling Shan.
> 
I am not going to repeat QoS anymore. :)
He is questioning virtqueue semantics itself, he better rewrite the spec to not use virtqueue.

> > One can use post-copy approach as well, current device migration is around
> established pre-copy approach.
> 
> Another drawback of your proposal. With transport specific assistance like PRI,
PRI page fault rate in our research is 20x slower than cpu page fault rate.

> you can do both pre and post. But the point is we need to make sure pre-copy
> downtime can satisfy the requirement instead of switching to another.
> 
In our work we see it satisfies the downtime requirements.

Again dirty page tracking is optional so when PRI can catch up in next few years, driver can stop relying on it.

> >
> > > >
> > > > > > hence the vfio subsystem added the support for it.
> > > > >
> > > > > As an open standard, if it is designed for a specific software
> > > > > subsystem on a specific OS, it's a failure.
> > > > >
> > > > It is not.
> > > > One need accept that, in certain areas virtio is following the
> > > > trails of
> > > advancement already done in sw stack.
> > > > So that virtio spec advancement fits in to supply such use cases.
> > > > And blocking such advancement of virtio spec to promote
> > > > only_mediation
> > > approach is not good either.
> > > >
> > > > BTW: One can say the mediation approach is also designed for
> > > > specific
> > > software subsystem and hence failure.
> > > > I will stay away from quoting it, as I don’t see it this way.
> > >
> > > The proposal is based on well known technology since the birth of
> virtualization.
> > Sure, but that does not change the fact that such series is also targeted for a
> specific software subsystem..
> 
> How, this series reuses the existing capability by introducing just two more
> registers on the existing common cfg structure and you think it targets a specific
> software subsystem? If this is true, I think you are actually challenging the
> design of the whole modern PCI transport.
> 
No. The way I understood it, you are targeting the trap+emulation approach that you posted.
You need to show that your mechanism also works for passthrough, so that it is proven not to be targeted at a specific use case.

> > And hence failure.
> 
> Failure in what sense?
> 
You defined the failure first, when you quoted passthrough. :)

> >
> > I didn’t say that, I said the opposite that yes, since the virtio is in catch up
> mode, it is defining the interface so that it can fit into these OS platforms.
> > Mostly multiple of them, who all support passthrough devices.
> 
> We are talking about different things again.
> 
:)

> >
> > > I never knew a mainstream hypervisor that doesn't do trap and
> > > emulate, did you?
> > >
> > It does trap and emulation for PCI config space, not for virtio interfaces like
> queues, config space and more for passthrough devices.
> 
> Well, we are in the context of live migration, no? We all know passthrough just
> works fine with the existing virtio spec...
> 
Right, and we want to continue to make passthrough work fine with device migration.
So we are in the passthrough context, where only PCI-specific things are trapped as before, without additional virtio traps.

> >
> > > >
> > > > > >
> > > > > > > The FLR, P2P demonstrates the fragility of a simple
> > > > > > > passthrough method and how it conflicts with live migration
> > > > > > > and complicates the device
> > > > > implementation.
> > > > > > Huh, it shows the opposite.
> > > > > > It shows that both will seamlessly work.
> > > > >
> > > > > Have you even tried your proposal with a prototype device?
> > > > Of course, it is delivered to user for 1.5 years ago before
> > > > bringing it to the
> > > spec with virtio-net and virtio-blk devices.
> > >
> > > I hope this is your serious answer, but it looks like it is not.
> > > Your proposal misses a lot of states as I pointed out in another thread, how
> can it work in fact?
> > >
> > Which states?
> 
> Let me repeat it for the third time. You don't even cover all the functionality of
> common cfg, how can guests see a consistent common cfg state?
> 
Please respond in that series with what is missing. I will fix it in v1.

> > What is posted in series [1] is minimal and base required items,
> 
> You need to prove it is minimal, instead of ignoring my questions. For example,
> dirty page tracking is definitely optional.
>
Again, reviews are not proof based. Please review the series.
 
It is an optional feature that significantly improves VM downtime in the pre-copy approach.

> > optional one is omitted as it can be done incrementally.
> > Lingshan had hard time digesting the basics of P2P and dirty page tracking
> work in this short series.
> 
> You never explain why this series needs to deal with P2P and dirty page
> tracking.
> 
Please read my response to him, you likely missed it.

> > So there is no point in pushing large part of the device context and making the
> series blurry.
> 
> I don't see a good definition of "device context" and most of the device context
> has been covered by the existing PCI capabilities.
> 
Please respond in that patch.  Device context is well defined in the theory of operation [2] and also in the independent patch [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#u
[2] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#m12a5f675aaa95a1de8945772a3f5d1efb0c9e25e

> > It will be done incrementally subsequently.
> >
> > > > > >
> > > > > > > And it means you need to audit all PCI features and do
> > > > > > > workaround if there're any possible issues (or using a whitelist).
> > > > > > No need for any of this.
> > > > >
> > > > > You need to prove this otherwise it's fragile. It's the duty of
> > > > > the author to justify not the reviewer.
> > > > >
> > > > One cannot post patches and nor review giant series in one go.
> > > > Hence the work to be split on a logical boundary.
> > > > Features provisioning, pci layout etc is secondary tasks to take care of.
> > >
> > > Again, if you know something is missing, you need to explain it in
> > > the series instead of waiting for some reviewers to point it out and
> > > say it's well-known afterwards.
> > >
> > The patch set cannot be a laundry list of items missing in virtio spec.
> > It is short and focused on the device migration.
> 
> You need to mention it in the cover letter at least for a big picture at least,
> what's wrong with this? It helps to save time for everyone or people will keep
> asking similar questions. Is this too hard to be understood?
> 
No, it is not hard.
I will mention the adjacent features in the cover letter.
> >
> > > >
> > > > > For example FLR is required to be done in 100ms. How could you
> > > > > achieve this during the live migration? How does it affect the
> > > > > downtime and
> > > FRS?
> > > > >
> > > > Good technical question to discuss instead of passthrough vs
> > > > mediation. :)
> > > >
> > > > Device administration work is separate from the device operational part.
> > > > The device context records what is the current device context,
> > > > when the FLR
> > > occurs, the device stops all the operations.
> > > > And on next read of the device context the FLRed context is returned.
> > >
> > > Firstly, you didn't explain how it affects the live migration, for
> > > example, what happens if we try to migrate while FLR is ongoing.
> > > Secondly, you ignore the other two questions.
> > >
> > > Let's save the time of both.
> > >
> > There is nothing to explain about device reset and live migration, because
> there is absolutely there is no touch points.
> 
> Do you think this is a valid answer to my above question? Let's don't exhaust the
> patience from any reviewer.
> 
You asked related follow-up questions above.
A device status update does not affect the live migration.
Reading/writing other registers does not affect the live migration.

I am not sure such an explicit mention is worthwhile in the spec, but if you find it useful, I will add it in v1.

> > device_status is just another registers like rest of them.
> 
> I don't see device status itself as anything related to FLR.
> 
I don’t follow your above comment.

> > One does not need to poke around registers when doing passthrough.
> >
> > > >
> > > > > >
> > > > > > > This is tricky and we are migrating virtio not virtio-pci.
> > > > > > > If we don't use simple passthrough we don't need to care about this.
> > > > > > >
> > > > > > Exactly, we are migrating virtio device for the PCI transport.
> > > > >
> > > > > No, the migration facility is a general requirement for all transport.
> > > > It is for all transport. One can extend when do for MMIO.
> > >
> > > By using admin commands? It can not perform well for registered.
> > >
> > Yes, admin commands using AQ on MMIO based owner device will also be
> just fine.
> 
> Can admin commands be implemented efficiently via registered? I would like to
> see how it can work.
> 
Well, you have long preferred doing everything via MMIO registers, so you should define it.
I don't see any modern device implementing it. Maybe vendors who want to focus on the nested use case will.

> MMIO doesn't have the concepts of group owner etc at all or do you know how
> to build one?
I think Michael suggested having a new group type. That would work.

> 
> >
> > > >
> > > > > Starting from a PCI specific (actually your proposal does not
> > > > > even cover all even for PCI) solution which may easily end up
> > > > > with issues in other
> > > transports.
> > > > >
> > > > Like?
> > >
> > > The admin command/virtqueue itself may not work well for other transport.
> > > That's the drawback of your proposal while this proposal doesn't do
> > > any coupling.
> > >
> > There is no coupling in the spec of admin command with virtqueue as
> Michael consistently insisted.
> > And in my proposal also there is no such coupling.
> 
> I hope so but I don't think so. We need to at least do this explicitly by moving all
> the state definitions to the "basic facility" part.
I am not sure who will use it beyond the device migration use case.
Maybe it can be moved at that point in the future.

> 
> >
> > > >
> > > > > Even if you want to migrate virtio for PCI,  please at least
> > > > > read Qemu migration codes for virtio and PCI, then you will soon
> > > > > realize that a lot of things are missing in your proposal.
> > > > >
> > > > Device context is something that will be extended.
> > > > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> > > transport.
> > >
> > > This is just one mini stuff, how about PCI config space and others?
> > >
> > No need to migrate the PCI config space, because migration is of the virtio
> device, and not the underlying transport.
> 
> Let me ask you a simple question, if you don't migrate the PCI config space,
> how can you guarantee that guests see the same config space state after
> migration? What happens if a specific capability exists only in the src but not
> the destination? Or do you want to provision PCI capabilities?
> 
PCI capabilities are to be provisioned only if needed.
It is optional.
One can check whether they match or not.
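
A minimal sketch of such a match check, assuming each side's PCI capabilities are available as a flat list of capability IDs. The function names and the "source must be a subset of destination" rule are illustrative assumptions, not something either series defines:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if capability ID `id` appears in the list `caps` of length n. */
static bool cap_present(const uint8_t *caps, size_t n, uint8_t id)
{
    for (size_t i = 0; i < n; i++)
        if (caps[i] == id)
            return true;
    return false;
}

/* Hypothetical pre-migration check: every capability ID the guest could see
 * on the source must also exist on the destination. */
static bool caps_compatible(const uint8_t *src, size_t nsrc,
                            const uint8_t *dst, size_t ndst)
{
    for (size_t i = 0; i < nsrc; i++)
        if (!cap_present(dst, ndst, src[i]))
            return false;
    return true;
}
```

With this rule a destination may expose extra capabilities, but a capability that exists only on the source makes the check fail, which is the case asked about above. A real check would likely also have to compare the contents of each capability, not just its ID.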

> > Therefore, one can migrate from virtio member device to a fully software
> based device as well and vis versa.
> 
> Please answer my question above.
> 
> >
> > > Again, please read Qemu codes, a lot of things are missing in your
> > > proposal now. If everything is fine to do passthrough based live
> > > migration, I'm pretty sure you need more than what Qemu has since it
> > > can only do a small fraction of the whole PCI.
> > >
> > I will read.
> > Many of the pieces may be implemented by the device over time following
> the charter.
> >
> > > >
> > > > > > As usual, if you have to keep arguing about not doing
> > > > > > passhthrough, we are
> > > > > surely past that point.
> > > > >
> > > > > Who is "we"?
> > > > >
> > > > We = You and me.
> > > > From 2021, you keep objecting that passthrough must not be done.
> > >
> > > This is a big misunderstanding, you need to justify it or at least
> > > address the concerns from any reviewer.
> > >
> > They are getting addressed, if you have comments, please post those
> comments in the actual series.
> > I wouldn’t diverge to discuss in different series here.
> 
> Well, Lingshan's series was posted before you and it's you that keep referring to
> your proposal here. What's more, I've asked some questions but most of them
> don't have a good answer.  So I need to stop before I can ask more.
> 
If you really want to count which series was posted first, you have to go back to 2021 or so. :)

Please ask your question in the relevant series and not Lingshan's series.

> >
> > > > And blocking the work done by other technical committee members to
> > > improve the virtio spec to make that happen is simply wrong.
> > >
> > > It's unrealistic to think that one will be 100% correct. Justify
> > > your proposal or why I was wrong instead of ignoring my questions
> > > and complaining. That is why we need a community. If it doesn't
> > > work, virtio provides another process for convergence.
> > >
> > I am not expecting you to be correct at all. I totally agree that you may miss
> something, I may miss something.
> > And this is why I repeatedly, humbly ask to converge and jointly address the
> passthrough mode without trap+emulation method.
> > The way I understood from your comment is, passthrough for hw based device
> must not be done and multiple of hw vendors disagree to it.
> 
> Again, this is a big misunderstanding. Passthrough can work doesn't mean your
> proposal can work. I'm asking questions and want to figure out if/how it can
> work correctly. But you keep ignoring them or raising other unrelated issues.
> 
Since you agree that passthrough is an equally valid case,
let's review the passthrough series line by line.
This is not the right email thread to review passthrough.

> >
> > > >
> > > > > Is something like what you said here passed the vote and written
> > > > > to the spec?
> > > > Not only me.
> > > > The virtio technical committee has agreed for nested and
> > > > hardware-based
> > > implementation _both_.
> > > >
> > > > " hardware-based implementations" is part of the virtio
> > > > specification charter
> > > with ballot of [1].
> > > >
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> > >
> > > Let's don't do conceptual shifts, I was asking the passthrough but
> > > you give me the hardware implementation.
> > >
> > Passthrough devices implemented by hw which does dirty tracking and
> following the spec.
> 
> Why is passthrough coupled with dirty tracking?
> 
You are free to use and extend it without passthrough too.
Can you please explain the use case for dirty tracking that you have in mind without passthrough device migration, because of which you want to see it differently?

> >
> > > >
> > > > And passthrough hardware-based device is in the charter that we
> > > > strive to
> > > support.
> > > >
> > > > > We all know the current virtio spec is not built upon passthrough.
> > > >
> > > > This efforts improve the passthrough hw based implementation that
> > > > should
> > > not be blocked.
> > >
> > > Your proposal was posted only for several days and you think I would
> > > block that just because I asked several questions and some of them are not
> answered?
> > >
> > If I misunderstood, then I am sorry.
> > Lets progress and improve the passthrough use case without trap+emulation.
> 
> Unless any reviewer says no, the comments or concerns are a good opportunity
> for you to justify your method. That's what I'm doing right now and how the
> community works.
> 
So let's please continue the review of the passthrough work in that series. No need to do it here.

> > Trap+emulation=mediation is also a valid solution for nested case.
> 
> Again. Not only for the nested case. This method has been used for cloud
> vendors now.
> 
We are advancing the virtio spec for the future, and there is no reason for it to be limited to only the nested case.

> > And I frankly see a need for both as both are solving a different problem.
> 
> Then, let's don't couple state, suspending, dirty page tracking with admin
> commands.
> 
Please explain the use case for your proposal.
I think it is incorrect to say coupled.
It is the way to do it.
One can invent some other way when admin commands do not fit the requirements of the explained use case.

> > Trap+emulation cannot achieve passthrough mode, hence my request was not
> to step on each other.
> 
> It's easy to not step on others, but it would end up with duplications for sure.
> 
This is why I keep asking the author to review others' work to converge, but the author is not cooperating in the joint community work.

> >
> > When both can use the common infra, it is good to do that, when they cannot,
> due to the technical challenges of underlying transport, they should evolve
> differently.
> >
> > > >
> > > > > > Virtio does not need to stay in the weird umbrella to always mediate
> etc.
> > > > >
> > > > > It's not the mediation, we're not doing vDPA, the device model
> > > > > we had in hardware and we present to guests are all virtio devices.
> > > > > It's the trap and emulation which is fundamental in the world of
> > > > > virtualization for the past decades. It's the model we used to
> > > > > virtualize standard devices. If you want to debate this
> > > > > methodology, virtio
> > > community is clearly the wrong forum.
> > > > >
> > > > I am not debating it at all. You keep bringing up the point of mediation.
> > > >
> > > > The proposal of [1] is clear that wants to do hardware based
> > > > passthrough
> > > devices with least amount of virtio level mediation.
> > > >
> > > > So somewhere mode of virtualizing has been used, that’s fine, it
> > > > can continue with full virtualization, mediation,
> > > >
> > > > And also hardware based passthrough device.
> > > >
> > > > > >
> > > > > > Series [1] will be enhanced further to support virtio
> > > > > > passthrough device for
> > > > > device context and more.
> > > > > > Even further we like to extend the support.
> > > > > >
> > > > > > > Since the functionality proposed in this series focus on the
> > > > > > > minimal set of the functionality for migration, it is virtio
> > > > > > > specific and self contained so nothing special is required
> > > > > > > to work in the
> > > nest.
> > > > > >
> > > > > > Maybe it is.
> > > > > >
> > > > > > Again, I repeat and like to converge the admin commands
> > > > > > between
> > > > > passthrough and non-passthrough cases.
> > > > >
> > > > > You need to prove at least that your proposal can work for the
> > > > > passthrough before we can try to converge.
> > > > >
> > > > What do you mean by "prove"? virtio specification development is
> > > > not proof
> > > based method.
> > >
> > > For example, several of my questions were ignored.
> > >
> > I didn’t ignore, but if I miss, I will answer.
> >
> > > >
> > > > If you want to participate, please review the patches and help
> > > > community to
> > > improve.
> > >
> > > See above.
> > >
> > > >
> > > > > > If we can converge it is good.
> > > > > > If not both modes can expand.
> > > > > > It is not either or as use cases are different.
> > > > >
> > > > > Admin commands are not the cure for all, I've stated drawbacks
> > > > > in other threads. Not repeating it again here.
> > > > He he, sure, I am not attempting to cure all.
> > > > One solution does not fit all cases.
> > >
> > > Then why do you want to couple migration with admin commands?
> > >
> > Because of following.
> > 1. A device migration needs to bulk data transfer, this is something cannot be
> done with tiny registers.
> > Cannot be done through registers, because a. registers are slow for
> > bidirectional communication b. do not scale well with scale of VFs
> 
> That's pretty fine, but let's not limit it to a virtqueue. Virtqueue may not work
> for all the cases:
> 
> I must repeat some of Ling Shan's questions since I don't see a good answer for
> them now.
> 
> 1) If you want to use virtqueue to do the migration with a downtime
> requirement. Is the driver required to do some sort of software QOS?
It should not be required.

> For example what happens if one wants to migrate but the admin virtqueue is
> out of space? 
When the error code EAGAIN is returned, the migration may be retried.
Alternatively, the driver can also wait and retry.
A device may be able to support multiple VQs as well.
Many options are possible.
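
A minimal sketch of the retry option, assuming a hypothetical submit callback that returns 0 on success or a negative errno-style value such as -EAGAIN when the admin virtqueue has no space. The callback type, the retry bound and the test double are illustrative only and come from neither series:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: submits one admin command and returns 0 on
 * success or a negative errno-style value such as -EAGAIN. */
typedef int (*submit_fn)(void *ctx);

/* Retry a submission a bounded number of times while the queue reports
 * -EAGAIN; any other result is returned to the caller immediately. */
static int submit_with_retry(submit_fn submit, void *ctx, unsigned max_tries)
{
    int rc = -EAGAIN;

    for (unsigned i = 0; i < max_tries && rc == -EAGAIN; i++)
        rc = submit(ctx);
    return rc;
}

/* Test double: reports -EAGAIN until the counter behind ctx reaches zero. */
static int fake_submit(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EAGAIN;
    }
    return 0;
}
```

Bounding the number of attempts is one way to keep a downtime budget; a caller that still sees -EAGAIN after the last attempt can fall back to waiting, to another VQ, or to aborting the migration round.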

> And do we need a timeout for a specific command and if yes what
> happens after the timeout?
If a timeout occurs, it is likely a failure of the device.
One can retry or mark the device in error.

> 2) Assuming one round of the migration requires several commands. Are they
> allowed to be submitted in a batch? 
Yes, they can be submitted in a batch when they are unrelated.

> If yes, how is the ordering guaranteed or
> we don't need it at all? If not, why do we even need a queue?
> 
Software can order them if needed.
The OS UAPI we explored does not require any ordering.
The queue is there to parallelize the work of multiple unrelated member device migrations.

> If you're using an existing transport specific mechanism, you don't need to care
> about the above. I'm not saying admin virtqueue can't work but it definitely has
> more things to be considered.
> 
OK. Yes, the admin virtqueue is considered a transport-agnostic method.

> >
> > > > Admin commands are used to solve the specific problem for which
> > > > the AQ is
> > > designed for.
> > > >
> > > > One can make argument saying take pci fabric to 10 km distance,
> > > > don’t bring
> > > new virtio tcp transport...
> > > >
> > > > Drawing boundaries around virtio spec in certain way only makes it
> > > > further
> > > inferior. So please do not block advancements bring in [1].
> > >
> > > As a reviewer, I ask questions but some of them are ignored, do you
> > > expect the reviewer to figure out by themselves?
> > Sure, please review.
> >
> > Many of them were not questions, but assertion and conclusions that it does
> not fit nested.. and sub-optional etc.
> 
> I think we all agree that your proposal does not fit for nesting, no?
Sure. I never claimed it works.

> It demonstrates that work needs to be done in the basic facility first.
You continue to hint that nesting + trap+emulation must be done first.
I disagree with what you define as first.
Both are valid use cases and both can progress.

> 
> What's more the conclusion is for coupling live migration with admin
> command. This point has been clarified several times before.
Well, it has to be connected to something.
One could equally say that live migration should not be connected with the device status.

> 
> >
> > >
> > > > We really would like to make it more robust with your rich
> > > > experience and
> > > inputs, if you care to participate.
> > >
> > > We can collaborate for sure: as I pointed out in another threads,
> > > from what I can see from the both proposals of the current version:
> > >
> > > I see a good opportunity to build your admin commands proposal on
> > > top of this proposal. Or it means, we can focus on what needs to be
> migrated first:
> > >
> > > 1) queue state
> > This is just one small part of the device context So once a device
> > context is read/written, it covers q.
> 
> That's a layer violation. Virtqueue is the basic facility, states need to be defined
> there.
> 
It is not a layer violation.
But anyway, the state is defined in the basic facility in my series.

> >
> > > 2) inflight descriptors
> > Same a q state, it is part of the device context.
> 
> Admin commands are not the only way to access device context. For example,
> do you agree the virtqueue address is part of the device context? If yes, it is
> available in the common configuration now.
> 
The virtqueue address is accessible to the guest through the common configuration, for the obvious reason.

The device context is accessible to the basic migration facility to cover the case where the migration facility does not trap any of the virtio guest accesses.

> >
> > > 3) dirty pages (optional)
> > > 4) device state(context) (optional)
> > >
> > It is same as #1 and #2.
> > Splitting them from #1 and #2 is not needed.
> >
> > We can extend the device context to be selectively queried for nested case..
> >
> > > I'd leave 3 or 4 since they are very complicated features. Then we
> > > can invent an interface to access those facilities? This is how this series is
> structured.
> > >
> > > And what's more, admin commands or transport specific interfaces.
> > > And when we invent admin commands, you may realize you are inventing
> > > a new transport which is the idea of transport via admin commands.
> >
> > Not really. it is not a new transport at all.
> > I explained you before when you quote is as transport, it must carry the driver
> notifications as well..
> > Otherwise it is just set of commands..
> 
> I've explained that you need admin commands to save and load all existing
> virtio PCI capabilities. This means a driver can just use those commands to
> work. If not, please explain why I was wrong.

Virtio PCI capabilities are read-only except for the one that needs to be migrated.
Those caps need to match on the src and dst sides.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:45                             ` Jason Wang
  2023-09-13  6:39                               ` Parav Pandit
@ 2023-09-13  8:27                               ` Michael S. Tsirkin
  2023-09-14  3:11                                 ` Jason Wang
  1 sibling, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-13  8:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Zhu, Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Wed, Sep 13, 2023 at 12:45:21PM +0800, Jason Wang wrote:
> For example, KVM doesn't use
> hardware to log dirty pages.

It uses a mix of PML, PTE bit and EPT write protection.

-- 
MST


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  8:27                               ` Michael S. Tsirkin
@ 2023-09-14  3:11                                 ` Jason Wang
  0 siblings, 0 replies; 148+ messages in thread
From: Jason Wang @ 2023-09-14  3:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Zhu, Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Wed, Sep 13, 2023 at 4:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 13, 2023 at 12:45:21PM +0800, Jason Wang wrote:
> > For example, KVM doesn't use
> > hardware to log dirty pages.
>
> It uses a mix of PML, PTE bit and EPT write protection.

Well EPT/PML is Intel specific, the minimal requirement is page fault.
The logging is done by software anyhow.
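
As a rough illustration of what "logging done by software" can look like: a write fault on a write-protected page leads to a bit being set in a dirty bitmap, which a pre-copy round later harvests and clears. This is a sketch under stated assumptions (4 KiB pages, 64-bit bitmap words, hypothetical function names), not KVM's actual implementation:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12u   /* assumes 4 KiB pages */
#define BITS_PER_WORD 64u

/* Mark the page containing guest-physical address gpa as dirty.
 * Would be called from a (hypothetical) write-fault handler. */
static void dirty_log_mark(uint64_t *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;

    bitmap[pfn / BITS_PER_WORD] |= (uint64_t)1 << (pfn % BITS_PER_WORD);
}

/* Count the dirty pages and clear the bitmap, as one pre-copy round
 * would do after re-sending those pages. */
static size_t dirty_log_harvest(uint64_t *bitmap, size_t nwords)
{
    size_t count = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint64_t w = bitmap[i];

        while (w) {
            w &= w - 1;   /* clear the lowest set bit */
            count++;
        }
        bitmap[i] = 0;
    }
    return count;
}
```

A device-side equivalent would need some way for the device to report the pages it writes, whether through a device page fault as suggested here or through explicit dirty page tracking.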

Virtio can choose to go device page fault for sure.

Thanks

>
> --
> MST
>


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:47           ` Parav Pandit
  2023-09-11  6:58             ` Zhu, Lingshan
@ 2023-09-12  4:18             ` Jason Wang
  2023-09-12  6:11               ` Parav Pandit
  1 sibling, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-12  4:18 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Monday, September 11, 2023 12:01 PM
> >
> > On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > > Hi Michael,
> > >
> > > > From: virtio-comment@lists.oasis-open.org
> > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
> > > > Sent: Monday, September 11, 2023 8:31 AM
> > > >
> > > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com>
> > wrote:
> > > > >
> > > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > > This patch adds two new le16 fields to common configuration
> > > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > > >
> > > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > >
> > > > >
> > > > > I do not see why this would be pci specific at all.
> > > >
> > > > This is the PCI interface for live migration. The facility is not specific to PCI.
> > > >
> > > > It can choose to reuse the common configuration or not, but the
> > > > semantic is general enough to be used by other transports. We can
> > > > introduce one for MMIO for sure.
> > > >
> > > > >
> > > > > But besides I thought work on live migration will use admin queue.
> > > > > This was explicitly one of the motivators.
> > > >
> > > Please find the proposal that uses administration commands for device
> > migration at [1] for passthrough devices.
> > >
> > > [1]
> > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > > tml
> >
> > This proposal couples live migration with several requirements, and suffers from
> > the exact issues I've mentioned below.
> >
> It does not.
> Can you please list which one?
>
> > In some cases, it's even worse (coupling with PCI/SR-IOV, second state machine
> > other than the device status).
> >
> There is no state machine in [1].

Isn't the migration modes of "active, stop, freeze" a state machine?

> It is not coupled with PCI/SR-IOV either.
> It supports PCI/SR-IOV transport and in future other transports too when they evolve.
>

For example:

+struct virtio_dev_ctx_pci_vq_cfg {
+        le16 vq_index;
+        le16 queue_size;
+        le16 queue_msix_vector;
+        le64 queue_desc;
+        le64 queue_driver;
+        le64 queue_device;
+};
+\end{lstlisting}

And does this mean we will have commands for MMIO and other transport?
(Most of the fields except the msix are general enough). And it's just
a partial implementation of the queue related functionality of the
common cfg, so I wonder how it can work.

Thanks


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  4:18             ` Jason Wang
@ 2023-09-12  6:11               ` Parav Pandit
  2023-09-12  6:43                 ` Zhu, Lingshan
  2023-09-13  4:43                 ` Jason Wang
  0 siblings, 2 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  6:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 12, 2023 9:48 AM
> 
> On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Monday, September 11, 2023 12:01 PM
> > >
> > > On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com>
> wrote:
> > > >
> > > > Hi Michael,
> > > >
> > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
> > > > > Sent: Monday, September 11, 2023 8:31 AM
> > > > >
> > > > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin
> > > > > <mst@redhat.com>
> > > wrote:
> > > > > >
> > > > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > > > This patch adds two new le16 fields to common configuration
> > > > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > > > >
> > > > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > >
> > > > > >
> > > > > > I do not see why this would be pci specific at all.
> > > > >
> > > > > This is the PCI interface for live migration. The facility is not specific to
> PCI.
> > > > >
> > > > > It can choose to reuse the common configuration or not, but the
> > > > > semantic is general enough to be used by other transports. We
> > > > > can introduce one for MMIO for sure.
> > > > >
> > > > > >
> > > > > > But besides I thought work on live migration will use admin queue.
> > > > > > This was explicitly one of the motivators.
> > > > >
> > > > Please find the proposal that uses administration commands for
> > > > device
> > > migration at [1] for passthrough devices.
> > > >
> > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> > >
> > > This proposal couples live migration with several requirements, and
> > > suffers from the exact issues I've mentioned below.
> > >
> > It does not.
> > Can you please list which one?
> >
> > > In some cases, it's even worse (coupling with PCI/SR-IOV, second
> > > state machine other than the device status).
> > >
> > There is no state machine in [1].
> 
> Isn't the migration modes of "active, stop, freeze" a state machine?
> 
Huh, no. Each mode stops or starts a specific thing.
Just because one series lacks this and implements only suspend/resume, while the other series covers P2P with modes, that does not make it a state machine.
If you count suspend and resume as states, that is still a two-state state machine. :)

> > It is not coupled with PCI/SR-IOV either.
> > It supports PCI/SR-IOV transport and in future other transports too when they
> evolve.
> >
> 
> For example:
> 
> +struct virtio_dev_ctx_pci_vq_cfg {
> +        le16 vq_index;
> +        le16 queue_size;
> +        le16 queue_msix_vector;
> +        le64 queue_desc;
> +        le64 queue_driver;
> +        le64 queue_device;
> +};
> +\end{lstlisting}
> 
> And does this mean we will have commands for MMIO and other transport?

There are multiple transports, so yes, the field structures in the device context will have PCI-specific items.

> (Most of the fields except the msix are general enough). And it's just a partial
> implementation of the queue related functionality of the common cfg, so I
> wonder how it can work.
> 
As I already explained in the cover letter, the device context will evolve from v0 to v1 to cover more.
True, most fields are general, and there are a few PCI-specific fields that were not worth moving into a separate structure.
Replicating a small number of structs for MMIO is not a problem either, as it does not complicate the transport.
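
For readers following along, the quoted per-virtqueue structure can be written out as plain C. This is only a sketch: the field names come from the quoted patch, while the le16/le64 typedefs, the packed attribute (GCC/Clang syntax) and the size check are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the spec's little-endian types (assumption: the host is
 * little-endian, so no byte swapping is shown). */
typedef uint16_t le16;
typedef uint64_t le64;

/* Field names as quoted in the thread; the packed layout is an assumption. */
struct virtio_dev_ctx_pci_vq_cfg {
	le16 vq_index;
	le16 queue_size;
	le16 queue_msix_vector;	/* the one field tied to PCI (MSI-X) */
	le64 queue_desc;
	le64 queue_driver;
	le64 queue_device;
} __attribute__((packed));

/* Three 16-bit fields followed by three 64-bit fields: 6 + 24 bytes. */
static_assert(sizeof(struct virtio_dev_ctx_pci_vq_cfg) == 30,
	      "unexpected padding");
```

Only queue_msix_vector depends on the PCI transport here; the remaining fields describe the queue itself, which is the observation both sides of the thread agree on.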

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:11               ` Parav Pandit
@ 2023-09-12  6:43                 ` Zhu, Lingshan
  2023-09-12  6:52                   ` Parav Pandit
  2023-09-13  4:43                 ` Jason Wang
  1 sibling, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  6:43 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 2:11 PM, Parav Pandit wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Tuesday, September 12, 2023 9:48 AM
>>
>> On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
>>>
>>>
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Monday, September 11, 2023 12:01 PM
>>>>
>>>> On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com>
>> wrote:
>>>>> Hi Michael,
>>>>>
>>>>>> From: virtio-comment@lists.oasis-open.org
>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
>>>>>> Sent: Monday, September 11, 2023 8:31 AM
>>>>>>
>>>>>> On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin
>>>>>> <mst@redhat.com>
>>>> wrote:
>>>>>>> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
>>>>>>>> This patch adds two new le16 fields to common configuration
>>>>>>>> structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
>>>>>>>>
>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>>>
>>>>>>> I do not see why this would be pci specific at all.
>>>>>> This is the PCI interface for live migration. The facility is not specific to
>> PCI.
>>>>>> It can choose to reuse the common configuration or not, but the
>>>>>> semantic is general enough to be used by other transports. We
>>>>>> can introduce one for MMIO for sure.
>>>>>>
>>>>>>> But besides I thought work on live migration will use admin queue.
>>>>>>> This was explicitly one of the motivators.
>>>>> Please find the proposal that uses administration commands for
>>>>> device
>>>> migration at [1] for passthrough devices.
>>>>> [1]
>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
>>>> This proposal couples live migration with several requirements, and
>>>> suffers from the exact issues I've mentioned below.
>>>>
>>> It does not.
>>> Can you please list which one?
>>>
>>>> In some cases, it's even worse (coupling with PCI/SR-IOV, second
>>>> state machine other than the device status).
>>>>
>>> There is no state machine in [1].
>> Isn't the migration modes of "active, stop, freeze" a state machine?
>>
> Huh, no. Each mode stops/starts specific thing.
> Just because one series is missing this and did only suspend/resume and other series covered P2P mode with modes, it does not make it state machine.
> If you call suspend resume as states, it is still 2 state state machines. :)
Why is P2P needed for live migration?
>
>>> It is not coupled with PCI/SR-IOV either.
>>> It supports PCI/SR-IOV transport and in future other transports too when they
>> evolve.
>> For example:
>>
>> +struct virtio_dev_ctx_pci_vq_cfg {
>> +        le16 vq_index;
>> +        le16 queue_size;
>> +        le16 queue_msix_vector;
>> +        le64 queue_desc;
>> +        le64 queue_driver;
>> +        le64 queue_device;
>> +};
>> +\end{lstlisting}
>>
>> And does this mean we will have commands for MMIO and other transport?
> There are transports so yes, field structures from the device context will have PCI specific items.
>
>> (Most of the fields except the msix are general enough). And it's just a partial
>> implementation of the queue related functionality of the common cfg, so I
>> wonder how it can work.
>>
> As I already explained in cover letter, device context will evolve in v0->v1 to cover more.
> True, most fields are general, and it has some pci specific fields, which were not worth taking out to a different structure.
> And replicating small number of structs for MMIO is not a problem either as it is not complicating the transport either.




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:43                 ` Zhu, Lingshan
@ 2023-09-12  6:52                   ` Parav Pandit
  2023-09-12  7:36                     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  6:52 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:13 PM

> Why need P2P for Live Migration?
A peer device may be accessing the virtio device. Hence all the devices must first be stopped, as in [1], while still allowing them to accept driver notifications from the peer device.
Once all the devices are stopped, each device is then frozen so that it makes no further device context updates. At this point the final device context can be read by the owner driver.
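
The ordering described above can be sketched as a small simulation. This is a minimal sketch, not the proposed interface: the mode names follow the thread ("active, stop, freeze"), while the enum, the sim_* helpers and the rule that a frozen device rejects peer notifications are hypothetical modelling choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mode names follow the thread; the enum and its values are hypothetical. */
enum sim_dev_mode { SIM_MODE_ACTIVE, SIM_MODE_STOP, SIM_MODE_FREEZE };

struct sim_dev {
	enum sim_dev_mode mode;
};

/* A stopped or frozen device no longer initiates requests of its own. */
static bool sim_may_initiate(const struct sim_dev *d)
{
	return d->mode == SIM_MODE_ACTIVE;
}

/* Modelling assumption: a stopped device still accepts notifications
 * from a peer, a frozen device does not. */
static bool sim_accepts_peer_notify(const struct sim_dev *d)
{
	return d->mode != SIM_MODE_FREEZE;
}

/* Phase 1 stops every device, phase 2 freezes every device. */
static void sim_quiesce_all(struct sim_dev *devs, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		devs[i].mode = SIM_MODE_STOP;

	for (i = 0; i < n; i++) {
		/* No device is frozen while any peer could still initiate. */
		for (j = 0; j < n; j++)
			assert(!sim_may_initiate(&devs[j]));
		devs[i].mode = SIM_MODE_FREEZE;
	}
}
```

Freezing in a second pass means no device is frozen while a peer that has not yet been stopped could still send it a notification, which is the case the hypervisor cannot detect on its own.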

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:52                   ` Parav Pandit
@ 2023-09-12  7:36                     ` Zhu, Lingshan
  2023-09-12  7:43                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  7:36 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 2:52 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:13 PM
>> Why need P2P for Live Migration?
> A peer device may be accessing the virtio device. Hence first all the devices to be stopped like [1] allowing them to accept driver notifications from the peer device.
> Once all the devices are stopped, than each device to be freeze to not do any device context updates. At this point the final device context can be read by the owner driver.
Is this beyond the spec? Is it an Nvidia-specific use case that is
unrelated to virtio live migration?




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:36                     ` Zhu, Lingshan
@ 2023-09-12  7:43                       ` Parav Pandit
  2023-09-12 10:27                         ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12  7:43 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 1:06 PM
> 
> On 9/12/2023 2:52 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 12:13 PM Why need P2P for Live
> >> Migration?
> > A peer device may be accessing the virtio device. Hence first all the devices to
> be stopped like [1] allowing them to accept driver notifications from the peer
> device.
> > Once all the devices are stopped, than each device to be freeze to not do any
> device context updates. At this point the final device context can be read by the
> owner driver.
> Is it beyond the spec? Nvidia specific use case and not related to virtio live
> migration?
Not at all Nvidia specific.
And not at all beyond the specification.
The PCI transport is probably by far the most common virtio transport.
Hence the spec proposed in [1] covers it.

It is the baseline implementation in leading OSes such as the Linux kernel.

A decade-mature stack like vfio recommends P2P support as the baseline; without it, migration of multiple devices can fail because the hypervisor has no knowledge of whether two devices are interacting.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:43                       ` Parav Pandit
@ 2023-09-12 10:27                         ` Zhu, Lingshan
  2023-09-12 10:33                           ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:27 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 3:43 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 1:06 PM
>>
>> On 9/12/2023 2:52 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 12:13 PM Why need P2P for Live
>>>> Migration?
>>> A peer device may be accessing the virtio device. Hence first all the devices to
>> be stopped like [1] allowing them to accept driver notifications from the peer
>> device.
>>> Once all the devices are stopped, than each device to be freeze to not do any
>> device context updates. At this point the final device context can be read by the
>> owner driver.
>> Is it beyond the spec? Nvidia specific use case and not related to virtio live
>> migration?
> Not at all Nvidia specific.
> And not all at all beyond the specification.
> PCI transport is probably by far most common transport of virtio.
> And hence, spec proposed in [1] covers it.
>
> It is the base line implementation of leading OS such as Linux kernel.
>
> Decade mature stack like vfio recommends support for p2p as base line without which multiple devices migration can fail as hypervisor has no knowledge if two devices are interacting or not.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
Still, why do you think P2P is a concern for live migration? Or why do
you think live migration should implement special support for P2P?




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:27                         ` Zhu, Lingshan
@ 2023-09-12 10:33                           ` Parav Pandit
  2023-09-12 10:35                             ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12 10:33 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Zhu, Lingshan
> Sent: Tuesday, September 12, 2023 3:57 PM

> > It is the base line implementation of leading OS such as Linux kernel.
> >
> > Decade mature stack like vfio recommends support for p2p as base line
> without which multiple devices migration can fail as hypervisor has no
> knowledge if two devices are interacting or not.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
> still, why do you think P2P is a concern of live migration? Or why do you think
> Live Migration should implement special support for P2P?

It is because p2p needs to work with live migration.
You probably missed the response.
I answered before at [1].

[1] https://lore.kernel.org/virtio-comment/PH0PR12MB5481961B5EA6DEE900CCF068DCF1A@PH0PR12MB5481.namprd12.prod.outlook.com/


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:33                           ` Parav Pandit
@ 2023-09-12 10:35                             ` Zhu, Lingshan
  2023-09-12 10:41                               ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:35 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 6:33 PM, Parav Pandit wrote:
>> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
>> open.org> On Behalf Of Zhu, Lingshan
>> Sent: Tuesday, September 12, 2023 3:57 PM
>>> It is the base line implementation of leading OS such as Linux kernel.
>>>
>>> Decade mature stack like vfio recommends support for p2p as base line
>> without which multiple devices migration can fail as hypervisor has no
>> knowledge if two devices are interacting or not.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
>> still, why do you think P2P is a concern of live migration? Or why do you think
>> Live Migration should implement special support for P2P?
> It is because p2p needs to work with live migration.
> You probably missed the response.
> I answered before at [1].
I mean, why do you think my series cannot work with P2P?
>
> [1] https://lore.kernel.org/virtio-comment/PH0PR12MB5481961B5EA6DEE900CCF068DCF1A@PH0PR12MB5481.namprd12.prod.outlook.com/
>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:35                             ` Zhu, Lingshan
@ 2023-09-12 10:41                               ` Parav Pandit
  2023-09-12 13:09                                 ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12 10:41 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 4:05 PM

> I mean, why do you think my series can not work with P2P
Because it misses the intermediate mode STOP that we have in series [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:41                               ` Parav Pandit
@ 2023-09-12 13:09                                 ` Zhu, Lingshan
  2023-09-12 13:35                                   ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 13:09 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 6:41 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 4:05 PM
>> I mean, why do you think my series can not work with P2P
> Because it misses the intermediate mode STOP that we have in series [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
Again, when SUSPEND is set:
1) the device freezes, meaning it stops operating in both the data path
and the control path, except for the device status
2) a new feature bit will be introduced in V2 to allow RESET_VQ after
SUSPEND
3) if another device is doing P2P against this device, both should be
passed through to the same guest and both should be suspended for LM;
otherwise it is a security problem.
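
The suspend-then-read flow that the series proposes can be sketched against a simulated device. This is an illustrative sketch under stated assumptions: the SUSPEND bit value (0x10) is hypothetical since the thread does not give one, the sim_* names are invented for this example, and the simulated device acknowledges the bit immediately, which a real device need not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_STATUS_DRIVER_OK 0x04u	/* same value as the VIRTIO DRIVER_OK bit */
#define SIM_STATUS_SUSPEND   0x10u	/* hypothetical bit value */

/* The two per-virtqueue indices named in the cover letter. */
struct sim_vq_state {
	uint16_t last_avail_idx;
	uint16_t last_used_idx;
};

struct sim_device {
	uint8_t status;
	struct sim_vq_state vq;
};

/* Set SUSPEND, then poll until the device reports it (bounded wait). */
static bool sim_suspend(struct sim_device *d, unsigned int max_polls)
{
	unsigned int i;

	d->status |= SIM_STATUS_SUSPEND;	/* simulated device acks at once */
	for (i = 0; i < max_polls; i++)
		if (d->status & SIM_STATUS_SUSPEND)
			return true;
	return false;
}

/* The vq state is only treated as stable once SUSPEND is presented. */
static bool sim_get_vq_state(const struct sim_device *d,
			     struct sim_vq_state *out)
{
	if (!(d->status & SIM_STATUS_SUSPEND))
		return false;
	*out = d->vq;
	return true;
}
```

Refusing to return the indices before SUSPEND is presented mirrors the stated purpose of the bit, namely that device and virtqueue state are stabilized before they are read for migration.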




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:09                                 ` Zhu, Lingshan
@ 2023-09-12 13:35                                   ` Parav Pandit
  2023-09-13  4:13                                     ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-12 13:35 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 6:39 PM
> 
> On 9/12/2023 6:41 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 4:05 PM I mean, why do you think my
> >> series can not work with P2P
> > Because it misses the intermediate mode STOP that we have in series [1].
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
> Again, when SUSPEND:
> 1) the device freezes, means stop operation in both data-path and control-path,
> except the device status
Exactly, and that includes the RESET_VQ command, which also cannot be served because the device is frozen.

> 2) a new feature bit will be introduced in V2, to allow RESET_VQ after SUSPEND
RESET_VQ after suspend is simply wrong, because the device is already suspended and must not respond to an extra RESET_VQ command.

> 3) if there is a device doing P2P against the device.
> They should be pass-through-ed to the same guest and should be suspended as
> well for LM, or it is a security problem.
There is no security problem. Multiple passthrough devices and P2P have been supported in PCI using ACS for probably a decade now.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:35                                   ` Parav Pandit
@ 2023-09-13  4:13                                     ` Zhu, Lingshan
  2023-09-13  4:19                                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:13 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 9:35 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 6:39 PM
>>
>> On 9/12/2023 6:41 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 4:05 PM I mean, why do you think my
>>>> series can not work with P2P
>>> Because it misses the intermediate mode STOP that we have in series [1].
>>>
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
>> Again, when SUSPEND:
>> 1) the device freezes, means stop operation in both data-path and control-path,
>> except the device status
> Exactly, including the RESET_VQ command also cannot be served because device is frozen.
see below
>
>> 2) a new feature bit will be introduced in V2, to allow RESET_VQ after SUSPEND
> RESET_VQ after suspend is simply wrong. Because device is already suspended to not respond to some  extra RESET_VQ command.
No, when the device presents SUSPEND, that means the device config space
is stabilized at that moment; from the SW perspective, the device will
not make changes to the config space until !SUSPEND.

However, at that moment the driver can still modify the config space,
and the driver handles the synchronization (checks, re-reads, etc.),
so the driver is responsible for what it reads.

As you can see, this is not perfect, so SiWei suggested implementing a
new feature bit to control this, and it will be implemented in V2.
>
>> 3) if there is a device doing P2P against the device.
>> They should be pass-through-ed to the same guest and should be suspended as
>> well for LM, or it is a security problem.
> There is no security problem. Multiple passthrough devices and P2P is already there in PCI using ACS for probably a decade now.
As you are aware of ACS, you know this means you have to trust them all:
for example, P2P devices have to be placed in one IOMMU group, and all
devices in the group should be passed through to a guest.
>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:13                                     ` Zhu, Lingshan
@ 2023-09-13  4:19                                       ` Parav Pandit
  2023-09-13  4:22                                         ` Zhu, Lingshan
  2023-09-13  4:56                                         ` Jason Wang
  0 siblings, 2 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  4:19 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:44 AM
> 
> 
> On 9/12/2023 9:35 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 6:39 PM
> >>
> >> On 9/12/2023 6:41 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 4:05 PM I mean, why do you think
> >>>> my series can not work with P2P
> >>> Because it misses the intermediate mode STOP that we have in series [1].
> >>>
> >>> [1]
> >>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
> >> Again, when SUSPEND:
> >> 1) the device freezes, means stop operation in both data-path and
> >> control-path, except the device status
> > Exactly, including the RESET_VQ command also cannot be served because
> device is frozen.
> see below
> >
> >> 2) a new feature bit will be introduced in V2, to allow RESET_VQ
> >> after SUSPEND
> > RESET_VQ after suspend is simply wrong. Because device is already
> suspended to not respond to some  extra RESET_VQ command.
> No, when the device presents SUSPEND, that means the device config space is
> stabilized at that moment, from the SW perspective the device will not make
> changes to config space until !SUSPEND.
> 
> However at that moment, the driver can still make modification to the config
> space and the driver handles the synchronization(checks, re-read, etc), so the
> driver is responsible for what it reads.
>
It should be named SUSPEND_CFG_SPACE!
All of this frankly seems intrusive enough as Michael pointed out.
Good luck.
 
> As you can see, this is not perfect, so SiWei suggest to implement a new feature
> bit to control this, and it will be implemented in V2.
> >
> >> 3) if there is a device doing P2P against the device.
> >> They should be pass-through-ed to the same guest and should be
> >> suspended as well for LM, or it is a security problem.
> > There is no security problem. Multiple passthrough devices and P2P is already
> there in PCI using ACS for probably a decade now.
> As you aware of ACS, that means you have to trust them all, for example P2P
> devices has to be placed in one IOMMU group, and all devices in the group
> should be pass-through-ed to a guest
> >
Such things are done by the hypervisor already. There is nothing virtio-specific here.
There is no security problem.
If there is one, please file a CVE for generic P2P with the PCI-SIG and we will handle it at this Thursday's meeting.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:19                                       ` Parav Pandit
@ 2023-09-13  4:22                                         ` Zhu, Lingshan
  2023-09-13  4:39                                           ` Parav Pandit
  2023-09-13  4:56                                         ` Jason Wang
  1 sibling, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:22 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/13/2023 12:19 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:44 AM
>>
>>
>> On 9/12/2023 9:35 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 6:39 PM
>>>>
>>>> On 9/12/2023 6:41 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 4:05 PM I mean, why do you think
>>>>>> my series can not work with P2P
>>>>> Because it misses the intermediate mode STOP that we have in series [1].
>>>>>
>>>>> [1]
>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
>>>> Again, when SUSPEND:
>>>> 1) the device freezes, means stop operation in both data-path and
>>>> control-path, except the device status
>>> Exactly, including the RESET_VQ command also cannot be served because
>> device is frozen.
>> see below
>>>> 2) a new feature bit will be introduced in V2, to allow RESET_VQ
>>>> after SUSPEND
>>> RESET_VQ after suspend is simply wrong. Because device is already
>> suspended to not respond to some  extra RESET_VQ command.
>> No, when the device presents SUSPEND, that means the device config space is
>> stabilized at that moment, from the SW perspective the device will not make
>> changes to config space until !SUSPEND.
>>
>> However at that moment, the driver can still make modification to the config
>> space and the driver handles the synchronization(checks, re-read, etc), so the
>> driver is responsible for what it reads.
>>
> It should be named as SUSPEND_CFG_SPACE.!
> All of this frankly seems intrusive enough as Michael pointed out.
> Good luck.
It also suspends the data-path.
>   
>> As you can see, this is not perfect, so SiWei suggest to implement a new feature
>> bit to control this, and it will be implemented in V2.
>>>> 3) if there is a device doing P2P against the device.
>>>> They should be pass-through-ed to the same guest and should be
>>>> suspended as well for LM, or it is a security problem.
>>> There is no security problem. Multiple passthrough devices and P2P is already
>> there in PCI using ACS for probably a decade now.
>> As you aware of ACS, that means you have to trust them all, for example P2P
>> devices has to be placed in one IOMMU group, and all devices in the group
>> should be pass-through-ed to a guest
> Such things are done by the hypervisor already. There is nothing virtio specific here.
> There is no security problem.
> If there is one, please file CVE for generic P2P in the pci-sig and we will handle them this Thu meeting.
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:22                                         ` Zhu, Lingshan
@ 2023-09-13  4:39                                           ` Parav Pandit
  2023-09-14  8:24                                             ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  4:39 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:52 AM

> > It should be named as SUSPEND_CFG_SPACE.!
> > All of this frankly seems intrusive enough as Michael pointed out.
> > Good luck.
> it also SUSPEND the data-path
OK, so it works like "suspend" in the English dictionary sense: after that, no other VQ-related commands make progress,
because the device is suspended.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:39                                           ` Parav Pandit
@ 2023-09-14  8:24                                             ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-14  8:24 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/13/2023 12:39 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:52 AM
>>> It should be named as SUSPEND_CFG_SPACE.!
>>> All of this frankly seems intrusive enough as Michael pointed out.
>>> Good luck.
>> it also SUSPEND the data-path
> Ok so it works like Suspend of English dictionary, then after that any other VQ related commands don’t progress.
> Because it is suspended.
After suspend, the vqs should not consume more buffers, and should either:
1) track in-flight descriptors (this will be added in the next version), or
2) wait until all in-flight descriptors finish, mark them used, and flush.
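
The two options above can be sketched with a toy Python model of one virtqueue. This is only an illustration under stated assumptions: the class, its method names, and the in-flight list are hypothetical and are not taken from the patch series; only last_avail_idx and last_used_idx are names used by the series.

```python
from dataclasses import dataclass, field


@dataclass
class VirtqueueModel:
    """Toy model of one virtqueue; illustrative only, not spec text."""
    avail_idx: int = 0        # how far the driver has published buffers
    last_avail_idx: int = 0   # next available entry the device will consume
    last_used_idx: int = 0    # next used entry the device will write
    in_flight: list = field(default_factory=list)  # consumed, not yet used
    suspended: bool = False

    def consume(self) -> bool:
        """Take one available buffer unless suspended or nothing is pending."""
        if self.suspended or self.last_avail_idx == self.avail_idx:
            return False
        self.in_flight.append(self.last_avail_idx)
        self.last_avail_idx = (self.last_avail_idx + 1) & 0xFFFF
        return True

    def complete_one(self) -> None:
        """Finish the oldest in-flight buffer and mark it used."""
        self.in_flight.pop(0)
        self.last_used_idx = (self.last_used_idx + 1) & 0xFFFF

    def suspend_and_track(self) -> tuple:
        """Option 1: stop consuming and report what is still in flight."""
        self.suspended = True
        return (self.last_avail_idx, self.last_used_idx, list(self.in_flight))

    def suspend_and_drain(self) -> tuple:
        """Option 2: stop consuming, finish all in-flight work, then report."""
        self.suspended = True
        while self.in_flight:
            self.complete_one()
        return (self.last_avail_idx, self.last_used_idx)
```

With option 2 the two indices are equal once the drain finishes, so they alone describe the queue; with option 1 the in-flight list has to travel with them.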




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:19                                       ` Parav Pandit
  2023-09-13  4:22                                         ` Zhu, Lingshan
@ 2023-09-13  4:56                                         ` Jason Wang
  1 sibling, 0 replies; 148+ messages in thread
From: Jason Wang @ 2023-09-13  4:56 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Wed, Sep 13, 2023 at 12:20 PM Parav Pandit <parav@nvidia.com> wrote:
>
> All of this frankly seems intrusive enough as Michael pointed out.
> Good luck.
>

How do you define "intrusive"?

To me it's much less intrusive than what you've proposed.

1) It gives sufficient flexibility to implement migration via any
transport specific interface. It means your proposal could be built on
top of this but not vice versa.
2) It doesn't need to reinvent the wheel to save and load all the
existing PCI capabilities, whereas your proposal needs to do that in order
to be self-contained, which turns it into a new transport that
duplicates the work of Ling Shan.
3) It reuses the device status state machine instead of inventing another one.

Only small extensions are required for device implementation to
migrate instead of coupling it with admin commands.

Thanks


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:11               ` Parav Pandit
  2023-09-12  6:43                 ` Zhu, Lingshan
@ 2023-09-13  4:43                 ` Jason Wang
  2023-09-13  4:46                   ` Parav Pandit
  1 sibling, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-13  4:43 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Tue, Sep 12, 2023 at 2:11 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 12, 2023 9:48 AM
> >
> > On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Monday, September 11, 2023 12:01 PM
> > > >
> > > > On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com>
> > wrote:
> > > > >
> > > > > Hi Michael,
> > > > >
> > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
> > > > > > Sent: Monday, September 11, 2023 8:31 AM
> > > > > >
> > > > > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin
> > > > > > <mst@redhat.com>
> > > > wrote:
> > > > > > >
> > > > > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > > > > This patch adds two new le16 fields to common configuration
> > > > > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > > > > >
> > > > > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > > >
> > > > > > >
> > > > > > > I do not see why this would be pci specific at all.
> > > > > >
> > > > > > This is the PCI interface for live migration. The facility is not specific to
> > PCI.
> > > > > >
> > > > > > It can choose to reuse the common configuration or not, but the
> > > > > > semantic is general enough to be used by other transports. We
> > > > > > can introduce one for MMIO for sure.
> > > > > >
> > > > > > >
> > > > > > > But besides I thought work on live migration will use admin queue.
> > > > > > > This was explicitly one of the motivators.
> > > > > >
> > > > > Please find the proposal that uses administration commands for
> > > > > device
> > > > migration at [1] for passthrough devices.
> > > > >
> > > > > [1]
> > > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> > > >
> > > > This proposal couples live migration with several requirements, and
> > > > suffers from the exact issues I've mentioned below.
> > > >
> > > It does not.
> > > Can you please list which one?
> > >
> > > > In some cases, it's even worse (coupling with PCI/SR-IOV, second
> > > > state machine other than the device status).
> > > >
> > > There is no state machine in [1].
> >
> > Isn't the migration modes of "active, stop, freeze" a state machine?
> >
> Huh, no. Each mode stops/starts specific thing.
> Just because one series is missing this and did only suspend/resume and other series covered P2P mode with modes, it does not make it state machine.
> If you call suspend resume as states, it is still 2 state state machines. :)

It's not about how many states there are in a single state machine; it's
about how many state machines exist for device status. Having more than
one creates big obstacles and complexity in the device. You need to
define the interaction of each state, otherwise you leave undefined
behaviours.

Thanks




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:43                 ` Jason Wang
@ 2023-09-13  4:46                   ` Parav Pandit
  2023-09-14  3:12                     ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-13  4:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, September 13, 2023 10:14 AM

> It's not about how many states in a single state machine, it's about how many
> state machines that exist for device status. Having more than one creates big
> obstacles and complexity in the device. You need to define the interaction of
> each state otherwise you leave undefined behaviours.
The device mode has zero relation to the device status. It does not mess with it at all.
In fact, the new bits in device status are making it more complex for the device to handle.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:46                   ` Parav Pandit
@ 2023-09-14  3:12                     ` Jason Wang
  2023-09-17  5:29                       ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-14  3:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com> wrote:
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, September 13, 2023 10:14 AM
>
> > It's not about how many states in a single state machine, it's about how many
> > state machines that exist for device status. Having more than one creates big
> > obstacles and complexity in the device. You need to define the interaction of
> > each state otherwise you leave undefined behaviours.
> The device mode has zero relation to the device status.

You will soon get this issue when you want to do nesting.

> It does not mess with it at all.
> In fact the new bits in device status is making it more complex for the device to handle.

Are you challenging the design of the device status? It's definitely
too late to do this.

This proposal increases just one bit and that worries you? Or you
think one more state is much more complicated than a new state machine
with two states?
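
As a minimal sketch of what "just one bit" means here, the device status can be modelled as a flag set in Python. The first six values are the existing virtio device status bits; the value 16 for SUSPEND and the rule that it only applies after DRIVER_OK are assumptions made for this illustration, not values quoted from the patch.

```python
from enum import IntFlag


class DeviceStatus(IntFlag):
    # Existing virtio device status bits.
    ACKNOWLEDGE = 1
    DRIVER = 2
    DRIVER_OK = 4
    FEATURES_OK = 8
    DEVICE_NEEDS_RESET = 64
    FAILED = 128
    # Hypothetical value for the proposed bit (assumption for this sketch).
    SUSPEND = 16


def request_suspend(status: DeviceStatus) -> DeviceStatus:
    """Set SUSPEND on a running device (DRIVER_OK requirement is assumed)."""
    if not status & DeviceStatus.DRIVER_OK:
        raise ValueError("device is not running; nothing to suspend")
    return status | DeviceStatus.SUSPEND


def is_suspended(status: DeviceStatus) -> bool:
    """True when the SUSPEND bit is present in the status value."""
    return bool(status & DeviceStatus.SUSPEND)


def reset() -> DeviceStatus:
    """A reset clears every bit, including SUSPEND."""
    return DeviceStatus(0)
```

In this model suspend and reset live in the same status value, so a reset clears SUSPEND without any extra rule; that is the property being argued for, not a statement about how any device implements it.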

Thanks




^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14  3:12                     ` Jason Wang
@ 2023-09-17  5:29                       ` Parav Pandit
  2023-09-19  4:25                         ` Jason Wang
  0 siblings, 1 reply; 148+ messages in thread
From: Parav Pandit @ 2023-09-17  5:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:42 AM
> 
> On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:14 AM
> >
> > > It's not about how many states in a single state machine, it's about
> > > how many state machines that exist for device status. Having more
> > > than one creates big obstacles and complexity in the device. You
> > > need to define the interaction of each state otherwise you leave undefined
> behaviours.
> > The device mode has zero relation to the device status.
> 
> You will soon get this issue when you want to do nesting.
> 
I don’t think so. One needs to intercept it when one wants to do trap+emulation, which seems to fulfill the nesting use case.

> > It does not mess with it at all.
> > In fact the new bits in device status is making it more complex for the device
> to handle.
> 
> Are you challenging the design of the device status? It's definitely too late to do
> this.
> 
No. I am saying that extending device_status with yet another state is equally complex, and it is core to the device.

> This proposal increases just one bit and that worries you? Or you think one
> more state is much more complicated than a new state machine with two
> states?

It is a mode, not a state. And two modes are needed to support P2P devices.
When one wants to do this with mediation, two states are needed there as well.

The key is that the modes do not interact with device_status, because device_status is just another virtio register.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-17  5:29                       ` Parav Pandit
@ 2023-09-19  4:25                         ` Jason Wang
  2023-09-19  7:32                           ` Parav Pandit
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-19  4:25 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Sun, Sep 17, 2023 at 1:29 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:42 AM
> >
> > On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, September 13, 2023 10:14 AM
> > >
> > > > It's not about how many states in a single state machine, it's about
> > > > how many state machines that exist for device status. Having more
> > > > than one creates big obstacles and complexity in the device. You
> > > > need to define the interaction of each state otherwise you leave undefined
> > behaviours.
> > > The device mode has zero relation to the device status.
> >
> > You will soon get this issue when you want to do nesting.
> >
> I don’t think so. One needs to intercept it when one wants to do trap+emulation which seems to fullfil the nesting use case.

Well, how can you trap it? You have admin vq in L0, it means the
suspending is never exposed to L1 unless you assign the owner to L1.
Is this what you want?

>
> > > It does not mess with it at all.
> > > In fact the new bits in device status is making it more complex for the device
> > to handle.
> >
> > Are you challenging the design of the device status? It's definitely too late to do
> > this.
> >
> No. I am saying the extending device_status with yet another state is equally complex and its core of the device.

You never explain why.

>
> > This proposal increases just one bit and that worries you? Or you think one
> > more state is much more complicated than a new state machine with two
> > states?
>
> It is mode and not state. And two modes are needed for supporting P2P device.

You keep saying you are migrating the core virtio devices but then you
are saying it is required for PCI. And you never explain why it can't
be done by reusing the device status bit.

> When one wants to do with mediation, there also two states are needed.
>
> The key is modes are not interacting

You need to explain why they are not interacting. It touches the
virtio facility which (partially) overlaps the function of the device
status for sure. You invent a new state machine, and leave the vendors
to guess how or why they are not interacting with the existing one.
There are just too many corner cases that need to be figured out.

For example:

How do you define stop? Is it a virtio level stop, a transport level stop, or
a mix of both? Is the device allowed to stop in the middle of
reset, feature negotiation or even transport specific things like FLR?
If yes, how about other operations and who defines and maintains those
transitional states? If not, why and how long would a stop wait for an
operation? Can a stop fail? What happens if the driver wants to reset
but the device is stopped by the admin commands? Who suppresses who
and why?

This demonstrates the complexity of your proposal, and I don't see that
any of the above was clearly stated in your series. By reusing the existing
device status machine, everything would be simplified.
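
To make the corner-case argument concrete, here is a small sketch that enumerates the pairs a specification would have to cover when a second set of modes exists next to the device status. The phase names are taken loosely from the questions above and the mode names from the "active, stop, freeze" modes mentioned earlier in the thread; the helper functions are hypothetical, and the sketch does not say which pairs either series actually defines.

```python
from itertools import product

# Phases named in the questions above (illustrative labels, not spec terms).
STATUS_PHASES = ["reset", "feature negotiation", "DRIVER_OK", "FLR in progress"]
# Migration modes as named earlier in the thread.
ADMIN_MODES = ["active", "stop", "freeze"]


def combined_states(status_phases, modes):
    """Every (status phase, mode) pair the two mechanisms can be in together."""
    return list(product(status_phases, modes))


def undefined_pairs(defined, status_phases, modes):
    """Pairs for which no behaviour has been written down yet."""
    return [pair for pair in combined_states(status_phases, modes)
            if pair not in defined]
```

With four phases and three modes there are twelve pairs to consider; whether that is a real burden depends on how many of them a series pins down in its normative text.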

> with the device_status because device_status is just another register of the virtio.

Let's not commit a layering violation: device status is a basic facility of
the virtio device, which is not coupled to any transport, so it is not
necessarily implemented via registers.

Thanks


^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-19  4:25                         ` Jason Wang
@ 2023-09-19  7:32                           ` Parav Pandit
  0 siblings, 0 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 9:56 AM

> On Sun, Sep 17, 2023 at 1:29 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, September 14, 2023 8:42 AM
> > >
> > > On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com>
> wrote:
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > >
> > > > > It's not about how many states in a single state machine, it's
> > > > > about how many state machines that exist for device status.
> > > > > Having more than one creates big obstacles and complexity in the
> > > > > device. You need to define the interaction of each state
> > > > > otherwise you leave undefined
> > > behaviours.
> > > > The device mode has zero relation to the device status.
> > >
> > > You will soon get this issue when you want to do nesting.
> > >
> > I don’t think so. One needs to intercept it when one wants to do
> trap+emulation which seems to fullfil the nesting use case.
> 
> Well, how can you trap it? You have admin vq in L0, it means the suspending is
> never exposed to L1 unless you assign the owner to L1.
> Is this what you want?
> 
When nesting is not done, it is not needed.
Only the nested cases need to trap it.
So when one wants to do the nesting use case, one should also place the admin peer PF in that guest.
Right: assign one VF and its peer admin VF to L1.

> >
> > > > It does not mess with it at all.
> > > > In fact the new bits in device status is making it more complex
> > > > for the device
> > > to handle.
> > >
> > > Are you challenging the design of the device status? It's definitely
> > > too late to do this.
> > >
> > No. I am saying the extending device_status with yet another state is equally
> complex and its core of the device.
> 
> You never explain why.
If you are comparing the two methods, then any new feature adds complexity.
Hence, they score equally in adding complexity for a new feature.
In the case of device_status, one needs to do things synchronously.
This adds complexity on the device side, which must answer those registers in the hot downtime path.
When done over admin commands, they happen in parallel.
 
> 
> >
> > > This proposal increases just one bit and that worries you? Or you
> > > think one more state is much more complicated than a new state
> > > machine with two states?
> >
> > It is mode and not state. And two modes are needed for supporting P2P
> device.
> 
> You keep saying you are migrating the core virtio devices but then you are
> saying it is required for PCI. And you never explain why it can't be done by
> reusing the device status bit.
It cannot be done using device status bits, because the hypervisor is not involved in trapping and parsing them.
We better discuss in the actual series where things are posted.

> 
> > When one wants to do with mediation, there also two states are needed.
> >
> > The key is modes are not interacting
> 
> You need to explain why they are not interacting. It touches the virtio facility
> which (partially) overlaps the function of the device status for sure. You invent a
> new state machine, and leave the vendors to guess how or why they are not
> interacting with the existing one.
Huh, something needs an explanation only when there is an interaction.
I explained that device_status is just another virtio register.

I forgot to add the other vendors' sign-offs. I will add them.

> There are just too many corner cases that need to be figured out.
> 
> For example:
> 
> How do you define stop? Is it a virtio level stop, transport level or a mixing of
> them both? 
It is defined in the series, in the device and driver requirements sections and also in the theory of operation.

> Is the device allowed to stop in the middle or reset, feature
> negotiation or even transport specific things like FLR?
Yes.

> If yes, how about other operations and who defines and maintains those
> transitional states? If not, why and how long would a stop wait for an operation?
> Can a stop fail? What happens if the driver wants to reset but the device is
> stopped by the admin commands? Who suppresses who and why?
All of these are described in the normative sections of the series. If something is missing, please comment there and I will fix it in v1.

> 
> This demonstrates the complexity of your proposal and I don't see any of the
> above were clearly stated in your series. Reusing the existing device status
> machine, everything would be simplified.
You see, too early conclusion saying things are missing, take mine...
I will fix missing items in v1, please put review comments in there.

> 
> > with the device_status because device_status is just another register of the
> virtio.
> 
> Let's don't do layer violation, device status is the basic facility of the virtio
> device which is not coupled with any transport so it is not necessarily
> implemented via registers.

There is no violation. Device_status is already part of the transport specific context.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:30         ` Jason Wang
  2023-09-11  6:47           ` Parav Pandit
@ 2023-09-11  6:59           ` Parav Pandit
  2023-09-11 10:15           ` Michael S. Tsirkin
  2 siblings, 0 replies; 148+ messages in thread
From: Parav Pandit @ 2023-09-11  6:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, September 11, 2023 12:01 PM


> Customers don't want to have admin stuff, SR-IOV or PASID in the guest in order
> to migrate a single virtio device in the nest.
It is not about what the customer wants or does not want.
The PCI transport simply does not allow one to bifurcate the PCI device so as to do things like resetting the device while still letting some parts run, such as some admin commands or registers.
So one needs mediation tricks, and has to build on such a dependency, for the nested use case.

Anyway, the mediation approach of AQ etc. does not address the basic passthrough requirements.
So the two are still orthogonal proposals addressing different use cases.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:30         ` Jason Wang
  2023-09-11  6:47           ` Parav Pandit
  2023-09-11  6:59           ` Parav Pandit
@ 2023-09-11 10:15           ` Michael S. Tsirkin
  2023-09-12  3:35             ` Jason Wang
  2 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-11 10:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Mon, Sep 11, 2023 at 02:30:31PM +0800, Jason Wang wrote:
> Customers don't want to have admin stuff, SR-IOV or PASID in the guest
> in order to migrate a single virtio device in the nest.

Build an alternative facility to implement admin commands then.
The advantage of admin commands is they are nicely contained.
This proposal is way too intrusive.

-- 
MST




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11 10:15           ` Michael S. Tsirkin
@ 2023-09-12  3:35             ` Jason Wang
  2023-09-12  3:43               ` Zhu, Lingshan
  0 siblings, 1 reply; 148+ messages in thread
From: Jason Wang @ 2023-09-12  3:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Zhu Lingshan, eperezma@redhat.com,
	cohuck@redhat.com, stefanha@redhat.com,
	virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org

On Mon, Sep 11, 2023 at 6:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 11, 2023 at 02:30:31PM +0800, Jason Wang wrote:
> > Customers don't want to have admin stuff, SR-IOV or PASID in the guest
> > in order to migrate a single virtio device in the nest.
>
> Build an alternative facility to implement admin commands then.

I wonder if it could be built in an efficient way. For example, the
length of admin commands is not fixed, and we don't want the MMIO
area to grow with the admin command size. This would result in
something like VIRTIO_PCI_CAP_PCI_CFG, which is sub-optimal (many more
register accesses than simply introducing new fields in common cfg).

> The advantage of admin commands is they are nicely contained.

If it is to be contained, it needs to duplicate the functionality of
the existing facilities like common cfg and others (one example is
setting up the virtqueue after migration). Otherwise, during live
migration we will use both admin commands and the existing
configuration structure, which will end up with more issues.

As stated before, the best way is to decouple the basic facilities
(states like index, inflight, dirty page) from a specific
interface/transport and keep the flexibility at the transport layer.
Transport layer can choose to stick to the existing interfaces or
implement the admin commands. So we can have two ways in parallel:

1) live migration via the existing transport specific facilities; this
allows us to reuse the existing interfaces with minimal extensions or
take advantage of transport specific facilities like PASID
2) live migration via admin commands; this needs to invent commands
to access the existing facilities, which is just a new transport
interface that LingShan is working on (transport over admin commands)

Instead of focusing on a solution that only works for a specific setup
on a specific transport.
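
To illustrate the decoupling argued for above, here is a minimal sketch
in C. It is not taken from the patch series: the names basic_vq_state,
vq_state_ops, regs_ctx and migrate_vq_state are hypothetical, and the
"regs" backend is only an in-memory stand-in for register-style fields
such as those in common cfg. An admin-command transport would supply
its own get/set pair behind the same interface.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_QUEUES 2u

/* Transport-independent basic facility: the state of one virtqueue. */
struct basic_vq_state {
    uint16_t avail;
    uint16_t used;
};

/* Hypothetical accessor interface that each transport would implement. */
struct vq_state_ops {
    int (*get)(void *ctx, uint16_t q, struct basic_vq_state *out);
    int (*set)(void *ctx, uint16_t q, const struct basic_vq_state *in);
};

/* One backend: register-style storage standing in for common cfg fields. */
struct regs_ctx {
    struct basic_vq_state regs[SKETCH_NUM_QUEUES];
};

int regs_get(void *ctx, uint16_t q, struct basic_vq_state *out)
{
    struct regs_ctx *c = ctx;

    if (q >= SKETCH_NUM_QUEUES)
        return -1;
    *out = c->regs[q];
    return 0;
}

int regs_set(void *ctx, uint16_t q, const struct basic_vq_state *in)
{
    struct regs_ctx *c = ctx;

    if (q >= SKETCH_NUM_QUEUES)
        return -1;
    c->regs[q] = *in;
    return 0;
}

const struct vq_state_ops regs_ops = { regs_get, regs_set };

/* Migration logic written once against the ops, not against a transport. */
int migrate_vq_state(const struct vq_state_ops *src, void *src_ctx,
                     const struct vq_state_ops *dst, void *dst_ctx, uint16_t q)
{
    struct basic_vq_state st;

    assert(src && dst);
    if (src->get(src_ctx, q, &st) != 0)
        return -1;
    return dst->set(dst_ctx, q, &st);
}
```

The point of the sketch is only that the state definition and the
migration flow stay the same whichever transport provides the
accessors.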

Thanks

> This proposal is way too intrusive.
>
> --
> MST
>


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  3:35             ` Jason Wang
@ 2023-09-12  3:43               ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  3:43 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Parav Pandit, eperezma@redhat.com, cohuck@redhat.com,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org



On 9/12/2023 11:35 AM, Jason Wang wrote:
> On Mon, Sep 11, 2023 at 6:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, Sep 11, 2023 at 02:30:31PM +0800, Jason Wang wrote:
>>> Customers don't want to have admin stuff, SR-IOV or PASID in the guest
>>> in order to migrate a single virtio device in the nest.
>> Build an alternative facility to implement admin commands then.
> I wonder if it could be built in an efficient way. For example the
> length of admin commands is not fixed and we don't want to grow MMIO
> areas as the admin command, this will result in something like
> VIRTIO_PCI_CAP_PCI_CFG which is sub-optimal (much more registers
> accesses than simply introducing new fields in common cfg).
>
>> The advantage of admin commands is they are nicely contained.
> If it want to be contained it needs to duplicate the functionality of
> the existing facilities like common cfg and others (one example is to
> setup the virtqueue after migration). Otherwise during live migration,
> we will use both admin commands and existing configuration structure
> which will end up with more issues.
>
> As stated before, the best way is to decouple the basic facilities
> (states like index, inflight, dirty page) from a specific
> interface/transport and keep the flexibility at the transport layer.
> Transport layer can choose to stick to the existing interfaces or
> implement the admin commands. So we can have two ways in parallel:
>
> 1) live migration via the existing transport specific facilities, this
> allows us to reuse the existing interfaces with minimal extensions or
> take the advantages of the transport specific facilities like PASID
> 2) live migration via admin commands, but it needs to invent commands
> to access existing facilities which is just a new transport interface
> that LingShan is work (transport over admin commands)
>
> Instead of focusing on a solution that only works for a specific setup
> on a specific transport.
I totally agree
>
> Thanks
>
>> This proposal is way too intrusive.
>>
>> --
>> MST
>>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:16 ` [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE Zhu Lingshan
  2023-09-06  8:32   ` Michael S. Tsirkin
@ 2023-09-14 11:27   ` Michael S. Tsirkin
  2023-09-15  4:13     ` Zhu, Lingshan
  1 sibling, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:27 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> This patch adds two new le16 fields to common configuration structure
> to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> ---
>  transport-pci.tex | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..3161519 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>          /* About the administration virtqueue. */
>          le16 admin_queue_index;         /* read-only for driver */
>          le16 admin_queue_num;         /* read-only for driver */
> +
> +	/* Virtqueue state */
> +        le16 queue_avail_state;         /* read-write */
> +        le16 queue_used_state;          /* read-write */
>  };
>  \end{lstlisting}
>  
> @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  	The value 0 indicates no supported administration virtqueues.
>  	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>  	negotiated.
> +
> +\item[\field{queue_avail_state}]
> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> +        negotiated. The driver sets and gets the available state of
> +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> +
> +\item[\field{queue_used_state}]
> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> +        negotiated. The driver sets and gets the used state of the
> +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).

I see no description, either here or in the generic patch,
of what it means to set or get the state.

> +
>  \end{description}
>  
>  \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
> @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  present either a value of 0 or a power of 2 in
>  \field{queue_size}.
>  
> +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
> +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
> +
>  If VIRTIO_F_ADMIN_VQ has been negotiated, the value
>  \field{admin_queue_index} MUST be equal to, or bigger than
>  \field{num_queues}; also, \field{admin_queue_num} MUST be
> -- 
> 2.35.3
> 
> 
> 




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14 11:27   ` Michael S. Tsirkin
@ 2023-09-15  4:13     ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  4:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:27 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
>> This patch adds two new le16 fields to common configuration structure
>> to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> ---
>>   transport-pci.tex | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/transport-pci.tex b/transport-pci.tex
>> index a5c6719..3161519 100644
>> --- a/transport-pci.tex
>> +++ b/transport-pci.tex
>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>           /* About the administration virtqueue. */
>>           le16 admin_queue_index;         /* read-only for driver */
>>           le16 admin_queue_num;         /* read-only for driver */
>> +
>> +	/* Virtqueue state */
>> +        le16 queue_avail_state;         /* read-write */
>> +        le16 queue_used_state;          /* read-write */
>>   };
>>   \end{lstlisting}
>>   
>> @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   	The value 0 indicates no supported administration virtqueues.
>>   	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>>   	negotiated.
>> +
>> +\item[\field{queue_avail_state}]
>> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
>> +        negotiated. The driver sets and gets the available state of
>> +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
>> +
>> +\item[\field{queue_used_state}]
>> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
>> +        negotiated. The driver sets and gets the used state of the
>> +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> I see no description either here or in the generic patch
> of what does it mean to set or get the state.
On SUSPEND, the device stores the vq state here. After migration to
the destination, the destination hypervisor restores the vq state
through these fields.

I will add more description in V2
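
As a rough sketch of that flow, under assumptions that are not in the
patch text: the device is modelled as a plain in-memory struct rather
than MMIO, MODEL_STATUS_SUSPEND is a placeholder value (the real bit
is defined in patch 2), the le16 fields are kept in host byte order,
and the helper names are made up for illustration. Reading the state
is only treated as valid once the device reports SUSPEND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder value; the actual SUSPEND bit is defined in patch 2. */
#define MODEL_STATUS_SUSPEND 0x10u
#define MODEL_NUM_QUEUES 2u

/* In-memory stand-in for a device; a real driver would use MMIO accessors. */
struct model_device {
    uint8_t  status;
    uint16_t queue_select;
    uint16_t avail_state[MODEL_NUM_QUEUES]; /* queue_avail_state per vq */
    uint16_t used_state[MODEL_NUM_QUEUES];  /* queue_used_state per vq */
};

struct vq_state {
    uint16_t avail;
    uint16_t used;
};

/* Source side: read the state of one virtqueue; refused unless suspended. */
bool vq_state_save(struct model_device *dev, uint16_t q, struct vq_state *out)
{
    assert(dev && out);
    if (q >= MODEL_NUM_QUEUES || !(dev->status & MODEL_STATUS_SUSPEND))
        return false;
    dev->queue_select = q;
    out->avail = dev->avail_state[dev->queue_select];
    out->used  = dev->used_state[dev->queue_select];
    return true;
}

/* Destination side: write the saved state back into the new device. */
bool vq_state_restore(struct model_device *dev, uint16_t q,
                      const struct vq_state *in)
{
    assert(dev && in);
    if (q >= MODEL_NUM_QUEUES)
        return false;
    dev->queue_select = q;
    dev->avail_state[dev->queue_select] = in->avail;
    dev->used_state[dev->queue_select]  = in->used;
    return true;
}
```

The hypervisor would call vq_state_save() for each virtqueue on the
source after suspending, carry the values in the migration stream, and
call vq_state_restore() on the destination.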
>
>> +
>>   \end{description}
>>   
>>   \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
>> @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   present either a value of 0 or a power of 2 in
>>   \field{queue_size}.
>>   
>> +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
>> +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
>> +
>>   If VIRTIO_F_ADMIN_VQ has been negotiated, the value
>>   \field{admin_queue_index} MUST be equal to, or bigger than
>>   \field{num_queues}; also, \field{admin_queue_num} MUST be
>> -- 
>> 2.35.3
>>
>>
>>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
                   ` (4 preceding siblings ...)
  2023-09-06  8:16 ` [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE Zhu Lingshan
@ 2023-09-06  8:29 ` Michael S. Tsirkin
  2023-09-06  8:38   ` Zhu, Lingshan
  2023-09-14 11:14 ` [virtio-comment] " Michael S. Tsirkin
  2023-09-14 11:37 ` Michael S. Tsirkin
  7 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06  8:29 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> This series introduces
> 1)a new SUSPEND bit in the device status
> Which is used to suspend the device, so that the device states
> and virtqueue states are stabilized.
> 
> 2)virtqueue state and its accessor, to get and set last_avail_idx
> and last_used_idx of virtqueues.
> 
> The main usecase of these new facilities is Live Migration.
> 
> Future work: dirty page tracking and in-flight descriptors.

oh that answers my question - it's not covered.
I don't think we can merge this without in-flight descriptor
support.



> This series addresses many comments from Jason, Stefan and Eugenio
> from RFC series.
> 
> Zhu Lingshan (5):
>   virtio: introduce vq state as basic facility
>   virtio: introduce SUSPEND bit in device status
>   virtqueue: constraints for virtqueue state
>   virtqueue: ignore resetting vqs when SUSPEND
>   virtio-pci: implement VIRTIO_F_QUEUE_STATE
> 
>  content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>  transport-pci.tex |  18 +++++++
>  2 files changed, 136 insertions(+)
> 
> -- 
> 2.35.3
> 
> 
> 




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:29 ` [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Michael S. Tsirkin
@ 2023-09-06  8:38   ` Zhu, Lingshan
  2023-09-06 13:49     ` Michael S. Tsirkin
  0 siblings, 1 reply; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-06  8:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>> This series introduces
>> 1)a new SUSPEND bit in the device status
>> Which is used to suspend the device, so that the device states
>> and virtqueue states are stabilized.
>>
>> 2)virtqueue state and its accessor, to get and set last_avail_idx
>> and last_used_idx of virtqueues.
>>
>> The main usecase of these new facilities is Live Migration.
>>
>> Future work: dirty page tracking and in-flight descriptors.
> oh that answers my question - it's not covered.
> I don't think we can merge this without in-flight descriptor
> support.
On SUSPEND, we require the device to wait until all descriptors that
are being processed finish, and to mark them as used (in patch 2).

At this point there should be no in-flight descriptors, so this is
still self-consistent. The tracker for in-flight descriptors is
excluded to keep this series small and focused.
>
>
>
>> This series addresses many comments from Jason, Stefan and Eugenio
>> from RFC series.
>>
>> Zhu Lingshan (5):
>>    virtio: introduce vq state as basic facility
>>    virtio: introduce SUSPEND bit in device status
>>    virtqueue: constraints for virtqueue state
>>    virtqueue: ignore resetting vqs when SUSPEND
>>    virtio-pci: implement VIRTIO_F_QUEUE_STATE
>>
>>   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>>   transport-pci.tex |  18 +++++++
>>   2 files changed, 136 insertions(+)
>>
>> -- 
>> 2.35.3
>>
>>
>>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:38   ` Zhu, Lingshan
@ 2023-09-06 13:49     ` Michael S. Tsirkin
  2023-09-07  1:51       ` Zhu, Lingshan
  2023-09-07 10:57       ` Eugenio Perez Martin
  0 siblings, 2 replies; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06 13:49 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> > > This series introduces
> > > 1)a new SUSPEND bit in the device status
> > > Which is used to suspend the device, so that the device states
> > > and virtqueue states are stabilized.
> > > 
> > > 2)virtqueue state and its accessor, to get and set last_avail_idx
> > > and last_used_idx of virtqueues.
> > > 
> > > The main usecase of these new facilities is Live Migration.
> > > 
> > > Future work: dirty page tracking and in-flight descriptors.
> > oh that answers my question - it's not covered.
> > I don't think we can merge this without in-flight descriptor
> > support.
> When SUSPEND, we require the device wait until all descriptors that
> being processed to finish and mark them as used.(In patch 2)
> at this point there may be no in-flight descriptors, so this is still
> self-consistent. The tracker for in-flight descriptors is excluded to
> make this series small and focus.

Does not work generally.
Imagine RX ring of a network device for example. You can wait
as long as you can but if there's no incoming network traffic
buffers will not be used.
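
A toy model of the RX case, as a sketch only: rx_ring_model and its
helpers are illustrative names, and only the two free-running 16-bit
indices are modelled. It shows that the number of available-but-unused
buffers does not shrink unless packets arrive, so a suspend that waits
for it to reach zero has no bound on how long it waits.

```c
#include <assert.h>
#include <stdint.h>

/* Toy RX virtqueue: only the two free-running 16-bit indices are modelled. */
struct rx_ring_model {
    uint16_t avail_idx; /* buffers the driver has made available */
    uint16_t used_idx;  /* buffers the device has used */
};

/* Buffers made available but not yet used; wraps correctly modulo 2^16. */
uint16_t rx_outstanding(const struct rx_ring_model *r)
{
    assert(r);
    return (uint16_t)(r->avail_idx - r->used_idx);
}

/* Device side: one RX buffer is used per arriving packet, never otherwise. */
void rx_receive(struct rx_ring_model *r, uint16_t packets)
{
    uint16_t n = rx_outstanding(r);

    if (packets < n)
        n = packets;
    r->used_idx = (uint16_t)(r->used_idx + n);
}
```

With four buffers posted and rx_receive(&r, 0) called any number of
times, rx_outstanding(&r) stays at four.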

Also, please keep to the spec terminology: buffers are used, not
descriptors. Best to keep it straight so that errors do not leak into
the spec.


> > 
> > 
> > 
> > > This series addresses many comments from Jason, Stefan and Eugenio
> > > from RFC series.
> > > 
> > > Zhu Lingshan (5):
> > >    virtio: introduce vq state as basic facility
> > >    virtio: introduce SUSPEND bit in device status
> > >    virtqueue: constraints for virtqueue state
> > >    virtqueue: ignore resetting vqs when SUSPEND
> > >    virtio-pci: implement VIRTIO_F_QUEUE_STATE
> > > 
> > >   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
> > >   transport-pci.tex |  18 +++++++
> > >   2 files changed, 136 insertions(+)
> > > 
> > > -- 
> > > 2.35.3
> > > 
> > > 
> > > 




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06 13:49     ` Michael S. Tsirkin
@ 2023-09-07  1:51       ` Zhu, Lingshan
  2023-09-07 10:57       ` Eugenio Perez Martin
  1 sibling, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-07  1:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 9:49 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
>>
>> On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>>>> This series introduces
>>>> 1)a new SUSPEND bit in the device status
>>>> Which is used to suspend the device, so that the device states
>>>> and virtqueue states are stabilized.
>>>>
>>>> 2)virtqueue state and its accessor, to get and set last_avail_idx
>>>> and last_used_idx of virtqueues.
>>>>
>>>> The main usecase of these new facilities is Live Migration.
>>>>
>>>> Future work: dirty page tracking and in-flight descriptors.
>>> oh that answers my question - it's not covered.
>>> I don't think we can merge this without in-flight descriptor
>>> support.
>> When SUSPEND, we require the device wait until all descriptors that
>> being processed to finish and mark them as used.(In patch 2)
>> at this point there may be no in-flight descriptors, so this is still
>> self-consistent. The tracker for in-flight descriptors is excluded to
>> make this series small and focus.
> Does not work generally.
> Imagine RX ring of a network device for example. You can wait
> as long as you can but if there's no incoming network traffic
> buffers will not be used.
Yes we will include a patch tracking in-flight descriptors in V2.
>
> Also please, keep to the spec terminology. buffers are used not
> descriptors. Best to keep it straight errors will not leak into
> spec.
OK
>
>
>>>
>>>
>>>> This series addresses many comments from Jason, Stefan and Eugenio
>>>> from RFC series.
>>>>
>>>> Zhu Lingshan (5):
>>>>     virtio: introduce vq state as basic facility
>>>>     virtio: introduce SUSPEND bit in device status
>>>>     virtqueue: constraints for virtqueue state
>>>>     virtqueue: ignore resetting vqs when SUSPEND
>>>>     virtio-pci: implement VIRTIO_F_QUEUE_STATE
>>>>
>>>>    content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>    transport-pci.tex |  18 +++++++
>>>>    2 files changed, 136 insertions(+)
>>>>
>>>> -- 
>>>> 2.35.3
>>>>
>>>>
>>>>




^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06 13:49     ` Michael S. Tsirkin
  2023-09-07  1:51       ` Zhu, Lingshan
@ 2023-09-07 10:57       ` Eugenio Perez Martin
  2023-09-07 19:55         ` Michael S. Tsirkin
  1 sibling, 1 reply; 148+ messages in thread
From: Eugenio Perez Martin @ 2023-09-07 10:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhu, Lingshan, jasowang, cohuck, stefanha, virtio-comment,
	virtio-dev

On Wed, Sep 6, 2023 at 3:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
> >
> >
> > On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> > > On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> > > > This series introduces
> > > > 1)a new SUSPEND bit in the device status
> > > > Which is used to suspend the device, so that the device states
> > > > and virtqueue states are stabilized.
> > > >
> > > > 2)virtqueue state and its accessor, to get and set last_avail_idx
> > > > and last_used_idx of virtqueues.
> > > >
> > > > The main usecase of these new facilities is Live Migration.
> > > >
> > > > Future work: dirty page tracking and in-flight descriptors.
> > > oh that answers my question - it's not covered.
> > > I don't think we can merge this without in-flight descriptor
> > > support.
> > When SUSPEND, we require the device wait until all descriptors that
> > being processed to finish and mark them as used.(In patch 2)
> > at this point there may be no in-flight descriptors, so this is still
> > self-consistent. The tracker for in-flight descriptors is excluded to
> > make this series small and focus.
>
> Does not work generally.
> Imagine RX ring of a network device for example. You can wait
> as long as you can but if there's no incoming network traffic
> buffers will not be used.
>

The patch should word it differently, yes.

QEMU's vhost-kernel net handler currently assumes the device will use
the descriptors sequentially from avail_idx. In that case, it is
possible to simply finish receiving in-flight packets (not buffers)
and just stop receiving new packets. As all packets have been received,
we have a valid used-idx, and the device at resume (or the destination
device at migration) can just fetch all buffers from there.
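
To make the in-order argument above concrete, here is a minimal sketch in C. It assumes a toy RX queue model that is not part of the series: the struct vq_model, the helper names and VQ_SIZE are all hypothetical, and the device is assumed to use buffers strictly in the order they were made available. Under that assumption, one index is enough to describe the queue at suspend time, even if no packet ever arrives.

```c
#include <assert.h>
#include <stdint.h>

#define VQ_SIZE 256u /* hypothetical queue size, for the sanity check only */

/* Hypothetical model of an RX virtqueue whose buffers are used in order. */
struct vq_model {
    uint16_t avail_idx;      /* next slot the driver will publish */
    uint16_t last_avail_idx; /* next available buffer the device will fetch */
    uint16_t used_idx;       /* next slot the device will mark as used */
};

/* Number of buffers the driver has published but the device has not used. */
static uint16_t vq_pending(const struct vq_model *vq)
{
    return (uint16_t)(vq->avail_idx - vq->last_avail_idx);
}

/* One packet arrives: consume the next buffer in order and mark it used.
 * Returns 1 if a buffer was used, 0 if none was available. */
static int vq_receive_packet(struct vq_model *vq)
{
    if (vq_pending(vq) == 0)
        return 0;
    vq->last_avail_idx++;
    vq->used_idx++;
    return 1;
}

/* Suspend: stop accepting new packets. Because buffers are used in order,
 * used_idx equals last_avail_idx, so a single index describes the queue.
 * Nothing here waits for the remaining buffers to be used. */
static uint16_t vq_suspend(const struct vq_model *vq)
{
    assert(vq->used_idx == vq->last_avail_idx);
    return vq->used_idx;
}

/* Resume (or start on the migration destination): continue fetching
 * buffers from the saved index up to what the driver has published. */
static struct vq_model vq_resume(uint16_t saved_idx, uint16_t avail_idx)
{
    struct vq_model vq;

    assert((uint16_t)(avail_idx - saved_idx) <= VQ_SIZE);
    vq.avail_idx = avail_idx;
    vq.last_avail_idx = saved_idx;
    vq.used_idx = saved_idx;
    return vq;
}
```

In this model, buffers that were published but never filled simply stay pending and are fetched again after vq_resume(). Without the in-order assumption a single index would not be enough, which is where in-flight buffer tracking comes in.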

I may have a better wording of this in other mails.

Would it work to use this solution with in_order, and defer the
in-flight buffer handling to the future? That would allow us to keep this
series small.

Thanks!

> Also please, keep to the spec terminology. buffers are used not
> descriptors. Best to keep it straight errors will not leak into
> spec.
>
>
> > >
> > >
> > >
> > > > This series addresses many comments from Jason, Stefan and Eugenio
> > > > from RFC series.
> > > >
> > > > Zhu Lingshan (5):
> > > >    virtio: introduce vq state as basic facility
> > > >    virtio: introduce SUSPEND bit in device status
> > > >    virtqueue: constraints for virtqueue state
> > > >    virtqueue: ignore resetting vqs when SUSPEND
> > > >    virtio-pci: implement VIRTIO_F_QUEUE_STATE
> > > >
> > > >   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >   transport-pci.tex |  18 +++++++
> > > >   2 files changed, 136 insertions(+)
> > > >
> > > > --
> > > > 2.35.3
> > > >
> > > >
>



* Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-07 10:57       ` Eugenio Perez Martin
@ 2023-09-07 19:55         ` Michael S. Tsirkin
  0 siblings, 0 replies; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-07 19:55 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Zhu, Lingshan, jasowang, cohuck, stefanha, virtio-comment,
	virtio-dev

On Thu, Sep 07, 2023 at 12:57:58PM +0200, Eugenio Perez Martin wrote:
> On Wed, Sep 6, 2023 at 3:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
> > >
> > >
> > > On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> > > > > This series introduces
> > > > > 1)a new SUSPEND bit in the device status
> > > > > Which is used to suspend the device, so that the device states
> > > > > and virtqueue states are stabilized.
> > > > >
> > > > > 2)virtqueue state and its accessor, to get and set last_avail_idx
> > > > > and last_used_idx of virtqueues.
> > > > >
> > > > > The main usecase of these new facilities is Live Migration.
> > > > >
> > > > > Future work: dirty page tracking and in-flight descriptors.
> > > > oh that answers my question - it's not covered.
> > > > I don't think we can merge this without in-flight descriptor
> > > > support.
> > > When SUSPEND, we require the device wait until all descriptors that
> > > being processed to finish and mark them as used.(In patch 2)
> > > at this point there may be no in-flight descriptors, so this is still
> > > self-consistent. The tracker for in-flight descriptors is excluded to
> > > make this series small and focus.
> >
> > Does not work generally.
> > Imagine RX ring of a network device for example. You can wait
> > as long as you can but if there's no incoming network traffic
> > buffers will not be used.
> >
> 
> The patch should word it differently, yes.
> 
> QEMU's vhost-kernel net handler currently assumes the device will use
> the descriptors sequentially from avail_idx. In that case, it is
> possible to simply finish receiving in-flight packets (not buffers)
> and just stop receiving new packets. As all packets has been received,
> we have a valid used-idx, and the device at resume (or the destination
> device at migration) can just fetch all buffers from there.
> 
> I may have a better wording of this in other mails.
> 
> Would it work to use this solution with in_order, and defer the
> inflight buffers handling for the future? It would allow to keep this
> series small.
> 
> Thanks!

in_order isn't used widely, so I doubt depending on it is wise.
And again, the whole thing has to be rewritten with the admin queue,
and the group owner generally does not know whether in_order
was negotiated by the member or not.


> > Also please, keep to the spec terminology. buffers are used not
> > descriptors. Best to keep it straight errors will not leak into
> > spec.
> >
> >
> > > >
> > > >
> > > >
> > > > > This series addresses many comments from Jason, Stefan and Eugenio
> > > > > from RFC series.
> > > > >
> > > > > Zhu Lingshan (5):
> > > > >    virtio: introduce vq state as basic facility
> > > > >    virtio: introduce SUSPEND bit in device status
> > > > >    virtqueue: constraints for virtqueue state
> > > > >    virtqueue: ignore resetting vqs when SUSPEND
> > > > >    virtio-pci: implement VIRTIO_F_QUEUE_STATE
> > > > >
> > > > >   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > >   transport-pci.tex |  18 +++++++
> > > > >   2 files changed, 136 insertions(+)
> > > > >
> > > > > --
> > > > > 2.35.3
> > > > >
> > > > >
> >



* [virtio-comment] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
                   ` (5 preceding siblings ...)
  2023-09-06  8:29 ` [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Michael S. Tsirkin
@ 2023-09-14 11:14 ` Michael S. Tsirkin
  2023-09-14 11:37 ` Michael S. Tsirkin
  7 siblings, 0 replies; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:14 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> This series introduces
> 1)a new SUSPEND bit in the device status
> Which is used to suspend the device, so that the device states
> and virtqueue states are stabilized.
> 
> 2)virtqueue state and its accessor, to get and set last_avail_idx
> and last_used_idx of virtqueues.
> 
> The main usecase of these new facilities is Live Migration.
> 
> Future work: dirty page tracking and in-flight descriptors.
> This series addresses many comments from Jason, Stefan and Eugenio
> from RFC series.

Compared to Parav's patchset this is much less functional.

Assuming that one goes in, can't we add the ability to submit
admin commands through MMIO on the device itself and be done with it?

> Zhu Lingshan (5):
>   virtio: introduce vq state as basic facility
>   virtio: introduce SUSPEND bit in device status
>   virtqueue: constraints for virtqueue state
>   virtqueue: ignore resetting vqs when SUSPEND
>   virtio-pci: implement VIRTIO_F_QUEUE_STATE
> 
>  content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>  transport-pci.tex |  18 +++++++
>  2 files changed, 136 insertions(+)
> 
> -- 
> 2.35.3



* [virtio-comment] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
                   ` (6 preceding siblings ...)
  2023-09-14 11:14 ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-14 11:37 ` Michael S. Tsirkin
  2023-09-15  4:41   ` [virtio-comment] Re: [virtio-dev] " Zhu, Lingshan
  7 siblings, 1 reply; 148+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:37 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> This series introduces
> 1)a new SUSPEND bit in the device status
> Which is used to suspend the device, so that the device states
> and virtqueue states are stabilized.
> 
> 2)virtqueue state and its accessor, to get and set last_avail_idx
> and last_used_idx of virtqueues.
> 
> The main usecase of these new facilities is Live Migration.
> 
> Future work: dirty page tracking and in-flight descriptors.
> 
> This series addresses many comments from Jason, Stefan and Eugenio
> from RFC series.


After going over this in detail, it is as I worried: this
tries to do too much through a single register, and
the ownership is muddied significantly.

I feel a separate capability for suspend/resume that would
be independent of device status would be preferable.

> Zhu Lingshan (5):
>   virtio: introduce vq state as basic facility
>   virtio: introduce SUSPEND bit in device status
>   virtqueue: constraints for virtqueue state
>   virtqueue: ignore resetting vqs when SUSPEND
>   virtio-pci: implement VIRTIO_F_QUEUE_STATE
> 
>  content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>  transport-pci.tex |  18 +++++++
>  2 files changed, 136 insertions(+)
> 
> -- 
> 2.35.3



* [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-14 11:37 ` Michael S. Tsirkin
@ 2023-09-15  4:41   ` Zhu, Lingshan
  0 siblings, 0 replies; 148+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  4:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:37 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>> This series introduces
>> 1)a new SUSPEND bit in the device status
>> Which is used to suspend the device, so that the device states
>> and virtqueue states are stabilized.
>>
>> 2)virtqueue state and its accessor, to get and set last_avail_idx
>> and last_used_idx of virtqueues.
>>
>> The main usecase of these new facilities is Live Migration.
>>
>> Future work: dirty page tracking and in-flight descriptors.
>>
>> This series addresses many comments from Jason, Stefan and Eugenio
>> from RFC series.
>
> after going over this in detail, it is like I worried: this
> tries to do too much through a single register and
> the ownership is muddied significantly.
Not sure which ownership you mean. The device is usually STOPPED after
the guest freezes, so the hypervisor owns the device status
and LM facilities at that moment.
>
> I feel a separate capability for suspend/resume that would
> be independent of device status would be preferable.
The implementation of the live migration basic facilities is
transport specific. For PCI:
1) Dirty page tracking will have its own capability
2) The in-flight descriptor tracker will have its own capability
3) vq states are stored in the common config space

Only SUSPEND is implemented in the device status, and this is a valid
device status.
There are already 6 device status bits, and IMHO implementing SUSPEND
in this series does not introduce more complexity.
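
For reference, a small C sketch of the device status field being discussed. The six existing bits and their values are the ones in the current VIRTIO specification; the macro names here are illustrative, and the SUSPEND value (16) and both helper functions are assumptions made only for this example, not text from the series.

```c
#include <assert.h>
#include <stdint.h>

/* The six device status bits in the existing specification. */
#define VIRTIO_STATUS_ACKNOWLEDGE        1u
#define VIRTIO_STATUS_DRIVER             2u
#define VIRTIO_STATUS_DRIVER_OK          4u
#define VIRTIO_STATUS_FEATURES_OK        8u
#define VIRTIO_STATUS_DEVICE_NEEDS_RESET 64u
#define VIRTIO_STATUS_FAILED             128u

/* ASSUMPTION: illustrative value for the proposed SUSPEND bit. */
#define VIRTIO_STATUS_SUSPEND_SKETCH     16u

/* Returns nonzero if the (assumed) SUSPEND bit is set in status. */
static int status_is_suspended(uint8_t status)
{
    return (status & VIRTIO_STATUS_SUSPEND_SKETCH) != 0;
}

/* Hypothetical helper: request suspend on a running device by setting
 * the (assumed) SUSPEND bit while leaving all other bits unchanged. */
static uint8_t status_request_suspend(uint8_t status)
{
    assert(status & VIRTIO_STATUS_DRIVER_OK);
    return (uint8_t)(status | VIRTIO_STATUS_SUSPEND_SKETCH);
}
```

The sketch only shows that SUSPEND shares one register with the bits used for driver initialization, which is the point both sides of this exchange are arguing about.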
>
>> Zhu Lingshan (5):
>>    virtio: introduce vq state as basic facility
>>    virtio: introduce SUSPEND bit in device status
>>    virtqueue: constraints for virtqueue state
>>    virtqueue: ignore resetting vqs when SUSPEND
>>    virtio-pci: implement VIRTIO_F_QUEUE_STATE
>>
>>   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>>   transport-pci.tex |  18 +++++++
>>   2 files changed, 136 insertions(+)
>>
>> -- 
>> 2.35.3
>



Thread overview: 148+ messages
2023-09-06  8:16 [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
2023-09-06  8:16 ` [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility Zhu Lingshan
2023-09-06  8:28   ` Michael S. Tsirkin
2023-09-06  9:43     ` Zhu, Lingshan
2023-09-14 11:25   ` Michael S. Tsirkin
2023-09-15  2:46     ` Zhu, Lingshan
2023-09-06  8:16 ` [virtio-comment] [PATCH 2/5] virtio: introduce SUSPEND bit in device status Zhu Lingshan
2023-09-14 11:34   ` [virtio-comment] " Michael S. Tsirkin
2023-09-15  2:57     ` Zhu, Lingshan
2023-09-15 11:10       ` Michael S. Tsirkin
2023-09-18  2:56         ` Zhu, Lingshan
2023-09-18  4:42           ` Parav Pandit
2023-09-18  5:14             ` Zhu, Lingshan
2023-09-18  6:17               ` Parav Pandit
2023-09-18  6:38                 ` Zhu, Lingshan
2023-09-18  6:46                   ` Parav Pandit
2023-09-18  6:49                     ` Zhu, Lingshan
2023-09-18  6:50           ` Zhu, Lingshan
2023-09-06  8:16 ` [virtio-comment] [PATCH 3/5] virtqueue: constraints for virtqueue state Zhu Lingshan
2023-09-14 11:30   ` [virtio-comment] " Michael S. Tsirkin
2023-09-15  2:59     ` Zhu, Lingshan
2023-09-15 11:16       ` Michael S. Tsirkin
2023-09-18  3:02         ` Zhu, Lingshan
2023-09-18 17:30           ` Michael S. Tsirkin
2023-09-19  7:56             ` Zhu, Lingshan
2023-09-06  8:16 ` [virtio-comment] [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND Zhu Lingshan
2023-09-14 11:09   ` [virtio-comment] " Michael S. Tsirkin
2023-09-15  4:06     ` Zhu, Lingshan
2023-09-06  8:16 ` [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE Zhu Lingshan
2023-09-06  8:32   ` Michael S. Tsirkin
2023-09-06  8:37     ` Parav Pandit
2023-09-06  9:37     ` Zhu, Lingshan
2023-09-11  3:01     ` Jason Wang
2023-09-11  4:11       ` Parav Pandit
2023-09-11  6:30         ` Jason Wang
2023-09-11  6:47           ` Parav Pandit
2023-09-11  6:58             ` Zhu, Lingshan
2023-09-11  7:07               ` Parav Pandit
2023-09-11  7:18                 ` Zhu, Lingshan
2023-09-11  7:30                   ` Parav Pandit
2023-09-11  7:58                     ` Zhu, Lingshan
2023-09-11  8:12                       ` Parav Pandit
2023-09-11  8:46                         ` Zhu, Lingshan
2023-09-11  9:05                           ` Parav Pandit
2023-09-11  9:32                             ` Zhu, Lingshan
2023-09-11 10:21                               ` Parav Pandit
2023-09-12  4:06                                 ` Zhu, Lingshan
2023-09-12  5:58                                   ` Parav Pandit
2023-09-12  6:33                                     ` Zhu, Lingshan
2023-09-12  6:47                                       ` Parav Pandit
2023-09-12  7:27                                         ` Zhu, Lingshan
2023-09-12  7:40                                           ` Parav Pandit
2023-09-12  9:02                                             ` Zhu, Lingshan
2023-09-12  9:21                                               ` Parav Pandit
2023-09-12 13:03                                                 ` Zhu, Lingshan
2023-09-12 13:43                                                   ` Parav Pandit
2023-09-13  4:01                                                     ` Zhu, Lingshan
2023-09-13  4:12                                                       ` Parav Pandit
2023-09-13  4:20                                                         ` Zhu, Lingshan
2023-09-13  4:36                                                           ` Parav Pandit
2023-09-14  8:19                                                             ` Zhu, Lingshan
2023-09-11 11:50                               ` Parav Pandit
2023-09-12  3:43                                 ` Jason Wang
2023-09-12  5:50                                   ` Parav Pandit
2023-09-13  4:44                                     ` Jason Wang
2023-09-13  6:05                                       ` Parav Pandit
2023-09-14  3:11                                         ` Jason Wang
2023-09-17  5:22                                           ` Parav Pandit
2023-09-19  4:35                                             ` Jason Wang
2023-09-19  7:33                                               ` Parav Pandit
2023-09-12  3:48                                 ` Zhu, Lingshan
2023-09-12  5:51                                   ` Parav Pandit
2023-09-12  6:37                                     ` Zhu, Lingshan
2023-09-12  6:49                                       ` Parav Pandit
2023-09-12  7:29                                         ` Zhu, Lingshan
2023-09-12  7:53                                           ` Parav Pandit
2023-09-12  9:06                                             ` Zhu, Lingshan
2023-09-12  9:08                                               ` Zhu, Lingshan
2023-09-12  9:35                                                 ` Parav Pandit
2023-09-12 10:14                                                   ` Zhu, Lingshan
2023-09-12 10:16                                                     ` Parav Pandit
2023-09-12 10:28                                                       ` Zhu, Lingshan
2023-09-13  2:23                                                     ` Parav Pandit
2023-09-13  4:03                                                       ` Zhu, Lingshan
2023-09-13  4:15                                                         ` Parav Pandit
2023-09-13  4:21                                                           ` Zhu, Lingshan
2023-09-13  4:37                                                             ` Parav Pandit
2023-09-14  3:11                                                               ` Jason Wang
2023-09-17  5:25                                                                 ` Parav Pandit
2023-09-19  4:34                                                                   ` Jason Wang
2023-09-19  7:32                                                                     ` Parav Pandit
2023-09-14  8:22                                                               ` Zhu, Lingshan
2023-09-12  9:28                                               ` Parav Pandit
2023-09-12 10:17                                                 ` Zhu, Lingshan
2023-09-12 10:25                                                   ` Parav Pandit
2023-09-12 10:32                                                     ` Zhu, Lingshan
2023-09-12 10:40                                                       ` Parav Pandit
2023-09-12 13:04                                                         ` Zhu, Lingshan
2023-09-12 13:36                                                           ` Parav Pandit
2023-09-12  4:10                         ` Jason Wang
2023-09-12  6:05                           ` Parav Pandit
2023-09-13  4:45                             ` Jason Wang
2023-09-13  6:39                               ` Parav Pandit
2023-09-14  3:08                                 ` Jason Wang
2023-09-17  5:22                                   ` Parav Pandit
2023-09-19  4:32                                     ` Jason Wang
2023-09-19  7:32                                       ` Parav Pandit
2023-09-13  8:27                               ` Michael S. Tsirkin
2023-09-14  3:11                                 ` Jason Wang
2023-09-12  4:18             ` Jason Wang
2023-09-12  6:11               ` Parav Pandit
2023-09-12  6:43                 ` Zhu, Lingshan
2023-09-12  6:52                   ` Parav Pandit
2023-09-12  7:36                     ` Zhu, Lingshan
2023-09-12  7:43                       ` Parav Pandit
2023-09-12 10:27                         ` Zhu, Lingshan
2023-09-12 10:33                           ` Parav Pandit
2023-09-12 10:35                             ` Zhu, Lingshan
2023-09-12 10:41                               ` Parav Pandit
2023-09-12 13:09                                 ` Zhu, Lingshan
2023-09-12 13:35                                   ` Parav Pandit
2023-09-13  4:13                                     ` Zhu, Lingshan
2023-09-13  4:19                                       ` Parav Pandit
2023-09-13  4:22                                         ` Zhu, Lingshan
2023-09-13  4:39                                           ` Parav Pandit
2023-09-14  8:24                                             ` Zhu, Lingshan
2023-09-13  4:56                                         ` Jason Wang
2023-09-13  4:43                 ` Jason Wang
2023-09-13  4:46                   ` Parav Pandit
2023-09-14  3:12                     ` Jason Wang
2023-09-17  5:29                       ` Parav Pandit
2023-09-19  4:25                         ` Jason Wang
2023-09-19  7:32                           ` Parav Pandit
2023-09-11  6:59           ` Parav Pandit
2023-09-11 10:15           ` Michael S. Tsirkin
2023-09-12  3:35             ` Jason Wang
2023-09-12  3:43               ` Zhu, Lingshan
2023-09-14 11:27   ` Michael S. Tsirkin
2023-09-15  4:13     ` Zhu, Lingshan
2023-09-06  8:29 ` [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Michael S. Tsirkin
2023-09-06  8:38   ` Zhu, Lingshan
2023-09-06 13:49     ` Michael S. Tsirkin
2023-09-07  1:51       ` Zhu, Lingshan
2023-09-07 10:57       ` Eugenio Perez Martin
2023-09-07 19:55         ` Michael S. Tsirkin
2023-09-14 11:14 ` [virtio-comment] " Michael S. Tsirkin
2023-09-14 11:37 ` Michael S. Tsirkin
2023-09-15  4:41   ` [virtio-comment] Re: [virtio-dev] " Zhu, Lingshan
