* [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
From: Zhu Lingshan @ 2024-08-01 11:35 UTC
To: mst, cohuck, jasowang
Cc: virtio-comment, Zhu Lingshan, Zhu Lingshan, Eugenio Pérez, David Stevens

This commit allows the driver to suspend the device by
introducing a new status bit SUSPEND in device_status.

This commit also introduce a new feature bit VIRTIO_F_SUSPEND
which indicating whether the device support SUSPEND.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Signed-off-by: David Stevens <stevensd@chromium.org>
---
 content.tex | 75 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 65 insertions(+), 10 deletions(-)

Changes from V6:
 - the device should hold its config interrupt while SUSPEND, and
   send config interrupt when the SUSPEND bit is cleared.
 - while SUSPEND, the driver MUST NOT access Device Configuration Space
 - minor changes.

Changes from V5:
 - the device should present NEEDS_RESET if failed to suspend
 - allow the driver access device status in the config space when
   suspended if it is implemented in config space.
 - language improvements

Changes from V4:
 - re-order the device status bits section
 - kick vqs --> notify vqs

Changes from V3:
 - allow the driver clearing the SUSPEND bit to resume the device.
 - disallow access to config space while suspended.

diff --git a/content.tex b/content.tex
index 0a62dce..2d1bee8 100644
--- a/content.tex
+++ b/content.tex
@@ -36,19 +36,22 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
   this bit. For example, under Linux, drivers can be loadable modules.
   \end{note}
 
-\item[FAILED (128)] Indicates that something went wrong in the guest,
-  and it has given up on the device. This could be an internal
-  error, or the driver didn't like the device for some reason, or
-  even a fatal error during device operation.
+\item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
+  drive the device.
 
 \item[FEATURES_OK (8)] Indicates that the driver has acknowledged all the
   features it understands, and feature negotiation is complete.
 
-\item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
-  drive the device.
+\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
+  device has been suspended by the driver.
 
 \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
   an error from which it can't recover.
+
+\item[FAILED (128)] Indicates that something went wrong in the guest,
+  and it has given up on the device. This could be an internal
+  error, or the driver didn't like the device for some reason, or
+  even a fatal error during device operation.
 \end{description}
 
 The \field{device status} field starts out as 0, and is reinitialized to 0 by
@@ -60,8 +63,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 initialization sequence specified in
 \ref{sec:General Initialization And Device Operation / Device Initialization}.
 
-The driver MUST NOT clear a
-\field{device status} bit. If the driver sets the FAILED bit,
+The driver MUST NOT clear a \field{device status} bit other than SUSPEND
+except when setting \field{device status} to 0 as a transport-specific way to
+initiate a reset. If the driver sets the FAILED bit,
 the driver MUST later reset the device before attempting to re-initialize.
 
 The driver SHOULD NOT rely on completion of operations of a
@@ -99,10 +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
 \begin{description}
 \item[0 to 23, and 50 to 127] Feature bits for the specific device type
 
-\item[24 to 41] Feature bits reserved for extensions to the queue and
+\item[24 to 42] Feature bits reserved for extensions to the queue and
   feature negotiation mechanisms
 
-\item[42 to 49, and 128 and above] Feature bits reserved for future extensions.
+\item[43 to 49, and 128 and above] Feature bits reserved for future extensions.
 \end{description}
 
 \begin{note}
@@ -629,6 +633,53 @@ \section{Device Cleanup}\label{sec:General Initialization And Device Operation /
 
 Thus a driver MUST ensure a virtqueue isn't live (by device reset) before removing exposed buffers.
 
+\section{Device Suspend}\label{sec:General Initialization And Device Operation / Device Suspend}
+
+When VIRTIO_F_SUSPEND is negotiated, the driver can set the
+SUSPEND bit in \field{device status} to suspend a device, and can
+clear the SUSPEND bit to resume a suspended device.
+
+\drivernormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend}
+
+The driver MUST NOT set SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated.
+
+Once the driver sets SUSPEND to \field{device status} of the device:
+\begin{itemize}
+\item The driver MUST re-read \field{device status} to verify whether the SUSPEND bit is set.
+\item The driver MUST NOT make any more buffers available to the device.
+\item The driver MUST NOT access any virtqueues or send notifications for any virtqueues.
+\item The driver MUST NOT access Device Configuration Space.
+\end{itemize}
+
+\devicenormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend}
+
+The device MUST ignore SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated.
+
+The device MUST ignore all access to its Configuration Space while
+suspended, except for \field{device status} if it is part of the Configuration Space.
+
+A device MUST NOT send any notifications for any virtqeuues,
+access any virtqueues, or modify any fields in
+its Configuration Space while suspended.
+
+If changes occur in the Configuration Space while the SUSPEND bit is set,
+the device MUST NOT send any configuration change notifications.
+Instead, the device MUST send the notification after the SUSPEND bit has been cleared.
+
+When the driver sets SUSPEND, the device MUST either suspend itself or set DEVICE_NEEDS_RESET if failed to suspend.
+
+If SUSPEND is set in \field{device status}, when the driver clears SUSPEND,
+the device MUST either resume normal operation or set DEVICE_NEEDS_RESET.
+
+When the driver sets SUSPEND,
+the device SHOULD perform the following actions before presenting that the SUSPEND bit is set to 1 in the \field{device status}:
+
+\begin{itemize}
+\item Stop processing more buffers of any virtqueues
+\item Wait until all buffers that are being processed have been used.
+\item Send used buffer notifications to the driver.
+\end{itemize}
+
 \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
 
 Virtio can use various different buses, thus the standard is split
@@ -872,6 +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
   handling features reserved for future use.
 
+  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
+  trigger suspending the device via the SUSPEND flag
+  See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
-- 
2.45.2
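The driver and device requirements proposed in the patch can be sketched as a small C model. This is only an illustration under stated assumptions, not code from the patch or from any existing driver: `struct sim_device`, `sim_write_status`, `sim_config_change`, `drv_suspend` and `drv_resume` are hypothetical names, the device is a plain in-memory struct rather than a real transport, and draining of in-flight buffers is only noted in a comment. The status bit values are the ones listed in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status bit values as listed in the patch. */
#define VIRTIO_STATUS_DRIVER_OK           4u
#define VIRTIO_STATUS_FEATURES_OK         8u
#define VIRTIO_STATUS_SUSPEND            16u
#define VIRTIO_STATUS_DEVICE_NEEDS_RESET 64u

/* Hypothetical in-memory device model, for illustration only. */
struct sim_device {
    uint8_t  status;               /* device status field */
    bool     suspend_negotiated;   /* VIRTIO_F_SUSPEND was negotiated */
    bool     fail_suspend;         /* test knob: make suspending fail */
    bool     config_changed;       /* config change held back while suspended */
    unsigned config_notifications; /* config change notifications sent */
};

/* Device side: handle a driver write to device status. */
static void sim_write_status(struct sim_device *d, uint8_t val)
{
    bool was  = (d->status & VIRTIO_STATUS_SUSPEND) != 0;
    bool want = (val & VIRTIO_STATUS_SUSPEND) != 0;

    if (val == 0) {                /* writing 0 resets the device */
        d->status = 0;
        d->suspend_negotiated = false;
        d->config_changed = false;
        return;
    }
    /* Ignore SUSPEND without FEATURES_OK or without VIRTIO_F_SUSPEND. */
    if (want && (!(val & VIRTIO_STATUS_FEATURES_OK) || !d->suspend_negotiated)) {
        val = (uint8_t)(val & ~VIRTIO_STATUS_SUSPEND);
        want = false;
    }
    if (want && !was && d->fail_suspend) {
        /* Failed to suspend: present DEVICE_NEEDS_RESET instead. */
        val = (uint8_t)((val & ~VIRTIO_STATUS_SUSPEND) |
                        VIRTIO_STATUS_DEVICE_NEEDS_RESET);
        want = false;
    }
    /* A real device would stop and drain its virtqueues before this point. */
    if (!want && was && d->config_changed) {
        /* Resuming: send the notification that was held back. */
        d->config_notifications++;
        d->config_changed = false;
    }
    /* The driver cannot clear DEVICE_NEEDS_RESET by a non-zero write. */
    val = (uint8_t)(val | (d->status & VIRTIO_STATUS_DEVICE_NEEDS_RESET));
    d->status = val;
}

/* Device side: something changed in the Configuration Space. */
static void sim_config_change(struct sim_device *d)
{
    if (d->status & VIRTIO_STATUS_SUSPEND)
        d->config_changed = true;  /* hold the notification */
    else
        d->config_notifications++;
}

/* Driver side: returns 0 once suspended, -1 if not allowed or failed. */
static int drv_suspend(struct sim_device *d)
{
    assert(d);
    if (!(d->status & VIRTIO_STATUS_FEATURES_OK) || !d->suspend_negotiated)
        return -1;                 /* driver MUST NOT set SUSPEND */
    sim_write_status(d, (uint8_t)(d->status | VIRTIO_STATUS_SUSPEND));
    uint8_t s = d->status;         /* re-read to verify */
    if (s & VIRTIO_STATUS_DEVICE_NEEDS_RESET)
        return -1;
    return (s & VIRTIO_STATUS_SUSPEND) ? 0 : -1;
}

/* Driver side: clear SUSPEND to resume; -1 if the device needs a reset. */
static int drv_resume(struct sim_device *d)
{
    assert(d);
    sim_write_status(d, (uint8_t)(d->status & ~VIRTIO_STATUS_SUSPEND));
    return (d->status & VIRTIO_STATUS_DEVICE_NEEDS_RESET) ? -1 : 0;
}
```

In this model a caller would initialise a `struct sim_device` with FEATURES_OK set and `suspend_negotiated = true`, call `drv_suspend()`, and leave virtqueues and the Configuration Space alone until `drv_resume()` returns. A driver on a real transport may have to poll the status read rather than read it once; the model completes the suspend synchronously only to keep the sketch short.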
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
From: Parav Pandit @ 2024-08-13 4:42 UTC
To: Zhu Lingshan, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com
Cc: virtio-comment@lists.linux.dev, Zhu Lingshan, Eugenio Pérez, David Stevens

Hi Lingshan, David,

> From: Zhu Lingshan <lingshan.zhu@amd.com>
> Sent: Thursday, August 1, 2024 5:05 PM
>
> [...]
> +\drivernormative{\subsection}{Device Suspend}{General Initialization
> +And Device Operation / Device Suspend}
> +
> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or
> VIRTIO_F_SUSPEND is not negotiated.
> +
> +Once the driver sets SUSPEND to \field{device status} of the device:
> +\begin{itemize}
> +\item The driver MUST re-read \field{device status} to verify whether the
> SUSPEND bit is set.
> +\item The driver MUST NOT make any more buffers available to the device.
> +\item The driver MUST NOT access any virtqueues or send notifications for
> any virtqueues.
> +\item The driver MUST NOT access Device Configuration Space.
> +\end{itemize}
> +

Do we agree that
a. suspending a device is an infrequent operation (on the order of N operations/sec, where N is roughly in the range of 10 or 100) per device?
b. a software-based device may not always want to force a VM_EXIT on reads and writes of the device_status register?

> [...]
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
From: Zhu Lingshan @ 2024-08-13 5:44 UTC
To: Parav Pandit, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com
Cc: virtio-comment@lists.linux.dev, Zhu Lingshan, Eugenio Pérez, David Stevens

On 8/13/2024 12:42 PM, Parav Pandit wrote:
> Hi Lingshan, David,
>
>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>> Sent: Thursday, August 1, 2024 5:05 PM
>>
>> [...]
>> +Once the driver sets SUSPEND to \field{device status} of the device:
>> +\begin{itemize}
>> +\item The driver MUST re-read \field{device status} to verify whether the
>> SUSPEND bit is set.
>> +\item The driver MUST NOT make any more buffers available to the device.
>> +\item The driver MUST NOT access any virtqueues or send notifications for
>> any virtqueues.
>> +\item The driver MUST NOT access Device Configuration Space.
>> +\end{itemize}
>> +

Hi Parav,

> Do we agree that
> a. suspending a device is an infrequent operation (on the order of N operations/sec, where N is roughly in the range of 10 or 100) per device?

Ideally it should not be frequent in normal operation, but remember that we cannot
restrict the behavior of the driver, so we must be able to handle the scenario
in which suspending is frequent.

> b. a software-based device may not always want to force a VM_EXIT on reads and writes of the device_status register?

Trap and emulate is the basis of virtualization, and how to pass through a device
is out of scope for this spec.

Thanks
Zhu Lingshan

>> [...]
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
From: Parav Pandit @ 2024-08-13 5:50 UTC
To: Zhu Lingshan, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com
Cc: virtio-comment@lists.linux.dev, Zhu Lingshan, Eugenio Pérez, David Stevens

> From: Zhu Lingshan <lingshan.zhu@amd.com>
> Sent: Tuesday, August 13, 2024 11:15 AM
>
> On 8/13/2024 12:42 PM, Parav Pandit wrote:
> > Hi Lingshan, David,
> >
> >> From: Zhu Lingshan <lingshan.zhu@amd.com>
> >> Sent: Thursday, August 1, 2024 5:05 PM
> >>
> >> [...]
> >> +Once the driver sets SUSPEND to \field{device status} of the device:
> >> +\begin{itemize}
> >> +\item The driver MUST re-read \field{device status} to verify
> >> +whether the SUSPEND bit is set.
> >> +\item The driver MUST NOT make any more buffers available to the device.
> >> +\item The driver MUST NOT access any virtqueues or send
> >> +notifications for any virtqueues.
> >> +\item The driver MUST NOT access Device Configuration Space.
> >> +\end{itemize}
> >> +
> Hi Parav,
> > Do we agree that
> > a. suspending a device is an infrequent operation (on the order of N
> > operations/sec, where N is roughly in the range of 10 or 100) per device?
> Ideally it should not be frequent in normal operation, but remember that we
> cannot restrict the behavior of the driver, so we must be able to handle the
> scenario in which suspending is frequent.

Sure. The intent is a low rate, but a driver can still do it at unexpected times.
Do you agree?

> > b. a software-based device may not always want to force a VM_EXIT on reads
> > and writes of the device_status register?
> Trap and emulate is the basis of virtualization, and how to pass through a
> device is out of scope for this spec.
>

Sure, I didn't suggest putting such things in the spec.
My question is: whether to trap and emulate or not is a choice of the software.
Do you agree?

> Thanks
> Zhu Lingshan
>
> >> [...]
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-13 5:50 ` Parav Pandit @ 2024-08-13 6:14 ` Zhu Lingshan 2024-08-13 6:55 ` Parav Pandit 0 siblings, 1 reply; 69+ messages in thread From: Zhu Lingshan @ 2024-08-13 6:14 UTC (permalink / raw) To: Parav Pandit, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com Cc: virtio-comment@lists.linux.dev, Zhu Lingshan, Eugenio Pérez, David Stevens On 8/13/2024 1:50 PM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> >> Sent: Tuesday, August 13, 2024 11:15 AM >> >> On 8/13/2024 12:42 PM, Parav Pandit wrote: >>> Hi Lingshan, David, >>> >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Sent: Thursday, August 1, 2024 5:05 PM >>>> >>>> This commit allows the driver to suspend the device by introducing a >>>> new status bit SUSPEND in device_status. >>>> >>>> This commit also introduce a new feature bit VIRTIO_F_SUSPEND which >>>> indicating whether the device support SUSPEND. >>>> >>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> >>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> >>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Signed-off-by: David Stevens <stevensd@chromium.org> >>>> --- >>>> content.tex | 75 ++++++++++++++++++++++++++++++++++++++++++++++- >> ---- >>>> -- >>>> 1 file changed, 65 insertions(+), 10 deletions(-) >>>> >>>> Changes from V6: >>>> - the device should hold its config interrupt while SUSPEND, and >>>> send config interrupt when the SUSPEND bit is cleared. >>>> - while SUSPEND, the driver MUST NOT access Device Configuration >>>> Space >>>> - minor changes. >>>> >>>> Changes from V5: >>>> - the device should present NEEDS_RESET if failed to suspend >>>> - allow the driver access device status in the config space when >>>> suspended if it is implemented in config space. 
>>>> - language improvements >>>> >>>> Changes from V4: >>>> - re-order the device status bits section >>>> - kick vqs --> notify vqs >>>> >>>> Changes from V3: >>>> - allow the driver clearing the SUSPEND bit to resume the device. >>>> - disallow access to config space while suspended. >>>> >>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 100644 >>>> --- a/content.tex >>>> +++ b/content.tex >>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} >>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>> this bit. For example, under Linux, drivers can be loadable modules. >>>> \end{note} >>>> >>>> -\item[FAILED (128)] Indicates that something went wrong in the >>>> guest, >>>> - and it has given up on the device. This could be an internal >>>> - error, or the driver didn't like the device for some reason, or >>>> - even a fatal error during device operation. >>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and ready >>>> +to >>>> + drive the device. >>>> >>>> \item[FEATURES_OK (8)] Indicates that the driver has acknowledged all >> the >>>> features it understands, and feature negotiation is complete. >>>> >>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and ready >>>> to >>>> - drive the device. >>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates >>>> +that the >>>> + device has been suspended by the driver. >>>> >>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has >> experienced >>>> an error from which it can't recover. >>>> + >>>> +\item[FAILED (128)] Indicates that something went wrong in the >>>> +guest, >>>> + and it has given up on the device. This could be an internal >>>> + error, or the driver didn't like the device for some reason, or >>>> + even a fatal error during device operation. 
>>>> \end{description} >>>> >>>> The \field{device status} field starts out as 0, and is >>>> reinitialized to 0 by @@ - >>>> 60,8 +63,9 @@ \section{\field{Device Status} Field}\label{sec:Basic >>>> Facilities of a Virtio Dev initialization sequence specified in >>>> \ref{sec:General Initialization And Device Operation / Device >> Initialization}. >>>> -The driver MUST NOT clear a >>>> -\field{device status} bit. If the driver sets the FAILED bit, >>>> +The driver MUST NOT clear a \field{device status} bit other than >>>> +SUSPEND except when setting \field{device status} to 0 as a >>>> +transport-specific way to initiate a reset. If the driver sets the >>>> +FAILED bit, >>>> the driver MUST later reset the device before attempting to re-initialize. >>>> >>>> The driver SHOULD NOT rely on completion of operations of a @@ >>>> -99,10 >>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a >>>> +Virtio Device >>>> / Feature B \begin{description} >>>> \item[0 to 23, and 50 to 127] Feature bits for the specific device >>>> type >>>> >>>> -\item[24 to 41] Feature bits reserved for extensions to the queue >>>> and >>>> +\item[24 to 42] Feature bits reserved for extensions to the queue >>>> +and >>>> feature negotiation mechanisms >>>> >>>> -\item[42 to 49, and 128 and above] Feature bits reserved for future >>>> extensions. >>>> +\item[43 to 49, and 128 and above] Feature bits reserved for future >>>> extensions. >>>> \end{description} >>>> >>>> \begin{note} >>>> @@ -629,6 +633,53 @@ \section{Device Cleanup}\label{sec:General >>>> Initialization And Device Operation / >>>> >>>> Thus a driver MUST ensure a virtqueue isn't live (by device reset) >>>> before removing exposed buffers. 
>>>>
>>>> +\section{Device Suspend}\label{sec:General Initialization And Device
>>>> +Operation / Device Suspend}
>>>> +
>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the SUSPEND
>>>> +bit in \field{device status} to suspend a device, and can clear the
>>>> +SUSPEND bit to resume a suspended device.
>>>> +
>>>> +\drivernormative{\subsection}{Device Suspend}{General Initialization
>>>> +And Device Operation / Device Suspend}
>>>> +
>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or
>>>> VIRTIO_F_SUSPEND is not negotiated.
>>>> +
>>>> +Once the driver sets SUSPEND to \field{device status} of the device:
>>>> +\begin{itemize}
>>>> +\item The driver MUST re-read \field{device status} to verify
>>>> +whether the
>>>> SUSPEND bit is set.
>>>> +\item The driver MUST NOT make any more buffers available to the
>> device.
>>>> +\item The driver MUST NOT access any virtqueues or send
>>>> +notifications for
>>>> any virtqueues.
>>>> +\item The driver MUST NOT access Device Configuration Space.
>>>> +\end{itemize}
>>>> +
>> Hi Parva
>>> Do we agree that
>>> a. suspending a device is non frequent operation (in order of N
>> operations/sec, where N is roughly in range of 10 or 100) per device?
>> Ideally it should not be often in normal operations, but remember we can
>> not restrict the behaviors of the driver, so we must be able to handle the
>> scenario in which SUSPENDING is often.
> Sure. the intent is slow rate, but one can do at unexpected times.
> Do you agree?
I don't think the spec states any intention about the frequency.
The spec only provides generic mechanisms and interfaces.
We should not assume that suspending will be (or that the driver wants it to be) frequent or infrequent; that depends on the driver.
>
>>> b. A software-based device may not always want to force VM_EXIT on read
>> and write on the device_status register?
>> Trap and Emulation is the basic of virtualization, and how to pass-through a
>> device is out of this spec.
>>
> Sure, I didn’t suggest to put such things in the spec.
> My question is, whether to trap and emulate or not is a choice of the software.
> Do you agree?
The device emulator does not know whether an access was trapped or not.
Trapping and emulation is a hypervisor matter.

If "software" here refers to the device emulator, then yes, it is not the
emulator's decision. And the device should not be aware of VM_EXIT and VM_ENTRY.

This is out of scope for the spec anyway.

Thanks
Zhu Lingshan
>
>
>> Thanks
>> Zhu Lingshan
>>>
>>>> +\devicenormative{\subsection}{Device Suspend}{General Initialization
>>>> +And Device Operation / Device Suspend}
>>>> +
>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or
>>>> VIRTIO_F_SUSPEND is not negotiated.
>>>> +
>>>> +The device MUST ignore all access to its Configuration Space while
>>>> +suspended, except for \field{device status} if it is part of the
>>>> +Configuration
>>>> Space.
>>>> +
>>>> +A device MUST NOT send any notifications for any virtqeuues, access
>>>> +any virtqueues, or modify any fields in its Configuration Space
>>>> +while suspended.
>>>> +
>>>> +If changes occur in the Configuration Space while the SUSPEND bit is
>>>> +set, the device MUST NOT send any configuration change notifications.
>>>> +Instead, the device MUST send the notification after the SUSPEND bit
>>>> +has
>>>> been cleared.
>>>> +
>>>> +When the driver sets SUSPEND, the device MUST either suspend itself
>>>> +or set
>>>> DEVICE_NEEDS_RESET if failed to suspend.
>>>> +
>>>> +If SUSPEND is set in \field{device status}, when the driver clears
>>>> +SUSPEND, the device MUST either resume normal operation or set
>>>> DEVICE_NEEDS_RESET.
>>>> + >>>> +When the driver sets SUSPEND, >>>> +the device SHOULD perform the following actions before presenting >>>> +that >>>> the SUSPEND bit is set to 1 in the \field{device status}: >>>> + >>>> +\begin{itemize} >>>> +\item Stop processing more buffers of any virtqueues \item Wait >>>> +until all buffers that are being processed have been used. >>>> +\item Send used buffer notifications to the driver. >>>> +\end{itemize} >>>> + >>>> \chapter{Virtio Transport Options}\label{sec:Virtio Transport >>>> Options} >>>> >>>> Virtio can use various different buses, thus the standard is split >>>> @@ -872,6 >>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature >>>> Bits} >>>> \ref{devicenormative:Basic Facilities of a Virtio Device / Feature >>>> Bits} for >>>> handling features reserved for future use. >>>> >>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can >>>> + trigger suspending the device via the SUSPEND flag >>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}. >>>> + >>>> \end{description} >>>> >>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature >>>> Bits} >>>> -- >>>> 2.45.2 >>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
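For readers following the thread, the driver-side sequence in the quoted patch (set SUSPEND, then re-read \field{device status} until the bit reads back as set or DEVICE_NEEDS_RESET appears) can be sketched roughly as below. This is only an illustration under stated assumptions: `SimDevice`, `driver_suspend`, the poll counter, and the string return values are hypothetical and come from neither the patch nor any existing driver; only the status bit values are taken from the patch.

```python
from dataclasses import dataclass

# Status bit values as listed in the quoted patch.
DRIVER_OK = 4
FEATURES_OK = 8
SUSPEND = 16
DEVICE_NEEDS_RESET = 64


@dataclass
class SimDevice:
    """Hypothetical stand-in for a device; not part of the patch."""
    status: int = 0
    suspend_negotiated: bool = True  # VIRTIO_F_SUSPEND was negotiated
    polls_needed: int = 3            # status reads before the suspend completes
    fail_suspend: bool = False       # simulate a device that cannot suspend
    _pending: bool = False

    def write_status(self, value: int) -> None:
        if (value & SUSPEND) and not (self.status & SUSPEND):
            # The device ignores SUSPEND unless FEATURES_OK is set and
            # the feature was negotiated.
            self._pending = self.suspend_negotiated and bool(self.status & FEATURES_OK)
            # The bit is presented only once the suspend has completed.
            value &= ~SUSPEND
        elif not (value & SUSPEND):
            self._pending = False    # driver cleared SUSPEND: resume
        self.status = value

    def read_status(self) -> int:
        if self._pending:
            if self.fail_suspend:
                self.status |= DEVICE_NEEDS_RESET
                self._pending = False
            elif self.polls_needed > 0:
                self.polls_needed -= 1   # still draining in-flight buffers
            else:
                self.status |= SUSPEND
                self._pending = False
        return self.status


def driver_suspend(dev: SimDevice, max_polls: int = 100) -> str:
    """Set SUSPEND, then re-read device status until it is confirmed."""
    status = dev.read_status()
    if not dev.suspend_negotiated or not (status & FEATURES_OK):
        return "not-allowed"         # driver MUST NOT set SUSPEND here
    dev.write_status(status | SUSPEND)
    for _ in range(max_polls):
        status = dev.read_status()
        if status & DEVICE_NEEDS_RESET:
            return "needs-reset"
        if status & SUSPEND:
            return "suspended"
    return "timeout"


if __name__ == "__main__":
    dev = SimDevice()
    dev.write_status(FEATURES_OK | DRIVER_OK)
    print(driver_suspend(dev))
```

The bounded poll count is a choice made for this sketch; the patch itself does not specify a timeout.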
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-13 6:14 ` Zhu Lingshan @ 2024-08-13 6:55 ` Parav Pandit 2024-08-15 8:23 ` Zhu Lingshan 2024-08-15 10:52 ` Michael S. Tsirkin 0 siblings, 2 replies; 69+ messages in thread From: Parav Pandit @ 2024-08-13 6:55 UTC (permalink / raw) To: Zhu Lingshan, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com Cc: virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Tuesday, August 13, 2024 11:44 AM > I removed the unreachable intel email id as every single email is bouncing from it. Please consider dropping that email from v8 as it will cause all reviewers email to bounce. > On 8/13/2024 1:50 PM, Parav Pandit wrote: > > > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Tuesday, August 13, 2024 11:15 AM > >> > >> On 8/13/2024 12:42 PM, Parav Pandit wrote: > >>> Hi Lingshan, David, > >>> > >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>> Sent: Thursday, August 1, 2024 5:05 PM > >>>> > >>>> This commit allows the driver to suspend the device by introducing > >>>> a new status bit SUSPEND in device_status. > >>>> > >>>> This commit also introduce a new feature bit VIRTIO_F_SUSPEND which > >>>> indicating whether the device support SUSPEND. > >>>> > >>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> > >>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> > >>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> > >>>> Signed-off-by: David Stevens <stevensd@chromium.org> > >>>> --- > >>>> content.tex | 75 > ++++++++++++++++++++++++++++++++++++++++++++++- > >> ---- > >>>> -- > >>>> 1 file changed, 65 insertions(+), 10 deletions(-) > >>>> > >>>> Changes from V6: > >>>> - the device should hold its config interrupt while SUSPEND, and > >>>> send config interrupt when the SUSPEND bit is cleared. > >>>> - while SUSPEND, the driver MUST NOT access Device Configuration > >>>> Space > >>>> - minor changes. 
> >>>> > >>>> Changes from V5: > >>>> - the device should present NEEDS_RESET if failed to suspend > >>>> - allow the driver access device status in the config space when > >>>> suspended if it is implemented in config space. > >>>> - language improvements > >>>> > >>>> Changes from V4: > >>>> - re-order the device status bits section > >>>> - kick vqs --> notify vqs > >>>> > >>>> Changes from V3: > >>>> - allow the driver clearing the SUSPEND bit to resume the device. > >>>> - disallow access to config space while suspended. > >>>> > >>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 > >>>> 100644 > >>>> --- a/content.tex > >>>> +++ b/content.tex > >>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} > >>>> Field}\label{sec:Basic Facilities of a Virtio Dev > >>>> this bit. For example, under Linux, drivers can be loadable modules. > >>>> \end{note} > >>>> > >>>> -\item[FAILED (128)] Indicates that something went wrong in the > >>>> guest, > >>>> - and it has given up on the device. This could be an internal > >>>> - error, or the driver didn't like the device for some reason, or > >>>> - even a fatal error during device operation. > >>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and ready > >>>> +to > >>>> + drive the device. > >>>> > >>>> \item[FEATURES_OK (8)] Indicates that the driver has acknowledged > >>>> all > >> the > >>>> features it understands, and feature negotiation is complete. > >>>> > >>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and ready > >>>> to > >>>> - drive the device. > >>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, > indicates > >>>> +that the > >>>> + device has been suspended by the driver. > >>>> > >>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has > >> experienced > >>>> an error from which it can't recover. > >>>> + > >>>> +\item[FAILED (128)] Indicates that something went wrong in the > >>>> +guest, > >>>> + and it has given up on the device. 
This could be an internal > >>>> + error, or the driver didn't like the device for some reason, or > >>>> + even a fatal error during device operation. > >>>> \end{description} > >>>> > >>>> The \field{device status} field starts out as 0, and is > >>>> reinitialized to 0 by @@ - > >>>> 60,8 +63,9 @@ \section{\field{Device Status} Field}\label{sec:Basic > >>>> Facilities of a Virtio Dev initialization sequence specified in > >>>> \ref{sec:General Initialization And Device Operation / Device > >> Initialization}. > >>>> -The driver MUST NOT clear a > >>>> -\field{device status} bit. If the driver sets the FAILED bit, > >>>> +The driver MUST NOT clear a \field{device status} bit other than > >>>> +SUSPEND except when setting \field{device status} to 0 as a > >>>> +transport-specific way to initiate a reset. If the driver sets the > >>>> +FAILED bit, > >>>> the driver MUST later reset the device before attempting to re- > initialize. > >>>> > >>>> The driver SHOULD NOT rely on completion of operations of a @@ > >>>> -99,10 > >>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a > >>>> +Virtio Device > >>>> / Feature B \begin{description} > >>>> \item[0 to 23, and 50 to 127] Feature bits for the specific device > >>>> type > >>>> > >>>> -\item[24 to 41] Feature bits reserved for extensions to the queue > >>>> and > >>>> +\item[24 to 42] Feature bits reserved for extensions to the queue > >>>> +and > >>>> feature negotiation mechanisms > >>>> > >>>> -\item[42 to 49, and 128 and above] Feature bits reserved for > >>>> future extensions. > >>>> +\item[43 to 49, and 128 and above] Feature bits reserved for > >>>> +future > >>>> extensions. > >>>> \end{description} > >>>> > >>>> \begin{note} > >>>> @@ -629,6 +633,53 @@ \section{Device Cleanup}\label{sec:General > >>>> Initialization And Device Operation / > >>>> > >>>> Thus a driver MUST ensure a virtqueue isn't live (by device reset) > >>>> before removing exposed buffers. 
> >>>>
> >>>> +\section{Device Suspend}\label{sec:General Initialization And
> >>>> +Device Operation / Device Suspend}
> >>>> +
> >>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the
> >>>> +SUSPEND bit in \field{device status} to suspend a device, and can
> >>>> +clear the SUSPEND bit to resume a suspended device.
> >>>> +
> >>>> +\drivernormative{\subsection}{Device Suspend}{General
> >>>> +Initialization And Device Operation / Device Suspend}
> >>>> +
> >>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or
> >>>> VIRTIO_F_SUSPEND is not negotiated.
> >>>> +
> >>>> +Once the driver sets SUSPEND to \field{device status} of the device:
> >>>> +\begin{itemize}
> >>>> +\item The driver MUST re-read \field{device status} to verify
> >>>> +whether the
> >>>> SUSPEND bit is set.
> >>>> +\item The driver MUST NOT make any more buffers available to the
> >> device.
> >>>> +\item The driver MUST NOT access any virtqueues or send
> >>>> +notifications for
> >>>> any virtqueues.
> >>>> +\item The driver MUST NOT access Device Configuration Space.
> >>>> +\end{itemize}
> >>>> +
> >> Hi Parva
> >>> Do we agree that
> >>> a. suspending a device is non frequent operation (in order of N
> >> operations/sec, where N is roughly in range of 10 or 100) per device?
> >> Ideally it should not be often in normal operations, but remember we
> >> can not restrict the behaviors of the driver, so we must be able to
> >> handle the scenario in which SUSPENDING is often.
> > Sure. the intent is slow rate, but one can do at unexpected times.
> > Do you agree?
> I think we don't have an intention of the frequency in the spec.
Sure.

> The spec only provides generic mechanisms and interfaces.
Sure.

> Don't assume it(or driver wants it to be) would be often or not, that depends
> on the driver.
As you rightly said, it cannot be assumed.
The driver will read the device status right after writing it, typically within less than 50 nsec.
Suspending a net device means storing hundreds of queues, the RSS table, and flow filters, which takes plenty of time (at least more than 50 nsec :) ).
Similarly, a GPU that has to store some MBs of memory needs more than 50 nsec, for example when a software-based GPU device stores it in a file.
So a device cannot respond with suspend=true within the next 50 nsec.

More below.
>
> >
> >>> b. A software-based device may not always want to force VM_EXIT on
> >>> read
> >> and write on the device_status register?
> >> Trap and Emulation is the basic of virtualization, and how to
> >> pass-through a device is out of this spec.
> >>
> > Sure, I didn’t suggest to put such things in the spec.
> > My question is, whether to trap and emulate or not is a choice of the
> software.
> > Do you agree?
> The device emulator does not know anything about whether trapped or not.
> Trapping and Emulation is a hypervisor thing.
>
> If here "software" refers to the device emulator, then Yes, it is not the
> emulator's decision. And the device should not be aware of VM_EXIT &
> VM_ENTRY.
>
Right. So software may want to implement device_status as pure MMIO writes (with no VM_EXIT)
and prefer to return SUSPEND=true at a slow pace.
This means the device implementation cannot immediately return suspend=true right after it was written.
A plain MMIO read, however, will read the written value back as suspend=true.

An alternative would be to forward CPU loads and CPU stores to different addresses.
However, this does not work for hw based devices.

That means PCI HW needs to return suspend=0 until the device has actually suspended.
In this example, the device cannot build special circuitry to answer suspend=true within 50 nsec; in other words, building special circuitry to return suspend=false is too complex for such a slow operation.

If this understanding of the burden is clear, the proposal is: can you please extend the interface such that

1. The driver writes the suspend command.
2. The driver reads suspend_status and receives false (not completed). This is the default value.
3. When the device completes the suspend, it changes suspend_status to true.

This has two main benefits:
[A] It enables software-based devices to write data to slow files without having to force VM_EXITs.

[B] It also means hw based devices do not have to build special circuitry to answer within 50 nsec, which can get very complicated for tens or hundreds of PCI PFs.

> this is out of the spec anyway.
>
> Thanks
> Zhu Lingshan
> >
> >
> >> Thanks
> >> Zhu Lingshan
> >>>
> >>>> +\devicenormative{\subsection}{Device Suspend}{General
> >>>> +Initialization And Device Operation / Device Suspend}
> >>>> +
> >>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or
> >>>> VIRTIO_F_SUSPEND is not negotiated.
> >>>> +
> >>>> +The device MUST ignore all access to its Configuration Space while
> >>>> +suspended, except for \field{device status} if it is part of the
> >>>> +Configuration
> >>>> Space.
> >>>> +
> >>>> +A device MUST NOT send any notifications for any virtqeuues,
> >>>> +access any virtqueues, or modify any fields in its Configuration
> >>>> +Space while suspended.
> >>>> +
> >>>> +If changes occur in the Configuration Space while the SUSPEND bit
> >>>> +is set, the device MUST NOT send any configuration change
> notifications.
> >>>> +Instead, the device MUST send the notification after the SUSPEND
> >>>> +bit has
> >>>> been cleared.
> >>>> +
> >>>> +When the driver sets SUSPEND, the device MUST either suspend
> >>>> +itself or set
> >>>> DEVICE_NEEDS_RESET if failed to suspend.
> >>>> +
> >>>> +If SUSPEND is set in \field{device status}, when the driver clears
> >>>> +SUSPEND, the device MUST either resume normal operation or set
> >>>> DEVICE_NEEDS_RESET.
> >>>> + > >>>> +When the driver sets SUSPEND, > >>>> +the device SHOULD perform the following actions before presenting > >>>> +that > >>>> the SUSPEND bit is set to 1 in the \field{device status}: > >>>> + > >>>> +\begin{itemize} > >>>> +\item Stop processing more buffers of any virtqueues \item Wait > >>>> +until all buffers that are being processed have been used. > >>>> +\item Send used buffer notifications to the driver. > >>>> +\end{itemize} > >>>> + > >>>> \chapter{Virtio Transport Options}\label{sec:Virtio Transport > >>>> Options} > >>>> > >>>> Virtio can use various different buses, thus the standard is split > >>>> @@ -872,6 > >>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved > >>>> +Feature > >>>> Bits} > >>>> \ref{devicenormative:Basic Facilities of a Virtio Device / > >>>> Feature Bits} for > >>>> handling features reserved for future use. > >>>> > >>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver > can > >>>> + trigger suspending the device via the SUSPEND flag > >>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}. > >>>> + > >>>> \end{description} > >>>> > >>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature > >>>> Bits} > >>>> -- > >>>> 2.45.2 > >>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
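The alternative interface proposed in the message above (a write-only suspend command plus a separate suspend_status that defaults to false and flips to true on completion) might look roughly like the following from the driver's side. This is a sketch under assumptions, not something defined by the patch or the spec: the class, the method names, the background thread standing in for slow state saving, and the timeout values are all hypothetical; only the names "suspend command" and `suspend_status` come from the proposal.

```python
import threading
import time


class SplitSuspendDevice:
    """Hypothetical device with a separate, read-only suspend_status field."""

    def __init__(self, save_state_seconds: float = 0.02) -> None:
        self.suspend_status = False              # step 2: default value
        self._save_state_seconds = save_state_seconds

    def write_suspend_command(self) -> None:
        # Step 1: the write only kicks off the slow suspend and returns at once.
        threading.Thread(target=self._save_state, daemon=True).start()

    def _save_state(self) -> None:
        # Stands in for storing queues, RSS tables, or GPU memory to a file.
        time.sleep(self._save_state_seconds)
        self.suspend_status = True               # step 3: completion is visible


def wait_for_suspend(dev: SplitSuspendDevice,
                     timeout: float = 1.0,
                     interval: float = 0.001) -> bool:
    """Poll suspend_status until it turns true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if dev.suspend_status:
            return True
        time.sleep(interval)
    return dev.suspend_status


if __name__ == "__main__":
    dev = SplitSuspendDevice()
    dev.write_suspend_command()
    print("suspended" if wait_for_suspend(dev) else "timed out")
```

From the driver's point of view the polling loop is the same shape as re-reading \field{device status}; the difference in this sketch is only that the command write and the completion indication use separate fields.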
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-13 6:55 ` Parav Pandit @ 2024-08-15 8:23 ` Zhu Lingshan 2024-08-15 9:34 ` Parav Pandit 2024-08-15 10:45 ` Michael S. Tsirkin 2024-08-15 10:52 ` Michael S. Tsirkin 1 sibling, 2 replies; 69+ messages in thread From: Zhu Lingshan @ 2024-08-15 8:23 UTC (permalink / raw) To: Parav Pandit, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com Cc: virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 8/13/2024 2:55 PM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> >> Sent: Tuesday, August 13, 2024 11:44 AM >> > I removed the unreachable intel email id as every single email is bouncing from it. > Please consider dropping that email from v8 as it will cause all reviewers email to bounce. > >> On 8/13/2024 1:50 PM, Parav Pandit wrote: >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Sent: Tuesday, August 13, 2024 11:15 AM >>>> >>>> On 8/13/2024 12:42 PM, Parav Pandit wrote: >>>>> Hi Lingshan, David, >>>>> >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>> Sent: Thursday, August 1, 2024 5:05 PM >>>>>> >>>>>> This commit allows the driver to suspend the device by introducing >>>>>> a new status bit SUSPEND in device_status. >>>>>> >>>>>> This commit also introduce a new feature bit VIRTIO_F_SUSPEND which >>>>>> indicating whether the device support SUSPEND. >>>>>> >>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> >>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> >>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>> Signed-off-by: David Stevens <stevensd@chromium.org> >>>>>> --- >>>>>> content.tex | 75 >> ++++++++++++++++++++++++++++++++++++++++++++++- >>>> ---- >>>>>> -- >>>>>> 1 file changed, 65 insertions(+), 10 deletions(-) >>>>>> >>>>>> Changes from V6: >>>>>> - the device should hold its config interrupt while SUSPEND, and >>>>>> send config interrupt when the SUSPEND bit is cleared. 
>>>>>> - while SUSPEND, the driver MUST NOT access Device Configuration >>>>>> Space >>>>>> - minor changes. >>>>>> >>>>>> Changes from V5: >>>>>> - the device should present NEEDS_RESET if failed to suspend >>>>>> - allow the driver access device status in the config space when >>>>>> suspended if it is implemented in config space. >>>>>> - language improvements >>>>>> >>>>>> Changes from V4: >>>>>> - re-order the device status bits section >>>>>> - kick vqs --> notify vqs >>>>>> >>>>>> Changes from V3: >>>>>> - allow the driver clearing the SUSPEND bit to resume the device. >>>>>> - disallow access to config space while suspended. >>>>>> >>>>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 >>>>>> 100644 >>>>>> --- a/content.tex >>>>>> +++ b/content.tex >>>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} >>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>>>> this bit. For example, under Linux, drivers can be loadable modules. >>>>>> \end{note} >>>>>> >>>>>> -\item[FAILED (128)] Indicates that something went wrong in the >>>>>> guest, >>>>>> - and it has given up on the device. This could be an internal >>>>>> - error, or the driver didn't like the device for some reason, or >>>>>> - even a fatal error during device operation. >>>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and ready >>>>>> +to >>>>>> + drive the device. >>>>>> >>>>>> \item[FEATURES_OK (8)] Indicates that the driver has acknowledged >>>>>> all >>>> the >>>>>> features it understands, and feature negotiation is complete. >>>>>> >>>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and ready >>>>>> to >>>>>> - drive the device. >>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, >> indicates >>>>>> +that the >>>>>> + device has been suspended by the driver. >>>>>> >>>>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has >>>> experienced >>>>>> an error from which it can't recover. 
>>>>>> + >>>>>> +\item[FAILED (128)] Indicates that something went wrong in the >>>>>> +guest, >>>>>> + and it has given up on the device. This could be an internal >>>>>> + error, or the driver didn't like the device for some reason, or >>>>>> + even a fatal error during device operation. >>>>>> \end{description} >>>>>> >>>>>> The \field{device status} field starts out as 0, and is >>>>>> reinitialized to 0 by @@ - >>>>>> 60,8 +63,9 @@ \section{\field{Device Status} Field}\label{sec:Basic >>>>>> Facilities of a Virtio Dev initialization sequence specified in >>>>>> \ref{sec:General Initialization And Device Operation / Device >>>> Initialization}. >>>>>> -The driver MUST NOT clear a >>>>>> -\field{device status} bit. If the driver sets the FAILED bit, >>>>>> +The driver MUST NOT clear a \field{device status} bit other than >>>>>> +SUSPEND except when setting \field{device status} to 0 as a >>>>>> +transport-specific way to initiate a reset. If the driver sets the >>>>>> +FAILED bit, >>>>>> the driver MUST later reset the device before attempting to re- >> initialize. >>>>>> The driver SHOULD NOT rely on completion of operations of a @@ >>>>>> -99,10 >>>>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a >>>>>> +Virtio Device >>>>>> / Feature B \begin{description} >>>>>> \item[0 to 23, and 50 to 127] Feature bits for the specific device >>>>>> type >>>>>> >>>>>> -\item[24 to 41] Feature bits reserved for extensions to the queue >>>>>> and >>>>>> +\item[24 to 42] Feature bits reserved for extensions to the queue >>>>>> +and >>>>>> feature negotiation mechanisms >>>>>> >>>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for >>>>>> future extensions. >>>>>> +\item[43 to 49, and 128 and above] Feature bits reserved for >>>>>> +future >>>>>> extensions. 
>>>>>> \end{description} >>>>>> >>>>>> \begin{note} >>>>>> @@ -629,6 +633,53 @@ \section{Device Cleanup}\label{sec:General >>>>>> Initialization And Device Operation / >>>>>> >>>>>> Thus a driver MUST ensure a virtqueue isn't live (by device reset) >>>>>> before removing exposed buffers. >>>>>> >>>>>> +\section{Device Suspend}\label{sec:General Initialization And >>>>>> +Device Operation / Device Suspend} >>>>>> + >>>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the >>>>>> +SUSPEND bit in \field{device status} to suspend a device, and can >>>>>> +clear the SUSPEND bit to resume a suspended device. >>>>>> + >>>>>> +\drivernormative{\subsection}{Device Suspend}{General >>>>>> +Initialization And Device Operation / Device Suspend} >>>>>> + >>>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or >>>>>> VIRTIO_F_SUSPEND is not negotiated. >>>>>> + >>>>>> +Once the driver sets SUSPEND to \field{device status} of the device: >>>>>> +\begin{itemize} >>>>>> +\item The driver MUST re-read \field{device status} to verify >>>>>> +whether the >>>>>> SUSPEND bit is set. >>>>>> +\item The driver MUST NOT make any more buffers available to the >>>> device. >>>>>> +\item The driver MUST NOT access any virtqueues or send >>>>>> +notifications for >>>>>> any virtqueues. >>>>>> +\item The driver MUST NOT access Device Configuration Space. >>>>>> +\end{itemize} >>>>>> + >>>> Hi Parva >>>>> Do we agree that >>>>> a. suspending a device is non frequent operation (in order of N >>>> operations/sec, where N is roughly in range of 10 or 100) per device? >>>> Ideally it should not be often in normal operations, but remember we >>>> can not restrict the behaviors of the driver, so we must be able to >>>> handle the scenario in which SUSPENDING is often. >>> Sure. the intent is slow rate, but one can do at unexpected times. >>> Do you agree? >> I think we don't have an intention of the frequency in the spec. > Sure. 
>
>> The spec only provides generic mechanisms and interfaces.
> Sure.
>
>> Don't assume it(or driver wants it to be) would be often or not, that depends
>> on the driver.
> As you rightly said : it cannot be assumed.
> The driver will read the device status right after it wrote it. This typically is < 50nsec of time.
> The suspend operation for a net device to store hundreds of queues, RSS table, flow filters, takes plenty of time (at least more than 50nsec :) ).
> Similarly for the GPU to store some MBs of memory takes more than 50nsec of time, for example to store in a file for a software-based GPU device.
> So a device cannot respond back suspend=true in next 50nsec time.
It's OK for the device to take longer to respond; the driver simply re-reads device status.
>
> More below.
>
>>>>> b. A software-based device may not always want to force VM_EXIT on
>>>>> read
>>>> and write on the device_status register?
>>>> Trap and Emulation is the basic of virtualization, and how to
>>>> pass-through a device is out of this spec.
>>>>
>>> Sure, I didn’t suggest to put such things in the spec.
>>> My question is, whether to trap and emulate or not is a choice of the
>> software.
>>> Do you agree?
>> The device emulator does not know anything about whether trapped or not.
>> Trapping and Emulation is a hypervisor thing.
>>
>> If here "software" refers to the device emulator, then Yes, it is not the
>> emulator's decision. And the device should not be aware of VM_EXIT &
>> VM_ENTRY.
>>
> Right. So a software wants to implement device_status as pure MMIO writes. (and not VM_EXIT).
This is not always true; there can be VM_EXITs for purely emulated devices.
The HW registers are a sensitive resource, and any access to them needs to be trapped and emulated.
> And prefer to returning SUSPEND=true at slow pace.
> This means, the device implementation cannot immediately return suspend=true right after it was written.
> A MMIO read will read it back, as suspend=true.
>
> An alternative would be, to forward CPU loads and CPU stores to different address.
> However, this does not work for the hw based devices.
>
> That means, PCI HW needs to return suspend=0, until the device is not suspended.
> In this example, the device cannot build special circuitry to answer suspend=true within 50nsec, or in other words building special circuitry to return suspend=false is too complex for the slow operation.

Why? The device can simply leave the SUSPEND bit unchanged until it has fully suspended.

>
> If this understanding of burden is clear,
>
> The proposal is, can you please extend the interface such that,
>
> 1. driver writes suspend command.
> 2. driver reads suspend_status, and receives not_completed=(false). This is the default value.
> 3. When the device completes suspend, it changes the polarity of suspend_status=true.
>
> This has two main benefits:
> [A] This will enable software-based devices to write data to slow files and does not have to force VM_EXITs.
>
> [B] It also enables hw based devices to not build special circuitry to answer within 50nsec, which can get very complicated for tens or hundreds of PCI PFs.

I think we already discussed this in V5, and Jason has some insightful comments:
https://lore.kernel.org/virtio-comment/20240612082055-mutt-send-email-mst@kernel.org/T/#mc817fc6ca12ff0bcbae62b43b6146a177ecf13a9

>
>> this is out of the spec anyway.
>>
>> Thanks
>> Zhu Lingshan
>>>
>>>> Thanks
>>>> Zhu Lingshan
>>>>>> +\devicenormative{\subsection}{Device Suspend}{General
>>>>>> +Initialization And Device Operation / Device Suspend}
>>>>>> +
>>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or
>>>>>> VIRTIO_F_SUSPEND is not negotiated.
>>>>>> +
>>>>>> +The device MUST ignore all access to its Configuration Space while
>>>>>> +suspended, except for \field{device status} if it is part of the
>>>>>> +Configuration
>>>>>> Space.
>>>>>> + >>>>>> +A device MUST NOT send any notifications for any virtqeuues, >>>>>> +access any virtqueues, or modify any fields in its Configuration >>>>>> +Space while suspended. >>>>>> + >>>>>> +If changes occur in the Configuration Space while the SUSPEND bit >>>>>> +is set, the device MUST NOT send any configuration change >> notifications. >>>>>> +Instead, the device MUST send the notification after the SUSPEND >>>>>> +bit has >>>>>> been cleared. >>>>>> + >>>>>> +When the driver sets SUSPEND, the device MUST either suspend >>>>>> +itself or set >>>>>> DEVICE_NEEDS_RESET if failed to suspend. >>>>>> + >>>>>> +If SUSPEND is set in \field{device status}, when the driver clears >>>>>> +SUSPEND, the device MUST either resume normal operation or set >>>>>> DEVICE_NEEDS_RESET. >>>>>> + >>>>>> +When the driver sets SUSPEND, >>>>>> +the device SHOULD perform the following actions before presenting >>>>>> +that >>>>>> the SUSPEND bit is set to 1 in the \field{device status}: >>>>>> + >>>>>> +\begin{itemize} >>>>>> +\item Stop processing more buffers of any virtqueues \item Wait >>>>>> +until all buffers that are being processed have been used. >>>>>> +\item Send used buffer notifications to the driver. >>>>>> +\end{itemize} >>>>>> + >>>>>> \chapter{Virtio Transport Options}\label{sec:Virtio Transport >>>>>> Options} >>>>>> >>>>>> Virtio can use various different buses, thus the standard is split >>>>>> @@ -872,6 >>>>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved >>>>>> +Feature >>>>>> Bits} >>>>>> \ref{devicenormative:Basic Facilities of a Virtio Device / >>>>>> Feature Bits} for >>>>>> handling features reserved for future use. >>>>>> >>>>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver >> can >>>>>> + trigger suspending the device via the SUSPEND flag >>>>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}. 
>>>>>> +
>>>>>> \end{description}
>>>>>>
>>>>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature
>>>>>> Bits}
>>>>>> --
>>>>>> 2.45.2
>>>>>>
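
The driver-side flow that the patch describes (set SUSPEND, then re-read device status until the device presents the bit or reports DEVICE_NEEDS_RESET) can be sketched roughly as below. This is only an illustrative Python model, not code from the patch or from any virtio implementation. The bit values come from the patch; the class FakeDevice, the helper suspend_device, and the max_polls budget are hypothetical names and assumptions introduced here.

```python
# Status bit values as listed in the patch's device status section.
DRIVER_OK = 4
FEATURES_OK = 8
SUSPEND = 16
DEVICE_NEEDS_RESET = 64


class FakeDevice:
    """Hypothetical device model (an assumption, not a real virtio device).

    It needs `busy_reads` status reads before an in-flight suspend completes,
    and it keeps presenting SUSPEND as cleared until then.
    """

    def __init__(self, busy_reads, fail=False):
        self.status = FEATURES_OK | DRIVER_OK
        self._busy_reads = busy_reads
        self._fail = fail
        self._pending = None  # reads left before the suspend completes

    def write_status(self, value):
        if value & SUSPEND and not self.status & SUSPEND:
            self._pending = self._busy_reads  # start suspending, bit not shown yet
        elif not value & SUSPEND:
            self._pending = None
            self.status &= ~SUSPEND  # driver cleared SUSPEND: resume

    def read_status(self):
        if self._pending is not None:
            if self._pending == 0:
                # Suspend finished: present SUSPEND, or NEEDS_RESET on failure.
                self.status |= DEVICE_NEEDS_RESET if self._fail else SUSPEND
                self._pending = None
            else:
                self._pending -= 1
        return self.status


def suspend_device(dev, max_polls=1000):
    """Set SUSPEND, then re-read device status until the device reports it."""
    dev.write_status(dev.read_status() | SUSPEND)
    for _ in range(max_polls):
        status = dev.read_status()
        if status & DEVICE_NEEDS_RESET:
            return "needs_reset"
        if status & SUSPEND:
            return "suspended"
    return "timeout"
```

In this model the device never has to answer "suspended" quickly; it only has to keep returning its old status value until the suspend has finished, which is the behaviour argued for in the replies above. Whether returning that old value is cheap for a hardware device is exactly the point under dispute in the thread, and the sketch does not settle it.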
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-08-15  8:23 ` Zhu Lingshan
@ 2024-08-15  9:34 ` Parav Pandit
  2024-08-30  2:31 ` Zhu Lingshan
  2024-08-15 10:45 ` Michael S. Tsirkin
  1 sibling, 1 reply; 69+ messages in thread
From: Parav Pandit @ 2024-08-15 9:34 UTC
To: Zhu Lingshan, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com
Cc: virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

> From: Zhu Lingshan <lingshan.zhu@amd.com>
> Sent: Thursday, August 15, 2024 1:53 PM
>
> On 8/13/2024 2:55 PM, Parav Pandit wrote:
> >
> >> From: Zhu Lingshan <lingshan.zhu@amd.com>
> >> Sent: Tuesday, August 13, 2024 11:44 AM
> >>
> > I removed the unreachable intel email id as every single email is bouncing from it.
> > Please consider dropping that email from v8 as it will cause all reviewers email to bounce.
> >
> >> On 8/13/2024 1:50 PM, Parav Pandit wrote:
> >>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
> >>>> Sent: Tuesday, August 13, 2024 11:15 AM
> >>>>
> >>>> On 8/13/2024 12:42 PM, Parav Pandit wrote:
> >>>>> Hi Lingshan, David,
> >>>>>
> >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
> >>>>>> Sent: Thursday, August 1, 2024 5:05 PM
> >>>>>>
> >>>>>> This commit allows the driver to suspend the device by
> >>>>>> introducing a new status bit SUSPEND in device_status.
> >>>>>>
> >>>>>> This commit also introduce a new feature bit VIRTIO_F_SUSPEND
> >>>>>> which indicating whether the device support SUSPEND.
> >>>>>> > >>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> > >>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> > >>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>> Signed-off-by: David Stevens <stevensd@chromium.org> > >>>>>> --- > >>>>>> content.tex | 75 > >> ++++++++++++++++++++++++++++++++++++++++++++++- > >>>> ---- > >>>>>> -- > >>>>>> 1 file changed, 65 insertions(+), 10 deletions(-) > >>>>>> > >>>>>> Changes from V6: > >>>>>> - the device should hold its config interrupt while SUSPEND, and > >>>>>> send config interrupt when the SUSPEND bit is cleared. > >>>>>> - while SUSPEND, the driver MUST NOT access Device Configuration > >>>>>> Space > >>>>>> - minor changes. > >>>>>> > >>>>>> Changes from V5: > >>>>>> - the device should present NEEDS_RESET if failed to suspend > >>>>>> - allow the driver access device status in the config space when > >>>>>> suspended if it is implemented in config space. > >>>>>> - language improvements > >>>>>> > >>>>>> Changes from V4: > >>>>>> - re-order the device status bits section > >>>>>> - kick vqs --> notify vqs > >>>>>> > >>>>>> Changes from V3: > >>>>>> - allow the driver clearing the SUSPEND bit to resume the device. > >>>>>> - disallow access to config space while suspended. > >>>>>> > >>>>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 > >>>>>> 100644 > >>>>>> --- a/content.tex > >>>>>> +++ b/content.tex > >>>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} > >>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev > >>>>>> this bit. For example, under Linux, drivers can be loadable modules. > >>>>>> \end{note} > >>>>>> > >>>>>> -\item[FAILED (128)] Indicates that something went wrong in the > >>>>>> guest, > >>>>>> - and it has given up on the device. This could be an internal > >>>>>> - error, or the driver didn't like the device for some reason, > >>>>>> or > >>>>>> - even a fatal error during device operation. 
> >>>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and > >>>>>> +ready to > >>>>>> + drive the device. > >>>>>> > >>>>>> \item[FEATURES_OK (8)] Indicates that the driver has > >>>>>> acknowledged all > >>>> the > >>>>>> features it understands, and feature negotiation is complete. > >>>>>> > >>>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and > >>>>>> ready to > >>>>>> - drive the device. > >>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, > >> indicates > >>>>>> +that the > >>>>>> + device has been suspended by the driver. > >>>>>> > >>>>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has > >>>> experienced > >>>>>> an error from which it can't recover. > >>>>>> + > >>>>>> +\item[FAILED (128)] Indicates that something went wrong in the > >>>>>> +guest, > >>>>>> + and it has given up on the device. This could be an internal > >>>>>> + error, or the driver didn't like the device for some reason, > >>>>>> +or > >>>>>> + even a fatal error during device operation. > >>>>>> \end{description} > >>>>>> > >>>>>> The \field{device status} field starts out as 0, and is > >>>>>> reinitialized to 0 by @@ - > >>>>>> 60,8 +63,9 @@ \section{\field{Device Status} > >>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev initialization > >>>>>> sequence specified in \ref{sec:General Initialization And Device > >>>>>> Operation / Device > >>>> Initialization}. > >>>>>> -The driver MUST NOT clear a > >>>>>> -\field{device status} bit. If the driver sets the FAILED bit, > >>>>>> +The driver MUST NOT clear a \field{device status} bit other than > >>>>>> +SUSPEND except when setting \field{device status} to 0 as a > >>>>>> +transport-specific way to initiate a reset. If the driver sets > >>>>>> +the FAILED bit, > >>>>>> the driver MUST later reset the device before attempting to re- > >> initialize. 
> >>>>>> The driver SHOULD NOT rely on completion of operations of a @@ > >>>>>> -99,10 > >>>>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a > >>>>>> +Virtio Device > >>>>>> / Feature B \begin{description} > >>>>>> \item[0 to 23, and 50 to 127] Feature bits for the specific > >>>>>> device type > >>>>>> > >>>>>> -\item[24 to 41] Feature bits reserved for extensions to the > >>>>>> queue and > >>>>>> +\item[24 to 42] Feature bits reserved for extensions to the > >>>>>> +queue and > >>>>>> feature negotiation mechanisms > >>>>>> > >>>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for > >>>>>> future extensions. > >>>>>> +\item[43 to 49, and 128 and above] Feature bits reserved for > >>>>>> +future > >>>>>> extensions. > >>>>>> \end{description} > >>>>>> > >>>>>> \begin{note} > >>>>>> @@ -629,6 +633,53 @@ \section{Device Cleanup}\label{sec:General > >>>>>> Initialization And Device Operation / > >>>>>> > >>>>>> Thus a driver MUST ensure a virtqueue isn't live (by device > >>>>>> reset) before removing exposed buffers. > >>>>>> > >>>>>> +\section{Device Suspend}\label{sec:General Initialization And > >>>>>> +Device Operation / Device Suspend} > >>>>>> + > >>>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the > >>>>>> +SUSPEND bit in \field{device status} to suspend a device, and > >>>>>> +can clear the SUSPEND bit to resume a suspended device. > >>>>>> + > >>>>>> +\drivernormative{\subsection}{Device Suspend}{General > >>>>>> +Initialization And Device Operation / Device Suspend} > >>>>>> + > >>>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or > >>>>>> VIRTIO_F_SUSPEND is not negotiated. > >>>>>> + > >>>>>> +Once the driver sets SUSPEND to \field{device status} of the device: > >>>>>> +\begin{itemize} > >>>>>> +\item The driver MUST re-read \field{device status} to verify > >>>>>> +whether the > >>>>>> SUSPEND bit is set. 
> >>>>>> +\item The driver MUST NOT make any more buffers available to the device.
> >>>>>> +\item The driver MUST NOT access any virtqueues or send
> >>>>>> +notifications for
> >>>>>> any virtqueues.
> >>>>>> +\item The driver MUST NOT access Device Configuration Space.
> >>>>>> +\end{itemize}
> >>>>>> +
> >>>> Hi Parva
> >>>>> Do we agree that
> >>>>> a. suspending a device is non frequent operation (in order of N
> >>>>> operations/sec, where N is roughly in range of 10 or 100) per device?
> >>>> Ideally it should not be often in normal operations, but remember
> >>>> we can not restrict the behaviors of the driver, so we must be able
> >>>> to handle the scenario in which SUSPENDING is often.
> >>> Sure. the intent is slow rate, but one can do at unexpected times.
> >>> Do you agree?
> >> I think we don't have an intention of the frequency in the spec.
> > Sure.
> >
> >> The spec only provides generic mechanisms and interfaces.
> > Sure.
> >
> >> Don't assume it(or driver wants it to be) would be often or not, that
> >> depends on the driver.
> > As you rightly said : it cannot be assumed.
> > The driver will read the device status right after it wrote it. This typically is < 50nsec of time.
> > The suspend operation for a net device to store hundreds of queues, RSS table, flow filters, takes plenty of time (at least more than 50nsec :) ).
> > Similarly for the GPU to store some MBs of memory takes more than 50nsec of time, for example to store in a file for a software-based GPU device.
> > So a device cannot respond back suspend=true in next 50nsec time.
> It's OK for the device to take longer time to respond, the driver simply re-reads
> device status.

Sure. The issue is that when the driver re-reads, the device must present suspend=false within 50nsec (because the device has not suspended yet).

As I explained in the previous email, this requires building special circuitry.
Such circuitry can be avoided if the suspend interface is done slightly differently.

> >
> > More below.
> >
> >>>>> b. A software-based device may not always want to force VM_EXIT on read
> >>>>> and write on the device_status register?
> >>>> Trap and Emulation is the basic of virtualization, and how to
> >>>> pass-through a device is out of this spec.
> >>>>
> >>> Sure, I didn't suggest to put such things in the spec.
> >>> My question is, whether to trap and emulate or not is a choice of
> >>> the software.
> >>> Do you agree?
> >> The device emulator does not know anything about whether trapped or not.
> >> Trapping and Emulation is a hypervisor thing.
> >>
> >> If here "software" refers to the device emulator, then Yes, it is not
> >> the emulator's decision. And the device should not be aware of
> >> VM_EXIT & VM_ENTRY.
> >>
> > Right. So a software wants to implement device_status as pure MMIO writes. (and not VM_EXIT).
> This is not always true, there can be VM_EXIT of pure emulated devices.

I am not denying that a VM_EXIT may or may not be there.
I am saying that, in my understanding of this patch, the proposal forces a VM_EXIT-based approach.
If that is not true, can you please explain how this can be implemented without a VM_EXIT?

We should have an interface that can be implemented both with and without VM_EXIT, at least for any new additions.

> The
> HW registers are sensitive resource and any access to them need to be trapped
> and emulated.

This does not apply to PCI PFs and VFs, which are HW devices (mainly PFs),
so trap + emulation is a narrow view that we had better avoid.

If you think this is the way forward, you should put it forward in the patch as a MUST requirement,
and that does not look right to me.
I hope you also don't mean to force this method on device implementations.
Right?

> > And prefer to returning SUSPEND=true at slow pace.
> > This means, the device implementation cannot immediately return suspend=true right after it was written.
> > A MMIO read will read it back, as suspend=true.
> >
> > An alternative would be, to forward CPU loads and CPU stores to different address.
> > However, this does not work for the hw based devices.
> >
> > That means, PCI HW needs to return suspend=0, until the device is not suspended.
> > In this example, the device cannot build special circuitry to answer suspend=true within 50nsec, or in other words building special circuitry to return suspend=false is too complex for the slow operation.
> why? The device can just not to change the value of the SUSPEND bit before it
> has fully suspended.

When the driver wrote, it wrote suspend=true,
and the device returns suspend=false while the suspend is ongoing, right?
If yes, this is expensive because the device needs to operate within 50nsec or less to answer suspend=false.

And even worse, when unsuspending, it needs to answer suspend=true within 50nsec while resuming is ongoing.

> >
> > If this understanding of burden is clear,
> >
> > The proposal is, can you please extend the interface such that,
> >
> > 1. driver writes suspend command.
> > 2. driver reads suspend_status, and receives not_completed=(false). This is the default value.
> > 3. When the device completes suspend, it changes the polarity of suspend_status=true.
> >
> > This has two main benefits:
> > [A] This will enable software-based devices to write data to slow files and does not have to force VM_EXITs.
> >
> > [B] It also enables hw based devices to not build special circuitry to answer within 50nsec, which can get very complicated for tens or hundreds of PCI PFs.
> I think we have already discussed on this before in V5, and Jason has some
> insightful comments
>

Unfortunately, not. His comment was that it is not specific to suspend.
But here we are introducing a new interface and functionality that does not need to suffer from, or follow, anything that may not be efficient.

> https://lore.kernel.org/virtio-comment/20240612082055-mutt-send-email-mst@kernel.org/T/#mc817fc6ca12ff0bcbae62b43b6146a177ecf13a9 > > > >> this is out of the spec anyway. > >> > >> Thanks > >> Zhu Lingshan > >>> > >>>> Thanks > >>>> Zhu Lingshan > >>>>>> +\devicenormative{\subsection}{Device Suspend}{General > >>>>>> +Initialization And Device Operation / Device Suspend} > >>>>>> + > >>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or > >>>>>> VIRTIO_F_SUSPEND is not negotiated. > >>>>>> + > >>>>>> +The device MUST ignore all access to its Configuration Space > >>>>>> +while suspended, except for \field{device status} if it is part > >>>>>> +of the Configuration > >>>>>> Space. > >>>>>> + > >>>>>> +A device MUST NOT send any notifications for any virtqeuues, > >>>>>> +access any virtqueues, or modify any fields in its Configuration > >>>>>> +Space while suspended. > >>>>>> + > >>>>>> +If changes occur in the Configuration Space while the SUSPEND > >>>>>> +bit is set, the device MUST NOT send any configuration change > >> notifications. > >>>>>> +Instead, the device MUST send the notification after the SUSPEND > >>>>>> +bit has > >>>>>> been cleared. > >>>>>> + > >>>>>> +When the driver sets SUSPEND, the device MUST either suspend > >>>>>> +itself or set > >>>>>> DEVICE_NEEDS_RESET if failed to suspend. > >>>>>> + > >>>>>> +If SUSPEND is set in \field{device status}, when the driver > >>>>>> +clears SUSPEND, the device MUST either resume normal operation > >>>>>> +or set > >>>>>> DEVICE_NEEDS_RESET. > >>>>>> + > >>>>>> +When the driver sets SUSPEND, > >>>>>> +the device SHOULD perform the following actions before > >>>>>> +presenting that > >>>>>> the SUSPEND bit is set to 1 in the \field{device status}: > >>>>>> + > >>>>>> +\begin{itemize} > >>>>>> +\item Stop processing more buffers of any virtqueues \item Wait > >>>>>> +until all buffers that are being processed have been used. > >>>>>> +\item Send used buffer notifications to the driver. 
> >>>>>> +\end{itemize}
> >>>>>> +
> >>>>>> \chapter{Virtio Transport Options}\label{sec:Virtio Transport
> >>>>>> Options}
> >>>>>>
> >>>>>> Virtio can use various different buses, thus the standard is split
> >>>>>> @@ -872,6 +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >>>>>> \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
> >>>>>> handling features reserved for future use.
> >>>>>>
> >>>>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
> >>>>>> + trigger suspending the device via the SUSPEND flag
> >>>>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
> >>>>>> +
> >>>>>> \end{description}
> >>>>>>
> >>>>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved
> >>>>>> Feature Bits}
> >>>>>> --
> >>>>>> 2.45.2
> >>>>>>
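
The alternative interface proposed in this message (steps 1 to 3: write a suspend command, read suspend_status, which defaults to false, and see it flip to true once the device has finished) could look roughly like the sketch below. It is a hypothetical Python model only: the thread does not define a register layout, so the class SplitRegisterDevice, its tick() method, and the helper suspend_with_status_register are assumptions made for illustration. Only the names "suspend command" and suspend_status come from the proposal.

```python
class SplitRegisterDevice:
    """Hypothetical model: a suspend command write plus a separate suspend_status.

    `work_units` stands in for however long the device needs to save its state.
    """

    def __init__(self, work_units):
        self.suspend_status = False  # default value the driver reads (step 2)
        self._work_units = work_units
        self._work_left = 0
        self._commanded = False

    def write_suspend_command(self):
        # Step 1: the driver writes the suspend command.
        self._commanded = True
        self._work_left = self._work_units

    def tick(self):
        """Device-side progress, independent of any driver read."""
        if self._commanded and not self.suspend_status:
            if self._work_left > 0:
                self._work_left -= 1
            if self._work_left == 0:
                # Step 3: the device flips suspend_status when it is done.
                self.suspend_status = True


def suspend_with_status_register(dev, max_polls=100):
    """Issue the command, then poll suspend_status until true or out of budget."""
    dev.write_suspend_command()
    polls = 0
    while not dev.suspend_status and polls < max_polls:
        dev.tick()  # stands in for time passing on the device
        polls += 1
    return dev.suspend_status, polls
```

The difference from polling the SUSPEND bit in device status is that the value the driver wrote and the value it reads back live in separate fields, so a read never has to override a just-written value. Whether that actually simplifies hardware or avoids VM_EXITs is the claim being debated above; the model does not demonstrate it.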
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-08-15  9:34 ` Parav Pandit
@ 2024-08-30  2:31 ` Zhu Lingshan
  2024-08-30  3:02 ` Parav Pandit
  0 siblings, 1 reply; 69+ messages in thread
From: Zhu Lingshan @ 2024-08-30 2:31 UTC
To: Parav Pandit, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com
Cc: virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On 8/15/2024 5:34 PM, Parav Pandit wrote:
>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>> Sent: Thursday, August 15, 2024 1:53 PM
>>
>> On 8/13/2024 2:55 PM, Parav Pandit wrote:
>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>> Sent: Tuesday, August 13, 2024 11:44 AM
>>>>
>>> I removed the unreachable intel email id as every single email is bouncing from it.
>>> Please consider dropping that email from v8 as it will cause all reviewers email to bounce.
>>>> On 8/13/2024 1:50 PM, Parav Pandit wrote:
>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>>>> Sent: Tuesday, August 13, 2024 11:15 AM
>>>>>>
>>>>>> On 8/13/2024 12:42 PM, Parav Pandit wrote:
>>>>>>> Hi Lingshan, David,
>>>>>>>
>>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>>>>>> Sent: Thursday, August 1, 2024 5:05 PM
>>>>>>>>
>>>>>>>> This commit allows the driver to suspend the device by
>>>>>>>> introducing a new status bit SUSPEND in device_status.
>>>>>>>>
>>>>>>>> This commit also introduce a new feature bit VIRTIO_F_SUSPEND
>>>>>>>> which indicating whether the device support SUSPEND.
>>>>>>>> >>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> >>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> >>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>>>> Signed-off-by: David Stevens <stevensd@chromium.org> >>>>>>>> --- >>>>>>>> content.tex | 75 >>>> ++++++++++++++++++++++++++++++++++++++++++++++- >>>>>> ---- >>>>>>>> -- >>>>>>>> 1 file changed, 65 insertions(+), 10 deletions(-) >>>>>>>> >>>>>>>> Changes from V6: >>>>>>>> - the device should hold its config interrupt while SUSPEND, and >>>>>>>> send config interrupt when the SUSPEND bit is cleared. >>>>>>>> - while SUSPEND, the driver MUST NOT access Device Configuration >>>>>>>> Space >>>>>>>> - minor changes. >>>>>>>> >>>>>>>> Changes from V5: >>>>>>>> - the device should present NEEDS_RESET if failed to suspend >>>>>>>> - allow the driver access device status in the config space when >>>>>>>> suspended if it is implemented in config space. >>>>>>>> - language improvements >>>>>>>> >>>>>>>> Changes from V4: >>>>>>>> - re-order the device status bits section >>>>>>>> - kick vqs --> notify vqs >>>>>>>> >>>>>>>> Changes from V3: >>>>>>>> - allow the driver clearing the SUSPEND bit to resume the device. >>>>>>>> - disallow access to config space while suspended. >>>>>>>> >>>>>>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 >>>>>>>> 100644 >>>>>>>> --- a/content.tex >>>>>>>> +++ b/content.tex >>>>>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} >>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>>>>>> this bit. For example, under Linux, drivers can be loadable modules. >>>>>>>> \end{note} >>>>>>>> >>>>>>>> -\item[FAILED (128)] Indicates that something went wrong in the >>>>>>>> guest, >>>>>>>> - and it has given up on the device. This could be an internal >>>>>>>> - error, or the driver didn't like the device for some reason, >>>>>>>> or >>>>>>>> - even a fatal error during device operation. 
>>>>>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and >>>>>>>> +ready to >>>>>>>> + drive the device. >>>>>>>> >>>>>>>> \item[FEATURES_OK (8)] Indicates that the driver has >>>>>>>> acknowledged all >>>>>> the >>>>>>>> features it understands, and feature negotiation is complete. >>>>>>>> >>>>>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and >>>>>>>> ready to >>>>>>>> - drive the device. >>>>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, >>>> indicates >>>>>>>> +that the >>>>>>>> + device has been suspended by the driver. >>>>>>>> >>>>>>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has >>>>>> experienced >>>>>>>> an error from which it can't recover. >>>>>>>> + >>>>>>>> +\item[FAILED (128)] Indicates that something went wrong in the >>>>>>>> +guest, >>>>>>>> + and it has given up on the device. This could be an internal >>>>>>>> + error, or the driver didn't like the device for some reason, >>>>>>>> +or >>>>>>>> + even a fatal error during device operation. >>>>>>>> \end{description} >>>>>>>> >>>>>>>> The \field{device status} field starts out as 0, and is >>>>>>>> reinitialized to 0 by @@ - >>>>>>>> 60,8 +63,9 @@ \section{\field{Device Status} >>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev initialization >>>>>>>> sequence specified in \ref{sec:General Initialization And Device >>>>>>>> Operation / Device >>>>>> Initialization}. >>>>>>>> -The driver MUST NOT clear a >>>>>>>> -\field{device status} bit. If the driver sets the FAILED bit, >>>>>>>> +The driver MUST NOT clear a \field{device status} bit other than >>>>>>>> +SUSPEND except when setting \field{device status} to 0 as a >>>>>>>> +transport-specific way to initiate a reset. If the driver sets >>>>>>>> +the FAILED bit, >>>>>>>> the driver MUST later reset the device before attempting to re- >>>> initialize. 
>>>>>>>> The driver SHOULD NOT rely on completion of operations of a @@ >>>>>>>> -99,10 >>>>>>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a >>>>>>>> +Virtio Device >>>>>>>> / Feature B \begin{description} >>>>>>>> \item[0 to 23, and 50 to 127] Feature bits for the specific >>>>>>>> device type >>>>>>>> >>>>>>>> -\item[24 to 41] Feature bits reserved for extensions to the >>>>>>>> queue and >>>>>>>> +\item[24 to 42] Feature bits reserved for extensions to the >>>>>>>> +queue and >>>>>>>> feature negotiation mechanisms >>>>>>>> >>>>>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for >>>>>>>> future extensions. >>>>>>>> +\item[43 to 49, and 128 and above] Feature bits reserved for >>>>>>>> +future >>>>>>>> extensions. >>>>>>>> \end{description} >>>>>>>> >>>>>>>> \begin{note} >>>>>>>> @@ -629,6 +633,53 @@ \section{Device Cleanup}\label{sec:General >>>>>>>> Initialization And Device Operation / >>>>>>>> >>>>>>>> Thus a driver MUST ensure a virtqueue isn't live (by device >>>>>>>> reset) before removing exposed buffers. >>>>>>>> >>>>>>>> +\section{Device Suspend}\label{sec:General Initialization And >>>>>>>> +Device Operation / Device Suspend} >>>>>>>> + >>>>>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the >>>>>>>> +SUSPEND bit in \field{device status} to suspend a device, and >>>>>>>> +can clear the SUSPEND bit to resume a suspended device. >>>>>>>> + >>>>>>>> +\drivernormative{\subsection}{Device Suspend}{General >>>>>>>> +Initialization And Device Operation / Device Suspend} >>>>>>>> + >>>>>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or >>>>>>>> VIRTIO_F_SUSPEND is not negotiated. >>>>>>>> + >>>>>>>> +Once the driver sets SUSPEND to \field{device status} of the device: >>>>>>>> +\begin{itemize} >>>>>>>> +\item The driver MUST re-read \field{device status} to verify >>>>>>>> +whether the >>>>>>>> SUSPEND bit is set. 
>>>>>>>> +\item The driver MUST NOT make any more buffers available to the >>>>>> device. >>>>>>>> +\item The driver MUST NOT access any virtqueues or send >>>>>>>> +notifications for >>>>>>>> any virtqueues. >>>>>>>> +\item The driver MUST NOT access Device Configuration Space. >>>>>>>> +\end{itemize} >>>>>>>> + >>>>>> Hi Parva >>>>>>> Do we agree that >>>>>>> a. suspending a device is non frequent operation (in order of N >>>>>> operations/sec, where N is roughly in range of 10 or 100) per device? >>>>>> Ideally it should not be often in normal operations, but remember >>>>>> we can not restrict the behaviors of the driver, so we must be able >>>>>> to handle the scenario in which SUSPENDING is often. >>>>> Sure. the intent is slow rate, but one can do at unexpected times. >>>>> Do you agree? >>>> I think we don't have an intention of the frequency in the spec. >>> Sure. >>> >>>> The spec only provides generic mechanisms and interfaces. >>> Sure. >>> >>>> Don't assume it(or driver wants it to be) would be often or not, that >>>> depends on the driver. >>> As you rightly said : it cannot be assumed. >>> The driver will read the device status right after it wrote it. This typically is < >> 50nsec of time. >>> The suspend operation for a net device to store hundreds of queues, RSS >> table, flow filters, takes plenty of time (at least more than 50nsec :) ). >>> Similarly for the GPU to store some MBs of memory takes more than 50nsec >> of time, for example to store in a file for a software-based GPU device. >>> So a device cannot respond back suspend=true in next 50nsec time. >> It's OK for the device to take longer time to respond, the driver simply re-reads >> device status. > Sure, the issue is, when the driver re-reads, the device must present suspend=false within 50nsec. > (because device didn't suspend it yet). > > As I explained in previous email, this requires building special circuitry. 
> Such circuitry can be avoided if the suspend interface is done slightly differently.

Why is there a time constraint? Are there similar constraints for other states like RESET or DRIVER_OK?
Don't assume that any other state transition is faster than SUSPEND.

>
>>> More below.
>>>
>>>>>>> b. A software-based device may not always want to force VM_EXIT on read
>>>>>>> and write on the device_status register?
>>>>>> Trap and Emulation is the basic of virtualization, and how to
>>>>>> pass-through a device is out of this spec.
>>>>>>
>>>>> Sure, I didn't suggest to put such things in the spec.
>>>>> My question is, whether to trap and emulate or not is a choice of
>>>>> the software.
>>>>> Do you agree?
>>>> The device emulator does not know anything about whether trapped or not.
>>>> Trapping and Emulation is a hypervisor thing.
>>>>
>>>> If here "software" refers to the device emulator, then Yes, it is not
>>>> the emulator's decision. And the device should not be aware of
>>>> VM_EXIT & VM_ENTRY.
>>>>
>>> Right. So a software wants to implement device_status as pure MMIO writes. (and not VM_EXIT).
>> This is not always true, there can be VM_EXIT of pure emulated devices.
> I am not denying that VM_EXIT can/cannot be there.
> I am saying, the proposal forces VM_EXIT based approach in my understanding of this patch.
> If that is not true, may be can you please explain how this can be implemented without VM_EXIT?
>
> We should have an interface that can be done with and without VM_EXIT method at least for any new additions.

VM_EXIT is outside the scope of the spec; it is a hypervisor and processor matter.
In the non-pass-through case, any register access is sensitive and will trigger a VM_EXIT.
RESET and DRIVER_OK also need to access device_status; SUSPEND is no different.

>
>> The
>> HW registers are sensitive resource and any access to them need to be trapped
>> and emulated.
> This does not apply to PCI PFs and VFs which are HW devices (mainly PFs).
> so this trap + emulation is narrow view that we better avoid.

This is how *basic* virtualization works: once a sensitive resource is accessed, trap it.

>
> If you think this is the way forward, you should put forward in patch as MUST requirement.
> and that does not look right to me.
> I hope you also don't mean to force this method to device implementations.
> Right?

Again, VM_EXIT is a hypervisor matter, outside the scope of the spec. Whether there is a VM_EXIT when
setting SUSPEND depends entirely on the virtualization solution.

And SUSPEND is no different from DRIVER_OK: if your virtualization solution needs to
trap SUSPEND, it also needs to trap DRIVER_OK, and don't assume DRIVER_OK is faster than SUSPEND.

>
>>> And prefer to returning SUSPEND=true at slow pace.
>>> This means, the device implementation cannot immediately return suspend=true right after it was written.
>>> A MMIO read will read it back, as suspend=true.
>>>
>>> An alternative would be, to forward CPU loads and CPU stores to different address.
>>> However, this does not work for the hw based devices.
>>>
>>> That means, PCI HW needs to return suspend=0, until the device is not suspended.
>>> In this example, the device cannot build special circuitry to answer suspend=true within 50nsec, or in other words building special circuitry to return suspend=false is too complex for the slow operation.
>> why? The device can just not to change the value of the SUSPEND bit before it
>> has fully suspended.
> When driver wrote, it wrote suspend=true,
> And device returns suspend=false while suspend is ongoing, right?
> If yes, this is expensive because the device needs to operate within 50nsec or less to answer suspend=false.
>
> And even worst, it needs to suspend=true when unsuspending within 50nsec when resuming is ongoing.

Again, there is no 50nsec constraint; please refer to how DRIVER_OK works with virtualization.

> >>> If this understanding of burden is clear, >>> >>> The proposal is, can you please extend the interface such that, >>> >>> 1. driver writes suspend command. >>> 2. driver reads suspend_status, and receives not_completed=(false). This is >> the default value. >>> 3. When the device completes suspend, it changes the polarity of >> suspend_status=true. >>> This has two main benefits: >>> [A] This will enable software-based devices to write data to slow files and >> does not have to force VM_EXITs. >>> [B] It also enables hw based devices to not build special circuitry to answer >> within 50nsec, which can get very complicated for tens or hundreds of PCI >> PFs. >> I think we have already discussed on this before in V5, and Jason has some >> insightful comments >> > Unfortunately, not. His comment was that it is not specific to suspend. > But here we are introducing a new interface and functionality that does not need to suffer or follow anything that may not be efficient. > >> https://lore.kernel.org/virtio-comment/20240612082055-mutt-send-email-mst@kernel.org/T/#mc817fc6ca12ff0bcbae62b43b6146a177ecf13a9 >>>> this is out of the spec anyway. >>>> >>>> Thanks >>>> Zhu Lingshan >>>>>> Thanks >>>>>> Zhu Lingshan >>>>>>>> +\devicenormative{\subsection}{Device Suspend}{General >>>>>>>> +Initialization And Device Operation / Device Suspend} >>>>>>>> + >>>>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or >>>>>>>> VIRTIO_F_SUSPEND is not negotiated. >>>>>>>> + >>>>>>>> +The device MUST ignore all access to its Configuration Space >>>>>>>> +while suspended, except for \field{device status} if it is part >>>>>>>> +of the Configuration >>>>>>>> Space. >>>>>>>> + >>>>>>>> +A device MUST NOT send any notifications for any virtqeuues, >>>>>>>> +access any virtqueues, or modify any fields in its Configuration >>>>>>>> +Space while suspended. 
>>>>>>>> + >>>>>>>> +If changes occur in the Configuration Space while the SUSPEND >>>>>>>> +bit is set, the device MUST NOT send any configuration change >>>> notifications. >>>>>>>> +Instead, the device MUST send the notification after the SUSPEND >>>>>>>> +bit has >>>>>>>> been cleared. >>>>>>>> + >>>>>>>> +When the driver sets SUSPEND, the device MUST either suspend >>>>>>>> +itself or set >>>>>>>> DEVICE_NEEDS_RESET if failed to suspend. >>>>>>>> + >>>>>>>> +If SUSPEND is set in \field{device status}, when the driver >>>>>>>> +clears SUSPEND, the device MUST either resume normal operation >>>>>>>> +or set >>>>>>>> DEVICE_NEEDS_RESET. >>>>>>>> + >>>>>>>> +When the driver sets SUSPEND, >>>>>>>> +the device SHOULD perform the following actions before >>>>>>>> +presenting that >>>>>>>> the SUSPEND bit is set to 1 in the \field{device status}: >>>>>>>> + >>>>>>>> +\begin{itemize} >>>>>>>> +\item Stop processing more buffers of any virtqueues \item Wait >>>>>>>> +until all buffers that are being processed have been used. >>>>>>>> +\item Send used buffer notifications to the driver. >>>>>>>> +\end{itemize} >>>>>>>> + >>>>>>>> \chapter{Virtio Transport Options}\label{sec:Virtio Transport >>>>>>>> Options} >>>>>>>> >>>>>>>> Virtio can use various different buses, thus the standard is >>>>>>>> split @@ -872,6 >>>>>>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved >>>>>>>> +Feature >>>>>>>> Bits} >>>>>>>> \ref{devicenormative:Basic Facilities of a Virtio Device / >>>>>>>> Feature Bits} for >>>>>>>> handling features reserved for future use. >>>>>>>> >>>>>>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the >>>>>>>> + driver >>>> can >>>>>>>> + trigger suspending the device via the SUSPEND flag >>>>>>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}. 
>>>>>>>> + >>>>>>>> \end{description} >>>>>>>> >>>>>>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved >>>>>>>> Feature Bits} >>>>>>>> -- >>>>>>>> 2.45.2 >>>>>>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
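Editorial illustration (not part of the thread): a minimal sketch of the driver-side flow that the quoted patch text describes, where the driver sets SUSPEND, re-reads device status, and treats DEVICE_NEEDS_RESET as a failed suspend. The status bit values come from the patch. `ToyDevice`, `driver_suspend`, `busy_reads`, and the `max_polls` bound are hypothetical names introduced only for this sketch. The patch itself defines no timeout, and VIRTIO_F_SUSPEND is assumed to be negotiated already.

```python
# Status bit values as listed in the patch's device status section.
DRIVER_OK = 4
FEATURES_OK = 8
SUSPEND = 16
DEVICE_NEEDS_RESET = 64


class ToyDevice:
    """Hypothetical device model that needs a few status reads to finish suspending."""

    def __init__(self, busy_reads=3, fail=False):
        self.status = FEATURES_OK | DRIVER_OK
        self.busy_reads = busy_reads  # reads that still observe SUSPEND == 0
        self.fail = fail              # if True, suspend ends in DEVICE_NEEDS_RESET
        self.pending = False

    def write_status(self, value):
        # The request is recorded, but SUSPEND is not presented yet.
        if value & SUSPEND:
            self.pending = True

    def read_status(self):
        if self.pending:
            if self.busy_reads > 0:
                self.busy_reads -= 1
            else:
                self.pending = False
                self.status |= DEVICE_NEEDS_RESET if self.fail else SUSPEND
        return self.status


def driver_suspend(dev, max_polls=100):
    """Set SUSPEND, then re-read device status until the device answers."""
    status = dev.read_status()
    if not status & FEATURES_OK:
        raise RuntimeError("SUSPEND must not be set before FEATURES_OK")
    dev.write_status(status | SUSPEND)
    for polls in range(1, max_polls + 1):
        status = dev.read_status()
        if status & DEVICE_NEEDS_RESET:
            return "needs_reset", polls
        if status & SUSPEND:
            return "suspended", polls
    return "timeout", max_polls  # the bound is this sketch's assumption
```

In this model every poll before completion observes SUSPEND cleared, which is the read-back behaviour whose implementation cost the two participants disagree about.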
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-30 2:31 ` Zhu Lingshan @ 2024-08-30 3:02 ` Parav Pandit 2024-09-03 9:05 ` Zhu Lingshan 0 siblings, 1 reply; 69+ messages in thread From: Parav Pandit @ 2024-08-30 3:02 UTC (permalink / raw) To: Zhu Lingshan, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com Cc: virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Friday, August 30, 2024 8:02 AM > > On 8/15/2024 5:34 PM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Thursday, August 15, 2024 1:53 PM > >> > >> On 8/13/2024 2:55 PM, Parav Pandit wrote: > >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>> Sent: Tuesday, August 13, 2024 11:44 AM > >>>> > >>> I removed the unreachable intel email id as every single email is > >>> bouncing > >> from it. > >>> Please consider dropping that email from v8 as it will cause all > >>> reviewers > >> email to bounce. > >>>> On 8/13/2024 1:50 PM, Parav Pandit wrote: > >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>> Sent: Tuesday, August 13, 2024 11:15 AM > >>>>>> > >>>>>> On 8/13/2024 12:42 PM, Parav Pandit wrote: > >>>>>>> Hi Lingshan, David, > >>>>>>> > >>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>>>> Sent: Thursday, August 1, 2024 5:05 PM > >>>>>>>> > >>>>>>>> This commit allows the driver to suspend the device by > >>>>>>>> introducing a new status bit SUSPEND in device_status. > >>>>>>>> > >>>>>>>> This commit also introduce a new feature bit VIRTIO_F_SUSPEND > >>>>>>>> which indicating whether the device support SUSPEND. 
> >>>>>>>> > >>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> > >>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> > >>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>>>> Signed-off-by: David Stevens <stevensd@chromium.org> > >>>>>>>> --- > >>>>>>>> content.tex | 75 > >>>> ++++++++++++++++++++++++++++++++++++++++++++++- > >>>>>> ---- > >>>>>>>> -- > >>>>>>>> 1 file changed, 65 insertions(+), 10 deletions(-) > >>>>>>>> > >>>>>>>> Changes from V6: > >>>>>>>> - the device should hold its config interrupt while SUSPEND, > >>>>>>>> and send config interrupt when the SUSPEND bit is cleared. > >>>>>>>> - while SUSPEND, the driver MUST NOT access Device > >>>>>>>> Configuration Space > >>>>>>>> - minor changes. > >>>>>>>> > >>>>>>>> Changes from V5: > >>>>>>>> - the device should present NEEDS_RESET if failed to suspend > >>>>>>>> - allow the driver access device status in the config space when > >>>>>>>> suspended if it is implemented in config space. > >>>>>>>> - language improvements > >>>>>>>> > >>>>>>>> Changes from V4: > >>>>>>>> - re-order the device status bits section > >>>>>>>> - kick vqs --> notify vqs > >>>>>>>> > >>>>>>>> Changes from V3: > >>>>>>>> - allow the driver clearing the SUSPEND bit to resume the device. > >>>>>>>> - disallow access to config space while suspended. > >>>>>>>> > >>>>>>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 > >>>>>>>> 100644 > >>>>>>>> --- a/content.tex > >>>>>>>> +++ b/content.tex > >>>>>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} > >>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev > >>>>>>>> this bit. For example, under Linux, drivers can be loadable > modules. > >>>>>>>> \end{note} > >>>>>>>> > >>>>>>>> -\item[FAILED (128)] Indicates that something went wrong in the > >>>>>>>> guest, > >>>>>>>> - and it has given up on the device. 
This could be an internal > >>>>>>>> - error, or the driver didn't like the device for some reason, > >>>>>>>> or > >>>>>>>> - even a fatal error during device operation. > >>>>>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and > >>>>>>>> +ready to > >>>>>>>> + drive the device. > >>>>>>>> > >>>>>>>> \item[FEATURES_OK (8)] Indicates that the driver has > >>>>>>>> acknowledged all > >>>>>> the > >>>>>>>> features it understands, and feature negotiation is complete. > >>>>>>>> > >>>>>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and > >>>>>>>> ready to > >>>>>>>> - drive the device. > >>>>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, > >>>> indicates > >>>>>>>> +that the > >>>>>>>> + device has been suspended by the driver. > >>>>>>>> > >>>>>>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has > >>>>>> experienced > >>>>>>>> an error from which it can't recover. > >>>>>>>> + > >>>>>>>> +\item[FAILED (128)] Indicates that something went wrong in the > >>>>>>>> +guest, > >>>>>>>> + and it has given up on the device. This could be an internal > >>>>>>>> + error, or the driver didn't like the device for some reason, > >>>>>>>> +or > >>>>>>>> + even a fatal error during device operation. > >>>>>>>> \end{description} > >>>>>>>> > >>>>>>>> The \field{device status} field starts out as 0, and is > >>>>>>>> reinitialized to 0 by @@ - > >>>>>>>> 60,8 +63,9 @@ \section{\field{Device Status} > >>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev > >>>>>>>> initialization sequence specified in \ref{sec:General > >>>>>>>> Initialization And Device Operation / Device > >>>>>> Initialization}. > >>>>>>>> -The driver MUST NOT clear a > >>>>>>>> -\field{device status} bit. If the driver sets the FAILED bit, > >>>>>>>> +The driver MUST NOT clear a \field{device status} bit other > >>>>>>>> +than SUSPEND except when setting \field{device status} to 0 as > >>>>>>>> +a transport-specific way to initiate a reset. 
If the driver > >>>>>>>> +sets the FAILED bit, > >>>>>>>> the driver MUST later reset the device before attempting to > >>>>>>>> re- > >>>> initialize. > >>>>>>>> The driver SHOULD NOT rely on completion of operations of a > @@ > >>>>>>>> -99,10 > >>>>>>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of > >>>>>>>> +a Virtio Device > >>>>>>>> / Feature B \begin{description} > >>>>>>>> \item[0 to 23, and 50 to 127] Feature bits for the specific > >>>>>>>> device type > >>>>>>>> > >>>>>>>> -\item[24 to 41] Feature bits reserved for extensions to the > >>>>>>>> queue and > >>>>>>>> +\item[24 to 42] Feature bits reserved for extensions to the > >>>>>>>> +queue and > >>>>>>>> feature negotiation mechanisms > >>>>>>>> > >>>>>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for > >>>>>>>> future extensions. > >>>>>>>> +\item[43 to 49, and 128 and above] Feature bits reserved for > >>>>>>>> +future > >>>>>>>> extensions. > >>>>>>>> \end{description} > >>>>>>>> > >>>>>>>> \begin{note} > >>>>>>>> @@ -629,6 +633,53 @@ \section{Device > Cleanup}\label{sec:General > >>>>>>>> Initialization And Device Operation / > >>>>>>>> > >>>>>>>> Thus a driver MUST ensure a virtqueue isn't live (by device > >>>>>>>> reset) before removing exposed buffers. > >>>>>>>> > >>>>>>>> +\section{Device Suspend}\label{sec:General Initialization And > >>>>>>>> +Device Operation / Device Suspend} > >>>>>>>> + > >>>>>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the > >>>>>>>> +SUSPEND bit in \field{device status} to suspend a device, and > >>>>>>>> +can clear the SUSPEND bit to resume a suspended device. > >>>>>>>> + > >>>>>>>> +\drivernormative{\subsection}{Device Suspend}{General > >>>>>>>> +Initialization And Device Operation / Device Suspend} > >>>>>>>> + > >>>>>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or > >>>>>>>> VIRTIO_F_SUSPEND is not negotiated. 
> >>>>>>>> + > >>>>>>>> +Once the driver sets SUSPEND to \field{device status} of the > device: > >>>>>>>> +\begin{itemize} > >>>>>>>> +\item The driver MUST re-read \field{device status} to verify > >>>>>>>> +whether the > >>>>>>>> SUSPEND bit is set. > >>>>>>>> +\item The driver MUST NOT make any more buffers available to > >>>>>>>> +the > >>>>>> device. > >>>>>>>> +\item The driver MUST NOT access any virtqueues or send > >>>>>>>> +notifications for > >>>>>>>> any virtqueues. > >>>>>>>> +\item The driver MUST NOT access Device Configuration Space. > >>>>>>>> +\end{itemize} > >>>>>>>> + > >>>>>> Hi Parva > >>>>>>> Do we agree that > >>>>>>> a. suspending a device is non frequent operation (in order of N > >>>>>> operations/sec, where N is roughly in range of 10 or 100) per device? > >>>>>> Ideally it should not be often in normal operations, but remember > >>>>>> we can not restrict the behaviors of the driver, so we must be > >>>>>> able to handle the scenario in which SUSPENDING is often. > >>>>> Sure. the intent is slow rate, but one can do at unexpected times. > >>>>> Do you agree? > >>>> I think we don't have an intention of the frequency in the spec. > >>> Sure. > >>> > >>>> The spec only provides generic mechanisms and interfaces. > >>> Sure. > >>> > >>>> Don't assume it(or driver wants it to be) would be often or not, > >>>> that depends on the driver. > >>> As you rightly said : it cannot be assumed. > >>> The driver will read the device status right after it wrote it. This > >>> typically is < > >> 50nsec of time. > >>> The suspend operation for a net device to store hundreds of queues, > >>> RSS > >> table, flow filters, takes plenty of time (at least more than 50nsec :) ). > >>> Similarly for the GPU to store some MBs of memory takes more than > >>> 50nsec > >> of time, for example to store in a file for a software-based GPU device. > >>> So a device cannot respond back suspend=true in next 50nsec time. 
> >> It's OK for the device to take longer time to respond, the driver > >> simply re-reads device status. > > Sure, the issue is, when the driver re-reads, the device must present > suspend=false within 50nsec. > > (because device didn't suspend it yet). > > > > As I explained in previous email, this requires building special circuitry. > > Such circuitry can be avoided if the suspend interface is done slightly > differently. > why there is a constraint condition of the time? Because this is what driver does as you explained in [1] indicating "we must be able to handle ..." [1] https://lore.kernel.org/virtio-comment/c4d5eed3-774b-4d35-a007-f9dff28ce884@amd.com/T/#m6f081f96ef9dcea29c64a88b633eb21d50e8c410 > Are there any similar > constraining for other states like RESET or DRIVER_OK? Don't assume any > other states transitions are faster than SUSPEND. DRIVER_OK does not suffer from it because it is async notification. A device may start slow after DRIVER_OK. SUSPEND operation cannot rely on such async behavior. RESET also suffers from similar inefficiencies. But that is because it is inherited from the past. Here a new functionality is being proposed and it has a chance for efficient device implementation. Therefore the request is to improve it. > > > >>> More below. > >>> > >>>>>>> b. A software-based device may not always want to force VM_EXIT > >>>>>>> on read > >>>>>> and write on the device_status register? > >>>>>> Trap and Emulation is the basic of virtualization, and how to > >>>>>> pass-through a device is out of this spec. > >>>>>> > >>>>> Sure, I didn't suggest to put such things in the spec. > >>>>> My question is, whether to trap and emulate or not is a choice of > >>>>> the > >>>> software. > >>>>> Do you agree? > >>>> The device emulator does not know anything about whether trapped > or > >> not. > >>>> Trapping and Emulation is a hypervisor thing. 
> >>>> > >>>> If here "software" refers to the device emulator, then Yes, it is > >>>> not the emulator's decision. And the device should not be aware of > >>>> VM_EXIT & VM_ENTRY. > >>>> > >>> Right. So a software wants to implement device_status as pure MMIO > >> writes. (and not VM_EXIT). > >> This is not always true, there can be VM_EXIT of pure emulated > >> devices. > > I am not denying that VM_EXIT can/cannot be there. > > I am saying, the proposal forces VM_EXIT based approach in my > understanding of this patch. [MARKER_1] > > If that is not true, may be can you please explain how this can be > implemented without VM_EXIT? > > > > We should have an interface that can be done with and without VM_EXIT > method at least for any new additions. > VM_EXIT is out of spec, it is a hypervisor and the processor thing. You keep repeating that VM_EXIT is out of spec. I already replied at [MARKER_1]: sure, it is out of spec, but the current approach forces one to take a VM_EXIT-based approach. If not, please explain how it can be achieved. > In non-pass-through case, any register access are sensitive and will trigger > VM_EXIT. Like RESET or DRIVER_OK needs to access device_status, nothing > different. I don't think you understood the point. Let me repeat: the question is, if the device implementation wants to achieve the functionality without VM_EXIT, what is the way? > > > >> The > >> HW registers are sensitive resource and any access to them need to be > >> trapped and emulated. > > This does not apply to PCI PFs and VFs which are HW devices (mainly PFs). > > so this trap + emulation is narrow view that we better avoid. > This is how *basic* virtualization work, once access sensitive resource, trap it. > > > > If you think this is the way forward, you should put forward in patch as > MUST requirement. > > and that does not look right to me. > > I hope you also don't mean to force this method to device > implementations. > > Right?
> Again, VM_EXIT is a hypervisor thing, out of spec. Whether there is a > VM_EXIT when setting SUSPEND totally depends on the virtualization > solution. And SUSPEND is nothing different from DRIVER_OK. > Please avoid repeating the point that VM_EXIT is a hypervisor thing. No one asked to put this in the spec. Please re-read [MARKER_1]. You probably missed that. SUSPEND is different from DRIVER_OK. I explained the timing constraints and the circuitry needed to fulfill the proposal. With the additional register, such complicated circuitry can easily be avoided. > Means, if your virtualization needs to trap SUSPEND, it also needs to trap > DRIVER_OK, and don't assume DRIVER_OK is faster than SUSPEND. DRIVER_OK is, by the laws of physics, faster than SUSPEND because it does not require the driver to read the status back. There is no driver-side loop to check whether the device accepted DRIVER_OK or not. Agree? > > > >>> And prefer to returning SUSPEND=true at slow pace. > >>> This means, the device implementation cannot immediately return > >> suspend=true right after it was written. > >>> A MMIO read will read it back, as suspend=true. > >>> > >>> An alternative would be, to forward CPU loads and CPU stores to > >>> different > >> address. > >>> However, this does not work for the hw based devices. > >>> > >>> That means, PCI HW needs to return suspend=0, until the device is > >>> not > >> suspended. > >>> In this example, the device cannot build special circuitry to answer > >> suspend=true within 50nsec, or in other words building special > >> circuitry to return suspend=false is too complex for the slow operation. > >> why? The device can just not to change the value of the SUSPEND bit > >> before it has fully suspended. > > When driver wrote, it wrote suspend=true, And device returns > > suspend=false while suspend is ongoing, right? > > If yes, this is expensive because the device needs to operate within 50nsec > or less to answer suspend=false.
> > > > And even worst, it needs to suspend=true when unsuspending within > 50nsec when resuming is ongoing. > again, there is not a 50nsec constraining and please take a reference of how > DRIVER_OK work with virtualization. There is. The device is expected to return back the desired value to indicate driver that suspend is ongoing. It is different than DRIVER_OK. > > > >>> If this understanding of burden is clear, > >>> > >>> The proposal is, can you please extend the interface such that, > >>> > >>> 1. driver writes suspend command. > >>> 2. driver reads suspend_status, and receives not_completed=(false). > >>> This is > >> the default value. > >>> 3. When the device completes suspend, it changes the polarity of > >> suspend_status=true. > >>> This has two main benefits: > >>> [A] This will enable software-based devices to write data to slow > >>> files and > >> does not have to force VM_EXITs. > >>> [B] It also enables hw based devices to not build special circuitry > >>> to answer > >> within 50nsec, which can get very complicated for tens or hundreds of > >> PCI PFs. > >> I think we have already discussed on this before in V5, and Jason has > >> some insightful comments > >> > > Unfortunately, not. His comment was that it is not specific to suspend. > > But here we are introducing a new interface and functionality that does not > need to suffer or follow anything that may not be efficient. 
> > > >> https://lore.kernel.org/virtio-comment/20240612082055-mutt-send-email-mst@kernel.org/T/#mc817fc6ca12ff0bcbae62b43b6146a177ecf13a9 > >>>> this is out of the spec anyway. > >>>> > >>>> Thanks > >>>> Zhu Lingshan > >>>>>> Thanks > >>>>>> Zhu Lingshan > >>>>>>>> +\devicenormative{\subsection}{Device Suspend}{General > >>>>>>>> +Initialization And Device Operation / Device Suspend} > >>>>>>>> + > >>>>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or > >>>>>>>> VIRTIO_F_SUSPEND is not negotiated. > >>>>>>>> + > >>>>>>>> +The device MUST ignore all access to its Configuration Space > >>>>>>>> +while suspended, except for \field{device status} if it is > >>>>>>>> +part of the Configuration > >>>>>>>> Space. > >>>>>>>> + > >>>>>>>> +A device MUST NOT send any notifications for any virtqeuues, > >>>>>>>> +access any virtqueues, or modify any fields in its > >>>>>>>> +Configuration Space while suspended. > >>>>>>>> + > >>>>>>>> +If changes occur in the Configuration Space while the SUSPEND > >>>>>>>> +bit is set, the device MUST NOT send any configuration change > >>>> notifications. > >>>>>>>> +Instead, the device MUST send the notification after the > >>>>>>>> +SUSPEND bit has > >>>>>>>> been cleared. > >>>>>>>> + > >>>>>>>> +When the driver sets SUSPEND, the device MUST either suspend > >>>>>>>> +itself or set > >>>>>>>> DEVICE_NEEDS_RESET if failed to suspend.
> >>>>>>>> + > >>>>>>>> +If SUSPEND is set in \field{device status}, when the driver > >>>>>>>> +clears SUSPEND, the device MUST either resume normal > operation > >>>>>>>> +or set > >>>>>>>> DEVICE_NEEDS_RESET. > >>>>>>>> + > >>>>>>>> +When the driver sets SUSPEND, > >>>>>>>> +the device SHOULD perform the following actions before > >>>>>>>> +presenting that > >>>>>>>> the SUSPEND bit is set to 1 in the \field{device status}: > >>>>>>>> + > >>>>>>>> +\begin{itemize} > >>>>>>>> +\item Stop processing more buffers of any virtqueues \item > >>>>>>>> +Wait until all buffers that are being processed have been used. > >>>>>>>> +\item Send used buffer notifications to the driver. > >>>>>>>> +\end{itemize} > >>>>>>>> + > >>>>>>>> \chapter{Virtio Transport Options}\label{sec:Virtio Transport > >>>>>>>> Options} > >>>>>>>> > >>>>>>>> Virtio can use various different buses, thus the standard is > >>>>>>>> split @@ -872,6 > >>>>>>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved > >>>>>>>> +Feature > >>>>>>>> Bits} > >>>>>>>> \ref{devicenormative:Basic Facilities of a Virtio Device / > >>>>>>>> Feature Bits} for > >>>>>>>> handling features reserved for future use. > >>>>>>>> > >>>>>>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the > >>>>>>>> + driver > >>>> can > >>>>>>>> + trigger suspending the device via the SUSPEND flag > >>>>>>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status > Field}. > >>>>>>>> + > >>>>>>>> \end{description} > >>>>>>>> > >>>>>>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved > >>>>>>>> Feature Bits} > >>>>>>>> -- > >>>>>>>> 2.45.2 > >>>>>>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
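Editorial illustration (not part of the thread): a minimal sketch of the alternative interface proposed in the message above, where the driver writes a suspend command and then polls a separate suspend_status that defaults to false and turns true once the device has finished suspending. Only the name suspend_status appears in the thread. `ToyDeviceWithSuspendStatus`, `write_suspend_cmd`, `tick`, `work_units`, and `max_polls` are hypothetical, and `tick` merely stands in for time passing inside the device.

```python
class ToyDeviceWithSuspendStatus:
    """Hypothetical device exposing a suspend command and a separate status register."""

    def __init__(self, work_units=3):
        self.suspend_cmd = False
        self.suspend_status = False  # default value: suspend not completed
        self._remaining = work_units

    def write_suspend_cmd(self):
        # Step 1: the driver requests suspend.
        self.suspend_cmd = True

    def tick(self):
        # Background progress, decoupled from register reads (assumption of this sketch).
        if self.suspend_cmd and not self.suspend_status:
            self._remaining -= 1
            if self._remaining <= 0:
                self.suspend_status = True  # Step 3: device flips the status

    def read_suspend_status(self):
        # Step 2: a plain read with no side effects.
        return self.suspend_status


def driver_suspend_via_status_register(dev, max_polls=100):
    """Return the number of reads needed to observe completion, or None on timeout."""
    dev.write_suspend_cmd()
    for polls in range(1, max_polls + 1):
        if dev.read_suspend_status():
            return polls
        dev.tick()  # stands in for time elapsing between polls
    return None
```

The sketch only shows the shape of the proposed handshake. Whether it reduces hardware cost or avoids VM_EXITs is the claim under dispute in the thread, and this model does not settle it.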
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-30 3:02 ` Parav Pandit @ 2024-09-03 9:05 ` Zhu Lingshan 2024-09-03 9:45 ` Michael S. Tsirkin 2024-09-03 10:28 ` Parav Pandit 0 siblings, 2 replies; 69+ messages in thread From: Zhu Lingshan @ 2024-09-03 9:05 UTC (permalink / raw) To: Parav Pandit, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com Cc: virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 8/30/2024 11:02 AM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> >> Sent: Friday, August 30, 2024 8:02 AM >> >> On 8/15/2024 5:34 PM, Parav Pandit wrote: >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Sent: Thursday, August 15, 2024 1:53 PM >>>> >>>> On 8/13/2024 2:55 PM, Parav Pandit wrote: >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>> Sent: Tuesday, August 13, 2024 11:44 AM >>>>>> >>>>> I removed the unreachable intel email id as every single email is >>>>> bouncing >>>> from it. >>>>> Please consider dropping that email from v8 as it will cause all >>>>> reviewers >>>> email to bounce. >>>>>> On 8/13/2024 1:50 PM, Parav Pandit wrote: >>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>>>> Sent: Tuesday, August 13, 2024 11:15 AM >>>>>>>> >>>>>>>> On 8/13/2024 12:42 PM, Parav Pandit wrote: >>>>>>>>> Hi Lingshan, David, >>>>>>>>> >>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>>>>>> Sent: Thursday, August 1, 2024 5:05 PM >>>>>>>>>> >>>>>>>>>> This commit allows the driver to suspend the device by >>>>>>>>>> introducing a new status bit SUSPEND in device_status. >>>>>>>>>> >>>>>>>>>> This commit also introduce a new feature bit VIRTIO_F_SUSPEND >>>>>>>>>> which indicating whether the device support SUSPEND. 
>>>>>>>>>> >>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> >>>>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> >>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>>>>>> Signed-off-by: David Stevens <stevensd@chromium.org> >>>>>>>>>> --- >>>>>>>>>> content.tex | 75 >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++- >>>>>>>> ---- >>>>>>>>>> -- >>>>>>>>>> 1 file changed, 65 insertions(+), 10 deletions(-) >>>>>>>>>> >>>>>>>>>> Changes from V6: >>>>>>>>>> - the device should hold its config interrupt while SUSPEND, >>>>>>>>>> and send config interrupt when the SUSPEND bit is cleared. >>>>>>>>>> - while SUSPEND, the driver MUST NOT access Device >>>>>>>>>> Configuration Space >>>>>>>>>> - minor changes. >>>>>>>>>> >>>>>>>>>> Changes from V5: >>>>>>>>>> - the device should present NEEDS_RESET if failed to suspend >>>>>>>>>> - allow the driver access device status in the config space when >>>>>>>>>> suspended if it is implemented in config space. >>>>>>>>>> - language improvements >>>>>>>>>> >>>>>>>>>> Changes from V4: >>>>>>>>>> - re-order the device status bits section >>>>>>>>>> - kick vqs --> notify vqs >>>>>>>>>> >>>>>>>>>> Changes from V3: >>>>>>>>>> - allow the driver clearing the SUSPEND bit to resume the device. >>>>>>>>>> - disallow access to config space while suspended. >>>>>>>>>> >>>>>>>>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 >>>>>>>>>> 100644 >>>>>>>>>> --- a/content.tex >>>>>>>>>> +++ b/content.tex >>>>>>>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} >>>>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>>>>>>>> this bit. For example, under Linux, drivers can be loadable >> modules. >>>>>>>>>> \end{note} >>>>>>>>>> >>>>>>>>>> -\item[FAILED (128)] Indicates that something went wrong in the >>>>>>>>>> guest, >>>>>>>>>> - and it has given up on the device. 
This could be an internal >>>>>>>>>> - error, or the driver didn't like the device for some reason, >>>>>>>>>> or >>>>>>>>>> - even a fatal error during device operation. >>>>>>>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and >>>>>>>>>> +ready to >>>>>>>>>> + drive the device. >>>>>>>>>> >>>>>>>>>> \item[FEATURES_OK (8)] Indicates that the driver has >>>>>>>>>> acknowledged all >>>>>>>> the >>>>>>>>>> features it understands, and feature negotiation is complete. >>>>>>>>>> >>>>>>>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and >>>>>>>>>> ready to >>>>>>>>>> - drive the device. >>>>>>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, >>>>>> indicates >>>>>>>>>> +that the >>>>>>>>>> + device has been suspended by the driver. >>>>>>>>>> >>>>>>>>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has >>>>>>>> experienced >>>>>>>>>> an error from which it can't recover. >>>>>>>>>> + >>>>>>>>>> +\item[FAILED (128)] Indicates that something went wrong in the >>>>>>>>>> +guest, >>>>>>>>>> + and it has given up on the device. This could be an internal >>>>>>>>>> + error, or the driver didn't like the device for some reason, >>>>>>>>>> +or >>>>>>>>>> + even a fatal error during device operation. >>>>>>>>>> \end{description} >>>>>>>>>> >>>>>>>>>> The \field{device status} field starts out as 0, and is >>>>>>>>>> reinitialized to 0 by @@ - >>>>>>>>>> 60,8 +63,9 @@ \section{\field{Device Status} >>>>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>>>>>>>> initialization sequence specified in \ref{sec:General >>>>>>>>>> Initialization And Device Operation / Device >>>>>>>> Initialization}. >>>>>>>>>> -The driver MUST NOT clear a >>>>>>>>>> -\field{device status} bit. If the driver sets the FAILED bit, >>>>>>>>>> +The driver MUST NOT clear a \field{device status} bit other >>>>>>>>>> +than SUSPEND except when setting \field{device status} to 0 as >>>>>>>>>> +a transport-specific way to initiate a reset. 
If the driver >>>>>>>>>> +sets the FAILED bit, >>>>>>>>>> the driver MUST later reset the device before attempting to >>>>>>>>>> re- >>>>>> initialize. >>>>>>>>>> The driver SHOULD NOT rely on completion of operations of a >> @@ >>>>>>>>>> -99,10 >>>>>>>>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of >>>>>>>>>> +a Virtio Device >>>>>>>>>> / Feature B \begin{description} >>>>>>>>>> \item[0 to 23, and 50 to 127] Feature bits for the specific >>>>>>>>>> device type >>>>>>>>>> >>>>>>>>>> -\item[24 to 41] Feature bits reserved for extensions to the >>>>>>>>>> queue and >>>>>>>>>> +\item[24 to 42] Feature bits reserved for extensions to the >>>>>>>>>> +queue and >>>>>>>>>> feature negotiation mechanisms >>>>>>>>>> >>>>>>>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for >>>>>>>>>> future extensions. >>>>>>>>>> +\item[43 to 49, and 128 and above] Feature bits reserved for >>>>>>>>>> +future >>>>>>>>>> extensions. >>>>>>>>>> \end{description} >>>>>>>>>> >>>>>>>>>> \begin{note} >>>>>>>>>> @@ -629,6 +633,53 @@ \section{Device >> Cleanup}\label{sec:General >>>>>>>>>> Initialization And Device Operation / >>>>>>>>>> >>>>>>>>>> Thus a driver MUST ensure a virtqueue isn't live (by device >>>>>>>>>> reset) before removing exposed buffers. >>>>>>>>>> >>>>>>>>>> +\section{Device Suspend}\label{sec:General Initialization And >>>>>>>>>> +Device Operation / Device Suspend} >>>>>>>>>> + >>>>>>>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the >>>>>>>>>> +SUSPEND bit in \field{device status} to suspend a device, and >>>>>>>>>> +can clear the SUSPEND bit to resume a suspended device. >>>>>>>>>> + >>>>>>>>>> +\drivernormative{\subsection}{Device Suspend}{General >>>>>>>>>> +Initialization And Device Operation / Device Suspend} >>>>>>>>>> + >>>>>>>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or >>>>>>>>>> VIRTIO_F_SUSPEND is not negotiated. 
>>>>>>>>>> + >>>>>>>>>> +Once the driver sets SUSPEND to \field{device status} of the >> device: >>>>>>>>>> +\begin{itemize} >>>>>>>>>> +\item The driver MUST re-read \field{device status} to verify >>>>>>>>>> +whether the >>>>>>>>>> SUSPEND bit is set. >>>>>>>>>> +\item The driver MUST NOT make any more buffers available to >>>>>>>>>> +the >>>>>>>> device. >>>>>>>>>> +\item The driver MUST NOT access any virtqueues or send >>>>>>>>>> +notifications for >>>>>>>>>> any virtqueues. >>>>>>>>>> +\item The driver MUST NOT access Device Configuration Space. >>>>>>>>>> +\end{itemize} >>>>>>>>>> + >>>>>>>> Hi Parva >>>>>>>>> Do we agree that >>>>>>>>> a. suspending a device is non frequent operation (in order of N >>>>>>>> operations/sec, where N is roughly in range of 10 or 100) per device? >>>>>>>> Ideally it should not be often in normal operations, but remember >>>>>>>> we can not restrict the behaviors of the driver, so we must be >>>>>>>> able to handle the scenario in which SUSPENDING is often. >>>>>>> Sure. the intent is slow rate, but one can do at unexpected times. >>>>>>> Do you agree? >>>>>> I think we don't have an intention of the frequency in the spec. >>>>> Sure. >>>>> >>>>>> The spec only provides generic mechanisms and interfaces. >>>>> Sure. >>>>> >>>>>> Don't assume it(or driver wants it to be) would be often or not, >>>>>> that depends on the driver. >>>>> As you rightly said : it cannot be assumed. >>>>> The driver will read the device status right after it wrote it. This >>>>> typically is < >>>> 50nsec of time. >>>>> The suspend operation for a net device to store hundreds of queues, >>>>> RSS >>>> table, flow filters, takes plenty of time (at least more than 50nsec :) ). >>>>> Similarly for the GPU to store some MBs of memory takes more than >>>>> 50nsec >>>> of time, for example to store in a file for a software-based GPU device. >>>>> So a device cannot respond back suspend=true in next 50nsec time. 
>>>> It's OK for the device to take longer time to respond, the driver >>>> simply re-reads device status. >>> Sure, the issue is, when the driver re-reads, the device must present >> suspend=false within 50nsec. >>> (because device didn't suspend it yet). >>> >>> As I explained in previous email, this requires building special circuitry. >>> Such circuitry can be avoided if the suspend interface is done slightly >> differently. >> why there is a constraint condition of the time? > Because this is what driver does as you explained in [1] indicating "we must be able to handle ..." > > [1] https://lore.kernel.org/virtio-comment/c4d5eed3-774b-4d35-a007-f9dff28ce884@amd.com/T/#m6f081f96ef9dcea29c64a88b633eb21d50e8c410 The sentence is "we must be able to handle the scenario in which SUSPENDING is often", where do you see any constraint conditions of time? > >> Are there any similar >> constraining for other states like RESET or DRIVER_OK? Don't assume any >> other states transitions are faster than SUSPEND. > DRIVER_OK does not suffer from it because it is async notification. > A device may start slow after DRIVER_OK. > SUSPEND operation cannot rely on such async behavior. The driver can set suspend and re-read & wait. This is a common routine in the driver. > > RESET also suffers from similar inefficiencies. > But that is because it is inherited from the past. > > Here a new functionality is being proposed and it has a chance for efficient device implementation. > Therefore the request is to improve it. sure, as long as we confirm this new register apply for all device status transitions, not only for SUSPEND. > >>>>> More below. >>>>> >>>>>>>>> b. A software-based device may not always want to force VM_EXIT >>>>>>>>> on read >>>>>>>> and write on the device_status register? >>>>>>>> Trap and Emulation is the basic of virtualization, and how to >>>>>>>> pass-through a device is out of this spec. >>>>>>>> >>>>>>> Sure, I didn't suggest to put such things in the spec. 
>>>>>>> My question is, whether to trap and emulate or not is a choice of >>>>>>> the >>>>>> software. >>>>>>> Do you agree? >>>>>> The device emulator does not know anything about whether trapped >> or >>>> not. >>>>>> Trapping and Emulation is a hypervisor thing. >>>>>> >>>>>> If here "software" refers to the device emulator, then Yes, it is >>>>>> not the emulator's decision. And the device should not be aware of >>>>>> VM_EXIT & VM_ENTRY. >>>>>> >>>>> Right. So a software wants to implement device_status as pure MMIO >>>> writes. (and not VM_EXIT). >>>> This is not always true, there can be VM_EXIT of pure emulated >>>> devices. >>> I am not denying that VM_EXIT can/cannot be there. >>> I am saying, the proposal forces VM_EXIT based approach in my >> understanding of this patch. > [MARKER_1] > >>> If that is not true, may be can you please explain how this can be >> implemented without VM_EXIT? >>> We should have an interface that can be done with and without VM_EXIT >> method at least for any new additions. >> VM_EXIT is out of spec, it is a hypervisor and the processor thing. > You keep repeating VM_EXIST is out of spec. > I already replied at [MARKER_1], sure it is out of spec, the current approach forces one to do VM_EXIT based approach. > And if not, please explain, how can it be achieved? It is not the current solution force the hypervisor & guest to perform VM_EXIT. In basic virtualization, any access to hypervisor emulated HW registers would be Traped and Emulated, means a VM_EXIT and a VM_ENTRY. > >> In non-pass-through case, any register access are sensitive and will trigger >> VM_EXIT. Like RESET or DRIVER_OK needs to access device_status, nothing >> different. > I don't think you understood the point. > Let me repeat, The question is, if the device implementation wants to achieve the functionality without VM_EXIT, what is the way? Map the register to guest address space, out of the spec. 
> >>>> The >>>> HW registers are sensitive resource and any access to them need to be >>>> trapped and emulated. >>> This does not apply to PCI PFs and VFs which are HW devices (mainly PFs). >>> so this trap + emulation is narrow view that we better avoid. >> This is how *basic* virtualization work, once access sensitive resource, trap it. >>> If you think this is the way forward, you should put forward in patch as >> MUST requirement. >>> and that does not look right to me. >>> I hope you also don't mean to force this method to device >> implementations. >>> Right? >> Again, VM_EXIT is a hypervisor thing, out of spec. Whether there is a >> VM_EXIT when setting SUSPEND totally depends on the virtualization >> solution. And SUSPEND is nothing different from DRIVER_OK. >> > Please avoid repeating the point that VM_EXIT is hypervosor thing. > No one asked to put this in spec. Please re-read [MARKER1]. You probably missed that. > > SUSPEND is different than DRIVER_OK. I explained the timing constraints and the required circuitry needed to fulfill the proposal. > And with the additional register, such complicated circuitry can be easily avoided. Please read QEMU code, that can help you find how the trap - emulate mechanism work for guest when it tries to access sensitive resources. > >> Means, if your virtualization needs to trap SUSPEND, it also needs to trap >> DRIVER_OK, and don't assume DRIVER_OK is faster than SUSPEND. > DRIVER_OK is by law of physics is faster than SUSPEND because it does not demand the driver of reading back. > There is no driver side loop to check if the device accepted DRIVER_OK or not. > Agree? Why do you assume so? In some vendor implementation, DRIVER_OK can be slow. Can DRIVER_OK implement a deferred initialization even deferred resource allocation? Does the spec say that DRIVER_OK must be faster? > >>>>> And prefer to returning SUSPEND=true at slow pace. 
>>>>> This means, the device implementation cannot immediately return >>>> suspend=true right after it was written. >>>>> A MMIO read will read it back, as suspend=true. >>>>> >>>>> An alternative would be, to forward CPU loads and CPU stores to >>>>> different >>>> address. >>>>> However, this does not work for the hw based devices. >>>>> >>>>> That means, PCI HW needs to return suspend=0, until the device is >>>>> not >>>> suspended. >>>>> In this example, the device cannot build special circuitry to answer >>>> suspend=true within 50nsec, or in other words building special >>>> circuitry to return suspend=false is too complex for the slow operation. >>>> why? The device can just not to change the value of the SUSPEND bit >>>> before it has fully suspended. >>> When driver wrote, it wrote suspend=true, And device returns >>> suspend=false while suspend is ongoing, right? >>> If yes, this is expensive because the device needs to operate within 50nsec >> or less to answer suspend=false. >>> And even worst, it needs to suspend=true when unsuspending within >> 50nsec when resuming is ongoing. >> again, there is not a 50nsec constraining and please take a reference of how >> DRIVER_OK work with virtualization. > There is. The device is expected to return back the desired value to indicate driver that suspend is ongoing. > It is different than DRIVER_OK. where does the spec or any code say 50nsec? > >>>>> If this understanding of burden is clear, >>>>> >>>>> The proposal is, can you please extend the interface such that, >>>>> >>>>> 1. driver writes suspend command. >>>>> 2. driver reads suspend_status, and receives not_completed=(false). >>>>> This is >>>> the default value. >>>>> 3. When the device completes suspend, it changes the polarity of >>>> suspend_status=true. >>>>> This has two main benefits: >>>>> [A] This will enable software-based devices to write data to slow >>>>> files and >>>> does not have to force VM_EXITs. 
>>>>> [B] It also enables hw based devices to not build special circuitry >>>>> to answer >>>> within 50nsec, which can get very complicated for tens or hundreds of >>>> PCI PFs. >>>> I think we have already discussed on this before in V5, and Jason has >>>> some insightful comments >>>> >>> Unfortunately, not. His comment was that it is not specific to suspend. >>> But here we are introducing a new interface and functionality that does not >> need to suffer or follow anything that may not be efficient. >>>> https://lore.kernel.org/virtio-comment/20240612082055-mutt-send-email-mst@kernel.org/T/#mc817fc6ca12ff0bcbae62b43b6146a177ecf13a9 >>>>>> this is out of the spec anyway. >>>>>> >>>>>> Thanks >>>>>> Zhu Lingshan >>>>>>>> Thanks >>>>>>>> Zhu Lingshan >>>>>>>>>> +\devicenormative{\subsection}{Device Suspend}{General >>>>>>>>>> +Initialization And Device Operation / Device Suspend} >>>>>>>>>> + >>>>>>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or >>>>>>>>>> VIRTIO_F_SUSPEND is not negotiated. >>>>>>>>>> + >>>>>>>>>> +The device MUST ignore all access to its Configuration Space >>>>>>>>>> +while suspended, except for \field{device status} if it is >>>>>>>>>> +part of the Configuration >>>>>>>>>> Space. >>>>>>>>>> + >>>>>>>>>> +A device MUST NOT send any notifications for any virtqeuues, >>>>>>>>>> +access any virtqueues, or modify any fields in its >>>>>>>>>> +Configuration Space while suspended.
>>>>>>>>>> + >>>>>>>>>> +If changes occur in the Configuration Space while the SUSPEND >>>>>>>>>> +bit is set, the device MUST NOT send any configuration change >>>>>> notifications. >>>>>>>>>> +Instead, the device MUST send the notification after the >>>>>>>>>> +SUSPEND bit has >>>>>>>>>> been cleared. >>>>>>>>>> + >>>>>>>>>> +When the driver sets SUSPEND, the device MUST either suspend >>>>>>>>>> +itself or set >>>>>>>>>> DEVICE_NEEDS_RESET if failed to suspend. >>>>>>>>>> + >>>>>>>>>> +If SUSPEND is set in \field{device status}, when the driver >>>>>>>>>> +clears SUSPEND, the device MUST either resume normal >> operation >>>>>>>>>> +or set >>>>>>>>>> DEVICE_NEEDS_RESET. >>>>>>>>>> + >>>>>>>>>> +When the driver sets SUSPEND, >>>>>>>>>> +the device SHOULD perform the following actions before >>>>>>>>>> +presenting that >>>>>>>>>> the SUSPEND bit is set to 1 in the \field{device status}: >>>>>>>>>> + >>>>>>>>>> +\begin{itemize} >>>>>>>>>> +\item Stop processing more buffers of any virtqueues \item >>>>>>>>>> +Wait until all buffers that are being processed have been used. >>>>>>>>>> +\item Send used buffer notifications to the driver. >>>>>>>>>> +\end{itemize} >>>>>>>>>> + >>>>>>>>>> \chapter{Virtio Transport Options}\label{sec:Virtio Transport >>>>>>>>>> Options} >>>>>>>>>> >>>>>>>>>> Virtio can use various different buses, thus the standard is >>>>>>>>>> split @@ -872,6 >>>>>>>>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved >>>>>>>>>> +Feature >>>>>>>>>> Bits} >>>>>>>>>> \ref{devicenormative:Basic Facilities of a Virtio Device / >>>>>>>>>> Feature Bits} for >>>>>>>>>> handling features reserved for future use. >>>>>>>>>> >>>>>>>>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the >>>>>>>>>> + driver >>>>>> can >>>>>>>>>> + trigger suspending the device via the SUSPEND flag >>>>>>>>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status >> Field}. 
>>>>>>>>>> + >>>>>>>>>> \end{description} >>>>>>>>>> >>>>>>>>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved >>>>>>>>>> Feature Bits} >>>>>>>>>> -- >>>>>>>>>> 2.45.2 >>>>>>>>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
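The driver-side sequence in the quoted patch (check the preconditions, set SUSPEND, then re-read \field{device status} until the bit is presented or DEVICE_NEEDS_RESET shows up) can be sketched in C against a toy device model. This is only an illustration under stated assumptions: the status bit values and feature bit 42 come from the patch, while `struct sim_dev`, `sim_read_status`, `sim_write_status`, `driver_suspend`, the `busy_reads` countdown and the return codes are hypothetical names invented for this sketch and appear in neither the spec nor any driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values taken from the quoted patch. */
#define STATUS_FEATURES_OK        8u
#define STATUS_SUSPEND            16u
#define STATUS_DEVICE_NEEDS_RESET 64u
#define VIRTIO_F_SUSPEND          42

/* Hypothetical simulated device, for illustration only. */
struct sim_dev {
    uint8_t  status;     /* value presented on a status read */
    uint64_t features;   /* negotiated feature bits (bits 0..63 only) */
    bool     suspending; /* a suspend request is in flight */
    int      busy_reads; /* status reads left before the suspend completes */
    bool     fail;       /* simulate a suspend that fails */
};

/* A read never blocks: it reports SUSPEND only once the device is done. */
static uint8_t sim_read_status(struct sim_dev *d)
{
    if (d->suspending) {
        if (d->busy_reads > 0) {
            d->busy_reads--;
        } else {
            d->suspending = false;
            d->status |= d->fail ? STATUS_DEVICE_NEEDS_RESET : STATUS_SUSPEND;
        }
    }
    return d->status;
}

/* The device keeps SUSPEND clear until it has fully suspended. */
static void sim_write_status(struct sim_dev *d, uint8_t val)
{
    if ((val & STATUS_SUSPEND) && !(d->status & STATUS_SUSPEND)) {
        d->suspending = true;
        val &= (uint8_t)~STATUS_SUSPEND;
    }
    d->status = val;
}

/*
 * Returns 0 once SUSPEND is observed, -1 if the preconditions from the
 * patch are not met, -2 on DEVICE_NEEDS_RESET, -3 after max_polls reads.
 */
static int driver_suspend(struct sim_dev *d, int max_polls)
{
    uint8_t s = sim_read_status(d);

    if (!(s & STATUS_FEATURES_OK) ||
        !((d->features >> VIRTIO_F_SUSPEND) & 1))
        return -1;

    sim_write_status(d, (uint8_t)(s | STATUS_SUSPEND));

    for (int i = 0; i < max_polls; i++) {
        s = sim_read_status(d);
        if (s & STATUS_DEVICE_NEEDS_RESET)
            return -2;
        if (s & STATUS_SUSPEND)
            return 0;
        /* A real driver would sleep or yield here between reads. */
    }
    return -3;
}
```

In this model each status read returns immediately with the current value, which is the behaviour Zhu Lingshan describes ("the device can just not change the value of the SUSPEND bit before it has fully suspended"). Whether real hardware can answer such reads cheaply is the point under dispute in the thread, and the sketch takes no position on it.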
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-03 9:05 ` Zhu Lingshan
@ 2024-09-03 9:45 ` Michael S. Tsirkin
  2024-09-03 10:09 ` Parav Pandit
  2024-09-03 10:28 ` Parav Pandit
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2024-09-03 9:45 UTC
To: Zhu Lingshan
Cc: Parav Pandit, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On Tue, Sep 03, 2024 at 05:05:40PM +0800, Zhu Lingshan wrote:
> >> Are there any similar
> >> constraining for other states like RESET or DRIVER_OK? Don't assume any
> >> other states transitions are faster than SUSPEND.
> > DRIVER_OK does not suffer from it because it is async notification.
> > A device may start slow after DRIVER_OK.
> > SUSPEND operation cannot rely on such async behavior.
> The driver can set suspend and re-read & wait. This is a common routine in the driver.

I don't buy all this talk about special machinery, some 10 nor gates,
the cost is in the noise - we are talking about cards with several
ARM processors on chip.

But I don't like it that looking at the registers, one does not
know the device state. Hidden state is bad for debuggability.
We have 4 states:
suspending->suspended->resuming->resumed
so we need a register with at least 2 bits.

we could steal 2 bits from status but it seems a bit much.

> > RESET also suffers from similar inefficiencies.
> > But that is because it is inherited from the past.
> >
> > Here a new functionality is being proposed and it has a chance for efficient device implementation.
> > Therefore the request is to improve it.
> sure, as long as we confirm this new register apply for all device status transitions, not only for SUSPEND.
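To make the "four states, at least 2 bits" remark concrete, here is a small C sketch of a 2-bit state field. The thread does not define any register layout, so everything here is an assumption made for illustration: the numeric encoding, the field position (`SUSP_STATE_SHIFT`) and all of the `susp_state_*` names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical encoding; the thread only says four states need >= 2 bits. */
enum susp_state {
    SUSP_RESUMED    = 0,
    SUSP_SUSPENDING = 1,
    SUSP_SUSPENDED  = 2,
    SUSP_RESUMING   = 3,
};

#define SUSP_STATE_SHIFT 0u                       /* assumed field position */
#define SUSP_STATE_MASK  (0x3u << SUSP_STATE_SHIFT)

/* Extract the 2-bit state from a register value. */
static enum susp_state susp_state_get(uint32_t reg)
{
    return (enum susp_state)((reg & SUSP_STATE_MASK) >> SUSP_STATE_SHIFT);
}

/* Return reg with the state field replaced and all other bits preserved. */
static uint32_t susp_state_set(uint32_t reg, enum susp_state s)
{
    return (reg & ~SUSP_STATE_MASK) |
           (((uint32_t)s << SUSP_STATE_SHIFT) & SUSP_STATE_MASK);
}

/* Next state in the cycle suspending -> suspended -> resuming -> resumed. */
static enum susp_state susp_state_next(enum susp_state s)
{
    return (enum susp_state)(((unsigned)s + 1u) & 0x3u);
}
```

With an explicit state field like this, a register dump shows whether a transition is still in progress, which is the debuggability concern raised above. Where such a field would live (in \field{device status} or in a separate register) is left open by the thread.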
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-03 9:45 ` Michael S. Tsirkin
@ 2024-09-03 10:09 ` Parav Pandit
  2024-09-03 10:35 ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Parav Pandit @ 2024-09-03 10:09 UTC
To: Michael S. Tsirkin, Zhu Lingshan
Cc: cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, September 3, 2024 3:16 PM
>
> On Tue, Sep 03, 2024 at 05:05:40PM +0800, Zhu Lingshan wrote:
> > >> Are there any similar
> > >> constraining for other states like RESET or DRIVER_OK? Don't assume
> > >> any other states transitions are faster than SUSPEND.
> > > DRIVER_OK does not suffer from it because it is async notification.
> > > A device may start slow after DRIVER_OK.
> > > SUSPEND operation cannot rely on such async behavior.
> > The driver can set suspend and re-read & wait. This is a common routine in the driver.
>
> I don't buy all this talk about special machinery, some 10 nor gates, the cost
> is in the noise - we are talking about cards with several ARM processors on
> chip.
They may not respond back under 50nsec.
How many cores would you like to dedicate and for how many PCI functions?
And how they can be easily underutilized...
So, it is not a noise.

Circuitry is the device abstraction that we keep instead of digging down on gates.

Also can you please explain how can it work without VM_EXIT? You have better insight to this than me..

>
> But I don't like it that looking at the registers, one does not know the device
> state. Hidden state is bad for debuggability.
> We have 4 states:
> suspending->suspended->resuming->resumed
> so we need a register with at least 2 bits.
>
> we could steal 2 bits from status but it seems a bit much.
>
This is why letting the status tell the status and control register to control thing is elegant.
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-03 10:09 ` Parav Pandit
@ 2024-09-03 10:35 ` Michael S. Tsirkin
  2024-09-03 10:37 ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2024-09-03 10:35 UTC
To: Parav Pandit
Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On Tue, Sep 03, 2024 at 10:09:34AM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, September 3, 2024 3:16 PM
> >
> > On Tue, Sep 03, 2024 at 05:05:40PM +0800, Zhu Lingshan wrote:
> > > >> Are there any similar
> > > >> constraining for other states like RESET or DRIVER_OK? Don't assume
> > > >> any other states transitions are faster than SUSPEND.
> > > > DRIVER_OK does not suffer from it because it is async notification.
> > > > A device may start slow after DRIVER_OK.
> > > > SUSPEND operation cannot rely on such async behavior.
> > > The driver can set suspend and re-read & wait. This is a common routine in the driver.
> >
> > I don't buy all this talk about special machinery, some 10 nor gates, the cost
> > is in the noise - we are talking about cards with several ARM processors on
> > chip.
> They may not respond back under 50nsec.
> How many cores would you like to dedicate and for how many PCI functions?
> And how they can be easily underutilized...
> So, it is not a noise.
>
> Circuitry is the device abstraction that we keep instead of digging down on gates.
>
> Also can you please explain how can it work without VM_EXIT? You have better insight to this than me..

I don't know which interface we are arguing about here,
when there's a specific proposal, I can comment.

> >
> > But I don't like it that looking at the registers, one does not know the device
> > state. Hidden state is bad for debuggability.
> > We have 4 states:
> > suspending->suspended->resuming->resumed
> > so we need a register with at least 2 bits.
> >
> > we could steal 2 bits from status but it seems a bit much.
> >
> This is why letting the status tell the status and control register to control thing is elegant.

No argument here.
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-03 10:35 ` Michael S. Tsirkin
@ 2024-09-03 10:37 ` Michael S. Tsirkin
  2024-09-04 3:07 ` Jason Wang
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2024-09-03 10:37 UTC
To: Parav Pandit
Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote:
> > >
> > > But I don't like it that looking at the registers, one does not know the device
> > > state. Hidden state is bad for debuggability.
> > > We have 4 states:
> > > suspending->suspended->resuming->resumed
> > > so we need a register with at least 2 bits.
> > >
> > > we could steal 2 bits from status but it seems a bit much.
> > >
> > This is why letting the status tell the status and control register to control thing is elegant.
>
>
> No argument here.

Or, to be more precise, our status is driver status. If we need
to reflect and control device status, we need something else.
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-03 10:37 ` Michael S. Tsirkin
@ 2024-09-04 3:07 ` Jason Wang
  2024-09-04 4:02 ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Jason Wang @ 2024-09-04 3:07 UTC
To: Michael S. Tsirkin
Cc: Parav Pandit, Zhu Lingshan, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote:
> > > >
> > > > But I don't like it that looking at the registers, one does not know the device
> > > > state. Hidden state is bad for debuggability.
> > > > We have 4 states:
> > > > suspending->suspended->resuming->resumed
> > > > so we need a register with at least 2 bits.
> > > >
> > > > we could steal 2 bits from status but it seems a bit much.
> > > >
> > > This is why letting the status tell the status and control register to control thing is elegant.
> >
> >
> > No argument here.
>
> Or, to be more precise, our status is driver status.

It looks like the device actually otherwise there's no need for
re-read or poll for things like reset and others. Anyhow driver know
its own status.

Thanks

> If we need
> to reflect and control device status, we need something else.
>
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-04 3:07 ` Jason Wang
@ 2024-09-04 4:02 ` Michael S. Tsirkin
  2024-09-04 6:31 ` Jason Wang
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2024-09-04 4:02 UTC
To: Jason Wang
Cc: Parav Pandit, Zhu Lingshan, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote:
> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote:
> > > > >
> > > > > But I don't like it that looking at the registers, one does not know the device
> > > > > state. Hidden state is bad for debuggability.
> > > > > We have 4 states:
> > > > > suspending->suspended->resuming->resumed
> > > > > so we need a register with at least 2 bits.
> > > > >
> > > > > we could steal 2 bits from status but it seems a bit much.
> > > > >
> > > > This is why letting the status tell the status and control register to control thing is elegant.
> > >
> > >
> > > No argument here.
> >
> > Or, to be more precise, our status is driver status.
>
> It looks like the device actually otherwise there's no need for
> re-read or poll for things like reset and others.

The need is there for complex device state transitions, which
can not reasonably block a read response.
Another standard approach with PCI is to specify the time
transitions can take. I consider that less elegant -
is this what you are advocating? The advantage is that
driver does not load the pci bus with constant re-polling.
The disadvantage is that it is hard to pick a universal
number. A combination of these approaches might work,
e.g. a recommended timeout then poll.

> Anyhow driver know
> its own status.
>
> Thanks

indeed, the status register is there to inform the device about
the driver status.

> > If we need
> > to reflect and control device status, we need something else.
> >
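The trade-off between constant re-polling and "a recommended timeout then poll" can be put into rough numbers. The C sketch below counts how many status reads a driver issues before it first sees a completed transition. It is back-of-the-envelope arithmetic under assumptions that are not from the thread: the transition completes a fixed time after the write, the first read is issued after an initial delay, and later reads follow at a fixed interval. The names `status_reads` and `div_ceil` and the parameter names are hypothetical.

```c
#include <stdint.h>

/* Ceiling division for b > 0. */
static uint64_t div_ceil(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

/*
 * Number of status reads issued until the completed transition is first
 * observed, assuming (hypothetically):
 *  - the transition completes transition_us after the status write,
 *  - the first read is issued first_delay_us after the write,
 *  - subsequent reads follow every poll_us microseconds (poll_us > 0).
 */
static uint64_t status_reads(uint64_t transition_us,
                             uint64_t first_delay_us,
                             uint64_t poll_us)
{
    if (first_delay_us >= transition_us)
        return 1; /* the very first read already sees the final state */
    return 1 + div_ceil(transition_us - first_delay_us, poll_us);
}
```

For example, with a 5000 us transition and a 1000 us poll interval, polling from time zero costs `status_reads(5000, 0, 1000)` = 6 reads, while waiting 4000 us first costs `status_reads(5000, 4000, 1000)` = 2 reads. The figures are illustrative only. Picking a delay that suits every device is the "hard to pick a universal number" problem mentioned above.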
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-04 4:02 ` Michael S. Tsirkin
@ 2024-09-04 6:31 ` Jason Wang
  2024-09-04 6:38 ` Zhu Lingshan
  0 siblings, 1 reply; 69+ messages in thread
From: Jason Wang @ 2024-09-04 6:31 UTC
To: Michael S. Tsirkin
Cc: Parav Pandit, Zhu Lingshan, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote:
> > On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote:
> > > > > >
> > > > > > But I don't like it that looking at the registers, one does not know the device
> > > > > > state. Hidden state is bad for debuggability.
> > > > > > We have 4 states:
> > > > > > suspending->suspended->resuming->resumed
> > > > > > so we need a register with at least 2 bits.
> > > > > >
> > > > > > we could steal 2 bits from status but it seems a bit much.
> > > > > >
> > > > > This is why letting the status tell the status and control register to control thing is elegant.
> > > >
> > > >
> > > > No argument here.
> > >
> > > Or, to be more precise, our status is driver status.
> >
> > It looks like the device actually otherwise there's no need for
> > re-read or poll for things like reset and others.
>
> The need is there for complex device state transitions, which
> can not reasonably block a read response.
> Another standard approach with PCI is to specify the time
> transitions can take. I consider that less elegant -
> is this what you are advocating? The advantage is that
> driver does not load the pci bus with constant re-polling.
> The disadvantage is that it is hard to pick a universal
> number. A combination of these approaches might work,
> e.g. a recommended timeout then poll.

We've already had msleep() for vp_reset(), anyhow we can increase the
sleep time, if it can overload the pci:

        while (vp_modern_get_status(mdev))
                msleep(1);

We can do the same for suspending.

The main blocker for timeout is that it may break migration and
complicate the hardening. Another proposal in the past is to have a
notification.

But what I don't understand here is that suspend/resume should be
lighter than reset. If we can afford a reset, so did the
suspending/resume. If we want to have something new, that's fine but
it should be orthogonal to a specific new status bit?

>
> > Anyhow driver know
> > its own status.
> >
> > Thanks
>
> indeed, the status register is there to inform the device about
> the driver status.
>
>
>
> > > If we need
> > > to reflect and control device status, we need something else.
> > >
>

Thanks
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-04 6:31 ` Jason Wang
@ 2024-09-04 6:38 ` Zhu Lingshan
  2024-09-04 6:46 ` Parav Pandit
  2024-09-05 6:51 ` Michael S. Tsirkin
  0 siblings, 2 replies; 69+ messages in thread
From: Zhu Lingshan @ 2024-09-04 6:38 UTC
To: Jason Wang, Michael S. Tsirkin
Cc: Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On 9/4/2024 2:31 PM, Jason Wang wrote:
> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote:
>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote:
>>>>>>> But I don't like it that looking at the registers, one does not know the device
>>>>>>> state. Hidden state is bad for debuggability.
>>>>>>> We have 4 states:
>>>>>>> suspending->suspended->resuming->resumed
>>>>>>> so we need a register with at least 2 bits.
>>>>>>>
>>>>>>> we could steal 2 bits from status but it seems a bit much.
>>>>>>>
>>>>>> This is why letting the status tell the status and control register to control thing is elegant.
>>>>>
>>>>> No argument here.
>>>> Or, to be more precise, our status is driver status.
>>> It looks like the device actually otherwise there's no need for
>>> re-read or poll for things like reset and others.
>> The need is there for complex device state transitions, which
>> can not reasonably block a read response.
>> Another standard approach with PCI is to specify the time
>> transitions can take. I consider that less elegant -
>> is this what you are advocating? The advantage is that
>> driver does not load the pci bus with constant re-polling.
>> The disadvantage is that it is hard to pick a universal
>> number. A combination of these approaches might work,
>> e.g. a recommended timeout then poll.
> We've already had msleep() for vp_reset(), anyhow we can increase the
> sleep time, if it can overload the pci:
>
>         while (vp_modern_get_status(mdev))
>                 msleep(1);
>
> We can do the same for suspending.
>
> The main blocker for timeout is that it may break migration and
> complicate the hardening. Another proposal in the past is to have a
> notification.
>
> But what I don't understand here is that suspend/resume should be
> lighter than reset. If we can afford a reset, so did the
> suspending/resume. If we want to have something new, that's fine but
> it should be orthogonal to a specific new status bit?
I agree, if we want new status indicator, then the new indicator should not be specific to SUSPEND.

Thanks
Zhu Lingshan
>
>>> Anyhow driver know
>>> its own status.
>>>
>>> Thanks
>> indeed, the status register is there to inform the device about
>> the driver status.
>>
>>
>>
>>>> If we need
>>>> to reflect and control device status, we need something else.
>>>>
> Thanks
>
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-04 6:38 ` Zhu Lingshan
@ 2024-09-04 6:46 ` Parav Pandit
  2024-09-05 7:14 ` Zhu Lingshan
  2024-09-05 6:51 ` Michael S. Tsirkin
  1 sibling, 1 reply; 69+ messages in thread
From: Parav Pandit @ 2024-09-04 6:46 UTC
To: Zhu Lingshan, Jason Wang, Michael S. Tsirkin
Cc: cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

> From: Zhu Lingshan <lingshan.zhu@amd.com>
> Sent: Wednesday, September 4, 2024 12:09 PM
>
> On 9/4/2024 2:31 PM, Jason Wang wrote:
> > On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote:
> >>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote:
> >>>>>>> But I don't like it that looking at the registers, one does not
> >>>>>>> know the device state. Hidden state is bad for debuggability.
> >>>>>>> We have 4 states:
> >>>>>>> suspending->suspended->resuming->resumed
> >>>>>>> so we need a register with at least 2 bits.
> >>>>>>>
> >>>>>>> we could steal 2 bits from status but it seems a bit much.
> >>>>>>>
> >>>>>> This is why letting the status tell the status and control register to control thing is elegant.
> >>>>>
> >>>>> No argument here.
> >>>> Or, to be more precise, our status is driver status.
> >>> It looks like the device actually otherwise there's no need for
> >>> re-read or poll for things like reset and others.
> >> The need is there for complex device state transitions, which can not
> >> reasonably block a read response.
> >> Another standard approach with PCI is to specify the time transitions
> >> can take. I consider that less elegant - is this what you are
> >> advocating? The advantage is that driver does not load the pci bus
> >> with constant re-polling.
> >> The disadvantage is that it is hard to pick a universal number. A
> >> combination of these approaches might work, e.g. a recommended
> >> timeout then poll.
> > We've already had msleep() for vp_reset(), anyhow we can increase the
> > sleep time, if it can overload the pci:
> >
> >         while (vp_modern_get_status(mdev))
> >                 msleep(1);
> >
> > We can do the same for suspending.
> >
> > The main blocker for timeout is that it may break migration and
> > complicate the hardening. Another proposal in the past is to have a
> > notification.
> >
> > But what I don't understand here is that suspend/resume should be
> > lighter than reset. If we can afford a reset, so did the
> > suspending/resume. If we want to have something new, that's fine but
> > it should be orthogonal to a specific new status bit?
> I agree, if we want new status indicator, then the new indicator should not be
> specific to SUSPEND.
>
Sounds good to me as long as new functionality does not force the device to react in 50nsec.
In the current state it forces the device so please upgrade the proposal as Michael guided.

Thanks.
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-04 6:46 ` Parav Pandit @ 2024-09-05 7:14 ` Zhu Lingshan 2024-09-05 7:16 ` Parav Pandit 2024-09-05 7:17 ` Michael S. Tsirkin 0 siblings, 2 replies; 69+ messages in thread From: Zhu Lingshan @ 2024-09-05 7:14 UTC (permalink / raw) To: Parav Pandit, Jason Wang, Michael S. Tsirkin Cc: cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/4/2024 2:46 PM, Parav Pandit wrote: >> From: Zhu Lingshan <lingshan.zhu@amd.com> >> Sent: Wednesday, September 4, 2024 12:09 PM >> >> On 9/4/2024 2:31 PM, Jason Wang wrote: >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> >> wrote: >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> >> wrote: >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: >>>>>>>>> But I don't like it that looking at the registers, one does not >>>>>>>>> know the device state. Hidden state is bad for debuggability. >>>>>>>>> We have 4 states: >>>>>>>>> suspending->suspended->resuming->resumed >>>>>>>>> so we need a register with at least 2 bits. >>>>>>>>> >>>>>>>>> we could steal 2 bits from status but it seems a bit much. >>>>>>>>> >>>>>>>> This is why letting the status tell the status and control register to >> control thing is elegant. >>>>>>> No argument here. >>>>>> Or, to be more precise, our status is driver status. >>>>> It looks like the device actually otherwise there's no need for >>>>> re-read or poll for things like reset and others. >>>> The need is there for complex device state transitions, which can not >>>> reasonably block a read response. >>>> Another standard approach with PCI is to specify the time transitions >>>> can take. I consider that less elegant - is this what you are >>>> advocating? The advantage is that driver does not load the pci bus >>>> with constant re-polling. 
>>>> The disadvantage is that it is hard to pick a universal number. A
>>>> combination of these approaches might work, e.g. a recommended
>>>> timeout then poll.
>>> We've already had msleep() for vp_reset(), anyhow we can increase the
>>> sleep time, if it can overload the pci:
>>>
>>>         while (vp_modern_get_status(mdev))
>>>                 msleep(1);
>>>
>>> We can do the same for suspending.
>>>
>>> The main blocker for timeout is that it may break migration and
>>> complicate the hardening. Another proposal in the past is to have a
>>> notification.
>>>
>>> But what I don't understand here is that suspend/resume should be
>>> lighter than reset. If we can afford a reset, so did the
>>> suspending/resume. If we want to have something new, that's fine but
>>> it should be orthogonal to a specific new status bit?
>> I agree, if we want new status indicator, then the new indicator should not be
>> specific to SUSPEND.
>>
> Sounds good to me as long as new functionality does not force the device to react in 50nsec.
> In the current state it forces the device so please upgrade the proposal as Michael guided.
Trap and emulate is very common in virtualization; of course, the sooner the emulation finishes, the better.
But there is no 50nsec assumption, and no one can force the hypervisor to finish any emulation within a certain time.
> Thanks.

^ permalink raw reply [flat|nested] 69+ messages in thread
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:14 ` Zhu Lingshan @ 2024-09-05 7:16 ` Parav Pandit 2024-09-05 7:29 ` Zhu Lingshan 2024-09-05 7:17 ` Michael S. Tsirkin 1 sibling, 1 reply; 69+ messages in thread From: Parav Pandit @ 2024-09-05 7:16 UTC (permalink / raw) To: Zhu Lingshan, Jason Wang, Michael S. Tsirkin Cc: cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Thursday, September 5, 2024 12:45 PM > > > On 9/4/2024 2:46 PM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Wednesday, September 4, 2024 12:09 PM > >> > >> On 9/4/2024 2:31 PM, Jason Wang wrote: > >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> > >> wrote: > >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> > >> wrote: > >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > >>>>>>>>> But I don't like it that looking at the registers, one does > >>>>>>>>> not know the device state. Hidden state is bad for debuggability. > >>>>>>>>> We have 4 states: > >>>>>>>>> suspending->suspended->resuming->resumed > >>>>>>>>> so we need a register with at least 2 bits. > >>>>>>>>> > >>>>>>>>> we could steal 2 bits from status but it seems a bit much. > >>>>>>>>> > >>>>>>>> This is why letting the status tell the status and control > >>>>>>>> register to > >> control thing is elegant. > >>>>>>> No argument here. > >>>>>> Or, to be more precise, our status is driver status. > >>>>> It looks like the device actually otherwise there's no need for > >>>>> re-read or poll for things like reset and others. > >>>> The need is there for complex device state transitions, which can > >>>> not reasonably block a read response. > >>>> Another standard approach with PCI is to specify the time > >>>> transitions can take. 
I consider that less elegant - is this what > >>>> you are advocating? The advantage is that driver does not load the > >>>> pci bus with constant re-polling. > >>>> The disadvantage is that it is hard to pick a universal number. A > >>>> combination of these approaches might work, e.g. a recommended > >>>> timeout then poll. > >>> We've already had msleep() for vp_reset(), anyhow we can increase > >>> the sleep time, if it can overload the pci: > >>> > >>> while (vp_modern_get_status(mdev)) > >>> msleep(1); > >>> > >>> We can do the same for suspending. > >>> > >>> The main blocker for timeout is that it may break migration and > >>> complicate the hardening. Another proposal in the past is to have a > >>> notification. > >>> > >>> But what I don't understand here is that suspend/resume should be > >>> lighter than reset. If we can afford a reset, so did the > >>> suspending/resume. If we want to have something new, that's fine but > >>> it should be orthogonal to a specific new status bit? > >> I agree, if we want new status indicator, then the new indicator > >> should not be specific to SUSPEND. > >> > > Sounds good to me as long as new functionality does not force the device > to react in 50nsec. > > In the current state it forces the device so please upgrade the proposal as > Michael guided. > Trap and emulate is very often in virtualization, of course the sooner the > better for emulation. > But there are no 50nsec assumption and no one can force the hypervisor to > finish any emulations within a certain time > > Thanks. You are completely ignoring the comments. Good luck.. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:16 ` Parav Pandit @ 2024-09-05 7:29 ` Zhu Lingshan 2024-09-05 7:35 ` Parav Pandit 0 siblings, 1 reply; 69+ messages in thread From: Zhu Lingshan @ 2024-09-05 7:29 UTC (permalink / raw) To: Parav Pandit, Jason Wang, Michael S. Tsirkin Cc: cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/5/2024 3:16 PM, Parav Pandit wrote: >> From: Zhu Lingshan <lingshan.zhu@amd.com> >> Sent: Thursday, September 5, 2024 12:45 PM >> >> >> On 9/4/2024 2:46 PM, Parav Pandit wrote: >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Sent: Wednesday, September 4, 2024 12:09 PM >>>> >>>> On 9/4/2024 2:31 PM, Jason Wang wrote: >>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> >>>> wrote: >>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: >>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> >>>> wrote: >>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: >>>>>>>>>>> But I don't like it that looking at the registers, one does >>>>>>>>>>> not know the device state. Hidden state is bad for debuggability. >>>>>>>>>>> We have 4 states: >>>>>>>>>>> suspending->suspended->resuming->resumed >>>>>>>>>>> so we need a register with at least 2 bits. >>>>>>>>>>> >>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. >>>>>>>>>>> >>>>>>>>>> This is why letting the status tell the status and control >>>>>>>>>> register to >>>> control thing is elegant. >>>>>>>>> No argument here. >>>>>>>> Or, to be more precise, our status is driver status. >>>>>>> It looks like the device actually otherwise there's no need for >>>>>>> re-read or poll for things like reset and others. >>>>>> The need is there for complex device state transitions, which can >>>>>> not reasonably block a read response. >>>>>> Another standard approach with PCI is to specify the time >>>>>> transitions can take. 
I consider that less elegant - is this what >>>>>> you are advocating? The advantage is that driver does not load the >>>>>> pci bus with constant re-polling. >>>>>> The disadvantage is that it is hard to pick a universal number. A >>>>>> combination of these approaches might work, e.g. a recommended >>>>>> timeout then poll. >>>>> We've already had msleep() for vp_reset(), anyhow we can increase >>>>> the sleep time, if it can overload the pci: >>>>> >>>>> while (vp_modern_get_status(mdev)) >>>>> msleep(1); >>>>> >>>>> We can do the same for suspending. >>>>> >>>>> The main blocker for timeout is that it may break migration and >>>>> complicate the hardening. Another proposal in the past is to have a >>>>> notification. >>>>> >>>>> But what I don't understand here is that suspend/resume should be >>>>> lighter than reset. If we can afford a reset, so did the >>>>> suspending/resume. If we want to have something new, that's fine but >>>>> it should be orthogonal to a specific new status bit? >>>> I agree, if we want new status indicator, then the new indicator >>>> should not be specific to SUSPEND. >>>> >>> Sounds good to me as long as new functionality does not force the device >> to react in 50nsec. >>> In the current state it forces the device so please upgrade the proposal as >> Michael guided. >> Trap and emulate is very often in virtualization, of course the sooner the >> better for emulation. >> But there are no 50nsec assumption and no one can force the hypervisor to >> finish any emulations within a certain time >>> Thanks. > You are completely ignoring the comments. > Good luck.. VM_EXIT and a VM_ENTRY work in a pair. > ^ permalink raw reply [flat|nested] 69+ messages in thread
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:29 ` Zhu Lingshan @ 2024-09-05 7:35 ` Parav Pandit 2024-09-05 8:30 ` Zhu Lingshan 0 siblings, 1 reply; 69+ messages in thread From: Parav Pandit @ 2024-09-05 7:35 UTC (permalink / raw) To: Zhu Lingshan, Jason Wang, Michael S. Tsirkin Cc: cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Thursday, September 5, 2024 1:00 PM > > On 9/5/2024 3:16 PM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Thursday, September 5, 2024 12:45 PM > >> > >> > >> On 9/4/2024 2:46 PM, Parav Pandit wrote: > >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>> Sent: Wednesday, September 4, 2024 12:09 PM > >>>> > >>>> On 9/4/2024 2:31 PM, Jason Wang wrote: > >>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin > >>>>> <mst@redhat.com> > >>>> wrote: > >>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > >>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin > >>>>>>> <mst@redhat.com> > >>>> wrote: > >>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin > wrote: > >>>>>>>>>>> But I don't like it that looking at the registers, one does > >>>>>>>>>>> not know the device state. Hidden state is bad for > debuggability. > >>>>>>>>>>> We have 4 states: > >>>>>>>>>>> suspending->suspended->resuming->resumed > >>>>>>>>>>> so we need a register with at least 2 bits. > >>>>>>>>>>> > >>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. > >>>>>>>>>>> > >>>>>>>>>> This is why letting the status tell the status and control > >>>>>>>>>> register to > >>>> control thing is elegant. > >>>>>>>>> No argument here. > >>>>>>>> Or, to be more precise, our status is driver status. > >>>>>>> It looks like the device actually otherwise there's no need for > >>>>>>> re-read or poll for things like reset and others. 
> >>>>>> The need is there for complex device state transitions, which can > >>>>>> not reasonably block a read response. > >>>>>> Another standard approach with PCI is to specify the time > >>>>>> transitions can take. I consider that less elegant - is this what > >>>>>> you are advocating? The advantage is that driver does not load > >>>>>> the pci bus with constant re-polling. > >>>>>> The disadvantage is that it is hard to pick a universal number. > >>>>>> A combination of these approaches might work, e.g. a recommended > >>>>>> timeout then poll. > >>>>> We've already had msleep() for vp_reset(), anyhow we can increase > >>>>> the sleep time, if it can overload the pci: > >>>>> > >>>>> while (vp_modern_get_status(mdev)) > >>>>> msleep(1); > >>>>> > >>>>> We can do the same for suspending. > >>>>> > >>>>> The main blocker for timeout is that it may break migration and > >>>>> complicate the hardening. Another proposal in the past is to have > >>>>> a notification. > >>>>> > >>>>> But what I don't understand here is that suspend/resume should be > >>>>> lighter than reset. If we can afford a reset, so did the > >>>>> suspending/resume. If we want to have something new, that's fine > >>>>> but it should be orthogonal to a specific new status bit? > >>>> I agree, if we want new status indicator, then the new indicator > >>>> should not be specific to SUSPEND. > >>>> > >>> Sounds good to me as long as new functionality does not force the > >>> device > >> to react in 50nsec. > >>> In the current state it forces the device so please upgrade the > >>> proposal as > >> Michael guided. > >> Trap and emulate is very often in virtualization, of course the > >> sooner the better for emulation. > >> But there are no 50nsec assumption and no one can force the > >> hypervisor to finish any emulations within a certain time > >>> Thanks. > > You are completely ignoring the comments. > > Good luck.. > VM_EXIT and a VM_ENTRY work in a pair. 
> >
What about a hw PCI PF, where no virtualization is used?
How?

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:35 ` Parav Pandit @ 2024-09-05 8:30 ` Zhu Lingshan 2024-09-05 8:41 ` David Stevens 0 siblings, 1 reply; 69+ messages in thread From: Zhu Lingshan @ 2024-09-05 8:30 UTC (permalink / raw) To: Parav Pandit, Jason Wang, Michael S. Tsirkin Cc: cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/5/2024 3:35 PM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> >> Sent: Thursday, September 5, 2024 1:00 PM >> >> On 9/5/2024 3:16 PM, Parav Pandit wrote: >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Sent: Thursday, September 5, 2024 12:45 PM >>>> >>>> >>>> On 9/4/2024 2:46 PM, Parav Pandit wrote: >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>> Sent: Wednesday, September 4, 2024 12:09 PM >>>>>> >>>>>> On 9/4/2024 2:31 PM, Jason Wang wrote: >>>>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin >>>>>>> <mst@redhat.com> >>>>>> wrote: >>>>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: >>>>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin >>>>>>>>> <mst@redhat.com> >>>>>> wrote: >>>>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin >> wrote: >>>>>>>>>>>>> But I don't like it that looking at the registers, one does >>>>>>>>>>>>> not know the device state. Hidden state is bad for >> debuggability. >>>>>>>>>>>>> We have 4 states: >>>>>>>>>>>>> suspending->suspended->resuming->resumed >>>>>>>>>>>>> so we need a register with at least 2 bits. >>>>>>>>>>>>> >>>>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. >>>>>>>>>>>>> >>>>>>>>>>>> This is why letting the status tell the status and control >>>>>>>>>>>> register to >>>>>> control thing is elegant. >>>>>>>>>>> No argument here. >>>>>>>>>> Or, to be more precise, our status is driver status. >>>>>>>>> It looks like the device actually otherwise there's no need for >>>>>>>>> re-read or poll for things like reset and others. 
>>>>>>>> The need is there for complex device state transitions, which can >>>>>>>> not reasonably block a read response. >>>>>>>> Another standard approach with PCI is to specify the time >>>>>>>> transitions can take. I consider that less elegant - is this what >>>>>>>> you are advocating? The advantage is that driver does not load >>>>>>>> the pci bus with constant re-polling. >>>>>>>> The disadvantage is that it is hard to pick a universal number. >>>>>>>> A combination of these approaches might work, e.g. a recommended >>>>>>>> timeout then poll. >>>>>>> We've already had msleep() for vp_reset(), anyhow we can increase >>>>>>> the sleep time, if it can overload the pci: >>>>>>> >>>>>>> while (vp_modern_get_status(mdev)) >>>>>>> msleep(1); >>>>>>> >>>>>>> We can do the same for suspending. >>>>>>> >>>>>>> The main blocker for timeout is that it may break migration and >>>>>>> complicate the hardening. Another proposal in the past is to have >>>>>>> a notification. >>>>>>> >>>>>>> But what I don't understand here is that suspend/resume should be >>>>>>> lighter than reset. If we can afford a reset, so did the >>>>>>> suspending/resume. If we want to have something new, that's fine >>>>>>> but it should be orthogonal to a specific new status bit? >>>>>> I agree, if we want new status indicator, then the new indicator >>>>>> should not be specific to SUSPEND. >>>>>> >>>>> Sounds good to me as long as new functionality does not force the >>>>> device >>>> to react in 50nsec. >>>>> In the current state it forces the device so please upgrade the >>>>> proposal as >>>> Michael guided. >>>> Trap and emulate is very often in virtualization, of course the >>>> sooner the better for emulation. >>>> But there are no 50nsec assumption and no one can force the >>>> hypervisor to finish any emulations within a certain time >>>>> Thanks. >>> You are completely ignoring the comments. >>> Good luck.. >> VM_EXIT and a VM_ENTRY work in a pair. 
> For hw pci pf where there is no virtualization used?
> How?
You brought up the VM_EXIT problem; without virtualization, there is no VM_EXIT.
>

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 8:30 ` Zhu Lingshan @ 2024-09-05 8:41 ` David Stevens 2024-09-06 1:53 ` Parav Pandit 0 siblings, 1 reply; 69+ messages in thread From: David Stevens @ 2024-09-05 8:41 UTC (permalink / raw) To: Zhu Lingshan Cc: Parav Pandit, Jason Wang, Michael S. Tsirkin, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez On Thu, Sep 5, 2024 at 5:31 PM Zhu Lingshan <lingshan.zhu@amd.com> wrote: > On 9/5/2024 3:35 PM, Parav Pandit wrote: > > > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Thursday, September 5, 2024 1:00 PM > >> > >> On 9/5/2024 3:16 PM, Parav Pandit wrote: > >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>> Sent: Thursday, September 5, 2024 12:45 PM > >>>> > >>>> > >>>> On 9/4/2024 2:46 PM, Parav Pandit wrote: > >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>> Sent: Wednesday, September 4, 2024 12:09 PM > >>>>>> > >>>>>> On 9/4/2024 2:31 PM, Jason Wang wrote: > >>>>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin > >>>>>>> <mst@redhat.com> > >>>>>> wrote: > >>>>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > >>>>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin > >>>>>>>>> <mst@redhat.com> > >>>>>> wrote: > >>>>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin > >> wrote: > >>>>>>>>>>>>> But I don't like it that looking at the registers, one does > >>>>>>>>>>>>> not know the device state. Hidden state is bad for > >> debuggability. > >>>>>>>>>>>>> We have 4 states: > >>>>>>>>>>>>> suspending->suspended->resuming->resumed > >>>>>>>>>>>>> so we need a register with at least 2 bits. > >>>>>>>>>>>>> > >>>>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. > >>>>>>>>>>>>> > >>>>>>>>>>>> This is why letting the status tell the status and control > >>>>>>>>>>>> register to > >>>>>> control thing is elegant. > >>>>>>>>>>> No argument here. 
> >>>>>>>>>> Or, to be more precise, our status is driver status. > >>>>>>>>> It looks like the device actually otherwise there's no need for > >>>>>>>>> re-read or poll for things like reset and others. > >>>>>>>> The need is there for complex device state transitions, which can > >>>>>>>> not reasonably block a read response. > >>>>>>>> Another standard approach with PCI is to specify the time > >>>>>>>> transitions can take. I consider that less elegant - is this what > >>>>>>>> you are advocating? The advantage is that driver does not load > >>>>>>>> the pci bus with constant re-polling. > >>>>>>>> The disadvantage is that it is hard to pick a universal number. > >>>>>>>> A combination of these approaches might work, e.g. a recommended > >>>>>>>> timeout then poll. > >>>>>>> We've already had msleep() for vp_reset(), anyhow we can increase > >>>>>>> the sleep time, if it can overload the pci: > >>>>>>> > >>>>>>> while (vp_modern_get_status(mdev)) > >>>>>>> msleep(1); > >>>>>>> > >>>>>>> We can do the same for suspending. > >>>>>>> > >>>>>>> The main blocker for timeout is that it may break migration and > >>>>>>> complicate the hardening. Another proposal in the past is to have > >>>>>>> a notification. > >>>>>>> > >>>>>>> But what I don't understand here is that suspend/resume should be > >>>>>>> lighter than reset. If we can afford a reset, so did the > >>>>>>> suspending/resume. If we want to have something new, that's fine > >>>>>>> but it should be orthogonal to a specific new status bit? > >>>>>> I agree, if we want new status indicator, then the new indicator > >>>>>> should not be specific to SUSPEND. > >>>>>> > >>>>> Sounds good to me as long as new functionality does not force the > >>>>> device > >>>> to react in 50nsec. > >>>>> In the current state it forces the device so please upgrade the > >>>>> proposal as > >>>> Michael guided. > >>>> Trap and emulate is very often in virtualization, of course the > >>>> sooner the better for emulation. 
> >>>> But there are no 50nsec assumption and no one can force the > >>>> hypervisor to finish any emulations within a certain time > >>>>> Thanks. > >>> You are completely ignoring the comments. > >>> Good luck.. > >> VM_EXIT and a VM_ENTRY work in a pair. > > For hw pci pf where there is no virtualization used? > > How? > you bring the VM_EXIT problem to me, if without virtualization, there is no VM_EXIT. The problem Parav is raising is that the single bit approach effectively requires that the device handles updates to the status field synchronously. In a virtualization context with VM_EXIT/VM_ENTRY, this synchronous handling is easy. However, it's not necessarily easy with bare metal. Even in the virtualization context, it's entirely possible to implement a virtualized device where the status field is plain memory that is periodically checked by the device. If the device only checks the status field once per second, then it's pretty clear that the single suspend bit approach won't work at all. I don't know why someone would implement a device like this, but it's not something that the virtio spec should rule out. -David ^ permalink raw reply [flat|nested] 69+ messages in thread
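As an illustration of the point about synchronous versus delayed handling of the status field, here is a minimal toy model in C. It is only a sketch under stated assumptions: `struct mock_dev`, `dev_write_status()`, `dev_read_status()` and `driver_suspend()` are hypothetical names made up for this example (they are not virtio or Linux APIs), and device latency is counted in status reads rather than in wall-clock time. The SUSPEND value 16 comes from the patch; the bounded re-read loop follows the vp_reset()-style polling quoted in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_DRIVER_OK 4u
#define STATUS_SUSPEND   16u  /* bit value proposed in the patch */

/* Hypothetical device model: it acts on a status write only after
 * `latency` further reads, like a device that samples the field lazily. */
struct mock_dev {
    uint8_t  written;  /* last value the driver wrote */
    uint8_t  visible;  /* value the device currently reports */
    unsigned latency;  /* reads needed before a write takes effect */
    unsigned pending;  /* reads left until `written` becomes visible */
};

static void dev_write_status(struct mock_dev *d, uint8_t v)
{
    d->written = v;
    d->pending = d->latency;
    if (d->pending == 0)      /* synchronous device, e.g. trap-and-emulate */
        d->visible = v;
}

static uint8_t dev_read_status(struct mock_dev *d)
{
    if (d->pending > 0 && --d->pending == 0)
        d->visible = d->written;
    return d->visible;
}

/* Set SUSPEND, then re-read until the device reports it or the poll
 * budget runs out. Returns the number of polls used, or -1 on timeout.
 * A real driver would sleep between polls (e.g. msleep(1)). */
static int driver_suspend(struct mock_dev *d, unsigned max_polls)
{
    assert(d != NULL);
    dev_write_status(d, (uint8_t)(dev_read_status(d) | STATUS_SUSPEND));
    for (unsigned i = 0; i < max_polls; i++) {
        if (dev_read_status(d) & STATUS_SUSPEND)
            return (int)i + 1;
    }
    return -1;
}
```

With latency 0 the model behaves like a trap-and-emulate device and the first re-read already shows SUSPEND. With a larger latency the driver only learns the outcome by re-reading, and the finite poll budget is what lets it give up; whether such a timeout is acceptable is exactly what the thread leaves open.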
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 8:41 ` David Stevens @ 2024-09-06 1:53 ` Parav Pandit 0 siblings, 0 replies; 69+ messages in thread From: Parav Pandit @ 2024-09-06 1:53 UTC (permalink / raw) To: David Stevens, Zhu Lingshan Cc: Jason Wang, Michael S. Tsirkin, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez > From: David Stevens <stevensd@chromium.org> > Sent: Thursday, September 5, 2024 2:12 PM > > On Thu, Sep 5, 2024 at 5:31 PM Zhu Lingshan <lingshan.zhu@amd.com> > wrote: > > On 9/5/2024 3:35 PM, Parav Pandit wrote: > > > > > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > > >> Sent: Thursday, September 5, 2024 1:00 PM > > >> > > >> On 9/5/2024 3:16 PM, Parav Pandit wrote: > > >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > > >>>> Sent: Thursday, September 5, 2024 12:45 PM > > >>>> > > >>>> > > >>>> On 9/4/2024 2:46 PM, Parav Pandit wrote: > > >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > > >>>>>> Sent: Wednesday, September 4, 2024 12:09 PM > > >>>>>> > > >>>>>> On 9/4/2024 2:31 PM, Jason Wang wrote: > > >>>>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin > > >>>>>>> <mst@redhat.com> > > >>>>>> wrote: > > >>>>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > > >>>>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin > > >>>>>>>>> <mst@redhat.com> > > >>>>>> wrote: > > >>>>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. > > >>>>>>>>>> Tsirkin > > >> wrote: > > >>>>>>>>>>>>> But I don't like it that looking at the registers, one > > >>>>>>>>>>>>> does not know the device state. Hidden state is bad for > > >> debuggability. > > >>>>>>>>>>>>> We have 4 states: > > >>>>>>>>>>>>> suspending->suspended->resuming->resumed > > >>>>>>>>>>>>> so we need a register with at least 2 bits. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. 
> > >>>>>>>>>>>>> > > >>>>>>>>>>>> This is why letting the status tell the status and > > >>>>>>>>>>>> control register to > > >>>>>> control thing is elegant. > > >>>>>>>>>>> No argument here. > > >>>>>>>>>> Or, to be more precise, our status is driver status. > > >>>>>>>>> It looks like the device actually otherwise there's no need > > >>>>>>>>> for re-read or poll for things like reset and others. > > >>>>>>>> The need is there for complex device state transitions, which > > >>>>>>>> can not reasonably block a read response. > > >>>>>>>> Another standard approach with PCI is to specify the time > > >>>>>>>> transitions can take. I consider that less elegant - is this > > >>>>>>>> what you are advocating? The advantage is that driver does > > >>>>>>>> not load the pci bus with constant re-polling. > > >>>>>>>> The disadvantage is that it is hard to pick a universal number. > > >>>>>>>> A combination of these approaches might work, e.g. a > > >>>>>>>> recommended timeout then poll. > > >>>>>>> We've already had msleep() for vp_reset(), anyhow we can > > >>>>>>> increase the sleep time, if it can overload the pci: > > >>>>>>> > > >>>>>>> while (vp_modern_get_status(mdev)) > > >>>>>>> msleep(1); > > >>>>>>> > > >>>>>>> We can do the same for suspending. > > >>>>>>> > > >>>>>>> The main blocker for timeout is that it may break migration > > >>>>>>> and complicate the hardening. Another proposal in the past is > > >>>>>>> to have a notification. > > >>>>>>> > > >>>>>>> But what I don't understand here is that suspend/resume should > > >>>>>>> be lighter than reset. If we can afford a reset, so did the > > >>>>>>> suspending/resume. If we want to have something new, that's > > >>>>>>> fine but it should be orthogonal to a specific new status bit? > > >>>>>> I agree, if we want new status indicator, then the new > > >>>>>> indicator should not be specific to SUSPEND. 
> > >>>>>> > > >>>>> Sounds good to me as long as new functionality does not force > > >>>>> the device > > >>>> to react in 50nsec. > > >>>>> In the current state it forces the device so please upgrade the > > >>>>> proposal as > > >>>> Michael guided. > > >>>> Trap and emulate is very often in virtualization, of course the > > >>>> sooner the better for emulation. > > >>>> But there are no 50nsec assumption and no one can force the > > >>>> hypervisor to finish any emulations within a certain time > > >>>>> Thanks. > > >>> You are completely ignoring the comments. > > >>> Good luck.. > > >> VM_EXIT and a VM_ENTRY work in a pair. > > > For hw pci pf where there is no virtualization used? > > > How? > > you bring the VM_EXIT problem to me, if without virtualization, there is no > VM_EXIT. > > The problem Parav is raising is that the single bit approach effectively > requires that the device handles updates to the status field synchronously. In > a virtualization context with VM_EXIT/VM_ENTRY, this synchronous handling > is easy. However, it's not necessarily easy with bare metal. > > Even in the virtualization context, it's entirely possible to implement a > virtualized device where the status field is plain memory that is periodically > checked by the device. If the device only checks the status field once per > second, then it's pretty clear that the single suspend bit approach won't work > at all. Thanks a lot David. You captured the summary precisely. > I don't know why someone would implement a device like this, but it's > not something that the virtio spec should rule out. > > -David ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:14 ` Zhu Lingshan 2024-09-05 7:16 ` Parav Pandit @ 2024-09-05 7:17 ` Michael S. Tsirkin 2024-09-05 7:31 ` Zhu Lingshan 1 sibling, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-05 7:17 UTC (permalink / raw) To: Zhu Lingshan Cc: Parav Pandit, Jason Wang, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Thu, Sep 05, 2024 at 03:14:45PM +0800, Zhu Lingshan wrote: > > > On 9/4/2024 2:46 PM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Wednesday, September 4, 2024 12:09 PM > >> > >> On 9/4/2024 2:31 PM, Jason Wang wrote: > >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> > >> wrote: > >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> > >> wrote: > >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > >>>>>>>>> But I don't like it that looking at the registers, one does not > >>>>>>>>> know the device state. Hidden state is bad for debuggability. > >>>>>>>>> We have 4 states: > >>>>>>>>> suspending->suspended->resuming->resumed > >>>>>>>>> so we need a register with at least 2 bits. > >>>>>>>>> > >>>>>>>>> we could steal 2 bits from status but it seems a bit much. > >>>>>>>>> > >>>>>>>> This is why letting the status tell the status and control register to > >> control thing is elegant. > >>>>>>> No argument here. > >>>>>> Or, to be more precise, our status is driver status. > >>>>> It looks like the device actually otherwise there's no need for > >>>>> re-read or poll for things like reset and others. > >>>> The need is there for complex device state transitions, which can not > >>>> reasonably block a read response. > >>>> Another standard approach with PCI is to specify the time transitions > >>>> can take. 
I consider that less elegant - is this what you are > >>>> advocating? The advantage is that driver does not load the pci bus > >>>> with constant re-polling. > >>>> The disadvantage is that it is hard to pick a universal number. A > >>>> combination of these approaches might work, e.g. a recommended > >>>> timeout then poll. > >>> We've already had msleep() for vp_reset(), anyhow we can increase the > >>> sleep time, if it can overload the pci: > >>> > >>> while (vp_modern_get_status(mdev)) > >>> msleep(1); > >>> > >>> We can do the same for suspending. > >>> > >>> The main blocker for timeout is that it may break migration and > >>> complicate the hardening. Another proposal in the past is to have a > >>> notification. > >>> > >>> But what I don't understand here is that suspend/resume should be > >>> lighter than reset. If we can afford a reset, so did the > >>> suspending/resume. If we want to have something new, that's fine but > >>> it should be orthogonal to a specific new status bit? > >> I agree, if we want new status indicator, then the new indicator should not be > >> specific to SUSPEND. > >> > > Sounds good to me as long as new functionality does not force the device to react in 50nsec. > > In the current state it forces the device so please upgrade the proposal as Michael guided. > Trap and emulate is very often in virtualization, of course the sooner the better for emulation. > But there are no 50nsec assumption and no one can force the hypervisor to finish any emulations within > a certain time The spec should not assume devices are virtual, there are baremetal deployments of virtio. In such setups, there is no one to trap or emulate: driver must do everything itself. > > Thanks. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:17 ` Michael S. Tsirkin @ 2024-09-05 7:31 ` Zhu Lingshan 2024-09-05 7:34 ` Parav Pandit 0 siblings, 1 reply; 69+ messages in thread From: Zhu Lingshan @ 2024-09-05 7:31 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Parav Pandit, Jason Wang, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/5/2024 3:17 PM, Michael S. Tsirkin wrote: > On Thu, Sep 05, 2024 at 03:14:45PM +0800, Zhu Lingshan wrote: >> >> On 9/4/2024 2:46 PM, Parav Pandit wrote: >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Sent: Wednesday, September 4, 2024 12:09 PM >>>> >>>> On 9/4/2024 2:31 PM, Jason Wang wrote: >>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> >>>> wrote: >>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: >>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> >>>> wrote: >>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: >>>>>>>>>>> But I don't like it that looking at the registers, one does not >>>>>>>>>>> know the device state. Hidden state is bad for debuggability. >>>>>>>>>>> We have 4 states: >>>>>>>>>>> suspending->suspended->resuming->resumed >>>>>>>>>>> so we need a register with at least 2 bits. >>>>>>>>>>> >>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. >>>>>>>>>>> >>>>>>>>>> This is why letting the status tell the status and control register to >>>> control thing is elegant. >>>>>>>>> No argument here. >>>>>>>> Or, to be more precise, our status is driver status. >>>>>>> It looks like the device actually otherwise there's no need for >>>>>>> re-read or poll for things like reset and others. >>>>>> The need is there for complex device state transitions, which can not >>>>>> reasonably block a read response. >>>>>> Another standard approach with PCI is to specify the time transitions >>>>>> can take. 
I consider that less elegant - is this what you are >>>>>> advocating? The advantage is that driver does not load the pci bus >>>>>> with constant re-polling. >>>>>> The disadvantage is that it is hard to pick a universal number. A >>>>>> combination of these approaches might work, e.g. a recommended >>>>>> timeout then poll. >>>>> We've already had msleep() for vp_reset(), anyhow we can increase the >>>>> sleep time, if it can overload the pci: >>>>> >>>>> while (vp_modern_get_status(mdev)) >>>>> msleep(1); >>>>> >>>>> We can do the same for suspending. >>>>> >>>>> The main blocker for timeout is that it may break migration and >>>>> complicate the hardening. Another proposal in the past is to have a >>>>> notification. >>>>> >>>>> But what I don't understand here is that suspend/resume should be >>>>> lighter than reset. If we can afford a reset, so did the >>>>> suspending/resume. If we want to have something new, that's fine but >>>>> it should be orthogonal to a specific new status bit? >>>> I agree, if we want new status indicator, then the new indicator should not be >>>> specific to SUSPEND. >>>> >>> Sounds good to me as long as new functionality does not force the device to react in 50nsec. >>> In the current state it forces the device so please upgrade the proposal as Michael guided. >> Trap and emulate is very often in virtualization, of course the sooner the better for emulation. >> But there are no 50nsec assumption and no one can force the hypervisor to finish any emulations within >> a certain time > The spec should not assume devices are virtual, there are baremetal > deployments of virtio. In such setups, there is no one to trap or > emulate: driver must do everything itself. Sure thing. I am also confused about why VM_EXIT and VM_ENTRY are a problem here... > > >>> Thanks. ^ permalink raw reply [flat|nested] 69+ messages in thread
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:31 ` Zhu Lingshan @ 2024-09-05 7:34 ` Parav Pandit 0 siblings, 0 replies; 69+ messages in thread From: Parav Pandit @ 2024-09-05 7:34 UTC (permalink / raw) To: Zhu Lingshan, Michael S. Tsirkin Cc: Jason Wang, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Thursday, September 5, 2024 1:01 PM > > On 9/5/2024 3:17 PM, Michael S. Tsirkin wrote: > > On Thu, Sep 05, 2024 at 03:14:45PM +0800, Zhu Lingshan wrote: > >> > >> On 9/4/2024 2:46 PM, Parav Pandit wrote: > >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>> Sent: Wednesday, September 4, 2024 12:09 PM > >>>> > >>>> On 9/4/2024 2:31 PM, Jason Wang wrote: > >>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin > >>>>> <mst@redhat.com> > >>>> wrote: > >>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > >>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin > >>>>>>> <mst@redhat.com> > >>>> wrote: > >>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin > wrote: > >>>>>>>>>>> But I don't like it that looking at the registers, one does > >>>>>>>>>>> not know the device state. Hidden state is bad for > debuggability. > >>>>>>>>>>> We have 4 states: > >>>>>>>>>>> suspending->suspended->resuming->resumed > >>>>>>>>>>> so we need a register with at least 2 bits. > >>>>>>>>>>> > >>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. > >>>>>>>>>>> > >>>>>>>>>> This is why letting the status tell the status and control > >>>>>>>>>> register to > >>>> control thing is elegant. > >>>>>>>>> No argument here. > >>>>>>>> Or, to be more precise, our status is driver status. > >>>>>>> It looks like the device actually otherwise there's no need for > >>>>>>> re-read or poll for things like reset and others. 
> >>>>>> The need is there for complex device state transitions, which can > >>>>>> not reasonably block a read response. > >>>>>> Another standard approach with PCI is to specify the time > >>>>>> transitions can take. I consider that less elegant - is this what > >>>>>> you are advocating? The advantage is that driver does not load > >>>>>> the pci bus with constant re-polling. > >>>>>> The disadvantage is that it is hard to pick a universal number. > >>>>>> A combination of these approaches might work, e.g. a recommended > >>>>>> timeout then poll. > >>>>> We've already had msleep() for vp_reset(), anyhow we can increase > >>>>> the sleep time, if it can overload the pci: > >>>>> > >>>>> while (vp_modern_get_status(mdev)) > >>>>> msleep(1); > >>>>> > >>>>> We can do the same for suspending. > >>>>> > >>>>> The main blocker for timeout is that it may break migration and > >>>>> complicate the hardening. Another proposal in the past is to have > >>>>> a notification. > >>>>> > >>>>> But what I don't understand here is that suspend/resume should be > >>>>> lighter than reset. If we can afford a reset, so did the > >>>>> suspending/resume. If we want to have something new, that's fine > >>>>> but it should be orthogonal to a specific new status bit? > >>>> I agree, if we want new status indicator, then the new indicator > >>>> should not be specific to SUSPEND. > >>>> > >>> Sounds good to me as long as new functionality does not force the > device to react in 50nsec. > >>> In the current state it forces the device so please upgrade the proposal > as Michael guided. > >> Trap and emulate is very often in virtualization, of course the sooner the > better for emulation. > >> But there are no 50nsec assumption and no one can force the > >> hypervisor to finish any emulations within a certain time > > The spec should not assume devices are virtual, there are baremetal > > deployments of virtio. 
In such setups, there is no one to trap or > > emulate: driver must do everything itself. > sure thing, I am also confused why VM_EXIT & VM_ENTRY and is a problem > here... How does the proposed VM_EXIT and VM_ENTRY approach work for a basic PCI PF hardware device? How can one build software without the VM_EXIT and VM_ENTRY restriction if they want to? This is apart from the other timing restrictions imposed, which make this a very high-complexity device for a rare event. Please list the assumptions and limitations so that the value of this bit becomes clear. > > > > > >>> Thanks. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-04 6:38 ` Zhu Lingshan 2024-09-04 6:46 ` Parav Pandit @ 2024-09-05 6:51 ` Michael S. Tsirkin 2024-09-05 7:12 ` Zhu Lingshan 1 sibling, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-05 6:51 UTC (permalink / raw) To: Zhu Lingshan Cc: Jason Wang, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: > > > On 9/4/2024 2:31 PM, Jason Wang wrote: > > On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > >>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > >>>>>>> But I don't like it that looking at the registers, one does not know the device > >>>>>>> state. Hidden state is bad for debuggability. > >>>>>>> We have 4 states: > >>>>>>> suspending->suspended->resuming->resumed > >>>>>>> so we need a register with at least 2 bits. > >>>>>>> > >>>>>>> we could steal 2 bits from status but it seems a bit much. > >>>>>>> > >>>>>> This is why letting the status tell the status and control register to control thing is elegant. > >>>>> > >>>>> No argument here. > >>>> Or, to be more precise, our status is driver status. > >>> It looks like the device actually otherwise there's no need for > >>> re-read or poll for things like reset and others. > >> The need is there for complex device state transitions, which > >> can not reasonably block a read response. > >> Another standard approach with PCI is to specify the time > >> transitions can take. I consider that less elegant - > >> is this what you are advocating? The advantage is that > >> driver does not load the pci bus with constant re-polling. > >> The disadvantage is that it is hard to pick a universal > >> number. 
A combination of these approaches might work, > >> e.g. a recommended timeout then poll. > > We've already had msleep() for vp_reset(), anyhow we can increase the > > sleep time, if it can overload the pci: > > > > while (vp_modern_get_status(mdev)) > > msleep(1); > > > > We can do the same for suspending. > > > > The main blocker for timeout is that it may break migration and > > complicate the hardening. Another proposal in the past is to have a > > notification. > > > > But what I don't understand here is that suspend/resume should be > > lighter than reset. If we can afford a reset, so did the > > suspending/resume. If we want to have something new, that's fine but > > it should be orthogonal to a specific new status bit? > I agree, if we want new status indicator, then the new indicator should not be > specific to SUSPEND. > > Thanks > Zhu Lingshan If you mean reset, we have a problem. Specifically, reset has to work before feature negotiation. So we can not do as we would with SUSPEND and use a feature bit to expose presence of the indicator register. Regrettably, reset will have to still be supported through clearing status just because this is always there. > > > >>> Anyhow driver know > >>> its own status. > >>> > >>> Thanks > >> indeed, the status register is there to inform the device about > >> the driver status. > >> > >> > >> > >>>> If we need > >>>> to reflect and control device status, we need something else. > >>>> > > Thanks > > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 6:51 ` Michael S. Tsirkin @ 2024-09-05 7:12 ` Zhu Lingshan 2024-09-05 8:12 ` Michael S. Tsirkin 0 siblings, 1 reply; 69+ messages in thread From: Zhu Lingshan @ 2024-09-05 7:12 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jason Wang, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: > On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: >> >> On 9/4/2024 2:31 PM, Jason Wang wrote: >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: >>>>>>>>> But I don't like it that looking at the registers, one does not know the device >>>>>>>>> state. Hidden state is bad for debuggability. >>>>>>>>> We have 4 states: >>>>>>>>> suspending->suspended->resuming->resumed >>>>>>>>> so we need a register with at least 2 bits. >>>>>>>>> >>>>>>>>> we could steal 2 bits from status but it seems a bit much. >>>>>>>>> >>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. >>>>>>> No argument here. >>>>>> Or, to be more precise, our status is driver status. >>>>> It looks like the device actually otherwise there's no need for >>>>> re-read or poll for things like reset and others. >>>> The need is there for complex device state transitions, which >>>> can not reasonably block a read response. >>>> Another standard approach with PCI is to specify the time >>>> transitions can take. I consider that less elegant - >>>> is this what you are advocating? The advantage is that >>>> driver does not load the pci bus with constant re-polling. >>>> The disadvantage is that it is hard to pick a universal >>>> number. 
A combination of these approaches might work, >>>> e.g. a recommended timeout then poll. >>> We've already had msleep() for vp_reset(), anyhow we can increase the >>> sleep time, if it can overload the pci: >>> >>> while (vp_modern_get_status(mdev)) >>> msleep(1); >>> >>> We can do the same for suspending. >>> >>> The main blocker for timeout is that it may break migration and >>> complicate the hardening. Another proposal in the past is to have a >>> notification. >>> >>> But what I don't understand here is that suspend/resume should be >>> lighter than reset. If we can afford a reset, so did the >>> suspending/resume. If we want to have something new, that's fine but >>> it should be orthogonal to a specific new status bit? >> I agree, if we want new status indicator, then the new indicator should not be >> specific to SUSPEND. >> >> Thanks >> Zhu Lingshan > If you mean reset, we have a problem. > Specifically, reset has to work before feature > negotiation. So we can not do as we would with > SUSPEND and use a feature bit to expose presence > of the indicator register. I guess the driver can reset the device even after DRIVER_OK. IMHO the indicator should work for all device_status transitions: not only for RESET or SUSPEND, but also for FEATURES_OK and DRIVER_OK. So we don't need a feature bit; we can just place it in the common config space, as you proposed before when suggesting a common config space for all transports. We can do this for sure! Of course, another approach is what Jason proposed: the existing msleep-and-poll loop. The driver knows whether a status transition is in progress because it is the driver that writes device_status. Thanks > > Regrettably, reset will have to still be supported > through clearing status just because this is always there. > > >>>>> Anyhow driver know >>>>> its own status. >>>>> >>>>> Thanks >>>> indeed, the status register is there to inform the device about >>>> the driver status.
>>>> >>>> >>>> >>>>>> If we need >>>>>> to reflect and control device status, we need something else. >>>>>> >>> Thanks >>> ^ permalink raw reply [flat|nested] 69+ messages in thread
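The "recommended timeout then poll" idea discussed above can be sketched as a bounded version of the quoted `while (vp_modern_get_status(mdev)) msleep(1);` loop. This is only an illustration under stated assumptions: `struct fake_dev`, `fake_read_status()` and `suspend_and_wait()` are hypothetical names that simulate a device in plain C, they are not kernel or spec APIs, and the poll budget is an arbitrary number chosen for the example. Only the SUSPEND value (16) comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_SUSPEND 0x10u /* SUSPEND (16), as proposed in this patch */

/* Hypothetical simulated device: it reports a requested bit only after
 * a number of status reads, standing in for a slow state transition. */
struct fake_dev {
    uint8_t status;      /* value the device currently reports */
    uint8_t pending;     /* bits written by the driver, not yet reported */
    unsigned reads_left; /* reads that still return the old status */
};

/* Simulated status read: completes the pending transition once
 * reads_left has counted down to zero. */
static uint8_t fake_read_status(struct fake_dev *d)
{
    if (d->pending) {
        if (d->reads_left == 0) {
            d->status |= d->pending;
            d->pending = 0;
        } else {
            d->reads_left--;
        }
    }
    return d->status;
}

/* Request SUSPEND, then poll at most max_polls times.
 * Returns the number of polls used, or -1 if the budget ran out. */
static int suspend_and_wait(struct fake_dev *d, unsigned max_polls)
{
    d->pending = STATUS_SUSPEND; /* the driver writes the SUSPEND bit */
    for (unsigned i = 1; i <= max_polls; i++) {
        if (fake_read_status(d) & STATUS_SUSPEND)
            return (int)i;
        /* a real Linux driver would sleep here, e.g. msleep(1) */
    }
    return -1;
}
```

What the driver should do when the budget is exhausted is exactly the open question in this thread (for instance, whether the device must present DEVICE_NEEDS_RESET), so the sketch only reports the failure to the caller.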
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:12 ` Zhu Lingshan @ 2024-09-05 8:12 ` Michael S. Tsirkin 2024-09-05 9:09 ` Zhu Lingshan 2024-09-05 23:51 ` Jason Wang 0 siblings, 2 replies; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-05 8:12 UTC (permalink / raw) To: Zhu Lingshan Cc: Jason Wang, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Thu, Sep 05, 2024 at 03:12:41PM +0800, Zhu Lingshan wrote: > > > On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: > > On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: > >> > >> On 9/4/2024 2:31 PM, Jason Wang wrote: > >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > >>>>>>>>> But I don't like it that looking at the registers, one does not know the device > >>>>>>>>> state. Hidden state is bad for debuggability. > >>>>>>>>> We have 4 states: > >>>>>>>>> suspending->suspended->resuming->resumed > >>>>>>>>> so we need a register with at least 2 bits. > >>>>>>>>> > >>>>>>>>> we could steal 2 bits from status but it seems a bit much. > >>>>>>>>> > >>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. > >>>>>>> No argument here. > >>>>>> Or, to be more precise, our status is driver status. > >>>>> It looks like the device actually otherwise there's no need for > >>>>> re-read or poll for things like reset and others. > >>>> The need is there for complex device state transitions, which > >>>> can not reasonably block a read response. > >>>> Another standard approach with PCI is to specify the time > >>>> transitions can take. I consider that less elegant - > >>>> is this what you are advocating? 
The advantage is that > >>>> driver does not load the pci bus with constant re-polling. > >>>> The disadvantage is that it is hard to pick a universal > >>>> number. A combination of these approaches might work, > >>>> e.g. a recommended timeout then poll. > >>> We've already had msleep() for vp_reset(), anyhow we can increase the > >>> sleep time, if it can overload the pci: > >>> > >>> while (vp_modern_get_status(mdev)) > >>> msleep(1); > >>> > >>> We can do the same for suspending. > >>> > >>> The main blocker for timeout is that it may break migration and > >>> complicate the hardening. Another proposal in the past is to have a > >>> notification. > >>> > >>> But what I don't understand here is that suspend/resume should be > >>> lighter than reset. If we can afford a reset, so did the > >>> suspending/resume. If we want to have something new, that's fine but > >>> it should be orthogonal to a specific new status bit? > >> I agree, if we want new status indicator, then the new indicator should not be > >> specific to SUSPEND. > >> > >> Thanks > >> Zhu Lingshan > > If you mean reset, we have a problem. > > Specifically, reset has to work before feature > > negotiation. So we can not do as we would with > > SUSPEND and use a feature bit to expose presence > > of the indicator register. > I guess the driver can reset the device even after DRIVER_OK. in fact, unlike suspend - at any point at all. > IMHO the indicator should work for all device_status transitions, > not only for RESET or SUSPEND, it should also apply for FEATURE_OK > and DRIVER_OK. I don't know what kind of transition is there for DRIVER/DRIVER_OK. FEATURES_OK brings with it more issues, e.g. it can fail. > So we don't need a feature bit, we may just > place it in the common config space as you proposed before to > define common config space for all transport. We can do this for sure! 
I have to say, this project is really ballooning out ;) > Of course another approach is what Jason proposed, the existing > msleep and poll, driver knows whether the device is in progress > of status transitions because it is the driver writes device_status. > > Thanks Or just add a register for suspend transitions and worry about extending it later. > > > > Regrettably, reset will have to still be supported > > through clearing status just because this is always there. > > > > > >>>>> Anyhow driver know > >>>>> its own status. > >>>>> > >>>>> Thanks > >>>> indeed, the status register is there to inform the device about > >>>> the driver status. > >>>> > >>>> > >>>> > >>>>>> If we need > >>>>>> to reflect and control device status, we need something else. > >>>>>> > >>> Thanks > >>> ^ permalink raw reply [flat|nested] 69+ messages in thread
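The four states quoted above (suspending->suspended->resuming->resumed) fit in a 2-bit field, which is the "register with at least 2 bits" argument. The following is a rough sketch only: the numeric encoding, the mask, and every identifier (`enum xfer_state`, `decode_state()`, `in_transition()`, `next_state()`, `state_name()`) are assumptions made up for illustration. Neither the thread nor the patch defines such a register layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding of the four states named in the thread.
 * The values are chosen so that the cycle is a simple increment. */
enum xfer_state {
    ST_RESUMED    = 0,
    ST_SUSPENDING = 1,
    ST_SUSPENDED  = 2,
    ST_RESUMING   = 3,
};

#define STATE_MASK 0x3u /* two bits are enough for four states */

/* Extract the state from the low two bits of a register value. */
static enum xfer_state decode_state(uint8_t reg)
{
    return (enum xfer_state)(reg & STATE_MASK);
}

/* True while the device is between two stable states. */
static int in_transition(enum xfer_state s)
{
    return s == ST_SUSPENDING || s == ST_RESUMING;
}

/* Next state in the cycle resumed -> suspending -> suspended -> resuming. */
static enum xfer_state next_state(enum xfer_state s)
{
    return (enum xfer_state)(((unsigned)s + 1u) & STATE_MASK);
}

/* Human-readable name, e.g. for debug output. */
static const char *state_name(enum xfer_state s)
{
    switch (s) {
    case ST_RESUMED:    return "resumed";
    case ST_SUSPENDING: return "suspending";
    case ST_SUSPENDED:  return "suspended";
    case ST_RESUMING:   return "resuming";
    }
    return "unknown";
}
```

With an encoding like this, a debugger or driver reading the register can tell a transition in progress from a completed one, which addresses the "hidden state" concern raised earlier in the thread.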
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 8:12 ` Michael S. Tsirkin @ 2024-09-05 9:09 ` Zhu Lingshan 2024-09-06 1:54 ` Parav Pandit 2024-09-05 23:51 ` Jason Wang 1 sibling, 1 reply; 69+ messages in thread From: Zhu Lingshan @ 2024-09-05 9:09 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jason Wang, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/5/2024 4:12 PM, Michael S. Tsirkin wrote: > On Thu, Sep 05, 2024 at 03:12:41PM +0800, Zhu Lingshan wrote: >> >> On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: >>> On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: >>>> On 9/4/2024 2:31 PM, Jason Wang wrote: >>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: >>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: >>>>>>>>>>> But I don't like it that looking at the registers, one does not know the device >>>>>>>>>>> state. Hidden state is bad for debuggability. >>>>>>>>>>> We have 4 states: >>>>>>>>>>> suspending->suspended->resuming->resumed >>>>>>>>>>> so we need a register with at least 2 bits. >>>>>>>>>>> >>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. >>>>>>>>>>> >>>>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. >>>>>>>>> No argument here. >>>>>>>> Or, to be more precise, our status is driver status. >>>>>>> It looks like the device actually otherwise there's no need for >>>>>>> re-read or poll for things like reset and others. >>>>>> The need is there for complex device state transitions, which >>>>>> can not reasonably block a read response. >>>>>> Another standard approach with PCI is to specify the time >>>>>> transitions can take. 
I consider that less elegant - >>>>>> is this what you are advocating? The advantage is that >>>>>> driver does not load the pci bus with constant re-polling. >>>>>> The disadvantage is that it is hard to pick a universal >>>>>> number. A combination of these approaches might work, >>>>>> e.g. a recommended timeout then poll. >>>>> We've already had msleep() for vp_reset(), anyhow we can increase the >>>>> sleep time, if it can overload the pci: >>>>> >>>>> while (vp_modern_get_status(mdev)) >>>>> msleep(1); >>>>> >>>>> We can do the same for suspending. >>>>> >>>>> The main blocker for timeout is that it may break migration and >>>>> complicate the hardening. Another proposal in the past is to have a >>>>> notification. >>>>> >>>>> But what I don't understand here is that suspend/resume should be >>>>> lighter than reset. If we can afford a reset, so did the >>>>> suspending/resume. If we want to have something new, that's fine but >>>>> it should be orthogonal to a specific new status bit? >>>> I agree, if we want new status indicator, then the new indicator should not be >>>> specific to SUSPEND. >>>> >>>> Thanks >>>> Zhu Lingshan >>> If you mean reset, we have a problem. >>> Specifically, reset has to work before feature >>> negotiation. So we can not do as we would with >>> SUSPEND and use a feature bit to expose presence >>> of the indicator register. >> I guess the driver can reset the device even after DRIVER_OK. > in fact, unlike suspend - at any point at all. > >> IMHO the indicator should work for all device_status transitions, >> not only for RESET or SUSPEND, it should also apply for FEATURE_OK >> and DRIVER_OK. > I don't know what kind of transition is there for DRIVER/DRIVER_OK. > FEATURES_OK brings with it more issues, e.g. it can fail. > > >> So we don't need a feature bit, we may just >> place it in the common config space as you proposed before to >> define common config space for all transport. We can do this for sure! 
> I have to say, this project is really ballooning out ;) > >> Of course another approach is what Jason proposed, the existing >> msleep and poll, driver knows whether the device is in progress >> of status transitions because it is the driver writes device_status. >> >> Thanks > or just a register for suspend transitions, worry about > extending it later/ OK, will implement it in V8 > >>> Regrettably, reset will have to still be supported >>> through clearing status just because this is always there. >>> >>> >>>>>>> Anyhow driver know >>>>>>> its own status. >>>>>>> >>>>>>> Thanks >>>>>> indeed, the status register is there to inform the device about >>>>>> the driver status. >>>>>> >>>>>> >>>>>> >>>>>>>> If we need >>>>>>>> to reflect and control device status, we need something else. >>>>>>>> >>>>> Thanks >>>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 9:09 ` Zhu Lingshan @ 2024-09-06 1:54 ` Parav Pandit 0 siblings, 0 replies; 69+ messages in thread From: Parav Pandit @ 2024-09-06 1:54 UTC (permalink / raw) To: Zhu Lingshan, Michael S. Tsirkin Cc: Jason Wang, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Thursday, September 5, 2024 2:40 PM > >> Thanks > > or just a register for suspend transitions, worry about extending it > > later/ > OK, will implement it in V8 Thanks Zhu Lingshan. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 8:12 ` Michael S. Tsirkin 2024-09-05 9:09 ` Zhu Lingshan @ 2024-09-05 23:51 ` Jason Wang 2024-09-11 3:52 ` Zhu Lingshan 2024-09-11 10:20 ` Michael S. Tsirkin 1 sibling, 2 replies; 69+ messages in thread From: Jason Wang @ 2024-09-05 23:51 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Thu, Sep 5, 2024 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Sep 05, 2024 at 03:12:41PM +0800, Zhu Lingshan wrote: > > > > > > On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: > > > On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: > > >> > > >> On 9/4/2024 2:31 PM, Jason Wang wrote: > > >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > > >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > > >>>>>>>>> But I don't like it that looking at the registers, one does not know the device > > >>>>>>>>> state. Hidden state is bad for debuggability. > > >>>>>>>>> We have 4 states: > > >>>>>>>>> suspending->suspended->resuming->resumed > > >>>>>>>>> so we need a register with at least 2 bits. > > >>>>>>>>> > > >>>>>>>>> we could steal 2 bits from status but it seems a bit much. > > >>>>>>>>> > > >>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. > > >>>>>>> No argument here. > > >>>>>> Or, to be more precise, our status is driver status. > > >>>>> It looks like the device actually otherwise there's no need for > > >>>>> re-read or poll for things like reset and others. > > >>>> The need is there for complex device state transitions, which > > >>>> can not reasonably block a read response. 
> > >>>> Another standard approach with PCI is to specify the time > > >>>> transitions can take. I consider that less elegant - > > >>>> is this what you are advocating? The advantage is that > > >>>> driver does not load the pci bus with constant re-polling. > > >>>> The disadvantage is that it is hard to pick a universal > > >>>> number. A combination of these approaches might work, > > >>>> e.g. a recommended timeout then poll. > > >>> We've already had msleep() for vp_reset(), anyhow we can increase the > > >>> sleep time, if it can overload the pci: > > >>> > > >>> while (vp_modern_get_status(mdev)) > > >>> msleep(1); > > >>> > > >>> We can do the same for suspending. > > >>> > > >>> The main blocker for timeout is that it may break migration and > > >>> complicate the hardening. Another proposal in the past is to have a > > >>> notification. > > >>> > > >>> But what I don't understand here is that suspend/resume should be > > >>> lighter than reset. If we can afford a reset, so did the > > >>> suspending/resume. If we want to have something new, that's fine but > > >>> it should be orthogonal to a specific new status bit? > > >> I agree, if we want new status indicator, then the new indicator should not be > > >> specific to SUSPEND. > > >> > > >> Thanks > > >> Zhu Lingshan > > > If you mean reset, we have a problem. > > > Specifically, reset has to work before feature > > > negotiation. So we can not do as we would with > > > SUSPEND and use a feature bit to expose presence > > > of the indicator register. > > I guess the driver can reset the device even after DRIVER_OK. > > in fact, unlike suspend - at any point at all. > > > IMHO the indicator should work for all device_status transitions, > > not only for RESET or SUSPEND, it should also apply for FEATURE_OK > > and DRIVER_OK. > > I don't know what kind of transition is there for DRIVER/DRIVER_OK. > FEATURES_OK brings with it more issues, e.g. it can fail. 
> > > > So we don't need a feature bit, we may just > > place it in the common config space as you proposed before to > > define common config space for all transport. We can do this for sure! > > I have to say, this project is really ballooning out ;) > > > Of course another approach is what Jason proposed, the existing > > msleep and poll, driver knows whether the device is in progress > > of status transitions because it is the driver writes device_status. > > > > Thanks > > or just a register for suspend transitions, worry about > extending it later/ Is this for PCI only? Another question: if suspend needs that, reset would want it too. Otherwise it doesn't justify itself, as reset needs to take a longer time than suspend. Thanks > > > > > > > Regrettably, reset will have to still be supported > > > through clearing status just because this is always there. > > > > > > > > >>>>> Anyhow driver know > > >>>>> its own status. > > >>>>> > > >>>>> Thanks > > >>>> indeed, the status register is there to inform the device about > > >>>> the driver status. > > >>>> > > >>>> > > >>>> > > >>>>>> If we need > > >>>>>> to reflect and control device status, we need something else. > > >>>>>> > > >>> Thanks > > >>> > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 23:51 ` Jason Wang @ 2024-09-11 3:52 ` Zhu Lingshan 2024-09-11 10:20 ` Michael S. Tsirkin 1 sibling, 0 replies; 69+ messages in thread From: Zhu Lingshan @ 2024-09-11 3:52 UTC (permalink / raw) To: Jason Wang, Michael S. Tsirkin Cc: Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/6/2024 7:51 AM, Jason Wang wrote: > On Thu, Sep 5, 2024 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: >> On Thu, Sep 05, 2024 at 03:12:41PM +0800, Zhu Lingshan wrote: >>> >>> On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: >>>> On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: >>>>> On 9/4/2024 2:31 PM, Jason Wang wrote: >>>>>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: >>>>>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: >>>>>>>>>>>> But I don't like it that looking at the registers, one does not know the device >>>>>>>>>>>> state. Hidden state is bad for debuggability. >>>>>>>>>>>> We have 4 states: >>>>>>>>>>>> suspending->suspended->resuming->resumed >>>>>>>>>>>> so we need a register with at least 2 bits. >>>>>>>>>>>> >>>>>>>>>>>> we could steal 2 bits from status but it seems a bit much. >>>>>>>>>>>> >>>>>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. >>>>>>>>>> No argument here. >>>>>>>>> Or, to be more precise, our status is driver status. >>>>>>>> It looks like the device actually otherwise there's no need for >>>>>>>> re-read or poll for things like reset and others. >>>>>>> The need is there for complex device state transitions, which >>>>>>> can not reasonably block a read response. 
>>>>>>> Another standard approach with PCI is to specify the time >>>>>>> transitions can take. I consider that less elegant - >>>>>>> is this what you are advocating? The advantage is that >>>>>>> driver does not load the pci bus with constant re-polling. >>>>>>> The disadvantage is that it is hard to pick a universal >>>>>>> number. A combination of these approaches might work, >>>>>>> e.g. a recommended timeout then poll. >>>>>> We've already had msleep() for vp_reset(), anyhow we can increase the >>>>>> sleep time, if it can overload the pci: >>>>>> >>>>>> while (vp_modern_get_status(mdev)) >>>>>> msleep(1); >>>>>> >>>>>> We can do the same for suspending. >>>>>> >>>>>> The main blocker for timeout is that it may break migration and >>>>>> complicate the hardening. Another proposal in the past is to have a >>>>>> notification. >>>>>> >>>>>> But what I don't understand here is that suspend/resume should be >>>>>> lighter than reset. If we can afford a reset, so did the >>>>>> suspending/resume. If we want to have something new, that's fine but >>>>>> it should be orthogonal to a specific new status bit? >>>>> I agree, if we want new status indicator, then the new indicator should not be >>>>> specific to SUSPEND. >>>>> >>>>> Thanks >>>>> Zhu Lingshan >>>> If you mean reset, we have a problem. >>>> Specifically, reset has to work before feature >>>> negotiation. So we can not do as we would with >>>> SUSPEND and use a feature bit to expose presence >>>> of the indicator register. >>> I guess the driver can reset the device even after DRIVER_OK. >> in fact, unlike suspend - at any point at all. >> >>> IMHO the indicator should work for all device_status transitions, >>> not only for RESET or SUSPEND, it should also apply for FEATURE_OK >>> and DRIVER_OK. >> I don't know what kind of transition is there for DRIVER/DRIVER_OK. >> FEATURES_OK brings with it more issues, e.g. it can fail. 
>> >> >>> So we don't need a feature bit, we may just >>> place it in the common config space as you proposed before to >>> define common config space for all transport. We can do this for sure! >> I have to say, this project is really ballooning out ;) >> >>> Of course another approach is what Jason proposed, the existing >>> msleep and poll, driver knows whether the device is in progress >>> of status transitions because it is the driver writes device_status. >>> >>> Thanks >> or just a register for suspend transitions, worry about >> extending it later/ > Is this for PCI only? Another question is that, if suspend needs that, > reset would also want. Or it doesn't justify itself as reset needs to > take longer time than reset. Ping Michael. I think Jason's question is critical and we should address it before we proceed to v8. Thanks > > Thanks > > >>>> Regrettably, reset will have to still be supported >>>> through clearing status just because this is always there. >>>> >>>> >>>>>>>> Anyhow driver know >>>>>>>> its own status. >>>>>>>> >>>>>>>> Thanks >>>>>>> indeed, the status register is there to inform the device about >>>>>>> the driver status. >>>>>>> >>>>>>> >>>>>>> >>>>>>>>> If we need >>>>>>>>> to reflect and control device status, we need something else. >>>>>>>>> >>>>>> Thanks >>>>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 23:51 ` Jason Wang 2024-09-11 3:52 ` Zhu Lingshan @ 2024-09-11 10:20 ` Michael S. Tsirkin 2024-09-12 2:05 ` Jason Wang 1 sibling, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-11 10:20 UTC (permalink / raw) To: Jason Wang Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Fri, Sep 06, 2024 at 07:51:08AM +0800, Jason Wang wrote: > On Thu, Sep 5, 2024 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Thu, Sep 05, 2024 at 03:12:41PM +0800, Zhu Lingshan wrote: > > > > > > > > > On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: > > > > On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: > > > >> > > > >> On 9/4/2024 2:31 PM, Jason Wang wrote: > > > >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > > > >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > > > >>>>>>>>> But I don't like it that looking at the registers, one does not know the device > > > >>>>>>>>> state. Hidden state is bad for debuggability. > > > >>>>>>>>> We have 4 states: > > > >>>>>>>>> suspending->suspended->resuming->resumed > > > >>>>>>>>> so we need a register with at least 2 bits. > > > >>>>>>>>> > > > >>>>>>>>> we could steal 2 bits from status but it seems a bit much. > > > >>>>>>>>> > > > >>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. > > > >>>>>>> No argument here. > > > >>>>>> Or, to be more precise, our status is driver status. > > > >>>>> It looks like the device actually otherwise there's no need for > > > >>>>> re-read or poll for things like reset and others. 
> > > >>>> The need is there for complex device state transitions, which > > > >>>> can not reasonably block a read response. > > > >>>> Another standard approach with PCI is to specify the time > > > >>>> transitions can take. I consider that less elegant - > > > >>>> is this what you are advocating? The advantage is that > > > >>>> driver does not load the pci bus with constant re-polling. > > > >>>> The disadvantage is that it is hard to pick a universal > > > >>>> number. A combination of these approaches might work, > > > >>>> e.g. a recommended timeout then poll. > > > >>> We've already had msleep() for vp_reset(), anyhow we can increase the > > > >>> sleep time, if it can overload the pci: > > > >>> > > > >>> while (vp_modern_get_status(mdev)) > > > >>> msleep(1); > > > >>> > > > >>> We can do the same for suspending. > > > >>> > > > >>> The main blocker for timeout is that it may break migration and > > > >>> complicate the hardening. Another proposal in the past is to have a > > > >>> notification. > > > >>> > > > >>> But what I don't understand here is that suspend/resume should be > > > >>> lighter than reset. If we can afford a reset, so did the > > > >>> suspending/resume. If we want to have something new, that's fine but > > > >>> it should be orthogonal to a specific new status bit? > > > >> I agree, if we want new status indicator, then the new indicator should not be > > > >> specific to SUSPEND. > > > >> > > > >> Thanks > > > >> Zhu Lingshan > > > > If you mean reset, we have a problem. > > > > Specifically, reset has to work before feature > > > > negotiation. So we can not do as we would with > > > > SUSPEND and use a feature bit to expose presence > > > > of the indicator register. > > > I guess the driver can reset the device even after DRIVER_OK. > > > > in fact, unlike suspend - at any point at all. 
> > > > > IMHO the indicator should work for all device_status transitions, > > > not only for RESET or SUSPEND, it should also apply for FEATURE_OK > > > and DRIVER_OK. > > > > I don't know what kind of transition is there for DRIVER/DRIVER_OK. > > FEATURES_OK brings with it more issues, e.g. it can fail. > > > > > > > So we don't need a feature bit, we may just > > > place it in the common config space as you proposed before to > > > define common config space for all transport. We can do this for sure! > > > > I have to say, this project is really ballooning out ;) > > > > > Of course another approach is what Jason proposed, the existing > > > msleep and poll, driver knows whether the device is in progress > > > of status transitions because it is the driver writes device_status. > > > > > > Thanks > > > > or just a register for suspend transitions, worry about > > extending it later/ > > Is this for PCI only? It's best to add it to all transports. Not rocket science at all. > Another question is that, if suspend needs that, > reset would also want. I don't know what "reset would also want" means. Unlike suspend, reset is not a special state, so it does not really require a special register to track that state. > Or it doesn't justify itself as reset needs to > take longer time than reset. > > Thanks > Don't know what this means, either. > > > > > > > > > > Regrettably, reset will have to still be supported > > > > through clearing status just because this is always there. > > > > > > > > > > > >>>>> Anyhow driver know > > > >>>>> its own status. > > > >>>>> > > > >>>>> Thanks > > > >>>> indeed, the status register is there to inform the device about > > > >>>> the driver status. > > > >>>> > > > >>>> > > > >>>> > > > >>>>>> If we need > > > >>>>>> to reflect and control device status, we need something else. > > > >>>>>> > > > >>> Thanks > > > >>> > > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-11 10:20 ` Michael S. Tsirkin @ 2024-09-12 2:05 ` Jason Wang 2024-09-12 5:44 ` Michael S. Tsirkin 0 siblings, 1 reply; 69+ messages in thread From: Jason Wang @ 2024-09-12 2:05 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Wed, Sep 11, 2024 at 6:20 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Sep 06, 2024 at 07:51:08AM +0800, Jason Wang wrote: > > On Thu, Sep 5, 2024 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Thu, Sep 05, 2024 at 03:12:41PM +0800, Zhu Lingshan wrote: > > > > > > > > > > > > On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: > > > > > On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: > > > > >> > > > > >> On 9/4/2024 2:31 PM, Jason Wang wrote: > > > > >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > > > > >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > > > > >>>>>>>>> But I don't like it that looking at the registers, one does not know the device > > > > >>>>>>>>> state. Hidden state is bad for debuggability. > > > > >>>>>>>>> We have 4 states: > > > > >>>>>>>>> suspending->suspended->resuming->resumed > > > > >>>>>>>>> so we need a register with at least 2 bits. > > > > >>>>>>>>> > > > > >>>>>>>>> we could steal 2 bits from status but it seems a bit much. > > > > >>>>>>>>> > > > > >>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. > > > > >>>>>>> No argument here. > > > > >>>>>> Or, to be more precise, our status is driver status. 
> > > > >>>>> It looks like the device actually otherwise there's no need for > > > > >>>>> re-read or poll for things like reset and others. > > > > >>>> The need is there for complex device state transitions, which > > > > >>>> can not reasonably block a read response. > > > > >>>> Another standard approach with PCI is to specify the time > > > > >>>> transitions can take. I consider that less elegant - > > > > >>>> is this what you are advocating? The advantage is that > > > > >>>> driver does not load the pci bus with constant re-polling. > > > > >>>> The disadvantage is that it is hard to pick a universal > > > > >>>> number. A combination of these approaches might work, > > > > >>>> e.g. a recommended timeout then poll. > > > > >>> We've already had msleep() for vp_reset(), anyhow we can increase the > > > > >>> sleep time, if it can overload the pci: > > > > >>> > > > > >>> while (vp_modern_get_status(mdev)) > > > > >>> msleep(1); > > > > >>> > > > > >>> We can do the same for suspending. > > > > >>> > > > > >>> The main blocker for timeout is that it may break migration and > > > > >>> complicate the hardening. Another proposal in the past is to have a > > > > >>> notification. > > > > >>> > > > > >>> But what I don't understand here is that suspend/resume should be > > > > >>> lighter than reset. If we can afford a reset, so did the > > > > >>> suspending/resume. If we want to have something new, that's fine but > > > > >>> it should be orthogonal to a specific new status bit? > > > > >> I agree, if we want new status indicator, then the new indicator should not be > > > > >> specific to SUSPEND. > > > > >> > > > > >> Thanks > > > > >> Zhu Lingshan > > > > > If you mean reset, we have a problem. > > > > > Specifically, reset has to work before feature > > > > > negotiation. So we can not do as we would with > > > > > SUSPEND and use a feature bit to expose presence > > > > > of the indicator register. 
> > > > I guess the driver can reset the device even after DRIVER_OK. > > > > > > in fact, unlike suspend - at any point at all. > > > > > > > IMHO the indicator should work for all device_status transitions, > > > > not only for RESET or SUSPEND, it should also apply for FEATURE_OK > > > > and DRIVER_OK. > > > > > > I don't know what kind of transition is there for DRIVER/DRIVER_OK. > > > FEATURES_OK brings with it more issues, e.g. it can fail. > > > > > > > > > > So we don't need a feature bit, we may just > > > > place it in the common config space as you proposed before to > > > > define common config space for all transport. We can do this for sure! > > > > > > I have to say, this project is really ballooning out ;) > > > > > > > Of course another approach is what Jason proposed, the existing > > > > msleep and poll, driver knows whether the device is in progress > > > > of status transitions because it is the driver writes device_status. > > > > > > > > Thanks > > > > > > or just a register for suspend transitions, worry about > > > extending it later/ > > > > Is this for PCI only? > > It's best to add it to all transports. Not rocket science at all. I'm asking since I wonder if the "issue" exists only for PCI. If yes, solving the PCI transport (registers) issue at the basic facility level seems like a layer violation. And I still don't see why introducing a single new bit in the status brings any new trouble compared with the existing reset and other state transitions. As mentioned before, re-read has been used for both FEATURES_OK and reset. > > > Another question is that, if suspend needs that, > > reset would also want. > > I don't know what does "reset would also want" means. I meant that we already have a state transition like reset.
For example, you worry about: suspending->suspended->resuming->resumed But we already have setting_features -> feature_ok(or not) -> resetting -> reset > Unlike suspend, reset is not a special state, It depends, but we have a dedicated chapter in the basic facility describing the reset state. > so it > does not really require a spcial register to track > that state. This seems to be PCI specific. Note that ccw has a dedicated command for reset: #define CCW_CMD_VDEV_RESET 0x33 ... #define CCW_CMD_WRITE_STATUS 0x31 > > > Or it doesn't justify itself as reset needs to > > take longer time than reset. > > > > Thanks > > > > Don't know what this means, either. I meant 1) the driver knows its own status and can read device status 2) reset may take much longer than suspend If we really need a new status register (I don't see why so far), it should not cover only suspend. Thanks > > > > > > > > > > > > > > Regrettably, reset will have to still be supported > > > > > through clearing status just because this is always there. > > > > > > > > > > > > > > >>>>> Anyhow driver know > > > > >>>>> its own status. > > > > >>>>> > > > > >>>>> Thanks > > > > >>>> indeed, the status register is there to inform the device about > > > > >>>> the driver status. > > > > >>>> > > > > >>>> > > > > >>>> > > > > >>>>>> If we need > > > > >>>>>> to reflect and control device status, we need something else. > > > > >>>>>> > > > > >>> Thanks > > > > >>> > > > > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-12 2:05 ` Jason Wang @ 2024-09-12 5:44 ` Michael S. Tsirkin 2024-09-24 7:35 ` Jason Wang 0 siblings, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-12 5:44 UTC (permalink / raw) To: Jason Wang Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Thu, Sep 12, 2024 at 10:05:02AM +0800, Jason Wang wrote: > On Wed, Sep 11, 2024 at 6:20 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Fri, Sep 06, 2024 at 07:51:08AM +0800, Jason Wang wrote: > > > On Thu, Sep 5, 2024 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Thu, Sep 05, 2024 at 03:12:41PM +0800, Zhu Lingshan wrote: > > > > > > > > > > > > > > > On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: > > > > > > On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: > > > > > >> > > > > > >> On 9/4/2024 2:31 PM, Jason Wang wrote: > > > > > >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > > > > > >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > > > > > >>>>>>>>> But I don't like it that looking at the registers, one does not know the device > > > > > >>>>>>>>> state. Hidden state is bad for debuggability. > > > > > >>>>>>>>> We have 4 states: > > > > > >>>>>>>>> suspending->suspended->resuming->resumed > > > > > >>>>>>>>> so we need a register with at least 2 bits. > > > > > >>>>>>>>> > > > > > >>>>>>>>> we could steal 2 bits from status but it seems a bit much. > > > > > >>>>>>>>> > > > > > >>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. > > > > > >>>>>>> No argument here. > > > > > >>>>>> Or, to be more precise, our status is driver status. 
> > > > > >>>>> It looks like the device actually otherwise there's no need for > > > > > >>>>> re-read or poll for things like reset and others. > > > > > >>>> The need is there for complex device state transitions, which > > > > > >>>> can not reasonably block a read response. > > > > > >>>> Another standard approach with PCI is to specify the time > > > > > >>>> transitions can take. I consider that less elegant - > > > > > >>>> is this what you are advocating? The advantage is that > > > > > >>>> driver does not load the pci bus with constant re-polling. > > > > > >>>> The disadvantage is that it is hard to pick a universal > > > > > >>>> number. A combination of these approaches might work, > > > > > >>>> e.g. a recommended timeout then poll. > > > > > >>> We've already had msleep() for vp_reset(), anyhow we can increase the > > > > > >>> sleep time, if it can overload the pci: > > > > > >>> > > > > > >>> while (vp_modern_get_status(mdev)) > > > > > >>> msleep(1); > > > > > >>> > > > > > >>> We can do the same for suspending. > > > > > >>> > > > > > >>> The main blocker for timeout is that it may break migration and > > > > > >>> complicate the hardening. Another proposal in the past is to have a > > > > > >>> notification. > > > > > >>> > > > > > >>> But what I don't understand here is that suspend/resume should be > > > > > >>> lighter than reset. If we can afford a reset, so did the > > > > > >>> suspending/resume. If we want to have something new, that's fine but > > > > > >>> it should be orthogonal to a specific new status bit? > > > > > >> I agree, if we want new status indicator, then the new indicator should not be > > > > > >> specific to SUSPEND. > > > > > >> > > > > > >> Thanks > > > > > >> Zhu Lingshan > > > > > > If you mean reset, we have a problem. > > > > > > Specifically, reset has to work before feature > > > > > > negotiation. 
So we can not do as we would with > > > > > > SUSPEND and use a feature bit to expose presence > > > > > > of the indicator register. > > > > > I guess the driver can reset the device even after DRIVER_OK. > > > > > > > > in fact, unlike suspend - at any point at all. > > > > > > > > > IMHO the indicator should work for all device_status transitions, > > > > > not only for RESET or SUSPEND, it should also apply for FEATURE_OK > > > > > and DRIVER_OK. > > > > > > > > I don't know what kind of transition is there for DRIVER/DRIVER_OK. > > > > FEATURES_OK brings with it more issues, e.g. it can fail. > > > > > > > > > > > > > So we don't need a feature bit, we may just > > > > > place it in the common config space as you proposed before to > > > > > define common config space for all transport. We can do this for sure! > > > > > > > > I have to say, this project is really ballooning out ;) > > > > > > > > > Of course another approach is what Jason proposed, the existing > > > > > msleep and poll, driver knows whether the device is in progress > > > > > of status transitions because it is the driver writes device_status. > > > > > > > > > > Thanks > > > > > > > > or just a register for suspend transitions, worry about > > > > extending it later/ > > > > > > Is this for PCI only? > > > > It's best to add it to all transports. Not rocket science at all. > > I'm asking since I wonder if the "issue" exists only for PCI. > If yes, > solving the PCI transport (registers) issue at the basic facility > level seems like a layer violation. I don't think so. Suspend is fundamentally a slow operation for any transport. > And I still don't see why > introducing a single new bit in the status brings any new troubles > compared with the existing reset and other state transitions. reset is not a state. > As > mentioned before, re-read has been used for both FEAUTRES_OK and > reset. 
FEATURES_OK is driver state, so it can take place immediately > > > > > Another question is that, if suspend needs that, > > > reset would also want. > > > > I don't know what does "reset would also want" means. > > I meant we have already had a state transition like reset. > > For example, you worries about: > > suspending->suspended->resuming->resumed > > But we've already had > > setting_features -> feature_ok(or not) -> resetting -> reseted features ok just has to validate the bitmap, so it can take place immediately. no intermediate state. > > Unlike suspend, reset is not a special state, > > It depends but we had a dedicated chapter in the basic facility to > describe the reset state. these are two different meanings of the same word. reset is not a special state, device operates normally. > > so it > > does not really require a spcial register to track > > that state. > > This seems to be PCI specific. Note that ccw has a dedicated command for reset: > > #define CCW_CMD_VDEV_RESET 0x33 > ... > #define CCW_CMD_WRITE_STATUS 0x31 maybe reusing state 0 to reset was a bad idea. > > > > > Or it doesn't justify itself as reset needs to > > > take longer time than reset. > > > > > > Thanks > > > > > > > Don't know what this means, either. > > I meant > > 1) driver knows its own status and it can read device status > 2) reset may take much longer time than suspend that is why driver re-reads status to check for reset. > > If we really need a new status register (I don't see why so far), we > should not only suspend. > > Thanks it's not a new status register, it's a suspend register used for PM. > > > > > > > > > > > > > > > > > > Regrettably, reset will have to still be supported > > > > > > through clearing status just because this is always there. > > > > > > > > > > > > > > > > > >>>>> Anyhow driver know > > > > > >>>>> its own status.
> > > > > >>>>> > > > > > >>>>> Thanks > > > > > >>>> indeed, the status register is there to inform the device about > > > > > >>>> the driver status. > > > > > >>>> > > > > > >>>> > > > > > >>>> > > > > > >>>>>> If we need > > > > > >>>>>> to reflect and control device status, we need something else. > > > > > >>>>>> > > > > > >>> Thanks > > > > > >>> > > > > > > ^ permalink raw reply [flat|nested] 69+ messages in thread
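The four-state argument quoted in this message (suspending->suspended->resuming->resumed, hence a register with at least 2 bits) can be made concrete with a small sketch. Nothing in the thread or the patch defines such a register: the names, the mask, and the particular encoding below are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical 2-bit encoding. The thread only states that four states
 * need at least 2 bits; this specific assignment is an assumption.
 */
enum suspend_state {
    ST_RESUMED    = 0, /* normal operation; 0 chosen so a zeroed register means "running" */
    ST_SUSPENDING = 1, /* suspend requested, not yet complete */
    ST_SUSPENDED  = 2, /* suspend complete */
    ST_RESUMING   = 3, /* resume requested, not yet complete */
};

#define SUSPEND_STATE_MASK 0x3u

/* Extract the state from the low two bits of a (hypothetical) register value. */
enum suspend_state suspend_state_decode(uint8_t reg)
{
    return (enum suspend_state)(reg & SUSPEND_STATE_MASK);
}

/* With this encoding, the low bit marks a transition that is still in flight. */
bool suspend_state_in_transition(enum suspend_state s)
{
    return ((unsigned)s & 1u) != 0;
}

/* Next state in the cycle resumed -> suspending -> suspended -> resuming -> resumed. */
enum suspend_state suspend_state_next(enum suspend_state s)
{
    assert((unsigned)s <= SUSPEND_STATE_MASK);
    return (enum suspend_state)(((unsigned)s + 1u) & SUSPEND_STATE_MASK);
}
```

With an encoding like this, anyone inspecting the register can tell a completed suspend from one still in progress, which is the debuggability point raised in the quoted text. Whether a separate register is needed at all is exactly what the thread disputes.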
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-12 5:44 ` Michael S. Tsirkin @ 2024-09-24 7:35 ` Jason Wang 2024-09-24 23:05 ` Michael S. Tsirkin 0 siblings, 1 reply; 69+ messages in thread From: Jason Wang @ 2024-09-24 7:35 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Thu, Sep 12, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Sep 12, 2024 at 10:05:02AM +0800, Jason Wang wrote: > > On Wed, Sep 11, 2024 at 6:20 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Fri, Sep 06, 2024 at 07:51:08AM +0800, Jason Wang wrote: > > > > On Thu, Sep 5, 2024 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Thu, Sep 05, 2024 at 03:12:41PM +0800, Zhu Lingshan wrote: > > > > > > > > > > > > > > > > > > On 9/5/2024 2:51 PM, Michael S. Tsirkin wrote: > > > > > > > On Wed, Sep 04, 2024 at 02:38:36PM +0800, Zhu Lingshan wrote: > > > > > > >> > > > > > > >> On 9/4/2024 2:31 PM, Jason Wang wrote: > > > > > > >>> On Wed, Sep 4, 2024 at 12:03 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > >>>> On Wed, Sep 04, 2024 at 11:07:25AM +0800, Jason Wang wrote: > > > > > > >>>>> On Tue, Sep 3, 2024 at 6:37 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > >>>>>> On Tue, Sep 03, 2024 at 06:35:59AM -0400, Michael S. Tsirkin wrote: > > > > > > >>>>>>>>> But I don't like it that looking at the registers, one does not know the device > > > > > > >>>>>>>>> state. Hidden state is bad for debuggability. > > > > > > >>>>>>>>> We have 4 states: > > > > > > >>>>>>>>> suspending->suspended->resuming->resumed > > > > > > >>>>>>>>> so we need a register with at least 2 bits. > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> we could steal 2 bits from status but it seems a bit much. 
> > > > > > >>>>>>>>> > > > > > > >>>>>>>> This is why letting the status tell the status and control register to control thing is elegant. > > > > > > >>>>>>> No argument here. > > > > > > >>>>>> Or, to be more precise, our status is driver status. > > > > > > >>>>> It looks like the device actually otherwise there's no need for > > > > > > >>>>> re-read or poll for things like reset and others. > > > > > > >>>> The need is there for complex device state transitions, which > > > > > > >>>> can not reasonably block a read response. > > > > > > >>>> Another standard approach with PCI is to specify the time > > > > > > >>>> transitions can take. I consider that less elegant - > > > > > > >>>> is this what you are advocating? The advantage is that > > > > > > >>>> driver does not load the pci bus with constant re-polling. > > > > > > >>>> The disadvantage is that it is hard to pick a universal > > > > > > >>>> number. A combination of these approaches might work, > > > > > > >>>> e.g. a recommended timeout then poll. > > > > > > >>> We've already had msleep() for vp_reset(), anyhow we can increase the > > > > > > >>> sleep time, if it can overload the pci: > > > > > > >>> > > > > > > >>> while (vp_modern_get_status(mdev)) > > > > > > >>> msleep(1); > > > > > > >>> > > > > > > >>> We can do the same for suspending. > > > > > > >>> > > > > > > >>> The main blocker for timeout is that it may break migration and > > > > > > >>> complicate the hardening. Another proposal in the past is to have a > > > > > > >>> notification. > > > > > > >>> > > > > > > >>> But what I don't understand here is that suspend/resume should be > > > > > > >>> lighter than reset. If we can afford a reset, so did the > > > > > > >>> suspending/resume. If we want to have something new, that's fine but > > > > > > >>> it should be orthogonal to a specific new status bit? > > > > > > >> I agree, if we want new status indicator, then the new indicator should not be > > > > > > >> specific to SUSPEND. 
> > > > > > >> > > > > > > >> Thanks > > > > > > >> Zhu Lingshan > > > > > > > If you mean reset, we have a problem. > > > > > > > Specifically, reset has to work before feature > > > > > > > negotiation. So we can not do as we would with > > > > > > > SUSPEND and use a feature bit to expose presence > > > > > > > of the indicator register. > > > > > > I guess the driver can reset the device even after DRIVER_OK. > > > > > > > > > > in fact, unlike suspend - at any point at all. > > > > > > > > > > > IMHO the indicator should work for all device_status transitions, > > > > > > not only for RESET or SUSPEND, it should also apply for FEATURE_OK > > > > > > and DRIVER_OK. > > > > > > > > > > I don't know what kind of transition is there for DRIVER/DRIVER_OK. > > > > > FEATURES_OK brings with it more issues, e.g. it can fail. > > > > > > > > > > > > > > > > So we don't need a feature bit, we may just > > > > > > place it in the common config space as you proposed before to > > > > > > define common config space for all transport. We can do this for sure! > > > > > > > > > > I have to say, this project is really ballooning out ;) > > > > > > > > > > > Of course another approach is what Jason proposed, the existing > > > > > > msleep and poll, driver knows whether the device is in progress > > > > > > of status transitions because it is the driver writes device_status. > > > > > > > > > > > > Thanks > > > > > > > > > > or just a register for suspend transitions, worry about > > > > > extending it later/ > > > > > > > > Is this for PCI only? > > > > > > It's best to add it to all transports. Not rocket science at all. > > > > I'm asking since I wonder if the "issue" exists only for PCI. > > If yes, > > solving the PCI transport (registers) issue at the basic facility > > level seems like a layer violation. > > > I don't think so. Suspend is fundamentally a slow operation for > any transport. Probably, but how such slowness makes a new register a must? 
And my understanding is that reset should be even slower than suspend. > > > > And I still don't see why > > introducing a single new bit in the status brings any new troubles > > compared with the existing reset and other state transitions. > > reset is not a state. Well, I'm not sure I get this. Each value read from status should represent a device state. And 0 is a valid status which means the device has been reset: """ The driver SHOULD consider a driver-initiated reset complete when it reads device status as 0. """ > > > As > > mentioned before, re-read has been used for both FEAUTRES_OK and > > reset. > > FEAUTRES_OK is driver state, so can take place immediately I'm not sure how to define immediately, but suspend could be very fast depending on the implementation. > > > > > > > > Another question is that, if suspend needs that, > > > > reset would also want. > > > > > > I don't know what does "reset would also want" means. > > > > I meant we have already had a state transition like reset. > > > > For example, you worries about: > > > > suspending->suspended->resuming->resumed > > > > But we've already had > > > > setting_features -> feature_ok(or not) -> resetting -> reseted > > > features ok just has to validate the bitmap, so it > can take place immediately. no intermediate state. True, but there's an intermediate state before the reset completes: the state in which the driver has written 0 to status but has not yet read 0 back from it. > > > > Unlike suspend, reset is not a special state, > > > > It depends but we had a dedicated chapter in the basic facility to > > describe the reset state. > > this is two different meanings of the same word. > reset is not a special state, device operates normally. > > > > > so it > > > does not really require a spcial register to track > > > that state. > > > > This seems to be PCI specific. Note that ccw has a dedicated command for reset: > > > > #define CCW_CMD_VDEV_RESET 0x33 > > ...
> > #define CCW_CMD_WRITE_STATUS 0x31 > > > maybe reusing state 0 to reset was a bad idea. Not sure, but it's too late to change that. > > > > > > > > > Or it doesn't justify itself as reset needs to > > > > take longer time than reset. > > > > > > > > Thanks > > > > > > > > > > Don't know what this means, either. > > > > I meant > > > > 1) driver knows its own status and it can read device status > > 2) reset may take much longer time than suspend > > that is why driver re-reads status to check for reset. Yes, but this is exactly how suspend is supposed to work for this patch. Reset has one intermediate state and drivers need to poll for the completion. Suspending has two intermediate states, drivers need to poll the completion for them. > > > > > > If we really need a new status register (I don't see why so far), we > > should not only suspend. > > > > Thanks > > it's not a new status register, it's a suspend register used for PM. I think power management should belong to the transport layer, not the virtio layer. Thanks > > > > > > > > > > > > > > > > > > > > > > > > Regrettably, reset will have to still be supported > > > > > > > through clearing status just because this is always there. > > > > > > > > > > > > > > > > > > > > >>>>> Anyhow driver know > > > > > > >>>>> its own status. > > > > > > >>>>> > > > > > > >>>>> Thanks > > > > > > >>>> indeed, the status register is there to inform the device about > > > > > > >>>> the driver status. > > > > > > >>>> > > > > > > >>>> > > > > > > >>>> > > > > > > >>>>>> If we need > > > > > > >>>>>> to reflect and control device status, we need something else. > > > > > > >>>>>> > > > > > > >>> Thanks > > > > > > >>> > > > > > > > > > ^ permalink raw reply [flat|nested] 69+ messages in thread
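The polling model argued for in this message (write status, then re-read until the device presents the result, for reset as well as for suspend and resume) can be sketched in userspace C. This is a minimal illustration, not driver code: `struct sim_dev` and every function below are hypothetical stand-ins for a real transport, and the "latency counted in status reads" is an assumption made so the example is deterministic. The bit values (DRIVER_OK 4, SUSPEND 16, DEVICE_NEEDS_RESET 64) and the rule that a failed suspend presents DEVICE_NEEDS_RESET are taken from the patch and its changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as listed in the patch under discussion. */
#define STATUS_DRIVER_OK          4u
#define STATUS_SUSPEND            16u
#define STATUS_DEVICE_NEEDS_RESET 64u

/* Hypothetical simulated device; it stands in for a real transport. */
struct sim_dev {
    uint8_t status;     /* value currently returned by a status read */
    uint8_t pending;    /* value presented once the transition completes */
    int latency;        /* number of status reads a transition takes (assumption) */
    int countdown;      /* reads left for the transition in flight */
    bool fail_suspend;  /* if set, a suspend request ends in DEVICE_NEEDS_RESET */
};

void sim_write_status(struct sim_dev *d, uint8_t val)
{
    if ((val & STATUS_SUSPEND) && d->fail_suspend)
        val = (uint8_t)(d->status | STATUS_DEVICE_NEEDS_RESET);
    d->pending = val;
    d->countdown = d->latency;
    if (d->latency == 0)
        d->status = val;
}

uint8_t sim_read_status(struct sim_dev *d)
{
    if (d->countdown > 0 && --d->countdown == 0)
        d->status = d->pending;
    return d->status;
}

enum poll_result { POLL_DONE, POLL_FAILED, POLL_TIMEOUT };

/* Reset: write 0, then re-read until the device presents 0. */
enum poll_result driver_reset(struct sim_dev *d, int max_polls)
{
    assert(max_polls > 0);
    sim_write_status(d, 0);
    for (int i = 0; i < max_polls; i++) {
        if (sim_read_status(d) == 0)
            return POLL_DONE;
        /* a real driver would sleep here, e.g. msleep(1) */
    }
    return POLL_TIMEOUT;
}

/* Suspend: set SUSPEND, then re-read until it is presented or the device fails. */
enum poll_result driver_suspend(struct sim_dev *d, int max_polls)
{
    assert(max_polls > 0);
    uint8_t s = sim_read_status(d);
    sim_write_status(d, (uint8_t)(s | STATUS_SUSPEND));
    for (int i = 0; i < max_polls; i++) {
        s = sim_read_status(d);
        if (s & STATUS_DEVICE_NEEDS_RESET)
            return POLL_FAILED;
        if (s & STATUS_SUSPEND)
            return POLL_DONE;
    }
    return POLL_TIMEOUT;
}

/* Resume: clear SUSPEND, then re-read until the bit is no longer presented. */
enum poll_result driver_resume(struct sim_dev *d, int max_polls)
{
    assert(max_polls > 0);
    uint8_t s = sim_read_status(d);
    sim_write_status(d, (uint8_t)(s & ~STATUS_SUSPEND));
    for (int i = 0; i < max_polls; i++) {
        if (!(sim_read_status(d) & STATUS_SUSPEND))
            return POLL_DONE;
    }
    return POLL_TIMEOUT;
}
```

The bounded `max_polls` is a choice made for this sketch; the existing `vp_modern_get_status()` loop quoted in the thread polls without a bound, and the thread notes that timeouts may complicate migration and hardening.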
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-09-24  7:35 ` Jason Wang
@ 2024-09-24 23:05 ` Michael S. Tsirkin
  2024-09-25  3:47 ` Jason Wang
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2024-09-24 23:05 UTC (permalink / raw)
To: Jason Wang
Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On Tue, Sep 24, 2024 at 03:35:57PM +0800, Jason Wang wrote:
> >
> > > And I still don't see why
> > > introducing a single new bit in the status brings any new troubles
> > > compared with the existing reset and other state transitions.
> >
> > reset is not a state.
>
> Well, I'm not sure I get this. Each value read from status should
> represent a device state.

What makes you say so?
status represents driver state, not device state.
In fact, driver writes there and does not double check with the
sole exception of reset.

> And 0 is a valid status which means device
> is resetted:
>
> """
> The driver SHOULD consider a driver-initiated reset complete when it
> reads device status as 0.
> """

It means all registers have their reset values, but it is
not a special state in that device behaves normally in this
state, it does not need to be handled in any special way.

Contrast with suspend where device does not respond and
must be first resumed.

--
MST

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-24 23:05 ` Michael S. Tsirkin @ 2024-09-25 3:47 ` Jason Wang 2024-09-25 11:17 ` Michael S. Tsirkin 0 siblings, 1 reply; 69+ messages in thread From: Jason Wang @ 2024-09-25 3:47 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Wed, Sep 25, 2024 at 7:05 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Sep 24, 2024 at 03:35:57PM +0800, Jason Wang wrote: > > > > > > > And I still don't see why > > > > introducing a single new bit in the status brings any new troubles > > > > compared with the existing reset and other state transitions. > > > > > > reset is not a state. > > > > Well, I'm not sure I get this. Each value read from status should > > represent a device state. > > What makes you say so? > status represents driver state, not device state. > In fact, driver writes there and does not double check with the > sole exception of reset. For example, spec said """ 2.1 Device Status Field During device initialization by a driver, the driver follows the sequence of steps specified in 3.1. The device status field provides a simple low-level indication of the completed steps of this sequence. It’s most useful to imagine it hooked up to traffic lights on the console indicating the status of each device. The following bits are defined (listed below in the order in which they would be typically set): """ Driver knows its own state, so there's no need for the driver to read it from the device. > > > And 0 is a valid status which means device > > is resetted: > > > > """ > > The driver SHOULD consider a driver-initiated reset complete when it > > reads device status as 0. > > """ > > It means all registers have their reset values, This is exactly the device status? Drivers need to know that there's a device state that is safe to start to program. 
> but it is
> not a special state in that device behaves normally in this
> state, it does not need to be handles in any special way.
>
> Contrast with suspend where device does not respond and
> must be first resumed.

I think not. Driver can choose to reset the device when the device is suspended.

Thanks

>
> --
> MST
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-25 3:47 ` Jason Wang @ 2024-09-25 11:17 ` Michael S. Tsirkin 2024-09-27 4:08 ` Jason Wang 0 siblings, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-25 11:17 UTC (permalink / raw) To: Jason Wang Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Wed, Sep 25, 2024 at 11:47:49AM +0800, Jason Wang wrote: > On Wed, Sep 25, 2024 at 7:05 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Tue, Sep 24, 2024 at 03:35:57PM +0800, Jason Wang wrote: > > > > > > > > > And I still don't see why > > > > > introducing a single new bit in the status brings any new troubles > > > > > compared with the existing reset and other state transitions. > > > > > > > > reset is not a state. > > > > > > Well, I'm not sure I get this. Each value read from status should > > > represent a device state. > > > > What makes you say so? > > status represents driver state, not device state. > > In fact, driver writes there and does not double check with the > > sole exception of reset. > > > For example, spec said > > """ > > 2.1 Device Status Field yes, this is confusingly written > During device initialization by a driver, the driver follows the > sequence of steps specified in 3.1. > > The device status field provides a simple low-level indication of the > completed steps of this sequence. And note the sequence is driver sequence. > It’s most useful to imagine it > hooked up to traffic lights on the console indicating the status of > each device. The following bits are defined (listed below in the order > in which they would be typically set): > > """ > Driver knows its own state, so there's no need for the driver to read > it from the device. That is why it never reads status. 
> >
> > > And 0 is a valid status which means device
> > > is resetted:
> > >
> > > """
> > > The driver SHOULD consider a driver-initiated reset complete when it
> > > reads device status as 0.
> > > """
> >
> > It means all registers have their reset values,
>
> This is exactly the device status? Drivers need to know that there's a
> device state that is safe to start to program.

yes but otherwise, all registers respond normally
device is assumed to be fully operational, and
it is assumed that no time at all is needed to
get from that to DRIVER.

> > but it is
> > not a special state in that device behaves normally in this
> > state, it does not need to be handles in any special way.
> >
> > Contrast with suspend where device does not respond and
> > must be first resumed.
>
> I think not. Driver can choose to reset the device when the device is suspended.
>
> Thanks

Yes, reset should take device out of suspend.

> >
> > --
> > MST
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-25 11:17 ` Michael S. Tsirkin @ 2024-09-27 4:08 ` Jason Wang 2024-09-29 17:55 ` Michael S. Tsirkin 0 siblings, 1 reply; 69+ messages in thread From: Jason Wang @ 2024-09-27 4:08 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Wed, Sep 25, 2024 at 7:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Wed, Sep 25, 2024 at 11:47:49AM +0800, Jason Wang wrote: > > On Wed, Sep 25, 2024 at 7:05 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Tue, Sep 24, 2024 at 03:35:57PM +0800, Jason Wang wrote: > > > > > > > > > > > And I still don't see why > > > > > > introducing a single new bit in the status brings any new troubles > > > > > > compared with the existing reset and other state transitions. > > > > > > > > > > reset is not a state. > > > > > > > > Well, I'm not sure I get this. Each value read from status should > > > > represent a device state. > > > > > > What makes you say so? > > > status represents driver state, not device state. > > > In fact, driver writes there and does not double check with the > > > sole exception of reset. > > > > > > For example, spec said > > > > """ > > > > 2.1 Device Status Field > > yes, this is confusingly written Anything makes you think it is confusing? > > > During device initialization by a driver, the driver follows the > > sequence of steps specified in 3.1. > > > > The device status field provides a simple low-level indication of the > > completed steps of this sequence. > > And note the sequence is driver sequence. Right but it means the driver needs to know the status of the device in order to proceed. > > > It’s most useful to imagine it > > hooked up to traffic lights on the console indicating the status of > > each device. 
The following bits are defined (listed below in the order > > in which they would be typically set): > > > > """ > > > > > > Driver knows its own state, so there's no need for the driver to read > > it from the device. > > That is why it never reads status. Well, looking at the current Linux drivers there're plenty (in virtio core, there could be other in the transport) drivers/virtio/virtio.c: return sysfs_emit(buf, "0x%08x\n", dev->config->get_status(dev)); drivers/virtio/virtio.c: dev->config->set_status(dev, dev->config->get_status(dev) | status); drivers/virtio/virtio.c: status = dev->config->get_status(dev); drivers/virtio/virtio.c: if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK)) drivers/virtio/virtio.c: WARN_ON_ONCE(dev->config->get_status(dev)); drivers/virtio/virtio.c: dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED; drivers/virtio/virtio.c: if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK)) > > > > > > > > > And 0 is a valid status which means device > > > > is resetted: > > > > > > > > """ > > > > The driver SHOULD consider a driver-initiated reset complete when it > > > > reads device status as 0. > > > > """ > > > > > > It means all registers have their reset values, > > > > This is exactly the device status? Drivers need to know that there's a > > device state that is safe to start to program. > > yes but otherwise, all registers respond normally > device is assumed to be fully operational, and > it is assumed that no time at all is needed to > get from that to DRIVER. > > > > > but it is > > > not a special state in that device behaves normally in this > > > state, it does not need to be handles in any special way. > > > > > > Contrast with suspend where device does not respond and > > > must be first resumed. > > > > I think not. Driver can choose to reset the device when the device is suspended. > > > > Thanks > > Yes, reset should take device out of suspend. 
Thanks

> > >
> > > --
> > > MST
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-27 4:08 ` Jason Wang @ 2024-09-29 17:55 ` Michael S. Tsirkin 2024-10-17 6:56 ` Jason Wang 0 siblings, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-29 17:55 UTC (permalink / raw) To: Jason Wang Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Fri, Sep 27, 2024 at 12:08:54PM +0800, Jason Wang wrote: > On Wed, Sep 25, 2024 at 7:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Wed, Sep 25, 2024 at 11:47:49AM +0800, Jason Wang wrote: > > > On Wed, Sep 25, 2024 at 7:05 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Tue, Sep 24, 2024 at 03:35:57PM +0800, Jason Wang wrote: > > > > > > > > > > > > > And I still don't see why > > > > > > > introducing a single new bit in the status brings any new troubles > > > > > > > compared with the existing reset and other state transitions. > > > > > > > > > > > > reset is not a state. > > > > > > > > > > Well, I'm not sure I get this. Each value read from status should > > > > > represent a device state. > > > > > > > > What makes you say so? > > > > status represents driver state, not device state. > > > > In fact, driver writes there and does not double check with the > > > > sole exception of reset. > > > > > > > > > For example, spec said > > > > > > """ > > > > > > 2.1 Device Status Field > > > > yes, this is confusingly written > > Anything makes you think it is confusing? It says Device Status but most of it reflects the driver probe progress. Compare to pci RO status and RW control. > > > > > During device initialization by a driver, the driver follows the > > > sequence of steps specified in 3.1. > > > > > > The device status field provides a simple low-level indication of the > > > completed steps of this sequence. > > > > And note the sequence is driver sequence. 
> > Right but it means the driver needs to know the status of the device > in order to proceed. again this is not device status at all. > > > > > It’s most useful to imagine it > > > hooked up to traffic lights on the console indicating the status of > > > each device. The following bits are defined (listed below in the order > > > in which they would be typically set): > > > > > > """ > > > > > > > > > > > Driver knows its own state, so there's no need for the driver to read > > > it from the device. > > > > That is why it never reads status. > > Well, looking at the current Linux drivers there're plenty (in virtio > core, there could be other in the transport) > > drivers/virtio/virtio.c: return sysfs_emit(buf, "0x%08x\n", > dev->config->get_status(dev)); > drivers/virtio/virtio.c: dev->config->set_status(dev, > dev->config->get_status(dev) | status); > drivers/virtio/virtio.c: status = dev->config->get_status(dev); > drivers/virtio/virtio.c: if (!(dev->config->get_status(dev) & > VIRTIO_CONFIG_S_DRIVER_OK)) > drivers/virtio/virtio.c: WARN_ON_ONCE(dev->config->get_status(dev)); > drivers/virtio/virtio.c: dev->failed = > dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED; > drivers/virtio/virtio.c: if (!(dev->config->get_status(dev) & > VIRTIO_CONFIG_S_DRIVER_OK)) It's done like this for robustness and simplicity. But we could easily cache it in the driver on write. > > > > > > > > > > > > > And 0 is a valid status which means device > > > > > is resetted: > > > > > > > > > > """ > > > > > The driver SHOULD consider a driver-initiated reset complete when it > > > > > reads device status as 0. > > > > > """ > > > > > > > > It means all registers have their reset values, > > > > > > This is exactly the device status? Drivers need to know that there's a > > > device state that is safe to start to program. 
> >
> > yes but otherwise, all registers respond normally
> > device is assumed to be fully operational, and
> > it is assumed that no time at all is needed to
> > get from that to DRIVER.
> >
> > > > but it is
> > > > not a special state in that device behaves normally in this
> > > > state, it does not need to be handles in any special way.
> > > >
> > > > Contrast with suspend where device does not respond and
> > > > must be first resumed.
> > >
> > > I think not. Driver can choose to reset the device when the device is suspended.
> > >
> > > Thanks
> >
> > Yes, reset should take device out of suspend.
>
> Thanks
>
> > > >
> > > > --
> > > > MST
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-29 17:55 ` Michael S. Tsirkin @ 2024-10-17 6:56 ` Jason Wang 0 siblings, 0 replies; 69+ messages in thread From: Jason Wang @ 2024-10-17 6:56 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zhu Lingshan, Parav Pandit, cohuck@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Mon, Sep 30, 2024 at 1:55 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Sep 27, 2024 at 12:08:54PM +0800, Jason Wang wrote: > > On Wed, Sep 25, 2024 at 7:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Wed, Sep 25, 2024 at 11:47:49AM +0800, Jason Wang wrote: > > > > On Wed, Sep 25, 2024 at 7:05 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Tue, Sep 24, 2024 at 03:35:57PM +0800, Jason Wang wrote: > > > > > > > > > > > > > > > And I still don't see why > > > > > > > > introducing a single new bit in the status brings any new troubles > > > > > > > > compared with the existing reset and other state transitions. > > > > > > > > > > > > > > reset is not a state. > > > > > > > > > > > > Well, I'm not sure I get this. Each value read from status should > > > > > > represent a device state. > > > > > > > > > > What makes you say so? > > > > > status represents driver state, not device state. > > > > > In fact, driver writes there and does not double check with the > > > > > sole exception of reset. > > > > > > > > > > > > For example, spec said > > > > > > > > """ > > > > > > > > 2.1 Device Status Field > > > > > > yes, this is confusingly written > > > > Anything makes you think it is confusing? > > > It says Device Status but most of it reflects the driver probe progress. > Compare to pci RO status and RW control. My understanding is that 1) any value that is read from the device status reflects a device status. 
2) any value write from the driver for the device status reflects a device status that driver want to go Here are some examples: 1) DRIVER_OK: """ The device MUST NOT consume buffers or send any used buffer notifications to the driver before DRIVER_OK. """ It's a device status that "Device knows driver is OK" 2) FEATURES_OK """ The device MUST NOT offer a feature which requires another feature which was not offered. The device SHOULD accept any valid subset of features the driver accepts, otherwise it MUST fail to set the FEATURES_OK device status bit when the driver writes it. """ Device tells the driver that it doesn't support some of the features. 3) NEEDS_RESET: """ The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state that a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device MUST send a device configuration change notification to the driver. """ This is a device status but not a driver. And usually, device status is a bitmask of several bits of status defined in the spec. > > > > > > > > During device initialization by a driver, the driver follows the > > > > sequence of steps specified in 3.1. > > > > > > > > The device status field provides a simple low-level indication of the > > > > completed steps of this sequence. > > > > > > And note the sequence is driver sequence. > > > > Right but it means the driver needs to know the status of the device > > in order to proceed. > > again this is not device status at all. > > > > > > > > > > It’s most useful to imagine it > > > > hooked up to traffic lights on the console indicating the status of > > > > each device. The following bits are defined (listed below in the order > > > > in which they would be typically set): > > > > > > > > """ > > > > > > > > > > > > > > > > Driver knows its own state, so there's no need for the driver to read > > > > it from the device. > > > > > > That is why it never reads status. 
> > > > Well, looking at the current Linux drivers there're plenty (in virtio > > core, there could be other in the transport) > > > > drivers/virtio/virtio.c: return sysfs_emit(buf, "0x%08x\n", > > dev->config->get_status(dev)); > > drivers/virtio/virtio.c: dev->config->set_status(dev, > > dev->config->get_status(dev) | status); > > drivers/virtio/virtio.c: status = dev->config->get_status(dev); > > drivers/virtio/virtio.c: if (!(dev->config->get_status(dev) & > > VIRTIO_CONFIG_S_DRIVER_OK)) > > drivers/virtio/virtio.c: WARN_ON_ONCE(dev->config->get_status(dev)); > > drivers/virtio/virtio.c: dev->failed = > > dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED; > > drivers/virtio/virtio.c: if (!(dev->config->get_status(dev) & > > VIRTIO_CONFIG_S_DRIVER_OK)) > > It's done like this for robustness and simplicity. > But we could easily cache it in the driver on write. Probably not, one example: virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK); status = dev->config->get_status(dev); if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { dev_err(&dev->dev, "virtio: device refuses features: %x\n", status); return -ENODEV; } Devices may refuse the features. Thanks > > > > > > > > > > > > > > > > > > > And 0 is a valid status which means device > > > > > > is resetted: > > > > > > > > > > > > """ > > > > > > The driver SHOULD consider a driver-initiated reset complete when it > > > > > > reads device status as 0. > > > > > > """ > > > > > > > > > > It means all registers have their reset values, > > > > > > > > This is exactly the device status? Drivers need to know that there's a > > > > device state that is safe to start to program. > > > > > > yes but otherwise, all registers respond normally > > > device is assumed to be fully operational, and > > > it is assumed that no time at all is needed to > > > get from that to DRIVER. 
> > >
> > > > > but it is
> > > > > not a special state in that device behaves normally in this
> > > > > state, it does not need to be handles in any special way.
> > > > >
> > > > > Contrast with suspend where device does not respond and
> > > > > must be first resumed.
> > > >
> > > > I think not. Driver can choose to reset the device when the device is suspended.
> > > >
> > > > Thanks
> > >
> > > Yes, reset should take device out of suspend.
> >
> > Thanks
> >
> > > > >
> > > > > --
> > > > > MST
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-03 9:05 ` Zhu Lingshan 2024-09-03 9:45 ` Michael S. Tsirkin @ 2024-09-03 10:28 ` Parav Pandit 2024-09-05 7:20 ` Zhu Lingshan 1 sibling, 1 reply; 69+ messages in thread From: Parav Pandit @ 2024-09-03 10:28 UTC (permalink / raw) To: Zhu Lingshan, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com Cc: virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Tuesday, September 3, 2024 2:36 PM > > On 8/30/2024 11:02 AM, Parav Pandit wrote: > > > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Friday, August 30, 2024 8:02 AM > >> > >> On 8/15/2024 5:34 PM, Parav Pandit wrote: > >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>> Sent: Thursday, August 15, 2024 1:53 PM > >>>> > >>>> On 8/13/2024 2:55 PM, Parav Pandit wrote: > >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>> Sent: Tuesday, August 13, 2024 11:44 AM > >>>>>> > >>>>> I removed the unreachable intel email id as every single email is > >>>>> bouncing > >>>> from it. > >>>>> Please consider dropping that email from v8 as it will cause all > >>>>> reviewers > >>>> email to bounce. > >>>>>> On 8/13/2024 1:50 PM, Parav Pandit wrote: > >>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>>>> Sent: Tuesday, August 13, 2024 11:15 AM > >>>>>>>> > >>>>>>>> On 8/13/2024 12:42 PM, Parav Pandit wrote: > >>>>>>>>> Hi Lingshan, David, > >>>>>>>>> > >>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>>>>>> Sent: Thursday, August 1, 2024 5:05 PM > >>>>>>>>>> > >>>>>>>>>> This commit allows the driver to suspend the device by > >>>>>>>>>> introducing a new status bit SUSPEND in device_status. > >>>>>>>>>> > >>>>>>>>>> This commit also introduce a new feature bit > VIRTIO_F_SUSPEND > >>>>>>>>>> which indicating whether the device support SUSPEND. 
> >>>>>>>>>> > >>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> > >>>>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> > >>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> > >>>>>>>>>> Signed-off-by: David Stevens <stevensd@chromium.org> > >>>>>>>>>> --- > >>>>>>>>>> content.tex | 75 > >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++- > >>>>>>>> ---- > >>>>>>>>>> -- > >>>>>>>>>> 1 file changed, 65 insertions(+), 10 deletions(-) > >>>>>>>>>> > >>>>>>>>>> Changes from V6: > >>>>>>>>>> - the device should hold its config interrupt while SUSPEND, > >>>>>>>>>> and send config interrupt when the SUSPEND bit is cleared. > >>>>>>>>>> - while SUSPEND, the driver MUST NOT access Device > >>>>>>>>>> Configuration Space > >>>>>>>>>> - minor changes. > >>>>>>>>>> > >>>>>>>>>> Changes from V5: > >>>>>>>>>> - the device should present NEEDS_RESET if failed to suspend > >>>>>>>>>> - allow the driver access device status in the config space when > >>>>>>>>>> suspended if it is implemented in config space. > >>>>>>>>>> - language improvements > >>>>>>>>>> > >>>>>>>>>> Changes from V4: > >>>>>>>>>> - re-order the device status bits section > >>>>>>>>>> - kick vqs --> notify vqs > >>>>>>>>>> > >>>>>>>>>> Changes from V3: > >>>>>>>>>> - allow the driver clearing the SUSPEND bit to resume the > device. > >>>>>>>>>> - disallow access to config space while suspended. > >>>>>>>>>> > >>>>>>>>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 > >>>>>>>>>> 100644 > >>>>>>>>>> --- a/content.tex > >>>>>>>>>> +++ b/content.tex > >>>>>>>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} > >>>>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev > >>>>>>>>>> this bit. For example, under Linux, drivers can be > >>>>>>>>>> loadable > >> modules. > >>>>>>>>>> \end{note} > >>>>>>>>>> > >>>>>>>>>> -\item[FAILED (128)] Indicates that something went wrong in > >>>>>>>>>> the guest, > >>>>>>>>>> - and it has given up on the device. 
This could be an > >>>>>>>>>> internal > >>>>>>>>>> - error, or the driver didn't like the device for some > >>>>>>>>>> reason, or > >>>>>>>>>> - even a fatal error during device operation. > >>>>>>>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and > >>>>>>>>>> +ready to > >>>>>>>>>> + drive the device. > >>>>>>>>>> > >>>>>>>>>> \item[FEATURES_OK (8)] Indicates that the driver has > >>>>>>>>>> acknowledged all > >>>>>>>> the > >>>>>>>>>> features it understands, and feature negotiation is complete. > >>>>>>>>>> > >>>>>>>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and > >>>>>>>>>> ready to > >>>>>>>>>> - drive the device. > >>>>>>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, > >>>>>> indicates > >>>>>>>>>> +that the > >>>>>>>>>> + device has been suspended by the driver. > >>>>>>>>>> > >>>>>>>>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has > >>>>>>>> experienced > >>>>>>>>>> an error from which it can't recover. > >>>>>>>>>> + > >>>>>>>>>> +\item[FAILED (128)] Indicates that something went wrong in > >>>>>>>>>> +the guest, > >>>>>>>>>> + and it has given up on the device. This could be an > >>>>>>>>>> +internal > >>>>>>>>>> + error, or the driver didn't like the device for some > >>>>>>>>>> +reason, or > >>>>>>>>>> + even a fatal error during device operation. > >>>>>>>>>> \end{description} > >>>>>>>>>> > >>>>>>>>>> The \field{device status} field starts out as 0, and is > >>>>>>>>>> reinitialized to 0 by @@ - > >>>>>>>>>> 60,8 +63,9 @@ \section{\field{Device Status} > >>>>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev > >>>>>>>>>> initialization sequence specified in \ref{sec:General > >>>>>>>>>> Initialization And Device Operation / Device > >>>>>>>> Initialization}. > >>>>>>>>>> -The driver MUST NOT clear a > >>>>>>>>>> -\field{device status} bit. 
If the driver sets the FAILED > >>>>>>>>>> bit, > >>>>>>>>>> +The driver MUST NOT clear a \field{device status} bit other > >>>>>>>>>> +than SUSPEND except when setting \field{device status} to 0 > >>>>>>>>>> +as a transport-specific way to initiate a reset. If the > >>>>>>>>>> +driver sets the FAILED bit, > >>>>>>>>>> the driver MUST later reset the device before attempting to > >>>>>>>>>> re- > >>>>>> initialize. > >>>>>>>>>> The driver SHOULD NOT rely on completion of operations of a > >> @@ > >>>>>>>>>> -99,10 > >>>>>>>>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities > >>>>>>>>>> +of a Virtio Device > >>>>>>>>>> / Feature B \begin{description} > >>>>>>>>>> \item[0 to 23, and 50 to 127] Feature bits for the specific > >>>>>>>>>> device type > >>>>>>>>>> > >>>>>>>>>> -\item[24 to 41] Feature bits reserved for extensions to the > >>>>>>>>>> queue and > >>>>>>>>>> +\item[24 to 42] Feature bits reserved for extensions to the > >>>>>>>>>> +queue and > >>>>>>>>>> feature negotiation mechanisms > >>>>>>>>>> > >>>>>>>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for > >>>>>>>>>> future extensions. > >>>>>>>>>> +\item[43 to 49, and 128 and above] Feature bits reserved for > >>>>>>>>>> +future > >>>>>>>>>> extensions. > >>>>>>>>>> \end{description} > >>>>>>>>>> > >>>>>>>>>> \begin{note} > >>>>>>>>>> @@ -629,6 +633,53 @@ \section{Device > >> Cleanup}\label{sec:General > >>>>>>>>>> Initialization And Device Operation / > >>>>>>>>>> > >>>>>>>>>> Thus a driver MUST ensure a virtqueue isn't live (by device > >>>>>>>>>> reset) before removing exposed buffers. > >>>>>>>>>> > >>>>>>>>>> +\section{Device Suspend}\label{sec:General Initialization > >>>>>>>>>> +And Device Operation / Device Suspend} > >>>>>>>>>> + > >>>>>>>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the > >>>>>>>>>> +SUSPEND bit in \field{device status} to suspend a device, > >>>>>>>>>> +and can clear the SUSPEND bit to resume a suspended device. 
> >>>>>>>>>> + > >>>>>>>>>> +\drivernormative{\subsection}{Device Suspend}{General > >>>>>>>>>> +Initialization And Device Operation / Device Suspend} > >>>>>>>>>> + > >>>>>>>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set > or > >>>>>>>>>> VIRTIO_F_SUSPEND is not negotiated. > >>>>>>>>>> + > >>>>>>>>>> +Once the driver sets SUSPEND to \field{device status} of the > >> device: > >>>>>>>>>> +\begin{itemize} > >>>>>>>>>> +\item The driver MUST re-read \field{device status} to > >>>>>>>>>> +verify whether the > >>>>>>>>>> SUSPEND bit is set. > >>>>>>>>>> +\item The driver MUST NOT make any more buffers available to > >>>>>>>>>> +the > >>>>>>>> device. > >>>>>>>>>> +\item The driver MUST NOT access any virtqueues or send > >>>>>>>>>> +notifications for > >>>>>>>>>> any virtqueues. > >>>>>>>>>> +\item The driver MUST NOT access Device Configuration Space. > >>>>>>>>>> +\end{itemize} > >>>>>>>>>> + > >>>>>>>> Hi Parva > >>>>>>>>> Do we agree that > >>>>>>>>> a. suspending a device is non frequent operation (in order of > >>>>>>>>> N > >>>>>>>> operations/sec, where N is roughly in range of 10 or 100) per > device? > >>>>>>>> Ideally it should not be often in normal operations, but > >>>>>>>> remember we can not restrict the behaviors of the driver, so we > >>>>>>>> must be able to handle the scenario in which SUSPENDING is > often. > >>>>>>> Sure. the intent is slow rate, but one can do at unexpected times. > >>>>>>> Do you agree? > >>>>>> I think we don't have an intention of the frequency in the spec. > >>>>> Sure. > >>>>> > >>>>>> The spec only provides generic mechanisms and interfaces. > >>>>> Sure. > >>>>> > >>>>>> Don't assume it(or driver wants it to be) would be often or not, > >>>>>> that depends on the driver. > >>>>> As you rightly said : it cannot be assumed. > >>>>> The driver will read the device status right after it wrote it. > >>>>> This typically is < > >>>> 50nsec of time. 
> >>>>> The suspend operation for a net device to store hundreds of queues, RSS
> >>>>> table, flow filters, takes plenty of time (at least more than 50nsec :) ).
> >>>>> Similarly for the GPU to store some MBs of memory takes more than 50nsec
> >>>>> of time, for example to store in a file for a software-based GPU device.
> >>>>> So a device cannot respond back suspend=true in next 50nsec time.
> >>>> It's OK for the device to take longer time to respond, the driver
> >>>> simply re-reads device status.
> >>> Sure, the issue is, when the driver re-reads, the device must present
> >>> suspend=false within 50nsec.
> >>> (because device didn't suspend it yet).
> >>>
> >>> As I explained in previous email, this requires building special circuitry.
> >>> Such circuitry can be avoided if the suspend interface is done slightly
> >>> differently.
> >> why there is a constraint condition of the time?
> > Because this is what driver does as you explained in [1] indicating "we must
> > be able to handle ..."
> >
> > [1] https://lore.kernel.org/virtio-comment/c4d5eed3-774b-4d35-a007-f9dff28ce884@amd.com/T/#m6f081f96ef9dcea29c64a88b633eb21d50e8c410
> The sentence is "we must be able to handle the scenario in which
> SUSPENDING is often", where do you see any constraint conditions of time?

If the device takes X usec, how can the device indicate that suspend is still in progress without flipping the polarity?

> >
> >> Are there any similar
> >> constraining for other states like RESET or DRIVER_OK?
Don't assume > >> any other states transitions are faster than SUSPEND. > > DRIVER_OK does not suffer from it because it is async notification. > > A device may start slow after DRIVER_OK. > > SUSPEND operation cannot rely on such async behavior. > The driver can set suspend and re-read & wait. This is a common routine in > the driver. > > > > RESET also suffers from similar inefficiencies. > > But that is because it is inherited from the past. > > > > Here a new functionality is being proposed and it has a chance for efficient > device implementation. > > Therefore the request is to improve it. > sure, as long as we confirm this new register apply for all device status > transitions, not only for SUSPEND. Alright, please do it as you suggested here. > > > >>>>> More below. > >>>>> > >>>>>>>>> b. A software-based device may not always want to force > >>>>>>>>> VM_EXIT on read > >>>>>>>> and write on the device_status register? > >>>>>>>> Trap and Emulation is the basic of virtualization, and how to > >>>>>>>> pass-through a device is out of this spec. > >>>>>>>> > >>>>>>> Sure, I didn't suggest to put such things in the spec. > >>>>>>> My question is, whether to trap and emulate or not is a choice > >>>>>>> of the > >>>>>> software. > >>>>>>> Do you agree? > >>>>>> The device emulator does not know anything about whether trapped > >> or > >>>> not. > >>>>>> Trapping and Emulation is a hypervisor thing. > >>>>>> > >>>>>> If here "software" refers to the device emulator, then Yes, it is > >>>>>> not the emulator's decision. And the device should not be aware > >>>>>> of VM_EXIT & VM_ENTRY. > >>>>>> > >>>>> Right. So a software wants to implement device_status as pure MMIO > >>>> writes. (and not VM_EXIT). > >>>> This is not always true, there can be VM_EXIT of pure emulated > >>>> devices. > >>> I am not denying that VM_EXIT can/cannot be there. > >>> I am saying, the proposal forces VM_EXIT based approach in my > >> understanding of this patch. 
> > [MARKER_1] > > > >>> If that is not true, may be can you please explain how this can be > >> implemented without VM_EXIT? > >>> We should have an interface that can be done with and without > >>> VM_EXIT > >> method at least for any new additions. > >> VM_EXIT is out of spec, it is a hypervisor and the processor thing. > > You keep repeating VM_EXIST is out of spec. > > I already replied at [MARKER_1], sure it is out of spec, the current approach > forces one to do VM_EXIT based approach. > > And if not, please explain, how can it be achieved? > It is not the current solution force the hypervisor & guest to perform > VM_EXIT. I read it few times but couldn't parse it. Can you please explain how your proposal works WITHOUT VM_EXIT? If not, you can say, that current proposal is limited to force VM_EXIT. Please explain. > In basic virtualization, any access to hypervisor emulated HW registers would > be Traped and Emulated, means a VM_EXIT and a VM_ENTRY. That was not my question, my question is how to achieve without VM_EXIT? You are suggesting a spec that forces VM_EXIT. > > > >> In non-pass-through case, any register access are sensitive and will > >> trigger VM_EXIT. Like RESET or DRIVER_OK needs to access > >> device_status, nothing different. > > I don't think you understood the point. > > Let me repeat, The question is, if the device implementation wants to > achieve the functionality without VM_EXIT, what is the way? > Map the register to guest address space, out of the spec. And which component will flip the bit of suspend before driver reads it to indicate that device is still_suspending? > > > >>>> The > >>>> HW registers are sensitive resource and any access to them need to > >>>> be trapped and emulated. > >>> This does not apply to PCI PFs and VFs which are HW devices (mainly > PFs). > >>> so this trap + emulation is narrow view that we better avoid. > >> This is how *basic* virtualization work, once access sensitive resource, > trap it. 
> >>> If you think this is the way forward, you should put forward in > >>> patch as > >> MUST requirement. > >>> and that does not look right to me. > >>> I hope you also don't mean to force this method to device > >> implementations. > >>> Right? > >> Again, VM_EXIT is a hypervisor thing, out of spec. Whether there is > >> a VM_EXIT when setting SUSPEND totally depends on the virtualization > >> solution. And SUSPEND is nothing different from DRIVER_OK. > >> > > Please avoid repeating the point that VM_EXIT is hypervosor thing. > > No one asked to put this in spec. Please re-read [MARKER1]. You probably > missed that. > > > > SUSPEND is different than DRIVER_OK. I explained the timing constraints > and the required circuitry needed to fulfill the proposal. > > And with the additional register, such complicated circuitry can be easily > avoided. > Please read QEMU code, that can help you find how the trap - emulate > mechanism work for guest when it tries to access sensitive resources. I am not suggesting trap and emulation. Please come out of this looping thoughts. I repeat my question, please explain how does suggested solution works without VM_EXIT? From your response, I derive it cannot. Can you please confirm? > > > >> Means, if your virtualization needs to trap SUSPEND, it also needs to > >> trap DRIVER_OK, and don't assume DRIVER_OK is faster than SUSPEND. > > DRIVER_OK is by law of physics is faster than SUSPEND because it does not > demand the driver of reading back. > > There is no driver side loop to check if the device accepted DRIVER_OK or > not. > > Agree? > Why do you assume so? I am not assuming. It is the current spec. :) Please read the device initialization sequence, that does NOT say that driver must READ DRIVER_OK. Hence by law of physics it is faster. > In some vendor implementation, DRIVER_OK can be > slow. Can DRIVER_OK implement a deferred initialization even deferred > resource allocation? Does the spec say that DRIVER_OK must be faster? 
No. it does not. But your suspend proposal says so that device must respond back before the next read arrives from the driver. > > > >>>>> And prefer to returning SUSPEND=true at slow pace. > >>>>> This means, the device implementation cannot immediately return > >>>> suspend=true right after it was written. > >>>>> A MMIO read will read it back, as suspend=true. > >>>>> > >>>>> An alternative would be, to forward CPU loads and CPU stores to > >>>>> different > >>>> address. > >>>>> However, this does not work for the hw based devices. > >>>>> > >>>>> That means, PCI HW needs to return suspend=0, until the device is > >>>>> not > >>>> suspended. > >>>>> In this example, the device cannot build special circuitry to > >>>>> answer > >>>> suspend=true within 50nsec, or in other words building special > >>>> circuitry to return suspend=false is too complex for the slow operation. > >>>> why? The device can just not to change the value of the SUSPEND bit > >>>> before it has fully suspended. > >>> When driver wrote, it wrote suspend=true, And device returns > >>> suspend=false while suspend is ongoing, right? > >>> If yes, this is expensive because the device needs to operate within > >>> 50nsec > >> or less to answer suspend=false. > >>> And even worst, it needs to suspend=true when unsuspending within > >> 50nsec when resuming is ongoing. > >> again, there is not a 50nsec constraining and please take a reference > >> of how DRIVER_OK work with virtualization. > > There is. The device is expected to return back the desired value to indicate > driver that suspend is ongoing. > > It is different than DRIVER_OK. > where does the spec or any code say 50nsec? Your spec and your previous comment implies that when device is busy suspending, it must returned the toggled value than what driver set it. Isn't it? It means device needs to react before the next read arrives from the driver, isn't it? 
I took that read as 50nsec example as you explained previously that driver will read very fast. > > > >>>>> If this understanding of burden is clear, > >>>>> > >>>>> The proposal is, can you please extend the interface such that, > >>>>> > >>>>> 1. driver writes suspend command. > >>>>> 2. driver reads suspend_status, and receives not_completed=(false). > >>>>> This is > >>>> the default value. > >>>>> 3. When the device completes suspend, it changes the polarity of > >>>> suspend_status=true. > >>>>> This has two main benefits: > >>>>> [A] This will enable software-based devices to write data to slow > >>>>> files and > >>>> does not have to force VM_EXITs. > >>>>> [B] It also enables hw based devices to not build special > >>>>> circuitry to answer > >>>> within 50nsec, which can get very complicated for tens or hundreds > >>>> of PCI PFs. > >>>> I think we have already discussed on this before in V5, and Jason > >>>> has some insightful comments > >>>> > >>> Unfortunately, not. His comment was that it is not specific to suspend. > >>> But here we are introducing a new interface and functionality that > >>> does not > >> need to suffer or follow anything that may not be efficient. 
> >>>> https://lore.kernel.org/virtio-comment/20240612082055-mutt-send-email-mst@kernel.org/T/#mc817fc6ca12ff0bcbae62b43b6146a177ecf13a9 > >>>>>> this is out of the spec anyway. > >>>>>> > >>>>>> Thanks > >>>>>> Zhu Lingshan > >>>>>>>> Thanks > >>>>>>>> Zhu Lingshan > >>>>>>>>>> +\devicenormative{\subsection}{Device Suspend}{General > >>>>>>>>>> +Initialization And Device Operation / Device Suspend} > >>>>>>>>>> + > >>>>>>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or > >>>>>>>>>> VIRTIO_F_SUSPEND is not negotiated. > >>>>>>>>>> + > >>>>>>>>>> +The device MUST ignore all access to its Configuration Space > >>>>>>>>>> +while suspended, except for \field{device status} if it is > >>>>>>>>>> +part of the Configuration > >>>>>>>>>> Space. > >>>>>>>>>> + > >>>>>>>>>> +A device MUST NOT send any notifications for any virtqeuues, > >>>>>>>>>> +access any virtqueues, or modify any fields in its > >>>>>>>>>> +Configuration Space while suspended. > >>>>>>>>>> + > >>>>>>>>>> +If changes occur in the Configuration Space while the > >>>>>>>>>> +SUSPEND bit is set, the device MUST NOT send any > >>>>>>>>>> +configuration change > >>>>>> notifications.
> >>>>>>>>>> +Instead, the device MUST send the notification after the > >>>>>>>>>> +SUSPEND bit has > >>>>>>>>>> been cleared. > >>>>>>>>>> + > >>>>>>>>>> +When the driver sets SUSPEND, the device MUST either > suspend > >>>>>>>>>> +itself or set > >>>>>>>>>> DEVICE_NEEDS_RESET if failed to suspend. > >>>>>>>>>> + > >>>>>>>>>> +If SUSPEND is set in \field{device status}, when the driver > >>>>>>>>>> +clears SUSPEND, the device MUST either resume normal > >> operation > >>>>>>>>>> +or set > >>>>>>>>>> DEVICE_NEEDS_RESET. > >>>>>>>>>> + > >>>>>>>>>> +When the driver sets SUSPEND, the device SHOULD perform > the > >>>>>>>>>> +following actions before presenting that > >>>>>>>>>> the SUSPEND bit is set to 1 in the \field{device status}: > >>>>>>>>>> + > >>>>>>>>>> +\begin{itemize} > >>>>>>>>>> +\item Stop processing more buffers of any virtqueues \item > >>>>>>>>>> +Wait until all buffers that are being processed have been used. > >>>>>>>>>> +\item Send used buffer notifications to the driver. > >>>>>>>>>> +\end{itemize} > >>>>>>>>>> + > >>>>>>>>>> \chapter{Virtio Transport Options}\label{sec:Virtio > >>>>>>>>>> Transport Options} > >>>>>>>>>> > >>>>>>>>>> Virtio can use various different buses, thus the standard is > >>>>>>>>>> split @@ -872,6 > >>>>>>>>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved > >>>>>>>>>> +Feature > >>>>>>>>>> Bits} > >>>>>>>>>> \ref{devicenormative:Basic Facilities of a Virtio Device / > >>>>>>>>>> Feature Bits} for > >>>>>>>>>> handling features reserved for future use. > >>>>>>>>>> > >>>>>>>>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that > >>>>>>>>>> + the driver > >>>>>> can > >>>>>>>>>> + trigger suspending the device via the SUSPEND flag > >>>>>>>>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device > >>>>>>>>>> + Status > >> Field}. 
> >>>>>>>>>> + > >>>>>>>>>> \end{description} > >>>>>>>>>> > >>>>>>>>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved > >>>>>>>>>> Feature Bits} > >>>>>>>>>> -- > >>>>>>>>>> 2.45.2 > >>>>>>>>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
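The driver-side flow debated above (write SUSPEND, then re-read device status until the device presents SUSPEND or DEVICE_NEEDS_RESET) can be sketched as a small simulation. This is only an illustration under stated assumptions, not the patch's or any driver's implementation: ToyDevice, its busy_reads counter, the max_reads bound and the returned strings are hypothetical names invented here; only the status bit values come from the patch.

```python
# Status bit values as listed in the patch's device status section.
DRIVER_OK = 4
FEATURES_OK = 8
SUSPEND = 16
DEVICE_NEEDS_RESET = 64


class ToyDevice:
    """Hypothetical device model that needs `busy_reads` status reads
    before it finishes suspending (or fails, if `fail` is True)."""

    def __init__(self, busy_reads, fail=False):
        self.status = FEATURES_OK | DRIVER_OK
        self._busy_reads = busy_reads
        self._fail = fail
        self._suspending = False

    def write_status(self, value):
        if value & SUSPEND:
            # Start suspending; the bit is not presented until the work is done.
            self._suspending = True
        # Keep SUSPEND as the device currently presents it, take the rest as written.
        self.status = (value & ~SUSPEND) | (self.status & SUSPEND)

    def read_status(self):
        if self._suspending:
            if self._busy_reads > 0:
                self._busy_reads -= 1  # still busy: SUSPEND reads back as 0
            else:
                self._suspending = False
                self.status |= DEVICE_NEEDS_RESET if self._fail else SUSPEND
        return self.status


def suspend(dev, max_reads=1000):
    """Set SUSPEND, then re-read device status until the device reports a result.
    The read bound is an assumption of this sketch, not something the patch defines."""
    dev.write_status(dev.read_status() | SUSPEND)
    for _ in range(max_reads):
        status = dev.read_status()
        if status & DEVICE_NEEDS_RESET:
            return "needs_reset"
        if status & SUSPEND:
            return "suspended"
    return "timeout"
```

In this model the device simply leaves SUSPEND cleared while it is busy, which is the behaviour Zhu Lingshan describes; the latency concern Parav raises is about how quickly a real device can answer each of those reads, which a toy model like this does not capture.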
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-03 10:28 ` Parav Pandit @ 2024-09-05 7:20 ` Zhu Lingshan 0 siblings, 0 replies; 69+ messages in thread From: Zhu Lingshan @ 2024-09-05 7:20 UTC (permalink / raw) To: Parav Pandit, mst@redhat.com, cohuck@redhat.com, jasowang@redhat.com Cc: virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/3/2024 6:28 PM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> >> Sent: Tuesday, September 3, 2024 2:36 PM >> >> On 8/30/2024 11:02 AM, Parav Pandit wrote: >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Sent: Friday, August 30, 2024 8:02 AM >>>> >>>> On 8/15/2024 5:34 PM, Parav Pandit wrote: >>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>> Sent: Thursday, August 15, 2024 1:53 PM >>>>>> >>>>>> On 8/13/2024 2:55 PM, Parav Pandit wrote: >>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>>>> Sent: Tuesday, August 13, 2024 11:44 AM >>>>>>>> >>>>>>> I removed the unreachable intel email id as every single email is >>>>>>> bouncing >>>>>> from it. >>>>>>> Please consider dropping that email from v8 as it will cause all >>>>>>> reviewers >>>>>> email to bounce. >>>>>>>> On 8/13/2024 1:50 PM, Parav Pandit wrote: >>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>>>>>> Sent: Tuesday, August 13, 2024 11:15 AM >>>>>>>>>> >>>>>>>>>> On 8/13/2024 12:42 PM, Parav Pandit wrote: >>>>>>>>>>> Hi Lingshan, David, >>>>>>>>>>> >>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>>>>>>>> Sent: Thursday, August 1, 2024 5:05 PM >>>>>>>>>>>> >>>>>>>>>>>> This commit allows the driver to suspend the device by >>>>>>>>>>>> introducing a new status bit SUSPEND in device_status. >>>>>>>>>>>> >>>>>>>>>>>> This commit also introduce a new feature bit >> VIRTIO_F_SUSPEND >>>>>>>>>>>> which indicating whether the device support SUSPEND. 
>>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> >>>>>>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> >>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> >>>>>>>>>>>> Signed-off-by: David Stevens <stevensd@chromium.org> >>>>>>>>>>>> --- >>>>>>>>>>>> content.tex | 75 >>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++- >>>>>>>>>> ---- >>>>>>>>>>>> -- >>>>>>>>>>>> 1 file changed, 65 insertions(+), 10 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> Changes from V6: >>>>>>>>>>>> - the device should hold its config interrupt while SUSPEND, >>>>>>>>>>>> and send config interrupt when the SUSPEND bit is cleared. >>>>>>>>>>>> - while SUSPEND, the driver MUST NOT access Device >>>>>>>>>>>> Configuration Space >>>>>>>>>>>> - minor changes. >>>>>>>>>>>> >>>>>>>>>>>> Changes from V5: >>>>>>>>>>>> - the device should present NEEDS_RESET if failed to suspend >>>>>>>>>>>> - allow the driver access device status in the config space when >>>>>>>>>>>> suspended if it is implemented in config space. >>>>>>>>>>>> - language improvements >>>>>>>>>>>> >>>>>>>>>>>> Changes from V4: >>>>>>>>>>>> - re-order the device status bits section >>>>>>>>>>>> - kick vqs --> notify vqs >>>>>>>>>>>> >>>>>>>>>>>> Changes from V3: >>>>>>>>>>>> - allow the driver clearing the SUSPEND bit to resume the >> device. >>>>>>>>>>>> - disallow access to config space while suspended. >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8 >>>>>>>>>>>> 100644 >>>>>>>>>>>> --- a/content.tex >>>>>>>>>>>> +++ b/content.tex >>>>>>>>>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status} >>>>>>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>>>>>>>>>> this bit. For example, under Linux, drivers can be >>>>>>>>>>>> loadable >>>> modules. >>>>>>>>>>>> \end{note} >>>>>>>>>>>> >>>>>>>>>>>> -\item[FAILED (128)] Indicates that something went wrong in >>>>>>>>>>>> the guest, >>>>>>>>>>>> - and it has given up on the device. 
This could be an >>>>>>>>>>>> internal >>>>>>>>>>>> - error, or the driver didn't like the device for some >>>>>>>>>>>> reason, or >>>>>>>>>>>> - even a fatal error during device operation. >>>>>>>>>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and >>>>>>>>>>>> +ready to >>>>>>>>>>>> + drive the device. >>>>>>>>>>>> >>>>>>>>>>>> \item[FEATURES_OK (8)] Indicates that the driver has >>>>>>>>>>>> acknowledged all >>>>>>>>>> the >>>>>>>>>>>> features it understands, and feature negotiation is complete. >>>>>>>>>>>> >>>>>>>>>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and >>>>>>>>>>>> ready to >>>>>>>>>>>> - drive the device. >>>>>>>>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, >>>>>>>> indicates >>>>>>>>>>>> +that the >>>>>>>>>>>> + device has been suspended by the driver. >>>>>>>>>>>> >>>>>>>>>>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has >>>>>>>>>> experienced >>>>>>>>>>>> an error from which it can't recover. >>>>>>>>>>>> + >>>>>>>>>>>> +\item[FAILED (128)] Indicates that something went wrong in >>>>>>>>>>>> +the guest, >>>>>>>>>>>> + and it has given up on the device. This could be an >>>>>>>>>>>> +internal >>>>>>>>>>>> + error, or the driver didn't like the device for some >>>>>>>>>>>> +reason, or >>>>>>>>>>>> + even a fatal error during device operation. >>>>>>>>>>>> \end{description} >>>>>>>>>>>> >>>>>>>>>>>> The \field{device status} field starts out as 0, and is >>>>>>>>>>>> reinitialized to 0 by @@ - >>>>>>>>>>>> 60,8 +63,9 @@ \section{\field{Device Status} >>>>>>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>>>>>>>>>> initialization sequence specified in \ref{sec:General >>>>>>>>>>>> Initialization And Device Operation / Device >>>>>>>>>> Initialization}. >>>>>>>>>>>> -The driver MUST NOT clear a >>>>>>>>>>>> -\field{device status} bit. 
If the driver sets the FAILED >>>>>>>>>>>> bit, >>>>>>>>>>>> +The driver MUST NOT clear a \field{device status} bit other >>>>>>>>>>>> +than SUSPEND except when setting \field{device status} to 0 >>>>>>>>>>>> +as a transport-specific way to initiate a reset. If the >>>>>>>>>>>> +driver sets the FAILED bit, >>>>>>>>>>>> the driver MUST later reset the device before attempting to >>>>>>>>>>>> re- >>>>>>>> initialize. >>>>>>>>>>>> The driver SHOULD NOT rely on completion of operations of a >>>> @@ >>>>>>>>>>>> -99,10 >>>>>>>>>>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities >>>>>>>>>>>> +of a Virtio Device >>>>>>>>>>>> / Feature B \begin{description} >>>>>>>>>>>> \item[0 to 23, and 50 to 127] Feature bits for the specific >>>>>>>>>>>> device type >>>>>>>>>>>> >>>>>>>>>>>> -\item[24 to 41] Feature bits reserved for extensions to the >>>>>>>>>>>> queue and >>>>>>>>>>>> +\item[24 to 42] Feature bits reserved for extensions to the >>>>>>>>>>>> +queue and >>>>>>>>>>>> feature negotiation mechanisms >>>>>>>>>>>> >>>>>>>>>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for >>>>>>>>>>>> future extensions. >>>>>>>>>>>> +\item[43 to 49, and 128 and above] Feature bits reserved for >>>>>>>>>>>> +future >>>>>>>>>>>> extensions. >>>>>>>>>>>> \end{description} >>>>>>>>>>>> >>>>>>>>>>>> \begin{note} >>>>>>>>>>>> @@ -629,6 +633,53 @@ \section{Device >>>> Cleanup}\label{sec:General >>>>>>>>>>>> Initialization And Device Operation / >>>>>>>>>>>> >>>>>>>>>>>> Thus a driver MUST ensure a virtqueue isn't live (by device >>>>>>>>>>>> reset) before removing exposed buffers. >>>>>>>>>>>> >>>>>>>>>>>> +\section{Device Suspend}\label{sec:General Initialization >>>>>>>>>>>> +And Device Operation / Device Suspend} >>>>>>>>>>>> + >>>>>>>>>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the >>>>>>>>>>>> +SUSPEND bit in \field{device status} to suspend a device, >>>>>>>>>>>> +and can clear the SUSPEND bit to resume a suspended device. 
>>>>>>>>>>>> + >>>>>>>>>>>> +\drivernormative{\subsection}{Device Suspend}{General >>>>>>>>>>>> +Initialization And Device Operation / Device Suspend} >>>>>>>>>>>> + >>>>>>>>>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set >> or >>>>>>>>>>>> VIRTIO_F_SUSPEND is not negotiated. >>>>>>>>>>>> + >>>>>>>>>>>> +Once the driver sets SUSPEND to \field{device status} of the >>>> device: >>>>>>>>>>>> +\begin{itemize} >>>>>>>>>>>> +\item The driver MUST re-read \field{device status} to >>>>>>>>>>>> +verify whether the >>>>>>>>>>>> SUSPEND bit is set. >>>>>>>>>>>> +\item The driver MUST NOT make any more buffers available to >>>>>>>>>>>> +the >>>>>>>>>> device. >>>>>>>>>>>> +\item The driver MUST NOT access any virtqueues or send >>>>>>>>>>>> +notifications for >>>>>>>>>>>> any virtqueues. >>>>>>>>>>>> +\item The driver MUST NOT access Device Configuration Space. >>>>>>>>>>>> +\end{itemize} >>>>>>>>>>>> + >>>>>>>>>> Hi Parva >>>>>>>>>>> Do we agree that >>>>>>>>>>> a. suspending a device is non frequent operation (in order of >>>>>>>>>>> N >>>>>>>>>> operations/sec, where N is roughly in range of 10 or 100) per >> device? >>>>>>>>>> Ideally it should not be often in normal operations, but >>>>>>>>>> remember we can not restrict the behaviors of the driver, so we >>>>>>>>>> must be able to handle the scenario in which SUSPENDING is >> often. >>>>>>>>> Sure. the intent is slow rate, but one can do at unexpected times. >>>>>>>>> Do you agree? >>>>>>>> I think we don't have an intention of the frequency in the spec. >>>>>>> Sure. >>>>>>> >>>>>>>> The spec only provides generic mechanisms and interfaces. >>>>>>> Sure. >>>>>>> >>>>>>>> Don't assume it(or driver wants it to be) would be often or not, >>>>>>>> that depends on the driver. >>>>>>> As you rightly said : it cannot be assumed. >>>>>>> The driver will read the device status right after it wrote it. >>>>>>> This typically is < >>>>>> 50nsec of time. 
>>>>>>> The suspend operation for a net device to store hundreds of >>>>>>> queues, RSS >>>>>> table, flow filters, takes plenty of time (at least more than 50nsec :) ). >>>>>>> Similarly for the GPU to store some MBs of memory takes more than >>>>>>> 50nsec >>>>>> of time, for example to store in a file for a software-based GPU device. >>>>>>> So a device cannot respond back suspend=true in next 50nsec time. >>>>>> It's OK for the device to take longer time to respond, the driver >>>>>> simply re-reads device status. >>>>> Sure, the issue is, when the driver re-reads, the device must >>>>> present >>>> suspend=false within 50nsec. >>>>> (because device didn't suspend it yet). >>>>> >>>>> As I explained in previous email, this requires building special circuitry. >>>>> Such circuitry can be avoided if the suspend interface is done >>>>> slightly >>>> differently. >>>> why there is a constraint condition of the time? >>> Because this is what driver does as you explained in [1] indicating "we must >> be able to handle ..." >>> [1] >>> https://lore.kernel.org/virtio-comment/c4d5eed3-774b-4d35-a007-f9dff28ce884@amd.com/T/#m6f081f96ef9dcea29c64a88b633eb21d50e8c410 >> The sentence is "we must be able to handle the scenario in which >> SUSPENDING is often", where do you see any constraint conditions of time? > If the device takes X usec, how can the device inform that, it suspending in progress without flipping the polarity? Oh, I see your puzzle. The device does not inform that.
It is VM_EXIT and VM_ENTRY, once VM_ENTRY, the guest vcpu aka the driver resume running > >>>> Are there any similar >>>> constraining for other states like RESET or DRIVER_OK? Don't assume >>>> any other states transitions are faster than SUSPEND. >>> DRIVER_OK does not suffer from it because it is async notification. >>> A device may start slow after DRIVER_OK. >>> SUSPEND operation cannot rely on such async behavior. >> The driver can set suspend and re-read & wait. This is a common routine in >> the driver. >>> RESET also suffers from similar inefficiencies. >>> But that is because it is inherited from the past. >>> >>> Here a new functionality is being proposed and it has a chance for efficient >> device implementation. >>> Therefore the request is to improve it. >> sure, as long as we confirm this new register apply for all device status >> transitions, not only for SUSPEND. > Alright, please do it as you suggested here. > >>>>>>> More below. >>>>>>> >>>>>>>>>>> b. A software-based device may not always want to force >>>>>>>>>>> VM_EXIT on read >>>>>>>>>> and write on the device_status register? >>>>>>>>>> Trap and Emulation is the basic of virtualization, and how to >>>>>>>>>> pass-through a device is out of this spec. >>>>>>>>>> >>>>>>>>> Sure, I didn't suggest to put such things in the spec. >>>>>>>>> My question is, whether to trap and emulate or not is a choice >>>>>>>>> of the >>>>>>>> software. >>>>>>>>> Do you agree? >>>>>>>> The device emulator does not know anything about whether trapped >>>> or >>>>>> not. >>>>>>>> Trapping and Emulation is a hypervisor thing. >>>>>>>> >>>>>>>> If here "software" refers to the device emulator, then Yes, it is >>>>>>>> not the emulator's decision. And the device should not be aware >>>>>>>> of VM_EXIT & VM_ENTRY. >>>>>>>> >>>>>>> Right. So a software wants to implement device_status as pure MMIO >>>>>> writes. (and not VM_EXIT). >>>>>> This is not always true, there can be VM_EXIT of pure emulated >>>>>> devices. 
>>>>> I am not denying that VM_EXIT can/cannot be there. >>>>> I am saying, the proposal forces VM_EXIT based approach in my >>>> understanding of this patch. >>> [MARKER_1] >>> >>>>> If that is not true, may be can you please explain how this can be >>>> implemented without VM_EXIT? >>>>> We should have an interface that can be done with and without >>>>> VM_EXIT >>>> method at least for any new additions. >>>> VM_EXIT is out of spec, it is a hypervisor and the processor thing. >>> You keep repeating VM_EXIST is out of spec. >>> I already replied at [MARKER_1], sure it is out of spec, the current approach >> forces one to do VM_EXIT based approach. >>> And if not, please explain, how can it be achieved? >> It is not the current solution force the hypervisor & guest to perform >> VM_EXIT. > I read it few times but couldn't parse it. > Can you please explain how your proposal works WITHOUT VM_EXIT? > If not, you can say, that current proposal is limited to force VM_EXIT. > Please explain. It needs VM_EXIT, not only SUSPEND, all sensitive resource, like the config space access should be trapped. > >> In basic virtualization, any access to hypervisor emulated HW registers would >> be Traped and Emulated, means a VM_EXIT and a VM_ENTRY. > That was not my question, my question is how to achieve without VM_EXIT? > You are suggesting a spec that forces VM_EXIT. It needs VM_EXIT, and there is a pairing VM_ENTRY > >>>> In non-pass-through case, any register access are sensitive and will >>>> trigger VM_EXIT. Like RESET or DRIVER_OK needs to access >>>> device_status, nothing different. >>> I don't think you understood the point. >>> Let me repeat, The question is, if the device implementation wants to >> achieve the functionality without VM_EXIT, what is the way? >> Map the register to guest address space, out of the spec. > And which component will flip the bit of suspend before driver reads it to indicate that device is still_suspending? 
> >>>>>> The >>>>>> HW registers are sensitive resource and any access to them need to >>>>>> be trapped and emulated. >>>>> This does not apply to PCI PFs and VFs which are HW devices (mainly >> PFs). >>>>> so this trap + emulation is narrow view that we better avoid. >>>> This is how *basic* virtualization work, once access sensitive resource, >> trap it. >>>>> If you think this is the way forward, you should put forward in >>>>> patch as >>>> MUST requirement. >>>>> and that does not look right to me. >>>>> I hope you also don't mean to force this method to device >>>> implementations. >>>>> Right? >>>> Again, VM_EXIT is a hypervisor thing, out of spec. Whether there is >>>> a VM_EXIT when setting SUSPEND totally depends on the virtualization >>>> solution. And SUSPEND is nothing different from DRIVER_OK. >>>> >>> Please avoid repeating the point that VM_EXIT is hypervosor thing. >>> No one asked to put this in spec. Please re-read [MARKER1]. You probably >> missed that. >>> SUSPEND is different than DRIVER_OK. I explained the timing constraints >> and the required circuitry needed to fulfill the proposal. >>> And with the additional register, such complicated circuitry can be easily >> avoided. >> Please read QEMU code, that can help you find how the trap - emulate >> mechanism work for guest when it tries to access sensitive resources. > I am not suggesting trap and emulation. Please come out of this looping thoughts. > > I repeat my question, please explain how does suggested solution works without VM_EXIT? > From your response, I derive it cannot. Can you please confirm? > >>>> Means, if your virtualization needs to trap SUSPEND, it also needs to >>>> trap DRIVER_OK, and don't assume DRIVER_OK is faster than SUSPEND. >>> DRIVER_OK is by law of physics is faster than SUSPEND because it does not >> demand the driver of reading back. >>> There is no driver side loop to check if the device accepted DRIVER_OK or >> not. >>> Agree? >> Why do you assume so? 
> I am not assuming. It is the current spec. :) > Please read the device initialization sequence, that does NOT say that driver must READ DRIVER_OK. > Hence by law of physics it is faster. > >> In some vendor implementation, DRIVER_OK can be >> slow. Can DRIVER_OK implement a deferred initialization even deferred >> resource allocation? Does the spec say that DRIVER_OK must be faster? > No. it does not. But your suspend proposal says so that device must respond back before the next read arrives from the driver. > >>>>>>> And prefer to returning SUSPEND=true at slow pace. >>>>>>> This means, the device implementation cannot immediately return >>>>>> suspend=true right after it was written. >>>>>>> A MMIO read will read it back, as suspend=true. >>>>>>> >>>>>>> An alternative would be, to forward CPU loads and CPU stores to >>>>>>> different >>>>>> address. >>>>>>> However, this does not work for the hw based devices. >>>>>>> >>>>>>> That means, PCI HW needs to return suspend=0, until the device is >>>>>>> not >>>>>> suspended. >>>>>>> In this example, the device cannot build special circuitry to >>>>>>> answer >>>>>> suspend=true within 50nsec, or in other words building special >>>>>> circuitry to return suspend=false is too complex for the slow operation. >>>>>> why? The device can just not to change the value of the SUSPEND bit >>>>>> before it has fully suspended. >>>>> When driver wrote, it wrote suspend=true, And device returns >>>>> suspend=false while suspend is ongoing, right? >>>>> If yes, this is expensive because the device needs to operate within >>>>> 50nsec >>>> or less to answer suspend=false. >>>>> And even worst, it needs to suspend=true when unsuspending within >>>> 50nsec when resuming is ongoing. >>>> again, there is not a 50nsec constraining and please take a reference >>>> of how DRIVER_OK work with virtualization. >>> There is. The device is expected to return back the desired value to indicate >> driver that suspend is ongoing. 
>>> It is different than DRIVER_OK. >> where does the spec or any code say 50nsec? > Your spec and your previous comment implies that when device is busy suspending, it must returned the toggled value than what driver set it. > Isn't it? It means device needs to react before the next read arrives from the driver, isn't it? > I took that read as 50nsec example as you explained previously that driver will read very fast. > >>>>>>> If this understanding of burden is clear, >>>>>>> >>>>>>> The proposal is, can you please extend the interface such that, >>>>>>> >>>>>>> 1. driver writes suspend command. >>>>>>> 2. driver reads suspend_status, and receives not_completed=(false). >>>>>>> This is >>>>>> the default value. >>>>>>> 3. When the device completes suspend, it changes the polarity of >>>>>> suspend_status=true. >>>>>>> This has two main benefits: >>>>>>> [A] This will enable software-based devices to write data to slow >>>>>>> files and >>>>>> does not have to force VM_EXITs. >>>>>>> [B] It also enables hw based devices to not build special >>>>>>> circuitry to answer >>>>>> within 50nsec, which can get very complicated for tens or hundreds >>>>>> of PCI PFs. >>>>>> I think we have already discussed on this before in V5, and Jason >>>>>> has some insightful comments >>>>>> >>>>> Unfortunately, not. His comment was that it is not specific to suspend. >>>>> But here we are introducing a new interface and functionality that >>>>> does not >>>> need to suffer or follow anything that may not be efficient. 
>>>>>> https://lore.kernel.org/virtio-comment/20240612082055-mutt-send-email-mst@kernel.org/T/#mc817fc6ca12ff0bcbae62b43b6146a177ecf13a9 >>>>>>>> this is out of the spec anyway. >>>>>>>> >>>>>>>> Thanks >>>>>>>> Zhu Lingshan >>>>>>>>>> Thanks >>>>>>>>>> Zhu Lingshan >>>>>>>>>>>> +\devicenormative{\subsection}{Device Suspend}{General >>>>>>>>>>>> +Initialization And Device Operation / Device Suspend} >>>>>>>>>>>> + >>>>>>>>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or >>>>>>>>>>>> VIRTIO_F_SUSPEND is not negotiated. >>>>>>>>>>>> + >>>>>>>>>>>> +The device MUST ignore all access to its Configuration Space >>>>>>>>>>>> +while suspended, except for \field{device status} if it is >>>>>>>>>>>> +part of the Configuration >>>>>>>>>>>> Space. >>>>>>>>>>>> + >>>>>>>>>>>> +A device MUST NOT send any notifications for any virtqeuues, >>>>>>>>>>>> +access any virtqueues, or modify any fields in its >>>>>>>>>>>> +Configuration Space while suspended. >>>>>>>>>>>> + >>>>>>>>>>>> +If changes occur in the Configuration Space while the >>>>>>>>>>>> +SUSPEND bit is set, the device MUST NOT send any >>>>>>>>>>>> +configuration change >>>>>>>> notifications.
>>>>>>>>>>>> +Instead, the device MUST send the notification after the >>>>>>>>>>>> +SUSPEND bit has >>>>>>>>>>>> been cleared. >>>>>>>>>>>> + >>>>>>>>>>>> +When the driver sets SUSPEND, the device MUST either >> suspend >>>>>>>>>>>> +itself or set >>>>>>>>>>>> DEVICE_NEEDS_RESET if failed to suspend. >>>>>>>>>>>> + >>>>>>>>>>>> +If SUSPEND is set in \field{device status}, when the driver >>>>>>>>>>>> +clears SUSPEND, the device MUST either resume normal >>>> operation >>>>>>>>>>>> +or set >>>>>>>>>>>> DEVICE_NEEDS_RESET. >>>>>>>>>>>> + >>>>>>>>>>>> +When the driver sets SUSPEND, the device SHOULD perform >> the >>>>>>>>>>>> +following actions before presenting that >>>>>>>>>>>> the SUSPEND bit is set to 1 in the \field{device status}: >>>>>>>>>>>> + >>>>>>>>>>>> +\begin{itemize} >>>>>>>>>>>> +\item Stop processing more buffers of any virtqueues \item >>>>>>>>>>>> +Wait until all buffers that are being processed have been used. >>>>>>>>>>>> +\item Send used buffer notifications to the driver. >>>>>>>>>>>> +\end{itemize} >>>>>>>>>>>> + >>>>>>>>>>>> \chapter{Virtio Transport Options}\label{sec:Virtio >>>>>>>>>>>> Transport Options} >>>>>>>>>>>> >>>>>>>>>>>> Virtio can use various different buses, thus the standard is >>>>>>>>>>>> split @@ -872,6 >>>>>>>>>>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved >>>>>>>>>>>> +Feature >>>>>>>>>>>> Bits} >>>>>>>>>>>> \ref{devicenormative:Basic Facilities of a Virtio Device / >>>>>>>>>>>> Feature Bits} for >>>>>>>>>>>> handling features reserved for future use. >>>>>>>>>>>> >>>>>>>>>>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that >>>>>>>>>>>> + the driver >>>>>>>> can >>>>>>>>>>>> + trigger suspending the device via the SUSPEND flag >>>>>>>>>>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device >>>>>>>>>>>> + Status >>>> Field}. 
>>>>>>>>>>>> + >>>>>>>>>>>> \end{description} >>>>>>>>>>>> >>>>>>>>>>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved >>>>>>>>>>>> Feature Bits} >>>>>>>>>>>> -- >>>>>>>>>>>> 2.45.2 >>>>>>>>>>>> ^ permalink raw reply [flat|nested] 69+ messages in thread
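The device-side rules in the quoted patch (ignore SUSPEND unless FEATURES_OK is set and VIRTIO_F_SUSPEND is negotiated; on a SUSPEND write either suspend or present DEVICE_NEEDS_RESET) can be sketched in C as follows. This is only an illustration: the bit values come from the patch, while `struct vdev_model`, its fields, and both function names are hypothetical and belong to no real device implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bit values and feature bit number as listed in the patch. */
#define VIRTIO_STATUS_FEATURES_OK        8u
#define VIRTIO_STATUS_SUSPEND            16u
#define VIRTIO_STATUS_DEVICE_NEEDS_RESET 64u
#define VIRTIO_F_SUSPEND_BIT             42

/* Hypothetical device model; field names are illustrative only. */
struct vdev_model {
    uint8_t  status;        /* device status as presented to the driver */
    uint64_t negotiated;    /* negotiated feature bits */
    bool     suspend_fails; /* knob to simulate a failed suspend */
};

/* True when the device is required to act on a SUSPEND write. */
static bool suspend_allowed(const struct vdev_model *d)
{
    bool features_ok = (d->status & VIRTIO_STATUS_FEATURES_OK) != 0;
    bool negotiated = ((d->negotiated >> VIRTIO_F_SUSPEND_BIT) & 1u) != 0;
    return features_ok && negotiated;
}

/* Handle a driver write that sets SUSPEND. */
static void handle_suspend_write(struct vdev_model *d)
{
    if (!suspend_allowed(d))
        return; /* the device MUST ignore SUSPEND in this case */

    if (d->suspend_fails) {
        d->status |= VIRTIO_STATUS_DEVICE_NEEDS_RESET;
        return;
    }

    /* A real device would first stop processing buffers, wait for
     * in-flight buffers to be used, and send used buffer notifications.
     * Only then does it present SUSPEND to the driver. */
    d->status |= VIRTIO_STATUS_SUSPEND;
}
```

The sketch completes the suspend synchronously; the timing of that last step is exactly what the rest of the thread debates.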
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-15 8:23 ` Zhu Lingshan 2024-08-15 9:34 ` Parav Pandit @ 2024-08-15 10:45 ` Michael S. Tsirkin 2024-08-30 2:32 ` Zhu Lingshan 1 sibling, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-08-15 10:45 UTC (permalink / raw) To: Zhu Lingshan Cc: Parav Pandit, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Thu, Aug 15, 2024 at 04:23:26PM +0800, Zhu Lingshan wrote: > > So a device cannot respond back suspend=true in next 50nsec time. > It's OK for the device to take longer time to respond, the driver simply re-reads device status. You didn't bother saying in the spec that it must, though. -- MST ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-15 10:45 ` Michael S. Tsirkin @ 2024-08-30 2:32 ` Zhu Lingshan 0 siblings, 0 replies; 69+ messages in thread From: Zhu Lingshan @ 2024-08-30 2:32 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Parav Pandit, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 8/15/2024 6:45 PM, Michael S. Tsirkin wrote: > On Thu, Aug 15, 2024 at 04:23:26PM +0800, Zhu Lingshan wrote: >>> So a device cannot respond back suspend=true in next 50nsec time. >> It's OK for the device to take longer time to respond, the driver simply re-reads device status. > You didn't bother saying in the spec that it must, though. I can add this re-read requirement, the same way we handle FEATURES_OK. > ^ permalink raw reply [flat|nested] 69+ messages in thread
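The re-read discussed in this exchange might look roughly like the C sketch below on the driver side: write SUSPEND, then poll the device status until the device presents SUSPEND or DEVICE_NEEDS_RESET. All names here are hypothetical: `struct status_ops`, `driver_suspend`, the poll budget, and the `fake_dev` simulation are illustration only and are not taken from the patch or from any existing driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_STATUS_SUSPEND            16u
#define VIRTIO_STATUS_DEVICE_NEEDS_RESET 64u

/* Hypothetical transport accessors for the device status field. */
struct status_ops {
    uint8_t (*read)(void *ctx);
    void (*write)(void *ctx, uint8_t val);
    void *ctx;
};

enum suspend_result { SUSPEND_DONE, SUSPEND_NEEDS_RESET, SUSPEND_TIMEOUT };

/* Set SUSPEND, then re-read until the device presents the bit. */
static enum suspend_result driver_suspend(const struct status_ops *ops,
                                          unsigned max_polls)
{
    uint8_t s = ops->read(ops->ctx);

    ops->write(ops->ctx, (uint8_t)(s | VIRTIO_STATUS_SUSPEND));
    for (unsigned i = 0; i < max_polls; i++) {
        s = ops->read(ops->ctx);
        if (s & VIRTIO_STATUS_DEVICE_NEEDS_RESET)
            return SUSPEND_NEEDS_RESET;
        if (s & VIRTIO_STATUS_SUSPEND)
            return SUSPEND_DONE;
    }
    return SUSPEND_TIMEOUT;
}

/* Simulated slow device: it presents SUSPEND only after a number of
 * status reads have passed since the driver's write. */
struct fake_dev {
    uint8_t  status;
    bool     pending;
    unsigned reads_until_done;
};

static uint8_t fake_read(void *ctx)
{
    struct fake_dev *d = ctx;

    if (d->pending) {
        if (d->reads_until_done == 0) {
            d->status |= VIRTIO_STATUS_SUSPEND;
            d->pending = false;
        } else {
            d->reads_until_done--;
        }
    }
    return d->status;
}

static void fake_write(void *ctx, uint8_t val)
{
    struct fake_dev *d = ctx;

    if ((val & VIRTIO_STATUS_SUSPEND) && !(d->status & VIRTIO_STATUS_SUSPEND))
        d->pending = true; /* suspend starts; the bit is not presented yet */
    d->status = (uint8_t)((val & ~VIRTIO_STATUS_SUSPEND) |
                          (d->status & VIRTIO_STATUS_SUSPEND));
}
```

A real driver would presumably bound the loop by time rather than by a read count; the count is used here only to keep the sketch deterministic.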
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-13 6:55 ` Parav Pandit 2024-08-15 8:23 ` Zhu Lingshan @ 2024-08-15 10:52 ` Michael S. Tsirkin 2024-08-15 10:59 ` Parav Pandit 1 sibling, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-08-15 10:52 UTC (permalink / raw) To: Parav Pandit Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: > That means, PCI HW needs to return suspend=0, until the device is not suspended. > In this example, the device cannot build special circuitry to answer suspend=true within 50nsec, or in other words building special circuitry to return suspend=false is too complex for the slow operation. > > If this understanding of burden is clear, > > The proposal is, can you please extend the interface such that, > > 1. driver writes suspend command. > 2. driver reads suspend_status, and receives not_completed=(false). This is the default value. > 3. When the device completes suspend, it changes the polarity of suspend_status=true. > > This has two main benefits: > [A] This will enable software-based devices to write data to slow files and does not have to force VM_EXITs. > > [B] It also enables hw based devices to not build special circuitry to answer within 50nsec, which can get very complicated for tens or hundreds of PCI PFs. I read this several times, and I don't understand what is proposed. A special register for suspend/resume? Is this the difference? -- MST ^ permalink raw reply [flat|nested] 69+ messages in thread
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-15 10:52 ` Michael S. Tsirkin @ 2024-08-15 10:59 ` Parav Pandit 2024-08-15 15:07 ` Michael S. Tsirkin 0 siblings, 1 reply; 69+ messages in thread From: Parav Pandit @ 2024-08-15 10:59 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Thursday, August 15, 2024 4:23 PM > > On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: > > That means, PCI HW needs to return suspend=0, until the device is not > suspended. > > In this example, the device cannot build special circuitry to answer > suspend=true within 50nsec, or in other words building special circuitry to > return suspend=false is too complex for the slow operation. > > > > If this understanding of burden is clear, > > > > The proposal is, can you please extend the interface such that, > > > > 1. driver writes suspend command. > > 2. driver reads suspend_status, and receives not_completed=(false). This is > the default value. > > 3. When the device completes suspend, it changes the polarity of > suspend_status=true. > > > > This has two main benefits: > > [A] This will enable software-based devices to write data to slow files and > does not have to force VM_EXITs. > > > > [B] It also enables hw based devices to not build special circuitry to answer > within 50nsec, which can get very complicated for tens or hundreds of PCI > PFs. > > I read this several times, and I don't understand what is proposed. > A special register for suspend/resume? Is this the difference? > Yes, a command register for suspend/resume operation. And device_status new bit that Lingshan defined returns the status of this operation. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-15 10:59 ` Parav Pandit @ 2024-08-15 15:07 ` Michael S. Tsirkin 2024-08-17 5:19 ` Parav Pandit 2024-08-30 2:37 ` Zhu Lingshan 0 siblings, 2 replies; 69+ messages in thread From: Michael S. Tsirkin @ 2024-08-15 15:07 UTC (permalink / raw) To: Parav Pandit Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: > > > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Thursday, August 15, 2024 4:23 PM > > > > On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: > > > That means, PCI HW needs to return suspend=0, until the device is not > > suspended. > > > In this example, the device cannot build special circuitry to answer > > suspend=true within 50nsec, or in other words building special circuitry to > > return suspend=false is too complex for the slow operation. > > > > > > If this understanding of burden is clear, > > > > > > The proposal is, can you please extend the interface such that, > > > > > > 1. driver writes suspend command. > > > 2. driver reads suspend_status, and receives not_completed=(false). This is > > the default value. > > > 3. When the device completes suspend, it changes the polarity of > > suspend_status=true. > > > > > > This has two main benefits: > > > [A] This will enable software-based devices to write data to slow files and > > does not have to force VM_EXITs. > > > > > > [B] It also enables hw based devices to not build special circuitry to answer > > within 50nsec, which can get very complicated for tens or hundreds of PCI > > PFs. > > > > I read this several times, and I don't understand what is proposed. > > A special register for suspend/resume? Is this the difference? > > > Yes, a command register for suspend/resume operation. > And device_status new bit that Lingshan defined returns the status of this operation. 
Ugh, it's all quite messy IMHO.
We have 4 states:
- operational (resumed)
- suspend in progress
- suspended
- resume in progress

What I'd do then is a two bit register.
To suspend:
- write suspend in progress
- re-read, waiting until suspended
To resume:
- write resume in progress
- re-read, waiting until operational (resumed)

How does this sound?

--
MST

^ permalink raw reply [flat|nested] 69+ messages in thread
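The four-state, two-bit register proposed in this message can be sketched in C as below. The numeric encoding is an assumption, since the message names the states but assigns no values. Silently ignoring writes that are not one of the two legal requests is also an assumption made for this sketch, and both function names are hypothetical.

```c
/* Hypothetical encoding of the proposed two-bit register. */
enum suspend_state {
    ST_OPERATIONAL         = 0, /* operational (resumed) */
    ST_SUSPEND_IN_PROGRESS = 1,
    ST_SUSPENDED           = 2,
    ST_RESUME_IN_PROGRESS  = 3,
};

/* Driver side: only "suspend in progress" (from operational) and
 * "resume in progress" (from suspended) are accepted; any other
 * write leaves the register unchanged (assumed behaviour). */
static enum suspend_state driver_write(enum suspend_state cur,
                                       enum suspend_state req)
{
    if (req == ST_SUSPEND_IN_PROGRESS && cur == ST_OPERATIONAL)
        return req;
    if (req == ST_RESUME_IN_PROGRESS && cur == ST_SUSPENDED)
        return req;
    return cur;
}

/* Device side: finish whichever transition is pending. The driver
 * observes the result by re-reading the register. */
static enum suspend_state device_complete(enum suspend_state cur)
{
    switch (cur) {
    case ST_SUSPEND_IN_PROGRESS:
        return ST_SUSPENDED;
    case ST_RESUME_IN_PROGRESS:
        return ST_OPERATIONAL;
    default:
        return cur;
    }
}
```

With this shape the value the driver writes is never the value it waits for, so a read that arrives immediately after the write simply returns the in-progress state.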
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-15 15:07 ` Michael S. Tsirkin @ 2024-08-17 5:19 ` Parav Pandit 2024-08-30 2:37 ` Zhu Lingshan 1 sibling, 0 replies; 69+ messages in thread From: Parav Pandit @ 2024-08-17 5:19 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Thursday, August 15, 2024 8:38 PM > > On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > Sent: Thursday, August 15, 2024 4:23 PM > > > > > > On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: > > > > That means, PCI HW needs to return suspend=0, until the device is > > > > not > > > suspended. > > > > In this example, the device cannot build special circuitry to > > > > answer > > > suspend=true within 50nsec, or in other words building special > > > circuitry to return suspend=false is too complex for the slow operation. > > > > > > > > If this understanding of burden is clear, > > > > > > > > The proposal is, can you please extend the interface such that, > > > > > > > > 1. driver writes suspend command. > > > > 2. driver reads suspend_status, and receives > > > > not_completed=(false). This is > > > the default value. > > > > 3. When the device completes suspend, it changes the polarity of > > > suspend_status=true. > > > > > > > > This has two main benefits: > > > > [A] This will enable software-based devices to write data to slow > > > > files and > > > does not have to force VM_EXITs. > > > > > > > > [B] It also enables hw based devices to not build special > > > > circuitry to answer > > > within 50nsec, which can get very complicated for tens or hundreds > > > of PCI PFs. > > > > > > I read this several times, and I don't understand what is proposed. > > > A special register for suspend/resume? 
Is this the difference? > > > > > Yes, a command register for suspend/resume operation. > > And device_status new bit that Lingshan defined returns the status of this > operation. > > > > Ugh, it's all quite messy IMHO. > We have 4 states: > - operational (resumed) > - suspend in progress > - suspended > - resume in progress > > What I'd do then is a two bit register. > To suspend: > - write suspend in progress > - re-read, waiting until suspended > To resume > - write resume in progress > - re-read, waiting until operational (resumed) > > How does this sound? >

Yes. I just think that keeping the command in a different register from the status is the better approach, because it allows the command to be steered to a different piece of hw than the status:
1. A separate status register can be optimized by the device, as it largely serves reads.
2. It is modular.
3. It gives simplicity and clarity to developers.

cmd_register: 0 = resume, 1 = suspend
status:       0 = resumed, 1 = suspended

cmd  status  meaning
0    0       operational (resumed), typical default
1    0       suspend in progress (initiated)
1    1       suspended
0    1       resume initiated (still suspended)

> --
> MST

^ permalink raw reply [flat|nested] 69+ messages in thread
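The cmd/status combinations listed in this message map onto the four states from the earlier proposal. A small C sketch of that decoding follows; `enum phase` and `decode_phase` are hypothetical names, and treating each register as a single bit is an assumption taken from the 0/1 values given in the message.

```c
/* Hypothetical names for the four combinations of cmd and status. */
enum phase {
    PHASE_OPERATIONAL, /* cmd=0 status=0 */
    PHASE_SUSPENDING,  /* cmd=1 status=0 */
    PHASE_SUSPENDED,   /* cmd=1 status=1 */
    PHASE_RESUMING,    /* cmd=0 status=1 */
};

/* Decode the pair (cmd written by the driver, status reported by the
 * device). Only bit 0 of each value is considered. */
static enum phase decode_phase(unsigned cmd, unsigned status)
{
    cmd &= 1u;
    status &= 1u;

    if (cmd == 0 && status == 0)
        return PHASE_OPERATIONAL;
    if (cmd == 1 && status == 0)
        return PHASE_SUSPENDING;
    if (cmd == 1 && status == 1)
        return PHASE_SUSPENDED;
    return PHASE_RESUMING; /* cmd == 0, status == 1: still suspended */
}
```

Because the driver only writes `cmd` and the device only updates `status`, the device never has to return a value different from the one just written to the same register, which is the timing concern raised earlier in the thread.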
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-15 15:07 ` Michael S. Tsirkin 2024-08-17 5:19 ` Parav Pandit @ 2024-08-30 2:37 ` Zhu Lingshan 2024-08-30 3:10 ` Parav Pandit 1 sibling, 1 reply; 69+ messages in thread From: Zhu Lingshan @ 2024-08-30 2:37 UTC (permalink / raw) To: Michael S. Tsirkin, Parav Pandit Cc: cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 8/15/2024 11:07 PM, Michael S. Tsirkin wrote: > On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: >> >>> From: Michael S. Tsirkin <mst@redhat.com> >>> Sent: Thursday, August 15, 2024 4:23 PM >>> >>> On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: >>>> That means, PCI HW needs to return suspend=0, until the device is not >>> suspended. >>>> In this example, the device cannot build special circuitry to answer >>> suspend=true within 50nsec, or in other words building special circuitry to >>> return suspend=false is too complex for the slow operation. >>>> If this understanding of burden is clear, >>>> >>>> The proposal is, can you please extend the interface such that, >>>> >>>> 1. driver writes suspend command. >>>> 2. driver reads suspend_status, and receives not_completed=(false). This is >>> the default value. >>>> 3. When the device completes suspend, it changes the polarity of >>> suspend_status=true. >>>> This has two main benefits: >>>> [A] This will enable software-based devices to write data to slow files and >>> does not have to force VM_EXITs. >>>> [B] It also enables hw based devices to not build special circuitry to answer >>> within 50nsec, which can get very complicated for tens or hundreds of PCI >>> PFs. >>> >>> I read this several times, and I don't understand what is proposed. >>> A special register for suspend/resume? Is this the difference? >>> >> Yes, a command register for suspend/resume operation. 
>> And device_status new bit that Lingshan defined returns the status of this operation. > > > Ugh, it's all quite messy IMHO. > We have 4 states: > - operational (resumed) > - suspend in progress > - suspended > - resume in progress > > What I'd do then is a two bit register. > To suspend: > - write suspend in progress > - re-read, waiting until suspended > To resume > - write resume in progress > - re-read, waiting until operational (resumed) > > How does this sound? This can work for sure, but is it a must? I mean, the driver knows how it is operating the device: when the device presents SUSPEND == 0, the driver knows whether the device is in its normal operational state or in the process of suspending. But if you think we should add a new register that applies to all device_status transitions, NOT only to SUSPEND, we can surely do that. Thanks > ^ permalink raw reply [flat|nested] 69+ messages in thread
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-30 2:37 ` Zhu Lingshan @ 2024-08-30 3:10 ` Parav Pandit 2024-09-03 8:51 ` Zhu Lingshan 0 siblings, 1 reply; 69+ messages in thread From: Parav Pandit @ 2024-08-30 3:10 UTC (permalink / raw) To: Zhu Lingshan, Michael S. Tsirkin Cc: cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Friday, August 30, 2024 8:07 AM > > > On 8/15/2024 11:07 PM, Michael S. Tsirkin wrote: > > On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: > >> > >>> From: Michael S. Tsirkin <mst@redhat.com> > >>> Sent: Thursday, August 15, 2024 4:23 PM > >>> > >>> On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: > >>>> That means, PCI HW needs to return suspend=0, until the device is > >>>> not > >>> suspended. > >>>> In this example, the device cannot build special circuitry to > >>>> answer > >>> suspend=true within 50nsec, or in other words building special > >>> circuitry to return suspend=false is too complex for the slow operation. > >>>> If this understanding of burden is clear, > >>>> > >>>> The proposal is, can you please extend the interface such that, > >>>> > >>>> 1. driver writes suspend command. > >>>> 2. driver reads suspend_status, and receives not_completed=(false). > >>>> This is > >>> the default value. > >>>> 3. When the device completes suspend, it changes the polarity of > >>> suspend_status=true. > >>>> This has two main benefits: > >>>> [A] This will enable software-based devices to write data to slow > >>>> files and > >>> does not have to force VM_EXITs. > >>>> [B] It also enables hw based devices to not build special circuitry > >>>> to answer > >>> within 50nsec, which can get very complicated for tens or hundreds > >>> of PCI PFs. > >>> > >>> I read this several times, and I don't understand what is proposed. > >>> A special register for suspend/resume? 
Is this the difference? > >>> > >> Yes, a command register for suspend/resume operation. > >> And device_status new bit that Lingshan defined returns the status of this > operation. > > > > > > Ugh, it's all quite messy IMHO. > > We have 4 states: > > - operational (resumed) > > - suspend in progress > > - suspended > > - resume in progress > > > > What I'd do then is a two bit register. > > To suspend: > > - write suspend in progress > > - re-read, waiting until suspended > > To resume > > - write resume in progress > > - re-read, waiting until operational (resumed) > > > > How does this sound? > This can work for sure. but is it a must? > I mean, the driver has its own knowledge of how it operate the device. > When device presents SUSPEND == 0, It know whether the device is in > normal operational state or in the progress of SUSPENDING. > > But if you think we should add a new register which applying for all > device_status transitions, NOT only for SUSPEND. we can surely do that. > > Thanks > > New register beyond suspend+resume can be useful too. For sure it will simplify the suspend + resume flow. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-30 3:10 ` Parav Pandit @ 2024-09-03 8:51 ` Zhu Lingshan 2024-09-03 8:55 ` Parav Pandit 2024-09-03 9:36 ` Michael S. Tsirkin 0 siblings, 2 replies; 69+ messages in thread From: Zhu Lingshan @ 2024-09-03 8:51 UTC (permalink / raw) To: Parav Pandit, Michael S. Tsirkin Cc: cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 8/30/2024 11:10 AM, Parav Pandit wrote: > >> From: Zhu Lingshan <lingshan.zhu@amd.com> >> Sent: Friday, August 30, 2024 8:07 AM >> >> >> On 8/15/2024 11:07 PM, Michael S. Tsirkin wrote: >>> On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: >>>>> From: Michael S. Tsirkin <mst@redhat.com> >>>>> Sent: Thursday, August 15, 2024 4:23 PM >>>>> >>>>> On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: >>>>>> That means, PCI HW needs to return suspend=0, until the device is >>>>>> not >>>>> suspended. >>>>>> In this example, the device cannot build special circuitry to >>>>>> answer >>>>> suspend=true within 50nsec, or in other words building special >>>>> circuitry to return suspend=false is too complex for the slow operation. >>>>>> If this understanding of burden is clear, >>>>>> >>>>>> The proposal is, can you please extend the interface such that, >>>>>> >>>>>> 1. driver writes suspend command. >>>>>> 2. driver reads suspend_status, and receives not_completed=(false). >>>>>> This is >>>>> the default value. >>>>>> 3. When the device completes suspend, it changes the polarity of >>>>> suspend_status=true. >>>>>> This has two main benefits: >>>>>> [A] This will enable software-based devices to write data to slow >>>>>> files and >>>>> does not have to force VM_EXITs. >>>>>> [B] It also enables hw based devices to not build special circuitry >>>>>> to answer >>>>> within 50nsec, which can get very complicated for tens or hundreds >>>>> of PCI PFs. 
>>>>> >>>>> I read this several times, and I don't understand what is proposed. >>>>> A special register for suspend/resume? Is this the difference? >>>>> >>>> Yes, a command register for suspend/resume operation. >>>> And device_status new bit that Lingshan defined returns the status of this >> operation. >>> >>> Ugh, it's all quite messy IMHO. >>> We have 4 states: >>> - operational (resumed) >>> - suspend in progress >>> - suspended >>> - resume in progress >>> >>> What I'd do then is a two bit register. >>> To suspend: >>> - write suspend in progress >>> - re-read, waiting until suspended >>> To resume >>> - write resume in progress >>> - re-read, waiting until operational (resumed) >>> >>> How does this sound? >> This can work for sure. but is it a must? >> I mean, the driver has its own knowledge of how it operate the device. >> When device presents SUSPEND == 0, It know whether the device is in >> normal operational state or in the progress of SUSPENDING. >> >> But if you think we should add a new register which applying for all >> device_status transitions, NOT only for SUSPEND. we can surely do that. >> >> Thanks > New register beyond suspend+resume can be useful too. > For sure it will simplify the suspend + resume flow. There should be no difference between how the driver handles SUSPEND and how it handles other device status transitions such as RESET. If we want to add a new register, then it should not be only for SUSPEND, but for all status transitions. We need Michael to confirm whether we should implement this new register so that it applies to all device_status transitions, in the common interest. Thanks > ^ permalink raw reply [flat|nested] 69+ messages in thread
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-03 8:51 ` Zhu Lingshan @ 2024-09-03 8:55 ` Parav Pandit 2024-09-03 9:36 ` Michael S. Tsirkin 1 sibling, 0 replies; 69+ messages in thread From: Parav Pandit @ 2024-09-03 8:55 UTC (permalink / raw) To: Zhu Lingshan, Michael S. Tsirkin Cc: cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens > From: Zhu Lingshan <lingshan.zhu@amd.com> > Sent: Tuesday, September 3, 2024 2:21 PM > > > On 8/30/2024 11:10 AM, Parav Pandit wrote: > > > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Friday, August 30, 2024 8:07 AM > >> > >> > >> On 8/15/2024 11:07 PM, Michael S. Tsirkin wrote: > >>> On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: > >>>>> From: Michael S. Tsirkin <mst@redhat.com> > >>>>> Sent: Thursday, August 15, 2024 4:23 PM > >>>>> > >>>>> On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: > >>>>>> That means, PCI HW needs to return suspend=0, until the device is > >>>>>> not > >>>>> suspended. > >>>>>> In this example, the device cannot build special circuitry to > >>>>>> answer > >>>>> suspend=true within 50nsec, or in other words building special > >>>>> circuitry to return suspend=false is too complex for the slow > operation. > >>>>>> If this understanding of burden is clear, > >>>>>> > >>>>>> The proposal is, can you please extend the interface such that, > >>>>>> > >>>>>> 1. driver writes suspend command. > >>>>>> 2. driver reads suspend_status, and receives not_completed=(false). > >>>>>> This is > >>>>> the default value. > >>>>>> 3. When the device completes suspend, it changes the polarity of > >>>>> suspend_status=true. > >>>>>> This has two main benefits: > >>>>>> [A] This will enable software-based devices to write data to slow > >>>>>> files and > >>>>> does not have to force VM_EXITs. 
> >>>>>> [B] It also enables hw based devices to not build special > >>>>>> circuitry to answer > >>>>> within 50nsec, which can get very complicated for tens or hundreds > >>>>> of PCI PFs. > >>>>> > >>>>> I read this several times, and I don't understand what is proposed. > >>>>> A special register for suspend/resume? Is this the difference? > >>>>> > >>>> Yes, a command register for suspend/resume operation. > >>>> And device_status new bit that Lingshan defined returns the status > >>>> of this > >> operation. > >>> > >>> Ugh, it's all quite messy IMHO. > >>> We have 4 states: > >>> - operational (resumed) > >>> - suspend in progress > >>> - suspended > >>> - resume in progress > >>> > >>> What I'd do then is a two bit register. > >>> To suspend: > >>> - write suspend in progress > >>> - re-read, waiting until suspended > >>> To resume > >>> - write resume in progress > >>> - re-read, waiting until operational (resumed) > >>> > >>> How does this sound? > >> This can work for sure. but is it a must? > >> I mean, the driver has its own knowledge of how it operate the device. > >> When device presents SUSPEND == 0, It know whether the device is in > >> normal operational state or in the progress of SUSPENDING. > >> > >> But if you think we should add a new register which applying for all > >> device_status transitions, NOT only for SUSPEND. we can surely do that. > >> > >> Thanks > > New register beyond suspend+resume can be useful too. > > For sure it will simplify the suspend + resume flow. > There should be no difference in how the driver handles SUSPEND and other > device status like RESET. > You are still missing the fundamental point, which I have explained many times: the new register is there to avoid special time-sensitive circuitry in the device (not in the driver). > If we want to add a new register, then it is not only for SUSPEND, but for all > status transitions.
We need Michael to confirm we should implement this > new register that apply to all device_status transitions for common interests. Sure. Look forward to his opinion. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-03 8:51 ` Zhu Lingshan 2024-09-03 8:55 ` Parav Pandit @ 2024-09-03 9:36 ` Michael S. Tsirkin 2024-09-05 7:27 ` Zhu Lingshan 1 sibling, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-03 9:36 UTC (permalink / raw) To: Zhu Lingshan Cc: Parav Pandit, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Tue, Sep 03, 2024 at 04:51:18PM +0800, Zhu Lingshan wrote: > > > On 8/30/2024 11:10 AM, Parav Pandit wrote: > > > >> From: Zhu Lingshan <lingshan.zhu@amd.com> > >> Sent: Friday, August 30, 2024 8:07 AM > >> > >> > >> On 8/15/2024 11:07 PM, Michael S. Tsirkin wrote: > >>> On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: > >>>>> From: Michael S. Tsirkin <mst@redhat.com> > >>>>> Sent: Thursday, August 15, 2024 4:23 PM > >>>>> > >>>>> On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: > >>>>>> That means, PCI HW needs to return suspend=0, until the device is > >>>>>> not > >>>>> suspended. > >>>>>> In this example, the device cannot build special circuitry to > >>>>>> answer > >>>>> suspend=true within 50nsec, or in other words building special > >>>>> circuitry to return suspend=false is too complex for the slow operation. > >>>>>> If this understanding of burden is clear, > >>>>>> > >>>>>> The proposal is, can you please extend the interface such that, > >>>>>> > >>>>>> 1. driver writes suspend command. > >>>>>> 2. driver reads suspend_status, and receives not_completed=(false). > >>>>>> This is > >>>>> the default value. > >>>>>> 3. When the device completes suspend, it changes the polarity of > >>>>> suspend_status=true. > >>>>>> This has two main benefits: > >>>>>> [A] This will enable software-based devices to write data to slow > >>>>>> files and > >>>>> does not have to force VM_EXITs. 
> >>>>>> [B] It also enables hw based devices to not build special circuitry > >>>>>> to answer > >>>>> within 50nsec, which can get very complicated for tens or hundreds > >>>>> of PCI PFs. > >>>>> > >>>>> I read this several times, and I don't understand what is proposed. > >>>>> A special register for suspend/resume? Is this the difference? > >>>>> > >>>> Yes, a command register for suspend/resume operation. > >>>> And device_status new bit that Lingshan defined returns the status of this > >> operation. > >>> > >>> Ugh, it's all quite messy IMHO. > >>> We have 4 states: > >>> - operational (resumed) > >>> - suspend in progress > >>> - suspended > >>> - resume in progress > >>> > >>> What I'd do then is a two bit register. > >>> To suspend: > >>> - write suspend in progress > >>> - re-read, waiting until suspended > >>> To resume > >>> - write resume in progress > >>> - re-read, waiting until operational (resumed) > >>> > >>> How does this sound? > >> This can work for sure. but is it a must? > >> I mean, the driver has its own knowledge of how it operate the device. > >> When device presents SUSPEND == 0, It know whether the device is in > >> normal operational state or in the progress of SUSPENDING. > >> > >> But if you think we should add a new register which applying for all > >> device_status transitions, NOT only for SUSPEND. we can surely do that. > >> > >> Thanks > > New register beyond suspend+resume can be useful too. > > For sure it will simplify the suspend + resume flow. > There should be no difference in how the driver handles SUSPEND > and other device status like RESET. > > If we want to add a new register, then it is not only for SUSPEND, > but for all status transitions. > We need Michael to confirm we should implement this new register > that apply to all device_status transitions for common interests. > > Thanks There is a difference between SUSPEND and RESET. RESET is not a state. Thus a single bit is enough to signal "reset in progress". 
I don't really see any other transitions that can take a long time.
We can start with just suspend, and extend it later if appropriate.

^ permalink raw reply [flat|nested] 69+ messages in thread
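The two-bit register with four states suggested earlier in this thread can be sketched in C roughly as follows. This is only an illustration: the numeric encodings, the names, and the simulated device (which completes a transition after a fixed number of reads) are assumptions, since the thread names the states but fixes no values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encodings; the thread names the four states but not their values. */
enum suspend_reg {
    REG_OPERATIONAL         = 0, /* operational (resumed) */
    REG_SUSPEND_IN_PROGRESS = 1,
    REG_SUSPENDED           = 2,
    REG_RESUME_IN_PROGRESS  = 3,
};

/* Simulated device: a transition completes after `latency` further reads. */
struct sim_dev {
    uint8_t  reg;
    unsigned latency;
    unsigned remaining;
};

static void dev_write(struct sim_dev *d, uint8_t val)
{
    d->reg = val;
    d->remaining = d->latency;
}

static uint8_t dev_read(struct sim_dev *d)
{
    bool busy = d->reg == REG_SUSPEND_IN_PROGRESS ||
                d->reg == REG_RESUME_IN_PROGRESS;

    if (busy && d->remaining > 0) {
        d->remaining--;
    } else if (busy) {
        d->reg = (d->reg == REG_SUSPEND_IN_PROGRESS) ? REG_SUSPENDED
                                                     : REG_OPERATIONAL;
    }
    return d->reg;
}

/* Driver side: write the in-progress value, then re-read until the target. */
static bool drv_transition(struct sim_dev *d, uint8_t in_progress,
                           uint8_t target, unsigned max_polls)
{
    dev_write(d, in_progress);
    for (unsigned i = 0; i < max_polls; i++) {
        if (dev_read(d) == target)
            return true;
    }
    return false; /* still in progress after max_polls reads */
}

static bool drv_suspend(struct sim_dev *d, unsigned max_polls)
{
    return drv_transition(d, REG_SUSPEND_IN_PROGRESS, REG_SUSPENDED, max_polls);
}

static bool drv_resume(struct sim_dev *d, unsigned max_polls)
{
    return drv_transition(d, REG_RESUME_IN_PROGRESS, REG_OPERATIONAL, max_polls);
}
```

With this layout every value read back names exactly one of the four states, so the driver does not have to remember what it last wrote in order to interpret the register.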
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-03 9:36 ` Michael S. Tsirkin @ 2024-09-05 7:27 ` Zhu Lingshan 2024-09-24 23:07 ` Michael S. Tsirkin 0 siblings, 1 reply; 69+ messages in thread From: Zhu Lingshan @ 2024-09-05 7:27 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Parav Pandit, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On 9/3/2024 5:36 PM, Michael S. Tsirkin wrote: > On Tue, Sep 03, 2024 at 04:51:18PM +0800, Zhu Lingshan wrote: >> >> On 8/30/2024 11:10 AM, Parav Pandit wrote: >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> >>>> Sent: Friday, August 30, 2024 8:07 AM >>>> >>>> >>>> On 8/15/2024 11:07 PM, Michael S. Tsirkin wrote: >>>>> On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: >>>>>>> From: Michael S. Tsirkin <mst@redhat.com> >>>>>>> Sent: Thursday, August 15, 2024 4:23 PM >>>>>>> >>>>>>> On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: >>>>>>>> That means, PCI HW needs to return suspend=0, until the device is >>>>>>>> not >>>>>>> suspended. >>>>>>>> In this example, the device cannot build special circuitry to >>>>>>>> answer >>>>>>> suspend=true within 50nsec, or in other words building special >>>>>>> circuitry to return suspend=false is too complex for the slow operation. >>>>>>>> If this understanding of burden is clear, >>>>>>>> >>>>>>>> The proposal is, can you please extend the interface such that, >>>>>>>> >>>>>>>> 1. driver writes suspend command. >>>>>>>> 2. driver reads suspend_status, and receives not_completed=(false). >>>>>>>> This is >>>>>>> the default value. >>>>>>>> 3. When the device completes suspend, it changes the polarity of >>>>>>> suspend_status=true. >>>>>>>> This has two main benefits: >>>>>>>> [A] This will enable software-based devices to write data to slow >>>>>>>> files and >>>>>>> does not have to force VM_EXITs. 
>>>>>>>> [B] It also enables hw based devices to not build special circuitry >>>>>>>> to answer >>>>>>> within 50nsec, which can get very complicated for tens or hundreds >>>>>>> of PCI PFs. >>>>>>> >>>>>>> I read this several times, and I don't understand what is proposed. >>>>>>> A special register for suspend/resume? Is this the difference? >>>>>>> >>>>>> Yes, a command register for suspend/resume operation. >>>>>> And device_status new bit that Lingshan defined returns the status of this >>>> operation. >>>>> Ugh, it's all quite messy IMHO. >>>>> We have 4 states: >>>>> - operational (resumed) >>>>> - suspend in progress >>>>> - suspended >>>>> - resume in progress >>>>> >>>>> What I'd do then is a two bit register. >>>>> To suspend: >>>>> - write suspend in progress >>>>> - re-read, waiting until suspended >>>>> To resume >>>>> - write resume in progress >>>>> - re-read, waiting until operational (resumed) >>>>> >>>>> How does this sound? >>>> This can work for sure. but is it a must? >>>> I mean, the driver has its own knowledge of how it operate the device. >>>> When device presents SUSPEND == 0, It know whether the device is in >>>> normal operational state or in the progress of SUSPENDING. >>>> >>>> But if you think we should add a new register which applying for all >>>> device_status transitions, NOT only for SUSPEND. we can surely do that. >>>> >>>> Thanks >>> New register beyond suspend+resume can be useful too. >>> For sure it will simplify the suspend + resume flow. >> There should be no difference in how the driver handles SUSPEND >> and other device status like RESET. >> >> If we want to add a new register, then it is not only for SUSPEND, >> but for all status transitions. >> We need Michael to confirm we should implement this new register >> that apply to all device_status transitions for common interests. >> >> Thanks > There is a difference between SUSPEND and RESET. > RESET is not a state. 
> Thus a single bit is enough to
> signal "reset in progress".
>
> I don't really see any other transitions that can take
> a long time. We can start with just suspend, and
> extend it later if appropriate.

Yes, that is what I mean, the register should not only work for SUSPEND.
To be more specific, the definition should be:

// The device_status is still in a status transition
#define DEVICE_STATUS_TRANSITION_IN_PROGRESS 0
// device status transition is done
#define DEVICE_STATUS_TRANSITION_DONE 1

They should not be defined as:
#define DEVICE_STATUS_SUSPEND_IN_PROGRESS 0

Thanks

^ permalink raw reply [flat|nested] 69+ messages in thread
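A generic transition register along the lines of the two defines above might be used as sketched below. The two DEVICE_STATUS_TRANSITION_* values are taken from the message; everything else (the register layout, the helper names, and the simulated device that needs a fixed number of reads per transition) is hypothetical and only meant to show one wait loop serving every status change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as proposed in the message above. */
#define DEVICE_STATUS_TRANSITION_IN_PROGRESS 0
#define DEVICE_STATUS_TRANSITION_DONE        1

/* DRIVER_OK is from the virtio spec; SUSPEND (16) is from the patch under review. */
#define STATUS_DRIVER_OK 4
#define STATUS_SUSPEND   16

/* Simulated device; the register layout is hypothetical. */
struct sim_dev {
    uint8_t  status;     /* device_status as presented to the driver */
    uint8_t  requested;  /* last value written by the driver */
    uint8_t  transition; /* the proposed transition register */
    unsigned latency;    /* reads needed before a transition completes */
    unsigned remaining;
};

static void sim_init(struct sim_dev *d, unsigned latency)
{
    d->status = 0;
    d->requested = 0;
    d->transition = DEVICE_STATUS_TRANSITION_DONE;
    d->latency = latency;
    d->remaining = 0;
}

static void write_status(struct sim_dev *d, uint8_t val)
{
    d->requested = val;
    d->transition = DEVICE_STATUS_TRANSITION_IN_PROGRESS;
    d->remaining = d->latency;
}

static uint8_t read_transition(struct sim_dev *d)
{
    if (d->transition == DEVICE_STATUS_TRANSITION_IN_PROGRESS) {
        if (d->remaining > 0) {
            d->remaining--;
        } else {
            d->status = d->requested; /* new status becomes visible */
            d->transition = DEVICE_STATUS_TRANSITION_DONE;
        }
    }
    return d->transition;
}

/* One wait loop for any status change, not only SUSPEND. */
static bool set_status_and_wait(struct sim_dev *d, uint8_t val,
                                unsigned max_polls)
{
    write_status(d, val);
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_transition(d) == DEVICE_STATUS_TRANSITION_DONE)
            return d->status == val;
    }
    return false; /* transition did not finish within max_polls reads */
}
```

The status values in this sketch are simplified: a real driver would also carry ACKNOWLEDGE, DRIVER and FEATURES_OK in the value it writes.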
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-09-05 7:27 ` Zhu Lingshan @ 2024-09-24 23:07 ` Michael S. Tsirkin 0 siblings, 0 replies; 69+ messages in thread From: Michael S. Tsirkin @ 2024-09-24 23:07 UTC (permalink / raw) To: Zhu Lingshan Cc: Parav Pandit, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens On Thu, Sep 05, 2024 at 03:27:25PM +0800, Zhu Lingshan wrote: > > > On 9/3/2024 5:36 PM, Michael S. Tsirkin wrote: > > On Tue, Sep 03, 2024 at 04:51:18PM +0800, Zhu Lingshan wrote: > >> > >> On 8/30/2024 11:10 AM, Parav Pandit wrote: > >>>> From: Zhu Lingshan <lingshan.zhu@amd.com> > >>>> Sent: Friday, August 30, 2024 8:07 AM > >>>> > >>>> > >>>> On 8/15/2024 11:07 PM, Michael S. Tsirkin wrote: > >>>>> On Thu, Aug 15, 2024 at 10:59:45AM +0000, Parav Pandit wrote: > >>>>>>> From: Michael S. Tsirkin <mst@redhat.com> > >>>>>>> Sent: Thursday, August 15, 2024 4:23 PM > >>>>>>> > >>>>>>> On Tue, Aug 13, 2024 at 06:55:04AM +0000, Parav Pandit wrote: > >>>>>>>> That means, PCI HW needs to return suspend=0, until the device is > >>>>>>>> not > >>>>>>> suspended. > >>>>>>>> In this example, the device cannot build special circuitry to > >>>>>>>> answer > >>>>>>> suspend=true within 50nsec, or in other words building special > >>>>>>> circuitry to return suspend=false is too complex for the slow operation. > >>>>>>>> If this understanding of burden is clear, > >>>>>>>> > >>>>>>>> The proposal is, can you please extend the interface such that, > >>>>>>>> > >>>>>>>> 1. driver writes suspend command. > >>>>>>>> 2. driver reads suspend_status, and receives not_completed=(false). > >>>>>>>> This is > >>>>>>> the default value. > >>>>>>>> 3. When the device completes suspend, it changes the polarity of > >>>>>>> suspend_status=true. 
> >>>>>>>> This has two main benefits: > >>>>>>>> [A] This will enable software-based devices to write data to slow > >>>>>>>> files and > >>>>>>> does not have to force VM_EXITs. > >>>>>>>> [B] It also enables hw based devices to not build special circuitry > >>>>>>>> to answer > >>>>>>> within 50nsec, which can get very complicated for tens or hundreds > >>>>>>> of PCI PFs. > >>>>>>> > >>>>>>> I read this several times, and I don't understand what is proposed. > >>>>>>> A special register for suspend/resume? Is this the difference? > >>>>>>> > >>>>>> Yes, a command register for suspend/resume operation. > >>>>>> And device_status new bit that Lingshan defined returns the status of this > >>>> operation. > >>>>> Ugh, it's all quite messy IMHO. > >>>>> We have 4 states: > >>>>> - operational (resumed) > >>>>> - suspend in progress > >>>>> - suspended > >>>>> - resume in progress > >>>>> > >>>>> What I'd do then is a two bit register. > >>>>> To suspend: > >>>>> - write suspend in progress > >>>>> - re-read, waiting until suspended > >>>>> To resume > >>>>> - write resume in progress > >>>>> - re-read, waiting until operational (resumed) > >>>>> > >>>>> How does this sound? > >>>> This can work for sure. but is it a must? > >>>> I mean, the driver has its own knowledge of how it operate the device. > >>>> When device presents SUSPEND == 0, It know whether the device is in > >>>> normal operational state or in the progress of SUSPENDING. > >>>> > >>>> But if you think we should add a new register which applying for all > >>>> device_status transitions, NOT only for SUSPEND. we can surely do that. > >>>> > >>>> Thanks > >>> New register beyond suspend+resume can be useful too. > >>> For sure it will simplify the suspend + resume flow. > >> There should be no difference in how the driver handles SUSPEND > >> and other device status like RESET. > >> > >> If we want to add a new register, then it is not only for SUSPEND, > >> but for all status transitions. 
> >> We need Michael to confirm we should implement this new register
> >> that apply to all device_status transitions for common interests.
> >>
> >> Thanks
> > There is a difference between SUSPEND and RESET.
> > RESET is not a state. Thus a single bit is enough to
> > signal "reset in progress".
> >
> > I don't really see any other transitions that can take
> > a long time. We can start with just suspend, and
> > extend it later if appropriate.
> Yes, that is what I mean, the register should not only work for SUSPEND.
> To be more specific, the definition should be:
>
> // The device_status is still in a status transition
> #define DEVICE_STATUS_TRANSITION_IN_PROGRESS 0
> // device status transition is done
> #define DEVICE_STATUS_TRANSITION_DONE 1
>
> They should not be defined as:
> #define DEVICE_STATUS_SUSPEND_IN_PROGRESS 0
>
> Thanks

in case we add more commands down the line? ok, sure.

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-08-13 4:42 ` Parav Pandit
  2024-08-13 5:44 ` Zhu Lingshan
@ 2024-08-13 7:51 ` Michael S. Tsirkin
  2024-08-13 7:58 ` Parav Pandit
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2024-08-13 7:51 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Zhu Lingshan, Eugenio Pérez, David Stevens

On Tue, Aug 13, 2024 at 04:42:06AM +0000, Parav Pandit wrote:
> a. suspending a device is non frequent operation (in order of N operations/sec, where N is roughly in range of 10 or 100) per device?
> b. A software-based device may not always want to force VM_EXIT on read and write on the device_status register?

I see where this is going. Parav, if you want a device with as little
memory on it as possible, using DMA as much as possible, what you want
is to develop an alternative transport that will let you control status
without using memory accesses.

This will also, with time, allow doing this for existing config space.
We aren't blocking all spec development until you do, and the tradeoff
between easy-to-debug, robust MMIO control and complex (but maybe
cheaper for some hardware vendors) DMA-based access is up to the device.

--
MST

^ permalink raw reply [flat|nested] 69+ messages in thread
* RE: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-08-13 7:51 ` Michael S. Tsirkin
@ 2024-08-13 7:58 ` Parav Pandit
  2024-08-13 8:03 ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Parav Pandit @ 2024-08-13 7:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, August 13, 2024 1:22 PM
>
Removing the intel email id again to avoid constant bounce messages.

> On Tue, Aug 13, 2024 at 04:42:06AM +0000, Parav Pandit wrote:
> > a. suspending a device is non frequent operation (in order of N
> operations/sec, where N is roughly in range of 10 or 100) per device?
> > b. A software-based device may not always want to force VM_EXIT on read
> and write on the device_status register?
>
> I just see where this is going. Parav, if you want a device with as little as
> possible memory on it, using DMA as much as possible, what you want is
> develop an alternative transport that will let you control status without using
> memory accesses.

No, there is no need for DMA here.
This is not about using less memory either.
As I explained, the challenge is implementing the suspend bit so that it operates at sub-microsecond granularity.

> This will also, with time, allow doing this for existing config space.
> We aren't blocking all spec development until you do, and the tradeoff of
> easy to debug, robust MMIO control versus complex but maybe cheaper for
> some hardware vendors DMA based access is up to the device.
>
No one is blocking the spec. Not sure why you say that.
The idea is to have suspend as a command register, with its status reflected in the existing device_status.

Last time you recommended making only design suggestions, without going into the exact details of the bits, and letting the author fix them.
If you prefer to discuss the bit-level details, I am fine with going into those details.

> --
> MST

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-08-13 7:58 ` Parav Pandit
@ 2024-08-13 8:03 ` Michael S. Tsirkin
  0 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2024-08-13 8:03 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu Lingshan, cohuck@redhat.com, jasowang@redhat.com, virtio-comment@lists.linux.dev, Eugenio Pérez, David Stevens

On Tue, Aug 13, 2024 at 07:58:32AM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, August 13, 2024 1:22 PM
> >
> Removing intel email id again to avoid constant bouncing messages.

Thanks.

> > On Tue, Aug 13, 2024 at 04:42:06AM +0000, Parav Pandit wrote:
> > > a. suspending a device is non frequent operation (in order of N
> > operations/sec, where N is roughly in range of 10 or 100) per device?
> > > b. A software-based device may not always want to force VM_EXIT on read
> > and write on the device_status register?
> >
> > I just see where this is going. Parav, if you want a device with as little as
> > possible memory on it, using DMA as much as possible, what you want is
> > develop an alternative transport that will let you control status without using
> > memory accesses.
> No, there is no need of DMA here.
> This is not about less memory either.

OK, sorry. I misunderstood.

> As I explained, the challenge is implementing suspend bits to operate at sub-microsecond level granularity.

Yes, I think the proposal is too vague on how all this happens.
Replied separately.

> > This will also, with time, allow doing this for existing config space.
> > We aren't blocking all spec development until you do, and the tradeoff of
> > easy to debug, robust MMIO control versus complex but maybe cheaper for
> > some hardware vendors DMA based access is up to the device.
> >
> No one is blocking the spec etc. Not sure why you say that.
> The idea is to have suspend as command register, and its status to reflect in the existing device_status.
>
> Last time you recommended to only make design suggestion and not go down the exact details of the bit and let author to fix it.
>
> If you prefer to discuss the bit level details, it looks good to me to go to that details.

Sorry, reacted too early.

> > --
> > MST

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-01 11:35 [PATCH V7 v7] virtio: introduce SUSPEND bit in device status Zhu Lingshan 2024-08-13 4:42 ` Parav Pandit @ 2024-08-13 8:01 ` Michael S. Tsirkin 2024-08-15 9:12 ` Zhu Lingshan 1 sibling, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-08-13 8:01 UTC (permalink / raw) To: Zhu Lingshan Cc: cohuck, jasowang, virtio-comment, Eugenio Pérez, David Stevens On Thu, Aug 01, 2024 at 07:35:16PM +0800, Zhu Lingshan wrote: > +\drivernormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend} > + > +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated. Actually, it has no effect before DRIVER_OK, no? I would forbid that then. > +Once the driver sets SUSPEND to \field{device status} of the device: > +\begin{itemize} > +\item The driver MUST re-read \field{device status} to verify whether the SUSPEND bit is set. This is still vague, I commented on this several times. I think what you mean is that until it reads status with SUSPEND as 1, it does not consider SUSPEND set. But, what happens if driver clears SUSPEND before it reads it as set? It would seem that should cancel suspend (which is a useful thing to support)? But this creates a problem as it breaks read/modify/write that some hypervisors assumed to be safe. I guess we need an extra SUSPEND_IN_PROGRESS bit then? A little too much for status, at this stage - maybe we need an extra register for this. > +\item The driver MUST NOT make any more buffers available to the device. > +\item The driver MUST NOT access any virtqueues or send notifications for any virtqueues. > +\item The driver MUST NOT access Device Configuration Space. 
> +\end{itemize} > + > +\devicenormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend} > + > +The device MUST ignore SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated. > + > +The device MUST ignore all access to its Configuration Space while > +suspended, except for \field{device status} if it is part of the Configuration Space. > + > +A device MUST NOT send any notifications for any virtqeuues, > +access any virtqueues, or modify any fields in > +its Configuration Space while suspended. > + > +If changes occur in the Configuration Space while the SUSPEND bit is set, > +the device MUST NOT send any configuration change notifications. > +Instead, the device MUST send the notification after the SUSPEND bit has been cleared. > + > +When the driver sets SUSPEND, the device MUST either suspend itself or set DEVICE_NEEDS_RESET if failed to suspend. > + > +If SUSPEND is set in \field{device status}, when the driver clears SUSPEND, > +the device MUST either resume normal operation or set DEVICE_NEEDS_RESET. > + > +When the driver sets SUSPEND, > +the device SHOULD perform the following actions before presenting that the SUSPEND bit is set to 1 in the \field{device status}: what does "before presenting" mean? does it return SUSPEND as 0 after driver wrote 1 there and before it completed these actions? > + > +\begin{itemize} > +\item Stop processing more buffers of any virtqueues > +\item Wait until all buffers that are being processed have been used. > +\item Send used buffer notifications to the driver. > +\end{itemize} > + > \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options} > > Virtio can use various different buses, thus the standard is split > @@ -872,6 +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for > handling features reserved for future use. 
> > + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can > + trigger suspending the device via the SUSPEND flag > + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}. > + > \end{description} > > \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits} > -- > 2.45.2 ^ permalink raw reply [flat|nested] 69+ messages in thread
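The driver-side flow in the quoted patch text (check the preconditions, set SUSPEND, re-read device status until SUSPEND reads as 1 or DEVICE_NEEDS_RESET appears) could look roughly like the following sketch. The status bit values come from the spec and the patch; the result enum, the helper names, and the simulated device that presents SUSPEND only after a number of reads are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_FEATURES_OK 8  /* virtio spec */
#define STATUS_SUSPEND     16 /* patch under review */
#define STATUS_NEEDS_RESET 64 /* virtio spec */

enum suspend_result {
    SUSPEND_DONE,
    SUSPEND_NOT_ALLOWED, /* driver-side preconditions not met */
    SUSPEND_FAILED,      /* device set DEVICE_NEEDS_RESET */
    SUSPEND_PENDING,     /* SUSPEND still reads 0 after max_polls reads */
};

/* Simulated device (hypothetical): presents SUSPEND only after `delay` reads. */
struct sim_dev {
    uint8_t  status;
    bool     suspend_requested;
    bool     fail;
    unsigned delay;
};

static void write_status(struct sim_dev *d, uint8_t val)
{
    /* The device does not present SUSPEND until it has finished suspending. */
    d->suspend_requested = (val & STATUS_SUSPEND) != 0;
    d->status = (uint8_t)(val & ~STATUS_SUSPEND);
}

static uint8_t read_status(struct sim_dev *d)
{
    if (d->suspend_requested) {
        if (d->delay > 0) {
            d->delay--;
        } else {
            d->status |= d->fail ? STATUS_NEEDS_RESET : STATUS_SUSPEND;
            d->suspend_requested = false;
        }
    }
    return d->status;
}

static enum suspend_result drv_suspend(struct sim_dev *d,
                                       bool feature_negotiated,
                                       unsigned max_polls)
{
    uint8_t status = read_status(d);

    /* The driver MUST NOT set SUSPEND without FEATURES_OK and VIRTIO_F_SUSPEND. */
    if (!feature_negotiated || !(status & STATUS_FEATURES_OK))
        return SUSPEND_NOT_ALLOWED;

    write_status(d, status | STATUS_SUSPEND);
    for (unsigned i = 0; i < max_polls; i++) {
        status = read_status(d);
        if (status & STATUS_NEEDS_RESET)
            return SUSPEND_FAILED;
        if (status & STATUS_SUSPEND)
            return SUSPEND_DONE;
    }
    return SUSPEND_PENDING;
}
```

Note that SUSPEND_PENDING is something the driver knows only from its own bookkeeping: the status value it reads back in that case is the same one a device that was never asked to suspend would present, which is the ambiguity raised in the review above.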
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
  2024-08-13 8:01 ` Michael S. Tsirkin
@ 2024-08-15 9:12 ` Zhu Lingshan
  2024-08-15 10:50 ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Zhu Lingshan @ 2024-08-15 9:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: cohuck, jasowang, virtio-comment, Eugenio Pérez, David Stevens

On 8/13/2024 4:01 PM, Michael S. Tsirkin wrote:
> On Thu, Aug 01, 2024 at 07:35:16PM +0800, Zhu Lingshan wrote:
>> +\drivernormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend}
>> +
>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated.
> Actually, it has no effect before DRIVER_OK, no? I would forbid that
> then.

There can be two cases:
1) debugging the device before DRIVER_OK
2) VM migration right after boot, before DRIVER_OK is set, so the device needs to be migrated before DRIVER_OK

>
>> +Once the driver sets SUSPEND to \field{device status} of the device:
>> +\begin{itemize}
>> +\item The driver MUST re-read \field{device status} to verify whether the SUSPEND bit is set.
>
> This is still vague, I commented on this several times.
> I think what you mean is that until it reads status with
> SUSPEND as 1, it does not consider SUSPEND set.
>
>
> But, what happens if driver clears SUSPEND before it reads it as set?
> It would seem that should cancel suspend (which is a useful thing to
> support)? But this creates a problem as it breaks read/modify/write that
> some hypervisors assumed to be safe. I guess we need an extra
> SUSPEND_IN_PROGRESS bit then? A little too much for status, at this
> stage - maybe we need an extra register for this.

If the driver reads the SUSPEND bit as 0, then from the knowledge of its own operations
it knows the device is either a) in normal operation or b) still suspending.

Cancelling a SUSPEND right after setting SUSPEND is actually a RESUME.
Ideally the driver waits until SUSPEND == 1 and then clears it.
But drivers vary and not all of them will behave as we expect,
so there is a corner case: after clearing the SUSPEND bit, how does the driver know
whether the device has resumed or has simply not finished suspending?

I think adding a new register can surely fix this issue; however, as you
said, adding a new status bit is overkill, and where to place the new register
is a new question.

So, to make things easier, how about suspending and resuming the device sequentially:

If the driver has suspended the device by setting the SUSPEND bit to 1, it MUST NOT
clear the SUSPEND bit before the device presents the SUSPEND bit as 1.

>
>
>> +\item The driver MUST NOT make any more buffers available to the device.
>> +\item The driver MUST NOT access any virtqueues or send notifications for any virtqueues.
>> +\item The driver MUST NOT access Device Configuration Space.
>> +\end{itemize}
>> +
>> +\devicenormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend}
>> +
>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated.
>> +
>> +The device MUST ignore all access to its Configuration Space while
>> +suspended, except for \field{device status} if it is part of the Configuration Space.
>> +
>> +A device MUST NOT send any notifications for any virtqeuues,
>> +access any virtqueues, or modify any fields in
>> +its Configuration Space while suspended.
>> +
>> +If changes occur in the Configuration Space while the SUSPEND bit is set,
>> +the device MUST NOT send any configuration change notifications.
>> +Instead, the device MUST send the notification after the SUSPEND bit has been cleared.
>> +
>> +When the driver sets SUSPEND, the device MUST either suspend itself or set DEVICE_NEEDS_RESET if failed to suspend.
>> + >> +If SUSPEND is set in \field{device status}, when the driver clears SUSPEND, >> +the device MUST either resume normal operation or set DEVICE_NEEDS_RESET. >> + >> +When the driver sets SUSPEND, >> +the device SHOULD perform the following actions before presenting that the SUSPEND bit is set to 1 in the \field{device status}: > what does "before presenting" mean? does it return SUSPEND as 0 > after driver wrote 1 there and before it completed these > actions? Yes, it should return 0 if the suspending operation is still in progress, It should not change SUSPEND bit value before finish. Do you suggest we add: The device MUST present SUSPEND bit set to 1 in \field{device status} once it has been suspended It says: presenting that the SUSPEND bit is set to 1. Not a native speaker, but this looks clear to me. > >> + >> +\begin{itemize} >> +\item Stop processing more buffers of any virtqueues >> +\item Wait until all buffers that are being processed have been used. >> +\item Send used buffer notifications to the driver. >> +\end{itemize} >> + >> \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options} >> >> Virtio can use various different buses, thus the standard is split >> @@ -872,6 +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} >> \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for >> handling features reserved for future use. >> >> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can >> + trigger suspending the device via the SUSPEND flag >> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}. >> + >> \end{description} >> >> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits} >> -- >> 2.45.2 ^ permalink raw reply [flat|nested] 69+ messages in thread
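The device-side behaviour discussed above (keep presenting SUSPEND as 0 while in-flight buffers are drained, present it as 1 only afterwards, and hold configuration change notifications until SUSPEND is cleared) can be modelled with a small sketch. The structure, field names and step function are hypothetical, and the FEATURES_OK and VIRTIO_F_SUSPEND checks are left out to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUSPEND 16 /* patch under review */

/* Hypothetical device model; field names are illustrative only. */
struct dev_model {
    uint8_t  status;             /* value presented to the driver */
    bool     suspending;         /* driver wrote SUSPEND, not yet presented */
    unsigned inflight;           /* buffers currently being processed */
    bool     cfg_change_held;    /* config change seen while suspended */
    unsigned cfg_notifications;  /* config change notifications sent */
    unsigned used_notifications; /* used buffer notifications sent */
};

/* The driver writes device status. */
static void dev_status_write(struct dev_model *d, uint8_t val)
{
    bool presented = (d->status & STATUS_SUSPEND) != 0;

    if (val & STATUS_SUSPEND) {
        /* Keep presenting 0 until in-flight buffers are drained. */
        if (!presented)
            d->suspending = true;
    } else {
        d->suspending = false;
        presented = false;
        if (d->cfg_change_held) {
            /* Deliver the notification held back during suspend. */
            d->cfg_change_held = false;
            d->cfg_notifications++;
        }
    }
    d->status = (uint8_t)((val & ~STATUS_SUSPEND) |
                          (presented ? STATUS_SUSPEND : 0));
}

/* One unit of device work: finish one buffer, or complete the suspend. */
static void dev_step(struct dev_model *d)
{
    if (d->inflight > 0) {
        d->inflight--;           /* buffer has been used */
        d->used_notifications++; /* used buffer notification */
        return;
    }
    if (d->suspending) {
        d->suspending = false;
        d->status |= STATUS_SUSPEND; /* only now present SUSPEND as 1 */
    }
}

/* A configuration change happens inside the device. */
static void dev_config_changed(struct dev_model *d)
{
    if (d->suspending || (d->status & STATUS_SUSPEND))
        d->cfg_change_held = true; /* no notification while suspended */
    else
        d->cfg_notifications++;
}
```

In this model a driver that polls device status sees SUSPEND flip to 1 exactly once all in-flight buffers have been used, which is one possible reading of "before presenting" in the quoted text.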
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-15 9:12 ` Zhu Lingshan @ 2024-08-15 10:50 ` Michael S. Tsirkin 2024-08-30 2:20 ` Zhu Lingshan 0 siblings, 1 reply; 69+ messages in thread From: Michael S. Tsirkin @ 2024-08-15 10:50 UTC (permalink / raw) To: Zhu Lingshan Cc: cohuck, jasowang, virtio-comment, Eugenio Pérez, David Stevens On Thu, Aug 15, 2024 at 05:12:23PM +0800, Zhu Lingshan wrote: > > > On 8/13/2024 4:01 PM, Michael S. Tsirkin wrote: > > On Thu, Aug 01, 2024 at 07:35:16PM +0800, Zhu Lingshan wrote: > >> +\drivernormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend} > >> + > >> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated. > > Actually, it has no effect before DRIVER_OK, no? I would forbid that > > then. > There can be two cases: > 1) debugging the device before DRIVER_OK > 2) VM migration right after boot, and before DRIVER_OK. So it needs to migrate the device before DRIVER_OK Sorry I don't understand. The only effect of SUSPEND as you specified it is not consuming buffers. Buffers are not consumed before DRIVER_OK, thus there is no point to set SUSPEND before DRIVER_OK. > > > >> +Once the driver sets SUSPEND to \field{device status} of the device: > >> +\begin{itemize} > >> +\item The driver MUST re-read \field{device status} to verify whether the SUSPEND bit is set. > > > > This is still vague, I commented on this several times. > > I think what you mean is that until it reads status with > > SUSPEND as 1, it does not consider SUSPEND set. > > > > > > But, what happens if driver clears SUSPEND before it reads it as set? > > It would seem that should cancel suspend (which is a useful thing to > > support)? But this creates a problem as it breaks read/modify/write that > > some hypervisors assumed to be safe. I guess we need an extra > > SUSPEND_IN_PROGRESS bit then? 
A little too much for status, at this > > stage - maybe we need an extra register for this. > if the driver reads SUSPEND bit == 0, with the knowledge of its own operations, > it knows the device is either in a) normal operation 2). suspending in progress > > Cancelling a SUSPEND right after setting SUSPEND is actually a RESUME. > The ideal case is the driver waits until SUSPEND == 1, then clear it. > But there are various drivers, not all drivers follow our expectation, > so there can be a corner case that how the driver know whether > the device is resumed after clearing the SUSPEND bit or not finish suspending. > > I think adding a new resister can surely fix this issue, however as you > said, add a new status bit is overkill and where to place this new register > is a new question. Place it after existing registers. > > So, to make things easier, how about suspend & resume the device sequentially: > > If the driver has suspended the device by setting the SUSPEND bit to 1, it MUST NOT > clear the SUSPEND bit before the device presenting the SUSPEND bit set as 1. I'm worried that it's fragile - one can no longer deduce the device state by reading the status. > > > > > >> +\item The driver MUST NOT make any more buffers available to the device. > >> +\item The driver MUST NOT access any virtqueues or send notifications for any virtqueues. > >> +\item The driver MUST NOT access Device Configuration Space. > >> +\end{itemize} > >> + > >> +\devicenormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend} > >> + > >> +The device MUST ignore SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated. > >> + > >> +The device MUST ignore all access to its Configuration Space while > >> +suspended, except for \field{device status} if it is part of the Configuration Space. 
> >> + > >> +A device MUST NOT send any notifications for any virtqeuues, > >> +access any virtqueues, or modify any fields in > >> +its Configuration Space while suspended. > >> + > >> +If changes occur in the Configuration Space while the SUSPEND bit is set, > >> +the device MUST NOT send any configuration change notifications. > >> +Instead, the device MUST send the notification after the SUSPEND bit has been cleared. > >> + > >> +When the driver sets SUSPEND, the device MUST either suspend itself or set DEVICE_NEEDS_RESET if failed to suspend. > >> + > >> +If SUSPEND is set in \field{device status}, when the driver clears SUSPEND, > >> +the device MUST either resume normal operation or set DEVICE_NEEDS_RESET. > >> + > >> +When the driver sets SUSPEND, > >> +the device SHOULD perform the following actions before presenting that the SUSPEND bit is set to 1 in the \field{device status}: > > what does "before presenting" mean? does it return SUSPEND as 0 > > after driver wrote 1 there and before it completed these > > actions? > Yes, it should return 0 if the suspending operation is still in progress, > It should not change SUSPEND bit value before finish. > Do you suggest we add: > The device MUST present SUSPEND bit set to 1 in \field{device status} once it has been suspended > > It says: presenting that the SUSPEND bit is set to 1. > Not a native speaker, but this looks clear to me. I don't think this is sufficient, you need to document all assumptions on both device and driver. > > > >> + > >> +\begin{itemize} > >> +\item Stop processing more buffers of any virtqueues > >> +\item Wait until all buffers that are being processed have been used. > >> +\item Send used buffer notifications to the driver. 
> >> +\end{itemize} > >> + > >> \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options} > >> > >> Virtio can use various different buses, thus the standard is split > >> @@ -872,6 +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > >> \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for > >> handling features reserved for future use. > >> > >> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can > >> + trigger suspending the device via the SUSPEND flag > >> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}. > >> + > >> \end{description} > >> > >> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits} > >> -- > >> 2.45.2 ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status 2024-08-15 10:50 ` Michael S. Tsirkin @ 2024-08-30 2:20 ` Zhu Lingshan 0 siblings, 0 replies; 69+ messages in thread From: Zhu Lingshan @ 2024-08-30 2:20 UTC (permalink / raw) To: Michael S. Tsirkin Cc: cohuck, jasowang, virtio-comment, Eugenio Pérez, David Stevens On 8/15/2024 6:50 PM, Michael S. Tsirkin wrote: > On Thu, Aug 15, 2024 at 05:12:23PM +0800, Zhu Lingshan wrote: >> >> On 8/13/2024 4:01 PM, Michael S. Tsirkin wrote: >>> On Thu, Aug 01, 2024 at 07:35:16PM +0800, Zhu Lingshan wrote: >>>> +\drivernormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend} >>>> + >>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated. >>> Actually, it has no effect before DRIVER_OK, no? I would forbid that >>> then. >> There can be two cases: >> 1) debugging the device before DRIVER_OK >> 2) VM migration right after boot, and before DRIVER_OK. So it needs to migrate the device before DRIVER_OK > > Sorry I don't understand. The only effect of SUSPEND as you specified > it is not consuming buffers. Buffers are not consumed before DRIVER_OK, > thus there is no point to set SUSPEND before DRIVER_OK. Sorry for the late reply, I am very busy these two weeks. There can be other operations except buffers before DRIVER_OK, like config interrupts. And the hypervisor may migrate VMs right after creating them, at that moment DRIVER_OK may not be set yet, we should not block this use case. > >>>> +Once the driver sets SUSPEND to \field{device status} of the device: >>>> +\begin{itemize} >>>> +\item The driver MUST re-read \field{device status} to verify whether the SUSPEND bit is set. >>> This is still vague, I commented on this several times. >>> I think what you mean is that until it reads status with >>> SUSPEND as 1, it does not consider SUSPEND set. >>> >>> >>> But, what happens if driver clears SUSPEND before it reads it as set? 
>>> It would seem that should cancel suspend (which is a useful thing to
>>> support)? But this creates a problem as it breaks read/modify/write that
>>> some hypervisors assumed to be safe. I guess we need an extra
>>> SUSPEND_IN_PROGRESS bit then? A little too much for status, at this
>>> stage - maybe we need an extra register for this.
>> If the driver reads SUSPEND bit == 0, with the knowledge of its own operations,
>> it knows the device is either in a) normal operation or b) suspend in progress.
>>
>> Cancelling a SUSPEND right after setting SUSPEND is actually a RESUME.
>> The ideal case is that the driver waits until SUSPEND == 1, then clears it.
>> But there are various drivers, and not all drivers follow our expectation,
>> so there can be a corner case: how does the driver know whether, after it
>> clears the SUSPEND bit, the device has resumed or has not finished suspending?
>>
>> I think adding a new register can surely fix this issue, however as you
>> said, adding a new status bit is overkill and where to place this new register
>> is a new question.
> Place it after existing registers.
I am still confused. SUSPEND and RESUME are both device state transitions, just like RESET, FEATURES_OK and DRIVER_OK.
Current virtio works fine for these existing status bits without an extra register,
and we cannot assume that SUSPEND / RESUME takes longer than any other state transition.

So, do you want a new register for all state transitions, applying to all device states and not only SUSPEND?
>
>
>> So, to make things easier, how about suspend & resume the device sequentially:
>>
>> If the driver has suspended the device by setting the SUSPEND bit to 1, it MUST NOT
>> clear the SUSPEND bit before the device presenting the SUSPEND bit set as 1.
>
> I'm worried that it's fragile - one can no longer deduce
> the device state by reading the status.
It is the driver that reads and sets the device status, so I think it has knowledge of the device status.
For example, if it sets SUSPEND and then reads SUSPEND == 0, that means the suspend is in progress.
So I even feel this is actually not necessary.
>
>>>
>>>> +\item The driver MUST NOT make any more buffers available to the device.
>>>> +\item The driver MUST NOT access any virtqueues or send notifications for any virtqueues.
>>>> +\item The driver MUST NOT access Device Configuration Space.
>>>> +\end{itemize}
>>>> +
>>>> +\devicenormative{\subsection}{Device Suspend}{General Initialization And Device Operation / Device Suspend}
>>>> +
>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or VIRTIO_F_SUSPEND is not negotiated.
>>>> +
>>>> +The device MUST ignore all access to its Configuration Space while
>>>> +suspended, except for \field{device status} if it is part of the Configuration Space.
>>>> +
>>>> +A device MUST NOT send any notifications for any virtqeuues,
>>>> +access any virtqueues, or modify any fields in
>>>> +its Configuration Space while suspended.
>>>> +
>>>> +If changes occur in the Configuration Space while the SUSPEND bit is set,
>>>> +the device MUST NOT send any configuration change notifications.
>>>> +Instead, the device MUST send the notification after the SUSPEND bit has been cleared.
>>>> +
>>>> +When the driver sets SUSPEND, the device MUST either suspend itself or set DEVICE_NEEDS_RESET if failed to suspend.
>>>> +
>>>> +If SUSPEND is set in \field{device status}, when the driver clears SUSPEND,
>>>> +the device MUST either resume normal operation or set DEVICE_NEEDS_RESET.
>>>> +
>>>> +When the driver sets SUSPEND,
>>>> +the device SHOULD perform the following actions before presenting that the SUSPEND bit is set to 1 in the \field{device status}:
>>> what does "before presenting" mean? does it return SUSPEND as 0
>>> after driver wrote 1 there and before it completed these
>>> actions?
>> Yes, it should return 0 if the suspending operation is still in progress.
>> It should not change the SUSPEND bit value before it finishes.
>> Do you suggest we add:
>> The device MUST present SUSPEND bit set to 1 in \field{device status} once it has been suspended
>>
>> It says: presenting that the SUSPEND bit is set to 1.
>> Not a native speaker, but this looks clear to me.
> I don't think this is sufficient, you need to document all assumptions
> on both device and driver.
Would you please help me understand which assumptions should be addressed here?
IMHO, the driver knows what it is doing.
>
>
>>>> +
>>>> +\begin{itemize}
>>>> +\item Stop processing more buffers of any virtqueues
>>>> +\item Wait until all buffers that are being processed have been used.
>>>> +\item Send used buffer notifications to the driver.
>>>> +\end{itemize}
>>>> +
>>>> \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
>>>>
>>>> Virtio can use various different buses, thus the standard is split
>>>> @@ -872,6 +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>>> \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>>>> handling features reserved for future use.
>>>>
>>>> + \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
>>>> + trigger suspending the device via the SUSPEND flag
>>>> + See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
>>>> +
>>>> \end{description}
>>>>
>>>> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>>>> --
>>>> 2.45.2

^ permalink raw reply [flat|nested] 69+ messages in thread
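
To make the handshake debated in this message concrete, here is a minimal Python model of "set SUSPEND, then re-read device status until the bit is presented". It is only a sketch under stated assumptions, not driver code and not part of the patch: ModelDevice, its latency and fail parameters, and the suspend() helper with its max_polls limit are all hypothetical names introduced for illustration. The status bit values are the ones listed in the patch. The model assumes the device keeps presenting SUSPEND as 0 while the suspend is in progress and presents either SUSPEND or DEVICE_NEEDS_RESET once it is done.

```python
# Status bit values as listed in the patch's device status section.
DRIVER_OK = 4
FEATURES_OK = 8
SUSPEND = 16
DEVICE_NEEDS_RESET = 64


class ModelDevice:
    """Hypothetical device model: a suspend completes after `latency` status reads."""

    def __init__(self, status, suspend_negotiated=True, latency=2, fail=False):
        self.status = status                  # value presented to the driver
        self.suspend_negotiated = suspend_negotiated
        self.latency = latency                # reads left until suspend completes
        self.fail = fail                      # simulate a device that cannot suspend
        self.pending = False                  # SUSPEND written but not yet presented

    def write_status(self, value):
        # The device ignores SUSPEND unless FEATURES_OK is set and the
        # feature was negotiated, as the quoted device requirement says.
        allowed = self.suspend_negotiated and bool(self.status & FEATURES_OK)
        if (value & SUSPEND) and allowed and not (self.status & SUSPEND):
            self.pending = True               # suspend starts; bit still reads 0

    def read_status(self):
        if self.pending:
            if self.latency > 0:
                self.latency -= 1             # still suspending: SUSPEND reads 0
            else:
                self.pending = False
                self.status |= DEVICE_NEEDS_RESET if self.fail else SUSPEND
        return self.status


def suspend(dev, max_polls=10):
    """Set SUSPEND, then re-read status until the device presents a result."""
    dev.write_status(dev.read_status() | SUSPEND)
    for _ in range(max_polls):
        status = dev.read_status()
        if status & DEVICE_NEEDS_RESET:
            return "needs_reset"
        if status & SUSPEND:
            return "suspended"
    return "timeout"                          # SUSPEND never read back as 1
```

In this model a read of SUSPEND == 0 looks the same whether the device is still suspending or ignored the write (for example, because FEATURES_OK was not set); suspend() returns "timeout" in both cases. The driver can only tell them apart from its own knowledge of what it wrote, which is the point under discussion above.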