public inbox for virtio-comment@lists.linux.dev
 help / color / mirror / Atom feed
From: Zhu Lingshan <lingshan.zhu@amd.com>
To: Parav Pandit <parav@nvidia.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>
Cc: "virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>,
	"Ray.Huang@amd.com" <Ray.Huang@amd.com>
Subject: Re: [PATCH v2] virtio: introduce SUSPEND and RESUME feature
Date: Mon, 9 Jun 2025 16:00:35 +0800	[thread overview]
Message-ID: <3b27c481-45b6-49de-a660-ea79063cc9d7@amd.com> (raw)
In-Reply-To: <CY8PR12MB71950056CD7393AADA44EFCBDC69A@CY8PR12MB7195.namprd12.prod.outlook.com>

On 6/7/2025 10:11 AM, Parav Pandit wrote:
>
>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>> Sent: Friday, June 6, 2025 4:01 PM
>>
>> On 6/6/2025 1:52 PM, Parav Pandit wrote:
>>> Hi,
>>>
>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>> Sent: Thursday, May 29, 2025 1:54 PM
>>>>
>>>> This commit allows the driver to suspend the device through a new
>>>> device status bit SUSPEND and resume the device running by re-setting
>>>> DRIVER_OK bit in device status.
>>>>
>>>> This commit re-orders the device status bits.
>>>>
>>> I overall like this proposal. Few small comments to fix more below.
>>> It is lot cleaner as it eliminates the previous bit flipping complexities.
>>>
>>> Can you please add one or two line on a use case/motivation,
>>>
>>> For example,
>>>
>>> Presently when an OS is suspended, the driver must fully stop and reset the
>> device, and on resume the driver must re-initialize the device.
>>> This can be time consuming task to recreate large number of VQs.
>>>
>>> Instead, extend the specification to allow drivers to suspend and resume
>> the device, without recreating all the resources.
>>> ...
>> Hello Parav
>>
>> Here I think the spec provides mechanisms, not a policy. The user may
>> employ this feature for other tasks other than live migration which requires
>> OS suspending.
>>
> Its not about some policy.
> Sure user may use the feature in N different use cases.
>
> The intent is very simple. For the reviewers to review the patch, why something is needed should be presented.
> So motivation/usecase/need is worth the entry.
> Some reviewers do rely on the commit logs to understand the design intent behind it.
>
> Without the need of the patch, I am not sure why would even one write and review the spec patch.
> Please add known use cases. I am familiar with one that I suggested above.

we can add these descriptions in the cover letter.

>  
>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>>  content.tex | 74 ++++++++++++++++++++++++++++++++++++++++++++++-
>> ----
>>>> --
>>>>  1 file changed, 65 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/content.tex b/content.tex index d3fc6a4..caef5d0 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status}
>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>      this bit.  For example, under Linux, drivers can be loadable modules.
>>>>    \end{note}
>>>>
>>>> -\item[FAILED (128)] Indicates that something went wrong in the
>>>> guest,
>>>> -  and it has given up on the device. This could be an internal
>>>> -  error, or the driver didn't like the device for some reason, or
>>>> -  even a fatal error during device operation.
>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and ready
>>>> +to
>>>> +  drive the device.
>>>> +
>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates
>>>> +that the
>>>> +  device has been suspended by the driver.
>>>>
>>>>  \item[FEATURES_OK (8)] Indicates that the driver has acknowledged all
>> the
>>>>    features it understands, and feature negotiation is complete.
>>>>
>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and ready
>>>> to
>>>> -  drive the device.
>>>> -
>>>>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has
>> experienced
>>>>    an error from which it can't recover.
>>>> +
>>>> +\item[FAILED (128)] Indicates that something went wrong in the
>>>> +guest,
>>>> +  and it has given up on the device. This could be an internal
>>>> +  error, or the driver didn't like the device for some reason, or
>>>> +  even a fatal error during device operation.
>>>>  \end{description}
>>>>
>>> A new bit SUSPEND introduction should be done at the end without re-
>> organizing rest of the bits.
>>> Please add at the end.
>>>
>>> Is the intention to rearrange the fields in ascending order of the bit
>> number?
>>> If yes, its good change too but than it missed re-arranging FEATURE_OK.
>> oh yes, good catch, will fix
>>
>>> If you want to re-arrange, please have the pre-patch in this series.
>>>
>>> Either way is ok to me,
>>> 1. Either as pre-patch for re-arrangement + addition of SUSPEND in middle.
>>> Or
>>> 2. adding SUSPEND bit definition at the end
>>>
>>> #1 is better to see in ascending order.
>> I was required to re-arrange these status bits, but this is a trivial change, so
>> lets keep them in a single patch for easier review
> Having logically separate patches are always good. Re-arrangement is pure cosmetic vs this one that adds new functionality.

ok, we can split this patch into a series if this is preferred.

>
>>>>  The \field{device status} field starts out as 0, and is
>>>> reinitialized to 0 by @@ -
>>>> 99,10 +102,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of
>>>> a Virtio Device / Feature B  \begin{description}
>>>>  \item[0 to 23, and 50 to 127] Feature bits for the specific device
>>>> type
>>>>
>>>> -\item[24 to 41] Feature bits reserved for extensions to the queue
>>>> and
>>>> +\item[24 to 43] Feature bits reserved for extensions to the queue
>>>> +and
>>>>    feature negotiation mechanisms, see \ref{sec:Reserved Feature
>>>> Bits}
>>>>
>>> Only one feature bit is used by this proposal.
>>> So it should be 24 to 42 and use bit 42.
>>>
>>> Did you use bit 43 due to existence of VIRTIO_NET_F_GUEST_RSC6?
>> Yes, bit 42 is taken for VIRTIO_NET_F_GUEST_RSC6.
> So the current spec wordings  is incorrect 0 to 23, and 50 to 127 Feature bits for the specific device type

current spec missed documenting bit 42.

>
> We need to says,
>
> 0 to 23, 41 to 42 and 50 to 127 Feature bits for the specific device type
> This should also be pre-patch, so that it can be backported to master branch.
> Another reason for the small patch to be separate.
>
> If you want me to write up this pre-patch, please let me know.
> You can include in the series.

There could a a separate patch in a series says:
41 to 42 Feature bits reserved for extensions to the queue and feature negotiation mechanisms
43 to 49, and 128 and above Feature bits reserved for future extensions. 

which fixes commit 9438414 

>
>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for future
>>>> extensions.
>>>> +\item[44 to 49, and 128 and above] Feature bits reserved for future
>>>> extensions.
>>>>  \end{description}
>>>>
>>>>  \begin{note}
>>>> @@ -629,6 +632,54 @@ \section{Device Cleanup}\label{sec:General
>>>> Initialization And Device Operation /
>>>>
>>>>  Thus a driver MUST ensure a virtqueue isn't live (by device reset)
>>>> before removing exposed buffers.
>>>>
>>>> +\section{Device Suspend}\label{sec:General Initialization And Device
>>>> +Operation / Device Suspend}
>>>> +
>>>> +If VIRTIO_F_SUSPEND is negotiated, the driver is eligible to suspend
>>>> +the
>>>> device by setting the SUSPEND bit in \field{device status} to 1, and
>>>> the device SHOULD set the DRIVER_OK bit to 0 once it has been
>> suspended.
>>>> +
>>> This is not a normative section so please avoid using SHOULD.
>>> Better to word it as informative theory of operation text.
>>> Such as,
>> why "SHOULD" can only be used in normative sections?
>>
> Because:
>
> 1. To clarify the intent:
> These keywords carry precise semantic weight. When a spec says MUST, it means compliance is not optional. Using such terms in general or explanatory text risks misinterpreting guidance as a requirement.
>
> 2. Avoiding Ambiguity:
> If normative language is used in non-normative (informative) sections, it becomes unclear whether the reader is required to follow it or if it's just background or suggestion.
>
> 3. Informative sections explain context, rationale, or examples. Normative sections define rules. Mixing the two muddies the structure and weakens the authority of the normative part.

There are no such rules.

In the spec section 1.3 Terminology, it says:

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119] and [RFC8174] when, and only when, they appear in all capitals, as shown here. 

RFC 2119 says:

SHOULD This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.


Here this "SHOULD" exactly conforms to the definition.

"SHOULD" has already been used in non-normative sections, for example:
2.5.4 Legacy Interfaces: when using the legacy interface, drivers SHOULD read these fields multiple times until two reads generate a consistent result. 

The spec does not say: “SHOULD” can only be used in driver or device requirements section.

>
>>> If X is not negotiated, the....
>>> And the device sets the DRIVER_OK bit to 0 once it has been suspend.
>>>
>>>> +If the device has been suspended, the driver can resume the device
>>>> +running
>>>> by setting the DRIVER_OK bit in \field{device status} to 1, and the
>>>> device should set the SUSPEND bit to 0 once it resumes running.
>>>> +
>>> The device sets the SUSPEND bit to 0 once the device is running.
>> What is the difference in semantic?
>>
> "resumes running" does not read correct.
> And normative "should" is used...
> Hence the suggestion to re-word it.

Looking into a dictionary https://www.merriam-webster.com/dictionary/resume
There is quite a lot of examples of resume + noun.

And here, "SHOULD" still conforms to RFC 2119.

>
>>>> +\drivernormative{\subsection}{Device Suspend}{General Initialization
>>>> +And Device Operation / Device Suspend}
>>>> +
>>>> +The driver SHOULD NOT set SUSPEND bit if DRIVER_OK is not set or
>>>> VIRTIO_F_SUSPEND is not negotiated.
>>>> +
>>>> +Once the driver sets SUSPEND bit in \field{device status} to 1:
>>>> +\begin{itemize}
>>>> +\item The driver MUST verify whether the device has been suspended
>>>> +by
>>>> re-reading \field{device status}, examining whether the SUSPEND bit
>>>> is set to
>>>> 1 and the DRIVER_OK bit is set to 0.
>>>> +\item The driver MUST NOT make any more buffers available to the
>> device.
>>>> +\item The driver MUST NOT send notifications for any virtqueues.
>>>> +\item The driver MUST NOT make any changes to Device Configuration
>>>> Space except for \field{device status} if it is part of the Configuration
>> Space.
>>>> +\end{itemize}
>>>> +
>>>> +\devicenormative{\subsection}{Device Suspend}{General Initialization
>>>> +And Device Operation / Device Suspend}
>>>> +
>>>> +The device MUST ignore any operations on the SUSPEND bit from the
>>>> +driver if the device has not been completely initialized by the
>>>> +procedures in \ref{sec:General Initialization And Device Operation /
>>>> +Device Initialization}
>>>> +
>>>> +The device SHOULD ignore any write access to its Configuration Space
>>>> +while suspended, except for \field{device status} if it is part of
>>>> +the
>>>> Configuration Space.
>>>> +
>>>> +A device MUST NOT send any notifications for any virtqueues, access
>>>> +any virtqueues, or modify any fields in its Configuration Space
>>>> +while suspended.
>>>> +
>>>> +If changes occur in the Configuration Space during suspended period,
>>>> +the device MUST NOT send any configuration change notifications.
>>>> +Instead, the device MUST send the notification when it resumes running.
>>>> +
>>>> +If the driver sets the SUSPEND bit in \field{device status} to 1,
>>>> +the device MUST either suspend itself or set the DEVICE_NEEDS_RESET
>>>> +bit in
>>>> \field{device status} to 1 when it fails to suspend.
>>>> +
>>> Even though you have the driver normative, it is good to have device
>> normative for suspend and resume operation:
>>> Something like,
>>>
>>> When the suspend operation completes in the device, the device MUST
>>> set SUSPEND bit to 1 and DRIVERK_OK to 0.
>>>
>>> When the suspended device is resumed, the device MUST set SUSPEND bit
>> to 0 and DRIVER_OK to 1.
>>
>> I think we already have similar descriptions in the beginning of this section
>>
> I didn't find it in the normative device section.

+\section{Device Suspend}\label{sec:General Initialization And Device Operation / Device Suspend}
+
+If VIRTIO_F_SUSPEND is negotiated, the driver is eligible to suspend the device by setting the SUSPEND bit in \field{device status} to 1, and the device SHOULD set the DRIVER_OK b
it to 0 once it has been suspended.
+
+If the device has been suspended, the driver can resume the device running by setting the DRIVER_OK bit in \field{device status} to 1, and the device SHOULD set the SUSPEND bit to 0 once it resumes running.

>  
>>>
>>>> +If the device has been suspended and the driver resumes the device
>>>> +running by setting the DRIVER_OK bit in \field{device status} to 1,
>>>> +the
>>>> device MUST either resume normal operation or set the
>>>> DEVICE_NEEDS_RESET bit in \field{device status} to 1 when it fails to
>> resume.
>>>> +
>>>> +When the driver sets the SUSPEND bit to 1, the device SHOULD perform
>>> It should be MUST.
>>> Otherwise it can crash the system accessing memory that may not be any
>> more accessible.
>>
>> Not sure how this can crash system memory, but "MUST" is OK
>>
> Ok.
>>>> +the following actions before presenting the SUSPEND bit as 1 and
>>>> DRIVER_OK bit as 0 in the \field{device status}:
>>>> +
>>>> +\begin{itemize}
>>>> +\item Stop consuming more buffers of any virtqueues.
>>>> +\item Wait until all buffers that are being processed have been used.
>>>> +\item Send used buffer notifications to the driver.
>>>> +\end{itemize}
>>>> +
>>>>  \chapter{Virtio Transport Options}\label{sec:Virtio Transport
>>>> Options}
>>>>
>>>>  Virtio can use various different buses, thus the standard is split
>>>> @@ -872,6
>>>> +923,11 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature
>>>> Bits}
>>>>  	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature
>>>> Bits} for
>>>>  	handling features reserved for future use.
>>>>
>>>> +  \item[VIRTIO_F_SUSPEND(43)] This feature indicates that the driver can
>>>> +   suspend the device by set the SUSPEND bit to 1.
>>>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
>>>> +
>>>> +
>>>>  \end{description}
>>>>
>>>>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature
>>>> Bits}
>>>> --
>>>> 2.49.0
>>>>
>>> Apart from these small comments, please add the line in v3 commit log:
>>>
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>>> ub.com%2Foasis-tcs%2Fvirtio-
>> spec%2Fissues%2F&data=05%7C02%7Cparav%40nv
>> idia.com%7C0ef2b01694d942339bb308dda4e5365c%7C43083d15727340c1b
>> 7db39ef
>> d9ccc17a%7C0%7C0%7C638848026619639978%7CUnknown%7CTWFpbGZsb
>> 3d8eyJFbXB0
>> eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpb
>> CIsIl
>> dUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=0st2osYsAxascHHmHj0RgSyYAn9l
>> yQRKqxOl
>>> aIjE8dk%3D&reserved=0<issue_number>
>>>
>>> This will help to link the two sides.
>> sure
>>
> Thanks.


  reply	other threads:[~2025-06-09  8:00 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-29  8:23 [PATCH v2] virtio: introduce SUSPEND and RESUME feature Zhu Lingshan
2025-06-05  6:11 ` Zhu Lingshan
2025-06-06  5:52 ` Parav Pandit
2025-06-06 10:30   ` Zhu Lingshan
2025-06-07  2:11     ` Parav Pandit
2025-06-09  8:00       ` Zhu Lingshan [this message]
2025-06-10 10:37         ` Parav Pandit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3b27c481-45b6-49de-a660-ea79063cc9d7@amd.com \
    --to=lingshan.zhu@amd.com \
    --cc=Ray.Huang@amd.com \
    --cc=cohuck@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=mst@redhat.com \
    --cc=parav@nvidia.com \
    --cc=virtio-comment@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox