public inbox for virtio-comment@lists.linux.dev
 help / color / mirror / Atom feed
From: Zhu Lingshan <lingshan.zhu@amd.com>
To: Parav Pandit <parav@nvidia.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>
Cc: "virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"David Stevens" <stevensd@chromium.org>
Subject: Re: [PATCH V7 v7] virtio: introduce SUSPEND bit in device status
Date: Thu, 5 Sep 2024 15:20:54 +0800	[thread overview]
Message-ID: <de10e802-c0d7-4cf6-b1f8-3624b47e191c@amd.com> (raw)
In-Reply-To: <IA0PR12MB8713A9B32BED215EE545C7A5DC932@IA0PR12MB8713.namprd12.prod.outlook.com>



On 9/3/2024 6:28 PM, Parav Pandit wrote:
>
>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>> Sent: Tuesday, September 3, 2024 2:36 PM
>>
>> On 8/30/2024 11:02 AM, Parav Pandit wrote:
>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>> Sent: Friday, August 30, 2024 8:02 AM
>>>>
>>>> On 8/15/2024 5:34 PM, Parav Pandit wrote:
>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>>>> Sent: Thursday, August 15, 2024 1:53 PM
>>>>>>
>>>>>> On 8/13/2024 2:55 PM, Parav Pandit wrote:
>>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>>>>>> Sent: Tuesday, August 13, 2024 11:44 AM
>>>>>>>>
>>>>>>> I removed the unreachable intel email id as every single email is
>>>>>>> bouncing
>>>>>> from it.
>>>>>>> Please consider dropping that email from v8 as it will cause all
>>>>>>> reviewers
>>>>>> email to bounce.
>>>>>>>> On 8/13/2024 1:50 PM, Parav Pandit wrote:
>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>>>>>>>> Sent: Tuesday, August 13, 2024 11:15 AM
>>>>>>>>>>
>>>>>>>>>> On 8/13/2024 12:42 PM, Parav Pandit wrote:
>>>>>>>>>>> Hi Lingshan, David,
>>>>>>>>>>>
>>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@amd.com>
>>>>>>>>>>>> Sent: Thursday, August 1, 2024 5:05 PM
>>>>>>>>>>>>
>>>>>>>>>>>> This commit allows the driver to suspend the device by
>>>>>>>>>>>> introducing a new status bit SUSPEND in device_status.
>>>>>>>>>>>>
>>>>>>>>>>>> This commit also introduce a new feature bit
>> VIRTIO_F_SUSPEND
>>>>>>>>>>>> which indicating whether the device support SUSPEND.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
>>>>>>>>>>>> Signed-off-by: David Stevens <stevensd@chromium.org>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  content.tex | 75
>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>>>>>> ----
>>>>>>>>>>>> --
>>>>>>>>>>>>  1 file changed, 65 insertions(+), 10 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> Changes from V6:
>>>>>>>>>>>> - the device should hold its config interrupt while SUSPEND,
>>>>>>>>>>>> and send config interrupt when the SUSPEND bit is cleared.
>>>>>>>>>>>> - while SUSPEND, the driver MUST NOT access Device
>>>>>>>>>>>> Configuration Space
>>>>>>>>>>>> - minor changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Changes from V5:
>>>>>>>>>>>> - the device should present NEEDS_RESET if failed to suspend
>>>>>>>>>>>> - allow the driver access device status in the config space when
>>>>>>>>>>>>   suspended if it is implemented in config space.
>>>>>>>>>>>> - language improvements
>>>>>>>>>>>>
>>>>>>>>>>>> Changes from V4:
>>>>>>>>>>>> - re-order the device status bits section
>>>>>>>>>>>> - kick vqs --> notify vqs
>>>>>>>>>>>>
>>>>>>>>>>>> Changes from V3:
>>>>>>>>>>>> - allow the driver clearing the SUSPEND bit to resume the
>> device.
>>>>>>>>>>>> - disallow access to config space while suspended.
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/content.tex b/content.tex index 0a62dce..2d1bee8
>>>>>>>>>>>> 100644
>>>>>>>>>>>> --- a/content.tex
>>>>>>>>>>>> +++ b/content.tex
>>>>>>>>>>>> @@ -36,19 +36,22 @@ \section{\field{Device Status}
>>>>>>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>>>>>>>>      this bit.  For example, under Linux, drivers can be
>>>>>>>>>>>> loadable
>>>> modules.
>>>>>>>>>>>>    \end{note}
>>>>>>>>>>>>
>>>>>>>>>>>> -\item[FAILED (128)] Indicates that something went wrong in
>>>>>>>>>>>> the guest,
>>>>>>>>>>>> -  and it has given up on the device. This could be an
>>>>>>>>>>>> internal
>>>>>>>>>>>> -  error, or the driver didn't like the device for some
>>>>>>>>>>>> reason, or
>>>>>>>>>>>> -  even a fatal error during device operation.
>>>>>>>>>>>> +\item[DRIVER_OK (4)] Indicates that the driver is set up and
>>>>>>>>>>>> +ready to
>>>>>>>>>>>> +  drive the device.
>>>>>>>>>>>>
>>>>>>>>>>>>  \item[FEATURES_OK (8)] Indicates that the driver has
>>>>>>>>>>>> acknowledged all
>>>>>>>>>> the
>>>>>>>>>>>>    features it understands, and feature negotiation is complete.
>>>>>>>>>>>>
>>>>>>>>>>>> -\item[DRIVER_OK (4)] Indicates that the driver is set up and
>>>>>>>>>>>> ready to
>>>>>>>>>>>> -  drive the device.
>>>>>>>>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated,
>>>>>>>> indicates
>>>>>>>>>>>> +that the
>>>>>>>>>>>> +  device has been suspended by the driver.
>>>>>>>>>>>>
>>>>>>>>>>>>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has
>>>>>>>>>> experienced
>>>>>>>>>>>>    an error from which it can't recover.
>>>>>>>>>>>> +
>>>>>>>>>>>> +\item[FAILED (128)] Indicates that something went wrong in
>>>>>>>>>>>> +the guest,
>>>>>>>>>>>> +  and it has given up on the device. This could be an
>>>>>>>>>>>> +internal
>>>>>>>>>>>> +  error, or the driver didn't like the device for some
>>>>>>>>>>>> +reason, or
>>>>>>>>>>>> +  even a fatal error during device operation.
>>>>>>>>>>>>  \end{description}
>>>>>>>>>>>>
>>>>>>>>>>>>  The \field{device status} field starts out as 0, and is
>>>>>>>>>>>> reinitialized to 0 by @@ -
>>>>>>>>>>>> 60,8 +63,9 @@ \section{\field{Device Status}
>>>>>>>>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>>>>>>>> initialization sequence specified in \ref{sec:General
>>>>>>>>>>>> Initialization And Device Operation / Device
>>>>>>>>>> Initialization}.
>>>>>>>>>>>> -The driver MUST NOT clear a
>>>>>>>>>>>> -\field{device status} bit.  If the driver sets the FAILED
>>>>>>>>>>>> bit,
>>>>>>>>>>>> +The driver MUST NOT clear a \field{device status} bit other
>>>>>>>>>>>> +than SUSPEND except when setting \field{device status} to 0
>>>>>>>>>>>> +as a transport-specific way to initiate a reset. If the
>>>>>>>>>>>> +driver sets the FAILED bit,
>>>>>>>>>>>>  the driver MUST later reset the device before attempting to
>>>>>>>>>>>> re-
>>>>>>>> initialize.
>>>>>>>>>>>>  The driver SHOULD NOT rely on completion of operations of a
>>>> @@
>>>>>>>>>>>> -99,10
>>>>>>>>>>>> +103,10 @@ \section{Feature Bits}\label{sec:Basic Facilities
>>>>>>>>>>>> +of a Virtio Device
>>>>>>>>>>>> / Feature B  \begin{description}
>>>>>>>>>>>>  \item[0 to 23, and 50 to 127] Feature bits for the specific
>>>>>>>>>>>> device type
>>>>>>>>>>>>
>>>>>>>>>>>> -\item[24 to 41] Feature bits reserved for extensions to the
>>>>>>>>>>>> queue and
>>>>>>>>>>>> +\item[24 to 42] Feature bits reserved for extensions to the
>>>>>>>>>>>> +queue and
>>>>>>>>>>>>    feature negotiation mechanisms
>>>>>>>>>>>>
>>>>>>>>>>>> -\item[42 to 49, and 128 and above] Feature bits reserved for
>>>>>>>>>>>> future extensions.
>>>>>>>>>>>> +\item[43 to 49, and 128 and above] Feature bits reserved for
>>>>>>>>>>>> +future
>>>>>>>>>>>> extensions.
>>>>>>>>>>>>  \end{description}
>>>>>>>>>>>>
>>>>>>>>>>>>  \begin{note}
>>>>>>>>>>>> @@ -629,6 +633,53 @@ \section{Device
>>>> Cleanup}\label{sec:General
>>>>>>>>>>>> Initialization And Device Operation /
>>>>>>>>>>>>
>>>>>>>>>>>>  Thus a driver MUST ensure a virtqueue isn't live (by device
>>>>>>>>>>>> reset) before removing exposed buffers.
>>>>>>>>>>>>
>>>>>>>>>>>> +\section{Device Suspend}\label{sec:General Initialization
>>>>>>>>>>>> +And Device Operation / Device Suspend}
>>>>>>>>>>>> +
>>>>>>>>>>>> +When VIRTIO_F_SUSPEND is negotiated, the driver can set the
>>>>>>>>>>>> +SUSPEND bit in \field{device status} to suspend a device,
>>>>>>>>>>>> +and can clear the SUSPEND bit to resume a suspended device.
>>>>>>>>>>>> +
>>>>>>>>>>>> +\drivernormative{\subsection}{Device Suspend}{General
>>>>>>>>>>>> +Initialization And Device Operation / Device Suspend}
>>>>>>>>>>>> +
>>>>>>>>>>>> +The driver MUST NOT set SUSPEND if FEATURES_OK is not set
>> or
>>>>>>>>>>>> VIRTIO_F_SUSPEND is not negotiated.
>>>>>>>>>>>> +
>>>>>>>>>>>> +Once the driver sets SUSPEND to \field{device status} of the
>>>> device:
>>>>>>>>>>>> +\begin{itemize}
>>>>>>>>>>>> +\item The driver MUST re-read \field{device status} to
>>>>>>>>>>>> +verify whether the
>>>>>>>>>>>> SUSPEND bit is set.
>>>>>>>>>>>> +\item The driver MUST NOT make any more buffers available to
>>>>>>>>>>>> +the
>>>>>>>>>> device.
>>>>>>>>>>>> +\item The driver MUST NOT access any virtqueues or send
>>>>>>>>>>>> +notifications for
>>>>>>>>>>>> any virtqueues.
>>>>>>>>>>>> +\item The driver MUST NOT access Device Configuration Space.
>>>>>>>>>>>> +\end{itemize}
>>>>>>>>>>>> +
>>>>>>>>>> Hi Parva
>>>>>>>>>>> Do we agree that
>>>>>>>>>>> a. suspending a device is non frequent operation (in order of
>>>>>>>>>>> N
>>>>>>>>>> operations/sec, where N is roughly in range of 10 or 100) per
>> device?
>>>>>>>>>> Ideally it should not be often in normal operations, but
>>>>>>>>>> remember we can not restrict the behaviors of the driver, so we
>>>>>>>>>> must be able to handle the scenario in which SUSPENDING is
>> often.
>>>>>>>>> Sure. the intent is slow rate, but one can do at unexpected times.
>>>>>>>>> Do you agree?
>>>>>>>> I think we don't have an intention of the frequency in the spec.
>>>>>>> Sure.
>>>>>>>
>>>>>>>> The spec only provides generic mechanisms and interfaces.
>>>>>>> Sure.
>>>>>>>
>>>>>>>> Don't assume it(or driver wants it to be) would be often or not,
>>>>>>>> that depends on the driver.
>>>>>>> As you rightly said : it cannot be assumed.
>>>>>>> The driver will read the device status right after it wrote it.
>>>>>>> This typically is <
>>>>>> 50nsec of time.
>>>>>>> The suspend operation for a net device to store hundreds of
>>>>>>> queues, RSS
>>>>>> table, flow filters, takes plenty of time (at least more than 50nsec :) ).
>>>>>>> Similarly for the GPU to store some MBs of memory takes more than
>>>>>>> 50nsec
>>>>>> of time, for example to store in a file for a software-based GPU device.
>>>>>>> So a device cannot respond back suspend=true in next 50nsec time.
>>>>>> It's OK for the device to take longer time to respond, the driver
>>>>>> simply re-reads device status.
>>>>> Sure, the issue is, when the driver re-reads, the device must
>>>>> present
>>>> suspend=false within 50nsec.
>>>>> (because device didn't suspend it yet).
>>>>>
>>>>> As I explained in previous email, this requires building special circuitry.
>>>>> Such circuitry can be avoided if the suspend interface is done
>>>>> slightly
>>>> differently.
>>>> why there is a constraint condition of the time?
>>> Because this is what driver does as you explained in [1] indicating "we must
>> be able to handle ..."
>>> [1]
>>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
>>> .kernel.org%2Fvirtio-comment%2Fc4d5eed3-774b-4d35-a007-
>> f9dff28ce884%40
>> amd.com%2FT%2F%23m6f081f96ef9dcea29c64a88b633eb21d50e8c410&dat
>> a=05%7C0
>> 2%7Cparav%40nvidia.com%7C4a248d6982514660015808dccbf79c55%7C430
>> 83d1572
>> 7340c1b7db39efd9ccc17a%7C0%7C0%7C638609511605195425%7CUnknown
>> %7CTWFpbG
>> Zsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
>> Mn0%
>> 3D%7C0%7C%7C%7C&sdata=Gj4oG8Z1O7wEl%2B7Mbud%2BbP31aMDMAFA
>> H9ElHJSvBTUA%
>>> 3D&reserved=0
>> The sentence is "we must be able to handle the scenario in which
>> SUSPENDING is often", where do you see any constraint conditions of time?
> If the device takes X usec, how can the device inform that, it suspending in progress without flipping the polarity?
Oh, I see your puzzle. The device does not inform that. It is VM_EXIT and VM_ENTRY, once VM_ENTRY, the guest vcpu aka the driver resume running
>
>>>> Are there any similar
>>>> constraining for other states like RESET or DRIVER_OK? Don't assume
>>>> any other states transitions are faster than SUSPEND.
>>> DRIVER_OK does not suffer from it because it is async notification.
>>> A device may start slow after DRIVER_OK.
>>> SUSPEND operation cannot rely on such async behavior.
>> The driver can set suspend and re-read & wait. This is a common routine in
>> the driver.
>>> RESET also suffers from similar inefficiencies.
>>> But that is because it is inherited from the past.
>>>
>>> Here a new functionality is being proposed and it has a chance for efficient
>> device implementation.
>>> Therefore the request is to improve it.
>> sure, as long as we confirm this new register apply for all device status
>> transitions, not only for SUSPEND.
> Alright, please do it as you suggested here.
>
>>>>>>> More below.
>>>>>>>
>>>>>>>>>>> b. A software-based device may not always want to force
>>>>>>>>>>> VM_EXIT on read
>>>>>>>>>> and write on the device_status register?
>>>>>>>>>> Trap and Emulation is the basic of virtualization, and how to
>>>>>>>>>> pass-through a device is out of this spec.
>>>>>>>>>>
>>>>>>>>> Sure, I didn't suggest to put such things in the spec.
>>>>>>>>> My question is, whether to trap and emulate or not is a choice
>>>>>>>>> of the
>>>>>>>> software.
>>>>>>>>> Do you agree?
>>>>>>>> The device emulator does not know anything about whether trapped
>>>> or
>>>>>> not.
>>>>>>>> Trapping and Emulation is a hypervisor thing.
>>>>>>>>
>>>>>>>> If here "software" refers to the device emulator, then Yes, it is
>>>>>>>> not the emulator's decision. And the device should not be aware
>>>>>>>> of VM_EXIT & VM_ENTRY.
>>>>>>>>
>>>>>>> Right. So a software wants to implement device_status as pure MMIO
>>>>>> writes. (and not VM_EXIT).
>>>>>> This is not always true, there can be VM_EXIT of pure emulated
>>>>>> devices.
>>>>> I am not denying that VM_EXIT can/cannot be there.
>>>>> I am saying, the proposal forces VM_EXIT based approach in my
>>>> understanding of this patch.
>>> [MARKER_1]
>>>
>>>>> If that is not true, may be can you please explain how this can be
>>>> implemented without VM_EXIT?
>>>>> We should have an interface that can be done with and without
>>>>> VM_EXIT
>>>> method at least for any new additions.
>>>> VM_EXIT is out of spec, it is a hypervisor and the processor thing.
>>> You keep repeating VM_EXIST is out of spec.
>>> I already replied at [MARKER_1], sure it is out of spec, the current approach
>> forces one to do VM_EXIT based approach.
>>> And if not, please explain, how can it be achieved?
>> It is not the current solution force the hypervisor & guest to perform
>> VM_EXIT.
> I read it few times but couldn't parse it.
> Can you please explain how your proposal works WITHOUT VM_EXIT?
> If not, you can say, that current proposal is limited to force VM_EXIT.
> Please explain.
It needs VM_EXIT, not only SUSPEND, all sensitive resource, like the config space access
should be trapped.
>
>> In basic virtualization, any access to hypervisor emulated HW registers would
>> be Traped and Emulated, means a VM_EXIT and a VM_ENTRY.
> That was not my question, my question is how to achieve without VM_EXIT?
> You are suggesting a spec that forces VM_EXIT.
It needs VM_EXIT, and there is a pairing VM_ENTRY
>
>>>> In non-pass-through case, any register access are sensitive and will
>>>> trigger VM_EXIT. Like RESET or DRIVER_OK needs to access
>>>> device_status, nothing different.
>>> I don't think you understood the point.
>>> Let me repeat, The question is, if the device implementation wants to
>> achieve the functionality without VM_EXIT, what is the way?
>> Map the register to guest address space, out of the spec.
> And which component will flip the bit of suspend before driver reads it to indicate that device is still_suspending?
>
>>>>>> The
>>>>>> HW registers are sensitive resource and any access to them need to
>>>>>> be trapped and emulated.
>>>>> This does not apply to PCI PFs and VFs which are HW devices (mainly
>> PFs).
>>>>> so this trap + emulation is narrow view that we better avoid.
>>>> This is how *basic* virtualization work, once access sensitive resource,
>> trap it.
>>>>> If you think this is the way forward, you should put forward in
>>>>> patch as
>>>> MUST requirement.
>>>>> and that does not look right to me.
>>>>> I hope you also don't mean to force this method to device
>>>> implementations.
>>>>> Right?
>>>> Again, VM_EXIT is a hypervisor thing,  out of spec. Whether there is
>>>> a VM_EXIT when setting SUSPEND totally depends on the virtualization
>>>> solution. And SUSPEND is nothing different from DRIVER_OK.
>>>>
>>> Please avoid repeating the point that VM_EXIT is hypervosor thing.
>>> No one asked to put this in spec. Please re-read [MARKER1]. You probably
>> missed that.
>>> SUSPEND is different than DRIVER_OK. I explained the timing constraints
>> and the required circuitry needed to fulfill the proposal.
>>> And with the additional register, such complicated circuitry can be easily
>> avoided.
>> Please read QEMU code, that can help you find how the trap - emulate
>> mechanism work for guest when it tries to access sensitive resources.
> I am not suggesting trap and emulation. Please come out of this looping thoughts.
>
> I repeat my question, please explain how does suggested solution works without VM_EXIT?
> From your response, I derive it cannot. Can you please confirm?
>
>>>> Means, if your virtualization needs to trap SUSPEND, it also needs to
>>>> trap DRIVER_OK, and don't assume DRIVER_OK is faster than SUSPEND.
>>> DRIVER_OK is by law of physics is faster than SUSPEND because it does not
>> demand the driver of reading back.
>>> There is no driver side loop to check if the device accepted DRIVER_OK or
>> not.
>>> Agree?
>> Why do you assume so? 
> I am not assuming. It is the current spec. :)
> Please read the device initialization sequence, that does NOT say that driver must READ DRIVER_OK.
> Hence by law of physics it is faster.
>
>> In some vendor implementation, DRIVER_OK can be
>> slow. Can DRIVER_OK implement a deferred initialization even deferred
>> resource allocation? Does the spec say that DRIVER_OK must be faster?
> No. it does not. But your suspend proposal says so that device must respond back before the next read arrives from the driver.
>
>>>>>>> And prefer to returning SUSPEND=true at slow pace.
>>>>>>> This means, the device implementation cannot immediately return
>>>>>> suspend=true right after it was written.
>>>>>>> A MMIO read will read it back, as suspend=true.
>>>>>>>
>>>>>>> An alternative would be, to forward CPU loads and CPU stores to
>>>>>>> different
>>>>>> address.
>>>>>>> However, this does not work for the hw based devices.
>>>>>>>
>>>>>>> That means, PCI HW needs to return suspend=0, until the device is
>>>>>>> not
>>>>>> suspended.
>>>>>>> In this example, the device cannot build special circuitry to
>>>>>>> answer
>>>>>> suspend=true within 50nsec, or in other words building special
>>>>>> circuitry to return suspend=false is too complex for the slow operation.
>>>>>> why? The device can just not to change the value of the SUSPEND bit
>>>>>> before it has fully suspended.
>>>>> When driver wrote, it wrote suspend=true, And device returns
>>>>> suspend=false while suspend is ongoing, right?
>>>>> If yes, this is expensive because the device needs to operate within
>>>>> 50nsec
>>>> or less to answer suspend=false.
>>>>> And even worst, it needs to suspend=true when unsuspending within
>>>> 50nsec when resuming is ongoing.
>>>> again, there is not a 50nsec constraining and please take a reference
>>>> of how DRIVER_OK work with virtualization.
>>> There is. The device is expected to return back the desired value to indicate
>> driver that suspend is ongoing.
>>> It is different than DRIVER_OK.
>> where does the spec or any code say 50nsec?
> Your spec and your previous comment implies that when device is busy suspending, it must returned the toggled value than what driver set it.
> Isn't it? It means device needs to react before the next read arrives from the driver, isn't it?
> I took that read as 50nsec example as you explained previously that driver will read very fast.
>
>>>>>>> If this understanding of burden is clear,
>>>>>>>
>>>>>>> The proposal is, can you please extend the interface such that,
>>>>>>>
>>>>>>> 1. driver writes suspend command.
>>>>>>> 2. driver reads suspend_status, and receives not_completed=(false).
>>>>>>> This is
>>>>>> the default value.
>>>>>>> 3. When the device completes suspend, it changes the polarity of
>>>>>> suspend_status=true.
>>>>>>> This has two main benefits:
>>>>>>> [A] This will enable software-based devices to write data to slow
>>>>>>> files and
>>>>>> does not have to force VM_EXITs.
>>>>>>> [B] It also enables hw based devices to not build special
>>>>>>> circuitry to answer
>>>>>> within 50nsec, which can get very complicated for tens or hundreds
>>>>>> of PCI PFs.
>>>>>> I think we have already discussed on this before in V5, and Jason
>>>>>> has some insightful comments
>>>>>>
>>>>> Unfortunately, not. His comment was that it is not specific to suspend.
>>>>> But here we are introducing a new interface and functionality that
>>>>> does not
>>>> need to suffer or follow anything that may not be efficient.
>>>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flor
>> %2F&data=05%7C02%7Cparav%40nvidia.com%7C4a248d6982514660015808
>> dccbf79
>> c55%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C6386095116052
>> 07133%7
>> CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJB
>> TiI6Ik
>> 1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=VGOfVe0deSDilzufY7NH
>> gU5RmciNh
>>>> dXMMQiVUSiP%2BKM%3D&reserved=0
>>>>>> e.kernel.org%2Fvirtio-comment%2F20240612082055-mutt-send-email-
>>>> mst%40
>>>>
>> kernel.org%2FT%2F%23mc817fc6ca12ff0bcbae62b43b6146a177ecf13a9&dat
>>>> a=05
>>>>
>> %7C02%7Cparav%40nvidia.com%7C619c82b60b824ca03f8808dcc89beb4a%7
>>>> C43083
>>>>
>> d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638605819247985989%7CUn
>>>> known%7C
>>>>
>> TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiL
>>>> CJXV
>>>>
>> CI6Mn0%3D%7C0%7C%7C%7C&sdata=VqmJmqt3k5tf3x4ihjE5Bd7u59GadOn
>>>> OaOfJ5lvG
>>>>>> 1DE%3D&reserved=0
>>>>>>>> this is out of the spec anyway.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Zhu Lingshan
>>>>>>>>>> Thanks
>>>>>>>>>> Zhu Lingshan
>>>>>>>>>>>> +\devicenormative{\subsection}{Device Suspend}{General
>>>>>>>>>>>> +Initialization And Device Operation / Device Suspend}
>>>>>>>>>>>> +
>>>>>>>>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set or
>>>>>>>>>>>> VIRTIO_F_SUSPEND is not negotiated.
>>>>>>>>>>>> +
>>>>>>>>>>>> +The device MUST ignore all access to its Configuration Space
>>>>>>>>>>>> +while suspended, except for \field{device status} if it is
>>>>>>>>>>>> +part of the Configuration
>>>>>>>>>>>> Space.
>>>>>>>>>>>> +
>>>>>>>>>>>> +A device MUST NOT send any notifications for any virtqeuues,
>>>>>>>>>>>> +access any virtqueues, or modify any fields in its
>>>>>>>>>>>> +Configuration Space while suspended.
>>>>>>>>>>>> +
>>>>>>>>>>>> +If changes occur in the Configuration Space while the
>>>>>>>>>>>> +SUSPEND bit is set, the device MUST NOT send any
>>>>>>>>>>>> +configuration change
>>>>>>>> notifications.
>>>>>>>>>>>> +Instead, the device MUST send the notification after the
>>>>>>>>>>>> +SUSPEND bit has
>>>>>>>>>>>> been cleared.
>>>>>>>>>>>> +
>>>>>>>>>>>> +When the driver sets SUSPEND, the device MUST either
>> suspend
>>>>>>>>>>>> +itself or set
>>>>>>>>>>>> DEVICE_NEEDS_RESET if failed to suspend.
>>>>>>>>>>>> +
>>>>>>>>>>>> +If SUSPEND is set in \field{device status}, when the driver
>>>>>>>>>>>> +clears SUSPEND, the device MUST either resume normal
>>>> operation
>>>>>>>>>>>> +or set
>>>>>>>>>>>> DEVICE_NEEDS_RESET.
>>>>>>>>>>>> +
>>>>>>>>>>>> +When the driver sets SUSPEND, the device SHOULD perform
>> the
>>>>>>>>>>>> +following actions before presenting that
>>>>>>>>>>>> the SUSPEND bit is set to 1 in the \field{device status}:
>>>>>>>>>>>> +
>>>>>>>>>>>> +\begin{itemize}
>>>>>>>>>>>> +\item Stop processing more buffers of any virtqueues \item
>>>>>>>>>>>> +Wait until all buffers that are being processed have been used.
>>>>>>>>>>>> +\item Send used buffer notifications to the driver.
>>>>>>>>>>>> +\end{itemize}
>>>>>>>>>>>> +
>>>>>>>>>>>>  \chapter{Virtio Transport Options}\label{sec:Virtio
>>>>>>>>>>>> Transport Options}
>>>>>>>>>>>>
>>>>>>>>>>>>  Virtio can use various different buses, thus the standard is
>>>>>>>>>>>> split @@ -872,6
>>>>>>>>>>>> +923,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved
>>>>>>>>>>>> +Feature
>>>>>>>>>>>> Bits}
>>>>>>>>>>>>  	\ref{devicenormative:Basic Facilities of a Virtio Device /
>>>>>>>>>>>> Feature Bits} for
>>>>>>>>>>>>  	handling features reserved for future use.
>>>>>>>>>>>>
>>>>>>>>>>>> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that
>>>>>>>>>>>> + the driver
>>>>>>>> can
>>>>>>>>>>>> +   trigger suspending the device via the SUSPEND flag
>>>>>>>>>>>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device
>>>>>>>>>>>> + Status
>>>> Field}.
>>>>>>>>>>>> +
>>>>>>>>>>>>  \end{description}
>>>>>>>>>>>>
>>>>>>>>>>>>  \drivernormative{\section}{Reserved Feature Bits}{Reserved
>>>>>>>>>>>> Feature Bits}
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.45.2
>>>>>>>>>>>>


  reply	other threads:[~2024-09-05  7:21 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-01 11:35 [PATCH V7 v7] virtio: introduce SUSPEND bit in device status Zhu Lingshan
2024-08-13  4:42 ` Parav Pandit
2024-08-13  5:44   ` Zhu Lingshan
2024-08-13  5:50     ` Parav Pandit
2024-08-13  6:14       ` Zhu Lingshan
2024-08-13  6:55         ` Parav Pandit
2024-08-15  8:23           ` Zhu Lingshan
2024-08-15  9:34             ` Parav Pandit
2024-08-30  2:31               ` Zhu Lingshan
2024-08-30  3:02                 ` Parav Pandit
2024-09-03  9:05                   ` Zhu Lingshan
2024-09-03  9:45                     ` Michael S. Tsirkin
2024-09-03 10:09                       ` Parav Pandit
2024-09-03 10:35                         ` Michael S. Tsirkin
2024-09-03 10:37                           ` Michael S. Tsirkin
2024-09-04  3:07                             ` Jason Wang
2024-09-04  4:02                               ` Michael S. Tsirkin
2024-09-04  6:31                                 ` Jason Wang
2024-09-04  6:38                                   ` Zhu Lingshan
2024-09-04  6:46                                     ` Parav Pandit
2024-09-05  7:14                                       ` Zhu Lingshan
2024-09-05  7:16                                         ` Parav Pandit
2024-09-05  7:29                                           ` Zhu Lingshan
2024-09-05  7:35                                             ` Parav Pandit
2024-09-05  8:30                                               ` Zhu Lingshan
2024-09-05  8:41                                                 ` David Stevens
2024-09-06  1:53                                                   ` Parav Pandit
2024-09-05  7:17                                         ` Michael S. Tsirkin
2024-09-05  7:31                                           ` Zhu Lingshan
2024-09-05  7:34                                             ` Parav Pandit
2024-09-05  6:51                                     ` Michael S. Tsirkin
2024-09-05  7:12                                       ` Zhu Lingshan
2024-09-05  8:12                                         ` Michael S. Tsirkin
2024-09-05  9:09                                           ` Zhu Lingshan
2024-09-06  1:54                                             ` Parav Pandit
2024-09-05 23:51                                           ` Jason Wang
2024-09-11  3:52                                             ` Zhu Lingshan
2024-09-11 10:20                                             ` Michael S. Tsirkin
2024-09-12  2:05                                               ` Jason Wang
2024-09-12  5:44                                                 ` Michael S. Tsirkin
2024-09-24  7:35                                                   ` Jason Wang
2024-09-24 23:05                                                     ` Michael S. Tsirkin
2024-09-25  3:47                                                       ` Jason Wang
2024-09-25 11:17                                                         ` Michael S. Tsirkin
2024-09-27  4:08                                                           ` Jason Wang
2024-09-29 17:55                                                             ` Michael S. Tsirkin
2024-10-17  6:56                                                               ` Jason Wang
2024-09-03 10:28                     ` Parav Pandit
2024-09-05  7:20                       ` Zhu Lingshan [this message]
2024-08-15 10:45             ` Michael S. Tsirkin
2024-08-30  2:32               ` Zhu Lingshan
2024-08-15 10:52           ` Michael S. Tsirkin
2024-08-15 10:59             ` Parav Pandit
2024-08-15 15:07               ` Michael S. Tsirkin
2024-08-17  5:19                 ` Parav Pandit
2024-08-30  2:37                 ` Zhu Lingshan
2024-08-30  3:10                   ` Parav Pandit
2024-09-03  8:51                     ` Zhu Lingshan
2024-09-03  8:55                       ` Parav Pandit
2024-09-03  9:36                       ` Michael S. Tsirkin
2024-09-05  7:27                         ` Zhu Lingshan
2024-09-24 23:07                           ` Michael S. Tsirkin
2024-08-13  7:51   ` Michael S. Tsirkin
2024-08-13  7:58     ` Parav Pandit
2024-08-13  8:03       ` Michael S. Tsirkin
2024-08-13  8:01 ` Michael S. Tsirkin
2024-08-15  9:12   ` Zhu Lingshan
2024-08-15 10:50     ` Michael S. Tsirkin
2024-08-30  2:20       ` Zhu Lingshan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de10e802-c0d7-4cf6-b1f8-3624b47e191c@amd.com \
    --to=lingshan.zhu@amd.com \
    --cc=cohuck@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=mst@redhat.com \
    --cc=parav@nvidia.com \
    --cc=stevensd@chromium.org \
    --cc=virtio-comment@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox