Discussion of the VIRTIO specification
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Halil Pasic <pasic@linux.ibm.com>,
	stefanha@redhat.com, virtio-comment@lists.oasis-open.org,
	mst@redhat.com, eperezma@redhat.com, sgarzare@redhat.com
Subject: Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
Date: Tue, 22 Dec 2020 16:54:31 +0100	[thread overview]
Message-ID: <20201222165431.3f49de29.cohuck@redhat.com> (raw)
In-Reply-To: <7da54d5b-0787-c78f-4b35-6a4f7ed2f5bf@redhat.com>

On Tue, 22 Dec 2020 20:51:04 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/12/22 下午8:14, Cornelia Huck wrote:
> > On Tue, 22 Dec 2020 15:30:31 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2020/12/22 下午2:50, Halil Pasic wrote:  
> >>> On Tue, 22 Dec 2020 10:36:41 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>     
> >>>> On 2020/12/22 上午5:33, Halil Pasic wrote:  
> >>>>> On Fri, 18 Dec 2020 12:23:02 +0800
> >>>>> Jason Wang <jasowang@redhat.com> wrote:
> >>>>>     
> >>>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be
> >>>>>> used by the driver to stop and resume a device. The main user will be
> >>>>>> live migration support for virtio device.
> >>>>>>     
> >>>>> Can you please provide some more background information, or point
> >>>>> me to the appropriate discussion?
> >>>>>
> >>>>> I mean AFAIK migration already works without this driver initiated
> >>>>> drain. What is the exact motivation? What about the big picture? I
> >>>>> guess some agent in the guest would have to make the driver issue
> >>>>> the DEVICE_STOP.  
> >>>> This is not necessary if the datapath is done inside qemu and when
> >>>> migration is initiated by qemu itself.
> >>>>
> >>>> But it's a must for using virtio-device as a backend for emulated virtio
> >>>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device
> >>>> then it can safely synchronize the state from them.
> >>>>     
> >>> You say, in this case qemu needs to stop the device, which makes sense
> >>> (it also has to do this when the datapath is implemented in qemu), but
> >>> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm
> >>> confused.  
> >>
> >> It's initiated by Qemu. Guest is unware of live migration.  
> > But isn't setting DEVICE_STOPPED a _driver_ initiated process? That
> > sounds like "guest" to me.  
> 
> 
> I think we need clarification on the "driver".

Looking through the spec, I think we have never properly defined what
we mean by "device" or "driver". Not sure what a good definition would
look like. I would consider a "device" an entity that is present and is
configured/used by a "driver". The "driver" is the means to actually
configure/use a "device", i.e. the "initiating" part (even though some
actions are initiated by the device once the driver has started
interacting with it).

> Think the following setup:
> 
> Guest driver <-> Qemu <-> vhost-vdpa <-> vDPA parent (virtio-pci vDPA 
> driver)
> 
> It's the Qemu that initiates the stop, and the request is forwarded to 
> virtio-pci vDPA driver. So the process is transparent to the guest (driver).

So it's

Guest driver <-> QEMU <-> vhost-vdpa <-> vDPA parent
|<-- guest -->|<-------------- host -------------->|

but the vDPA parent is itself acting as a "driver" if you take my
attempt at a definition above?

(Wouldn't that mean that the device in QEMU is configured/used from two
entities?)

> 
> 
> >  
> >>  
> >>> I'm still curious about how the different components in the stack
> >>> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are
> >>> supposed to interact.  
> >>
> >> It works like:
> >>
> >>   From Qemu point of view, vhost-vDPA is just another type of vhost
> >> backend. Qemu needs to stop virtio (vhost) before it can do migration.
> >> So we require vDPA devices to have the ability of stopping or pausing
> >> its datapath. If the vDPA device is by chance the virtio-PCI device, it
> >> needs an interface for receiving stop/resume command from the driver.
> >>
> >> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then
> >> to vDPA parent which could be a virtio-PCI device in this case.  
> > But QEMU implements the _device_, not the driver, doesn't it?  
> 
> 
> The device is implemented with the co-operation between Qemu and 
> vhost-vDPA. During migration qemu need to stop the virtio-net device, 
> then vhost must be stopped as well.

Yes, it makes sense that any vhost parts need to be stopped as well.

> 
> 
> > And IIUC,
> > vhost-VDPA and the vDPA parent are also on the device side. I feel like
> > I'm missing something essential here.  
> 
> 
> Virtio-PCI driver could be a vDPA parent as well in this case. So we 
> need stop the virtio-pci device.

Who is the "driver" in that case?

> 
> 
> >  
> >>  
> >>>     
> >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>>>> ---
> >>>>>>     content.tex | 26 ++++++++++++++++++++++++--
> >>>>>>     1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/content.tex b/content.tex
> >>>>>> index 61eab41..4392b60 100644
> >>>>>> --- a/content.tex
> >>>>>> +++ b/content.tex
> >>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> >>>>>>     \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
> >>>>>>       drive the device.
> >>>>>>     
> >>>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
> >>>>>> +  indicates that the device has been stopped by the driver.
> >>>>>> +  
> >>>>> AFAIU it is not only about indicating stopped, but also requesting to be
> >>>>> stopped.
> >>>>>
> >>>>> More importantly, that must not be set immediately, in a sense that the
> >>>>> one side initiates some action by requesting the bit to be set, and the
> >>>>> other side must not set the bit before the action is performed.  
> >>>> Yes.
> >>>>
> >>>>     
> >>>>> We also
> >>>>> seem to assume that every device implementation is capable of performing
> >>>>> this trick.  
> >>>> A dedicated feature bit is introduced for this.
> >>>>     
> >>> This is not about the feature bit, but about the mechanism. But your
> >>> subsequent answers explain, that this is nothing unusual, and then
> >>> we should be fine.
> >>>        
> >>>>> Is it for hardware devices (e.g. PCI) standard to request an
> >>>>> operation by writing some value into a register, and get feedback bout
> >>>>> a non-completion by reading different value that written,  
> >>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works
> >>>> like this:
> >>>>
> >>>> """
> >>>>
> >>>> The device MUST NOT offer a feature which requires another feature which
> >>>> was not offered. The device SHOULD accept any valid subset of features
> >>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
> >>>> status bit when the driver writes it.
> >>>>
> >>>> """
> >>>>     
> >>> Thanks for the pointer. I intend to have another look at how FEATURES_OK
> >>> works, and how similar this is to DEVICE_STOPPED.  
> >>
> >> My understanding is that for both of them, driver can try to set the bit
> >> by writing to the status register but it's the device that decide when
> >> to set the bit.  
> > I think there's a difference: For FEATURES_OK, the driver can read back
> > the status and knows that the feature combination is not accepted if it
> > is not set. For DEVICE_STOPPED, it does not really know whether the
> > device has started to stop yet. It only knows that the device is
> > actually done stopping when DEVICE_STOPPED is set.  
> 
> 
> The only difference is that device may need sometime to be stopped. 
> FEATURES_OK might not be the best example. Let's see how reset work 
> which is more similar to DEVICE_STOPPED:
> 
> """
> 
> After writing 0 to device_status, the driver MUST wait for a read of 
> device_status to return 0 before reinitializing the device.
> 
> """

Hm, but that is PCI-specific writing of fields in the configuration
structure, right? CCW uses a different method for resetting (an
asynchronous channel command; channel commands create an interrupt when
done); "write 0 to status for reset" is not universal.

Which makes me think: is using the device status a good universal
method for stopping the device? If I look at it only from the CCW
perspective, I'd have used a new channel command with a payload of
stop/resume and only reflected the stopped state in the device status
(i.e. read-only from the driver, and only changed by the device when it
transitions to/from the stopped state.) Could PCI use a new field for
that?

> 
> >
> > Do we need a different mechanism for the device to signal the driver
> > that the device has actually been stopped? If the driver sees the
> > STOPPED status after reading back, it knows that the device has
> > acknowledged the request and is now stopping down. If it is not set, it
> > means that the device has not honoured the request. Same for clearing
> > it to resume.  
> 
> 
> I don't see much difference. Device reset doesn't have such notification 
> and driver may simply poll or use a timer.

See above; CCW does not really use the device status in the same way. I
think a notification would be useful.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


  reply	other threads:[~2020-12-22 15:54 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-18  4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang
2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella
2020-12-21  3:08   ` Jason Wang
2020-12-21 11:06     ` Stefano Garzarella
2020-12-22  2:38       ` Jason Wang
2020-12-21 21:33 ` [virtio-comment] " Halil Pasic
2020-12-22  2:36   ` Jason Wang
2020-12-22  6:50     ` Halil Pasic
2020-12-22  7:30       ` Jason Wang
2020-12-22 12:14         ` Cornelia Huck
2020-12-22 12:51           ` Jason Wang
2020-12-22 15:54             ` Cornelia Huck [this message]
2020-12-23  2:48               ` Jason Wang
2020-12-25  7:38                 ` Halil Pasic
2020-12-27 10:00                   ` Michael S. Tsirkin
2020-12-28  6:21                     ` Halil Pasic
2020-12-28  7:01                       ` Jason Wang
2020-12-28 12:30                         ` Michael S. Tsirkin
2020-12-29  9:04                           ` Jason Wang
2021-01-12 10:54                             ` Michael S. Tsirkin
2021-01-13  3:35                               ` Jason Wang
2020-12-29 13:35                         ` Halil Pasic
2020-12-30  8:15                           ` Jason Wang
2021-01-11 18:16                             ` Cornelia Huck
2021-01-12  3:27                               ` Jason Wang
2021-01-12 12:13                                 ` Cornelia Huck
2021-01-13  2:52                                   ` Jason Wang
2021-01-14 12:00                                     ` Cornelia Huck
2020-12-28  6:47                   ` Jason Wang
2020-12-29 13:20                     ` Halil Pasic
2020-12-30  8:03                       ` Jason Wang
2020-12-24  4:52             ` Halil Pasic
2020-12-24  5:51               ` Jason Wang
2020-12-25  3:18                 ` Halil Pasic
2020-12-25  6:45                   ` Jason Wang
2020-12-27 11:12                     ` Michael S. Tsirkin
2020-12-28  7:05                       ` Jason Wang
2020-12-28 12:27                         ` Michael S. Tsirkin
2020-12-29  8:57                           ` Jason Wang
2021-05-03  9:02 ` [virtio-comment] " Eugenio Perez Martin
2021-05-06  2:51   ` Jason Wang
2021-05-05 13:16 ` Michael S. Tsirkin
2021-05-06  7:26   ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201222165431.3f49de29.cohuck@redhat.com \
    --to=cohuck@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=mst@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtio-comment@lists.oasis-open.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox