From: "Michael S. Tsirkin" <mst@redhat.com>
To: Ryan Harper <ryanh@us.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
yamahata@valinux.co.jp, Markus Armbruster <armbru@redhat.com>,
qemu-devel@nongnu.org,
Anthony Liguori <aliguori@linux.vnet.ibm.com>,
Stefan Hajnoczi <stefan.hajnoczi@uk.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
Date: Wed, 3 Nov 2010 20:02:24 +0200 [thread overview]
Message-ID: <20101103180224.GB19117@redhat.com> (raw)
In-Reply-To: <20101103172910.GT3469@us.ibm.com>
On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
> * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> > Ryan Harper <ryanh@us.ibm.com> writes:
> >
> > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > >> > > > > > > >
> > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> > >> > > > > > > > between notification and disconnect.
> > >> > > > > > >
> > >> > > > > > > The problem with this is that it has no analog in real world.
> > >> > > > > > > In real world, you can send some notifications to the guest, and you can
> > >> > > > > > > remove the card. Tying them together is what created the problem in the
> > >> > > > > > > first place.
> > >> > > > > > >
> > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > >> > > > > > > being shown to the user.
> > >> > > > > >
> > >> > > > > > Very true. I'm fine with forcing a disconnect during the removal path
> > >> > > > > > prior to notification. Do we want a new disconnect method at the device
> > >> > > > > > level (pci)? or just use the existing removal callback and call that
> > >> > > > > > during the initial hotremov event?
> > >> > > > >
> > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> > >> > > > > differently wrt surprise or ordered removal. So probably the existing
> > >> > > > > callback should do. I don't think we need to talk about disconnect:
> > >> > > > > since we decided we are emulating device removal, let's call it
> > >> > > > > just that.
> > >> > > >
> > >> > > > Because current the "removal" process depends on the guest actually
> > >> > > > responding. What I'm suggesting is that, in Marcus's term, and what
> > >> > > > drive_unplug() implements, is to disconnect the host block device from
> > >> > > > the guest device to prevent any further access to it in the case the
> > >> > > > guest doesn't respond to the removal request made via ACPI.
> > >> > > >
> > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> > >> > > > command so to complete the device removal operation without waiting for
> > >> > > > the guest to respond; that's what's going to happen if we invoke the
> > >> > > > response callback; it will appear as if the guest responded whether it
> > >> > > > did or not.
> > >> > > >
> > >> > > > What I was suggesting above was to instead of calling the callback for
> > >> > > > handing the guest response was to add a device function called
> > >> > > > disconnect which would remove any association of host resources from
> > >> > > > guest resources before we notified the guest. Thinking about it again
> > >> > > > I'm not sure this is useful, but if we're going to remove the device
> > >> > > > without the guests knowledge, I'm not sure how useful sending the
> > >> > > > removal requests via ACPI is in the first place.
> > >> > > >
> > >> > > > My feeling is that I'd like to have explicit control over the disconnect
> > >> > > > from host resources separate from the device removal *if* we're going to
> > >> > > > retain the guest notification. If we don't care to notify the guest,
> > >> > > > then we can just do device removal without notifying the guest
> > >> > > > and be done with it.
> > >> > >
> > >> > > I imagine management would typically want to do this:
> > >> > > 1. notify guest
> > >> > > 2. wait a bit
> > >> > > 3. remove device
> > >> >
> > >> > Yes; but this argues for (1) being a separate command from (3)
> > >>
> > >> Yes. Long term I think we will want a way to do that.
> > >>
> > >> > unless we
> > >> > require (3) to include (1) and (2) in the qemu implementation.
> > >> >
> > >> > Currently we implement:
> > >> >
> > >> > 1. device_del (attempt to remove device)
> > >> > 2. notify guest
> > >> > 3. if guest responds, remove device
> > >> > 4. disconnect host resource from device on destruction
> > >> >
> > >> > With my drive_unplug patch we do:
> > >> >
> > >> > 1. disconnect host resource from device
> > >>
> > >> This is what drive_unplug does, right?
> > >
> > > Correct.
> > >
> > >>
> > >> > 2. device_del (attempt to remove device)
> > >> > 3. notify guest
> > >> > 4. if guest responds, remove device
> > >> >
> > >> > I think we're suggesting to instead do (if we keep disconnect as part of
> > >> > device_del)
> > >> >
> > >> > 1. device_del (attemp to remove device)
> > >> > 2. notify guest
> > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> > >> > 4. if guest responds, invoke device destruction path a second time.
> > >>
> > >> By response you mean eject? No, this is not what I was suggesting.
> > >> I was really suggesting that your patch is fine :)
> > >> Sorry about confusion.
> > >
> > > I don't mean eject; I mean responding to the ACPI event by writing a
> > > response to the PCI chipset which QEMU then in turn will invoke the
> > > qdev_unplug() path which ultimately kills the device and the Drive and
> > > BlockState objects.
> > >
> > >>
> > >> I was also saying that from what I hear, the pci express support
> > >> will at some point need interfaces to
> > >> - notify guest about device removal/addition
> > >> - get eject from guest
> > >> - remove device without talking to guest
> > >> - add device without talking to guest
> > >> - suppress device deletion on eject
> > >>
> > >> All this can be generic and can work through express
> > >> configuration mechanisms or through acpi for pci.
> > >> But this is completely separate from unplugging
> > >> the host backend, which should be possible at any point.
> > >
> > > Yes. I think we've worked out that we do want an independent
> > > unplug/disconnect mechanism rather than tying it to device_del.
> > >
> > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> > > and that instead of having device_del always succeed and replacing it
> > > with a shell, we'd need to provide an explicit command to do the
> > > disconnect in a similar fashion to how we're doing drive_unplug?
> >
> > I'm not sure I parse this.
>
> You were asking for net and block disconnect to have similar mechanisms.
> You mentioned the net fix for suprise removal was to have device_del()
> always succeed by replacing the device with a shell/zombie. The
> drive_unplug() patch doesn't do the same thing; it doesn't affect the
> device_del() path at all, rather it provides mgmt apps a hook to
> directly disconnect host resource from guest resource.
Yes, the shell thing is just an implementation detail.
> >
> > > With at least two of these device types needing an explicit disconnect
> > > to sever the bond between host/guest makes me want a device-level
> > > interface for doing the disconnect that each device can implement
> > > differently.
> >
> > I'm fine with having a separate command to forcibly disconnect a device
> > from its host resources.
> >
> > Typical use:
> >
> > 1. device_del
> > ask guest to give up device, via ACPI
> >
> > 2a. guest replies "done", delete device, free host resources
> >
> > 2b. timeout, device_disconnect (or however we call that)
> >
> > Is this what you have in mind?
>
> Yeah, aboslutely. I think Michael was saying we should implement 2b in
> the mgmt stack. The current libvirt does the following
>
> 1. mgmt invokes detach-device
> 2. device_del
> 3. update mgmt view of resources, assumes guest has done it's part; does
> not confirm with qemu that device has been deleted.
>
> With drive_unplug in qemu and a patch to libvirt, it looks like:
>
> 1. mgmt invokes detach-device
> 2a. call drive_unplug, log warning if drive_unplug isn't available
> 2b. device_del
> 3. update mgmt view of resources, assumes guest has done it's part; does
> not confirm with qemu that device has been deleted.
>
> I can look at implementing the timeout before invoking the unplug
> (that's a bit tricky) in libvirt;
So we'd
1. reorder 2a and 2b, and add a small timeout
2. teach libvirt not to reuse the PCI slot and device id
until it is really free
Sounds good.
> but given the fact that the mgmt is
> invoking the removal I think it's reasonable to do forced disconnect
> (even if the guest hasn't responded).
This is really making an assumption about the user.
Giving the guest a bit of time to respond with eject seems prudent.
For disk we risk losing data otherwise.
The only reason we are pushing this out to management is so
it can track state implement timeouts and interact with the user.
If it doesn't, what's the point? Let's keep it all in qemu...
> >
> >
> > With qdev, device models are connected to host resources with special
> > properties such as qdev_prop_netdev and qdev_prop_drive. Thus, generic
> > qdev code can already find and disconnect them.
> >
> > How can we make sure device models survive such a disconnect?
> >
> > * Ask the device to disconnect itself (new DeviceInfo method).
> > Drawback: duplicates common functionality in every device model.
> > More code, more bugs.
> >
> > * Let qdev core disconnect and free host resources
> >
> > - and replace them with dummies. I guess we'd need a dummy
> > constructor method for that, in PropertyInfo. Done right, device
> > models should be able to carry on unawares.
> >
> > - and leave them null. Device models need to cope with that. NICs
> > do for netdev.
> >
>
> I like the latter here; the BlockDriverState handles nulls. I think
> netdev should be able to as well though I haven't looked very closely
> though so maybe Michael can confirm if that's a true statement.
Not at the moment: the issue is that NULL means legacy vlan setup there.
We can rework the code to avoid that assumption but it's not on my
priority list.
> > We might need to notify the device model (new DeviceInfo method).
> > Dunno.
>
>
>
> --
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com
next prev parent reply other threads:[~2010-11-03 18:02 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-25 18:22 [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Ryan Harper
2010-10-25 18:22 ` [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id Ryan Harper
2010-10-29 13:18 ` Markus Armbruster
2010-10-25 18:22 ` [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug() Ryan Harper
2010-10-29 14:01 ` Markus Armbruster
2010-10-29 14:15 ` Anthony Liguori
2010-10-29 14:29 ` Kevin Wolf
2010-10-29 14:40 ` Anthony Liguori
2010-10-29 14:57 ` Kevin Wolf
2010-10-29 15:28 ` Anthony Liguori
2010-10-29 16:08 ` Kevin Wolf
2010-10-30 13:25 ` Christoph Hellwig
2010-10-29 15:28 ` Markus Armbruster
2010-11-01 21:06 ` Ryan Harper
2010-10-25 18:22 ` [Qemu-devel] [PATCH 3/3] Add qmp version of drive_unplug Ryan Harper
2010-10-29 14:12 ` [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Markus Armbruster
2010-10-29 15:03 ` Ryan Harper
2010-10-29 16:10 ` Markus Armbruster
2010-10-29 16:50 ` Ryan Harper
2010-11-02 9:40 ` Markus Armbruster
2010-11-02 13:22 ` Michael S. Tsirkin
2010-11-02 13:41 ` Kevin Wolf
2010-11-02 13:46 ` Ryan Harper
2010-11-02 13:58 ` Michael S. Tsirkin
2010-11-02 14:22 ` Ryan Harper
2010-11-02 15:46 ` Michael S. Tsirkin
2010-11-02 16:53 ` Ryan Harper
2010-11-02 17:59 ` Michael S. Tsirkin
2010-11-02 19:01 ` Ryan Harper
2010-11-02 19:17 ` Michael S. Tsirkin
2010-11-02 20:23 ` Ryan Harper
2010-11-03 7:21 ` Michael S. Tsirkin
2010-11-03 12:04 ` Ryan Harper
2010-11-03 16:41 ` Markus Armbruster
2010-11-03 17:29 ` Ryan Harper
2010-11-03 18:02 ` Michael S. Tsirkin [this message]
2010-11-03 20:59 ` Ryan Harper
2010-11-03 21:26 ` Michael S. Tsirkin
2010-11-04 16:45 ` Ryan Harper
2010-11-04 17:04 ` Michael S. Tsirkin
2010-11-05 13:27 ` Markus Armbruster
2010-11-05 14:17 ` Michael S. Tsirkin
2010-11-05 14:29 ` Ryan Harper
2010-11-05 16:01 ` Markus Armbruster
2010-11-08 21:02 ` Michael S. Tsirkin
2010-11-05 14:25 ` Ryan Harper
2010-11-05 16:10 ` Markus Armbruster
2010-11-05 16:22 ` Ryan Harper
2010-11-06 8:18 ` Markus Armbruster
2010-11-08 2:19 ` Ryan Harper
2010-11-08 10:32 ` Markus Armbruster
2010-11-08 10:49 ` Michael S. Tsirkin
2010-11-08 12:03 ` Markus Armbruster
2010-11-08 14:02 ` Ryan Harper
2010-11-08 16:56 ` Michael S. Tsirkin
2010-11-08 17:04 ` Daniel P. Berrange
2010-11-08 18:41 ` Ryan Harper
2010-11-08 18:39 ` Ryan Harper
2010-11-08 19:06 ` Daniel P. Berrange
2010-11-08 16:34 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101103180224.GB19117@redhat.com \
--to=mst@redhat.com \
--cc=aliguori@linux.vnet.ibm.com \
--cc=armbru@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=ryanh@us.ibm.com \
--cc=stefan.hajnoczi@uk.ibm.com \
--cc=yamahata@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).