qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Ryan Harper <ryanh@us.ibm.com>
Cc: Stefan Hajnoczi <stefan.hajnoczi@uk.ibm.com>,
	Anthony Liguori <aliguori@linux.vnet.ibm.com>,
	Markus Armbruster <armbru@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
Date: Tue, 2 Nov 2010 15:58:27 +0200	[thread overview]
Message-ID: <20101102135827.GB31007@redhat.com> (raw)
In-Reply-To: <20101102134622.GG22904@us.ibm.com>

On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote:
> * Markus Armbruster <armbru@redhat.com> [2010-11-02 04:40]:
> > Ryan Harper <ryanh@us.ibm.com> writes:
> > 
> > > * Markus Armbruster <armbru@redhat.com> [2010-10-29 11:11]:
> > >> Ryan Harper <ryanh@us.ibm.com> writes:
> > >> 
> > >> > * Markus Armbruster <armbru@redhat.com> [2010-10-29 09:13]:
> > >> >> [Note cc: Michael]
> > >> >> 
> > >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> > >> >> 
> > >> >> 
> > >> >> If I understand your patch correctly, the difference between your
> > >> >> drive_unplug and my blockdev_del is as follows:
> > >> >> 
> > >> >> * drive_unplug forcefully severs the connection between the host part of
> > >> >>   the block device and its BlockDriverState.  A shell of the host part
> > >> >>   remains, to be cleaned up later.  You need forceful disconnect
> > >> >>   operation to be able to revoke access to an image whether the guest
> > >> >>   cooperates or not.  Fair enough.
> > >> >> 
> > >> >> * blockdev_del deletes a host part.  My current version fails when the
> > >> >>   host part is in use.  I patterned that after netdev_del, which used to
> > >> >>   work that way, until commit 2ffcb18d:
> > >> >> 
> > >> >>     Make netdev_del delete the netdev even when it's in use
> > >> >>     
> > >> >>     To hot-unplug guest and host part of a network device, you do:
> > >> >>     
> > >> >>         device_del NIC-ID
> > >> >>         netdev_del NETDEV-ID
> > >> >>     
> > >> >>     For PCI devices, device_del merely tells ACPI to unplug the device.
> > >> >>     The device goes away for real only after the guest processed the ACPI
> > >> >>     unplug event.
> > >> >>     
> > >> >>     You have to wait until then (e.g. by polling info pci) before you can
> > >> >>     unplug the netdev.  Not good.
> > >> >>     
> > >> >>     Fix by removing the "in use" check from do_netdev_del().  Deleting a
> > >> >>     netdev while it's in use is safe; packets simply get routed to the bit
> > >> >>     bucket.
> > >> >> 
> > >> >>   Isn't this the very same problem that's behind your drive_unplug?
> > >> >
> > >> > Yes it is.
> > >> >
> > >> >> 
> > >> >> I'd like to have some consistency among net, block and char device
> > >> >> commands, i.e. a common set of operations that work the same for all of
> > >> >> them.  Can we agree on such a set?
> > >> >
> > >> > Yeah; the current trouble (or at least what I perceive to be trouble) is
> > >> > that in the case where the guest responds to device_del induced ACPI
> > >> > removal event; the current qdev code already does the host-side device
> > >> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
> > >> > after the device_del.  What happens when we do:
> > >> >
> > >> > device_del
> > >> > ACPI to guest
> > >> > blockdev_del /* removes host-side device */
> > >> 
> > >> Fails in my tree, because the blockdev's still in use.  See below.
> > >> 
> > >> > guest responds to ACPI
> > >> > qdev calls pci device removal code
> > >> > qemu attempts to destroy the associated host-side block
> > >> >
> > >> > That may just work today; and if not, it shouldn't be hard to fix up the
> > >> > code to check for NULLs
> > >> 
> > >> I hate the automatic deletion of host part along with the guest part.
> > >> device_del should undo device_add.  {block,net,char}dev_{add,del} should
> > >> be similarly paired.
> > >
> > > Agreed.
> > >> 
> > >> In my blockdev branch, I keep the automatic delete only for backwards
> > >> compatibility: if you create the drive with drive_add, it gets
> > >> auto-deleted, but if you use blockdev_add, it stays around.
> > >
> > > But what to do about the case where we're doing drive_add and then a
> > > device_del()  That's the urgent situation that needs to be resolved.
> > 
> > What's the exact problem we need to solve urgently?
> > 
> > Is it "provide means to cut the connection to the host part immediately,
> > even with an uncooperative guest"?
> 
> Yes, need to ensure that if the mgmt layer (libvirt) has done what it
> believes should have disassociated the host block device from the guest,
> we want to ensure that the host block device is no longer accessible
> from the guest.
> 
> > 
> > Does this need to be separate from device_del?
> 
> no, it doesn't have to be.  Honestly, I didn't see a clear way to do
> something like unplug early in the device_del because that's all pci
> device code which has no knowledge of host block devices; having it
> disconnect seemed like a layering violation.

We invoke the cleanup callback, isn't that enough?

> > 
> > >> >> Even if your drive_unplug shouldn't fit in that set, we might want it as
> > >> >> a stop-gap.  Depends on how urgent the need for it is.  Yet another
> > >> >> special-purpose command to be deprecated later.
> > >> >
> > >> > The fix is urgent; but I'm willing to spin a couple patches if it helps
> > >> > get this into better shape.
> > >> 
> > >> Can we agree on a common solution for block and net?  That's why I cc'ed
> > >> Michael.
> > >
> > > I didn't see a good way to have block behave the same as net; though I
> > > do agree that it would be good to have this be common, long term.
> > 
> > If we can't make them behave 100% the same, then the next best thing is
> > to offer a preferred way to do things that works similarly enough to let
> > users ignore the differences.
> > 
> > Possible preferred ways to revoke access to a host part:
> > 
> > A. device_del
> > 
> >    Need to make device_del cut the connection right away instead of when
> >    the guest completes unplug.
> > 
> >    device_del changes behavior.  Any problems with that?
> 
> I don't think so; current mgmt consumers assume a cooperative guest and
> don't handle an uncooperative one right now in the case where Selinux is
> disabled; so modifying this path to do a disconnect shouldn't be a
> problem.
> 
> > 
> >    Not an option if we need "cut the connection" to be separate from
> >    device_del.
> > 
> > B. FOO_del
> > 
> >    Got netdev_del.
> > 
> >    Need drive_del.  If drive is in use, replace it by a special "dead
> >    drive" without a device name (so the ID becomes available for new
> >    drives), then delete the original.
> > 
> >    Wart: drive_del doesn't work reliably after device_del, because the
> >    drive is auto-deleted when the guest completes unplug.  Not a problem
> >    for my blockdev_add/blockdev_del work-in-progress, because host parts
> >    created with blockdev_add don't auto-delete.
> > 
> > C. FOO_unplug
> > 
> >    You got a patch for drive_unplug.
> > 
> >    Need netdev_unplug.
> > 
> >    By the way, I hate "unplug", because it suggests relation to hot
> >    unplug.  What about "disconnect"?
> > 
> > Any preferences?
> 
> disconnect is fine.
> 
> > 
> > >> Currently, we have two different ways:
> > >> 
> > >> * The netdev way: "del" always succeeds
> > >> 
> > >>   How can it succeed if the host part is in use?
> > >> 
> > >>   If all device models are prepared to deal with a missing host part, we
> > >>   can delete it right away.
> > >> 
> > >>   Else, we need to replace it with a suitable zombie, which is
> > >>   auto-deleted when it goes out of use.  Such zombies are not be visible
> > >>   elsewhere, in particular, the ID becomes available immediately.
> > >> 
> > >> * The unplug way: "del" fails while in use, "unplug" always succeeds
> > >> 
> > >>   Feels a bit cleaner to me.  But changing netdev_del might not be
> > >>   acceptable.
> > >> 
> > >> Either way works for me as an user interface.  But I'd rather not have
> > >> both.
> > >> 
> > >> Next, we need to consider how to integrate this with the automatic
> > >> deletion of drives on qdev destruction.  That's too late for unplug, we
> > >> want that right in device_del.  I'd leave the stupid automatic delete
> > >> where it is now, in qdev destruction.  The C API need unplug and delete
> > >> separate for that.
> > >> 
> > >> 
> > >> Regardless of the way we choose, we need to think very clearly on how
> > >> exactly device models should behave when their host part is missing or a
> > >> zombie, and how that behavior appears in the guest.
> > >> 
> > >> For net, making it look exactly like a yanked out network cable would
> > >> make sense to me.
> > >> 
> > >> What about block?
> > >
> > > It seems to me that for block it's like cdrom with no disk, floppy with
> > > no media, hard disk that's gone bad.  I think we we throw EIO back; it's
> > > handled gracefully enough.  This is what happens when you do a
> > > drive_unplug with my patch; the application using the device gets IO
> > > errors.  That's expected if a drive were to suddently fail (which is
> > > what this looks like).  And certainly there is some responsibility
> > > at the mgmt console to ensure you're not unplugging a drive that you are
> > > currently using.
> > 
> > Total drive failure works for me.
> 
> OK
> 
> > 
> > "No media" is cute, but it's possible only for drives with removable
> > media.  I'd rather have all drives behave the same, whether their media
> > is removable or not.
> 
> right, I don't think there is any "no media" for hard disks.
> 
> 
> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com

  reply	other threads:[~2010-11-02 14:06 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-25 18:22 [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Ryan Harper
2010-10-25 18:22 ` [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id Ryan Harper
2010-10-29 13:18   ` Markus Armbruster
2010-10-25 18:22 ` [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug() Ryan Harper
2010-10-29 14:01   ` Markus Armbruster
2010-10-29 14:15     ` Anthony Liguori
2010-10-29 14:29       ` Kevin Wolf
2010-10-29 14:40         ` Anthony Liguori
2010-10-29 14:57           ` Kevin Wolf
2010-10-29 15:28             ` Anthony Liguori
2010-10-29 16:08               ` Kevin Wolf
2010-10-30 13:25                 ` Christoph Hellwig
2010-10-29 15:28       ` Markus Armbruster
2010-11-01 21:06     ` Ryan Harper
2010-10-25 18:22 ` [Qemu-devel] [PATCH 3/3] Add qmp version of drive_unplug Ryan Harper
2010-10-29 14:12 ` [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Markus Armbruster
2010-10-29 15:03   ` Ryan Harper
2010-10-29 16:10     ` Markus Armbruster
2010-10-29 16:50       ` Ryan Harper
2010-11-02  9:40         ` Markus Armbruster
2010-11-02 13:22           ` Michael S. Tsirkin
2010-11-02 13:41           ` Kevin Wolf
2010-11-02 13:46           ` Ryan Harper
2010-11-02 13:58             ` Michael S. Tsirkin [this message]
2010-11-02 14:22               ` Ryan Harper
2010-11-02 15:46                 ` Michael S. Tsirkin
2010-11-02 16:53                   ` Ryan Harper
2010-11-02 17:59                     ` Michael S. Tsirkin
2010-11-02 19:01                       ` Ryan Harper
2010-11-02 19:17                         ` Michael S. Tsirkin
2010-11-02 20:23                           ` Ryan Harper
2010-11-03  7:21                             ` Michael S. Tsirkin
2010-11-03 12:04                               ` Ryan Harper
2010-11-03 16:41                                 ` Markus Armbruster
2010-11-03 17:29                                   ` Ryan Harper
2010-11-03 18:02                                     ` Michael S. Tsirkin
2010-11-03 20:59                                       ` Ryan Harper
2010-11-03 21:26                                         ` Michael S. Tsirkin
2010-11-04 16:45                                           ` Ryan Harper
2010-11-04 17:04                                             ` Michael S. Tsirkin
2010-11-05 13:27                                             ` Markus Armbruster
2010-11-05 14:17                                               ` Michael S. Tsirkin
2010-11-05 14:29                                                 ` Ryan Harper
2010-11-05 16:01                                                 ` Markus Armbruster
2010-11-08 21:02                                                   ` Michael S. Tsirkin
2010-11-05 14:25                                               ` Ryan Harper
2010-11-05 16:10                                                 ` Markus Armbruster
2010-11-05 16:22                                                   ` Ryan Harper
2010-11-06  8:18                                                     ` Markus Armbruster
2010-11-08  2:19                                                       ` Ryan Harper
2010-11-08 10:32                                                         ` Markus Armbruster
2010-11-08 10:49                                                           ` Michael S. Tsirkin
2010-11-08 12:03                                                             ` Markus Armbruster
2010-11-08 14:02                                                               ` Ryan Harper
2010-11-08 16:56                                                                 ` Michael S. Tsirkin
2010-11-08 17:04                                                                   ` Daniel P. Berrange
2010-11-08 18:41                                                                     ` Ryan Harper
2010-11-08 18:39                                                                   ` Ryan Harper
2010-11-08 19:06                                                                     ` Daniel P. Berrange
2010-11-08 16:34                                                               ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101102135827.GB31007@redhat.com \
    --to=mst@redhat.com \
    --cc=aliguori@linux.vnet.ibm.com \
    --cc=armbru@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=ryanh@us.ibm.com \
    --cc=stefan.hajnoczi@uk.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).