From: Ryan Harper <ryanh@us.ibm.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
qemu-devel@nongnu.org,
Anthony Liguori <aliguori@linux.vnet.ibm.com>,
Ryan Harper <ryanh@us.ibm.com>,
Stefan Hajnoczi <stefan.hajnoczi@uk.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
Date: Wed, 10 Nov 2010 11:50:06 -0600 [thread overview]
Message-ID: <20101110175006.GE22381@us.ibm.com> (raw)
In-Reply-To: <m3bp5xkrhi.fsf@blackfin.pond.sub.org>
* Markus Armbruster <armbru@redhat.com> [2010-11-10 11:40]:
> Ryan Harper <ryanh@us.ibm.com> writes:
>
> > * Markus Armbruster <armbru@redhat.com> [2010-11-10 06:48]:
> >> One real question, and a couple of nits.
> >>
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >>
> >> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> >> > unplug event; this may not happen synchronously with the device removal command
> >>
> >> Well, I wouldn't call unplug "racy". It just takes an unpredictable
> >> length of time, possibly forever. To make a race, you need to throw in
> >> a client assuming (incorrectly) that unplug is instantaneous, as
> >> described in your next paragraph.
> >>
> >> Moreover, all PCI unplug is that way, not just block.
> >>
> >> > This series aims to close a gap where by mgmt applications that assume the
> >> > block resource has been removed without confirming that the guest has
> >> > acknowledged the removal may re-assign the underlying device to a second guest
> >> > leading to data leakage.
> >>
> >> Yes, the incorrect assumption is a problem. But with that fixed (in the
> >> management application), we run right into the next problem: there is no
> >> way for the management application to reliably disconnect the guest from
> >> a block device. And that's the problem you're fixing.
> >
> > Yeah, that's the right way to word it; providing a method to forcibly
> > disconnect the guest from the host device.
> >>
> >> > This series introduces a new montor command to decouple asynchornous device
> >>
> >> Typos "montor" and "asynchornous". You might want to use a spell
> >> checker :)
> >>
> >> Lines are a bit long. Recommend wrap at column 70.
> >>
> >> > removal from restricting guest access to a block device. We do this by creating
> >> > a new monitor command drive_del which maps to a bdrv_unplug() command which
> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent
> >> > IO is rejected from the device and the guest will get IO errors but continue to
> >> > function. In addition to preventing further IO, we clean up state pointers
> >> > between host (BlockDriverState) and guest (DeviceInfo).
> >> >
> >> > A subsequent device removal command can be issued to remove the device, to which
> >> > the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> >>
> >> "maynot" is not a word.
> >>
> >> > will be sumbitted.
> >>
> >> This suggests to drive_del before device_del, which makes the device
> >> goes through a "broken device" state on its way to unplug. If the guest
> >> accesses the device in that state, it gets I/O errors. Not nice.
> >>
> >> Instead, I'd recommend device_del, wait for the device to go away,
> >> drive_del on time out. If the guest reacts to the ACPI unplug promptly,
> >> it's never exposed to the "broken device" state. Note: if the drive_del
> >> fails because the device doesn't exist, we lost the race with the
> >> automatic destruction, which is harmless. Ignore that error.
> >
> > Honestly, other than describing what happens if you sever the connection
> > when the guest isn't aware of it; I don't want to try to capture how the
> > mgmt layer implements the removal.
> >
> > One may want to force the disconnect before attempting to remove the
> > device; or the other way around; that's really the mgmt layer's call.
>
> Fair enough.
>
> >> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> >> > ---
> >> > block.c | 7 +++++++
> >> > block.h | 1 +
> >> > blockdev.c | 36 ++++++++++++++++++++++++++++++++++++
> >> > blockdev.h | 1 +
> >> > hmp-commands.hx | 18 ++++++++++++++++++
> >> > 5 files changed, 63 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/block.c b/block.c
> >> > index 6b505fb..c76a796 100644
> >> > --- a/block.c
> >> > +++ b/block.c
> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
> >> > }
> >> > }
> >> >
> >> > +void bdrv_unplug(BlockDriverState *bs)
> >> > +{
> >> > + qemu_aio_flush();
> >> > + bdrv_flush(bs);
> >> > + bdrv_close(bs);
> >> > +}
> >> > +
> >>
> >> Unless we expect more users, I'd inline this into its only caller.
> >> Matter of taste.
> >
> > Works for me.
> >
> >>
> >> > int bdrv_is_removable(BlockDriverState *bs)
> >> > {
> >> > return bs->removable;
> >> > diff --git a/block.h b/block.h
> >> > index 78ecfac..581414c 100644
> >> > --- a/block.h
> >> > +++ b/block.h
> >> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
> >> > BlockErrorAction on_write_error);
> >> > BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >> > void bdrv_set_removable(BlockDriverState *bs, int removable);
> >> > +void bdrv_unplug(BlockDriverState *bs);
> >> > int bdrv_is_removable(BlockDriverState *bs);
> >> > int bdrv_is_read_only(BlockDriverState *bs);
> >> > int bdrv_is_sg(BlockDriverState *bs);
> >> > diff --git a/blockdev.c b/blockdev.c
> >> > index 6cb179a..ee8c2ec 100644
> >> > --- a/blockdev.c
> >> > +++ b/blockdev.c
> >> > @@ -14,6 +14,8 @@
> >> > #include "qemu-option.h"
> >> > #include "qemu-config.h"
> >> > #include "sysemu.h"
> >> > +#include "hw/qdev.h"
> >> > +#include "block_int.h"
> >> >
> >> > static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
> >> >
> >> > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
> >> > }
> >> > return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
> >> > }
> >> > +
> >> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >> > +{
> >> > + const char *id = qdict_get_str(qdict, "id");
> >> > + BlockDriverState *bs;
> >> > + Property *prop;
> >> > +
> >> > + bs = bdrv_find(id);
> >> > + if (!bs) {
> >> > + qerror_report(QERR_DEVICE_NOT_FOUND, id);
> >> > + return -1;
> >> > + }
> >> > +
> >> > + /* quiesce block driver; prevent further io */
> >> > + bdrv_unplug(bs);
> >> > +
> >> > + /* clean up guest state from pointing to host resource by
> >> > + * finding and removing DeviceState "drive" property */
> >> > + for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> >> > + if ((prop->info->type == PROP_TYPE_DRIVE) &&
> >> > + (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) == bs)) {
> >> > + if (prop->info->free) {
> >> > + prop->info->free(bs->peer, prop);
> >> > + }
>
> Your use of prop->info->free() in this context is wrong. More below.
>
> >>
> >> Does this null the drive property? I doubt it. Quick check in the
> >> debugger?
> >>
> >> The free callbacks generally don't zap the properties, because they run
> >> from qdev_free().
> >
> > To be honest; I didn't see anything that looked like "remove this
> > property" in the qdev api. Any pointers?
>
> The closest we have is indeed the Property method free(), but that's not
> quite right. It's really only for use by qdev_free().
>
> > should I be calling qdev_free() on the dev?
>
> No, because then the whole device is gone, not just the property :)
>
> > I don't quite understand
> > the distinction between the info list of properties and the device
> > itself, nor specifically what we need to remove in the drive_del()
> > operation versus the device_del() portion.
>
> device_del / qdev_free() destroy a qdev, such as a "virtio-blk-pci"
> device (C type VirtIOPCIProxy).
>
> drive_del destroys something else, namely the block device host part
> (BlockDriverState + DeviceInfo). Obviously, it needs to zap all
> pointers to the host part along with it. Specifically, it needs to zap
> the device's pointer to it.
>
> Example: if a "virtio-blk-pci" device is using drive "foo", then
> "drive_del foo" needs to zap its member block.bs.
>
> Complication: we don't (want to) know what kind of device exactly is
> using the drive. But we do know that a drive property must be
> describing it.
>
> So we search the properties (for (prop...)) for a drive property
> (prop->info->type == PROP_TYPE_DRIVE) that points to this drive (... ==
> bs).
>
> Result:
>
> BlockDriverState *bs;
> Property *prop;
> BlockDriverState **ptr;
> [...]
> for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> if ((prop->info->type == PROP_TYPE_DRIVE)) {
> ptr = qdev_get_prop_ptr(dev, prop);
> if (*ptr == bs) {
> bdrv_detach(bs, bs->peer);
> *ptr = NULL;
> break;
> }
> }
> }
>
> Aside: arguably, bdrv_detach() should zap *both* pointers, i.e. also do
> the *ptr = NULL. Not your problem to fix.
>
> Only then are we ready to destroy the host part:
>
> drive_uninit(drive_get_by_blockdev(bs));
>
> Does this help?
Yep; lemme get a v7 out.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
next prev parent reply other threads:[~2010-11-10 17:50 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-09 2:25 [Qemu-devel] [PATCH 0/2] v6 Decouple block device removal from device removal Ryan Harper
2010-11-09 2:25 ` [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del() Ryan Harper
2010-11-10 12:48 ` Markus Armbruster
2010-11-10 16:01 ` Ryan Harper
2010-11-10 17:39 ` Markus Armbruster
2010-11-10 17:50 ` Ryan Harper [this message]
2010-11-10 19:31 ` Ryan Harper
2010-11-11 10:48 ` Markus Armbruster
2010-11-11 13:25 ` Ryan Harper
2010-11-11 14:50 ` Markus Armbruster
2010-11-09 2:25 ` [Qemu-devel] [PATCH 2/2] Add qmp version of drive_del Ryan Harper
2010-11-09 13:29 ` [Qemu-devel] Re: [PATCH 0/2] v6 Decouple block device removal from device removal Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101110175006.GE22381@us.ibm.com \
--to=ryanh@us.ibm.com \
--cc=aliguori@linux.vnet.ibm.com \
--cc=armbru@redhat.com \
--cc=kwolf@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefan.hajnoczi@uk.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).