From: Ryan Harper <ryanh@us.ibm.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
qemu-devel@nongnu.org,
Anthony Liguori <aliguori@linux.vnet.ibm.com>,
Ryan Harper <ryanh@us.ibm.com>,
Stefan Hajnoczi <stefan.hajnoczi@uk.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
Date: Wed, 10 Nov 2010 10:01:22 -0600 [thread overview]
Message-ID: <20101110160122.GA22381@us.ibm.com> (raw)
In-Reply-To: <m3sjz9ibuk.fsf@blackfin.pond.sub.org>
* Markus Armbruster <armbru@redhat.com> [2010-11-10 06:48]:
> One real question, and a couple of nits.
>
> Ryan Harper <ryanh@us.ibm.com> writes:
>
> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> > unplug event; this may not happen synchronously with the device removal command
>
> Well, I wouldn't call unplug "racy". It just takes an unpredictable
> length of time, possibly forever. To make a race, you need to throw in
> a client assuming (incorrectly) that unplug is instantaneous, as
> described in your next paragraph.
>
> Moreover, all PCI unplug is that way, not just block.
>
> > This series aims to close a gap where by mgmt applications that assume the
> > block resource has been removed without confirming that the guest has
> > acknowledged the removal may re-assign the underlying device to a second guest
> > leading to data leakage.
>
> Yes, the incorrect assumption is a problem. But with that fixed (in the
> management application), we run right into the next problem: there is no
> way for the management application to reliably disconnect the guest from
> a block device. And that's the problem you're fixing.
Yeah, that's the right way to word it; providing a method to forcibly
disconnect the guest from the host device.
>
> > This series introduces a new montor command to decouple asynchornous device
>
> Typos "montor" and "asynchornous". You might want to use a spell
> checker :)
>
> Lines are a bit long. Recommend wrap at column 70.
>
> > removal from restricting guest access to a block device. We do this by creating
> > a new monitor command drive_del which maps to a bdrv_unplug() command which
> > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent
> > IO is rejected from the device and the guest will get IO errors but continue to
> > function. In addition to preventing further IO, we clean up state pointers
> > between host (BlockDriverState) and guest (DeviceInfo).
> >
> > A subsequent device removal command can be issued to remove the device, to which
> > the guest may or maynot respond, but as long as the unplugged bit is set, no IO
>
> "maynot" is not a word.
>
> > will be sumbitted.
>
> This suggests to drive_del before device_del, which makes the device
> goes through a "broken device" state on its way to unplug. If the guest
> accesses the device in that state, it gets I/O errors. Not nice.
>
> Instead, I'd recommend device_del, wait for the device to go away,
> drive_del on time out. If the guest reacts to the ACPI unplug promptly,
> it's never exposed to the "broken device" state. Note: if the drive_del
> fails because the device doesn't exist, we lost the race with the
> automatic destruction, which is harmless. Ignore that error.
Honestly, other than describing what happens if you sever the connection
when the guest isn't aware of it; I don't want to try to capture how the
mgmt layer implements the removal.
One may want to force the disconnect before attempting to remove the
device; or the other way around; that's really the mgmt layer's call.
>
> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> > ---
> > block.c | 7 +++++++
> > block.h | 1 +
> > blockdev.c | 36 ++++++++++++++++++++++++++++++++++++
> > blockdev.h | 1 +
> > hmp-commands.hx | 18 ++++++++++++++++++
> > 5 files changed, 63 insertions(+), 0 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index 6b505fb..c76a796 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
> > }
> > }
> >
> > +void bdrv_unplug(BlockDriverState *bs)
> > +{
> > + qemu_aio_flush();
> > + bdrv_flush(bs);
> > + bdrv_close(bs);
> > +}
> > +
>
> Unless we expect more users, I'd inline this into its only caller.
> Matter of taste.
Works for me.
>
> > int bdrv_is_removable(BlockDriverState *bs)
> > {
> > return bs->removable;
> > diff --git a/block.h b/block.h
> > index 78ecfac..581414c 100644
> > --- a/block.h
> > +++ b/block.h
> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
> > BlockErrorAction on_write_error);
> > BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> > void bdrv_set_removable(BlockDriverState *bs, int removable);
> > +void bdrv_unplug(BlockDriverState *bs);
> > int bdrv_is_removable(BlockDriverState *bs);
> > int bdrv_is_read_only(BlockDriverState *bs);
> > int bdrv_is_sg(BlockDriverState *bs);
> > diff --git a/blockdev.c b/blockdev.c
> > index 6cb179a..ee8c2ec 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -14,6 +14,8 @@
> > #include "qemu-option.h"
> > #include "qemu-config.h"
> > #include "sysemu.h"
> > +#include "hw/qdev.h"
> > +#include "block_int.h"
> >
> > static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
> >
> > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
> > }
> > return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
> > }
> > +
> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> > +{
> > + const char *id = qdict_get_str(qdict, "id");
> > + BlockDriverState *bs;
> > + Property *prop;
> > +
> > + bs = bdrv_find(id);
> > + if (!bs) {
> > + qerror_report(QERR_DEVICE_NOT_FOUND, id);
> > + return -1;
> > + }
> > +
> > + /* quiesce block driver; prevent further io */
> > + bdrv_unplug(bs);
> > +
> > + /* clean up guest state from pointing to host resource by
> > + * finding and removing DeviceState "drive" property */
> > + for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> > + if ((prop->info->type == PROP_TYPE_DRIVE) &&
> > + (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) == bs)) {
> > + if (prop->info->free) {
> > + prop->info->free(bs->peer, prop);
> > + }
>
> Does this null the drive property? I doubt it. Quick check in the
> debugger?
>
> The free callbacks generally don't zap the properties, because they run
> from qdev_free().
To be honest; I didn't see anything that looked like "remove this
property" in the qdev api. Any pointers?
should I be calling qdev_free() on the dev? I don't quite understand
the distinction between the info list of properties and the device
itself, nor specifically what we need to remove in the drive_del()
operation versus the device_del() portion.
>
> > + }
> > + }
> > +
> > + /* clean up host state pointing to guest resource by removing
> > + * pointers to guest device in the BlockDriverState */
> > + bdrv_delete(bs);
> > +
> > + return 0;
> > +}
> > +
> > diff --git a/blockdev.h b/blockdev.h
> > index 653affc..2a0559e 100644
> > --- a/blockdev.h
> > +++ b/blockdev.h
> > @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
> > int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
> > int do_change_block(Monitor *mon, const char *device,
> > const char *filename, const char *fmt);
> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >
> > #endif
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index e5585ba..d6dc18c 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
> > ETEXI
> >
> > {
> > + .name = "drive_del",
> > + .args_type = "id:s",
> > + .params = "device",
> > + .help = "remove host block device",
> > + .user_print = monitor_user_noop,
> > + .mhandler.cmd_new = do_drive_del,
> > + },
> > +
> > +STEXI
> > +@item delete @var{device}
> > +@findex delete
> > +Remove host block device. The result is that guest generated IO is no longer
> > +submitted against the host device underlying the disk. Once a drive has
> > +been deleted, the QEMU Block layer returns -EIO which results in IO
> > +errors in the guest for applications that are reading/writing to the device.
> > +ETEXI
> > +
> > + {
> > .name = "change",
> > .args_type = "device:B,target:F,arg:s?",
> > .params = "device filename [format]",
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
next prev parent reply other threads:[~2010-11-10 16:01 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-09 2:25 [Qemu-devel] [PATCH 0/2] v6 Decouple block device removal from device removal Ryan Harper
2010-11-09 2:25 ` [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del() Ryan Harper
2010-11-10 12:48 ` Markus Armbruster
2010-11-10 16:01 ` Ryan Harper [this message]
2010-11-10 17:39 ` Markus Armbruster
2010-11-10 17:50 ` Ryan Harper
2010-11-10 19:31 ` Ryan Harper
2010-11-11 10:48 ` Markus Armbruster
2010-11-11 13:25 ` Ryan Harper
2010-11-11 14:50 ` Markus Armbruster
2010-11-09 2:25 ` [Qemu-devel] [PATCH 2/2] Add qmp version of drive_del Ryan Harper
2010-11-09 13:29 ` [Qemu-devel] Re: [PATCH 0/2] v6 Decouple block device removal from device removal Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101110160122.GA22381@us.ibm.com \
--to=ryanh@us.ibm.com \
--cc=aliguori@linux.vnet.ibm.com \
--cc=armbru@redhat.com \
--cc=kwolf@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefan.hajnoczi@uk.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.