qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Harper <ryanh@us.ibm.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org,
	Anthony Liguori <aliguori@linux.vnet.ibm.com>,
	Ryan Harper <ryanh@us.ibm.com>,
	Stefan Hajnoczi <stefan.hajnoczi@uk.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
Date: Wed, 10 Nov 2010 10:01:22 -0600	[thread overview]
Message-ID: <20101110160122.GA22381@us.ibm.com> (raw)
In-Reply-To: <m3sjz9ibuk.fsf@blackfin.pond.sub.org>

* Markus Armbruster <armbru@redhat.com> [2010-11-10 06:48]:
> One real question, and a couple of nits.
> 
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> > unplug event; this may not happen synchronously with the device removal command
> 
> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
> length of time, possibly forever.  To make a race, you need to throw in
> a client assuming (incorrectly) that unplug is instantaneous, as
> described in your next paragraph.
> 
> Moreover, all PCI unplug is that way, not just block.
> 
> > This series aims to close a gap where by mgmt applications that assume the
> > block resource has been removed without confirming that the guest has
> > acknowledged the removal may re-assign the underlying device to a second guest
> > leading to data leakage.
> 
> Yes, the incorrect assumption is a problem.  But with that fixed (in the
> management application), we run right into the next problem: there is no
> way for the management application to reliably disconnect the guest from
> a block device.  And that's the problem you're fixing.

Yeah, that's the right way to word it; providing a method to forcibly
disconnect the guest from the host device.
> 
> > This series introduces a new montor command to decouple asynchornous device
> 
> Typos "montor" and "asynchornous".  You might want to use a spell
> checker :)
> 
> Lines are a bit long.  Recommend wrap at column 70.
> 
> > removal from restricting guest access to a block device.  We do this by creating
> > a new monitor command drive_del which maps to a bdrv_unplug() command which
> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
> > IO is rejected from the device and the guest will get IO errors but continue to
> > function.  In addition to preventing further IO, we clean up state pointers
> > between host (BlockDriverState) and guest (DeviceInfo).
> >
> > A subsequent device removal command can be issued to remove the device, to which
> > the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> 
> "maynot" is not a word.
> 
> > will be sumbitted.
> 
> This suggests to drive_del before device_del, which makes the device
> goes through a "broken device" state on its way to unplug.  If the guest
> accesses the device in that state, it gets I/O errors.  Not nice.
> 
> Instead, I'd recommend device_del, wait for the device to go away,
> drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
> it's never exposed to the "broken device" state.  Note: if the drive_del
> fails because the device doesn't exist, we lost the race with the
> automatic destruction, which is harmless.  Ignore that error.

Honestly, other than describing what happens if you sever the connection
when the guest isn't aware of it; I don't want to try to capture how the
mgmt layer implements the removal.  

One may want to force the disconnect before attempting to remove the
device; or the other way around; that's really the mgmt layer's call.

> 
> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> > ---
> >  block.c         |    7 +++++++
> >  block.h         |    1 +
> >  blockdev.c      |   36 ++++++++++++++++++++++++++++++++++++
> >  blockdev.h      |    1 +
> >  hmp-commands.hx |   18 ++++++++++++++++++
> >  5 files changed, 63 insertions(+), 0 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index 6b505fb..c76a796 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
> >      }
> >  }
> >  
> > +void bdrv_unplug(BlockDriverState *bs)
> > +{
> > +    qemu_aio_flush();
> > +    bdrv_flush(bs);
> > +    bdrv_close(bs);
> > +}
> > +
> 
> Unless we expect more users, I'd inline this into its only caller.
> Matter of taste.

Works for me.

> 
> >  int bdrv_is_removable(BlockDriverState *bs)
> >  {
> >      return bs->removable;
> > diff --git a/block.h b/block.h
> > index 78ecfac..581414c 100644
> > --- a/block.h
> > +++ b/block.h
> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
> >                         BlockErrorAction on_write_error);
> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
> > +void bdrv_unplug(BlockDriverState *bs);
> >  int bdrv_is_removable(BlockDriverState *bs);
> >  int bdrv_is_read_only(BlockDriverState *bs);
> >  int bdrv_is_sg(BlockDriverState *bs);
> > diff --git a/blockdev.c b/blockdev.c
> > index 6cb179a..ee8c2ec 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -14,6 +14,8 @@
> >  #include "qemu-option.h"
> >  #include "qemu-config.h"
> >  #include "sysemu.h"
> > +#include "hw/qdev.h"
> > +#include "block_int.h"
> >  
> >  static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
> >  
> > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
> >      }
> >      return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
> >  }
> > +
> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> > +{
> > +    const char *id = qdict_get_str(qdict, "id");
> > +    BlockDriverState *bs;
> > +    Property *prop;
> > +
> > +    bs = bdrv_find(id);
> > +    if (!bs) {
> > +        qerror_report(QERR_DEVICE_NOT_FOUND, id);
> > +        return -1;
> > +    }
> > +
> > +    /* quiesce block driver; prevent further io */
> > +    bdrv_unplug(bs);
> > +
> > +    /* clean up guest state from pointing to host resource by
> > +     * finding and removing DeviceState "drive" property */
> > +    for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> > +        if ((prop->info->type == PROP_TYPE_DRIVE) && 
> > +            (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) == bs)) {
> > +            if (prop->info->free) {
> > +                prop->info->free(bs->peer, prop);
> > +            }
> 
> Does this null the drive property?  I doubt it.  Quick check in the
> debugger?
> 
> The free callbacks generally don't zap the properties, because they run
> from qdev_free().

To be honest; I didn't see anything that looked like "remove this
property" in the qdev api.  Any pointers?

should I be calling qdev_free() on the dev?  I don't quite understand
the distinction between the info list of properties and the device
itself, nor specifically what we need to remove in the drive_del()
operation versus the device_del() portion.


> 
> > +        }
> > +    }
> > +
> > +    /* clean up host state pointing to guest resource by removing
> > +     * pointers to guest device in the BlockDriverState */
> > +    bdrv_delete(bs);
> > +
> > +    return 0;
> > +}
> > + 
> > diff --git a/blockdev.h b/blockdev.h
> > index 653affc..2a0559e 100644
> > --- a/blockdev.h
> > +++ b/blockdev.h
> > @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >  int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >  int do_change_block(Monitor *mon, const char *device,
> >                      const char *filename, const char *fmt);
> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >  
> >  #endif
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index e5585ba..d6dc18c 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
> >  ETEXI
> >  
> >      {
> > +        .name       = "drive_del",
> > +        .args_type  = "id:s",
> > +        .params     = "device",
> > +        .help       = "remove host block device",
> > +        .user_print = monitor_user_noop,
> > +        .mhandler.cmd_new = do_drive_del,
> > +    },
> > +
> > +STEXI
> > +@item delete @var{device}
> > +@findex delete
> > +Remove host block device.  The result is that guest generated IO is no longer
> > +submitted against the host device underlying the disk.  Once a drive has
> > +been deleted, the QEMU Block layer returns -EIO which results in IO 
> > +errors in the guest for applications that are reading/writing to the device.
> > +ETEXI
> > +
> > +    {
> >          .name       = "change",
> >          .args_type  = "device:B,target:F,arg:s?",
> >          .params     = "device filename [format]",

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

  reply	other threads:[~2010-11-10 16:01 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-09  2:25 [Qemu-devel] [PATCH 0/2] v6 Decouple block device removal from device removal Ryan Harper
2010-11-09  2:25 ` [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del() Ryan Harper
2010-11-10 12:48   ` Markus Armbruster
2010-11-10 16:01     ` Ryan Harper [this message]
2010-11-10 17:39       ` Markus Armbruster
2010-11-10 17:50         ` Ryan Harper
2010-11-10 19:31         ` Ryan Harper
2010-11-11 10:48           ` Markus Armbruster
2010-11-11 13:25             ` Ryan Harper
2010-11-11 14:50               ` Markus Armbruster
2010-11-09  2:25 ` [Qemu-devel] [PATCH 2/2] Add qmp version of drive_del Ryan Harper
2010-11-09 13:29 ` [Qemu-devel] Re: [PATCH 0/2] v6 Decouple block device removal from device removal Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101110160122.GA22381@us.ibm.com \
    --to=ryanh@us.ibm.com \
    --cc=aliguori@linux.vnet.ibm.com \
    --cc=armbru@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefan.hajnoczi@uk.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).