All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ryan Harper <ryanh@us.ibm.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org,
	Anthony Liguori <aliguori@linux.vnet.ibm.com>,
	Ryan Harper <ryanh@us.ibm.com>,
	Stefan Hajnoczi <stefan.hajnoczi@uk.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
Date: Wed, 10 Nov 2010 13:31:53 -0600	[thread overview]
Message-ID: <20101110193153.GF22381@us.ibm.com> (raw)
In-Reply-To: <m3bp5xkrhi.fsf@blackfin.pond.sub.org>

* Markus Armbruster <armbru@redhat.com> [2010-11-10 11:40]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Markus Armbruster <armbru@redhat.com> [2010-11-10 06:48]:
> >> One real question, and a couple of nits.
> >> 
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> 
> >> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> >> > unplug event; this may not happen synchronously with the device removal command
> >> 
> >> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
> >> length of time, possibly forever.  To make a race, you need to throw in
> >> a client assuming (incorrectly) that unplug is instantaneous, as
> >> described in your next paragraph.
> >> 
> >> Moreover, all PCI unplug is that way, not just block.
> >> 
> >> > This series aims to close a gap where by mgmt applications that assume the
> >> > block resource has been removed without confirming that the guest has
> >> > acknowledged the removal may re-assign the underlying device to a second guest
> >> > leading to data leakage.
> >> 
> >> Yes, the incorrect assumption is a problem.  But with that fixed (in the
> >> management application), we run right into the next problem: there is no
> >> way for the management application to reliably disconnect the guest from
> >> a block device.  And that's the problem you're fixing.
> >
> > Yeah, that's the right way to word it; providing a method to forcibly
> > disconnect the guest from the host device.
> >> 
> >> > This series introduces a new montor command to decouple asynchornous device
> >> 
> >> Typos "montor" and "asynchornous".  You might want to use a spell
> >> checker :)
> >> 
> >> Lines are a bit long.  Recommend wrap at column 70.
> >> 
> >> > removal from restricting guest access to a block device.  We do this by creating
> >> > a new monitor command drive_del which maps to a bdrv_unplug() command which
> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
> >> > IO is rejected from the device and the guest will get IO errors but continue to
> >> > function.  In addition to preventing further IO, we clean up state pointers
> >> > between host (BlockDriverState) and guest (DeviceInfo).
> >> >
> >> > A subsequent device removal command can be issued to remove the device, to which
> >> > the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> >> 
> >> "maynot" is not a word.
> >> 
> >> > will be sumbitted.
> >> 
> >> This suggests to drive_del before device_del, which makes the device
> >> goes through a "broken device" state on its way to unplug.  If the guest
> >> accesses the device in that state, it gets I/O errors.  Not nice.
> >> 
> >> Instead, I'd recommend device_del, wait for the device to go away,
> >> drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
> >> it's never exposed to the "broken device" state.  Note: if the drive_del
> >> fails because the device doesn't exist, we lost the race with the
> >> automatic destruction, which is harmless.  Ignore that error.
> >
> > Honestly, other than describing what happens if you sever the connection
> > when the guest isn't aware of it; I don't want to try to capture how the
> > mgmt layer implements the removal.  
> >
> > One may want to force the disconnect before attempting to remove the
> > device; or the other way around; that's really the mgmt layer's call.
> 
> Fair enough.
> 
> >> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> >> > ---
> >> >  block.c         |    7 +++++++
> >> >  block.h         |    1 +
> >> >  blockdev.c      |   36 ++++++++++++++++++++++++++++++++++++
> >> >  blockdev.h      |    1 +
> >> >  hmp-commands.hx |   18 ++++++++++++++++++
> >> >  5 files changed, 63 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/block.c b/block.c
> >> > index 6b505fb..c76a796 100644
> >> > --- a/block.c
> >> > +++ b/block.c
> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
> >> >      }
> >> >  }
> >> >  
> >> > +void bdrv_unplug(BlockDriverState *bs)
> >> > +{
> >> > +    qemu_aio_flush();
> >> > +    bdrv_flush(bs);
> >> > +    bdrv_close(bs);
> >> > +}
> >> > +
> >> 
> >> Unless we expect more users, I'd inline this into its only caller.
> >> Matter of taste.
> >
> > Works for me.
> >
> >> 
> >> >  int bdrv_is_removable(BlockDriverState *bs)
> >> >  {
> >> >      return bs->removable;
> >> > diff --git a/block.h b/block.h
> >> > index 78ecfac..581414c 100644
> >> > --- a/block.h
> >> > +++ b/block.h
> >> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
> >> >                         BlockErrorAction on_write_error);
> >> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
> >> > +void bdrv_unplug(BlockDriverState *bs);
> >> >  int bdrv_is_removable(BlockDriverState *bs);
> >> >  int bdrv_is_read_only(BlockDriverState *bs);
> >> >  int bdrv_is_sg(BlockDriverState *bs);
> >> > diff --git a/blockdev.c b/blockdev.c
> >> > index 6cb179a..ee8c2ec 100644
> >> > --- a/blockdev.c
> >> > +++ b/blockdev.c
> >> > @@ -14,6 +14,8 @@
> >> >  #include "qemu-option.h"
> >> >  #include "qemu-config.h"
> >> >  #include "sysemu.h"
> >> > +#include "hw/qdev.h"
> >> > +#include "block_int.h"
> >> >  
> >> >  static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
> >> >  
> >> > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
> >> >      }
> >> >      return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
> >> >  }
> >> > +
> >> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >> > +{
> >> > +    const char *id = qdict_get_str(qdict, "id");
> >> > +    BlockDriverState *bs;
> >> > +    Property *prop;
> >> > +
> >> > +    bs = bdrv_find(id);
> >> > +    if (!bs) {
> >> > +        qerror_report(QERR_DEVICE_NOT_FOUND, id);
> >> > +        return -1;
> >> > +    }
> >> > +
> >> > +    /* quiesce block driver; prevent further io */
> >> > +    bdrv_unplug(bs);
> >> > +
> >> > +    /* clean up guest state from pointing to host resource by
> >> > +     * finding and removing DeviceState "drive" property */
> >> > +    for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> >> > +        if ((prop->info->type == PROP_TYPE_DRIVE) && 
> >> > +            (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) == bs)) {
> >> > +            if (prop->info->free) {
> >> > +                prop->info->free(bs->peer, prop);
> >> > +            }
> 
> Your use of prop->info->free() in this context is wrong.  More below.
> 
> >> 
> >> Does this null the drive property?  I doubt it.  Quick check in the
> >> debugger?
> >> 
> >> The free callbacks generally don't zap the properties, because they run
> >> from qdev_free().
> >
> > To be honest; I didn't see anything that looked like "remove this
> > property" in the qdev api.  Any pointers?
> 
> The closest we have is indeed the Property method free(), but that's not
> quite right.  It's really only for use by qdev_free().
> 
> > should I be calling qdev_free() on the dev?
> 
> No, because then the whole device is gone, not just the property :)
> 
> >                                              I don't quite understand
> > the distinction between the info list of properties and the device
> > itself, nor specifically what we need to remove in the drive_del()
> > operation versus the device_del() portion.
> 
> device_del / qdev_free() destroy a qdev, such as a "virtio-blk-pci"
> device (C type VirtIOPCIProxy).
> 
> drive_del destroys something else, namely the block device host part
> (BlockDriverState + DeviceInfo).  Obviously, it needs to zap all
> pointers to the host part along with it.  Specifically, it needs to zap
> the device's pointer to it.
> 
> Example: if a "virtio-blk-pci" device is using drive "foo", then
> "drive_del foo" needs to zap its member block.bs.
> 
> Complication: we don't (want to) know what kind of device exactly is
> using the drive.  But we do know that a drive property must be
> describing it.
> 
> So we search the properties (for (prop...)) for a drive property
> (prop->info->type == PROP_TYPE_DRIVE) that points to this drive (... ==
> bs).
> 
> Result:
> 
>     BlockDriverState *bs;
>     Property *prop;
>     BlockDriverState **ptr;
> [...]
>     for (prop = bs->peer->info->props; prop && prop->name; prop++) {
>         if ((prop->info->type == PROP_TYPE_DRIVE)) {
>             ptr = qdev_get_prop_ptr(dev, prop);
>             if (*ptr == bs) {
>                 bdrv_detach(bs, bs->peer);

Invoking the free method on the drive property does do detach:

free_drive
{
    BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);

    if (*ptr) {
        bdrv_detach(*ptr, dev);
        blockdev_auto_del(*ptr);
    }
}

and the bdrv_delete()

takes out the bs pointer.

> Only then are we ready to destroy the host part:
> 
>     drive_uninit(drive_get_by_blockdev(bs));

And if auto-deletion it set, then it handles the drive_uninit().  Do you think
we should explicitly invoke drive_uninit() ?


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

  parent reply	other threads:[~2010-11-10 19:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-09  2:25 [Qemu-devel] [PATCH 0/2] v6 Decouple block device removal from device removal Ryan Harper
2010-11-09  2:25 ` [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del() Ryan Harper
2010-11-10 12:48   ` Markus Armbruster
2010-11-10 16:01     ` Ryan Harper
2010-11-10 17:39       ` Markus Armbruster
2010-11-10 17:50         ` Ryan Harper
2010-11-10 19:31         ` Ryan Harper [this message]
2010-11-11 10:48           ` Markus Armbruster
2010-11-11 13:25             ` Ryan Harper
2010-11-11 14:50               ` Markus Armbruster
2010-11-09  2:25 ` [Qemu-devel] [PATCH 2/2] Add qmp version of drive_del Ryan Harper
2010-11-09 13:29 ` [Qemu-devel] Re: [PATCH 0/2] v6 Decouple block device removal from device removal Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101110193153.GF22381@us.ibm.com \
    --to=ryanh@us.ibm.com \
    --cc=aliguori@linux.vnet.ibm.com \
    --cc=armbru@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefan.hajnoczi@uk.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.