From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=43198 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Q4HKY-0007XK-IB for qemu-devel@nongnu.org; Mon, 28 Mar 2011 14:38:46 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Q4HKI-0004V0-Vm for qemu-devel@nongnu.org; Mon, 28 Mar 2011 14:38:08 -0400 Received: from e32.co.us.ibm.com ([32.97.110.150]:33905) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Q4HKI-0004UY-J9 for qemu-devel@nongnu.org; Mon, 28 Mar 2011 14:38:06 -0400 Received: from d03relay01.boulder.ibm.com (d03relay01.boulder.ibm.com [9.17.195.226]) by e32.co.us.ibm.com (8.14.4/8.13.1) with ESMTP id p2SIRGKt009274 for ; Mon, 28 Mar 2011 12:27:16 -0600 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d03relay01.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p2SIc4Hn100060 for ; Mon, 28 Mar 2011 12:38:04 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p2SIc4lF031826 for ; Mon, 28 Mar 2011 12:38:04 -0600 Date: Mon, 28 Mar 2011 13:38:00 -0500 From: Ryan Harper Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive Message-ID: <20110328183759.GG20374@us.ibm.com> References: <20110323015347.GA20139@us.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: Kevin Wolf , Ryan Harper , qemu-devel@nongnu.org * Markus Armbruster [2011-03-24 07:27]: > Whoops, almost missed this. Best to cc: me to avoid that. > It was sent directly to you: > Sender: qemu-devel-bounces+ryanh=us.ibm.com@nongnu.org > From: Ryan Harper > Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive > Date: Tue, 22 Mar 2011 20:53:47 -0500 > Message-ID: <20110323015347.GA20139@us.ibm.com> > User-Agent: Mutt/1.5.6+20040907i > To: Markus Armbruster > Cc: Kevin Wolf , Ryan Harper , > qemu-devel@nongnu.org > Ryan Harper writes: > > > * Markus Armbruster [2011-03-15 04:48]: > >> Sorry for the long delay, I was out of action for a week. > >> > >> Ryan Harper writes: > >> > >> > When removing a drive from the host-side via drive_del we currently have the > >> > following path: > >> > > >> > drive_del > >> > qemu_aio_flush() > >> > bdrv_close() > >> > drive_uninit() > >> > bdrv_delete() > >> > > >> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer > >> > however, the block devices retain a copy of this pointer, see > >> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs. > >> > > >> > We now have a use-after-free situation. If the guest continues to issue IO > >> > against the device, and we've reallocated the memory that the BlockDriverState > >> > pointed at, then we will fail the bs->drv checks in the various bdrv_ methods. > >> > >> "we will fail the bs->drv checks" is misleading, in my opinion. Here's > >> what happens: > >> > >> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get > >> dropped. Works as designed. > >> > >> 2. drive_uninit() frees the bs. Since the device is still connected to > >> bs, any subsequent I/O is a use-after-free. > >> > >> The value of bs->drv becomes unpredictable on free. As long as it > >> remains null, I/O still gets dropped. I/O crashes or worse once that > >> changed. Could be right on free, could be much later. > >> > >> If you respin anyway, please clarify your description. > > > > Sure. I wasn't planning a new version, but I'll update and send anyhow > > as I didn't see it get included in pull from the block branch. > >> > >> > To resolve this issue as simply as possible, we can chose to not actually > >> > delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv > >> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST > >> > that is used to enumerate the block devices. This is currently handled within > >> > bdrv_delete, so move this into it's own function, bdrv_remove(). > >> > >> Why do we remove the BlockDriverState from bdrv_states? Because we want > >> drive_del make its *name* go away. > >> > >> Begs the question: is the code prepared for a BlockDriverState object > >> that isn't on bdrv_states? Turns out we're in luck: bdrv_new() already > >> creates such objects when the device_name is empty. This is used for > >> internal BlockDriverStates such as COW backing files. Your code makes > >> device_name empty when taking the object off bdrv_states, so we're good. > >> > >> Begs yet another question: how does the behavior of a BlockDriverState > >> change when it's taken off bdrv_states, and is that the behavior we > >> want? Changes: > >> > >> * bdrv_delete() no longer takes it off bdrv_states. Good. > >> > >> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer > >> cover it. Okay, because bdrv_close(), bdrv_commit() and bdrv_flush() > >> do nothing anyway for closed BlockDriverStates. > >> > >> * "info block" and "info blockstats" no longer show it, because > >> bdrv_info() and bdrv_info_stats() no longer see it. Okay. > >> > >> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it. Impact? > >> Please check their uses and report. > > > > 1 664 block-migration.c <> > > bs = bdrv_find(device_name); > > > > - no longer see it. This is fine since we can't migrate a block > > device that has been removed > > > > 2 562 blockdev.c <> > > bs = bdrv_find(device); > > > > - do_commit won't see it in either when calling bdrv_commit_all() > > Fine as you mention above. If user specifies the device name > > we won't find it, that's OK because we can't commit data against > > a closed BlockDriverState. > > > > 3 587 blockdev.c <> > > bs = bdrv_find(device); > > > > - OK, cannot take a snapshot against a deleted BlockDriverState > > > > 4 662 blockdev.c <> > > bs = bdrv_find(filename); > > > > - OK, cannot eject a deleted BlockDriverState; > > > > 5 676 blockdev.c <> > > bs = bdrv_find(qdict_get_str(qdict, "device")); > > > > - OK, cannot set password a deleted BlockDriverState; > > > > 6 701 blockdev.c <> > > bs = bdrv_find(device); > > > > - OK, cannot change the file/device of a deleted BlockDriverState; > > > > 7 732 blockdev.c <> > > bs = bdrv_find(id); > > > > - OK, cannot delete an already deleted Drive > > > > 8 783 blockdev.c <> > > bs = bdrv_find(device); > > > > - OK, cannot resize a deleted Drive > > > > 9 312 hw/qdev-properties.c <> > > bs = bdrv_find(str); > > > > - Used when invoking qdev_prop_drive .parse method; parse method is invoked via > > qdev_device_add() which calls set_property() which invokes parse. AFAICT, this is OK > > since we won't be going down the device add path worrying about a > > deleted block device. > > Thanks for checking! > > >> > The result is that we can now invoke drive_del, this closes the file descriptors > >> > and sets BlockDriverState->drv to NULL which prevents futher IO to the device, > >> > and since we do not free BlockDriverState, we don't have to worry about the copy > >> > retained in the block devices. > >> > >> Yep. But there's one more question: is the BlockDriverState freed when > >> the device using it gets destroyed? > >> > >> qdev_free() runs prop->info->free() for all properties. The drive > >> property's free() is free_drive(): > >> > >> static void free_drive(DeviceState *dev, Property *prop) > >> { > >> BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); > >> > >> if (*ptr) { > >> bdrv_detach(*ptr, dev); > >> blockdev_auto_del(*ptr); > >> } > >> } > >> > >> This should indeed delete the drive. But only if the property still > >> points to it. See below. > >> > >> > Reported-by: Marcus Armbruster > >> > Signed-off-by: Ryan Harper > >> > --- > >> > v1->v2 > >> > - NULL bs->device_name after removing from list to prevent > >> > second removal. > >> > > >> > block.c | 12 +++++++++--- > >> > block.h | 1 + > >> > blockdev.c | 2 +- > >> > 3 files changed, 11 insertions(+), 4 deletions(-) > >> > > >> > diff --git a/block.c b/block.c > >> > index 1544d81..0df9942 100644 > >> > --- a/block.c > >> > +++ b/block.c > >> > @@ -697,14 +697,20 @@ void bdrv_close_all(void) > >> > } > >> > } > >> > > >> > +void bdrv_remove(BlockDriverState *bs) > >> > +{ > >> > + if (bs->device_name[0] != '\0') { > >> > + QTAILQ_REMOVE(&bdrv_states, bs, list); > >> > + } > >> > + bs->device_name[0] = '\0'; > >> > +} > >> > + > >> > >> I don't like this name. What's the difference between "delete" and > >> "remove"? > >> > >> The function zaps the device name. bdrv_make_anon()? > > > > bdrv_delist? bdrv_hide? I'm also fine with make_anon. > > Any of the three works for me. "delist" seems the least obvious, but > that's a matter of taste. > > >> > void bdrv_delete(BlockDriverState *bs) > >> > { > >> > assert(!bs->peer); > >> > > >> > /* remove from list, if necessary */ > >> > - if (bs->device_name[0] != '\0') { > >> > - QTAILQ_REMOVE(&bdrv_states, bs, list); > >> > - } > >> > + bdrv_remove(bs); > >> > > >> > bdrv_close(bs); > >> > if (bs->file != NULL) { > >> > diff --git a/block.h b/block.h > >> > index 5d78fc0..8447397 100644 > >> > --- a/block.h > >> > +++ b/block.h > >> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, > >> > QEMUOptionParameter *options); > >> > int bdrv_create_file(const char* filename, QEMUOptionParameter *options); > >> > BlockDriverState *bdrv_new(const char *device_name); > >> > +void bdrv_remove(BlockDriverState *bs); > >> > void bdrv_delete(BlockDriverState *bs); > >> > int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); > >> > int bdrv_open(BlockDriverState *bs, const char *filename, int flags, > >> > diff --git a/blockdev.c b/blockdev.c > >> > index 0690cc8..1f57b50 100644 > >> > --- a/blockdev.c > >> > +++ b/blockdev.c > >> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) > >> > >> Let me add a bit more context: > >> > >> bdrv_flush(bs); > >> bdrv_close(bs); > >> > >> /* clean up guest state from pointing to host resource by > >> * finding and removing DeviceState "drive" property */ > >> if (bs->peer) { > >> for (prop = bs->peer->info->props; prop && prop->name; prop++) { > >> if (prop->info->type == PROP_TYPE_DRIVE) { > >> ptr = qdev_get_prop_ptr(bs->peer, prop); > >> if (*ptr == bs) { > >> bdrv_detach(bs, bs->peer); > >> *ptr = NULL; > >> break; > >> } > >> } > >> } > >> > } > >> > >> This zaps the drive property. A subsequent free_drive() will do > >> nothing. We leak the BlockDriverState on device unplug, I'm afraid. > >> > >> Any reason why we still want to zap the drive property? > > > > IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if > > we're keeping the bs around anyhow, then I don't think we need to remove > > the property. > > Agree. > > > One last check would be to see what if the device still > > shows up in qtree if we don't remove the property. > > I expect it to show up as "drive = " instead of "drive = ", > because: > > static int print_drive(DeviceState *dev, Property *prop, char *dest, size_t len) > { > BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); > return snprintf(dest, len, "%s", > *ptr ? bdrv_get_device_name(*ptr) : ""); > } > > and > > const char *bdrv_get_device_name(BlockDriverState *bs) > { > return bs->device_name; > } > > If the empty right hand side bothers you, you can change print_drive() > to print something else then, say "", or "". > > Looking forward to v3 :) After our chat: here's what I've got in v3: - Update drive_del use after free description - s/bdrv_remove/bdrv_make_anon/g - Don't remove qdev property since we don't delete bs any more - If (bs->peer) bdrv_make_anon else bdrv_delete to handle removing drives with no device. Sending out v3 as a top-post now. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ryanh@us.ibm.com