From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53033) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WXwzV-0002Yh-HI for qemu-devel@nongnu.org; Wed, 09 Apr 2014 14:12:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WXwzP-0007B4-9d for qemu-devel@nongnu.org; Wed, 09 Apr 2014 14:12:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46770) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WXwzP-0007Aq-0t for qemu-devel@nongnu.org; Wed, 09 Apr 2014 14:12:47 -0400 Date: Wed, 9 Apr 2014 14:12:39 -0400 From: Jeff Cody Message-ID: <20140409181239.GB8087@localhost.localdomain> References: <1394436370-8908-1-git-send-email-famz@redhat.com> <1394436370-8908-9-git-send-email-famz@redhat.com> <20140407184712.GD4123@localhost.localdomain> <87a9bw1blj.fsf@blackfin.pond.sub.org> <20140408090711.GE11793@T430.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140408090711.GE11793@T430.redhat.com> Subject: Re: [Qemu-devel] [PATCH v17 08/14] block: Support dropping active in bdrv_drop_intermediate List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: kwolf@redhat.com, benoit.canet@irqsave.net, rjones@redhat.com, Markus Armbruster , qemu-devel@nongnu.org, ptoscano@redhat.com, imain@redhat.com, stefanha@redhat.com, pbonzini@redhat.com On Tue, Apr 08, 2014 at 05:07:38PM +0800, Fam Zheng wrote: > On Tue, 04/08 10:15, Markus Armbruster wrote: > > Jeff Cody writes: > > > > > On Mon, Mar 10, 2014 at 03:26:04PM +0800, Fam Zheng wrote: > > >> Dropping intermediate could be useful both for commit and stream, and > > >> BDS refcnt plus bdrv_swap could do most of the job nicely. It also needs > > >> to work with op blockers. > > >> > > >> Signed-off-by: Fam Zheng > > >> --- > > >> block.c | 139 ++++++++++++++++++++++++++++----------------------------- > > >> block/commit.c | 2 +- > > >> 2 files changed, 70 insertions(+), 71 deletions(-) > > >> > > >> diff --git a/block.c b/block.c > > >> index 05f7766..0af7c62 100644 > > >> --- a/block.c > > >> +++ b/block.c > > >> @@ -2503,115 +2503,114 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active, > > >> return overlay; > > >> } > > >> > > >> -typedef struct BlkIntermediateStates { > > >> - BlockDriverState *bs; > > >> - QSIMPLEQ_ENTRY(BlkIntermediateStates) entry; > > >> -} BlkIntermediateStates; > > >> - > > >> - > > >> /* > > >> - * Drops images above 'base' up to and including 'top', and sets the image > > >> - * above 'top' to have base as its backing file. > > >> + * Drops images above 'base' up to and including 'top', and sets new 'base' as > > >> + * backing_hd of top's overlay (the image orignally has 'top' as backing file). > > >> + * top's overlay may be NULL if 'top' is active, no such update needed. > > >> + * Requires that the top's overlay to 'top' is opened r/w. > > >> + * > > >> + * 1) This will convert the following chain: > > >> + * > > >> + * ... <- base <- ... <- top <- overlay <-... <- active > > >> * > > >> - * Requires that the overlay to 'top' is opened r/w, so that the backing file > > >> - * information in 'bs' can be properly updated. > > >> + * to > > >> + * > > >> + * ... <- base <- overlay <- active > > >> + * > > >> + * 2) It is allowed for bottom==base, in which case it converts: > > >> * > > >> - * E.g., this will convert the following chain: > > >> - * bottom <- base <- intermediate <- top <- active > > >> + * base <- ... <- top <- overlay <- ... <- active > > >> * > > >> * to > > >> * > > >> - * bottom <- base <- active > > >> + * base <- overlay <- active > > >> * > > >> - * It is allowed for bottom==base, in which case it converts: > > >> + * 2) It also allows active==top, in which case it converts: > > >> * > > >> - * base <- intermediate <- top <- active > > >> + * ... <- base <- ... <- top (active) > > >> * > > >> * to > > >> * > > >> - * base <- active > > >> + * ... <- base == active == top > > >> + * > > >> + * i.e. only base and lower remains: *top == *base when return. > > >> + * > > >> + * 3) If base==NULL, it will drop all the BDS below overlay and set its > > >> + * backing_hd to NULL. I.e.: > > >> * > > >> - * Error conditions: > > >> - * if active == top, that is considered an error > > >> + * base(NULL) <- ... <- overlay <- ... <- active > > >> + * > > >> + * to > > >> + * > > >> + * overlay <- ... <- active > > >> * > > >> */ > > >> int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top, > > >> BlockDriverState *base) > > >> { > > >> - BlockDriverState *intermediate; > > >> - BlockDriverState *base_bs = NULL; > > >> - BlockDriverState *new_top_bs = NULL; > > >> - BlkIntermediateStates *intermediate_state, *next; > > >> - int ret = -EIO; > > >> - > > >> - QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete; > > >> - QSIMPLEQ_INIT(&states_to_delete); > > >> + BlockDriverState *drop_start, *overlay, *bs; > > >> + int ret = -EINVAL; > > >> > > >> - if (!top->drv || !base->drv) { > > >> + assert(active); > > >> + assert(top); > > >> + /* Verify that top is in backing chain of active */ > > >> + bs = active; > > >> + while (bs && bs != top) { > > >> + bs = bs->backing_hd; > > >> + } > > >> + if (!bs) { > > >> goto exit; > > >> } > > >> + /* Verify that base is in backing chain of top */ > > >> + if (base) { > > >> + while (bs && bs != base) { > > >> + bs = bs->backing_hd; > > >> + } > > >> + if (bs != base) { > > >> + goto exit; > > >> + } > > >> + } > > >> > > >> - new_top_bs = bdrv_find_overlay(active, top); > > >> - > > >> - if (new_top_bs == NULL) { > > >> - /* we could not find the image above 'top', this is an error */ > > >> + if (!top->drv || (base && !base->drv)) { > > >> goto exit; > > >> } > > >> - > > >> - /* special case of new_top_bs->backing_hd already pointing to base - nothing > > >> - * to do, no intermediate images */ > > >> - if (new_top_bs->backing_hd == base) { > > >> + if (top == base) { > > >> + ret = 0; > > >> + goto exit; > > >> + } else if (top == active) { > > >> + assert(base); > > >> + drop_start = active->backing_hd; > > >> + bdrv_swap(active, base); > > > > > > This will assert in block.c, in bdrv_swap, on the test for > > > anonymity of active. (For testing, I changed the active layer commit > > > in mirror to use bdrv_drop_intermediate()). > > Jeff, you're right, because bdrv_swap requires first argument to be a "new" > BDS, while we are passing in an top BDS. > > But what happens if we write bdrv_swap(base, active)? > That seems like it could work - I did a quick test, and did not run into any issues, going from active->base, and active->intermediate. > > > > > > Unfortunately, there are other problems as well (anonymity could be > > > fixed by bdrv_make_anon(active)). > > > > > > Using line numbers from my version of block.c, lines 1957, 1959, and > > > 1960 will each cause an assert (these lines are all in bdrv_swap()): > > > > > > 1956: /* bs_new must be anonymous and shouldn't have anything fancy > > > enabled */ > > > 1957: assert(bs_new->device_name[0] == '\0'); > > > 1958: assert(QLIST_EMPTY(&bs_new->dirty_bitmaps)); > > > 1959: assert(bs_new->job == NULL); > > > 1960: assert(bs_new->dev == NULL); > > > > > > Markus - on line 1960 above, is it safe to remove that check (and the > > > other check further down in bdrv_swap())? > > > > I guess you mean the one under /* Check a few fields that should remain > > attached to the device */ by "the other check". > > > > bdrv_swap() is a scary, scary function. It has a number of > > preconditions, and we've tried to make them explicit in assertions. > > > > The preconditions could be: > > > > (0) Implicit. Or less politely said: unstated / unknown. > > > > (1) Explicit, but wrong. > > > > (2) Restrictions: the swapping code doesn't cover this state, but it > > could be made to cover it, relaxing the precondition. > > > > (3) Fundamental: the precondition cannot or should not be relaxed. > > > > BDS member dev is is the back pointer for the device model property > > pointing to the device model's backend. It must stay on top, and > > bdrv_move_feature_fields() duly moves it, along with its buddies dev_ops > > and dev_opaque. Obviously, only one of bdrv_old and bdrv_new can be on > > top at the same time. bdrv_swap() currently assumes that bdrv_old is > > the top one on entry. Feels like an instance of (3). > > > > > Thinking about it more, there may be other landmines in bdrv_swap() > > > for this case; prior to this, bdrv_swap() was always called with > > > bs_new being a newly-created BDS. With the active layer BDS it almost > > > certainly is not 'new', and could have other "prohibited" fields set, > > > in addition to the above (e.g. io limits, etc.) > > > > Hairy... > > > > We talked about splitting BlockBackend off BDS with an axe. Do you > > think that would make this series less hairy? > > Hopefully, because it will help us to get rid of bdrv_move_feature_fields. :) > > But if we need to get rid of bdrv_swap, we need a generic "level of indirect" > between BDS and BDS user, so we don't need to worry about the validity of old > BDS pointers when we update the chain. Just like BlockBackend does at device > emulation side. That's an even bigger project. > > Thanks, > Fam