From: Jeff Cody <jcody@redhat.com>
To: Fam Zheng <famz@redhat.com>
Cc: kwolf@redhat.com, benoit.canet@irqsave.net, rjones@redhat.com,
qemu-devel@nongnu.org, armbru@redhat.com, imain@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH v14 08/14] block: Support dropping active in bdrv_drop_intermediate
Date: Wed, 19 Feb 2014 18:24:47 -0500 [thread overview]
Message-ID: <20140219232447.GA30664@localhost.localdomain> (raw)
In-Reply-To: <20140219212230.GA8204@localhost.localdomain>
On Wed, Feb 19, 2014 at 04:22:30PM -0500, Jeff Cody wrote:
> On Wed, Feb 19, 2014 at 09:42:25PM +0800, Fam Zheng wrote:
> > Dropping intermediate could be useful both for commit and stream, and
> > BDS refcnt plus bdrv_swap could do most of the job nicely. It also needs
> > to work with op blockers.
> >
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> > block.c | 146 +++++++++++++++++++++++++--------------------------------
> > block/commit.c | 1 +
> > 2 files changed, 66 insertions(+), 81 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index a2bf24c..cf41f3d 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -2485,115 +2485,99 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
> > return overlay;
> > }
> >
> > -typedef struct BlkIntermediateStates {
> > - BlockDriverState *bs;
> > - QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
> > -} BlkIntermediateStates;
> > -
> > -
> > /*
> > - * Drops images above 'base' up to and including 'top', and sets the image
> > - * above 'top' to have base as its backing file.
> > + * Drops images above 'base' up to and including 'top', and sets new 'base'
> > + * as backing_hd of top_overlay (the image orignally has 'top' as backing
>
> What is 'top_overlay'? Do you mean "top's overlay" by this?
>
> > + * file). top_overlay may be NULL if 'top' is active, no such update needed.
> > + * Requires that the top_overlay to 'top' is opened r/w.
> > *
> > - * Requires that the overlay to 'top' is opened r/w, so that the backing file
> > - * information in 'bs' can be properly updated.
> > + * 1) This will convert the following chain:
> > *
> > - * E.g., this will convert the following chain:
> > - * bottom <- base <- intermediate <- top <- active
> > + * ... <- base <- ... <- top <- overlay <-... <- active
> > *
> > * to
> > *
> > - * bottom <- base <- active
> > + * ... <- base <- overlay <- active
> > *
> > - * It is allowed for bottom==base, in which case it converts:
> > + * 2) It is allowed for bottom==base, in which case it converts:
> > *
> > - * base <- intermediate <- top <- active
> > + * base <- ... <- top <- overlay <- ... <- active
> > *
> > * to
> > *
> > - * base <- active
> > + * base <- overlay <- active
> > + *
> > + * 2) It also allows active==top, in which case it converts:
> > + *
> > + * ... <- base <- ... <- top (active)
> > + *
> > + * to
> > + *
> > + * ... <- base == active == top
> > + *
> > + * i.e. only base and lower remains: *top == *base when return.
> > + *
> > + * 3) If base==NULL, it will drop all the BDS below overlay and set its
> > + * backing_hd to NULL. I.e.:
> > + *
> > + * base(NULL) <- ... <- overlay <- ... <- active
> > + *
> > + * to
> > *
> > - * Error conditions:
> > - * if active == top, that is considered an error
> > + * overlay <- ... <- active
> > *
> > */
> > int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
> > BlockDriverState *base)
>
> With the active case, we aren't necessarily really just dropping
> intermediate images anymore. Maybe we should rename this function now to
> 'bdrv_rebase_chain()'?
>
> > {
> > - BlockDriverState *intermediate;
> > - BlockDriverState *base_bs = NULL;
> > - BlockDriverState *new_top_bs = NULL;
> > - BlkIntermediateStates *intermediate_state, *next;
> > - int ret = -EIO;
> > -
> > - QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
> > - QSIMPLEQ_INIT(&states_to_delete);
> > -
> > - if (!top->drv || !base->drv) {
> > - goto exit;
> > - }
> > -
> > - new_top_bs = bdrv_find_overlay(active, top);
> > + BlockDriverState *drop_start, *overlay;
> > + int ret = -EINVAL;
> >
> > - if (new_top_bs == NULL) {
> > - /* we could not find the image above 'top', this is an error */
> > + if (!top->drv || (base && !base->drv)) {
> > goto exit;
> > }
> > -
> > - /* special case of new_top_bs->backing_hd already pointing to base - nothing
> > - * to do, no intermediate images */
> > - if (new_top_bs->backing_hd == base) {
> > + if (top == base) {
> > ret = 0;
> > - goto exit;
> > - }
> > -
> > - intermediate = top;
> > -
> > - /* now we will go down through the list, and add each BDS we find
> > - * into our deletion queue, until we hit the 'base'
> > - */
> > - while (intermediate) {
> > - intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
> > - intermediate_state->bs = intermediate;
> > - QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
> > -
> > - if (intermediate->backing_hd == base) {
> > - base_bs = intermediate->backing_hd;
> > - break;
> > + } else if (top == active) {
> > + assert(base);
> > + drop_start = active->backing_hd;
> > + bdrv_swap(active, base);
> > + base->backing_hd = NULL;
> > + bdrv_unref(drop_start);
> > + ret = 0;
>
> This now orphans everything between active->backing_hd and the
> original base, without performing a bdrv_unref/delete on them.
>
Actually, nevermind on that - bdrv_delete() will end up recursively
closing the chain, until it reaches a NULL backing_hd.
So we have something like this:
|||<-- ([base]) <-- [B] <-- [A] <----- ([active])
bdrv_swap(active,base):
------ ([active]) <-- [B] <-- [A] |||-- ([base])
| ^
| |
--------------------------------
base->backing_hd = NULL:
|||<-- ([active]) <-- [B] <-- [A] |||-- ([base])
bdrv_unref(drop_start) will then close from [A] down until it reaches
a NULL backing_hd, and all we will be left is:
|||-- ([base])
So we are OK.
> > + } else {
> > + /* If there's an overlay, its backing_hd points to top's BDS now,
> > + * the top image is dropped but this BDS structure is kept and swapped
> > + * with base, this way we keep the pointers valid after dropping top */
> > + overlay = bdrv_find_overlay(active, top);
> > + if (!overlay) {
> > + goto exit;
> > + }
>
>
> > + if (base) {
> > + ret = bdrv_change_backing_file(overlay, base->filename,
> > + base->drv->format_name);
> > + } else {
> > + ret = bdrv_change_backing_file(overlay, NULL, NULL);
> > + }
> > + if (ret) {
> > + goto exit;
> > + }
> > + if (base) {
> > + drop_start = top->backing_hd;
> > + bdrv_swap(top, base);
> > + /* Break the loop formed by bdrv_swap */
> > + bdrv_set_backing_hd(base, NULL);
>
> And in the non-active case here, everything between top->backing_hd
> and the original base is orphaned as well. These should all be
> explicitly unreferenced.
Same here, bdrv_unref() will eventually go through the chain, starting
from top->backing_hd. But this is a problem; won't we end up in a
loop then?
Take this chain:
drop_start = [A]
|||-- ([base]) <-- [B] <--- [A] <--- ([top]) <--- [active]
bdrv_swap(top, base):
-- [B] <-- [A] <-- ([top]) |||--- ([base]) <-- [active]
| ^
| |
---------------------
Then we call bdrv_unref(drop_start (or bdrv_set_backing_hd() does),
and we end up with:
bdrv_unref(A)
bdrv_unref(B)
bdrv_unref(top)
bdrv_unref(A) <--- assert
.....
So I think we want this line:
> > + bdrv_set_backing_hd(base, NULL);
To be:
> > + bdrv_set_backing_hd(top, NULL);
Right? Or, just set top->backing_hd = NULL, so we get:
-- [B] <-- [A] |||-- ([top]) |||--- ([base]) <-- [active]
| ^
| |
---------------------------
bdrv_unref(A)
bdrv_unref(B)
bdrv_unref(top)
Which leaves:
|||--- ([base]) <-- [active]
So this part above still needs addressing, I think.
>
> Also, side effect:
> Caller needs to beware now that base and top are now swapped [1].
>
> > + } else {
> > + bdrv_set_backing_hd(overlay, NULL);
> > + drop_start = top;
>
> Again, everything between top and the original base is orphaned, but
> should be cleaned up.
>
> Caller does not have to worry about base and top being swapped [1].
>
This should be fine, I think.
I think everything else I mentioned below this point is still
relevant, however.
>
> > }
> > - intermediate = intermediate->backing_hd;
> > - }
> > - if (base_bs == NULL) {
> > - /* something went wrong, we did not end at the base. safely
> > - * unravel everything, and exit with error */
> > - goto exit;
> > - }
> > -
> > - /* success - we can delete the intermediate states, and link top->base */
> > - ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
> > - base_bs->drv ? base_bs->drv->format_name : "");
> > - if (ret) {
> > - goto exit;
> > - }
> > - new_top_bs->backing_hd = base_bs;
> > -
> > - bdrv_refresh_limits(new_top_bs);
> >
> > - QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
> > - /* so that bdrv_close() does not recursively close the chain */
> > - intermediate_state->bs->backing_hd = NULL;
> > - bdrv_unref(intermediate_state->bs);
> > + bdrv_unref(drop_start);
>
> We will get an assertion here. In the non-active case, the backing_hd
> is explicitly set to NULL via bdrv_set_backing_hd(). That function
> will call bdrv_unref() on the same BDS that drop_start was assigned,
> so we have a double call to bdrv_unref().
>
> > }
> > - ret = 0;
> > -
> > exit:
> > - QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
> > - g_free(intermediate_state);
> > - }
> > return ret;
> > }
> >
> > -
> > static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
> > size_t size)
> > {
> > diff --git a/block/commit.c b/block/commit.c
> > index acec4ac..b10eb79 100644
> > --- a/block/commit.c
> > +++ b/block/commit.c
> > @@ -142,6 +142,7 @@ wait:
> > if (!block_job_is_cancelled(&s->common) && sector_num == end) {
> > /* success */
> > ret = bdrv_drop_intermediate(active, top, base);
> > + base = top;
>
> This is where it is highlighted to me how odd it is to use the side
> effects of bdrv_swap() in bdrv_drop_intermediate() for the non-active
> layer case.
>
> The function bdrv_drop_intermediate() is now actually pretty complex
> and tricky to use, with side effects that the caller needs to beware
> of, that change depending on the nature of the arguments passed.
>
> [1] Side affects, depending on active, top, and base:
>
> active = top | base = NULL | side effect
> -----------------------------------------------
> (A) false | false | top and base are swapped
> (B) false | true | none
> (C) true | false | top and base are swapped
> (D) true | true | assert()
>
>
> Case (C) is reasonable, because active and base need to be swapped,
> and top == active. It is expected almost by definition.
>
> Case (A) is a bit odd, especially in light of case (B).
>
>
> > }
> >
> > exit_free_buf:
>
>
> Further down, out of the context of this patch, we have:
>
>
> exit_restore_reopen:
> /* restore base open flags here if appropriate (e.g., change the base back
> * to r/o). These reopens do not need to be atomic, since we won't abort
> * even on failure here */
> if (s->base_flags != bdrv_get_flags(base)) {
> bdrv_reopen(base, s->base_flags, NULL);
> }
>
> OK, 'base' is the one we want to operate on now, that was set to
> 'top', which has the contents of the old 'base'.
>
>
> overlay_bs = bdrv_find_overlay(active, top);
>
> Will we find the right overlay here? I think now overlay_bs will
> always be NULL, so we won't restore the r/o flags (if set) for the
> overlay of the original 'top'.
>
> if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
> bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
> }
>
>
>
>
>
>
> > --
> > 1.8.5.4
> >
> >
next prev parent reply other threads:[~2014-02-19 23:27 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-19 13:42 [Qemu-devel] [PATCH v14 00/14] Drop in_use from BlockDriverState and enable point-in-time snapshot exporting over NBD Fam Zheng
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 01/14] block: Add BlockOpType enum Fam Zheng
2014-02-19 15:25 ` Benoît Canet
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 02/14] block: Introduce op_blockers to BlockDriverState Fam Zheng
2014-02-19 15:26 ` Benoît Canet
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 03/14] block: Replace in_use with operation blocker Fam Zheng
2014-02-19 15:26 ` Benoît Canet
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 04/14] block: Move op_blocker check from block_job_create to its caller Fam Zheng
2014-02-19 15:28 ` Benoît Canet
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 05/14] block: Add bdrv_set_backing_hd() Fam Zheng
2014-02-19 15:27 ` Benoît Canet
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 06/14] block: Add backing_blocker in BlockDriverState Fam Zheng
2014-02-19 15:32 ` Benoît Canet
2014-02-19 21:17 ` Jeff Cody
2014-02-20 5:01 ` Fam Zheng
2014-02-20 5:08 ` Jeff Cody
2014-02-20 8:28 ` Fam Zheng
2014-02-20 11:59 ` Jeff Cody
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 07/14] block: Parse "backing" option to reference existing BDS Fam Zheng
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 08/14] block: Support dropping active in bdrv_drop_intermediate Fam Zheng
2014-02-19 15:34 ` Benoît Canet
2014-02-19 21:22 ` Jeff Cody
2014-02-19 23:24 ` Jeff Cody [this message]
2014-02-20 4:37 ` Fam Zheng
2014-02-20 5:57 ` Jeff Cody
2014-02-20 8:34 ` Fam Zheng
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 09/14] stream: Use bdrv_drop_intermediate and drop close_unused_images Fam Zheng
2014-02-19 21:23 ` Jeff Cody
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 10/14] qmp: Add command 'blockdev-backup' Fam Zheng
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 11/14] block: Allow backup on referenced named BlockDriverState Fam Zheng
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 12/14] block: Add blockdev-backup to transaction Fam Zheng
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 13/14] qemu-iotests: Test blockdev-backup in 055 Fam Zheng
2014-02-19 13:42 ` [Qemu-devel] [PATCH v14 14/14] qemu-iotests: Image fleecing test case 081 Fam Zheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140219232447.GA30664@localhost.localdomain \
--to=jcody@redhat.com \
--cc=armbru@redhat.com \
--cc=benoit.canet@irqsave.net \
--cc=famz@redhat.com \
--cc=imain@redhat.com \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rjones@redhat.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.