qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] Meeting notes on -blockdev, dynamic backend reconfiguration
@ 2016-12-05 12:03 Markus Armbruster
  2016-12-06 11:03 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2016-12-07  2:55 ` [Qemu-devel] " Fam Zheng
  0 siblings, 2 replies; 11+ messages in thread
From: Markus Armbruster @ 2016-12-05 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

I recently met Kevin, and we discussed two block layer topics in some
depth.

= -blockdev =

We want a command line option to mirror QMP blockdev-add for 2.9.
QemuOpts has to grow from "list of (key, simple value) plus conventions
to support lists of simple values in limited ways" to the expressive
power of JSON.

== Basic idea ==

QMP pipeline: JSON string - JSON parser - QObject - QObject input
visitor - QAPI object.  For commands with with 'gen': false, we stop at
QObject.  These are rare.

Command line now: option argument string - QemuOpts parser - QemuOpts.

We occasionally continue - options or string input visitor - QAPI
object.  Both visitors can't do arbitrary QAPI objects.  Both visitors
extend QemuOpts syntax.

Daniel Berrange posted patches to instead do - crumple - QObject -
qobject input visitor - QAPI object.  Arbitrary QObjects (thus QAPI
objects) are possible with dotted key convention, which is already used
by block layer.

As before, a visitor sits on top of QemuOpts, providing syntax
extensions.  Stacking parsers like that is not a good idea.  We want
*one* option argument parser, and we need it to yield a QObject.

== Backward compatibility issues ==

* Traditional key=value,... syntax

* The "repeated key is list" hack

* Options and string input visitor syntax extensions

* Dotted key convention

Hopefully, most of the solutions can be adapted from Daniel's patches.

== Type ambguity ==

In JSON, the type of a value is syntactically obvious.  The JSON parser
yields QObject with these types.  The QObject input visitor rejects
values with types that don't match the QAPI schema.

In the traditional key=value command line syntax, the type of a value
isn't obvious.  Options and string input visitor convert the string
value to the type expected by the QAPI schema.

Unlike a QObject from JSON, a QObject from QemuOpts has only string
values, and the QObject input visitor needs to be able to convert
instead of reject.  Daniel's patches do that.

== Action item ==

Markus to explore the proposed solution as soon as possible.


= Dynamic block backend reconfiguration =

== Mirror job ==

State before the job:

    frontend
        |
       BB
        |
       BDS

Frontend writes flow down.

Passive mirror job, as it currently works:

    frontend   mirror-job
        |       |      |
        BB      BB'    BB2
        |  ____/       |
        | /            |
       BDS            BDS2

The mirror job copies the contents of BDS to BDS2.  To handle frontend
writes, BDS maintains a dirty bitmap, which mirror-job uses to copy
updates from BB' to BB2.

Pivot to mirror on job completion: replace BB's child BDS by BDS2,
delete mirror-job and its BB', BB2.

    frontend
        |
        BB
         \_____________
                       \
       BDS            BDS2


Future mirror job using a mirror-filter:

    frontend   mirror-job
        |         |
        BB       BB'
        |        /
    mirror-filter
        |        \
       BDS      BDS2

Passive mirror-filter: maintains dirty bitmap, copies from BDS to BDS2.

Active mirror-filter: no dirty bitmap, mirrors writes to BDS2 directly.

Can easily switch from passive to active at any time.

Pivot: replace parent of mirror-filter's child mirror-filter by BDS2,
delete mirror job and its BB'.  "Parent of" in case other filters have
been inserted: we drop the ones below mirror-filter, and keep the ones
above.

== Backup job ==

Current backup job:

    frontend   backup-job
        |       |      |
        BB      BB'    BB2
        |  ____/       |
        | /            |
       BDS            BDS2

The backup job copies the contents of BDS to BDS2.  To handle frontend
writes, BDS provices a before-write-notifier, backup-job uses it to copy
old data from BB' to BB2 right before it's overwritten.

Pivot: delete backup-job and its BB', BB2.

    frontend
        |
        BB
        |
        |
       BDS            BDS2

Future backup job using a backup-filter:

    frontend   backup-job
        |         |
        BB       BB'
        |        /
    backup-filter
        |       \
       BDS      BDS2

backup-filter copies old data from BDS to BDS2 before it forwards write
to BDS.

Pivot: replace parent of backup-filter's child backup-filter by BDS2,
delete backup-job and its BB'.

== Commit job ==

State before the job:

       frontend
           |
           BB
           |
         QCOW2
    file /   \ backing
        /     \
       /       \
     BDS1     QCOW2_top
         file /   \ backing
             /     .
          BDS_top   .
                     \
                  BDS_base

"file" and "backing" are the QCOW2 child names for the delta image and
the backing image, respectively.

Frontend writes flow to BDS1.

Current commit job to commit from QCOW2_top down to BDS_base:

       frontend
           |
           BB               commit-job
           |                 /     \
         QCOW2           BB_top  BB_base
    file /   \ backing     /       /
        /     \   ________/       /
       /       \ /               /
     BDS1     QCOW2_top         /
         file /   \ backing    /
             /     .          /
          BDS_top   .   _____/
                     \ /
                  BDS_base

commit-job copies anything allocated above BDS_base up to BDS_top from
BB_top to BB_base.

Pivot: replace backing child of QCOW2_top by BDS_base, delete commit-job
and its BB_top, BB_base.

       frontend
           |
           BB
           |
         QCOW2
    file /   \ backing
        /     \
       /       \
     BDS1     QCOW2_top
         file /   \ backing
             /     \
          BDS_top  BDS_base

Drops any filters meanwhile inserted between QCOW2_top and BDS_base.
Should we have a (otherwise no op) commit-filter node to provide a place
for filters we want to keep?  Would op blockers need / profit from such
a filter?

== Streaming job ==

Just like commit (hopefully).

== Basic dynamic reconfiguration operation ==

The basic operation is "replace child".

Beware of race conditions.  Consider:

          BB
          |
    mirror-filter
          |
         BDS

Add a throttle filter under BB while the mirror job is running.  First
step, create the filter:

          BB    throttle-filter
          |     /
    mirror-filter
          |
         BDS

Second step, replace child of BB by the new filter:

          BB
          |
   throttle-filter
          |
    mirror-filter
          |
         BDS

But: if mirror-filter goes away between the two steps, the replace
brings it right back!

To guard against such races, we need to specify both ends of the edge
being replaced, i.e. parent, child name, actual child.  Then the replace
step fails if the mirror-filter has gone away.  We can either fail the
whole operation, or start over.

Alternatively, transactions, but that feels much more complex.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block] Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-05 12:03 [Qemu-devel] Meeting notes on -blockdev, dynamic backend reconfiguration Markus Armbruster
@ 2016-12-06 11:03 ` Stefan Hajnoczi
  2016-12-07  2:55 ` [Qemu-devel] " Fam Zheng
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-12-06 11:03 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 261 bytes --]

On Mon, Dec 05, 2016 at 01:03:50PM +0100, Markus Armbruster wrote:
> I recently met Kevin, and we discussed two block layer topics in some
> depth.

Thanks for sharing this.  Both topics were a good read and the direction
you are heading in looks good.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-05 12:03 [Qemu-devel] Meeting notes on -blockdev, dynamic backend reconfiguration Markus Armbruster
  2016-12-06 11:03 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2016-12-07  2:55 ` Fam Zheng
  2016-12-07  9:48   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
  1 sibling, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2016-12-07  2:55 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, qemu-block

On Mon, 12/05 13:03, Markus Armbruster wrote:
> == Basic dynamic reconfiguration operation ==
> 
> The basic operation is "replace child".
> 
> Beware of race conditions.  Consider:
> 
>           BB
>           |
>     mirror-filter
>           |
>          BDS
> 
> Add a throttle filter under BB while the mirror job is running.  First
> step, create the filter:
> 
>           BB    throttle-filter
>           |     /
>     mirror-filter
>           |
>          BDS
> 
> Second step, replace child of BB by the new filter:
> 
>           BB
>           |
>    throttle-filter
>           |
>     mirror-filter
>           |
>          BDS
> 
> But: if mirror-filter goes away between the two steps, the replace
> brings it right back!
> 
> To guard against such races, we need to specify both ends of the edge
> being replaced, i.e. parent, child name, actual child.  Then the replace
> step fails if the mirror-filter has gone away.  We can either fail the
> whole operation, or start over.
> 
> Alternatively, transactions, but that feels much more complex.
> 

Isn't it easy to make creating throttle-filter and replacing child happen in the
same critical section of BQL, without any coroutine yield?  If so I think there
is no race to worry about, mirror-filter should go away only after a QMP command.

Fam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-07  2:55 ` [Qemu-devel] " Fam Zheng
@ 2016-12-07  9:48   ` Kevin Wolf
  2016-12-07 10:03     ` Fam Zheng
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2016-12-07  9:48 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Markus Armbruster, qemu-devel, qemu-block

Am 07.12.2016 um 03:55 hat Fam Zheng geschrieben:
> On Mon, 12/05 13:03, Markus Armbruster wrote:
> > == Basic dynamic reconfiguration operation ==
> > 
> > The basic operation is "replace child".
> > 
> > Beware of race conditions.  Consider:
> > 
> >           BB
> >           |
> >     mirror-filter
> >           |
> >          BDS
> > 
> > Add a throttle filter under BB while the mirror job is running.  First
> > step, create the filter:
> > 
> >           BB    throttle-filter
> >           |     /
> >     mirror-filter
> >           |
> >          BDS
> > 
> > Second step, replace child of BB by the new filter:
> > 
> >           BB
> >           |
> >    throttle-filter
> >           |
> >     mirror-filter
> >           |
> >          BDS
> > 
> > But: if mirror-filter goes away between the two steps, the replace
> > brings it right back!
> > 
> > To guard against such races, we need to specify both ends of the edge
> > being replaced, i.e. parent, child name, actual child.  Then the replace
> > step fails if the mirror-filter has gone away.  We can either fail the
> > whole operation, or start over.
> > 
> > Alternatively, transactions, but that feels much more complex.
> > 
> 
> Isn't it easy to make creating throttle-filter and replacing child
> happen in the same critical section of BQL, without any coroutine
> yield?

Letting it happen in the same critical section of BQL would mean a
QMP transaction of blockdev-add (for the new throttle node) and
blockdev-replace-child or whatever it would be called. We weren't quite
confident that trying to make blockdev-add transactionable would be
possible or a good idea.

> If so I think there is no race to worry about, mirror-filter should go
> away only after a QMP command.

Currently, a mirror job goes away whenever it is done. This is not
directly tied to a QMP command.

Of course, in the new job API we want an explicit job-delete, so in
that case it wouldn't happen, but we need to keep the old case for
compatibility.

Kevin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-07  9:48   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
@ 2016-12-07 10:03     ` Fam Zheng
  2016-12-08 12:46       ` Markus Armbruster
  0 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2016-12-07 10:03 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Markus Armbruster, qemu-devel, qemu-block

On Wed, 12/07 10:48, Kevin Wolf wrote:
> > If so I think there is no race to worry about, mirror-filter should go
> > away only after a QMP command.
> 
> Currently, a mirror job goes away whenever it is done. This is not
> directly tied to a QMP command.

Ah right, block-job-complete is only "start to complete" and the job goes away
at some later point. I thought this is "the" QMP command but it is not.

> 
> Of course, in the new job API we want an explicit job-delete, so in
> that case it wouldn't happen, but we need to keep the old case for
> compatibility.

Another possibility is to add a placeholder node in the right location first
then fill in the actual throttling node once created. QMP owns the placeholder
node so it won't suddenly vanish when mirror job goes away.

Fam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-07 10:03     ` Fam Zheng
@ 2016-12-08 12:46       ` Markus Armbruster
  2016-12-08 13:47         ` Fam Zheng
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Armbruster @ 2016-12-08 12:46 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, qemu-block, qemu-devel

Fam Zheng <famz@redhat.com> writes:

> On Wed, 12/07 10:48, Kevin Wolf wrote:
>> > If so I think there is no race to worry about, mirror-filter should go
>> > away only after a QMP command.
>> 
>> Currently, a mirror job goes away whenever it is done. This is not
>> directly tied to a QMP command.
>
> Ah right, block-job-complete is only "start to complete" and the job goes away
> at some later point. I thought this is "the" QMP command but it is not.
>
>> 
>> Of course, in the new job API we want an explicit job-delete, so in
>> that case it wouldn't happen, but we need to keep the old case for
>> compatibility.
>
> Another possibility is to add a placeholder node in the right location first
> then fill in the actual throttling node once created. QMP owns the placeholder
> node so it won't suddenly vanish when mirror job goes away.

I'm not sure I understand this idea.  Could you explain it in a bit more
detail, perhaps even with a bit of ASCII art?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-08 12:46       ` Markus Armbruster
@ 2016-12-08 13:47         ` Fam Zheng
  2016-12-12 13:58           ` Markus Armbruster
  0 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2016-12-08 13:47 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, qemu-block, qemu-devel

On Thu, 12/08 13:46, Markus Armbruster wrote:
> Fam Zheng <famz@redhat.com> writes:
> 
> > On Wed, 12/07 10:48, Kevin Wolf wrote:
> >> > If so I think there is no race to worry about, mirror-filter should go
> >> > away only after a QMP command.
> >> 
> >> Currently, a mirror job goes away whenever it is done. This is not
> >> directly tied to a QMP command.
> >
> > Ah right, block-job-complete is only "start to complete" and the job goes away
> > at some later point. I thought this is "the" QMP command but it is not.
> >
> >> 
> >> Of course, in the new job API we want an explicit job-delete, so in
> >> that case it wouldn't happen, but we need to keep the old case for
> >> compatibility.
> >
> > Another possibility is to add a placeholder node in the right location first
> > then fill in the actual throttling node once created. QMP owns the placeholder
> > node so it won't suddenly vanish when mirror job goes away.
> 
> I'm not sure I understand this idea.  Could you explain it in a bit more
> detail, perhaps even with a bit of ASCII art?

OK, I'll base on your version:

== Basic dynamic reconfiguration operation ==

The two primitives are "insert placeholder" and bdrv_replace_child;

A placeholder BDS is a dummy "pass through everything" filter. Inserting
placeholder is something like:

    void bdrv_insert_placeholder(BdrvChild *child) {
        BlockDriverState *ph = bdrv_new_placeholder();
        ph->file->bs = child->bs;
        child->bs = ph;
    }

Consider:

          BB
          |
    mirror-filter
          |
         BDS

Add a throttle filter under BB while the mirror job is running.  First step,
create the dummy _placeholder_ node and throttle-filter node (the existence of
placeholder node can be an implementation detail hidden from user):

    placeholder = bdrv_insert_placeholder(BB->root);
    throttle_filter = bdrv_create_throttle_filter(placeholder);

we get:

          BB     throttle-filter
          |     /
     placeholder
          |
    mirror-filter
          |
         BDS

Second step, move throttle filter in:

    bdrv_replace_child(BB->root, throttle_filter);
    bdrv_replace_child(throttle_filter->file, throttle_filter->file->bs->file->bs);
    bdrv_unref(placeholder);

we get:

          BB
          |
   throttle-filter
          |
    mirror-filter
          |
         BDS

Does this make us safe from racing with mirror-filter disappearing?

Fam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-08 13:47         ` Fam Zheng
@ 2016-12-12 13:58           ` Markus Armbruster
  2016-12-12 14:43             ` Fam Zheng
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Armbruster @ 2016-12-12 13:58 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, qemu-devel, qemu-block

Fam Zheng <famz@redhat.com> writes:

> On Thu, 12/08 13:46, Markus Armbruster wrote:
>> Fam Zheng <famz@redhat.com> writes:
>> 
>> > On Wed, 12/07 10:48, Kevin Wolf wrote:
>> >> > If so I think there is no race to worry about, mirror-filter should go
>> >> > away only after a QMP command.
>> >> 
>> >> Currently, a mirror job goes away whenever it is done. This is not
>> >> directly tied to a QMP command.
>> >
>> > Ah right, block-job-complete is only "start to complete" and the job goes away
>> > at some later point. I thought this is "the" QMP command but it is not.
>> >
>> >> 
>> >> Of course, in the new job API we want an explicit job-delete, so in
>> >> that case it wouldn't happen, but we need to keep the old case for
>> >> compatibility.
>> >
>> > Another possibility is to add a placeholder node in the right location first
>> > then fill in the actual throttling node once created. QMP owns the placeholder
>> > node so it won't suddenly vanish when mirror job goes away.
>> 
>> I'm not sure I understand this idea.  Could you explain it in a bit more
>> detail, perhaps even with a bit of ASCII art?
>
> OK, I'll base on your version:
>
> == Basic dynamic reconfiguration operation ==
>
> The two primitives are "insert placeholder" and bdrv_replace_child;
>
> A placeholder BDS is a dummy "pass through everything" filter. Inserting
> placeholder is something like:
>
>     void bdrv_insert_placeholder(BdrvChild *child) {
>         BlockDriverState *ph = bdrv_new_placeholder();
>         ph->file->bs = child->bs;
>         child->bs = ph;
>     }
>
> Consider:
>
>           BB
>           |
>     mirror-filter
>           |
>          BDS
>
> Add a throttle filter under BB while the mirror job is running.  First step,
> create the dummy _placeholder_ node and throttle-filter node (the existence of
> placeholder node can be an implementation detail hidden from user):
>
>     placeholder = bdrv_insert_placeholder(BB->root);
>     throttle_filter = bdrv_create_throttle_filter(placeholder);
>
> we get:
>
>           BB     throttle-filter
>           |     /
>      placeholder
>           |
>     mirror-filter
>           |
>          BDS
>
> Second step, move throttle filter in:
>
>     bdrv_replace_child(BB->root, throttle_filter);
>     bdrv_replace_child(throttle_filter->file, throttle_filter->file->bs->file->bs);
>     bdrv_unref(placeholder);
>
> we get:
>
>           BB
>           |
>    throttle-filter
>           |
>     mirror-filter
>           |
>          BDS
>
> Does this make us safe from racing with mirror-filter disappearing?

Thanks!  I figure your placeholder node does immunize the splicing in of
the throttle-filter against the removal of mirror-filter.

Given a specific dynamic reconfiguration, where do we have to put
placeholder nodes to immunize it against arbitrary other dynamic
changes?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-12 13:58           ` Markus Armbruster
@ 2016-12-12 14:43             ` Fam Zheng
  2016-12-12 18:23               ` Markus Armbruster
  0 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2016-12-12 14:43 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, qemu-devel, qemu-block

On Mon, 12/12 14:58, Markus Armbruster wrote:
> Fam Zheng <famz@redhat.com> writes:
> 
> > On Thu, 12/08 13:46, Markus Armbruster wrote:
> >> Fam Zheng <famz@redhat.com> writes:
> >> 
> >> > On Wed, 12/07 10:48, Kevin Wolf wrote:
> >> >> > If so I think there is no race to worry about, mirror-filter should go
> >> >> > away only after a QMP command.
> >> >> 
> >> >> Currently, a mirror job goes away whenever it is done. This is not
> >> >> directly tied to a QMP command.
> >> >
> >> > Ah right, block-job-complete is only "start to complete" and the job goes away
> >> > at some later point. I thought this is "the" QMP command but it is not.
> >> >
> >> >> 
> >> >> Of course, in the new job API we want an explicit job-delete, so in
> >> >> that case it wouldn't happen, but we need to keep the old case for
> >> >> compatibility.
> >> >
> >> > Another possibility is to add a placeholder node in the right location first
> >> > then fill in the actual throttling node once created. QMP owns the placeholder
> >> > node so it won't suddenly vanish when mirror job goes away.
> >> 
> >> I'm not sure I understand this idea.  Could you explain it in a bit more
> >> detail, perhaps even with a bit of ASCII art?
> >
> > OK, I'll base on your version:
> >
> > == Basic dynamic reconfiguration operation ==
> >
> > The two primitives are "insert placeholder" and bdrv_replace_child;
> >
> > A placeholder BDS is a dummy "pass through everything" filter. Inserting
> > placeholder is something like:
> >
> >     void bdrv_insert_placeholder(BdrvChild *child) {
> >         BlockDriverState *ph = bdrv_new_placeholder();
> >         ph->file->bs = child->bs;
> >         child->bs = ph;
> >     }
> >
> > Consider:
> >
> >           BB
> >           |
> >     mirror-filter
> >           |
> >          BDS
> >
> > Add a throttle filter under BB while the mirror job is running.  First step,
> > create the dummy _placeholder_ node and throttle-filter node (the existence of
> > placeholder node can be an implementation detail hidden from user):
> >
> >     placeholder = bdrv_insert_placeholder(BB->root);
> >     throttle_filter = bdrv_create_throttle_filter(placeholder);
> >
> > we get:
> >
> >           BB     throttle-filter
> >           |     /
> >      placeholder
> >           |
> >     mirror-filter
> >           |
> >          BDS
> >
> > Second step, move throttle filter in:
> >
> >     bdrv_replace_child(BB->root, throttle_filter);
> >     bdrv_replace_child(throttle_filter->file, throttle_filter->file->bs->file->bs);
> >     bdrv_unref(placeholder);
> >
> > we get:
> >
> >           BB
> >           |
> >    throttle-filter
> >           |
> >     mirror-filter
> >           |
> >          BDS
> >
> > Does this make us safe from racing with mirror-filter disappearing?
> 
> Thanks!  I figure your placeholder node does immunize the splicing in of
> the throttle-filter against the removal of mirror-filter.
> 
> Given a specific dynamic reconfiguration, where do we have to put
> placeholder nodes to immunize it against arbitrary other dynamic
> changes?

Generally, a placeholder node is essentially an insertion to the middle of an
existing edge, namely to change:

         NodeA
       /   |   \
      /    |    \
 [...]   NodeB   [...]

into:

         NodeA
       /   |   \
      /    |    \
 [...]  p-holdr  [...]
           |
           |
         NodeB

and the operation is safe because of BQL. Our later bdrv_replace_child()
operations on it will always work because a dynamic reconfiguration won't delete
the placeholder node in any case.

On the other hand, concurrent dynamic changes could yield usual outcomes from
user PoV (like the involved node is no longer in the same place or disappears),
but I think it is still in some degree under user's control. E.g. a
mirror-filter won't go away automatically until user does
block-job-complete/cancel.

Fam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-12 14:43             ` Fam Zheng
@ 2016-12-12 18:23               ` Markus Armbruster
  2016-12-13  6:51                 ` Fam Zheng
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Armbruster @ 2016-12-12 18:23 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, qemu-devel, qemu-block

Fam Zheng <famz@redhat.com> writes:

> On Mon, 12/12 14:58, Markus Armbruster wrote:
>> Fam Zheng <famz@redhat.com> writes:
>> 
>> > On Thu, 12/08 13:46, Markus Armbruster wrote:
>> >> Fam Zheng <famz@redhat.com> writes:
>> >> 
>> >> > On Wed, 12/07 10:48, Kevin Wolf wrote:
>> >> >> > If so I think there is no race to worry about, mirror-filter should go
>> >> >> > away only after a QMP command.
>> >> >> 
>> >> >> Currently, a mirror job goes away whenever it is done. This is not
>> >> >> directly tied to a QMP command.
>> >> >
>> >> > Ah right, block-job-complete is only "start to complete" and the job goes away
>> >> > at some later point. I thought this is "the" QMP command but it is not.
>> >> >
>> >> >> 
>> >> >> Of course, in the new job API we want an explicit job-delete, so in
>> >> >> that case it wouldn't happen, but we need to keep the old case for
>> >> >> compatibility.
>> >> >
>> >> > Another possibility is to add a placeholder node in the right location first
>> >> > then fill in the actual throttling node once created. QMP owns the placeholder
>> >> > node so it won't suddenly vanish when mirror job goes away.
>> >> 
>> >> I'm not sure I understand this idea.  Could you explain it in a bit more
>> >> detail, perhaps even with a bit of ASCII art?
>> >
>> > OK, I'll base on your version:
>> >
>> > == Basic dynamic reconfiguration operation ==
>> >
>> > The two primitives are "insert placeholder" and bdrv_replace_child;
>> >
>> > A placeholder BDS is a dummy "pass through everything" filter. Inserting
>> > placeholder is something like:
>> >
>> >     void bdrv_insert_placeholder(BdrvChild *child) {
>> >         BlockDriverState *ph = bdrv_new_placeholder();
>> >         ph->file->bs = child->bs;
>> >         child->bs = ph;
>> >     }
>> >
>> > Consider:
>> >
>> >           BB
>> >           |
>> >     mirror-filter
>> >           |
>> >          BDS
>> >
>> > Add a throttle filter under BB while the mirror job is running.  First step,
>> > create the dummy _placeholder_ node and throttle-filter node (the existence of
>> > placeholder node can be an implementation detail hidden from user):
>> >
>> >     placeholder = bdrv_insert_placeholder(BB->root);
>> >     throttle_filter = bdrv_create_throttle_filter(placeholder);
>> >
>> > we get:
>> >
>> >           BB     throttle-filter
>> >           |     /
>> >      placeholder
>> >           |
>> >     mirror-filter
>> >           |
>> >          BDS
>> >
>> > Second step, move throttle filter in:
>> >
>> >     bdrv_replace_child(BB->root, throttle_filter);
>> >     bdrv_replace_child(throttle_filter->file, throttle_filter->file->bs->file->bs);
>> >     bdrv_unref(placeholder);
>> >
>> > we get:
>> >
>> >           BB
>> >           |
>> >    throttle-filter
>> >           |
>> >     mirror-filter
>> >           |
>> >          BDS
>> >
>> > Does this make us safe from racing with mirror-filter disappearing?
>> 
>> Thanks!  I figure your placeholder node does immunize the splicing in of
>> the throttle-filter against the removal of mirror-filter.
>> 
>> Given a specific dynamic reconfiguration, where do we have to put
>> placeholder nodes to immunize it against arbitrary other dynamic
>> changes?
>
> Generally, a placeholder node is essentially an insertion to the middle of an
> existing edge, namely to change:
>
>          NodeA
>        /   |   \
>       /    |    \
>  [...]   NodeB   [...]
>
> into:
>
>          NodeA
>        /   |   \
>       /    |    \
>  [...]  p-holdr  [...]
>            |
>            |
>          NodeB
>
> and the operation is safe because of BQL. Our later bdrv_replace_child()
> operations on it will always work because a dynamic reconfiguration won't delete
> the placeholder node in any case.
>
> On the other hand, concurrent dynamic changes could yield usual outcomes from
> user PoV (like the involved node is no longer in the same place or disappears),
> but I think it is still in some degree under user's control. E.g. a
> mirror-filter won't go away automatically until user does
> block-job-complete/cancel.

I think I understand how we can safely insert a placeholder.  My
question is: *where* do we have to insert placeholders before a certain
dynamic reconfiguration to make that reconfiguration safe?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  Meeting notes on -blockdev, dynamic backend reconfiguration
  2016-12-12 18:23               ` Markus Armbruster
@ 2016-12-13  6:51                 ` Fam Zheng
  0 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2016-12-13  6:51 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, qemu-devel, qemu-block

On Mon, 12/12 19:23, Markus Armbruster wrote:
> Fam Zheng <famz@redhat.com> writes:
> 
> > On Mon, 12/12 14:58, Markus Armbruster wrote:
> >> Fam Zheng <famz@redhat.com> writes:
> >> 
> >> > On Thu, 12/08 13:46, Markus Armbruster wrote:
> >> >> Fam Zheng <famz@redhat.com> writes:
> >> >> 
> >> >> > On Wed, 12/07 10:48, Kevin Wolf wrote:
> >> >> >> > If so I think there is no race to worry about, mirror-filter should go
> >> >> >> > away only after a QMP command.
> >> >> >> 
> >> >> >> Currently, a mirror job goes away whenever it is done. This is not
> >> >> >> directly tied to a QMP command.
> >> >> >
> >> >> > Ah right, block-job-complete is only "start to complete" and the job goes away
> >> >> > at some later point. I thought this is "the" QMP command but it is not.
> >> >> >
> >> >> >> 
> >> >> >> Of course, in the new job API we want an explicit job-delete, so in
> >> >> >> that case it wouldn't happen, but we need to keep the old case for
> >> >> >> compatibility.
> >> >> >
> >> >> > Another possibility is to add a placeholder node in the right location first
> >> >> > then fill in the actual throttling node once created. QMP owns the placeholder
> >> >> > node so it won't suddenly vanish when mirror job goes away.
> >> >> 
> >> >> I'm not sure I understand this idea.  Could you explain it in a bit more
> >> >> detail, perhaps even with a bit of ASCII art?
> >> >
> >> > OK, I'll base on your version:
> >> >
> >> > == Basic dynamic reconfiguration operation ==
> >> >
> >> > The two primitives are "insert placeholder" and bdrv_replace_child;
> >> >
> >> > A placeholder BDS is a dummy "pass through everything" filter. Inserting
> >> > placeholder is something like:
> >> >
> >> >     void bdrv_insert_placeholder(BdrvChild *child) {
> >> >         BlockDriverState *ph = bdrv_new_placeholder();
> >> >         ph->file->bs = child->bs;
> >> >         child->bs = ph;
> >> >     }
> >> >
> >> > Consider:
> >> >
> >> >           BB
> >> >           |
> >> >     mirror-filter
> >> >           |
> >> >          BDS
> >> >
> >> > Add a throttle filter under BB while the mirror job is running.  First step,
> >> > create the dummy _placeholder_ node and throttle-filter node (the existence of
> >> > placeholder node can be an implementation detail hidden from user):
> >> >
> >> >     placeholder = bdrv_insert_placeholder(BB->root);
> >> >     throttle_filter = bdrv_create_throttle_filter(placeholder);
> >> >
> >> > we get:
> >> >
> >> >           BB     throttle-filter
> >> >           |     /
> >> >      placeholder
> >> >           |
> >> >     mirror-filter
> >> >           |
> >> >          BDS
> >> >
> >> > Second step, move throttle filter in:
> >> >
> >> >     bdrv_replace_child(BB->root, throttle_filter);
> >> >     bdrv_replace_child(throttle_filter->file, throttle_filter->file->bs->file->bs);
> >> >     bdrv_unref(placeholder);
> >> >
> >> > we get:
> >> >
> >> >           BB
> >> >           |
> >> >    throttle-filter
> >> >           |
> >> >     mirror-filter
> >> >           |
> >> >          BDS
> >> >
> >> > Does this make us safe from racing with mirror-filter disappearing?
> >> 
> >> Thanks!  I figure your placeholder node does immunize the splicing in of
> >> the throttle-filter against the removal of mirror-filter.
> >> 
> >> Given a specific dynamic reconfiguration, where do we have to put
> >> placeholder nodes to immunize it against arbitrary other dynamic
> >> changes?
> >
> > Generally, a placeholder node is essentially an insertion to the middle of an
> > existing edge, namely to change:
> >
> >          NodeA
> >        /   |   \
> >       /    |    \
> >  [...]   NodeB   [...]
> >
> > into:
> >
> >          NodeA
> >        /   |   \
> >       /    |    \
> >  [...]  p-holdr  [...]
> >            |
> >            |
> >          NodeB
> >
> > and the operation is safe because of BQL. Our later bdrv_replace_child()
> > operations on it will always work because a dynamic reconfiguration won't delete
> > the placeholder node in any case.
> >
> > On the other hand, concurrent dynamic changes could yield usual outcomes from
> > user PoV (like the involved node is no longer in the same place or disappears),
> > but I think it is still in some degree under user's control. E.g. a
> > mirror-filter won't go away automatically until user does
> > block-job-complete/cancel.
> 
> I think I understand how we can safely insert a placeholder.  My
> question is: *where* do we have to insert placeholders before a certain
> dynamic reconfiguration to make that reconfiguration safe?

The placeholder will mainly protect "insertions" and therefore the location
woule be on edge where we insert a new filter.

Removing a node doesn't need placeholder protection.

Updating a node (to a different type) probably doesn't need that either but I'm
not 100% sure.

Fam

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-12-13  6:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-05 12:03 [Qemu-devel] Meeting notes on -blockdev, dynamic backend reconfiguration Markus Armbruster
2016-12-06 11:03 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-12-07  2:55 ` [Qemu-devel] " Fam Zheng
2016-12-07  9:48   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2016-12-07 10:03     ` Fam Zheng
2016-12-08 12:46       ` Markus Armbruster
2016-12-08 13:47         ` Fam Zheng
2016-12-12 13:58           ` Markus Armbruster
2016-12-12 14:43             ` Fam Zheng
2016-12-12 18:23               ` Markus Armbruster
2016-12-13  6:51                 ` Fam Zheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).