[Qemu-devel] blockdev-commit design

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] blockdev-commit design
@ 2017-09-26 17:59 Kevin Wolf
  2017-09-26 18:29 ` Eric Blake
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Kevin Wolf @ 2017-09-26 17:59 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, stefanha, jcody

Hi,

as the next step after my commit block job fixes, I'm trying to
implement a new, clean version of the QMP command, which I'm calling
blockdev-commit for consistency with all the other "modern" QMP
commands.

I'll start with the schema that I have so far:

{ 'command': 'blockdev-commit',
  'data': { 'job-id': 'str', 'top': 'str', '*base': 'str'
            '*backing-file': 'str', '*speed': 'int',
            '*filter-node-name': 'str' } }

In comparison with the old command, the important changes are:

  * top/base are node names instead of file names.

  * You don't need to specify the active layer any more (not the least
    because there could very well be more than one of them), but top
    becomes mandatory instead.

  * top/base don't accept device (BlockBackend) names, so for
    consistency with other block jobs, job-id becomes mandatory.

    Possible alternative: Accept device names and make them the default
    for job-id. This is more consistent with existing blockdev-*
    commands, but I consider BlockBackend names deprecated, so I prefer
    not adding them here.

  * filer-node-name is kept optional for now. Should we make it
    mandatory, too?

This was the easy part. Then I started looking at the code and found a
few a bit more interesting questions:

  * The old block-commit command decides between an "actual" commit job
    and the mirror-based active commit based on whether top is the
    active layer.

    We don't get an option for the active layer any more now, so this
    isn't how things can work for blockdev-commit. We could probably
    check whether top has a BlockBackend parent, but that's not really
    what we're interested in anyway. Maybe the best we could do to
    decide this automatically is to check whether there is any parent of
    top that requires write permissions. If there is, we need active
    commit, otherwise the "normal" one is good enough.

    However, who says that the intentions of the user stay as we deduce
    them at the start of the block job? Who says that the user doesn't
    want to add a writable disk as a user of the node while the block
    job is running?

    The optimal solution to this would be that the commit filter node
    responds to permission requests and switches between active and
    "normal" commit mode. I'm not sure how hard this would be to
    implement.

    As long as we don't have the automatic switch, do we have to allow
    the user to specify explicitly which mode they want instead of
    automatically choosing one?

  * The 'backing-file' option (which specifies the new backing file
    string for parents after the commit job completes) is currently not
    allowed if top is the active layer. If we allow graph changes, this
    doesn't seem to make sense to me because even if top doesn't have a
    parent node when the job starts, it could have one when it's
    completed.

Any opinions on this, especially the active/normal commit thing?

Kevin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] blockdev-commit design
  2017-09-26 17:59 [Qemu-devel] blockdev-commit design Kevin Wolf
@ 2017-09-26 18:29 ` Eric Blake
  2017-09-26 19:11   ` Kevin Wolf
  2017-09-27 16:29 ` Max Reitz
  2017-10-02 15:01 ` [Qemu-devel] [Qemu-block] " Kashyap Chamarthy
  2 siblings, 1 reply; 7+ messages in thread
From: Eric Blake @ 2017-09-26 18:29 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: qemu-devel, mreitz, stefanha, jcody

[-- Attachment #1: Type: text/plain, Size: 6147 bytes --]

On 09/26/2017 12:59 PM, Kevin Wolf wrote:
> Hi,
> 
> as the next step after my commit block job fixes, I'm trying to
> implement a new, clean version of the QMP command, which I'm calling
> blockdev-commit for consistency with all the other "modern" QMP
> commands.
> 
> I'll start with the schema that I have so far:
> 
> { 'command': 'blockdev-commit',
>   'data': { 'job-id': 'str', 'top': 'str', '*base': 'str'
>             '*backing-file': 'str', '*speed': 'int',
>             '*filter-node-name': 'str' } }

Seems okay at first glance, modulo your discussion below on active vs.
passive.

> 
> In comparison with the old command, the important changes are:
> 
>   * top/base are node names instead of file names.

I can agree to that.  Do we still allow device names to resolve into the
top-most node attached to the device? That matters for 'top', but not
for 'base'.

> 
>   * You don't need to specify the active layer any more (not the least
>     because there could very well be more than one of them), but top
>     becomes mandatory instead.

Libvirt should be fine with that.

> 
>   * top/base don't accept device (BlockBackend) names, so for
>     consistency with other block jobs, job-id becomes mandatory.
> 
>     Possible alternative: Accept device names and make them the default
>     for job-id. This is more consistent with existing blockdev-*
>     commands, but I consider BlockBackend names deprecated, so I prefer
>     not adding them here.

Oh, you're answering my question above.  I'm okay if job-id is
mandatory, even if we allow the shortcut of a device name for 'top'
mapping to its attached active node.

> 
>   * filer-node-name is kept optional for now. Should we make it
>     mandatory, too?
> 
> This was the easy part. Then I started looking at the code and found a
> few a bit more interesting questions:
> 
>   * The old block-commit command decides between an "actual" commit job
>     and the mirror-based active commit based on whether top is the
>     active layer.

And libvirt HAS to know whether it is requesting an active vs.
intermediate commit job up front, because the two code paths have
different expected signals for handling job completion (it is only
active commit that reaches a ready point between phases, requiring
further QMP commands to end the job).

> 
>     We don't get an option for the active layer any more now, so this
>     isn't how things can work for blockdev-commit. We could probably
>     check whether top has a BlockBackend parent, but that's not really
>     what we're interested in anyway. Maybe the best we could do to
>     decide this automatically is to check whether there is any parent of
>     top that requires write permissions. If there is, we need active
>     commit, otherwise the "normal" one is good enough.
> 
>     However, who says that the intentions of the user stay as we deduce
>     them at the start of the block job? Who says that the user doesn't
>     want to add a writable disk as a user of the node while the block
>     job is running?
> 
>     The optimal solution to this would be that the commit filter node
>     responds to permission requests and switches between active and
>     "normal" commit mode. I'm not sure how hard this would be to
>     implement.
> 
>     As long as we don't have the automatic switch, do we have to allow
>     the user to specify explicitly which mode they want instead of
>     automatically choosing one?

When committing one read-only image into another, you don't need the
active mode.  On the other hand, committing a writeable image generally
means you don't want to lose any data, even as further writes happen
while the job is ongoing.  Does a "normal" mode commit make sense on a
writeable image (perhaps as a point-in-time operation: all data that was
present when the job started gets written, but if we do a NEW write, we
make sure to FIRST commit the old data into the backing file then do the
write into the active layer, and mark that cluster as no longer needing
commit), differently from an "active" mode commit (a write to the active
layer dirties the cluster, and we make as many passes as necessary,
possibly writing some clusters to the backing file multiple times, so
that the backing file contains the contents at the point the job ends
rather than starts).

With our existing active commit code, is there a way to do an
intermediate style commit instead of an active commit (by passing the
node name instead of the device name, even though it resolves to the
same 'top' node)?

Maybe an optional boolean is worth having, where we default to active if
'top' is writable and 'normal' otherwise; but can set the boolean to
force 'normal' even on a writable, and where setting the boolean on
something that is not writable is either a no-op or an error?

> 
>   * The 'backing-file' option (which specifies the new backing file
>     string for parents after the commit job completes) is currently not
>     allowed if top is the active layer. If we allow graph changes, this
>     doesn't seem to make sense to me because even if top doesn't have a
>     parent node when the job starts, it could have one when it's
>     completed.

Based on your recent patch series, I think we're still murky on exactly
what graph changes the op-blockers are going to prevent.  But allowing
'backing-file' even when 'top' starts life as the active layer makes
sense, as we may create a snapshot or some other operation that changes
'top' into something that is no longer active, without invalidating the
fact that we are doing a commit job (but there's also the tricky issue
of whether libvirt should expect only one event with no followup command
to end the job, or two events marking the two phases where a followup is
necessary).

> 
> Any opinions on this, especially the active/normal commit thing?
> 
> Kevin
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] blockdev-commit design
  2017-09-26 18:29 ` Eric Blake
@ 2017-09-26 19:11   ` Kevin Wolf
  0 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2017-09-26 19:11 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-block, qemu-devel, mreitz, stefanha, jcody

[-- Attachment #1: Type: text/plain, Size: 6592 bytes --]

Am 26.09.2017 um 20:29 hat Eric Blake geschrieben:
> On 09/26/2017 12:59 PM, Kevin Wolf wrote:
> > This was the easy part. Then I started looking at the code and found a
> > few a bit more interesting questions:
> > 
> >   * The old block-commit command decides between an "actual" commit job
> >     and the mirror-based active commit based on whether top is the
> >     active layer.
> 
> And libvirt HAS to know whether it is requesting an active vs.
> intermediate commit job up front, because the two code paths have
> different expected signals for handling job completion (it is only
> active commit that reaches a ready point between phases, requiring
> further QMP commands to end the job).

This is a good point. If this isn't transparent and libvirt has to know
anyway which kind of commit job it is using, there is no point in hiding
the difference in the QMP command that starts the job.

> >     We don't get an option for the active layer any more now, so this
> >     isn't how things can work for blockdev-commit. We could probably
> >     check whether top has a BlockBackend parent, but that's not really
> >     what we're interested in anyway. Maybe the best we could do to
> >     decide this automatically is to check whether there is any parent of
> >     top that requires write permissions. If there is, we need active
> >     commit, otherwise the "normal" one is good enough.
> > 
> >     However, who says that the intentions of the user stay as we deduce
> >     them at the start of the block job? Who says that the user doesn't
> >     want to add a writable disk as a user of the node while the block
> >     job is running?
> > 
> >     The optimal solution to this would be that the commit filter node
> >     responds to permission requests and switches between active and
> >     "normal" commit mode. I'm not sure how hard this would be to
> >     implement.
> > 
> >     As long as we don't have the automatic switch, do we have to allow
> >     the user to specify explicitly which mode they want instead of
> >     automatically choosing one?
> 
> When committing one read-only image into another, you don't need the
> active mode.  On the other hand, committing a writeable image generally
> means you don't want to lose any data, even as further writes happen
> while the job is ongoing.  Does a "normal" mode commit make sense on a
> writeable image

No, it doesn't. If you start a "normal" mode commit, it wouldn't have
BLK_PERM_WRITE in its shared permissions, so it would fail to start if
there is a writer. And once it is started, adding a writer would fail.

The explicit option would be for the case where a "normal" mode commit
is possible to start, but you intend to add a writer later, so you want
to start an active commit even though the writer doesn't exist yet.

> (perhaps as a point-in-time operation: all data that was present when
> the job started gets written, but if we do a NEW write, we make sure
> to FIRST commit the old data into the backing file then do the write
> into the active layer, and mark that cluster as no longer needing
> commit), differently from an "active" mode commit (a write to the
> active layer dirties the cluster, and we make as many passes as
> necessary, possibly writing some clusters to the backing file multiple
> times, so that the backing file contains the contents at the point the
> job ends rather than starts).

This sounds a lot like what the backup job is to mirror, just applied to
commit. It's an interesting idea, but I'm not sure if it wouldn't be a
separate block job then, like backup is separate from mirror.

Or maybe backup actually becomes very similar to the mirror job once we
implement the active mirror which intercepts requests...

> With our existing active commit code, is there a way to do an
> intermediate style commit instead of an active commit (by passing the
> node name instead of the device name, even though it resolves to the
> same 'top' node)?

No, passing the node name and passing the device name both starts an
active commit job.

> Maybe an optional boolean is worth having, where we default to active
> if 'top' is writable and 'normal' otherwise; but can set the boolean
> to force 'normal' even on a writable, and where setting the boolean on
> something that is not writable is either a no-op or an error?

I think if we add an option, I would actually make it mandatory.

Another thought I had is that we could make blockdev-commit support only
'normal' and extend the blockdev-mirror command so that you can use that
to start an active commit. I don't think there is a lot of code in the
mirror job implementation that has special cases for active commit.

> > 
> >   * The 'backing-file' option (which specifies the new backing file
> >     string for parents after the commit job completes) is currently not
> >     allowed if top is the active layer. If we allow graph changes, this
> >     doesn't seem to make sense to me because even if top doesn't have a
> >     parent node when the job starts, it could have one when it's
> >     completed.
> 
> Based on your recent patch series, I think we're still murky on exactly
> what graph changes the op-blockers are going to prevent.

Yes, it's still unclear how the protection is going to work in detail.
At the moment we're still using the old op blockers which just prevent
any change.

But I think the expected future behaviour for individual cases can be
reasonably clear. Adding new parents for the top node in a commit job
seems completely fine to me (because the commit operation should be
totally transparent for anything that looks at top), so maybe the APIs
should anticipate this.

> But allowing 'backing-file' even when 'top' starts life as the active
> layer makes sense, as we may create a snapshot or some other operation
> that changes 'top' into something that is no longer active, without
> invalidating the fact that we are doing a commit job

Right, taking a snapshot is a good practical example for adding a new
parent. Currently the old op blockers prevent this, but it's clear to me
that we want to allow this one day.

> (but there's also the tricky issue of whether libvirt should expect
> only one event with no followup command to end the job, or two events
> marking the two phases where a followup is necessary).

Yes. This seems to be an argument in favour of explicitly requesting
'normal' or active mode in the QMP command.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] blockdev-commit design
  2017-09-26 17:59 [Qemu-devel] blockdev-commit design Kevin Wolf
  2017-09-26 18:29 ` Eric Blake
@ 2017-09-27 16:29 ` Max Reitz
  2017-10-02 15:01 ` [Qemu-devel] [Qemu-block] " Kashyap Chamarthy
  2 siblings, 0 replies; 7+ messages in thread
From: Max Reitz @ 2017-09-27 16:29 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: qemu-devel, eblake, stefanha, jcody

[-- Attachment #1: Type: text/plain, Size: 1538 bytes --]

On 2017-09-26 19:59, Kevin Wolf wrote:

[...]

>   * The old block-commit command decides between an "actual" commit job
>     and the mirror-based active commit based on whether top is the
>     active layer.
> 
>     We don't get an option for the active layer any more now, so this
>     isn't how things can work for blockdev-commit. We could probably
>     check whether top has a BlockBackend parent, but that's not really
>     what we're interested in anyway. Maybe the best we could do to
>     decide this automatically is to check whether there is any parent of
>     top that requires write permissions. If there is, we need active
>     commit, otherwise the "normal" one is good enough.
> 
>     However, who says that the intentions of the user stay as we deduce
>     them at the start of the block job? Who says that the user doesn't
>     want to add a writable disk as a user of the node while the block
>     job is running?
> 
>     The optimal solution to this would be that the commit filter node
>     responds to permission requests and switches between active and
>     "normal" commit mode. I'm not sure how hard this would be to
>     implement.
> 
>     As long as we don't have the automatic switch, do we have to allow
>     the user to specify explicitly which mode they want instead of
>     automatically choosing one?

Probably a stupid question: What advantage does the commit job have over
the mirror job? I.e. would it be possible to just always do a mirror?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-block] blockdev-commit design
  2017-09-26 17:59 [Qemu-devel] blockdev-commit design Kevin Wolf
  2017-09-26 18:29 ` Eric Blake
  2017-09-27 16:29 ` Max Reitz
@ 2017-10-02 15:01 ` Kashyap Chamarthy
  2017-10-02 15:32   ` Kevin Wolf
  2 siblings, 1 reply; 7+ messages in thread
From: Kashyap Chamarthy @ 2017-10-02 15:01 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, qemu-devel, stefanha, mreitz

On Tue, Sep 26, 2017 at 07:59:42PM +0200, Kevin Wolf wrote:

[...]

> In comparison with the old command, the important changes are:
> 
>   * top/base are node names instead of file names.
> 
>   * You don't need to specify the active layer any more (not the least
>     because there could very well be more than one of them), but top
>     becomes mandatory instead.

As a user of 'block-commit', I'm a bit confused on your above point.
Two questions:

(1) During active block commit, isn't "active layer" == "top"? 

    With the existing QMP `block-commit`, given: 
    
        A <- B <- C <- D (active)
    
    To merge B, C, and D (the active layer) into A, I _have_ to specify
    the active layer, which is "D", as a 'top' parameter:

        {
            "execute": "block-commit",
            "arguments": {
                "device": "node-D",
                "job-id": "job0",
                "top": "d.qcow2",
                "base": "a.qcow2"
            }
        }

    So when merging the top-most layer (D), there's at least one
    scenario where we _are_ specifying the "active layer".  And 'top'
    _is_ mandatory as seen above.  
    
    So I wonder if I'm misinterpreting your wording.

(2) Also, just for my own education, can you mind expanding a bit more
    about the "there can be more than one active layer" scenario?

Thanks.

[...]

-- 
/kashyap

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-block] blockdev-commit design
  2017-10-02 15:01 ` [Qemu-devel] [Qemu-block] " Kashyap Chamarthy
@ 2017-10-02 15:32   ` Kevin Wolf
  2017-10-03  8:00     ` Kashyap Chamarthy
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2017-10-02 15:32 UTC (permalink / raw)
  To: Kashyap Chamarthy; +Cc: qemu-block, qemu-devel, stefanha, mreitz

Am 02.10.2017 um 17:01 hat Kashyap Chamarthy geschrieben:
> On Tue, Sep 26, 2017 at 07:59:42PM +0200, Kevin Wolf wrote:
> 
> [...]
> 
> > In comparison with the old command, the important changes are:
> > 
> >   * top/base are node names instead of file names.
> > 
> >   * You don't need to specify the active layer any more (not the least
> >     because there could very well be more than one of them), but top
> >     becomes mandatory instead.
> 
> As a user of 'block-commit', I'm a bit confused on your above point.
> Two questions:
> 
> (1) During active block commit, isn't "active layer" == "top"? 
> 
>     With the existing QMP `block-commit`, given: 
>     
>         A <- B <- C <- D (active)
>     
>     To merge B, C, and D (the active layer) into A, I _have_ to specify
>     the active layer, which is "D", as a 'top' parameter:
> 
>         {
>             "execute": "block-commit",
>             "arguments": {
>                 "device": "node-D",
>                 "job-id": "job0",
>                 "top": "d.qcow2",
>                 "base": "a.qcow2"
>             }
>         }
> 
>     So when merging the top-most layer (D), there's at least one
>     scenario where we _are_ specifying the "active layer".  And 'top'
>     _is_ mandatory as seen above.  
>     
>     So I wonder if I'm misinterpreting your wording.

The point is that you specify both "device" and "top". There is no real
use in specifying "device", because "top" already identifies the node.
(Of course, file names aren't a good way to identify nodes, so the
assumption is that they are replaced by node names.)

In the active commit case, it is just duplicated information, but in the
case of an intermediate commit, it is additional information that needs
to be provided without good reason.

> (2) Also, just for my own education, can you mind expanding a bit more
>     about the "there can be more than one active layer" scenario?

Something like this, C is the backing file of both D and E:


                    +--- D
                    |
    A <- B <- C <---+
                    |
                    +--- E

I want to commit C into B. But do I specify D or E as the active layer?
They are both active layers above C.

Kevin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-block] blockdev-commit design
  2017-10-02 15:32   ` Kevin Wolf
@ 2017-10-03  8:00     ` Kashyap Chamarthy
  0 siblings, 0 replies; 7+ messages in thread
From: Kashyap Chamarthy @ 2017-10-03  8:00 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, qemu-devel, stefanha, mreitz

On Mon, Oct 02, 2017 at 05:32:52PM +0200, Kevin Wolf wrote:
> Am 02.10.2017 um 17:01 hat Kashyap Chamarthy geschrieben:
> > On Tue, Sep 26, 2017 at 07:59:42PM +0200, Kevin Wolf wrote:

[...]

> >         {
> >             "execute": "block-commit",
> >             "arguments": {
> >                 "device": "node-D",
> >                 "job-id": "job0",
> >                 "top": "d.qcow2",
> >                 "base": "a.qcow2"
> >             }
> >         }
> > 
> >     So when merging the top-most layer (D), there's at least one
> >     scenario where we _are_ specifying the "active layer".  And 'top'
> >     _is_ mandatory as seen above.  
> >     
> >     So I wonder if I'm misinterpreting your wording.
> 
> The point is that you specify both "device" and "top". There is no real
> use in specifying "device", because "top" already identifies the node.
> (Of course, file names aren't a good way to identify nodes, so the
> assumption is that they are replaced by node names.)

Ah, thanks.  Now I recall that the existing `block-commit` is the only
command (among mirror, backup, stream, and commit) that doesn't yet
support 'node-name' / 'snapshot-node-name'.  Hence your proposal, to
bring it more in-line w/ blockdev-{mirror,backup}, and `block-stream`[*]
(which accepts 'base-node', and its "device" parameter takes a named
node).

[*] https://lists.gnu.org/archive/html/qemu-block/2017-05/msg01230.html

> In the active commit case, it is just duplicated information, but in the
> case of an intermediate commit, it is additional information that needs
> to be provided without good reason.

It's clearer now, thank you.

> > (2) Also, just for my own education, can you mind expanding a bit more
> >     about the "there can be more than one active layer" scenario?
> 
> Something like this, C is the backing file of both D and E:
> 
> 
>                     +--- D
>                     |
>     A <- B <- C <---+
>                     |
>                     +--- E
> 
> I want to commit C into B. But do I specify D or E as the active layer?
> They are both active layers above C.

Ah-ha, yes, it's the "thin provisoning" case.

Thanks for the explanation.

-- 
/kashyap

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-10-03  8:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-26 17:59 [Qemu-devel] blockdev-commit design Kevin Wolf
2017-09-26 18:29 ` Eric Blake
2017-09-26 19:11   ` Kevin Wolf
2017-09-27 16:29 ` Max Reitz
2017-10-02 15:01 ` [Qemu-devel] [Qemu-block] " Kashyap Chamarthy
2017-10-02 15:32   ` Kevin Wolf
2017-10-03  8:00     ` Kashyap Chamarthy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).