* [Qemu-devel] Jobs 2.0 QAPI [RFC]
From: John Snow @ 2015-12-17  0:50 UTC (permalink / raw)
  To: Qemu-block; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster

In working through a prototype to enable multiple block jobs, a few
problem spots in our API compatibility have become apparent.

In a nutshell, old Blockjobs rely on the "device" to identify the job,
which implies:

1) A job is always attached to a device / the root node of a device
2) There can only be one job per device
3) A job can only affect one device at a time
4) Once created, a job will always be associated with that device.

All four of these assumptions are wrong and need to change; principally,
the "device" addressing scheme for Jobs needs to go, and we need a new
ID-based addressing scheme instead.

While we're at it, though, Jeff Cody is cooking up a new permissions
system for the block layer and we are considering the idea of a generic
job system for QEMU.

So we need to retool our QMP commands. Now's the time to design a good
clean break.

==== qmp_query_block_jobs ====

Currently, this loops through every BDS we have and checks whether it
has a job attached. If it does, it appends that job's info to a flat
list and reports the list back.

This is easily extendable to an arbitrary number of jobs; we just append
more jobs into the list.

Instead of identifying a job by its device, we will now need to identify
a job based on its ID.

e.g.

[{"device": "drive0", ...}, {"device": "drive1", ...}, ...]

becomes:

[{"device": "drive0", "id": "#job102", ...},
 {"device": "drive1", "id": "#job529", ...}, ...]

However, as Jeff Cody's work on Block Job permissions progresses, "device"
may not always be reportable anymore. A job may not be attached to any one
device, may not be attached to a root node at all, or may be associated
with multiple graphs entirely.

Let's consider using a new command. I'm not sure if this is strictly
possible with current QAPI, but Eric will yell at me if it isn't --
forgive the wiggly pseudo-specification.

query-jobs

Will return a list of all jobs.

query-jobs sys="block"

Will return a list of all block jobs. This will be the only valid
subsystem to start with -- we don't have non-block jobs yet and it may
be some time before we do.

Each subsystem will have a sys= enumeration, and the jobs for that
particular subsystem can subclass the default return type. The QAPI
commands can be defined to accept a union of the subclasses so we don't
have to specify in advance which type each command accepts, but we can
allow the responsible subsystem to interpret it on demand dynamically.

The base type can be something along the lines of:

{ 'struct': 'JobInfo',
  'data': {
    'type': JobTypeEnum,
    'sys': JobSysEnum,
    'id': str,
    'busy': bool,
    'paused': bool,
    'ready': bool
  }
}

And the BlockJobInfo can inherit/extend, looking like this:

{ 'struct': 'BlockJobInfo',
  'base': JobInfo
  'data': {
    'len': int,
    'offset': int,
    'speed': 'int',
    'io-status': BlockDeviceIoStatus,
    'devices': <see below>,
  }
}

'devices' will now report a more nuanced view of this Job's associated
nodes in our graph, as in the following:

{ "devices": [
  { "name": "drive0",
    "nodes": [
      { "id": "#node123",
        "permissions": "+RW-X"
      },
      { "id": "#node456",
        "permissions": "+R"
      }
    ]
  }
] }

This lets us show exactly what nodes we are touching, what BlockBackends
they belong to, and what permissions are involved. Unambiguously. You
can use your own imagination for a permissions string that will make
sense -- this depends primarily on Jeff Cody's work.
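
Putting those pieces together, a single entry returned by query-jobs
might look like this on the wire (purely illustrative -- the field
values, the "#job"/"#node" names, and the permissions string are all
hypothetical):

{ "sys": "block", "type": "mirror", "id": "#job102",
  "busy": true, "paused": false, "ready": false,
  "len": 10737418240, "offset": 134217728, "speed": 0,
  "io-status": "ok",
  "devices": [
    { "name": "drive0",
      "nodes": [
        { "id": "#node123", "permissions": "+RW-X" },
        { "id": "#node456", "permissions": "+R" }
      ]
    }
  ]
}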

The new command gives a chance to make a clean break and any users of
this new command won't be confused about the information they receive.

The legacy query-block-jobs command can continue to operate, providing
a best-effort approximation of the data it finds. The legacy command will
*abort* and return an error if it detects any job that was launched by a
new-style command -- e.g. if it detects more than one job attached to any
node under a single BlockBackend, clearly the user has been using new
features -- announcing that this command is deprecated.

If the graph looks reasonably plain, it can report back the information
it always has, except that it will now also report the "id" field, to be
perfectly unambiguous about how to issue commands to this job, as I
outlined at the start of this section.


==== block-job-cancel ====

This command can remain as a legacy command. Provided the given "device"
has precisely one job attached, we can allow this command to cancel it.
If more complicated configurations are found, abort and encourage the
user to use the new API.

We can refuse to augment block-job-cancel, forcing users who want to
cancel a job by ID to use the new API.

We can add a new "job-cancel" command which removes the block specificity.

We can introduce sub-classed arguments as needed for e.g. BlockJob
cancellations:

{ 'command': 'job-cancel',
  'data': { 'id': str,
            '*force': bool,
            'options': JobCancelOpts
  }
}

The idea is that JobCancelOpts would be a subclassable type that can be
passed directly to the subsystem's implementation of the cancel
operation for further interpretation, so different subsystems can get
relevant arguments as needed. We don't currently have any such need for
BlockJobCancelOpts, but it can't hurt to be prepared for it.

The type can be interpreted through the lens of whichever type the job
we've identified expects to see. (Block Jobs expect to see
BlockJobCancelOpts, etc.)
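
On the wire, a cancellation might then look something like this
(hypothetical, since neither the command nor the ID scheme exists yet):

-> { "execute": "job-cancel",
     "arguments": { "id": "#job529", "force": false } }
<- { "return": {} }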


==== block-job-pause, block-job-resume, block-job-complete ====

Same story as above. Introduce job-pause, job-resume and job-complete
with subclassable job structures we can use for the block subsystem.

Legacy commands continue to operate only as long as the "device"
argument is unambiguous to decipher.
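
A minimal sketch of those three commands, assuming the same ID-based
addressing (per-subsystem option arguments could be layered on later,
as with job-cancel):

{ 'command': 'job-pause',    'data': { 'id': 'str' } }
{ 'command': 'job-resume',   'data': { 'id': 'str' } }
{ 'command': 'job-complete', 'data': { 'id': 'str' } }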


==== block-job-set-speed ====

This one is interesting since it definitely does apply only to Block Jobs.

We can augment it and limp it along, allowing e.g.

{ 'data':
  { '*device': 'str',
    '*id': 'str',
    'speed': 'int'
  }
}

Where we follow the "One of ID or Device but not both nor either"
pattern that we've applied in the past.

Or, we can say "If there isn't one job per device, reject the command
and use the new API"

where the new API is:

job-set-option :
  { 'id': str,
    'options': {
      '*etc': ...,
      '*etc2': ...,
    }
  }

Similar to the other commands, the idea would be to take the subclassed
options structure that belongs to that Job's Subclass (e.g.
BlockJobOptions) but allow any field inside to be absent.

Then we could have commands like this:

job-set-option \
arguments={ 'id': '#job789',
            'options': {
              'speed': 10000
            }
}

And so on. These would provide a flexible way of setting whatever
post-launch options a job needs, and the relevant job subsystem can be
responsible for rejecting certain options, etc.



I believe that covers all of the existing jobs interface in QMP, and it
should be sufficiently vague and open-ended to behave well with a generic
jobs system if we roll one out in the future.

Eric, Markus: Please inform me of all the myriad ways my fever dreams
for QAPI are wrong. :~)


Thanks,
--js


* Re: [Qemu-devel] Jobs 2.0 QAPI [RFC]
From: Eric Blake @ 2015-12-18 18:07 UTC (permalink / raw)
  To: John Snow, Qemu-block
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster


On 12/16/2015 05:50 PM, John Snow wrote:
> In working through a prototype to enable multiple block jobs. A few
> problem spots in our API compatibility become apparent.
> 

> ==== qmp_query_block_jobs ====
> 

> Let's consider using a new command. I'm not sure if this is strictly
> possible with current QAPI, but Eric will yell at me if it isn't --
> forgive the wiggly pseudo-specification.
> 
> query-jobs
> 
> Will return a list of all jobs.
> 
> query-jobs sys="block"
> 
> Will return a list of all block jobs. This will be the only valid
> subsystem to start with -- we don't have non-block jobs yet and it may
> be some time before we do.
> 
> Each subsystem will have a sys= enumeration, and the jobs for that
> particular subsystem can subclass the default return type. The QAPI
> commands can be defined to accept a union of the subclasses so we don't
> have to specify in advance which type each command accepts, but we can
> allow the responsible subsystem to interpret it on demand dynamically.
> 
> The base type can be something along the lines of:
> 
> { 'struct': 'JobInfo',
>   'data': {
>     'type': JobTypeEnum,
>     'sys': JobSysEnum,
>     'id': str,
>     'busy': bool,
>     'paused': bool,
>     'ready': bool
>   }
> }
> 
> And the BlockJobInfo can inherit/extend, looking like this:
> 
> { 'struct': 'BlockJobInfo',
>   'base': JobInfo
>   'data': {
>     'len': int,
>     'offset': int,
>     'speed': 'int',
>     'io-status': BlockDeviceIoStatus,
>     'devices': <see below>,
>   }
> }

Done in QAPI via flat unions.  All the common fields, including the
discriminator 'sys', are present in the base type, and then each
subclass of a job adds additional parameters to the QMP type by
specifying their subtype as a branch of the union.  So should be doable.
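
For illustration, a flat-union rendering of the above might look roughly
like this (names are placeholders, not a worked-out schema):

{ 'enum': 'JobSysEnum', 'data': [ 'block' ] }

{ 'struct': 'JobInfoBase',
  'data': { 'sys': 'JobSysEnum', 'type': 'JobTypeEnum', 'id': 'str',
            'busy': 'bool', 'paused': 'bool', 'ready': 'bool' } }

{ 'struct': 'BlockJobInfoData',
  'data': { 'len': 'int', 'offset': 'int', 'speed': 'int',
            'io-status': 'BlockDeviceIoStatus',
            'devices': ['BlockJobDeviceInfo'] } }

{ 'union': 'JobInfo',
  'base': 'JobInfoBase',
  'discriminator': 'sys',
  'data': { 'block': 'BlockJobInfoData' } }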

> 
> 'devices' will now report a more nuanced view of this Job's associated
> nodes in our graph, as in the following:
> 
> { "devices": [
>   { "name": "drive0",
>     "nodes": [
>       { "id": "#node123",
>         "permissions": "+RW-X"
>       },
>       { "id": "#node456",
>         "permissions": "+R"
>       }
>     ]
>   }
> }
> 
> This lets us show exactly what nodes we are touching, what BlockBackends
> they belong to, and what permissions are involved. Unambiguously. You
> can use your own imagination for a permissions string that will make
> sense -- This depends on Jeff Cody's work primarily.

We'd probably want it to look more JSON-y, maybe something like:

{ "id": "#node456",
  "permissions": [ { "name":"read", "mode":"require" },
                   { "name":"write", "mode":"allow" },
                   { "name":"meta", "mode":"unspecified" } ] }

but I agree that how it is represented does not affect the proposal here
of merely representing a list of all associated BDS affected by this
job, broken out by permissions per BDS (as some jobs will affect
multiple BDS, and not all of them with the same permissions).

> 
> The new command gives a chance to make a clean break and any users of
> this new command won't be confused about the information they receive.
> 
> The legacy qmp-query-block-jobs command can continue to operate
> providing a best-effort approximation of the data it finds. The legacy
> command will *abort* and return error if it detects any job that was
> launched by a new-style command, e.g. if it detects there is more than
> one job attached to any node under a single BlockBackend -- clearly the
> user has been using new features -- and should abort announcing this
> command is deprecated.

Makes sense. If an old client only uses the old commands, things will
still work; while a new client, once converted to use the new commands,
should do the conversion completely and not rely on the old view.

> 
> If the graph looks reasonably plain, it can report back the information
> it always has, except that it will now also report the "id" field in
> addition to be perfectly unambiguous about how to issue commands to this
> job, like I outlined in the first paragraph of this section.

Old clients won't care about the new information, but it doesn't hurt to
supply it, especially if it lets us share code with the new query command.

> 
> 
> ==== block-job-cancel ====
> 
> This command can remain as a legacy command. Provided the "device"
> provided has precisely one job attached, we can allow this command to
> cancel it. If more complicated configurations are found, abort and
> encourage the user to use a new API.

Again, won't affect old client usage patterns, and new clients should
make the lock-step conversion to using only the new interface.  Sounds
reasonable.

> 
> We can refuse to augment block-job-cancel; forcing users who want to
> specify IDs to cancel to use the new API.
> 
> We can add a new "job-cancel" command which removes the block specificity.

Adding a new job-cancel sounds right.

> 
> We can introduce sub-classed arguments as needed for e.g. BlockJob
> cancellations:
> 
> { 'command': 'job-cancel',
>   'data': { 'id': str,
>             '*force': bool,
>             'options': JobCancelOpts
>   }
> }
> 
> The idea is that JobCancelOpts would be a subclassable type that can be
> passed directly to the subsystem's implementation of the cancel
> operation for further interpretation, so different subsystems can get
> relevant arguments as needed. We don't currently have any such need for
> BlockJobCancelOpts, but it can't hurt to be prepared for it.

To be subclassable, we need a flat union, and the discriminator must be
non-optional within that union.  Either 'options' is that flat union
(and we have a layer of {} nesting in QMP), or we directly make the
'data' of job-cancel be the flat union (no need for an "options":{}
nesting in QMP); the latter still depends on some of my pending patches,
but we'll get there in plenty of time.
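
Concretely, the first variant might be spelled something like this
(sketch only, reusing the JobSysEnum discriminator idea from above;
BlockJobCancelOpts stays empty until someone needs it):

{ 'struct': 'JobCancelOptsBase',
  'data': { 'sys': 'JobSysEnum' } }

{ 'struct': 'BlockJobCancelOpts',
  'data': { } }

{ 'union': 'JobCancelOpts',
  'base': 'JobCancelOptsBase',
  'discriminator': 'sys',
  'data': { 'block': 'BlockJobCancelOpts' } }

{ 'command': 'job-cancel',
  'data': { 'id': 'str', '*force': 'bool', 'options': 'JobCancelOpts' } }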

> 
> The type can be interpreted through the lens of whichever type the job
> we've identified expects to see. (Block Jobs expect to see
> BlockJobCancelOpts, etc.)
> 
> 
> ==== block-job-pause, block-job-resume, block-job-complete ====
> 
> Same story as above. Introduce job-pause, job-resume and job-complete
> with subclassible job structures we can use for the block subsystem.
> 
> Legacy commands continue to operate only as long as the "device"
> argument is unambiguous to decipher.

Seems reasonable. As you surmised, I'm willing to help come up with the
proper QAPI representation, if that becomes a sticking point in the design.

> 
> 
> ==== block-job-set-speed ====
> 
> This one is interesting since it definitely does apply only to Block Jobs.
> 
> We can augment it and limp it along, allowing e.g.
> 
> { 'data':
>   { '*device': 'str',
>     '*id': 'str',
>     'speed': 'int'
>   }
> }
> 
> Where we follow the "One of ID or Device but not both nor either"
> pattern that we've applied in the past.

Or even "at least one of ID or Device, and if you pass both, they must
both resolve to the same object".  But "exactly one" works - old clients
will pass "device", and it will always resolve (because they never used
the new API to confuse things); new clients will pass only "id" and it
will be the right thing.

> 
> Or, we can say "If there isn't one job per device, reject the command
> and use the new API"
> 
> where the new API is:
> 
> job-set-option :
>   { 'id': str,
>     'options': {
>       '*etc': ...,
>       '*etc2': ...,
>     }
>   }

That has a bit more flexibility, especially if we add new options for
other commands, more than just block-job speed.

> 
> Similar to the other commands, the idea would be to take the subclassed
> options structure that belongs to that Job's Subclass (e.g.
> BlockJobOptions) but allow any field inside to be absent.
> 
> Then we could have commands like this:
> 
> job-set-option \
> arguments={ 'id': '#job789',
>             'options': {
>               'speed': 10000
>             }
> }
> 
> And so on. These would remain a flexible way of setting any various
> post-launch options a job would need, and the relevant job-subsystem can
> be responsible for rejecting certain options, etc.

Keeping type-safety via a flat union may require it to look more like:

job-set-option \
arguments={ 'id': '#job789',
            'sys': 'block',
            'speed': 10000
          }

where the use of the 'sys' discriminator tells what other fields are
being supplied, and we can avoid the "options":{} nesting.  Then we'd
need a sanity check in the code that the 'sys' requested by
job-set-option matches the sys that owns '#job789', and fail if they differ.

> 
> I believe that covers all the existing jobs interface in the QMP, and
> that should be sufficiently vague and open-ended enough to behave well
> with a generic jobs system if we roll one out in the future.
> 
> Eric, Markus: Please inform me of all the myriad ways my fever dreams
> for QAPI are wrong. :~)

At this stage, your concepts seem reasonable, even if the concrete way
of representing a subclass in qapi is not quite spelled out.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] Jobs 2.0 QAPI [RFC]
From: John Snow @ 2015-12-18 21:24 UTC (permalink / raw)
  To: Eric Blake, Qemu-block
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster



On 12/18/2015 01:07 PM, Eric Blake wrote:
> On 12/16/2015 05:50 PM, John Snow wrote:
>> In working through a prototype to enable multiple block jobs. A few
>> problem spots in our API compatibility become apparent.
>>
> 
>> ==== qmp_query_block_jobs ====
>>
> 
>> Let's consider using a new command. I'm not sure if this is strictly
>> possible with current QAPI, but Eric will yell at me if it isn't --
>> forgive the wiggly pseudo-specification.
>>
>> query-jobs
>>
>> Will return a list of all jobs.
>>
>> query-jobs sys="block"
>>
>> Will return a list of all block jobs. This will be the only valid
>> subsystem to start with -- we don't have non-block jobs yet and it may
>> be some time before we do.
>>
>> Each subsystem will have a sys= enumeration, and the jobs for that
>> particular subsystem can subclass the default return type. The QAPI
>> commands can be defined to accept a union of the subclasses so we don't
>> have to specify in advance which type each command accepts, but we can
>> allow the responsible subsystem to interpret it on demand dynamically.
>>
>> The base type can be something along the lines of:
>>
>> { 'struct': 'JobInfo',
>>   'data': {
>>     'type': JobTypeEnum,
>>     'sys': JobSysEnum,
>>     'id': str,
>>     'busy': bool,
>>     'paused': bool,
>>     'ready': bool
>>   }
>> }
>>
>> And the BlockJobInfo can inherit/extend, looking like this:
>>
>> { 'struct': 'BlockJobInfo',
>>   'base': JobInfo
>>   'data': {
>>     'len': int,
>>     'offset': int,
>>     'speed': 'int',
>>     'io-status': BlockDeviceIoStatus,
>>     'devices': <see below>,
>>   }
>> }
> 
> Done in QAPI via flat unions.  All the common fields, including the
> discriminator 'sys', are present in the base type, and then each
> subclass of a job adds additional parameters to the QMP type by
> specifying their subtype as a branch of the union.  So should be doable.
> 
>>
>> 'devices' will now report a more nuanced view of this Job's associated
>> nodes in our graph, as in the following:
>>
>> { "devices": [
>>   { "name": "drive0",
>>     "nodes": [
>>       { "id": "#node123",
>>         "permissions": "+RW-X"
>>       },
>>       { "id": "#node456",
>>         "permissions": "+R"
>>       }
>>     ]
>>   }
>> }
>>
>> This lets us show exactly what nodes we are touching, what BlockBackends
>> they belong to, and what permissions are involved. Unambiguously. You
>> can use your own imagination for a permissions string that will make
>> sense -- This depends on Jeff Cody's work primarily.
> 
> We'd probably want it to look more JSON-y, maybe something like:
> 
> { "id": "#node456",
>   "permissions": [ { "name":"read", "mode":"require" },
>                    { "name":"write", "mode":"allow" },
>                    { "name":"meta", "mode":"unspecified" } ] }
> 
> but I agree that how it is represented does not affect the proposal here
> of merely representing a list of all associated BDS affected by this
> job, broken out by permissions per BDS (as some jobs will affect
> multiple BDS, and not all of them with the same permissions).
> 

And as not shown here, some may affect multiple devices, so this should
give the full granularity.

The more JSON-y permissions format is fine, also. It will likely evolve
to match whatever Jeff Cody needs for his fine-grained op blocker idea.

>>
>> The new command gives a chance to make a clean break and any users of
>> this new command won't be confused about the information they receive.
>>
>> The legacy qmp-query-block-jobs command can continue to operate
>> providing a best-effort approximation of the data it finds. The legacy
>> command will *abort* and return error if it detects any job that was
>> launched by a new-style command, e.g. if it detects there is more than
>> one job attached to any node under a single BlockBackend -- clearly the
>> user has been using new features -- and should abort announcing this
>> command is deprecated.
> 
> Makes sense. If an old client only uses the old commands, things will
> still work; while a new client, once converted to use the new commands,
> should do the conversion completely and not rely on the old view.
> 

Yes. All or nothing. I prefer this approach to an incomplete report. I
shouldn't have used "best-effort" above; I should have said:

"The legacy command will report back jobs if it has sufficient
resolution to do so." e.g. no conflicting device or BDS id results to
the query.

I don't very much like the idea of a query command that can give you
incomplete data, especially to an unwitting querier.

>>
>> If the graph looks reasonably plain, it can report back the information
>> it always has, except that it will now also report the "id" field in
>> addition to be perfectly unambiguous about how to issue commands to this
>> job, like I outlined in the first paragraph of this section.
> 
> Old clients won't care about the new information, but it doesn't hurt to
> supply it, especially if it lets us share code with the new query command.
> 

That's the thought. The wrapper will investigate the list for
suitability and terminate if it finds duplicate BB/BDS IDs.

Though, I'll need to fill in the legacy device field anyway, so I could
just strip out the "devices" graph.

>>
>>
>> ==== block-job-cancel ====
>>
>> This command can remain as a legacy command. Provided the "device"
>> provided has precisely one job attached, we can allow this command to
>> cancel it. If more complicated configurations are found, abort and
>> encourage the user to use a new API.
> 
> Again, won't affect old client usage patterns, and new clients should
> make the lock-step conversion to using only the new interface.  Sounds
> reasonable.
> 
>>
>> We can refuse to augment block-job-cancel; forcing users who want to
>> specify IDs to cancel to use the new API.
>>
>> We can add a new "job-cancel" command which removes the block specificity.
> 
> Adding a new job-cancel sounds right.
> 
>>
>> We can introduce sub-classed arguments as needed for e.g. BlockJob
>> cancellations:
>>
>> { 'command': 'job-cancel',
>>   'data': { 'id': str,
>>             '*force': bool,
>>             'options': JobCancelOpts
>>   }
>> }
>>
>> The idea is that JobCancelOpts would be a subclassable type that can be
>> passed directly to the subsystem's implementation of the cancel
>> operation for further interpretation, so different subsystems can get
>> relevant arguments as needed. We don't currently have any such need for
>> BlockJobCancelOpts, but it can't hurt to be prepared for it.
> 
> To be subclassable, we need a flat union, and the discriminator must be
> non-optional within that union.  Either 'options' is that flat union
> (and we have a layer of {} nesting in QMP}, or we directly make the
> 'data' of job-cancel be the flat union (no need for an "options":{}
> nesting in QMP); the latter still depends on some of my pending patches,
> but we'll get there in plenty of time.
> 

Ah, shame. It would have been nice to delay interpretation of the union
pending the result of the Job lookup.

If the discriminator must always be present, though, then so be it.

I am fine with either approach; the approach outlined above was a hope
to delay the evaluation of the union to allow for type resolution based
on the ID. If that's a non-starter, I'm fine with the flat union:

{ 'id': str,
  'sys': str,
  '*force': bool,
  <subclass options>
}

where if the 'sys' of the job you find is not the sys you supplied, then
it errors out with "Wrong subsystem ID provided for job '#job123',
expected 'blk', given 'abc'" or so.

>>
>> The type can be interpreted through the lens of whichever type the job
>> we've identified expects to see. (Block Jobs expect to see
>> BlockJobCancelOpts, etc.)
>>
>>
>> ==== block-job-pause, block-job-resume, block-job-complete ====
>>
>> Same story as above. Introduce job-pause, job-resume and job-complete
>> with subclassible job structures we can use for the block subsystem.
>>
>> Legacy commands continue to operate only as long as the "device"
>> argument is unambiguous to decipher.
> 
> Seems reasonable. As you surmised, I'm willing to help come up with the
> proper QAPI representation, if that becomes a sticking point in the design.
> 

I'll give it a college try and send some actual patches for x-job-query,
x-job-cancel, etc.

For the flat union support on the top level as hinted for job-cancel,
what branch of yours should I develop on top of?

>>
>>
>> ==== block-job-set-speed ====
>>
>> This one is interesting since it definitely does apply only to Block Jobs.
>>
>> We can augment it and limp it along, allowing e.g.
>>
>> { 'data':
>>   { '*device': 'str',
>>     '*id': 'str',
>>     'speed': 'int'
>>   }
>> }
>>
>> Where we follow the "One of ID or Device but not both nor either"
>> pattern that we've applied in the past.
> 
> Or even "at least one of ID or Device, and if you pass both, they must
> both resolve to the same object".  But "exactly one" works - old clients
> will pass "device", and it will always resolve (because they never used
> the new API to confuse things); new clients will pass only "id" and it
> will be the right thing.
> 

Less work and never potentially ambiguous.

>>
>> Or, we can say "If there isn't one job per device, reject the command
>> and use the new API"
>>
>> where the new API is:
>>
>> job-set-option :
>>   { 'id': str,
>>     'options': {
>>       '*etc': ...,
>>       '*etc2': ...,
>>     }
>>   }
> 
> That has a bit more flexibility, especially if we add new options for
> other commands, more than just block-job speed.
> 
>>
>> Similar to the other commands, the idea would be to take the subclassed
>> options structure that belongs to that Job's Subclass (e.g.
>> BlockJobOptions) but allow any field inside to be absent.
>>
>> Then we could have commands like this:
>>
>> job-set-option \
>> arguments={ 'id': '#job789',
>>             'options': {
>>               'speed': 10000
>>             }
>> }
>>
>> And so on. These would remain a flexible way of setting any various
>> post-launch options a job would need, and the relevant job-subsystem can
>> be responsible for rejecting certain options, etc.
> 
> Keeping type-safety via a flat union may require it to look more like:
> 
> job-set-option \
> arguments={ 'id': '#job789',
>             'sys': 'block',
>             'speed': 10000
>           }
> 
> where the use of the 'sys' discriminator tells what other fields are
> being supplied, and we can avoid the "options":{} nesting.  Then we'd
> need a sanity check in the code that the 'sys' requested by
> job-set-option matches the sys that owns '#job789', and fail if they differ.
> 

Yes, I was assuming again we could resolve the job first and then
interpret the data, but if it's easiest to always require the
discriminator (instead of having to /look/ for it dynamically), then
your outline is perfectly acceptable to me.

>>
>> I believe that covers all the existing jobs interface in the QMP, and
>> that should be sufficiently vague and open-ended enough to behave well
>> with a generic jobs system if we roll one out in the future.
>>
>> Eric, Markus: Please inform me of all the myriad ways my fever dreams
>> for QAPI are wrong. :~)
> 
> At this stage, your concepts seem reasonable, even if the concrete way
> of representing a subclass in qapi is not quite spelled out.
> 

Thanks!

I'll get rolling on some x-versions to mix alongside my multi-block-job
patches.

--js


* Re: [Qemu-devel] Jobs 2.0 QAPI [RFC]
From: Eric Blake @ 2015-12-18 21:46 UTC (permalink / raw)
  To: John Snow, Qemu-block
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster


On 12/18/2015 02:24 PM, John Snow wrote:

>> To be subclassable, we need a flat union, and the discriminator must be
>> non-optional within that union.  Either 'options' is that flat union
>> (and we have a layer of {} nesting in QMP}, or we directly make the
>> 'data' of job-cancel be the flat union (no need for an "options":{}
>> nesting in QMP); the latter still depends on some of my pending patches,
>> but we'll get there in plenty of time.
>>
> 
> Ah, shame. It would have been nice to delay interpretation of the union
> pending the result of the Job lookup.

Yeah, I don't think dynamic parsing is going to fly in the qapi
generator; right now it works as "parse the entire JSON string into a
QObject, validate that it can be converted to a qapi type, then pass
that to the marshaller"; whereas a dynamic parse would require more like
"parse half the JSON, call the first marshaller to see what else to
parse, finish the parse, and now dispatch to the second marshaller".

> 
> If the discriminator must always be present, though, then so be it.

Maybe a future patch can make it optional (if the discriminator is
missing, do a tentative parse of all remaining fields, and as long as
exactly one branch is happy with the result, then we know what
discriminator to fill in).  But I'm not sure it's worth the complexity,
when it's easier to just tell clients to always pass a discriminator.


> I'll give it a college try and send some actual patches for x-job-query,
> x-job-cancel, etc.
> 
> For the flat union support on the top level as hinted for job-cancel,
> what branch of yours should I develop on top of?

http://repo.or.cz/qemu/ericb.git/shortlog/refs/heads/qapi has all my
pending patches in the state last posted to the list, although it will
be non-fast-forwarded as I rebase to latest (I already know that today's
state of that branch doesn't build on today's qemu.git master, now that
2.6 is open and some churn has started creeping in).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] Jobs 2.0 QAPI [RFC]
From: Kevin Wolf @ 2015-12-21 12:31 UTC (permalink / raw)
  To: John Snow; +Cc: Jeff Cody, qemu-devel, Qemu-block, Markus Armbruster

Am 17.12.2015 um 01:50 hat John Snow geschrieben:
> In working through a prototype to enable multiple block jobs. A few
> problem spots in our API compatibility become apparent.
> 
> In a nutshell, old Blockjobs rely on the "device" to identify the job,
> which implies:
> 
> 1) A job is always attached to a device / the root node of a device
> 2) There can only be one job per device
> 3) A job can only affect one device at a time
> 4) Once created, a job will always be associated with that device.
> 
> All four of these are wrong and something we need to change, so
> principally the "device" addressing scheme for Jobs needs to go and we
> need a new ID addressing scheme instead.
> 
> While we're at it, though, Jeff Cody is cooking up a new permissions
> system for the block layer and we are considering the idea of a generic
> job system for QEMU.
> 
> So we need to retool our QMP commands. Now's the time to design a good
> clean break.

I'll reply here without having read Eric's comments yet, because I'm
afraid if I were to read the whole thread first, I would forget half of
what I wanted to comment on here...

So I may be repeating some things or making suggestions where Eric has
already posted a good solution; sorry for that.

> ==== qmp_query_block_jobs ====
> 
> Currently, this loops through every BDS we have and sees if it has a job
> attached or not. If yes, it appends its info into a flat list and
> reports the list back.
> 
> This is easily extendable to an arbitrary number of jobs; we just append
> more jobs into the list.
> 
> Instead of identifying a job by its device, we will now need to identify
> a job based on its ID.
> 
> e.g.
> 
> [{"device": "drive0", ...}, {"device": "drive1", ...}, ...]
> 
> becomes:
> 
> [{"device": "drive0", "id": "#job102", ...},
>  {"device": "drive1", "id": "#job529", ...}, ...]
> 
> However, if Jeff Cody's work on Block Job permissions progresses, device
> may not always be reportable anymore. This job may not be attached to
> any one device, not attached to a root node, or associated with multiple
> graphs entirely.
> 
> Let's consider using a new command. I'm not sure if this is strictly
> possible with current QAPI, but Eric will yell at me if it isn't --
> forgive the wiggly pseudo-specification.
> 
> query-jobs
> 
> Will return a list of all jobs.
> 
> query-jobs sys="block"
> 
> Will return a list of all block jobs. This will be the only valid
> subsystem to start with -- we don't have non-block jobs yet and it may
> be some time before we do.

In the past we avoided filtering functionality in query-* commands and
instead required the client to request the full list and do its own
filtering. I'm not strictly opposed to such functionality if it's useful
enough, but we should be aware that we're doing something new here.

> Each subsystem will have a sys= enumeration, and the jobs for that
> particular subsystem can subclass the default return type. The QAPI
> commands can be defined to accept a union of the subclasses so we don't
> have to specify in advance which type each command accepts, but we can
> allow the responsible subsystem to interpret it on demand dynamically.
> 
> The base type can be something along the lines of:
> 
> { 'struct': 'JobInfo',
>   'data': {
>     'type': JobTypeEnum,
>     'sys': JobSysEnum,
>     'id': str,
>     'busy': bool,
>     'paused': bool,
>     'ready': bool
>   }
> }

'ready' optional and missing for jobs which don't require manual
completion?

> And the BlockJobInfo can inherit/extend, looking like this:
> 
> { 'struct': 'BlockJobInfo',
>   'base': JobInfo
>   'data': {
>     'len': int,
>     'offset': int,

len/offset are misnomers and really represent the progress of the job. I
think we want to have these in JobInfo, with better names.

>     'speed': 'int',

Eventually, we'll deprecate this and require insertion of a throttling
filter in the right place instead. But for now, I'm afraid we'll need
this.

I'm not completely sure if it makes sense for every thinkable block job,
though. A block job might in theory have no background operation or more
than one.

>     'io-status': BlockDeviceIoStatus,

This has always been a weird error reporting interface because it can't
tell you whether the I/O error happened on the source or the target.
It's similar to attaching the job to a single BDS: it kind of works in
practice, but conceptually it doesn't make a lot of sense.

>     'devices': <see below>,
>   }
> }
> 
> 'devices' will now report a more nuanced view of this Job's associated
> nodes in our graph, as in the following:
> 
> { "devices": [
>   { "name": "drive0",
>     "nodes": [
>       { "id": "#node123",
>         "permissions": "+RW-X"
>       },
>       { "id": "#node456",
>         "permissions": "+R"
>       }
>     ]
>   }
> }
> 
> This lets us show exactly what nodes we are touching, what BlockBackends
> they belong to, and what permissions are involved. Unambiguously. You
> can use your own imagination for a permissions string that will make
> sense -- This depends on Jeff Cody's work primarily.

There is one important piece of information missing: what role that node
takes for the job - that is, which is the source and which is the target.
I'm not completely sure if the op blocker information should be visible
to users; I consider this mostly internal node management.

I would suggest that we don't introduce a BlockJobInfo, but let the
individual job types directly inherit JobInfo, and also replace the list
of devices by named keys:

    { 'struct': 'JobInfo',
      'data': {
        'type': JobTypeEnum,
        'sys': JobSysEnum,
        'id': str,
        'busy': bool,
        'paused': bool,
        'ready': bool,
        'work_total': 'int',
        'work_completed': 'int'
      }
    }

    { 'struct': 'BlockJobNode',
      'data': {
        'node-name': 'str',
        'io-status': 'BlockDeviceIoStatus',
        'permissions': 'whatever'
      }
    }

    { 'struct': 'MirrorJobInfo',
      'base': JobInfo
      'data': {
          'source': 'BlockJobNode',
          'target': 'BlockJobNode',
          'speed': 'int',
      }
    }

> The new command gives a chance to make a clean break and any users of
> this new command won't be confused about the information they receive.
> 
> The legacy qmp-query-block-jobs command can continue to operate
> providing a best-effort approximation of the data it finds. The legacy
> command will *abort* and return error if it detects any job that was
> launched by a new-style command, e.g. if it detects there is more than
> one job attached to any node under a single BlockBackend -- clearly the
> user has been using new features -- and should abort announcing this
> command is deprecated.

I agree with that. As soon as you use the new API once, you should be
required to use it consistently.

> If the graph looks reasonably plain, it can report back the information
> it always has, except that it will now also report the "id" field in
> addition to be perfectly unambiguous about how to issue commands to this
> job, like I outlined in the first paragraph of this section.
> 
> 
> ==== block-job-cancel ====
> 
> This command can remain as a legacy command. Provided the "device"
> provided has precisely one job attached, we can allow this command to
> cancel it. If more complicated configurations are found, abort and
> encourage the user to use a new API.
> 
> We can refuse to augment block-job-cancel; forcing users who want to
> specify IDs to cancel to use the new API.
> 
> We can add a new "job-cancel" command which removes the block specificity.
> 
> We can introduce sub-classed arguments as needed for e.g. BlockJob
> cancellations:
> 
> { 'command': 'job-cancel',
>   'data': { 'id': str,
>             '*force': bool,
>             'options': JobCancelOpts
>   }
> }
> 
> The idea is that JobCancelOpts would be a subclassable type that can be
> passed directly to the subsystem's implementation of the cancel
> operation for further interpretation, so different subsystems can get
> relevant arguments as needed. We don't currently have any such need for
> BlockJobCancelOpts, but it can't hurt to be prepared for it.
> 
> The type can be interpreted through the lens of whichever type the job
> we've identified expects to see. (Block Jobs expect to see
> BlockJobCancelOpts, etc.)

Sounds good, possibly with s/subsystem/individual job type/.

'options' should be optional.

> ==== block-job-pause, block-job-resume, block-job-complete ====
> 
> Same story as above. Introduce job-pause, job-resume and job-complete
> with subclassible job structures we can use for the block subsystem.
> 
> Legacy commands continue to operate only as long as the "device"
> argument is unambiguous to decipher.

Ack.

> ==== block-job-set-speed ====
> 
> This one is interesting since it definitely does apply only to Block Jobs.

Why? The one non-block job we're envisioning so far, live migration,
does have its own speed option.

I think it is highly dependent on the job type whether it makes sense.

> We can augment it and limp it along, allowing e.g.
> 
> { 'data':
>   { '*device': 'str',
>     '*id': 'str',
>     'speed': 'int'
>   }
> }
> 
> Where we follow the "One of ID or Device but not both nor either"
> pattern that we've applied in the past.
> 
> Or, we can say "If there isn't one job per device, reject the command
> and use the new API"

The latter would be more consistent, in my opinion.

> where the new API is:
> 
> job-set-option :
>   { 'id': str,
>     'options': {
>       '*etc': ...,
>       '*etc2': ...,
>     }
>   }
> 
> Similar to the other commands, the idea would be to take the subclassed
> options structure that belongs to that Job's Subclass (e.g.
> BlockJobOptions) but allow any field inside to be absent.
> 
> Then we could have commands like this:
> 
> job-set-option \
> arguments={ 'id': '#job789',
>             'options': {
>               'speed': 10000
>             }
> }
> 
> And so on. These would remain a flexible way of setting any various
> post-launch options a job would need, and the relevant job-subsystem can
> be responsible for rejecting certain options, etc.

That's an interesting thought. I kind of like it, but haven't thought
through the consequences yet.

> I believe that covers all the existing jobs interface in the QMP, and
> that should be sufficiently vague and open-ended enough to behave well
> with a generic jobs system if we roll one out in the future.
> 
> Eric, Markus: Please inform me of all the myriad ways my fever dreams
> for QAPI are wrong. :~)

Kevin


* Re: [Qemu-devel] [Qemu-block] Jobs 2.0 QAPI [RFC]
From: Alberto Garcia @ 2015-12-21 16:58 UTC (permalink / raw)
  To: John Snow, Qemu-block; +Cc: Kevin Wolf, qemu-devel, Markus Armbruster

On Thu 17 Dec 2015 01:50:08 AM CET, John Snow <jsnow@redhat.com> wrote:
> In working through a prototype to enable multiple block jobs. A few
> problem spots in our API compatibility become apparent.
>
> In a nutshell, old Blockjobs rely on the "device" to identify the job,
> which implies:
>
> 1) A job is always attached to a device / the root node of a device
> 2) There can only be one job per device
> 3) A job can only affect one device at a time
> 4) Once created, a job will always be associated with that device.
>
> All four of these are wrong and something we need to change, so
> principally the "device" addressing scheme for Jobs needs to go and we
> need a new ID addressing scheme instead.

Out of curiosity, do you have specific examples of block jobs that are
affected by these problems?

For the intermediate block-stream functionality I was having problems
because of 1), so I extended the 'device' parameter to identify
arbitrary node names as well.

Just to make things clear: your proposal looks good to me, I'm only
wondering whether you're simply looking for a cleaner API or you have a
use case that you cannot fulfill because of the way block jobs work
now...

Thanks!

Berto


* Re: [Qemu-devel] Jobs 2.0 QAPI [RFC]
From: John Snow @ 2015-12-21 17:45 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jeff Cody, qemu-devel, Qemu-block, Markus Armbruster



On 12/21/2015 07:31 AM, Kevin Wolf wrote:
> Am 17.12.2015 um 01:50 hat John Snow geschrieben:
>> In working through a prototype to enable multiple block jobs. A few
>> problem spots in our API compatibility become apparent.
>>
>> In a nutshell, old Blockjobs rely on the "device" to identify the job,
>> which implies:
>>
>> 1) A job is always attached to a device / the root node of a device
>> 2) There can only be one job per device
>> 3) A job can only affect one device at a time
>> 4) Once created, a job will always be associated with that device.
>>
>> All four of these are wrong and something we need to change, so
>> principally the "device" addressing scheme for Jobs needs to go and we
>> need a new ID addressing scheme instead.
>>
>> While we're at it, though, Jeff Cody is cooking up a new permissions
>> system for the block layer and we are considering the idea of a generic
>> job system for QEMU.
>>
>> So we need to retool our QMP commands. Now's the time to design a good
>> clean break.
> 
> I'lll reply here without having read Eric's comments yet, because I'm
> afraid if I were to read the whole thread first, I would forget half of
> what I wanted to comment on here...
> 
> So I may be repeating some things or make suggestions where Eric has
> already posted a good solution, sorry for that.
> 
>> ==== qmp_query_block_jobs ====
>>
>> Currently, this loops through every BDS we have and sees if it has a job
>> attached or not. If yes, it appends its info into a flat list and
>> reports the list back.
>>
>> This is easily extendable to an arbitrary number of jobs; we just append
>> more jobs into the list.
>>
>> Instead of identifying a job by its device, we will now need to identify
>> a job based on its ID.
>>
>> e.g.
>>
>> [{"device": "drive0", ...}, {"device": "drive1", ...}, ...]
>>
>> becomes:
>>
>> [{"device": "drive0", "id": "#job102", ...},
>>  {"device": "drive1", "id": "#job529", ...}, ...]
>>
>> However, if Jeff Cody's work on Block Job permissions progresses, device
>> may not always be reportable anymore. This job may not be attached to
>> any one device, not attached to a root node, or associated with multiple
>> graphs entirely.
>>
>> Let's consider using a new command. I'm not sure if this is strictly
>> possible with current QAPI, but Eric will yell at me if it isn't --
>> forgive the wiggly pseudo-specification.
>>
>> query-jobs
>>
>> Will return a list of all jobs.
>>
>> query-jobs sys="block"
>>
>> Will return a list of all block jobs. This will be the only valid
>> subsystem to start with -- we don't have non-block jobs yet and it may
>> be some time before we do.
> 
> In the past we avoided filtering functionality in query-* commands and
> instead required the client to request the full list and do its own
> filtering. I'm not strictly opposed to such functionality if it's useful
> enough, but we should be aware that we're doing something new here.
> 

I don't intend to add more complex filtering systems, but it seemed a
natural fit in this instance. In particular, allowing `query-jobs
sys=block` gives us the 2.0 equivalent to the block job query command.

Should make scripts easy to upgrade and doesn't cost us anything.

Why have we resisted filtering in the past?

>> Each subsystem will have a sys= enumeration, and the jobs for that
>> particular subsystem can subclass the default return type. The QAPI
>> commands can be defined to accept a union of the subclasses so we don't
>> have to specify in advance which type each command accepts, but we can
>> allow the responsible subsystem to interpret it on demand dynamically.
>>
>> The base type can be something along the lines of:
>>
>> { 'struct': 'JobInfo',
>>   'data': {
>>     'type': JobTypeEnum,
>>     'sys': JobSysEnum,
>>     'id': str,
>>     'busy': bool,
>>     'paused': bool,
>>     'ready': bool
>>   }
>> }
> 
> 'ready' optional and missing for jobs which don't require manual
> completion?
> 

Good add, yes.

>> And the BlockJobInfo can inherit/extend, looking like this:
>>
>> { 'struct': 'BlockJobInfo',
>>   'base': JobInfo
>>   'data': {
>>     'len': int,
>>     'offset': int,
> 
> len/offset are misnomers and really represent the progress of the job. I
> think we want to have these in JobInfo, with better names.
> 

We could add a generic "progress" field that goes from [0.0,100.0].
Block Jobs could still report things like bytes transferred, estimated
total bytes, etc.
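
e.g., a sketch with your optional 'ready' folded in (field names still
very much up for debate):

{ 'struct': 'JobInfo',
  'data': { 'type': 'JobTypeEnum', 'sys': 'JobSysEnum', 'id': 'str',
            'busy': 'bool', 'paused': 'bool', '*ready': 'bool',
            'progress': 'number' } }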

>>     'speed': 'int',
> 
> Eventually, we'll deprecate this and require insertion of a throttling
> filter in the right place instead. But for now, I'm afraid we'll need
> this.
> 
> I'm not completely sure if it makes sense for every thinkable block job,
> though. A block job might in theory have no background operation or more
> than one.
> 

We could make it optional. I went for "minimal changes" at first, but we
can be more drastic if we desire.

>>     'io-status': BlockDeviceIoStatus,
> 
> This has always been a weird error reporting interface because it can't
> tell you whether the I/O error happened on the source or the target.
> It's similar to attach the job to a single BDS: It kind of works in
> practice, but conceptually it doesn't make a lot of sense.
> 

Let's nix it then; we can include it per-node in the tree-view below.

>>     'devices': <see below>,
>>   }
>> }
>>
>> 'devices' will now report a more nuanced view of this Job's associated
>> nodes in our graph, as in the following:
>>
>> { "devices": [
>>   { "name": "drive0",
>>     "nodes": [
>>       { "id": "#node123",
>>         "permissions": "+RW-X"
>>       },
>>       { "id": "#node456",
>>         "permissions": "+R"
>>       }
>>     ]
>>   }
>> }
>>
>> This lets us show exactly what nodes we are touching, what BlockBackends
>> they belong to, and what permissions are involved. Unambiguously. You
>> can use your own imagination for a permissions string that will make
>> sense -- This depends on Jeff Cody's work primarily.
> 
> There is one important information missing: What role that node takes
> for the job - that is, which is the source and which is the target. I'm
> not completely sure if the op blocker information should be visible to
> users, I consider this mostly internal node management.
> 

The thought was that if the permissions were visible, clients would be
able to pre-determine if they could launch another job or not from the
query data. I was also thinking the exact "role" of that node was not
really important as long as we knew the permissions associated with that
node.

Adding role information wouldn't hurt, though. We can add a string like:

"role": "target"
or
"role": "source"
or
"role": "deleting" (For nodes that get deleted via block commit, for
instance. Have a better name? "middle" to indicate it is not the base or
top? "interior" ?)

or any others that we need to grab permissions on.
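
i.e. each node entry in the query output would grow a role key,
something like (value names not settled):

{ "id": "#node123",
  "role": "source",
  "permissions": "+RW-X" }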

> I would suggest that we don't introduce a BlockJobInfo, but let the
> individual job types directly inherit JobInfo, and also replace the list
> of devices by named keys:
> 
>     { 'struct': 'JobInfo',
>       'data': {
>         'type': JobTypeEnum,
>         'sys': JobSysEnum,
>         'id': str,
>         'busy': bool,
>         'paused': bool,
>         'ready': bool,
>         'work_total': 'int',
>         'work_completed': 'int'
>       }
>     }
> 
>     { 'struct': 'BlockJobNode',
>       'data': {
>         'node-name': 'str',
>         'io-status': 'BlockDeviceIoStatus',
>         'permissions': 'whatever'
>       }
>     }
> 
>     { 'struct': 'MirrorJobInfo',
>       'base': JobInfo
>       'data': {
>           'source': 'BlockJobNode',
>           'target': 'BlockJobNode',
>           'speed': 'int',
>       }
>     }
> 

Perhaps. I guess it depends on what information we wish to expose to
management applications. This method is very specific to each job, which
increases the parsing effort required by each management application for
each job we add.

My approach might reveal slightly more than we care for, but it will
definitely apply to the broad range of block jobs sufficiently.

We can always subclass the generic BlockJobInfo to include even more
specific information per-job if we need to, as well -- but I believe
we'll need job-specific discriminators instead of subsystem-specific
discriminators.

>> The new command gives a chance to make a clean break and any users of
>> this new command won't be confused about the information they receive.
>>
>> The legacy qmp-query-block-jobs command can continue to operate
>> providing a best-effort approximation of the data it finds. The legacy
>> command will *abort* and return error if it detects any job that was
>> launched by a new-style command, e.g. if it detects there is more than
>> one job attached to any node under a single BlockBackend -- clearly the
>> user has been using new features -- and should abort announcing this
>> command is deprecated.
> 
> I agree with that. As soon as you use the new API once, you should be
> required to use it consistently.
> 
>> If the graph looks reasonably plain, it can report back the information
>> it always has, except that it will now also report the "id" field in
>> addition to be perfectly unambiguous about how to issue commands to this
>> job, like I outlined in the first paragraph of this section.
>>
>>
>> ==== block-job-cancel ====
>>
>> This command can remain as a legacy command. Provided the "device"
>> provided has precisely one job attached, we can allow this command to
>> cancel it. If more complicated configurations are found, abort and
>> encourage the user to use a new API.
>>
>> We can refuse to augment block-job-cancel; forcing users who want to
>> specify IDs to cancel to use the new API.
>>
>> We can add a new "job-cancel" command which removes the block specificity.
>>
>> We can introduce sub-classed arguments as needed for e.g. BlockJob
>> cancellations:
>>
>> { 'command': 'job-cancel',
>>   'data': { 'id': str,
>>             '*force': bool,
>>             'options': JobCancelOpts
>>   }
>> }
>>
>> The idea is that JobCancelOpts would be a subclassable type that can be
>> passed directly to the subsystem's implementation of the cancel
>> operation for further interpretation, so different subsystems can get
>> relevant arguments as needed. We don't currently have any such need for
>> BlockJobCancelOpts, but it can't hurt to be prepared for it.
>>
>> The type can be interpreted through the lens of whichever type the job
>> we've identified expects to see. (Block Jobs expect to see
>> BlockJobCancelOpts, etc.)
> 
> Sounds good, possibly with s/subsystem/individual job type/.
> 
> 'options' should be optional.
> 

Yes.

>> ==== block-job-pause, block-job-resume, block-job-complete ====
>>
>> Same story as above. Introduce job-pause, job-resume and job-complete
>> with subclassible job structures we can use for the block subsystem.
>>
>> Legacy commands continue to operate only as long as the "device"
>> argument is unambiguous to decipher.
> 
> Ack.
> 
>> ==== block-job-set-speed ====
>>
>> This one is interesting since it definitely does apply only to Block Jobs.
> 
> Why? The one non-block job we're envisioning so far, live migration,
> does have its own speed option.
> 
> I think it highly dependent on the job type whether it makes sense.
> 

Fair enough -- It just seemed likely that "speed" was rather
block-specific in our current usage of it, but it /could/ become useful
to other systems.

We can just opt for the generic property setting API anyway.

>> We can augment it and limp it along, allowing e.g.
>>
>> { 'data':
>>   { '*device': 'str',
>>     '*id': 'str',
>>     'speed': 'int'
>>   }
>> }
>>
>> Where we follow the "One of ID or Device but not both nor either"
>> pattern that we've applied in the past.
>>
>> Or, we can say "If there isn't one job per device, reject the command
>> and use the new API"
> 
> The latter would be more consistent, in my opinion.
> 
>> where the new API is:
>>
>> job-set-option :
>>   { 'id': str,
>>     'options': {
>>       '*etc': ...,
>>       '*etc2': ...,
>>     }
>>   }
>>
>> Similar to the other commands, the idea would be to take the subclassed
>> options structure that belongs to that Job's Subclass (e.g.
>> BlockJobOptions) but allow any field inside to be absent.
>>
>> Then we could have commands like this:
>>
>> job-set-option \
>> arguments={ 'id': '#job789',
>>             'options': {
>>               'speed': 10000
>>             }
>> }
>>
>> And so on. These would remain a flexible way of setting whatever
>> post-launch options a job might need, and the relevant job subsystem
>> can be responsible for rejecting certain options, etc.
> 
> That's an interesting thought. I kind of like it, but haven't thought
> through the consequences yet.
> 
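
To flesh that out a little, BlockJobOptions could be a struct whose
members are all optional, so only the options actually being changed
need to be supplied. Purely as a sketch (today 'speed' would be the
only member):

{ 'struct': 'BlockJobOptions',
  'data': { '*speed': 'int' } }

Other job subsystems would define their own option structs the same way.
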
>> I believe that covers all of the existing jobs interfaces in QMP, and
>> that should be sufficiently vague and open-ended to behave well with a
>> generic jobs system if we roll one out in the future.
>>
>> Eric, Markus: Please inform me of all the myriad ways my fever dreams
>> for QAPI are wrong. :~)
> 
> Kevin
> 

Thanks for taking a look. You didn't duplicate anything by Eric afaict!
--js


* Re: [Qemu-devel] [Qemu-block] Jobs 2.0 QAPI [RFC]
  2015-12-21 16:58 ` [Qemu-devel] [Qemu-block] " Alberto Garcia
@ 2015-12-21 19:40   ` John Snow
  2015-12-22 16:19     ` Alberto Garcia
  0 siblings, 1 reply; 10+ messages in thread
From: John Snow @ 2015-12-21 19:40 UTC (permalink / raw)
  To: Alberto Garcia, Qemu-block; +Cc: Kevin Wolf, qemu-devel, Markus Armbruster



On 12/21/2015 11:58 AM, Alberto Garcia wrote:
> On Thu 17 Dec 2015 01:50:08 AM CET, John Snow <jsnow@redhat.com> wrote:
>> In working through a prototype to enable multiple block jobs. A few
>> problem spots in our API compatibility become apparent.
>>
>> In a nutshell, old Blockjobs rely on the "device" to identify the job,
>> which implies:
>>
>> 1) A job is always attached to a device / the root node of a device
>> 2) There can only be one job per device
>> 3) A job can only affect one device at a time
>> 4) Once created, a job will always be associated with that device.
>>
>> All four of these are wrong and something we need to change, so
>> principally the "device" addressing scheme for Jobs needs to go and we
>> need a new ID addressing scheme instead.
> 
> Out of curiosity, do you have specific examples of block jobs that are
> affected by these problems?
> 

The motivating problem is multiple block jobs. Multiple block jobs are
hard to implement in a generic way (instead of per-job) because the API
is not well suited to it.

We need to refer to jobs by ID instead of "device," but while we're at
it Jeff Cody is working on a more universal/fine-grained op blocker
permission system. (See his RFC discussion thread for more details.)

The two can be co-developed to form a new jobs API.

> For the intermediate block-stream functionality I was having problems
> because of 1), so I extended the 'device' parameter to identify
> arbitrary node names as well.
> 
> Just to make things clear: your proposal looks good to me, I'm only
> wondering whether you're simply looking for a cleaner API or you have a
> use case that you cannot fulfill because of the way block jobs work
> now...
> 

The cleaner interface is definitely the larger motivator.

> Thanks!
> 
> Berto
> 

However, better flexibility also plays a part. Say we have two devices:

[drive0]: [X] --> [Y] --> [Z]
[drive1]: [A] --> [B]

In theory, we should be able to commit Z into Y into X while we
simultaneously perform a backup from X to A. We definitely can't do that
now.
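
As a sketch of how that might be expressed with ID-based addressing
(argument details are hand-waved; 'X' and 'A' are the node labels from
the diagram, and the 'id' fields plus the ability to address nodes
directly are exactly the parts that don't exist today):

block-commit \
arguments={ 'device': 'drive0', 'top': 'Z', 'base': 'X', 'id': '#commit0' }

blockdev-backup \
arguments={ 'device': 'X', 'target': 'A', 'sync': 'full', 'id': '#backup0' }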

There may be some better use cases -- backups, fleecing and other
read-only operations in particular have a high likelihood of being able
to run concurrently with other operations.


We definitely *can* just extend the old API to allow for these kinds of
things, but since it represents a new paradigm of job manipulation, it's
easier to just extend the block jobs API into a new "jobs" API and allow
the system to expand to other subsystems.


Thanks,
--js


* Re: [Qemu-devel] [Qemu-block] Jobs 2.0 QAPI [RFC]
  2015-12-21 19:40   ` John Snow
@ 2015-12-22 16:19     ` Alberto Garcia
  2015-12-22 16:21       ` John Snow
  0 siblings, 1 reply; 10+ messages in thread
From: Alberto Garcia @ 2015-12-22 16:19 UTC (permalink / raw)
  To: John Snow, Qemu-block; +Cc: Kevin Wolf, qemu-devel, Markus Armbruster

On Mon 21 Dec 2015 08:40:26 PM CET, John Snow <jsnow@redhat.com> wrote:
> However, better flexibility also plays a part. Say we have two devices:
>
> [drive0]: [X] --> [Y] --> [Z]
> [drive1]: [A] --> [B]
>
> In theory, we should be able to commit Z into Y into X while we
> simultaneously perform a backup from X to A. We definitely can't do that
> now.
>
> There may be some better use cases -- backups, fleecing and other
> read-only operations in particular have a high likelihood of being able
> to run concurrently with other operations.
>
>
> We definitely *can* just extend the old API to allow for these kinds of
> things, but since it represents a new paradigm of job manipulation, it's
> easier to just extend the block jobs api into a new "jobs" API and allow
> the system to expand to other subsystems.

That sounds good to me. Since I'm trying to achieve something similar
with the block-stream operations (two or more ops in parallel), I'll try
to follow this new API closely.

Thanks,

Berto


* Re: [Qemu-devel] [Qemu-block] Jobs 2.0 QAPI [RFC]
  2015-12-22 16:19     ` Alberto Garcia
@ 2015-12-22 16:21       ` John Snow
  0 siblings, 0 replies; 10+ messages in thread
From: John Snow @ 2015-12-22 16:21 UTC (permalink / raw)
  To: Alberto Garcia, Qemu-block; +Cc: Kevin Wolf, qemu-devel, Markus Armbruster



On 12/22/2015 11:19 AM, Alberto Garcia wrote:
> On Mon 21 Dec 2015 08:40:26 PM CET, John Snow <jsnow@redhat.com> wrote:
>> However, better flexibility also plays a part. Say we have two devices:
>>
>> [drive0]: [X] --> [Y] --> [Z]
>> [drive1]: [A] --> [B]
>>
>> In theory, we should be able to commit Z into Y into X while we
>> simultaneously perform a backup from X to A. We definitely can't do that
>> now.
>>
>> There may be some better use cases -- backups, fleecing and other
>> read-only operations in particular have a high likelihood of being able
>> to run concurrently with other operations.
>>
>>
>> We definitely *can* just extend the old API to allow for these kinds of
>> things, but since it represents a new paradigm of job manipulation, it's
>> easier to just extend the block jobs api into a new "jobs" API and allow
>> the system to expand to other subsystems.
> 
> That sounds good to me. Since I'm trying to achieve something similar
> with the block-stream operations (two or more ops in parallel) I'll try
> to follow this new API closely.
> 
> Thanks,
> 
> Berto
> 

Sure! Make sure you check out Jeff Cody's RFC on the new permissions
system, which will form the basis for allowing multiple jobs, too.

--js


end of thread

Thread overview: 10+ messages
2015-12-17  0:50 [Qemu-devel] Jobs 2.0 QAPI [RFC] John Snow
2015-12-18 18:07 ` Eric Blake
2015-12-18 21:24   ` John Snow
2015-12-18 21:46     ` Eric Blake
2015-12-21 12:31 ` Kevin Wolf
2015-12-21 17:45   ` John Snow
2015-12-21 16:58 ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2015-12-21 19:40   ` John Snow
2015-12-22 16:19     ` Alberto Garcia
2015-12-22 16:21       ` John Snow
