From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55436)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1XwYld-0007oY-4E
	for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:56:39 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1XwYlW-0008GP-MP
	for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:56:33 -0500
Received: from mx1.redhat.com ([209.132.183.28]:34512)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1XwYlW-0008GE-Cq
	for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:56:26 -0500
From: Markus Armbruster <armbru@redhat.com>
References: <87vbltvpl0.fsf@blackfin.pond.sub.org>
	<20141203103041.GB4404@noname.str.redhat.com>
Date: Thu, 04 Dec 2014 16:56:18 +0100
In-Reply-To: <20141203103041.GB4404@noname.str.redhat.com> (Kevin Wolf's
	message of "Wed, 3 Dec 2014 11:30:41 +0100")
Message-ID: <87k327e7dp.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] Review of monitor commands identifying BDS / BB by
	name
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Fam Zheng <famz@redhat.com>, jcody@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, Stefan Hajnoczi <stefanha@redhat.com>, benoit.canet@nodalink.com

Kevin Wolf <kwolf@redhat.com> writes:

> [ CCed Beno=C3=AEt and Max, this is blockdev work ]
> [ CCed Jeff, we're also talking about op blockers ]
>
> Not stripping quoted text for their convenience.
>
> Am 02.12.2014 um 20:06 hat Markus Armbruster geschrieben:
>> =3D Introduction =3D
>>=20
>> The block layer and its monitor commands have evolved, resulting in a
>> bit of a mess when it comes to referring to block layer objects in
>> commands.
>>=20
>> Semantically, we refer to two different kinds of objects: backends and
>> nodes within backends.
>>=20
>> Example: eject parameter @device names a backend.
>>=20
>> Example: change-backing-file parameter @image-node-name names a node.
>>=20
>> Backend and node names share a name space.  Names are unique.
>> Corollary: a name unambiguously identifies either a backend, or a node,
>> or nothing.
>>=20
>> Example: blockdev-add parameter @options is a union with discriminator
>> "driver".  With its value is "raw", member "file" is an anonymous union.
>> Its string value can name either a backend or a node.
>>=20
>> Node names are a fairly recent feature.  Before, we referenced nodes by
>> their file name.  Pretty schlocky.  Should be replaced by node name.
>>=20
>> Example: block-commit parameter @base is the "file name of the backing
>> image to write data into".  In other words, it identifies a node.  We
>> should add a node name parameter, and deprecate @base.
>>=20
>> Some commands predating the node name feature can reference only
>> backends even though nodes could make sense, too.  Such restrictions
>> should be lifted.  Others reference backends where only nodes make
>> sense.  Should be cleaned up.
>>=20
>> Example: drive-mirror parameter @device names a backend.  This restricts
>> mirroring to backends.  If we want to support mirroring nodes, we need
>> to extend the command to permit referencing nodes.
>>=20
>> Example: block_passwd parameter @device names a backend.  However,
>> backends aren't encryped, nodes are.  In 2.0, we cleaned this up: we
>> added parameter @node-name.  We kept @device for backward compatibility.
>> Either @device or @node-name must be given.
>>=20
>> Note: in block_passwd, we have separate parameters for backend and node
>> name.  In the blockdev-add example above, we have only one, and use its
>> value to figure out what it is.
>>=20
>> I find this inconsistency rather ugly.  We discussed cleaning it up in
>> the "BB name vs. BDS name in the QAPI schema" thread.
>>=20
>> A backend has a tree of nodes.  We call the tree's root the backend's
>> root node.
>>=20
>> For convenience, we sometimes accept a backend name when a node name is
>> wanted, and resolve it to the backend's root node.
>>=20
>> Example: block_passwd again; it's how its @device parameter works.
>>=20
>> Recent development (blockdev-add) permits nodes to be shared by multiple
>> backends.  Op blockers should ensure the sharing is safe.  I wouldn't
>> bet a nickel on this to work today.
>>=20
>>=20
>> =3D Block layer data structures =3D
>>=20
>> A backend is represented by a BlockBackend object (short: BB).
>>=20
>> A node is represented by a BlockDriverState object (short: BDS).
>>=20
>> A backend (BB) has a tree of nodes (BDSes).
>>=20
>> Two of a nodes children carry special meaning: the one stored in BDS
>> member file (sometimes called "parent"), and the one stored in BDS
>> member backing_hd (sometimes called "backing").  These used to be the
>> only children a node could have.
>>=20
>> A BB always has a name.  We sometimes call it device name.  Stored in BB
>> member name.
>>=20
>> A BDS may have a name.  We call it node name.  Stored in BDS member
>> node_name[], empty when the node has no name.
>>=20
>> A BDS has a file name.  The actual matching of a command's file name
>> argument against a BDS's file name is complex, but we don't have to
>> worry about that here.
>>=20
>> BBs are a fairly recent invention.  Before, a backend was represented by
>> a BDS with a non-empty member device_name[].
>>=20
>>=20
>> =3D Commands =3D
>>=20
>> I'm going to discuss all QMP commands whose handlers are defined in
>> block*.  To make sense of it, you'll probably have to peruse the command
>> documentation in the schema and/or qmp-commands.hx.
>>=20
>> When I think it's fairly obvious what needs to be done for a command, I
>> write it down as TODO.  Please challenge it if you think I'm wrong.
>>=20
>> When it's not so obvious, I write it down as questions.  Answers
>> appreciated.
>>=20
>> =3D=3D block-core.json =3D=3D
>>=20
>> * block-commit
>>=20
>>   @device names a backend, @top and @base each name one of its nodes via
>>   file name matching.
>>=20
>>   TODO: support specifying the two nodes via node name.
>>=20
>>   Why do we need @device when a top node is specified?
>>=20
>>   * We currently need the backend to set op blockers, and finding a
>>     node's backend isn't easy.  Implementation detail shows through the
>>     interface, blech.
>>=20
>>   * We need to know whether the top node is the active layer.
>>=20
>>     Complication: if it's shared by multiple backends, it may be the
>>     active layer in one but not the other.  Specifying the backend
>>     resolves the ambiguity.  Whether that makes sense I don't know.
>
> It doesn't.
>
> The real requirement for the commit job (for inactive layers) is that
> base...top (I'm using git-rev-parse syntax for describing nodes and
> subtrees from here on) is read-only for the duration of the commit
> operation. For committing the active layer, we use a mirror job
> internally, which works for images that are written during the
> operation.
>
> In theory, the mirror should work in all cases, it's just less efficient
> (and the implementation of the completion code which reconfigures the
> tree would have to be changed). Jeff, is this correct?
>
> What commit should be doing:
> - Check whether it can block (as in op blockers) writes to top, i.e. no
>   other user is currently requiring the ability to write
> - If it can, block writes on base...top and start a commit job
> - If it can't, block writes on base...top^ and start a mirror job
> - If it can't block at least base...top^, fail
>
> This automatically solves the problem of multiple parents, as long as
> these parents advertise correctly which op blocker capabilities they are
> using (no, they don't, and we don't have the infrastructure for them to
> do so yet - but I think it's becoming clearer what it would have to look
> like).

Makes sense.

>> * block-job-cancel
>> * block-job-complete
>> * block-job-pause
>> * block-job-resume
>> * block-job-set-speed
>>=20
>>   @device names a backend.
>>=20
>>   The question whether we need to support a node name here is moot,
>>   because jobs shouldn't be tied to a single backend (or node) in the
>>   first place.  They should have their own, independent job name.
>
> Correct.
>
> TODO: add a @name argument to commands starting block jobs
> TODO: add a @job-name argument to these job management commands and
>       deprecate @device
>
> How to get rid of bs->job to actually allow multiple jobs on one BDS and
> make the job IDs useful is a separate question that is harder to answer.

Good point: we can fix the interface before we fix the implementation.

>> * block-stream
>>=20
>>   @device names a backend, @base names a node via file name matching.
>>=20
>>   TODO: support specifying the node via node name.
>>=20
>>   I think backend is inappropriate here, we should support streaming up
>>   to a node, just like block-commit supports committing down from a
>>   node.
>
> Yes. I think Beno=C3=AEt suggested this before. I don't remember if he al=
ready
> coded up something.
>
>> * block_passwd
>> * block_resize
>>=20
>>   @node-name names a node.
>>=20
>>   @device names a backend, and references its root node, for
>>   compatibility.
>>=20
>>   Either @device or @node-name must be given.
>>=20
>>   TODO: should have single mandatory parameter instead of two optionals.
>
> How is this "fairly obvious what needs to be done"? We can't get rid of
> any field because of compatibility. Do you mean this:
>
> TODO: Make @device accept node names as well

Obviousness is subjective.

I like to think about a nice, clean interface first, and worry about
compatibility second.  Compatibility may force us to compromise on
cleanliness.  I feel this helps finding the least unclean interface.

Furthermore, I like to deprecate unwanted interfaces.  Advertize the
clean core, not the compatibility gunk.

Let's try this approach here.

I think we can agree that the clean interface is "single mandatory
parameter".  Ironically, that's what we had until we screwed it up in
2.0.

@device is a sub-optimal name for this single parameter.  Either we
accept that and move on, or we deprecate it in favor of a new parameter
with a better name.  I guess the better name isn't worth that much
trouble, in particular since the command name is already ugly.

When @node-name is given, @device must not be given.  So @device is
mandatory *except* in the @node-name usage we retain for compatibility.
Deprecate the usage.

Command documentation could then look like this:

##
# @block-resize
#
# Resize a block image while a guest is running.
#
# @device: the name of the block backend or node to resize (node names
# supported since 2.3)
#
# @size: new image size in bytes
#
# Deprecated usage (since 2.3):
#
# @device: #optional the name of the block backend to resize
#
# @node-name: #optional name of the node to resize (since 2.0)
#
# Either @device or @node-name must be set but not both.
#
# Since: 0.14.0

>> * blockdev-snapshot-sync
>>=20
>>   @node-name and @device as for block_passwd, including TODO.
>>=20
>>   @snapshot-node-name defines the new node's node name.
>>=20
>> * block_set_io_throttle
>>=20
>>   @device names a backend.
>>=20
>>   TODO: support nodes.
>>=20
>>   Note: we'd like to redo throttling as a block filter node, so maybe
>>   we'll have a completely different command instead.
>>=20
>> * blockdev-add
>>=20
>>   This command defines a backend and its node tree, where sub-trees may
>>   be references to existing nodes.
>>=20
>>   @id defines the backend's name.
>>=20
>>   @node-name defines its root node's node name.
>>=20
>>   Note: blockdev-add always defines a backend.  When you build up a
>>   backend bottom-up with several commands, you get a bunch of unwanted
>>   backends "in the middle".  I'd like to make @id optional, and omit the
>>   backend when it's missing.
>
> Yes, this makes sense, as we discussed privately a while ago.
>
>> * change-backing-file
>>=20
>>   @device names a backend, @image-node-name names a node.
>>=20
>>   As far as I can tell, we need the backend only to set op blockers.
>>   Implementation detail shows through the interface.
>
> Once we have the new op blockers, we'll make @device optional then and
> completely ignore its value.
>
>> * drive-backup
>>=20
>>   @device names a backend.
>>=20
>>   Do we want to be able to back up any node, or only a backend?
>>=20
>>   Note: documentation of @target sounds like it could somehow name a
>>   backend, but as far as I can tell it's always interpreted as file
>>   name.
>
> I can't think of a reasonable use case for backing up non-roots because
> the only thing that would write to them are other block jobs. So you
> could backup the old snapshot contents while committing to it.
> Usefulness questionable.
>
> That said, while there's no urgent need for it, it would be nicer to
> allow all operations that can safely be performed, and this is one of
> them. Might fall out almost natually once we have op blockers, so that
> we really only need to add a @node-name option.
>
> But that's something for later.
>
>> * drive-mirror
>>=20
>>   @device names a backend, @replaces names a node, and @node-name
>>   defines the name of the new node.
>>=20
>>   Do we want to be able to mirror any node, or only a backend?
>>=20
>>   Note: documentation of @target sounds like it could somehow name a
>>   backend, but as far as I can tell it's always interpreted as file
>>   name.
>>=20
>>   Note: drive-mirror supports @replaces, but drive-backup doesn't.  Odd.
>
> We probably want to mirror any node (Beno=C3=AEt has a use case for this =
with
> replacing broken quorum children).
>
>> * query-block
>>=20
>>   Returns information on all backends as array of BlockInfo.  BlockInfo
>>   member @device is the backend name.
>>=20
>> * query-named-block-nodes
>>=20
>>   Returns information on all named nodes as an array of BlockDeviceInfo
>>   (we suck at naming).  BlockDeviceInfo member @node-name is the node
>>   name.
>>=20
>>   You can't get information on nodes that don't have a node name.
>>=20
>> * query-blockstats
>>=20
>>   Returns some stats for all backends as array of BlockStats.
>>   BlockStats member @device is the backend name.  Member @stats contains
>>   the stats for the backend's root node.  Member @parent is BlockStats
>>   for the child node that is stored in BDS member file.  Member @backing
>>   is BlockStats for the chold node that is stored in BDS member
>>   backing_hd.  Stats for other children aren't returned.
>>=20
>>   TODO: generalize this to the full tree, include node names.
>
> Do we want the same thing for query-block?

There are a couple of ways to skin the query-cat.

Let's get the easy case out of the way first: a query that reports only
backend properties and not node properties can return an array where
each element carries a backend name, like query-block does now.

For queries that report node properties, we have a couple of options:

* Flat array with node names

  Like current query-named-block-nodes.

  Can't report on nodes without names.  Jeff's project to give all nodes
  names would moot this issue.

* Array of trees mirroring the actual node forest,

  Similar to current query-blockstats.

  Tree roots correspond to backends, and backends have names.

  Unfortunately, the nodes don't actually form a forest: node trees may
  be shared.  To turn it into make a forest, we'd have to duplicate the
  shared trees.

* Hybrid: array of trees, but named sub-trees are represented by name

  Like the previous one, except the recursion stops at named nodes.
  Instead of including such a sub-tree, we reference it by name, then
  add it to the array if it's not already there.

"Flat array" seems simplest.

>> * query-block-jobs
>>=20
>>   Returns information on block jobs as array of BlockJobInfo.  A block
>>   job is always tied to a backend, and BlockJobInfo member @device is
>>   its name.
>>=20
>>   The question whether we need a node name here is moot; see
>>   block-job-cancel above.
>>=20
>> =3D=3D block.json =3D=3D
>>=20
>> * blockdev-snapshot-delete-internal-sync
>> * blockdev-snapshot-internal-sync
>>=20
>>   @device names a backend.
>>=20
>>   Do we want to be able to snapshot any node, or only a backend?
>
> Probably only a backend. At least until someone comes up with a
> reasonable use case. Internal snapshots and backing files don't mix very
> well anyway.
>
>> * eject
>>=20
>>   @device names a backend.  Appropriate, because this is really a
>>   backend operation.
>>=20
>> * nbd-server-add
>>=20
>>   @device names a backend.
>>=20
>>   I think backend is appropriate here.  The NBD server sits on top of
>>   the block layer just like device models do.  It should therefore use
>>   the BB API.  See Max's [PATCH v2 0/6] nbd: Use BlockBackend.
>
> Agreed.
>
>> * nbd-server-start
>> * nbd-server-stop
>>=20
>>   No backend or node name parameters.
>>=20
>> =3D=3D qapi-schema.json =3D=3D
>>=20
>> * transaction
>>=20
>>   This is a wrapper around a list of transaction-capable commands.
>>   Thus, nothing new here.
>
> Surprisingly few really hard questions.

Good :)