From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55436) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XwYld-0007oY-4E for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:56:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XwYlW-0008GP-MP for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:56:33 -0500 Received: from mx1.redhat.com ([209.132.183.28]:34512) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XwYlW-0008GE-Cq for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:56:26 -0500 From: Markus Armbruster References: <87vbltvpl0.fsf@blackfin.pond.sub.org> <20141203103041.GB4404@noname.str.redhat.com> Date: Thu, 04 Dec 2014 16:56:18 +0100 In-Reply-To: <20141203103041.GB4404@noname.str.redhat.com> (Kevin Wolf's message of "Wed, 3 Dec 2014 11:30:41 +0100") Message-ID: <87k327e7dp.fsf@blackfin.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] Review of monitor commands identifying BDS / BB by name List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: Fam Zheng , jcody@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, Stefan Hajnoczi , benoit.canet@nodalink.com Kevin Wolf writes: > [ CCed Beno=C3=AEt and Max, this is blockdev work ] > [ CCed Jeff, we're also talking about op blockers ] > > Not stripping quoted text for their convenience. > > Am 02.12.2014 um 20:06 hat Markus Armbruster geschrieben: >> =3D Introduction =3D >>=20 >> The block layer and its monitor commands have evolved, resulting in a >> bit of a mess when it comes to referring to block layer objects in >> commands. >>=20 >> Semantically, we refer to two different kinds of objects: backends and >> nodes within backends. >>=20 >> Example: eject parameter @device names a backend. >>=20 >> Example: change-backing-file parameter @image-node-name names a node. >>=20 >> Backend and node names share a name space. Names are unique. >> Corollary: a name unambiguously identifies either a backend, or a node, >> or nothing. >>=20 >> Example: blockdev-add parameter @options is a union with discriminator >> "driver". With its value is "raw", member "file" is an anonymous union. >> Its string value can name either a backend or a node. >>=20 >> Node names are a fairly recent feature. Before, we referenced nodes by >> their file name. Pretty schlocky. Should be replaced by node name. >>=20 >> Example: block-commit parameter @base is the "file name of the backing >> image to write data into". In other words, it identifies a node. We >> should add a node name parameter, and deprecate @base. >>=20 >> Some commands predating the node name feature can reference only >> backends even though nodes could make sense, too. Such restrictions >> should be lifted. Others reference backends where only nodes make >> sense. Should be cleaned up. >>=20 >> Example: drive-mirror parameter @device names a backend. This restricts >> mirroring to backends. If we want to support mirroring nodes, we need >> to extend the command to permit referencing nodes. >>=20 >> Example: block_passwd parameter @device names a backend. However, >> backends aren't encryped, nodes are. In 2.0, we cleaned this up: we >> added parameter @node-name. We kept @device for backward compatibility. >> Either @device or @node-name must be given. >>=20 >> Note: in block_passwd, we have separate parameters for backend and node >> name. In the blockdev-add example above, we have only one, and use its >> value to figure out what it is. >>=20 >> I find this inconsistency rather ugly. We discussed cleaning it up in >> the "BB name vs. BDS name in the QAPI schema" thread. >>=20 >> A backend has a tree of nodes. We call the tree's root the backend's >> root node. >>=20 >> For convenience, we sometimes accept a backend name when a node name is >> wanted, and resolve it to the backend's root node. >>=20 >> Example: block_passwd again; it's how its @device parameter works. >>=20 >> Recent development (blockdev-add) permits nodes to be shared by multiple >> backends. Op blockers should ensure the sharing is safe. I wouldn't >> bet a nickel on this to work today. >>=20 >>=20 >> =3D Block layer data structures =3D >>=20 >> A backend is represented by a BlockBackend object (short: BB). >>=20 >> A node is represented by a BlockDriverState object (short: BDS). >>=20 >> A backend (BB) has a tree of nodes (BDSes). >>=20 >> Two of a nodes children carry special meaning: the one stored in BDS >> member file (sometimes called "parent"), and the one stored in BDS >> member backing_hd (sometimes called "backing"). These used to be the >> only children a node could have. >>=20 >> A BB always has a name. We sometimes call it device name. Stored in BB >> member name. >>=20 >> A BDS may have a name. We call it node name. Stored in BDS member >> node_name[], empty when the node has no name. >>=20 >> A BDS has a file name. The actual matching of a command's file name >> argument against a BDS's file name is complex, but we don't have to >> worry about that here. >>=20 >> BBs are a fairly recent invention. Before, a backend was represented by >> a BDS with a non-empty member device_name[]. >>=20 >>=20 >> =3D Commands =3D >>=20 >> I'm going to discuss all QMP commands whose handlers are defined in >> block*. To make sense of it, you'll probably have to peruse the command >> documentation in the schema and/or qmp-commands.hx. >>=20 >> When I think it's fairly obvious what needs to be done for a command, I >> write it down as TODO. Please challenge it if you think I'm wrong. >>=20 >> When it's not so obvious, I write it down as questions. Answers >> appreciated. >>=20 >> =3D=3D block-core.json =3D=3D >>=20 >> * block-commit >>=20 >> @device names a backend, @top and @base each name one of its nodes via >> file name matching. >>=20 >> TODO: support specifying the two nodes via node name. >>=20 >> Why do we need @device when a top node is specified? >>=20 >> * We currently need the backend to set op blockers, and finding a >> node's backend isn't easy. Implementation detail shows through the >> interface, blech. >>=20 >> * We need to know whether the top node is the active layer. >>=20 >> Complication: if it's shared by multiple backends, it may be the >> active layer in one but not the other. Specifying the backend >> resolves the ambiguity. Whether that makes sense I don't know. > > It doesn't. > > The real requirement for the commit job (for inactive layers) is that > base...top (I'm using git-rev-parse syntax for describing nodes and > subtrees from here on) is read-only for the duration of the commit > operation. For committing the active layer, we use a mirror job > internally, which works for images that are written during the > operation. > > In theory, the mirror should work in all cases, it's just less efficient > (and the implementation of the completion code which reconfigures the > tree would have to be changed). Jeff, is this correct? > > What commit should be doing: > - Check whether it can block (as in op blockers) writes to top, i.e. no > other user is currently requiring the ability to write > - If it can, block writes on base...top and start a commit job > - If it can't, block writes on base...top^ and start a mirror job > - If it can't block at least base...top^, fail > > This automatically solves the problem of multiple parents, as long as > these parents advertise correctly which op blocker capabilities they are > using (no, they don't, and we don't have the infrastructure for them to > do so yet - but I think it's becoming clearer what it would have to look > like). Makes sense. >> * block-job-cancel >> * block-job-complete >> * block-job-pause >> * block-job-resume >> * block-job-set-speed >>=20 >> @device names a backend. >>=20 >> The question whether we need to support a node name here is moot, >> because jobs shouldn't be tied to a single backend (or node) in the >> first place. They should have their own, independent job name. > > Correct. > > TODO: add a @name argument to commands starting block jobs > TODO: add a @job-name argument to these job management commands and > deprecate @device > > How to get rid of bs->job to actually allow multiple jobs on one BDS and > make the job IDs useful is a separate question that is harder to answer. Good point: we can fix the interface before we fix the implementation. >> * block-stream >>=20 >> @device names a backend, @base names a node via file name matching. >>=20 >> TODO: support specifying the node via node name. >>=20 >> I think backend is inappropriate here, we should support streaming up >> to a node, just like block-commit supports committing down from a >> node. > > Yes. I think Beno=C3=AEt suggested this before. I don't remember if he al= ready > coded up something. > >> * block_passwd >> * block_resize >>=20 >> @node-name names a node. >>=20 >> @device names a backend, and references its root node, for >> compatibility. >>=20 >> Either @device or @node-name must be given. >>=20 >> TODO: should have single mandatory parameter instead of two optionals. > > How is this "fairly obvious what needs to be done"? We can't get rid of > any field because of compatibility. Do you mean this: > > TODO: Make @device accept node names as well Obviousness is subjective. I like to think about a nice, clean interface first, and worry about compatibility second. Compatibility may force us to compromise on cleanliness. I feel this helps finding the least unclean interface. Furthermore, I like to deprecate unwanted interfaces. Advertize the clean core, not the compatibility gunk. Let's try this approach here. I think we can agree that the clean interface is "single mandatory parameter". Ironically, that's what we had until we screwed it up in 2.0. @device is a sub-optimal name for this single parameter. Either we accept that and move on, or we deprecate it in favor of a new parameter with a better name. I guess the better name isn't worth that much trouble, in particular since the command name is already ugly. When @node-name is given, @device must not be given. So @device is mandatory *except* in the @node-name usage we retain for compatibility. Deprecate the usage. Command documentation could then look like this: ## # @block-resize # # Resize a block image while a guest is running. # # @device: the name of the block backend or node to resize (node names # supported since 2.3) # # @size: new image size in bytes # # Deprecated usage (since 2.3): # # @device: #optional the name of the block backend to resize # # @node-name: #optional name of the node to resize (since 2.0) # # Either @device or @node-name must be set but not both. # # Since: 0.14.0 >> * blockdev-snapshot-sync >>=20 >> @node-name and @device as for block_passwd, including TODO. >>=20 >> @snapshot-node-name defines the new node's node name. >>=20 >> * block_set_io_throttle >>=20 >> @device names a backend. >>=20 >> TODO: support nodes. >>=20 >> Note: we'd like to redo throttling as a block filter node, so maybe >> we'll have a completely different command instead. >>=20 >> * blockdev-add >>=20 >> This command defines a backend and its node tree, where sub-trees may >> be references to existing nodes. >>=20 >> @id defines the backend's name. >>=20 >> @node-name defines its root node's node name. >>=20 >> Note: blockdev-add always defines a backend. When you build up a >> backend bottom-up with several commands, you get a bunch of unwanted >> backends "in the middle". I'd like to make @id optional, and omit the >> backend when it's missing. > > Yes, this makes sense, as we discussed privately a while ago. > >> * change-backing-file >>=20 >> @device names a backend, @image-node-name names a node. >>=20 >> As far as I can tell, we need the backend only to set op blockers. >> Implementation detail shows through the interface. > > Once we have the new op blockers, we'll make @device optional then and > completely ignore its value. > >> * drive-backup >>=20 >> @device names a backend. >>=20 >> Do we want to be able to back up any node, or only a backend? >>=20 >> Note: documentation of @target sounds like it could somehow name a >> backend, but as far as I can tell it's always interpreted as file >> name. > > I can't think of a reasonable use case for backing up non-roots because > the only thing that would write to them are other block jobs. So you > could backup the old snapshot contents while committing to it. > Usefulness questionable. > > That said, while there's no urgent need for it, it would be nicer to > allow all operations that can safely be performed, and this is one of > them. Might fall out almost natually once we have op blockers, so that > we really only need to add a @node-name option. > > But that's something for later. > >> * drive-mirror >>=20 >> @device names a backend, @replaces names a node, and @node-name >> defines the name of the new node. >>=20 >> Do we want to be able to mirror any node, or only a backend? >>=20 >> Note: documentation of @target sounds like it could somehow name a >> backend, but as far as I can tell it's always interpreted as file >> name. >>=20 >> Note: drive-mirror supports @replaces, but drive-backup doesn't. Odd. > > We probably want to mirror any node (Beno=C3=AEt has a use case for this = with > replacing broken quorum children). > >> * query-block >>=20 >> Returns information on all backends as array of BlockInfo. BlockInfo >> member @device is the backend name. >>=20 >> * query-named-block-nodes >>=20 >> Returns information on all named nodes as an array of BlockDeviceInfo >> (we suck at naming). BlockDeviceInfo member @node-name is the node >> name. >>=20 >> You can't get information on nodes that don't have a node name. >>=20 >> * query-blockstats >>=20 >> Returns some stats for all backends as array of BlockStats. >> BlockStats member @device is the backend name. Member @stats contains >> the stats for the backend's root node. Member @parent is BlockStats >> for the child node that is stored in BDS member file. Member @backing >> is BlockStats for the chold node that is stored in BDS member >> backing_hd. Stats for other children aren't returned. >>=20 >> TODO: generalize this to the full tree, include node names. > > Do we want the same thing for query-block? There are a couple of ways to skin the query-cat. Let's get the easy case out of the way first: a query that reports only backend properties and not node properties can return an array where each element carries a backend name, like query-block does now. For queries that report node properties, we have a couple of options: * Flat array with node names Like current query-named-block-nodes. Can't report on nodes without names. Jeff's project to give all nodes names would moot this issue. * Array of trees mirroring the actual node forest, Similar to current query-blockstats. Tree roots correspond to backends, and backends have names. Unfortunately, the nodes don't actually form a forest: node trees may be shared. To turn it into make a forest, we'd have to duplicate the shared trees. * Hybrid: array of trees, but named sub-trees are represented by name Like the previous one, except the recursion stops at named nodes. Instead of including such a sub-tree, we reference it by name, then add it to the array if it's not already there. "Flat array" seems simplest. >> * query-block-jobs >>=20 >> Returns information on block jobs as array of BlockJobInfo. A block >> job is always tied to a backend, and BlockJobInfo member @device is >> its name. >>=20 >> The question whether we need a node name here is moot; see >> block-job-cancel above. >>=20 >> =3D=3D block.json =3D=3D >>=20 >> * blockdev-snapshot-delete-internal-sync >> * blockdev-snapshot-internal-sync >>=20 >> @device names a backend. >>=20 >> Do we want to be able to snapshot any node, or only a backend? > > Probably only a backend. At least until someone comes up with a > reasonable use case. Internal snapshots and backing files don't mix very > well anyway. > >> * eject >>=20 >> @device names a backend. Appropriate, because this is really a >> backend operation. >>=20 >> * nbd-server-add >>=20 >> @device names a backend. >>=20 >> I think backend is appropriate here. The NBD server sits on top of >> the block layer just like device models do. It should therefore use >> the BB API. See Max's [PATCH v2 0/6] nbd: Use BlockBackend. > > Agreed. > >> * nbd-server-start >> * nbd-server-stop >>=20 >> No backend or node name parameters. >>=20 >> =3D=3D qapi-schema.json =3D=3D >>=20 >> * transaction >>=20 >> This is a wrapper around a list of transaction-capable commands. >> Thus, nothing new here. > > Surprisingly few really hard questions. Good :)