Date: Mon, 16 Sep 2013 15:41:45 +0800
From: Fam Zheng
To: Benoît Canet
Cc: Kevin Wolf, jcody@redhat.com, armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com, mreitz@redhat.com
Subject: Re: [Qemu-devel] Block Filters
Message-ID: <20130916074145.GA19311@T430s.nay.redhat.com>
In-Reply-To: <20130915181021.GA5868@irqsave.net>
Reply-To: famz@redhat.com

On Sun, 09/15 20:10, Benoît Canet wrote:
> On Friday 06 Sep 2013 at 11:55:38 (+0200), Kevin Wolf wrote:
> > On 06.09.2013 at 11:18, Fam Zheng wrote:
> > > On Fri, 09/06 10:45, Kevin Wolf wrote:
> > > > On 06.09.2013 at 09:56, Fam Zheng wrote:
> > > > > Since BlockDriver.bdrv_snapshot_create() is an optional operation, blockdev.c
> > > > > can navigate down the tree from the top node until it hits a layer where the op
> > > > > is implemented (the QCow2 bs), so we get rid of
> > > > > this top_node_below_filter pointer.
> > > >
> > > > Is it even inherent to a block driver (like a filter) whether a snapshot is
> > > > to be taken at its level? Or is it rather a policy decision that should
> > > > be made by the user?
> > > >
> > > OK, I get the point that the user should have full flexibility and fine operation
> > > granularity. That also argues against block_backend->top_node_below_filter. Do we
> > > really assume that all the filters are on top of the tree and linear?
> > > Shouldn't this be possible?
> > >
> > >               Block Backend
> > >                     |
> > >                     |
> > >                 Quorum BDS
> > >               /     |     \
> > >    throttle filter  |      \
> > >          /          |       \
> > >      qcow2        qcow2    qcow2
> > >
> > > So we throttle only a particular image, not the whole device. But this would
> > > make a top_node_below_filter pointer impossible.
> >
> > I was assuming that Benoît's model works for the special case of
> > snapshotting in one predefined way, but this is actually a very good
> > example of why it doesn't.
> >
> > The approach relies on snapshotting siblings together, and in this case
> > the siblings would be throttle/qcow2/qcow2, while throttle is still a filter. This
> > would mean that either throttle needs to be top_node_below_filter and
> > throttling doesn't stay on top, or the left qcow2 is
> > top_node_below_filter and the other Quorum images aren't snapshotted.
> >
> > > > In our example, the quorum driver, it's not at all clear to me that you
> > > > want to snapshot all children. In order to roll back to a previous
> > > > state, one snapshot is enough; you don't need multiple copies of the
> > > > same one. Perhaps you want two so that we can still compare them for
> > > > verification. Or all of them because you can afford the disk space and
> > > > want ultimate safety. I don't think qemu can know which one is true.
> > > >
> > > Only if quorum ever knew about and operated on snapshots would it need to be
> > > considered specifically, but it doesn't. So we need to achieve this in the general
> > > design: allow the user to take a snapshot, or set throttle limits, on particular
> > > BDSes, as in the graph above.
> > >
> > > > In the same way, in a typical case you may want to keep I/O throttling
> > > > for the whole drive, including the new snapshot. But what if the
> > > > throttling was used in order to not overload the network where the image
> > > > is stored, and you're now doing a local snapshot, to which you want to
> > > > stream the image? The I/O throttling should apply only to the backing
> > > > file, not the new snapshot.
> > > >
> > > Yes, and OTOH, throttling is only really suited to being a filter if it can be a
> > > non-top one; otherwise it's no better than what we have now.
> >
> > Well, it would be a cleaner architecture in any case, but having it in
> > the middle of the stack feels useful indeed, so we should support it.
> >
> > > > So perhaps what we really need is a more flexible snapshot/BDS tree
> > > > manipulation command that describes in detail which structure you want
> > > > to have in the end.
> >
> > Designing the corresponding QMP command is the hard part, I guess.
>
> During my vacation I thought about the fact that JSON is pretty good for building a
> tree.
>
> QMP, HMP and the command line could take a "block-tree" argument which would
> look like the following.
>
> block-tree = { 'quorum': [
>                 {
>                     'throttle' : {
>                         'qcow2' : { 'filename': "img1.qcow2" },
>                         'snapshotable': true

What's the 'snapshotable' for?

>                     },
>                     'throttle-iops' : 150,
>                     'throttle-iops-max' : 1000
>                 },
>                 {
>                     'qcow2' : { 'filename': "img2.qcow2" },
>                     'snapshotable': true
>                 },
>                 {
>                     'qcow2' : { 'filename': "img3.qcow2" },
>                     'snapshotable': false
>                 }
>              ] };
>

It's not very clear to me.
Does this mean that a key whose value is a dict names the type of node to
create? What do you put in the inner dict (i.e. why filename here), and what do
you put in the outer dict besides that key (i.e. snapshotable)? Where do you put
'id'? I think JSON is flexible enough to specify anything we can take, but the
format needs to be designed carefully.

And do we really want to use JSON in the command line options? Very hard to
imagine that. :)

Thanks,
Fam

> This would be passed to QEMU in a compact form without carriage returns and
> spaces.
>
> The block layer would convert this to C structs like the QMP code would do for a
> QMP command, and the bs tree would be recursively built from top to bottom by
> the Block Backend and each Block driver in the path using the C structs.
>
> Each level would instantiate the lower level until a raw or protocol driver is
> reached.
>
> What about this?
>
> Best regards
>
> Benoît
>