qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Benoît Canet" <benoit.canet@irqsave.net>
To: Fam Zheng <famz@redhat.com>
Cc: "Benoît Canet" <benoit.canet@irqsave.net>,
	"Kevin Wolf" <kwolf@redhat.com>,
	jcody@redhat.com, armbru@redhat.com, qemu-devel@nongnu.org,
	stefanha@redhat.com, pbonzini@redhat.com, mreitz@redhat.com
Subject: Re: [Qemu-devel] Block Filters
Date: Mon, 16 Sep 2013 13:00:21 +0200	[thread overview]
Message-ID: <20130916110021.GD5105@irqsave.net> (raw)
In-Reply-To: <20130916074145.GA19311@T430s.nay.redhat.com>

Le Monday 16 Sep 2013 à 15:41:45 (+0800), Fam Zheng a écrit :
> On Sun, 09/15 20:10, Benoît Canet wrote:
> > Le Friday 06 Sep 2013 à 11:55:38 (+0200), Kevin Wolf a écrit :
> > > Am 06.09.2013 um 11:18 hat Fam Zheng geschrieben:
> > > > On Fri, 09/06 10:45, Kevin Wolf wrote:
> > > > > Am 06.09.2013 um 09:56 hat Fam Zheng geschrieben:
> > > > > > Since BlockDriver.bdrv_snapshot_create() is an optional operation, blockdev.c
> > > > > > can navigate down the tree from top node, until hitting some layer where the op
> > > > > > is implemented (the QCow2 bs), so we get rid of this top_node_below_filter
> > > > > > pointer.
> > > > > 
> > > > > Is it even inherent to a block driver (like a filter), if a snapshot is
> > > > > to be taken at its level? Or is it rather a policy decision that should
> > > > > be made by the user?
> > > > > 
> > > > OK, getting the point that user should have full flexibility and fine operation
> > > > granularity. It also stands against block_backend->top_node_below_filter. Do we
> > > > really have the assumption that all the filters are on top of the tree and linear?
> > > > Shouldn't this be possible?
> > > > 
> > > >                    Block Backend
> > > >                          |
> > > >                          |
> > > >                     Quodrum BDS
> > > >                     /    |    \
> > > >              throttle filter  |     \
> > > >                   /      |      \
> > > >                 qcow2   qcow2   qcow2
> > > > 
> > > > So we throttle only a particular image, not the whole device. But this will
> > > > make a top_node_below_filter pointer impossible.
> > > 
> > > I was assuming that Benoît's model works for the special case of
> > > snapshotting in one predefined way, but this is actually a very good
> > > example of why it doesn't.
> > > 
> > > The approach relies on snapshotting siblings together, and in this case
> > > the siblings would be throttle/qcow2/qcow2, while throttle is still a filter. This
> > > would mean that either throttle needs to be top_node_below_filter and
> > > throttling doesn't stay on top, or the left qcow2 is
> > > top_node_below_filter and the other Quorum images aren't snapshotted.
> > > 
> > > > > In our example, the quorum driver, it's not at all clear to me that you
> > > > > want to snapshot all children. In order to roll back to a previous
> > > > > state, one snapshot is enough, you don't need multiple copies of the
> > > > > same one. Perhaps you want two so that we can still compare them for
> > > > > verification. Or all of them because you can afford the disk space and
> > > > > want ultimate safety. I don't think qemu can know which one is true.
> > > > > 
> > > > Only if quorum ever knows about and operates on snapshots, it should be
> > > > considered specifically, but no. So we need to achieve this in the general
> > > > design: allow user to take snapshot, or set throttle limits on particular
> > > > BDSes, as above graph.
> > > > 
> > > > > In the same way, in a typical case you may want to keep I/O throttling
> > > > > for the whole drive, including the new snapshot. But what if the
> > > > > throttling was used in order to not overload the network where the image
> > > > > is stored, and you're now doing a local snapshot, to which you want to
> > > > > stream the image? The I/O throttling should apply only to the backing
> > > > > file, not the new snapshot.
> > > > > 
> > > > Yes, and OTOH, throttling really suits to be a filter only if it can be a non
> > > > top one, otherwise it's no better than what we have now.
> > > 
> > > Well, it would be a cleaner architecture in any case, but having it in
> > > the middle of the stack feels useful indeed, so we should support it.
> > > 
> > > > > So perhaps what we really need is a more flexible snapshot/BDS tree
> > > > > manipulation command that describes in detail which structure you want
> > > > > to have in the end.
> > > 
> > > Designing the corresponding QMP command is the hard part, I guess.
> > 
> > During my vacation I though about the fact that JSON is pretty good to build a
> > tree.
> > 
> > QMP, HMP and the command line could take a "block-tree" argument which would
> > look like the following.
> > 
> > block-tree = { 'quorum': [
> >                               {
> >                                   'throttle' : {
> >                                       'qcow2' : { 'filename': "img1.qcow2" }
> >                                       'snapshotable': true,
> 
> What's the 'snapshotable' for?

Kevin mentioned the fact that when taking a snapshot with QUORUM we may want to
snapshot only a part of the QUORUM branches.
snapshotable is a way to indicate to the QCOW2 driver that it must create a new
snapshot when the snapshoting order cascade down the bs tree.

Best regards

Benoît

> 
> >                                   },
> >                                   'throttle-iops' : 150,
> >                                   'throttle-iops-max' : 1000,
> >                               },
> >                               {
> >                                   'qcow2' : { 'filename': "img2.qcow2" },
> >                                   'snapshotable': true,
> >                               },
> >                               {
> >                                   'qcow2' : { 'filename': "img3.qcow2" }
> >                                   'snapshotable': false,
> >                               }
> >                           ] };
> > 
> 
> It's not very clear to me. Does this mean a key associated with a dict value
> means creating type? What do you put in the inner dict (i.e. why filename here)
> and what to put outter besides the key (i.e. snapshotable)? Where to put 'id'?
> 
> I think JSON is flexible enough to specify anything we can take, but the format
> needs to be designed carefully. And do we really want to use JSON in the
> command line options? Very hard to imagine that. :)
> 
> Thanks,
> 
> Fam
> 
> > This would be passed to QEMU in a compact form without carriage return and
> > spaces.
> > 
> > The block layer would convert this to C structs like the QMP code would do for a
> > QMP command and the bs tree would be recursively build from top to bottom by
> > the Block Backend and each Block driver in the path using the C structs.
> > 
> > Each level would instanciate the lower level until a raw or protocol driver is
> > reached.
> > 
> > What about this ?
> > 
> > Best regards
> > 
> > Benoît
> > 

      parent reply	other threads:[~2013-09-16 10:58 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-03 16:24 [Qemu-devel] Block Filters Benoît Canet
2013-09-04  8:29 ` Stefan Hajnoczi
2013-09-04 18:15   ` Benoît Canet
2013-09-05  7:32     ` Kevin Wolf
2013-09-05 10:01     ` Stefan Hajnoczi
2013-09-05 10:18       ` Fam Zheng
2013-09-05 14:59         ` Stefan Hajnoczi
2013-09-05 17:29         ` Benoît Canet
2013-09-06  7:42           ` Kevin Wolf
2013-09-06  7:49             ` Fam Zheng
2013-09-06  7:56 ` Fam Zheng
2013-09-06  8:45   ` Kevin Wolf
2013-09-06  9:18     ` Fam Zheng
2013-09-06  9:55       ` Kevin Wolf
2013-09-06 10:50         ` Fam Zheng
2013-09-15 18:10         ` Benoît Canet
2013-09-16  7:41           ` Fam Zheng
2013-09-16 10:43             ` Benoît Canet
2013-09-16 11:00             ` Benoît Canet [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130916110021.GD5105@irqsave.net \
    --to=benoit.canet@irqsave.net \
    --cc=armbru@redhat.com \
    --cc=famz@redhat.com \
    --cc=jcody@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).