All of lore.kernel.org
 help / color / mirror / Atom feed
From: Francesco Romani <fromani@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: libvir-list@redhat.com, Nir Soffer <nsoffer@redhat.com>,
	Peter Krempa <pkrempa@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [libvirt] RFC: exposing qemu's block-set-write-threshold
Date: Fri, 22 May 2015 03:24:07 -0400 (EDT)	[thread overview]
Message-ID: <617435499.2339061.1432279447709.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <555EB17D.906@redhat.com>

----- Original Message -----
> From: "Eric Blake" <eblake@redhat.com>
> To: "Francesco Romani" <fromani@redhat.com>
> Cc: libvir-list@redhat.com, "Nir Soffer" <nsoffer@redhat.com>, "Peter Krempa" <pkrempa@redhat.com>,
> qemu-devel@nongnu.org
> Sent: Friday, May 22, 2015 6:33:01 AM
> Subject: Re: [libvirt] RFC: exposing qemu's block-set-write-threshold
> 
> [adding qemu]
> 

> > I read the thread and I'm pretty sure this will be a silly question, but I
> > want
> > to make sure I am on the same page and I'm not somehow confused by the
> > terminology.
> > 
> > Let's consider the simplest of the situation we face in oVirt:
> > 
> > (thin provisioned qcow2 disk on LV)
> > 
> > vda=[format=qcow2] -> lv=[path=/dev/mapper/$UUID]
> > 
> > Isn't the LV here the 'backing file' (actually, backing block device) of
> > the disk?
> 
> Restating what you wrote into libvirt terminology, I think this means
>
> that you have a <disk> where:
> <driver> is qcow2
> <source> is a local file name
> <device> names vda
> <backingStore index='1'> describes the backing LV:
>   <driver> is also qcow2 (as polling allocation growth in order to
> resize on demand only makes sense for qcow2 format)
>   <source> is /dev/mapper/$UUID


Yes, exactly my point. I just want to be 100% sure that the three (slightly) different
parlances of the three groups (oVirt/libvirt/QEMU) are aligned on the same meaning,
and that we're not getting anything lost in translation

> that you have a <disk> where:
> <driver> is qcow2
> <source> is a local file name
> <device> names vda
> <backingStore index='1'> describes the backing LV:
>   <driver> is also qcow2 (as polling allocation growth in order to
> resize on demand only makes sense for qcow2 format)
>   <source> is /dev/mapper/$UUID

For the final confirmation, here's the actual XML we produce:

<disk device="disk" snapshot="no" type="block">
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x05" type="pci"/>
  <source dev="/rhev/data-center/00000002-0002-0002-0002-00000000014b/12f68692-2a5a-4e48-af5e-4679bca7fd44/images/ee1295ee-7ddc-4030-be5e-4557538bc4d2/05a88a94-5bd6-4698-be69-39e78c84e1a5"/>
  <target bus="virtio" dev="vda"/>
  <serial>ee1295ee-7ddc-4030-be5e-4557538bc4d2</serial>
  <boot order="1"/>
  <driver cache="none" error_policy="stop" io="native" name="qemu" type="qcow2"/>
</disk>

For the sake of completeness:

$ ls -lh /rhev/data-center/00000002-0002-0002-0002-00000000014b/12f68692-2a5a-4e48-af5e-4679bca7fd44/images/ee1295ee-7ddc-4030-be5e-4557538bc4d2/05a88a94-5bd6-4698-be69-39e78c84e1a5 
lrwxrwxrwx. 1 vdsm kvm 78 May 22 08:49 /rhev/data-center/00000002-0002-0002-0002-00000000014b/12f68692-2a5a-4e48-af5e-4679bca7fd44/images/ee1295ee-7ddc-4030-be5e-4557538bc4d2/05a88a94-5bd6-4698-be69-39e78c84e1a5 -> /dev/12f68692-2a5a-4e48-af5e-4679bca7fd44/05a88a94-5bd6-4698-be69-39e78c84e1a5

$ ls -lh /dev/12f68692-2a5a-4e48-af5e-4679bca7fd44/
total 0
lrwxrwxrwx. 1 root root 8 May 22 08:49 05a88a94-5bd6-4698-be69-39e78c84e1a5 -> ../dm-11
lrwxrwxrwx. 1 root root 8 May 22 08:49 54673e6d-207d-4a66-8f0d-3f5b3cda78e5 -> ../dm-12
lrwxrwxrwx. 1 root root 9 May 22 08:49 ids -> ../dm-606
lrwxrwxrwx. 1 root root 9 May 22 08:49 inbox -> ../dm-607
lrwxrwxrwx. 1 root root 9 May 22 08:49 leases -> ../dm-605
lrwxrwxrwx. 1 root root 9 May 22 08:49 master -> ../dm-608
lrwxrwxrwx. 1 root root 9 May 22 08:49 metadata -> ../dm-603
lrwxrwxrwx. 1 root root 9 May 22 08:49 outbox -> ../dm-604

lvs | grep 05a88a94
  05a88a94-5bd6-4698-be69-39e78c84e1a5 12f68692-2a5a-4e48-af5e-4679bca7fd44 -wi-ao----  14.12g

 
> then indeed, "vda" is the local qcow2 file, and "vda[1]" is the backing
> file on the LV storage.
> 
> Normally, you only care about the write threshold at the active layer
> (the local file, with name "vda"), because that is the only image that
> will normally be allocating sectors.  But in the case of active commit,
> where you are taking the thin-provisioned local file and writing its
> clusters back into the backing LV, the action of commit can allocate
> sectors in the backing file. 

Right

> Thus, libvirt wants to let you set a
> write-threshold on both parts of the backing chain (the active wrapper,
> and the LV backing file), where the event could fire on either node
> first.  The existing libvirt virConnectDomainGetAllStats() can already
> be used to poll allocation growth (the block.N.allocation statistic in
> libvirt, or 'virtual-size' in QMP's 'ImageInfo'), but the event would
> let you drop polling.

Yes, exactly the intent

> However, while starting to code the libvirt side of things, I've hit a
> couple of snags with interacting with the qemu design.  First, the
> 'block-set-write-threshold' command is allowed to set a threshold by
> 'node-name' (any BDS, whether active or backing),

Yes, this emerged during the review of my patch. 
I first took the simplest approach (probably simplistic, in retrospect),
but -IIRC- was pointed out that setting by node-name grants the most
flexible approach, hence was required.

See:
http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg02503.html
http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg02580.html
http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg02831.html

> but libvirt is not yet
> setting 'node-name' for backing files (so even though libvirt knows how
> to resolve "vda[1]" to the backing chain, 

I had vague memories of this, hence my clumsy and poorly worded question
about how to resolve 'vda[1]' before :\

> it does not yet have a way to
> tell qemu to set the threshold on that BDS until libvirt starts naming
> all nodes).  Second, querying for the current threshold value is only
> possible in struct 'BlockDeviceInfo', which is reported as the top-level
> of each disk in 'query-block', and also for 'query-named-block-nodes'.
> However, when it comes to collecting block.N.allocation, libvirt is
> instead getting information from the sub-struct 'ImageInfo', which is
> reported recursively for BlockDeviceInfo in 'query-block' but not
> reported for 'query-named-block-nodes'.

IIRC 'query-named-block-nodes' was the preferred way to extract this
information (see also
http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg02944.html )

>  So it is that much harder to
> call 'query-named-block-nodes' and then correlate that information back
> into the tree of information for anything but the active image. So it
> may be a while before thresholds on "vda[1]" actually work for block
> commit; my initial implementation will just focus on the active image "vda".
> 
> I'm wondering if qemu can make it easier by duplicating threshold
> information into 'ImageInfo' rather than just 'BlockDeviceInfo', so that
> a single call to 'query-block' rather than a second call to
> 'query-named-block-nodes' can scrape the threshold information for every
> BDS in the chain. 

I think I've just not explored this option back in time, just had
vague feeling that was better not to duplicate information, but I can't recall
real solid reason on my side.

Bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani

      reply	other threads:[~2015-05-22  7:24 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <555A4B59.1090305@redhat.com>
     [not found] ` <20150519115209.GA11275@andariel.home>
     [not found]   ` <555B2FA8.3070901@redhat.com>
     [not found]     ` <1590143203.1982207.1432216183349.JavaMail.zimbra@redhat.com>
2015-05-22  4:33       ` [Qemu-devel] [libvirt] RFC: exposing qemu's block-set-write-threshold Eric Blake
2015-05-22  7:24         ` Francesco Romani [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=617435499.2339061.1432279447709.JavaMail.zimbra@redhat.com \
    --to=fromani@redhat.com \
    --cc=eblake@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=nsoffer@redhat.com \
    --cc=pkrempa@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.