Re: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot


From: Kashyap Chamarthy <kchamart@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Pierre Libeau <pierre.libeau@ovhcloud.com>,
	pkrempa@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Subject: Re: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot
Date: Mon, 24 Jan 2022 18:24:38 +0100
Message-ID: <Ye7g1ilCtJPVw7M9@paraplu>
In-Reply-To: <YelLPjw7Qliknhhb@redhat.com>

Hi,

(Sorry for the slowness here.)

On Thu, Jan 20, 2022 at 12:45:02PM +0100, Kevin Wolf wrote:
> On 20.01.2022 at 09:02, Pierre Libeau wrote:
 
[...]

> > Hello,
> > 
> > I'm working on a patch in nova to improve the time of file system
> > freeze during live-snapshot on an instance with a local disk and I
> > need your opinion about the solution I would propose.
> > 
> > My issue during the live migration is the duration of file system
> > freeze on an instance with a big local disk. [1]
> >
> > In my case instance have locally a disk (400Go) and the
> > qemu-guest-agent is installed.
> >
> > Nova process like that: [2]
> > dev = guest.get_block_device(disk_path)
> > 
> > 1. guest.freeze_filesystems()
> > 2. dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
> > 3. while not dev.is_job_complete() #wait for the end of mirroring (the
> >    issue is here, the waiting time depend on the size of the disk and
> >    the IOPS)
> > 4. dev.abort_job()
> > 5. guest.thaw_filesystems()
> 
> So first of all, I have to do some translation of terminology which
> seems to be different from what I am used to.

First, here's the API mapping from Nova to QEMU:

  - rebase() refers to a Nova helper[b] method
  - ... which maps to libvirt's blockRebase() API
  - ... which in turn maps to QMP 'block-stream'

And here's the broader QEMU and libvirt block API mapping (Eric/Peter
correct me if I missed something):

  - QEMU 'block-commit' == blockCommit() API in libvirt
  - QEMU 'block-stream' == blockRebase() API in libvirt
  - QEMU 'drive-mirror' / 'blockdev-mirror' == blockCopy() API in libvirt
  - QEMU 'blockdev-backup' == backupBegin() API in libvirt

> dev.rebase with copy=True seems to result in a mirror block job in QEMU?

A detail: if you _only_ have copy=True, then you're right, it makes a
full copy.  But there's also the "shallow=True" and "reuse_ext=True".
It's worth quoting (at least for me :-)) the official libvirt API
docs[c] of virDomainBlockRebase():

  - "When flags includes VIR_DOMAIN_BLOCK_REBASE_COPY, this starts a
    copy, where base must be the name of a new file to copy the chain
    to. By default, the copy will pull the entire source chain into the
    destination file"

  - ... "but if flags also contains VIR_DOMAIN_BLOCK_REBASE_SHALLOW, then
    only the top of the source chain will be copied (the source and
    destination have a common backing file)"

  - ... VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT means, "reuse an existing file
    which was pre-created with the correct format and metadata and
    sufficient size to hold the copy"

  - ... "In case the VIR_DOMAIN_BLOCK_REBASE_SHALLOW flag is used the
    pre-created file has to exhibit the same guest visible contents as
    the backing file of the original image. This allows a management app
    to pre-create files with relative backing file names, rather than
    the default of absolute backing file names; as a security
    precaution, you should generally only use reuse_ext with the shallow
    flag and a non-raw destination file"
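
As a rough illustration of how the three booleans in Nova's rebase()
call fold into one libvirt flags value, here is a minimal sketch.  The
RebaseFlag enum and the rebase_flags() helper are hypothetical
stand-ins; the numeric values are assumed to mirror the
VIR_DOMAIN_BLOCK_REBASE_* constants in libvirt's header, and real code
would take them from the libvirt Python module instead.

```python
from enum import IntFlag


class RebaseFlag(IntFlag):
    # Stand-ins for libvirt.VIR_DOMAIN_BLOCK_REBASE_*; the values are
    # assumed to match libvirt's header. Use the real constants in
    # production code.
    SHALLOW = 1 << 0    # copy only the top of the chain
    REUSE_EXT = 1 << 1  # reuse a pre-created destination file
    COPY = 1 << 3       # start a copy job instead of a plain rebase


def rebase_flags(copy=False, reuse_ext=False, shallow=False):
    """Fold the keyword booleans into a single flags value."""
    flags = RebaseFlag(0)
    if copy:
        flags |= RebaseFlag.COPY
    if reuse_ext:
        flags |= RebaseFlag.REUSE_EXT
    if shallow:
        flags |= RebaseFlag.SHALLOW
    return flags


# The combination used in the Nova call quoted earlier in this thread.
flags = rebase_flags(copy=True, reuse_ext=True, shallow=True)
print(int(flags))
```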

> So what you're calling a snapshot here doesn't seem to be a differential
> snapshot (e.g. by adding a COW overlay), but a full copy that results in
> two fully independent, standalone images. Is this right?

Correct.  The "live snapshot" in Nova has always produced full copies,
I'm afraid.

> Adding a bit more context, the whole process seems to be:
> 
> 1. Create a qcow2 for the copy of the top layer that shares the backing
>    file with the active image.
> 
> 2. Freeze guest filesystems
> 
> 3. Create a full copy of the active layer (into the new qcow2 file)
>     a. Start a mirror job

As noted above, it's a stream job.  (Assuming libvirt's blockRebase() is
still calling stream under the hood.)

>     b. Wait for the mirror job to move to the READY state
>     c. Cancel the mirror job with force=false, i.e. complete the mirror
>        job without changing the active image of the VM

Yeah, the "full copy of the active layer" is what libvirt calls a
"shallow copy" -- shallow=True in the rebase() call above.

> 4. Thaw the guest filesystems
> 
> 5. qemu-img convert the copied top layer with its full backing chain
>    to a standalone raw image
> 
> 6. Delete the temporary qcow2 copy
> 
> > My proposition is to move the freeze after the end of mirroring and
> > before the stop of mirroring. [3] I have tried on an instance and the
> > last written file on the fs corresponds to the end of the mirror.
> 
> Yes, you only need the freeze around the mirror job completion, that is,
> step 3c above.

Thanks for confirming; I always forget these freeze semantics.
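
To make the reordering concrete, here is a minimal sketch of the
sequence with the freeze moved to just before the job is ended.  The
method names on guest and dev are the ones from Pierre's outline; the
live_snapshot() function itself and its poll_interval parameter are
hypothetical, and the error handling is deliberately simplified.

```python
import time


def live_snapshot(guest, dev, disk_delta, poll_interval=0.5):
    """Start the copy first; freeze only around job completion."""
    # 1. Start the copy while the guest keeps running normally.
    dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)

    # 2. Wait for the bulk copy; the guest is NOT frozen here, so the
    #    wait no longer counts towards the freeze window.
    while not dev.is_job_complete():
        time.sleep(poll_interval)

    # 3. Freeze only for the short window in which the job is ended.
    guest.freeze_filesystems()
    try:
        dev.abort_job()
    finally:
        # Always thaw, even if ending the job raised an error.
        guest.thaw_filesystems()
```

With this ordering the freeze should last roughly as long as the
abort_job() call, instead of scaling with disk size and IOPS.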

> However, the whole process seems very complicated for a rather simple
> operation. A comment mentions that the dance with the temporary qcow2
> file is because of a (not further specified) bug in QEMU 1.3. I believe,
> libvirt hasn't supported a QEMU version that old for a while, so is this
> really still a valid reason?

You're right -- you spotted code-rot in Nova here; the QEMU 1.3
code-comment gives it away (although it doesn't tell what the bug was).
That part[a] of the Nova code in the _live_snapshot() method can be
refactored to use newer libvirt/QEMU APIs.

That said, some of the "undefine a guest XML and then redefine it later"
dance is because blockRebase() doesn't have a way to restart a copy job
on guest restart while mirroring is still intact.  So the trick when
using libvirt's blockRebase() for a copy-job is to temporarily make the
domain "transient" (the guest.delete_configuration() ...
host.write_instance_config() calls in Nova).

However, the blockCopy() API has a _TRANSIENT_JOB flag that works around
this limitation of blockRebase().
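
A minimal sketch of what such a blockCopy() call could look like from
Python follows.  The helper names dest_disk_xml() and start_copy() are
hypothetical; dom is assumed to be a libvirt virDomain object and
libvirt_mod the imported libvirt module (passed in so the sketch can be
exercised without a hypervisor).  The disk name and destination path in
any caller are placeholders.

```python
import xml.etree.ElementTree as ET


def dest_disk_xml(path, fmt="qcow2"):
    """Build the destination <disk> XML that blockCopy() expects."""
    disk = ET.Element("disk", type="file")
    ET.SubElement(disk, "source", file=path)
    ET.SubElement(disk, "driver", type=fmt)
    return ET.tostring(disk, encoding="unicode")


def start_copy(dom, libvirt_mod, disk, dest_path):
    """Start a shallow copy into a pre-created file as a transient job."""
    flags = (
        libvirt_mod.VIR_DOMAIN_BLOCK_COPY_SHALLOW
        | libvirt_mod.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT
        # Lets the job run on a persistent domain, so the
        # undefine/redefine dance should not be needed.
        | libvirt_mod.VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB
    )
    return dom.blockCopy(disk, dest_disk_xml(dest_path), flags=flags)
```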

Overall, wherever Nova can, it should completely replace
blockRebase() usage with one of the following APIs:

    - virDomainBlockCopy() -- blockCopy() -- this is already used by
      Nova today; but not consistently
    - virDomainBackupBegin() -- backupBegin()
    - virDomainBackupGetXMLDesc() -- backupGetXMLDesc()
    - virDomainCheckpointCreateXML() -- checkpointCreateXML()
    - virDomainCheckpointDelete()

> But what I would actually have used is a backup block job, which makes
> sure that the copy will contain the disk content at the point of time
> when the block job was started rather than when it happened to complete.

I agree, I'd prefer that too for the long term -- using the backup APIs
above.  I _think_ Pierre can get his problem solved with libvirt's
blockCopy() API.  Pierre, Nova has a wrapper for it, look at the usage
of the copy() wrapper method[d] in Nova.
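
For the longer-term backup route, a push-mode backup description could
be built roughly as below.  This is only a sketch: push_backup_xml() is
a hypothetical helper, the disk name and target path are placeholders,
and the element layout follows my reading of libvirt's backup XML
format, so please check it against the libvirt docs before relying on
it.

```python
import xml.etree.ElementTree as ET


def push_backup_xml(disk_name, target_path, fmt="qcow2"):
    """Build a push-mode <domainbackup> document for a single disk."""
    root = ET.Element("domainbackup", mode="push")
    disks = ET.SubElement(root, "disks")
    disk = ET.SubElement(
        disks, "disk", name=disk_name, backup="yes", type="file"
    )
    ET.SubElement(disk, "driver", type=fmt)
    ET.SubElement(disk, "target", file=target_path)
    return ET.tostring(root, encoding="unicode")


backup_xml = push_backup_xml("vda", "/var/tmp/vda-backup.qcow2")
print(backup_xml)
# With libvirt-python this string would be handed to something like
# dom.backupBegin(backup_xml, None); treat that call shape as an
# assumption and check the bindings you actually have.
```

Since the backup captures the disk content as of job start, the freeze
would presumably only be needed around starting the job.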

[...]


[a] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3166,L3190
[b] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L745,L767
[c] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockRebase
[d] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L729,#L743


-- 
/kashyap

