From: Kashyap Chamarthy <kchamart@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Pierre Libeau <pierre.libeau@ovhcloud.com>,
pkrempa@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Subject: Re: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot
Date: Mon, 24 Jan 2022 18:24:38 +0100
Message-ID: <Ye7g1ilCtJPVw7M9@paraplu>
In-Reply-To: <YelLPjw7Qliknhhb@redhat.com>
Hi,
(Sorry for the slowness here.)
On Thu, Jan 20, 2022 at 12:45:02PM +0100, Kevin Wolf wrote:
> On 20.01.2022 at 09:02, Pierre Libeau wrote:
[...]
> > Hello,
> >
> > I'm working on a patch in nova to improve the time of file system
> > freeze during live-snapshot on an instance with a local disk and I
> > need your opinion about the solution I would propose.
> >
> > My issue during the live migration is the duration of file system
> > freeze on an instance with a big local disk. [1]
> >
> > In my case, the instance has a local disk (400 GB) and the
> > qemu-guest-agent is installed.
> >
> > Nova proceeds like this: [2]
> > dev = guest.get_block_device(disk_path)
> >
> > 1. guest.freeze_filesystems()
> > 2. dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
> > 3. while not dev.is_job_complete() #wait for the end of mirroring (the
> > issue is here, the waiting time depend on the size of the disk and
> > the IOPS)
> > 4. dev.abort_job()
> > 5. guest.thaw_filesystems()
>
> So first of all, I have to do some translation of terminology which
> seems to be different from what I am used to.
First, here's the API mapping from Nova to QEMU:
- rebase() refers to a Nova helper[b] method
- ... which maps to libvirt's blockRebase() API
- ... which in turn maps to QMP 'block-stream'
And here's the broader QEMU and libvirt block API mapping (Eric/Peter
correct me if I missed something):
- QEMU 'block-commit' == blockCommit() API in libvirt
- QEMU 'block-stream' == blockRebase() API in libvirt
- QEMU 'drive-mirror' / 'blockdev-mirror' == blockCopy() API in libvirt
- QEMU 'blockdev-backup' == backupBegin() API in libvirt
> dev.rebase with copy=True seems to result in a mirror block job in QEMU?
A detail: if you _only_ have copy=True, then you're right, it makes a
full copy. But there's also the "shallow=True" and "reuse_ext=True".
It's worth quoting (at least for me :-)) the official libvirt API
docs[c] of virDomainBlockRebase():
- "When flags includes VIR_DOMAIN_BLOCK_REBASE_COPY, this starts a
copy, where base must be the name of a new file to copy the chain
to. By default, the copy will pull the entire source chain into the
destination file"
- ... "but if flags also contains VIR_DOMAIN_BLOCK_REBASE_SHALLOW, then
only the top of the source chain will be copied (the source and
destination have a common backing file)"
- ... VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT means, "reuse an existing file
which was pre-created with the correct format and metadata and
sufficient size to hold the copy"
- ... "In case the VIR_DOMAIN_BLOCK_REBASE_SHALLOW flag is used the
pre-created file has to exhibit the same guest visible contents as
the backing file of the original image. This allows a management app
to pre-create files with relative backing file names, rather than
the default of absolute backing file names; as a security
precaution, you should generally only use reuse_ext with the shallow
flag and a non-raw destination file"
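To make the flag combination concrete, here is a minimal sketch of how
the three keyword arguments from the rebase() call could be folded into
one bitmask. The BlockRebaseFlags enum and the rebase_flags() helper are
hypothetical names for illustration only; the numeric values are assumed
to mirror libvirt's VIR_DOMAIN_BLOCK_REBASE_* constants, and real code
should take them from the libvirt module rather than hard-code them.

```python
import enum


class BlockRebaseFlags(enum.IntFlag):
    # Assumption: these values mirror libvirt's virDomainBlockRebaseFlags.
    # Real code should use libvirt.VIR_DOMAIN_BLOCK_REBASE_* instead.
    SHALLOW = 1 << 0    # copy only the top of the source chain
    REUSE_EXT = 1 << 1  # reuse a pre-created destination file
    COPY = 1 << 3       # start a copy job into a new file


def rebase_flags(copy=False, reuse_ext=False, shallow=False):
    """Translate Nova-style keyword arguments into a flags bitmask."""
    flags = BlockRebaseFlags(0)
    if copy:
        flags |= BlockRebaseFlags.COPY
    if reuse_ext:
        flags |= BlockRebaseFlags.REUSE_EXT
    if shallow:
        flags |= BlockRebaseFlags.SHALLOW
    return flags


# The combination used by the live-snapshot code quoted earlier.
flags = rebase_flags(copy=True, reuse_ext=True, shallow=True)
```

With all three set, the copy is limited to the top layer and written
into the pre-created qcow2 file that shares the original backing file.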
> So what you're calling a snapshot here doesn't seem to be a differential
> snapshot (e.g. by adding a COW overlay), but a full copy that results in
> two fully independent, standalone images. Is this right?
Correct. The "live snapshot" in Nova has always been a full copy, I'm
afraid.
> Adding a bit more context, the whole process seems to be:
>
> 1. Create a qcow2 for the copy of the top layer that shares the backing
> file with the active image.
>
> 2. Freeze guest filesystems
>
> 3. Create a full copy of the active layer (into the new qcow2 file)
> a. Start a mirror job
As noted above, it's a stream job. (Assuming libvirt's blockRebase() is
still calling stream under the hood.)
> b. Wait for the mirror job to move to the READY state
> c. Cancel the mirror job with force=false, i.e. complete the mirror
> job without changing the active image of the VM
Yeah, the "full copy of the active layer" is what libvirt calls a
"shallow copy" -- shallow=True in the rebase() call above.
> 4. Thaw the guest filesystems
>
> 5. qemu-img convert the copied top layer with its full backing chain
> to a standalone raw image
>
> 6. Delete the temporary qcow2 copy
>
> > My proposition is to move the freeze after the end of mirroring and
> > before the stop of mirroring. [3] I have tried on an instance and the
> > last written file on the fs corresponds to the end of the mirror.
>
> Yes, you only need the freeze around the mirror job completion, that is,
> step 3c above.
Thanks for confirming; I always forget these freeze semantics.
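For reference, a rough sketch of the reordered sequence: start the copy,
wait for it to finish while the guest keeps running, and only then
freeze around the job completion. The method names are the ones from
Pierre's listing; FakeGuest, FakeBlockDevice, frozen() and
live_snapshot() are hypothetical stand-ins that merely record the call
order, not Nova's actual classes.

```python
import contextlib


class FakeGuest:
    """Hypothetical stand-in for Nova's guest object; records calls only."""

    def __init__(self, log):
        self.log = log

    def freeze_filesystems(self):
        self.log.append("freeze")

    def thaw_filesystems(self):
        self.log.append("thaw")


class FakeBlockDevice:
    """Hypothetical stand-in for Nova's block device wrapper."""

    def __init__(self, log, polls_until_done=3):
        self.log = log
        self.remaining = polls_until_done

    def rebase(self, base, copy=False, reuse_ext=False, shallow=False):
        self.log.append("rebase")

    def is_job_complete(self):
        # Pretend the copy finishes after a few polls.
        self.remaining -= 1
        return self.remaining <= 0

    def abort_job(self):
        self.log.append("abort")


@contextlib.contextmanager
def frozen(guest):
    """Freeze guest filesystems and always thaw them, even on error."""
    guest.freeze_filesystems()
    try:
        yield
    finally:
        guest.thaw_filesystems()


def live_snapshot(guest, dev, disk_delta, wait=lambda: None):
    dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
    # The long wait happens while the guest is still running normally.
    while not dev.is_job_complete():
        wait()  # real code would sleep between polls
    # The freeze now only covers the short job-completion window.
    with frozen(guest):
        dev.abort_job()


log = []
live_snapshot(FakeGuest(log), FakeBlockDevice(log), "/tmp/disk.delta")
```

The context manager is a design choice in this sketch, so that a
failure in abort_job() cannot leave the guest filesystems frozen.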
> However, the whole process seems very complicated for a rather simple
> operation. A comment mentions that the dance with the temporary qcow2
> file is because of a (not further specified) bug in QEMU 1.3. I believe,
> libvirt hasn't supported a QEMU version that old for a while, so is this
> really still a valid reason?
You're right -- you spotted code-rot in Nova here; the QEMU 1.3
code-comment gives it away (although it doesn't tell what the bug was).
That part[a] of the Nova code in the _live_snapshot() method can be
refactored to use newer libvirt/QEMU APIs.
That said, some of the "undefine a guest XML and then redefine it later"
dance is because blockRebase() doesn't have a way to restart a copy job
on guest restart while mirroring is still intact. So the trick when
using libvirt's blockRebase() for a copy-job is to temporarily make the
domain "transient" (the guest.delete_configuration() ...
host.write_instance_config() calls in Nova).
However, the blockCopy() API has a _TRANSIENT_JOB flag that works
around this limitation of blockRebase().
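As an illustration only, a shallow copy through blockCopy() with the
transient-job flag might look roughly like the sketch below. The
dest_disk_xml() and start_shallow_copy() helpers and the FakeDomain
class are hypothetical, and the flag values are assumed to mirror
libvirt's VIR_DOMAIN_BLOCK_COPY_* constants; real code should import
them from the libvirt module and pass a real virDomain object.

```python
import xml.etree.ElementTree as ET

# Assumption: these mirror libvirt's VIR_DOMAIN_BLOCK_COPY_* values.
BLOCK_COPY_SHALLOW = 1 << 0
BLOCK_COPY_REUSE_EXT = 1 << 1
BLOCK_COPY_TRANSIENT_JOB = 1 << 2


def dest_disk_xml(path, fmt="qcow2"):
    """Build the <disk> XML that describes the copy destination."""
    disk = ET.Element("disk", type="file")
    ET.SubElement(disk, "source", file=path)
    ET.SubElement(disk, "driver", type=fmt)
    return ET.tostring(disk, encoding="unicode")


def start_shallow_copy(dom, disk, dest_path):
    """Start a shallow copy job into a pre-created destination file."""
    flags = (BLOCK_COPY_SHALLOW
             | BLOCK_COPY_REUSE_EXT
             | BLOCK_COPY_TRANSIENT_JOB)
    return dom.blockCopy(disk, dest_disk_xml(dest_path), flags=flags)


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain; records the call."""

    def __init__(self):
        self.calls = []

    def blockCopy(self, disk, destxml, params=None, flags=0):
        self.calls.append((disk, destxml, flags))
        return 0


dom = FakeDomain()
start_shallow_copy(dom, "vda", "/tmp/disk.delta")
```

With the transient-job flag the caller accepts that the job will not
survive a guest restart, which is why the "make the domain transient"
step should no longer be needed.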
Overall, wherever Nova can, it should completely replace
blockRebase() usage with one of the following APIs:
- virDomainBlockCopy() -- blockCopy() -- this is already used by
Nova today; but not consistently
- virDomainBackupBegin() -- backupBegin()
- virDomainBackupGetXMLDesc() -- backupGetXMLDesc()
- virDomainCheckpointCreateXML() -- checkpointCreateXML()
- virDomainCheckpointDelete()
> But what I would actually have used is a backup block job, which makes
> sure that the copy will contain the disk content at the point of time
> when the block job was started rather than when it happened to complete.
I agree, I'd prefer that too for the long term -- using the backup APIs
above. I _think_ Pierre can get his problem solved with libvirt's
blockCopy() API. Pierre, Nova has a wrapper for it, look at the usage
of the copy() wrapper method[d] in Nova.
[...]
[a] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3166,L3190
[b] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L745,L767
[c] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockRebase
[d] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L729,#L743
--
/kashyap