Re: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot

* Re: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot
From: Kevin Wolf @ 2022-01-20 11:45 UTC
  To: Pierre Libeau
  Cc: pkrempa, eblake, qemu-devel, qemu-block@nongnu.org, kchamart

On 20.01.2022 at 09:02, Pierre Libeau wrote:
> Hello
> 
> I'm forwarding my question to you because I first posted it to the
> wrong mailing list. Could you give me your opinion, or point me to
> the right people who can help me?
> 
> Thx.
> 
> Pierre
> 
> 
> ________________________________
> From: Qemu-discuss <qemu-discuss-bounces+pierre.libeau=corp.ovh.com@nongnu.org> on behalf of Pierre Libeau <pierre.libeau@ovhcloud.com>
> Sent: Monday, 17 January 2022 08:43
> To: qemu-discuss@nongnu.org
> Subject: Openstack NOVA - Improve the time of file system freeze during live-snapshot
> 
> 
> Hello,
> 
> I'm working on a patch in nova to shorten the file system freeze
> during live-snapshot on an instance with a local disk, and I need
> your opinion about the solution I would like to propose.
> 
> My issue during the live snapshot is the duration of the file system
> freeze on an instance with a big local disk. [1]
>
> In my case the instance has a local disk (400 GB) and the
> qemu-guest-agent is installed.
>
> Nova proceeds like this: [2]
> dev = guest.get_block_device(disk_path)
> 
> 1. guest.freeze_filesystems()
> 2. dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
> 3. while not dev.is_job_complete() # wait for the end of mirroring (the
>    issue is here: the waiting time depends on the size of the disk and
>    the IOPS)
> 4. dev.abort_job()
> 5. guest.thaw_filesystems()
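
To get a feel for why the freeze window matters with this ordering, a
back-of-the-envelope estimate helps. The sketch below assumes the mirror
has to copy the whole 400 GB disk and that the copy sustains a fixed
throughput; the 200 MB/s figure is an illustrative assumption, not a
measurement from this thread.

```python
def copy_seconds(disk_bytes: int, throughput_bytes_per_s: float) -> float:
    """Rough lower bound on how long a full copy of the disk takes."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return disk_bytes / throughput_bytes_per_s


GB = 10**9
MB = 10**6

# 400 GB disk from the report; 200 MB/s is an assumed sustained copy rate.
seconds = copy_seconds(400 * GB, 200 * MB)
print(f"{seconds:.0f} s (~{seconds / 60:.0f} min) with filesystems frozen")
```

With these assumed numbers the guest would stay frozen for roughly half
an hour, because the freeze in step 1 spans the entire copy.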

So first of all, I have to do some translation of terminology which
seems to be different from what I am used to.

dev.rebase with copy=True seems to result in a mirror block job in QEMU?

So what you're calling a snapshot here doesn't seem to be a differential
snapshot (e.g. by adding a COW overlay), but a full copy that results in
two fully independent, standalone images. Is this right?

Adding a bit more context, the whole process seems to be:

1. Create a qcow2 for the copy of the top layer that shares the backing
   file with the active image.

2. Freeze guest filesystems

3. Create a full copy of the active layer (into the new qcow2 file)
    a. Start a mirror job
    b. Wait for the mirror job to move to the READY state
    c. Cancel the mirror job with force=false, i.e. complete the mirror
       job without changing the active image of the VM

4. Thaw the guest filesystems

5. qemu-img convert the copied top layer with its full backing chain
   to a standalone raw image

6. Delete the temporary qcow2 copy
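
Steps 1 and 5 above can be sketched as qemu-img invocations. This is only
an illustration: the paths are hypothetical, the commands are built but
not executed, and Nova's own helpers may pass different options.

```python
import shlex
from typing import List


def create_delta_cmd(delta_path: str, backing_path: str, backing_fmt: str) -> List[str]:
    """Step 1: an empty qcow2 file that shares the active image's backing file."""
    return ["qemu-img", "create", "-f", "qcow2",
            "-b", backing_path, "-F", backing_fmt, delta_path]


def flatten_cmd(delta_path: str, out_path: str) -> List[str]:
    """Step 5: read the copy through its backing chain, write a standalone raw image."""
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", delta_path, out_path]


# Hypothetical paths, for illustration only.
for cmd in (create_delta_cmd("/tmp/disk.delta", "/var/lib/images/base.qcow2", "qcow2"),
            flatten_cmd("/tmp/disk.delta", "/tmp/snapshot.raw")):
    print(shlex.join(cmd))
```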

> My proposal is to move the freeze so that it happens after the
> mirroring has caught up and before the mirror job is stopped. [3] I
> have tried this on an instance, and the last file written on the fs
> corresponds to the end of the mirror.

Yes, you only need the freeze around the mirror job completion, that is,
step 3c above.
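
The reordered sequence could look roughly like the sketch below. It
reuses the method names quoted from Nova earlier in this message; the
function name, the polling interval and the try/finally are assumptions
added for illustration, and handling of a failed job is left out.

```python
import time


def live_copy_with_short_freeze(guest, dev, disk_delta, poll_interval=0.5):
    """Mirror first, freeze only around the job completion (step 3c)."""
    # Start the copy while the guest keeps running normally.
    dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)

    # Wait for the mirror to catch up; guest I/O is NOT frozen here.
    while not dev.is_job_complete():
        time.sleep(poll_interval)

    # Freeze only for the short window in which the job is ended.
    guest.freeze_filesystems()
    try:
        dev.abort_job()
    finally:
        # Always thaw, even if ending the job raised.
        guest.thaw_filesystems()
```

The freeze now covers only the call that ends the job, so its duration no
longer scales with the size of the disk.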

However, the whole process seems very complicated for a rather simple
operation. A comment mentions that the dance with the temporary qcow2
file is because of a (not further specified) bug in QEMU 1.3. I believe
libvirt hasn't supported a QEMU version that old for a while, so is this
really still a valid reason?

But what I would actually have used is a backup block job, which makes
sure that the copy will contain the disk content at the point in time
when the block job was started rather than when it happened to complete.
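
The difference between the two job types can be shown with a toy model.
This is a deliberately simplified sketch in plain Python, not QEMU code:
disks are lists of blocks, and `writes` stands for guest writes that
arrive while the job is running.

```python
def mirror_copy(source, writes):
    """Toy mirror: guest writes during the job are propagated to the target,
    so the target matches the source at the moment the job completes."""
    target = list(source)          # initial bulk copy
    for block, data in writes:     # writes that land while the job runs
        source[block] = data
        target[block] = data       # the mirror keeps the target in sync
    return target


def backup_copy(source, writes):
    """Toy backup: before a guest write overwrites a block that has not been
    copied yet, the old content is saved to the target (copy-before-write),
    so the target matches the source at the moment the job started."""
    target = [None] * len(source)
    for block, data in writes:
        if target[block] is None:
            target[block] = source[block]   # preserve the old content first
        source[block] = data
    # The background copy fills in every block the guest did not touch.
    for block, data in enumerate(source):
        if target[block] is None:
            target[block] = data
    return target


disk = ["a", "b", "c"]
during_job = [(1, "B"), (2, "C")]
print(mirror_copy(list(disk), during_job))   # state at completion
print(backup_copy(list(disk), during_job))   # state at job start
```

In this model the backup result is fixed at the moment the job starts,
which suggests that any freeze would only be needed briefly around the
start of the job rather than around its completion.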

I'm adding a few more people to CC who may have additional comments on
this.

Kevin


* RE: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot
From: Pierre Libeau @ 2022-01-20 14:50 UTC
  To: kwolf@redhat.com
  Cc: pkrempa@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
	qemu-block@nongnu.org, kchamart@redhat.com

About the context:

In my case the file format is raw, but it can also be qcow2.

You are right in your explanation: in nova it's not a "snapshot" but an image of the instance.

The goal is to store this image in glance afterwards, so that it can be used to create a new instance or to rebuild an instance.

You are right, the result of "dev.rebase" is a mirror of the disk.

So my question is: do I break anything if I move "Freeze guest filesystems" (step 2 in your process) to just before "Cancel the mirror job" (step 3c in your process)? I have tested it and it works, but I would prefer to have your opinion.

About your question on whether the QEMU 1.3 bug is still a reason to do it like this, I will check with the NOVA community. I'm a beginner in this area, and your question is a very good one from my point of view.

Pierre

Public Cloud - VPS


* Re: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot
From: Kashyap Chamarthy @ 2022-01-24 17:24 UTC
  To: Kevin Wolf
  Cc: Pierre Libeau, pkrempa, eblake, qemu-devel, qemu-block@nongnu.org

Hi,

(Sorry for the slowness here.)

On Thu, Jan 20, 2022 at 12:45:02PM +0100, Kevin Wolf wrote:
> On 20.01.2022 at 09:02, Pierre Libeau wrote:
 
[...]

> > Hello,
> > 
> > I'm working on a patch in nova to shorten the file system freeze
> > during live-snapshot on an instance with a local disk, and I need
> > your opinion about the solution I would like to propose.
> > 
> > My issue during the live snapshot is the duration of the file system
> > freeze on an instance with a big local disk. [1]
> >
> > In my case the instance has a local disk (400 GB) and the
> > qemu-guest-agent is installed.
> >
> > Nova proceeds like this: [2]
> > dev = guest.get_block_device(disk_path)
> > 
> > 1. guest.freeze_filesystems()
> > 2. dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
> > 3. while not dev.is_job_complete() # wait for the end of mirroring (the
> >    issue is here: the waiting time depends on the size of the disk and
> >    the IOPS)
> > 4. dev.abort_job()
> > 5. guest.thaw_filesystems()
> 
> So first of all, I have to do some translation of terminology which
> seems to be different from what I am used to.

First, here's the API mapping from Nova to QEMU:

  - rebase() refers to a Nova helper method[b]
  - ... which maps to libvirt's blockRebase() API
  - ... which in turn maps to QMP 'block-stream'

And here's the broader QEMU and libvirt block API mapping (Eric/Peter
correct me if I missed something):

  - QEMU 'block-commit' == blockCommit() API in libvirt
  - QEMU 'block-stream' == blockRebase() API in libvirt
  - QEMU 'drive-mirror' / 'blockdev-mirror' == blockCopy() API in libvirt
  - QEMU 'blockdev-backup' == backupBegin() API in libvirt
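
For quick reference, the mapping above can be written as a small lookup
table. This only restates the list; the dictionary and helper names are
made up for illustration.

```python
# QMP command -> libvirt-python method name, as listed above.
QMP_TO_LIBVIRT = {
    "block-commit": "blockCommit",
    "block-stream": "blockRebase",
    "drive-mirror": "blockCopy",
    "blockdev-mirror": "blockCopy",
    "blockdev-backup": "backupBegin",
}


def qmp_commands_for(libvirt_api: str) -> list:
    """Reverse lookup: which QMP commands sit behind a libvirt API."""
    return sorted(cmd for cmd, api in QMP_TO_LIBVIRT.items() if api == libvirt_api)


print(qmp_commands_for("blockCopy"))
```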

> dev.rebase with copy=True seems to result in a mirror block job in QEMU?

A detail: if you _only_ have copy=True, then you're right, it makes a
full copy.  But there's also the "shallow=True" and "reuse_ext=True".
It's worth quoting (at least for me :-)) the official libvirt API
docs[c] of virDomainBlockRebase():

  - "When flags includes VIR_DOMAIN_BLOCK_REBASE_COPY, this starts a
    copy, where base must be the name of a new file to copy the chain
    to. By default, the copy will pull the entire source chain into the
    destination file"

  - ... "but if flags also contains VIR_DOMAIN_BLOCK_REBASE_SHALLOW, then
    only the top of the source chain will be copied (the source and
    destination have a common backing file)"

  - ... VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT means, "reuse an existing file
    which was pre-created with the correct format and metadata and
    sufficient size to hold the copy"

  - ... "In case the VIR_DOMAIN_BLOCK_REBASE_SHALLOW flag is used the
    pre-created file has to exhibit the same guest visible contents as
    the backing file of the original image. This allows a management app
    to pre-create files with relative backing file names, rather than
    the default of absolute backing file names; as a security
    precaution, you should generally only use reuse_ext with the shallow
    flag and a non-raw destination file"
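
As a rough sketch of how the three keyword arguments in the rebase() call
could translate into those flags: the enum below is a stand-in for
libvirt's VIR_DOMAIN_BLOCK_REBASE_* constants with placeholder numeric
values, and the helper name is hypothetical. Real code should use the
constants exported by the libvirt module.

```python
import enum


class RebaseFlag(enum.IntFlag):
    """Stand-ins for libvirt's VIR_DOMAIN_BLOCK_REBASE_* constants.

    The numeric values are placeholders, not libvirt's real values.
    """
    SHALLOW = enum.auto()
    REUSE_EXT = enum.auto()
    COPY = enum.auto()


def rebase_flags(copy=False, reuse_ext=False, shallow=False):
    """Translate Nova-style keyword arguments into a flag bitmask."""
    flags = RebaseFlag(0)
    if copy:
        flags |= RebaseFlag.COPY
    if reuse_ext:
        flags |= RebaseFlag.REUSE_EXT
    if shallow:
        flags |= RebaseFlag.SHALLOW
    return flags


flags = rebase_flags(copy=True, reuse_ext=True, shallow=True)
print(RebaseFlag.SHALLOW in flags, RebaseFlag.COPY in flags)
```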

> So what you're calling a snapshot here doesn't seem to be a differential
> snapshot (e.g. by adding a COW overlay), but a full copy that results in
> two fully independent, standalone images. Is this right?

Correct.  The "live snapshots" in Nova have always been full copies,
I'm afraid.

> Adding a bit more context, the whole process seems to be:
> 
> 1. Create a qcow2 for the copy of the top layer that shares the backing
>    file with the active image.
> 
> 2. Freeze guest filesystems
> 
> 3. Create a full copy of the active layer (into the new qcow2 file)
>     a. Start a mirror job

As noted above, blockRebase() maps to a stream job by default.  With the
copy flag set, though, libvirt runs it as a block copy, which is a
mirror job in QEMU, so "mirror" fits here.

>     b. Wait for the mirror job to move to the READY state
>     c. Cancel the mirror job with force=false, i.e. complete the mirror
>        job without changing the active image of the VM

Yeah, the "full copy of the active layer" is what libvirt calls a
"shallow copy" -- shallow=True in the rebase() call above.

> 4. Thaw the guest filesystems
> 
> 5. qemu-img convert the copied top layer with its full backing chain
>    to a standalone raw image
> 
> 6. Delete the temporary qcow2 copy
> 
> > My proposal is to move the freeze so that it happens after the
> > mirroring has caught up and before the mirror job is stopped. [3] I
> > have tried this on an instance, and the last file written on the fs
> > corresponds to the end of the mirror.
> 
> Yes, you only need the freeze around the mirror job completion, that is,
> step 3c above.

Thanks for confirming; I always forget these freeze semantics.

> However, the whole process seems very complicated for a rather simple
> operation. A comment mentions that the dance with the temporary qcow2
> file is because of a (not further specified) bug in QEMU 1.3. I believe
> libvirt hasn't supported a QEMU version that old for a while, so is this
> really still a valid reason?

You're right -- you spotted code-rot in Nova here; the QEMU 1.3
code-comment gives it away (although it doesn't say what the bug was).
That part[a] of the Nova code in the _live_snapshot() method can be
refactored to use newer libvirt/QEMU APIs.

That said, some of the "undefine the guest XML and then redefine it
later" dance is because blockRebase() doesn't have a way to restart a
copy job on guest restart while mirroring is still intact.  So the trick
when using libvirt's blockRebase() for a copy-job is to temporarily make
the domain "transient" (the guest.delete_configuration() ...
host.write_instance_config() calls in Nova).
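
That "transient domain" trick can be sketched as a context manager. The
two method names come from the Nova calls mentioned above; their exact
signatures, the `xml` argument and the helper name are assumptions made
for illustration.

```python
from contextlib import contextmanager


@contextmanager
def transient_domain(guest, host, xml):
    """Temporarily drop the persistent definition, then restore it."""
    guest.delete_configuration()          # the domain becomes transient
    try:
        yield
    finally:
        host.write_instance_config(xml)   # redefine it, even on error


# Intended use (objects supplied by the caller):
#
#     with transient_domain(guest, host, xml):
#         dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
#         ...
```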

However, the blockCopy() API has a _TRANSIENT_JOB flag
(VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB) that works around this limitation
of blockRebase().

Overall, wherever Nova can, it should completely replace its use of
blockRebase() with one of the following APIs:

    - virDomainBlockCopy() -- blockCopy() -- this is already used by
      Nova today, but not consistently
    - virDomainBackupBegin() -- backupBegin()
    - virDomainBackupGetXMLDesc() -- backupGetXMLDesc()
    - virDomainCheckpointCreateXML() -- checkpointCreateXML()
    - virDomainCheckpointDelete()

> But what I would actually have used is a backup block job, which makes
> sure that the copy will contain the disk content at the point in time
> when the block job was started rather than when it happened to complete.

I agree, I'd prefer that too for the long term -- using the backup APIs
above.  I _think_ Pierre can get his problem solved with libvirt's
blockCopy() API.  Pierre, Nova has a wrapper for it, look at the usage
of the copy() wrapper method[d] in Nova.
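
As a rough sketch of what a direct blockCopy() call involves (not Nova's
wrapper): the destination is described by a small <disk> XML fragment and
the behaviour is selected through flags. The helper names and the path
are hypothetical, and `dom` is assumed to behave like a libvirt virDomain
object.

```python
import xml.etree.ElementTree as ET


def copy_dest_xml(dest_path, fmt="qcow2"):
    """Build the <disk> element describing the copy destination."""
    disk = ET.Element("disk", type="file")
    ET.SubElement(disk, "source", file=dest_path)
    ET.SubElement(disk, "driver", type=fmt)
    return ET.tostring(disk, encoding="unicode")


def start_block_copy(dom, disk, dest_path, flags=0):
    """Start a copy job on `disk`, writing to `dest_path`."""
    # `flags` would be an OR of libvirt.VIR_DOMAIN_BLOCK_COPY_* values, e.g.
    # SHALLOW, REUSE_EXT and TRANSIENT_JOB; they are passed in by the caller
    # so that this sketch does not need libvirt installed.
    return dom.blockCopy(disk, copy_dest_xml(dest_path), flags=flags)


print(copy_dest_xml("/tmp/disk.copy"))
```

Passing the transient-job flag here would avoid having to undefine and
redefine the domain around the copy.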

[...]


[a] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3166,L3190
[b] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L745,L767
[c] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockRebase
[d] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L729,#L743


-- 
/kashyap


