* [raw] Guest stuck during live migration
@ 2020-11-23  9:36 Quentin Grolleau
  2020-11-23 12:25 ` Kevin Wolf
  2020-12-15  1:46 ` Wei Wang
  0 siblings, 2 replies; 7+ messages in thread
From: Quentin Grolleau @ 2020-11-23  9:36 UTC (permalink / raw)
  To: qemu-devel@nongnu.org


Hello,

In our company we host a large number of VMs behind OpenStack (so libvirt/QEMU).
The large majority of our VMs run with local storage only, on NVMe, and most of them use raw disk images.

With QEMU 4.0 (and possibly older versions as well) we see strange live-migration behaviour:
    - some VMs live-migrate at very high speed without issue (> 6 Gbps)
    - some VMs run correctly but migrate at a strangely low speed (3 Gbps)
    - some VMs migrate at a very low speed (1 Gbps, sometimes less) and during the migration the guest is completely stuck on I/O

When this issue happens the VM is completely blocked; iostat inside the VM shows latencies of 30 seconds.

First we thought it was related to a hardware issue, so we checked and compared different hardware, but no problem was found there.

So one of my colleagues had the idea to limit, with "tc", the bandwidth on the interface the migration goes through, and it worked: the VM didn't lose a single ping nor get stuck on I/O (see the example command below).
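
For reference, the throttling was along these lines (the interface name and the exact rate here are placeholders, not necessarily the values we used):

    tc qdisc add dev eth0 root tbf rate 2gbit burst 64mb latency 400ms
    # ... run the live-migration, then remove the limit ...
    tc qdisc del dev eth0 root
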
Important point: once the VM has been migrated (with the limitation) one time, if we migrate it again right after, the migration runs at full speed (8-9 Gbps) without freezing the VM.

It only happens on existing VMs; we tried to reproduce it with a fresh instance with exactly the same specs and nothing happened.

We tried to replicate the workload inside the VM, but there was no way to reproduce the case. So it was related neither to the workload nor to the server hosting the VM.

So we turned to the disk of the instance: the raw file.

We also ran strace -c on the QEMU process during the live-migration and it was doing a lot of lseek() calls.
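
(Something along these lines, where <qemu-pid> is a placeholder for the PID of the emulator process of that guest:

    strace -c -f -p <qemu-pid>

attached for the duration of the migration; the -c summary is where the large lseek count showed up.)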

and we found this thread:
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00462.html


So I rebuilt QEMU with this patch and the live-migration went well, at high speed and with no VM freeze
( https://github.com/qemu/qemu/blob/master/block/file-posix.c#L2601 )
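
For context, my understanding of the code around that line (find_allocation() / raw_co_block_status()) is that QEMU probes the raw file with lseek(SEEK_DATA)/lseek(SEEK_HOLE) to ask the filesystem where the data and the holes are before deciding what to transfer. A very simplified sketch of that probing, just to illustrate the mechanism (this is not the actual QEMU code):

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <errno.h>

    /* Rough illustration only: classify the region starting at 'start'.
     * Returns 1 if it starts with data, 0 if it starts with a hole,
     * a negative errno on failure; '*next' is where that region ends. */
    static int classify_region(int fd, off_t start, off_t *next)
    {
        off_t data = lseek(fd, start, SEEK_DATA);

        if (data < 0) {
            if (errno == ENXIO) {
                *next = lseek(fd, 0, SEEK_END);   /* hole up to EOF */
                return 0;
            }
            return -errno;
        }
        if (data > start) {
            *next = data;                         /* [start, data) is a hole */
            return 0;
        }
        *next = lseek(fd, start, SEEK_HOLE);      /* [start, *next) is data */
        return 1;
    }

On a large raw file this probing is repeated for every chunk the migration examines, which is presumably where the time goes.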

Do you have a way to avoid the lseek() mechanism, as it consumes so many resources finding the holes in the disk that it doesn't leave any for the VM?


Servers hosting the VMs:
    - dual-Xeon hosts with NVMe storage and 10 Gb network cards
    - QEMU 4.0 and libvirt 5.4
    - Kernel 4.18.0.25

Guest having the issue:
    - raw image with Debian 8

Here is the qemu-img info output for the disk:
> qemu-img info disk
image: disk
file format: raw
virtual size: 400G (429496729600 bytes)
disk size: 400G


Quentin GROLLEAU


