From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
To: qemu-devel@nongnu.org
Cc: "Steve Sistare" <steven.sistare@oracle.com>,
"William Roche" <william.roche@oracle.com>,
"Gerd Hoffmann" <kraxel@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Denis V. Lunev" <den@virtuozzo.com>,
andrey.drobyshev@virtuozzo.com
Subject: [BUG, RFC] cpr-transfer: qxl guest driver crashes after migration
Date: Fri, 28 Feb 2025 19:39:57 +0200
Message-ID: <78309320-f19e-4a06-acfa-bc66cbc81bd7@virtuozzo.com>
Hi all,
We've been experimenting with cpr-transfer migration mode recently and
have discovered the following issue with the guest QXL driver:
Run migration source:
> EMULATOR=/path/to/emulator
> ROOTFS=/path/to/image
> QMPSOCK=/var/run/alma8qmp-src.sock
>
> $EMULATOR -enable-kvm \
> -machine q35 \
> -cpu host -smp 2 -m 2G \
> -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on \
> -machine memory-backend=ram0 \
> -machine aux-ram-share=on \
> -drive file=$ROOTFS,media=disk,if=virtio \
> -qmp unix:$QMPSOCK,server=on,wait=off \
> -nographic \
> -device qxl-vga
Run migration target:
> EMULATOR=/path/to/emulator
> ROOTFS=/path/to/image
> QMPSOCK=/var/run/alma8qmp-dst.sock
>
> $EMULATOR -enable-kvm \
> -machine q35 \
> -cpu host -smp 2 -m 2G \
> -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on \
> -machine memory-backend=ram0 \
> -machine aux-ram-share=on \
> -drive file=$ROOTFS,media=disk,if=virtio \
> -qmp unix:$QMPSOCK,server=on,wait=off \
> -nographic \
> -device qxl-vga \
> -incoming tcp:0:44444 \
> -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
Launch the migration:
> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
> QMPSOCK=/var/run/alma8qmp-src.sock
>
> $QMPSHELL -p $QMPSOCK <<EOF
> migrate-set-parameters mode=cpr-transfer
> migrate channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}]
> EOF
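The TCP port and the CPR socket path appear both on the target command line and in the migrate command, so a mismatch between the two is easy to introduce. One way to keep them in sync is to build the channels argument from shell variables. A minimal sketch, assuming a hypothetical helper named build_channels (it is not part of QEMU or qmp-shell) and the same port and socket path as above:

```shell
#!/bin/bash
# Hypothetical helper (not part of QEMU): print the `channels` argument for
# the qmp-shell `migrate` command from a TCP port and a CPR socket path.
build_channels() {
    local port=$1 cpr_sock=$2
    # "main" channel: TCP, same host/port the target listens on via -incoming
    printf '[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"%s"}},' "$port"
    # "cpr" channel: UNIX socket, same path as in the target's second -incoming
    printf '{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"%s"}}]\n' "$cpr_sock"
}

PORT=44444
CPR_SOCK=/var/run/alma8cpr-dst.sock
CHANNELS=$(build_channels "$PORT" "$CPR_SOCK")

# Print the command line that would be fed to qmp-shell.
echo "migrate channels=$CHANNELS"
```

Because the here-document above uses an unquoted EOF delimiter, a line such as `migrate channels=$CHANNELS` inside it would be expanded by the shell before qmp-shell reads it, and the target could use `-incoming tcp:0:$PORT` with the same variable.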
Then, after a while, the QXL guest driver on the target crashes, spewing
the following messages:
> [ 73.962002] [TTM] Buffer eviction failed
> [ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
> [ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
That seems to be a known kernel QXL driver bug:
https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
(the latter discussion contains a reproducer script that speeds up
the crash in the guest):
> #!/bin/bash
>
> chvt 3
>
> for j in $(seq 80); do
>     echo "$(date) starting round $j"
>     if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ]; then
>         echo "bug was reproduced after $j tries"
>         exit 1
>     fi
>     for i in $(seq 100); do
>         dmesg > /dev/tty3
>     done
> done
>
> echo "bug could not be reproduced"
> exit 0
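The check that the script performs on each round can also be factored out, so that the same test can be run against a log captured from either the source or the target guest. A minimal sketch, assuming a hypothetical helper named has_qxl_vram_failure that reads a kernel log on stdin; the sample line is one of the messages quoted earlier:

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if the kernel log read from
# stdin contains the QXL VRAM allocation failure message.
has_qxl_vram_failure() {
    grep -q "failed to allocate VRAM BO"
}

# Example with a captured log line; inside a guest, pipe in the output of
# `dmesg` or `journalctl --boot` instead.
sample='[ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO'
if printf '%s\n' "$sample" | has_qxl_vram_failure; then
    echo "QXL VRAM allocation failure found"
else
    echo "no QXL VRAM allocation failure"
fi
```

Reading from stdin keeps the helper independent of where the log comes from, which makes it usable on saved logs as well as on a live guest.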
The bug itself seems to remain unfixed, as I was able to reproduce it
with a Fedora 41 guest as well as an AlmaLinux 8 guest. However, our
cpr-transfer code also seems to be buggy, since it triggers the crash:
without the cpr-transfer migration, the above reproducer doesn't lead to
a crash on the source VM.
I suspect that, as cpr-transfer doesn't migrate the guest memory, but
rather passes it through the memory backend object, our code might
somehow corrupt the VRAM. However, I haven't been able to trace the
corruption so far.
Could somebody help with the investigation and take a look at this? Any
suggestions would be appreciated. Thanks!
Andrey