From: Steven Sistare <steven.sistare@oracle.com>
To: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>, qemu-devel@nongnu.org
Cc: "William Roche" <william.roche@oracle.com>,
"Gerd Hoffmann" <kraxel@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Denis V. Lunev" <den@virtuozzo.com>
Subject: Re: [BUG, RFC] cpr-transfer: qxl guest driver crashes after migration
Date: Fri, 28 Feb 2025 13:13:07 -0500
Message-ID: <6fd87c40-92dd-4290-9fa9-abd014ddf248@oracle.com>
In-Reply-To: <78309320-f19e-4a06-acfa-bc66cbc81bd7@virtuozzo.com>

On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
> Hi all,
>
> We've been experimenting with cpr-transfer migration mode recently and
> have discovered the following issue with the guest QXL driver:
>
> Run migration source:
>> EMULATOR=/path/to/emulator
>> ROOTFS=/path/to/image
>> QMPSOCK=/var/run/alma8qmp-src.sock
>>
>> $EMULATOR -enable-kvm \
>> -machine q35 \
>> -cpu host -smp 2 -m 2G \
>> -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on \
>> -machine memory-backend=ram0 \
>> -machine aux-ram-share=on \
>> -drive file=$ROOTFS,media=disk,if=virtio \
>> -qmp unix:$QMPSOCK,server=on,wait=off \
>> -nographic \
>> -device qxl-vga
>
> Run migration target:
>> EMULATOR=/path/to/emulator
>> ROOTFS=/path/to/image
>> QMPSOCK=/var/run/alma8qmp-dst.sock
>>
>> $EMULATOR -enable-kvm \
>> -machine q35 \
>> -cpu host -smp 2 -m 2G \
>> -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on \
>> -machine memory-backend=ram0 \
>> -machine aux-ram-share=on \
>> -drive file=$ROOTFS,media=disk,if=virtio \
>> -qmp unix:$QMPSOCK,server=on,wait=off \
>> -nographic \
>> -device qxl-vga \
>> -incoming tcp:0:44444 \
>> -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
>
>
> Launch the migration:
>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
>> QMPSOCK=/var/run/alma8qmp-src.sock
>>
>> $QMPSHELL -p $QMPSOCK <<EOF
>> migrate-set-parameters mode=cpr-transfer
>> migrate channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}]
>> EOF
>
> Then, after a while, the QXL guest driver on the target crashes, spewing
> the following messages:
>> [ 73.962002] [TTM] Buffer eviction failed
>> [ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
>> [ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
>
> That seems to be a known kernel QXL driver bug:
>
> https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
> https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
>
> (the latter discussion contains this reproducer script, which speeds up
> the crash in the guest):
>> #!/bin/bash
>>
>> chvt 3
>>
>> for j in $(seq 80); do
>>     echo "$(date) starting round $j"
>>     if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ]; then
>>         echo "bug was reproduced after $j tries"
>>         exit 1
>>     fi
>>     for i in $(seq 100); do
>>         dmesg > /dev/tty3
>>     done
>> done
>>
>> echo "bug could not be reproduced"
>> exit 0
>
> The bug itself seems to remain unfixed, as I was able to reproduce it
> with a Fedora 41 guest as well as an AlmaLinux 8 guest. However, our
> cpr-transfer code also seems to be buggy, as it triggers the crash:
> without the cpr-transfer migration, the above reproducer doesn't lead
> to a crash on the source VM.
>
> I suspect that, as cpr-transfer doesn't migrate the guest memory, but
> rather passes it through the memory backend object, our code might
> somehow corrupt the VRAM. However, I wasn't able to trace the
> corruption so far.
>
> Could somebody help the investigation and take a look into this? Any
> suggestions would be appreciated. Thanks!

Possibly some memory region created by qxl is not being preserved.
Try adding these traces to see what is preserved:

  -trace enable='*cpr*'
  -trace enable='*ram_alloc*'

- Steve
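
One way to pass the two trace patterns above to both the source and the
target command lines is sketched below. This is only an illustration:
trace_args is a hypothetical helper name, and the log file names in the
usage comment are placeholders, not something from this thread. The
patterns are single-quoted so the shell does not glob-expand the '*'.

```shell
#!/bin/bash
# Sketch only: emit the two -trace options suggested above, one word per line.
# trace_args is a hypothetical helper name, not part of QEMU.
trace_args() {
    printf '%s\n' '-trace' 'enable=*cpr*' '-trace' 'enable=*ram_alloc*'
}

# Hypothetical usage (EMULATOR as in the report; log names are placeholders):
#   mapfile -t TRACE < <(trace_args)
#   "$EMULATOR" <same options as before> "${TRACE[@]}" 2> src-trace.log
```

Assuming the build uses the default "log" trace backend, the trace lines
should go to stderr unless -D is given, so redirecting stderr on each side
and comparing the two logs may show which RAM blocks were allocated on the
source but not preserved on the target.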