From: Steven Sistare <steven.sistare@oracle.com>
To: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>, qemu-devel@nongnu.org
Cc: "William Roche" <william.roche@oracle.com>,
"Gerd Hoffmann" <kraxel@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Denis V. Lunev" <den@virtuozzo.com>
Subject: Re: [BUG, RFC] cpr-transfer: qxl guest driver crashes after migration
Date: Fri, 28 Feb 2025 13:20:28 -0500
Message-ID: <8c79212c-4b0b-426b-8563-3e7d478ef24f@oracle.com>
In-Reply-To: <6fd87c40-92dd-4290-9fa9-abd014ddf248@oracle.com>

On 2/28/2025 1:13 PM, Steven Sistare wrote:
> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
>> Hi all,
>>
>> We've been experimenting with cpr-transfer migration mode recently and
>> have discovered the following issue with the guest QXL driver:
>>
>> Run migration source:
>>> EMULATOR=/path/to/emulator
>>> ROOTFS=/path/to/image
>>> QMPSOCK=/var/run/alma8qmp-src.sock
>>>
>>> $EMULATOR -enable-kvm \
>>> -machine q35 \
>>> -cpu host -smp 2 -m 2G \
>>> -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on \
>>> -machine memory-backend=ram0 \
>>> -machine aux-ram-share=on \
>>> -drive file=$ROOTFS,media=disk,if=virtio \
>>> -qmp unix:$QMPSOCK,server=on,wait=off \
>>> -nographic \
>>> -device qxl-vga
>>
>> Run migration target:
>>> EMULATOR=/path/to/emulator
>>> ROOTFS=/path/to/image
>>> QMPSOCK=/var/run/alma8qmp-dst.sock
>>> $EMULATOR -enable-kvm \
>>> -machine q35 \
>>> -cpu host -smp 2 -m 2G \
>>> -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on \
>>> -machine memory-backend=ram0 \
>>> -machine aux-ram-share=on \
>>> -drive file=$ROOTFS,media=disk,if=virtio \
>>> -qmp unix:$QMPSOCK,server=on,wait=off \
>>> -nographic \
>>> -device qxl-vga \
>>> -incoming tcp:0:44444 \
>>> -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
>>
>>
>> Launch the migration:
>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
>>> QMPSOCK=/var/run/alma8qmp-src.sock
>>>
>>> $QMPSHELL -p $QMPSOCK <<EOF
>>> migrate-set-parameters mode=cpr-transfer
>>> migrate channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}]
>>> EOF
>>
>> Then, after a while, the QXL guest driver on the target crashes, spewing
>> the following messages:
>>> [ 73.962002] [TTM] Buffer eviction failed
>>> [ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
>>> [ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
>>
>> That seems to be a known kernel QXL driver bug:
>>
>> https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
>> https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
>>
>> (the latter discussion contains this reproducer script, which speeds up
>> the crash in the guest):
>>> #!/bin/bash
>>>
>>> chvt 3
>>>
>>> for j in $(seq 80); do
>>>     echo "$(date) starting round $j"
>>>     if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ]; then
>>>         echo "bug was reproduced after $j tries"
>>>         exit 1
>>>     fi
>>>     for i in $(seq 100); do
>>>         dmesg > /dev/tty3
>>>     done
>>> done
>>>
>>> echo "bug could not be reproduced"
>>> exit 0
>>
>> The bug itself seems to remain unfixed, as I was able to reproduce it
>> with a Fedora 41 guest as well as an AlmaLinux 8 guest. However, our
>> cpr-transfer code also seems to be buggy, as it triggers the crash:
>> without the cpr-transfer migration, the above reproducer doesn't lead
>> to a crash on the source VM.
>>
>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but
>> rather passes it through the memory backend object, our code might
>> somehow corrupt the VRAM. However, I wasn't able to trace the
>> corruption so far.
>>
>> Could somebody help with the investigation and take a look at this? Any
>> suggestions would be appreciated. Thanks!
>
> Possibly some memory region created by qxl is not being preserved.
> Try adding these traces to see what is preserved:
>
> -trace enable='*cpr*'
> -trace enable='*ram_alloc*'
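
A minimal sketch of one way to add the trace options quoted above to either launch script, assuming bash and the emulator command lines from the report. The names TRACE_PATTERNS and TRACE_ARGS are made up for illustration. Keeping the patterns quoted inside an array stops the shell from glob-expanding the '*' characters before QEMU sees them.

```bash
#!/bin/bash
# Illustrative only: TRACE_PATTERNS and TRACE_ARGS are hypothetical names.
TRACE_PATTERNS=('*cpr*' '*ram_alloc*')

# Build "-trace enable=<pattern>" pairs; the quoting keeps '*' literal.
TRACE_ARGS=()
for pat in "${TRACE_PATTERNS[@]}"; do
    TRACE_ARGS+=(-trace "enable=$pat")
done

# Show what would be appended to the command line, one argument per line.
printf '%s\n' "${TRACE_ARGS[@]}"
```

Appending "${TRACE_ARGS[@]}" to the $EMULATOR command on both the source and the target, and redirecting stderr of each side to its own file, should make it easier to compare what each side reports. This assumes a QEMU build whose trace backend writes to stderr.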

Also try adding this patch to see if it flags any ram blocks as not
compatible with cpr. A message is printed at migration start time.

  https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email-steven.sistare@oracle.com/

- Steve
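
One possible way to fetch the patch linked above is the b4 tool, which accepts a Message-ID. The sketch below only derives the Message-ID from the lore URL and prints the command. It assumes b4 is installed and that the command is run from a QEMU source checkout; the helper name msgid_from_lore_url is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: take the last path component of a lore.kernel.org
# message URL, which is the Message-ID for URLs of the form shown above.
msgid_from_lore_url() {
    local url=${1%/}             # drop a trailing slash, if any
    printf '%s\n' "${url##*/}"   # keep everything after the last '/'
}

PATCH_URL='https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email-steven.sistare@oracle.com/'
MSGID=$(msgid_from_lore_url "$PATCH_URL")

# Print the command instead of running it; b4 availability is an assumption.
echo "b4 am $MSGID"
```

b4 am saves the patch as an mbox file and reports its name, which can then be passed to git am. The helper does not handle thread URLs that end in /T/.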