qemu-devel.nongnu.org archive mirror
From: Steven Sistare <steven.sistare@oracle.com>
To: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>, qemu-devel@nongnu.org
Cc: "William Roche" <william.roche@oracle.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Denis V. Lunev" <den@virtuozzo.com>
Subject: Re: [BUG, RFC] cpr-transfer: qxl guest driver crashes after migration
Date: Fri, 28 Feb 2025 13:13:07 -0500
Message-ID: <6fd87c40-92dd-4290-9fa9-abd014ddf248@oracle.com>
In-Reply-To: <78309320-f19e-4a06-acfa-bc66cbc81bd7@virtuozzo.com>

On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
> Hi all,
> 
> We've been experimenting with cpr-transfer migration mode recently and
> have discovered the following issue with the guest QXL driver:
> 
> Run migration source:
>> EMULATOR=/path/to/emulator
>> ROOTFS=/path/to/image
>> QMPSOCK=/var/run/alma8qmp-src.sock
>>
>> $EMULATOR -enable-kvm \
>>      -machine q35 \
>>      -cpu host -smp 2 -m 2G \
>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on \
>>      -machine memory-backend=ram0 \
>>      -machine aux-ram-share=on \
>>      -drive file=$ROOTFS,media=disk,if=virtio \
>>      -qmp unix:$QMPSOCK,server=on,wait=off \
>>      -nographic \
>>      -device qxl-vga
> 
> Run migration target:
>> EMULATOR=/path/to/emulator
>> ROOTFS=/path/to/image
>> QMPSOCK=/var/run/alma8qmp-dst.sock
>>
>> $EMULATOR -enable-kvm \
>>      -machine q35 \
>>      -cpu host -smp 2 -m 2G \
>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on \
>>      -machine memory-backend=ram0 \
>>      -machine aux-ram-share=on \
>>      -drive file=$ROOTFS,media=disk,if=virtio \
>>      -qmp unix:$QMPSOCK,server=on,wait=off \
>>      -nographic \
>>      -device qxl-vga \
>>      -incoming tcp:0:44444 \
>>      -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
> 
> 
> Launch the migration:
>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
>> QMPSOCK=/var/run/alma8qmp-src.sock
>>
>> $QMPSHELL -p $QMPSOCK <<EOF
>>      migrate-set-parameters mode=cpr-transfer
>>      migrate channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}]
>> EOF
> 
> Then, after a while, the QXL guest driver on the target crashes,
> spewing the following messages:
>> [   73.962002] [TTM] Buffer eviction failed
>> [   73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
>> [   73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> 
> That seems to be a known kernel QXL driver bug:
> 
> https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
> https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
> 
> (the latter discussion contains a reproducer script that speeds up
> the crash in the guest):
>> #!/bin/bash
>>
>> chvt 3
>>
>> for j in $(seq 80); do
>>          echo "$(date) starting round $j"
>>          if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ]; then
>>                  echo "bug was reproduced after $j tries"
>>                  exit 1
>>          fi
>>          for i in $(seq 100); do
>>                  dmesg > /dev/tty3
>>          done
>> done
>>
>> echo "bug could not be reproduced"
>> exit 0
> 
> The bug itself seems to remain unfixed, as I was able to reproduce it
> with a Fedora 41 guest as well as an AlmaLinux 8 guest.  However, our
> cpr-transfer code also seems to be buggy, as it triggers the crash:
> without the cpr-transfer migration, the above reproducer doesn't lead
> to a crash on the source VM.
> 
> I suspect that, as cpr-transfer doesn't migrate the guest memory, but
> rather passes it through the memory backend object, our code might
> somehow corrupt the VRAM.  However, I wasn't able to trace the
> corruption so far.
> 
> Could somebody help the investigation and take a look into this?  Any
> suggestions would be appreciated.  Thanks!

Possibly some memory region created by qxl is not being preserved.
Try adding these traces to see what is preserved:

-trace enable='*cpr*'
-trace enable='*ram_alloc*'
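
A minimal sketch of how these options might be added to the command lines
from the report. The TRACE_ARGS array name and the log file path are
illustrative assumptions, and capturing stderr assumes a build with the
default "log" trace backend:

```shell
#!/bin/bash
# Trace options suggested above. The single quotes keep the shell from
# expanding the glob patterns before QEMU sees them.
TRACE_ARGS=(
    -trace enable='*cpr*'
    -trace enable='*ram_alloc*'
)

# Print the extra arguments, one per line, so they can be checked first.
printf '%s\n' "${TRACE_ARGS[@]}"

# Assumed usage (not executed here): append the array to both the source
# and the target command line and capture stderr for later comparison:
#   $EMULATOR ... -device qxl-vga "${TRACE_ARGS[@]}" 2> /tmp/qemu-src-trace.log
```

Comparing the source and target logs should show which blocks are
allocated on each side and which of them are preserved.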

- Steve





