qemu-devel.nongnu.org archive mirror
From: "Alex Bennée" <alex.bennee@linaro.org>
To: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: "Akihiko Odaki" <akihiko.odaki@daynix.com>,
	"Huang Rui" <ray.huang@amd.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Gert Wollny" <gert.wollny@collabora.com>,
	qemu-devel@nongnu.org,
	"Gurchetan Singh" <gurchetansingh@chromium.org>,
	"Alyssa Ross" <hi@alyssa.is>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Stefano Stabellini" <stefano.stabellini@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Xenia Ragiadakou" <xenia.ragiadakou@amd.com>,
	"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>,
	"Honglei Huang" <honglei1.huang@amd.com>,
	"Julia Zhang" <julia.zhang@amd.com>,
	"Chen Jiqian" <Jiqian.Chen@amd.com>,
	"Rob Clark" <robdclark@gmail.com>,
	"Yiwei Zhang" <zzyiwei@chromium.org>,
	"Sergio Lopez Pascual" <slp@redhat.com>
Subject: Re: [PATCH v5 0/8] Support virtio-gpu DRM native context
Date: Mon, 27 Jan 2025 14:50:29 +0000
Message-ID: <87ikq048ga.fsf@draig.linaro.org>
In-Reply-To: <48195564-a5e4-45f1-906b-68c6ba7d7f81@collabora.com> (Dmitry Osipenko's message of "Thu, 23 Jan 2025 15:37:41 +0300")

Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 1/23/25 14:58, Alex Bennée wrote:
>> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>> 
>>> On 1/22/25 20:00, Alex Bennée wrote:
>>>> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>>>>
>>>>> This patchset adds DRM native context support to VirtIO-GPU on QEMU.
>>>>>
>>>>> Contrary to Virgl and Venus contexts, which mediate high-level GFX APIs,
>>>>> DRM native context [1] mediates the lower-level kernel driver UAPI, which
>>>>> results in less CPU overhead and less/simpler code needed to support it.
>>>>> DRM context consists of host and guest parts that have to be implemented
>>>>> for each GPU driver. On the guest side, DRM context presents a virtual GPU
>>>>> as a real/native host GPU device for GL/VK applications.
>>>>>
>>>>> [1] https://www.youtube.com/watch?v=9sFP_yddLLQ
>>>>>
>>>>> Today there are four known DRM native context drivers in the wild:
>>>>>
>>>>>   - Freedreno (Qualcomm SoC GPUs), completely upstreamed
>>>>>   - AMDGPU, mostly merged upstream
>>>>
>>>> I tried my AMD system today with:
>>>>
>>>> Host:
>>>>   Aarch64 AVA system
>>>>   Trixie
>>>>   virglrenderer @ v1.1.0/99557f5aa130930d11f04ffeb07f3a9aa5963182
>>>>   -display sdl,gl=on (gtk,gl=on also came up but handled window resizing
>>>>   poorly)
>>>>   
>>>> KVM Guest
>>>>
>>>>   Aarch64
>>>>   Trixie
>>>>   mesa @ main/d27748a76f7dd9236bfcf9ef172dc13b8c0e170f
>>>>   -Dvulkan-drivers=virtio,amd -Dgallium-drivers=virgl,radeonsi -Damdgpu-virtio=true
>>>>
>>>> However, when I ran vulkaninfo --summary, KVM faulted with:
>>>>
>>>>   debian-trixie login: error: kvm run failed Bad address
>>>>    PC=0000ffffb9aa1eb0 X00=0000ffffba0450a4 X01=0000aaaaf7f32400
>>>>   X02=000000000000013c X03=0000ffffba045098 X04=0000aaaaf7f3253c
>>>>   X05=0000ffffba0451d4 X06=00000000c0016900 X07=000000000000000e
>>>>   X08=0000000000000014 X09=00000000000000ff X10=0000aaaaf7f32500
>>>>   X11=0000aaaaf7e4d028 X12=0000aaaaf7edbcb0 X13=0000000000000001
>>>>   X14=000000000000000c X15=0000000000007718 X16=0000ffffb93601f0
>>>>   X17=0000ffffb9aa1dc0 X18=00000000000076f0 X19=0000aaaaf7f31330
>>>>   X20=0000aaaaf7f323f0 X21=0000aaaaf7f235e0 X22=000000000000004c
>>>>   X23=0000aaaaf7f2b5e0 X24=0000aaaaf7ee0cb0 X25=00000000000000ff
>>>>   X26=0000000000000076 X27=0000ffffcd2b18a8 X28=0000aaaaf7ee0cb0
>>>>   X29=0000ffffcd2b0bd0 X30=0000ffffb86c8b98  SP=0000ffffcd2b0bd0
>>>>   PSTATE=20001000 --C- EL0t
>>>>   QEMU 9.2.50 monitor - type 'help' for more information
>>>>   (qemu) quit
>>>>
>>>> Which looks very much like the PFN locking failure. However booting up
>>>> with venus=on instead works. Could there be any differences in the way
>>>> device memory is mapped in the two cases?
>>>
>>> Memory mapping works exactly the same for nctx and venus. Are you on
>>> a 6.13 host kernel?
>> 
>> Yes - with the Altra PCI workaround patches on both host and guest
>> kernel.
>> 
>> Is there any way to trace the sharing of device memory on the host so I
>> can verify it's an attempt at device access? The PC looks like it's in
>> user-space, but once this fails the guest is suspended so I can't poke
>> around in its environment.
>
> I add printk's to the kernel in such cases. There is likely no
> better way to find out why it fails.
>
> Do your ARM VM and host both use a 4k page size?
>
> Well, if it's a page refcounting bug on ARM/KVM, then applying [1] to
> the host driver will make it work and we will know where the problem is.
> Please try.
>
> [1]
> https://patchwork.kernel.org/project/kvm/patch/20220815095423.11131-1-dmitry.osipenko@collabora.com/

That makes no difference.

AFAICT the fault is triggered in userspace:

  error: kvm run failed Bad address
   PC=0000ffffb1911eb0 X00=0000ffffb1eb60a4 X01=0000aaaaeb1f5400
  X02=000000000000013c X03=0000ffffb1eb6098 X04=0000aaaaeb1f553c
  X05=0000ffffb1eb61d4 X06=00000000c0016900 X07=000000000000000e
  X08=0000000000000014 X09=00000000000000ff X10=0000aaaaeb1f5500
  X11=0000aaaaeb110028 X12=0000aaaaeb19ecb0 X13=0000000000000001
  X14=000000000000000c X15=0000000000007718 X16=0000ffffb11d01f0
  X17=0000ffffb1911dc0 X18=00000000000076f0 X19=0000aaaaeb1f4330
  X20=0000aaaaeb1f53f0 X21=0000aaaaeb1e65e0 X22=000000000000004c
  X23=0000aaaaeb1ee5e0 X24=0000aaaaeb1a3cb0 X25=00000000000000ff
  X26=0000000000000076 X27=0000ffffc7db4e58 X28=0000aaaaeb1a3cb0
  X29=0000ffffc7db4180 X30=0000ffffb0538b98  SP=0000ffffc7db4180
  PSTATE=20001000 --C- EL0t
  QEMU 9.2.50 monitor - type 'help' for more information
  (qemu) quit

  Thread 4 received signal SIGABRT, Aborted.
  [Switching to Thread 1.4]
  cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
  32              arm_cpuidle_restore_irq_context(&context);
  (gdb) alex
  Undefined command: "alex".  Try "help".
  (gdb) bt
  #0  cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
  #1  0xffff800081962180 in arch_cpu_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:44
  #2  0xffff8000819622c4 in default_idle_call () at /home/alex/lsrc/linux.git/kernel/sched/idle.c:117
  #3  0xffff80008013af8c in cpuidle_idle_call () at /home/alex/lsrc/linux.git/kernel/sched/idle.c:185
  #4  do_idle () at /home/alex/lsrc/linux.git/kernel/sched/idle.c:325
  #5  0xffff80008013b208 in cpu_startup_entry (state=state@entry=CPUHP_AP_ONLINE_IDLE) at /home/alex/lsrc/linux.git/kernel/sched/idle.c:423
  #6  0xffff800080043668 in secondary_start_kernel () at /home/alex/lsrc/linux.git/arch/arm64/kernel/smp.c:279
  #7  0xffff800080051f78 in __secondary_switched () at /home/alex/lsrc/linux.git/arch/arm64/kernel/head.S:420
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)
  (gdb) info threads
    Id   Target Id                    Frame 
    1    Thread 1.1 (CPU#0 [running]) cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
    2    Thread 1.2 (CPU#1 [halted ]) 0x0000ffffb1911eb0 in ?? ()
    3    Thread 1.3 (CPU#2 [halted ]) cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
  * 4    Thread 1.4 (CPU#3 [halted ]) cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
  (gdb) thread 2
  [Switching to thread 2 (Thread 1.2)]
  #0  0x0000ffffb1911eb0 in ?? ()
  (gdb) bt
  #0  0x0000ffffb1911eb0 in ?? ()
  #1  0x0000aaaaeb1ea5e0 in ?? ()
  Backtrace stopped: previous frame inner to this frame (corrupt stack?)
  (gdb) frame 0
  #0  0x0000ffffb1911eb0 in ?? ()
  (gdb) x/5i $pc
  => 0xffffb1911eb0:      str     q3, [x0]
     0xffffb1911eb4:      ldp     q2, q3, [x1, #48]
     0xffffb1911eb8:      subs    x2, x2, #0x90
     0xffffb1911ebc:      b.ls    0xffffb1911ee0  // b.plast
     0xffffb1911ec0:      stp     q0, q1, [x3, #16]
  (gdb) p/x $x0
  $1 = 0xffffb1eb60a4
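
As a quick sanity check on the faulting store, the sketch below tests the
alignment of the destination address. The address and the 16-byte width of a
Q register come from the gdb session above; the helper name is made up for
illustration. Arm faults unaligned accesses to Device memory, but whether the
destination is actually mapped as Device memory is an assumption here and not
something the trace shows.

```python
# Values copied from the gdb session above.
FAULT_ADDR = 0xFFFFB1EB60A4  # $x0, the destination of `str q3, [x0]`
Q_REG_BYTES = 16             # a Q register is 128 bits wide


def misalignment(addr: int, size: int) -> int:
    """Return how many bytes addr sits past the previous size-aligned boundary.

    Hypothetical helper for illustration; 0 means addr is size-aligned.
    """
    return addr % size


if __name__ == "__main__":
    for size in (4, 8, Q_REG_BYTES):
        off = misalignment(FAULT_ADDR, size)
        state = "aligned" if off == 0 else f"misaligned by {off} bytes"
        print(f"{FAULT_ADDR:#x} vs {size:2d}-byte boundary: {state}")
```

The address is only 4-byte aligned, so a 128-bit store to it is an unaligned
access regardless of whether the check is made at 8 or 16 bytes.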

I suspect that is memcpy again but I'll try and track it down. The only
other note is:

[  411.509647] kvm [7713]: Unsupported FSC: EC=0x24 xFSC=0x21 ESR_EL2=0x92000061

Which is:

  EC 0x24 - Data Abort from lower EL
  DFSC 0x21 - Alignment fault
  WnR 1 - Caused by write
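
The decode above can be reproduced by hand from the ESR_EL2 value in the
kernel log. A minimal sketch, assuming the standard AArch64 syndrome layout
(EC in bits [31:26], IL in bit 25, and for a Data Abort ISV in bit 24, WnR in
bit 6 and DFSC in bits [5:0]); the function name is made up for illustration:

```python
def decode_data_abort(esr: int) -> dict:
    """Split an ESR_ELx value into the Data Abort fields used above.

    Hypothetical helper; only extracts bit fields, it does not validate
    that EC really denotes a Data Abort.
    """
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,    # instruction length bit
        "ISV": (esr >> 24) & 0x1,   # instruction syndrome valid
        "WnR": (esr >> 6) & 0x1,    # 1 = write, 0 = read
        "DFSC": esr & 0x3F,         # data fault status code, bits [5:0]
    }


if __name__ == "__main__":
    # ESR_EL2 value taken from the kvm log line above.
    fields = decode_data_abort(0x92000061)
    for name, value in fields.items():
        print(f"{name:4s} = {value:#x}")
```

Running it on 0x92000061 gives EC=0x24, DFSC=0x21 and WnR=1, matching the
breakdown above, and also shows ISV=0.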
  
-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

