From: "Alex Bennée" <alex.bennee@linaro.org>
To: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: "Akihiko Odaki" <akihiko.odaki@daynix.com>,
"Huang Rui" <ray.huang@amd.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Gerd Hoffmann" <kraxel@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Gert Wollny" <gert.wollny@collabora.com>,
qemu-devel@nongnu.org,
"Gurchetan Singh" <gurchetansingh@chromium.org>,
"Alyssa Ross" <hi@alyssa.is>,
"Roger Pau Monné" <roger.pau@citrix.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Stefano Stabellini" <stefano.stabellini@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"Xenia Ragiadakou" <xenia.ragiadakou@amd.com>,
"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>,
"Honglei Huang" <honglei1.huang@amd.com>,
"Julia Zhang" <julia.zhang@amd.com>,
"Chen Jiqian" <Jiqian.Chen@amd.com>,
"Rob Clark" <robdclark@gmail.com>,
"Yiwei Zhang" <zzyiwei@chromium.org>,
"Sergio Lopez Pascual" <slp@redhat.com>
Subject: Re: [PATCH v5 0/8] Support virtio-gpu DRM native context
Date: Mon, 27 Jan 2025 14:50:29 +0000
Message-ID: <87ikq048ga.fsf@draig.linaro.org>
In-Reply-To: <48195564-a5e4-45f1-906b-68c6ba7d7f81@collabora.com> (Dmitry Osipenko's message of "Thu, 23 Jan 2025 15:37:41 +0300")
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
> On 1/23/25 14:58, Alex Bennée wrote:
>> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>>
>>> On 1/22/25 20:00, Alex Bennée wrote:
>>>> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>>>>
>>>>> This patchset adds DRM native context support to VirtIO-GPU in QEMU.
>>>>>
>>>>> Contrary to the Virgl and Venus contexts, which mediate high-level GFX APIs,
>>>>> DRM native context [1] mediates the lower-level kernel driver UAPI, which
>>>>> results in less CPU overhead and less/simpler code needed to support it.
>>>>> A DRM context consists of host and guest parts that have to be implemented
>>>>> for each GPU driver. On the guest side, a DRM context presents a virtual GPU as
>>>>> a real/native host GPU device for GL/VK applications.
>>>>>
>>>>> [1] https://www.youtube.com/watch?v=9sFP_yddLLQ
>>>>>
>>>>> Today there are four known DRM native context drivers in the wild:
>>>>>
>>>>> - Freedreno (Qualcomm SoC GPUs), completely upstreamed
>>>>> - AMDGPU, mostly merged upstream
>>>>
>>>> I tried my AMD system today with:
>>>>
>>>> Host:
>>>> Aarch64 AVA system
>>>> Trixie
>>>> virglrenderer @ v1.1.0/99557f5aa130930d11f04ffeb07f3a9aa5963182
>>>> -display sdl,gl=on (gtk,gl=on also came up but handled window resizing
>>>> poorly)
>>>>
>>>> KVM Guest
>>>>
>>>> Aarch64
>>>> Trixie
>>>> mesa @ main/d27748a76f7dd9236bfcf9ef172dc13b8c0e170f
>>>> -Dvulkan-drivers=virtio,amd -Dgallium-drivers=virgl,radeonsi -Damdgpu-virtio=true
>>>>
>>>> However, when I ran vulkan-info --summary, KVM faulted with:
>>>>
>>>> debian-trixie login: error: kvm run failed Bad address
>>>> PC=0000ffffb9aa1eb0 X00=0000ffffba0450a4 X01=0000aaaaf7f32400
>>>> X02=000000000000013c X03=0000ffffba045098 X04=0000aaaaf7f3253c
>>>> X05=0000ffffba0451d4 X06=00000000c0016900 X07=000000000000000e
>>>> X08=0000000000000014 X09=00000000000000ff X10=0000aaaaf7f32500
>>>> X11=0000aaaaf7e4d028 X12=0000aaaaf7edbcb0 X13=0000000000000001
>>>> X14=000000000000000c X15=0000000000007718 X16=0000ffffb93601f0
>>>> X17=0000ffffb9aa1dc0 X18=00000000000076f0 X19=0000aaaaf7f31330
>>>> X20=0000aaaaf7f323f0 X21=0000aaaaf7f235e0 X22=000000000000004c
>>>> X23=0000aaaaf7f2b5e0 X24=0000aaaaf7ee0cb0 X25=00000000000000ff
>>>> X26=0000000000000076 X27=0000ffffcd2b18a8 X28=0000aaaaf7ee0cb0
>>>> X29=0000ffffcd2b0bd0 X30=0000ffffb86c8b98 SP=0000ffffcd2b0bd0
>>>> PSTATE=20001000 --C- EL0t
>>>> QEMU 9.2.50 monitor - type 'help' for more information
>>>> (qemu) quit
>>>>
>>>> Which looks very much like the PFN locking failure. However booting up
>>>> with venus=on instead works. Could there be any differences in the way
>>>> device memory is mapped in the two cases?
>>>
>>> Memory mapping works exactly the same for nctx and venus. Are you on
>>> 6.13 host kernel?
>>
>> Yes - with the Altra PCI workaround patches on both host and guest
>> kernel.
>>
>> Is there any way to trace the sharing of device memory on the host so I
>> can verify it's an attempt at device access? The PC looks like it's in
>> user-space, but once this fails the guest is suspended, so I can't poke
>> around in its environment.
>
> I add printk's to the kernel in such cases. There is likely no
> better way to find out why it fails.
>
> Do your ARM VM and host both use a 4k page size?
>
> Well, if it's a page refcounting bug on ARM/KVM, then applying [1] to
> the host driver will make it work and we will know where the problem is.
> Please try.
>
> [1]
> https://patchwork.kernel.org/project/kvm/patch/20220815095423.11131-1-dmitry.osipenko@collabora.com/

That makes no difference.

AFAICT the fault is triggered in userspace:

error: kvm run failed Bad address
PC=0000ffffb1911eb0 X00=0000ffffb1eb60a4 X01=0000aaaaeb1f5400
X02=000000000000013c X03=0000ffffb1eb6098 X04=0000aaaaeb1f553c
X05=0000ffffb1eb61d4 X06=00000000c0016900 X07=000000000000000e
X08=0000000000000014 X09=00000000000000ff X10=0000aaaaeb1f5500
X11=0000aaaaeb110028 X12=0000aaaaeb19ecb0 X13=0000000000000001
X14=000000000000000c X15=0000000000007718 X16=0000ffffb11d01f0
X17=0000ffffb1911dc0 X18=00000000000076f0 X19=0000aaaaeb1f4330
X20=0000aaaaeb1f53f0 X21=0000aaaaeb1e65e0 X22=000000000000004c
X23=0000aaaaeb1ee5e0 X24=0000aaaaeb1a3cb0 X25=00000000000000ff
X26=0000000000000076 X27=0000ffffc7db4e58 X28=0000aaaaeb1a3cb0
X29=0000ffffc7db4180 X30=0000ffffb0538b98 SP=0000ffffc7db4180
PSTATE=20001000 --C- EL0t
QEMU 9.2.50 monitor - type 'help' for more information
(qemu) quit
Thread 4 received signal SIGABRT, Aborted.
[Switching to Thread 1.4]
cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
32 arm_cpuidle_restore_irq_context(&context);
(gdb) alex
Undefined command: "alex". Try "help".
(gdb) bt
#0 cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
#1 0xffff800081962180 in arch_cpu_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:44
#2 0xffff8000819622c4 in default_idle_call () at /home/alex/lsrc/linux.git/kernel/sched/idle.c:117
#3 0xffff80008013af8c in cpuidle_idle_call () at /home/alex/lsrc/linux.git/kernel/sched/idle.c:185
#4 do_idle () at /home/alex/lsrc/linux.git/kernel/sched/idle.c:325
#5 0xffff80008013b208 in cpu_startup_entry (state=state@entry=CPUHP_AP_ONLINE_IDLE) at /home/alex/lsrc/linux.git/kernel/sched/idle.c:423
#6 0xffff800080043668 in secondary_start_kernel () at /home/alex/lsrc/linux.git/arch/arm64/kernel/smp.c:279
#7 0xffff800080051f78 in __secondary_switched () at /home/alex/lsrc/linux.git/arch/arm64/kernel/head.S:420
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) info threads
Id Target Id Frame
1 Thread 1.1 (CPU#0 [running]) cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
2 Thread 1.2 (CPU#1 [halted ]) 0x0000ffffb1911eb0 in ?? ()
3 Thread 1.3 (CPU#2 [halted ]) cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
* 4 Thread 1.4 (CPU#3 [halted ]) cpu_do_idle () at /home/alex/lsrc/linux.git/arch/arm64/kernel/idle.c:32
(gdb) thread 2
[Switching to thread 2 (Thread 1.2)]
#0 0x0000ffffb1911eb0 in ?? ()
(gdb) bt
#0 0x0000ffffb1911eb0 in ?? ()
#1 0x0000aaaaeb1ea5e0 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) frame 0
#0 0x0000ffffb1911eb0 in ?? ()
(gdb) x/5i $pc
=> 0xffffb1911eb0: str q3, [x0]
0xffffb1911eb4: ldp q2, q3, [x1, #48]
0xffffb1911eb8: subs x2, x2, #0x90
0xffffb1911ebc: b.ls 0xffffb1911ee0 // b.plast
0xffffb1911ec0: stp q0, q1, [x3, #16]
(gdb) p/x $x0
$1 = 0xffffb1eb60a4
I suspect that is memcpy again, but I'll try to track it down. The only
other thing of note is:
[ 411.509647] kvm [7713]: Unsupported FSC: EC=0x24 xFSC=0x21 ESR_EL2=0x92000061
Which is:
EC 0x24 - Data Abort from lower EL
DFSC 0x21 - Alignment fault
WnR 1 - Caused by write
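
For anyone wanting to double check the decode above, here is a minimal sketch. It assumes the standard Arm ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]; for a data abort, WnR is ISS bit 6 and DFSC is ISS bits [5:0]). The helper name decode_esr is made up for illustration and is not part of QEMU or the kernel. It also checks whether the x0 value from the faulting "str q3, [x0]" is 16-byte aligned.

```python
def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the fields relevant to a data abort.

    Hypothetical helper; field positions assume the standard ESR_ELx layout.
    """
    iss = esr & 0x1FFFFFF            # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,    # Exception Class: bits [31:26]
        "il": (esr >> 25) & 0x1,     # Instruction Length: bit 25
        "wnr": (iss >> 6) & 0x1,     # Write-not-Read: ISS bit 6
        "dfsc": iss & 0x3F,          # Data Fault Status Code: ISS bits [5:0]
    }


# ESR_EL2 value taken from the kernel log line above
fields = decode_esr(0x92000061)
print({name: hex(value) for name, value in fields.items()})

# x0 as printed by gdb; a q-register store writes 16 bytes
x0 = 0xFFFFB1EB60A4
x0_aligned_16 = (x0 % 16 == 0)
print("x0 16-byte aligned:", x0_aligned_16)
```

The fields come out as EC=0x24, WnR=1 and DFSC=0x21, matching the kernel message, and x0 ends in 0x4 so it is not 16-byte aligned. That would be consistent with an alignment fault if the destination is mapped as Device memory in the guest, although the sketch itself cannot confirm how the page is mapped.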
--
Alex Bennée
Virtualisation Tech Lead @ Linaro