From: bugzilla-daemon@freedesktop.org
Subject: [Bug 108625] AMDGPU - Can't even get Xorg to start - Kernel driver hangs with ring buffer timeout on ARM64
Date: Thu, 01 Nov 2018 15:59:10 +0000
Bug ID: 108625
Summary: AMDGPU - Can't even get Xorg to start - Kernel driver hangs with ring buffer timeout on ARM64
Product: DRI
Version: unspecified
Hardware: ARM
OS: Linux (All)
Status: NEW
Severity: blocker
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: raster@rasterman.com
So we're going to have fun with this one...
Start Xorg. It hangs in screen setup:
#0 ioctl () at ../sysdeps/unix/sysv/linux/aarch64/ioctl.S:25
#1 0x0000ffffbb149334 in drmIoctl () from /lib/aarch64-linux-gnu/libdrm.so.2
#2 0x0000ffffba5166b4 in amdgpu_cs_query_fence_status () from /lib/aarch64-linux-gnu/libdrm_amdgpu.so.1
#3 0x0000ffffb9ef37f8 in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
#4 0x0000ffffb9dd148c in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
#5 0x0000ffffb993d448 in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
#6 0x0000ffffb993d4ac in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
#7 0x0000ffffba54425c in ?? () from /usr/lib/xorg/modules/drivers/amdgpu_drv.so
#8 0x0000ffffba537ca8 in ?? () from /usr/lib/xorg/modules/drivers/amdgpu_drv.so
#9 0x0000aaaae7133348 in MapWindow ()
#10 0x0000aaaae710c820 in ?? ()
#11 0x0000ffffbad52720 in __libc_start_main (main=0x0, argc=0, argv=0x0,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=<optimized out>) at ../csu/libc-start.c:310
And that ioctl hangs because of:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=10, last emitted seq=11
[drm] GPU recovery disabled.
The amdgpu kernel driver reports:
[drm] amdgpu kernel modesetting enabled.
amdgpu 0000:89:00.0: enabling device (0100 -> 0102)
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mc.bin
amdgpu 0000:89:00.0: BAR 2: releasing [mem 0x14010000000-0x140101fffff 64bit pref]
amdgpu 0000:89:00.0: BAR 0: releasing [mem 0x14000000000-0x1400fffffff 64bit pref]
amdgpu 0000:89:00.0: BAR 0: assigned [mem 0x14000000000-0x140ffffffff 64bit pref]
amdgpu 0000:89:00.0: BAR 2: assigned [mem 0x14100000000-0x141001fffff 64bit pref]
amdgpu 0000:89:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
amdgpu 0000:89:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[drm] amdgpu: 4096M of VRAM memory ready
[drm] amdgpu: 4096M of GTT memory ready.
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_pfp_2.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_me_2.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_ce_2.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_rlc.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mec_2.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mec2_2.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_sdma.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_sdma1.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_uvd.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_vce.bin
amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_k_smc.bin
[drm] Initialized amdgpu 3.26.0 20150101 for 0000:89:00.0 on minor 1
amdgpu 0000:89:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
So here is where the fun begins. Kernel is:
Linux noisy 4.18.0-2-arm64 #1 SMP Debian 4.18.10-2 (2018-10-07) aarch64 GNU/Linux
It's Debian unstable on a Cavium ThunderX2 64-bit ARM system (2 CPUs with 32 cores each, so 64 cores and 256 hardware threads total with 4-way SMT enabled) with a bunch of PCIe slots. There is an Nvidia card that works, to a decent degree, and an on-board PCIe dumb-framebuffer display device (ASPEED), but I'd prefer a more open stack etc.
I've fiddled with the Xorg config to make it ignore every device other than the AMD one, for example with:
Section "ServerFlags"
    Option "AutoAddGPU" "false"
EndSection

Section "Device"
    Identifier "amdgpu"
    Driver     "amdgpu"
    BusID      "PCI:137:0:0"
    Option     "DRI"      "2"
    Option     "TearFree" "on"
EndSection
I've even put the AMD card in the same slot as the Nvidia one, with the same results, so it doesn't seem to be a slot-specific issue. So where should I start poking to find out where this very early ring gfx timeout is originating from? I'm willing to start the fun of compiling kernels etc. to dig through this. So how can I help solve this and make AMD cards portable and usable? :)