From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Date: Thu, 01 Nov 2018 15:59:10 +0000
Subject: [Bug 108625] AMDGPU - Can't even get Xorg to start - Kernel driver hangs with ring buffer timeout on ARM64

https://bugs.freedesktop.org/show_bug.cgi?id=108625

Bug ID: 108625
Summary: AMDGPU - Can't even get Xorg to start - Kernel driver hangs with ring buffer timeout on ARM64
Product: DRI
Version: unspecified
Hardware: ARM
OS: Linux (All)
Status: NEW
Severity: blocker
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: raster@rasterman.com

So we're going to have fun with this one...

Start Xorg. It hangs in screen setup:

  #0  ioctl () at ../sysdeps/unix/sysv/linux/aarch64/ioctl.S:25
  #1  0x0000ffffbb149334 in drmIoctl () from /lib/aarch64-linux-gnu/libdrm.so.2
  #2  0x0000ffffba5166b4 in amdgpu_cs_query_fence_status () from /lib/aarch64-linux-gnu/libdrm_amdgpu.so.1
  #3  0x0000ffffb9ef37f8 in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
  #4  0x0000ffffb9dd148c in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
  #5  0x0000ffffb993d448 in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
  #6  0x0000ffffb993d4ac in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
  #7  0x0000ffffba54425c in ?? () from /usr/lib/xorg/modules/drivers/amdgpu_drv.so
  #8  0x0000ffffba537ca8 in ?? () from /usr/lib/xorg/modules/drivers/amdgpu_drv.so
  #9  0x0000aaaae7133348 in MapWindow ()
  #10 0x0000aaaae710c820 in ?? ()
  #11 0x0000ffffbad52720 in __libc_start_main (main=0x0, argc=0, argv=0x0,
      init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
      stack_end=<optimized out>) at ../csu/libc-start.c:310

And that ioctl hangs because of:

  [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=10, last emitted seq=11
  [drm] GPU recovery disabled.
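The two sequence numbers in that message are the most useful part. The gfx ring had emitted fence 11, but only fence 10 had signaled, which suggests that a single submission on the gfx ring never completed. Below is a minimal sketch for extracting that gap from a log line. It assumes the message format shown above and 32-bit sequence counters. `pending_fences` is a hypothetical helper, not an existing kernel or libdrm tool:

```python
import re

# Pattern for the amdgpu_job_timedout message exactly as it appears in this report;
# other kernel versions may phrase it differently (assumption).
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def pending_fences(line):
    """Return (ring name, emitted-but-unsignaled fence count), or None if no match.

    Assumes 32-bit sequence counters, so the difference is taken modulo 2**32
    to stay correct if the counter has wrapped.
    """
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    return m.group("ring"), (emitted - signaled) & 0xFFFFFFFF


line = ("[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "last signaled seq=10, last emitted seq=11")
print(pending_fences(line))  # ('gfx', 1)
```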

The amdgpu kernel driver reports:

  [drm] amdgpu kernel modesetting enabled.
  amdgpu 0000:89:00.0: enabling device (0100 -> 0102)
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mc.bin
  amdgpu 0000:89:00.0: BAR 2: releasing [mem 0x14010000000-0x140101fffff 64bit pref]
  amdgpu 0000:89:00.0: BAR 0: releasing [mem 0x14000000000-0x1400fffffff 64bit pref]
  amdgpu 0000:89:00.0: BAR 0: assigned [mem 0x14000000000-0x140ffffffff 64bit pref]
  amdgpu 0000:89:00.0: BAR 2: assigned [mem 0x14100000000-0x141001fffff 64bit pref]
  amdgpu 0000:89:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
  amdgpu 0000:89:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
  [drm] amdgpu: 4096M of VRAM memory ready
  [drm] amdgpu: 4096M of GTT memory ready.
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_pfp_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_me_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_ce_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_rlc.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mec_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mec2_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_sdma.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_sdma1.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_uvd.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_vce.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_k_smc.bin
  [drm] Initialized amdgpu 3.26.0 20150101 for 0000:89:00.0 on minor 1
  amdgpu 0000:89:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
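The BAR resize only becomes visible when you read those lines as sizes rather than raw addresses. Below is a small Python sketch that computes the size of each inclusive `[mem START-END]` range in the log above. The names `bar_events` and `human` are illustrative, not kernel or pciutils tools:

```python
import re

# Matches "BAR n: releasing|assigned [mem START-END ...]" as printed in this dmesg.
BAR_RE = re.compile(
    r"BAR (?P<bar>\d+): (?P<action>releasing|assigned) "
    r"\[mem (?P<start>0x[0-9a-fA-F]+)-(?P<end>0x[0-9a-fA-F]+)"
)


def bar_events(log):
    """Yield (bar number, action, size in bytes); the kernel prints inclusive ranges."""
    for m in BAR_RE.finditer(log):
        start = int(m.group("start"), 16)
        end = int(m.group("end"), 16)
        yield int(m.group("bar")), m.group("action"), end - start + 1


def human(size):
    """Format a byte count using the largest binary unit that divides it evenly."""
    for shift, unit in ((30, "GiB"), (20, "MiB"), (10, "KiB")):
        if size >= 1 << shift and size % (1 << shift) == 0:
            return f"{size >> shift} {unit}"
    return f"{size} B"


log = """
amdgpu 0000:89:00.0: BAR 2: releasing [mem 0x14010000000-0x140101fffff 64bit pref]
amdgpu 0000:89:00.0: BAR 0: releasing [mem 0x14000000000-0x1400fffffff 64bit pref]
amdgpu 0000:89:00.0: BAR 0: assigned [mem 0x14000000000-0x140ffffffff 64bit pref]
amdgpu 0000:89:00.0: BAR 2: assigned [mem 0x14100000000-0x141001fffff 64bit pref]
"""

for bar, action, size in bar_events(log):
    print(f"BAR {bar} {action}: {human(size)}")
```

For these four lines, the script prints 2 MiB and 256 MiB for the released BAR 2 and BAR 0 windows, then 4 GiB and 2 MiB for the assigned ones. BAR 0 therefore grew from 256 MiB to 4 GiB, the same size as the 4096M of VRAM, while BAR 2 kept its size.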

So here is where the fun begins. Kernel is:

  Linux noisy 4.18.0-2-arm64 #1 SMP Debian 4.18.10-2 (2018-10-07) aarch64 GNU/Linux

It's Debian unstable on a Cavium ThunderX2 64-bit ARM system (2 CPUs with 32
cores each, so 64 cores and 256 hardware threads with 4-way SMT enabled) with a
bunch of PCIe slots. There is also an Nvidia card that works... to a decent
degree, and an on-board PCIe dumb framebuffer display device (ASPEED), but I'd
rather have a more open stack. I've fiddled with Xorg configs to make it ignore
every device other than the AMD one, e.g. with:

  Section "ServerFlags"
         Option "AutoAddGPU" "false"
  EndSection

  Section "Device"
         Identifier "amdgpu"
         Driver "amdgpu"
         BusID "PCI:137:0:0"
         Option "DRI" "2"
         Option "TearFree" "on"
  EndSection
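The `BusID` value in that config is the kernel's PCI address converted to decimal. Xorg expects decimal bus, device and function numbers, while dmesg and lspci print them in hex, so bus 0x89 becomes 137. Below is a minimal Python sketch of the conversion. `xorg_busid` is a hypothetical helper, and it only handles PCI domain 0, which is what this card reports:

```python
def xorg_busid(pci_addr):
    """Convert a hex PCI address ("DDDD:BB:DD.F" or "BB:DD.F") to an Xorg BusID.

    Only domain 0 is handled; other domains need a different BusID form.
    """
    parts = pci_addr.split(":")
    if len(parts) == 3:
        domain, bus, devfn = parts
    elif len(parts) == 2:
        domain = "0"
        bus, devfn = parts
    else:
        raise ValueError(f"unrecognised PCI address: {pci_addr!r}")
    if int(domain, 16) != 0:
        raise ValueError("non-zero PCI domain: this sketch only handles domain 0")
    dev, fn = devfn.split(".")
    # Xorg wants decimal numbers; the kernel and lspci print hex.
    return f"PCI:{int(bus, 16)}:{int(dev, 16)}:{int(fn, 16)}"


print(xorg_busid("0000:89:00.0"))  # PCI:137:0:0
```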

I've even put the AMD card in the same slot as the Nvidia one, with the same
results, so it doesn't seem to be a slot-specific issue. So where should I start
poking to find out specifically where this very early ring gfx timeout
originates? I'm willing to start the fun of compiling kernels etc. to dig
through this. So how can I help solve this and make AMD cards portable and
usable? :)


You are receiving this mail because:
  • You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel