From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 108625] AMDGPU - Can't even get Xorg to start - Kernel driver hangs  with ring buffer timeout on ARM64
Date: Thu, 01 Nov 2018 15:59:10 +0000	[thread overview]
Message-ID: <bug-108625-502@http.bugs.freedesktop.org/> (raw)


https://bugs.freedesktop.org/show_bug.cgi?id=108625

            Bug ID: 108625
           Summary: AMDGPU - Can't even get Xorg to start - Kernel driver
                    hangs  with ring buffer timeout on ARM64
           Product: DRI
           Version: unspecified
          Hardware: ARM
                OS: Linux (All)
            Status: NEW
          Severity: blocker
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: raster@rasterman.com

So we're going to have fun with this one...

Start Xorg. It hangs in screen setup:

  #0  ioctl () at ../sysdeps/unix/sysv/linux/aarch64/ioctl.S:25
  #1  0x0000ffffbb149334 in drmIoctl () from /lib/aarch64-linux-gnu/libdrm.so.2
  #2  0x0000ffffba5166b4 in amdgpu_cs_query_fence_status () from /lib/aarch64-linux-gnu/libdrm_amdgpu.so.1
  #3  0x0000ffffb9ef37f8 in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
  #4  0x0000ffffb9dd148c in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
  #5  0x0000ffffb993d448 in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
  #6  0x0000ffffb993d4ac in ?? () from /usr/lib/aarch64-linux-gnu/dri/radeonsi_dri.so
  #7  0x0000ffffba54425c in ?? () from /usr/lib/xorg/modules/drivers/amdgpu_drv.so
  #8  0x0000ffffba537ca8 in ?? () from /usr/lib/xorg/modules/drivers/amdgpu_drv.so
  #9  0x0000aaaae7133348 in MapWindow ()
  #10 0x0000aaaae710c820 in ?? ()
  #11 0x0000ffffbad52720 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:310
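
Frame #2 shows user space blocked inside libdrm's amdgpu_cs_query_fence_status(), which queries a GPU fence and, given a nonzero timeout, waits for it to signal. The toy model below is not libdrm or kernel code: it uses a Python threading.Event as a hypothetical stand-in for a fence, to show why a wait with no deadline never returns if the GPU never signals, while a bounded wait reports a timeout instead. Whether the radeonsi path in this backtrace really waits without a deadline is an assumption.

```python
import threading
from typing import Optional


def wait_for_fence(fence: threading.Event, timeout_s: Optional[float]) -> bool:
    """Return True if the (modelled) fence signaled, False if the wait expired.

    With timeout_s=None this blocks until the fence signals, which never
    happens if the GPU job behind it is stuck, i.e. the hang in frame #2.
    """
    return fence.wait(timeout_s)


# Hypothetical fence for a job the gfx ring never completes.
stuck_fence = threading.Event()

# A bounded wait gives up and returns False instead of hanging forever.
print(wait_for_fence(stuck_fence, 0.1))

# Once something signals the fence (normally the GPU), waiters wake up.
done_fence = threading.Event()
threading.Timer(0.05, done_fence.set).start()
print(wait_for_fence(done_fence, 1.0))
```

Calling wait_for_fence(stuck_fence, None) in this model would block indefinitely, which is the user-visible symptom: Xorg never gets past MapWindow.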

And that ioctl hangs because of:

  [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=10, last emitted seq=11
  [drm] GPU recovery disabled.
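
In that message, each job submitted to the gfx ring carries an increasing fence sequence number: "last emitted" is the newest number written to the ring, and "last signaled" is the newest one the GPU has reported as complete. A gap of one, as here, suggests the job right after seq 10 never finished; that is likely the same fence the Xorg process is blocked on. The parser below is a hypothetical triage helper written only for this log format (it ignores 32-bit sequence wraparound); it is not part of amdgpu.

```python
import re

# Matches the amdgpu_job_timedout message format quoted in this report.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line: str) -> dict:
    """Extract the ring name and fence sequence numbers from a timeout line."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        raise ValueError("not an amdgpu ring timeout line")
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    return {
        "ring": m.group("ring"),
        "last_signaled": signaled,
        "last_emitted": emitted,
        # Jobs on the ring that the GPU has not yet reported complete.
        "pending": emitted - signaled,
        # The oldest of those is the one that appears to be stuck.
        "first_stuck_seq": signaled + 1,
    }


log = ("[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
       "last signaled seq=10, last emitted seq=11")
print(parse_ring_timeout(log))
# {'ring': 'gfx', 'last_signaled': 10, 'last_emitted': 11, 'pending': 1, 'first_stuck_seq': 11}
```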

The amdgpu kernel driver reports:

  [drm] amdgpu kernel modesetting enabled.
  amdgpu 0000:89:00.0: enabling device (0100 -> 0102)
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mc.bin
  amdgpu 0000:89:00.0: BAR 2: releasing [mem 0x14010000000-0x140101fffff 64bit pref]
  amdgpu 0000:89:00.0: BAR 0: releasing [mem 0x14000000000-0x1400fffffff 64bit pref]
  amdgpu 0000:89:00.0: BAR 0: assigned [mem 0x14000000000-0x140ffffffff 64bit pref]
  amdgpu 0000:89:00.0: BAR 2: assigned [mem 0x14100000000-0x141001fffff 64bit pref]
  amdgpu 0000:89:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
  amdgpu 0000:89:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
  [drm] amdgpu: 4096M of VRAM memory ready
  [drm] amdgpu: 4096M of GTT memory ready.
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_pfp_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_me_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_ce_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_rlc.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mec_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_mec2_2.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_sdma.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_sdma1.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_uvd.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_vce.bin
  amdgpu 0000:89:00.0: firmware: direct-loading firmware amdgpu/polaris11_k_smc.bin
  [drm] Initialized amdgpu 3.26.0 20150101 for 0000:89:00.0 on minor 1
  amdgpu 0000:89:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
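
The BAR release/assign pairs near the top of that log appear to be the kernel resizing BAR 0 (the CPU-visible VRAM aperture) from 256 MiB to 4 GiB so that it covers all 4096M of VRAM, while BAR 2 keeps its 2 MiB size but moves to a new address. The hypothetical helper below, written only for this log format, computes each window's size from the inclusive [start-end] range the PCI core prints.

```python
import re

# Matches PCI core messages such as
# "BAR 0: assigned [mem 0x14000000000-0x140ffffffff 64bit pref]".
BAR_RE = re.compile(
    r"BAR (?P<bar>\d+): (?P<action>releasing|assigned) "
    r"\[mem (?P<start>0x[0-9a-f]+)-(?P<end>0x[0-9a-f]+)"
)


def bar_events(lines):
    """Yield (bar, action, size_in_bytes) for each BAR release/assign line."""
    for line in lines:
        m = BAR_RE.search(line)
        if m:
            start = int(m.group("start"), 16)
            end = int(m.group("end"), 16)
            # The printed range is inclusive, hence the +1.
            yield int(m.group("bar")), m.group("action"), end - start + 1


log = [
    "BAR 2: releasing [mem 0x14010000000-0x140101fffff 64bit pref]",
    "BAR 0: releasing [mem 0x14000000000-0x1400fffffff 64bit pref]",
    "BAR 0: assigned [mem 0x14000000000-0x140ffffffff 64bit pref]",
    "BAR 2: assigned [mem 0x14100000000-0x141001fffff 64bit pref]",
]
for bar, action, size in bar_events(log):
    print(f"BAR {bar} {action}: {size // (1024 * 1024)} MiB")
# BAR 2 releasing: 2 MiB
# BAR 0 releasing: 256 MiB
# BAR 0 assigned: 4096 MiB
# BAR 2 assigned: 2 MiB
```

If the resize itself were suspect on this platform, comparing these sizes against a boot where the resize does not happen would be one way to narrow it down.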

So here is where the fun begins. Kernel is:

  Linux noisy 4.18.0-2-arm64 #1 SMP Debian 4.18.10-2 (2018-10-07) aarch64 GNU/Linux

It's Debian unstable on a Cavium ThunderX2 64-bit ARM system (2 CPUs with 32
cores each, i.e. 64 cores and 256 hardware threads with 4-way SMT enabled) with
a bunch of PCIe slots. There is an Nvidia card that works... to a decent
degree, and an on-board PCIe dumb-framebuffer display device (ASPEED), but I'd
rather have a more open stack. I've fiddled with Xorg configs to make it ignore
every device other than the AMD one, for example:

  Section "ServerFlags"
         Option "AutoAddGPU" "false"
  EndSection

  Section "Device"
         Identifier "amdgpu"
         Driver "amdgpu"
         BusID "PCI:137:0:0"
         Option "DRI" "2"
         Option "TearFree" "on"
  EndSection
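
Xorg expects the BusID fields in decimal, while the kernel log prints the PCI address in hex: device 0000:89:00.0 is bus 0x89 = 137, device 0, function 0, hence "PCI:137:0:0". A small sketch of that conversion, assuming the kernel's domain:bus:device.function form; the function name is hypothetical, and a nonzero PCI domain uses the "PCI:bus@domain:device:function" syntax described in xorg.conf(5).

```python
def xorg_busid(pci_addr: str) -> str:
    """Convert a kernel PCI address like '0000:89:00.0' to an Xorg BusID.

    Kernel addresses are hex (domain:bus:device.function); Xorg's BusID
    fields are decimal.
    """
    domain_s, bus_s, devfn = pci_addr.split(":")
    dev_s, func_s = devfn.split(".")
    domain, bus, dev, func = (int(x, 16) for x in (domain_s, bus_s, dev_s, func_s))
    if domain:
        # Devices outside PCI domain 0 need the bus@domain form.
        return f"PCI:{bus}@{domain}:{dev}:{func}"
    return f"PCI:{bus}:{dev}:{func}"


print(xorg_busid("0000:89:00.0"))  # PCI:137:0:0
```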

I've even put the AMD card in the same slot as the Nvidia one, with the same
results, so it doesn't seem to be a slot-specific issue. So where should I
start poking to find out where this very early ring gfx timeout originates?
I'm willing to start the fun of compiling kernels etc. to dig through this.
How can I help solve this and make AMD cards portable and usable? :)

