From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 109181] Mesa git causes AMDGPU hang, Tonga Firepro chip W7170M MXM
Date: Sun, 30 Dec 2018 08:48:27 +0000
Message-ID:
Bug ID: 109181
Summary: Mesa git causes AMDGPU hang, Tonga Firepro chip W7170M MXM
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: Babblebones@gmail.com
QA Contact: dri-devel@lists.freedesktop.org
Hello,
I've run into a bug where AMDGPU hangs because Mesa 19 does something it
doesn't like in particular applications. Mesa 18.3 from the Padoka stable PPA
is totally fine. The Padoka unstable and Oibaf PPAs both crash.
OpenGL applications like Team Fortress 2 seem to be fine, but applications
wrapped through Vulkan (DXVK) will just bury the whole GPU.
Even Steam itself seems to hard lock when I start it, which I can sidestep by
allowing GPU recovery with the kernel parameter and then proceeding. The GPU
does not recover when I start something Vulkan-based and graphically intensive
from Steam itself.
Below is my dmesg from the card. This may be an issue with Mesa or it may be
AMDGPU. I am very curious which, as this has affected me for about a month
now across both Arch Linux and my new Ubuntu install, making Mesa git unusable
on my new card.
To make matters worse, there is an issue where the EDID is messed up on boot
with amdgpu.dc=1, worth mentioning in case it's part of a deeper issue in
AMDGPU.
[ 55.671991] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=1614, emitted seq=1616
[ 55.671996] amdgpu 0000:01:00.0: GPU reset begin!
[ 55.678313] amdgpu 0000:01:00.0: GPU pci config reset
[ 55.682867] amdgpu 0000:01:00.0: GPU reset succeeded, trying to resume
[ 55.683678] [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
[ 55.687097] amdgpu: [powerplay] dpm has been enabled
[ 55.741072] [drm] UVD initialized successfully.
[ 55.950146] [drm] VCE initialized successfully.
[ 55.952966] [drm] recover vram bo from shadow start
[ 55.955698] [drm] recover vram bo from shadow done
[ 55.955746] WARNING: CPU: 5 PID: 120 at
/build/linux-liquorix-eJ9K8E/linux-liquorix-4.19/include/linux/dma-fence.h:503
drm_sched_job_recovery+0x1db/0x1e0 [gpu_sched]
[ 55.955747] Modules linked in: rfcomm fuse ccm ext4 jbd2 fscrypto af_packet
cmac bnep uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_common btusb btrtl btbcm btintel videodev bluetooth media
ecdh_generic crc16 nls_utf8 nls_cp437 vfat fat ext2 mbcache squashfs loop
snd_hda_codec_idt snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel
snd_hda_codec snd_hda_core arc4 snd_hwdep intel_rapl snd_pcm
x86_pkg_temp_thermal intel_powerclamp snd_seq_dummy coretemp snd_seq_oss
snd_seq_midi kvm_intel snd_seq_midi_event ath9k ath9k_common snd_rawmidi
ath9k_hw kvm snd_seq ath irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel snd_seq_device mac80211 snd_timer pcbc aesni_intel
aes_x86_64 crypto_simd cryptd glue_helper joydev input_leds snd cfg80211
tpm_infineon hp_wmi sparse_keymap
[ 55.955775] serio_raw wmi_bmof sg lpc_ich rfkill soundcore tpm_tis
tpm_tis_core tpm rng_core hp_accel lis3lv02d input_polldev evdev pcc_cpufreq
acpi_cpufreq battery ac hp_wireless sch_fq_codel parport_pc ppdev lp parport
ip_tables x_tables ipv6 crc_ccitt autofs4 btrfs xor raid6_pq libcrc32c
crc32c_generic bcache crc64 sr_mod cdrom sd_mod hid_generic usbhid amdkfd
amd_iommu_v2 amdgpu chash gpu_sched ahci i2c_algo_bit libahci ttm sdhci_pci
libata cqhci drm_kms_helper sdhci ehci_pci crc32c_intel firewire_ohci xhci_pci
drm psmouse i2c_i801 firewire_core scsi_mod mmc_core crc_itu_t e1000e ehci_hcd
i2c_core xhci_hcd thermal wmi rtc_cmos video button
[ 55.955806] CPU: 5 PID: 120 Comm: kworker/5:1 Not tainted
4.19.0-13.1-liquorix-amd64 #1 liquorix 4.19-8ubuntu1~bionic
[ 55.955807] Hardware name: Hewlett-Packard /176C, BIOS 68IAV Ver. F.70
07/30/2018
[ 55.955809] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 55.955811] RIP: 0010:drm_sched_job_recovery+0x1db/0x1e0 [gpu_sched]
[ 55.955812] Code: ff ff ff 48 8b 3c 24 48 83 c4 20 5b 5d 41 5c 41 5d 41 5e
41 5f e9 d5 ab 71 e1 4c 89 f6 4c 89 ff e8 5a fd ff ff e9 33 ff ff ff <0f> 0b eb
93 90 55 53 48 89 fb 48 8b 46 10 48 89 f7 48 8b 68 08 48
[ 55.955813] RSP: 0018:ffffc900038e7de0 EFLAGS: 00210202
[ 55.955814] RAX: 0000000000000523 RBX: ffff888811704df0 RCX:
0000000000000001
[ 55.955814] RDX: ffff88878c5d8050 RSI: ffff888107c18c00 RDI:
0000000000200286
[ 55.955815] RBP: ffff888811704d10 R08: 0000000000000000 R09:
0000000000000001
[ 55.955816] R10: ffffc90003213dd0 R11: 0000000000000026 R12:
ffff88881491cb00
[ 55.955816] R13: ffff888811704e28 R14: ffff88878c5d8000 R15:
ffff888811704c98
[ 55.955817] FS: 0000000000000000(0000) GS:ffff88881db40000(0000)
knlGS:0000000000000000
[ 55.955818] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.955819] CR2: 00000000c12ec008 CR3: 000000000320a003 CR4:
00000000001606e0
[ 55.955819] Call Trace:
[ 55.955855] amdgpu_device_gpu_recover+0x3bd/0xa30 [amdgpu]
[ 55.955860] process_one_work+0x1f5/0x420
[ 55.955862] worker_thread+0x43/0x490
[ 55.955864] ? rescuer_thread+0x490/0x490
[ 55.955865] kthread+0x153/0x170
[ 55.955866] ? kthread_park+0x80/0x80
[ 55.955869] ret_from_fork+0x35/0x40
[ 55.955870] ---[ end trace 2f9f5d70a335c56f ]---
[ 55.955875] [drm] Skip scheduling IBs!
[ 56.541150] amdgpu 0000:01:00.0: GPU reset(1) succeeded!
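
For anyone comparing several such logs, the ring timeout line above can be pulled out mechanically. This is only a minimal sketch in Python, assuming the message format shown in this dmesg. The names `TIMEOUT_RE` and `find_ring_timeouts` are hypothetical and not part of any Mesa or kernel tooling, and reading "emitted minus signaled" as the number of submissions still outstanding is my assumption.

```python
import re

# Matches amdgpu ring-timeout messages in the form seen in the log above.
# \s+ between tokens tolerates line wraps introduced by mail clients.
TIMEOUT_RE = re.compile(
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+"
    r"signaled\s+seq=(?P<signaled>\d+),\s+"
    r"emitted\s+seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(log_text):
    """Return a list of (ring, signaled, emitted, outstanding) tuples.

    'outstanding' is emitted - signaled; treating it as the count of
    unfinished submissions is an assumption, not a documented meaning.
    """
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append(
            (match.group("ring"), signaled, emitted, emitted - signaled)
        )
    return results


if __name__ == "__main__":
    # Sample taken from the dmesg in this report, including the line wrap.
    sample = (
        "[ 55.671991] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
        "ring gfx timeout,\n"
        "signaled seq=1614, emitted seq=1616\n"
    )
    for ring, signaled, emitted, outstanding in find_ring_timeouts(sample):
        print(ring, signaled, emitted, outstanding)
```

Feeding it the full output of `dmesg` as one string should list every timeout in that boot, which makes it easier to see whether the same ring is involved each time.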