From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 91880] Radeonsi on Grenada cards (r9 390) exceptionally
unstable and poorly performing
Date: Sun, 18 Mar 2018 20:31:41 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1842849196=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[131.252.210.165])
by gabe.freedesktop.org (Postfix) with ESMTP id 258356E2CB
for ; Sun, 18 Mar 2018 20:31:43 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1842849196==
Content-Type: multipart/alternative; boundary="15214051031.D4aa.26066"
Content-Transfer-Encoding: 7bit
--15214051031.D4aa.26066
Date: Sun, 18 Mar 2018 20:31:43 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=91880
--- Comment #186 from Chris Heald ---
I've been doing a lot of experimentation, and I've found a few more things that
I feel are probably related:
* I can force a system hard-lock by doing anything which disables a monitor.
Notably, going full-screen under KDE/Xorg does this, but I can trigger it just
as easily by disabling a monitor with xrandr. Fullscreen under GNOME doesn't
seem to trigger the issue, which I suspect is due to GNOME's use of Mutter for
screen management.
* Occasionally, the system boots up and gets stuck with a 150MHz memory clock,
rather than clocking up to the 1500MHz state. This causes the display
corruption even if the sclk is set to 500MHz+. Setting the mclk mask manually
fixes the display corruption.
* I've been experimenting with different kernels ranging from 4.4 to 4.16rc5.
Earlier kernels feel more susceptible to hard-locking, though the later kernels
aren't immune to it.
* I tried a fresh Ubuntu 16.04 LTS install, and while it did NOT exhibit the
artifacting behavior, the system hard-locked within a few minutes of light
desktop usage.
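For the stuck memory clock in the second item, the level table that amdgpu exposes through sysfs can be checked programmatically. The following is a minimal sketch, assuming the `pp_dpm_mclk` file lists one level per line in the form `0: 150Mhz` with a trailing `*` on the active level. The sample text and the helper names `parse_dpm_levels` and `stuck_at_lowest` are illustrative and not taken from this system.

```python
import re

# Assumed line format of pp_dpm_mclk: "<index>: <freq>Mhz", with " *" marking
# the currently active level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return a list of (index, mhz, is_active) tuples from pp_dpm_* text."""
    levels = []
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            levels.append((int(match.group(1)), int(match.group(2)),
                           match.group(3) is not None))
    return levels


def stuck_at_lowest(levels):
    """True when the active level is the slowest one in the table."""
    active = [mhz for _, mhz, is_active in levels if is_active]
    return bool(active) and active[0] == min(mhz for _, mhz, _ in levels)


# Hypothetical sample mirroring the 150MHz / 1500MHz states described above.
SAMPLE = "0: 150Mhz *\n1: 1500Mhz\n"

if __name__ == "__main__":
    levels = parse_dpm_levels(SAMPLE)
    print(levels, stuck_at_lowest(levels))
```

On a live system the text would presumably come from reading /sys/class/drm/card0/device/pp_dpm_mclk, with the card index being an assumption.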
I've had a few classes of exceptions show up in kern.log:
On 4.4, my kde/wayland session hard-froze when moving a window, and produced a
log like this:
kernel: [ 116.904013] radeon 0000:06:00.0: GPU fault detected: 146 0x0d8e040c
kernel: [ 116.904017] radeon 0000:06:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0001776C
kernel: [ 116.904019] radeon 0000:06:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E10400C
kernel: [ 116.904021] VM fault (0x0c, vmid 7) at page 96108, read from 'TC3' (0x54433300) (260)
kernel: [ 127.306156] radeon 0000:06:00.0: ring 0 stalled for more than 10404msec
kernel: [ 127.306164] radeon 0000:06:00.0: GPU lockup (current fence id 0x0000000000002419 last fence id 0x0000000000002431 on ring 0)
kernel: [ 127.357942] radeon 0000:06:00.0: Saved 2200 dwords of commands on ring 0.
kernel: [ 127.357961] radeon 0000:06:00.0: GPU softreset: 0x00000009
kernel: [ 127.357963] radeon 0000:06:00.0: GRBM_STATUS=0xF5D01028
kernel: [ 127.357965] radeon 0000:06:00.0: GRBM_STATUS2=0x50000008
kernel: [ 127.357968] radeon 0000:06:00.0: GRBM_STATUS_SE0=0xEC400002
kernel: [ 127.357970] radeon 0000:06:00.0: GRBM_STATUS_SE1=0xEC400002
kernel: [ 127.357972] radeon 0000:06:00.0: GRBM_STATUS_SE2=0x08000002
kernel: [ 127.357974] radeon 0000:06:00.0: GRBM_STATUS_SE3=0xEC000002
kernel: [ 127.357976] radeon 0000:06:00.0: SRBM_STATUS=0x20000040
kernel: [ 127.357978] radeon 0000:06:00.0: SRBM_STATUS2=0x00000000
kernel: [ 127.357980] radeon 0000:06:00.0: SDMA0_STATUS_REG = 0x46CEE557
kernel: [ 127.357982] radeon 0000:06:00.0: SDMA1_STATUS_REG = 0x46CEE557
kernel: [ 127.357984] radeon 0000:06:00.0: CP_STAT = 0x84228600
kernel: [ 127.357986] radeon 0000:06:00.0: CP_STALLED_STAT1 = 0x00000c00
kernel: [ 127.357988] radeon 0000:06:00.0: CP_STALLED_STAT2 = 0x40000000
kernel: [ 127.357991] radeon 0000:06:00.0: CP_STALLED_STAT3 = 0x00000400
kernel: [ 127.357993] radeon 0000:06:00.0: CP_CPF_BUSY_STAT = 0x00000006
kernel: [ 127.357995] radeon 0000:06:00.0: CP_CPF_STALLED_STAT1 = 0x00000003
kernel: [ 127.357997] radeon 0000:06:00.0: CP_CPF_STATUS = 0x80000063
kernel: [ 127.357999] radeon 0000:06:00.0: CP_CPC_BUSY_STAT = 0x00000000
kernel: [ 127.358001] radeon 0000:06:00.0: CP_CPC_STALLED_STAT1 = 0x00000000
kernel: [ 127.358003] radeon 0000:06:00.0: CP_CPC_STATUS = 0x00000000
kernel: [ 127.358005] radeon 0000:06:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: [ 127.358007] radeon 0000:06:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
kernel: [ 127.404670] radeon 0000:06:00.0: GRBM_SOFT_RESET=0x00010001
kernel: [ 127.404725] radeon 0000:06:00.0: SRBM_SOFT_RESET=0x00000100
kernel: [ 127.405874] radeon 0000:06:00.0: GRBM_STATUS=0x00003028
kernel: [ 127.405876] radeon 0000:06:00.0: GRBM_STATUS2=0x00000008
kernel: [ 127.405878] radeon 0000:06:00.0: GRBM_STATUS_SE0=0x00000006
kernel: [ 127.405880] radeon 0000:06:00.0: GRBM_STATUS_SE1=0x00000006
kernel: [ 127.405882] radeon 0000:06:00.0: GRBM_STATUS_SE2=0x00000006
kernel: [ 127.405884] radeon 0000:06:00.0: GRBM_STATUS_SE3=0x00000006
kernel: [ 127.405885] radeon 0000:06:00.0: SRBM_STATUS=0x20000A40
kernel: [ 127.405887] radeon 0000:06:00.0: SRBM_STATUS2=0x00000000
kernel: [ 127.405889] radeon 0000:06:00.0: SDMA0_STATUS_REG = 0x46CEE557
kernel: [ 127.405891] radeon 0000:06:00.0: SDMA1_STATUS_REG = 0x46CEE557
kernel: [ 127.405893] radeon 0000:06:00.0: CP_STAT = 0x00000000
kernel: [ 127.405895] radeon 0000:06:00.0: CP_STALLED_STAT1 = 0x00000000
kernel: [ 127.405896] radeon 0000:06:00.0: CP_STALLED_STAT2 = 0x00000000
kernel: [ 127.405898] radeon 0000:06:00.0: CP_STALLED_STAT3 = 0x00000000
kernel: [ 127.405900] radeon 0000:06:00.0: CP_CPF_BUSY_STAT = 0x00000000
kernel: [ 127.405902] radeon 0000:06:00.0: CP_CPF_STALLED_STAT1 = 0x00000000
kernel: [ 127.405903] radeon 0000:06:00.0: CP_CPF_STATUS = 0x00000000
kernel: [ 127.405905] radeon 0000:06:00.0: CP_CPC_BUSY_STAT = 0x00000000
kernel: [ 127.405907] radeon 0000:06:00.0: CP_CPC_STALLED_STAT1 = 0x00000000
kernel: [ 127.405909] radeon 0000:06:00.0: CP_CPC_STATUS = 0x00000000
kernel: [ 127.405929] radeon 0000:06:00.0: GPU reset succeeded, trying to resume
kernel: [ 127.658172] [drm:ci_dpm_enable [radeon]] *ERROR* ci_start_dpm failed
kernel: [ 127.658189] [drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
kernel: [ 127.658194] [drm] probing gen 2 caps for device 1022:1453 = 733903/e
kernel: [ 127.658197] [drm] PCIE gen 3 link speeds already enabled
kernel: [ 127.664213] [drm] PCIE GART of 2048M enabled (table at 0x0000000000326000).
kernel: [ 127.664341] radeon 0000:06:00.0: WB enabled
kernel: [ 127.664344] radeon 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000200000c00 and cpu addr 0xffff8807f3799c00
kernel: [ 127.664346] radeon 0000:06:00.0: fence driver on ring 1 use gpu addr 0x0000000200000c04 and cpu addr 0xffff8807f3799c04
kernel: [ 127.664347] radeon 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000200000c08 and cpu addr 0xffff8807f3799c08
kernel: [ 127.664349] radeon 0000:06:00.0: fence driver on ring 3 use gpu addr 0x0000000200000c0c and cpu addr 0xffff8807f3799c0c
kernel: [ 127.664350] radeon 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000200000c10 and cpu addr 0xffff8807f3799c10
kernel: [ 127.664772] radeon 0000:06:00.0: fence driver on ring 5 use gpu addr 0x0000000000078b30 and cpu addr 0xffffc90003c38b30
kernel: [ 127.664933] radeon 0000:06:00.0: fence driver on ring 6 use gpu addr 0x0000000200000c18 and cpu addr 0xffff8807f3799c18
kernel: [ 127.664934] radeon 0000:06:00.0: fence driver on ring 7 use gpu addr 0x0000000200000c1c and cpu addr 0xffff8807f3799c1c
kernel: [ 127.666482] [drm] ring test on 0 succeeded in 2 usecs
kernel: [ 127.666568] [drm] ring test on 1 succeeded in 2 usecs
kernel: [ 127.666586] [drm] ring test on 2 succeeded in 2 usecs
kernel: [ 127.666735] [drm] ring test on 3 succeeded in 3 usecs
kernel: [ 127.666745] [drm] ring test on 4 succeeded in 3 usecs
kernel: [ 127.692636] [drm] ring test on 5 succeeded in 1 usecs
kernel: [ 127.712543] [drm] UVD initialized successfully.
kernel: [ 127.813896] [drm] ring test on 6 succeeded in 708 usecs
kernel: [ 127.813920] [drm] ring test on 7 succeeded in 3 usecs
kernel: [ 127.813921] [drm] VCE initialized successfully.
kernel: [ 127.814029] [drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
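The first four lines of that log can be cross-checked against each other. The sketch below splits the raw VM_CONTEXT1_PROTECTION_FAULT_STATUS value into the fields that the kernel prints on the "VM fault" line. The bit positions are an assumption, chosen because they reproduce the decoded line above, and `decode_fault` is a hypothetical helper name.

```python
# Bit layout assumed for VM_CONTEXT1_PROTECTION_FAULT_STATUS on this part;
# it is chosen to match the decoded "VM fault" line in the log.
PROTECTIONS_MASK = 0xF        # bits 3:0
CLIENT_ID_SHIFT = 12
CLIENT_ID_MASK = 0x1FF        # 9-bit memory client id
RW_BIT = 1 << 24              # set for writes, clear for reads
VMID_SHIFT = 25
VMID_MASK = 0xF


def decode_fault(status, addr, mc_client):
    """Split the raw fault registers into the fields the kernel prints."""
    return {
        "protections": status & PROTECTIONS_MASK,
        "vmid": (status >> VMID_SHIFT) & VMID_MASK,
        "client_id": (status >> CLIENT_ID_SHIFT) & CLIENT_ID_MASK,
        "access": "write" if status & RW_BIT else "read",
        "page": addr,
        # The client value holds up to four ASCII bytes, NUL padded.
        "client": mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii"),
    }


if __name__ == "__main__":
    print(decode_fault(0x0E10400C, 0x0001776C, 0x54433300))
```

With the values from the log this gives protections 0x0c, vmid 7, client id 260, a read access, page 96108 (0x1776C), and the client name TC3, matching the kernel's own summary.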
On 4.15.10-041510-generic, I left my computer running overnight and came back
to it frozen with this in kern.log:
Mar 18 04:25:10 Gaia kernel: [ 559.092721] BUG: stack guard page was hit at 000000001ecd1fa8 (stack is 0000000020941864..00000000cf703fbf)
Mar 18 04:25:10 Gaia kernel: [ 559.092729] kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
Mar 18 04:25:10 Gaia kernel: [ 559.092733] Modules linked in:
nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay xfrm_user
xfrm4_tunnel tunnel4 l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel ipcomp
xfrm_ipcomp udp_tunnel esp4 pppox ah4 af_key xfrm_algo xt_CHECKSUM
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c
ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables
ip6table_filter ip6_tables devlink iptable_filter binfmt_misc
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel
edac_mce_amd snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib kvm_amd
snd_hwdep kvm uvcvideo snd_seq_midi irqbypass snd_seq_midi_event snd_rawmidi
crct10dif_pclmul videobuf2_vmalloc crc32_pclmul
Mar 18 04:25:10 Gaia kernel: [ 559.092784] videobuf2_memops videobuf2_v4l2
snd_seq ghash_clmulni_intel videobuf2_core snd_pcm pcbc videodev snd_seq_device
media snd_timer joydev aesni_intel aes_x86_64 snd crypto_simd input_leds
glue_helper serio_raw soundcore cryptd ccp k10temp shpchp mac_hid wmi_bmof
sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic
usbhid hid amdkfd amd_iommu_v2 amdgpu chash radeon i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_piix4
r8169 ahci mii libahci wmi gpio_amdpt gpio_generic
Mar 18 04:25:10 Gaia kernel: [ 559.092832] CPU: 5 PID: 7352 Comm: tail Tainted: G W 4.15.10-041510-generic #201803152130
Mar 18 04:25:10 Gaia kernel: [ 559.092834] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F10 12/01/2017
Mar 18 04:25:10 Gaia kernel: [ 559.092881] RIP: 0010:amdgpu_get_pp_num_states+0x88/0x120 [amdgpu]
Mar 18 04:25:10 Gaia kernel: [ 559.092884] RSP: 0018:ffffb3cb8a837ca8 EFLAGS: 00010282
Mar 18 04:25:10 Gaia kernel: [ 559.092888] RAX: 00000000000000d4 RBX: ffffb3cb8a837cac RCX: 0000000000000001
Mar 18 04:25:10 Gaia kernel: [ 559.092890] RDX: 0000000000000000 RSI: ffffffffc087a88c RDI: 0000000000000000
Mar 18 04:25:10 Gaia kernel: [ 559.092893] RBP: ffffb3cb8a837d20 R08: ffffffffc087a865 R09: ffff88c9ecebd98b
Mar 18 04:25:10 Gaia kernel: [ 559.092895] R10: 0000000000000000 R11: ffff88c9ecebd98a R12: ffff88c9ecebd000
Mar 18 04:25:10 Gaia kernel: [ 559.092898] R13: ffffffffc087a858 R14: 00000000000000d4 R15: 0000000000000993
Mar 18 04:25:10 Gaia kernel: [ 559.092901] FS: 00007fccb1787540(0000) GS:ffff88c9fe740000(0000) knlGS:0000000000000000
Mar 18 04:25:10 Gaia kernel: [ 559.092904] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 04:25:10 Gaia kernel: [ 559.092906] CR2: ffffb3cb8a838000 CR3: 00000004a30d0000 CR4: 00000000003406e0
Mar 18 04:25:10 Gaia kernel: [ 559.092909] Call Trace:
Mar 18 04:25:10 Gaia kernel: [ 559.092918] ? tty_insert_flip_string_fixed_flag+0x86/0xe0
Mar 18 04:25:10 Gaia kernel: [ 559.092925] dev_attr_show+0x23/0x60
Mar 18 04:25:10 Gaia kernel: [ 559.092931] sysfs_kf_seq_show+0xa3/0x130
Mar 18 04:25:10 Gaia kernel: [ 559.092935] kernfs_seq_show+0x27/0x30
Mar 18 04:25:10 Gaia kernel: [ 559.092939] seq_read+0xe5/0x430
Mar 18 04:25:10 Gaia kernel: [ 559.092943] kernfs_fop_read+0x137/0x180
Mar 18 04:25:10 Gaia kernel: [ 559.092948] __vfs_read+0x3a/0x170
Mar 18 04:25:10 Gaia kernel: [ 559.092954] ? security_file_permission+0xa1/0xc0
Mar 18 04:25:10 Gaia kernel: [ 559.092958] vfs_read+0x8e/0x130
Mar 18 04:25:10 Gaia kernel: [ 559.092962] SyS_read+0x55/0xc0
Mar 18 04:25:10 Gaia kernel: [ 559.092967] do_syscall_64+0x73/0x130
Mar 18 04:25:10 Gaia kernel: [ 559.092973] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Mar 18 04:25:10 Gaia kernel: [ 559.092976] RIP: 0033:0x7fccb12b5081
Mar 18 04:25:10 Gaia kernel: [ 559.092978] RSP: 002b:00007ffc17d84d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Mar 18 04:25:10 Gaia kernel: [ 559.092982] RAX: ffffffffffffffda RBX: 0000000000002000 RCX: 00007fccb12b5081
Mar 18 04:25:10 Gaia kernel: [ 559.092984] RDX: 0000000000002000 RSI: 00007ffc17d84db0 RDI: 0000000000000003
Mar 18 04:25:10 Gaia kernel: [ 559.092986] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fccb1313b40
Mar 18 04:25:10 Gaia kernel: [ 559.092988] R10: 00000000fffffff3 R11: 0000000000000246 R12: 00007ffc17d84db0
Mar 18 04:25:10 Gaia kernel: [ 559.092991] R13: 0000000000000003 R14: ffffffffffffffff R15: 000055e8f3b747e0
Mar 18 04:25:10 Gaia kernel: [ 559.092994] Code: c7 c2 7a a8 87 c0 be 00 10 00 00 4c 89 e7 e8 d0 08 90 d1 41 89 c7 8b 45 8c 85 c0 74 72 48 8d 5d 8c 45 31 f6 49 c7 c5 58 a8 87 c0 <42> 8b 44 b3 04 44 89 f1 4d 89 e8 83 f8 0a 74 2d 83 f8 02 49 c7
Mar 18 04:25:10 Gaia kernel: [ 559.093080] RIP: amdgpu_get_pp_num_states+0x88/0x120 [amdgpu] RSP: ffffb3cb8a837ca8
Mar 18 04:25:10 Gaia kernel: [ 559.093084] ---[ end trace dbba232a9ca4c5c7 ]---
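The register dump in that trace can be cross-checked by hand. Assuming the bytes marked `<42> 8b 44 b3 04` in the Code line decode to a 32-bit load from `rbx + r14*4 + 4` (a reading of the x86-64 encoding, not something the log states), the effective address lands exactly on CR2. A small sketch of the arithmetic, with `effective_address` as a hypothetical helper:

```python
# Register values copied from the oops above.
RBX = 0xFFFFB3CB8A837CAC
R14 = 0x00000000000000D4
CR2 = 0xFFFFB3CB8A838000

# Assumed decoding of "42 8b 44 b3 04": mov eax, [rbx + r14*4 + 4]
SCALE = 4
DISPLACEMENT = 4


def effective_address(base, index, scale, disp):
    """Compute base + index*scale + disp, wrapped to 64 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    addr = effective_address(RBX, R14, SCALE, DISPLACEMENT)
    print(hex(addr), addr == CR2, R14)
```

If that decoding is right, the faulting load was on element 212 (0xd4) of an array of 4-byte entries that starts on the stack, and the address it touched is the first byte of the next page. That would be consistent with a loop walking past the end of a stack buffer, though that interpretation is a guess.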
Possibly related, if I `cat pp_num_states` from a terminal, I get a
segmentation fault:
root@Gaia:~# cat /sys/class/drm/card0/device/pp_num_states
Segmentation fault
I'm going to continue to dig. Let me know what logs/tests/whatnot I can
provide that would be useful.
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15214051031.D4aa.26066--
--===============1842849196==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline
X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs
IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg==
--===============1842849196==--