From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
Date: Sat, 15 Jun 2019 16:58:59 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #37 on bug 110674 from Tom B ---
5.1.9 makes this bug even worse. It now crashes as soon as the display server
is started.

Running sensors now gives an error:


ERROR: Can't get value of subfeature fan1_input: I/O error
ERROR: Can't get value of subfeature power1_average: I/O error
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +37.0°C

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +34.8°C  (high = +70.0°C)
Tctl:         +61.8°C

amdgpu-pci-4400
Adapter: PCI adapter
vddgfx:       +0.74 V
fan1:             N/A  (min =    0 RPM, max = 3850 RPM)
temp1:        +39.0°C  (crit = +118.0°C, hyst = -273.1°C)
power1:           N/A  (cap = 250.00 W)

k10temp-pci-00cb
Adapter: PCI adapter
Tdie:         +33.2°C  (high = +70.0°C)
Tctl:         +60.2°C



I can't even see the wattage now.
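
To check whether the I/O errors come from the driver itself rather than from lm-sensors, the hwmon attribute files can be read directly. The following is a minimal Python sketch, not a definitive diagnostic. It assumes the amdgpu attributes are exposed under /sys/class/hwmon/hwmon*/ (the index varies between systems), and the helper name read_hwmon_attr is hypothetical.

```python
import errno
from pathlib import Path


def read_hwmon_attr(path):
    """Return (value, error) for one hwmon attribute file.

    value is the raw integer from the file (RPM for fan1_input,
    microwatts for power1_average), or None if the read failed.
    error is None on success, otherwise the errno name, e.g. "EIO".
    """
    try:
        return int(Path(path).read_text().strip()), None
    except OSError as exc:
        return None, errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    # Assumption: the attributes of interest live under
    # /sys/class/hwmon/hwmon*/; files that do not exist are skipped.
    for hwmon in sorted(Path("/sys/class/hwmon").glob("hwmon*")):
        for attr in ("fan1_input", "power1_average"):
            attr_file = hwmon / attr
            if attr_file.exists():
                print(attr_file, read_hwmon_attr(attr_file))
```

A result of (None, "EIO") for a file would match the "I/O error" lines that sensors prints above.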

# cat /sys/kernel/debug/dri/0/amdgpu_pm_info

Clock Gating Flags Mask: 0x860200
        Graphics Medium Grain Clock Gating: Off
        Graphics Medium Grain memory Light Sleep: Off
        Graphics Coarse Grain Clock Gating: Off
        Graphics Coarse Grain memory Light Sleep: Off
        Graphics Coarse Grain Tree Shader Clock Gating: Off
        Graphics Coarse Grain Tree Shader Light Sleep: Off
        Graphics Command Processor Light Sleep: Off
        Graphics Run List Controller Light Sleep: Off
        Graphics 3D Coarse Grain Clock Gating: Off
        Graphics 3D Coarse Grain memory Light Sleep: Off
        Memory Controller Light Sleep: Off
        Memory Controller Medium Grain Clock Gating: On
        System Direct Memory Access Light Sleep: Off
        System Direct Memory Access Medium Grain Clock Gating: Off
        Bus Interface Medium Grain Clock Gating: Off
        Bus Interface Light Sleep: Off
        Unified Video Decoder Medium Grain Clock Gating: Off
        Video Compression Engine Medium Grain Clock Gating: Off
        Host Data Path Light Sleep: Off
        Host Data Path Medium Grain Clock Gating: Off
        Digital Right Management Medium Grain Clock Gating: Off
        Digital Right Management Light Sleep: On
        Rom Medium Grain Clock Gating: On
        Data Fabric Medium Grain Clock Gating: On

GFX Clocks and Power:
        1373 MHz (PSTATE_SCLK)
        1001 MHz (PSTATE_MCLK)
        737 mV (VDDGFX)

GPU Temperature: 39 C

UVD: Disabled

VCE: Disabled


No clocks or wattage!

I'm guessing 34d07ce3d6a120056e4763ae9a3db0d769ab7c63 "fix ring test failure
issue during s3 in vce 3.0 (V2)" is to blame as dmesg (attached in next post)
says


[   20.584937] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=25, emitted seq=27
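
When comparing dmesg logs across kernel versions, it can help to pull the ring name and the two sequence numbers out of each timeout line. Here is a small Python sketch under stated assumptions: the regular expression is based only on the single line quoted above, the function name parse_ring_timeout is hypothetical, and reading "emitted minus signaled" as the number of submitted jobs whose fences have not yet signaled is my interpretation of the message, not something the log states.

```python
import re

# Pattern derived from the one dmesg line quoted in this comment.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Extract ring name and fence sequence numbers from a timeout line.

    Returns None if the line is not an amdgpu ring timeout message.
    """
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    signaled = int(match["signaled"])
    emitted = int(match["emitted"])
    return {
        "ring": match["ring"],
        "signaled": signaled,
        "emitted": emitted,
        # Assumed meaning: fences emitted but not yet signaled.
        "pending": emitted - signaled,
    }


line = (
    "[   20.584937] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring gfx timeout, signaled seq=25, emitted seq=27"
)
print(parse_ring_timeout(line))
# prints {'ring': 'gfx', 'signaled': 25, 'emitted': 27, 'pending': 2}
```

The dmesg wrapping in the mail is undone by joining the two lines before parsing.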

It would be nice to see some acknowledgement from AMD on this.

