From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 108781] 4.19 Regression - Hawaii (R9 390) boot failure - Invalid PCC GPIO / invalid powerlevel state / Fatal error during GPU init
Date: Sat, 17 Nov 2018 21:25:58 +0000
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=108781

Bug ID:    108781
Summary:   4.19 Regression - Hawaii (R9 390) boot failure - Invalid PCC GPIO / invalid powerlevel state / Fatal error during GPU init
Product:   DRI
Version:   unspecified
Hardware:  x86-64 (AMD64)
OS:        Linux (All)
Status:    NEW
Severity:  critical
Priority:  medium
Component: DRM/AMDgpu
Assignee:  dri-devel@lists.freedesktop.org
Reporter:  jamespharvey20@gmail.com

Created attachment 142499 --> https://bugs.freedesktop.org/attachment.cgi?id=142499&action=edit
dmesg (journalctl) of failure on 4.19.2.arch1-1

Arch Linux kernel 4.18.16.arch1-1 works with these kernel parameters:

 radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1
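
When comparing boots, it can help to confirm which per-module options were actually passed. The following is a minimal sketch, not part of the original report: the helper name `module_params` is hypothetical, and it assumes options follow the usual `module.param=value` form on the kernel command line.

```python
from collections import defaultdict


def module_params(cmdline: str) -> dict:
    """Group `module.param=value` tokens from a kernel command line by module."""
    params = defaultdict(dict)
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        module, dot, name = key.partition(".")
        if sep and dot:  # skip bare flags and options without a module prefix
            params[module][name] = value
    return dict(params)


# On a live system the string would come from /proc/cmdline.
cmdline = "radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1"
print(module_params(cmdline))
```

Any dotted key is treated as a module option, so entries such as `systemd.unit=...` would also be grouped; filter on the module names of interest if that matters.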

After upgrading to 4.19.2.arch1-1, I started getting this failure.  Going back
to 4.19.arch1-1 still fails the same way.

Full dmesg (journalctl) output is attached for 4.19.2.arch1-1 (failing),
4.19.arch1-1 (failing), and 4.18.16.arch1-1 (working).  The pertinent part of
the failure is included below so that it is searchable.

This failure occurs when booting to a tty, so no X logs are involved.  (On
4.18.16.arch1-1 you might see a [drm:generic_reg_wait [amdgpu]] error and
backtrace.  That has been happening forever, but the system works and it
doesn't cause a noticeable problem.)
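
To compare the failing and working logs, the DRM `*ERROR*` lines can be pulled out on their own. This is a rough sketch rather than a tool used in the report: the function name `drm_errors` and the regular expression are assumptions based on the line format seen in the excerpt below.

```python
import re

# Matches lines such as:
#   [drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed
ERROR_RE = re.compile(
    r"\[drm:(?P<func>\S+) \[(?P<module>\w+)\]\] \*ERROR\* (?P<msg>.*)"
)


def drm_errors(log_text: str) -> list:
    """Return (function, message) pairs for every DRM *ERROR* line."""
    found = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            found.append((match.group("func"), match.group("msg").strip()))
    return found


sample = """\
[drm] PCIE gen 3 link speeds already enabled
[drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB: invalid powerlevel state: 0!
[drm] UVD initialized successfully.
[drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed
"""
for func, msg in drm_errors(sample):
    print(f"{func}: {msg}")
```

Running the same function over each attached log and diffing the results would show which errors are new in 4.19 and which, like the `generic_reg_wait` one, already appear on the working kernel.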

-----

# lspci -v
...
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] (rev 80) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Hawaii PRO [Radeon R9 290/390]
        Flags: bus master, fast devsel, latency 0, IRQ 75, NUMA node 0
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=8M]
        I/O ports at 8000 [size=256]
        Memory at dfe00000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] Resizable BAR <?>
        Capabilities: [270] Secondary PCI Express <?>
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Kernel driver in use: amdgpu
        Kernel modules: radeon, amdgpu

-----

[drm] Invalid PCC GPIO: 13!
        ui class: none
        internal class: boot
        caps:
        uvd    vclk: 0 dclk: 0
                power level 0    sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
        status: c r b
        ui class: performance
        internal class: none
        caps:
        uvd    vclk: 0 dclk: 0
                power level 0    sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
                power level 1    sclk: 105000 mclk: 150000 pcie gen: 3 pcie lanes: 16
        status:
[drm] amdgpu: dpm initialized
[drm] Found UVD firmware Version: 1.64 Family ID: 9
[drm] Found VCE firmware Version: 50.10 Binary ID: 2
[drm] PCIE gen 3 link speeds already enabled
[drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB: invalid powerlevel state: 0!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] Display Core initialized with v3.1.59!
[drm] DM_MST: Differing MST start on aconnector: 00000000d3bd29d7 [id: 55]
[drm] DM_MST: Differing MST start on aconnector: 000000004b0d56b6 [id: 57]
[drm] DM_MST: Differing MST start on aconnector: 0000000058d5a853 [id: 59]
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] UVD initialized successfully.
[drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_init.cold.14 [amdgpu]] *ERROR* hw_init of IP block <vce_v2_0> failed -110
amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
amdgpu 0000:03:00.0: Fatal error during GPU init
[drm] amdgpu: finishing device.
------------[ cut here ]------------
Memory manager not clean during takedown.
WARNING: CPU: 0 PID: 670 at drivers/gpu/drm/drm_mm.c:950 drm_mm_takedown+0x1f/0x30 [drm]
Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i>
 x_tables sr_mod cdrom btrfs xor sd_mod dm_thin_pool dm_persistent_data raid6_pq dm_bio_prison dm_bufio libcrc32c crc32c_gener>
CPU: 0 PID: 670 Comm: kworker/0:4 Not tainted 4.19.0-arch1-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C602, BIOS P1.90 04/12/2018
Workqueue: events work_for_cpu_fn
RIP: 0010:drm_mm_takedown+0x1f/0x30 [drm]
Code: 0d d0 cb 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 47 38 48 83 c7 38 48 39 c7 75 01 c3 48 c7 c7 08 b1 1b c1 e8 5b 10 >
RSP: 0018:ffff91764827bd08 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8e5a1b613200 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffff8de9d696 RDI: 00000000ffffffff
RBP: ffff8e5a0ca729a0 R08: 0000000000000001 R09: 00000000000005aa
R10: 0000000000000004 R11: 0000000000000000 R12: ffff8e5a1b6132e8
R13: 0000000000000000 R14: 0000000000000170 R15: ffff8e5a0c69e650
FS:  0000000000000000(0000) GS:ffff8e5a1f800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f26530480 CR3: 00000001f0a0a006 CR4: 00000000000606f0
Call Trace:
 amdgpu_vram_mgr_fini+0x27/0x50 [amdgpu]
 ttm_bo_clean_mm+0xa9/0xb0 [ttm]
 amdgpu_ttm_fini+0x71/0x100 [amdgpu]
 amdgpu_bo_fini+0xe/0x30 [amdgpu]
 gmc_v7_0_sw_fini+0x32/0x60 [amdgpu]
 amdgpu_device_fini+0x2cc/0x4aa [amdgpu]
 amdgpu_driver_unload_kms+0x42/0x90 [amdgpu]
 amdgpu_driver_load_kms+0x168/0x2c0 [amdgpu]
 drm_dev_register+0x109/0x140 [drm]
 amdgpu_pci_probe+0x13c/0x1c0 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x20/0x40
 local_pci_probe+0x41/0x90
 work_for_cpu_fn+0x16/0x20
 process_one_work+0x1eb/0x410
 worker_thread+0x218/0x3d0
 ? process_one_work+0x410/0x410
 kthread+0x112/0x130
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
---[ end trace 3cf1bcf02bf4fe1a ]---
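
The `-110` in the `hw_init of IP block <vce_v2_0> failed` line above is a negated errno value. As a small sketch (the helper name `describe_kernel_error` is hypothetical, and the lookup is only meaningful when run on Linux, where errno numbering matches the kernel's), it can be decoded like this:

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Translate a negative kernel return value into its errno name and text."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{code} = -{name} ({os.strerror(number)})"


# Errno numbering is platform-specific; 110 is ETIMEDOUT on Linux.
print(describe_kernel_error(-110))
```

A timeout here would be consistent with the preceding `ring 12 test failed` message, though the log alone does not say why the VCE ring test did not complete.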

