[Bug 216173] New: amdgpu [gfxhub] page fault (src_id:0 ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid 3237)

From: bugzilla-daemon@kernel.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 216173] New: amdgpu [gfxhub] page fault (src_id:0 ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid 3237)
Date: Sat, 25 Jun 2022 23:52:53 +0000
Message-ID: <bug-216173-2300@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=216173

            Bug ID: 216173
           Summary: amdgpu [gfxhub] page fault (src_id:0 ring:173 vmid:1
                    pasid:32769, for process Xorg pid 2994 thread Xorg:cs0
                    pid 3237)
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.19-rc3
          Hardware: i386
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: witold.baryluk+kernel@gmail.com
        Regression: No

This appears to be a regression in 5.19-rc3 (also present in rc2; I didn't test
anything before that). It works fine on 5.18.7, and there were no issues on
5.18.0 either. Both are custom builds.

Debian, amd64.

44:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi
21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c0)

CPU: AMD Threadripper 2950X, stock
Memory: 8x32GB ECC
Motherboard: MSI MEG Creation X399


Booting looks fine, but once the Xorg server starts the screen looks corrupted,
and within seconds it freezes and stops responding.

Dmesg output:


[  140.683672] amdgpu 0000:44:00.0: amdgpu: [gfxhub] page fault (src_id:0
ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid
3237)
[  140.683678] amdgpu 0000:44:00.0: amdgpu:   in page starting at address
0x0000800106ef5000 from client 0x1b (UTCL2)
[  140.683681] amdgpu 0000:44:00.0: amdgpu:
GCVM_L2_PROTECTION_FAULT_STATUS:0x0014115B
[  140.683682] amdgpu 0000:44:00.0: amdgpu:      Faulty UTCL2 client ID: TCP
(0x8)
[  140.683684] amdgpu 0000:44:00.0: amdgpu:      MORE_FAULTS: 0x1
[  140.683685] amdgpu 0000:44:00.0: amdgpu:      WALKER_ERROR: 0x5
[  140.683686] amdgpu 0000:44:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[  140.683686] amdgpu 0000:44:00.0: amdgpu:      MAPPING_ERROR: 0x1
[  140.683687] amdgpu 0000:44:00.0: amdgpu:      RW: 0x1
...
[  151.015508] gmc_v10_0_process_interrupt: 699 callbacks suppressed
...
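
For reference, the decoded fields printed after GCVM_L2_PROTECTION_FAULT_STATUS
can be reproduced from the raw value 0x0014115B. The sketch below is
illustrative only: the bit positions in FIELDS are an assumption, chosen so
that they agree with the values in the log above, and are not taken from a
register specification. The function name decode_fault_status is hypothetical.

```python
# Assumed layout of GCVM_L2_PROTECTION_FAULT_STATUS: (field name, low bit, width).
# These positions are an assumption that happens to match the log output;
# verify against the driver's register headers before relying on them.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),    # printed in the log as "Faulty UTCL2 client ID"
    ("RW", 18, 1),
    ("VMID", 20, 4),
]


def decode_fault_status(status):
    """Split a raw status word into the named bit fields listed in FIELDS."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


if __name__ == "__main__":
    # Raw value copied from the dmesg output in this report.
    for name, value in decode_fault_status(0x0014115B).items():
        print(f"{name}: {value:#x}")
```

With this layout the VMID field also comes out as 1, which matches the vmid:1
in the first line of the fault message.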


The GPU eventually resets, but the system is still not usable:

[  161.261520] amdgpu 0000:44:00.0: amdgpu: IH ring buffer overflow
(0x0008D620, 0x00002680, 0x0000D640)
[  161.270648] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0
timeout, signaled seq=100, emitted seq=103
[  161.270854] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 2994 thread Xorg:cs0 pid 3237
[  161.271004] amdgpu 0000:44:00.0: amdgpu: GPU reset begin!
[  161.830407] amdgpu 0000:44:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[  161.830517] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[  162.084366] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[  162.101328] [drm] free PSP TMR buffer
[  162.149879] CPU: 15 PID: 188 Comm: kworker/u128:14 Tainted: G        W   E  
  5.19.0-rc3 #1
[  162.149883] Hardware name: Micro-Star International Co., Ltd. MS-7B92/MEG
X399 CREATION (MS-7B92), BIOS 1.30 03/25/2019
[  162.149884] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[  162.149890] Call Trace:
[  162.149892]  <TASK>
[  162.149893]  dump_stack_lvl+0x34/0x45
[  162.149898]  amdgpu_do_asic_reset+0x1b/0x3db [amdgpu]
[  162.150047]  amdgpu_device_gpu_recover_imp.cold+0x57e/0x910 [amdgpu]
[  162.150194]  amdgpu_job_timedout+0x14b/0x180 [amdgpu]
[  162.150323]  ? finish_task_switch.isra.0+0x7d/0x270
[  162.150326]  drm_sched_job_timedout+0x5b/0xf0 [gpu_sched]
[  162.150330]  process_one_work+0x1ab/0x300
[  162.150332]  worker_thread+0x48/0x3c0
[  162.150334]  ? rescuer_thread+0x3c0/0x3c0
[  162.150336]  kthread+0xd1/0x100
[  162.150338]  ? kthread_complete_and_exit+0x20/0x20
[  162.150339]  ret_from_fork+0x1f/0x30
[  162.150342]  </TASK>
[  162.150351] amdgpu 0000:44:00.0: amdgpu: MODE1 reset
[  162.150354] amdgpu 0000:44:00.0: amdgpu: GPU mode1 reset
[  162.150417] amdgpu 0000:44:00.0: amdgpu: GPU smu mode1 reset
[  162.653371] amdgpu 0000:44:00.0: amdgpu: GPU reset succeeded, trying to
resume
[  162.653516] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[  162.653537] [drm] VRAM is lost due to GPU reset!
[  162.653541] [drm] PSP is resuming...
[  162.834166] [drm] reserve 0xa00000 from 0x8001000000 for PSP TMR
[  162.948850] amdgpu 0000:44:00.0: amdgpu: SECUREDISPLAY: securedisplay ta
ucode is not available
[  162.948853] amdgpu 0000:44:00.0: amdgpu: SMU is resuming...
[  162.948884] amdgpu 0000:44:00.0: amdgpu: use vbios provided pptable
[  163.025704] amdgpu 0000:44:00.0: amdgpu: SMU is resumed successfully!
[  163.027473] [drm] DMUB hardware initialized: version=0x02020003
[  163.280274] [drm] kiq ring mec 2 pipe 1 q 0
[  163.284624] [drm] VCN decode and encode initialized successfully(under DPG
Mode).
[  163.284906] [drm] JPEG decode initialized successfully.
[  163.284926] amdgpu 0000:44:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on
hub 0
[  163.284928] amdgpu 0000:44:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1
on hub 0
[  163.284930] amdgpu 0000:44:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4
on hub 0
[  163.284931] amdgpu 0000:44:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5
on hub 0
[  163.284932] amdgpu 0000:44:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6
on hub 0
[  163.284934] amdgpu 0000:44:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7
on hub 0
[  163.284935] amdgpu 0000:44:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8
on hub 0
[  163.284936] amdgpu 0000:44:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9
on hub 0
[  163.284937] amdgpu 0000:44:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10
on hub 0
[  163.284938] amdgpu 0000:44:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11
on hub 0
[  163.284940] amdgpu 0000:44:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on
hub 0
[  163.284941] amdgpu 0000:44:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on
hub 0
[  163.284942] amdgpu 0000:44:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on
hub 0
[  163.284943] amdgpu 0000:44:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on
hub 0
[  163.284944] amdgpu 0000:44:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on
hub 1
[  163.284945] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1
on hub 1
[  163.284947] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4
on hub 1
[  163.284948] amdgpu 0000:44:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on
hub 1
[  163.284949] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6
on hub 1
[  163.284950] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7
on hub 1
[  163.284951] amdgpu 0000:44:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on
hub 1
[  163.292565] amdgpu 0000:44:00.0: amdgpu: recover vram bo from shadow start
[  163.292579] amdgpu 0000:44:00.0: amdgpu: recover vram bo from shadow done
[  163.292582] [drm] Skip scheduling IBs!
[  163.292583] [drm] Skip scheduling IBs!
[  163.292598] amdgpu 0000:44:00.0: amdgpu: GPU reset(3) succeeded!
[  163.292618] [drm] Skip scheduling IBs!
[  163.292626] [drm] Skip scheduling IBs!
[  163.292629] [drm] Skip scheduling IBs!
[  163.989966] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
[  166.265393] amdgpu_cs_ioctl: 3200 callbacks suppressed
[  166.265397] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  166.265812] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  166.282284] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  166.283327] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  171.486759] amdgpu_cs_ioctl: 65 callbacks suppressed
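
The numeric codes in the log above ("test failed (-110)" and "Failed to
initialize parser -125") read as negated Linux errno values. A minimal lookup
sketch follows, assuming the generic Linux errno numbering; the table and the
helper name describe_kernel_error are hypothetical and cover only the two
codes seen in this report. Why the driver returns them here is not stated in
the log.

```python
# Linux errno values (generic numbering) for the two codes seen in the log.
# Only these two entries are included; anything else is reported as unknown.
LINUX_ERRNO = {
    110: ("ETIMEDOUT", "Connection timed out"),
    125: ("ECANCELED", "Operation canceled"),
}


def describe_kernel_error(code):
    """Map a negative kernel return code to its errno name and description."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"


if __name__ == "__main__":
    for code in (-110, -125):
        print(describe_kernel_error(code))
```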

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
