AMD-GFX Archive on lore.kernel.org
From: "Michel Dänzer" <michel@daenzer.net>
To: "Zhu, Jiadong" <Jiadong.Zhu@amd.com>
Cc: "Tuikov, Luben" <Luben.Tuikov@amd.com>,
	"Huang, Ray" <Ray.Huang@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
Date: Mon, 14 Nov 2022 18:15:16 +0100
Message-ID: <ddf6786a-7bdc-c8fa-e432-7e20498bb26d@daenzer.net>
In-Reply-To: <fb72d05b-dc74-fa84-51cf-3c3911aa46fc@daenzer.net>

On 2022-11-10 18:00, Michel Dänzer wrote:
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
>>
>> I reproduced the glxgears 400fps scenario locally. The issue is caused by patch 5, "drm/amdgpu: Improve the software ring priority scheduler", which slows down the low-priority scheduler thread while a high-priority IB is executing. I'll drop this patch, as we cannot identify whether the workload is GPU-bound from the unsignaled fence, etc.
> 
> Okay, I'm testing with patches 1-4 only now.
> 
> So far I haven't noticed any negative effects, no slowdowns or intermittent freezes.

I'm afraid I may have run into another issue. I just hit a GPU hang, see the
journalctl excerpt below.

(I tried rebooting the machine via SSH after this, but it never seemed to
complete, so I had to hard-power-off the machine by holding the power
button for a few seconds)

I can't be sure that the GPU hang is directly related to this series,
but it seems plausible, and I hadn't hit a GPU hang in months if not
over a year before. If this series results in potentially hitting a
GPU hang every few days, it definitely doesn't provide enough benefit
to justify that.


Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166051, emitted seq=1166052
Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:22 thor kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
Nov 14 17:21:22 thor kernel: amdgpu 0000:05:00.0: amdgpu: free PSP TMR buffer
Nov 14 17:21:22 thor kernel: amdgpu 0000:05:00.0: amdgpu: MODE2 reset
Nov 14 17:21:22 thor kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 14 17:21:22 thor kernel: [drm] PCIE GART of 1024M enabled.
Nov 14 17:21:22 thor kernel: [drm] PTB located at 0x000000F400A00000
Nov 14 17:21:22 thor kernel: [drm] VRAM is lost due to GPU reset!
Nov 14 17:21:22 thor kernel: [drm] PSP is resuming...
Nov 14 17:21:22 thor kernel: [drm] reserve 0x400000 from 0xf431c00000 for PSP TMR
Nov 14 17:21:23 thor kernel: amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 14 17:21:23 thor kernel: amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The process will be terminated.
Nov 14 17:21:23 thor kernel: [drm] kiq ring mec 2 pipe 1 q 0
Nov 14 17:21:23 thor kernel: amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 14 17:21:23 thor kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Nov 14 17:21:23 thor kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Nov 14 17:21:23 thor kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset(2) failed
Nov 14 17:21:23 thor kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
Nov 14 17:21:23 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
[...]
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166052, emitted seq=1166052
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:33 thor kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
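
For anyone comparing this against their own journal, here is a minimal Python sketch for pulling the ring timeout lines out of a longer log. The regular expression is written only against the two amdgpu_job_timedout lines in the excerpt above. The helper name parse_ring_timeouts and the "outstanding" figure (emitted seq minus signaled seq) are illustrative assumptions, not something the driver reports itself.

```python
import re

# Pattern written against the amdgpu_job_timedout lines in the excerpt above;
# other kernel versions may word the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(lines):
    """Return (ring, signaled, emitted, outstanding) per ring timeout line.

    'outstanding' is simply emitted - signaled, an illustrative figure.
    """
    results = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a ring timeout line
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append((match.group("ring"), signaled, emitted, emitted - signaled))
    return results


# Three lines copied from the excerpt above.
SAMPLE = [
    "Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring gfx_high timeout, signaled seq=1166051, emitted seq=1166052",
    "Nov 14 17:21:22 thor kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!",
    "Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring gfx_high timeout, signaled seq=1166052, emitted seq=1166052",
]

if __name__ == "__main__":
    for ring, signaled, emitted, outstanding in parse_ring_timeouts(SAMPLE):
        print(f"{ring}: signaled={signaled} emitted={emitted} outstanding={outstanding}")
```

Real journal output could be fed in the same way, for example by passing the lines of a saved journalctl dump to parse_ring_timeouts instead of SAMPLE.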


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer


