From: Matthew Hall <mhall@mhcomputing.net>
To: "Deucher, Alexander" <Alexander.Deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>,
"Koenig, Christian" <Christian.Koenig@amd.com>,
"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: AMDGPU crash - request for assistance triaging / reporting
Date: Wed, 21 Feb 2024 16:14:55 -0800
Message-ID: <20240222001455.GA14576@mhcomputing.net>
In-Reply-To: <20230721180131.GA10297@mhcomputing.net>
Hi All,
Even on the older stock Ubuntu kernel, I am still seeing this GPU crash kill my user sessions, even without intense 3-D activity.
It happens under both X.org and Wayland, on the older 6.5.0 kernel, and on all the recent non-snapshot releases from kernel.org as well.
The crashes become more frequent under Wayland and with newer kernels.
On the older setup, the crash occurs about once every 2-3 weeks; on the newer one, anywhere from once to a couple of times a day.
Linux mhall-xps-01 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:59:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Is there any way I can get some help debugging or fixing this? It has been happening to multiple users of similar Lenovo models for months now.
https://gitlab.freedesktop.org/drm/amd/-/issues/2718
Regards,
Matthew.
--
2024-02-21T15:36:53.785101-08:00 mhall-xps-01 kernel: [2880378.141686] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1350708, emitted seq=1350710
2024-02-21T15:36:53.785121-08:00 mhall-xps-01 kernel: [2880378.142104] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
2024-02-21T15:36:53.785124-08:00 mhall-xps-01 kernel: [2880378.142485] amdgpu 0000:67:00.0: amdgpu: GPU reset begin!
2024-02-21T15:36:54.605112-08:00 mhall-xps-01 kernel: [2880378.965256] amdgpu 0000:67:00.0: amdgpu: MODE2 reset
2024-02-21T15:36:54.617089-08:00 mhall-xps-01 kernel: [2880378.973994] amdgpu 0000:67:00.0: amdgpu: GPU reset succeeded, trying to resume
2024-02-21T15:36:54.617097-08:00 mhall-xps-01 kernel: [2880378.974380] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
2024-02-21T15:36:54.617099-08:00 mhall-xps-01 kernel: [2880378.974465] [drm] VRAM is lost due to GPU reset!
2024-02-21T15:36:54.617100-08:00 mhall-xps-01 kernel: [2880378.974467] [drm] PSP is resuming...
2024-02-21T15:36:54.637111-08:00 mhall-xps-01 kernel: [2880378.996584] [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
2024-02-21T15:36:54.961131-08:00 mhall-xps-01 kernel: [2880379.321327] amdgpu 0000:67:00.0: amdgpu: RAS: optional ras ta ucode is not available
2024-02-21T15:36:54.973091-08:00 mhall-xps-01 kernel: [2880379.333287] amdgpu 0000:67:00.0: amdgpu: RAP: optional rap ta ucode is not available
2024-02-21T15:36:54.973099-08:00 mhall-xps-01 kernel: [2880379.333292] amdgpu 0000:67:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
2024-02-21T15:36:54.973101-08:00 mhall-xps-01 kernel: [2880379.333300] amdgpu 0000:67:00.0: amdgpu: SMU is resuming...
2024-02-21T15:36:54.977097-08:00 mhall-xps-01 kernel: [2880379.334001] amdgpu 0000:67:00.0: amdgpu: SMU is resumed successfully!
2024-02-21T15:36:54.977105-08:00 mhall-xps-01 kernel: [2880379.335963] [drm] DMUB hardware initialized: version=0x0400003F
2024-02-21T15:36:55.981082-08:00 mhall-xps-01 kernel: [2880380.339381] [drm] kiq ring mec 2 pipe 1 q 0
2024-02-21T15:36:55.985137-08:00 mhall-xps-01 kernel: [2880380.344491] [drm] VCN decode and encode initialized successfully(under DPG Mode).
2024-02-21T15:36:55.985148-08:00 mhall-xps-01 kernel: [2880380.345269] [drm] JPEG decode initialized successfully.
2024-02-21T15:36:55.985149-08:00 mhall-xps-01 kernel: [2880380.345274] amdgpu 0000:67:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
2024-02-21T15:36:55.985150-08:00 mhall-xps-01 kernel: [2880380.345278] amdgpu 0000:67:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
2024-02-21T15:36:55.985151-08:00 mhall-xps-01 kernel: [2880380.345279] amdgpu 0000:67:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
2024-02-21T15:36:55.985152-08:00 mhall-xps-01 kernel: [2880380.345281] amdgpu 0000:67:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
2024-02-21T15:36:55.985152-08:00 mhall-xps-01 kernel: [2880380.345283] amdgpu 0000:67:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
2024-02-21T15:36:55.985153-08:00 mhall-xps-01 kernel: [2880380.345284] amdgpu 0000:67:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
2024-02-21T15:36:55.985153-08:00 mhall-xps-01 kernel: [2880380.345286] amdgpu 0000:67:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
2024-02-21T15:36:55.985154-08:00 mhall-xps-01 kernel: [2880380.345288] amdgpu 0000:67:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
2024-02-21T15:36:55.985154-08:00 mhall-xps-01 kernel: [2880380.345289] amdgpu 0000:67:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
2024-02-21T15:36:55.985155-08:00 mhall-xps-01 kernel: [2880380.345291] amdgpu 0000:67:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
2024-02-21T15:36:55.985156-08:00 mhall-xps-01 kernel: [2880380.345293] amdgpu 0000:67:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
2024-02-21T15:36:55.985168-08:00 mhall-xps-01 kernel: [2880380.345295] amdgpu 0000:67:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
2024-02-21T15:36:55.985168-08:00 mhall-xps-01 kernel: [2880380.345296] amdgpu 0000:67:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
2024-02-21T15:36:55.985169-08:00 mhall-xps-01 kernel: [2880380.345298] amdgpu 0000:67:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
2024-02-21T15:36:55.985169-08:00 mhall-xps-01 kernel: [2880380.345299] amdgpu 0000:67:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
2024-02-21T15:36:55.991915-08:00 mhall-xps-01 kernel: [2880380.352137] amdgpu 0000:67:00.0: amdgpu: recover vram bo from shadow start
2024-02-21T15:36:55.991926-08:00 mhall-xps-01 kernel: [2880380.352143] amdgpu 0000:67:00.0: amdgpu: recover vram bo from shadow done
2024-02-21T15:36:55.991928-08:00 mhall-xps-01 kernel: [2880380.352174] amdgpu 0000:67:00.0: amdgpu: GPU reset(1) succeeded!
2024-02-21T15:36:55.991929-08:00 mhall-xps-01 kernel: [2880380.352193] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991930-08:00 mhall-xps-01 kernel: [2880380.352203] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991932-08:00 mhall-xps-01 kernel: [2880380.352210] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991933-08:00 mhall-xps-01 kernel: [2880380.352221] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991935-08:00 mhall-xps-01 kernel: [2880380.352227] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991936-08:00 mhall-xps-01 kernel: [2880380.352235] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991937-08:00 mhall-xps-01 kernel: [2880380.352258] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991939-08:00 mhall-xps-01 kernel: [2880380.352262] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991940-08:00 mhall-xps-01 kernel: [2880380.352284] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991941-08:00 mhall-xps-01 kernel: [2880380.352299] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991942-08:00 mhall-xps-01 kernel: [2880380.352312] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991943-08:00 mhall-xps-01 kernel: [2880380.352325] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991944-08:00 mhall-xps-01 kernel: [2880380.352339] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991945-08:00 mhall-xps-01 kernel: [2880380.352358] [drm] Skip scheduling IBs!
2024-02-21T15:36:55.991947-08:00 mhall-xps-01 kernel: [2880380.352366] [drm] Skip scheduling IBs!
2024-02-21T15:36:56.989097-08:00 mhall-xps-01 kernel: [2880381.345919] [drm] PCIE GART of 512M enabled (table at 0x00000080FEB00000).
2024-02-21T15:36:56.989113-08:00 mhall-xps-01 kernel: [2880381.345949] [drm] PSP is resuming...
2024-02-21T15:36:57.065064-08:00 mhall-xps-01 kernel: [2880381.421985] [drm] reserve 0xa00000 from 0x80fd000000 for PSP TMR
2024-02-21T15:36:57.161054-08:00 mhall-xps-01 kernel: [2880381.521330] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
2024-02-21T15:36:57.177051-08:00 mhall-xps-01 kernel: [2880381.536487] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
2024-02-21T15:36:57.177055-08:00 mhall-xps-01 kernel: [2880381.536493] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
2024-02-21T15:36:57.177056-08:00 mhall-xps-01 kernel: [2880381.536498] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000d, smu fw if version = 0x0000000f, smu fw program = 0, version = 0x00492000 (73.32.0)
2024-02-21T15:36:57.177057-08:00 mhall-xps-01 kernel: [2880381.536504] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
2024-02-21T15:36:57.217055-08:00 mhall-xps-01 kernel: [2880381.575099] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
2024-02-21T15:36:57.217063-08:00 mhall-xps-01 kernel: [2880381.576452] [drm] DMUB hardware initialized: version=0x02020017
2024-02-21T15:36:57.221054-08:00 mhall-xps-01 kernel: [2880381.579774] [drm] kiq ring mec 2 pipe 1 q 0
2024-02-21T15:36:57.225060-08:00 mhall-xps-01 kernel: [2880381.583037] [drm] VCN decode and encode initialized successfully(under DPG Mode).
2024-02-21T15:36:57.225062-08:00 mhall-xps-01 kernel: [2880381.583055] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
2024-02-21T15:36:57.225063-08:00 mhall-xps-01 kernel: [2880381.583057] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
2024-02-21T15:36:57.225063-08:00 mhall-xps-01 kernel: [2880381.583059] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
2024-02-21T15:36:57.225064-08:00 mhall-xps-01 kernel: [2880381.583060] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
2024-02-21T15:36:57.225065-08:00 mhall-xps-01 kernel: [2880381.583061] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
2024-02-21T15:36:57.225066-08:00 mhall-xps-01 kernel: [2880381.583063] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
2024-02-21T15:36:57.225066-08:00 mhall-xps-01 kernel: [2880381.583064] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
2024-02-21T15:36:57.225067-08:00 mhall-xps-01 kernel: [2880381.583066] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
2024-02-21T15:36:57.225067-08:00 mhall-xps-01 kernel: [2880381.583067] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
2024-02-21T15:36:57.225068-08:00 mhall-xps-01 kernel: [2880381.583069] amdgpu 0000:03:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
2024-02-21T15:36:57.225068-08:00 mhall-xps-01 kernel: [2880381.583070] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
2024-02-21T15:36:57.225069-08:00 mhall-xps-01 kernel: [2880381.583071] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
2024-02-21T15:36:57.237126-08:00 mhall-xps-01 kernel: [2880381.595618] [drm] Skip scheduling IBs!
2024-02-21T15:36:57.237133-08:00 mhall-xps-01 kernel: [2880381.595637] [drm] Skip scheduling IBs!
2024-02-21T15:36:57.237134-08:00 mhall-xps-01 kernel: [2880381.595648] [drm] Skip scheduling IBs!
... SNIPPED ...
2024-02-21T15:36:57.621302-08:00 mhall-xps-01 kernel: [2880381.979192] [drm] Skip scheduling IBs!
2024-02-21T15:36:57.621303-08:00 mhall-xps-01 kernel: [2880381.979199] [drm] Skip scheduling IBs!
2024-02-21T15:36:57.621303-08:00 mhall-xps-01 kernel: [2880381.979205] [drm] Skip scheduling IBs!
2024-02-21T15:36:57.665432-08:00 mhall-xps-01 kernel: [2880382.021491] workqueue: delayed_fput hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND
2024-02-21T15:37:23.225433-08:00 mhall-xps-01 kernel: [2880397.341439] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:37:33.465409-08:00 mhall-xps-01 kernel: [2880407.580685] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:37:43.705091-08:00 mhall-xps-01 kernel: [2880417.820114] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:37:53.945389-08:00 mhall-xps-01 kernel: [2880428.059779] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:38:04.185799-08:00 mhall-xps-01 kernel: [2880438.299089] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:38:14.425239-08:00 mhall-xps-01 kernel: [2880448.538928] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:38:24.665410-08:00 mhall-xps-01 kernel: [2880458.778463] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:38:34.905460-08:00 mhall-xps-01 kernel: [2880469.017993] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:38:45.145251-08:00 mhall-xps-01 kernel: [2880479.257543] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:38:55.385743-08:00 mhall-xps-01 kernel: [2880489.496921] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:39:05.625422-08:00 mhall-xps-01 kernel: [2880499.736383] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:39:15.865455-08:00 mhall-xps-01 kernel: [2880509.976241] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:39:26.105410-08:00 mhall-xps-01 kernel: [2880520.215766] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
2024-02-21T15:39:36.345444-08:00 mhall-xps-01 kernel: [2880530.455123] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
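To put numbers on the crash frequency described in the message, the `ring ... timeout` entries can be counted per day. The following is only a minimal sketch in Python: the regular expression is derived from the single `amdgpu_job_timedout` line in the excerpt above and may not match logs from other kernels or syslog configurations, and the function name `count_ring_timeouts` is hypothetical, introduced here for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one "ring sdma0 timeout" line in the excerpt above.
# Assumption: other timeout entries use the same syslog and message layout.
TIMEOUT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})T\S+ \S+ kernel: \[\s*\d+\.\d+\] "
    r"\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def count_ring_timeouts(lines):
    """Return a Counter mapping (date, ring name) to the number of timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[(match["date"], match["ring"])] += 1
    return counts
```

The function accepts any iterable of lines, so an open log file can be passed directly; the location of that file depends on the distribution and is not stated in the message.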
Thread overview: 4+ messages
2023-07-21 3:43 AMDGPU crash - request for assistance triaging / reporting Matthew Hall
2023-07-21 13:33 ` Deucher, Alexander
2023-07-21 18:01 ` Matthew Hall
2024-02-22 0:14 ` Matthew Hall [this message]