From: Peter Zijlstra <peterz@infradead.org>
To: alexander.deucher@amd.com, christian.koenig@amd.com
Cc: Borislav Petkov <bp@alien8.de>, amd-gfx@lists.freedesktop.org
Subject: amdgpu vs kexec
Date: Mon, 16 Jun 2025 11:39:45 +0200
Message-ID: <20250616093945.GA1613200@noisy.programming.kicks-ass.net>

Hi guys,

My (Intel Sapphire Rapids) workstation has a RX 7800 XT and when I kexec
a bunch of times, the amdgpu driver gets upset and barfs on boot.

It starts like so:

[   16.926489] amdgpu 0000:19:00.0: amdgpu: Found VCN firmware Version ENC: 1.23 DEC: 9 VEP: 0 Revision: 16
[   16.980590] amdgpu 0000:19:00.0: amdgpu: reserve 0xa700000 from 0x83e0000000 for PSP TMR
[   19.204585] amdgpu 0000:19:00.0: amdgpu: failed to load ucode SMC(0x32)
[   19.227333] amdgpu 0000:19:00.0: amdgpu: psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
[   19.256420] amdgpu 0000:19:00.0: amdgpu: PSP load smu failed!
[   19.467875] [drm:psp_v13_0_ring_destroy [amdgpu]] *ERROR* Fail to stop psp ring
[   19.491771] amdgpu 0000:19:00.0: amdgpu: PSP firmware loading failed
[   19.513372] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[   19.540397] amdgpu 0000:19:00.0: amdgpu: amdgpu_device_ip_init failed
[   19.562177] amdgpu 0000:19:00.0: amdgpu: Fatal error during GPU init
[   19.583785] amdgpu 0000:19:00.0: amdgpu: amdgpu: finishing device.
[   19.605474] ------------[ cut here ]------------
[   19.615370] WARNING: CPU: 0 PID: 704 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:631 amdgpu_irq_put+0x46/0x70 [amdgpu]
[   19.638375] Modules linked in: rndis_host hid_generic cdc_ether usbhid usbnet mii hid amdgpu(+) amdxcp gpu_sched drm_panel_backlight_quirks drm_buddy drm_ttm_helper ttm video wmi drm_exec drm_suballoc_helper drm_display_helper ast ahci cec rc_core iTCO_wdt libahci drm_shmem_helper xhci_pci drm_client_lib intel_pmc_bxt libata xhci_hcd iTCO_vendor_support igb nvme watchdog drm_kms_helper idxd atlantic intel_lpss_pci usbcore scsi_mod i2c_algo_bit drm nvme_core i2c_i801 idxd_bus crc16 intel_lpss macsec dca i2c_smbus idma64 scsi_common ucsi_acpi typec_ucsi typec roles usb_common pinctrl_alderlake button efivarfs
[   19.754852] CPU: 0 UID: 0 PID: 704 Comm: kworker/0:5 Not tainted 6.15.0-dirty #51 PREEMPT(full)
[   19.773770] Hardware name: Supermicro SYS-531A-I/X13SRA-TF, BIOS 1.1b 08/01/2023
[   19.789693] Workqueue: events work_for_cpu_fn
[   19.799066] RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
[   19.810480] Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 c3 cc cc cc cc e9 5a fd ff ff <0f> 0b b8 ea ff ff ff c3 cc cc cc cc b8 ea ff ff ff c3 cc cc cc cc
[   19.851066] RSP: 0018:ff55eefd81aafd48 EFLAGS: 00010246
[   19.862314] RAX: ff466ca3653aac00 RBX: ff466ca2d7f98b40 RCX: 0000000000000000
[   19.877675] RDX: 0000000000000000 RSI: ff466ca2d7fa5990 RDI: ff466ca2d7f80000
[   19.893037] RBP: ff466ca2d7f90388 R08: 0000000000000000 R09: ff55eefd81aafb10
[   19.908401] R10: ff466cc1ffcd2fa8 R11: 0000000000000003 R12: ff466ca2d7f90830
[   19.923763] R13: ff466ca2d7f80010 R14: ff466ca2d7f80000 R15: ff466ca2d7fa5990
[   19.939132] FS:  0000000000000000(0000) GS:ff466cc1db2ee000(0000) knlGS:0000000000000000
[   19.956551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.968920] CR2: 00007f45f54e3de8 CR3: 000000207e624003 CR4: 0000000000f71ef0
[   19.984282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   19.999645] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   20.015010] PKRU: 55555554
[   20.020820] Call Trace:
[   20.026075]  <TASK>
[   20.030581]  amdgpu_fence_driver_hw_fini+0xfc/0x130 [amdgpu]
[   20.042894]  amdgpu_device_fini_hw+0xb7/0x2c6 [amdgpu]
[   20.054152]  amdgpu_driver_load_kms.cold+0x18/0x2e [amdgpu]
[   20.066323]  amdgpu_pci_probe+0x1cf/0x470 [amdgpu]
[   20.076775]  local_pci_probe+0x42/0x90
[   20.084839]  work_for_cpu_fn+0x17/0x30
[   20.092899]  process_one_work+0x188/0x340
[   20.101523]  worker_thread+0x256/0x3a0
[   20.109584]  ? __pfx_worker_thread+0x10/0x10
[   20.118767]  kthread+0xf9/0x240
[   20.125519]  ? __pfx_kthread+0x10/0x10
[   20.133578]  ret_from_fork+0x31/0x50
[   20.141268]  ? __pfx_kthread+0x10/0x10
[   20.149326]  ret_from_fork_asm+0x1a/0x30
[   20.157765]  </TASK>
[   20.162457] ---[ end trace 0000000000000000 ]---

and then continues to barf for a while longer. Full dmesg attached.

When I do a full power cycle it's okay again for a few kexecs, but it
will ultimately go unhappy again.

I'm doing a 'normal' systemctl kexec, which I figure should more or less
shut things down normally. It's not like a crash-kexec -- which is a
whole other story and can be expected to cause trouble.
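
For triaging a log like the excerpt above, the failing amdgpu lines can be
pulled out mechanically. What follows is only a minimal sketch, assuming
Python 3 and a decompressed dmesg text; the function name amdgpu_failures()
and the list of marker strings are hypothetical, chosen from the lines quoted
in this mail, and are not an exhaustive failure signature.

```python
import re

# Matches the "[   19.204585] " timestamp prefix printed by dmesg.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Assumption: substrings taken from the excerpt in this mail, nothing more.
FAILURE_MARKERS = ("failed", "*ERROR*", "Fatal error", "WARNING:")


def amdgpu_failures(dmesg_text):
    """Return (timestamp, message) pairs for amdgpu lines that look like failures."""
    hits = []
    for line in dmesg_text.splitlines():
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a dmesg timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if "amdgpu" in msg and any(k in msg for k in FAILURE_MARKERS):
            hits.append((ts, msg))
    return hits


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
[   16.980590] amdgpu 0000:19:00.0: amdgpu: reserve 0xa700000 from 0x83e0000000 for PSP TMR
[   19.204585] amdgpu 0000:19:00.0: amdgpu: failed to load ucode SMC(0x32)
[   19.256420] amdgpu 0000:19:00.0: amdgpu: PSP load smu failed!
[   20.092899]  process_one_work+0x188/0x340
"""

if __name__ == "__main__":
    for ts, msg in amdgpu_failures(SAMPLE):
        print(f"{ts:10.6f}  {msg}")
```

Comparing the first hit across a good boot and a bad boot is one way to see
where the two diverge; here that would be the SMC ucode load at 19.204585.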

[-- Attachment #2: dmesg.gz --]
[-- Type: application/gzip, Size: 35326 bytes --]

