From: Thomas Gleixner <tglx@linutronix.de>
To: Bert Karwatzki <spasswolf@web.de>,
	Mateusz Guzik <mjguzik@gmail.com>,
	Christian Brauner <brauner@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, linux-fsdevel@vger.kernel.org,
	adobriyan@gmail.com, jack@suse.cz, viro@zeniv.linux.org.uk,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	spasswolf@web.de, Alex Deucher <alexander.deucher@amd.com>,
	amd-gfx@lists.freedesktop.org
Subject: Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
Date: Thu, 21 May 2026 12:17:15 +0200
Message-ID: <878q9dvzh0.ffs@tglx>
In-Reply-To: <4f548d61b2dd12e01f401ce4b8c865f238f7b23c.camel@web.de>

On Thu, May 21 2026 at 11:20, Bert Karwatzki wrote:
> Am Donnerstag, dem 21.05.2026 um 11:09 +0200 schrieb Mateusz Guzik:
>
> with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
>
> [    2.793416] [    T331] ------------[ cut here ]------------
> [    2.793433] [    T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> [    2.793434] [    T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331

So either the mutex is corrupted or was never initialized.
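Why the warning points at corruption or a missing initialization: with mutex debugging enabled (CONFIG_DEBUG_MUTEXES), mutex_init() stamps the lock with a pointer to itself (lock->magic = lock), and the lock path checks that stamp. The following is a userspace sketch of that idea only. `struct dbg_mutex` and its helpers are hypothetical and far simpler than the kernel's struct mutex; the kernel merely warns (and turns off lock debugging) and carries on, whereas this sketch returns false so the outcome can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a debug mutex: only the "magic" self-pointer
 * used by the sanity check is represented, plus a locked flag. */
struct dbg_mutex {
	void *magic;	/* points back at the lock once initialized */
	int locked;
};

/* Analogue of mutex_init() with debugging enabled: stamp the magic. */
static void dbg_mutex_init(struct dbg_mutex *lock)
{
	lock->magic = lock;
	lock->locked = 0;
}

/* Analogue of the check in the lock path. A lock that was never
 * initialized (or was overwritten or copied) has magic != lock. */
static bool dbg_mutex_lock(struct dbg_mutex *lock)
{
	if (lock->magic != lock) {
		fprintf(stderr, "DEBUG_LOCKS_WARN_ON(lock->magic != lock)\n");
		return false;	/* the kernel only warns; we refuse instead */
	}
	lock->locked = 1;
	return true;
}

static void dbg_mutex_unlock(struct dbg_mutex *lock)
{
	/* Unlocking requires a valid, currently held lock. */
	assert(lock->magic == lock && lock->locked);
	lock->locked = 0;
}
```

A zero-filled lock (never initialized) and a struct copy of an initialized lock (its magic still points at the original) both fail the check, matching the two explanations above.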

> [    2.793463] [    T331] Modules linked in: amdgpu(+) hid_generic usbhid drm_client_lib i2c_algo_bit drm_buddy hid drm_ttm_helper ttm drm_exec
> drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched amdxcp drm_display_helper drm_kms_helper ahci libahci xhci_pci libata xhci_hcd drm nvme
> scsi_mod igc usbcore nvme_core scsi_common video nvme_keyring i2c_piix4 cec nvme_auth usb_common crc16 i2c_smbus wmi gpio_amdpt gpio_generic
> [    2.793518] [    T331] CPU: 17 UID: 0 PID: 331 Comm: (udev-worker) Not tainted 7.1.0-rc4-next-20260519-rcunortlockdep-dirty #465 PREEMPT 
> [    2.793534] [    T331] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
> [    2.793547] [    T331] RIP: 0010:__mutex_lock+0x58d/0x10c0
> [    2.793555] [    T331] Code: 4c 8b 4d 88 85 c0 0f 84 f8 fa ff ff 44 8b 15 ca 9b 81 00 45 85 d2 0f 85 e8 fa ff ff 48 8d 3d 1a 57 82 00 48 c7 c6 a6 51 9e 83
> <67> 48 0f b9 3a 4c 8b 4d 88 e9 cc fa ff ff 48 8b bd 78 ff ff ff e8
> [    2.793579] [    T331] RSP: 0018:ffffa497016c3510 EFLAGS: 00010246
> [    2.793588] [    T331] RAX: 0000000000000001 RBX: ffff88c33a4c2ad8 RCX: 0000000000000000
> [    2.793598] [    T331] RDX: 0000000000000001 RSI: ffffffff839e51a6 RDI: ffffffff83de3c00
> [    2.793609] [    T331] RBP: ffffa497016c35c0 R08: ffffffffc0a55d92 R09: 0000000000000000
> [    2.793619] [    T331] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [    2.793629] [    T331] R13: 0000000000000002 R14: ffffa497016c3550 R15: 0000000000268000
> [    2.793641] [    T331] FS:  00007f1f32e5b9c0(0000) GS:ffff88d23b2ca000(0000) knlGS:0000000000000000
> [    2.793653] [    T331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.793662] [    T331] CR2: 000055cdfa28f588 CR3: 0000000112e73000 CR4: 0000000000f50ef0
> [    2.793673] [    T331] PKRU: 55555554
> [    2.793678] [    T331] Call Trace:
> [    2.793683] [    T331]  <TASK>
> [    2.793687] [    T331]  ? lock_acquire+0xbe/0x2d0
> [    2.793696] [    T331]  ? init_mqd+0x122/0x190 [amdgpu]
> [    2.793809] [    T331]  ? lock_release+0xc6/0x2a0
> [    2.793816] [    T331]  ? init_mqd+0x122/0x190 [amdgpu]
> [    2.793902] [    T331]  init_mqd+0x122/0x190 [amdgpu]
> [    2.793961] [    T331]  init_mqd_hiq+0xd/0x20 [amdgpu]
> [    2.794015] [    T331]  kq_initialize.constprop.0+0x2b8/0x370 [amdgpu]
> [    2.794071] [    T331]  kernel_queue_init+0x3f/0x60 [amdgpu]
> [    2.794125] [    T331]  pm_init+0x6b/0x100 [amdgpu]
> [    2.794178] [    T331]  start_cpsch+0x1d6/0x270 [amdgpu]
> [    2.794234] [    T331]  kgd2kfd_device_init.cold+0x7b9/0xa1a [amdgpu]
> [    2.794365] [    T331]  amdgpu_amdkfd_device_init+0x190/0x260 [amdgpu]

amdgpu_amdkfd_device_init()
  kgd2kfd_device_init() {
      ....
        init_mqd()
          mutex_lock(... profiler_lock); <- FAIL

      mutex_init(...profiler_lock);
  }

Seems the famous graphics CI failed to catch this...
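The ordering problem in the call chain can be modelled in userspace. This is an analogue only: POSIX `pthread_mutex_t` stands in for the kernel's struct mutex, and `struct fake_kfd_dev`, `fake_init_mqd()` and `fake_device_init()` are hypothetical stand-ins for struct kfd_dev, init_mqd() and kgd2kfd_device_init(). It shows the corrected order the patch establishes: initialize the lock, and the field it protects, before any helper that may take it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct kfd_dev involved. */
struct fake_kfd_dev {
	void *profiler_process;
	pthread_mutex_t profiler_lock;
	int mqd_inits;		/* counts fake_init_mqd() calls */
	bool init_complete;
};

/* Stand-in for an init_mqd()-style helper, reached during device
 * init, that takes the profiler lock. */
static int fake_init_mqd(struct fake_kfd_dev *kfd)
{
	int ret = pthread_mutex_lock(&kfd->profiler_lock);

	if (ret)
		return ret;
	/* ... would consult kfd->profiler_process under the lock ... */
	kfd->mqd_inits++;
	return pthread_mutex_unlock(&kfd->profiler_lock);
}

/* Stand-in for the device init path: the lock is set up first,
 * mirroring the patch, so the queue setup below can safely take it. */
static bool fake_device_init(struct fake_kfd_dev *kfd)
{
	assert(kfd != NULL);

	kfd->profiler_process = NULL;
	if (pthread_mutex_init(&kfd->profiler_lock, NULL))
		return false;

	/* Queue setup that eventually reaches fake_init_mqd(). */
	if (fake_init_mqd(kfd))
		return false;

	kfd->init_complete = true;
	return true;
}
```

Locking a pthread mutex before pthread_mutex_init() is undefined behaviour under POSIX, so the sketch shows only the fixed ordering rather than trying to reproduce the failure.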

Thanks,

        tglx
---
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -744,6 +744,9 @@ bool kgd2kfd_device_init(struct kfd_dev
 			KGD_ENGINE_SDMA1);
 	kfd->shared_resources = *gpu_resources;
 
+	kfd->profiler_process = NULL;
+	mutex_init(&kfd->profiler_lock);
+
 	kfd->num_nodes = amdgpu_xcp_get_num_xcp(kfd->adev->xcp_mgr);
 
 	if (kfd->num_nodes == 0) {
@@ -936,9 +939,6 @@ bool kgd2kfd_device_init(struct kfd_dev
 
 	svm_range_set_max_pages(kfd->adev);
 
-	kfd->profiler_process = NULL;
-	mutex_init(&kfd->profiler_lock);
-
 	kfd->init_complete = true;
 	dev_info(kfd_device, "added device %x:%x\n", kfd->adev->pdev->vendor,
 		 kfd->adev->pdev->device);

  parent reply	other threads:[~2026-05-21 10:17 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-20 22:52 context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT Bert Karwatzki
2026-05-21  8:37 ` Thomas Gleixner
2026-05-21  8:53 ` Mateusz Guzik
2026-05-21  9:08   ` Sebastian Andrzej Siewior
2026-05-21  9:17     ` Mateusz Guzik
2026-05-21  9:09   ` Mateusz Guzik
2026-05-21  9:20     ` Bert Karwatzki
2026-05-21  9:25       ` Mateusz Guzik
2026-05-21  9:57         ` Bert Karwatzki
2026-05-21 10:17       ` Thomas Gleixner [this message]
2026-05-21 10:21         ` Bert Karwatzki
2026-05-21 10:33           ` Mateusz Guzik
2026-05-21 11:50             ` Bert Karwatzki
2026-05-21 12:01               ` Mateusz Guzik
2026-05-28 17:59                 ` Bert Karwatzki
2026-05-29 17:20                   ` Mateusz Guzik
2026-05-21 12:55         ` [PATCH] amd/amdkfd: Initialize kfd_dev::profiler lock early Thomas Gleixner
2026-05-21 10:05   ` context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT Thomas Gleixner
2026-05-21 10:13     ` Bert Karwatzki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878q9dvzh0.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=adobriyan@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=bigeasy@linutronix.de \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-next@vger.kernel.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=mjguzik@gmail.com \
    --cc=spasswolf@web.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.