Intel-XE Archive on lore.kernel.org
* REGRESSION on Linux 6.19-rc1
@ 2025-12-16 16:50 Borah, Chaitanya Kumar
From: Borah, Chaitanya Kumar @ 2025-12-16 16:50 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: rafael.j.wysocki, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, linux-pm, linux-kernel,
	regressions, Kurmi, Suresh Kumar, Saarinen, Jani, linux-kernel

Hello Sathyanarayanan,

Hope you are doing well. I am Chaitanya from the Linux graphics team at
Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on
the drm-tip repository.

Since the backmerge of Linux 6.19-rc1, we are seeing the following
regression on the PTL machines.

`````````````````````````````````````````````````````````````````````````````````
<4>[    8.197433] ============================================
<4>[    8.197437] WARNING: possible recursive locking detected
<4>[    8.197440] 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 Not tainted
<4>[    8.197444] --------------------------------------------
<4>[    8.197447] cpuhp/0/20 is trying to acquire lock:
<4>[    8.197450] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197463]
                   but task is already holding lock:
<4>[    8.197466] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197477]
                   other info that might help us debug this:
<4>[    8.197480]  Possible unsafe locking scenario:

<4>[    8.197483]        CPU0
<4>[    8.197485]        ----
<4>[    8.197487]   lock(cpu_hotplug_lock);
<4>[    8.197490]   lock(cpu_hotplug_lock);
<4>[    8.197493]
                    *** DEADLOCK ***

<4>[    8.197496]  May be due to missing lock nesting notation

<4>[    8.197499] 2 locks held by cpuhp/0/20:
<4>[    8.197503]  #0: ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197513]  #1: ffffffff83489f60 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197523]
                   stack backtrace:
<4>[    8.197528] CPU: 0 UID: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 PREEMPT(voluntary)
<4>[    8.197530] Hardware name: Intel Corporation Panther Lake Client Platform/PTL-UH LP5 T3 RVP1, BIOS PTLPFWI1.R00.3383.D10.2510222219 10/22/2025
<4>[    8.197532] Call Trace:
<4>[    8.197532]  <TASK>
<4>[    8.197533]  dump_stack_lvl+0x91/0xf0
<4>[    8.197537]  dump_stack+0x10/0x20
<4>[    8.197538]  print_deadlock_bug+0x23f/0x320
<4>[    8.197542]  __lock_acquire+0x146e/0x2790
<4>[    8.197548]  lock_acquire+0xc4/0x2c0
<4>[    8.197550]  ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197556]  cpus_read_lock+0x41/0x110
<4>[    8.197558]  ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197561]  rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197565]  rapl_cpu_online+0x85/0x87 [intel_rapl_msr]
<4>[    8.197568]  ? __pfx_rapl_cpu_online+0x10/0x10 [intel_rapl_msr]
<4>[    8.197570]  cpuhp_invoke_callback+0x41f/0x6c0
<4>[    8.197573]  ? cpuhp_thread_fun+0x6d/0x290
<4>[    8.197575]  cpuhp_thread_fun+0x1e2/0x290
<4>[    8.197578]  ? smpboot_thread_fn+0x26/0x290
<4>[    8.197581]  smpboot_thread_fn+0x12f/0x290
<4>[    8.197584]  ? __pfx_smpboot_thread_fn+0x10/0x10
<4>[    8.197586]  kthread+0x11f/0x250
<4>[    8.197589]  ? __pfx_kthread+0x10/0x10
<4>[    8.197592]  ret_from_fork+0x344/0x3a0
<4>[    8.197595]  ? __pfx_kthread+0x10/0x10
<4>[    8.197597]  ret_from_fork_asm+0x1a/0x30
<4>[    8.197604]  </TASK>
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [2].

After bisecting the tree, the following patch [3] seems to be the first
"bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 748d6ba43afde7e9ac27443233203995cc15d235
Author: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Date:   Thu Nov 20 16:05:39 2025 -0800

     powercap: intel_rapl: Enable MSR-based RAPL PMU support
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We also verified that the issue is not seen when the patch is reverted.

Could you please check why the patch causes this regression and provide 
a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/intel-xe/combined-alt.html?
[2] https://intel-gfx-ci.01.org/tree/intel-xe/xe-4242-05b7c58b3367dca84d4745dfcac3b5d4ee142404/bat-ptl-2/boot0.txt
[3] https://cgit.freedesktop.org/drm-tip/commit/?id=748d6ba43afde7e9ac27443233203995cc15d235
