public inbox for linux-kernel@vger.kernel.org
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: <rafael.j.wysocki@intel.com>, <guohanjun@huawei.com>,
	<gshan@redhat.com>, <miguel.luis@oracle.com>,
	<catalin.marinas@arm.com>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Linux regressions mailing list <regressions@lists.linux.dev>,
	<linuxarm@huawei.com>, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>, <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: 6.11/regression/bisected - The commit c1385c1f0ba3 caused a new possible recursive locking detected warning at computer boot.
Date: Tue, 23 Jul 2024 11:24:56 +0100
Message-ID: <20240723112456.000053b3@Huawei.com>
In-Reply-To: <CABXGCsPvqBfL5hQDOARwfqasLRJ_eNPBbCngZ257HOe=xbWDkA@mail.gmail.com>

On Tue, 23 Jul 2024 00:36:18 +0500
Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> wrote:

> Hi,
> The first Fedora update to the 6.11 kernel
> (kernel-debug-6.11.0-0.rc0.20240716gitd67978318827.2.fc41.x86_64)
> brings a new warning: possible recursive locking detected.

Hi Mikhail,

Thanks for the report.

This is an interesting corner case and perhaps reflects a flawed
assumption we were making: that on this path, anything that can happen for an
initially present CPU can also happen for a hotplugged one. On the hotplug
path the lock was always held, and hence the static_key_enable() would
have failed.
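
To make the locking pattern concrete, here is a minimal userspace sketch of
it in C. It is not kernel code: every toy_* name is hypothetical, and the
"lock" is just a depth counter for a single thread. It assumes what the
quoted trace reports, namely that the hotplug thread already holds
cpu_hotplug_lock when the online callback runs, and that static_key_enable()
takes the lock again via cpus_read_lock(). The kernel also provides
static_key_enable_cpuslocked() for callers that already hold the lock; the
second callback models that variant. Whether that is the right fix here is
not something this sketch establishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for cpu_hotplug_lock: a depth counter for one thread. */
static int toy_lock_depth;
/* Counts the places where lockdep would print a recursion warning. */
static int toy_recursion_warnings;
static bool toy_key_enabled;

static void toy_cpus_read_lock(void)
{
	if (toy_lock_depth > 0)
		toy_recursion_warnings++;	/* lock already held by us */
	toy_lock_depth++;
}

static void toy_cpus_read_unlock(void)
{
	toy_lock_depth--;
}

/* Models static_key_enable(): takes the hotplug lock itself. */
static void toy_static_key_enable(void)
{
	toy_cpus_read_lock();
	toy_key_enabled = true;
	toy_cpus_read_unlock();
}

/* Models static_key_enable_cpuslocked(): the caller must hold the lock. */
static void toy_static_key_enable_cpuslocked(void)
{
	assert(toy_lock_depth > 0);
	toy_key_enabled = true;
}

typedef void (*toy_online_cb)(void);

/* Models the hotplug thread: the callback runs with the lock held. */
static void toy_cpuhp_thread(toy_online_cb cb)
{
	toy_cpus_read_lock();
	cb();
	toy_cpus_read_unlock();
}

/* Runs one callback from a clean state, returns the warning count. */
static int toy_run(toy_online_cb cb)
{
	toy_lock_depth = 0;
	toy_recursion_warnings = 0;
	toy_key_enabled = false;
	toy_cpuhp_thread(cb);
	return toy_recursion_warnings;
}
```

Called from a main(), toy_run(toy_static_key_enable) counts one recursive
acquisition, while toy_run(toy_static_key_enable_cpuslocked) counts none and
still leaves the key enabled.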

I'm somewhat stumped as to why this path couldn't happen for a
hotplugged CPU, and hence why this is a new problem.

Maybe it is just that no one provides _CPC for CPUs in virtual
machines, so the path was never seen? QEMU doesn't generate ACPI tables with
_CPC today, so maybe that's it.

So maybe this has revealed an existing latent bug. There have been
QEMU patches for _CPC in the past, but they were never merged. I'll hack them
into an x86 virtual machine and see if we hit the same bug you have
here before and after the series.

Either way, we obviously need to fix it for the current kernel (and maybe
backport the fix if I can verify it's a latent bug). I'll get a test
setup running asap and see if I can replicate it.

+CC x86 maintainers.

Thanks,

Jonathan




> The trace looks like:
> ACPI: button: Power Button [PWRF]
> 
> ============================================
> WARNING: possible recursive locking detected
> 6.11.0-0.rc0.20240716gitd67978318827.2.fc41.x86_64+debug #1 Not tainted
> --------------------------------------------
> cpuhp/0/22 is trying to acquire lock:
> ffffffffb7f9cb40 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x12/0x20
> 
> but task is already holding lock:
> ffffffffb7f9cb40 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xcd/0x6f0
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(cpu_hotplug_lock);
>   lock(cpu_hotplug_lock);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by cpuhp/0/22:
>  #0: ffffffffb7f9cb40 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xcd/0x6f0
>  #1: ffffffffb7f9f2e0 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0xcd/0x6f0
>  #2: ffffffffb7f1d650 (freq_invariance_lock){+.+.}-{3:3}, at: init_freq_invariance_cppc+0xf4/0x1e0
> 
> stack backtrace:
> CPU: 0 PID: 22 Comm: cpuhp/0 Not tainted
> 6.11.0-0.rc0.20240716gitd67978318827.2.fc41.x86_64+debug #1
> Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI,
> BIOS 2611 04/07/2024
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x84/0xd0
>  __lock_acquire+0x27e3/0x5c70
>  ? __pfx___lock_acquire+0x10/0x10
>  ? cppc_get_perf_caps+0x64f/0xf60
>  lock_acquire+0x1ae/0x540
>  ? static_key_enable+0x12/0x20
>  ? __pfx_lock_acquire+0x10/0x10
>  ? __pfx___might_resched+0x10/0x10
>  cpus_read_lock+0x40/0xe0
>  ? static_key_enable+0x12/0x20
>  static_key_enable+0x12/0x20
>  freq_invariance_enable+0x13/0x40
>  init_freq_invariance_cppc+0x17e/0x1e0
>  ? __pfx_init_freq_invariance_cppc+0x10/0x10
>  ? acpi_cppc_processor_probe+0x1046/0x2300
>  acpi_cppc_processor_probe+0x11ae/0x2300
>  ? _raw_spin_unlock_irqrestore+0x4f/0x80
>  ? __pfx_acpi_cppc_processor_probe+0x10/0x10
>  ? __pfx_acpi_scan_drop_device+0x10/0x10
>  ? acpi_fetch_acpi_dev+0x79/0xe0
>  ? __pfx_acpi_fetch_acpi_dev+0x10/0x10
>  ? __pfx_acpi_soft_cpu_online+0x10/0x10
>  acpi_soft_cpu_online+0x114/0x330
>  cpuhp_invoke_callback+0x2c7/0xa40
>  ? __pfx_lock_release+0x10/0x10
>  ? __pfx_lock_release+0x10/0x10
>  ? cpuhp_thread_fun+0xcd/0x6f0
>  cpuhp_thread_fun+0x33a/0x6f0
>  ? smpboot_thread_fn+0x56/0x930
>  smpboot_thread_fn+0x54b/0x930
>  ? __pfx_smpboot_thread_fn+0x10/0x10
>  ? __pfx_smpboot_thread_fn+0x10/0x10
>  kthread+0x2d2/0x3a0
>  ? _raw_spin_unlock_irq+0x28/0x60
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x31/0x70
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
> 
> Bisect is pointed to commit
> commit c1385c1f0ba3b80bd12f26c440612175088c664c (HEAD)
> Author: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Date:   Wed May 29 14:34:28 2024 +0100
> 
>     ACPI: processor: Simplify initial onlining to use same path for cold and hotplug
> 
>     Separate code paths, combined with a flag set in acpi_processor.c to
>     indicate a struct acpi_processor was for a hotplugged CPU ensured that
>     per CPU data was only set up the first time that a CPU was initialized.
>     This appears to be unnecessary as the paths can be combined by letting
>     the online logic also handle any CPUs online at the time of driver load.
> 
>     Motivation for this change, beyond simplification, is that ARM64
>     virtual CPU HP uses the same code paths for hotplug and cold path in
>     acpi_processor.c so had no easy way to set the flag for hotplug only.
>     Removing this necessity will enable ARM64 vCPU HP to reuse the existing
>     code paths.
> 
>     Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>     Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
>     Tested-by: Miguel Luis <miguel.luis@oracle.com>
>     Reviewed-by: Gavin Shan <gshan@redhat.com>
>     Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
>     Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>     Link: https://lore.kernel.org/r/20240529133446.28446-2-Jonathan.Cameron@huawei.com
>     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> 
>  drivers/acpi/acpi_processor.c   |  7 +++----
>  drivers/acpi/processor_driver.c | 43 ++++++++++++-------------------------------
>  include/acpi/processor.h        |  2 +-
>  3 files changed, 16 insertions(+), 36 deletions(-)
> 
> And I can confirm that after reverting c1385c1f0ba3 the issue is gone.
> 
> I also attach here a full kernel log and build config.
> 
> My hardware specs: https://linux-hardware.org/?probe=c6de14f5b8
> 
> Jonathan, can you look into this, please?
> 

