linux-rt-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Guangbo Cui <jckeep.cuiguangbo@gmail.com>
Cc: Clark Williams <clrkwllms@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Waiman Long <longman@redhat.com>,
	linux-rt-devel@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
Date: Fri, 10 Oct 2025 09:15:55 +0200	[thread overview]
Message-ID: <20251010071555.u4ubYPid@linutronix.de> (raw)
In-Reply-To: <20251009150651.93618-1-jckeep.cuiguangbo@gmail.com>

On 2025-10-09 15:06:50 [+0000], Guangbo Cui wrote:
> When injecting AER errors under PREEMPT_RT, the kernel may trigger a
> lockdep warning about an invalid wait context:
> 
> ```
> [ 1850.950780] [ BUG: Invalid wait context ]
> [ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
> [ 1850.951457] -----------------------------
> [ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
> [ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
> [ 1850.952731] other info that might help us debug this:
> [ 1850.952997] context-{5:5}
> [ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
> [ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
> [ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
> [ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
> ```
> 
> This happens because the AER injection path (`aer_inj_read_config()`)
> is called in the context of the PCIe PME interrupt thread, which runs
> through `irq_forced_thread_fn()` under PREEMPT_RT. In this context,
> `pci_lock` (a raw_spinlock_t) is held with interrupts disabled
> (`spin_lock_irqsave()`), and then `aer_inj_read_config()` tries to
> acquire `inject_lock`, which is a `rt_spin_lock`. (Thanks Waiman Long)

This is way more verbose than it has to be. The key point here is that
it is a "sleeping while atomic" condition because, as you correctly
identified, a raw_spinlock_t is held while a sleeping lock und
PREEMPT_RT is accessed. No need to paste halve of lockdep's output.

> `rt_spin_lock` may sleep, so acquiring it while holding a raw spinlock
> with IRQs disabled violates the lock ordering rules. This leads to
> the “Invalid wait context” lockdep warning.
> 
> In other words, the lock order looks like this:
> 
> ```
>   raw_spin_lock_irqsave(&pci_lock);
>       ↓
>   rt_spin_lock(&inject_lock);   <-- not allowed
> ```

...

> To fix this, convert `inject_lock` from an `rt_spin_lock` to a
> `raw_spinlock_t`, a raw spinlock is safe and consistent with the
> surrounding locking scheme.

Switching the lock type is a possible solution. It must not become the
default solution if things like this happen. In this case it is okay as
lock usage is limited and the only source of "unbound latencies" is the
list it protects. This is of no concern as you explained it should not
contain thousands of items _and_ it is used only for debugging and
development.

> This resolves the lockdep “Invalid wait context” warning observed when
> injecting correctable AER errors through `/dev/aer_inject` on PREEMPT_RT.
> 
> This was discovered while testing PCIe AER error injection on an arm64
> QEMU virtual machine:
> 
> ```
>   qemu-system-aarch64 \
>       -nographic \
>       -machine virt,highmem=off,gic-version=3 \
>       -cpu cortex-a72 \
>       -kernel arch/arm64/boot/Image \
>       -initrd initramfs.cpio.gz \
>       -append "console=ttyAMA0 root=/dev/ram rdinit=/linuxrc earlyprintk nokaslr" \
>       -m 2G \
>       -smp 1 \
>       -netdev user,id=net0,hostfwd=tcp::2223-:22 \
>       -device virtio-net-pci,netdev=net0 \
>       -device pcie-root-port,id=rp0,chassis=1,slot=0x0 \
>       -device pci-testdev -s -S
> ```
> 
> Injecting a correctable PCIe error via /dev/aer_inject caused a BUG
> report with "Invalid wait context" in the irq/PCIe thread.
> 
> ```
> ~ # export HEX="00020000000000000100000000000000000000000000000000000000"
> ~ # echo -n "$HEX" | xxd -r -p | tee /dev/aer_inject >/dev/null
> [ 1850.947170] pcieport 0000:00:02.0: aer_inject: Injecting errors 00000001/00000000 into device 0000:00:02.0
> [ 1850.949951]
> [ 1850.950479] =============================
> [ 1850.950780] [ BUG: Invalid wait context ]
> [ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
> [ 1850.951457] -----------------------------
> [ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
> [ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
> [ 1850.952731] other info that might help us debug this:
> [ 1850.952997] context-{5:5}
> [ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
> [ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
> [ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
> [ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
> [ 1850.955932] stack backtrace:
> [ 1850.956412] CPU: 0 UID: 0 PID: 56 Comm: irq/16-PCIe PME Not tainted 6.17.0-11316-g7a405dbb0f03-dirty #7 PREEMPT_{RT,(full)}
> [ 1850.957039] Hardware name: linux,dummy-virt (DT)
> [ 1850.957409] Call trace:
> [ 1850.957727]  show_stack+0x18/0x24 (C)
> [ 1850.958089]  dump_stack_lvl+0x40/0xbc
> [ 1850.958339]  dump_stack+0x18/0x24
> [ 1850.958586]  __lock_acquire+0xa84/0x3008
> [ 1850.958907]  lock_acquire+0x128/0x2a8
> [ 1850.959171]  rt_spin_lock+0x50/0x1b8
> [ 1850.959476]  aer_inj_read_config+0x38/0x1dc
> [ 1850.959821]  pci_bus_read_config_dword+0x80/0xd8
> [ 1850.960079]  pcie_capability_read_dword+0xac/0xd8
> [ 1850.960454]  pcie_pme_irq+0x44/0xc4
> [ 1850.960728]  irq_forced_thread_fn+0x30/0x94
> [ 1850.960984]  irq_thread+0x1ac/0x3a4
> [ 1850.961308]  kthread+0x1b4/0x208
> [ 1850.961557]  ret_from_fork+0x10/0x20
> [ 1850.963088] pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
> [ 1850.963330] pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
> [ 1850.963351] pcieport 0000:00:02.0:   device [1b36:000c] error status/mask=00000001/0000e000
> [ 1850.963385] pcieport 0000:00:02.0:    [ 0] RxErr                  (First)
> ```

This is a lot. And I don't think this belongs here. The part how to use
aer_inject should be described somewhere in Documentation/ as _this_ is
not unique to your usage but a general problem of the driver.
The commit message should describe the problem you are facing and your
solution to it. Please take a look at
	https://docs.kernel.org/process/submitting-patches.html#describe-your-changes

> Signed-off-by: Guangbo Cui <jckeep.cuiguangbo@gmail.com>
> ---
> index 91acc7b17f68..6dd231d9fccd 100644
> --- a/drivers/pci/pcie/aer_inject.c
> +++ b/drivers/pci/pcie/aer_inject.c
> @@ -523,8 +523,8 @@ static int __init aer_inject_init(void)
>  static void __exit aer_inject_exit(void)
>  {
>  	struct aer_error *err, *err_next;
> -	unsigned long flags;
>  	struct pci_bus_ops *bus_ops;
> +	LIST_HEAD(einjected_to_free);
>  
>  	misc_deregister(&aer_inject_device);
>  
> @@ -533,12 +533,14 @@ static void __exit aer_inject_exit(void)
>  		kfree(bus_ops);
>  	}
>  
> -	spin_lock_irqsave(&inject_lock, flags);
> -	list_for_each_entry_safe(err, err_next, &einjected, list) {
> +	scoped_guard(raw_spinlock_irqsave, &inject_lock) {
> +		list_splice_init(&einjected, &einjected_to_free);
> +	}

I would either convert _all_ instance of the lock usage to guard
notation or none. But not one.
Also I wouldn't split everything to another list just to free it later.
I would argue here that locking in aer_inject_exit() locking is not
required because the misc device is removed (no one can keep it open as
it would not allow module removal) and so this is the only possible
user.
Therefore you can iterate over the list and free items lock less.
This reasoning of this change needs to be part of your commit
description. Also _why_ this becomes a problem. You do not mention this
change at all.

> +	list_for_each_entry_safe(err, err_next, &einjected_to_free, list) {
>  		list_del(&err->list);
>  		kfree(err);
>  	}
> -	spin_unlock_irqrestore(&inject_lock, flags);
>  }
>  
>  module_init(aer_inject_init);

Sebastian

  parent reply	other threads:[~2025-10-10  7:15 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-09 15:06 [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t Guangbo Cui
2025-10-10  2:52 ` Waiman Long
2025-10-10  7:15 ` Sebastian Andrzej Siewior [this message]
2025-10-10 17:18   ` Guangbo Cui
2025-10-24 13:17     ` Sebastian Andrzej Siewior
2025-10-24 17:30       ` Guangbo Cui
2025-10-24 16:59 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251010071555.u4ubYPid@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=bhelgaas@google.com \
    --cc=boqun.feng@gmail.com \
    --cc=clrkwllms@kernel.org \
    --cc=jckeep.cuiguangbo@gmail.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).