linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
@ 2025-10-09 15:06 Guangbo Cui
  2025-10-10  2:52 ` Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Guangbo Cui @ 2025-10-09 15:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Thomas Gleixner, Bjorn Helgaas
  Cc: Jonathan Cameron, Waiman Long, linux-rt-devel, linux-kernel,
	linux-pci, Guangbo Cui

When injecting AER errors under PREEMPT_RT, the kernel may trigger a
lockdep warning about an invalid wait context:

```
[ 1850.950780] [ BUG: Invalid wait context ]
[ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
[ 1850.951457] -----------------------------
[ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
[ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
[ 1850.952731] other info that might help us debug this:
[ 1850.952997] context-{5:5}
[ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
[ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
[ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
[ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
[ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
[ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
```

This happens because the AER injection path (`aer_inj_read_config()`)
is called in the context of the PCIe PME interrupt thread, which runs
through `irq_forced_thread_fn()` under PREEMPT_RT. In this context,
`pci_lock` (a raw_spinlock_t) is held with interrupts disabled
(`spin_lock_irqsave()`), and then `aer_inj_read_config()` tries to
acquire `inject_lock`, which is a `rt_spin_lock`. (Thanks Waiman Long)

`rt_spin_lock` may sleep, so acquiring it while holding a raw spinlock
with IRQs disabled violates the lock ordering rules. This leads to
the “Invalid wait context” lockdep warning.

In other words, the lock order looks like this:

```
  raw_spin_lock_irqsave(&pci_lock);
      ↓
  rt_spin_lock(&inject_lock);   <-- not allowed
```

To fix this, convert `inject_lock` from an `rt_spin_lock` to a
`raw_spinlock_t`, a raw spinlock is safe and consistent with the
surrounding locking scheme.

This resolves the lockdep “Invalid wait context” warning observed when
injecting correctable AER errors through `/dev/aer_inject` on PREEMPT_RT.

This was discovered while testing PCIe AER error injection on an arm64
QEMU virtual machine:

```
  qemu-system-aarch64 \
      -nographic \
      -machine virt,highmem=off,gic-version=3 \
      -cpu cortex-a72 \
      -kernel arch/arm64/boot/Image \
      -initrd initramfs.cpio.gz \
      -append "console=ttyAMA0 root=/dev/ram rdinit=/linuxrc earlyprintk nokaslr" \
      -m 2G \
      -smp 1 \
      -netdev user,id=net0,hostfwd=tcp::2223-:22 \
      -device virtio-net-pci,netdev=net0 \
      -device pcie-root-port,id=rp0,chassis=1,slot=0x0 \
      -device pci-testdev -s -S
```

Injecting a correctable PCIe error via /dev/aer_inject caused a BUG
report with "Invalid wait context" in the irq/PCIe thread.

```
~ # export HEX="00020000000000000100000000000000000000000000000000000000"
~ # echo -n "$HEX" | xxd -r -p | tee /dev/aer_inject >/dev/null
[ 1850.947170] pcieport 0000:00:02.0: aer_inject: Injecting errors 00000001/00000000 into device 0000:00:02.0
[ 1850.949951]
[ 1850.950479] =============================
[ 1850.950780] [ BUG: Invalid wait context ]
[ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
[ 1850.951457] -----------------------------
[ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
[ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
[ 1850.952731] other info that might help us debug this:
[ 1850.952997] context-{5:5}
[ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
[ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
[ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
[ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
[ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
[ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
[ 1850.955932] stack backtrace:
[ 1850.956412] CPU: 0 UID: 0 PID: 56 Comm: irq/16-PCIe PME Not tainted 6.17.0-11316-g7a405dbb0f03-dirty #7 PREEMPT_{RT,(full)}
[ 1850.957039] Hardware name: linux,dummy-virt (DT)
[ 1850.957409] Call trace:
[ 1850.957727]  show_stack+0x18/0x24 (C)
[ 1850.958089]  dump_stack_lvl+0x40/0xbc
[ 1850.958339]  dump_stack+0x18/0x24
[ 1850.958586]  __lock_acquire+0xa84/0x3008
[ 1850.958907]  lock_acquire+0x128/0x2a8
[ 1850.959171]  rt_spin_lock+0x50/0x1b8
[ 1850.959476]  aer_inj_read_config+0x38/0x1dc
[ 1850.959821]  pci_bus_read_config_dword+0x80/0xd8
[ 1850.960079]  pcie_capability_read_dword+0xac/0xd8
[ 1850.960454]  pcie_pme_irq+0x44/0xc4
[ 1850.960728]  irq_forced_thread_fn+0x30/0x94
[ 1850.960984]  irq_thread+0x1ac/0x3a4
[ 1850.961308]  kthread+0x1b4/0x208
[ 1850.961557]  ret_from_fork+0x10/0x20
[ 1850.963088] pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
[ 1850.963330] pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[ 1850.963351] pcieport 0000:00:02.0:   device [1b36:000c] error status/mask=00000001/0000e000
[ 1850.963385] pcieport 0000:00:02.0:    [ 0] RxErr                  (First)
```

Signed-off-by: Guangbo Cui <jckeep.cuiguangbo@gmail.com>
---
Changes in v2:
- Pulling kfree() out from the lock critical section. (Thanks Waiman Long)
- Link to v1: https://lore.kernel.org/linux-pci/20251007060218.57222-1-jckeep.cuiguangbo@gmail.com/

 drivers/pci/pcie/aer_inject.c | 40 ++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index 91acc7b17f68..6dd231d9fccd 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -72,7 +72,7 @@ static LIST_HEAD(einjected);
 static LIST_HEAD(pci_bus_ops_list);
 
 /* Protect einjected and pci_bus_ops_list */
-static DEFINE_SPINLOCK(inject_lock);
+static DEFINE_RAW_SPINLOCK(inject_lock);
 
 static void aer_error_init(struct aer_error *err, u32 domain,
 			   unsigned int bus, unsigned int devfn,
@@ -126,12 +126,12 @@ static struct pci_bus_ops *pci_bus_ops_pop(void)
 	unsigned long flags;
 	struct pci_bus_ops *bus_ops;
 
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 	bus_ops = list_first_entry_or_null(&pci_bus_ops_list,
 					   struct pci_bus_ops, list);
 	if (bus_ops)
 		list_del(&bus_ops->list);
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 	return bus_ops;
 }
 
@@ -223,7 +223,7 @@ static int aer_inj_read_config(struct pci_bus *bus, unsigned int devfn,
 	int domain;
 	int rv;
 
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 	if (size != sizeof(u32))
 		goto out;
 	domain = pci_domain_nr(bus);
@@ -236,12 +236,12 @@ static int aer_inj_read_config(struct pci_bus *bus, unsigned int devfn,
 	sim = find_pci_config_dword(err, where, NULL);
 	if (sim) {
 		*val = *sim;
-		spin_unlock_irqrestore(&inject_lock, flags);
+		raw_spin_unlock_irqrestore(&inject_lock, flags);
 		return 0;
 	}
 out:
 	rv = aer_inj_read(bus, devfn, where, size, val);
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 	return rv;
 }
 
@@ -255,7 +255,7 @@ static int aer_inj_write_config(struct pci_bus *bus, unsigned int devfn,
 	int domain;
 	int rv;
 
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 	if (size != sizeof(u32))
 		goto out;
 	domain = pci_domain_nr(bus);
@@ -271,12 +271,12 @@ static int aer_inj_write_config(struct pci_bus *bus, unsigned int devfn,
 			*sim ^= val;
 		else
 			*sim = val;
-		spin_unlock_irqrestore(&inject_lock, flags);
+		raw_spin_unlock_irqrestore(&inject_lock, flags);
 		return 0;
 	}
 out:
 	rv = aer_inj_write(bus, devfn, where, size, val);
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 	return rv;
 }
 
@@ -304,14 +304,14 @@ static int pci_bus_set_aer_ops(struct pci_bus *bus)
 	if (!bus_ops)
 		return -ENOMEM;
 	ops = pci_bus_set_ops(bus, &aer_inj_pci_ops);
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 	if (ops == &aer_inj_pci_ops)
 		goto out;
 	pci_bus_ops_init(bus_ops, bus, ops);
 	list_add(&bus_ops->list, &pci_bus_ops_list);
 	bus_ops = NULL;
 out:
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 	kfree(bus_ops);
 	return 0;
 }
@@ -383,7 +383,7 @@ static int aer_inject(struct aer_error_inj *einj)
 				       uncor_mask);
 	}
 
-	spin_lock_irqsave(&inject_lock, flags);
+	raw_spin_lock_irqsave(&inject_lock, flags);
 
 	err = __find_aer_error_by_dev(dev);
 	if (!err) {
@@ -404,14 +404,14 @@ static int aer_inject(struct aer_error_inj *einj)
 	    !(einj->cor_status & ~cor_mask)) {
 		ret = -EINVAL;
 		pci_warn(dev, "The correctable error(s) is masked by device\n");
-		spin_unlock_irqrestore(&inject_lock, flags);
+		raw_spin_unlock_irqrestore(&inject_lock, flags);
 		goto out_put;
 	}
 	if (!aer_mask_override && einj->uncor_status &&
 	    !(einj->uncor_status & ~uncor_mask)) {
 		ret = -EINVAL;
 		pci_warn(dev, "The uncorrectable error(s) is masked by device\n");
-		spin_unlock_irqrestore(&inject_lock, flags);
+		raw_spin_unlock_irqrestore(&inject_lock, flags);
 		goto out_put;
 	}
 
@@ -445,7 +445,7 @@ static int aer_inject(struct aer_error_inj *einj)
 		rperr->source_id &= 0x0000ffff;
 		rperr->source_id |= PCI_DEVID(einj->bus, devfn) << 16;
 	}
-	spin_unlock_irqrestore(&inject_lock, flags);
+	raw_spin_unlock_irqrestore(&inject_lock, flags);
 
 	if (aer_mask_override) {
 		pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK,
@@ -523,8 +523,8 @@ static int __init aer_inject_init(void)
 static void __exit aer_inject_exit(void)
 {
 	struct aer_error *err, *err_next;
-	unsigned long flags;
 	struct pci_bus_ops *bus_ops;
+	LIST_HEAD(einjected_to_free);
 
 	misc_deregister(&aer_inject_device);
 
@@ -533,12 +533,14 @@ static void __exit aer_inject_exit(void)
 		kfree(bus_ops);
 	}
 
-	spin_lock_irqsave(&inject_lock, flags);
-	list_for_each_entry_safe(err, err_next, &einjected, list) {
+	scoped_guard(raw_spinlock_irqsave, &inject_lock) {
+		list_splice_init(&einjected, &einjected_to_free);
+	}
+
+	list_for_each_entry_safe(err, err_next, &einjected_to_free, list) {
 		list_del(&err->list);
 		kfree(err);
 	}
-	spin_unlock_irqrestore(&inject_lock, flags);
 }
 
 module_init(aer_inject_init);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
  2025-10-09 15:06 [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t Guangbo Cui
@ 2025-10-10  2:52 ` Waiman Long
  2025-10-10  7:15 ` Sebastian Andrzej Siewior
  2025-10-24 16:59 ` Bjorn Helgaas
  2 siblings, 0 replies; 7+ messages in thread
From: Waiman Long @ 2025-10-10  2:52 UTC (permalink / raw)
  To: Guangbo Cui, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Thomas Gleixner, Bjorn Helgaas
  Cc: Jonathan Cameron, linux-rt-devel, linux-kernel, linux-pci

On 10/9/25 11:06 AM, Guangbo Cui wrote:
> When injecting AER errors under PREEMPT_RT, the kernel may trigger a
> lockdep warning about an invalid wait context:
>
> ```
> [ 1850.950780] [ BUG: Invalid wait context ]
> [ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
> [ 1850.951457] -----------------------------
> [ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
> [ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
> [ 1850.952731] other info that might help us debug this:
> [ 1850.952997] context-{5:5}
> [ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
> [ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
> [ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
> [ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
> ```
>
> This happens because the AER injection path (`aer_inj_read_config()`)
> is called in the context of the PCIe PME interrupt thread, which runs
> through `irq_forced_thread_fn()` under PREEMPT_RT. In this context,
> `pci_lock` (a raw_spinlock_t) is held with interrupts disabled
> (`spin_lock_irqsave()`), and then `aer_inj_read_config()` tries to
> acquire `inject_lock`, which is a `rt_spin_lock`. (Thanks Waiman Long)
>
> `rt_spin_lock` may sleep, so acquiring it while holding a raw spinlock
> with IRQs disabled violates the lock ordering rules. This leads to
> the “Invalid wait context” lockdep warning.
>
> In other words, the lock order looks like this:
>
> ```
>    raw_spin_lock_irqsave(&pci_lock);
>        ↓
>    rt_spin_lock(&inject_lock);   <-- not allowed
> ```
>
> To fix this, convert `inject_lock` from an `rt_spin_lock` to a
> `raw_spinlock_t`, a raw spinlock is safe and consistent with the
> surrounding locking scheme.
>
> This resolves the lockdep “Invalid wait context” warning observed when
> injecting correctable AER errors through `/dev/aer_inject` on PREEMPT_RT.
>
> This was discovered while testing PCIe AER error injection on an arm64
> QEMU virtual machine:
>
> ```
>    qemu-system-aarch64 \
>        -nographic \
>        -machine virt,highmem=off,gic-version=3 \
>        -cpu cortex-a72 \
>        -kernel arch/arm64/boot/Image \
>        -initrd initramfs.cpio.gz \
>        -append "console=ttyAMA0 root=/dev/ram rdinit=/linuxrc earlyprintk nokaslr" \
>        -m 2G \
>        -smp 1 \
>        -netdev user,id=net0,hostfwd=tcp::2223-:22 \
>        -device virtio-net-pci,netdev=net0 \
>        -device pcie-root-port,id=rp0,chassis=1,slot=0x0 \
>        -device pci-testdev -s -S
> ```
>
> Injecting a correctable PCIe error via /dev/aer_inject caused a BUG
> report with "Invalid wait context" in the irq/PCIe thread.
>
> ```
> ~ # export HEX="00020000000000000100000000000000000000000000000000000000"
> ~ # echo -n "$HEX" | xxd -r -p | tee /dev/aer_inject >/dev/null
> [ 1850.947170] pcieport 0000:00:02.0: aer_inject: Injecting errors 00000001/00000000 into device 0000:00:02.0
> [ 1850.949951]
> [ 1850.950479] =============================
> [ 1850.950780] [ BUG: Invalid wait context ]
> [ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
> [ 1850.951457] -----------------------------
> [ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
> [ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
> [ 1850.952731] other info that might help us debug this:
> [ 1850.952997] context-{5:5}
> [ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
> [ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
> [ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
> [ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
> [ 1850.955932] stack backtrace:
> [ 1850.956412] CPU: 0 UID: 0 PID: 56 Comm: irq/16-PCIe PME Not tainted 6.17.0-11316-g7a405dbb0f03-dirty #7 PREEMPT_{RT,(full)}
> [ 1850.957039] Hardware name: linux,dummy-virt (DT)
> [ 1850.957409] Call trace:
> [ 1850.957727]  show_stack+0x18/0x24 (C)
> [ 1850.958089]  dump_stack_lvl+0x40/0xbc
> [ 1850.958339]  dump_stack+0x18/0x24
> [ 1850.958586]  __lock_acquire+0xa84/0x3008
> [ 1850.958907]  lock_acquire+0x128/0x2a8
> [ 1850.959171]  rt_spin_lock+0x50/0x1b8
> [ 1850.959476]  aer_inj_read_config+0x38/0x1dc
> [ 1850.959821]  pci_bus_read_config_dword+0x80/0xd8
> [ 1850.960079]  pcie_capability_read_dword+0xac/0xd8
> [ 1850.960454]  pcie_pme_irq+0x44/0xc4
> [ 1850.960728]  irq_forced_thread_fn+0x30/0x94
> [ 1850.960984]  irq_thread+0x1ac/0x3a4
> [ 1850.961308]  kthread+0x1b4/0x208
> [ 1850.961557]  ret_from_fork+0x10/0x20
> [ 1850.963088] pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
> [ 1850.963330] pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
> [ 1850.963351] pcieport 0000:00:02.0:   device [1b36:000c] error status/mask=00000001/0000e000
> [ 1850.963385] pcieport 0000:00:02.0:    [ 0] RxErr                  (First)
> ```
>
> Signed-off-by: Guangbo Cui <jckeep.cuiguangbo@gmail.com>
> ---
> Changes in v2:
> - Pulling kfree() out from the lock critical section. (Thanks Waiman Long)
> - Link to v1: https://lore.kernel.org/linux-pci/20251007060218.57222-1-jckeep.cuiguangbo@gmail.com/

As PCI error injection is mainly for debug/development, I think it is OK 
to change inject_lock to a raw_spinlock. I think you should also mention 
about moving kfree() out of the lock critical section in the commit log. 
Or better yet, break it out as a separate patch. It is just a nit. So I 
am fine with the current version too.

Now it is up to the PCI maintainer to decide if further change is needed.

Acked-by: Waiman Long <longman@redhat.com>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
  2025-10-09 15:06 [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t Guangbo Cui
  2025-10-10  2:52 ` Waiman Long
@ 2025-10-10  7:15 ` Sebastian Andrzej Siewior
  2025-10-10 17:18   ` Guangbo Cui
  2025-10-24 16:59 ` Bjorn Helgaas
  2 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-10-10  7:15 UTC (permalink / raw)
  To: Guangbo Cui
  Cc: Clark Williams, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Thomas Gleixner, Bjorn Helgaas,
	Jonathan Cameron, Waiman Long, linux-rt-devel, linux-kernel,
	linux-pci

On 2025-10-09 15:06:50 [+0000], Guangbo Cui wrote:
> When injecting AER errors under PREEMPT_RT, the kernel may trigger a
> lockdep warning about an invalid wait context:
> 
> ```
> [ 1850.950780] [ BUG: Invalid wait context ]
> [ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
> [ 1850.951457] -----------------------------
> [ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
> [ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
> [ 1850.952731] other info that might help us debug this:
> [ 1850.952997] context-{5:5}
> [ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
> [ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
> [ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
> [ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
> ```
> 
> This happens because the AER injection path (`aer_inj_read_config()`)
> is called in the context of the PCIe PME interrupt thread, which runs
> through `irq_forced_thread_fn()` under PREEMPT_RT. In this context,
> `pci_lock` (a raw_spinlock_t) is held with interrupts disabled
> (`spin_lock_irqsave()`), and then `aer_inj_read_config()` tries to
> acquire `inject_lock`, which is a `rt_spin_lock`. (Thanks Waiman Long)

This is way more verbose than it has to be. The key point here is that
it is a "sleeping while atomic" condition because, as you correctly
identified, a raw_spinlock_t is held while a sleeping lock und
PREEMPT_RT is accessed. No need to paste halve of lockdep's output.

> `rt_spin_lock` may sleep, so acquiring it while holding a raw spinlock
> with IRQs disabled violates the lock ordering rules. This leads to
> the “Invalid wait context” lockdep warning.
> 
> In other words, the lock order looks like this:
> 
> ```
>   raw_spin_lock_irqsave(&pci_lock);
>       ↓
>   rt_spin_lock(&inject_lock);   <-- not allowed
> ```

...

> To fix this, convert `inject_lock` from an `rt_spin_lock` to a
> `raw_spinlock_t`, a raw spinlock is safe and consistent with the
> surrounding locking scheme.

Switching the lock type is a possible solution. It must not become the
default solution if things like this happen. In this case it is okay as
lock usage is limited and the only source of "unbound latencies" is the
list it protects. This is of no concern as you explained it should not
contain thousands of items _and_ it is used only for debugging and
development.

> This resolves the lockdep “Invalid wait context” warning observed when
> injecting correctable AER errors through `/dev/aer_inject` on PREEMPT_RT.
> 
> This was discovered while testing PCIe AER error injection on an arm64
> QEMU virtual machine:
> 
> ```
>   qemu-system-aarch64 \
>       -nographic \
>       -machine virt,highmem=off,gic-version=3 \
>       -cpu cortex-a72 \
>       -kernel arch/arm64/boot/Image \
>       -initrd initramfs.cpio.gz \
>       -append "console=ttyAMA0 root=/dev/ram rdinit=/linuxrc earlyprintk nokaslr" \
>       -m 2G \
>       -smp 1 \
>       -netdev user,id=net0,hostfwd=tcp::2223-:22 \
>       -device virtio-net-pci,netdev=net0 \
>       -device pcie-root-port,id=rp0,chassis=1,slot=0x0 \
>       -device pci-testdev -s -S
> ```
> 
> Injecting a correctable PCIe error via /dev/aer_inject caused a BUG
> report with "Invalid wait context" in the irq/PCIe thread.
> 
> ```
> ~ # export HEX="00020000000000000100000000000000000000000000000000000000"
> ~ # echo -n "$HEX" | xxd -r -p | tee /dev/aer_inject >/dev/null
> [ 1850.947170] pcieport 0000:00:02.0: aer_inject: Injecting errors 00000001/00000000 into device 0000:00:02.0
> [ 1850.949951]
> [ 1850.950479] =============================
> [ 1850.950780] [ BUG: Invalid wait context ]
> [ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
> [ 1850.951457] -----------------------------
> [ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
> [ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
> [ 1850.952731] other info that might help us debug this:
> [ 1850.952997] context-{5:5}
> [ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
> [ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
> [ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
> [ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
> [ 1850.955932] stack backtrace:
> [ 1850.956412] CPU: 0 UID: 0 PID: 56 Comm: irq/16-PCIe PME Not tainted 6.17.0-11316-g7a405dbb0f03-dirty #7 PREEMPT_{RT,(full)}
> [ 1850.957039] Hardware name: linux,dummy-virt (DT)
> [ 1850.957409] Call trace:
> [ 1850.957727]  show_stack+0x18/0x24 (C)
> [ 1850.958089]  dump_stack_lvl+0x40/0xbc
> [ 1850.958339]  dump_stack+0x18/0x24
> [ 1850.958586]  __lock_acquire+0xa84/0x3008
> [ 1850.958907]  lock_acquire+0x128/0x2a8
> [ 1850.959171]  rt_spin_lock+0x50/0x1b8
> [ 1850.959476]  aer_inj_read_config+0x38/0x1dc
> [ 1850.959821]  pci_bus_read_config_dword+0x80/0xd8
> [ 1850.960079]  pcie_capability_read_dword+0xac/0xd8
> [ 1850.960454]  pcie_pme_irq+0x44/0xc4
> [ 1850.960728]  irq_forced_thread_fn+0x30/0x94
> [ 1850.960984]  irq_thread+0x1ac/0x3a4
> [ 1850.961308]  kthread+0x1b4/0x208
> [ 1850.961557]  ret_from_fork+0x10/0x20
> [ 1850.963088] pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
> [ 1850.963330] pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
> [ 1850.963351] pcieport 0000:00:02.0:   device [1b36:000c] error status/mask=00000001/0000e000
> [ 1850.963385] pcieport 0000:00:02.0:    [ 0] RxErr                  (First)
> ```

This is a lot. And I don't think this belongs here. The part how to use
aer_inject should be described somewhere in Documentation/ as _this_ is
not unique to your usage but a general problem of the driver.
The commit message should describe the problem you are facing and your
solution to it. Please take a look at
	https://docs.kernel.org/process/submitting-patches.html#describe-your-changes

> Signed-off-by: Guangbo Cui <jckeep.cuiguangbo@gmail.com>
> ---
> index 91acc7b17f68..6dd231d9fccd 100644
> --- a/drivers/pci/pcie/aer_inject.c
> +++ b/drivers/pci/pcie/aer_inject.c
> @@ -523,8 +523,8 @@ static int __init aer_inject_init(void)
>  static void __exit aer_inject_exit(void)
>  {
>  	struct aer_error *err, *err_next;
> -	unsigned long flags;
>  	struct pci_bus_ops *bus_ops;
> +	LIST_HEAD(einjected_to_free);
>  
>  	misc_deregister(&aer_inject_device);
>  
> @@ -533,12 +533,14 @@ static void __exit aer_inject_exit(void)
>  		kfree(bus_ops);
>  	}
>  
> -	spin_lock_irqsave(&inject_lock, flags);
> -	list_for_each_entry_safe(err, err_next, &einjected, list) {
> +	scoped_guard(raw_spinlock_irqsave, &inject_lock) {
> +		list_splice_init(&einjected, &einjected_to_free);
> +	}

I would either convert _all_ instance of the lock usage to guard
notation or none. But not one.
Also I wouldn't split everything to another list just to free it later.
I would argue here that locking in aer_inject_exit() locking is not
required because the misc device is removed (no one can keep it open as
it would not allow module removal) and so this is the only possible
user.
Therefore you can iterate over the list and free items lock less.
This reasoning of this change needs to be part of your commit
description. Also _why_ this becomes a problem. You do not mention this
change at all.

> +	list_for_each_entry_safe(err, err_next, &einjected_to_free, list) {
>  		list_del(&err->list);
>  		kfree(err);
>  	}
> -	spin_unlock_irqrestore(&inject_lock, flags);
>  }
>  
>  module_init(aer_inject_init);

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
  2025-10-10  7:15 ` Sebastian Andrzej Siewior
@ 2025-10-10 17:18   ` Guangbo Cui
  2025-10-24 13:17     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 7+ messages in thread
From: Guangbo Cui @ 2025-10-10 17:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Clark Williams, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Thomas Gleixner, Bjorn Helgaas,
	Jonathan Cameron, Waiman Long, linux-rt-devel, linux-kernel,
	linux-pci

On Fri, Oct 10, 2025 at 09:15:55AM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-10-09 15:06:50 [+0000], Guangbo Cui wrote:
> > index 91acc7b17f68..6dd231d9fccd 100644
> > --- a/drivers/pci/pcie/aer_inject.c
> > +++ b/drivers/pci/pcie/aer_inject.c
> > @@ -523,8 +523,8 @@ static int __init aer_inject_init(void)
> >  static void __exit aer_inject_exit(void)
> >  {
> >     struct aer_error *err, *err_next;
> > -   unsigned long flags;
> >     struct pci_bus_ops *bus_ops;
> > +   LIST_HEAD(einjected_to_free);
> >
> >     misc_deregister(&aer_inject_device);
> >
> > @@ -533,12 +533,14 @@ static void __exit aer_inject_exit(void)
> >             kfree(bus_ops);
> >     }
> >
> > -   spin_lock_irqsave(&inject_lock, flags);
> > -   list_for_each_entry_safe(err, err_next, &einjected, list) {
> > +   scoped_guard(raw_spinlock_irqsave, &inject_lock) {
> > +           list_splice_init(&einjected, &einjected_to_free);
> > +   }
>
> I would either convert _all_ instance of the lock usage to guard
> notation or none. But not one.


> Also I wouldn't split everything to another list just to free it later.
> I would argue here that locking in aer_inject_exit() locking is not
> required because the misc device is removed (no one can keep it open as
> it would not allow module removal) and so this is the only possible
> user.
> Therefore you can iterate over the list and free items lock less.
> This reasoning of this change needs to be part of your commit
> description. Also _why_ this becomes a problem. You do not mention this
> change at all.

Hi, Sebastian

Yeah, you're right. Once misc_deregister is called, no new user can add
err injection, and aer_inj_pci_ops was deleted by pci_bus_ops_pop below.
All places that access einjected have already been released, so freeing
einjected can be done without locking.

I will drop the lock in aer_inject_exit, and update commit msg.

Best regards,
Guangbo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
  2025-10-10 17:18   ` Guangbo Cui
@ 2025-10-24 13:17     ` Sebastian Andrzej Siewior
  2025-10-24 17:30       ` Guangbo Cui
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-10-24 13:17 UTC (permalink / raw)
  To: Guangbo Cui
  Cc: Clark Williams, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Thomas Gleixner, Bjorn Helgaas,
	Jonathan Cameron, Waiman Long, linux-rt-devel, linux-kernel,
	linux-pci

On 2025-10-11 01:18:05 [+0800], Guangbo Cui wrote:
> I will drop the lock in aer_inject_exit, and update commit msg.

Was here a follow-up?

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
  2025-10-09 15:06 [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t Guangbo Cui
  2025-10-10  2:52 ` Waiman Long
  2025-10-10  7:15 ` Sebastian Andrzej Siewior
@ 2025-10-24 16:59 ` Bjorn Helgaas
  2 siblings, 0 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2025-10-24 16:59 UTC (permalink / raw)
  To: Guangbo Cui
  Cc: Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Thomas Gleixner, Bjorn Helgaas, Jonathan Cameron, Waiman Long,
	linux-rt-devel, linux-kernel, linux-pci

On Thu, Oct 09, 2025 at 03:06:50PM +0000, Guangbo Cui wrote:
> When injecting AER errors under PREEMPT_RT, the kernel may trigger a
> lockdep warning about an invalid wait context:
> 
> ```
> [ 1850.950780] [ BUG: Invalid wait context ]
> [ 1850.951152] 6.17.0-11316-g7a405dbb0f03-dirty #7 Not tainted
> [ 1850.951457] -----------------------------
> [ 1850.951680] irq/16-PCIe PME/56 is trying to lock:
> [ 1850.952004] ffff800082865238 (inject_lock){+.+.}-{3:3}, at: aer_inj_read_config+0x38/0x1dc
> [ 1850.952731] other info that might help us debug this:
> [ 1850.952997] context-{5:5}
> [ 1850.953192] 5 locks held by irq/16-PCIe PME/56:
> [ 1850.953415]  #0: ffff800082647390 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x30/0x268
> [ 1850.953931]  #1: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.954453]  #2: ffff000004bb6c58 (&data->lock){+...}-{3:3}, at: pcie_pme_irq+0x34/0xc4
> [ 1850.954949]  #3: ffff8000826c6b38 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
> [ 1850.955420]  #4: ffff800082863d10 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x5c/0xd8
> ```
> ...

If/when you update the commit log, please:

  - Update subject line to match history (use "git log --oneline")

  - Drop the timestamps because they don't help understand the issue

  - Drop the ```; just indent quoted material a couple spaces

  - Drop the `` around code names; the () after functions is enough

  - Trim it as others suggested; I appreciate the details about how
    to reproduce it, but they could go after the ---, where we'll be
    able to find them via the Link: tag

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t
  2025-10-24 13:17     ` Sebastian Andrzej Siewior
@ 2025-10-24 17:30       ` Guangbo Cui
  0 siblings, 0 replies; 7+ messages in thread
From: Guangbo Cui @ 2025-10-24 17:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Clark Williams, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Thomas Gleixner, Bjorn Helgaas,
	Jonathan Cameron, Waiman Long, linux-rt-devel, linux-kernel,
	linux-pci

On Fri, Oct 24, 2025 at 03:17:19PM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-10-11 01:18:05 [+0800], Guangbo Cui wrote:
> > I will drop the lock in aer_inject_exit, and update commit msg.
>
> Was here a follow-up?

My apologies - I’ve been quite busy recently and it slipped my mind.
I will post a new version of the patch this weekend.

Best regards,
Guangbo

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-10-24 17:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-09 15:06 [PATCH v2] pci/aer_inject: switching inject_lock to raw_spinlock_t Guangbo Cui
2025-10-10  2:52 ` Waiman Long
2025-10-10  7:15 ` Sebastian Andrzej Siewior
2025-10-10 17:18   ` Guangbo Cui
2025-10-24 13:17     ` Sebastian Andrzej Siewior
2025-10-24 17:30       ` Guangbo Cui
2025-10-24 16:59 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).