linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()
@ 2025-09-29  9:15 Breno Leitao
  2025-09-29 15:10 ` Sathyanarayanan Kuppuswamy
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Breno Leitao @ 2025-09-29  9:15 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Kuppuswamy Sathyanarayanan, Jon Pan-Doh
  Cc: linuxppc-dev, linux-pci, linux-kernel, kernel-team, stable,
	Breno Leitao

Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
does not rate limit, given this is fatal.

This prevents a kernel crash triggered by dereferencing a NULL pointer
in aer_ratelimit(), ensuring safer handling of PCI devices that lack
AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
which already performs this NULL check.

Cc: stable@vger.kernel.org
Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
- This problem is still happening in upstream, and unfortunately no action
  was done in the previous discussion.
- Link to previous post:
  https://lore.kernel.org/r/20250804-aer_crash_2-v1-1-fd06562c18a4@debian.org
---
 drivers/pci/pcie/aer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e286c197d7167..55abc5e17b8b1 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
 
 static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
 {
+	if (!dev->aer_info)
+		return 1;
+
 	switch (severity) {
 	case AER_NONFATAL:
 		return __ratelimit(&dev->aer_info->nonfatal_ratelimit);

---
base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
change-id: 20250801-aer_crash_2-b21cc2ef0d00

Best regards,
--  
Breno Leitao <leitao@debian.org>



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()
  2025-09-29  9:15 [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer() Breno Leitao
@ 2025-09-29 15:10 ` Sathyanarayanan Kuppuswamy
  2025-10-02 10:06   ` Christophe Leroy
  2025-09-29 17:01 ` Christophe Leroy
  2025-10-01 21:36 ` Bjorn Helgaas
  2 siblings, 1 reply; 8+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2025-09-29 15:10 UTC (permalink / raw)
  To: Breno Leitao, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Jon Pan-Doh
  Cc: linuxppc-dev, linux-pci, linux-kernel, kernel-team, stable


On 9/29/25 2:15 AM, Breno Leitao wrote:
> Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
> when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
> calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
> does not rate limit, given this is fatal.
>
> This prevents a kernel crash triggered by dereferencing a NULL pointer
> in aer_ratelimit(), ensuring safer handling of PCI devices that lack
> AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
> which already performs this NULL check.
>
> Cc: stable@vger.kernel.org
> Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> - This problem is still happening in upstream, and unfortunately no action
>    was done in the previous discussion.
> - Link to previous post:
>    https://lore.kernel.org/r/20250804-aer_crash_2-v1-1-fd06562c18a4@debian.org
> ---

Although we haven't identified the path that triggers this issue, adding this check is harmless.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>



>   drivers/pci/pcie/aer.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e286c197d7167..55abc5e17b8b1 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
>   
>   static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
>   {
> +	if (!dev->aer_info)
> +		return 1;
> +
>   	switch (severity) {
>   	case AER_NONFATAL:
>   		return __ratelimit(&dev->aer_info->nonfatal_ratelimit);
>
> ---
> base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
> change-id: 20250801-aer_crash_2-b21cc2ef0d00
>
> Best regards,
> --
> Breno Leitao <leitao@debian.org>
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()
  2025-09-29  9:15 [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer() Breno Leitao
  2025-09-29 15:10 ` Sathyanarayanan Kuppuswamy
@ 2025-09-29 17:01 ` Christophe Leroy
  2025-10-01 13:52   ` Breno Leitao
  2025-10-01 21:36 ` Bjorn Helgaas
  2 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2025-09-29 17:01 UTC (permalink / raw)
  To: Breno Leitao, Mahesh J Salgaonkar, Oliver O'Halloran,
	Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Jon Pan-Doh
  Cc: linuxppc-dev, linux-pci, linux-kernel, kernel-team, stable



Le 29/09/2025 à 11:15, Breno Leitao a écrit :
> Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
> when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
> calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
> does not rate limit, given this is fatal.
> 
> This prevents a kernel crash triggered by dereferencing a NULL pointer
> in aer_ratelimit(), ensuring safer handling of PCI devices that lack
> AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
> which already performs this NULL check.
> 
> Cc: stable@vger.kernel.org
> Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> - This problem is still happening in upstream, and unfortunately no action
>    was done in the previous discussion.
> - Link to previous post:
>    https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fr%2F20250804-aer_crash_2-v1-1-fd06562c18a4%40debian.org&data=05%7C02%7Cchristophe.leroy2%40cs-soprasteria.com%7Cf48f0ae03ec542e13e5408ddff38d9d9%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638947341818450358%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=ZzJDmrmyDpWh4JZQQzKFZVf%2BeYucLdNOr5L6tgytNPE%3D&reserved=0
> ---
>   drivers/pci/pcie/aer.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e286c197d7167..55abc5e17b8b1 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
>   
>   static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
>   {
> +	if (!dev->aer_info)
> +		return 1;
> +

This is a static function, it cannot be called from outside aer.c . Why 
do you need such a check ?

I a check was to be made it should be in pci_aer_init() and in fact if 
kmalloc fails then all the probe should be made to fail.

>   	switch (severity) {
>   	case AER_NONFATAL:
>   		return __ratelimit(&dev->aer_info->nonfatal_ratelimit);
> 
> ---
> base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
> change-id: 20250801-aer_crash_2-b21cc2ef0d00
> 
> Best regards,
> --
> Breno Leitao <leitao@debian.org>
> 
> 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()
  2025-09-29 17:01 ` Christophe Leroy
@ 2025-10-01 13:52   ` Breno Leitao
  0 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2025-10-01 13:52 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Kuppuswamy Sathyanarayanan, Jon Pan-Doh, linuxppc-dev, linux-pci,
	linux-kernel, kernel-team, stable

Hello Christophe,

On Mon, Sep 29, 2025 at 07:01:43PM +0200, Christophe Leroy wrote:
> Le 29/09/2025 à 11:15, Breno Leitao a écrit :
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index e286c197d7167..55abc5e17b8b1 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
> >   static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
> >   {
> > +	if (!dev->aer_info)
> > +		return 1;
> > +
> 
> This is a static function, it cannot be called from outside aer.c . Why do
> you need such a check ?

I have more than 5 hosts crash in recent kernel due to this issue. This
is causing real crashes.

	[30745.821301] [ T964809] BUG: kernel NULL pointer dereference, address: 0000000000000264
	[30745.835249] [ T964809] #PF: supervisor read access in kernel mode
	[30745.845541] [ T964809] #PF: error_code(0x0000) - not-present page
	[30745.855831] [ T964809] PGD 7a26aa067 P4D 7a26aa067 PUD 2a807a067 PMD 0
	[30745.867267] [ T964809] Oops: Oops: 0000 [#1] SMP
	[30745.900469] [ T964809] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
	[30745.911999] [ T964809] Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A23 12/08/2020
	[30745.929557] [ T964809] Workqueue: events aer_recover_work_func
	[30745.939339] [     T964809] RIP: 0010:___ratelimit (lib/ratelimit.c:33)
	[30745.947896] [ T964809] Code: 89 06 48 8d 45 10 48 89 46 08 48 89 e0 48 89 46 10 48 89 df e8 18 48 0d 00 48 8d 65 f8 5b 5d c3 cc 55 41 57 41 56 41 54 53 50 <4c> 63 77 04 4d 85 f6 0f 9e c0 8b 5f 08 85 db 0f 9e c1 08 c1 80 f9
	All code
	========
	0:	89 06                	mov    %eax,(%rsi)
	2:	48 8d 45 10          	lea    0x10(%rbp),%rax
	6:	48 89 46 08          	mov    %rax,0x8(%rsi)
	a:	48 89 e0             	mov    %rsp,%rax
	d:	48 89 46 10          	mov    %rax,0x10(%rsi)
	11:	48 89 df             	mov    %rbx,%rdi
	14:	e8 18 48 0d 00       	call   0xd4831
	19:	48 8d 65 f8          	lea    -0x8(%rbp),%rsp
	1d:	5b                   	pop    %rbx
	1e:	5d                   	pop    %rbp
	1f:	c3                   	ret
	20:	cc                   	int3
	21:	55                   	push   %rbp
	22:	41 57                	push   %r15
	24:	41 56                	push   %r14
	26:	41 54                	push   %r12
	28:	53                   	push   %rbx
	29:	50                   	push   %rax
	2a:*	4c 63 77 04          	movslq 0x4(%rdi),%r14		<-- trapping instruction
	2e:	4d 85 f6             	test   %r14,%r14
	31:	0f 9e c0             	setle  %al
	34:	8b 5f 08             	mov    0x8(%rdi),%ebx
	37:	85 db                	test   %ebx,%ebx
	39:	0f 9e c1             	setle  %cl
	3c:	08 c1                	or     %al,%cl
	3e:	80                   	.byte 0x80
	3f:	f9                   	stc

	Code starting with the faulting instruction
	===========================================
	0:	4c 63 77 04          	movslq 0x4(%rdi),%r14
	4:	4d 85 f6             	test   %r14,%r14
	7:	0f 9e c0             	setle  %al
	a:	8b 5f 08             	mov    0x8(%rdi),%ebx
	d:	85 db                	test   %ebx,%ebx
	f:	0f 9e c1             	setle  %cl
	12:	08 c1                	or     %al,%cl
	14:	80                   	.byte 0x80
	15:	f9                   	stc
	[30745.985517] [ T964809] RSP: 0018:ffffc9002a10fc88 EFLAGS: 00010206
	[30745.996094] [ T964809] RAX: 0000000000000002 RBX: 0000000000000002 RCX: ffffffff8260c509
	[30746.010396] [ T964809] RDX: 0000000000000000 RSI: ffffffff825f37b7 RDI: 0000000000000260
	[30746.024710] [ T964809] RBP: 0000000000000000 R08: 8080808080808080 R09: 0000000000000000
	[30746.039044] [ T964809] R10: ffffc9002a10fcf0 R11: 8080000000000000 R12: 0000000000000000
	[30746.053349] [ T964809] R13: 0000000000000000 R14: ffff8882851f4000 R15: ffffc90000c312e0
	[30746.067649] [ T964809] FS:  0000000000000000(0000) GS:ffff8890fa91e000(0000) knlGS:0000000000000000
	[30746.083875] [ T964809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[30746.095394] [ T964809] CR2: 0000000000000264 CR3: 00000008150b6002 CR4: 00000000007726f0
	[30746.109695] [ T964809] PKRU: 55555554
	[30746.115123] [ T964809] Call Trace:
	[30746.120028] [ T964809]  <TASK>
	[30746.124253] [     T964809] pci_print_aer (drivers/pci/pcie/aer.c:929)
	[30746.140335] [     T964809] aer_recover_work_func (drivers/pci/pcie/aer.c:1181)
	[30746.149070] [     T964809] process_scheduled_works (kernel/workqueue.c:3243 kernel/workqueue.c:3321)
	[30746.158326] [     T964809] ? worker_thread (kernel/workqueue.c:3355)
	[30746.166195] [     T964809] worker_thread (./include/linux/list.h:373 kernel/workqueue.c:946 kernel/workqueue.c:3403)
	[30746.173718] [     T964809] kthread (kernel/kthread.c:466)
	[30746.180018] [     T964809] ? kick_pool (kernel/workqueue.c:3348)
	[30746.187183] [     T964809] ? finish_task_switch (./include/linux/perf_event.h:? kernel/sched/core.c:5260)
	[30746.195911] [     T964809] ? housekeeping_affine (kernel/kthread.c:413)
	[30746.204502] [     T964809] ret_from_fork (arch/x86/kernel/process.c:154)
	[30746.211842] [     T964809] ? housekeeping_affine (kernel/kthread.c:413)
	[30746.220400] [     T964809] ret_from_fork_asm (arch/x86/entry/entry_64.S:258)


This stack is based on commit 038d61fd64227  ("Linux 6.16").

> I a check was to be made it should be in pci_aer_init() and in fact if
> kmalloc fails then all the probe should be made to fail.

Ok, the stack is not going through pci_aer_init(), but, through
aer_recover_work_func() -> pci_print_aer().


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()
  2025-09-29  9:15 [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer() Breno Leitao
  2025-09-29 15:10 ` Sathyanarayanan Kuppuswamy
  2025-09-29 17:01 ` Christophe Leroy
@ 2025-10-01 21:36 ` Bjorn Helgaas
  2025-10-02  9:10   ` Breno Leitao
  2 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2025-10-01 21:36 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Kuppuswamy Sathyanarayanan, Jon Pan-Doh, linuxppc-dev, linux-pci,
	linux-kernel, kernel-team, stable

On Mon, Sep 29, 2025 at 02:15:47AM -0700, Breno Leitao wrote:
> Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
> when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
> calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
> does not rate limit, given this is fatal.
> 
> This prevents a kernel crash triggered by dereferencing a NULL pointer
> in aer_ratelimit(), ensuring safer handling of PCI devices that lack
> AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
> which already performs this NULL check.
> 
> Cc: stable@vger.kernel.org
> Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
> Signed-off-by: Breno Leitao <leitao@debian.org>

Thanks, Breno, I applied this to pci/aer for v6.18.  I added a little
more detail to the commit log because the path where we hit this is a
bit obscure.  Please take a look and see if it makes sense:

  https://git.kernel.org/cgit/linux/kernel/git/pci/pci.git/commit/?id=451f30b97807

> ---
> - This problem is still happening in upstream, and unfortunately no action
>   was done in the previous discussion.
> - Link to previous post:
>   https://lore.kernel.org/r/20250804-aer_crash_2-v1-1-fd06562c18a4@debian.org
> ---
>  drivers/pci/pcie/aer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e286c197d7167..55abc5e17b8b1 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
>  
>  static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
>  {
> +	if (!dev->aer_info)
> +		return 1;
> +
>  	switch (severity) {
>  	case AER_NONFATAL:
>  		return __ratelimit(&dev->aer_info->nonfatal_ratelimit);
> 
> ---
> base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
> change-id: 20250801-aer_crash_2-b21cc2ef0d00
> 
> Best regards,
> --  
> Breno Leitao <leitao@debian.org>
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()
  2025-10-01 21:36 ` Bjorn Helgaas
@ 2025-10-02  9:10   ` Breno Leitao
  0 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2025-10-02  9:10 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
	Kuppuswamy Sathyanarayanan, Jon Pan-Doh, linuxppc-dev, linux-pci,
	linux-kernel, kernel-team, stable

On Wed, Oct 01, 2025 at 04:36:57PM -0500, Bjorn Helgaas wrote:
> On Mon, Sep 29, 2025 at 02:15:47AM -0700, Breno Leitao wrote:
> > Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
> > when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
> > calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
> > does not rate limit, given this is fatal.
> > 
> > This prevents a kernel crash triggered by dereferencing a NULL pointer
> > in aer_ratelimit(), ensuring safer handling of PCI devices that lack
> > AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
> > which already performs this NULL check.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> Thanks, Breno, I applied this to pci/aer for v6.18.  I added a little
> more detail to the commit log because the path where we hit this is a
> bit obscure.  Please take a look and see if it makes sense:

Thanks! That’s exactly what I would have written if I actually knew what
I was doing. :-)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()
  2025-09-29 15:10 ` Sathyanarayanan Kuppuswamy
@ 2025-10-02 10:06   ` Christophe Leroy
  2025-10-02 18:10     ` Kuppuswamy Sathyanarayanan
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2025-10-02 10:06 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy, Breno Leitao, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Jon Pan-Doh
  Cc: linuxppc-dev, linux-pci, linux-kernel, kernel-team, stable



Le 29/09/2025 à 17:10, Sathyanarayanan Kuppuswamy a écrit :
> 
> On 9/29/25 2:15 AM, Breno Leitao wrote:
>> Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
>> when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
>> calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
>> does not rate limit, given this is fatal.
>>
>> This prevents a kernel crash triggered by dereferencing a NULL pointer
>> in aer_ratelimit(), ensuring safer handling of PCI devices that lack
>> AER info. This change aligns pci_print_aer() with 
>> pci_dev_aer_stats_incr()
>> which already performs this NULL check.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal 
>> error logging")
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>> - This problem is still happening in upstream, and unfortunately no 
>> action
>>    was done in the previous discussion.
>> - Link to previous post:
>>    https://eur01.safelinks.protection.outlook.com/? 
>> url=https%3A%2F%2Flore.kernel.org%2Fr%2F20250804-aer_crash_2-v1-1- 
>> fd06562c18a4%40debian.org&data=05%7C02%7Cchristophe.leroy2%40cs- 
>> soprasteria.com%7Cfd3d2f1b4e8448a8e67608ddff6a4e70%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638947554250805439%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=6yTN1%2Fq%2Fy0VKX%2BXpE%2BiKxBrn19AkY4IPj01N2ZdxEkg%3D&reserved=0
>> ---
> 
> Although we haven't identified the path that triggers this issue, adding 
> this check is harmless.

Is it really harmless ?

The purpose of the function is to ratelimit logs. Here by returning 1 
when dev->aer_info is NULL it says: don't ratelimit. Isn't it an opened 
door to Denial of Service by overloading with logs ?

Christophe

> 
> Reviewed-by: Kuppuswamy Sathyanarayanan 
> <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> 
> 
>>   drivers/pci/pcie/aer.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index e286c197d7167..55abc5e17b8b1 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct 
>> pci_dev *pdev,
>>   static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
>>   {
>> +    if (!dev->aer_info)
>> +        return 1;
>> +
>>       switch (severity) {
>>       case AER_NONFATAL:
>>           return __ratelimit(&dev->aer_info->nonfatal_ratelimit);
>>
>> ---
>> base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
>> change-id: 20250801-aer_crash_2-b21cc2ef0d00
>>
>> Best regards,
>> -- 
>> Breno Leitao <leitao@debian.org>
>>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer()
  2025-10-02 10:06   ` Christophe Leroy
@ 2025-10-02 18:10     ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 8+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2025-10-02 18:10 UTC (permalink / raw)
  To: Christophe Leroy, Breno Leitao, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Jon Pan-Doh
  Cc: linuxppc-dev, linux-pci, linux-kernel, kernel-team, stable


On 10/2/25 03:06, Christophe Leroy wrote:
>
>
> Le 29/09/2025 à 17:10, Sathyanarayanan Kuppuswamy a écrit :
>>
>> On 9/29/25 2:15 AM, Breno Leitao wrote:
>>> Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
>>> when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
>>> calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
>>> does not rate limit, given this is fatal.
>>>
>>> This prevents a kernel crash triggered by dereferencing a NULL pointer
>>> in aer_ratelimit(), ensuring safer handling of PCI devices that lack
>>> AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
>>> which already performs this NULL check.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>> ---
>>> - This problem is still happening in upstream, and unfortunately no action
>>>    was done in the previous discussion.
>>> - Link to previous post:
>>>    https://eur01.safelinks.protection.outlook.com/? url=https%3A%2F%2Flore.kernel.org%2Fr%2F20250804-aer_crash_2-v1-1- fd06562c18a4%40debian.org&data=05%7C02%7Cchristophe.leroy2%40cs- soprasteria.com%7Cfd3d2f1b4e8448a8e67608ddff6a4e70%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638947554250805439%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=6yTN1%2Fq%2Fy0VKX%2BXpE%2BiKxBrn19AkY4IPj01N2ZdxEkg%3D&reserved=0
>>> ---
>>
>> Although we haven't identified the path that triggers this issue, adding this check is harmless.
>
> Is it really harmless ?
>
> The purpose of the function is to ratelimit logs. Here by returning 1 when dev->aer_info is NULL it says: don't ratelimit. Isn't it an opened door to Denial of Service by overloading with logs ?

We only skip rate limiting when dev->aer_info is NULL, which happens for
devices without AER capability. In that case, I think the trade-off is reasonable:
generating more logs is better than triggering a NULL pointer exception.

Also, this approach is consistent with other functions (for example, the stat
collection helpers) that already perform similar checks before accessing
aer_info. So extending the same safeguard here seems acceptable to me.

>
> Christophe
>
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>>
>>
>>>   drivers/pci/pcie/aer.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>> index e286c197d7167..55abc5e17b8b1 100644
>>> --- a/drivers/pci/pcie/aer.c
>>> +++ b/drivers/pci/pcie/aer.c
>>> @@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
>>>   static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
>>>   {
>>> +    if (!dev->aer_info)
>>> +        return 1;
>>> +
>>>       switch (severity) {
>>>       case AER_NONFATAL:
>>>           return __ratelimit(&dev->aer_info->nonfatal_ratelimit);
>>>
>>> ---
>>> base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
>>> change-id: 20250801-aer_crash_2-b21cc2ef0d00
>>>
>>> Best regards,
>>> -- 
>>> Breno Leitao <leitao@debian.org>
>>>
>
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-10-02 18:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-29  9:15 [PATCH RESEND] PCI/AER: Check for NULL aer_info before ratelimiting in pci_print_aer() Breno Leitao
2025-09-29 15:10 ` Sathyanarayanan Kuppuswamy
2025-10-02 10:06   ` Christophe Leroy
2025-10-02 18:10     ` Kuppuswamy Sathyanarayanan
2025-09-29 17:01 ` Christophe Leroy
2025-10-01 13:52   ` Breno Leitao
2025-10-01 21:36 ` Bjorn Helgaas
2025-10-02  9:10   ` Breno Leitao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).