* [PATCH net 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
@ 2025-06-27 6:50 Mohith Kumar Thummaluru
2025-07-02 17:58 ` Jacob Keller
2025-09-16 5:24 ` Shay Drori
0 siblings, 2 replies; 10+ messages in thread
From: Mohith Kumar Thummaluru @ 2025-06-27 6:50 UTC (permalink / raw)
To: saeedm@nvidia.com, leon@kernel.org, tariqt@nvidia.com,
netdev@vger.kernel.org
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, jacob.e.keller@intel.com,
shayd@nvidia.com, elic@nvidia.com, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Mohith Kumar Thummaluru,
Anand Khoje, Manjunath Patil, Rama Nichanamatlu,
Rajesh Sivaramasubramaniom, Rohit Sajan Kumar, Moshe Shemesh,
Mark Bloch, Qing Huang
When request_irq() fails because IRQ vectors are exhausted, the error
path in mlx5_irq_alloc() frees the entire rmap, not just the entry it
added. Other threads that still use the rmap can then crash[1] when
they access it. This commit modifies the cleanup to remove only the
specific IRQ mapping that was just added.
This prevents removal of other valid mappings and ensures precise
cleanup of the failed IRQ allocation's associated glue object.
Note: This error is observed when both fwctl and rds configs are enabled.
[1]
mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1
mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
request irq. err = -28
infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
trying to test write-combining support
mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1
mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1
mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
request irq. err = -28
infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
trying to test write-combining support
mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1
mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
request irq. err = -28
mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
request irq. err = -28
general protection fault, probably for non-canonical address
0xe277a58fde16f291: 0000 [#1] SMP NOPTI
RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
Call Trace:
<TASK>
? show_trace_log_lvl+0x1d6/0x2f9
? show_trace_log_lvl+0x1d6/0x2f9
? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
? __die_body.cold+0x8/0xa
? die_addr+0x39/0x53
? exc_general_protection+0x1c4/0x3e9
? dev_vprintk_emit+0x5f/0x90
? asm_exc_general_protection+0x22/0x27
? free_irq_cpu_rmap+0x23/0x7d
mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
irq_pool_request_vector+0x7d/0x90 [mlx5_core]
mlx5_irq_request+0x2e/0xe0 [mlx5_core]
mlx5_irq_request_vector+0xad/0xf7 [mlx5_core]
comp_irq_request_pci+0x64/0xf0 [mlx5_core]
create_comp_eq+0x71/0x385 [mlx5_core]
? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core]
mlx5_comp_eqn_get+0x72/0x90 [mlx5_core]
? xas_load+0x8/0x91
mlx5_comp_irqn_get+0x40/0x90 [mlx5_core]
mlx5e_open_channel+0x7d/0x3c7 [mlx5_core]
mlx5e_open_channels+0xad/0x250 [mlx5_core]
mlx5e_open_locked+0x3e/0x110 [mlx5_core]
mlx5e_open+0x23/0x70 [mlx5_core]
__dev_open+0xf1/0x1a5
__dev_change_flags+0x1e1/0x249
dev_change_flags+0x21/0x5c
do_setlink+0x28b/0xcc4
? __nla_parse+0x22/0x3d
? inet6_validate_link_af+0x6b/0x108
? cpumask_next+0x1f/0x35
? __snmp6_fill_stats64.constprop.0+0x66/0x107
? __nla_validate_parse+0x48/0x1e6
__rtnl_newlink+0x5ff/0xa57
? kmem_cache_alloc_trace+0x164/0x2ce
rtnl_newlink+0x44/0x6e
rtnetlink_rcv_msg+0x2bb/0x362
? __netlink_sendskb+0x4c/0x6c
? netlink_unicast+0x28f/0x2ce
? rtnl_calcit.isra.0+0x150/0x146
netlink_rcv_skb+0x5f/0x112
netlink_unicast+0x213/0x2ce
netlink_sendmsg+0x24f/0x4d9
__sock_sendmsg+0x65/0x6a
____sys_sendmsg+0x28f/0x2c9
? import_iovec+0x17/0x2b
___sys_sendmsg+0x97/0xe0
__sys_sendmsg+0x81/0xd8
do_syscall_64+0x35/0x87
entry_SYSCALL_64_after_hwframe+0x6e/0x0
RIP: 0033:0x7fc328603727
Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed
ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727
RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d
RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc
</TASK>
---[ end trace f43ce73c3c2b13a2 ]---
RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00
74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31
f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff
RSP: 0018:ff384881640eaca0 EFLAGS: 00010282
RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400
RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88
R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8
FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
kvm-guest: disable async PF for cpu 0
Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
Tested-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
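Note for readers of the thread (not part of the commit message): the
difference between the old and the new cleanup can be illustrated with a
small userspace model. This is only a sketch under stated assumptions.
struct toy_rmap and all toy_* helpers are hypothetical names invented for
illustration; they are not the kernel's cpu_rmap implementation, and the
real irq_cpu_rmap_remove() and free_irq_cpu_rmap() do more than this. The
only point shown is that a failed allocation should undo its own entry
and leave the shared map, and everyone else's entries, alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_RMAP_SLOTS 8

/* Toy stand-in for the shared IRQ reverse map (hypothetical, not the kernel type). */
struct toy_rmap {
	int irqs[TOY_RMAP_SLOTS];	/* 0 means "slot unused" */
	size_t used;
};

struct toy_rmap *toy_rmap_alloc(void)
{
	return calloc(1, sizeof(struct toy_rmap));
}

/* Register one IRQ; returns 0 on success, -1 if irq is invalid or the map is full. */
int toy_rmap_add(struct toy_rmap *rmap, int irq)
{
	size_t i;

	if (irq <= 0)
		return -1;
	for (i = 0; i < TOY_RMAP_SLOTS; i++) {
		if (rmap->irqs[i] == 0) {
			rmap->irqs[i] = irq;
			rmap->used++;
			return 0;
		}
	}
	return -1;
}

/* Targeted cleanup: drop only the entry for irq, leave all others alone. */
int toy_rmap_remove(struct toy_rmap *rmap, int irq)
{
	size_t i;

	if (irq <= 0)
		return -1;
	for (i = 0; i < TOY_RMAP_SLOTS; i++) {
		if (rmap->irqs[i] == irq) {
			rmap->irqs[i] = 0;
			rmap->used--;
			return 0;
		}
	}
	return -1;
}

bool toy_rmap_has(const struct toy_rmap *rmap, int irq)
{
	size_t i;

	for (i = 0; i < TOY_RMAP_SLOTS; i++)
		if (irq > 0 && rmap->irqs[i] == irq)
			return true;
	return false;
}

/* Heavy-handed cleanup: destroys the whole map, including other users' entries. */
void toy_rmap_free(struct toy_rmap **rmap)
{
	assert(rmap != NULL);
	free(*rmap);
	*rmap = NULL;
}

/*
 * Model of the fixed error path: the new IRQ is added to the map, the
 * (simulated) request step fails, and only that one entry is removed.
 */
int toy_irq_alloc(struct toy_rmap *rmap, int irq, bool request_fails)
{
	if (toy_rmap_add(rmap, irq))
		return -1;
	if (request_fails) {
		toy_rmap_remove(rmap, irq);
		return -1;
	}
	return 0;
}
```

With entries 10 and 11 registered, a failing toy_irq_alloc(rmap, 12, true)
leaves 10 and 11 in place. Calling toy_rmap_free() at that point instead
would also take the map away from the users of 10 and 11, which is the
shape of the problem described above.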
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index 40024cfa3099..822e92ed2d45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -325,8 +325,7 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
 err_req_irq:
 #ifdef CONFIG_RFS_ACCEL
 	if (i && rmap && *rmap) {
-		free_irq_cpu_rmap(*rmap);
-		*rmap = NULL;
+		irq_cpu_rmap_remove(*rmap, irq->map.virq);
 	}
 err_irq_rmap:
 #endif
--
2.43.5
* Re: [PATCH net 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-06-27 6:50 [PATCH net 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure Mohith Kumar Thummaluru
@ 2025-07-02 17:58 ` Jacob Keller
2025-09-12 3:56 ` Pradyumn Rahar
2025-09-16 5:24 ` Shay Drori
1 sibling, 1 reply; 10+ messages in thread
From: Jacob Keller @ 2025-07-02 17:58 UTC (permalink / raw)
To: Mohith Kumar Thummaluru, saeedm@nvidia.com, leon@kernel.org,
tariqt@nvidia.com, netdev@vger.kernel.org
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, shayd@nvidia.com,
elic@nvidia.com, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Anand Khoje, Manjunath Patil,
Rama Nichanamatlu, Rajesh Sivaramasubramaniom, Rohit Sajan Kumar,
Moshe Shemesh, Mark Bloch, Qing Huang
On 6/26/2025 11:50 PM, Mohith Kumar Thummaluru wrote:
> The mlx5_irq_alloc() function can inadvertently free the entire rmap
> and end up in a crash[1] when the other threads tries to access this,
> when request_irq() fails due to exhausted IRQ vectors. This commit
> modifies the cleanup to remove only the specific IRQ mapping that was
> just added.
>
> This prevents removal of other valid mappings and ensures precise
> cleanup of the failed IRQ allocation's associated glue object.
>
> Note: This error is observed when both fwctl and rds configs are enabled.
>
FWIW, figured I should add it here:
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
* Re: [PATCH net 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-07-02 17:58 ` Jacob Keller
@ 2025-09-12 3:56 ` Pradyumn Rahar
0 siblings, 0 replies; 10+ messages in thread
From: Pradyumn Rahar @ 2025-09-12 3:56 UTC (permalink / raw)
To: Jacob Keller, Mohith Kumar Thummaluru, saeedm@nvidia.com,
leon@kernel.org, tariqt@nvidia.com, netdev@vger.kernel.org
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, shayd@nvidia.com,
elic@nvidia.com, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Anand Khoje, Manjunath Patil,
Rama Nichanamatlu, Rajesh Sivaramasubramaniom, Rohit Sajan Kumar,
Moshe Shemesh, Mark Bloch, Qing Huang
On 02-07-2025 23:28, Jacob Keller wrote:
> On 6/26/2025 11:50 PM, Mohith Kumar Thummaluru wrote:
>> The mlx5_irq_alloc() function can inadvertently free the entire rmap
>> and end up in a crash[1] when the other threads tries to access this,
>> when request_irq() fails due to exhausted IRQ vectors. This commit
>> modifies the cleanup to remove only the specific IRQ mapping that was
>> just added.
>>
>> This prevents removal of other valid mappings and ensures precise
>> cleanup of the failed IRQ allocation's associated glue object.
>>
>> Note: This error is observed when both fwctl and rds configs are enabled.
>>
> FWIW, figured I should add it here:
>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Hi, this patch has been reviewed but hasn't been applied, could you
please look into it?
* Re: [PATCH net 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-06-27 6:50 [PATCH net 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure Mohith Kumar Thummaluru
2025-07-02 17:58 ` Jacob Keller
@ 2025-09-16 5:24 ` Shay Drori
2025-09-16 19:53 ` Tariq Toukan
2025-09-23 6:28 ` [PATCH net v2 " Pradyumn Rahar
1 sibling, 2 replies; 10+ messages in thread
From: Shay Drori @ 2025-09-16 5:24 UTC (permalink / raw)
To: Mohith Kumar Thummaluru, saeedm@nvidia.com, leon@kernel.org,
tariqt@nvidia.com, netdev@vger.kernel.org
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, jacob.e.keller@intel.com,
elic@nvidia.com, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Anand Khoje, Manjunath Patil,
Rama Nichanamatlu, Rajesh Sivaramasubramaniom, Rohit Sajan Kumar,
Moshe Shemesh, Mark Bloch, Qing Huang
Hi, sorry for the late response :(
On 27/06/2025 9:50, Mohith Kumar Thummaluru wrote:
> The mlx5_irq_alloc() function can inadvertently free the entire rmap
> and end up in a crash[1] when the other threads tries to access this,
> when request_irq() fails due to exhausted IRQ vectors. This commit
> modifies the cleanup to remove only the specific IRQ mapping that was
> just added.
>
> This prevents removal of other valid mappings and ensures precise
> cleanup of the failed IRQ allocation's associated glue object.
>
> Note: This error is observed when both fwctl and rds configs are enabled.
>
>
> Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
> Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
> Tested-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> index 40024cfa3099..822e92ed2d45 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> @@ -325,8 +325,7 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
> err_req_irq:
> #ifdef CONFIG_RFS_ACCEL
> if (i && rmap && *rmap) {
> - free_irq_cpu_rmap(*rmap);
> - *rmap = NULL;
> + irq_cpu_rmap_remove(*rmap, irq->map.virq);
> }
Now that the if body is a single statement, please drop the braces.
Other than that:
Reviewed-by: Shay Drory <shayd@nvidia.com>
> err_irq_rmap:
> #endif
> --
> 2.43.5
>
>
>
* Re: [PATCH net 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-09-16 5:24 ` Shay Drori
@ 2025-09-16 19:53 ` Tariq Toukan
2025-09-23 6:28 ` [PATCH net v2 " Pradyumn Rahar
1 sibling, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2025-09-16 19:53 UTC (permalink / raw)
To: Shay Drori, Mohith Kumar Thummaluru, saeedm@nvidia.com,
leon@kernel.org, tariqt@nvidia.com, netdev@vger.kernel.org
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, jacob.e.keller@intel.com,
elic@nvidia.com, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Anand Khoje, Manjunath Patil,
Rama Nichanamatlu, Rajesh Sivaramasubramaniom, Rohit Sajan Kumar,
Moshe Shemesh, Mark Bloch, Qing Huang
On 16/09/2025 8:24, Shay Drori wrote:
> Hi, sorry for the late response :(
>
> On 27/06/2025 9:50, Mohith Kumar Thummaluru wrote:
..
>
> Now that the if body is a single statement, please drop the braces.
>
> Other than that:
> Reviewed-by: Shay Drory <shayd@nvidia.com>
>
LGTM.
Acked-by: Tariq Toukan <tariqt@nvidia.com>
* [PATCH net v2 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-09-16 5:24 ` Shay Drori
2025-09-16 19:53 ` Tariq Toukan
@ 2025-09-23 6:28 ` Pradyumn Rahar
2025-09-24 18:14 ` Keller, Jacob E
` (2 more replies)
1 sibling, 3 replies; 10+ messages in thread
From: Pradyumn Rahar @ 2025-09-23 6:28 UTC (permalink / raw)
To: shayd
Cc: anand.a.khoje, andrew+netdev, davem, edumazet, elic,
jacob.e.keller, kuba, leon, linux-kernel, linux-rdma,
manjunath.b.patil, mbloch, pradyumn.rahar, moshe, netdev, pabeni,
qing.huang, rajesh.sivaramasubramaniom, rama.nichanamatlu,
rohit.sajan.kumar, saeedm, tariqt
When request_irq() fails because IRQ vectors are exhausted, the error
path in mlx5_irq_alloc() frees the entire rmap, not just the entry it
added. Other threads that still use the rmap can then crash[1] when
they access it. This commit modifies the cleanup to remove only the
specific IRQ mapping that was just added.
This prevents removal of other valid mappings and ensures precise
cleanup of the failed IRQ allocation's associated glue object.
Note: This error is observed when both fwctl and rds configs are enabled.
[1]
mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1
mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
request irq. err = -28
infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
trying to test write-combining support
mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1
mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1
mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
request irq. err = -28
infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
trying to test write-combining support
mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1
mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
request irq. err = -28
mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
request irq. err = -28
general protection fault, probably for non-canonical address
0xe277a58fde16f291: 0000 [#1] SMP NOPTI
RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
Call Trace:
<TASK>
? show_trace_log_lvl+0x1d6/0x2f9
? show_trace_log_lvl+0x1d6/0x2f9
? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
? __die_body.cold+0x8/0xa
? die_addr+0x39/0x53
? exc_general_protection+0x1c4/0x3e9
? dev_vprintk_emit+0x5f/0x90
? asm_exc_general_protection+0x22/0x27
? free_irq_cpu_rmap+0x23/0x7d
mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
irq_pool_request_vector+0x7d/0x90 [mlx5_core]
mlx5_irq_request+0x2e/0xe0 [mlx5_core]
mlx5_irq_request_vector+0xad/0xf7 [mlx5_core]
comp_irq_request_pci+0x64/0xf0 [mlx5_core]
create_comp_eq+0x71/0x385 [mlx5_core]
? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core]
mlx5_comp_eqn_get+0x72/0x90 [mlx5_core]
? xas_load+0x8/0x91
mlx5_comp_irqn_get+0x40/0x90 [mlx5_core]
mlx5e_open_channel+0x7d/0x3c7 [mlx5_core]
mlx5e_open_channels+0xad/0x250 [mlx5_core]
mlx5e_open_locked+0x3e/0x110 [mlx5_core]
mlx5e_open+0x23/0x70 [mlx5_core]
__dev_open+0xf1/0x1a5
__dev_change_flags+0x1e1/0x249
dev_change_flags+0x21/0x5c
do_setlink+0x28b/0xcc4
? __nla_parse+0x22/0x3d
? inet6_validate_link_af+0x6b/0x108
? cpumask_next+0x1f/0x35
? __snmp6_fill_stats64.constprop.0+0x66/0x107
? __nla_validate_parse+0x48/0x1e6
__rtnl_newlink+0x5ff/0xa57
? kmem_cache_alloc_trace+0x164/0x2ce
rtnl_newlink+0x44/0x6e
rtnetlink_rcv_msg+0x2bb/0x362
? __netlink_sendskb+0x4c/0x6c
? netlink_unicast+0x28f/0x2ce
? rtnl_calcit.isra.0+0x150/0x146
netlink_rcv_skb+0x5f/0x112
netlink_unicast+0x213/0x2ce
netlink_sendmsg+0x24f/0x4d9
__sock_sendmsg+0x65/0x6a
____sys_sendmsg+0x28f/0x2c9
? import_iovec+0x17/0x2b
___sys_sendmsg+0x97/0xe0
__sys_sendmsg+0x81/0xd8
do_syscall_64+0x35/0x87
entry_SYSCALL_64_after_hwframe+0x6e/0x0
RIP: 0033:0x7fc328603727
Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed
ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727
RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d
RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc
</TASK>
---[ end trace f43ce73c3c2b13a2 ]---
RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00
74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31
f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff
RSP: 0018:ff384881640eaca0 EFLAGS: 00010282
RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400
RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88
R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8
FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
kvm-guest: disable async PF for cpu 0
Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
Tested-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
---
v1->v2: removed unnecessary braces from the if statement.
---
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index 692ef9c2f729..82ada674f8e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -324,10 +324,8 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
 	free_irq(irq->map.virq, &irq->nh);
 err_req_irq:
 #ifdef CONFIG_RFS_ACCEL
-	if (i && rmap && *rmap) {
-		free_irq_cpu_rmap(*rmap);
-		*rmap = NULL;
-	}
+	if (i && rmap && *rmap)
+		irq_cpu_rmap_remove(*rmap, irq->map.virq);
 err_irq_rmap:
 #endif
 	if (i && pci_msix_can_alloc_dyn(dev->pdev))
--
2.43.7
* RE: [PATCH net v2 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-09-23 6:28 ` [PATCH net v2 " Pradyumn Rahar
@ 2025-09-24 18:14 ` Keller, Jacob E
2025-09-28 5:55 ` Tariq Toukan
2025-10-27 14:29 ` Pradyumn Rahar
2 siblings, 0 replies; 10+ messages in thread
From: Keller, Jacob E @ 2025-09-24 18:14 UTC (permalink / raw)
To: Pradyumn Rahar, shayd@nvidia.com
Cc: anand.a.khoje@oracle.com, andrew+netdev@lunn.ch,
davem@davemloft.net, edumazet@google.com, elic@nvidia.com,
kuba@kernel.org, leon@kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, manjunath.b.patil@oracle.com,
mbloch@nvidia.com, moshe@nvidia.com, netdev@vger.kernel.org,
pabeni@redhat.com, qing.huang@oracle.com,
rajesh.sivaramasubramaniom@oracle.com,
rama.nichanamatlu@oracle.com, rohit.sajan.kumar@oracle.com,
saeedm@nvidia.com, tariqt@nvidia.com
On Monday, September 22, 2025 11:28 PM, Pradyumn Rahar <pradyumn.rahar@oracle.com> wrote:
>
> The mlx5_irq_alloc() function can inadvertently free the entire rmap
> and end up in a crash[1] when the other threads tries to access this,
> when request_irq() fails due to exhausted IRQ vectors. This commit
> modifies the cleanup to remove only the specific IRQ mapping that was
> just added.
>
> This prevents removal of other valid mappings and ensures precise
> cleanup of the failed IRQ allocation's associated glue object.
>
> Note: This error is observed when both fwctl and rds configs are enabled.
>
> Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
> Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
> Tested-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
> ---
> v1->v2: removed unnecessary braces from the if condition.
> ---
> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> index 692ef9c2f729..82ada674f8e2 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> @@ -324,10 +324,8 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
> free_irq(irq->map.virq, &irq->nh);
> err_req_irq:
> #ifdef CONFIG_RFS_ACCEL
> - if (i && rmap && *rmap) {
> - free_irq_cpu_rmap(*rmap);
> - *rmap = NULL;
> - }
> + if (i && rmap && *rmap)
> + irq_cpu_rmap_remove(*rmap, irq->map.virq);

Presumably, if this fails during initialization, the caller of mlx5_irq_alloc(), which allocates multiple IRQs, would be responsible for cleaning up anything allocated before the failure. Makes sense. Cleaning up only what this function added makes more sense.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

> err_irq_rmap:
> #endif
> if (i && pci_msix_can_alloc_dyn(dev->pdev))
> --
> 2.43.7
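
The cleanup rule discussed above (undo only the entry this call added, leave the shared map alone) can be illustrated with a small user-space sketch. Everything here is hypothetical: struct toy_rmap and the toy_* helpers are stand-ins invented for illustration, not the kernel's struct cpu_rmap or its API, and the request_ok flag merely simulates request_irq() succeeding or failing with -28 as in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a shared IRQ reverse map; NOT the kernel's cpu_rmap. */
struct toy_rmap {
	size_t size;	/* number of slots */
	int *irqs;	/* irqs[k] == 0 means slot k is empty */
};

static struct toy_rmap *toy_rmap_alloc(size_t size)
{
	struct toy_rmap *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->irqs = calloc(size, sizeof(*m->irqs));
	if (!m->irqs) {
		free(m);
		return NULL;
	}
	m->size = size;
	return m;
}

/* Store a non-zero irq in the first empty slot; return its index or -1. */
static int toy_rmap_add(struct toy_rmap *m, int irq)
{
	for (size_t k = 0; k < m->size; k++) {
		if (m->irqs[k] == 0) {
			m->irqs[k] = irq;
			return (int)k;
		}
	}
	return -1;
}

/* Clear only the slot holding this irq; other entries stay valid. */
static bool toy_rmap_remove(struct toy_rmap *m, int irq)
{
	for (size_t k = 0; k < m->size; k++) {
		if (m->irqs[k] == irq) {
			m->irqs[k] = 0;
			return true;
		}
	}
	return false;
}

static size_t toy_rmap_count(const struct toy_rmap *m)
{
	size_t n = 0;

	for (size_t k = 0; k < m->size; k++)
		if (m->irqs[k] != 0)
			n++;
	return n;
}

/* Whole-map teardown: only the owner of the map should call this. */
static void toy_rmap_free(struct toy_rmap **m)
{
	if (!m || !*m)
		return;
	free((*m)->irqs);
	free(*m);
	*m = NULL;
}

/*
 * Hypothetical allocation path: add the irq to the shared map, then
 * "request" it. On failure, undo only the entry added by this call
 * instead of freeing the map that other users still rely on.
 */
static int toy_irq_alloc(struct toy_rmap *m, int irq, bool request_ok)
{
	if (toy_rmap_add(m, irq) < 0)
		return -1;
	if (!request_ok) {
		toy_rmap_remove(m, irq);
		return -28;	/* simulated failure, mirroring "err = -28" */
	}
	return 0;
}
```

In this sketch a failed toy_irq_alloc() leaves earlier entries in place, and toy_rmap_free() is reserved for whoever owns the map, which is the split of responsibilities the review comment describes.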
* Re: [PATCH net v2 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-09-23 6:28 ` [PATCH net v2 " Pradyumn Rahar
2025-09-24 18:14 ` Keller, Jacob E
@ 2025-09-28 5:55 ` Tariq Toukan
2025-10-27 14:29 ` Pradyumn Rahar
2 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2025-09-28 5:55 UTC (permalink / raw)
To: Pradyumn Rahar, shayd
Cc: anand.a.khoje, andrew+netdev, davem, edumazet, elic,
jacob.e.keller, kuba, leon, linux-kernel, linux-rdma,
manjunath.b.patil, mbloch, moshe, netdev, pabeni, qing.huang,
rajesh.sivaramasubramaniom, rama.nichanamatlu, rohit.sajan.kumar,
saeedm, tariqt
On 23/09/2025 9:28, Pradyumn Rahar wrote:
> The mlx5_irq_alloc() function can inadvertently free the entire rmap
> when request_irq() fails due to exhausted IRQ vectors, leading to a
> crash[1] when other threads try to access it. This commit modifies the
> cleanup to remove only the specific IRQ mapping that was just added.
>
> This prevents removal of other valid mappings and ensures precise
> cleanup of the failed IRQ allocation's associated glue object.
>
> Note: This error is observed when both fwctl and rds configs are enabled.
>
> [1]
> mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1
> mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
> request irq. err = -28
> infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
> trying to test write-combining support
> mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1
> mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1
> mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
> request irq. err = -28
> infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
> trying to test write-combining support
> mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1
> mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
> request irq. err = -28
> mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
> request irq. err = -28
> general protection fault, probably for non-canonical address
> 0xe277a58fde16f291: 0000 [#1] SMP NOPTI
>
> RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
> Call Trace:
> <TASK>
> ? show_trace_log_lvl+0x1d6/0x2f9
> ? show_trace_log_lvl+0x1d6/0x2f9
> ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
> ? __die_body.cold+0x8/0xa
> ? die_addr+0x39/0x53
> ? exc_general_protection+0x1c4/0x3e9
> ? dev_vprintk_emit+0x5f/0x90
> ? asm_exc_general_protection+0x22/0x27
> ? free_irq_cpu_rmap+0x23/0x7d
> mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
> irq_pool_request_vector+0x7d/0x90 [mlx5_core]
> mlx5_irq_request+0x2e/0xe0 [mlx5_core]
> mlx5_irq_request_vector+0xad/0xf7 [mlx5_core]
> comp_irq_request_pci+0x64/0xf0 [mlx5_core]
> create_comp_eq+0x71/0x385 [mlx5_core]
> ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core]
> mlx5_comp_eqn_get+0x72/0x90 [mlx5_core]
> ? xas_load+0x8/0x91
> mlx5_comp_irqn_get+0x40/0x90 [mlx5_core]
> mlx5e_open_channel+0x7d/0x3c7 [mlx5_core]
> mlx5e_open_channels+0xad/0x250 [mlx5_core]
> mlx5e_open_locked+0x3e/0x110 [mlx5_core]
> mlx5e_open+0x23/0x70 [mlx5_core]
> __dev_open+0xf1/0x1a5
> __dev_change_flags+0x1e1/0x249
> dev_change_flags+0x21/0x5c
> do_setlink+0x28b/0xcc4
> ? __nla_parse+0x22/0x3d
> ? inet6_validate_link_af+0x6b/0x108
> ? cpumask_next+0x1f/0x35
> ? __snmp6_fill_stats64.constprop.0+0x66/0x107
> ? __nla_validate_parse+0x48/0x1e6
> __rtnl_newlink+0x5ff/0xa57
> ? kmem_cache_alloc_trace+0x164/0x2ce
> rtnl_newlink+0x44/0x6e
> rtnetlink_rcv_msg+0x2bb/0x362
> ? __netlink_sendskb+0x4c/0x6c
> ? netlink_unicast+0x28f/0x2ce
> ? rtnl_calcit.isra.0+0x150/0x146
> netlink_rcv_skb+0x5f/0x112
> netlink_unicast+0x213/0x2ce
> netlink_sendmsg+0x24f/0x4d9
> __sock_sendmsg+0x65/0x6a
> ____sys_sendmsg+0x28f/0x2c9
> ? import_iovec+0x17/0x2b
> ___sys_sendmsg+0x97/0xe0
> __sys_sendmsg+0x81/0xd8
> do_syscall_64+0x35/0x87
> entry_SYSCALL_64_after_hwframe+0x6e/0x0
> RIP: 0033:0x7fc328603727
> Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed
> ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00
> f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
> RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727
> RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d
> RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc
> </TASK>
> ---[ end trace f43ce73c3c2b13a2 ]---
> RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
> Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00
> 74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31
> f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff
> RSP: 0018:ff384881640eaca0 EFLAGS: 00010282
> RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400
> RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88
> R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8
> FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Kernel panic - not syncing: Fatal exception
> Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range:
> 0xffffffff80000000-0xffffffffbfffffff)
> kvm-guest: disable async PF for cpu 0
>
> Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
> Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
> Tested-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
> ---
> v1->v2: removed unnecessary braces from the if condition.
> ---
> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> index 692ef9c2f729..82ada674f8e2 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> @@ -324,10 +324,8 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
> free_irq(irq->map.virq, &irq->nh);
> err_req_irq:
> #ifdef CONFIG_RFS_ACCEL
> - if (i && rmap && *rmap) {
> - free_irq_cpu_rmap(*rmap);
> - *rmap = NULL;
> - }
> + if (i && rmap && *rmap)
> + irq_cpu_rmap_remove(*rmap, irq->map.virq);
> err_irq_rmap:
> #endif
> if (i && pci_msix_can_alloc_dyn(dev->pdev))
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Thanks.
* Re: [PATCH net v2 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-09-23 6:28 ` [PATCH net v2 " Pradyumn Rahar
2025-09-24 18:14 ` Keller, Jacob E
2025-09-28 5:55 ` Tariq Toukan
@ 2025-10-27 14:29 ` Pradyumn Rahar
2025-11-03 11:37 ` Tariq Toukan
2 siblings, 1 reply; 10+ messages in thread
From: Pradyumn Rahar @ 2025-10-27 14:29 UTC (permalink / raw)
To: shayd
Cc: anand.a.khoje, andrew+netdev, davem, edumazet, elic,
jacob.e.keller, kuba, leon, linux-kernel, linux-rdma,
manjunath.b.patil, mbloch, moshe, netdev, pabeni, qing.huang,
rajesh.sivaramasubramaniom, rama.nichanamatlu, rohit.sajan.kumar,
saeedm, tariqt
On 23-09-2025 11:58, Pradyumn Rahar wrote:
> The mlx5_irq_alloc() function can inadvertently free the entire rmap
> when request_irq() fails due to exhausted IRQ vectors, leading to a
> crash[1] when other threads try to access it. This commit modifies the
> cleanup to remove only the specific IRQ mapping that was just added.
>
> This prevents removal of other valid mappings and ensures precise
> cleanup of the failed IRQ allocation's associated glue object.
>
> Note: This error is observed when both fwctl and rds configs are enabled.
>
> [1]
> mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1
> mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
> request irq. err = -28
> infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
> trying to test write-combining support
> mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1
> mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1
> mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
> request irq. err = -28
> infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
> trying to test write-combining support
> mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1
> mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
> request irq. err = -28
> mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
> request irq. err = -28
> general protection fault, probably for non-canonical address
> 0xe277a58fde16f291: 0000 [#1] SMP NOPTI
>
> RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
> Call Trace:
> <TASK>
> ? show_trace_log_lvl+0x1d6/0x2f9
> ? show_trace_log_lvl+0x1d6/0x2f9
> ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
> ? __die_body.cold+0x8/0xa
> ? die_addr+0x39/0x53
> ? exc_general_protection+0x1c4/0x3e9
> ? dev_vprintk_emit+0x5f/0x90
> ? asm_exc_general_protection+0x22/0x27
> ? free_irq_cpu_rmap+0x23/0x7d
> mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
> irq_pool_request_vector+0x7d/0x90 [mlx5_core]
> mlx5_irq_request+0x2e/0xe0 [mlx5_core]
> mlx5_irq_request_vector+0xad/0xf7 [mlx5_core]
> comp_irq_request_pci+0x64/0xf0 [mlx5_core]
> create_comp_eq+0x71/0x385 [mlx5_core]
> ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core]
> mlx5_comp_eqn_get+0x72/0x90 [mlx5_core]
> ? xas_load+0x8/0x91
> mlx5_comp_irqn_get+0x40/0x90 [mlx5_core]
> mlx5e_open_channel+0x7d/0x3c7 [mlx5_core]
> mlx5e_open_channels+0xad/0x250 [mlx5_core]
> mlx5e_open_locked+0x3e/0x110 [mlx5_core]
> mlx5e_open+0x23/0x70 [mlx5_core]
> __dev_open+0xf1/0x1a5
> __dev_change_flags+0x1e1/0x249
> dev_change_flags+0x21/0x5c
> do_setlink+0x28b/0xcc4
> ? __nla_parse+0x22/0x3d
> ? inet6_validate_link_af+0x6b/0x108
> ? cpumask_next+0x1f/0x35
> ? __snmp6_fill_stats64.constprop.0+0x66/0x107
> ? __nla_validate_parse+0x48/0x1e6
> __rtnl_newlink+0x5ff/0xa57
> ? kmem_cache_alloc_trace+0x164/0x2ce
> rtnl_newlink+0x44/0x6e
> rtnetlink_rcv_msg+0x2bb/0x362
> ? __netlink_sendskb+0x4c/0x6c
> ? netlink_unicast+0x28f/0x2ce
> ? rtnl_calcit.isra.0+0x150/0x146
> netlink_rcv_skb+0x5f/0x112
> netlink_unicast+0x213/0x2ce
> netlink_sendmsg+0x24f/0x4d9
> __sock_sendmsg+0x65/0x6a
> ____sys_sendmsg+0x28f/0x2c9
> ? import_iovec+0x17/0x2b
> ___sys_sendmsg+0x97/0xe0
> __sys_sendmsg+0x81/0xd8
> do_syscall_64+0x35/0x87
> entry_SYSCALL_64_after_hwframe+0x6e/0x0
> RIP: 0033:0x7fc328603727
> Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed
> ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00
> f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
> RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727
> RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d
> RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc
> </TASK>
> ---[ end trace f43ce73c3c2b13a2 ]---
> RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
> Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00
> 74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31
> f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff
> RSP: 0018:ff384881640eaca0 EFLAGS: 00010282
> RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400
> RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88
> R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8
> FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Kernel panic - not syncing: Fatal exception
> Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range:
> 0xffffffff80000000-0xffffffffbfffffff)
> kvm-guest: disable async PF for cpu 0
>
> Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
> Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
> Tested-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
> ---
> v1->v2: removed unnecessary braces from the if condition.
> ---
> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> index 692ef9c2f729..82ada674f8e2 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
> @@ -324,10 +324,8 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
> free_irq(irq->map.virq, &irq->nh);
> err_req_irq:
> #ifdef CONFIG_RFS_ACCEL
> - if (i && rmap && *rmap) {
> - free_irq_cpu_rmap(*rmap);
> - *rmap = NULL;
> - }
> + if (i && rmap && *rmap)
> + irq_cpu_rmap_remove(*rmap, irq->map.virq);
> err_irq_rmap:
> #endif
> if (i && pci_msix_can_alloc_dyn(dev->pdev))
Hi, this patch has been reviewed but hasn't been applied yet. Could you
please look into it?
Thanks.
* Re: [PATCH net v2 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure
2025-10-27 14:29 ` Pradyumn Rahar
@ 2025-11-03 11:37 ` Tariq Toukan
0 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2025-11-03 11:37 UTC (permalink / raw)
To: Pradyumn Rahar, shayd
Cc: anand.a.khoje, andrew+netdev, davem, edumazet, elic,
jacob.e.keller, kuba, leon, linux-kernel, linux-rdma,
manjunath.b.patil, mbloch, moshe, netdev, pabeni, qing.huang,
rajesh.sivaramasubramaniom, rama.nichanamatlu, rohit.sajan.kumar,
saeedm, tariqt
On 27/10/2025 16:29, Pradyumn Rahar wrote:
>
> On 23-09-2025 11:58, Pradyumn Rahar wrote:
>> The mlx5_irq_alloc() function can inadvertently free the entire rmap
>> when request_irq() fails due to exhausted IRQ vectors, leading to a
>> crash[1] when other threads try to access it. This commit modifies the
>> cleanup to remove only the specific IRQ mapping that was just added.
>>
>> This prevents removal of other valid mappings and ensures precise
>> cleanup of the failed IRQ allocation's associated glue object.
>>
>> Note: This error is observed when both fwctl and rds configs are enabled.
>>
>> [1]
>> mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1
>> mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
>> request irq. err = -28
>> infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
>> trying to test write-combining support
>> mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1
>> mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1
>> mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
>> request irq. err = -28
>> infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
>> trying to test write-combining support
>> mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1
>> mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
>> request irq. err = -28
>> mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
>> request irq. err = -28
>> general protection fault, probably for non-canonical address
>> 0xe277a58fde16f291: 0000 [#1] SMP NOPTI
>>
>> RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
>> Call Trace:
>> <TASK>
>> ? show_trace_log_lvl+0x1d6/0x2f9
>> ? show_trace_log_lvl+0x1d6/0x2f9
>> ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
>> ? __die_body.cold+0x8/0xa
>> ? die_addr+0x39/0x53
>> ? exc_general_protection+0x1c4/0x3e9
>> ? dev_vprintk_emit+0x5f/0x90
>> ? asm_exc_general_protection+0x22/0x27
>> ? free_irq_cpu_rmap+0x23/0x7d
>> mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
>> irq_pool_request_vector+0x7d/0x90 [mlx5_core]
>> mlx5_irq_request+0x2e/0xe0 [mlx5_core]
>> mlx5_irq_request_vector+0xad/0xf7 [mlx5_core]
>> comp_irq_request_pci+0x64/0xf0 [mlx5_core]
>> create_comp_eq+0x71/0x385 [mlx5_core]
>> ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core]
>> mlx5_comp_eqn_get+0x72/0x90 [mlx5_core]
>> ? xas_load+0x8/0x91
>> mlx5_comp_irqn_get+0x40/0x90 [mlx5_core]
>> mlx5e_open_channel+0x7d/0x3c7 [mlx5_core]
>> mlx5e_open_channels+0xad/0x250 [mlx5_core]
>> mlx5e_open_locked+0x3e/0x110 [mlx5_core]
>> mlx5e_open+0x23/0x70 [mlx5_core]
>> __dev_open+0xf1/0x1a5
>> __dev_change_flags+0x1e1/0x249
>> dev_change_flags+0x21/0x5c
>> do_setlink+0x28b/0xcc4
>> ? __nla_parse+0x22/0x3d
>> ? inet6_validate_link_af+0x6b/0x108
>> ? cpumask_next+0x1f/0x35
>> ? __snmp6_fill_stats64.constprop.0+0x66/0x107
>> ? __nla_validate_parse+0x48/0x1e6
>> __rtnl_newlink+0x5ff/0xa57
>> ? kmem_cache_alloc_trace+0x164/0x2ce
>> rtnl_newlink+0x44/0x6e
>> rtnetlink_rcv_msg+0x2bb/0x362
>> ? __netlink_sendskb+0x4c/0x6c
>> ? netlink_unicast+0x28f/0x2ce
>> ? rtnl_calcit.isra.0+0x150/0x146
>> netlink_rcv_skb+0x5f/0x112
>> netlink_unicast+0x213/0x2ce
>> netlink_sendmsg+0x24f/0x4d9
>> __sock_sendmsg+0x65/0x6a
>> ____sys_sendmsg+0x28f/0x2c9
>> ? import_iovec+0x17/0x2b
>> ___sys_sendmsg+0x97/0xe0
>> __sys_sendmsg+0x81/0xd8
>> do_syscall_64+0x35/0x87
>> entry_SYSCALL_64_after_hwframe+0x6e/0x0
>> RIP: 0033:0x7fc328603727
>> Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed
>> ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00
>> f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
>> RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727
>> RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d
>> RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
>> R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc
>> </TASK>
>> ---[ end trace f43ce73c3c2b13a2 ]---
>> RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
>> Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00
>> 74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31
>> f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff
>> RSP: 0018:ff384881640eaca0 EFLAGS: 00010282
>> RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000
>> RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400
>> RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88
>> R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8
>> FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> PKRU: 55555554
>> Kernel panic - not syncing: Fatal exception
>> Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range:
>> 0xffffffff80000000-0xffffffffbfffffff)
>> kvm-guest: disable async PF for cpu 0
>>
>> Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
>> Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
>> Tested-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
>> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
>> Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
>> ---
>> v1->v2: removed unnecessary braces from the if condition.
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 6 ++----
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
>> index 692ef9c2f729..82ada674f8e2 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
>> @@ -324,10 +324,8 @@ struct mlx5_irq *mlx5_irq_alloc(struct mlx5_irq_pool *pool, int i,
>> free_irq(irq->map.virq, &irq->nh);
>> err_req_irq:
>> #ifdef CONFIG_RFS_ACCEL
>> - if (i && rmap && *rmap) {
>> - free_irq_cpu_rmap(*rmap);
>> - *rmap = NULL;
>> - }
>> + if (i && rmap && *rmap)
>> + irq_cpu_rmap_remove(*rmap, irq->map.virq);
>> err_irq_rmap:
>> #endif
>> if (i && pci_msix_can_alloc_dyn(dev->pdev))
>
> Hi, this patch has been reviewed but hasn't been applied yet. Could you
> please look into it?
>
> Thanks.
>
>
I'll re-send it.
Thanks.
Thread overview: 10+ messages
2025-06-27 6:50 [PATCH net 1/1] net/mlx5: Clean up only new IRQ glue on request_irq() failure Mohith Kumar Thummaluru
2025-07-02 17:58 ` Jacob Keller
2025-09-12 3:56 ` Pradyumn Rahar
2025-09-16 5:24 ` Shay Drori
2025-09-16 19:53 ` Tariq Toukan
2025-09-23 6:28 ` [PATCH net v2 " Pradyumn Rahar
2025-09-24 18:14 ` Keller, Jacob E
2025-09-28 5:55 ` Tariq Toukan
2025-10-27 14:29 ` Pradyumn Rahar
2025-11-03 11:37 ` Tariq Toukan