* [PATCH rdma-next] IB/mlx5: Fix UMR pd cleanup on error flow of driver init
From: Leon Romanovsky @ 2024-09-02 10:35 UTC
To: Jason Gunthorpe; +Cc: Chris Mi, Jianbo Liu, linux-rdma
From: Chris Mi <cmi@nvidia.com>
The cited commit moved the pd deallocation from
mlx5r_umr_resource_cleanup() to a new function, mlx5r_umr_cleanup().
As a result, the fix in commit [1] no longer works: in the error flow
of driver init, mlx5r_umr_cleanup() is called with a NULL pd and hits
panic [2].
Fix it by checking the pd pointer and returning early if it is NULL.
[1] RDMA/mlx5: Fix UMR cleanup on error flow of driver init
[2]
[ 347.567063] infiniband mlx5_0: Couldn't register device with driver model
[ 347.591382] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 347.593438] #PF: supervisor read access in kernel mode
[ 347.595176] #PF: error_code(0x0000) - not-present page
[ 347.596962] PGD 0 P4D 0
[ 347.601361] RIP: 0010:ib_dealloc_pd_user+0x12/0xc0 [ib_core]
[ 347.604171] RSP: 0018:ffff888106293b10 EFLAGS: 00010282
[ 347.604834] RAX: 0000000000000000 RBX: 000000000000000e RCX: 0000000000000000
[ 347.605672] RDX: ffff888106293ad0 RSI: 0000000000000000 RDI: 0000000000000000
[ 347.606529] RBP: 0000000000000000 R08: ffff888106293ae0 R09: ffff888106293ae0
[ 347.607379] R10: 0000000000000a06 R11: 0000000000000000 R12: 0000000000000000
[ 347.608224] R13: ffffffffa0704dc0 R14: 0000000000000001 R15: 0000000000000001
[ 347.609067] FS: 00007fdc720cd9c0(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
[ 347.610094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 347.610727] CR2: 0000000000000020 CR3: 0000000103012003 CR4: 0000000000370eb0
[ 347.611421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 347.612113] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 347.612804] Call Trace:
[ 347.613130] <TASK>
[ 347.613417] ? __die+0x20/0x60
[ 347.613793] ? page_fault_oops+0x150/0x3e0
[ 347.614243] ? free_msg+0x68/0x80 [mlx5_core]
[ 347.614840] ? cmd_exec+0x48f/0x11d0 [mlx5_core]
[ 347.615359] ? exc_page_fault+0x74/0x130
[ 347.615808] ? asm_exc_page_fault+0x22/0x30
[ 347.616273] ? ib_dealloc_pd_user+0x12/0xc0 [ib_core]
[ 347.616801] mlx5r_umr_cleanup+0x23/0x90 [mlx5_ib]
[ 347.617365] mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x36/0x40 [mlx5_ib]
[ 347.618025] __mlx5_ib_add+0x96/0xd0 [mlx5_ib]
[ 347.618539] mlx5r_probe+0xe9/0x310 [mlx5_ib]
[ 347.619032] ? kernfs_add_one+0x107/0x150
[ 347.619478] ? __mlx5_ib_add+0xd0/0xd0 [mlx5_ib]
[ 347.619984] auxiliary_bus_probe+0x3e/0x90
[ 347.620448] really_probe+0xc5/0x3a0
[ 347.620857] __driver_probe_device+0x80/0x160
[ 347.621325] driver_probe_device+0x1e/0x90
[ 347.621770] __driver_attach+0xec/0x1c0
[ 347.622213] ? __device_attach_driver+0x100/0x100
[ 347.622724] bus_for_each_dev+0x71/0xc0
[ 347.623151] bus_add_driver+0xed/0x240
[ 347.623570] driver_register+0x58/0x100
[ 347.623998] __auxiliary_driver_register+0x6a/0xc0
[ 347.624499] ? driver_register+0xae/0x100
[ 347.624940] ? 0xffffffffa0893000
[ 347.625329] mlx5_ib_init+0x16a/0x1e0 [mlx5_ib]
[ 347.625845] do_one_initcall+0x4a/0x2a0
[ 347.626273] ? gcov_event+0x2e2/0x3a0
[ 347.626706] do_init_module+0x8a/0x260
[ 347.627126] init_module_from_file+0x8b/0xd0
[ 347.627596] __x64_sys_finit_module+0x1ca/0x2f0
[ 347.628089] do_syscall_64+0x4c/0x100
Fixes: 638420115cc4 ("IB/mlx5: Create UMR QP just before first reg_mr occurs")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/umr.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c
index eb74c163fd83..887fd6fa3ba9 100644
--- a/drivers/infiniband/hw/mlx5/umr.c
+++ b/drivers/infiniband/hw/mlx5/umr.c
@@ -224,6 +224,9 @@ int mlx5r_umr_init(struct mlx5_ib_dev *dev)
 void mlx5r_umr_cleanup(struct mlx5_ib_dev *dev)
 {
+	if (!dev->umrc.pd)
+		return;
+
 	mutex_destroy(&dev->umrc.init_lock);
 	ib_dealloc_pd(dev->umrc.pd);
 }
--
2.46.0
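The guard added by the patch can be illustrated outside the kernel. The following is only a minimal userspace sketch, assuming a cleanup routine that may run on an error path while the pd pointer is still NULL. All names (fake_pd, fake_umrc, fake_dealloc_pd, fake_umr_init, fake_umr_cleanup) are hypothetical stand-ins, not the mlx5 or ib_core API, and resetting the pointer after release is an extra step that the patch itself does not take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures; not the real mlx5 types. */
struct fake_pd {
	int handle;
};

struct fake_umrc {
	struct fake_pd *pd;	/* stays NULL until init succeeds */
	bool lock_ready;	/* stands in for the init_lock mutex */
};

static int dealloc_calls;	/* counts how often a pd was really released */

/* Like ib_dealloc_pd() in the trace, this must never be handed NULL. */
static void fake_dealloc_pd(struct fake_pd *pd)
{
	assert(pd != NULL);
	dealloc_calls++;
	free(pd);
}

/* Allocate the pd; returns 0 on success, -1 on allocation failure. */
static int fake_umr_init(struct fake_umrc *umrc)
{
	struct fake_pd *pd = calloc(1, sizeof(*pd));

	if (!pd)
		return -1;
	umrc->pd = pd;
	umrc->lock_ready = true;
	return 0;
}

/* Cleanup that tolerates being called before init ever ran. */
static void fake_umr_cleanup(struct fake_umrc *umrc)
{
	if (!umrc->pd)		/* same shape as the check in the patch */
		return;

	umrc->lock_ready = false;
	fake_dealloc_pd(umrc->pd);
	umrc->pd = NULL;	/* assumption: not in the patch, keeps a repeat call harmless */
}
```

Calling fake_umr_cleanup() on a zero-initialized struct fake_umrc returns without touching the deallocation helper, which corresponds to the error flow in the trace. After a successful fake_umr_init(), the same call releases the pd exactly once.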
* Re: [PATCH rdma-next] IB/mlx5: Fix UMR pd cleanup on error flow of driver init
From: Leon Romanovsky @ 2024-09-04 7:50 UTC
To: Jason Gunthorpe, Leon Romanovsky; +Cc: Chris Mi, Jianbo Liu, linux-rdma
On Mon, 02 Sep 2024 13:35:40 +0300, Leon Romanovsky wrote:
> [...]
Applied, thanks!
[1/1] IB/mlx5: Fix UMR pd cleanup on error flow of driver init
https://git.kernel.org/rdma/rdma/c/112e6e83a89426
Best regards,
--
Leon Romanovsky <leon@kernel.org>