* [PATCH rdma-next] RDMA/mlx5: Fix implicit ODP use after free
@ 2025-01-07 12:12 Leon Romanovsky
2025-01-13 19:46 ` Leon Romanovsky
2025-01-13 19:47 ` Jason Gunthorpe
0 siblings, 2 replies; 3+ messages in thread
From: Leon Romanovsky @ 2025-01-07 12:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Patrisious Haddad, Artemy Kovalyov, linux-rdma, Michael Guralnik
From: Patrisious Haddad <phaddad@nvidia.com>
Prevent double queueing of implicit ODP mr destroy work by adding a bit
to the MR indicating if this MR is already queued for destruction.
Without this bit, we could try to invalidate this mr twice, which in
turn could result in queuing a MR work destroy twice, and eventually the
second work could execute after the MR was freed due to the first work,
causing a user after free and trace below.
refcount_t: underflow; use-after-free.
WARNING: CPU: 2 PID: 12178 at lib/refcount.c:28 refcount_warn_saturate+0x12b/0x130
Modules linked in: bonding ib_ipoib vfio_pci ip_gre geneve nf_tables ip6_gre gre ip6_tunnel tunnel6 ipip tunnel4 ib_umad rdma_ucm mlx5_vfio_pci vfio_pci_core vfio_iommu_type1 mlx5_ib vfio ib_uverbs mlx5_core iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs]
CPU: 2 PID: 12178 Comm: kworker/u20:5 Not tainted 6.5.0-rc1_net_next_mlx5_58c644e #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound free_implicit_child_mr_work [mlx5_ib]
RIP: 0010:refcount_warn_saturate+0x12b/0x130
Code: 48 c7 c7 38 95 2a 82 c6 05 bc c6 fe 00 01 e8 0c 66 aa ff 0f 0b 5b c3 48 c7 c7 e0 94 2a 82 c6 05 a7 c6 fe 00 01 e8 f5 65 aa ff <0f> 0b 5b c3 90 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 8d 50 ff
RSP: 0018:ffff8881008e3e40 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
RDX: ffff88852c91b5c8 RSI: 0000000000000001 RDI: ffff88852c91b5c0
RBP: ffff8881dacd4e00 R08: 00000000ffffffff R09: 0000000000000019
R10: 000000000000072e R11: 0000000063666572 R12: ffff88812bfd9e00
R13: ffff8881c792d200 R14: ffff88810011c005 R15: ffff8881002099c0
FS: 0000000000000000(0000) GS:ffff88852c900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5694b5e000 CR3: 00000001153f6003 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __warn+0x79/0x120
? refcount_warn_saturate+0x12b/0x130
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? refcount_warn_saturate+0x12b/0x130
free_implicit_child_mr_work+0x180/0x1b0 [mlx5_ib]
? try_to_wake_up+0x5d/0x450
? destroy_sched_domains_rcu+0x30/0x30
process_one_work+0x1cc/0x3c0
worker_thread+0x218/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
---[ end trace 0000000000000000 ]---
Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 ++
drivers/infiniband/hw/mlx5/odp.c | 4 ++++
2 files changed, 6 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index a0138bdfa3894..696f48d871bda 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -687,6 +687,8 @@ struct mlx5_ib_mr {
/* The mr is data direct related */
u8 data_direct :1;
+ u8 implicit_destroy_queued :1;
+
union {
/* Used only by kernel MRs (umem == NULL) */
struct {
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 513023075c612..b65c3c431ddbe 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -240,6 +240,9 @@ static void destroy_unused_implicit_child_mr(struct mlx5_ib_mr *mr)
unsigned long idx = ib_umem_start(odp) >> MLX5_IMR_MTT_SHIFT;
struct mlx5_ib_mr *imr = mr->parent;
+ if (mr->implicit_destroy_queued)
+ return;
+
if (!refcount_inc_not_zero(&imr->mmkey.usecount))
return;
@@ -248,6 +251,7 @@ static void destroy_unused_implicit_child_mr(struct mlx5_ib_mr *mr)
xa_erase(&mr_to_mdev(mr)->odp_mkeys,
mlx5_base_mkey(mr->mmkey.key));
+ mr->implicit_destroy_queued = 1;
/* Freeing a MR is a sleeping operation, so bounce to a work queue */
INIT_WORK(&mr->odp_destroy.work, free_implicit_child_mr_work);
queue_work(system_unbound_wq, &mr->odp_destroy.work);
--
2.47.1
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH rdma-next] RDMA/mlx5: Fix implicit ODP use after free
2025-01-07 12:12 [PATCH rdma-next] RDMA/mlx5: Fix implicit ODP use after free Leon Romanovsky
@ 2025-01-13 19:46 ` Leon Romanovsky
2025-01-13 19:47 ` Jason Gunthorpe
1 sibling, 0 replies; 3+ messages in thread
From: Leon Romanovsky @ 2025-01-13 19:46 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Patrisious Haddad, Artemy Kovalyov, linux-rdma, Michael Guralnik
On Tue, Jan 07, 2025 at 02:12:53PM +0200, Leon Romanovsky wrote:
> From: Patrisious Haddad <phaddad@nvidia.com>
>
> Prevent double queueing of implicit ODP mr destroy work by adding a bit
> to the MR indicating if this MR is already queued for destruction.
>
> Without this bit, we could try to invalidate this mr twice, which in
> turn could result in queuing a MR work destroy twice, and eventually the
> second work could execute after the MR was freed due to the first work,
> causing a user after free and trace below.
>
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 2 PID: 12178 at lib/refcount.c:28 refcount_warn_saturate+0x12b/0x130
> Modules linked in: bonding ib_ipoib vfio_pci ip_gre geneve nf_tables ip6_gre gre ip6_tunnel tunnel6 ipip tunnel4 ib_umad rdma_ucm mlx5_vfio_pci vfio_pci_core vfio_iommu_type1 mlx5_ib vfio ib_uverbs mlx5_core iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs]
> CPU: 2 PID: 12178 Comm: kworker/u20:5 Not tainted 6.5.0-rc1_net_next_mlx5_58c644e #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> Workqueue: events_unbound free_implicit_child_mr_work [mlx5_ib]
> RIP: 0010:refcount_warn_saturate+0x12b/0x130
> Code: 48 c7 c7 38 95 2a 82 c6 05 bc c6 fe 00 01 e8 0c 66 aa ff 0f 0b 5b c3 48 c7 c7 e0 94 2a 82 c6 05 a7 c6 fe 00 01 e8 f5 65 aa ff <0f> 0b 5b c3 90 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 8d 50 ff
> RSP: 0018:ffff8881008e3e40 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
> RDX: ffff88852c91b5c8 RSI: 0000000000000001 RDI: ffff88852c91b5c0
> RBP: ffff8881dacd4e00 R08: 00000000ffffffff R09: 0000000000000019
> R10: 000000000000072e R11: 0000000063666572 R12: ffff88812bfd9e00
> R13: ffff8881c792d200 R14: ffff88810011c005 R15: ffff8881002099c0
> FS: 0000000000000000(0000) GS:ffff88852c900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f5694b5e000 CR3: 00000001153f6003 CR4: 0000000000370ea0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __warn+0x79/0x120
> ? refcount_warn_saturate+0x12b/0x130
> ? report_bug+0x17c/0x190
> ? handle_bug+0x3c/0x60
> ? exc_invalid_op+0x14/0x70
> ? asm_exc_invalid_op+0x16/0x20
> ? refcount_warn_saturate+0x12b/0x130
> free_implicit_child_mr_work+0x180/0x1b0 [mlx5_ib]
> ? try_to_wake_up+0x5d/0x450
> ? destroy_sched_domains_rcu+0x30/0x30
> process_one_work+0x1cc/0x3c0
> worker_thread+0x218/0x3c0
> ? process_one_work+0x3c0/0x3c0
> kthread+0xc6/0xf0
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
> </TASK>
> ---[ end trace 0000000000000000 ]---
>
> Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy")
> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 ++
> drivers/infiniband/hw/mlx5/odp.c | 4 ++++
> 2 files changed, 6 insertions(+)
I'm dropping this patch, need to rewrite it.
Thanks
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH rdma-next] RDMA/mlx5: Fix implicit ODP use after free
2025-01-07 12:12 [PATCH rdma-next] RDMA/mlx5: Fix implicit ODP use after free Leon Romanovsky
2025-01-13 19:46 ` Leon Romanovsky
@ 2025-01-13 19:47 ` Jason Gunthorpe
1 sibling, 0 replies; 3+ messages in thread
From: Jason Gunthorpe @ 2025-01-13 19:47 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Patrisious Haddad, Artemy Kovalyov, linux-rdma, Michael Guralnik
On Tue, Jan 07, 2025 at 02:12:53PM +0200, Leon Romanovsky wrote:
> From: Patrisious Haddad <phaddad@nvidia.com>
>
> Prevent double queueing of implicit ODP mr destroy work by adding a bit
> to the MR indicating if this MR is already queued for destruction.
I do not like adding random state bits to structs, you don't have any
locking protecting these bits, it is probably not safe.
We already have the xarray with its own locking and tracking, just use it:
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 4eb03fc0d30228..858a35a335dff6 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -228,13 +228,20 @@ static void destroy_unused_implicit_child_mr(struct mlx5_ib_mr *mr)
unsigned long idx = ib_umem_start(odp) >> MLX5_IMR_MTT_SHIFT;
struct mlx5_ib_mr *imr = mr->parent;
- if (!refcount_inc_not_zero(&imr->mmkey.usecount))
+ xa_lock(&imr->implicit_children);
+ if (__xa_cmpxchg(&imr->implicit_children, idx, mr, NULL, GFP_KERNEL) !=
+ mr) {
+ xa_unlock(&imr->implicit_children);
return;
+ }
- xa_erase(&imr->implicit_children, idx);
if (MLX5_CAP_ODP(mr_to_mdev(mr)->mdev, mem_page_fault))
xa_erase(&mr_to_mdev(mr)->odp_mkeys,
mlx5_base_mkey(mr->mmkey.key));
+ xa_unlock(&imr->implicit_children);
+
+ if (!refcount_inc_not_zero(&imr->mmkey.usecount))
+ return;
/* Freeing a MR is a sleeping operation, so bounce to a work queue */
INIT_WORK(&mr->odp_destroy.work, free_implicit_child_mr_work);
@@ -500,18 +507,18 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr,
refcount_inc(&ret->mmkey.usecount);
goto out_lock;
}
- xa_unlock(&imr->implicit_children);
if (MLX5_CAP_ODP(dev->mdev, mem_page_fault)) {
ret = xa_store(&dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key),
&mr->mmkey, GFP_KERNEL);
if (xa_is_err(ret)) {
ret = ERR_PTR(xa_err(ret));
- xa_erase(&imr->implicit_children, idx);
- goto out_mr;
+ __xa_erase(&imr->implicit_children, idx);
+ goto out_lock;
}
mr->mmkey.type = MLX5_MKEY_IMPLICIT_CHILD;
}
+ xa_unlock(&imr->implicit_children);
mlx5_ib_dbg(mr_to_mdev(imr), "key %x mr %p\n", mr->mmkey.key, mr);
return mr;
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-01-13 19:47 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-07 12:12 [PATCH rdma-next] RDMA/mlx5: Fix implicit ODP use after free Leon Romanovsky
2025-01-13 19:46 ` Leon Romanovsky
2025-01-13 19:47 ` Jason Gunthorpe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.