* [PATCH rdma-next v1 1/6] RDMA/mlx5: Fix fortify source warning while accessing Eth segment
2024-01-28 9:29 [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
@ 2024-01-28 9:29 ` Leon Romanovsky
2024-01-28 9:29 ` [PATCH rdma-next v1 2/6] IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported Leon Romanovsky
` (6 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-28 9:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Edward Srouji, linux-rdma, Maor Gottlieb,
Mark Zhang, Michael Guralnik, Or Har-Toov, Tamar Mashiah,
Yishai Hadas
From: Leon Romanovsky <leonro@nvidia.com>
------------[ cut here ]------------
memcpy: detected field-spanning write (size 56) of single field "eseg->inline_hdr.start" at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 (size 2)
WARNING: CPU: 0 PID: 293779 at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
Modules linked in: 8021q garp mrp stp llc rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) pci_hyperv_intf mlxdevm(OE) mlx_compat(OE) tls mlxfw(OE) psample nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink mst_pciconf(OE) knem(OE) vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd irqbypass cuse nfsv3 nfs fscache netfs xfrm_user xfrm_algo ipmi_devintf ipmi_msghandler binfmt_misc crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 snd_pcsp aesni_intel crypto_simd cryptd snd_pcm snd_timer joydev snd soundcore input_leds serio_raw evbug nfsd auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc drm efi_pstore ip_tables x_tables autofs4 psmouse virtio_net net_failover failover floppy
[last unloaded: mlx_compat(OE)]
CPU: 0 PID: 293779 Comm: ssh Tainted: G OE 6.2.0-32-generic #32~22.04.1-Ubuntu
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
Code: 0c 01 00 a8 01 75 25 48 8b 75 a0 b9 02 00 00 00 48 c7 c2 10 5b fd c0 48 c7 c7 80 5b fd c0 c6 05 57 0c 03 00 01 e8 95 4d 93 da <0f> 0b 44 8b 4d b0 4c 8b 45 c8 48 8b 4d c0 e9 49 fb ff ff 41 0f b7
RSP: 0018:ffffb5b48478b570 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffb5b48478b628 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffb5b48478b5e8
R13: ffff963a3c609b5e R14: ffff9639c3fbd800 R15: ffffb5b480475a80
FS: 00007fc03b444c80(0000) GS:ffff963a3dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556f46bdf000 CR3: 0000000006ac6003 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? show_regs+0x72/0x90
? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
? __warn+0x8d/0x160
? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
? report_bug+0x1bb/0x1d0
? handle_bug+0x46/0x90
? exc_invalid_op+0x19/0x80
? asm_exc_invalid_op+0x1b/0x20
? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
mlx5_ib_post_send_nodrain+0xb/0x20 [mlx5_ib]
ipoib_send+0x2ec/0x770 [ib_ipoib]
ipoib_start_xmit+0x5a0/0x770 [ib_ipoib]
dev_hard_start_xmit+0x8e/0x1e0
? validate_xmit_skb_list+0x4d/0x80
sch_direct_xmit+0x116/0x3a0
__dev_xmit_skb+0x1fd/0x580
__dev_queue_xmit+0x284/0x6b0
? _raw_spin_unlock_irq+0xe/0x50
? __flush_work.isra.0+0x20d/0x370
? push_pseudo_header+0x17/0x40 [ib_ipoib]
neigh_connected_output+0xcd/0x110
ip_finish_output2+0x179/0x480
? __smp_call_single_queue+0x61/0xa0
__ip_finish_output+0xc3/0x190
ip_finish_output+0x2e/0xf0
ip_output+0x78/0x110
? __pfx_ip_finish_output+0x10/0x10
ip_local_out+0x64/0x70
__ip_queue_xmit+0x18a/0x460
ip_queue_xmit+0x15/0x30
__tcp_transmit_skb+0x914/0x9c0
tcp_write_xmit+0x334/0x8d0
tcp_push_one+0x3c/0x60
tcp_sendmsg_locked+0x2e1/0xac0
tcp_sendmsg+0x2d/0x50
inet_sendmsg+0x43/0x90
sock_sendmsg+0x68/0x80
sock_write_iter+0x93/0x100
vfs_write+0x326/0x3c0
ksys_write+0xbd/0xf0
? do_syscall_64+0x69/0x90
__x64_sys_write+0x19/0x30
do_syscall_64+0x59/0x90
? do_user_addr_fault+0x1d0/0x640
? exit_to_user_mode_prepare+0x3b/0xd0
? irqentry_exit_to_user_mode+0x9/0x20
? irqentry_exit+0x43/0x50
? exc_page_fault+0x92/0x1b0
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fc03ad14a37
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffdf8697fe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000008024 RCX: 00007fc03ad14a37
RDX: 0000000000008024 RSI: 0000556f46bd8270 RDI: 0000000000000003
RBP: 0000556f46bb1800 R08: 0000000000007fe3 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 0000556f46bc66b0 R14: 000000000000000a R15: 0000556f46bb2f50
</TASK>
---[ end trace 0000000000000000 ]---
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/wr.c | 2 +-
include/linux/mlx5/qp.h | 5 ++++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/wr.c b/drivers/infiniband/hw/mlx5/wr.c
index df1d1b0a3ef7..9947feb7fb8a 100644
--- a/drivers/infiniband/hw/mlx5/wr.c
+++ b/drivers/infiniband/hw/mlx5/wr.c
@@ -78,7 +78,7 @@ static void set_eth_seg(const struct ib_send_wr *wr, struct mlx5_ib_qp *qp,
*/
copysz = min_t(u64, *cur_edge - (void *)eseg->inline_hdr.start,
left);
- memcpy(eseg->inline_hdr.start, pdata, copysz);
+ memcpy(eseg->inline_hdr.data, pdata, copysz);
stride = ALIGN(sizeof(struct mlx5_wqe_eth_seg) -
sizeof(eseg->inline_hdr.start) + copysz, 16);
*size += stride / 16;
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index bd53cf4be7bd..f0e55bf3ec8b 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -269,7 +269,10 @@ struct mlx5_wqe_eth_seg {
union {
struct {
__be16 sz;
- u8 start[2];
+ union {
+ u8 start[2];
+ DECLARE_FLEX_ARRAY(u8, data);
+ };
} inline_hdr;
struct {
__be16 type;
--
2.43.0
* [PATCH rdma-next v1 2/6] IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
2024-01-28 9:29 [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
2024-01-28 9:29 ` [PATCH rdma-next v1 1/6] RDMA/mlx5: Fix fortify source warning while accessing Eth segment Leon Romanovsky
@ 2024-01-28 9:29 ` Leon Romanovsky
2024-01-28 9:29 ` [PATCH rdma-next v1 3/6] RDMA/mlx5: Relax DEVX access upon modify commands Leon Romanovsky
` (5 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-28 9:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Zhang, Leon Romanovsky, Edward Srouji, linux-rdma,
Maor Gottlieb, Michael Guralnik, Or Har-Toov, Tamar Mashiah,
Yishai Hadas
From: Mark Zhang <markzhang@nvidia.com>
debugfs entries for RRoCE general CC parameters must be exposed only when
they are supported; otherwise, accessing them may produce a syndrome error
in the kernel log, for example:
$ cat /sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp
cat: '/sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp': Invalid argument
$ dmesg
mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1253): QUERY_CONG_PARAMS(0x824) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x325a82), err(-22)
Fixes: 66fb1d5df6ac ("IB/mlx5: Extend debug control for CC parameters")
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/cong.c | 6 ++++++
include/linux/mlx5/mlx5_ifc.h | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/cong.c b/drivers/infiniband/hw/mlx5/cong.c
index f87531318feb..a78a067e3ce7 100644
--- a/drivers/infiniband/hw/mlx5/cong.c
+++ b/drivers/infiniband/hw/mlx5/cong.c
@@ -458,6 +458,12 @@ void mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u32 port_num)
dbg_cc_params->root = debugfs_create_dir("cc_params", mlx5_debugfs_get_dev_root(mdev));
for (i = 0; i < MLX5_IB_DBG_CC_MAX; i++) {
+ if ((i == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP_VALID ||
+ i == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP))
+ if (!MLX5_CAP_GEN(mdev, roce) ||
+ !MLX5_CAP_ROCE(mdev, roce_cc_general))
+ continue;
+
dbg_cc_params->params[i].offset = i;
dbg_cc_params->params[i].dev = dev;
dbg_cc_params->params[i].port_num = port_num;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index bf5320b28b8b..2c10350bd422 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1103,7 +1103,7 @@ struct mlx5_ifc_roce_cap_bits {
u8 sw_r_roce_src_udp_port[0x1];
u8 fl_rc_qp_when_roce_disabled[0x1];
u8 fl_rc_qp_when_roce_enabled[0x1];
- u8 reserved_at_7[0x1];
+ u8 roce_cc_general[0x1];
u8 qp_ooo_transmit_default[0x1];
u8 reserved_at_9[0x15];
u8 qp_ts_format[0x2];
--
2.43.0
* [PATCH rdma-next v1 3/6] RDMA/mlx5: Relax DEVX access upon modify commands
2024-01-28 9:29 [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
2024-01-28 9:29 ` [PATCH rdma-next v1 1/6] RDMA/mlx5: Fix fortify source warning while accessing Eth segment Leon Romanovsky
2024-01-28 9:29 ` [PATCH rdma-next v1 2/6] IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported Leon Romanovsky
@ 2024-01-28 9:29 ` Leon Romanovsky
2024-01-28 9:29 ` [PATCH rdma-next v1 4/6] RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent Leon Romanovsky
` (4 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-28 9:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Yishai Hadas, Leon Romanovsky, Edward Srouji, linux-rdma,
Maor Gottlieb, Mark Zhang, Michael Guralnik, Or Har-Toov,
Tamar Mashiah
From: Yishai Hadas <yishaih@nvidia.com>
Relax DEVX access upon modify commands to be UVERBS_ACCESS_READ.
The kernel doesn't need to protect what firmware protects, or what
causes no damage to anyone but the user.
As firmware needs to protect itself from parallel access to the same
object, don't block parallel modify/query commands on the same object in
the kernel side.
This change will allow a user space application to run parallel updates
to different entries in the same bulk object.
Tested-by: Tamar Mashiah <tmashiah@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/devx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 4d8f5180134e..9d91790a2af2 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -2950,7 +2950,7 @@ DECLARE_UVERBS_NAMED_METHOD(
MLX5_IB_METHOD_DEVX_OBJ_MODIFY,
UVERBS_ATTR_IDR(MLX5_IB_ATTR_DEVX_OBJ_MODIFY_HANDLE,
UVERBS_IDR_ANY_OBJECT,
- UVERBS_ACCESS_WRITE,
+ UVERBS_ACCESS_READ,
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(
MLX5_IB_ATTR_DEVX_OBJ_MODIFY_CMD_IN,
--
2.43.0
* [PATCH rdma-next v1 4/6] RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent
2024-01-28 9:29 [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
` (2 preceding siblings ...)
2024-01-28 9:29 ` [PATCH rdma-next v1 3/6] RDMA/mlx5: Relax DEVX access upon modify commands Leon Romanovsky
@ 2024-01-28 9:29 ` Leon Romanovsky
2024-01-28 9:29 ` [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys Leon Romanovsky
` (3 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-28 9:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Or Har-Toov, Leon Romanovsky, Edward Srouji, linux-rdma,
Maor Gottlieb, Mark Zhang, Michael Guralnik, Tamar Mashiah,
Yishai Hadas
From: Or Har-Toov <ohartoov@nvidia.com>
Since some mkeys can't be modified with UMR due to UMR limitations, such
as the size of the translation that can be updated, not all user mkeys
can be cached.
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 4bdf3da579f4..69b1722c2280 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -643,7 +643,7 @@ struct mlx5_ib_mkey {
unsigned int ndescs;
struct wait_queue_head wait;
refcount_t usecount;
- /* User Mkey must hold either a rb_key or a cache_ent. */
+ /* Cacheable user Mkey must hold either a rb_key or a cache_ent. */
struct mlx5r_cache_rb_key rb_key;
struct mlx5_cache_ent *cache_ent;
};
--
2.43.0
* [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
2024-01-28 9:29 [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
` (3 preceding siblings ...)
2024-01-28 9:29 ` [PATCH rdma-next v1 4/6] RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent Leon Romanovsky
@ 2024-01-28 9:29 ` Leon Romanovsky
2024-01-29 17:52 ` Jason Gunthorpe
2024-01-28 9:29 ` [PATCH rdma-next v1 6/6] RDMA/mlx5: Adding remote atomic access flag to updatable flags Leon Romanovsky
` (2 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-28 9:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Or Har-Toov, Leon Romanovsky, Edward Srouji, linux-rdma,
Maor Gottlieb, Mark Zhang, Michael Guralnik, Tamar Mashiah,
Yishai Hadas
From: Or Har-Toov <ohartoov@nvidia.com>
In the dereg flow, the UMEM is not a good enough indication of whether an
MR comes from userspace, since in mlx5_ib_rereg_user_mr there are cases
where a new MR is created and the UMEM of the old MR is set to NULL.
Currently, when mlx5_ib_dereg_mr is called on the old MR, the UMEM is NULL
but cache_ent can be non-NULL, so the mkey will not be destroyed.
Therefore, checking whether an mkey comes from a user application and is
cacheable should be done by checking whether rb_key or cache_ent exists;
all other kinds of mkeys should be destroyed.
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/mr.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 12bca6ca4760..87552a689e07 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1857,6 +1857,11 @@ static int cache_ent_find_and_store(struct mlx5_ib_dev *dev,
return ret;
}
+static bool is_cacheable_mkey(struct mlx5_ib_mkey *mkey)
+{
+ return mkey->cache_ent || mkey->rb_key.ndescs;
+}
+
int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
{
struct mlx5_ib_mr *mr = to_mmr(ibmr);
@@ -1901,12 +1906,6 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
mr->sig = NULL;
}
- /* Stop DMA */
- if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length))
- if (mlx5r_umr_revoke_mr(mr) ||
- cache_ent_find_and_store(dev, mr))
- mr->mmkey.cache_ent = NULL;
-
if (mr->umem && mr->umem->is_peer) {
rc = mlx5r_umr_revoke_mr(mr);
if (rc)
@@ -1914,7 +1913,9 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
ib_umem_stop_invalidation_notifier(mr->umem);
}
- if (!mr->mmkey.cache_ent) {
+ /* Stop DMA */
+ if (!is_cacheable_mkey(&mr->mmkey) || mlx5r_umr_revoke_mr(mr) ||
+ cache_ent_find_and_store(dev, mr)) {
rc = destroy_mkey(to_mdev(mr->ibmr.device), mr);
if (rc)
return rc;
--
2.43.0
* Re: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
2024-01-28 9:29 ` [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys Leon Romanovsky
@ 2024-01-29 17:52 ` Jason Gunthorpe
2024-01-30 13:47 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2024-01-29 17:52 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Or Har-Toov, Leon Romanovsky, Edward Srouji, linux-rdma,
Maor Gottlieb, Mark Zhang, Michael Guralnik, Tamar Mashiah,
Yishai Hadas
On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
> From: Or Har-Toov <ohartoov@nvidia.com>
>
> In the dereg flow, UMEM is not a good enough indication whether an MR
> is from userspace since in mlx5_ib_rereg_user_mr there are some cases
> when a new MR is created and the UMEM of the old MR is set to NULL.
Why is this a problem though? The only thing the umem has to do is to
trigger the UMR optimization. If UMR is not triggered then the mkey is
destroyed and it shouldn't be part of the cache at all.
> Currently when mlx5_ib_dereg_mr is called on the old MR, UMEM is NULL
> but cache_ent can be different than NULL. So, the mkey will not be
> destroyed.
> Therefore checking if mkey is from user application and cacheable
> should be done by checking if rb_key or cache_ent exist and all other kind of
> mkeys should be destroyed.
>
> Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
> Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/infiniband/hw/mlx5/mr.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index 12bca6ca4760..87552a689e07 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1857,6 +1857,11 @@ static int cache_ent_find_and_store(struct mlx5_ib_dev *dev,
> return ret;
> }
>
> +static bool is_cacheable_mkey(struct mlx5_ib_mkey *mkey)
> +{
> + return mkey->cache_ent || mkey->rb_key.ndescs;
> +}
> +
> int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> {
> struct mlx5_ib_mr *mr = to_mmr(ibmr);
> @@ -1901,12 +1906,6 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> mr->sig = NULL;
> }
>
> - /* Stop DMA */
> - if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length))
> - if (mlx5r_umr_revoke_mr(mr) ||
> - cache_ent_find_and_store(dev, mr))
> - mr->mmkey.cache_ent = NULL;
> -
> if (mr->umem && mr->umem->is_peer) {
> rc = mlx5r_umr_revoke_mr(mr);
> if (rc)
?? this isn't based on an upstream tree
> @@ -1914,7 +1913,9 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> ib_umem_stop_invalidation_notifier(mr->umem);
> }
>
> - if (!mr->mmkey.cache_ent) {
> + /* Stop DMA */
> + if (!is_cacheable_mkey(&mr->mmkey) || mlx5r_umr_revoke_mr(mr) ||
> + cache_ent_find_and_store(dev, mr)) {
And now the mlx5r_umr_can_load_pas() check has been lost; that isn't
good. A non-UMR-able object should never be placed in the cache. If the
mkey's size is too big, it has to be freed normally.
> rc = destroy_mkey(to_mdev(mr->ibmr.device), mr);
> if (rc)
> return rc;
I'm not sure it is right to re-order this? The revocation of an mkey
should be a single operation, whichever path we choose to take.
Regardless, the upstream code doesn't have this ordering, so it should
all be one sequence of revoking the mkey and synchronizing the cache.
I suggest putting the revoke sequence into one function:
static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
{
	struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);

	if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length)) {
		if (mlx5r_umr_revoke_mr(mr))
			goto destroy;
		if (cache_ent_find_and_store(dev, mr))
			goto destroy;
		return 0;
	}

destroy:
	if (mr->mmkey.cache_ent) {
		struct mlx5_cache_ent *ent = mr->mmkey.cache_ent;

		spin_lock_irq(&ent->mkeys_queue.lock);
		ent->in_use--;
		mr->mmkey.cache_ent = NULL;
		spin_unlock_irq(&ent->mkeys_queue.lock);
	}
	return destroy_mkey(dev, mr);
}
(notice we probably shouldn't set cache_ent to null without adjusting in_use)
Jason
* Re: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
2024-01-29 17:52 ` Jason Gunthorpe
@ 2024-01-30 13:47 ` Leon Romanovsky
0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-30 13:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Or Har-Toov, Edward Srouji, linux-rdma, Maor Gottlieb, Mark Zhang,
Michael Guralnik, Tamar Mashiah, Yishai Hadas
On Mon, Jan 29, 2024 at 01:52:39PM -0400, Jason Gunthorpe wrote:
> On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
> > From: Or Har-Toov <ohartoov@nvidia.com>
<...>
> > if (mr->umem && mr->umem->is_peer) {
> > rc = mlx5r_umr_revoke_mr(mr);
> > if (rc)
>
> ?? this isn't based on an upstream tree
Yes, it is my mistake. I will fix it.
Thanks
* [PATCH rdma-next v1 6/6] RDMA/mlx5: Adding remote atomic access flag to updatable flags
2024-01-28 9:29 [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
` (4 preceding siblings ...)
2024-01-28 9:29 ` [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys Leon Romanovsky
@ 2024-01-28 9:29 ` Leon Romanovsky
2024-01-31 9:16 ` (subset) [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
2024-01-31 9:18 ` Leon Romanovsky
7 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-28 9:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Or Har-Toov, Leon Romanovsky, Edward Srouji, linux-rdma,
Maor Gottlieb, Mark Zhang, Michael Guralnik, Tamar Mashiah,
Yishai Hadas
From: Or Har-Toov <ohartoov@nvidia.com>
Currently, IB_ACCESS_REMOTE_ATOMIC is blocked from being updated via UMR
although in some cases it should be possible. These cases are checked in
the mlx5r_umr_can_reconfig() function.
Fixes: ef3642c4f54d ("RDMA/mlx5: Fix error unwinds for rereg_mr")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/mr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 87552a689e07..db8c436de6ee 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1581,7 +1581,8 @@ static bool can_use_umr_rereg_access(struct mlx5_ib_dev *dev,
unsigned int diffs = current_access_flags ^ target_access_flags;
if (diffs & ~(IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
- IB_ACCESS_REMOTE_READ | IB_ACCESS_RELAXED_ORDERING))
+ IB_ACCESS_REMOTE_READ | IB_ACCESS_RELAXED_ORDERING |
+ IB_ACCESS_REMOTE_ATOMIC))
return false;
return mlx5r_umr_can_reconfig(dev, current_access_flags,
target_access_flags);
--
2.43.0
* Re: (subset) [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes
2024-01-28 9:29 [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
` (5 preceding siblings ...)
2024-01-28 9:29 ` [PATCH rdma-next v1 6/6] RDMA/mlx5: Adding remote atomic access flag to updatable flags Leon Romanovsky
@ 2024-01-31 9:16 ` Leon Romanovsky
2024-01-31 9:18 ` Leon Romanovsky
7 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-31 9:16 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Edward Srouji, linux-rdma, Maor Gottlieb, Mark Zhang,
Michael Guralnik, Or Har-Toov, Tamar Mashiah, Yishai Hadas,
Leon Romanovsky
On Sun, 28 Jan 2024 11:29:10 +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Changelog:
> v1:
> * Changed function signature is_cacheable_mkey to pass pointer and not value.
> v0: https://lore.kernel.org/all/cover.1706185318.git.leon@kernel.org
>
> [...]
Applied, thanks!
[1/6] RDMA/mlx5: Fix fortify source warning while accessing Eth segment
https://git.kernel.org/rdma/rdma/c/4d5e86a56615cc
[2/6] IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
https://git.kernel.org/rdma/rdma/c/43fdbd140238d4
[3/6] RDMA/mlx5: Relax DEVX access upon modify commands
https://git.kernel.org/rdma/rdma/c/be551ee1574280
Best regards,
--
Leon Romanovsky <leon@kernel.org>
* Re: [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes
2024-01-28 9:29 [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
` (6 preceding siblings ...)
2024-01-31 9:16 ` (subset) [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes Leon Romanovsky
@ 2024-01-31 9:18 ` Leon Romanovsky
7 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-01-31 9:18 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Edward Srouji, linux-rdma, Maor Gottlieb, Mark Zhang,
Michael Guralnik, Or Har-Toov, Tamar Mashiah, Yishai Hadas
On Sun, Jan 28, 2024 at 11:29:10AM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
<...>
> Leon Romanovsky (1):
> RDMA/mlx5: Fix fortify source warning while accessing Eth segment
>
> Mark Zhang (1):
> IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if
> not supported
>
> Yishai Hadas (1):
> RDMA/mlx5: Relax DEVX access upon modify commands
Applied these patches to -rc.
>
> Or Har-Toov (3):
> RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent
> RDMA/mlx5: Change check for cacheable user mkeys
> RDMA/mlx5: Adding remote atomic access flag to updatable flags
These patches under discussion and will be needed to resend anyway.
Thanks