* [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes
From: Leon Romanovsky @ 2024-01-28  9:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Edward Srouji, linux-rdma, Maor Gottlieb,
	Mark Zhang, Michael Guralnik, Or Har-Toov, Tamar Mashiah,
	Yishai Hadas

From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v1:
 * Changed the is_cacheable_mkey() signature to take a pointer instead of a value.
v0: https://lore.kernel.org/all/cover.1706185318.git.leon@kernel.org

---------------------------------------------------------------------------------

Hi,

A collection of independent fixes for the mlx5_ib driver.

Thanks

Leon Romanovsky (1):
  RDMA/mlx5: Fix fortify source warning while accessing Eth segment

Mark Zhang (1):
  IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if
    not supported

Or Har-Toov (3):
  RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent
  RDMA/mlx5: Change check for cacheable user mkeys
  RDMA/mlx5: Adding remote atomic access flag to updatable flags

Yishai Hadas (1):
  RDMA/mlx5: Relax DEVX access upon modify commands

 drivers/infiniband/hw/mlx5/cong.c    |  6 ++++++
 drivers/infiniband/hw/mlx5/devx.c    |  2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  2 +-
 drivers/infiniband/hw/mlx5/mr.c      | 18 ++++++++++--------
 drivers/infiniband/hw/mlx5/wr.c      |  2 +-
 include/linux/mlx5/mlx5_ifc.h        |  2 +-
 include/linux/mlx5/qp.h              |  5 ++++-
 7 files changed, 24 insertions(+), 13 deletions(-)

-- 
2.43.0



* [PATCH rdma-next v1 1/6] RDMA/mlx5: Fix fortify source warning while accessing Eth segment
From: Leon Romanovsky @ 2024-01-28  9:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Edward Srouji, linux-rdma, Maor Gottlieb,
	Mark Zhang, Michael Guralnik, Or Har-Toov, Tamar Mashiah,
	Yishai Hadas

From: Leon Romanovsky <leonro@nvidia.com>

 ------------[ cut here ]------------
 memcpy: detected field-spanning write (size 56) of single field "eseg->inline_hdr.start" at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 (size 2)
 WARNING: CPU: 0 PID: 293779 at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
 Modules linked in: 8021q garp mrp stp llc rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) pci_hyperv_intf mlxdevm(OE) mlx_compat(OE) tls mlxfw(OE) psample nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink mst_pciconf(OE) knem(OE) vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd irqbypass cuse nfsv3 nfs fscache netfs xfrm_user xfrm_algo ipmi_devintf ipmi_msghandler binfmt_misc crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 snd_pcsp aesni_intel crypto_simd cryptd snd_pcm snd_timer joydev snd soundcore input_leds serio_raw evbug nfsd auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc drm efi_pstore ip_tables x_tables autofs4 psmouse virtio_net net_failover failover floppy
  [last unloaded: mlx_compat(OE)]
 CPU: 0 PID: 293779 Comm: ssh Tainted: G           OE      6.2.0-32-generic #32~22.04.1-Ubuntu
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
 Code: 0c 01 00 a8 01 75 25 48 8b 75 a0 b9 02 00 00 00 48 c7 c2 10 5b fd c0 48 c7 c7 80 5b fd c0 c6 05 57 0c 03 00 01 e8 95 4d 93 da <0f> 0b 44 8b 4d b0 4c 8b 45 c8 48 8b 4d c0 e9 49 fb ff ff 41 0f b7
 RSP: 0018:ffffb5b48478b570 EFLAGS: 00010046
 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffb5b48478b628 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffffb5b48478b5e8
 R13: ffff963a3c609b5e R14: ffff9639c3fbd800 R15: ffffb5b480475a80
 FS:  00007fc03b444c80(0000) GS:ffff963a3dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000556f46bdf000 CR3: 0000000006ac6003 CR4: 00000000003706f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  ? show_regs+0x72/0x90
  ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
  ? __warn+0x8d/0x160
  ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
  ? report_bug+0x1bb/0x1d0
  ? handle_bug+0x46/0x90
  ? exc_invalid_op+0x19/0x80
  ? asm_exc_invalid_op+0x1b/0x20
  ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
  mlx5_ib_post_send_nodrain+0xb/0x20 [mlx5_ib]
  ipoib_send+0x2ec/0x770 [ib_ipoib]
  ipoib_start_xmit+0x5a0/0x770 [ib_ipoib]
  dev_hard_start_xmit+0x8e/0x1e0
  ? validate_xmit_skb_list+0x4d/0x80
  sch_direct_xmit+0x116/0x3a0
  __dev_xmit_skb+0x1fd/0x580
  __dev_queue_xmit+0x284/0x6b0
  ? _raw_spin_unlock_irq+0xe/0x50
  ? __flush_work.isra.0+0x20d/0x370
  ? push_pseudo_header+0x17/0x40 [ib_ipoib]
  neigh_connected_output+0xcd/0x110
  ip_finish_output2+0x179/0x480
  ? __smp_call_single_queue+0x61/0xa0
  __ip_finish_output+0xc3/0x190
  ip_finish_output+0x2e/0xf0
  ip_output+0x78/0x110
  ? __pfx_ip_finish_output+0x10/0x10
  ip_local_out+0x64/0x70
  __ip_queue_xmit+0x18a/0x460
  ip_queue_xmit+0x15/0x30
  __tcp_transmit_skb+0x914/0x9c0
  tcp_write_xmit+0x334/0x8d0
  tcp_push_one+0x3c/0x60
  tcp_sendmsg_locked+0x2e1/0xac0
  tcp_sendmsg+0x2d/0x50
  inet_sendmsg+0x43/0x90
  sock_sendmsg+0x68/0x80
  sock_write_iter+0x93/0x100
  vfs_write+0x326/0x3c0
  ksys_write+0xbd/0xf0
  ? do_syscall_64+0x69/0x90
  __x64_sys_write+0x19/0x30
  do_syscall_64+0x59/0x90
  ? do_user_addr_fault+0x1d0/0x640
  ? exit_to_user_mode_prepare+0x3b/0xd0
  ? irqentry_exit_to_user_mode+0x9/0x20
  ? irqentry_exit+0x43/0x50
  ? exc_page_fault+0x92/0x1b0
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
 RIP: 0033:0x7fc03ad14a37
 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
 RSP: 002b:00007ffdf8697fe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 0000000000008024 RCX: 00007fc03ad14a37
 RDX: 0000000000008024 RSI: 0000556f46bd8270 RDI: 0000000000000003
 RBP: 0000556f46bb1800 R08: 0000000000007fe3 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
 R13: 0000556f46bc66b0 R14: 000000000000000a R15: 0000556f46bb2f50
  </TASK>
 ---[ end trace 0000000000000000 ]---
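
The warning triggers because FORTIFY_SOURCE bounds memcpy() by the size
of the destination field, and eseg->inline_hdr.start is declared as a
2-byte array even though the inline header intentionally continues past
it into the rest of the WQE. Declaring the copy destination as a
flexible array tells the compiler that writes beyond those 2 bytes are
expected. A minimal sketch of the pattern (struct name invented for
illustration; a flexible array member can't sit directly in a union,
which is why the patch uses the DECLARE_FLEX_ARRAY() helper):

	struct inline_hdr_sketch {
		__be16 sz;
		union {
			u8 start[2];	/* kept so sizeof(start) users still work */
			struct {
				/* roughly what DECLARE_FLEX_ARRAY(u8, data) expands to */
				struct { } __empty_data;
				u8 data[];	/* unbounded: fortified memcpy() no longer warns */
			};
		};
	};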

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/wr.c | 2 +-
 include/linux/mlx5/qp.h         | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/wr.c b/drivers/infiniband/hw/mlx5/wr.c
index df1d1b0a3ef7..9947feb7fb8a 100644
--- a/drivers/infiniband/hw/mlx5/wr.c
+++ b/drivers/infiniband/hw/mlx5/wr.c
@@ -78,7 +78,7 @@ static void set_eth_seg(const struct ib_send_wr *wr, struct mlx5_ib_qp *qp,
 		 */
 		copysz = min_t(u64, *cur_edge - (void *)eseg->inline_hdr.start,
 			       left);
-		memcpy(eseg->inline_hdr.start, pdata, copysz);
+		memcpy(eseg->inline_hdr.data, pdata, copysz);
 		stride = ALIGN(sizeof(struct mlx5_wqe_eth_seg) -
 			       sizeof(eseg->inline_hdr.start) + copysz, 16);
 		*size += stride / 16;
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index bd53cf4be7bd..f0e55bf3ec8b 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -269,7 +269,10 @@ struct mlx5_wqe_eth_seg {
 	union {
 		struct {
 			__be16 sz;
-			u8     start[2];
+			union {
+				u8     start[2];
+				DECLARE_FLEX_ARRAY(u8, data);
+			};
 		} inline_hdr;
 		struct {
 			__be16 type;
-- 
2.43.0



* [PATCH rdma-next v1 2/6] IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
From: Leon Romanovsky @ 2024-01-28  9:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Zhang, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Michael Guralnik, Or Har-Toov, Tamar Mashiah,
	Yishai Hadas

From: Mark Zhang <markzhang@nvidia.com>

debugfs entries for RRoCE general CC parameters must be exposed only when
they are supported; otherwise accessing them may produce a syndrome error
in the kernel log, for example:

$ cat /sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp
cat: '/sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp': Invalid argument
$ dmesg
 mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1253): QUERY_CONG_PARAMS(0x824) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x325a82), err(-22)
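
The fix skips creating the two RTT response DSCP entries unless the
firmware reports the relevant capabilities, since reading such an entry
issues QUERY_CONG_PARAMS and the firmware rejects it with the BAD_PARAM
syndrome above. The gating condition from the hunk below, restated as a
standalone helper (a sketch for illustration only; the patch open-codes
it in the loop):

	static bool cc_param_supported(struct mlx5_core_dev *mdev, int param)
	{
		/* RTT response DSCP params need RoCE plus the general CC cap */
		if (param == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP_VALID ||
		    param == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP)
			return MLX5_CAP_GEN(mdev, roce) &&
			       MLX5_CAP_ROCE(mdev, roce_cc_general);
		return true;
	}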

Fixes: 66fb1d5df6ac ("IB/mlx5: Extend debug control for CC parameters")
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/cong.c | 6 ++++++
 include/linux/mlx5/mlx5_ifc.h     | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/cong.c b/drivers/infiniband/hw/mlx5/cong.c
index f87531318feb..a78a067e3ce7 100644
--- a/drivers/infiniband/hw/mlx5/cong.c
+++ b/drivers/infiniband/hw/mlx5/cong.c
@@ -458,6 +458,12 @@ void mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u32 port_num)
 	dbg_cc_params->root = debugfs_create_dir("cc_params", mlx5_debugfs_get_dev_root(mdev));
 
 	for (i = 0; i < MLX5_IB_DBG_CC_MAX; i++) {
+		if ((i == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP_VALID ||
+		     i == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP))
+			if (!MLX5_CAP_GEN(mdev, roce) ||
+			    !MLX5_CAP_ROCE(mdev, roce_cc_general))
+				continue;
+
 		dbg_cc_params->params[i].offset = i;
 		dbg_cc_params->params[i].dev = dev;
 		dbg_cc_params->params[i].port_num = port_num;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index bf5320b28b8b..2c10350bd422 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1103,7 +1103,7 @@ struct mlx5_ifc_roce_cap_bits {
 	u8         sw_r_roce_src_udp_port[0x1];
 	u8         fl_rc_qp_when_roce_disabled[0x1];
 	u8         fl_rc_qp_when_roce_enabled[0x1];
-	u8         reserved_at_7[0x1];
+	u8         roce_cc_general[0x1];
 	u8	   qp_ooo_transmit_default[0x1];
 	u8         reserved_at_9[0x15];
 	u8	   qp_ts_format[0x2];
-- 
2.43.0



* [PATCH rdma-next v1 3/6] RDMA/mlx5: Relax DEVX access upon modify commands
From: Leon Romanovsky @ 2024-01-28  9:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Michael Guralnik, Or Har-Toov,
	Tamar Mashiah

From: Yishai Hadas <yishaih@nvidia.com>

Relax DEVX access upon modify commands to be UVERBS_ACCESS_READ.

The kernel doesn't need to protect what firmware protects, or what
causes no damage to anyone but the user.

As firmware already has to protect itself from parallel access to the
same object, there is no need to also block parallel modify/query
commands on the same object on the kernel side.

This change allows a user space application to run parallel updates to
different entries in the same bulk object.
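
For context, the uverbs access modes behave like a reader/writer lock on
the uobject: UVERBS_ACCESS_WRITE is exclusive while UVERBS_ACCESS_READ is
shared. A conceptual sketch of the effect (function names invented; this
is not the uverbs implementation):

	static DECLARE_RWSEM(uobj_lock);	/* stand-in for the uobject state */

	static void devx_obj_modify_old(void)
	{
		down_write(&uobj_lock);	/* UVERBS_ACCESS_WRITE: one modifier at a time */
		/* ... issue the FW modify command ... */
		up_write(&uobj_lock);
	}

	static void devx_obj_modify_new(void)
	{
		down_read(&uobj_lock);	/* UVERBS_ACCESS_READ: parallel modify/query allowed */
		/* ... issue the FW modify command; FW serializes internally ... */
		up_read(&uobj_lock);
	}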

Tested-by: Tamar Mashiah <tmashiah@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/devx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 4d8f5180134e..9d91790a2af2 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -2950,7 +2950,7 @@ DECLARE_UVERBS_NAMED_METHOD(
 	MLX5_IB_METHOD_DEVX_OBJ_MODIFY,
 	UVERBS_ATTR_IDR(MLX5_IB_ATTR_DEVX_OBJ_MODIFY_HANDLE,
 			UVERBS_IDR_ANY_OBJECT,
-			UVERBS_ACCESS_WRITE,
+			UVERBS_ACCESS_READ,
 			UA_MANDATORY),
 	UVERBS_ATTR_PTR_IN(
 		MLX5_IB_ATTR_DEVX_OBJ_MODIFY_CMD_IN,
-- 
2.43.0



* [PATCH rdma-next v1 4/6] RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent
From: Leon Romanovsky @ 2024-01-28  9:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Har-Toov, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Michael Guralnik, Tamar Mashiah,
	Yishai Hadas

From: Or Har-Toov <ohartoov@nvidia.com>

Because some mkeys can't be modified with UMR due to UMR limitations,
such as the size of the translation that can be updated, not all user
mkeys can be cached.
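
Restated, the invariant the updated comment documents (a description of
the intended state, not new driver code):

	/* For a user mkey:
	 *   cacheable    => mkey->rb_key.ndescs != 0 || mkey->cache_ent != NULL
	 *   non-UMR-able => rb_key stays zeroed and cache_ent stays NULL,
	 *                   so dereg always destroys the mkey
	 */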

Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 4bdf3da579f4..69b1722c2280 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -643,7 +643,7 @@ struct mlx5_ib_mkey {
 	unsigned int ndescs;
 	struct wait_queue_head wait;
 	refcount_t usecount;
-	/* User Mkey must hold either a rb_key or a cache_ent. */
+	/* Cacheable user Mkey must hold either a rb_key or a cache_ent. */
 	struct mlx5r_cache_rb_key rb_key;
 	struct mlx5_cache_ent *cache_ent;
 };
-- 
2.43.0



* [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
From: Leon Romanovsky @ 2024-01-28  9:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Har-Toov, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Michael Guralnik, Tamar Mashiah,
	Yishai Hadas

From: Or Har-Toov <ohartoov@nvidia.com>

In the dereg flow, UMEM is not a good enough indication whether an MR
is from userspace since in mlx5_ib_rereg_user_mr there are some cases
when a new MR is created and the UMEM of the old MR is set to NULL.
Currently, when mlx5_ib_dereg_mr is called on the old MR, the UMEM is
NULL but cache_ent can still be non-NULL, so the mkey will not be
destroyed.
Therefore, checking whether an mkey comes from a user application and is
cacheable should be done by checking whether rb_key or cache_ent exists;
all other kinds of mkeys should be destroyed.
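
The problematic sequence, sketched (function names from the driver, flow
simplified):

	/*
	 * mlx5_ib_rereg_user_mr()
	 *   -> can't reuse the old MR, creates a new one
	 *   -> the old MR's umem is set to NULL (ownership moves to the new MR)
	 * ...
	 * mlx5_ib_dereg_mr(old MR)
	 *   -> mr->umem == NULL, so the UMR revoke/cache path is skipped
	 *   -> mr->mmkey.cache_ent != NULL, so destroy_mkey() is skipped too
	 *   => the mkey escapes both paths
	 */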

Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 12bca6ca4760..87552a689e07 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1857,6 +1857,11 @@ static int cache_ent_find_and_store(struct mlx5_ib_dev *dev,
 	return ret;
 }
 
+static bool is_cacheable_mkey(struct mlx5_ib_mkey *mkey)
+{
+	return mkey->cache_ent || mkey->rb_key.ndescs;
+}
+
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 {
 	struct mlx5_ib_mr *mr = to_mmr(ibmr);
@@ -1901,12 +1906,6 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 		mr->sig = NULL;
 	}
 
-	/* Stop DMA */
-	if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length))
-		if (mlx5r_umr_revoke_mr(mr) ||
-		    cache_ent_find_and_store(dev, mr))
-			mr->mmkey.cache_ent = NULL;
-
 	if (mr->umem && mr->umem->is_peer) {
 		rc = mlx5r_umr_revoke_mr(mr);
 		if (rc)
@@ -1914,7 +1913,9 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 		ib_umem_stop_invalidation_notifier(mr->umem);
 	}
 
-	if (!mr->mmkey.cache_ent) {
+	/* Stop DMA */
+	if (!is_cacheable_mkey(&mr->mmkey) || mlx5r_umr_revoke_mr(mr) ||
+	    cache_ent_find_and_store(dev, mr)) {
 		rc = destroy_mkey(to_mdev(mr->ibmr.device), mr);
 		if (rc)
 			return rc;
-- 
2.43.0



* [PATCH rdma-next v1 6/6] RDMA/mlx5: Adding remote atomic access flag to updatable flags
From: Leon Romanovsky @ 2024-01-28  9:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Har-Toov, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Michael Guralnik, Tamar Mashiah,
	Yishai Hadas

From: Or Har-Toov <ohartoov@nvidia.com>

Currently IB_ACCESS_REMOTE_ATOMIC is blocked from being updated via UMR,
although in some cases it should be possible. These cases are checked in
the mlx5r_umr_can_reconfig() function.
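
How the updatable-flags mask below works, restated ('diffs' and the flag
names are from the hunk):

	unsigned int diffs = current_access_flags ^ target_access_flags;

	/* Each set bit in 'diffs' is an access flag that changes. Rereg may
	 * take the UMR fast path only if every changed bit falls inside the
	 * updatable mask; adding IB_ACCESS_REMOTE_ATOMIC to the mask lets an
	 * atomic enable/disable rereg avoid a full MR re-creation, subject
	 * to the extra checks in mlx5r_umr_can_reconfig().
	 */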

Fixes: ef3642c4f54d ("RDMA/mlx5: Fix error unwinds for rereg_mr")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 87552a689e07..db8c436de6ee 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1581,7 +1581,8 @@ static bool can_use_umr_rereg_access(struct mlx5_ib_dev *dev,
 	unsigned int diffs = current_access_flags ^ target_access_flags;
 
 	if (diffs & ~(IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
-		      IB_ACCESS_REMOTE_READ | IB_ACCESS_RELAXED_ORDERING))
+		      IB_ACCESS_REMOTE_READ | IB_ACCESS_RELAXED_ORDERING |
+		      IB_ACCESS_REMOTE_ATOMIC))
 		return false;
 	return mlx5r_umr_can_reconfig(dev, current_access_flags,
 				      target_access_flags);
-- 
2.43.0



* Re: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
From: Jason Gunthorpe @ 2024-01-29 17:52 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Or Har-Toov, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Michael Guralnik, Tamar Mashiah,
	Yishai Hadas

On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
> From: Or Har-Toov <ohartoov@nvidia.com>
> 
> In the dereg flow, UMEM is not a good enough indication whether an MR
> is from userspace since in mlx5_ib_rereg_user_mr there are some cases
> when a new MR is created and the UMEM of the old MR is set to NULL.

Why is this a problem though? The only thing the umem has to do is to
trigger the UMR optimization. If UMR is not triggered then the mkey is
destroyed and it shouldn't be part of the cache at all.

> Currently, when mlx5_ib_dereg_mr is called on the old MR, the UMEM is
> NULL but cache_ent can still be non-NULL, so the mkey will not be
> destroyed.
> Therefore, checking whether an mkey comes from a user application and is
> cacheable should be done by checking whether rb_key or cache_ent exists;
> all other kinds of mkeys should be destroyed.
> 
> Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
> Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/mlx5/mr.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index 12bca6ca4760..87552a689e07 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1857,6 +1857,11 @@ static int cache_ent_find_and_store(struct mlx5_ib_dev *dev,
>  	return ret;
>  }
>  
> +static bool is_cacheable_mkey(struct mlx5_ib_mkey *mkey)
> +{
> +	return mkey->cache_ent || mkey->rb_key.ndescs;
> +}
> +
>  int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>  {
>  	struct mlx5_ib_mr *mr = to_mmr(ibmr);
> @@ -1901,12 +1906,6 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>  		mr->sig = NULL;
>  	}
>  
> -	/* Stop DMA */
> -	if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length))
> -		if (mlx5r_umr_revoke_mr(mr) ||
> -		    cache_ent_find_and_store(dev, mr))
> -			mr->mmkey.cache_ent = NULL;
> -
>  	if (mr->umem && mr->umem->is_peer) {
>  		rc = mlx5r_umr_revoke_mr(mr);
>  		if (rc)

?? this isn't based on an upstream tree

> @@ -1914,7 +1913,9 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>  		ib_umem_stop_invalidation_notifier(mr->umem);
>  	}
>  
> -	if (!mr->mmkey.cache_ent) {
> +	/* Stop DMA */
> +	if (!is_cacheable_mkey(&mr->mmkey) || mlx5r_umr_revoke_mr(mr) ||
> +	    cache_ent_find_and_store(dev, mr)) {

And now the mlx5r_umr_can_load_pas() check has been lost, and that isn't
good. A non-UMR-able object should never be placed in the cache. If the
mkey's size is too big it has to be freed normally.

>  		rc = destroy_mkey(to_mdev(mr->ibmr.device), mr);
>  		if (rc)
>  			return rc;

I'm not sure it is right to re-order this? The revocation of an mkey
should be a single operation, whichever path we choose to take.

Regardless, the upstream code doesn't have this ordering, so it should
all be one sequence of revoking the mkey and synchronizing the cache.

I suggest to put the revoke sequence into one function:

static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
{
	struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
	/* take a local copy: the lock must not be reached through
	 * mr->mmkey.cache_ent after it has been cleared
	 */
	struct mlx5_cache_ent *ent = mr->mmkey.cache_ent;

	if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length)) {
		if (mlx5r_umr_revoke_mr(mr))
			goto destroy;

		if (cache_ent_find_and_store(dev, mr))
			goto destroy;
		return 0;
	}

destroy:
	if (ent) {
		spin_lock_irq(&ent->mkeys_queue.lock);
		ent->in_use--;
		mr->mmkey.cache_ent = NULL;
		spin_unlock_irq(&ent->mkeys_queue.lock);
	}
	return destroy_mkey(dev, mr);
}

(notice we probably shouldn't set cache_ent to null without adjusting in_use)

Jason


* Re: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
From: Leon Romanovsky @ 2024-01-30 13:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Har-Toov, Edward Srouji, linux-rdma, Maor Gottlieb, Mark Zhang,
	Michael Guralnik, Tamar Mashiah, Yishai Hadas

On Mon, Jan 29, 2024 at 01:52:39PM -0400, Jason Gunthorpe wrote:
> On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
> > From: Or Har-Toov <ohartoov@nvidia.com>

<...>

> >  	if (mr->umem && mr->umem->is_peer) {
> >  		rc = mlx5r_umr_revoke_mr(mr);
> >  		if (rc)
> 
> ?? this isn't based on an upstream tree

Yes, it is my mistake. I will fix it.

Thanks


* Re: (subset) [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes
From: Leon Romanovsky @ 2024-01-31  9:16 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Edward Srouji, linux-rdma, Maor Gottlieb, Mark Zhang,
	Michael Guralnik, Or Har-Toov, Tamar Mashiah, Yishai Hadas,
	Leon Romanovsky


On Sun, 28 Jan 2024 11:29:10 +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Changelog:
> v1:
>  * Changed the is_cacheable_mkey() signature to take a pointer instead of a value.
> v0: https://lore.kernel.org/all/cover.1706185318.git.leon@kernel.org
> 
> [...]

Applied, thanks!

[1/6] RDMA/mlx5: Fix fortify source warning while accessing Eth segment
      https://git.kernel.org/rdma/rdma/c/4d5e86a56615cc
[2/6] IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
      https://git.kernel.org/rdma/rdma/c/43fdbd140238d4
[3/6] RDMA/mlx5: Relax DEVX access upon modify commands
      https://git.kernel.org/rdma/rdma/c/be551ee1574280

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>


* Re: [PATCH rdma-next v1 0/6] Collection of mlx5_ib fixes
From: Leon Romanovsky @ 2024-01-31  9:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Edward Srouji, linux-rdma, Maor Gottlieb, Mark Zhang,
	Michael Guralnik, Or Har-Toov, Tamar Mashiah, Yishai Hadas

On Sun, Jan 28, 2024 at 11:29:10AM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>

<...>

> Leon Romanovsky (1):
>   RDMA/mlx5: Fix fortify source warning while accessing Eth segment
> 
> Mark Zhang (1):
>   IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if
>     not supported
>
> Yishai Hadas (1):
>   RDMA/mlx5: Relax DEVX access upon modify commands

Applied these patches to -rc.

>
> Or Har-Toov (3):
>   RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent
>   RDMA/mlx5: Change check for cacheable user mkeys
>   RDMA/mlx5: Adding remote atomic access flag to updatable flags

These patches are under discussion and will need to be resent anyway.

Thanks


* RE: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
From: Michael Guralnik @ 2024-01-31 12:50 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Leon Romanovsky, Edward Srouji, linux-rdma, Maor Gottlieb,
	Mark Zhang, Tamar Mashiah, Yishai Hadas, Or Har-Toov

On 29/01/2024 19:52, Jason Gunthorpe wrote:
> On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
>> From: Or Har-Toov <ohartoov@nvidia.com>
>>
>> In the dereg flow, UMEM is not a good enough indication whether an MR
>> is from userspace since in mlx5_ib_rereg_user_mr there are some cases
>> when a new MR is created and the UMEM of the old MR is set to NULL.
> Why is this a problem though? The only thing the umem has to do is to
> trigger the UMR optimization. If UMR is not triggered then the mkey is
> destroyed and it shouldn't be part of the cache at all.

The problem is that it doesn't trigger the UMR path on mkeys that are
dereged from the rereg flow.
Optimally, we'd want them to return to the cache, if possible.

We can keep relying on the UMEM to decide whether we want to try to return
them to the cache, as you suggested in the revoke_mr() below, but that way
those mkeys will not return to the cache and we have to deal with in_use
in the revoke flow.

>> @@ -1914,7 +1913,9 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>>  		ib_umem_stop_invalidation_notifier(mr->umem);
>>  	}
>>  
>> -	if (!mr->mmkey.cache_ent) {
>> +	/* Stop DMA */
>> +	if (!is_cacheable_mkey(&mr->mmkey) || mlx5r_umr_revoke_mr(mr) ||
>> +	    cache_ent_find_and_store(dev, mr)) {
> And now the mlx5r_umr_can_load_pas() check has been lost, and that isn't
> good. A non-UMR-able object should never be placed in the cache. If the
> mkey's size is too big it has to be freed normally.

mlx5r_umr_can_load_pas() will not get lost, since mkeys that are not
UMR-able will not have rb_key or cache_ent set, so is_cacheable_mkey()
is always false for them.

>>  		rc = destroy_mkey(to_mdev(mr->ibmr.device), mr);
>>  		if (rc)
>>  			return rc;
> I'm not sure it is right to re-order this? The revocation of an mkey
> should be a single operation, whichever path we choose to take.
>
> Regardless, the upstream code doesn't have this ordering, so it should
> all be one sequence of revoking the mkey and synchronizing the cache.
>
> I suggest to put the revoke sequence into one function:
>
> static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
> {
> 	struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
> 	/* take a local copy: the lock must not be reached through
> 	 * mr->mmkey.cache_ent after it has been cleared
> 	 */
> 	struct mlx5_cache_ent *ent = mr->mmkey.cache_ent;
>
> 	if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length)) {
> 		if (mlx5r_umr_revoke_mr(mr))
> 			goto destroy;
>
> 		if (cache_ent_find_and_store(dev, mr))
> 			goto destroy;
> 		return 0;
> 	}
>
> destroy:
> 	if (ent) {
> 		spin_lock_irq(&ent->mkeys_queue.lock);
> 		ent->in_use--;
> 		mr->mmkey.cache_ent = NULL;
> 		spin_unlock_irq(&ent->mkeys_queue.lock);
> 	}
> 	return destroy_mkey(dev, mr);
> }
>
> (notice we probably shouldn't set cache_ent to null without adjusting
> in_use)
> Jason


* Re: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
From: Jason Gunthorpe @ 2024-01-31 14:18 UTC (permalink / raw)
  To: Michael Guralnik
  Cc: Leon Romanovsky, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Tamar Mashiah, Yishai Hadas,
	Or Har-Toov

On Wed, Jan 31, 2024 at 02:50:03PM +0200, Michael Guralnik wrote:
> On 29/01/2024 19:52, Jason Gunthorpe wrote:
> > On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
> > > From: Or Har-Toov <ohartoov@nvidia.com>
> > > 
> > > In the dereg flow, UMEM is not a good enough indication whether an MR
> > > is from userspace since in mlx5_ib_rereg_user_mr there are some cases
> > > when a new MR is created and the UMEM of the old MR is set to NULL.
> > Why is this a problem though? The only thing the umem has to do is to
> > trigger the UMR optimization. If UMR is not triggered then the mkey is
> > destroyed and it shouldn't be part of the cache at all.
> 
> The problem is that it doesn't trigger the UMR on mkeys that are dereged
> from the rereg flow.
> Optimally, we'd want them to return to the cache, if possible.

Right, so you suggest changing the umem and umr_can_load into
is_cacheable_mkey() and carefully ensuring the rb_key.ndescs is 
zero for non-umrable?

> We can keep relying on the UMEM to decide whether we want to try to return
> them to cache, as you suggested in the revoke_mr() below, but that way those
> mkeys will not return to the cache and we have to deal with the in_use in
> the revoke flow.

I don't know what this in_use means? in_use should only be an issue if
the cache_ent is set? Are we really having in_use be set and cache_ent
be NULL? That seems like a different bug that should be fixed by
keeping cache_ent and in_use consistent.

Jason


* Re: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
From: Michael Guralnik @ 2024-01-31 14:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Tamar Mashiah, Yishai Hadas,
	Or Har-Toov


On 31/01/2024 16:18, Jason Gunthorpe wrote:
> On Wed, Jan 31, 2024 at 02:50:03PM +0200, Michael Guralnik wrote:
>> On 29/01/2024 19:52, Jason Gunthorpe wrote:
>>> On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
>>>> From: Or Har-Toov <ohartoov@nvidia.com>
>>>>
>>>> In the dereg flow, UMEM is not a good enough indication whether an MR
>>>> is from userspace since in mlx5_ib_rereg_user_mr there are some cases
>>>> when a new MR is created and the UMEM of the old MR is set to NULL.
>>> Why is this a problem though? The only thing the umem has to do is to
>>> trigger the UMR optimization. If UMR is not triggered then the mkey is
>>> destroyed and it shouldn't be part of the cache at all.
>> The problem is that it doesn't trigger the UMR on mkeys that are dereged
>> from the rereg flow.
>> Optimally, we'd want them to return to the cache, if possible.
> Right, so you suggest changing the umem and umr_can_load into
> is_cacheable_mkey() and carefully ensuring the rb_key.ndescs is
> zero for non-umrable?

Yes. The code is already written trying to ensure this and we've rephrased
a comment in the previous patch to describe this more accurately.

>> We can keep relying on the UMEM to decide whether we want to try to return
>> them to cache, as you suggested in the revoke_mr() below, but that way those
>> mkeys will not return to the cache and we have to deal with the in_use in
>> the revoke flow.
> I don't know what this in_use means? in_use should be only an issue if
> the cache_ent is set? Are we really having in_use be set and cache_ent
> bet NULL? That seems like a different bug that should be fixed by
> keeping cache_ent and in_use consistent.

in_use should be handled only if mkey has a cache_ent.

I take back what I wrote previously, in_use should be handled in revoke_mr
no matter how we choose to implement this, since we're not guaranteed to
succeed in UMR and might end up dereging mkeys from the cache.



* Re: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
From: Jason Gunthorpe @ 2024-01-31 15:23 UTC (permalink / raw)
  To: Michael Guralnik
  Cc: Leon Romanovsky, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Tamar Mashiah, Yishai Hadas,
	Or Har-Toov

On Wed, Jan 31, 2024 at 04:35:17PM +0200, Michael Guralnik wrote:
> 
> On 31/01/2024 16:18, Jason Gunthorpe wrote:
> > On Wed, Jan 31, 2024 at 02:50:03PM +0200, Michael Guralnik wrote:
> > > On 29/01/2024 19:52, Jason Gunthorpe wrote:
> > > > On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
> > > > > From: Or Har-Toov <ohartoov@nvidia.com>
> > > > > 
> > > > > In the dereg flow, UMEM is not a good enough indication whether an MR
> > > > > is from userspace since in mlx5_ib_rereg_user_mr there are some cases
> > > > > when a new MR is created and the UMEM of the old MR is set to NULL.
> > > > Why is this a problem though? The only thing the umem has to do is to
> > > > trigger the UMR optimization. If UMR is not triggered then the mkey is
> > > > destroyed and it shouldn't be part of the cache at all.
> > > The problem is that it doesn't trigger the UMR on mkeys that are dereged
> > > from the rereg flow.
> > > Optimally, we'd want them to return to the cache, if possible.
> > Right, so you suggest changing the umem and umr_can_load into
> > is_cacheable_mkey() and carefully ensuring the rb_key.ndescs is
> > zero for non-umrable?
> 
> Yes. The code is already written trying to ensure this and we've rephrased
> a comment in the previous patch to describe this more accurately.

But then I wonder why does cache_ent become NULL but rb_key.ndescs
is set? That seems pretty confusing.

> > > We can keep relying on the UMEM to decide whether we want to try to return
> > > them to cache, as you suggested in the revoke_mr() below, but that way those
> > > mkeys will not return to the cache and we have to deal with the in_use in
> > > the revoke flow.
> > I don't know what this in_use means? in_use should only be an issue if
> > the cache_ent is set? Are we really having in_use be set and cache_ent
> > be NULL? That seems like a different bug that should be fixed by
> > keeping cache_ent and in_use consistent.
> 
> in_use should be handled only if mkey has a cache_ent.
> 
> I take back what I wrote previously, in_use should be handled in revoke_mr
> no matter how we choose to implement this, since we're not guaranteed to
> succeed in UMR and might end up dereging mkeys from the cache.

That makes the most sense, yes.

Jason


* Re: [PATCH rdma-next v1 5/6] RDMA/mlx5: Change check for cacheable user mkeys
From: Michael Guralnik @ 2024-01-31 18:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Leon Romanovsky, Edward Srouji, linux-rdma,
	Maor Gottlieb, Mark Zhang, Tamar Mashiah, Yishai Hadas,
	Or Har-Toov


On 31/01/2024 17:23, Jason Gunthorpe wrote:
> On Wed, Jan 31, 2024 at 04:35:17PM +0200, Michael Guralnik wrote:
>> On 31/01/2024 16:18, Jason Gunthorpe wrote:
>>> On Wed, Jan 31, 2024 at 02:50:03PM +0200, Michael Guralnik wrote:
>>>> On 29/01/2024 19:52, Jason Gunthorpe wrote:
>>>>> On Sun, Jan 28, 2024 at 11:29:15AM +0200, Leon Romanovsky wrote:
>>>>>> From: Or Har-Toov <ohartoov@nvidia.com>
>>>>>>
>>>>>> In the dereg flow, UMEM is not a good enough indication whether an MR
>>>>>> is from userspace since in mlx5_ib_rereg_user_mr there are some cases
>>>>>> when a new MR is created and the UMEM of the old MR is set to NULL.
>>>>> Why is this a problem though? The only thing the umem has to do is to
>>>>> trigger the UMR optimization. If UMR is not triggered then the mkey is
>>>>> destroyed and it shouldn't be part of the cache at all.
>>>> The problem is that it doesn't trigger the UMR on mkeys that are dereged
>>>> from the rereg flow.
>>>> Optimally, we'd want them to return to the cache, if possible.
>>> Right, so you suggest changing the umem and umr_can_load into
>>> is_cacheable_mkey() and carefully ensuring the rb_key.ndescs is
>>> zero for non-umrable?
>> Yes. The code is already written trying to ensure this and we've rephrased
>> a comment in the previous patch to describe this more accurately.
> But then I wonder why does cache_ent become NULL but rb_key.ndescs
> is set? That seems pretty confusing.

I think we did it only in the flow where we destroy the mkey to mark mkeys
that were not returned to the cache.

Do you think it'll be better if we switch to marking this explicitly?
We'd add a flag to the mkey that marks it as a cacheable mkey and make
this decision once, at mkey creation.
Then we can stop relying on the values of other variables to decide what
goes back to the cache.
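
A sketch of what that explicit marking could look like (the field name
and helper rewrite are invented here for illustration, not taken from
this series):

	struct mlx5_ib_mkey {
		/* ... existing members ... */
		u8 cacheable : 1;	/* hypothetical: set once at mkey creation */
	};

	static bool is_cacheable_mkey(struct mlx5_ib_mkey *mkey)
	{
		/* no more inferring cacheability from rb_key/cache_ent */
		return mkey->cacheable;
	}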


Michael

