linux-rdma.vger.kernel.org archive mirror
* [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree
@ 2022-12-07  8:57 Michael Guralnik
  2022-12-07  8:57 ` [PATCH v2 rdma-next 1/6] RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries Michael Guralnik
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Michael Guralnik @ 2022-12-07  8:57 UTC (permalink / raw)
  To: jgg, leonro, linux-rdma; +Cc: maorg, Michael Guralnik

This series moves the MR cache to an RB-tree for storing the cache
entries, enabling more flexibility when managing them.

The MR cache will now cache mkeys returned by the user even if they are
not from one of the predefined pools. This allows restarting applications
to reuse their released mkeys and improves restart times.
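
As an illustration (not part of the series), the flow that benefits is a
plain register/deregister cycle over many MRs; a minimal libibverbs sketch
of such a hypothetical benchmark (error handling and memlock limits
ignored for brevity):

/* Register and deregister many MRs; after this series, a restarted
 * process can take its mkeys from the mlx5 cache instead of recreating
 * them. Hypothetical benchmark sketch, not part of the patches. */
#include <stdlib.h>
#include <infiniband/verbs.h>

#define NUM_MRS 100000
#define MR_SIZE 4096

int main(void)
{
	struct ibv_device **devs = ibv_get_device_list(NULL);
	struct ibv_context *ctx = ibv_open_device(devs[0]);
	struct ibv_pd *pd = ibv_alloc_pd(ctx);
	static struct ibv_mr *mrs[NUM_MRS];
	void *buf = malloc(MR_SIZE);

	/* reg phase: on restart, mkeys are taken from the cache */
	for (int i = 0; i < NUM_MRS; i++)
		mrs[i] = ibv_reg_mr(pd, buf, MR_SIZE, IBV_ACCESS_LOCAL_WRITE);

	/* dereg phase: cacheable mkeys now go back into the cache
	 * instead of being destroyed */
	for (int i = 0; i < NUM_MRS; i++)
		ibv_dereg_mr(mrs[i]);

	ibv_dealloc_pd(pd);
	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	free(buf);
	return 0;
}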

v1->v2:
- Rearrange patch order to first introduce the RB-tree and only then
  introduce the caching of previously non-cacheable mkeys

v0->v1:
- Fix the RB-tree search by switching from memcmp to a dedicated cmp function
- Rewording of some commit messages

Aharon Landau (2):
  RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries
  RDMA/mlx5: Remove explicit ODP cache entry

Michael Guralnik (4):
  RDMA/mlx5: Change the cache structure to an RB-tree
  RDMA/mlx5: Introduce mlx5r_cache_rb_key
  RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow
  RDMA/mlx5: Add work to remove temporary entries from the cache

 drivers/infiniband/hw/mlx5/mlx5_ib.h |  42 ++-
 drivers/infiniband/hw/mlx5/mr.c      | 445 +++++++++++++++++++++------
 drivers/infiniband/hw/mlx5/odp.c     |  37 +--
 include/linux/mlx5/driver.h          |   1 -
 4 files changed, 394 insertions(+), 131 deletions(-)

-- 
2.17.2



* [PATCH v2 rdma-next 1/6] RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries
  2022-12-07  8:57 [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
@ 2022-12-07  8:57 ` Michael Guralnik
  2022-12-07  8:57 ` [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry Michael Guralnik
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Michael Guralnik @ 2022-12-07  8:57 UTC (permalink / raw)
  To: jgg, leonro, linux-rdma; +Cc: maorg, Aharon Landau

From: Aharon Landau <aharonl@nvidia.com>

mkc.log_page_size can be changed using UMR. Therefore, don't treat it as
a cache entry property and remove it from struct mlx5_cache_ent.
All cache mkeys will be created with the default PAGE_SHIFT and updated
with the needed page_shift using UMR when passed to a user.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 -
 drivers/infiniband/hw/mlx5/mr.c      | 3 +--
 drivers/infiniband/hw/mlx5/odp.c     | 2 --
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 8b91babdd4c0..8d985f792367 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -739,7 +739,6 @@ struct mlx5_cache_ent {
 	char                    name[4];
 	u32                     order;
 	u32			access_mode;
-	u32			page;
 	unsigned int		ndescs;
 
 	u8 disabled:1;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 410cc5fd2523..29ad674a9bcd 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -297,7 +297,7 @@ static void set_cache_mkc(struct mlx5_cache_ent *ent, void *mkc)
 
 	MLX5_SET(mkc, mkc, translations_octword_size,
 		 get_mkc_octo_size(ent->access_mode, ent->ndescs));
-	MLX5_SET(mkc, mkc, log_page_size, ent->page);
+	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
 }
 
 /* Asynchronously schedule new MRs to be populated in the cache. */
@@ -765,7 +765,6 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 		if (ent->order > mkey_cache_max_order(dev))
 			continue;
 
-		ent->page = PAGE_SHIFT;
 		ent->ndescs = 1 << ent->order;
 		ent->access_mode = MLX5_MKC_ACCESS_MODE_MTT;
 		if ((dev->mdev->profile.mask & MLX5_PROF_MASK_MR_CACHE) &&
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index bc97958818bb..e9a29adef7dc 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -1595,14 +1595,12 @@ void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent)
 
 	switch (ent->order - 2) {
 	case MLX5_IMR_MTT_CACHE_ENTRY:
-		ent->page = PAGE_SHIFT;
 		ent->ndescs = MLX5_IMR_MTT_ENTRIES;
 		ent->access_mode = MLX5_MKC_ACCESS_MODE_MTT;
 		ent->limit = 0;
 		break;
 
 	case MLX5_IMR_KSM_CACHE_ENTRY:
-		ent->page = MLX5_KSM_PAGE_SHIFT;
 		ent->ndescs = mlx5_imr_ksm_entries;
 		ent->access_mode = MLX5_MKC_ACCESS_MODE_KSM;
 		ent->limit = 0;
-- 
2.17.2



* [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry
  2022-12-07  8:57 [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
  2022-12-07  8:57 ` [PATCH v2 rdma-next 1/6] RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries Michael Guralnik
@ 2022-12-07  8:57 ` Michael Guralnik
  2022-12-08  0:02   ` Jason Gunthorpe
  2022-12-08  0:22   ` Jason Gunthorpe
  2022-12-07  8:57 ` [PATCH v2 rdma-next 3/6] RDMA/mlx5: Change the cache structure to RB-tree Michael Guralnik
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 13+ messages in thread
From: Michael Guralnik @ 2022-12-07  8:57 UTC (permalink / raw)
  To: jgg, leonro, linux-rdma; +Cc: maorg, Aharon Landau

From: Aharon Landau <aharonl@nvidia.com>

The explicit ODP mkey doesn't have unique properties, so there is no need
to devote a special cache entry to it. Remove it.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 19 ++++---------------
 include/linux/mlx5/driver.h      |  1 -
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index e9a29adef7dc..71b733fcac37 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -406,6 +406,7 @@ static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
 static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr,
 						unsigned long idx)
 {
+	int order = order_base_2(MLX5_IMR_MTT_ENTRIES);
 	struct mlx5_ib_dev *dev = mr_to_mdev(imr);
 	struct ib_umem_odp *odp;
 	struct mlx5_ib_mr *mr;
@@ -418,7 +419,7 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr,
 	if (IS_ERR(odp))
 		return ERR_CAST(odp);
 
-	mr = mlx5_mr_cache_alloc(dev, &dev->cache.ent[MLX5_IMR_MTT_CACHE_ENTRY],
+	mr = mlx5_mr_cache_alloc(dev, &dev->cache.ent[order],
 				 imr->access_flags);
 	if (IS_ERR(mr)) {
 		ib_umem_odp_release(odp);
@@ -1592,20 +1593,8 @@ void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent)
 {
 	if (!(ent->dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
 		return;
-
-	switch (ent->order - 2) {
-	case MLX5_IMR_MTT_CACHE_ENTRY:
-		ent->ndescs = MLX5_IMR_MTT_ENTRIES;
-		ent->access_mode = MLX5_MKC_ACCESS_MODE_MTT;
-		ent->limit = 0;
-		break;
-
-	case MLX5_IMR_KSM_CACHE_ENTRY:
-		ent->ndescs = mlx5_imr_ksm_entries;
-		ent->access_mode = MLX5_MKC_ACCESS_MODE_KSM;
-		ent->limit = 0;
-		break;
-	}
+	ent->ndescs = mlx5_imr_ksm_entries;
+	ent->access_mode = MLX5_MKC_ACCESS_MODE_KSM;
 }
 
 static const struct ib_device_ops mlx5_ib_dev_odp_ops = {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index af2ceb4160bc..cdf6ed25778e 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -736,7 +736,6 @@ enum {
 
 enum {
 	MKEY_CACHE_LAST_STD_ENTRY = 20,
-	MLX5_IMR_MTT_CACHE_ENTRY,
 	MLX5_IMR_KSM_CACHE_ENTRY,
 	MAX_MKEY_CACHE_ENTRIES
 };
-- 
2.17.2



* [PATCH v2 rdma-next 3/6] RDMA/mlx5: Change the cache structure to RB-tree
  2022-12-07  8:57 [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
  2022-12-07  8:57 ` [PATCH v2 rdma-next 1/6] RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries Michael Guralnik
  2022-12-07  8:57 ` [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry Michael Guralnik
@ 2022-12-07  8:57 ` Michael Guralnik
  2022-12-08  0:17   ` Jason Gunthorpe
  2022-12-07  8:57 ` [PATCH v2 rdma-next 4/6] RDMA/mlx5: Introduce mlx5r_cache_rb_key Michael Guralnik
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Michael Guralnik @ 2022-12-07  8:57 UTC (permalink / raw)
  To: jgg, leonro, linux-rdma; +Cc: maorg, Michael Guralnik

Currently, the cache structure is a static linear array, so its size is
fixed at the number of entries it was built with and cannot be expanded.
The entries are dedicated to mkeys of size 2^x with no access_flags;
mkeys with different properties are not cacheable.

This patch changes the cache structure to an RB-tree, which will allow
extending the cache to support more entries with different mkey
properties.
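
The insert and lookup below follow the standard <linux/rbtree.h> idiom;
for reference, a generic, self-contained sketch of that pattern
(illustrative kernel-style code, not the driver change itself):

#include <linux/errno.h>
#include <linux/rbtree.h>

/* Generic rb-tree keyed by an integer "order"; the patch applies the same
 * idiom to struct mlx5_cache_ent. */
struct demo_ent {
	struct rb_node node;
	unsigned int order;
};

static int demo_insert(struct rb_root *root, struct demo_ent *ent)
{
	struct rb_node **new = &root->rb_node, *parent = NULL;

	while (*new) {
		struct demo_ent *cur = rb_entry(*new, struct demo_ent, node);

		parent = *new;
		if (ent->order < cur->order)
			new = &(*new)->rb_left;
		else if (ent->order > cur->order)
			new = &(*new)->rb_right;
		else
			return -EEXIST;			/* duplicate key */
	}
	rb_link_node(&ent->node, parent, new);		/* add new node... */
	rb_insert_color(&ent->node, root);		/* ...and rebalance */
	return 0;
}

/* Lower-bound lookup: smallest entry with order >= the requested order. */
static struct demo_ent *demo_lower_bound(struct rb_root *root,
					 unsigned int order)
{
	struct rb_node *node = root->rb_node;
	struct demo_ent *best = NULL;

	while (node) {
		struct demo_ent *cur = rb_entry(node, struct demo_ent, node);

		if (cur->order >= order) {
			best = cur;
			node = node->rb_left;
		} else {
			node = node->rb_right;
		}
	}
	return best;
}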

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  13 ++-
 drivers/infiniband/hw/mlx5/mr.c      | 160 ++++++++++++++++++++-------
 drivers/infiniband/hw/mlx5/odp.c     |   8 +-
 3 files changed, 133 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 8d985f792367..10e22fb01e1b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -741,6 +741,8 @@ struct mlx5_cache_ent {
 	u32			access_mode;
 	unsigned int		ndescs;
 
+	struct rb_node		node;
+
 	u8 disabled:1;
 	u8 fill_to_high_water:1;
 
@@ -770,8 +772,9 @@ struct mlx5r_async_create_mkey {
 
 struct mlx5_mkey_cache {
 	struct workqueue_struct *wq;
-	struct mlx5_cache_ent	ent[MAX_MKEY_CACHE_ENTRIES];
-	struct dentry		*root;
+	struct rb_root		rb_root;
+	struct mutex		rb_lock;
+	struct dentry		*fs_root;
 	unsigned long		last_add;
 };
 
@@ -1316,11 +1319,15 @@ void mlx5_ib_copy_pas(u64 *old, u64 *new, int step, int num);
 int mlx5_ib_get_cqe_size(struct ib_cq *ibcq);
 int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev);
 int mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev);
+struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
+					      int order);
 
 struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
 				       struct mlx5_cache_ent *ent,
 				       int access_flags);
 
+struct mlx5_ib_mr *mlx5_mr_cache_alloc_order(struct mlx5_ib_dev *dev, u32 order,
+					     int access_flags);
 int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask,
 			    struct ib_mr_status *mr_status);
 struct ib_wq *mlx5_ib_create_wq(struct ib_pd *pd,
@@ -1362,7 +1369,7 @@ static inline int mlx5r_odp_create_eq(struct mlx5_ib_dev *dev,
 static inline void mlx5_ib_odp_cleanup_one(struct mlx5_ib_dev *ibdev) {}
 static inline int mlx5_ib_odp_init(void) { return 0; }
 static inline void mlx5_ib_odp_cleanup(void)				    {}
-static inline void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent) {}
+void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent) {}
 static inline void mlx5_odp_populate_xlt(void *xlt, size_t idx, size_t nentries,
 					 struct mlx5_ib_mr *mr, int flags) {}
 
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 29ad674a9bcd..f35022067769 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -515,18 +515,22 @@ static const struct file_operations limit_fops = {
 
 static bool someone_adding(struct mlx5_mkey_cache *cache)
 {
-	unsigned int i;
-
-	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
-		struct mlx5_cache_ent *ent = &cache->ent[i];
-		bool ret;
+	struct mlx5_cache_ent *ent;
+	struct rb_node *node;
+	bool ret;
 
+	mutex_lock(&cache->rb_lock);
+	for (node = rb_first(&cache->rb_root); node; node = rb_next(node)) {
+		ent = rb_entry(node, struct mlx5_cache_ent, node);
 		xa_lock_irq(&ent->mkeys);
 		ret = ent->stored < ent->limit;
 		xa_unlock_irq(&ent->mkeys);
-		if (ret)
+		if (ret) {
+			mutex_unlock(&cache->rb_lock);
 			return true;
+		}
 	}
+	mutex_unlock(&cache->rb_lock);
 	return false;
 }
 
@@ -637,6 +641,59 @@ static void delayed_cache_work_func(struct work_struct *work)
 	__cache_work_func(ent);
 }
 
+static int mlx5_cache_ent_insert(struct mlx5_mkey_cache *cache,
+				 struct mlx5_cache_ent *ent)
+{
+	struct rb_node **new = &cache->rb_root.rb_node, *parent = NULL;
+	struct mlx5_cache_ent *cur;
+
+	mutex_lock(&cache->rb_lock);
+	/* Figure out where to put new node */
+	while (*new) {
+		cur = rb_entry(*new, struct mlx5_cache_ent, node);
+		parent = *new;
+		if (ent->order < cur->order)
+			new = &((*new)->rb_left);
+		if (ent->order > cur->order)
+			new = &((*new)->rb_right);
+		if (ent->order == cur->order) {
+			mutex_unlock(&cache->rb_lock);
+			return -EEXIST;
+		}
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&ent->node, parent, new);
+	rb_insert_color(&ent->node, &cache->rb_root);
+
+	mutex_unlock(&cache->rb_lock);
+	return 0;
+}
+
+static struct mlx5_cache_ent *mkey_cache_ent_from_order(struct mlx5_ib_dev *dev,
+							unsigned int order)
+{
+	struct rb_node *node = dev->cache.rb_root.rb_node;
+	struct mlx5_cache_ent *cur, *smallest = NULL;
+
+	/*
+	 * Find the smallest ent with order >= requested_order.
+	 */
+	while (node) {
+		cur = rb_entry(node, struct mlx5_cache_ent, node);
+		if (cur->order > order) {
+			smallest = cur;
+			node = node->rb_left;
+		}
+		if (cur->order < order)
+			node = node->rb_right;
+		if (cur->order == order)
+			return cur;
+	}
+
+	return smallest;
+}
+
 struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
 				       struct mlx5_cache_ent *ent,
 				       int access_flags)
@@ -677,10 +734,16 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
 	return mr;
 }
 
-static void clean_keys(struct mlx5_ib_dev *dev, int c)
+struct mlx5_ib_mr *mlx5_mr_cache_alloc_order(struct mlx5_ib_dev *dev,
+					     u32 order, int access_flags)
+{
+	struct mlx5_cache_ent *ent = mkey_cache_ent_from_order(dev, order);
+
+	return mlx5_mr_cache_alloc(dev, ent, access_flags);
+}
+
+static void clean_keys(struct mlx5_ib_dev *dev, struct mlx5_cache_ent *ent)
 {
-	struct mlx5_mkey_cache *cache = &dev->cache;
-	struct mlx5_cache_ent *ent = &cache->ent[c];
 	u32 mkey;
 
 	cancel_delayed_work(&ent->dwork);
@@ -699,8 +762,8 @@ static void mlx5_mkey_cache_debugfs_cleanup(struct mlx5_ib_dev *dev)
 	if (!mlx5_debugfs_root || dev->is_rep)
 		return;
 
-	debugfs_remove_recursive(dev->cache.root);
-	dev->cache.root = NULL;
+	debugfs_remove_recursive(dev->cache.fs_root);
+	dev->cache.fs_root = NULL;
 }
 
 static void mlx5_mkey_cache_debugfs_init(struct mlx5_ib_dev *dev)
@@ -713,12 +776,13 @@ static void mlx5_mkey_cache_debugfs_init(struct mlx5_ib_dev *dev)
 	if (!mlx5_debugfs_root || dev->is_rep)
 		return;
 
-	cache->root = debugfs_create_dir("mr_cache", mlx5_debugfs_get_dev_root(dev->mdev));
+	dir = mlx5_debugfs_get_dev_root(dev->mdev);
+	cache->fs_root = debugfs_create_dir("mr_cache", dir);
 
 	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
-		ent = &cache->ent[i];
+		ent = mkey_cache_ent_from_order(dev, i);
 		sprintf(ent->name, "%d", ent->order);
-		dir = debugfs_create_dir(ent->name, cache->root);
+		dir = debugfs_create_dir(ent->name, cache->fs_root);
 		debugfs_create_file("size", 0600, dir, ent, &size_fops);
 		debugfs_create_file("limit", 0600, dir, ent, &limit_fops);
 		debugfs_create_ulong("cur", 0400, dir, &ent->stored);
@@ -733,6 +797,30 @@ static void delay_time_func(struct timer_list *t)
 	WRITE_ONCE(dev->fill_delay, 0);
 }
 
+struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
+					      int order)
+{
+	struct mlx5_cache_ent *ent;
+	int ret;
+
+	ent = kzalloc(sizeof(*ent), GFP_KERNEL);
+	if (!ent)
+		return ERR_PTR(-ENOMEM);
+
+	xa_init_flags(&ent->mkeys, XA_FLAGS_LOCK_IRQ);
+	ent->order = order;
+	ent->dev = dev;
+
+	INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func);
+
+	ret = mlx5_cache_ent_insert(&dev->cache, ent);
+	if (ret) {
+		kfree(ent);
+		return ERR_PTR(ret);
+	}
+	return ent;
+}
+
 int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 {
 	struct mlx5_mkey_cache *cache = &dev->cache;
@@ -740,6 +828,8 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 	int i;
 
 	mutex_init(&dev->slow_path_mutex);
+	mutex_init(&dev->cache.rb_lock);
+	dev->cache.rb_root = RB_ROOT;
 	cache->wq = alloc_ordered_workqueue("mkey_cache", WQ_MEM_RECLAIM);
 	if (!cache->wq) {
 		mlx5_ib_warn(dev, "failed to create work queue\n");
@@ -749,13 +839,7 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 	mlx5_cmd_init_async_ctx(dev->mdev, &dev->async_ctx);
 	timer_setup(&dev->delay_timer, delay_time_func, 0);
 	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
-		ent = &cache->ent[i];
-		xa_init_flags(&ent->mkeys, XA_FLAGS_LOCK_IRQ);
-		ent->order = i + 2;
-		ent->dev = dev;
-		ent->limit = 0;
-
-		INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func);
+		ent = mlx5r_cache_create_ent(dev, i);
 
 		if (i > MKEY_CACHE_LAST_STD_ENTRY) {
 			mlx5_odp_init_mkey_cache_entry(ent);
@@ -785,14 +869,16 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 
 int mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev)
 {
-	unsigned int i;
+	struct rb_root *root = &dev->cache.rb_root;
+	struct mlx5_cache_ent *ent;
+	struct rb_node *node;
 
 	if (!dev->cache.wq)
 		return 0;
 
-	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
-		struct mlx5_cache_ent *ent = &dev->cache.ent[i];
-
+	mutex_lock(&dev->cache.rb_lock);
+	for (node = rb_first(root); node; node = rb_next(node)) {
+		ent = rb_entry(node, struct mlx5_cache_ent, node);
 		xa_lock_irq(&ent->mkeys);
 		ent->disabled = true;
 		xa_unlock_irq(&ent->mkeys);
@@ -802,8 +888,15 @@ int mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev)
 	mlx5_mkey_cache_debugfs_cleanup(dev);
 	mlx5_cmd_cleanup_async_ctx(&dev->async_ctx);
 
-	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++)
-		clean_keys(dev, i);
+	node = rb_first(root);
+	while (node) {
+		ent = rb_entry(node, struct mlx5_cache_ent, node);
+		node = rb_next(node);
+		clean_keys(dev, ent);
+		rb_erase(&ent->node, root);
+		kfree(ent);
+	}
+	mutex_unlock(&dev->cache.rb_lock);
 
 	destroy_workqueue(dev->cache.wq);
 	del_timer_sync(&dev->delay_timer);
@@ -876,19 +969,6 @@ static int mkey_cache_max_order(struct mlx5_ib_dev *dev)
 	return MLX5_MAX_UMR_SHIFT;
 }
 
-static struct mlx5_cache_ent *mkey_cache_ent_from_order(struct mlx5_ib_dev *dev,
-							unsigned int order)
-{
-	struct mlx5_mkey_cache *cache = &dev->cache;
-
-	if (order < cache->ent[0].order)
-		return &cache->ent[0];
-	order = order - cache->ent[0].order;
-	if (order > MKEY_CACHE_LAST_STD_ENTRY)
-		return NULL;
-	return &cache->ent[order];
-}
-
 static void set_mr_fields(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr,
 			  u64 length, int access_flags, u64 iova)
 {
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 71b733fcac37..c41e091618ce 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -419,8 +419,7 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr,
 	if (IS_ERR(odp))
 		return ERR_CAST(odp);
 
-	mr = mlx5_mr_cache_alloc(dev, &dev->cache.ent[order],
-				 imr->access_flags);
+	mr = mlx5_mr_cache_alloc_order(dev, order, imr->access_flags);
 	if (IS_ERR(mr)) {
 		ib_umem_odp_release(odp);
 		return mr;
@@ -494,9 +493,8 @@ struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
 	if (IS_ERR(umem_odp))
 		return ERR_CAST(umem_odp);
 
-	imr = mlx5_mr_cache_alloc(dev,
-				  &dev->cache.ent[MLX5_IMR_KSM_CACHE_ENTRY],
-				  access_flags);
+	imr = mlx5_mr_cache_alloc_order(dev, MLX5_IMR_KSM_CACHE_ENTRY,
+					access_flags);
 	if (IS_ERR(imr)) {
 		ib_umem_odp_release(umem_odp);
 		return imr;
-- 
2.17.2



* [PATCH v2 rdma-next 4/6] RDMA/mlx5: Introduce mlx5r_cache_rb_key
  2022-12-07  8:57 [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
                   ` (2 preceding siblings ...)
  2022-12-07  8:57 ` [PATCH v2 rdma-next 3/6] RDMA/mlx5: Change the cache structure to RB-tree Michael Guralnik
@ 2022-12-07  8:57 ` Michael Guralnik
  2022-12-08  0:39   ` Jason Gunthorpe
  2022-12-07  8:57 ` [PATCH v2 rdma-next 5/6] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow Michael Guralnik
  2022-12-07  8:57 ` [PATCH v2 rdma-next 6/6] RDMA/mlx5: Add work to remove temporary entries from the cache Michael Guralnik
  5 siblings, 1 reply; 13+ messages in thread
From: Michael Guralnik @ 2022-12-07  8:57 UTC (permalink / raw)
  To: jgg, leonro, linux-rdma; +Cc: maorg, Michael Guralnik

Switch from using the mkey order to using the new struct as the key to
the RB-tree of cache entries.
The key consists of all the mkey properties that UMR operations can't
modify. Use this key to define the cache entries and to search for and
create cache mkeys.
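
As an illustration of the ordering (standalone sketch, not driver code):
because ndescs is compared last, a lower-bound search over keys sorted
this way returns the smallest-ndescs entry whose ats/access_mode/
access_flags match the request exactly:

#include <stdio.h>

struct key {
	unsigned int ats, access_mode, access_flags, ndescs;
};

/* Same ladder as cache_ent_key_cmp() below: ndescs decides last. */
static int key_cmp(struct key a, struct key b)
{
	if (a.ats != b.ats)
		return a.ats < b.ats ? -1 : 1;
	if (a.access_mode != b.access_mode)
		return a.access_mode < b.access_mode ? -1 : 1;
	if (a.access_flags != b.access_flags)
		return a.access_flags < b.access_flags ? -1 : 1;
	if (a.ndescs != b.ndescs)
		return a.ndescs < b.ndescs ? -1 : 1;
	return 0;
}

int main(void)
{
	/* entries in sorted (tree) order: one MTT-like group, one KSM-like */
	struct key ents[] = {
		{ 0, 1, 0,   8 }, { 0, 1, 0,  64 }, { 0, 1, 0, 512 },
		{ 0, 3, 0, 128 },
	};
	struct key want = { 0, 1, 0, 100 };
	struct key *best = NULL;

	/* linear stand-in for the rb-tree lower-bound walk */
	for (int i = 0; i < 4 && !best; i++)
		if (key_cmp(ents[i], want) >= 0)
			best = &ents[i];

	/* prints 512: the smallest matching entry with ndescs >= 100 */
	printf("ndescs=%u\n", best ? best->ndescs : 0);
	return 0;
}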

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  32 +++--
 drivers/infiniband/hw/mlx5/mr.c      | 196 +++++++++++++++++++--------
 drivers/infiniband/hw/mlx5/odp.c     |  26 ++--
 3 files changed, 180 insertions(+), 74 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 10e22fb01e1b..d795e9fc2c2f 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -731,17 +731,26 @@ struct umr_common {
 	unsigned int state;
 };
 
+struct mlx5r_cache_rb_key {
+	u8 ats:1;
+	unsigned int access_mode;
+	unsigned int access_flags;
+	/*
+	 * keep ndescs as the last member so entries with about the same ndescs
+	 * will be close in the tree
+	 */
+	unsigned int ndescs;
+};
+
 struct mlx5_cache_ent {
 	struct xarray		mkeys;
 	unsigned long		stored;
 	unsigned long		reserved;
 
 	char                    name[4];
-	u32                     order;
-	u32			access_mode;
-	unsigned int		ndescs;
 
 	struct rb_node		node;
+	struct mlx5r_cache_rb_key rb_key;
 
 	u8 disabled:1;
 	u8 fill_to_high_water:1;
@@ -1320,14 +1329,13 @@ int mlx5_ib_get_cqe_size(struct ib_cq *ibcq);
 int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev);
 int mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev);
 struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
-					      int order);
+					      struct mlx5r_cache_rb_key rb_key,
+					      bool debugfs);
 
 struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
-				       struct mlx5_cache_ent *ent,
-				       int access_flags);
+				       int access_flags, int access_mode,
+				       int ndescs);
 
-struct mlx5_ib_mr *mlx5_mr_cache_alloc_order(struct mlx5_ib_dev *dev, u32 order,
-					     int access_flags);
 int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask,
 			    struct ib_mr_status *mr_status);
 struct ib_wq *mlx5_ib_create_wq(struct ib_pd *pd,
@@ -1350,7 +1358,7 @@ int mlx5r_odp_create_eq(struct mlx5_ib_dev *dev, struct mlx5_ib_pf_eq *eq);
 void mlx5_ib_odp_cleanup_one(struct mlx5_ib_dev *ibdev);
 int __init mlx5_ib_odp_init(void);
 void mlx5_ib_odp_cleanup(void);
-void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent);
+struct mlx5_cache_ent *mlx5_odp_init_mkey_cache_entry(struct mlx5_ib_dev *dev);
 void mlx5_odp_populate_xlt(void *xlt, size_t idx, size_t nentries,
 			   struct mlx5_ib_mr *mr, int flags);
 
@@ -1369,7 +1377,11 @@ static inline int mlx5r_odp_create_eq(struct mlx5_ib_dev *dev,
 static inline void mlx5_ib_odp_cleanup_one(struct mlx5_ib_dev *ibdev) {}
 static inline int mlx5_ib_odp_init(void) { return 0; }
 static inline void mlx5_ib_odp_cleanup(void)				    {}
-void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent) {}
+static inline struct mlx5_cache_ent *
+mlx5_odp_init_mkey_cache_entry(struct mlx5_ib_dev *dev)
+{
+	return NULL;
+}
 static inline void mlx5_odp_populate_xlt(void *xlt, size_t idx, size_t nentries,
 					 struct mlx5_ib_mr *mr, int flags) {}
 
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index f35022067769..6531e38ef4ec 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -292,11 +292,13 @@ static void set_cache_mkc(struct mlx5_cache_ent *ent, void *mkc)
 	set_mkc_access_pd_addr_fields(mkc, 0, 0, ent->dev->umrc.pd);
 	MLX5_SET(mkc, mkc, free, 1);
 	MLX5_SET(mkc, mkc, umr_en, 1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, ent->access_mode & 0x3);
-	MLX5_SET(mkc, mkc, access_mode_4_2, (ent->access_mode >> 2) & 0x7);
+	MLX5_SET(mkc, mkc, access_mode_1_0, ent->rb_key.access_mode & 0x3);
+	MLX5_SET(mkc, mkc, access_mode_4_2,
+		(ent->rb_key.access_mode >> 2) & 0x7);
 
 	MLX5_SET(mkc, mkc, translations_octword_size,
-		 get_mkc_octo_size(ent->access_mode, ent->ndescs));
+		 get_mkc_octo_size(ent->rb_key.access_mode,
+				   ent->rb_key.ndescs));
 	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
 }
 
@@ -594,8 +596,8 @@ static void __cache_work_func(struct mlx5_cache_ent *ent)
 			if (err != -EAGAIN) {
 				mlx5_ib_warn(
 					dev,
-					"command failed order %d, err %d\n",
-					ent->order, err);
+					"add keys command failed, err %d\n",
+					err);
 				queue_delayed_work(cache->wq, &ent->dwork,
 						   msecs_to_jiffies(1000));
 			}
@@ -641,22 +643,44 @@ static void delayed_cache_work_func(struct work_struct *work)
 	__cache_work_func(ent);
 }
 
+static int cache_ent_key_cmp(struct mlx5r_cache_rb_key key1,
+			     struct mlx5r_cache_rb_key key2)
+{
+	int res;
+
+	res = key1.ats - key2.ats;
+	if (res)
+		return res;
+
+	res = key1.access_mode - key2.access_mode;
+	if (res)
+		return res;
+
+	res = key1.access_flags - key2.access_flags;
+	if (res)
+		return res;
+
+	return key1.ndescs - key2.ndescs;
+}
+
 static int mlx5_cache_ent_insert(struct mlx5_mkey_cache *cache,
 				 struct mlx5_cache_ent *ent)
 {
 	struct rb_node **new = &cache->rb_root.rb_node, *parent = NULL;
 	struct mlx5_cache_ent *cur;
+	int cmp;
 
 	mutex_lock(&cache->rb_lock);
 	/* Figure out where to put new node */
 	while (*new) {
 		cur = rb_entry(*new, struct mlx5_cache_ent, node);
 		parent = *new;
-		if (ent->order < cur->order)
+		cmp = cache_ent_key_cmp(cur->rb_key, ent->rb_key);
+		if (cmp > 0)
 			new = &((*new)->rb_left);
-		if (ent->order > cur->order)
+		if (cmp < 0)
 			new = &((*new)->rb_right);
-		if (ent->order == cur->order) {
+		if (cmp == 0) {
 			mutex_unlock(&cache->rb_lock);
 			return -EEXIST;
 		}
@@ -670,40 +694,45 @@ static int mlx5_cache_ent_insert(struct mlx5_mkey_cache *cache,
 	return 0;
 }
 
-static struct mlx5_cache_ent *mkey_cache_ent_from_order(struct mlx5_ib_dev *dev,
-							unsigned int order)
+static struct mlx5_cache_ent *
+mkey_cache_ent_from_rb_key(struct mlx5_ib_dev *dev,
+			   struct mlx5r_cache_rb_key rb_key)
 {
 	struct rb_node *node = dev->cache.rb_root.rb_node;
 	struct mlx5_cache_ent *cur, *smallest = NULL;
+	int cmp;
 
 	/*
 	 * Find the smallest ent with order >= requested_order.
 	 */
 	while (node) {
 		cur = rb_entry(node, struct mlx5_cache_ent, node);
-		if (cur->order > order) {
+		cmp = cache_ent_key_cmp(cur->rb_key, rb_key);
+		if (cmp > 0) {
 			smallest = cur;
 			node = node->rb_left;
 		}
-		if (cur->order < order)
+		if (cmp < 0)
 			node = node->rb_right;
-		if (cur->order == order)
+		if (cmp == 0)
 			return cur;
 	}
 
-	return smallest;
+	return (smallest &&
+		smallest->rb_key.access_mode == rb_key.access_mode &&
+		smallest->rb_key.access_flags == rb_key.access_flags &&
+		smallest->rb_key.ats == rb_key.ats) ?
+		       smallest :
+		       NULL;
 }
 
-struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
-				       struct mlx5_cache_ent *ent,
-				       int access_flags)
+static struct mlx5_ib_mr *_mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
+					struct mlx5_cache_ent *ent,
+					int access_flags)
 {
 	struct mlx5_ib_mr *mr;
 	int err;
 
-	if (!mlx5r_umr_can_reconfig(dev, 0, access_flags))
-		return ERR_PTR(-EOPNOTSUPP);
-
 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
@@ -734,12 +763,43 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
 	return mr;
 }
 
-struct mlx5_ib_mr *mlx5_mr_cache_alloc_order(struct mlx5_ib_dev *dev,
-					     u32 order, int access_flags)
+static int get_unchangeable_access_flags(struct mlx5_ib_dev *dev,
+					 int access_flags)
 {
-	struct mlx5_cache_ent *ent = mkey_cache_ent_from_order(dev, order);
+	int ret = 0;
+
+	if ((access_flags & IB_ACCESS_REMOTE_ATOMIC) &&
+	    MLX5_CAP_GEN(dev->mdev, atomic) &&
+	    MLX5_CAP_GEN(dev->mdev, umr_modify_atomic_disabled))
+		ret |= IB_ACCESS_REMOTE_ATOMIC;
 
-	return mlx5_mr_cache_alloc(dev, ent, access_flags);
+	if ((access_flags & IB_ACCESS_RELAXED_ORDERING) &&
+	    MLX5_CAP_GEN(dev->mdev, relaxed_ordering_write) &&
+	    !MLX5_CAP_GEN(dev->mdev, relaxed_ordering_write_umr))
+		ret |= IB_ACCESS_RELAXED_ORDERING;
+
+	if ((access_flags & IB_ACCESS_RELAXED_ORDERING) &&
+	    MLX5_CAP_GEN(dev->mdev, relaxed_ordering_read) &&
+	    !MLX5_CAP_GEN(dev->mdev, relaxed_ordering_read_umr))
+		ret |= IB_ACCESS_RELAXED_ORDERING;
+
+	return ret;
+}
+
+struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
+				       int access_flags, int access_mode,
+				       int ndescs)
+{
+	struct mlx5r_cache_rb_key rb_key = {
+		.ndescs = ndescs,
+		.access_mode = access_mode,
+		.access_flags = get_unchangeable_access_flags(dev, access_flags)
+	};
+	struct mlx5_cache_ent *ent = mkey_cache_ent_from_rb_key(dev, rb_key);
+	if (!ent)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	return _mlx5_mr_cache_alloc(dev, ent, access_flags);
 }
 
 static void clean_keys(struct mlx5_ib_dev *dev, struct mlx5_cache_ent *ent)
@@ -766,28 +826,32 @@ static void mlx5_mkey_cache_debugfs_cleanup(struct mlx5_ib_dev *dev)
 	dev->cache.fs_root = NULL;
 }
 
+static void mlx5_mkey_cache_debugfs_add_ent(struct mlx5_ib_dev *dev,
+					    struct mlx5_cache_ent *ent)
+{
+	int order = order_base_2(ent->rb_key.ndescs);
+	struct dentry *dir;
+
+	if (ent->rb_key.access_mode == MLX5_MKC_ACCESS_MODE_KSM)
+		order = MLX5_IMR_KSM_CACHE_ENTRY + 2;
+
+	sprintf(ent->name, "%d", order);
+	dir = debugfs_create_dir(ent->name, dev->cache.fs_root);
+	debugfs_create_file("size", 0600, dir, ent, &size_fops);
+	debugfs_create_file("limit", 0600, dir, ent, &limit_fops);
+	debugfs_create_ulong("cur", 0400, dir, &ent->stored);
+	debugfs_create_u32("miss", 0600, dir, &ent->miss);
+}
+
 static void mlx5_mkey_cache_debugfs_init(struct mlx5_ib_dev *dev)
 {
+	struct dentry *dbg_root = mlx5_debugfs_get_dev_root(dev->mdev);
 	struct mlx5_mkey_cache *cache = &dev->cache;
-	struct mlx5_cache_ent *ent;
-	struct dentry *dir;
-	int i;
 
 	if (!mlx5_debugfs_root || dev->is_rep)
 		return;
 
-	dir = mlx5_debugfs_get_dev_root(dev->mdev);
-	cache->fs_root = debugfs_create_dir("mr_cache", dir);
-
-	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
-		ent = mkey_cache_ent_from_order(dev, i);
-		sprintf(ent->name, "%d", ent->order);
-		dir = debugfs_create_dir(ent->name, cache->fs_root);
-		debugfs_create_file("size", 0600, dir, ent, &size_fops);
-		debugfs_create_file("limit", 0600, dir, ent, &limit_fops);
-		debugfs_create_ulong("cur", 0400, dir, &ent->stored);
-		debugfs_create_u32("miss", 0600, dir, &ent->miss);
-	}
+	cache->fs_root = debugfs_create_dir("mr_cache", dbg_root);
 }
 
 static void delay_time_func(struct timer_list *t)
@@ -798,7 +862,8 @@ static void delay_time_func(struct timer_list *t)
 }
 
 struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
-					      int order)
+					      struct mlx5r_cache_rb_key rb_key,
+					      bool debugfs)
 {
 	struct mlx5_cache_ent *ent;
 	int ret;
@@ -808,7 +873,10 @@ struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
 		return ERR_PTR(-ENOMEM);
 
 	xa_init_flags(&ent->mkeys, XA_FLAGS_LOCK_IRQ);
-	ent->order = order;
+	ent->rb_key.ats = rb_key.ats;
+	ent->rb_key.access_mode = rb_key.access_mode;
+	ent->rb_key.access_flags = rb_key.access_flags;
+	ent->rb_key.ndescs = rb_key.ndescs;
 	ent->dev = dev;
 
 	INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func);
@@ -818,11 +886,18 @@ struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
 		kfree(ent);
 		return ERR_PTR(ret);
 	}
+
+	if (debugfs)
+		mlx5_mkey_cache_debugfs_add_ent(dev, ent);
+
 	return ent;
 }
 
 int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 {
+	struct mlx5r_cache_rb_key rb_key = {
+		.access_mode = MLX5_MKC_ACCESS_MODE_MTT,
+	};
 	struct mlx5_mkey_cache *cache = &dev->cache;
 	struct mlx5_cache_ent *ent;
 	int i;
@@ -838,19 +913,26 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 
 	mlx5_cmd_init_async_ctx(dev->mdev, &dev->async_ctx);
 	timer_setup(&dev->delay_timer, delay_time_func, 0);
+	mlx5_mkey_cache_debugfs_init(dev);
 	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
-		ent = mlx5r_cache_create_ent(dev, i);
-
-		if (i > MKEY_CACHE_LAST_STD_ENTRY) {
-			mlx5_odp_init_mkey_cache_entry(ent);
+		if (i > mkey_cache_max_order(dev))
 			continue;
+
+		if (i == MLX5_IMR_KSM_CACHE_ENTRY) {
+			ent = mlx5_odp_init_mkey_cache_entry(dev);
+			if (!ent)
+				continue;
+		}
+		else {
+			rb_key.ndescs = 1 << (i + 2);
+			ent = mlx5r_cache_create_ent(dev, rb_key, true);
 		}
 
-		if (ent->order > mkey_cache_max_order(dev))
-			continue;
+		if (IS_ERR(ent)) {
+			mlx5_ib_warn(dev, "failed to create mkey cache entry\n");
+			return PTR_ERR(ent);
+		}
 
-		ent->ndescs = 1 << ent->order;
-		ent->access_mode = MLX5_MKC_ACCESS_MODE_MTT;
 		if ((dev->mdev->profile.mask & MLX5_PROF_MASK_MR_CACHE) &&
 		    !dev->is_rep && mlx5_core_is_pf(dev->mdev) &&
 		    mlx5r_umr_can_load_pas(dev, 0))
@@ -862,8 +944,6 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 		xa_unlock_irq(&ent->mkeys);
 	}
 
-	mlx5_mkey_cache_debugfs_init(dev);
-
 	return 0;
 }
 
@@ -995,6 +1075,9 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 					     struct ib_umem *umem, u64 iova,
 					     int access_flags)
 {
+	struct mlx5r_cache_rb_key rb_key = {
+		.access_mode = MLX5_MKC_ACCESS_MODE_MTT,
+	};
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct mlx5_cache_ent *ent;
 	struct mlx5_ib_mr *mr;
@@ -1007,8 +1090,11 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 						     0, iova);
 	if (WARN_ON(!page_size))
 		return ERR_PTR(-EINVAL);
-	ent = mkey_cache_ent_from_order(
-		dev, order_base_2(ib_umem_num_dma_blocks(umem, page_size)));
+
+	rb_key.ndescs = ib_umem_num_dma_blocks(umem, page_size);
+	rb_key.ats = mlx5_umem_needs_ats(dev, umem, access_flags);
+	rb_key.access_flags = get_unchangeable_access_flags(dev, access_flags);
+	ent = mkey_cache_ent_from_rb_key(dev, rb_key);
 	/*
 	 * Matches access in alloc_cache_mr(). If the MR can't come from the
 	 * cache then synchronously create an uncached one.
@@ -1022,7 +1108,7 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 		return mr;
 	}
 
-	mr = mlx5_mr_cache_alloc(dev, ent, access_flags);
+	mr = _mlx5_mr_cache_alloc(dev, ent, access_flags);
 	if (IS_ERR(mr))
 		return mr;
 
@@ -1451,7 +1537,7 @@ static bool can_use_umr_rereg_pas(struct mlx5_ib_mr *mr,
 		mlx5_umem_find_best_pgsz(new_umem, mkc, log_page_size, 0, iova);
 	if (WARN_ON(!*page_size))
 		return false;
-	return (1ULL << mr->mmkey.cache_ent->order) >=
+	return (mr->mmkey.cache_ent->rb_key.ndescs) >=
 	       ib_umem_num_dma_blocks(new_umem, *page_size);
 }
 
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index c41e091618ce..90de87ba3b96 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -406,7 +406,6 @@ static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
 static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr,
 						unsigned long idx)
 {
-	int order = order_base_2(MLX5_IMR_MTT_ENTRIES);
 	struct mlx5_ib_dev *dev = mr_to_mdev(imr);
 	struct ib_umem_odp *odp;
 	struct mlx5_ib_mr *mr;
@@ -419,7 +418,9 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr,
 	if (IS_ERR(odp))
 		return ERR_CAST(odp);
 
-	mr = mlx5_mr_cache_alloc_order(dev, order, imr->access_flags);
+	mr = mlx5_mr_cache_alloc(dev, imr->access_flags,
+				 MLX5_MKC_ACCESS_MODE_MTT,
+				 MLX5_IMR_MTT_ENTRIES);
 	if (IS_ERR(mr)) {
 		ib_umem_odp_release(odp);
 		return mr;
@@ -493,8 +494,8 @@ struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
 	if (IS_ERR(umem_odp))
 		return ERR_CAST(umem_odp);
 
-	imr = mlx5_mr_cache_alloc_order(dev, MLX5_IMR_KSM_CACHE_ENTRY,
-					access_flags);
+	imr = mlx5_mr_cache_alloc(dev, access_flags, MLX5_MKC_ACCESS_MODE_KSM,
+				  mlx5_imr_ksm_entries);
 	if (IS_ERR(imr)) {
 		ib_umem_odp_release(umem_odp);
 		return imr;
@@ -1587,12 +1588,19 @@ mlx5_ib_odp_destroy_eq(struct mlx5_ib_dev *dev, struct mlx5_ib_pf_eq *eq)
 	return err;
 }
 
-void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent)
+struct mlx5_cache_ent *mlx5_odp_init_mkey_cache_entry(struct mlx5_ib_dev *dev)
 {
-	if (!(ent->dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
-		return;
-	ent->ndescs = mlx5_imr_ksm_entries;
-	ent->access_mode = MLX5_MKC_ACCESS_MODE_KSM;
+	struct mlx5r_cache_rb_key rb_key = {
+		.ats = 0,
+		.access_mode = MLX5_MKC_ACCESS_MODE_KSM,
+		.access_flags = 0,
+		.ndescs = mlx5_imr_ksm_entries,
+	};
+
+	if (!(dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
+		return NULL;
+
+	return mlx5r_cache_create_ent(dev, rb_key, true);
 }
 
 static const struct ib_device_ops mlx5_ib_dev_odp_ops = {
-- 
2.17.2



* [PATCH v2 rdma-next 5/6] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow
  2022-12-07  8:57 [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
                   ` (3 preceding siblings ...)
  2022-12-07  8:57 ` [PATCH v2 rdma-next 4/6] RDMA/mlx5: Introduce mlx5r_cache_rb_key Michael Guralnik
@ 2022-12-07  8:57 ` Michael Guralnik
  2022-12-08  0:44   ` Jason Gunthorpe
  2022-12-07  8:57 ` [PATCH v2 rdma-next 6/6] RDMA/mlx5: Add work to remove temporary entries from the cache Michael Guralnik
  5 siblings, 1 reply; 13+ messages in thread
From: Michael Guralnik @ 2022-12-07  8:57 UTC (permalink / raw)
  To: jgg, leonro, linux-rdma; +Cc: maorg, Michael Guralnik

Currently, when deregistering an MR, if the mkey doesn't belong to a
cache entry, it will be destroyed.
As a result, restarting applications with many non-cached mkeys is
inefficient, since all the mkeys are destroyed and then recreated.
This process takes a long time (for 100,000 MRs, ~20 seconds for dereg
and ~28 seconds for re-reg).

To shorten the restart runtime, insert all cacheable mkeys into the
cache. If no existing entry fits the mkey properties, create a temporary
entry that fits them.

After a predetermined timeout, the cache entries will shrink to the
initial high limit.

The mkeys will still be in the cache when they are consumed again after
an application restart, so registration will be much faster (for 100,000
MRs, ~4 seconds for dereg and ~5 seconds for re-reg).

The temporary cache entries created to store the non-cache mkeys are not
exposed through debugfs like the default cache entries.
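
In rough pseudocode, the resulting dereg flow is (condensed paraphrase of
the change below, not the exact driver code):

	/* Stop DMA and try to keep the mkey; names are from the patch,
	 * control flow is condensed. */
	if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length))
		if (mlx5r_umr_revoke_mr(mr) ||		/* revoke failed, or */
		    cache_ent_find_and_store(dev, mr))	/* no entry and none
							   could be created */
			mr->mmkey.cache_ent = NULL;

	if (!mr->mmkey.cache_ent)
		destroy_mkey(dev, mr);			/* not cached: destroy */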

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 24 ++++++------
 drivers/infiniband/hw/mlx5/mr.c      | 55 +++++++++++++++++++++-------
 2 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index be6d9ec5b127..8f0faa6bc9b5 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -617,12 +617,25 @@ enum mlx5_mkey_type {
 	MLX5_MKEY_INDIRECT_DEVX,
 };
 
+struct mlx5r_cache_rb_key {
+	u8 ats:1;
+	unsigned int access_mode;
+	unsigned int access_flags;
+	/*
+	 * keep ndescs as the last member so entries with about the same ndescs
+	 * will be close in the tree
+	 */
+	unsigned int ndescs;
+};
+
 struct mlx5_ib_mkey {
 	u32 key;
 	enum mlx5_mkey_type type;
 	unsigned int ndescs;
 	struct wait_queue_head wait;
 	refcount_t usecount;
+	/* User Mkey must hold either a cache_key or a cache_ent. */
+	struct mlx5r_cache_rb_key rb_key;
 	struct mlx5_cache_ent *cache_ent;
 };
 
@@ -731,17 +744,6 @@ struct umr_common {
 	unsigned int state;
 };
 
-struct mlx5r_cache_rb_key {
-	u8 ats:1;
-	unsigned int access_mode;
-	unsigned int access_flags;
-	/*
-	 * keep ndescs as the last member so entries with about the same ndescs
-	 * will be close in the tree
-	 */
-	unsigned int ndescs;
-};
-
 struct mlx5_cache_ent {
 	struct xarray		mkeys;
 	unsigned long		stored;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6531e38ef4ec..2e984d436ad5 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1096,15 +1096,14 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 	rb_key.access_flags = get_unchangeable_access_flags(dev, access_flags);
 	ent = mkey_cache_ent_from_rb_key(dev, rb_key);
 	/*
-	 * Matches access in alloc_cache_mr(). If the MR can't come from the
-	 * cache then synchronously create an uncached one.
+	 * If the MR can't come from the cache then synchronously create an uncached
+	 * one.
 	 */
-	if (!ent || ent->limit == 0 ||
-	    !mlx5r_umr_can_reconfig(dev, 0, access_flags) ||
-	    mlx5_umem_needs_ats(dev, umem, access_flags)) {
+	if (!ent) {
 		mutex_lock(&dev->slow_path_mutex);
 		mr = reg_create(pd, umem, iova, access_flags, page_size, false);
 		mutex_unlock(&dev->slow_path_mutex);
+		mr->mmkey.rb_key = rb_key;
 		return mr;
 	}
 
@@ -1195,6 +1194,7 @@ static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, struct ib_umem *umem,
 		goto err_2;
 	}
 	mr->mmkey.type = MLX5_MKEY_MR;
+	mr->mmkey.ndescs = get_octo_len(iova, umem->length, mr->page_shift);
 	mr->umem = umem;
 	set_mr_fields(dev, mr, umem->length, access_flags, iova);
 	kvfree(in);
@@ -1732,6 +1732,40 @@ mlx5_free_priv_descs(struct mlx5_ib_mr *mr)
 	}
 }
 
+static int cache_ent_find_and_store(struct mlx5_ib_dev *dev,
+				    struct mlx5_ib_mr *mr)
+{
+	struct mlx5_mkey_cache *cache = &dev->cache;
+	struct mlx5_cache_ent *ent;
+
+	if (mr->mmkey.cache_ent) {
+		xa_lock_irq(&mr->mmkey.cache_ent->mkeys);
+		mr->mmkey.cache_ent->in_use--;
+		xa_unlock_irq(&mr->mmkey.cache_ent->mkeys);
+		goto end;
+	}
+
+	mutex_lock(&cache->rb_lock);
+	ent = mkey_cache_ent_from_rb_key(dev, mr->mmkey.rb_key);
+	mutex_unlock(&cache->rb_lock);
+	if (ent) {
+		if (ent->rb_key.ndescs == mr->mmkey.rb_key.ndescs) {
+			mr->mmkey.cache_ent = ent;
+			goto end;
+		}
+	}
+
+	ent = mlx5r_cache_create_ent(dev, mr->mmkey.rb_key, false);
+	if (IS_ERR(ent))
+		return PTR_ERR(ent);
+
+	mr->mmkey.cache_ent = ent;
+
+end:
+	return push_mkey(mr->mmkey.cache_ent, false,
+			 xa_mk_value(mr->mmkey.key));
+}
+
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 {
 	struct mlx5_ib_mr *mr = to_mmr(ibmr);
@@ -1777,16 +1811,11 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 	}
 
 	/* Stop DMA */
-	if (mr->mmkey.cache_ent) {
-		xa_lock_irq(&mr->mmkey.cache_ent->mkeys);
-		mr->mmkey.cache_ent->in_use--;
-		xa_unlock_irq(&mr->mmkey.cache_ent->mkeys);
-
+	if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length))
 		if (mlx5r_umr_revoke_mr(mr) ||
-		    push_mkey(mr->mmkey.cache_ent, false,
-			      xa_mk_value(mr->mmkey.key)))
+		    cache_ent_find_and_store(dev, mr))
 			mr->mmkey.cache_ent = NULL;
-	}
+
 	if (!mr->mmkey.cache_ent) {
 		rc = destroy_mkey(to_mdev(mr->ibmr.device), mr);
 		if (rc)
-- 
2.17.2



* [PATCH v2 rdma-next 6/6] RDMA/mlx5: Add work to remove temporary entries from the cache
  2022-12-07  8:57 [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
                   ` (4 preceding siblings ...)
  2022-12-07  8:57 ` [PATCH v2 rdma-next 5/6] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow Michael Guralnik
@ 2022-12-07  8:57 ` Michael Guralnik
  5 siblings, 0 replies; 13+ messages in thread
From: Michael Guralnik @ 2022-12-07  8:57 UTC (permalink / raw)
  To: jgg, leonro, linux-rdma; +Cc: maorg, Michael Guralnik

The non-cache mkeys are stored in the cache only to shorten application
restart time. Don't store them longer than needed.

Configure cache entries that store non-cache MRs as temporary entries.
If 30 seconds pass and no user has reclaimed the temporarily cached
mkeys, an asynchronous work will destroy the cached mkeys.
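
The mechanism is the standard delayed-work pattern; a generic sketch of
the arm/re-arm/cancel lifecycle used below (illustrative only, not the
driver code):

#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;
static struct delayed_work demo_remove_dwork;

static void demo_remove_work_func(struct work_struct *work)
{
	/* here: walk the cache and destroy mkeys of temporary entries */
}

static int demo_init(void)
{
	demo_wq = alloc_ordered_workqueue("demo_cache", WQ_MEM_RECLAIM);
	if (!demo_wq)
		return -ENOMEM;
	INIT_DELAYED_WORK(&demo_remove_dwork, demo_remove_work_func);
	return 0;
}

/* Called whenever a temporary entry is created: push the deadline 30s out. */
static void demo_tmp_entry_created(void)
{
	mod_delayed_work(demo_wq, &demo_remove_dwork,
			 msecs_to_jiffies(30 * 1000));
}

static void demo_cleanup(void)
{
	cancel_delayed_work_sync(&demo_remove_dwork);
	destroy_workqueue(demo_wq);
}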

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  8 ++-
 drivers/infiniband/hw/mlx5/mr.c      | 93 ++++++++++++++++++++++------
 drivers/infiniband/hw/mlx5/odp.c     |  2 +-
 3 files changed, 80 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 8f0faa6bc9b5..86893fc145ba 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -754,6 +754,7 @@ struct mlx5_cache_ent {
 	struct rb_node		node;
 	struct mlx5r_cache_rb_key rb_key;
 
+	u8 is_tmp:1;
 	u8 disabled:1;
 	u8 fill_to_high_water:1;
 
@@ -787,6 +788,7 @@ struct mlx5_mkey_cache {
 	struct mutex		rb_lock;
 	struct dentry		*fs_root;
 	unsigned long		last_add;
+	struct delayed_work	remove_ent_dwork;
 };
 
 struct mlx5_ib_port_resources {
@@ -1330,9 +1332,9 @@ void mlx5_ib_copy_pas(u64 *old, u64 *new, int step, int num);
 int mlx5_ib_get_cqe_size(struct ib_cq *ibcq);
 int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev);
 int mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev);
-struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
-					      struct mlx5r_cache_rb_key rb_key,
-					      bool debugfs);
+struct mlx5_cache_ent *
+mlx5r_cache_create_ent_locked(struct mlx5_ib_dev *dev,
+			      struct mlx5r_cache_rb_key rb_key, bool debugfs);
 
 struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
 				       int access_flags, int access_mode,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 2e984d436ad5..5645ce351f59 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -140,19 +140,16 @@ static void create_mkey_warn(struct mlx5_ib_dev *dev, int status, void *out)
 	mlx5_cmd_out_err(dev->mdev, MLX5_CMD_OP_CREATE_MKEY, 0, out);
 }
 
-
-static int push_mkey(struct mlx5_cache_ent *ent, bool limit_pendings,
-		     void *to_store)
+static int push_mkey_locked(struct mlx5_cache_ent *ent, bool limit_pendings,
+			    void *to_store)
 {
 	XA_STATE(xas, &ent->mkeys, 0);
 	void *curr;
 
-	xa_lock_irq(&ent->mkeys);
 	if (limit_pendings &&
-	    (ent->reserved - ent->stored) > MAX_PENDING_REG_MR) {
-		xa_unlock_irq(&ent->mkeys);
+	    (ent->reserved - ent->stored) > MAX_PENDING_REG_MR)
 		return -EAGAIN;
-	}
+
 	while (1) {
 		/*
 		 * This is cmpxchg (NULL, XA_ZERO_ENTRY) however this version
@@ -191,6 +188,7 @@ static int push_mkey(struct mlx5_cache_ent *ent, bool limit_pendings,
 			break;
 		xa_lock_irq(&ent->mkeys);
 	}
+	xa_lock_irq(&ent->mkeys);
 	if (xas_error(&xas))
 		return xas_error(&xas);
 	if (WARN_ON(curr))
@@ -198,6 +196,17 @@ static int push_mkey(struct mlx5_cache_ent *ent, bool limit_pendings,
 	return 0;
 }
 
+static int push_mkey(struct mlx5_cache_ent *ent, bool limit_pendings,
+		     void *to_store)
+{
+	int ret;
+
+	xa_lock_irq(&ent->mkeys);
+	ret = push_mkey_locked(ent, limit_pendings, to_store);
+	xa_unlock_irq(&ent->mkeys);
+	return ret;
+}
+
 static void undo_push_reserve_mkey(struct mlx5_cache_ent *ent)
 {
 	void *old;
@@ -545,7 +554,7 @@ static void queue_adjust_cache_locked(struct mlx5_cache_ent *ent)
 {
 	lockdep_assert_held(&ent->mkeys.xa_lock);
 
-	if (ent->disabled || READ_ONCE(ent->dev->fill_delay))
+	if (ent->disabled || READ_ONCE(ent->dev->fill_delay) || ent->is_tmp)
 		return;
 	if (ent->stored < ent->limit) {
 		ent->fill_to_high_water = true;
@@ -670,7 +679,6 @@ static int mlx5_cache_ent_insert(struct mlx5_mkey_cache *cache,
 	struct mlx5_cache_ent *cur;
 	int cmp;
 
-	mutex_lock(&cache->rb_lock);
 	/* Figure out where to put new node */
 	while (*new) {
 		cur = rb_entry(*new, struct mlx5_cache_ent, node);
@@ -690,7 +698,6 @@ static int mlx5_cache_ent_insert(struct mlx5_mkey_cache *cache,
 	rb_link_node(&ent->node, parent, new);
 	rb_insert_color(&ent->node, &cache->rb_root);
 
-	mutex_unlock(&cache->rb_lock);
 	return 0;
 }
 
@@ -861,9 +868,9 @@ static void delay_time_func(struct timer_list *t)
 	WRITE_ONCE(dev->fill_delay, 0);
 }
 
-struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
-					      struct mlx5r_cache_rb_key rb_key,
-					      bool debugfs)
+struct mlx5_cache_ent *
+mlx5r_cache_create_ent_locked(struct mlx5_ib_dev *dev,
+			      struct mlx5r_cache_rb_key rb_key, bool debugfs)
 {
 	struct mlx5_cache_ent *ent;
 	int ret;
@@ -878,6 +885,7 @@ struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
 	ent->rb_key.access_flags = rb_key.access_flags;
 	ent->rb_key.ndescs = rb_key.ndescs;
 	ent->dev = dev;
+	ent->is_tmp = !debugfs;
 
 	INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func);
 
@@ -890,9 +898,43 @@ struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
 	if (debugfs)
 		mlx5_mkey_cache_debugfs_add_ent(dev, ent);
 
+	if (ent->is_tmp)
+		mod_delayed_work(ent->dev->cache.wq,
+				 &ent->dev->cache.remove_ent_dwork,
+				 msecs_to_jiffies(30 * 1000));
+
 	return ent;
 }
 
+static void remove_ent_work_func(struct work_struct *work)
+{
+	struct mlx5_mkey_cache *cache;
+	struct mlx5_cache_ent *ent;
+	struct rb_node *cur;
+
+	cache = container_of(work, struct mlx5_mkey_cache,
+			     remove_ent_dwork.work);
+	mutex_lock(&cache->rb_lock);
+	cur = rb_last(&cache->rb_root);
+	while (cur) {
+		ent = rb_entry(cur, struct mlx5_cache_ent, node);
+		cur = rb_prev(cur);
+		mutex_unlock(&cache->rb_lock);
+
+		xa_lock_irq(&ent->mkeys);
+		if (!ent->is_tmp) {
+			xa_unlock_irq(&ent->mkeys);
+			mutex_lock(&cache->rb_lock);
+			continue;
+		}
+		xa_unlock_irq(&ent->mkeys);
+
+		clean_keys(ent->dev, ent);
+		mutex_lock(&cache->rb_lock);
+	}
+	mutex_unlock(&cache->rb_lock);
+}
+
 int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 {
 	struct mlx5r_cache_rb_key rb_key = {
@@ -905,6 +947,7 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 	mutex_init(&dev->slow_path_mutex);
 	mutex_init(&dev->cache.rb_lock);
 	dev->cache.rb_root = RB_ROOT;
+	INIT_DELAYED_WORK(&dev->cache.remove_ent_dwork, remove_ent_work_func);
 	cache->wq = alloc_ordered_workqueue("mkey_cache", WQ_MEM_RECLAIM);
 	if (!cache->wq) {
 		mlx5_ib_warn(dev, "failed to create work queue\n");
@@ -918,6 +961,7 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 		if (i > mkey_cache_max_order(dev))
 			continue;
 
+		mutex_lock(&cache->rb_lock);
 		if (i == MLX5_IMR_KSM_CACHE_ENTRY) {
 			ent = mlx5_odp_init_mkey_cache_entry(dev);
 			if (!ent)
@@ -925,8 +969,9 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
 		}
 		else {
 			rb_key.ndescs = 1 << (i + 2);
-			ent = mlx5r_cache_create_ent(dev, rb_key, true);
+			ent = mlx5r_cache_create_ent_locked(dev, rb_key, true);
 		}
+		mutex_unlock(&cache->rb_lock);
 
 		if (IS_ERR(ent)) {
 			mlx5_ib_warn(dev, "failed to create mkey cache entry\n");
@@ -956,6 +1001,7 @@ int mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev)
 	if (!dev->cache.wq)
 		return 0;
 
+	cancel_delayed_work_sync(&dev->cache.remove_ent_dwork);
 	mutex_lock(&dev->cache.rb_lock);
 	for (node = rb_first(root); node; node = rb_next(node)) {
 		ent = rb_entry(node, struct mlx5_cache_ent, node);
@@ -1737,33 +1783,42 @@ static int cache_ent_find_and_store(struct mlx5_ib_dev *dev,
 {
 	struct mlx5_mkey_cache *cache = &dev->cache;
 	struct mlx5_cache_ent *ent;
+	int ret;
 
 	if (mr->mmkey.cache_ent) {
 		xa_lock_irq(&mr->mmkey.cache_ent->mkeys);
 		mr->mmkey.cache_ent->in_use--;
-		xa_unlock_irq(&mr->mmkey.cache_ent->mkeys);
 		goto end;
 	}
 
 	mutex_lock(&cache->rb_lock);
 	ent = mkey_cache_ent_from_rb_key(dev, mr->mmkey.rb_key);
-	mutex_unlock(&cache->rb_lock);
 	if (ent) {
 		if (ent->rb_key.ndescs == mr->mmkey.rb_key.ndescs) {
+			if (ent->disabled) {
+				mutex_unlock(&cache->rb_lock);
+				return -EOPNOTSUPP;
+			}
 			mr->mmkey.cache_ent = ent;
+			xa_lock_irq(&mr->mmkey.cache_ent->mkeys);
+			mutex_unlock(&cache->rb_lock);
 			goto end;
 		}
 	}
 
-	ent = mlx5r_cache_create_ent(dev, mr->mmkey.rb_key, false);
+	ent = mlx5r_cache_create_ent_locked(dev, mr->mmkey.rb_key, false);
+	mutex_unlock(&cache->rb_lock);
 	if (IS_ERR(ent))
 		return PTR_ERR(ent);
 
 	mr->mmkey.cache_ent = ent;
+	xa_lock_irq(&mr->mmkey.cache_ent->mkeys);
 
 end:
-	return push_mkey(mr->mmkey.cache_ent, false,
-			 xa_mk_value(mr->mmkey.key));
+	ret = push_mkey_locked(mr->mmkey.cache_ent, false,
+			       xa_mk_value(mr->mmkey.key));
+	xa_unlock_irq(&mr->mmkey.cache_ent->mkeys);
+	return ret;
 }
 
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 90de87ba3b96..b36b5afbc53c 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -1600,7 +1600,7 @@ struct mlx5_cache_ent *mlx5_odp_init_mkey_cache_entry(struct mlx5_ib_dev *dev)
 	if (!(dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
 		return NULL;
 
-	return mlx5r_cache_create_ent(dev, rb_key, true);
+	return mlx5r_cache_create_ent_locked(dev, rb_key, true);
 }
 
 static const struct ib_device_ops mlx5_ib_dev_odp_ops = {
-- 
2.17.2



* Re: [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry
  2022-12-07  8:57 ` [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry Michael Guralnik
@ 2022-12-08  0:02   ` Jason Gunthorpe
  2022-12-08  0:22   ` Jason Gunthorpe
  1 sibling, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-12-08  0:02 UTC (permalink / raw)
  To: Michael Guralnik; +Cc: leonro, linux-rdma, maorg, Aharon Landau

On Wed, Dec 07, 2022 at 10:57:48AM +0200, Michael Guralnik wrote:
> From: Aharon Landau <aharonl@nvidia.com>
> 
> Explicit ODP mkey doesn't have unique properties. There is no need to
> devote to it a special entry. Removing it.

This isn't quite true, the purpose of this entry is to create a
cache block of order 30, which is higher than the usual max of 22

Though, I'm not really sure we should even be caching such huge MRs, I
can understand why ODP would want this cache, at least for fairly
short term.

At the end of this do we still have caching for order 30 MRs?

Also, is the MTT vs KSM access_mode UMRable?

Jason


* Re: [PATCH v2 rdma-next 3/6] RDMA/mlx5: Change the cache structure to RB-tree
  2022-12-07  8:57 ` [PATCH v2 rdma-next 3/6] RDMA/mlx5: Change the cache structure to RB-tree Michael Guralnik
@ 2022-12-08  0:17   ` Jason Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-12-08  0:17 UTC (permalink / raw)
  To: Michael Guralnik; +Cc: leonro, linux-rdma, maorg

On Wed, Dec 07, 2022 at 10:57:49AM +0200, Michael Guralnik wrote:

> @@ -1362,7 +1369,7 @@ static inline int mlx5r_odp_create_eq(struct mlx5_ib_dev *dev,
>  static inline void mlx5_ib_odp_cleanup_one(struct mlx5_ib_dev *ibdev) {}
>  static inline int mlx5_ib_odp_init(void) { return 0; }
>  static inline void mlx5_ib_odp_cleanup(void)				    {}
> -static inline void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent) {}
> +void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent) {}

Why?

Jason


* Re: [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry
  2022-12-07  8:57 ` [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry Michael Guralnik
  2022-12-08  0:02   ` Jason Gunthorpe
@ 2022-12-08  0:22   ` Jason Gunthorpe
  1 sibling, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-12-08  0:22 UTC (permalink / raw)
  To: Michael Guralnik; +Cc: leonro, linux-rdma, maorg, Aharon Landau

On Wed, Dec 07, 2022 at 10:57:48AM +0200, Michael Guralnik wrote:
> From: Aharon Landau <aharonl@nvidia.com>
> 
> Explicit ODP mkey doesn't have unique properties. There is no need to
> devote to it a special entry. Removing it.
> 
> Signed-off-by: Aharon Landau <aharonl@nvidia.com>
> ---
>  drivers/infiniband/hw/mlx5/odp.c | 19 ++++---------------
>  include/linux/mlx5/driver.h      |  1 -
>  2 files changed, 4 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> index e9a29adef7dc..71b733fcac37 100644
> --- a/drivers/infiniband/hw/mlx5/odp.c
> +++ b/drivers/infiniband/hw/mlx5/odp.c
> @@ -406,6 +406,7 @@ static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
>  static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr,
>  						unsigned long idx)
>  {
> +	int order = order_base_2(MLX5_IMR_MTT_ENTRIES);

unsigned int

>  	struct mlx5_ib_dev *dev = mr_to_mdev(imr);
>  	struct ib_umem_odp *odp;
>  	struct mlx5_ib_mr *mr;
> @@ -418,7 +419,7 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr,
>  	if (IS_ERR(odp))
>  		return ERR_CAST(odp);
>  
> -	mr = mlx5_mr_cache_alloc(dev, &dev->cache.ent[MLX5_IMR_MTT_CACHE_ENTRY],
> +	mr = mlx5_mr_cache_alloc(dev, &dev->cache.ent[order],

Okay, I get it now: this is actually just order 18, so it is covered
by the existing orders. You should clarify in the commit message that it
is the same as the order 18 entry.

Also add a compile-time assertion here:

static_assert(order < MKEY_CACHE_LAST_STD_ENTRY);
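
For reference, the arithmetic behind the "just order 18" observation, assuming
4K pages and the MLX5_IMR_MTT_* definitions as recalled from the driver
headers (they are not quoted in this thread, so treat them as an assumption):

	MLX5_IMR_MTT_BITS    = 30 - PAGE_SHIFT = 30 - 12 = 18
	MLX5_IMR_MTT_ENTRIES = 1 << MLX5_IMR_MTT_BITS = 1 << 18
	order_base_2(MLX5_IMR_MTT_ENTRIES) = 18

So each implicit-ODP child MR covers 2^18 pages of 4K, i.e. 1GB (order 30 in
bytes), but only order 18 in descriptors, well below the usual max of 22
mentioned above, which is why it can reuse the existing order-18 entry instead
of needing one of its own.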

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 rdma-next 4/6] RDMA/mlx5: Introduce mlx5r_cache_rb_key
  2022-12-07  8:57 ` [PATCH v2 rdma-next 4/6] RDMA/mlx5: Introduce mlx5r_cache_rb_key Michael Guralnik
@ 2022-12-08  0:39   ` Jason Gunthorpe
  2022-12-13 12:12     ` Michael Guralnik
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2022-12-08  0:39 UTC (permalink / raw)
  To: Michael Guralnik; +Cc: leonro, linux-rdma, maorg

On Wed, Dec 07, 2022 at 10:57:50AM +0200, Michael Guralnik wrote:
> Switch from using the mkey order to using the new struct as the key to
> the RB tree of cache entries.
> The key is all the mkey properties that UMR operations can't modify.
> Using this key to define the cache entries and to search and create
> cache mkeys.
> 
> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
> ---
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  32 +++--
>  drivers/infiniband/hw/mlx5/mr.c      | 196 +++++++++++++++++++--------
>  drivers/infiniband/hw/mlx5/odp.c     |  26 ++--
>  3 files changed, 180 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index 10e22fb01e1b..d795e9fc2c2f 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -731,17 +731,26 @@ struct umr_common {
>  	unsigned int state;
>  };
>  
> +struct mlx5r_cache_rb_key {
> +	u8 ats:1;
> +	unsigned int access_mode;
> +	unsigned int access_flags;
> +	/*
> +	 * keep ndescs as the last member so entries with about the same ndescs
> +	 * will be close in the tree
> +	 */

? How does this happen? The compare function doesn't use memcmp..

I think this comment should go in the compare function because the
search function does this:

> -	return smallest;
> +	return (smallest &&
> +		smallest->rb_key.access_mode == rb_key.access_mode &&
> +		smallest->rb_key.access_flags == rb_key.access_flags &&
> +		smallest->rb_key.ats == rb_key.ats) ?
> +		       smallest :
> +		       NULL;

So it isn't that they have to be close in the tree, it is that
"smallest" has to find a matching mode/flags/ats before finding the
smallest ndescs of the matching list. Thus ndescs must always be the
last thing in the compare ladder.
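
To make that concrete, here is a minimal sketch of such a compare ladder,
assuming the cache uses a dedicated compare function of roughly this shape;
the function name and exact form are illustrative, not taken from this series:

static int cache_rb_key_cmp(struct mlx5r_cache_rb_key key1,
			    struct mlx5r_cache_rb_key key2)
{
	/* Illustrative sketch, not the function from this patch set. */
	if (key1.ats != key2.ats)
		return key1.ats < key2.ats ? -1 : 1;

	if (key1.access_mode != key2.access_mode)
		return key1.access_mode < key2.access_mode ? -1 : 1;

	if (key1.access_flags != key2.access_flags)
		return key1.access_flags < key2.access_flags ? -1 : 1;

	/*
	 * ndescs is the last rung of the ladder, so nodes sharing
	 * ats/access_mode/access_flags are ordered among themselves by
	 * ndescs; a search for the closest node >= the requested key then
	 * finds the smallest sufficient ndescs within the matching group.
	 */
	if (key1.ndescs != key2.ndescs)
		return key1.ndescs < key2.ndescs ? -1 : 1;

	return 0;
}

Any total order works as long as ndescs is compared last; that is what lets
mkey_cache_ent_from_rb_key() stop at the smallest matching entry rather than
an arbitrary one.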

> +struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
> +				       int access_flags, int access_mode,
> +				       int ndescs)
> +{
> +	struct mlx5r_cache_rb_key rb_key = {
> +		.ndescs = ndescs,
> +		.access_mode = access_mode,
> +		.access_flags = get_unchangeable_access_flags(dev, access_flags)
> +	};
> +	struct mlx5_cache_ent *ent = mkey_cache_ent_from_rb_key(dev, rb_key);
> +	if (!ent)

Missing newline

>  struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
> -					      int order)
> +					      struct mlx5r_cache_rb_key rb_key,
> +					      bool debugfs)
>  {
>  	struct mlx5_cache_ent *ent;
>  	int ret;
> @@ -808,7 +873,10 @@ struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
>  		return ERR_PTR(-ENOMEM);
>  
>  	xa_init_flags(&ent->mkeys, XA_FLAGS_LOCK_IRQ);
> -	ent->order = order;
> +	ent->rb_key.ats = rb_key.ats;
> +	ent->rb_key.access_mode = rb_key.access_mode;
> +	ent->rb_key.access_flags = rb_key.access_flags;
> +	ent->rb_key.ndescs = rb_key.ndescs;

ent->rb_key = rb_key

>  int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
>  {
> +	struct mlx5r_cache_rb_key rb_key = {
> +		.access_mode = MLX5_MKC_ACCESS_MODE_MTT,
> +	};
>  	struct mlx5_mkey_cache *cache = &dev->cache;
>  	struct mlx5_cache_ent *ent;
>  	int i;
> @@ -838,19 +913,26 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
>  
>  	mlx5_cmd_init_async_ctx(dev->mdev, &dev->async_ctx);
>  	timer_setup(&dev->delay_timer, delay_time_func, 0);
> +	mlx5_mkey_cache_debugfs_init(dev);
>  	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
> -		ent = mlx5r_cache_create_ent(dev, i);
> -
> -		if (i > MKEY_CACHE_LAST_STD_ENTRY) {
> -			mlx5_odp_init_mkey_cache_entry(ent);
> +		if (i > mkey_cache_max_order(dev))
>  			continue;

This is goofy, just make the for loop go from 2 to
mkey_cache_max_order() + 2 (and probably have the function do the + 2
internally)

Get rid of MAX_MKEY_CACHE_ENTRIES
> +
> +		if (i == MLX5_IMR_KSM_CACHE_ENTRY) {
> +			ent = mlx5_odp_init_mkey_cache_entry(dev);
> +			if (!ent)
> +				continue;

This too, just call mlx5_odp_init_mkey_cache_entry() outside the loop

And rename it to something like mlx5_odp_init_mkey_cache(), and don't
return ent.

Set ent->limit inside mlx5r_cache_create_ent()

And run over the whole rbtree in a final loop to do the final
queue_adjust_cache_locked() step.
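
Putting these suggestions together, a rough sketch of what
mlx5_mkey_cache_init() could look like; the rb_root/rb_lock/node field names,
the exact loop bounds, the error unwinding (elided here) and the xa_lock
around queue_adjust_cache_locked() are assumptions for illustration, not the
final patch:

int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
{
	struct mlx5r_cache_rb_key rb_key = {
		.access_mode = MLX5_MKC_ACCESS_MODE_MTT,
	};
	struct mlx5_mkey_cache *cache = &dev->cache;
	struct mlx5_cache_ent *ent;
	struct rb_node *node;
	int order;
	int ret;

	mutex_init(&dev->slow_path_mutex);
	cache->wq = alloc_ordered_workqueue("mkey_cache", WQ_MEM_RECLAIM);
	if (!cache->wq) {
		mlx5_ib_warn(dev, "failed to create work queue\n");
		return -ENOMEM;
	}

	mlx5_cmd_init_async_ctx(dev->mdev, &dev->async_ctx);
	timer_setup(&dev->delay_timer, delay_time_func, 0);
	mlx5_mkey_cache_debugfs_init(dev);

	/*
	 * Create only the standard MTT orders; no MAX_MKEY_CACHE_ENTRIES
	 * sentinel, and ent->limit is set inside mlx5r_cache_create_ent().
	 */
	for (order = 2; order < mkey_cache_max_order(dev) + 2; order++) {
		rb_key.ndescs = 1 << order;
		ent = mlx5r_cache_create_ent(dev, rb_key, true);
		if (IS_ERR(ent))
			return PTR_ERR(ent);
	}

	/* The implicit-ODP KSM entry is created outside the loop. */
	ret = mlx5_odp_init_mkey_cache(dev);
	if (ret)
		return ret;

	/* Finally, walk the whole tree once and kick background filling. */
	mutex_lock(&cache->rb_lock);
	for (node = rb_first(&cache->rb_root); node; node = rb_next(node)) {
		ent = rb_entry(node, struct mlx5_cache_ent, node);
		xa_lock_irq(&ent->mkeys);
		queue_adjust_cache_locked(ent);
		xa_unlock_irq(&ent->mkeys);
	}
	mutex_unlock(&cache->rb_lock);

	return 0;
}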

> -void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent)
> +struct mlx5_cache_ent *mlx5_odp_init_mkey_cache_entry(struct mlx5_ib_dev *dev)
>  {
> -	if (!(ent->dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
> -		return;
> -	ent->ndescs = mlx5_imr_ksm_entries;
> -	ent->access_mode = MLX5_MKC_ACCESS_MODE_KSM;
> +	struct mlx5r_cache_rb_key rb_key = {
> +		.ats = 0,
> +		.access_mode = MLX5_MKC_ACCESS_MODE_KSM,
> +		.access_flags = 0,
> +		.ndescs = mlx5_imr_ksm_entries,

Don't need to zero init things here

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 rdma-next 5/6] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow
  2022-12-07  8:57 ` [PATCH v2 rdma-next 5/6] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow Michael Guralnik
@ 2022-12-08  0:44   ` Jason Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-12-08  0:44 UTC (permalink / raw)
  To: Michael Guralnik; +Cc: leonro, linux-rdma, maorg

On Wed, Dec 07, 2022 at 10:57:51AM +0200, Michael Guralnik wrote:
> Currently, when dereging an MR, if the mkey doesn't belong to a cache
> entry, it will be destroyed.
> As a result, the restart of applications with many non-cached mkeys is
> not efficient since all the mkeys are destroyed and then recreated.
> This process takes a long time (for 100,000 MRs, it is ~20 seconds for
> dereg and ~28 seconds for re-reg).
> 
> To shorten the restart runtime, insert all cacheable mkeys to the cache.
> If there is no fitting entry to the mkey properties, create a temporary
> entry that fits it.
> 
> After a predetermined timeout, the cache entries will shrink to the
> initial high limit.
> 
> The mkeys will still be in the cache when consuming them again after an
> application restart. Therefore, the registration will be much faster
> (for 100,000 MRs, it is ~4 seconds for dereg and ~5 seconds for re-reg).
> 
> The temporary cache entries created to store the non-cache mkeys are not
> exposed through sysfs like the default cache entries.
> 
> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
> ---
>  drivers/infiniband/hw/mlx5/mlx5_ib.h | 24 ++++++------
>  drivers/infiniband/hw/mlx5/mr.c      | 55 +++++++++++++++++++++-------
>  2 files changed, 55 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index be6d9ec5b127..8f0faa6bc9b5 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -617,12 +617,25 @@ enum mlx5_mkey_type {
>  	MLX5_MKEY_INDIRECT_DEVX,
>  };
>  
> +struct mlx5r_cache_rb_key {
> +	u8 ats:1;
> +	unsigned int access_mode;
> +	unsigned int access_flags;
> +	/*
> +	 * keep ndescs as the last member so entries with about the same ndescs
> +	 * will be close in the tree
> +	 */
> +	unsigned int ndescs;
> +};
> +
>  struct mlx5_ib_mkey {
>  	u32 key;
>  	enum mlx5_mkey_type type;
>  	unsigned int ndescs;
>  	struct wait_queue_head wait;
>  	refcount_t usecount;
> +	/* User Mkey must hold either a cache_key or a cache_ent. */
> +	struct mlx5r_cache_rb_key rb_key;

What is a cache_key?

Why do we now have ndescs and rb_key.ndescs in the same struct?

>  	struct mlx5_cache_ent *cache_ent;
>  };
>  
> @@ -731,17 +744,6 @@ struct umr_common {
>  	unsigned int state;
>  };
>  
> -struct mlx5r_cache_rb_key {
> -	u8 ats:1;
> -	unsigned int access_mode;
> -	unsigned int access_flags;
> -	/*
> -	 * keep ndescs as the last member so entries with about the same ndescs
> -	 * will be close in the tree
> -	 */
> -	unsigned int ndescs;
> -};

Don't move this, put it where it needs to be in the earlier patch

> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index 6531e38ef4ec..2e984d436ad5 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1096,15 +1096,14 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
>  	rb_key.access_flags = get_unchangeable_access_flags(dev, access_flags);
>  	ent = mkey_cache_ent_from_rb_key(dev, rb_key);
>  	/*
> -	 * Matches access in alloc_cache_mr(). If the MR can't come from the
> -	 * cache then synchronously create an uncached one.
> +	 * If the MR can't come from the cache then synchronously create an uncached
> +	 * one.
>  	 */
> -	if (!ent || ent->limit == 0 ||
> -	    !mlx5r_umr_can_reconfig(dev, 0, access_flags) ||
> -	    mlx5_umem_needs_ats(dev, umem, access_flags)) {
> +	if (!ent) {
>  		mutex_lock(&dev->slow_path_mutex);
>  		mr = reg_create(pd, umem, iova, access_flags, page_size, false);
>  		mutex_unlock(&dev->slow_path_mutex);
> +		mr->mmkey.rb_key = rb_key;
>  		return mr;
>  	}

Does this belong in this patch? Maybe these cleanups need their own patch.

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 rdma-next 4/6] RDMA/mlx5: Introduce mlx5r_cache_rb_key
  2022-12-08  0:39   ` Jason Gunthorpe
@ 2022-12-13 12:12     ` Michael Guralnik
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Guralnik @ 2022-12-13 12:12 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: leonro, linux-rdma, maorg


On 12/8/2022 2:39 AM, Jason Gunthorpe wrote:
> On Wed, Dec 07, 2022 at 10:57:50AM +0200, Michael Guralnik wrote:
>> Switch from using the mkey order to using the new struct as the key to
>> the RB tree of cache entries.
>> The key is all the mkey properties that UMR operations can't modify.
>> Using this key to define the cache entries and to search and create
>> cache mkeys.
>>
>> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
>> ---
>>   drivers/infiniband/hw/mlx5/mlx5_ib.h |  32 +++--
>>   drivers/infiniband/hw/mlx5/mr.c      | 196 +++++++++++++++++++--------
>>   drivers/infiniband/hw/mlx5/odp.c     |  26 ++--
>>   3 files changed, 180 insertions(+), 74 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
>> index 10e22fb01e1b..d795e9fc2c2f 100644
>> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
>> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
>> @@ -731,17 +731,26 @@ struct umr_common {
>>   	unsigned int state;
>>   };
>>   
>> +struct mlx5r_cache_rb_key {
>> +	u8 ats:1;
>> +	unsigned int access_mode;
>> +	unsigned int access_flags;
>> +	/*
>> +	 * keep ndescs as the last member so entries with about the same ndescs
>> +	 * will be close in the tree
>> +	 */
> ? How does this happen? The compare function doesn't use memcmp..
>
> I think this comment should go in the compare function because the
> search function does this:
>
>> -	return smallest;
>> +	return (smallest &&
>> +		smallest->rb_key.access_mode == rb_key.access_mode &&
>> +		smallest->rb_key.access_flags == rb_key.access_flags &&
>> +		smallest->rb_key.ats == rb_key.ats) ?
>> +		       smallest :
>> +		       NULL;
> So it isn't that they have to be close in the tree, it is that
> "smallest" has to find a matching mode/flags/ats before finding the
> smallest ndescs of the matching list. Thus ndescs must always be the
> last thing in the compare ladder.
Correct, I'll move the comment to the compare function.
We had a previous version of the compare function that used memcmp, but
we've since changed it, so the comment is no longer relevant in the struct.
I'll fix this and the rest of the comments in v3.

Thanks
Michael

>
>> +struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
>> +				       int access_flags, int access_mode,
>> +				       int ndescs)
>> +{
>> +	struct mlx5r_cache_rb_key rb_key = {
>> +		.ndescs = ndescs,
>> +		.access_mode = access_mode,
>> +		.access_flags = get_unchangeable_access_flags(dev, access_flags)
>> +	};
>> +	struct mlx5_cache_ent *ent = mkey_cache_ent_from_rb_key(dev, rb_key);
>> +	if (!ent)
> Missing newline
>
>>   struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
>> -					      int order)
>> +					      struct mlx5r_cache_rb_key rb_key,
>> +					      bool debugfs)
>>   {
>>   	struct mlx5_cache_ent *ent;
>>   	int ret;
>> @@ -808,7 +873,10 @@ struct mlx5_cache_ent *mlx5r_cache_create_ent(struct mlx5_ib_dev *dev,
>>   		return ERR_PTR(-ENOMEM);
>>   
>>   	xa_init_flags(&ent->mkeys, XA_FLAGS_LOCK_IRQ);
>> -	ent->order = order;
>> +	ent->rb_key.ats = rb_key.ats;
>> +	ent->rb_key.access_mode = rb_key.access_mode;
>> +	ent->rb_key.access_flags = rb_key.access_flags;
>> +	ent->rb_key.ndescs = rb_key.ndescs;
> ent->rb_key = rb_key
>
>>   int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
>>   {
>> +	struct mlx5r_cache_rb_key rb_key = {
>> +		.access_mode = MLX5_MKC_ACCESS_MODE_MTT,
>> +	};
>>   	struct mlx5_mkey_cache *cache = &dev->cache;
>>   	struct mlx5_cache_ent *ent;
>>   	int i;
>> @@ -838,19 +913,26 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
>>   
>>   	mlx5_cmd_init_async_ctx(dev->mdev, &dev->async_ctx);
>>   	timer_setup(&dev->delay_timer, delay_time_func, 0);
>> +	mlx5_mkey_cache_debugfs_init(dev);
>>   	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
>> -		ent = mlx5r_cache_create_ent(dev, i);
>> -
>> -		if (i > MKEY_CACHE_LAST_STD_ENTRY) {
>> -			mlx5_odp_init_mkey_cache_entry(ent);
>> +		if (i > mkey_cache_max_order(dev))
>>   			continue;
> This is goofy, just make the for loop go from 2 to
> mkey_cache_max_order() + 2 (and probably have the function do the + 2
> internally)
>
> Get rid of MAX_MKEY_CACHE_ENTRIES
>> +
>> +		if (i == MLX5_IMR_KSM_CACHE_ENTRY) {
>> +			ent = mlx5_odp_init_mkey_cache_entry(dev);
>> +			if (!ent)
>> +				continue;
> This too, just call mlx5_odp_init_mkey_cache_entry() outside the loop
>
> And rename it to something like mlx5_odp_init_mkey_cache(), and don't
> return ent.
>
> Set ent->limit inside mlx5r_cache_create_ent()
>
> And run over the whole rbtree in a final loop to do the final
> queue_adjust_cache_locked() step.
>
>> -void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent)
>> +struct mlx5_cache_ent *mlx5_odp_init_mkey_cache_entry(struct mlx5_ib_dev *dev)
>>   {
>> -	if (!(ent->dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
>> -		return;
>> -	ent->ndescs = mlx5_imr_ksm_entries;
>> -	ent->access_mode = MLX5_MKC_ACCESS_MODE_KSM;
>> +	struct mlx5r_cache_rb_key rb_key = {
>> +		.ats = 0,
>> +		.access_mode = MLX5_MKC_ACCESS_MODE_KSM,
>> +		.access_flags = 0,
>> +		.ndescs = mlx5_imr_ksm_entries,
> Don't need to zero init things here
>
> Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread

Thread overview: 13+ messages
2022-12-07  8:57 [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
2022-12-07  8:57 ` [PATCH v2 rdma-next 1/6] RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries Michael Guralnik
2022-12-07  8:57 ` [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry Michael Guralnik
2022-12-08  0:02   ` Jason Gunthorpe
2022-12-08  0:22   ` Jason Gunthorpe
2022-12-07  8:57 ` [PATCH v2 rdma-next 3/6] RDMA/mlx5: Change the cache structure to RB-tree Michael Guralnik
2022-12-08  0:17   ` Jason Gunthorpe
2022-12-07  8:57 ` [PATCH v2 rdma-next 4/6] RDMA/mlx5: Introduce mlx5r_cache_rb_key Michael Guralnik
2022-12-08  0:39   ` Jason Gunthorpe
2022-12-13 12:12     ` Michael Guralnik
2022-12-07  8:57 ` [PATCH v2 rdma-next 5/6] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow Michael Guralnik
2022-12-08  0:44   ` Jason Gunthorpe
2022-12-07  8:57 ` [PATCH v2 rdma-next 6/6] RDMA/mlx5: Add work to remove temporary entries from the cache Michael Guralnik
