From: Michael Guralnik <michaelgur@nvidia.com>
To: <jgg@nvidia.com>, <leonro@nvidia.com>
Cc: <maorg@nvidia.com>, <linux-rdma@vger.kernel.org>,
<saeedm@nvidia.com>, Aharon Landau <aharonl@nvidia.com>,
Michael Guralnik <michaelgur@nvidia.com>
Subject: [PATCH rdma-next 7/8] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow
Date: Thu, 8 Sep 2022 23:54:20 +0300 [thread overview]
Message-ID: <20220908205421.210048-8-michaelgur@nvidia.com> (raw)
In-Reply-To: <20220908205421.210048-1-michaelgur@nvidia.com>
From: Aharon Landau <aharonl@nvidia.com>
Currently, when deregistering an MR, the mkey is destroyed if it does
not belong to a cache entry.
As a result, restarting an application that holds many non-cached
mkeys is inefficient: all of its mkeys are destroyed on exit and
recreated on startup. This takes a long time (for 100,000 MRs, ~20
seconds for dereg and ~28 seconds for re-reg).
To shorten the restart runtime, insert all cacheable mkeys into the
cache on dereg. If no existing entry matches the mkey's properties,
create a temporary entry that does.
After a predetermined timeout, the cache entries will shrink to the
initial high limit.
The mkeys will still be in the cache when the application restarts and
consumes them again, so the process is much faster (for 100,000 MRs,
~4 seconds for dereg and ~5 seconds for re-reg).
The temporary cache entries created to store the non-cached mkeys are
not exposed through sysfs, unlike the default cache entries.
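In outline, the dereg path after this patch becomes (condensed from
the diff below; locking and error paths elided):

	/* Stop DMA, then try to park the mkey in the cache instead of
	 * destroying it; cache_ent_find_and_store() reuses a matching
	 * entry or creates a temporary one for the mkey's rb_key. */
	if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length))
		if (mlx5r_umr_revoke_mr(mr) ||
		    cache_ent_find_and_store(dev, mr))
			mr->mmkey.cache_ent = NULL;

	/* Only mkeys that could not be cached are destroyed. */
	if (!mr->mmkey.cache_ent)
		rc = destroy_mkey(to_mdev(mr->ibmr.device), mr);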
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 ++
drivers/infiniband/hw/mlx5/mr.c | 48 +++++++++++++++++++++++-----
2 files changed, 42 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7fd3b47190b1..109e3d666264 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -629,6 +629,8 @@ struct mlx5_ib_mkey {
unsigned int ndescs;
struct wait_queue_head wait;
refcount_t usecount;
+ /* User Mkey must hold either an rb_key or a cache_ent. */
+ struct mlx5r_cache_rb_key rb_key;
struct mlx5_cache_ent *cache_ent;
};
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6977d0cbbe6f..1e7b3c2d71a7 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -812,6 +812,7 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, u8 access_mode,
return ERR_PTR(err);
}
mr->mmkey.ndescs = ndescs;
+ mr->mmkey.rb_key = rb_key;
}
mr->mmkey.type = MLX5_MKEY_MR;
init_waitqueue_head(&mr->mmkey.wait);
@@ -1723,6 +1724,42 @@ mlx5_free_priv_descs(struct mlx5_ib_mr *mr)
}
}
+static int cache_ent_find_and_store(struct mlx5_ib_dev *dev,
+ struct mlx5_ib_mr *mr)
+{
+ struct mlx5_mkey_cache *cache = &dev->cache;
+ struct mlx5_cache_ent *ent;
+ struct rb_node *node;
+
+ if (mr->mmkey.cache_ent) {
+ xa_lock_irq(&mr->mmkey.cache_ent->mkeys);
+ mr->mmkey.cache_ent->in_use--;
+ xa_unlock_irq(&mr->mmkey.cache_ent->mkeys);
+ goto end;
+ }
+
+ mutex_lock(&cache->rb_lock);
+ node = mlx5_cache_find_smallest_ent(&dev->cache, mr->mmkey.rb_key);
+ mutex_unlock(&cache->rb_lock);
+ if (node) {
+ ent = rb_entry(node, struct mlx5_cache_ent, node);
+ if (ent->rb_key.ndescs == mr->mmkey.rb_key.ndescs) {
+ mr->mmkey.cache_ent = ent;
+ goto end;
+ }
+ }
+
+ ent = mlx5r_cache_create_ent(dev, mr->mmkey.rb_key);
+ if (IS_ERR(ent))
+ return PTR_ERR(ent);
+
+ mr->mmkey.cache_ent = ent;
+
+end:
+ return push_mkey(mr->mmkey.cache_ent, false,
+ xa_mk_value(mr->mmkey.key));
+}
+
int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
{
struct mlx5_ib_mr *mr = to_mmr(ibmr);
@@ -1768,16 +1805,11 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
}
/* Stop DMA */
- if (mr->mmkey.cache_ent) {
- xa_lock_irq(&mr->mmkey.cache_ent->mkeys);
- mr->mmkey.cache_ent->in_use--;
- xa_unlock_irq(&mr->mmkey.cache_ent->mkeys);
-
+ if (mr->umem && mlx5r_umr_can_load_pas(dev, mr->umem->length))
if (mlx5r_umr_revoke_mr(mr) ||
- push_mkey(mr->mmkey.cache_ent, false,
- xa_mk_value(mr->mmkey.key)))
+ cache_ent_find_and_store(dev, mr))
mr->mmkey.cache_ent = NULL;
- }
+
if (!mr->mmkey.cache_ent) {
rc = destroy_mkey(to_mdev(mr->ibmr.device), mr);
if (rc)
--
2.17.2
Thread overview: 10+ messages
2022-09-08 20:54 [PATCH rdma-next 0/8] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
2022-09-08 20:54 ` [PATCH rdma-next 1/8] RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries Michael Guralnik
2022-09-08 20:54 ` [PATCH rdma-next 2/8] RDMA/mlx5: Generalize mlx5_cache_cache_mr() to fit all cacheable mkeys Michael Guralnik
2022-09-09 14:47 ` Jason Gunthorpe
2022-09-08 20:54 ` [PATCH rdma-next 3/8] RDMA/mlx5: Remove explicit ODP cache entry Michael Guralnik
2022-09-08 20:54 ` [PATCH rdma-next 4/8] RDMA/mlx5: Allow rereg all the mkeys that can load pas with UMR Michael Guralnik
2022-09-08 20:54 ` [PATCH rdma-next 5/8] RDMA/mlx5: Introduce mlx5r_cache_rb_key Michael Guralnik
2022-09-08 20:54 ` [PATCH rdma-next 6/8] RDMA/mlx5: Change the cache structure to an RB-tree Michael Guralnik
2022-09-08 20:54 ` Michael Guralnik [this message]
2022-09-08 20:54 ` [PATCH rdma-next 8/8] RDMA/mlx5: Add work to remove temporary entries from the cache Michael Guralnik