From: Jason Gunthorpe <jgg@nvidia.com>
To: Michael Guralnik <michaelgur@nvidia.com>
Cc: leonro@nvidia.com, linux-rdma@vger.kernel.org, maorg@nvidia.com
Subject: Re: [PATCH v2 rdma-next 5/6] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow
Date: Wed, 7 Dec 2022 20:44:01 -0400
Message-ID: <Y5EzURuqzm8uauMM@nvidia.com>
In-Reply-To: <20221207085752.82458-6-michaelgur@nvidia.com>
On Wed, Dec 07, 2022 at 10:57:51AM +0200, Michael Guralnik wrote:
> Currently, when dereging an MR, if the mkey doesn't belong to a cache
> entry, it will be destroyed.
> As a result, the restart of applications with many non-cached mkeys is
> not efficient since all the mkeys are destroyed and then recreated.
> This process takes a long time (for 100,000 MRs, it is ~20 seconds for
> dereg and ~28 seconds for re-reg).
>
> To shorten the restart runtime, insert all cacheable mkeys to the cache.
> If there is no fitting entry to the mkey properties, create a temporary
> entry that fits it.
>
> After a predetermined timeout, the cache entries will shrink to the
> initial high limit.
>
> The mkeys will still be in the cache when consuming them again after an
> application restart. Therefore, the registration will be much faster
> (for 100,000 MRs, it is ~4 seconds for dereg and ~5 seconds for re-reg).
>
> The temporary cache entries created to store the non-cache mkeys are not
> exposed through sysfs like the default cache entries.
>
> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
> ---
> drivers/infiniband/hw/mlx5/mlx5_ib.h | 24 ++++++------
> drivers/infiniband/hw/mlx5/mr.c | 55 +++++++++++++++++++++-------
> 2 files changed, 55 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index be6d9ec5b127..8f0faa6bc9b5 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -617,12 +617,25 @@ enum mlx5_mkey_type {
> MLX5_MKEY_INDIRECT_DEVX,
> };
>
> +struct mlx5r_cache_rb_key {
> + u8 ats:1;
> + unsigned int access_mode;
> + unsigned int access_flags;
> + /*
> + * keep ndescs as the last member so entries with about the same ndescs
> + * will be close in the tree
> + */
> + unsigned int ndescs;
> +};
> +
> struct mlx5_ib_mkey {
> u32 key;
> enum mlx5_mkey_type type;
> unsigned int ndescs;
> struct wait_queue_head wait;
> refcount_t usecount;
> + /* User Mkey must hold either a cache_key or a cache_ent. */
> + struct mlx5r_cache_rb_key rb_key;
What is a cache_key?
Why do we now have ndescs and rb_key.ndescs in the same struct?
> struct mlx5_cache_ent *cache_ent;
> };
>
> @@ -731,17 +744,6 @@ struct umr_common {
> unsigned int state;
> };
>
> -struct mlx5r_cache_rb_key {
> - u8 ats:1;
> - unsigned int access_mode;
> - unsigned int access_flags;
> - /*
> - * keep ndescs as the last member so entries with about the same ndescs
> - * will be close in the tree
> - */
> - unsigned int ndescs;
> -};
Don't move this; put it where it needs to be in the earlier patch.
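The comment on struct mlx5r_cache_rb_key relies on the tree ordering keys member by member, with ndescs compared last. A minimal userspace sketch of such a comparator follows, assuming a plain lexicographic compare in declaration order. The names struct cache_rb_key, cmp_uint(), cache_rb_key_cmp() and sort_keys() are hypothetical, and qsort() stands in for the kernel RB-tree only to show the resulting order; this is not the comparator from the patch series.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace mirror of struct mlx5r_cache_rb_key. Field order
 * follows the patch; ndescs is deliberately the last member.
 */
struct cache_rb_key {
	unsigned int ats : 1;
	unsigned int access_mode;
	unsigned int access_flags;
	unsigned int ndescs;
};

/* Three-way compare of two unsigned values without subtraction overflow. */
static int cmp_uint(unsigned int a, unsigned int b)
{
	return (a > b) - (a < b);
}

/*
 * Assumed ordering: compare members in declaration order. Because ndescs is
 * compared last, keys that agree on ats/access_mode/access_flags form one
 * contiguous run sorted by ndescs.
 */
static int cache_rb_key_cmp(const struct cache_rb_key *a,
			    const struct cache_rb_key *b)
{
	int r;

	r = cmp_uint(a->ats, b->ats);
	if (r)
		return r;
	r = cmp_uint(a->access_mode, b->access_mode);
	if (r)
		return r;
	r = cmp_uint(a->access_flags, b->access_flags);
	if (r)
		return r;
	return cmp_uint(a->ndescs, b->ndescs);
}

/* qsort() adapter; a sorted array is used here instead of an RB-tree. */
static int cache_rb_key_qsort_cmp(const void *a, const void *b)
{
	return cache_rb_key_cmp(a, b);
}

/* Hypothetical helper: sort n keys into tree (in-order) order. */
static void sort_keys(struct cache_rb_key *keys, size_t n)
{
	assert(keys != NULL || n == 0);
	qsort(keys, n, sizeof(*keys), cache_rb_key_qsort_cmp);
}
```

With this ordering, keys that differ only in ndescs sort next to each other, so a search for a given ndescs ends up among entries with the same access properties and similar size.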
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index 6531e38ef4ec..2e984d436ad5 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1096,15 +1096,14 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
> rb_key.access_flags = get_unchangeable_access_flags(dev, access_flags);
> ent = mkey_cache_ent_from_rb_key(dev, rb_key);
> /*
> - * Matches access in alloc_cache_mr(). If the MR can't come from the
> - * cache then synchronously create an uncached one.
> + * If the MR can't come from the cache then synchronously create an uncached
> + * one.
> */
> - if (!ent || ent->limit == 0 ||
> - !mlx5r_umr_can_reconfig(dev, 0, access_flags) ||
> - mlx5_umem_needs_ats(dev, umem, access_flags)) {
> + if (!ent) {
> mutex_lock(&dev->slow_path_mutex);
> mr = reg_create(pd, umem, iova, access_flags, page_size, false);
> mutex_unlock(&dev->slow_path_mutex);
> + mr->mmkey.rb_key = rb_key;
> return mr;
> }
Does this belong in this patch? Maybe these cleanups need their own patch.
Jason
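The dereg policy in the commit message (return the mkey to a matching cache entry, or create a temporary entry when none fits) can be sketched as below. This is a simplified userspace model under stated assumptions: fixed-size arrays and a linear lookup replace the RB-tree, the ats bit is omitted, the timeout-based shrinking is not modeled, and all names (struct mkey_cache, ent_lookup(), cache_mkey_on_dereg(), MAX_ENTS, MAX_MKEYS) are hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's cache structures. */
struct cache_key {
	unsigned int access_mode;
	unsigned int access_flags;
	unsigned int ndescs;
};

#define MAX_ENTS 16	/* hypothetical limit on cache entries */
#define MAX_MKEYS 64	/* hypothetical limit on mkeys per entry */

struct cache_ent {
	struct cache_key key;
	bool is_tmp;		/* created on demand by dereg, not a default entry */
	unsigned int nkeys;	/* number of mkeys currently stored */
	unsigned int mkeys[MAX_MKEYS];
};

struct mkey_cache {
	struct cache_ent ents[MAX_ENTS];
	size_t nents;
};

static bool key_eq(const struct cache_key *a, const struct cache_key *b)
{
	return a->access_mode == b->access_mode &&
	       a->access_flags == b->access_flags &&
	       a->ndescs == b->ndescs;
}

/* Linear lookup by exact key; the patch series uses an RB-tree instead. */
static struct cache_ent *ent_lookup(struct mkey_cache *c,
				    const struct cache_key *key)
{
	for (size_t i = 0; i < c->nents; i++)
		if (key_eq(&c->ents[i].key, key))
			return &c->ents[i];
	return NULL;
}

/*
 * Return the mkey to a matching entry, creating a temporary entry if none
 * fits. Returns false when the mkey could not be cached and the caller
 * would have to destroy it instead.
 */
static bool cache_mkey_on_dereg(struct mkey_cache *c,
				const struct cache_key *key, unsigned int mkey)
{
	struct cache_ent *ent;

	assert(c != NULL && key != NULL);
	ent = ent_lookup(c, key);
	if (!ent) {
		if (c->nents == MAX_ENTS)
			return false;
		ent = &c->ents[c->nents++];
		ent->key = *key;
		ent->is_tmp = true;
		ent->nkeys = 0;
	}
	if (ent->nkeys == MAX_MKEYS)
		return false;
	ent->mkeys[ent->nkeys++] = mkey;
	return true;
}
```

In this model a restarted application asking for the same key would find its mkeys in the temporary entry instead of recreating them, which is the effect the commit message measures.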
Thread overview: 13+ messages
2022-12-07 8:57 [PATCH v2 rdma-next 0/6] RDMA/mlx5: Switch MR cache to use RB-tree Michael Guralnik
2022-12-07 8:57 ` [PATCH v2 rdma-next 1/6] RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries Michael Guralnik
2022-12-07 8:57 ` [PATCH v2 rdma-next 2/6] RDMA/mlx5: Remove explicit ODP cache entry Michael Guralnik
2022-12-08 0:02 ` Jason Gunthorpe
2022-12-08 0:22 ` Jason Gunthorpe
2022-12-07 8:57 ` [PATCH v2 rdma-next 3/6] RDMA/mlx5: Change the cache structure to RB-tree Michael Guralnik
2022-12-08 0:17 ` Jason Gunthorpe
2022-12-07 8:57 ` [PATCH v2 rdma-next 4/6] RDMA/mlx5: Introduce mlx5r_cache_rb_key Michael Guralnik
2022-12-08 0:39 ` Jason Gunthorpe
2022-12-13 12:12 ` Michael Guralnik
2022-12-07 8:57 ` [PATCH v2 rdma-next 5/6] RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow Michael Guralnik
2022-12-08 0:44 ` Jason Gunthorpe [this message]
2022-12-07 8:57 ` [PATCH v2 rdma-next 6/6] RDMA/mlx5: Add work to remove temporary entries from the cache Michael Guralnik