From: Leon Romanovsky <leon@kernel.org>
To: Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-rdma@vger.kernel.org, Yishai Hadas <yishaih@nvidia.com>
Subject: [PATCH rdma-next 3/3] RDMA/mlx5: Allow larger pages in DevX umem
Date: Thu, 4 Mar 2021 15:05:01 +0200 [thread overview]
Message-ID: <20210304130501.1102577-4-leon@kernel.org> (raw)
In-Reply-To: <20210304130501.1102577-1-leon@kernel.org>
From: Jason Gunthorpe <jgg@nvidia.com>
The umem DMA list calculation was locked at 4k pages due to confusion
around how this API works and is used when larger pages are present.
The conclusion is:
- umem's cannot extend past what is mapped into the process, so creating
a lage page size and referring to a sub-range is not allowed
- umem's must always have a page offset of zero, except for sub PAGE_SIZE
umems
- The feature of umem_offset to create multiple objects inside a umem
is buggy and isn't used anyplace. Thus we can assume all users of the
current API have umem_offset == 0 as well
Provide a new page size calculator that limits the DMA list to the VA
range and enforces umem_offset == 0.
Allow user space to specify the page sizes which it can accept, this
bitmap must be derived from the intended use of the umem, based on
per-usage HW limitations.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/devx.c | 64 ++++++++++++++++++++----
include/uapi/rdma/mlx5_user_ioctl_cmds.h | 1 +
2 files changed, 55 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index de3c2fc6f361..17ab369efc5e 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -2185,27 +2185,69 @@ static int devx_umem_get(struct mlx5_ib_dev *dev, struct ib_ucontext *ucontext,
return 0;
}
+static unsigned int devx_umem_find_best_pgsize(struct ib_umem *umem,
+ unsigned long pgsz_bitmap)
+{
+ unsigned long page_size;
+
+ /* Don't bother checking larger page sizes as offset must be zero and
+ * total DEVX umem length must be equal to total umem length.
+ */
+ pgsz_bitmap &= GENMASK_ULL(max_t(u64, order_base_2(umem->length),
+ PAGE_SHIFT),
+ MLX5_ADAPTER_PAGE_SHIFT);
+ if (!pgsz_bitmap)
+ return 0;
+
+ page_size = ib_umem_find_best_pgoff(umem, pgsz_bitmap, U64_MAX);
+ if (!page_size)
+ return 0;
+
+ /* If the page_size is less than the CPU page size then we can use the
+ * offset and create a umem which is a subset of the page list.
+ * For larger page sizes we can't be sure the DMA list reflects the
+ * VA so we must ensure that the umem extent is exactly equal to the
+ * page list. Reduce the page size until one of these cases is true.
+ */
+ while ((ib_umem_dma_offset(umem, page_size) != 0 ||
+ (umem->length % page_size) != 0) &&
+ page_size > PAGE_SIZE)
+ page_size /= 2;
+
+ return page_size;
+}
+
static int devx_umem_reg_cmd_alloc(struct mlx5_ib_dev *dev,
struct uverbs_attr_bundle *attrs,
struct devx_umem *obj,
struct devx_umem_reg_cmd *cmd)
{
+ unsigned long pgsz_bitmap;
unsigned int page_size;
__be64 *mtt;
void *umem;
+ int ret;
/*
- * We don't know what the user intends to use this umem for, but the HW
- * restrictions must be met. MR, doorbell records, QP, WQ and CQ all
- * have different requirements. Since we have no idea how to sort this
- * out, only support PAGE_SIZE with the expectation that userspace will
- * provide the necessary alignments inside the known PAGE_SIZE and that
- * FW will check everything.
+ * If the user does not pass in pgsz_bitmap then the user promises not
+ * to use umem_offset!=0 in any commands that allocate on top of the
+ * umem.
+ *
+ * If the user wants to use a umem_offset then it must pass in
+ * pgsz_bitmap which guides the maximum page size and thus maximum
+ * object alignment inside the umem. See the PRM.
+ *
+ * Users are not allowed to use IOVA here, mkeys are not supported on
+ * umem.
*/
- page_size = ib_umem_find_best_pgoff(
- obj->umem, PAGE_SIZE,
- __mlx5_page_offset_to_bitmask(__mlx5_bit_sz(umem, page_offset),
- 0));
+ ret = uverbs_get_const_default(&pgsz_bitmap, attrs,
+ MLX5_IB_ATTR_DEVX_UMEM_REG_PGSZ_BITMAP,
+ GENMASK_ULL(63,
+ min(PAGE_SHIFT, MLX5_ADAPTER_PAGE_SHIFT)));
+ if (ret)
+ return ret;
+
+ page_size = devx_umem_find_best_pgsize(obj->umem, pgsz_bitmap);
if (!page_size)
return -EINVAL;
@@ -2791,6 +2833,8 @@ DECLARE_UVERBS_NAMED_METHOD(
UA_MANDATORY),
UVERBS_ATTR_FLAGS_IN(MLX5_IB_ATTR_DEVX_UMEM_REG_ACCESS,
enum ib_access_flags),
+ UVERBS_ATTR_CONST_IN(MLX5_IB_ATTR_DEVX_UMEM_REG_PGSZ_BITMAP,
+ u64),
UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_DEVX_UMEM_REG_OUT_ID,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY));
diff --git a/include/uapi/rdma/mlx5_user_ioctl_cmds.h b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
index 3fd9b380a091..3f0bc7597ba7 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_cmds.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
@@ -154,6 +154,7 @@ enum mlx5_ib_devx_umem_reg_attrs {
MLX5_IB_ATTR_DEVX_UMEM_REG_LEN,
MLX5_IB_ATTR_DEVX_UMEM_REG_ACCESS,
MLX5_IB_ATTR_DEVX_UMEM_REG_OUT_ID,
+ MLX5_IB_ATTR_DEVX_UMEM_REG_PGSZ_BITMAP,
};
enum mlx5_ib_devx_umem_dereg_attrs {
--
2.29.2
next prev parent reply other threads:[~2021-03-04 13:06 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-04 13:04 [PATCH rdma-next 0/3] Support larger than 4K pages in DevX UMEMs Leon Romanovsky
2021-03-04 13:04 ` [PATCH rdma-next 1/3] IB/core: Drop WARN_ON() from ib_umem_find_best_pgsz() Leon Romanovsky
2021-03-04 13:05 ` [PATCH rdma-next 2/3] IB/core: Split uverbs_get_const/default to consider target type Leon Romanovsky
2021-03-04 13:05 ` Leon Romanovsky [this message]
2021-03-12 0:30 ` [PATCH rdma-next 0/3] Support larger than 4K pages in DevX UMEMs Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210304130501.1102577-4-leon@kernel.org \
--to=leon@kernel.org \
--cc=dledford@redhat.com \
--cc=jgg@nvidia.com \
--cc=linux-rdma@vger.kernel.org \
--cc=yishaih@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.