From: Leon Romanovsky <leon@kernel.org>
To: Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>
Cc: Yishai Hadas <yishaih@nvidia.com>,
linux-rdma@vger.kernel.org, Christoph Hellwig <hch@infradead.org>
Subject: [PATCH rdma-next v3 2/4] IB/core: Enable ODP sync without faulting
Date: Wed, 30 Sep 2020 19:38:26 +0300
Message-ID: <20200930163828.1336747-3-leon@kernel.org>
In-Reply-To: <20200930163828.1336747-1-leon@kernel.org>
From: Yishai Hadas <yishaih@nvidia.com>

Enable ODP sync without faulting. This improves performance by
reducing the number of page faults in the system.

The gain from this option is that the device page table can be aligned
with the pages already present in the CPU page table without triggering
new page faults. As a result, the data-path overhead of the hardware
raising a fault, which ends up calling back into the driver to bring in
the pages, is avoided.
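
A rough sketch of the new non-faulting calling convention (illustrative
only, not part of this patch; the hypothetical caller below stands in
for the non-faulting users added later in this series):

	/*
	 * Sync without faulting: map only what is already present in
	 * the CPU page table; non-present pages are skipped instead of
	 * being faulted in. Compare with pagefault_real_mr(), which
	 * passes fault = true.
	 */
	np = ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt,
					  access_mask, false);
	if (np < 0)
		return np;

	/* ... program the device page table from the umem ... */

	/* On success the umem_mutex is held and must be released. */
	mutex_unlock(&odp->umem_mutex);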
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/umem_odp.c | 35 +++++++++++++++++++++---------
 drivers/infiniband/hw/mlx5/odp.c   |  2 +-
 include/rdma/ib_umem_odp.h         |  2 +-
 3 files changed, 27 insertions(+), 12 deletions(-)
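
Note for reviewers: the behavioral switch is in how hmm_range_fault()
is driven. With fault set, HMM_PFN_REQ_FAULT (plus HMM_PFN_REQ_WRITE
when write access is requested) goes into range.default_flags and
hmm_range_fault() populates the pages. With fault clear,
range.default_flags stays zero, so hmm_range_fault() only snapshots the
current CPU page table, and the access rights come from the snapshot
rather than from the caller's access_mask. Condensed, the per-page
logic in the non-faulting case is (illustration only, not code to
apply):

	unsigned long pfn = range.hmm_pfns[pfn_index];

	if (!(pfn & HMM_PFN_VALID))
		continue;			/* leave the page unmapped */

	access_mask = ODP_READ_ALLOWED_BIT;	/* present => readable */
	if (pfn & HMM_PFN_WRITE)
		access_mask |= ODP_WRITE_ALLOWED_BIT;
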
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index b7dc9ccb2cc9..23c2c009f80e 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -347,9 +347,10 @@ static int ib_umem_odp_map_dma_single_page(
  *        the return value.
  * @access_mask: bit mask of the requested access permissions for the given
  *               range.
+ * @fault: is faulting required for the given range
  */
 int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt,
-				 u64 bcnt, u64 access_mask)
+				 u64 bcnt, u64 access_mask, bool fault)
 			__acquires(&umem_odp->umem_mutex)
 {
 	struct task_struct *owning_process = NULL;
@@ -385,10 +386,12 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt,
 	range.end = ALIGN(user_virt + bcnt, 1UL << page_shift);
 	pfn_start_idx = (range.start - ib_umem_start(umem_odp)) >> PAGE_SHIFT;
 	num_pfns = (range.end - range.start) >> PAGE_SHIFT;
-	range.default_flags = HMM_PFN_REQ_FAULT;
+	if (fault) {
+		range.default_flags = HMM_PFN_REQ_FAULT;
 
-	if (access_mask & ODP_WRITE_ALLOWED_BIT)
-		range.default_flags |= HMM_PFN_REQ_WRITE;
+		if (access_mask & ODP_WRITE_ALLOWED_BIT)
+			range.default_flags |= HMM_PFN_REQ_WRITE;
+	}
 
 	range.hmm_pfns = &(umem_odp->pfn_list[pfn_start_idx]);
 	timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
@@ -417,12 +420,24 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt,
 
 	for (pfn_index = 0; pfn_index < num_pfns;
 		pfn_index += 1 << (page_shift - PAGE_SHIFT), dma_index++) {
-		/*
-		 * Since we asked for hmm_range_fault() to populate pages,
-		 * it shouldn't return an error entry on success.
-		 */
-		WARN_ON(range.hmm_pfns[pfn_index] & HMM_PFN_ERROR);
-		WARN_ON(!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID));
+
+		if (fault) {
+			/*
+			 * Since we asked for hmm_range_fault() to populate
+			 * pages it shouldn't return an error entry on success.
+			 */
+			WARN_ON(range.hmm_pfns[pfn_index] & HMM_PFN_ERROR);
+			WARN_ON(!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID));
+		} else {
+			if (!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)) {
+				WARN_ON(umem_odp->dma_list[dma_index]);
+				continue;
+			}
+			access_mask = ODP_READ_ALLOWED_BIT;
+			if (range.hmm_pfns[pfn_index] & HMM_PFN_WRITE)
+				access_mask |= ODP_WRITE_ALLOWED_BIT;
+		}
+
 		hmm_order = hmm_pfn_to_map_order(range.hmm_pfns[pfn_index]);
 		/* If a hugepage was detected and ODP wasn't set for, the umem
 		 * page_shift will be used, the opposite case is an error.
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 0f203141a6ad..5bd5e19d76a2 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -681,7 +681,7 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
 	if (odp->umem.writable && !downgrade)
 		access_mask |= ODP_WRITE_ALLOWED_BIT;
 
-	np = ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt, access_mask);
+	np = ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt, access_mask, true);
 	if (np < 0)
 		return np;
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index a53b62ac8a9d..0844c1d05ac6 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -94,7 +94,7 @@ ib_umem_odp_alloc_child(struct ib_umem_odp *root_umem, unsigned long addr,
 void ib_umem_odp_release(struct ib_umem_odp *umem_odp);
 
 int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 start_offset,
-				 u64 bcnt, u64 access_mask);
+				 u64 bcnt, u64 access_mask, bool fault);
 void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 start_offset,
 				 u64 bound);
--
2.26.2