* generic RDMA READ/WRITE API V2
@ 2016-02-29 22:20 Christoph Hellwig
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
This series contains patches that implement a first version of a generic
API to handle RDMA READ/WRITE operations as commonly used on the target
(or server) side for storage protocols.
This has been developed for the upcoming NVMe over Fabrics target, and
extensively tested as part of that, although this upstream version has
additional updates over the one we're currently using.
It hides details such as the use of MRs for iWarp devices, and will make
it easy to accommodate other HCA specifics in the future.
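To give a feel for the shape of the API (patch 7 below), here is a minimal,
hypothetical target-side sketch for servicing a WRITE command by issuing RDMA
READs; the qp, port_num and cmd names are illustrative and error handling is
elided:

    struct rdma_rw_ctx ctx;
    int ret;

    /* map the local SG list and build the WR chain (plus MRs on iWarp) */
    ret = rdma_rw_ctx_init(&ctx, qp, port_num, cmd->sg, cmd->sg_cnt, 0,
                cmd->remote_addr, cmd->rkey, DMA_FROM_DEVICE);
    if (ret < 0)
        return ret;

    /* post the chain; cmd->cqe fires once the last WR completes */
    ret = rdma_rw_ctx_post(&ctx, qp, port_num, &cmd->cqe, NULL);

    /* later, from the completion handler: */
    rdma_rw_ctx_destroy(&ctx, qp, port_num, cmd->sg, cmd->sg_cnt,
                DMA_FROM_DEVICE);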
This series contains a conversion of the SRP target, and the git tree
below also has an RFC conversion of the iSER target (a little hacky
due to the signature MR support, which I can't test).
I also have a git tree available at:
git://git.infradead.org/users/hch/rdma.git rdma-rw-api
Gitweb:
http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-rw-api
These two also include the RFC iSER target conversion.
Changes since V1:
- fixed offset handling in ib_sg_to_pages
- switched to proper SG iterators to handle segments larger than PAGE_SIZE
- adjusted parameters for some functions to reduce the size of the context
- added SRP target support
* [PATCH 1/9] IB/cma: pass the port number to ib_create_qp
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
The new RW API will need this.
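With that in place a ULP that creates its QP through the CM gets the port
number filled in automatically; a sketch (the attribute values are
illustrative, the rest of the attr setup is elided):

    struct ib_qp_init_attr attr = { };

    attr.cap.max_rdma_ctxs = 16;    /* only for RW API users, see patch 7 */
    /* rdma_create_qp() now copies id->port_num into attr.port_num */
    ret = rdma_create_qp(cm_id, pd, &attr);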
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/infiniband/core/cma.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9729639..a791522 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -800,6 +800,7 @@ int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
if (id->device != pd->device)
return -EINVAL;
+ qp_init_attr->port_num = id->port_num;
qp = ib_create_qp(pd, qp_init_attr);
if (IS_ERR(qp))
return PTR_ERR(qp);
--
2.1.4
* [PATCH 2/9] IB/core: allow passing an offset into the SG in ib_map_mr_sg
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
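The new argument lets a caller start the registration at a byte offset into
the first S/G entry instead of always mapping from its start, which the RW
API in patch 7 needs. A minimal sketch of a converted caller (the offset
value is illustrative):

    /* register the part of the SG list starting 512 bytes in */
    n = ib_map_mr_sg(mr, sg, sg_nents, 512, PAGE_SIZE);
    if (n < sg_nents)
        return -EINVAL;    /* SG list didn't fit the MR's page list */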
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/infiniband/core/verbs.c | 24 ++++++++++++------------
drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +++----
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 5 ++---
drivers/infiniband/hw/cxgb4/mem.c | 7 +++----
drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 ++---
drivers/infiniband/hw/mlx4/mr.c | 7 +++----
drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 ++---
drivers/infiniband/hw/mlx5/mr.c | 7 +++----
drivers/infiniband/hw/nes/nes_verbs.c | 7 +++----
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 7 +++----
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 5 ++---
drivers/infiniband/hw/qib/qib_mr.c | 7 +++----
drivers/infiniband/hw/qib/qib_verbs.h | 5 ++---
drivers/infiniband/ulp/iser/iser_memory.c | 4 ++--
drivers/infiniband/ulp/isert/ib_isert.c | 2 +-
drivers/infiniband/ulp/srp/ib_srp.c | 2 +-
include/rdma/ib_verbs.h | 23 +++++++++--------------
net/rds/iw_rdma.c | 2 +-
net/rds/iw_send.c | 2 +-
net/sunrpc/xprtrdma/frwr_ops.c | 2 +-
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +-
21 files changed, 60 insertions(+), 77 deletions(-)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 5af6d02..1d04cea 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1557,6 +1557,7 @@ EXPORT_SYMBOL(ib_check_mr_status);
* @mr: memory region
* @sg: dma mapped scatterlist
* @sg_nents: number of entries in sg
+ * @sg_offset: offset in bytes into sg
* @page_size: page vector desired page size
*
* Constraints:
@@ -1573,17 +1574,15 @@ EXPORT_SYMBOL(ib_check_mr_status);
* After this completes successfully, the memory region
* is ready for registration.
*/
-int ib_map_mr_sg(struct ib_mr *mr,
- struct scatterlist *sg,
- int sg_nents,
- unsigned int page_size)
+int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset, unsigned int page_size)
{
if (unlikely(!mr->device->map_mr_sg))
return -ENOSYS;
mr->page_size = page_size;
- return mr->device->map_mr_sg(mr, sg, sg_nents);
+ return mr->device->map_mr_sg(mr, sg, sg_nents, sg_offset);
}
EXPORT_SYMBOL(ib_map_mr_sg);
@@ -1593,6 +1592,7 @@ EXPORT_SYMBOL(ib_map_mr_sg);
* @mr: memory region
* @sgl: dma mapped scatterlist
* @sg_nents: number of entries in sg
+ * @sg_offset: offset in bytes into sg
* @set_page: driver page assignment function pointer
*
* Core service helper for drivers to convert the largest
@@ -1603,10 +1603,8 @@ EXPORT_SYMBOL(ib_map_mr_sg);
* Returns the number of sg elements that were assigned to
* a page vector.
*/
-int ib_sg_to_pages(struct ib_mr *mr,
- struct scatterlist *sgl,
- int sg_nents,
- int (*set_page)(struct ib_mr *, u64))
+int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
+ unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64))
{
struct scatterlist *sg;
u64 last_end_dma_addr = 0;
@@ -1614,12 +1612,12 @@ int ib_sg_to_pages(struct ib_mr *mr,
u64 page_mask = ~((u64)mr->page_size - 1);
int i, ret;
- mr->iova = sg_dma_address(&sgl[0]);
+ mr->iova = sg_dma_address(&sgl[0]) + sg_offset;
mr->length = 0;
for_each_sg(sgl, sg, sg_nents, i) {
- u64 dma_addr = sg_dma_address(sg);
- unsigned int dma_len = sg_dma_len(sg);
+ u64 dma_addr = sg_dma_address(sg) + sg_offset;
+ unsigned int dma_len = sg_dma_len(sg) - sg_offset;
u64 end_dma_addr = dma_addr + dma_len;
u64 page_addr = dma_addr & page_mask;
@@ -1652,6 +1650,8 @@ next_page:
mr->length += dma_len;
last_end_dma_addr = end_dma_addr;
last_page_off = end_dma_addr & ~page_mask;
+
+ sg_offset = 0;
}
return i;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 2734820..a9b8ed5 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -782,15 +782,14 @@ static int iwch_set_page(struct ib_mr *ibmr, u64 addr)
return 0;
}
-static int iwch_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents)
+static int iwch_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
+ int sg_nents, unsigned sg_offset)
{
struct iwch_mr *mhp = to_iwch_mr(ibmr);
mhp->npages = 0;
- return ib_sg_to_pages(ibmr, sg, sg_nents, iwch_set_page);
+ return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, iwch_set_page);
}
static int iwch_destroy_qp(struct ib_qp *ib_qp)
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index fb2de75..5b6b962 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -957,9 +957,8 @@ void c4iw_qp_rem_ref(struct ib_qp *qp);
struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
-int c4iw_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents);
+int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset);
int c4iw_dealloc_mw(struct ib_mw *mw);
struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 7849890..65c67ba 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -686,15 +686,14 @@ static int c4iw_set_page(struct ib_mr *ibmr, u64 addr)
return 0;
}
-int c4iw_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents)
+int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset)
{
struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
mhp->mpl_len = 0;
- return ib_sg_to_pages(ibmr, sg, sg_nents, c4iw_set_page);
+ return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, c4iw_set_page);
}
int c4iw_dereg_mr(struct ib_mr *ib_mr)
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 52ce7b0..e38cc44 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -716,9 +716,8 @@ int mlx4_ib_dealloc_mw(struct ib_mw *mw);
struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
-int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents);
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset);
int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 242b94e..32c387d 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -526,9 +526,8 @@ static int mlx4_set_page(struct ib_mr *ibmr, u64 addr)
return 0;
}
-int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents)
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset)
{
struct mlx4_ib_mr *mr = to_mmr(ibmr);
int rc;
@@ -539,7 +538,7 @@ int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
sizeof(u64) * mr->max_pages,
DMA_TO_DEVICE);
- rc = ib_sg_to_pages(ibmr, sg, sg_nents, mlx4_set_page);
+ rc = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, mlx4_set_page);
ib_dma_sync_single_for_device(ibmr->device, mr->page_map,
sizeof(u64) * mr->max_pages,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d2b9737..2d05bb5 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -654,9 +654,8 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
-int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents);
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset);
int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
const struct ib_wc *in_wc, const struct ib_grh *in_grh,
const struct ib_mad_hdr *in, size_t in_mad_size,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6000f7a..2afcb61 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1450,9 +1450,8 @@ static int mlx5_set_page(struct ib_mr *ibmr, u64 addr)
return 0;
}
-int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents)
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset)
{
struct mlx5_ib_mr *mr = to_mmr(ibmr);
int n;
@@ -1463,7 +1462,7 @@ int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
mr->desc_size * mr->max_descs,
DMA_TO_DEVICE);
- n = ib_sg_to_pages(ibmr, sg, sg_nents, mlx5_set_page);
+ n = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, mlx5_set_page);
ib_dma_sync_single_for_device(ibmr->device, mr->desc_map,
mr->desc_size * mr->max_descs,
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 8c4daf7..8c85b84 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -401,15 +401,14 @@ static int nes_set_page(struct ib_mr *ibmr, u64 addr)
return 0;
}
-static int nes_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents)
+static int nes_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
+ int sg_nents, unsigned int sg_offset)
{
struct nes_mr *nesmr = to_nesmr(ibmr);
nesmr->npages = 0;
- return ib_sg_to_pages(ibmr, sg, sg_nents, nes_set_page);
+ return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, nes_set_page);
}
/**
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 37620b4..555127a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -3077,13 +3077,12 @@ static int ocrdma_set_page(struct ib_mr *ibmr, u64 addr)
return 0;
}
-int ocrdma_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents)
+int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset)
{
struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
mr->npages = 0;
- return ib_sg_to_pages(ibmr, sg, sg_nents, ocrdma_set_page);
+ return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, ocrdma_set_page);
}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 8b517fd..b290e5d 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -122,8 +122,7 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
-int ocrdma_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents);
+int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned sg_offset);
#endif /* __OCRDMA_VERBS_H__ */
diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index 5f53304..c5fb4dd 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -315,15 +315,14 @@ static int qib_set_page(struct ib_mr *ibmr, u64 addr)
return 0;
}
-int qib_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents)
+int qib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset)
{
struct qib_mr *mr = to_imr(ibmr);
mr->npages = 0;
- return ib_sg_to_pages(ibmr, sg, sg_nents, qib_set_page);
+ return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, qib_set_page);
}
/**
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 6c5e777..5067cac 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -1042,9 +1042,8 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_entries);
-int qib_map_mr_sg(struct ib_mr *ibmr,
- struct scatterlist *sg,
- int sg_nents);
+int qib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset);
int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr);
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 9a391cc..44cc85f 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -236,7 +236,7 @@ int iser_fast_reg_fmr(struct iscsi_iser_task *iser_task,
page_vec->npages = 0;
page_vec->fake_mr.page_size = SIZE_4K;
plen = ib_sg_to_pages(&page_vec->fake_mr, mem->sg,
- mem->size, iser_set_page);
+ mem->size, 0, iser_set_page);
if (unlikely(plen < mem->size)) {
iser_err("page vec too short to hold this SG\n");
iser_data_buf_dump(mem, device->ib_device);
@@ -446,7 +446,7 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
ib_update_fast_reg_key(mr, ib_inc_rkey(mr->rkey));
- n = ib_map_mr_sg(mr, mem->sg, mem->size, SIZE_4K);
+ n = ib_map_mr_sg(mr, mem->sg, mem->size, 0, SIZE_4K);
if (unlikely(n != mem->size)) {
iser_err("failed to map sg (%d/%d)\n",
n, mem->size);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index f121e61..7c7ad3a 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2561,7 +2561,7 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
wr = &inv_wr;
}
- n = ib_map_mr_sg(mr, mem->sg, mem->nents, PAGE_SIZE);
+ n = ib_map_mr_sg(mr, mem->sg, mem->nents, 0, PAGE_SIZE);
if (unlikely(n != mem->nents)) {
isert_err("failed to map mr sg (%d/%d)\n",
n, mem->nents);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 03022f6..60b169a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1365,7 +1365,7 @@ static int srp_map_finish_fr(struct srp_map_state *state,
rkey = ib_inc_rkey(desc->mr->rkey);
ib_update_fast_reg_key(desc->mr, rkey);
- n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, dev->mr_page_size);
+ n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, 0, dev->mr_page_size);
if (unlikely(n < 0))
return n;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 284b00c..2114929 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1806,7 +1806,8 @@ struct ib_device {
u32 max_num_sg);
int (*map_mr_sg)(struct ib_mr *mr,
struct scatterlist *sg,
- int sg_nents);
+ int sg_nents,
+ unsigned sg_offset);
struct ib_mw * (*alloc_mw)(struct ib_pd *pd,
enum ib_mw_type type);
int (*dealloc_mw)(struct ib_mw *mw);
@@ -3070,28 +3071,22 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
u16 pkey, const union ib_gid *gid,
const struct sockaddr *addr);
-int ib_map_mr_sg(struct ib_mr *mr,
- struct scatterlist *sg,
- int sg_nents,
- unsigned int page_size);
+int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset, unsigned int page_size);
static inline int
-ib_map_mr_sg_zbva(struct ib_mr *mr,
- struct scatterlist *sg,
- int sg_nents,
- unsigned int page_size)
+ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+ unsigned int sg_offset, unsigned int page_size)
{
int n;
- n = ib_map_mr_sg(mr, sg, sg_nents, page_size);
+ n = ib_map_mr_sg(mr, sg, sg_nents, sg_offset, page_size);
mr->iova = 0;
return n;
}
-int ib_sg_to_pages(struct ib_mr *mr,
- struct scatterlist *sgl,
- int sg_nents,
- int (*set_page)(struct ib_mr *, u64));
+int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
+ unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64));
#endif /* IB_VERBS_H */
diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index b09a40c..7ce9a92 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -666,7 +666,7 @@ static int rds_iw_rdma_reg_mr(struct rds_iw_mapping *mapping)
struct ib_send_wr *failed_wr;
int ret, n;
- n = ib_map_mr_sg_zbva(ibmr->mr, m_sg->list, m_sg->len, PAGE_SIZE);
+ n = ib_map_mr_sg_zbva(ibmr->mr, m_sg->list, m_sg->len, 0, PAGE_SIZE);
if (unlikely(n != m_sg->len))
return n < 0 ? n : -EINVAL;
diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index e20bd50..1ba3c57 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -767,7 +767,7 @@ static int rds_iw_build_send_reg(struct rds_iw_send_work *send,
{
int n;
- n = ib_map_mr_sg(send->s_mr, sg, sg_nents, PAGE_SIZE);
+ n = ib_map_mr_sg(send->s_mr, sg, sg_nents, 0, PAGE_SIZE);
if (unlikely(n != sg_nents))
return n < 0 ? n : -EINVAL;
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index e165673..4ffdcb2 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -384,7 +384,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
return -ENOMEM;
}
- n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+ n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE);
if (unlikely(n != frmr->sg_nents)) {
pr_err("RPC: %s: failed to map mr %p (%u/%u)\n",
__func__, frmr->fr_mr, n, frmr->sg_nents);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index c8b8a8b..1244a62 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -281,7 +281,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
}
atomic_inc(&xprt->sc_dma_used);
- n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+ n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE);
if (unlikely(n != frmr->sg_nents)) {
pr_err("svcrdma: failed to map mr %p (%d/%d elements)\n",
frmr->mr, n, frmr->sg_nents);
--
2.1.4
* [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
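The intended consumer is the RW API in patch 7, which uses the helper to
pick the READ opcode, roughly like this (condensed from that patch):

    if (rdma_has_read_invalidate(qp->device, port_num)) {
        wr.opcode = IB_WR_RDMA_READ_WITH_INV;
        wr.ex.invalidate_rkey = mr->lkey;
    } else {
        wr.opcode = IB_WR_RDMA_READ;
    }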
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/rdma/ib_verbs.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 2114929..267f11e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2286,6 +2286,15 @@ static inline bool rdma_cap_roce_gid_table(const struct ib_device *device,
device->add_gid && device->del_gid;
}
+/*
+ * Check if the device supports READ W/ INVALIDATE.
+ */
+static inline bool rdma_has_read_invalidate(struct ib_device *dev, u32 port_num)
+{
+ /* iWarp requires READ W/ INVALIDATE. No other device supports it yet */
+ return rdma_protocol_iwarp(dev, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid,
struct ib_gid_attr *attr);
--
2.1.4
* [PATCH 4/9] IB/core: refactor ib_create_qp
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
Split the XRC magic into a separate function, and return early on failure
to make the initialization code readable.
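The resulting flow, abridged from the diff below:

    qp = device->create_qp(pd, qp_init_attr, NULL);
    if (IS_ERR(qp))
        return qp;

    /* common initialization */

    if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
        return ib_create_xrc_qp(qp, qp_init_attr);

    /* non-XRC_TGT initialization continues without the extra nesting */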
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/infiniband/core/verbs.c | 103 +++++++++++++++++++++-------------------
1 file changed, 54 insertions(+), 49 deletions(-)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1d04cea..662466f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -723,62 +723,67 @@ struct ib_qp *ib_open_qp(struct ib_xrcd *xrcd,
}
EXPORT_SYMBOL(ib_open_qp);
+static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
+ struct ib_qp_init_attr *qp_init_attr)
+{
+ struct ib_qp *real_qp = qp;
+
+ qp->event_handler = __ib_shared_qp_event_handler;
+ qp->qp_context = qp;
+ qp->pd = NULL;
+ qp->send_cq = qp->recv_cq = NULL;
+ qp->srq = NULL;
+ qp->xrcd = qp_init_attr->xrcd;
+ atomic_inc(&qp_init_attr->xrcd->usecnt);
+ INIT_LIST_HEAD(&qp->open_list);
+
+ qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
+ qp_init_attr->qp_context);
+ if (IS_ERR(qp))
+ real_qp->device->destroy_qp(real_qp);
+ else
+ __ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
+ return qp;
+}
+
struct ib_qp *ib_create_qp(struct ib_pd *pd,
struct ib_qp_init_attr *qp_init_attr)
{
- struct ib_qp *qp, *real_qp;
- struct ib_device *device;
+ struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
+ struct ib_qp *qp;
- device = pd ? pd->device : qp_init_attr->xrcd->device;
qp = device->create_qp(pd, qp_init_attr, NULL);
-
- if (!IS_ERR(qp)) {
- qp->device = device;
- qp->real_qp = qp;
- qp->uobject = NULL;
- qp->qp_type = qp_init_attr->qp_type;
-
- atomic_set(&qp->usecnt, 0);
- if (qp_init_attr->qp_type == IB_QPT_XRC_TGT) {
- qp->event_handler = __ib_shared_qp_event_handler;
- qp->qp_context = qp;
- qp->pd = NULL;
- qp->send_cq = qp->recv_cq = NULL;
- qp->srq = NULL;
- qp->xrcd = qp_init_attr->xrcd;
- atomic_inc(&qp_init_attr->xrcd->usecnt);
- INIT_LIST_HEAD(&qp->open_list);
-
- real_qp = qp;
- qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
- qp_init_attr->qp_context);
- if (!IS_ERR(qp))
- __ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
- else
- real_qp->device->destroy_qp(real_qp);
- } else {
- qp->event_handler = qp_init_attr->event_handler;
- qp->qp_context = qp_init_attr->qp_context;
- if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
- qp->recv_cq = NULL;
- qp->srq = NULL;
- } else {
- qp->recv_cq = qp_init_attr->recv_cq;
- atomic_inc(&qp_init_attr->recv_cq->usecnt);
- qp->srq = qp_init_attr->srq;
- if (qp->srq)
- atomic_inc(&qp_init_attr->srq->usecnt);
- }
-
- qp->pd = pd;
- qp->send_cq = qp_init_attr->send_cq;
- qp->xrcd = NULL;
-
- atomic_inc(&pd->usecnt);
- atomic_inc(&qp_init_attr->send_cq->usecnt);
- }
+ if (IS_ERR(qp))
+ return qp;
+
+ qp->device = device;
+ qp->real_qp = qp;
+ qp->uobject = NULL;
+ qp->qp_type = qp_init_attr->qp_type;
+
+ atomic_set(&qp->usecnt, 0);
+ if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
+ return ib_create_xrc_qp(qp, qp_init_attr);
+
+ qp->event_handler = qp_init_attr->event_handler;
+ qp->qp_context = qp_init_attr->qp_context;
+ if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
+ qp->recv_cq = NULL;
+ qp->srq = NULL;
+ } else {
+ qp->recv_cq = qp_init_attr->recv_cq;
+ atomic_inc(&qp_init_attr->recv_cq->usecnt);
+ qp->srq = qp_init_attr->srq;
+ if (qp->srq)
+ atomic_inc(&qp_init_attr->srq->usecnt);
}
+ qp->pd = pd;
+ qp->send_cq = qp_init_attr->send_cq;
+ qp->xrcd = NULL;
+
+ atomic_inc(&pd->usecnt);
+ atomic_inc(&qp_init_attr->send_cq->usecnt);
return qp;
}
EXPORT_SYMBOL(ib_create_qp);
--
2.1.4
* [PATCH 5/9] IB/core: add a simple MR pool
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
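A sketch of the intended usage pattern; the free_mrs list, the pool size and
max_pages_per_mr are illustrative, and patch 7 wires this up for real:

    LIST_HEAD(free_mrs);

    ret = ib_mr_pool_init(qp, &free_mrs, 128, IB_MR_TYPE_MEM_REG,
                max_pages_per_mr);
    if (ret)
        return ret;

    mr = ib_mr_pool_get(qp, &free_mrs);    /* NULL if the pool is empty */
    /* ... use mr for a registration ... */
    ib_mr_pool_put(qp, &free_mrs, mr);

    ib_mr_pool_destroy(qp, &free_mrs);    /* before destroying the QP */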
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/infiniband/core/Makefile | 2 +-
drivers/infiniband/core/mr_pool.c | 84 +++++++++++++++++++++++++++++++++++++++
drivers/infiniband/core/verbs.c | 5 +++
include/rdma/ib_verbs.h | 9 ++++-
include/rdma/mr_pool.h | 25 ++++++++++++
5 files changed, 123 insertions(+), 2 deletions(-)
create mode 100644 drivers/infiniband/core/mr_pool.c
create mode 100644 include/rdma/mr_pool.h
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index f818538..48bd9d8 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
ib_core-y := packer.o ud_header.o verbs.o cq.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
- roce_gid_mgmt.o
+ roce_gid_mgmt.o mr_pool.o
ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
diff --git a/drivers/infiniband/core/mr_pool.c b/drivers/infiniband/core/mr_pool.c
new file mode 100644
index 0000000..b1eb27a
--- /dev/null
+++ b/drivers/infiniband/core/mr_pool.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+#include <rdma/ib_verbs.h>
+#include <rdma/mr_pool.h>
+
+struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list)
+{
+ struct ib_mr *mr = NULL;
+ unsigned long flags;
+
+ spin_lock_irqsave(&qp->mr_lock, flags);
+ mr = list_first_entry_or_null(list, struct ib_mr, qp_entry);
+ if (mr)
+ qp->mrs_used++;
+ spin_unlock_irqrestore(&qp->mr_lock, flags);
+
+ return mr;
+}
+EXPORT_SYMBOL(ib_mr_pool_get);
+
+void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&qp->mr_lock, flags);
+ list_add(&mr->qp_entry, list);
+ qp->mrs_used--;
+ spin_unlock_irqrestore(&qp->mr_lock, flags);
+}
+EXPORT_SYMBOL(ib_mr_pool_put);
+
+int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
+ enum ib_mr_type type, u32 max_num_sg)
+{
+ struct ib_mr *mr;
+ unsigned long flags;
+ int ret, i;
+
+ for (i = 0; i < nr; i++) {
+ mr = ib_alloc_mr(qp->pd, type, max_num_sg);
+ if (IS_ERR(mr)) {
+ ret = PTR_ERR(mr);
+ goto out;
+ }
+
+ spin_lock_irqsave(&qp->mr_lock, flags);
+ list_add_tail(&mr->qp_entry, list);
+ spin_unlock_irqrestore(&qp->mr_lock, flags);
+ }
+
+ return 0;
+out:
+ ib_mr_pool_destroy(qp, list);
+ return ret;
+}
+EXPORT_SYMBOL(ib_mr_pool_init);
+
+void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list)
+{
+ struct ib_mr *mr;
+ unsigned long flags;
+
+ spin_lock_irqsave(&qp->mr_lock, flags);
+ while (!list_empty(list)) {
+ mr = list_first_entry(list, struct ib_mr, qp_entry);
+ list_del(&mr->qp_entry);
+
+ spin_unlock_irqrestore(&qp->mr_lock, flags);
+ ib_dereg_mr(mr);
+ spin_lock_irqsave(&qp->mr_lock, flags);
+ }
+ spin_unlock_irqrestore(&qp->mr_lock, flags);
+}
+EXPORT_SYMBOL(ib_mr_pool_destroy);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 662466f..20bb5d1 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -762,6 +762,9 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
qp->qp_type = qp_init_attr->qp_type;
atomic_set(&qp->usecnt, 0);
+ qp->mrs_used = 0;
+ spin_lock_init(&qp->mr_lock);
+
if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
return ib_create_xrc_qp(qp, qp_init_attr);
@@ -1255,6 +1258,8 @@ int ib_destroy_qp(struct ib_qp *qp)
struct ib_srq *srq;
int ret;
+ WARN_ON_ONCE(qp->mrs_used > 0);
+
if (atomic_read(&qp->usecnt))
return -EBUSY;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 267f11e..1e68dae 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1408,6 +1408,10 @@ struct ib_qp {
struct ib_srq *srq;
struct ib_xrcd *xrcd; /* XRC TGT QPs only */
struct list_head xrcd_list;
+
+ spinlock_t mr_lock;
+ int mrs_used;
+
/* count times opened, mcast attaches, flow attaches */
atomic_t usecnt;
struct list_head open_list;
@@ -1422,12 +1426,15 @@ struct ib_qp {
struct ib_mr {
struct ib_device *device;
struct ib_pd *pd;
- struct ib_uobject *uobject;
u32 lkey;
u32 rkey;
u64 iova;
u32 length;
unsigned int page_size;
+ union {
+ struct ib_uobject *uobject; /* user */
+ struct list_head qp_entry; /* FR */
+ };
};
struct ib_mw {
diff --git a/include/rdma/mr_pool.h b/include/rdma/mr_pool.h
new file mode 100644
index 0000000..986010b
--- /dev/null
+++ b/include/rdma/mr_pool.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+#ifndef _RDMA_MR_POOL_H
+#define _RDMA_MR_POOL_H 1
+
+#include <rdma/ib_verbs.h>
+
+struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list);
+void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr);
+
+int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
+ enum ib_mr_type type, u32 max_num_sg);
+void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list);
+
+#endif /* _RDMA_MR_POOL_H */
--
2.1.4
* [PATCH 6/9] IB/core: add a need_inval flag to struct ib_mr
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel, Steve Wise
From: Steve Wise <swise@chelsio.com>
This is the first step toward moving MR invalidation decisions
to the core. It will be needed by the upcoming RW API.
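Patch 7 uses the flag to decide whether a LOCAL_INV must be chained in front
of the next registration, and clears it when the wire invalidates the MR for
us (condensed from that patch):

    if (reg->mr->need_inval) {
        reg->inv_wr.opcode = IB_WR_LOCAL_INV;
        reg->inv_wr.ex.invalidate_rkey = reg->mr->lkey;
    }
    /* ... */
    if (reg->wr.wr.opcode == IB_WR_RDMA_READ_WITH_INV)
        reg->mr->need_inval = false;    /* remote end invalidates */
    else
        reg->mr->need_inval = true;    /* LOCAL_INV needed before reuse */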
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
include/rdma/ib_verbs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1e68dae..2b94cea 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1431,6 +1431,7 @@ struct ib_mr {
u64 iova;
u32 length;
unsigned int page_size;
+ bool need_inval;
union {
struct ib_uobject *uobject; /* user */
struct list_head qp_entry; /* FR */
--
2.1.4
* [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
This supports both manual mapping of lots of SGEs and using MRs from the
QP's MR pool, for iWarp or other cases where that performs better.
For now, MRs are only used for iWARP transports. The user of the RDMA-RW
API must allocate the QP MR pool as well as size the SQ accordingly.
Thanks to Steve Wise for testing, fixing and rewriting the iWarp support,
and to Sagi Grimberg for ideas, reviews and fixes.
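Setting up a QP for the API looks roughly like this (sketch; the values are
illustrative and the usual CQ/QP type setup is elided):

    struct ib_qp_init_attr attr = { };

    attr.cap.max_rdma_ctxs = 64;    /* rdma_rw_ctx's in flight at once */
    attr.port_num = port_num;       /* required for RW API users */
    qp = ib_create_qp(pd, &attr);
    /* ib_create_qp() grows the SQ and allocates the iWarp MR pool */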
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/infiniband/core/Makefile | 2 +-
drivers/infiniband/core/rw.c | 414 +++++++++++++++++++++++++++++++++++++++
drivers/infiniband/core/verbs.c | 25 +++
include/rdma/ib_verbs.h | 14 +-
include/rdma/rw.h | 69 +++++++
5 files changed, 522 insertions(+), 2 deletions(-)
create mode 100644 drivers/infiniband/core/rw.c
create mode 100644 include/rdma/rw.h
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 48bd9d8..26987d9 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
$(user_access-y)
-ib_core-y := packer.o ud_header.o verbs.o cq.o sysfs.o \
+ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
roce_gid_mgmt.o mr_pool.o
ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
new file mode 100644
index 0000000..e1cc1a9
--- /dev/null
+++ b/drivers/infiniband/core/rw.c
@@ -0,0 +1,414 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+#include <linux/slab.h>
+#include <rdma/mr_pool.h>
+#include <rdma/rw.h>
+
+/*
+ * Check if the device needs a memory registration. We currently always use
+ * memory registrations for iWarp, and never for IB and RoCE. In the future
+ * we can hopefully fine tune this based on HCA driver input.
+ */
+static inline bool rdma_rw_use_mr(struct ib_device *dev, u8 port_num)
+{
+ return rdma_protocol_iwarp(dev, port_num);
+}
+
+static inline u32 rdma_rw_max_sge(struct ib_device *dev,
+ enum dma_data_direction dir)
+{
+ return dir == DMA_TO_DEVICE ?
+ dev->attrs.max_sge : dev->attrs.max_sge_rd;
+}
+
+static int rdma_rw_init_mr_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+ u8 port_num, struct scatterlist *sg, u32 offset,
+ u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+{
+ int pages_per_mr = qp->pd->device->attrs.max_fast_reg_page_list_len;
+ int pages_left = ctx->dma_nents;
+ u32 va_offset = 0;
+ int i, ret = 0, count = 0;
+
+ ctx->nr_ops = (ctx->dma_nents + pages_per_mr - 1) / pages_per_mr;
+ ctx->reg = kcalloc(ctx->nr_ops, sizeof(*ctx->reg), GFP_KERNEL);
+ if (!ctx->reg) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ for (i = 0; i < ctx->nr_ops; i++) {
+ struct rdma_rw_reg_ctx *prev = i ? &ctx->reg[i - 1] : NULL;
+ struct rdma_rw_reg_ctx *reg = &ctx->reg[i];
+ int nents = min(pages_left, pages_per_mr);
+
+ reg->mr = ib_mr_pool_get(qp, &qp->rdma_mrs);
+ if (!reg->mr) {
+ pr_info("failed to allocate MR from pool\n");
+ ret = -EAGAIN;
+ goto out_free;
+ }
+
+ if (reg->mr->need_inval) {
+ reg->inv_wr.opcode = IB_WR_LOCAL_INV;
+ reg->inv_wr.ex.invalidate_rkey = reg->mr->lkey;
+ reg->inv_wr.next = &reg->reg_wr.wr;
+ if (prev)
+ prev->wr.wr.next = &reg->inv_wr;
+
+ count++;
+ } else if (prev) {
+ prev->wr.wr.next = &reg->reg_wr.wr;
+ }
+
+ ib_update_fast_reg_key(reg->mr, ib_inc_rkey(reg->mr->lkey));
+
+ ret = ib_map_mr_sg(reg->mr, sg, nents, offset,
+ PAGE_SIZE);
+ if (ret < nents) {
+ pr_info("failed to map MR\n");
+ ib_mr_pool_put(qp, &qp->rdma_mrs, reg->mr);
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ reg->reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg->reg_wr.mr = reg->mr;
+ reg->reg_wr.key = reg->mr->lkey;
+ reg->reg_wr.wr.next = &reg->wr.wr;
+ count++;
+
+ reg->reg_wr.access = IB_ACCESS_LOCAL_WRITE;
+ if (rdma_protocol_iwarp(qp->device, port_num))
+ reg->reg_wr.access |= IB_ACCESS_REMOTE_WRITE;
+
+ reg->sge.lkey = reg->mr->lkey;
+ reg->sge.addr = reg->mr->iova;
+ reg->sge.length = reg->mr->length;
+
+ reg->wr.wr.sg_list = &reg->sge;
+ reg->wr.wr.num_sge = 1;
+ reg->wr.remote_addr = remote_addr + va_offset;
+ reg->wr.rkey = rkey;
+ count++;
+
+ if (dir == DMA_FROM_DEVICE) {
+ if (rdma_has_read_invalidate(qp->device, port_num)) {
+ reg->wr.wr.opcode = IB_WR_RDMA_READ_WITH_INV;
+ reg->wr.wr.ex.invalidate_rkey = reg->mr->lkey;
+ reg->mr->need_inval = false;
+ } else {
+ reg->wr.wr.opcode = IB_WR_RDMA_READ;
+ reg->mr->need_inval = true;
+ }
+ } else {
+ reg->wr.wr.opcode = IB_WR_RDMA_WRITE;
+ reg->mr->need_inval = true;
+ }
+
+ va_offset += reg->sge.length;
+ pages_left -= nents;
+ sg = sg_next(sg);
+ offset = 0;
+ }
+
+ return count;
+
+out_free:
+ while (--i >= 0)
+ ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->reg[i].mr);
+ kfree(ctx->reg);
+out:
+ return ret;
+}
+
+static int rdma_rw_init_map_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+ struct scatterlist *sg, u32 offset, u64 remote_addr, u32 rkey,
+ enum dma_data_direction dir)
+{
+ struct ib_device *dev = qp->pd->device;
+ u32 max_sge = rdma_rw_max_sge(dev, dir);
+ u32 sge_left = ctx->dma_nents;
+ struct ib_sge *sge;
+ u32 total_len = 0, i, j;
+
+ ctx->nr_ops = DIV_ROUND_UP(ctx->dma_nents, max_sge);
+
+ ctx->map.sges = sge = kcalloc(ctx->dma_nents, sizeof(*sge), GFP_KERNEL);
+ if (!ctx->map.sges)
+ goto out;
+
+ ctx->map.wrs = kcalloc(ctx->nr_ops, sizeof(*ctx->map.wrs), GFP_KERNEL);
+ if (!ctx->map.wrs)
+ goto out_free_sges;
+
+ for (i = 0; i < ctx->nr_ops; i++) {
+ struct ib_rdma_wr *rdma_wr = &ctx->map.wrs[i];
+ u32 nr_sge = min(sge_left, max_sge);
+
+ if (dir == DMA_TO_DEVICE)
+ rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
+ else
+ rdma_wr->wr.opcode = IB_WR_RDMA_READ;
+ rdma_wr->remote_addr = remote_addr + total_len;
+ rdma_wr->rkey = rkey;
+ rdma_wr->wr.sg_list = sge;
+
+ for (j = 0; j < nr_sge; j++, sg = sg_next(sg)) {
+ BUG_ON(!sg);
+ rdma_wr->wr.num_sge++;
+
+ sge->addr = ib_sg_dma_address(dev, sg) + offset;
+ sge->length = ib_sg_dma_len(dev, sg) - offset;
+ sge->lkey = qp->pd->local_dma_lkey;
+
+ total_len += sge->length;
+ sge++;
+ sge_left--;
+ offset = 0;
+ }
+
+ if (i + 1 != ctx->nr_ops)
+ rdma_wr->wr.next = &ctx->map.wrs[i + 1].wr;
+ }
+
+ return ctx->nr_ops;
+
+out_free_sges:
+ kfree(ctx->map.sges);
+out:
+ return -ENOMEM;
+}
+
+static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+ struct scatterlist *sg, u32 offset, u64 remote_addr, u32 rkey,
+ enum dma_data_direction dir)
+{
+ struct ib_device *dev = qp->pd->device;
+ struct ib_rdma_wr *rdma_wr = &ctx->single.wr;
+
+ ctx->nr_ops = 1;
+
+ ctx->single.sge.lkey = qp->pd->local_dma_lkey;
+ ctx->single.sge.addr = ib_sg_dma_address(dev, sg) + offset;
+ ctx->single.sge.length = ib_sg_dma_len(dev, sg) - offset;
+
+ memset(rdma_wr, 0, sizeof(*rdma_wr));
+ if (dir == DMA_TO_DEVICE)
+ rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
+ else
+ rdma_wr->wr.opcode = IB_WR_RDMA_READ;
+ rdma_wr->wr.sg_list = &ctx->single.sge;
+ rdma_wr->wr.num_sge = 1;
+ rdma_wr->remote_addr = remote_addr;
+ rdma_wr->rkey = rkey;
+
+ return 1;
+}
+
+/**
+ * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
+ * @ctx: context to initialize
+ * @qp: queue pair to operate on
+ * @port_num: port num to which the connection is bound
+ * @sg: scatterlist to READ/WRITE from/to
+ * @sg_cnt: number of entries in @sg
+ * @sg_offset: current byte offset into @sg
+ * @length: total length of @sg in bytes
+ * @remote_addr: remote address to read/write (relative to @rkey)
+ * @rkey: remote key to operate on
+ * @dir: %DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
+ *
+ * If we're going to use an FR to map this context, @max_nents should be
+ * smaller than or equal to the MR size.
+ *
+ * Returns the number of WQEs that will be needed on the send queue if
+ * successful, or a negative error code.
+ */
+int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+ struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
+ u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+{
+ struct ib_device *dev = qp->pd->device;
+ int ret;
+
+ ctx->dma_nents = ib_dma_map_sg(dev, sg, sg_cnt, dir);
+ if (!ctx->dma_nents)
+ return -ENOMEM;
+
+ /*
+ * Skip to the S/G entry that sg_offset falls into:
+ */
+ for (; sg; sg = sg_next(sg)) {
+ u32 len = ib_sg_dma_len(dev, sg);
+
+ if (sg_offset < len)
+ break;
+
+ sg_offset -= len;
+ ctx->dma_nents--;
+ }
+
+ if (rdma_rw_use_mr(qp->device, port_num)) {
+ ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_offset,
+ remote_addr, rkey, dir);
+ } else if (ctx->dma_nents > 1) {
+ ret = rdma_rw_init_map_wrs(ctx, qp, sg, sg_offset,
+ remote_addr, rkey, dir);
+ } else {
+ ret = rdma_rw_init_single_wr(ctx, qp, sg, sg_offset,
+ remote_addr, rkey, dir);
+ }
+
+ if (ret < 0)
+ goto out_unmap_sg;
+ return ret;
+
+out_unmap_sg:
+ ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_rw_ctx_init);
+
+/**
+ * rdma_rw_ctx_destroy - release all resources allocated by rdma_rw_ctx_init
+ * @ctx: context to release
+ * @qp: queue pair to operate on
+ * @port_num: port num to which the connection is bound
+ * @sg: scatterlist that was used for the READ/WRITE
+ * @sg_cnt: number of entries in @sg
+ * @dir: %DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
+ */
+void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+ struct scatterlist *sg, u32 sg_cnt, enum dma_data_direction dir)
+{
+ if (rdma_rw_use_mr(qp->device, port_num)) {
+ int i;
+
+ for (i = 0; i < ctx->nr_ops; i++)
+ ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->reg[i].mr);
+ kfree(ctx->reg);
+ } else if (ctx->dma_nents > 1) {
+ kfree(ctx->map.wrs);
+ kfree(ctx->map.sges);
+ }
+
+ ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+}
+EXPORT_SYMBOL(rdma_rw_ctx_destroy);
+
+/**
+ * rdma_rw_ctx_wrs - return chain of WRs for a RDMA READ or WRITE operation
+ * @ctx: context to operate on
+ * @qp: queue pair to operate on
+ * @port_num: port num to which the connection is bound
+ * @cqe: completion queue entry for the last WR
+ * @chain_wr: WR to append to the posted chain
+ *
+ * Return the WR chain for the set of RDMA READ/WRITE operations described by
+ * @ctx, as well as any memory registration operations needed. If @chain_wr
+ * is non-NULL the WR it points to will be appended to the chain of WRs posted.
+ * If @chain_wr is not set @cqe must be set so that the caller gets a
+ * completion notification.
+ */
+struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+ u8 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
+{
+ struct ib_send_wr *first_wr, *last_wr;
+
+ if (rdma_rw_use_mr(qp->device, port_num)) {
+ if (ctx->reg[0].inv_wr.next)
+ first_wr = &ctx->reg[0].inv_wr;
+ else
+ first_wr = &ctx->reg[0].reg_wr.wr;
+ last_wr = &ctx->reg[ctx->nr_ops - 1].wr.wr;
+ } else if (ctx->dma_nents > 1) {
+ first_wr = &ctx->map.wrs[0].wr;
+ last_wr = &ctx->map.wrs[ctx->nr_ops - 1].wr;
+ } else {
+ first_wr = &ctx->single.wr.wr;
+ last_wr = &ctx->single.wr.wr;
+ }
+
+ if (chain_wr) {
+ last_wr->next = chain_wr;
+ } else {
+ last_wr->wr_cqe = cqe;
+ last_wr->send_flags |= IB_SEND_SIGNALED;
+ }
+
+ return first_wr;
+}
+EXPORT_SYMBOL(rdma_rw_ctx_wrs);
+
+/**
+ * rdma_rw_ctx_post - post a RDMA READ or RDMA WRITE operation
+ * @ctx: context to operate on
+ * @qp: queue pair to operate on
+ * @port_num: port num to which the connection is bound
+ * @cqe: completion queue entry for the last WR
+ * @chain_wr: WR to append to the posted chain
+ *
+ * Post the set of RDMA READ/WRITE operations described by @ctx, as well as
+ * any memory registration operations needed. If @chain_wr is non-NULL the
+ * WR it points to will be appended to the chain of WRs posted. If @chain_wr
+ * is not set @cqe must be set so that the caller gets a completion
+ * notification.
+ */
+int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+ struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
+{
+ struct ib_send_wr *first_wr, *bad_wr;
+
+ first_wr = rdma_rw_ctx_wrs(ctx, qp, port_num, cqe, chain_wr);
+ return ib_post_send(qp, first_wr, &bad_wr);
+}
+EXPORT_SYMBOL(rdma_rw_ctx_post);
+
+void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
+{
+ /*
+ * Each context needs at least one RDMA READ or WRITE WR.
+ *
+ * For some hardware we might need more, eventually we should ask the
+ * HCA driver for a multiplier here.
+ */
+ attr->cap.max_send_wr += attr->cap.max_rdma_ctxs;
+
+ /*
+ * If the device needs MRs to perform RDMA READ or WRITE operations,
+ * we'll need two additional MRs for the registrations and the
+ * invalidation.
+ */
+ if (rdma_rw_use_mr(dev, attr->port_num))
+ attr->cap.max_send_wr += 2 * attr->cap.max_rdma_ctxs;
+}
+
+int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr)
+{
+ struct ib_device *dev = qp->pd->device;
+ int ret = 0;
+
+ if (rdma_rw_use_mr(dev, attr->port_num)) {
+ ret = ib_mr_pool_init(qp, &qp->rdma_mrs,
+ attr->cap.max_rdma_ctxs, IB_MR_TYPE_MEM_REG,
+ dev->attrs.max_fast_reg_page_list_len);
+ }
+
+ return ret;
+}
+
+void rdma_rw_cleanup_mrs(struct ib_qp *qp)
+{
+ ib_mr_pool_destroy(qp, &qp->rdma_mrs);
+}
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 20bb5d1..686f9c2 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -48,6 +48,7 @@
#include <rdma/ib_verbs.h>
#include <rdma/ib_cache.h>
#include <rdma/ib_addr.h>
+#include <rdma/rw.h>
#include "core_priv.h"
@@ -751,6 +752,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
{
struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
struct ib_qp *qp;
+ int ret;
+
+ /*
+ * If the caller is using the RDMA API, calculate the resources
+ * needed for the RDMA READ/WRITE operations.
+ *
+ * Note that these callers need to pass in a port number.
+ */
+ if (qp_init_attr->cap.max_rdma_ctxs)
+ rdma_rw_init_qp(device, qp_init_attr);
qp = device->create_qp(pd, qp_init_attr, NULL);
if (IS_ERR(qp))
@@ -764,6 +775,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
atomic_set(&qp->usecnt, 0);
qp->mrs_used = 0;
spin_lock_init(&qp->mr_lock);
+ INIT_LIST_HEAD(&qp->rdma_mrs);
if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
return ib_create_xrc_qp(qp, qp_init_attr);
@@ -787,6 +799,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
atomic_inc(&pd->usecnt);
atomic_inc(&qp_init_attr->send_cq->usecnt);
+
+ if (qp_init_attr->cap.max_rdma_ctxs) {
+ ret = rdma_rw_init_mrs(qp, qp_init_attr);
+ if (ret) {
+ pr_err("failed to init MR pool ret= %d\n", ret);
+ ib_destroy_qp(qp);
+ qp = ERR_PTR(ret);
+ }
+ }
+
return qp;
}
EXPORT_SYMBOL(ib_create_qp);
@@ -1271,6 +1293,9 @@ int ib_destroy_qp(struct ib_qp *qp)
rcq = qp->recv_cq;
srq = qp->srq;
+ if (!qp->uobject)
+ rdma_rw_cleanup_mrs(qp);
+
ret = qp->device->destroy_qp(qp);
if (!ret) {
if (pd)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 2b94cea..035585a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -915,6 +915,13 @@ struct ib_qp_cap {
u32 max_send_sge;
u32 max_recv_sge;
u32 max_inline_data;
+
+ /*
+ * Maximum number of rdma_rw_ctx structures in flight at a time.
+ * ib_create_qp() will calculate the right number of WRs
+ * and MRs based on this.
+ */
+ u32 max_rdma_ctxs;
};
enum ib_sig_type {
@@ -986,7 +993,11 @@ struct ib_qp_init_attr {
enum ib_sig_type sq_sig_type;
enum ib_qp_type qp_type;
enum ib_qp_create_flags create_flags;
- u8 port_num; /* special QP types only */
+
+ /*
+ * Only needed for special QP types, or when using the RW API.
+ */
+ u8 port_num;
};
struct ib_qp_open_attr {
@@ -1410,6 +1421,7 @@ struct ib_qp {
struct list_head xrcd_list;
spinlock_t mr_lock;
+ struct list_head rdma_mrs;
int mrs_used;
/* count times opened, mcast attaches, flow attaches */
diff --git a/include/rdma/rw.h b/include/rdma/rw.h
new file mode 100644
index 0000000..57ea304
--- /dev/null
+++ b/include/rdma/rw.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+#ifndef _RDMA_RW_H
+#define _RDMA_RW_H
+
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/mr_pool.h>
+
+struct rdma_rw_ctx {
+ /* number of SGL entries returned by dma_map_sg */
+ u32 dma_nents;
+
+ /* number of RDMA READ/WRITE WRs (not counting MR WRs) */
+ u32 nr_ops;
+
+ union {
+ /* for mapping a single SGE: */
+ struct {
+ struct ib_sge sge;
+ struct ib_rdma_wr wr;
+ } single;
+
+ /* for mapping of multiple SGEs: */
+ struct {
+ struct ib_sge *sges;
+ struct ib_rdma_wr *wrs;
+ } map;
+
+ /* for registering multiple WRs: */
+ struct rdma_rw_reg_ctx {
+ struct ib_sge sge;
+ struct ib_rdma_wr wr;
+ struct ib_reg_wr reg_wr;
+ struct ib_send_wr inv_wr;
+ struct ib_mr *mr;
+ } *reg;
+ };
+};
+
+int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+ struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
+ u64 remote_addr, u32 rkey, enum dma_data_direction dir);
+void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+ struct scatterlist *sg, u32 sg_cnt,
+ enum dma_data_direction dir);
+
+struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+ u8 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
+int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+ struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
+
+void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr);
+int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr);
+void rdma_rw_cleanup_mrs(struct ib_qp *qp);
+
+#endif /* _RDMA_RW_H */
--
2.1.4
* [PATCH 8/9] target: enhance and export target_alloc_sgl/target_free_sgl
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
The SRP target driver will need to allocate and chain its own SGLs soon.
For this export target_alloc_sgl, and add a new argument to it so that it
can allocate an additional chain entry that doesn't point to a page. Also
export transport_free_sgl after renaming it to target_free_sgl to free
these SGLs again.
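A sketch of what a caller of the chainable variant will do (next_sgl is
illustrative):

    /* allocate nents + 1 entries; the extra slot becomes the chain link */
    ret = target_alloc_sgl(&sg, &nents, length, false, true);
    if (ret < 0)
        return ret;
    sg_chain(sg, nents + 1, next_sgl);    /* link to the following SGL */
    /* ... */
    target_free_sgl(sg, nents);    /* frees the nents page-backed entries */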
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/target/target_core_transport.c | 32 ++++++++++++++++++--------------
drivers/target/target_core_xcopy.c | 2 +-
include/target/target_core_backend.h | 1 -
include/target/target_core_fabric.h | 4 ++++
4 files changed, 23 insertions(+), 16 deletions(-)
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 867bc6d..bb32da7c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2131,7 +2131,7 @@ queue_full:
transport_handle_queue_full(cmd, cmd->se_dev);
}
-static inline void transport_free_sgl(struct scatterlist *sgl, int nents)
+void target_free_sgl(struct scatterlist *sgl, int nents)
{
struct scatterlist *sg;
int count;
@@ -2141,6 +2141,7 @@ static inline void transport_free_sgl(struct scatterlist *sgl, int nents)
kfree(sgl);
}
+EXPORT_SYMBOL(target_free_sgl);
static inline void transport_reset_sgl_orig(struct se_cmd *cmd)
{
@@ -2161,7 +2162,7 @@ static inline void transport_reset_sgl_orig(struct se_cmd *cmd)
static inline void transport_free_pages(struct se_cmd *cmd)
{
if (!(cmd->se_cmd_flags & SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC)) {
- transport_free_sgl(cmd->t_prot_sg, cmd->t_prot_nents);
+ target_free_sgl(cmd->t_prot_sg, cmd->t_prot_nents);
cmd->t_prot_sg = NULL;
cmd->t_prot_nents = 0;
}
@@ -2172,7 +2173,7 @@ static inline void transport_free_pages(struct se_cmd *cmd)
* SG_TO_MEM_NOALLOC to function with COMPARE_AND_WRITE
*/
if (cmd->se_cmd_flags & SCF_COMPARE_AND_WRITE) {
- transport_free_sgl(cmd->t_bidi_data_sg,
+ target_free_sgl(cmd->t_bidi_data_sg,
cmd->t_bidi_data_nents);
cmd->t_bidi_data_sg = NULL;
cmd->t_bidi_data_nents = 0;
@@ -2182,11 +2183,11 @@ static inline void transport_free_pages(struct se_cmd *cmd)
}
transport_reset_sgl_orig(cmd);
- transport_free_sgl(cmd->t_data_sg, cmd->t_data_nents);
+ target_free_sgl(cmd->t_data_sg, cmd->t_data_nents);
cmd->t_data_sg = NULL;
cmd->t_data_nents = 0;
- transport_free_sgl(cmd->t_bidi_data_sg, cmd->t_bidi_data_nents);
+ target_free_sgl(cmd->t_bidi_data_sg, cmd->t_bidi_data_nents);
cmd->t_bidi_data_sg = NULL;
cmd->t_bidi_data_nents = 0;
}
@@ -2260,20 +2261,22 @@ EXPORT_SYMBOL(transport_kunmap_data_sg);
int
target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length,
- bool zero_page)
+ bool zero_page, bool chainable)
{
struct scatterlist *sg;
struct page *page;
gfp_t zero_flag = (zero_page) ? __GFP_ZERO : 0;
- unsigned int nent;
+ unsigned int nalloc, nent;
int i = 0;
- nent = DIV_ROUND_UP(length, PAGE_SIZE);
- sg = kmalloc(sizeof(struct scatterlist) * nent, GFP_KERNEL);
+ nalloc = nent = DIV_ROUND_UP(length, PAGE_SIZE);
+ if (chainable)
+ nalloc++;
+ sg = kmalloc_array(nalloc, sizeof(struct scatterlist), GFP_KERNEL);
if (!sg)
return -ENOMEM;
- sg_init_table(sg, nent);
+ sg_init_table(sg, nalloc);
while (length) {
u32 page_len = min_t(u32, length, PAGE_SIZE);
@@ -2297,6 +2300,7 @@ out:
kfree(sg);
return -ENOMEM;
}
+EXPORT_SYMBOL(target_alloc_sgl);
/*
* Allocate any required resources to execute the command. For writes we
@@ -2312,7 +2316,7 @@ transport_generic_new_cmd(struct se_cmd *cmd)
if (cmd->prot_op != TARGET_PROT_NORMAL &&
!(cmd->se_cmd_flags & SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC)) {
ret = target_alloc_sgl(&cmd->t_prot_sg, &cmd->t_prot_nents,
- cmd->prot_length, true);
+ cmd->prot_length, true, false);
if (ret < 0)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
}
@@ -2337,13 +2341,13 @@ transport_generic_new_cmd(struct se_cmd *cmd)
ret = target_alloc_sgl(&cmd->t_bidi_data_sg,
&cmd->t_bidi_data_nents,
- bidi_length, zero_flag);
+ bidi_length, zero_flag, false);
if (ret < 0)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
}
ret = target_alloc_sgl(&cmd->t_data_sg, &cmd->t_data_nents,
- cmd->data_length, zero_flag);
+ cmd->data_length, zero_flag, false);
if (ret < 0)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
} else if ((cmd->se_cmd_flags & SCF_COMPARE_AND_WRITE) &&
@@ -2357,7 +2361,7 @@ transport_generic_new_cmd(struct se_cmd *cmd)
ret = target_alloc_sgl(&cmd->t_bidi_data_sg,
&cmd->t_bidi_data_nents,
- caw_length, zero_flag);
+ caw_length, zero_flag, false);
if (ret < 0)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
}
diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
index 47fe94e..75cd854 100644
--- a/drivers/target/target_core_xcopy.c
+++ b/drivers/target/target_core_xcopy.c
@@ -563,7 +563,7 @@ static int target_xcopy_setup_pt_cmd(
if (alloc_mem) {
rc = target_alloc_sgl(&cmd->t_data_sg, &cmd->t_data_nents,
- cmd->data_length, false);
+ cmd->data_length, false, false);
if (rc < 0) {
ret = rc;
goto out;
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index 28ee5c2..d8ab510 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -85,7 +85,6 @@ extern struct configfs_attribute *passthrough_attrib_attrs[];
void *transport_kmap_data_sg(struct se_cmd *);
void transport_kunmap_data_sg(struct se_cmd *);
/* core helpers also used by xcopy during internal command setup */
-int target_alloc_sgl(struct scatterlist **, unsigned int *, u32, bool);
sense_reason_t transport_generic_map_mem_to_cmd(struct se_cmd *,
struct scatterlist *, u32, struct scatterlist *, u32);
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index 5665340..fd6bbd2 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -179,6 +179,10 @@ int core_tpg_set_initiator_node_tag(struct se_portal_group *,
int core_tpg_register(struct se_wwn *, struct se_portal_group *, int);
int core_tpg_deregister(struct se_portal_group *);
+int target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents,
+ u32 length, bool zero_page, bool chainable);
+void target_free_sgl(struct scatterlist *sgl, int nents);
+
/*
* The LIO target core uses DMA_TO_DEVICE to mean that data is going
* to the target (eg handling a WRITE) and DMA_FROM_DEVICE to mean
--
2.1.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
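The chainable flag is easiest to see in use. A sketch, mirroring what the
srpt conversion below does in srpt_alloc_rw_ctxs() (the lengths len1/len2
and the free_first error label are hypothetical): the spare entry that
chainable=true reserves at the tail of the first list is what sg_chain()
turns into the link to the second list.

	struct scatterlist *first, *second;
	unsigned int first_nents, second_nents;
	int ret;

	/* reserve one extra, initially unused entry at the end of @first */
	ret = target_alloc_sgl(&first, &first_nents, len1, false, true);
	if (ret < 0)
		return ret;

	/* the last list in a chain does not need the spare entry */
	ret = target_alloc_sgl(&second, &second_nents, len2, false, false);
	if (ret < 0)
		goto free_first;

	/* turn the spare entry of @first into a chain link to @second */
	sg_unmark_end(&first[first_nents - 1]);
	sg_chain(first, first_nents + 1, second);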
* [PATCH 9/9] IB/srpt: convert to the generic RDMA READ/WRITE API
2016-02-29 22:20 generic RDMA READ/WRITE API V2 Christoph Hellwig
` (5 preceding siblings ...)
2016-02-29 22:20 ` [PATCH 8/9] target: enhance and export target_alloc_sgl/target_free_sgl Christoph Hellwig
@ 2016-02-29 22:20 ` Christoph Hellwig
2016-03-01 21:38 ` Bart Van Assche
6 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2016-02-29 22:20 UTC (permalink / raw)
To: linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
Replace the homegrown RDMA READ/WRITE code in srpt with the generic API.
The only real twist here is that we need to allocate one Linux scatterlist
per direct buffer in the SRP command, and chain them before handing them
off to the target core.
As a side-effect of the conversion the driver will also chain the SEND
of the SRP response to the RDMA WRITE WRs for a DATA-IN command, and
properly account for RDMA WRITE WRs instead of just for RDMA READ WRs
like the driver previously did.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/infiniband/ulp/srpt/ib_srpt.c | 693 +++++++++++-----------------------
drivers/infiniband/ulp/srpt/ib_srpt.h | 30 +-
2 files changed, 242 insertions(+), 481 deletions(-)
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 25bdaee..65b5e3a 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -765,52 +765,6 @@ static int srpt_post_recv(struct srpt_device *sdev,
}
/**
- * srpt_post_send() - Post an IB send request.
- *
- * Returns zero upon success and a non-zero value upon failure.
- */
-static int srpt_post_send(struct srpt_rdma_ch *ch,
- struct srpt_send_ioctx *ioctx, int len)
-{
- struct ib_sge list;
- struct ib_send_wr wr, *bad_wr;
- struct srpt_device *sdev = ch->sport->sdev;
- int ret;
-
- atomic_inc(&ch->req_lim);
-
- ret = -ENOMEM;
- if (unlikely(atomic_dec_return(&ch->sq_wr_avail) < 0)) {
- pr_warn("IB send queue full (needed 1)\n");
- goto out;
- }
-
- ib_dma_sync_single_for_device(sdev->device, ioctx->ioctx.dma, len,
- DMA_TO_DEVICE);
-
- list.addr = ioctx->ioctx.dma;
- list.length = len;
- list.lkey = sdev->pd->local_dma_lkey;
-
- ioctx->ioctx.cqe.done = srpt_send_done;
- wr.next = NULL;
- wr.wr_cqe = &ioctx->ioctx.cqe;
- wr.sg_list = &list;
- wr.num_sge = 1;
- wr.opcode = IB_WR_SEND;
- wr.send_flags = IB_SEND_SIGNALED;
-
- ret = ib_post_send(ch->qp, &wr, &bad_wr);
-
-out:
- if (ret < 0) {
- atomic_inc(&ch->sq_wr_avail);
- atomic_dec(&ch->req_lim);
- }
- return ret;
-}
-
-/**
* srpt_zerolength_write() - Perform a zero-length RDMA write.
*
* A quote from the InfiniBand specification: C9-88: For an HCA responder
@@ -843,6 +797,110 @@ static void srpt_zerolength_write_done(struct ib_cq *cq, struct ib_wc *wc)
}
}
+static int srpt_alloc_rw_ctxs(struct srpt_send_ioctx *ioctx,
+ struct srp_direct_buf *db, int nbufs, struct scatterlist **sg,
+ unsigned *sg_cnt)
+{
+ enum dma_data_direction dir = target_reverse_dma_direction(&ioctx->cmd);
+ struct srpt_rdma_ch *ch = ioctx->ch;
+ struct scatterlist *prev = NULL;
+ unsigned prev_nents;
+ int ret, i;
+
+ if (nbufs == 1) {
+ ioctx->rw_ctxs = &ioctx->s_rw_ctx;
+ } else {
+ ioctx->rw_ctxs = kmalloc_array(nbufs, sizeof(*ioctx->rw_ctxs),
+ GFP_KERNEL);
+ if (!ioctx->rw_ctxs)
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < nbufs; i++, db++) {
+ struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+ u64 remote_addr = be64_to_cpu(db->va);
+ u32 size = be32_to_cpu(db->len);
+ u32 rkey = be32_to_cpu(db->key);
+
+ ret = target_alloc_sgl(&ctx->sg, &ctx->nents, size, false,
+ i == nbufs - 1 ? false : true);
+ if (ret)
+ goto unwind;
+
+ ret = rdma_rw_ctx_init(&ctx->rw, ch->qp, ch->sport->port,
+ ctx->sg, ctx->nents, 0, remote_addr, rkey, dir);
+ if (ret < 0) {
+ target_free_sgl(ctx->sg, ctx->nents);
+ goto unwind;
+ }
+
+ ioctx->n_rdma += ret;
+
+ if (prev) {
+ sg_unmark_end(&prev[prev_nents - 1]);
+ sg_chain(prev, prev_nents + 1, ctx->sg);
+ } else {
+ *sg = ctx->sg;
+ }
+
+ prev = ctx->sg;
+ prev_nents = ctx->nents;
+
+ *sg_cnt += ctx->nents;
+ }
+
+ ioctx->n_rw_ctx = nbufs;
+ return 0;
+
+unwind:
+ while (--i >= 0) {
+ struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+
+ rdma_rw_ctx_destroy(&ctx->rw, ch->qp, ch->sport->port,
+ ctx->sg, ctx->nents, dir);
+ target_free_sgl(ctx->sg, ctx->nents);
+ }
+ if (ioctx->rw_ctxs && ioctx->rw_ctxs != &ioctx->s_rw_ctx)
+ kfree(ioctx->rw_ctxs);
+ return ret;
+}
+
+static void srpt_free_rw_ctxs(struct srpt_rdma_ch *ch,
+ struct srpt_send_ioctx *ioctx)
+{
+ enum dma_data_direction dir = target_reverse_dma_direction(&ioctx->cmd);
+ int i;
+
+ for (i = 0; i < ioctx->n_rw_ctx; i++) {
+ struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+
+ rdma_rw_ctx_destroy(&ctx->rw, ch->qp, ch->sport->port,
+ ctx->sg, ctx->nents, dir);
+ target_free_sgl(ctx->sg, ctx->nents);
+ }
+
+ if (ioctx->rw_ctxs && ioctx->rw_ctxs != &ioctx->s_rw_ctx)
+ kfree(ioctx->rw_ctxs);
+}
+
+static inline void *srpt_get_desc_buf(struct srp_cmd *srp_cmd)
+{
+ /*
+ * The pointer computations below will only be compiled correctly
+ * if srp_cmd::add_data is declared as s8*, u8*, s8[] or u8[], so check
+ * whether srp_cmd::add_data has been declared as a byte pointer.
+ */
+ BUILD_BUG_ON(!__same_type(srp_cmd->add_data[0], (s8)0)
+ && !__same_type(srp_cmd->add_data[0], (u8)0));
+
+ /*
+ * According to the SRP spec, the lower two bits of the 'ADDITIONAL
+ * CDB LENGTH' field are reserved and the size in bytes of this field
+ * is four times the value specified in bits 3..7. Hence the "& ~3".
+ */
+ return srp_cmd->add_data + (srp_cmd->add_cdb_len & ~3);
+}
+
/**
* srpt_get_desc_tbl() - Parse the data descriptors of an SRP_CMD request.
* @ioctx: Pointer to the I/O context associated with the request.
@@ -858,94 +916,59 @@ static void srpt_zerolength_write_done(struct ib_cq *cq, struct ib_wc *wc)
* -ENOMEM when memory allocation fails and zero upon success.
*/
static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx,
- struct srp_cmd *srp_cmd,
- enum dma_data_direction *dir, u64 *data_len)
+ struct srp_cmd *srp_cmd, enum dma_data_direction *dir,
+ struct scatterlist **sg, unsigned *sg_cnt, u64 *data_len)
{
- struct srp_indirect_buf *idb;
- struct srp_direct_buf *db;
- unsigned add_cdb_offset;
- int ret;
-
- /*
- * The pointer computations below will only be compiled correctly
- * if srp_cmd::add_data is declared as s8*, u8*, s8[] or u8[], so check
- * whether srp_cmd::add_data has been declared as a byte pointer.
- */
- BUILD_BUG_ON(!__same_type(srp_cmd->add_data[0], (s8)0)
- && !__same_type(srp_cmd->add_data[0], (u8)0));
-
BUG_ON(!dir);
BUG_ON(!data_len);
- ret = 0;
- *data_len = 0;
-
/*
* The lower four bits of the buffer format field contain the DATA-IN
* buffer descriptor format, and the highest four bits contain the
* DATA-OUT buffer descriptor format.
*/
- *dir = DMA_NONE;
if (srp_cmd->buf_fmt & 0xf)
/* DATA-IN: transfer data from target to initiator (read). */
*dir = DMA_FROM_DEVICE;
else if (srp_cmd->buf_fmt >> 4)
/* DATA-OUT: transfer data from initiator to target (write). */
*dir = DMA_TO_DEVICE;
+ else
+ *dir = DMA_NONE;
+
+ /* initialize data_direction early as srpt_alloc_rw_ctxs needs it */
+ ioctx->cmd.data_direction = *dir;
- /*
- * According to the SRP spec, the lower two bits of the 'ADDITIONAL
- * CDB LENGTH' field are reserved and the size in bytes of this field
- * is four times the value specified in bits 3..7. Hence the "& ~3".
- */
- add_cdb_offset = srp_cmd->add_cdb_len & ~3;
if (((srp_cmd->buf_fmt & 0xf) == SRP_DATA_DESC_DIRECT) ||
((srp_cmd->buf_fmt >> 4) == SRP_DATA_DESC_DIRECT)) {
- ioctx->n_rbuf = 1;
- ioctx->rbufs = &ioctx->single_rbuf;
+ struct srp_direct_buf *db = srpt_get_desc_buf(srp_cmd);
- db = (struct srp_direct_buf *)(srp_cmd->add_data
- + add_cdb_offset);
- memcpy(ioctx->rbufs, db, sizeof(*db));
*data_len = be32_to_cpu(db->len);
+ return srpt_alloc_rw_ctxs(ioctx, db, 1, sg, sg_cnt);
} else if (((srp_cmd->buf_fmt & 0xf) == SRP_DATA_DESC_INDIRECT) ||
((srp_cmd->buf_fmt >> 4) == SRP_DATA_DESC_INDIRECT)) {
- idb = (struct srp_indirect_buf *)(srp_cmd->add_data
- + add_cdb_offset);
-
- ioctx->n_rbuf = be32_to_cpu(idb->table_desc.len) / sizeof(*db);
+ struct srp_indirect_buf *idb = srpt_get_desc_buf(srp_cmd);
+ int nbufs = be32_to_cpu(idb->table_desc.len) /
+ sizeof(struct srp_direct_buf);
- if (ioctx->n_rbuf >
+ if (nbufs >
(srp_cmd->data_out_desc_cnt + srp_cmd->data_in_desc_cnt)) {
pr_err("received unsupported SRP_CMD request"
" type (%u out + %u in != %u / %zu)\n",
srp_cmd->data_out_desc_cnt,
srp_cmd->data_in_desc_cnt,
be32_to_cpu(idb->table_desc.len),
- sizeof(*db));
- ioctx->n_rbuf = 0;
- ret = -EINVAL;
- goto out;
- }
-
- if (ioctx->n_rbuf == 1)
- ioctx->rbufs = &ioctx->single_rbuf;
- else {
- ioctx->rbufs =
- kmalloc(ioctx->n_rbuf * sizeof(*db), GFP_ATOMIC);
- if (!ioctx->rbufs) {
- ioctx->n_rbuf = 0;
- ret = -ENOMEM;
- goto out;
- }
+ sizeof(struct srp_direct_buf));
+ return -EINVAL;
}
- db = idb->desc_list;
- memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof(*db));
*data_len = be32_to_cpu(idb->len);
+ return srpt_alloc_rw_ctxs(ioctx, idb->desc_list, nbufs,
+ sg, sg_cnt);
+ } else {
+ *data_len = 0;
+ return 0;
}
-out:
- return ret;
}
/**
@@ -1049,217 +1072,6 @@ static int srpt_ch_qp_err(struct srpt_rdma_ch *ch)
}
/**
- * srpt_unmap_sg_to_ib_sge() - Unmap an IB SGE list.
- */
-static void srpt_unmap_sg_to_ib_sge(struct srpt_rdma_ch *ch,
- struct srpt_send_ioctx *ioctx)
-{
- struct scatterlist *sg;
- enum dma_data_direction dir;
-
- BUG_ON(!ch);
- BUG_ON(!ioctx);
- BUG_ON(ioctx->n_rdma && !ioctx->rdma_wrs);
-
- while (ioctx->n_rdma)
- kfree(ioctx->rdma_wrs[--ioctx->n_rdma].wr.sg_list);
-
- kfree(ioctx->rdma_wrs);
- ioctx->rdma_wrs = NULL;
-
- if (ioctx->mapped_sg_count) {
- sg = ioctx->sg;
- WARN_ON(!sg);
- dir = ioctx->cmd.data_direction;
- BUG_ON(dir == DMA_NONE);
- ib_dma_unmap_sg(ch->sport->sdev->device, sg, ioctx->sg_cnt,
- target_reverse_dma_direction(&ioctx->cmd));
- ioctx->mapped_sg_count = 0;
- }
-}
-
-/**
- * srpt_map_sg_to_ib_sge() - Map an SG list to an IB SGE list.
- */
-static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch,
- struct srpt_send_ioctx *ioctx)
-{
- struct ib_device *dev = ch->sport->sdev->device;
- struct se_cmd *cmd;
- struct scatterlist *sg, *sg_orig;
- int sg_cnt;
- enum dma_data_direction dir;
- struct ib_rdma_wr *riu;
- struct srp_direct_buf *db;
- dma_addr_t dma_addr;
- struct ib_sge *sge;
- u64 raddr;
- u32 rsize;
- u32 tsize;
- u32 dma_len;
- int count, nrdma;
- int i, j, k;
-
- BUG_ON(!ch);
- BUG_ON(!ioctx);
- cmd = &ioctx->cmd;
- dir = cmd->data_direction;
- BUG_ON(dir == DMA_NONE);
-
- ioctx->sg = sg = sg_orig = cmd->t_data_sg;
- ioctx->sg_cnt = sg_cnt = cmd->t_data_nents;
-
- count = ib_dma_map_sg(ch->sport->sdev->device, sg, sg_cnt,
- target_reverse_dma_direction(cmd));
- if (unlikely(!count))
- return -EAGAIN;
-
- ioctx->mapped_sg_count = count;
-
- if (ioctx->rdma_wrs && ioctx->n_rdma_wrs)
- nrdma = ioctx->n_rdma_wrs;
- else {
- nrdma = (count + SRPT_DEF_SG_PER_WQE - 1) / SRPT_DEF_SG_PER_WQE
- + ioctx->n_rbuf;
-
- ioctx->rdma_wrs = kcalloc(nrdma, sizeof(*ioctx->rdma_wrs),
- GFP_KERNEL);
- if (!ioctx->rdma_wrs)
- goto free_mem;
-
- ioctx->n_rdma_wrs = nrdma;
- }
-
- db = ioctx->rbufs;
- tsize = cmd->data_length;
- dma_len = ib_sg_dma_len(dev, &sg[0]);
- riu = ioctx->rdma_wrs;
-
- /*
- * For each remote desc - calculate the #ib_sge.
- * If #ib_sge < SRPT_DEF_SG_PER_WQE per rdma operation then
- * each remote desc rdma_iu is required a rdma wr;
- * else
- * we need to allocate extra rdma_iu to carry extra #ib_sge in
- * another rdma wr
- */
- for (i = 0, j = 0;
- j < count && i < ioctx->n_rbuf && tsize > 0; ++i, ++riu, ++db) {
- rsize = be32_to_cpu(db->len);
- raddr = be64_to_cpu(db->va);
- riu->remote_addr = raddr;
- riu->rkey = be32_to_cpu(db->key);
- riu->wr.num_sge = 0;
-
- /* calculate how many sge required for this remote_buf */
- while (rsize > 0 && tsize > 0) {
-
- if (rsize >= dma_len) {
- tsize -= dma_len;
- rsize -= dma_len;
- raddr += dma_len;
-
- if (tsize > 0) {
- ++j;
- if (j < count) {
- sg = sg_next(sg);
- dma_len = ib_sg_dma_len(
- dev, sg);
- }
- }
- } else {
- tsize -= rsize;
- dma_len -= rsize;
- rsize = 0;
- }
-
- ++riu->wr.num_sge;
-
- if (rsize > 0 &&
- riu->wr.num_sge == SRPT_DEF_SG_PER_WQE) {
- ++ioctx->n_rdma;
- riu->wr.sg_list = kmalloc_array(riu->wr.num_sge,
- sizeof(*riu->wr.sg_list),
- GFP_KERNEL);
- if (!riu->wr.sg_list)
- goto free_mem;
-
- ++riu;
- riu->wr.num_sge = 0;
- riu->remote_addr = raddr;
- riu->rkey = be32_to_cpu(db->key);
- }
- }
-
- ++ioctx->n_rdma;
- riu->wr.sg_list = kmalloc_array(riu->wr.num_sge,
- sizeof(*riu->wr.sg_list),
- GFP_KERNEL);
- if (!riu->wr.sg_list)
- goto free_mem;
- }
-
- db = ioctx->rbufs;
- tsize = cmd->data_length;
- riu = ioctx->rdma_wrs;
- sg = sg_orig;
- dma_len = ib_sg_dma_len(dev, &sg[0]);
- dma_addr = ib_sg_dma_address(dev, &sg[0]);
-
- /* this second loop is really mapped sg_addres to rdma_iu->ib_sge */
- for (i = 0, j = 0;
- j < count && i < ioctx->n_rbuf && tsize > 0; ++i, ++riu, ++db) {
- rsize = be32_to_cpu(db->len);
- sge = riu->wr.sg_list;
- k = 0;
-
- while (rsize > 0 && tsize > 0) {
- sge->addr = dma_addr;
- sge->lkey = ch->sport->sdev->pd->local_dma_lkey;
-
- if (rsize >= dma_len) {
- sge->length =
- (tsize < dma_len) ? tsize : dma_len;
- tsize -= dma_len;
- rsize -= dma_len;
-
- if (tsize > 0) {
- ++j;
- if (j < count) {
- sg = sg_next(sg);
- dma_len = ib_sg_dma_len(
- dev, sg);
- dma_addr = ib_sg_dma_address(
- dev, sg);
- }
- }
- } else {
- sge->length = (tsize < rsize) ? tsize : rsize;
- tsize -= rsize;
- dma_len -= rsize;
- dma_addr += rsize;
- rsize = 0;
- }
-
- ++k;
- if (k == riu->wr.num_sge && rsize > 0 && tsize > 0) {
- ++riu;
- sge = riu->wr.sg_list;
- k = 0;
- } else if (rsize > 0 && tsize > 0)
- ++sge;
- }
- }
-
- return 0;
-
-free_mem:
- srpt_unmap_sg_to_ib_sge(ch, ioctx);
-
- return -ENOMEM;
-}
-
-/**
* srpt_get_send_ioctx() - Obtain an I/O context for sending to the initiator.
*/
static struct srpt_send_ioctx *srpt_get_send_ioctx(struct srpt_rdma_ch *ch)
@@ -1284,12 +1096,8 @@ static struct srpt_send_ioctx *srpt_get_send_ioctx(struct srpt_rdma_ch *ch)
BUG_ON(ioctx->ch != ch);
spin_lock_init(&ioctx->spinlock);
ioctx->state = SRPT_STATE_NEW;
- ioctx->n_rbuf = 0;
- ioctx->rbufs = NULL;
+ ioctx->n_rw_ctx = 0;
ioctx->n_rdma = 0;
- ioctx->n_rdma_wrs = 0;
- ioctx->rdma_wrs = NULL;
- ioctx->mapped_sg_count = 0;
init_completion(&ioctx->tx_done);
ioctx->queue_status_only = false;
/*
@@ -1359,7 +1167,6 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx)
* SRP_RSP sending failed or the SRP_RSP send completion has
* not been received in time.
*/
- srpt_unmap_sg_to_ib_sge(ioctx->ch, ioctx);
transport_generic_free_cmd(&ioctx->cmd, 0);
break;
case SRPT_STATE_MGMT_RSP_SENT:
@@ -1387,6 +1194,7 @@ static void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
WARN_ON(ioctx->n_rdma <= 0);
atomic_add(ioctx->n_rdma, &ch->sq_wr_avail);
+ ioctx->n_rdma = 0;
if (unlikely(wc->status != IB_WC_SUCCESS)) {
pr_info("RDMA_READ for ioctx 0x%p failed with status %d\n",
@@ -1403,23 +1211,6 @@ static void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
__LINE__, srpt_get_cmd_state(ioctx));
}
-static void srpt_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
-{
- struct srpt_send_ioctx *ioctx =
- container_of(wc->wr_cqe, struct srpt_send_ioctx, rdma_cqe);
-
- if (unlikely(wc->status != IB_WC_SUCCESS)) {
- /*
- * Note: if an RDMA write error completion is received that
- * means that a SEND also has been posted. Defer further
- * processing of the associated command until the send error
- * completion has been received.
- */
- pr_info("RDMA_WRITE for ioctx 0x%p failed with status %d\n",
- ioctx, wc->status);
- }
-}
-
/**
* srpt_build_cmd_rsp() - Build an SRP_RSP response.
* @ch: RDMA channel through which the request has been received.
@@ -1537,6 +1328,8 @@ static void srpt_handle_cmd(struct srpt_rdma_ch *ch,
{
struct se_cmd *cmd;
struct srp_cmd *srp_cmd;
+ struct scatterlist *sg = NULL;
+ unsigned sg_cnt = 0;
u64 data_len;
enum dma_data_direction dir;
int rc;
@@ -1563,16 +1356,18 @@ static void srpt_handle_cmd(struct srpt_rdma_ch *ch,
break;
}
- if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &data_len)) {
+ if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &sg, &sg_cnt,
+ &data_len)) {
pr_err("0x%llx: parsing SRP descriptor table failed.\n",
srp_cmd->tag);
goto release_ioctx;
}
- rc = target_submit_cmd(cmd, ch->sess, srp_cmd->cdb,
+ rc = target_submit_cmd_map_sgls(cmd, ch->sess, srp_cmd->cdb,
&send_ioctx->sense_data[0],
scsilun_to_int(&srp_cmd->lun), data_len,
- TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF);
+ TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF,
+ sg, sg_cnt, NULL, 0, NULL, 0);
if (rc != 0) {
pr_debug("target_submit_cmd() returned %d for tag %#llx\n", rc,
srp_cmd->tag);
@@ -1779,14 +1574,13 @@ static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc)
WARN_ON(state != SRPT_STATE_CMD_RSP_SENT &&
state != SRPT_STATE_MGMT_RSP_SENT);
- atomic_inc(&ch->sq_wr_avail);
+ atomic_add(1 + ioctx->n_rdma, &ch->sq_wr_avail);
if (wc->status != IB_WC_SUCCESS)
pr_info("sending response for ioctx 0x%p failed"
" with status %d\n", ioctx, wc->status);
if (state != SRPT_STATE_DONE) {
- srpt_unmap_sg_to_ib_sge(ch, ioctx);
transport_generic_free_cmd(&ioctx->cmd, 0);
} else {
pr_err("IB completion has been received too late for"
@@ -1832,8 +1626,11 @@ retry:
qp_init->srq = sdev->srq;
qp_init->sq_sig_type = IB_SIGNAL_REQ_WR;
qp_init->qp_type = IB_QPT_RC;
- qp_init->cap.max_send_wr = srp_sq_size;
- qp_init->cap.max_send_sge = SRPT_DEF_SG_PER_WQE;
+ qp_init->cap.max_send_wr = srp_sq_size / 2;
+ qp_init->cap.max_rdma_ctxs = srp_sq_size / 2;
+ qp_init->cap.max_send_sge = max(sdev->device->attrs.max_sge_rd,
+ sdev->device->attrs.max_sge);
+ qp_init->port_num = ch->sport->port;
ch->qp = ib_create_qp(sdev->pd, qp_init);
if (IS_ERR(ch->qp)) {
@@ -2397,95 +2194,6 @@ static int srpt_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
return ret;
}
-/**
- * srpt_perform_rdmas() - Perform IB RDMA.
- *
- * Returns zero upon success or a negative number upon failure.
- */
-static int srpt_perform_rdmas(struct srpt_rdma_ch *ch,
- struct srpt_send_ioctx *ioctx)
-{
- struct ib_send_wr *bad_wr;
- int sq_wr_avail, ret, i;
- enum dma_data_direction dir;
- const int n_rdma = ioctx->n_rdma;
-
- dir = ioctx->cmd.data_direction;
- if (dir == DMA_TO_DEVICE) {
- /* write */
- ret = -ENOMEM;
- sq_wr_avail = atomic_sub_return(n_rdma, &ch->sq_wr_avail);
- if (sq_wr_avail < 0) {
- pr_warn("IB send queue full (needed %d)\n",
- n_rdma);
- goto out;
- }
- }
-
- for (i = 0; i < n_rdma; i++) {
- struct ib_send_wr *wr = &ioctx->rdma_wrs[i].wr;
-
- wr->opcode = (dir == DMA_FROM_DEVICE) ?
- IB_WR_RDMA_WRITE : IB_WR_RDMA_READ;
-
- if (i == n_rdma - 1) {
- /* only get completion event for the last rdma read */
- if (dir == DMA_TO_DEVICE) {
- wr->send_flags = IB_SEND_SIGNALED;
- ioctx->rdma_cqe.done = srpt_rdma_read_done;
- } else {
- ioctx->rdma_cqe.done = srpt_rdma_write_done;
- }
- wr->wr_cqe = &ioctx->rdma_cqe;
- wr->next = NULL;
- } else {
- wr->wr_cqe = NULL;
- wr->next = &ioctx->rdma_wrs[i + 1].wr;
- }
- }
-
- ret = ib_post_send(ch->qp, &ioctx->rdma_wrs->wr, &bad_wr);
- if (ret)
- pr_err("%s[%d]: ib_post_send() returned %d for %d/%d\n",
- __func__, __LINE__, ret, i, n_rdma);
-out:
- if (unlikely(dir == DMA_TO_DEVICE && ret < 0))
- atomic_add(n_rdma, &ch->sq_wr_avail);
- return ret;
-}
-
-/**
- * srpt_xfer_data() - Start data transfer from initiator to target.
- */
-static int srpt_xfer_data(struct srpt_rdma_ch *ch,
- struct srpt_send_ioctx *ioctx)
-{
- int ret;
-
- ret = srpt_map_sg_to_ib_sge(ch, ioctx);
- if (ret) {
- pr_err("%s[%d] ret=%d\n", __func__, __LINE__, ret);
- goto out;
- }
-
- ret = srpt_perform_rdmas(ch, ioctx);
- if (ret) {
- if (ret == -EAGAIN || ret == -ENOMEM)
- pr_info("%s[%d] queue full -- ret=%d\n",
- __func__, __LINE__, ret);
- else
- pr_err("%s[%d] fatal error -- ret=%d\n",
- __func__, __LINE__, ret);
- goto out_unmap;
- }
-
-out:
- return ret;
-out_unmap:
- srpt_unmap_sg_to_ib_sge(ch, ioctx);
- goto out;
-}
-
static int srpt_write_pending_status(struct se_cmd *se_cmd)
{
struct srpt_send_ioctx *ioctx;
@@ -2502,11 +2210,42 @@ static int srpt_write_pending(struct se_cmd *se_cmd)
struct srpt_send_ioctx *ioctx =
container_of(se_cmd, struct srpt_send_ioctx, cmd);
struct srpt_rdma_ch *ch = ioctx->ch;
+ struct ib_send_wr *first_wr = NULL, *bad_wr;
+ struct ib_cqe *cqe = &ioctx->rdma_cqe;
enum srpt_command_state new_state;
+ int ret, i;
new_state = srpt_set_cmd_state(ioctx, SRPT_STATE_NEED_DATA);
WARN_ON(new_state == SRPT_STATE_DONE);
- return srpt_xfer_data(ch, ioctx);
+
+ if (atomic_sub_return(ioctx->n_rdma, &ch->sq_wr_avail) < 0) {
+ pr_warn("%s: IB send queue full (needed %d)\n",
+ __func__, ioctx->n_rdma);
+ ret = -ENOMEM;
+ goto out_undo;
+ }
+
+ cqe->done = srpt_rdma_read_done;
+ for (i = ioctx->n_rw_ctx - 1; i >= 0; i--) {
+ struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+
+ first_wr = rdma_rw_ctx_wrs(&ctx->rw, ch->qp, ch->sport->port,
+ cqe, first_wr);
+ cqe = NULL;
+ }
+
+ ret = ib_post_send(ch->qp, first_wr, &bad_wr);
+ if (ret) {
+ pr_err("%s: ib_post_send() returned %d for %d (avail: %d)\n",
+ __func__, ret, ioctx->n_rdma,
+ atomic_read(&ch->sq_wr_avail));
+ goto out_undo;
+ }
+
+ return 0;
+out_undo:
+ atomic_add(ioctx->n_rdma, &ch->sq_wr_avail);
+ return ret;
}
static u8 tcm_to_srp_tsk_mgmt_status(const int tcm_mgmt_status)
@@ -2528,17 +2267,17 @@ static u8 tcm_to_srp_tsk_mgmt_status(const int tcm_mgmt_status)
*/
static void srpt_queue_response(struct se_cmd *cmd)
{
- struct srpt_rdma_ch *ch;
- struct srpt_send_ioctx *ioctx;
+ struct srpt_send_ioctx *ioctx =
+ container_of(cmd, struct srpt_send_ioctx, cmd);
+ struct srpt_rdma_ch *ch = ioctx->ch;
+ struct srpt_device *sdev = ch->sport->sdev;
+ struct ib_send_wr send_wr, *first_wr = NULL, *bad_wr;
+ struct ib_sge sge;
enum srpt_command_state state;
unsigned long flags;
- int ret;
- enum dma_data_direction dir;
- int resp_len;
+ int resp_len, ret, i;
u8 srp_tm_status;
- ioctx = container_of(cmd, struct srpt_send_ioctx, cmd);
- ch = ioctx->ch;
BUG_ON(!ch);
spin_lock_irqsave(&ioctx->spinlock, flags);
@@ -2565,17 +2304,19 @@ static void srpt_queue_response(struct se_cmd *cmd)
return;
}
- dir = ioctx->cmd.data_direction;
-
/* For read commands, transfer the data to the initiator. */
- if (dir == DMA_FROM_DEVICE && ioctx->cmd.data_length &&
+ if (ioctx->cmd.data_direction == DMA_FROM_DEVICE &&
+ ioctx->cmd.data_length &&
!ioctx->queue_status_only) {
- ret = srpt_xfer_data(ch, ioctx);
- if (ret) {
- pr_err("xfer_data failed for tag %llu\n",
- ioctx->cmd.tag);
- return;
+ for (i = ioctx->n_rw_ctx - 1; i >= 0; i--) {
+ struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+
+ first_wr = rdma_rw_ctx_wrs(&ctx->rw, ch->qp,
+ ch->sport->port, NULL,
+ first_wr ? first_wr : &send_wr);
}
+ } else {
+ first_wr = &send_wr;
}
if (state != SRPT_STATE_MGMT)
@@ -2587,14 +2328,46 @@ static void srpt_queue_response(struct se_cmd *cmd)
resp_len = srpt_build_tskmgmt_rsp(ch, ioctx, srp_tm_status,
ioctx->cmd.tag);
}
- ret = srpt_post_send(ch, ioctx, resp_len);
- if (ret) {
- pr_err("sending cmd response failed for tag %llu\n",
- ioctx->cmd.tag);
- srpt_unmap_sg_to_ib_sge(ch, ioctx);
- srpt_set_cmd_state(ioctx, SRPT_STATE_DONE);
- target_put_sess_cmd(&ioctx->cmd);
+
+ atomic_inc(&ch->req_lim);
+
+ if (unlikely(atomic_sub_return(1 + ioctx->n_rdma,
+ &ch->sq_wr_avail) < 0)) {
+ pr_warn("%s: IB send queue full (needed %d)\n",
+ __func__, ioctx->n_rdma);
+ ret = -ENOMEM;
+ goto out;
}
+
+ ib_dma_sync_single_for_device(sdev->device, ioctx->ioctx.dma, resp_len,
+ DMA_TO_DEVICE);
+
+ sge.addr = ioctx->ioctx.dma;
+ sge.length = resp_len;
+ sge.lkey = sdev->pd->local_dma_lkey;
+
+ ioctx->ioctx.cqe.done = srpt_send_done;
+ send_wr.next = NULL;
+ send_wr.wr_cqe = &ioctx->ioctx.cqe;
+ send_wr.sg_list = &sge;
+ send_wr.num_sge = 1;
+ send_wr.opcode = IB_WR_SEND;
+ send_wr.send_flags = IB_SEND_SIGNALED;
+
+ ret = ib_post_send(ch->qp, first_wr, &bad_wr);
+ if (ret < 0) {
+ pr_err("%s: sending cmd response failed for tag %llu (%d)\n",
+ __func__, ioctx->cmd.tag, ret);
+ goto out;
+ }
+
+ return;
+
+out:
+ atomic_add(1 + ioctx->n_rdma, &ch->sq_wr_avail);
+ atomic_dec(&ch->req_lim);
+ srpt_set_cmd_state(ioctx, SRPT_STATE_DONE);
+ target_put_sess_cmd(&ioctx->cmd);
}
static int srpt_queue_data_in(struct se_cmd *cmd)
@@ -2610,10 +2383,6 @@ static void srpt_queue_tm_rsp(struct se_cmd *cmd)
static void srpt_aborted_task(struct se_cmd *cmd)
{
- struct srpt_send_ioctx *ioctx = container_of(cmd,
- struct srpt_send_ioctx, cmd);
-
- srpt_unmap_sg_to_ib_sge(ioctx->ch, ioctx);
}
static int srpt_queue_status(struct se_cmd *cmd)
@@ -2914,12 +2683,10 @@ static void srpt_release_cmd(struct se_cmd *se_cmd)
unsigned long flags;
WARN_ON(ioctx->state != SRPT_STATE_DONE);
- WARN_ON(ioctx->mapped_sg_count != 0);
- if (ioctx->n_rbuf > 1) {
- kfree(ioctx->rbufs);
- ioctx->rbufs = NULL;
- ioctx->n_rbuf = 0;
+ if (ioctx->n_rw_ctx > 1) {
+ srpt_free_rw_ctxs(ch, ioctx);
+ ioctx->n_rw_ctx = 0;
}
spin_lock_irqsave(&ch->spinlock, flags);
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
index af9b8b5..7a8ebc7 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -42,6 +42,7 @@
#include <rdma/ib_verbs.h>
#include <rdma/ib_sa.h>
#include <rdma/ib_cm.h>
+#include <rdma/rw.h>
#include <scsi/srp.h>
@@ -105,7 +106,6 @@ enum {
SRP_LOGIN_RSP_MULTICHAN_MAINTAINED = 0x2,
SRPT_DEF_SG_TABLESIZE = 128,
- SRPT_DEF_SG_PER_WQE = 16,
MIN_SRPT_SQ_SIZE = 16,
DEF_SRPT_SQ_SIZE = 4096,
@@ -174,21 +174,18 @@ struct srpt_recv_ioctx {
struct srpt_ioctx ioctx;
struct list_head wait_list;
};
+
+struct srpt_rw_ctx {
+ struct rdma_rw_ctx rw;
+ struct scatterlist *sg;
+ unsigned int nents;
+};
/**
* struct srpt_send_ioctx - SRPT send I/O context.
* @ioctx: See above.
* @ch: Channel pointer.
* @free_list: Node in srpt_rdma_ch.free_list.
- * @n_rbuf: Number of data buffers in the received SRP command.
- * @rbufs: Pointer to SRP data buffer array.
- * @single_rbuf: SRP data buffer if the command has only a single buffer.
- * @sg: Pointer to sg-list associated with this I/O context.
- * @sg_cnt: SG-list size.
- * @mapped_sg_count: ib_dma_map_sg() return value.
- * @n_rdma_wrs: Number of elements in the rdma_wrs array.
- * @rdma_wrs: Array with information about the RDMA mapping.
- * @tag: Tag of the received SRP information unit.
* @spinlock: Protects 'state'.
* @state: I/O context state.
* @cmd: Target core command data structure.
@@ -197,21 +194,18 @@ struct srpt_recv_ioctx {
struct srpt_send_ioctx {
struct srpt_ioctx ioctx;
struct srpt_rdma_ch *ch;
- struct ib_rdma_wr *rdma_wrs;
+
+ struct srpt_rw_ctx s_rw_ctx;
+ struct srpt_rw_ctx *rw_ctxs;
+
struct ib_cqe rdma_cqe;
- struct srp_direct_buf *rbufs;
- struct srp_direct_buf single_rbuf;
- struct scatterlist *sg;
struct list_head free_list;
spinlock_t spinlock;
enum srpt_command_state state;
struct se_cmd cmd;
struct completion tx_done;
- int sg_cnt;
- int mapped_sg_count;
- u16 n_rdma_wrs;
u8 n_rdma;
- u8 n_rbuf;
+ u8 n_rw_ctx;
bool queue_status_only;
u8 sense_data[TRANSPORT_SENSE_BUFFER];
};
--
2.1.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
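The send queue sizing in the ib_create_qp() hunk above deserves a worked
example, because rdma_rw_init_qp() now grows the queue behind the driver's
back. Assuming the default srp_sq_size of 4096 (DEF_SRPT_SQ_SIZE):

	qp_init->cap.max_send_wr   = 4096 / 2 = 2048	/* SENDs */
	qp_init->cap.max_rdma_ctxs = 4096 / 2 = 2048	/* R/W contexts */

rdma_rw_init_qp() then adds one RDMA READ/WRITE WR per context, plus two
more WRs per context (REG_MR + LOCAL_INV) when MRs are needed:

	IB/RoCE: max_send_wr = 2048 + 2048            = 4096
	iWarp:   max_send_wr = 2048 + 2048 + 2 * 2048 = 8192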
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
[not found] ` <1456784410-20166-8-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-02-29 22:28 ` Steve Wise
[not found] ` <56D4C615.5010202-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2016-03-01 9:00 ` Sagi Grimberg
2016-02-29 23:12 ` Bart Van Assche
1 sibling, 2 replies; 43+ messages in thread
From: Steve Wise @ 2016-02-29 22:28 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
On 2/29/2016 4:20 PM, Christoph Hellwig wrote:
> This supports both manual mapping of lots of SGEs and using MRs
> from the QP's MR pool, for iWarp or other cases where it's more optimal.
> For now, MRs are only used for iWARP transports. The user of the RDMA-RW
> API must allocate the QP MR pool as well as size the SQ accordingly.
>
> Thanks to Steve Wise for testing, fixing and rewriting the iWarp support,
> and to Sagi Grimberg for ideas, reviews and fixes.
>
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
> drivers/infiniband/core/Makefile | 2 +-
> drivers/infiniband/core/rw.c | 414 +++++++++++++++++++++++++++++++++++++++
> drivers/infiniband/core/verbs.c | 25 +++
> include/rdma/ib_verbs.h | 14 +-
> include/rdma/rw.h | 69 +++++++
> 5 files changed, 522 insertions(+), 2 deletions(-)
> create mode 100644 drivers/infiniband/core/rw.c
> create mode 100644 include/rdma/rw.h
>
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index 48bd9d8..26987d9 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -8,7 +8,7 @@ obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
> obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
> $(user_access-y)
>
> -ib_core-y := packer.o ud_header.o verbs.o cq.o sysfs.o \
> +ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
> device.o fmr_pool.o cache.o netlink.o \
> roce_gid_mgmt.o mr_pool.o
> ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
> diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
> new file mode 100644
> index 0000000..e1cc1a9
> --- /dev/null
> +++ b/drivers/infiniband/core/rw.c
> @@ -0,0 +1,414 @@
> +/*
> + * Copyright (c) 2016 HGST, a Western Digital Company.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + */
> +#include <linux/slab.h>
> +#include <rdma/mr_pool.h>
> +#include <rdma/rw.h>
> +
> +/*
> + * Check if the device needs a memory registration. We currently always use
> + * memory registrations for iWarp, and never for IB and RoCE. In the future
> + * we can hopefully fine tune this based on HCA driver input.
> + */
> +static inline bool rdma_rw_use_mr(struct ib_device *dev, u8 port_num)
> +{
> + return rdma_protocol_iwarp(dev, port_num);
> +}
> +
> +static inline u32 rdma_rw_max_sge(struct ib_device *dev,
> + enum dma_data_direction dir)
> +{
> + return dir == DMA_TO_DEVICE ?
> + dev->attrs.max_sge : dev->attrs.max_sge_rd;
> +}
> +
> +static int rdma_rw_init_mr_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
> + u8 port_num, struct scatterlist *sg, u32 offset,
> + u64 remote_addr, u32 rkey, enum dma_data_direction dir)
> +{
> + int pages_per_mr = qp->pd->device->attrs.max_fast_reg_page_list_len;
> + int pages_left = ctx->dma_nents;
> + u32 va_offset = 0;
> + int i, ret = 0, count = 0;
> +
> + ctx->nr_ops = (ctx->dma_nents + pages_per_mr - 1) / pages_per_mr;
> + ctx->reg = kcalloc(ctx->nr_ops, sizeof(*ctx->reg), GFP_KERNEL);
> + if (!ctx->reg) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + for (i = 0; i < ctx->nr_ops; i++) {
> + struct rdma_rw_reg_ctx *prev = i ? &ctx->reg[i - 1] : NULL;
> + struct rdma_rw_reg_ctx *reg = &ctx->reg[i];
> + int nents = min(pages_left, pages_per_mr);
> +
> + reg->mr = ib_mr_pool_get(qp, &qp->rdma_mrs);
> + if (!reg->mr) {
> + pr_info("failed to allocate MR from pool\n");
> + ret = -EAGAIN;
> + goto out_free;
> + }
> +
> + if (reg->mr->need_inval) {
> + reg->inv_wr.opcode = IB_WR_LOCAL_INV;
> + reg->inv_wr.ex.invalidate_rkey = reg->mr->lkey;
> + reg->inv_wr.next = &reg->reg_wr.wr;
> + if (prev)
> + prev->wr.wr.next = &reg->inv_wr;
> +
> + count++;
> + } else if (prev) {
> + prev->wr.wr.next = &reg->reg_wr.wr;
> + }
> +
> + ib_update_fast_reg_key(reg->mr, ib_inc_rkey(reg->mr->lkey));
> +
> + ret = ib_map_mr_sg(reg->mr, sg, nents, offset,
> + PAGE_SIZE);
> + if (ret < nents) {
> + pr_info("failed to map MR\n");
> + ib_mr_pool_put(qp, &qp->rdma_mrs, reg->mr);
> + ret = -EINVAL;
> + goto out_free;
> + }
> +
> + reg->reg_wr.wr.opcode = IB_WR_REG_MR;
> + reg->reg_wr.mr = reg->mr;
> + reg->reg_wr.key = reg->mr->lkey;
> + reg->reg_wr.wr.next = &reg->wr.wr;
> + count++;
> +
> + reg->reg_wr.access = IB_ACCESS_LOCAL_WRITE;
> + if (rdma_protocol_iwarp(qp->device, port_num))
> + reg->reg_wr.access |= IB_ACCESS_REMOTE_WRITE;
> +
> + reg->sge.lkey = reg->mr->lkey;
> + reg->sge.addr = reg->mr->iova;
> + reg->sge.length = reg->mr->length;
> +
> + reg->wr.wr.sg_list = &reg->sge;
> + reg->wr.wr.num_sge = 1;
> + reg->wr.remote_addr = remote_addr + va_offset;
> + reg->wr.rkey = rkey;
> + count++;
> +
> + if (dir == DMA_FROM_DEVICE) {
> + if (rdma_has_read_invalidate(qp->device, port_num)) {
> + reg->wr.wr.opcode = IB_WR_RDMA_READ_WITH_INV;
> + reg->wr.wr.ex.invalidate_rkey = reg->mr->lkey;
> + reg->mr->need_inval = false;
> + } else {
> + reg->wr.wr.opcode = IB_WR_RDMA_READ;
> + reg->mr->need_inval = true;
> + }
> + } else {
> + reg->wr.wr.opcode = IB_WR_RDMA_WRITE;
> + reg->mr->need_inval = true;
> + }
> +
> + va_offset += reg->sge.length;
> + pages_left -= nents;
> + sg = sg_next(sg);
> + offset = 0;
> + }
> +
> + return count;
> +
> +out_free:
> + while (--i >= 0)
> + ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->reg[i].mr);
> + kfree(ctx->reg);
> +out:
> + return ret;
> +}
> +
> +static int rdma_rw_init_map_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
> + struct scatterlist *sg, u32 offset, u64 remote_addr, u32 rkey,
> + enum dma_data_direction dir)
> +{
> + struct ib_device *dev = qp->pd->device;
> + u32 max_sge = rdma_rw_max_sge(dev, dir);
> + u32 sge_left = ctx->dma_nents;
> + struct ib_sge *sge;
> + u32 total_len = 0, i, j;
> +
> + ctx->nr_ops = DIV_ROUND_UP(ctx->dma_nents, max_sge);
> +
> + ctx->map.sges = sge = kcalloc(ctx->dma_nents, sizeof(*sge), GFP_KERNEL);
> + if (!ctx->map.sges)
> + goto out;
> +
> + ctx->map.wrs = kcalloc(ctx->nr_ops, sizeof(*ctx->map.wrs), GFP_KERNEL);
> + if (!ctx->map.wrs)
> + goto out_free_sges;
> +
> + for (i = 0; i < ctx->nr_ops; i++) {
> + struct ib_rdma_wr *rdma_wr = &ctx->map.wrs[i];
> + u32 nr_sge = min(sge_left, max_sge);
> +
> + if (dir == DMA_TO_DEVICE)
> + rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
> + else
> + rdma_wr->wr.opcode = IB_WR_RDMA_READ;
> + rdma_wr->remote_addr = remote_addr + total_len;
> + rdma_wr->rkey = rkey;
> + rdma_wr->wr.sg_list = sge;
> +
> + for (j = 0; j < nr_sge; j++, sg = sg_next(sg)) {
> + BUG_ON(!sg);
> + rdma_wr->wr.num_sge++;
> +
> + sge->addr = ib_sg_dma_address(dev, sg) + offset;
> + sge->length = ib_sg_dma_len(dev, sg) - offset;
> + sge->lkey = qp->pd->local_dma_lkey;
> +
> + total_len += sge->length;
> + sge++;
> + sge_left--;
> + offset = 0;
> + }
> +
> + if (i + 1 != ctx->nr_ops)
> + rdma_wr->wr.next = &ctx->map.wrs[i + 1].wr;
> + }
> +
> + return ctx->nr_ops;
> +
> +out_free_sges:
> + kfree(ctx->map.sges);
> +out:
> + return -ENOMEM;
> +}
> +
> +static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
> + struct scatterlist *sg, u32 offset, u64 remote_addr, u32 rkey,
> + enum dma_data_direction dir)
> +{
> + struct ib_device *dev = qp->pd->device;
> + struct ib_rdma_wr *rdma_wr = &ctx->single.wr;
> +
> + ctx->nr_ops = 1;
> +
> + ctx->single.sge.lkey = qp->pd->local_dma_lkey;
> + ctx->single.sge.addr = ib_sg_dma_address(dev, sg) + offset;
> + ctx->single.sge.length = ib_sg_dma_len(dev, sg) - offset;
> +
> + memset(rdma_wr, 0, sizeof(*rdma_wr));
> + if (dir == DMA_TO_DEVICE)
> + rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
> + else
> + rdma_wr->wr.opcode = IB_WR_RDMA_READ;
> + rdma_wr->wr.sg_list = &ctx->single.sge;
> + rdma_wr->wr.num_sge = 1;
> + rdma_wr->remote_addr = remote_addr;
> + rdma_wr->rkey = rkey;
> +
> + return 1;
> +}
> +
> +/**
> + * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
> + * @ctx: context to initialize
> + * @qp: queue pair to operate on
> + * @port_num: port num to which the connection is bound
> + * @sg: scatterlist to READ/WRITE from/to
> + * @sg_cnt: number of entries in @sg
> + * @sg_offset: current byte offset into @sg
> + * @length: total length of @sg in bytes
> + * @remote_addr: remote address to read/write (relative to @rkey)
> + * @rkey: remote key to operate on
> + * @dir: %DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
> + *
> + * If we're going to use an FR to map this context, @max_nents should be
> + * smaller than or equal to the MR size.
> + *
> + * Returns the number of WQEs that will be needed on the workqueue if
> + * successful, or a negative error code.
> + */
> +int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
> + struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
> + u64 remote_addr, u32 rkey, enum dma_data_direction dir)
> +{
> + struct ib_device *dev = qp->pd->device;
> + int ret;
> +
> + ctx->dma_nents = ib_dma_map_sg(dev, sg, sg_cnt, dir);
> + if (!ctx->dma_nents)
> + return -ENOMEM;
> +
> + /*
> + * Skip to the S/G entry that sg_offset falls into:
> + */
> + for (; sg; sg = sg_next(sg)) {
> + u32 len = ib_sg_dma_len(dev, sg);
> +
> + if (sg_offset < len)
> + break;
> +
> + sg_offset -= len;
> + ctx->dma_nents--;
> + }
> +
> + if (rdma_rw_use_mr(qp->device, port_num)) {
> + ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_offset,
> + remote_addr, rkey, dir);
At some point I would like the iWARP I/O path to skip MRs for WRITEs with
dma_nents == 1, and for WRITEs with dma_nents <= max_send_sge for the
device. I think this will help smaller iSER/NVMEF READ IOPS. I'm testing
this out now on the NVMEF code, which is slightly different. If it proves
to improve performance, I'll post a follow-on patch...
> + } else if (ctx->dma_nents > 1) {
> + ret = rdma_rw_init_map_wrs(ctx, qp, sg, sg_offset,
> + remote_addr, rkey, dir);
> + } else {
> + ret = rdma_rw_init_single_wr(ctx, qp, sg, sg_offset,
> + remote_addr, rkey, dir);
> + }
> +
> + if (ret < 0)
> + goto out_unmap_sg;
> + return ret;
> +
> +out_unmap_sg:
> + ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
> + return ret;
> +}
> +EXPORT_SYMBOL(rdma_rw_ctx_init);
> +
> +/**
> + * rdma_rw_ctx_destroy - release all resources allocated by rdma_rw_ctx_init
> + * @ctx: context to release
> + * @qp: queue pair to operate on
> + * @port_num: port num to which the connection is bound
> + * @sg: scatterlist that was used for the READ/WRITE
> + * @sg_cnt: number of entries in @sg
> + * @dir: %DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
> + */
> +void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
> + struct scatterlist *sg, u32 sg_cnt, enum dma_data_direction dir)
> +{
> + if (rdma_rw_use_mr(qp->device, port_num)) {
> + int i;
> +
> + for (i = 0; i < ctx->nr_ops; i++)
> + ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->reg[i].mr);
> + kfree(ctx->reg);
> + } else if (ctx->dma_nents > 1) {
> + kfree(ctx->map.wrs);
> + kfree(ctx->map.sges);
> + }
> +
> + ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
> +}
> +EXPORT_SYMBOL(rdma_rw_ctx_destroy);
> +
> +/**
> + * rdma_rw_ctx_wrs - return chain of WRs for a RDMA READ or WRITE operation
> + * @ctx: context to operate on
> + * @qp: queue pair to operate on
> + * @port_num: port num to which the connection is bound
> + * @cqe: completion queue entry for the last WR
> + * @chain_wr: WR to append to the posted chain
> + *
> + * Return the WR chain for the set of RDMA READ/WRITE operations described by
> + * @ctx, as well as any memory registration operations needed. If @chain_wr
> + * is non-NULL the WR it points to will be appended to the chain of WRs posted.
> + * If @chain_wr is not set @cqe must be set so that the caller gets a
> + * completion notification.
> + */
> +struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
> + u8 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
> +{
> + struct ib_send_wr *first_wr, *last_wr;
> +
> + if (rdma_rw_use_mr(qp->device, port_num)) {
> + if (ctx->reg[0].inv_wr.next)
> + first_wr = &ctx->reg[0].inv_wr;
> + else
> + first_wr = &ctx->reg[0].reg_wr.wr;
> + last_wr = &ctx->reg[ctx->nr_ops - 1].wr.wr;
> + } else if (ctx->dma_nents > 1) {
> + first_wr = &ctx->map.wrs[0].wr;
> + last_wr = &ctx->map.wrs[ctx->nr_ops - 1].wr;
> + } else {
> + first_wr = &ctx->single.wr.wr;
> + last_wr = &ctx->single.wr.wr;
> + }
> +
> + if (chain_wr) {
> + last_wr->next = chain_wr;
> + } else {
> + last_wr->wr_cqe = cqe;
> + last_wr->send_flags |= IB_SEND_SIGNALED;
> + }
> +
> + return first_wr;
> +}
> +EXPORT_SYMBOL(rdma_rw_ctx_wrs);
> +
> +/**
> + * rdma_rw_ctx_post - post a RDMA READ or RDMA WRITE operation
> + * @ctx: context to operate on
> + * @qp: queue pair to operate on
> + * @port_num: port num to which the connection is bound
> + * @cqe: completion queue entry for the last WR
> + * @chain_wr: WR to append to the posted chain
> + *
> + * Post the set of RDMA READ/WRITE operations described by @ctx, as well as
> + * any memory registration operations needed. If @chain_wr is non-NULL the
> + * WR it points to will be appended to the chain of WRs posted. If @chain_wr
> + * is not set @cqe must be set so that the caller gets a completion
> + * notification.
> + */
> +int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
> + struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
> +{
> + struct ib_send_wr *first_wr, *bad_wr;
> +
> + first_wr = rdma_rw_ctx_wrs(ctx, qp, port_num, cqe, chain_wr);
> + return ib_post_send(qp, first_wr, &bad_wr);
> +}
> +EXPORT_SYMBOL(rdma_rw_ctx_post);
> +
> +void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
> +{
> + /*
> + * Each context needs at least one RDMA READ or WRITE WR.
> + *
> + * For some hardware we might need more; eventually we should ask the
> + * HCA driver for a multiplier here.
> + */
> + attr->cap.max_send_wr += attr->cap.max_rdma_ctxs;
> +
> + /*
> + * If the device needs MRs to perform RDMA READ or WRITE operations,
> + * we'll need two additional WRs per context for the registration and
> + * the invalidation.
> + */
> + if (rdma_rw_use_mr(dev, attr->port_num))
> + attr->cap.max_send_wr += 2 * attr->cap.max_rdma_ctxs;
> +}
> +
> +int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr)
> +{
> + struct ib_device *dev = qp->pd->device;
> + int ret = 0;
> +
> + if (rdma_rw_use_mr(dev, attr->port_num)) {
> + ret = ib_mr_pool_init(qp, &qp->rdma_mrs,
> + attr->cap.max_rdma_ctxs, IB_MR_TYPE_MEM_REG,
> + dev->attrs.max_fast_reg_page_list_len);
> + }
> +
> + return ret;
> +}
> +
> +void rdma_rw_cleanup_mrs(struct ib_qp *qp)
> +{
> + ib_mr_pool_destroy(qp, &qp->rdma_mrs);
> +}
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 20bb5d1..686f9c2 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -48,6 +48,7 @@
> #include <rdma/ib_verbs.h>
> #include <rdma/ib_cache.h>
> #include <rdma/ib_addr.h>
> +#include <rdma/rw.h>
>
> #include "core_priv.h"
>
> @@ -751,6 +752,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
> {
> struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
> struct ib_qp *qp;
> + int ret;
> +
> + /*
> + * If the caller is using the RDMA R/W API, calculate the resources
> + * needed for the RDMA READ/WRITE operations.
> + *
> + * Note that these callers need to pass in a port number.
> + */
> + if (qp_init_attr->cap.max_rdma_ctxs)
> + rdma_rw_init_qp(device, qp_init_attr);
>
> qp = device->create_qp(pd, qp_init_attr, NULL);
> if (IS_ERR(qp))
> @@ -764,6 +775,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
> atomic_set(&qp->usecnt, 0);
> qp->mrs_used = 0;
> spin_lock_init(&qp->mr_lock);
> + INIT_LIST_HEAD(&qp->rdma_mrs);
>
> if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
> return ib_create_xrc_qp(qp, qp_init_attr);
> @@ -787,6 +799,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
>
> atomic_inc(&pd->usecnt);
> atomic_inc(&qp_init_attr->send_cq->usecnt);
> +
> + if (qp_init_attr->cap.max_rdma_ctxs) {
> + ret = rdma_rw_init_mrs(qp, qp_init_attr);
> + if (ret) {
> + pr_err("failed to init MR pool ret= %d\n", ret);
> + ib_destroy_qp(qp);
> + qp = ERR_PTR(ret);
> + }
> + }
> +
> return qp;
> }
> EXPORT_SYMBOL(ib_create_qp);
> @@ -1271,6 +1293,9 @@ int ib_destroy_qp(struct ib_qp *qp)
> rcq = qp->recv_cq;
> srq = qp->srq;
>
> + if (!qp->uobject)
> + rdma_rw_cleanup_mrs(qp);
> +
> ret = qp->device->destroy_qp(qp);
> if (!ret) {
> if (pd)
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 2b94cea..035585a 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -915,6 +915,13 @@ struct ib_qp_cap {
> u32 max_send_sge;
> u32 max_recv_sge;
> u32 max_inline_data;
> +
> + /*
> + * Maximum number of rdma_rw_ctx structures in flight at a time.
> + * ib_create_qp() will calculate the right number of needed WRs
> + * and MRs based on this.
> + */
> + u32 max_rdma_ctxs;
> };
>
> enum ib_sig_type {
> @@ -986,7 +993,11 @@ struct ib_qp_init_attr {
> enum ib_sig_type sq_sig_type;
> enum ib_qp_type qp_type;
> enum ib_qp_create_flags create_flags;
> - u8 port_num; /* special QP types only */
> +
> + /*
> + * Only needed for special QP types, or when using the RW API.
> + */
> + u8 port_num;
> };
>
> struct ib_qp_open_attr {
> @@ -1410,6 +1421,7 @@ struct ib_qp {
> struct list_head xrcd_list;
>
> spinlock_t mr_lock;
> + struct list_head rdma_mrs;
> int mrs_used;
>
> /* count times opened, mcast attaches, flow attaches */
> diff --git a/include/rdma/rw.h b/include/rdma/rw.h
> new file mode 100644
> index 0000000..57ea304
> --- /dev/null
> +++ b/include/rdma/rw.h
> @@ -0,0 +1,69 @@
> +/*
> + * Copyright (c) 2016 HGST, a Western Digital Company.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + */
> +#ifndef _RDMA_RW_H
> +#define _RDMA_RW_H
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/scatterlist.h>
> +#include <rdma/ib_verbs.h>
> +#include <rdma/rdma_cm.h>
> +#include <rdma/mr_pool.h>
> +
> +struct rdma_rw_ctx {
> + /* number of SGL entries returned by dma_map_sg */
> + u32 dma_nents;
> +
> + /* number of RDMA READ/WRITE WRs (not counting MR WRs) */
> + u32 nr_ops;
> +
> + union {
> + /* for mapping a single SGE: */
> + struct {
> + struct ib_sge sge;
> + struct ib_rdma_wr wr;
> + } single;
> +
> + /* for mapping of multiple SGEs: */
> + struct {
> + struct ib_sge *sges;
> + struct ib_rdma_wr *wrs;
> + } map;
> +
> + /* for registering multiple WRs: */
> + struct rdma_rw_reg_ctx {
> + struct ib_sge sge;
> + struct ib_rdma_wr wr;
> + struct ib_reg_wr reg_wr;
> + struct ib_send_wr inv_wr;
> + struct ib_mr *mr;
> + } *reg;
> + };
> +};
> +
> +int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
> + struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
> + u64 remote_addr, u32 rkey, enum dma_data_direction dir);
> +void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
> + struct scatterlist *sg, u32 sg_cnt,
> + enum dma_data_direction dir);
> +
> +struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
> + u8 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
> +int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
> + struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
> +
> +void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr);
> +int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr);
> +void rdma_rw_cleanup_mrs(struct ib_qp *qp);
> +
> +#endif /* _RDMA_RW_H */
^ permalink raw reply [flat|nested] 43+ messages in thread
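For reference, one possible shape of the WRITE-path heuristic Steve
sketches above -- purely an illustration of the idea, not code from this
series (the extra dir/dma_nents arguments are hypothetical):

	static inline bool rdma_rw_use_mr(struct ib_device *dev, u8 port_num,
			enum dma_data_direction dir, u32 dma_nents)
	{
		if (!rdma_protocol_iwarp(dev, port_num))
			return false;
		/* iWarp RDMA READs still need a REMOTE_WRITE-enabled MR */
		if (dir == DMA_FROM_DEVICE)
			return true;
		/* small RDMA WRITEs can go out with the local DMA lkey */
		return dma_nents > dev->attrs.max_sge;
	}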
* Re: [PATCH 5/9] IB/core: add a simple MR pool
2016-02-29 22:20 ` [PATCH 5/9] IB/core: add a simple MR pool Christoph Hellwig
@ 2016-02-29 22:36 ` Bart Van Assche
2016-03-01 6:48 ` Christoph Hellwig
2016-03-01 19:12 ` Bart Van Assche
1 sibling, 1 reply; 43+ messages in thread
From: Bart Van Assche @ 2016-02-29 22:36 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma@vger.kernel.org
Cc: swise@opengridcomputing.com, sagig@mellanox.com,
target-devel@vger.kernel.org
On 02/29/2016 02:20 PM, Christoph Hellwig wrote:
> +struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list)
> +{
> + struct ib_mr *mr = NULL;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&qp->mr_lock, flags);
> + mr = list_first_entry_or_null(list, struct ib_mr, qp_entry);
> + if (mr)
> + qp->mrs_used++;
> + spin_unlock_irqrestore(&qp->mr_lock, flags);
> +
> + return mr;
> +}
> +EXPORT_SYMBOL(ib_mr_pool_get);
> +
> +void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&qp->mr_lock, flags);
> + list_add(&mr->qp_entry, list);
> + qp->mrs_used--;
> + spin_unlock_irqrestore(&qp->mr_lock, flags);
> +}
> +EXPORT_SYMBOL(ib_mr_pool_put);
Hi Christoph,
So ib_mr_pool_get() does not remove the returned element from the list
and ib_mr_pool_put() adds it to the list? Why this asymmetry?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 6/9] IB/core: add a need_inval flag to struct ib_mr
2016-02-29 22:20 ` [PATCH 6/9] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
@ 2016-02-29 22:40 ` Bart Van Assche
[not found] ` <56D4C8EC.8020805-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
0 siblings, 1 reply; 43+ messages in thread
From: Bart Van Assche @ 2016-02-29 22:40 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma@vger.kernel.org
Cc: swise@opengridcomputing.com, sagig@mellanox.com,
target-devel@vger.kernel.org, Steve Wise
On 02/29/2016 02:20 PM, Christoph Hellwig wrote:
> From: Steve Wise <swise@chelsio.com>
>
> This is the first step toward moving MR invalidation decisions
> to the core. It will be needed by the upcoming RW API.
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
> include/rdma/ib_verbs.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 1e68dae..2b94cea 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1431,6 +1431,7 @@ struct ib_mr {
> u64 iova;
> u32 length;
> unsigned int page_size;
> + bool need_inval;
> union {
> struct ib_uobject *uobject; /* user */
> struct list_head qp_entry; /* FR */
Hi Christoph,
mthca uses kmalloc() to allocate a struct mr. Are you sure we don't need
to initialize the 'need_inval' member in ib_alloc_mr()?
Bart.
^ permalink raw reply [flat|nested] 43+ messages in thread
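The initialization Bart asks about would presumably end up in
ib_alloc_mr() next to the other common fields -- a sketch, not code from
this series:

	mr = pd->device->alloc_mr(pd, mr_type, max_num_sg);
	if (!IS_ERR(mr)) {
		mr->device  = pd->device;
		mr->pd      = pd;
		mr->uobject = NULL;
		atomic_inc(&pd->usecnt);
		mr->need_inval = false;	/* mthca kmalloc()s the mr */
	}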
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
[not found] ` <1456784410-20166-8-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-02-29 22:28 ` Steve Wise
@ 2016-02-29 23:12 ` Bart Van Assche
2016-03-01 6:50 ` Christoph Hellwig
1 sibling, 1 reply; 43+ messages in thread
From: Bart Van Assche @ 2016-02-29 23:12 UTC (permalink / raw)
To: Christoph Hellwig,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org,
sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 02/29/2016 02:20 PM, Christoph Hellwig wrote:
> +static inline u32 rdma_rw_max_sge(struct ib_device *dev,
> + enum dma_data_direction dir)
> +{
> + return dir == DMA_TO_DEVICE ?
> + dev->attrs.max_sge : dev->attrs.max_sge_rd;
> +}
Hi Christoph,
Are you aware that using max_sge is safe for some but not for all HCAs?
See also http://thread.gmane.org/gmane.linux.drivers.rdma/22043.
Bart.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/9] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
2016-02-29 22:20 ` [PATCH 2/9] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg Christoph Hellwig
@ 2016-02-29 23:13 ` Bart Van Assche
2016-03-01 6:50 ` Christoph Hellwig
0 siblings, 1 reply; 43+ messages in thread
From: Bart Van Assche @ 2016-02-29 23:13 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma@vger.kernel.org
Cc: swise@opengridcomputing.com, sagig@mellanox.com,
target-devel@vger.kernel.org
On 02/29/2016 02:20 PM, Christoph Hellwig wrote:
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 5af6d02..1d04cea 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1557,6 +1557,7 @@ EXPORT_SYMBOL(ib_check_mr_status);
> * @mr: memory region
> * @sg: dma mapped scatterlist
> * @sg_nents: number of entries in sg
> + * @sg_offset: offset in byes into sg
> * @page_size: page vector desired page size
> *
> * Constraints:
Hi Christoph,
Please spell this and the second occurrence as "bytes".
Thanks,
Bart.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 5/9] IB/core: add a simple MR pool
2016-02-29 22:36 ` Bart Van Assche
@ 2016-03-01 6:48 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-01 6:48 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, linux-rdma@vger.kernel.org,
swise@opengridcomputing.com, sagig@mellanox.com,
target-devel@vger.kernel.org
On Mon, Feb 29, 2016 at 02:36:53PM -0800, Bart Van Assche wrote:
> So ib_mr_pool_get() does not remove the returned element from the list and
> ib_mr_pool_put() adds it to the list? Why this asymmetry?
It should remove it. I already fixed this up twice after testing;
not sure why this sneaked back in...
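For reference, a minimal sketch of the intended get path; it is identical
to the posted version except for the list_del() that actually takes the
MR out of the pool:

	struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list)
	{
		struct ib_mr *mr;
		unsigned long flags;

		spin_lock_irqsave(&qp->mr_lock, flags);
		mr = list_first_entry_or_null(list, struct ib_mr, qp_entry);
		if (mr) {
			/* remove the MR so it cannot be handed out twice */
			list_del(&mr->qp_entry);
			qp->mrs_used++;
		}
		spin_unlock_irqrestore(&qp->mr_lock, flags);

		return mr;
	}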
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
2016-02-29 23:12 ` Bart Van Assche
@ 2016-03-01 6:50 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-01 6:50 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, linux-rdma@vger.kernel.org,
swise@opengridcomputing.com, sagig@mellanox.com,
target-devel@vger.kernel.org
On Mon, Feb 29, 2016 at 03:12:55PM -0800, Bart Van Assche wrote:
> Hi Christoph,
>
> Are you aware that using max_sge is safe for some but not for all HCAs? See
> also http://thread.gmane.org/gmane.linux.drivers.rdma/22043.
It should be safe; the mlx4 bug was fixed by Sagi a while ago
in commit a5e14b ("mlx4: Expose correct max_sge_rd limit").
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/9] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
2016-02-29 23:13 ` Bart Van Assche
@ 2016-03-01 6:50 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-01 6:50 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, linux-rdma@vger.kernel.org,
swise@opengridcomputing.com, sagig@mellanox.com,
target-devel@vger.kernel.org
On Mon, Feb 29, 2016 at 03:13:56PM -0800, Bart Van Assche wrote:
> Please spell this and the second occurrence as "bytes".
Ok, I will fix this up.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 6/9] IB/core: add a need_inval flag to struct ib_mr
[not found] ` <56D4C8EC.8020805-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2016-03-01 7:09 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-01 7:09 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org,
sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Steve Wise
On Mon, Feb 29, 2016 at 02:40:44PM -0800, Bart Van Assche wrote:
> Hi Christoph,
>
> mthca uses kmalloc() to allocate a struct mr. Are you sure we don't need to
> initialize the 'need_inval' member in ib_alloc_mr()?
I'll initialize it just to be sure, thanks!
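For what it's worth, a sketch of where that initialization would land,
assuming the core ib_alloc_mr() wrapper clears the flag once instead of
relying on every driver:

	struct ib_mr *ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
				  u32 max_num_sg)
	{
		struct ib_mr *mr;

		if (!pd->device->alloc_mr)
			return ERR_PTR(-ENOSYS);

		mr = pd->device->alloc_mr(pd, mr_type, max_num_sg);
		if (!IS_ERR(mr)) {
			mr->device = pd->device;
			mr->pd = pd;
			mr->uobject = NULL;
			atomic_inc(&pd->usecnt);
			/* drivers may kmalloc struct ib_mr, so clear it here */
			mr->need_inval = false;
		}

		return mr;
	}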
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
[not found] ` <56D4C615.5010202-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2016-03-01 7:21 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-01 7:21 UTC (permalink / raw)
To: Steve Wise
Cc: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
On Mon, Feb 29, 2016 at 04:28:37PM -0600, Steve Wise wrote:
>> + ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_offset,
>> + remote_addr, rkey, dir);
>
> At some point I would like the iWARP IO to do no-mr for WRITE/dma_nents ==
> 1, and WRITE/dma_nents <= max_send_sge for the device. I think this will
> help smaller iSER/NVMEF READ IOPs. I'm testing this out now on the NVMEF
> code, which is slightly different. If it proves out to better the
> performance, I'll post a follow-on patch...
Yes, this should work fine. My plan was to have the bare API in place
for now and then let HCA drivers fine-tune it eventually.
My idea was to have a driver-set threshold above which we start
registering the memory even in cases where it is not required (WRITE on
all HCAs, READ for !iWarp).
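A rough sketch of what such a check might look like; the iWarp rule comes
from this thread, while the reg_threshold attribute is purely hypothetical:

	static bool rdma_rw_io_needs_mr(struct ib_device *dev, u32 port_num,
			enum dma_data_direction dir, int dma_nents)
	{
		/*
		 * iWarp needs REMOTE_WRITE rights on the local buffers
		 * targeted by an RDMA READ, so an MR is unavoidable there.
		 */
		if (rdma_protocol_iwarp(dev, port_num) && dir == DMA_FROM_DEVICE)
			return true;

		/*
		 * Hypothetical driver-set threshold above which registering
		 * pays off even though it is not strictly required.
		 */
		return dma_nents > dev->attrs.reg_threshold;
	}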
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
2016-02-29 22:28 ` Steve Wise
[not found] ` <56D4C615.5010202-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2016-03-01 9:00 ` Sagi Grimberg
[not found] ` <56D55A1B.6030902-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-01 15:44 ` Steve Wise
1 sibling, 2 replies; 43+ messages in thread
From: Sagi Grimberg @ 2016-03-01 9:00 UTC (permalink / raw)
To: Steve Wise, Christoph Hellwig, linux-rdma
Cc: sagig, bart.vanassche, target-devel
>> + if (rdma_rw_use_mr(qp->device, port_num)) {
>> + ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_offset,
>> + remote_addr, rkey, dir);
>
> At some point I would like the iWARP IO to do no-mr for WRITE/dma_nents
> == 1, and WRITE/dma_nents <= max_send_sge for the device. I think this
> will help smaller iSER/NVMEF READ IOPs. I'm testing this out now on the
> NVMEF code, which is slightly different. If it proves out to better the
> performance, I'll post a follow-on patch...
Is this possible given that pd->local_dma_lkey has only local
permissions?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
2016-02-29 22:20 ` [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
@ 2016-03-01 9:02 ` Sagi Grimberg
2016-03-01 9:14 ` Christoph Hellwig
[not found] ` <56D55ABA.6070007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 2 replies; 43+ messages in thread
From: Sagi Grimberg @ 2016-03-01 9:02 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
> +/*
> + * Check if the device supports READ W/ INVALIDATE.
> + */
> +static inline bool rdma_has_read_invalidate(struct ib_device *dev, u32 port_num)
Is rdma_device_sup_read_inv() a better name?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
[not found] ` <56D55A1B.6030902-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-03-01 9:13 ` Christoph Hellwig
[not found] ` <20160301091338.GA2208-jcswGhMUV9g@public.gmane.org>
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-01 9:13 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Steve Wise, Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
On Tue, Mar 01, 2016 at 11:00:11AM +0200, Sagi Grimberg wrote:
>
>>> + if (rdma_rw_use_mr(qp->device, port_num)) {
>>> + ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_offset,
>>> + remote_addr, rkey, dir);
>>
>> At some point I would like the iWARP IO to do no-mr for WRITE/dma_nents
>> == 1, and WRITE/dma_nents <= max_send_sge for the device. I think this
>> will help smaller iSER/NVMEF READ IOPs. I'm testing this out now on the
>> NVMEF code, which is slightly different. If it proves out to better the
>> performance, I'll post a follow-on patch...
>
> Is this possible given that pd->local_dma_lkey has only local
> permissions?
I think he means NVMe / SCSI READ commands, which use RDMA WRITE.
For NVMe / SCSI WRITE commands that use RDMA READ all bets are off
for iWarp, but at least it supports READ W/ INVALIDATE.
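To spell out the mapping being discussed here (a summary of the thread,
not code from the series):

	/*
	 * On the target side of a storage protocol:
	 *
	 *   SCSI/NVMe READ  -> data flows to the initiator   -> RDMA WRITE
	 *   SCSI/NVMe WRITE -> data flows from the initiator -> RDMA READ
	 *
	 * Only the RDMA READ case forces memory registration on iWarp,
	 * because the READ target buffers need REMOTE_WRITE access there.
	 */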
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
2016-03-01 9:02 ` Sagi Grimberg
@ 2016-03-01 9:14 ` Christoph Hellwig
[not found] ` <56D55ABA.6070007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-01 9:14 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, linux-rdma, swise, sagig, bart.vanassche,
target-devel
On Tue, Mar 01, 2016 at 11:02:50AM +0200, Sagi Grimberg wrote:
>
>> +/*
>> + * Check if the device supports READ W/ INVALIDATE.
>> + */
>> +static inline bool rdma_has_read_invalidate(struct ib_device *dev, u32 port_num)
>
> Is rdma_device_sup_read_inv() a better name?
I'm fine with any name. Let's find some consensus before changing it,
though.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
[not found] ` <20160301091338.GA2208-jcswGhMUV9g@public.gmane.org>
@ 2016-03-01 9:19 ` Sagi Grimberg
[not found] ` <56D55EB3.2050707-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 43+ messages in thread
From: Sagi Grimberg @ 2016-03-01 9:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
>>>> + if (rdma_rw_use_mr(qp->device, port_num)) {
>>>> + ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_offset,
>>>> + remote_addr, rkey, dir);
>>>
>>> At some point I would like the iWARP IO to do no-mr for WRITE/dma_nents
>>> == 1, and WRITE/dma_nents <= max_send_sge for the device. I think this
>>> will help smaller iSER/NVMEF READ IOPs. I'm testing this out now on the
>>> NVMEF code, which is slightly different. If it proves out to better the
>>> performance, I'll post a follow-on patch...
>>
>> Is this possible given that pd->local_dma_lkey has only local
>> permissions?
>
> I think he means NVMe / SCSI READ commands, which use RDMA WRITE.
> For NVMe / SCSI WRITE commands that use RDMA READ all bets are off
> for iWarp, but at least it supports READ W/ INVALIDATE.
I interpreted "no-mr for WRITE/dma_nents == 1" as either RDMA_WRITE or
RDMA_READ with a single dma entry (and that part I didn't understand).
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
2016-03-01 9:00 ` Sagi Grimberg
[not found] ` <56D55A1B.6030902-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-03-01 15:44 ` Steve Wise
1 sibling, 0 replies; 43+ messages in thread
From: Steve Wise @ 2016-03-01 15:44 UTC (permalink / raw)
To: 'Sagi Grimberg', 'Christoph Hellwig', linux-rdma
Cc: sagig, bart.vanassche, target-devel
> >> + if (rdma_rw_use_mr(qp->device, port_num)) {
> >> + ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_offset,
> >> + remote_addr, rkey, dir);
> >
> > At some point I would like the iWARP IO to do no-mr for WRITE/dma_nents
> > == 1, and WRITE/dma_nents <= max_send_sge for the device. I think this
> > will help smaller iSER/NVMEF READ IOPs. I'm testing this out now on the
> > NVMEF code, which is slightly different. If it proves out to better the
> > performance, I'll post a follow-on patch...
>
> Is this possible given that pd->local_dma_lkey has only local
> permissions?
The source of an RDMA WRITE doesn't need remote permissions.
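Concretely, the lkey in the source SGE only needs local permissions, so a
single-segment RDMA WRITE can be posted straight from pd->local_dma_lkey.
A minimal sketch, where sg, remote_addr and rkey stand in for whatever
the protocol exchange provided:

	struct ib_sge sge = {
		.addr	= sg_dma_address(sg),
		.length	= sg_dma_len(sg),
		.lkey	= qp->pd->local_dma_lkey,	/* local access only */
	};
	struct ib_rdma_wr wr = {
		.wr = {
			.opcode		= IB_WR_RDMA_WRITE,
			.sg_list	= &sge,
			.num_sge	= 1,
			.send_flags	= IB_SEND_SIGNALED,
		},
		/* the remote buffer is covered by the initiator's rkey */
		.remote_addr	= remote_addr,
		.rkey		= rkey,
	};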
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
[not found] ` <56D55ABA.6070007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-03-01 15:45 ` Steve Wise
2016-03-03 13:10 ` 'Christoph Hellwig'
0 siblings, 1 reply; 43+ messages in thread
From: Steve Wise @ 2016-03-01 15:45 UTC (permalink / raw)
To: 'Sagi Grimberg', 'Christoph Hellwig',
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
> -----Original Message-----
> From: Sagi Grimberg [mailto:sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org]
> Sent: Tuesday, March 01, 2016 3:03 AM
> To: Christoph Hellwig; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org; sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org; bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org; target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
>
>
> > +/*
> > + * Check if the device supports READ W/ INVALIDATE.
> > + */
> > +static inline bool rdma_has_read_invalidate(struct ib_device *dev, u32 port_num)
>
> Is rdma_device_sup_read_inv() a better name?
or rdma_dev_supports_read_inv()
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
[not found] ` <56D55EB3.2050707-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-03-01 15:47 ` Steve Wise
0 siblings, 0 replies; 43+ messages in thread
From: Steve Wise @ 2016-03-01 15:47 UTC (permalink / raw)
To: 'Sagi Grimberg', 'Christoph Hellwig'
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
> -----Original Message-----
> From: Sagi Grimberg [mailto:sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org]
> Sent: Tuesday, March 01, 2016 3:20 AM
> To: Christoph Hellwig
> Cc: Steve Wise; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org; bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org; target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
>
>
> >>>> + if (rdma_rw_use_mr(qp->device, port_num)) {
> >>>> + ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_offset,
> >>>> + remote_addr, rkey, dir);
> >>>
> >>> At some point I would like the iWARP IO to do no-mr for WRITE/dma_nents
> >>> == 1, and WRITE/dma_nents <= max_send_sge for the device. I think this
> >>> will help smaller iSER/NVMEF READ IOPs. I'm testing this out now on the
> >>> NVMEF code, which is slightly different. If it proves out to better the
> >>> performance, I'll post a follow-on patch...
> >>
> >> Is this possible given that pd->local_dma_lkey has only local
> >> permissions?
> >
> > I think he means NVMe / SCSI READ commands, which use RDMA WRITE.
> > For NVMe / SCSI WRITE commands that use RDMA READ all bets are off
> > for iWarp, but at least it supports READ W/ INVALIDATE.
>
> I interpreted "no-mr for WRITE/dma_nents == 1" as either RDMA_WRITE or
> RDMA_READ with a single dma entry (and that part I didn't understand).
I mean that for iWARP, we can do RDMA WRITE operations without an MR if the ib_sge list is small enough given the device's max send SGE depth.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 5/9] IB/core: add a simple MR pool
2016-02-29 22:20 ` [PATCH 5/9] IB/core: add a simple MR pool Christoph Hellwig
2016-02-29 22:36 ` Bart Van Assche
@ 2016-03-01 19:12 ` Bart Van Assche
[not found] ` <56D5E989.9010006-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
1 sibling, 1 reply; 43+ messages in thread
From: Bart Van Assche @ 2016-03-01 19:12 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma@vger.kernel.org
Cc: swise@opengridcomputing.com, sagig@mellanox.com,
target-devel@vger.kernel.org
On 02/29/2016 02:20 PM, Christoph Hellwig wrote:
> +int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
> + enum ib_mr_type type, u32 max_num_sg)
> +{
> + struct ib_mr *mr;
> + unsigned long flags;
> + int ret, i;
> +
> + for (i = 0; i < nr; i++) {
> + mr = ib_alloc_mr(qp->pd, type, max_num_sg);
> + if (IS_ERR(mr)) {
> + ret = PTR_ERR(mr);
> + goto out;
> + }
> +
> + spin_lock_irqsave(&qp->mr_lock, flags);
> + list_add_tail(&mr->qp_entry, list);
> + spin_unlock_irqrestore(&qp->mr_lock, flags);
> + }
> +
> + return 0;
> +out:
> + ib_mr_pool_destroy(qp, list);
> + return ret;
> +}
> +EXPORT_SYMBOL(ib_mr_pool_init);
> +
> +void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list)
> +{
> + struct ib_mr *mr;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&qp->mr_lock, flags);
> + while (!list_empty(list)) {
> + mr = list_first_entry(list, struct ib_mr, qp_entry);
> + list_del(&mr->qp_entry);
> +
> + spin_unlock_irqrestore(&qp->mr_lock, flags);
> + ib_dereg_mr(mr);
> + spin_lock_irqsave(&qp->mr_lock, flags);
> + }
> + spin_unlock_irqrestore(&qp->mr_lock, flags);
> +}
> +EXPORT_SYMBOL(ib_mr_pool_destroy);
Hello Christoph,
A second question about this patch: why do we need locking in
ib_mr_pool_init() and ib_mr_pool_destroy()? Calling ib_mr_pool_get() or
ib_mr_pool_put() while one of these two functions is in progress is a
bug (race condition). Can the locking be left out from these two functions?
Bart.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 9/9] IB/srpt: convert to the generic RDMA READ/WRITE API
2016-02-29 22:20 ` [PATCH 9/9] IB/srpt: convert to the generic RDMA READ/WRITE API Christoph Hellwig
@ 2016-03-01 21:38 ` Bart Van Assche
2016-03-02 9:14 ` Christoph Hellwig
0 siblings, 1 reply; 43+ messages in thread
From: Bart Van Assche @ 2016-03-01 21:38 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma; +Cc: swise, sagig, target-devel
On 02/29/2016 02:20 PM, Christoph Hellwig wrote:
> Replace the homegrown RDMA READ/WRITE code in srpt with the generic API.
> The only real twist here is that we need to allocate on Linux scatterlist
^^
This probably should have been "one" instead of "on"?
> + ret = target_alloc_sgl(&ctx->sg, &ctx->nents, size, false,
> + i == nbufs - 1 ? false : true);
Please change "i == nbufs - 1 ? false : true" into "i < nbufs - 1".
> @@ -1832,8 +1626,11 @@ retry:
> qp_init->srq = sdev->srq;
> qp_init->sq_sig_type = IB_SIGNAL_REQ_WR;
> qp_init->qp_type = IB_QPT_RC;
> - qp_init->cap.max_send_wr = srp_sq_size;
> - qp_init->cap.max_send_sge = SRPT_DEF_SG_PER_WQE;
> + qp_init->cap.max_send_wr = srp_sq_size / 2;
> + qp_init->cap.max_rdma_ctxs = srp_sq_size / 2;
> + qp_init->cap.max_send_sge = max(sdev->device->attrs.max_sge_rd,
> + sdev->device->attrs.max_sge);
> + qp_init->port_num = ch->sport->port;
>
> ch->qp = ib_create_qp(sdev->pd, qp_init);
> if (IS_ERR(ch->qp)) {
These changes have not been described in the patch description.
Bart.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 5/9] IB/core: add a simple MR pool
[not found] ` <56D5E989.9010006-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2016-03-02 9:14 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-02 9:14 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org,
sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, Mar 01, 2016 at 11:12:09AM -0800, Bart Van Assche wrote:
> A second question about this patch: why do we need locking in
> ib_mr_pool_init() and ib_mr_pool_destroy()? Calling ib_mr_pool_get() or
> ib_mr_pool_put() while one of these two functions is in progress is a bug
> (race condition). Can the locking be left out from these two functions?
Maybe. But then we'd need to use barriers to ensure no reordering ever
happens. Just taking the lock in this slow path as well makes that
a lot easier.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 9/9] IB/srpt: convert to the generic RDMA READ/WRITE API
2016-03-01 21:38 ` Bart Van Assche
@ 2016-03-02 9:14 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-02 9:14 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Christoph Hellwig, linux-rdma, swise, sagig, target-devel
On Tue, Mar 01, 2016 at 01:38:42PM -0800, Bart Van Assche wrote:
> On 02/29/2016 02:20 PM, Christoph Hellwig wrote:
> > Replace the homegrown RDMA READ/WRITE code in srpt with the generic API.
> > The only real twist here is that we need to allocate on Linux scatterlist
> ^^
> This probably should have been "one" instead of "on"?
Yes.
>> + ret = target_alloc_sgl(&ctx->sg, &ctx->nents, size, false,
>> + i == nbufs - 1 ? false : true);
>
> Please change "i == nbufs - 1 ? false : true" into "i < nbufs - 1".
Ok.
>> @@ -1832,8 +1626,11 @@ retry:
>> qp_init->srq = sdev->srq;
>> qp_init->sq_sig_type = IB_SIGNAL_REQ_WR;
>> qp_init->qp_type = IB_QPT_RC;
>> - qp_init->cap.max_send_wr = srp_sq_size;
>> - qp_init->cap.max_send_sge = SRPT_DEF_SG_PER_WQE;
>> + qp_init->cap.max_send_wr = srp_sq_size / 2;
>> + qp_init->cap.max_rdma_ctxs = srp_sq_size / 2;
>> + qp_init->cap.max_send_sge = max(sdev->device->attrs.max_sge_rd,
>> + sdev->device->attrs.max_sge);
>> + qp_init->port_num = ch->sport->port;
>>
>> ch->qp = ib_create_qp(sdev->pd, qp_init);
>> if (IS_ERR(ch->qp)) {
>
> These changes have not been described in the patch description.
Ok, I will update it.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
2016-02-29 22:20 ` [PATCH 7/9] IB/core: generic RDMA READ/WRITE API Christoph Hellwig
[not found] ` <1456784410-20166-8-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-03-03 10:53 ` Sagi Grimberg
2016-03-03 12:02 ` Christoph Hellwig
1 sibling, 1 reply; 43+ messages in thread
From: Sagi Grimberg @ 2016-03-03 10:53 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma; +Cc: swise, sagig, bart.vanassche, target-devel
> +int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr)
> +{
> + struct ib_device *dev = qp->pd->device;
> + int ret = 0;
> +
> + if (rdma_rw_use_mr(dev, attr->port_num)) {
> + ret = ib_mr_pool_init(qp, &qp->rdma_mrs,
> + attr->cap.max_rdma_ctxs, IB_MR_TYPE_MEM_REG,
> + dev->attrs.max_fast_reg_page_list_len);
Christoph,
This is a problem for mlx5 which exposes:
props->max_fast_reg_page_list_len = (unsigned int)-1;
Which is obviously wrong and needs to be corrected, but it is sort of
overkill to allocate the maximum supported size unconditionally.
How about choosing a sane default of 256/512 pages for now? I don't
think we'll see a lot of larger transfers in iser/nvmf (which actually
need MRs for iWARP).
Alternatively we can allow the caller to limit the MR size?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
2016-03-03 10:53 ` Sagi Grimberg
@ 2016-03-03 12:02 ` Christoph Hellwig
[not found] ` <20160303120209.GC20543-jcswGhMUV9g@public.gmane.org>
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-03 12:02 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, linux-rdma, swise, sagig, bart.vanassche,
target-devel
On Thu, Mar 03, 2016 at 12:53:04PM +0200, Sagi Grimberg wrote:
>
>> +int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr)
>> +{
>> + struct ib_device *dev = qp->pd->device;
>> + int ret = 0;
>> +
>> + if (rdma_rw_use_mr(dev, attr->port_num)) {
>> + ret = ib_mr_pool_init(qp, &qp->rdma_mrs,
>> + attr->cap.max_rdma_ctxs, IB_MR_TYPE_MEM_REG,
>> + dev->attrs.max_fast_reg_page_list_len);
>
> Christoph,
>
> This is a problem for mlx5 which exposes:
>
> props->max_fast_reg_page_list_len = (unsigned int)-1;
>
> Which is obviously wrong and needs to be corrected, but it is sort of
> overkill to allocate the maximum supported size unconditionally.
>
> How about choosing a sane default of 256/512 pages for now? I don't
> think we'll see a lot of larger transfers in iser/nvmf (which actually
> need MRs for iWARP).
>
> Alternatively we can allow the caller to limit the MR size?
I'm fine with a limit in the core rdma r/w code. But why is this
a problem for mlx5? If it offers unlimited MR sizes it should support
that, or report a useful value. I don't see why fixing mlx5 should
be a problem, and would rather see this driver bug fixed ASAP.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
[not found] ` <20160303120209.GC20543-jcswGhMUV9g@public.gmane.org>
@ 2016-03-03 12:08 ` Sagi Grimberg
2016-03-03 12:22 ` Christoph Hellwig
[not found] ` <56D8293C.1050306-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 2 replies; 43+ messages in thread
From: Sagi Grimberg @ 2016-03-03 12:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
>> Christoph,
>>
>> This is a problem for mlx5 which exposes:
>>
>> props->max_fast_reg_page_list_len = (unsigned int)-1;
>>
>> Which is obviously wrong and needs to be corrected, but it is sort of
>> overkill to allocate the maximum supported size unconditionally.
>>
>> How about choosing a sane default of 256/512 pages for now? I don't
>> think we'll see a lot of larger transfers in iser/nvmf (which actually
>> need MRs for iWARP).
>>
>> Alternatively we can allow the caller to limit the MR size?
>
> I'm fine with a limit in the core rdma r/w code. But why is this
> a problem for mlx5? If it offers unlimited MR sizes it should support
> that, or report a useful value. I don't see why fixing mlx5 should
> be a problem, and would rather see this driver bug fixed ASAP.
Already sent out a fix for mlx5. But even then, the fact that a driver
offers a huge number of translation entries per MR doesn't mean we need
to allocate gigantic MRs for imaginary transfer sizes...
I'd be happier if this were controlled by the caller.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
2016-03-03 12:08 ` Sagi Grimberg
@ 2016-03-03 12:22 ` Christoph Hellwig
2016-03-03 12:54 ` Sagi Grimberg
[not found] ` <56D8293C.1050306-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
1 sibling, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2016-03-03 12:22 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-rdma, swise, sagig, bart.vanassche, target-devel
On Thu, Mar 03, 2016 at 02:08:28PM +0200, Sagi Grimberg wrote:
> Already sent out a fix for mlx5. But even then, if a driver offers huge
> amount of translation entries per MR doesn't mean we need to allocate
> gigantic MRs for imaginary transfer sizes...
>
> I'd be happier if this is controlled by the caller.
I'll throw in an arbitrary 256 entry limit for the next version, and
see if anyone screams..
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
2016-03-03 12:22 ` Christoph Hellwig
@ 2016-03-03 12:54 ` Sagi Grimberg
0 siblings, 0 replies; 43+ messages in thread
From: Sagi Grimberg @ 2016-03-03 12:54 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-rdma, swise, sagig, bart.vanassche, target-devel
> I'll throw in an arbitrary 256 entry limit for the next version, and
> see if anyone screams..
min with device capability. But yes, I think that will suffice.
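Based on that, the pool allocation in rdma_rw_init_mrs() would become
something like the following, with 256 as the arbitrary cap under
discussion:

	/* arbitrary cap, clamped to what the device can actually register */
	u32 max_pages = min_t(u32, 256,
			dev->attrs.max_fast_reg_page_list_len);

	ret = ib_mr_pool_init(qp, &qp->rdma_mrs, attr->cap.max_rdma_ctxs,
			IB_MR_TYPE_MEM_REG, max_pages);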
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
2016-03-01 15:45 ` Steve Wise
@ 2016-03-03 13:10 ` 'Christoph Hellwig'
2016-03-03 15:31 ` Steve Wise
2016-03-03 17:53 ` Jason Gunthorpe
0 siblings, 2 replies; 43+ messages in thread
From: 'Christoph Hellwig' @ 2016-03-03 13:10 UTC (permalink / raw)
To: Steve Wise
Cc: 'Sagi Grimberg', 'Christoph Hellwig', linux-rdma,
sagig, bart.vanassche, target-devel
On Tue, Mar 01, 2016 at 09:45:40AM -0600, Steve Wise wrote:
> > > +static inline bool rdma_has_read_invalidate(struct ib_device *dev, u32 port_num)
> >
> > Is rdma_device_sup_read_inv() a better name?
>
> or rdma_dev_supports_read_inv()
I looked at the naming and I think a rdma_device_* or rdma_dev_* prefix
is simply wrong - per the official decree we can support different
protocols on different ports, and that's why we also pass the port_num
to this function. How about rdma_cap_read_inv?
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
[not found] ` <56D8293C.1050306-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-03-03 15:29 ` Steve Wise
2016-03-03 18:05 ` 'Christoph Hellwig'
0 siblings, 1 reply; 43+ messages in thread
From: Steve Wise @ 2016-03-03 15:29 UTC (permalink / raw)
To: 'Sagi Grimberg', 'Christoph Hellwig'
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
> >> Christoph,
> >>
> >> This is a problem for mlx5 which exposes:
> >>
> >> props->max_fast_reg_page_list_len = (unsigned int)-1;
> >>
> >> Which is obviously wrong and needs to be corrected, but it is sort of
> >> overkill to allocate the maximum supported size unconditionally.
> >>
> >> How about choosing a sane default of 256/512 pages for now? I don't
> >> think we'll see a lot of larger transfers in iser/nvmf (which actually
> >> need MRs for iWARP).
> >>
> >> Alternatively we can allow the caller to limit the MR size?
> >
> > I'm fine with a limit in the core rdma r/w code. But why is this
> > a problem for mlx5? If it offers unlimited MR sizes it should support
> > that, or report a useful value. I don't see why fixing mlx5 should
> > be a problem, and would rather see this driver bug fixed ASAP.
>
> Already sent out a fix for mlx5. But even then, the fact that a driver
> offers a huge number of translation entries per MR doesn't mean we need
> to allocate gigantic MRs for imaginary transfer sizes...
>
> I'd be happier if this were controlled by the caller.
Are there hard limits for iSER and NVMEF/RDMA? If so, then perhaps the caller should pass that in and the RDMA R/W code will choose
min(passed_in_max, device_max)?
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
2016-03-03 13:10 ` 'Christoph Hellwig'
@ 2016-03-03 15:31 ` Steve Wise
2016-03-03 17:53 ` Jason Gunthorpe
1 sibling, 0 replies; 43+ messages in thread
From: Steve Wise @ 2016-03-03 15:31 UTC (permalink / raw)
To: 'Christoph Hellwig'
Cc: 'Sagi Grimberg', linux-rdma, sagig, bart.vanassche,
target-devel
>
> On Tue, Mar 01, 2016 at 09:45:40AM -0600, Steve Wise wrote:
> > > > +static inline bool rdma_has_read_invalidate(struct ib_device *dev, u32 port_num)
> > >
> > > Is rdma_device_sup_read_inv() a better name?
> >
> > or rdma_dev_supports_read_inv()
>
> I looked at the naming and I think a rdma_device_* or rdma_dev_* is
> simplify wrong - per the official decree we can support different
> protocols on different ports, and that's why we also pass the port_num
> to this function. How about rdma_cap_read_inv?
Given rdma_cap_ is used a lot already, I say that sounds good.
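With that name, the helper under discussion would read roughly as below;
this is a sketch assuming, per this thread, that only iWarp ports
guarantee READ W/ INVALIDATE support:

	/*
	 * Check if the port supports RDMA READ WITH INVALIDATE.
	 */
	static inline bool rdma_cap_read_inv(struct ib_device *dev, u32 port_num)
	{
		return rdma_protocol_iwarp(dev, port_num);
	}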
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
2016-03-03 13:10 ` 'Christoph Hellwig'
2016-03-03 15:31 ` Steve Wise
@ 2016-03-03 17:53 ` Jason Gunthorpe
[not found] ` <20160303175301.GD15387-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
1 sibling, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2016-03-03 17:53 UTC (permalink / raw)
To: 'Christoph Hellwig'
Cc: Steve Wise, 'Sagi Grimberg', linux-rdma, sagig,
bart.vanassche, target-devel
On Thu, Mar 03, 2016 at 02:10:50PM +0100, 'Christoph Hellwig' wrote:
> On Tue, Mar 01, 2016 at 09:45:40AM -0600, Steve Wise wrote:
> > > > +static inline bool rdma_has_read_invalidate(struct ib_device *dev, u32 port_num)
> > >
> > > Is rdma_device_sup_read_inv() a better name?
> >
> > or rdma_dev_supports_read_inv()
>
> I looked at the naming and I think a rdma_device_* or rdma_dev_* prefix
> is simply wrong - per the official decree we can support different
> protocols on different ports, and that's why we also pass the port_num
> to this function. How about rdma_cap_read_inv?
We can't support different protocols on different ports. The ports
hung off a struct device must be similar. The rules might be bent a
bit for RoCE/IB sharing (not a great idea, IMHO), but certainly iwarp
and !iwarp cannot share a struct device.
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support
[not found] ` <20160303175301.GD15387-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-03-03 17:56 ` 'Christoph Hellwig'
0 siblings, 0 replies; 43+ messages in thread
From: 'Christoph Hellwig' @ 2016-03-03 17:56 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: 'Christoph Hellwig', Steve Wise, 'Sagi Grimberg',
linux-rdma-u79uwXL29TY76Z2rM5mHXA, sagig-VPRAkNaXOzVWk0Htik3J/w,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
target-devel-u79uwXL29TY76Z2rM5mHXA
On Thu, Mar 03, 2016 at 10:53:01AM -0700, Jason Gunthorpe wrote:
> > I looked at the naming and I think a rdma_device_* or rdma_dev_* prefix
> > is simply wrong - per the official decree we can support different
> > protocols on different ports, and that's why we also pass the port_num
> > to this function. How about rdma_cap_read_inv?
>
> We can't support different protocols on different ports. The ports
> hung off a struct device must be similar. The rules might be bent a
> bit for RoCE/IB sharing (not a great idea, IMHO), but certainly iwarp
> and !iwarp cannot share a struct device.
That's how I would like to see it. But the way all the cap checks
are done on a per-port basis implies something different.
For this series I'm not going to change this. In the long run
I'd be much happier if this nonsense of passing a port number to
the capability checks went away.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/9] IB/core: generic RDMA READ/WRITE API
2016-03-03 15:29 ` Steve Wise
@ 2016-03-03 18:05 ` 'Christoph Hellwig'
0 siblings, 0 replies; 43+ messages in thread
From: 'Christoph Hellwig' @ 2016-03-03 18:05 UTC (permalink / raw)
To: Steve Wise
Cc: 'Sagi Grimberg', 'Christoph Hellwig', linux-rdma,
sagig, bart.vanassche, target-devel
On Thu, Mar 03, 2016 at 09:29:47AM -0600, Steve Wise wrote:
> Are there hard limits for iSER and NVMEF/RDMA? If so, then perhaps
> the caller should pass that in and the RDMA-RW will choose the
> min(passed_in_max, device_max)?
Nope, no hard limit in the general case.
^ permalink raw reply [flat|nested] 43+ messages in thread
Thread overview: 43+ messages
2016-02-29 22:20 generic RDMA READ/WRITE API V2 Christoph Hellwig
[not found] ` <1456784410-20166-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-02-29 22:20 ` [PATCH 1/9] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
2016-02-29 22:20 ` [PATCH 2/9] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg Christoph Hellwig
2016-02-29 23:13 ` Bart Van Assche
2016-03-01 6:50 ` Christoph Hellwig
2016-02-29 22:20 ` [PATCH 4/9] IB/core: refactor ib_create_qp Christoph Hellwig
2016-02-29 22:20 ` [PATCH 3/9] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
2016-03-01 9:02 ` Sagi Grimberg
2016-03-01 9:14 ` Christoph Hellwig
[not found] ` <56D55ABA.6070007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-01 15:45 ` Steve Wise
2016-03-03 13:10 ` 'Christoph Hellwig'
2016-03-03 15:31 ` Steve Wise
2016-03-03 17:53 ` Jason Gunthorpe
[not found] ` <20160303175301.GD15387-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-03 17:56 ` 'Christoph Hellwig'
2016-02-29 22:20 ` [PATCH 5/9] IB/core: add a simple MR pool Christoph Hellwig
2016-02-29 22:36 ` Bart Van Assche
2016-03-01 6:48 ` Christoph Hellwig
2016-03-01 19:12 ` Bart Van Assche
[not found] ` <56D5E989.9010006-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-03-02 9:14 ` Christoph Hellwig
2016-02-29 22:20 ` [PATCH 6/9] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
2016-02-29 22:40 ` Bart Van Assche
[not found] ` <56D4C8EC.8020805-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-03-01 7:09 ` Christoph Hellwig
2016-02-29 22:20 ` [PATCH 7/9] IB/core: generic RDMA READ/WRITE API Christoph Hellwig
[not found] ` <1456784410-20166-8-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-02-29 22:28 ` Steve Wise
[not found] ` <56D4C615.5010202-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2016-03-01 7:21 ` Christoph Hellwig
2016-03-01 9:00 ` Sagi Grimberg
[not found] ` <56D55A1B.6030902-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-01 9:13 ` Christoph Hellwig
[not found] ` <20160301091338.GA2208-jcswGhMUV9g@public.gmane.org>
2016-03-01 9:19 ` Sagi Grimberg
[not found] ` <56D55EB3.2050707-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-01 15:47 ` Steve Wise
2016-03-01 15:44 ` Steve Wise
2016-02-29 23:12 ` Bart Van Assche
2016-03-01 6:50 ` Christoph Hellwig
2016-03-03 10:53 ` Sagi Grimberg
2016-03-03 12:02 ` Christoph Hellwig
[not found] ` <20160303120209.GC20543-jcswGhMUV9g@public.gmane.org>
2016-03-03 12:08 ` Sagi Grimberg
2016-03-03 12:22 ` Christoph Hellwig
2016-03-03 12:54 ` Sagi Grimberg
[not found] ` <56D8293C.1050306-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-03-03 15:29 ` Steve Wise
2016-03-03 18:05 ` 'Christoph Hellwig'
2016-02-29 22:20 ` [PATCH 8/9] target: enhance and export target_alloc_sgl/target_free_sgl Christoph Hellwig
2016-02-29 22:20 ` [PATCH 9/9] IB/srpt: convert to the generic RDMA READ/WRITE API Christoph Hellwig
2016-03-01 21:38 ` Bart Van Assche
2016-03-02 9:14 ` Christoph Hellwig