* [PATCH for-next v3 0/6] RDMA/rxe: Replace mr page map with an xarray
@ 2023-01-13 23:26 Bob Pearson
2023-01-13 23:27 ` [PATCH for-next v3 1/6] RDMA/rxe: Cleanup mr_check_range Bob Pearson
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: Bob Pearson @ 2023-01-13 23:26 UTC (permalink / raw)
To: jgg, leonro, zyjzyj2000, linux-rdma; +Cc: Bob Pearson
This patch series replaces the page map carried in each memory region
with a struct xarray. It is based on a sketch developed by Jason
Gunthorpe. The first five patches are preparatory and cleanly isolate
all the mr specific code into rxe_mr.c. The sixth
patch is the actual change.
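
As a rough illustration of the end state, a minimal sketch with
simplified declarations (not the driver's actual code; mr_page() is an
invented helper name used only for this sketch):

    /* old: mr->map[m]->buf[n], located by lookup_iova() arithmetic */
    struct rxe_phys_buf { u64 addr; u64 size; };
    struct rxe_map { struct rxe_phys_buf buf[RXE_BUF_PER_MAP]; };

    /* new: a single xarray of struct page *, keyed by page index */
    static struct page *mr_page(struct rxe_mr *mr, u64 iova)
    {
        unsigned long index = (iova >> mr->page_shift) -
                              (mr->ibmr.iova >> mr->page_shift);

        return xa_load(&mr->pages, index);  /* safe under the rcu read lock */
    }

With this layout an iova lookup is just an index computation plus
xa_load(), which is what allows rcu locking on the read side.
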
v3:
Fixed an error reported by the kernel test robot
v2:
Rebased to 6.2.0-rc1+
Minor cleanups
Fixed an error reported by Jason in 4/6: a missing 'if' after 'else'.
Bob Pearson (6):
RDMA/rxe: Cleanup mr_check_range
RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c
RDMA-rxe: Isolate mr code from atomic_reply()
RDMA-rxe: Isolate mr code from atomic_write_reply()
RDMA/rxe: Cleanup page variables in rxe_mr.c
RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray
drivers/infiniband/sw/rxe/rxe_loc.h | 6 +-
drivers/infiniband/sw/rxe/rxe_mr.c | 574 +++++++++++++++-----------
drivers/infiniband/sw/rxe/rxe_resp.c | 107 ++---
drivers/infiniband/sw/rxe/rxe_verbs.c | 36 --
drivers/infiniband/sw/rxe/rxe_verbs.h | 32 +-
5 files changed, 374 insertions(+), 381 deletions(-)
base-commit: bd99ede8ef2dc03e29a181b755ba4f78da2644e6
--
2.37.2
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH for-next v3 1/6] RDMA/rxe: Cleanup mr_check_range 2023-01-13 23:26 [PATCH for-next v3 0/6] RDMA/rxe: Replace mr page map with an xarray Bob Pearson @ 2023-01-13 23:27 ` Bob Pearson 2023-01-13 23:27 ` [PATCH for-next v3 2/6] RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c Bob Pearson ` (4 subsequent siblings) 5 siblings, 0 replies; 16+ messages in thread From: Bob Pearson @ 2023-01-13 23:27 UTC (permalink / raw) To: jgg, leonro, zyjzyj2000, linux-rdma; +Cc: Bob Pearson Remove blank lines and replace EFAULT by EINVAL when an invalid mr type is used. Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> --- drivers/infiniband/sw/rxe/rxe_mr.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 072eac4b65d2..632ee1e516a1 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -26,8 +26,6 @@ u8 rxe_get_next_key(u32 last_key) int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) { - - switch (mr->ibmr.type) { case IB_MR_TYPE_DMA: return 0; @@ -41,7 +39,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) default: rxe_dbg_mr(mr, "type (%d) not supported\n", mr->ibmr.type); - return -EFAULT; + return -EINVAL; } } -- 2.37.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH for-next v3 2/6] RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c 2023-01-13 23:26 [PATCH for-next v3 0/6] RDMA/rxe: Replace mr page map with an xarray Bob Pearson 2023-01-13 23:27 ` [PATCH for-next v3 1/6] RDMA/rxe: Cleanup mr_check_range Bob Pearson @ 2023-01-13 23:27 ` Bob Pearson 2023-01-13 23:27 ` [PATCH for-next v3 3/6] RDMA-rxe: Isolate mr code from atomic_reply() Bob Pearson ` (3 subsequent siblings) 5 siblings, 0 replies; 16+ messages in thread From: Bob Pearson @ 2023-01-13 23:27 UTC (permalink / raw) To: jgg, leonro, zyjzyj2000, linux-rdma; +Cc: Bob Pearson Move rxe_map_mr_sg() to rxe_mr.c where it makes a little more sense. Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> --- drivers/infiniband/sw/rxe/rxe_loc.h | 2 ++ drivers/infiniband/sw/rxe/rxe_mr.c | 36 +++++++++++++++++++++++++++ drivers/infiniband/sw/rxe/rxe_verbs.c | 36 --------------------------- 3 files changed, 38 insertions(+), 36 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 948ce4902b10..29b6c2143045 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -69,6 +69,8 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, enum rxe_mr_copy_dir dir); int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma, void *addr, int length, enum rxe_mr_copy_dir dir); +int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, + int sg_nents, unsigned int *sg_offset); void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, enum rxe_mr_lookup_type type); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 632ee1e516a1..229c7259644c 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -223,6 +223,42 @@ int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr) return err; } +static int rxe_set_page(struct ib_mr *ibmr, u64 addr) +{ + struct rxe_mr *mr = to_rmr(ibmr); + struct rxe_map *map; + struct rxe_phys_buf *buf; + + if (unlikely(mr->nbuf == mr->num_buf)) + return -ENOMEM; + + map = mr->map[mr->nbuf / RXE_BUF_PER_MAP]; + buf = &map->buf[mr->nbuf % RXE_BUF_PER_MAP]; + + buf->addr = addr; + buf->size = ibmr->page_size; + mr->nbuf++; + + return 0; +} + +int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, + int sg_nents, unsigned int *sg_offset) +{ + struct rxe_mr *mr = to_rmr(ibmr); + int n; + + mr->nbuf = 0; + + n = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, rxe_set_page); + + mr->page_shift = ilog2(ibmr->page_size); + mr->page_mask = ibmr->page_size - 1; + mr->offset = ibmr->iova & mr->page_mask; + + return n; +} + static void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out, size_t *offset_out) { diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 025b35bf014e..7a902e0a0607 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -948,42 +948,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, return ERR_PTR(err); } -static int rxe_set_page(struct ib_mr *ibmr, u64 addr) -{ - struct rxe_mr *mr = to_rmr(ibmr); - struct rxe_map *map; - struct rxe_phys_buf *buf; - - if (unlikely(mr->nbuf == mr->num_buf)) - return -ENOMEM; - - map = mr->map[mr->nbuf / RXE_BUF_PER_MAP]; - buf = &map->buf[mr->nbuf % RXE_BUF_PER_MAP]; - - buf->addr = addr; - buf->size = ibmr->page_size; - mr->nbuf++; - - return 0; -} - -static int 
rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, - int sg_nents, unsigned int *sg_offset) -{ - struct rxe_mr *mr = to_rmr(ibmr); - int n; - - mr->nbuf = 0; - - n = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, rxe_set_page); - - mr->page_shift = ilog2(ibmr->page_size); - mr->page_mask = ibmr->page_size - 1; - mr->offset = ibmr->iova & mr->page_mask; - - return n; -} - static ssize_t parent_show(struct device *device, struct device_attribute *attr, char *buf) { -- 2.37.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH for-next v3 3/6] RDMA-rxe: Isolate mr code from atomic_reply() 2023-01-13 23:26 [PATCH for-next v3 0/6] RDMA/rxe: Replace mr page map with an xarray Bob Pearson 2023-01-13 23:27 ` [PATCH for-next v3 1/6] RDMA/rxe: Cleanup mr_check_range Bob Pearson 2023-01-13 23:27 ` [PATCH for-next v3 2/6] RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c Bob Pearson @ 2023-01-13 23:27 ` Bob Pearson 2023-01-13 23:27 ` [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() Bob Pearson ` (2 subsequent siblings) 5 siblings, 0 replies; 16+ messages in thread From: Bob Pearson @ 2023-01-13 23:27 UTC (permalink / raw) To: jgg, leonro, zyjzyj2000, linux-rdma; +Cc: Bob Pearson Isolate mr specific code from atomic_reply() in rxe_resp.c into a subroutine rxe_mr_do_atomic_op() in rxe_mr.c. Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> --- drivers/infiniband/sw/rxe/rxe_loc.h | 2 ++ drivers/infiniband/sw/rxe/rxe_mr.c | 30 +++++++++++++++++ drivers/infiniband/sw/rxe/rxe_resp.c | 49 ++++++++-------------------- 3 files changed, 45 insertions(+), 36 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 29b6c2143045..bcb1bbcf50df 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -72,6 +72,8 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma, int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, unsigned int *sg_offset); void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); +int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, + u64 compare, u64 swap_add, u64 *orig_val); struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, enum rxe_mr_lookup_type type); int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 229c7259644c..791731be6067 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -538,6 +538,36 @@ int copy_data( return err; } +static DEFINE_SPINLOCK(atomic_ops_lock); + +int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, + u64 compare, u64 swap_add, u64 *orig_val) +{ + u64 *vaddr = iova_to_vaddr(mr, iova, sizeof(u64)); + u64 value; + + /* needs to match rxe_resp.c */ + if (mr->state != RXE_MR_STATE_VALID || !vaddr) + return -EFAULT; + if ((uintptr_t)vaddr & 0x7) + return -EINVAL; + + spin_lock_bh(&atomic_ops_lock); + value = *orig_val = *vaddr; + + if (opcode == IB_OPCODE_RC_COMPARE_SWAP) { + if (value == compare) + value = swap_add; + } else { + value += swap_add; + } + + *vaddr = value; + spin_unlock_bh(&atomic_ops_lock); + + return 0; +} + int advance_dma_data(struct rxe_dma_info *dma, unsigned int length) { struct rxe_sge *sge = &dma->sge[dma->cur_sge]; diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index c74972244f08..02301e3f343c 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -725,17 +725,12 @@ static enum resp_states process_flush(struct rxe_qp *qp, return RESPST_ACKNOWLEDGE; } -/* Guarantee atomicity of atomic operations at the machine level. 
*/ -static DEFINE_SPINLOCK(atomic_ops_lock); - static enum resp_states atomic_reply(struct rxe_qp *qp, - struct rxe_pkt_info *pkt) + struct rxe_pkt_info *pkt) { - u64 *vaddr; - enum resp_states ret; struct rxe_mr *mr = qp->resp.mr; struct resp_res *res = qp->resp.res; - u64 value; + int err; if (!res) { res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_MASK); @@ -743,32 +738,16 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, } if (!res->replay) { - if (mr->state != RXE_MR_STATE_VALID) { - ret = RESPST_ERR_RKEY_VIOLATION; - goto out; - } - - vaddr = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, - sizeof(u64)); - - /* check vaddr is 8 bytes aligned. */ - if (!vaddr || (uintptr_t)vaddr & 7) { - ret = RESPST_ERR_MISALIGNED_ATOMIC; - goto out; - } - - spin_lock_bh(&atomic_ops_lock); - res->atomic.orig_val = value = *vaddr; - - if (pkt->opcode == IB_OPCODE_RC_COMPARE_SWAP) { - if (value == atmeth_comp(pkt)) - value = atmeth_swap_add(pkt); - } else { - value += atmeth_swap_add(pkt); - } - - *vaddr = value; - spin_unlock_bh(&atomic_ops_lock); + u64 iova = qp->resp.va + qp->resp.offset; + + err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, + atmeth_comp(pkt), + atmeth_swap_add(pkt), + &res->atomic.orig_val); + if (err == -EINVAL) + return RESPST_ERR_MISALIGNED_ATOMIC; + else if (err) + return RESPST_ERR_RKEY_VIOLATION; qp->resp.msn++; @@ -780,9 +759,7 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, qp->resp.status = IB_WC_SUCCESS; } - ret = RESPST_ACKNOWLEDGE; -out: - return ret; + return RESPST_ACKNOWLEDGE; } #ifdef CONFIG_64BIT -- 2.37.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() 2023-01-13 23:26 [PATCH for-next v3 0/6] RDMA/rxe: Replace mr page map with an xarray Bob Pearson ` (2 preceding siblings ...) 2023-01-13 23:27 ` [PATCH for-next v3 3/6] RDMA-rxe: Isolate mr code from atomic_reply() Bob Pearson @ 2023-01-13 23:27 ` Bob Pearson 2023-01-16 2:11 ` lizhijian 2023-01-17 1:36 ` Zhu Yanjun 2023-01-13 23:27 ` [PATCH for-next v3 5/6] RDMA/rxe: Cleanup page variables in rxe_mr.c Bob Pearson 2023-01-13 23:27 ` [PATCH for-next v3 6/6] RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray Bob Pearson 5 siblings, 2 replies; 16+ messages in thread From: Bob Pearson @ 2023-01-13 23:27 UTC (permalink / raw) To: jgg, leonro, zyjzyj2000, linux-rdma; +Cc: Bob Pearson Isolate mr specific code from atomic_write_reply() in rxe_resp.c into a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. Check length for atomic write operation. Make iova_to_vaddr() static. Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> --- v3: Fixed bug reported by kernel test robot. Ifdef'ed out atomic 8 byte write if CONFIG_64BIT is not defined as orignally intended by the developers of the atomic write implementation. link: https://lore.kernel.org/linux-rdma/202301131143.CmoyVcul-lkp@intel.com/ drivers/infiniband/sw/rxe/rxe_loc.h | 1 + drivers/infiniband/sw/rxe/rxe_mr.c | 50 ++++++++++++++++++++++++ drivers/infiniband/sw/rxe/rxe_resp.c | 58 +++++++++++----------------- 3 files changed, 73 insertions(+), 36 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index bcb1bbcf50df..fd70c71a9e4e 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -74,6 +74,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, u64 compare, u64 swap_add, u64 *orig_val); +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr); struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, enum rxe_mr_lookup_type type); int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 791731be6067..1e74f5e8e10b 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -568,6 +568,56 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, return 0; } +/** + * rxe_mr_do_atomic_write() - write 64bit value to iova from addr + * @mr: memory region + * @iova: iova in mr + * @addr: source of data to write + * + * Returns: + * 0 on success + * -1 for misaligned address + * -2 for access errors + * -3 for cpu without native 64 bit support + */ +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) +{ +#if defined CONFIG_64BIT + u64 *vaddr; + u64 value; + unsigned int length = 8; + + /* See IBA oA19-28 */ + if (unlikely(mr->state != RXE_MR_STATE_VALID)) { + rxe_dbg_mr(mr, "mr not valid"); + return -2; + } + + /* See IBA A19.4.2 */ + if (unlikely((uintptr_t)vaddr & 0x7 || iova & 0x7)) { + rxe_dbg_mr(mr, "misaligned address"); + return -1; + } + + vaddr = iova_to_vaddr(mr, iova, length); + if (unlikely(!vaddr)) { + rxe_dbg_mr(mr, "iova out of range"); + return -2; + } + + /* this makes no sense. What of payload is not 8? 
*/ + memcpy(&value, addr, length); + + /* Do atomic write after all prior operations have completed */ + smp_store_release(vaddr, value); + + return 0; +#else + rxe_dbg_mr(mr, "64 bit integers not atomic"); + return -3; +#endif +} + int advance_dma_data(struct rxe_dma_info *dma, unsigned int length) { struct rxe_sge *sge = &dma->sge[dma->cur_sge]; diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 02301e3f343c..1083cda22c65 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -762,26 +762,33 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, return RESPST_ACKNOWLEDGE; } -#ifdef CONFIG_64BIT -static enum resp_states do_atomic_write(struct rxe_qp *qp, - struct rxe_pkt_info *pkt) +static enum resp_states atomic_write_reply(struct rxe_qp *qp, + struct rxe_pkt_info *pkt) { + u64 iova = qp->resp.va + qp->resp.offset; + struct resp_res *res = qp->resp.res; struct rxe_mr *mr = qp->resp.mr; + void *addr = payload_addr(pkt); int payload = payload_size(pkt); - u64 src, *dst; - - if (mr->state != RXE_MR_STATE_VALID) - return RESPST_ERR_RKEY_VIOLATION; + int err; - memcpy(&src, payload_addr(pkt), payload); + if (!res) { + res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_WRITE_MASK); + qp->resp.res = res; + } - dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, payload); - /* check vaddr is 8 bytes aligned. */ - if (!dst || (uintptr_t)dst & 7) - return RESPST_ERR_MISALIGNED_ATOMIC; + if (res->replay) + return RESPST_ACKNOWLEDGE; - /* Do atomic write after all prior operations have completed */ - smp_store_release(dst, src); + err = rxe_mr_do_atomic_write(mr, iova, addr); + if (unlikely(err)) { + if (err == -3) + return RESPST_ERR_UNSUPPORTED_OPCODE; + else if (err == -2) + return RESPST_ERR_RKEY_VIOLATION; + else + return RESPST_ERR_MISALIGNED_ATOMIC; + } /* decrease resp.resid to zero */ qp->resp.resid -= sizeof(payload); @@ -794,29 +801,8 @@ static enum resp_states do_atomic_write(struct rxe_qp *qp, qp->resp.opcode = pkt->opcode; qp->resp.status = IB_WC_SUCCESS; - return RESPST_ACKNOWLEDGE; -} -#else -static enum resp_states do_atomic_write(struct rxe_qp *qp, - struct rxe_pkt_info *pkt) -{ - return RESPST_ERR_UNSUPPORTED_OPCODE; -} -#endif /* CONFIG_64BIT */ - -static enum resp_states atomic_write_reply(struct rxe_qp *qp, - struct rxe_pkt_info *pkt) -{ - struct resp_res *res = qp->resp.res; - if (!res) { - res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_WRITE_MASK); - qp->resp.res = res; - } - - if (res->replay) - return RESPST_ACKNOWLEDGE; - return do_atomic_write(qp, pkt); + return RESPST_ACKNOWLEDGE; } static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp, -- 2.37.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
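
The smp_store_release() used above is the release half of a standard
release/acquire pairing; a minimal sketch of what it guarantees
(illustrative only, not part of the patch):

    /* writer, e.g. the responder completing the atomic write */
    buf->data = ...;                    /* all earlier stores ...          */
    smp_store_release(vaddr, value);    /* ... are ordered before this one */

    /* reader polling the 64 bit flag word */
    v = smp_load_acquire(vaddr);        /* pairs with the release above */
    if (v == expected)
        consume(buf->data);             /* earlier writer stores are visible */

This is why the comment in the patch says the write is done "after all
prior operations have completed".
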
* Re: [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() 2023-01-13 23:27 ` [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() Bob Pearson @ 2023-01-16 2:11 ` lizhijian 2023-01-16 13:53 ` Jason Gunthorpe 2023-01-17 1:36 ` Zhu Yanjun 1 sibling, 1 reply; 16+ messages in thread From: lizhijian @ 2023-01-16 2:11 UTC (permalink / raw) To: Bob Pearson, jgg@nvidia.com, leonro@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org On 14/01/2023 07:27, Bob Pearson wrote: > Isolate mr specific code from atomic_write_reply() in rxe_resp.c into > a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. > Check length for atomic write operation. > Make iova_to_vaddr() static. > > Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > --- > v3: > Fixed bug reported by kernel test robot. Ifdef'ed out atomic 8 byte > write if CONFIG_64BIT is not defined as orignally intended by the > developers of the atomic write implementation. > link: https://lore.kernel.org/linux-rdma/202301131143.CmoyVcul-lkp@intel.com/ > > drivers/infiniband/sw/rxe/rxe_loc.h | 1 + > drivers/infiniband/sw/rxe/rxe_mr.c | 50 ++++++++++++++++++++++++ > drivers/infiniband/sw/rxe/rxe_resp.c | 58 +++++++++++----------------- > 3 files changed, 73 insertions(+), 36 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h > index bcb1bbcf50df..fd70c71a9e4e 100644 > --- a/drivers/infiniband/sw/rxe/rxe_loc.h > +++ b/drivers/infiniband/sw/rxe/rxe_loc.h > @@ -74,6 +74,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, > void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); > int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > u64 compare, u64 swap_add, u64 *orig_val); > +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr); > struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, > enum rxe_mr_lookup_type type); > int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c > index 791731be6067..1e74f5e8e10b 100644 > --- a/drivers/infiniband/sw/rxe/rxe_mr.c > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c > @@ -568,6 +568,56 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > return 0; > } > > +/** > + * rxe_mr_do_atomic_write() - write 64bit value to iova from addr > + * @mr: memory region > + * @iova: iova in mr > + * @addr: source of data to write > + * > + * Returns: > + * 0 on success > + * -1 for misaligned address > + * -2 for access errors > + * -3 for cpu without native 64 bit support Well, i'm not favor of such error code. especialy when it's not a staic subroutine. Any comments from other maintainers ? > + */ > +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) > +{ > +#if defined CONFIG_64BIT > + u64 *vaddr; > + u64 value; > + unsigned int length = 8; > + > + /* See IBA oA19-28 */ > + if (unlikely(mr->state != RXE_MR_STATE_VALID)) { > + rxe_dbg_mr(mr, "mr not valid"); > + return -2; > + } > + > + /* See IBA A19.4.2 */ > + if (unlikely((uintptr_t)vaddr & 0x7 || iova & 0x7)) { vaddr used before being initialized Thanks Zhijian > + rxe_dbg_mr(mr, "misaligned address"); > + return -1; > + } > + > + vaddr = iova_to_vaddr(mr, iova, length); > + if (unlikely(!vaddr)) { > + rxe_dbg_mr(mr, "iova out of range"); > + return -2; > + } > + > + /* this makes no sense. What of payload is not 8? 
*/ > + memcpy(&value, addr, length); > + > + /* Do atomic write after all prior operations have completed */ > + smp_store_release(vaddr, value); > + > + return 0; > +#else > + rxe_dbg_mr(mr, "64 bit integers not atomic"); > + return -3; > +#endif > +} > + > int advance_dma_data(struct rxe_dma_info *dma, unsigned int length) > { > struct rxe_sge *sge = &dma->sge[dma->cur_sge]; > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > index 02301e3f343c..1083cda22c65 100644 > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > @@ -762,26 +762,33 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, > return RESPST_ACKNOWLEDGE; > } > > -#ifdef CONFIG_64BIT > -static enum resp_states do_atomic_write(struct rxe_qp *qp, > - struct rxe_pkt_info *pkt) > +static enum resp_states atomic_write_reply(struct rxe_qp *qp, > + struct rxe_pkt_info *pkt) > { > + u64 iova = qp->resp.va + qp->resp.offset; > + struct resp_res *res = qp->resp.res; > struct rxe_mr *mr = qp->resp.mr; > + void *addr = payload_addr(pkt); > int payload = payload_size(pkt); > - u64 src, *dst; > - > - if (mr->state != RXE_MR_STATE_VALID) > - return RESPST_ERR_RKEY_VIOLATION; > + int err; > > - memcpy(&src, payload_addr(pkt), payload); > + if (!res) { > + res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_WRITE_MASK); > + qp->resp.res = res; > + } > > - dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, payload); > - /* check vaddr is 8 bytes aligned. */ > - if (!dst || (uintptr_t)dst & 7) > - return RESPST_ERR_MISALIGNED_ATOMIC; > + if (res->replay) > + return RESPST_ACKNOWLEDGE; > > - /* Do atomic write after all prior operations have completed */ > - smp_store_release(dst, src); > + err = rxe_mr_do_atomic_write(mr, iova, addr); > + if (unlikely(err)) { > + if (err == -3) > + return RESPST_ERR_UNSUPPORTED_OPCODE; > + else if (err == -2) > + return RESPST_ERR_RKEY_VIOLATION; > + else > + return RESPST_ERR_MISALIGNED_ATOMIC; > + } > > /* decrease resp.resid to zero */ > qp->resp.resid -= sizeof(payload); > @@ -794,29 +801,8 @@ static enum resp_states do_atomic_write(struct rxe_qp *qp, > > qp->resp.opcode = pkt->opcode; > qp->resp.status = IB_WC_SUCCESS; > - return RESPST_ACKNOWLEDGE; > -} > -#else > -static enum resp_states do_atomic_write(struct rxe_qp *qp, > - struct rxe_pkt_info *pkt) > -{ > - return RESPST_ERR_UNSUPPORTED_OPCODE; > -} > -#endif /* CONFIG_64BIT */ > - > -static enum resp_states atomic_write_reply(struct rxe_qp *qp, > - struct rxe_pkt_info *pkt) > -{ > - struct resp_res *res = qp->resp.res; > > - if (!res) { > - res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_WRITE_MASK); > - qp->resp.res = res; > - } > - > - if (res->replay) > - return RESPST_ACKNOWLEDGE; > - return do_atomic_write(qp, pkt); > + return RESPST_ACKNOWLEDGE; > } > > static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp, ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() 2023-01-16 2:11 ` lizhijian @ 2023-01-16 13:53 ` Jason Gunthorpe 0 siblings, 0 replies; 16+ messages in thread From: Jason Gunthorpe @ 2023-01-16 13:53 UTC (permalink / raw) To: lizhijian@fujitsu.com Cc: Bob Pearson, leonro@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org On Mon, Jan 16, 2023 at 02:11:29AM +0000, lizhijian@fujitsu.com wrote: > > +/** > > + * rxe_mr_do_atomic_write() - write 64bit value to iova from addr > > + * @mr: memory region > > + * @iova: iova in mr > > + * @addr: source of data to write > > + * > > + * Returns: > > + * 0 on success > > + * -1 for misaligned address > > + * -2 for access errors > > + * -3 for cpu without native 64 bit support > > Well, i'm not favor of such error code. especialy when it's not a staic subroutine. > Any comments from other maintainers ? It should at least have proper constants, but can it be structured to avoid this in the first place? Jason ^ permalink raw reply [flat|nested] 16+ messages in thread
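
One possible shape for this, using standard errnos in place of the bare
-1/-2/-3 (an illustration only, not something the series does):

    /* in rxe_mr_do_atomic_write() */
    if (unlikely(mr->state != RXE_MR_STATE_VALID))
        return -EFAULT;         /* access error */
    if (unlikely(iova & 0x7))
        return -EINVAL;         /* misaligned address */
    ...
    return -EOPNOTSUPP;         /* no native 64 bit support */

    /* and in atomic_write_reply() */
    switch (err) {
    case -EOPNOTSUPP:
        return RESPST_ERR_UNSUPPORTED_OPCODE;
    case -EFAULT:
        return RESPST_ERR_RKEY_VIOLATION;
    case -EINVAL:
        return RESPST_ERR_MISALIGNED_ATOMIC;
    }
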
* Re: [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() 2023-01-13 23:27 ` [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() Bob Pearson 2023-01-16 2:11 ` lizhijian @ 2023-01-17 1:36 ` Zhu Yanjun 2023-01-17 13:47 ` Jason Gunthorpe 1 sibling, 1 reply; 16+ messages in thread From: Zhu Yanjun @ 2023-01-17 1:36 UTC (permalink / raw) To: Bob Pearson; +Cc: jgg, leonro, linux-rdma On Sat, Jan 14, 2023 at 7:28 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > Isolate mr specific code from atomic_write_reply() in rxe_resp.c into > a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. > Check length for atomic write operation. > Make iova_to_vaddr() static. > > Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > --- > v3: > Fixed bug reported by kernel test robot. Ifdef'ed out atomic 8 byte > write if CONFIG_64BIT is not defined as orignally intended by the > developers of the atomic write implementation. > link: https://lore.kernel.org/linux-rdma/202301131143.CmoyVcul-lkp@intel.com/ > > drivers/infiniband/sw/rxe/rxe_loc.h | 1 + > drivers/infiniband/sw/rxe/rxe_mr.c | 50 ++++++++++++++++++++++++ > drivers/infiniband/sw/rxe/rxe_resp.c | 58 +++++++++++----------------- > 3 files changed, 73 insertions(+), 36 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h > index bcb1bbcf50df..fd70c71a9e4e 100644 > --- a/drivers/infiniband/sw/rxe/rxe_loc.h > +++ b/drivers/infiniband/sw/rxe/rxe_loc.h > @@ -74,6 +74,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, > void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); > int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > u64 compare, u64 swap_add, u64 *orig_val); > +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr); > struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, > enum rxe_mr_lookup_type type); > int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c > index 791731be6067..1e74f5e8e10b 100644 > --- a/drivers/infiniband/sw/rxe/rxe_mr.c > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c > @@ -568,6 +568,56 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > return 0; > } > > +/** > + * rxe_mr_do_atomic_write() - write 64bit value to iova from addr > + * @mr: memory region > + * @iova: iova in mr > + * @addr: source of data to write > + * > + * Returns: > + * 0 on success > + * -1 for misaligned address > + * -2 for access errors > + * -3 for cpu without native 64 bit support > + */ > +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) > +{ > +#if defined CONFIG_64BIT IS_ENABLED is better? Zhu Yanjun > + u64 *vaddr; > + u64 value; > + unsigned int length = 8; > + > + /* See IBA oA19-28 */ > + if (unlikely(mr->state != RXE_MR_STATE_VALID)) { > + rxe_dbg_mr(mr, "mr not valid"); > + return -2; > + } > + > + /* See IBA A19.4.2 */ > + if (unlikely((uintptr_t)vaddr & 0x7 || iova & 0x7)) { > + rxe_dbg_mr(mr, "misaligned address"); > + return -1; > + } > + > + vaddr = iova_to_vaddr(mr, iova, length); > + if (unlikely(!vaddr)) { > + rxe_dbg_mr(mr, "iova out of range"); > + return -2; > + } > + > + /* this makes no sense. What of payload is not 8? 
*/ > + memcpy(&value, addr, length); > + > + /* Do atomic write after all prior operations have completed */ > + smp_store_release(vaddr, value); > + > + return 0; > +#else > + rxe_dbg_mr(mr, "64 bit integers not atomic"); > + return -3; > +#endif > +} > + > int advance_dma_data(struct rxe_dma_info *dma, unsigned int length) > { > struct rxe_sge *sge = &dma->sge[dma->cur_sge]; > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > index 02301e3f343c..1083cda22c65 100644 > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > @@ -762,26 +762,33 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, > return RESPST_ACKNOWLEDGE; > } > > -#ifdef CONFIG_64BIT > -static enum resp_states do_atomic_write(struct rxe_qp *qp, > - struct rxe_pkt_info *pkt) > +static enum resp_states atomic_write_reply(struct rxe_qp *qp, > + struct rxe_pkt_info *pkt) > { > + u64 iova = qp->resp.va + qp->resp.offset; > + struct resp_res *res = qp->resp.res; > struct rxe_mr *mr = qp->resp.mr; > + void *addr = payload_addr(pkt); > int payload = payload_size(pkt); > - u64 src, *dst; > - > - if (mr->state != RXE_MR_STATE_VALID) > - return RESPST_ERR_RKEY_VIOLATION; > + int err; > > - memcpy(&src, payload_addr(pkt), payload); > + if (!res) { > + res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_WRITE_MASK); > + qp->resp.res = res; > + } > > - dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, payload); > - /* check vaddr is 8 bytes aligned. */ > - if (!dst || (uintptr_t)dst & 7) > - return RESPST_ERR_MISALIGNED_ATOMIC; > + if (res->replay) > + return RESPST_ACKNOWLEDGE; > > - /* Do atomic write after all prior operations have completed */ > - smp_store_release(dst, src); > + err = rxe_mr_do_atomic_write(mr, iova, addr); > + if (unlikely(err)) { > + if (err == -3) > + return RESPST_ERR_UNSUPPORTED_OPCODE; > + else if (err == -2) > + return RESPST_ERR_RKEY_VIOLATION; > + else > + return RESPST_ERR_MISALIGNED_ATOMIC; > + } > > /* decrease resp.resid to zero */ > qp->resp.resid -= sizeof(payload); > @@ -794,29 +801,8 @@ static enum resp_states do_atomic_write(struct rxe_qp *qp, > > qp->resp.opcode = pkt->opcode; > qp->resp.status = IB_WC_SUCCESS; > - return RESPST_ACKNOWLEDGE; > -} > -#else > -static enum resp_states do_atomic_write(struct rxe_qp *qp, > - struct rxe_pkt_info *pkt) > -{ > - return RESPST_ERR_UNSUPPORTED_OPCODE; > -} > -#endif /* CONFIG_64BIT */ > - > -static enum resp_states atomic_write_reply(struct rxe_qp *qp, > - struct rxe_pkt_info *pkt) > -{ > - struct resp_res *res = qp->resp.res; > > - if (!res) { > - res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_WRITE_MASK); > - qp->resp.res = res; > - } > - > - if (res->replay) > - return RESPST_ACKNOWLEDGE; > - return do_atomic_write(qp, pkt); > + return RESPST_ACKNOWLEDGE; > } > > static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp, > -- > 2.37.2 > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() 2023-01-17 1:36 ` Zhu Yanjun @ 2023-01-17 13:47 ` Jason Gunthorpe 2023-01-17 16:45 ` Bob Pearson 0 siblings, 1 reply; 16+ messages in thread From: Jason Gunthorpe @ 2023-01-17 13:47 UTC (permalink / raw) To: Zhu Yanjun; +Cc: Bob Pearson, leonro, linux-rdma On Tue, Jan 17, 2023 at 09:36:02AM +0800, Zhu Yanjun wrote: > On Sat, Jan 14, 2023 at 7:28 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > > > Isolate mr specific code from atomic_write_reply() in rxe_resp.c into > > a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. > > Check length for atomic write operation. > > Make iova_to_vaddr() static. > > > > Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > > --- > > v3: > > Fixed bug reported by kernel test robot. Ifdef'ed out atomic 8 byte > > write if CONFIG_64BIT is not defined as orignally intended by the > > developers of the atomic write implementation. > > link: https://lore.kernel.org/linux-rdma/202301131143.CmoyVcul-lkp@intel.com/ > > > > drivers/infiniband/sw/rxe/rxe_loc.h | 1 + > > drivers/infiniband/sw/rxe/rxe_mr.c | 50 ++++++++++++++++++++++++ > > drivers/infiniband/sw/rxe/rxe_resp.c | 58 +++++++++++----------------- > > 3 files changed, 73 insertions(+), 36 deletions(-) > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h > > index bcb1bbcf50df..fd70c71a9e4e 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_loc.h > > +++ b/drivers/infiniband/sw/rxe/rxe_loc.h > > @@ -74,6 +74,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, > > void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); > > int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > > u64 compare, u64 swap_add, u64 *orig_val); > > +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr); > > struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, > > enum rxe_mr_lookup_type type); > > int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); > > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c > > index 791731be6067..1e74f5e8e10b 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_mr.c > > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c > > @@ -568,6 +568,56 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > > return 0; > > } > > > > +/** > > + * rxe_mr_do_atomic_write() - write 64bit value to iova from addr > > + * @mr: memory region > > + * @iova: iova in mr > > + * @addr: source of data to write > > + * > > + * Returns: > > + * 0 on success > > + * -1 for misaligned address > > + * -2 for access errors > > + * -3 for cpu without native 64 bit support > > + */ > > +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) > > +{ > > +#if defined CONFIG_64BIT > > IS_ENABLED is better? is_enabled won't work here because the code doesn't compile. Jason ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() 2023-01-17 13:47 ` Jason Gunthorpe @ 2023-01-17 16:45 ` Bob Pearson 2023-01-18 0:18 ` Zhu Yanjun 0 siblings, 1 reply; 16+ messages in thread From: Bob Pearson @ 2023-01-17 16:45 UTC (permalink / raw) To: Jason Gunthorpe, Zhu Yanjun; +Cc: leonro, linux-rdma On 1/17/23 07:47, Jason Gunthorpe wrote: > On Tue, Jan 17, 2023 at 09:36:02AM +0800, Zhu Yanjun wrote: >> On Sat, Jan 14, 2023 at 7:28 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: >>> >>> Isolate mr specific code from atomic_write_reply() in rxe_resp.c into >>> a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. >>> Check length for atomic write operation. >>> Make iova_to_vaddr() static. >>> >>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> >>> --- >>> v3: >>> Fixed bug reported by kernel test robot. Ifdef'ed out atomic 8 byte >>> write if CONFIG_64BIT is not defined as orignally intended by the >>> developers of the atomic write implementation. >>> link: https://lore.kernel.org/linux-rdma/202301131143.CmoyVcul-lkp@intel.com/ >>> >>> drivers/infiniband/sw/rxe/rxe_loc.h | 1 + >>> drivers/infiniband/sw/rxe/rxe_mr.c | 50 ++++++++++++++++++++++++ >>> drivers/infiniband/sw/rxe/rxe_resp.c | 58 +++++++++++----------------- >>> 3 files changed, 73 insertions(+), 36 deletions(-) >>> >>> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h >>> index bcb1bbcf50df..fd70c71a9e4e 100644 >>> --- a/drivers/infiniband/sw/rxe/rxe_loc.h >>> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h >>> @@ -74,6 +74,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, >>> void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); >>> int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, >>> u64 compare, u64 swap_add, u64 *orig_val); >>> +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr); >>> struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, >>> enum rxe_mr_lookup_type type); >>> int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); >>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c >>> index 791731be6067..1e74f5e8e10b 100644 >>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c >>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c >>> @@ -568,6 +568,56 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, >>> return 0; >>> } >>> >>> +/** >>> + * rxe_mr_do_atomic_write() - write 64bit value to iova from addr >>> + * @mr: memory region >>> + * @iova: iova in mr >>> + * @addr: source of data to write >>> + * >>> + * Returns: >>> + * 0 on success >>> + * -1 for misaligned address >>> + * -2 for access errors >>> + * -3 for cpu without native 64 bit support >>> + */ >>> +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) >>> +{ >>> +#if defined CONFIG_64BIT >> >> IS_ENABLED is better? > > is_enabled won't work here because the code doesn't compile. > > Jason exactly. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() 2023-01-17 16:45 ` Bob Pearson @ 2023-01-18 0:18 ` Zhu Yanjun 2023-01-18 13:54 ` Jason Gunthorpe 0 siblings, 1 reply; 16+ messages in thread From: Zhu Yanjun @ 2023-01-18 0:18 UTC (permalink / raw) To: Bob Pearson; +Cc: Jason Gunthorpe, leonro, linux-rdma On Wed, Jan 18, 2023 at 12:45 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > On 1/17/23 07:47, Jason Gunthorpe wrote: > > On Tue, Jan 17, 2023 at 09:36:02AM +0800, Zhu Yanjun wrote: > >> On Sat, Jan 14, 2023 at 7:28 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > >>> > >>> Isolate mr specific code from atomic_write_reply() in rxe_resp.c into > >>> a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. > >>> Check length for atomic write operation. > >>> Make iova_to_vaddr() static. > >>> > >>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > >>> --- > >>> v3: > >>> Fixed bug reported by kernel test robot. Ifdef'ed out atomic 8 byte > >>> write if CONFIG_64BIT is not defined as orignally intended by the > >>> developers of the atomic write implementation. > >>> link: https://lore.kernel.org/linux-rdma/202301131143.CmoyVcul-lkp@intel.com/ > >>> > >>> drivers/infiniband/sw/rxe/rxe_loc.h | 1 + > >>> drivers/infiniband/sw/rxe/rxe_mr.c | 50 ++++++++++++++++++++++++ > >>> drivers/infiniband/sw/rxe/rxe_resp.c | 58 +++++++++++----------------- > >>> 3 files changed, 73 insertions(+), 36 deletions(-) > >>> > >>> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h > >>> index bcb1bbcf50df..fd70c71a9e4e 100644 > >>> --- a/drivers/infiniband/sw/rxe/rxe_loc.h > >>> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h > >>> @@ -74,6 +74,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, > >>> void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); > >>> int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > >>> u64 compare, u64 swap_add, u64 *orig_val); > >>> +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr); > >>> struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, > >>> enum rxe_mr_lookup_type type); > >>> int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); > >>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c > >>> index 791731be6067..1e74f5e8e10b 100644 > >>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c > >>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c > >>> @@ -568,6 +568,56 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > >>> return 0; > >>> } > >>> > >>> +/** > >>> + * rxe_mr_do_atomic_write() - write 64bit value to iova from addr > >>> + * @mr: memory region > >>> + * @iova: iova in mr > >>> + * @addr: source of data to write > >>> + * > >>> + * Returns: > >>> + * 0 on success > >>> + * -1 for misaligned address > >>> + * -2 for access errors > >>> + * -3 for cpu without native 64 bit support > >>> + */ > >>> +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) > >>> +{ > >>> +#if defined CONFIG_64BIT > >> > >> IS_ENABLED is better? > > > > is_enabled won't work here because the code doesn't compile. > > drivers/infiniband/sw/rxe/rxe_net.c: 45 46 #if IS_ENABLED(CONFIG_IPV6) 47 static struct dst_entry *rxe_find_route6(struct rxe_qp *qp, 48 struct net_device *ndev, 49 struct in6_addr *saddr, 50 struct in6_addr *daddr) 51 { 52 struct dst_entry *ndst; 53 struct flowi6 fl6 = { { 0 } }; Zhu Yanjun > > Jason > > exactly. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() 2023-01-18 0:18 ` Zhu Yanjun @ 2023-01-18 13:54 ` Jason Gunthorpe 0 siblings, 0 replies; 16+ messages in thread From: Jason Gunthorpe @ 2023-01-18 13:54 UTC (permalink / raw) To: Zhu Yanjun; +Cc: Bob Pearson, leonro, linux-rdma On Wed, Jan 18, 2023 at 08:18:06AM +0800, Zhu Yanjun wrote: > On Wed, Jan 18, 2023 at 12:45 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > > > On 1/17/23 07:47, Jason Gunthorpe wrote: > > > On Tue, Jan 17, 2023 at 09:36:02AM +0800, Zhu Yanjun wrote: > > >> On Sat, Jan 14, 2023 at 7:28 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > >>> > > >>> Isolate mr specific code from atomic_write_reply() in rxe_resp.c into > > >>> a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. > > >>> Check length for atomic write operation. > > >>> Make iova_to_vaddr() static. > > >>> > > >>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > > >>> --- > > >>> v3: > > >>> Fixed bug reported by kernel test robot. Ifdef'ed out atomic 8 byte > > >>> write if CONFIG_64BIT is not defined as orignally intended by the > > >>> developers of the atomic write implementation. > > >>> link: https://lore.kernel.org/linux-rdma/202301131143.CmoyVcul-lkp@intel.com/ > > >>> > > >>> drivers/infiniband/sw/rxe/rxe_loc.h | 1 + > > >>> drivers/infiniband/sw/rxe/rxe_mr.c | 50 ++++++++++++++++++++++++ > > >>> drivers/infiniband/sw/rxe/rxe_resp.c | 58 +++++++++++----------------- > > >>> 3 files changed, 73 insertions(+), 36 deletions(-) > > >>> > > >>> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h > > >>> index bcb1bbcf50df..fd70c71a9e4e 100644 > > >>> --- a/drivers/infiniband/sw/rxe/rxe_loc.h > > >>> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h > > >>> @@ -74,6 +74,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, > > >>> void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); > > >>> int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > > >>> u64 compare, u64 swap_add, u64 *orig_val); > > >>> +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr); > > >>> struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, > > >>> enum rxe_mr_lookup_type type); > > >>> int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); > > >>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c > > >>> index 791731be6067..1e74f5e8e10b 100644 > > >>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c > > >>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c > > >>> @@ -568,6 +568,56 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, > > >>> return 0; > > >>> } > > >>> > > >>> +/** > > >>> + * rxe_mr_do_atomic_write() - write 64bit value to iova from addr > > >>> + * @mr: memory region > > >>> + * @iova: iova in mr > > >>> + * @addr: source of data to write > > >>> + * > > >>> + * Returns: > > >>> + * 0 on success > > >>> + * -1 for misaligned address > > >>> + * -2 for access errors > > >>> + * -3 for cpu without native 64 bit support > > >>> + */ > > >>> +int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) > > >>> +{ > > >>> +#if defined CONFIG_64BIT > > >> > > >> IS_ENABLED is better? > > > > > > is_enabled won't work here because the code doesn't compile. > > > > > drivers/infiniband/sw/rxe/rxe_net.c: > > 45 > 46 #if IS_ENABLED(CONFIG_IPV6) You only need this pattern if the config symbol is tristate CONFIG_64BIT is a bool Jason ^ permalink raw reply [flat|nested] 16+ messages in thread
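
To spell the distinction out (illustrative fragments, not code from the
series):

    /* if (IS_ENABLED(...)): both arms are still parsed on every arch,
     * so the 64 bit only arm must at least compile on 32 bit.
     */
    if (IS_ENABLED(CONFIG_64BIT))
        smp_store_release(vaddr, value);
    else
        return -EOPNOTSUPP;

    /* #ifdef: the 64 bit only code is dropped from 32 bit builds
     * entirely, which is what this patch relies on.
     */
    #ifdef CONFIG_64BIT
        smp_store_release(vaddr, value);
    #else
        return -EOPNOTSUPP;
    #endif

The '#if IS_ENABLED(CONFIG_FOO)' form quoted from rxe_net.c is only
needed when CONFIG_FOO is tristate (y or m); a bool such as CONFIG_64BIT
can use a plain #ifdef, which is the point made above.
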
* [PATCH for-next v3 5/6] RDMA/rxe: Cleanup page variables in rxe_mr.c 2023-01-13 23:26 [PATCH for-next v3 0/6] RDMA/rxe: Replace mr page map with an xarray Bob Pearson ` (3 preceding siblings ...) 2023-01-13 23:27 ` [PATCH for-next v3 4/6] RDMA-rxe: Isolate mr code from atomic_write_reply() Bob Pearson @ 2023-01-13 23:27 ` Bob Pearson 2023-01-13 23:27 ` [PATCH for-next v3 6/6] RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray Bob Pearson 5 siblings, 0 replies; 16+ messages in thread From: Bob Pearson @ 2023-01-13 23:27 UTC (permalink / raw) To: jgg, leonro, zyjzyj2000, linux-rdma; +Cc: Bob Pearson Cleanup usage of mr->page_shift and mr->page_mask and introduce an extractor for mr->ibmr.page_size. Normal usage in the kernel has page_mask masking out offset in page rather than masking out the page number. The rxe driver had reversed that which was confusing. Implicitly there can be a per mr page_size which was not uniformly supported. Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> --- drivers/infiniband/sw/rxe/rxe_mr.c | 31 ++++++++++++--------------- drivers/infiniband/sw/rxe/rxe_verbs.h | 11 +++++++--- 2 files changed, 22 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 1e74f5e8e10b..fd5537ee7f04 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -60,6 +60,9 @@ static void rxe_mr_init(int access, struct rxe_mr *mr) mr->lkey = mr->ibmr.lkey = lkey; mr->rkey = mr->ibmr.rkey = rkey; + mr->ibmr.page_size = PAGE_SIZE; + mr->page_mask = PAGE_MASK; + mr->page_shift = PAGE_SHIFT; mr->state = RXE_MR_STATE_INVALID; } @@ -149,9 +152,6 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, goto err_release_umem; } - mr->page_shift = PAGE_SHIFT; - mr->page_mask = PAGE_SIZE - 1; - num_buf = 0; map = mr->map; if (length > 0) { @@ -180,7 +180,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, goto err_release_umem; } buf->addr = (uintptr_t)vaddr; - buf->size = PAGE_SIZE; + buf->size = mr_page_size(mr); num_buf++; buf++; @@ -189,10 +189,9 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, mr->umem = umem; mr->access = access; - mr->offset = ib_umem_offset(umem); + mr->page_offset = ib_umem_offset(umem); mr->state = RXE_MR_STATE_VALID; mr->ibmr.type = IB_MR_TYPE_USER; - mr->ibmr.page_size = PAGE_SIZE; return 0; @@ -246,29 +245,27 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, unsigned int *sg_offset) { struct rxe_mr *mr = to_rmr(ibmr); - int n; - - mr->nbuf = 0; + unsigned int page_size = mr_page_size(mr); - n = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, rxe_set_page); + mr->page_shift = ilog2(page_size); + mr->page_mask = ~((u64)page_size - 1); + mr->page_offset = ibmr->iova & (page_size - 1); - mr->page_shift = ilog2(ibmr->page_size); - mr->page_mask = ibmr->page_size - 1; - mr->offset = ibmr->iova & mr->page_mask; + mr->nbuf = 0; - return n; + return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, rxe_set_page); } static void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out, size_t *offset_out) { - size_t offset = iova - mr->ibmr.iova + mr->offset; + size_t offset = iova - mr->ibmr.iova + mr->page_offset; int map_index; int buf_index; u64 length; if (likely(mr->page_shift)) { - *offset_out = offset & mr->page_mask; + *offset_out = offset & (mr_page_size(mr) - 1); offset >>= mr->page_shift; *n_out = offset & mr->map_mask; *m_out = offset >> mr->map_shift; @@ 
-342,7 +339,7 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length) if (mr->ibmr.type == IB_MR_TYPE_DMA) return -EFAULT; - offset = (iova - mr->ibmr.iova + mr->offset) & mr->page_mask; + offset = (iova - mr->ibmr.iova + mr->page_offset) & mr->page_mask; while (length > 0) { u8 *va; int bytes; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 19ddfa890480..bfc94caaeec5 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -310,11 +310,11 @@ struct rxe_mr { u32 lkey; u32 rkey; enum rxe_mr_state state; - u32 offset; int access; - int page_shift; - int page_mask; + unsigned int page_offset; + unsigned int page_shift; + u64 page_mask; int map_shift; int map_mask; @@ -329,6 +329,11 @@ struct rxe_mr { struct rxe_map **map; }; +static inline unsigned int mr_page_size(struct rxe_mr *mr) +{ + return mr ? mr->ibmr.page_size : PAGE_SIZE; +} + enum rxe_mw_state { RXE_MW_STATE_INVALID = RXE_MR_STATE_INVALID, RXE_MW_STATE_FREE = RXE_MR_STATE_FREE, -- 2.37.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
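
A small worked example of the mask convention this patch adopts (assumed
values; plain arithmetic, not driver code):

    u64 iova        = 0x12345678;
    u64 page_size   = 4096;
    u64 page_mask   = ~(page_size - 1);          /* 0xfffffffffffff000 */

    u64 page_addr   = iova & page_mask;          /* 0x12345000, the page part */
    u64 page_offset = iova & (page_size - 1);    /* 0x678, the in-page offset */

The old code stored the offset mask (page_size - 1) in mr->page_mask, the
reverse of the kernel's PAGE_MASK convention, which is what the commit
message calls confusing.
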
* [PATCH for-next v3 6/6] RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray 2023-01-13 23:26 [PATCH for-next v3 0/6] RDMA/rxe: Replace mr page map with an xarray Bob Pearson ` (4 preceding siblings ...) 2023-01-13 23:27 ` [PATCH for-next v3 5/6] RDMA/rxe: Cleanup page variables in rxe_mr.c Bob Pearson @ 2023-01-13 23:27 ` Bob Pearson 2023-01-16 2:21 ` lizhijian 5 siblings, 1 reply; 16+ messages in thread From: Bob Pearson @ 2023-01-13 23:27 UTC (permalink / raw) To: jgg, leonro, zyjzyj2000, linux-rdma; +Cc: Bob Pearson Replace struct rxe-phys_buf and struct rxe_map by struct xarray in rxe_verbs.h. This allows using rcu locking on reads for the memory maps stored in each mr. This is based off of a sketch of a patch from Jason Gunthorpe in the link below. Some changes were needed to make this work. It applies cleanly to the current for-next and passes the pyverbs, perftest and the same blktests test cases which run today. Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/ Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> --- drivers/infiniband/sw/rxe/rxe_loc.h | 1 - drivers/infiniband/sw/rxe/rxe_mr.c | 533 ++++++++++++-------------- drivers/infiniband/sw/rxe/rxe_resp.c | 2 +- drivers/infiniband/sw/rxe/rxe_verbs.h | 21 +- 4 files changed, 251 insertions(+), 306 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index fd70c71a9e4e..0cf78f9bb27c 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -71,7 +71,6 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma, void *addr, int length, enum rxe_mr_copy_dir dir); int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, unsigned int *sg_offset); -void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, u64 compare, u64 swap_add, u64 *orig_val); int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index fd5537ee7f04..e4634279080a 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -60,145 +60,113 @@ static void rxe_mr_init(int access, struct rxe_mr *mr) mr->lkey = mr->ibmr.lkey = lkey; mr->rkey = mr->ibmr.rkey = rkey; + mr->access = access; mr->ibmr.page_size = PAGE_SIZE; mr->page_mask = PAGE_MASK; mr->page_shift = PAGE_SHIFT; mr->state = RXE_MR_STATE_INVALID; } -static int rxe_mr_alloc(struct rxe_mr *mr, int num_buf) -{ - int i; - int num_map; - struct rxe_map **map = mr->map; - - num_map = (num_buf + RXE_BUF_PER_MAP - 1) / RXE_BUF_PER_MAP; - - mr->map = kmalloc_array(num_map, sizeof(*map), GFP_KERNEL); - if (!mr->map) - goto err1; - - for (i = 0; i < num_map; i++) { - mr->map[i] = kmalloc(sizeof(**map), GFP_KERNEL); - if (!mr->map[i]) - goto err2; - } - - BUILD_BUG_ON(!is_power_of_2(RXE_BUF_PER_MAP)); - - mr->map_shift = ilog2(RXE_BUF_PER_MAP); - mr->map_mask = RXE_BUF_PER_MAP - 1; - - mr->num_buf = num_buf; - mr->num_map = num_map; - mr->max_buf = num_map * RXE_BUF_PER_MAP; - - return 0; - -err2: - for (i--; i >= 0; i--) - kfree(mr->map[i]); - - kfree(mr->map); - mr->map = NULL; -err1: - return -ENOMEM; -} - void rxe_mr_init_dma(int access, struct rxe_mr *mr) { rxe_mr_init(access, mr); - mr->access = access; mr->state = RXE_MR_STATE_VALID; mr->ibmr.type = IB_MR_TYPE_DMA; } -static bool is_pmem_page(struct page *pg) +static unsigned long 
rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova) { - unsigned long paddr = page_to_phys(pg); + return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift); +} - return REGION_INTERSECTS == - region_intersects(paddr, PAGE_SIZE, IORESOURCE_MEM, - IORES_DESC_PERSISTENT_MEMORY); +static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova) +{ + return iova & (mr_page_size(mr) - 1); +} + +static int rxe_mr_fill_pages_from_sgt(struct rxe_mr *mr, struct sg_table *sgt) +{ + XA_STATE(xas, &mr->pages, 0); + struct sg_page_iter sg_iter; + + xa_init(&mr->pages); + + __sg_page_iter_start(&sg_iter, sgt->sgl, sgt->orig_nents, 0); + if (!__sg_page_iter_next(&sg_iter)) + return 0; + + do { + xas_lock(&xas); + while (true) { + xas_store(&xas, sg_page_iter_page(&sg_iter)); + if (xas_error(&xas)) + break; + xas_next(&xas); + if (!__sg_page_iter_next(&sg_iter)) + break; + } + xas_unlock(&xas); + } while (xas_nomem(&xas, GFP_KERNEL)); + + return xas_error(&xas); } int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, int access, struct rxe_mr *mr) { - struct rxe_map **map; - struct rxe_phys_buf *buf = NULL; - struct ib_umem *umem; - struct sg_page_iter sg_iter; - int num_buf; - void *vaddr; + struct ib_umem *umem; int err; + rxe_mr_init(access, mr); + umem = ib_umem_get(&rxe->ib_dev, start, length, access); if (IS_ERR(umem)) { rxe_dbg_mr(mr, "Unable to pin memory region err = %d\n", (int)PTR_ERR(umem)); - err = PTR_ERR(umem); - goto err_out; + return PTR_ERR(umem); } - num_buf = ib_umem_num_pages(umem); - - rxe_mr_init(access, mr); - - err = rxe_mr_alloc(mr, num_buf); + err = rxe_mr_fill_pages_from_sgt(mr, &umem->sgt_append.sgt); if (err) { - rxe_dbg_mr(mr, "Unable to allocate memory for map\n"); - goto err_release_umem; + ib_umem_release(umem); + return err; } - num_buf = 0; - map = mr->map; - if (length > 0) { - bool persistent_access = access & IB_ACCESS_FLUSH_PERSISTENT; - - buf = map[0]->buf; - for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) { - struct page *pg = sg_page_iter_page(&sg_iter); + mr->umem = umem; + mr->ibmr.type = IB_MR_TYPE_USER; + mr->state = RXE_MR_STATE_VALID; - if (persistent_access && !is_pmem_page(pg)) { - rxe_dbg_mr(mr, "Unable to register persistent access to non-pmem device\n"); - err = -EINVAL; - goto err_release_umem; - } + return 0; +} - if (num_buf >= RXE_BUF_PER_MAP) { - map++; - buf = map[0]->buf; - num_buf = 0; - } +static int rxe_mr_alloc(struct rxe_mr *mr, int num_buf) +{ + XA_STATE(xas, &mr->pages, 0); + int i = 0; + int err; - vaddr = page_address(pg); - if (!vaddr) { - rxe_dbg_mr(mr, "Unable to get virtual address\n"); - err = -ENOMEM; - goto err_release_umem; - } - buf->addr = (uintptr_t)vaddr; - buf->size = mr_page_size(mr); - num_buf++; - buf++; + xa_init(&mr->pages); + do { + xas_lock(&xas); + while (i != num_buf) { + xas_store(&xas, XA_ZERO_ENTRY); + if (xas_error(&xas)) + break; + xas_next(&xas); + i++; } - } + xas_unlock(&xas); + } while (xas_nomem(&xas, GFP_KERNEL)); - mr->umem = umem; - mr->access = access; - mr->page_offset = ib_umem_offset(umem); - mr->state = RXE_MR_STATE_VALID; - mr->ibmr.type = IB_MR_TYPE_USER; + err = xas_error(&xas); + if (err) + return err; - return 0; + mr->num_buf = num_buf; -err_release_umem: - ib_umem_release(umem); -err_out: - return err; + return 0; } int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr) @@ -212,7 +180,6 @@ int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr) if (err) goto err1; - mr->max_buf = max_pages; mr->state = RXE_MR_STATE_FREE; mr->ibmr.type = 
IB_MR_TYPE_MEM_REG; @@ -222,116 +189,100 @@ int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr) return err; } -static int rxe_set_page(struct ib_mr *ibmr, u64 addr) +static int rxe_set_page(struct ib_mr *ibmr, u64 iova) { struct rxe_mr *mr = to_rmr(ibmr); - struct rxe_map *map; - struct rxe_phys_buf *buf; + struct page *page = virt_to_page(iova & mr->page_mask); + XA_STATE(xas, &mr->pages, mr->nbuf); + int err; if (unlikely(mr->nbuf == mr->num_buf)) return -ENOMEM; - map = mr->map[mr->nbuf / RXE_BUF_PER_MAP]; - buf = &map->buf[mr->nbuf % RXE_BUF_PER_MAP]; + do { + xas_lock(&xas); + xas_store(&xas, page); + xas_unlock(&xas); + } while (xas_nomem(&xas, GFP_KERNEL)); - buf->addr = addr; - buf->size = ibmr->page_size; - mr->nbuf++; + err = xas_error(&xas); + if (err) + return err; + mr->nbuf++; return 0; } -int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, +int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl, int sg_nents, unsigned int *sg_offset) { struct rxe_mr *mr = to_rmr(ibmr); unsigned int page_size = mr_page_size(mr); + mr->nbuf = 0; mr->page_shift = ilog2(page_size); mr->page_mask = ~((u64)page_size - 1); - mr->page_offset = ibmr->iova & (page_size - 1); + mr->page_offset = mr->ibmr.iova & (page_size - 1); - mr->nbuf = 0; - - return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, rxe_set_page); + return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page); } -static void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out, - size_t *offset_out) -{ - size_t offset = iova - mr->ibmr.iova + mr->page_offset; - int map_index; - int buf_index; - u64 length; - - if (likely(mr->page_shift)) { - *offset_out = offset & (mr_page_size(mr) - 1); - offset >>= mr->page_shift; - *n_out = offset & mr->map_mask; - *m_out = offset >> mr->map_shift; - } else { - map_index = 0; - buf_index = 0; - - length = mr->map[map_index]->buf[buf_index].size; - - while (offset >= length) { - offset -= length; - buf_index++; - - if (buf_index == RXE_BUF_PER_MAP) { - map_index++; - buf_index = 0; - } - length = mr->map[map_index]->buf[buf_index].size; - } - - *m_out = map_index; - *n_out = buf_index; - *offset_out = offset; - } -} - -void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length) +/* + * TODO: Attempting to modify the mr page map between the time + * a packet is received and the map is referenced as here + * in xa_load(&mr->pages) will cause problems. It is OK to + * deregister the mr since the mr reference counts will preserve + * it until memory accesses are complete. Currently reregister mr + * operations are not supported by the rxe driver but could be + * in the future. Invalidate followed by fast_reg mr will change + * the map and then the rkey so delayed packets arriving in the + * middle could use the wrong map entries. This isn't new but was + * already the case in the earlier implementation. This won't be + * a problem for well behaved programs which wait until all the + * outstanding packets for the first FMR before remapping to the + * second. 
+ */ +static int rxe_mr_copy_xarray(struct rxe_mr *mr, void *addr, + unsigned long index, + unsigned int page_offset, unsigned int length, + enum rxe_mr_copy_dir dir) { - size_t offset; - int m, n; - void *addr; - - if (mr->state != RXE_MR_STATE_VALID) { - rxe_dbg_mr(mr, "Not in valid state\n"); - addr = NULL; - goto out; - } - - if (!mr->map) { - addr = (void *)(uintptr_t)iova; - goto out; - } + struct page *page; + unsigned int bytes; + void *va; - if (mr_check_range(mr, iova, length)) { - rxe_dbg_mr(mr, "Range violation\n"); - addr = NULL; - goto out; - } - - lookup_iova(mr, iova, &m, &n, &offset); - - if (offset + length > mr->map[m]->buf[n].size) { - rxe_dbg_mr(mr, "Crosses page boundary\n"); - addr = NULL; - goto out; + while (length) { + page = xa_load(&mr->pages, index); + if (WARN_ON(!page)) + return -EINVAL; + + bytes = min_t(unsigned int, length, + mr_page_size(mr) - page_offset); + va = kmap_local_page(page); + if (dir == RXE_FROM_MR_OBJ) + memcpy(addr, va + page_offset, bytes); + else + memcpy(va + page_offset, addr, bytes); + kunmap_local(va); + + page_offset = 0; + addr += bytes; + length -= bytes; + index++; } - addr = (void *)(uintptr_t)mr->map[m]->buf[n].addr + offset; - -out: - return addr; + return 0; } +// TODO convert iova to va through xarray and then flush int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length) { - size_t offset; + unsigned int page_offset; + unsigned long index; + struct page *page; + int bytes; + int err; + u8 *va; if (length == 0) return 0; @@ -339,104 +290,84 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length) if (mr->ibmr.type == IB_MR_TYPE_DMA) return -EFAULT; - offset = (iova - mr->ibmr.iova + mr->page_offset) & mr->page_mask; - while (length > 0) { - u8 *va; - int bytes; + err = mr_check_range(mr, iova, length); + if (err) + return err; - bytes = mr->ibmr.page_size - offset; + while (length > 0) { + page_offset = rxe_mr_iova_to_page_offset(mr, iova); + bytes = mr->ibmr.page_size - page_offset; if (bytes > length) bytes = length; - va = iova_to_vaddr(mr, iova, length); + index = rxe_mr_iova_to_index(mr, iova); + page = xa_load(&mr->pages, index); + if (!page) + return -2; + + va = kmap_local_page(page); if (!va) return -EFAULT; - arch_wb_cache_pmem(va, bytes); + arch_wb_cache_pmem(va + page_offset, bytes); length -= bytes; iova += bytes; - offset = 0; + page_offset = 0; } return 0; } -/* copy data from a range (vaddr, vaddr+length-1) to or from - * a mr object starting at iova. - */ +static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 iova, void *addr, + unsigned int page_offset, unsigned int length, + enum rxe_mr_copy_dir dir) +{ + unsigned int bytes; + struct page *page; + u8 *va; + + while (length) { + page = virt_to_page(iova & mr->page_mask); + va = kmap_local_page(page); + bytes = min_t(unsigned int, length, PAGE_SIZE - page_offset); + + if (dir == RXE_TO_MR_OBJ) + memcpy(va + page_offset, addr, bytes); + else + memcpy(addr, va + page_offset, bytes); + + kunmap_local(va); + page_offset = 0; + iova += bytes; + addr += bytes; + length -= bytes; + } +} + int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, enum rxe_mr_copy_dir dir) { - int err; - int bytes; - u8 *va; - struct rxe_map **map; - struct rxe_phys_buf *buf; - int m; - int i; - size_t offset; + unsigned int page_offset; + unsigned long index; + int err; if (length == 0) return 0; if (mr->ibmr.type == IB_MR_TYPE_DMA) { - u8 *src, *dest; - - src = (dir == RXE_TO_MR_OBJ) ? addr : ((void *)(uintptr_t)iova); - - dest = (dir == RXE_TO_MR_OBJ) ? 
((void *)(uintptr_t)iova) : addr; - - memcpy(dest, src, length); - + page_offset = iova & (PAGE_SIZE - 1); + rxe_mr_copy_dma(mr, iova, addr, page_offset, length, dir); return 0; } - WARN_ON_ONCE(!mr->map); - err = mr_check_range(mr, iova, length); - if (err) { - err = -EFAULT; - goto err1; - } - - lookup_iova(mr, iova, &m, &i, &offset); - - map = mr->map + m; - buf = map[0]->buf + i; - - while (length > 0) { - u8 *src, *dest; - - va = (u8 *)(uintptr_t)buf->addr + offset; - src = (dir == RXE_TO_MR_OBJ) ? addr : va; - dest = (dir == RXE_TO_MR_OBJ) ? va : addr; - - bytes = buf->size - offset; - - if (bytes > length) - bytes = length; - - memcpy(dest, src, bytes); - - length -= bytes; - addr += bytes; - - offset = 0; - buf++; - i++; - - if (i == RXE_BUF_PER_MAP) { - i = 0; - map++; - buf = map[0]->buf; - } - } - - return 0; + if (err) + return err; -err1: - return err; + page_offset = rxe_mr_iova_to_page_offset(mr, iova); + index = rxe_mr_iova_to_index(mr, iova); + return rxe_mr_copy_xarray(mr, addr, index, page_offset, length, dir); } /* copy data in or out of a wqe, i.e. sg list @@ -508,7 +439,6 @@ int copy_data( if (bytes > 0) { iova = sge->addr + offset; - err = rxe_mr_copy(mr, iova, addr, bytes, dir); if (err) goto err2; @@ -537,20 +467,47 @@ int copy_data( static DEFINE_SPINLOCK(atomic_ops_lock); +/* + * Returns: + * 0 on success + * -1 address is misaligned + * -2 access violations + */ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, u64 compare, u64 swap_add, u64 *orig_val) { - u64 *vaddr = iova_to_vaddr(mr, iova, sizeof(u64)); + unsigned int page_offset; + struct page *page; u64 value; + u64 *va; - /* needs to match rxe_resp.c */ - if (mr->state != RXE_MR_STATE_VALID || !vaddr) - return -EFAULT; - if ((uintptr_t)vaddr & 0x7) - return -EINVAL; + if (unlikely(mr->state != RXE_MR_STATE_VALID)) + return -2; + + if (mr->ibmr.type == IB_MR_TYPE_DMA) { + page_offset = iova & (PAGE_SIZE - 1); + page = virt_to_page(iova & PAGE_MASK); + } else { + unsigned long index; + int err; + + err = mr_check_range(mr, iova, 8); + if (err) + return err; + page_offset = rxe_mr_iova_to_page_offset(mr, iova); + index = rxe_mr_iova_to_index(mr, iova); + page = xa_load(&mr->pages, index); + if (!page) + return -2; + } + + if (unlikely(page_offset & 0x7)) + return -1; + + va = kmap_local_page(page); spin_lock_bh(&atomic_ops_lock); - value = *orig_val = *vaddr; + *orig_val = value = va[page_offset >> 3]; if (opcode == IB_OPCODE_RC_COMPARE_SWAP) { if (value == compare) @@ -559,9 +516,11 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, value += swap_add; } - *vaddr = value; + va[page_offset >> 3] = value; spin_unlock_bh(&atomic_ops_lock); + kunmap_local(va); + return 0; } @@ -580,9 +539,11 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) { #if defined CONFIG_64BIT - u64 *vaddr; - u64 value; + unsigned int page_offset; + struct page *page; unsigned int length = 8; + u64 value; + u64 *va; /* See IBA oA19-28 */ if (unlikely(mr->state != RXE_MR_STATE_VALID)) { @@ -590,23 +551,38 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr) return -2; } + if (mr->ibmr.type == IB_MR_TYPE_DMA) { + page_offset = iova & (PAGE_SIZE - 1); + page = virt_to_page(iova & PAGE_MASK); + } else { + unsigned long index; + int err; + + /* See IBA oA19-28 */ + err = mr_check_range(mr, iova, length); + if (unlikely(err)) { + rxe_dbg_mr(mr, "iova out of range"); + return -2; + } + page_offset = 
rxe_mr_iova_to_page_offset(mr, iova); + index = rxe_mr_iova_to_index(mr, iova); + page = xa_load(&mr->pages, index); + if (WARN_ON(!page)) + return -2; + } + /* See IBA A19.4.2 */ - if (unlikely((uintptr_t)vaddr & 0x7 || iova & 0x7)) { + if (iova & 0x7) { rxe_dbg_mr(mr, "misaligned address"); return -1; } - vaddr = iova_to_vaddr(mr, iova, length); - if (unlikely(!vaddr)) { - rxe_dbg_mr(mr, "iova out of range"); - return -2; - } - - /* this makes no sense. What of payload is not 8? */ + va = kmap_local_page(page); memcpy(&value, addr, length); /* Do atomic write after all prior operations have completed */ - smp_store_release(vaddr, value); + smp_store_release(&va[page_offset >> 3], value); + kunmap_local(va); return 0; #else @@ -648,12 +624,6 @@ int advance_dma_data(struct rxe_dma_info *dma, unsigned int length) return 0; } -/* (1) find the mr corresponding to lkey/rkey - * depending on lookup_type - * (2) verify that the (qp) pd matches the mr pd - * (3) verify that the mr can support the requested access - * (4) verify that mr state is valid - */ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, enum rxe_mr_lookup_type type) { @@ -774,15 +744,8 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata) void rxe_mr_cleanup(struct rxe_pool_elem *elem) { struct rxe_mr *mr = container_of(elem, typeof(*mr), elem); - int i; rxe_put(mr_pd(mr)); ib_umem_release(mr->umem); - - if (mr->map) { - for (i = 0; i < mr->num_map; i++) - kfree(mr->map[i]); - - kfree(mr->map); - } + xa_destroy(&mr->pages); } diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 1083cda22c65..b0ff36f6dc4a 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -744,7 +744,7 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, atmeth_comp(pkt), atmeth_swap_add(pkt), &res->atomic.orig_val); - if (err == -EINVAL) + if (err == -1) return RESPST_ERR_MISALIGNED_ATOMIC; else if (err) return RESPST_ERR_RKEY_VIOLATION; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index bfc94caaeec5..64551cf354a6 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -283,17 +283,6 @@ enum rxe_mr_lookup_type { RXE_LOOKUP_REMOTE, }; -#define RXE_BUF_PER_MAP (PAGE_SIZE / sizeof(struct rxe_phys_buf)) - -struct rxe_phys_buf { - u64 addr; - u64 size; -}; - -struct rxe_map { - struct rxe_phys_buf buf[RXE_BUF_PER_MAP]; -}; - static inline int rkey_is_mw(u32 rkey) { u32 index = rkey >> 8; @@ -311,22 +300,16 @@ struct rxe_mr { u32 rkey; enum rxe_mr_state state; int access; + atomic_t num_mw; unsigned int page_offset; unsigned int page_shift; u64 page_mask; - int map_shift; - int map_mask; u32 num_buf; u32 nbuf; - u32 max_buf; - u32 num_map; - - atomic_t num_mw; - - struct rxe_map **map; + struct xarray pages; }; static inline unsigned int mr_page_size(struct rxe_mr *mr) -- 2.37.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
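
A minimal standalone sketch of the addressing arithmetic this patch relies on: the xarray is indexed by the iova's page number relative to the start of the region, and the byte offset within that page comes from the low bits. The toy_* names below are invented for illustration only; the arithmetic mirrors rxe_mr_iova_to_index() and rxe_mr_iova_to_page_offset() above.

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the rxe_mr fields used by the translation. */
struct toy_mr {
	uint64_t iova;			/* start iova of the region (mr->ibmr.iova) */
	unsigned int page_shift;	/* log2 of the mr page size */
};

static unsigned long toy_iova_to_index(const struct toy_mr *mr, uint64_t iova)
{
	/* number of whole pages between iova and the start of the region */
	return (iova >> mr->page_shift) - (mr->iova >> mr->page_shift);
}

static unsigned long toy_iova_to_page_offset(const struct toy_mr *mr, uint64_t iova)
{
	/* byte offset of iova within its page */
	return iova & ((1ull << mr->page_shift) - 1);
}

int main(void)
{
	struct toy_mr mr = { .iova = 0x10000800, .page_shift = 12 };
	uint64_t iova = 0x10002810;

	/* prints index=2 offset=0x810 for a 4K page size */
	printf("index=%lu offset=0x%lx\n",
	       toy_iova_to_index(&mr, iova),
	       toy_iova_to_page_offset(&mr, iova));
	return 0;
}
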
* Re: [PATCH for-next v3 6/6] RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray
2023-01-13 23:27 ` [PATCH for-next v3 6/6] RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray Bob Pearson
@ 2023-01-16 2:21 ` lizhijian
2023-01-17 0:05 ` Bob Pearson
0 siblings, 1 reply; 16+ messages in thread
From: lizhijian @ 2023-01-16 2:21 UTC (permalink / raw)
To: Bob Pearson, jgg@nvidia.com, leonro@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org

On 14/01/2023 07:27, Bob Pearson wrote:
> Replace struct rxe-phys_buf and struct rxe_map by struct xarray
> in rxe_verbs.h. This allows using rcu locking on reads for
> the memory maps stored in each mr.
>
> This is based off of a sketch of a patch from Jason Gunthorpe in the
> link below. Some changes were needed to make this work. It applies
> cleanly to the current for-next and passes the pyverbs, perftest
> and the same blktests test cases which run today.
>
> Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/
> Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_loc.h   |   1 -
>  drivers/infiniband/sw/rxe/rxe_mr.c    | 533 ++++++++++++--------------
>  drivers/infiniband/sw/rxe/rxe_resp.c  |   2 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.h |  21 +-
>  4 files changed, 251 insertions(+), 306 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
> index fd70c71a9e4e..0cf78f9bb27c 100644
> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
> @@ -71,7 +71,6 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
>  		void *addr, int length, enum rxe_mr_copy_dir dir);
>  int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>  		  int sg_nents, unsigned int *sg_offset);
> -void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length);
>  int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
>  			u64 compare, u64 swap_add, u64 *orig_val);
>  int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr);
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index fd5537ee7f04..e4634279080a 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
[snip...]
> -static bool is_pmem_page(struct page *pg)
> +static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
>  {
> -	unsigned long paddr = page_to_phys(pg);
> +	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
> +}
>
> -	return REGION_INTERSECTS ==
> -	       region_intersects(paddr, PAGE_SIZE, IORESOURCE_MEM,
> -				 IORES_DESC_PERSISTENT_MEMORY);
[snip...]
> +	rxe_mr_init(access, mr);
> +
>  	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
>  	if (IS_ERR(umem)) {
>  		rxe_dbg_mr(mr, "Unable to pin memory region err = %d\n",
>  			(int)PTR_ERR(umem));
> -		err = PTR_ERR(umem);
> -		goto err_out;
> +		return PTR_ERR(umem);
>  	}
>
> -	num_buf = ib_umem_num_pages(umem);
> -
> -	rxe_mr_init(access, mr);
> -
> -	err = rxe_mr_alloc(mr, num_buf);
> +	err = rxe_mr_fill_pages_from_sgt(mr, &umem->sgt_append.sgt);
>  	if (err) {
> -		rxe_dbg_mr(mr, "Unable to allocate memory for map\n");
> -		goto err_release_umem;
> +		ib_umem_release(umem);
> +		return err;
>  	}
>
> -	num_buf = 0;
> -	map = mr->map;
> -	if (length > 0) {
> -		bool persistent_access = access & IB_ACCESS_FLUSH_PERSISTENT;
> -
> -		buf = map[0]->buf;
> -		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
> -			struct page *pg = sg_page_iter_page(&sg_iter);
> +	mr->umem = umem;
> +	mr->ibmr.type = IB_MR_TYPE_USER;
> +	mr->state = RXE_MR_STATE_VALID;
>
> -			if (persistent_access && !is_pmem_page(pg)) {
> -				rxe_dbg_mr(mr, "Unable to register persistent access to non-pmem device\n");
> -				err = -EINVAL;
> -				goto err_release_umem;
> -			}

I read you removed is_pmem_page and its checking, but there is no description about this.
IMO, it's required by IBA spec.

Thanks
Zhijian

^ permalink raw reply	[flat|nested] 16+ messages in thread
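
For context, the check being discussed is the one deleted in the hunk quoted above. A rough sketch of how it could be reinstated on top of the new sgt-based fill path is shown below; rxe_mr_check_pmem_pages() is an invented helper name used purely for illustration and is not the follow-up patch that was actually posted.

/* Same predicate the deleted code used. */
static bool is_pmem_page(struct page *pg)
{
	unsigned long paddr = page_to_phys(pg);

	return REGION_INTERSECTS ==
	       region_intersects(paddr, PAGE_SIZE, IORESOURCE_MEM,
				 IORES_DESC_PERSISTENT_MEMORY);
}

/* Sketch: refuse IB_ACCESS_FLUSH_PERSISTENT when any page of the umem
 * is not persistent memory, walking the same sg table that feeds the
 * xarray in rxe_mr_fill_pages_from_sgt().
 */
static int rxe_mr_check_pmem_pages(struct rxe_mr *mr, struct sg_table *sgt,
				   int access)
{
	struct sg_page_iter sg_iter;

	if (!(access & IB_ACCESS_FLUSH_PERSISTENT))
		return 0;

	for_each_sgtable_page(sgt, &sg_iter, 0) {
		if (!is_pmem_page(sg_page_iter_page(&sg_iter))) {
			rxe_dbg_mr(mr, "Unable to register persistent access to non-pmem device\n");
			return -EINVAL;
		}
	}

	return 0;
}
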
* Re: [PATCH for-next v3 6/6] RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray
2023-01-16 2:21 ` lizhijian
@ 2023-01-17 0:05 ` Bob Pearson
0 siblings, 0 replies; 16+ messages in thread
From: Bob Pearson @ 2023-01-17 0:05 UTC (permalink / raw)
To: lizhijian@fujitsu.com, jgg@nvidia.com, leonro@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org

On 1/15/23 20:21, lizhijian@fujitsu.com wrote:
>
>
> On 14/01/2023 07:27, Bob Pearson wrote:
>> Replace struct rxe-phys_buf and struct rxe_map by struct xarray
>> in rxe_verbs.h. This allows using rcu locking on reads for
>> the memory maps stored in each mr.
>>
>> This is based off of a sketch of a patch from Jason Gunthorpe in the
>> link below. Some changes were needed to make this work. It applies
>> cleanly to the current for-next and passes the pyverbs, perftest
>> and the same blktests test cases which run today.
>>
>> Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/
>> Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_loc.h   |   1 -
>>  drivers/infiniband/sw/rxe/rxe_mr.c    | 533 ++++++++++++--------------
>>  drivers/infiniband/sw/rxe/rxe_resp.c  |   2 +-
>>  drivers/infiniband/sw/rxe/rxe_verbs.h |  21 +-
>>  4 files changed, 251 insertions(+), 306 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
>> index fd70c71a9e4e..0cf78f9bb27c 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
>> @@ -71,7 +71,6 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
>>  		void *addr, int length, enum rxe_mr_copy_dir dir);
>>  int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>>  		  int sg_nents, unsigned int *sg_offset);
>> -void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length);
>>  int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
>>  			u64 compare, u64 swap_add, u64 *orig_val);
>>  int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr);
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index fd5537ee7f04..e4634279080a 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> [snip...]
>
>> -static bool is_pmem_page(struct page *pg)
>> +static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
>>  {
>> -	unsigned long paddr = page_to_phys(pg);
>> +	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
>> +}
>>
>> -	return REGION_INTERSECTS ==
>> -	       region_intersects(paddr, PAGE_SIZE, IORESOURCE_MEM,
>> -				 IORES_DESC_PERSISTENT_MEMORY);
>
> [snip...]
>
>> +	rxe_mr_init(access, mr);
>> +
>>  	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
>>  	if (IS_ERR(umem)) {
>>  		rxe_dbg_mr(mr, "Unable to pin memory region err = %d\n",
>>  			(int)PTR_ERR(umem));
>> -		err = PTR_ERR(umem);
>> -		goto err_out;
>> +		return PTR_ERR(umem);
>>  	}
>>
>> -	num_buf = ib_umem_num_pages(umem);
>> -
>> -	rxe_mr_init(access, mr);
>> -
>> -	err = rxe_mr_alloc(mr, num_buf);
>> +	err = rxe_mr_fill_pages_from_sgt(mr, &umem->sgt_append.sgt);
>>  	if (err) {
>> -		rxe_dbg_mr(mr, "Unable to allocate memory for map\n");
>> -		goto err_release_umem;
>> +		ib_umem_release(umem);
>> +		return err;
>>  	}
>>
>> -	num_buf = 0;
>> -	map = mr->map;
>> -	if (length > 0) {
>> -		bool persistent_access = access & IB_ACCESS_FLUSH_PERSISTENT;
>> -
>> -		buf = map[0]->buf;
>> -		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
>> -			struct page *pg = sg_page_iter_page(&sg_iter);
>> +	mr->umem = umem;
>> +	mr->ibmr.type = IB_MR_TYPE_USER;
>> +	mr->state = RXE_MR_STATE_VALID;
>>
>> -			if (persistent_access && !is_pmem_page(pg)) {
>> -				rxe_dbg_mr(mr, "Unable to register persistent access to non-pmem device\n");
>> -				err = -EINVAL;
>> -				goto err_release_umem;
>> -			}
>
> I read you removed is_pmem_page and its checking, but there is no description about this.
> IMO, it's required by IBA spec.
>
> Thanks
> Zhijian

Zhijian,

It was dropped by accident. I just posted an update with the calls put back.
Please take a look and, if possible, test that it doesn't regress your pmem
changes. I have no way to test pmem. Also look at the comment I added to the
pmem flush call. I am suspicious of the smp_store_release() call based on the
original comment.

Regards,

Bob

^ permalink raw reply	[flat|nested] 16+ messages in thread
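
To make the smp_store_release() concern concrete: a release store orders the 8-byte payload against earlier writes but does not push it out of the CPU cache, whereas arch_wb_cache_pmem() is what actually writes the cache line back to the persistent-memory device; in this series that writeback appears to happen only later, from rxe_flush_pmem_iova(), when a FLUSH operation is processed. The function below is a hypothetical illustration of that split, not a proposed change; its name and the persistent argument are invented.

static void example_atomic_write_then_persist(struct page *page,
					       unsigned int page_offset,
					       u64 value, bool persistent)
{
	u64 *va = kmap_local_page(page);

	/* release ordering for the store, as rxe_mr_do_atomic_write() does;
	 * this alone does not make the data durable on pmem
	 */
	smp_store_release(&va[page_offset >> 3], value);

	/* durability requires an explicit cache writeback, as done in
	 * rxe_flush_pmem_iova() when a FLUSH request arrives
	 */
	if (persistent)
		arch_wb_cache_pmem(&va[page_offset >> 3], sizeof(u64));

	kunmap_local(va);
}
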