* [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
@ 2023-10-27 5:41 Li Zhijian
2023-10-27 5:41 ` [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE Li Zhijian
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Li Zhijian @ 2023-10-27 5:41 UTC (permalink / raw)
To: zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke, bvanassche,
Li Zhijian
mr->page_list only encodes *page without page offset, when
page_size != PAGE_SIZE, we cannot restore the address with a wrong
page_offset.
Note that this patch will break some ULPs that try to register 4K
MR when PAGE_SIZE is not 4K.
SRP and nvme over RXE is known to be impacted.
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..61a136ea1d91 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
struct rxe_mr *mr = to_rmr(ibmr);
unsigned int page_size = mr_page_size(mr);
+ if (page_size != PAGE_SIZE) {
+ rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
+ page_size, PAGE_SIZE);
+ return -EOPNOTSUPP;
+ }
+
mr->nbuf = 0;
mr->page_shift = ilog2(page_size);
mr->page_mask = ~((u64)page_size - 1);
--
2.41.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
2023-10-27 5:41 [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Li Zhijian
@ 2023-10-27 5:41 ` Li Zhijian
2023-10-27 21:47 ` Bart Van Assche
2023-10-28 3:52 ` Zhu Yanjun
2023-10-27 8:17 ` [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Zhu Yanjun
2023-10-30 7:51 ` Zhijian Li (Fujitsu)
2 siblings, 2 replies; 17+ messages in thread
From: Li Zhijian @ 2023-10-27 5:41 UTC (permalink / raw)
To: zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke, bvanassche,
Li Zhijian
RXE_PAGE_SIZE_CAP means the MR page size supported by RXE. However
in current RXE implementation, only PAGE_SIZE MR works well.
So change it to PAGE_SIZE only.
ULPs such as SRP calculating the page size according to this attribute get
worked again with this change.
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index d2f57ead78ad..b1cf1e1c0ce1 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
/* default/initial rxe device parameter settings */
enum rxe_device_param {
RXE_MAX_MR_SIZE = -1ull,
- RXE_PAGE_SIZE_CAP = 0xfffff000,
+ RXE_PAGE_SIZE_CAP = PAGE_SIZE,
RXE_MAX_QP_WR = DEFAULT_MAX_VALUE,
RXE_DEVICE_CAP_FLAGS = IB_DEVICE_BAD_PKEY_CNTR
| IB_DEVICE_BAD_QKEY_CNTR
--
2.41.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-27 5:41 [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Li Zhijian
2023-10-27 5:41 ` [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE Li Zhijian
@ 2023-10-27 8:17 ` Zhu Yanjun
2023-10-27 21:46 ` Bart Van Assche
2023-10-30 8:13 ` Zhijian Li (Fujitsu)
2023-10-30 7:51 ` Zhijian Li (Fujitsu)
2 siblings, 2 replies; 17+ messages in thread
From: Zhu Yanjun @ 2023-10-27 8:17 UTC (permalink / raw)
To: Li Zhijian, zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke, bvanassche
在 2023/10/27 13:41, Li Zhijian 写道:
> mr->page_list only encodes *page without page offset, when
> page_size != PAGE_SIZE, we cannot restore the address with a wrong
> page_offset.
>
> Note that this patch will break some ULPs that try to register 4K
> MR when PAGE_SIZE is not 4K.
> SRP and nvme over RXE is known to be impacted.
When ULP uses folio or compound page, ULP can not work well with RXE
after this commit is applied.
Perhaps removing page_size set in RXE is a good solution because
page_size is set twice, firstly page_size is set in infiniband/core,
secondly it is set in RXE.
When folio or compound page is used in ULP, it is very possible that
page_size in infiniband/core is different from the page_size in RXE.
Not sure what problem this difference will cause.
Zhu Yanjun
>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index f54042e9aeb2..61a136ea1d91 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> struct rxe_mr *mr = to_rmr(ibmr);
> unsigned int page_size = mr_page_size(mr);
>
> + if (page_size != PAGE_SIZE) {
> + rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
> + page_size, PAGE_SIZE);
> + return -EOPNOTSUPP;
> + }
> +
> mr->nbuf = 0;
> mr->page_shift = ilog2(page_size);
> mr->page_mask = ~((u64)page_size - 1);
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-27 8:17 ` [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Zhu Yanjun
@ 2023-10-27 21:46 ` Bart Van Assche
2023-10-28 2:48 ` Zhu Yanjun
2023-10-30 8:13 ` Zhijian Li (Fujitsu)
1 sibling, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2023-10-27 21:46 UTC (permalink / raw)
To: Zhu Yanjun, Li Zhijian, zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke
On 10/27/23 01:17, Zhu Yanjun wrote:
> When ULP uses folio or compound page, ULP can not work well with RXE
> after this commit is applied.
Isn't it the responsibility of the DMA mapping code to build an sg-list
for folios? Drivers like ib_srp see an sg-list, whether that comes from
a folio or from something else.
Bart.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
2023-10-27 5:41 ` [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE Li Zhijian
@ 2023-10-27 21:47 ` Bart Van Assche
2023-10-28 3:52 ` Zhu Yanjun
1 sibling, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2023-10-27 21:47 UTC (permalink / raw)
To: Li Zhijian, zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke
On 10/26/23 22:41, Li Zhijian wrote:
> diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
> index d2f57ead78ad..b1cf1e1c0ce1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_param.h
> +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> @@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
> /* default/initial rxe device parameter settings */
> enum rxe_device_param {
> RXE_MAX_MR_SIZE = -1ull,
> - RXE_PAGE_SIZE_CAP = 0xfffff000,
> + RXE_PAGE_SIZE_CAP = PAGE_SIZE,
> RXE_MAX_QP_WR = DEFAULT_MAX_VALUE,
> RXE_DEVICE_CAP_FLAGS = IB_DEVICE_BAD_PKEY_CNTR
> | IB_DEVICE_BAD_QKEY_CNTR
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-27 21:46 ` Bart Van Assche
@ 2023-10-28 2:48 ` Zhu Yanjun
2023-10-28 23:07 ` Bart Van Assche
0 siblings, 1 reply; 17+ messages in thread
From: Zhu Yanjun @ 2023-10-28 2:48 UTC (permalink / raw)
To: Bart Van Assche, Li Zhijian, zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke
在 2023/10/28 5:46, Bart Van Assche 写道:
> On 10/27/23 01:17, Zhu Yanjun wrote:
>> When ULP uses folio or compound page, ULP can not work well with RXE
>> after this commit is applied.
>
> Isn't it the responsibility of the DMA mapping code to build an sg-list
> for folios? Drivers like ib_srp see an sg-list, whether that comes from
A folio is a way of representing a set of physically contiguous base
pages. In current implementations of folio, it seems that sg-list is not
used.
In Folio, some huge pages whose size is not PAGE_SIZE is dma-mapped into
hardware.
So the page size of folio is not equal to PAGE_SIZE. If this commit is
applied, it causes potential risks to the future folio.
I have developed some folio work for some NIC and RDMA drivers. In
Folio, the page size of Folio is possibly not equal to PAGE_SIZE, it is
multiple PAGE_SIZE. And when folio is dma-mapped to HW, the page size is
equal to multiple PAGE_SIZE.
In this case, ULP with folio will not work well with current RXE after
this commit is applied.
Removing page_size in RXE seems a plan for this problem.
Zhu Yanjun
> a folio or from something else.
>
> Bart.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
2023-10-27 5:41 ` [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE Li Zhijian
2023-10-27 21:47 ` Bart Van Assche
@ 2023-10-28 3:52 ` Zhu Yanjun
1 sibling, 0 replies; 17+ messages in thread
From: Zhu Yanjun @ 2023-10-28 3:52 UTC (permalink / raw)
To: Li Zhijian, zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke, bvanassche
在 2023/10/27 13:41, Li Zhijian 写道:
> RXE_PAGE_SIZE_CAP means the MR page size supported by RXE. However
> in current RXE implementation, only PAGE_SIZE MR works well.
> So change it to PAGE_SIZE only.
>
> ULPs such as SRP calculating the page size according to this attribute get
> worked again with this change.
>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Thanks a lot.
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
> index d2f57ead78ad..b1cf1e1c0ce1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_param.h
> +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> @@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
> /* default/initial rxe device parameter settings */
> enum rxe_device_param {
> RXE_MAX_MR_SIZE = -1ull,
> - RXE_PAGE_SIZE_CAP = 0xfffff000,
> + RXE_PAGE_SIZE_CAP = PAGE_SIZE,
> RXE_MAX_QP_WR = DEFAULT_MAX_VALUE,
> RXE_DEVICE_CAP_FLAGS = IB_DEVICE_BAD_PKEY_CNTR
> | IB_DEVICE_BAD_QKEY_CNTR
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-28 2:48 ` Zhu Yanjun
@ 2023-10-28 23:07 ` Bart Van Assche
2023-10-29 3:22 ` Zhu Yanjun
0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2023-10-28 23:07 UTC (permalink / raw)
To: Zhu Yanjun, Li Zhijian, zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke
On 10/27/23 19:48, Zhu Yanjun wrote:
> In this case, ULP with folio will not work well with current RXE after
> this commit is applied.
Why not? RDMA ULPs like the SRP initiator driver use ib_map_mr_sg(). The
latter function calls rxe_map_mr_sg() if the RXE driver is used.
rxe_map_mr_sg() calls ib_sg_to_pages(). ib_sg_to_pages() translates
SG-entries that are larger than a virtual memory page into multiple
entries with size mr->page_size.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-28 23:07 ` Bart Van Assche
@ 2023-10-29 3:22 ` Zhu Yanjun
0 siblings, 0 replies; 17+ messages in thread
From: Zhu Yanjun @ 2023-10-29 3:22 UTC (permalink / raw)
To: Bart Van Assche, Li Zhijian, zyjzyj2000, jgg, leon, linux-rdma
Cc: linux-kernel, rpearsonhpe, matsuda-daisuke
在 2023/10/29 7:07, Bart Van Assche 写道:
> On 10/27/23 19:48, Zhu Yanjun wrote:
>> In this case, ULP with folio will not work well with current RXE after
>> this commit is applied.
>
> Why not? RDMA ULPs like the SRP initiator driver use ib_map_mr_sg(). The
> latter function calls rxe_map_mr_sg() if the RXE driver is used.
> rxe_map_mr_sg() calls ib_sg_to_pages(). ib_sg_to_pages() translates
> SG-entries that are larger than a virtual memory page into multiple
> entries with size mr->page_size.
It seems that we are not on the same page. I am thinking that this fix
should also work for the future folio. And your idea is based on the
current implementation.
Perhaps it is not a good time to discuss folio currently. Let me focus
on the current implementation.
In this commit, if the page size is not equal to PAGE_SIZE, it will pop
out warning then exit. So "rxe_info_mr" should be changed to rxe_warning_mr?
In fact, I like to remove the page size assignment in rxe, that is,
partly reverting the commit 325a7eb85199 ("RDMA/rxe: Cleanup page
variables in rxe_mr.c"). But it seems that Jason did not like this.
I no longer insist on this.
I am not sure whether Jason, Leon, Bob and you have better solution to
this problem or not.
If not, this commit can fix this problem if no better solution.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Thanks,
Zhu Yanjun
>
> Thanks,
>
> Bart.
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-27 5:41 [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Li Zhijian
2023-10-27 5:41 ` [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE Li Zhijian
2023-10-27 8:17 ` [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Zhu Yanjun
@ 2023-10-30 7:51 ` Zhijian Li (Fujitsu)
2023-10-30 12:40 ` Jason Gunthorpe
2 siblings, 1 reply; 17+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-30 7:51 UTC (permalink / raw)
To: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
Daisuke Matsuda (Fujitsu), bvanassche@acm.org
On 27/10/2023 13:41, Li Zhijian wrote:
> mr->page_list only encodes *page without page offset, when
> page_size != PAGE_SIZE, we cannot restore the address with a wrong
> page_offset.
>
> Note that this patch will break some ULPs that try to register 4K
> MR when PAGE_SIZE is not 4K.
> SRP and nvme over RXE is known to be impacted.
>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index f54042e9aeb2..61a136ea1d91 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> struct rxe_mr *mr = to_rmr(ibmr);
> unsigned int page_size = mr_page_size(mr);
>
> + if (page_size != PAGE_SIZE) {
It seems this condition is too strict, it should be:
if (!IS_ALIGNED(page_size, PAGE_SIZE))
So that, page_size with (N * PAGE_SIZE) can work as previously.
Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
Thanks
Zhijian
> + rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
> + page_size, PAGE_SIZE);
> + return -EOPNOTSUPP;
> + }
> +
> mr->nbuf = 0;
> mr->page_shift = ilog2(page_size);
> mr->page_mask = ~((u64)page_size - 1);
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-27 8:17 ` [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Zhu Yanjun
2023-10-27 21:46 ` Bart Van Assche
@ 2023-10-30 8:13 ` Zhijian Li (Fujitsu)
2023-10-30 9:43 ` Zhu Yanjun
1 sibling, 1 reply; 17+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-30 8:13 UTC (permalink / raw)
To: Zhu Yanjun, zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
Daisuke Matsuda (Fujitsu), bvanassche@acm.org
On 27/10/2023 16:17, Zhu Yanjun wrote:
> 在 2023/10/27 13:41, Li Zhijian 写道:
>> mr->page_list only encodes *page without page offset, when
>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>> page_offset.
>>
>> Note that this patch will break some ULPs that try to register 4K
>> MR when PAGE_SIZE is not 4K.
>> SRP and nvme over RXE is known to be impacted.
>
> When ULP uses folio or compound page, ULP can not work well with RXE after this commit is applied.
>
> Perhaps removing page_size set in RXE is a good solution because page_size is set twice, firstly page_size is set in infiniband/core, secondly it is set in RXE.
Does The RXE one mean rxe_mr_init(), I think rxe_reg_user_mr() requires this.
48 static void rxe_mr_init(int access, struct rxe_mr *mr)
49 {
50 u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
51
52 /* set ibmr->l/rkey and also copy into private l/rkey
53 * for user MRs these will always be the same
54 * for cases where caller 'owns' the key portion
55 * they may be different until REG_MR WQE is executed.
56 */
57 mr->lkey = mr->ibmr.lkey = key;
58 mr->rkey = mr->ibmr.rkey = key;
59
60 mr->access = access;
61 mr->ibmr.page_size = PAGE_SIZE;
62 mr->page_mask = PAGE_MASK;
63 mr->page_shift = PAGE_SHIFT;
64 mr->state = RXE_MR_STATE_INVALID;
65 }
Thanks
Zhijian
>
> When folio or compound page is used in ULP, it is very possible that page_size in infiniband/core is different from the page_size in RXE
>
> Not sure what problem this difference will cause.
>
> Zhu Yanjun
>
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index f54042e9aeb2..61a136ea1d91 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>> struct rxe_mr *mr = to_rmr(ibmr);
>> unsigned int page_size = mr_page_size(mr);
>> + if (page_size != PAGE_SIZE) {
>> + rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
>> + page_size, PAGE_SIZE);
>> + return -EOPNOTSUPP;
>> + }
>> +
>> mr->nbuf = 0;
>> mr->page_shift = ilog2(page_size);
>> mr->page_mask = ~((u64)page_size - 1);
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-30 8:13 ` Zhijian Li (Fujitsu)
@ 2023-10-30 9:43 ` Zhu Yanjun
0 siblings, 0 replies; 17+ messages in thread
From: Zhu Yanjun @ 2023-10-30 9:43 UTC (permalink / raw)
To: Zhijian Li (Fujitsu), zyjzyj2000@gmail.com, jgg@ziepe.ca,
leon@kernel.org, linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
Daisuke Matsuda (Fujitsu), bvanassche@acm.org
在 2023/10/30 16:13, Zhijian Li (Fujitsu) 写道:
>
>
> On 27/10/2023 16:17, Zhu Yanjun wrote:
>> 在 2023/10/27 13:41, Li Zhijian 写道:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>
>> When ULP uses folio or compound page, ULP can not work well with RXE after this commit is applied.
>>
>> Perhaps removing page_size set in RXE is a good solution because page_size is set twice, firstly page_size is set in infiniband/core, secondly it is set in RXE.
>
> Does The RXE one mean rxe_mr_init(), I think rxe_reg_user_mr() requires this.
Please read the discussions carefully. This problem has been discussed.
Best Regards,
Zhu Yanjun
>
> 48 static void rxe_mr_init(int access, struct rxe_mr *mr)
> 49 {
> 50 u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
> 51
> 52 /* set ibmr->l/rkey and also copy into private l/rkey
> 53 * for user MRs these will always be the same
> 54 * for cases where caller 'owns' the key portion
> 55 * they may be different until REG_MR WQE is executed.
> 56 */
> 57 mr->lkey = mr->ibmr.lkey = key;
> 58 mr->rkey = mr->ibmr.rkey = key;
> 59
> 60 mr->access = access;
> 61 mr->ibmr.page_size = PAGE_SIZE;
> 62 mr->page_mask = PAGE_MASK;
> 63 mr->page_shift = PAGE_SHIFT;
> 64 mr->state = RXE_MR_STATE_INVALID;
> 65 }
>
>
> Thanks
> Zhijian
>
>>
>> When folio or compound page is used in ULP, it is very possible that page_size in infiniband/core is different from the page_size in RXE
>>
>> Not sure what problem this difference will cause.
>>
>> Zhu Yanjun
>>
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>> struct rxe_mr *mr = to_rmr(ibmr);
>>> unsigned int page_size = mr_page_size(mr);
>>> + if (page_size != PAGE_SIZE) {
>>> + rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
>>> + page_size, PAGE_SIZE);
>>> + return -EOPNOTSUPP;
>>> + }
>>> +
>>> mr->nbuf = 0;
>>> mr->page_shift = ilog2(page_size);
>>> mr->page_mask = ~((u64)page_size - 1);
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-30 7:51 ` Zhijian Li (Fujitsu)
@ 2023-10-30 12:40 ` Jason Gunthorpe
2023-10-31 8:52 ` Zhu Yanjun
2023-10-31 9:59 ` Zhijian Li (Fujitsu)
0 siblings, 2 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2023-10-30 12:40 UTC (permalink / raw)
To: Zhijian Li (Fujitsu)
Cc: zyjzyj2000@gmail.com, leon@kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
Daisuke Matsuda (Fujitsu), bvanassche@acm.org
On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>
>
> On 27/10/2023 13:41, Li Zhijian wrote:
> > mr->page_list only encodes *page without page offset, when
> > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > page_offset.
> >
> > Note that this patch will break some ULPs that try to register 4K
> > MR when PAGE_SIZE is not 4K.
> > SRP and nvme over RXE is known to be impacted.
> >
> > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > ---
> > drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > index f54042e9aeb2..61a136ea1d91 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> > struct rxe_mr *mr = to_rmr(ibmr);
> > unsigned int page_size = mr_page_size(mr);
> >
> > + if (page_size != PAGE_SIZE) {
>
> It seems this condition is too strict, it should be:
> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>
> So that, page_size with (N * PAGE_SIZE) can work as previously.
> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
That makes sense
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-30 12:40 ` Jason Gunthorpe
@ 2023-10-31 8:52 ` Zhu Yanjun
2023-10-31 13:19 ` Jason Gunthorpe
2023-10-31 9:59 ` Zhijian Li (Fujitsu)
1 sibling, 1 reply; 17+ messages in thread
From: Zhu Yanjun @ 2023-10-31 8:52 UTC (permalink / raw)
To: Jason Gunthorpe, Zhijian Li (Fujitsu)
Cc: zyjzyj2000@gmail.com, leon@kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
Daisuke Matsuda (Fujitsu), bvanassche@acm.org
在 2023/10/30 20:40, Jason Gunthorpe 写道:
> On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/10/2023 13:41, Li Zhijian wrote:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>> struct rxe_mr *mr = to_rmr(ibmr);
>>> unsigned int page_size = mr_page_size(mr);
>>>
>>> + if (page_size != PAGE_SIZE) {
>>
>> It seems this condition is too strict, it should be:
>> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>>
>> So that, page_size with (N * PAGE_SIZE) can work as previously.
>> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
>
> That makes sense
I read all the discussions very carefully.
Thanks, Greg.
Because RXE only supports PAGE_SIZE, when CONFIG_ARM64_64K_PAGES is
enabled, the PAGE_SIZE is 64K, when CONFIG_ARM64_64K_PAGES is disabled,
PAGE_SIZE is 4K.
But NVMe calls ib_map_mr_sg with a fixed size SZ_4K. When
CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem
in RXE. This problem is in NVMe.
If this problem in NVMe is not fixed, NVMe still can not use RXE to make
tests with blktests when CONFIG_ARM64_64K_PAGES is enabled.
This is not a problem in RXE, it should not be fixed in RXE.
Zhu Yanjun
>
> Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-30 12:40 ` Jason Gunthorpe
2023-10-31 8:52 ` Zhu Yanjun
@ 2023-10-31 9:59 ` Zhijian Li (Fujitsu)
1 sibling, 0 replies; 17+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-31 9:59 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: zyjzyj2000@gmail.com, leon@kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
Daisuke Matsuda (Fujitsu), bvanassche@acm.org
On 30/10/2023 20:40, Jason Gunthorpe wrote:
> On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/10/2023 13:41, Li Zhijian wrote:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>> struct rxe_mr *mr = to_rmr(ibmr);
>>> unsigned int page_size = mr_page_size(mr);
>>>
>>> + if (page_size != PAGE_SIZE) {
>>
>> It seems this condition is too strict, it should be:
>> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>>
I have to say I retract this conclusion. It still misses something.
To support PAGE_SIZE aligned MR, we have to refactor rxe_map_mr_sg() or rxe_set_page()
Currently, rxe_set_page() will be called in the step of page_size, this doesn't split N*PAGE_SIZE memory into
N *page. So when we restore an iova from xarray, the array index is wrong as well.
So i'm going to refactor rxe_map_mr_sg() to iterate the sgl by myself in rxe_map_mr_sg() like SIW does.
Hope this refactor can help RXE to support SZ_4K when PAGE_SIZE!=4K as well.
Thanks
Zhijian
>> So that, page_size with (N * PAGE_SIZE) can work as previously.
>> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
>
> That makes sense
>
> Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-31 8:52 ` Zhu Yanjun
@ 2023-10-31 13:19 ` Jason Gunthorpe
2023-11-01 0:58 ` Greg Sword
0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2023-10-31 13:19 UTC (permalink / raw)
To: Zhu Yanjun
Cc: Zhijian Li (Fujitsu), zyjzyj2000@gmail.com, leon@kernel.org,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
rpearsonhpe@gmail.com, Daisuke Matsuda (Fujitsu),
bvanassche@acm.org
On Tue, Oct 31, 2023 at 04:52:23PM +0800, Zhu Yanjun wrote:
> 在 2023/10/30 20:40, Jason Gunthorpe 写道:
> > On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
> > >
> > >
> > > On 27/10/2023 13:41, Li Zhijian wrote:
> > > > mr->page_list only encodes *page without page offset, when
> > > > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > > > page_offset.
> > > >
> > > > Note that this patch will break some ULPs that try to register 4K
> > > > MR when PAGE_SIZE is not 4K.
> > > > SRP and nvme over RXE is known to be impacted.
> > > >
> > > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > > > ---
> > > > drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > index f54042e9aeb2..61a136ea1d91 100644
> > > > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> > > > struct rxe_mr *mr = to_rmr(ibmr);
> > > > unsigned int page_size = mr_page_size(mr);
> > > > + if (page_size != PAGE_SIZE) {
> > >
> > > It seems this condition is too strict, it should be:
> > > if (!IS_ALIGNED(page_size, PAGE_SIZE))
> > >
> > > So that, page_size with (N * PAGE_SIZE) can work as previously.
> > > Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
> >
> > That makes sense
>
> I read all the discussions very carefully.
>
> Thanks, Greg.
>
> Because RXE only supports PAGE_SIZE, when CONFIG_ARM64_64K_PAGES is enabled,
> the PAGE_SIZE is 64K, when CONFIG_ARM64_64K_PAGES is disabled, PAGE_SIZE is
> 4K.
>
> But NVMe calls ib_map_mr_sg with a fixed size SZ_4K. When
> CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem in
> RXE. This problem is in NVMe.
Maybe, but no real RDMA devices don't support 4K.
The xarray conversion may need revision to use physical addresses
instead of storing struct pages so it can handle this kind of
segmentation.
Certainly in the mean time it should be rejected.
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr
2023-10-31 13:19 ` Jason Gunthorpe
@ 2023-11-01 0:58 ` Greg Sword
0 siblings, 0 replies; 17+ messages in thread
From: Greg Sword @ 2023-11-01 0:58 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Zhu Yanjun, Zhijian Li (Fujitsu), zyjzyj2000@gmail.com,
leon@kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
Daisuke Matsuda (Fujitsu), bvanassche@acm.org
On Tue, Oct 31, 2023 at 9:19 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Oct 31, 2023 at 04:52:23PM +0800, Zhu Yanjun wrote:
> > 在 2023/10/30 20:40, Jason Gunthorpe 写道:
> > > On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
> > > >
> > > >
> > > > On 27/10/2023 13:41, Li Zhijian wrote:
> > > > > mr->page_list only encodes *page without page offset, when
> > > > > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > > > > page_offset.
> > > > >
> > > > > Note that this patch will break some ULPs that try to register 4K
> > > > > MR when PAGE_SIZE is not 4K.
> > > > > SRP and nvme over RXE is known to be impacted.
> > > > >
> > > > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > > > > ---
> > > > > drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> > > > > 1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > index f54042e9aeb2..61a136ea1d91 100644
> > > > > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> > > > > struct rxe_mr *mr = to_rmr(ibmr);
> > > > > unsigned int page_size = mr_page_size(mr);
> > > > > + if (page_size != PAGE_SIZE) {
> > > >
> > > > It seems this condition is too strict, it should be:
> > > > if (!IS_ALIGNED(page_size, PAGE_SIZE))
> > > >
> > > > So that, page_size with (N * PAGE_SIZE) can work as previously.
> > > > Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
> > >
> > > That makes sense
> >
> > I read all the discussions very carefully.
> >
> > Thanks, Greg.
> >
> > Because RXE only supports PAGE_SIZE, when CONFIG_ARM64_64K_PAGES is enabled,
> > the PAGE_SIZE is 64K, when CONFIG_ARM64_64K_PAGES is disabled, PAGE_SIZE is
> > 4K.
> >
> > But NVMe calls ib_map_mr_sg with a fixed size SZ_4K. When
> > CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem in
> > RXE. This problem is in NVMe.
>
> Maybe, but no real RDMA devices don't support 4K.
>
> The xarray conversion may need revision to use physical addresses
> instead of storing struct pages so it can handle this kind of
> segmentation.
This problem can not be fixed until rxe supports multiple page sizes
including 4K page size.
Now it is not fixed. It is an intermediate way.
>
> Certainly in the mean time it should be rejected.
>
> Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2023-11-01 0:58 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-27 5:41 [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Li Zhijian
2023-10-27 5:41 ` [PATCH RFC 2/2] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE Li Zhijian
2023-10-27 21:47 ` Bart Van Assche
2023-10-28 3:52 ` Zhu Yanjun
2023-10-27 8:17 ` [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr Zhu Yanjun
2023-10-27 21:46 ` Bart Van Assche
2023-10-28 2:48 ` Zhu Yanjun
2023-10-28 23:07 ` Bart Van Assche
2023-10-29 3:22 ` Zhu Yanjun
2023-10-30 8:13 ` Zhijian Li (Fujitsu)
2023-10-30 9:43 ` Zhu Yanjun
2023-10-30 7:51 ` Zhijian Li (Fujitsu)
2023-10-30 12:40 ` Jason Gunthorpe
2023-10-31 8:52 ` Zhu Yanjun
2023-10-31 13:19 ` Jason Gunthorpe
2023-11-01 0:58 ` Greg Sword
2023-10-31 9:59 ` Zhijian Li (Fujitsu)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox