* [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Leon Romanovsky @ 2025-05-22 11:36 UTC
To: Jason Gunthorpe; +Cc: Leon Romanovsky, Daisuke Matsuda, linux-rdma, Zhu Yanjun
From: Leon Romanovsky <leonro@nvidia.com>
RO pages have "perm" equal to 0, which caused such pages to be marked
as needing a fault and led to an infinite loop.
Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index a1416626f61a5..0f67167ddddd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
while (addr < iova + length) {
idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
- if (!(umem_odp->map.pfn_list[idx] & perm)) {
+ if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
need_fault = true;
break;
}
--
2.49.0
* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Daisuke Matsuda @ 2025-05-22 13:29 UTC
To: Leon Romanovsky, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma, Zhu Yanjun
On 2025/05/22 20:36, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> RO pages have "perm" equal to 0, which caused such pages to be marked
> as needing a fault and led to an infinite loop.
>
> Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>
Thank you!
This change fixes one of the two issues I reported.
The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
enabled in rxe. It might indicate that the root cause lies in the ib_uverbs layer.
I will take a closer look anyway.
Thanks,
Daisuke
> ---
> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> index a1416626f61a5..0f67167ddddd1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
> while (addr < iova + length) {
> idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
>
> - if (!(umem_odp->map.pfn_list[idx] & perm)) {
> + if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
> need_fault = true;
> break;
> }
* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Leon Romanovsky @ 2025-05-22 13:35 UTC
To: Jason Gunthorpe, Leon Romanovsky
Cc: Daisuke Matsuda, linux-rdma, Zhu Yanjun, Leon Romanovsky
On Thu, 22 May 2025 14:36:18 +0300, Leon Romanovsky wrote:
> RO pages have "perm" equal to 0, which caused such pages to be marked
> as needing a fault and led to an infinite loop.
>
>
Applied, thanks!
[1/1] RDMA/rxe: Break endless pagefault loop for RO pages
https://git.kernel.org/rdma/rdma/c/01ec1d8feaf938
Best regards,
--
Leon Romanovsky <leon@kernel.org>
* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Leon Romanovsky @ 2025-05-22 13:37 UTC
To: Daisuke Matsuda; +Cc: Jason Gunthorpe, linux-rdma, Zhu Yanjun
On Thu, May 22, 2025 at 10:29:02PM +0900, Daisuke Matsuda wrote:
>
> On 2025/05/22 20:36, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > RO pages have "perm" equal to 0, which caused such pages to be marked
> > as needing a fault and led to an infinite loop.
> >
> > Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> > Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> > Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>
> Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>
>
> Thank you!
> This change fixes one of the two issues I reported.
> The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
>
>
> The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
> cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
Thanks, I updated the link to point to https://lore.kernel.org/all/3016329a-4edd-4550-862f-b298a1b79a39@gmail.com/
>
> The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
> enabled in rxe. It might indicate that the root cause lies in the ib_uverbs layer.
Unlikely. Up till now, it has indicated that the driver didn't release
some uverbs object.
> I will take a closer look anyway.
>
> Thanks,
> Daisuke
>
>
> > ---
> > drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> > index a1416626f61a5..0f67167ddddd1 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> > @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
> > while (addr < iova + length) {
> > idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> > - if (!(umem_odp->map.pfn_list[idx] & perm)) {
> > + if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
> > need_fault = true;
> > break;
> > }
>
* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Leon Romanovsky @ 2025-05-22 13:42 UTC
To: Daisuke Matsuda; +Cc: Jason Gunthorpe, linux-rdma, Zhu Yanjun
On Thu, May 22, 2025 at 04:37:16PM +0300, Leon Romanovsky wrote:
> On Thu, May 22, 2025 at 10:29:02PM +0900, Daisuke Matsuda wrote:
> >
> > On 2025/05/22 20:36, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > RO pages have "perm" equal to 0, which caused such pages to be marked
> > > as needing a fault and led to an infinite loop.
> > >
> > > Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> > > Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> > > Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >
> > Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>
> >
> > Thank you!
> > This change fixes one of the two issues I reported.
> > The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
> >
> >
> > The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
> > cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
>
> Thanks, I updated the link to point to https://lore.kernel.org/all/3016329a-4edd-4550-862f-b298a1b79a39@gmail.com/
>
> >
> > The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
> > enabled in rxe. It might indicate that the root cause lies in the ib_uverbs layer.
>
> Unlikely. Up till now, it has indicated that the driver didn't release
> some uverbs object.
BTW, all places in the RXE driver that do the following:
page = hmm_pfn_to_page(umem_odp->map.pfn_list[index]);
if (!page) {
...
are incorrect: hmm_pfn_to_page() always returns a non-NULL pointer, so
this check can never fire.
Thanks
* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Zhu Yanjun @ 2025-05-22 15:40 UTC
To: Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, Daisuke Matsuda, linux-rdma, Zhu Yanjun
On 2025/5/22 13:36, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> RO pages have "perm" equal to 0, which caused such pages to be marked
> as needing a fault and led to an infinite loop.
>
> Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> index a1416626f61a5..0f67167ddddd1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
> while (addr < iova + length) {
> idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
>
> - if (!(umem_odp->map.pfn_list[idx] & perm)) {
> + if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
Because perm is no longer used, there is no need to calculate it and pass
it to rxe_check_pagefault(). The cleanup is below:
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index 9f6e2bb2a269..f385fccd5988 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -125,7 +125,7 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
}
static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
- u64 iova, int length, u32 perm)
+ u64 iova, int length)
{
bool need_fault = false;
u64 addr;
@@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
while (addr < iova + length) {
idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
- if (!(umem_odp->dma_list[idx] & perm)) {
+ if (!(umem_odp->dma_list[idx] & HMM_PFN_VALID)) {
need_fault = true;
break;
}
@@ -151,19 +151,14 @@ static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u
{
struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
bool need_fault;
- u64 perm;
int err;
if (unlikely(length < 1))
return -EINVAL;
- perm = ODP_READ_ALLOWED_BIT;
- if (!(flags & RXE_PAGEFAULT_RDONLY))
- perm |= ODP_WRITE_ALLOWED_BIT;
-
mutex_lock(&umem_odp->umem_mutex);
- need_fault = rxe_check_pagefault(umem_odp, iova, length, perm);
+ need_fault = rxe_check_pagefault(umem_odp, iova, length);
if (need_fault) {
mutex_unlock(&umem_odp->umem_mutex);
@@ -173,7 +168,7 @@ static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u
if (err < 0)
return err;
- need_fault = rxe_check_pagefault(umem_odp, iova, length, perm);
+ need_fault = rxe_check_pagefault(umem_odp, iova, length);
if (need_fault)
return -EFAULT;
}
Zhu Yanjun
> need_fault = true;
> break;
> }
* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Leon Romanovsky @ 2025-05-22 16:07 UTC
To: Zhu Yanjun; +Cc: Jason Gunthorpe, Daisuke Matsuda, linux-rdma, Zhu Yanjun
On Thu, May 22, 2025 at 05:40:38PM +0200, Zhu Yanjun wrote:
> On 2025/5/22 13:36, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > RO pages have "perm" equal to 0, which caused such pages to be marked
> > as needing a fault and led to an infinite loop.
> >
> > Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> > Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> > Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> > drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> > index a1416626f61a5..0f67167ddddd1 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> > @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
> > while (addr < iova + length) {
> > idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> > - if (!(umem_odp->map.pfn_list[idx] & perm)) {
> > + if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
>
> Because perm is no longer used, there is no need to calculate it and pass
> it to rxe_check_pagefault(). The cleanup is below:
Thanks a lot, I folded this cleanup into the fix.
* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Zhu Yanjun @ 2025-05-23 12:15 UTC
To: Daisuke Matsuda, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-rdma, Zhu Yanjun
On 22.05.25 15:29, Daisuke Matsuda wrote:
>
> On 2025/05/22 20:36, Leon Romanovsky wrote:
>> From: Leon Romanovsky <leonro@nvidia.com>
>>
>> RO pages have "perm" equal to 0, which caused such pages to be marked
>> as needing a fault and led to an infinite loop.
>>
>> Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
>> Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
>> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>
> Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>
In the bug report mail, you mentioned
"
After these two patches are merged to the for-next tree, RXE ODP test
always hangs:
RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage
RDMA/umem: Store ODP access mask information in PFN
"
After this commit is applied, which of the two previous commits is
innocent, and which one causes the "stuck issue in uverbs_destroy_ufile_hw"?
Best Regards,
Yanjun.Zhu
>
> Thank you!
> This change fixes one of the two issues I reported.
> The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
>
>
> The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
> cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
>
> The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
> enabled in rxe. It might indicate that the root cause lies in the ib_uverbs layer.
> I will take a closer look anyway.
>
> Thanks,
> Daisuke
>
>
>> ---
>> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>> index a1416626f61a5..0f67167ddddd1 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>> @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
>> while (addr < iova + length) {
>> idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
>> - if (!(umem_odp->map.pfn_list[idx] & perm)) {
>> + if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
>> need_fault = true;
>> break;
>> }
>
* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
From: Daisuke Matsuda @ 2025-05-23 12:57 UTC
To: Zhu Yanjun, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-rdma, Zhu Yanjun
On 2025/05/23 21:15, Zhu Yanjun wrote:
> On 22.05.25 15:29, Daisuke Matsuda wrote:
>>
>> On 2025/05/22 20:36, Leon Romanovsky wrote:
>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>
>>> RO pages have "perm" equal to 0, which caused such pages to be marked
>>> as needing a fault and led to an infinite loop.
>>>
>>> Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
>>> Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
>>> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
>>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>>
>> Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>
>
> In the bug report mail, you mentioned
> "
> After these two patches are merged to the for-next tree, RXE ODP test always hangs:
> RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage
> RDMA/umem: Store ODP access mask information in PFN
> "
>
> After this commit is applied, which of the two previous commits is innocent, and which one causes the "stuck issue in uverbs_destroy_ufile_hw"?
The issue caused by "RDMA/umem: Store ODP access mask information in PFN" has been resolved,
and after applying "RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage",
the stuck issue in uverbs_destroy_ufile_hw() emerges.
I have added some details to the bug report. I am going to post a fix,
though I am not sure people will like changing hmm.c to fix this one.
Thanks,
Daisuke
>
> Best Regards,
> Yanjun.Zhu
>
>>
>> Thank you!
>> This change fixes one of the two issues I reported.
>> The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
>>
>>
>> The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
>> cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
>>
>> The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
>> enabled in rxe. It might indicate that the root cause lies in the ib_uverbs layer.
>> I will take a closer look anyway.
>>
>> Thanks,
>> Daisuke
>>
>>
>>> ---
>>> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>>> index a1416626f61a5..0f67167ddddd1 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>>> @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
>>> while (addr < iova + length) {
>>> idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
>>> - if (!(umem_odp->map.pfn_list[idx] & perm)) {
>>> + if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
>>> need_fault = true;
>>> break;
>>> }
>>
>