linux-rdma.vger.kernel.org archive mirror
* [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
@ 2025-05-22 11:36 Leon Romanovsky
  2025-05-22 13:29 ` Daisuke Matsuda
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Leon Romanovsky @ 2025-05-22 11:36 UTC
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, Daisuke Matsuda, linux-rdma, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

Read-only (RO) pages have "perm" equal to 0, so the check always
marked such pages as needing a fault, which led to an infinite
loop.

Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index a1416626f61a5..0f67167ddddd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
 	while (addr < iova + length) {
 		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
 
-		if (!(umem_odp->map.pfn_list[idx] & perm)) {
+		if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
 			need_fault = true;
 			break;
 		}
-- 
2.49.0
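
A minimal user-space sketch of the one-line change above, for readers who want to see why the old test could never pass for read-only access. It assumes, as the commit message states, that read-only access ends up with perm == 0. The MODEL_PFN_* values and both function names are hypothetical and exist only in this model; the real flag definitions live in the kernel's HMM headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this model only, not the kernel's. */
#define MODEL_PFN_VALID (1ULL << 63)
#define MODEL_PFN_WRITE (1ULL << 62)

/* Old check: a page needs a fault unless it carries a bit from perm.
 * With perm == 0 the mask is empty, so every page "needs a fault". */
static bool need_fault_old(const uint64_t *pfn_list, size_t npages,
			   uint64_t perm)
{
	assert(pfn_list != NULL);
	for (size_t i = 0; i < npages; i++)
		if (!(pfn_list[i] & perm))
			return true;
	return false;
}

/* Fixed check: a page needs a fault only if it is not populated. */
static bool need_fault_new(const uint64_t *pfn_list, size_t npages)
{
	assert(pfn_list != NULL);
	for (size_t i = 0; i < npages; i++)
		if (!(pfn_list[i] & MODEL_PFN_VALID))
			return true;
	return false;
}
```

In this model, a fully populated range still reports "needs fault" through need_fault_old() when perm is 0, no matter how many times the caller faults the pages in, which is the loop the patch breaks.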



* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
  2025-05-22 11:36 [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages Leon Romanovsky
@ 2025-05-22 13:29 ` Daisuke Matsuda
  2025-05-22 13:37   ` Leon Romanovsky
  2025-05-23 12:15   ` Zhu Yanjun
  2025-05-22 13:35 ` Leon Romanovsky
  2025-05-22 15:40 ` Zhu Yanjun
  2 siblings, 2 replies; 9+ messages in thread
From: Daisuke Matsuda @ 2025-05-22 13:29 UTC
  To: Leon Romanovsky, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma, Zhu Yanjun


On 2025/05/22 20:36, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> RO pages has "perm" equal to 0, that caused to the situation
> where such pages were marked as needed to have fault and caused
> to infinite loop.
> 
> Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>

Thank you!
This change fixes one of the two issues I reported.
The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.


The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/

The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
enabled in rxe. It might indicate that the root cause lies in ib_uverbs layer.
I will take a closer look anyway.

Thanks,
Daisuke


> ---
>   drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> index a1416626f61a5..0f67167ddddd1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
>   	while (addr < iova + length) {
>   		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
>   
> -		if (!(umem_odp->map.pfn_list[idx] & perm)) {
> +		if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
>   			need_fault = true;
>   			break;
>   		}



* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
  2025-05-22 11:36 [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages Leon Romanovsky
  2025-05-22 13:29 ` Daisuke Matsuda
@ 2025-05-22 13:35 ` Leon Romanovsky
  2025-05-22 15:40 ` Zhu Yanjun
  2 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2025-05-22 13:35 UTC
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Daisuke Matsuda, linux-rdma, Zhu Yanjun, Leon Romanovsky


On Thu, 22 May 2025 14:36:18 +0300, Leon Romanovsky wrote:
> RO pages has "perm" equal to 0, that caused to the situation
> where such pages were marked as needed to have fault and caused
> to infinite loop.
> 
> 

Applied, thanks!

[1/1] RDMA/rxe: Break endless pagefault loop for RO pages
      https://git.kernel.org/rdma/rdma/c/01ec1d8feaf938

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>



* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
  2025-05-22 13:29 ` Daisuke Matsuda
@ 2025-05-22 13:37   ` Leon Romanovsky
  2025-05-22 13:42     ` Leon Romanovsky
  2025-05-23 12:15   ` Zhu Yanjun
  1 sibling, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2025-05-22 13:37 UTC
  To: Daisuke Matsuda; +Cc: Jason Gunthorpe, linux-rdma, Zhu Yanjun

On Thu, May 22, 2025 at 10:29:02PM +0900, Daisuke Matsuda wrote:
> 
> On 2025/05/22 20:36, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > RO pages has "perm" equal to 0, that caused to the situation
> > where such pages were marked as needed to have fault and caused
> > to infinite loop.
> > 
> > Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> > Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> > Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> 
> Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>
> 
> Thank you!
> This change fixes one of the two issues I reported.
> The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
> 
> 
> The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
> cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/

Thanks, I updated the link to point to https://lore.kernel.org/all/3016329a-4edd-4550-862f-b298a1b79a39@gmail.com/

> 
> The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
> enabled in rxe. It might indicate that the root cause lies in ib_uverbs layer.

Unlikely. Up till now, this kind of hang has indicated that the driver
didn't release some uverbs object.

> I will take a closer look anyway.
> 
> Thanks,
> Daisuke
> 
> 
> > ---
> >   drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> > index a1416626f61a5..0f67167ddddd1 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> > @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
> >   	while (addr < iova + length) {
> >   		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> > -		if (!(umem_odp->map.pfn_list[idx] & perm)) {
> > +		if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
> >   			need_fault = true;
> >   			break;
> >   		}
> 


* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
  2025-05-22 13:37   ` Leon Romanovsky
@ 2025-05-22 13:42     ` Leon Romanovsky
  0 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2025-05-22 13:42 UTC
  To: Daisuke Matsuda; +Cc: Jason Gunthorpe, linux-rdma, Zhu Yanjun

On Thu, May 22, 2025 at 04:37:16PM +0300, Leon Romanovsky wrote:
> On Thu, May 22, 2025 at 10:29:02PM +0900, Daisuke Matsuda wrote:
> > 
> > On 2025/05/22 20:36, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > > 
> > > RO pages has "perm" equal to 0, that caused to the situation
> > > where such pages were marked as needed to have fault and caused
> > > to infinite loop.
> > > 
> > > Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> > > Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> > > Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>
> > 
> > Thank you!
> > This change fixes one of the two issues I reported.
> > The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
> > 
> > 
> > The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
> > cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
> 
> Thanks, I updated the link to point to https://lore.kernel.org/all/3016329a-4edd-4550-862f-b298a1b79a39@gmail.com/
> 
> > 
> > The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
> > enabled in rxe. It might indicate that the root cause lies in ib_uverbs layer.
> 
> Unlikely, up till now, it indicated that driver didn't release some
> uverb object.

BTW, all places in the RXE driver that do the following:
page = hmm_pfn_to_page(umem_odp->map.pfn_list[index]);
if (!page) {
...

are incorrect; hmm_pfn_to_page() will always return something, so the
NULL check can never catch an unpopulated entry.

Thanks
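
A rough user-space model of the remark above, assuming the page lookup is plain index arithmetic on the PFN and therefore yields a non-NULL pointer even for an entry that was never faulted in. Everything named model_* or MODEL_* is hypothetical and only illustrates the pattern of testing the valid bit before converting; it is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag layout for this model only, not the kernel's. */
#define MODEL_PFN_VALID (1ULL << 63)
#define MODEL_PFN_FLAGS (0xFFULL << 56)
#define MODEL_NR_PAGES  16

struct model_page {
	int unused;
};

static struct model_page model_mem_map[MODEL_NR_PAGES];

/* Stand-in for a PFN-to-page helper: strips the flag bits and indexes
 * the page array. The result is never NULL, even for an empty entry. */
static struct model_page *model_pfn_to_page(uint64_t hmm_pfn)
{
	return &model_mem_map[(hmm_pfn & ~MODEL_PFN_FLAGS) % MODEL_NR_PAGES];
}

/* Safer lookup: consult the valid bit first, then convert. */
static struct model_page *model_lookup_page(const uint64_t *pfn_list,
					    size_t idx)
{
	assert(pfn_list != NULL);
	if (!(pfn_list[idx] & MODEL_PFN_VALID))
		return NULL;
	return model_pfn_to_page(pfn_list[idx]);
}
```

With this shape, a caller that needs a "not present" signal gets it from the flag bits rather than from the returned pointer.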


* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
  2025-05-22 11:36 [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages Leon Romanovsky
  2025-05-22 13:29 ` Daisuke Matsuda
  2025-05-22 13:35 ` Leon Romanovsky
@ 2025-05-22 15:40 ` Zhu Yanjun
  2025-05-22 16:07   ` Leon Romanovsky
  2 siblings, 1 reply; 9+ messages in thread
From: Zhu Yanjun @ 2025-05-22 15:40 UTC
  To: Leon Romanovsky, Jason Gunthorpe
  Cc: Leon Romanovsky, Daisuke Matsuda, linux-rdma, Zhu Yanjun

On 2025/5/22 13:36, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> RO pages has "perm" equal to 0, that caused to the situation
> where such pages were marked as needed to have fault and caused
> to infinite loop.
> 
> Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> index a1416626f61a5..0f67167ddddd1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
>   	while (addr < iova + length) {
>   		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
>   
> -		if (!(umem_odp->map.pfn_list[idx] & perm)) {
> +		if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {

Because perm is no longer used, there is no need to calculate it and
pass it to rxe_check_pagefault(). The cleanup is as follows:

diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index 9f6e2bb2a269..f385fccd5988 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -125,7 +125,7 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
  }

  static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
-                                      u64 iova, int length, u32 perm)
+                                      u64 iova, int length)
  {
         bool need_fault = false;
         u64 addr;
@@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
         while (addr < iova + length) {
                 idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;

-               if (!(umem_odp->dma_list[idx] & perm)) {
+               if (!(umem_odp->dma_list[idx] & HMM_PFN_VALID)) {
                         need_fault = true;
                         break;
                 }
@@ -151,19 +151,14 @@ static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u
  {
         struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
         bool need_fault;
-       u64 perm;
         int err;

         if (unlikely(length < 1))
                 return -EINVAL;

-       perm = ODP_READ_ALLOWED_BIT;
-       if (!(flags & RXE_PAGEFAULT_RDONLY))
-               perm |= ODP_WRITE_ALLOWED_BIT;
-
         mutex_lock(&umem_odp->umem_mutex);

-       need_fault = rxe_check_pagefault(umem_odp, iova, length, perm);
+       need_fault = rxe_check_pagefault(umem_odp, iova, length);
         if (need_fault) {
                 mutex_unlock(&umem_odp->umem_mutex);

@@ -173,7 +168,7 @@ static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u
                 if (err < 0)
                         return err;

-               need_fault = rxe_check_pagefault(umem_odp, iova, length, perm);
+               need_fault = rxe_check_pagefault(umem_odp, iova, length);
                 if (need_fault)
                         return -EFAULT;
         }

Zhu Yanjun

>   			need_fault = true;
>   			break;
>   		}
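
The control flow that the cleanup above leaves in place (check, fault in once, check again, give up with -EFAULT) can be sketched in user space as follows. This is a simplified model under stated assumptions: the mutex handling is omitted, the fault step is passed in as a hypothetical callback because the real call is not shown in the quoted hunks, and all model_* names and MODEL_* values are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value for this model only, not the kernel's. */
#define MODEL_PFN_VALID (1ULL << 63)

/* Hypothetical fault callback: returns 0 or a negative errno. */
typedef int (*model_fault_fn)(uint64_t *pfn_list, size_t npages);

static bool model_need_fault(const uint64_t *pfn_list, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		if (!(pfn_list[i] & MODEL_PFN_VALID))
			return true;
	return false;
}

/* Check once, fault in at most once, then check again. */
static int model_map_range(uint64_t *pfn_list, size_t npages,
			   model_fault_fn fault)
{
	int err;

	assert(pfn_list != NULL && fault != NULL);
	if (npages < 1)
		return -EINVAL;
	if (!model_need_fault(pfn_list, npages))
		return 0;

	err = fault(pfn_list, npages);
	if (err < 0)
		return err;

	/* Still unpopulated after faulting: report failure, do not spin. */
	return model_need_fault(pfn_list, npages) ? -EFAULT : 0;
}

/* Sample callbacks: one populates every page, one does nothing. */
static int model_fault_populate(uint64_t *pfn_list, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		pfn_list[i] |= MODEL_PFN_VALID;
	return 0;
}

static int model_fault_noop(uint64_t *pfn_list, size_t npages)
{
	(void)pfn_list;
	(void)npages;
	return 0;
}
```

Because the second check uses the same valid-bit predicate as the first, the model returns -EFAULT only when the fault step really left a page unpopulated, independent of the access mode requested.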



* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
  2025-05-22 15:40 ` Zhu Yanjun
@ 2025-05-22 16:07   ` Leon Romanovsky
  0 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2025-05-22 16:07 UTC
  To: Zhu Yanjun; +Cc: Jason Gunthorpe, Daisuke Matsuda, linux-rdma, Zhu Yanjun

On Thu, May 22, 2025 at 05:40:38PM +0200, Zhu Yanjun wrote:
> On 2025/5/22 13:36, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > RO pages has "perm" equal to 0, that caused to the situation
> > where such pages were marked as needed to have fault and caused
> > to infinite loop.
> > 
> > Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
> > Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
> > Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >   drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> > index a1416626f61a5..0f67167ddddd1 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> > @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
> >   	while (addr < iova + length) {
> >   		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> > -		if (!(umem_odp->map.pfn_list[idx] & perm)) {
> > +		if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
> 
> Because perm is not used, it is not necessary to calculate and pass perm to
> rxe_check_pagefault. The cleanup is as below:

Thanks a lot, I folded this cleanup into the fix.


* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
  2025-05-22 13:29 ` Daisuke Matsuda
  2025-05-22 13:37   ` Leon Romanovsky
@ 2025-05-23 12:15   ` Zhu Yanjun
  2025-05-23 12:57     ` Daisuke Matsuda
  1 sibling, 1 reply; 9+ messages in thread
From: Zhu Yanjun @ 2025-05-23 12:15 UTC
  To: Daisuke Matsuda, Leon Romanovsky, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, Zhu Yanjun

On 22.05.25 15:29, Daisuke Matsuda wrote:
> 
> On 2025/05/22 20:36, Leon Romanovsky wrote:
>> From: Leon Romanovsky <leonro@nvidia.com>
>>
>> RO pages has "perm" equal to 0, that caused to the situation
>> where such pages were marked as needed to have fault and caused
>> to infinite loop.
>>
>> Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
>> Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
>> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> 
> Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>

In the bug report mail, you mentioned
"
After these two patches are merged to the for-next tree, RXE ODP test 
always hangs:
   RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage
   RDMA/umem: Store ODP access mask information in PFN
"

After this commit is applied, which of the two previous commits is 
innocent, and which one causes the "stuck issue in uverbs_destroy_ufile_hw"?

Best Regards,
Yanjun.Zhu

> 
> Thank you!
> This change fixes one of the two issues I reported.
> The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
> 
> 
> The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
> cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
> 
> The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
> enabled in rxe. It might indicate that the root cause lies in ib_uverbs layer.
> I will take a closer look anyway.
> 
> Thanks,
> Daisuke
> 
> 
>> ---
>>   drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>> index a1416626f61a5..0f67167ddddd1 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>> @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
>>       while (addr < iova + length) {
>>           idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
>> -        if (!(umem_odp->map.pfn_list[idx] & perm)) {
>> +        if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
>>               need_fault = true;
>>               break;
>>           }
> 



* Re: [PATCH rdma-next] RDMA/rxe: Break endless pagefault loop for RO pages
  2025-05-23 12:15   ` Zhu Yanjun
@ 2025-05-23 12:57     ` Daisuke Matsuda
  0 siblings, 0 replies; 9+ messages in thread
From: Daisuke Matsuda @ 2025-05-23 12:57 UTC
  To: Zhu Yanjun, Leon Romanovsky, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, Zhu Yanjun


On 2025/05/23 21:15, Zhu Yanjun wrote:
> On 22.05.25 15:29, Daisuke Matsuda wrote:
>>
>> On 2025/05/22 20:36, Leon Romanovsky wrote:
>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>
>>> RO pages has "perm" equal to 0, that caused to the situation
>>> where such pages were marked as needed to have fault and caused
>>> to infinite loop.
>>>
>>> Fixes: eedd5b1276e7 ("RDMA/umem: Store ODP access mask information in PFN")
>>> Reported-by: Daisuke Matsuda <dskmtsd@gmail.com>
>>> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com
>>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>>
>> Tested-by: Daisuke Matsuda <dskmtsd@gmail.com>
> 
> In the bug report mail, you mentioned
> "
> After these two patches are merged to the for-next tree, RXE ODP test always hangs:
>    RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage
>    RDMA/umem: Store ODP access mask information in PFN
> "
> 
> After this commit is applied, which of the two previous commits is innocent, and which one causes the "stuck issue in uverbs_destroy_ufile_hw"?

The issue caused by "RDMA/umem: Store ODP access mask information in PFN" has been resolved,
and after applying "RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage",
the stuck issue in uverbs_destroy_ufile_hw() emerges.

I have added some details to the bug report. I am going to post a fix,
though I am not sure people will like changing hmm.c to fix this one.

Thanks,
Daisuke

> 
> Best Regards,
> Yanjun.Zhu
> 
>>
>> Thank you!
>> This change fixes one of the two issues I reported.
>> The kernel module does not get stuck in rxe_ib_invalidate_range() anymore.
>>
>>
>> The remaining one is the stuck issue in uverbs_destroy_ufile_hw().
>> cf. https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
>>
>> The issue occurs with test_odp_async_prefetch_rc_traffic, which is not yet
>> enabled in rxe. It might indicate that the root cause lies in ib_uverbs layer.
>> I will take a closer look anyway.
>>
>> Thanks,
>> Daisuke
>>
>>
>>> ---
>>>   drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>>> index a1416626f61a5..0f67167ddddd1 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>>> @@ -137,7 +137,7 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
>>>       while (addr < iova + length) {
>>>           idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
>>> -        if (!(umem_odp->map.pfn_list[idx] & perm)) {
>>> +        if (!(umem_odp->map.pfn_list[idx] & HMM_PFN_VALID)) {
>>>               need_fault = true;
>>>               break;
>>>           }
>>
> 


