public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
@ 2025-11-13  9:53 lirongqing
  2025-11-13 11:27 ` Junxian Huang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: lirongqing @ 2025-11-13  9:53 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, huangjunxian6, linux-rdma; +Cc: Li RongQing

From: Li RongQing <lirongqing@baidu.com>

When a process exits with numerous large, pinned memory regions consisting
of 4KB pages, the cleanup of the memory regions through __ib_umem_release()
may cause soft lockups. This is because unpin_user_page_range_dirty_lock()
is called in a tight loop to unpin and release pages without yielding the
CPU.

 watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464]
 Kernel panic - not syncing: softlockup: hung tasks
 CPU: 44 PID: 73464 Comm: python3 Tainted: G           OEL

 asm_sysvec_apic_timer_interrupt+0x1b/0x20
 RIP: 0010:free_unref_page+0xff/0x190

  ? free_unref_page+0xe3/0x190
  __put_page+0x77/0xe0
  put_compound_head+0xed/0x100
  unpin_user_page_range_dirty_lock+0xb2/0x180
  __ib_umem_release+0x57/0xb0 [ib_core]
  ib_umem_release+0x3f/0xd0 [ib_core]
  mlx5_ib_dereg_mr+0x2e9/0x440 [mlx5_ib]
  ib_dereg_mr_user+0x43/0xb0 [ib_core]
  uverbs_free_mr+0x15/0x20 [ib_uverbs]
  destroy_hw_idr_uobject+0x21/0x60 [ib_uverbs]
  uverbs_destroy_uobject+0x38/0x1b0 [ib_uverbs]
  __uverbs_cleanup_ufile+0xd1/0x150 [ib_uverbs]
  uverbs_destroy_ufile_hw+0x3f/0x100 [ib_uverbs]
  ib_uverbs_close+0x1f/0xb0 [ib_uverbs]
  __fput+0x9c/0x280
  ____fput+0xe/0x20
  task_work_run+0x6a/0xb0
  do_exit+0x217/0x3c0
  do_group_exit+0x3b/0xb0
  get_signal+0x150/0x900
  arch_do_signal_or_restart+0xde/0x100
  exit_to_user_mode_loop+0xc4/0x160
  exit_to_user_mode_prepare+0xa0/0xb0
  syscall_exit_to_user_mode+0x27/0x50
  do_syscall_64+0x63/0xb0

Fix the soft lockup by adding a cond_resched() call within
__ib_umem_release(). Since SG entries are typically grouped in 2MB chunks
on x86_64, the added cond_resched() should have minimal performance
impact.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
Changes from v1: move the cond_resched() into the loop, add the call trace to the changelog

 drivers/infiniband/core/umem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c5b6863..8fd84aa 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -55,9 +55,11 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 		ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
 					   DMA_BIDIRECTIONAL, 0);
 
-	for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
+	for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i) {
 		unpin_user_page_range_dirty_lock(sg_page(sg),
 			DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
+		cond_resched();
+	}
 
 	sg_free_append_table(&umem->sgt_append);
 }
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
  2025-11-13  9:53 [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup lirongqing
@ 2025-11-13 11:27 ` Junxian Huang
  2025-11-13 13:35 ` Leon Romanovsky
  2025-11-17 17:47 ` Jason Gunthorpe
  2 siblings, 0 replies; 8+ messages in thread
From: Junxian Huang @ 2025-11-13 11:27 UTC (permalink / raw)
  To: lirongqing, Jason Gunthorpe, Leon Romanovsky, linux-rdma



On 2025/11/13 17:53, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> When a process exits with numerous large, pinned memory regions consisting
> of 4KB pages, the cleanup of the memory regions through __ib_umem_release()
> may cause soft lockups. This is because unpin_user_page_range_dirty_lock()
> is called in a tight loop to unpin and release pages without yielding the
> CPU.
> 
>  watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464]
>  Kernel panic - not syncing: softlockup: hung tasks
>  CPU: 44 PID: 73464 Comm: python3 Tainted: G           OEL
> 
>  asm_sysvec_apic_timer_interrupt+0x1b/0x20
>  RIP: 0010:free_unref_page+0xff/0x190
> 
>   ? free_unref_page+0xe3/0x190
>   __put_page+0x77/0xe0
>   put_compound_head+0xed/0x100
>   unpin_user_page_range_dirty_lock+0xb2/0x180
>   __ib_umem_release+0x57/0xb0 [ib_core]
>   ib_umem_release+0x3f/0xd0 [ib_core]
>   mlx5_ib_dereg_mr+0x2e9/0x440 [mlx5_ib]
>   ib_dereg_mr_user+0x43/0xb0 [ib_core]
>   uverbs_free_mr+0x15/0x20 [ib_uverbs]
>   destroy_hw_idr_uobject+0x21/0x60 [ib_uverbs]
>   uverbs_destroy_uobject+0x38/0x1b0 [ib_uverbs]
>   __uverbs_cleanup_ufile+0xd1/0x150 [ib_uverbs]
>   uverbs_destroy_ufile_hw+0x3f/0x100 [ib_uverbs]
>   ib_uverbs_close+0x1f/0xb0 [ib_uverbs]
>   __fput+0x9c/0x280
>   ____fput+0xe/0x20
>   task_work_run+0x6a/0xb0
>   do_exit+0x217/0x3c0
>   do_group_exit+0x3b/0xb0
>   get_signal+0x150/0x900
>   arch_do_signal_or_restart+0xde/0x100
>   exit_to_user_mode_loop+0xc4/0x160
>   exit_to_user_mode_prepare+0xa0/0xb0
>   syscall_exit_to_user_mode+0x27/0x50
>   do_syscall_64+0x63/0xb0
> 
> Fix the soft lockup by adding a cond_resched() call within
> __ib_umem_release(). Since SG entries are typically grouped in 2MB chunks
> on x86_64, the added cond_resched() should have minimal performance
> impact.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> Changes from v1: move the cond_resched() into the loop, add the call trace to the changelog
> 
>  drivers/infiniband/core/umem.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index c5b6863..8fd84aa 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -55,9 +55,11 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>  		ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
>  					   DMA_BIDIRECTIONAL, 0);
>  
> -	for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
> +	for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i) {
>  		unpin_user_page_range_dirty_lock(sg_page(sg),
>  			DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
> +		cond_resched();
> +	}

Acked-by: Junxian Huang <huangjunxian6@hisilicon.com>

>  
>  	sg_free_append_table(&umem->sgt_append);
>  }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
  2025-11-13  9:53 [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup lirongqing
  2025-11-13 11:27 ` Junxian Huang
@ 2025-11-13 13:35 ` Leon Romanovsky
  2025-11-17 17:47 ` Jason Gunthorpe
  2 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2025-11-13 13:35 UTC (permalink / raw)
  To: Jason Gunthorpe, huangjunxian6, linux-rdma, lirongqing


On Thu, 13 Nov 2025 17:53:17 +0800, lirongqing wrote:
> When a process exits with numerous large, pinned memory regions consisting
> of 4KB pages, the cleanup of the memory regions through __ib_umem_release()
> may cause soft lockups. This is because unpin_user_page_range_dirty_lock()
> is called in a tight loop to unpin and release pages without yielding the
> CPU.
> 
>  watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464]
>  Kernel panic - not syncing: softlockup: hung tasks
>  CPU: 44 PID: 73464 Comm: python3 Tainted: G           OEL
> 
> [...]

Applied, thanks!

[1/1] RDMA/core: Prevent soft lockup during large user memory region cleanup
      https://git.kernel.org/rdma/rdma/c/d056bc45b62b59

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
  2025-11-13  9:53 [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup lirongqing
  2025-11-13 11:27 ` Junxian Huang
  2025-11-13 13:35 ` Leon Romanovsky
@ 2025-11-17 17:47 ` Jason Gunthorpe
  2025-11-19  2:03   ` [????] " Li,Rongqing
  2 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2025-11-17 17:47 UTC (permalink / raw)
  To: lirongqing; +Cc: Leon Romanovsky, huangjunxian6, linux-rdma

On Thu, Nov 13, 2025 at 05:53:17PM +0800, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> When a process exits with numerous large, pinned memory regions consisting
> of 4KB pages, the cleanup of the memory regions through __ib_umem_release()
> may cause soft lockups. This is because unpin_user_page_range_dirty_lock()
> is called in a tight loop to unpin and release pages without yielding the
> CPU.
> 
>  watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464]
>  Kernel panic - not syncing: softlockup: hung tasks
>  CPU: 44 PID: 73464 Comm: python3 Tainted: G           OEL
> 
>  asm_sysvec_apic_timer_interrupt+0x1b/0x20
>  RIP: 0010:free_unref_page+0xff/0x190
> 
>   ? free_unref_page+0xe3/0x190
>   __put_page+0x77/0xe0
>   put_compound_head+0xed/0x100
>   unpin_user_page_range_dirty_lock+0xb2/0x180
>   __ib_umem_release+0x57/0xb0 [ib_core]
>   ib_umem_release+0x3f/0xd0 [ib_core]
>   mlx5_ib_dereg_mr+0x2e9/0x440 [mlx5_ib]
>   ib_dereg_mr_user+0x43/0xb0 [ib_core]
>   uverbs_free_mr+0x15/0x20 [ib_uverbs]
>   destroy_hw_idr_uobject+0x21/0x60 [ib_uverbs]
>   uverbs_destroy_uobject+0x38/0x1b0 [ib_uverbs]
>   __uverbs_cleanup_ufile+0xd1/0x150 [ib_uverbs]
>   uverbs_destroy_ufile_hw+0x3f/0x100 [ib_uverbs]
>   ib_uverbs_close+0x1f/0xb0 [ib_uverbs]
>   __fput+0x9c/0x280
>   ____fput+0xe/0x20
>   task_work_run+0x6a/0xb0
>   do_exit+0x217/0x3c0
>   do_group_exit+0x3b/0xb0
>   get_signal+0x150/0x900
>   arch_do_signal_or_restart+0xde/0x100
>   exit_to_user_mode_loop+0xc4/0x160
>   exit_to_user_mode_prepare+0xa0/0xb0
>   syscall_exit_to_user_mode+0x27/0x50
>   do_syscall_64+0x63/0xb0
> 
> Fix the soft lockup by adding a cond_resched() call within
> __ib_umem_release(). Since SG entries are typically grouped in 2MB chunks
> on x86_64, the added cond_resched() should have minimal performance
> impact.

This is not true, I think this should have been more careful to only
resched after larger groupings.. How much slower did you make normal
4k unpins??

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [????] Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
  2025-11-17 17:47 ` Jason Gunthorpe
@ 2025-11-19  2:03   ` Li,Rongqing
  2025-11-19 19:06     ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Li,Rongqing @ 2025-11-19  2:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, huangjunxian6@hisilicon.com,
	linux-rdma@vger.kernel.org

> > Fix the soft lockup by adding a cond_resched() call within
> > __ib_umem_release(). Since SG entries are typically grouped in 2MB
> > chunks on x86_64, the added cond_resched() should have minimal
> > performance impact.
> 
> This is not true, I think this should have been more careful to only resched
> after larger groupings.. How much slower did you make normal 4k unpins??
> 
> Jason


I don't see this as an issue for several reasons. First, this code path is
not performance-critical. Second, the number of cond_resched() calls added
by this modification is identical to what was introduced in commit
928da37a229f3444, which has never been reported to cause any problems.
Third, as seen in commit 16c610162d1f1c, the cond_resched() call rate was
reduced to once every 16 packets; our current call frequency remains well
below that rate.

When I have access to the appropriate hardware, I will collect performance
data for further analysis. Alternatively, if this is considered
problematic, someone could collaborate on optimizing these two
cond_resched() calls in umem.c together.

Thanks

-Li

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [????] Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
  2025-11-19  2:03   ` [????] " Li,Rongqing
@ 2025-11-19 19:06     ` Jason Gunthorpe
  2025-11-20  3:28       ` [????] " Li,Rongqing
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2025-11-19 19:06 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Leon Romanovsky, huangjunxian6@hisilicon.com,
	linux-rdma@vger.kernel.org

On Wed, Nov 19, 2025 at 02:03:20AM +0000, Li,Rongqing wrote:
> > > Fix the soft lockup by adding a cond_resched() call within
> > > __ib_umem_release(). Since SG entries are typically grouped in 2MB
> > > chunks on x86_64, the added cond_resched() should have minimal
> > > performance impact.
> > 
> > This is not true, I think this should have been more careful to only resched
> > after larger groupings.. How much slower did you make normal 4k unpins??
> > 
> > Jason
> 
> 
> I don't see this as an issue for several reasons. First, this code
> path is not performance-critical.

Yes it is!

> Second, the number of cond_resched
> calls added by this modification is identical to what was introduced
> in commit 928da37a229f3444, 

No it's not! That loop gathers entire batches of pages into a PAGE_SIZE
memory buffer; this does it for every single 4k page.

> any problems. Third, as seen in commit 16c610162d1f1c, the
> cond_resched call rate was reduced to once every 16 packets - our
> current frequency remains well below this commit.

I don't know what that has to do with anything here

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [????] Re: [????] Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
  2025-11-19 19:06     ` Jason Gunthorpe
@ 2025-11-20  3:28       ` Li,Rongqing
  2025-11-21 23:33         ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Li,Rongqing @ 2025-11-20  3:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, huangjunxian6@hisilicon.com,
	linux-rdma@vger.kernel.org



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: November 20, 2025 3:06
> To: Li,Rongqing <lirongqing@baidu.com>
> Cc: Leon Romanovsky <leon@kernel.org>; huangjunxian6@hisilicon.com;
> linux-rdma@vger.kernel.org
> Subject: [????] Re: [????] Re: [PATCH][v2] RDMA/core: Prevent soft lockup
> during large user memory region cleanup
> 
> On Wed, Nov 19, 2025 at 02:03:20AM +0000, Li,Rongqing wrote:
> > > > Fix the soft lockup by adding a cond_resched() call within
> > > > __ib_umem_release(). Since SG entries are typically grouped in
> > > > 2MB chunks on x86_64, the added cond_resched() should have
> > > > minimal performance impact.
> > >
> > > This is not true, I think this should have been more careful to only
> > > resched after larger groupings.. How much slower did you make normal
> 4k unpins??
> > >
> > > Jason
> >
> >
> > I don't see this as an issue for several reasons. First, this code path
> > is not performance-critical.
> 
> Yes it is!
> 
> > Second, the number of cond_resched
> > calls added by this modification is identical to what was introduced
> > in commit 928da37a229f3444,
> 
> No its not! That loop does entire batches of pages into a PAGE_SIZE memory
> buffer, this does it for every single 4k page.
> 

Thanks, I understand

To minimize the performance impact on releasing memory regions, call cond_resched() only once every 4096 loop iterations. How about the below?

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c5b6863..613c16d 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -45,6 +45,8 @@

 #include "uverbs.h"

+#define RESCHED_LOOP_CNT_THRESHOLD 0xfff
+
 static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 {
        bool make_dirty = umem->writable && dirty;
@@ -55,10 +57,15 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
                ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
                                           DMA_BIDIRECTIONAL, 0);

-       for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
+       for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i) {
                unpin_user_page_range_dirty_lock(sg_page(sg),
                        DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);

+               if (!(i & RESCHED_LOOP_CNT_THRESHOLD)) {
+                       cond_resched();
+               }
+       }
+
        sg_free_append_table(&umem->sgt_append);
 }


-Li


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [????] Re: [????] Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
  2025-11-20  3:28       ` [????] " Li,Rongqing
@ 2025-11-21 23:33         ` Jason Gunthorpe
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2025-11-21 23:33 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Leon Romanovsky, huangjunxian6@hisilicon.com,
	linux-rdma@vger.kernel.org

On Thu, Nov 20, 2025 at 03:28:18AM +0000, Li,Rongqing wrote:

> Thanks, I understand
> 
> To minimize performance impact on releasing memory regions, call cond_resched() per 4k loop, how about the below 

This seems like a reasonable idea, though I would just use % for
better clarity. The compiler knows how to convert that to bitmasking.
Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-21 23:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-13  9:53 [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup lirongqing
2025-11-13 11:27 ` Junxian Huang
2025-11-13 13:35 ` Leon Romanovsky
2025-11-17 17:47 ` Jason Gunthorpe
2025-11-19  2:03   ` [????] " Li,Rongqing
2025-11-19 19:06     ` Jason Gunthorpe
2025-11-20  3:28       ` [????] " Li,Rongqing
2025-11-21 23:33         ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox