* [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
@ 2025-11-13 9:53 lirongqing
2025-11-13 11:27 ` Junxian Huang
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: lirongqing @ 2025-11-13 9:53 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, huangjunxian6, linux-rdma; +Cc: Li RongQing
From: Li RongQing <lirongqing@baidu.com>
When a process exits with numerous large, pinned memory regions consisting
of 4KB pages, cleaning up the memory regions through __ib_umem_release()
may cause soft lockups. This is because unpin_user_page_range_dirty_lock()
is called in a tight loop to unpin and release pages without yielding the
CPU.
watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464]
Kernel panic - not syncing: softlockup: hung tasks
CPU: 44 PID: 73464 Comm: python3 Tainted: G OEL
asm_sysvec_apic_timer_interrupt+0x1b/0x20
RIP: 0010:free_unref_page+0xff/0x190
? free_unref_page+0xe3/0x190
__put_page+0x77/0xe0
put_compound_head+0xed/0x100
unpin_user_page_range_dirty_lock+0xb2/0x180
__ib_umem_release+0x57/0xb0 [ib_core]
ib_umem_release+0x3f/0xd0 [ib_core]
mlx5_ib_dereg_mr+0x2e9/0x440 [mlx5_ib]
ib_dereg_mr_user+0x43/0xb0 [ib_core]
uverbs_free_mr+0x15/0x20 [ib_uverbs]
destroy_hw_idr_uobject+0x21/0x60 [ib_uverbs]
uverbs_destroy_uobject+0x38/0x1b0 [ib_uverbs]
__uverbs_cleanup_ufile+0xd1/0x150 [ib_uverbs]
uverbs_destroy_ufile_hw+0x3f/0x100 [ib_uverbs]
ib_uverbs_close+0x1f/0xb0 [ib_uverbs]
__fput+0x9c/0x280
____fput+0xe/0x20
task_work_run+0x6a/0xb0
do_exit+0x217/0x3c0
do_group_exit+0x3b/0xb0
get_signal+0x150/0x900
arch_do_signal_or_restart+0xde/0x100
exit_to_user_mode_loop+0xc4/0x160
exit_to_user_mode_prepare+0xa0/0xb0
syscall_exit_to_user_mode+0x27/0x50
do_syscall_64+0x63/0xb0
Fix the soft lockup by adding a cond_resched() call within the
__ib_umem_release() loop. Since SG entries are typically grouped in 2MB
chunks on x86_64, adding cond_resched() should have minimal performance
impact.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
Changes since v1: moved the cond_resched() into the loop; added the call trace to the changelog
drivers/infiniband/core/umem.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c5b6863..8fd84aa 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -55,9 +55,11 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
DMA_BIDIRECTIONAL, 0);
- for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
+ for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i) {
unpin_user_page_range_dirty_lock(sg_page(sg),
DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
+ cond_resched();
+ }
sg_free_append_table(&umem->sgt_append);
}
--
2.9.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
2025-11-13 9:53 [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup lirongqing
@ 2025-11-13 11:27 ` Junxian Huang
2025-11-13 13:35 ` Leon Romanovsky
2025-11-17 17:47 ` Jason Gunthorpe
2 siblings, 0 replies; 8+ messages in thread
From: Junxian Huang @ 2025-11-13 11:27 UTC (permalink / raw)
To: lirongqing, Jason Gunthorpe, Leon Romanovsky, linux-rdma
On 2025/11/13 17:53, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
>
> When a process exits with numerous large, pinned memory regions consisting
> of 4KB pages, cleaning up the memory regions through __ib_umem_release()
> may cause soft lockups. This is because unpin_user_page_range_dirty_lock()
> is called in a tight loop to unpin and release pages without yielding the
> CPU.
>
> watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464]
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 44 PID: 73464 Comm: python3 Tainted: G OEL
>
> asm_sysvec_apic_timer_interrupt+0x1b/0x20
> RIP: 0010:free_unref_page+0xff/0x190
>
> ? free_unref_page+0xe3/0x190
> __put_page+0x77/0xe0
> put_compound_head+0xed/0x100
> unpin_user_page_range_dirty_lock+0xb2/0x180
> __ib_umem_release+0x57/0xb0 [ib_core]
> ib_umem_release+0x3f/0xd0 [ib_core]
> mlx5_ib_dereg_mr+0x2e9/0x440 [mlx5_ib]
> ib_dereg_mr_user+0x43/0xb0 [ib_core]
> uverbs_free_mr+0x15/0x20 [ib_uverbs]
> destroy_hw_idr_uobject+0x21/0x60 [ib_uverbs]
> uverbs_destroy_uobject+0x38/0x1b0 [ib_uverbs]
> __uverbs_cleanup_ufile+0xd1/0x150 [ib_uverbs]
> uverbs_destroy_ufile_hw+0x3f/0x100 [ib_uverbs]
> ib_uverbs_close+0x1f/0xb0 [ib_uverbs]
> __fput+0x9c/0x280
> ____fput+0xe/0x20
> task_work_run+0x6a/0xb0
> do_exit+0x217/0x3c0
> do_group_exit+0x3b/0xb0
> get_signal+0x150/0x900
> arch_do_signal_or_restart+0xde/0x100
> exit_to_user_mode_loop+0xc4/0x160
> exit_to_user_mode_prepare+0xa0/0xb0
> syscall_exit_to_user_mode+0x27/0x50
> do_syscall_64+0x63/0xb0
>
> Fix the soft lockup by adding a cond_resched() call within the
> __ib_umem_release() loop. Since SG entries are typically grouped in 2MB
> chunks on x86_64, adding cond_resched() should have minimal performance
> impact.
>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> Changes since v1: moved the cond_resched() into the loop; added the call trace to the changelog
>
> drivers/infiniband/core/umem.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index c5b6863..8fd84aa 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -55,9 +55,11 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
> DMA_BIDIRECTIONAL, 0);
>
> - for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
> + for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i) {
> unpin_user_page_range_dirty_lock(sg_page(sg),
> DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
> + cond_resched();
> + }
Acked-by: Junxian Huang <huangjunxian6@hisilicon.com>
>
> sg_free_append_table(&umem->sgt_append);
> }
* Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
2025-11-13 9:53 [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup lirongqing
2025-11-13 11:27 ` Junxian Huang
@ 2025-11-13 13:35 ` Leon Romanovsky
2025-11-17 17:47 ` Jason Gunthorpe
2 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2025-11-13 13:35 UTC (permalink / raw)
To: Jason Gunthorpe, huangjunxian6, linux-rdma, lirongqing
On Thu, 13 Nov 2025 17:53:17 +0800, lirongqing wrote:
> When a process exits with numerous large, pinned memory regions consisting
> of 4KB pages, cleaning up the memory regions through __ib_umem_release()
> may cause soft lockups. This is because unpin_user_page_range_dirty_lock()
> is called in a tight loop to unpin and release pages without yielding the
> CPU.
>
> watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464]
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 44 PID: 73464 Comm: python3 Tainted: G OEL
>
> [...]
Applied, thanks!
[1/1] RDMA/core: Prevent soft lockup during large user memory region cleanup
https://git.kernel.org/rdma/rdma/c/d056bc45b62b59
Best regards,
--
Leon Romanovsky <leon@kernel.org>
* Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
2025-11-13 9:53 [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup lirongqing
2025-11-13 11:27 ` Junxian Huang
2025-11-13 13:35 ` Leon Romanovsky
@ 2025-11-17 17:47 ` Jason Gunthorpe
2025-11-19 2:03 ` Li,Rongqing
2 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2025-11-17 17:47 UTC (permalink / raw)
To: lirongqing; +Cc: Leon Romanovsky, huangjunxian6, linux-rdma
On Thu, Nov 13, 2025 at 05:53:17PM +0800, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
>
> When a process exits with numerous large, pinned memory regions consisting
> of 4KB pages, cleaning up the memory regions through __ib_umem_release()
> may cause soft lockups. This is because unpin_user_page_range_dirty_lock()
> is called in a tight loop to unpin and release pages without yielding the
> CPU.
>
> watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464]
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 44 PID: 73464 Comm: python3 Tainted: G OEL
>
> asm_sysvec_apic_timer_interrupt+0x1b/0x20
> RIP: 0010:free_unref_page+0xff/0x190
>
> ? free_unref_page+0xe3/0x190
> __put_page+0x77/0xe0
> put_compound_head+0xed/0x100
> unpin_user_page_range_dirty_lock+0xb2/0x180
> __ib_umem_release+0x57/0xb0 [ib_core]
> ib_umem_release+0x3f/0xd0 [ib_core]
> mlx5_ib_dereg_mr+0x2e9/0x440 [mlx5_ib]
> ib_dereg_mr_user+0x43/0xb0 [ib_core]
> uverbs_free_mr+0x15/0x20 [ib_uverbs]
> destroy_hw_idr_uobject+0x21/0x60 [ib_uverbs]
> uverbs_destroy_uobject+0x38/0x1b0 [ib_uverbs]
> __uverbs_cleanup_ufile+0xd1/0x150 [ib_uverbs]
> uverbs_destroy_ufile_hw+0x3f/0x100 [ib_uverbs]
> ib_uverbs_close+0x1f/0xb0 [ib_uverbs]
> __fput+0x9c/0x280
> ____fput+0xe/0x20
> task_work_run+0x6a/0xb0
> do_exit+0x217/0x3c0
> do_group_exit+0x3b/0xb0
> get_signal+0x150/0x900
> arch_do_signal_or_restart+0xde/0x100
> exit_to_user_mode_loop+0xc4/0x160
> exit_to_user_mode_prepare+0xa0/0xb0
> syscall_exit_to_user_mode+0x27/0x50
> do_syscall_64+0x63/0xb0
>
> Fix the soft lockup by adding a cond_resched() call within the
> __ib_umem_release() loop. Since SG entries are typically grouped in 2MB
> chunks on x86_64, adding cond_resched() should have minimal performance
> impact.
This is not true; I think this should have been more careful to only
resched after larger groupings. How much slower did you make normal
4k unpins?
Jason
* RE: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
2025-11-17 17:47 ` Jason Gunthorpe
@ 2025-11-19 2:03 ` Li,Rongqing
2025-11-19 19:06 ` Jason Gunthorpe
0 siblings, 1 reply; 8+ messages in thread
From: Li,Rongqing @ 2025-11-19 2:03 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, huangjunxian6@hisilicon.com,
linux-rdma@vger.kernel.org
> > Fix the soft lockup by adding a cond_resched() call within the
> > __ib_umem_release() loop. Since SG entries are typically grouped in
> > 2MB chunks on x86_64, adding cond_resched() should have minimal
> > performance impact.
>
> This is not true, I think this should have been more careful to only resched
> after larger groupings.. How much slower did you make normal 4k unpins??
>
> Jason
I don't see this as an issue for several reasons. First, this code path is not performance-critical. Second, the number of cond_resched() calls added by this modification is identical to what was introduced in commit 928da37a229f3444, which has never been reported to cause any problems. Third, as seen in commit 16c610162d1f1c, the cond_resched() call rate was reduced to once every 16 packets; our current frequency remains well below that.
When I have access to the appropriate hardware, I will collect performance data for further analysis. Alternatively, if this is considered problematic, someone could collaborate on optimizing these two cond_resched() calls in umem.c together.
Thanks
-Li
* Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
2025-11-19 2:03 ` Li,Rongqing
@ 2025-11-19 19:06 ` Jason Gunthorpe
2025-11-20 3:28 ` [????] " Li,Rongqing
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2025-11-19 19:06 UTC (permalink / raw)
To: Li,Rongqing
Cc: Leon Romanovsky, huangjunxian6@hisilicon.com,
linux-rdma@vger.kernel.org
On Wed, Nov 19, 2025 at 02:03:20AM +0000, Li,Rongqing wrote:
> > > Fix the soft lockup by adding a cond_resched() call within the
> > > __ib_umem_release() loop. Since SG entries are typically grouped in
> > > 2MB chunks on x86_64, adding cond_resched() should have minimal
> > > performance impact.
> >
> > This is not true, I think this should have been more careful to only resched
> > after larger groupings.. How much slower did you make normal 4k unpins??
> >
> > Jason
>
>
> I don't see this as a issue for several reasons. First, this code
> path is not performance-critical.
Yes it is!
> Second, the number of cond_resched
> calls added by this modification is identical to what was introduced
> in commit 928da37a229f3444,
No it's not! That loop does entire batches of pages into a PAGE_SIZE
memory buffer; this does it for every single 4k page.
> any problems. Third, as seen in commit 16c610162d1f1c, the
> cond_resched call rate was reduced to once every 16 packets - our
> current frequency remains well below this commit.
I don't know what that has to do with anything here
Jason
* RE: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
2025-11-19 19:06 ` Jason Gunthorpe
@ 2025-11-20 3:28 ` Li,Rongqing
2025-11-21 23:33 ` Jason Gunthorpe
0 siblings, 1 reply; 8+ messages in thread
From: Li,Rongqing @ 2025-11-20 3:28 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, huangjunxian6@hisilicon.com,
linux-rdma@vger.kernel.org
> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: November 20, 2025 3:06
> To: Li,Rongqing <lirongqing@baidu.com>
> Cc: Leon Romanovsky <leon@kernel.org>; huangjunxian6@hisilicon.com;
> linux-rdma@vger.kernel.org
> Subject: Re: [PATCH][v2] RDMA/core: Prevent soft lockup
> during large user memory region cleanup
>
> On Wed, Nov 19, 2025 at 02:03:20AM +0000, Li,Rongqing wrote:
> > > > Fix the soft lockup by adding a cond_resched() call within the
> > > > __ib_umem_release() loop. Since SG entries are typically grouped
> > > > in 2MB chunks on x86_64, adding cond_resched() should have
> > > > minimal performance impact.
> > >
> > > This is not true; I think this should have been more careful to only
> > > resched after larger groupings. How much slower did you make normal
> > > 4k unpins?
> > >
> > > Jason
> >
> >
> > I don't see this as a issue for several reasons. First, this code path
> > is not performance-critical.
>
> Yes it is!
>
> > Second, the number of cond_resched
> > calls added by this modification is identical to what was introduced
> > in commit 928da37a229f3444,
>
> No its not! That loop does entire batches of pages into a PAGE_SIZE memory
> buffer, this does it for every single 4k page.
>
Thanks, I understand.
To minimize the performance impact of releasing memory regions, call cond_resched() once every 4096 loop iterations; how about the below?
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c5b6863..613c16d 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -45,6 +45,8 @@
#include "uverbs.h"
+#define RESCHED_LOOP_CNT_THRESHOLD 0xfff
+
static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
{
bool make_dirty = umem->writable && dirty;
@@ -55,10 +57,15 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
DMA_BIDIRECTIONAL, 0);
- for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
+ for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i) {
unpin_user_page_range_dirty_lock(sg_page(sg),
DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
+ if (!(i & RESCHED_LOOP_CNT_THRESHOLD)) {
+ cond_resched();
+ }
+ }
+
sg_free_append_table(&umem->sgt_append);
}
-Li
* Re: [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup
2025-11-20 3:28 ` Li,Rongqing
@ 2025-11-21 23:33 ` Jason Gunthorpe
0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2025-11-21 23:33 UTC (permalink / raw)
To: Li,Rongqing
Cc: Leon Romanovsky, huangjunxian6@hisilicon.com,
linux-rdma@vger.kernel.org
On Thu, Nov 20, 2025 at 03:28:18AM +0000, Li,Rongqing wrote:
> Thanks, I understand
>
> To minimize performance impact on releasing memory regions, call cond_resched() per 4k loop, how about the below
This seems like a reasonable idea, though I would just use % for
better clarity. The compiler knows how to convert that to bitmasking.
Jason
end of thread, other threads:[~2025-11-21 23:33 UTC | newest]
Thread overview: 8+ messages
2025-11-13 9:53 [PATCH][v2] RDMA/core: Prevent soft lockup during large user memory region cleanup lirongqing
2025-11-13 11:27 ` Junxian Huang
2025-11-13 13:35 ` Leon Romanovsky
2025-11-17 17:47 ` Jason Gunthorpe
2025-11-19 2:03 ` Li,Rongqing
2025-11-19 19:06 ` Jason Gunthorpe
2025-11-20 3:28 ` Li,Rongqing
2025-11-21 23:33 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox