* [BUG] shmem: shmem_get_folio_gfp livelock
From: 马超 @ 2026-06-30 12:55 UTC
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 田孝斌, 俞东斌, 李鹏程

[-- Attachment #1: Type: text/plain, Size: 4938 bytes --]

Hello,

I encountered a bug in the shmem subsystem. Details below.

[Summary]
shmem_get_folio_gfp() can livelock when multiple threads fault on the same shmem page concurrently. The -EEXIST retry loop (goto repeat) has no cond_resched(). As a result, the busy-looping threads starve the thread that holds the swapcache slot, which leads to an indefinite RCU stall and a system hang.

[Environment]
1. Kernel: 6.18.21 (ARM64, PREEMPT, CONFIG_LRU_GEN=y)
2. Triggered by: a multi-threaded app whose threads are constrained to 2 CPUs via cpuset

[Root Cause]
When multiple threads in the same process fault on the same shmem swap entry:
1. Thread A enters shmem_swap_alloc_folio() and succeeds at swapcache_prepare(), which sets SWAP_HAS_CACHE. It then enters workingset_refault() → lru_gen_refault() → rcu_read_lock(). While inside the RCU read-side critical section, it is preempted via preempt_schedule_irq() because the IRQ exit path detects TIF_NEED_RESCHED.
2. Threads B and C enter shmem_swap_alloc_folio(), fail at swapcache_prepare() (the slot is already taken by A), and return -EEXIST.
3. In shmem_get_folio_gfp():

       error = shmem_swapin_folio(...);
       if (error == -EEXIST)
               goto repeat;    /* no cond_resched(): tight loop */

4. Threads B and C spin at 100% CPU on this retry loop. All three threads share the same cpuset (CPU0-1, cpus_allowed=0x3). Thread A is perpetually preempted and starved: it cannot complete the few instructions needed to reach rcu_read_unlock().
5. Thread A's unfinished RCU read-side critical section blocks the grace period indefinitely. All synchronize_rcu() callers (cgroup operations, fd allocation, etc.) hang, and init eventually blocks as well.
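The root-cause sequence above can be illustrated with a toy model. The sketch below is a hypothetical single-CPU simulation in user-space C, not kernel code. It makes several assumptions that are not part of the report: strict-priority selection (the runnable task with the lowest kernel-internal prio value runs, as for SCHED_FIFO), one unit of work per tick, an assumed small amount of remaining work for A, and no RT throttling or migration. The prio values 98 and 91 come from the scheduling details in this report. For RT tasks, prio = 99 - rt_priority, so 91 outranks 98. The names retry_policy, toy_task, pick_next() and ticks_until_release() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only: hypothetical names, not kernel APIs.
 * Assumes one CPU, strict-priority picks, no RT throttling.
 */
enum retry_policy {
	RETRY_SPIN,  /* "goto repeat" at once; B stays runnable                */
	RETRY_YIELD, /* give up the CPU but stay runnable (cond_resched()-like) */
	RETRY_SLEEP, /* leave the run queue for sleep_ticks before retrying    */
};

struct toy_task {
	int prio;       /* kernel-internal prio: lower value = higher priority */
	int sleep_left; /* ticks until runnable again; 0 means runnable */
};

/* Strict-priority pick, as for SCHED_FIFO: lowest prio among runnable tasks. */
static struct toy_task *pick_next(struct toy_task *tasks, size_t n)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].sleep_left > 0)
			continue;
		if (best == NULL || tasks[i].prio < best->prio)
			best = &tasks[i];
	}
	return best;
}

/*
 * Simulate one CPU shared by A (slot holder; needs a_work ticks to reach
 * rcu_read_unlock() and release the slot) and B (retrier hitting -EEXIST).
 * Returns the tick on which A finishes, or -1 if it never does.
 */
static int ticks_until_release(enum retry_policy policy, int a_work,
			       int sleep_ticks, int max_ticks)
{
	struct toy_task tasks[2] = {
		{ .prio = 98, .sleep_left = 0 }, /* A: RCU-boosted slot holder */
		{ .prio = 91, .sleep_left = 0 }, /* B: -EEXIST retrier */
	};
	struct toy_task *a = &tasks[0], *b = &tasks[1];

	for (int tick = 0; tick < max_ticks; tick++) {
		struct toy_task *cur = pick_next(tasks, 2); /* A is always runnable */

		if (cur == a) {
			if (--a_work == 0)
				return tick;
		} else if (policy == RETRY_SLEEP) {
			/* B retried, got -EEXIST again, and blocks for sleep_ticks. */
			b->sleep_left = sleep_ticks;
		}
		/*
		 * RETRY_SPIN and RETRY_YIELD need no action: B is still runnable,
		 * so the next pick_next() chooses it over A again.
		 */

		/* Tasks that did not run this tick move one tick closer to waking. */
		for (size_t i = 0; i < 2; i++)
			if (&tasks[i] != cur && tasks[i].sleep_left > 0)
				tasks[i].sleep_left--;
	}
	return -1;
}
```

In this model, RETRY_SPIN and RETRY_YIELD both return -1. A yield that leaves B runnable hands the CPU straight back to B, because B is still the highest-priority runnable task. This is roughly what cond_resched() amounts to for a task that stays TASK_RUNNING. Only RETRY_SLEEP takes B off the run queue, and only then does A finish. Note also that on a fully preemptible (CONFIG_PREEMPT) kernel such as the one reported, cond_resched() is largely a no-op.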
[Scheduling Details]
Key observations:
1. Thread A was RCU-boosted to prio 98 but accumulated only 99 ms of execution over the entire stall period (~1200 s). It was effectively starved despite the priority boost.
2. Threads B and C have vruntime=0 and prio 91, which indicates that they run in the RT scheduling class (SCHED_FIFO or SCHED_RR policy). Each accumulated ~1134 seconds of execution with only ~1600 context switches. On average, each ran uninterrupted for ~700 ms between context switches.
3. Thread A cannot preempt Threads B and C. RCU boost raised Thread A to prio 98, but Threads B and C run at prio 91, and for RT tasks a lower numeric prio means a higher priority, so they outrank the boosted Thread A. The busy-looping threads never voluntarily yield (no cond_resched(), no blocking calls in the loop), so Thread A never gets scheduled.
4. CPU contention: CPU0 had nr_running=28 and CPU1 had nr_running=24, with 3-4 RT tasks per CPU. Thread A competed with Thread B on CPU0 but could not win scheduling.

[Observed Impact]
1. RCU stall lasting 910+ seconds (19 consecutive stall warnings; grace period g=4398761 never advanced)
2. synchronize_rcu_expedited() callers blocked for 742+ seconds
3. init process hung for more than 720 seconds → system unresponsive

[Call Traces]
Thread A (RCU stall source, sampled 19 times identically):
    __switch_to+0x1a4/0x360 (T)
    __schedule+0x96c/0xf3c
    preempt_schedule_irq+0xec/0x198
    raw_irqentry_exit_cond_resched+0x2c/0x44
    irqentry_exit+0x38/0x64
    exit_to_kernel_mode+0x28/0x38
    el1_interrupt+0x5c/0xa8
    el1h_64_irq_handler+0x18/0x24
    el1h_64_irq+0x84/0x88
    workingset_refault+0x16c/0x79c (P)
    shmem_swapin_folio+0x8e4/0xd44
    shmem_get_folio_gfp+0xb8/0x710
    shmem_fault+0xa0/0x174
    __do_fault
    do_pte_missing
    handle_mm_fault
    do_page_fault
    el0_ia

Thread B (busy-loop on CPU0, sum_exec_runtime=1134s):
    xas_load+0x78/0xe4 (P)
    shmem_swapin_folio+0x950/0xd44
    shmem_get_folio_gfp+0xb8/0x710
    shmem_fault → ... → el0_ia

Thread C (busy-loop on CPU1, sum_exec_runtime=1134s):
    xas_load+0x50/0xe4 (P)
    shmem_swapin_folio+0xd8/0xd44
    shmem_get_folio_gfp+0xb8/0x710
    shmem_fault → ... → el0_ia

[Question]
What is the recommended approach to fix this livelock? We are considering adding a cond_resched() before the goto repeat in shmem_get_folio_gfp() to break the tight loop and let the swapcache-holding thread make progress. Would this be an acceptable fix, or is there a better strategy (e.g., bounded retry with fallback, or yielding to the specific thread that holds the slot)?

Thanks,
Chao Ma

This e-mail and its attachments contain confidential information from XIAOMI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!

[-- Attachment #2: Type: text/html, Size: 20427 bytes --]
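The "bounded retry with fallback" option raised in the question can be sketched in user space as follows. This is an illustration under stated assumptions, not a proposed kernel change. try_claim_fn, claim_bounded() and fake_slot are hypothetical names. nanosleep() stands in for whatever blocking primitive a real fix would use. What the caller does once the retry budget runs out is left open. For comparison, the anonymous-memory swap-in path in recent mainline kernels (do_swap_page() in mm/memory.c) sleeps for one tick with schedule_timeout_uninterruptible(1) when swapcache_prepare() fails. Whether the 6.18.21 tree does the same should be confirmed against its source.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical "try to claim the slot" callback: returns 0 on success or
 * -EEXIST while another thread still holds the slot. Not a kernel API.
 */
typedef int (*try_claim_fn)(void *ctx);

/*
 * Bounded retry with a sleeping backoff (user-space sketch, not a patch).
 * Each -EEXIST is followed by a real sleep, so the retrier leaves the run
 * queue instead of spinning. The sleep doubles up to max_sleep_ns, which
 * must stay below one second. After max_attempts failures, -EEXIST is
 * returned so the caller can take a fallback path of its choosing.
 * *attempts receives the number of tries made.
 */
static int claim_bounded(try_claim_fn try_claim, void *ctx, int max_attempts,
			 long first_sleep_ns, long max_sleep_ns, int *attempts)
{
	long sleep_ns = first_sleep_ns;
	int err = -EEXIST;

	*attempts = 0;
	for (int i = 1; i <= max_attempts; i++) {
		*attempts = i;
		err = try_claim(ctx);
		if (err != -EEXIST)
			return err; /* success, or an unrelated error */
		if (i == max_attempts)
			break;      /* budget exhausted: no point sleeping again */

		struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };
		nanosleep(&ts, NULL); /* block rather than yield */
		sleep_ns = (sleep_ns * 2 > max_sleep_ns) ? max_sleep_ns : sleep_ns * 2;
	}
	return err;
}

/* Usage: a fake slot that reports busy for the first busy_for tries. */
struct fake_slot {
	int busy_for;
};

static int fake_try_claim(void *ctx)
{
	struct fake_slot *slot = ctx;

	return slot->busy_for-- > 0 ? -EEXIST : 0;
}
```

The property that matters in this sketch is blocking rather than yielding. A blocked retrier is off the run queue, so a lower-priority slot holder such as the boosted Thread A can run. The cap on the backoff bounds the extra latency added by each retry to max_sleep_ns.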
* [BUG] shmem: shmem_get_folio_gfp livelock
From: 马超 @ 2026-06-30 13:15 UTC
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 田孝斌, 俞东斌, 李鹏程

(Resend of the 12:55 message above with the same report text.)
Thread overview: 2+ messages
2026-06-30 12:55 [BUG] shmem: shmem_get_folio_gfp livelock 马超
2026-06-30 13:15 马超