[BUG] shmem: shmem_get_folio_gfp livelock

Linux-mm Archive on lore.kernel.org

* [BUG] shmem: shmem_get_folio_gfp livelock
From: 马超 @ 2026-06-30 12:55 UTC
  To: Andrew Morton
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	田孝斌, 俞东斌,
	李鹏程


Hello,
I encountered a bug in the shmem subsystem. Details below.
[Summary]
shmem_get_folio_gfp() can livelock when multiple threads fault on the
same shmem page concurrently. The -EEXIST retry loop (goto repeat) has
no cond_resched(), causing busy-looping threads to starve the thread
that holds the swapcache slot, resulting in an indefinite RCU stall and
system hang.
[Environment]
1. Kernel: 6.18.21 (ARM64, PREEMPT, CONFIG_LRU_GEN=y)
2. Trigger: a multi-threaded app whose threads are constrained to
2 CPUs via cpuset
[Root Cause]
When multiple threads in the same process fault on the same shmem
swap entry:
1. Thread A enters shmem_swap_alloc_folio(), succeeds in
swapcache_prepare() (setting SWAP_HAS_CACHE), then enters
workingset_refault() → lru_gen_refault() → rcu_read_lock().
While inside the RCU read-side critical section, it is
preempted via preempt_schedule_irq() (the IRQ exit path detects
TIF_NEED_RESCHED).
2. Threads B and C enter shmem_swap_alloc_folio(), fail in
swapcache_prepare() (the slot is already taken by A), and return
-EEXIST.
3. In shmem_get_folio_gfp():
    error = shmem_swapin_folio(...);
    if (error == -EEXIST)
        goto repeat;  // no cond_resched(): tight loop
4. Threads B and C spin at 100% CPU in the retry loop. All three
threads share the same cpuset (CPU0-1, cpus_allowed=0x3). Thread A
is perpetually preempted and starved: it cannot execute the few
instructions needed to reach rcu_read_unlock().
5. The unreleased RCU read-side critical section blocks the grace
period indefinitely, so all synchronize_rcu() callers (cgroup
operations, fd allocation, etc.) hang, eventually blocking init.
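The claim-and-retry interaction in steps 1 to 4 can be reduced to a small user-space model. This is a single-threaded sketch, not kernel code: model_swapcache_prepare(), model_swapcache_clear() and model_fault_retry() are hypothetical names, a C11 atomic flag stands in for SWAP_HAS_CACHE, and the owner_runs_at parameter stands in for the scheduler deciding when Thread A next gets CPU time. It illustrates that a waiter in the retry loop can only make progress after the owner runs.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SWAP_HAS_CACHE bit of one swap slot. */
static atomic_bool slot_has_cache = false;

/* Models swapcache_prepare(): the first caller claims the slot and gets 0;
 * every later caller gets -EEXIST until the owner clears the flag. */
int model_swapcache_prepare(void)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&slot_has_cache, &expected, true)
		? 0 : -EEXIST;
}

/* Models a thread finishing swap-in and dropping its claim on the slot. */
void model_swapcache_clear(void)
{
	atomic_store(&slot_has_cache, false);
}

/*
 * Models one waiter (Thread B or C) in the "goto repeat" loop.
 * owner_runs_at is the retry during which Thread A finally gets CPU time
 * and clears the flag; a negative value means Thread A never runs.
 * Returns the number of failed retries before the waiter claimed the slot,
 * or -1 once max_retries is used up (the real loop has no such bound).
 */
int model_fault_retry(int max_retries, int owner_runs_at)
{
	for (int retry = 0; retry < max_retries; retry++) {
		if (retry == owner_runs_at)
			model_swapcache_clear();	/* Thread A completes */
		if (model_swapcache_prepare() == 0) {
			model_swapcache_clear();	/* waiter completes too */
			return retry;
		}
	}
	return -1;
}
```

For example, once one model_swapcache_prepare() call has claimed the slot (Thread A), model_fault_retry(1000, -1) gives up with -1 because Thread A never runs, while model_fault_retry(1000, 5) returns 5.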
[Scheduling Details]
Key observations:
1. Thread A was RCU-boosted to prio 98 but accumulated only
99 ms of execution over the entire stall period (~1200 s).
It was effectively starved despite the priority boost.
2. Threads B and C have vruntime=0 and prio 91, indicating that
they run in the RT scheduling class (SCHED_FIFO or SCHED_RR
policy) rather than under the fair scheduler. Each accumulated
~1134 s of execution with only ~1600 context switches, i.e. on
average they ran uninterrupted for ~700 ms (1134 s / 1600)
between context switches.
3. Thread A cannot preempt Threads B and C. The prio values above
are kernel-internal, where a lower number means a higher priority
(for RT tasks, prio = 99 - rt_priority). RCU boost raised Thread A
only to prio 98 (RT priority 1), while Threads B and C run at
prio 91 (RT priority 8), a strictly higher priority. The
busy-looping threads never voluntarily yield (there is no
cond_resched() and no blocking call in the loop), so Thread A
never gets scheduled.
4. CPU contention: CPU0 had nr_running=28 and CPU1 had
nr_running=24, with 3-4 RT tasks per CPU. Thread A competed
with Thread B on CPU0 but never won the CPU.
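The priority numbers in points 1 to 3 are kernel-internal prio values. The following sketch converts them to the RT priorities seen by sched_setscheduler(), assuming the kernel's mapping for RT tasks (prio = MAX_RT_PRIO - 1 - rt_priority, with MAX_RT_PRIO = 100), and rechecks the run-interval arithmetic from point 2. The helper names are hypothetical and the preemption check is a simplification of the real RT scheduler.

```c
/* Mirrors the kernel constant: internal prio 0..99 is the RT range. */
#define MAX_RT_PRIO 100

/* For RT tasks the kernel stores prio = MAX_RT_PRIO - 1 - rt_priority,
 * so a lower internal prio means a higher scheduling priority. */
int rt_priority_from_prio(int prio)
{
	return MAX_RT_PRIO - 1 - prio;
}

/* Simplified RT rule (SCHED_DEADLINE and other details ignored): a waking
 * task preempts the running one only if its priority is strictly higher. */
int rt_preempts(int waker_prio, int running_prio)
{
	return waker_prio < running_prio;
}

/* Average run time per context switch, in milliseconds. */
double avg_run_ms(double exec_seconds, long nr_switches)
{
	return exec_seconds * 1000.0 / (double)nr_switches;
}
```

With these numbers, rt_priority_from_prio(98) is 1 and rt_priority_from_prio(91) is 8, so the boosted Thread A sits at the lowest RT priority, strictly below Threads B and C, and rt_preempts(98, 91) is false. avg_run_ms(1134.0, 1600) evaluates to about 709 ms.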
[Observed Impact]
1. RCU stall lasting 910+ seconds (19 consecutive stall
warnings; grace period g=4398761 never advanced)
2. synchronize_rcu_expedited() callers blocked for 742+ seconds
3. init process hung for > 720 seconds, leaving the system unresponsive
[Call Traces]
Thread A (RCU stall source, sampled 19 times identically):
__switch_to+0x1a4/0x360 (T)
__schedule+0x96c/0xf3c
preempt_schedule_irq+0xec/0x198
raw_irqentry_exit_cond_resched+0x2c/0x44
irqentry_exit+0x38/0x64
exit_to_kernel_mode+0x28/0x38
el1_interrupt+0x5c/0xa8
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x84/0x88
workingset_refault+0x16c/0x79c (P)
shmem_swapin_folio+0x8e4/0xd44
shmem_get_folio_gfp+0xb8/0x710
shmem_fault+0xa0/0x174
__do_fault
do_pte_missing
handle_mm_fault
do_page_fault
el0_ia

Thread B (busy-loop on CPU0, sum_exec_runtime=1134s):
xas_load+0x78/0xe4 (P)
shmem_swapin_folio+0x950/0xd44
shmem_get_folio_gfp+0xb8/0x710
shmem_fault → ... → el0_ia

Thread C (busy-loop on CPU1, sum_exec_runtime=1134s):
xas_load+0x50/0xe4 (P)
shmem_swapin_folio+0xd8/0xd44
shmem_get_folio_gfp+0xb8/0x710
shmem_fault → ... → el0_ia
[Question]
What is the recommended approach to fix this livelock?
We are considering adding a cond_resched() before the
goto repeat in shmem_get_folio_gfp() to break the tight
loop and allow the swapcache-holding thread to make
progress. Would this be an acceptable fix, or is there
a better strategy (e.g., a bounded retry with a fallback,
or yielding to the specific thread that holds the slot)?
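The bounded-retry alternative raised in the question could take roughly the following shape. This is a user-space sketch under stated assumptions, not a proposed patch: retry_with_backoff(), attempt_fn, struct fake_slot and fake_attempt() are hypothetical names, the callback stands in for one shmem_swapin_folio() call, and nanosleep() stands in for whatever sleeping primitive a kernel fix might use. The sketch sleeps instead of only yielding because, under SCHED_FIFO rules, a higher-priority waiter that yields but stays runnable is simply picked again ahead of a lower-priority owner such as the boosted Thread A. Note also that on a CONFIG_PREEMPT kernel cond_resched() is effectively a no-op.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical: one attempt, standing in for a shmem_swapin_folio() call.
 * Returns 0 on success or -EEXIST while another thread owns the slot. */
typedef int (*attempt_fn)(void *ctx);

/*
 * Bounded retry with a sleeping back-off. Sleeping takes the caller off
 * the run queue, so a lower-priority owner can run in the meantime.
 * Returns the first result other than -EEXIST, or -EEXIST after
 * max_tries refusals, at which point a caller would switch to some
 * fallback path (not modelled here).
 */
int retry_with_backoff(attempt_fn attempt, void *ctx, int max_tries,
		       long base_delay_ns)
{
	long delay_ns = base_delay_ns;
	int err = -EEXIST;

	for (int i = 0; i < max_tries; i++) {
		if (i > 0) {
			struct timespec ts = { .tv_sec = 0, .tv_nsec = delay_ns };

			nanosleep(&ts, NULL);	/* block, don't just yield */
			if (delay_ns < 1000000L)	/* stop doubling past ~1 ms */
				delay_ns *= 2;
		}
		err = attempt(ctx);
		if (err != -EEXIST)
			return err;
	}
	return err;
}

/* Hypothetical test double: refuses the first busy_calls attempts, as if
 * the slot owner needed that long to finish its swap-in. */
struct fake_slot {
	int busy_calls;
	int calls;
};

int fake_attempt(void *ctx)
{
	struct fake_slot *s = ctx;

	return s->calls++ < s->busy_calls ? -EEXIST : 0;
}
```

With struct fake_slot s = { .busy_calls = 3 }, retry_with_backoff(fake_attempt, &s, 10, 1000) returns 0 on the fourth attempt; if max_tries runs out first, the function returns -EEXIST, which is where a fallback path would take over.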

Thanks,
Chao Ma
This e-mail and its attachments contain confidential information from XIAOMI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!


