Hello,

I encountered a bug in the shmem subsystem. Details below.

[Summary]

shmem_get_folio_gfp() can livelock when multiple threads fault on the
same shmem page concurrently. The -EEXIST retry loop (goto repeat) has
no cond_resched(), causing busy-looping threads to starve the thread
that holds the swapcache slot, resulting in an indefinite RCU stall and
system hang.

[Environment]

1. Kernel: 6.18.21 (ARM64, PREEMPT, CONFIG_LRU_GEN=y)
2. Triggered by: multi-threaded app with threads constrained to 2 CPUs
   via cpuset

[Root Cause]

When multiple threads in the same process fault on the same shmem
swap entry:

1. Thread A enters shmem_swap_alloc_folio(), succeeds at
   swapcache_prepare() (sets SWAP_HAS_CACHE), then enters
   workingset_refault() → lru_gen_refault() → rcu_read_lock().
   While inside the RCU read-side critical section, it is
   preempted via preempt_schedule_irq() (the IRQ exit path detects
   TIF_NEED_RESCHED).

2. Threads B & C enter shmem_swap_alloc_folio(), fail at
   swapcache_prepare() (slot already taken by A), and return -EEXIST.

3. In shmem_get_folio_gfp():

       error = shmem_swapin_folio(...);
       if (error == -EEXIST)
               goto repeat;    // no cond_resched(), tight loop

4. Threads B & C spin at 100% CPU in the retry loop. All three
   threads share the same cpuset (CPU0-1, cpus_allowed=0x3). Thread A
   is perpetually preempted and starved: it cannot complete the
   few instructions needed to call rcu_read_unlock().

5. The held RCU read lock blocks the grace period indefinitely,
   causing all synchronize_rcu() callers (cgroup operations,
   fd allocation, etc.) to hang, eventually blocking init.
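
For readers less familiar with this path, the claim step in items 1 and 2
can be modelled in user space. This is only a sketch under stated
assumptions: struct model_slot, model_swapcache_prepare() and
model_swapcache_clear() are hypothetical names, and a C11 atomic_flag
stands in for the SWAP_HAS_CACHE state. It is not the kernel
implementation.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the SWAP_HAS_CACHE state of one swap slot. */
struct model_slot {
	atomic_flag has_cache;
};

/*
 * Model of the claim step: the first caller sets the flag and wins (0);
 * every later caller finds it already set and gets -EEXIST.
 */
static int model_swapcache_prepare(struct model_slot *slot)
{
	if (atomic_flag_test_and_set(&slot->has_cache))
		return -EEXIST;
	return 0;
}

/* Model of the winner finishing its work and releasing the slot. */
static void model_swapcache_clear(struct model_slot *slot)
{
	atomic_flag_clear(&slot->has_cache);
}
```

In this model, every loser keeps getting -EEXIST until the winner reaches
model_swapcache_clear(). If the winner never gets CPU time, a loser that
retries immediately never sees the result change.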

[Scheduling Details]

Key observations:

1. Thread A was RCU-boosted to prio 98 but accumulated only
   99 ms of execution over the entire stall period (~1200 s).
   It was effectively starved despite the priority boost.

2. Threads B & C have vruntime=0 and prio 91, indicating
   they run in an RT-equivalent scheduling class (SCHED_FIFO/RT
   policy). Each accumulated ~1134 seconds of execution with
   only ~1600 context switches, meaning they ran uninterrupted
   for ~700 ms at a time on average.

3. Thread A cannot preempt Threads B & C: although RCU boost
   raised Thread A to prio 98, Threads B & C at prio 91 (lower
   numeric value = higher priority in the RT class) have a
   higher effective priority. The busy-looping threads never
   voluntarily yield (no cond_resched(), no blocking calls in
   the loop), so Thread A never gets scheduled.

4. CPU contention: CPU0 had nr_running=28 and CPU1 had
   nr_running=24, with 3-4 RT tasks per CPU. Thread A competed
   with Thread B on CPU0 but could not win scheduling.
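
The arithmetic behind these observations can be checked with a small
user-space sketch. It assumes the usual mapping for RT tasks,
prio = MAX_RT_PRIO - 1 - rt_priority with MAX_RT_PRIO = 100. The helper
names below are hypothetical and exist only for illustration.

```c
#define MODEL_MAX_RT_PRIO 100	/* assumed to mirror the kernel's MAX_RT_PRIO */

/* Convert a kernel-internal prio value to a user-visible RT priority. */
static int rt_priority_from_prio(int prio)
{
	return MODEL_MAX_RT_PRIO - 1 - prio;
}

/* A task outranks another when its kernel prio value is numerically lower. */
static int outranks(int prio_a, int prio_b)
{
	return prio_a < prio_b;
}

/* Average uninterrupted run time, in milliseconds, per context switch. */
static double avg_run_ms(double exec_seconds, double switches)
{
	return exec_seconds * 1000.0 / switches;
}
```

Under that assumption, prio 98 corresponds to RT priority 1 and prio 91
to RT priority 8, so outranks(91, 98) is true and Threads B & C win
against the boosted Thread A. avg_run_ms(1134, 1600) gives about 709 ms,
which matches the ~700 ms figure above.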

[Observed Impact]

1. RCU stall lasting 910+ seconds (19 consecutive stall
   warnings, grace period g=4398761 never advanced)
2. synchronize_rcu_expedited() callers blocked 742+ seconds
3. init process hung > 720 seconds → system unresponsive

[Call Traces]

Thread A (RCU stall source, sampled 19 times identically):

  __switch_to+0x1a4/0x360 (T)
  __schedule+0x96c/0xf3c
  preempt_schedule_irq+0xec/0x198
  raw_irqentry_exit_cond_resched+0x2c/0x44
  irqentry_exit+0x38/0x64
  exit_to_kernel_mode+0x28/0x38
  el1_interrupt+0x5c/0xa8
  el1h_64_irq_handler+0x18/0x24
  el1h_64_irq+0x84/0x88
  workingset_refault+0x16c/0x79c (P)
  shmem_swapin_folio+0x8e4/0xd44
  shmem_get_folio_gfp+0xb8/0x710
  shmem_fault+0xa0/0x174
  __do_fault
  do_pte_missing
  handle_mm_fault
  do_page_fault
  el0_ia

Thread B (busy-loop on CPU0, sum_exec_runtime=1134s):

  xas_load+0x78/0xe4 (P)
  shmem_swapin_folio+0x950/0xd44
  shmem_get_folio_gfp+0xb8/0x710
  shmem_fault → ... → el0_ia

Thread C (busy-loop on CPU1, sum_exec_runtime=1134s):

  xas_load+0x50/0xe4 (P)
  shmem_swapin_folio+0xd8/0xd44
  shmem_get_folio_gfp+0xb8/0x710
  shmem_fault → ... → el0_ia

[Question]

What is the recommended approach to fix this livelock?

We are considering adding a cond_resched() before the
goto repeat in shmem_get_folio_gfp() to break the tight
loop and allow the swapcache-holding thread to make
progress. Would this be an acceptable fix, or is there
a better strategy (e.g., bounded retry with fallback,
or yielding to the specific waiter)?
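
To make the "bounded retry with fallback" option concrete, here is a
user-space sketch of the loop shape, not a proposed kernel patch. All
names are hypothetical, sched_yield() merely stands in for
cond_resched(), and returning -EAGAIN is an arbitrary placeholder for
whatever fallback would be appropriate. Whether either call actually
lets the starved thread run depends on the scheduling policy and
priorities involved.

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical callback type: one attempt at the contended operation. */
typedef int (*model_attempt_fn)(void *ctx);

/*
 * Retry while the attempt reports -EEXIST, yielding between attempts.
 * Allows at most max_retries retries after the first attempt, then
 * returns -EAGAIN so the caller can take a fallback path.
 */
static int model_retry_bounded(model_attempt_fn attempt, void *ctx,
			       int max_retries)
{
	int tries = 0;
	int err;

	for (;;) {
		err = attempt(ctx);
		if (err != -EEXIST)
			return err;
		if (++tries > max_retries)
			return -EAGAIN;
		sched_yield();	/* user-space stand-in for cond_resched() */
	}
}

/* Example attempt: fails with -EEXIST until *remaining reaches zero. */
static int model_attempt_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EEXIST;
	}
	return 0;
}
```

With a counter of 3 and max_retries of 10, the loop succeeds on the
fourth attempt. With a counter of 5 and max_retries of 2, it gives up
after three attempts and returns -EAGAIN.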

Thanks,

Chao Ma