[BUG] shmem: shmem_get_folio_gfp livelock

From: 马超 <machao26@xiaomi.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	田孝斌 <tianxiaobin@xiaomi.com>, 俞东斌 <yudongbin@xiaomi.com>,
	李鹏程 <xiaoyaoli@xiaomi.com>
Subject: [BUG] shmem: shmem_get_folio_gfp livelock
Date: Tue, 30 Jun 2026 13:15:45 +0000
Message-ID: <700a2cbf90a2484f979aac858f08f5d4@xiaomi.com>
In-Reply-To: <49858bc642844e3bbf6449c0f241af04@xiaomi.com>

Hello,
I encountered a bug in the shmem subsystem. Details below.

[Summary]
shmem_get_folio_gfp() can livelock when multiple threads fault on the
same shmem page concurrently. The -EEXIST retry loop (goto repeat) has
no cond_resched(), causing busy-looping threads to starve the thread
that holds the swapcache slot, resulting in an indefinite RCU stall and
system hang.

[Environment]
1. Kernel: 6.18.21 (ARM64, PREEMPT, CONFIG_LRU_GEN=y)
2. Triggered by: multi-threaded app with threads constrained to 2 CPUs
   via cpuset

[Root Cause]
When multiple threads in the same process fault on the same shmem
swap entry:
1. Thread A enters shmem_swap_alloc_folio(), succeeds at
   swapcache_prepare() (which sets SWAP_HAS_CACHE), then enters
   workingset_refault() → lru_gen_refault() → rcu_read_lock().
   While inside the RCU read-side critical section, it is
   preempted via preempt_schedule_irq() (the IRQ exit path sees
   TIF_NEED_RESCHED).
2. Threads B and C enter shmem_swap_alloc_folio(), fail at
   swapcache_prepare() (the slot is already taken by A), and
   return -EEXIST.
3. In shmem_get_folio_gfp():
       error = shmem_swapin_folio(...);
       if (error == -EEXIST)
           goto repeat;  // no cond_resched(), tight loop
4. Threads B and C spin at 100% CPU in the retry loop. All three
   threads share the same cpuset (CPU0-1, cpus_allowed=0x3). Thread A
   is perpetually preempted and starved: it cannot complete the
   few instructions needed to reach rcu_read_unlock().
5. The held RCU read lock blocks the grace period indefinitely,
   so all synchronize_rcu() callers (cgroup operations,
   fd allocation, etc.) hang, eventually blocking init.

[Scheduling Details]
Key observations:
1. Thread A was RCU-boosted to prio 98 but accumulated only
   99ms of execution over the entire stall period (~1200s).
   It was effectively starved despite the priority boost.
2. Threads B and C have vruntime=0 and prio 91, indicating
   they run in a real-time scheduling class (SCHED_FIFO/RT
   policy). Each accumulated ~1134 seconds of execution with
   only ~1600 context switches, meaning they ran uninterrupted
   for ~700ms between context switches on average.
3. Thread A cannot preempt Threads B and C: although RCU boost
   raised Thread A to prio 98, Threads B and C at prio 91 (lower
   numeric value = higher priority in the RT class) have
   higher effective priority. The busy-looping threads never
   voluntarily yield (no cond_resched(), no blocking calls in
   the loop), so Thread A is almost never scheduled.
4. CPU contention: CPU0 had nr_running=28 and CPU1 had
   nr_running=24, with 3-4 RT tasks per CPU. Thread A competed
   with Thread B on CPU0 but could not win the CPU.
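
The priority comparison in item 3 and the average in item 2 can be
checked with a small user-space C sketch. It assumes the usual kernel
mapping for RT tasks, prio = MAX_RT_PRIO - 1 - rt_priority with
MAX_RT_PRIO = 100. The helper names (rt_to_kernel_prio, outranks,
avg_run_ms) are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel's MAX_RT_PRIO value (assumed to be 100). */
#define MAX_RT_PRIO 100

/*
 * Hypothetical helper: convert a user-visible RT priority
 * (1..99, higher = more urgent) into the kernel-internal prio
 * value (0..98, lower = more urgent).
 */
static inline int rt_to_kernel_prio(int rt_priority)
{
    assert(rt_priority >= 1 && rt_priority <= MAX_RT_PRIO - 1);
    return MAX_RT_PRIO - 1 - rt_priority;
}

/*
 * Hypothetical helper: true if a task with kernel prio `a`
 * outranks a task with kernel prio `b` (strictly lower number wins).
 */
static inline bool outranks(int a, int b)
{
    return a < b;
}

/* Average on-CPU time per context switch, in milliseconds. */
static inline double avg_run_ms(double exec_seconds, long switches)
{
    return exec_seconds * 1000.0 / (double)switches;
}
```

Under this mapping prio 98 corresponds to rt_priority 1 and prio 91 to
rt_priority 8, so outranks(98, 91) is false: the boosted Thread A still
ranks below the spinners. avg_run_ms(1134.0, 1600) comes out at roughly
709 ms, matching the ~700ms figure above.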

[Observed Impact]
1. RCU stall lasting 910+ seconds (19 consecutive stall
   warnings, grace period g=4398761 never advanced)
2. synchronize_rcu_expedited() callers blocked 742+ seconds
3. init process hung > 720 seconds → system unresponsive

[Call Traces]
Thread A (RCU stall source, sampled 19 times identically):
__switch_to+0x1a4/0x360 (T)
__schedule+0x96c/0xf3c
preempt_schedule_irq+0xec/0x198
raw_irqentry_exit_cond_resched+0x2c/0x44
irqentry_exit+0x38/0x64
exit_to_kernel_mode+0x28/0x38
el1_interrupt+0x5c/0xa8
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x84/0x88
workingset_refault+0x16c/0x79c (P)
  shmem_swapin_folio+0x8e4/0xd44
    shmem_get_folio_gfp+0xb8/0x710
      shmem_fault+0xa0/0x174
        __do_fault
          do_pte_missing
            handle_mm_fault
              do_page_fault
                el0_ia

Thread B (busy-loop on CPU0, sum_exec_runtime=1134s):
xas_load+0x78/0xe4 (P)
  shmem_swapin_folio+0x950/0xd44
    shmem_get_folio_gfp+0xb8/0x710
      shmem_fault → ... → el0_ia

Thread C (busy-loop on CPU1, sum_exec_runtime=1134s):
xas_load+0x50/0xe4 (P)
  shmem_swapin_folio+0xd8/0xd44
    shmem_get_folio_gfp+0xb8/0x710
      shmem_fault → ... → el0_ia

[Question]
What is the recommended approach to fix this livelock?
We are considering adding a cond_resched() before the
goto repeat in shmem_get_folio_gfp() to break the tight
loop and allow the swapcache-holding thread to make
progress. Would this be an acceptable fix, or is there
a better strategy (e.g., bounded retry with fallback,
or yielding to the specific waiter)?
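
To make the "bounded retry with fallback" option concrete, here is a
rough user-space C model, not kernel code. Everything in it is a
stand-in: swapin_fn plays the role of shmem_swapin_folio(),
sched_yield() stands in for a cond_resched()-style yield, nanosleep()
stands in for a blocking wait, and MAX_YIELDS, retry_stats and
fake_swapin are hypothetical names chosen for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <time.h>

#define MAX_YIELDS 8  /* hypothetical bound, not a kernel constant */

/*
 * Stand-in for shmem_swapin_folio(): returns -EEXIST while another
 * thread owns the swap cache slot, 0 once the folio is ready.
 */
typedef int (*swapin_fn)(void *ctx);

struct retry_stats {
    int yields;  /* retries that only yielded the CPU */
    int sleeps;  /* retries that fell back to sleeping */
};

/* Retry on -EEXIST: yield a bounded number of times, then sleep. */
static inline int get_folio_bounded(swapin_fn swapin, void *ctx,
                                    struct retry_stats *st)
{
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 };
    int attempts = 0;

    for (;;) {
        int error = swapin(ctx);

        if (error != -EEXIST)
            return error;
        if (attempts++ < MAX_YIELDS) {
            sched_yield();          /* cheap: give up the CPU once */
            st->yields++;
        } else {
            nanosleep(&nap, NULL);  /* fallback: block for ~1 ms */
            st->sleeps++;
        }
    }
}

/* Test double: the slot stays busy for the first `busy` calls. */
struct fake_slot {
    int busy;
};

static inline int fake_swapin(void *ctx)
{
    struct fake_slot *slot = ctx;

    if (slot->busy > 0) {
        slot->busy--;
        return -EEXIST;
    }
    return 0;
}
```

With a fake slot that is busy for 10 calls, the model yields 8 times,
sleeps twice, and then returns 0. The sleeping fallback is in the model
because, under SCHED_FIFO, sched_yield() only hands the CPU to runnable
tasks of the same priority, so a pure yield would not let a
lower-priority slot holder run.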

Thanks,
Chao Ma