[BUG] shmem: shmem_get_folio

Linux-mm Archive on lore.kernel.org

* [BUG] shmem: shmem_get_folio_gfp livelock
From: 马超 @ 2026-06-30 12:55 UTC
  To: Andrew Morton
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	田孝斌, 俞东斌,
	李鹏程

Hello,
I encountered a bug in the shmem subsystem. Details below.
[Summary]
shmem_get_folio_gfp() can livelock when multiple threads fault on the
same shmem page concurrently. The -EEXIST retry loop (goto repeat) has
no cond_resched(), causing busy-looping threads to starve the thread
that holds the swapcache slot, resulting in an indefinite RCU stall and
system hang.
[Environment]
1. Kernel: 6.18.21 (ARM64, PREEMPT, CONFIG_LRU_GEN=y)
2. Triggered by: a multi-threaded app whose threads are constrained to
2 CPUs via cpuset
[Root Cause]
When multiple threads in the same process fault on the same shmem
swap entry:
1. Thread A enters shmem_swap_alloc_folio(), succeeds at
swapcache_prepare() (which sets SWAP_HAS_CACHE), then enters
workingset_refault() → lru_gen_refault() → rcu_read_lock().
While inside the RCU read-side critical section, it is
preempted via preempt_schedule_irq() (the IRQ exit path detects
TIF_NEED_RESCHED).
2. Threads B & C enter shmem_swap_alloc_folio(), fail at
swapcache_prepare() (the slot is already taken by A), and return -EEXIST.
3. In shmem_get_folio_gfp():
    error = shmem_swapin_folio(...);
    if (error == -EEXIST)
        goto repeat; // no cond_resched(), tight loop
4. Threads B & C spin at 100% CPU on the retry loop. All three
threads share the same cpuset (CPU0-1, cpus_allowed=0x3). Thread A
is perpetually preempted and starved, so it cannot complete the
few instructions needed to call rcu_read_unlock().
5. Because Thread A never leaves its RCU read-side critical section,
the grace period cannot complete, causing all synchronize_rcu()
callers (cgroup operations, fd allocation, etc.) to hang and
eventually blocking init.
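Steps 1 to 3 can be modelled in user space. The sketch below is a hypothetical model, not kernel code: slot_prepare() stands in for swapcache_prepare() setting SWAP_HAS_CACHE, slot_release() for the holder giving the slot back, and fault_retry() for the `goto repeat` loop, bounded by max_tries only so the demo terminates. All of these names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one swap slot; has_cache mimics SWAP_HAS_CACHE. */
struct slot {
	atomic_bool has_cache;
};

static void slot_init(struct slot *s)
{
	atomic_init(&s->has_cache, false);
}

/* Like swapcache_prepare(): the first caller wins, later ones get -EEXIST. */
static int slot_prepare(struct slot *s)
{
	bool expected = false;

	if (atomic_compare_exchange_strong(&s->has_cache, &expected, true))
		return 0;
	return -EEXIST;
}

/* The holder (Thread A) clears the flag once it is done with the slot. */
static void slot_release(struct slot *s)
{
	atomic_store(&s->has_cache, false);
}

/*
 * Models the "if (error == -EEXIST) goto repeat" loop: retry at once,
 * with nothing that lets the holder run. Returns the attempt number that
 * succeeded, or -1 once max_tries is exhausted (the real loop has no bound).
 */
static int fault_retry(struct slot *s, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 1; i <= max_tries; i++) {
		if (slot_prepare(s) == 0)
			return i;
	}
	return -1;
}
```

If the holder never runs, fault_retry() exhausts its budget however large max_tries is; once slot_release() is called, the next attempt succeeds immediately.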
[Scheduling Details]
Key observations:
1. Thread A was RCU-boosted to prio 98 but accumulated only
99 ms of execution over the entire stall period (~1200 s).
It was effectively starved despite the priority boost.
2. Threads B & C have vruntime=0 and prio 91, indicating that
they run in the RT scheduling class (SCHED_FIFO or SCHED_RR).
Each accumulated ~1134 seconds of execution with only ~1600
context switches, meaning they ran for ~700 ms on average
between context switches.
3. Thread A cannot preempt Threads B & C: although RCU boost
raised Thread A to prio 98, Threads B & C at prio 91 (a lower
numeric prio value means higher priority) have strictly
higher effective priority. The busy-looping threads never
voluntarily yield (no cond_resched(), no blocking calls in
the loop), so Thread A never gets scheduled.
4. CPU contention: CPU0 had nr_running=28 and CPU1 had
nr_running=24, with 3-4 RT tasks per CPU. Thread A competed
with Thread B on CPU0 but could not win scheduling.
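The priority comparison in point 3 and the per-switch figure in point 2 can be checked with a little arithmetic. The sketch below assumes the kernel's internal convention for RT tasks, prio = MAX_RT_PRIO - 1 - rt_priority with MAX_RT_PRIO = 100; the helper names are hypothetical. Under that convention prio 98 corresponds to RT priority 1 (which I believe is the default RCU boost priority via rcutree.kthread_prio; treat that as an assumption) and prio 91 to RT priority 8.

```c
#include <assert.h>

/* Mirrors the kernel's MAX_RT_PRIO; internal prio 0..98 maps to RT 99..1. */
#define MODEL_MAX_RT_PRIO 100

/* Internal prio (as in scheduler debug output) -> RT priority. */
static int prio_to_rt_priority(int prio)
{
	assert(prio >= 0 && prio < MODEL_MAX_RT_PRIO - 1);
	return MODEL_MAX_RT_PRIO - 1 - prio;
}

/* A lower internal prio value means higher priority. */
static int outranks(int prio_a, int prio_b)
{
	return prio_a < prio_b;
}

/* Average run time between context switches, in milliseconds. */
static double avg_quantum_ms(double exec_seconds, double nr_switches)
{
	assert(nr_switches > 0);
	return exec_seconds * 1000.0 / nr_switches;
}
```

With the report's numbers, B and C (RT priority 8) strictly outrank the boosted Thread A (RT priority 1), and 1134 s over 1600 switches is 708.75 ms per run, matching the ~700 ms estimate.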
[Observed Impact]
1. RCU stall lasting 910+ seconds (19 consecutive stall
warnings; grace period g=4398761 never advanced)
2. synchronize_rcu_expedited() callers blocked for 742+ seconds
3. init process hung for > 720 seconds → system unresponsive
[Call Traces]
Thread A (RCU stall source, sampled 19 times identically):
__switch_to+0x1a4/0x360 (T)
__schedule+0x96c/0xf3c
preempt_schedule_irq+0xec/0x198
raw_irqentry_exit_cond_resched+0x2c/0x44
irqentry_exit+0x38/0x64
exit_to_kernel_mode+0x28/0x38
el1_interrupt+0x5c/0xa8
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x84/0x88
workingset_refault+0x16c/0x79c (P)
  shmem_swapin_folio+0x8e4/0xd44
    shmem_get_folio_gfp+0xb8/0x710
      shmem_fault+0xa0/0x174
        __do_fault
do_pte_missing
handle_mm_fault
do_page_fault
el0_ia

Thread B (busy-loop on CPU0, sum_exec_runtime=1134s):
xas_load+0x78/0xe4 (P)
  shmem_swapin_folio+0x950/0xd44
    shmem_get_folio_gfp+0xb8/0x710
      shmem_fault → ... → el0_ia

Thread C (busy-loop on CPU1, sum_exec_runtime=1134s):
xas_load+0x50/0xe4 (P)
  shmem_swapin_folio+0xd8/0xd44
    shmem_get_folio_gfp+0xb8/0x710
      shmem_fault → ... → el0_ia
[Question]
What is the recommended approach to fix this livelock?
We are considering adding a cond_resched() before the
goto repeat in shmem_get_folio_gfp() to break the tight
loop and allow the swapcache-holding thread to make
progress. Would this be an acceptable fix, or is there
a better strategy (e.g., bounded retry with fallback,
or yielding to the specific waiter)?

Thanks,
Chao Ma
#/******This e-mail and its attachments contain confidential information from XIAOMI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!******/#



* Re: [BUG] shmem: shmem_get_folio_gfp livelock
From: Baolin Wang @ 2026-07-01 10:03 UTC
  To: 马超, Andrew Morton
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	田孝斌, 俞东斌,
	李鹏程, hughd@google.com, Kairui Song

CC Hugh and Kairui.

On 6/30/26 9:15 PM, 马超 wrote:
> [...]
> [Question]
> What is the recommended approach to fix this livelock?
> We are considering adding a cond_resched() before the
> goto repeat in shmem_get_folio_gfp() to break the tight
> loop and allow the swapcache-holding thread to make
> progress. Would this be an acceptable fix, or is there
> a better strategy (e.g., bounded retry with fallback,
> or yielding to the specific waiter)?

IIRC, the scheduler maintainers are not fans of continuing to sprinkle
random cond_resched() calls throughout the kernel. Scheduling
decisions should be left to the scheduler itself.
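Why a bare yield would not help here can be shown with a toy model. The sketch below is hypothetical (one CPU, one spinner, one holder, time in ticks; simulate() and the policy names are invented) and assumes strict priority scheduling as in SCHED_FIFO. A retry that only yields, as cond_resched() would, leaves the higher-priority spinner runnable, so the scheduler picks it again and the lower-priority holder never runs. A retry that blocks for a tick, like the schedule_timeout_uninterruptible(1) workaround discussed later in this thread, lets the holder make progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical retry policies for a task whose swapin got -EEXIST. */
enum retry_policy {
	RETRY_YIELD, /* like cond_resched(): the task stays runnable */
	RETRY_SLEEP, /* like schedule_timeout_uninterruptible(1): blocks a tick */
};

/*
 * Toy single-CPU model of strict-priority (SCHED_FIFO-like) scheduling.
 * Each tick the highest-priority runnable task gets the CPU. The spinner
 * (prio 91) retries; the holder (prio 98) needs holder_work ticks of CPU
 * to reach rcu_read_unlock() and release the slot. Returns the tick at
 * which the holder finishes, or -1 if it has not finished by max_ticks.
 */
static int simulate(enum retry_policy policy, int holder_work, int max_ticks)
{
	const int spinner_prio = 91, holder_prio = 98;
	int spinner_sleep = 0; /* ticks until the spinner is runnable again */
	int done = 0;          /* ticks of CPU the holder has received */

	assert(holder_work > 0);
	for (int tick = 1; tick <= max_ticks; tick++) {
		bool spinner_runnable = (spinner_sleep == 0);

		if (spinner_sleep > 0)
			spinner_sleep--;

		/* Lower prio value wins, so a runnable spinner always runs. */
		if (spinner_runnable && spinner_prio < holder_prio) {
			if (policy == RETRY_SLEEP)
				spinner_sleep = 1;
			continue; /* the spinner spent this tick on a retry */
		}

		if (++done == holder_work)
			return tick;
	}
	return -1;
}
```

With RETRY_YIELD the holder never receives a tick; with RETRY_SLEEP it alternates with the spinner and finishes after 2 × holder_work ticks. The real system had two CPUs and two spinners, which this sketch does not model.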

Regarding your issue, could you try the latest kernel? IIUC, this 
problem has already been fixed there (likely from Kairui's swap 
refactoring work [1]).

Now the shmem swapin call trace should be:

shmem_swapin_folio()
   -> shmem_swap_alloc_folio() (I think you use the SYNC swap device)
     -> swapin_sync()

swapin_sync() first checks whether a folio is already present in the
swapcache and, if so, returns it immediately. In your case, threads B/C
would get the folio that has already been added to the swapcache and
continue onward, instead of retrying in a loop.

struct folio *swapin_sync(swp_entry_t entry, gfp_t gfp, unsigned long orders,
                          struct vm_fault *vmf, struct mempolicy *mpol, pgoff_t ilx)
{
         struct folio *folio;

         do {
                 folio = swap_cache_get_folio(entry);
                 if (folio)
                         return folio;
                 folio = swap_cache_alloc_folio(entry, gfp, orders, vmf, mpol, ilx);
         } while (PTR_ERR(folio) == -EEXIST);

         if (IS_ERR(folio))
                 return folio;

         swap_read_folio(folio, NULL);
         return folio;
}
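The lookup-before-allocate shape of swapin_sync() can be mimicked in user space to show why B and C no longer loop. This sketch is hypothetical: struct swap_cache_model, model_get() and model_alloc() are invented stand-ins, not the kernel's swap cache API, and it assumes (as described above) that the first faulting thread publishes its folio in the swap cache before doing the slow work, so later callers take the lookup path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's swap cache API. */
struct folio {
	int id;
};

struct swap_cache_model {
	struct folio *entry;  /* folio published for this swap entry, or NULL */
	struct folio storage; /* backing object for the single folio */
	int allocations;      /* number of folios created so far */
};

/* Like swap_cache_get_folio(): return the published folio, or NULL. */
static struct folio *model_get(struct swap_cache_model *c)
{
	return c->entry;
}

/*
 * Like swap_cache_alloc_folio() under the assumed design: allocate and
 * publish in one step, or report -EEXIST if a folio is already published.
 * The error goes through *err instead of ERR_PTR() to keep this portable.
 */
static struct folio *model_alloc(struct swap_cache_model *c, int *err)
{
	if (c->entry) {
		*err = -EEXIST;
		return NULL;
	}
	c->storage.id = ++c->allocations;
	c->entry = &c->storage;
	*err = 0;
	return c->entry;
}

/* Same loop shape as swapin_sync(): look up first, allocate on a miss. */
static struct folio *model_swapin(struct swap_cache_model *c)
{
	struct folio *folio;
	int err = 0;

	assert(c != NULL);
	do {
		folio = model_get(c);
		if (folio)
			return folio; /* B and C: reuse the folio A published */
		folio = model_alloc(c, &err);
	} while (err == -EEXIST);
	return folio; /* A: the new folio, which it would then read in */
}
```

Three consecutive calls return the same folio and allocate only once; the -EEXIST retry is reduced to a short race window between the lookup and the allocation.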


[1] https://lore.kernel.org/all/20260517-swap-table-p4-v5-0-88ae43e064c7@tencent.c



* Re: [BUG] shmem: shmem_get_folio_gfp livelock
From: Kairui Song @ 2026-07-01 17:25 UTC
  To: Baolin Wang
  Cc: 马超, Andrew Morton, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, 田孝斌,
	俞东斌, 李鹏程,
	hughd@google.com

On Wed, Jul 1, 2026 at 6:09 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> CC Hugh and Kairui.

Hello!

>
> On 6/30/26 9:15 PM, 马超 wrote:
> > [...]
> > [Question]
> > What is the recommended approach to fix this livelock?
> > We are considering adding a cond_resched() before the
> > goto repeat in shmem_get_folio_gfp() to break the tight
> > loop and allow the swapcache-holding thread to make
> > progress. Would this be an acceptable fix, or is there
> > a better strategy (e.g., bounded retry with fallback,
> > or yielding to the specific waiter)?

So this is a 6.18 issue. The SWAP_HAS_CACHE design was a long-standing
problem, and we had a workaround for anon in commit 13ddaf26be32: the
`schedule_timeout_uninterruptible(1)`. That's fine as a workaround, and
we removed it with a proper redesign later in commit f1879e8a0c60.
Shmem never used a workaround like schedule_timeout_uninterruptible();
it jumped directly to the right design.

For 6.18 LTS, do we need a fix, though? I think we can add a similar
timeout for shmem as well, just doing a
schedule_timeout_uninterruptible(1). Yes, it's really ugly, but 6.18
already has another long-standing workaround like that in
mm/swap_state.c. Without the later proper redesign, there seems to be
no better way. Barry provided an extra improvement on top of that
workaround in commit 01626a182302, which could perhaps also be carried
into shmem. So having both in shmem may be a good choice for a 6.18 fix.
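The "sleep instead of spin" pattern can be sketched in user space. If I read commit 01626a182302 correctly, it adds a wait queue so that a task that lost the swapcache_prepare() race can be woken as soon as the holder releases the slot, keeping the one-tick timeout as a fallback; please check the commit itself. The sketch below is a hypothetical model of that pattern with POSIX threads: the condition variable plays the role of the wait queue, pthread_cond_timedwait() the role of schedule_timeout_uninterruptible(1), and all ws_* names are invented. The property that matters is that the waiter blocks rather than spins, so a lower-priority holder can run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model; pthread primitives stand in for a kernel wait queue. */
struct waitable_slot {
	pthread_mutex_t lock;
	pthread_cond_t released; /* plays the role of a wait queue head */
	bool has_cache;          /* plays the role of SWAP_HAS_CACHE */
};

static void ws_init(struct waitable_slot *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->released, NULL);
	s->has_cache = false;
}

/* Holder: clear the flag and wake all waiters immediately. */
static void ws_release(struct waitable_slot *s)
{
	pthread_mutex_lock(&s->lock);
	s->has_cache = false;
	pthread_cond_broadcast(&s->released);
	pthread_mutex_unlock(&s->lock);
}

/*
 * Claim the slot. While it is held, sleep until woken or until timeout_ns
 * elapses (the "one tick" fallback), then re-check. Returns the number of
 * attempts on success, or -1 if still held after max_waits sleeps.
 */
static int ws_prepare_wait(struct waitable_slot *s, long timeout_ns, int max_waits)
{
	int waits = 0;

	assert(timeout_ns > 0 && timeout_ns < 1000000000L);
	pthread_mutex_lock(&s->lock);
	while (s->has_cache) {
		struct timespec deadline;

		if (waits == max_waits) {
			pthread_mutex_unlock(&s->lock);
			return -1;
		}
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_nsec += timeout_ns;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec += 1;
			deadline.tv_nsec -= 1000000000L;
		}
		/* Blocks without using CPU; a wakeup or a timeout both re-check. */
		pthread_cond_timedwait(&s->released, &s->lock, &deadline);
		waits++;
	}
	s->has_cache = true;
	pthread_mutex_unlock(&s->lock);
	return waits + 1;
}
```

With nobody releasing, the waiter gives up after max_waits timed sleeps instead of burning CPU; after ws_release(), the next attempt succeeds at once.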

>
> IIRC, the scheduler maintainers are not a fan of continuing to sprinkle
> random cond_resched() calls throughout the kernel. The scheduling
> decisions should be left to the scheduler itself.
>
> Regarding your issue, could you try the latest kernel? IIUC, this
> problem has already been fixed there (likely from Kairui's swap
> refactoring work [1]).

Yes, exactly. The recent swap rework removed the root SWAP_HAS_CACHE
problem, so we should no longer see similar problems like this in
mainline.



Thread overview: 4+ messages
2026-06-30 12:55   ` [BUG] shmem: shmem_get_folio_gfp livelock 马超
2026-06-30 13:15     ` 马超
2026-07-01 10:03       ` Baolin Wang
2026-07-01 17:25         ` Kairui Song
