public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* lock contention: x86/kvm: Potential deadlock between shrinker_rwsem and kvm_lock under high VM load
       [not found] ` <eecb1d2d1f7a44ef8c757138cb1b3755@huawei.com>
@ 2026-02-02  1:19   ` Zhangjiaji
  2026-02-06 11:07     ` Thorsten Leemhuis
  0 siblings, 1 reply; 2+ messages in thread
From: Zhangjiaji @ 2026-02-02  1:19 UTC (permalink / raw)
  To: stable@vger.kernel.org
  Cc: huyu (D), Wangqinxiao (Tom), regressions@lists.linux.dev,
	Liumengqiu

Hi all,

I'm hitting a lock contention / long stall issue on an x86 KVM host under heavy VM load, and I'd like to ask for advice on the proper fix direction.

Problem summary
When the host is under heavy VM pressure and a cache drop is triggered, the reclaim path can hold shrinker_rwsem for a long time due to lock contention on kvm_lock inside the KVM/MMU shrinker. This in turn blocks a systemd task that is already holding cgroup_mutex, causing cascading issues (e.g., journald log gaps).

Observed lock chain / flow
From what I see:

1. drop_caches leads to slab reclaim and enters shrink_slab()
2. shrink_slab() takes shrinker_rwsem
3. It then enters do_shrink_slab()
4. During slab shrinking, the KVM/MMU shrinker callback (e.g., mmu_shrink_scan()) is invoked to reclaim KVM-related caches
5. mmu_shrink_scan() attempts to take kvm_lock
6. Under heavy VM load, kvm_lock is highly contended, so the shrinker callback stalls and shrinker_rwsem remains held for an extended time

In parallel:

7. systemd holds cgroup_mutex (e.g. during cgroup operations) and then tries to acquire shrinker_rwsem
8. Because shrinker_rwsem is still held by the drop_caches reclaim path, systemd blocks while still holding cgroup_mutex
9. Other components (e.g. systemd-journald) needing cgroup_mutex become blocked, leading to issues such as logging stalls/gaps

Impact
- Long stalls in systemd-controlled cgroup operations
- systemd-journald (and possibly others) blocked on cgroup_mutex, causing log dropouts / discontinuities
- Overall system responsiveness degradation during the cache-drop operation

Questions
1. Is it expected/acceptable for a shrinker callback (KVM/MMU shrinker) to contend on a highly contended lock like kvm_lock while shrinker_rwsem is held?
2. Are there known recommendations to avoid holding shrinker_rwsem across potentially blocking/contended shrinker callbacks?
3. Would the preferred fix be on the KVM shrinker side (e.g. using mutex_trylock()/spin_trylock() semantics and returning SHRINK_STOP/-EAGAIN style behavior when contended), or on the shrink_slab/shrinker infrastructure side?
4. Alternatively, is there any known guidance for systemd/cgroup codepaths to avoid waiting on shrinker_rwsem while holding cgroup_mutex (to avoid lock chaining)?

Please let me know what the most useful information would be, and what direction you would recommend for a fix.

Thanks,
Huyu

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: lock contention: x86/kvm: Potential deadlock between shrinker_rwsem and kvm_lock under high VM load
  2026-02-02  1:19   ` lock contention: x86/kvm: Potential deadlock between shrinker_rwsem and kvm_lock under high VM load Zhangjiaji
@ 2026-02-06 11:07     ` Thorsten Leemhuis
  0 siblings, 0 replies; 2+ messages in thread
From: Thorsten Leemhuis @ 2026-02-06 11:07 UTC (permalink / raw)
  To: Zhangjiaji, stable@vger.kernel.org
  Cc: huyu (D), Wangqinxiao (Tom), regressions@lists.linux.dev,
	Liumengqiu

On 2/2/26 02:19, Zhangjiaji wrote:
> 
> I'm hitting a lock contention / long stall issue on an x86 KVM host
> under heavy VM load, and I'd like to ask for advice on the proper
> fix direction.

Thx for the report. You CCed the stable and the regressions list, which
leads to a few important questions:

* Is mainline affected as well?
* What was the last version where things were working?
* Could you bisect? https://docs.kernel.org/admin-guide/bug-bisect.html

Ciao, Thorsten

> Problem summary
> When the host is under heavy VM pressure and a cache drop is
> triggered, the reclaim path can hold shrinker_rwsem for a long time
> due to lock contention on kvm_lock inside the KVM/MMU shrinker,
> which then blocks systemd in a way that also holds cgroup_mutex,
> causing cascading issues (e.g., journald log gaps).
> 
> Observed lock chain / flow
> From what I see:
> 
> 1. drop_caches leads to slab reclaim and enters shrink_slab()
> 2. shrink_slab() takes shrinker_rwsem
> 3. It then enters do_shrink_slab()
> 4. During slab shrinking, the KVM/MMU shrinker callback is invoked
>    (e.g. mmu_shrink_scan()) to reclaim KVM-related caches
> 5. mmu_shrink_scan() attempts to take kvm_lock
> 6. Under heavy VM load, kvm_lock is highly contended, so the shrinker
>    callback stalls and shrinker_rwsem remains held for an extended time
> 
> In parallel:
> 
> 7. systemd holds cgroup_mutex (e.g. during cgroup operations) and
>    then tries to acquire shrinker_rwsem
> 8. Because shrinker_rwsem is still held by the drop_caches reclaim
>    path, systemd blocks while still holding cgroup_mutex
> 9. Other components (e.g. systemd-journald) needing cgroup_mutex
>    become blocked, leading to issues such as logging stalls/gaps
> 
> Impact
> - Long stalls in systemd-controlled cgroup operations
> - systemd-journald (and possibly others) blocked on cgroup_mutex,
>   causing log dropouts / discontinuities
> - Overall system responsiveness degradation during the cache-drop
>   operation
> 
> Questions
> 1. Is it expected/acceptable for a shrinker callback (KVM/MMU
>    shrinker) to contend on a highly contended lock like kvm_lock
>    while shrinker_rwsem is held?
> 2. Are there known recommendations to avoid holding shrinker_rwsem
>    across potentially blocking/contended shrinker callbacks?
> 3. Would the preferred fix be on the KVM shrinker side (e.g. using
>    mutex_trylock()/spin_trylock() semantics and returning
>    SHRINK_STOP/-EAGAIN style behavior when contended), or on the
>    shrink_slab/shrinker infrastructure side?
> 4. Alternatively, is there any known guidance for systemd/cgroup
>    codepaths to avoid waiting on shrinker_rwsem while holding
>    cgroup_mutex (to avoid lock chaining)?
> 
> Please let me know what the most useful information would be, and
> what direction you would recommend for a fix.
> 
> Thanks,
> Huyu
> 
> 


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-02-06 11:17 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <505c34d2cef84117b7e995c211efc393@huawei.com>
     [not found] ` <eecb1d2d1f7a44ef8c757138cb1b3755@huawei.com>
2026-02-02  1:19   ` lock contention: x86/kvm: Potential deadlock between shrinker_rwsem and kvm_lock under high VM load Zhangjiaji
2026-02-06 11:07     ` Thorsten Leemhuis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox