[BUG] RCU Detected Stall in sys_process_vm

linux-mm.kvack.org archive mirror

* [BUG] RCU Detected Stall in sys_process_vm_writev
@ 2025-05-19  5:19 Guoyu Yin
  2025-05-19  7:36 ` David Hildenbrand
  2025-05-23 17:34 ` Dave Hansen
  0 siblings, 2 replies; 4+ messages in thread
From: Guoyu Yin @ 2025-05-19  5:19 UTC
  To: akpm
  Cc: linux-mm, linux-kernel, dave.hansen, luto, peterz, tglx, mingo,
	bp, x86, hpa

Hi,

I discovered a kernel crash using the Syzkaller framework, described
as "INFO: rcu detected stall in sys_process_vm_writev". This issue
occurs during the execution of the sys_process_vm_writev system call,
where RCU detects a stall on CPU 0.

From the dmesg log, CPU 3 is stuck trying to acquire a spinlock in the
pgd_free function (arch/x86/mm/pgtable.c:490), leading to the RCU
stall. This is likely caused by spinlock contention triggered by the
page pinning and unpinning logic in sys_process_vm_writev under high
load or abnormal conditions.

I recommend reviewing the page pinning (pin_user_pages_remote) and
unpinning (unpin_user_pages_dirty_lock) logic in
process_vm_rw_single_vec (mm/process_vm_access.c) to ensure it does
not cause prolonged spinlock blocking due to scheduling delays or
resource contention.
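
For context, here is a minimal userspace sketch of the system call under
discussion. It assumes glibc's process_vm_writev() wrapper; write_remote()
is a hypothetical helper name chosen for illustration. It is not the
syzkaller reproducer and is not expected to trigger the stall. It only
shows the call whose kernel side goes through process_vm_rw_single_vec.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes from local_src in the calling
 * process to remote_dst in the address space of process pid.
 * Returns the number of bytes written, or -1 with errno set
 * (for example EPERM when the caller may not access the target).
 */
ssize_t write_remote(pid_t pid, void *remote_dst,
                     const void *local_src, size_t len)
{
    /* One local source segment and one remote destination segment. */
    struct iovec local = { .iov_base = (void *)local_src, .iov_len = len };
    struct iovec remote = { .iov_base = remote_dst, .iov_len = len };

    /* The flags argument is currently unused and must be 0. */
    return process_vm_writev(pid, &local, 1, &remote, 1, 0);
}
```

Calling write_remote(getpid(), dst, src, len) targets the caller's own
address space, which is a simple way to exercise the path without a second
process. Some sandboxes block the syscall, in which case it fails with
EPERM.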

This can be reproduced on:

HEAD commit:

fac04efc5c793dccbd07e2d59af9f90b7fc0dca4

report: https://pastebin.com/raw/v7xV4BdD

console output: https://pastebin.com/raw/GfJLqkpf

kernel config: https://pastebin.com/raw/zrj9jd1V

C reproducer: https://pastebin.com/raw/8Mm5f2kh

Best regards,

Guoyu


* Re: [BUG] RCU Detected Stall in sys_process_vm_writev
  2025-05-19  5:19 [BUG] RCU Detected Stall in sys_process_vm_writev Guoyu Yin
@ 2025-05-19  7:36 ` David Hildenbrand
  2025-05-19 15:42   ` Dave Hansen
  2025-05-23 17:34 ` Dave Hansen
  1 sibling, 1 reply; 4+ messages in thread
From: David Hildenbrand @ 2025-05-19  7:36 UTC
  To: Guoyu Yin, akpm
  Cc: linux-mm, linux-kernel, dave.hansen, luto, peterz, tglx, mingo,
	bp, x86, hpa

On 19.05.25 07:19, Guoyu Yin wrote:
> Hi,
> 
> I discovered a kernel crash using the Syzkaller framework, described
> as "INFO: rcu detected stall in sys_process_vm_writev". This issue
> occurs during the execution of the sys_process_vm_writev system call,
> where RCU detects a stall on CPU 0.
> 
> From the dmesg log, CPU 3 is stuck trying to acquire a spinlock in the
> pgd_free function (arch/x86/mm/pgtable.c:490), leading to the RCU
> stall. This is likely caused by spinlock contention triggered by the
> page pinning and unpinning logic in sys_process_vm_writev under high
> load or abnormal conditions.

pgd_free() calls pgd_dtor() where we should be taking the pgd_lock. 
Apart from that, only the buddy allocator might be taking locks when 
freeing the page.

> 
> I recommend reviewing the page pinning (pin_user_pages_remote) and
> unpinning (unpin_user_pages_dirty_lock) logic in
> process_vm_rw_single_vec (mm/process_vm_access.c) to ensure it does
> not cause prolonged spinlock blocking due to scheduling delays or
> resource contention.

This almost reads like AI generated content.

Anyhow, unpin_user_pages_dirty_lock() should only be taking the folio 
lock, and pin_user_pages_remote() should only be taking page table locks.

As I am sure you wouldn't bother us with AI generated slop, what makes 
you think that the pgd_lock is relevant in the context of GUP?

-- 
Cheers,

David / dhildenb




* Re: [BUG] RCU Detected Stall in sys_process_vm_writev
  2025-05-19  7:36 ` David Hildenbrand
@ 2025-05-19 15:42   ` Dave Hansen
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Hansen @ 2025-05-19 15:42 UTC
  To: David Hildenbrand, Guoyu Yin, akpm
  Cc: linux-mm, linux-kernel, dave.hansen, luto, peterz, tglx, mingo,
	bp, x86, hpa

On 5/19/25 00:36, David Hildenbrand wrote:
>> From the dmesg log, CPU 3 is stuck trying to acquire a spinlock in the
>> pgd_free function (arch/x86/mm/pgtable.c:490), leading to the RCU
>> stall. This is likely caused by spinlock contention triggered by the
>> page pinning and unpinning logic in sys_process_vm_writev under high
>> load or abnormal conditions.
> 
> pgd_free() calls pgd_dtor() where we should be taking the pgd_lock.
> Apart from that, only the buddy allocator might be taking locks when
> freeing the page.

Yeah, it's definitely the pgd_lock.

I just did a quick audit of all the sites that use it and I didn't see
any places where we could have left it locked. That most likely means
it's _another_ lock underneath (like pgt_lock) that's stuck. The other
possibility is that the thread holding pgd_lock got into trouble, for
example by hitting a BUG_ON().

Guoyu Yin: Can you try to reproduce this with lockdep enabled, please?

It would also be useful to see the state of all the other threads in the
system. Are any of them stuck in a path that holds pgd_lock?

> As I am sure you wouldn't bother us with AI generated slop, what
> makes you think that the pgd_lock is relevant in the context of
> GUP?

Yeah, I think someone (or some bad AI bot) is looking at the use of
'pgd_lock' in xen_mm_pin/unpin_all() and somehow connecting that to
pin_user_pages().


* Re: [BUG] RCU Detected Stall in sys_process_vm_writev
  2025-05-19  5:19 [BUG] RCU Detected Stall in sys_process_vm_writev Guoyu Yin
  2025-05-19  7:36 ` David Hildenbrand
@ 2025-05-23 17:34 ` Dave Hansen
  1 sibling, 0 replies; 4+ messages in thread
From: Dave Hansen @ 2025-05-23 17:34 UTC
  To: Guoyu Yin, akpm
  Cc: linux-mm, linux-kernel, dave.hansen, luto, peterz, tglx, mingo,
	bp, x86, hpa

On 5/18/25 22:19, Guoyu Yin wrote:
> I discovered a kernel crash using the Syzkaller framework, described
> as "INFO: rcu detected stall in sys_process_vm_writev". This issue
> occurs during the execution of the sys_process_vm_writev system call,
> where RCU detects a stall on CPU 0.

Guoyu,

Could you tell us a little more about the overall environment here? It
seems like you're running syzkaller and just reporting whenever you see
a splat. Is that about right? Could you tell us a little more about why
you are doing this? What is your goal?

I think the advice Steve gave to an eerily similar report applies to
this one as well:

https://lore.kernel.org/all/20250521133137.1b2f2cac@gandalf.local.home/

Feel free to _run_ with KASAN enabled, but please don't report issues
unless you can reproduce without KASAN. Unless it's an actual KASAN
error report, of course.

But, in general, syzkaller produces a ton of noise. Unless you have a
reproducer or a _clear_ bug, I'm not sure it's worth sending these
reports. There's honestly not much we can do with them.

