linux-trace-kernel.vger.kernel.org archive mirror
* [syzbot] [mm?] possible deadlock in __mmap_lock_do_trace_released
@ 2024-07-02 18:54 syzbot
  2024-07-04 20:12 ` Jesper Dangaard Brouer
  2024-08-19  4:40 ` syzbot
  0 siblings, 2 replies; 4+ messages in thread
From: syzbot @ 2024-07-02 18:54 UTC
  To: akpm, cgroups, hannes, hawk, linux-kernel, linux-mm,
	linux-trace-kernel, lizefan.x, mathieu.desnoyers, mhiramat,
	netdev, rostedt, syzkaller-bugs, tj

Hello,

syzbot found the following issue on:

HEAD commit:    a12978712d90 selftests/bpf: Move ARRAY_SIZE to bpf_misc.h
git tree:       bpf-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=130457fa980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=736daf12bd72e034
dashboard link: https://syzkaller.appspot.com/bug?extid=16b6ab88e66b34d09014
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=125718be980000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14528876980000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9d845a55bf58/disk-a1297871.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/12cb27bdb2de/vmlinux-a1297871.xz
kernel image: https://storage.googleapis.com/syzbot-assets/db09a1fa448c/bzImage-a1297871.xz

The issue was bisected to:

commit 21c38a3bd4ee3fb7337d013a638302fb5e5f9dc2
Author: Jesper Dangaard Brouer <hawk@kernel.org>
Date:   Wed May 1 14:04:11 2024 +0000

    cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14ecc085980000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=16ecc085980000
console output: https://syzkaller.appspot.com/x/log.txt?x=12ecc085980000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+16b6ab88e66b34d09014@syzkaller.appspotmail.com
Fixes: 21c38a3bd4ee ("cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints")

============================================
WARNING: possible recursive locking detected
6.10.0-rc2-syzkaller-00797-ga12978712d90 #0 Not tainted
--------------------------------------------
syz-executor646/5097 is trying to acquire lock:
ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243

but task is already holding lock:
ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(lock#9);
  lock(lock#9);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

5 locks held by syz-executor646/5097:
 #0: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock include/linux/mmap_lock.h:144 [inline]
 #0: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: acct_collect+0x1cf/0x830 kernel/acct.c:563
 #1: ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
 #1: ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243
 #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
 #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
 #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: get_memcg_path_buf mm/mmap_lock.c:139 [inline]
 #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: get_mm_memcg_path+0xb1/0x600 mm/mmap_lock.c:209
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: trace_call_bpf+0xbc/0x8a0
 #4: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_trylock include/linux/mmap_lock.h:163 [inline]
 #4: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: stack_map_get_build_id_offset+0x237/0x9d0 kernel/bpf/stackmap.c:141

stack backtrace:
CPU: 0 PID: 5097 Comm: syz-executor646 Not tainted 6.10.0-rc2-syzkaller-00797-ga12978712d90 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 check_deadlock kernel/locking/lockdep.c:3062 [inline]
 validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3856
 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
 local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
 __mmap_lock_do_trace_released+0x9c/0x620 mm/mmap_lock.c:243
 __mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
 mmap_read_unlock include/linux/mmap_lock.h:170 [inline]
 bpf_mmap_unlock_mm kernel/bpf/mmap_unlock_work.h:52 [inline]
 stack_map_get_build_id_offset+0x9c7/0x9d0 kernel/bpf/stackmap.c:173
 __bpf_get_stack+0x4ad/0x5a0 kernel/bpf/stackmap.c:449
 bpf_prog_e6cf5f9c69743609+0x42/0x46
 bpf_dispatcher_nop_func include/linux/bpf.h:1243 [inline]
 __bpf_prog_run include/linux/filter.h:691 [inline]
 bpf_prog_run include/linux/filter.h:698 [inline]
 bpf_prog_run_array include/linux/bpf.h:2104 [inline]
 trace_call_bpf+0x369/0x8a0 kernel/trace/bpf_trace.c:147
 perf_trace_run_bpf_submit+0x7c/0x1d0 kernel/events/core.c:10269
 perf_trace_mmap_lock+0x3d7/0x510 include/trace/events/mmap_lock.h:16
 trace_mmap_lock_released include/trace/events/mmap_lock.h:50 [inline]
 __mmap_lock_do_trace_released+0x5bb/0x620 mm/mmap_lock.c:243
 __mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
 mmap_read_unlock include/linux/mmap_lock.h:170 [inline]
 acct_collect+0x81d/0x830 kernel/acct.c:566
 do_exit+0x936/0x27e0 kernel/exit.c:853
 do_group_exit+0x207/0x2c0 kernel/exit.c:1023
 __do_sys_exit_group kernel/exit.c:1034 [inline]
 __se_sys_exit_group kernel/exit.c:1032 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1032
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f8fac26d039
Code: 90 49 c7 c0 b8 ff ff ff be e7 00 00 00 ba 3c 00 00 00 eb 12 0f 1f 44 00 00 89 d0 0f 05 48 3d 00 f0 ff ff 77 1c f4 89 f0 0f 05 <48> 3d 00 f0 ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00
RSP: 002b:00007ffd95d56e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8fac26d039
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 00007f8fac2e82b0 R08: ffffffffffffffb8 R09: 00000000000000a0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8fac2e82b0
R13: 0000000000000000 R14: 00007f8fac2e8d20 R15: 00007f8fac23e1e0
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [mm?] possible deadlock in __mmap_lock_do_trace_released
  2024-07-02 18:54 [syzbot] [mm?] possible deadlock in __mmap_lock_do_trace_released syzbot
@ 2024-07-04 20:12 ` Jesper Dangaard Brouer
  2024-07-11 14:33   ` Steven Rostedt
  2024-08-19  4:40 ` syzbot
  1 sibling, 1 reply; 4+ messages in thread
From: Jesper Dangaard Brouer @ 2024-07-04 20:12 UTC
  To: syzbot, akpm, cgroups, hannes, linux-kernel, linux-mm,
	linux-trace-kernel, lizefan.x, mathieu.desnoyers, mhiramat,
	netdev, rostedt, syzkaller-bugs, tj



On 02/07/2024 20.54, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    a12978712d90 selftests/bpf: Move ARRAY_SIZE to bpf_misc.h
> git tree:       bpf-next
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=130457fa980000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=736daf12bd72e034
> dashboard link: https://syzkaller.appspot.com/bug?extid=16b6ab88e66b34d09014
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=125718be980000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14528876980000

I cannot reproduce this with the reproducer in my testlab.
(More below)

> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/9d845a55bf58/disk-a1297871.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/12cb27bdb2de/vmlinux-a1297871.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/db09a1fa448c/bzImage-a1297871.xz
> 
> The issue was bisected to:
> 
> commit 21c38a3bd4ee3fb7337d013a638302fb5e5f9dc2
> Author: Jesper Dangaard Brouer <hawk@kernel.org>
> Date:   Wed May 1 14:04:11 2024 +0000
> 
>      cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14ecc085980000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=16ecc085980000
> console output: https://syzkaller.appspot.com/x/log.txt?x=12ecc085980000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+16b6ab88e66b34d09014@syzkaller.appspotmail.com
> Fixes: 21c38a3bd4ee ("cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints")
> 
> ============================================
> WARNING: possible recursive locking detected
> 6.10.0-rc2-syzkaller-00797-ga12978712d90 #0 Not tainted
> --------------------------------------------
> syz-executor646/5097 is trying to acquire lock:
> ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
> ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243
> 
> but task is already holding lock:
> ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
> ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243
> 
> other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(lock#9);
>    lock(lock#9);
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 

To me, this looks like a lockdep false-positive, but I might be wrong.

Could someone with more LOCKDEP knowledge give their interpretation?

The commit[1] adds a fairly standard trylock scheme.
Do I need to annotate trylocks for lockdep in some special way?

  [1] https://git.kernel.org/torvalds/c/21c38a3bd4ee3fb733

Also note that the change uses raw_spin_lock, which might be harder for lockdep?
So, I also enabled CONFIG_PROVE_RAW_LOCK_NESTING in my testlab to help
with this, along with CONFIG_PROVE_LOCKING.
(And obviously I also enabled LOCKDEP*)

--Jesper

> 5 locks held by syz-executor646/5097:
>   #0: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock include/linux/mmap_lock.h:144 [inline]
>   #0: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: acct_collect+0x1cf/0x830 kernel/acct.c:563
>   #1: ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
>   #1: ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243
>   #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
>   #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
>   #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: get_memcg_path_buf mm/mmap_lock.c:139 [inline]
>   #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: get_mm_memcg_path+0xb1/0x600 mm/mmap_lock.c:209
>   #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: trace_call_bpf+0xbc/0x8a0
>   #4: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_trylock include/linux/mmap_lock.h:163 [inline]
>   #4: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: stack_map_get_build_id_offset+0x237/0x9d0 kernel/bpf/stackmap.c:141
> 
> stack backtrace:
> CPU: 0 PID: 5097 Comm: syz-executor646 Not tainted 6.10.0-rc2-syzkaller-00797-ga12978712d90 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> Call Trace:
>   <TASK>
>   __dump_stack lib/dump_stack.c:88 [inline]
>   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
>   check_deadlock kernel/locking/lockdep.c:3062 [inline]
>   validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3856
>   __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
>   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
>   local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
>   __mmap_lock_do_trace_released+0x9c/0x620 mm/mmap_lock.c:243
>   __mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
>   mmap_read_unlock include/linux/mmap_lock.h:170 [inline]
>   bpf_mmap_unlock_mm kernel/bpf/mmap_unlock_work.h:52 [inline]
>   stack_map_get_build_id_offset+0x9c7/0x9d0 kernel/bpf/stackmap.c:173
>   __bpf_get_stack+0x4ad/0x5a0 kernel/bpf/stackmap.c:449
>   bpf_prog_e6cf5f9c69743609+0x42/0x46
>   bpf_dispatcher_nop_func include/linux/bpf.h:1243 [inline]
>   __bpf_prog_run include/linux/filter.h:691 [inline]
>   bpf_prog_run include/linux/filter.h:698 [inline]
>   bpf_prog_run_array include/linux/bpf.h:2104 [inline]
>   trace_call_bpf+0x369/0x8a0 kernel/trace/bpf_trace.c:147
>   perf_trace_run_bpf_submit+0x7c/0x1d0 kernel/events/core.c:10269
>   perf_trace_mmap_lock+0x3d7/0x510 include/trace/events/mmap_lock.h:16
>   trace_mmap_lock_released include/trace/events/mmap_lock.h:50 [inline]
>   __mmap_lock_do_trace_released+0x5bb/0x620 mm/mmap_lock.c:243
>   __mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
>   mmap_read_unlock include/linux/mmap_lock.h:170 [inline]
>   acct_collect+0x81d/0x830 kernel/acct.c:566
>   do_exit+0x936/0x27e0 kernel/exit.c:853
>   do_group_exit+0x207/0x2c0 kernel/exit.c:1023
>   __do_sys_exit_group kernel/exit.c:1034 [inline]
>   __se_sys_exit_group kernel/exit.c:1032 [inline]
>   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1032
>   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f8fac26d039
> Code: 90 49 c7 c0 b8 ff ff ff be e7 00 00 00 ba 3c 00 00 00 eb 12 0f 1f 44 00 00 89 d0 0f 05 48 3d 00 f0 ff ff 77 1c f4 89 f0 0f 05 <48> 3d 00 f0 ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00
> RSP: 002b:00007ffd95d56e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8fac26d039
> RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> RBP: 00007f8fac2e82b0 R08: ffffffffffffffb8 R09: 00000000000000a0
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8fac2e82b0
> R13: 0000000000000000 R14: 00007f8fac2e8d20 R15: 00007f8fac23e1e0
>   </TASK>
> 
> 


* Re: [syzbot] [mm?] possible deadlock in __mmap_lock_do_trace_released
  2024-07-04 20:12 ` Jesper Dangaard Brouer
@ 2024-07-11 14:33   ` Steven Rostedt
  0 siblings, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2024-07-11 14:33 UTC
  To: Jesper Dangaard Brouer
  Cc: syzbot, akpm, cgroups, hannes, linux-kernel, linux-mm,
	linux-trace-kernel, lizefan.x, mathieu.desnoyers, mhiramat,
	netdev, syzkaller-bugs, tj

On Thu, 4 Jul 2024 22:12:45 +0200
Jesper Dangaard Brouer <hawk@kernel.org> wrote:

> > ============================================
> > WARNING: possible recursive locking detected
> > 6.10.0-rc2-syzkaller-00797-ga12978712d90 #0 Not tainted
> > --------------------------------------------
> > syz-executor646/5097 is trying to acquire lock:
> > ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
> > ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243
> > 
> > but task is already holding lock:
> > ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
> > ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243
> > 
> > other info that might help us debug this:
> >   Possible unsafe locking scenario:
> > 
> >         CPU0
> >         ----
> >    lock(lock#9);
> >    lock(lock#9);
> > 
> >   *** DEADLOCK ***

Looks like it's trying to take the rwsem mm->mmap_lock recursively. And
rwsems are *not* allowed to be taken recursively, because once a writer is
waiting, all new readers will block. Then you can have:

   CPU0				    CPU1
   ----				    ----
  down_read(lockA);
				down_write(lockA); // blocks
  down_read(lockA); //blocks

DEADLOCK!
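
The interleaving above can be replayed with a small user-space model. This is
only a sketch under stated assumptions: `toy_rwsem` and its helpers are
hypothetical names, the model is a plain state machine rather than the
kernel's rwsem implementation, and "blocks" is represented by a `false`
return value instead of actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a writer-preferring reader/writer semaphore. */
struct toy_rwsem {
	int readers;         /* readers currently holding the lock */
	bool writer_waiting; /* a writer is queued behind those readers */
};

/* Returns true if the read lock was taken, false if the caller would block. */
static bool toy_down_read(struct toy_rwsem *sem)
{
	if (sem->writer_waiting)
		return false; /* new readers queue behind the waiting writer */
	sem->readers++;
	return true;
}

/* Returns true if the write lock was taken, false if the caller would block. */
static bool toy_down_write(struct toy_rwsem *sem)
{
	if (sem->readers > 0) {
		sem->writer_waiting = true; /* queue behind the active readers */
		return false;
	}
	return true;
}

/* Replays the CPU0/CPU1 interleaving; returns true if it ends in deadlock. */
static bool toy_recursive_read_deadlocks(void)
{
	struct toy_rwsem lockA = { 0, false };

	bool first = toy_down_read(&lockA);   /* CPU0: succeeds */
	assert(first);                        /* uncontended read always succeeds here */
	bool writer = toy_down_write(&lockA); /* CPU1: blocks on CPU0's read lock */
	bool second = toy_down_read(&lockA);  /* CPU0: blocks behind the writer */

	/* CPU0 waits for CPU1, and CPU1 waits for CPU0 to drop its first read. */
	return first && !writer && !second;
}
```

In this model, a second `toy_down_read()` on the same lock succeeds as long as
no writer has queued, which is why the recursion can go unnoticed until a
writer shows up between the two read acquisitions.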


> > 
> >   May be due to missing lock nesting notation
> >   
> 
> To me, this looks like a lockdep false-positive, but I might be wrong.
> 
> Could someone with more LOCKDEP knowledge give their interpretation?
> 
> The commit[1] adds a fairly standard trylock scheme.
> Do I need to lockdep annotate trylock's in some special way?
> 
>   [1] https://git.kernel.org/torvalds/c/21c38a3bd4ee3fb733
> 
> Also notice change uses raw_spin_lock, which might be harder for lockdep?
> So, I also enabled CONFIG_PROVE_RAW_LOCK_NESTING in my testlab to help
> with this, and CONFIG_PROVE_LOCKING.
> (And obviously I also enabled LOCKDEP*)
> 
> --Jesper
> 
> > 5 locks held by syz-executor646/5097:
> >   #0: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock include/linux/mmap_lock.h:144 [inline]
> >   #0: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: acct_collect+0x1cf/0x830 kernel/acct.c:563
> >   #1: ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
> >   #1: ffff8880b94387e8 (lock#9){+.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x83/0x620 mm/mmap_lock.c:243
> >   #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
> >   #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
> >   #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: get_memcg_path_buf mm/mmap_lock.c:139 [inline]
> >   #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: get_mm_memcg_path+0xb1/0x600 mm/mmap_lock.c:209
> >   #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: trace_call_bpf+0xbc/0x8a0
> >   #4: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_trylock include/linux/mmap_lock.h:163 [inline]
> >   #4: ffff8880182eb118 (&mm->mmap_lock){++++}-{3:3}, at: stack_map_get_build_id_offset+0x237/0x9d0 kernel/bpf/stackmap.c:141
> > 
> > stack backtrace:
> > CPU: 0 PID: 5097 Comm: syz-executor646 Not tainted 6.10.0-rc2-syzkaller-00797-ga12978712d90 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> > Call Trace:
> >   <TASK>
> >   __dump_stack lib/dump_stack.c:88 [inline]
> >   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
> >   check_deadlock kernel/locking/lockdep.c:3062 [inline]
> >   validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3856
> >   __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
> >   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> >   local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
> >   __mmap_lock_do_trace_released+0x9c/0x620 mm/mmap_lock.c:243

Here we have:

  static inline void mmap_read_lock(struct mm_struct *mm)
  {
        __mmap_lock_trace_start_locking(mm, false);
        down_read(&mm->mmap_lock);
        __mmap_lock_trace_acquire_returned(mm, false, true);
  }

Which is taking the mm->mmap_lock for read.

> >   __mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
> >   mmap_read_unlock include/linux/mmap_lock.h:170 [inline]
> >   bpf_mmap_unlock_mm kernel/bpf/mmap_unlock_work.h:52 [inline]
> >   stack_map_get_build_id_offset+0x9c7/0x9d0 kernel/bpf/stackmap.c:173
> >   __bpf_get_stack+0x4ad/0x5a0 kernel/bpf/stackmap.c:449
> >   bpf_prog_e6cf5f9c69743609+0x42/0x46
> >   bpf_dispatcher_nop_func include/linux/bpf.h:1243 [inline]
> >   __bpf_prog_run include/linux/filter.h:691 [inline]
> >   bpf_prog_run include/linux/filter.h:698 [inline]
> >   bpf_prog_run_array include/linux/bpf.h:2104 [inline]
> >   trace_call_bpf+0x369/0x8a0 kernel/trace/bpf_trace.c:147
> >   perf_trace_run_bpf_submit+0x7c/0x1d0 kernel/events/core.c:10269
> >   perf_trace_mmap_lock+0x3d7/0x510 include/trace/events/mmap_lock.h:16

I'm guessing a bpf program is attached to something within the same code path:

> >   trace_mmap_lock_released include/trace/events/mmap_lock.h:50 [inline]
> >   __mmap_lock_do_trace_released+0x5bb/0x620 mm/mmap_lock.c:243

Here is the same function as above where it took the mm->mmap_lock.

My guess is the bpf program that attached to this event ends up calling the
same function and it tries to take the rwsem again, and that poses a risk
for deadlock.

-- Steve

> >   __mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
> >   mmap_read_unlock include/linux/mmap_lock.h:170 [inline]
> >   acct_collect+0x81d/0x830 kernel/acct.c:566
> >   do_exit+0x936/0x27e0 kernel/exit.c:853
> >   do_group_exit+0x207/0x2c0 kernel/exit.c:1023
> >   __do_sys_exit_group kernel/exit.c:1034 [inline]
> >   __se_sys_exit_group kernel/exit.c:1032 [inline]
> >   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1032
> >   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> >   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f8fac26d039
> > Code: 90 49 c7 c0 b8 ff ff ff be e7 00 00 00 ba 3c 00 00 00 eb 12 0f 1f 44 00 00 89 d0 0f 05 48 3d 00 f0 ff ff 77 1c f4 89 f0 0f 05 <48> 3d 00 f0 ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00
> > RSP: 002b:00007ffd95d56e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8fac26d039
> > RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> > RBP: 00007f8fac2e82b0 R08: ffffffffffffffb8 R09: 00000000000000a0
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8fac2e82b0
> > R13: 0000000000000000 R14: 00007f8fac2e8d20 R15: 00007f8fac23e1e0
> >   </TASK>
> > 
> > 



* Re: [syzbot] [mm?] possible deadlock in __mmap_lock_do_trace_released
  2024-07-02 18:54 [syzbot] [mm?] possible deadlock in __mmap_lock_do_trace_released syzbot
  2024-07-04 20:12 ` Jesper Dangaard Brouer
@ 2024-08-19  4:40 ` syzbot
  1 sibling, 0 replies; 4+ messages in thread
From: syzbot @ 2024-08-19  4:40 UTC
  To: akpm, axelrasmussen, bpf, cgroups, hannes, hawk, linux-kernel,
	linux-mm, linux-trace-kernel, lizefan.x, mathieu.desnoyers,
	mhiramat, netdev, nsaenz, nsaenzju, penguin-kernel,
	penguin-kernel, rostedt, syzkaller-bugs, tj

syzbot suspects this issue was fixed by commit:

commit 7d6be67cfdd4a53cea7147313ca13c531e3a470f
Author: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date:   Fri Jun 21 01:08:41 2024 +0000

    mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=12d48893980000
start commit:   a12978712d90 selftests/bpf: Move ARRAY_SIZE to bpf_misc.h
git tree:       bpf-next
kernel config:  https://syzkaller.appspot.com/x/.config?x=736daf12bd72e034
dashboard link: https://syzkaller.appspot.com/bug?extid=16b6ab88e66b34d09014
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=125718be980000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14528876980000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
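
Going only by the title of the suspected fix, the idea can be illustrated
with a user-space sketch: a shared buffer guarded by a non-recursive lock
cannot be re-entered from a hook that fires while the lock is held, whereas
an on-stack buffer needs no lock at all. Everything below is hypothetical
(`emit_shared`, `emit_on_stack`, `PATH_BUF_SIZE`, the boolean standing in for
the lock); it is not the code from the commit, and the nested call merely
stands in for an attached program re-triggering the same event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PATH_BUF_SIZE 64 /* hypothetical size, chosen for the sketch */

/* Variant A: one shared buffer guarded by a non-recursive lock (modelled
 * here as a flag, since a real lock would hang or trigger a lockdep splat). */
static bool shared_lock_held;
static char shared_buf[PATH_BUF_SIZE];
static int recursion_hits; /* attempts to re-take the already-held lock */

static void emit_shared(int nest)
{
	if (shared_lock_held) {
		recursion_hits++; /* the recursive acquisition that lockdep flags */
		return;
	}
	shared_lock_held = true;
	snprintf(shared_buf, sizeof(shared_buf), "event at nesting %d", nest);
	if (nest > 0)
		emit_shared(nest - 1); /* hook re-enters while the lock is held */
	shared_lock_held = false;
}

/* Variant B: each invocation formats into its own on-stack buffer, so a
 * nested invocation has nothing to contend on. Returns events emitted. */
static int emit_on_stack(int nest)
{
	char buf[PATH_BUF_SIZE];
	int emitted = 1;

	snprintf(buf, sizeof(buf), "event at nesting %d", nest);
	if (nest > 0)
		emitted += emit_on_stack(nest - 1); /* re-entry is harmless */
	return emitted;
}
```

With one level of nesting, `emit_shared(1)` records a single recursive
acquisition attempt, while `emit_on_stack(1)` emits both the outer and the
nested event. The cost of the on-stack variant is extra stack usage per
nesting level.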
