* [syzbot] WARNING: suspicious RCU usage in get_timespec64
@ 2021-04-04 19:05 syzbot
2021-04-04 20:38 ` Thomas Gleixner
0 siblings, 1 reply; 11+ messages in thread
From: syzbot @ 2021-04-04 19:05 UTC (permalink / raw)
To: john.stultz, linux-kernel, sboyd, syzkaller-bugs, tglx
Hello,
syzbot found the following issue on:
HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
=============================
WARNING: suspicious RCU usage
5.12.0-rc5-syzkaller #0 Not tainted
-----------------------------
kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
3 locks held by syz-executor.4/8418:
#0:
ffff8880751d2b28
(
&p->pi_lock
){-.-.}-{2:2}
, at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
#1:
ffff8880b9d35258
(
&rq->lock
){-.-.}-{2:2}
, at: rq_lock kernel/sched/sched.h:1321 [inline]
, at: ttwu_queue kernel/sched/core.c:3184 [inline]
, at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
#2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
stack backtrace:
CPU: 0 PID: 8418 Comm: syz-executor.4 Not tainted 5.12.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
___might_sleep+0x266/0x2c0 kernel/sched/core.c:8294
__might_fault+0x6e/0x180 mm/memory.c:5018
_copy_from_user+0x27/0x180 lib/usercopy.c:13
copy_from_user include/linux/uaccess.h:192 [inline]
get_timespec64+0x75/0x220 kernel/time/time.c:787
__do_sys_clock_nanosleep kernel/time/posix-timers.c:1257 [inline]
__se_sys_clock_nanosleep kernel/time/posix-timers.c:1245 [inline]
__x64_sys_clock_nanosleep+0x1bb/0x430 kernel/time/posix-timers.c:1245
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x48a621
Code: 24 0c 89 3c 24 48 89 4c 24 18 e8 aa e7 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 3c 24 b8 e6 00 00 00 0f 05 <44> 89 c7 48 89 04 24 e8 e3 e7 ff ff 48 8b 04 24 eb 97 66 2e 0f 1f
RSP: 002b:00007fffe59fbd50 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
RAX: ffffffffffffffda RBX: 0000000000000294 RCX: 000000000048a621
RDX: 00007fffe59fbd90 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00007fffe59fbe2c R08: 0000000000000000 R09: 00007fffe5b8a090
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000032
R13: 000000000005717a R14: 0000000000000003 R15: 00007fffe59fbe90
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-04 19:05 [syzbot] WARNING: suspicious RCU usage in get_timespec64 syzbot @ 2021-04-04 20:38 ` Thomas Gleixner 2021-04-04 21:40 ` Paul E. McKenney 0 siblings, 1 reply; 11+ messages in thread From: Thomas Gleixner @ 2021-04-04 20:38 UTC (permalink / raw) To: syzbot Cc: john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra, Paul E. McKenney On Sun, Apr 04 2021 at 12:05, syzbot wrote: Cc + ... > Hello, > > syzbot found the following issue on: > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > Unfortunately, I don't have any reproducer for this issue yet. > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > ============================= > WARNING: suspicious RCU usage > 5.12.0-rc5-syzkaller #0 Not tainted > ----------------------------- > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > other info that might help us debug this: > > > rcu_scheduler_active = 2, debug_locks = 0 > 3 locks held by syz-executor.4/8418: > #0: > ffff8880751d2b28 > ( > &p->pi_lock > ){-.-.}-{2:2} > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > #1: > ffff8880b9d35258 > ( > &rq->lock > ){-.-.}-{2:2} > , at: rq_lock kernel/sched/sched.h:1321 [inline] > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > stack backtrace: > CPU: 0 PID: 8418 Comm: syz-executor.4 Not tainted 5.12.0-rc5-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:79 [inline] > dump_stack+0x141/0x1d7 lib/dump_stack.c:120 > ___might_sleep+0x266/0x2c0 kernel/sched/core.c:8294 > __might_fault+0x6e/0x180 mm/memory.c:5018 > _copy_from_user+0x27/0x180 lib/usercopy.c:13 > copy_from_user include/linux/uaccess.h:192 [inline] > get_timespec64+0x75/0x220 kernel/time/time.c:787 > __do_sys_clock_nanosleep kernel/time/posix-timers.c:1257 [inline] > __se_sys_clock_nanosleep kernel/time/posix-timers.c:1245 [inline] > __x64_sys_clock_nanosleep+0x1bb/0x430 kernel/time/posix-timers.c:1245 > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xae > RIP: 0033:0x48a621 > Code: 24 0c 89 3c 24 48 89 4c 24 18 e8 aa e7 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 3c 24 b8 e6 00 00 00 0f 05 <44> 89 c7 48 89 04 24 e8 e3 e7 ff ff 48 8b 04 24 eb 97 66 2e 0f 1f > RSP: 002b:00007fffe59fbd50 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6 > RAX: ffffffffffffffda RBX: 0000000000000294 RCX: 000000000048a621 > RDX: 00007fffe59fbd90 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: 00007fffe59fbe2c R08: 0000000000000000 R09: 00007fffe5b8a090 > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000032 > R13: 000000000005717a R14: 0000000000000003 R15: 00007fffe59fbe90 > > > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-04 20:38 ` Thomas Gleixner @ 2021-04-04 21:40 ` Paul E. McKenney 2021-04-05 3:08 ` Matthew Wilcox 0 siblings, 1 reply; 11+ messages in thread From: Paul E. McKenney @ 2021-04-04 21:40 UTC (permalink / raw) To: Thomas Gleixner Cc: syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra, boqun.feng, willy On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > Cc + ... And a couple more... > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > ============================= > > WARNING: suspicious RCU usage > > 5.12.0-rc5-syzkaller #0 Not tainted > > ----------------------------- > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > other info that might help us debug this: > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > 3 locks held by syz-executor.4/8418: > > #0: > > ffff8880751d2b28 > > ( > > &p->pi_lock > > ){-.-.}-{2:2} > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > #1: > > ffff8880b9d35258 > > ( > > &rq->lock > > ){-.-.}-{2:2} > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com in that rcu_sleep_check() sees an RCU lock held, but the later call to lockdep_print_held_locks() does not. Did something change recently that could let the ->lockdep_depth counter get out of sync with the actual number of locks held? Thanx, Paul > > stack backtrace: > > CPU: 0 PID: 8418 Comm: syz-executor.4 Not tainted 5.12.0-rc5-syzkaller #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > Call Trace: > > __dump_stack lib/dump_stack.c:79 [inline] > > dump_stack+0x141/0x1d7 lib/dump_stack.c:120 > > ___might_sleep+0x266/0x2c0 kernel/sched/core.c:8294 > > __might_fault+0x6e/0x180 mm/memory.c:5018 > > _copy_from_user+0x27/0x180 lib/usercopy.c:13 > > copy_from_user include/linux/uaccess.h:192 [inline] > > get_timespec64+0x75/0x220 kernel/time/time.c:787 > > __do_sys_clock_nanosleep kernel/time/posix-timers.c:1257 [inline] > > __se_sys_clock_nanosleep kernel/time/posix-timers.c:1245 [inline] > > __x64_sys_clock_nanosleep+0x1bb/0x430 kernel/time/posix-timers.c:1245 > > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > > entry_SYSCALL_64_after_hwframe+0x44/0xae > > RIP: 0033:0x48a621 > > Code: 24 0c 89 3c 24 48 89 4c 24 18 e8 aa e7 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 3c 24 b8 e6 00 00 00 0f 05 <44> 89 c7 48 89 04 24 e8 e3 e7 ff ff 48 8b 04 24 eb 97 66 2e 0f 1f > > RSP: 002b:00007fffe59fbd50 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6 > > RAX: ffffffffffffffda RBX: 0000000000000294 RCX: 000000000048a621 > > RDX: 00007fffe59fbd90 RSI: 0000000000000000 RDI: 0000000000000000 > > RBP: 00007fffe59fbe2c R08: 0000000000000000 R09: 00007fffe5b8a090 > > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000032 > > R13: 000000000005717a R14: 0000000000000003 R15: 00007fffe59fbe90 > > > > > > --- > > This report is generated by a bot. It may contain errors. > > See https://goo.gl/tpsmEJ for more information about syzbot. > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > syzbot will keep track of this issue. See: > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-04 21:40 ` Paul E. McKenney @ 2021-04-05 3:08 ` Matthew Wilcox 2021-04-05 4:01 ` Paul E. McKenney 0 siblings, 1 reply; 11+ messages in thread From: Matthew Wilcox @ 2021-04-05 3:08 UTC (permalink / raw) To: Paul E. McKenney Cc: Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra, boqun.feng On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote: > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > > > Cc + ... > > And a couple more... > > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > > > ============================= > > > WARNING: suspicious RCU usage > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > ----------------------------- > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > > > other info that might help us debug this: > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > 3 locks held by syz-executor.4/8418: > > > #0: > > > ffff8880751d2b28 > > > ( > > > &p->pi_lock > > > ){-.-.}-{2:2} > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > > #1: > > > ffff8880b9d35258 > > > ( > > > &rq->lock > > > ){-.-.}-{2:2} > > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > in that rcu_sleep_check() sees an RCU lock held, but the later call to > lockdep_print_held_locks() does not. Did something change recently that > could let the ->lockdep_depth counter get out of sync with the actual > number of locks held? Dmitri had a different theory here: https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-05 3:08 ` Matthew Wilcox @ 2021-04-05 4:01 ` Paul E. McKenney 2021-04-05 4:30 ` Paul E. McKenney 0 siblings, 1 reply; 11+ messages in thread From: Paul E. McKenney @ 2021-04-05 4:01 UTC (permalink / raw) To: Matthew Wilcox Cc: Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra, boqun.feng On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote: > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote: > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > > > > > Cc + ... > > > > And a couple more... > > > > > > Hello, > > > > > > > > syzbot found the following issue on: > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > > > git tree: upstream > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > > > > > ============================= > > > > WARNING: suspicious RCU usage > > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > > ----------------------------- > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > > 3 locks held by syz-executor.4/8418: > > > > #0: > > > > ffff8880751d2b28 > > > > ( > > > > &p->pi_lock > > > > ){-.-.}-{2:2} > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > > > #1: > > > > ffff8880b9d35258 > > > > ( > > > > &rq->lock > > > > ){-.-.}-{2:2} > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > in that rcu_sleep_check() sees an RCU lock held, but the later call to > > lockdep_print_held_locks() does not. Did something change recently that > > could let the ->lockdep_depth counter get out of sync with the actual > > number of locks held? > > Dmitri had a different theory here: > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ There is always room for more than one bug. ;-) He says "one-off false positives". I was afraid of that... Thanx, Paul ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-05 4:01 ` Paul E. McKenney @ 2021-04-05 4:30 ` Paul E. McKenney 2021-04-05 5:23 ` Boqun Feng 0 siblings, 1 reply; 11+ messages in thread From: Paul E. McKenney @ 2021-04-05 4:30 UTC (permalink / raw) To: Matthew Wilcox Cc: Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra, boqun.feng On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote: > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote: > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote: > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > > > > > > > Cc + ... > > > > > > And a couple more... > > > > > > > > Hello, > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > > > > git tree: upstream > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > > > > > > > ============================= > > > > > WARNING: suspicious RCU usage > > > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > > > ----------------------------- > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > > > 3 locks held by syz-executor.4/8418: > > > > > #0: > > > > > ffff8880751d2b28 > > > > > ( > > > > > &p->pi_lock > > > > > ){-.-.}-{2:2} > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > > > > #1: > > > > > ffff8880b9d35258 > > > > > ( > > > > > &rq->lock > > > > > ){-.-.}-{2:2} > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to > > > lockdep_print_held_locks() does not. Did something change recently that > > > could let the ->lockdep_depth counter get out of sync with the actual > > > number of locks held? > > > > Dmitri had a different theory here: > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ > > There is always room for more than one bug. ;-) > > He says "one-off false positives". I was afraid of that... And both the examples I have been copied on today are consistent with debug_locks getting zeroed (e.g., via a call to __debug_locks_off()) in the midst of a call to rcu_sleep_check(). But I would expect to see a panic or another splat if that were to happen. Dmitry's example did have an additional splat, but I would expect the RCU-related one to come second. Again, there is always room for more than one bug. On the other hand, there are a lot more callers to debug_locks_off() than there were last I looked into this. And both of these splats are consistent with an interrupt in the middle of rcu_sleep_check(), and that interrupt's handler invoking debug_locks_off(), but without printing anything to the console. Does that sequence of events ring a bell for anyone? If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck debug_lockdep_rcu_enabled() after evaluating the condition, but with a memory barrier immediately before the recheck. But I am not at all excited by doing this on speculation. Especially given that doing so might be covering up some other bug. Thanx, Paul ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-05 4:30 ` Paul E. McKenney @ 2021-04-05 5:23 ` Boqun Feng 2021-04-05 17:27 ` Paul E. McKenney 0 siblings, 1 reply; 11+ messages in thread From: Boqun Feng @ 2021-04-05 5:23 UTC (permalink / raw) To: Paul E. McKenney Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote: > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote: > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote: > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote: > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > > > > > > > > > Cc + ... > > > > > > > > And a couple more... > > > > > > > > > > Hello, > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > > > > > git tree: upstream > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > > > > > > > > > ============================= > > > > > > WARNING: suspicious RCU usage > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > > > > ----------------------------- > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > > > > 3 locks held by syz-executor.4/8418: > > > > > > #0: > > > > > > ffff8880751d2b28 > > > > > > ( > > > > > > &p->pi_lock > > > > > > ){-.-.}-{2:2} > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > > > > > #1: > > > > > > ffff8880b9d35258 > > > > > > ( > > > > > > &rq->lock > > > > > > ){-.-.}-{2:2} > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to > > > > lockdep_print_held_locks() does not. Did something change recently that > > > > could let the ->lockdep_depth counter get out of sync with the actual > > > > number of locks held? > > > > > > Dmitri had a different theory here: > > > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ > > > > There is always room for more than one bug. ;-) > > > > He says "one-off false positives". I was afraid of that... > > And both the examples I have been copied on today are consistent with > debug_locks getting zeroed (e.g., via a call to __debug_locks_off()) > in the midst of a call to rcu_sleep_check(). But I would expect to see > a panic or another splat if that were to happen. > > Dmitry's example did have an additional splat, but I would expect the > RCU-related one to come second. Again, there is always room for more > than one bug. > > On the other hand, there are a lot more callers to debug_locks_off() > than there were last I looked into this. And both of these splats > are consistent with an interrupt in the middle of rcu_sleep_check(), > and that interrupt's handler invoking debug_locks_off(), but without > printing anything to the console. Does that sequence of events ring a > bell for anyone? > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck > debug_lockdep_rcu_enabled() after evaluating the condition, but with > a memory barrier immediately before the recheck. But I am not at all > excited by doing this on speculation. Especially given that doing > so might be covering up some other bug. > Just check the original console log and find: [ 356.696686][ T8418] ============================= [ 356.696692][ T8418] WARNING: suspicious RCU usage [ 356.700193][T14782] ==================================== [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted [ 356.729981][ T8418] ----------------------------- [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held! , so there are two warnnings here, one is from lockdep_rcu_suspisous() and the other is from print_held_locks_bug(). I think this is what happened: in RCU_LOCKDEP_WARN(): if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1 <interrupted> // lockdep detects a lock bug, set debug_locks = 0 <swicth back> && !__warned // true && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0! the cause of the problem is that RCU_LOCKDEP_WARN() in fact read debug_locks twice and get different values. But if you change the ordering of two reads, probably can avoid the problem: First read: lock_is_held(); // true if 1) lock is really held or 2) lockdep is off Second read: debug_lockdep_rcu_enabled(); // if lockdep is not off, we know // that the first read got correct // value, otherwise we just ignore // the first read, because either // there is a bug reported between // two reads, or lockdep is already // off when the first read happens. So maybe something below: Regards, Boqun diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index bd04f722714f..d11bee5d9347 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) #define RCU_LOCKDEP_WARN(c, s) \ do { \ static bool __section(".data.unlikely") __warned; \ - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ __warned = true; \ lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ } \ > Thanx, Paul ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-05 5:23 ` Boqun Feng @ 2021-04-05 17:27 ` Paul E. McKenney 2021-04-05 23:25 ` Boqun Feng 0 siblings, 1 reply; 11+ messages in thread From: Paul E. McKenney @ 2021-04-05 17:27 UTC (permalink / raw) To: Boqun Feng Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote: > On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote: > > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote: > > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote: > > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote: > > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > > > > > > > > > > > Cc + ... > > > > > > > > > > And a couple more... > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > > > > > > git tree: upstream > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > > > > > > > > > > > ============================= > > > > > > > WARNING: suspicious RCU usage > > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > > > > > ----------------------------- > > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > > > > > 3 locks held by syz-executor.4/8418: > > > > > > > #0: > > > > > > > ffff8880751d2b28 > > > > > > > ( > > > > > > > &p->pi_lock > > > > > > > ){-.-.}-{2:2} > > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > > > > > > #1: > > > > > > > ffff8880b9d35258 > > > > > > > ( > > > > > > > &rq->lock > > > > > > > ){-.-.}-{2:2} > > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > > > > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to > > > > > lockdep_print_held_locks() does not. Did something change recently that > > > > > could let the ->lockdep_depth counter get out of sync with the actual > > > > > number of locks held? > > > > > > > > Dmitri had a different theory here: > > > > > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ > > > > > > There is always room for more than one bug. ;-) > > > > > > He says "one-off false positives". I was afraid of that... > > > > And both the examples I have been copied on today are consistent with > > debug_locks getting zeroed (e.g., via a call to __debug_locks_off()) > > in the midst of a call to rcu_sleep_check(). But I would expect to see > > a panic or another splat if that were to happen. > > > > Dmitry's example did have an additional splat, but I would expect the > > RCU-related one to come second. Again, there is always room for more > > than one bug. > > > > On the other hand, there are a lot more callers to debug_locks_off() > > than there were last I looked into this. And both of these splats > > are consistent with an interrupt in the middle of rcu_sleep_check(), > > and that interrupt's handler invoking debug_locks_off(), but without > > printing anything to the console. Does that sequence of events ring a > > bell for anyone? > > > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck > > debug_lockdep_rcu_enabled() after evaluating the condition, but with > > a memory barrier immediately before the recheck. But I am not at all > > excited by doing this on speculation. Especially given that doing > > so might be covering up some other bug. > > > > Just check the original console log and find: > > [ 356.696686][ T8418] ============================= > [ 356.696692][ T8418] WARNING: suspicious RCU usage > [ 356.700193][T14782] ==================================== > [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted > [ 356.729981][ T8418] ----------------------------- > [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held! > > , so there are two warnnings here, one is from lockdep_rcu_suspisous() > and the other is from print_held_locks_bug(). I think this is what > happened: > > in RCU_LOCKDEP_WARN(): > > if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1 > <interrupted> > // lockdep detects a lock bug, set debug_locks = 0 > <swicth back> > && !__warned // true > && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0! > > the cause of the problem is that RCU_LOCKDEP_WARN() in fact read > debug_locks twice and get different values. > > But if you change the ordering of two reads, probably can avoid the > problem: > > First read: > lock_is_held(); // true if 1) lock is really held or 2) lockdep is off > > Second read: > debug_lockdep_rcu_enabled(); // if lockdep is not off, we know > // that the first read got correct > // value, otherwise we just ignore > // the first read, because either > // there is a bug reported between > // two reads, or lockdep is already > // off when the first read happens. > > So maybe something below: > > Regards, > Boqun > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index bd04f722714f..d11bee5d9347 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) > #define RCU_LOCKDEP_WARN(c, s) \ > do { \ > static bool __section(".data.unlikely") __warned; \ > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ > __warned = true; \ > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > } \ Good point -- if we check debug_lockdep_rcu_enabled() after the condition, then we will reject false positives in cases where debug_locks was switched to zero out from under us. However, we do need ordering. The "c" usually contains lock_is_held(), which also checks debug_locks, but from some other translation unit. Back in the day, the translation-unit boundaries would provide the needed ordering, but LTO... In addition, the "debug_locks = 0" was originally supposed to be a hint that the report might be a false positive. It is clear that this needs to be made explicit. Taking all this together, how about the following? (The intent is that the changes to lockdep_rcu_suspicious() will be in a separate commit.) Thanx, Paul ------------------------------------------------------------------------ diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 9455476..1199ffd 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) #define RCU_LOCKDEP_WARN(c, s) \ do { \ static bool __section(".data.unlikely") __warned; \ - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ __warned = true; \ lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ } \ diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index c6d0c1dc..80065cd 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void) void lockdep_rcu_suspicious(const char *file, const int line, const char *s) { struct task_struct *curr = current; + int dl = READ_ONCE(debug_locks); /* Note: the following can be executed concurrently, so be careful. */ pr_warn("\n"); @@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) pr_warn("-----------------------------\n"); pr_warn("%s:%d %s!\n", file, line, s); pr_warn("\nother info that might help us debug this:\n\n"); - pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n", + pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s", !rcu_lockdep_current_cpu_online() ? "RCU used illegally from offline CPU!\n" : "", - rcu_scheduler_active, debug_locks); + rcu_scheduler_active, dl, + dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n"); /* * If a CPU is in the RCU-free window in idle (ie: in the section diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index b95ae86..dd94a60 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map); noinstr int notrace debug_lockdep_rcu_enabled(void) { - return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && + return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) && current->lockdep_recursion == 0; } EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled); ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-05 17:27 ` Paul E. McKenney @ 2021-04-05 23:25 ` Boqun Feng 2021-04-05 23:38 ` Paul E. McKenney 0 siblings, 1 reply; 11+ messages in thread From: Boqun Feng @ 2021-04-05 23:25 UTC (permalink / raw) To: Paul E. McKenney Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra On Mon, Apr 05, 2021 at 10:27:52AM -0700, Paul E. McKenney wrote: > On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote: > > On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote: > > > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote: > > > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote: > > > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote: > > > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > > > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > > > > > > > > > > > > > Cc + ... > > > > > > > > > > > > And a couple more... > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > > > > > > > git tree: upstream > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > > > > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > > > > > > > > > > > > > ============================= > > > > > > > > WARNING: suspicious RCU usage > > > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > > > > > > ----------------------------- > > > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > > > > > > 3 locks held by syz-executor.4/8418: > > > > > > > > #0: > > > > > > > > ffff8880751d2b28 > > > > > > > > ( > > > > > > > > &p->pi_lock > > > > > > > > ){-.-.}-{2:2} > > > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > > > > > > > #1: > > > > > > > > ffff8880b9d35258 > > > > > > > > ( > > > > > > > > &rq->lock > > > > > > > > ){-.-.}-{2:2} > > > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > > > > > > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to > > > > > > lockdep_print_held_locks() does not. Did something change recently that > > > > > > could let the ->lockdep_depth counter get out of sync with the actual > > > > > > number of locks held? > > > > > > > > > > Dmitri had a different theory here: > > > > > > > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ > > > > > > > > There is always room for more than one bug. ;-) > > > > > > > > He says "one-off false positives". I was afraid of that... > > > > > > And both the examples I have been copied on today are consistent with > > > debug_locks getting zeroed (e.g., via a call to __debug_locks_off()) > > > in the midst of a call to rcu_sleep_check(). But I would expect to see > > > a panic or another splat if that were to happen. > > > > > > Dmitry's example did have an additional splat, but I would expect the > > > RCU-related one to come second. Again, there is always room for more > > > than one bug. > > > > > > On the other hand, there are a lot more callers to debug_locks_off() > > > than there were last I looked into this. And both of these splats > > > are consistent with an interrupt in the middle of rcu_sleep_check(), > > > and that interrupt's handler invoking debug_locks_off(), but without > > > printing anything to the console. Does that sequence of events ring a > > > bell for anyone? > > > > > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck > > > debug_lockdep_rcu_enabled() after evaluating the condition, but with > > > a memory barrier immediately before the recheck. But I am not at all > > > excited by doing this on speculation. Especially given that doing > > > so might be covering up some other bug. > > > > > > > Just check the original console log and find: > > > > [ 356.696686][ T8418] ============================= > > [ 356.696692][ T8418] WARNING: suspicious RCU usage > > [ 356.700193][T14782] ==================================== > > [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted > > [ 356.729981][ T8418] ----------------------------- > > [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held! > > > > , so there are two warnnings here, one is from lockdep_rcu_suspisous() > > and the other is from print_held_locks_bug(). I think this is what > > happened: > > > > in RCU_LOCKDEP_WARN(): > > > > if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1 > > <interrupted> > > // lockdep detects a lock bug, set debug_locks = 0 > > <swicth back> > > && !__warned // true > > && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0! > > > > the cause of the problem is that RCU_LOCKDEP_WARN() in fact read > > debug_locks twice and get different values. > > > > But if you change the ordering of two reads, probably can avoid the > > problem: > > > > First read: > > lock_is_held(); // true if 1) lock is really held or 2) lockdep is off > > > > Second read: > > debug_lockdep_rcu_enabled(); // if lockdep is not off, we know > > // that the first read got correct > > // value, otherwise we just ignore > > // the first read, because either > > // there is a bug reported between > > // two reads, or lockdep is already > > // off when the first read happens. > > > > So maybe something below: > > > > Regards, > > Boqun > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index bd04f722714f..d11bee5d9347 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) > > #define RCU_LOCKDEP_WARN(c, s) \ > > do { \ > > static bool __section(".data.unlikely") __warned; \ > > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ > > __warned = true; \ > > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > > } \ > > Good point -- if we check debug_lockdep_rcu_enabled() after the condition, > then we will reject false positives in cases where debug_locks was switched > to zero out from under us. > > However, we do need ordering. The "c" usually contains lock_is_held(), > which also checks debug_locks, but from some other translation unit. > Back in the day, the translation-unit boundaries would provide the needed > ordering, but LTO... > > In addition, the "debug_locks = 0" was originally supposed to be a hint > that the report might be a false positive. It is clear that this needs > to be made explicit. > > Taking all this together, how about the following? (The intent is > that the changes to lockdep_rcu_suspicious() will be in a separate > commit.) > Looks good to me ;-) Regards, Boqun > Thanx, Paul > > ------------------------------------------------------------------------ > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 9455476..1199ffd 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) > #define RCU_LOCKDEP_WARN(c, s) \ > do { \ > static bool __section(".data.unlikely") __warned; \ > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ > __warned = true; \ > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > } \ > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index c6d0c1dc..80065cd 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void) > void lockdep_rcu_suspicious(const char *file, const int line, const char *s) > { > struct task_struct *curr = current; > + int dl = READ_ONCE(debug_locks); > > /* Note: the following can be executed concurrently, so be careful. */ > pr_warn("\n"); > @@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) > pr_warn("-----------------------------\n"); > pr_warn("%s:%d %s!\n", file, line, s); > pr_warn("\nother info that might help us debug this:\n\n"); > - pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n", > + pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s", > !rcu_lockdep_current_cpu_online() > ? "RCU used illegally from offline CPU!\n" > : "", > - rcu_scheduler_active, debug_locks); > + rcu_scheduler_active, dl, > + dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n"); > > /* > * If a CPU is in the RCU-free window in idle (ie: in the section > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > index b95ae86..dd94a60 100644 > --- a/kernel/rcu/update.c > +++ b/kernel/rcu/update.c > @@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map); > > noinstr int notrace debug_lockdep_rcu_enabled(void) > { > - return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > + return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) && > current->lockdep_recursion == 0; > } > EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-05 23:25 ` Boqun Feng @ 2021-04-05 23:38 ` Paul E. McKenney 2021-04-05 23:42 ` Boqun Feng 0 siblings, 1 reply; 11+ messages in thread From: Paul E. McKenney @ 2021-04-05 23:38 UTC (permalink / raw) To: Boqun Feng Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra On Tue, Apr 06, 2021 at 07:25:44AM +0800, Boqun Feng wrote: > On Mon, Apr 05, 2021 at 10:27:52AM -0700, Paul E. McKenney wrote: > > On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote: > > > On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote: > > > > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote: > > > > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote: > > > > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote: > > > > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > > > > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > > > > > > > > > > > > > > > Cc + ... > > > > > > > > > > > > > > And a couple more... > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > > > > > > > > git tree: upstream > > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > > > > > > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > > > > > > > > > > > > > > > ============================= > > > > > > > > > WARNING: suspicious RCU usage > > > > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > > > > > > > ----------------------------- > > > > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > > > > > > > 3 locks held by syz-executor.4/8418: > > > > > > > > > #0: > > > > > > > > > ffff8880751d2b28 > > > > > > > > > ( > > > > > > > > > &p->pi_lock > > > > > > > > > ){-.-.}-{2:2} > > > > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > > > > > > > > #1: > > > > > > > > > ffff8880b9d35258 > > > > > > > > > ( > > > > > > > > > &rq->lock > > > > > > > > > ){-.-.}-{2:2} > > > > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > > > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > > > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > > > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > > > > > > > > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to > > > > > > > lockdep_print_held_locks() does not. Did something change recently that > > > > > > > could let the ->lockdep_depth counter get out of sync with the actual > > > > > > > number of locks held? > > > > > > > > > > > > Dmitri had a different theory here: > > > > > > > > > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ > > > > > > > > > > There is always room for more than one bug. ;-) > > > > > > > > > > He says "one-off false positives". I was afraid of that... > > > > > > > > And both the examples I have been copied on today are consistent with > > > > debug_locks getting zeroed (e.g., via a call to __debug_locks_off()) > > > > in the midst of a call to rcu_sleep_check(). But I would expect to see > > > > a panic or another splat if that were to happen. > > > > > > > > Dmitry's example did have an additional splat, but I would expect the > > > > RCU-related one to come second. Again, there is always room for more > > > > than one bug. > > > > > > > > On the other hand, there are a lot more callers to debug_locks_off() > > > > than there were last I looked into this. And both of these splats > > > > are consistent with an interrupt in the middle of rcu_sleep_check(), > > > > and that interrupt's handler invoking debug_locks_off(), but without > > > > printing anything to the console. Does that sequence of events ring a > > > > bell for anyone? > > > > > > > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck > > > > debug_lockdep_rcu_enabled() after evaluating the condition, but with > > > > a memory barrier immediately before the recheck. But I am not at all > > > > excited by doing this on speculation. Especially given that doing > > > > so might be covering up some other bug. > > > > > > > > > > Just check the original console log and find: > > > > > > [ 356.696686][ T8418] ============================= > > > [ 356.696692][ T8418] WARNING: suspicious RCU usage > > > [ 356.700193][T14782] ==================================== > > > [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted > > > [ 356.729981][ T8418] ----------------------------- > > > [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held! > > > > > > , so there are two warnnings here, one is from lockdep_rcu_suspisous() > > > and the other is from print_held_locks_bug(). I think this is what > > > happened: > > > > > > in RCU_LOCKDEP_WARN(): > > > > > > if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1 > > > <interrupted> > > > // lockdep detects a lock bug, set debug_locks = 0 > > > <swicth back> > > > && !__warned // true > > > && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0! > > > > > > the cause of the problem is that RCU_LOCKDEP_WARN() in fact read > > > debug_locks twice and get different values. > > > > > > But if you change the ordering of two reads, probably can avoid the > > > problem: > > > > > > First read: > > > lock_is_held(); // true if 1) lock is really held or 2) lockdep is off > > > > > > Second read: > > > debug_lockdep_rcu_enabled(); // if lockdep is not off, we know > > > // that the first read got correct > > > // value, otherwise we just ignore > > > // the first read, because either > > > // there is a bug reported between > > > // two reads, or lockdep is already > > > // off when the first read happens. > > > > > > So maybe something below: > > > > > > Regards, > > > Boqun > > > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > > index bd04f722714f..d11bee5d9347 100644 > > > --- a/include/linux/rcupdate.h > > > +++ b/include/linux/rcupdate.h > > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) > > > #define RCU_LOCKDEP_WARN(c, s) \ > > > do { \ > > > static bool __section(".data.unlikely") __warned; \ > > > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > > > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ > > > __warned = true; \ > > > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > > > } \ > > > > Good point -- if we check debug_lockdep_rcu_enabled() after the condition, > > then we will reject false positives in cases where debug_locks was switched > > to zero out from under us. > > > > However, we do need ordering. The "c" usually contains lock_is_held(), > > which also checks debug_locks, but from some other translation unit. > > Back in the day, the translation-unit boundaries would provide the needed > > ordering, but LTO... > > > > In addition, the "debug_locks = 0" was originally supposed to be a hint > > that the report might be a false positive. It is clear that this needs > > to be made explicit. > > > > Taking all this together, how about the following? (The intent is > > that the changes to lockdep_rcu_suspicious() will be in a separate > > commit.) > > Looks good to me ;-) Whew! May I add your Reviewed-by? Thanx, Paul > Regards, > Boqun > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 9455476..1199ffd 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) > > #define RCU_LOCKDEP_WARN(c, s) \ > > do { \ > > static bool __section(".data.unlikely") __warned; \ > > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ > > __warned = true; \ > > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > > } \ > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > index c6d0c1dc..80065cd 100644 > > --- a/kernel/locking/lockdep.c > > +++ b/kernel/locking/lockdep.c > > @@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void) > > void lockdep_rcu_suspicious(const char *file, const int line, const char *s) > > { > > struct task_struct *curr = current; > > + int dl = READ_ONCE(debug_locks); > > > > /* Note: the following can be executed concurrently, so be careful. */ > > pr_warn("\n"); > > @@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) > > pr_warn("-----------------------------\n"); > > pr_warn("%s:%d %s!\n", file, line, s); > > pr_warn("\nother info that might help us debug this:\n\n"); > > - pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n", > > + pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s", > > !rcu_lockdep_current_cpu_online() > > ? "RCU used illegally from offline CPU!\n" > > : "", > > - rcu_scheduler_active, debug_locks); > > + rcu_scheduler_active, dl, > > + dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n"); > > > > /* > > * If a CPU is in the RCU-free window in idle (ie: in the section > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > > index b95ae86..dd94a60 100644 > > --- a/kernel/rcu/update.c > > +++ b/kernel/rcu/update.c > > @@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map); > > > > noinstr int notrace debug_lockdep_rcu_enabled(void) > > { > > - return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > > + return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) && > > current->lockdep_recursion == 0; > > } > > EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64 2021-04-05 23:38 ` Paul E. McKenney @ 2021-04-05 23:42 ` Boqun Feng 0 siblings, 0 replies; 11+ messages in thread From: Boqun Feng @ 2021-04-05 23:42 UTC (permalink / raw) To: Paul E. McKenney Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra On Mon, Apr 05, 2021 at 04:38:07PM -0700, Paul E. McKenney wrote: > On Tue, Apr 06, 2021 at 07:25:44AM +0800, Boqun Feng wrote: > > On Mon, Apr 05, 2021 at 10:27:52AM -0700, Paul E. McKenney wrote: > > > On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote: > > > > On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote: > > > > > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote: > > > > > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote: > > > > > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote: > > > > > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote: > > > > > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote: > > > > > > > > > > > > > > > > > > Cc + ... > > > > > > > > > > > > > > > > And a couple more... > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co.. > > > > > > > > > > git tree: upstream > > > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000 > > > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d > > > > > > > > > > > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com > > > > > > > > > > > > > > > > > > > > ============================= > > > > > > > > > > WARNING: suspicious RCU usage > > > > > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > > > > > > > > ----------------------------- > > > > > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section! > > > > > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > > > > > > > > 3 locks held by syz-executor.4/8418: > > > > > > > > > > #0: > > > > > > > > > > ffff8880751d2b28 > > > > > > > > > > ( > > > > > > > > > > &p->pi_lock > > > > > > > > > > ){-.-.}-{2:2} > > > > > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345 > > > > > > > > > > #1: > > > > > > > > > > ffff8880b9d35258 > > > > > > > > > > ( > > > > > > > > > > &rq->lock > > > > > > > > > > ){-.-.}-{2:2} > > > > > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline] > > > > > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline] > > > > > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464 > > > > > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807 > > > > > > > > > > > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to > > > > > > > > lockdep_print_held_locks() does not. Did something change recently that > > > > > > > > could let the ->lockdep_depth counter get out of sync with the actual > > > > > > > > number of locks held? > > > > > > > > > > > > > > Dmitri had a different theory here: > > > > > > > > > > > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ > > > > > > > > > > > > There is always room for more than one bug. ;-) > > > > > > > > > > > > He says "one-off false positives". I was afraid of that... > > > > > > > > > > And both the examples I have been copied on today are consistent with > > > > > debug_locks getting zeroed (e.g., via a call to __debug_locks_off()) > > > > > in the midst of a call to rcu_sleep_check(). But I would expect to see > > > > > a panic or another splat if that were to happen. > > > > > > > > > > Dmitry's example did have an additional splat, but I would expect the > > > > > RCU-related one to come second. Again, there is always room for more > > > > > than one bug. > > > > > > > > > > On the other hand, there are a lot more callers to debug_locks_off() > > > > > than there were last I looked into this. And both of these splats > > > > > are consistent with an interrupt in the middle of rcu_sleep_check(), > > > > > and that interrupt's handler invoking debug_locks_off(), but without > > > > > printing anything to the console. Does that sequence of events ring a > > > > > bell for anyone? > > > > > > > > > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck > > > > > debug_lockdep_rcu_enabled() after evaluating the condition, but with > > > > > a memory barrier immediately before the recheck. But I am not at all > > > > > excited by doing this on speculation. Especially given that doing > > > > > so might be covering up some other bug. > > > > > > > > > > > > > Just check the original console log and find: > > > > > > > > [ 356.696686][ T8418] ============================= > > > > [ 356.696692][ T8418] WARNING: suspicious RCU usage > > > > [ 356.700193][T14782] ==================================== > > > > [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted > > > > [ 356.729981][ T8418] ----------------------------- > > > > [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held! > > > > > > > > , so there are two warnnings here, one is from lockdep_rcu_suspisous() > > > > and the other is from print_held_locks_bug(). I think this is what > > > > happened: > > > > > > > > in RCU_LOCKDEP_WARN(): > > > > > > > > if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1 > > > > <interrupted> > > > > // lockdep detects a lock bug, set debug_locks = 0 > > > > <swicth back> > > > > && !__warned // true > > > > && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0! > > > > > > > > the cause of the problem is that RCU_LOCKDEP_WARN() in fact read > > > > debug_locks twice and get different values. > > > > > > > > But if you change the ordering of two reads, probably can avoid the > > > > problem: > > > > > > > > First read: > > > > lock_is_held(); // true if 1) lock is really held or 2) lockdep is off > > > > > > > > Second read: > > > > debug_lockdep_rcu_enabled(); // if lockdep is not off, we know > > > > // that the first read got correct > > > > // value, otherwise we just ignore > > > > // the first read, because either > > > > // there is a bug reported between > > > > // two reads, or lockdep is already > > > > // off when the first read happens. > > > > > > > > So maybe something below: > > > > > > > > Regards, > > > > Boqun > > > > > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > > > index bd04f722714f..d11bee5d9347 100644 > > > > --- a/include/linux/rcupdate.h > > > > +++ b/include/linux/rcupdate.h > > > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) > > > > #define RCU_LOCKDEP_WARN(c, s) \ > > > > do { \ > > > > static bool __section(".data.unlikely") __warned; \ > > > > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > > > > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ > > > > __warned = true; \ > > > > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > > > > } \ > > > > > > Good point -- if we check debug_lockdep_rcu_enabled() after the condition, > > > then we will reject false positives in cases where debug_locks was switched > > > to zero out from under us. > > > > > > However, we do need ordering. The "c" usually contains lock_is_held(), > > > which also checks debug_locks, but from some other translation unit. > > > Back in the day, the translation-unit boundaries would provide the needed > > > ordering, but LTO... > > > > > > In addition, the "debug_locks = 0" was originally supposed to be a hint > > > that the report might be a false positive. It is clear that this needs > > > to be made explicit. > > > > > > Taking all this together, how about the following? (The intent is > > > that the changes to lockdep_rcu_suspicious() will be in a separate > > > commit.) > > > > Looks good to me ;-) > > Whew! May I add your Reviewed-by? > Of course ;-) Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Regards, Boqun > Thanx, Paul > > > Regards, > > Boqun > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > > index 9455476..1199ffd 100644 > > > --- a/include/linux/rcupdate.h > > > +++ b/include/linux/rcupdate.h > > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void) > > > #define RCU_LOCKDEP_WARN(c, s) \ > > > do { \ > > > static bool __section(".data.unlikely") __warned; \ > > > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > > > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ > > > __warned = true; \ > > > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > > > } \ > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > > index c6d0c1dc..80065cd 100644 > > > --- a/kernel/locking/lockdep.c > > > +++ b/kernel/locking/lockdep.c > > > @@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void) > > > void lockdep_rcu_suspicious(const char *file, const int line, const char *s) > > > { > > > struct task_struct *curr = current; > > > + int dl = READ_ONCE(debug_locks); > > > > > > /* Note: the following can be executed concurrently, so be careful. */ > > > pr_warn("\n"); > > > @@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) > > > pr_warn("-----------------------------\n"); > > > pr_warn("%s:%d %s!\n", file, line, s); > > > pr_warn("\nother info that might help us debug this:\n\n"); > > > - pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n", > > > + pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s", > > > !rcu_lockdep_current_cpu_online() > > > ? "RCU used illegally from offline CPU!\n" > > > : "", > > > - rcu_scheduler_active, debug_locks); > > > + rcu_scheduler_active, dl, > > > + dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n"); > > > > > > /* > > > * If a CPU is in the RCU-free window in idle (ie: in the section > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > > > index b95ae86..dd94a60 100644 > > > --- a/kernel/rcu/update.c > > > +++ b/kernel/rcu/update.c > > > @@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map); > > > > > > noinstr int notrace debug_lockdep_rcu_enabled(void) > > > { > > > - return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > > > + return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) && > > > current->lockdep_recursion == 0; > > > } > > > EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled); ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-04-05 23:43 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-04-04 19:05 [syzbot] WARNING: suspicious RCU usage in get_timespec64 syzbot 2021-04-04 20:38 ` Thomas Gleixner 2021-04-04 21:40 ` Paul E. McKenney 2021-04-05 3:08 ` Matthew Wilcox 2021-04-05 4:01 ` Paul E. McKenney 2021-04-05 4:30 ` Paul E. McKenney 2021-04-05 5:23 ` Boqun Feng 2021-04-05 17:27 ` Paul E. McKenney 2021-04-05 23:25 ` Boqun Feng 2021-04-05 23:38 ` Paul E. McKenney 2021-04-05 23:42 ` Boqun Feng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox