* Something is leaking RCU holds from interrupt context [not found] <00000000000025a67605bf1dd4ab@google.com> @ 2021-04-04 10:24 ` Matthew Wilcox 2021-04-04 16:48 ` Paul E. McKenney 2021-04-06 8:33 ` Peter Zijlstra 0 siblings, 2 replies; 5+ messages in thread From: Matthew Wilcox @ 2021-04-04 10:24 UTC (permalink / raw) To: syzbot Cc: linux-fsdevel, linux-kernel, syzkaller-bugs, viro, netdev, tglx, peterz, frederic, paulmck On Sat, Apr 03, 2021 at 09:15:17PM -0700, syzbot wrote: > HEAD commit: 2bb25b3a Merge tag 'mips-fixes_5.12_3' of git://git.kernel.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1284cc31d00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > dashboard link: https://syzkaller.appspot.com/bug?extid=dde0cc33951735441301 > > Unfortunately, I don't have any reproducer for this issue yet. > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > WARNING: suspicious RCU usage > 5.12.0-rc5-syzkaller #0 Not tainted > ----------------------------- > kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section! > > other info that might help us debug this: > > > rcu_scheduler_active = 2, debug_locks = 0 > no locks held by systemd-udevd/4825. I think we have something that's taking the RCU read lock in (soft?) interrupt context and not releasing it properly in all situations. This thread doesn't have any locks recorded, but lock_is_held(&rcu_bh_lock_map) is true. Is there some debugging code that could find this? eg should lockdep_softirq_end() check that rcu_bh_lock_map is not held? (if it's taken in process context, then BHs can't run, so if it's held at softirq exit, then there's definitely a problem). > stack backtrace: > CPU: 0 PID: 4825 Comm: systemd-udevd Not tainted 5.12.0-rc5-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:79 [inline] > dump_stack+0x141/0x1d7 lib/dump_stack.c:120 > ___might_sleep+0x229/0x2c0 kernel/sched/core.c:8294 > __might_fault+0x6e/0x180 mm/memory.c:5018 > strncpy_from_user+0x2f/0x3e0 lib/strncpy_from_user.c:117 > getname_flags.part.0+0x95/0x4f0 fs/namei.c:149 > getname_flags fs/namei.c:2733 [inline] > user_path_at_empty+0xa1/0x100 fs/namei.c:2733 > user_path_at include/linux/namei.h:60 [inline] > do_faccessat+0x127/0x850 fs/open.c:425 > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Nothing in this path calls rcu_read_lock_bh(). It's almost exclusively used by the networking code. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Something is leaking RCU holds from interrupt context 2021-04-04 10:24 ` Something is leaking RCU holds from interrupt context Matthew Wilcox @ 2021-04-04 16:48 ` Paul E. McKenney 2021-04-04 18:24 ` Matthew Wilcox 2021-04-06 8:33 ` Peter Zijlstra 1 sibling, 1 reply; 5+ messages in thread From: Paul E. McKenney @ 2021-04-04 16:48 UTC (permalink / raw) To: Matthew Wilcox Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, viro, netdev, tglx, peterz, frederic On Sun, Apr 04, 2021 at 11:24:57AM +0100, Matthew Wilcox wrote: > On Sat, Apr 03, 2021 at 09:15:17PM -0700, syzbot wrote: > > HEAD commit: 2bb25b3a Merge tag 'mips-fixes_5.12_3' of git://git.kernel.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=1284cc31d00000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > dashboard link: https://syzkaller.appspot.com/bug?extid=dde0cc33951735441301 > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > WARNING: suspicious RCU usage > > 5.12.0-rc5-syzkaller #0 Not tainted > > ----------------------------- > > kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section! > > > > other info that might help us debug this: > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > no locks held by systemd-udevd/4825. > > I think we have something that's taking the RCU read lock in > (soft?) interrupt context and not releasing it properly in all > situations. This thread doesn't have any locks recorded, but > lock_is_held(&rcu_bh_lock_map) is true. > > Is there some debugging code that could find this? eg should > lockdep_softirq_end() check that rcu_bh_lock_map is not held? > (if it's taken in process context, then BHs can't run, so if it's > held at softirq exit, then there's definitely a problem). Something like the (untested) patch below? Please note that it does not make sense to also check for either rcu_lock_map or rcu_sched_lock_map because either of these might be held by the interrupted code. Thanx, Paul ------------------------------------------------------------------------ diff --git a/kernel/softirq.c b/kernel/softirq.c index 42f3f8c..e4ad0a6 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -504,6 +504,7 @@ static inline void lockdep_softirq_end(bool in_hardirq) { lockdep_softirq_exit(); + RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map)); if (in_hardirq) lockdep_hardirq_enter(); } ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: Something is leaking RCU holds from interrupt context 2021-04-04 16:48 ` Paul E. McKenney @ 2021-04-04 18:24 ` Matthew Wilcox 2021-04-04 21:36 ` Paul E. McKenney 0 siblings, 1 reply; 5+ messages in thread From: Matthew Wilcox @ 2021-04-04 18:24 UTC (permalink / raw) To: Paul E. McKenney Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, viro, netdev, tglx, peterz, frederic On Sun, Apr 04, 2021 at 09:48:08AM -0700, Paul E. McKenney wrote: > On Sun, Apr 04, 2021 at 11:24:57AM +0100, Matthew Wilcox wrote: > > On Sat, Apr 03, 2021 at 09:15:17PM -0700, syzbot wrote: > > > HEAD commit: 2bb25b3a Merge tag 'mips-fixes_5.12_3' of git://git.kernel.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1284cc31d00000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dde0cc33951735441301 > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > > > WARNING: suspicious RCU usage > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > ----------------------------- > > > kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section! > > > > > > other info that might help us debug this: > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > no locks held by systemd-udevd/4825. > > > > I think we have something that's taking the RCU read lock in > > (soft?) interrupt context and not releasing it properly in all > > situations. This thread doesn't have any locks recorded, but > > lock_is_held(&rcu_bh_lock_map) is true. > > > > Is there some debugging code that could find this? eg should > > lockdep_softirq_end() check that rcu_bh_lock_map is not held? > > (if it's taken in process context, then BHs can't run, so if it's > > held at softirq exit, then there's definitely a problem). > > Something like the (untested) patch below? Maybe? Will this tell us who took the lock? I was really trying to throw out a suggestion in the hope that somebody who knows this area better than I do would tell me I was wrong. > Please note that it does not make sense to also check for > either rcu_lock_map or rcu_sched_lock_map because either of > these might be held by the interrupted code. Yes! Although if we do it somewhere like tasklet_action_common(), we could do something like: +++ b/kernel/softirq.c @@ -774,6 +774,7 @@ static void tasklet_action_common(struct softirq_action *a, while (list) { struct tasklet_struct *t = list; + unsigned long rcu_lockdep = rcu_get_lockdep_state(); list = list->next; @@ -790,6 +791,10 @@ static void tasklet_action_common(struct softirq_action *a, } tasklet_unlock(t); } + if (rcu_lockdep != rcu_get_lockdep_state()) { + printk(something useful about t); + RCU_LOCKDEP_WARN(... something else useful ...); + } local_irq_disable(); where rcu_get_lockdep_state() returns a bitmap of whether the four rcu lockdep maps are held. We might also need something similar in __do_softirq(), in case it's not a tasklet that's the problem. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Something is leaking RCU holds from interrupt context 2021-04-04 18:24 ` Matthew Wilcox @ 2021-04-04 21:36 ` Paul E. McKenney 0 siblings, 0 replies; 5+ messages in thread From: Paul E. McKenney @ 2021-04-04 21:36 UTC (permalink / raw) To: Matthew Wilcox Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, viro, netdev, tglx, peterz, frederic On Sun, Apr 04, 2021 at 07:24:53PM +0100, Matthew Wilcox wrote: > On Sun, Apr 04, 2021 at 09:48:08AM -0700, Paul E. McKenney wrote: > > On Sun, Apr 04, 2021 at 11:24:57AM +0100, Matthew Wilcox wrote: > > > On Sat, Apr 03, 2021 at 09:15:17PM -0700, syzbot wrote: > > > > HEAD commit: 2bb25b3a Merge tag 'mips-fixes_5.12_3' of git://git.kernel.. > > > > git tree: upstream > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1284cc31d00000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dde0cc33951735441301 > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > > > > > WARNING: suspicious RCU usage > > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > > ----------------------------- > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section! > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > > no locks held by systemd-udevd/4825. > > > > > > I think we have something that's taking the RCU read lock in > > > (soft?) interrupt context and not releasing it properly in all > > > situations. This thread doesn't have any locks recorded, but > > > lock_is_held(&rcu_bh_lock_map) is true. > > > > > > Is there some debugging code that could find this? eg should > > > lockdep_softirq_end() check that rcu_bh_lock_map is not held? > > > (if it's taken in process context, then BHs can't run, so if it's > > > held at softirq exit, then there's definitely a problem). > > > > Something like the (untested) patch below? > > Maybe? Will this tell us who took the lock? I was really trying to > throw out a suggestion in the hope that somebody who knows this area > better than I do would tell me I was wrong. No idea. If it is reproducible, hopefully someone will try it. If it is not reproducible, so it goes! And hey, it is not my fault that you sounded like you knew what you were talking about! ;-) But yes, now that you mention it, it is odd that rcu_sleep_check() thought that rcu_bh_lock_map was held, but lockdep_rcu_suspicious() does not. One clue might be that rcu_sleep_check() is looking at rcu_bh_lock_map() itself, while lockdep_rcu_suspicious() and its callee, lockdep_print_held_locks(), rely on current->lockdep_depth. Maybe these have gotten out of sync. > > Please note that it does not make sense to also check for > > either rcu_lock_map or rcu_sched_lock_map because either of > > these might be held by the interrupted code. > > Yes! Although if we do it somewhere like tasklet_action_common(), > we could do something like: > > +++ b/kernel/softirq.c > @@ -774,6 +774,7 @@ static void tasklet_action_common(struct softirq_action *a, > > while (list) { > struct tasklet_struct *t = list; > + unsigned long rcu_lockdep = rcu_get_lockdep_state(); > > list = list->next; > > @@ -790,6 +791,10 @@ static void tasklet_action_common(struct softirq_action *a, > } > tasklet_unlock(t); > } > + if (rcu_lockdep != rcu_get_lockdep_state()) { > + printk(something useful about t); > + RCU_LOCKDEP_WARN(... something else useful ...); > + } > > local_irq_disable(); > > where rcu_get_lockdep_state() returns a bitmap of whether the four rcu > lockdep maps are held. > > We might also need something similar in __do_softirq(), in case it's > not a tasklet that's the problem. The rcu_get_lockdep_state() function would just set bits based on RCU's various lockdep maps, but the comparison would need to take at least debug_lockdep_rcu_enabled() into account. Just in case there is a lockdep report between the sampling and comparison. But first let's see what the lockdep experts have to say. Thanx, Paul ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Something is leaking RCU holds from interrupt context 2021-04-04 10:24 ` Something is leaking RCU holds from interrupt context Matthew Wilcox 2021-04-04 16:48 ` Paul E. McKenney @ 2021-04-06 8:33 ` Peter Zijlstra 1 sibling, 0 replies; 5+ messages in thread From: Peter Zijlstra @ 2021-04-06 8:33 UTC (permalink / raw) To: Matthew Wilcox Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, viro, netdev, tglx, frederic, paulmck On Sun, Apr 04, 2021 at 11:24:57AM +0100, Matthew Wilcox wrote: > On Sat, Apr 03, 2021 at 09:15:17PM -0700, syzbot wrote: > > HEAD commit: 2bb25b3a Merge tag 'mips-fixes_5.12_3' of git://git.kernel.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=1284cc31d00000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > dashboard link: https://syzkaller.appspot.com/bug?extid=dde0cc33951735441301 > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com > > > > WARNING: suspicious RCU usage > > 5.12.0-rc5-syzkaller #0 Not tainted > > ----------------------------- > > kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section! > > > > other info that might help us debug this: > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > no locks held by systemd-udevd/4825. > > I think we have something that's taking the RCU read lock in > (soft?) interrupt context and not releasing it properly in all > situations. This thread doesn't have any locks recorded, but > lock_is_held(&rcu_bh_lock_map) is true. > > Is there some debugging code that could find this? eg should > lockdep_softirq_end() check that rcu_bh_lock_map is not held? > (if it's taken in process context, then BHs can't run, so if it's > held at softirq exit, then there's definitely a problem). Hmm, I'm sure i've written something like that at least once, but I can't seem to find it :/ Does something like the completely untested below work for you? --- diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 600c10da321a..d8aa1dc481b6 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -54,6 +54,8 @@ extern void trace_hardirqs_off_finish(void); extern void trace_hardirqs_on(void); extern void trace_hardirqs_off(void); +extern void lockdep_validate_context_empty(void); + # define lockdep_hardirq_context() (raw_cpu_read(hardirq_context)) # define lockdep_softirq_context(p) ((p)->softirq_context) # define lockdep_hardirqs_enabled() (this_cpu_read(hardirqs_enabled)) @@ -69,6 +71,7 @@ do { \ } while (0) # define lockdep_hardirq_exit() \ do { \ + lockdep_validate_context_empty(); \ __this_cpu_dec(hardirq_context); \ } while (0) # define lockdep_softirq_enter() \ @@ -77,6 +80,7 @@ do { \ } while (0) # define lockdep_softirq_exit() \ do { \ + lockdep_validate_context_empty(); \ current->softirq_context--; \ } while (0) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 82db977eada8..09ac70d1b3a6 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -2697,6 +2697,37 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev, return 0; } +void lockdep_validate_context_empty(void) +{ + struct task_struct *curr = current; + int depth, ctx = task_irq_context(curr); + + if (debug_locks_silent) + return; + + depth = curr->lockdep_depth; + if (!depth) + return; + + if (curr->held_locks[depth-1].irq_context != ctx) + return; + + + pr_warn("\n"); + pr_warn("====================================\n"); + pr_warn("WARNING: Asymmetric locking detected\n"); + print_kernel_ident(); + pr_warn("------------------------------------\n"); + + pr_warn("%s/%d is leaving an IRQ context with extra locks on\n", + curr->comm, task_pid_nr(curr)); + + lockdep_printk_held_locks(curr); + + printk("\nstack backtrace:\n"); + dump_stack(); +} + #else static inline int check_irq_usage(struct task_struct *curr, ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-04-06 8:33 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <00000000000025a67605bf1dd4ab@google.com>
2021-04-04 10:24 ` Something is leaking RCU holds from interrupt context Matthew Wilcox
2021-04-04 16:48 ` Paul E. McKenney
2021-04-04 18:24 ` Matthew Wilcox
2021-04-04 21:36 ` Paul E. McKenney
2021-04-06 8:33 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).