public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Boqun Feng <boqun.feng@gmail.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	syzbot <syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com>,
	john.stultz@linaro.org, linux-kernel@vger.kernel.org,
	sboyd@kernel.org, syzkaller-bugs@googlegroups.com,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
Date: Tue, 6 Apr 2021 07:42:06 +0800	[thread overview]
Message-ID: <YGugTjEWCCyul/iv@boqun-archlinux> (raw)
In-Reply-To: <20210405233807.GO2696@paulmck-ThinkPad-P72>

On Mon, Apr 05, 2021 at 04:38:07PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 06, 2021 at 07:25:44AM +0800, Boqun Feng wrote:
> > On Mon, Apr 05, 2021 at 10:27:52AM -0700, Paul E. McKenney wrote:
> > > On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote:
> > > > On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote:
> > > > > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > > > > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> > > > > > > > > 
> > > > > > > > > Cc + ...
> > > > > > > > 
> > > > > > > > And a couple more...
> > > > > > > > 
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > syzbot found the following issue on:
> > > > > > > > > >
> > > > > > > > > > HEAD commit:    5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > > > > > > > > git tree:       upstream
> > > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > > > > > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > > > > > > > > >
> > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > > > > > >
> > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > > > > > > > > >
> > > > > > > > > > =============================
> > > > > > > > > > WARNING: suspicious RCU usage
> > > > > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > > > > > > > > -----------------------------
> > > > > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > > > > > > > > >
> > > > > > > > > > other info that might help us debug this:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0
> > > > > > > > > > 3 locks held by syz-executor.4/8418:
> > > > > > > > > >  #0: 
> > > > > > > > > > ffff8880751d2b28
> > > > > > > > > >  (
> > > > > > > > > > &p->pi_lock
> > > > > > > > > > ){-.-.}-{2:2}
> > > > > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > > > > > > > >  #1: 
> > > > > > > > > > ffff8880b9d35258
> > > > > > > > > >  (
> > > > > > > > > > &rq->lock
> > > > > > > > > > ){-.-.}-{2:2}
> > > > > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > > > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > > > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > > > > > > > >  #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
> > > > > > > > 
> > > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> > > > > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to
> > > > > > > > lockdep_print_held_locks() does not.  Did something change recently that
> > > > > > > > could let the ->lockdep_depth counter get out of sync with the actual
> > > > > > > > number of locks held?
> > > > > > > 
> > > > > > > Dmitri had a different theory here:
> > > > > > > 
> > > > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
> > > > > > 
> > > > > > There is always room for more than one bug.  ;-)
> > > > > > 
> > > > > > He says "one-off false positives".  I was afraid of that...
> > > > > 
> > > > > And both the examples I have been copied on today are consistent with
> > > > > debug_locks getting zeroed (e.g., via a call to __debug_locks_off())
> > > > > in the midst of a call to rcu_sleep_check().  But I would expect to see
> > > > > a panic or another splat if that were to happen.
> > > > > 
> > > > > Dmitry's example did have an additional splat, but I would expect the
> > > > > RCU-related one to come second.  Again, there is always room for more
> > > > > than one bug.
> > > > > 
> > > > > On the other hand, there are a lot more callers to debug_locks_off()
> > > > > than there were last I looked into this.  And both of these splats
> > > > > are consistent with an interrupt in the middle of rcu_sleep_check(),
> > > > > and that interrupt's handler invoking debug_locks_off(), but without
> > > > > printing anything to the console.  Does that sequence of events ring a
> > > > > bell for anyone?
> > > > > 
> > > > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck
> > > > > debug_lockdep_rcu_enabled() after evaluating the condition, but with
> > > > > a memory barrier immediately before the recheck.  But I am not at all
> > > > > excited by doing this on speculation.  Especially given that doing
> > > > > so might be covering up some other bug.
> > > > > 
> > > > 
> > > > Just check the original console log and find:
> > > > 
> > > > [  356.696686][ T8418] =============================
> > > > [  356.696692][ T8418] WARNING: suspicious RCU usage
> > > > [  356.700193][T14782] ====================================
> > > > [  356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted
> > > > [  356.729981][ T8418] -----------------------------
> > > > [  356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held!
> > > > 
> > > > , so there are two warnnings here, one is from lockdep_rcu_suspisous()
> > > > and the other is from print_held_locks_bug(). I think this is what
> > > > happened:
> > > > 
> > > > in RCU_LOCKDEP_WARN():
> > > > 
> > > > 	if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1
> > > > 	<interrupted>
> > > > 	// lockdep detects a lock bug, set debug_locks = 0
> > > > 	<swicth back>
> > > > 	    && !__warned // true
> > > > 	    && (c))      // "c" is a lock_is_held(), which will always returns true if debug_locks == 0!
> > > > 
> > > > the cause of the problem is that RCU_LOCKDEP_WARN() in fact read
> > > > debug_locks twice and get different values.
> > > > 
> > > > But if you change the ordering of two reads, probably can avoid the
> > > > problem:
> > > > 	
> > > > First read:
> > > > 	lock_is_held(); // true if 1) lock is really held or 2) lockdep is off
> > > > 
> > > > Second read:
> > > > 	debug_lockdep_rcu_enabled(); // if lockdep is not off, we know
> > > > 				     // that the first read got correct
> > > > 				     // value, otherwise we just ignore
> > > > 				     // the first read, because either
> > > > 				     // there is a bug reported between
> > > > 				     // two reads, or lockdep is already
> > > > 				     // off when the first read happens.
> > > > 
> > > > So maybe something below:
> > > > 
> > > > Regards,
> > > > Boqun
> > > > 
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index bd04f722714f..d11bee5d9347 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> > > >  #define RCU_LOCKDEP_WARN(c, s)						\
> > > >  	do {								\
> > > >  		static bool __section(".data.unlikely") __warned;	\
> > > > -		if (debug_lockdep_rcu_enabled() && !__warned && (c)) {	\
> > > > +		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
> > > >  			__warned = true;				\
> > > >  			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
> > > >  		}							\
> > > 
> > > Good point -- if we check debug_lockdep_rcu_enabled() after the condition,
> > > then we will reject false positives in cases where debug_locks was switched
> > > to zero out from under us.
> > > 
> > > However, we do need ordering.  The "c" usually contains lock_is_held(),
> > > which also checks debug_locks, but from some other translation unit.
> > > Back in the day, the translation-unit boundaries would provide the needed
> > > ordering, but LTO...
> > > 
> > > In addition, the "debug_locks = 0" was originally supposed to be a hint
> > > that the report might be a false positive.  It is clear that this needs
> > > to be made explicit.
> > > 
> > > Taking all this together, how about the following?  (The intent is
> > > that the changes to lockdep_rcu_suspicious() will be in a separate
> > > commit.)
> > 
> > Looks good to me ;-)
> 
> Whew!  May I add your Reviewed-by?
> 

Of course ;-)

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun

> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 9455476..1199ffd 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> > >  #define RCU_LOCKDEP_WARN(c, s)						\
> > >  	do {								\
> > >  		static bool __section(".data.unlikely") __warned;	\
> > > -		if (debug_lockdep_rcu_enabled() && !__warned && (c)) {	\
> > > +		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
> > >  			__warned = true;				\
> > >  			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
> > >  		}							\
> > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > > index c6d0c1dc..80065cd 100644
> > > --- a/kernel/locking/lockdep.c
> > > +++ b/kernel/locking/lockdep.c
> > > @@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void)
> > >  void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> > >  {
> > >  	struct task_struct *curr = current;
> > > +	int dl = READ_ONCE(debug_locks);
> > >  
> > >  	/* Note: the following can be executed concurrently, so be careful. */
> > >  	pr_warn("\n");
> > > @@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> > >  	pr_warn("-----------------------------\n");
> > >  	pr_warn("%s:%d %s!\n", file, line, s);
> > >  	pr_warn("\nother info that might help us debug this:\n\n");
> > > -	pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
> > > +	pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s",
> > >  	       !rcu_lockdep_current_cpu_online()
> > >  			? "RCU used illegally from offline CPU!\n"
> > >  			: "",
> > > -	       rcu_scheduler_active, debug_locks);
> > > +	       rcu_scheduler_active, dl,
> > > +	       dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n");
> > >  
> > >  	/*
> > >  	 * If a CPU is in the RCU-free window in idle (ie: in the section
> > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > index b95ae86..dd94a60 100644
> > > --- a/kernel/rcu/update.c
> > > +++ b/kernel/rcu/update.c
> > > @@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map);
> > >  
> > >  noinstr int notrace debug_lockdep_rcu_enabled(void)
> > >  {
> > > -	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
> > > +	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) &&
> > >  	       current->lockdep_recursion == 0;
> > >  }
> > >  EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);

      reply	other threads:[~2021-04-05 23:43 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-04 19:05 [syzbot] WARNING: suspicious RCU usage in get_timespec64 syzbot
2021-04-04 20:38 ` Thomas Gleixner
2021-04-04 21:40   ` Paul E. McKenney
2021-04-05  3:08     ` Matthew Wilcox
2021-04-05  4:01       ` Paul E. McKenney
2021-04-05  4:30         ` Paul E. McKenney
2021-04-05  5:23           ` Boqun Feng
2021-04-05 17:27             ` Paul E. McKenney
2021-04-05 23:25               ` Boqun Feng
2021-04-05 23:38                 ` Paul E. McKenney
2021-04-05 23:42                   ` Boqun Feng [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YGugTjEWCCyul/iv@boqun-archlinux \
    --to=boqun.feng@gmail.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=sboyd@kernel.org \
    --cc=syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox