Re: RCU idle CPU detection is broken in linux-next

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sasha Levin <levinsasha928@gmail.com>
Cc: Michael Wang <wangyun@linux.vnet.ibm.com>,
	Dave Jones <davej@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: RCU idle CPU detection is broken in linux-next
Date: Thu, 20 Sep 2012 08:23:41 -0700
Message-ID: <20120920152341.GE2449@linux.vnet.ibm.com>
In-Reply-To: <505AC979.7000008@gmail.com>

On Thu, Sep 20, 2012 at 09:44:57AM +0200, Sasha Levin wrote:
> On 09/20/2012 09:33 AM, Michael Wang wrote:
> > On 09/20/2012 01:06 AM, Paul E. McKenney wrote:
> >> On Wed, Sep 19, 2012 at 06:35:36PM +0200, Sasha Levin wrote:
> >>> On 09/19/2012 05:39 PM, Paul E. McKenney wrote:
> >>>> On Wed, Sep 12, 2012 at 07:56:48PM +0200, Sasha Levin wrote:
> >>>>>> Hi Paul,
> >>>>>>
> >>>>>> While fuzzing using trinity inside a KVM tools guest, I've managed to trigger
> >>>>>> "RCU used illegally from idle CPU!" warnings several times.
> >>>>>>
> >>>>>> There are a bunch of traces which seem to pop exactly at the same time and from
> >>>>>> different places around the kernel. Here are several of them:
> >>>> Hello, Sasha,
> >>>>
> >>>> OK, interesting.  Could you please try reproducing with the diagnostic
> >>>> patch shown below?
> >>>
> >>> Sure - here are the results (btw, it reproduces very easily):
> >>>
> >>> [ 13.525119] ================================================
> >>> [ 13.527165] [ BUG: lock held when returning to user space! ]
> >>> [ 13.528752] 3.6.0-rc6-next-20120918-sasha-00002-g190c311-dirty #362 Tainted: GW
> >>> [ 13.531314] ------------------------------------------------
> >>> [ 13.532918] init/1 is leaving the kernel with locks still held!
> >>> [ 13.534574] 1 lock held by init/1:
> >>> [ 13.535533] #0: (rcu_idle){.+.+..}, at: [<ffffffff811c36d0>] rcu_eqs_enter_common+0x1a0/0x9a0
> >>>
> >>> I'm basically seeing lots of the above, so I can't even get to the point where I
> >>> get the previous lockdep warnings.
> >>
> >> OK, that diagnostic patch was unhelpful.  Back to the drawing board...
> > 
> > Maybe we could first make sure that cpu_idle() behaves properly?
> > 
> > According to the log, RCU thinks the CPU is idle while the current pid
> > is not 0. That could happen if something is broken in cpu_idle(), which
> > is very platform-dependent.
> > 
> > So checking this when the idle thread is switched out could be the first
> > step? Something like the patch below.
> > 
> > Regards,
> > Michael Wang
> > 
> > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> > index b6baf37..f8c7354 100644
> > --- a/kernel/sched/idle_task.c
> > +++ b/kernel/sched/idle_task.c
> > @@ -43,6 +43,7 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
> >  
> >  static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
> >  {
> > +       WARN_ON(rcu_is_cpu_idle());
> >  }
> >  
> >  static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
> 
> Looks like you're on to something, with the small patch above applied:
> 
> [   23.514223] ------------[ cut here ]------------
> [   23.515496] WARNING: at kernel/sched/idle_task.c:46 put_prev_task_idle+0x1e/0x30()
> [   23.517498] Pid: 0, comm: swapper/0 Tainted: G        W 3.6.0-rc6-next-20120919-sasha-00001-gb54aafe-dirty #366
> [   23.520393] Call Trace:
> [   23.521882]  [<ffffffff8115167e>] ? put_prev_task_idle+0x1e/0x30
> [   23.524220]  [<ffffffff81106736>] warn_slowpath_common+0x86/0xb0
> [   23.524220]  [<ffffffff81106825>] warn_slowpath_null+0x15/0x20
> [   23.524220]  [<ffffffff8115167e>] put_prev_task_idle+0x1e/0x30
> [   23.524220]  [<ffffffff839ea61e>] __schedule+0x25e/0x8f0
> [   23.524220]  [<ffffffff81175ebd>] ? tick_nohz_idle_exit+0x18d/0x1c0
> [   23.524220]  [<ffffffff839ead05>] schedule+0x55/0x60
> [   23.524220]  [<ffffffff81078540>] cpu_idle+0x90/0x160
> [   23.524220]  [<ffffffff8383043c>] rest_init+0x130/0x144
> [   23.524220]  [<ffffffff8383030c>] ? csum_partial_copy_generic+0x16c/0x16c
> [   23.524220]  [<ffffffff858acc18>] start_kernel+0x38d/0x39a
> [   23.524220]  [<ffffffff858ac5fe>] ? repair_env_string+0x5e/0x5e
> [   23.524220]  [<ffffffff858ac326>] x86_64_start_reservations+0x101/0x105
> [   23.524220]  [<ffffffff858ac472>] x86_64_start_kernel+0x148/0x157
> [   23.524220] ---[ end trace 2c3061ab727afec2 ]---

It looks like someone is exiting the idle loop without telling RCU
about it.  Architectures are supposed to invoke rcu_idle_exit() before
they leave the idle loop.  This was in fact my guess yesterday, which is
why I tried to enlist lockdep's help, forgetting that lockdep complains
about holding a lock when entering the idle loop.

A couple of possible things to try:

1.	Inspect the idle loop to see if it can invoke schedule() without
	invoking rcu_idle_exit().  This might happen indirectly, for
	example, by calling mutex_lock().

2.	Bisect to see what caused the warning to appear -- perhaps when
	someone put a mutex_lock() or some such into the idle loop
	without protecting it with rcu_idle_exit() or RCU_NONIDLE().
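
To make the failure mode in both items concrete, here is a minimal
userspace sketch in C.  It is a toy model under stated assumptions, not
kernel code: every model_* function and the MODEL_RCU_NONIDLE() macro
are hypothetical stand-ins named after rcu_idle_enter(), rcu_idle_exit(),
schedule() and RCU_NONIDLE(), and the check in model_schedule() only
imitates the WARN_ON(rcu_is_cpu_idle()) from the patch quoted above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model; none of these are the kernel's real functions. */
static bool model_rcu_idle;  /* true while the model "RCU" thinks the CPU is idle */
static int model_warnings;   /* number of times the check below has fired */

static void model_rcu_idle_enter(void) { model_rcu_idle = true; }
static void model_rcu_idle_exit(void)  { model_rcu_idle = false; }

/*
 * Stand-in for schedule(): switching away from the idle task while RCU
 * still believes the CPU is idle is the condition being hunted.
 */
static void model_schedule(void)
{
	if (model_rcu_idle) {
		model_warnings++;
		fprintf(stderr, "model: schedule() while RCU thinks CPU is idle\n");
	}
}

/* Stand-in for a call such as mutex_lock() that may sleep. */
static void model_blocking_call(void)
{
	model_schedule();
}

/* Hypothetical wrapper: leave idle mode around stmt, then re-enter it. */
#define MODEL_RCU_NONIDLE(stmt) \
	do { \
		model_rcu_idle_exit(); \
		do { stmt; } while (0); \
		model_rcu_idle_enter(); \
	} while (0)

/* One idle-loop pass with an unprotected blocking call; returns warning count. */
static int model_idle_pass_buggy(void)
{
	model_warnings = 0;
	model_rcu_idle_enter();
	model_blocking_call();		/* reaches model_schedule() while "idle" */
	model_rcu_idle_exit();
	model_schedule();		/* the normal exit from the idle loop */
	return model_warnings;
}

/* The same pass with the blocking call wrapped; returns warning count. */
static int model_idle_pass_fixed(void)
{
	model_warnings = 0;
	model_rcu_idle_enter();
	MODEL_RCU_NONIDLE(model_blocking_call());
	model_rcu_idle_exit();
	model_schedule();
	return model_warnings;
}
```

In this model, model_idle_pass_buggy() returns 1 and
model_idle_pass_fixed() returns 0.  The real idle loop is
architecture-specific, so the sketch only shows the pairing rule to
look for during inspection or bisection.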

Seem reasonable?

							Thanx, Paul
