Re: rcu: WARNING in rcu_seq_end

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: josh@joshtriplett.org, Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	jiangshanlai@gmail.com, LKML <linux-kernel@vger.kernel.org>,
	syzkaller <syzkaller@googlegroups.com>
Subject: Re: rcu: WARNING in rcu_seq_end
Date: Mon, 6 Mar 2017 15:08:00 -0800	[thread overview]
Message-ID: <20170306230800.GK30506@linux.vnet.ibm.com> (raw)
In-Reply-To: <CACT4Y+YuJCK5UESKyhgJ6Ez26pxGvSAZjVs-C8nEM1HKpgUWgw@mail.gmail.com>

On Mon, Mar 06, 2017 at 11:11:23AM +0100, Dmitry Vyukov wrote:
> On Mon, Mar 6, 2017 at 11:07 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Mon, Mar 06, 2017 at 10:24:24AM +0100, Dmitry Vyukov wrote:
> >> On Sun, Mar 5, 2017 at 7:47 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Sun, Mar 05, 2017 at 11:50:39AM +0100, Dmitry Vyukov wrote:
> >> >> On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney
> >> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote:
> >> >> >> Hello,
> >> >> >>
> >> >> >> Paul, you wanted bugs in rcu.
> >> >> >
> >> >> > Well, whether I want them or not, I must deal with them.  ;-)
> >> >> >
> >> >> >> I've got this WARNING while running syzkaller fuzzer on
> >> >> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760:
> >> >> >>
> >> >> >> ------------[ cut here ]------------
> >> >> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
> >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
> >> >> >> Kernel panic - not syncing: panic_on_warn set ...
> >> >> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
> >> >> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >> >> >> Workqueue: events wait_rcu_exp_gp
> >> >> >> Call Trace:
> >> >> >>  __dump_stack lib/dump_stack.c:15 [inline]
> >> >> >>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> >> >> >>  panic+0x1fb/0x412 kernel/panic.c:179
> >> >> >>  __warn+0x1c4/0x1e0 kernel/panic.c:540
> >> >> >>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
> >> >> >>  rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
> >> >> >>  rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
> >> >> >>  rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
> >> >> >>  rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
> >> >> >>  wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
> >> >> >>  process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
> >> >> >>  worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
> >> >> >>  kthread+0x326/0x3f0 kernel/kthread.c:227
> >> >> >>  ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
> >> >> >> Dumping ftrace buffer:
> >> >> >>    (ftrace buffer empty)
> >> >> >> Kernel Offset: disabled
> >> >> >> Rebooting in 86400 seconds..
> >> >> >>
> >> >> >>
> >> >> >> Not reproducible. But looking at the code, shouldn't it be:
> >> >> >>
> >> >> >>  static void rcu_seq_end(unsigned long *sp)
> >> >> >>  {
> >> >> >>         smp_mb(); /* Ensure update-side operation before counter increment. */
> >> >> >> +       WARN_ON_ONCE(!(*sp & 0x1));
> >> >> >>         WRITE_ONCE(*sp, *sp + 1);
> >> >> >> -       WARN_ON_ONCE(*sp & 0x1);
> >> >> >>  }
> >> >> >>
> >> >> >> ?
> >> >> >>
> >> >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon
> >> >> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this
> >> >> >> consequently can allow start of next grace periods. Which in turn can
> >> >> >> make the warning fire. Am I missing something?
> >> >> >>
> >> >> >> I don't see any other bad consequences of this. The rest of
> >> >> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has
> >> >> >> returned and destroyed work on stack and next period has started and
> >> >> >> ended, but it seems OK.
> >> >> >
> >> >> > I believe that this is a heygood change, but I don't see how it will
> >> >> > help in this case.  BTW, may I have your Signed-off-by?
> >> >> >
> >> >> > The reason I don't believe that it will help is that the
> >> >> > rcu_exp_gp_seq_end() function is called from a workqueue handler that
> >> >> > is invoked holding ->exp_mutex, and this mutex is not released until
> >> >> > after the handler invokes rcu_seq_end() and then wakes up the task that
> >> >> > scheduled the workqueue handler.  So the ordering above should not matter
> >> >> > (but I agree that your ordering is cleaner.
> >> >> >
> >> >> > That said, it looks like I am missing some memory barriers, please
> >> >> > see the following patch.
> >> >> >
> >> >> > But what architecture did you see this on?
> >> >>
> >> >>
> >> >> This is just x86.
> >> >>
> >> >> You seem to assume that wait_event() waits for the wakeup. It does not
> >> >> work this way. It can return as soon as the condition becomes true
> >> >> without ever waiting:
> >> >>
> >> >> 305 #define wait_event(wq, condition)                                       \
> >> >> 306 do {                                                                    \
> >> >> 307         might_sleep();                                                  \
> >> >> 308         if (condition)                                                  \
> >> >> 309                 break;                                                  \
> >> >> 310         __wait_event(wq, condition);                                    \
> >> >> 311 } while (0)
> >> >
> >> > Agreed, hence my patch in the previous email.  I guess I knew that, but
> >>
> >> Ah, you meant to synchronize rcu_seq_end with rcu_seq_done?
> >
> > No, there is a mutex release and acquisition that do the synchronization,
> > but only if the wakeup has appropriate barriers.  The issue is that
> > part of the mutex's critical section executes in a workqueue, possibly
> > on some other CPU.
> 
> What is that mutex? And what locks/unlocks provide synchronization? I
> see that one uses exp_mutex and another -- exp_wake_mutex.

Both of them.

->exp_mutex is acquired by the task requesting the grace period, and
the counter's first increment is done by that task under that mutex.
This task then schedules a workqueue, which drives forward the grace
period.  Upon grace-period completion, the workqueue handler does the
second increment (the one that your patch addressed).  The workqueue
handler then acquires ->exp_wake_mutex and wakes the task that holds
->exp_mutex (along with all other tasks waiting for this grace period),
and that task releases ->exp_mutex, which allows the next grace period to
start (and the first increment for that next grace period to be carried
out under that lock).  The workqueue handler releases ->exp_wake_mutex
after finishing its wakeups.

Does that make sense?

							Thanx, Paul

next prev parent reply	other threads:[~2017-03-07  0:13 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-04 16:01 rcu: WARNING in rcu_seq_end Dmitry Vyukov
2017-03-04 20:40 ` Paul E. McKenney
2017-03-05 10:50   ` Dmitry Vyukov
2017-03-05 18:47     ` Paul E. McKenney
2017-03-06  9:24       ` Dmitry Vyukov
2017-03-06 10:07         ` Paul E. McKenney
2017-03-06 10:11           ` Dmitry Vyukov
2017-03-06 23:08             ` Paul E. McKenney [this message]
2017-03-07  7:05               ` Dmitry Vyukov
2017-03-07 14:27                 ` Boqun Feng
2017-03-07 14:43                   ` Dmitry Vyukov
2017-03-07 15:27                     ` Paul E. McKenney
2017-03-07 18:37                       ` Dmitry Vyukov
2017-03-07 19:09                         ` Paul E. McKenney
2017-03-07 23:05                       ` Boqun Feng
2017-03-07 23:31                         ` Paul E. McKenney
2017-03-08  1:39                           ` Boqun Feng
2017-03-08  2:26                             ` Paul E. McKenney
2017-03-08  2:44                               ` Boqun Feng
2017-03-08  3:08                                 ` Paul E. McKenney
2017-03-07 15:16                 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170306230800.GK30506@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=dvyukov@google.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=rostedt@goodmis.org \
    --cc=syzkaller@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.