linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: josh@joshtriplett.org, Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	jiangshanlai@gmail.com, LKML <linux-kernel@vger.kernel.org>,
	syzkaller <syzkaller@googlegroups.com>
Subject: Re: rcu: WARNING in rcu_seq_end
Date: Mon, 6 Mar 2017 02:07:41 -0800	[thread overview]
Message-ID: <20170306100741.GJ30506@linux.vnet.ibm.com> (raw)
In-Reply-To: <CACT4Y+afRLtjoWiyh_V-gBPD-EL8U4x7vrBrOUSmH1wrWrkVnA@mail.gmail.com>

On Mon, Mar 06, 2017 at 10:24:24AM +0100, Dmitry Vyukov wrote:
> On Sun, Mar 5, 2017 at 7:47 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Sun, Mar 05, 2017 at 11:50:39AM +0100, Dmitry Vyukov wrote:
> >> On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote:
> >> >> Hello,
> >> >>
> >> >> Paul, you wanted bugs in rcu.
> >> >
> >> > Well, whether I want them or not, I must deal with them.  ;-)
> >> >
> >> >> I've got this WARNING while running syzkaller fuzzer on
> >> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760:
> >> >>
> >> >> ------------[ cut here ]------------
> >> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
> >> >> Kernel panic - not syncing: panic_on_warn set ...
> >> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
> >> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >> >> Workqueue: events wait_rcu_exp_gp
> >> >> Call Trace:
> >> >>  __dump_stack lib/dump_stack.c:15 [inline]
> >> >>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> >> >>  panic+0x1fb/0x412 kernel/panic.c:179
> >> >>  __warn+0x1c4/0x1e0 kernel/panic.c:540
> >> >>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
> >> >>  rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
> >> >>  rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
> >> >>  rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
> >> >>  rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
> >> >>  wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
> >> >>  process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
> >> >>  worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
> >> >>  kthread+0x326/0x3f0 kernel/kthread.c:227
> >> >>  ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
> >> >> Dumping ftrace buffer:
> >> >>    (ftrace buffer empty)
> >> >> Kernel Offset: disabled
> >> >> Rebooting in 86400 seconds..
> >> >>
> >> >>
> >> >> Not reproducible. But looking at the code, shouldn't it be:
> >> >>
> >> >>  static void rcu_seq_end(unsigned long *sp)
> >> >>  {
> >> >>         smp_mb(); /* Ensure update-side operation before counter increment. */
> >> >> +       WARN_ON_ONCE(!(*sp & 0x1));
> >> >>         WRITE_ONCE(*sp, *sp + 1);
> >> >> -       WARN_ON_ONCE(*sp & 0x1);
> >> >>  }
> >> >>
> >> >> ?
> >> >>
> >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon
> >> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this
> >> >> consequently can allow start of next grace periods. Which in turn can
> >> >> make the warning fire. Am I missing something?
> >> >>
> >> >> I don't see any other bad consequences of this. The rest of
> >> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has
> >> >> returned and destroyed work on stack and next period has started and
> >> >> ended, but it seems OK.
> >> >
> >> > I believe that this is a heygood change, but I don't see how it will
> >> > help in this case.  BTW, may I have your Signed-off-by?
> >> >
> >> > The reason I don't believe that it will help is that the
> >> > rcu_exp_gp_seq_end() function is called from a workqueue handler that
> >> > is invoked holding ->exp_mutex, and this mutex is not released until
> >> > after the handler invokes rcu_seq_end() and then wakes up the task that
> >> > scheduled the workqueue handler.  So the ordering above should not matter
> >> > (but I agree that your ordering is cleaner.
> >> >
> >> > That said, it looks like I am missing some memory barriers, please
> >> > see the following patch.
> >> >
> >> > But what architecture did you see this on?
> >>
> >>
> >> This is just x86.
> >>
> >> You seem to assume that wait_event() waits for the wakeup. It does not
> >> work this way. It can return as soon as the condition becomes true
> >> without ever waiting:
> >>
> >> 305 #define wait_event(wq, condition)                                       \
> >> 306 do {                                                                    \
> >> 307         might_sleep();                                                  \
> >> 308         if (condition)                                                  \
> >> 309                 break;                                                  \
> >> 310         __wait_event(wq, condition);                                    \
> >> 311 } while (0)
> >
> > Agreed, hence my patch in the previous email.  I guess I knew that, but
> 
> Ah, you meant to synchronize rcu_seq_end with rcu_seq_done?

No, there is a mutex release and acquisition that do the synchronization,
but only if the wakeup has appropriate barriers.  The issue is that
part of the mutex's critical section executes in a workqueue, possibly
on some other CPU.

							Thanx, Paul

> I think you placed the barrier incorrectly for that. rcu_exp_wait_wake
> is already too late. The write that unblocks waiter is in rcu_seq_end
> so you need a release barrier _before_ that write.
> Also can we please start using smp_load_acquire/smp_store_release
> where they are what doctor said. They are faster, more readable,
> better for race detectors _and_ would prevent you from introducing
> this bug, because you would need to find the exact write that
> signifies completion. I.e.:
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d80c2587bed8..aa7ba83f6a56 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp)
>  static void rcu_seq_end(unsigned long *sp)
>  {
>         smp_mb(); /* Ensure update-side operation before counter increment. */
> -       WRITE_ONCE(*sp, *sp + 1);
> +       smp_store_release(sp, *sp + 1);
>         WARN_ON_ONCE(*sp & 0x1);
>  }
> 
> @@ -3554,7 +3554,7 @@ static unsigned long rcu_seq_snap(unsigned long *sp)
>   */
>  static bool rcu_seq_done(unsigned long *sp, unsigned long s)
>  {
> -       return ULONG_CMP_GE(READ_ONCE(*sp), s);
> +       return ULONG_CMP_GE(smp_load_acquire(sp), s);
>  }
> 
> 
> 
> > on the day I wrote that code, my fingers didn't.  Or somew similar lame
> > excuse.  ;-)
> >
> >> Mailed a signed patch:
> >> https://groups.google.com/d/msg/syzkaller/XzUXuAzKkCw/5054wU9MEAAJ
> >
> > This is the patch you also sent by email, that moves the WARN_ON_ONCE(),
> > thank you!
> 

  reply	other threads:[~2017-03-06 10:08 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-04 16:01 rcu: WARNING in rcu_seq_end Dmitry Vyukov
2017-03-04 20:40 ` Paul E. McKenney
2017-03-05 10:50   ` Dmitry Vyukov
2017-03-05 18:47     ` Paul E. McKenney
2017-03-06  9:24       ` Dmitry Vyukov
2017-03-06 10:07         ` Paul E. McKenney [this message]
2017-03-06 10:11           ` Dmitry Vyukov
2017-03-06 23:08             ` Paul E. McKenney
2017-03-07  7:05               ` Dmitry Vyukov
2017-03-07 14:27                 ` Boqun Feng
2017-03-07 14:43                   ` Dmitry Vyukov
2017-03-07 15:27                     ` Paul E. McKenney
2017-03-07 18:37                       ` Dmitry Vyukov
2017-03-07 19:09                         ` Paul E. McKenney
2017-03-07 23:05                       ` Boqun Feng
2017-03-07 23:31                         ` Paul E. McKenney
2017-03-08  1:39                           ` Boqun Feng
2017-03-08  2:26                             ` Paul E. McKenney
2017-03-08  2:44                               ` Boqun Feng
2017-03-08  3:08                                 ` Paul E. McKenney
2017-03-07 15:16                 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170306100741.GJ30506@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=dvyukov@google.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=rostedt@goodmis.org \
    --cc=syzkaller@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).