From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	rcu@vger.kernel.org
Subject: Re: [PATCH] rcu/nocb: Fix misordered rcu_barrier() while (de-)offloading
Date: Mon, 18 Oct 2021 19:42:42 +0200
Message-ID: <20211018174242.GA450204@lothringen>
In-Reply-To: <20211018161814.GS880162@paulmck-ThinkPad-P17-Gen-1>

On Mon, Oct 18, 2021 at 09:18:14AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 18, 2021 at 01:32:59PM +0200, Frederic Weisbecker wrote:
> > When an rdp is in the process of (de-)offloading, rcu_core() and the
> > nocb kthreads can process callbacks at the same time. This allows many
> > scenarios in which an rcu barrier callback executes before
> > the preceding callbacks. Here is one such example:
> > 
> >             CPU 0                                  CPU 1
> >        --------------                         ---------------
> >      call_rcu(callbacks1)
> >      call_rcu(callbacks2)
> >      // move callbacks1 and callbacks2 on the done list
> >      rcu_advance_callbacks()
> >      call_rcu(callbacks3)
> >      rcu_barrier_func()
> >          rcu_segcblist_entrain(...)
> >                                             nocb_cb_wait()
> >                                                 rcu_do_batch()
> >                                                     callbacks1()
> >                                                     cond_resched_tasks_rcu_qs()
> >      // move callbacks3 and rcu_barrier_callback()
> >      // on the done list
> >      rcu_advance_callbacks()
> >      rcu_core()
> >          rcu_do_batch()
> >              callbacks3()
> >              rcu_barrier_callback()
> >                                                     //MISORDERING
> >                                                     callbacks2()
> > 
> > Fix this by preventing two concurrent rcu_do_batch() calls on the same
> > rdp as long as an rcu barrier callback is pending somewhere.
> > 
> > Reported-by: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Cc: Uladzislau Rezki <urezki@gmail.com>
> 
> Yow!
> 
> But how does the (de-)offloading procedure's acquisition of
> rcu_state.barrier_mutex play into this?  In theory, that mutex was
> supposed to prevent these sorts of scenarios.  In practice, it sounds
> like the shortcomings in this theory should be fully explained so that
> we don't get similar bugs in the future.  ;-)

I think you're right. The real issue is something I wanted to
fix next: RCU_SEGCBLIST_RCU_CORE isn't cleared when nocb is enabled on
boot, so rcu_core() always runs concurrently with the nocb kthreads in
TREE04, without holding the rcu_barrier mutex of course (I mean with the
latest patchset).

OK, forget this patch. I'm testing again, simply clearing
RCU_SEGCBLIST_RCU_CORE on boot.
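
To make the reasoning concrete, here is a minimal userspace sketch that
replays the interleaving from the quoted diagram. It is not kernel code:
every sketch_* name and SKETCH_FLAG_RCU_CORE are hypothetical stand-ins,
and the flag only models the assumption that rcu_core() invokes callbacks
while it is set. Each "batch" is a private copy of the ready callbacks,
which is what allows two concurrent batches to reorder them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_CBS 8
/* Hypothetical stand-in for the "rcu_core() may invoke callbacks" flag. */
#define SKETCH_FLAG_RCU_CORE 0x1u

/* A FIFO of callback names; also used as the invocation log. */
struct sketch_list {
	const char *cbs[SKETCH_MAX_CBS];
	size_t n;	/* number of queued entries */
	size_t next;	/* next entry to invoke */
};

static void sketch_queue(struct sketch_list *l, const char *cb)
{
	assert(l->n < SKETCH_MAX_CBS);
	l->cbs[l->n++] = cb;
}

/* Move every ready callback into a private batch, emptying the source. */
static void sketch_extract(struct sketch_list *done, struct sketch_list *batch)
{
	*batch = *done;
	batch->next = 0;
	done->n = done->next = 0;
}

/* "Invoke" up to limit callbacks from a batch by appending them to the log. */
static void sketch_run(struct sketch_list *batch, size_t limit,
		       struct sketch_list *log)
{
	for (; limit > 0 && batch->next < batch->n; limit--)
		sketch_queue(log, batch->cbs[batch->next++]);
}

/* Position of cb in the log, or log->n if it never ran. */
static size_t sketch_index(const struct sketch_list *log, const char *cb)
{
	size_t i;

	for (i = 0; i < log->n; i++)
		if (strcmp(log->cbs[i], cb) == 0)
			break;
	return i;
}

/*
 * Replay the quoted scenario. Returns true when the barrier callback
 * ran after callbacks2, i.e. when the ordering is preserved.
 */
bool sketch_replay(unsigned int flags, struct sketch_list *log)
{
	struct sketch_list done = { .n = 0 }, kthread;

	log->n = log->next = 0;

	/* callbacks1 and callbacks2 reach the done list. */
	sketch_queue(&done, "callbacks1");
	sketch_queue(&done, "callbacks2");

	/* The nocb kthread takes both, runs callbacks1, then yields. */
	sketch_extract(&done, &kthread);
	sketch_run(&kthread, 1, log);

	/* callbacks3 and the barrier callback reach the done list. */
	sketch_queue(&done, "callbacks3");
	sketch_queue(&done, "barrier");

	/* rcu_core() only runs its own batch while the flag is set. */
	if (flags & SKETCH_FLAG_RCU_CORE) {
		struct sketch_list core;

		sketch_extract(&done, &core);
		sketch_run(&core, SKETCH_MAX_CBS, log);
	}

	/* The kthread resumes its batch, then picks up whatever is left. */
	sketch_run(&kthread, SKETCH_MAX_CBS, log);
	sketch_extract(&done, &kthread);
	sketch_run(&kthread, SKETCH_MAX_CBS, log);

	return sketch_index(log, "barrier") > sketch_index(log, "callbacks2");
}
```

Calling sketch_replay(SKETCH_FLAG_RCU_CORE, &log) reproduces the
misordering from the diagram, with the barrier entry logged ahead of
callbacks2. Calling sketch_replay(0, &log) leaves a single consumer, so
the callbacks are logged in queueing order.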

Thanks.

Thread overview: 6+ messages
2021-10-18 11:32 [PATCH] rcu/nocb: Fix misordered rcu_barrier() while (de-)offloading Frederic Weisbecker
2021-10-18 16:18 ` Paul E. McKenney
2021-10-18 17:42   ` Frederic Weisbecker [this message]
2021-10-18 18:36     ` Paul E. McKenney
2021-10-18 21:50       ` Frederic Weisbecker
2021-10-18 22:34         ` Paul E. McKenney