From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	rcu@vger.kernel.org
Subject: Re: [PATCH] rcu/nocb: Fix misordered rcu_barrier() while (de-)offloading
Date: Mon, 18 Oct 2021 19:42:42 +0200
Message-ID: <20211018174242.GA450204@lothringen>
In-Reply-To: <20211018161814.GS880162@paulmck-ThinkPad-P17-Gen-1>

On Mon, Oct 18, 2021 at 09:18:14AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 18, 2021 at 01:32:59PM +0200, Frederic Weisbecker wrote:
> > When an rdp is in the process of (de-)offloading, rcu_core() and the
> > nocb kthreads can process callbacks at the same time. This leaves many
> > possible scenarios in which an rcu barrier callback executes before
> > the preceding callbacks. Here is one such example:
> > 
> >             CPU 0                                  CPU 1
> >        --------------                         ---------------
> >      call_rcu(callbacks1)
> >      call_rcu(callbacks2)
> >      // move callbacks1 and callbacks2 on the done list
> >      rcu_advance_callbacks()
> >      call_rcu(callbacks3)
> >      rcu_barrier_func()
> >          rcu_segcblist_entrain(...)
> >                                             nocb_cb_wait()
> >                                                 rcu_do_batch()
> >                                                     callbacks1()
> >                                                     cond_resched_tasks_rcu_qs()
> >      // move callbacks3 and rcu_barrier_callback()
> >      // on the done list
> >      rcu_advance_callbacks()
> >      rcu_core()
> >          rcu_do_batch()
> >              callbacks3()
> >              rcu_barrier_callback()
> >                                                     //MISORDERING
> >                                                     callbacks2()
> > 
> > Fix this by preventing two concurrent rcu_do_batch() calls on the same
> > rdp as long as an rcu barrier callback is pending somewhere.
> > 
> > Reported-by: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Cc: Uladzislau Rezki <urezki@gmail.com>
> 
> Yow!
> 
> But how does the (de-)offloading procedure's acquisition of
> rcu_state.barrier_mutex play into this?  In theory, that mutex was
> supposed to prevent these sorts of scenarios.  In practice, it sounds
> like the shortcomings in this theory should be fully explained so that
> we don't get similar bugs in the future.  ;-)

I think you're right. The real issue is something I wanted to
fix next: RCU_SEGCBLIST_RCU_CORE isn't cleared when nocb is enabled on
boot, so rcu_core() always runs concurrently with the nocb kthreads in
TREE04, without holding rcu_state.barrier_mutex of course (I mean with
the latest patchset).

OK, forget this patch. I'm testing again with RCU_SEGCBLIST_RCU_CORE
simply cleared on boot.
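
As a rough illustration of that idea only, here is a user-space model,
not the kernel code. The struct, the helper names, the extra
RCU_SEGCBLIST_OFFLOADED flag and all bit values are hypothetical
placeholders; only the RCU_SEGCBLIST_RCU_CORE name comes from the
discussion above.

```c
#include <stdbool.h>

/* Placeholder bit values, assumed for this sketch only. */
#define RCU_SEGCBLIST_RCU_CORE	(1u << 0)	/* rcu_core() may process callbacks */
#define RCU_SEGCBLIST_OFFLOADED	(1u << 1)	/* nocb kthreads process callbacks */

/* Hypothetical stand-in for the per-rdp callback list state. */
struct cblist_model {
	unsigned int flags;
};

/*
 * Hypothetical boot-time init. A CPU that is offloaded from boot gets
 * the rcu_core() bit cleared, so that only the nocb kthreads run its
 * callbacks and the two paths never invoke callbacks concurrently.
 */
static void cblist_model_boot_init(struct cblist_model *cl, bool nocb_on_boot)
{
	cl->flags = RCU_SEGCBLIST_RCU_CORE;
	if (nocb_on_boot) {
		cl->flags |= RCU_SEGCBLIST_OFFLOADED;
		cl->flags &= ~RCU_SEGCBLIST_RCU_CORE;
	}
}

/* In this model, rcu_core() would check this before invoking callbacks. */
static bool rcu_core_may_do_batch(const struct cblist_model *cl)
{
	return (cl->flags & RCU_SEGCBLIST_RCU_CORE) != 0;
}
```

In this model, a CPU initialized with nocb_on_boot set never lets
rcu_core() invoke callbacks, which is the property the boot-time clear
is meant to restore.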

Thanks.

