From mboxrd@z Thu Jan 1 00:00:00 1970
From: paulmck@linux.ibm.com (Paul E. McKenney)
Date: Wed, 13 Feb 2019 07:24:13 -0800
Subject: v5.0-rc2 and NVMeOF
In-Reply-To: <20190213151917.GA3311@linux.ibm.com>
References: <1549905891.19311.5.camel@acm.org>
 <20190211210808.GS4240@linux.ibm.com>
 <1549924039.19311.26.camel@acm.org>
 <20190212012422.GX4240@linux.ibm.com>
 <1549990020.19311.40.camel@acm.org>
 <20190212174715.GP4240@linux.ibm.com>
 <20190212191522.GA27391@linux.ibm.com>
 <1550018699.19311.45.camel@acm.org>
 <20190213011023.GX4240@linux.ibm.com>
 <20190213151917.GA3311@linux.ibm.com>
Message-ID: <20190213152413.GA4468@linux.ibm.com>

On Wed, Feb 13, 2019 at 07:19:17AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 12, 2019 at 05:10:23PM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 12, 2019 at 04:44:59PM -0800, Bart Van Assche wrote:
> > > On Tue, 2019-02-12 at 11:15 -0800, Paul E. McKenney wrote:
> > > > [ ... ]
> > > > And please see below for a patch that should allow SRCU to provide
> > > > greatly improved diagnostics for my hypothesized scenario.
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > commit 266c20cf63cdcecb3856dbc7886529082f0acaf5
> > > > Author: Paul E. McKenney
> > > > Date:   Tue Feb 12 10:44:33 2019 -0800
> > > >
> > > >     srcu: Check for in-flight callbacks in _cleanup_srcu_struct()
> > > >
> > > >     If someone fails to drain the corresponding SRCU callbacks (for
> > > >     example, by failing to invoke srcu_barrier()) before invoking either
> > > >     cleanup_srcu_struct() or cleanup_srcu_struct_quiesced(), the resulting
> > > >     diagnostic is an ambiguous use-after-free diagnostic, and even then
> > > >     only if you are running something like KASAN.  This commit therefore
> > > >     improves SRCU diagnostics by adding checks for in-flight callbacks at
> > > >     _cleanup_srcu_struct() time.
> > > >
> > > >     Note that these diagnostics can still be defeated, for example, by
> > > >     invoking call_srcu() concurrently with cleanup_srcu_struct().  Which is
> > > >     a really bad idea, but sometimes all too easy to do.  But even then,
> > > >     these diagnostics have at least some probability of catching the problem.
> > > >
> > > >     Reported-by: Sagi Grimberg
> > > >     Reported-by: Bart Van Assche
> > > >     Signed-off-by: Paul E. McKenney
> > > >
> > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > > index a60b8ba9e1ac..4f30f3ecabc1 100644
> > > > --- a/kernel/rcu/srcutree.c
> > > > +++ b/kernel/rcu/srcutree.c
> > > > @@ -387,6 +387,8 @@ void _cleanup_srcu_struct(struct srcu_struct *ssp, bool quiesced)
> > > >  			del_timer_sync(&sdp->delay_work);
> > > >  			flush_work(&sdp->work);
> > > >  		}
> > > > +		if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist)))
> > > > +			return; /* Forgot srcu_barrier(), so just leak it! */
> > > >  	}
> > > >  	if (WARN_ON(rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) != SRCU_STATE_IDLE) ||
> > > >  	    WARN_ON(srcu_readers_active(ssp))) {
> > > >
> > > Hi Paul,
> > >
> > > With this patch applied I still see the KASAN use-after-free complaint but no prior
> > > warning from inside the RCU code.
> >
> > Hmmm...
> >
> > I don't see how the KASAN warning could happen without cleanup_srcu_struct()
> > or cleanup_srcu_struct_quiesced() being called.  Perhaps a failure of
> > imagination on my part.
> >
> > So does it seem plausible to you that one of those two has been called
> > at the time the KASAN complaint is emitted?
>
> After sleeping on this...
>
> You are getting the KASAN warning at the same place each time?
>
> This would force me to hypothesize that you are invoking
> cleanup_srcu_struct_quiesced() from a workqueue spawned from
> an SRCU callback.  Is that the case?

You could get the same effect by doing a synchronize_srcu() within a
workqueue handler, come to think of it.

							Thanx, Paul
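
For anyone following along, here is a rough userspace sketch of the check
that the quoted patch adds and of the teardown ordering it is meant to
enforce.  It is only a toy model: struct toy_srcu and the toy_*() helpers
are hypothetical names made up for illustration, not kernel APIs, and they
merely stand in for call_srcu(), srcu_barrier(), and the patched
_cleanup_srcu_struct().  The real code is per-CPU and concurrent, which
this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model.  None of these names are kernel APIs. */
struct toy_srcu {
	int n_cbs;	/* callbacks queued but not yet invoked */
	bool cleaned;	/* set once cleanup has completed */
};

/* Stands in for call_srcu(): queue one callback. */
static void toy_call(struct toy_srcu *sp)
{
	assert(!sp->cleaned);	/* queueing after cleanup is a bug */
	sp->n_cbs++;
}

/* Stands in for srcu_barrier(): return once all queued callbacks ran. */
static void toy_barrier(struct toy_srcu *sp)
{
	sp->n_cbs = 0;
}

/* Stands in for the patched cleanup; false means it warned and leaked. */
static bool toy_cleanup(struct toy_srcu *sp)
{
	if (sp->n_cbs) {
		fprintf(stderr, "WARN: %d callback(s) in flight, leaking\n",
			sp->n_cbs);
		return false;
	}
	sp->cleaned = true;
	return true;
}

/* Run one teardown, with or without draining callbacks first. */
static bool toy_teardown(bool drain_first)
{
	struct toy_srcu s = { .n_cbs = 0, .cleaned = false };

	toy_call(&s);
	if (drain_first)
		toy_barrier(&s);
	return toy_cleanup(&s);
}
```

In this model, toy_teardown(true) drains first and cleans up normally,
while toy_teardown(false) hits the warning and leaks the structure instead
of freeing memory that a pending callback might still touch.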