From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 14 Mar 2017 09:21:26 -0700
From: "Paul E. McKenney"
To: Lance Roy
Cc: Andrey Konovalov, Lai Jiangshan, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, LKML, syzkaller, Dmitry Vyukov, Kostya Serebryany
Subject: Re: srcu: BUG in __synchronize_srcu
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170310222609.GN30506@linux.vnet.ibm.com> <20170314004702.6650857e@gmail.com>
In-Reply-To: <20170314004702.6650857e@gmail.com>
Message-Id: <20170314162126.GC3637@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org
On Tue, Mar 14, 2017 at 12:47:02AM -0700, Lance Roy wrote:
> I am not sure how the rcu_scheduler_active changes in __synchronize_srcu
> work, but there seem to be a few problems in them.  First,
> "if (done && likely(!driving))" on line 453 doesn't appear to ever happen,
> as driving doesn't get set to false when srcu_reschedule is called.  This
> seems like it could cause a race condition if another thread notices that
> ->running is false, adds itself to the queue, sets ->running to true, and
> starts its own grace period before the first thread acquires the lock
> again on line 469.  The first thread will then acquire the lock, set
> ->running to false, and release the lock, resulting in an invalid state
> where ->running is false but the second thread is still trying to finish
> its grace period.
>
> Second, the while loop on line 455 seems to violate the rule that
> ->running shouldn't be false when there are entries in the queue.  If a
> second thread adds itself to the queue while the first thread is driving
> SRCU inside that loop, and the first thread then finishes its own grace
> period and exits the loop, it will set ->running to false even though
> there is still a thread on the queue.
>
> The second issue requires rcu_scheduler_active to be RCU_SCHEDULER_INIT
> to occur, and as I don't know what the assumptions during
> RCU_SCHEDULER_INIT are I don't know if it is actually a problem, but the
> first issue looks like it could occur at any time.

Thank you for looking into this!

I determined that my patch-order strategy was flawed, as it required me
to rewrite the mid-boot functionality several times.  I therefore removed
the mid-boot commits.  I will add them in later, but they will use a
rather different approach based on a grace-period sequence number similar
to that used by the expedited grace periods.

Which should also teach me to be less aggressive about pushing new code
to -next.  For a few weeks, anyway.
;-)

							Thanx, Paul

> Thanks,
> Lance
>
> On Fri, 10 Mar 2017 14:26:09 -0800
> "Paul E. McKenney" wrote:
> > On Fri, Mar 10, 2017 at 08:29:55PM +0100, Andrey Konovalov wrote:
> > > On Fri, Mar 10, 2017 at 8:28 PM, Andrey Konovalov wrote:
> > > > Kernel panic - not syncing: Fatal exception
> >
> > So the theory is that if !sp->running, all of SRCU's queues must be
> > empty.  So if you are holding ->queue_lock (with irqs disabled) and
> > you see !sp->running, and then you enqueue a callback on
> > ->batch_check0, then that callback must be the first in the list.
> > And the code preceding the WARN_ON() you triggered does in fact check
> > and enqueue while holding ->queue_lock with irqs disabled.
> >
> > And rcu_batch_queue() does operate FIFO as required.  (Otherwise,
> > srcu_barrier() would not work.)
> >
> > There are only three calls to rcu_batch_queue(), and the one involved
> > with the WARN_ON() enqueues to ->batch_check0.  The other two enqueue
> > to ->batch_queue.  Callbacks move from ->batch_queue to ->batch_check0
> > to ->batch_check1 to ->batch_done, so nothing should slip in front.
> >
> > Of course, if ->running were ever set to false with any of
> > ->batch_check0, ->batch_check1, or ->batch_done non-empty, this
> > WARN_ON() would trigger.  But srcu_reschedule() sets it to false only
> > if all four batches are empty (and checks and sets under
> > ->queue_lock), and all other cases where it is set to false happen at
> > initialization time, and also clear out the queues.  Of course, if
> > someone raced an init_srcu_struct() with either a call_srcu() or
> > synchronize_srcu(), all bets are off.  Now, mmu_notifier.c does
> > invoke init_srcu_struct() manually, but it does so at
> > subsys_initcall() time.
> > Which -might- be after other things are happening, so one "hail Mary"
> > attempted fix is to remove mmu_notifier_init() and replace the
> > "static struct srcu_struct srcu" with:
> >
> > 	DEFINE_STATIC_SRCU(srcu);
> >
> > But this might require changing the name -- I vaguely recall some
> > strangeness where the names of statically defined per-CPU variables
> > need to be globally unique even when static.  Easy enough to do,
> > though.  Might need a similar change to the "srcu" instances defined
> > in vmd.c and kvm_host.h -- assuming that this change helps.
> >
> > Another possibility is that something in SRCU is messing with either
> > the queues or the ->running field without holding ->queue_lock.  And
> > that does seem to be happening -- srcu_advance_batches() invokes
> > rcu_batch_move() without holding anything.  Which seems like it could
> > cause trouble if someone else was invoking synchronize_srcu()
> > concurrently.  Those particular invocations might be safe due to
> > access only by a single kthread/workqueue, given that all updates to
> > ->batch_queue are protected by ->queue_lock (aside from
> > initialization).
> >
> > But ->batch_check0 is updated by __synchronize_srcu(), though
> > protected by ->queue_lock, and only if ->running is false, and with
> > both the check and the update protected by the same ->queue_lock
> > critical section.  If ->running is false, then the workqueue cannot
> > be running, so it remains to see if all other updates to
> > ->batch_check0 are either with ->queue_lock held and ->running false
> > on the one hand or from the workqueue handler on the other:
> >
> > srcu_collect_new() updates with ->queue_lock held, but does not check
> > 	->running.  It is invoked only from process_srcu(), which in
> > 	turn is invoked only as a workqueue handler.  The work is
> > 	queued from:
> >
> > 	call_srcu(), which does so with ->queue_lock held having just
> > 		set ->running to true.
> > 	srcu_reschedule(), which invokes it if there are non-empty
> > 		queues.  This is invoked from __synchronize_srcu()
> > 		in the case where it has set ->running to true
> > 		after finding the queues empty, which should imply
> > 		no other instances.
> >
> > 		It is also invoked from process_srcu(), which is
> > 		invoked only as a workqueue handler.  (Yay
> > 		recursive inquiry!)
> >
> > srcu_advance_batches() updates without locks held.  It is invoked
> > 	as follows:
> >
> > 	__synchronize_srcu() in the case where ->running was set,
> > 		which as noted before excludes workqueue handlers.
> >
> > 	process_srcu(), which as noted before is only invoked from
> > 		a workqueue handler.
> >
> > So an SRCU workqueue is invoked only from a workqueue handler, or
> > from some other task that transitioned ->running from false to true
> > while holding ->queue_lock.  There should therefore only be one SRCU
> > workqueue per srcu_struct, so this should be safe.  Though I hope
> > that it can be simplified a bit.  :-/
> >
> > So the only suggestion I have at the moment is static definition of
> > the "srcu" variable.  Lai, Josh, Steve, Mathieu, anything I missed?
> >
> > 							Thanx, Paul
> >