From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751922AbdI2BKF (ORCPT ); Thu, 28 Sep 2017 21:10:05 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:59184 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750857AbdI2BKE (ORCPT ); Thu, 28 Sep 2017 21:10:04 -0400 Date: Thu, 28 Sep 2017 18:09:57 -0700 From: "Paul E. McKenney" To: Sebastian Andrzej Siewior Cc: Josh Triplett , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/3] srcu: use cpu_online() instead custom check Reply-To: paulmck@linux.vnet.ibm.com References: <20170922152806.22860-1-bigeasy@linutronix.de> <20170922184314.GS3521@linux.vnet.ibm.com> <20170928160207.ln2t3nlnfnkwqusg@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170928160207.ln2t3nlnfnkwqusg@linutronix.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17092901-0048-0000-0000-000001ED2F5B X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00007806; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000232; SDB=6.00923830; UDB=6.00464469; IPR=6.00703968; BA=6.00005613; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00017313; XFM=3.00000015; UTC=2017-09-29 01:10:01 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17092901-0049-0000-0000-000042B61A2F Message-Id: <20170929010957.GV3521@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-09-28_08:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1709290015 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 28, 2017 at 06:02:08PM +0200, Sebastian Andrzej Siewior wrote: > On 2017-09-22 11:43:14 [-0700], Paul E. McKenney wrote: > > On Fri, Sep 22, 2017 at 05:28:04PM +0200, Sebastian Andrzej Siewior wrote: > > > The current check via srcu_online is slightly racy because after looking > > > at srcu_online there could be an interrupt that interrupted us long > > > enough until the CPU we checked against went offline. > > > > But in that case, wouldn't the interrupt block the synchronize_sched() > > later in the offline sequence? > > What I meant is: > > CPU0 CPU1 > preempt_disable(); > if (READ_ONCE(per_cpu(srcu_online, 1))) > *interrupt* > WRITE_ONCE(per_cpu(srcu_online, cpu), false); > and CPU is offnline > > ret = queue_delayed_work_on(1, wq, dwork, delay); > > is this possible or are there a safety belt for this? I don't see anything that would prevent this. It is unlikely, but not so unlikely that it should not be fixed. > > More to the point, are you actually seeing this failure, or is this > > a theoretical bug? > > I need to get rid of the preempt_disable() section in which > queue_delayed_work*() is invoked for RT. OK, but please see below... > > > An alternative would be to hold the hotplug rwsem (so the CPUs don't > > > change their state) and then check based on cpu_online() if we queue it > > > on a specific CPU or not. queue_work_on() itself can handle if something > > > is enqueued on an offline CPU but a timer which is enqueued on an offline > > > CPU won't fire until the CPU is back online. > > > > > > I am not sure if the removal in rcu_init() is okay or not. I assume that > > > SRCU won't enqueue a work item before SRCU is up and ready. > > > > Another alternative would be to disable preemption across the check and > > the call to queue_delayed_work_on(). > > you need to ensure the *other* CPU won't in the middle of checking its > status. preempt_disable() won't do this on the other CPU. Agreed. > > Yet another alternative would be to have an SRCU-specific per-CPU lock > > that is acquired across the setting and clearing of srcu_online, > > and also across the check and the call to queue_delayed_work_on(). > > This last would be more consistent with a desire to remove the > > synchronize_sched() from the offline sequence. > > > > Or am I missing something here? > The perCPU lock should work. And cpus_read_lock() is basically that > except that srcu_online_cpu() is not holding it but the CPU-HP code. > > So you want keep things as-is or do you prefer a per-CPU rwsem instead? The per-CPU rwsem seems like a reasonable approach. Except for the call_srcu() path, given that call_srcu()'s caller might have preemption (or even interrupts) disabled. Thoughts? Thanx, Paul