Date: Thu, 21 Sep 2017 09:47:28 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Steven Rostedt, Neeraj Upadhyay, josh@joshtriplett.org,
	mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com,
	linux-kernel@vger.kernel.org, sramana@codeaurora.org,
	prsood@codeaurora.org, pkondeti@codeaurora.org, markivx@codeaurora.org
Subject: Re: Query regarding synchronize_sched_expedited and resched_cpu
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20170921164728.GK3521@linux.vnet.ibm.com>
In-Reply-To: <20170921163012.iqbjmqpijftsgpxu@hirez.programming.kicks-ass.net>

On Thu, Sep 21, 2017 at 06:30:12PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 21, 2017 at 09:00:48AM -0700, Paul E. McKenney wrote:
> > commit c21c9b78182e35eb0e72ef4e3bba3054f26eaaea
>
> [ . . . ]
>
> Inconsistent spacing after your bullet 'o', first two points have a
> space the last two a tab or so.
>
> > Given that resched_cpu() is now used only by RCU, this commit fixes the
> > assumption by making resched_cpu() unconditional.
>
> Other than that, yes looks _much_ better, thanks!
>
> Acked-by: Peter Zijlstra (Intel)
>
> Also, you might want to tag it for stable.

Like this, then?

							Thanx, Paul

------------------------------------------------------------------------

commit 62d94c97e96d5aa6a977a53dd007029df7a65586
Author: Paul E. McKenney
Date:   Mon Sep 18 08:54:40 2017 -0700

    sched: Make resched_cpu() unconditional

    The current implementation of synchronize_sched_expedited() incorrectly
    assumes that resched_cpu() is unconditional, which it is not.
    This means that synchronize_sched_expedited() can hang when resched_cpu()'s
    trylock fails as follows (analysis by Neeraj Upadhyay):

    o   CPU1 is waiting for the expedited grace period to complete:

        sync_rcu_exp_select_cpus
            rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
            IPI sent to CPU5

        synchronize_sched_expedited_wait
            ret = swait_event_timeout(rsp->expedited_wq,
                                      sync_rcu_preempt_exp_done(rnp_root),
                                      jiffies_stall);

        expmask = 0x20, CPU5 in idle path (in cpuidle_enter())

    o   CPU5 handles the IPI and fails to acquire the rq lock:

        Handles IPI
            sync_sched_exp_handler
                resched_cpu
                    returns after failing to trylock rq->lock
                need_resched is not set

    o   CPU5 calls rcu_idle_enter() and, because need_resched is not set,
        goes idle (schedule() is not called).

    o   CPU1 reports an RCU stall.

    Given that resched_cpu() is now used only by RCU, this commit fixes the
    assumption by making resched_cpu() unconditional.

    Reported-by: Neeraj Upadhyay
    Suggested-by: Neeraj Upadhyay
    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt (VMware)
    Acked-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cab8c5ec128e..b2281971894c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,8 +505,7 @@ void resched_cpu(int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
-	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
-		return;
+	raw_spin_lock_irqsave(&rq->lock, flags);
 	resched_curr(rq);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
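The hang in the analysis above comes down to a reschedule request that is silently dropped when the runqueue lock is contended. What follows is a minimal userspace sketch of that difference, assuming POSIX threads; it is not kernel code. The names struct model_rq, model_resched_trylock(), model_resched_lock() and model_ipi_while_lock_held() are hypothetical. A pthread_mutex_t stands in for the raw spinlock rq->lock, and a plain bool stands in for the flag that resched_curr() sets.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, not kernel code: "lock" plays the role of
 * rq->lock and "need_resched" the flag that resched_curr() sets.
 */
struct model_rq {
	pthread_mutex_t lock;
	bool need_resched;
};

/* Models the old resched_cpu(): gives up silently if the lock is held. */
static void model_resched_trylock(struct model_rq *rq)
{
	if (pthread_mutex_trylock(&rq->lock) != 0)
		return;			/* request lost, flag never set */
	rq->need_resched = true;
	pthread_mutex_unlock(&rq->lock);
}

/* Models the patched resched_cpu(): waits for the lock, always sets flag. */
static void model_resched_lock(struct model_rq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->need_resched = true;
	pthread_mutex_unlock(&rq->lock);
}

struct ipi_arg {
	struct model_rq *rq;
	bool use_trylock;
};

/* Hypothetical stand-in for sync_sched_exp_handler() on the target CPU. */
static void *model_ipi_handler(void *p)
{
	struct ipi_arg *a = p;

	if (a->use_trylock)
		model_resched_trylock(a->rq);
	else
		model_resched_lock(a->rq);
	return NULL;
}

/*
 * Deliver one "IPI" while another context holds the lock, then report
 * whether need_resched ended up set (in the scenario above, whether the
 * idle path would call schedule() instead of going straight to idle).
 */
static bool model_ipi_while_lock_held(bool use_trylock)
{
	struct model_rq rq = { .need_resched = false };
	struct ipi_arg arg = { &rq, use_trylock };
	pthread_t t;
	bool set;

	pthread_mutex_init(&rq.lock, NULL);
	pthread_mutex_lock(&rq.lock);		/* someone else holds rq->lock */
	if (pthread_create(&t, NULL, model_ipi_handler, &arg) != 0)
		abort();
	if (use_trylock) {
		/* trylock returns at once, so it runs entirely under contention */
		pthread_join(t, NULL);
		pthread_mutex_unlock(&rq.lock);
	} else {
		/* the blocking variant needs the holder to release the lock */
		pthread_mutex_unlock(&rq.lock);
		pthread_join(t, NULL);
	}
	set = rq.need_resched;
	pthread_mutex_destroy(&rq.lock);
	return set;
}
```

With the lock held elsewhere, model_ipi_while_lock_held(true) should return false, mirroring CPU5 dropping the request, while model_ipi_while_lock_held(false) should return true once the holder releases the lock. The model deliberately ignores interrupt disabling and the fact that raw_spin_lock_irqsave() spins rather than sleeps.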