From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751118AbeEEFXo (ORCPT ); Sat, 5 May 2018 01:23:44 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:40644 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750744AbeEEFXm (ORCPT ); Sat, 5 May 2018 01:23:42 -0400 Date: Fri, 4 May 2018 22:25:00 -0700 From: "Paul E. McKenney" To: "Eric W. Biederman" Cc: Sebastian Andrzej Siewior , linux-kernel@vger.kernel.org, tglx@linutronix.de, Anna-Maria Gleixner Subject: Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore Reply-To: paulmck@linux.vnet.ibm.com References: <20180504144014.5378-1-bigeasy@linutronix.de> <87fu37e4rn.fsf@xmission.com> <20180504170459.624e7vkro5fn2bgw@linutronix.de> <876043e3xb.fsf@xmission.com> <20180504175241.GT26088@linux.vnet.ibm.com> <87sh77b5w7.fsf@xmission.com> <20180504194510.GW26088@linux.vnet.ibm.com> <87vac39oaf.fsf@xmission.com> <20180504203713.GY26088@linux.vnet.ibm.com> <87k1siaf8y.fsf@xmission.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <87k1siaf8y.fsf@xmission.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 18050505-0040-0000-0000-00000426F387 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00008972; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000258; SDB=6.01027702; UDB=6.00524987; IPR=6.00806836; MB=3.00020941; MTD=3.00000008; XFM=3.00000015; UTC=2018-05-05 05:23:39 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18050505-0041-0000-0000-0000082CFDA6 Message-Id: <20180505052500.GN26088@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-05-05_03:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1805050046 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 04, 2018 at 11:38:37PM -0500, Eric W. Biederman wrote: > "Paul E. McKenney" writes: > > > On Fri, May 04, 2018 at 03:08:40PM -0500, Eric W. Biederman wrote: > >> "Paul E. McKenney" writes: > >> > >> > On Fri, May 04, 2018 at 02:03:04PM -0500, Eric W. Biederman wrote: > >> >> "Paul E. McKenney" writes: > >> >> > >> >> > On Fri, May 04, 2018 at 12:17:20PM -0500, Eric W. Biederman wrote: > >> >> >> Sebastian Andrzej Siewior writes: > >> >> >> > >> >> >> > On 2018-05-04 11:59:08 [-0500], Eric W. Biederman wrote: > >> >> >> >> Sebastian Andrzej Siewior writes: > >> >> >> >> > From: Anna-Maria Gleixner > >> >> >> > … > >> >> >> >> > This long-term fix has been made in commit 4abf91047cf ("rtmutex: Make > > >> >> >> >> > wait_lock irq safe") for different reason. > >> >> >> >> > >> >> >> >> Which tree has this change been made in? I am not finding the commit > >> >> >> >> you mention above in Linus's tree. > >> >> >> > > >> >> >> > I'm sorry, it should have been commit b4abf91047cf ("rtmutex: Make > >> >> >> > wait_lock irq safe"). > >> >> >> > >> >> >> Can you fix that in your patch description and can you also up the > >> >> >> description of rcu_read_unlock? > >> >> >> > >> >> >> If we don't need to jump through hoops it looks very reasonable to > >> >> >> remove this unnecessary logic. But we should fix the description > >> >> >> in rcu_read_unlock that still says we need these hoops. > >> >> > > >> >> > The hoops are still required for rcu_read_lock(), otherwise you > >> >> > get deadlocks between the scheduler and RCU in PREEMPT=y kernels. > >> >> > What happens with this patch (if I understand it correctly) is that the > >> >> > signal code now uses a different way of jumping through the hoops. > >> >> > But the hoops are still jumped through. > >> >> > >> >> The patch changes: > >> >> > >> >> local_irq_disable(); > >> >> rcu_read_lock(); > >> >> spin_lock(); > >> >> rcu_read_unlock(); > >> >> > >> >> to: > >> >> > >> >> rcu_read_lock(); > >> >> spin_lock_irq(); > >> >> rcu_read_unlock(); > >> >> > >> >> Now that I have a chance to relfect on it the fact that the patern > >> >> that is being restored does not work is scary. As the failure has > >> >> nothing to do with lock ordering and people won't realize what is going > >> >> on. Especially since the common rcu modes won't care. > >> >> > >> >> So is it true that taking spin_lock_irq before calling rcu_read_unlock > >> >> is a problem because of rt_mutex_unlock()? Or has b4abf91047cf ("rtmutex: Make > >> >> wait_lock irq safe") actually fixed that and we can correct the > >> >> documentation of rcu_read_unlock() ? And fix __lock_task_sighand? > >> > > >> > The problem is that the thing taking the lock might be the scheduler, > >> > or one of the locks taken while the scheduler's pi and rq locks are > >> > held. This occurs only with RCU-preempt. > >> > > >> > Here is what can happen: > >> > > >> > o A task does rcu_read_lock(). > >> > > >> > o That task is preempted. > >> > > >> > o That task stays preempted for a long time, and is therefore > >> > priority boosted. This boosting involves a high-priority RCU > >> > kthread creating an rt_mutex, pretending that the preempted task > >> > already holds it, and then acquiring it. > >> > > >> > o The task awakens, acquires the scheduler's rq lock, and > >> > then does rcu_read_unlock(). > >> > > >> > o Because the task has been priority boosted, __rcu_read_unlock() > >> > invokes the rcu_read_unlock_special() slowpath, which does > >> > (as you say) rt_mutex_unlock() to deboost. The deboosting > >> > can cause the scheduler to acquire the rq and pi locks, which > >> > results in deadlock. > >> > > >> > In contrast, holding these scheduler locks across the entirety of the > >> > RCU-preempt read-side critical section is harmless because then the > >> > critical section cannot be preempted, which means that priority boosting > >> > cannot happen, which means that there will be no need to deboost at > >> > rcu_read_unlock() time. > >> > > >> > This restriction has not changed, and as far as I can see is inherent > >> > in the fact that RCU uses the scheduler and the scheduler uses RCU. > >> > There is going to be an odd corner case in there somewhere! > >> > >> However if I read things correctly b4abf91047cf ("rtmutex: Make > >> wait_lock irq safe") did change this. > >> > >> In particular it changed things so that it is only the scheduler locks > >> that matter, not any old lock that disabled interrupts. This was done > >> by disabling disabling interrupts when taking the wait_lock. > >> > >> The rcu_read_unlock documentation states: > >> > >> * In most situations, rcu_read_unlock() is immune from deadlock. > >> * However, in kernels built with CONFIG_RCU_BOOST, rcu_read_unlock() > >> * is responsible for deboosting, which it does via rt_mutex_unlock(). > >> * Unfortunately, this function acquires the scheduler's runqueue and > >> * priority-inheritance spinlocks. This means that deadlock could result > >> * if the caller of rcu_read_unlock() already holds one of these locks or > >> * any lock that is ever acquired while holding them; or any lock which > >> * can be taken from interrupt context because rcu_boost()->rt_mutex_lock() > >> * does not disable irqs while taking ->wait_lock. > >> > >> So we can now remove the clause: > >> * ; or any lock which > >> * can be taken from interrupt context because rcu_boost()->rt_mutex_lock() > >> * does not disable irqs while taking ->wait_lock. > >> > >> Without the any lock that disabled interrupts restriction it is now safe > >> to not worry about the issues with the scheduler locks and the rt_mutex > >> Which does make it safe to not worry about these crazy complexities in > >> lock_task_sighand. > >> > >> Paul do you agree or is the patch unsafe? > > > > Ah, I thought you were trying to get rid of all but the first line of > > that paragraph, not just the final clause. Apologies for my confusion! > > > > It looks plausible, but the patch should be stress-tested on a preemptible > > kernel with priority boosting enabled. Has that been done? > > > > (Me, I would run rcutorture scenario TREE03 for an extended time period > > on b4abf91047cf with your patch applied. And with lockdep enabled, which TREE03 does not do by default. > > But what testing have you > > done already?) > > Not my patch. I was just the reviewer asking for some obvious cleanups. > So people won't get confused. Sebastian? Can you tell Paul what > testing you have done? Ah, apologies for my confusion! Thanx, Paul