Date: Tue, 9 May 2017 09:36:52 -0700
From: "Paul E. McKenney"
To: Josh Poimboeuf
Cc: Steven Rostedt, Petr Mladek, Jessica Yu, Jiri Kosina, Miroslav Benes, live-patching@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is broken in RCU code
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <20170509161835.64ihfts7xuytaryp@treble>
Message-Id: <20170509163652.GS3956@linux.vnet.ibm.com>

On Tue, May 09, 2017 at 11:18:35AM -0500, Josh Poimboeuf wrote:
> On Mon, May 08, 2017 at 03:36:00PM -0700, Paul E. McKenney wrote:
> > On Mon, May 08, 2017 at 05:16:09PM -0500, Josh Poimboeuf wrote:
> > > On Mon, May 08, 2017 at 02:07:54PM -0700, Paul E. McKenney wrote:
> > > > This would be a problem if step 2's NMI hit rcu_irq_enter(),
> > > > rcu_irq_exit(), and friends in just the wrong place.
> > > >
> > > > I would suggest that ftrace() do something like this...
> > > >
> > > > 	if (in_nmi())
> > > > 		rcu_nmi_enter();
> > > > 	else
> > > > 		rcu_irq_enter();
> > > >
> > > > Except that, as Steven will quickly point out, this won't work at the
> > > > very edges of the NMI, when NMI_MASK won't be set in preempt_count().
> > > >
> > > > Other thoughts?
> > >
> > > Ok. So I think the livepatch ftrace handler would need the in_nmi()
> > > check, in case it's called early in the NMI.
> > >
> > > But on x86, rcu_nmi_enter() is also called in some non-NMI exception
> > > cases, from ist_enter(). So it appears that the in_nmi() check wouldn't
> > > be sufficient. We might instead need something like:
> > >
> > > 	if (in_nmi() || in_some_other_exception())
> > > 		rcu_nmi_enter();
> > > 	else
> > > 		rcu_irq_enter();
> > >
> > > But unfortunately the in_some_other_exception() function doesn't
> > > currently exist.
> > >
> > > So, one more question. Would it work if we just always called
> > > rcu_nmi_enter()?
> >
> > I am a bit nervous about this.
> > It would -at- -least- be necessary to have
> > interrupts disabled throughout the entire time from the rcu_nmi_enter()
> > through the matching rcu_nmi_exit(). And there might be other failure
> > modes that I don't immediately see.
>
> Ok, let's forget about that idea for now then :-)

Whew!!! ;-)

> > But do we really need this, given the in_nmi() check that Steven
> > pointed out?
>
> The in_nmi() check doesn't work for non-NMI exceptions. An exception
> can come from anywhere, which is presumably why ist_enter() calls
> rcu_nmi_enter(), even though it might not have been in NMI context. The
> exception could, for example, happen while you're twiddling important
> bits in rcu_irq_enter(). Or it could happen early in do_nmi(), before
> it had a chance to set NMI_MASK or call rcu_nmi_enter(). In either
> case, in_nmi() would be false, yet calling rcu_irq_enter() would be bad.
>
> I think I have convinced myself that, as long as the user doesn't patch
> ist_enter() or rcu_dynticks_eqs_enter(), it'll be fine. So the
> following should be sufficient:
>
> 	if (in_nmi())
> 		rcu_nmi_enter(); /* in case we're called before nmi_enter() */
> 	else
> 		rcu_irq_enter_irqson();
>
> 	if (unlikely(!rcu_is_watching())) {
> 		klp_block_patch_removal = true;
> 		WARN_ON_ONCE(1); /* this presumably means */
> 	}

As long as you have a similar setup on exit, so that each call to
rcu_nmi_enter() is balanced by a corresponding call to rcu_nmi_exit().
Ditto for rcu_irq_enter_irqson(), of course.

> I think the alternative, calling rcu_irq_enter_disabled() beforehand,
> isn't sufficient, because it only checks the rcu_dynticks_eqs_enter()
> case. It doesn't check the IST exception ist_enter() case, before
> rcu_nmi_enter() has been called.

Yes, calling rcu_irq_enter_disabled() beforehand would be unfortunate
if this was an NMI that occurred in just the wrong place in (say)
rcu_irq_enter(). ;-)

							Thanx, Paul
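The requirement that each rcu_nmi_enter() be balanced by rcu_nmi_exit(), and each rcu_irq_enter_irqson() by its rcu_irq_exit_irqson() counterpart, can be illustrated with a minimal user-space sketch. It is not kernel code: mock_rcu_nmi_enter(), mock_rcu_irq_enter_irqson(), their exit counterparts, and the fake_in_nmi flag standing in for in_nmi() are all hypothetical, and RCU's per-CPU dynticks state is reduced to two nesting counters. The entry helper returns which path it took and the exit helper undoes exactly that path, so the pairing holds by construction.

```c
/*
 * User-space model of balanced RCU entry/exit in a handler.
 * Every function and variable here is a hypothetical stand-in;
 * the real kernel primitives manipulate per-CPU dynticks state.
 */
#include <assert.h>
#include <stdbool.h>

static int nmi_nesting;  /* outstanding mock_rcu_nmi_enter() calls */
static int irq_nesting;  /* outstanding mock_rcu_irq_enter_irqson() calls */
static bool fake_in_nmi; /* stand-in for in_nmi() */

static void mock_rcu_nmi_enter(void) { nmi_nesting++; }

static void mock_rcu_nmi_exit(void)
{
	assert(nmi_nesting > 0); /* an unmatched exit is a bug */
	nmi_nesting--;
}

static void mock_rcu_irq_enter_irqson(void) { irq_nesting++; }

static void mock_rcu_irq_exit_irqson(void)
{
	assert(irq_nesting > 0); /* an unmatched exit is a bug */
	irq_nesting--;
}

/* Enter the RCU-watching state; report which path was taken. */
static bool handler_rcu_enter(void)
{
	bool used_nmi = fake_in_nmi;

	if (used_nmi)
		mock_rcu_nmi_enter();
	else
		mock_rcu_irq_enter_irqson();
	return used_nmi;
}

/*
 * Undo exactly what handler_rcu_enter() did, using the recorded
 * choice rather than re-testing the context.
 */
static void handler_rcu_exit(bool used_nmi)
{
	if (used_nmi)
		mock_rcu_nmi_exit();
	else
		mock_rcu_irq_exit_irqson();
}

/* Hypothetical handler body: enter, do the work, exit. */
static void handler(void)
{
	bool used_nmi = handler_rcu_enter();

	/* ... work that needs RCU to be watching ... */
	handler_rcu_exit(used_nmi);
}
```

After each call to handler(), both counters are back at zero; the assertions in the mock exit functions would fire on any unmatched exit.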