From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1754593AbdDLPSZ (ORCPT ); Wed, 12 Apr 2017 11:18:25 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:36291 "EHLO
 mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753941AbdDLPSX (ORCPT ); Wed, 12 Apr 2017 11:18:23 -0400
Date: Wed, 12 Apr 2017 08:18:17 -0700
From: "Paul E. McKenney"
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org
Subject: Re: There is a Tasks RCU stall warning
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170411174953.46adbf1e@gandalf.local.home>
 <20170411215656.GI1600@linux.vnet.ibm.com>
 <20170411181530.27dc21cc@gandalf.local.home>
 <20170411230154.GA3956@linux.vnet.ibm.com>
 <20170411230445.GA25951@linux.vnet.ibm.com>
 <20170411231138.GB25951@linux.vnet.ibm.com>
 <20170412032307.GA27011@linux.vnet.ibm.com>
 <20170412091821.4ad74bb0@gandalf.local.home>
 <20170412141936.GF3956@linux.vnet.ibm.com>
 <20170412104255.26bb17d4@gandalf.local.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170412104255.26bb17d4@gandalf.local.home>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
x-cbid: 17041215-0052-0000-0000-000001D3F463
X-IBM-SpamModules-Scores:
X-IBM-SpamModules-Versions: BY=3.00006923; HX=3.00000240; KW=3.00000007;
 PH=3.00000004; SC=3.00000208; SDB=6.00846557; UDB=6.00417579;
 IPR=6.00624993; BA=6.00005286; NDR=6.00000001; ZLA=6.00000005;
 ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000;
 ZU=6.00000002; MB=3.00015022; XFM=3.00000013; UTC=2017-04-12 15:18:19
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17041215-0053-0000-0000-00004FE3D622
Message-Id: <20170412151817.GG3956@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
 definitions=2017-04-12_11:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0
 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.0.1-1702020001 definitions=main-1704120126
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 12, 2017 at 10:42:55AM -0400, Steven Rostedt wrote:
> On Wed, 12 Apr 2017 07:19:36 -0700
> "Paul E. McKenney" wrote:
>
> > On Wed, Apr 12, 2017 at 09:18:21AM -0400, Steven Rostedt wrote:
> > > On Tue, 11 Apr 2017 20:23:07 -0700
> > > "Paul E. McKenney" wrote:
> > >
> > > > But another question...
> > > >
> > > > Suppose someone traced or probed or whatever a call to (say)
> > > > cond_resched_rcu_qs().  Wouldn't that put the call to this
> > > > function in the trampoline itself?  Of course, if this happened,
> > > > life would be hard when the trampoline was freed due to
> > > > cond_resched_rcu_qs() being a quiescent state.
> > >
> > > Not at all, because the trampoline happens at the beginning of the
> > > function. Not in the guts of it (unless something in the guts was
> > > traced). But even then, it should be fine as the change was already
> > > made.
> > >
> > >         /* unhook trampoline from function calls */
> > >         unregister_ftrace_function(my_ops);
> > >
> > >         synchronize_rcu_tasks();
> > >
> > >         kfree(my_ops->trampoline);
> > >
> > >
> > > Thus, once the unregister_ftrace_function() is called, no new entries
> > > into the trampoline can happen. The synchronize_rcu_tasks() is to move
> > > those that are currently on a trampoline off.
> >
> > OK, good!  (I thought that these things could appear anywhere.)
>
> Well the trampolines pretty much can, but they are removed before
> calling synchronize_rcu_tasks(), and nothing can enter the trampoline
> when that is called.

Color me confused...

So you can have an arbitrary function call within a trampoline?

If not, agreed, no problem.  Otherwise, it seems like we have a big
problem remaining.
Unless the functions called from a trampoline are guaranteed never to
do a context switch.

So what exactly is the trampoline code allowed to do?  ;-)

> > If it ever becomes necessary, I suppose you could have a function
> > call as the very last thing on a trampoline.  Do the (off-trampoline)
> > return-address push, jump at the function, and that is the last need
> > for the trampoline.
>
> The point of trampolines is to optimize the function hooks, added
> features will kill that optimization. But then it gets even more
> complex. The trampolines are written in assembly and do special reg
> savings in order to call C code. And it needs to restore back to the
> original state before calling back to the function being traced. Thus,
> anything at the end of the trampoline will need to be written in
> assembly. Not sure writing RCU code in assembly would be much fun.

Writing RCU code as assembly code would indeed not be my first choice!

> > Assuming that the called function doesn't try accessing the code
> > surrounding the call, but that would be a problem in any case.
> >
> > > Is there a way that a task could be in the middle of
> > > cond_resched_rcu_qs() and get preempted by something while on the
> > > ftrace trampoline, then the above "unregister_ftrace_function()" and
> > > "synchronize_rcu_tasks()" can be called and finish, while the one task
> > > is still on the trampoline and never finished the cond_resched_rcu_qs()?
> >
> > Well, if the kernel being ftraced is a guest OS and the hypervisor
> > preempts it at just that point...
>
> Not sure what you mean by the above. You mean the hypervisor running
> ftrace on the guest OS? Or just a long pause on the guest OS (could
> also be an NMI). But in any case, we don't care about long pauses. We
> care about tasks going to sleep while on the trampoline, and the ftrace
> code that does the schedule_on_each_cpu() missing that task, because it
> was preempted, and not affected by the schedule_on_each_cpu() call.

The guest doing ftrace and the hypervisor preempting it.  But yes,
same thing as NMI.

> > > > Or is there something that takes care to avoid putting calls to
> > > > this sort of function (and calls to any function calling this sort
> > > > of function, directly or indirectly) into a trampoline?
> > >
> > > The question is, if it's on the trampoline in one of these functions
> > > when synchronize_rcu_tasks() is called, will it still be on the
> > > trampoline when that returns?
> >
> > If the function's return address is within the trampoline, it seems to
> > me that bad things could happen.
>
> Not sure what you mean by the above. One should never be tracing within
> a trampoline, or calling synchronize_rcu_tasks() in one. The trampoline
> could be called from any context, including NMI.

My problem is that I have no idea what can and cannot be included in
trampoline code.  In the absence of that information, my RCU-honed
reflexes jump immediately to the worst case that I can think of.  ;-)

							Thanx, Paul
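
For readers following the thread, the ordering discussed above (unhook,
wait for tasks to get off the trampoline, then free) can be sketched as a
single-threaded user-space toy model.  This is only an illustration, not
kernel code: every toy_* name is hypothetical, "preemption" is modeled as
a task that has entered the trampoline and not yet left, and "waiting" is
modeled by simply letting such tasks run off the trampoline.  It does not
model the open question in this message, namely what happens if code
called from a trampoline does a voluntary context switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NTASKS 3

/* Hypothetical stand-in for a dynamically allocated trampoline. */
struct toy_trampoline {
	int hits;
};

/* Hypothetical per-task state: non-NULL while the task sits on a trampoline. */
struct toy_task {
	struct toy_trampoline *on_tramp;
};

static struct toy_trampoline *toy_hook;	/* what traced functions jump to */
static struct toy_task toy_tasks[TOY_NTASKS];

/* A task calls a traced function; it lands on the trampoline only if hooked. */
static bool toy_enter(struct toy_task *t)
{
	if (!toy_hook)
		return false;
	t->on_tramp = toy_hook;
	toy_hook->hits++;
	return true;
}

/* The task runs off the end of the trampoline. */
static void toy_leave(struct toy_task *t)
{
	t->on_tramp = NULL;
}

/* Toy analogue of the grace-period wait: let every task get off. */
static void toy_synchronize(void)
{
	for (size_t i = 0; i < TOY_NTASKS; i++)
		if (toy_tasks[i].on_tramp)
			toy_leave(&toy_tasks[i]);
}

/*
 * Unhook, optionally wait, then free.  Returns how many tasks were still
 * on the trampoline at the moment it was freed (0 means the free was safe).
 */
static int toy_teardown(bool wait_for_tasks)
{
	struct toy_trampoline *tramp = toy_hook;
	int stranded = 0;

	if (!tramp)
		return 0;
	toy_hook = NULL;		/* no new entries from here on */
	if (wait_for_tasks)
		toy_synchronize();	/* move current occupants off */
	for (size_t i = 0; i < TOY_NTASKS; i++) {
		if (toy_tasks[i].on_tramp == tramp)
			stranded++;
		toy_tasks[i].on_tramp = NULL;	/* reset the model */
	}
	free(tramp);
	return stranded;
}

/* One run: task 0 passes through, task 1 is "preempted" on the trampoline. */
int toy_scenario(bool wait_for_tasks)
{
	toy_hook = calloc(1, sizeof(*toy_hook));
	if (!toy_hook)
		return -1;
	toy_enter(&toy_tasks[0]);
	toy_leave(&toy_tasks[0]);
	toy_enter(&toy_tasks[1]);	/* enters and does not leave */
	return toy_teardown(wait_for_tasks);
}
```

Calling toy_scenario(true) models the sequence quoted above and leaves no
task on the freed trampoline; toy_scenario(false) skips the wait and
reports the one task that would have been left executing freed memory.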