From: "Paul E. McKenney"
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, peterz@infradead.org, dhowells@redhat.com,
	edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com
Subject: Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
Date: Wed, 26 Jul 2017 14:47:41 -0700
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20170726214741.GD3730@linux.vnet.ibm.com>
In-Reply-To: <20170726171801.5da044c3@vmware.local.home>
References: <20170724214425.GA9665@linux.vnet.ibm.com>
	<1500932684-10469-2-git-send-email-paulmck@linux.vnet.ibm.com>
	<20170725141220.4d2d968e@vmware.local.home>
	<20170725191814.GU3730@linux.vnet.ibm.com>
	<20170725181710.44cd1002@vmware.local.home>
	<20170726000540.GE3730@linux.vnet.ibm.com>
	<20170726171801.5da044c3@vmware.local.home>

On Wed, Jul 26, 2017 at 05:18:01PM -0400, Steven Rostedt wrote:
> On Tue, 25 Jul 2017 17:05:40 -0700
> "Paul E. McKenney" wrote:
>
> > On Tue, Jul 25, 2017 at 06:17:10PM -0400, Steven Rostedt wrote:
> > > On Tue, 25 Jul 2017 12:18:14 -0700
> > > "Paul E. McKenney" wrote:
> > >
> > > > On Tue, Jul 25, 2017 at 02:12:20PM -0400, Steven Rostedt wrote:
> > > > > On Mon, 24 Jul 2017 14:44:31 -0700
> > > > > "Paul E. McKenney" wrote:
> > > > >
> > > > > > The handling of RCU's no-CBs CPUs has a maintenance headache, namely
> > > > > > that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
> > > > > > wakeup must be deferred to a point where we can be sure that scheduler
> > > > > > locks are not held. Of course, there are a lot of code paths leading
> > > > > > from an interrupts-disabled invocation of call_rcu(), and missing any
> > > > > > one of these can result in excessive callback-invocation latency, and
> > > > > > potentially even system hangs.
> > > > >
> > > > > What about using irq_work? That's what perf and ftrace use for such a
> > > > > case.
> > > >
> > > > I hadn't looked at irq_work before, thank you for the pointer!
> > > >
> > > > I nevertheless believe that timers work better in this particular case
> > > > because they can be cancelled (which appears to be the common case), they
> > >
> > > Is the common case here that it doesn't trigger? That is, the
> > > del_timer() will be called?
> >
> > If you have lots of call_rcu() invocations, many of them will be invoked
> > with interrupts enabled, and a later one with interrupts enabled will
> > take care of things for the earlier ones. So there can be workloads
> > where this is the case.
>
> Note, only the first irq_work called will take action. The other
> callers will see that an irq_work is pending and will not reinvoke one.

OK, that does make things a bit easier. But suppose that an old irq_work
has just done the wakeup on CPU 0, but has not yet completed, and the
rcuo kthread duly wakes up, does some stuff on CPU 1 and goes to sleep,
then CPU 2 gets a call_rcu() with interrupts disabled, and therefore
wants to do an irq_work again. But the irq_work on CPU 0 is still running.

OK, this seems to be handled by clearing IRQ_WORK_PENDING before invoking
the irq_work handler.

> > > > normally are not at all time-critical, and because running in softirq
> > > > is just fine -- no need to run out of the scheduling-clock interrupt.
> > >
> > > irq_work doesn't always use the scheduling clock. IIRC, it will simply
> > > trigger an interrupt (if the arch supports it), and the work will be
> > > done when interrupts are enabled (the interrupt that will do the work
> > > will trigger)
> >
> > Ah, OK, so scheduling clock is just the backstop. Still, softirq
> > is a bit nicer to manage than hardirq.
>
> Still requires a hard interrupt (timer) (thinking of NOHZ FULL where
> this does matter).

But only assuming that there isn't an interrupts-enabled invocation of
call_rcu() before the timer would have gone off. In this case, the
irq_work would still trigger, and if I didn't keep the "don't need it"
complexity of the current timer-based patch, could further result in a
spurious wakeup of the rcuo kthread, which could be just as much of a
problem for nohz_full CPUs.
(Yes, hopefully the rcuo kthread would be placed to avoid nohz_full CPUs,
but on the other hand, hopefully code that caused call_rcu() to be invoked
with interrupts disabled would also be so placed.)

> > > > Seem reasonable?
> > >
> > > Don't know. With irq_work, you just call it and forget about it. No
> > > need to mod or del timers.
> >
> > But I could have a series of call_rcu() invocations with interrupts
> > disabled, so I would need to interact somehow with the irq_work handler.
> > Either that or dynamically allocate the needed data structure.
> >
> > Or am I missing something here?
>
> You treat it just like you are with the timer code. You have an irq_work
> struct attached to your rdp descriptor. And call irq_work_run() when
> interrupts are disabled. If it hasn't already been invoked it will
> invoke one. Then the irq_work handler will look at the rdp attached to
> the irq_work (container_of()), and then wake the associated thread.
>
> It is much lighter weight than a timer setup.

How much lighter weight? In other words, what fraction of the timers
have to avoid being cancelled for irq_work to break even?

							Thanx, Paul
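
For reference, the irq_work pattern discussed above (a work item embedded
in the rdp, only the first queuer taking action, the handler recovering
the rdp via container_of(), and the pending flag being cleared before the
handler runs) can be sketched in user space roughly as follows. This is
only an illustration under stated assumptions, not kernel code and not
the patch under discussion: every sk_* name, the two flag values, and the
wakeup counter are hypothetical stand-ins. In the kernel itself the
corresponding calls would, as far as I know, be init_irq_work() and
irq_work_queue() from <linux/irq_work.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's definitions. */
#define SK_WORK_PENDING 1u   /* queued, handler not yet started */
#define SK_WORK_BUSY    2u   /* queued or handler still running */

struct sk_irq_work {
    atomic_uint flags;
    void (*func)(struct sk_irq_work *work);
};

/* Stand-in for the per-CPU rcu_data ("rdp") with an embedded work item. */
struct sk_rdp {
    int wakeups;                /* counts simulated rcuo kthread wakeups */
    bool requeue_from_handler;  /* test hook: mimic another CPU queueing */
    bool requeue_accepted;      /* result of that mid-handler queue attempt */
    struct sk_irq_work wake_work;
};

#define sk_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Returns true only for the caller that actually sets PENDING. */
static bool sk_work_queue(struct sk_irq_work *work)
{
    unsigned int old = atomic_fetch_or(&work->flags,
                                       SK_WORK_PENDING | SK_WORK_BUSY);
    return !(old & SK_WORK_PENDING);
}

/* Handler: recover the enclosing rdp and "wake" its kthread. */
static void sk_wake_handler(struct sk_irq_work *work)
{
    struct sk_rdp *rdp = sk_container_of(work, struct sk_rdp, wake_work);

    assert(rdp != NULL);
    rdp->wakeups++;
    if (rdp->requeue_from_handler) {
        /* Mimic a call_rcu() elsewhere while this handler is still running. */
        rdp->requeue_from_handler = false;
        rdp->requeue_accepted = sk_work_queue(work);
    }
}

/* Run the work item if it is pending (stands in for the interrupt path). */
static void sk_work_run(struct sk_irq_work *work)
{
    unsigned int expected = SK_WORK_BUSY;

    if (!(atomic_load(&work->flags) & SK_WORK_PENDING))
        return;
    /* Clear PENDING first, so a queue attempt during the handler re-arms. */
    atomic_fetch_and(&work->flags, ~SK_WORK_PENDING);
    work->func(work);
    /* Drop BUSY only if nobody re-queued in the meantime. */
    atomic_compare_exchange_strong(&work->flags, &expected, 0u);
}

static void sk_rdp_init(struct sk_rdp *rdp)
{
    rdp->wakeups = 0;
    rdp->requeue_from_handler = false;
    rdp->requeue_accepted = false;
    atomic_init(&rdp->wake_work.flags, 0u);
    rdp->wake_work.func = sk_wake_handler;
}
```

With this sketch, two back-to-back sk_work_queue() calls return true and
then false, and one sk_work_run() produces a single wakeup. A queue
attempt made while the handler is running is accepted, because PENDING
was already cleared, so the next sk_work_run() delivers one more wakeup
rather than losing it. The sketch says nothing about relative cost
versus timers.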