Date: Wed, 3 Jul 2019 09:10:26 -0700
From: "Paul E. McKenney"
To: Joel Fernandes
Cc: Steven Rostedt, Mathieu Desnoyers, rcu
Subject: Re: Normal RCU grace period can be stalled for long because need-resched flags not set?
Reply-To: paulmck@linux.ibm.com
Message-Id: <20190703161026.GP26519@linux.ibm.com>
X-Mailing-List: rcu@vger.kernel.org

On Wed, Jul 03, 2019 at 11:25:20AM -0400, Joel Fernandes wrote:
> Hi!
> I am measuring performance of the RCU consolidated vs RCU before the
> consolidation of flavors happened (just for fun and may be to talk
> about in a presentation).
>
> What I did is I limited the readers/writers in rcuperf to run on all
> but one CPU. And then on that one CPU, I had a thread doing a
> preempt-disable + busy-wait + preempt_enable in a loop.

In a CONFIG_PREEMPT=y kernel? (Guessing so because otherwise
preempt_enable() doesn't do all that much.)

Ah, and CONFIG_NO_HZ_FULL has an effect as well.

> I was hoping the preempt disable busy-wait thread would stall the
> regular readers, and it did.
> But what I noticed is that grace periods take 100-200 milliseconds to
> finish instead of the busy-wait time of 5-10 ms that I set. On closer
> examination, it looks like even though the preempt_enable happens in
> my loop, the need-resched flag is not set even though the grace period
> is long over due. So the thread does not reschedule.

The 100 milliseconds is expected behavior if there is not much of
anything else runnable on the busy-wait CPU, at least in recent kernels.
So which kernel are you running? ;-)

And on the need-resched flag not being set, is it possible that it was
set, but was cleared before you looked at it? After all, the grace
period did end, which means that there was some sort of quiescent state
on the busy-waiting CPU. And one quiescent state would be a pass through
the scheduler, which would clear the need-resched flag.

> For now, in my test I am just setting the need-resched flag manual
> after a busy wait.

Or are you saying that without your setting need-resched, you are
getting RCU CPU stall warnings? Depending on exactly what you have in
your busy-wait loop, that might be expected behavior for
CONFIG_PREEMPT=n kernels.

> But I was thinking, can this really happen in real life? So, say a CPU
> is doing a lot of work in preempt_disable but is diligent enough to
> check need-resched flag periodically. I believe some spin-on-owner
> type locking primitives do this.

I believe that RCU handles this correctly. Of course, after detecting
need-resched, the code must do something that allows the scheduler to
take appropriate action. One approach is to simply call cond_resched()
periodically, which conveniently combines the need-resched check with
the transfer of control to the scheduler.

> Even though the thread is stalling the grace period, it has no clue
> because no one told it that a GP is in progress that is being held up.
> The tick interrupt for that thread returns rcu_need_deferred_qs()
> returns false during the preempt disable section. Can we do better for
> such usecases, such as even sending an IPI to the CPUs holding the
> Grace period? Or even upgrading the grace period to an expedited one
> if need be?

The tick interrupt will invoke rcu_sched_clock_irq(), which should take
care of things. Unless this is a CONFIG_NO_HZ_FULL=y kernel, in which
a CPU running in the kernel might never take a scheduling-clock
interrupt. The RCU grace-period kthread checks for this and takes
appropriate action in rcu_implicit_dynticks_qs().

> Expedited grace periods did not have such issues. However I did notice
> that sometimes the Grace period would end not within 1 busy-wait
> duration but within 2. The distribution was strongly bi-modal to
> 1*busy-wait and 2*busy-wait durations for expedited tests. (This
> expedited test actually happened by accident, because the
> preempt-disable in my loop was delaying init enough that the whole
> test was running during init during which synchronize_rcu is upgraded
> to expedited).

I could imagine all sorts of ways that this might happen, but use of
event tracing or ftrace or trace_printk() might be a good next step
here.

> I am sorry if this is not a realistic real-life problem, but more a
> "doctor it hurts if I do this" problem as Steven once said ;-)

Within the kernel, there are rules that you are supposed to follow,
such as cond_resched() or similar within long-running loops. If you
break those rules, stop doing that. Otherwise, RCU is supposed to
handle it.

Within userspace, anything goes, and RCU is supposed to handle it.
Give or take random writes to /dev/mem and similar, anyway.

> I'll keep poking ;-)

Very good!

							Thanx, Paul
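
The cond_resched() advice above (check need-resched periodically inside
a long-running loop, then let the scheduler run) can be modeled roughly
in userspace. This is only a sketch under stated assumptions:
model_need_resched, model_set_need_resched(), model_cond_resched(), and
model_busy_loop() are hypothetical stand-ins invented for illustration,
not kernel symbols, and the real cond_resched() transfers control to the
scheduler rather than bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
static bool model_need_resched;	/* models the need-resched flag */
static long model_qs_count;	/* quiescent states seen so far */

/* Models the kernel asking the busy CPU to reschedule. */
static void model_set_need_resched(void)
{
	model_need_resched = true;
}

/*
 * Models cond_resched(): if the flag is set, "pass through the
 * scheduler", which clears the flag and counts as a quiescent state.
 * Returns true if a reschedule happened.
 */
static bool model_cond_resched(void)
{
	if (!model_need_resched)
		return false;
	model_need_resched = false;
	model_qs_count++;
	return true;
}

/*
 * Run iters units of work, polling every stride iterations.  The flag
 * is raised at iteration request_at.  Returns how many iterations the
 * request waited before the first quiescent state, or -1 if the loop
 * never noticed it.
 */
static long model_busy_loop(long iters, long stride, long request_at)
{
	long latency = -1;

	assert(stride > 0);
	for (long i = 0; i < iters; i++) {
		if (i == request_at)
			model_set_need_resched();
		/* one unit of real work would go here */
		if (i % stride == 0 && model_cond_resched() && latency < 0)
			latency = i - request_at;
	}
	return latency;
}
```

In this model, model_busy_loop(1000, 100, 250) first polls after the
request at iteration 300 and so reports a wait of 50 iterations, while a
stride longer than the loop itself never notices the request and returns
-1. The point is only that the polling interval bounds how long a
pending request can go unnoticed.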