From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BCD3DC76188 for ; Fri, 19 Jul 2019 07:43:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 944D32084C for ; Fri, 19 Jul 2019 07:43:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726563AbfGSHnh (ORCPT ); Fri, 19 Jul 2019 03:43:37 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:52356 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726075AbfGSHng (ORCPT ); Fri, 19 Jul 2019 03:43:36 -0400 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x6J7gAo6062870; Fri, 19 Jul 2019 03:43:32 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 2tu7556h9y-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 19 Jul 2019 03:43:32 -0400 Received: from m0098421.ppops.net (m0098421.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.27/8.16.0.27) with SMTP id x6J7hVV0066586; Fri, 19 Jul 2019 03:43:31 -0400 Received: from ppma01wdc.us.ibm.com (fd.55.37a9.ip4.static.sl-reverse.com [169.55.85.253]) by mx0a-001b2d01.pphosted.com with ESMTP id 2tu7556h9g-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 19 Jul 2019 03:43:31 -0400 Received: from pps.filterd (ppma01wdc.us.ibm.com [127.0.0.1]) by ppma01wdc.us.ibm.com (8.16.0.27/8.16.0.27) with SMTP id x6J7dS5C029022; Fri, 19 Jul 2019 07:43:31 GMT Received: from b01cxnp23034.gho.pok.ibm.com (b01cxnp23034.gho.pok.ibm.com [9.57.198.29]) by ppma01wdc.us.ibm.com with ESMTP id 2tq6x6nbbq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 19 Jul 2019 07:43:31 +0000 Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com [9.57.199.108]) by b01cxnp23034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x6J7hUkX53477766 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 19 Jul 2019 07:43:30 GMT Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8CD7BB2066; Fri, 19 Jul 2019 07:43:30 +0000 (GMT) Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 662E1B206B; Fri, 19 Jul 2019 07:43:30 +0000 (GMT) Received: from paulmck-ThinkPad-W541 (unknown [9.80.204.62]) by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP; Fri, 19 Jul 2019 07:43:30 +0000 (GMT) Received: by paulmck-ThinkPad-W541 (Postfix, from userid 1000) id F053816C99C1; Fri, 19 Jul 2019 00:43:29 -0700 (PDT) Date: Fri, 19 Jul 2019 00:43:29 -0700 From: "Paul E. McKenney" To: Joel Fernandes Cc: Byungchul Park , Byungchul Park , rcu , LKML , kernel-team@lge.com Subject: Re: [PATCH] rcu: Make jiffies_till_sched_qs writable Message-ID: <20190719074329.GY14271@linux.ibm.com> Reply-To: paulmck@linux.ibm.com References: <20190712063240.GD7702@X58A-UD3R> <20190712125116.GB92297@google.com> <20190713151330.GE26519@linux.ibm.com> <20190713154257.GE133650@google.com> <20190713174111.GG26519@linux.ibm.com> <20190719003942.GA28226@X58A-UD3R> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-07-19_05:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1907190087 Sender: rcu-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park wrote: > [snip] > > > - There is a bug in the CPU stopper machinery itself preventing it > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > grace period. > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > stopper from being scheduled on Y. > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > scheduler locks such as pi_lock or rq->lock or something. > > > > I think what you mentioned can happen easily. > > > > Basically we would need information about preemption/irq disabled > > sections on Y and scheduler's current activity on every cpu at that time. > > I think all that's needed is an NMI backtrace on all CPUs. An ARM we > don't have NMI solutions and only IPI or interrupt based backtrace > works which should at least catch and the preempt disable and softirq > disable cases. True, though people with systems having hundreds of CPUs might not thank you for forcing an NMI backtrace on each of them. Is it possible to NMI only the ones that are holding up the CPU stopper? Thanx, Paul > But yeah I don't see why just the stacks of those CPUs that are > blocking the CPU X would not suffice for the trivial cases where a > piece of misbehaving code disable interrupts / preemption and > prevented the stopper thread from executing. > > May be once the test case is ready (no rush!) , then it will be more > clear what can help. > > J. >