Date: Wed, 27 Jun 2018 22:13:34 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com,
	joel@joelfernandes.org
Subject: Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20180628051334.GG3593@linux.vnet.ibm.com>
In-Reply-To: <20180627175134.GV2494@hirez.programming.kicks-ass.net>
References: <20180626171048.2181-13-paulmck@linux.vnet.ibm.com>
	<20180626175119.GL2494@hirez.programming.kicks-ass.net>
	<20180626182950.GH3593@linux.vnet.ibm.com>
	<20180626202615.GA32162@linux.vnet.ibm.com>
	<20180626203225.GT2494@hirez.programming.kicks-ass.net>
	<20180626234004.GQ3593@linux.vnet.ibm.com>
	<20180627091106.GB7184@worktop.programming.kicks-ass.net>
	<20180627094633.GG2512@hirez.programming.kicks-ass.net>
	<20180627155721.GZ3593@linux.vnet.ibm.com>
	<20180627175134.GV2494@hirez.programming.kicks-ass.net>

On Wed, Jun 27, 2018 at 07:51:34PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 27, 2018 at 08:57:21AM -0700, Paul E. McKenney wrote:
> > > Another variant, which simply skips the wakeup whenever ran on an offline
> > > CPU, relying on the wakeup from rcutree_migrate_callbacks() right after
> > > the CPU really is dead.
> >
> > Cute!  ;-)
> >
> > And a much smaller change.
> >
> > However, this means that if someone indirectly and erroneously causes
> > rcu_report_qs_rsp() to be invoked from an offline CPU, the result is an
> > intermittent and difficult-to-debug grace-period hang.  A lockdep splat
> > whose stack trace directly implicates the culprit is much better.
>
> How so? We do an unconditional wakeup right after finding the offline
> cpu dead. There is only very limited code between offline being true and
> the CPU reporting in dead.

I am thinking more generally than this particular patch.  People sometimes
invoke things from places they shouldn't, for example, the situation leading
to your patch that allows use of RCU much earlier in the CPU-online process.
It is nicer to get a splat in those situations than a silent hang.

							Thanx, Paul
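
For illustration only, the trade-off discussed above can be modeled in a few
lines of userspace C.  This is a rough sketch under stated assumptions, not
kernel code and not either of the patches in this thread: struct model_cpu,
struct model_gp, report_qs_skip() and report_qs_warn() are all hypothetical
names, and the fprintf() merely stands in for a lockdep splat.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_cpu {
	bool online;		/* false once the CPU is marked offline */
};

struct model_gp {
	bool wakeup_pending;	/* set when the grace-period thread is woken */
	int offline_reports;	/* complaints about callers on offline CPUs */
};

/*
 * Variant A: silently skip the wakeup when the caller's CPU is offline,
 * relying on some later wakeup to make progress.  An erroneous caller
 * leaves no trace behind.
 */
static void report_qs_skip(struct model_gp *gp, const struct model_cpu *cpu)
{
	if (!cpu->online)
		return;
	gp->wakeup_pending = true;
}

/*
 * Variant B: complain loudly at the point of misuse, then skip.  The
 * message stands in for a splat whose stack trace names the caller.
 */
static void report_qs_warn(struct model_gp *gp, const struct model_cpu *cpu)
{
	if (!cpu->online) {
		gp->offline_reports++;
		fprintf(stderr, "WARNING: quiescent state reported from offline CPU\n");
		return;
	}
	gp->wakeup_pending = true;
}
```

In this model both variants behave identically for an online CPU.  They differ
only for an offline caller: variant A leaves the state untouched with no record
of the misuse, while variant B leaves evidence at the point where the erroneous
call happened.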