From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3xZlB45KFmzDqZr for ; Sun, 20 Aug 2017 15:05:00 +1000 (AEST)
Received: from pps.filterd (m0098413.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id v7K53oDi065726 for ; Sun, 20 Aug 2017 01:04:57 -0400
Received: from e18.ny.us.ibm.com (e18.ny.us.ibm.com [129.33.205.208]) by mx0b-001b2d01.pphosted.com with ESMTP id 2cejvjqbd3-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Sun, 20 Aug 2017 01:04:57 -0400
Received: from localhost by e18.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Sun, 20 Aug 2017 01:04:56 -0400
Date: Sat, 19 Aug 2017 22:04:52 -0700
From: "Paul E. McKenney"
To: Nicholas Piggin
Cc: Michael Ellerman , Jonathan Cameron , dzickus@redhat.com, sfr@canb.auug.org.au, linuxarm@huawei.com, abdhalee@linux.vnet.ibm.com, tglx@linutronix.de, sparclinux@vger.kernel.org, akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org, David Miller , linux-arm-kernel@lists.infradead.org, john.stultz@linaro.org
Subject: Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170731120847.00003d5c@huawei.com> <20170731150411.GA3730@linux.vnet.ibm.com> <20170731162757.000058ba@huawei.com> <20170801184646.GE3730@linux.vnet.ibm.com> <20170802172555.0000468a@huawei.com> <20170815154743.GK7017@linux.vnet.ibm.com> <87wp63smwn.fsf@concordia.ellerman.id.au> <20170816125617.GY7017@linux.vnet.ibm.com> <20170816162731.GA22978@linux.vnet.ibm.com> <20170820144553.2ab2727b@ppc64le>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20170820144553.2ab2727b@ppc64le>
Message-Id: <20170820050452.GI11320@linux.vnet.ibm.com>
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

On Sun, Aug 20, 2017 at 02:45:53PM +1000, Nicholas Piggin wrote:
> On Wed, 16 Aug 2017 09:27:31 -0700
> "Paul E. McKenney" wrote:
> > On Wed, Aug 16, 2017 at 05:56:17AM -0700, Paul E. McKenney wrote:
> >
> > Thomas, John, am I misinterpreting the timer trace event messages?
>
> So I did some digging, and what you find is that rcu_sched seems to do a
> simple schedule_timeout(1) and just goes out to lunch for many seconds.
> The process_timeout timer never fires (when it finally does wake after
> one of these events, it usually removes the timer with del_timer_sync).
>
> So this patch seems to fix it. Testing, comments welcome.

Fired up some tests, which should complete in about 12 hours.

Here is hoping! ;-)

							Thanx, Paul