Message-ID: <554701F1.4040306@linux.vnet.ibm.com>
Date: Mon, 04 May 2015 10:51:53 +0530
From: Preeti U Murthy
To: Greg KH
CC: stable@vger.kernel.org, nico@linaro.org, peterz@infradead.org,
    shreyas@linux.vnet.ibm.com, rjw@rjwysocki.net, mpe@ellerman.id.au,
    tglx@linutronix.de, mingo@kernel.org
Subject: Re: [PATCH] clockevents: Fix cpu_down() race for hrtimer based broadcasting
References: <20150428091927.4116.94102.stgit@preeti.in.ibm.com>
 <20150502183534.GA31883@kroah.com>
In-Reply-To: <20150502183534.GA31883@kroah.com>

On 05/03/2015 12:05 AM, Greg KH wrote:
> On Tue, Apr 28, 2015 at 02:49:55PM +0530, Preeti U Murthy wrote:
>> commit 345527b1edce8df719e0884500c76832a18211c3 upstream
>>
>> It was found when doing a hotplug stress test on POWER, that the
>> machine either hit softlockups or rcu_sched stall warnings.
>> The issue was traced to commit:
>>
>>   7cba160ad789 ("powernv/cpuidle: Redesign idle states management")
>>
>> which exposed the cpu_down() race with hrtimer based broadcast mode:
>>
>>   5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
>>
>> The race is the following:
>>
>> Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
>> before it is taken down.
>>
>>   CPU0                          CPU1
>>
>>   cpu_down()                    take_cpu_down()
>>                                 disable_interrupts()
>>
>>   cpu_die()
>>
>>   while (CPU1 != CPU_DEAD) {
>>       msleep(100);
>>       switch_to_idle();
>>       stop_cpu_timer();
>>       schedule_broadcast();
>>   }
>>
>>   tick_cleanup_cpu_dead()
>>       take_over_broadcast()
>>
>> So after CPU1 disabled interrupts it cannot handle the broadcast
>> hrtimer anymore, so CPU0 will be stuck forever.
>>
>> Fix this by explicitly taking over broadcast duty before cpu_die().
>>
>> This is a temporary workaround. What we really want is a callback
>> in the clockevent device which allows us to do that from the dying
>> CPU by pushing the hrtimer onto a different cpu. That might involve
>> an IPI and is definitely more complex than this immediate fix.
>>
>> Changelog was picked up from:
>>
>>     https://lkml.org/lkml/2015/2/16/213
>>
>> Suggested-by: Thomas Gleixner
>> Tested-by: Nicolas Pitre
>> Signed-off-by: Preeti U. Murthy
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: mpe@ellerman.id.au
>> Cc: nicolas.pitre@linaro.org
>> Cc: peterz@infradead.org
>> Cc: rjw@rjwysocki.net
>> Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
>> Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com
>> [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ]
>> Signed-off-by: Ingo Molnar
>> ---
>>
>> Please apply this to 3.19 stable.
>
> What about 4.0 stable?

It needs to be applied to 4.0 as well. I had pulled the stable tree
before posting the patch and did not see a 4.0 branch at that point.
>
> And this doesn't look like it's the same backport, you didn't modify
> tick.h, why not?

That was a mistake, apologies. I am not sure how that hunk got missed.
I have resent the patch with the RESEND tag, now including the missing
hunk; it needs to be applied to both 3.19 and 4.0.

Thank you

Regards
Preeti U Murthy

>
> thanks,
>
> greg k-h
>