From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752656AbbDBLbr (ORCPT ); Thu, 2 Apr 2015 07:31:47 -0400 Received: from mail-wi0-f181.google.com ([209.85.212.181]:34290 "EHLO mail-wi0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750711AbbDBLbq (ORCPT ); Thu, 2 Apr 2015 07:31:46 -0400 Date: Thu, 2 Apr 2015 13:31:41 +0200 From: Ingo Molnar To: Preeti U Murthy Cc: peterz@infradead.org, mpe@ellerman.id.au, tglx@linutronix.de, rjw@rjwysocki.net, nicolas.pitre@linaro.org, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH V2] clockevents: Fix cpu down race for hrtimer based broadcasting Message-ID: <20150402113141.GB14370@gmail.com> References: <20150330092410.24979.59887.stgit@preeti.in.ibm.com> <20150402104226.GB21105@gmail.com> <551D2733.7040108@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <551D2733.7040108@linux.vnet.ibm.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Preeti U Murthy wrote: > On 04/02/2015 04:12 PM, Ingo Molnar wrote: > > > > * Preeti U Murthy wrote: > > > >> It was found when doing a hotplug stress test on POWER, that the machine > >> either hit softlockups or rcu_sched stall warnings. The issue was > >> traced to commit 7cba160ad789a powernv/cpuidle: Redesign idle states > >> management, which exposed the cpu down race with hrtimer based broadcast > >> mode(Commit 5d1638acb9f6(tick: Introduce hrtimer based broadcast). This > >> is explained below. > >> > >> Assume CPU1 is the CPU which holds the hrtimer broadcasting duty before > >> it is taken down. > >> > >> CPU0 CPU1 > >> > >> cpu_down() take_cpu_down() > >> disable_interrupts() > >> > >> cpu_die() > >> > >> while(CPU1 != CPU_DEAD) { > >> msleep(100); > >> switch_to_idle(); > >> stop_cpu_timer(); > >> schedule_broadcast(); > >> } > >> > >> tick_cleanup_cpu_dead() > >> take_over_broadcast() > >> > >> So after CPU1 disabled interrupts it cannot handle the broadcast hrtimer > >> anymore, so CPU0 will be stuck forever. > >> > >> Fix this by explicitly taking over broadcast duty before cpu_die(). > >> This is a temporary workaround. What we really want is a callback in the > >> clockevent device which allows us to do that from the dying CPU by > >> pushing the hrtimer onto a different cpu. That might involve an IPI and > >> is definitely more complex than this immediate fix. > > > > So why not use a suitable CPU_DOWN* notifier for this, instead of open > > coding it all into a random place in the hotplug machinery? > > This is because each of them is unsuitable for a reason: > > 1. CPU_DOWN_PREPARE stage allows for a fail. The cpu in question may not > successfully go down. So we may pull the hrtimer unnecessarily. Failure is really rare - and as long as things will continue to work afterwards it's not a problem to pull the hrtimer to this CPU. Right? Thanks, Ingo