Date: Wed, 29 Sep 2010 16:03:23 -0400
From: Don Zickus
To: Stephane Eranian
Cc: Robert Richter, Cyrill Gorcunov, mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org, yinghai@kernel.org, andi@firstfloor.org, peterz@infradead.org, ying.huang@intel.com, fweisbec@gmail.com, ming.m.lin@intel.com, tglx@linutronix.de, mingo@elte.hu
Subject: Re: [tip:perf/urgent] perf, x86: Catch spurious interrupts after disabling counters
Message-ID: <20100929200323.GC26290@redhat.com>

On Wed, Sep 29, 2010 at 09:42:26PM +0200, Stephane Eranian wrote:
> On Wed, Sep 29, 2010 at 8:12 PM, Don Zickus wrote:
> > Robert,
> >
> > I think you missed Stephane's point. Say for example, kgdb is being used
> > while we are doing stuff with the perf counter (and say kgdb's handler is
> > a lower priority than perf; which isn't true I know, but let's say):
> >
> Yes, exactly my point. The reality is you cannot afford to have a false positive
> because you may starve another subsystem of an important notification.
>
> I think it boils down to whether or not we need an error message (Dazed) in
> case no subsystem claimed the NMI. If you were to just silently consume the
> NMI when no subsystem claims it, then you would not have these issues.
>
> What Don has done is use a heuristic which gets activated when a PMU
> interrupt handler signals that more than one counter has overflowed. His
> claim is that this situation is likely to trigger back-to-back.

Actually it's Robert's heuristic. :-)

>
> The reason this heuristic works is because it waits until ALL the subsystems
> have seen the notification before it declares that the NMI was PMU spurious.
> To do that it uses the DIE_NMI_UNKNOWN callchain. Handlers on this chain
> get called last, after all subsystems have seen the notification once. I believe
> that is the only way to safely "consume" a "spurious" NMI and avoid
> the 'Dazed' message. Anything else runs the risk of starving the other
> subsystems.

I agree.

Cheers,
Don
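
For readers following the thread, the two-pass idea discussed above can be sketched as a small user-space simulation. This is not the kernel code: the enum names, the `deliver_nmi()` dispatcher, the state variables, and the "mark the next NMI by sequence number" bookkeeping are all hypothetical stand-ins, chosen only to illustrate why consuming a suspected spurious NMI on the unknown pass cannot starve a lower-priority handler such as kgdb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two notification passes and handler results. */
enum nmi_event   { EV_NMI, EV_NMI_UNKNOWN };
enum nmi_verdict { NOT_MINE, HANDLED };
enum nmi_outcome { NMI_CLAIMED, NMI_SWALLOWED, NMI_DAZED };

typedef enum nmi_verdict (*nmi_handler)(enum nmi_event ev);

static unsigned long nmi_seq;        /* number of NMIs delivered so far        */
static unsigned long pmu_marked_nmi; /* sequence number PMU expects as spurious */
static int  pmu_pending_overflows;   /* overflowed counters awaiting service    */
static bool kgdb_pending;            /* kgdb has a real NMI to pick up          */

static enum nmi_verdict pmu_handler(enum nmi_event ev)
{
    assert(pmu_pending_overflows >= 0);
    if (ev == EV_NMI) {
        int handled = pmu_pending_overflows;
        pmu_pending_overflows = 0;
        if (handled == 0)
            return NOT_MINE;          /* no overflow: do not claim blindly */
        if (handled > 1)
            pmu_marked_nmi = nmi_seq + 1; /* a back-to-back NMI may follow */
        return HANDLED;
    }
    /* Unknown pass: everyone already declined, so consuming is safe. */
    return nmi_seq == pmu_marked_nmi ? HANDLED : NOT_MINE;
}

static enum nmi_verdict kgdb_handler(enum nmi_event ev)
{
    if (ev == EV_NMI && kgdb_pending) {
        kgdb_pending = false;
        return HANDLED;
    }
    return NOT_MINE;
}

/* Priority order: PMU first, kgdb second (the hypothetical from the thread). */
static nmi_handler handlers[] = { pmu_handler, kgdb_handler };
#define N_HANDLERS (sizeof(handlers) / sizeof(handlers[0]))

static enum nmi_outcome deliver_nmi(void)
{
    size_t i;

    nmi_seq++;
    /* Pass 1: every subsystem gets a chance to claim the NMI. */
    for (i = 0; i < N_HANDLERS; i++)
        if (handlers[i](EV_NMI) == HANDLED)
            return NMI_CLAIMED;
    /* Pass 2: nobody claimed it; a handler may now consume it as spurious. */
    for (i = 0; i < N_HANDLERS; i++)
        if (handlers[i](EV_NMI_UNKNOWN) == HANDLED)
            return NMI_SWALLOWED;
    return NMI_DAZED; /* would print the "Dazed and confused" message */
}
```

In this sketch, if the PMU services two counters in one NMI and kgdb's NMI arrives next, kgdb still claims it on the first pass even though the PMU handler runs ahead of it. Only an NMI that nobody claims, and that immediately follows a multi-counter overflow, is swallowed; any later unclaimed NMI still ends up as 'Dazed'.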