Date: Fri, 1 Oct 2010 09:17:50 +0200
From: Robert Richter
To: Don Zickus
CC: Stephane Eranian, Cyrill Gorcunov, "mingo@redhat.com", "hpa@zytor.com", "linux-kernel@vger.kernel.org", "yinghai@kernel.org", "andi@firstfloor.org", "peterz@infradead.org", "ying.huang@intel.com", "fweisbec@gmail.com", "ming.m.lin@intel.com", "tglx@linutronix.de", "mingo@elte.hu"
Subject: Re: [tip:perf/urgent] perf, x86: Catch spurious interrupts after disabling counters
Message-ID: <20101001071750.GB13563@erda.amd.com>
In-Reply-To: <20100930194451.GI26290@redhat.com>

On 30.09.10 15:44:51, Don Zickus wrote:
> On Thu, Sep 30, 2010 at 11:12:46AM +0200, Robert Richter wrote:
> > As soon as you stop executing the chain, there are chances to miss an
> > nmi for other parts of the system. There is no way to avoid this.
> > So your argument above is valid also for regular perf nmis and not only
> > for catched-spurious or back-to-back nmis.
>
> I don't agree with that. Most nmi handlers can do a check to see if their
> subsystem triggered an nmi or not. Now we may not catch it in the right
> order because one handler is higher in the chain than the other, but
> ultimately the other handler will get its chance to execute because it
> fired its own nmi (which hasn't been lost).

No, as soon as a handler with higher priority detects an nmi on its own
and handles it, it returns with a stop and all subsequent handlers get
ignored without the chance to check their hardware. So, if perf
consumes an nmi because a counter triggered, there are rare cases in
which other handlers may not be executed.

> Whereas the problem Stephane is describing is that the heuristics of the
> perf counters 'eats' an NMI, thus possibly starving another handler. With
> back-to-back nmis we are at least polite, letting everyone have a chance to
> process the nmi before we indulge ourselves and 'eat' it (if it is still
> around to be eaten).
>
> However in the case of the 'catched-spurious', we selfishly 'eat' the NMI
> without really knowing if it was ours to be eaten. That was the
> difference and the concern.

But, this argument is valid. It would be better to handle
catched-spurious in the 'unknown' path to give other handlers the
chance to check their hardware.

I don't think this is even a show-stopper for v2.6.36, because the perf
handler runs with the lowest priority now. So we will have enough time
after the merge window to improve the code here.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
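
To illustrate the chain behaviour discussed above, here is a small
user-space model. It is only a sketch and not the kernel's notifier
code: toy_handler, toy_handle, toy_run_chain and the TOY_* values are
made-up names, loosely modelled on handlers returning NOTIFY_STOP or
NOTIFY_DONE. It assumes the chain is already sorted by priority and
that each handler can tell whether its own hardware has something
pending.

```c
#include <stddef.h>

/* Hypothetical return codes, modelled on NOTIFY_DONE / NOTIFY_STOP. */
enum toy_ret { TOY_DONE = 0, TOY_STOP = 1 };

/* Hypothetical handler state; not a kernel structure. */
struct toy_handler {
	const char *name;
	int priority;	/* higher value sits earlier in the chain */
	int pending;	/* 1 if this handler's (simulated) hardware fired */
	int calls;	/* how often the handler got to check its hardware */
};

/* Check own hardware; claim the nmi and stop the chain if it was ours. */
static enum toy_ret toy_handle(struct toy_handler *h)
{
	h->calls++;
	if (h->pending) {
		h->pending = 0;
		return TOY_STOP;
	}
	return TOY_DONE;
}

/*
 * Walk the chain once for a single nmi. The array is assumed to be
 * sorted by descending priority. Returns the index of the handler that
 * claimed the nmi, or -1 if nobody did (the 'unknown' path).
 */
static int toy_run_chain(struct toy_handler *chain, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (toy_handle(&chain[i]) == TOY_STOP)
			return (int)i;	/* later handlers are never called */
	return -1;
}
```

With two handlers that both have something pending, one walk of the
chain services only the first; the second is not even called and stays
pending until another nmi causes a further walk. In this model a
handler that returns TOY_STOP without really owning the nmi would hide
the other one in the same way.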