From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from e34.co.us.ibm.com ([32.97.110.152]) by canuck.infradead.org with esmtps (Exim 4.63 #1 (Red Hat Linux)) id 1Hgxnf-0001Xj-Dv for kexec@lists.infradead.org; Thu, 26 Apr 2007 02:49:59 -0400 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e34.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id l3Q6nBFe007167 for ; Thu, 26 Apr 2007 02:49:11 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.3) with ESMTP id l3Q6nBRT167348 for ; Thu, 26 Apr 2007 00:49:11 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l3Q6nACO022968 for ; Thu, 26 Apr 2007 00:49:11 -0600 Date: Thu, 26 Apr 2007 12:18:06 +0530 From: Vivek Goyal Subject: Re: [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Message-ID: <20070426064806.GA2626@in.ibm.com> References: <1177498984.16078.37.camel@sebastian.intellilink.co.jp> Mime-Version: 1.0 Content-Disposition: inline In-Reply-To: <1177498984.16078.37.camel@sebastian.intellilink.co.jp> Reply-To: vgoyal@in.ibm.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Sender: kexec-bounces@lists.infradead.org Errors-To: kexec-bounces+dwmw2=infradead.org+dwmw2=infradead.org@lists.infradead.org To: Fernando Luis =?iso-8859-1?Q?V=E1zquez?= Cao Cc: kexec@lists.infradead.org, ak@suse.de, linux-kernel@vger.kernel.org, horms@verge.net.au, "Eric W. Biederman" , mbligh@google.com, Keith Owens , akpm@linux-foundation.org T24gV2VkLCBBcHIgMjUsIDIwMDcgYXQgMDg6MDM6MDRQTSArMDkwMCwgRmVybmFuZG8gTHVpcyBW w6F6cXVleiBDYW8gd3JvdGU6Cj4gCj4gc3RhdGljIF9faW5saW5lX18gdm9pZCBhcGljX3dhaXRf aWNyX2lkbGUodm9pZCkKPiB7Cj4gICB3aGlsZSAoYXBpY19yZWFkKEFQSUNfSUNSKSAmIEFQSUNf SUNSX0JVU1kpCj4gICAgIGNwdV9yZWxheCgpOwo+IH0KPiAKPiBUaGUgYnVzeSBsb29wIGluIHRo aXMgZnVuY3Rpb24gd291bGQgbm90IGJlIHByb2JsZW1hdGljIGlmIHRoZQo+IGNvcnJlc3BvbmRp bmcgc3RhdHVzIGJpdCBpbiB0aGUgSUNSIHdlcmUgYWx3YXlzIHVwZGF0ZWQsIGJ1dCB0aGF0IGRv ZXMKPiBub3Qgc2VlbSB0byBiZSB0aGUgY2FzZSB1bmRlciBjZXJ0YWluIGNyYXNoIHNjZW5hcmlv cy4gQXMgYW4gZXhhbXBsZSwKPiB3aGVuIHRoZSBvdGhlciBDUFVzIGFyZSBsb2NrZWQtdXAgaW5z aWRlIHRoZSBOTUkgaGFuZGxlciB0aGUgQ1BVIHRoYXQKPiBzZW5kcyB0aGUgSVBJIHdpbGwgZW5k IHVwIGxvb3BpbmcgZm9yZXZlciBpbiB0aGUgSUNSIGNoZWNrLCBlZmZlY3RpdmVseQo+IGhhcmQt bG9ja2luZyB0aGUgd2hvbGUgc3lzdGVtLgo+IAo+IFF1b3RpbmcgZnJvbSBJbnRlbCdzICJNdWx0 aVByb2Nlc3NvciBTcGVjaWZpY2F0aW9uIiAoVmVyc2lvbiAxLjQpLCBCLTM6Cj4gCj4gIkEgbG9j YWwgQVBJQyB1bml0IGluZGljYXRlcyBzdWNjZXNzZnVsIGRpc3BhdGNoIG9mIGFuIElQSSBieQo+ IHJlc2V0dGluZyB0aGUgRGVsaXZlcnkgU3RhdHVzIGJpdCBpbiB0aGUgSW50ZXJydXB0IENvbW1h bmQKPiBSZWdpc3RlciAoSUNSKS4gVGhlIG9wZXJhdGluZyBzeXN0ZW0gcG9sbHMgdGhlIGRlbGl2 ZXJ5IHN0YXR1cwo+IGJpdCBhZnRlciBzZW5kaW5nIGFuIElOSVQgb3IgU1RBUlRVUCBJUEkgdW50 aWwgdGhlIGNvbW1hbmQgaGFzCj4gYmVlbiBkaXNwYXRjaGVkLgo+IAo+IEEgcGVyaW9kIG9mIDIw IG1pY3Jvc2Vjb25kcyBzaG91bGQgYmUgc3VmZmljaWVudCBmb3IgSVBJIGRpc3BhdGNoCj4gdG8g Y29tcGxldGUgdW5kZXIgbm9ybWFsIG9wZXJhdGluZyBjb25kaXRpb25zLiBJZiB0aGUgSVBJIGlz IG5vdAo+IHN1Y2Nlc3NmdWxseSBkaXNwYXRjaGVkLCB0aGUgb3BlcmF0aW5nIHN5c3RlbSBjYW4g YWJvcnQgdGhlCj4gY29tbWFuZC4gQWx0ZXJuYXRpdmVseSwgdGhlIG9wZXJhdGluZyBzeXN0ZW0g Y2FuIHJldHJ5IHRoZSBJUEkgYnkKPiB3cml0aW5nIHRoZSBsb3dlciAzMi1iaXQgZG91YmxlIHdv cmQgb2YgdGhlIElDUi4gVGhpcyDigJx0aW1lLW91dOKAnQo+IG1lY2hhbmlzbSBjYW4gYmUgaW1w bGVtZW50ZWQgdGhyb3VnaCBhbiBleHRlcm5hbCBpbnRlcnJ1cHQsIGlmCj4gaW50ZXJydXB0cyBh cmUgZW5hYmxlZCBvbiB0aGUgcHJvY2Vzc29yLCBvciB0aHJvdWdoIGV4ZWN1dGlvbiBvZgo+IGFu IGluc3RydWN0aW9uIG9yIHRpbWUtc3RhbXAgY291bnRlciBzcGluIGxvb3AuIgo+IAo+IEludGVs J3MgZG9jdW1lbnRhdGlvbiBzdWdnZXN0cyB0aGUgaW1wbGVtZW50YXRpb24gb2YgYSB0aW1lLW91 dAo+IG1lY2hhbmlzbSwgd2hpY2gsIGJ5IHRoZSB3YXksIGlzIGFscmVhZHkgYmVpbmcgb3Blbi1j b2RlZCBpbiBzb21lIHBhcnRzCj4gb2YgdGhlIGtlcm5lbCB0aGF0IHRpbmtlciB3aXRoIElDUi4K PiAKPiAtLS0gUG9zc2libGUgc29sdXRpb25zCj4gCj4gKiBTb2x1dGlvbiBBOiBJbXBsZW1lbnQg dGhlIHRpbWUtb3V0IG1lY2hhbmlzbSBpbiBhcGljX3dhaXRfaWNyX2lkbGUuCj4gCj4gVGhlIHBy b2JsZW0gd2l0aCB0aGlzIGFwcHJvYWNoIGlzIHRoYXQgaW50cm9kdWNlcyBhIHBlcmZvcm1hbmNl IHBlbmFsdHkKPiB0aGF0IG1heSBub3QgYmUgYWNjZXB0YWJsZSBmb3Igc29tZSBjYWxsZXJzIG9m IGFwaWNfd2FpdF9pY3JfaWRsZS4KPiBCZXNpZGVzLCBkdXJpbmcgbm9ybWFsIG9wZXJhdGlvbiBk ZWxpdmVyeSBlcnJvcnMgc2hvdWxkIG5vdCBvY2N1ci4gVGhpcwo+IGJyaW5ncyB1cyB0byBzb2x1 dGlvbiBCLgo+IAoKSGkgRmVybmFuZG8sCgpIb3cgbXVjaCBpcyB0aGUgcGVyZm9ybWFuY2UgcGVu YWx0eT8gSXMgaXQgcmVhbGx5IHNpZ25pZmljYW50LiBNeSBwb2ludAppcyB0aGF0LCB0byBtZSBj aGFuZ2luZyBhcGljX3dhaXRfaWNyX2RsZSgpIGl0c2VsZiBzZWVtcyB0byBiZSB0aGUgc2ltcGxl CmFwcHJvYWNoIGluc3RlYWQgb2YgaW50cm9kdWNpbmcgYW5vdGhlciBmdW5jdGlvbi4KCk9yaWdp bmFsIGltcGxlbWVudGF0aW9uIGlzOgoKc3RhdGljIF9faW5saW5lX18gdm9pZCBhcGljX3dhaXRf aWNyX2lkbGUodm9pZCkKewoJd2hpbGUgKGFwaWNfcmVhZChBUElDX0lDUikgJiBBUElDX0lDUl9C VVNZKQoJY3B1X3JlbGF4KCk7Cn0KCkFuZCBuZXcgb25lIHdpbGwgbG9vayBzb21ldGhpbmcgbGlr ZS4KCiAgICAgICAgZG8gewogICAgICAgICAgICAgICAgc2VuZF9zdGF0dXMgPSBhcGljX3JlYWQo QVBJQ19JQ1IpICYgQVBJQ19JQ1JfQlVTWTsKICAgICAgICAgICAgICAgIGlmICghc2VuZF9zdGF0 dXMpCiAgICAgICAgICAgICAgICAgICAgICAgIGJyZWFrOwogICAgICAgICAgICAgICAgdWRlbGF5 KDEwMCk7CiAgICAgICAgfSB3aGlsZSAodGltZW91dCsrIDwgMTAwMCk7CgpUaGVyZSB3aWxsIGJl IGF0IG1heCAxMDAgbWljcm9zZWNvbmQgZGVsYXkgYmVmb3JlIHlvdSByZWFsaXplIHRoYXQgSVBJ IGhhcwpiZWVuIGRpc3BhdGNoZWQuIFRvIG9wdGltaXplIGl0IGZ1cnRoZXIgd2UgY2FuIGNoYW5n ZSBpdCB0byAxMCBtaWNyb3NlY29uZApkZWxheQoKICAgICAgICBkbyB7CiAgICAgICAgICAgICAg ICBzZW5kX3N0YXR1cyA9IGFwaWNfcmVhZChBUElDX0lDUikgJiBBUElDX0lDUl9CVVNZOwogICAg ICAgICAgICAgICAgaWYgKCFzZW5kX3N0YXR1cykKICAgICAgICAgICAgICAgICAgICAgICAgYnJl YWs7CiAgICAgICAgICAgICAgICB1ZGVsYXkoMTApOwogICAgICAgIH0gd2hpbGUgKHRpbWVvdXQr KyA8IDEwMDAwKTsKIApvciBtYXkgYmUKCiAgICAgICAgZG8gewogICAgICAgICAgICAgICAgc2Vu ZF9zdGF0dXMgPSBhcGljX3JlYWQoQVBJQ19JQ1IpICYgQVBJQ19JQ1JfQlVTWTsKICAgICAgICAg ICAgICAgIGlmICghc2VuZF9zdGF0dXMpCiAgICAgICAgICAgICAgICAgICAgICAgIGJyZWFrOwog ICAgICAgICAgICAgICAgdWRlbGF5KDEpOwogICAgICAgIH0gd2hpbGUgKHRpbWVvdXQrKyA8IDEw MDAwMCk7CgpJIGRvbid0IGtub3cgaWYgMSBtaWNybyBzZWNvbmQgZGVsYXkgaXMgc3VwcG9ydGVk LiBJIGRvIHNlZSBpdCBiZWluZwp1c2VkIGluIGtlcm5lbC9ocGV0LmMKCklzIGl0IHRvbyBtdWNo IG9mIHBlcmZvcm1hbmNlIG92ZXJoZWFkPyBTb21lYm9keSB3aG8ga25vd3MgbW9yZSBhYm91dCBp dApuZWVkcyB0byB0ZWxsLiBUbyBtZSBjaGFuZ2luZyBhcGljX3dhaXRfaWNyX2lkbGUoKSBzZWVt cyBzaW1wbGUgaW5zdGVhZApvZiBpbnRyb2R1Y2luZyBhIG5ldyBmdW5jdGlvbiBhbmQgdGhlbiBt YWtpbmcgYSBzcGVjaWFsIGNhc2UgZm9yIE5NSS4KClRoYW5rcwpWaXZlawoKX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18Ka2V4ZWMgbWFpbGluZyBsaXN0Cmtl eGVjQGxpc3RzLmluZnJhZGVhZC5vcmcKaHR0cDovL2xpc3RzLmluZnJhZGVhZC5vcmcvbWFpbG1h bi9saXN0aW5mby9rZXhlYwo= From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933099AbXDZGuI (ORCPT ); Thu, 26 Apr 2007 02:50:08 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S933112AbXDZGuH (ORCPT ); Thu, 26 Apr 2007 02:50:07 -0400 Received: from e32.co.us.ibm.com ([32.97.110.150]:36648 "EHLO e32.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933099AbXDZGuE (ORCPT ); Thu, 26 Apr 2007 02:50:04 -0400 Date: Thu, 26 Apr 2007 12:18:06 +0530 From: Vivek Goyal To: Fernando Luis =?iso-8859-1?Q?V=E1zquez?= Cao Cc: "Eric W. Biederman" , Keith Owens , akpm@linux-foundation.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, ak@suse.de, horms@verge.net.au, mbligh@google.com Subject: Re: [RFC PATCH 0/10] apic_wait_icr_idle issues and possible solutions Message-ID: <20070426064806.GA2626@in.ibm.com> Reply-To: vgoyal@in.ibm.com References: <1177498984.16078.37.camel@sebastian.intellilink.co.jp> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1177498984.16078.37.camel@sebastian.intellilink.co.jp> User-Agent: Mutt/1.5.11 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 25, 2007 at 08:03:04PM +0900, Fernando Luis Vázquez Cao wrote: > > static __inline__ void apic_wait_icr_idle(void) > { > while (apic_read(APIC_ICR) & APIC_ICR_BUSY) > cpu_relax(); > } > > The busy loop in this function would not be problematic if the > corresponding status bit in the ICR were always updated, but that does > not seem to be the case under certain crash scenarios. As an example, > when the other CPUs are locked-up inside the NMI handler the CPU that > sends the IPI will end up looping forever in the ICR check, effectively > hard-locking the whole system. > > Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3: > > "A local APIC unit indicates successful dispatch of an IPI by > resetting the Delivery Status bit in the Interrupt Command > Register (ICR). The operating system polls the delivery status > bit after sending an INIT or STARTUP IPI until the command has > been dispatched. > > A period of 20 microseconds should be sufficient for IPI dispatch > to complete under normal operating conditions. If the IPI is not > successfully dispatched, the operating system can abort the > command. Alternatively, the operating system can retry the IPI by > writing the lower 32-bit double word of the ICR. This “time-out” > mechanism can be implemented through an external interrupt, if > interrupts are enabled on the processor, or through execution of > an instruction or time-stamp counter spin loop." > > Intel's documentation suggests the implementation of a time-out > mechanism, which, by the way, is already being open-coded in some parts > of the kernel that tinker with ICR. > > --- Possible solutions > > * Solution A: Implement the time-out mechanism in apic_wait_icr_idle. > > The problem with this approach is that introduces a performance penalty > that may not be acceptable for some callers of apic_wait_icr_idle. > Besides, during normal operation delivery errors should not occur. This > brings us to solution B. > Hi Fernando, How much is the performance penalty? Is it really significant. My point is that, to me changing apic_wait_icr_dle() itself seems to be the simple approach instead of introducing another function. Original implementation is: static __inline__ void apic_wait_icr_idle(void) { while (apic_read(APIC_ICR) & APIC_ICR_BUSY) cpu_relax(); } And new one will look something like. do { send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY; if (!send_status) break; udelay(100); } while (timeout++ < 1000); There will be at max 100 microsecond delay before you realize that IPI has been dispatched. To optimize it further we can change it to 10 microsecond delay do { send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY; if (!send_status) break; udelay(10); } while (timeout++ < 10000); or may be do { send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY; if (!send_status) break; udelay(1); } while (timeout++ < 100000); I don't know if 1 micro second delay is supported. I do see it being used in kernel/hpet.c Is it too much of performance overhead? Somebody who knows more about it needs to tell. To me changing apic_wait_icr_idle() seems simple instead of introducing a new function and then making a special case for NMI. Thanks Vivek