From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: [PATCH 0/1] NIU: fix spurious interrupts
Date: Tue, 19 May 2009 15:01:56 -0700 (PDT)
Message-ID: <20090519.150156.115978100.davem@davemloft.net>
References: <1242068453-5124-1-git-send-email-hong.pham@windriver.com> <20090518.220911.102225532.davem@davemloft.net> <4A132A0F.8070800@windriver.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, matheos.worku@sun.com
To: hong.pham@windriver.com
In-Reply-To: <4A132A0F.8070800@windriver.com>
List-ID: <netdev.vger.kernel.org>

From: "Hong H. Pham"
Date: Tue, 19 May 2009 17:52:15 -0400

> Unfortunately I don't have a PCIe NIU card to test in an x86 box.
> If the hang does not happen on x86 (which is my suspicion), that
> would rule out a problem with the NIU chip.  That would mean there's
> some interaction between the NIU and sun4v hypervisor that's causing
> the spurious interrupts.

I am still leaning towards the NIU chip, or our programming of it, as
the cause of this behavior.

Although it's possible that the interrupt logic inside of Niagara-T2,
or how it's hooked up to the internal NIU ASIC inside of the CPU,
might be to blame, I don't consider it likely given the basic gist of
the behavior you see.

To quote section 17.3.2 of the UltraSPARC-T2 manual:

	An interrupt will only be issued if the timer is zero, the arm
	bit is set, and one or more LDs in the LDG have their flags
	set and not masked.

which confirms our understanding of how this should work.

Can you test something, Hong?  Simply trigger the hung case, and when
it happens read the LDG registers to see if the ARM bit is set, and
what the LDG mask bits say.
There might be a bug somewhere that causes us to call niu_ldg_rearm()
improperly.  In particular I'm looking at that test done in
niu_interrupt():

	if (likely(v0 & ~((u64)1 << LDN_MIF)))
		niu_schedule_napi(np, lp, v0, v1, v2);
	else
		niu_ldg_rearm(np, lp, 1);

If we call niu_ldg_rearm() on an LDG being serviced by NAPI before
that poll sequence calls napi_complete(), we could definitely see this
weird behavior.  And whatever causes that would be the bug to fix.

Thanks!