Date: Wed, 12 Aug 2009 16:40:02 -0300
Subject: The infamous ppc_spurious_interrupt
From: Leonardo Chiquitto
To: linuxppc-dev@lists.ozlabs.org
List-Id: Linux on PowerPC Developers Mail List

Hello,

I'm trying to understand the rationale behind "ppc_spurious_interrupts",
without luck so far. I'm seeing the BAD interrupt counter incrementing
with kernels of different ages (2.6.5, 2.6.16, 2.6.27) and on different
hardware (IBM P5 and P6 64-bit, Apple PPC 32-bit).

Here is the code snippet that performs the increment:

void do_IRQ(struct pt_regs *regs)
{
	(...)
	irq = ppc_md.get_irq();

	if (irq != NO_IRQ && irq != NO_IRQ_IGNORE)
		handle_one_irq(irq);
	else if (irq != NO_IRQ_IGNORE)
		/* That's not SMP safe ... but who cares ? */
		ppc_spurious_interrupts++;
	(...)
}

As far as I can see, the counter is incremented only when
ppc_md.get_irq() returns NO_IRQ. This happens quite frequently here:
32 million "bad" interrupts in ~16 days of uptime, or ~23 int/sec.

I did some research in the archives and found some references, like this
comment from Rob Baxter
(http://marc.info/?l=linuxppc-dev&m=119579052741562&w=2):

> A good example is an interrupt request from a PCI bus device. Many device
> driver interrupt handlers will clear the source of the interrupt by either
> reading or writing some register within the device, perform some necessary
> actions, and return from the handler. The PCI device is slow to negate its
> interrupt request and the interrupt controller sees the interrupt request
> from the device again. With the platforms that I'm associated with I've
> seen this happen more frequently (i.e., BAD interrupts) as processor
> internal core frequencies increase, especially with the 7457.

Is this explanation correct? If so, why do we see this behavior only on
PPC and not on every architecture that supports the PCI bus?

If the "bad interrupts" are harmless (I suppose they are), why are we
counting them and displaying the counter as BAD in /proc/interrupts?

Best regards,
Leonardo
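
P.S. To make my reading of the snippet concrete, here is a small
user-space model of the dispatch decision. It is only a sketch and not
kernel code: struct irq_stats, dispatch_one(), replay() and
bad_per_second() are names I made up for illustration, and the values I
give NO_IRQ and NO_IRQ_IGNORE are stand-ins, since the real definitions
depend on the kernel version. The last helper just redoes the rate
arithmetic above (count divided by uptime in seconds).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values (assumption): the real constants come from the kernel
 * headers and are not the same across the kernel versions mentioned. */
#define NO_IRQ         0u
#define NO_IRQ_IGNORE  ((unsigned int)-1)

/* Hypothetical counters; "spurious" plays the role of
 * ppc_spurious_interrupts in the quoted snippet. */
struct irq_stats {
	unsigned long handled;
	unsigned long spurious;
	unsigned long ignored;
};

/* Same three-way decision as the quoted do_IRQ() fragment:
 * a real IRQ number is handled, NO_IRQ is counted as spurious,
 * NO_IRQ_IGNORE is dropped without touching the BAD counter. */
static void dispatch_one(unsigned int irq, struct irq_stats *st)
{
	assert(st != NULL);

	if (irq != NO_IRQ && irq != NO_IRQ_IGNORE)
		st->handled++;
	else if (irq != NO_IRQ_IGNORE)
		st->spurious++;
	else
		st->ignored++;
}

/* Feed a recorded sequence of get_irq() results through the model. */
static struct irq_stats replay(const unsigned int *irqs, size_t n)
{
	struct irq_stats st = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++)
		dispatch_one(irqs[i], &st);
	return st;
}

/* Average rate: events divided by uptime expressed in seconds. */
static double bad_per_second(double count, double uptime_days)
{
	return count / (uptime_days * 86400.0);
}
```

Replaying a sequence such as { 5, NO_IRQ, NO_IRQ_IGNORE, 7, NO_IRQ }
through replay() counts the two NO_IRQ entries as spurious and nothing
else, which is the behavior I am asking about.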