From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751762Ab1LHMnh (ORCPT ); Thu, 8 Dec 2011 07:43:37 -0500
Received: from out2.smtp.messagingengine.com ([66.111.4.26]:43477 "EHLO
	out2.smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751182Ab1LHMng (ORCPT ); Thu, 8 Dec 2011 07:43:36 -0500
X-Sasl-enc: N+Z6tj7zQshzbMjy17ivLFiGfDqTm9vpivn0vQvFjg74 1323348214
Message-ID: <4EE0B156.4080708@ladisch.de>
Date: Thu, 08 Dec 2011 13:45:10 +0100
From: Clemens Ladisch
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20111105 Thunderbird/8.0
MIME-Version: 1.0
To: Jeroen Van den Keybus
CC: "Huang, Shane", Borislav Petkov, "Nguyen, Dong",
	linux-kernel@vger.kernel.org, linux1394-devel@lists.sourceforge.net
Subject: Re: Unhandled IRQs on AMD E-450
References: <20111130154445.GA27198@gere.osrc.amd.com>
	<1E8B869C0C6913418421A406C094DF7C0205358F@sshaexmb1.amd.com>
	<4EDB6C10.10102@ladisch.de> <4EDBA70E.3090905@ladisch.de>
In-Reply-To:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Jeroen Van den Keybus wrote:
> I have the impression that I see the same failure mechanism for both
> IRQs. All goes well for a while, until an IRQ storm starts right
> (e1000: 19 us, firewire-ohci: 39 us) after a valid IRQ.
>
> Therefore there is a strong correlation between the arrival of the
> spurious interrupt, allegedly caused by a mystery device, and the
> earlier arrival of a valid interrupt for a device. Combined with the
> fact that it happens on 2 different IRQs pretty much rules out the
> possibility for me that there is either a mystery device at all, or
> that the existing devices would both be defective, does it not ?

There appears to be a problem with the interrupt handling.

In PCI, interrupts are level-triggered and active-low, which means that
the interrupt line (INTx) is active as long as it's at level 0 and
inactive when it's at level 1. When a device wants to trigger an
interrupt, it outputs zero on its interrupt output. The level doesn't
get reset to 1 until the driver acknowledges the interrupt (in e1000,
read of the ICR; in firewire-ohci, write of IntEventClear). As long as
the line stays at 0, all interrupt handlers will continue being called.
This mechanism allows multiple devices to share one interrupt line.

In PCI Express, there are only one-to-one connections, and there are no
separate interrupt lines. A device raises an interrupt by sending an
interrupt message, which could be understood as a memory write to a
special address at the interrupt controller. Nothing needs to be done
to deactivate the interrupt; if the device has another reason for an
interrupt, it just sends another interrupt message.

When a PCI device is connected to a PCI Express system, the old INTx
interrupt line must be converted to PCI Express messages. This is done
with _two_ special messages, Assert_INTx and Deassert_INTx. The first
tells the interrupt controller that some INTx line went from 1 to 0,
the second tells it that it went from 0 back to 1; this allows the
interrupt controller to implement the level-triggered behaviour.

It appears that some Deassert_INTx messages get lost on your system.
There are no indications of any other missing PCIe packets, so this
looks like a problem with the interrupt handling in your PCI/PCIe
bridge, the ASM1083 chip.

> I also do not understand, if there would be a stuck IRQ line, why I
> can unload and reload e1000 and firewire-ohci without immediately
> getting the same IRQ storm.

Linux will reenable the interrupt line when a new driver attaches to it.
At this point, it's still stuck, but the device initialization will
trigger some actual interrupts, and after the first assert/deassert
pair, the line will be unstuck.


Regards,
Clemens
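
For illustration only, the assert/deassert behaviour described above can
be modelled with a small toy simulation. This is a sketch, not kernel or
bridge code: the names VirtualIntxLine, Device, service and storm_limit
are all made up for this example, and storm_limit is merely a stand-in
for whatever point the kernel gives up on an IRQ that nobody claims.

```python
class VirtualIntxLine:
    """The interrupt controller's view of one INTx line behind a bridge."""

    def __init__(self):
        # True models the wire being at level 0 (active).
        self.asserted = False

    def receive(self, message):
        # The controller only learns the line state from these two messages.
        if message == "Assert_INTx":
            self.asserted = True
        elif message == "Deassert_INTx":
            self.asserted = False


class Device:
    """Hypothetical PCI device sitting behind the PCI/PCIe bridge."""

    def __init__(self, line):
        self.line = line
        self.pending = False

    def raise_irq(self):
        # The device pulls INTx low; the bridge forwards Assert_INTx.
        self.pending = True
        self.line.receive("Assert_INTx")

    def ack(self, deassert_lost=False):
        # The driver acknowledges; the bridge should now send Deassert_INTx.
        self.pending = False
        if not deassert_lost:
            self.line.receive("Deassert_INTx")


def service(line, device, deassert_lost=False, storm_limit=5):
    """Call the handler while the line looks asserted.

    Returns (handled, unhandled). storm_limit is a hypothetical cut-off
    so that the simulated storm terminates.
    """
    handled = unhandled = 0
    while line.asserted and unhandled < storm_limit:
        if device.pending:
            device.ack(deassert_lost)
            handled += 1
        else:
            # Line still asserted, but no device has anything to report.
            unhandled += 1
    return handled, unhandled


line = VirtualIntxLine()
dev = Device(line)

dev.raise_irq()
print(service(line, dev))                      # (1, 0): normal interrupt

dev.raise_irq()
print(service(line, dev, deassert_lost=True))  # (1, 5): storm after a valid IRQ
print(line.asserted)                           # True: line is stuck

dev.raise_irq()                                # driver reload triggers a real IRQ
print(service(line, dev))                      # (1, 0)
print(line.asserted)                           # False: line is unstuck again
```

The second case shows the pattern from your traces: the storm begins
right after a valid interrupt, because that is the moment the deassert
should have arrived. The last case corresponds to reloading the driver,
where the first complete assert/deassert pair clears the stuck state.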