From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by ozlabs.org (Postfix) with ESMTP id BECF8DDE24;
	Thu, 17 May 2007 14:59:26 +1000 (EST)
Subject: Re: eeh bug
From: Benjamin Herrenschmidt
To: Linas Vepstas
In-Reply-To: <1179377184.32247.274.camel@localhost.localdomain>
References: <1179377184.32247.274.camel@localhost.localdomain>
Content-Type: text/plain
Date: Thu, 17 May 2007 14:59:06 +1000
Message-Id: <1179377946.32247.281.camel@localhost.localdomain>
Mime-Version: 1.0
Cc: linuxppc-dev list
List-Id: Linux on PowerPC Developers Mail List

On Thu, 2007-05-17 at 14:46 +1000, Benjamin Herrenschmidt wrote:
> Hi Linas !
>
> While debugging some other issues, I had a couple of oopses caused by
> what looks like a bug in EEH:
>
> When an RTAS PCI config space call returns all f's, we do an eeh error
> check by calling eeh_dn_check_failure(pdn->node, NULL);
>
> The problem is that second argument... NULL for the pci_dev *. It looks
> like the EEH code will try to printk pci_name of that and later on
> dereference it within eehd, thus causing an oops.

Ok, so I just added an

	if (dev == NULL)
		dev = pdn->pcidev;

to eeh_dn_check_failure(), and that fixes one of the NULL dereferences
(the name printing), but I get another one a bit later, in
pci_find_capability called from eeh_slot_error_detail called from
handle_eeh_events. (Probably in gather_pci_data.)

One thing that looks suspicious is that just before that I see:

	EEH: of node=/pci/@8000000200000d3/pci@2,4

Which is not a device but the bridge above it... not sure why, maybe we
have a NULL pdn->pcidev at that level. We should probably not use
pci_find_capability in that code anyway, and implement our own version
using RTAS in case we don't have a pci_dev around, don't you think ?

Cheers,
Ben.
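
To illustrate the suggestion above (a capability lookup that does not need a pci_dev), here is a minimal, userspace-testable sketch of walking the standard PCI capability list through a config-space read callback. It is not the actual EEH code: the names find_capability, cfg_read_fn, fake_read and fake_init are hypothetical, and in the kernel the callback would presumably wrap the RTAS config read for the device node. Only the register offsets and capability IDs come from the PCI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI config-space offsets and bits (PCI spec). */
#define CFG_STATUS           0x06
#define CFG_STATUS_CAP_LIST  0x10  /* status bit: capability list present */
#define CFG_CAPABILITY_LIST  0x34  /* pointer to first capability */
#define CAP_LIST_ID          0     /* capability ID, relative to entry */
#define CAP_LIST_NEXT        1     /* next pointer, relative to entry */
#define CAP_WALK_TTL         48    /* bound the walk in case the list loops */

/* Hypothetical accessor: read `size` bytes at offset `where`, 0 on success.
 * Assumption: in the kernel this would be backed by an RTAS config read. */
typedef int (*cfg_read_fn)(void *ctx, int where, int size, uint32_t *val);

/* Return the config-space offset of capability `cap`, or 0 if not found. */
static int find_capability(cfg_read_fn rd, void *ctx, int cap)
{
    uint32_t status, pos, id;
    int ttl = CAP_WALK_TTL;

    assert(rd != NULL);

    if (rd(ctx, CFG_STATUS, 2, &status) || !(status & CFG_STATUS_CAP_LIST))
        return 0;
    if (rd(ctx, CFG_CAPABILITY_LIST, 1, &pos))
        return 0;

    while (ttl-- > 0) {
        pos &= ~3u;                 /* low two bits are reserved */
        if (pos < 0x40)             /* capabilities live above the header */
            break;
        if (rd(ctx, (int)pos + CAP_LIST_ID, 1, &id) || id == 0xff)
            break;                  /* read failure or all f's: give up */
        if ((int)id == cap)
            return (int)pos;
        if (rd(ctx, (int)pos + CAP_LIST_NEXT, 1, &pos))
            break;
    }
    return 0;
}

/* Test stand-in: config space is a 256-byte little-endian array. */
static int fake_read(void *ctx, int where, int size, uint32_t *val)
{
    const uint8_t *cfg = ctx;

    if (where < 0 || size < 1 || size > 4 || where + size > 256)
        return -1;
    *val = 0;
    for (int i = 0; i < size; i++)
        *val |= (uint32_t)cfg[where + i] << (8 * i);
    return 0;
}

/* Build a fake device: power management (0x01) at 0x40, PCIe (0x10) at 0x50. */
static void fake_init(uint8_t cfg[256])
{
    memset(cfg, 0, 256);
    cfg[CFG_STATUS] = CFG_STATUS_CAP_LIST;
    cfg[CFG_CAPABILITY_LIST] = 0x40;
    cfg[0x40] = 0x01; cfg[0x41] = 0x50;
    cfg[0x50] = 0x10; cfg[0x51] = 0x00;
}
```

Because the walk only depends on the callback, it also behaves sensibly when every read returns all f's (the EEH failure case described above): the first capability ID reads as 0xff and the lookup returns 0 instead of dereferencing anything.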