From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from e28smtp01.in.ibm.com (e28smtp01.in.ibm.com [122.248.162.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 723611A014F for ; Mon, 14 Jul 2014 23:02:02 +1000 (EST) Received: from /spool/local by e28smtp01.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 14 Jul 2014 18:31:56 +0530 Received: from d28relay05.in.ibm.com (d28relay05.in.ibm.com [9.184.220.62]) by d28dlp02.in.ibm.com (Postfix) with ESMTP id 378423940043 for ; Mon, 14 Jul 2014 18:31:54 +0530 (IST) Received: from d28av04.in.ibm.com (d28av04.in.ibm.com [9.184.220.66]) by d28relay05.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s6ED2D7T40567022 for ; Mon, 14 Jul 2014 18:32:13 +0530 Received: from d28av04.in.ibm.com (localhost [127.0.0.1]) by d28av04.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s6ED1qmg025511 for ; Mon, 14 Jul 2014 18:31:52 +0530 Date: Mon, 14 Jul 2014 23:01:49 +1000 From: Gavin Shan To: Mike Qiu Subject: Re: [PATCH] Bugfix: powerpc/eeh: Wrong place to call pci_get_slot() Message-ID: <20140714130149.GA4752@shangw> References: <1405325963-1664-1-git-send-email-qiudayu@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <1405325963-1664-1-git-send-email-qiudayu@linux.vnet.ibm.com> Cc: benh@au1.ibm.com, linuxppc-dev@lists.ozlabs.org, gwshan@linux.vnet.ibm.com Reply-To: Gavin Shan List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On Mon, Jul 14, 2014 at 04:19:23AM -0400, Mike Qiu wrote: >[ 121.133381] WARNING: at drivers/pci/search.c:223 >[ 121.133422] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72 >[ 121.133424] task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000 >[ 121.133425] NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114 >[ 121.133427] REGS: c000000001446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+) >[ 121.133428] MSR: 9000000000029032 CR: 48002422 XER: 20000000 >[ 121.133433] CFAR: c00000000003752c SOFTE: 0 >GPR00: c000000000037530 c000000001447220 c000000001448c30 c0000003bca1dc00 >GPR04: 0000000000000000 c000000000066064 9000000000009032 0000000000000008 >GPR08: 0000000000000007 0000000000000001 0000000000000100 000000003003d200 >GPR12: 0000000044002482 c00000000fee0000 0000000000000000 c0000000015e8830 >GPR16: c0000000015e8c30 0000000000000000 c0000000015e8430 c0000000015e8030 >GPR20: c000000001348c30 c000000001482180 0000000000000000 0000000000000000 >GPR24: 0000000000200200 c0000003bc243500 c0000003feff4070 c0000003bcec3000 >GPR28: c0000000014cac00 c0000003bca1dc00 0000000000000000 0000000000000000 >[ 121.133454] NIP [c000000000497b70] .pci_get_slot+0x40/0x110 >[ 121.133457] LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190 >[ 121.133458] Call Trace: >[ 121.133461] [c000000001447220] [c000000000721730] .of_get_property+0x30/0x60 (unreliable) >[ 121.133464] [c0000000014472b0] [c000000000037530] .eeh_pe_loc_get+0x150/0x190 >[ 121.133466] [c000000001447340] [c000000000034684] .eeh_dev_check_failure+0x1b4/0x550 >[ 121.133468] [c0000000014473f0] [c000000000034ab0] .eeh_check_failure+0x90/0xf0 >[ 121.133493] [c000000001447490] [d000000002c03e84] .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc] >[ 121.133501] [c000000001447520] [d000000002c041a4] .lpfc_poll_eratt+0x64/0x100 [lpfc] >[ 121.133504] [c0000000014475a0] [c0000000000b45b4] .call_timer_fn+0x64/0x190 >[ 121.133506] [c000000001447650] [c0000000000b4d1c] .run_timer_softirq+0x2cc/0x3e0 >[ 121.133508] [c000000001447760] [c0000000000a90c8] .__do_softirq+0x198/0x3c0 >[ 121.133510] [c000000001447880] [c0000000000a9658] .irq_exit+0xc8/0x110 >[ 121.133513] [c000000001447900] [c00000000001e010] .timer_interrupt+0xa0/0xe0 >[ 121.133515] [c000000001447980] [c0000000000026d8] decrementer_common+0x158/0x180 >[ 121.133518] --- Exception: 901 at .arch_local_irq_restore+0x74/0x90 > >pci_get_slot() should not be used in interrupt. But eeh subsystem do >the error checking in interrupt in this situation. > >This patch is to solve this issue. > The commit log has been clear enough, but the following message might be better. I'm not good at writing good commit log as well: --- pci_get_slot() is called with hold of PCI bus semaphore and it's not safe to be called in interrupt context. However, we possibly checks EEH error and calls the function in interrupt context. To avoid using pci_get_slot(), we turn into device tree for fetching location code. Otherwise, we might run into WARN_ON() as following messages indicate: WARNING: at drivers/pci/search.c:223 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72 task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000 NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114 REGS: c000000001446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+) MSR: 9000000000029032 CR: 48002422 XER: 20000000 CFAR: c00000000003752c SOFTE: 0 : NIP [c000000000497b70] .pci_get_slot+0x40/0x110 LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190 Call Trace: .of_get_property+0x30/0x60 (unreliable) .eeh_pe_loc_get+0x150/0x190 .eeh_dev_check_failure+0x1b4/0x550 .eeh_check_failure+0x90/0xf0 .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc] .lpfc_poll_eratt+0x64/0x100 [lpfc] .call_timer_fn+0x64/0x190 .run_timer_softirq+0x2cc/0x3e0 >Signed-off-by: Mike Qiu >--- > arch/powerpc/kernel/eeh_pe.c | 29 ++++++++++++++++++++++++++--- > 1 file changed, 26 insertions(+), 3 deletions(-) > >diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c >index fbd01eb..6f4bfee 100644 >--- a/arch/powerpc/kernel/eeh_pe.c >+++ b/arch/powerpc/kernel/eeh_pe.c >@@ -792,6 +792,28 @@ void eeh_pe_restore_bars(struct eeh_pe *pe) > } > > /** >+ * __dn_get_pdev - Retrieve the pci_dev from device_node by bus/devfn >+ * @dn: device_node of the pci_dev >+ * @data: the pci device's bus/devfn >+ * >+ * Retrieve the pci_dev using the given device_node and bus/devfn. >+ */ >+void *__dn_get_pdev(struct device_node *dn, void *data) >+{ The function isn't necessarily public. "static" is enough, I think. I don't think we need this actually. Please refer to more comments below. >+ struct pci_dn *pdn = PCI_DN(dn); >+ int busno = *((int *)data) >> 8; >+ int devfn = *((int *)data) & 0xff; >+ >+ if (!pdn) >+ return NULL; >+ >+ if (pdn->busno == busno && pdn->devfn == devfn) >+ return pdn->pcidev; >+ >+ return NULL; >+} >+ >+/** > * eeh_pe_loc_get - Retrieve location code binding to the given PE > * @pe: EEH PE > * >@@ -807,6 +829,7 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe) > struct pci_dev *pdev; > struct device_node *dn; > const char *loc; >+ int bdevfn; > > if (!bus) > return "N/A"; >@@ -823,7 +846,9 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe) > if (loc) > return loc; > >- pdev = pci_get_slot(bus, 0x0); >+ /* Get the root port */ >+ bdevfn = (bus->number) << 8 || 0x0; >+ pdev = traverse_pci_devices(hose->dn, __dn_get_pdev, &bdevfn); We needn't search pdev from device-tree and then translate it to device-node. Root port is the only child hooked to root bus's device node (it's also PHB's device-node). So I guess you can just have something: /* Check PHB's device node */ dn = pci_bus_to_OF_node(bus); if (unlikely(!dn)) { loc = "N/A"; goto out; } loc = of_get_property(hose->dn, "ibm,loc-code", NULL); if (loc) return loc; loc = of_get_property(hose->dn, "ibm,io-base-loc-code", NULL); if (loc) return loc; /* Check root port */ dn = dn->child; > } else { > pdev = bus->self; Here, we needn't grab the bridge as well: dn = pci_bus_to_OF_node(bus); > } Then check the device-node of bridge (or root port): if (unlikely(!dn)) { loc = "N/A"; goto out; } loc = of_get_property(dn, "ibm,loc-code", NULL); if (!loc) loc = of_get_property(dn, "ibm,slot-location-code", NULL); if (!loc) loc = "N/A"; >@@ -846,8 +871,6 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe) > loc = "N/A"; > > out: >- if (pci_is_root_bus(bus) && pdev) >- pci_dev_put(pdev); > return loc; > } Thanks, Gavin