LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 21/22] powerpc/eeh: Process interrupts caused by EEH
From: Gavin Shan @ 2013-05-15  3:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1368588898-16224-1-git-send-email-shangw@linux.vnet.ibm.com>

On PowerNV platform, the EEH event is produced either by detect
on accessing config or I/O registers, or by interrupts dedicated
for EEH report. The patch adds support to process the interrupts
dedicated for EEH report.

Firstly, the kernel thread will be waken up to process incoming
interrupt. The PHBs will be scanned one by one to process all
existing EEH errors. Besides, There're mulple EEH errors that can
be reported from interrupts and we have differentiated actions
against them:

* If the IOC is dead, we will simply panic the system.
* If the PHB is dead, we also simply panic the system.
* If the PHB is fenced, EEH event will be sent to EEH core and
  the fenced PHB is expected to be resetted completely.
* If specific PE has been put into frozen state, EEH event will
  be sent to EEH core so that the PE will be resetted.
* If the error is informational one, we just output the related
  registers for debugging purpose and no more action will be
  taken.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h             |    8 +
 arch/powerpc/platforms/powernv/Makefile    |    2 +-
 arch/powerpc/platforms/powernv/pci-err.c   |  475 ++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/eeh_event.c |    8 +
 4 files changed, 492 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-err.c

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 05b70dc..7d0dfbf 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -212,6 +212,14 @@ void eeh_add_device_tree_late(struct pci_bus *);
 void eeh_add_sysfs_files(struct pci_bus *);
 void eeh_remove_bus_device(struct pci_dev *, int);
 
+#ifdef CONFIG_PPC_POWERNV
+void pci_err_event(void);
+void pci_err_release(void);
+#else
+static inline void pci_err_event(void) { }
+static inline void pci_err_release(void) { }
+#endif
+
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
  *
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 7fe5951..912fa7c 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,4 +3,4 @@ obj-y			+= opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
-obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
+obj-$(CONFIG_EEH)	+= pci-err.o eeh-ioda.o eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/pci-err.c b/arch/powerpc/platforms/powernv/pci-err.c
new file mode 100644
index 0000000..7dbffb1
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/pci-err.c
@@ -0,0 +1,475 @@
+/*
+ * The file instends to handle those interrupts dedicated for error
+ * detection from IOC chips. Currently, we only support P7IOC and
+ * need support more IOC chips in the future. The interrupts have
+ * been exported to hypervisor through "opal-interrupts" of "ibm,opal"
+ * OF node. When one of them comes in, the hypervisor simply turns
+ * to the firmware and expects the appropriate events returned. In
+ * turn, we will format one message and queue that in order to process
+ * it at later point.
+ *
+ * On the other hand, we need maintain information about the states
+ * of IO HUBs and their associated PHBs. The information would be
+ * shared by hypervisor and guests in future. While hypervisor or guests
+ * accessing IO HUBs, PHBs and PEs, the state should be checked and
+ * return approriate results. That would benefit EEH RTAS emulation in
+ * hypervisor as well.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/string.h>
+#include <linux/semaphore.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/irq.h>
+#include <linux/io.h>
+#include <linux/kthread.h>
+#include <linux/msi.h>
+
+#include <asm/firmware.h>
+#include <asm/sections.h>
+#include <asm/io.h>
+#include <asm/prom.h>
+#include <asm/pci-bridge.h>
+#include <asm/machdep.h>
+#include <asm/msi_bitmap.h>
+#include <asm/ppc-pci.h>
+#include <asm/opal.h>
+#include <asm/iommu.h>
+#include <asm/tce.h>
+#include <asm/eeh_event.h>
+#include <asm/eeh.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+/* Debugging option */
+#ifdef PCI_ERR_DEBUG_ON
+#define PCI_ERR_DBG(args...)	pr_info(args)
+#else
+#define PCI_ERR_DBG(args...)
+#endif
+
+static struct task_struct *pci_err_thread;
+static struct semaphore pci_err_int_sem;
+static struct semaphore pci_err_seq_sem;
+static char *pci_err_diag;
+
+/**
+ * pci_err_event - Report PCI error event
+ * @type: event type
+ *
+ * The function is used for interrupt handler of PCI error IRQs to report
+ * event. As the result, the kthread will be started to handle the PCI
+ * error.
+ */
+void pci_err_event(void)
+{
+	/* Notify kthread to process error */
+	up(&pci_err_int_sem);
+}
+
+static void pci_err_take(void)
+{
+	down(&pci_err_seq_sem);
+}
+
+/**
+ * pci_err_release - Enable error report for sending events
+ *
+ * We're hanlding the EEH event one by one. Each time, there only has
+ * one EEH event caused by error IRQ. The function is called to enable
+ * error report in order to send more EEH events.
+ */
+void pci_err_release(void)
+{
+	up(&pci_err_seq_sem);
+}
+
+/*
+ * When we get global interrupts (e.g. P7IOC RGC), PCI error happens
+ * in critical component of the IOC or PHB. For the formal case, the
+ * firmware just returns OPAL_PCI_ERR_CLASS_HUB and we needn't proceed.
+ * For the late case, we probably need reset one particular PHB. For
+ * that, we're doing is to send EEH event to the toppset PE of that
+ * problematic PHB so that the PHB can be reset by the EEH core.
+ */
+static int pci_err_check_phb(struct pci_controller *hose)
+{
+	struct eeh_pe *phb_pe;
+
+	/* Find the PHB PE */
+	phb_pe = eeh_phb_pe_get(hose);
+	if (!phb_pe) {
+		pr_debug("%s Can't find PE for PHB#%d\n",
+			__func__, hose->global_number);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PHB#%d PE found\n",
+		hose->global_number);
+
+	/*
+	 * Fence the PHB and send one event to EEH core
+	 * for further processing. We have to fence the
+	 * PHB here because the EEH core always return
+	 * normal state for PHB PE, so we can't do it
+	 * through EEH core.
+	 */
+	if (!(phb_pe->state & EEH_PE_ISOLATED)) {
+		PCI_ERR_DBG("PCI_ERR: Fence PHB#%x and send event "
+			    "to EEH core\n", hose->global_number);
+		eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
+		eeh_send_failure_event(phb_pe, EEH_EVENT_INT);
+	} else {
+		pci_err_release();
+	}
+
+	return 0;
+}
+
+/*
+ * When we get interrupts from PHB, there are probablly some PEs that
+ * have been put into frozen state. What we need do is sent one message
+ * to the EEH device, no matter which one it is, so that the EEH core
+ * can check it out and do PE reset accordingly.
+ */
+static int pci_err_check_pe(struct pci_controller *hose, u16 pe_no)
+{
+	struct eeh_pe *phb_pe, *pe;
+	struct eeh_dev dev, *edev;
+
+	/* Find the PHB PE */
+	phb_pe = eeh_phb_pe_get(hose);
+	if (!phb_pe) {
+		pr_warning("%s Can't find PE for PHB#%d\n",
+			__func__, hose->global_number);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PHB#%d PE found\n",
+		hose->global_number);
+
+	/*
+	 * If the PHB has been put into fenced state, we
+	 * needn't send the duplicate event because the
+	 * whole PHB is going to take reset.
+	 */
+	if (phb_pe->state & EEH_PE_ISOLATED)
+		return 0;
+
+	/* Find the PE according to PE# */
+	memset(&dev, 0, sizeof(struct eeh_dev));
+	dev.phb = hose;
+	dev.pe_config_addr = pe_no;
+	pe = eeh_pe_get(&dev);
+	if (!pe) {
+		pr_debug("%s: Can't find PE for PHB#%x - PE#%x\n",
+			__func__, hose->global_number, pe_no);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PE (%x) found for PHB#%x - PE#%x\n",
+		pe->addr, hose->global_number, pe_no);
+
+	/*
+	 * It doesn't matter which EEH device to get
+	 * the message. Just pick up the one on the
+	 * toppest position.
+	 */
+	edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
+	if (!edev) {
+		pr_err("%s: No EEH devices hooked on PHB#%x - PE#%x\n",
+			__func__, hose->global_number, pe_no);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: First EEH device found on PHB#%x - PE#%x\n",
+		hose->global_number, pe_no);
+
+	if (!eeh_dev_check_failure(edev, EEH_EVENT_INT))
+		pci_err_release();
+
+	return 0;
+}
+
+static void pci_err_hub_diag_common(struct OpalIoP7IOCErrorData *data)
+{
+	/* GEM */
+	pr_info("  GEM XFIR:        %016llx\n", data->gemXfir);
+	pr_info("  GEM RFIR:        %016llx\n", data->gemRfir);
+	pr_info("  GEM RIRQFIR:     %016llx\n", data->gemRirqfir);
+	pr_info("  GEM Mask:        %016llx\n", data->gemMask);
+	pr_info("  GEM RWOF:        %016llx\n", data->gemRwof);
+
+	/* LEM */
+	pr_info("  LEM FIR:         %016llx\n", data->lemFir);
+	pr_info("  LEM Error Mask:  %016llx\n", data->lemErrMask);
+	pr_info("  LEM Action 0:    %016llx\n", data->lemAction0);
+	pr_info("  LEM Action 1:    %016llx\n", data->lemAction1);
+	pr_info("  LEM WOF:         %016llx\n", data->lemWof);
+}
+
+static void pci_err_hub_diag_data(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct OpalIoP7IOCErrorData *data;
+	long ret;
+
+	data = (struct OpalIoP7IOCErrorData *)pci_err_diag;
+	ret = opal_pci_get_hub_diag_data(phb->hub_id, data, PAGE_SIZE);
+	if (ret != OPAL_SUCCESS) {
+		pr_warning("%s: Failed to get HUB#%llx diag-data, ret=%ld\n",
+			__func__, phb->hub_id, ret);
+		return;
+	}
+
+	/* Check the error type */
+	if (data->type <= OPAL_P7IOC_DIAG_TYPE_NONE ||
+	    data->type >= OPAL_P7IOC_DIAG_TYPE_LAST) {
+		pr_warning("%s: Invalid type of HUB#%llx diag-data (%d)\n",
+			__func__, phb->hub_id, data->type);
+		return;
+	}
+
+	switch (data->type) {
+	case OPAL_P7IOC_DIAG_TYPE_RGC:
+		pr_info("P7IOC diag-data for RGC\n\n");
+		pci_err_hub_diag_common(data);
+		pr_info("  RGC Status:      %016llx\n", data->rgc.rgcStatus);
+		pr_info("  RGC LDCP:        %016llx\n", data->rgc.rgcLdcp);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_BI:
+		pr_info("P7IOC diag-data for BI %s\n\n",
+			data->bi.biDownbound ? "Downbound" : "Upbound");
+		pci_err_hub_diag_common(data);
+		pr_info("  BI LDCP 0:       %016llx\n", data->bi.biLdcp0);
+		pr_info("  BI LDCP 1:       %016llx\n", data->bi.biLdcp1);
+		pr_info("  BI LDCP 2:       %016llx\n", data->bi.biLdcp2);
+		pr_info("  BI Fence Status: %016llx\n", data->bi.biFenceStatus);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_CI:
+		pr_info("P7IOC diag-data for CI Port %d\\nn",
+			data->ci.ciPort);
+		pci_err_hub_diag_common(data);
+		pr_info("  CI Port Status:  %016llx\n", data->ci.ciPortStatus);
+		pr_info("  CI Port LDCP:    %016llx\n", data->ci.ciPortLdcp);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_MISC:
+		pr_info("P7IOC diag-data for MISC\n\n");
+		pci_err_hub_diag_common(data);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_I2C:
+		pr_info("P7IOC diag-data for I2C\n\n");
+		pci_err_hub_diag_common(data);
+		break;
+	}
+}
+
+static void pci_err_phb_diag_data(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct OpalIoP7IOCPhbErrorData *data;
+	int i;
+	long ret;
+
+	data = (struct OpalIoP7IOCPhbErrorData *)pci_err_diag;
+	ret = opal_pci_get_phb_diag_data(phb->opal_id, data, PAGE_SIZE);
+	if (ret != OPAL_SUCCESS) {
+		pr_warning("%s: Failed to get diag-data for PHB#%x, ret=%ld\n",
+			__func__, hose->global_number, ret);
+		return;
+	}
+
+	pr_info("PHB#%x Diag-data\n\n", hose->global_number);
+	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
+
+	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
+	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
+	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
+
+	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
+	pr_info("  slotStatus:           %08x\n", data->slotStatus);
+	pr_info("  linkStatus:           %08x\n", data->linkStatus);
+	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
+	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
+
+	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
+	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
+	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
+	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
+	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
+	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
+	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
+	pr_info("  sourceId:             %08x\n", data->sourceId);
+
+	pr_info("  errorClass:           %016llx\n", data->errorClass);
+	pr_info("  correlator:           %016llx\n", data->correlator);
+	pr_info("  p7iocPlssr:           %016llx\n", data->p7iocPlssr);
+	pr_info("  p7iocCsr:             %016llx\n", data->p7iocCsr);
+	pr_info("  lemFir:               %016llx\n", data->lemFir);
+	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
+	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
+	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
+	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
+	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
+	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
+	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
+	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
+	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
+	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
+	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
+	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
+	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
+	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
+	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
+	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
+	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
+	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
+
+	for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) {
+		if ((data->pestA[i] >> 63) == 0 &&
+		    (data->pestB[i] >> 63) == 0)
+			continue;
+
+		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
+		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
+	}
+}
+
+/*
+ * Process PCI errors from IOC, PHB, or PE. Here's the list
+ * of expected error types and their severities, as well as
+ * the corresponding action.
+ *
+ * Type                        Severity                Action
+ * OPAL_EEH_ERROR_IOC  OPAL_EEH_SEV_IOC_DEAD   panic
+ * OPAL_EEH_ERROR_IOC  OPAL_EEH_SEV_INF        diag_data
+ * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_PHB_DEAD   panic
+ * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_PHB_FENCED eeh
+ * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_INF        diag_data
+ * OPAL_EEH_ERROR_PE   OPAL_EEH_SEV_PE_ER      eeh
+ */
+static void pci_err_process(struct pci_controller *hose,
+			u16 err_type, u16 severity, u16 pe_no)
+{
+	PCI_ERR_DBG("PCI_ERR: Process error (%d, %d, %d) on PHB#%x\n",
+		err_type, severity, pe_no, hose->global_number);
+
+	switch (err_type) {
+	case OPAL_EEH_IOC_ERROR:
+		if (severity == OPAL_EEH_SEV_IOC_DEAD)
+			panic("Dead IOC of PHB#%x", hose->global_number);
+		else if (severity == OPAL_EEH_SEV_INF) {
+			pci_err_hub_diag_data(hose);
+			pci_err_release();
+		}
+
+		break;
+	case OPAL_EEH_PHB_ERROR:
+		if (severity == OPAL_EEH_SEV_PHB_DEAD)
+			panic("Dead PHB#%x", hose->global_number);
+		else if (severity == OPAL_EEH_SEV_PHB_FENCED)
+			pci_err_check_phb(hose);
+		else if (severity == OPAL_EEH_SEV_INF) {
+			pci_err_phb_diag_data(hose);
+			pci_err_release();
+		}
+
+		break;
+	case OPAL_EEH_PE_ERROR:
+		pci_err_check_pe(hose, pe_no);
+		break;
+	}
+}
+
+static int pci_err_handler(void *dummy)
+{
+	struct pnv_phb *phb;
+	struct pci_controller *hose, *tmp;
+	u64 frozen_pe_no;
+	u16 err_type, severity;
+	long ret;
+
+	while (!kthread_should_stop()) {
+		down(&pci_err_int_sem);
+		PCI_ERR_DBG("PCI_ERR: Get PCI error semaphore\n");
+
+		list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+			phb = hose->private_data;
+restart:
+			pci_err_take();
+			ret = opal_pci_next_error(phb->opal_id,
+					&frozen_pe_no, &err_type, &severity);
+
+			/* If OPAL API returns error, we needn't proceed */
+			if (ret != OPAL_SUCCESS) {
+				PCI_ERR_DBG("PCI_ERR: Invalid return value on "
+					    "PHB#%x (0x%lx) from opal_pci_next_error",
+					    hose->global_number, ret);
+				pci_err_release();
+				continue;
+			}
+
+			/* If the PHB doesn't have error, stop processing */
+			if (err_type == OPAL_EEH_NO_ERROR ||
+			    severity == OPAL_EEH_SEV_NO_ERROR) {
+				PCI_ERR_DBG("PCI_ERR: No error found on PHB#%x\n",
+					hose->global_number);
+				pci_err_release();
+				continue;
+			}
+
+			/*
+			 * Process the error until there're no pending
+			 * errors on the specific PHB.
+			 */
+			pci_err_process(hose, err_type, severity, frozen_pe_no);
+			goto restart;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * pci_err_init - Initialize PCI error handling component
+ *
+ * It should be done before OPAL interrupts got registered because
+ * that depends on this.
+ */
+static int __init pci_err_init(void)
+{
+	int ret = -ENOMEM;
+
+	if (!firmware_has_feature(FW_FEATURE_OPAL))
+		return ret;
+
+	pci_err_diag = (char *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+	if (!pci_err_diag) {
+		pr_err("%s: Failed to alloc memory for diag data\n",
+			__func__);
+		return ret;
+	}
+
+	/* Initialize semaphore and start kthread */
+	sema_init(&pci_err_int_sem, 0);
+	sema_init(&pci_err_seq_sem, 1);
+	pci_err_thread = kthread_run(pci_err_handler, NULL, "PCI_ERR");
+	if (IS_ERR(pci_err_thread)) {
+		free_page((unsigned long)pci_err_diag);
+		ret = PTR_ERR(pci_err_thread);
+		pr_err("%s: Failed to start kthread, ret=%d\n",
+			__func__, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+arch_initcall(pci_err_init);
diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
index 1f86b80..e4c636e 100644
--- a/arch/powerpc/platforms/pseries/eeh_event.c
+++ b/arch/powerpc/platforms/pseries/eeh_event.c
@@ -84,6 +84,14 @@ static int eeh_event_handler(void * dummy)
 	eeh_handle_event(pe);
 	eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
 
+	/*
+	 * If it's the event caused by error reporting IRQ,
+	 * we need release the module so that precedent events
+	 * could be fired.
+	 */
+	if (event->flag & EEH_EVENT_INT)
+		pci_err_release();
+
 	kfree(event);
 	mutex_unlock(&eeh_event_mutex);
 
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 22/22] powerpc/eeh: Connect EEH error interrupt handle
From: Gavin Shan @ 2013-05-15  3:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1368588898-16224-1-git-send-email-shangw@linux.vnet.ibm.com>

The EEH error interrupts should have been exported by firmware
through device tree. The OS already installed the interrupt
handler (opal_interrupt()) for them. The patch checks if we have
any existing EEH errors and wakes the kernel thread up to process
that if possible.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/opal.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index ade4463..75acc6d 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -17,6 +17,8 @@
 #include <linux/interrupt.h>
 #include <asm/opal.h>
 #include <asm/firmware.h>
+#include <asm/io.h>
+#include <asm/eeh.h>
 
 #include "powernv.h"
 
@@ -282,6 +284,10 @@ static irqreturn_t opal_interrupt(int irq, void *data)
 	uint64_t events;
 
 	opal_handle_interrupt(virq_to_hw(irq), &events);
+#ifdef CONFIG_EEH
+	if (events & OPAL_EVENT_PCI_ERROR)
+		pci_err_event();
+#endif
 
 	/* XXX TODO: Do something with the events */
 
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 20/22] powerpc/eeh: Enable EEH check for config access
From: Gavin Shan @ 2013-05-15  3:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1368588898-16224-1-git-send-email-shangw@linux.vnet.ibm.com>

The patch enables EEH check and let EEH core to process the EEH
errors for PowerNV platform while accessing config space. Originally,
the implementation already had mechanism to check EEH errors and
tried to recover from them. However, we never let EEH core to handle
the EEH errors.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci.c |   41 +++++++++++++++++++++++++++++++++-
 1 files changed, 40 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 55dfca844..15144e4 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -32,6 +32,8 @@
 #include <asm/iommu.h>
 #include <asm/tce.h>
 #include <asm/firmware.h>
+#include <asm/eeh_event.h>
+#include <asm/eeh.h>
 
 #include "powernv.h"
 #include "pci.h"
@@ -254,6 +256,10 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
+#ifdef CONFIG_EEH
+	struct device_node *busdn, *dn;
+	struct eeh_pe *phb_pe = NULL;
+#endif
 	u32 bdfn = (((uint64_t)bus->number) << 8) | devfn;
 	s64 rc;
 
@@ -286,8 +292,35 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 	cfg_dbg("pnv_pci_read_config bus: %x devfn: %x +%x/%x -> %08x\n",
 		bus->number, devfn, where, size, *val);
 
-	/* Check if the PHB got frozen due to an error (no response) */
+	/*
+	 * Check if the specified PE has been put into frozen
+	 * state. On the other hand, we needn't do that while
+	 * the PHB has been put into frozen state because of
+	 * PHB-fatal errors.
+	 */
+#ifdef CONFIG_EEH
+	phb_pe = eeh_phb_pe_get(hose);
+	if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
+		return PCIBIOS_SUCCESSFUL;
+
+	if (phb->eeh_enabled) {
+		if (*val == EEH_IO_ERROR_VALUE(size)) {
+			busdn = pci_bus_to_OF_node(bus);
+			for (dn = busdn->child; dn; dn = dn->sibling) {
+				struct pci_dn *pdn = PCI_DN(dn);
+
+				if (pdn && pdn->devfn == devfn &&
+					eeh_dev_check_failure(of_node_to_eeh_dev(dn),
+							EEH_EVENT_NORMAL))
+					return PCIBIOS_DEVICE_NOT_FOUND;
+			}
+		}
+	} else {
+		pnv_pci_config_check_eeh(phb, bus, bdfn);
+	}
+#else
 	pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
@@ -318,8 +351,14 @@ static int pnv_pci_write_config(struct pci_bus *bus,
 	default:
 		return PCIBIOS_FUNC_NOT_SUPPORTED;
 	}
+
 	/* Check if the PHB got frozen due to an error (no response) */
+#ifdef CONFIG_EEH
+	if (!phb->eeh_enabled)
+		pnv_pci_config_check_eeh(phb, bus, bdfn);
+#else
 	pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 19/22] powerpc/eeh: Initialization for PowerNV
From: Gavin Shan @ 2013-05-15  3:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1368588898-16224-1-git-send-email-shangw@linux.vnet.ibm.com>

The patch initializes EEH for PowerNV platform. Because the OPAL
APIs requires HUB ID, we need trace that through struct pnv_phb.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c   |   17 ++++++++++++++---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |    6 ++++--
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1da578b..4dc4027 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -978,6 +978,11 @@ static void pnv_pci_ioda_fixup(void)
 	pnv_pci_ioda_setup_PEs();
 	pnv_pci_ioda_setup_seg();
 	pnv_pci_ioda_setup_DMA();
+
+#ifdef CONFIG_EEH
+	eeh_addr_cache_build();
+	eeh_init();
+#endif
 }
 
 /*
@@ -1048,7 +1053,9 @@ static u32 pnv_ioda_bdfn_to_pe(struct pnv_phb *phb, struct pci_bus *bus,
 	return phb->ioda.pe_rmap[(bus->number << 8) | devfn];
 }
 
-void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
+void __init pnv_pci_init_ioda_phb(struct device_node *np,
+				  u64 hub_id,
+				  int ioda_type)
 {
 	struct pci_controller *hose;
 	static int primary = 1;
@@ -1086,6 +1093,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 	hose->first_busno = 0;
 	hose->last_busno = 0xff;
 	hose->private_data = phb;
+	phb->hub_id = hub_id;
 	phb->opal_id = phb_id;
 	phb->type = ioda_type;
 
@@ -1171,6 +1179,9 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 		phb->ioda.io_size, phb->ioda.io_segsize);
 
 	phb->hose->ops = &pnv_pci_ops;
+#ifdef CONFIG_EEH
+	phb->eeh_ops = &ioda_eeh_ops;
+#endif
 
 	/* Setup RID -> PE mapping function */
 	phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe;
@@ -1208,7 +1219,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 
 void pnv_pci_init_ioda2_phb(struct device_node *np)
 {
-	pnv_pci_init_ioda_phb(np, PNV_PHB_IODA2);
+	pnv_pci_init_ioda_phb(np, 0, PNV_PHB_IODA2);
 }
 
 void __init pnv_pci_init_ioda_hub(struct device_node *np)
@@ -1231,6 +1242,6 @@ void __init pnv_pci_init_ioda_hub(struct device_node *np)
 	for_each_child_of_node(np, phbn) {
 		/* Look for IODA1 PHBs */
 		if (of_device_is_compatible(phbn, "ibm,ioda-phb"))
-			pnv_pci_init_ioda_phb(phbn, PNV_PHB_IODA1);
+			pnv_pci_init_ioda_phb(phbn, hub_id, PNV_PHB_IODA1);
 	}
 }
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index 92b37a0..ae72616 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -92,7 +92,7 @@ static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
 	set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table);
 }
 
-static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
+static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
 					   void *tce_mem, u64 tce_size)
 {
 	struct pnv_phb *phb;
@@ -133,6 +133,7 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
 	phb->hose->first_busno = 0;
 	phb->hose->last_busno = 0xff;
 	phb->hose->private_data = phb;
+	phb->hub_id = hub_id;
 	phb->opal_id = phb_id;
 	phb->type = PNV_PHB_P5IOC2;
 	phb->model = PNV_PHB_MODEL_P5IOC2;
@@ -226,7 +227,8 @@ void __init pnv_pci_init_p5ioc2_hub(struct device_node *np)
 	for_each_child_of_node(np, phbn) {
 		if (of_device_is_compatible(phbn, "ibm,p5ioc2-pcix") ||
 		    of_device_is_compatible(phbn, "ibm,p5ioc2-pciex")) {
-			pnv_pci_init_p5ioc2_phb(phbn, tce_mem, tce_per_phb);
+			pnv_pci_init_p5ioc2_phb(phbn, hub_id,
+					tce_mem, tce_per_phb);
 			tce_mem += tce_per_phb;
 		}
 	}
-- 
1.7.5.4

^ permalink raw reply related

* Re: [PATCH v1 00/22] powerpc/eeh: Enhance converting EEH dev
From: Benjamin Herrenschmidt @ 2013-05-15  3:57 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev
In-Reply-To: <1368588898-16224-1-git-send-email-shangw@linux.vnet.ibm.com>

On Wed, 2013-05-15 at 11:34 +0800, Gavin Shan wrote:
> e don't have existing utility (e.g. errinjct) to test the patchset. In order
> to conduct the test, you need copy over the eeh-debug.c to PowerNV platform
> directory and change the makefile accordingly. Please contact me to get the
> eeh-debug.c if you want run the test case. After that, you need write P7IOC
> registers explicitly to trigger frozen PE or fenced PHB explicitly as the
> following example shows. The patchset has been verified on Firebird-L machine
> where I have 2 Emulex ethernet card on PHB#6. I keep pinging to one of the
> ethernet cards from external and then use following commands to produce frozen
> PE or fenced PHB errors. Eventually, the errors can be recovered and the ethernet
> card is reachable after temporary connection lost.

There is an error injection framework we can use nowadays, or maybe you can
put this in tools/powerpc ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH v1 00/22] powerpc/eeh: Enhance converting EEH dev
From: Gavin Shan @ 2013-05-15  5:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1368590264.31689.69.camel@pasglop>

On Wed, May 15, 2013 at 01:57:44PM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-05-15 at 11:34 +0800, Gavin Shan wrote:
>> e don't have existing utility (e.g. errinjct) to test the patchset. In order
>> to conduct the test, you need copy over the eeh-debug.c to PowerNV platform
>> directory and change the makefile accordingly. Please contact me to get the
>> eeh-debug.c if you want run the test case. After that, you need write P7IOC
>> registers explicitly to trigger frozen PE or fenced PHB explicitly as the
>> following example shows. The patchset has been verified on Firebird-L machine
>> where I have 2 Emulex ethernet card on PHB#6. I keep pinging to one of the
>> ethernet cards from external and then use following commands to produce frozen
>> PE or fenced PHB errors. Eventually, the errors can be recovered and the ethernet
>> card is reachable after temporary connection lost.
>
>There is an error injection framework we can use nowadays, or maybe you can
>put this in tools/powerpc ?
>

Ben, we don't have error injection framework yet. I have one source file "eeh-debug.c"
which exports P7IOC registers through procfs. In order to force EEH errors (frozen PE
or fenced PHB), we need change the specific bits of corresponding HW registers through
the procfs entries (e.g. /proc/IODA/PHBx/REG).

Thanks,
Gavin

^ permalink raw reply

* Re: [PATCH] powerpc: Update currituck pci/usb fixup for new board revision
From: Tony Breeds @ 2013-05-15  5:32 UTC (permalink / raw)
  To: Alistair Popple; +Cc: linuxppc-dev
In-Reply-To: <cone.1368060133.955601.31795.1000@mexican>

[-- Attachment #1: Type: text/plain, Size: 442 bytes --]

On Thu, May 09, 2013 at 10:42:13AM +1000, Alistair Popple wrote:
> The currituck board uses a different IRQ for the pci usb host
> controller depending on the board revision. This patch adds support
> for newer board revisions by retrieving the board revision from the
> FPGA and mapping the appropriate IRQ.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>

Acked-by: Tony Breeds <tony@bakeyournoodle.com>
 
Yours Tony

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH v1 00/22] powerpc/eeh: Enhance converting EEH dev
From: Benjamin Herrenschmidt @ 2013-05-15  5:58 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev
In-Reply-To: <20130515053049.GA28780@shangw.(null)>

On Wed, 2013-05-15 at 13:30 +0800, Gavin Shan wrote:
> >There is an error injection framework we can use nowadays, or maybe you can
> >put this in tools/powerpc ?
> >
> 
> Ben, we don't have error injection framework yet. I have one source file "eeh-debug.c"
> which exports P7IOC registers through procfs. In order to force EEH errors (frozen PE
> or fenced PHB), we need change the specific bits of corresponding HW registers through
> the procfs entries (e.g. /proc/IODA/PHBx/REG).

I mean Linux has an error injection framework :-) We could hook up something into it.

Cheers,
Ben.

^ permalink raw reply

* SATA hang on 8315E triggered by heavy flash write?
From: Anthony Foiani @ 2013-05-15  8:12 UTC (permalink / raw)
  To: linuxppc-dev


Greetings.

We're using a board derived from the MPC8315E.  Fairly regularly, the
SATA connection will freeze up while we are writing to flash memory:

  [  839.806884] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6 frozen
  [  839.814201] ata2.00: failed command: WRITE DMA
  [  839.818814] ata2.00: cmd ca/00:08:58:95:21/00:00:00:00:00/e1 tag 0 dma 4096 out
  [  839.818838]          res 50/00:00:98:00:18/00:00:00:00:00/e1 Emask 0x10 (ATA bus error)
  [  839.834222] ata2.00: status: { DRDY }
  [  839.838046] ata2: hard resetting link
  [  839.867942] ata2: setting speed (in hard reset)
  [  849.959859] ata2: No Signature Update
  [  850.131872] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  [  850.138219] ata2.00: link online but device misclassified
  [  855.143882] ata2.00: qc timeout (cmd 0xec)
  [  855.148144] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  [  855.154375] ata2.00: revalidation failed (errno=-5)
  [  855.159376] ata2: hard resetting link
  [  855.659847] ata2: Hardreset failed, not off-lined 0
  [  855.671839] ata2: setting speed (in hard reset)
  [  865.259851] ata2: No Signature Update
  ...

The previous times I saw this, it would eventually recover, and our
device would keep on working correctly.  

However, now that we're doing many more operations per second, it
seems that it doesn't have time to recover, and we get a permanent error:

  [  925.883824] ata2: No Signature Update
  [  926.055893] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  [  926.062236] ata2.00: link online but device misclassified
  [  926.067846] ata2: EH complete
  [  926.071337] sd 1:0:0:0: [sda] Unhandled error code
  [  926.076283] sd 1:0:0:0: [sda]  Result: hostbyte=0x04 driverbyte=0x00
  [  926.082758] sd 1:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 21 95 58 00 00 08 00
  [  926.090150] end_request: I/O error, dev sda, sector 18978136
  [  926.096132] sd 1:0:0:0: [sda] Unhandled error code
  [  926.101037] sd 1:0:0:0: [sda]  Result: hostbyte=0x04 driverbyte=0x00
  [  926.107504] sd 1:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 22 2a 00 00 00 10 00
  [  926.114894] end_request: I/O error, dev sda, sector 19016192
  [  926.120811] sd 1:0:0:0: [sda] Unhandled error code
  [  926.125719] sd 1:0:0:0: [sda]  Result: hostbyte=0x04 driverbyte=0x00
  [  926.132225] sd 1:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 00 00 08 00 00 08 00
  [  926.139545] end_request: I/O error, dev sda, sector 8
  [  926.144690] Buffer I/O error on device sda1, logical block 0
  [  926.150437] lost page write due to I/O error on sda1
  [  926.155674] sd 1:0:0:0: [sda] Unhandled error code
  [  926.160614] sd 1:0:0:0: [sda]  Result: hostbyte=0x04 driverbyte=0x00
  [  926.167073] sd 1:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 01 18 00 18 00 00 08 00
  [  926.174451] end_request: I/O error, dev sda, sector 18350104

And eventually I get nothing but:

  [ 1177.852326] EXT2-fs (sda1): error: read_inode_bitmap: Cannot read inode bitmap - block_group = 70, inode_bitmap = 2293761
  [ 1177.871983] EXT2-fs (sda1): previous I/O error to superblock detected
  [ 1177.872016] 
  [ 1177.880192] EXT2-fs (sda1): error: read_inode_bitmap: Cannot read inode bitmap - block_group = 70, inode_bitmap = 2293761
  [ 1177.893731] EXT2-fs (sda1): previous I/O error to superblock detected
  [ 1177.893764] 

At this point, /dev/sda is pretty much unusable, and I have to do at
least a reboot to recover.  (I don't recall if I had to do a power
cycle at this point, though.)

I suspect that it is related to errata eLBC-A001 (from MPC8315E Chip
Errata, Rev. 3, 09/2011):

  eLBC-A001:

  Simultaneous FCM and GPCM or UPM operation may erroneously trigger
  bus monitor timeout

  Description: Devices: MPC8315E, MPC8314E
  When the FCM is in the middle of a long transaction, such as NAND
  erase or write, another transaction on the GPCM or UPM triggers the
  bus monitor to start immediately for the GPCM or UPM, even though
  the GPCM or UPM is still waiting for the FCM to finish and has not
  yet started its transaction. If the bus monitor timeout value is not
  programmed for a sufficiently large value, the local bus monitor may
  time out. This timeout corrupts the current NAND Flash operation and
  terminate the GPCM or UPM operation.

  Impact: Local bus monitor may time out unexpectedly and corrupt the
  NAND transaction.

  Workaround: Set the local bus monitor timeout value to the maximum
  by setting LBCR[BMT] = 0 and LBCR[BMTPS] = 0xF.

  Fix plan: No plans to fix

But it seems that erratum is already fixed:

  http://patchwork.ozlabs.org/patch/96339/
  (git patch d08e44570e)

Am I reading that correctly?  (I'm already writing only one flash
sector at a time, but it might be that even a single 0x10000-byte
sector takes long enough to trigger the issue.)  I also verified that
I have the relevant property in my device tree:

  localbus@e0005000 {
    ...
    compatible = "fsl,mpc8315-elbc", "fsl,elbc", "simple-bus";

So, my questions are:

1. Is anyone else seeing something like this?

2. Is there an obvious way for our code to detect that we're in the
   middle of error recovery, so we can not write to the disk until
   recovery is complete?

3. Is there any chance that the 1.5Gbps limiting code might have
   exacerbated the problems?

4. Should I open a support request with Freescale, or if someone from
   Freescale is already reading this, could you look to see if anyone
   else has reported it?

Kernel is 3.4.36, cpuinfo says:

  / # cat /proc/cpuinfo
  processor       : 0
  cpu             : e300c3
  clock           : 266.666664MHz
  revision        : 2.0 (pvr 8085 0020)
  bogomips        : 66.66
  timebase        : 33333333
  platform        : MPC831x RDB
  Memory          : 256 MB

device on that SATA link is an InnoDisk SSD:

  / # dmesg | grep 'ata2'
  [    7.729684] ata2: SATA max UDMA/133 irq 45
  [    7.973996] ata2: setting speed (in hard reset)
  [    7.984849] ata2: Signature Update detected @ 0 msecs
  [    8.390553] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  [    8.471244] ata2.00: ATA-8: InnoDisk Corp. - mSATA D150Q, 110520B, max UDMA/133
  [    8.478690] ata2.00: 31277232 sectors, multi 16: LBA48 
  [    8.484562] ata2.00: configured for UDMA/133

As always, any further hints would be very welcome.

Best regards,
Anthony Foiani

^ permalink raw reply

* Re: [PATCH v1 00/22] powerpc/eeh: Enhance converting EEH dev
From: Gavin Shan @ 2013-05-15  8:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1368597537.31689.70.camel@pasglop>

On Wed, May 15, 2013 at 03:58:57PM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-05-15 at 13:30 +0800, Gavin Shan wrote:
>> >There is an error injection framework we can use nowadays, or maybe you can
>> >put this in tools/powerpc ?
>> >
>> 
>> Ben, we don't have error injection framework yet. I have one source file "eeh-debug.c"
>> which exports P7IOC registers through procfs. In order to force EEH errors (frozen PE
>> or fenced PHB), we need change the specific bits of corresponding HW registers through
>> the procfs entries (e.g. /proc/IODA/PHBx/REG).
>
>I mean Linux has an error injection framework :-) We could hook up something into it.
>

Ok. Ben. Could you please point me the source code of the error injection framework? :-)
I think you're not talking about userland utility "errinject".

Thanks,
Gavin

^ permalink raw reply

* [PATCH] powerpc: remove dependency on MV64360
From: Paul Bolle @ 2013-05-15  9:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: linuxppc-dev, linux-kernel

The Kconfig entry that allows to "Distribute interrupts on all CPUs by
default" has a (negative) dependency on MV64360. But that Kconfig symbol
was removed in v2.6.27, which means that this dependency has evaluated
to true ever since. It can be removed too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
---
Tested by grepping the tree.

 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c33e3ad..9a822d9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -404,7 +404,7 @@ config FA_DUMP
 
 config IRQ_ALL_CPUS
 	bool "Distribute interrupts on all CPUs by default"
-	depends on SMP && !MV64360
+	depends on SMP
 	help
 	  This option gives the kernel permission to distribute IRQs across
 	  multiple CPUs.  Saying N here will route all IRQs to the first
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH v1 00/22] powerpc/eeh: Enhance converting EEH dev
From: Benjamin Herrenschmidt @ 2013-05-15  9:48 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev
In-Reply-To: <20130515084740.GA22813@shangw.(null)>

On Wed, 2013-05-15 at 16:47 +0800, Gavin Shan wrote:
> 
> Ok. Ben. Could you please point me the source code of the error
> injection framework? :-)
> I think you're not talking about userland utility "errinject".

It's minimal, it's just the stuff in lib/fault-inject which provides
a framework for exposing attributes for injecting errors along with the
tool in tools/testing/fault-injection, at least that's my understanding.

Or we can just add stuff under debugfs/powerpc, which is what I did
for wsp.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCHv5 0/2] Speed Cap fixes for ppc64
From: Kleber Sacilotto de Souza @ 2013-05-15 12:35 UTC (permalink / raw)
  To: David Airlie
  Cc: linuxppc-dev, dri-devel, Brian King, Jerome Glisse,
	Thadeu Lima de Souza Cascardo, Alex Deucher, Alex Deucher,
	Bjorn Helgaas
In-Reply-To: <CADnq5_M9n0q4CucEhNqrJBjHsTh19ASsYr=nsUAJA_-mRr8qvQ@mail.gmail.com>

On 05/06/2013 11:32 AM, Alex Deucher wrote:
> On Fri, May 3, 2013 at 7:01 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
>> On Fri, 2013-05-03 at 19:43 -0300, Kleber Sacilotto de Souza wrote:
>>
>>> This patch series does:
>>>    1. max_bus_speed is used to set the device to gen2 speeds
>>>    2. on power there's no longer a conflict between the pseries call and other
>>> architectures, because the overwrite is done via a ppc_md hook
>>>    3. radeon is using bus->max_bus_speed instead of drm_pcie_get_speed_cap_mask
>>> for gen2 capability detection
>>>
>>> The first patch consists of some architecture changes, such as adding a hook on
>>> powerpc for pci_root_bridge_prepare, so that pseries will initialize it to a
>>> function, while all other architectures get a NULL pointer. So that whenever
>>> pci_create_root_bus is called, we'll get max_bus_speed properly setup from
>>> OpenFirmware.
>>>
>>> The second patch consists of simple radeon changes not to call
>>> drm_get_pcie_speed_cap_mask anymore. I assume that on x86 machines,
>>> the max_bus_speed property will be properly set already.
>>
>> So I'm ok with the approach now and I might even put the powerpc patch
>> in for 3.10 since arguably we are fixing a nasty bug (uninitialized
>> max_bus_speed).
>>
>> David, what's your feeling about the radeon change ? It would be nice if
>> that could go in soon for various distro targets :-) On the other hand
>> I'm not going to be pushy if you are not comfortable with it.
>
> FWIW, the radeon change looks fine to me.
>
> Alex
>

Hi David,

Are you planning to accept the radeon patch? If yes, can we still expect 
it to make it for 3.10?

As Ben mentioned, we have some distro targets to make and it would be 
nice to have an outlook of the upstream acceptance to start the 
conversations with those distros.


Thanks!
-- 
Kleber Sacilotto de Souza
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCHv5 0/2] Speed Cap fixes for ppc64
From: Alex Deucher @ 2013-05-15 12:58 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza
  Cc: David Airlie, dri-devel, Bjorn Helgaas, Jerome Glisse,
	Thadeu Lima de Souza Cascardo, Brian King, Alex Deucher,
	linuxppc-dev
In-Reply-To: <519380FF.4020703@linux.vnet.ibm.com>

On Wed, May 15, 2013 at 8:35 AM, Kleber Sacilotto de Souza
<klebers@linux.vnet.ibm.com> wrote:
> On 05/06/2013 11:32 AM, Alex Deucher wrote:
>>
>> On Fri, May 3, 2013 at 7:01 PM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org> wrote:
>>>
>>> On Fri, 2013-05-03 at 19:43 -0300, Kleber Sacilotto de Souza wrote:
>>>
>>>> This patch series does:
>>>>    1. max_bus_speed is used to set the device to gen2 speeds
>>>>    2. on power there's no longer a conflict between the pseries call and
>>>> other
>>>> architectures, because the overwrite is done via a ppc_md hook
>>>>    3. radeon is using bus->max_bus_speed instead of
>>>> drm_pcie_get_speed_cap_mask
>>>> for gen2 capability detection
>>>>
>>>> The first patch consists of some architecture changes, such as adding a
>>>> hook on
>>>> powerpc for pci_root_bridge_prepare, so that pseries will initialize it
>>>> to a
>>>> function, while all other architectures get a NULL pointer. So that
>>>> whenever
>>>> pci_create_root_bus is called, we'll get max_bus_speed properly setup
>>>> from
>>>> OpenFirmware.
>>>>
>>>> The second patch consists of simple radeon changes not to call
>>>> drm_get_pcie_speed_cap_mask anymore. I assume that on x86 machines,
>>>> the max_bus_speed property will be properly set already.
>>>
>>>
>>> So I'm ok with the approach now and I might even put the powerpc patch
>>> in for 3.10 since arguably we are fixing a nasty bug (uninitialized
>>> max_bus_speed).
>>>
>>> David, what's your feeling about the radeon change ? It would be nice if
>>> that could go in soon for various distro targets :-) On the other hand
>>> I'm not going to be pushy if you are not comfortable with it.
>>
>>
>> FWIW, the radeon change looks fine to me.
>>
>> Alex
>>
>
> Hi David,
>
> Are you planning to accept the radeon patch? If yes, can we still expect it
> to make it for 3.10?
>
> As Ben mentioned, we have some distro targets to make and it would be nice
> to have an outlook of the upstream acceptance to start the conversations
> with those distros.

Is the rest of this series already upstream?  I don't see a problem
pulling it in if the rest of the patches are in Dave's tree.  Dave, do
you want to grab this, or do you wan me to pull it in through my tree?

Alex

^ permalink raw reply

* Re: [PATCHv5 0/2] Speed Cap fixes for ppc64
From: Kleber Sacilotto de Souza @ 2013-05-15 13:12 UTC (permalink / raw)
  To: Alex Deucher
  Cc: David Airlie, dri-devel, Bjorn Helgaas, Jerome Glisse,
	Thadeu Lima de Souza Cascardo, Brian King, Alex Deucher,
	linuxppc-dev
In-Reply-To: <CADnq5_POXuoDUQ3CX2GnYUzxE4SEoVyQq_k3tHdO2TnvDdXOUQ@mail.gmail.com>

On 05/15/2013 09:58 AM, Alex Deucher wrote:
> On Wed, May 15, 2013 at 8:35 AM, Kleber Sacilotto de Souza
> <klebers@linux.vnet.ibm.com> wrote:
>> On 05/06/2013 11:32 AM, Alex Deucher wrote:
>>>
>>> On Fri, May 3, 2013 at 7:01 PM, Benjamin Herrenschmidt
>>> <benh@kernel.crashing.org> wrote:
>>>>
>>>> On Fri, 2013-05-03 at 19:43 -0300, Kleber Sacilotto de Souza wrote:
>>>>
>>>>> This patch series does:
>>>>>     1. max_bus_speed is used to set the device to gen2 speeds
>>>>>     2. on power there's no longer a conflict between the pseries call and
>>>>> other
>>>>> architectures, because the overwrite is done via a ppc_md hook
>>>>>     3. radeon is using bus->max_bus_speed instead of
>>>>> drm_pcie_get_speed_cap_mask
>>>>> for gen2 capability detection
>>>>>
>>>>> The first patch consists of some architecture changes, such as adding a
>>>>> hook on
>>>>> powerpc for pci_root_bridge_prepare, so that pseries will initialize it
>>>>> to a
>>>>> function, while all other architectures get a NULL pointer. So that
>>>>> whenever
>>>>> pci_create_root_bus is called, we'll get max_bus_speed properly setup
>>>>> from
>>>>> OpenFirmware.
>>>>>
>>>>> The second patch consists of simple radeon changes not to call
>>>>> drm_get_pcie_speed_cap_mask anymore. I assume that on x86 machines,
>>>>> the max_bus_speed property will be properly set already.
>>>>
>>>>
>>>> So I'm ok with the approach now and I might even put the powerpc patch
>>>> in for 3.10 since arguably we are fixing a nasty bug (uninitialized
>>>> max_bus_speed).
>>>>
>>>> David, what's your feeling about the radeon change ? It would be nice if
>>>> that could go in soon for various distro targets :-) On the other hand
>>>> I'm not going to be pushy if you are not comfortable with it.
>>>
>>>
>>> FWIW, the radeon change looks fine to me.
>>>
>>> Alex
>>>
>>
>> Hi David,
>>
>> Are you planning to accept the radeon patch? If yes, can we still expect it
>> to make it for 3.10?
>>
>> As Ben mentioned, we have some distro targets to make and it would be nice
>> to have an outlook of the upstream acceptance to start the conversations
>> with those distros.
>
> Is the rest of this series already upstream?  I don't see a problem
> pulling it in if the rest of the patches are in Dave's tree.  Dave, do
> you want to grab this, or do you wan me to pull it in through my tree?
>
> Alex
>

The powerpc patch was applied by Ben and is already upstream.


Thanks,

-- 
Kleber Sacilotto de Souza
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCH v2, part 1 3/9] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) instead
From: Liu Jiang @ 2013-05-15 14:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Neela Syam Kolli, sparclinux@vger.kernel.org, Toshi Kani,
	Jiang Liu, Linux-Scsi, David Airlie, Greg Kroah-Hartman,
	linuxppc-dev, Linux Kernel Mailing List, James E.J. Bottomley,
	Rafael J . Wysocki, Bjorn Helgaas, Yijing Wang,
	linux-pci@vger.kernel.org, Gu Zheng, Paul Mackerras,
	Andrew Morton, Myron Stowe, David S. Miller
In-Reply-To: <CAE9FiQWXN_ETRFPb5oFx-6LXyuHviX3h3NXCQKjcxtv2Uma-JA@mail.gmail.com>

On Wed 15 May 2013 02:52:51 AM CST, Yinghai Lu wrote:
> On Tue, May 14, 2013 at 9:57 AM, Liu Jiang <liuj97@gmail.com> wrote:
>> On Tue 14 May 2013 11:10:33 PM CST, Yinghai Lu wrote:
>>>
>>> On Tue, May 14, 2013 at 7:59 AM, Liu Jiang <liuj97@gmail.com> wrote:
>>>>
>>>> On 05/14/2013 04:26 PM, Gu Zheng wrote:
>>>>       I suggest to use pci_release_dev() instead because it also needs to
>>>> release OF related resources.
>>>> I will update it in next version.
>>>>
>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>> index bc075a3..2ac6338 100644
>>>> --- a/drivers/pci/probe.c
>>>> +++ b/drivers/pci/probe.c
>>>> @@ -1281,7 +1281,7 @@ static struct pci_dev *pci_scan_device(struct
>>>> pci_bus
>>>> *bus
>>>>           pci_set_of_node(dev);
>>>>
>>>>           if (pci_setup_device(dev)) {
>>>> -               kfree(dev);
>>>> +               pci_release_dev(&dev->dev);
>>>>                   return NULL;
>>>
>>>
>>> no, should move pci_set_of_node calling into pci_setup_device.
>>>
>>> Yinghai
>>
>>
>> I'm not sure whether we should call pci_set_of_node() for SR-IOV devices
>> too,
>> any suggestions here?
>
> or just move down pci_set_of_node after pci_setup_device?
>
> anyway that is another bug.
>
> Yinghai
I'm not familiar with the OF logic and can't make sure whether 
pci_setup_device()
has dependency on dev->of_node. Feel it's more safe to call 
pci_release_of_node()
on failing path instead of tuning call-site of pci_set_of_node().

^ permalink raw reply

* Re: [PATCH v2, part 1 3/9] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) instead
From: Yinghai Lu @ 2013-05-15 14:43 UTC (permalink / raw)
  To: Liu Jiang
  Cc: Neela Syam Kolli, sparclinux@vger.kernel.org, Toshi Kani,
	Jiang Liu, Linux-Scsi, David Airlie, Greg Kroah-Hartman,
	linuxppc-dev, Linux Kernel Mailing List, James E.J. Bottomley,
	Rafael J . Wysocki, Bjorn Helgaas, Yijing Wang,
	linux-pci@vger.kernel.org, Gu Zheng, Paul Mackerras,
	Andrew Morton, Myron Stowe, David S. Miller
In-Reply-To: <51939E16.9090105@gmail.com>

On Wed, May 15, 2013 at 7:39 AM, Liu Jiang <liuj97@gmail.com> wrote:
> On Wed 15 May 2013 02:52:51 AM CST, Yinghai Lu wrote:
>>
>> On Tue, May 14, 2013 at 9:57 AM, Liu Jiang <liuj97@gmail.com> wrote:
>>>
>>> On Tue 14 May 2013 11:10:33 PM CST, Yinghai Lu wrote:
>>>>
>>>>
>>>> On Tue, May 14, 2013 at 7:59 AM, Liu Jiang <liuj97@gmail.com> wrote:
>>>>>
>>>>>
>>>>> On 05/14/2013 04:26 PM, Gu Zheng wrote:
>>>>>       I suggest to use pci_release_dev() instead because it also needs
>>>>> to
>>>>> release OF related resources.
>>>>> I will update it in next version.
>>>>>
>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>> index bc075a3..2ac6338 100644
>>>>> --- a/drivers/pci/probe.c
>>>>> +++ b/drivers/pci/probe.c
>>>>> @@ -1281,7 +1281,7 @@ static struct pci_dev *pci_scan_device(struct
>>>>> pci_bus
>>>>> *bus
>>>>>           pci_set_of_node(dev);
>>>>>
>>>>>           if (pci_setup_device(dev)) {
>>>>> -               kfree(dev);
>>>>> +               pci_release_dev(&dev->dev);
>>>>>                   return NULL;
>>>>
>>>>
>>>>
>>>> no, should move pci_set_of_node calling into pci_setup_device.
>>>>
>>>> Yinghai
>>>
>>>
>>>
>>> I'm not sure whether we should call pci_set_of_node() for SR-IOV devices
>>> too,
>>> any suggestions here?
>>
>>
>> or just move down pci_set_of_node after pci_setup_device?
>>
>> anyway that is another bug.

> I'm not familiar with the OF logic and can't make sure whether
> pci_setup_device()
> has dependency on dev->of_node. Feel it's more safe to call
> pci_release_of_node()
> on failing path instead of tuning call-site of pci_set_of_node().

that is another bug, let of guy handle it.

Yinghai

^ permalink raw reply

* Re: [PATCH v2, part 1 3/9] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) instead
From: Liu Jiang @ 2013-05-15 14:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Neela Syam Kolli, sparclinux@vger.kernel.org, Toshi Kani,
	Jiang Liu, Linux-Scsi, David Airlie, Greg Kroah-Hartman,
	linuxppc-dev, Linux Kernel Mailing List, James E.J. Bottomley,
	Rafael J . Wysocki, Bjorn Helgaas, Yijing Wang,
	linux-pci@vger.kernel.org, Gu Zheng, Paul Mackerras,
	Andrew Morton, Myron Stowe, David S. Miller
In-Reply-To: <CAE9FiQUXrOmN0rkvOwE1G_acEGY3-gouzaXjBMT7EE6uAqvuZA@mail.gmail.com>

On Wed 15 May 2013 10:43:02 PM CST, Yinghai Lu wrote:
> On Wed, May 15, 2013 at 7:39 AM, Liu Jiang <liuj97@gmail.com> wrote:
>> On Wed 15 May 2013 02:52:51 AM CST, Yinghai Lu wrote:
>>>
>>> On Tue, May 14, 2013 at 9:57 AM, Liu Jiang <liuj97@gmail.com> wrote:
>>>>
>>>> On Tue 14 May 2013 11:10:33 PM CST, Yinghai Lu wrote:
>>>>>
>>>>>
>>>>> On Tue, May 14, 2013 at 7:59 AM, Liu Jiang <liuj97@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 05/14/2013 04:26 PM, Gu Zheng wrote:
>>>>>>        I suggest to use pci_release_dev() instead because it also needs
>>>>>> to
>>>>>> release OF related resources.
>>>>>> I will update it in next version.
>>>>>>
>>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>>> index bc075a3..2ac6338 100644
>>>>>> --- a/drivers/pci/probe.c
>>>>>> +++ b/drivers/pci/probe.c
>>>>>> @@ -1281,7 +1281,7 @@ static struct pci_dev *pci_scan_device(struct
>>>>>> pci_bus
>>>>>> *bus
>>>>>>            pci_set_of_node(dev);
>>>>>>
>>>>>>            if (pci_setup_device(dev)) {
>>>>>> -               kfree(dev);
>>>>>> +               pci_release_dev(&dev->dev);
>>>>>>                    return NULL;
>>>>>
>>>>>
>>>>>
>>>>> no, should move pci_set_of_node calling into pci_setup_device.
>>>>>
>>>>> Yinghai
>>>>
>>>>
>>>>
>>>> I'm not sure whether we should call pci_set_of_node() for SR-IOV devices
>>>> too,
>>>> any suggestions here?
>>>
>>>
>>> or just move down pci_set_of_node after pci_setup_device?
>>>
>>> anyway that is another bug.
>
>> I'm not familiar with the OF logic and can't make sure whether
>> pci_setup_device()
>> has dependency on dev->of_node. Feel it's more safe to call
>> pci_release_of_node()
>> on failing path instead of tuning call-site of pci_set_of_node().
>
> that is another bug, let of guy handle it.
>
> Yinghai
Hi Yinghai,
       I don't know any OF exports, could you please help to CC
some OF experts?
Thanks,
Gerry

^ permalink raw reply

* Re: [PATCH v2, part 1 3/9] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) instead
From: Yinghai Lu @ 2013-05-15 14:58 UTC (permalink / raw)
  To: Liu Jiang, Benjamin Herrenschmidt, Rafael J . Wysocki,
	linuxppc-dev, sparclinux@vger.kernel.org, David S. Miller
  Cc: linux-pci@vger.kernel.org, Linux Kernel Mailing List,
	Bjorn Helgaas, Paul Mackerras, Greg Kroah-Hartman, Gu Zheng,
	Myron Stowe
In-Reply-To: <51939FB1.20203@gmail.com>

On Wed, May 15, 2013 at 7:46 AM, Liu Jiang <liuj97@gmail.com> wrote:
> On Wed 15 May 2013 10:43:02 PM CST, Yinghai Lu wrote:
>>
>
>> that is another bug, let of guy handle it.
>>
>> Yinghai
>
> Hi Yinghai,
>       I don't know any OF exports, could you please help to CC
> some OF experts?

powerpc and sparc are using that.

Ben,

in drivers/pci/probe.c::pci_scan_device() there is

        pci_set_of_node(dev);

        if (pci_setup_device(dev)) {
                kfree(dev);
                return NULL;
        }

so if pci_setup_device fails, there is one dev reference is not release.

please check you can just move down pci_set_of_node down after that
failing path, like


        if (pci_setup_device(dev)) {
                kfree(dev);
                return NULL;
        }

        pci_set_of_node(dev);

Yinghai

^ permalink raw reply

* Re: [RFC PATCH v2, part 2 09/18] PCI, PPC: use hotplug-safe iterators to walk PCI buses
From: Liu Jiang @ 2013-05-15 15:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Toshi Kani, Jiang Liu, Myron Stowe, Greg Kroah-Hartman,
	linuxppc-dev, linux-kernel, Rafael J . Wysocki, Gu Zheng,
	Yijing Wang, Bill Pemberton, Paul Mackerras, linux-pci,
	Bjorn Helgaas, Yinghai Lu, Gavin Shan
In-Reply-To: <1368574202.31689.54.camel@pasglop>

On 05/15/2013 07:30 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2013-05-15 at 00:51 +0800, Jiang Liu wrote:
>> Enhance PPC architecture specific code to use hotplug-safe iterators
>> to walk PCI buses.
> I was about to ack it but then I saw:
>
>> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
>> index 51a133a..a41c6dd 100644
>> --- a/arch/powerpc/kernel/pci_64.c
>> +++ b/arch/powerpc/kernel/pci_64.c
>> @@ -208,7 +208,6 @@ long sys_pciconfig_iobase(long which, unsigned long in_bus,
>>   			  unsigned long in_devfn)
>>   {
>>   	struct pci_controller* hose;
>> -	struct list_head *ln;
>>   	struct pci_bus *bus = NULL;
>>   	struct device_node *hose_node;
>>   
>> @@ -229,18 +228,16 @@ long sys_pciconfig_iobase(long which, unsigned long in_bus,
>>   	/* That syscall isn't quite compatible with PCI domains, but it's
>>   	 * used on pre-domains setup. We return the first match
>>   	 */
>> -
>> -	for (ln = pci_root_buses.next; ln != &pci_root_buses; ln = ln->next) {
>> -		bus = pci_bus_b(ln);
>> -		if (in_bus >= bus->number && in_bus <= bus->busn_res.end)
>> +	for_each_pci_root_bus(bus)
>> +		if (in_bus >= bus->number && in_bus <= bus->busn_res.end &&
>> +		    bus->dev.of_node)
>>   			break;
>> -		bus = NULL;
>> -	}
>> -	if (bus == NULL || bus->dev.of_node == NULL)
>> +	if (bus == NULL)
>>   		return -ENODEV;
> You just removed the NULL check for the of_node field...
Hi Benjamin,
     Thanks for review.
     I just moved the "bus->dev.of_node == NULL" into the above 
for_each_pci_root_bus()
loop:)
     Will send you another version according to your suggestion to use 
pci_bus_to_host()
to simplify the code.
Regards!
Gerry

>   
>>   	hose_node = bus->dev.of_node;
>>   	hose = PCI_DN(hose_node)->phb;
> Which is dereferrenced here.	
>
>> +	pci_bus_put(bus);
> On the other hand, the whole thing can probably be using
> pci_bus_to_host() instead.... the above code is bitrotted.
>
>>   	switch (which) {
>>   	case IOBASE_BRIDGE_NUMBER:
>   
> Cheeers,
> Ben.
>
>

^ permalink raw reply

* PCI, PPC: use hotplug-safe iterators to walk PCI buses
From: Jiang Liu @ 2013-05-15 15:17 UTC (permalink / raw)
  To: Bjorn Helgaas, Yinghai Lu
  Cc: Gavin Shan, Toshi Kani, Jiang Liu, Greg Kroah-Hartman,
	linuxppc-dev, linux-kernel, Rafael J . Wysocki, Yijing Wang,
	Bill Pemberton, linux-pci, Gu Zheng, Paul Mackerras, Myron Stowe,
	Jiang Liu
In-Reply-To: <1368574202.31689.54.camel@pasglop>

Enhance PPC architecture specific code to use hotplug-safe iterators
to walk PCI buses.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
---
Hi Benjamin,
	How about this version? Use pci_bus_to_host() instead
to simplify code.
Regards!
Gerry
---
 arch/powerpc/kernel/pci-common.c |  4 ++--
 arch/powerpc/kernel/pci_64.c     | 22 ++++++++--------------
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index fa12ae4..26fca09 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1400,7 +1400,7 @@ void __init pcibios_resource_survey(void)
 	struct pci_bus *b;
 
 	/* Allocate and assign resources */
-	list_for_each_entry(b, &pci_root_buses, node)
+	for_each_pci_root_bus(b)
 		pcibios_allocate_bus_resources(b);
 	pcibios_allocate_resources(0);
 	pcibios_allocate_resources(1);
@@ -1410,7 +1410,7 @@ void __init pcibios_resource_survey(void)
 	 * bus available resources to avoid allocating things on top of them
 	 */
 	if (!pci_has_flag(PCI_PROBE_ONLY)) {
-		list_for_each_entry(b, &pci_root_buses, node)
+		for_each_pci_root_bus(b)
 			pcibios_reserve_legacy_regions(b);
 	}
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 51a133a..8bea231 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -207,10 +207,8 @@ void pcibios_setup_phb_io_space(struct pci_controller *hose)
 long sys_pciconfig_iobase(long which, unsigned long in_bus,
 			  unsigned long in_devfn)
 {
-	struct pci_controller* hose;
-	struct list_head *ln;
-	struct pci_bus *bus = NULL;
-	struct device_node *hose_node;
+	struct pci_controller* hose = NULL;
+	struct pci_bus *bus;
 
 	/* Argh ! Please forgive me for that hack, but that's the
 	 * simplest way to get existing XFree to not lockup on some
@@ -229,19 +227,15 @@ long sys_pciconfig_iobase(long which, unsigned long in_bus,
 	/* That syscall isn't quite compatible with PCI domains, but it's
 	 * used on pre-domains setup. We return the first match
 	 */
-
-	for (ln = pci_root_buses.next; ln != &pci_root_buses; ln = ln->next) {
-		bus = pci_bus_b(ln);
-		if (in_bus >= bus->number && in_bus <= bus->busn_res.end)
+	for_each_pci_root_bus(bus)
+		if (in_bus >= bus->number && in_bus <= bus->busn_res.end) {
+			hose = pci_bus_to_host(bus);
+			pci_bus_put(bus);
 			break;
-		bus = NULL;
-	}
-	if (bus == NULL || bus->dev.of_node == NULL)
+		}
+	if (hose == NULL)
 		return -ENODEV;
 
-	hose_node = bus->dev.of_node;
-	hose = PCI_DN(hose_node)->phb;
-
 	switch (which) {
 	case IOBASE_BRIDGE_NUMBER:
 		return (long)hose->first_busno;
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH v5, part4 31/41] mm/ppc: prepare for removing num_physpages and simplify mem_init()
From: Liu Jiang @ 2013-05-15 15:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-arch, James Bottomley, David Howells, Jiang Liu,
	Wen Congyang, linux-mm, Mark Salter, linux-kernel, Michal Hocko,
	Minchan Kim, Paul Mackerras, Mel Gorman, David Rientjes,
	Andrew Morton, linuxppc-dev, Sergei Shtylyov, KAMEZAWA Hiroyuki,
	Jianguo Wu
In-Reply-To: <1368577954.31689.63.camel@pasglop>

On Wed 15 May 2013 08:32:34 AM CST, Benjamin Herrenschmidt wrote:
> On Wed, 2013-05-08 at 23:51 +0800, Jiang Liu wrote:
>> Prepare for removing num_physpages and simplify mem_init().
>
> No objection, I haven't had a chance to actually build/boot test though.
>
> BTW. A recommended way of doing so which is pretty easy even if you
> don't have access to powerpc hardware nowadays is to use
> qemu-system-ppc64 with -M pseries.
>
> You can find cross compilers for the kernel on kernel.org and you can
> feed qemu with some distro installer ISO.
Hi Benjamin,'
      Thanks for review! As could I assume an "Acked-by"?
      I have installed all cross-compiler from ftp.kernel.org and done 
basic
building tests for these patches. But I haven't run new kernel with qemu
yet, that's a really good suggestion, will try it next time.
Regards!
Gerry


>
> Cheers,
> Ben.
>
>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>   arch/powerpc/mm/mem.c |   56 +++++++++++--------------------------------------
>>   1 file changed, 12 insertions(+), 44 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>> index b890245..4e24f1c 100644
>> --- a/arch/powerpc/mm/mem.c
>> +++ b/arch/powerpc/mm/mem.c
>> @@ -299,46 +299,27 @@ void __init paging_init(void)
>>
>>   void __init mem_init(void)
>>   {
>> -#ifdef CONFIG_NEED_MULTIPLE_NODES
>> -	int nid;
>> -#endif
>> -	pg_data_t *pgdat;
>> -	unsigned long i;
>> -	struct page *page;
>> -	unsigned long reservedpages = 0, codesize, initsize, datasize, bsssize;
>> -
>>   #ifdef CONFIG_SWIOTLB
>>   	swiotlb_init(0);
>>   #endif
>>
>> -	num_physpages = memblock_phys_mem_size() >> PAGE_SHIFT;
>>   	high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
>>
>>   #ifdef CONFIG_NEED_MULTIPLE_NODES
>> -        for_each_online_node(nid) {
>> -		if (NODE_DATA(nid)->node_spanned_pages != 0) {
>> -			printk("freeing bootmem node %d\n", nid);
>> -			free_all_bootmem_node(NODE_DATA(nid));
>> -		}
>> +	{
>> +		pg_data_t *pgdat;
>> +
>> +		for_each_online_pgdat(pgdat)
>> +			if (pgdat->node_spanned_pages != 0) {
>> +				printk("freeing bootmem node %d\n",
>> +					pgdat->node_id);
>> +				free_all_bootmem_node(pgdat);
>> +			}
>>   	}
>>   #else
>>   	max_mapnr = max_pfn;
>>   	free_all_bootmem();
>>   #endif
>> -	for_each_online_pgdat(pgdat) {
>> -		for (i = 0; i < pgdat->node_spanned_pages; i++) {
>> -			if (!pfn_valid(pgdat->node_start_pfn + i))
>> -				continue;
>> -			page = pgdat_page_nr(pgdat, i);
>> -			if (PageReserved(page))
>> -				reservedpages++;
>> -		}
>> -	}
>> -
>> -	codesize = (unsigned long)&_sdata - (unsigned long)&_stext;
>> -	datasize = (unsigned long)&_edata - (unsigned long)&_sdata;
>> -	initsize = (unsigned long)&__init_end - (unsigned long)&__init_begin;
>> -	bsssize = (unsigned long)&__bss_stop - (unsigned long)&__bss_start;
>>
>>   #ifdef CONFIG_HIGHMEM
>>   	{
>> @@ -348,13 +329,9 @@ void __init mem_init(void)
>>   		for (pfn = highmem_mapnr; pfn < max_mapnr; ++pfn) {
>>   			phys_addr_t paddr = (phys_addr_t)pfn << PAGE_SHIFT;
>>   			struct page *page = pfn_to_page(pfn);
>> -			if (memblock_is_reserved(paddr))
>> -				continue;
>> -			free_highmem_page(page);
>> -			reservedpages--;
>> +			if (!memblock_is_reserved(paddr))
>> +				free_highmem_page(page);
>>   		}
>> -		printk(KERN_DEBUG "High memory: %luk\n",
>> -		       totalhigh_pages << (PAGE_SHIFT-10));
>>   	}
>>   #endif /* CONFIG_HIGHMEM */
>>
>> @@ -367,16 +344,7 @@ void __init mem_init(void)
>>   		(mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY) - 1;
>>   #endif
>>
>> -	printk(KERN_INFO "Memory: %luk/%luk available (%luk kernel code, "
>> -	       "%luk reserved, %luk data, %luk bss, %luk init)\n",
>> -		nr_free_pages() << (PAGE_SHIFT-10),
>> -		num_physpages << (PAGE_SHIFT-10),
>> -		codesize >> 10,
>> -		reservedpages << (PAGE_SHIFT-10),
>> -		datasize >> 10,
>> -		bsssize >> 10,
>> -		initsize >> 10);
>> -
>> +	mem_init_print_info(NULL);
>>   #ifdef CONFIG_PPC32
>>   	pr_info("Kernel virtual memory layout:\n");
>>   	pr_info("  * 0x%08lx..0x%08lx  : fixmap\n", FIXADDR_START, FIXADDR_TOP);
>
>

^ permalink raw reply

* Re: [PATCH v2, part 1 3/9] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) instead
From: Benjamin Herrenschmidt @ 2013-05-15 21:29 UTC (permalink / raw)
  To: Liu Jiang
  Cc: Neela Syam Kolli, sparclinux@vger.kernel.org, Toshi Kani,
	Linux-Scsi, Myron Stowe, David Airlie, Greg Kroah-Hartman,
	linuxppc-dev, Linux Kernel Mailing List, James E.J. Bottomley,
	Rafael J . Wysocki, Bjorn Helgaas, Yijing Wang, Paul Mackerras,
	linux-pci@vger.kernel.org, Gu Zheng, Andrew Morton, Yinghai Lu,
	David S. Miller, Jiang Liu
In-Reply-To: <51939FB1.20203@gmail.com>

On Wed, 2013-05-15 at 22:46 +0800, Liu Jiang wrote:
>        I don't know any OF exports, could you please help to CC
> some OF experts?

I wrote that code I think. Sorry, I've missed the beginning of the
thread, what is the problem ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 1/4] powerpc/book3e: introduce external_input_edge exception handler for 64bit kernel
From: Scott Wood @ 2013-05-15 21:30 UTC (permalink / raw)
  To: Kevin Hao; +Cc: linuxppc
In-Reply-To: <20130514020317.GB21564@pek-khao-d1.corp.ad.wrs.com>

On 05/13/2013 09:03:17 PM, Kevin Hao wrote:
> On Mon, May 13, 2013 at 10:47:17AM -0500, Scott Wood wrote:
> > On 05/11/2013 06:26:21 PM, Kevin Hao wrote:
> > >In the external proxy facility mode, the interrupt is automatically
> > >acknowledged with the same effect as reading the IACK register. So
> > >this makes external input interrupt more like edge sensitive. That
> > >means we can leave the irq hard enabled when it occurs with irq =20
> soft
> > >disabled just like the dec and doorbell interrupt. But the External
> > >Proxy Register(EPR) is only considered valid from the time that the
> > >external interrupt occurs until MSR[EE] is set to 1. So we have to
> > >save the EPR before irq hard enabled.
> >
> > Is it really worth it?
>=20
> Maybe. :-)
> Compare with the current kernel:
>   * The overhead is that we need additional load & store the contents =20
> of
>     the EPR from/to PACA.

There's also mental overhead of the extra complexity.  The lazy EE =20
stuff is already fiddly enough (e.g. the recent KVM patches).

>   * The bonus is we keep the irq hard enabled when a external =20
> interrupt occurs
>     with irq soft-disabled. As I know we should leave the irq hard =20
> enabled as
>     much as possible. This is also the primary reason that we =20
> introduce the
>     Lazy EE.

I don't think "as much as possible" is a good way to look at it, so =20
much as "as much as is practical", balanced by also wanting to keep the =20
code as simple as is practical.

-Scott=

^ permalink raw reply

* Re: [PATCH v2, part 1 3/9] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) instead
From: Benjamin Herrenschmidt @ 2013-05-15 21:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Myron Stowe, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Rafael J . Wysocki, Gu Zheng,
	Paul Mackerras, linux-pci@vger.kernel.org,
	sparclinux@vger.kernel.org, linuxppc-dev, David S. Miller,
	Liu Jiang
In-Reply-To: <CAE9FiQVCSm14S0A1sVWewc7UX-dyOmu9aq1UwqdPZK6qSmbFxQ@mail.gmail.com>

On Wed, 2013-05-15 at 07:58 -0700, Yinghai Lu wrote:

> Ben,
> 
> in drivers/pci/probe.c::pci_scan_device() there is
> 
>         pci_set_of_node(dev);
> 
>         if (pci_setup_device(dev)) {
>                 kfree(dev);
>                 return NULL;
>         }
> 
> so if pci_setup_device fails, there is one dev reference is not release.
> 
> please check you can just move down pci_set_of_node down after that
> failing path, like
> 
> 
>         if (pci_setup_device(dev)) {
>                 kfree(dev);
>                 return NULL;
>         }
> 
>         pci_set_of_node(dev);

No, we want the OF node set when we run the quirks, we intentionally do
that early, the right thing to do is to to call pci_release_of_node()
in the error path (it's safe to call even if the node is NULL).

Cheers,
Ben.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox