LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] ppc476: Enable a linker work around for IBM errata #46
From: Alistair Popple @ 2014-02-24 23:52 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev
In-Reply-To: <CA+5PVA4Hx4iKkL+vLJjOTwMaKOL-dJPJpSHLjR+nU6MTTGvFcA@mail.gmail.com>

On Mon, 24 Feb 2014 08:35:06 Josh Boyer wrote:
> On Mon, Feb 24, 2014 at 2:00 AM, Alistair Popple <alistair@popple.id.au> 
wrote:
> > This patch adds an option to enable a work around for an icache bug on
> > 476 that can cause execution of stale instructions when falling
> > through pages (IBM errata #46). It requires a recent version of
> > binutils which supports the --ppc476-workaround option.
> > 
> > The work around enables the appropriate linker options and ensures
> > that all module output sections are aligned to 4K page boundaries. The
> > work around is only required when building modules.
> 
> What happens if you're using 64K pages?  Is the alignment 4K always,
> or does it need to be aligned to PAGE_SIZE?

The work around inserts an extra instruction on 4K page boundaries. As a 64K 
(or a 16K) page boundary is also a 4K page boundary the work around should 
cover those page sizes as well.

- Alistair

> josh

^ permalink raw reply

* Re: [PATCH] powerpc/crashdump : fix page frame number check in copy_oldmem_page
From: Michael Ellerman @ 2014-02-25  1:47 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <20140224163055.7263.86979.stgit@nimbus>

On Mon, 2014-02-24 at 17:30 +0100, Laurent Dufour wrote:
> In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
> decide if the page is backed or not, is not valid when the memory layout is
> not continuous.
> 
> This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
> in the memory. In that case max_pfn points to the end of RTAS, and a hole
> between the end of the kdump kernel and RTAS is not backed by PTEs. As a
> consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
> in a direct way the pages in that hole.
> 
> This fix relies on the memblock's service memblock_is_region_memory to
> check if the read page is part or not of the directly accessible memory.

Hi Laurent,

This looks good to me, assuming you've tested it on a PowerVM system as well as
under KVM.

cheers

^ permalink raw reply

* [PATCH] powerpc/powernv Platform dump interface
From: Stewart Smith @ 2014-02-25  1:58 UTC (permalink / raw)
  To: Vasant Hegde, benh, linuxppc-dev; +Cc: Stewart Smith

This enables support for userspace to fetch and initiate FSP and
Platform dumps from the service processor (via firmware) through sysfs.

Based on original patch from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

Flow:
  - We register for OPAL notification events.
  - OPAL sends new dump available notification.
  - We make information on dump available via sysfs
  - Userspace requests dump contents
  - We retrieve the dump via OPAL interface
  - User copies the dump data
  - userspace sends ack for dump
  - We send ACK to OPAL.

sysfs files:
  - We add the /sys/firmware/opal/dump directory
  - echoing 1 (well, anything, but in future we may support
    different dump types) to /sys/firmware/opal/dump/initiate_dump
    will initiate a dump.
  - Each dump that we've been notified of gets a directory
    in /sys/firmware/opal/dump/ with a name of the dump ID (in hex,
    as this is what's used elsewhere to identify the dump).
  - Each dump has files: id, type, dump and acknowledge
    dump is binary and is the dump itself.
    echoing 'ack' to acknowledge (currently any string will do) will
    acknowledge the dump and it will soon after disappear from sysfs.

OPAL APIs:
  - opal_dump_init()
  - opal_dump_info()
  - opal_dump_read()
  - opal_dump_ack()
  - opal_dump_resend_notification()

Currently we are only ever notified for one dump at a time (until
the user explicitly acks the current dump, then we get a notification
of the next dump), but this kernel code should "just work" when OPAL
starts notifying us of all the dumps present.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
---
 Documentation/ABI/stable/sysfs-firmware-opal-dump |   29 ++
 arch/powerpc/include/asm/opal.h                   |   12 +
 arch/powerpc/platforms/powernv/Makefile           |    2 +-
 arch/powerpc/platforms/powernv/opal-dump.c        |  511 +++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S    |    5 +
 arch/powerpc/platforms/powernv/opal.c             |    2 +
 6 files changed, 560 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/stable/sysfs-firmware-opal-dump
 create mode 100644 arch/powerpc/platforms/powernv/opal-dump.c

diff --git a/Documentation/ABI/stable/sysfs-firmware-opal-dump b/Documentation/ABI/stable/sysfs-firmware-opal-dump
new file mode 100644
index 0000000..3c2d252
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-firmware-opal-dump
@@ -0,0 +1,29 @@
+What:		/sys/firmware/opal/dump
+Date:		Feb 2014
+Contact:	Stewart Smith <stewart@linux.vnet.ibm.com>
+Description:
+		This directory exposes interfaces for interacting with
+		the FSP and platform dumps through OPAL firmware interface.
+
+		This is only for the powerpc/powernv platform.
+
+		initiate_dump:	When '1' is written to it,
+				we will initiate a dump.
+				Read this file for supported commands.
+
+		0xXXXX:		A directory for dump 0xXXXX (in hex).
+
+		Each dump has the following files:
+		id:		An ASCII representation of the dump ID
+				in hex.
+		type:		An ASCII representation of the type of
+				dump (or 'unknown').
+		dump:		A binary file containing the dump.
+				The size of the dump is the size of this file.
+		acknowledge:	When 'ack' is written to this, we will
+				acknowledge that we've retrieved the
+				dump to the service processor. It will
+				then remove it, making the dump
+				inaccessible.
+				Reading this file will get a list of
+				supported actions.
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 40157e2..3194870 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -154,8 +154,13 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_FLASH_VALIDATE			76
 #define OPAL_FLASH_MANAGE			77
 #define OPAL_FLASH_UPDATE			78
+#define OPAL_DUMP_INIT				81
+#define OPAL_DUMP_INFO				82
+#define OPAL_DUMP_READ				83
+#define OPAL_DUMP_ACK				84
 #define OPAL_GET_MSG				85
 #define OPAL_CHECK_ASYNC_COMPLETION		86
+#define OPAL_DUMP_RESEND			91
 #define OPAL_SYNC_HOST_REBOOT			87
 
 #ifndef __ASSEMBLY__
@@ -237,6 +242,7 @@ enum OpalPendingState {
 	OPAL_EVENT_EPOW			= 0x80,
 	OPAL_EVENT_LED_STATUS		= 0x100,
 	OPAL_EVENT_PCI_ERROR		= 0x200,
+	OPAL_EVENT_DUMP_AVAIL		= 0x400,
 	OPAL_EVENT_MSG_PENDING		= 0x800,
 };
 
@@ -826,6 +832,11 @@ int64_t opal_lpc_read(uint32_t chip_id, enum OpalLPCAddressType addr_type,
 int64_t opal_validate_flash(uint64_t buffer, uint32_t *size, uint32_t *result);
 int64_t opal_manage_flash(uint8_t op);
 int64_t opal_update_flash(uint64_t blk_list);
+int64_t opal_dump_init(uint8_t dump_type);
+int64_t opal_dump_info(uint32_t *dump_id, uint32_t *dump_size);
+int64_t opal_dump_read(uint32_t dump_id, uint64_t buffer);
+int64_t opal_dump_ack(uint32_t dump_id);
+int64_t opal_dump_resend_notification(void);
 
 int64_t opal_get_msg(uint64_t buffer, size_t size);
 int64_t opal_check_completion(uint64_t buffer, size_t size, uint64_t token);
@@ -861,6 +872,7 @@ extern void opal_get_rtc_time(struct rtc_time *tm);
 extern unsigned long opal_get_boot_time(void);
 extern void opal_nvram_init(void);
 extern void opal_flash_init(void);
+extern void opal_platform_dump_init(void);
 
 extern int opal_machine_check(struct pt_regs *regs);
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 8d767fd..3528c11 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,6 +1,6 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o
 obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
-obj-y			+= rng.o
+obj-y			+= rng.o opal-dump.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
new file mode 100644
index 0000000..b29fe04
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-dump.c
@@ -0,0 +1,511 @@
+/*
+ * PowerNV OPAL Dump Interface
+ *
+ * Copyright 2013,2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kobject.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/pagemap.h>
+#include <linux/delay.h>
+
+#include <asm/opal.h>
+
+#define DUMP_TYPE_FSP	0x01
+
+struct dump_obj {
+	struct kobject  kobj;
+	struct bin_attribute dump_attr;
+	uint32_t	id;  /* becomes object name */
+	uint32_t	size;
+	char		*buffer;
+};
+#define to_dump_obj(x) container_of(x, struct dump_obj, kobj)
+
+struct dump_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct dump_obj *dump, struct dump_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct dump_obj *dump, struct dump_attribute *attr,
+			 const char *buf, size_t count);
+};
+#define to_dump_attr(x) container_of(x, struct dump_attribute, attr)
+
+static ssize_t dump_id_show(struct dump_obj *dump_obj,
+			    struct dump_attribute *attr,
+			    char *buf)
+{
+	return sprintf(buf, "0x%x\n", dump_obj->id);
+}
+
+static ssize_t dump_type_show(struct dump_obj *dump_obj,
+			      struct dump_attribute *attr,
+			      char *buf)
+{
+	/* FIXME: Add OPAL support for getting dump type */
+	return sprintf(buf, "unknown\n");
+}
+
+static ssize_t dump_ack_show(struct dump_obj *dump_obj,
+			     struct dump_attribute *attr,
+			     char *buf)
+{
+	return sprintf(buf, "ack - acknowledge dump\n");
+}
+
+/*
+ * Send acknowledgement to OPAL
+ */
+static int64_t dump_send_ack(uint32_t dump_id)
+{
+	int rc;
+
+	rc = opal_dump_ack(dump_id);
+	if (rc)
+		pr_warn("%s: Failed to send ack to Dump ID 0x%x (%d)\n",
+			__func__, dump_id, rc);
+	return rc;
+}
+
+static void delay_release_kobj(void *kobj)
+{
+	kobject_put((struct kobject *)kobj);
+}
+
+static ssize_t dump_ack_store(struct dump_obj *dump_obj,
+			      struct dump_attribute *attr,
+			      const char *buf,
+			      size_t count)
+{
+	dump_send_ack(dump_obj->id);
+	sysfs_schedule_callback(&dump_obj->kobj, delay_release_kobj,
+				&dump_obj->kobj, THIS_MODULE);
+	return count;
+}
+
+/* Attributes of a dump
+ * The binary attribute of the dump itself is dynamic
+ * due to the dynamic size of the dump
+ */
+static struct dump_attribute id_attribute =
+	__ATTR(id, 0666, dump_id_show, NULL);
+static struct dump_attribute type_attribute =
+	__ATTR(type, 0666, dump_type_show, NULL);
+static struct dump_attribute ack_attribute =
+	__ATTR(acknowledge, 0660, dump_ack_show, dump_ack_store);
+
+static ssize_t init_dump_show(struct dump_obj *dump_obj,
+			      struct dump_attribute *attr,
+			      char *buf)
+{
+	return sprintf(buf, "1 - initiate dump\n");
+}
+
+static int64_t dump_fips_init(uint8_t type)
+{
+	int rc;
+
+	rc = opal_dump_init(type);
+	if (rc)
+		pr_warn("%s: Failed to initiate FipS dump (%d)\n",
+			__func__, rc);
+	return rc;
+}
+
+static ssize_t init_dump_store(struct dump_obj *dump_obj,
+			       struct dump_attribute *attr,
+			       const char *buf,
+			       size_t count)
+{
+	dump_fips_init(DUMP_TYPE_FSP);
+	pr_info("%s: Initiated FSP dump\n", __func__);
+	return count;
+}
+
+static struct dump_attribute initiate_attribute =
+	__ATTR(initiate_dump, 0600, init_dump_show, init_dump_store);
+
+static struct attribute *initiate_attrs[] = {
+	&initiate_attribute.attr,
+	NULL,
+};
+
+static struct attribute_group initiate_attr_group = {
+	.attrs = initiate_attrs,
+};
+
+static struct kset *dump_kset;
+
+static ssize_t dump_attr_show(struct kobject *kobj,
+			      struct attribute *attr,
+			      char *buf)
+{
+	struct dump_attribute *attribute;
+	struct dump_obj *dump;
+
+	attribute = to_dump_attr(attr);
+	dump = to_dump_obj(kobj);
+
+	if (!attribute->show)
+		return -EIO;
+
+	return attribute->show(dump, attribute, buf);
+}
+
+static ssize_t dump_attr_store(struct kobject *kobj,
+			       struct attribute *attr,
+			       const char *buf, size_t len)
+{
+	struct dump_attribute *attribute;
+	struct dump_obj *dump;
+
+	attribute = to_dump_attr(attr);
+	dump = to_dump_obj(kobj);
+
+	if (!attribute->store)
+		return -EIO;
+
+	return attribute->store(dump, attribute, buf, len);
+}
+
+static const struct sysfs_ops dump_sysfs_ops = {
+	.show = dump_attr_show,
+	.store = dump_attr_store,
+};
+
+static void dump_release(struct kobject *kobj)
+{
+	struct dump_obj *dump;
+
+	dump = to_dump_obj(kobj);
+	vfree(dump->buffer);
+	kfree(dump);
+}
+
+static struct attribute *dump_default_attrs[] = {
+	&id_attribute.attr,
+	&type_attribute.attr,
+	&ack_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type dump_ktype = {
+	.sysfs_ops = &dump_sysfs_ops,
+	.release = &dump_release,
+	.default_attrs = dump_default_attrs,
+};
+
+static void free_dump_sg_list(struct opal_sg_list *list)
+{
+	struct opal_sg_list *sg1;
+	while (list) {
+		sg1 = list->next;
+		kfree(list);
+		list = sg1;
+	}
+	list = NULL;
+}
+
+static struct opal_sg_list *dump_data_to_sglist(struct dump_obj *dump)
+{
+	struct opal_sg_list *sg1, *list = NULL;
+	void *addr;
+	int64_t size;
+
+	addr = dump->buffer;
+	size = dump->size;
+
+	sg1 = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!sg1)
+		goto nomem;
+
+	list = sg1;
+	sg1->num_entries = 0;
+	while (size > 0) {
+		/* Translate virtual address to physical address */
+		sg1->entry[sg1->num_entries].data =
+			(void *)(vmalloc_to_pfn(addr) << PAGE_SHIFT);
+
+		if (size > PAGE_SIZE)
+			sg1->entry[sg1->num_entries].length = PAGE_SIZE;
+		else
+			sg1->entry[sg1->num_entries].length = size;
+
+		sg1->num_entries++;
+		if (sg1->num_entries >= SG_ENTRIES_PER_NODE) {
+			sg1->next = kzalloc(PAGE_SIZE, GFP_KERNEL);
+			if (!sg1->next)
+				goto nomem;
+
+			sg1 = sg1->next;
+			sg1->num_entries = 0;
+		}
+		addr += PAGE_SIZE;
+		size -= PAGE_SIZE;
+	}
+	return list;
+
+nomem:
+	pr_err("%s : Failed to allocate memory\n", __func__);
+	free_dump_sg_list(list);
+	return NULL;
+}
+
+static void sglist_to_phy_addr(struct opal_sg_list *list)
+{
+	struct opal_sg_list *sg, *next;
+
+	for (sg = list; sg; sg = next) {
+		next = sg->next;
+		/* Don't translate NULL pointer for last entry */
+		if (sg->next)
+			sg->next = (struct opal_sg_list *)__pa(sg->next);
+		else
+			sg->next = NULL;
+
+		/* Convert num_entries to length */
+		sg->num_entries =
+			sg->num_entries * sizeof(struct opal_sg_entry) + 16;
+	}
+}
+
+static int64_t dump_read_info(uint32_t *id, uint32_t *size)
+{
+	int rc;
+
+	rc = opal_dump_info(id, size);
+	if (rc)
+		pr_warn("%s: Failed to get dump info (%d)\n",
+			__func__, rc);
+	return rc;
+}
+
+static int64_t dump_read_data(struct dump_obj *dump)
+{
+	struct opal_sg_list *list;
+	uint64_t addr;
+	int64_t rc;
+
+	/* Allocate memory */
+	dump->buffer = vzalloc(PAGE_ALIGN(dump->size));
+	if (!dump->buffer) {
+		pr_err("%s : Failed to allocate memory\n", __func__);
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	/* Generate SG list */
+	list = dump_data_to_sglist(dump);
+	if (!list) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	/* Translate sg list addr to real address */
+	sglist_to_phy_addr(list);
+
+	/* First entry address */
+	addr = __pa(list);
+
+	/* Fetch data */
+	rc = OPAL_BUSY_EVENT;
+	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
+		rc = opal_dump_read(dump->id, addr);
+		if (rc == OPAL_BUSY_EVENT) {
+			opal_poll_events(NULL);
+			msleep(20);
+		}
+	}
+
+	if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL)
+		pr_warn("%s: Extract dump failed for ID 0x%x\n",
+			__func__, dump->id);
+
+	/* Free SG list */
+	free_dump_sg_list(list);
+
+out:
+	return rc;
+}
+
+static ssize_t dump_attr_read(struct file *filep, struct kobject *kobj,
+			      struct bin_attribute *bin_attr,
+			      char *buffer, loff_t pos, size_t count)
+{
+	ssize_t rc;
+
+	struct dump_obj *dump = to_dump_obj(kobj);
+
+	if (!dump->buffer) {
+		rc = dump_read_data(dump);
+
+		if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
+			vfree(dump->buffer);
+			dump->buffer = NULL;
+
+			return -EIO;
+		}
+		if (rc == OPAL_PARTIAL) {
+			/* On a partial read, we just return EIO
+			 * and rely on userspace to ask us to try
+			 * again.
+			 */
+			pr_info("%s: Platform dump partially read.ID = 0x%x\n",
+				__func__, dump->id);
+			return -EIO;
+		}
+	}
+
+	memcpy(buffer, dump->buffer + pos, count);
+
+	/* If we're done reading, let's free the buffer as we
+	 * probably won't need it around anymore (and can always
+	 * re-fetch it from firmware.
+	*/
+	if ((pos+count) >= dump->size) {
+		pr_info("%s: Freeing Platform dump Id = 0x%x\n",
+			__func__, dump->id);
+		vfree(dump->buffer);
+		dump->buffer = NULL;
+	}
+
+	return count;
+}
+
+static struct dump_obj *create_dump_obj(uint32_t id, size_t size)
+{
+	struct dump_obj *dump;
+	int rc;
+
+	dump = kzalloc(sizeof(*dump), GFP_KERNEL);
+	if (!dump)
+		return NULL;
+
+	dump->kobj.kset = dump_kset;
+
+	kobject_init(&dump->kobj, &dump_ktype);
+
+	sysfs_bin_attr_init(&dump->dump_attr);
+
+	dump->dump_attr.attr.name = "dump";
+	dump->dump_attr.attr.mode = 0400;
+	dump->dump_attr.size = size;
+	dump->dump_attr.read = dump_attr_read;
+
+	dump->id = id;
+	dump->size = size;
+
+	rc = kobject_add(&dump->kobj, NULL, "0x%x", id);
+	if (rc) {
+		kobject_put(&dump->kobj);
+		return NULL;
+	}
+
+	rc = sysfs_create_bin_file(&dump->kobj, &dump->dump_attr);
+	if (rc) {
+		kobject_put(&dump->kobj);
+		return NULL;
+	}
+
+	pr_info("%s: New platform dump. ID = 0x%x Size %u\n",
+		__func__, dump->id, dump->size);
+
+	kobject_uevent(&dump->kobj, KOBJ_ADD);
+
+	return dump;
+}
+
+static int process_dump(void)
+{
+	int rc;
+	uint32_t dump_id, dump_size;
+	struct dump_obj *dump;
+	char name[11];
+
+	rc = dump_read_info(&dump_id, &dump_size);
+	if (rc != OPAL_SUCCESS)
+		return rc;
+
+	sprintf(name, "0x%x", dump_id);
+
+	/* we may get notified twice, let's handle
+	 * that gracefully and not create two conflicting
+	 * entries.
+	 */
+	if (kset_find_obj(dump_kset, name))
+		return 0;
+
+	dump = create_dump_obj(dump_id, dump_size);
+	if (!dump)
+		return -1;
+
+	return 0;
+}
+
+static void dump_work_fn(struct work_struct *work)
+{
+	process_dump();
+}
+
+static DECLARE_WORK(dump_work, dump_work_fn);
+
+static void schedule_process_dump(void)
+{
+	schedule_work(&dump_work);
+}
+
+/*
+ * New dump available notification
+ *
+ * Once we get notification, we add sysfs entries for it.
+ * We only fetch the dump on demand, and create sysfs asynchronously.
+ */
+static int dump_event(struct notifier_block *nb,
+		      unsigned long events, void *change)
+{
+	if (events & OPAL_EVENT_DUMP_AVAIL)
+		schedule_process_dump();
+
+	return 0;
+}
+
+static struct notifier_block dump_nb = {
+	.notifier_call  = dump_event,
+	.next           = NULL,
+	.priority       = 0
+};
+
+void __init opal_platform_dump_init(void)
+{
+	int rc;
+
+	dump_kset = kset_create_and_add("dump", NULL, opal_kobj);
+	if (!dump_kset) {
+		pr_warn("%s: Failed to create dump kset\n", __func__);
+		return;
+	}
+
+	rc = sysfs_create_group(&dump_kset->kobj, &initiate_attr_group);
+	if (rc) {
+		pr_warn("%s: Failed to create initiate dump attr group\n",
+			__func__);
+		kobject_put(&dump_kset->kobj);
+		return;
+	}
+
+	rc = opal_notifier_register(&dump_nb);
+	if (rc) {
+		pr_warn("%s: Can't register OPAL event notifier (%d)\n",
+			__func__, rc);
+		return;
+	}
+
+	opal_dump_resend_notification();
+}
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 3e8829c..eb403da 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -126,6 +126,11 @@ OPAL_CALL(opal_return_cpu,			OPAL_RETURN_CPU);
 OPAL_CALL(opal_validate_flash,			OPAL_FLASH_VALIDATE);
 OPAL_CALL(opal_manage_flash,			OPAL_FLASH_MANAGE);
 OPAL_CALL(opal_update_flash,			OPAL_FLASH_UPDATE);
+OPAL_CALL(opal_dump_init,			OPAL_DUMP_INIT);
+OPAL_CALL(opal_dump_info,			OPAL_DUMP_INFO);
+OPAL_CALL(opal_dump_read,			OPAL_DUMP_READ);
+OPAL_CALL(opal_dump_ack,			OPAL_DUMP_ACK);
 OPAL_CALL(opal_get_msg,				OPAL_GET_MSG);
 OPAL_CALL(opal_check_completion,		OPAL_CHECK_ASYNC_COMPLETION);
+OPAL_CALL(opal_dump_resend_notification,	OPAL_DUMP_RESEND);
 OPAL_CALL(opal_sync_host_reboot,		OPAL_SYNC_HOST_REBOOT);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 65499ad..262cd1a 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -474,6 +474,8 @@ static int __init opal_init(void)
 	if (rc == 0) {
 		/* Setup code update interface */
 		opal_flash_init();
+		/* Setup platform dump extract interface */
+		opal_platform_dump_init();
 	}
 
 	return 0;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH 1/3] mm: return NUMA_NO_NODE in local_memory_node if zonelists are not setup
From: Nishanth Aravamudan @ 2014-02-25  2:34 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Michal Hocko, linux-mm, Mel Gorman, David Rientjes, Andrew Morton,
	linuxppc-dev, Joonsoo Kim, Anton Blanchard
In-Reply-To: <alpine.DEB.2.10.1402241342480.20839@nuc>

On 24.02.2014 [13:43:31 -0600], Christoph Lameter wrote:
> On Fri, 21 Feb 2014, Nishanth Aravamudan wrote:
> 
> > I added two calls to local_memory_node(), I *think* both are necessary,
> > but am willing to be corrected.
> >
> > One is in map_cpu_to_node() and one is in start_secondary(). The
> > start_secondary() path is fine, AFAICT, as we are up & running at that
> > point. But in [the renamed function] update_numa_cpu_node() which is
> > used by hotplug, we get called from do_init_bootmem(), which is before
> > the zonelists are setup.
> >
> > I think both calls are necessary because I believe the
> > arch_update_cpu_topology() is used for supporting firmware-driven
> > home-noding, which does not invoke start_secondary() again (the
> > processor is already running, we're just updating the topology in that
> > situation).
> >
> > Then again, I could special-case the do_init_bootmem callpath, which is
> > only called at kernel init time?
> 
> Well taht looks to be simpler.

Ok, I'll work on this.

> > > I do agree that calling local_memory_node() too early then trying to
> > > fudge around the consequences seems rather wrong.
> >
> > If the answer is to simply not call local_memory_node() early, I'll
> > submit a patch to at least add a comment, as there's nothing in the code
> > itself to prevent this from happening and is guaranteed to oops.
> 
> Ok.

Thanks!
-Nish

^ permalink raw reply

* Re: [PATCH v2 01/11] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC, Arnaldo Carvalho de Melo, Ingo Molnar,
	Paul Mackerras, Peter Zijlstra
  Cc: LKML
In-Reply-To: <1392415338-16288-2-git-send-email-cody@linux.vnet.ibm.com>

On Fri, 2014-14-02 at 22:02:05 UTC, Cody P Schafer wrote:
> Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
> generate functions to extract the relevent bits from
> event->attr.config{,1,2} for use by sw-like pmus where the
> 'config{,1,2}' values don't map directly to hardware registers.
> 
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  include/linux/perf_event.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index e56b07f..2702e91 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -871,4 +871,21 @@ _name##_show(struct device *dev,					\
>  									\
>  static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
>  
> +#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)		\
> +PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
> +PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
> +
> +#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)		\
> +static u64 event_get_##name##_max(void)					\
> +{									\
> +	int bits = (bit_end) - (bit_start) + 1;				\
> +	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
> +		(0xFULL << (bits - 4ULL));				\
> +}									\
> +static u64 event_get_##name(struct perf_event *event)			\
> +{									\
> +	return (event->attr.attr_var >> (bit_start)) &			\
> +		event_get_##name##_max();				\
> +}

I still don't like the names.

EVENT_GETTER_AND_FORMAT()
EVENT_RESERVED()

?

It's not clear to me the max routine is useful in general. Can't we just do:

> +#define EVENT_RESERVED(name, attr_var, bit_start, bit_end)		\
> +static u64 event_get_##name(struct perf_event *event)		\
> +{									\
> +	return (event->attr.attr_var >> (bit_start)) &			\
> +		((0x1ULL << ((bit_end) - (bit_start) + 1)) - 1ULL);	\
> +}


cheers

^ permalink raw reply

* Re: [PATCH v2 02/11] perf core: export swevent hrtimer helpers
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC, Arnaldo Carvalho de Melo, Ingo Molnar,
	Paul Mackerras, Peter Zijlstra
  Cc: LKML
In-Reply-To: <1392415338-16288-3-git-send-email-cody@linux.vnet.ibm.com>

On Fri, 2014-14-02 at 22:02:06 UTC, Cody P Schafer wrote:
> Export the swevent hrtimer helpers currently only used in events/core.c
> to allow the addition of architecture specific sw-like pmus.

Peter, Ingo, can we get your ACK on this please?

cheers


> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  include/linux/perf_event.h | 5 ++++-
>  kernel/events/core.c       | 8 ++++----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2702e91..24378a9 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -559,7 +559,10 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
>  				int src_cpu, int dst_cpu);
>  extern u64 perf_event_read_value(struct perf_event *event,
>  				 u64 *enabled, u64 *running);
> -
> +extern void perf_swevent_init_hrtimer(struct perf_event *event);
> +extern void perf_swevent_start_hrtimer(struct perf_event *event);
> +extern void perf_swevent_cancel_hrtimer(struct perf_event *event);
> +extern int perf_swevent_event_idx(struct perf_event *event);
>  
>  struct perf_sample_data {
>  	u64				type;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 56003c6..feb0347 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5816,7 +5816,7 @@ static int perf_swevent_init(struct perf_event *event)
>  	return 0;
>  }
>  
> -static int perf_swevent_event_idx(struct perf_event *event)
> +int perf_swevent_event_idx(struct perf_event *event)
>  {
>  	return 0;
>  }
> @@ -6045,7 +6045,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>  	return ret;
>  }
>  
> -static void perf_swevent_start_hrtimer(struct perf_event *event)
> +void perf_swevent_start_hrtimer(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	s64 period;
> @@ -6067,7 +6067,7 @@ static void perf_swevent_start_hrtimer(struct perf_event *event)
>  				HRTIMER_MODE_REL_PINNED, 0);
>  }
>  
> -static void perf_swevent_cancel_hrtimer(struct perf_event *event)
> +void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  
> @@ -6079,7 +6079,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  	}
>  }
>  
> -static void perf_swevent_init_hrtimer(struct perf_event *event)
> +void perf_swevent_init_hrtimer(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  
> -- 
> 1.8.5.4
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> 

^ permalink raw reply

* Re: [PATCH v2 03/11] sysfs: create bin_attributes under the requested group
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC, Greg Kroah-Hartman
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <1392415338-16288-4-git-send-email-cody@linux.vnet.ibm.com>

On Fri, 2014-14-02 at 22:02:07 UTC, Cody P Schafer wrote:
> bin_attributes created/updated in create_files() (such as those listed
> via (struct device).attribute_groups) were not placed under the
> specified group, and instead appeared in the base kobj directory.
> 
> Fix this by making bin_attributes use creating code similar to normal
> attributes.
> 
> A quick grep shows that no one is using bin_attrs in a named attribute
> group yet, so we can do this without breaking anything in usespace.
> 
> Note that I do not add is_visible() support to
> bin_attributes, though that could be done as well.
> 
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>

Greg has already taken this, so we'll consider that as good as an ack from him,
unless he wants to give us one.

cheers

^ permalink raw reply

* Re: [PATCH v2 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC, Alexander Graf, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Peter Zijlstra
In-Reply-To: <1392415338-16288-5-git-send-email-cody@linux.vnet.ibm.com>

On Fri, 2014-14-02 at 22:02:08 UTC, Cody P Schafer wrote:
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/hvcall.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index d8b600b..652f7e4 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -274,6 +274,11 @@
>  /* Platform specific hcalls, used by KVM */
>  #define H_RTAS			0xf000
>  
> +/* "Platform specific hcalls", provided by PHYP */
> +#define H_GET_24X7_CATALOG_PAGE 0xF078
> +#define H_GET_24X7_DATA		0xF07C
> +#define H_GET_PERF_COUNTER_INFO 0xF080

Some tabs some spaces, use tabs.

cheers

^ permalink raw reply

* Re: [PATCH v2 05/11] powerpc: add hv_gpci interface header
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <1392415338-16288-6-git-send-email-cody@linux.vnet.ibm.com>

On Fri, 2014-14-02 at 22:02:09 UTC, Cody P Schafer wrote:
> "H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from
> here on) is an interface to retrieve specific performance counters and
> other data from the hypervisor. All outputs have a fixed format (and
> are represented as structs in this patch).

I still see unused stuff in here, can you strip it back to just what we need.
Same goes for the next patch.

cheers

^ permalink raw reply

* Re: [PATCH v2 07/11] powerpc: add a shared interface to get gpci version and capabilities
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <1392415338-16288-8-git-send-email-cody@linux.vnet.ibm.com>

[PATCH v2 07/11] powerpc: add a shared interface to get gpci version and capabilities

All the patches that touch perf should be "powerpc/perf: foo"

On Fri, 2014-14-02 at 22:02:11 UTC, Cody P Schafer wrote:
> ...

I realise this is a fairly small patch but a changelog is still nice. You could
for example mention that we don't currently use .ga, .expanded or .lab but
we're adding the logic anyway because ...


> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  arch/powerpc/perf/hv-common.c | 39 +++++++++++++++++++++++++++++++++++++++
>  arch/powerpc/perf/hv-common.h | 17 +++++++++++++++++
>  2 files changed, 56 insertions(+)
>  create mode 100644 arch/powerpc/perf/hv-common.c
>  create mode 100644 arch/powerpc/perf/hv-common.h
> 
> diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
> new file mode 100644
> index 0000000..47e02b3
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-common.c
> @@ -0,0 +1,39 @@
> +#include <asm/io.h>
> +#include <asm/hvcall.h>
> +
> +#include "hv-gpci.h"
> +#include "hv-common.h"
> +
> +unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
> +{
> +	unsigned long r;
> +	struct p {
> +		struct hv_get_perf_counter_info_params params;
> +		struct cv_system_performance_capabilities caps;
> +	} __packed __aligned(sizeof(uint64_t));
> +
> +	struct p arg = {
> +		.params = {
> +			.counter_request = cpu_to_be32(
> +					CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
> +			.starting_index = cpu_to_be32(-1),
> +			.counter_info_version_in = 0,
> +		}
> +	};
> +
> +	r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
> +			       virt_to_phys(&arg), sizeof(arg));
> +
> +	if (r)
> +		return r;
> +
> +	pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask);
> +
> +	caps->version = arg.params.counter_info_version_out;
> +	caps->collect_privileged = !!arg.caps.perf_collect_privileged;
> +	caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
> +	caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
> +	caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
> +
> +	return r;
> +}
> diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
> new file mode 100644
> index 0000000..7e615bd
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-common.h
> @@ -0,0 +1,17 @@
> +#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
> +#define LINUX_POWERPC_PERF_HV_COMMON_H_
> +
> +#include <linux/types.h>
> +
> +struct hv_perf_caps {
> +	u16 version;
> +	u16 collect_privileged:1,
> +	    ga:1,
> +	    expanded:1,
> +	    lab:1,
> +	    unused:12;
> +};
> +
> +unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
> +
> +#endif
> -- 
> 1.8.5.4
> 
> 

^ permalink raw reply

* Re: [PATCH v2 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <1392415338-16288-9-git-send-email-cody@linux.vnet.ibm.com>

On Fri, 2014-14-02 at 22:02:12 UTC, Cody P Schafer wrote:
> This provides a basic link between perf and hv_gpci. Notably, it does
> not yet support transactions and does not list any events (they can
> still be manually composed).

Can you explain how the HV_CAPS stuff ends up looking.

I'm not against adding it, but I'd like to understand how we expect it to be
used a bit better.

cheers

> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
> new file mode 100644
> index 0000000..1f5d96d
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-gpci.c
> +
> +static struct pmu h_gpci_pmu = {
> +	.task_ctx_nr = perf_invalid_context,
> +
> +	.name = "hv_gpci",
> +	.attr_groups = attr_groups,
> +	.event_init  = h_gpci_event_init,
> +	.add         = h_gpci_event_add,
> +	.del         = h_gpci_event_del,
		     = h_gpci_event_stop,

> +	.start       = h_gpci_event_start,
> +	.stop        = h_gpci_event_stop,
> +	.read        = h_gpci_event_read,
		     = h_gpci_event_update

> +	.event_idx = perf_swevent_event_idx,
> +};

^ permalink raw reply

* Re: [PATCH v2 09/11] powerpc/perf: add support for the hv 24x7 interface
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <1392415338-16288-10-git-send-email-cody@linux.vnet.ibm.com>

On Fri, 2014-14-02 at 22:02:13 UTC, Cody P Schafer wrote:
> This provides a basic interface between hv_24x7 and perf. Similar to
> the one provided for gpci, it lacks transaction support and does not
> list any events.
> 
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  arch/powerpc/perf/hv-24x7.c | 491 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 491 insertions(+)
>  create mode 100644 arch/powerpc/perf/hv-24x7.c
> 
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> new file mode 100644
> index 0000000..13de140
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-24x7.c
...
> +
> +/*
> + * read_offset_data - copy data from one buffer to another while treating the
> + *                    source buffer as a small view on the total avaliable
> + *                    source data.
> + *
> + * @dest: buffer to copy into
> + * @dest_len: length of @dest in bytes
> + * @requested_offset: the offset within the source data we want. Must be > 0
> + * @src: buffer to copy data from
> + * @src_len: length of @src in bytes
> + * @source_offset: the offset in the sorce data that (src,src_len) refers to.
> + *                 Must be > 0
> + *
> + * returns the number of bytes copied.
> + *
> + * '.' areas in d are written to.
> + *
> + *                       u
> + *   x         w	 v  z
> + * d           |.........|
> + * s |----------------------|
> + *
> + *                      u
> + *   x         w	z     v
> + * d           |........------|
> + * s |------------------|
> + *
> + *   x         w        u,z,v
> + * d           |........|
> + * s |------------------|
> + *
> + *   x,w                u,v,z
> + * d |------------------|
> + * s |------------------|
> + *
> + *   x        u
> + *   w        v		z
> + * d |........|
> + * s |------------------|
> + *
> + *   x      z   w      v
> + * d            |------|
> + * s |------|
> + *
> + * x = source_offset
> + * w = requested_offset
> + * z = source_offset + src_len
> + * v = requested_offset + dest_len
> + *
> + * w_offset_in_s = w - x = requested_offset - source_offset
> + * z_offset_in_s = z - x = src_len
> + * v_offset_in_s = v - x = request_offset + dest_len - src_len
> + * u_offset_in_s = min(z_offset_in_s, v_offset_in_s)
> + *
> + * copy_len = u_offset_in_s - w_offset_in_s = min(z_offset_in_s, v_offset_in_s)
> + *						- w_offset_in_s

Comments are great, especially for complicated code like this. But at a glance
I don't actually understand what this comment is trying to tell me.

> + */
> +static ssize_t read_offset_data(void *dest, size_t dest_len,
> +				loff_t requested_offset, void *src,
> +				size_t src_len, loff_t source_offset)
> +{
> +	size_t w_offset_in_s = requested_offset - source_offset;
> +	size_t z_offset_in_s = src_len;
> +	size_t v_offset_in_s = requested_offset + dest_len - src_len;
> +	size_t u_offset_in_s = min(z_offset_in_s, v_offset_in_s);
> +	size_t copy_len = u_offset_in_s - w_offset_in_s;
> +
> +	if (requested_offset < 0 || source_offset < 0)
> +		return -EINVAL;
> +
> +	if (z_offset_in_s <= w_offset_in_s)
> +		return 0;
> +
> +	memcpy(dest, src + w_offset_in_s, copy_len);
> +	return copy_len;
> +}
> +
> +static unsigned long h_get_24x7_catalog_page(char page[static 4096],
> +					     u32 version, u32 index)
> +{
> +	WARN_ON(!IS_ALIGNED((unsigned long)page, 4096));
> +	return plpar_hcall_norets(H_GET_24X7_CATALOG_PAGE,
> +			virt_to_phys(page),
> +			version,
> +			index);
> +}
> +
> +static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
> +			    struct bin_attribute *bin_attr, char *buf,
> +			    loff_t offset, size_t count)
> +{
> +	unsigned long hret;
> +	ssize_t ret = 0;
> +	size_t catalog_len = 0, catalog_page_len = 0, page_count = 0;
> +	loff_t page_offset = 0;
> +	uint32_t catalog_version_num = 0;
> +	void *page = kmalloc(4096, GFP_USER);
> +	struct hv_24x7_catalog_page_0 *page_0 = page;
> +	if (!page)
> +		return -ENOMEM;
> +
> +
> +	hret = h_get_24x7_catalog_page(page, 0, 0);
> +	if (hret) {
> +		ret = -EIO;
> +		goto e_free;
> +	}
> +
> +	catalog_version_num = be32_to_cpu(page_0->version);
> +	catalog_page_len = be32_to_cpu(page_0->length);
> +	catalog_len = catalog_page_len * 4096;
> +
> +	page_offset = offset / 4096;
> +	page_count  = count  / 4096;
> +
> +	if (page_offset >= catalog_page_len)
> +		goto e_free;
> +
> +	if (page_offset != 0) {
> +		hret = h_get_24x7_catalog_page(page, catalog_version_num,
> +					       page_offset);
> +		if (hret) {
> +			ret = -EIO;
> +			goto e_free;
> +		}
> +	}
> +
> +	ret = read_offset_data(buf, count, offset,
> +				page, 4096, page_offset * 4096);
> +e_free:
> +	if (hret)
> +		pr_err("h_get_24x7_catalog_page(ver=%d, page=%lld) failed: rc=%ld\n",
> +				catalog_version_num, page_offset, hret);
> +	kfree(page);
> +
> +	pr_devel("catalog_read: offset=%lld(%lld) count=%zu(%zu) catalog_len=%zu(%zu) => %zd\n",
> +			offset, page_offset, count, page_count, catalog_len,
> +			catalog_page_len, ret);
> +
> +	return ret;
> +}
> +
> +#define PAGE_0_ATTR(_name, _fmt, _expr)				\
> +static ssize_t _name##_show(struct device *dev,			\
> +			    struct device_attribute *dev_attr,	\
> +			    char *buf)				\
> +{								\
> +	unsigned long hret;					\
> +	ssize_t ret = 0;					\
> +	void *page = kmalloc(4096, GFP_USER);			\
> +	struct hv_24x7_catalog_page_0 *page_0 = page;		\
> +	if (!page)						\
> +		return -ENOMEM;					\
> +	hret = h_get_24x7_catalog_page(page, 0, 0);		\
> +	if (hret) {						\
> +		ret = -EIO;					\
> +		goto e_free;					\
> +	}							\
> +	ret = sprintf(buf, _fmt, _expr);			\
> +e_free:								\
> +	kfree(page);						\
> +	return ret;						\
> +}								\
> +static DEVICE_ATTR_RO(_name)
> +
> +PAGE_0_ATTR(catalog_version, "%lld\n",
> +		(unsigned long long)be32_to_cpu(page_0->version));
> +PAGE_0_ATTR(catalog_len, "%lld\n",
> +		(unsigned long long)be32_to_cpu(page_0->length) * 4096);
> +static BIN_ATTR_RO(catalog, 0/* real length varies */);

So we're dumping the catalog out as a binary blob.

Why do we want to do that?

It clearly violates the sysfs rule-of-sorts of ASCII and one value per file.
Obviously there can be exceptions, but what's our justification?

> +static struct bin_attribute *if_bin_attrs[] = {
> +	&bin_attr_catalog,
> +	NULL,
> +};
> +
> +static struct attribute *if_attrs[] = {
> +	&dev_attr_catalog_len.attr,
> +	&dev_attr_catalog_version.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group if_group = {
> +	.name = "interface",
> +	.bin_attrs = if_bin_attrs,
> +	.attrs = if_attrs,
> +};

Both pmus have an "interface" directory, but they don't seem to have anything
in common? Its feels a little ad-hoc.

> +static const struct attribute_group *attr_groups[] = {
> +	&format_group,
> +	&if_group,
> +	NULL,
> +};


cheers

^ permalink raw reply

* Re: [PATCH v2 10/11] powerpc/perf: add kconfig option for hypervisor provided counters
From: Michael Ellerman @ 2014-02-25  3:33 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC, Aneesh Kumar K.V, Anshuman Khandual,
	Anton Blanchard, Benjamin Herrenschmidt, Kumar Gala, Lijun Pan,
	Li Yang, Paul Bolle, Priyanka Jain, Scott Wood, Tang Yuantian
  Cc: Ingo Molnar, Paul Mackerras, LKML, Arnaldo Carvalho de Melo,
	Peter Zijlstra
In-Reply-To: <1392415338-16288-11-git-send-email-cody@linux.vnet.ibm.com>

On Fri, 2014-14-02 at 22:02:14 UTC, Cody P Schafer wrote:
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  arch/powerpc/perf/Makefile             | 2 ++
>  arch/powerpc/platforms/Kconfig.cputype | 6 ++++++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index 60d71ee..f9c083a 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
>  obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
>  obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
>  
> +obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
> +
>  obj-$(CONFIG_PPC64)		+= $(obj64-y)
>  obj-$(CONFIG_PPC32)		+= $(obj32-y)
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index 434fda3..dcc67cd 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -364,6 +364,12 @@ config PPC_PERF_CTRS
>         help
>           This enables the powerpc-specific perf_event back-end.
>  
> +config HV_PERF_CTRS
> +       def_bool y

This was bool, why did you change it?

> +       depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT

Should be:

	depends on PERF_EVENTS && PPC_PSERIES

> +       help
> +         Enable access to perf counters provided by the hypervisor
> +

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
From: Madhavan Srinivasan @ 2014-02-25  4:53 UTC (permalink / raw)
  To: Cody P Schafer, Benjamin Herrenschmidt, Olof Johansson,
	Paul Gortmaker, Wang Dongsheng
  Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <1393028074-26797-1-git-send-email-cody@linux.vnet.ibm.com>

On Saturday 22 February 2014 05:44 AM, Cody P Schafer wrote:
> /sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP
> in commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does
> nothing. Add a pr_warn() to convince any users that they should stop
> using it.
> 
> The commit message from the removing commit notes that this
> functionality should move into the cpuidle driver, essentially by

Would prefer to cleanup the code since the functionality is moved,
instead of adding to it.

> adjusting target_residency to the specified value. At the moment,
> target_residency is not exposed by cpuidle's sysfs, so there isn't a
> drop in replacement for this.
> 
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/sysfs.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 97e1dc9..84097b4 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -50,6 +50,9 @@ static ssize_t store_smt_snooze_delay(struct device *dev,
>  	if (ret != 1)
>  		return -EINVAL;
>  
> +	pr_warn_ratelimited("%s (%d): /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
> +		  current->comm, task_pid_nr(current), cpu->dev.id);
> +
>  	per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
>  	return count;
>  }
> @@ -60,6 +63,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>  {
>  	struct cpu *cpu = container_of(dev, struct cpu, dev);
>  
> +	pr_warn_ratelimited("%s (%d): /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
> +		  current->comm, task_pid_nr(current), cpu->dev.id);
> +
>  	return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
>  }
>  
> 

^ permalink raw reply

* Re: [PATCH] powerpc/powernv: Read opal error log and export it through sysfs interface.
From: Mahesh Jagannath Salgaonkar @ 2014-02-25  5:02 UTC (permalink / raw)
  To: Stewart Smith, linuxppc-dev
In-Reply-To: <87y515mij4.fsf@river.au.ibm.com>

On 02/21/2014 05:41 AM, Stewart Smith wrote:
>  Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>  > This patch adds support to read error logs from OPAL and export them
>  > to userspace through sysfs interface /sys/firmware/opa/opal-elog.

Hi Stewart,

Thanks for the review. This code definitely needs improvement.

> 
>  I think we could provide a better interface with instead having a file
>  per log message appear in sysfs. We're never going to have more than 128
>  of these at any one time on the Linux side, so it's not going to bee too
>  many files.

It is not just about 128 files, we may be adding/removing sysfs node for
every new log id that gets informed to kernel and ack-ed. In worst case,
when we have flood of elog errors with user daemon consuming it and
ack-ing back to get ready for next log in a tight poll, we may
continuously add/remove the sysfs node for each new <id>.

> 
>  e.g. /sys/firmware/opal/elog/<id>
> 
>  that way, any new file in /sys/firmware/opal/elog/ means there's a new
>  log entry available. I believe there's 
> 
>  To ack a log, you could just echo 'ack' to the file.
> 
>  The other option woudl be to more closely follow what sysfs is meant to
>  be - ascii text. This would mean having more (any) of the parser in
>  kernel for the error logs - which may/may not be a bad idea.
> 
>  However, it would make the end user code for consuming them much much
>  simpler, and that may be a good thing.
> 
>  Having some way of getting some information out without a userspace
>  parser is probably good though, I'm pretty sure having only a binary
>  interface in /sys is at least partially frowned upon.
> 
>  > This is what user space tool would do:
>  > - Read error log from /sys/firmware/opa/opal-elog.
>  > - Save it to the disk.
>  > - Send an acknowledgement on successful consumption by writing error log
>  >   id to /sys/firmware/opa/opal-elog-ack.
> 
>  A userspace tool may want to explicitly *not* ack the log too, or only
>  ack some entries, so the interface sohuld be sane for this use case too.
> 
>  e.g. we could display them in petitboot.
> 
>  > diff --git a/arch/powerpc/platforms/powernv/opal-elog.c
>  > b/arch/powerpc/platforms/powernv/opal-elog.c
>  [ 2 more citation lines. Click/Enter to show. ]
>  > new file mode 100644
>  > index 0000000..fc891ae
>  > --- /dev/null
>  > +++ b/arch/powerpc/platforms/powernv/opal-elog.c
>  > @@ -0,0 +1,309 @@
>  <snip>
>  > +/* Maximum size of a single log on FSP is 16KB */
>  > +#define OPAL_MAX_ERRLOG_SIZE	16384
> 
>  I've seen some conflicting things on this - is it 2kb or 16kb?

We choose 16kb because we want to pull all the log data and not partial.

> 
>  > +
>  > +struct opal_err_log {
>  > +	struct list_head link;
>  > +	uint64_t opal_log_id;
> 
>  why is this uint64_t and not uint32_t? It appears that the log id is 32bits.

Agree, This needs to be changed to uint_32.

> 
>  > +	size_t opal_log_size;
>  > +	uint8_t data[OPAL_MAX_ERRLOG_SIZE];
>  > +};
>  > +
>  > +/* Pre-allocated temp buffer to pull error log from opal. */
>  > +static uint8_t err_log_data[OPAL_MAX_ERRLOG_SIZE];
> 
>  Why do we need temporary space? Why not just store directly into struct
>  opal_err_log?
> 
>  > +/* Protect err_log_data buf */
>  > +static DEFINE_MUTEX(err_log_data_mutex);
>  [ 15 more citation lines. Click/Enter to show. ]
>  > +
>  > +static uint64_t total_log_size;
>  > +static bool opal_log_available;
>  > +static LIST_HEAD(elog_list);
>  > +static LIST_HEAD(elog_ack_list);
>  > +
>  > +/* lock to protect elog_list and elog-ack_list. */
>  > +static DEFINE_SPINLOCK(opal_elog_lock);
>  > +
>  > +static DECLARE_WAIT_QUEUE_HEAD(opal_log_wait);
>  > +
>  > +/*
>  > + * Interface for user to acknowledge the error log.
>  > + *
>  > + * Once user acknowledge the log, we delete that record entry from the
>  > + * list and move it ack list.
>  > + */
>  > +void opal_elog_ack(uint64_t ack_id)
> 
>  s/ack_id/log_id/

Yup. makes sense.

> 
>  > +
>  > +static ssize_t elog_ack_store(struct kobject *kobj,
>  [ 7 more citation lines. Click/Enter to show. ]
>  > +					struct kobj_attribute *attr,
>  > +					const char *buf, size_t count)
>  > +{
>  > +	uint32_t log_ack_id;
>  > +	log_ack_id = *(uint32_t *) buf;
>  > +
>  > +	/* send acknowledgment to FSP */
>  > +	opal_elog_ack(log_ack_id);
>  > +	return 0;
>  > +}
> 
>  This function has a few problems:
> 
>  Consider the following actions:
>  $ echo 1 > /sys/firmware/opal/opal-elog-ack
>  $ echo 'abcde' > /sys/firmware/opal/opal-elog-ack
> 
>  The former will read undefined memory and the latter will make a kernel
>  thread, rsyslogd and systemd-journal all each a CPU each.
> 
>  Basically, the problems are:
>  1) not endian safe
>  2) not following store API of returning nr bytes read
>  3) binary interface. Use sscanf to read numbers instead.
> 
>  > +/*
>  > + * Show error log records to user.
>  [ 9 more citation lines. Click/Enter to show. ]
>  > + */
>  > +static ssize_t opal_elog_show(struct file *filp, struct kobject *kobj,
>  > +				struct bin_attribute *bin_attr, char *buf,
>  > +				loff_t pos, size_t count)
>  > +{
>  > +	unsigned long flags;
>  > +	struct opal_err_log *record, *next;
>  > +	size_t size = 0;
>  > +	size_t data_to_copy = 0;
>  > +	int error = 0;
>  > +
>  > +	/* Display one log@a time. */
> 
>  use words, not @.
> 
>  > +	if (count > OPAL_MAX_ERRLOG_SIZE)
>  > +		count = OPAL_MAX_ERRLOG_SIZE;
>  [ 23 more citation lines. Click/Enter to show. ]
>  > +	spin_lock_irqsave(&opal_elog_lock, flags);
>  > +	/* Align the pos to point within total errlog size. */
>  > +	if (total_log_size && pos > total_log_size)
>  > +		pos = pos % total_log_size;
>  > +
>  > +	/*
>  > +	 * if pos goes beyond total_log_size then we know we don't have any
>  > +	 * new record to show.
>  > +	 */
>  > +	if (total_log_size == 0 || pos >= total_log_size) {
>  > +		opal_log_available = 0;
>  > +		if (filp->f_flags & O_NONBLOCK) {
>  > +			spin_unlock_irqrestore(&opal_elog_lock, flags);
>  > +			error = -EAGAIN;
>  > +			goto out;
>  > +		}
>  > +		spin_unlock_irqrestore(&opal_elog_lock, flags);
>  > +		pos = 0;
>  > +
>  > +		/* Wait until we get log from sapphire */
>  > +		error = wait_event_interruptible(opal_log_wait,
>  > +						 opal_log_available);
>  > +		if (error)
>  > +			goto out;
>  > +		spin_lock_irqsave(&opal_elog_lock, flags);
>  > +	}
> 
>  Why should we wait for there to be a log message? If there's not one
>  then there's not one and that's fine.
> 
>  I also wonder if we really need total_log_size and opal_log_available,
>  this information seems readily available from the list of events.
> 
>  Instead, for notification (as i understand it)  we should be using
>  sysfs_notify() from kernel and then in userspace we can just call
>  select() to wait for something to happen.
> 
>  > +/*
>  > + * Pre-allocate a buffer to hold handful of error logs until user space
>  [ 5 more citation lines. Click/Enter to show. ]
>  > + * consumes it.
>  > + */
>  > +static int init_err_log_buffer(void)
>  > +{
>  > +	int i = 0;
>  > +	struct opal_err_log *buf_ptr;
>  > +
>  > +	buf_ptr = vmalloc(sizeof(struct opal_err_log) * MAX_NUM_RECORD);
> 
>  This means we constantly use 128 * sizeof(struct opal_err_log) which
>  equates to somewhere north of 2MB of memory (due to list overhead).
> 
>  I don't think we need to statically allocate this, we can probably just
>  allocate on-demand as in a typical system you're probably quite
>  unlikely to have too many of these sitting around (besides, if for
>  whatever reason we cannot allocate memory at some point, that's okay
>  because we can read it again later).

The reason we choose to go for static allocation is, we can not afford
to drop or delay a critical error log due to memory allocation failure.
OR we can keep static allocations for critical errors and follow dynamic
allocation for informative error logs.  What do you say?

Thanks,
-Mahesh.

^ permalink raw reply

* [PATCH 4/9] powerpc/eeh: Introduce eeh_pe_free()
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

The patch introduces eeh_pe_free() to replace original kfree(pe)
so that we could have more checks there and calls to platform
interface supplied by eeh_ops in future.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_pe.c |   25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index f0c353f..2add834 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -60,6 +60,27 @@ static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type)
 }
 
 /**
+ * eeh_pe_free - Free PE
+ * @pe: EEH PE
+ *
+ * Free PE instance dynamically
+ */
+static void eeh_pe_free(struct eeh_pe *pe)
+{
+	if (!pe)
+		return;
+
+	if (!list_empty(&pe->child_list) ||
+	    !list_empty(&pe->edevs)) {
+		pr_warn("%s: PHB#%x-PE#%x has child PE or EEH dev\n",
+			__func__, pe->phb->global_number, pe->addr);
+		return;
+	}
+
+	kfree(pe);
+}
+
+/**
  * eeh_phb_pe_create - Create PHB PE
  * @phb: PCI controller
  *
@@ -374,7 +395,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 			pr_err("%s: No PHB PE is found (PHB Domain=%d)\n",
 				__func__, edev->phb->global_number);
 			edev->pe = NULL;
-			kfree(pe);
+			eeh_pe_free(pe);
 			return -EEXIST;
 		}
 	}
@@ -433,7 +454,7 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev)
 			if (list_empty(&pe->edevs) &&
 			    list_empty(&pe->child_list)) {
 				list_del(&pe->child);
-				kfree(pe);
+				eeh_pe_free(pe);
 			} else {
 				break;
 			}
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH v2 0/9] EEH improvement
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The series of patches intends to improve reliability of EEH on PowerNV
platform. First all, we have had multiple duplicate states (flags) for
PHB and PE, so we remove those duplicate states to simplify the code.
Besides, we had corrupted PHB diag-data for case of frozen PE. In order
to solve the problem, we introduce eeh_ops->event() and notifications
are sent from EEH core to (PowerNV) platform on creating or destroying
PE instance so that we can allocate or free PHB diag-data backend. Then
we cache the PHB diag-data on the first call to eeh_ops->get_state()
and dump it afterwards, which helps to get correct PHB diag-data.

With the patchset applied, we never dump PHB diag-data for INF errors.
Instead, we just maintain statistics in /proc/powerpc/eeh_inf_err. Also,
we changed the PHB diag-data dump format for a bit to have multiple
fields per line and omits the line with all zero'd fields as Ben suggested.

v1 -> v2:
	* Amending commit logs
	* Support eeh_ops->event() and maintain PHB diag-data on basis
	  of PE instance
	* When dumping PHB diag-data, to replace "-" with "00000000" and
	  omit the line if the fields of it are all zeros.

---

arch/powerpc/include/asm/eeh.h               |    7 ++-
arch/powerpc/kernel/eeh.c                    |   10 +---
arch/powerpc/kernel/eeh_driver.c             |   10 ++--
arch/powerpc/kernel/eeh_pe.c                 |   39 ++++++++++++-
arch/powerpc/platforms/powernv/eeh-ioda.c    |  193 ++++++++++++++++++++++++++++++++++++-------------------------
arch/powerpc/platforms/powernv/eeh-powernv.c |   74 +++++++++++++++++++-----
arch/powerpc/platforms/powernv/pci.c         |  228 +++++++++++++++++++++++++++++++++++++++++-------------------------
arch/powerpc/platforms/powernv/pci.h         |   11 ++--
arch/powerpc/platforms/pseries/eeh_pseries.c |    3 +-
9 files changed, 358 insertions(+), 217 deletions(-)

Thanks,
Gavin

^ permalink raw reply

* [PATCH 2/9] powerpc/powernv: Remove PNV_EEH_STATE_REMOVED
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

The PHB state PNV_EEH_STATE_REMOVED maintained in pnv_phb isn't
so useful any more and it's duplicated to EEH_PE_ISOLATED. The
patch replaces PNV_EEH_STATE_REMOVED with EEH_PE_ISOLATED.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   56 ++++++++---------------------
 arch/powerpc/platforms/powernv/pci.h      |    1 -
 2 files changed, 15 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index f514743..0d1d424 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -662,22 +662,6 @@ static void ioda_eeh_phb_diag(struct pci_controller *hose)
 	pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
 }
 
-static int ioda_eeh_get_phb_pe(struct pci_controller *hose,
-			       struct eeh_pe **pe)
-{
-	struct eeh_pe *phb_pe;
-
-	phb_pe = eeh_phb_pe_get(hose);
-	if (!phb_pe) {
-		pr_warning("%s Can't find PE for PHB#%d\n",
-			   __func__, hose->global_number);
-		return -EEXIST;
-	}
-
-	*pe = phb_pe;
-	return 0;
-}
-
 static int ioda_eeh_get_pe(struct pci_controller *hose,
 			   u16 pe_no, struct eeh_pe **pe)
 {
@@ -685,7 +669,8 @@ static int ioda_eeh_get_pe(struct pci_controller *hose,
 	struct eeh_dev dev;
 
 	/* Find the PHB PE */
-	if (ioda_eeh_get_phb_pe(hose, &phb_pe))
+	phb_pe = eeh_phb_pe_get(hose);
+	if (!phb_pe)
 		return -EEXIST;
 
 	/* Find the PE according to PE# */
@@ -713,6 +698,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 {
 	struct pci_controller *hose;
 	struct pnv_phb *phb;
+	struct eeh_pe *phb_pe;
 	u64 frozen_pe_no;
 	u16 err_type, severity;
 	long rc;
@@ -729,10 +715,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 	list_for_each_entry(hose, &hose_list, list_node) {
 		/*
 		 * If the subordinate PCI buses of the PHB has been
-		 * removed, we needn't take care of it any more.
+		 * removed or is exactly under error recovery, we
+		 * needn't take care of it any more.
 		 */
 		phb = hose->private_data;
-		if (phb->eeh_state & PNV_EEH_STATE_REMOVED)
+		phb_pe = eeh_phb_pe_get(hose);
+		if (!phb_pe || (phb_pe->state & EEH_PE_ISOLATED))
 			continue;
 
 		rc = opal_pci_next_error(phb->opal_id,
@@ -765,12 +753,6 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 		switch (err_type) {
 		case OPAL_EEH_IOC_ERROR:
 			if (severity == OPAL_EEH_SEV_IOC_DEAD) {
-				list_for_each_entry(hose, &hose_list,
-						    list_node) {
-					phb = hose->private_data;
-					phb->eeh_state |= PNV_EEH_STATE_REMOVED;
-				}
-
 				pr_err("EEH: dead IOC detected\n");
 				ret = EEH_NEXT_ERR_DEAD_IOC;
 			} else if (severity == OPAL_EEH_SEV_INF) {
@@ -783,17 +765,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 			break;
 		case OPAL_EEH_PHB_ERROR:
 			if (severity == OPAL_EEH_SEV_PHB_DEAD) {
-				if (ioda_eeh_get_phb_pe(hose, pe))
-					break;
-
+				*pe = phb_pe;
 				pr_err("EEH: dead PHB#%x detected\n",
 					hose->global_number);
-				phb->eeh_state |= PNV_EEH_STATE_REMOVED;
 				ret = EEH_NEXT_ERR_DEAD_PHB;
 			} else if (severity == OPAL_EEH_SEV_PHB_FENCED) {
-				if (ioda_eeh_get_phb_pe(hose, pe))
-					break;
-
+				*pe = phb_pe;
 				pr_err("EEH: fenced PHB#%x detected\n",
 					hose->global_number);
 				ret = EEH_NEXT_ERR_FENCED_PHB;
@@ -813,15 +790,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 			 * fenced PHB so that it can be recovered.
 			 */
 			if (ioda_eeh_get_pe(hose, frozen_pe_no, pe)) {
-				if (!ioda_eeh_get_phb_pe(hose, pe)) {
-					pr_err("EEH: Escalated fenced PHB#%x "
-					       "detected for PE#%llx\n",
-						hose->global_number,
-						frozen_pe_no);
-					ret = EEH_NEXT_ERR_FENCED_PHB;
-				} else {
-					ret = EEH_NEXT_ERR_NONE;
-				}
+				*pe = phb_pe;
+				pr_err("EEH: Escalated fenced PHB#%x "
+				       "detected for PE#%llx\n",
+					hose->global_number,
+					frozen_pe_no);
+				ret = EEH_NEXT_ERR_FENCED_PHB;
 			} else {
 				pr_err("EEH: Frozen PE#%x on PHB#%x detected\n",
 					(*pe)->addr, (*pe)->phb->global_number);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index cde1694..6870f60 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -83,7 +83,6 @@ struct pnv_eeh_ops {
 };
 
 #define PNV_EEH_STATE_ENABLED	(1 << 0)	/* EEH enabled	*/
-#define PNV_EEH_STATE_REMOVED	(1 << 1)	/* PHB removed	*/
 
 #endif /* CONFIG_EEH */
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/9] powerpc/powernv: Move PNV_EEH_STATE_ENABLED around
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

The flag PNV_EEH_STATE_ENABLED is put into pnv_phb::eeh_state,
which is protected by CONFIG_EEH. We needn't that. Instead, we
can have pnv_phb::flags and maintain all flags there, which is
the purpose of the patch. The patch also renames PNV_EEH_STATE_ENABLED
to PNV_PHB_FLAG_EEH.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |    2 +-
 arch/powerpc/platforms/powernv/pci.c      |    8 ++------
 arch/powerpc/platforms/powernv/pci.h      |    7 +++----
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 0d1d424..04b4710 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -153,7 +153,7 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	}
 #endif
 
-	phb->eeh_state |= PNV_EEH_STATE_ENABLED;
+	phb->flags |= PNV_PHB_FLAG_EEH;
 
 	return 0;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 95633d7..3955fc0 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -396,7 +396,7 @@ int pnv_pci_cfg_read(struct device_node *dn,
 	if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
 		return PCIBIOS_SUCCESSFUL;
 
-	if (phb->eeh_state & PNV_EEH_STATE_ENABLED) {
+	if (phb->flags & PNV_PHB_FLAG_EEH) {
 		if (*val == EEH_IO_ERROR_VALUE(size) &&
 		    eeh_dev_check_failure(of_node_to_eeh_dev(dn)))
 			return PCIBIOS_DEVICE_NOT_FOUND;
@@ -434,12 +434,8 @@ int pnv_pci_cfg_write(struct device_node *dn,
 	}
 
 	/* Check if the PHB got frozen due to an error (no response) */
-#ifdef CONFIG_EEH
-	if (!(phb->eeh_state & PNV_EEH_STATE_ENABLED))
+	if (!(phb->flags & PNV_PHB_FLAG_EEH))
 		pnv_pci_config_check_eeh(phb, dn);
-#else
-	pnv_pci_config_check_eeh(phb, dn);
-#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 6870f60..94e3495 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -81,24 +81,23 @@ struct pnv_eeh_ops {
 	int (*configure_bridge)(struct eeh_pe *pe);
 	int (*next_error)(struct eeh_pe **pe);
 };
-
-#define PNV_EEH_STATE_ENABLED	(1 << 0)	/* EEH enabled	*/
-
 #endif /* CONFIG_EEH */
 
+#define PNV_PHB_FLAG_EEH	(1 << 0)
+
 struct pnv_phb {
 	struct pci_controller	*hose;
 	enum pnv_phb_type	type;
 	enum pnv_phb_model	model;
 	u64			hub_id;
 	u64			opal_id;
+	int			flags;
 	void __iomem		*regs;
 	int			initialized;
 	spinlock_t		lock;
 
 #ifdef CONFIG_EEH
 	struct pnv_eeh_ops	*eeh_ops;
-	int			eeh_state;
 #endif
 
 #ifdef CONFIG_DEBUG_FS
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/9] powerpc/eeh: Remove EEH_PE_PHB_DEAD
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD is duplicate to
EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD
would be removed from the system. However, it's safe to replace
that with EEH_PE_ISOLATED.

The patch also clear EEH_PE_RECOVERING after fenced PHB has been handled,
either failure or success. It makes the PHB PE state consistent with:

	PHB functions normally		  NONE
	PHB has been removed		  EEH_PE_ISOLATED
	PHB fenced, recovery in progress  EEH_PE_ISOLATED | RECOVERING

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   |    1 -
 arch/powerpc/kernel/eeh.c        |   10 ++--------
 arch/powerpc/kernel/eeh_driver.c |   10 +++++-----
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index d4dd41f..a61b06f 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -53,7 +53,6 @@ struct device_node;
 
 #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
 #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
-#define EEH_PE_PHB_DEAD		(1 << 2)	/* Dead PHB		*/
 
 #define EEH_PE_KEEP		(1 << 8)	/* Keep PE on hotplug	*/
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index e7b76a6..f167676 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -232,7 +232,6 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 {
 	size_t loglen = 0;
 	struct eeh_dev *edev, *tmp;
-	bool valid_cfg_log = true;
 
 	/*
 	 * When the PHB is fenced or dead, it's pointless to collect
@@ -240,12 +239,7 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 	 * 0xFF's. For ER, we still retrieve the data from the PCI
 	 * config space.
 	 */
-	if (eeh_probe_mode_dev() &&
-	    (pe->type & EEH_PE_PHB) &&
-	    (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
-		valid_cfg_log = false;
-
-	if (valid_cfg_log) {
+	if (!(pe->type & EEH_PE_PHB)) {
 		eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
 		eeh_ops->configure_bridge(pe);
 		eeh_pe_restore_bars(pe);
@@ -309,7 +303,7 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
 
 	/* If the PHB has been in problematic state */
 	eeh_serialize_lock(&flags);
-	if (phb_pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)) {
+	if (phb_pe->state & EEH_PE_ISOLATED) {
 		ret = 0;
 		goto out;
 	}
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index fdc679d..4cf0467 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -665,8 +665,7 @@ static void eeh_handle_special_event(void)
 				phb_pe = eeh_phb_pe_get(hose);
 				if (!phb_pe) continue;
 
-				eeh_pe_state_mark(phb_pe,
-					EEH_PE_ISOLATED | EEH_PE_PHB_DEAD);
+				eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
 			}
 
 			eeh_serialize_unlock(flags);
@@ -682,8 +681,7 @@ static void eeh_handle_special_event(void)
 			eeh_remove_event(pe);
 
 			if (rc == EEH_NEXT_ERR_DEAD_PHB)
-				eeh_pe_state_mark(pe,
-					EEH_PE_ISOLATED | EEH_PE_PHB_DEAD);
+				eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
 			else
 				eeh_pe_state_mark(pe,
 					EEH_PE_ISOLATED | EEH_PE_RECOVERING);
@@ -707,12 +705,14 @@ static void eeh_handle_special_event(void)
 		if (rc == EEH_NEXT_ERR_FROZEN_PE ||
 		    rc == EEH_NEXT_ERR_FENCED_PHB) {
 			eeh_handle_normal_event(pe);
+			eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
 		} else {
 			pci_lock_rescan_remove();
 			list_for_each_entry(hose, &hose_list, list_node) {
 				phb_pe = eeh_phb_pe_get(hose);
 				if (!phb_pe ||
-				    !(phb_pe->state & EEH_PE_PHB_DEAD))
+				    !(phb_pe->state & EEH_PE_ISOLATED) ||
+				    (phb_pe->state & EEH_PE_RECOVERING))
 					continue;
 
 				/* Notify all devices to be down */
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 6/9] powerpc/powernv: Support eeh_ops->event()
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

The patch implements the backend for eeh_ops->event() on PowerNV
platform so that we can allocate or destroy PHB diag-data buffer,
which is attached to eeh_pe::data.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |   42 +++++++++++++++++++++++++-
 arch/powerpc/platforms/pseries/eeh_pseries.c |    3 +-
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index a59788e..cfba40a 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -365,6 +365,45 @@ static int powernv_eeh_restore_config(struct device_node *dn)
 	return 0;
 }
 
+static int powernv_eeh_event(int event, void *data)
+{
+	struct eeh_pe *pe = data;
+	struct pnv_phb *phb;
+	int ret = 0;
+
+	switch (event) {
+	case EEH_EVENT_PE_ALLOC:
+		if (!pe) {
+			ret = -EINVAL;
+			break;
+		} else if (pe->data) {
+			ret = -EEXIST;
+			break;
+		}
+
+		phb = pe->phb->private_data;
+		if (phb->model == PNV_PHB_MODEL_P7IOC ||
+		    phb->model == PNV_PHB_MODEL_PHB3) {
+			pe->data = kzalloc(PNV_PCI_DIAG_BUF_SIZE, GFP_KERNEL);
+			if (!pe->data)
+				ret = -ENOMEM;
+		}
+
+		break;
+	case EEH_EVENT_PE_FREE:
+		if (pe->data) {
+			kfree(pe->data);
+			pe->data = NULL;
+		}
+
+		break;
+	default:
+		return 0;
+	}
+
+	return ret;
+}
+
 static struct eeh_ops powernv_eeh_ops = {
 	.name                   = "powernv",
 	.init                   = powernv_eeh_init,
@@ -381,7 +420,8 @@ static struct eeh_ops powernv_eeh_ops = {
 	.read_config            = pnv_pci_cfg_read,
 	.write_config           = pnv_pci_cfg_write,
 	.next_error		= powernv_eeh_next_error,
-	.restore_config		= powernv_eeh_restore_config
+	.restore_config		= powernv_eeh_restore_config,
+	.event			= powernv_eeh_event
 };
 
 /**
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 8a8f047..b9a4ddb 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -691,7 +691,8 @@ static struct eeh_ops pseries_eeh_ops = {
 	.read_config		= pseries_eeh_read_config,
 	.write_config		= pseries_eeh_write_config,
 	.next_error		= NULL,
-	.restore_config		= NULL
+	.restore_config		= NULL,
+	.event			= NULL
 };
 
 /**
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 7/9] powerpc/powernv: Cache PHB diag-data
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

The PHB diag-data is useful to help locating the root cause for
frozen PE or fenced PHB. However, EEH core enables IO path by clearing
part of HW registers before collecting it and eventually we got broken
PHB diag-data.

The patch intends to fix it by caching the PHB diag-data in advance
to eeh_pe::data when frozen/fenced state on PE or PHB is detected
for the first time in eeh_ops::get_state() or next_error() backend.
Also, we collect diag-data for INF error without dumping it.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c    |   84 ++++++++++++++------------
 arch/powerpc/platforms/powernv/eeh-powernv.c |   32 ++++++----
 arch/powerpc/platforms/powernv/pci.h         |    2 +-
 3 files changed, 67 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 04b4710..cd06c52 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -114,6 +114,23 @@ DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_inbB_dbgfs_ops, ioda_eeh_inbB_dbgfs_get,
 			ioda_eeh_inbB_dbgfs_set, "0x%llx\n");
 #endif /* CONFIG_DEBUG_FS */
 
+static void ioda_eeh_phb_diag(struct pci_controller *hose, char *buf)
+{
+	struct pnv_phb *phb = hose->private_data;
+	long rc;
+
+	if (!buf)
+		return;
+
+	rc = opal_pci_get_phb_diag_data2(phb->opal_id, buf,
+                                         PNV_PCI_DIAG_BUF_SIZE);
+	if (rc != OPAL_SUCCESS) {
+		pr_warn("%s: Failed to get PHB#%x diag-data (%ld)\n",
+			__func__, hose->global_number, rc);
+		return;
+	}
+}
+
 /**
  * ioda_eeh_post_init - Chip dependent post initialization
  * @hose: PCI controller
@@ -224,12 +241,13 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int option)
 /**
  * ioda_eeh_get_state - Retrieve the state of PE
  * @pe: EEH PE
+ * @cache_diag: Cache PHB diag-data or not
  *
  * The PE's state should be retrieved from the PEEV, PEST
  * IODA tables. Since the OPAL has exported the function
  * to do it, it'd better to use that.
  */
-static int ioda_eeh_get_state(struct eeh_pe *pe)
+static int ioda_eeh_get_state(struct eeh_pe *pe, bool cache_diag)
 {
 	s64 ret = 0;
 	u8 fstate;
@@ -272,6 +290,9 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
 			result |= EEH_STATE_DMA_ACTIVE;
 			result |= EEH_STATE_MMIO_ENABLED;
 			result |= EEH_STATE_DMA_ENABLED;
+		} else if (cache_diag && !(pe->state & EEH_PE_ISOLATED)) {
+			/* Cache diag-data for fenced PHB */
+			ioda_eeh_phb_diag(hose, pe->data);
 		}
 
 		return result;
@@ -315,6 +336,14 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
 			   __func__, fstate, hose->global_number, pe_no);
 	}
 
+	/* Cache PHB diag-data for frozen PE */
+	if (cache_diag &&
+	    result != EEH_STATE_NOT_SUPPORT &&
+	    (result & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) !=
+	    (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE) &&
+	    !(pe->state & EEH_PE_ISOLATED))
+		ioda_eeh_phb_diag(hose, pe->data);
+
 	return result;
 }
 
@@ -541,26 +570,10 @@ static int ioda_eeh_reset(struct eeh_pe *pe, int option)
 static int ioda_eeh_get_log(struct eeh_pe *pe, int severity,
 			    char *drv_log, unsigned long len)
 {
-	s64 ret;
-	unsigned long flags;
-	struct pci_controller *hose = pe->phb;
-	struct pnv_phb *phb = hose->private_data;
+	if (!pe->data)
+		return 0;
 
-	spin_lock_irqsave(&phb->lock, flags);
-
-	ret = opal_pci_get_phb_diag_data2(phb->opal_id,
-			phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
-	if (ret) {
-		spin_unlock_irqrestore(&phb->lock, flags);
-		pr_warning("%s: Can't get log for PHB#%x-PE#%x (%lld)\n",
-			   __func__, hose->global_number, pe->addr, ret);
-		return -EIO;
-	}
-
-	/* The PHB diag-data is always indicative */
-	pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
-
-	spin_unlock_irqrestore(&phb->lock, flags);
+	pnv_pci_dump_phb_diag_data(pe->phb, pe->data);
 
 	return 0;
 }
@@ -646,22 +659,6 @@ static void ioda_eeh_hub_diag(struct pci_controller *hose)
 	}
 }
 
-static void ioda_eeh_phb_diag(struct pci_controller *hose)
-{
-	struct pnv_phb *phb = hose->private_data;
-	long rc;
-
-	rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
-					 PNV_PCI_DIAG_BUF_SIZE);
-	if (rc != OPAL_SUCCESS) {
-		pr_warning("%s: Failed to get diag-data for PHB#%x (%ld)\n",
-			    __func__, hose->global_number, rc);
-		return;
-	}
-
-	pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
-}
-
 static int ioda_eeh_get_pe(struct pci_controller *hose,
 			   u16 pe_no, struct eeh_pe **pe)
 {
@@ -778,7 +775,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 				pr_info("EEH: PHB#%x informative error "
 					"detected\n",
 					hose->global_number);
-				ioda_eeh_phb_diag(hose);
+				ioda_eeh_phb_diag(hose, phb->diag.blob);
 				ret = EEH_NEXT_ERR_NONE;
 			}
 
@@ -809,6 +806,19 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 		}
 
 		/*
+		 * EEH core will try recover from fenced PHB or
+		 * frozen PE. In the time for frozen PE, EEH core
+		 * enable IO path for that before collecting logs,
+		 * but it ruins the site. So we have to cache the
+		 * log in advance here.
+		 */
+		if (ret == EEH_NEXT_ERR_FROZEN_PE ||
+		    ret == EEH_NEXT_ERR_FENCED_PHB) {
+			eeh_pe_state_mark(*pe, EEH_PE_ISOLATED);
+			ioda_eeh_phb_diag(hose, (*pe)->data);
+		}
+
+		/*
 		 * If we have no errors on the specific PHB or only
 		 * informative error there, we continue poking it.
 		 * Otherwise, we need actions to be taken by upper
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index cfba40a..54051bf 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -190,24 +190,15 @@ static int powernv_eeh_get_pe_addr(struct eeh_pe *pe)
 	return pe->addr;
 }
 
-/**
- * powernv_eeh_get_state - Retrieve PE state
- * @pe: EEH PE
- * @delay: delay while PE state is temporarily unavailable
- *
- * Retrieve the state of the specified PE. For IODA-compitable
- * platform, it should be retrieved from IODA table. Therefore,
- * we prefer passing down to hardware implementation to handle
- * it.
- */
-static int powernv_eeh_get_state(struct eeh_pe *pe, int *delay)
+static int __powernv_eeh_get_state(struct eeh_pe *pe,
+				   int *delay, bool cache_diag)
 {
 	struct pci_controller *hose = pe->phb;
 	struct pnv_phb *phb = hose->private_data;
 	int ret = EEH_STATE_NOT_SUPPORT;
 
 	if (phb->eeh_ops && phb->eeh_ops->get_state) {
-		ret = phb->eeh_ops->get_state(pe);
+		ret = phb->eeh_ops->get_state(pe, cache_diag);
 
 		/*
 		 * If the PE state is temporarily unavailable,
@@ -225,6 +216,21 @@ static int powernv_eeh_get_state(struct eeh_pe *pe, int *delay)
 }
 
 /**
+ * powernv_eeh_get_state - Retrieve PE state
+ * @pe: EEH PE
+ * @delay: delay while PE state is temporarily unavailable
+ *
+ * Retrieve the state of the specified PE. For IODA-compitable
+ * platform, it should be retrieved from IODA table. Therefore,
+ * we prefer passing down to hardware implementation to handle
+ * it.
+ */
+static int powernv_eeh_get_state(struct eeh_pe *pe, int *delay)
+{
+	return __powernv_eeh_get_state(pe, delay, true);
+}
+
+/**
  * powernv_eeh_reset - Reset the specified PE
  * @pe: EEH PE
  * @option: reset option
@@ -257,7 +263,7 @@ static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait)
 	int mwait;
 
 	while (1) {
-		ret = powernv_eeh_get_state(pe, &mwait);
+		ret = __powernv_eeh_get_state(pe, &mwait, false);
 
 		/*
 		 * If the PE's state is temporarily unavailable,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 94e3495..3645fc4 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -74,7 +74,7 @@ struct pnv_ioda_pe {
 struct pnv_eeh_ops {
 	int (*post_init)(struct pci_controller *hose);
 	int (*set_option)(struct eeh_pe *pe, int option);
-	int (*get_state)(struct eeh_pe *pe);
+	int (*get_state)(struct eeh_pe *pe, bool cache_diag);
 	int (*reset)(struct eeh_pe *pe, int option);
 	int (*get_log)(struct eeh_pe *pe, int severity,
 		       char *drv_log, unsigned long len);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 8/9] powerpc/powernv: Add /proc/powerpc/eeh_inf_err
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

The patch adds /proc/powerpc/eeh_inf_err to count the INF errors
happened on PHBs as Ben suggested.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   51 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h      |    1 +
 2 files changed, 52 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index cd06c52..3ddd706 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -20,6 +20,7 @@
 #include <linux/msi.h>
 #include <linux/notifier.h>
 #include <linux/pci.h>
+#include <linux/proc_fs.h>
 #include <linux/string.h>
 
 #include <asm/eeh.h>
@@ -35,6 +36,8 @@
 #include "powernv.h"
 #include "pci.h"
 
+static u64 ioda_eeh_ioc_inf_err = 0;
+static int ioda_eeh_proc_init = 0;
 static int ioda_eeh_nb_init = 0;
 
 static int ioda_eeh_event(struct notifier_block *nb,
@@ -114,6 +117,44 @@ DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_inbB_dbgfs_ops, ioda_eeh_inbB_dbgfs_get,
 			ioda_eeh_inbB_dbgfs_set, "0x%llx\n");
 #endif /* CONFIG_DEBUG_FS */
 
+#ifdef CONFIG_PROC_FS
+static int ioda_eeh_proc_show(struct seq_file *m, void *v)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+
+	if (!eeh_enabled()) {
+                seq_printf(m, "EEH Subsystem disabled\n");
+		return 0;
+	}
+
+	seq_printf(m, "EEH Subsystem enabled\n");
+	if (ioda_eeh_ioc_inf_err > 0)
+		seq_printf(m, "\nIOC INF Errors: %llu\n\n",
+			   ioda_eeh_ioc_inf_err);
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+		phb = hose->private_data;
+		seq_printf(m, "PHB#%d INF Errors: %llu\n",
+			   hose->global_number, phb->inf_err);
+	}
+
+	return 0;
+}
+
+static int ioda_eeh_proc_open(struct inode *inode, struct file *file)
+{
+        return single_open(file, ioda_eeh_proc_show, NULL);
+}
+
+static const struct file_operations ioda_eeh_proc_ops = {
+	.open		= ioda_eeh_proc_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+#endif /* CONFIG_PROC_FS */
+
 static void ioda_eeh_phb_diag(struct pci_controller *hose, char *buf)
 {
 	struct pnv_phb *phb = hose->private_data;
@@ -170,6 +211,14 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	}
 #endif
 
+#ifdef CONFIG_PROC_FS
+	if (!ioda_eeh_proc_init) {
+		ioda_eeh_proc_init = 1;
+		proc_create("powerpc/eeh_inf_err", 0,
+			    NULL, &ioda_eeh_proc_ops);
+	}
+#endif
+
 	phb->flags |= PNV_PHB_FLAG_EEH;
 
 	return 0;
@@ -755,6 +804,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 			} else if (severity == OPAL_EEH_SEV_INF) {
 				pr_info("EEH: IOC informative error "
 					"detected\n");
+				ioda_eeh_ioc_inf_err++;
 				ioda_eeh_hub_diag(hose);
 				ret = EEH_NEXT_ERR_NONE;
 			}
@@ -775,6 +825,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 				pr_info("EEH: PHB#%x informative error "
 					"detected\n",
 					hose->global_number);
+				phb->inf_err++;
 				ioda_eeh_phb_diag(hose, phb->diag.blob);
 				ret = EEH_NEXT_ERR_NONE;
 			}
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3645fc4..64ca719 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -97,6 +97,7 @@ struct pnv_phb {
 	spinlock_t		lock;
 
 #ifdef CONFIG_EEH
+	u64			inf_err;
 	struct pnv_eeh_ops	*eeh_ops;
 #endif
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 9/9] powerpc/powernv: Refactor PHB diag-data dump
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

As Ben suggested, the patch prints PHB diag-data with multiple
fields in one line and omits the line if the fields of that
line are all zero.

With the patch applied, the PHB3 diag-data dump looks like:

PHB3 PHB#3 Diag-data (Version: 1)

  brdgCtl:     00000002
  RootSts:     0000000f 00400000 b0830008 00100147 00002000
  nFir:        0000000000000000 0030006e00000000 0000000000000000
  PhbSts:      0000001c00000000 0000000000000000
  Lem:         0000000000100000 42498e327f502eae 0000000000000000
  InAErr:      8000000000000000 8000000000000000 0402030000000000 \
               0000000000000000
  PE[  8] A/B: 8480002b00000000 8000000000000000

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci.c |  220 +++++++++++++++++++---------------
 1 file changed, 125 insertions(+), 95 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 3955fc0..114e1a7 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -134,57 +134,72 @@ static void pnv_pci_dump_p7ioc_diag_data(struct pci_controller *hose,
 	pr_info("P7IOC PHB#%d Diag-data (Version: %d)\n\n",
 		hose->global_number, common->version);
 
-	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
-
-	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
-	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
-	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
-
-	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
-	pr_info("  slotStatus:           %08x\n", data->slotStatus);
-	pr_info("  linkStatus:           %08x\n", data->linkStatus);
-	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
-	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
-
-	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
-	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
-	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
-	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
-	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
-	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
-	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
-	pr_info("  sourceId:             %08x\n", data->sourceId);
-	pr_info("  errorClass:           %016llx\n", data->errorClass);
-	pr_info("  correlator:           %016llx\n", data->correlator);
-	pr_info("  p7iocPlssr:           %016llx\n", data->p7iocPlssr);
-	pr_info("  p7iocCsr:             %016llx\n", data->p7iocCsr);
-	pr_info("  lemFir:               %016llx\n", data->lemFir);
-	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
-	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
-	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
-	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
-	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
-	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
-	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
-	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
-	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
-	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
-	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
-	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
-	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
-	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
-	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
-	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
-	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
-	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
+	if (data->brdgCtl)
+		pr_info("  brdgCtl:     %08x\n",
+			data->brdgCtl);
+	if (data->portStatusReg || data->rootCmplxStatus ||
+	    data->busAgentStatus)
+		pr_info("  UtlSts:      %08x %08x %08x\n",
+			data->portStatusReg, data->rootCmplxStatus,
+			data->busAgentStatus);
+	if (data->deviceStatus || data->slotStatus   ||
+	    data->linkStatus   || data->devCmdStatus ||
+	    data->devSecStatus)
+		pr_info("  RootSts:     %08x %08x %08x %08x %08x\n",
+			data->deviceStatus, data->slotStatus,
+			data->linkStatus, data->devCmdStatus,
+			data->devSecStatus);
+	if (data->rootErrorStatus   || data->uncorrErrorStatus ||
+	    data->corrErrorStatus)
+		pr_info("  RootErrSts:  %08x %08x %08x\n",
+			data->rootErrorStatus, data->uncorrErrorStatus,
+			data->corrErrorStatus);
+	if (data->tlpHdr1 || data->tlpHdr2 ||
+	    data->tlpHdr3 || data->tlpHdr4)
+		pr_info("  RootErrLog:  %08x %08x %08x %08x\n",
+			data->tlpHdr1, data->tlpHdr2,
+			data->tlpHdr3, data->tlpHdr4);
+	if (data->sourceId || data->errorClass ||
+	    data->correlator)
+		pr_info("  RootErrLog1: %08x %016llx %016llx\n",
+			data->sourceId, data->errorClass,
+			data->correlator);
+	if (data->p7iocPlssr || data->p7iocCsr)
+		pr_info("  PhbSts:      %016llx %016llx\n",
+			data->p7iocPlssr, data->p7iocCsr);
+	if (data->lemFir || data->lemErrorMask ||
+	    data->lemWOF)
+		pr_info("  Lem:         %016llx %016llx %016llx\n",
+			data->lemFir, data->lemErrorMask,
+			data->lemWOF);
+	if (data->phbErrorStatus || data->phbFirstErrorStatus ||
+	    data->phbErrorLog0   || data->phbErrorLog1)
+		pr_info("  PhbErr:      %016llx %016llx %016llx %016llx\n",
+			data->phbErrorStatus, data->phbFirstErrorStatus,
+			data->phbErrorLog0, data->phbErrorLog1);
+	if (data->mmioErrorStatus || data->mmioFirstErrorStatus ||
+	    data->mmioErrorLog0   || data->mmioErrorLog1)
+		pr_info("  OutErr:      %016llx %016llx %016llx %016llx\n",
+			data->mmioErrorStatus, data->mmioFirstErrorStatus,
+			data->mmioErrorLog0, data->mmioErrorLog1);
+	if (data->dma0ErrorStatus || data->dma0FirstErrorStatus ||
+	    data->dma0ErrorLog0   || data->dma0ErrorLog1)
+		pr_info("  InAErr:      %016llx %016llx %016llx %016llx\n",
+			data->dma0ErrorStatus, data->dma0FirstErrorStatus,
+			data->dma0ErrorLog0, data->dma0ErrorLog1);
+	if (data->dma1ErrorStatus || data->dma1FirstErrorStatus ||
+	    data->dma1ErrorLog0   || data->dma1ErrorLog1)
+		pr_info("  InBErr:      %016llx %016llx %016llx %016llx\n",
+			data->dma1ErrorStatus, data->dma1FirstErrorStatus,
+			data->dma1ErrorLog0, data->dma1ErrorLog1);
 
 	for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) {
 		if ((data->pestA[i] >> 63) == 0 &&
 		    (data->pestB[i] >> 63) == 0)
 			continue;
 
-		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
-		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
+		pr_info("  PE[%3d] A/B: %016llx %016llx\n",
+			i, data->pestA[i], data->pestB[i]);
 	}
 }
 
@@ -197,62 +212,77 @@ static void pnv_pci_dump_phb3_diag_data(struct pci_controller *hose,
 	data = (struct OpalIoPhb3ErrorData*)common;
 	pr_info("PHB3 PHB#%d Diag-data (Version: %d)\n\n",
 		hose->global_number, common->version);
-
-	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
-
-	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
-	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
-	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
-
-	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
-	pr_info("  slotStatus:           %08x\n", data->slotStatus);
-	pr_info("  linkStatus:           %08x\n", data->linkStatus);
-	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
-	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
-
-	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
-	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
-	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
-	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
-	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
-	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
-	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
-	pr_info("  sourceId:             %08x\n", data->sourceId);
-	pr_info("  errorClass:           %016llx\n", data->errorClass);
-	pr_info("  correlator:           %016llx\n", data->correlator);
-
-	pr_info("  nFir:                 %016llx\n", data->nFir);
-	pr_info("  nFirMask:             %016llx\n", data->nFirMask);
-	pr_info("  nFirWOF:              %016llx\n", data->nFirWOF);
-	pr_info("  PhbPlssr:             %016llx\n", data->phbPlssr);
-	pr_info("  PhbCsr:               %016llx\n", data->phbCsr);
-	pr_info("  lemFir:               %016llx\n", data->lemFir);
-	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
-	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
-	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
-	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
-	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
-	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
-	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
-	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
-	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
-	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
-	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
-	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
-	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
-	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
-	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
-	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
-	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
-	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
+	if (data->brdgCtl)
+		pr_info("  brdgCtl:     %08x\n",
+			data->brdgCtl);
+	if (data->portStatusReg || data->rootCmplxStatus ||
+	    data->busAgentStatus)
+		pr_info("  UtlSts:      %08x %08x %08x\n",
+			data->portStatusReg, data->rootCmplxStatus,
+			data->busAgentStatus);
+	if (data->deviceStatus || data->slotStatus   ||
+	    data->linkStatus   || data->devCmdStatus ||
+	    data->devSecStatus)
+		pr_info("  RootSts:     %08x %08x %08x %08x %08x\n",
+			data->deviceStatus, data->slotStatus,
+			data->linkStatus, data->devCmdStatus,
+			data->devSecStatus);
+	if (data->rootErrorStatus || data->uncorrErrorStatus ||
+	    data->corrErrorStatus)
+		pr_info("  RootErrSts:  %08x %08x %08x\n",
+			data->rootErrorStatus, data->uncorrErrorStatus,
+			data->corrErrorStatus);
+	if (data->tlpHdr1 || data->tlpHdr2 ||
+	    data->tlpHdr3 || data->tlpHdr4)
+		pr_info("  RootErrLog:  %08x %08x %08x %08x\n",
+			data->tlpHdr1, data->tlpHdr2,
+			data->tlpHdr3, data->tlpHdr4);
+	if (data->sourceId || data->errorClass ||
+	    data->correlator)
+		pr_info("  RootErrLog1: %08x %016llx %016llx\n",
+			data->sourceId, data->errorClass,
+			data->correlator);
+	if (data->nFir || data->nFirMask ||
+	    data->nFirWOF)
+		pr_info("  nFir:        %016llx %016llx %016llx\n",
+			data->nFir, data->nFirMask,
+			data->nFirWOF);
+	if (data->phbPlssr || data->phbCsr)
+		pr_info("  PhbSts:      %016llx %016llx\n",
+			data->phbPlssr, data->phbCsr);
+	if (data->lemFir || data->lemErrorMask ||
+	    data->lemWOF)
+		pr_info("  Lem:         %016llx %016llx %016llx\n",
+			data->lemFir, data->lemErrorMask,
+			data->lemWOF);
+	if (data->phbErrorStatus || data->phbFirstErrorStatus ||
+	    data->phbErrorLog0   || data->phbErrorLog1)
+		pr_info("  PhbErr:      %016llx %016llx %016llx %016llx\n",
+			data->phbErrorStatus, data->phbFirstErrorStatus,
+			data->phbErrorLog0, data->phbErrorLog1);
+	if (data->mmioErrorStatus || data->mmioFirstErrorStatus ||
+	    data->mmioErrorLog0   || data->mmioErrorLog1)
+		pr_info("  OutErr:      %016llx %016llx %016llx %016llx\n",
+			data->mmioErrorStatus, data->mmioFirstErrorStatus,
+			data->mmioErrorLog0, data->mmioErrorLog1);
+	if (data->dma0ErrorStatus || data->dma0FirstErrorStatus ||
+	    data->dma0ErrorLog0   || data->dma0ErrorLog1)
+		pr_info("  InAErr:      %016llx %016llx %016llx %016llx\n",
+			data->dma0ErrorStatus, data->dma0FirstErrorStatus,
+			data->dma0ErrorLog0, data->dma0ErrorLog1);
+	if (data->dma1ErrorStatus || data->dma1FirstErrorStatus ||
+	    data->dma1ErrorLog0   || data->dma1ErrorLog1)
+		pr_info("  InBErr:      %016llx %016llx %016llx %016llx\n",
+			data->dma1ErrorStatus, data->dma1FirstErrorStatus,
+			data->dma1ErrorLog0, data->dma1ErrorLog1);
 
 	for (i = 0; i < OPAL_PHB3_NUM_PEST_REGS; i++) {
 		if ((data->pestA[i] >> 63) == 0 &&
 		    (data->pestB[i] >> 63) == 0)
 			continue;
 
-		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
-		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
+		pr_info("  PE[%3d] A/B: %016llx %016llx\n",
+			i, data->pestA[i], data->pestB[i]);
 	}
 }
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 5/9] powerpc/eeh: Introduce eeh_ops->event()
From: Gavin Shan @ 2014-02-25  5:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1393306670-17435-1-git-send-email-shangw@linux.vnet.ibm.com>

The patch introduces eeh_ops->event() so that we can pass various
events to underly platform. One reason to have that is to allocate
or free PHB diag-data for individual PEs on PowerNV platform in
future when EEH core to create or destroy PE instances.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    6 ++++++
 arch/powerpc/kernel/eeh_pe.c   |   14 ++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index a61b06f..8fd1c2d 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -71,6 +71,7 @@ struct eeh_pe {
 	struct list_head child_list;	/* Link PE to the child list	*/
 	struct list_head edevs;		/* Link list of EEH devices	*/
 	struct list_head child;		/* Child PEs			*/
+	void *data;			/* Platform dependent data	*/
 };
 
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
@@ -151,6 +152,10 @@ enum {
 #define EEH_LOG_TEMP		1	/* EEH temporary error log	*/
 #define EEH_LOG_PERM		2	/* EEH permanent error log	*/
 
+/* EEH events sent to platform */
+#define EEH_EVENT_PE_ALLOC	0
+#define EEH_EVENT_PE_FREE	1
+
 struct eeh_ops {
 	char *name;
 	int (*init)(void);
@@ -168,6 +173,7 @@ struct eeh_ops {
 	int (*write_config)(struct device_node *dn, int where, int size, u32 val);
 	int (*next_error)(struct eeh_pe **pe);
 	int (*restore_config)(struct device_node *dn);
+	int (*event)(int event, void *data);
 };
 
 extern struct eeh_ops *eeh_ops;
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 2add834..6cdc7a8 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -44,6 +44,7 @@ static LIST_HEAD(eeh_phb_pe);
 static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type)
 {
 	struct eeh_pe *pe;
+	int ret;
 
 	/* Allocate PHB PE */
 	pe = kzalloc(sizeof(struct eeh_pe), GFP_KERNEL);
@@ -56,6 +57,16 @@ static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type)
 	INIT_LIST_HEAD(&pe->child);
 	INIT_LIST_HEAD(&pe->edevs);
 
+	if (eeh_ops->event) {
+		ret = eeh_ops->event(EEH_EVENT_PE_ALLOC, pe);
+		if (ret) {
+			pr_warn("%s: Can't alloc PE (%d)\n",
+				__func__, ret);
+			kfree(pe);
+			return NULL;
+		}
+	}
+
 	return pe;
 }
 
@@ -77,6 +88,9 @@ static void eeh_pe_free(struct eeh_pe *pe)
 		return;
 	}
 
+	if (eeh_ops->event)
+		eeh_ops->event(EEH_EVENT_PE_FREE, pe);
+
 	kfree(pe);
 }
 
-- 
1.7.10.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox