[PATCH v2 00/10] CXL EEH Handling

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 00/10] CXL EEH Handling
@ 2015-07-28  5:28 Daniel Axtens
  2015-07-28  5:28 ` [PATCH v2 01/10] cxl: Drop commands if the PCI channel is not in normal state Daniel Axtens
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

CXL accelerators are unfortunately not immune from failure. This patch
set enables them to particpate in the Extended Error Handling process.

This series starts with a number of preparatory patches:

 - Patch 1 makes sure we don't touch the hardware when it has failed.
 
 - Patches 2-4 make the 'unplug' functions idempotent, so that if we
   get part way through recovery and then fail, being completely
   unplugged as part of removal doesn't cause us to oops out.

 - Patches 5 and 6 refactor init and teardown paths for the adapter
   and AFUs, so that they can be configured and deconfigured
   separately from their allocation and release.

 - Patch 7 stops cxl_reset from breaking EEH.

Patches 8 and 9 are parts of EEH.

 - Firstly we have a kernel flag that allows us to confidently assert
   the hardware will not change (be reflashed) when it it reset.

 - We then have the EEH support itself.

Finally, we add a CONFIG_CXL_EEH symbol that allows drivers to depend
on CXL EEH, or to be easily backportable if EEH is optional.

Changes from v1:
 - More comprehensive link down checks, including vPHB.
 - Rebased to apply cleanly to 4.2-rc4.
 - cxl reset changes.
 - CONFIG_CXL_EEH symbol addition.
 - add better vPHB support to EEH.

Daniel Axtens (10):
  cxl: Drop commands if the PCI channel is not in normal state
  cxl: Allocate and release the SPA with the AFU
  cxl: Make IRQ release idempotent
  cxl: Clean up adapter MMIO unmap path.
  cxl: Refactor adaptor init/teardown
  cxl: Refactor AFU init/teardown
  cxl: Don't remove AFUs/vPHBs in cxl_reset
  cxl: Allow the kernel to trust that an image won't change on PERST.
  cxl: EEH support
  cxl: Add CONFIG_CXL_EEH symbol

 Documentation/ABI/testing/sysfs-class-cxl |  10 +
 drivers/misc/cxl/Kconfig                  |   6 +
 drivers/misc/cxl/api.c                    |   9 +
 drivers/misc/cxl/context.c                |   6 +-
 drivers/misc/cxl/cxl.h                    |  41 ++-
 drivers/misc/cxl/file.c                   |  19 ++
 drivers/misc/cxl/irq.c                    |   9 +
 drivers/misc/cxl/native.c                 |  99 +++++-
 drivers/misc/cxl/pci.c                    | 521 ++++++++++++++++++++++++------
 drivers/misc/cxl/sysfs.c                  |  30 ++
 drivers/misc/cxl/vphb.c                   |  34 ++
 include/misc/cxl.h                        |  12 +
 12 files changed, 673 insertions(+), 123 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 01/10] cxl: Drop commands if the PCI channel is not in normal state
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  3:31   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 02/10] cxl: Allocate and release the SPA with the AFU Daniel Axtens
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

If the PCI channel has gone down, don't attempt to poke the hardware.

We need to guard every time cxl_whatever_(read|write) is called. This
is because a call to those functions will dereference an offset into an
mmio register, and the mmio mappings get invalidated in the EEH
teardown.

Check in the read/write functions in the header.
We give them the same semantics as usual PCI operations:
 - a write to a channel that is down is ignored.
 - a read from a channel that is down returns all fs.

Also, we try to access the MMIO space of a vPHB device as part of the
PCI disable path. Because that's a read that bypasses most of our usual
checks, we handle it explicitly.

As far as user visible warnings go:
 - Check link state in file ops, return -EIO if down.
 - Be reasonably quiet if there's an error in a teardown path,
   or when we already know the hardware is going down.
 - Throw a big WARN if someone tries to start a CXL operation
   while the card is down. This gives a useful stacktrace for
   debugging whatever is doing that.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/context.c |  6 +++-
 drivers/misc/cxl/cxl.h     | 34 ++++++++++++++++------
 drivers/misc/cxl/file.c    | 19 +++++++++++++
 drivers/misc/cxl/native.c  | 71 ++++++++++++++++++++++++++++++++++++++++++++--
 drivers/misc/cxl/vphb.c    | 26 +++++++++++++++++
 5 files changed, 144 insertions(+), 12 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 1287148629c0..615842115848 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -193,7 +193,11 @@ int __detach_context(struct cxl_context *ctx)
 	if (status != STARTED)
 		return -EBUSY;
 
-	WARN_ON(cxl_detach_process(ctx));
+	/* Only warn if we detached while the link was OK.
+	 * If detach fails when hw is down, we don't care.
+	 */
+	WARN_ON(cxl_detach_process(ctx) &&
+		cxl_adapter_link_ok(ctx->afu->adapter));
 	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
 	put_pid(ctx->pid);
 	cxl_ctx_put();
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 4fd66cabde1e..47eadbcfd379 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -531,6 +531,14 @@ struct cxl_process_element {
 	__be32 software_state;
 } __packed;
 
+static inline bool cxl_adapter_link_ok(struct cxl *cxl)
+{
+	struct pci_dev *pdev;
+
+	pdev = to_pci_dev(cxl->dev.parent);
+	return (pdev->error_state == pci_channel_io_normal);
+}
+
 static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg)
 {
 	WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE));
@@ -538,9 +546,11 @@ static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg)
 }
 
 #define cxl_p1_write(cxl, reg, val) \
-	out_be64(_cxl_p1_addr(cxl, reg), val)
+	if (cxl_adapter_link_ok(cxl)) \
+		out_be64(_cxl_p1_addr(cxl, reg), val)
 #define cxl_p1_read(cxl, reg) \
-	in_be64(_cxl_p1_addr(cxl, reg))
+	(cxl_adapter_link_ok(cxl) ? in_be64(_cxl_p1_addr(cxl, reg)) \
+	 : (~0ULL))
 
 static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg)
 {
@@ -549,9 +559,11 @@ static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg
 }
 
 #define cxl_p1n_write(afu, reg, val) \
-	out_be64(_cxl_p1n_addr(afu, reg), val)
+	if (cxl_adapter_link_ok(afu->adapter)) \
+		out_be64(_cxl_p1n_addr(afu, reg), val)
 #define cxl_p1n_read(afu, reg) \
-	in_be64(_cxl_p1n_addr(afu, reg))
+	(cxl_adapter_link_ok(afu->adapter) ? in_be64(_cxl_p1n_addr(afu, reg)) \
+	 : (~0ULL))
 
 static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg)
 {
@@ -559,15 +571,21 @@ static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg
 }
 
 #define cxl_p2n_write(afu, reg, val) \
-	out_be64(_cxl_p2n_addr(afu, reg), val)
+	if (cxl_adapter_link_ok(afu->adapter)) \
+		out_be64(_cxl_p2n_addr(afu, reg), val)
 #define cxl_p2n_read(afu, reg) \
-	in_be64(_cxl_p2n_addr(afu, reg))
+	(cxl_adapter_link_ok(afu->adapter) ? in_be64(_cxl_p2n_addr(afu, reg)) \
+	 : (~0ULL))
 
 
 #define cxl_afu_cr_read64(afu, cr, off) \
-	in_le64((afu)->afu_desc_mmio + (afu)->crs_offset + ((cr) * (afu)->crs_len) + (off))
+	(cxl_adapter_link_ok(afu->adapter) ? \
+	 in_le64((afu)->afu_desc_mmio + (afu)->crs_offset + ((cr) * (afu)->crs_len) + (off)) : \
+	 (~0ULL))
 #define cxl_afu_cr_read32(afu, cr, off) \
-	in_le32((afu)->afu_desc_mmio + (afu)->crs_offset + ((cr) * (afu)->crs_len) + (off))
+	(cxl_adapter_link_ok(afu->adapter) ? \
+	 in_le32((afu)->afu_desc_mmio + (afu)->crs_offset + ((cr) * (afu)->crs_len) + (off)) : \
+	 0xffffffff)
 u16 cxl_afu_cr_read16(struct cxl_afu *afu, int cr, u64 off);
 u8 cxl_afu_cr_read8(struct cxl_afu *afu, int cr, u64 off);
 
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index e3f4b69527a9..b99b4fa84ebd 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -73,6 +73,11 @@ static int __afu_open(struct inode *inode, struct file *file, bool master)
 	if (!afu->current_mode)
 		goto err_put_afu;
 
+	if (!cxl_adapter_link_ok(adapter)) {
+		rc = -EIO;
+		goto err_put_afu;
+	}
+
 	if (!(ctx = cxl_context_alloc())) {
 		rc = -ENOMEM;
 		goto err_put_afu;
@@ -238,6 +243,9 @@ long afu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	if (ctx->status == CLOSED)
 		return -EIO;
 
+	if (!cxl_adapter_link_ok(ctx->afu->adapter))
+		return -EIO;
+
 	pr_devel("afu_ioctl\n");
 	switch (cmd) {
 	case CXL_IOCTL_START_WORK:
@@ -265,6 +273,9 @@ int afu_mmap(struct file *file, struct vm_area_struct *vm)
 	if (ctx->status != STARTED)
 		return -EIO;
 
+	if (!cxl_adapter_link_ok(ctx->afu->adapter))
+		return -EIO;
+
 	return cxl_context_iomap(ctx, vm);
 }
 
@@ -309,6 +320,9 @@ ssize_t afu_read(struct file *file, char __user *buf, size_t count,
 	int rc;
 	DEFINE_WAIT(wait);
 
+	if (!cxl_adapter_link_ok(ctx->afu->adapter))
+		return -EIO;
+
 	if (count < CXL_READ_MIN_SIZE)
 		return -EINVAL;
 
@@ -319,6 +333,11 @@ ssize_t afu_read(struct file *file, char __user *buf, size_t count,
 		if (ctx_event_pending(ctx))
 			break;
 
+		if (!cxl_adapter_link_ok(ctx->afu->adapter)) {
+			rc = -EIO;
+			goto out;
+		}
+
 		if (file->f_flags & O_NONBLOCK) {
 			rc = -EAGAIN;
 			goto out;
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 10567f245818..16948915eb0d 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -41,6 +41,12 @@ static int afu_control(struct cxl_afu *afu, u64 command,
 			rc = -EBUSY;
 			goto out;
 		}
+		if (!cxl_adapter_link_ok(afu->adapter)) {
+			afu->enabled = enabled;
+			rc = -EIO;
+			goto out;
+		}
+
 		pr_devel_ratelimited("AFU control... (0x%.16llx)\n",
 				     AFU_Cntl | command);
 		cpu_relax();
@@ -85,6 +91,10 @@ int __cxl_afu_reset(struct cxl_afu *afu)
 
 int cxl_afu_check_and_enable(struct cxl_afu *afu)
 {
+	if (!cxl_adapter_link_ok(afu->adapter)) {
+		WARN(1, "Refusing to enable afu while link down!\n");
+		return -EIO;
+	}
 	if (afu->enabled)
 		return 0;
 	return afu_enable(afu);
@@ -103,6 +113,12 @@ int cxl_psl_purge(struct cxl_afu *afu)
 
 	pr_devel("PSL purge request\n");
 
+	if (!cxl_adapter_link_ok(afu->adapter)) {
+		dev_warn(&afu->dev, "PSL Purge called with link down, ignoring\n");
+		rc = -EIO;
+		goto out;
+	}
+
 	if ((AFU_Cntl & CXL_AFU_Cntl_An_ES_MASK) != CXL_AFU_Cntl_An_ES_Disabled) {
 		WARN(1, "psl_purge request while AFU not disabled!\n");
 		cxl_afu_disable(afu);
@@ -119,6 +135,11 @@ int cxl_psl_purge(struct cxl_afu *afu)
 			rc = -EBUSY;
 			goto out;
 		}
+		if (!cxl_adapter_link_ok(afu->adapter)) {
+			rc = -EIO;
+			goto out;
+		}
+
 		dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
 		pr_devel_ratelimited("PSL purging... PSL_CNTL: 0x%.16llx  PSL_DSISR: 0x%.16llx\n", PSL_CNTL, dsisr);
 		if (dsisr & CXL_PSL_DSISR_TRANS) {
@@ -215,6 +236,8 @@ int cxl_tlb_slb_invalidate(struct cxl *adapter)
 			dev_warn(&adapter->dev, "WARNING: CXL adapter wide TLBIA timed out!\n");
 			return -EBUSY;
 		}
+		if (!cxl_adapter_link_ok(adapter))
+			return -EIO;
 		cpu_relax();
 	}
 
@@ -224,6 +247,8 @@ int cxl_tlb_slb_invalidate(struct cxl *adapter)
 			dev_warn(&adapter->dev, "WARNING: CXL adapter wide SLBIA timed out!\n");
 			return -EBUSY;
 		}
+		if (!cxl_adapter_link_ok(adapter))
+			return -EIO;
 		cpu_relax();
 	}
 	return 0;
@@ -240,6 +265,11 @@ int cxl_afu_slbia(struct cxl_afu *afu)
 			dev_warn(&afu->dev, "WARNING: CXL AFU SLBIA timed out!\n");
 			return -EBUSY;
 		}
+		/* If the adapter has gone down, we can assume that we
+		 * will PERST it and that will invalidate everything.
+		 */
+		if (!cxl_adapter_link_ok(afu->adapter))
+			return -EIO;
 		cpu_relax();
 	}
 	return 0;
@@ -279,6 +309,8 @@ static void slb_invalid(struct cxl_context *ctx)
 	cxl_p1_write(adapter, CXL_PSL_SLBIA, CXL_TLB_SLB_IQ_LPIDPID);
 
 	while (1) {
+		if (!cxl_adapter_link_ok(adapter))
+			break;
 		slbia = cxl_p1_read(adapter, CXL_PSL_SLBIA);
 		if (!(slbia & CXL_TLB_SLB_P))
 			break;
@@ -308,6 +340,11 @@ static int do_process_element_cmd(struct cxl_context *ctx,
 			rc = -EBUSY;
 			goto out;
 		}
+		if (!cxl_adapter_link_ok(ctx->afu->adapter)) {
+			dev_warn(&ctx->afu->dev, "WARNING: Device link down, aborting Process Element Command!\n");
+			rc = -EIO;
+			goto out;
+		}
 		state = be64_to_cpup(ctx->afu->sw_command_status);
 		if (state == ~0ULL) {
 			pr_err("cxl: Error adding process element to AFU\n");
@@ -355,8 +392,13 @@ static int terminate_process_element(struct cxl_context *ctx)
 
 	mutex_lock(&ctx->afu->spa_mutex);
 	pr_devel("%s Terminate pe: %i started\n", __func__, ctx->pe);
-	rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_TERMINATE,
-				    CXL_PE_SOFTWARE_STATE_V | CXL_PE_SOFTWARE_STATE_T);
+	/* We could be asked to terminate when the hw is down.  That
+	 * should always succeed: it's not running if the hw has gone
+	 * away and is being reset.
+	 */
+	if (cxl_adapter_link_ok(ctx->afu->adapter))
+		rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_TERMINATE,
+					    CXL_PE_SOFTWARE_STATE_V | CXL_PE_SOFTWARE_STATE_T);
 	ctx->elem->software_state = 0;	/* Remove Valid bit */
 	pr_devel("%s Terminate pe: %i finished\n", __func__, ctx->pe);
 	mutex_unlock(&ctx->afu->spa_mutex);
@@ -369,7 +411,14 @@ static int remove_process_element(struct cxl_context *ctx)
 
 	mutex_lock(&ctx->afu->spa_mutex);
 	pr_devel("%s Remove pe: %i started\n", __func__, ctx->pe);
-	if (!(rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_REMOVE, 0)))
+
+	/* We could be asked to remove when the hw is down.  Again, if
+	 * the hw is down, the PE is gone, so we succeed.
+	 */
+	if (cxl_adapter_link_ok(ctx->afu->adapter))
+		rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_REMOVE, 0);
+
+	if (!rc)
 		ctx->pe_inserted = false;
 	slb_invalid(ctx);
 	pr_devel("%s Remove pe: %i finished\n", __func__, ctx->pe);
@@ -614,6 +663,11 @@ int cxl_afu_activate_mode(struct cxl_afu *afu, int mode)
 	if (!(mode & afu->modes_supported))
 		return -EINVAL;
 
+	if (!cxl_adapter_link_ok(afu->adapter)) {
+		WARN(1, "Device link is down, refusing to activate!\n");
+		return -EIO;
+	}
+
 	if (mode == CXL_MODE_DIRECTED)
 		return activate_afu_directed(afu);
 	if (mode == CXL_MODE_DEDICATED)
@@ -624,6 +678,11 @@ int cxl_afu_activate_mode(struct cxl_afu *afu, int mode)
 
 int cxl_attach_process(struct cxl_context *ctx, bool kernel, u64 wed, u64 amr)
 {
+	if (!cxl_adapter_link_ok(ctx->afu->adapter)) {
+		WARN(1, "Device link is down, refusing to attach process!\n");
+		return -EIO;
+	}
+
 	ctx->kernel = kernel;
 	if (ctx->afu->current_mode == CXL_MODE_DIRECTED)
 		return attach_afu_directed(ctx, wed, amr);
@@ -668,6 +727,12 @@ int cxl_get_irq(struct cxl_afu *afu, struct cxl_irq_info *info)
 {
 	u64 pidtid;
 
+	/* If the adapter has gone away, we can't get any meaningful
+	 * information.
+	 */
+	if (!cxl_adapter_link_ok(afu->adapter))
+		return -EIO;
+
 	info->dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
 	info->dar = cxl_p2n_read(afu, CXL_PSL_DAR_An);
 	info->dsr = cxl_p2n_read(afu, CXL_PSL_DSR_An);
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 2eba002b580b..2930911c1e42 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -138,6 +138,26 @@ static int cxl_pcie_config_info(struct pci_bus *bus, unsigned int devfn,
 	return 0;
 }
 
+
+static inline bool cxl_config_link_ok(struct pci_bus *bus)
+{
+	struct pci_controller *phb;
+	struct cxl_afu *afu;
+
+	/* Config space IO is based on phb->cfg_addr, which is based on
+	 * afu_desc_mmio. This isn't safe to read/write when the link
+	 * goes down, as EEH tears down MMIO space.
+	 *
+	 * Check if the link is OK before proceeding.
+	 */
+
+	phb = pci_bus_to_host(bus);
+	if (phb == NULL)
+		return false;
+	afu = (struct cxl_afu *)phb->private_data;
+	return cxl_adapter_link_ok(afu->adapter);
+}
+
 static int cxl_pcie_read_config(struct pci_bus *bus, unsigned int devfn,
 				int offset, int len, u32 *val)
 {
@@ -150,6 +170,9 @@ static int cxl_pcie_read_config(struct pci_bus *bus, unsigned int devfn,
 	if (rc)
 		return rc;
 
+	if (!cxl_config_link_ok(bus))
+		return PCIBIOS_DEVICE_NOT_FOUND;
+
 	/* Can only read 32 bits */
 	*val = (in_le32(ioaddr) >> shift) & mask;
 	return PCIBIOS_SUCCESSFUL;
@@ -167,6 +190,9 @@ static int cxl_pcie_write_config(struct pci_bus *bus, unsigned int devfn,
 	if (rc)
 		return rc;
 
+	if (!cxl_config_link_ok(bus))
+		return PCIBIOS_DEVICE_NOT_FOUND;
+
 	/* Can only write 32 bits so do read-modify-write */
 	mask <<= shift;
 	val <<= shift;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 02/10] cxl: Allocate and release the SPA with the AFU
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
  2015-07-28  5:28 ` [PATCH v2 01/10] cxl: Drop commands if the PCI channel is not in normal state Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  3:42   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 03/10] cxl: Make IRQ release idempotent Daniel Axtens
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

Previously the SPA was allocated and freed upon entering and leaving
AFU-directed mode. This causes some issues for error recovery - contexts
hold a pointer inside the SPA, and they may persist after the AFU has
been detached.

We would ideally like to allocate the SPA when the AFU is allocated, and
release it until the AFU is released. However, we don't know how big the
SPA needs to be until we read the AFU descriptor.

Therefore, restructure the code:

 - Allocate the SPA only once, on the first attach.

 - Release the SPA only when the entire AFU is being released (not
   detached). Guard the release with a NULL check, so we don't free
   if it was never allocated (e.g. dedicated mode)

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/cxl.h    |  3 +++
 drivers/misc/cxl/native.c | 28 ++++++++++++++++++----------
 drivers/misc/cxl/pci.c    |  3 +++
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 47eadbcfd379..88a88c445e2a 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -603,6 +603,9 @@ void unregister_cxl_calls(struct cxl_calls *calls);
 int cxl_alloc_adapter_nr(struct cxl *adapter);
 void cxl_remove_adapter_nr(struct cxl *adapter);
 
+int cxl_alloc_spa(struct cxl_afu *afu);
+void cxl_release_spa(struct cxl_afu *afu);
+
 int cxl_file_init(void);
 void cxl_file_exit(void);
 int cxl_register_adapter(struct cxl *adapter);
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 16948915eb0d..debd97147b58 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -182,10 +182,8 @@ static int spa_max_procs(int spa_size)
 	return ((spa_size / 8) - 96) / 17;
 }
 
-static int alloc_spa(struct cxl_afu *afu)
+int cxl_alloc_spa(struct cxl_afu *afu)
 {
-	u64 spap;
-
 	/* Work out how many pages to allocate */
 	afu->spa_order = 0;
 	do {
@@ -204,6 +202,13 @@ static int alloc_spa(struct cxl_afu *afu)
 	pr_devel("spa pages: %i afu->spa_max_procs: %i   afu->num_procs: %i\n",
 		 1<<afu->spa_order, afu->spa_max_procs, afu->num_procs);
 
+	return 0;
+}
+
+static void attach_spa(struct cxl_afu *afu)
+{
+	u64 spap;
+
 	afu->sw_command_status = (__be64 *)((char *)afu->spa +
 					    ((afu->spa_max_procs + 3) * 128));
 
@@ -212,13 +217,15 @@ static int alloc_spa(struct cxl_afu *afu)
 	spap |= CXL_PSL_SPAP_V;
 	pr_devel("cxl: SPA allocated at 0x%p. Max processes: %i, sw_command_status: 0x%p CXL_PSL_SPAP_An=0x%016llx\n", afu->spa, afu->spa_max_procs, afu->sw_command_status, spap);
 	cxl_p1n_write(afu, CXL_PSL_SPAP_An, spap);
-
-	return 0;
 }
 
-static void release_spa(struct cxl_afu *afu)
+static inline void detach_spa(struct cxl_afu *afu)
 {
 	cxl_p1n_write(afu, CXL_PSL_SPAP_An, 0);
+}
+
+void cxl_release_spa(struct cxl_afu *afu)
+{
 	free_pages((unsigned long) afu->spa, afu->spa_order);
 }
 
@@ -446,8 +453,11 @@ static int activate_afu_directed(struct cxl_afu *afu)
 
 	dev_info(&afu->dev, "Activating AFU directed mode\n");
 
-	if (alloc_spa(afu))
-		return -ENOMEM;
+	if (afu->spa == NULL) {
+		if (cxl_alloc_spa(afu))
+			return -ENOMEM;
+	}
+	attach_spa(afu);
 
 	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_AFU);
 	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
@@ -560,8 +570,6 @@ static int deactivate_afu_directed(struct cxl_afu *afu)
 	cxl_afu_disable(afu);
 	cxl_psl_purge(afu);
 
-	release_spa(afu);
-
 	return 0;
 }
 
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 32ad09705949..1849c1785b49 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -551,6 +551,9 @@ static void cxl_release_afu(struct device *dev)
 
 	pr_devel("cxl_release_afu\n");
 
+	if (afu->spa)
+		cxl_release_spa(afu);
+
 	kfree(afu);
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 03/10] cxl: Make IRQ release idempotent
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
  2015-07-28  5:28 ` [PATCH v2 01/10] cxl: Drop commands if the PCI channel is not in normal state Daniel Axtens
  2015-07-28  5:28 ` [PATCH v2 02/10] cxl: Allocate and release the SPA with the AFU Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  3:44   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 04/10] cxl: Clean up adapter MMIO unmap path Daniel Axtens
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

Check if an IRQ is mapped before releasing it.

This will simplify future EEH code by allowing unconditional unmapping
of IRQs.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/irq.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index 680cd263436d..121ec48f3ab4 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -341,6 +341,9 @@ int cxl_register_psl_err_irq(struct cxl *adapter)
 
 void cxl_release_psl_err_irq(struct cxl *adapter)
 {
+	if (adapter->err_virq != irq_find_mapping(NULL, adapter->err_hwirq))
+		return;
+
 	cxl_p1_write(adapter, CXL_PSL_ErrIVTE, 0x0000000000000000);
 	cxl_unmap_irq(adapter->err_virq, adapter);
 	cxl_release_one_irq(adapter, adapter->err_hwirq);
@@ -374,6 +377,9 @@ int cxl_register_serr_irq(struct cxl_afu *afu)
 
 void cxl_release_serr_irq(struct cxl_afu *afu)
 {
+	if (afu->serr_virq != irq_find_mapping(NULL, afu->serr_hwirq))
+		return;
+
 	cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x0000000000000000);
 	cxl_unmap_irq(afu->serr_virq, afu);
 	cxl_release_one_irq(afu->adapter, afu->serr_hwirq);
@@ -400,6 +406,9 @@ int cxl_register_psl_irq(struct cxl_afu *afu)
 
 void cxl_release_psl_irq(struct cxl_afu *afu)
 {
+	if (afu->psl_virq != irq_find_mapping(NULL, afu->psl_hwirq))
+		return;
+
 	cxl_unmap_irq(afu->psl_virq, afu);
 	cxl_release_one_irq(afu->adapter, afu->psl_hwirq);
 	kfree(afu->psl_irq_name);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 04/10] cxl: Clean up adapter MMIO unmap path.
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
                   ` (2 preceding siblings ...)
  2015-07-28  5:28 ` [PATCH v2 03/10] cxl: Make IRQ release idempotent Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  3:52   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 05/10] cxl: Refactor adaptor init/teardown Daniel Axtens
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

 - MMIO pointer unmapping is guarded by a null pointer check.
   However, iounmap doesn't null the pointer, just invalidate it.
   Therefore, explicitly null the pointer after unmapping.

 - afu_desc_mmio also needs to be unmapped.

 - PCI regions are allocated in cxl_map_adapter_regs.
   Therefore they should be released in unmap, not elsewhere.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/pci.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 1849c1785b49..adcf938f2fdb 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -539,10 +539,18 @@ err:
 
 static void cxl_unmap_slice_regs(struct cxl_afu *afu)
 {
-	if (afu->p2n_mmio)
+	if (afu->p2n_mmio) {
 		iounmap(afu->p2n_mmio);
-	if (afu->p1n_mmio)
+		afu->p2n_mmio = NULL;
+	}
+	if (afu->p1n_mmio) {
 		iounmap(afu->p1n_mmio);
+		afu->p1n_mmio = NULL;
+	}
+	if (afu->afu_desc_mmio) {
+		iounmap(afu->afu_desc_mmio);
+		afu->afu_desc_mmio = NULL;
+	}
 }
 
 static void cxl_release_afu(struct device *dev)
@@ -920,10 +928,16 @@ err1:
 
 static void cxl_unmap_adapter_regs(struct cxl *adapter)
 {
-	if (adapter->p1_mmio)
+	if (adapter->p1_mmio) {
 		iounmap(adapter->p1_mmio);
-	if (adapter->p2_mmio)
+		adapter->p1_mmio = NULL;
+		pci_release_region(to_pci_dev(adapter->dev.parent), 2);
+	}
+	if (adapter->p2_mmio) {
 		iounmap(adapter->p2_mmio);
+		adapter->p2_mmio = NULL;
+		pci_release_region(to_pci_dev(adapter->dev.parent), 0);
+	}
 }
 
 static int cxl_read_vsec(struct cxl *adapter, struct pci_dev *dev)
@@ -1132,8 +1146,6 @@ static void cxl_remove_adapter(struct cxl *adapter)
 
 	device_unregister(&adapter->dev);
 
-	pci_release_region(pdev, 0);
-	pci_release_region(pdev, 2);
 	pci_disable_device(pdev);
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 05/10] cxl: Refactor adaptor init/teardown
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
                   ` (3 preceding siblings ...)
  2015-07-28  5:28 ` [PATCH v2 04/10] cxl: Clean up adapter MMIO unmap path Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  6:01   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 06/10] cxl: Refactor AFU init/teardown Daniel Axtens
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

Some aspects of initialisation are done only once in the lifetime of
an adapter: for example, allocating memory for the adapter,
allocating the adapter number, or setting up sysfs/debugfs files.

However, we may want to be able to do some parts of the
initialisation multiple times: for example, in error recovery we
want to be able to tear down and then re-map IO memory and IRQs.

Therefore, refactor CXL init/teardown as follows.

 - Keep the overarching functions 'cxl_init_adapter' and its pair,
   'cxl_remove_adapter'.

 - Move all 'once only' allocation/freeing steps to the existing
   'cxl_alloc_adapter' function, and its pair 'cxl_release_adapter'
   (This involves moving allocation of the adapter number out of
   cxl_init_adapter.)

 - Create two new functions: 'cxl_configure_adapter', and its pair
   'cxl_deconfigure_adapter'. These two functions 'wire up' the
   hardware --- they (de)configure resources that do not need to
   last the entire lifetime of the adapter

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/pci.c | 138 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 85 insertions(+), 53 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index adcf938f2fdb..7f47e2221524 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -966,7 +966,6 @@ static int cxl_read_vsec(struct cxl *adapter, struct pci_dev *dev)
 	CXL_READ_VSEC_BASE_IMAGE(dev, vsec, &adapter->base_image);
 	CXL_READ_VSEC_IMAGE_STATE(dev, vsec, &image_state);
 	adapter->user_image_loaded = !!(image_state & CXL_VSEC_USER_IMAGE_LOADED);
-	adapter->perst_loads_image = true;
 	adapter->perst_select_user = !!(image_state & CXL_VSEC_USER_IMAGE_LOADED);
 
 	CXL_READ_VSEC_NAFUS(dev, vsec, &adapter->slices);
@@ -1026,22 +1025,34 @@ static void cxl_release_adapter(struct device *dev)
 
 	pr_devel("cxl_release_adapter\n");
 
+	cxl_remove_adapter_nr(adapter);
+
 	kfree(adapter);
 }
 
-static struct cxl *cxl_alloc_adapter(struct pci_dev *dev)
+static struct cxl *cxl_alloc_adapter(void)
 {
 	struct cxl *adapter;
+	int rc;
 
 	if (!(adapter = kzalloc(sizeof(struct cxl), GFP_KERNEL)))
 		return NULL;
 
-	adapter->dev.parent = &dev->dev;
-	adapter->dev.release = cxl_release_adapter;
-	pci_set_drvdata(dev, adapter);
 	spin_lock_init(&adapter->afu_list_lock);
 
+	if ((rc = cxl_alloc_adapter_nr(adapter)))
+		goto err1;
+
+	if ((rc = dev_set_name(&adapter->dev, "card%i", adapter->adapter_num)))
+		goto err2;
+
 	return adapter;
+
+err2:
+	cxl_remove_adapter_nr(adapter);
+err1:
+	kfree(adapter);
+	return NULL;
 }
 
 static int sanitise_adapter_regs(struct cxl *adapter)
@@ -1050,57 +1061,94 @@ static int sanitise_adapter_regs(struct cxl *adapter)
 	return cxl_tlb_slb_invalidate(adapter);
 }
 
-static struct cxl *cxl_init_adapter(struct pci_dev *dev)
+/* This should contain *only* operations that can safely be done in
+ * both creation and recovery.
+ */
+static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
 {
-	struct cxl *adapter;
-	bool free = true;
 	int rc;
 
+	adapter->dev.parent = &dev->dev;
+	adapter->dev.release = cxl_release_adapter;
+	pci_set_drvdata(dev, adapter);
 
-	if (!(adapter = cxl_alloc_adapter(dev)))
-		return ERR_PTR(-ENOMEM);
+	if ((rc = pci_enable_device(dev))) {
+		dev_err(&dev->dev, "pci_enable_device failed: %i\n", rc);
+		return rc;
+	}
 
 	if ((rc = cxl_read_vsec(adapter, dev)))
-		goto err1;
+		return rc;
 
 	if ((rc = cxl_vsec_looks_ok(adapter, dev)))
-		goto err1;
+	        return rc;
 
 	if ((rc = setup_cxl_bars(dev)))
-		goto err1;
+		return rc;
 
 	if ((rc = switch_card_to_cxl(dev)))
-		goto err1;
-
-	if ((rc = cxl_alloc_adapter_nr(adapter)))
-		goto err1;
-
-	if ((rc = dev_set_name(&adapter->dev, "card%i", adapter->adapter_num)))
-		goto err2;
+		return rc;
 
 	if ((rc = cxl_update_image_control(adapter)))
-		goto err2;
+		return rc;
 
 	if ((rc = cxl_map_adapter_regs(adapter, dev)))
-		goto err2;
+		return rc;
 
 	if ((rc = sanitise_adapter_regs(adapter)))
-		goto err2;
+		goto err;
 
 	if ((rc = init_implementation_adapter_regs(adapter, dev)))
-		goto err3;
+		goto err;
 
 	if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_CAPI)))
-		goto err3;
+		goto err;
 
 	/* If recovery happened, the last step is to turn on snooping.
 	 * In the non-recovery case this has no effect */
-	if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON))) {
-		goto err3;
-	}
+	if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON)))
+		goto err;
 
 	if ((rc = cxl_register_psl_err_irq(adapter)))
-		goto err3;
+		goto err;
+
+	return 0;
+
+err:
+	cxl_unmap_adapter_regs(adapter);
+	return rc;
+
+}
+
+static void cxl_deconfigure_adapter(struct cxl *adapter)
+{
+	struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
+
+	cxl_release_psl_err_irq(adapter);
+	cxl_unmap_adapter_regs(adapter);
+
+	pci_disable_device(pdev);
+}
+
+static struct cxl *cxl_init_adapter(struct pci_dev *dev)
+{
+	struct cxl *adapter;
+	int rc;
+
+	adapter = cxl_alloc_adapter();
+	if (!adapter)
+		return ERR_PTR(-ENOMEM);
+
+	/* Set defaults for parameters which need to persist over
+	 * configure/reconfigure
+	 */
+	adapter->perst_loads_image = true;
+
+	if ((rc = cxl_configure_adapter(adapter, dev))) {
+		pci_disable_device(dev);
+		cxl_release_adapter(&adapter->dev);
+		return ERR_PTR(rc);
+	}
 
 	/* Don't care if this one fails: */
 	cxl_debugfs_adapter_add(adapter);
@@ -1118,35 +1166,25 @@ static struct cxl *cxl_init_adapter(struct pci_dev *dev)
 	return adapter;
 
 err_put1:
-	device_unregister(&adapter->dev);
-	free = false;
+	/* This should mirror cxl_remove_adapter, except without the
+	 * sysfs parts
+	 */
 	cxl_debugfs_adapter_remove(adapter);
-	cxl_release_psl_err_irq(adapter);
-err3:
-	cxl_unmap_adapter_regs(adapter);
-err2:
-	cxl_remove_adapter_nr(adapter);
-err1:
-	if (free)
-		kfree(adapter);
+	cxl_deconfigure_adapter(adapter);
+	device_unregister(&adapter->dev);
 	return ERR_PTR(rc);
 }
 
 static void cxl_remove_adapter(struct cxl *adapter)
 {
-	struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
-
-	pr_devel("cxl_release_adapter\n");
+	pr_devel("cxl_remove_adapter\n");
 
 	cxl_sysfs_adapter_remove(adapter);
 	cxl_debugfs_adapter_remove(adapter);
-	cxl_release_psl_err_irq(adapter);
-	cxl_unmap_adapter_regs(adapter);
-	cxl_remove_adapter_nr(adapter);
 
-	device_unregister(&adapter->dev);
+	cxl_deconfigure_adapter(adapter);
 
-	pci_disable_device(pdev);
+	device_unregister(&adapter->dev);
 }
 
 static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
@@ -1160,15 +1198,9 @@ static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	if (cxl_verbose)
 		dump_cxl_config_space(dev);
 
-	if ((rc = pci_enable_device(dev))) {
-		dev_err(&dev->dev, "pci_enable_device failed: %i\n", rc);
-		return rc;
-	}
-
 	adapter = cxl_init_adapter(dev);
 	if (IS_ERR(adapter)) {
 		dev_err(&dev->dev, "cxl_init_adapter failed: %li\n", PTR_ERR(adapter));
-		pci_disable_device(dev);
 		return PTR_ERR(adapter);
 	}
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 06/10] cxl: Refactor AFU init/teardown
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
                   ` (4 preceding siblings ...)
  2015-07-28  5:28 ` [PATCH v2 05/10] cxl: Refactor adaptor init/teardown Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  3:59   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 07/10] cxl: Don't remove AFUs/vPHBs in cxl_reset Daniel Axtens
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

As with an adapter, some aspects of initialisation are done only once
in the lifetime of an AFU: for example, allocating memory, or setting
up sysfs/debugfs files.

However, we may want to be able to do some parts of the initialisation
multiple times: for example, in error recovery we want to be able to
tear down and then re-map IO memory and IRQs.

Therefore, refactor AFU init/teardown as follows.

 - Create two new functions: 'cxl_configure_afu', and its pair
   'cxl_deconfigure_afu'. As with the adapter functions,
   these (de)configure resources that do not need to last the entire
   lifetime of the AFU.

 - Allocating and releasing memory remain the task of 'cxl_alloc_afu'
   and 'cxl_release_afu'.

 - Once-only functions that do not involve allocating/releasing memory
   stay in the overarching 'cxl_init_afu'/'cxl_remove_afu' pair.
   However, the task of picking an AFU mode and activating it has been
   broken out.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/pci.c | 87 +++++++++++++++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 37 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 7f47e2221524..98a8207da88d 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -753,45 +753,67 @@ ssize_t cxl_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
 	return count;
 }
 
-static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
+static int cxl_configure_afu(struct cxl_afu *afu, struct cxl *adapter, struct pci_dev *dev)
 {
-	struct cxl_afu *afu;
-	bool free = true;
 	int rc;
 
-	if (!(afu = cxl_alloc_afu(adapter, slice)))
-		return -ENOMEM;
-
-	if ((rc = dev_set_name(&afu->dev, "afu%i.%i", adapter->adapter_num, slice)))
-		goto err1;
-
 	if ((rc = cxl_map_slice_regs(afu, adapter, dev)))
-		goto err1;
+		return rc;
 
 	if ((rc = sanitise_afu_regs(afu)))
-		goto err2;
+		goto err1;
 
 	/* We need to reset the AFU before we can read the AFU descriptor */
 	if ((rc = __cxl_afu_reset(afu)))
-		goto err2;
+		goto err1;
 
 	if (cxl_verbose)
 		dump_afu_descriptor(afu);
 
 	if ((rc = cxl_read_afu_descriptor(afu)))
-		goto err2;
+		goto err1;
 
 	if ((rc = cxl_afu_descriptor_looks_ok(afu)))
-		goto err2;
+		goto err1;
 
 	if ((rc = init_implementation_afu_regs(afu)))
-		goto err2;
+		goto err1;
 
 	if ((rc = cxl_register_serr_irq(afu)))
-		goto err2;
+		goto err1;
 
 	if ((rc = cxl_register_psl_irq(afu)))
-		goto err3;
+		goto err2;
+
+	return 0;
+
+err2:
+	cxl_release_serr_irq(afu);
+err1:
+	cxl_unmap_slice_regs(afu);
+	return rc;
+}
+
+static void cxl_deconfigure_afu(struct cxl_afu *afu)
+{
+	cxl_release_psl_irq(afu);
+	cxl_release_serr_irq(afu);
+	cxl_unmap_slice_regs(afu);
+}
+
+static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
+{
+	struct cxl_afu *afu;
+	int rc;
+
+	if (!(afu = cxl_alloc_afu(adapter, slice)))
+		return -ENOMEM;
+
+	if ((rc = dev_set_name(&afu->dev, "afu%i.%i", adapter->adapter_num, slice)))
+		goto err_free;
+
+	if ((rc = cxl_configure_afu(afu, adapter, dev)))
+		goto err_free;
 
 	/* Don't care if this fails */
 	cxl_debugfs_afu_add(afu);
@@ -806,10 +828,6 @@ static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
 	if ((rc = cxl_sysfs_afu_add(afu)))
 		goto err_put1;
 
-
-	if ((rc = cxl_afu_select_best_mode(afu)))
-		goto err_put2;
-
 	adapter->afu[afu->slice] = afu;
 
 	if ((rc = cxl_pci_vphb_add(afu)))
@@ -817,21 +835,16 @@ static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
 
 	return 0;
 
-err_put2:
-	cxl_sysfs_afu_remove(afu);
 err_put1:
-	device_unregister(&afu->dev);
-	free = false;
+	cxl_deconfigure_afu(afu);
 	cxl_debugfs_afu_remove(afu);
-	cxl_release_psl_irq(afu);
-err3:
-	cxl_release_serr_irq(afu);
-err2:
-	cxl_unmap_slice_regs(afu);
-err1:
-	if (free)
-		kfree(afu);
+	device_unregister(&afu->dev);
 	return rc;
+
+err_free:
+	kfree(afu);
+	return rc;
+
 }
 
 static void cxl_remove_afu(struct cxl_afu *afu)
@@ -851,10 +864,7 @@ static void cxl_remove_afu(struct cxl_afu *afu)
 	cxl_context_detach_all(afu);
 	cxl_afu_deactivate_mode(afu);
 
-	cxl_release_psl_irq(afu);
-	cxl_release_serr_irq(afu);
-	cxl_unmap_slice_regs(afu);
-
+	cxl_deconfigure_afu(afu);
 	device_unregister(&afu->dev);
 }
 
@@ -1207,6 +1217,9 @@ static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	for (slice = 0; slice < adapter->slices; slice++) {
 		if ((rc = cxl_init_afu(adapter, slice, dev)))
 			dev_err(&dev->dev, "AFU %i failed to initialise: %i\n", slice, rc);
+
+		if ((rc = cxl_afu_select_best_mode(adapter->afu[slice])))
+			dev_err(&dev->dev, "AFU %i failed to start: %i\n", slice, rc);
 	}
 
 	return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 07/10] cxl: Don't remove AFUs/vPHBs in cxl_reset
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
                   ` (5 preceding siblings ...)
  2015-07-28  5:28 ` [PATCH v2 06/10] cxl: Refactor AFU init/teardown Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  5:57   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 08/10] cxl: Allow the kernel to trust that an image won't change on PERST Daniel Axtens
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

If the driver doesn't participate in EEH, the AFUs will be removed
by cxl_remove, which will be invoked by EEH.

If the driver does particpate in EEH, the vPHB needs to stick around
so that the it can particpate.

In both cases, we shouldn't remove the AFU/vPHB.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/pci.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 98a8207da88d..0acf9e62733e 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -877,11 +877,6 @@ int cxl_reset(struct cxl *adapter)
 
 	dev_info(&dev->dev, "CXL reset\n");
 
-	for (i = 0; i < adapter->slices; i++) {
-		cxl_pci_vphb_remove(adapter->afu[i]);
-		cxl_remove_afu(adapter->afu[i]);
-	}
-
 	/* pcie_warm_reset requests a fundamental pci reset which includes a
 	 * PERST assert/deassert.  PERST triggers a loading of the image
 	 * if "user" or "factory" is selected in sysfs */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 08/10] cxl: Allow the kernel to trust that an image won't change on PERST.
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
                   ` (6 preceding siblings ...)
  2015-07-28  5:28 ` [PATCH v2 07/10] cxl: Don't remove AFUs/vPHBs in cxl_reset Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  7:15   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 09/10] cxl: EEH support Daniel Axtens
  2015-07-28  5:28 ` [PATCH v2 10/10] cxl: Add CONFIG_CXL_EEH symbol Daniel Axtens
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

Provide a kernel API and a sysfs entry which allow a user to specify
that when a card is PERSTed, it's image will stay the same, allowing
it to participate in EEH.

cxl_reset is used to reflash the card. In that case, we cannot safely
assert that the image will not change. Therefore, disallow cxl_reset
if the flag is set.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 Documentation/ABI/testing/sysfs-class-cxl | 10 ++++++++++
 drivers/misc/cxl/api.c                    |  9 +++++++++
 drivers/misc/cxl/cxl.h                    |  3 +++
 drivers/misc/cxl/pci.c                    | 11 +++++++++++
 drivers/misc/cxl/sysfs.c                  | 30 ++++++++++++++++++++++++++++++
 include/misc/cxl.h                        | 12 ++++++++++++
 6 files changed, 75 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
index acfe9df83139..b07e86d4597f 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -223,3 +223,13 @@ Description:    write only
                 Writing 1 will issue a PERST to card which may cause the card
                 to reload the FPGA depending on load_image_on_perst.
 Users:		https://github.com/ibm-capi/libcxl
+
+What:		/sys/class/cxl/<card>/perst_reloads_same_image
+Date:		July 2015
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read/write
+		Trust that when an image is reloaded via PERST, it will not
+		have changed.
+		0 = don't trust, the image may be different (default)
+		1 = trust that the image will not change.
+Users:		https://github.com/ibm-capi/libcxl
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 729e0851167d..c1012ced0323 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -327,3 +327,12 @@ int cxl_afu_reset(struct cxl_context *ctx)
 	return cxl_afu_check_and_enable(afu);
 }
 EXPORT_SYMBOL_GPL(cxl_afu_reset);
+
+#ifdef CONFIG_CXL_EEH
+void cxl_perst_reloads_same_image(struct cxl_afu *afu,
+				  bool perst_reloads_same_image)
+{
+	afu->adapter->perst_same_image = perst_reloads_same_image;
+}
+EXPORT_SYMBOL_GPL(cxl_perst_reloads_same_image);
+#endif
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 88a88c445e2a..6dd4158f76ac 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -493,6 +493,9 @@ struct cxl {
 	bool user_image_loaded;
 	bool perst_loads_image;
 	bool perst_select_user;
+#ifdef CONFIG_CXL_EEH
+	bool perst_same_image;
+#endif
 };
 
 int cxl_alloc_one_irq(struct cxl *adapter);
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 0acf9e62733e..b6a189b35323 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -875,6 +875,14 @@ int cxl_reset(struct cxl *adapter)
 	int i;
 	u32 val;
 
+#ifdef CONFIG_CXL_EEH
+	if (adapter->perst_same_image) {
+		dev_warn(&dev->dev,
+			 "cxl: refusing to reset/reflash when perst_reloads_same_image is set.\n");
+		return -EINVAL;
+	}
+#endif
+
 	dev_info(&dev->dev, "CXL reset\n");
 
 	/* pcie_warm_reset requests a fundamental pci reset which includes a
@@ -1148,6 +1156,9 @@ static struct cxl *cxl_init_adapter(struct pci_dev *dev)
 	 * configure/reconfigure
 	 */
 	adapter->perst_loads_image = true;
+#ifdef CONFIG_CXL_EEH
+	adapter->perst_same_image = false;
+#endif
 
 	if ((rc = cxl_configure_adapter(adapter, dev))) {
 		pci_disable_device(dev);
diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
index 31f38bc71a3d..4bcb63258e3e 100644
--- a/drivers/misc/cxl/sysfs.c
+++ b/drivers/misc/cxl/sysfs.c
@@ -112,12 +112,42 @@ static ssize_t load_image_on_perst_store(struct device *device,
 	return count;
 }
 
+#ifdef CONFIG_CXL_EEH
+static ssize_t perst_reloads_same_image_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->perst_same_image);
+}
+
+static ssize_t perst_reloads_same_image_store(struct device *device,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	struct cxl *adapter = to_cxl_adapter(device);
+	int rc;
+	int val;
+
+	rc = sscanf(buf, "%i", &val);
+	if ((rc != 1) || !(val == 1 || val == 0))
+		return -EINVAL;
+
+	adapter->perst_same_image = (val == 1 ? true : false);
+	return count;
+}
+#endif /* CONFIG_CXL_EEH */
+
 static struct device_attribute adapter_attrs[] = {
 	__ATTR_RO(caia_version),
 	__ATTR_RO(psl_revision),
 	__ATTR_RO(base_image),
 	__ATTR_RO(image_loaded),
 	__ATTR_RW(load_image_on_perst),
+#ifdef CONFIG_CXL_EEH
+	__ATTR_RW(perst_reloads_same_image),
+#endif
 	__ATTR(reset, S_IWUSR, NULL, reset_adapter_store),
 };
 
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index 7a6c1d6cc173..7f2cdc6caab3 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -200,4 +200,16 @@ unsigned int cxl_fd_poll(struct file *file, struct poll_table_struct *poll);
 ssize_t cxl_fd_read(struct file *file, char __user *buf, size_t count,
 			   loff_t *off);
 
+#ifdef CONFIG_CXL_EEH
+/*
+ * For EEH, a driver may want to assert a PERST will reload the same image
+ * from flash into the FPGA.
+ *
+ * This is a property of the entire adapter, not a single AFU, so drivers
+ * should set this property with care!
+ */
+void cxl_perst_reloads_same_image(struct cxl_afu *afu,
+				  bool perst_reloads_same_image);
+#endif /* CONFIG_CXL_EEH */
+
 #endif /* _MISC_CXL_H */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 09/10] cxl: EEH support
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
                   ` (7 preceding siblings ...)
  2015-07-28  5:28 ` [PATCH v2 08/10] cxl: Allow the kernel to trust that an image won't change on PERST Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  7:23   ` Cyril Bur
  2015-07-28  5:28 ` [PATCH v2 10/10] cxl: Add CONFIG_CXL_EEH symbol Daniel Axtens
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

EEH (Enhanced Error Handling) allows a driver to recover from the
temporary failure of an attached PCI card. Enable basic CXL support
for EEH.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/cxl.h  |   1 +
 drivers/misc/cxl/pci.c  | 253 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/misc/cxl/vphb.c |   8 ++
 3 files changed, 262 insertions(+)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 6dd4158f76ac..2065e894e46d 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -699,6 +699,7 @@ int cxl_psl_purge(struct cxl_afu *afu);
 
 void cxl_stop_trace(struct cxl *cxl);
 int cxl_pci_vphb_add(struct cxl_afu *afu);
+void cxl_pci_vphb_reconfigure(struct cxl_afu *afu);
 void cxl_pci_vphb_remove(struct cxl_afu *afu);
 
 extern struct pci_driver cxl_pci_driver;
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index b6a189b35323..60ae863b6f0a 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -24,6 +24,7 @@
 #include <asm/io.h>
 
 #include "cxl.h"
+#include <misc/cxl.h>
 
 
 #define CXL_PCI_VSEC_ID	0x1280
@@ -1249,10 +1250,262 @@ static void cxl_remove(struct pci_dev *dev)
 	cxl_remove_adapter(adapter);
 }
 
+#ifdef CONFIG_CXL_EEH
+static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu,
+						pci_channel_state_t state)
+{
+	struct pci_dev *afu_dev;
+	pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET;
+	pci_ers_result_t afu_result = PCI_ERS_RESULT_NEED_RESET;
+
+	/* There should only be one entry, but go through the list
+	 * anyway
+	 */
+	list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
+		if (!afu_dev->driver)
+			continue;
+
+		if (afu_dev->driver->err_handler)
+			afu_result = afu_dev->driver->err_handler->error_detected(afu_dev,
+										  state);
+		/* Disconnect trumps all, NONE trumps NEED_RESET */
+		if (afu_result == PCI_ERS_RESULT_DISCONNECT)
+			result = PCI_ERS_RESULT_DISCONNECT;
+		else if ((afu_result == PCI_ERS_RESULT_NONE) &&
+			 (result == PCI_ERS_RESULT_NEED_RESET))
+			result = PCI_ERS_RESULT_NONE;
+	}
+	return result;
+}
+
+static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
+					       pci_channel_state_t state)
+{
+	struct cxl *adapter = pci_get_drvdata(pdev);
+	struct cxl_afu *afu;
+	pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET;
+	int i;
+
+	/* At this point, we could still have an interrupt pending.
+	 * Let's try to get them out of the way before they do
+	 * anything we don't like.
+	 */
+	schedule();
+
+	/* If we're permanently dead, give up. */
+	if (state == pci_channel_io_perm_failure) {
+		/* Tell the AFU drivers; but we don't care what they
+		 * say, we're going away.
+		 */
+		for (i = 0; i < adapter->slices; i++) {
+			afu = adapter->afu[i];
+			cxl_vphb_error_detected(afu, state);
+		}
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	/* Are we reflashing?
+	 *
+	 * If we reflash, we could come back as something entirely
+	 * different, including a non-CAPI card. As such, by default
+	 * we don't participate in the process. We'll be unbound and
+	 * the slot re-probed. (TODO: check EEH doesn't blindly rebind
+	 * us!)
+	 *
+	 * However, this isn't the entire story: for reliablity
+	 * reasons, we usually want to reflash the FPGA on PERST in
+	 * order to get back to a more reliable known-good state.
+	 *
+	 * This causes us a bit of a problem: if we reflash we can't
+	 * trust that we'll come back the same - we could have a new
+	 * image and been PERSTed in order to load that
+	 * image. However, most of the time we actually *will* come
+	 * back the same - for example a regular EEH event.
+	 *
+	 * Therefore, we allow the user to assert that the image is
+	 * indeed the same and that we should continue on into EEH
+	 * anyway.
+	 */
+	if (adapter->perst_loads_image && !adapter->perst_same_image) {
+		/* TODO take the PHB out of CXL mode */
+		dev_info(&pdev->dev, "reflashing, so opting out of EEH!\n");
+		return PCI_ERS_RESULT_NONE;
+	}
+
+	/*
+	 * At this point, we want to try to recover.  We'll always
+	 * need a complete slot reset: we don't trust any other reset.
+	 *
+	 * Now, we go through each AFU:
+	 *  - We send the driver, if bound, an error_detected callback.
+	 *    We expect it to clean up, but it can also tell us to give
+	 *    up and permanently detach the card. To simplify things, if
+	 *    any bound AFU driver doesn't support EEH, we give up on EEH.
+	 *
+	 *  - We detach all contexts associated with the AFU. This
+	 *    does not free them, but puts them into a CLOSED state
+	 *    which causes any the associated files to return useful
+	 *    errors to userland. It also unmaps, but does not free,
+	 *    any IRQs.
+	 *
+	 *  - We clean up our side: releasing and unmapping resources we hold
+	 *    so we can wire them up again when the hardware comes back up.
+	 *
+	 * Driver authors should note:
+	 *
+	 *  - Any contexts you create in your kernel driver (except
+	 *    those associated with anonymous file descriptors) are
+	 *    your responsibility to free and recreate. Likewise with
+	 *    any attached resources.
+	 *
+	 *  - We will take responsibility for re-initialising the
+	 *    device context (the one set up for you in
+	 *    cxl_pci_enable_device_hook and accessed through
+	 *    cxl_get_context). If you've attached IRQs or other
+	 *    resources to it, they remains yours to free.
+	 *
+	 * All calls you make into cxl that normally touch the
+	 * hardware will not touch the hardware during recovery. So
+	 * you can call the same functions to release resources as you
+	 * normally would.
+	 *
+	 * Two examples:
+	 *
+	 * 1) If you normally free all your resources at the end of
+	 *    each request, or if you use anonymous FDs, your
+	 *    error_detected callback can simply set a flag to tell
+	 *    your driver not to start any new calls. You can then
+	 *    clear the flag in the resume callback.
+	 *
+	 * 2) If you normally allocate your resources on startup:
+	 *     * Set a flag in error_detected as above.
+	 *     * Let CXL detach your contexts.
+	 *     * In slot_reset, free the old resources and allocate new ones.
+	 *     * In resume, clear the flag to allow things to start.
+	 */
+	for (i = 0; i < adapter->slices; i++) {
+		afu = adapter->afu[i];
+
+		result = cxl_vphb_error_detected(afu, state);
+
+		/* Only continue if everyone agrees on NEED_RESET */
+		if (result != PCI_ERS_RESULT_NEED_RESET)
+			return result;
+
+		cxl_context_detach_all(afu);
+		cxl_afu_deactivate_mode(afu);
+		cxl_deconfigure_afu(afu);
+	}
+	cxl_deconfigure_adapter(adapter);
+
+	return result;
+}
+
+static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
+{
+	struct cxl *adapter = pci_get_drvdata(pdev);
+	struct cxl_afu *afu;
+	struct cxl_context *ctx;
+	struct pci_dev *afu_dev;
+	pci_ers_result_t afu_result = PCI_ERS_RESULT_RECOVERED;
+	pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
+	int i;
+
+	if (cxl_configure_adapter(adapter, pdev))
+		goto err;
+
+	for (i = 0; i < adapter->slices; i++) {
+		afu = adapter->afu[i];
+
+		if (cxl_configure_afu(afu, adapter, pdev))
+			goto err;
+
+		if (cxl_afu_select_best_mode(afu))
+			goto err;
+
+		cxl_pci_vphb_reconfigure(afu);
+
+		list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
+			/* Reset the device context.
+			 * TODO: make this less disruptive
+			 */
+			ctx = cxl_get_context(afu_dev);
+
+			if (ctx && cxl_release_context(ctx))
+				goto err;
+
+			ctx = cxl_dev_context_init(afu_dev);
+			if (!ctx)
+				goto err;
+
+			afu_dev->dev.archdata.cxl_ctx = ctx;
+
+			if (cxl_afu_check_and_enable(afu))
+				goto err;
+
+			/* If there's a driver attached, allow it to
+			 * chime in on recovery. Drivers should check
+			 * if everything has come back OK.
+			 */
+			if (!afu_dev->driver)
+				continue;
+
+			if (afu_dev->driver->err_handler &&
+			    afu_dev->driver->err_handler->slot_reset)
+				afu_result = afu_dev->driver->err_handler->slot_reset(afu_dev);
+
+			if (afu_result == PCI_ERS_RESULT_DISCONNECT)
+				result = PCI_ERS_RESULT_DISCONNECT;
+		}
+	}
+	return result;
+
+err:
+	/* All the bits that happen in both error_detected and cxl_remove
+	 * should be idempotent, so we don't need to worry about leaving a mix
+	 * of unconfigured and reconfigured resources.
+	 */
+	dev_err(&pdev->dev, "EEH recovery failed. Asking to be disconnected.\n");
+	return PCI_ERS_RESULT_DISCONNECT;
+}
+
+static void cxl_pci_resume(struct pci_dev *pdev)
+{
+	struct cxl *adapter = pci_get_drvdata(pdev);
+	struct cxl_afu *afu;
+	struct pci_dev *afu_dev;
+	int i;
+
+	/* Everything is back now. Drivers should restart work now.
+	 * This is not the place to be checking if everything came back up
+	 * properly, because there's no return value: do that in slot_reset.
+	 */
+	for (i = 0; i < adapter->slices; i++) {
+		afu = adapter->afu[i];
+
+		list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
+			if (afu_dev->driver && afu_dev->driver->err_handler &&
+			    afu_dev->driver->err_handler->resume)
+				afu_dev->driver->err_handler->resume(afu_dev);
+		}
+	}
+}
+
+static const struct pci_error_handlers cxl_err_handler = {
+	.error_detected = cxl_pci_error_detected,
+	.slot_reset = cxl_pci_slot_reset,
+	.resume = cxl_pci_resume,
+};
+#endif /* CONFIG_CXL_EEH */
+
+
 struct pci_driver cxl_pci_driver = {
 	.name = "cxl-pci",
 	.id_table = cxl_pci_tbl,
 	.probe = cxl_probe,
 	.remove = cxl_remove,
 	.shutdown = cxl_remove,
+#ifdef CONFIG_CXL_EEH
+	.err_handler = &cxl_err_handler,
+#endif
 };
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 2930911c1e42..6dd16a6d153f 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -266,6 +266,14 @@ int cxl_pci_vphb_add(struct cxl_afu *afu)
 	return 0;
 }
 
+void cxl_pci_vphb_reconfigure(struct cxl_afu *afu)
+{
+	/* When we are reconfigured, the AFU's MMIO space is unmapped
+	 * and remapped. We need to reflect this in the PHB's view of
+	 * the world.
+	 */
+	afu->phb->cfg_addr = afu->afu_desc_mmio + afu->crs_offset;
+}
 
 void cxl_pci_vphb_remove(struct cxl_afu *afu)
 {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 10/10] cxl: Add CONFIG_CXL_EEH symbol
  2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
                   ` (8 preceding siblings ...)
  2015-07-28  5:28 ` [PATCH v2 09/10] cxl: EEH support Daniel Axtens
@ 2015-07-28  5:28 ` Daniel Axtens
  2015-08-11  3:59   ` Cyril Bur
  9 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-07-28  5:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: mpe, benh, mikey, imunsie, Matthew R. Ochs, Manoj Kumar,
	Daniel Axtens

CONFIG_CXL_EEH is for CXL's EEH related code.

As well as the EEH callbacks, it should guard sysfs and
kernel API changes that are only required for CXL EEH.

We now have all the pieces in place, so add it now.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 drivers/misc/cxl/Kconfig | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index b6db9ebd52c2..c151fc1fe14c 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -11,11 +11,17 @@ config CXL_KERNEL_API
 	bool
 	default n
 
+config CXL_EEH
+	bool
+	default n
+	select EEH
+
 config CXL
 	tristate "Support for IBM Coherent Accelerators (CXL)"
 	depends on PPC_POWERNV && PCI_MSI
 	select CXL_BASE
 	select CXL_KERNEL_API
+	select CXL_EEH
 	default m
 	help
 	  Select this option to enable driver support for IBM Coherent
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 01/10] cxl: Drop commands if the PCI channel is not in normal state
  2015-07-28  5:28 ` [PATCH v2 01/10] cxl: Drop commands if the PCI channel is not in normal state Daniel Axtens
@ 2015-08-11  3:31   ` Cyril Bur
  2015-08-11  4:11     ` Daniel Axtens
  0 siblings, 1 reply; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  3:31 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, imunsie

On Tue, 28 Jul 2015 15:28:34 +1000
Daniel Axtens <dja@axtens.net> wrote:

> If the PCI channel has gone down, don't attempt to poke the hardware.
> 
> We need to guard every time cxl_whatever_(read|write) is called. This
> is because a call to those functions will dereference an offset into an
> mmio register, and the mmio mappings get invalidated in the EEH
> teardown.
> 

Hey Daniel, keeping in mind I can't exactly test it, and that I don't know the
ins and outs of CAPI, the code looks nice and it makes sence to me.

Some very minor points,

Otherwise

Acked-by: Cyril Bur <cyrilbur@gmail.com>

> Check in the read/write functions in the header.
> We give them the same semantics as usual PCI operations:
>  - a write to a channel that is down is ignored.
>  - a read from a channel that is down returns all fs.
> 
> Also, we try to access the MMIO space of a vPHB device as part of the
> PCI disable path. Because that's a read that bypasses most of our usual
> checks, we handle it explicitly.
> 
> As far as user visible warnings go:
>  - Check link state in file ops, return -EIO if down.
>  - Be reasonably quiet if there's an error in a teardown path,
>    or when we already know the hardware is going down.
>  - Throw a big WARN if someone tries to start a CXL operation
>    while the card is down. This gives a useful stacktrace for
>    debugging whatever is doing that.
> 
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/context.c |  6 +++-
>  drivers/misc/cxl/cxl.h     | 34 ++++++++++++++++------
>  drivers/misc/cxl/file.c    | 19 +++++++++++++
>  drivers/misc/cxl/native.c  | 71 ++++++++++++++++++++++++++++++++++++++++++++--
>  drivers/misc/cxl/vphb.c    | 26 +++++++++++++++++
>  5 files changed, 144 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
> index 1287148629c0..615842115848 100644
> --- a/drivers/misc/cxl/context.c
> +++ b/drivers/misc/cxl/context.c
> @@ -193,7 +193,11 @@ int __detach_context(struct cxl_context *ctx)
>  	if (status != STARTED)
>  		return -EBUSY;
>  
> -	WARN_ON(cxl_detach_process(ctx));
> +	/* Only warn if we detached while the link was OK.
> +	 * If detach fails when hw is down, we don't care.
> +	 */
> +	WARN_ON(cxl_detach_process(ctx) &&
> +		cxl_adapter_link_ok(ctx->afu->adapter));
>  	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
>  	put_pid(ctx->pid);
>  	cxl_ctx_put();
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index 4fd66cabde1e..47eadbcfd379 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -531,6 +531,14 @@ struct cxl_process_element {
>  	__be32 software_state;
>  } __packed;
>  
> +static inline bool cxl_adapter_link_ok(struct cxl *cxl)
> +{
> +	struct pci_dev *pdev;
> +
> +	pdev = to_pci_dev(cxl->dev.parent);
> +	return (pdev->error_state == pci_channel_io_normal);
> +}
> +

In the process of reviewing these patches I read the style guide in furthur
detail and (it doesn't 100% commit one way or the other but) it suggests it may
be wise to get GCC choose if it should inline or not, unless you have an reason 
(the macro replacment below being a good example)... Just a thought.

>  static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg)
>  {
>  	WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE));
> @@ -538,9 +546,11 @@ static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg)
>  }
>  
>  #define cxl_p1_write(cxl, reg, val) \
> -	out_be64(_cxl_p1_addr(cxl, reg), val)
> +	if (cxl_adapter_link_ok(cxl)) \
> +		out_be64(_cxl_p1_addr(cxl, reg), val)
>  #define cxl_p1_read(cxl, reg) \
> -	in_be64(_cxl_p1_addr(cxl, reg))
> +	(cxl_adapter_link_ok(cxl) ? in_be64(_cxl_p1_addr(cxl, reg)) \
> +	 : (~0ULL))
>  
>  static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg)
>  {
> @@ -549,9 +559,11 @@ static inline void __iomem *_cxl_p1n_addr(struct cxl_afu *afu, cxl_p1n_reg_t reg
>  }
>  
>  #define cxl_p1n_write(afu, reg, val) \
> -	out_be64(_cxl_p1n_addr(afu, reg), val)
> +	if (cxl_adapter_link_ok(afu->adapter)) \
> +		out_be64(_cxl_p1n_addr(afu, reg), val)
>  #define cxl_p1n_read(afu, reg) \
> -	in_be64(_cxl_p1n_addr(afu, reg))
> +	(cxl_adapter_link_ok(afu->adapter) ? in_be64(_cxl_p1n_addr(afu, reg)) \
> +	 : (~0ULL))
>  

In the interest of safety and consistency, you might want braces around afu
when you dereference it. ie (afu)->adapter.

>  static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg)
>  {
> @@ -559,15 +571,21 @@ static inline void __iomem *_cxl_p2n_addr(struct cxl_afu *afu, cxl_p2n_reg_t reg
>  }
>  
>  #define cxl_p2n_write(afu, reg, val) \
> -	out_be64(_cxl_p2n_addr(afu, reg), val)
> +	if (cxl_adapter_link_ok(afu->adapter)) \
> +		out_be64(_cxl_p2n_addr(afu, reg), val)
>  #define cxl_p2n_read(afu, reg) \
> -	in_be64(_cxl_p2n_addr(afu, reg))
> +	(cxl_adapter_link_ok(afu->adapter) ? in_be64(_cxl_p2n_addr(afu, reg)) \
> +	 : (~0ULL))
>  
>  
>  #define cxl_afu_cr_read64(afu, cr, off) \
> -	in_le64((afu)->afu_desc_mmio + (afu)->crs_offset + ((cr) * (afu)->crs_len) + (off))
> +	(cxl_adapter_link_ok(afu->adapter) ? \
> +	 in_le64((afu)->afu_desc_mmio + (afu)->crs_offset + ((cr) * (afu)->crs_len) + (off)) : \
> +	 (~0ULL))
>  #define cxl_afu_cr_read32(afu, cr, off) \
> -	in_le32((afu)->afu_desc_mmio + (afu)->crs_offset + ((cr) * (afu)->crs_len) + (off))
> +	(cxl_adapter_link_ok(afu->adapter) ? \
> +	 in_le32((afu)->afu_desc_mmio + (afu)->crs_offset + ((cr) * (afu)->crs_len) + (off)) : \
> +	 0xffffffff)

Previous comment applies here, the existing code has (afu)->


So a birdy has informed me these are going to become inlines, you can
therefore disregard those comments. I am much more in favour of inlines!

>  u16 cxl_afu_cr_read16(struct cxl_afu *afu, int cr, u64 off);
>  u8 cxl_afu_cr_read8(struct cxl_afu *afu, int cr, u64 off);
>  
> diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
> index e3f4b69527a9..b99b4fa84ebd 100644
> --- a/drivers/misc/cxl/file.c
> +++ b/drivers/misc/cxl/file.c
> @@ -73,6 +73,11 @@ static int __afu_open(struct inode *inode, struct file *file, bool master)
>  	if (!afu->current_mode)
>  		goto err_put_afu;
>  
> +	if (!cxl_adapter_link_ok(adapter)) {
> +		rc = -EIO;
> +		goto err_put_afu;
> +	}
> +
>  	if (!(ctx = cxl_context_alloc())) {
>  		rc = -ENOMEM;
>  		goto err_put_afu;
> @@ -238,6 +243,9 @@ long afu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  	if (ctx->status == CLOSED)
>  		return -EIO;
>  
> +	if (!cxl_adapter_link_ok(ctx->afu->adapter))
> +		return -EIO;
> +
>  	pr_devel("afu_ioctl\n");
>  	switch (cmd) {
>  	case CXL_IOCTL_START_WORK:
> @@ -265,6 +273,9 @@ int afu_mmap(struct file *file, struct vm_area_struct *vm)
>  	if (ctx->status != STARTED)
>  		return -EIO;
>  
> +	if (!cxl_adapter_link_ok(ctx->afu->adapter))
> +		return -EIO;
> +
>  	return cxl_context_iomap(ctx, vm);
>  }
>  
> @@ -309,6 +320,9 @@ ssize_t afu_read(struct file *file, char __user *buf, size_t count,
>  	int rc;
>  	DEFINE_WAIT(wait);
>  
> +	if (!cxl_adapter_link_ok(ctx->afu->adapter))
> +		return -EIO;
> +
>  	if (count < CXL_READ_MIN_SIZE)
>  		return -EINVAL;
>  
> @@ -319,6 +333,11 @@ ssize_t afu_read(struct file *file, char __user *buf, size_t count,
>  		if (ctx_event_pending(ctx))
>  			break;
>  
> +		if (!cxl_adapter_link_ok(ctx->afu->adapter)) {
> +			rc = -EIO;
> +			goto out;
> +		}
> +
>  		if (file->f_flags & O_NONBLOCK) {
>  			rc = -EAGAIN;
>  			goto out;
> diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
> index 10567f245818..16948915eb0d 100644
> --- a/drivers/misc/cxl/native.c
> +++ b/drivers/misc/cxl/native.c
> @@ -41,6 +41,12 @@ static int afu_control(struct cxl_afu *afu, u64 command,
>  			rc = -EBUSY;
>  			goto out;
>  		}
> +		if (!cxl_adapter_link_ok(afu->adapter)) {
> +			afu->enabled = enabled;
> +			rc = -EIO;
> +			goto out;
> +		}
> +
>  		pr_devel_ratelimited("AFU control... (0x%.16llx)\n",
>  				     AFU_Cntl | command);
>  		cpu_relax();
> @@ -85,6 +91,10 @@ int __cxl_afu_reset(struct cxl_afu *afu)
>  
>  int cxl_afu_check_and_enable(struct cxl_afu *afu)
>  {
> +	if (!cxl_adapter_link_ok(afu->adapter)) {
> +		WARN(1, "Refusing to enable afu while link down!\n");
> +		return -EIO;
> +	}
>  	if (afu->enabled)
>  		return 0;
>  	return afu_enable(afu);
> @@ -103,6 +113,12 @@ int cxl_psl_purge(struct cxl_afu *afu)
>  
>  	pr_devel("PSL purge request\n");
>  
> +	if (!cxl_adapter_link_ok(afu->adapter)) {
> +		dev_warn(&afu->dev, "PSL Purge called with link down, ignoring\n");
> +		rc = -EIO;
> +		goto out;
> +	}
> +
>  	if ((AFU_Cntl & CXL_AFU_Cntl_An_ES_MASK) != CXL_AFU_Cntl_An_ES_Disabled) {
>  		WARN(1, "psl_purge request while AFU not disabled!\n");
>  		cxl_afu_disable(afu);
> @@ -119,6 +135,11 @@ int cxl_psl_purge(struct cxl_afu *afu)
>  			rc = -EBUSY;
>  			goto out;
>  		}
> +		if (!cxl_adapter_link_ok(afu->adapter)) {
> +			rc = -EIO;
> +			goto out;
> +		}
> +
>  		dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
>  		pr_devel_ratelimited("PSL purging... PSL_CNTL: 0x%.16llx  PSL_DSISR: 0x%.16llx\n", PSL_CNTL, dsisr);
>  		if (dsisr & CXL_PSL_DSISR_TRANS) {
> @@ -215,6 +236,8 @@ int cxl_tlb_slb_invalidate(struct cxl *adapter)
>  			dev_warn(&adapter->dev, "WARNING: CXL adapter wide TLBIA timed out!\n");
>  			return -EBUSY;
>  		}
> +		if (!cxl_adapter_link_ok(adapter))
> +			return -EIO;
>  		cpu_relax();
>  	}
>  
> @@ -224,6 +247,8 @@ int cxl_tlb_slb_invalidate(struct cxl *adapter)
>  			dev_warn(&adapter->dev, "WARNING: CXL adapter wide SLBIA timed out!\n");
>  			return -EBUSY;
>  		}
> +		if (!cxl_adapter_link_ok(adapter))
> +			return -EIO;
>  		cpu_relax();
>  	}
>  	return 0;
> @@ -240,6 +265,11 @@ int cxl_afu_slbia(struct cxl_afu *afu)
>  			dev_warn(&afu->dev, "WARNING: CXL AFU SLBIA timed out!\n");
>  			return -EBUSY;
>  		}
> +		/* If the adapter has gone down, we can assume that we
> +		 * will PERST it and that will invalidate everything.
> +		 */
> +		if (!cxl_adapter_link_ok(afu->adapter))
> +			return -EIO;
>  		cpu_relax();
>  	}
>  	return 0;
> @@ -279,6 +309,8 @@ static void slb_invalid(struct cxl_context *ctx)
>  	cxl_p1_write(adapter, CXL_PSL_SLBIA, CXL_TLB_SLB_IQ_LPIDPID);
>  
>  	while (1) {
> +		if (!cxl_adapter_link_ok(adapter))
> +			break;
>  		slbia = cxl_p1_read(adapter, CXL_PSL_SLBIA);
>  		if (!(slbia & CXL_TLB_SLB_P))
>  			break;
> @@ -308,6 +340,11 @@ static int do_process_element_cmd(struct cxl_context *ctx,
>  			rc = -EBUSY;
>  			goto out;
>  		}
> +		if (!cxl_adapter_link_ok(ctx->afu->adapter)) {
> +			dev_warn(&ctx->afu->dev, "WARNING: Device link down, aborting Process Element Command!\n");
> +			rc = -EIO;
> +			goto out;
> +		}
>  		state = be64_to_cpup(ctx->afu->sw_command_status);
>  		if (state == ~0ULL) {
>  			pr_err("cxl: Error adding process element to AFU\n");
> @@ -355,8 +392,13 @@ static int terminate_process_element(struct cxl_context *ctx)
>  
>  	mutex_lock(&ctx->afu->spa_mutex);
>  	pr_devel("%s Terminate pe: %i started\n", __func__, ctx->pe);
> -	rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_TERMINATE,
> -				    CXL_PE_SOFTWARE_STATE_V | CXL_PE_SOFTWARE_STATE_T);
> +	/* We could be asked to terminate when the hw is down.  That

                                                               ^
I'm notoriously bad for having a low care threshhold but I do have a high care
threshhold for consistency. Double space after period? Are you sure?

> +	 * should always succeed: it's not running if the hw has gone
> +	 * away and is being reset.
> +	 */
> +	if (cxl_adapter_link_ok(ctx->afu->adapter))
> +		rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_TERMINATE,
> +					    CXL_PE_SOFTWARE_STATE_V | CXL_PE_SOFTWARE_STATE_T);
>  	ctx->elem->software_state = 0;	/* Remove Valid bit */
>  	pr_devel("%s Terminate pe: %i finished\n", __func__, ctx->pe);
>  	mutex_unlock(&ctx->afu->spa_mutex);
> @@ -369,7 +411,14 @@ static int remove_process_element(struct cxl_context *ctx)
>  
>  	mutex_lock(&ctx->afu->spa_mutex);
>  	pr_devel("%s Remove pe: %i started\n", __func__, ctx->pe);
> -	if (!(rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_REMOVE, 0)))
> +
> +	/* We could be asked to remove when the hw is down.  Again, if

Also here                                                   ^

> +	 * the hw is down, the PE is gone, so we succeed.
> +	 */
> +	if (cxl_adapter_link_ok(ctx->afu->adapter))
> +		rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_REMOVE, 0);
> +
> +	if (!rc)
>  		ctx->pe_inserted = false;
>  	slb_invalid(ctx);
>  	pr_devel("%s Remove pe: %i finished\n", __func__, ctx->pe);
> @@ -614,6 +663,11 @@ int cxl_afu_activate_mode(struct cxl_afu *afu, int mode)
>  	if (!(mode & afu->modes_supported))
>  		return -EINVAL;
>  
> +	if (!cxl_adapter_link_ok(afu->adapter)) {
> +		WARN(1, "Device link is down, refusing to activate!\n");
> +		return -EIO;
> +	}
> +
>  	if (mode == CXL_MODE_DIRECTED)
>  		return activate_afu_directed(afu);
>  	if (mode == CXL_MODE_DEDICATED)
> @@ -624,6 +678,11 @@ int cxl_afu_activate_mode(struct cxl_afu *afu, int mode)
>  
>  int cxl_attach_process(struct cxl_context *ctx, bool kernel, u64 wed, u64 amr)
>  {
> +	if (!cxl_adapter_link_ok(ctx->afu->adapter)) {
> +		WARN(1, "Device link is down, refusing to attach process!\n");
> +		return -EIO;
> +	}
> +
>  	ctx->kernel = kernel;
>  	if (ctx->afu->current_mode == CXL_MODE_DIRECTED)
>  		return attach_afu_directed(ctx, wed, amr);
> @@ -668,6 +727,12 @@ int cxl_get_irq(struct cxl_afu *afu, struct cxl_irq_info *info)
>  {
>  	u64 pidtid;
>  
> +	/* If the adapter has gone away, we can't get any meaningful
> +	 * information.
> +	 */
> +	if (!cxl_adapter_link_ok(afu->adapter))
> +		return -EIO;
> +
>  	info->dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
>  	info->dar = cxl_p2n_read(afu, CXL_PSL_DAR_An);
>  	info->dsr = cxl_p2n_read(afu, CXL_PSL_DSR_An);
> diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
> index 2eba002b580b..2930911c1e42 100644
> --- a/drivers/misc/cxl/vphb.c
> +++ b/drivers/misc/cxl/vphb.c
> @@ -138,6 +138,26 @@ static int cxl_pcie_config_info(struct pci_bus *bus, unsigned int devfn,
>  	return 0;
>  }
>  
> +
> +static inline bool cxl_config_link_ok(struct pci_bus *bus)
> +{
> +	struct pci_controller *phb;
> +	struct cxl_afu *afu;
> +
> +	/* Config space IO is based on phb->cfg_addr, which is based on
> +	 * afu_desc_mmio. This isn't safe to read/write when the link

And yet not here         ^

> +	 * goes down, as EEH tears down MMIO space.
> +	 *
> +	 * Check if the link is OK before proceeding.
> +	 */
> +
> +	phb = pci_bus_to_host(bus);
> +	if (phb == NULL)
> +		return false;
> +	afu = (struct cxl_afu *)phb->private_data;
> +	return cxl_adapter_link_ok(afu->adapter);
> +}
> +
>  static int cxl_pcie_read_config(struct pci_bus *bus, unsigned int devfn,
>  				int offset, int len, u32 *val)
>  {
> @@ -150,6 +170,9 @@ static int cxl_pcie_read_config(struct pci_bus *bus, unsigned int devfn,
>  	if (rc)
>  		return rc;
>  
> +	if (!cxl_config_link_ok(bus))
> +		return PCIBIOS_DEVICE_NOT_FOUND;
> +
>  	/* Can only read 32 bits */
>  	*val = (in_le32(ioaddr) >> shift) & mask;
>  	return PCIBIOS_SUCCESSFUL;
> @@ -167,6 +190,9 @@ static int cxl_pcie_write_config(struct pci_bus *bus, unsigned int devfn,
>  	if (rc)
>  		return rc;
>  
> +	if (!cxl_config_link_ok(bus))
> +		return PCIBIOS_DEVICE_NOT_FOUND;
> +
>  	/* Can only write 32 bits so do read-modify-write */
>  	mask <<= shift;
>  	val <<= shift;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 02/10] cxl: Allocate and release the SPA with the AFU
  2015-07-28  5:28 ` [PATCH v2 02/10] cxl: Allocate and release the SPA with the AFU Daniel Axtens
@ 2015-08-11  3:42   ` Cyril Bur
  2015-08-11  4:16     ` Daniel Axtens
  0 siblings, 1 reply; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  3:42 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, imunsie

On Tue, 28 Jul 2015 15:28:35 +1000
Daniel Axtens <dja@axtens.net> wrote:

> Previously the SPA was allocated and freed upon entering and leaving
> AFU-directed mode. This causes some issues for error recovery - contexts
> hold a pointer inside the SPA, and they may persist after the AFU has
> been detached.
> 
> We would ideally like to allocate the SPA when the AFU is allocated, and
> release it until the AFU is released. However, we don't know how big the
> SPA needs to be until we read the AFU descriptor.
> 
> Therefore, restructure the code:
> 
>  - Allocate the SPA only once, on the first attach.
> 
>  - Release the SPA only when the entire AFU is being released (not
>    detached). Guard the release with a NULL check, so we don't free
>    if it was never allocated (e.g. dedicated mode)
> 

I'm sure you tested this :), looks fine to me, from an outsiders perspective
code appears to do what the commit message says.

Just one super minor question, you do the NULL check in the caller. How obvious
is the error if/when a caller forgets?

Acked-by: Cyril Bur <cyrilbur@gmail.com>

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/cxl.h    |  3 +++
>  drivers/misc/cxl/native.c | 28 ++++++++++++++++++----------
>  drivers/misc/cxl/pci.c    |  3 +++
>  3 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index 47eadbcfd379..88a88c445e2a 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -603,6 +603,9 @@ void unregister_cxl_calls(struct cxl_calls *calls);
>  int cxl_alloc_adapter_nr(struct cxl *adapter);
>  void cxl_remove_adapter_nr(struct cxl *adapter);
>  
> +int cxl_alloc_spa(struct cxl_afu *afu);
> +void cxl_release_spa(struct cxl_afu *afu);
> +
>  int cxl_file_init(void);
>  void cxl_file_exit(void);
>  int cxl_register_adapter(struct cxl *adapter);
> diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
> index 16948915eb0d..debd97147b58 100644
> --- a/drivers/misc/cxl/native.c
> +++ b/drivers/misc/cxl/native.c
> @@ -182,10 +182,8 @@ static int spa_max_procs(int spa_size)
>  	return ((spa_size / 8) - 96) / 17;
>  }
>  
> -static int alloc_spa(struct cxl_afu *afu)
> +int cxl_alloc_spa(struct cxl_afu *afu)
>  {
> -	u64 spap;
> -
>  	/* Work out how many pages to allocate */
>  	afu->spa_order = 0;
>  	do {
> @@ -204,6 +202,13 @@ static int alloc_spa(struct cxl_afu *afu)
>  	pr_devel("spa pages: %i afu->spa_max_procs: %i   afu->num_procs: %i\n",
>  		 1<<afu->spa_order, afu->spa_max_procs, afu->num_procs);
>  
> +	return 0;
> +}
> +
> +static void attach_spa(struct cxl_afu *afu)
> +{
> +	u64 spap;
> +
>  	afu->sw_command_status = (__be64 *)((char *)afu->spa +
>  					    ((afu->spa_max_procs + 3) * 128));
>  
> @@ -212,13 +217,15 @@ static int alloc_spa(struct cxl_afu *afu)
>  	spap |= CXL_PSL_SPAP_V;
>  	pr_devel("cxl: SPA allocated at 0x%p. Max processes: %i, sw_command_status: 0x%p CXL_PSL_SPAP_An=0x%016llx\n", afu->spa, afu->spa_max_procs, afu->sw_command_status, spap);
>  	cxl_p1n_write(afu, CXL_PSL_SPAP_An, spap);
> -
> -	return 0;
>  }
>  
> -static void release_spa(struct cxl_afu *afu)
> +static inline void detach_spa(struct cxl_afu *afu)
>  {
>  	cxl_p1n_write(afu, CXL_PSL_SPAP_An, 0);
> +}
> +
> +void cxl_release_spa(struct cxl_afu *afu)
> +{
>  	free_pages((unsigned long) afu->spa, afu->spa_order);
>  }
>  
> @@ -446,8 +453,11 @@ static int activate_afu_directed(struct cxl_afu *afu)
>  
>  	dev_info(&afu->dev, "Activating AFU directed mode\n");
>  
> -	if (alloc_spa(afu))
> -		return -ENOMEM;
> +	if (afu->spa == NULL) {
> +		if (cxl_alloc_spa(afu))
> +			return -ENOMEM;
> +	}
> +	attach_spa(afu);
>  
>  	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_AFU);
>  	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
> @@ -560,8 +570,6 @@ static int deactivate_afu_directed(struct cxl_afu *afu)
>  	cxl_afu_disable(afu);
>  	cxl_psl_purge(afu);
>  
> -	release_spa(afu);
> -
>  	return 0;
>  }
>  
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 32ad09705949..1849c1785b49 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -551,6 +551,9 @@ static void cxl_release_afu(struct device *dev)
>  
>  	pr_devel("cxl_release_afu\n");
>  
> +	if (afu->spa)
> +		cxl_release_spa(afu);
> +
>  	kfree(afu);
>  }
>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 03/10] cxl: Make IRQ release idempotent
  2015-07-28  5:28 ` [PATCH v2 03/10] cxl: Make IRQ release idempotent Daniel Axtens
@ 2015-08-11  3:44   ` Cyril Bur
  0 siblings, 0 replies; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  3:44 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, imunsie

On Tue, 28 Jul 2015 15:28:36 +1000
Daniel Axtens <dja@axtens.net> wrote:

> Check if an IRQ is mapped before releasing it.
> 
> This will simplify future EEH code by allowing unconditional unmapping
> of IRQs.
> 

Acked-by: Cyril Bur <cyrilbur@gmail.com>

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/irq.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
> index 680cd263436d..121ec48f3ab4 100644
> --- a/drivers/misc/cxl/irq.c
> +++ b/drivers/misc/cxl/irq.c
> @@ -341,6 +341,9 @@ int cxl_register_psl_err_irq(struct cxl *adapter)
>  
>  void cxl_release_psl_err_irq(struct cxl *adapter)
>  {
> +	if (adapter->err_virq != irq_find_mapping(NULL, adapter->err_hwirq))
> +		return;
> +
>  	cxl_p1_write(adapter, CXL_PSL_ErrIVTE, 0x0000000000000000);
>  	cxl_unmap_irq(adapter->err_virq, adapter);
>  	cxl_release_one_irq(adapter, adapter->err_hwirq);
> @@ -374,6 +377,9 @@ int cxl_register_serr_irq(struct cxl_afu *afu)
>  
>  void cxl_release_serr_irq(struct cxl_afu *afu)
>  {
> +	if (afu->serr_virq != irq_find_mapping(NULL, afu->serr_hwirq))
> +		return;
> +
>  	cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x0000000000000000);
>  	cxl_unmap_irq(afu->serr_virq, afu);
>  	cxl_release_one_irq(afu->adapter, afu->serr_hwirq);
> @@ -400,6 +406,9 @@ int cxl_register_psl_irq(struct cxl_afu *afu)
>  
>  void cxl_release_psl_irq(struct cxl_afu *afu)
>  {
> +	if (afu->psl_virq != irq_find_mapping(NULL, afu->psl_hwirq))
> +		return;
> +
>  	cxl_unmap_irq(afu->psl_virq, afu);
>  	cxl_release_one_irq(afu->adapter, afu->psl_hwirq);
>  	kfree(afu->psl_irq_name);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 04/10] cxl: Clean up adapter MMIO unmap path.
  2015-07-28  5:28 ` [PATCH v2 04/10] cxl: Clean up adapter MMIO unmap path Daniel Axtens
@ 2015-08-11  3:52   ` Cyril Bur
  2015-08-11  6:38     ` Daniel Axtens
  0 siblings, 1 reply; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  3:52 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, imunsie

On Tue, 28 Jul 2015 15:28:37 +1000
Daniel Axtens <dja@axtens.net> wrote:

>  - MMIO pointer unmapping is guarded by a null pointer check.
>    However, iounmap doesn't null the pointer, just invalidate it.
>    Therefore, explicitly null the pointer after unmapping.
> 
>  - afu_desc_mmio also needs to be unmapped.
> 
>  - PCI regions are allocated in cxl_map_adapter_regs.
>    Therefore they should be released in unmap, not elsewhere.
> 

You've changed the order in which cxl_remove_adapter() does its work, which,
I'm sure you've considered and it's fine, best to check.

Acked-by: Cyril Bur <cyrilbur@gmail.com>

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/pci.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 1849c1785b49..adcf938f2fdb 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -539,10 +539,18 @@ err:
>  
>  static void cxl_unmap_slice_regs(struct cxl_afu *afu)
>  {
> -	if (afu->p2n_mmio)
> +	if (afu->p2n_mmio) {
>  		iounmap(afu->p2n_mmio);
> -	if (afu->p1n_mmio)
> +		afu->p2n_mmio = NULL;
> +	}
> +	if (afu->p1n_mmio) {
>  		iounmap(afu->p1n_mmio);
> +		afu->p1n_mmio = NULL;
> +	}
> +	if (afu->afu_desc_mmio) {
> +		iounmap(afu->afu_desc_mmio);
> +		afu->afu_desc_mmio = NULL;
> +	}
>  }
>  
>  static void cxl_release_afu(struct device *dev)
> @@ -920,10 +928,16 @@ err1:
>  
>  static void cxl_unmap_adapter_regs(struct cxl *adapter)
>  {
> -	if (adapter->p1_mmio)
> +	if (adapter->p1_mmio) {
>  		iounmap(adapter->p1_mmio);
> -	if (adapter->p2_mmio)
> +		adapter->p1_mmio = NULL;
> +		pci_release_region(to_pci_dev(adapter->dev.parent), 2);
> +	}
> +	if (adapter->p2_mmio) {
>  		iounmap(adapter->p2_mmio);
> +		adapter->p2_mmio = NULL;
> +		pci_release_region(to_pci_dev(adapter->dev.parent), 0);
> +	}
>  }
>  
>  static int cxl_read_vsec(struct cxl *adapter, struct pci_dev *dev)
> @@ -1132,8 +1146,6 @@ static void cxl_remove_adapter(struct cxl *adapter)
>  
>  	device_unregister(&adapter->dev);
>  
> -	pci_release_region(pdev, 0);
> -	pci_release_region(pdev, 2);
>  	pci_disable_device(pdev);
>  }
>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 10/10] cxl: Add CONFIG_CXL_EEH symbol
  2015-07-28  5:28 ` [PATCH v2 10/10] cxl: Add CONFIG_CXL_EEH symbol Daniel Axtens
@ 2015-08-11  3:59   ` Cyril Bur
  0 siblings, 0 replies; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  3:59 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, imunsie

On Tue, 28 Jul 2015 15:28:43 +1000
Daniel Axtens <dja@axtens.net> wrote:

> CONFIG_CXL_EEH is for CXL's EEH related code.
> 
> As well as the EEH callbacks, it should guard sysfs and
> kernel API changes that are only required for CXL EEH.
> 
> We now have all the pieces in place, so add it now.
> 

Reviewed-by: Cyril Bur <cyrilbur@gmail.com>

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/Kconfig | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
> index b6db9ebd52c2..c151fc1fe14c 100644
> --- a/drivers/misc/cxl/Kconfig
> +++ b/drivers/misc/cxl/Kconfig
> @@ -11,11 +11,17 @@ config CXL_KERNEL_API
>  	bool
>  	default n
>  
> +config CXL_EEH
> +	bool
> +	default n
> +	select EEH
> +
>  config CXL
>  	tristate "Support for IBM Coherent Accelerators (CXL)"
>  	depends on PPC_POWERNV && PCI_MSI
>  	select CXL_BASE
>  	select CXL_KERNEL_API
> +	select CXL_EEH
>  	default m
>  	help
>  	  Select this option to enable driver support for IBM Coherent

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 06/10] cxl: Refactor AFU init/teardown
  2015-07-28  5:28 ` [PATCH v2 06/10] cxl: Refactor AFU init/teardown Daniel Axtens
@ 2015-08-11  3:59   ` Cyril Bur
  0 siblings, 0 replies; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  3:59 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, imunsie

On Tue, 28 Jul 2015 15:28:39 +1000
Daniel Axtens <dja@axtens.net> wrote:

> As with an adapter, some aspects of initialisation are done only once
> in the lifetime of an AFU: for example, allocating memory, or setting
> up sysfs/debugfs files.
> 
> However, we may want to be able to do some parts of the initialisation
> multiple times: for example, in error recovery we want to be able to
> tear down and then re-map IO memory and IRQs.
> 
> Therefore, refactor AFU init/teardown as follows.
> 
>  - Create two new functions: 'cxl_configure_afu', and its pair
>    'cxl_deconfigure_afu'. As with the adapter functions,
>    these (de)configure resources that do not need to last the entire
>    lifetime of the AFU.
> 
>  - Allocating and releasing memory remain the task of 'cxl_alloc_afu'
>    and 'cxl_release_afu'.
> 
>  - Once-only functions that do not involve allocating/releasing memory
>    stay in the overarching 'cxl_init_afu'/'cxl_remove_afu' pair.
>    However, the task of picking an AFU mode and activating it has been
>    broken out.
> 

Reviewed-by: Cyril Bur <cyrilbur@gmail.com>

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/pci.c | 87 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 50 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 7f47e2221524..98a8207da88d 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -753,45 +753,67 @@ ssize_t cxl_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
>  	return count;
>  }
>  
> -static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
> +static int cxl_configure_afu(struct cxl_afu *afu, struct cxl *adapter, struct pci_dev *dev)
>  {
> -	struct cxl_afu *afu;
> -	bool free = true;
>  	int rc;
>  
> -	if (!(afu = cxl_alloc_afu(adapter, slice)))
> -		return -ENOMEM;
> -
> -	if ((rc = dev_set_name(&afu->dev, "afu%i.%i", adapter->adapter_num, slice)))
> -		goto err1;
> -
>  	if ((rc = cxl_map_slice_regs(afu, adapter, dev)))
> -		goto err1;
> +		return rc;
>  
>  	if ((rc = sanitise_afu_regs(afu)))
> -		goto err2;
> +		goto err1;
>  
>  	/* We need to reset the AFU before we can read the AFU descriptor */
>  	if ((rc = __cxl_afu_reset(afu)))
> -		goto err2;
> +		goto err1;
>  
>  	if (cxl_verbose)
>  		dump_afu_descriptor(afu);
>  
>  	if ((rc = cxl_read_afu_descriptor(afu)))
> -		goto err2;
> +		goto err1;
>  
>  	if ((rc = cxl_afu_descriptor_looks_ok(afu)))
> -		goto err2;
> +		goto err1;
>  
>  	if ((rc = init_implementation_afu_regs(afu)))
> -		goto err2;
> +		goto err1;
>  
>  	if ((rc = cxl_register_serr_irq(afu)))
> -		goto err2;
> +		goto err1;
>  
>  	if ((rc = cxl_register_psl_irq(afu)))
> -		goto err3;
> +		goto err2;
> +
> +	return 0;
> +
> +err2:
> +	cxl_release_serr_irq(afu);
> +err1:
> +	cxl_unmap_slice_regs(afu);
> +	return rc;
> +}
> +
> +static void cxl_deconfigure_afu(struct cxl_afu *afu)
> +{
> +	cxl_release_psl_irq(afu);
> +	cxl_release_serr_irq(afu);
> +	cxl_unmap_slice_regs(afu);
> +}
> +
> +static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
> +{
> +	struct cxl_afu *afu;
> +	int rc;
> +
> +	if (!(afu = cxl_alloc_afu(adapter, slice)))
> +		return -ENOMEM;
> +
> +	if ((rc = dev_set_name(&afu->dev, "afu%i.%i", adapter->adapter_num, slice)))
> +		goto err_free;
> +
> +	if ((rc = cxl_configure_afu(afu, adapter, dev)))
> +		goto err_free;
>  
>  	/* Don't care if this fails */
>  	cxl_debugfs_afu_add(afu);
> @@ -806,10 +828,6 @@ static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
>  	if ((rc = cxl_sysfs_afu_add(afu)))
>  		goto err_put1;
>  
> -
> -	if ((rc = cxl_afu_select_best_mode(afu)))
> -		goto err_put2;
> -
>  	adapter->afu[afu->slice] = afu;
>  
>  	if ((rc = cxl_pci_vphb_add(afu)))
> @@ -817,21 +835,16 @@ static int cxl_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
>  
>  	return 0;
>  
> -err_put2:
> -	cxl_sysfs_afu_remove(afu);
>  err_put1:
> -	device_unregister(&afu->dev);
> -	free = false;
> +	cxl_deconfigure_afu(afu);
>  	cxl_debugfs_afu_remove(afu);
> -	cxl_release_psl_irq(afu);
> -err3:
> -	cxl_release_serr_irq(afu);
> -err2:
> -	cxl_unmap_slice_regs(afu);
> -err1:
> -	if (free)
> -		kfree(afu);
> +	device_unregister(&afu->dev);
>  	return rc;
> +
> +err_free:
> +	kfree(afu);
> +	return rc;
> +
>  }
>  
>  static void cxl_remove_afu(struct cxl_afu *afu)
> @@ -851,10 +864,7 @@ static void cxl_remove_afu(struct cxl_afu *afu)
>  	cxl_context_detach_all(afu);
>  	cxl_afu_deactivate_mode(afu);
>  
> -	cxl_release_psl_irq(afu);
> -	cxl_release_serr_irq(afu);
> -	cxl_unmap_slice_regs(afu);
> -
> +	cxl_deconfigure_afu(afu);
>  	device_unregister(&afu->dev);
>  }
>  
> @@ -1207,6 +1217,9 @@ static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  	for (slice = 0; slice < adapter->slices; slice++) {
>  		if ((rc = cxl_init_afu(adapter, slice, dev)))
>  			dev_err(&dev->dev, "AFU %i failed to initialise: %i\n", slice, rc);
> +
> +		if ((rc = cxl_afu_select_best_mode(adapter->afu[slice])))
> +			dev_err(&dev->dev, "AFU %i failed to start: %i\n", slice, rc);
>  	}
>  
>  	return 0;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 01/10] cxl: Drop commands if the PCI channel is not in normal state
  2015-08-11  3:31   ` Cyril Bur
@ 2015-08-11  4:11     ` Daniel Axtens
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Axtens @ 2015-08-11  4:11 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, mikey, imunsie

[-- Attachment #1: Type: text/plain, Size: 1540 bytes --]

> Hey Daniel, keeping in mind I can't exactly test it, and that I don't know the
> ins and outs of CAPI, the code looks nice and it makes sence to me.
> 
Thanks!

> Some very minor points,
> 
> Otherwise
> 
> Acked-by: Cyril Bur <cyrilbur@gmail.com>
> 

> >  
> > +static inline bool cxl_adapter_link_ok(struct cxl *cxl)
> > +{
> > +	struct pci_dev *pdev;
> > +
> > +	pdev = to_pci_dev(cxl->dev.parent);
> > +	return (pdev->error_state == pci_channel_io_normal);
> > +}
> > +
> 
> In the process of reviewing these patches I read the style guide in furthur
> detail and (it doesn't 100% commit one way or the other but) it suggests it may
> be wise to get GCC choose if it should inline or not, unless you have an reason 
> (the macro replacment below being a good example)... Just a thought.
> 

This sits in a bunch of hot paths and tight loops... and it's a
glorified macro. I will bear this in mind for the future: I hadn't
thought through the tradeoffs.


> 
> So a birdy has informed me these are going to become inlines, you can
> therefore disregard those comments. I am much more in favour of inlines!
> 
Yep, they're going to become inlines. yay for inlines.

> > +	/* We could be asked to terminate when the hw is down.  That
> 
>                                                                ^
> I'm notoriously bad for having a low care threshhold but I do have a high care
> threshhold for consistency. Double space after period? Are you sure?
> 
Fixed.

-- 
Regards,
Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 860 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 02/10] cxl: Allocate and release the SPA with the AFU
  2015-08-11  3:42   ` Cyril Bur
@ 2015-08-11  4:16     ` Daniel Axtens
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Axtens @ 2015-08-11  4:16 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, mikey, imunsie

[-- Attachment #1: Type: text/plain, Size: 5159 bytes --]

On Tue, 2015-08-11 at 13:42 +1000, Cyril Bur wrote:
> On Tue, 28 Jul 2015 15:28:35 +1000
> Daniel Axtens <dja@axtens.net> wrote:
> 
> > Previously the SPA was allocated and freed upon entering and leaving
> > AFU-directed mode. This causes some issues for error recovery - contexts
> > hold a pointer inside the SPA, and they may persist after the AFU has
> > been detached.
> > 
> > We would ideally like to allocate the SPA when the AFU is allocated, and
> > release it until the AFU is released. However, we don't know how big the
> > SPA needs to be until we read the AFU descriptor.
> > 
> > Therefore, restructure the code:
> > 
> >  - Allocate the SPA only once, on the first attach.
> > 
> >  - Release the SPA only when the entire AFU is being released (not
> >    detached). Guard the release with a NULL check, so we don't free
> >    if it was never allocated (e.g. dedicated mode)
> > 
> 
> I'm sure you tested this :), looks fine to me, from an outsiders perspective
> code appears to do what the commit message says.
> 
> Just one super minor question, you do the NULL check in the caller. How obvious
> is the error if/when a caller forgets?
> 
Good point. The SPA is only released when releasing the entire AFU, so
it doesn't really matter at this point, but in case we ever move around
the assignment/release of the SPA (dynamic reprogramming comes to mind)
I've moved the check and added an explicit NULLing of the SPA pointer.

> Acked-by: Cyril Bur <cyrilbur@gmail.com>
> 
Thanks!
> > Signed-off-by: Daniel Axtens <dja@axtens.net>
> > ---
> >  drivers/misc/cxl/cxl.h    |  3 +++
> >  drivers/misc/cxl/native.c | 28 ++++++++++++++++++----------
> >  drivers/misc/cxl/pci.c    |  3 +++
> >  3 files changed, 24 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> > index 47eadbcfd379..88a88c445e2a 100644
> > --- a/drivers/misc/cxl/cxl.h
> > +++ b/drivers/misc/cxl/cxl.h
> > @@ -603,6 +603,9 @@ void unregister_cxl_calls(struct cxl_calls *calls);
> >  int cxl_alloc_adapter_nr(struct cxl *adapter);
> >  void cxl_remove_adapter_nr(struct cxl *adapter);
> >  
> > +int cxl_alloc_spa(struct cxl_afu *afu);
> > +void cxl_release_spa(struct cxl_afu *afu);
> > +
> >  int cxl_file_init(void);
> >  void cxl_file_exit(void);
> >  int cxl_register_adapter(struct cxl *adapter);
> > diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
> > index 16948915eb0d..debd97147b58 100644
> > --- a/drivers/misc/cxl/native.c
> > +++ b/drivers/misc/cxl/native.c
> > @@ -182,10 +182,8 @@ static int spa_max_procs(int spa_size)
> >  	return ((spa_size / 8) - 96) / 17;
> >  }
> >  
> > -static int alloc_spa(struct cxl_afu *afu)
> > +int cxl_alloc_spa(struct cxl_afu *afu)
> >  {
> > -	u64 spap;
> > -
> >  	/* Work out how many pages to allocate */
> >  	afu->spa_order = 0;
> >  	do {
> > @@ -204,6 +202,13 @@ static int alloc_spa(struct cxl_afu *afu)
> >  	pr_devel("spa pages: %i afu->spa_max_procs: %i   afu->num_procs: %i\n",
> >  		 1<<afu->spa_order, afu->spa_max_procs, afu->num_procs);
> >  
> > +	return 0;
> > +}
> > +
> > +static void attach_spa(struct cxl_afu *afu)
> > +{
> > +	u64 spap;
> > +
> >  	afu->sw_command_status = (__be64 *)((char *)afu->spa +
> >  					    ((afu->spa_max_procs + 3) * 128));
> >  
> > @@ -212,13 +217,15 @@ static int alloc_spa(struct cxl_afu *afu)
> >  	spap |= CXL_PSL_SPAP_V;
> >  	pr_devel("cxl: SPA allocated at 0x%p. Max processes: %i, sw_command_status: 0x%p CXL_PSL_SPAP_An=0x%016llx\n", afu->spa, afu->spa_max_procs, afu->sw_command_status, spap);
> >  	cxl_p1n_write(afu, CXL_PSL_SPAP_An, spap);
> > -
> > -	return 0;
> >  }
> >  
> > -static void release_spa(struct cxl_afu *afu)
> > +static inline void detach_spa(struct cxl_afu *afu)
> >  {
> >  	cxl_p1n_write(afu, CXL_PSL_SPAP_An, 0);
> > +}
> > +
> > +void cxl_release_spa(struct cxl_afu *afu)
> > +{
> >  	free_pages((unsigned long) afu->spa, afu->spa_order);
> >  }
> >  
> > @@ -446,8 +453,11 @@ static int activate_afu_directed(struct cxl_afu *afu)
> >  
> >  	dev_info(&afu->dev, "Activating AFU directed mode\n");
> >  
> > -	if (alloc_spa(afu))
> > -		return -ENOMEM;
> > +	if (afu->spa == NULL) {
> > +		if (cxl_alloc_spa(afu))
> > +			return -ENOMEM;
> > +	}
> > +	attach_spa(afu);
> >  
> >  	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_AFU);
> >  	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
> > @@ -560,8 +570,6 @@ static int deactivate_afu_directed(struct cxl_afu *afu)
> >  	cxl_afu_disable(afu);
> >  	cxl_psl_purge(afu);
> >  
> > -	release_spa(afu);
> > -
> >  	return 0;
> >  }
> >  
> > diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> > index 32ad09705949..1849c1785b49 100644
> > --- a/drivers/misc/cxl/pci.c
> > +++ b/drivers/misc/cxl/pci.c
> > @@ -551,6 +551,9 @@ static void cxl_release_afu(struct device *dev)
> >  
> >  	pr_devel("cxl_release_afu\n");
> >  
> > +	if (afu->spa)
> > +		cxl_release_spa(afu);
> > +
> >  	kfree(afu);
> >  }
> >  
> 

-- 
Regards,
Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 860 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 07/10] cxl: Don't remove AFUs/vPHBs in cxl_reset
  2015-07-28  5:28 ` [PATCH v2 07/10] cxl: Don't remove AFUs/vPHBs in cxl_reset Daniel Axtens
@ 2015-08-11  5:57   ` Cyril Bur
  0 siblings, 0 replies; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  5:57 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, imunsie

On Tue, 28 Jul 2015 15:28:40 +1000
Daniel Axtens <dja@axtens.net> wrote:

> If the driver doesn't participate in EEH, the AFUs will be removed
> by cxl_remove, which will be invoked by EEH.
> 
> If the driver does particpate in EEH, the vPHB needs to stick around
> so that the it can particpate.
> 
> In both cases, we shouldn't remove the AFU/vPHB.
> 

Reviewed-by: Cyril Bur <cyrilbur@gmail.com>

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/pci.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 98a8207da88d..0acf9e62733e 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -877,11 +877,6 @@ int cxl_reset(struct cxl *adapter)
>  
>  	dev_info(&dev->dev, "CXL reset\n");
>  
> -	for (i = 0; i < adapter->slices; i++) {
> -		cxl_pci_vphb_remove(adapter->afu[i]);
> -		cxl_remove_afu(adapter->afu[i]);
> -	}
> -
>  	/* pcie_warm_reset requests a fundamental pci reset which includes a
>  	 * PERST assert/deassert.  PERST triggers a loading of the image
>  	 * if "user" or "factory" is selected in sysfs */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 05/10] cxl: Refactor adaptor init/teardown
  2015-07-28  5:28 ` [PATCH v2 05/10] cxl: Refactor adaptor init/teardown Daniel Axtens
@ 2015-08-11  6:01   ` Cyril Bur
  2015-08-11 22:38     ` Daniel Axtens
  2015-08-12 10:14     ` David Laight
  0 siblings, 2 replies; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  6:01 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, imunsie

On Tue, 28 Jul 2015 15:28:38 +1000
Daniel Axtens <dja@axtens.net> wrote:

> Some aspects of initialisation are done only once in the lifetime of
> an adapter: for example, allocating memory for the adapter,
> allocating the adapter number, or setting up sysfs/debugfs files.
> 
> However, we may want to be able to do some parts of the
> initialisation multiple times: for example, in error recovery we
> want to be able to tear down and then re-map IO memory and IRQs.
> 
> Therefore, refactor CXL init/teardown as follows.
> 
>  - Keep the overarching functions 'cxl_init_adapter' and its pair,
>    'cxl_remove_adapter'.
> 
>  - Move all 'once only' allocation/freeing steps to the existing
>    'cxl_alloc_adapter' function, and its pair 'cxl_release_adapter'
>    (This involves moving allocation of the adapter number out of
>    cxl_init_adapter.)
> 
>  - Create two new functions: 'cxl_configure_adapter', and its pair
>    'cxl_deconfigure_adapter'. These two functions 'wire up' the
>    hardware --- they (de)configure resources that do not need to
>    last the entire lifetime of the adapter
> 

You have a dilema with the use of ugly if (rc = foo()). I don't like it but the
file is littered with it.

Looks like the majority of uses in this file the conditional block is only
one line then it makes sense (or at least in terms of numbers of lines... fair
enough), however, if you have a conditional block spanning multiple lines, I
don't like.

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/pci.c | 138 ++++++++++++++++++++++++++++++-------------------
>  1 file changed, 85 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index adcf938f2fdb..7f47e2221524 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -966,7 +966,6 @@ static int cxl_read_vsec(struct cxl *adapter, struct pci_dev *dev)
>  	CXL_READ_VSEC_BASE_IMAGE(dev, vsec, &adapter->base_image);
>  	CXL_READ_VSEC_IMAGE_STATE(dev, vsec, &image_state);
>  	adapter->user_image_loaded = !!(image_state & CXL_VSEC_USER_IMAGE_LOADED);
> -	adapter->perst_loads_image = true;
>  	adapter->perst_select_user = !!(image_state & CXL_VSEC_USER_IMAGE_LOADED);
>  
>  	CXL_READ_VSEC_NAFUS(dev, vsec, &adapter->slices);
> @@ -1026,22 +1025,34 @@ static void cxl_release_adapter(struct device *dev)
>  
>  	pr_devel("cxl_release_adapter\n");
>  
> +	cxl_remove_adapter_nr(adapter);
> +
>  	kfree(adapter);
>  }
>  
> -static struct cxl *cxl_alloc_adapter(struct pci_dev *dev)
> +static struct cxl *cxl_alloc_adapter(void)
>  {
>  	struct cxl *adapter;
> +	int rc;
>  
>  	if (!(adapter = kzalloc(sizeof(struct cxl), GFP_KERNEL)))
>  		return NULL;
>  
> -	adapter->dev.parent = &dev->dev;
> -	adapter->dev.release = cxl_release_adapter;
> -	pci_set_drvdata(dev, adapter);
>  	spin_lock_init(&adapter->afu_list_lock);
>  
> +	if ((rc = cxl_alloc_adapter_nr(adapter)))

Humf

> +		goto err1;
> +
> +	if ((rc = dev_set_name(&adapter->dev, "card%i", adapter->adapter_num)))

Humf
> +		goto err2;
> +
>  	return adapter;
> +
> +err2:
> +	cxl_remove_adapter_nr(adapter);
> +err1:
> +	kfree(adapter);
> +	return NULL;
>  }
>  
>  static int sanitise_adapter_regs(struct cxl *adapter)
> @@ -1050,57 +1061,94 @@ static int sanitise_adapter_regs(struct cxl *adapter)
>  	return cxl_tlb_slb_invalidate(adapter);
>  }
>  
> -static struct cxl *cxl_init_adapter(struct pci_dev *dev)
> +/* This should contain *only* operations that can safely be done in
> + * both creation and recovery.
> + */
> +static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
>  {
> -	struct cxl *adapter;
> -	bool free = true;
>  	int rc;
>  
> +	adapter->dev.parent = &dev->dev;
> +	adapter->dev.release = cxl_release_adapter;
> +	pci_set_drvdata(dev, adapter);
>  
> -	if (!(adapter = cxl_alloc_adapter(dev)))
> -		return ERR_PTR(-ENOMEM);
> +	if ((rc = pci_enable_device(dev))) {

Backets...

> +		dev_err(&dev->dev, "pci_enable_device failed: %i\n", rc);
> +		return rc;
> +	}
>  
>  	if ((rc = cxl_read_vsec(adapter, dev)))
> -		goto err1;
> +		return rc;
>  
>  	if ((rc = cxl_vsec_looks_ok(adapter, dev)))
> -		goto err1;
> +	        return rc;
>  
>  	if ((rc = setup_cxl_bars(dev)))
> -		goto err1;
> +		return rc;
>  
>  	if ((rc = switch_card_to_cxl(dev)))
> -		goto err1;
> -
> -	if ((rc = cxl_alloc_adapter_nr(adapter)))
> -		goto err1;
> -
> -	if ((rc = dev_set_name(&adapter->dev, "card%i", adapter->adapter_num)))
> -		goto err2;
> +		return rc;
>  
>  	if ((rc = cxl_update_image_control(adapter)))
> -		goto err2;
> +		return rc;
>  
>  	if ((rc = cxl_map_adapter_regs(adapter, dev)))
> -		goto err2;
> +		return rc;
>  
>  	if ((rc = sanitise_adapter_regs(adapter)))
> -		goto err2;
> +		goto err;
>  
>  	if ((rc = init_implementation_adapter_regs(adapter, dev)))
> -		goto err3;
> +		goto err;
>  
>  	if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_CAPI)))
> -		goto err3;
> +		goto err;
>  
>  	/* If recovery happened, the last step is to turn on snooping.
>  	 * In the non-recovery case this has no effect */
> -	if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON))) {
> -		goto err3;
> -	}
> +	if ((rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON)))
> +		goto err;
>  
>  	if ((rc = cxl_register_psl_err_irq(adapter)))
> -		goto err3;
> +		goto err;
> +
> +	return 0;
> +
> +err:
> +	cxl_unmap_adapter_regs(adapter);
> +	return rc;
> +
> +}
> +
> +static void cxl_deconfigure_adapter(struct cxl *adapter)
> +{
> +	struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
> +
> +	cxl_release_psl_err_irq(adapter);
> +	cxl_unmap_adapter_regs(adapter);
> +
> +	pci_disable_device(pdev);
> +}
> +
> +static struct cxl *cxl_init_adapter(struct pci_dev *dev)
> +{
> +	struct cxl *adapter;
> +	int rc;
> +
> +	adapter = cxl_alloc_adapter();
> +	if (!adapter)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/* Set defaults for parameters which need to persist over
> +	 * configure/reconfigure
> +	 */
> +	adapter->perst_loads_image = true;
> +
> +	if ((rc = cxl_configure_adapter(adapter, dev))) {

Brackets

> +		pci_disable_device(dev);
> +		cxl_release_adapter(&adapter->dev);
> +		return ERR_PTR(rc);
> +	}
>  
>  	/* Don't care if this one fails: */
>  	cxl_debugfs_adapter_add(adapter);
> @@ -1118,35 +1166,25 @@ static struct cxl *cxl_init_adapter(struct pci_dev *dev)
>  	return adapter;
>  
>  err_put1:
> -	device_unregister(&adapter->dev);
> -	free = false;
> +	/* This should mirror cxl_remove_adapter, except without the
> +	 * sysfs parts
> +	 */
>  	cxl_debugfs_adapter_remove(adapter);
> -	cxl_release_psl_err_irq(adapter);
> -err3:
> -	cxl_unmap_adapter_regs(adapter);
> -err2:
> -	cxl_remove_adapter_nr(adapter);
> -err1:
> -	if (free)
> -		kfree(adapter);
> +	cxl_deconfigure_adapter(adapter);
> +	device_unregister(&adapter->dev);
>  	return ERR_PTR(rc);
>  }
>  
>  static void cxl_remove_adapter(struct cxl *adapter)
>  {
> -	struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
> -
> -	pr_devel("cxl_release_adapter\n");
> +	pr_devel("cxl_remove_adapter\n");
>  
>  	cxl_sysfs_adapter_remove(adapter);
>  	cxl_debugfs_adapter_remove(adapter);
> -	cxl_release_psl_err_irq(adapter);
> -	cxl_unmap_adapter_regs(adapter);
> -	cxl_remove_adapter_nr(adapter);
>  
> -	device_unregister(&adapter->dev);
> +	cxl_deconfigure_adapter(adapter);
>  
> -	pci_disable_device(pdev);
> +	device_unregister(&adapter->dev);
>  }
>  
>  static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
> @@ -1160,15 +1198,9 @@ static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  	if (cxl_verbose)
>  		dump_cxl_config_space(dev);
>  
> -	if ((rc = pci_enable_device(dev))) {
> -		dev_err(&dev->dev, "pci_enable_device failed: %i\n", rc);
> -		return rc;
> -	}
> -
>  	adapter = cxl_init_adapter(dev);
>  	if (IS_ERR(adapter)) {
>  		dev_err(&dev->dev, "cxl_init_adapter failed: %li\n", PTR_ERR(adapter));
> -		pci_disable_device(dev);
>  		return PTR_ERR(adapter);
>  	}
>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 04/10] cxl: Clean up adapter MMIO unmap path.
  2015-08-11  3:52   ` Cyril Bur
@ 2015-08-11  6:38     ` Daniel Axtens
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Axtens @ 2015-08-11  6:38 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, mikey, imunsie

[-- Attachment #1: Type: text/plain, Size: 446 bytes --]

Hi Cyril,
> >  - PCI regions are allocated in cxl_map_adapter_regs.
> >    Therefore they should be released in unmap, not elsewhere.
> > 
> 
> You've changed the order in which cxl_remove_adapter() does its work, which,
> I'm sure you've considered and it's fine, best to check.

Yeah, I have considered this. I've also tested it a lot.

Thanks for the review.

> Acked-by: Cyril Bur <cyrilbur@gmail.com>


-- 
Regards,
Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 860 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 08/10] cxl: Allow the kernel to trust that an image won't change on PERST.
  2015-07-28  5:28 ` [PATCH v2 08/10] cxl: Allow the kernel to trust that an image won't change on PERST Daniel Axtens
@ 2015-08-11  7:15   ` Cyril Bur
  2015-08-11 11:22     ` Daniel Axtens
  0 siblings, 1 reply; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  7:15 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, Matthew R. Ochs, imunsie, Manoj Kumar

On Tue, 28 Jul 2015 15:28:41 +1000
Daniel Axtens <dja@axtens.net> wrote:

> Provide a kernel API and a sysfs entry which allow a user to specify
> that when a card is PERSTed, it's image will stay the same, allowing
> it to participate in EEH.
> 
> cxl_reset is used to reflash the card. In that case, we cannot safely
> assert that the image will not change. Therefore, disallow cxl_reset
> if the flag is set.
> 

So I'm not super all over the putting all sorts of code inside CONFIG_CXL_EEH,
I understand that there is another driver being merged and they'll use
CONFIG_CXL_EEH so that both this driver and the other driver can go in the same
merge window but does this mean you need to put it around everything here?

I may have misunderstood what you've told me but if the other driver depends on
work done in this one (and not the other way around), if they depend on
CONFIG_CXL_EEH which you create in the last patch, then they cannot be built
until this series exists, so they can't have issues.

The one catch is that this series as is waits untill the last patch to actually
create the symbol, and therefore compile everything so lets be sure you don't
break bisecting. You might need to rethink the order of things in 8/10 and 9/10,
I can't see anything obvious if it helps...

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  Documentation/ABI/testing/sysfs-class-cxl | 10 ++++++++++
>  drivers/misc/cxl/api.c                    |  9 +++++++++
>  drivers/misc/cxl/cxl.h                    |  3 +++
>  drivers/misc/cxl/pci.c                    | 11 +++++++++++
>  drivers/misc/cxl/sysfs.c                  | 30 ++++++++++++++++++++++++++++++
>  include/misc/cxl.h                        | 12 ++++++++++++
>  6 files changed, 75 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
> index acfe9df83139..b07e86d4597f 100644
> --- a/Documentation/ABI/testing/sysfs-class-cxl
> +++ b/Documentation/ABI/testing/sysfs-class-cxl
> @@ -223,3 +223,13 @@ Description:    write only
>                  Writing 1 will issue a PERST to card which may cause the card
>                  to reload the FPGA depending on load_image_on_perst.
>  Users:		https://github.com/ibm-capi/libcxl
> +
> +What:		/sys/class/cxl/<card>/perst_reloads_same_image
> +Date:		July 2015
> +Contact:	linuxppc-dev@lists.ozlabs.org
> +Description:	read/write
> +		Trust that when an image is reloaded via PERST, it will not
> +		have changed.
> +		0 = don't trust, the image may be different (default)
> +		1 = trust that the image will not change.
> +Users:		https://github.com/ibm-capi/libcxl
> diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
> index 729e0851167d..c1012ced0323 100644
> --- a/drivers/misc/cxl/api.c
> +++ b/drivers/misc/cxl/api.c
> @@ -327,3 +327,12 @@ int cxl_afu_reset(struct cxl_context *ctx)
>  	return cxl_afu_check_and_enable(afu);
>  }
>  EXPORT_SYMBOL_GPL(cxl_afu_reset);
> +
> +#ifdef CONFIG_CXL_EEH
> +void cxl_perst_reloads_same_image(struct cxl_afu *afu,
> +				  bool perst_reloads_same_image)
> +{
> +	afu->adapter->perst_same_image = perst_reloads_same_image;
> +}
> +EXPORT_SYMBOL_GPL(cxl_perst_reloads_same_image);
> +#endif
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index 88a88c445e2a..6dd4158f76ac 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -493,6 +493,9 @@ struct cxl {
>  	bool user_image_loaded;
>  	bool perst_loads_image;
>  	bool perst_select_user;
> +#ifdef CONFIG_CXL_EEH
> +	bool perst_same_image;
> +#endif
>  };
>  
>  int cxl_alloc_one_irq(struct cxl *adapter);
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 0acf9e62733e..b6a189b35323 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -875,6 +875,14 @@ int cxl_reset(struct cxl *adapter)
>  	int i;
>  	u32 val;
>  
> +#ifdef CONFIG_CXL_EEH
> +	if (adapter->perst_same_image) {
> +		dev_warn(&dev->dev,
> +			 "cxl: refusing to reset/reflash when perst_reloads_same_image is set.\n");
> +		return -EINVAL;
> +	}
> +#endif
> +
>  	dev_info(&dev->dev, "CXL reset\n");
>  
>  	/* pcie_warm_reset requests a fundamental pci reset which includes a
> @@ -1148,6 +1156,9 @@ static struct cxl *cxl_init_adapter(struct pci_dev *dev)
>  	 * configure/reconfigure
>  	 */
>  	adapter->perst_loads_image = true;
> +#ifdef CONFIG_CXL_EEH
> +	adapter->perst_same_image = false;
> +#endif
>  
>  	if ((rc = cxl_configure_adapter(adapter, dev))) {
>  		pci_disable_device(dev);
> diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
> index 31f38bc71a3d..4bcb63258e3e 100644
> --- a/drivers/misc/cxl/sysfs.c
> +++ b/drivers/misc/cxl/sysfs.c
> @@ -112,12 +112,42 @@ static ssize_t load_image_on_perst_store(struct device *device,
>  	return count;
>  }
>  
> +#ifdef CONFIG_CXL_EEH
> +static ssize_t perst_reloads_same_image_show(struct device *device,
> +				 struct device_attribute *attr,
> +				 char *buf)
> +{
> +	struct cxl *adapter = to_cxl_adapter(device);
> +
> +	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->perst_same_image);
> +}
> +
> +static ssize_t perst_reloads_same_image_store(struct device *device,
> +				 struct device_attribute *attr,
> +				 const char *buf, size_t count)
> +{
> +	struct cxl *adapter = to_cxl_adapter(device);
> +	int rc;
> +	int val;
> +
> +	rc = sscanf(buf, "%i", &val);
> +	if ((rc != 1) || !(val == 1 || val == 0))
> +		return -EINVAL;
> +
> +	adapter->perst_same_image = (val == 1 ? true : false);
> +	return count;
> +}
> +#endif /* CONFIG_CXL_EEH */
> +
>  static struct device_attribute adapter_attrs[] = {
>  	__ATTR_RO(caia_version),
>  	__ATTR_RO(psl_revision),
>  	__ATTR_RO(base_image),
>  	__ATTR_RO(image_loaded),
>  	__ATTR_RW(load_image_on_perst),
> +#ifdef CONFIG_CXL_EEH
> +	__ATTR_RW(perst_reloads_same_image),
> +#endif
>  	__ATTR(reset, S_IWUSR, NULL, reset_adapter_store),
>  };
>  
> diff --git a/include/misc/cxl.h b/include/misc/cxl.h
> index 7a6c1d6cc173..7f2cdc6caab3 100644
> --- a/include/misc/cxl.h
> +++ b/include/misc/cxl.h
> @@ -200,4 +200,16 @@ unsigned int cxl_fd_poll(struct file *file, struct poll_table_struct *poll);
>  ssize_t cxl_fd_read(struct file *file, char __user *buf, size_t count,
>  			   loff_t *off);
>  
> +#ifdef CONFIG_CXL_EEH
> +/*
> + * For EEH, a driver may want to assert a PERST will reload the same image
> + * from flash into the FPGA.
> + *
> + * This is a property of the entire adapter, not a single AFU, so drivers
> + * should set this property with care!
> + */
> +void cxl_perst_reloads_same_image(struct cxl_afu *afu,
> +				  bool perst_reloads_same_image);
> +#endif /* CONFIG_CXL_EEH */
> +
>  #endif /* _MISC_CXL_H */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 09/10] cxl: EEH support
  2015-07-28  5:28 ` [PATCH v2 09/10] cxl: EEH support Daniel Axtens
@ 2015-08-11  7:23   ` Cyril Bur
  0 siblings, 0 replies; 29+ messages in thread
From: Cyril Bur @ 2015-08-11  7:23 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, mikey, Matthew R. Ochs, imunsie, Manoj Kumar

On Tue, 28 Jul 2015 15:28:42 +1000
Daniel Axtens <dja@axtens.net> wrote:

> EEH (Enhanced Error Handling) allows a driver to recover from the
> temporary failure of an attached PCI card. Enable basic CXL support
> for EEH.
> 

Same thoughts about the config option as in 8/10.

As I've mentioned to you my knowledge of PCI, EEH and CAPI are limited, after
talking to you about it and apart from the CONFIG problems it looks like it
works as advertised.

> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  drivers/misc/cxl/cxl.h  |   1 +
>  drivers/misc/cxl/pci.c  | 253 ++++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/misc/cxl/vphb.c |   8 ++
>  3 files changed, 262 insertions(+)
> 
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index 6dd4158f76ac..2065e894e46d 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -699,6 +699,7 @@ int cxl_psl_purge(struct cxl_afu *afu);
>  
>  void cxl_stop_trace(struct cxl *cxl);
>  int cxl_pci_vphb_add(struct cxl_afu *afu);
> +void cxl_pci_vphb_reconfigure(struct cxl_afu *afu);
>  void cxl_pci_vphb_remove(struct cxl_afu *afu);
>  
>  extern struct pci_driver cxl_pci_driver;
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index b6a189b35323..60ae863b6f0a 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -24,6 +24,7 @@
>  #include <asm/io.h>
>  
>  #include "cxl.h"
> +#include <misc/cxl.h>
> 

Re our discussion: you do need it :)
 
>  
>  #define CXL_PCI_VSEC_ID	0x1280
> @@ -1249,10 +1250,262 @@ static void cxl_remove(struct pci_dev *dev)
>  	cxl_remove_adapter(adapter);
>  }
>  
> +#ifdef CONFIG_CXL_EEH
> +static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu,
> +						pci_channel_state_t state)
> +{
> +	struct pci_dev *afu_dev;
> +	pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET;
> +	pci_ers_result_t afu_result = PCI_ERS_RESULT_NEED_RESET;
> +
> +	/* There should only be one entry, but go through the list
> +	 * anyway
> +	 */
> +	list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
> +		if (!afu_dev->driver)
> +			continue;
> +
> +		if (afu_dev->driver->err_handler)
> +			afu_result = afu_dev->driver->err_handler->error_detected(afu_dev,
> +										  state);
> +		/* Disconnect trumps all, NONE trumps NEED_RESET */
> +		if (afu_result == PCI_ERS_RESULT_DISCONNECT)
> +			result = PCI_ERS_RESULT_DISCONNECT;
> +		else if ((afu_result == PCI_ERS_RESULT_NONE) &&
> +			 (result == PCI_ERS_RESULT_NEED_RESET))
> +			result = PCI_ERS_RESULT_NONE;
> +	}
> +	return result;
> +}
> +
> +static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
> +					       pci_channel_state_t state)
> +{
> +	struct cxl *adapter = pci_get_drvdata(pdev);
> +	struct cxl_afu *afu;
> +	pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET;
> +	int i;
> +
> +	/* At this point, we could still have an interrupt pending.
> +	 * Let's try to get them out of the way before they do
> +	 * anything we don't like.
> +	 */
> +	schedule();
> +
> +	/* If we're permanently dead, give up. */
> +	if (state == pci_channel_io_perm_failure) {
> +		/* Tell the AFU drivers; but we don't care what they
> +		 * say, we're going away.
> +		 */
> +		for (i = 0; i < adapter->slices; i++) {
> +			afu = adapter->afu[i];
> +			cxl_vphb_error_detected(afu, state);
> +		}
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	/* Are we reflashing?
> +	 *
> +	 * If we reflash, we could come back as something entirely
> +	 * different, including a non-CAPI card. As such, by default
> +	 * we don't participate in the process. We'll be unbound and
> +	 * the slot re-probed. (TODO: check EEH doesn't blindly rebind
> +	 * us!)
> +	 *
> +	 * However, this isn't the entire story: for reliablity
> +	 * reasons, we usually want to reflash the FPGA on PERST in
> +	 * order to get back to a more reliable known-good state.
> +	 *
> +	 * This causes us a bit of a problem: if we reflash we can't
> +	 * trust that we'll come back the same - we could have a new
> +	 * image and been PERSTed in order to load that
> +	 * image. However, most of the time we actually *will* come
> +	 * back the same - for example a regular EEH event.
> +	 *
> +	 * Therefore, we allow the user to assert that the image is
> +	 * indeed the same and that we should continue on into EEH
> +	 * anyway.
> +	 */
> +	if (adapter->perst_loads_image && !adapter->perst_same_image) {
> +		/* TODO take the PHB out of CXL mode */
> +		dev_info(&pdev->dev, "reflashing, so opting out of EEH!\n");
> +		return PCI_ERS_RESULT_NONE;
> +	}
> +
> +	/*
> +	 * At this point, we want to try to recover.  We'll always
> +	 * need a complete slot reset: we don't trust any other reset.
> +	 *
> +	 * Now, we go through each AFU:
> +	 *  - We send the driver, if bound, an error_detected callback.
> +	 *    We expect it to clean up, but it can also tell us to give
> +	 *    up and permanently detach the card. To simplify things, if
> +	 *    any bound AFU driver doesn't support EEH, we give up on EEH.
> +	 *
> +	 *  - We detach all contexts associated with the AFU. This
> +	 *    does not free them, but puts them into a CLOSED state
> +	 *    which causes any the associated files to return useful
> +	 *    errors to userland. It also unmaps, but does not free,
> +	 *    any IRQs.
> +	 *
> +	 *  - We clean up our side: releasing and unmapping resources we hold
> +	 *    so we can wire them up again when the hardware comes back up.
> +	 *
> +	 * Driver authors should note:
> +	 *
> +	 *  - Any contexts you create in your kernel driver (except
> +	 *    those associated with anonymous file descriptors) are
> +	 *    your responsibility to free and recreate. Likewise with
> +	 *    any attached resources.
> +	 *
> +	 *  - We will take responsibility for re-initialising the
> +	 *    device context (the one set up for you in
> +	 *    cxl_pci_enable_device_hook and accessed through
> +	 *    cxl_get_context). If you've attached IRQs or other
> +	 *    resources to it, they remains yours to free.
> +	 *
> +	 * All calls you make into cxl that normally touch the
> +	 * hardware will not touch the hardware during recovery. So
> +	 * you can call the same functions to release resources as you
> +	 * normally would.
> +	 *
> +	 * Two examples:
> +	 *
> +	 * 1) If you normally free all your resources at the end of
> +	 *    each request, or if you use anonymous FDs, your
> +	 *    error_detected callback can simply set a flag to tell
> +	 *    your driver not to start any new calls. You can then
> +	 *    clear the flag in the resume callback.
> +	 *
> +	 * 2) If you normally allocate your resources on startup:
> +	 *     * Set a flag in error_detected as above.
> +	 *     * Let CXL detach your contexts.
> +	 *     * In slot_reset, free the old resources and allocate new ones.
> +	 *     * In resume, clear the flag to allow things to start.
> +	 */
> +	for (i = 0; i < adapter->slices; i++) {
> +		afu = adapter->afu[i];
> +
> +		result = cxl_vphb_error_detected(afu, state);
> +
> +		/* Only continue if everyone agrees on NEED_RESET */
> +		if (result != PCI_ERS_RESULT_NEED_RESET)
> +			return result;
> +
> +		cxl_context_detach_all(afu);
> +		cxl_afu_deactivate_mode(afu);
> +		cxl_deconfigure_afu(afu);
> +	}
> +	cxl_deconfigure_adapter(adapter);
> +
> +	return result;
> +}
> +
> +static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
> +{
> +	struct cxl *adapter = pci_get_drvdata(pdev);
> +	struct cxl_afu *afu;
> +	struct cxl_context *ctx;
> +	struct pci_dev *afu_dev;
> +	pci_ers_result_t afu_result = PCI_ERS_RESULT_RECOVERED;
> +	pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
> +	int i;
> +
> +	if (cxl_configure_adapter(adapter, pdev))
> +		goto err;
> +
> +	for (i = 0; i < adapter->slices; i++) {
> +		afu = adapter->afu[i];
> +
> +		if (cxl_configure_afu(afu, adapter, pdev))
> +			goto err;
> +
> +		if (cxl_afu_select_best_mode(afu))
> +			goto err;
> +
> +		cxl_pci_vphb_reconfigure(afu);
> +
> +		list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
> +			/* Reset the device context.
> +			 * TODO: make this less disruptive
> +			 */
> +			ctx = cxl_get_context(afu_dev);
> +
> +			if (ctx && cxl_release_context(ctx))
> +				goto err;
> +
> +			ctx = cxl_dev_context_init(afu_dev);
> +			if (!ctx)
> +				goto err;
> +
> +			afu_dev->dev.archdata.cxl_ctx = ctx;
> +
> +			if (cxl_afu_check_and_enable(afu))
> +				goto err;
> +
> +			/* If there's a driver attached, allow it to
> +			 * chime in on recovery. Drivers should check
> +			 * if everything has come back OK.
> +			 */
> +			if (!afu_dev->driver)
> +				continue;
> +
> +			if (afu_dev->driver->err_handler &&
> +			    afu_dev->driver->err_handler->slot_reset)
> +				afu_result = afu_dev->driver->err_handler->slot_reset(afu_dev);
> +
> +			if (afu_result == PCI_ERS_RESULT_DISCONNECT)
> +				result = PCI_ERS_RESULT_DISCONNECT;
> +		}
> +	}
> +	return result;
> +
> +err:
> +	/* All the bits that happen in both error_detected and cxl_remove
> +	 * should be idempotent, so we don't need to worry about leaving a mix
> +	 * of unconfigured and reconfigured resources.
> +	 */
> +	dev_err(&pdev->dev, "EEH recovery failed. Asking to be disconnected.\n");
> +	return PCI_ERS_RESULT_DISCONNECT;
> +}
> +
> +static void cxl_pci_resume(struct pci_dev *pdev)
> +{
> +	struct cxl *adapter = pci_get_drvdata(pdev);
> +	struct cxl_afu *afu;
> +	struct pci_dev *afu_dev;
> +	int i;
> +
> +	/* Everything is back now. Drivers should restart work now.
> +	 * This is not the place to be checking if everything came back up
> +	 * properly, because there's no return value: do that in slot_reset.
> +	 */
> +	for (i = 0; i < adapter->slices; i++) {
> +		afu = adapter->afu[i];
> +
> +		list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
> +			if (afu_dev->driver && afu_dev->driver->err_handler &&
> +			    afu_dev->driver->err_handler->resume)
> +				afu_dev->driver->err_handler->resume(afu_dev);
> +		}
> +	}
> +}
> +
> +static const struct pci_error_handlers cxl_err_handler = {
> +	.error_detected = cxl_pci_error_detected,
> +	.slot_reset = cxl_pci_slot_reset,
> +	.resume = cxl_pci_resume,
> +};
> +#endif /* CONFIG_CXL_EEH */
> +
> +
>  struct pci_driver cxl_pci_driver = {
>  	.name = "cxl-pci",
>  	.id_table = cxl_pci_tbl,
>  	.probe = cxl_probe,
>  	.remove = cxl_remove,
>  	.shutdown = cxl_remove,
> +#ifdef CONFIG_CXL_EEH
> +	.err_handler = &cxl_err_handler,
> +#endif
>  };
> diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
> index 2930911c1e42..6dd16a6d153f 100644
> --- a/drivers/misc/cxl/vphb.c
> +++ b/drivers/misc/cxl/vphb.c
> @@ -266,6 +266,14 @@ int cxl_pci_vphb_add(struct cxl_afu *afu)
>  	return 0;
>  }
>  
> +void cxl_pci_vphb_reconfigure(struct cxl_afu *afu)
> +{
> +	/* When we are reconfigured, the AFU's MMIO space is unmapped
> +	 * and remapped. We need to reflect this in the PHB's view of
> +	 * the world.
> +	 */
> +	afu->phb->cfg_addr = afu->afu_desc_mmio + afu->crs_offset;
> +}
>  
>  void cxl_pci_vphb_remove(struct cxl_afu *afu)
>  {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 08/10] cxl: Allow the kernel to trust that an image won't change on PERST.
  2015-08-11  7:15   ` Cyril Bur
@ 2015-08-11 11:22     ` Daniel Axtens
  2015-08-11 23:47       ` Daniel Axtens
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Axtens @ 2015-08-11 11:22 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, mikey, Matthew R. Ochs, imunsie, Manoj Kumar

[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]


> So I'm not super all over the putting all sorts of code inside CONFIG_CXL_EEH,
> I understand that there is another driver being merged and they'll use
> CONFIG_CXL_EEH so that both this driver and the other driver can go in the same
> merge window but does this mean you need to put it around everything here?
> 
> I may have misunderstood what you've told me but if the other driver depends on
> work done in this one (and not the other way around), if they depend on
> CONFIG_CXL_EEH which you create in the last patch, then they cannot be built
> until this series exists, so they can't have issues.
> 
> The one catch is that this series as is waits untill the last patch to actually
> create the symbol, and therefore compile everything so lets be sure you don't
> break bisecting. You might need to rethink the order of things in 8/10 and 9/10,
> I can't see anything obvious if it helps...
> 

Yeah, so you're right. I've taken the guards off everything except the
new API function. I still want to leave the patch that adds the symbol
at the end: that way you don't get the function unless it is actually
going to make a difference in the EEH process. 

The other driver (cxlflash) just guards the API function, inserting a
stub if it's not defined. So this setup will make our code cleaner and
will still let their code merge cleanly.

Thanks again for the review.

-- 
Regards,
Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 860 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 05/10] cxl: Refactor adaptor init/teardown
  2015-08-11  6:01   ` Cyril Bur
@ 2015-08-11 22:38     ` Daniel Axtens
  2015-08-12 10:14     ` David Laight
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Axtens @ 2015-08-11 22:38 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, mikey, imunsie

[-- Attachment #1: Type: text/plain, Size: 389 bytes --]

> Looks like the majority of uses in this file the conditional block is only
> one line then it makes sense (or at least in terms of numbers of lines... fair
> enough), however, if you have a conditional block spanning multiple lines, I
> don't like.
> 
Much as this is a massive nit pick, I have split out the two
conditionals that spanned multiple lines.

-- 
Regards,
Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 860 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 08/10] cxl: Allow the kernel to trust that an image won't change on PERST.
  2015-08-11 11:22     ` Daniel Axtens
@ 2015-08-11 23:47       ` Daniel Axtens
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Axtens @ 2015-08-11 23:47 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, mikey, Matthew R. Ochs, imunsie, Manoj Kumar

[-- Attachment #1: Type: text/plain, Size: 1713 bytes --]

So after further offline conversations,

Yes, Cyril, you are right, I don't need the ifdef in my code. I just
need the symbol. I will amend my patches appropriately.

On Tue, 2015-08-11 at 21:22 +1000, Daniel Axtens wrote:
> > So I'm not super all over the putting all sorts of code inside CONFIG_CXL_EEH,
> > I understand that there is another driver being merged and they'll use
> > CONFIG_CXL_EEH so that both this driver and the other driver can go in the same
> > merge window but does this mean you need to put it around everything here?
> > 
> > I may have misunderstood what you've told me but if the other driver depends on
> > work done in this one (and not the other way around), if they depend on
> > CONFIG_CXL_EEH which you create in the last patch, then they cannot be built
> > until this series exists, so they can't have issues.
> > 
> > The one catch is that this series as is waits untill the last patch to actually
> > create the symbol, and therefore compile everything so lets be sure you don't
> > break bisecting. You might need to rethink the order of things in 8/10 and 9/10,
> > I can't see anything obvious if it helps...
> > 
> 
> Yeah, so you're right. I've taken the guards off everything except the
> new API function. I still want to leave the patch that adds the symbol
> at the end: that way you don't get the function unless it is actually
> going to make a difference in the EEH process. 
> 
> The other driver (cxlflash) just guards the API function, inserting a
> stub if it's not defined. So this setup will make our code cleaner and
> will still let their code merge cleanly.
> 
> Thanks again for the review.
> 

-- 
Regards,
Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 860 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 05/10] cxl: Refactor adaptor init/teardown
  2015-08-11  6:01   ` Cyril Bur
  2015-08-11 22:38     ` Daniel Axtens
@ 2015-08-12 10:14     ` David Laight
  2015-08-12 21:58       ` Daniel Axtens
  1 sibling, 1 reply; 29+ messages in thread
From: David Laight @ 2015-08-12 10:14 UTC (permalink / raw)
  To: 'Cyril Bur', Daniel Axtens
  Cc: linuxppc-dev@ozlabs.org, mikey@neuling.org, imunsie@au.ibm.com

RnJvbTogQ3lyaWwgQnVyDQo+IFNlbnQ6IDExIEF1Z3VzdCAyMDE1IDA3OjAxDQouLi4NCj4gWW91
IGhhdmUgYSBkaWxlbWEgd2l0aCB0aGUgdXNlIG9mIHVnbHkgaWYgKHJjID0gZm9vKCkpLiBJIGRv
bid0IGxpa2UgaXQgYnV0IHRoZQ0KPiBmaWxlIGlzIGxpdHRlcmVkIHdpdGggaXQuDQo+IA0KPiBM
b29rcyBsaWtlIHRoZSBtYWpvcml0eSBvZiB1c2VzIGluIHRoaXMgZmlsZSB0aGUgY29uZGl0aW9u
YWwgYmxvY2sgaXMgb25seQ0KPiBvbmUgbGluZSB0aGVuIGl0IG1ha2VzIHNlbnNlIChvciBhdCBs
ZWFzdCBpbiB0ZXJtcyBvZiBudW1iZXJzIG9mIGxpbmVzLi4uIGZhaXINCj4gZW5vdWdoKSwgaG93
ZXZlciwgaWYgeW91IGhhdmUgYSBjb25kaXRpb25hbCBibG9jayBzcGFubmluZyBtdWx0aXBsZSBs
aW5lcywgSQ0KPiBkb24ndCBsaWtlLg0KLi4uDQo+ID4gIAlrZnJlZShhZGFwdGVyKTsNCj4gPiAg
fQ0KPiA+DQo+ID4gLXN0YXRpYyBzdHJ1Y3QgY3hsICpjeGxfYWxsb2NfYWRhcHRlcihzdHJ1Y3Qg
cGNpX2RldiAqZGV2KQ0KPiA+ICtzdGF0aWMgc3RydWN0IGN4bCAqY3hsX2FsbG9jX2FkYXB0ZXIo
dm9pZCkNCj4gPiAgew0KPiA+ICAJc3RydWN0IGN4bCAqYWRhcHRlcjsNCj4gPiArCWludCByYzsN
Cj4gPg0KPiA+ICAJaWYgKCEoYWRhcHRlciA9IGt6YWxsb2Moc2l6ZW9mKHN0cnVjdCBjeGwpLCBH
RlBfS0VSTkVMKSkpDQo+ID4gIAkJcmV0dXJuIE5VTEw7DQo+ID4NCj4gPiAtCWFkYXB0ZXItPmRl
di5wYXJlbnQgPSAmZGV2LT5kZXY7DQo+ID4gLQlhZGFwdGVyLT5kZXYucmVsZWFzZSA9IGN4bF9y
ZWxlYXNlX2FkYXB0ZXI7DQo+ID4gLQlwY2lfc2V0X2RydmRhdGEoZGV2LCBhZGFwdGVyKTsNCj4g
PiAgCXNwaW5fbG9ja19pbml0KCZhZGFwdGVyLT5hZnVfbGlzdF9sb2NrKTsNCj4gPg0KPiA+ICsJ
aWYgKChyYyA9IGN4bF9hbGxvY19hZGFwdGVyX25yKGFkYXB0ZXIpKSkNCj4gDQo+IEh1bWYNCj4g
DQo+ID4gKwkJZ290byBlcnIxOw0KPiA+ICsNCj4gPiArCWlmICgocmMgPSBkZXZfc2V0X25hbWUo
JmFkYXB0ZXItPmRldiwgImNhcmQlaSIsIGFkYXB0ZXItPmFkYXB0ZXJfbnVtKSkpDQo+IA0KPiBI
dW1mDQo+ID4gKwkJZ290byBlcnIyOw0KPiA+ICsNCj4gPiAgCXJldHVybiBhZGFwdGVyOw0KPiA+
ICsNCj4gPiArZXJyMjoNCj4gPiArCWN4bF9yZW1vdmVfYWRhcHRlcl9ucihhZGFwdGVyKTsNCj4g
PiArZXJyMToNCj4gPiArCWtmcmVlKGFkYXB0ZXIpOw0KPiA+ICsJcmV0dXJuIE5VTEw7DQo+ID4g
IH0NCi4uLg0KDQpUaGUgZnVuY3Rpb24gYWJvdmUgZG9lc24ndCBldmVuIHVzZSB0aGUgJ3JjJyB2
YWx1ZS4NCg0KCURhdmlkDQo=

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 05/10] cxl: Refactor adaptor init/teardown
  2015-08-12 10:14     ` David Laight
@ 2015-08-12 21:58       ` Daniel Axtens
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Axtens @ 2015-08-12 21:58 UTC (permalink / raw)
  To: David Laight
  Cc: 'Cyril Bur', linuxppc-dev@ozlabs.org, mikey@neuling.org,
	imunsie@au.ibm.com

[-- Attachment #1: Type: text/plain, Size: 163 bytes --]

> The function above doesn't even use the 'rc' value.

Darn, you're right.

I'll fix that in a new version.

-- 
Regards,
Daniel

-- 
Regards,
Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 860 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-08-12 22:01 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-28  5:28 [PATCH v2 00/10] CXL EEH Handling Daniel Axtens
2015-07-28  5:28 ` [PATCH v2 01/10] cxl: Drop commands if the PCI channel is not in normal state Daniel Axtens
2015-08-11  3:31   ` Cyril Bur
2015-08-11  4:11     ` Daniel Axtens
2015-07-28  5:28 ` [PATCH v2 02/10] cxl: Allocate and release the SPA with the AFU Daniel Axtens
2015-08-11  3:42   ` Cyril Bur
2015-08-11  4:16     ` Daniel Axtens
2015-07-28  5:28 ` [PATCH v2 03/10] cxl: Make IRQ release idempotent Daniel Axtens
2015-08-11  3:44   ` Cyril Bur
2015-07-28  5:28 ` [PATCH v2 04/10] cxl: Clean up adapter MMIO unmap path Daniel Axtens
2015-08-11  3:52   ` Cyril Bur
2015-08-11  6:38     ` Daniel Axtens
2015-07-28  5:28 ` [PATCH v2 05/10] cxl: Refactor adaptor init/teardown Daniel Axtens
2015-08-11  6:01   ` Cyril Bur
2015-08-11 22:38     ` Daniel Axtens
2015-08-12 10:14     ` David Laight
2015-08-12 21:58       ` Daniel Axtens
2015-07-28  5:28 ` [PATCH v2 06/10] cxl: Refactor AFU init/teardown Daniel Axtens
2015-08-11  3:59   ` Cyril Bur
2015-07-28  5:28 ` [PATCH v2 07/10] cxl: Don't remove AFUs/vPHBs in cxl_reset Daniel Axtens
2015-08-11  5:57   ` Cyril Bur
2015-07-28  5:28 ` [PATCH v2 08/10] cxl: Allow the kernel to trust that an image won't change on PERST Daniel Axtens
2015-08-11  7:15   ` Cyril Bur
2015-08-11 11:22     ` Daniel Axtens
2015-08-11 23:47       ` Daniel Axtens
2015-07-28  5:28 ` [PATCH v2 09/10] cxl: EEH support Daniel Axtens
2015-08-11  7:23   ` Cyril Bur
2015-07-28  5:28 ` [PATCH v2 10/10] cxl: Add CONFIG_CXL_EEH symbol Daniel Axtens
2015-08-11  3:59   ` Cyril Bur

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).