Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence GEM.
From: Richard Cochran @ 2016-12-07 21:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161207193908.GA13062@netboy>

On Wed, Dec 07, 2016 at 08:39:09PM +0100, Richard Cochran wrote:
> > +static s32 gem_ptp_max_adj(unsigned int f_nom)
> > +{
> > +	u64 adj;
> > +
> > +	/* The 48 bits of seconds for the GEM overflows every:
> > +	 * 2^48/(365.25 * 24 * 60 *60) =~ 8 925 512 years (~= 9 mil years),
> > +	 * thus the maximum adjust frequency must not overflow CNS register:
> > +	 *
> > +	 * addend  = 10^9/nominal_freq
> > +	 * adj_max = +/- addend*ppb_max/10^9
> > +	 * max_ppb = (2^8-1)*nominal_freq-10^9
> > +	 */
> > +	adj = f_nom;
> > +	adj *= 0xffff;
> > +	adj -= 1000000000ULL;
> 
> What is this computation, and how does it relate to the comment?

I am not sure what you meant, but it sounds like you are on the wrong
track.  Let me explain...

The max_adj has nothing at all to do with the width of the time
register.  Rather, it should reflect the maximum possible change in
the tuning word.

For example, with a nominal 8 ns period, the tuning word is 0x80000.
Looking at running the clock more slowly, the slowest possible word is
0x00001, meaning a difference of 0x7FFFF.  This implies an adjustment
of 0x7FFFF/0x80000 or 999998092 ppb.  Running more quickly, we can
already have 0x100000, twice as fast, or just under 2 billion ppb.

You should consider the extreme cases to determine the most limited
(smallest) max_adj value:

Case 1 - high frequency
~~~~~~~~~~~~~~~~~~~~~~~

With a nominal 1 ns period, we have the nominal tuning word 0x10000.
The smallest is 0x1 for a difference of 0xFFFF.  This corresponds to
an adjustment of 0xFFFF/0x10000 = .9999847412109375 or 999984741 ppb.

Case 2 - low frequency
~~~~~~~~~~~~~~~~~~~~~~

With a nominal 255 ns period, the nominal word is 0xFF0000, the
largest 0xFFFFFF, and the difference is 0xFFFF.  This corresponds to
and adjustment of 0xFFFF/0xFF0000 = .0039215087890625 or 3921508 ppb.

Since 3921508 ppb is a huge adjustment, you can simply use that as a
safe maximum, ignoring the actual input clock.

Thanks,
Richard

^ permalink raw reply

* [PATCH v5 0/3] Altera Cyclone Passive Serial SPI FPGA Manager
From: Joshua Clayton @ 2016-12-07 21:04 UTC (permalink / raw)
  To: linux-arm-kernel

This series adds an FPGA manager for Altera cyclone FPGAs
that can program them using an spi port and a couple of gpios, using
Alteras passive serial protocol.

Still need an ACK from Russell King on the ARM specific portion of 
patch 1. Any comment/criticism on the addition of bitrev8x4() to bitrev.h
would be welcome as well.

Changes from v4:
- Added the needed return statement to __arch_bitrev8x4()
- Added Rob Herrings ACK for and fix a typo in the commit log of patch 2

Changes from v3:
- Fixed up the state() function to return the state of the status pin
  reqested by Alan Tull
- Switched the pin to ACTIVE_LOW and coresponding logic level, and updated
  the corresponding documentation. Thanks Rob Herring for pointing out my
  mistake.
- Per Rob Herring, switched from "gpio" to "gpios" in dts

Changes from v2:
- Merged patch 3 and 4 as suggested in review by Moritz Fischer
- Changed FPGA_MIN_DELAY from 250 to 50 ms is the time advertized by
  Altera. This now works, as we don't assume it is done

Changes from v1:
- Changed the name from cyclone-spi-fpga-mgr to cyclone-ps-spi-fpga-mgr
  This name change was requested by Alan Tull, to be specific about which
  programming method is being employed on the fpga.
- Changed the name of the reset-gpio to config-gpio to closer match the
  way the pins are described in the Altera manual
- Moved MODULE_LICENCE, _AUTHOR, and _DESCRIPTION to the bottom

- Added a bitrev8x4() function to the bitrev headers and implemented ARM
 const, runtime, and ARM specific faster versions (This may end up
 needing to be a standalone patch)

- Moved the bitswapping into cyclonespi_write(), as requested.
  This falls short of my desired generic lsb first spi support, but is a step
  in that direction.

- Fixed whitespace problems introduced during refactoring

- Replaced magic number for initial delay with a descriptive macro
- Poll the fpga to see when it is ready rather than a fixed 1 ms sleep

Joshua Clayton (3):
  lib: add bitrev8x4()
  doc: dt: add cyclone-ps-spi binding document
  fpga manager: Add cyclone-ps-spi driver for Altera FPGAs

 .../bindings/fpga/cyclone-ps-spi-fpga-mgr.txt      |  25 +++
 arch/arm/include/asm/bitrev.h                      |   6 +
 drivers/fpga/Kconfig                               |   7 +
 drivers/fpga/Makefile                              |   1 +
 drivers/fpga/cyclone-ps-spi.c                      | 181 +++++++++++++++++++++
 include/linux/bitrev.h                             |  26 +++
 6 files changed, 246 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/fpga/cyclone-ps-spi-fpga-mgr.txt
 create mode 100644 drivers/fpga/cyclone-ps-spi.c

-- 
2.9.3

^ permalink raw reply

* [PATCH v5 1/3] lib: add bitrev8x4()
From: Joshua Clayton @ 2016-12-07 21:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <cover.1481139171.git.stillcompiling@gmail.com>

Add a function to reverse bytes within a 32 bit word.
This function is more efficient than using the 8 bit version when
iterating over an array

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
---
 arch/arm/include/asm/bitrev.h |  6 ++++++
 include/linux/bitrev.h        | 26 ++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/arch/arm/include/asm/bitrev.h b/arch/arm/include/asm/bitrev.h
index ec291c3..9482f78 100644
--- a/arch/arm/include/asm/bitrev.h
+++ b/arch/arm/include/asm/bitrev.h
@@ -17,4 +17,10 @@ static __always_inline __attribute_const__ u8 __arch_bitrev8(u8 x)
 	return __arch_bitrev32((u32)x) >> 24;
 }
 
+static __always_inline __attribute_const__ u32 __arch_bitrev8x4(u32 x)
+{
+	__asm__ ("rbit %0, %1; rev %0, %0" : "=r" (x) : "r" (x));
+	return x;
+}
+
 #endif
diff --git a/include/linux/bitrev.h b/include/linux/bitrev.h
index fb790b8..b1cfa1a 100644
--- a/include/linux/bitrev.h
+++ b/include/linux/bitrev.h
@@ -9,6 +9,7 @@
 #define __bitrev32 __arch_bitrev32
 #define __bitrev16 __arch_bitrev16
 #define __bitrev8 __arch_bitrev8
+#define __bitrev8x4 __arch_bitrev8x4
 
 #else
 extern u8 const byte_rev_table[256];
@@ -27,6 +28,14 @@ static inline u32 __bitrev32(u32 x)
 	return (__bitrev16(x & 0xffff) << 16) | __bitrev16(x >> 16);
 }
 
+static inline u32 __bitrev8x4(u32 x)
+{
+	return(__bitrev8(x & 0xff) |
+	       (__bitrev8((x >> 8)  & 0xff) << 8) |
+	       (__bitrev8((x >> 16)  & 0xff) << 16) |
+	       (__bitrev8((x >> 24)  & 0xff) << 24));
+}
+
 #endif /* CONFIG_HAVE_ARCH_BITREVERSE */
 
 #define __constant_bitrev32(x)	\
@@ -50,6 +59,15 @@ static inline u32 __bitrev32(u32 x)
 	__x;								\
 })
 
+#define __constant_bitrev8x4(x) \
+({			\
+	u32 __x = x;	\
+	__x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4);	\
+	__x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2);	\
+	__x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1);	\
+	__x;								\
+})
+
 #define __constant_bitrev8(x)	\
 ({					\
 	u8 __x = x;			\
@@ -75,6 +93,14 @@ static inline u32 __bitrev32(u32 x)
 	__bitrev16(__x);				\
  })
 
+#define bitrev8x4(x) \
+({			\
+	u32 __x = x;	\
+	__builtin_constant_p(__x) ?	\
+	__constant_bitrev8x4(__x) :			\
+	__bitrev8x4(__x);				\
+})
+
 #define bitrev8(x) \
 ({			\
 	u8 __x = x;	\
-- 
2.9.3

^ permalink raw reply related

* [PATCH v5 2/3] doc: dt: add cyclone-ps-spi binding document
From: Joshua Clayton @ 2016-12-07 21:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <cover.1481139171.git.stillcompiling@gmail.com>

Describe a cyclone-ps-spi devicetree entry, required features

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 .../bindings/fpga/cyclone-ps-spi-fpga-mgr.txt      | 25 ++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/fpga/cyclone-ps-spi-fpga-mgr.txt

diff --git a/Documentation/devicetree/bindings/fpga/cyclone-ps-spi-fpga-mgr.txt b/Documentation/devicetree/bindings/fpga/cyclone-ps-spi-fpga-mgr.txt
new file mode 100644
index 0000000..3f515c7
--- /dev/null
+++ b/Documentation/devicetree/bindings/fpga/cyclone-ps-spi-fpga-mgr.txt
@@ -0,0 +1,25 @@
+Altera Cyclone Passive Serial SPI FPGA Manager
+
+Altera Cyclone FPGAs support a method of loading the bitstream over what is
+referred to as "passive serial".
+The passive serial link is not technically spi, and might require extra
+circuits in order to play nicely with other spi slaves on the same bus.
+
+See https://www.altera.com/literature/hb/cyc/cyc_c51013.pdf
+
+Required properties:
+- compatible  : should contain "altr,cyclone-ps-spi-fpga-mgr"
+- reg         : spi slave id of the fpga
+- config-gpios : config pin (referred to as nCONFIG in the cyclone manual)
+- status-gpios : status pin (referred to as nSTATUS in the cyclone manual)
+
+both gpios pins are normally active low open drain.
+
+Example:
+	fpga_spi: evi-fpga-spi at 0 {
+		compatible = "altr,cyclone-ps-spi-fpga-mgr";
+		spi-max-frequency = <20000000>;
+		reg = <0>;
+		config-gpios = <&gpio4 9 GPIO_ACTIVE_LOW>;
+		status-gpios = <&gpio4 11 GPIO_ACTIVE_LOW>;
+	};
-- 
2.9.3

^ permalink raw reply related

* [PATCH v5 3/3] fpga manager: Add cyclone-ps-spi driver for Altera FPGAs
From: Joshua Clayton @ 2016-12-07 21:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <cover.1481139171.git.stillcompiling@gmail.com>

cyclone-ps-spi loads FPGA firmware over spi, using the "passive serial"
interface on Altera Cyclone FPGAS.

This is one of the simpler ways to set up an FPGA at runtime.
The signal interface is close to unidirectional spi with lsb first.

Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
---
 drivers/fpga/Kconfig          |   7 ++
 drivers/fpga/Makefile         |   1 +
 drivers/fpga/cyclone-ps-spi.c | 181 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 189 insertions(+)
 create mode 100644 drivers/fpga/cyclone-ps-spi.c

diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
index cd84934..2462707 100644
--- a/drivers/fpga/Kconfig
+++ b/drivers/fpga/Kconfig
@@ -13,6 +13,13 @@ config FPGA
 
 if FPGA
 
+config FPGA_MGR_CYCLONE_PS_SPI
+	tristate "Altera Cyclone FPGA Passive Serial over SPI"
+	depends on SPI
+	help
+	  FPGA manager driver support for Altera Cyclone using the
+	  passive serial interface over SPI
+
 config FPGA_MGR_SOCFPGA
 	tristate "Altera SOCFPGA FPGA Manager"
 	depends on ARCH_SOCFPGA
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index 8d83fc6..8f93930 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -6,5 +6,6 @@
 obj-$(CONFIG_FPGA)			+= fpga-mgr.o
 
 # FPGA Manager Drivers
+obj-$(CONFIG_FPGA_MGR_CYCLONE_PS_SPI)	+= cyclone-ps-spi.o
 obj-$(CONFIG_FPGA_MGR_SOCFPGA)		+= socfpga.o
 obj-$(CONFIG_FPGA_MGR_ZYNQ_FPGA)	+= zynq-fpga.o
diff --git a/drivers/fpga/cyclone-ps-spi.c b/drivers/fpga/cyclone-ps-spi.c
new file mode 100644
index 0000000..82a754a
--- /dev/null
+++ b/drivers/fpga/cyclone-ps-spi.c
@@ -0,0 +1,181 @@
+/**
+ * Copyright (c) 2015 United Western Technologies, Corporation
+ *
+ * Joshua Clayton <stillcompiling@gmail.com>
+ *
+ * Manage Altera fpga firmware that is loaded over spi.
+ * Firmware must be in binary "rbf" format.
+ * Works on Cyclone V. Should work on cyclone series.
+ * May work on other Altera fpgas.
+ *
+ */
+
+#include <linux/bitrev.h>
+#include <linux/delay.h>
+#include <linux/fpga/fpga-mgr.h>
+#include <linux/gpio/consumer.h>
+#include <linux/module.h>
+#include <linux/of_gpio.h>
+#include <linux/spi/spi.h>
+#include <linux/sizes.h>
+
+#define FPGA_RESET_TIME		50   /* time in usecs to trigger FPGA config */
+#define FPGA_MIN_DELAY		50   /* min usecs to wait for config status */
+#define FPGA_MAX_DELAY		1000 /* max usecs to wait for config status */
+
+struct cyclonespi_conf {
+	struct gpio_desc *config;
+	struct gpio_desc *status;
+	struct spi_device *spi;
+};
+
+static const struct of_device_id of_ef_match[] = {
+	{ .compatible = "altr,cyclone-ps-spi-fpga-mgr", },
+	{}
+};
+MODULE_DEVICE_TABLE(of, of_ef_match);
+
+static enum fpga_mgr_states cyclonespi_state(struct fpga_manager *mgr)
+{
+	struct cyclonespi_conf *conf = (struct cyclonespi_conf *)mgr->priv;
+
+	if (gpiod_get_value(conf->status))
+		return FPGA_MGR_STATE_RESET;
+
+	return FPGA_MGR_STATE_UNKNOWN;
+}
+
+static int cyclonespi_write_init(struct fpga_manager *mgr, u32 flags,
+				 const char *buf, size_t count)
+{
+	struct cyclonespi_conf *conf = (struct cyclonespi_conf *)mgr->priv;
+	int i;
+
+	if (flags & FPGA_MGR_PARTIAL_RECONFIG) {
+		dev_err(&mgr->dev, "Partial reconfiguration not supported.\n");
+		return -EINVAL;
+	}
+
+	gpiod_set_value(conf->config, 1);
+	usleep_range(FPGA_RESET_TIME, FPGA_RESET_TIME + 20);
+	if (!gpiod_get_value(conf->status)) {
+		dev_err(&mgr->dev, "Status pin should be low.\n");
+		return -EIO;
+	}
+
+	gpiod_set_value(conf->config, 0);
+	for (i = 0; i < (FPGA_MAX_DELAY / FPGA_MIN_DELAY); i++) {
+		usleep_range(FPGA_MIN_DELAY, FPGA_MIN_DELAY + 20);
+		if (!gpiod_get_value(conf->status))
+			return 0;
+	}
+
+	dev_err(&mgr->dev, "Status pin not ready.\n");
+	return -EIO;
+}
+
+static void rev_buf(void *buf, size_t len)
+{
+	u32 *fw32 = (u32 *)buf;
+	const u32 *fw_end = (u32 *)(buf + len);
+
+	/* set buffer to lsb first */
+	while (fw32 < fw_end) {
+		*fw32 = bitrev8x4(*fw32);
+		fw32++;
+	}
+}
+
+static int cyclonespi_write(struct fpga_manager *mgr, const char *buf,
+			    size_t count)
+{
+	struct cyclonespi_conf *conf = (struct cyclonespi_conf *)mgr->priv;
+	const char *fw_data = buf;
+	const char *fw_data_end = fw_data + count;
+
+	while (fw_data < fw_data_end) {
+		int ret;
+		size_t stride = min(fw_data_end - fw_data, SZ_4K);
+
+		rev_buf((void *)fw_data, stride);
+		ret = spi_write(conf->spi, fw_data, stride);
+		if (ret) {
+			dev_err(&mgr->dev, "spi error in firmware write: %d\n",
+				ret);
+			return ret;
+		}
+		fw_data += stride;
+	}
+
+	return 0;
+}
+
+static int cyclonespi_write_complete(struct fpga_manager *mgr, u32 flags)
+{
+	struct cyclonespi_conf *conf = (struct cyclonespi_conf *)mgr->priv;
+
+	if (gpiod_get_value(conf->status)) {
+		dev_err(&mgr->dev, "Error during configuration.\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static const struct fpga_manager_ops cyclonespi_ops = {
+	.state = cyclonespi_state,
+	.write_init = cyclonespi_write_init,
+	.write = cyclonespi_write,
+	.write_complete = cyclonespi_write_complete,
+};
+
+static int cyclonespi_probe(struct spi_device *spi)
+{
+	struct cyclonespi_conf *conf = devm_kzalloc(&spi->dev, sizeof(*conf),
+						GFP_KERNEL);
+
+	if (!conf)
+		return -ENOMEM;
+
+	conf->spi = spi;
+	conf->config = devm_gpiod_get(&spi->dev, "config", GPIOD_OUT_HIGH);
+	if (IS_ERR(conf->config)) {
+		dev_err(&spi->dev, "Failed to get config gpio: %ld\n",
+			PTR_ERR(conf->config));
+		return PTR_ERR(conf->config);
+	}
+
+	conf->status = devm_gpiod_get(&spi->dev, "status", GPIOD_IN);
+	if (IS_ERR(conf->status)) {
+		dev_err(&spi->dev, "Failed to get status gpio: %ld\n",
+			PTR_ERR(conf->status));
+		return PTR_ERR(conf->status);
+	}
+
+	return fpga_mgr_register(&spi->dev,
+				 "Altera Cyclone PS SPI FPGA Manager",
+				 &cyclonespi_ops, conf);
+}
+
+static int cyclonespi_remove(struct spi_device *spi)
+{
+	fpga_mgr_unregister(&spi->dev);
+
+	return 0;
+}
+
+static struct spi_driver cyclonespi_driver = {
+	.driver = {
+		.name   = "cyclone-ps-spi",
+		.owner  = THIS_MODULE,
+		.of_match_table = of_match_ptr(of_ef_match),
+	},
+	.probe  = cyclonespi_probe,
+	.remove = cyclonespi_remove,
+};
+
+module_spi_driver(cyclonespi_driver)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Joshua Clayton <stillcompiling@gmail.com>");
+MODULE_DESCRIPTION("Module to load Altera FPGA firmware over spi");
-- 
2.9.3

^ permalink raw reply related

* [RFC PATCH 00/23] arm: defconfigs: use kconfig fragments
From: Arnd Bergmann @ 2016-12-07 21:07 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <4222703.VA8eKM008t@amdc3058>

On Wednesday, December 7, 2016 12:41:29 PM CET Bartlomiej Zolnierkiewicz wrote:
> 
> On Tuesday, December 06, 2016 11:03:34 AM Olof Johansson wrote:
> > On Tue, Dec 6, 2016 at 4:38 AM, Bartlomiej Zolnierkiewicz
> > <b.zolnierkie@samsung.com> wrote:
> > > Hi,
> > >
> > > This RFC patchset starts convertion of ARM defconfigs to use kconfig
> > > fragments and dynamically generate defconfigs.  The goals of this
> > > work are to:
> > 
> > You don't provide any motivation as to why this is better. As far as I
> 
> Benefits are:
> 
> - no code duplication (this initial patchset alone removes ~1700 lines
>   from defconfigs without any change in functionality)

This may be interesting

> - prevention of "multi" defconfigs (i.e. multi_v7_defconfig) going out
>   of sync with "SoC-family" ones (i.e. exynos_defconfig) - there will
>   be just one place to update when changing things

I'm not convinced this is worthwhile: in a lot of cases, the soc-specific
configs want to enable things built-in, while the more generic ones
tend to use loadable modules.

> - possibility to add support for more optimized defconfigs (i.e. board
>   specific ones) in the future without duplicating the code

I'd prefer seeing fewer top-level options than more of them, so
this doesn't really help.

> - making it easier to define arch specific parts of defconfigs in
>   the future if we decide on doing it (i.e. we may want to enable
>   things like CONFIG_SYSVIPC for all defconfigs)

The example you give is for something that should be decided
in architecture-independent Kconfig language rather than
per architecture, and that won't require fragments.

> > am concerned it'll just be a mess.
> > 
> > So:
> > 
> > Nack. So much nack. I really don't want to see a proliferation of
> > config fragments like this.
> > 
> > I had a feeling it was a bad idea to pick up that one line config
> > fragment before, since it opened the door for this kind of mess. 
> 
> Like I said in the cover-letter I'm not satisfied with the current
> patches and they have much room for improvement.
> 
> However I see that you don't like the idea itself... 

I do think that there is some room for more config fragments in
mainline, but not most of the patches you have here. Some areas
that I think would benefit from fragments are:

- architecture level selection: v6/v6k/v7/v7ve/v8 could have a
  common defconfig file that starts out with all v6+ enabled,
  but then having fragments that disable the older architectures
  and platforms using them while turning on features that are only
  available on newer architectures

- A "debug" fragment would be nice, to turn on common options that
  add a lot of useful runtime checks at the expense of performance
  or code size.

- A "distro" fragment that turns on all loadable modules that are
  enabled by common distributions (e.g. two or more of
  debian/fedora/opensuse/gentoo), to let you build a drop-in
  replacement kernel for a shipping distro. This would also allow
  the distros to strip their own config files and just specify
  whatever differs from the others.

	Arnd

^ permalink raw reply

* [RFC PATCH 00/23] arm: defconfigs: use kconfig fragments
From: Olof Johansson @ 2016-12-07 21:14 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <8292218.3BDisRkZdU@wuerfel>

On Wed, Dec 7, 2016 at 1:07 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday, December 7, 2016 12:41:29 PM CET Bartlomiej Zolnierkiewicz wrote:
>>
>> On Tuesday, December 06, 2016 11:03:34 AM Olof Johansson wrote:
>> > On Tue, Dec 6, 2016 at 4:38 AM, Bartlomiej Zolnierkiewicz
>> > <b.zolnierkie@samsung.com> wrote:
>> > > Hi,
>> > >
>> > > This RFC patchset starts convertion of ARM defconfigs to use kconfig
>> > > fragments and dynamically generate defconfigs.  The goals of this
>> > > work are to:
>> >
>> > You don't provide any motivation as to why this is better. As far as I
>>
>> Benefits are:
>>
>> - no code duplication (this initial patchset alone removes ~1700 lines
>>   from defconfigs without any change in functionality)
>
> This may be interesting

Management of the fragments is the big headache here. I haven't seen
any system that does it well downstream either in a way that scales as
far as we'd need it to.

>> - prevention of "multi" defconfigs (i.e. multi_v7_defconfig) going out
>>   of sync with "SoC-family" ones (i.e. exynos_defconfig) - there will
>>   be just one place to update when changing things
>
> I'm not convinced this is worthwhile: in a lot of cases, the soc-specific
> configs want to enable things built-in, while the more generic ones
> tend to use loadable modules.

Agreed.

>> - possibility to add support for more optimized defconfigs (i.e. board
>>   specific ones) in the future without duplicating the code
>
> I'd prefer seeing fewer top-level options than more of them, so
> this doesn't really help.
>
>> - making it easier to define arch specific parts of defconfigs in
>>   the future if we decide on doing it (i.e. we may want to enable
>>   things like CONFIG_SYSVIPC for all defconfigs)
>
> The example you give is for something that should be decided
> in architecture-independent Kconfig language rather than
> per architecture, and that won't require fragments.
>
>> > am concerned it'll just be a mess.
>> >
>> > So:
>> >
>> > Nack. So much nack. I really don't want to see a proliferation of
>> > config fragments like this.
>> >
>> > I had a feeling it was a bad idea to pick up that one line config
>> > fragment before, since it opened the door for this kind of mess.
>>
>> Like I said in the cover-letter I'm not satisfied with the current
>> patches and they have much room for improvement.
>>
>> However I see that you don't like the idea itself...
>
> I do think that there is some room for more config fragments in
> mainline, but not most of the patches you have here. Some areas
> that I think would benefit from fragments are:
>
> - architecture level selection: v6/v6k/v7/v7ve/v8 could have a
>   common defconfig file that starts out with all v6+ enabled,
>   but then having fragments that disable the older architectures
>   and platforms using them while turning on features that are only
>   available on newer architectures
>
> - A "debug" fragment would be nice, to turn on common options that
>   add a lot of useful runtime checks at the expense of performance
>   or code size.

Hmm, some of these might work but several useful debug options (in
particular DEBUG_LL for early errors) are per-system/platform.

> - A "distro" fragment that turns on all loadable modules that are
>   enabled by common distributions (e.g. two or more of
>   debian/fedora/opensuse/gentoo), to let you build a drop-in
>   replacement kernel for a shipping distro. This would also allow
>   the distros to strip their own config files and just specify
>   whatever differs from the others.

Keeping this in sync with the distro kernel could be a bit awkward
(and possibly churny).


-Olof

^ permalink raw reply

* [PATCH v5 3/3] fpga manager: Add cyclone-ps-spi driver for Altera FPGAs
From: Anatolij Gustschin @ 2016-12-07 21:21 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1778a82d1989d13919b24e47fa09eeb56b2cb8e5.1481139171.git.stillcompiling@gmail.com>

On Wed,  7 Dec 2016 13:04:40 -0800
Joshua Clayton stillcompiling at gmail.com wrote:
...
>+static int cyclonespi_write_init(struct fpga_manager *mgr, u32 flags,
>+				 const char *buf, size_t count)

there is a minor API change in linux-next [1]. struct fpga_image_info *
is passed instead of u32 flags. I assume this will be merged soon, so
please prepare this driver accordingly.

...
>+static int cyclonespi_write_complete(struct fpga_manager *mgr, u32 flags)

same here.

[1] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/fpga?id=1df2865f8dd9d56cb76aa7aa1298921e7bece2af

--
Anatolij

^ permalink raw reply

* [PATCH] arm/xen: Use alloc_percpu rather than __alloc_percpu
From: Stefano Stabellini @ 2016-12-07 21:22 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481113480-7663-1-git-send-email-julien.grall@arm.com>

On Wed, 7 Dec 2016, Julien Grall wrote:
> The function xen_guest_init is using __alloc_percpu with an alignment
> which are not power of two.
> 
> However, the percpu allocator never supported alignments which are not power
> of two and has always behaved incorectly in thise case.
> 
> Commit 3ca45a4 "percpu: ensure requested alignment is power of two"
> introduced a check which trigger a warning [1] when booting linux-next
> on Xen. But in fine this bug was always present.
> 
> This can be fixed by replacing the call to __alloc_percpu with
> alloc_percpu. The latter will use an alignment which are a power of two.
> 
> [1]
> 
> [    0.023921] illegal size (48) or align (48) for percpu allocation
> [    0.024167] ------------[ cut here ]------------
> [    0.024344] WARNING: CPU: 0 PID: 1 at linux/mm/percpu.c:892 pcpu_alloc+0x88/0x6c0
> [    0.024584] Modules linked in:
> [    0.024708]
> [    0.024804] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.9.0-rc7-next-20161128 #473
> [    0.025012] Hardware name: Foundation-v8A (DT)
> [    0.025162] task: ffff80003d870000 task.stack: ffff80003d844000
> [    0.025351] PC is at pcpu_alloc+0x88/0x6c0
> [    0.025490] LR is at pcpu_alloc+0x88/0x6c0
> [    0.025624] pc : [<ffff00000818e678>] lr : [<ffff00000818e678>]
> pstate: 60000045
> [    0.025830] sp : ffff80003d847cd0
> [    0.025946] x29: ffff80003d847cd0 x28: 0000000000000000
> [    0.026147] x27: 0000000000000000 x26: 0000000000000000
> [    0.026348] x25: 0000000000000000 x24: 0000000000000000
> [    0.026549] x23: 0000000000000000 x22: 00000000024000c0
> [    0.026752] x21: ffff000008e97000 x20: 0000000000000000
> [    0.026953] x19: 0000000000000030 x18: 0000000000000010
> [    0.027155] x17: 0000000000000a3f x16: 00000000deadbeef
> [    0.027357] x15: 0000000000000006 x14: ffff000088f79c3f
> [    0.027573] x13: ffff000008f79c4d x12: 0000000000000041
> [    0.027782] x11: 0000000000000006 x10: 0000000000000042
> [    0.027995] x9 : ffff80003d847a40 x8 : 6f697461636f6c6c
> [    0.028208] x7 : 6120757063726570 x6 : ffff000008f79c84
> [    0.028419] x5 : 0000000000000005 x4 : 0000000000000000
> [    0.028628] x3 : 0000000000000000 x2 : 000000000000017f
> [    0.028840] x1 : ffff80003d870000 x0 : 0000000000000035
> [    0.029056]
> [    0.029152] ---[ end trace 0000000000000000 ]---
> [    0.029297] Call trace:
> [    0.029403] Exception stack(0xffff80003d847b00 to
>                                0xffff80003d847c30)
> [    0.029621] 7b00: 0000000000000030 0001000000000000
> ffff80003d847cd0 ffff00000818e678
> [    0.029901] 7b20: 0000000000000002 0000000000000004
> ffff000008f7c060 0000000000000035
> [    0.030153] 7b40: ffff000008f79000 ffff000008c4cd88
> ffff80003d847bf0 ffff000008101778
> [    0.030402] 7b60: 0000000000000030 0000000000000000
> ffff000008e97000 00000000024000c0
> [    0.030647] 7b80: 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> [    0.030895] 7ba0: 0000000000000035 ffff80003d870000
> 000000000000017f 0000000000000000
> [    0.031144] 7bc0: 0000000000000000 0000000000000005
> ffff000008f79c84 6120757063726570
> [    0.031394] 7be0: 6f697461636f6c6c ffff80003d847a40
> 0000000000000042 0000000000000006
> [    0.031643] 7c00: 0000000000000041 ffff000008f79c4d
> ffff000088f79c3f 0000000000000006
> [    0.031877] 7c20: 00000000deadbeef 0000000000000a3f
> [    0.032051] [<ffff00000818e678>] pcpu_alloc+0x88/0x6c0
> [    0.032229] [<ffff00000818ece8>] __alloc_percpu+0x18/0x20
> [    0.032409] [<ffff000008d9606c>] xen_guest_init+0x174/0x2f4
> [    0.032591] [<ffff0000080830f8>] do_one_initcall+0x38/0x130
> [    0.032783] [<ffff000008d90c34>] kernel_init_freeable+0xe0/0x248
> [    0.032995] [<ffff00000899a890>] kernel_init+0x10/0x100
> [    0.033172] [<ffff000008082ec0>] ret_from_fork+0x10/0x50
> 
> Reported-by: Wei Chen <wei.chen@arm.com>
> Link: https://lkml.org/lkml/2016/11/28/669
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> Cc: stable at vger.kernel.org
> 
> ---
> 
> I have requested backport to stable because the percpu allocator may
> misbehaves when the alignment is not a power of two.

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

I'll fix a small wording issue in the commit message and apply to
xentip.


>  arch/arm/xen/enlighten.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index f193414..4986dc0 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -372,8 +372,7 @@ static int __init xen_guest_init(void)
>  	 * for secondary CPUs as they are brought up.
>  	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
>  	 */
> -	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info),
> -			                       sizeof(struct vcpu_info));
> +	xen_vcpu_info = alloc_percpu(struct vcpu_info);
>  	if (xen_vcpu_info == NULL)
>  		return -ENOMEM;
>  
> -- 
> 1.9.1
> 

^ permalink raw reply

* [PATCH v5 3/3] fpga manager: Add cyclone-ps-spi driver for Altera FPGAs
From: Joshua Clayton @ 2016-12-07 21:29 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161207222127.23f6b8b4@crub>

Anatolij,
Thanks for the quick look and response.
I guess I should have rebased before submitting.
I just realized I also forgot to send this patch series to
the newly minted FPGA Manager mailing list. V6 coming soon :)

On 12/07/2016 01:21 PM, Anatolij Gustschin wrote:
> On Wed,  7 Dec 2016 13:04:40 -0800
> Joshua Clayton stillcompiling at gmail.com wrote:
> ...
>> +static int cyclonespi_write_init(struct fpga_manager *mgr, u32 flags,
>> +				 const char *buf, size_t count)
> there is a minor API change in linux-next [1]. struct fpga_image_info *
> is passed instead of u32 flags. I assume this will be merged soon, so
> please prepare this driver accordingly.
>
> ...
Ah. Thanks.

>> +static int cyclonespi_write_complete(struct fpga_manager *mgr, u32 flags)
> same here.
OK.
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/fpga?id=1df2865f8dd9d56cb76aa7aa1298921e7bece2af
>
> --
> Anatolij
Joshua

^ permalink raw reply

* [Question] New mmap64 syscall?
From: Arnd Bergmann @ 2016-12-07 21:30 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <DF57E4A6-85C9-44EF-AD03-19EA688D19DB@theobroma-systems.com>

On Wednesday, December 7, 2016 5:43:27 PM CET Dr. Philipp Tomsich wrote:
> Catalin,
> 
> > On 07 Dec 2016, at 17:32, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > 
> >>> In other words: Why not keep ILP32 simple an ask users that need a 16TB+ offset
> >>> to use LP64? It seems much more consistent with the other choices takes so far.
> >> 
> >> If user can switch to lp64, he doesn't need ilp32 at all, right? 
> >> Also, I don't understand how true 64-bit offset in mmap64() would
> >> complicate this port.
> > 
> > It's more like the user wanting a quick transition from code that was
> > only ever compiled for AArch32 (or other 32-bit architecture) with a
> > goal of full LP64 transition on the long run. I have yet to see
> > convincing benchmarks showing ILP32 as an advantage over LP64 (of
> > course, I hear the argument of reading a pointer a loop is twice as fast
> > with a half-size pointer but I don't consider such benchmarks relevant).
> 
> Most of the performance advantage in benchmarks comes from a reduction
> in the size of data-structures and/or tighter packing of arrays.  In other words,
> we can make slightly better use of the caches and push the memory subsystem
> a little further when running multiple instances of benchmarks.
> 
> Most of these advantages should eventually go away, when struct-reorg makes
> it way into the compiler. That said, it?s a marginal (but real) improvement for a
> subset of SPEC.
> 
> In the real world, the importance of ILP32 as an aid to transition legacy code
> that is not 64bit clean? and this should drive the ILP32 discussion. That we
> get a boost in our SPEC scores is just a nice extra that we get from it 

To bring this back from the philosophical questions of ABI design
to the specific point of what file offset width you want for mmap()
on 32-bit architectures.

For all I can tell, using mmap() to access a file that is many thousand
times larger than your virtual address space is completely crazy.
Adding a new mmap64() syscall on all 32-bit architectures would be
trivial if there was a use case for it, without one we but without at
least one specific application asking for it (with good reasons), we
shouldn't even be talking about that.

Note that until commit f8b7256096a2 ("Unify sys_mmap"), we actually
had a sys_mmap64 implementation on a couple of architectures, but
removed it.

	Arnd

^ permalink raw reply

* [PATCH v5 3/3] fpga manager: Add cyclone-ps-spi driver for Altera FPGAs
From: Anatolij Gustschin @ 2016-12-07 21:33 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <6e33619c-e8a4-e372-04ab-d03c379cbfa0@gmail.com>

On Wed, 7 Dec 2016 13:29:29 -0800
Joshua Clayton stillcompiling at gmail.com wrote:

>Anatolij,
>Thanks for the quick look and response.
>I guess I should have rebased before submitting.

yes, Makefile and Kconfig changes need rebasing, I think.

>I just realized I also forgot to send this patch series to
>the newly minted FPGA Manager mailing list. V6 coming soon :)

okay, thanks.

--
Anatolij

^ permalink raw reply

* [RFC PATCH 00/23] arm: defconfigs: use kconfig fragments
From: Arnd Bergmann @ 2016-12-07 21:35 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAOesGMi_mLG+tS0Ya8G9KLO_sUYGtLmgcvFfpn0Nwbd-KMKOrQ@mail.gmail.com>

On Wednesday, December 7, 2016 1:14:02 PM CET Olof Johansson wrote:
> >
> > - A "debug" fragment would be nice, to turn on common options that
> >   add a lot of useful runtime checks at the expense of performance
> >   or code size.
> 
> Hmm, some of these might work but several useful debug options (in
> particular DEBUG_LL for early errors) are per-system/platform.

I was thinking mostly of architecture-independent options, i.e.
the stuff that is in lib/Kconfig.debug but isn't too expensive
to be run in a regular test environment. Enabling those
for a build/boot automation environment would be particularly
useful as you'd catch more bugs that get introduced through
a random patch.

> > - A "distro" fragment that turns on all loadable modules that are
> >   enabled by common distributions (e.g. two or more of
> >   debian/fedora/opensuse/gentoo), to let you build a drop-in
> >   replacement kernel for a shipping distro. This would also allow
> >   the distros to strip their own config files and just specify
> >   whatever differs from the others.
> 
> Keeping this in sync with the distro kernel could be a bit awkward
> (and possibly churny).

It would certainly need buy-in from distro maintainers. I've discussed
this with Laura Abbott in the past, and she was interested in
principle.

	Arnd

^ permalink raw reply

* IMX6S MRII does not work
From: Andy Ng @ 2016-12-07 21:39 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAAVaN8z8jAkPw-=bDxNUvZOL4xB7ZK2_hkGyp8FQSE6ivhzeTQ@mail.gmail.com>

Just an update...

I have put a scope on ENET_REF_CLK to check what comes out from the
PAD_GPIO16. The uboot shows 50MHz, and when in the kernel it goes
125MHz. And for sure that doesn't do

On Wed, Dec 7, 2016 at 12:44 AM, Andy Ng <andreas2025@gmail.com> wrote:
> Hello,
>
> I have quite recent kernel linux-4.1.20 from Freescale's git repo.
>
>
> I am using MRII and a fec setup that uses clock that goes to the phy too.
>
>
> My dtsi has:
>
>     pinctrl_enet: enetgrp {
>             fsl,pins = <
>                 MX6QDL_PAD_ENET_MDIO__ENET_MDIO        0x1b0b0
>                 MX6QDL_PAD_ENET_MDC__ENET_MDC          0x1b0b0
>                 MX6QDL_PAD_ENET_CRS_DV__ENET_RX_EN  0x1b0b0
>                 MX6QDL_PAD_ENET_TXD0__ENET_TX_DATA0  0x1b0b0
>                 MX6QDL_PAD_ENET_TXD1__ENET_TX_DATA1  0x1b0b0
>                 MX6QDL_PAD_ENET_TX_EN__ENET_TX_EN      0x1b0b0
>                 MX6QDL_PAD_ENET_RX_ER__ENET_RX_ER      0x1b0b0
>                 MX6QDL_PAD_ENET_RXD0__ENET_RX_DATA0    0x1b0b0
>                 MX6QDL_PAD_ENET_RXD1__ENET_RX_DATA1     0x1b0b0
>                MX6QDL_PAD_GPIO_16__ENET_REF_CLK        0x4001b0a8
>
>             >;
>         };
>
>
> the fec entry:
>
> &fec {
>     pinctrl-names = "default";
>     pinctrl-0 = <&pinctrl_enet>;
>     phy-mode = "rmii";
>     phy-reset-duration = <2>;
>     phy-reset-gpios = <&gpio2 9 GPIO_ACTIVE_LOW>;
>     status = "okay";
> };
>
>
> The kernel comes up and detects the phy (MDIO works), but it seems fec
> is not doing much:
>
> INIT: Entering runlevel: 5
> Configuring network interfaces... RMII Detected
> fec 2188000.ethernet eth0: Freescale FEC PHY driver [SMSC
> LAN8710/LAN8720] (mii_bus:phy_addr=2188000.ethernet:00, irq=-1)
> IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> udhcpc (v1.24.1) started
> Sending discover...
> Sending discover...
> Sending discover...
> No lease, forking to background
> done.
>
> I have checked the code, and  I've found very few boards that use
> rmii, and from those, most of them are using external clock for the
> phy.
>
> I have the impression that GPIO_16_ENET_REF_CLK with SION bit ON may
> not have been tested, as there is no imx6 ref board that uses
> that mode.
>
> Did anyone else have seen the same issue?
> Any thoughts will be very much appreciated.
>
> Thank you
> Andy

^ permalink raw reply

* Shipment delivery problem #000369871
From: FedEx International Ground @ 2016-12-07 21:42 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Customer,

We could not deliver your item.
Shipment Label is attached to email.

Sincerely,
Chad Wagner,
Sr. Support Manager.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: FedEx_000369871.zip
Type: application/zip
Size: 3033 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20161207/eab2c8fb/attachment.zip>

^ permalink raw reply

* [PATCH v2 1/2] arm64: dts: zx: Fix gic GICR property
From: Arnd Bergmann @ 2016-12-07 21:44 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161207203412.GA10158@localhost>

On Wednesday, December 7, 2016 12:34:12 PM CET Olof Johansson wrote:
> 
> So, it seems that Arnd didn't apply the old patch to fixes-non-critical, or
> at least didn't push out after doing it. So, I've done that now, as well as
> applied the fixup you have above.
> 

My mistake, I forgot to push it out.

I checked the other branches now and found two other commits missing
in next/fixes-non-critical:


arm64: tegra: Add missing Smaug revision
arm64: tegra: Add VDD_GPU regulator to Jetson TX1

Both sent by Thierry Reding last Friday. I can add them back on top
or you pick them up again if you are already busy doing merges.

For the first patch, I had amended the changelog text, this is the
version I had.

	Arnd

8<---------
>From 4d0a00060c1cec86f5aab57c814afbfb3e152578 Mon Sep 17 00:00:00 2001
From: Alexandre Courbot <acourbot@nvidia.com>
Date: Fri, 2 Dec 2016 20:57:55 +0100
Subject: [PATCH] arm64: tegra: Add VDD_GPU regulator to Jetson TX1

Add the VDD_GPU regulator (a GPIO-enabled PWM regulator) to the Jetson
TX1 board. This addition allows the GPU to be used provided the
bootloader properly enabled the GPU node.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
[as pointed out by Thierry on IRC, nobody has reported a bug
 in the field, but using a new bootloader with a .dtb that
 has the incorrect data, it will crash on boot]
Fixes: 336f79c7b6d7 ("arm64: tegra: Add NVIDIA Jetson TX1 Developer Kit support")
Cc: stable at vger.kernel.org #v4.5+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/arch/arm64/boot/dts/nvidia/tegra210-p2180.dtsi b/arch/arm64/boot/dts/nvidia/tegra210-p2180.dtsi
index 5fda583..906fb83 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210-p2180.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra210-p2180.dtsi
@@ -21,6 +21,10 @@
 		reg = <0x0 0x80000000 0x1 0x0>;
 	};
 
+	gpu at 57000000 {
+		vdd-supply = <&vdd_gpu>;
+	};
+
 	/* debug port */
 	serial at 70006000 {
 		status = "okay";
@@ -291,4 +295,18 @@
 			clock-frequency = <32768>;
 		};
 	};
+
+	regulators {
+		vdd_gpu: regulator at 100 {
+			compatible = "pwm-regulator";
+			reg = <100>;
+			pwms = <&pwm 1 4880>;
+			regulator-name = "VDD_GPU";
+			regulator-min-microvolt = <710000>;
+			regulator-max-microvolt = <1320000>;
+			enable-gpios = <&pmic 6 GPIO_ACTIVE_HIGH>;
+			regulator-ramp-delay = <80>;
+			regulator-enable-ramp-delay = <1000>;
+		};
+	};
 };

^ permalink raw reply related

* [PATCH V6 00/10] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel

When a memory error, CPU error, PCIe error, or other type of hardware error
that's covered by RAS occurs, firmware should populate the shared GHES memory
location with the proper GHES structures to notify the OS of the error.
For example, platforms that implement firmware first handling may implement
separate GHES sources for corrected errors and uncorrected errors. If the
error is an uncorrectable error, then the firmware will notify the OS
immediately since the error needs to be handled ASAP. The OS will then be able
to take the appropriate action needed such as offlining a page. If the error
is a corrected error, then the firmware will not interrupt the OS immediately.
Instead, the OS will see and report the error the next time it's GHES timer
expires. The kernel will first parse the GHES structures and report the errors
through the kernel logs and then notify the user space through RAS trace
events. This allows user space applications such as RAS Daemon to see the
errors and report them however the user desires. This patchset extends the
kernel functionality for RAS errors based on updates in the UEFI 2.6 and
ACPI 6.1 specifications.

An example flow from firmware to user space could be:

                 +---------------+
       +-------->|               |
       |         |  GHES polling |--+
+-------------+  |    source     |  |   +---------------+   +------------+
|             |  +---------------+  |   |  Kernel GHES  |   |            |
|  Firmware   |                     +-->|  CPER AER and |-->|  RAS trace |
|             |  +---------------+  |   |  EDAC drivers |   |   event    |
+-------------+  |               |  |   +---------------+   +------------+
       |         |  GHES sci     |--+
       +-------->|   source      |
                 +---------------+

Add support for Generic Hardware Error Source (GHES) v2, which introduces the
capability for the OS to acknowledge the consumption of the error record
generated by the Reliability, Availability and Serviceability (RAS) controller.
This eliminates potential race conditions between the OS and the RAS controller.

Add support for the timestamp field added to the Generic Error Data Entry v3,
allowing the OS to log the time that the error is generated by the firmware,
rather than the time the error is consumed. This improves the correctness of
event sequences when analyzing error logs. The timestamp is added in
ACPI 6.1, reference Table 18-343 Generic Error Data Entry.

Add support for ARMv8 Common Platform Error Record (CPER) per UEFI 2.6
specification. ARMv8 specific processor error information is reported as part of
the CPER records.  This provides more detail on for processor error logs. This
can help describe ARMv8 cache, tlb, and bus errors.

Synchronous External Abort (SEA) represents a specific processor error condition
in ARM systems. A handler is added to recognize SEA errors, and a notifier is
added to parse and report the errors before the process is killed. Refer to
section N.2.1.1 in the Common Platform Error Record appendix of the UEFI 2.6
specification.

Currently the kernel ignores CPER records that are unrecognized.
On the other hand, UEFI spec allows for non-standard (eg. vendor
proprietary) error section type in CPER (Common Platform Error Record),
as defined in section N2.3 of UEFI version 2.5. Therefore, user
is not able to see hardware error data of non-standard section.

If section Type field of Generic Error Data Entry is unrecognized,
prints out the raw data in dmesg buffer, and also adds a tracepoint
for reporting such hardware errors.

Currently even if an error status block's severity is fatal, the kernel
does not honor the severity level and panic. With the firmware first
model, the platform could inform the OS about a fatal hardware error
through the non-NMI GHES notification type. The OS should panic when a
hardware error record is received with this severity.

Add support to handle SEAs that occur while a KVM guest kernel is
running. Currently these are unsupported by the guest abort handling.

Depends on: [PATCH v15] acpi, apei, arm64: APEI initial support for aarch64.
            https://lkml.org/lkml/2016/12/1/312

V6: Change HEST_TYPE_GENERIC_V2 to IS_HEST_TYPE_GENERIC_V2 for readability
    Move APEI helper defines from cper.h to ghes.h
    Add data_len decrement back into print loop
    Change references to ARMv8 to just ARM
    Rewrite ARM processor context info parsing
    Check valid bit of ARM error info field before printing it
    Add include of linux/uuid.h in ghes.c

V5: Fix GHES goto logic for error conditions
    Change ghes_do_read_ack to ghes_ack_error
    Make sure data version check is >= 3
    Use CPER helper functions in print functions
    Make handle_guest_sea() dummy function static for arm
    Add arm to subject line for KVM patch

V4: Add bit offset left shift to read_ack_write value
    Make HEST generic and generic_v2 structures a union in the ghes structure
    Move gdata v3 helper functions into ghes.h to avoid duplication
    Reorder the timestamp print and avoid memcpy
    Add helper functions for gdata size checking
    Rename the SEA functions
    Add helper function for GHES panics
    Set fru_id to NULL UUID at variable declaration
    Limit ARM trace event parameters to the needed structures
    Reorder the ARM trace event variables to save space
    Add comment for why we don't pass SEAs to the guest when it aborts
    Move ARM trace event call into GHES driver instead of CPER

V3: Fix unmapped address to the read_ack_register in ghes.c
    Add helper function to get the proper payload based on generic data entry
     version
    Move timestamp print to avoid changing function calls in cper.c
    Remove patch "arm64: exception: handle instruction abort at current EL"
     since the el1_ia handler is already added in 4.8
    Add EFI and ARM64 dependencies for HAVE_ACPI_APEI_SEA
    Add a new trace event for ARM type errors
    Add support to handle KVM guest SEAs

V2: Add PSCI state print for the ARMv8 error type.
    Separate timestamp year into year and century using BCD format.
    Rebase on top of ACPICA 20160318 release and remove header file changes
     in include/acpi/actbl1.h.
    Add panic OS with fatal error status block patch.
    Add processing of unrecognized CPER error section patches with updates
     from previous comments. Original patches: https://lkml.org/lkml/2015/9/8/646

V1: https://lkml.org/lkml/2016/2/5/544

Jonathan (Zhixiong) Zhang (1):
  acpi: apei: panic OS with fatal error status block

Tyler Baicar (9):
  acpi: apei: read ack upon ghes record consumption
  ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
  efi: parse ARM processor error
  arm64: exception: handle Synchronous External Abort
  acpi: apei: handle SEA notification type for ARMv8
  efi: print unrecognized CPER section
  ras: acpi / apei: generate trace event for unrecognized CPER section
  trace, ras: add ARM processor error trace event
  arm/arm64: KVM: add guest SEA support

 arch/arm/include/asm/kvm_arm.h       |   1 +
 arch/arm/include/asm/system_misc.h   |   5 +
 arch/arm/kvm/mmu.c                   |  18 +++-
 arch/arm64/Kconfig                   |   1 +
 arch/arm64/include/asm/kvm_arm.h     |   1 +
 arch/arm64/include/asm/system_misc.h |  15 +++
 arch/arm64/mm/fault.c                |  71 ++++++++++--
 drivers/acpi/apei/Kconfig            |  14 +++
 drivers/acpi/apei/ghes.c             | 189 +++++++++++++++++++++++++++++---
 drivers/acpi/apei/hest.c             |   7 +-
 drivers/firmware/efi/cper.c          | 204 ++++++++++++++++++++++++++++++++---
 drivers/ras/ras.c                    |   2 +
 include/acpi/ghes.h                  |  27 ++++-
 include/linux/cper.h                 |  53 +++++++++
 include/ras/ras_event.h              | 100 +++++++++++++++++
 15 files changed, 664 insertions(+), 44 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply

* [PATCH V6 01/10] acpi: apei: read ack upon ghes record consumption
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481147303-7979-1-git-send-email-tbaicar@codeaurora.org>

A RAS (Reliability, Availability, Serviceability) controller
may be a separate processor running in parallel with OS
execution, and may generate error records for consumption by
the OS. If the RAS controller produces multiple error records,
then they may be overwritten before the OS has consumed them.

The Generic Hardware Error Source (GHES) v2 structure
introduces the capability for the OS to acknowledge the
consumption of the error record generated by the RAS
controller. A RAS controller supporting GHESv2 shall wait for
the acknowledgment before writing a new error record, thus
eliminating the race condition.

Add support for parsing of GHESv2 sub-tables as well.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c | 49 +++++++++++++++++++++++++++++++++++++++++++++---
 drivers/acpi/apei/hest.c |  7 +++++--
 include/acpi/ghes.h      |  5 ++++-
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 60746ef..b23160d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -45,6 +45,7 @@
 #include <linux/aer.h>
 #include <linux/nmi.h>
 
+#include <acpi/actbl1.h>
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
@@ -79,6 +80,10 @@
 	((struct acpi_hest_generic_status *)				\
 	 ((struct ghes_estatus_node *)(estatus_node) + 1))
 
+#define IS_HEST_TYPE_GENERIC_V2(ghes)				\
+	((struct acpi_hest_header *)ghes->generic)->type ==	\
+	 ACPI_HEST_TYPE_GENERIC_ERROR_V2
+
 /*
  * This driver isn't really modular, however for the time being,
  * continuing to use module_param is the easiest way to remain
@@ -248,10 +253,18 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 	ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
 	if (!ghes)
 		return ERR_PTR(-ENOMEM);
+
 	ghes->generic = generic;
+	if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
+		rc = apei_map_generic_address(
+			&ghes->generic_v2->read_ack_register);
+		if (rc)
+			goto err_free;
+	}
+
 	rc = apei_map_generic_address(&generic->error_status_address);
 	if (rc)
-		goto err_free;
+		goto err_unmap_read_ack_addr;
 	error_block_length = generic->error_block_length;
 	if (error_block_length > GHES_ESTATUS_MAX_SIZE) {
 		pr_warning(FW_WARN GHES_PFX
@@ -263,13 +276,17 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
 	if (!ghes->estatus) {
 		rc = -ENOMEM;
-		goto err_unmap;
+		goto err_unmap_status_addr;
 	}
 
 	return ghes;
 
-err_unmap:
+err_unmap_status_addr:
 	apei_unmap_generic_address(&generic->error_status_address);
+err_unmap_read_ack_addr:
+	if (IS_HEST_TYPE_GENERIC_V2(ghes))
+		apei_unmap_generic_address(
+			&ghes->generic_v2->read_ack_register);
 err_free:
 	kfree(ghes);
 	return ERR_PTR(rc);
@@ -279,6 +296,9 @@ static void ghes_fini(struct ghes *ghes)
 {
 	kfree(ghes->estatus);
 	apei_unmap_generic_address(&ghes->generic->error_status_address);
+	if (IS_HEST_TYPE_GENERIC_V2(ghes))
+		apei_unmap_generic_address(
+			&ghes->generic_v2->read_ack_register);
 }
 
 static inline int ghes_severity(int severity)
@@ -648,6 +668,23 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
+static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
+{
+	int rc;
+	u64 val = 0;
+
+	rc = apei_read(&val, &generic_v2->read_ack_register);
+	if (rc)
+		return rc;
+	val &= generic_v2->read_ack_preserve <<
+		generic_v2->read_ack_register.bit_offset;
+	val |= generic_v2->read_ack_write <<
+		generic_v2->read_ack_register.bit_offset;
+	rc = apei_write(val, &generic_v2->read_ack_register);
+
+	return rc;
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -660,6 +697,12 @@ static int ghes_proc(struct ghes *ghes)
 			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
 	}
 	ghes_do_proc(ghes, ghes->estatus);
+
+	if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
+		rc = ghes_ack_error(ghes->generic_v2);
+		if (rc)
+			return rc;
+	}
 out:
 	ghes_clear_estatus(ghes);
 	return 0;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index 792a0d9..ef725a9 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -52,6 +52,7 @@ static const int hest_esrc_len_tab[ACPI_HEST_TYPE_RESERVED] = {
 	[ACPI_HEST_TYPE_AER_ENDPOINT] = sizeof(struct acpi_hest_aer),
 	[ACPI_HEST_TYPE_AER_BRIDGE] = sizeof(struct acpi_hest_aer_bridge),
 	[ACPI_HEST_TYPE_GENERIC_ERROR] = sizeof(struct acpi_hest_generic),
+	[ACPI_HEST_TYPE_GENERIC_ERROR_V2] = sizeof(struct acpi_hest_generic_v2),
 };
 
 static int hest_esrc_len(struct acpi_hest_header *hest_hdr)
@@ -146,7 +147,8 @@ static int __init hest_parse_ghes_count(struct acpi_hest_header *hest_hdr, void
 {
 	int *count = data;
 
-	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR)
+	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR ||
+	    hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR_V2)
 		(*count)++;
 	return 0;
 }
@@ -157,7 +159,8 @@ static int __init hest_parse_ghes(struct acpi_hest_header *hest_hdr, void *data)
 	struct ghes_arr *ghes_arr = data;
 	int rc, i;
 
-	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR)
+	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR &&
+	    hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR_V2)
 		return 0;
 
 	if (!((struct acpi_hest_generic *)hest_hdr)->enabled)
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 720446c..68f088a 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -13,7 +13,10 @@
 #define GHES_EXITING		0x0002
 
 struct ghes {
-	struct acpi_hest_generic *generic;
+	union {
+		struct acpi_hest_generic *generic;
+		struct acpi_hest_generic_v2 *generic_v2;
+	};
 	struct acpi_hest_generic_status *estatus;
 	u64 buffer_paddr;
 	unsigned long flags;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related

* [PATCH V6 02/10] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481147303-7979-1-git-send-email-tbaicar@codeaurora.org>

Currently when a RAS error is reported it is not timestamped.
The ACPI 6.1 spec adds the timestamp field to the generic error
data entry v3 structure. The timestamp of when the firmware
generated the error is now being reported.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c    |  9 ++++---
 drivers/firmware/efi/cper.c | 63 +++++++++++++++++++++++++++++++++++----------
 include/acpi/ghes.h         | 22 ++++++++++++++++
 3 files changed, 77 insertions(+), 17 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index b23160d..2acbc60 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -420,7 +420,8 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 	int flags = -1;
 	int sec_sev = ghes_severity(gdata->error_severity);
 	struct cper_sec_mem_err *mem_err;
-	mem_err = (struct cper_sec_mem_err *)(gdata + 1);
+
+	mem_err = acpi_hest_generic_data_payload(gdata);
 
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
@@ -457,7 +458,8 @@ static void ghes_do_proc(struct ghes *ghes,
 		if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
 				 CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err;
-			mem_err = (struct cper_sec_mem_err *)(gdata+1);
+
+			mem_err = acpi_hest_generic_data_payload(gdata);
 			ghes_edac_report_mem_error(ghes, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
@@ -467,7 +469,8 @@ static void ghes_do_proc(struct ghes *ghes,
 		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
 				      CPER_SEC_PCIE)) {
 			struct cper_sec_pcie *pcie_err;
-			pcie_err = (struct cper_sec_pcie *)(gdata+1);
+
+			pcie_err = acpi_hest_generic_data_payload(gdata);
 			if (sev == GHES_SEV_RECOVERABLE &&
 			    sec_sev == GHES_SEV_RECOVERABLE &&
 			    pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index d425374..8fa4e23 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -32,6 +32,9 @@
 #include <linux/acpi.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
+#include <linux/printk.h>
+#include <linux/bcd.h>
+#include <acpi/ghes.h>
 
 #define INDENT_SP	" "
 
@@ -386,13 +389,37 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
 	pfx, pcie->bridge.secondary_status, pcie->bridge.control);
 }
 
+static void cper_estatus_print_section_v300(const char *pfx,
+	const struct acpi_hest_generic_data_v300 *gdata)
+{
+	__u8 hour, min, sec, day, mon, year, century, *timestamp;
+
+	if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) {
+		timestamp = (__u8 *)&(gdata->time_stamp);
+		sec = bcd2bin(timestamp[0]);
+		min = bcd2bin(timestamp[1]);
+		hour = bcd2bin(timestamp[2]);
+		day = bcd2bin(timestamp[4]);
+		mon = bcd2bin(timestamp[5]);
+		year = bcd2bin(timestamp[6]);
+		century = bcd2bin(timestamp[7]);
+		printk("%stime: %7s %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+			0x01 & *(timestamp + 3) ? "precise" : "", century,
+			year, mon, day, hour, min, sec);
+	}
+}
+
 static void cper_estatus_print_section(
-	const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
+	const char *pfx, struct acpi_hest_generic_data *gdata, int sec_no)
 {
 	uuid_le *sec_type = (uuid_le *)gdata->section_type;
 	__u16 severity;
 	char newpfx[64];
 
+	if (acpi_hest_generic_data_version(gdata) >= 3)
+		cper_estatus_print_section_v300(pfx,
+			(const struct acpi_hest_generic_data_v300 *)gdata);
+
 	severity = gdata->error_severity;
 	printk("%s""Error %d, type: %s\n", pfx, sec_no,
 	       cper_severity_str(severity));
@@ -403,14 +430,18 @@ static void cper_estatus_print_section(
 
 	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
 	if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_GENERIC)) {
-		struct cper_sec_proc_generic *proc_err = (void *)(gdata + 1);
+		struct cper_sec_proc_generic *proc_err;
+
+		proc_err = acpi_hest_generic_data_payload(gdata);
 		printk("%s""section_type: general processor error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*proc_err))
 			cper_print_proc_generic(newpfx, proc_err);
 		else
 			goto err_section_too_small;
 	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
-		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+		struct cper_sec_mem_err *mem_err;
+
+		mem_err = acpi_hest_generic_data_payload(gdata);
 		printk("%s""section_type: memory error\n", newpfx);
 		if (gdata->error_data_length >=
 		    sizeof(struct cper_sec_mem_err_old))
@@ -419,7 +450,9 @@ static void cper_estatus_print_section(
 		else
 			goto err_section_too_small;
 	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
-		struct cper_sec_pcie *pcie = (void *)(gdata + 1);
+		struct cper_sec_pcie *pcie;
+
+		pcie = acpi_hest_generic_data_payload(gdata);
 		printk("%s""section_type: PCIe error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*pcie))
 			cper_print_pcie(newpfx, pcie, gdata);
@@ -438,7 +471,7 @@ void cper_estatus_print(const char *pfx,
 			const struct acpi_hest_generic_status *estatus)
 {
 	struct acpi_hest_generic_data *gdata;
-	unsigned int data_len, gedata_len;
+	unsigned int data_len;
 	int sec_no = 0;
 	char newpfx[64];
 	__u16 severity;
@@ -451,12 +484,13 @@ void cper_estatus_print(const char *pfx,
 	printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
 	data_len = estatus->data_length;
 	gdata = (struct acpi_hest_generic_data *)(estatus + 1);
+
 	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
-	while (data_len >= sizeof(*gdata)) {
-		gedata_len = gdata->error_data_length;
+
+	while (data_len >= acpi_hest_generic_data_size(gdata)) {
 		cper_estatus_print_section(newpfx, gdata, sec_no);
-		data_len -= gedata_len + sizeof(*gdata);
-		gdata = (void *)(gdata + 1) + gedata_len;
+		data_len -= acpi_hest_generic_data_record_size(gdata);
+		gdata = acpi_hest_generic_data_next(gdata);
 		sec_no++;
 	}
 }
@@ -486,12 +520,13 @@ int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
 		return rc;
 	data_len = estatus->data_length;
 	gdata = (struct acpi_hest_generic_data *)(estatus + 1);
-	while (data_len >= sizeof(*gdata)) {
-		gedata_len = gdata->error_data_length;
-		if (gedata_len > data_len - sizeof(*gdata))
+
+	while (data_len >= acpi_hest_generic_data_size(gdata)) {
+		gedata_len = acpi_hest_generic_data_error_length(gdata);
+		if (gedata_len > data_len - acpi_hest_generic_data_size(gdata))
 			return -EINVAL;
-		data_len -= gedata_len + sizeof(*gdata);
-		gdata = (void *)(gdata + 1) + gedata_len;
+		data_len -= gedata_len + acpi_hest_generic_data_size(gdata);
+		gdata = acpi_hest_generic_data_next(gdata);
 	}
 	if (data_len)
 		return -EINVAL;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 68f088a..6ae318b 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -12,6 +12,18 @@
 #define GHES_TO_CLEAR		0x0001
 #define GHES_EXITING		0x0002
 
+#define acpi_hest_generic_data_error_length(gdata)	\
+	(((struct acpi_hest_generic_data *)(gdata))->error_data_length)
+#define acpi_hest_generic_data_size(gdata)		\
+	((acpi_hest_generic_data_version(gdata) >= 3) ?	\
+	sizeof(struct acpi_hest_generic_data_v300) :	\
+	sizeof(struct acpi_hest_generic_data))
+#define acpi_hest_generic_data_record_size(gdata)	\
+	(acpi_hest_generic_data_size(gdata) +		\
+	acpi_hest_generic_data_error_length(gdata))
+#define acpi_hest_generic_data_next(gdata)		\
+	((void *)(gdata) + acpi_hest_generic_data_record_size(gdata))
+
 struct ghes {
 	union {
 		struct acpi_hest_generic *generic;
@@ -73,3 +85,13 @@ static inline void ghes_edac_unregister(struct ghes *ghes)
 {
 }
 #endif
+
+#define acpi_hest_generic_data_version(gdata)			\
+	(gdata->revision >> 8)
+
+static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data *gdata)
+{
+	return acpi_hest_generic_data_version(gdata) >= 3 ?
+		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
+		gdata + 1;
+}
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related

* [PATCH V6 03/10] efi: parse ARM processor error
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481147303-7979-1-git-send-email-tbaicar@codeaurora.org>

Add support for ARM Common Platform Error Record (CPER).
UEFI 2.6 specification adds support for ARM specific
processor error information to be reported as part of the
CPER records. This provides more detail on for processor error logs.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 drivers/firmware/efi/cper.c | 128 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cper.h        |  53 ++++++++++++++++++
 2 files changed, 181 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 8fa4e23..1ac2572 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 static const char * const proc_type_strs[] = {
 	"IA32/X64",
 	"IA64",
+	"ARM",
 };
 
 static const char * const proc_isa_strs[] = {
 	"IA32",
 	"IA64",
 	"X64",
+	"ARM A32/T32",
+	"ARM A64",
 };
 
 static const char * const proc_error_type_strs[] = {
@@ -139,6 +142,18 @@ static const char * const proc_flag_strs[] = {
 	"corrected",
 };
 
+static const char * const arm_reg_ctx_strs[] = {
+	"AArch32 general purpose registers",
+	"AArch32 EL1 context registers",
+	"AArch32 EL2 context registers",
+	"AArch32 secure context registers",
+	"AArch64 general purpose registers",
+	"AArch64 EL1 context registers",
+	"AArch64 EL2 context registers",
+	"AArch64 EL3 context registers",
+	"Misc. system register structure",
+};
+
 static void cper_print_proc_generic(const char *pfx,
 				    const struct cper_sec_proc_generic *proc)
 {
@@ -184,6 +199,110 @@ static void cper_print_proc_generic(const char *pfx,
 		printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
 }
 
+static void cper_print_proc_arm(const char *pfx,
+				const struct cper_sec_proc_arm *proc)
+{
+	int i, len, max_ctx_type;
+	struct cper_arm_err_info *err_info;
+	struct cper_arm_ctx_info *ctx_info;
+	char newpfx[64];
+
+	printk("%ssection length: %d\n", pfx, proc->section_length);
+	printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
+
+	len = proc->section_length - (sizeof(*proc) +
+		proc->err_info_num * (sizeof(*err_info)));
+	if (len < 0) {
+		printk("%ssection length is too small\n", pfx);
+		printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
+		return;
+	}
+
+	if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
+		printk("%sMPIDR: 0x%016llx\n", pfx, proc->mpidr);
+	if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
+		printk("%serror affinity level: %d\n", pfx,
+			proc->affinity_level);
+	if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
+		printk("%srunning state: %d\n", pfx, proc->running_state);
+		printk("%sPSCI state: %d\n", pfx, proc->psci_state);
+	}
+
+	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+
+	err_info = (struct cper_arm_err_info *)(proc + 1);
+	for (i = 0; i < proc->err_info_num; i++) {
+		printk("%sError info structure %d:\n", pfx, i);
+		printk("%sversion:%d\n", newpfx, err_info->version);
+		printk("%slength:%d\n", newpfx, err_info->length);
+		if (err_info->validation_bits &
+		    CPER_ARM_INFO_VALID_MULTI_ERR) {
+			if (err_info->multiple_error == 0)
+				printk("%ssingle error\n", newpfx);
+			else if (err_info->multiple_error == 1)
+				printk("%smultiple errors\n", newpfx);
+			else
+				printk("%smultiple errors count:%d\n",
+				newpfx, err_info->multiple_error);
+		}
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_FLAGS) {
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_FIRST)
+				printk("%sfirst error captured\n", newpfx);
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_LAST)
+				printk("%slast error captured\n", newpfx);
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_PROPAGATED)
+				printk("%spropagated error captured\n",
+				       newpfx);
+		}
+		printk("%serror_type: %d, %s\n", newpfx, err_info->type,
+			err_info->type < ARRAY_SIZE(proc_error_type_strs) ?
+			proc_error_type_strs[err_info->type] : "unknown");
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
+			printk("%serror_info: 0x%016llx\n", newpfx,
+			       err_info->error_info);
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
+			printk("%svirtual fault address: 0x%016llx\n",
+				newpfx, err_info->virt_fault_addr);
+		if (err_info->validation_bits &
+		    CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
+			printk("%sphysical fault address: 0x%016llx\n",
+				newpfx, err_info->physical_fault_addr);
+		err_info += 1;
+	}
+	ctx_info = (struct cper_arm_ctx_info *)err_info;
+	max_ctx_type = (sizeof(arm_reg_ctx_strs) /
+			sizeof(arm_reg_ctx_strs[0]) - 1);
+	for (i = 0; i < proc->context_info_num; i++) {
+		int size = sizeof(*ctx_info) + ctx_info->size;
+
+		printk("%sContext info structure %d:\n", pfx, i);
+		if (len < size) {
+			printk("%ssection length is too small\n", newpfx);
+			return;
+		}
+		if (ctx_info->type > max_ctx_type) {
+			printk("%sInvalid context type: %d\n",	newpfx,
+						ctx_info->type);
+			printk("%sMax context type: %d\n", newpfx,
+						max_ctx_type);
+			return;
+		}
+		printk("%sregister context type %d: %s\n", newpfx,
+			ctx_info->type, arm_reg_ctx_strs[ctx_info->type]);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+				(ctx_info + 1), ctx_info->size, 0);
+		len -= size;
+		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + size);
+	}
+
+	if (len > 0) {
+		printk("%sVendor specific error info has %d bytes:\n", pfx,
+		       len);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4, ctx_info,
+				len, 0);
+	}
+}
+
 static const char * const mem_err_type_strs[] = {
 	"unknown",
 	"no error",
@@ -458,6 +577,15 @@ static void cper_estatus_print_section(
 			cper_print_pcie(newpfx, pcie, gdata);
 		else
 			goto err_section_too_small;
+	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_ARM)) {
+		struct cper_sec_proc_arm *arm_err;
+
+		arm_err = acpi_hest_generic_data_payload(gdata);
+		printk("%ssection_type: ARM processor error\n", newpfx);
+		if (gdata->error_data_length >= sizeof(*arm_err))
+			cper_print_proc_arm(newpfx, arm_err);
+		else
+			goto err_section_too_small;
 	} else
 		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
 
diff --git a/include/linux/cper.h b/include/linux/cper.h
index dcacb1a..6064eac 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -180,6 +180,10 @@ enum {
 #define CPER_SEC_PROC_IPF						\
 	UUID_LE(0xE429FAF1, 0x3CB7, 0x11D4, 0x0B, 0xCA, 0x07, 0x00,	\
 		0x80, 0xC7, 0x3C, 0x88, 0x81)
+/* Processor Specific: ARM */
+#define CPER_SEC_PROC_ARM						\
+	UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05,	\
+		0x1D, 0x5D, 0x46, 0xB0)
 /* Platform Memory */
 #define CPER_SEC_PLATFORM_MEM						\
 	UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83,	\
@@ -255,6 +259,21 @@ enum {
 
 #define CPER_PCIE_SLOT_SHIFT			3
 
+#define CPER_ARM_VALID_MPIDR			0x00000001
+#define CPER_ARM_VALID_AFFINITY_LEVEL		0x00000002
+#define CPER_ARM_VALID_RUNNING_STATE		0x00000004
+#define CPER_ARM_VALID_VENDOR_INFO		0x00000008
+
+#define CPER_ARM_INFO_VALID_MULTI_ERR		0x0001
+#define CPER_ARM_INFO_VALID_FLAGS		0x0002
+#define CPER_ARM_INFO_VALID_ERR_INFO		0x0004
+#define CPER_ARM_INFO_VALID_VIRT_ADDR		0x0008
+#define CPER_ARM_INFO_VALID_PHYSICAL_ADDR	0x0010
+
+#define CPER_ARM_INFO_FLAGS_FIRST		0x0001
+#define CPER_ARM_INFO_FLAGS_LAST		0x0002
+#define CPER_ARM_INFO_FLAGS_PROPAGATED		0x0004
+
 /*
  * All tables and structs must be byte-packed to match CPER
  * specification, since the tables are provided by the system BIOS
@@ -340,6 +359,40 @@ struct cper_ia_proc_ctx {
 	__u64	mm_reg_addr;
 };
 
+/* ARM Processor Error Section */
+struct cper_sec_proc_arm {
+	__u32	validation_bits;
+	__u16	err_info_num; /* Number of Processor Error Info */
+	__u16	context_info_num; /* Number of Processor Context Info Records*/
+	__u32	section_length;
+	__u8	affinity_level;
+	__u8	reserved[3];	/* must be zero */
+	__u64	mpidr;
+	__u64	midr;
+	__u32	running_state; /* Bit 0 set - Processor running. PSCI = 0 */
+	__u32	psci_state;
+};
+
+/* ARM Processor Error Information Structure */
+struct cper_arm_err_info {
+	__u8	version;
+	__u8	length;
+	__u16	validation_bits;
+	__u8	type;
+	__u16	multiple_error;
+	__u8	flags;
+	__u64	error_info;
+	__u64	virt_fault_addr;
+	__u64	physical_fault_addr;
+};
+
+/* ARM Processor Context Information Structure */
+struct cper_arm_ctx_info {
+	__u16	version;
+	__u16	type;
+	__u32	size;
+};
+
 /* Old Memory Error Section UEFI 2.1, 2.2 */
 struct cper_sec_mem_err_old {
 	__u64	validation_bits;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related

* [PATCH V6 04/10] arm64: exception: handle Synchronous External Abort
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481147303-7979-1-git-send-email-tbaicar@codeaurora.org>

SEA exceptions are often caused by an uncorrected hardware
error, and are handled when data abort and instruction abort
exception classes have specific values for their Fault Status
Code.
When SEA occurs, before killing the process, go through
the handlers registered in the notification list.
Update fault_info[] with specific SEA faults so that the
new SEA handler is used.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/include/asm/system_misc.h | 13 ++++++++
 arch/arm64/mm/fault.c                | 58 +++++++++++++++++++++++++++++-------
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 57f110b..9040e1d 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -64,4 +64,17 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
 
 #endif	/* __ASSEMBLY__ */
 
+/*
+ * The functions below are used to register and unregister callbacks
+ * that are to be invoked when a Synchronous External Abort (SEA)
+ * occurs. An SEA is raised by certain fault status codes that have
+ * either data or instruction abort as the exception class, and
+ * callbacks may be registered to parse or handle such hardware errors.
+ *
+ * Registered callbacks are run in an interrupt/atomic context. They
+ * are not allowed to block or sleep.
+ */
+int register_synchronous_ext_abort_notifier(struct notifier_block *nb);
+void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb);
+
 #endif	/* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 05d2bd7..fcc49f1 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -39,6 +39,22 @@
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 
+/*
+ * GHES SEA handler code may register a notifier call here to
+ * handle HW error record passed from platform.
+ */
+static ATOMIC_NOTIFIER_HEAD(sea_handler_chain);
+
+int register_synchronous_ext_abort_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&sea_handler_chain, nb);
+}
+
+void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&sea_handler_chain, nb);
+}
+
 static const char *fault_name(unsigned int esr);
 
 #ifdef CONFIG_KPROBES
@@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 1;
 }
 
+/*
+ * This abort handler deals with Synchronous External Abort.
+ * It calls notifiers, and then returns "fault".
+ */
+static int do_synch_ext_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+	struct siginfo info;
+
+	atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
+
+	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
+		 fault_name(esr), esr, addr);
+
+	info.si_signo = SIGBUS;
+	info.si_errno = 0;
+	info.si_code  = 0;
+	info.si_addr  = (void __user *)addr;
+	arm64_notify_die("", regs, &info, esr);
+
+	return 0;
+}
+
 static const struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
 	int	sig;
@@ -502,22 +540,22 @@ static const struct fault_info {
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"synchronous external abort"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 0 SEA (trans tbl walk)"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 1 SEA (trans tbl walk)"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 2 SEA (trans tbl walk)"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 3 SEA (trans tbl walk)"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"synchronous parity or ECC err" },
 	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 0 synch parity error"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 1 synch parity error"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 2 synch parity error"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 3 synch parity error"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
 	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
 	{ do_bad,		SIGBUS,  0,		"unknown 34"			},
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related

* [PATCH V6 05/10] acpi: apei: handle SEA notification type for ARMv8
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481147303-7979-1-git-send-email-tbaicar@codeaurora.org>

ARM APEI extension proposal added SEA (Synchrounous External
Abort) notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/Kconfig        |  1 +
 drivers/acpi/apei/Kconfig | 14 ++++++++
 drivers/acpi/apei/ghes.c  | 83 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b380c87..ae34349 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -53,6 +53,7 @@ config ARM64
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
 	select HAVE_ACPI_APEI if (ACPI && EFI)
+	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_BITREVERSE
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8..3786ff1 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
 config HAVE_ACPI_APEI_NMI
 	bool
 
+config HAVE_ACPI_APEI_SEA
+	bool "APEI Synchronous External Abort logging/recovering support"
+	depends on ARM64
+	help
+	  This option should be enabled if the system supports
+	  firmware first handling of SEA (Synchronous External Abort).
+	  SEA happens with certain faults of data abort or instruction
+	  abort synchronous exceptions on ARMv8 systems. If a system
+	  supports firmware first handling of SEA, the platform analyzes
+	  and handles hardware error notifications with SEA, and it may then
+	  form a HW error record for the OS to parse and handle. This
+	  option allows the OS to look for such HW error record, and
+	  take appropriate action.
+
 config ACPI_APEI
 	bool "ACPI Platform Error Interface (APEI)"
 	select MISC_FILESYSTEMS
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 2acbc60..66ab3fd 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -50,6 +50,10 @@
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+#include <asm/system_misc.h>
+#endif
+
 #include "apei-internal.h"
 
 #define GHES_PFX	"GHES: "
@@ -767,6 +771,62 @@ static struct notifier_block ghes_notifier_sci = {
 	.notifier_call = ghes_notify_sci,
 };
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+static LIST_HEAD(ghes_sea);
+
+static int ghes_notify_sea(struct notifier_block *this,
+				  unsigned long event, void *data)
+{
+	struct ghes *ghes;
+	int ret = NOTIFY_DONE;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
+		if (!ghes_proc(ghes))
+			ret = NOTIFY_OK;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static struct notifier_block ghes_notifier_sea = {
+	.notifier_call = ghes_notify_sea,
+};
+
+static int ghes_sea_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	if (list_empty(&ghes_sea))
+		register_synchronous_ext_abort_notifier(&ghes_notifier_sea);
+	list_add_rcu(&ghes->list, &ghes_sea);
+	mutex_unlock(&ghes_list_mutex);
+	return 0;
+}
+
+static void ghes_sea_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	if (list_empty(&ghes_sea))
+		unregister_synchronous_ext_abort_notifier(&ghes_notifier_sea);
+	mutex_unlock(&ghes_list_mutex);
+}
+#else /* CONFIG_HAVE_ACPI_APEI_SEA */
+static inline int ghes_sea_add(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+	return -ENOTSUPP;
+}
+
+static inline void ghes_sea_remove(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+}
+#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1011,6 +1071,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_EXTERNAL:
 	case ACPI_HEST_NOTIFY_SCI:
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
+				generic->header.source_id);
+			rc = -ENOTSUPP;
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1022,6 +1090,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
 		goto err;
+	case ACPI_HEST_NOTIFY_GPIO:
+	case ACPI_HEST_NOTIFY_SEI:
+	case ACPI_HEST_NOTIFY_GSIV:
+		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
+			generic->header.source_id, generic->header.source_id);
+		rc = -ENOTSUPP;
+		goto err;
 	default:
 		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
 			   generic->notify.type, generic->header.source_id);
@@ -1076,6 +1151,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		list_add_rcu(&ghes->list, &ghes_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		rc = ghes_sea_add(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1118,6 +1198,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 			unregister_acpi_hed_notifier(&ghes_notifier_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		ghes_sea_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related

* [PATCH V6 06/10] acpi: apei: panic OS with fatal error status block
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481147303-7979-1-git-send-email-tbaicar@codeaurora.org>

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Even if an error status block's severity is fatal, the kernel does not
honor the severity level and panic.

With the firmware first model, the platform could inform the OS about a
fatal hardware error through the non-NMI GHES notification type. The OS
should panic when a hardware error record is received with this
severity.

Call panic() after CPER data in error status block is printed if
severity is fatal, before each error section is handled.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 drivers/acpi/apei/ghes.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 66ab3fd..895dec7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -141,6 +141,8 @@ static unsigned long ghes_estatus_pool_size_request;
 static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
+static int ghes_panic_timeout __read_mostly = 30;
+
 static int ghes_ioremap_init(void)
 {
 	ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
@@ -692,6 +694,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
 	return rc;
 }
 
+static void __ghes_call_panic(void)
+{
+	if (panic_timeout == 0)
+		panic_timeout = ghes_panic_timeout;
+	panic("Fatal hardware error!");
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -703,6 +712,10 @@ static int ghes_proc(struct ghes *ghes)
 		if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
 			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
 	}
+	if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
+		__ghes_call_panic();
+	}
+
 	ghes_do_proc(ghes, ghes->estatus);
 
 	if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
@@ -847,8 +860,6 @@ static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
 
-static int ghes_panic_timeout	__read_mostly = 30;
-
 static void ghes_proc_in_irq(struct irq_work *irq_work)
 {
 	struct llist_node *llnode, *next;
@@ -941,9 +952,7 @@ static void __ghes_panic(struct ghes *ghes)
 	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
 
 	/* reboot to log the error! */
-	if (panic_timeout == 0)
-		panic_timeout = ghes_panic_timeout;
-	panic("Fatal hardware error!");
+	__ghes_call_panic();
 }
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related

* [PATCH V6 07/10] efi: print unrecognized CPER section
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481147303-7979-1-git-send-email-tbaicar@codeaurora.org>

UEFI spec allows for non-standard section in Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match with
one of the section types that the kernel knows how to parse, the
section is skipped. Therefore, user is not able to see
such CPER data, for instance, error record of non-standard section.

For above mentioned case, this change prints out the raw data in
hex in dmesg buffer. Data length is taken from Error Data length
field of Generic Error Data Entry.

Following is a sample output from dmesg:
[  115.771702] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[  115.779042] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[  115.787456] {1}[Hardware Error]: event severity: corrected
[  115.792927] {1}[Hardware Error]:  Error 0, type: corrected
[  115.798415] {1}[Hardware Error]:  fru_id: 00000000-0000-0000-0000-000000000000
[  115.805596] {1}[Hardware Error]:  fru_text:
[  115.816105] {1}[Hardware Error]:  section type: d2e2621c-f936-468d-0d84-15a4ed015c8b
[  115.823880] {1}[Hardware Error]:  section length: 88
[  115.828779] {1}[Hardware Error]:   00000000: 01000001 00000002 5f434345 525f4543
[  115.836153] {1}[Hardware Error]:   00000010: 0000574d 00000000 00000000 00000000
[  115.843531] {1}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
[  115.850908] {1}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000
[  115.858288] {1}[Hardware Error]:   00000040: fe800000 00000000 00000004 5f434345
[  115.865665] {1}[Hardware Error]:   00000050: 525f4543 0000574d

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
---
 drivers/firmware/efi/cper.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 1ac2572..6cfcaa2 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -586,8 +586,16 @@ static void cper_estatus_print_section(
 			cper_print_proc_arm(newpfx, arm_err);
 		else
 			goto err_section_too_small;
-	} else
-		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
+	} else {
+		const void *unknown_err;
+
+		unknown_err = acpi_hest_generic_data_payload(gdata);
+		printk("%ssection type: %pUl\n", newpfx, sec_type);
+		printk("%ssection length: %d\n", newpfx,
+		       gdata->error_data_length);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+			       unknown_err, gdata->error_data_length, 0);
+	}
 
 	return;
 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related

* [PATCH V6 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
From: Tyler Baicar @ 2016-12-07 21:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481147303-7979-1-git-send-email-tbaicar@codeaurora.org>

UEFI spec allows for non-standard section in Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match with
any section type that the kernel knows how to parse, trace event
is not generated for such section. And thus user is not able to know
happening of such hardware error, including error record of
non-standard section.

This commit generates a trace event which contains raw error data
for unrecognized CPER section.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
---
 drivers/acpi/apei/ghes.c | 22 ++++++++++++++++++++--
 drivers/ras/ras.c        |  1 +
 include/ras/ras_event.h  | 45 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 895dec7..f2217da 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -44,11 +44,13 @@
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/nmi.h>
+#include <linux/uuid.h>
 
 #include <acpi/actbl1.h>
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
+#include <ras/ras_event.h>
 
 #ifdef CONFIG_HAVE_ACPI_APEI_SEA
 #include <asm/system_misc.h>
@@ -457,11 +459,21 @@ static void ghes_do_proc(struct ghes *ghes,
 {
 	int sev, sec_sev;
 	struct acpi_hest_generic_data *gdata;
+	uuid_le sec_type;
+	uuid_le *fru_id = &NULL_UUID_LE;
+	char *fru_text = "";
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
 		sec_sev = ghes_severity(gdata->error_severity);
-		if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+		sec_type = *(uuid_le *)gdata->section_type;
+
+		if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+			fru_id = (uuid_le *)gdata->fru_id;
+		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+			fru_text = gdata->fru_text;
+
+		if (!uuid_le_cmp(sec_type,
 				 CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err;
 
@@ -472,7 +484,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_handle_memory_failure(gdata, sev);
 		}
 #ifdef CONFIG_ACPI_APEI_PCIEAER
-		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+		else if (!uuid_le_cmp(sec_type,
 				      CPER_SEC_PCIE)) {
 			struct cper_sec_pcie *pcie_err;
 
@@ -505,6 +517,12 @@ static void ghes_do_proc(struct ghes *ghes,
 
 		}
 #endif
+		else {
+			void *unknown_err = acpi_hest_generic_data_payload(gdata);
+			trace_unknown_sec_event(&sec_type,
+					fru_id, fru_text, sec_sev,
+					unknown_err, gdata->error_data_length);
+		}
 	}
 }
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index b67dd36..fb2500b 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -27,3 +27,4 @@ subsys_initcall(ras_init);
 EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
+EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 1791a12..5861b6f 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -162,6 +162,51 @@ TRACE_EVENT(mc_event,
 );
 
 /*
+ * Unknown Section Report
+ *
+ * This event is generated when hardware detected a hardware
+ * error event, which may be of non-standard section as defined
+ * in UEFI spec appendix "Common Platform Error Record", or may
+ * be of sections for which TRACE_EVENT is not defined.
+ *
+ */
+TRACE_EVENT(unknown_sec_event,
+
+	TP_PROTO(const uuid_le *sec_type,
+		 const uuid_le *fru_id,
+		 const char *fru_text,
+		 const u8 sev,
+		 const u8 *err,
+		 const u32 len),
+
+	TP_ARGS(sec_type, fru_id, fru_text, sev, err, len),
+
+	TP_STRUCT__entry(
+		__array(char, sec_type, 16)
+		__array(char, fru_id, 16)
+		__string(fru_text, fru_text)
+		__field(u8, sev)
+		__field(u32, len)
+		__dynamic_array(u8, buf, len)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->sec_type, sec_type, sizeof(uuid_le));
+		memcpy(__entry->fru_id, fru_id, sizeof(uuid_le));
+		__assign_str(fru_text, fru_text);
+		__entry->sev = sev;
+		__entry->len = len;
+		memcpy(__get_dynamic_array(buf), err, len);
+	),
+
+	TP_printk("severity: %d; sec type:%pU; FRU: %pU %s; data len:%d; raw data:%s",
+		  __entry->sev, __entry->sec_type,
+		  __entry->fru_id, __get_str(fru_text),
+		  __entry->len,
+		  __print_hex(__get_dynamic_array(buf), __entry->len))
+);
+
+/*
  * PCIe AER Trace event
  *
  * These events are generated when hardware detects a corrected or
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox