* Re: linux-next: Tree for May 14 (objtool 2/2)
From: Kees Cook @ 2020-05-29 19:30 UTC
  To: Josh Poimboeuf
  Cc: Randy Dunlap, Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, Peter Zijlstra
In-Reply-To: <20200529175456.tbedus7okjrlkao7@treble>

On Fri, May 29, 2020 at 12:54:56PM -0500, Josh Poimboeuf wrote:
> On Thu, May 28, 2020 at 11:06:32PM -0700, Kees Cook wrote:
> > diff --git a/lib/Kconfig.ubsan b/lib/Kconfig.ubsan
> > index 929211039bac..27bcc2568c95 100644
> > --- a/lib/Kconfig.ubsan
> > +++ b/lib/Kconfig.ubsan
> > @@ -63,7 +63,7 @@ config UBSAN_SANITIZE_ALL
> >  config UBSAN_ALIGNMENT
> >         bool "Enable checks for pointers alignment"
> >         default !HAVE_EFFICIENT_UNALIGNED_ACCESS
> > -       depends on !X86 || !COMPILE_TEST
> > +       depends on !UBSAN_TRAP
> >         help
> >           This option enables the check of unaligned memory accesses.
> >           Enabling this option on architectures that support unaligned
> > 
> > How about that?
> 
> But I thought you said the alignment traps might be useful on other
> arches?  Should it be
> 
> 	depends on !X86 || !UBSAN_TRAP
> 
> ?

I was just trying to avoid objtool there, but really, UBSAN_TRAP is
likely insane for unaligned access checks entirely. If anyone ever needs
it, they can adjust. :)
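A rough way to see how the two proposed `depends on` lines differ is to model each one as a C boolean over the two symbols involved. This is only a truth-table sketch: the function names are made up for illustration, and in Kconfig itself a false `depends on` simply keeps UBSAN_ALIGNMENT from being enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Kees's version: depends on !UBSAN_TRAP (the architecture plays no part). */
bool alignment_allowed_kees(bool x86, bool ubsan_trap)
{
	(void)x86;
	return !ubsan_trap;
}

/* Josh's alternative: depends on !X86 || !UBSAN_TRAP */
bool alignment_allowed_josh(bool x86, bool ubsan_trap)
{
	return !x86 || !ubsan_trap;
}
```

The two agree in every case except a non-x86 build with UBSAN_TRAP=y, which only Josh's version would allow; that is the case Kees chose not to support.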

-- 
Kees Cook

* Re: [PATCH net-next 06/11] net: dsa: ocelot: create a template for the DSA tags on xmit
From: Vladimir Oltean @ 2020-05-29 19:31 UTC
  To: Andrew Lunn
  Cc: David S. Miller, netdev, Vivien Didelot, Florian Fainelli,
	Russell King - ARM Linux admin, Antoine Tenart, Alexandre Belloni,
	Horatiu Vultur, Allan W. Nielsen, Microchip Linux Driver Support,
	Alexandru Marginean, Claudiu Manoil, Madalin Bucur (OSS),
	radu-andrei.bulie, fido_max
In-Reply-To: <20200528145058.GA840827@lunn.ch>

Hi Andrew,

On Thu, 28 May 2020 at 17:51, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Thu, May 28, 2020 at 02:41:08AM +0300, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > With this patch we try to kill 2 birds with 1 stone.
> >
> > First of all, some switches that use tag_ocelot.c don't have the exact
> > same bitfield layout for the DSA tags. The destination ports field is
> > different for Seville VSC9953 for example. So the choices are to either
> > duplicate tag_ocelot.c into a new tag_seville.c (sub-optimal) or somehow
> > take into account a supposed ocelot->dest_ports_offset when packing this
> > field into the DSA injection header (again not ideal).
> >
> > Secondly, tag_ocelot.c already needs to memset a 128-bit area to zero
> > and call some packing() functions of dubious performance in the
> > fastpath. And most of the values it needs to pack are pretty much
> > constant (BYPASS=1, SRC_PORT=CPU, DEST=port index). So it would be good
> > if we could improve that.
> >
> > The proposed solution is to allocate a memory area per port at probe
> > time, initialize that with the statically defined bits as per chip
> > hardware revision, and just perform a simpler memcpy in the fastpath.
>
> Hi Vladimir
>
> We try to keep the taggers independent of the DSA drivers. I think
> tag_ocelot.c is the only one that breaks this.
>
> tag drivers are kernel modules. They have all the options of a kernel
> module, such as init and exit functions. You could create these
> templates in the module init function, and clean them up in the exit
> function. You can also register multiple taggers in one
> driver. tag_brcm.c does this as an example. So you can have a Seville
> tagger which uses different templates to ocelot.
>
>        Andrew

I don't particularly like that tag_brcm.c is riddled with #if /
#endif blocks; they make it difficult to follow.

And if I allocate/free the xmit template in the
dsa_tag_driver_module_init / dsa_tag_driver_module_exit, how can I
reach the pointer to the correct per-switch-per-port template in the
ocelot_xmit function?

Please note that ocelot_xmit is already stateful, and it _needs_ to be
stateful: for 1588, it saves and increments the TX timestamp ID which
will be matched to the data that is received in felix_irq_handler.
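The per-port template idea from the patch description, together with the stateful TX timestamp ID, can be sketched in plain C roughly as follows. Everything here is hypothetical: the struct, the function names, the port count and especially the byte offsets are invented for illustration. The real Ocelot/Seville injection headers pack bitfields rather than whole bytes, so a per-chip `dest_offset` stands in for the layout difference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INJ_HDR_LEN 16 /* 128-bit injection header */
#define NUM_PORTS   4  /* hypothetical port count */
#define TS_ID_BYTE  4  /* hypothetical position of the timestamp ID */

/* Hypothetical per-switch state that the xmit path can reach. */
struct switch_priv {
	uint8_t xmit_template[NUM_PORTS][INJ_HDR_LEN];
	uint8_t ts_id[NUM_PORTS];
};

/* Probe time: precompute the constant part of every port's header. */
void build_templates(struct switch_priv *priv, uint8_t cpu_port,
		     unsigned int dest_offset)
{
	/* dest_offset models the per-chip layout difference. */
	assert(dest_offset > 1 && dest_offset < INJ_HDR_LEN &&
	       dest_offset != TS_ID_BYTE);

	for (unsigned int port = 0; port < NUM_PORTS; port++) {
		uint8_t *t = priv->xmit_template[port];

		memset(t, 0, INJ_HDR_LEN);
		t[0] = 1;                               /* BYPASS = 1 */
		t[1] = cpu_port;                        /* SRC_PORT = CPU */
		t[dest_offset] = (uint8_t)(1u << port); /* DEST = this port */
		priv->ts_id[port] = 0;
	}
}

/* Fast path: a single memcpy replaces memset() plus packing() calls. */
void fill_injection_header(struct switch_priv *priv, unsigned int port,
			   uint8_t *hdr, int want_tx_timestamp)
{
	assert(port < NUM_PORTS);
	memcpy(hdr, priv->xmit_template[port], INJ_HDR_LEN);

	/* Stateful part: the ID is later matched against the TX timestamp. */
	if (want_tx_timestamp)
		hdr[TS_ID_BYTE] = priv->ts_id[port]++;
}
```

Where the template storage should live (switch driver private data versus tagger module state) is the open question in this thread; the sketch simply assumes the xmit path can reach `struct switch_priv`.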

And sja1105 also breaks the tagger/driver separation, in even "worse"
ways: see sja1105_xmit_tpid, which transmits a different frame
depending on which state the driver is in, and sja1105_decode_subvlan,
which on RX looks up a table populated by the driver.

Generally speaking, I don't see any good reason why keeping the tagger
and the driver separated should be a design goal, especially when the
hotpath depends on stateful information (and the tagging driver can't
do anything at all without a backing switch driver anyway). Separation
is possible only in the simplest of cases; as more advanced features
become necessary (not arguing that the template I'm adding here is
"advanced" stuff), it becomes practically impossible. Please also see
this tag_ocelot.c patch, which needs to take the classified VLAN from
the DSA tag, or not, depending on the VLAN awareness state of the
port:
https://patchwork.ozlabs.org/project/netdev/patch/20200506074900.28529-7-xiaoliang.yang_1@nxp.com/

Thanks,
-Vladimir

* Re: mmotm 2020-05-13-20-30 uploaded (objtool warnings)
From: Linus Torvalds @ 2020-05-29 19:31 UTC
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Christoph Hellwig, Randy Dunlap, Andrew Morton,
	Mark Brown, linux-fsdevel, Linux Kernel Mailing List, Linux-MM,
	Linux Next Mailing List, Michal Hocko, mm-commits,
	Stephen Rothwell, Al Viro, the arch/x86 maintainers,
	Steven Rostedt
In-Reply-To: <20200529165011.o7vvhn4wcj6zjxux@treble>

On Fri, May 29, 2020 at 9:50 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> From staring at the asm I think the generated code is correct, it's just
> that the nested likelys with ftrace profiling cause GCC to converge the
> error/success paths.  But objtool doesn't do register value tracking so
> it's not smart enough to know that it's safe.

I'm surprised that gcc doesn't end up doing the obvious CSE and then
branch following and folding it all away in the end, but your patch is
obviously the right thing to do regardless, so ack on that.

Al - I think this had best go into your uaccess cleanup branch with
that csum-wrapper update, to avoid any unnecessary conflicts or
dependencies.

             Linus

* Re: [PATCH v10 1/2] dt-bindings: mtd: Add Nand Flash Controller support for Intel LGM SoC
From: Rob Herring @ 2020-05-29 19:31 UTC
  To: Ramuthevar,Vadivel MuruganX
  Cc: linux-kernel, linux-mtd, devicetree, miquel.raynal, richard,
	vigneshr, arnd, brendanhiggins, tglx, boris.brezillon,
	anders.roxell, masonccyang, linux-mips, hauke.mehrtens,
	andriy.shevchenko, qi-ming.wu, cheol.yong.kim
In-Reply-To: <20200528153929.46859-2-vadivel.muruganx.ramuthevar@linux.intel.com>

On Thu, May 28, 2020 at 11:39:28PM +0800, Ramuthevar,Vadivel MuruganX wrote:
> From: Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com>
> 
> Add YAML file for dt-bindings to support NAND Flash Controller
> on Intel's Lightning Mountain SoC.
> 
> Signed-off-by: Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com>
> ---
>  .../devicetree/bindings/mtd/intel,lgm-nand.yaml    | 93 ++++++++++++++++++++++
>  1 file changed, 93 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/mtd/intel,lgm-nand.yaml
> 
> diff --git a/Documentation/devicetree/bindings/mtd/intel,lgm-nand.yaml b/Documentation/devicetree/bindings/mtd/intel,lgm-nand.yaml
> new file mode 100644
> index 000000000000..afecc9920e04
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mtd/intel,lgm-nand.yaml
> @@ -0,0 +1,93 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/mtd/intel,lgm-nand.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Intel LGM SoC NAND Controller Device Tree Bindings
> +
> +allOf:
> +  - $ref: "nand-controller.yaml"
> +
> +maintainers:
> +  - Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com>
> +
> +properties:
> +  compatible:
> +    const: intel,lgm-nand-controller

Doesn't match the example, which uses "intel,lgm-nand".

> +
> +  reg:
> +    maxItems: 6
> +
> +  reg-names:
> +    items:
> +       - const: ebunand
> +       - const: hsnand
> +       - const: nand_cs0
> +       - const: nand_cs1
> +       - const: addr_sel0
> +       - const: addr_sel1
> +
> +  clocks:
> +    maxItems: 1
> +
> +  dmas:
> +    maxItems: 2
> +
> +  dma-names:
> +    items:
> +      - const: tx
> +      - const: rx
> +
> +patternProperties:
> +  "^nand@[a-f0-9]+$":
> +    type: object
> +    properties:
> +      reg:
> +        minimum: 0
> +        maximum: 7
> +
> +      nand-ecc-mode: true
> +
> +      nand-ecc-algo:
> +        const: hw
> +
> +    additionalProperties: false
> +
> +required:
> +  - compatible
> +  - reg
> +  - reg-names
> +  - clocks
> +  - dmas
> +  - dma-names
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    nand-controller@e0f00000 {
> +      compatible = "intel,lgm-nand";
> +      reg = <0xe0f00000 0x100>,
> +            <0xe1000000 0x300>,
> +            <0xe1400000 0x8000>,
> +            <0xe1c00000 0x1000>,
> +            <0x17400000 0x4>,
> +            <0x17c00000 0x4>;
> +      reg-names = "ebunand", "hsnand", "nand_cs0", "nand_cs1",
> +        "addr_sel0", "addr_sel1";
> +      clocks = <&cgu0 125>;
> +      dmas = <&dma0 8>, <&dma0 9>;
> +      dma-names = "tx", "rx";
> +      #address-cells = <1>;
> +      #size-cells = <0>;
> +
> +      nand@0 {
> +        reg = <0>;
> +        nand-on-flash-bbt;
> +        #address-cells = <1>;
> +        #size-cells = <1>;
> +      };
> +    };
> +
> +...
> -- 
> 2.11.0
> 
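Rob's point is that a schema `const:` is an exact string match, so the example's "intel,lgm-nand" fails against "intel,lgm-nand-controller". A rough C model of that check, and of the `nand@...` child constraints, is sketched below. The function names are hypothetical, and the real validation is done by the Python-based dt-schema tooling (e.g. `make dt_binding_check`), not by code like this.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Constraints copied from the binding under review. */
static const char binding_compatible[] = "intel,lgm-nand-controller";
static const char nand_node_pattern[] = "^nand@[a-f0-9]+$";

/* "const:" means the property must equal this exact string. */
bool compatible_matches(const char *compat)
{
	return strcmp(compat, binding_compatible) == 0;
}

/* patternProperties node-name check plus reg minimum/maximum (0..7). */
bool nand_child_ok(const char *node_name, long reg)
{
	regex_t re;
	bool name_ok;

	if (regcomp(&re, nand_node_pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	name_ok = regexec(&re, node_name, 0, NULL, 0) == 0;
	regfree(&re);

	return name_ok && reg >= 0 && reg <= 7;
}
```

Either the `const:` value or the example's `compatible` has to change so that the two agree; the sketch does not say which string the driver expects.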

* Re: [PATCH 1/4] dt-bindings: pinctrl: Document 7211 compatible for brcm, bcm2835-gpio.txt
From: Rob Herring @ 2020-05-29 19:32 UTC
  To: Florian Fainelli
  Cc: Nicolas Saenz Julienne, linux-kernel,
	open list:PIN CONTROL SUBSYSTEM,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	Stefan Wahren,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	Rob Herring,
	maintainer:BROADCOM BCM281XX/BCM11XXX/BCM216XX ARM ARCHITE...,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Matti Vaittinen, Geert Uytterhoeven, Linus Walleij, Scott Branden,
	Ray Jui
In-Reply-To: <20200528192112.26123-2-f.fainelli@gmail.com>

On Thu, 28 May 2020 12:21:09 -0700, Florian Fainelli wrote:
> Document the brcm,bcm7211-gpio compatible string in the
> brcm,bcm2835-gpio.txt document.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt | 1 +
>  1 file changed, 1 insertion(+)
> 

Acked-by: Rob Herring <robh@kernel.org>

* drivers/gpu/drm/mgag200/mgag200_drv.c:70:5: warning: no previous prototype for function 'mgag200_driver_dumb_create'
From: kbuild test robot @ 2020-05-29 19:27 UTC
  To: Thomas Zimmermann
  Cc: kbuild-all, clang-built-linux, linux-kernel, Daniel Vetter

[-- Attachment #1: Type: text/plain, Size: 2539 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   75caf310d16cc5e2f851c048cd597f5437013368
commit: 1591fadf857cdbaf2baa55e421af99a61354713c drm/mgag200: Add workaround for HW that does not support 'startadd'
date:   6 months ago
config: arm-randconfig-r036-20200529 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 2d068e534f1671459e1b135852c1b3c10502e929)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        git checkout 1591fadf857cdbaf2baa55e421af99a61354713c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> drivers/gpu/drm/mgag200/mgag200_drv.c:70:5: warning: no previous prototype for function 'mgag200_driver_dumb_create' [-Wmissing-prototypes]
int mgag200_driver_dumb_create(struct drm_file *file,
^
drivers/gpu/drm/mgag200/mgag200_drv.c:70:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int mgag200_driver_dumb_create(struct drm_file *file,
^
static
1 warning generated.

vim +/mgag200_driver_dumb_create +70 drivers/gpu/drm/mgag200/mgag200_drv.c

    69	
  > 70	int mgag200_driver_dumb_create(struct drm_file *file,
    71				       struct drm_device *dev,
    72				       struct drm_mode_create_dumb *args)
    73	{
    74		struct mga_device *mdev = dev->dev_private;
    75		unsigned long pg_align;
    76	
    77		if (WARN_ONCE(!dev->vram_mm, "VRAM MM not initialized"))
    78			return -EINVAL;
    79	
    80		pg_align = 0ul;
    81	
    82		/*
    83		 * Aligning scanout buffers to the size of the video ram forces
    84		 * placement at offset 0. Works around a bug where HW does not
    85		 * respect 'startadd' field.
    86		 */
    87		if (mgag200_pin_bo_at_0(mdev))
    88			pg_align = PFN_UP(mdev->mc.vram_size);
    89	
    90		return drm_gem_vram_fill_create_dumb(file, dev, &dev->vram_mm->bdev,
    91						     pg_align, false, args);
    92	}
    93	
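The compiler note points at the two usual ways to silence -Wmissing-prototypes. A minimal sketch of both, with made-up function names (not taken from mgag200): use `static` when a function is only used in its own file, or make a prototype visible before the definition, normally via a header that the .c file includes. Which fix suits mgag200_driver_dumb_create() depends on whether other files call it, which this report does not show.

```c
#include <assert.h>

/*
 * Option 1 (hypothetical helper): internal linkage. A static function
 * needs no prototype, so -Wmissing-prototypes does not fire.
 */
static int helper_internal(int x)
{
	return x * 2;
}

/*
 * Option 2 (hypothetical helper): the function is shared with other
 * files, so a prototype must be seen before the definition. In a real
 * driver this declaration belongs in a header included by this file;
 * it is written inline here only to keep the sketch self-contained.
 */
int helper_exported(int x);

int helper_exported(int x)
{
	return helper_internal(x) + 1;
}
```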

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33154 bytes --]

* Re: 5.6.12 MCE on AMD EPYC 7502
From: Yazen Ghannam @ 2020-05-29 19:32 UTC
  To: Borislav Petkov; +Cc: Dmitry Antipov, linux-kernel@vger.kernel.org
In-Reply-To: <20200529115720.GF9011@zn.tnic>

On Fri, May 29, 2020 at 07:57:20AM -0400, Borislav Petkov wrote:
> On Fri, May 29, 2020 at 01:55:29PM +0300, Dmitry Antipov wrote:
> > Hello,
> > 
> > I'm seeing the following kernel messages running Debian 9 with a
> > custom 5.6.12 kernel on AMD EPYC 7502-based hardware:
> > 
> > [138537.806814] mce: [Hardware Error]: Machine check events logged
> > [138537.806818] [Hardware Error]: Corrected error, no action required.
> > [138537.808456] [Hardware Error]: CPU:0 (17:31:0) MC27_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd82000000002080b
> > [138537.810080] [Hardware Error]: IPID: 0x0001002e00001e01, Syndrome: 0x000000005a000005
> > [138537.811694] [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 2, Link Error.
> > [138537.813281] [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)
> > 
> > Is it related to some (not so) known CPU errata?
> 
> Who knows.
>

There aren't any reported errata related to this that I could find.

> > Should I try to update microcode, motherboard firmware, kernel, or whatever else?
> 
> Yeah, BIOS update might be a good idea, if there's a newer version for
> your board.
>

I agree. The link settings are generally tuned for the platform, so the
platform vendor may have a fix.

Thanks,
Yazen
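For reference, the fields in the decoded message can be pulled out of the raw MC27_STATUS value by hand. The sketch below is only an illustration: the struct and function names are hypothetical, and the bit positions are assumed to follow the AMD MCA_STATUS layout (the kernel's own decoding lives in drivers/edac/mce_amd.c), so check them against AMD's Family 17h documentation before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded subset of an AMD MCA_STATUS value (hypothetical struct). */
struct mca_status_fields {
	bool val, over, uc, en, miscv, addrv, pcc, syndv;
	unsigned int ext_err_code; /* bits 21:16 */
	unsigned int err_code;     /* bits 15:0  */
};

static bool bit(uint64_t v, unsigned int n)
{
	return (v >> n) & 1;
}

struct mca_status_fields decode_mca_status(uint64_t status)
{
	struct mca_status_fields f = {
		.val   = bit(status, 63),
		.over  = bit(status, 62),
		.uc    = bit(status, 61), /* clear => corrected error (CE) */
		.en    = bit(status, 60),
		.miscv = bit(status, 59),
		.addrv = bit(status, 58),
		.pcc   = bit(status, 57),
		.syndv = bit(status, 53),
		.ext_err_code = (status >> 16) & 0x3f,
		.err_code     = status & 0xffff,
	};
	return f;
}

/* Bus-error form of the 16-bit error code: 0000 1PPT RRRR IILL. */
struct bus_error {
	unsigned int pp, t, rrrr, ii, ll;
};

bool decode_bus_error(unsigned int err_code, struct bus_error *out)
{
	if ((err_code & 0xf800) != 0x0800)
		return false; /* not a bus error */
	out->pp   = (err_code >> 9) & 0x3;
	out->t    = (err_code >> 8) & 0x1;
	out->rrrr = (err_code >> 4) & 0xf;
	out->ii   = (err_code >> 2) & 0x3;
	out->ll   = err_code & 0x3;
	return true;
}
```

For 0xd82000000002080b this yields Val, Over, En, MiscV and SyndV set with UC clear (hence "Corrected error"), extended error code 2, and bus-error code 0x080b with PP/T/RRRR/II/LL = 0/0/0/2/3, which lines up with "part-proc: SRC (no timeout)", "mem-tx: GEN", "mem/io: IO" and "L3/GEN" in the log.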

* Re: [PATCH v4 6/5] fixup! rebase: add --reset-author-date
From: Johannes Schindelin @ 2020-05-29  2:59 UTC
  To: Đoàn Trần Công Danh
  Cc: Phillip Wood, Junio C Hamano, Elijah Newren, Rohit Ashiwal,
	Alban Gruin, Git Mailing List
In-Reply-To: <20200528131701.GD1983@danh.dev>

[-- Attachment #1: Type: text/plain, Size: 2450 bytes --]

Hi,

On Thu, 28 May 2020, Đoàn Trần Công Danh wrote:

> On 2020-05-27 18:57:48+0100, Phillip Wood <phillip.wood123@gmail.com> wrote:
> > From: Phillip Wood <phillip.wood@dunelm.org.uk>
> >
> > Sorry, I somehow forgot to commit this before sending the v4 patches;
> > it fixes up the final patch.
> >
> > Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> > ---
> >  t/t3436-rebase-more-options.sh | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/t/t3436-rebase-more-options.sh b/t/t3436-rebase-more-options.sh
> > index 5ee193f333..ecfd68397f 100755
> > --- a/t/t3436-rebase-more-options.sh
> > +++ b/t/t3436-rebase-more-options.sh
> > @@ -196,7 +196,7 @@ test_expect_success '--ignore-date is an alias for --reset-author-date' '
> >  	git rebase --apply --ignore-date HEAD^ &&
> >  	git commit --allow-empty -m empty --date="$GIT_AUTHOR_DATE" &&
> >  	git rebase -m --ignore-date HEAD^ &&
> > -	git log -2 --pretty="format:%ai" >authortime &&
> > +	git log -2 --pretty=%ai >authortime &&
> >  	grep "+0000" authortime >output &&
> >  	test_line_count = 2 output
> >  '
>
> This version addressed all of my concerns, LGTM.
>
> Only the last
>
> 	test_line_count = 2 output
>
> puzzled me at first, since it's the only usage of test_line_count in
> this version. Turns out, it's equivalent to:
> -----------8<-----------
> diff --git a/t/t3436-rebase-more-options.sh b/t/t3436-rebase-more-options.sh
> index ecfd68397f..abe9af4d8c 100755
> --- a/t/t3436-rebase-more-options.sh
> +++ b/t/t3436-rebase-more-options.sh
> @@ -197,8 +197,7 @@ test_expect_success '--ignore-date is an alias for --reset-author-date' '
>  	git commit --allow-empty -m empty --date="$GIT_AUTHOR_DATE" &&
>  	git rebase -m --ignore-date HEAD^ &&
>  	git log -2 --pretty=%ai >authortime &&
> -	grep "+0000" authortime >output &&
> -	test_line_count = 2 output
> +	! grep -v "+0000" authortime
>  '
>
>  # This must be the last test in this file
> ------>8------

Good suggestion!
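A small C model of the two checks makes the comparison concrete: one counts lines containing "+0000" and requires exactly two, the other requires that no line lacks it (`+` is an ordinary character in grep's default basic regex syntax, so a plain substring match is a fair model). The function names are invented for illustration. The two agree on the two-line output that `git log -2` produces here, but they differ on empty input, where `! grep -v` succeeds and the line-count check does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the len-byte line contains needle as a plain substring. */
static bool line_contains(const char *line, size_t len, const char *needle)
{
	size_t n = strlen(needle);

	for (size_t i = 0; i + n <= len; i++)
		if (memcmp(line + i, needle, n) == 0)
			return true;
	return false;
}

/* Splits buf on '\n' and counts lines with and without needle. */
static void count_lines(const char *buf, const char *needle,
			unsigned int *match, unsigned int *nomatch)
{
	*match = 0;
	*nomatch = 0;

	while (*buf) {
		const char *nl = strchr(buf, '\n');
		size_t len = nl ? (size_t)(nl - buf) : strlen(buf);

		if (line_contains(buf, len, needle))
			(*match)++;
		else
			(*nomatch)++;
		buf += nl ? len + 1 : len;
	}
}

/* grep "+0000" authortime >output && test_line_count = 2 output */
bool line_count_check(const char *authortime)
{
	unsigned int match, nomatch;

	count_lines(authortime, "+0000", &match, &nomatch);
	return match == 2;
}

/* ! grep -v "+0000" authortime */
bool no_mismatch_check(const char *authortime)
{
	unsigned int match, nomatch;

	count_lines(authortime, "+0000", &match, &nomatch);
	return nomatch == 0;
}
```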

I've read through all 5 patches, and rather than repeating much of what I
said about 1/5 and 2/5 in 4/5, I'll just say here that it applies
there, too: fewer repetitions in the test script, and I'd prefer the layer
where the `apply` vs `merge` options are set to be `cmd__rebase()` rather
than `run_am()` (and `get_replay_opts()`).

All in all, it was a pleasant read.

Thanks,
Dscho

* Re: [PATCH 2/4] dt-bindings: pinctrl: Document optional BCM7211 wake-up interrupts
From: Rob Herring @ 2020-05-29 19:33 UTC
  To: Florian Fainelli
  Cc: linux-kernel, Linus Walleij, Ray Jui, Scott Branden,
	maintainer:BROADCOM BCM281XX/BCM11XXX/BCM216XX ARM ARCHITE...,
	Nicolas Saenz Julienne, Stefan Wahren, Geert Uytterhoeven,
	Matti Vaittinen, open list:PIN CONTROL SUBSYSTEM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE
In-Reply-To: <20200528192112.26123-3-f.fainelli@gmail.com>

On Thu, May 28, 2020 at 12:21:10PM -0700, Florian Fainelli wrote:
> BCM7211 supports wake-up interrupts in the form of optional interrupt
> lines, one per bank, plus the "all banks" interrupt line.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  .../devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt         | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt b/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt
> index dfc67b90591c..5682b2010e50 100644
> --- a/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt
> +++ b/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt
> @@ -16,7 +16,9 @@ Required properties:
>    second cell is used to specify optional parameters:
>    - bit 0 specifies polarity (0 for normal, 1 for inverted)
>  - interrupts : The interrupt outputs from the controller. One interrupt per
> -  individual bank followed by the "all banks" interrupt.
> +  individual bank followed by the "all banks" interrupt. For BCM7211, an
> +  additional set of per-bank interrupt line and an "all banks" wake-up
> +  interrupt may be specified.

Is 'all banks' the name? Generally 'wakeup' is used for a wake up irq.

Rob
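Read literally, the updated text describes this ordering for the interrupts property: the per-bank lines, the "all banks" line, then (BCM7211 only, optional) per-bank wake-up lines and an "all banks" wake-up line. The C sketch below classifies an index under that reading. The enum, the function, the `num_banks` parameter and the assumption that the wake-up set mirrors the order of the normal set are all illustrative, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

enum gpio_irq_kind {
	GPIO_IRQ_BANK,
	GPIO_IRQ_ALL_BANKS,
	GPIO_IRQ_WAKE_BANK,
	GPIO_IRQ_WAKE_ALL_BANKS,
	GPIO_IRQ_INVALID,
};

/*
 * Classifies entry idx of the interrupts property (hypothetical helper).
 * *bank is only written for the per-bank kinds.
 */
enum gpio_irq_kind classify_irq(unsigned int idx, unsigned int num_banks,
				bool has_wake, unsigned int *bank)
{
	if (idx < num_banks) {
		*bank = idx;
		return GPIO_IRQ_BANK;
	}
	if (idx == num_banks)
		return GPIO_IRQ_ALL_BANKS;
	if (!has_wake)
		return GPIO_IRQ_INVALID;
	if (idx < 2 * num_banks + 1) {
		*bank = idx - num_banks - 1;
		return GPIO_IRQ_WAKE_BANK;
	}
	if (idx == 2 * num_banks + 1)
		return GPIO_IRQ_WAKE_ALL_BANKS;
	return GPIO_IRQ_INVALID;
}
```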

* Re: [PATCH][v2] iommu: arm-smmu-v3: Copy SMMU table for kdump kernel
From: Bjorn Helgaas @ 2020-05-29 19:33 UTC
  To: Prabhakar Kushwaha
  Cc: Kuppuswamy Sathyanarayanan, Ganapatrao Prabhakerrao Kulkarni,
	Myron Stowe, Vijay Mohan Pandarathil, Marc Zyngier,
	Bhupesh Sharma, kexec mailing list, Robin Murphy, linux-pci,
	Prabhakar Kushwaha, Will Deacon, linux-arm-kernel
In-Reply-To: <CAJ2QiJKKSy20Z5oZ-yMb3AaioowBWC9ooQeQ+n+vXGLdiYKhgg@mail.gmail.com>

On Fri, May 29, 2020 at 07:48:10PM +0530, Prabhakar Kushwaha wrote:
> On Thu, May 28, 2020 at 1:48 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, May 27, 2020 at 05:14:39PM +0530, Prabhakar Kushwaha wrote:
> > > On Fri, May 22, 2020 at 4:19 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, May 21, 2020 at 09:28:20AM +0530, Prabhakar Kushwaha wrote:
> > > > > On Wed, May 20, 2020 at 4:52 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Thu, May 14, 2020 at 12:47:02PM +0530, Prabhakar Kushwaha wrote:
> > > > > > > On Wed, May 13, 2020 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > On Mon, May 11, 2020 at 07:46:06PM -0700, Prabhakar Kushwaha wrote:
> > > > > > > > > An SMMU Stream table is created by the primary kernel. This table is
> > > > > > > > > used by the SMMU to perform address translations for device-originated
> > > > > > > > > transactions. Any crash launches the kdump kernel, which
> > > > > > > > > re-creates the SMMU Stream table. New transactions will be translated
> > > > > > > > > via this new table.
> > > > > > > > >
> > > > > > > > > There are scenarios where devices still have old pending
> > > > > > > > > transactions (configured in the primary kernel). These transactions
> > > > > > > > > arrive between Stream table creation and device-driver probe.
> > > > > > > > > As the new stream table has no entries for the older transactions,
> > > > > > > > > they are aborted by the SMMU.
> > > > > > > > >
> > > > > > > > > Similar behavior was observed with a PCIe Intel 82576 Gigabit
> > > > > > > > > Network card: it sends an old Memory Read transaction in the kdump kernel.
> > > > > > > > > Transactions configured for older Stream table entries that no longer
> > > > > > > > > exist in the new table cause a PCIe Completion Abort.
> > > > > > > >
> > > > > > > > That sounds like exactly what we want, doesn't it?
> > > > > > > >
> > > > > > > > Or do you *want* DMA from the previous kernel to complete?  That will
> > > > > > > > read or scribble on something, but maybe that's not terrible as long
> > > > > > > > as it's not memory used by the kdump kernel.
> > > > > > >
> > > > > > > Yes, the abort should happen, but it should happen in the context of
> > > > > > > the driver. The current abort happens because of the SMMU, with no
> > > > > > > driver/PCIe setup present at that moment.
> > > > > >
> > > > > > I don't understand what you mean by "in context of driver."  The whole
> > > > > > problem is that we can't control *when* the abort happens, so it may
> > > > > > happen in *any* context.  It may happen when a NIC receives a packet
> > > > > > or at some other unpredictable time.
> > > > > >
> > > > > > > The solution to this issue should be in 2 places:
> > > > > > > a) SMMU level: I still believe this patch has the potential to overcome
> > > > > > > the issue until the driver's probe finally takes over.
> > > > > > > b) Device level: even if something goes wrong, the driver/device should
> > > > > > > be able to recover.
> > > > > > >
> > > > > > > > > The returned PCIe completion abort further leads to AER errors from the
> > > > > > > > > APEI Generic Hardware Error Source (GHES) with completion timeout.
> > > > > > > > > A network device hang is observed even after continuous
> > > > > > > > > reset/recovery from the driver; hence the device is no longer usable.
> > > > > > > >
> > > > > > > > The fact that the device is no longer usable is definitely a problem.
> > > > > > > > But in principle we *should* be able to recover from these errors.  If
> > > > > > > > we could recover and reliably use the device after the error, that
> > > > > > > > seems like it would be a more robust solution than having to add
> > > > > > > > special cases in every IOMMU driver.
> > > > > > > >
> > > > > > > > If you have details about this sort of error, I'd like to try to fix
> > > > > > > > it because we want to recover from that sort of error in normal
> > > > > > > > (non-crash) situations as well.
> > > > > > > >
> > > > > > > The completion abort case should be handled gracefully, and the device
> > > > > > > should always remain usable.
> > > > > > >
> > > > > > > There are 2 scenarios which I am testing with a PCIe Intel
> > > > > > > 82576 Gigabit Network card.
> > > > > > >
> > > > > > > I)  Crash testing using the kdump root file system: the de-facto scenario
> > > > > > >     -  the kdump file system does not have the Ethernet driver
> > > > > > >     -  a lot of AER prints [1], making it impossible to work on the
> > > > > > >        shell of the kdump root file system.
> > > > > >
> > > > > > In this case, I think report_error_detected() is deciding that because
> > > > > > the device has no driver, we can't do anything.  The flow is like
> > > > > > this:
> > > > > >
> > > > > >   aer_recover_work_func               # aer_recover_work
> > > > > >     kfifo_get(aer_recover_ring, entry)
> > > > > >     dev = pci_get_domain_bus_and_slot
> > > > > >     cper_print_aer(dev, ...)
> > > > > >       pci_err("AER: aer_status:")
> > > > > >       pci_err("AER:   [14] CmpltTO")
> > > > > >       pci_err("AER: aer_layer=")
> > > > > >     if (AER_NONFATAL)
> > > > > >       pcie_do_recovery(dev, pci_channel_io_normal)
> > > > > >         status = CAN_RECOVER
> > > > > >         pci_walk_bus(report_normal_detected)
> > > > > >           report_error_detected
> > > > > >             if (!dev->driver)
> > > > > >               vote = NO_AER_DRIVER
> > > > > >               pci_info("can't recover (no error_detected callback)")
> > > > > >             *result = merge_result(*, NO_AER_DRIVER)
> > > > > >             # always NO_AER_DRIVER
> > > > > >         status is now NO_AER_DRIVER
> > > > > >
> > > > > > So pcie_do_recovery() does not call .report_mmio_enabled() or .slot_reset(),
> > > > > > and status is not RECOVERED, so it skips .resume().
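The vote merging in the walk above can be modeled in userspace. This is a sketch, not kernel code: the enum mirrors the names in enum pci_ers_result (include/linux/pci.h), merge_vote() follows our reading of merge_result() in drivers/pci/pcie/err.c for kernels of this era, and walk_votes() is a hypothetical stand-in for pci_walk_bus() with report_normal_detected(). It shows why a single driverless function pins the overall status at NO_AER_DRIVER, which is the status that makes pcie_do_recovery() skip the later steps as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the recovery votes; names mirror enum pci_ers_result
 * in include/linux/pci.h, but this is a sketch, not kernel code. */
enum ers_vote {
	ERS_NONE = 1,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
	ERS_NO_AER_DRIVER,
};

/* Fold one device's vote into the running status (modeled on merge_result()). */
static enum ers_vote merge_vote(enum ers_vote orig, enum ers_vote new_vote)
{
	if (new_vote == ERS_NO_AER_DRIVER)
		return ERS_NO_AER_DRIVER;	/* one driverless device decides the outcome */
	if (new_vote == ERS_NONE)
		return orig;			/* device has no opinion */

	switch (orig) {
	case ERS_CAN_RECOVER:
	case ERS_RECOVERED:
		return new_vote;
	case ERS_DISCONNECT:
		return new_vote == ERS_NEED_RESET ? ERS_NEED_RESET : orig;
	default:
		return orig;			/* e.g. NEED_RESET and NO_AER_DRIVER are kept */
	}
}

/* Hypothetical stand-in for pci_walk_bus(..., report_normal_detected, ...):
 * start from CAN_RECOVER and merge every device's vote in turn. */
static enum ers_vote walk_votes(const enum ers_vote *votes, int n)
{
	enum ers_vote status = ERS_CAN_RECOVER;

	assert(n >= 0 && (n == 0 || votes != NULL));
	for (int i = 0; i < n; i++)
		status = merge_vote(status, votes[i]);
	return status;
}
```

With both 82576 functions unbound, walking two ERS_NO_AER_DRIVER votes yields ERS_NO_AER_DRIVER, matching the "# always NO_AER_DRIVER" note in the flow above; a later CAN_RECOVER vote from another device cannot undo it.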
> > > > > >
> > > > > > I don't remember the history there, but if a device has no driver and
> > > > > > the device generates errors, it seems like we ought to be able to
> > > > > > reset it.
> > > > >
> > > > > But how can the device be reset, considering there is no driver?
> > > > > Hypothetically, this case should be taken care of by the PCIe subsystem,
> > > > > which would perform a reset at the PCIe level.
> > > >
> > > > I don't understand your question.  The PCI core (not the device
> > > > driver) already does the reset.  When pcie_do_recovery() calls
> > > > reset_link(), all devices on the other side of the link are reset.
> > > >
> > > > > > We should be able to field one (or a few) AER errors, reset the
> > > > > > device, and you should be able to use the shell in the kdump kernel.
> > > > > >
> > > > > Here the kdump shell is usable; the only problem is the "lot of AER errors". One
> > > > > cannot see what one is typing.
> > > >
> > > > Right, that's what I expect.  If the PCI core resets the device, you
> > > > should get just a few AER errors, and they should stop after the
> > > > device is reset.
> > > >
> > > > > > >     -  Note that the kdump shell allows using the makedumpfile and vmcore-dmesg applications.
> > > > > > >
> > > > > > > II) Crash testing using the default root file system: specific case to
> > > > > > > test the Ethernet driver in the second kernel
> > > > > > >    -  The default root file system has the Ethernet driver
> > > > > > >    -  AER errors come even before the driver probe starts.
> > > > > > >    -  The driver does reset the Ethernet card as part of probe, but without success.
> > > > > > >    -  AER also tries to recover, but without success.  [2]
> > > > > > >    -  I also tried to remove the AER errors by using the "pci=noaer" bootarg
> > > > > > > and commenting out ghes_handle_aer() in the GHES driver;
> > > > > > >           then a different set of errors comes, which is also never able to recover [3]
> > > > > > >
> > > > >
> > > > > Please share your view on this case. Here the driver is present
> > > > > (drivers/net/ethernet/intel/igb/igb_main.c).
> > > > > In this case AER errors start even before the driver probe starts.
> > > > > After probe, the driver resets the device without success, and even AER
> > > > > recovery does not work.
> > > >
> > > > This case should be the same as the one above.  If we can change the
> > > > PCI core so it can reset the device when there's no driver, that would
> > > > apply to case I (where there will never be a driver) and to case II
> > > > (where there is no driver now, but a driver will probe the device
> > > > later).
> > >
> > > Does this mean changes are required in the PCI core?
> >
> > Yes, I am suggesting that the PCI core does not do the right thing
> > here.
> >
> > > I tried the following changes in pcie_do_recovery(), but they did not help.
> > > Same error as before.
> > >
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > >         pci_info(dev, "broadcast resume message\n");
> > >         pci_walk_bus(bus, report_resume, &status);
> > > @@ -203,7 +207,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> > >         return status;
> > >
> > >  failed:
> > >         pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> > > +       pci_reset_function(dev);
> > > +       pci_aer_clear_device_status(dev);
> > > +       pci_aer_clear_nonfatal_status(dev);
> >
> > Did you confirm that this resets the devices in question (0000:09:00.0
> > and 0000:09:00.1, I think), and what reset mechanism this uses (FLR,
> > PM, etc)?
> 
> Earlier, the reset was happening on the P2P bridge (0000:00:09.0); this is the
> reason it had no effect. After making the following changes, both devices are
> now getting reset.
> Both devices are using FLR.
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 117c0a2b2ba4..26b908f55aef 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -66,6 +66,20 @@ static int report_error_detected(struct pci_dev *dev,
>                 if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
>                         vote = PCI_ERS_RESULT_NO_AER_DRIVER;
>                         pci_info(dev, "can't recover (no error_detected callback)\n");
> +
> +                       pci_save_state(dev);
> +                       pci_cfg_access_lock(dev);
> +
> +                       /* Quiesce the device completely */
> +                       pci_write_config_word(dev, PCI_COMMAND,
> +                             PCI_COMMAND_INTX_DISABLE);
> +                       if (!__pci_reset_function_locked(dev)) {
> +                               vote = PCI_ERS_RESULT_RECOVERED;
> +                               pci_info(dev, "recovered via pci level reset\n");
> +                       }

Why do we need to save the state and quiesce the device?  The reset
should disable interrupts anyway.  In this particular case where
there's no driver, I don't think we should have to restore the state.
We maybe should *remove* the device and re-enumerate it after the
reset, but the state from before the reset should be irrelevant.

> +                       pci_cfg_access_unlock(dev);
> +                       pci_restore_state(dev);
>                 } else {
>                         vote = PCI_ERS_RESULT_NONE;
>                 }
> 
> In order to take care of case 2 (driver comes after some time), the
> following code needs to be added to avoid a crash during igb_probe(). It
> looks like a race condition between AER and igb_probe().
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index b46bff8fe056..c48f0a54bb95 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3012,6 +3012,11 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         /* Catch broken hardware that put the wrong VF device ID in
>          * the PCIe SR-IOV capability.
>          */
> +       if (pci_dev_trylock(pdev)) {
> +               mdelay(1000);
> +               pci_info(pdev,"device is locked, try waiting 1 sec\n");
> +       }

This is interesting to learn about the AER/driver interaction, but of
course, we wouldn't want to add code like this permanently.

> Here are the observations with all the above changes:
> A) There are fewer AER errors, but they are still there for both case 1 (no
> driver at all) and case 2 (driver comes after some time).

We'll certainly get *some* AER errors.  We have to get one before we
know to reset the device.

> B) Each AER error (NON_FATAL) causes both devices to reset. This happens many times.

I'm not sure why we reset both devices.  Are we seeing errors from
both, or could we be more selective in the code?

> C) After that, AER errors [1] come only for device 0000:09:00.0.
> This is strange, as this PCI device is not being used during the test.
> Ping/ssh are happening with 0000:09:01.0.
> D) If we wait for some more time, no more AER errors come from any device.
> E) Ping is working fine in case 2.
> 
> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 
> # lspci -t -v
> 
>  \-[0000:00]-+-00.0  Cavium, Inc. CN99xx [ThunderX2] Integrated PCI Host bridge
>              +-01.0-[01]--
>              +-02.0-[02]--
>              +-03.0-[03]--
>              +-04.0-[04]--
>              +-05.0-[05]--+-00.0  Broadcom Inc. and subsidiaries BCM57840 NetXtreme II 10 Gigabit Ethernet
>              |            \-00.1  Broadcom Inc. and subsidiaries BCM57840 NetXtreme II 10 Gigabit Ethernet
>              +-06.0-[06]--
>              +-07.0-[07]--
>              +-08.0-[08]--
>              +-09.0-[09-0a]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
>              |               \-00.1  Intel Corporation 82576 Gigabit Network Connection
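The tree above mixes two kinds of addresses: the root port 0000:00:09.0 (bus 0x00, device 9) and the two 82576 functions behind it, 0000:09:00.0 and 0000:09:00.1 (bus 0x09, device 0, functions 0 and 1). A small userspace sketch of how such a name maps to the (domain, bus, devfn) triple that pci_get_domain_bus_and_slot() takes; the PCI_DEVFN/PCI_SLOT/PCI_FUNC definitions mirror the kernel's uapi macros, while struct bdf and parse_bdf() are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Same encoding as PCI_DEVFN()/PCI_SLOT()/PCI_FUNC() in the kernel's uapi headers. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Hypothetical container for a parsed "DDDD:BB:SS.F" address. */
struct bdf {
	unsigned int domain;
	unsigned int bus;
	unsigned int devfn;
};

/* Parse an address as printed by lspci or in the AER logs.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_bdf(const char *s, struct bdf *out)
{
	unsigned int domain, bus, slot, func;
	char extra;

	assert(s != NULL && out != NULL);
	/* Exactly four fields; a fifth (%c) match means trailing junk. */
	if (sscanf(s, "%x:%x:%x.%x%c", &domain, &bus, &slot, &func, &extra) != 4)
		return -1;
	if (bus > 0xff || slot > 0x1f || func > 0x7)
		return -1;

	out->domain = domain;
	out->bus = bus;
	out->devfn = PCI_DEVFN(slot, func);
	return 0;
}
```

For example, "0000:09:00.1" parses to bus 0x09 and devfn 0x01, while the root port "0000:00:09.0" gives bus 0x00 and devfn 0x48 (device 9, function 0), which is why a reset aimed at the bridge does not touch the endpoints directly.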
> 
> 
> [1] AER errors which come for 09:00.0:
> 
> [   81.659825] {7}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> [   81.668080] {7}[Hardware Error]: It has been corrected by h/w and requires no further action
> [   81.676503] {7}[Hardware Error]: event severity: corrected
> [   81.681975] {7}[Hardware Error]:  Error 0, type: corrected
> [   81.687447] {7}[Hardware Error]:   section_type: PCIe error
> [   81.693004] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.698908] {7}[Hardware Error]:   version: 3.0
> [   81.703424] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.709589] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.715059] {7}[Hardware Error]:   slot: 0
> [   81.719141] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.724265] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.730864] {7}[Hardware Error]:   class_code: 000002
> [   81.735901] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.742587] {7}[Hardware Error]:  Error 1, type: corrected
> [   81.748058] {7}[Hardware Error]:   section_type: PCIe error
> [   81.753615] {7}[Hardware Error]:   port_type: 4, root port
> [   81.759086] {7}[Hardware Error]:   version: 3.0
> [   81.763602] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   81.769767] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   81.775237] {7}[Hardware Error]:   slot: 0
> [   81.779319] {7}[Hardware Error]:   secondary_bus: 0x09
> [   81.784442] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   81.791041] {7}[Hardware Error]:   class_code: 000406
> [   81.796078] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   81.803806] {7}[Hardware Error]:  Error 2, type: corrected
> [   81.809276] {7}[Hardware Error]:   section_type: PCIe error
> [   81.814834] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.820738] {7}[Hardware Error]:   version: 3.0
> [   81.825254] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.831419] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.836889] {7}[Hardware Error]:   slot: 0
> [   81.840971] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.846094] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.852693] {7}[Hardware Error]:   class_code: 000002
> [   81.857730] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.864416] {7}[Hardware Error]:  Error 3, type: corrected
> [   81.869886] {7}[Hardware Error]:   section_type: PCIe error
> [   81.875444] {7}[Hardware Error]:   port_type: 4, root port
> [   81.880914] {7}[Hardware Error]:   version: 3.0
> [   81.885430] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   81.891595] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   81.897066] {7}[Hardware Error]:   slot: 0
> [   81.901147] {7}[Hardware Error]:   secondary_bus: 0x09
> [   81.906271] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   81.912870] {7}[Hardware Error]:   class_code: 000406
> [   81.917906] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   81.925634] {7}[Hardware Error]:  Error 4, type: corrected
> [   81.931104] {7}[Hardware Error]:   section_type: PCIe error
> [   81.936662] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.942566] {7}[Hardware Error]:   version: 3.0
> [   81.947082] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.953247] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.958717] {7}[Hardware Error]:   slot: 0
> [   81.962799] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.967923] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.974522] {7}[Hardware Error]:   class_code: 000002
> [   81.979558] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.986244] {7}[Hardware Error]:  Error 5, type: corrected
> [   81.991715] {7}[Hardware Error]:   section_type: PCIe error
> [   81.997272] {7}[Hardware Error]:   port_type: 4, root port
> [   82.002743] {7}[Hardware Error]:   version: 3.0
> [   82.007259] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.013424] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.018894] {7}[Hardware Error]:   slot: 0
> [   82.022976] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.028099] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.034698] {7}[Hardware Error]:   class_code: 000406
> [   82.039735] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.047463] {7}[Hardware Error]:  Error 6, type: corrected
> [   82.052933] {7}[Hardware Error]:   section_type: PCIe error
> [   82.058491] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.064395] {7}[Hardware Error]:   version: 3.0
> [   82.068911] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.075076] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.080547] {7}[Hardware Error]:   slot: 0
> [   82.084628] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.089752] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.096351] {7}[Hardware Error]:   class_code: 000002
> [   82.101387] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.108073] {7}[Hardware Error]:  Error 7, type: corrected
> [   82.113544] {7}[Hardware Error]:   section_type: PCIe error
> [   82.119101] {7}[Hardware Error]:   port_type: 4, root port
> [   82.124572] {7}[Hardware Error]:   version: 3.0
> [   82.129087] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.135252] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.140723] {7}[Hardware Error]:   slot: 0
> [   82.144805] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.149928] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.156527] {7}[Hardware Error]:   class_code: 000406
> [   82.161564] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.169291] {7}[Hardware Error]:  Error 8, type: corrected
> [   82.174762] {7}[Hardware Error]:   section_type: PCIe error
> [   82.180319] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.186224] {7}[Hardware Error]:   version: 3.0
> [   82.190739] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.196904] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.202375] {7}[Hardware Error]:   slot: 0
> [   82.206456] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.211580] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.218179] {7}[Hardware Error]:   class_code: 000002
> [   82.223216] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.229901] {7}[Hardware Error]:  Error 9, type: corrected
> [   82.235372] {7}[Hardware Error]:   section_type: PCIe error
> [   82.240929] {7}[Hardware Error]:   port_type: 4, root port
> [   82.246400] {7}[Hardware Error]:   version: 3.0
> [   82.250916] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.257081] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.262551] {7}[Hardware Error]:   slot: 0
> [   82.266633] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.271756] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.278355] {7}[Hardware Error]:   class_code: 000406
> [   82.283392] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.291119] {7}[Hardware Error]:  Error 10, type: corrected
> [   82.296676] {7}[Hardware Error]:   section_type: PCIe error
> [   82.302234] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.308138] {7}[Hardware Error]:   version: 3.0
> [   82.312654] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.318819] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.324290] {7}[Hardware Error]:   slot: 0
> [   82.328371] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.333495] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.340094] {7}[Hardware Error]:   class_code: 000002
> [   82.345131] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.351816] {7}[Hardware Error]:  Error 11, type: corrected
> [   82.357374] {7}[Hardware Error]:   section_type: PCIe error
> [   82.362931] {7}[Hardware Error]:   port_type: 4, root port
> [   82.368402] {7}[Hardware Error]:   version: 3.0
> [   82.372917] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.379082] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.384553] {7}[Hardware Error]:   slot: 0
> [   82.388635] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.393758] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.400357] {7}[Hardware Error]:   class_code: 000406
> [   82.405394] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.413121] {7}[Hardware Error]:  Error 12, type: corrected
> [   82.418678] {7}[Hardware Error]:   section_type: PCIe error
> [   82.424236] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.430140] {7}[Hardware Error]:   version: 3.0
> [   82.434656] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.440821] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.446291] {7}[Hardware Error]:   slot: 0
> [   82.450373] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.455497] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.462096] {7}[Hardware Error]:   class_code: 000002
> [   82.467132] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.473818] {7}[Hardware Error]:  Error 13, type: corrected
> [   82.479375] {7}[Hardware Error]:   section_type: PCIe error
> [   82.484933] {7}[Hardware Error]:   port_type: 4, root port
> [   82.490403] {7}[Hardware Error]:   version: 3.0
> [   82.494919] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.501084] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.506555] {7}[Hardware Error]:   slot: 0
> [   82.510636] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.515760] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.522359] {7}[Hardware Error]:   class_code: 000406
> [   82.527395] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.535171] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.542476] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.550301] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.558032] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.566296] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.573597] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.581421] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.589151] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.597411] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.604711] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.612535] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.620271] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.628525] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.635826] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.643649] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.651385] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.659645] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.666940] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.674763] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.682498] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.690759] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.698053] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.705876] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.713612] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.721872] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.729167] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.736990] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.744725] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   88.059225] {8}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> [   88.067478] {8}[Hardware Error]: It has been corrected by h/w and requires no further action
> [   88.075899] {8}[Hardware Error]: event severity: corrected
> [   88.081370] {8}[Hardware Error]:  Error 0, type: corrected
> [   88.086841] {8}[Hardware Error]:   section_type: PCIe error
> [   88.092399] {8}[Hardware Error]:   port_type: 0, PCIe end point
> [   88.098303] {8}[Hardware Error]:   version: 3.0
> [   88.102819] {8}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   88.108984] {8}[Hardware Error]:   device_id: 0000:09:00.0
> [   88.114455] {8}[Hardware Error]:   slot: 0
> [   88.118536] {8}[Hardware Error]:   secondary_bus: 0x00
> [   88.123660] {8}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   88.130259] {8}[Hardware Error]:   class_code: 000002
> [   88.135296] {8}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   88.141981] {8}[Hardware Error]:  Error 1, type: corrected
> [   88.147452] {8}[Hardware Error]:   section_type: PCIe error
> [   88.153009] {8}[Hardware Error]:   port_type: 4, root port
> [   88.158480] {8}[Hardware Error]:   version: 3.0
> [   88.162995] {8}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   88.169161] {8}[Hardware Error]:   device_id: 0000:00:09.0
> [   88.174633] {8}[Hardware Error]:   slot: 0
> [   88.180018] {8}[Hardware Error]:   secondary_bus: 0x09
> [   88.185142] {8}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   88.191914] {8}[Hardware Error]:   class_code: 000406
> [   88.196951] {8}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   88.204852] {8}[Hardware Error]:  Error 2, type: corrected
> [   88.210323] {8}[Hardware Error]:   section_type: PCIe error
> [   88.215881] {8}[Hardware Error]:   port_type: 0, PCIe end point
> [   88.221786] {8}[Hardware Error]:   version: 3.0
> [   88.226301] {8}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   88.232466] {8}[Hardware Error]:   device_id: 0000:09:00.0
> [   88.237937] {8}[Hardware Error]:   slot: 0
> [   88.242019] {8}[Hardware Error]:   secondary_bus: 0x00
> [   88.247142] {8}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   88.253741] {8}[Hardware Error]:   class_code: 000002
> [   88.258778] {8}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   88.265509] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   88.272812] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   88.280635] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   88.288363] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   88.296622] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   88.305391] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
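Reading the AER lines above: aer_status and aer_mask are raw register values, and the kernel prints bit names only for bits that are set in status and clear in mask. For 0000:09:00.0 the corrected-error status 0x00002000 is bit 13 of the Correctable Error Status register (Advisory Non-Fatal Error), and the same bit is set in aer_mask, which appears to be why no bit name follows those lines. In the pci=noaer log quoted further down, the uncorrectable status 0x00004000 is bit 14, Completion Timeout ("CmpltTO"), and aer_uncor_severity 0x00062011 has bit 14 clear, so that timeout is classed as non-fatal. The sketch below decodes those values in userspace; the short names imitate the kernel's log strings but cover only a subset of bits, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Subset of bit names for the AER Correctable / Uncorrectable Error Status
 * registers; the short strings imitate the kernel's log output. */
static const char *const cor_names[32] = {
	[0] = "RxErr",     [6] = "BadTLP",    [7] = "BadDLLP",
	[8] = "Rollover",  [12] = "Timeout",  [13] = "NonFatalErr",
	[14] = "CorrIntErr", [15] = "HeaderOF",
};
static const char *const uncor_names[32] = {
	[4] = "DLP",       [5] = "SDES",      [12] = "TLP",
	[13] = "FCP",      [14] = "CmpltTO",  [15] = "CmpltAbrt",
	[16] = "UnxCmplt", [17] = "RxOF",     [18] = "MalfTLP",
	[19] = "ECRC",     [20] = "UnsupReq", [21] = "ACSViol",
};

/* Lowest bit that is set in status and not masked, or -1 if none. */
static int first_unmasked_bit(uint32_t status, uint32_t mask)
{
	uint32_t active = status & ~mask;

	for (int bit = 0; bit < 32; bit++)
		if (active & (UINT32_C(1) << bit))
			return bit;
	return -1;
}

/* Name of one status bit; "Unknown" for bits not in the tables above. */
static const char *aer_bit_name(int correctable, int bit)
{
	const char *name;

	assert(bit >= 0 && bit < 32);
	name = correctable ? cor_names[bit] : uncor_names[bit];
	return name ? name : "Unknown";
}

/* An uncorrectable error is reported as fatal if its bit is set in the
 * Uncorrectable Error Severity register. */
static int uncor_is_fatal(int bit, uint32_t severity)
{
	assert(bit >= 0 && bit < 32);
	return (int)((severity >> bit) & 1u);
}
```

With these helpers, first_unmasked_bit(0x00002000, 0x00002000) is -1 (nothing left to name), first_unmasked_bit(0x00004000, 0) is 14 ("CmpltTO"), and uncor_is_fatal(14, 0x00062011) is 0.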
> 
> > Case I is using APEI, and it looks like that can queue up 16 errors
> > (AER_RECOVER_RING_SIZE), so that queue could be completely full before
> > we even get a chance to reset the device.  But I would think that the
> > reset should *eventually* stop the errors, even though we might log
> > 30+ of them first.
> >
> > As an experiment, you could reduce AER_RECOVER_RING_SIZE to 1 or 2 and
> > see if it reduces the logging.
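The queue-depth point above can be illustrated with a toy bounded FIFO. This is a userspace sketch under stated assumptions: the capacity stands in for AER_RECOVER_RING_SIZE (16, per the message above), the producer is assumed to drop an entry when the ring is full (our reading of the APEI-to-AER hand-off, which uses a kfifo), and struct ring with its helpers is hypothetical. It shows that if 30 errors arrive before the recovery work runs, only 16 are queued for recovery, and that a ring of 2 would queue only 2.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE_MAX 64

/* Hypothetical bounded FIFO standing in for the APEI -> AER recovery ring. */
struct ring {
	int buf[RING_SIZE_MAX];
	unsigned int size;	/* capacity in use (<= RING_SIZE_MAX) */
	unsigned int head;	/* index of the oldest entry */
	unsigned int count;	/* entries currently queued */
	unsigned int dropped;	/* entries rejected because the ring was full */
};

static void ring_init(struct ring *r, unsigned int size)
{
	assert(size > 0 && size <= RING_SIZE_MAX);
	r->size = size;
	r->head = 0;
	r->count = 0;
	r->dropped = 0;
}

/* Producer side (firmware error notifications): drop when full. */
static bool ring_put(struct ring *r, int v)
{
	if (r->count == r->size) {
		r->dropped++;
		return false;
	}
	r->buf[(r->head + r->count) % r->size] = v;
	r->count++;
	return true;
}

/* Consumer side (the deferred recovery work): oldest entry first. */
static bool ring_get(struct ring *r, int *v)
{
	if (r->count == 0)
		return false;
	*v = r->buf[r->head];
	r->head = (r->head + 1) % r->size;
	r->count--;
	return true;
}
```

The model only covers what reaches the recovery work; the GHES records themselves are printed as they arrive, so a smaller ring would mainly cut the per-entry recovery output.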
> 
> I did not try this experiment; I believe it is not required now.
> 
> --pk
> 
> >
> > > > > The problem mentioned in cases I and II goes away if I do pci_reset_function()
> > > > > during the enumeration phase of the kdump kernel.
> > > > > Can we think of doing pci_reset_function() for all devices in the kdump
> > > > > kernel, or of a device-specific quirk?
> > > > >
> > > > > --pk
> > > > >
> > > > >
> > > > > > > As per my understanding, possible solutions are:
> > > > > > >  - Copy the SMMU table, i.e. this patch
> > > > > > > OR
> > > > > > >  - Do pci_reset_function() during the enumeration phase.
> > > > > > > I also tried clearing the "M" bit using pci_clear_master() during
> > > > > > > enumeration, but it did not help, because the driver sets the M bit again,
> > > > > > > causing the same AER error again.
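The "M" bit here is Bus Master Enable, bit 2 of the PCI Command register. The GHES records in this thread show command: 0x0507 for the 82576 functions, so Bus Master is set and the device may issue DMA, while the root port shows 0x0106. A small userspace sketch decoding those values; the constants use the values of the PCI_COMMAND_* definitions in include/uapi/linux/pci_regs.h, while struct cmd_flags and the helpers are hypothetical. clear_master() models only the register effect of pci_clear_master(); as noted above, the driver sets the bit again during probe.

```c
#include <assert.h>
#include <stdint.h>

/* Values of the PCI_COMMAND_* bits from include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND_IO			0x0001	/* I/O space enable */
#define PCI_COMMAND_MEMORY		0x0002	/* memory space enable */
#define PCI_COMMAND_MASTER		0x0004	/* bus master (DMA) enable */
#define PCI_COMMAND_SERR		0x0100	/* SERR# enable */
#define PCI_COMMAND_INTX_DISABLE	0x0400	/* INTx emulation disable */

/* Hypothetical decoded view of a raw PCI_COMMAND value. */
struct cmd_flags {
	int io, mem, master, serr, intx_disabled;
};

static struct cmd_flags decode_command(uint16_t cmd)
{
	struct cmd_flags f = {
		.io = !!(cmd & PCI_COMMAND_IO),
		.mem = !!(cmd & PCI_COMMAND_MEMORY),
		.master = !!(cmd & PCI_COMMAND_MASTER),
		.serr = !!(cmd & PCI_COMMAND_SERR),
		.intx_disabled = !!(cmd & PCI_COMMAND_INTX_DISABLE),
	};
	return f;
}

/* Register-level effect of pci_clear_master(): only Bus Master is cleared. */
static uint16_t clear_master(uint16_t cmd)
{
	return (uint16_t)(cmd & ~PCI_COMMAND_MASTER);
}
```

Decoding 0x0507 gives I/O, Memory, Bus Master, SERR# and INTx-disable all set, and clear_master(0x0507) is 0x0503. By contrast, writing only PCI_COMMAND_INTX_DISABLE, as in the later err.c experiment, clears Memory and Bus Master together.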
> > > > > > >
> > > > > > >
> > > > > > > -pk
> > > > > > >
> > > > > > > ---------------------------------------------------------------------------------------------------------------------------
> > > > > > > [1] with bootargs having pci=noaer
> > > > > > >
> > > > > > > [   22.494648] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > > > [   22.512773] {4}[Hardware Error]: event severity: recoverable
> > > > > > > [   22.518419] {4}[Hardware Error]:  Error 0, type: recoverable
> > > > > > > [   22.544804] {4}[Hardware Error]:   section_type: PCIe error
> > > > > > > [   22.550363] {4}[Hardware Error]:   port_type: 0, PCIe end point
> > > > > > > [   22.556268] {4}[Hardware Error]:   version: 3.0
> > > > > > > [   22.560785] {4}[Hardware Error]:   command: 0x0507, status: 0x4010
> > > > > > > [   22.576852] {4}[Hardware Error]:   device_id: 0000:09:00.1
> > > > > > > [   22.582323] {4}[Hardware Error]:   slot: 0
> > > > > > > [   22.586406] {4}[Hardware Error]:   secondary_bus: 0x00
> > > > > > > [   22.591530] {4}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> > > > > > > [   22.608900] {4}[Hardware Error]:   class_code: 000002
> > > > > > > [   22.613938] {4}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> > > > > > > [   22.803534] pci 0000:09:00.1: AER: aer_status: 0x00004000, aer_mask: 0x00000000
> > > > > > > [   22.810838] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > > > [   22.817613] pci 0000:09:00.1: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
> > > > > > > [   22.847374] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > > > [   22.866161] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8153768 kB)
> > > > > > > [   22.946178] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> > > > > > > [   22.995142] pci 0000:09:00.1: AER: can't recover (no error_detected callback)
> > > > > > > [   23.002300] pcieport 0000:00:09.0: AER: device recovery failed
> > > > > > > [   23.027607] pci 0000:09:00.1: AER: aer_status: 0x00004000, aer_mask: 0x00000000
> > > > > > > [   23.044109] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > > > [   23.060713] pci 0000:09:00.1: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
> > > > > > > [   23.068616] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > > > [   23.122056] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> >
> > <snip>

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: [PATCH][v2] iommu: arm-smmu-v3: Copy SMMU table for kdump kernel
From: Bjorn Helgaas @ 2020-05-29 19:33 UTC
  To: Prabhakar Kushwaha
  Cc: Robin Murphy, linux-arm-kernel, kexec mailing list, linux-pci,
	Marc Zyngier, Will Deacon, Ganapatrao Prabhakerrao Kulkarni,
	Bhupesh Sharma, Prabhakar Kushwaha, Kuppuswamy Sathyanarayanan,
	Vijay Mohan Pandarathil, Myron Stowe
In-Reply-To: <CAJ2QiJKKSy20Z5oZ-yMb3AaioowBWC9ooQeQ+n+vXGLdiYKhgg@mail.gmail.com>

On Fri, May 29, 2020 at 07:48:10PM +0530, Prabhakar Kushwaha wrote:
> On Thu, May 28, 2020 at 1:48 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, May 27, 2020 at 05:14:39PM +0530, Prabhakar Kushwaha wrote:
> > > On Fri, May 22, 2020 at 4:19 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, May 21, 2020 at 09:28:20AM +0530, Prabhakar Kushwaha wrote:
> > > > > On Wed, May 20, 2020 at 4:52 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Thu, May 14, 2020 at 12:47:02PM +0530, Prabhakar Kushwaha wrote:
> > > > > > > On Wed, May 13, 2020 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > On Mon, May 11, 2020 at 07:46:06PM -0700, Prabhakar Kushwaha wrote:
> > > > > > > > > An SMMU Stream table is created by the primary kernel. This table is
> > > > > > > > > used by the SMMU to perform address translations for device-originated
> > > > > > > > > transactions. Any crash (if one happens) launches the kdump kernel, which
> > > > > > > > > re-creates the SMMU Stream table. New transactions will be translated
> > > > > > > > > via this new table.
> > > > > > > > >
> > > > > > > > > There are scenarios where devices still have old pending
> > > > > > > > > transactions (configured in the primary kernel). These transactions
> > > > > > > > > come in between Stream table creation and device-driver probe.
> > > > > > > > > As the new Stream table does not have entries for older transactions,
> > > > > > > > > they will be aborted by the SMMU.
> > > > > > > > >
> > > > > > > > > Similar observations were found with the Intel 82576 Gigabit PCIe
> > > > > > > > > Network card. It sends old Memory Read transactions in the kdump kernel.
> > > > > > > > > Transactions configured for older Stream table entries that no longer
> > > > > > > > > exist in the new table will cause a PCIe Completion Abort.
> > > > > > > >
> > > > > > > > That sounds like exactly what we want, doesn't it?
> > > > > > > >
> > > > > > > > Or do you *want* DMA from the previous kernel to complete?  That will
> > > > > > > > read or scribble on something, but maybe that's not terrible as long
> > > > > > > > as it's not memory used by the kdump kernel.
> > > > > > >
> > > > > > > Yes, the abort should happen. But it should happen in context of driver.
> > > > > > > The current abort is happening because of the SMMU, with no driver/PCIe
> > > > > > > setup present at this moment.
> > > > > >
> > > > > > I don't understand what you mean by "in context of driver."  The whole
> > > > > > problem is that we can't control *when* the abort happens, so it may
> > > > > > happen in *any* context.  It may happen when a NIC receives a packet
> > > > > > or at some other unpredictable time.
> > > > > >
> > > > > > > The solution to this issue should be at two levels:
> > > > > > > a) SMMU level: I still believe this patch has the potential to work
> > > > > > > around the issue until the driver's probe finally takes over.
> > > > > > > b) Device level: even if something goes wrong, the driver/device should
> > > > > > > be able to recover.
> > > > > > >
> > > > > > > > > The returned PCIe Completion Abort further leads to AER errors from the
> > > > > > > > > APEI Generic Hardware Error Source (GHES) with completion timeout.
> > > > > > > > > The network device hangs even after repeated reset/recovery by the
> > > > > > > > > driver, so the device is no longer usable.
> > > > > > > >
> > > > > > > > The fact that the device is no longer usable is definitely a problem.
> > > > > > > > But in principle we *should* be able to recover from these errors.  If
> > > > > > > > we could recover and reliably use the device after the error, that
> > > > > > > > seems like it would be a more robust solution than having to add
> > > > > > > > special cases in every IOMMU driver.
> > > > > > > >
> > > > > > > > If you have details about this sort of error, I'd like to try to fix
> > > > > > > > it because we want to recover from that sort of error in normal
> > > > > > > > (non-crash) situations as well.
> > > > > > > >
> > > > > > > The Completion Abort case should be handled gracefully, and the device
> > > > > > > should always remain usable.
> > > > > > >
> > > > > > > There are two scenarios I am testing with an Intel 82576 Gigabit PCIe
> > > > > > > network card.
> > > > > > >
> > > > > > > I)  Crash testing using the kdump root file system: the de facto scenario
> > > > > > >     -  The kdump file system does not have the Ethernet driver.
> > > > > > >     -  There are a lot of AER prints [1], making it impossible to work
> > > > > > >        in the shell of the kdump root file system.
> > > > > >
> > > > > > In this case, I think report_error_detected() is deciding that because
> > > > > > the device has no driver, we can't do anything.  The flow is like
> > > > > > this:
> > > > > >
> > > > > >   aer_recover_work_func               # aer_recover_work
> > > > > >     kfifo_get(aer_recover_ring, entry)
> > > > > >     dev = pci_get_domain_bus_and_slot
> > > > > >     cper_print_aer(dev, ...)
> > > > > >       pci_err("AER: aer_status:")
> > > > > >       pci_err("AER:   [14] CmpltTO")
> > > > > >       pci_err("AER: aer_layer=")
> > > > > >     if (AER_NONFATAL)
> > > > > >       pcie_do_recovery(dev, pci_channel_io_normal)
> > > > > >         status = CAN_RECOVER
> > > > > >         pci_walk_bus(report_normal_detected)
> > > > > >           report_error_detected
> > > > > >             if (!dev->driver)
> > > > > >               vote = NO_AER_DRIVER
> > > > > >               pci_info("can't recover (no error_detected callback)")
> > > > > >             *result = merge_result(*, NO_AER_DRIVER)
> > > > > >             # always NO_AER_DRIVER
> > > > > >         status is now NO_AER_DRIVER
> > > > > >
> > > > > > So pcie_do_recovery() does not call .report_mmio_enabled() or .slot_reset(),
> > > > > > and status is not RECOVERED, so it skips .resume().
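The vote merging in that call flow can be reproduced in user space. The function below follows the rules of merge_result() in drivers/pci/pcie/err.c as read from the v5.7-era source (worth re-checking against the tree in question); the enum is a local copy with made-up values, and device_vote() ignores the bridge special case. It shows why a single driverless function makes the whole walk end in NO_AER_DRIVER, whatever the other devices vote.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the result codes; names follow pci_ers_result_t, values arbitrary. */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
	ERS_NO_AER_DRIVER,
};

/* Same merging rules as merge_result() (as read from the v5.7-era code). */
static enum ers_result merge_vote(enum ers_result orig, enum ers_result new_vote)
{
	if (new_vote == ERS_NO_AER_DRIVER)
		return ERS_NO_AER_DRIVER;	/* overrides whatever came before */
	if (new_vote == ERS_NONE)
		return orig;			/* abstain */
	switch (orig) {
	case ERS_CAN_RECOVER:
	case ERS_RECOVERED:
		orig = new_vote;
		break;
	case ERS_DISCONNECT:
		if (new_vote == ERS_NEED_RESET)
			orig = ERS_NEED_RESET;
		break;
	default:				/* NO_AER_DRIVER and NEED_RESET stick */
		break;
	}
	return orig;
}

/* One device's vote in report_error_detected(): no driver -> NO_AER_DRIVER. */
static enum ers_result device_vote(int has_driver, enum ers_result driver_vote)
{
	return has_driver ? driver_vote : ERS_NO_AER_DRIVER;
}

/* Visit n devices in order, starting from CAN_RECOVER like pcie_do_recovery(). */
static enum ers_result walk_votes(const int *has_driver,
				  const enum ers_result *votes, size_t n)
{
	enum ers_result status = ERS_CAN_RECOVER;

	assert(n == 0 || (has_driver != NULL && votes != NULL));
	for (size_t i = 0; i < n; i++)
		status = merge_vote(status, device_vote(has_driver[i], votes[i]));
	return status;
}
```

Once a driverless endpoint has been visited, status can no longer become CAN_RECOVER or NEED_RESET, so the later callbacks are skipped no matter what order the walk takes.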
> > > > > >
> > > > > > I don't remember the history there, but if a device has no driver and
> > > > > > the device generates errors, it seems like we ought to be able to
> > > > > > reset it.
> > > > >
> > > > > But how can the device be reset, given that there is no driver?
> > > > > Ideally, the PCIe subsystem should handle this case by performing
> > > > > a reset at the PCIe level.
> > > >
> > > > I don't understand your question.  The PCI core (not the device
> > > > driver) already does the reset.  When pcie_do_recovery() calls
> > > > reset_link(), all devices on the other side of the link are reset.
> > > >
> > > > > > We should be able to field one (or a few) AER errors, reset the
> > > > > > device, and you should be able to use the shell in the kdump kernel.
> > > > > >
> > > > > Here the kdump shell is usable; the only problem is the large number
> > > > > of AER errors. One cannot see what one is typing.
> > > >
> > > > Right, that's what I expect.  If the PCI core resets the device, you
> > > > should get just a few AER errors, and they should stop after the
> > > > device is reset.
> > > >
> > > > > > >     -  Note: the kdump shell allows the use of the makedumpfile and
> > > > > > >        vmcore-dmesg applications.
> > > > > > >
> > > > > > > II) Crash testing using the default root file system: a specific case
> > > > > > > to test the Ethernet driver in the second kernel
> > > > > > >    -  The default root file system has the Ethernet driver.
> > > > > > >    -  AER errors come even before the driver probe starts.
> > > > > > >    -  The driver resets the Ethernet card as part of probe, but without success.
> > > > > > >    -  AER also tries to recover, but without success. [2]
> > > > > > >    -  I also tried to suppress the AER errors by using the "pci=noaer"
> > > > > > >       boot argument and commenting out ghes_handle_aer() in the GHES
> > > > > > >       driver. Then a different set of errors appears, which also never
> > > > > > >       recovers [3].
> > > > > > >
> > > > >
> > > > > Please share your view on this case. Here the driver is present
> > > > > (drivers/net/ethernet/intel/igb/igb_main.c).
> > > > > In this case AER errors start even before the driver probe starts.
> > > > > After probe, the driver resets the device without success, and even
> > > > > AER recovery does not work.
> > > >
> > > > This case should be the same as the one above.  If we can change the
> > > > PCI core so it can reset the device when there's no driver,  that would
> > > > apply to case I (where there will never be a driver) and to case II
> > > > (where there is no driver now, but a driver will probe the device
> > > > later).
> > >
> > > Does this mean changes are required in the PCI core?
> >
> > Yes, I am suggesting that the PCI core does not do the right thing
> > here.
> >
> > > I tried the following changes in pcie_do_recovery(), but they did not
> > > help. Same error as before.
> > >
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > >         pci_info(dev, "broadcast resume message\n");
> > >         pci_walk_bus(bus, report_resume, &status);
> > > @@ -203,7 +207,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> > >         return status;
> > >
> > >  failed:
> > >         pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> > > +       pci_reset_function(dev);
> > > +       pci_aer_clear_device_status(dev);
> > > +       pci_aer_clear_nonfatal_status(dev);
> >
> > Did you confirm that this resets the devices in question (0000:09:00.0
> > and 0000:09:00.1, I think), and what reset mechanism this uses (FLR,
> > PM, etc)?
> 
> Earlier, the reset was being applied to the P2P bridge (0000:00:09.0);
> that is why it had no effect. After making the following changes, both
> devices are now getting reset.
> Both devices are using FLR.
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 117c0a2b2ba4..26b908f55aef 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -66,6 +66,20 @@ static int report_error_detected(struct pci_dev *dev,
>                 if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
>                         vote = PCI_ERS_RESULT_NO_AER_DRIVER;
>                         pci_info(dev, "can't recover (no error_detected callback)\n");
> +
> +                       pci_save_state(dev);
> +                       pci_cfg_access_lock(dev);
> +
> +                       /* Quiesce the device completely */
> +                       pci_write_config_word(dev, PCI_COMMAND,
> +                             PCI_COMMAND_INTX_DISABLE);
> +                       if (!__pci_reset_function_locked(dev)) {
> +                               vote = PCI_ERS_RESULT_RECOVERED;
> +                               pci_info(dev, "recovered via pci level reset\n");
> +                       }

Why do we need to save the state and quiesce the device?  The reset
should disable interrupts anyway.  In this particular case where
there's no driver, I don't think we should have to restore the state.
We maybe should *remove* the device and re-enumerate it after the
reset, but the state from before the reset should be irrelevant.

> +                       pci_cfg_access_unlock(dev);
> +                       pci_restore_state(dev);
>                 } else {
>                         vote = PCI_ERS_RESULT_NONE;
>                 }
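To make the state question concrete: the APEI records in this thread show the endpoint's Command register as 0x0507 (INTx disable, SERR#, Bus Master, Memory and I/O enabled) and the Root Port's as 0x0106. If that value is saved before the reset and written back afterwards, Bus Master comes back on for a device that has no driver. A user-space sketch that models only the Command register; the bit values follow include/uapi/linux/pci_regs.h, while the struct, and the assumptions that FLR leaves Command at zero and that the restore writes the saved Command value back, are simplifications (the real save/restore covers much more of config space).

```c
#include <assert.h>
#include <stdint.h>

/* Command register bits, values as in include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND_IO           0x0001	/* I/O space enable */
#define PCI_COMMAND_MEMORY       0x0002	/* memory space enable */
#define PCI_COMMAND_MASTER       0x0004	/* bus master (DMA) enable */
#define PCI_COMMAND_SERR         0x0100	/* SERR# enable */
#define PCI_COMMAND_INTX_DISABLE 0x0400	/* INTx emulation disable */

/* Hypothetical device: only the Command register is modeled. */
struct toy_pci_dev {
	uint16_t command;	/* live Command register */
	uint16_t saved_command;	/* what the save step captured */
};

static int bus_master_enabled(const struct toy_pci_dev *d)
{
	return (d->command & PCI_COMMAND_MASTER) != 0;
}

/* The sequence from the experimental patch: save, quiesce, reset, restore. */
static void quiesce_reset_restore(struct toy_pci_dev *d, int restore)
{
	d->saved_command = d->command;		/* like pci_save_state() */
	d->command = PCI_COMMAND_INTX_DISABLE;	/* the "quiesce" write */
	assert(!bus_master_enabled(d));		/* DMA is off at this point */
	d->command = 0;				/* assumption: FLR clears Command */
	if (restore)
		d->command = d->saved_command;	/* like pci_restore_state() */
}
```

With restore, the device ends up exactly where it started (Bus Master on); without it, the reset leaves DMA disabled, which is all a driverless device needs.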
> 
> To handle case 2 (the driver probes some time later), the following code
> needs to be added to avoid a crash during igb_probe(). It looks like a
> race condition between AER and igb_probe().
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index b46bff8fe056..c48f0a54bb95 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3012,6 +3012,11 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         /* Catch broken hardware that put the wrong VF device ID in
>          * the PCIe SR-IOV capability.
>          */
> +       if (pci_dev_trylock(pdev)) {
> +               mdelay(1000);
> +               pci_info(pdev,"device is locked, try waiting 1 sec\n");
> +       }

This is interesting to learn about the AER/driver interaction, but of
course, we wouldn't want to add code like this permanently.
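A note on the return-value sense in the igb experiment: pci_dev_trylock() returns nonzero when it acquires the device lock, so the delay and the "device is locked" message run exactly when the lock was free, and in the hunk as quoted nothing releases it afterwards. A user-space sketch of a bounded "wait until nobody holds it" loop, with a C11 atomic_flag standing in for the device lock; the struct and helper names are hypothetical and none of this is a kernel API.

```c
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for a device lock (assumption): flag set == lock held. */
struct dev_lock {
	atomic_flag held;
};

/* True when the lock was acquired, the same sense as pci_dev_trylock() == 1. */
static bool dev_trylock(struct dev_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void dev_unlock(struct dev_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Hypothetical helper: poll until nobody else holds the lock, then drop it
 * again so the caller can continue. Returns the attempt on which the lock
 * was free, or -1 if it stayed busy for all max_tries attempts.
 */
static int wait_until_unlocked(struct dev_lock *l, int max_tries, long delay_ns)
{
	assert(max_tries > 0 && delay_ns >= 0 && delay_ns < 1000000000L);
	for (int i = 1; i <= max_tries; i++) {
		if (dev_trylock(l)) {
			dev_unlock(l);	/* only checking; never leave it held */
			return i;
		}
		struct timespec ts = { .tv_sec = 0, .tv_nsec = delay_ns };
		nanosleep(&ts, NULL);
	}
	return -1;
}
```

Even written this way, a poll narrows the race rather than closing it: the lock can be taken again right after the helper returns.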

> Here are the observations with all of the above changes:
> A) There are fewer AER errors, but they still occur in both case 1 (no
> driver at all) and case 2 (driver probes some time later).

We'll certainly get *some* AER errors.  We have to get one before we
know to reset the device.

> B) Each AER error (NON_FATAL) causes both devices to be reset. This happens many times.

I'm not sure why we reset both devices.  Are we seeing errors from
both, or could we be more selective in the code?
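One possible reading, sketched as a toy model: if the recovery walk covers every device below Root Port 00:09.0 (pci_walk_bus() in the call flow quoted earlier visits each device on a bus and its subordinates), then a reset done inside the per-device callback hits both 09:00.0 and 09:00.1 on every pass, even if only one of them reported the error. The struct, the callbacks and the "reported_error" flag are hypothetical stand-ins for struct pci_dev and for whatever AER status check a more selective version would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device on bus 09 (see the lspci tree in this message). */
struct toy_dev {
	unsigned int devfn;	/* device << 3 | function */
	bool has_driver;
	bool reported_error;
	unsigned int resets;
};

typedef void (*walk_cb)(struct toy_dev *dev, void *data);

/* Like pci_walk_bus(): call cb for every device on the bus. */
static void toy_walk_bus(struct toy_dev *devs, size_t n, walk_cb cb, void *data)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		cb(&devs[i], data);
}

/* As in the experimental patch: reset any driverless device the walk reaches. */
static void reset_if_no_driver(struct toy_dev *dev, void *data)
{
	(void)data;
	if (!dev->has_driver)
		dev->resets++;
}

/* Hypothetical, more selective variant: only reset devices that logged an error. */
static void reset_if_reported(struct toy_dev *dev, void *data)
{
	(void)data;
	if (!dev->has_driver && dev->reported_error)
		dev->resets++;
}
```

With the first callback both functions are reset once per error; with the second only 09:00.0 is.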

> C) After that, the AER errors [1] come only from device 0000:09:00.0.
> This is strange, as this PCI device is not being used during the test.
> Ping/ssh are running over 0000:09:00.1.
> D) After waiting some more time, there are no more AER errors from any device.
> E) Ping works fine in case 2.
> 
> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 
> # lspci -t -v
> 
>  \-[0000:00]-+-00.0  Cavium, Inc. CN99xx [ThunderX2] Integrated PCI Host bridge
>              +-01.0-[01]--
>              +-02.0-[02]--
>              +-03.0-[03]--
>              +-04.0-[04]--
>              +-05.0-[05]--+-00.0  Broadcom Inc. and subsidiaries BCM57840 NetXtreme II 10 Gigabit Ethernet
>              |            \-00.1  Broadcom Inc. and subsidiaries BCM57840 NetXtreme II 10 Gigabit Ethernet
>              +-06.0-[06]--
>              +-07.0-[07]--
>              +-08.0-[08]--
>              +-09.0-[09-0a]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
>              |               \-00.1  Intel Corporation 82576 Gigabit Network Connection
> 
> 
> [1] AER errors reported for 09:00.0:
> 
> [   81.659825] {7}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> [   81.668080] {7}[Hardware Error]: It has been corrected by h/w and requires no further action
> [   81.676503] {7}[Hardware Error]: event severity: corrected
> [   81.681975] {7}[Hardware Error]:  Error 0, type: corrected
> [   81.687447] {7}[Hardware Error]:   section_type: PCIe error
> [   81.693004] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.698908] {7}[Hardware Error]:   version: 3.0
> [   81.703424] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.709589] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.715059] {7}[Hardware Error]:   slot: 0
> [   81.719141] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.724265] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.730864] {7}[Hardware Error]:   class_code: 000002
> [   81.735901] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.742587] {7}[Hardware Error]:  Error 1, type: corrected
> [   81.748058] {7}[Hardware Error]:   section_type: PCIe error
> [   81.753615] {7}[Hardware Error]:   port_type: 4, root port
> [   81.759086] {7}[Hardware Error]:   version: 3.0
> [   81.763602] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   81.769767] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   81.775237] {7}[Hardware Error]:   slot: 0
> [   81.779319] {7}[Hardware Error]:   secondary_bus: 0x09
> [   81.784442] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   81.791041] {7}[Hardware Error]:   class_code: 000406
> [   81.796078] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   81.803806] {7}[Hardware Error]:  Error 2, type: corrected
> [   81.809276] {7}[Hardware Error]:   section_type: PCIe error
> [   81.814834] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.820738] {7}[Hardware Error]:   version: 3.0
> [   81.825254] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.831419] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.836889] {7}[Hardware Error]:   slot: 0
> [   81.840971] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.846094] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.852693] {7}[Hardware Error]:   class_code: 000002
> [   81.857730] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.864416] {7}[Hardware Error]:  Error 3, type: corrected
> [   81.869886] {7}[Hardware Error]:   section_type: PCIe error
> [   81.875444] {7}[Hardware Error]:   port_type: 4, root port
> [   81.880914] {7}[Hardware Error]:   version: 3.0
> [   81.885430] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   81.891595] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   81.897066] {7}[Hardware Error]:   slot: 0
> [   81.901147] {7}[Hardware Error]:   secondary_bus: 0x09
> [   81.906271] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   81.912870] {7}[Hardware Error]:   class_code: 000406
> [   81.917906] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   81.925634] {7}[Hardware Error]:  Error 4, type: corrected
> [   81.931104] {7}[Hardware Error]:   section_type: PCIe error
> [   81.936662] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.942566] {7}[Hardware Error]:   version: 3.0
> [   81.947082] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.953247] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.958717] {7}[Hardware Error]:   slot: 0
> [   81.962799] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.967923] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.974522] {7}[Hardware Error]:   class_code: 000002
> [   81.979558] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.986244] {7}[Hardware Error]:  Error 5, type: corrected
> [   81.991715] {7}[Hardware Error]:   section_type: PCIe error
> [   81.997272] {7}[Hardware Error]:   port_type: 4, root port
> [   82.002743] {7}[Hardware Error]:   version: 3.0
> [   82.007259] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.013424] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.018894] {7}[Hardware Error]:   slot: 0
> [   82.022976] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.028099] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.034698] {7}[Hardware Error]:   class_code: 000406
> [   82.039735] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.047463] {7}[Hardware Error]:  Error 6, type: corrected
> [   82.052933] {7}[Hardware Error]:   section_type: PCIe error
> [   82.058491] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.064395] {7}[Hardware Error]:   version: 3.0
> [   82.068911] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.075076] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.080547] {7}[Hardware Error]:   slot: 0
> [   82.084628] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.089752] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.096351] {7}[Hardware Error]:   class_code: 000002
> [   82.101387] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.108073] {7}[Hardware Error]:  Error 7, type: corrected
> [   82.113544] {7}[Hardware Error]:   section_type: PCIe error
> [   82.119101] {7}[Hardware Error]:   port_type: 4, root port
> [   82.124572] {7}[Hardware Error]:   version: 3.0
> [   82.129087] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.135252] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.140723] {7}[Hardware Error]:   slot: 0
> [   82.144805] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.149928] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.156527] {7}[Hardware Error]:   class_code: 000406
> [   82.161564] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.169291] {7}[Hardware Error]:  Error 8, type: corrected
> [   82.174762] {7}[Hardware Error]:   section_type: PCIe error
> [   82.180319] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.186224] {7}[Hardware Error]:   version: 3.0
> [   82.190739] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.196904] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.202375] {7}[Hardware Error]:   slot: 0
> [   82.206456] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.211580] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.218179] {7}[Hardware Error]:   class_code: 000002
> [   82.223216] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.229901] {7}[Hardware Error]:  Error 9, type: corrected
> [   82.235372] {7}[Hardware Error]:   section_type: PCIe error
> [   82.240929] {7}[Hardware Error]:   port_type: 4, root port
> [   82.246400] {7}[Hardware Error]:   version: 3.0
> [   82.250916] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.257081] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.262551] {7}[Hardware Error]:   slot: 0
> [   82.266633] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.271756] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.278355] {7}[Hardware Error]:   class_code: 000406
> [   82.283392] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.291119] {7}[Hardware Error]:  Error 10, type: corrected
> [   82.296676] {7}[Hardware Error]:   section_type: PCIe error
> [   82.302234] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.308138] {7}[Hardware Error]:   version: 3.0
> [   82.312654] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.318819] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.324290] {7}[Hardware Error]:   slot: 0
> [   82.328371] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.333495] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.340094] {7}[Hardware Error]:   class_code: 000002
> [   82.345131] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.351816] {7}[Hardware Error]:  Error 11, type: corrected
> [   82.357374] {7}[Hardware Error]:   section_type: PCIe error
> [   82.362931] {7}[Hardware Error]:   port_type: 4, root port
> [   82.368402] {7}[Hardware Error]:   version: 3.0
> [   82.372917] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.379082] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.384553] {7}[Hardware Error]:   slot: 0
> [   82.388635] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.393758] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.400357] {7}[Hardware Error]:   class_code: 000406
> [   82.405394] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.413121] {7}[Hardware Error]:  Error 12, type: corrected
> [   82.418678] {7}[Hardware Error]:   section_type: PCIe error
> [   82.424236] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.430140] {7}[Hardware Error]:   version: 3.0
> [   82.434656] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.440821] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.446291] {7}[Hardware Error]:   slot: 0
> [   82.450373] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.455497] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.462096] {7}[Hardware Error]:   class_code: 000002
> [   82.467132] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.473818] {7}[Hardware Error]:  Error 13, type: corrected
> [   82.479375] {7}[Hardware Error]:   section_type: PCIe error
> [   82.484933] {7}[Hardware Error]:   port_type: 4, root port
> [   82.490403] {7}[Hardware Error]:   version: 3.0
> [   82.494919] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.501084] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.506555] {7}[Hardware Error]:   slot: 0
> [   82.510636] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.515760] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.522359] {7}[Hardware Error]:   class_code: 000406
> [   82.527395] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.535171] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.542476] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.550301] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.558032] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.566296] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.573597] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.581421] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.589151] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.597411] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.604711] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.612535] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.620271] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.628525] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.635826] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.643649] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.651385] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.659645] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.666940] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.674763] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.682498] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.690759] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.698053] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.705876] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.713612] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.721872] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.729167] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.736990] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.744725] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   88.059225] {8}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> [   88.067478] {8}[Hardware Error]: It has been corrected by h/w and requires no further action
> [   88.075899] {8}[Hardware Error]: event severity: corrected
> [   88.081370] {8}[Hardware Error]:  Error 0, type: corrected
> [   88.086841] {8}[Hardware Error]:   section_type: PCIe error
> [   88.092399] {8}[Hardware Error]:   port_type: 0, PCIe end point
> [   88.098303] {8}[Hardware Error]:   version: 3.0
> [   88.102819] {8}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   88.108984] {8}[Hardware Error]:   device_id: 0000:09:00.0
> [   88.114455] {8}[Hardware Error]:   slot: 0
> [   88.118536] {8}[Hardware Error]:   secondary_bus: 0x00
> [   88.123660] {8}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   88.130259] {8}[Hardware Error]:   class_code: 000002
> [   88.135296] {8}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   88.141981] {8}[Hardware Error]:  Error 1, type: corrected
> [   88.147452] {8}[Hardware Error]:   section_type: PCIe error
> [   88.153009] {8}[Hardware Error]:   port_type: 4, root port
> [   88.158480] {8}[Hardware Error]:   version: 3.0
> [   88.162995] {8}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   88.169161] {8}[Hardware Error]:   device_id: 0000:00:09.0
> [   88.174633] {8}[Hardware Error]:   slot: 0
> [   88.180018] {8}[Hardware Error]:   secondary_bus: 0x09
> [   88.185142] {8}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   88.191914] {8}[Hardware Error]:   class_code: 000406
> [   88.196951] {8}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   88.204852] {8}[Hardware Error]:  Error 2, type: corrected
> [   88.210323] {8}[Hardware Error]:   section_type: PCIe error
> [   88.215881] {8}[Hardware Error]:   port_type: 0, PCIe end point
> [   88.221786] {8}[Hardware Error]:   version: 3.0
> [   88.226301] {8}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   88.232466] {8}[Hardware Error]:   device_id: 0000:09:00.0
> [   88.237937] {8}[Hardware Error]:   slot: 0
> [   88.242019] {8}[Hardware Error]:   secondary_bus: 0x00
> [   88.247142] {8}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   88.253741] {8}[Hardware Error]:   class_code: 000002
> [   88.258778] {8}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   88.265509] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   88.272812] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   88.280635] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   88.288363] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   88.296622] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   88.305391] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> 
> > Case I is using APEI, and it looks like that can queue up 16 errors
> > (AER_RECOVER_RING_SIZE), so that queue could be completely full before
> > we even get a chance to reset the device.  But I would think that the
> > reset should *eventually* stop the errors, even though we might log
> > 30+ of them first.
> >
> > As an experiment, you could reduce AER_RECOVER_RING_SIZE to 1 or 2 and
> > see if it reduces the logging.
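The queueing effect described above can be sketched with a bounded queue: if a burst of errors arrives before the recovery worker runs, every record that fits in the ring gets printed, even though the reset triggered while handling the first one may already have silenced the device. A hypothetical model that tracks only occupancy; cap stands in for AER_RECOVER_RING_SIZE, and dropping records when the ring is full is an assumption about the overflow path.

```c
#include <assert.h>

/* Hypothetical bounded error queue; only occupancy is modeled. */
struct err_ring {
	unsigned int cap;	/* stands in for AER_RECOVER_RING_SIZE */
	unsigned int count;
	unsigned int dropped;
};

/* Producer side (firmware/GHES path): queue one error record. */
static int ring_put(struct err_ring *r)
{
	if (r->count == r->cap) {
		r->dropped++;	/* assumption: a full ring drops the record */
		return 0;
	}
	r->count++;
	return 1;
}

/* Consumer side (recovery worker): take one record. */
static int ring_get(struct err_ring *r)
{
	if (r->count == 0)
		return 0;
	r->count--;
	return 1;
}

/*
 * `burst` errors arrive before the worker runs. A reset done while handling
 * the first record cannot un-queue the others, so each queued record is
 * still printed. Returns how many records get logged.
 */
static unsigned int records_logged(unsigned int cap, unsigned int burst)
{
	struct err_ring r = { .cap = cap };
	unsigned int logged = 0;

	assert(cap > 0);
	for (unsigned int i = 0; i < burst; i++)
		ring_put(&r);
	while (ring_get(&r))
		logged++;
	return logged;
}
```

In this model the number of records printed per burst is min(cap, burst), which is why shrinking the ring to 1 or 2 would be expected to cut the logging sharply.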
> 
> I did not try this experiment. I believe it is not required now.
> 
> --pk
> 
> >
> > > > > The problems mentioned in cases I and II go away if pci_reset_function()
> > > > > is called during the enumeration phase of the kdump kernel.
> > > > > Could we consider calling pci_reset_function() for all devices in the
> > > > > kdump kernel, or adding a device-specific quirk?
> > > > >
> > > > > --pk
> > > > >
> > > > >
> > > > > > > As I understand it, the possible solutions are:
> > > > > > >  - Copy the SMMU table, i.e. this patch
> > > > > > > OR
> > > > > > >  - Call pci_reset_function() during the enumeration phase.
> > > > > > > I also tried clearing the "M" (Bus Master) bit using pci_clear_master()
> > > > > > > during enumeration, but it did not help, because the driver sets the
> > > > > > > M bit again, causing the same AER error.
> > > > > > >
> > > > > > >
> > > > > > > -pk
> > > > > > >
> > > > > > > ---------------------------------------------------------------------------------------------------------------------------
> > > > > > > [1] with pci=noaer in the boot arguments
> > > > > > >
> > > > > > > [   22.494648] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > > > [   22.512773] {4}[Hardware Error]: event severity: recoverable
> > > > > > > [   22.518419] {4}[Hardware Error]:  Error 0, type: recoverable
> > > > > > > [   22.544804] {4}[Hardware Error]:   section_type: PCIe error
> > > > > > > [   22.550363] {4}[Hardware Error]:   port_type: 0, PCIe end point
> > > > > > > [   22.556268] {4}[Hardware Error]:   version: 3.0
> > > > > > > [   22.560785] {4}[Hardware Error]:   command: 0x0507, status: 0x4010
> > > > > > > [   22.576852] {4}[Hardware Error]:   device_id: 0000:09:00.1
> > > > > > > [   22.582323] {4}[Hardware Error]:   slot: 0
> > > > > > > [   22.586406] {4}[Hardware Error]:   secondary_bus: 0x00
> > > > > > > [   22.591530] {4}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> > > > > > > [   22.608900] {4}[Hardware Error]:   class_code: 000002
> > > > > > > [   22.613938] {4}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> > > > > > > [   22.803534] pci 0000:09:00.1: AER: aer_status: 0x00004000, aer_mask: 0x00000000
> > > > > > > [   22.810838] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > > > [   22.817613] pci 0000:09:00.1: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
> > > > > > > [   22.847374] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > > > [   22.866161] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8153768 kB)
> > > > > > > [   22.946178] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> > > > > > > [   22.995142] pci 0000:09:00.1: AER: can't recover (no error_detected callback)
> > > > > > > [   23.002300] pcieport 0000:00:09.0: AER: device recovery failed
> > > > > > > [   23.027607] pci 0000:09:00.1: AER: aer_status: 0x00004000, aer_mask: 0x00000000
> > > > > > > [   23.044109] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > > > [   23.060713] pci 0000:09:00.1: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
> > > > > > > [   23.068616] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > > > [   23.122056] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> >
> > <snip>
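The register values in these logs can be decoded by hand. In the pci=noaer log, aer_status 0x00004000 is uncorrectable bit 14, Completion Timeout (the "[14] CmpltTO" line), and that bit is clear in aer_uncor_severity 0x00062011, so the error is non-fatal, matching "event severity: recoverable". In the corrected-error log, 0x00002000 is correctable bit 13 (Advisory Non-Fatal Error), and the same bit is set in aer_mask. A small decoder; the two constants match include/uapi/linux/pci_regs.h, the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNC_COMP_TIME 0x00004000u	/* uncorrectable bit 14: Completion Timeout */
#define PCI_ERR_COR_ADV_NFAT  0x00002000u	/* correctable bit 13: Advisory Non-Fatal */

/* Index of the lowest set bit, or -1 if none (the "[14]" in the log). */
static int first_set_bit(uint32_t v)
{
	for (int i = 0; i < 32; i++)
		if (v & (1u << i))
			return i;
	return -1;
}

/* An uncorrectable error is fatal only if its bit is set in the severity register. */
static bool unc_error_is_fatal(uint32_t status_bit, uint32_t uncor_severity)
{
	assert(status_bit != 0 && (status_bit & (status_bit - 1)) == 0); /* one bit */
	return (uncor_severity & status_bit) != 0;
}

/* A correctable error bit that is also set in the mask is masked from reporting. */
static bool cor_error_masked(uint32_t cor_status, uint32_t cor_mask, uint32_t bit)
{
	return (cor_status & bit) && (cor_mask & bit);
}
```

Applied to the logs: first_set_bit(0x00004000) is 14, the severity check for 0x00062011 comes out non-fatal, and the 0x00002000 correctable status is masked.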


* Re: [PATCH net] net: dsa: sja1105: fix port mirroring for P/Q/R/S
From: kbuild test robot @ 2020-05-29 19:33 UTC (permalink / raw)
  To: kbuild-all
In-Reply-To: <20200527164006.1080903-1-olteanv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6546 bytes --]

Hi Vladimir,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on sparc-next/master linus/master v5.7-rc7 next-20200529]
[cannot apply to net/master]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch; please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Vladimir-Oltean/net-dsa-sja1105-fix-port-mirroring-for-P-Q-R-S/20200528-004418
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git dc0f3ed1973f101508957b59e529e03da1349e09
config: parisc-allyesconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

drivers/net/dsa/sja1105/sja1105_static_config.c:105:8: warning: no previous prototype for 'sja1105pqrs_avb_params_entry_packing' [-Wmissing-prototypes]
105 | size_t sja1105pqrs_avb_params_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/dsa/sja1105/sja1105_static_config.c:149:8: warning: no previous prototype for 'sja1105pqrs_general_params_entry_packing' [-Wmissing-prototypes]
149 | size_t sja1105pqrs_general_params_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:198:8: warning: no previous prototype for 'sja1105_l2_forwarding_entry_packing' [-Wmissing-prototypes]
198 | size_t sja1105_l2_forwarding_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:230:8: warning: no previous prototype for 'sja1105pqrs_l2_lookup_params_entry_packing' [-Wmissing-prototypes]
230 | size_t sja1105pqrs_l2_lookup_params_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:252:8: warning: no previous prototype for 'sja1105et_l2_lookup_entry_packing' [-Wmissing-prototypes]
252 | size_t sja1105et_l2_lookup_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:266:8: warning: no previous prototype for 'sja1105pqrs_l2_lookup_entry_packing' [-Wmissing-prototypes]
266 | size_t sja1105pqrs_l2_lookup_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:342:8: warning: no previous prototype for 'sja1105pqrs_mac_config_entry_packing' [-Wmissing-prototypes]
342 | size_t sja1105pqrs_mac_config_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:461:8: warning: no previous prototype for 'sja1105_vl_lookup_entry_packing' [-Wmissing-prototypes]
461 | size_t sja1105_vl_lookup_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:511:8: warning: no previous prototype for 'sja1105_vlan_lookup_entry_packing' [-Wmissing-prototypes]
511 | size_t sja1105_vlan_lookup_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:542:8: warning: no previous prototype for 'sja1105_retagging_entry_packing' [-Wmissing-prototypes]
542 | size_t sja1105_retagging_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
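-Wmissing-prototypes (enabled in this W=1 build) fires when a function with external linkage is defined without a prototype already in scope. The two usual remedies are sketched below with hypothetical names: demo_entry_packing() stands in for the *_entry_packing() functions, and a plain int replaces enum packing_op. Which remedy fits each sja1105 function depends on whether it is used outside sja1105_static_config.c.

```c
#include <stddef.h>

/* Remedy 1: declare the prototype before the definition. In a real tree
 * it would live in a header included by both the defining .c file and its
 * callers, so gcc sees a previous prototype and stays quiet. */
size_t demo_entry_packing(void *buf, void *entry_ptr, int op);

/* Remedy 2: a helper used only in this file gets internal linkage.
 * -Wmissing-prototypes does not warn about static functions. */
static size_t demo_local_helper(size_t size)
{
	return size;
}

size_t demo_entry_packing(void *buf, void *entry_ptr, int op)
{
	(void)buf;       /* unused in this sketch */
	(void)entry_ptr;
	(void)op;
	return demo_local_helper(8);  /* hypothetical entry size in bytes */
}
```

If a function is called from other files, the prototype belongs in a shared header; if not, marking it static is the simpler fix.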

vim +/sja1105pqrs_general_params_entry_packing +149 drivers/net/dsa/sja1105/sja1105_static_config.c

   145	
   146	/* TPID and TPID2 are intentionally reversed so that semantic
   147	 * compatibility with E/T is kept.
   148	 */
 > 149	size_t sja1105pqrs_general_params_entry_packing(void *buf, void *entry_ptr,
   150							enum packing_op op)
   151	{
   152		const size_t size = SJA1105PQRS_SIZE_GENERAL_PARAMS_ENTRY;
   153		struct sja1105_general_params_entry *entry = entry_ptr;
   154	
   155		sja1105_packing(buf, &entry->vllupformat, 351, 351, size, op);
   156		sja1105_packing(buf, &entry->mirr_ptacu,  350, 350, size, op);
   157		sja1105_packing(buf, &entry->switchid,    349, 347, size, op);
   158		sja1105_packing(buf, &entry->hostprio,    346, 344, size, op);
   159		sja1105_packing(buf, &entry->mac_fltres1, 343, 296, size, op);
   160		sja1105_packing(buf, &entry->mac_fltres0, 295, 248, size, op);
   161		sja1105_packing(buf, &entry->mac_flt1,    247, 200, size, op);
   162		sja1105_packing(buf, &entry->mac_flt0,    199, 152, size, op);
   163		sja1105_packing(buf, &entry->incl_srcpt1, 151, 151, size, op);
   164		sja1105_packing(buf, &entry->incl_srcpt0, 150, 150, size, op);
   165		sja1105_packing(buf, &entry->send_meta1,  149, 149, size, op);
   166		sja1105_packing(buf, &entry->send_meta0,  148, 148, size, op);
   167		sja1105_packing(buf, &entry->casc_port,   147, 145, size, op);
   168		sja1105_packing(buf, &entry->host_port,   144, 142, size, op);
   169		sja1105_packing(buf, &entry->mirr_port,   141, 139, size, op);
   170		sja1105_packing(buf, &entry->vlmarker,    138, 107, size, op);
   171		sja1105_packing(buf, &entry->vlmask,      106,  75, size, op);
   172		sja1105_packing(buf, &entry->tpid2,        74,  59, size, op);
   173		sja1105_packing(buf, &entry->ignore2stf,   58,  58, size, op);
   174		sja1105_packing(buf, &entry->tpid,         57,  42, size, op);
   175		sja1105_packing(buf, &entry->queue_ts,     41,  41, size, op);
   176		sja1105_packing(buf, &entry->egrmirrvid,   40,  29, size, op);
   177		sja1105_packing(buf, &entry->egrmirrpcp,   28,  26, size, op);
   178		sja1105_packing(buf, &entry->egrmirrdei,   25,  25, size, op);
   179		sja1105_packing(buf, &entry->replay_port,  24,  22, size, op);
   180		return size;
   181	}
   182	
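Each sja1105_packing() call above maps one struct field onto an inclusive bit range (start, end) of the size-byte entry buffer, in the direction given by op. A simplified user-space model of that idea follows; demo_packing() and enum demo_op are hypothetical, and the sketch assumes plain big-endian bit numbering (bit 0 is the LSB of the last byte). The driver's real helper goes through the kernel's packing() library with its own quirk flags, so exact byte positions may differ.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_op { DEMO_PACK, DEMO_UNPACK };

/*
 * Hypothetical, simplified bit-field packer (not lib/packing.c).
 * Bits are numbered across the whole len-byte buffer, bit 0 being the
 * least significant bit of the LAST byte. Copies bits start..end
 * (inclusive, start >= end) between *val and buf.
 * Returns 0 on success, -1 on a bad range.
 */
static int demo_packing(uint8_t *buf, uint64_t *val, int start, int end,
			size_t len, enum demo_op op)
{
	int width = start - end + 1;

	if (end < 0 || width < 1 || width > 64 || (size_t)(start / 8) >= len)
		return -1;

	if (op == DEMO_UNPACK)
		*val = 0;

	for (int i = 0; i < width; i++) {
		int bit = end + i;                       /* buffer bit index */
		size_t byte = len - 1 - (size_t)(bit / 8);
		uint8_t mask = (uint8_t)(1u << (bit % 8));

		if (op == DEMO_PACK) {
			if ((*val >> i) & 1)
				buf[byte] |= mask;
			else
				buf[byte] &= (uint8_t)~mask;
		} else if (buf[byte] & mask) {
			*val |= (uint64_t)1 << i;
		}
	}
	return 0;
}
```

In this model, a call like the listing's switchid line (bits 349..347) would move a 3-bit field, and packing followed by unpacking with the same range returns the original value.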

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 61666 bytes --]

* Re: [PATCH][v2] iommu: arm-smmu-v3: Copy SMMU table for kdump kernel
From: Bjorn Helgaas @ 2020-05-29 19:33 UTC
  To: Prabhakar Kushwaha
  Cc: Kuppuswamy Sathyanarayanan, Ganapatrao Prabhakerrao Kulkarni,
	Myron Stowe, Vijay Mohan Pandarathil, Marc Zyngier,
	Bhupesh Sharma, kexec mailing list, Robin Murphy, linux-pci,
	Prabhakar Kushwaha, Will Deacon, linux-arm-kernel
In-Reply-To: <CAJ2QiJKKSy20Z5oZ-yMb3AaioowBWC9ooQeQ+n+vXGLdiYKhgg@mail.gmail.com>

On Fri, May 29, 2020 at 07:48:10PM +0530, Prabhakar Kushwaha wrote:
> On Thu, May 28, 2020 at 1:48 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, May 27, 2020 at 05:14:39PM +0530, Prabhakar Kushwaha wrote:
> > > On Fri, May 22, 2020 at 4:19 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, May 21, 2020 at 09:28:20AM +0530, Prabhakar Kushwaha wrote:
> > > > > On Wed, May 20, 2020 at 4:52 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Thu, May 14, 2020 at 12:47:02PM +0530, Prabhakar Kushwaha wrote:
> > > > > > > On Wed, May 13, 2020 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > On Mon, May 11, 2020 at 07:46:06PM -0700, Prabhakar Kushwaha wrote:
> > > > > > > > > An SMMU Stream table is created by the primary kernel. This table is
> > > > > > > > > used by the SMMU to perform address translations for device-originated
> > > > > > > > > transactions. If a crash happens, the kdump kernel is launched, and it
> > > > > > > > > re-creates the SMMU Stream table. New transactions will be translated
> > > > > > > > > via this new table.
> > > > > > > > >
> > > > > > > > > There are scenarios where devices still have old pending
> > > > > > > > > transactions (configured in the primary kernel). These transactions
> > > > > > > > > arrive between Stream table creation and device-driver probe.
> > > > > > > > > As the new Stream table has no entries for the older transactions,
> > > > > > > > > they are aborted by the SMMU.
> > > > > > > > >
> > > > > > > > > Similar behavior was observed with an Intel 82576 Gigabit PCIe
> > > > > > > > > network card, which sends an old Memory Read transaction in the kdump
> > > > > > > > > kernel. Transactions configured for older Stream table entries, which
> > > > > > > > > no longer exist in the new table, cause a PCIe Completion Abort.
> > > > > > > >
> > > > > > > > That sounds like exactly what we want, doesn't it?
> > > > > > > >
> > > > > > > > Or do you *want* DMA from the previous kernel to complete?  That will
> > > > > > > > read or scribble on something, but maybe that's not terrible as long
> > > > > > > > as it's not memory used by the kdump kernel.
> > > > > > >
> > > > > > > Yes, the abort should happen, but it should happen in the context of
> > > > > > > the driver. The current abort happens because of the SMMU, while no
> > > > > > > driver/PCIe setup is present at that moment.
> > > > > >
> > > > > > I don't understand what you mean by "in context of driver."  The whole
> > > > > > problem is that we can't control *when* the abort happens, so it may
> > > > > > happen in *any* context.  It may happen when a NIC receives a packet
> > > > > > or at some other unpredictable time.
> > > > > >
> > > > > > > The solution to this issue should be in 2 places:
> > > > > > > a) SMMU level: I still believe this patch has the potential to overcome
> > > > > > > the issue until the driver's probe finally takes over.
> > > > > > > b) Device level: even if something goes wrong, the driver/device should
> > > > > > > be able to recover.
> > > > > > >
> > > > > > > > > The returned PCIe completion abort further leads to AER errors from the
> > > > > > > > > APEI Generic Hardware Error Source (GHES) with completion timeout.
> > > > > > > > > A network device hang is observed even after continuous
> > > > > > > > > reset/recovery from the driver; hence the device is no longer usable.
> > > > > > > >
> > > > > > > > The fact that the device is no longer usable is definitely a problem.
> > > > > > > > But in principle we *should* be able to recover from these errors.  If
> > > > > > > > we could recover and reliably use the device after the error, that
> > > > > > > > seems like it would be a more robust solution than having to add
> > > > > > > > special cases in every IOMMU driver.
> > > > > > > >
> > > > > > > > If you have details about this sort of error, I'd like to try to fix
> > > > > > > > it because we want to recover from that sort of error in normal
> > > > > > > > (non-crash) situations as well.
> > > > > > > >
> > > > > > > The completion abort case should be handled gracefully, and the device
> > > > > > > should always remain usable.
> > > > > > >
> > > > > > > There are 2 scenarios which I am testing with the Intel 82576 Gigabit
> > > > > > > PCIe Ethernet card.
> > > > > > >
> > > > > > > I)  Crash testing using the kdump root file system: the de facto scenario
> > > > > > >     -  The kdump file system does not have the Ethernet driver.
> > > > > > >     -  A lot of AER prints [1], making it impossible to work in the shell
> > > > > > > of the kdump root file system.
> > > > > >
> > > > > > In this case, I think report_error_detected() is deciding that because
> > > > > > the device has no driver, we can't do anything.  The flow is like
> > > > > > this:
> > > > > >
> > > > > >   aer_recover_work_func               # aer_recover_work
> > > > > >     kfifo_get(aer_recover_ring, entry)
> > > > > >     dev = pci_get_domain_bus_and_slot
> > > > > >     cper_print_aer(dev, ...)
> > > > > >       pci_err("AER: aer_status:")
> > > > > >       pci_err("AER:   [14] CmpltTO")
> > > > > >       pci_err("AER: aer_layer=")
> > > > > >     if (AER_NONFATAL)
> > > > > >       pcie_do_recovery(dev, pci_channel_io_normal)
> > > > > >         status = CAN_RECOVER
> > > > > >         pci_walk_bus(report_normal_detected)
> > > > > >           report_error_detected
> > > > > >             if (!dev->driver)
> > > > > >               vote = NO_AER_DRIVER
> > > > > >               pci_info("can't recover (no error_detected callback)")
> > > > > >             *result = merge_result(*, NO_AER_DRIVER)
> > > > > >             # always NO_AER_DRIVER
> > > > > >         status is now NO_AER_DRIVER
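The voting in the trace above can be modeled in user space to show why a single driverless device is enough to stop recovery. This is a simplified model of the merge_result() logic in drivers/pci/pcie/err.c; the enum and function names are stand-ins, not kernel API.

```c
#include <stddef.h>

/* User-space stand-ins for the kernel's enum pci_ers_result values. */
enum ers_result {
	ERS_NONE = 1,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
	ERS_NO_AER_DRIVER,
};

/* A NO_AER_DRIVER vote always wins; a NONE vote leaves the running
 * result unchanged. */
static enum ers_result merge(enum ers_result orig, enum ers_result new_vote)
{
	if (new_vote == ERS_NO_AER_DRIVER)
		return ERS_NO_AER_DRIVER;
	if (new_vote == ERS_NONE)
		return orig;

	switch (orig) {
	case ERS_CAN_RECOVER:
	case ERS_RECOVERED:
		return new_vote;
	case ERS_DISCONNECT:
		return new_vote == ERS_NEED_RESET ? ERS_NEED_RESET : orig;
	default:
		return orig;  /* e.g. NO_AER_DRIVER stays NO_AER_DRIVER */
	}
}

/* Fold the per-device votes gathered by the bus walk, starting from
 * CAN_RECOVER as in the trace. */
static enum ers_result fold_votes(const enum ers_result *votes, size_t n)
{
	enum ers_result status = ERS_CAN_RECOVER;

	for (size_t i = 0; i < n; i++)
		status = merge(status, votes[i]);
	return status;
}
```

Once NO_AER_DRIVER has been merged in, no later vote changes it, so the status can never reach RECOVERED and the later recovery callbacks are skipped.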
> > > > > >
> > > > > > So pcie_do_recovery() does not call .report_mmio_enabled() or .slot_reset(),
> > > > > > and status is not RECOVERED, so it skips .resume().
> > > > > >
> > > > > > I don't remember the history there, but if a device has no driver and
> > > > > > the device generates errors, it seems like we ought to be able to
> > > > > > reset it.
> > > > >
> > > > > But how can the device be reset, considering there is no driver?
> > > > > Hypothetically, this case should be taken care of by the PCIe subsystem
> > > > > performing a reset at the PCIe level.
> > > >
> > > > I don't understand your question.  The PCI core (not the device
> > > > driver) already does the reset.  When pcie_do_recovery() calls
> > > > reset_link(), all devices on the other side of the link are reset.
> > > >
> > > > > > We should be able to field one (or a few) AER errors, reset the
> > > > > > device, and you should be able to use the shell in the kdump kernel.
> > > > > >
> > > > > Here the kdump shell is usable; the only problem is the "lot of AER
> > > > > errors". One cannot see what one is typing.
> > > >
> > > > Right, that's what I expect.  If the PCI core resets the device, you
> > > > should get just a few AER errors, and they should stop after the
> > > > device is reset.
> > > >
> > > > > > > >     -  Note: the kdump shell allows using the makedumpfile and vmcore-dmesg applications.
> > > > > > >
> > > > > > > II) Crash testing using the default root file system: a specific case to
> > > > > > > test the Ethernet driver in the second kernel
> > > > > > >    -  The default root file system has the Ethernet driver.
> > > > > > >    -  AER errors come even before the driver probe starts.
> > > > > > >    -  The driver does reset the Ethernet card as part of probe, but without success.
> > > > > > >    -  AER also tries to recover, but without success.  [2]
> > > > > > >    -  I also tried to suppress the AER errors by using the "pci=noaer" boot argument
> > > > > > > and commenting out ghes_handle_aer() in the GHES driver;
> > > > > > >           then a different set of errors comes, which is also never recovered [3]
> > > > > > >
> > > > >
> > > > > Please share your view on this case. Here the driver is present
> > > > > (drivers/net/ethernet/intel/igb/igb_main.c).
> > > > > In this case AER errors start even before the driver probe starts.
> > > > > After probe, the driver resets the device without success, and even AER
> > > > > recovery does not work.
> > > >
> > > > This case should be the same as the one above.  If we can change the
> > > > PCI core so it can reset the device when there's no driver,  that would
> > > > apply to case I (where there will never be a driver) and to case II
> > > > (where there is no driver now, but a driver will probe the device
> > > > later).
> > >
> > > Does this mean changes are required in the PCI core?
> >
> > Yes, I am suggesting that the PCI core does not do the right thing
> > here.
> >
> > > I tried the following changes in pcie_do_recovery(), but they did not help.
> > > The same error as before.
> > >
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > >         pci_info(dev, "broadcast resume message\n");
> > >         pci_walk_bus(bus, report_resume, &status);
> > > @@ -203,7 +207,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> > >         return status;
> > >
> > >  failed:
> > >         pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> > > +       pci_reset_function(dev);
> > > +       pci_aer_clear_device_status(dev);
> > > +       pci_aer_clear_nonfatal_status(dev);
> >
> > Did you confirm that this resets the devices in question (0000:09:00.0
> > and 0000:09:00.1, I think), and what reset mechanism this uses (FLR,
> > PM, etc)?
> 
> Earlier, the reset was happening on the P2P bridge (0000:00:09.0); this is
> the reason it had no effect. After making the following changes, both
> devices are now getting reset.
> Both devices are using FLR.
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 117c0a2b2ba4..26b908f55aef 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -66,6 +66,20 @@ static int report_error_detected(struct pci_dev *dev,
>                 if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
>                         vote = PCI_ERS_RESULT_NO_AER_DRIVER;
>                         pci_info(dev, "can't recover (no error_detected callback)\n");
> +
> +                       pci_save_state(dev);
> +                       pci_cfg_access_lock(dev);
> +
> +                       /* Quiesce the device completely */
> +                       pci_write_config_word(dev, PCI_COMMAND,
> +                             PCI_COMMAND_INTX_DISABLE);
> +                       if (!__pci_reset_function_locked(dev)) {
> +                               vote = PCI_ERS_RESULT_RECOVERED;
>                                 pci_info(dev, "recovered via pci level reset\n");
> +                       }

Why do we need to save the state and quiesce the device?  The reset
should disable interrupts anyway.  In this particular case where
there's no driver, I don't think we should have to restore the state.
We maybe should *remove* the device and re-enumerate it after the
reset, but the state from before the reset should be irrelevant.

> +                       pci_cfg_access_unlock(dev);
> +                       pci_restore_state(dev);
>                 } else {
>                         vote = PCI_ERS_RESULT_NONE;
>                 }
> 
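For experiments on the test machine, the reset, remove and re-enumerate sequence suggested in the comment above can be roughly approximated from user space through the PCI sysfs attributes: the per-device reset and remove files under /sys/bus/pci/devices/<BDF>/ and the bus-wide /sys/bus/pci/rescan. This is only a debugging sketch with hypothetical helper names; writing these files needs root, the reset file exists only for devices the kernel knows how to reset, and it is no substitute for handling this in the PCI core.

```c
#include <stdio.h>

/* Build "/sys/bus/pci/devices/<bdf>/<attr>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int pci_sysfs_path(char *buf, size_t len, const char *bdf,
			  const char *attr)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/%s", bdf, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to a sysfs attribute. sysfs reports write errors when the
 * buffered data is flushed, so the fclose() result is checked too. */
static int sysfs_write_one(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs("1", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Reset the function, remove it, then ask the PCI core to rescan. */
int reset_remove_rescan(const char *bdf)
{
	char path[128];

	if (pci_sysfs_path(path, sizeof(path), bdf, "reset") ||
	    sysfs_write_one(path))
		return -1;
	if (pci_sysfs_path(path, sizeof(path), bdf, "remove") ||
	    sysfs_write_one(path))
		return -1;
	return sysfs_write_one("/sys/bus/pci/rescan");
}
```

On the system in this thread, the call would look like reset_remove_rescan("0000:09:00.0").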
> In order to take care of case 2 (the driver comes after some time), the
> following code needs to be added to avoid a crash during igb_probe(). It
> looks like a race condition between AER and igb_probe().
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index b46bff8fe056..c48f0a54bb95 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3012,6 +3012,11 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         /* Catch broken hardware that put the wrong VF device ID in
>          * the PCIe SR-IOV capability.
>          */
> +       if (pci_dev_trylock(pdev)) {
> +               mdelay(1000);
> +               pci_info(pdev,"device is locked, try waiting 1 sec\n");
> +       }

This is interesting to learn about the AER/driver interaction, but of
course, we wouldn't want to add code like this permanently.
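For reference, pci_dev_trylock() in drivers/pci/pci.c returns 1 when it acquires the lock and 0 when it does not, and a successful trylock has to be paired with pci_dev_unlock(). So the test in the hunk above waits precisely when it did get the lock, and, at least in the lines quoted, never releases it. The following user-space model shows a bounded wait with the same 1-on-success convention, using a C11 atomic_flag in place of the device lock; the demo_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true when the lock was taken, like pci_dev_trylock()
 * returning 1 on success. */
static bool demo_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void demo_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/* Poll for the lock up to max_tries times. Returns the attempt number
 * that succeeded, or -1 if the lock stayed busy. On success the caller
 * owns the lock and must call demo_unlock(). */
static int demo_wait_for_lock(atomic_flag *lock, int max_tries)
{
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (demo_trylock(lock))
			return attempt;
		/* Kernel code would sleep here (e.g. msleep()) instead of spinning. */
	}
	return -1;
}
```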

> Here are the observations with all the above changes:
> A) There are fewer AER errors, but they still occur in both case 1 (no
> driver at all) and case 2 (the driver comes after some time).

We'll certainly get *some* AER errors.  We have to get one before we
know to reset the device.

> B) Each AER error (NON_FATAL) causes both devices to reset. This happens many times.

I'm not sure why we reset both devices.  Are we seeing errors from
both, or could we be more selective in the code?

> C) After that, the AER errors [1] come only from device 0000:09:00.0.
> This is strange, as this PCI device is not being used during the test.
> Ping/ssh are happening over 0000:09:00.1.
> D) If I wait for some more time, there are no more AER errors from any device.
> E) Ping works fine in case 2.
> 
> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 
> # lspci -t -v
> 
>  \-[0000:00]-+-00.0  Cavium, Inc. CN99xx [ThunderX2] Integrated PCI Host bridge
>              +-01.0-[01]--
>              +-02.0-[02]--
>              +-03.0-[03]--
>              +-04.0-[04]--
>              +-05.0-[05]--+-00.0  Broadcom Inc. and subsidiaries BCM57840 NetXtreme II 10 Gigabit Ethernet
>              |            \-00.1  Broadcom Inc. and subsidiaries BCM57840 NetXtreme II 10 Gigabit Ethernet
>              +-06.0-[06]--
>              +-07.0-[07]--
>              +-08.0-[08]--
>              +-09.0-[09-0a]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
>              |               \-00.1  Intel Corporation 82576 Gigabit Network Connection
> 
> 
> [1] AER errors reported for 09:00.0:
> 
> [   81.659825] {7}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> [   81.668080] {7}[Hardware Error]: It has been corrected by h/w and requires no further action
> [   81.676503] {7}[Hardware Error]: event severity: corrected
> [   81.681975] {7}[Hardware Error]:  Error 0, type: corrected
> [   81.687447] {7}[Hardware Error]:   section_type: PCIe error
> [   81.693004] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.698908] {7}[Hardware Error]:   version: 3.0
> [   81.703424] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.709589] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.715059] {7}[Hardware Error]:   slot: 0
> [   81.719141] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.724265] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.730864] {7}[Hardware Error]:   class_code: 000002
> [   81.735901] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.742587] {7}[Hardware Error]:  Error 1, type: corrected
> [   81.748058] {7}[Hardware Error]:   section_type: PCIe error
> [   81.753615] {7}[Hardware Error]:   port_type: 4, root port
> [   81.759086] {7}[Hardware Error]:   version: 3.0
> [   81.763602] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   81.769767] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   81.775237] {7}[Hardware Error]:   slot: 0
> [   81.779319] {7}[Hardware Error]:   secondary_bus: 0x09
> [   81.784442] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   81.791041] {7}[Hardware Error]:   class_code: 000406
> [   81.796078] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   81.803806] {7}[Hardware Error]:  Error 2, type: corrected
> [   81.809276] {7}[Hardware Error]:   section_type: PCIe error
> [   81.814834] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.820738] {7}[Hardware Error]:   version: 3.0
> [   81.825254] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.831419] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.836889] {7}[Hardware Error]:   slot: 0
> [   81.840971] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.846094] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.852693] {7}[Hardware Error]:   class_code: 000002
> [   81.857730] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.864416] {7}[Hardware Error]:  Error 3, type: corrected
> [   81.869886] {7}[Hardware Error]:   section_type: PCIe error
> [   81.875444] {7}[Hardware Error]:   port_type: 4, root port
> [   81.880914] {7}[Hardware Error]:   version: 3.0
> [   81.885430] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   81.891595] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   81.897066] {7}[Hardware Error]:   slot: 0
> [   81.901147] {7}[Hardware Error]:   secondary_bus: 0x09
> [   81.906271] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   81.912870] {7}[Hardware Error]:   class_code: 000406
> [   81.917906] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   81.925634] {7}[Hardware Error]:  Error 4, type: corrected
> [   81.931104] {7}[Hardware Error]:   section_type: PCIe error
> [   81.936662] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   81.942566] {7}[Hardware Error]:   version: 3.0
> [   81.947082] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   81.953247] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   81.958717] {7}[Hardware Error]:   slot: 0
> [   81.962799] {7}[Hardware Error]:   secondary_bus: 0x00
> [   81.967923] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   81.974522] {7}[Hardware Error]:   class_code: 000002
> [   81.979558] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   81.986244] {7}[Hardware Error]:  Error 5, type: corrected
> [   81.991715] {7}[Hardware Error]:   section_type: PCIe error
> [   81.997272] {7}[Hardware Error]:   port_type: 4, root port
> [   82.002743] {7}[Hardware Error]:   version: 3.0
> [   82.007259] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.013424] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.018894] {7}[Hardware Error]:   slot: 0
> [   82.022976] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.028099] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.034698] {7}[Hardware Error]:   class_code: 000406
> [   82.039735] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.047463] {7}[Hardware Error]:  Error 6, type: corrected
> [   82.052933] {7}[Hardware Error]:   section_type: PCIe error
> [   82.058491] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.064395] {7}[Hardware Error]:   version: 3.0
> [   82.068911] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.075076] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.080547] {7}[Hardware Error]:   slot: 0
> [   82.084628] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.089752] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.096351] {7}[Hardware Error]:   class_code: 000002
> [   82.101387] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.108073] {7}[Hardware Error]:  Error 7, type: corrected
> [   82.113544] {7}[Hardware Error]:   section_type: PCIe error
> [   82.119101] {7}[Hardware Error]:   port_type: 4, root port
> [   82.124572] {7}[Hardware Error]:   version: 3.0
> [   82.129087] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.135252] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.140723] {7}[Hardware Error]:   slot: 0
> [   82.144805] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.149928] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.156527] {7}[Hardware Error]:   class_code: 000406
> [   82.161564] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.169291] {7}[Hardware Error]:  Error 8, type: corrected
> [   82.174762] {7}[Hardware Error]:   section_type: PCIe error
> [   82.180319] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.186224] {7}[Hardware Error]:   version: 3.0
> [   82.190739] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.196904] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.202375] {7}[Hardware Error]:   slot: 0
> [   82.206456] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.211580] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.218179] {7}[Hardware Error]:   class_code: 000002
> [   82.223216] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.229901] {7}[Hardware Error]:  Error 9, type: corrected
> [   82.235372] {7}[Hardware Error]:   section_type: PCIe error
> [   82.240929] {7}[Hardware Error]:   port_type: 4, root port
> [   82.246400] {7}[Hardware Error]:   version: 3.0
> [   82.250916] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.257081] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.262551] {7}[Hardware Error]:   slot: 0
> [   82.266633] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.271756] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.278355] {7}[Hardware Error]:   class_code: 000406
> [   82.283392] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.291119] {7}[Hardware Error]:  Error 10, type: corrected
> [   82.296676] {7}[Hardware Error]:   section_type: PCIe error
> [   82.302234] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.308138] {7}[Hardware Error]:   version: 3.0
> [   82.312654] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.318819] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.324290] {7}[Hardware Error]:   slot: 0
> [   82.328371] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.333495] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.340094] {7}[Hardware Error]:   class_code: 000002
> [   82.345131] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.351816] {7}[Hardware Error]:  Error 11, type: corrected
> [   82.357374] {7}[Hardware Error]:   section_type: PCIe error
> [   82.362931] {7}[Hardware Error]:   port_type: 4, root port
> [   82.368402] {7}[Hardware Error]:   version: 3.0
> [   82.372917] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.379082] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.384553] {7}[Hardware Error]:   slot: 0
> [   82.388635] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.393758] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.400357] {7}[Hardware Error]:   class_code: 000406
> [   82.405394] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.413121] {7}[Hardware Error]:  Error 12, type: corrected
> [   82.418678] {7}[Hardware Error]:   section_type: PCIe error
> [   82.424236] {7}[Hardware Error]:   port_type: 0, PCIe end point
> [   82.430140] {7}[Hardware Error]:   version: 3.0
> [   82.434656] {7}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   82.440821] {7}[Hardware Error]:   device_id: 0000:09:00.0
> [   82.446291] {7}[Hardware Error]:   slot: 0
> [   82.450373] {7}[Hardware Error]:   secondary_bus: 0x00
> [   82.455497] {7}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   82.462096] {7}[Hardware Error]:   class_code: 000002
> [   82.467132] {7}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   82.473818] {7}[Hardware Error]:  Error 13, type: corrected
> [   82.479375] {7}[Hardware Error]:   section_type: PCIe error
> [   82.484933] {7}[Hardware Error]:   port_type: 4, root port
> [   82.490403] {7}[Hardware Error]:   version: 3.0
> [   82.494919] {7}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   82.501084] {7}[Hardware Error]:   device_id: 0000:00:09.0
> [   82.506555] {7}[Hardware Error]:   slot: 0
> [   82.510636] {7}[Hardware Error]:   secondary_bus: 0x09
> [   82.515760] {7}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   82.522359] {7}[Hardware Error]:   class_code: 000406
> [   82.527395] {7}[Hardware Error]:   bridge: secondary_status: 0x6000, control: 0x0002
> [   82.535171] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.542476] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.550301] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.558032] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.566296] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.573597] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.581421] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.589151] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.597411] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.604711] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.612535] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.620271] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.628525] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.635826] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.643649] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.651385] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.659645] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.666940] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.674763] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.682498] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.690759] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.698053] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.705876] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.713612] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.721872] igb 0000:09:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00002000
> [   82.729167] igb 0000:09:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   82.736990] pcieport 0000:00:09.0: AER: aer_status: 0x00000000, aer_mask: 0x00002000
> [   82.744725] pcieport 0000:00:09.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
> [   88.059225] {8}[Hardware Error]: Hardware error from APEI Generic
> Hardware Error Source: 0
> [   88.067478] {8}[Hardware Error]: It has been corrected by h/w and
> requires no further action
> [   88.075899] {8}[Hardware Error]: event severity: corrected
> [   88.081370] {8}[Hardware Error]:  Error 0, type: corrected
> [   88.086841] {8}[Hardware Error]:   section_type: PCIe error
> [   88.092399] {8}[Hardware Error]:   port_type: 0, PCIe end point
> [   88.098303] {8}[Hardware Error]:   version: 3.0
> [   88.102819] {8}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   88.108984] {8}[Hardware Error]:   device_id: 0000:09:00.0
> [   88.114455] {8}[Hardware Error]:   slot: 0
> [   88.118536] {8}[Hardware Error]:   secondary_bus: 0x00
> [   88.123660] {8}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   88.130259] {8}[Hardware Error]:   class_code: 000002
> [   88.135296] {8}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   88.141981] {8}[Hardware Error]:  Error 1, type: corrected
> [   88.147452] {8}[Hardware Error]:   section_type: PCIe error
> [   88.153009] {8}[Hardware Error]:   port_type: 4, root port
> [   88.158480] {8}[Hardware Error]:   version: 3.0
> [   88.162995] {8}[Hardware Error]:   command: 0x0106, status: 0x4010
> [   88.169161] {8}[Hardware Error]:   device_id: 0000:00:09.0
> [   88.174633] {8}[Hardware Error]:   slot: 0
> [   88.180018] {8}[Hardware Error]:   secondary_bus: 0x09
> [   88.185142] {8}[Hardware Error]:   vendor_id: 0x177d, device_id: 0xaf84
> [   88.191914] {8}[Hardware Error]:   class_code: 000406
> [   88.196951] {8}[Hardware Error]:   bridge: secondary_status:
> 0x6000, control: 0x0002
> [   88.204852] {8}[Hardware Error]:  Error 2, type: corrected
> [   88.210323] {8}[Hardware Error]:   section_type: PCIe error
> [   88.215881] {8}[Hardware Error]:   port_type: 0, PCIe end point
> [   88.221786] {8}[Hardware Error]:   version: 3.0
> [   88.226301] {8}[Hardware Error]:   command: 0x0507, status: 0x0010
> [   88.232466] {8}[Hardware Error]:   device_id: 0000:09:00.0
> [   88.237937] {8}[Hardware Error]:   slot: 0
> [   88.242019] {8}[Hardware Error]:   secondary_bus: 0x00
> [   88.247142] {8}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> [   88.253741] {8}[Hardware Error]:   class_code: 000002
> [   88.258778] {8}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> [   88.265509] igb 0000:09:00.0: AER: aer_status: 0x00002000,
> aer_mask: 0x00002000
> [   88.272812] igb 0000:09:00.0: AER: aer_layer=Transaction Layer,
> aer_agent=Receiver ID
> [   88.280635] pcieport 0000:00:09.0: AER: aer_status: 0x00000000,
> aer_mask: 0x00002000
> [   88.288363] pcieport 0000:00:09.0: AER: aer_layer=Transaction
> Layer, aer_agent=Receiver ID
> [   88.296622] igb 0000:09:00.0: AER: aer_status: 0x00002000,
> aer_mask: 0x00002000
> [   88.305391] igb 0000:09:00.0: AER: aer_layer=Transaction Layer,
> aer_agent=Receiver ID
> 
> > Case I is using APEI, and it looks like that can queue up 16 errors
> > (AER_RECOVER_RING_SIZE), so that queue could be completely full before
> > we even get a chance to reset the device.  But I would think that the
> > reset should *eventually* stop the errors, even though we might log
> > 30+ of them first.
> >
> > As an experiment, you could reduce AER_RECOVER_RING_SIZE to 1 or 2 and
> > see if it reduces the logging.
> 
> I did not try this experiment. I believe it is not required now.
> 
> --pk
> 
> >
> > > > > The problem mentioned in cases I and II goes away if we do
> > > > > pci_reset_function() during the enumeration phase of the kdump kernel.
> > > > > Can we think of doing pci_reset_function() for all devices in the
> > > > > kdump kernel, or adding a device-specific quirk?
> > > > >
> > > > > --pk
> > > > >
> > > > >
> > > > > > > As per my understanding, possible solutions are
> > > > > > >  - Copy SMMU table i.e. this patch
> > > > > > > OR
> > > > > > >  - Doing pci_reset_function() during enumeration phase.
> > > > > > > I also tried clearing the "M" bit using pci_clear_master() during
> > > > > > > enumeration, but it did not help, because the driver re-sets the M
> > > > > > > bit, causing the same AER error again.
> > > > > > >
> > > > > > >
> > > > > > > -pk
> > > > > > >
> > > > > > > ---------------------------------------------------------------------------------------------------------------------------
> > > > > > > [1] with bootargs having pci=noaer
> > > > > > >
> > > > > > > [   22.494648] {4}[Hardware Error]: Hardware error from APEI Generic
> > > > > > > Hardware Error Source: 1
> > > > > > > [   22.512773] {4}[Hardware Error]: event severity: recoverable
> > > > > > > [   22.518419] {4}[Hardware Error]:  Error 0, type: recoverable
> > > > > > > [   22.544804] {4}[Hardware Error]:   section_type: PCIe error
> > > > > > > [   22.550363] {4}[Hardware Error]:   port_type: 0, PCIe end point
> > > > > > > [   22.556268] {4}[Hardware Error]:   version: 3.0
> > > > > > > [   22.560785] {4}[Hardware Error]:   command: 0x0507, status: 0x4010
> > > > > > > [   22.576852] {4}[Hardware Error]:   device_id: 0000:09:00.1
> > > > > > > [   22.582323] {4}[Hardware Error]:   slot: 0
> > > > > > > [   22.586406] {4}[Hardware Error]:   secondary_bus: 0x00
> > > > > > > [   22.591530] {4}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> > > > > > > [   22.608900] {4}[Hardware Error]:   class_code: 000002
> > > > > > > [   22.613938] {4}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> > > > > > > [   22.803534] pci 0000:09:00.1: AER: aer_status: 0x00004000,
> > > > > > > aer_mask: 0x00000000
> > > > > > > [   22.810838] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > > > [   22.817613] pci 0000:09:00.1: AER: aer_layer=Transaction Layer,
> > > > > > > aer_agent=Requester ID
> > > > > > > [   22.847374] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > > > [   22.866161] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED,
> > > > > > > total mem (8153768 kB)
> > > > > > > [   22.946178] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> > > > > > > [   22.995142] pci 0000:09:00.1: AER: can't recover (no error_detected callback)
> > > > > > > [   23.002300] pcieport 0000:00:09.0: AER: device recovery failed
> > > > > > > [   23.027607] pci 0000:09:00.1: AER: aer_status: 0x00004000,
> > > > > > > aer_mask: 0x00000000
> > > > > > > [   23.044109] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > > > [   23.060713] pci 0000:09:00.1: AER: aer_layer=Transaction Layer,
> > > > > > > aer_agent=Requester ID
> > > > > > > [   23.068616] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > > > [   23.122056] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> >
> > <snip>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: wpanusb?
From: Christopher Friedt @ 2020-05-29 19:33 UTC (permalink / raw)
  To: Stefan Schmidt; +Cc: linux-wpan
In-Reply-To: <b009a2a8-64a5-fe66-a53e-5a93135cf1f8@datenfreihafen.org>

Hi Stefan!

On Tue, May 26, 2020 at 3:38 PM Stefan Schmidt
<stefan@datenfreihafen.org> wrote:
> On 25.05.20 14:39, Christopher Friedt wrote:
> > Hi all,
> >
> > Bouncing around a bit, but in Zephyr, there is reference to a
> > "wpanusb" Linux kernel driver here:
> >
> > https://docs.zephyrproject.org/latest/samples/net/wpanusb/README.html
> >
> > This *might* be the driver in question:
> >
> > https://github.com/finikorg/wpanusb
> >
> > Just wondering if anyone has made any attempts to submit that, or
> > would that go directly upstream these days?
>
> I had a chance to talk to the author a while back. Not much activity
> from his side.

I was chatting with him as well on Zephyr Slack and let him know that
there was significant interest in it going upstream. I worry though
that it might not be a high priority for his employer.

Is there a linux-wpan IRC? Would be nice to chat in real-time at some point.

> For me this needs to be designed in a way where we could have bare
> metal, Zephyr, RIOT or Contiki based firmware implementing the interface
> and the driver would just work. The code available is a good start but
> needs more work.

I agree mostly. Of course each RTOS has their own headers, way of
declaring things, etc, but for the most part it could be platform
independent.

> I was, and somehow still am, planning on working on this. But with the
> world turned upside down there was always something else to look at
> before. It's on my list, just not very high. If anyone wants to have a
> stab at this feel free and let me know.

I'll bring it up in the Zephyr Slack. They want to incorporate it into
their "tools" repository, but it really should go into Linux at some
point.

We'll probably end up working on this for BB.O - even just having a
single driver that works for all boards in Zephyr is a pretty large
step.

Lastly, I feel like this is a recurring question, but a number of us
will likely need 802.15.4 USB dongles to talk to our 15.4 nodes. I
have a couple of ATUSBs on my desk, but others in our group have no
idea where to get parts, and building one from scratch would likely
take more time than they want to spend.

Do you know of an off-the-shelf product that works with existing
drivers upstream?

M.f.G.

Chris

* Re: [PATCH net] net: dsa: sja1105: fix port mirroring for P/Q/R/S
From: kbuild test robot @ 2020-05-29 19:33 UTC (permalink / raw)
  To: Vladimir Oltean, davem
  Cc: kbuild-all, andrew, f.fainelli, vivien.didelot, netdev
In-Reply-To: <20200527164006.1080903-1-olteanv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6443 bytes --]

Hi Vladimir,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on sparc-next/master linus/master v5.7-rc7 next-20200529]
[cannot apply to net/master]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch; please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Vladimir-Oltean/net-dsa-sja1105-fix-port-mirroring-for-P-Q-R-S/20200528-004418
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git dc0f3ed1973f101508957b59e529e03da1349e09
config: parisc-allyesconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

drivers/net/dsa/sja1105/sja1105_static_config.c:105:8: warning: no previous prototype for 'sja1105pqrs_avb_params_entry_packing' [-Wmissing-prototypes]
105 | size_t sja1105pqrs_avb_params_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/dsa/sja1105/sja1105_static_config.c:149:8: warning: no previous prototype for 'sja1105pqrs_general_params_entry_packing' [-Wmissing-prototypes]
149 | size_t sja1105pqrs_general_params_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:198:8: warning: no previous prototype for 'sja1105_l2_forwarding_entry_packing' [-Wmissing-prototypes]
198 | size_t sja1105_l2_forwarding_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:230:8: warning: no previous prototype for 'sja1105pqrs_l2_lookup_params_entry_packing' [-Wmissing-prototypes]
230 | size_t sja1105pqrs_l2_lookup_params_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:252:8: warning: no previous prototype for 'sja1105et_l2_lookup_entry_packing' [-Wmissing-prototypes]
252 | size_t sja1105et_l2_lookup_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:266:8: warning: no previous prototype for 'sja1105pqrs_l2_lookup_entry_packing' [-Wmissing-prototypes]
266 | size_t sja1105pqrs_l2_lookup_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:342:8: warning: no previous prototype for 'sja1105pqrs_mac_config_entry_packing' [-Wmissing-prototypes]
342 | size_t sja1105pqrs_mac_config_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:461:8: warning: no previous prototype for 'sja1105_vl_lookup_entry_packing' [-Wmissing-prototypes]
461 | size_t sja1105_vl_lookup_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:511:8: warning: no previous prototype for 'sja1105_vlan_lookup_entry_packing' [-Wmissing-prototypes]
511 | size_t sja1105_vlan_lookup_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/dsa/sja1105/sja1105_static_config.c:542:8: warning: no previous prototype for 'sja1105_retagging_entry_packing' [-Wmissing-prototypes]
542 | size_t sja1105_retagging_entry_packing(void *buf, void *entry_ptr,
|        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +/sja1105pqrs_general_params_entry_packing +149 drivers/net/dsa/sja1105/sja1105_static_config.c

   145	
   146	/* TPID and TPID2 are intentionally reversed so that semantic
   147	 * compatibility with E/T is kept.
   148	 */
 > 149	size_t sja1105pqrs_general_params_entry_packing(void *buf, void *entry_ptr,
   150							enum packing_op op)
   151	{
   152		const size_t size = SJA1105PQRS_SIZE_GENERAL_PARAMS_ENTRY;
   153		struct sja1105_general_params_entry *entry = entry_ptr;
   154	
   155		sja1105_packing(buf, &entry->vllupformat, 351, 351, size, op);
   156		sja1105_packing(buf, &entry->mirr_ptacu,  350, 350, size, op);
   157		sja1105_packing(buf, &entry->switchid,    349, 347, size, op);
   158		sja1105_packing(buf, &entry->hostprio,    346, 344, size, op);
   159		sja1105_packing(buf, &entry->mac_fltres1, 343, 296, size, op);
   160		sja1105_packing(buf, &entry->mac_fltres0, 295, 248, size, op);
   161		sja1105_packing(buf, &entry->mac_flt1,    247, 200, size, op);
   162		sja1105_packing(buf, &entry->mac_flt0,    199, 152, size, op);
   163		sja1105_packing(buf, &entry->incl_srcpt1, 151, 151, size, op);
   164		sja1105_packing(buf, &entry->incl_srcpt0, 150, 150, size, op);
   165		sja1105_packing(buf, &entry->send_meta1,  149, 149, size, op);
   166		sja1105_packing(buf, &entry->send_meta0,  148, 148, size, op);
   167		sja1105_packing(buf, &entry->casc_port,   147, 145, size, op);
   168		sja1105_packing(buf, &entry->host_port,   144, 142, size, op);
   169		sja1105_packing(buf, &entry->mirr_port,   141, 139, size, op);
   170		sja1105_packing(buf, &entry->vlmarker,    138, 107, size, op);
   171		sja1105_packing(buf, &entry->vlmask,      106,  75, size, op);
   172		sja1105_packing(buf, &entry->tpid2,        74,  59, size, op);
   173		sja1105_packing(buf, &entry->ignore2stf,   58,  58, size, op);
   174		sja1105_packing(buf, &entry->tpid,         57,  42, size, op);
   175		sja1105_packing(buf, &entry->queue_ts,     41,  41, size, op);
   176		sja1105_packing(buf, &entry->egrmirrvid,   40,  29, size, op);
   177		sja1105_packing(buf, &entry->egrmirrpcp,   28,  26, size, op);
   178		sja1105_packing(buf, &entry->egrmirrdei,   25,  25, size, op);
   179		sja1105_packing(buf, &entry->replay_port,  24,  22, size, op);
   180		return size;
   181	}
   182	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 61666 bytes --]

* [leon-rdma:rdma-next 6/31] net/ethtool/linkmodes.c:241:2: warning: initializer overrides prior initialization of this subobject
From: kbuild test robot @ 2020-05-29 19:34 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7574 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git rdma-next
head:   bcca52466e06ce0bb434bbc3e52c4bccb2a57c41
commit: 5bd74bbc6ffbc77544e0c00e72a9d674eea67d9c [6/31] ethtool: Add support for 100Gbps per lane link modes
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 2d068e534f1671459e1b135852c1b3c10502e929)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        git checkout 5bd74bbc6ffbc77544e0c00e72a9d674eea67d9c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> net/ethtool/linkmodes.c:241:2: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
__DEFINE_LINK_MODE_PARAMS(400000, CR8, Full),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/ethtool/linkmodes.c:150:48: note: expanded from macro '__DEFINE_LINK_MODE_PARAMS'
[ETHTOOL_LINK_MODE(_speed, _type, _duplex)] = {
                                              ^~~
net/ethtool/linkmodes.c:239:2: note: previous initialization is here
__DEFINE_LINK_MODE_PARAMS(400000, CR8, Full),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/ethtool/linkmodes.c:150:48: note: expanded from macro '__DEFINE_LINK_MODE_PARAMS'
[ETHTOOL_LINK_MODE(_speed, _type, _duplex)] = {
                                              ^~~
1 warning generated.

vim +241 net/ethtool/linkmodes.c

   148	
   149	#define __DEFINE_LINK_MODE_PARAMS(_speed, _type, _duplex) \
   150		[ETHTOOL_LINK_MODE(_speed, _type, _duplex)] = { \
   151			.speed	= SPEED_ ## _speed, \
   152			.duplex	= __DUPLEX_ ## _duplex \
   153		}
   154	#define __DUPLEX_Half DUPLEX_HALF
   155	#define __DUPLEX_Full DUPLEX_FULL
   156	#define __DEFINE_SPECIAL_MODE_PARAMS(_mode) \
   157		[ETHTOOL_LINK_MODE_ ## _mode ## _BIT] = { \
   158			.speed	= SPEED_UNKNOWN, \
   159			.duplex	= DUPLEX_UNKNOWN, \
   160		}
   161	
   162	static const struct link_mode_info link_mode_params[] = {
   163		__DEFINE_LINK_MODE_PARAMS(10, T, Half),
   164		__DEFINE_LINK_MODE_PARAMS(10, T, Full),
   165		__DEFINE_LINK_MODE_PARAMS(100, T, Half),
   166		__DEFINE_LINK_MODE_PARAMS(100, T, Full),
   167		__DEFINE_LINK_MODE_PARAMS(1000, T, Half),
   168		__DEFINE_LINK_MODE_PARAMS(1000, T, Full),
   169		__DEFINE_SPECIAL_MODE_PARAMS(Autoneg),
   170		__DEFINE_SPECIAL_MODE_PARAMS(TP),
   171		__DEFINE_SPECIAL_MODE_PARAMS(AUI),
   172		__DEFINE_SPECIAL_MODE_PARAMS(MII),
   173		__DEFINE_SPECIAL_MODE_PARAMS(FIBRE),
   174		__DEFINE_SPECIAL_MODE_PARAMS(BNC),
   175		__DEFINE_LINK_MODE_PARAMS(10000, T, Full),
   176		__DEFINE_SPECIAL_MODE_PARAMS(Pause),
   177		__DEFINE_SPECIAL_MODE_PARAMS(Asym_Pause),
   178		__DEFINE_LINK_MODE_PARAMS(2500, X, Full),
   179		__DEFINE_SPECIAL_MODE_PARAMS(Backplane),
   180		__DEFINE_LINK_MODE_PARAMS(1000, KX, Full),
   181		__DEFINE_LINK_MODE_PARAMS(10000, KX4, Full),
   182		__DEFINE_LINK_MODE_PARAMS(10000, KR, Full),
   183		[ETHTOOL_LINK_MODE_10000baseR_FEC_BIT] = {
   184			.speed	= SPEED_10000,
   185			.duplex = DUPLEX_FULL,
   186		},
   187		__DEFINE_LINK_MODE_PARAMS(20000, MLD2, Full),
   188		__DEFINE_LINK_MODE_PARAMS(20000, KR2, Full),
   189		__DEFINE_LINK_MODE_PARAMS(40000, KR4, Full),
   190		__DEFINE_LINK_MODE_PARAMS(40000, CR4, Full),
   191		__DEFINE_LINK_MODE_PARAMS(40000, SR4, Full),
   192		__DEFINE_LINK_MODE_PARAMS(40000, LR4, Full),
   193		__DEFINE_LINK_MODE_PARAMS(56000, KR4, Full),
   194		__DEFINE_LINK_MODE_PARAMS(56000, CR4, Full),
   195		__DEFINE_LINK_MODE_PARAMS(56000, SR4, Full),
   196		__DEFINE_LINK_MODE_PARAMS(56000, LR4, Full),
   197		__DEFINE_LINK_MODE_PARAMS(25000, CR, Full),
   198		__DEFINE_LINK_MODE_PARAMS(25000, KR, Full),
   199		__DEFINE_LINK_MODE_PARAMS(25000, SR, Full),
   200		__DEFINE_LINK_MODE_PARAMS(50000, CR2, Full),
   201		__DEFINE_LINK_MODE_PARAMS(50000, KR2, Full),
   202		__DEFINE_LINK_MODE_PARAMS(100000, KR4, Full),
   203		__DEFINE_LINK_MODE_PARAMS(100000, SR4, Full),
   204		__DEFINE_LINK_MODE_PARAMS(100000, CR4, Full),
   205		__DEFINE_LINK_MODE_PARAMS(100000, LR4_ER4, Full),
   206		__DEFINE_LINK_MODE_PARAMS(50000, SR2, Full),
   207		__DEFINE_LINK_MODE_PARAMS(1000, X, Full),
   208		__DEFINE_LINK_MODE_PARAMS(10000, CR, Full),
   209		__DEFINE_LINK_MODE_PARAMS(10000, SR, Full),
   210		__DEFINE_LINK_MODE_PARAMS(10000, LR, Full),
   211		__DEFINE_LINK_MODE_PARAMS(10000, LRM, Full),
   212		__DEFINE_LINK_MODE_PARAMS(10000, ER, Full),
   213		__DEFINE_LINK_MODE_PARAMS(2500, T, Full),
   214		__DEFINE_LINK_MODE_PARAMS(5000, T, Full),
   215		__DEFINE_SPECIAL_MODE_PARAMS(FEC_NONE),
   216		__DEFINE_SPECIAL_MODE_PARAMS(FEC_RS),
   217		__DEFINE_SPECIAL_MODE_PARAMS(FEC_BASER),
   218		__DEFINE_LINK_MODE_PARAMS(50000, KR, Full),
   219		__DEFINE_LINK_MODE_PARAMS(50000, SR, Full),
   220		__DEFINE_LINK_MODE_PARAMS(50000, CR, Full),
   221		__DEFINE_LINK_MODE_PARAMS(50000, LR_ER_FR, Full),
   222		__DEFINE_LINK_MODE_PARAMS(50000, DR, Full),
   223		__DEFINE_LINK_MODE_PARAMS(100000, KR2, Full),
   224		__DEFINE_LINK_MODE_PARAMS(100000, SR2, Full),
   225		__DEFINE_LINK_MODE_PARAMS(100000, CR2, Full),
   226		__DEFINE_LINK_MODE_PARAMS(100000, LR2_ER2_FR2, Full),
   227		__DEFINE_LINK_MODE_PARAMS(100000, DR2, Full),
   228		__DEFINE_LINK_MODE_PARAMS(200000, KR4, Full),
   229		__DEFINE_LINK_MODE_PARAMS(200000, SR4, Full),
   230		__DEFINE_LINK_MODE_PARAMS(200000, LR4_ER4_FR4, Full),
   231		__DEFINE_LINK_MODE_PARAMS(200000, DR4, Full),
   232		__DEFINE_LINK_MODE_PARAMS(200000, CR4, Full),
   233		__DEFINE_LINK_MODE_PARAMS(100, T1, Full),
   234		__DEFINE_LINK_MODE_PARAMS(1000, T1, Full),
   235		__DEFINE_LINK_MODE_PARAMS(400000, KR8, Full),
   236		__DEFINE_LINK_MODE_PARAMS(400000, SR8, Full),
   237		__DEFINE_LINK_MODE_PARAMS(400000, LR8_ER8_FR8, Full),
   238		__DEFINE_LINK_MODE_PARAMS(400000, DR8, Full),
   239		__DEFINE_LINK_MODE_PARAMS(400000, CR8, Full),
   240		__DEFINE_SPECIAL_MODE_PARAMS(FEC_LLRS),
 > 241		__DEFINE_LINK_MODE_PARAMS(400000, CR8, Full),
   242		__DEFINE_LINK_MODE_PARAMS(100000, KR, Full),
   243		__DEFINE_LINK_MODE_PARAMS(100000, SR, Full),
   244		__DEFINE_LINK_MODE_PARAMS(100000, LR_ER_FR, Full),
   245		__DEFINE_LINK_MODE_PARAMS(100000, DR, Full),
   246		__DEFINE_LINK_MODE_PARAMS(100000, CR, Full),
   247		__DEFINE_LINK_MODE_PARAMS(200000, KR2, Full),
   248		__DEFINE_LINK_MODE_PARAMS(200000, SR2, Full),
   249		__DEFINE_LINK_MODE_PARAMS(200000, LR2_ER2_FR2, Full),
   250		__DEFINE_LINK_MODE_PARAMS(200000, DR2, Full),
   251		__DEFINE_LINK_MODE_PARAMS(200000, CR2, Full),
   252		__DEFINE_LINK_MODE_PARAMS(400000, KR4, Full),
   253		__DEFINE_LINK_MODE_PARAMS(400000, SR4, Full),
   254		__DEFINE_LINK_MODE_PARAMS(400000, LR4_ER4_FR4, Full),
   255		__DEFINE_LINK_MODE_PARAMS(400000, DR4, Full),
   256		__DEFINE_LINK_MODE_PARAMS(400000, CR4, Full),
   257	};
   258	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 73583 bytes --]

* Re: [PATCH V2] dt-bindings: regulator: Convert anatop regulator to json-schema
From: Rob Herring @ 2020-05-29 19:35 UTC (permalink / raw)
  To: Anson Huang
  Cc: paul.liu, devicetree, linux-kernel, robh+dt, Linux-imx, broonie,
	lgirdwood
In-Reply-To: <1590717551-20772-1-git-send-email-Anson.Huang@nxp.com>

On Fri, 29 May 2020 09:59:11 +0800, Anson Huang wrote:
> Convert the anatop regulator binding to DT schema format using json-schema.
> 
> Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
> ---
> Changes since V1:
> 	- remove definition of "regulator-name" which is a standard property;
> 	- add "unevaluatedProperties: false".
> ---
>  .../bindings/regulator/anatop-regulator.txt        | 40 ---------
>  .../bindings/regulator/anatop-regulator.yaml       | 94 ++++++++++++++++++++++
>  2 files changed, 94 insertions(+), 40 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/regulator/anatop-regulator.txt
>  create mode 100644 Documentation/devicetree/bindings/regulator/anatop-regulator.yaml
> 

Reviewed-by: Rob Herring <robh@kernel.org>

* Re: [PATCH v2 06/14] x86/shstk: Create shadow stacks
From: Andrew Cooper @ 2020-05-29 19:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné
In-Reply-To: <8a02b933-3b7e-ded9-8bf3-a1c35f2ef7ae@suse.com>

On 28/05/2020 13:50, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -769,6 +769,30 @@ void load_system_tables(void)
>>  	tss->rsp1 = 0x8600111111111111ul;
>>  	tss->rsp2 = 0x8600111111111111ul;
>>  
>> +	/* Set up the shadow stack IST. */
>> +	if (cpu_has_xen_shstk) {
>> +		volatile uint64_t *ist_ssp = this_cpu(tss_page).ist_ssp;
>> +
>> +		/*
>> +		 * Used entries must point at the supervisor stack token.
>> +		 * Unused entries are poisoned.
>> +		 *
>> +		 * This IST Table may be live, and the NMI/#MC entries must
>> +		 * remain valid on every instruction boundary, hence the
>> +		 * volatile qualifier.
>> +		 */
> Move this comment ahead of what it comments on, as we usually have it?
>
>> +		ist_ssp[0] = 0x8600111111111111ul;
>> +		ist_ssp[IST_MCE] = stack_top + (IST_MCE * IST_SHSTK_SIZE) - 8;
>> +		ist_ssp[IST_NMI] = stack_top + (IST_NMI * IST_SHSTK_SIZE) - 8;
>> +		ist_ssp[IST_DB]	 = stack_top + (IST_DB	* IST_SHSTK_SIZE) - 8;
>> +		ist_ssp[IST_DF]	 = stack_top + (IST_DF	* IST_SHSTK_SIZE) - 8;
> Strictly speaking you want to introduce
>
> #define IST_SHSTK_SLOT 0
>
> next to PRIMARY_SHSTK_SLOT and use
>
> 		ist_ssp[IST_MCE] = stack_top + (IST_SHSTK_SLOT * PAGE_SIZE) +
>                                                (IST_MCE * IST_SHSTK_SIZE) - 8;
>
> etc here. It's getting longish, so I'm not going to insist. But if you
> go this route, then please also below / elsewhere.

Actually no.  I've got a much better idea, based on how Linux does the
same, but it's definitely 4.15 material at this point.

>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5994,12 +5994,33 @@ void memguard_unguard_range(void *p, unsigned long l)
>>  
>>  #endif
>>  
>> +static void write_sss_token(unsigned long *ptr)
>> +{
>> +    /*
>> +     * A supervisor shadow stack token is its own linear address, with the
>> +     * busy bit (0) clear.
>> +     */
>> +    *ptr = (unsigned long)ptr;
>> +}
>> +
>>  void memguard_guard_stack(void *p)
>>  {
>> -    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
>> +    /* IST Shadow stacks.  4x 1k in stack page 0. */
>> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
>> +    {
>> +        write_sss_token(p + (IST_MCE * IST_SHSTK_SIZE) - 8);
>> +        write_sss_token(p + (IST_NMI * IST_SHSTK_SIZE) - 8);
>> +        write_sss_token(p + (IST_DB  * IST_SHSTK_SIZE) - 8);
>> +        write_sss_token(p + (IST_DF  * IST_SHSTK_SIZE) - 8);
> Up to now two successive memguard_guard_stack() were working fine. This
> will be no longer the case, just as an observation.

I don't think that matters.

>
>> +    }
>> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
> As already hinted at in reply to the previous patch, I think this wants
> to remain _PAGE_NONE when we don't use CET-SS.

The commit message discussed why that is not an option (currently), and
why I don't consider it a good idea to make possible.

>> +    /* Primary Shadow Stack.  1x 4k in stack page 5. */
>>      p += PRIMARY_SHSTK_SLOT * PAGE_SIZE;
>> -    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
>> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
>> +        write_sss_token(p + PAGE_SIZE - 8);
>> +
>> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
>>  }
>>  
>>  void memguard_unguard_stack(void *p)
> Would this function perhaps better zap the tokens?

Why?  We don't zap any other stack contents, and let the regular page
scrubbing clean it.

~Andrew


* [Bug 207959] Don't warn about the universal zero initializer for a structure with the 'designated_init' attribute.
From: bugzilla-daemon @ 2020-05-29 19:35 UTC (permalink / raw)
  To: linux-sparse
In-Reply-To: <bug-207959-200559@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=207959

--- Comment #7 from Luc Van Oostenryck (luc.vanoostenryck@gmail.com) ---
(In reply to Linus Torvalds from comment #5)
> That said, I'm not sure the kernel cares. If sparse makes '{ 0 }' be
> equivalent to '{ }' and doesn't warn for it, it's not like it's a huge deal.
> 
> The problem with using 0 instead of NULL (or vice versa, which is a crime,
> and which is why NULL should never have been defined to plain 0) comes when
> it is actually confusing.

OK. I also detest this 'you can use 0 for pointers' but I think that '{ 0 }'
should just be understood as the standard idiom for '{ }' and that the current
situation where '{ 0 }' gives warnings while '{ }' doesn't is confusing and
annoying. So, I'll change Sparse's default to -Wno-universal-initializer.

> So I'd prefer the "0 for NULL" warning, even if this may not be the most
> important case for it.

Do you think it's worth adding -Wuniversal-initializer for the kernel so
that these warnings are still issued for '{ 0 }'?

-- Luc

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

* Re: [PATCH 2/4] dt-bindings: pinctrl: Document optional BCM7211 wake-up interrupts
From: Florian Fainelli @ 2020-05-29 19:36 UTC (permalink / raw)
  To: Rob Herring, Florian Fainelli
  Cc: linux-kernel, Linus Walleij, Ray Jui, Scott Branden,
	maintainer:BROADCOM BCM281XX/BCM11XXX/BCM216XX ARM ARCHITE...,
	Nicolas Saenz Julienne, Stefan Wahren, Geert Uytterhoeven,
	Matti Vaittinen, open list:PIN CONTROL SUBSYSTEM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE
In-Reply-To: <20200529193315.GA2807797@bogus>

On 5/29/20 12:33 PM, Rob Herring wrote:
> On Thu, May 28, 2020 at 12:21:10PM -0700, Florian Fainelli wrote:
>> BCM7211 supports wake-up interrupts in the form of optional interrupt
>> lines, one per bank, plus the "all banks" interrupt line.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  .../devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt         | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt b/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt
>> index dfc67b90591c..5682b2010e50 100644
>> --- a/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt
>> +++ b/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt
>> @@ -16,7 +16,9 @@ Required properties:
>>    second cell is used to specify optional parameters:
>>    - bit 0 specifies polarity (0 for normal, 1 for inverted)
>>  - interrupts : The interrupt outputs from the controller. One interrupt per
>> -  individual bank followed by the "all banks" interrupt.
>> +  individual bank followed by the "all banks" interrupt. For BCM7211, an
>> +  additional set of per-bank interrupt line and an "all banks" wake-up
>> +  interrupt may be specified.
> 
> Is 'all banks' the name? Generally 'wakeup' is used for a wake up irq.

The firmware-provided DTB on 7211 names the standard interrupts "gpio_%d",
including the "all banks" one, which is therefore "gpio_3". The wake-up
interrupts are named "gpio_%d_wake", so the "all banks" wake-up interrupt is
"gpio_3_wake".
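The naming scheme can be made concrete with a small C sketch that formats and classifies these names. It is not taken from the pinctrl-bcm2835 driver. `demo_irq_name`, `demo_irq_name_is_wake` and `DEMO_ALL_BANKS_IDX` are hypothetical. The assumption that indices 0 to 2 are the per-bank lines is inferred from "gpio_3" being the "all banks" interrupt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumption inferred from the thread: per-bank interrupts are gpio_0..gpio_2,
 * the "all banks" interrupt is gpio_3, and wake-up lines add a "_wake" suffix.
 */
#define DEMO_ALL_BANKS_IDX 3

/* Hypothetical helper: write the firmware DTB name for interrupt idx into buf. */
static int demo_irq_name(char *buf, size_t len, unsigned int idx, int wake)
{
	if (wake)
		return snprintf(buf, len, "gpio_%u_wake", idx);
	return snprintf(buf, len, "gpio_%u", idx);
}

/* Hypothetical helper: nonzero if name ends in "_wake". */
static int demo_irq_name_is_wake(const char *name)
{
	static const char suffix[] = "_wake";
	size_t n = strlen(name);
	size_t s = sizeof(suffix) - 1;

	return n >= s && strcmp(name + n - s, suffix) == 0;
}
```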
-- 
Florian

^ permalink raw reply

* Re: [PATCH V3] dt-bindings: timer: Convert i.MX GPT to json-schema
From: Rob Herring @ 2020-05-29 19:36 UTC (permalink / raw)
  To: Anson Huang
  Cc: linux-arm-kernel, s.hauer, robh+dt, daniel.lezcano, shawnguo,
	Linux-imx, festevam, kernel, devicetree, linux-kernel, tglx
In-Reply-To: <1590717882-20922-1-git-send-email-Anson.Huang@nxp.com>

On Fri, 29 May 2020 10:04:42 +0800, Anson Huang wrote:
> Convert the i.MX GPT binding to DT schema format using json-schema.
> 
> Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
> ---
> Changes since V2:
> 	- in compatible properties, group all the ones with the same
> 	  fallback to a single 'items' list using enum for the first entry.
> ---
>  .../devicetree/bindings/timer/fsl,imxgpt.txt       | 45 --------------
>  .../devicetree/bindings/timer/fsl,imxgpt.yaml      | 72 ++++++++++++++++++++++
>  2 files changed, 72 insertions(+), 45 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/timer/fsl,imxgpt.txt
>  create mode 100644 Documentation/devicetree/bindings/timer/fsl,imxgpt.yaml
> 

Applied, thanks!

^ permalink raw reply

