LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/7 v5] Docs: dt: add fsl-mc iommu-map device-tree binding
From: Robin Murphy @ 2018-07-03 14:39 UTC (permalink / raw)
  To: Nipun Gupta, will.deacon, robh+dt, robh, mark.rutland,
	catalin.marinas, gregkh, laurentiu.tudor, bhelgaas
  Cc: hch, joro, m.szyprowski, shawnguo, frowand.list, iommu,
	linux-kernel, devicetree, linux-arm-kernel, linuxppc-dev,
	linux-pci, bharat.bhushan, stuyoder, leoyang.li
In-Reply-To: <1526824191-7000-2-git-send-email-nipun.gupta@nxp.com>

On 20/05/18 14:49, Nipun Gupta wrote:
> The existing IOMMU bindings cannot be used to specify the relationship
> between fsl-mc devices and IOMMUs. This patch adds a generic binding for
> mapping fsl-mc devices to IOMMUs, using iommu-map property.
> 
> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
> Reviewed-by: Rob Herring <robh@kernel.org>
> ---
>   .../devicetree/bindings/misc/fsl,qoriq-mc.txt      | 39 ++++++++++++++++++++++
>   1 file changed, 39 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> index 6611a7c..8cbed4f 100644
> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> @@ -9,6 +9,25 @@ blocks that can be used to create functional hardware objects/devices
>   such as network interfaces, crypto accelerator instances, L2 switches,
>   etc.
>   
> +For an overview of the DPAA2 architecture and fsl-mc bus see:
> +drivers/staging/fsl-mc/README.txt

Nit: Looks like that's Documentation/networking/dpaa2/overview.rst now.

> +
> +As described in the above overview, all DPAA2 objects in a DPRC share the
> +same hardware "isolation context" and a 10-bit value called an ICID
> +(isolation context id) is expressed by the hardware to identify
> +the requester.
> +
> +The generic 'iommus' property is insufficient to describe the relationship
> +between ICIDs and IOMMUs, so an iommu-map property is used to define
> +the set of possible ICIDs under a root DPRC and how they map to
> +an IOMMU.
> +
> +For generic IOMMU bindings, see
> +Documentation/devicetree/bindings/iommu/iommu.txt.
> +
> +For arm-smmu binding, see:
> +Documentation/devicetree/bindings/iommu/arm,smmu.txt.
> +
>   Required properties:
>   
>       - compatible
> @@ -88,14 +107,34 @@ Sub-nodes:
>                 Value type: <phandle>
>                 Definition: Specifies the phandle to the PHY device node associated
>                             with the this dpmac.
> +Optional properties:
> +
> +- iommu-map: Maps an ICID to an IOMMU and associated iommu-specifier
> +  data.
> +
> +  The property is an arbitrary number of tuples of
> +  (icid-base,iommu,iommu-base,length).
> +
> +  Any ICID i in the interval [icid-base, icid-base + length) is
> +  associated with the listed IOMMU, with the iommu-specifier
> +  (i - icid-base + iommu-base).
>   
>   Example:
>   
> +        smmu: iommu@5000000 {
> +               compatible = "arm,mmu-500";
> +               #iommu-cells = <2>;

This should be 1 if stream-match-mask is present. Bad example is bad :)

Robin.

> +               stream-match-mask = <0x7C00>;
> +               ...
> +        };
> +
>           fsl_mc: fsl-mc@80c000000 {
>                   compatible = "fsl,qoriq-mc";
>                   reg = <0x00000008 0x0c000000 0 0x40>,    /* MC portal base */
>                         <0x00000000 0x08340000 0 0x40000>; /* MC control reg */
>                   msi-parent = <&its>;
> +                /* define map for ICIDs 23-64 */
> +                iommu-map = <23 &smmu 23 41>;
>                   #address-cells = <3>;
>                   #size-cells = <1>;
>   
> 

^ permalink raw reply

* Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset
From: Rafael J. Wysocki @ 2018-07-03 14:35 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Greg Kroah-Hartman, Grygorii Strashko,
	Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci,
	linuxppc-dev
In-Reply-To: <1530600642-25090-1-git-send-email-kernelfans@gmail.com>

On Tuesday, July 3, 2018 8:50:38 AM CEST Pingfan Liu wrote:
> commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> places an assumption of supplier<-consumer order on the process of probe.
> But it turns out to break down the parent <- child order in some scene.
> E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> have been probed. Then comes the bridge's module, which enables extra
> feature(such as hotplug) on this bridge.

So what *exactly* does happen in that case?

^ permalink raw reply

* Re: [PATCHv3 3/4] drivers/base: clean up the usage of devices_kset_move_last()
From: Rafael J. Wysocki @ 2018-07-03 14:26 UTC (permalink / raw)
  To: Pingfan Liu, Grygorii Strashko
  Cc: linux-kernel, Greg Kroah-Hartman, Christoph Hellwig,
	Bjorn Helgaas, Dave Young, linux-pci, linuxppc-dev
In-Reply-To: <1530600642-25090-4-git-send-email-kernelfans@gmail.com>

On Tuesday, July 3, 2018 8:50:41 AM CEST Pingfan Liu wrote:
> Clean up the referring to the code in commit 52cdbdd49853 ("driver core:
> correct device's shutdown order"). So later we can revert it safely.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: linux-pci@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> ---
>  drivers/base/core.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 684b994..db3deb8 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -127,13 +127,6 @@ static int device_reorder_to_tail(struct device *dev, void *not_used)
>  {
>  	struct device_link *link;
>  
> -	/*
> -	 * Devices that have not been registered yet will be put to the ends
> -	 * of the lists during the registration, so skip them here.
> -	 */
> -	if (device_is_registered(dev))
> -		devices_kset_move_last(dev);
> -
>  	if (device_pm_initialized(dev))
>  		device_pm_move_last(dev);

You can't do this.

If you do it, that will break power management in some situations.

Thanks,
Rafael

^ permalink raw reply

* How is this possible - Register r30 contains 0xc2236400 instead of 0xc6236400
From: Christophe LEROY @ 2018-07-03 13:20 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org, Segher Boessenkool,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

Kernel Oops at 0xc0334d5c for reading at address 0xc2236450 which 
corresponds to r30 + 80

But r30 should contain what's at r3 + 16 that is at 0xc619ec10 so r30 
should be c6236400 as shown below (print_hex_dump(regs->gpr[3]) added at 
end of __die() )

So how can r30 contain 0xc2236400 instead ?

And this is not random, it happens at most if not every startup.

c0334d44 <sock_wfree>:
c0334d44:       7c 08 02 a6     mflr    r0
c0334d48:       94 21 ff f0     stwu    r1,-16(r1)
c0334d4c:       bf c1 00 08     stmw    r30,8(r1)
c0334d50:       90 01 00 14     stw     r0,20(r1)
c0334d54:       83 c3 00 10     lwz     r30,16(r3)
c0334d58:       81 23 00 a8     lwz     r9,168(r3)
c0334d5c:       81 5e 00 50     lwz     r10,80(r30)


[  152.288237] Unable to handle kernel paging request for data at 
address 0xc2236450
[  152.295444] Faulting instruction address: 0xc0334d5c
[  152.300369] Oops: Kernel access of bad area, sig: 11 [#1]
[  152.305665] BE PREEMPT DEBUG_PAGEALLOC CMPC885
[  152.313630] CPU: 0 PID: 269 Comm: in:imuxsock Not tainted 
4.14.52-00025-g5bada429cf-dirty #36
[  152.322729] task: c623e100 task.stack: c650c000
[  152.327202] NIP:  c0334d5c LR: c043602c CTR: c0435fb8
[  152.332200] REGS: c650dc00 TRAP: 0300   Not tainted 
(4.14.52-00025-g5bada429cf-dirty)
[  152.340699] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 28002822 XER: 20000000
[  152.347333] DAR: c2236450 DSISR: c0000000
[  152.347333] GPR00: c043602c c650dcb0 c623e100 c619ec00 c642c060 
00000008 00000018 c650dd4c
[  152.347333] GPR08: c0435fb8 000002b0 c068d830 00000004 28004822 
100d4208 00000000 7780c848
[  152.347333] GPR16: 0ff58398 777674b0 1024b050 1024b0a8 1005ddbc 
0ff5a7bc 000003e8 00000000
[  152.347333] GPR24: 0000008e c5011650 c650deb8 0000008e c619ec00 
00000040 c2236400 c619ec00
[  152.385015] NIP [c0334d5c] sock_wfree+0x18/0xa4
[  152.389458] LR [c043602c] unix_destruct_scm+0x74/0x88
[  152.394399] Call Trace:
[  152.396868] [c650dcb0] [c006348c] ns_to_timeval+0x4c/0x7c (unreliable)
[  152.403305] [c650dcc0] [c043602c] unix_destruct_scm+0x74/0x88
[  152.408999] [c650dcf0] [c033a10c] skb_release_head_state+0x8c/0x110
[  152.415184] [c650dd00] [c033a3c4] skb_release_all+0x18/0x50
[  152.420690] [c650dd10] [c033a7cc] consume_skb+0x38/0xec
[  152.425869] [c650dd20] [c0342d7c] skb_free_datagram+0x1c/0x68
[  152.431535] [c650dd30] [c0435c8c] unix_dgram_recvmsg+0x19c/0x4ac
[  152.437476] [c650ddb0] [c0331370] ___sys_recvmsg+0x98/0x138
[  152.442984] [c650deb0] [c0333280] __sys_recvmsg+0x40/0x84
[  152.448321] [c650df10] [c0333680] SyS_socketcall+0xb8/0x1d4
[  152.453832] [c650df40] [c000d1ac] ret_from_syscall+0x0/0x38
[  152.459286] Instruction dump:
[  152.462225] 41beffac 4bffff58 38800003 4bffffa0 38800001 4bffff98 
7c0802a6 9421fff0
[  152.469881] bfc10008 90010014 83c30010 812300a8 <815e0050> 3bfe00e0 
71480200 4082003c
[  152.477739] c619ec00: 00 00 00 00 00 00 00 00 00 00 00 23 6f d9 b1 65
[  152.484100] c619ec10: c6 23 64 00 00 00 00 00 c6 42 c0 60 00 00 03 e8
[  152.490471] c619ec20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  152.496837] c619ec30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  152.503205] c619ec40: 00 00 00 00 00 00 00 00 00 00 00 00 c0 43 5f b8
[  152.509575] c619ec50: 00 00 00 00 00 00 00 00 00 00 00 8e 00 00 00 00
[  152.515943] c619ec60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  152.522311] c619ec70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  152.528680] c619ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  152.535048] c619ec90: 00 00 ff ff 00 00 ff ff c6 42 30 8e c6 42 31 50
[  152.541417] c619eca0: c6 42 30 00 c6 42 30 00 00 00 02 b0 00 00 00 01
[  152.547781] ---[ end trace 0710a9d231876a27 ]---

Christophe

^ permalink raw reply

* Re: CONFIG_LD_DEAD_CODE_DATA_ELIMINATION: Section mismatch in reference from the variable via_pmu_driver to the function .init.text:pmu_init()
From: Nicholas Piggin @ 2018-07-03 12:15 UTC (permalink / raw)
  To: Mathieu Malaterre; +Cc: linuxppc-dev, Christophe LEROY
In-Reply-To: <CA+7wUswfXaECaN5EiU9sNsO+YbgTOggVfsbMC4azStB7Xx9hPw@mail.gmail.com>

On Tue, 3 Jul 2018 13:40:55 +0200
Mathieu Malaterre <malat@debian.org> wrote:

> On Tue, Jul 3, 2018 at 11:40 AM Mathieu Malaterre <malat@debian.org> wrote:
> >
> > Hi Nick,
> >
> > Would you consider this a bug:
> >
> > $ touch drivers/macintosh/via-pmu.c
> > $ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n make ARCH=powerpc
> > CROSS_COMPILE=powerpc-linux-gnu-
> > ...
> >   LD      vmlinux.o
> >   MODPOST vmlinux.o
> > WARNING: vmlinux.o(.data+0x216018): Section mismatch in reference from
> > the variable via_pmu_driver to the function .init.text:pmu_init()
> > The variable via_pmu_driver references
> > the function __init pmu_init()
> > If the reference is valid then annotate the
> > variable with __init* or __refdata (see linux/init.h) or name the variable:
> > *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
> >
> > While:
> >
> > $ touch drivers/macintosh/via-pmu.c
> > $ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y make ARCH=powerpc
> > CROSS_COMPILE=powerpc-linux-gnu-
> > ...
> >   AR      init/built-in.a
> >   AR      built-in.a
> >   LD      vmlinux.o
> >   MODPOST vmlinux.o
> >   KSYM    .tmp_kallsyms1.o
> >   KSYM    .tmp_kallsyms2.o
> >   LD      vmlinux
> >   SORTEX  vmlinux
> >   SYSMAP  System.map
> > ...
> >
> > Thanks for comment  
> 
> Just to clarify I reverted 58935176ad17976b7a7f6ea25c0ceb2ca4308a30
> just as to reproduce a warning. So my question (rephrased):
> 
> Is this expected that CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y inhibit
> the behavior of  CONFIG_DEBUG_SECTION_MISMATCH=y ?

Well that's a good question actually. Section mismatch
analysis is done on the throwaway vmlinux.o which is not linked
with --gc-sections (and is not a final link), so the via_pmu_driver
symbol should exist and be picked up.

I wonder if something about the  -ffunction-sections is breaking
the reference detection.

Thanks,
Nick

^ permalink raw reply

* Re: CONFIG_LD_DEAD_CODE_DATA_ELIMINATION: Section mismatch in reference from the variable via_pmu_driver to the function .init.text:pmu_init()
From: Mathieu Malaterre @ 2018-07-03 11:40 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Christophe LEROY
In-Reply-To: <CA+7wUsxznwK8fztA+2TAWJL9nEHek_tyUcDm9HCvy-4MdvpAWQ@mail.gmail.com>

On Tue, Jul 3, 2018 at 11:40 AM Mathieu Malaterre <malat@debian.org> wrote:
>
> Hi Nick,
>
> Would you consider this a bug:
>
> $ touch drivers/macintosh/via-pmu.c
> $ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n make ARCH=powerpc
> CROSS_COMPILE=powerpc-linux-gnu-
> ...
>   LD      vmlinux.o
>   MODPOST vmlinux.o
> WARNING: vmlinux.o(.data+0x216018): Section mismatch in reference from
> the variable via_pmu_driver to the function .init.text:pmu_init()
> The variable via_pmu_driver references
> the function __init pmu_init()
> If the reference is valid then annotate the
> variable with __init* or __refdata (see linux/init.h) or name the variable:
> *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
>
> While:
>
> $ touch drivers/macintosh/via-pmu.c
> $ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y make ARCH=powerpc
> CROSS_COMPILE=powerpc-linux-gnu-
> ...
>   AR      init/built-in.a
>   AR      built-in.a
>   LD      vmlinux.o
>   MODPOST vmlinux.o
>   KSYM    .tmp_kallsyms1.o
>   KSYM    .tmp_kallsyms2.o
>   LD      vmlinux
>   SORTEX  vmlinux
>   SYSMAP  System.map
> ...
>
> Thanks for comment

Just to clarify I reverted 58935176ad17976b7a7f6ea25c0ceb2ca4308a30
just as to reproduce a warning. So my question (rephrased):

Is this expected that CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y inhibit
the behavior of  CONFIG_DEBUG_SECTION_MISMATCH=y ?

Thanks

^ permalink raw reply

* Re: Oops in sock_wfree
From: Christophe LEROY @ 2018-07-03 11:34 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org, netdev, Eric Dumazet
In-Reply-To: <8929d3b5-5c07-b6a1-b3c8-8625ff1c79a8@c-s.fr>



Le 03/07/2018 à 10:51, Christophe LEROY a écrit :
> Hi,
> 
> I was having strange unexpected memory corruption, therefore I activated 
> DEBUG_PAGEALLOC and I now end up with the following Oops, which tends to 
> make me think we have somewhere in the network code a use-after-free 
> bug. I saw a few of such bugs have been fixed for IPv4 and IPv6. Maybe 
> we have one remaining for Unix sockets ? How can I spot it off and fix it ?
> 
> [   39.645644] Unable to handle kernel paging request for data at 
> address 0xc2235010

In fact, must be something else. This page has never been allocated.
In seems that skb->sk should be c6234fc0 and suddenly it has changed to 
c2234fc0

How can I track that ?

Christophe

> [   39.652860] Faulting instruction address: 0xc0334d5c
> [   39.657783] Oops: Kernel access of bad area, sig: 11 [#1]
> [   39.663085] BE PREEMPT DEBUG_PAGEALLOC CMPC885
> [   39.667488] SAF3000 DIE NOTIFICATION
> [   39.671050] CPU: 0 PID: 269 Comm: in:imuxsock Not tainted 
> 4.14.52-00025-g5bada429cf #22
> [   39.679633] task: c623e100 task.stack: c651e000
> [   39.684106] NIP:  c0334d5c LR: c043602c CTR: c0435fb8
> [   39.689103] REGS: c651fc00 TRAP: 0300   Not tainted 
> (4.14.52-00025-g5bada429cf)
> [   39.697087] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 28002822 XER: 20000000
> [   39.703720] DAR: c2235010 DSISR: c0000000
> [   39.703720] GPR00: c043602c c651fcb0 c623e100 c619eec0 c642c540 
> 00000008 00000018 c651fd4c
> [   39.703720] GPR08: c0435fb8 000002b0 c068d830 00000004 28004822 
> 100d4208 00000000 77990848
> [   39.703720] GPR16: 0ff58398 778eb4b0 1039f050 1039f0a8 1005ddbc 
> 0ff5a7bc 00000000 00000000
> [   39.703720] GPR24: 00000072 c5011650 c651feb8 00000072 c619eec0 
> 00000040 c2234fc0 c619eec0
> [   39.741401] NIP [c0334d5c] sock_wfree+0x18/0xa4
> [   39.745843] LR [c043602c] unix_destruct_scm+0x74/0x88
> [   39.750786] Call Trace:
> [   39.753253] [c651fcb0] [c006348c] ns_to_timeval+0x4c/0x7c (unreliable)
> [   39.759690] [c651fcc0] [c043602c] unix_destruct_scm+0x74/0x88
> [   39.765385] [c651fcf0] [c033a10c] skb_release_head_state+0x8c/0x110
> [   39.771571] [c651fd00] [c033a3c4] skb_release_all+0x18/0x50
> [   39.777078] [c651fd10] [c033a7cc] consume_skb+0x38/0xec
> [   39.782255] [c651fd20] [c0342d7c] skb_free_datagram+0x1c/0x68
> [   39.787922] [c651fd30] [c0435c8c] unix_dgram_recvmsg+0x19c/0x4ac
> [   39.793863] [c651fdb0] [c0331370] ___sys_recvmsg+0x98/0x138
> [   39.799371] [c651feb0] [c0333280] __sys_recvmsg+0x40/0x84
> [   39.804707] [c651ff10] [c0333680] SyS_socketcall+0xb8/0x1d4
> [   39.810220] [c651ff40] [c000d1ac] ret_from_syscall+0x0/0x38
> [   39.815673] Instruction dump:
> [   39.818612] 41beffac 4bffff58 38800003 4bffffa0 38800001 4bffff98 
> 7c0802a6 9421fff0
> [   39.826267] bfc10008 90010014 83c30010 812300a8 <815e0050> 3bfe00e0 
> 71480200 4082003c
> [   39.834113] ---[ end trace 8affde0490d7e25e ]---
> 
> Thanks
> Christophe

^ permalink raw reply

* [v2 PATCH 1/2] powerpc: Detect the presence of big-cores via "ibm, thread-groups"
From: Gautham R. Shenoy @ 2018-07-03 11:03 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Oliver O'Halloran, Nicholas Piggin
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy
In-Reply-To: <cover.1530609795.git.ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On IBM POWER9, the device tree exposes a property array identifed by
"ibm,thread-groups" which will indicate which groups of threads share a
particular set of resources.

As of today we only have one form of grouping identifying the group of
threads in the core that share the L1 cache, translation cache and
instruction data flow.

This patch defines the helper function to parse the contents of
"ibm,thread-groups" and a new structure to contain the parsed output.

The patch also creates the sysfs file named "small_core_siblings" that
returns the physical ids of the threads in the core that share the L1
cache, translation cache and instruction data flow.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |   8 ++
 arch/powerpc/include/asm/cputhreads.h              |  22 +++++
 arch/powerpc/kernel/setup-common.c                 | 110 +++++++++++++++++++++
 arch/powerpc/kernel/sysfs.c                        |  35 +++++++
 4 files changed, 175 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 9c5e7732..53a823a 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -487,3 +487,11 @@ Description:	Information about CPU vulnerabilities
 		"Not affected"	  CPU is not affected by the vulnerability
 		"Vulnerable"	  CPU is affected and no mitigation in effect
 		"Mitigation: $M"  CPU is affected and mitigation $M is in effect
+
+What: 		/sys/devices/system/cpu/cpu[0-9]+/small_core_sibings
+Date:		03-Jul-2018
+KernelVersion:	v4.18.0
+Contact:	Gautham R. Shenoy <ego@linux.vnet.ibm.com>
+Description:	List of Physical ids of CPUs which share the the L1 cache,
+		translation cache and instruction data-flow with this CPU.
+Values:		Comma separated list of decimal integers.
diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index d71a909..33226d7 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -23,11 +23,13 @@
 extern int threads_per_core;
 extern int threads_per_subcore;
 extern int threads_shift;
+extern bool has_big_cores;
 extern cpumask_t threads_core_mask;
 #else
 #define threads_per_core	1
 #define threads_per_subcore	1
 #define threads_shift		0
+#define has_big_cores		0
 #define threads_core_mask	(*get_cpu_mask(0))
 #endif
 
@@ -69,12 +71,32 @@ static inline cpumask_t cpu_online_cores_map(void)
 	return cpu_thread_mask_to_cores(cpu_online_mask);
 }
 
+#define MAX_THREAD_LIST_SIZE	8
+struct thread_groups {
+	unsigned int property;
+	unsigned int nr_groups;
+	unsigned int threads_per_group;
+	unsigned int thread_list[MAX_THREAD_LIST_SIZE];
+};
+
 #ifdef CONFIG_SMP
 int cpu_core_index_of_thread(int cpu);
 int cpu_first_thread_of_core(int core);
+int parse_thread_groups(struct device_node *dn, struct thread_groups *tg);
+int get_cpu_thread_group_start(int cpu, struct thread_groups *tg);
 #else
 static inline int cpu_core_index_of_thread(int cpu) { return cpu; }
 static inline int cpu_first_thread_of_core(int core) { return core; }
+static inline int parse_thread_groups(struct device_node *dn,
+				      struct thread_groups *tg)
+{
+	return -ENODATA;
+}
+
+static inline int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
+{
+	return -1;
+}
 #endif
 
 static inline int cpu_thread_in_core(int cpu)
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 40b44bb..a78ec66 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -402,10 +402,12 @@ void __init check_for_initrd(void)
 #ifdef CONFIG_SMP
 
 int threads_per_core, threads_per_subcore, threads_shift;
+bool has_big_cores = true;
 cpumask_t threads_core_mask;
 EXPORT_SYMBOL_GPL(threads_per_core);
 EXPORT_SYMBOL_GPL(threads_per_subcore);
 EXPORT_SYMBOL_GPL(threads_shift);
+EXPORT_SYMBOL_GPL(has_big_cores);
 EXPORT_SYMBOL_GPL(threads_core_mask);
 
 static void __init cpu_init_thread_core_maps(int tpc)
@@ -433,6 +435,108 @@ static void __init cpu_init_thread_core_maps(int tpc)
 
 u32 *cpu_to_phys_id = NULL;
 
+/*
+ * parse_thread_groups: Parses the "ibm,thread-groups" device tree
+ *                      property for the CPU device node dn and stores
+ *                      the parsed output in the thread_groups
+ *                      structure tg.
+ *
+ * ibm,thread-groups[0..N-1] array defines which group of threads in
+ * the CPU-device node can be grouped together based on the property.
+ *
+ * ibm,thread-groups[0] tells us the property based on which the
+ * threads are being grouped together. If this value is 1, it implies
+ * that the threads in the same group share L1, translation cache.
+ *
+ * ibm,thread-groups[1] tells us how many such thread groups exist.
+ *
+ * ibm,thread-groups[2] tells us the number of threads in each such
+ * group.
+ *
+ * ibm,thread-groups[3..N-1] is the list of threads identified by
+ * "ibm,ppc-interrupt-server#s" arranged as per their membership in
+ * the grouping.
+ *
+ * Example: If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12] it
+ * implies that there are 2 groups of 4 threads each, where each group
+ * of threads share L1, translation cache.
+ *
+ * The "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8}
+ * and the "ibm,ppc-interrupt-server#s" of the second group is {9, 10,
+ * 11, 12} structure
+ *
+ * Returns 0 on success, -EINVAL if the property does not exist,
+ * -ENODATA if property does not have a value, and -EOVERFLOW if the
+ * property data isn't large enough.
+ */
+int parse_thread_groups(struct device_node *dn,
+			struct thread_groups *tg)
+{
+	unsigned int nr_groups, threads_per_group, property;
+	int i;
+	u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
+	u32 *thread_list;
+	size_t total_threads;
+	int ret;
+
+	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
+					 thread_group_array, 3);
+
+	if (ret)
+		return ret;
+
+	property = thread_group_array[0];
+	nr_groups = thread_group_array[1];
+	threads_per_group = thread_group_array[2];
+	total_threads = nr_groups * threads_per_group;
+
+	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
+					 thread_group_array,
+					 3 + total_threads);
+	if (ret)
+		return ret;
+
+	thread_list = &thread_group_array[3];
+
+	for (i = 0 ; i < total_threads; i++)
+		tg->thread_list[i] = thread_list[i];
+
+	tg->property = property;
+	tg->nr_groups = nr_groups;
+	tg->threads_per_group = threads_per_group;
+
+	return 0;
+}
+
+/*
+ * get_cpu_thread_group_start : Searches the thread group in tg->thread_list
+ *                              that @cpu belongs to.
+ *
+ * Returns the index to tg->thread_list that points to the the start
+ * of the thread_group that @cpu belongs to.
+ *
+ * Returns -1 if cpu doesn't belong to any of the groups pointed
+ * to by tg->thread_list.
+ */
+int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
+{
+	int hw_cpu_id = get_hard_smp_processor_id(cpu);
+	int i, j;
+
+	for (i = 0; i < tg->nr_groups; i++) {
+		int group_start = i * tg->threads_per_group;
+
+		for (j = 0; j < tg->threads_per_group; j++) {
+			int idx = group_start + j;
+
+			if (tg->thread_list[idx] == hw_cpu_id)
+				return group_start;
+		}
+	}
+
+	return -1;
+}
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *                  cpu_possible_mask
@@ -467,6 +571,7 @@ void __init smp_setup_cpu_maps(void)
 		const __be32 *intserv;
 		__be32 cpu_be;
 		int j, len;
+		struct thread_groups tg = {.nr_groups = 0};
 
 		DBG("  * %pOF...\n", dn);
 
@@ -505,6 +610,11 @@ void __init smp_setup_cpu_maps(void)
 			cpu++;
 		}
 
+		if (parse_thread_groups(dn, &tg) ||
+		    tg.nr_groups < 1 || tg.property != 1) {
+			has_big_cores = false;
+		}
+
 		if (cpu >= nr_cpu_ids) {
 			of_node_put(dn);
 			break;
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 755dc98..f5717de 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -18,6 +18,7 @@
 #include <asm/smp.h>
 #include <asm/pmc.h>
 #include <asm/firmware.h>
+#include <asm/cputhreads.h>
 
 #include "cacheinfo.h"
 #include "setup.h"
@@ -1025,6 +1026,33 @@ static ssize_t show_physical_id(struct device *dev,
 }
 static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
 
+static ssize_t show_small_core_siblings(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	struct device_node *dn = of_get_cpu_node(cpu->dev.id, NULL);
+	struct thread_groups tg;
+	int i, j;
+	ssize_t ret = 0;
+
+	if (parse_thread_groups(dn, &tg))
+		return -ENODATA;
+
+	i = get_cpu_thread_group_start(cpu->dev.id, &tg);
+
+	if (i == -1)
+		return -ENODATA;
+
+	for (j = 0; j < tg.threads_per_group - 1; j++)
+		ret += sprintf(buf + ret, "%d,", tg.thread_list[i + j]);
+
+	ret += sprintf(buf + ret, "%d\n", tg.thread_list[i + j]);
+
+	return ret;
+}
+static DEVICE_ATTR(small_core_siblings, 0444, show_small_core_siblings, NULL);
+
 static int __init topology_init(void)
 {
 	int cpu, r;
@@ -1048,6 +1076,13 @@ static int __init topology_init(void)
 			register_cpu(c, cpu);
 
 			device_create_file(&c->dev, &dev_attr_physical_id);
+
+			if (has_big_cores) {
+				const struct device_attribute *attr =
+				       &dev_attr_small_core_siblings;
+
+			       device_create_file(&c->dev, attr);
+			}
 		}
 	}
 	r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/topology:online",
-- 
1.9.4

^ permalink raw reply related

* [v2 PATCH 0/2] powerpc: Detection and scheduler optimization for POWER9 bigcore
From: Gautham R. Shenoy @ 2018-07-03 11:03 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Oliver O'Halloran, Nicholas Piggin
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Hi,

This is the second iteration of the patchset to add support for big-core on POWER9.

The earlier version can be found here: https://lkml.org/lkml/2018/5/11/245.

The changes from the previous version:
    - Added comments explaining the "ibm,thread-groups" device tree property.
    - Uses cleaner device-tree parsing functions to parse the u32 arrays.
    - Adds a sysfs file listing the small-core siblings for every CPU.
    - Enables the scheduler optimization by setting the CPU_FTR_ASYM_SMT bit
      in the cur_cpu_spec->cpu_features on detecting the presence
      of interleaved big-core.
    - Handles the corner case where there is only a single thread-group
      or when there is a single thread in a thread-group.


Description:
~~~~~~~~~~~~~~~~~~~~
A pair of IBM POWER9 SMT4 cores can be fused together to form a
big-core with 8 SMT threads. This can be discovered via the
"ibm,thread-groups" CPU property in the device tree which will
indicate which group of threads that share the L1 cache, translation
cache and instruction data flow.  If there are multiple such group of
threads, then the core is a big-core. Furthermore, the thread-ids of
such a big-core is obtained by interleaving the thread-ids of the
component SMT4 cores.

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.
    
On such a big-core, when multiple tasks are scheduled to run on the
big-core, we get the best performance when the tasks are spread across
the pair of SMT4 cores.

The Linux scheduler supports a flag called "SD_ASYM_PACKING" which
when set in the SMT sched-domain, biases the load-balancing of the
tasks on the smaller numbered threads in the core. On an big-core
whose threads are interleavings of the threads of the small cores,
enabling SD_ASYM_PACKING in the SMT sched-domain automatically results
in spreading the tasks uniformly across the associated pair of SMT4
cores, thereby yielding better performance.

This patchset contains two patches which on detecting the presence of
interleaved big-cores will enable the the CPU_FTR_ASYM_SMT bit in the
cur_cpu_spec->cpu_feature.

Patch 1: adds support to detect the presence of
big-cores and reports the small-core siblings of each CPU X
via the sysfs file "/sys/devices/system/cpu/cpuX/big_core_siblings".

Patch 2: checks if the thread-ids of the component small-cores are
interleaved, in which case we enable the the CPU_FTR_ASYM_SMT bit in
the cur_cpu_spec->cpu_features which results in the SD_ASYM_PACKING
flag being set at the SMT level sched-domain.

Results:
~~~~~~~~~~~~~~~~~
Experimental results for ebizzy with 2 threads, bound to a single big-core
show a marked improvement with this patchset over the 4.18-rc3 vanilla
kernel.

The result of 100 such runs for 4.18-rc3 kernel and the 4.18-rc3 +
big-core-patches are as follows

4.18-rc3:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        records/s    :  # samples  : Histogram
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	
[0 -       1000000]  :      0      : #
[1000000 - 2000000]  :      3      : #
[2000000 - 3000000]  :      16     : ####
[3000000 - 4000000]  :      11     : ###
[4000000 - 5000000]  :      0      : #
[5000000 - 6000000]  :      70     : ###############

4.18-rc3 + big-core-patches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        records/s    :  # samples  : Histogram
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	
[0 -       1000000]  :      0      : #
[1000000 - 2000000]  :      0      : #
[2000000 - 3000000]  :      6      : ##
[3000000 - 4000000]  :      0      : #
[4000000 - 5000000]  :      2      : #
[5000000 - 6000000]  :      92     : ###################



Gautham R. Shenoy (2):
  powerpc: Detect the presence of big-cores via "ibm,thread-groups"
  powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores

 Documentation/ABI/testing/sysfs-devices-system-cpu |   8 +
 arch/powerpc/include/asm/cputhreads.h              |  22 +++
 arch/powerpc/kernel/setup-common.c                 | 177 ++++++++++++++++++++-
 arch/powerpc/kernel/sysfs.c                        |  35 ++++
 4 files changed, 241 insertions(+), 1 deletion(-)

-- 
1.9.4

^ permalink raw reply

* [v2 PATCH 2/2] powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores
From: Gautham R. Shenoy @ 2018-07-03 11:03 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Oliver O'Halloran, Nicholas Piggin
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy
In-Reply-To: <cover.1530609795.git.ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

A pair of IBM POWER9 SMT4 cores can be fused together to form a big-core
with 8 SMT threads. This can be discovered via the "ibm,thread-groups"
CPU property in the device tree which will indicate which group of
threads that share the L1 cache, translation cache and instruction data
flow. If there are multiple such group of threads, then the core is a
big-core.

Furthermore, if the thread-ids of the threads of the big-core can be
obtained by interleaving the thread-ids of the thread-groups
(component small core), then such a big-core is called an interleaved
big-core.

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.

The SMT4 cores forming a big-core are more or less independent
units. Thus when multiple tasks are scheduled to run on the fused
core, we get the best performance when the tasks are spread across the
pair of SMT4 cores.

This patch enables CPU_FTR_ASYM_SMT bit in the cpu-features on
detecting the presence of interleaved big-cores at boot up. This will
will bias the load-balancing of tasks on smaller numbered threads,
which will automatically result in spreading the tasks uniformly
across the associated pair of SMT4 cores.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/setup-common.c | 67 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index a78ec66..f63d797 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -537,6 +537,56 @@ int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
 	return -1;
 }
 
+/*
+ * check_interleaved_big_core - Checks if the thread group tg
+ * corresponds to a big-core whose threads are interleavings of the
+ * threads of the component small cores.
+ *
+ * @tg: A thread-group struct for the core.
+ *
+ * Returns true if the core is a interleaved big-core.
+ * Returns false otherwise.
+ */
+static inline bool check_interleaved_big_core(struct thread_groups *tg)
+{
+	int nr_groups;
+	int threads_per_group;
+	int cur_cpu, next_cpu, i, j;
+
+	nr_groups = tg->nr_groups;
+	threads_per_group = tg->threads_per_group;
+
+	if (tg->property != 1)
+		return false;
+
+	if (nr_groups < 2 || threads_per_group < 2)
+		return false;
+
+	/*
+	 * In case of an interleaved big-core, the thread-ids of the
+	 * big-core can be obtained by interleaving the the thread-ids
+	 * of the component small
+	 *
+	 * Eg: On a 8-thread big-core with two SMT4 small cores, the
+	 * threads of the two component small cores will be
+	 * {0, 2, 4, 6} and {1, 3, 5, 7}.
+	 */
+	for (i = 0; i < nr_groups; i++) {
+		int group_start = i * threads_per_group;
+
+		for (j = 0; j < threads_per_group - 1; j++) {
+			int cur_idx = group_start + j;
+
+			cur_cpu = tg->thread_list[cur_idx];
+			next_cpu = tg->thread_list[cur_idx + 1];
+			if (next_cpu != cur_cpu + nr_groups)
+				return false;
+		}
+	}
+
+	return true;
+}
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *                  cpu_possible_mask
@@ -560,6 +610,7 @@ void __init smp_setup_cpu_maps(void)
 	struct device_node *dn;
 	int cpu = 0;
 	int nthreads = 1;
+	bool has_interleaved_big_core = true;
 
 	DBG("smp_setup_cpu_maps()\n");
 
@@ -613,6 +664,12 @@ void __init smp_setup_cpu_maps(void)
 		if (parse_thread_groups(dn, &tg) ||
 		    tg.nr_groups < 1 || tg.property != 1) {
 			has_big_cores = false;
+			has_interleaved_big_core = false;
+		}
+
+		if (has_interleaved_big_core) {
+			has_interleaved_big_core =
+				check_interleaved_big_core(&tg);
 		}
 
 		if (cpu >= nr_cpu_ids) {
@@ -669,7 +726,15 @@ void __init smp_setup_cpu_maps(void)
 	vdso_data->processorCount = num_present_cpus();
 #endif /* CONFIG_PPC64 */
 
-        /* Initialize CPU <=> thread mapping/
+	if (has_interleaved_big_core) {
+		int key = __builtin_ctzl(CPU_FTR_ASYM_SMT);
+
+		cur_cpu_spec->cpu_features |= CPU_FTR_ASYM_SMT;
+		static_branch_enable(&cpu_feature_keys[key]);
+		pr_info("Detected interleaved big-cores\n");
+	}
+
+	/* Initialize CPU <=> thread mapping/
 	 *
 	 * WARNING: We assume that the number of threads is the same for
 	 * every CPU in the system. If that is not the case, then some code
-- 
1.9.4

^ permalink raw reply related

* Re: [PATCHv3 2/4] drivers/base: utilize device tree info to shutdown devices
From: Andy Shevchenko @ 2018-07-03 10:58 UTC (permalink / raw)
  To: Pingfan Liu, Pavel Tatashin
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Rafael J . Wysocki,
	Grygorii Strashko, Christoph Hellwig, Bjorn Helgaas, Dave Young,
	linux-pci, open list:LINUX FOR POWERPC PA SEMI PWRFICIENT
In-Reply-To: <1530600642-25090-3-git-send-email-kernelfans@gmail.com>

I think Pavel would be interested to see this as well (he is doing
some parallel device shutdown stuff)

On Tue, Jul 3, 2018 at 9:50 AM, Pingfan Liu <kernelfans@gmail.com> wrote:
> commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> places an assumption of supplier<-consumer order on the process of probe.
> But it turns out to break down the parent <- child order in some scene.
> E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> have been probed. Then comes the bridge's module, which enables extra
> feature(such as hotplug) on this bridge. This will break the
> parent<-children order and cause failure when "kexec -e" in some scenario.
>
> The detailed description of the scenario:
> An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> to some issue. For this case, the bridge is moved after its children in
> devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> write back buffer in flight due to the former shutdown of the bridge which
> clears the BusMaster bit.
>
> It is a little hard to impose both "parent<-child" and "supplier<-consumer"
> order on devices_kset. Take the following scene:
> step0: before a consumer's probing, (note child_a is supplier of consumer_a)
>   [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
>                                          ^^^^^^^^^^ affected range ^^^^^^^^^^
> step1: when probing, moving consumer-X after supplier-X
>   [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
> step2: the children of consumer-X should be re-ordered to maintain the seq
>   [... consumer_a, ..., consumer_z, ....] supplier-X  [consumer-X, child_a, ...., child_z]
> step3: the consumer_a should be re-ordered to maintain the seq
>   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]
>
> It requires two nested recursion to drain out all out-of-order item in
> "affected range". To avoid such complicated code, this patch suggests
> to utilize the info in device tree, instead of using the order of
> devices_kset during shutdown. It iterates the device tree, and firstly
> shutdown a device's children and consumers. After this patch, the buggy
> commit is hollow and left to clean.
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: linux-pci@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> ---
>  drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/device.h |  1 +
>  2 files changed, 44 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index a48868f..684b994 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
>         INIT_LIST_HEAD(&dev->links.consumers);
>         INIT_LIST_HEAD(&dev->links.suppliers);
>         dev->links.status = DL_DEV_NO_DRIVER;
> +       dev->shutdown = false;
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>
> @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
>          * lock is to be held
>          */
>         parent = get_device(dev->parent);
> -       get_device(dev);
>         /*
>          * Make sure the device is off the kset list, in the
>          * event that dev->*->shutdown() doesn't remove it.
> @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
>                         dev_info(dev, "shutdown\n");
>                 dev->driver->shutdown(dev);
>         }
> -
> +       dev->shutdown = true;
>         device_unlock(dev);
>         if (parent)
>                 device_unlock(parent);
>
> -       put_device(dev);
>         put_device(parent);
>         spin_lock(&devices_kset->list_lock);
>  }
>
> +/* shutdown dev's children and consumer firstly, then itself */
> +static int device_for_each_child_shutdown(struct device *dev)
> +{
> +       struct klist_iter i;
> +       struct device *child;
> +       struct device_link *link;
> +
> +       /* already shutdown, then skip this sub tree */
> +       if (dev->shutdown)
> +               return 0;
> +
> +       if (!dev->p)
> +               goto check_consumers;
> +
> +       /* there is breakage of lock in __device_shutdown(), and the redundant
> +        * ref++ on srcu protected consumer is harmless since shutdown is not
> +        * hot path.
> +        */
> +       get_device(dev);
> +
> +       klist_iter_init(&dev->p->klist_children, &i);
> +       while ((child = next_device(&i)))
> +               device_for_each_child_shutdown(child);
> +       klist_iter_exit(&i);
> +
> +check_consumers:
> +       list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> +               if (!link->consumer->shutdown)
> +                       device_for_each_child_shutdown(link->consumer);
> +       }
> +
> +       __device_shutdown(dev);
> +       put_device(dev);
> +       return 0;
> +}
> +
>  /**
>   * device_shutdown - call ->shutdown() on each device to shutdown.
>   */
>  void device_shutdown(void)
>  {
>         struct device *dev;
> +       int idx;
>
> +       idx = device_links_read_lock();
>         spin_lock(&devices_kset->list_lock);
>         /*
>          * Walk the devices list backward, shutting down each in turn.
> @@ -2866,11 +2903,12 @@ void device_shutdown(void)
>          * devices offline, even as the system is shutting down.
>          */
>         while (!list_empty(&devices_kset->list)) {
> -               dev = list_entry(devices_kset->list.prev, struct device,
> +               dev = list_entry(devices_kset->list.next, struct device,
>                                 kobj.entry);
> -               __device_shutdown(dev);
> +               device_for_each_child_shutdown(dev);
>         }
>         spin_unlock(&devices_kset->list_lock);
> +       device_links_read_unlock(idx);
>  }
>
>  /*
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 055a69d..8a0f784 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1003,6 +1003,7 @@ struct device {
>         bool                    offline:1;
>         bool                    of_node_reused:1;
>         bool                    dma_32bit_limit:1;
> +       bool                    shutdown:1; /* one direction: false->true */
>  };
>
>  static inline struct device *kobj_to_dev(struct kobject *kobj)
> --
> 2.7.4
>



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.
From: Michal Suchánek @ 2018-07-03 10:37 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Mahesh J Salgaonkar, linuxppc-dev, Laurent Dufour,
	Michal Suchanek, Aneesh Kumar K.V
In-Reply-To: <20180703080814.5a57f52b@roar.ozlabs.ibm.com>

On Tue, 3 Jul 2018 08:08:14 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Mon, 02 Jul 2018 11:17:06 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
> > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > 
> > On pseries, as of today system crashes if we get a machine check
> > exceptions due to SLB errors. These are soft errors and can be
> > fixed by flushing the SLBs so the kernel can continue to function
> > instead of system crash. We do this in real mode before turning on
> > MMU. Otherwise we would run into nested machine checks. This patch
> > now fetches the rtas error log in real mode and flushes the SLBs on
> > SLB errors.
> > 
> > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
> >  arch/powerpc/include/asm/machdep.h            |    1 
> >  arch/powerpc/kernel/exceptions-64s.S          |   42
> > +++++++++++++++++++++ arch/powerpc/kernel/mce.c
> > |   16 +++++++- arch/powerpc/mm/slb.c                         |
> > 6 +++ arch/powerpc/platforms/powernv/opal.c         |    1 
> >  arch/powerpc/platforms/pseries/pseries.h      |    1 
> >  arch/powerpc/platforms/pseries/ras.c          |   51
> > +++++++++++++++++++++++++
> > arch/powerpc/platforms/pseries/setup.c        |    1 9 files
> > changed, 116 insertions(+), 4 deletions(-) 
> 
> 
> > +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> > +BEGIN_FTR_SECTION
> > +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > +	mr	r10,r1			/* Save r1 */
> > +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency
> > stack */
> > +	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack
> > frame		*/
> > +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
> > +	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
> > +	EXCEPTION_PROLOG_COMMON_1()
> > +	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> > +	EXCEPTION_PROLOG_COMMON_3(0x200)
> > +	addi	r3,r1,STACK_FRAME_OVERHEAD
> > +	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call
> > ABI */  
> 
> Is there any reason you can't use the existing
> machine_check_powernv_early code to do all this?

Code sharing is nice but if we envision this going to stable kernels
butchering the existing handler is going to be a nightmare. The code is
quite a bit different between kernel versions.

This code as is requires the bit that introduces
EXCEPTION_PROLOG_COMMON_1 and then should work on Linux 3.14+

Thanks

Michal

^ permalink raw reply

* Re: [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue.
From: Mahesh Jagannath Salgaonkar @ 2018-07-03 10:32 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, stable, Aneesh Kumar K.V, Laurent Dufour,
	Michal Suchanek
In-Reply-To: <20180703132524.36500304@roar.ozlabs.ibm.com>

On 07/03/2018 08:55 AM, Nicholas Piggin wrote:
> On Mon, 02 Jul 2018 11:16:29 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> rtas_log_buf is a buffer to hold RTAS event data that are communicated
>> to kernel by hypervisor. This buffer is then used to pass RTAS event
>> data to user through proc fs. This buffer is allocated from vmalloc
>> (non-linear mapping) area.
>>
>> On Machine check interrupt, register r3 points to RTAS extended event
>> log passed by hypervisor that contains the MCE event. The pseries
>> machine check handler then logs this error into rtas_log_buf. The
>> rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
>> page fault (vector 0x300) while accessing it. Since machine check
>> interrupt handler runs in NMI context we can not afford to take any
>> page fault. Page faults are not honored in NMI context and causes
>> kernel panic. Apart from that, as Nick pointed out, pSeries_log_error()
>> also takes a spin_lock while logging error which is not safe in NMI
>> context. It may endup in deadlock if we get another MCE before releasing
>> the lock. Fix this by deferring the logging of rtas error to irq work queue.
>>
>> Current implementation uses two different buffers to hold rtas error log
>> depending on whether extended log is provided or not. This makes bit
>> difficult to identify which buffer has valid data that needs to logged
>> later in irq work. Simplify this using single buffer, one per paca, and
>> copy rtas log to it irrespective of whether extended log is provided or
>> not. Allocate this buffer below RMA region so that it can be accessed
>> in real mode mce handler.
>>
>> Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> I think this looks reasonable. It doesn't fix that commit so much as
> fixes the problem that's apparent after it's applied. I don't know if
> we should backport this to a wider set of stable kernels? Aside from
> that,

Since the commit b96672dd840f went into 4.13, I think it is good if we
backport this to V4.14 and above stable kernels.

Thanks,
-Mahesh

> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
> Thanks,
> Nick
> 
>> ---
>>  arch/powerpc/include/asm/paca.h        |    3 ++
>>  arch/powerpc/platforms/pseries/ras.c   |   47 ++++++++++++++++++++++----------
>>  arch/powerpc/platforms/pseries/setup.c |   16 +++++++++++
>>  3 files changed, 51 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>> index 3f109a3e3edb..b441fef53077 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -251,6 +251,9 @@ struct paca_struct {
>>  	void *rfi_flush_fallback_area;
>>  	u64 l1d_flush_size;
>>  #endif
>> +#ifdef CONFIG_PPC_PSERIES
>> +	u8 *mce_data_buf;		/* buffer to hold per cpu rtas errlog */
>> +#endif /* CONFIG_PPC_PSERIES */
>>  } ____cacheline_aligned;
>>  
>>  extern void copy_mm_to_paca(struct mm_struct *mm);
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index ef104144d4bc..14a46b07ab2f 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -22,6 +22,7 @@
>>  #include <linux/of.h>
>>  #include <linux/fs.h>
>>  #include <linux/reboot.h>
>> +#include <linux/irq_work.h>
>>  
>>  #include <asm/machdep.h>
>>  #include <asm/rtas.h>
>> @@ -32,11 +33,13 @@
>>  static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX];
>>  static DEFINE_SPINLOCK(ras_log_buf_lock);
>>  
>> -static char global_mce_data_buf[RTAS_ERROR_LOG_MAX];
>> -static DEFINE_PER_CPU(__u64, mce_data_buf);
>> -
>>  static int ras_check_exception_token;
>>  
>> +static void mce_process_errlog_event(struct irq_work *work);
>> +static struct irq_work mce_errlog_process_work = {
>> +	.func = mce_process_errlog_event,
>> +};
>> +
>>  #define EPOW_SENSOR_TOKEN	9
>>  #define EPOW_SENSOR_INDEX	0
>>  
>> @@ -330,16 +333,20 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
>>  	((((A) >= 0x7000) && ((A) < 0x7ff0)) || \
>>  	(((A) >= rtas.base) && ((A) < (rtas.base + rtas.size - 16))))
>>  
>> +static inline struct rtas_error_log *fwnmi_get_errlog(void)
>> +{
>> +	return (struct rtas_error_log *)local_paca->mce_data_buf;
>> +}
>> +
>>  /*
>>   * Get the error information for errors coming through the
>>   * FWNMI vectors.  The pt_regs' r3 will be updated to reflect
>>   * the actual r3 if possible, and a ptr to the error log entry
>>   * will be returned if found.
>>   *
>> - * If the RTAS error is not of the extended type, then we put it in a per
>> - * cpu 64bit buffer. If it is the extended type we use global_mce_data_buf.
>> + * Use one buffer mce_data_buf per cpu to store RTAS error.
>>   *
>> - * The global_mce_data_buf does not have any locks or protection around it,
>> + * The mce_data_buf does not have any locks or protection around it,
>>   * if a second machine check comes in, or a system reset is done
>>   * before we have logged the error, then we will get corruption in the
>>   * error log.  This is preferable over holding off on calling
>> @@ -349,7 +356,7 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
>>  static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>>  {
>>  	unsigned long *savep;
>> -	struct rtas_error_log *h, *errhdr = NULL;
>> +	struct rtas_error_log *h;
>>  
>>  	/* Mask top two bits */
>>  	regs->gpr[3] &= ~(0x3UL << 62);
>> @@ -362,22 +369,20 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>>  	savep = __va(regs->gpr[3]);
>>  	regs->gpr[3] = savep[0];	/* restore original r3 */
>>  
>> -	/* If it isn't an extended log we can use the per cpu 64bit buffer */
>>  	h = (struct rtas_error_log *)&savep[1];
>> +	/* Use the per cpu buffer from paca to store rtas error log */
>> +	memset(local_paca->mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
>>  	if (!rtas_error_extended(h)) {
>> -		memcpy(this_cpu_ptr(&mce_data_buf), h, sizeof(__u64));
>> -		errhdr = (struct rtas_error_log *)this_cpu_ptr(&mce_data_buf);
>> +		memcpy(local_paca->mce_data_buf, h, sizeof(__u64));
>>  	} else {
>>  		int len, error_log_length;
>>  
>>  		error_log_length = 8 + rtas_error_extended_log_length(h);
>>  		len = min_t(int, error_log_length, RTAS_ERROR_LOG_MAX);
>> -		memset(global_mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
>> -		memcpy(global_mce_data_buf, h, len);
>> -		errhdr = (struct rtas_error_log *)global_mce_data_buf;
>> +		memcpy(local_paca->mce_data_buf, h, len);
>>  	}
>>  
>> -	return errhdr;
>> +	return (struct rtas_error_log *)local_paca->mce_data_buf;
>>  }
>>  
>>  /* Call this when done with the data returned by FWNMI_get_errinfo.
>> @@ -422,6 +427,17 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>  	return 0; /* need to perform reset */
>>  }
>>  
>> +/*
>> + * Process MCE rtas errlog event.
>> + */
>> +static void mce_process_errlog_event(struct irq_work *work)
>> +{
>> +	struct rtas_error_log *err;
>> +
>> +	err = fwnmi_get_errlog();
>> +	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
>> +}
>> +
>>  /*
>>   * See if we can recover from a machine check exception.
>>   * This is only called on power4 (or above) and only via
>> @@ -466,7 +482,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
>>  		recovered = 1;
>>  	}
>>  
>> -	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
>> +	/* Queue irq work to log this rtas event later. */
>> +	irq_work_queue(&mce_errlog_process_work);
>>  
>>  	return recovered;
>>  }
>> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>> index fdb32e056ef4..60a067a6e743 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -41,6 +41,7 @@
>>  #include <linux/root_dev.h>
>>  #include <linux/of.h>
>>  #include <linux/of_pci.h>
>> +#include <linux/memblock.h>
>>  
>>  #include <asm/mmu.h>
>>  #include <asm/processor.h>
>> @@ -101,6 +102,9 @@ static void pSeries_show_cpuinfo(struct seq_file *m)
>>  static void __init fwnmi_init(void)
>>  {
>>  	unsigned long system_reset_addr, machine_check_addr;
>> +	u8 *mce_data_buf;
>> +	unsigned int i;
>> +	int nr_cpus = num_possible_cpus();
>>  
>>  	int ibm_nmi_register = rtas_token("ibm,nmi-register");
>>  	if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
>> @@ -114,6 +118,18 @@ static void __init fwnmi_init(void)
>>  	if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
>>  				machine_check_addr))
>>  		fwnmi_active = 1;
>> +
>> +	/*
>> +	 * Allocate a chunk for per cpu buffer to hold rtas errorlog.
>> +	 * It will be used in real mode mce handler, hence it needs to be
>> +	 * below RMA.
>> +	 */
>> +	mce_data_buf = __va(memblock_alloc_base(RTAS_ERROR_LOG_MAX * nr_cpus,
>> +					RTAS_ERROR_LOG_MAX, ppc64_rma_size));
>> +	for_each_possible_cpu(i) {
>> +		paca_ptrs[i]->mce_data_buf = mce_data_buf +
>> +						(RTAS_ERROR_LOG_MAX * i);
>> +	}
>>  }
>>  
>>  static void pseries_8259_cascade(struct irq_desc *desc)
>>
> 

^ permalink raw reply

* Re: [PATCH] QE GPIO: Add qe_gpio_set_multiple
From: Joakim Tjernlund @ 2018-07-03 10:30 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au,
	qiang.zhao@nxp.com, york.sun@nxp.com
In-Reply-To: <87a7rh7ji2.fsf@concordia.ellerman.id.au>

T24gVHVlLCAyMDE4LTA2LTI2IGF0IDIzOjQxICsxMDAwLCBNaWNoYWVsIEVsbGVybWFuIHdyb3Rl
Og0KPiANCj4gSm9ha2ltIFRqZXJubHVuZCA8Sm9ha2ltLlRqZXJubHVuZEBpbmZpbmVyYS5jb20+
IHdyaXRlczoNCj4gPiBPbiBUaHUsIDIwMTgtMDYtMjEgYXQgMDI6MzggKzAwMDAsIFFpYW5nIFpo
YW8gd3JvdGU6DQo+ID4gPiBPbiAwNi8xOS8yMDE4IDA5OjIyIEFNLCBKb2FraW0gVGplcm5sdW5k
IHdyb3RlOg0KPiA+ID4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gPiA+IEZyb206IExp
bnV4cHBjLWRldiBbbWFpbHRvOmxpbnV4cHBjLWRldi1ib3VuY2VzK3FpYW5nLnpoYW89bnhwLmNv
bUBsaXN0cy5vemxhYnMub3JnXSBPbiBCZWhhbGYgT2YgSm9ha2ltIFRqZXJubHVuZA0KPiA+ID4g
U2VudDogMjAxOOW5tDbmnIgyMOaXpSAwOjIyDQo+ID4gPiBUbzogWW9yayBTdW4gPHlvcmsuc3Vu
QG54cC5jb20+OyBsaW51eHBwYy1kZXYgPGxpbnV4cHBjLWRldkBsaXN0cy5vemxhYnMub3JnPg0K
PiA+ID4gU3ViamVjdDogW1BBVENIXSBRRSBHUElPOiBBZGQgcWVfZ3Bpb19zZXRfbXVsdGlwbGUN
Cj4gPiA+IA0KPiA+ID4gVGhpcyBjb3VzaW4gdG8gZ3Bpby1tcGM4eHh4IHdhcyBsYWNraW5nIGEg
bXVsdGlwbGUgcGlucyBtZXRob2QsIGFkZCBvbmUuDQo+ID4gPiANCj4gPiA+IFNpZ25lZC1vZmYt
Ynk6IEpvYWtpbSBUamVybmx1bmQgPGpvYWtpbS50amVybmx1bmRAaW5maW5lcmEuY29tPg0KPiA+
ID4gLS0tDQo+ID4gPiAgZHJpdmVycy9zb2MvZnNsL3FlL2dwaW8uYyB8IDI4ICsrKysrKysrKysr
KysrKysrKysrKysrKysrKysNCj4gPiA+ICAxIGZpbGUgY2hhbmdlZCwgMjggaW5zZXJ0aW9ucygr
KQ0KPiANCj4gLi4uDQo+ID4gPiAgc3RhdGljIGludCBxZV9ncGlvX2Rpcl9pbihzdHJ1Y3QgZ3Bp
b19jaGlwICpnYywgdW5zaWduZWQgaW50IGdwaW8pICB7DQo+ID4gPiAgICAgICAgIHN0cnVjdCBv
Zl9tbV9ncGlvX2NoaXAgKm1tX2djID0gdG9fb2ZfbW1fZ3Bpb19jaGlwKGdjKTsgQEAgLTI5OCw2
ICszMjUsNyBAQCBzdGF0aWMgaW50IF9faW5pdCBxZV9hZGRfZ3Bpb2NoaXBzKHZvaWQpDQo+ID4g
PiAgICAgICAgICAgICAgICAgZ2MtPmRpcmVjdGlvbl9vdXRwdXQgPSBxZV9ncGlvX2Rpcl9vdXQ7
DQo+ID4gPiAgICAgICAgICAgICAgICAgZ2MtPmdldCA9IHFlX2dwaW9fZ2V0Ow0KPiA+ID4gICAg
ICAgICAgICAgICAgIGdjLT5zZXQgPSBxZV9ncGlvX3NldDsNCj4gPiA+ICsgICAgICAgICAgICAg
ICBnYy0+c2V0X211bHRpcGxlID0gcWVfZ3Bpb19zZXRfbXVsdGlwbGU7DQo+ID4gPiANCj4gPiA+
ICAgICAgICAgICAgICAgICByZXQgPSBvZl9tbV9ncGlvY2hpcF9hZGRfZGF0YShucCwgbW1fZ2Ms
IHFlX2djKTsNCj4gPiA+ICAgICAgICAgICAgICAgICBpZiAocmV0KQ0KPiA+ID4gDQo+ID4gPiBS
ZXZpZXdlZC1ieTogUWlhbmcgWmhhbyA8cWlhbmcuemhhb0BueHAuY29tPg0KPiA+ID4gDQo+ID4g
DQo+ID4gV2hvIHBpY2tzIHVwIHRoaXMgcGF0Y2ggPyBJcyBpdCBxdWV1ZWQgc29tZXdoZXJlIGFs
cmVhZHk/DQo+IA0KPiBOb3QgbWUuDQoNCllvcms/IFlvdSBzZWVtIHRvIGJlIHRoZSBvbmx5IG9u
ZSBsZWZ0Lg0KDQogSm9ja2U=

^ permalink raw reply

* Re: CONFIG_LD_DEAD_CODE_DATA_ELIMINATION: Section mismatch in reference from the variable via_pmu_driver to the function .init.text:pmu_init()
From: Christophe LEROY @ 2018-07-03 10:07 UTC (permalink / raw)
  To: Mathieu Malaterre, Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <CA+7wUsxznwK8fztA+2TAWJL9nEHek_tyUcDm9HCvy-4MdvpAWQ@mail.gmail.com>



Le 03/07/2018 à 11:40, Mathieu Malaterre a écrit :
> Hi Nick,
> 
> Would you consider this a bug:

Looks more like a feature ...

In /drivers/macintosh/adb.c you have, and it's the only place 
via_pmu_driver is used.

#if defined(CONFIG_ADB_PMU) || defined(CONFIG_ADB_PMU68K)
	&via_pmu_driver,
#endif

Is one of those defined in your .config ? If not then via_pmu_driver 
gets eliminated hence nothing to warn on.


Christophe


> 
> $ touch drivers/macintosh/via-pmu.c
> $ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n make ARCH=powerpc
> CROSS_COMPILE=powerpc-linux-gnu-
> ...
>    LD      vmlinux.o
>    MODPOST vmlinux.o
> WARNING: vmlinux.o(.data+0x216018): Section mismatch in reference from
> the variable via_pmu_driver to the function .init.text:pmu_init()
> The variable via_pmu_driver references
> the function __init pmu_init()
> If the reference is valid then annotate the
> variable with __init* or __refdata (see linux/init.h) or name the variable:
> *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
> 
> While:
> 
> $ touch drivers/macintosh/via-pmu.c
> $ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y make ARCH=powerpc
> CROSS_COMPILE=powerpc-linux-gnu-
> ...
>    AR      init/built-in.a
>    AR      built-in.a
>    LD      vmlinux.o
>    MODPOST vmlinux.o
>    KSYM    .tmp_kallsyms1.o
>    KSYM    .tmp_kallsyms2.o
>    LD      vmlinux
>    SORTEX  vmlinux
>    SYSMAP  System.map
> ...
> 
> Thanks for comment
> 

^ permalink raw reply

* Re: [PATCH v3 2/2] powernv/cpuidle: Use parsed device tree values for cpuidle_init
From: Gautham R Shenoy @ 2018-07-03  9:47 UTC (permalink / raw)
  To: Akshay Adiga
  Cc: linux-kernel, linuxppc-dev, linux-pm, rjw, stewart, benh, svaidy,
	ego, npiggin, mpe
In-Reply-To: <1530609656-13301-3-git-send-email-akshay.adiga@linux.vnet.ibm.com>

Hi Akshay,

On Tue, Jul 03, 2018 at 02:50:56PM +0530, Akshay Adiga wrote:
> Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
> cpuidle driver. Use properties from pnv_idle_states structure for powernv
> cpuidle_init.
> 
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> ---

[..snip..]

> 
>  		/*
>  		 * For nap and fastsleep, use default target_residency
>  		 * values if f/w does not expose it.
>  		 */

This comment can also go, since we are no longer hard-coding the
residency values in the kernel.

Otherwise looks good to me.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>


--
Thanks and Regards
gautham.

^ permalink raw reply

* Re: [PATCH v3 1/2] powernv/cpuidle: Parse dt idle properties into global structure
From: Gautham R Shenoy @ 2018-07-03  9:46 UTC (permalink / raw)
  To: Akshay Adiga
  Cc: linux-kernel, linuxppc-dev, linux-pm, rjw, stewart, benh, svaidy,
	ego, npiggin, mpe
In-Reply-To: <1530609656-13301-2-git-send-email-akshay.adiga@linux.vnet.ibm.com>

Hi Akshay,

On Tue, Jul 03, 2018 at 02:50:55PM +0530, Akshay Adiga wrote:
> Device-tree parsing happens twice, once while deciding idle state to be
> used for hotplug and once during cpuidle init. Hence, parsing the device
> tree and caching it will reduce code duplication. Parsing code has been
> moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
> to the properties in the device tree the number of available states is
> also required.
> 
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Looks good.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

^ permalink raw reply

* CONFIG_LD_DEAD_CODE_DATA_ELIMINATION: Section mismatch in reference from the variable via_pmu_driver to the function .init.text:pmu_init()
From: Mathieu Malaterre @ 2018-07-03  9:40 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev

Hi Nick,

Would you consider this a bug:

$ touch drivers/macintosh/via-pmu.c
$ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n make ARCH=powerpc
CROSS_COMPILE=powerpc-linux-gnu-
...
  LD      vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.data+0x216018): Section mismatch in reference from
the variable via_pmu_driver to the function .init.text:pmu_init()
The variable via_pmu_driver references
the function __init pmu_init()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console

While:

$ touch drivers/macintosh/via-pmu.c
$ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y make ARCH=powerpc
CROSS_COMPILE=powerpc-linux-gnu-
...
  AR      init/built-in.a
  AR      built-in.a
  LD      vmlinux.o
  MODPOST vmlinux.o
  KSYM    .tmp_kallsyms1.o
  KSYM    .tmp_kallsyms2.o
  LD      vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
...

Thanks for comment

^ permalink raw reply

* Re: [PATCHv3 2/4] drivers/base: utilize device tree info to shutdown devices
From: Pingfan Liu @ 2018-07-03  9:26 UTC (permalink / raw)
  To: lukas
  Cc: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Grygorii Strashko, Christoph Hellwig, Bjorn Helgaas, Dave Young,
	linux-pci, linuxppc-dev
In-Reply-To: <20180703075106.GA21136@wunner.de>

On Tue, Jul 3, 2018 at 3:51 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Tue, Jul 03, 2018 at 02:50:40PM +0800, Pingfan Liu wrote:
> > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > places an assumption of supplier<-consumer order on the process of probe.
> > But it turns out to break down the parent <- child order in some scene.
> > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > have been probed. Then comes the bridge's module, which enables extra
> > feature(such as hotplug) on this bridge. This will break the
> > parent<-children order and cause failure when "kexec -e" in some scenario.
> >
> > The detailed description of the scenario:
> > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > to some issue. For this case, the bridge is moved after its children in
> > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > write back buffer in flight due to the former shutdown of the bridge which
> > clears the BusMaster bit.
>
> If you revert commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services
> during shutdown"), does the issue go away?

Yes, it is gone.

^ permalink raw reply

* [PATCH v3 2/2] powernv/cpuidle: Use parsed device tree values for cpuidle_init
From: Akshay Adiga @ 2018-07-03  9:20 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-pm
  Cc: rjw, stewart, benh, svaidy, ego, npiggin, mpe, Akshay Adiga
In-Reply-To: <1530609656-13301-1-git-send-email-akshay.adiga@linux.vnet.ibm.com>

Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/cpuidle.h |   2 +
 drivers/cpuidle/cpuidle-powernv.c  | 154 +++++------------------------
 2 files changed, 28 insertions(+), 128 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 574b0ce1d671..43e5f31fe64d 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -90,6 +90,8 @@ struct pnv_idle_states_t {
 	bool valid;
 };
 
+extern struct pnv_idle_states_t *pnv_idle_states;
+extern int nr_pnv_idle_states;
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index d29e4f041efe..208d57c77305 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -242,6 +242,7 @@ static inline void add_powernv_state(int index, const char *name,
 	powernv_states[index].target_residency = target_residency;
 	powernv_states[index].exit_latency = exit_latency;
 	powernv_states[index].enter = idle_fn;
+	/* For power8 and below psscr_* will be 0 */
 	stop_psscr_table[index].val = psscr_val;
 	stop_psscr_table[index].mask = psscr_mask;
 }
@@ -263,186 +264,84 @@ static inline int validate_dt_prop_sizes(const char *prop1, int prop1_len,
 extern u32 pnv_get_supported_cpuidle_states(void);
 static int powernv_add_idle_states(void)
 {
-	struct device_node *power_mgt;
 	int nr_idle_states = 1; /* Snooze */
-	int dt_idle_states, count;
-	u32 latency_ns[CPUIDLE_STATE_MAX];
-	u32 residency_ns[CPUIDLE_STATE_MAX];
-	u32 flags[CPUIDLE_STATE_MAX];
-	u64 psscr_val[CPUIDLE_STATE_MAX];
-	u64 psscr_mask[CPUIDLE_STATE_MAX];
-	const char *names[CPUIDLE_STATE_MAX];
+	int dt_idle_states;
 	u32 has_stop_states = 0;
-	int i, rc;
+	int i;
 	u32 supported_flags = pnv_get_supported_cpuidle_states();
 
 
 	/* Currently we have snooze statically defined */
-
-	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
-	if (!power_mgt) {
-		pr_warn("opal: PowerMgmt Node not found\n");
+	if (nr_pnv_idle_states <= 0) {
+		pr_warn("cpuidle-powernv : Only Snooze is available\n");
 		goto out;
 	}
 
-	/* Read values of any property to determine the num of idle states */
-	dt_idle_states = of_property_count_u32_elems(power_mgt, "ibm,cpu-idle-state-flags");
-	if (dt_idle_states < 0) {
-		pr_warn("cpuidle-powernv: no idle states found in the DT\n");
-		goto out;
-	}
-
-	count = of_property_count_u32_elems(power_mgt,
-					    "ibm,cpu-idle-state-latencies-ns");
-
-	if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags", dt_idle_states,
-				   "ibm,cpu-idle-state-latencies-ns",
-				   count) != 0)
-		goto out;
-
-	count = of_property_count_strings(power_mgt,
-					  "ibm,cpu-idle-state-names");
-	if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags", dt_idle_states,
-				   "ibm,cpu-idle-state-names",
-				   count) != 0)
-		goto out;
+	/* TODO: Count only states which are eligible for cpuidle */
+	dt_idle_states = nr_pnv_idle_states;
 
 	/*
 	 * Since snooze is used as first idle state, max idle states allowed is
 	 * CPUIDLE_STATE_MAX -1
 	 */
-	if (dt_idle_states > CPUIDLE_STATE_MAX - 1) {
+	if (nr_pnv_idle_states > CPUIDLE_STATE_MAX - 1) {
 		pr_warn("cpuidle-powernv: discovered idle states more than allowed");
 		dt_idle_states = CPUIDLE_STATE_MAX - 1;
 	}
 
-	if (of_property_read_u32_array(power_mgt,
-			"ibm,cpu-idle-state-flags", flags, dt_idle_states)) {
-		pr_warn("cpuidle-powernv : missing ibm,cpu-idle-state-flags in DT\n");
-		goto out;
-	}
-
-	if (of_property_read_u32_array(power_mgt,
-		"ibm,cpu-idle-state-latencies-ns", latency_ns,
-		dt_idle_states)) {
-		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-latencies-ns in DT\n");
-		goto out;
-	}
-	if (of_property_read_string_array(power_mgt,
-		"ibm,cpu-idle-state-names", names, dt_idle_states) < 0) {
-		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
-		goto out;
-	}
-
 	/*
 	 * If the idle states use stop instruction, probe for psscr values
 	 * and psscr mask which are necessary to specify required stop level.
 	 */
-	has_stop_states = (flags[0] &
+	has_stop_states = (pnv_idle_states[0].flags &
 			   (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP));
-	if (has_stop_states) {
-		count = of_property_count_u64_elems(power_mgt,
-						    "ibm,cpu-idle-state-psscr");
-		if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags",
-					   dt_idle_states,
-					   "ibm,cpu-idle-state-psscr",
-					   count) != 0)
-			goto out;
-
-		count = of_property_count_u64_elems(power_mgt,
-						    "ibm,cpu-idle-state-psscr-mask");
-		if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags",
-					   dt_idle_states,
-					   "ibm,cpu-idle-state-psscr-mask",
-					   count) != 0)
-			goto out;
-
-		if (of_property_read_u64_array(power_mgt,
-		    "ibm,cpu-idle-state-psscr", psscr_val, dt_idle_states)) {
-			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in DT\n");
-			goto out;
-		}
-
-		if (of_property_read_u64_array(power_mgt,
-					       "ibm,cpu-idle-state-psscr-mask",
-						psscr_mask, dt_idle_states)) {
-			pr_warn("cpuidle-powernv:Missing ibm,cpu-idle-state-psscr-mask in DT\n");
-			goto out;
-		}
-	}
-
-	count = of_property_count_u32_elems(power_mgt,
-					    "ibm,cpu-idle-state-residency-ns");
-
-	if (count < 0) {
-		rc = count;
-	} else if (validate_dt_prop_sizes("ibm,cpu-idle-state-flags",
-					  dt_idle_states,
-					  "ibm,cpu-idle-state-residency-ns",
-					  count) != 0) {
-		goto out;
-	} else {
-		rc = of_property_read_u32_array(power_mgt,
-						"ibm,cpu-idle-state-residency-ns",
-						residency_ns, dt_idle_states);
-	}
 
 	for (i = 0; i < dt_idle_states; i++) {
 		unsigned int exit_latency, target_residency;
 		bool stops_timebase = false;
+		struct pnv_idle_states_t *state = &pnv_idle_states[i];
 
 		/*
 		 * Skip the platform idle state whose flag isn't in
 		 * the supported_cpuidle_states flag mask.
 		 */
-		if ((flags[i] & supported_flags) != flags[i])
+		if ((state->flags & supported_flags) != state->flags)
 			continue;
 		/*
 		 * If an idle state has exit latency beyond
 		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
 		 * in cpu-idle.
 		 */
-		if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
+		if (state->latency_ns > POWERNV_THRESHOLD_LATENCY_NS)
 			continue;
 		/*
 		 * Firmware passes residency and latency values in ns.
 		 * cpuidle expects it in us.
 		 */
-		exit_latency = DIV_ROUND_UP(latency_ns[i], 1000);
-		if (!rc)
-			target_residency = DIV_ROUND_UP(residency_ns[i], 1000);
-		else
-			target_residency = 0;
-
-		if (has_stop_states) {
-			int err = validate_psscr_val_mask(&psscr_val[i],
-							  &psscr_mask[i],
-							  flags[i]);
-			if (err) {
-				report_invalid_psscr_val(psscr_val[i], err);
+		exit_latency = DIV_ROUND_UP(state->latency_ns, 1000);
+		target_residency = DIV_ROUND_UP(state->residency_ns, 1000);
+
+		if (has_stop_states && !(state->valid))
 				continue;
-			}
-		}
 
-		if (flags[i] & OPAL_PM_TIMEBASE_STOP)
+		if (state->flags & OPAL_PM_TIMEBASE_STOP)
 			stops_timebase = true;
 
 		/*
 		 * For nap and fastsleep, use default target_residency
 		 * values if f/w does not expose it.
 		 */
-		if (flags[i] & OPAL_PM_NAP_ENABLED) {
-			if (!rc)
-				target_residency = 100;
+		if (state->flags & OPAL_PM_NAP_ENABLED) {
 			/* Add NAP state */
 			add_powernv_state(nr_idle_states, "Nap",
 					  CPUIDLE_FLAG_NONE, nap_loop,
 					  target_residency, exit_latency, 0, 0);
 		} else if (has_stop_states && !stops_timebase) {
-			add_powernv_state(nr_idle_states, names[i],
+			add_powernv_state(nr_idle_states, state->name,
 					  CPUIDLE_FLAG_NONE, stop_loop,
 					  target_residency, exit_latency,
-					  psscr_val[i], psscr_mask[i]);
+					  state->psscr_val,
+					  state->psscr_mask);
 		}
 
 		/*
@@ -450,20 +349,19 @@ static int powernv_add_idle_states(void)
 		 * within this config dependency check.
 		 */
 #ifdef CONFIG_TICK_ONESHOT
-		else if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
-			 flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
-			if (!rc)
-				target_residency = 300000;
+		else if (state->flags & OPAL_PM_SLEEP_ENABLED ||
+			 state->flags & OPAL_PM_SLEEP_ENABLED_ER1) {
 			/* Add FASTSLEEP state */
 			add_powernv_state(nr_idle_states, "FastSleep",
 					  CPUIDLE_FLAG_TIMER_STOP,
 					  fastsleep_loop,
 					  target_residency, exit_latency, 0, 0);
 		} else if (has_stop_states && stops_timebase) {
-			add_powernv_state(nr_idle_states, names[i],
+			add_powernv_state(nr_idle_states, state->name,
 					  CPUIDLE_FLAG_TIMER_STOP, stop_loop,
 					  target_residency, exit_latency,
-					  psscr_val[i], psscr_mask[i]);
+					  state->psscr_val,
+					  state->psscr_mask);
 		}
 #endif
 		else
-- 
2.18.0.rc2.85.g1fb9df7

^ permalink raw reply related

* [PATCH v3 1/2] powernv/cpuidle: Parse dt idle properties into global structure
From: Akshay Adiga @ 2018-07-03  9:20 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-pm
  Cc: rjw, stewart, benh, svaidy, ego, npiggin, mpe, Akshay Adiga
In-Reply-To: <1530609656-13301-1-git-send-email-akshay.adiga@linux.vnet.ibm.com>

Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/cpuidle.h    |  11 ++
 arch/powerpc/platforms/powernv/idle.c | 216 ++++++++++++++++----------
 2 files changed, 149 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index e210a83eb196..574b0ce1d671 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -79,6 +79,17 @@ struct stop_sprs {
 	u64 mmcra;
 };
 
+#define PNV_IDLE_NAME_LEN    16
+struct pnv_idle_states_t {
+	char name[PNV_IDLE_NAME_LEN];
+	u32 latency_ns;
+	u32 residency_ns;
+	u64 psscr_val;
+	u64 psscr_mask;
+	u32 flags;
+	bool valid;
+};
+
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 1c5d0675b43c..7cf71b3e03a1 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -36,6 +36,8 @@
 #define P9_STOP_SPR_PSSCR      855
 
 static u32 supported_cpuidle_states;
+struct pnv_idle_states_t *pnv_idle_states;
+int nr_pnv_idle_states;
 
 /*
  * The default stop state that will be used by ppc_md.power_save
@@ -622,48 +624,10 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags)
  * @dt_idle_states: Number of idle state entries
  * Returns 0 on success
  */
-static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
-					int dt_idle_states)
+static int __init pnv_power9_idle_init(void)
 {
-	u64 *psscr_val = NULL;
-	u64 *psscr_mask = NULL;
-	u32 *residency_ns = NULL;
 	u64 max_residency_ns = 0;
-	int rc = 0, i;
-
-	psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
-	psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
-	residency_ns = kcalloc(dt_idle_states, sizeof(*residency_ns),
-			       GFP_KERNEL);
-
-	if (!psscr_val || !psscr_mask || !residency_ns) {
-		rc = -1;
-		goto out;
-	}
-
-	if (of_property_read_u64_array(np,
-		"ibm,cpu-idle-state-psscr",
-		psscr_val, dt_idle_states)) {
-		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in DT\n");
-		rc = -1;
-		goto out;
-	}
-
-	if (of_property_read_u64_array(np,
-				       "ibm,cpu-idle-state-psscr-mask",
-				       psscr_mask, dt_idle_states)) {
-		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask in DT\n");
-		rc = -1;
-		goto out;
-	}
-
-	if (of_property_read_u32_array(np,
-				       "ibm,cpu-idle-state-residency-ns",
-					residency_ns, dt_idle_states)) {
-		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-residency-ns in DT\n");
-		rc = -1;
-		goto out;
-	}
+	int i;
 
 	/*
 	 * Set pnv_first_deep_stop_state, pnv_deepest_stop_psscr_{val,mask},
@@ -679,33 +643,36 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
 	 * the shallowest (OPAL_PM_STOP_INST_FAST) loss-less stop state.
 	 */
 	pnv_first_deep_stop_state = MAX_STOP_STATE;
-	for (i = 0; i < dt_idle_states; i++) {
+	for (i = 0; i < nr_pnv_idle_states; i++) {
 		int err;
-		u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
+		struct pnv_idle_states_t *state = &pnv_idle_states[i];
+		u64 psscr_rl = state->psscr_val & PSSCR_RL_MASK;
 
-		if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
-		     (pnv_first_deep_stop_state > psscr_rl))
+		if ((state->flags & OPAL_PM_LOSE_FULL_CONTEXT) &&
+		    pnv_first_deep_stop_state > psscr_rl)
 			pnv_first_deep_stop_state = psscr_rl;
 
-		err = validate_psscr_val_mask(&psscr_val[i], &psscr_mask[i],
-					      flags[i]);
+		err = validate_psscr_val_mask(&state->psscr_val,
+					      &state->psscr_mask,
+					      state->flags);
 		if (err) {
-			report_invalid_psscr_val(psscr_val[i], err);
+			state->valid = false;
+			report_invalid_psscr_val(state->psscr_val, err);
 			continue;
 		}
 
-		if (max_residency_ns < residency_ns[i]) {
-			max_residency_ns = residency_ns[i];
-			pnv_deepest_stop_psscr_val = psscr_val[i];
-			pnv_deepest_stop_psscr_mask = psscr_mask[i];
-			pnv_deepest_stop_flag = flags[i];
+		if (max_residency_ns < state->residency_ns) {
+			max_residency_ns = state->residency_ns;
+			pnv_deepest_stop_psscr_val = state->psscr_val;
+			pnv_deepest_stop_psscr_mask = state->psscr_mask;
+			pnv_deepest_stop_flag = state->flags;
 			deepest_stop_found = true;
 		}
 
 		if (!default_stop_found &&
-		    (flags[i] & OPAL_PM_STOP_INST_FAST)) {
-			pnv_default_stop_val = psscr_val[i];
-			pnv_default_stop_mask = psscr_mask[i];
+		    (state->flags & OPAL_PM_STOP_INST_FAST)) {
+			pnv_default_stop_val = state->psscr_val;
+			pnv_default_stop_mask = state->psscr_mask;
 			default_stop_found = true;
 		}
 	}
@@ -728,11 +695,8 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
 
 	pr_info("cpuidle-powernv: Requested Level (RL) value of first deep stop = 0x%llx\n",
 		pnv_first_deep_stop_state);
-out:
-	kfree(psscr_val);
-	kfree(psscr_mask);
-	kfree(residency_ns);
-	return rc;
+
+	return 0;
 }
 
 /*
@@ -740,50 +704,146 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
  */
 static void __init pnv_probe_idle_states(void)
 {
-	struct device_node *np;
-	int dt_idle_states;
-	u32 *flags = NULL;
 	int i;
 
+	if (nr_pnv_idle_states < 0) {
+		pr_warn("cpuidle-powernv: no idle states found in the DT\n");
+		return;
+	}
+
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		if (pnv_power9_idle_init())
+			return;
+	}
+
+	for (i = 0; i < nr_pnv_idle_states; i++)
+		supported_cpuidle_states |= pnv_idle_states[i].flags;
+}
+
+/*
+ * This function parses device-tree and populates all the information
+ * into pnv_idle_states structure. It also sets up nr_pnv_idle_states
+ * which is the number of cpuidle states discovered through device-tree.
+ */
+
+static int pnv_parse_cpuidle_dt(void)
+{
+	struct device_node *np;
+	int nr_idle_states, i;
+	int rc = 0;
+	u32 *temp_u32;
+	u64 *temp_u64;
+	const char **temp_string;
+
 	np = of_find_node_by_path("/ibm,opal/power-mgt");
 	if (!np) {
 		pr_warn("opal: PowerMgmt Node not found\n");
-		goto out;
+		return -ENODEV;
 	}
-	dt_idle_states = of_property_count_u32_elems(np,
-			"ibm,cpu-idle-state-flags");
-	if (dt_idle_states < 0) {
-		pr_warn("cpuidle-powernv: no idle states found in the DT\n");
+	nr_idle_states = of_property_count_u32_elems(np,
+						"ibm,cpu-idle-state-flags");
+
+	pnv_idle_states = kcalloc(nr_idle_states, sizeof(*pnv_idle_states),
+				  GFP_KERNEL);
+	temp_u32 = kcalloc(nr_idle_states, sizeof(u32),  GFP_KERNEL);
+	temp_u64 = kcalloc(nr_idle_states, sizeof(u64),  GFP_KERNEL);
+	temp_string = kcalloc(nr_idle_states, sizeof(char *),  GFP_KERNEL);
+
+	if (!(pnv_idle_states && temp_u32 && temp_u64 && temp_string)) {
+		pr_err("Could not allocate memory for dt parsing\n");
+		rc = -ENOMEM;
 		goto out;
 	}
 
-	flags = kcalloc(dt_idle_states, sizeof(*flags),  GFP_KERNEL);
-
-	if (of_property_read_u32_array(np,
-			"ibm,cpu-idle-state-flags", flags, dt_idle_states)) {
+	/* Read flags */
+	if (of_property_read_u32_array(np, "ibm,cpu-idle-state-flags",
+				       temp_u32, nr_idle_states)) {
 		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-flags in DT\n");
+		rc = -EINVAL;
 		goto out;
 	}
+	for (i = 0; i < nr_idle_states; i++)
+		pnv_idle_states[i].flags = temp_u32[i];
+
+	/* Read latencies */
+	if (of_property_read_u32_array(np, "ibm,cpu-idle-state-latencies-ns",
+				       temp_u32, nr_idle_states)) {
+		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-latencies-ns in DT\n");
+		rc = -EINVAL;
+		goto out;
+	}
+	for (i = 0; i < nr_idle_states; i++)
+		pnv_idle_states[i].latency_ns = temp_u32[i];
+
+	/* Read residencies */
+	if (of_property_read_u32_array(np, "ibm,cpu-idle-state-residency-ns",
+				       temp_u32, nr_idle_states)) {
+		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-latencies-ns in DT\n");
+		rc = -EINVAL;
+		goto out;
+	}
+	for (i = 0; i < nr_idle_states; i++)
+		pnv_idle_states[i].residency_ns = temp_u32[i];
 
+	/* For power9 */
 	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
-		if (pnv_power9_idle_init(np, flags, dt_idle_states))
+		/* Read pm_crtl_val */
+		if (of_property_read_u64_array(np, "ibm,cpu-idle-state-psscr",
+					       temp_u64, nr_idle_states)) {
+			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in DT\n");
+			rc = -EINVAL;
+			goto out;
+		}
+		for (i = 0; i < nr_idle_states; i++)
+			pnv_idle_states[i].psscr_val = temp_u64[i];
+
+		/* Read pm_crtl_mask */
+		if (of_property_read_u64_array(np, "ibm,cpu-idle-state-psscr-mask",
+					       temp_u64, nr_idle_states)) {
+			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask in DT\n");
+			rc = -EINVAL;
 			goto out;
+		}
+		for (i = 0; i < nr_idle_states; i++)
+			pnv_idle_states[i].psscr_mask = temp_u64[i];
 	}
 
-	for (i = 0; i < dt_idle_states; i++)
-		supported_cpuidle_states |= flags[i];
+	/*
+	 * power8 specific properties ibm,cpu-idle-state-pmicr-mask and
+	 * ibm,cpu-idle-state-pmicr-val were never used and there is no
+	 * plan to use it in near future. Hence, not parsing these properties
+	 */
 
+	if (of_property_read_string_array(np, "ibm,cpu-idle-state-names",
+					  temp_string, nr_idle_states) < 0) {
+		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
+		rc = -EINVAL;
+		goto out;
+	}
+	for (i = 0; i < nr_idle_states; i++)
+		strncpy(pnv_idle_states[i].name, temp_string[i],
+			PNV_IDLE_NAME_LEN);
+	nr_pnv_idle_states = nr_idle_states;
+	rc = 0;
 out:
-	kfree(flags);
+	kfree(temp_u32);
+	kfree(temp_u64);
+	kfree(temp_string);
+	return rc;
 }
+
 static int __init pnv_init_idle_states(void)
 {
-
+	int rc = 0;
 	supported_cpuidle_states = 0;
 
+	/* In case we error out nr_pnv_idle_states will be zero */
+	nr_pnv_idle_states = 0;
 	if (cpuidle_disable != IDLE_NO_OVERRIDE)
 		goto out;
-
+	rc = pnv_parse_cpuidle_dt();
+	if (rc)
+		return rc;
 	pnv_probe_idle_states();
 
 	if (!(supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
-- 
2.18.0.rc2.85.g1fb9df7

^ permalink raw reply related

* [PATCH v3 0/2] powernv/cpuidle Device-tree parsing cleanup
From: Akshay Adiga @ 2018-07-03  9:20 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-pm
  Cc: rjw, stewart, benh, svaidy, ego, npiggin, mpe, Akshay Adiga


Device-tree parsed multiple time in powernv cpuidle and powernv
hotplug code. 

First to identify supported flags. Second time, to identify deepest_state
and first deep state. Third time, during cpuidle init to find the available
idle states. Any change in device-tree format will lead to make changes in
these 3 places. Errors in device-tree can be handled in a better manner.

This series adds code to parse device tree once and save in global structure.

Changes from v2 :
 - Fix build error (moved a hunk from patch 1 to patch 2)
Changes from v1 :
 - fold first 2 patches into 1
 - rename pm_ctrl_reg_* as psscr_*
 - added comment stating removal of pmicr parsing code
 - removed parsing code for pmicr
 - add member valid in pnv_idle_states_t to indicate if the psscr-mask/val
are valid combination,
 - Change function description of pnv_parse_cpuidle_dt
 - Added error handling code.


Akshay Adiga (2):
  powernv/cpuidle: Parse dt idle properties into global structure
  powernv/cpuidle: Use parsed device tree values for cpuidle_init

 arch/powerpc/include/asm/cpuidle.h    |  13 ++
 arch/powerpc/platforms/powernv/idle.c | 216 ++++++++++++++++----------
 drivers/cpuidle/cpuidle-powernv.c     | 154 ++++--------------
 3 files changed, 177 insertions(+), 206 deletions(-)

-- 
2.18.0.rc2.85.g1fb9df7

^ permalink raw reply

* Oops in sock_wfree
From: Christophe LEROY @ 2018-07-03  8:51 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org, netdev, Eric Dumazet

Hi,

I was having strange unexpected memory corruption, therefore I activated 
DEBUG_PAGEALLOC and I now end up with the following Oops, which tends to 
make me think we have somewhere in the network code a use-after-free 
bug. I saw a few of such bugs have been fixed for IPv4 and IPv6. Maybe 
we have one remaining for Unix sockets ? How can I spot it off and fix it ?

[   39.645644] Unable to handle kernel paging request for data at 
address 0xc2235010
[   39.652860] Faulting instruction address: 0xc0334d5c
[   39.657783] Oops: Kernel access of bad area, sig: 11 [#1]
[   39.663085] BE PREEMPT DEBUG_PAGEALLOC CMPC885
[   39.667488] SAF3000 DIE NOTIFICATION
[   39.671050] CPU: 0 PID: 269 Comm: in:imuxsock Not tainted 
4.14.52-00025-g5bada429cf #22
[   39.679633] task: c623e100 task.stack: c651e000
[   39.684106] NIP:  c0334d5c LR: c043602c CTR: c0435fb8
[   39.689103] REGS: c651fc00 TRAP: 0300   Not tainted 
(4.14.52-00025-g5bada429cf)
[   39.697087] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 28002822 XER: 20000000
[   39.703720] DAR: c2235010 DSISR: c0000000
[   39.703720] GPR00: c043602c c651fcb0 c623e100 c619eec0 c642c540 
00000008 00000018 c651fd4c
[   39.703720] GPR08: c0435fb8 000002b0 c068d830 00000004 28004822 
100d4208 00000000 77990848
[   39.703720] GPR16: 0ff58398 778eb4b0 1039f050 1039f0a8 1005ddbc 
0ff5a7bc 00000000 00000000
[   39.703720] GPR24: 00000072 c5011650 c651feb8 00000072 c619eec0 
00000040 c2234fc0 c619eec0
[   39.741401] NIP [c0334d5c] sock_wfree+0x18/0xa4
[   39.745843] LR [c043602c] unix_destruct_scm+0x74/0x88
[   39.750786] Call Trace:
[   39.753253] [c651fcb0] [c006348c] ns_to_timeval+0x4c/0x7c (unreliable)
[   39.759690] [c651fcc0] [c043602c] unix_destruct_scm+0x74/0x88
[   39.765385] [c651fcf0] [c033a10c] skb_release_head_state+0x8c/0x110
[   39.771571] [c651fd00] [c033a3c4] skb_release_all+0x18/0x50
[   39.777078] [c651fd10] [c033a7cc] consume_skb+0x38/0xec
[   39.782255] [c651fd20] [c0342d7c] skb_free_datagram+0x1c/0x68
[   39.787922] [c651fd30] [c0435c8c] unix_dgram_recvmsg+0x19c/0x4ac
[   39.793863] [c651fdb0] [c0331370] ___sys_recvmsg+0x98/0x138
[   39.799371] [c651feb0] [c0333280] __sys_recvmsg+0x40/0x84
[   39.804707] [c651ff10] [c0333680] SyS_socketcall+0xb8/0x1d4
[   39.810220] [c651ff40] [c000d1ac] ret_from_syscall+0x0/0x38
[   39.815673] Instruction dump:
[   39.818612] 41beffac 4bffff58 38800003 4bffffa0 38800001 4bffff98 
7c0802a6 9421fff0
[   39.826267] bfc10008 90010014 83c30010 812300a8 <815e0050> 3bfe00e0 
71480200 4082003c
[   39.834113] ---[ end trace 8affde0490d7e25e ]---

Thanks
Christophe

^ permalink raw reply

* Re: [PATCH v9 0/6] add support for relative references in special sections
From: Ingo Molnar @ 2018-07-03  8:38 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Thomas Gleixner, the arch/x86 maintainers, Ingo Molnar,
	Linux Kernel Mailing List, Arnd Bergmann, Kees Cook,
	Michael Ellerman, Thomas Garnier, Serge E. Hallyn, Bjorn Helgaas,
	Benjamin Herrenschmidt, Russell King, Paul Mackerras,
	Catalin Marinas, Petr Mladek, James Morris, Andrew Morton,
	Nicolas Pitre, Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
	Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev,
	Will Deacon
In-Reply-To: <CAKv+Gu-dkVEtRmvZwiAUpxPSHrkTVvzeGhCwczEmxRjcgfd+LQ@mail.gmail.com>


* Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> On 27 June 2018 at 17:15, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Ard,
> >
> > On Tue, Jun 26, 2018 at 08:27:55PM +0200, Ard Biesheuvel wrote:
> >> This adds support for emitting special sections such as initcall arrays,
> >> PCI fixups and tracepoints as relative references rather than absolute
> >> references. This reduces the size by 50% on 64-bit architectures, but
> >> more importantly, it removes the need for carrying relocation metadata
> >> for these sections in relocatable kernels (e.g., for KASLR) that needs
> >> to be fixed up at boot time. On arm64, this reduces the vmlinux footprint
> >> of such a reference by 8x (8 byte absolute reference + 24 byte RELA entry
> >> vs 4 byte relative reference)
> >>
> >> Patch #3 was sent out before as a single patch. This series supersedes
> >> the previous submission. This version makes relative ksymtab entries
> >> dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
> >> than trying to infer from kbuild test robot replies for which architectures
> >> it should be blacklisted.
> >>
> >> Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
> >> and sets it for the main architectures that are expected to benefit the
> >> most from this feature, i.e., 64-bit architectures or ones that use
> >> runtime relocations.
> >>
> >> Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
> >> ksymtab/kcrctab sections in decompressor and EFI stub objects when
> >> rebuilding existing C files to run in a different context.
> >
> > I had a small question on patch 3, but it's really for my understanding.
> > So, for patches 1-3:
> >
> > Reviewed-by: Will Deacon <will.deacon@arm.com>
> >
> 
> Thanks all.
> 
> Thomas, Ingo,
> 
> Except for the below tweak against patch #3 for powerpc, which may
> apparently get confused by an input section called .discard without
> any suffixes, this series is good to go, but requires your ack to
> proceed, so I would like to ask you to share your comments and/or
> objections. Also, any suggestions or recommendations regarding the
> route these patches should take are highly appreciated.

LGTM:

Acked-by: Ingo Molnar <mingo@kernel.org>

Regarding route - I suspect -mm would be good, or any other tree that does a lot 
of cross-arch testing?

Thanks,

	Ingo

^ permalink raw reply

* Re: 83a092cf95f28: powerpc: Link warning for orphan sections
From: Mathieu Malaterre @ 2018-07-03  7:37 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <20180703172332.7210ec7a@roar.ozlabs.ibm.com>

On Tue, Jul 3, 2018 at 9:23 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Tue, 3 Jul 2018 09:03:46 +0200
> Mathieu Malaterre <malat@debian.org> wrote:
>
> > Hi Nick,
> >
> > I am building my kernel (ppc32) with both
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y and CONFIG_UBSAN=y. This leads
> > to ~316428 warnings such as:
> >
> > + powerpc-linux-gnu-ld -EB -m elf32ppc -Bstatic --orphan-handling=warn
> > --build-id --gc-sections -X -o .tmp_vmlinux1 -T
> > ./arch/powerpc/kernel/vmlinux.lds --who
> > le-archive built-in.a --no-whole-archive --start-group lib/lib.a --end-group
> > powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data393'
> > from `init/main.o' being placed in section `.data..Lubsan_data393'.
> > powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data394'
> > from `init/main.o' being placed in section `.data..Lubsan_data394'.
> > powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data395'
> > from `init/main.o' being placed in section `.data..Lubsan_data395'.
> > powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data396'
> > from `init/main.o' being placed in section `.data..Lubsan_data396'.
> > ...
> >
> > What would you recommend to reduce the number of warnings produced ? I
> > tried `--warn-once` but that only affect undefined symbols.
>
> I'm not sure if the linker can be quietened. You could try putting
> *(.data..Lubsan_data*) into the .data section in
> powerpc/kernel/vmlinux.lds.S

Nice ! Will submit a patch. Thanks

> Thanks,
> Nick

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox