LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting"
From: peterz @ 2020-09-15 18:16 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Aneesh Kumar K.V, Santosh Sivaraj, Alistair Popple,
	Greg Kroah-Hartman, Mahesh Salgaonkar, Nicholas Piggin,
	linux-kernel, Jordan Niethe, Paul Mackerras, Ganesh Goudar,
	stable, linuxppc-dev, Mike Rapoport
In-Reply-To: <20200915180659.12503-1-msuchanek@suse.de>

On Tue, Sep 15, 2020 at 08:06:59PM +0200, Michal Suchanek wrote:
> This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d.
> 
> This commit causes the kernel to oops and reboot when injecting a SLB
> multihit which causes a MCE.
> 
> Before this commit a SLB multihit was corrected by the kernel and the
> system continued to operate normally.
> 
> cc: stable@vger.kernel.org
> Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>

Ever since 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")
nmi_enter() supports nesting natively.

^ permalink raw reply

* Re: [PATCH v2 3/4] sparc64: remove mm_cpumask clearing to fix kthread_use_mm race
From: David Miller @ 2020-09-15 19:42 UTC (permalink / raw)
  To: npiggin
  Cc: axboe, linux-arch, peterz, aneesh.kumar, linux-kernel, linux-mm,
	sparclinux, akpm, linuxppc-dev
In-Reply-To: <1600139445.qwycwjuwdq.astroid@bobo.none>

From: Nicholas Piggin <npiggin@gmail.com>
Date: Tue, 15 Sep 2020 13:24:07 +1000

> Excerpts from David Miller's message of September 15, 2020 5:59 am:
>> From: Nicholas Piggin <npiggin@gmail.com>
>> Date: Mon, 14 Sep 2020 14:52:18 +1000
>> 
>>  ...
>>> The basic fix for sparc64 is to remove its mm_cpumask clearing code. The
>>> optimisation could be effectively restored by sending IPIs to mm_cpumask
>>> members and having them remove themselves from mm_cpumask. This is more
>>> tricky so I leave it as an exercise for someone with a sparc64 SMP.
>>> powerpc has a (currently similarly broken) example.
>>> 
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> 
>> Sad to see this optimization go away, but what can I do:
>> 
>> Acked-by: David S. Miller <davem@davemloft.net>
>> 
> 
> Thanks Dave, any objection if we merge this via the powerpc tree
> to keep the commits together?

No objection.

^ permalink raw reply

* [PATCH v3] pseries/hotplug-memory: hot-add: skip redundant LMB lookup
From: Scott Cheloha @ 2020-09-15 19:46 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nathan Lynch, Michal Suchanek, Laurent Dufour, David Hildenbrand,
	Rick Lindsley

During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid()
to determine which node id (nid) to use when later calling __add_memory().

This is wasteful.  On pseries, memory_add_physaddr_to_nid() finds an
appropriate nid for a given address by looking up the LMB containing the
address and then passing that LMB to of_drconf_to_nid_single() to get the
nid.  In dlpar_add_lmb() we get this address from the LMB itself.

In short, we have a pointer to an LMB and then we are searching for
that LMB *again* in order to find its nid.

If we call of_drconf_to_nid_single() directly from dlpar_add_lmb() we
can skip the redundant lookup.  The only error handling we need to
duplicate from memory_add_physaddr_to_nid() is the fallback to the
default nid when drconf_to_nid_single() returns -1 (NUMA_NO_NODE) or
an invalid nid.

Skipping the extra lookup makes hot-add operations faster, especially
on machines with many LMBs.

Consider an LPAR with 126976 LMBs.  In one test, hot-adding 126000
LMBs on an upatched kernel took ~3.5 hours while a patched kernel
completed the same operation in ~2 hours:

Unpatched (12450 seconds):
Sep  9 04:06:31 ltc-brazos1 drmgr[810169]: drmgr: -c mem -a -q 126000
Sep  9 04:06:31 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
[...]
Sep  9 07:34:01 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added

Patched (7065 seconds):
Sep  8 21:49:57 ltc-brazos1 drmgr[877703]: drmgr: -c mem -a -q 126000
Sep  8 21:49:57 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
[...]
Sep  8 23:27:42 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added

It should be noted that the speedup grows more substantial when
hot-adding LMBs at the end of the drconf range.  This is because we
are skipping a linear LMB search.

To see the distinction, consider smaller hot-add test on the same
LPAR.  A perf-stat run with 10 iterations showed that hot-adding 4096
LMBs completed less than 1 second faster on a patched kernel:

Unpatched:
 Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):

        104,753.42 msec task-clock                #    0.992 CPUs utilized            ( +-  0.55% )
             4,708      context-switches          #    0.045 K/sec                    ( +-  0.69% )
             2,444      cpu-migrations            #    0.023 K/sec                    ( +-  1.25% )
               394      page-faults               #    0.004 K/sec                    ( +-  0.22% )
   445,902,503,057      cycles                    #    4.257 GHz                      ( +-  0.55% )  (66.67%)
     8,558,376,740      stalled-cycles-frontend   #    1.92% frontend cycles idle     ( +-  0.88% )  (49.99%)
   300,346,181,651      stalled-cycles-backend    #   67.36% backend cycles idle      ( +-  0.76% )  (50.01%)
   258,091,488,691      instructions              #    0.58  insn per cycle
                                                  #    1.16  stalled cycles per insn  ( +-  0.22% )  (66.67%)
    70,568,169,256      branches                  #  673.660 M/sec                    ( +-  0.17% )  (50.01%)
     3,100,725,426      branch-misses             #    4.39% of all branches          ( +-  0.20% )  (49.99%)

           105.583 +- 0.589 seconds time elapsed  ( +-  0.56% )

Patched:
 Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):

        104,055.69 msec task-clock                #    0.993 CPUs utilized            ( +-  0.32% )
             4,606      context-switches          #    0.044 K/sec                    ( +-  0.20% )
             2,463      cpu-migrations            #    0.024 K/sec                    ( +-  0.93% )
               394      page-faults               #    0.004 K/sec                    ( +-  0.25% )
   442,951,129,921      cycles                    #    4.257 GHz                      ( +-  0.32% )  (66.66%)
     8,710,413,329      stalled-cycles-frontend   #    1.97% frontend cycles idle     ( +-  0.47% )  (50.06%)
   299,656,905,836      stalled-cycles-backend    #   67.65% backend cycles idle      ( +-  0.39% )  (50.02%)
   252,731,168,193      instructions              #    0.57  insn per cycle
                                                  #    1.19  stalled cycles per insn  ( +-  0.20% )  (66.66%)
    68,902,851,121      branches                  #  662.173 M/sec                    ( +-  0.13% )  (49.94%)
     3,100,242,882      branch-misses             #    4.50% of all branches          ( +-  0.15% )  (49.98%)

           104.829 +- 0.325 seconds time elapsed  ( +-  0.31% )

This is consistent.  An add-by-count hot-add operation adds LMBs
greedily, so LMBs near the start of the drconf range are considered
first.  On an otherwise idle LPAR with so many LMBs we would expect to
find the LMBs we need near the start of the drconf range, hence the
smaller speedup.

Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
---
Changelog:

v1: https://lore.kernel.org/linuxppc-dev/20200910175637.2865160-1-cheloha@linux.ibm.com/

v2:
- Move prototype for of_drconf_to_nid_single() to topology.h.
  Requested by Michael Ellerman.

v3:
- Send the right patch.  v2 is from the wrong branch, my mistake.

 arch/powerpc/include/asm/topology.h             | 3 +++
 arch/powerpc/mm/numa.c                          | 2 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c | 6 ++++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index f0b6300e7dd3..d15d9999bad6 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -86,6 +86,9 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 
 #endif /* CONFIG_NUMA */
 
+struct drmem_lmb;
+extern int of_drconf_to_nid_single(struct drmem_lmb *);
+
 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
 extern int find_and_online_cpu_nid(int cpu);
 #else
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 1f61fa2148b5..63507b47164d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -430,7 +430,7 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa)
  * This is like of_node_to_nid_single() for memory represented in the
  * ibm,dynamic-reconfiguration-memory node.
  */
-static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
+int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 {
 	struct assoc_arrays aa = { .arrays = NULL };
 	int default_nid = NUMA_NO_NODE;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0ea976d1cac4..9a533acf8ad0 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -611,8 +611,10 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	block_sz = memory_block_size_bytes();
 
-	/* Find the node id for this address. */
-	nid = memory_add_physaddr_to_nid(lmb->base_addr);
+	/* Find the node id for this LMB.  Fake one if necessary. */
+	nid = of_drconf_to_nid_single(lmb);
+	if (nid < 0 || !node_possible(nid))
+		nid = first_online_node;
 
 	/* Add the memory */
 	rc = __add_memory(nid, lmb->base_addr, block_sz);
-- 
2.24.1


^ permalink raw reply related

* Re: [PATCH] scsi: ibmvfc: Fix error return in ibmvfc_probe()
From: Martin K. Petersen @ 2020-09-15 20:16 UTC (permalink / raw)
  To: mpe, benh, Jing Xiangfeng, jejb, paulus, tyreld
  Cc: linuxppc-dev, linux-kernel, Martin K . Petersen, linux-scsi
In-Reply-To: <20200907083949.154251-1-jingxiangfeng@huawei.com>

On Mon, 7 Sep 2020 16:39:49 +0800, Jing Xiangfeng wrote:

> Fix to return error code PTR_ERR() from the error handling case instead
> of 0.

Applied to 5.10/scsi-queue, thanks!

[1/1] scsi: ibmvfc: Fix error return in ibmvfc_probe()
      https://git.kernel.org/mkp/scsi/c/5e48a084f4e8

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH net] ibmvnic: update MAINTAINERS
From: David Miller @ 2020-09-15 20:27 UTC (permalink / raw)
  To: drt; +Cc: netdev, linuxppc-dev
In-Reply-To: <20200915003535.819585-1-drt@linux.ibm.com>

From: Dany Madden <drt@linux.ibm.com>
Date: Mon, 14 Sep 2020 20:35:35 -0400

> Update supporters for IBM Power SRIOV Virtual NIC Device Driver. 
> Thomas Falcon is moving on to other works. Dany Madden, Lijun Pan 
> and Sukadev Bhattiprolu are the current supporters.
> 
> Signed-off-by: Dany Madden <drt@linux.ibm.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ibmvfc: Avoid link down on FS9100 canister reboot
From: Martin K. Petersen @ 2020-09-16  0:49 UTC (permalink / raw)
  To: Brian King; +Cc: tyreld, linuxppc-dev, martin.petersen, linux-scsi
In-Reply-To: <1599859706-8505-1-git-send-email-brking@linux.vnet.ibm.com>


Brian,

> When a canister on a FS9100, or similar storage, running in NPIV mode,
> is rebooted, its WWPNs will fail over to another canister.

[...]

Applied to 5.10/scsi-staging, thanks! I fixed a bunch of checkpatch
warnings.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: Injecting SLB miltihit crashes kernel 5.9.0-rc5
From: Nicholas Piggin @ 2020-09-16  0:51 UTC (permalink / raw)
  To: linuxppc-dev, mahesh, Michael Ellerman, Michal Suchánek
In-Reply-To: <87bli7p5dx.fsf@mpe.ellerman.id.au>

Excerpts from Michael Ellerman's message of September 15, 2020 10:54 pm:
> Michal Suchánek <msuchanek@suse.de> writes:
>> Hello,
>>
>> Using the SLB mutihit injection test module (which I did not write so I
>> do not want to post it here) to verify updates on my 5.3 frankernekernel
>> I found that the kernel crashes with Oops: kernel bad access.
>>
>> I tested on latest upstream kernel build that I have at hand and the
>> result is te same (minus the message - nothing was logged and the kernel
>> simply rebooted).
> 
> That's disappointing.

It seems to work okay with qemu and mambo injection on upstream
(powernv_defconfig). I wonder why that nmi_enter is crashing.
Can you post the output of a successful test with the patch
reverted?


qemu injection test output - 
[  195.279885][    C0] Disabling lock debugging due to kernel taint
[  195.280891][    C0] MCE: CPU0: machine check (Warning) Host SLB Multihit DAR: 00000000deadbeef [Recovered]
[  195.282117][    C0] MCE: CPU0: NIP: [c00000000003c2b4] isa300_idle_stop_mayloss+0x68/0x6c
[  195.283631][    C0] MCE: CPU0: Initiator CPU
[  195.284432][    C0] MCE: CPU0: Probable Software error (some chance of hardware cause)
[  220.711577][   T90] MCE: CPU0: machine check (Warning) Host SLB Multihit DAR: 00000000deadbeef [Recovered]
[  220.712805][   T90] MCE: CPU0: PID: 90 Comm: yes NIP: [00007fff7fdac2e0]
[  220.713553][   T90] MCE: CPU0: Initiator CPU
[  220.714021][   T90] MCE: CPU0: Probable Software error (some chance of hardware cause)

Thanks,
Nick

^ permalink raw reply

* RE: [PATCH v1] soc: fsl: rcpm: Add ACPI support
From: Ran Wang @ 2020-09-16  1:21 UTC (permalink / raw)
  To: kuldip dwivedi, Leo Li, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
  Cc: Biwen Li, Samer El-Haj-Mahmoud, tanveer, Paul Yang,
	kuldip dwivedi, Varun Sethi, Ard Biesheuvel, Arokia Samy
In-Reply-To: <20200915110647.846-1-kuldip.dwivedi@puresoftware.com>

Hi Kuldip,

On Tuesday, September 15, 2020 7:07 PM, kuldip dwivedi wrote:
> Subject: [PATCH v1] soc: fsl: rcpm: Add ACPI support

Actually I also post a patch for this recently: https://lore.kernel.org/patchwork/patch/1299959/  :)

Regards,
Ran

> Add ACPI support in fsl RCPM driver. This is required to support ACPI S3 state.
> S3 is the ACPI sleep state that is known as "sleep" or "suspend to RAM".
> It essentially turns off most power of the system but keeps memory powered.

Actually the low power mode is to gate clocks rather than power down on Layerscape platforms.

> Signed-off-by: tanveer <tanveer.alam@puresoftware.com>
> Signed-off-by: kuldip dwivedi <kuldip.dwivedi@puresoftware.com>
> ---
> 
> Notes:
>     1. Add ACPI match table
>     2. NXP team members are added for confirming HID changes
>     3. There is only one node in ACPI so no need to check for
>        current device explicitly
>     4. These changes are tested on LX2160A and LS1046A platforms
> 
>  drivers/soc/fsl/rcpm.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c index
> a093dbe6d2cb..e75a436fb159 100644
> --- a/drivers/soc/fsl/rcpm.c
> +++ b/drivers/soc/fsl/rcpm.c
> @@ -2,10 +2,12 @@
>  //
>  // rcpm.c - Freescale QorIQ RCPM driver  // -// Copyright 2019 NXP
> +// Copyright 2019-2020 NXP
> +// Copyright 2020 Puresoftware Ltd.
>  //
>  // Author: Ran Wang <ran.wang_1@nxp.com>
> 
> +#include <linux/acpi.h>
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/platform_device.h>
> @@ -57,8 +59,13 @@ static int rcpm_pm_prepare(struct device *dev)
>  				rcpm->wakeup_cells + 1);
> 
>  		/*  Wakeup source should refer to current rcpm device */
> -		if (ret || (np->phandle != value[0]))
> -			continue;
> +		if (is_acpi_node(dev->fwnode)) {
> +			if (ret)
> +				continue;
> +		} else {
> +			if (ret || (np->phandle != value[0]))
> +				continue;
> +		}
>  		/* Property "#fsl,rcpm-wakeup-cells" of rcpm node defines the
>  		 * number of IPPDEXPCR register cells, and "fsl,rcpm-wakeup"
> @@ -139,10 +146,19 @@ static const struct of_device_id rcpm_of_match[] =
> {  };  MODULE_DEVICE_TABLE(of, rcpm_of_match);
> 
> +#ifdef CONFIG_ACPI
> +static const struct acpi_device_id rcpm_acpi_match[] = {
> +	{ "NXP0015", },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(acpi, rcpm_acpi_match); #endif
> +
>  static struct platform_driver rcpm_driver = {
>  	.driver = {
>  		.name = "rcpm",
>  		.of_match_table = rcpm_of_match,
> +		.acpi_match_table = ACPI_PTR(rcpm_acpi_match),
>  		.pm	= &rcpm_pm_ops,
>  	},
>  	.probe = rcpm_probe,
> --
> 2.17.1


^ permalink raw reply

* RE: [PATCH v1] soc: fsl: rcpm: Add ACPI support
From: Ran Wang @ 2020-09-16  1:32 UTC (permalink / raw)
  To: Ard Biesheuvel, kuldip dwivedi, Leo Li,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
  Cc: Biwen Li, Samer El-Haj-Mahmoud, Arokia Samy, Paul Yang,
	Varun Sethi, tanveer
In-Reply-To: <4e008f0a-69da-d5c2-4dfc-ef8695e17f47@arm.com>

Hi Ard,

On Tuesday, September 15, 2020 7:10 PM, Ard Biesheuvel wrote:
> Subject: Re: [PATCH v1] soc: fsl: rcpm: Add ACPI support
> 
> On 9/15/20 1:06 PM, kuldip dwivedi wrote:
> > Add ACPI support in fsl RCPM driver. This is required to support ACPI
> > S3 state. S3 is the ACPI sleep state that is known as "sleep" or
> > "suspend to RAM".
> > It essentially turns off most power of the system but keeps memory
> > powered.
> >
> > Signed-off-by: tanveer <tanveer.alam@puresoftware.com>
> > Signed-off-by: kuldip dwivedi <kuldip.dwivedi@puresoftware.com>
> 
> Why does the OS need to program this device? Can't this be done by
> firmware?

This device is use to tell HW which IP (such as USB, SDHC, SATA, etc) should not be
clock gated during system enter low power state (to allow that IP work as a
wakeup source). And user does this configuration in device tree. So implement
this RCPM driver to do it in kernel rather than firmware.

Regards,
Ran

> > ---
> >
> > Notes:
> >      1. Add ACPI match table
> >      2. NXP team members are added for confirming HID changes
> >      3. There is only one node in ACPI so no need to check for
> >         current device explicitly
> >      4. These changes are tested on LX2160A and LS1046A platforms
> >
> >   drivers/soc/fsl/rcpm.c | 22 +++++++++++++++++++---
> >   1 file changed, 19 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c index
> > a093dbe6d2cb..e75a436fb159 100644
> > --- a/drivers/soc/fsl/rcpm.c
> > +++ b/drivers/soc/fsl/rcpm.c
> > @@ -2,10 +2,12 @@
> >   //
> >   // rcpm.c - Freescale QorIQ RCPM driver
> >   //
> > -// Copyright 2019 NXP
> > +// Copyright 2019-2020 NXP
> > +// Copyright 2020 Puresoftware Ltd.
> >   //
> >   // Author: Ran Wang <ran.wang_1@nxp.com>
> >
> > +#include <linux/acpi.h>
> >   #include <linux/init.h>
> >   #include <linux/module.h>
> >   #include <linux/platform_device.h>
> > @@ -57,8 +59,13 @@ static int rcpm_pm_prepare(struct device *dev)
> >   				rcpm->wakeup_cells + 1);
> >
> >   		/*  Wakeup source should refer to current rcpm device */
> > -		if (ret || (np->phandle != value[0]))
> > -			continue;
> > +		if (is_acpi_node(dev->fwnode)) {
> > +			if (ret)
> > +				continue;
> > +		} else {
> > +			if (ret || (np->phandle != value[0]))
> > +				continue;
> > +		}
> >
> >   		/* Property "#fsl,rcpm-wakeup-cells" of rcpm node defines the
> >   		 * number of IPPDEXPCR register cells, and "fsl,rcpm-wakeup"
> > @@ -139,10 +146,19 @@ static const struct of_device_id rcpm_of_match[]
> = {
> >   };
> >   MODULE_DEVICE_TABLE(of, rcpm_of_match);
> >
> > +#ifdef CONFIG_ACPI
> > +static const struct acpi_device_id rcpm_acpi_match[] = {
> > +	{ "NXP0015", },
> > +	{ }
> > +};
> > +MODULE_DEVICE_TABLE(acpi, rcpm_acpi_match); #endif
> > +
> >   static struct platform_driver rcpm_driver = {
> >   	.driver = {
> >   		.name = "rcpm",
> >   		.of_match_table = rcpm_of_match,
> > +		.acpi_match_table = ACPI_PTR(rcpm_acpi_match),
> >   		.pm	= &rcpm_pm_ops,
> >   	},
> >   	.probe = rcpm_probe,
> >


^ permalink raw reply

* [PATCH -next] powerpc/pseries: convert to use DEFINE_SEQ_ATTRIBUTE macro
From: Liu Shixin @ 2020-09-16  2:50 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: Liu Shixin, linuxppc-dev, linux-kernel

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 arch/powerpc/platforms/pseries/hvCall_inst.c | 23 +++-----------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hvCall_inst.c b/arch/powerpc/platforms/pseries/hvCall_inst.c
index c40c62ec432e..2c59b4986ea5 100644
--- a/arch/powerpc/platforms/pseries/hvCall_inst.c
+++ b/arch/powerpc/platforms/pseries/hvCall_inst.c
@@ -70,31 +70,14 @@ static int hc_show(struct seq_file *m, void *p)
 	return 0;
 }
 
-static const struct seq_operations hcall_inst_seq_ops = {
+static const struct seq_operations hcall_inst_sops = {
         .start = hc_start,
         .next  = hc_next,
         .stop  = hc_stop,
         .show  = hc_show
 };
 
-static int hcall_inst_seq_open(struct inode *inode, struct file *file)
-{
-	int rc;
-	struct seq_file *seq;
-
-	rc = seq_open(file, &hcall_inst_seq_ops);
-	seq = file->private_data;
-	seq->private = file_inode(file)->i_private;
-
-	return rc;
-}
-
-static const struct file_operations hcall_inst_seq_fops = {
-	.open = hcall_inst_seq_open,
-	.read = seq_read,
-	.llseek = seq_lseek,
-	.release = seq_release,
-};
+DEFINE_SEQ_ATTRIBUTE(hcall_inst);
 
 #define	HCALL_ROOT_DIR		"hcall_inst"
 #define CPU_NAME_BUF_SIZE	32
@@ -149,7 +132,7 @@ static int __init hcall_inst_init(void)
 		snprintf(cpu_name_buf, CPU_NAME_BUF_SIZE, "cpu%d", cpu);
 		debugfs_create_file(cpu_name_buf, 0444, hcall_root,
 				    per_cpu(hcall_stats, cpu),
-				    &hcall_inst_seq_fops);
+				    &hcall_inst_fops);
 	}
 
 	return 0;
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 1/2] powerpc: untangle cputable mce include
From: Nicholas Piggin @ 2020-09-16  3:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Having cputable.h include mce.h means it pulls in a bunch of low level
headers (e.g., synch.h) which then can't use CPU_FTR_ definitions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/cputable.h | 5 -----
 arch/powerpc/kernel/cputable.c      | 1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 1 +
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 32a15dc49e8c..f89205eff691 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -9,11 +9,6 @@
 
 #ifndef __ASSEMBLY__
 
-/*
- * Added to include __machine_check_early_realmode_* functions
- */
-#include <asm/mce.h>
-
 /* This structure can grow, it's real size is used by head.S code
  * via the mkdefs mechanism.
  */
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 2aa89c6b2896..b5bc2edef440 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -16,6 +16,7 @@
 #include <asm/oprofile_impl.h>
 #include <asm/cputable.h>
 #include <asm/prom.h>		/* for PTRRELOC on ARCH=ppc */
+#include <asm/mce.h>
 #include <asm/mmu.h>
 #include <asm/setup.h>
 
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index f204ad79b6b5..1098863e17ee 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -17,6 +17,7 @@
 
 #include <asm/cputable.h>
 #include <asm/dt_cpu_ftrs.h>
+#include <asm/mce.h>
 #include <asm/mmu.h>
 #include <asm/oprofile_impl.h>
 #include <asm/prom.h>
-- 
2.23.0


^ permalink raw reply related

* [PATCH v2 2/2] powerpc/64s: Add cp_abort after tlbiel to invalidate copy-buffer address
From: Nicholas Piggin @ 2020-09-16  3:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200916030234.4110379-1-npiggin@gmail.com>

The copy buffer is implemented as a real address in the nest which is
translated from EA by copy, and used for memory access by paste. This
requires that it be invalidated by TLB invalidation.

TLBIE does invalidate the copy buffer, but TLBIEL does not. Add cp_abort
to the tlbiel sequence.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
v2:
- Untangle headers that were causing build failures.
- Improve the comment a bit.
- Exempt POWER9 from the workaround, as described by the comment (we
  worked this out already but I forgot about it when doing v1!)

 arch/powerpc/include/asm/synch.h       | 19 ++++++++++++++++++-
 arch/powerpc/mm/book3s64/hash_native.c |  8 ++++----
 arch/powerpc/mm/book3s64/radix_tlb.c   | 12 ++++++------
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
index aca70fb43147..a2e89f27d547 100644
--- a/arch/powerpc/include/asm/synch.h
+++ b/arch/powerpc/include/asm/synch.h
@@ -3,8 +3,9 @@
 #define _ASM_POWERPC_SYNCH_H 
 #ifdef __KERNEL__
 
+#include <asm/cputable.h>
 #include <asm/feature-fixups.h>
-#include <asm/asm-const.h>
+#include <asm/ppc-opcode.h>
 
 #ifndef __ASSEMBLY__
 extern unsigned int __start___lwsync_fixup, __stop___lwsync_fixup;
@@ -20,6 +21,22 @@ static inline void isync(void)
 {
 	__asm__ __volatile__ ("isync" : : : "memory");
 }
+
+static inline void ppc_after_tlbiel_barrier(void)
+{
+        asm volatile("ptesync": : :"memory");
+	/*
+	 * POWER9, POWER10 need a cp_abort after tlbiel to ensure the copy
+	 * is invalidated correctly. If this is not done, the paste can take
+	 * data from the physical address that was translated at copy time.
+	 *
+	 * POWER9 in practice does not need this, because address spaces
+	 * with accelerators mapped will use tlbie (which does invalidate
+	 * the copy) to invalidate translations. It's not possible to limit
+	 * POWER10 this way due to local copy-paste.
+	 */
+        asm volatile(ASM_FTR_IFSET(PPC_CP_ABORT, "", %0) : : "i" (CPU_FTR_ARCH_31) : "memory");
+}
 #endif /* __ASSEMBLY__ */
 
 #if defined(__powerpc64__)
diff --git a/arch/powerpc/mm/book3s64/hash_native.c b/arch/powerpc/mm/book3s64/hash_native.c
index cf20e5229ce1..0203cdf48c54 100644
--- a/arch/powerpc/mm/book3s64/hash_native.c
+++ b/arch/powerpc/mm/book3s64/hash_native.c
@@ -82,7 +82,7 @@ static void tlbiel_all_isa206(unsigned int num_sets, unsigned int is)
 	for (set = 0; set < num_sets; set++)
 		tlbiel_hash_set_isa206(set, is);
 
-	asm volatile("ptesync": : :"memory");
+	ppc_after_tlbiel_barrier();
 }
 
 static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is)
@@ -110,7 +110,7 @@ static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is)
 	 */
 	tlbiel_hash_set_isa300(0, is, 0, 2, 1);
 
-	asm volatile("ptesync": : :"memory");
+	ppc_after_tlbiel_barrier();
 
 	asm volatile(PPC_ISA_3_0_INVALIDATE_ERAT "; isync" : : :"memory");
 }
@@ -303,7 +303,7 @@ static inline void tlbie(unsigned long vpn, int psize, int apsize,
 	asm volatile("ptesync": : :"memory");
 	if (use_local) {
 		__tlbiel(vpn, psize, apsize, ssize);
-		asm volatile("ptesync": : :"memory");
+		ppc_after_tlbiel_barrier();
 	} else {
 		__tlbie(vpn, psize, apsize, ssize);
 		fixup_tlbie_vpn(vpn, psize, apsize, ssize);
@@ -879,7 +879,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 				__tlbiel(vpn, psize, psize, ssize);
 			} pte_iterate_hashed_end();
 		}
-		asm volatile("ptesync":::"memory");
+		ppc_after_tlbiel_barrier();
 	} else {
 		int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 0d233763441f..5c9d2fccacc7 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -65,7 +65,7 @@ static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is)
 	for (set = 1; set < num_sets; set++)
 		tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 1);
 
-	asm volatile("ptesync": : :"memory");
+	ppc_after_tlbiel_barrier();
 }
 
 void radix__tlbiel_all(unsigned int action)
@@ -296,7 +296,7 @@ static __always_inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
 
 	/* For PWC, only one flush is needed */
 	if (ric == RIC_FLUSH_PWC) {
-		asm volatile("ptesync": : :"memory");
+		ppc_after_tlbiel_barrier();
 		return;
 	}
 
@@ -304,7 +304,7 @@ static __always_inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
 	for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
 		__tlbiel_pid(pid, set, RIC_FLUSH_TLB);
 
-	asm volatile("ptesync": : :"memory");
+	ppc_after_tlbiel_barrier();
 	asm volatile(PPC_RADIX_INVALIDATE_ERAT_USER "; isync" : : :"memory");
 }
 
@@ -431,7 +431,7 @@ static __always_inline void _tlbiel_va(unsigned long va, unsigned long pid,
 
 	asm volatile("ptesync": : :"memory");
 	__tlbiel_va(va, pid, ap, ric);
-	asm volatile("ptesync": : :"memory");
+	ppc_after_tlbiel_barrier();
 }
 
 static inline void _tlbiel_va_range(unsigned long start, unsigned long end,
@@ -442,7 +442,7 @@ static inline void _tlbiel_va_range(unsigned long start, unsigned long end,
 	if (also_pwc)
 		__tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
 	__tlbiel_va_range(start, end, pid, page_size, psize);
-	asm volatile("ptesync": : :"memory");
+	ppc_after_tlbiel_barrier();
 }
 
 static inline void __tlbie_va_range(unsigned long start, unsigned long end,
@@ -940,7 +940,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 			if (hflush)
 				__tlbiel_va_range(hstart, hend, pid,
 						PMD_SIZE, MMU_PAGE_2M);
-			asm volatile("ptesync": : :"memory");
+			ppc_after_tlbiel_barrier();
 		} else if (cputlb_use_tlbie()) {
 			asm volatile("ptesync": : :"memory");
 			__tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize);
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH v2 2/7] powerpc/prom: Introduce early_reserve_mem_old()
From: Cédric Le Goater @ 2020-09-16  5:56 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Christophe Leroy, linuxppc-dev
In-Reply-To: <20200915184607.Horde._j-BRtSmJ6vRGSRwLWoN7Q2@messagerie.si.c-s.fr>

On 9/15/20 6:46 PM, Christophe Leroy wrote:
> Cédric Le Goater <clg@kaod.org> a écrit :
> 
>> and condition its call with IS_ENABLED(CONFIG_PPC32). This fixes a
>> compile error with W=1.
>>
>> arch/powerpc/kernel/prom.c: In function ‘early_reserve_mem’:
>> arch/powerpc/kernel/prom.c:625:10: error: variable ‘reserve_map’ set but not used [-Werror=unused-but-set-variable]
>>   __be64 *reserve_map;
>>           ^~~~~~~~~~~
>> cc1: all warnings being treated as errors
>>
>> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> 
> @csgroup.eu instead of @c-s.fr please
> 
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  arch/powerpc/kernel/prom.c | 37 ++++++++++++++++++++-----------------
>>  1 file changed, 20 insertions(+), 17 deletions(-)
> 
> That's a lot of changes for a tiny warning.
> 
> You could make it easy by just replacing the #ifdef by:
> 
>         if (!IS_ENABLED(CONFIG_PPC32))
>                 return;

It's equivalent and it moves out the reserve_map variable of the main routine
which I think is better.

>>
>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>> index d8a2fb87ba0c..c958b67cf1a5 100644
>> --- a/arch/powerpc/kernel/prom.c
>> +++ b/arch/powerpc/kernel/prom.c
>> @@ -620,27 +620,14 @@ static void __init early_reserve_mem_dt(void)
>>      }
>>  }
>>
>> -static void __init early_reserve_mem(void)
>> +static void __init early_reserve_mem_old(void)
> 
> Why _old ? Do you mean ppc32 are old ? Modern ADSL boxes like for instance the famous French freebox have powerpc32 microcontroller.
> Eventually you could name it _ppc32, but I don't think that's the good way, see above.

I choose old because of the comment ' ... booting from an old kexec ... ', 
but I agree _ppc32 might be a better choice.

Thanks,

C. 

> Christophe
> 
>>  {
>>      __be64 *reserve_map;
>>
>>      reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
>>              fdt_off_mem_rsvmap(initial_boot_params));
>>
>> -    /* Look for the new "reserved-regions" property in the DT */
>> -    early_reserve_mem_dt();
>> -
>> -#ifdef CONFIG_BLK_DEV_INITRD
>> -    /* Then reserve the initrd, if any */
>> -    if (initrd_start && (initrd_end > initrd_start)) {
>> -        memblock_reserve(ALIGN_DOWN(__pa(initrd_start), PAGE_SIZE),
>> -            ALIGN(initrd_end, PAGE_SIZE) -
>> -            ALIGN_DOWN(initrd_start, PAGE_SIZE));
>> -    }
>> -#endif /* CONFIG_BLK_DEV_INITRD */
>> -
>> -#ifdef CONFIG_PPC32
>> -    /*
>> +    /*
>>       * Handle the case where we might be booting from an old kexec
>>       * image that setup the mem_rsvmap as pairs of 32-bit values
>>       */
>> @@ -658,9 +645,25 @@ static void __init early_reserve_mem(void)
>>              DBG("reserving: %x -> %x\n", base_32, size_32);
>>              memblock_reserve(base_32, size_32);
>>          }
>> -        return;
>>      }
>> -#endif
>> +}
>> +
>> +static void __init early_reserve_mem(void)
>> +{
>> +    /* Look for the new "reserved-regions" property in the DT */
>> +    early_reserve_mem_dt();
>> +
>> +#ifdef CONFIG_BLK_DEV_INITRD
>> +    /* Then reserve the initrd, if any */
>> +    if (initrd_start && (initrd_end > initrd_start)) {
>> +        memblock_reserve(ALIGN_DOWN(__pa(initrd_start), PAGE_SIZE),
>> +            ALIGN(initrd_end, PAGE_SIZE) -
>> +            ALIGN_DOWN(initrd_start, PAGE_SIZE));
>> +    }
>> +#endif /* CONFIG_BLK_DEV_INITRD */
>> +
>> +    if (IS_ENABLED(CONFIG_PPC32))
>> +        early_reserve_mem_old();
>>  }
>>
>>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> -- 
>> 2.25.4
> 
> 


^ permalink raw reply

* Re: [PATCH v1] soc: fsl: rcpm: Add ACPI support
From: Ard Biesheuvel @ 2020-09-16  6:10 UTC (permalink / raw)
  To: Ran Wang, kuldip dwivedi, Leo Li, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
  Cc: Biwen Li, Samer El-Haj-Mahmoud, Arokia Samy, Paul Yang,
	Varun Sethi, tanveer
In-Reply-To: <AM6PR04MB5413903EAAEDB2EED2E254C6F1210@AM6PR04MB5413.eurprd04.prod.outlook.com>

On 9/16/20 3:32 AM, Ran Wang wrote:
> Hi Ard,
> 
> On Tuesday, September 15, 2020 7:10 PM, Ard Biesheuvel wrote:
>> Subject: Re: [PATCH v1] soc: fsl: rcpm: Add ACPI support
>>
>> On 9/15/20 1:06 PM, kuldip dwivedi wrote:
>>> Add ACPI support in fsl RCPM driver. This is required to support ACPI
>>> S3 state. S3 is the ACPI sleep state that is known as "sleep" or
>>> "suspend to RAM".
>>> It essentially turns off most power of the system but keeps memory
>>> powered.
>>>
>>> Signed-off-by: tanveer <tanveer.alam@puresoftware.com>
>>> Signed-off-by: kuldip dwivedi <kuldip.dwivedi@puresoftware.com>
>>
>> Why does the OS need to program this device? Can't this be done by
>> firmware?
> 
> This device is use to tell HW which IP (such as USB, SDHC, SATA, etc) should not be
> clock gated during system enter low power state (to allow that IP work as a
> wakeup source). And user does this configuration in device tree.

The point of ACPI is *not* to describe a DT topology using a table 
format that is not suited for it. The point of ACPI is to describe a 
machine that is more abstracted from the hardware than is typically 
possible with DT, where the abstractions are implemented by AML code 
that is provided by the firmware, but executed in the context of the OS.

So the idea is *not* finding the shortest possible path to get your 
existing DT driver code running on a system that boots via ACPI. 
Instead, you should carefully think about the abstract ACPI machine that 
you will expose to the OS, and hide everything else in firmware.

In this particular case, it seems like your USB, SDHC and SATA device 
objects may need power state dependent AML methods that program this 
block directly.



> So implement
> this RCPM driver to do it in kernel rather than firmware.
> 
> Regards,
> Ran
> 
>>> ---
>>>
>>> Notes:
>>>       1. Add ACPI match table
>>>       2. NXP team members are added for confirming HID changes
>>>       3. There is only one node in ACPI so no need to check for
>>>          current device explicitly
>>>       4. These changes are tested on LX2160A and LS1046A platforms
>>>
>>>    drivers/soc/fsl/rcpm.c | 22 +++++++++++++++++++---
>>>    1 file changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c index
>>> a093dbe6d2cb..e75a436fb159 100644
>>> --- a/drivers/soc/fsl/rcpm.c
>>> +++ b/drivers/soc/fsl/rcpm.c
>>> @@ -2,10 +2,12 @@
>>>    //
>>>    // rcpm.c - Freescale QorIQ RCPM driver
>>>    //
>>> -// Copyright 2019 NXP
>>> +// Copyright 2019-2020 NXP
>>> +// Copyright 2020 Puresoftware Ltd.
>>>    //
>>>    // Author: Ran Wang <ran.wang_1@nxp.com>
>>>
>>> +#include <linux/acpi.h>
>>>    #include <linux/init.h>
>>>    #include <linux/module.h>
>>>    #include <linux/platform_device.h>
>>> @@ -57,8 +59,13 @@ static int rcpm_pm_prepare(struct device *dev)
>>>    				rcpm->wakeup_cells + 1);
>>>
>>>    		/*  Wakeup source should refer to current rcpm device */
>>> -		if (ret || (np->phandle != value[0]))
>>> -			continue;
>>> +		if (is_acpi_node(dev->fwnode)) {
>>> +			if (ret)
>>> +				continue;
>>> +		} else {
>>> +			if (ret || (np->phandle != value[0]))
>>> +				continue;
>>> +		}
>>>
>>>    		/* Property "#fsl,rcpm-wakeup-cells" of rcpm node defines the
>>>    		 * number of IPPDEXPCR register cells, and "fsl,rcpm-wakeup"
>>> @@ -139,10 +146,19 @@ static const struct of_device_id rcpm_of_match[]
>> = {
>>>    };
>>>    MODULE_DEVICE_TABLE(of, rcpm_of_match);
>>>
>>> +#ifdef CONFIG_ACPI
>>> +static const struct acpi_device_id rcpm_acpi_match[] = {
>>> +	{ "NXP0015", },
>>> +	{ }
>>> +};
>>> +MODULE_DEVICE_TABLE(acpi, rcpm_acpi_match); #endif
>>> +
>>>    static struct platform_driver rcpm_driver = {
>>>    	.driver = {
>>>    		.name = "rcpm",
>>>    		.of_match_table = rcpm_of_match,
>>> +		.acpi_match_table = ACPI_PTR(rcpm_acpi_match),
>>>    		.pm	= &rcpm_pm_ops,
>>>    	},
>>>    	.probe = rcpm_probe,
>>>
> 


^ permalink raw reply

* [PATCH] Doc: admin-guide: Add entry for kvm_cma_resv_ratio kernel param
From: sathnaga @ 2020-09-16  6:11 UTC (permalink / raw)
  To: linux-doc
  Cc: Jonathan Corbet, linux-kernel, kvm-ppc, Paul Mackerras,
	Satheesh Rajendran, linuxppc-dev

From: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>

Add document entry for kvm_cma_resv_ratio kernel param which
is used to alter the KVM contiguous memory allocation percentage
for hash pagetable allocation used by hash mode PowerPC KVM guests.

Cc: linux-kernel@vger.kernel.org
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>  
Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1068742a6df..9cb126573c71 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,15 @@
 			altogether. For more information, see
 			include/linux/dma-contiguous.h
 
+        kvm_cma_resv_ratio=n
+                        [PPC]
+                        Reserves given percentage from system memory area for
+                        contiguous memory allocation for KVM hash pagetable
+                        allocation.
+                        Bydefault it reserves 5% of total system memory.
+                        Format: <integer>
+                        Default: 5
+
 	cmo_free_hint=	[PPC] Format: { yes | no }
 			Specify whether pages are marked as being inactive
 			when they are freed.  This is used in CMO environments
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH] Doc: admin-guide: Add entry for kvm_cma_resv_ratio kernel param
From: Randy Dunlap @ 2020-09-16  6:18 UTC (permalink / raw)
  To: sathnaga, linux-doc
  Cc: Jonathan Corbet, linux-kernel, kvm-ppc, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <20200916061130.723411-1-sathnaga@linux.vnet.ibm.com>

On 9/15/20 11:11 PM, sathnaga@linux.vnet.ibm.com wrote:
> From: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> 
> Add document entry for kvm_cma_resv_ratio kernel param which
> is used to alter the KVM contiguous memory allocation percentage
> for hash pagetable allocation used by hash mode PowerPC KVM guests.
> 
> Cc: linux-kernel@vger.kernel.org
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Jonathan Corbet <corbet@lwn.net>  
> Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a1068742a6df..9cb126573c71 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -599,6 +599,15 @@
>  			altogether. For more information, see
>  			include/linux/dma-contiguous.h
>  
> +        kvm_cma_resv_ratio=n
> +                        [PPC]

You can put [PPC] on the line above.

> +                        Reserves given percentage from system memory area for
> +                        contiguous memory allocation for KVM hash pagetable
> +                        allocation.
> +                        Bydefault it reserves 5% of total system memory.

			   By default

> +                        Format: <integer>
> +                        Default: 5
> +

and please use tabs for indentation, not all spaces.

>  	cmo_free_hint=	[PPC] Format: { yes | no }
>  			Specify whether pages are marked as being inactive
>  			when they are freed.  This is used in CMO environments
> 

Entries in kernel-parameters.txt should be sorted into dictionary order,
so please put that with the other kvm parameters.

thanks.
-- 
~Randy


^ permalink raw reply

* [PATCH -next] drivers/macintosh/smu.c: use for_each_child_of_node() macro
From: Qinglang Miao @ 2020-09-16  6:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, linux-kernel, Qinglang Miao

Use for_each_child_of_node() macro instead of open coding it.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
---
 drivers/macintosh/smu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/smu.c b/drivers/macintosh/smu.c
index 96684581a..45875e8c6 100644
--- a/drivers/macintosh/smu.c
+++ b/drivers/macintosh/smu.c
@@ -638,7 +638,7 @@ static void smu_expose_childs(struct work_struct *unused)
 {
 	struct device_node *np;
 
-	for (np = NULL; (np = of_get_next_child(smu->of_node, np)) != NULL;)
+	for_each_child_of_node(smu->of_node, np)
 		if (of_device_is_compatible(np, "smu-sensors"))
 			of_platform_device_create(np, "smu-sensors",
 						  &smu->of_dev->dev);
-- 
2.23.0


^ permalink raw reply related

* [PATCH -next] PCI: rpadlpar: use for_each_child_of_node() and for_each_node_by_name
From: Qinglang Miao @ 2020-09-16  6:21 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Tyrel Datwyler, Bjorn Helgaas
  Cc: linux-pci, linuxppc-dev, linux-kernel, Qinglang Miao

Use for_each_child_of_node() and for_each_node_by_name macro
instead of open coding it.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
---
 drivers/pci/hotplug/rpadlpar_core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
index f979b7098..0a3c80ba6 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -40,13 +40,13 @@ static DEFINE_MUTEX(rpadlpar_mutex);
 static struct device_node *find_vio_slot_node(char *drc_name)
 {
 	struct device_node *parent = of_find_node_by_name(NULL, "vdevice");
-	struct device_node *dn = NULL;
+	struct device_node *dn;
 	int rc;
 
 	if (!parent)
 		return NULL;
 
-	while ((dn = of_get_next_child(parent, dn))) {
+	for_each_child_of_node(parent, dn) {
 		rc = rpaphp_check_drc_props(dn, drc_name, NULL);
 		if (rc == 0)
 			break;
@@ -60,10 +60,10 @@ static struct device_node *find_vio_slot_node(char *drc_name)
 static struct device_node *find_php_slot_pci_node(char *drc_name,
 						  char *drc_type)
 {
-	struct device_node *np = NULL;
+	struct device_node *np;
 	int rc;
 
-	while ((np = of_find_node_by_name(np, "pci"))) {
+	for_each_node_by_name(np, "pci") {
 		rc = rpaphp_check_drc_props(np, drc_name, drc_type);
 		if (rc == 0)
 			break;
-- 
2.23.0


^ permalink raw reply related

* [PATCH -next] macintosh: smu_sensors: use for_each_child_of_node() macro
From: Qinglang Miao @ 2020-09-16  6:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, linux-kernel, Qinglang Miao

Use for_each_child_of_node() macro instead of open coding it.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
---
 drivers/macintosh/windfarm_smu_sensors.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/macintosh/windfarm_smu_sensors.c b/drivers/macintosh/windfarm_smu_sensors.c
index 3e6059eaa..c8706cfb8 100644
--- a/drivers/macintosh/windfarm_smu_sensors.c
+++ b/drivers/macintosh/windfarm_smu_sensors.c
@@ -421,8 +421,7 @@ static int __init smu_sensors_init(void)
 		return -ENODEV;
 
 	/* Look for sensors subdir */
-	for (sensors = NULL;
-	     (sensors = of_get_next_child(smu, sensors)) != NULL;)
+	for_each_child_of_node(smu, sensors)
 		if (of_node_name_eq(sensors, "sensors"))
 			break;
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH -next] powerpc/powernv: fix wrong warning message in opalcore_config_init()
From: Qinglang Miao @ 2020-09-16  6:21 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel, Qinglang Miao

The logic of the warn output is incorrect. The two args should be
exchanged.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
---
 arch/powerpc/platforms/powernv/opal-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c
index 6dba3b622..23571f0b5 100644
--- a/arch/powerpc/platforms/powernv/opal-core.c
+++ b/arch/powerpc/platforms/powernv/opal-core.c
@@ -510,7 +510,7 @@ static void __init opalcore_config_init(void)
 	idx = be32_to_cpu(opalc_metadata->region_cnt);
 	if (idx > MAX_PT_LOAD_CNT) {
 		pr_warn("WARNING: OPAL regions count (%d) adjusted to limit (%d)",
-			MAX_PT_LOAD_CNT, idx);
+			idx, MAX_PT_LOAD_CNT);
 		idx = MAX_PT_LOAD_CNT;
 	}
 	for (i = 0; i < idx; i++) {
-- 
2.23.0


^ permalink raw reply related

* [PATCH -next] serial: pmac_zilog: use for_each_child_of_node() macro
From: Qinglang Miao @ 2020-09-16  6:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Qinglang Miao, linuxppc-dev, linux-kernel, linux-serial

Use for_each_child_of_node() macro instead of open coding it.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
---
 drivers/tty/serial/pmac_zilog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/pmac_zilog.c b/drivers/tty/serial/pmac_zilog.c
index 96e7aa479..063484b22 100644
--- a/drivers/tty/serial/pmac_zilog.c
+++ b/drivers/tty/serial/pmac_zilog.c
@@ -1644,7 +1644,7 @@ static int __init pmz_probe(void)
 		 * TODO: Add routines with proper locking to do that...
 		 */
 		node_a = node_b = NULL;
-		for (np = NULL; (np = of_get_next_child(node_p, np)) != NULL;) {
+		for_each_child_of_node(node_p, np) {
 			if (of_node_name_prefix(np, "ch-a"))
 				node_a = of_node_get(np);
 			else if (of_node_name_prefix(np, "ch-b"))
-- 
2.23.0


^ permalink raw reply related

* [PATCH] fsl: imx-audmix : Replace seq_printf with seq_puts
From: Xu Wang @ 2020-09-16  6:14 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, shengjiu.wang,
	lgirdwood, broonie, perex, tiwai, shawnguo, s.hauer, kernel,
	linux-imx, alsa-devel, linuxppc-dev, linux-arm-kernel
  Cc: linux-kernel, Xu Wang

A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "devm_kcalloc".

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
---
 sound/soc/fsl/imx-audmix.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c
index 202fb8950078..cbdc0a2c09c5 100644
--- a/sound/soc/fsl/imx-audmix.c
+++ b/sound/soc/fsl/imx-audmix.c
@@ -185,20 +185,20 @@ static int imx_audmix_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	priv->num_dai = 2 * num_dai;
-	priv->dai = devm_kzalloc(&pdev->dev, priv->num_dai *
+	priv->dai = devm_kcalloc(&pdev->dev, priv->num_dai,
 				 sizeof(struct snd_soc_dai_link), GFP_KERNEL);
 	if (!priv->dai)
 		return -ENOMEM;
 
 	priv->num_dai_conf = num_dai;
-	priv->dai_conf = devm_kzalloc(&pdev->dev, priv->num_dai_conf *
+	priv->dai_conf = devm_kcalloc(&pdev->dev, priv->num_dai_conf,
 				      sizeof(struct snd_soc_codec_conf),
 				      GFP_KERNEL);
 	if (!priv->dai_conf)
 		return -ENOMEM;
 
 	priv->num_dapm_routes = 3 * num_dai;
-	priv->dapm_routes = devm_kzalloc(&pdev->dev, priv->num_dapm_routes *
+	priv->dapm_routes = devm_kcalloc(&pdev->dev, priv->num_dapm_routes,
 					 sizeof(struct snd_soc_dapm_route),
 					 GFP_KERNEL);
 	if (!priv->dapm_routes)
@@ -208,7 +208,7 @@ static int imx_audmix_probe(struct platform_device *pdev)
 		struct snd_soc_dai_link_component *dlc;
 
 		/* for CPU/Codec/Platform x 2 */
-		dlc = devm_kzalloc(&pdev->dev, 6 * sizeof(*dlc), GFP_KERNEL);
+		dlc = devm_kcalloc(&pdev->dev, 6, sizeof(*dlc), GFP_KERNEL);
 		if (!dlc) {
 			dev_err(&pdev->dev, "failed to allocate dai_link\n");
 			return -ENOMEM;
-- 
2.17.1


^ permalink raw reply related

* RE: [PATCH v1] soc: fsl: rcpm: Add ACPI support
From: Ran Wang @ 2020-09-16  6:40 UTC (permalink / raw)
  To: Ard Biesheuvel, kuldip dwivedi, Leo Li,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
  Cc: Biwen Li, Samer El-Haj-Mahmoud, Arokia Samy, Paul Yang,
	Varun Sethi, tanveer
In-Reply-To: <caf01871-1c3d-bdf8-867d-daf7138966a8@arm.com>

Hi Ard,

On Wednesday, September 16, 2020 2:11 PM, Ard Biesheuvel wrote:
> Subject: Re: [PATCH v1] soc: fsl: rcpm: Add ACPI support
> 
> On 9/16/20 3:32 AM, Ran Wang wrote:
> > Hi Ard,
> >
> > On Tuesday, September 15, 2020 7:10 PM, Ard Biesheuvel wrote:
> >> Subject: Re: [PATCH v1] soc: fsl: rcpm: Add ACPI support
> >>
> >> On 9/15/20 1:06 PM, kuldip dwivedi wrote:
> >>> Add ACPI support in fsl RCPM driver. This is required to support
> >>> ACPI
> >>> S3 state. S3 is the ACPI sleep state that is known as "sleep" or
> >>> "suspend to RAM".
> >>> It essentially turns off most power of the system but keeps memory
> >>> powered.
> >>>
> >>> Signed-off-by: tanveer <tanveer.alam@puresoftware.com>
> >>> Signed-off-by: kuldip dwivedi <kuldip.dwivedi@puresoftware.com>
> >>
> >> Why does the OS need to program this device? Can't this be done by
> >> firmware?
> >
> > This device is use to tell HW which IP (such as USB, SDHC, SATA, etc)
> > should not be clock gated during system enter low power state (to
> > allow that IP work as a wakeup source). And user does this configuration in
> device tree.
> 
> The point of ACPI is *not* to describe a DT topology using a table format that
> is not suited for it. The point of ACPI is to describe a machine that is more
> abstracted from the hardware than is typically possible with DT, where the
> abstractions are implemented by AML code that is provided by the firmware,
> but executed in the context of the OS.
> 
> So the idea is *not* finding the shortest possible path to get your existing DT
> driver code running on a system that boots via ACPI.
> Instead, you should carefully think about the abstract ACPI machine that you
> will expose to the OS, and hide everything else in firmware.
> 
> In this particular case, it seems like your USB, SDHC and SATA device objects
> may need power state dependent AML methods that program this block
> directly.

Actually the scenario is a little bit complicated for RCPM function: it need to query
kernel wakeup source framework (see for_each_wakeup_source(ws)) to fetch all
potential candidates then do the programming accordingly. If we implement
this logic in AML methods, I have no idea how to collect those information stored in
wakeup source data of kernel.

Regards,
Ran

> 
> 
> > So implement
> > this RCPM driver to do it in kernel rather than firmware.
> >
> > Regards,
> > Ran
> >
> >>> ---
> >>>
> >>> Notes:
> >>>       1. Add ACPI match table
> >>>       2. NXP team members are added for confirming HID changes
> >>>       3. There is only one node in ACPI so no need to check for
> >>>          current device explicitly
> >>>       4. These changes are tested on LX2160A and LS1046A platforms
> >>>
> >>>    drivers/soc/fsl/rcpm.c | 22 +++++++++++++++++++---
> >>>    1 file changed, 19 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c index
> >>> a093dbe6d2cb..e75a436fb159 100644
> >>> --- a/drivers/soc/fsl/rcpm.c
> >>> +++ b/drivers/soc/fsl/rcpm.c
> >>> @@ -2,10 +2,12 @@
> >>>    //
> >>>    // rcpm.c - Freescale QorIQ RCPM driver
> >>>    //
> >>> -// Copyright 2019 NXP
> >>> +// Copyright 2019-2020 NXP
> >>> +// Copyright 2020 Puresoftware Ltd.
> >>>    //
> >>>    // Author: Ran Wang <ran.wang_1@nxp.com>
> >>>
> >>> +#include <linux/acpi.h>
> >>>    #include <linux/init.h>
> >>>    #include <linux/module.h>
> >>>    #include <linux/platform_device.h> @@ -57,8 +59,13 @@ static int
> >>> rcpm_pm_prepare(struct device *dev)
> >>>    				rcpm->wakeup_cells + 1);
> >>>
> >>>    		/*  Wakeup source should refer to current rcpm device */
> >>> -		if (ret || (np->phandle != value[0]))
> >>> -			continue;
> >>> +		if (is_acpi_node(dev->fwnode)) {
> >>> +			if (ret)
> >>> +				continue;
> >>> +		} else {
> >>> +			if (ret || (np->phandle != value[0]))
> >>> +				continue;
> >>> +		}
> >>>
> >>>    		/* Property "#fsl,rcpm-wakeup-cells" of rcpm node defines the
> >>>    		 * number of IPPDEXPCR register cells, and "fsl,rcpm-wakeup"
> >>> @@ -139,10 +146,19 @@ static const struct of_device_id
> >>> rcpm_of_match[]
> >> = {
> >>>    };
> >>>    MODULE_DEVICE_TABLE(of, rcpm_of_match);
> >>>
> >>> +#ifdef CONFIG_ACPI
> >>> +static const struct acpi_device_id rcpm_acpi_match[] = {
> >>> +	{ "NXP0015", },
> >>> +	{ }
> >>> +};
> >>> +MODULE_DEVICE_TABLE(acpi, rcpm_acpi_match); #endif
> >>> +
> >>>    static struct platform_driver rcpm_driver = {
> >>>    	.driver = {
> >>>    		.name = "rcpm",
> >>>    		.of_match_table = rcpm_of_match,
> >>> +		.acpi_match_table = ACPI_PTR(rcpm_acpi_match),
> >>>    		.pm	= &rcpm_pm_ops,
> >>>    	},
> >>>    	.probe = rcpm_probe,
> >>>
> >


^ permalink raw reply

* Re: Injecting SLB miltihit crashes kernel 5.9.0-rc5
From: Mahesh Jagannath Salgaonkar @ 2020-09-16  7:12 UTC (permalink / raw)
  To: Michal Suchánek, linuxppc-dev
In-Reply-To: <20200915084302.GG29778@kitsune.suse.cz>

On 9/15/20 2:13 PM, Michal Suchánek wrote:
> Hello,
> 
> Using the SLB mutihit injection test module (which I did not write so I
> do not want to post it here) to verify updates on my 5.3 frankernekernel
> I found that the kernel crashes with Oops: kernel bad access.
> 
> I tested on latest upstream kernel build that I have at hand and the
> result is te same (minus the message - nothing was logged and the kernel
> simply rebooted).


Yes, SLB multihit recovery is broken upstream. Fix is on the way.


> 
> Since the whole effort to write a real mode MCE handler was supposed to
> prevent this maybe the SLB injection module should be added to the
> kernel selftests?

Yes. We are working on adding SLB injection selftest patches will be
posted soon.

Thanks,
-Mahesh.

> 
> Thanks
> 
> Michal
> 


^ permalink raw reply

* Re: [PATCH v3] pseries/hotplug-memory: hot-add: skip redundant LMB lookup
From: David Hildenbrand @ 2020-09-16  7:39 UTC (permalink / raw)
  To: Scott Cheloha, linuxppc-dev
  Cc: Nathan Lynch, Michal Suchanek, Laurent Dufour, Rick Lindsley
In-Reply-To: <20200915194647.3334645-1-cheloha@linux.ibm.com>

On 15.09.20 21:46, Scott Cheloha wrote:
> During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid()
> to determine which node id (nid) to use when later calling __add_memory().
> 
> This is wasteful.  On pseries, memory_add_physaddr_to_nid() finds an
> appropriate nid for a given address by looking up the LMB containing the
> address and then passing that LMB to of_drconf_to_nid_single() to get the
> nid.  In dlpar_add_lmb() we get this address from the LMB itself.
> 
> In short, we have a pointer to an LMB and then we are searching for
> that LMB *again* in order to find its nid.
> 
> If we call of_drconf_to_nid_single() directly from dlpar_add_lmb() we
> can skip the redundant lookup.  The only error handling we need to
> duplicate from memory_add_physaddr_to_nid() is the fallback to the
> default nid when drconf_to_nid_single() returns -1 (NUMA_NO_NODE) or
> an invalid nid.
> 
> Skipping the extra lookup makes hot-add operations faster, especially
> on machines with many LMBs.
> 
> Consider an LPAR with 126976 LMBs.  In one test, hot-adding 126000
> LMBs on an upatched kernel took ~3.5 hours while a patched kernel
> completed the same operation in ~2 hours:
> 
> Unpatched (12450 seconds):
> Sep  9 04:06:31 ltc-brazos1 drmgr[810169]: drmgr: -c mem -a -q 126000
> Sep  9 04:06:31 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
> [...]
> Sep  9 07:34:01 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
> 
> Patched (7065 seconds):
> Sep  8 21:49:57 ltc-brazos1 drmgr[877703]: drmgr: -c mem -a -q 126000
> Sep  8 21:49:57 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
> [...]
> Sep  8 23:27:42 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
> 
> It should be noted that the speedup grows more substantial when
> hot-adding LMBs at the end of the drconf range.  This is because we
> are skipping a linear LMB search.
> 
> To see the distinction, consider smaller hot-add test on the same
> LPAR.  A perf-stat run with 10 iterations showed that hot-adding 4096
> LMBs completed less than 1 second faster on a patched kernel:
> 
> Unpatched:
>  Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
> 
>         104,753.42 msec task-clock                #    0.992 CPUs utilized            ( +-  0.55% )
>              4,708      context-switches          #    0.045 K/sec                    ( +-  0.69% )
>              2,444      cpu-migrations            #    0.023 K/sec                    ( +-  1.25% )
>                394      page-faults               #    0.004 K/sec                    ( +-  0.22% )
>    445,902,503,057      cycles                    #    4.257 GHz                      ( +-  0.55% )  (66.67%)
>      8,558,376,740      stalled-cycles-frontend   #    1.92% frontend cycles idle     ( +-  0.88% )  (49.99%)
>    300,346,181,651      stalled-cycles-backend    #   67.36% backend cycles idle      ( +-  0.76% )  (50.01%)
>    258,091,488,691      instructions              #    0.58  insn per cycle
>                                                   #    1.16  stalled cycles per insn  ( +-  0.22% )  (66.67%)
>     70,568,169,256      branches                  #  673.660 M/sec                    ( +-  0.17% )  (50.01%)
>      3,100,725,426      branch-misses             #    4.39% of all branches          ( +-  0.20% )  (49.99%)
> 
>            105.583 +- 0.589 seconds time elapsed  ( +-  0.56% )
> 
> Patched:
>  Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
> 
>         104,055.69 msec task-clock                #    0.993 CPUs utilized            ( +-  0.32% )
>              4,606      context-switches          #    0.044 K/sec                    ( +-  0.20% )
>              2,463      cpu-migrations            #    0.024 K/sec                    ( +-  0.93% )
>                394      page-faults               #    0.004 K/sec                    ( +-  0.25% )
>    442,951,129,921      cycles                    #    4.257 GHz                      ( +-  0.32% )  (66.66%)
>      8,710,413,329      stalled-cycles-frontend   #    1.97% frontend cycles idle     ( +-  0.47% )  (50.06%)
>    299,656,905,836      stalled-cycles-backend    #   67.65% backend cycles idle      ( +-  0.39% )  (50.02%)
>    252,731,168,193      instructions              #    0.57  insn per cycle
>                                                   #    1.19  stalled cycles per insn  ( +-  0.20% )  (66.66%)
>     68,902,851,121      branches                  #  662.173 M/sec                    ( +-  0.13% )  (49.94%)
>      3,100,242,882      branch-misses             #    4.50% of all branches          ( +-  0.15% )  (49.98%)
> 
>            104.829 +- 0.325 seconds time elapsed  ( +-  0.31% )
> 
> This is consistent.  An add-by-count hot-add operation adds LMBs
> greedily, so LMBs near the start of the drconf range are considered
> first.  On an otherwise idle LPAR with so many LMBs we would expect to
> find the LMBs we need near the start of the drconf range, hence the
> smaller speedup.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>


Hi Scott,

IIRC, ppc DLPAR does a single add_memory() for each LMB (16 MB). With
tons of LMBs, this will also make /proc/iomem explode in size (using a a
list-based tree), making traversal significantly slower e.g., on
insertions and system ram walks.

I was wondering if you would get another performance boost under ppc
when using MEMHP_MERGE_RESOURCE [1]. AFAIKs, the resource boundaries are
not of interest. No guarantees, might be worth a try.

Did you investigate what else makes memory hotplug that slow? (126000
LMBs correspond to roughly 2TB, that shouldn't take 2 hours ...) Memory
block devices might still be a slowdown (although we have an xarray in
place now that takes care of most pain).

[1]
https://lore.kernel.org/linux-mm/20200911103459.10306-1-david@redhat.com/

-- 
Thanks,

David / dhildenb


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox