LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Matthew Wilcox @ 2020-09-20 19:22 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, io-uring, linux-arm-kernel,
	Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920191031.GQ3421308@ZenIV.linux.org.uk>

On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> IMO it's much saner to mark those and refuse to touch them from io_uring...

Simpler solution is to remove io_uring from the 32-bit syscall list.
If you're a 32-bit process, you don't get to use io_uring.  Would
any real users actually care about that?

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Andy Lutomirski @ 2020-09-20 19:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Arnd Bergmann, linux-block, Al Viro,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <20200920192259.GU32101@casper.infradead.org>

On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > IMO it's much saner to mark those and refuse to touch them from io_uring...
>
> Simpler solution is to remove io_uring from the 32-bit syscall list.
> If you're a 32-bit process, you don't get to use io_uring.  Would
> any real users actually care about that?

We could go one step farther and declare that we're done adding *any*
new compat syscalls :)



-- 
Andy Lutomirski
AMA Capital Management, LLC

* Re: [PATCH] soc: fsl: dpio: remove set but not used 'addr_cena'
From: Krzysztof Kozlowski @ 2020-09-20 20:19 UTC (permalink / raw)
  To: Jason Yan
  Cc: Roy.Pledge, linux-kernel@vger.kernel.org, youri.querry_1,
	Hulk Robot, leoyang.li, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200910140415.1132266-1-yanaijie@huawei.com>

On Thu, 10 Sep 2020 at 16:57, Jason Yan <yanaijie@huawei.com> wrote:
>
> This addresses the following gcc warning with "make W=1":
>
> drivers/soc/fsl/dpio/qbman-portal.c: In function
> ‘qbman_swp_enqueue_multiple_direct’:
> drivers/soc/fsl/dpio/qbman-portal.c:650:11: warning: variable
> ‘addr_cena’ set but not used [-Wunused-but-set-variable]
>   650 |  uint64_t addr_cena;
>       |           ^~~~~~~~~
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Jason Yan <yanaijie@huawei.com>

This was already reported:
Reported-by: kernel test robot <lkp@intel.com>
https://lkml.org/lkml/2020/6/12/290

Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>

Best regards,
Krzysztof

> ---
>  drivers/soc/fsl/dpio/qbman-portal.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
> index 0ab85bfb116f..659b4a570d5b 100644
> --- a/drivers/soc/fsl/dpio/qbman-portal.c
> +++ b/drivers/soc/fsl/dpio/qbman-portal.c
> @@ -647,7 +647,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
>         const uint32_t *cl = (uint32_t *)d;
>         uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
>         int i, num_enqueued = 0;
> -       uint64_t addr_cena;
>
>         spin_lock(&s->access_spinlock);
>         half_mask = (s->eqcr.pi_ci_mask>>1);
> @@ -701,7 +700,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
>
>         /* Flush all the cacheline without load/store in between */
>         eqcr_pi = s->eqcr.pi;
> -       addr_cena = (size_t)s->addr_cena;
>         for (i = 0; i < num_enqueued; i++)
>                 eqcr_pi++;
>         s->eqcr.pi = eqcr_pi & full_mask;
> --
> 2.25.4
>

* [PATCH] soc: fsl: qbman: Fix return value on success
From: Krzysztof Kozlowski @ 2020-09-20 20:26 UTC (permalink / raw)
  To: Li Yang, Roy Pledge, linuxppc-dev, linux-arm-kernel, linux-kernel
  Cc: Krzysztof Kozlowski

On error the function was meant to return -ERRNO, not 0.  This also fixes
a compile warning:

  drivers/soc/fsl/qbman/bman.c:640:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]

Fixes: 0505d00c8dba ("soc/fsl/qbman: Cleanup buffer pools if BMan was initialized prior to bootup")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
---
 drivers/soc/fsl/qbman/bman.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index f4fb527d8301..c5dd026fe889 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -660,7 +660,7 @@ int bm_shutdown_pool(u32 bpid)
 	}
 done:
 	put_affine_portal();
-	return 0;
+	return err;
 }
 
 struct gen_pool *bm_bpalloc;
-- 
2.17.1
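
The pattern the fix restores can be shown in isolation.  This is a minimal
userspace sketch, not the driver code: get_portal(), put_portal() and
shutdown_pool() are hypothetical stand-ins, and the 'fail' argument only
simulates an error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for acquiring/releasing a portal. */
static int portal_refs;

static void get_portal(void)
{
	portal_refs++;
}

static void put_portal(void)
{
	assert(portal_refs > 0);
	portal_refs--;
}

/*
 * Drain a hypothetical pool.  'fail' simulates a hardware error.
 * Every exit goes through 'done' so the portal is always released,
 * and the function returns 'err' rather than a hard-coded 0.
 */
static int shutdown_pool(int fail)
{
	int err = 0;

	get_portal();
	if (fail) {
		err = -EIO;
		goto done;
	}
	/* ... the normal drain work would go here ... */
done:
	put_portal();
	return err;
}
```

With a hard-coded "return 0" after the label, the error path would release
the portal correctly but report success to the caller.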


* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Arnd Bergmann @ 2020-09-20 20:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Matthew Wilcox, linux-block, Al Viro,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <CALCETrXVtBkxNJcMxf9myaKT9snHKbCWUenKHGRfp8AOtORBPg@mail.gmail.com>

On Sun, Sep 20, 2020 at 9:28 PM Andy Lutomirski <luto@kernel.org> wrote:
> On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > > IMO it's much saner to mark those and refuse to touch them from io_uring...
> >
> > Simpler solution is to remove io_uring from the 32-bit syscall list.
> > If you're a 32-bit process, you don't get to use io_uring.  Would
> > any real users actually care about that?
>
> We could go one step farther and declare that we're done adding *any*
> new compat syscalls :)

Would you also stop adding system calls to native 32-bit systems then?

On memory constrained systems (less than 2GB a.t.m.), there is still a
strong demand for running 32-bit user space, but all of the recent Arm
cores (after Cortex-A55) dropped the ability to run 32-bit kernels, so
that compat mode may eventually become the primary way to run
Linux on cheap embedded systems.

I don't think there is any chance we can realistically take away io_uring
from the 32-bit ABI any more now.

      Arnd

* RE: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: David Laight @ 2020-09-20 21:13 UTC (permalink / raw)
  To: 'Arnd Bergmann', Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM,
	keyrings@vger.kernel.org, sparclinux, Christoph Hellwig,
	linux-arch, linux-s390, Linux SCSI List, X86 ML, Matthew Wilcox,
	linux-block, Al Viro, io-uring@vger.kernel.org, linux-arm-kernel,
	Jens Axboe, Parisc List, Network Development, LKML, LSM List,
	Linux FS Devel, Andrew Morton, linuxppc-dev
In-Reply-To: <CAK8P3a37BRFj_qg61gP2oVrjJzBrZ58y1vggeTk_5n55Ou5U2Q@mail.gmail.com>

From: Arnd Bergmann
> Sent: 20 September 2020 21:49
> 
> On Sun, Sep 20, 2020 at 9:28 PM Andy Lutomirski <luto@kernel.org> wrote:
> > On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > > > IMO it's much saner to mark those and refuse to touch them from io_uring...
> > >
> > > Simpler solution is to remove io_uring from the 32-bit syscall list.
> > > If you're a 32-bit process, you don't get to use io_uring.  Would
> > > any real users actually care about that?
> >
> > We could go one step farther and declare that we're done adding *any*
> > new compat syscalls :)
> 
> Would you also stop adding system calls to native 32-bit systems then?
> 
> On memory constrained systems (less than 2GB a.t.m.), there is still a
> strong demand for running 32-bit user space, but all of the recent Arm
> cores (after Cortex-A55) dropped the ability to run 32-bit kernels, so
> that compat mode may eventually become the primary way to run
> Linux on cheap embedded systems.
> 
> I don't think there is any chance we can realistically take away io_uring
> from the 32-bit ABI any more now.

Can't it just run requests from 32bit apps in a kernel thread that has
the 'in_compat_syscall' flag set?
Not that I recall seeing the code where it saves the 'compat' nature
of any requests.

It is already completely f*cked if you try to pass the command ring
to a child process - it uses the wrong 'mm'.
I suspect there are some really horrid security holes in that area.

	David.

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Al Viro @ 2020-09-20 21:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, io-uring, linux-arm-kernel,
	Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920192259.GU32101@casper.infradead.org>

On Sun, Sep 20, 2020 at 08:22:59PM +0100, Matthew Wilcox wrote:
> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > IMO it's much saner to mark those and refuse to touch them from io_uring...
> 
> Simpler solution is to remove io_uring from the 32-bit syscall list.
> If you're a 32-bit process, you don't get to use io_uring.  Would
> any real users actually care about that?

What for?  I mean, is there any reason to try and keep those bugs as
first-class citizens?  IDGI...  Yes, we have several special files
(out of thousands) that have read()/write() user-visible semantics
broken wrt 32bit/64bit.  And we have to keep them working that way
for existing syscalls.  Why would we want to pretend that their
behaviour is normal and isn't an ABI bug, not to be repeated for
anything new?
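
For readers less familiar with this class of bug: a record whose fields are
word-sized has a different layout for 32-bit and 64-bit callers, so whoever
decodes it must know which ABI the writer used.  A minimal userspace sketch
follows; the struct and function names are hypothetical and not taken from
any of the special files being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record as a 64-bit caller would lay it out. */
struct rec_native {
	uint64_t addr;
	uint64_t len;
};

/* The same record as a 32-bit (compat) caller would lay it out. */
struct rec_compat {
	uint32_t addr;
	uint32_t len;
};

/*
 * Decode one record from a raw buffer.  'compat' says which layout the
 * writer used; guessing wrong misreads both fields.
 * Returns the number of bytes consumed, or 0 if the buffer is too short.
 */
static size_t rec_decode(const void *buf, size_t size, int compat,
			 struct rec_native *out)
{
	assert(buf && out);

	if (compat) {
		struct rec_compat c;

		if (size < sizeof(c))
			return 0;
		memcpy(&c, buf, sizeof(c));
		out->addr = c.addr;
		out->len = c.len;
		return sizeof(c);
	}

	if (size < sizeof(*out))
		return 0;
	memcpy(out, buf, sizeof(*out));
	return sizeof(*out);
}
```

The 'compat' argument is the piece of context that is easy to lose once the
read or write is issued from a thread other than the one that made the
original system call.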

* [PATCH] fsl: imx-audmix : Use devm_kcalloc() instead of devm_kzalloc()
From: Xu Wang @ 2020-09-21  1:59 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, shengjiu.wang,
	lgirdwood, broonie, perex, tiwai, shawnguo, s.hauer, kernel,
	linux-imx, alsa-devel, linuxppc-dev, linux-arm-kernel
  Cc: linux-kernel, Xu Wang

Each of these allocation sizes is computed by multiplying an element
count by an element size, which means an array is being allocated.
Use the corresponding array allocator, devm_kcalloc(), instead.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
---
 sound/soc/fsl/imx-audmix.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c
index 202fb8950078..cbdc0a2c09c5 100644
--- a/sound/soc/fsl/imx-audmix.c
+++ b/sound/soc/fsl/imx-audmix.c
@@ -185,20 +185,20 @@ static int imx_audmix_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	priv->num_dai = 2 * num_dai;
-	priv->dai = devm_kzalloc(&pdev->dev, priv->num_dai *
+	priv->dai = devm_kcalloc(&pdev->dev, priv->num_dai,
 				 sizeof(struct snd_soc_dai_link), GFP_KERNEL);
 	if (!priv->dai)
 		return -ENOMEM;
 
 	priv->num_dai_conf = num_dai;
-	priv->dai_conf = devm_kzalloc(&pdev->dev, priv->num_dai_conf *
+	priv->dai_conf = devm_kcalloc(&pdev->dev, priv->num_dai_conf,
 				      sizeof(struct snd_soc_codec_conf),
 				      GFP_KERNEL);
 	if (!priv->dai_conf)
 		return -ENOMEM;
 
 	priv->num_dapm_routes = 3 * num_dai;
-	priv->dapm_routes = devm_kzalloc(&pdev->dev, priv->num_dapm_routes *
+	priv->dapm_routes = devm_kcalloc(&pdev->dev, priv->num_dapm_routes,
 					 sizeof(struct snd_soc_dapm_route),
 					 GFP_KERNEL);
 	if (!priv->dapm_routes)
@@ -208,7 +208,7 @@ static int imx_audmix_probe(struct platform_device *pdev)
 		struct snd_soc_dai_link_component *dlc;
 
 		/* for CPU/Codec/Platform x 2 */
-		dlc = devm_kzalloc(&pdev->dev, 6 * sizeof(*dlc), GFP_KERNEL);
+		dlc = devm_kcalloc(&pdev->dev, 6, sizeof(*dlc), GFP_KERNEL);
 		if (!dlc) {
 			dev_err(&pdev->dev, "failed to allocate dai_link\n");
 			return -ENOMEM;
-- 
2.17.1
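
The usual motivation for an array allocator that takes the count and the
element size separately is that it can reject a product that would overflow.
A minimal userspace model of that check, assuming nothing about the kernel
implementation (array_zalloc() is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace model of an array allocator: take the element count and
 * element size separately and fail if their product would overflow,
 * instead of letting "n * size" wrap around to a small number.
 * The returned memory is zeroed.
 */
static void *array_zalloc(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size does not fit in size_t */
	return calloc(n, size);
}
```

With an open-coded "n * size" passed to a plain zeroing allocator, the same
overflow would silently yield a buffer that is too small.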


* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Christoph Hellwig @ 2020-09-21  4:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Matthew Wilcox, Arnd Bergmann,
	linux-block, Al Viro, io-uring, linux-arm-kernel, Jens Axboe,
	Parisc List, Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <CALCETrWHW4wHG+Z-mxsY-kvjSi+S6gRUQ+LHd9syPcm5bhi3cw@mail.gmail.com>

On Sun, Sep 20, 2020 at 12:14:49PM -0700, Andy Lutomirski wrote:
> I wonder if this is really quite cast in stone.  We could also have
> FMODE_SHITTY_COMPAT and set that when a file like this is *opened* in
> compat mode.  Then that particular struct file would be read and
> written using the compat data format.  The change would be
> user-visible, but the user that would see it would be very strange
> indeed.
> 
> I don't have a strong opinion as to whether that is better or worse
> than denying io_uring access to these things, but at least it moves
> the special case out of io_uring.

open could have happened through an io_uring thread as well, so I don't
see how this would do anything but move the problem to a different
place.

> 
> --Andy
---end quoted text---

* Re: let import_iovec deal with compat_iovecs as well
From: 'Christoph Hellwig' @ 2020-09-21  4:41 UTC (permalink / raw)
  To: David Laight
  Cc: linux-aio@kvack.org, linux-mips@vger.kernel.org, David Howells,
	linux-mm@kvack.org, keyrings@vger.kernel.org,
	sparclinux@vger.kernel.org, 'Christoph Hellwig',
	linux-arch@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-scsi@vger.kernel.org, x86@kernel.org, Arnd Bergmann,
	linux-block@vger.kernel.org, Alexander Viro,
	io-uring@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Jens Axboe, linux-parisc@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2c7bf42ee4314484ae0177280cd8f5f3@AcuMS.aculab.com>

On Sat, Sep 19, 2020 at 02:24:10PM +0000, David Laight wrote:
> I thought about that change while writing my import_iovec() => iovec_import()
> patch - and thought that the io_uring code would (as usual) cause grief.
> 
> Christoph - did you see those patches?

No.

* Re: [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx()
From: Christoph Hellwig @ 2020-09-21  6:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, sparclinux, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, x86, Russell King,
	linux-csky, David Airlie, Mel Gorman, linux-snps-arc,
	linux-xtensa, Paul McKenney, intel-gfx, linuxppc-dev,
	Steven Rostedt, Linus Torvalds, Jani Nikula, Rodrigo Vivi,
	Dietmar Eggemann, Linux ARM, Chris Zankel, Michal Simek,
	Thomas Bogendoerfer, Nick Hu, Linux-MM, Vineet Gupta, LKML,
	Arnd Bergmann, Daniel Vetter, Paul Mackerras, Andrew Morton,
	Daniel Bristot de Oliveira, David S. Miller, Greentime Hu
In-Reply-To: <20200919092615.879315697@linutronix.de>

On Sat, Sep 19, 2020 at 11:17:52AM +0200, Thomas Gleixner wrote:
> Nothing in modules can use that.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic*
From: Christoph Hellwig @ 2020-09-21  6:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, sparclinux, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, x86, Russell King,
	linux-csky, David Airlie, Mel Gorman, linux-snps-arc,
	linux-xtensa, Paul McKenney, intel-gfx, linuxppc-dev,
	Steven Rostedt, Linus Torvalds, Jani Nikula, Rodrigo Vivi,
	Dietmar Eggemann, Linux ARM, Chris Zankel, Michal Simek,
	Thomas Bogendoerfer, Nick Hu, Linux-MM, Vineet Gupta, LKML,
	Arnd Bergmann, Daniel Vetter, Paul Mackerras, Andrew Morton,
	Daniel Bristot de Oliveira, David S. Miller, Greentime Hu
In-Reply-To: <20200919092615.990731525@linutronix.de>

> +# ifndef ARCH_NEEDS_KMAP_HIGH_GET
> +static inline void *arch_kmap_temporary_high_get(struct page *page)
> +{
> +	return NULL;
> +}
> +# endif

Turn this into a macro and use #ifndef on the symbol name?

> +static inline void __kunmap_atomic(void *addr)
> +{
> +	kumap_atomic_indexed(addr);
> +}
> +
> +
> +#endif /* CONFIG_KMAP_ATOMIC_GENERIC */

Strange double empty line above the endif.

> -#define kunmap_atomic(addr)                                     \
> -do {                                                            \
> -	BUILD_BUG_ON(__same_type((addr), struct page *));       \
> -	kunmap_atomic_high(addr);                                  \
> -	pagefault_enable();                                     \
> -	preempt_enable();                                       \
> -} while (0)
> -
> +#define kunmap_atomic(addr)						\
> +	do {								\
> +		BUILD_BUG_ON(__same_type((addr), struct page *));	\
> +		__kunmap_atomic(addr);					\
> +		preempt_enable();					\
> +	} while (0)

Why the strange re-indent to a form that is much less common and less
readable?

> +void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
> +{
> +	pagefault_disable();
> +	return __kmap_atomic_pfn_prot(pfn, prot);
> +}
> +EXPORT_SYMBOL(kmap_atomic_pfn_prot);

The existing kmap_atomic_pfn & co implementation is EXPORT_SYMBOL_GPL,
and this stuff should preferably stay that way.
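
One common form of the "#ifndef on the symbol name" idiom suggested for
arch_kmap_temporary_high_get() is sketched below.  This is only an
illustration of the idiom in a userspace-compilable form, not a proposal
for the actual header; 'struct page' is left opaque.

```c
#include <assert.h>
#include <stddef.h>

struct page;	/* opaque here; only pointers to it are used */

/*
 * An architecture with its own implementation would provide it in its
 * header and add
 *	#define arch_kmap_temporary_high_get arch_kmap_temporary_high_get
 * so that the generic fallback below is skipped.
 */
#ifndef arch_kmap_temporary_high_get
static inline void *arch_kmap_temporary_high_get(struct page *page)
{
	(void)page;
	return NULL;
}
#define arch_kmap_temporary_high_get arch_kmap_temporary_high_get
#endif
```

This removes the need for a separate ARCH_NEEDS_* symbol; the fallback could
equally be a function-like macro that evaluates to NULL.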

* [PATCH V2] powerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraints
From: Athira Rajeev @ 2020-09-21  7:10 UTC (permalink / raw)
  To: mpe; +Cc: maddy, linuxppc-dev

PMU counter support functions enforce event constraints for a group of
events to check if all events in the group can be monitored. In case of
event codes using PMC5 and PMC6 (500fa and 600f4 respectively),
not all constraints are applicable, for example the threshold or sample
bits. But the current code includes PMC5 and PMC6 in some group constraints
(like the IC_DC Qualifier bits) which are not applicable to them, and hence
results in those events not getting counted when scheduled along with a
group of other events. Fix this by excluding PMC5/6 from constraints
which are not relevant to them.

Fixes: 7ffd948 ("powerpc/perf: factor out power8 pmu functions")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
Changes in v2:
- Added a block comment in the fix path explaining
  why the change is needed.

 arch/powerpc/perf/isa207-common.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 964437a..12153da 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -288,6 +288,15 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 
 		mask  |= CNST_PMC_MASK(pmc);
 		value |= CNST_PMC_VAL(pmc);
+
+		/*
+		 * PMC5 and PMC6 are used to count cycles and instructions
+		 * and these do not support most of the constraint bits.
+		 * Add a check to exclude PMC5/6 from most of the constraints
+		 * except for ebb/bhrb.
+		 */
+		if (pmc >= 5)
+			goto ebb_bhrb;
 	}
 
 	if (pmc <= 4) {
@@ -357,6 +366,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 		}
 	}
 
+ebb_bhrb:
 	if (!pmc && ebb)
 		/* EBB events must specify the PMC */
 		return -1;
-- 
1.8.3.1


* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
From: Thomas Gleixner @ 2020-09-21  7:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, linux-sparc, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, the arch/x86 maintainers,
	Russell King, linux-csky, David Airlie, Mel Gorman,
	open list:SYNOPSYS ARC ARCHITECTURE, linux-xtensa, Paul McKenney,
	intel-gfx, linuxppc-dev, Steven Rostedt, Jani Nikula,
	Rodrigo Vivi, Dietmar Eggemann, Linux ARM, Chris Zankel,
	Michal Simek, Thomas Bogendoerfer, Nick Hu, Linux-MM,
	Vineet Gupta, LKML, Arnd Bergmann, Daniel Vetter, Paul Mackerras,
	Andrew Morton, Daniel Bristot de Oliveira, David S. Miller,
	Greentime Hu
In-Reply-To: <CAHk-=wgF-upZVpqJWK=TK7MS9H-Rp1ZxGfOG+dDW=JThtxAzVQ@mail.gmail.com>

On Sun, Sep 20 2020 at 10:42, Linus Torvalds wrote:
> On Sun, Sep 20, 2020 at 10:40 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> I think the more obvious solution is to split the whole exercise:
>>
>>   schedule()
>>      prepare_switch()
>>         unmap()
>>
>>     switch_to()
>>
>>     finish_switch()
>>         map()
>
> Yeah, that looks much easier to explain. Ack.

So far so good, but Peter Z. just pointed out to me that I completely
missed the fact that this cannot work.

If a task is migrated to a different CPU then the mapping address will
change which will explode in colourful ways.

On RT kernels this works because we pin the task to the CPU via
migrate_disable(). On a !RT kernel migrate_disable() maps to
preempt_disable() which brings us back to square one.

/me goes back to the drawing board.

Thanks,

        tglx
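
To illustrate why migration breaks the scheme: assume, as a simplified model,
that every CPU owns a private window of temporary-mapping slots, so the
virtual address of a slot is a function of the CPU number.  The constants and
the function name below are hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for this model; real values are per-arch. */
#define MODEL_PAGE_SIZE		4096UL
#define MODEL_SLOTS_PER_CPU	16UL
#define MODEL_MAP_BASE		0xff000000UL

/*
 * Address of temporary-mapping slot 'slot' on CPU 'cpu'.  Each CPU
 * owns a private window of slots, so the result depends on the CPU.
 */
static uintptr_t model_kmap_addr(unsigned int cpu, unsigned int slot)
{
	assert(slot < MODEL_SLOTS_PER_CPU);
	return MODEL_MAP_BASE +
	       (cpu * MODEL_SLOTS_PER_CPU + slot) * MODEL_PAGE_SIZE;
}
```

A task that was handed the address for (cpu 0, slot 3) and is then scheduled
in on cpu 1 would have its mapping re-established at a different address,
while it still holds the old pointer.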

* Re: [PATCH v2 1/4] selftests/seccomp: Record syscall during ptrace entry
From: Christian Brauner @ 2020-09-21  7:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200919080637.259478-2-keescook@chromium.org>

On Sat, Sep 19, 2020 at 01:06:34AM -0700, Kees Cook wrote:
> In preparation for performing actions during ptrace syscall exit, save
> the syscall number during ptrace syscall entry. Some architectures do
> not have the syscall number available during ptrace syscall exit.
> 
> Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-cascardo@canonical.com/
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 40 +++++++++++++------
>  1 file changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index bc0fb463c709..c0311b4c736b 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -1949,12 +1949,19 @@ void tracer_seccomp(struct __test_metadata *_metadata, pid_t tracee,
>  
>  }
>  
> +FIXTURE(TRACE_syscall) {
> +	struct sock_fprog prog;
> +	pid_t tracer, mytid, mypid, parent;
> +	long syscall_nr;
> +};
> +
>  void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
>  		   int status, void *args)
>  {
> -	int ret, nr;
> +	int ret;
>  	unsigned long msg;
>  	static bool entry;
> +	FIXTURE_DATA(TRACE_syscall) *self = args;
>  
>  	/*
>  	 * The traditional way to tell PTRACE_SYSCALL entry/exit
> @@ -1968,24 +1975,31 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
>  	EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
>  			: PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
>  
> -	if (!entry)
> +	/*
> +	 * Some architectures only support setting return values during
> +	 * syscall exit under ptrace, and on exit the syscall number may
> +	 * no longer be available. Therefore, save the initial sycall

s/sycall/syscall/

Otherwise looks good. Thanks!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

> +	 * number here, so it can be examined during both entry and exit
> +	 * phases.
> +	 */
> +	if (entry)
> +		self->syscall_nr = get_syscall(_metadata, tracee);
> +	else
>  		return;
>  
> -	nr = get_syscall(_metadata, tracee);
> -
> -	if (nr == __NR_getpid)
> +	switch (self->syscall_nr) {
> +	case __NR_getpid:
>  		change_syscall(_metadata, tracee, __NR_getppid, 0);
> -	if (nr == __NR_gettid)
> +		break;
> +	case __NR_gettid:
>  		change_syscall(_metadata, tracee, -1, 45000);
> -	if (nr == __NR_openat)
> +		break;
> +	case __NR_openat:
>  		change_syscall(_metadata, tracee, -1, -ESRCH);
> +		break;
> +	}
>  }
>  
> -FIXTURE(TRACE_syscall) {
> -	struct sock_fprog prog;
> -	pid_t tracer, mytid, mypid, parent;
> -};
> -
>  FIXTURE_VARIANT(TRACE_syscall) {
>  	/*
>  	 * All of the SECCOMP_RET_TRACE behaviors can be tested with either
> @@ -2044,7 +2058,7 @@ FIXTURE_SETUP(TRACE_syscall)
>  	self->tracer = setup_trace_fixture(_metadata,
>  					   variant->use_ptrace ? tracer_ptrace
>  							       : tracer_seccomp,
> -					   NULL, variant->use_ptrace);
> +					   self, variant->use_ptrace);
>  
>  	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>  	ASSERT_EQ(0, ret);
> -- 
> 2.25.1
> 
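
The idea of the patch, latching the syscall number at the entry stop so it
is still available at the exit stop, can be modelled without ptrace at all.
The struct and function below are hypothetical stand-ins for the fixture
data and the tracer callback, not the selftest code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-test state, standing in for the fixture data. */
struct trace_state {
	bool entry;		/* true while stopped at syscall entry */
	long syscall_nr;	/* number recorded at the last entry stop */
};

/*
 * Called once per ptrace syscall stop.  'nr_now' models what the
 * registers report at this stop; on some architectures it is only
 * meaningful at entry, so it is latched there and reused at exit.
 * Returns the syscall number the caller should act on.
 */
static long on_syscall_stop(struct trace_state *st, long nr_now)
{
	assert(st);

	st->entry = !st->entry;		/* stops alternate entry/exit */
	if (st->entry)
		st->syscall_nr = nr_now;
	return st->syscall_nr;
}
```

Keeping the state in a structure passed through 'args', rather than in a
function-local variable, is what lets both phases see the same value.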

* Re: [PATCH v2 2/4] selftests/seccomp: Allow syscall nr and ret value to be set separately
From: Christian Brauner @ 2020-09-21  7:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200919080637.259478-3-keescook@chromium.org>

On Sat, Sep 19, 2020 at 01:06:35AM -0700, Kees Cook wrote:
> In preparation for setting syscall nr and ret values separately, refactor
> the helpers to take a pointer to a value, so that a NULL can indicate
> "do not change this respective value". This is done to keep the regset
> read/write happening once and in one code path.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
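
The "NULL means do not change" convention described in the commit message
can be sketched as follows.  The register structure and helper name are
hypothetical; the real helpers operate on per-architecture register sets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register snapshot; real layouts are per-architecture. */
struct model_regs {
	long nr;	/* syscall number */
	long ret;	/* syscall return value */
};

/*
 * Update the syscall number and/or return value in one pass.
 * A NULL pointer means "leave this field as it is".
 */
static void model_change_syscall(struct model_regs *regs,
				 const long *nr, const long *ret)
{
	assert(regs);

	if (nr)
		regs->nr = *nr;
	if (ret)
		regs->ret = *ret;
}
```

Passing pointers instead of values avoids reserving a magic number for "no
change", which matters because any long can be a legitimate return value.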

* Re: [PATCH v2 3/4] selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
From: Christian Brauner @ 2020-09-21  7:53 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200919080637.259478-4-keescook@chromium.org>

On Sat, Sep 19, 2020 at 01:06:36AM -0700, Kees Cook wrote:
> Some archs (like powerpc) only support changing the return code during
> syscall exit when ptrace is used. Test entry vs exit phases for which
> portions of the syscall number and return values need to be set at which
> different phases. For non-powerpc, all changes are made during ptrace
> syscall entry, as before. For powerpc, the syscall number is changed at
> ptrace syscall entry and the syscall return value is changed on ptrace
> syscall exit.
> 
> Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-cascardo@canonical.com/
> Fixes: 58d0a862f573 ("seccomp: add tests for ptrace hole")
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

* Re: [PATCH v2 4/4] selftests/clone3: Avoid OS-defined clone_args
From: Christian Brauner @ 2020-09-21  7:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200919080637.259478-5-keescook@chromium.org>

On Sat, Sep 19, 2020 at 01:06:37AM -0700, Kees Cook wrote:
> As the UAPI headers start to appear in distros, we need to avoid outdated
> versions of struct clone_args to be able to test modern features;
> rename to "struct __clone_args". Additionally update the struct size
> macro names to match UAPI names.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good, thanks!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
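
A sketch of what a test-private copy of the structure can look like, so that
the test does not depend on whichever struct clone_args the installed headers
provide.  The field list follows the UAPI layout as I understand it; the
model_aligned_u64 typedef is a local stand-in introduced for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's 8-byte-aligned 64-bit type. */
typedef uint64_t model_aligned_u64 __attribute__((aligned(8)));

/* Test-private copy; the name avoids clashing with the OS headers. */
struct __clone_args {
	model_aligned_u64 flags;
	model_aligned_u64 pidfd;
	model_aligned_u64 child_tid;
	model_aligned_u64 parent_tid;
	model_aligned_u64 exit_signal;
	model_aligned_u64 stack;
	model_aligned_u64 stack_size;
	model_aligned_u64 tls;
#ifndef CLONE_ARGS_SIZE_VER0
#define CLONE_ARGS_SIZE_VER0 64	/* size of the first published struct */
#endif
	model_aligned_u64 set_tid;
	model_aligned_u64 set_tid_size;
#ifndef CLONE_ARGS_SIZE_VER1
#define CLONE_ARGS_SIZE_VER1 80	/* size of the second published struct */
#endif
	model_aligned_u64 cgroup;
#ifndef CLONE_ARGS_SIZE_VER2
#define CLONE_ARGS_SIZE_VER2 88	/* size of the third published struct */
#endif
};
```

Each size macro sits right after the last field that version introduced, so
the value can be checked against the offset of the field that follows it.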

* Re: [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE
From: Frederic Barrat @ 2020-09-21  7:58 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linuxppc-dev, linux-kernel, stable, Andrew Donnellan
In-Reply-To: <20200919181029.GI2431@sasha-vm>



Le 19/09/2020 à 20:10, Sasha Levin a écrit :
> On Fri, Sep 18, 2020 at 08:35:06AM +0200, Frederic Barrat wrote:
>>
>>
>> Le 18/09/2020 à 03:57, Sasha Levin a écrit :
>>> From: Frederic Barrat <fbarrat@linux.ibm.com>
>>>
>>> [ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]
>>>
>>
>> This patch is not desirable for stable, for 5.4 and 4.19 (it was 
>> already flagged by autosel back in April. Not sure why it's showing 
>> again now)
> 
> Hey Fred,
> 
> This was a bit of a "lie", it wasn't a run of AUTOSEL, but rather an
> audit of patches that went into distro/vendor trees but not into the
> upstream stable trees.
> 
> I can see that this patch was pulled into Ubuntu's 5.4 tree, is it not
> needed in the upstream stable tree?


That patch in itself is useless (it replaces a ref counter leak by 
another one). It was part of a longer series that we backported to 
Ubuntu's 5.4 tree.
So it's really not needed on the stable trees. It likely wouldn't hurt 
or break anything, but there's really no point.

   Fred


* [PATCH V2] Doc: admin-guide: Add entry for kvm_cma_resv_ratio kernel param
From: sathnaga @ 2020-09-21  9:02 UTC (permalink / raw)
  To: linux-doc
  Cc: Jonathan Corbet, Randy Dunlap, linux-kernel, kvm-ppc,
	Paul Mackerras, Satheesh Rajendran, linuxppc-dev

From: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>

Add a documentation entry for the kvm_cma_resv_ratio kernel parameter,
which is used to alter the percentage of memory reserved for contiguous
allocation of the hash page tables used by hash-mode PowerPC KVM guests.

Cc: linux-kernel@vger.kernel.org
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
---

V2: 
Addressed review comments from Randy.

V1: https://lkml.org/lkml/2020/9/16/72
---
 Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1068742a6df..932ed45740c9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2258,6 +2258,14 @@
 			[KVM,ARM] Allow use of GICv4 for direct injection of
 			LPIs.
 
+	kvm_cma_resv_ratio=n [PPC]
+			Reserves given percentage from system memory area for
+			contiguous memory allocation for KVM hash pagetable
+			allocation.
+			By default it reserves 5% of total system memory.
+			Format: <integer>
+			Default: 5
+
 	kvm-intel.ept=	[KVM,Intel] Disable extended page tables
 			(virtualized MMU) support on capable Intel chips.
 			Default is 1 (enabled)
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 00/11] Optimization to improve CPU online/offline on Powerpc
From: Srikar Dronamraju @ 2020-09-21  9:56 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling,
	Srikar Dronamraju, Peter Zijlstra, LKML, Nicholas Piggin,
	Valentin Schneider, Oliver O'Halloran, Satheesh Rajendran,
	linuxppc-dev, Ingo Molnar

Here are some optimizations and fixes to make CPU online/offline
faster and hence result in faster bootup.

It's based on top of my v5 coregroup support patchset.
https://lore.kernel.org/linuxppc-dev/20200810071834.92514-1-srikar@linux.vnet.ibm.com/t/#u

Anton reported that his 4096 CPU system (1024 cores in a socket) was
taking too long to boot. His analysis showed that most of the time was
being spent on updating cpu_core_mask.

The first two patches should solve Anton's immediate problem.
With the unofficial patches, Anton reported that the boot time came down
from 30 mins to 6 seconds. (Basically a high core count in a single socket
configuration). Satheesh also reported similar numbers.

The rest are cleanups/optimizations.

Since cpu_core_mask has been an exported symbol for a long time, let's
retain it as a snapshot of cpumask_of_node.

$ lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              1024
On-line CPU(s) list: 0-1023
Thread(s) per core:  8
Core(s) per socket:  8
Socket(s):           16
NUMA node(s):        16
Model:               2.0 (pvr 004d 0200)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
NUMA node2 CPU(s):   128-191
NUMA node3 CPU(s):   192-255
NUMA node4 CPU(s):   256-319
NUMA node5 CPU(s):   320-383
NUMA node6 CPU(s):   384-447
NUMA node7 CPU(s):   448-511
NUMA node8 CPU(s):   512-575
NUMA node9 CPU(s):   576-639
NUMA node10 CPU(s):  640-703
NUMA node11 CPU(s):  704-767
NUMA node12 CPU(s):  768-831
NUMA node13 CPU(s):  832-895
NUMA node14 CPU(s):  896-959
NUMA node15 CPU(s):  960-1023

$ dmesg -k | grep -i -e Bringing -e Brought -e sysrq -e bug
With powerpc/next
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.354971] smp: Bringing up secondary CPUs ...
[  233.354676] smp: Brought up 16 nodes, 1024 CPUs
[  330.023073] sysrq: Changing Loglevel
[  330.023101] sysrq: Loglevel set to 9

With +patchset
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.351703] smp: Bringing up secondary CPUs ...
[    4.059859] smp: Brought up 16 nodes, 1024 CPUs
[   98.309015] sysrq: Changing Loglevel
[   98.309044] sysrq: Loglevel set to 9

Observations:
CPU bringup time reduced to 4 seconds from 233 seconds on this 1024 CPU
system. This resulted in System boot up time reducing to 98 seconds from
330 seconds. The actual improvement would depend on your system topology.
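
For anyone who wants to redo the arithmetic from the dmesg timestamps
above, a small sketch follows. The helper names elapsed and speedup are
made up for this illustration.

```c
/* Seconds between two dmesg timestamps. */
static double elapsed(double start, double end)
{
	return end - start;
}

/* How many times shorter the patched duration is than the baseline. */
static double speedup(double baseline, double patched)
{
	return baseline / patched;
}
```

Feeding in the "Bringing up" and "Brought up" timestamps gives about
233.0 seconds for powerpc/next and about 3.7 seconds with the patchset,
i.e. a bring-up roughly 63 times faster on this particular system.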

Topology verification post patchset on a 2 node Power9 PowerVM LPAR

powerpc/next                                                        +patchset
------------                                                        ---------
$ lscpu
Architecture:        ppc64le                                        Architecture:        ppc64le
Byte Order:          Little Endian                                  Byte Order:          Little Endian
CPU(s):              128                                            CPU(s):              128
On-line CPU(s) list: 0-127                                          On-line CPU(s) list: 0-127
Thread(s) per core:  8                                              Thread(s) per core:  8
Core(s) per socket:  8                                              Core(s) per socket:  8
Socket(s):           2                                              Socket(s):           2
NUMA node(s):        2                                              NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)                            Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported        Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp                                           Hypervisor vendor:   pHyp
Virtualization type: para                                           Virtualization type: para
L1d cache:           32K                                            L1d cache:           32K
L1i cache:           32K                                            L1i cache:           32K
L2 cache:            512K                                           L2 cache:            512K
L3 cache:            10240K                                         L3 cache:            10240K
NUMA node0 CPU(s):   0-63                                           NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127                                         NUMA node1 CPU(s):   64-127

$ tail -f /proc/cpuinfo
processor	: 127                                               processor	: 127
cpu		: POWER9 (architected), altivec supported           cpu		: POWER9 (architected), altivec supported
clock		: 3000.000000MHz                                    clock		: 3000.000000MHz
revision	: 2.2 (pvr 004e 0202)                               revision	: 2.2 (pvr 004e 0202)

timebase	: 512000000                                         timebase	: 512000000
platform	: pSeries                                           platform	: pSeries
model		: IBM,9008-22L                                      model		: IBM,9008-22L
machine		: CHRP IBM,9008-22L                                 machine		: CHRP IBM,9008-22L
MMU		: Radix                                             MMU		: Radix

$ grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
--------------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT                 /proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT
/proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE               /proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE
/proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE                 /proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE
/proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA                /proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA

$ grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
---------------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391               /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391
/proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327               /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327
/proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071               /proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071
/proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801              /proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801

Post ppc64_cpu --smt=1
$ tail -f /proc/cpuinfo
processor	: 120                                               processor	: 120
cpu		: POWER9 (architected), altivec supported           cpu		: POWER9 (architected), altivec supported
clock		: 3000.000000MHz                                    clock		: 3000.000000MHz
revision	: 2.2 (pvr 004e 0202)                               revision	: 2.2 (pvr 004e 0202)

timebase	: 512000000                                         timebase	: 512000000
platform	: pSeries                                           platform	: pSeries
model		: IBM,9008-22L                                      model	: IBM,9008-22L
machine		: CHRP IBM,9008-22L                                 machine	: CHRP IBM,9008-22L
MMU		: Radix                                             MMU		: Radix

$ grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
--------------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/name:DIE                 /proc/sys/kernel/sched_domain/cpu0/domain0/name:DIE
/proc/sys/kernel/sched_domain/cpu0/domain1/name:NUMA                /proc/sys/kernel/sched_domain/cpu0/domain1/name:NUMA

$ grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
---------------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/flags:2071               /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2071
/proc/sys/kernel/sched_domain/cpu0/domain1/flags:12801              /proc/sys/kernel/sched_domain/cpu0/domain1/flags:12801

Performance impact post +patchset
---------------------------------
100 iterations of ebizzy
Units: Records/second : higher is better
-----------------------------------------
kernel        N    Min     Max     Median  Avg        Stddev
powerpc/next  100  753917  870520  819054  817636.56  22649.7
+patchset     100  746258  874984  816681  813876.74  26424.351


100 iterations of perf bench sched pipe -l 10000000 (aka Hackbench)
units: usec/ops: lesser is better
--------------------------------
kernel        N    Min        Max        Median     Avg        Stddev
powerpc/next  100  13.845834  14.569539  14.06263   14.086167  0.17512607
+patchset     100  13.637611  18.097744  13.862656  13.9257    0.43872453


schbench Latency percentiles (usec)
units: usec : lesser is better
-----------------------------------
powerpc/next      	+patchset
50.0000th: 48     	50.0000th: 49
75.0000th: 65     	75.0000th: 66
90.0000th: 77     	90.0000th: 79
95.0000th: 84     	95.0000th: 85
*99.0000th: 101   	*99.0000th: 99
99.5000th: 113    	99.5000th: 104
99.9000th: 159    	99.9000th: 129
min=0, max=15221  	min=0, max=7666

100 iterations of ppc64_cpu --smt=1 / ppc64_cpu --smt=8
Units: seconds : lesser is better
---------------------------------
ppc64_cpu --smt=1
kernel        N    Min    Max    Median  Avg      Stddev
powerpc/next  100  13.39  17.55  14.71   14.7658  0.69184745
+patchset     100  13.3   16.27  14.33   14.4179  0.5427433

ppc64_cpu --smt=8
kernel        N    Min    Max    Median  Avg      Stddev
powerpc/next  100  21.65  26.17  23.71   23.7111  0.8589786
+patchset     100  21.88  25.79  23.16   23.2945  0.86394839


Observations:
Performance of ebizzy / perf_sched_bench / schbench remains the
same with and without the patchset.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Changelog v1->v2:
v1 link: https://lore.kernel.org/linuxppc-dev/20200727075532.30058-1-srikar@linux.vnet.ibm.com/t/#u
	Added five more patches on top of Seven.
	Rebased to 19th Sept 2020 powerpc/next (based on v5.9-rc2)

Srikar Dronamraju (11):
  powerpc/topology: Update topology_core_cpumask
  powerpc/smp: Stop updating cpu_core_mask
  powerpc/smp: Remove get_physical_package_id
  powerpc/smp: Optimize remove_cpu_from_masks
  powerpc/smp: Limit CPUs traversed to within a node.
  powerpc/smp: Stop passing mask to update_mask_by_l2
  powerpc/smp: Depend on cpu_l1_cache_map when adding CPUs
  powerpc/smp: Check for duplicate topologies and consolidate
  powerpc/smp: Optimize update_mask_by_l2
  powerpc/smp: Move coregroup mask updation to a new function
  powerpc/smp: Optimize update_coregroup_mask

 arch/powerpc/include/asm/smp.h      |   5 -
 arch/powerpc/include/asm/topology.h |   7 +-
 arch/powerpc/kernel/smp.c           | 186 ++++++++++++++++++----------
 3 files changed, 120 insertions(+), 78 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH v2 01/11] powerpc/topology: Update topology_core_cpumask
From: Srikar Dronamraju @ 2020-09-21  9:56 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling,
	Srikar Dronamraju, Peter Zijlstra, LKML, Nicholas Piggin,
	Valentin Schneider, Oliver O'Halloran, Satheesh Rajendran,
	linuxppc-dev, Ingo Molnar
In-Reply-To: <20200921095653.9701-1-srikar@linux.vnet.ibm.com>

On Power, cpu_core_mask and cpu_cpu_mask refer to the same set of CPUs.
cpu_cpu_mask is needed by the scheduler, hence look at deprecating
cpu_core_mask. Before deleting cpu_core_mask, ensure its only user
is moved to cpu_cpu_mask.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/topology.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 6609174918ab..e0f232533c9d 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -122,7 +122,7 @@ int get_physical_package_id(int cpu);
 #endif
 
 #define topology_sibling_cpumask(cpu)	(per_cpu(cpu_sibling_map, cpu))
-#define topology_core_cpumask(cpu)	(per_cpu(cpu_core_map, cpu))
+#define topology_core_cpumask(cpu)	(cpu_cpu_mask(cpu))
 #define topology_core_id(cpu)		(cpu_to_core_id(cpu))
 
 #endif
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 02/11] powerpc/smp: Stop updating cpu_core_mask
From: Srikar Dronamraju @ 2020-09-21  9:56 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling,
	Srikar Dronamraju, Peter Zijlstra, LKML, Nicholas Piggin,
	Valentin Schneider, Oliver O'Halloran, Satheesh Rajendran,
	linuxppc-dev, Ingo Molnar
In-Reply-To: <20200921095653.9701-1-srikar@linux.vnet.ibm.com>

Anton Blanchard reported that his 4096 vcpu KVM guest took around 30
minutes to boot. He traced this to the time taken to iterate while
setting the cpu_core_mask.

Further analysis shows that cpu_core_mask and cpu_cpu_mask for any CPU
would be equal on Power. However, updating cpu_core_mask took a very long
time because it is a per-CPU cpumask variable, whereas cpu_cpu_mask is a
per-NODE / per-DIE cpumask that is shared by all the respective CPUs.

Also, cpu_cpu_mask is needed from a scheduler perspective. However,
cpu_core_map is an exported symbol. Hence stop updating cpu_core_map
and make it hold a snapshot of cpu_cpu_mask.
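
To illustrate why the per-CPU update scales badly, here is a simplified
userspace model (an assumption-laden sketch, not kernel code). It counts
how many CPUs get visited in total when each CPU, as it comes online,
walks every CPU that is already online, the way the loop over
cpu_online_mask removed by this patch does. The function name
per_cpu_mask_visits is made up for this example.

```c
/*
 * Simplified model: CPUs come online one at a time, and each new CPU
 * walks every CPU that is already online. Returns the total number of
 * CPUs visited across the whole bring-up.
 */
static unsigned long per_cpu_mask_visits(unsigned int ncpus)
{
	unsigned long visits = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		visits += cpu;	/* 'cpu' CPUs are already online */

	return visits;
}
```

For 4096 CPUs this model gives 8386560 visits (n * (n - 1) / 2), and in
the removed loop every visit also called get_physical_package_id(). A
mask shared per node needs no such per-CPU walk.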

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/smp.h |  5 -----
 arch/powerpc/kernel/smp.c      | 33 +++++++--------------------------
 2 files changed, 7 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 041f0b97c45b..40e121dd16af 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -119,11 +119,6 @@ static inline struct cpumask *cpu_sibling_mask(int cpu)
 	return per_cpu(cpu_sibling_map, cpu);
 }
 
-static inline struct cpumask *cpu_core_mask(int cpu)
-{
-	return per_cpu(cpu_core_map, cpu);
-}
-
 static inline struct cpumask *cpu_l2_cache_mask(int cpu)
 {
 	return per_cpu(cpu_l2_cache_map, cpu);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3d96752d6570..ec41491beca4 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -953,12 +953,17 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 				local_memory_node(numa_cpu_lookup_table[cpu]));
 		}
 #endif
+		/*
+		 * cpu_core_map is no longer updated and exists only since
+		 * it has been exported for long. It will only hold a snapshot
+		 * of cpu_cpu_mask.
+		 */
+		cpumask_copy(per_cpu(cpu_core_map, cpu), cpu_cpu_mask(cpu));
 	}
 
 	/* Init the cpumasks so the boot CPU is related to itself */
 	cpumask_set_cpu(boot_cpuid, cpu_sibling_mask(boot_cpuid));
 	cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
-	cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
 
 	if (has_coregroup_support())
 		cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
@@ -1260,9 +1265,7 @@ static void remove_cpu_from_masks(int cpu)
 {
 	int i;
 
-	/* NB: cpu_core_mask is a superset of the others */
-	for_each_cpu(i, cpu_core_mask(cpu)) {
-		set_cpus_unrelated(cpu, i, cpu_core_mask);
+	for_each_cpu(i, cpu_cpu_mask(cpu)) {
 		set_cpus_unrelated(cpu, i, cpu_l2_cache_mask);
 		set_cpus_unrelated(cpu, i, cpu_sibling_mask);
 		if (has_big_cores)
@@ -1312,7 +1315,6 @@ EXPORT_SYMBOL_GPL(get_physical_package_id);
 static void add_cpu_to_masks(int cpu)
 {
 	int first_thread = cpu_first_thread_sibling(cpu);
-	int pkg_id = get_physical_package_id(cpu);
 	int i;
 
 	/*
@@ -1320,7 +1322,6 @@ static void add_cpu_to_masks(int cpu)
 	 * add it to it's own thread sibling mask.
 	 */
 	cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
-	cpumask_set_cpu(cpu, cpu_core_mask(cpu));
 
 	for (i = first_thread; i < first_thread + threads_per_core; i++)
 		if (cpu_online(i))
@@ -1342,26 +1343,6 @@ static void add_cpu_to_masks(int cpu)
 				set_cpus_related(cpu, i, cpu_coregroup_mask);
 		}
 	}
-
-	if (pkg_id == -1) {
-		struct cpumask *(*mask)(int) = cpu_sibling_mask;
-
-		/*
-		 * Copy the sibling mask into core sibling mask and
-		 * mark any CPUs on the same chip as this CPU.
-		 */
-		if (shared_caches)
-			mask = cpu_l2_cache_mask;
-
-		for_each_cpu(i, mask(cpu))
-			set_cpus_related(cpu, i, cpu_core_mask);
-
-		return;
-	}
-
-	for_each_cpu(i, cpu_online_mask)
-		if (get_physical_package_id(i) == pkg_id)
-			set_cpus_related(cpu, i, cpu_core_mask);
 }
 
 /* Activate a secondary processor. */
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 03/11] powerpc/smp: Remove get_physical_package_id
From: Srikar Dronamraju @ 2020-09-21  9:56 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling,
	Srikar Dronamraju, Peter Zijlstra, LKML, Nicholas Piggin,
	Valentin Schneider, Oliver O'Halloran, Satheesh Rajendran,
	linuxppc-dev, Ingo Molnar
In-Reply-To: <20200921095653.9701-1-srikar@linux.vnet.ibm.com>

Now that cpu_core_mask has been removed and topology_core_cpumask has
been updated to use cpu_cpu_mask, we no longer need
get_physical_package_id.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/topology.h |  5 -----
 arch/powerpc/kernel/smp.c           | 20 --------------------
 2 files changed, 25 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index e0f232533c9d..e45219f74be0 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -114,12 +114,7 @@ static inline int cpu_to_coregroup_id(int cpu)
 #ifdef CONFIG_PPC64
 #include <asm/smp.h>
 
-#ifdef CONFIG_PPC_SPLPAR
-int get_physical_package_id(int cpu);
-#define topology_physical_package_id(cpu)	(get_physical_package_id(cpu))
-#else
 #define topology_physical_package_id(cpu)	(cpu_to_chip_id(cpu))
-#endif
 
 #define topology_sibling_cpumask(cpu)	(per_cpu(cpu_sibling_map, cpu))
 #define topology_core_cpumask(cpu)	(cpu_cpu_mask(cpu))
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ec41491beca4..8c095fe237b2 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1292,26 +1292,6 @@ static inline void add_cpu_to_smallcore_masks(int cpu)
 	}
 }
 
-int get_physical_package_id(int cpu)
-{
-	int pkg_id = cpu_to_chip_id(cpu);
-
-	/*
-	 * If the platform is PowerNV or Guest on KVM, ibm,chip-id is
-	 * defined. Hence we would return the chip-id as the result of
-	 * get_physical_package_id.
-	 */
-	if (pkg_id == -1 && firmware_has_feature(FW_FEATURE_LPAR) &&
-	    IS_ENABLED(CONFIG_PPC_SPLPAR)) {
-		struct device_node *np = of_get_cpu_node(cpu, NULL);
-		pkg_id = of_node_to_nid(np);
-		of_node_put(np);
-	}
-
-	return pkg_id;
-}
-EXPORT_SYMBOL_GPL(get_physical_package_id);
-
 static void add_cpu_to_masks(int cpu)
 {
 	int first_thread = cpu_first_thread_sibling(cpu);
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 05/11] powerpc/smp: Limit CPUs traversed to within a node.
From: Srikar Dronamraju @ 2020-09-21  9:56 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling,
	Srikar Dronamraju, Peter Zijlstra, LKML, Nicholas Piggin,
	Valentin Schneider, Oliver O'Halloran, Satheesh Rajendran,
	linuxppc-dev, Ingo Molnar
In-Reply-To: <20200921095653.9701-1-srikar@linux.vnet.ibm.com>

All the arch specific topology cpumasks are within a node/DIE.
However, when setting these per-CPU cpumasks, the system traverses
all the online CPUs. This is redundant.

Reduce the traversal to only the CPUs that are online in the node to
which the CPU belongs.
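
As a rough userspace model of what iterating over the intersection of
two masks means (this is what for_each_cpu_and() does in the kernel),
consider the sketch below. The mask64 type and the cpus_in_both helper
are assumptions for this illustration, limited to 64 CPUs; they are not
the kernel's struct cpumask API.

```c
#include <stdint.h>

/* Toy cpumask for up to 64 CPUs; bit N set means CPU N is in the mask. */
typedef uint64_t mask64;

/*
 * Store the CPUs present in both masks into out[] (which must have room
 * for 64 entries) and return how many were found.
 */
static unsigned int cpus_in_both(mask64 a, mask64 b, unsigned int *out)
{
	mask64 both = a & b;
	unsigned int n = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (both & ((mask64)1 << cpu))
			out[n++] = cpu;

	return n;
}
```

With CPUs 0-3 and 8-15 online (0xFF0F) and a node covering CPUs 8-15
(0xFF00), only the eight CPUs in the node are visited instead of all
twelve online ones.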

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 2e61a81aad88..c860c4950c9f 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1241,7 +1241,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
 	}
 
 	cpumask_set_cpu(cpu, mask_fn(cpu));
-	for_each_cpu(i, cpu_online_mask) {
+	for_each_cpu_and(i, cpu_online_mask, cpu_cpu_mask(cpu)) {
 		/*
 		 * when updating the marks the current CPU has not been marked
 		 * online, but we need to update the cache masks
-- 
2.17.1


^ permalink raw reply related
