LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Rich Felker @ 2020-06-04 17:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: libc-alpha, eery, Daniel Kolesa, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <20200604171232.GG31009@gate.crashing.org>

On Thu, Jun 04, 2020 at 12:12:32PM -0500, Segher Boessenkool wrote:
> On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > well, ppc64le already cannot be run on those, as far as I know (I
> > don't think it's possible to build ppc64le userland without VSX in
> > any configuration)
> 
> VSX is required by the ELFv2 ABI:
> 
> """
> Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
> compliant processors must implement the following categories:

This is not actually ABI but IBM policy laundered into an ABI
document, which musl does not honor.

Rich

^ permalink raw reply

* Re: Boot issue with the latest Git kernel
From: Christophe Leroy @ 2020-06-04 17:15 UTC (permalink / raw)
  To: Christian Zigotzky
  Cc: Darren Stevens, linuxppc-dev, Christoph Hellwig, R.T.Dickinson
In-Reply-To: <067BBAB3-19B6-42C6-AA9F-B9F14314255C@xenosoft.de>



Le 04/06/2020 à 17:53, Christian Zigotzky a écrit :
> 
> 
>> On 4. Jun 2020, at 16:29, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
>>
>> And are you able to perform a 'git bisect' to identify the guilty commit ?
>>
>> Thanks
>> Christophe
> 
> Hello Christophe,
> 
> Unfortunately I haven’t had time to bisect the latest Git kernel. Does it boot on your PowerPC machine?
> 

Yes today's linux-next boots on my powerpc 8xx board.

Christophe

^ permalink raw reply

* Re: ppc64le and 32-bit LE userland compatibility
From: Segher Boessenkool @ 2020-06-04 17:12 UTC (permalink / raw)
  To: Daniel Kolesa
  Cc: libc-alpha, eery, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <3aeb6dfe-ae23-42f9-ac23-16be6b54a850@www.fastmail.com>

On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> well, ppc64le already cannot be run on those, as far as I know (I don't think it's possible to build ppc64le userland without VSX in any configuration)

VSX is required by the ELFv2 ABI:

"""
Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
compliant processors must implement the following categories:

[...]

Vector-Scalar
"""


Segher

^ permalink raw reply

* Re: [PATCH] mm: Fix pud_alloc_track()
From: Sedat Dilek @ 2020-06-04 16:58 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arch, Stephen Rothwell, jroedel, linux-mm, peterz,
	linuxppc-dev, Joerg Roedel, linux-kernel, Steven Rostedt,
	Abdul Haleem, linux-next, Satheesh Rajendran, Andy Lutomirski,
	Andrew Morton, manvanth, hch
In-Reply-To: <20200604164814.GA7600@kernel.org>

On Thu, Jun 4, 2020 at 6:49 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Thu, Jun 04, 2020 at 09:44:46AM +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <jroedel@suse.de>
> >
> > The pud_alloc_track() needs to do different checks based on whether
> > __ARCH_HAS_5LEVEL_HACK is defined, like it already does in
> > pud_alloc(). Otherwise it causes boot failures on PowerPC.
> >
> > Provide the correct implementations for both possible settings of
> > __ARCH_HAS_5LEVEL_HACK to fix the boot problems.
>
> There is a patch in mmotm [1] that completely removes
> __ARCH_HAS_5LEVEL_HACK which is a part of the series [2] that updates
> p4d folding accross architectures. This should fix boot on PowerPC and
> the addition of pXd_alloc_track() for __ARCH_HAS_5LEVEL_HACK wouldn't be
> necessary.
>
>
> [1] https://github.com/hnaz/linux-mm/commit/cfae68792af3731ac902ea6ba5ed8df5a0f6bd2f
> [2] https://lore.kernel.org/kvmarm/20200414153455.21744-1-rppt@kernel.org/
>

That link shows an overview of v4 and is easily downloadable as a
single mbox file.
See " Series = mm: remove __ARCH_HAS_5LEVEL_HACK"

- Sedat -

[1] https://lore.kernel.org/patchwork/project/lkml/list/?series=438627

> > Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> > Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> > Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> > Fixes: d8626138009b ("mm: add functions to track page directory modifications")
> > Signed-off-by: Joerg Roedel <jroedel@suse.de>
> > ---
> >  include/asm-generic/5level-fixup.h |  5 +++++
> >  include/linux/mm.h                 | 26 +++++++++++++-------------
> >  2 files changed, 18 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
> > index 58046ddc08d0..afbab31fbd7e 100644
> > --- a/include/asm-generic/5level-fixup.h
> > +++ b/include/asm-generic/5level-fixup.h
> > @@ -17,6 +17,11 @@
> >       ((unlikely(pgd_none(*(p4d))) && __pud_alloc(mm, p4d, address)) ? \
> >               NULL : pud_offset(p4d, address))
> >
> > +#define pud_alloc_track(mm, p4d, address, mask)                                      \
> > +     ((unlikely(pgd_none(*(p4d))) &&                                         \
> > +       (__pud_alloc(mm, p4d, address) || ({*(mask)|=PGTBL_P4D_MODIFIED;0;})))?       \
> > +       NULL : pud_offset(p4d, address))
> > +
> >  #define p4d_alloc(mm, pgd, address)          (pgd)
> >  #define p4d_alloc_track(mm, pgd, address, mask)      (pgd)
> >  #define p4d_offset(pgd, start)                       (pgd)
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 66e0977f970a..ad3b31c5bcc3 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2088,35 +2088,35 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, p4d_t *p4d,
> >               NULL : pud_offset(p4d, address);
> >  }
> >
> > -static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
> > +static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
> >                                    unsigned long address,
> >                                    pgtbl_mod_mask *mod_mask)
> > -
> >  {
> > -     if (unlikely(pgd_none(*pgd))) {
> > -             if (__p4d_alloc(mm, pgd, address))
> > +     if (unlikely(p4d_none(*p4d))) {
> > +             if (__pud_alloc(mm, p4d, address))
> >                       return NULL;
> > -             *mod_mask |= PGTBL_PGD_MODIFIED;
> > +             *mod_mask |= PGTBL_P4D_MODIFIED;
> >       }
> >
> > -     return p4d_offset(pgd, address);
> > +     return pud_offset(p4d, address);
> >  }
> >
> > -#endif /* !__ARCH_HAS_5LEVEL_HACK */
> > -
> > -static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
> > +static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
> >                                    unsigned long address,
> >                                    pgtbl_mod_mask *mod_mask)
> > +
> >  {
> > -     if (unlikely(p4d_none(*p4d))) {
> > -             if (__pud_alloc(mm, p4d, address))
> > +     if (unlikely(pgd_none(*pgd))) {
> > +             if (__p4d_alloc(mm, pgd, address))
> >                       return NULL;
> > -             *mod_mask |= PGTBL_P4D_MODIFIED;
> > +             *mod_mask |= PGTBL_PGD_MODIFIED;
> >       }
> >
> > -     return pud_offset(p4d, address);
> > +     return p4d_offset(pgd, address);
> >  }
> >
> > +#endif /* !__ARCH_HAS_5LEVEL_HACK */
> > +
> >  static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
> >  {
> >       return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
> > --
> > 2.26.2
> >
>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [RFC][PATCH v3 1/5] sparc64: Fix asm/percpu.h build error
From: Peter Zijlstra @ 2020-06-04 16:57 UTC (permalink / raw)
  To: David Miller
  Cc: linux-s390, linuxppc-dev, bigeasy, x86, heiko.carstens,
	linux-kernel, rostedt, a.darwish, sparclinux, tglx, will, mingo
In-Reply-To: <20200529.162917.1970892823680223252.davem@davemloft.net>

On Fri, May 29, 2020 at 04:29:17PM -0700, David Miller wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri, 29 May 2020 23:35:51 +0200
> 
> > ../arch/sparc/include/asm/percpu_64.h:7:24: warning: call-clobbered register used for global register variable
> > register unsigned long __local_per_cpu_offset asm("g5");
> 
> The "-ffixed-g5" option on the command line tells gcc that we are
> using 'g5' as a fixed register, so some part of your build isn't using
> the:
> 
> KBUILD_CFLAGS += -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wno-sign-compare
> 
> from arch/sparc/Makefile for some reason.

Thanks, that was the clue I needed.

I think I see, what happens is that these headers end up in the VDSO
build, and that doesn't have these CFLAGS, because userspace.

Let me see what to do about that.

^ permalink raw reply

* Re: [PATCH] mm: Fix pud_alloc_track()
From: Mike Rapoport @ 2020-06-04 16:48 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-arch, Stephen Rothwell, jroedel, linux-mm, peterz,
	linuxppc-dev, linux-kernel, Steven Rostedt, Abdul Haleem,
	linux-next, Satheesh Rajendran, Andy Lutomirski, Andrew Morton,
	manvanth, hch
In-Reply-To: <20200604074446.23944-1-joro@8bytes.org>

On Thu, Jun 04, 2020 at 09:44:46AM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The pud_alloc_track() needs to do different checks based on whether
> __ARCH_HAS_5LEVEL_HACK is defined, like it already does in
> pud_alloc(). Otherwise it causes boot failures on PowerPC.
> 
> Provide the correct implementations for both possible settings of
> __ARCH_HAS_5LEVEL_HACK to fix the boot problems.

There is a patch in mmotm [1] that completely removes
__ARCH_HAS_5LEVEL_HACK which is a part of the series [2] that updates
p4d folding accross architectures. This should fix boot on PowerPC and
the addition of pXd_alloc_track() for __ARCH_HAS_5LEVEL_HACK wouldn't be
necessary.


[1] https://github.com/hnaz/linux-mm/commit/cfae68792af3731ac902ea6ba5ed8df5a0f6bd2f
[2] https://lore.kernel.org/kvmarm/20200414153455.21744-1-rppt@kernel.org/

> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Fixes: d8626138009b ("mm: add functions to track page directory modifications")
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  include/asm-generic/5level-fixup.h |  5 +++++
>  include/linux/mm.h                 | 26 +++++++++++++-------------
>  2 files changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
> index 58046ddc08d0..afbab31fbd7e 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -17,6 +17,11 @@
>  	((unlikely(pgd_none(*(p4d))) && __pud_alloc(mm, p4d, address)) ? \
>  		NULL : pud_offset(p4d, address))
>  
> +#define pud_alloc_track(mm, p4d, address, mask)					\
> +	((unlikely(pgd_none(*(p4d))) &&						\
> +	  (__pud_alloc(mm, p4d, address) || ({*(mask)|=PGTBL_P4D_MODIFIED;0;})))?	\
> +	  NULL : pud_offset(p4d, address))
> +
>  #define p4d_alloc(mm, pgd, address)		(pgd)
>  #define p4d_alloc_track(mm, pgd, address, mask)	(pgd)
>  #define p4d_offset(pgd, start)			(pgd)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 66e0977f970a..ad3b31c5bcc3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2088,35 +2088,35 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, p4d_t *p4d,
>  		NULL : pud_offset(p4d, address);
>  }
>  
> -static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
> +static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
>  				     unsigned long address,
>  				     pgtbl_mod_mask *mod_mask)
> -
>  {
> -	if (unlikely(pgd_none(*pgd))) {
> -		if (__p4d_alloc(mm, pgd, address))
> +	if (unlikely(p4d_none(*p4d))) {
> +		if (__pud_alloc(mm, p4d, address))
>  			return NULL;
> -		*mod_mask |= PGTBL_PGD_MODIFIED;
> +		*mod_mask |= PGTBL_P4D_MODIFIED;
>  	}
>  
> -	return p4d_offset(pgd, address);
> +	return pud_offset(p4d, address);
>  }
>  
> -#endif /* !__ARCH_HAS_5LEVEL_HACK */
> -
> -static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
> +static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
>  				     unsigned long address,
>  				     pgtbl_mod_mask *mod_mask)
> +
>  {
> -	if (unlikely(p4d_none(*p4d))) {
> -		if (__pud_alloc(mm, p4d, address))
> +	if (unlikely(pgd_none(*pgd))) {
> +		if (__p4d_alloc(mm, pgd, address))
>  			return NULL;
> -		*mod_mask |= PGTBL_P4D_MODIFIED;
> +		*mod_mask |= PGTBL_PGD_MODIFIED;
>  	}
>  
> -	return pud_offset(p4d, address);
> +	return p4d_offset(pgd, address);
>  }
>  
> +#endif /* !__ARCH_HAS_5LEVEL_HACK */
> +
>  static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
>  {
>  	return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
> -- 
> 2.26.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: linux-next: build failure on powerpc 8xx with 16k pages
From: Will Deacon @ 2020-06-04 16:10 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Stephen Rothwell, arnd, Peter Zijlstra, Linux Kernel Mailing List,
	Linux Next Mailing List, Andrew Morton, PowerPC, Thomas Gleixner
In-Reply-To: <1160ea76-729b-60a2-31d6-998c57b77858@csgroup.eu>

[+Arnd since I think we spoke about this on IRC once]

On Thu, Jun 04, 2020 at 02:35:14PM +0000, Christophe Leroy wrote:
> Now I get the same issue at
> 
>    CC      mm/mincore.o
> In file included from ./include/asm-generic/bug.h:5:0,
>                  from ./arch/powerpc/include/asm/bug.h:109,
>                  from ./include/linux/bug.h:5,
>                  from ./include/linux/mmdebug.h:5,
>                  from ./include/linux/mm.h:9,
>                  from ./include/linux/pagemap.h:8,
>                  from mm/mincore.c:11:
> In function 'huge_ptep_get',
>     inlined from 'mincore_hugetlb' at mm/mincore.c:35:20:
> ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_218'
> declared with attribute error: Unsupported access size for
> {READ,WRITE}_ONCE().
>   _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>                                       ^
> ./include/linux/compiler.h:373:4: note: in definition of macro
> '__compiletime_assert'
>     prefix ## suffix();    \
>     ^
> ./include/linux/compiler.h:392:2: note: in expansion of macro
> '_compiletime_assert'
>   _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>   ^
> ./include/linux/compiler.h:405:2: note: in expansion of macro
> 'compiletime_assert'
>   compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
>   ^
> ./include/linux/compiler.h:291:2: note: in expansion of macro
> 'compiletime_assert_rwonce_type'
>   compiletime_assert_rwonce_type(x);    \
>   ^
> ./include/asm-generic/hugetlb.h:125:9: note: in expansion of macro
> 'READ_ONCE'
>   return READ_ONCE(*ptep);
>          ^
> make[2]: *** [mm/mincore.o] Error 1
> 
> I guess for this one I have to implement platform specific huge_ptep_get()

Yeah, or bite the bullet and introduce proper accessors for all these
things:

	pte_read()
	pmd_read()
	pud_read()
	etc

with the default implementation pointing at READ_ONCE(), but allowing an
architecture override. It's a big job because mm/ would need repainting,
but it would have the benefit of being able to remove aggregate types from
READ_ONCE() entirely and using a special accessor just for the page-table
types.

That might also mean that we could have asm-generic versions of things
like ptep_get_and_clear() that work for architectures with hardware
update and need atomic rmw. But I'm getting ahead of myself.

Will

^ permalink raw reply

* Re: Boot issue with the latest Git kernel
From: Christian Zigotzky @ 2020-06-04 15:53 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Darren Stevens, linuxppc-dev, Christoph Hellwig, R.T.Dickinson
In-Reply-To: <f7f1b233-6101-2316-7996-4654586b7d24@csgroup.eu>



> On 4. Jun 2020, at 16:29, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
> And are you able to perform a 'git bisect' to identify the guilty commit ?
> 
> Thanks
> Christophe

Hello Christophe,

Unfortunately I haven’t had time to bisect the latest Git kernel. Does it boot on your PowerPC machine?

Thanks,
Christian

^ permalink raw reply

* Re: linux-next: build failure on powerpc 8xx with 16k pages
From: Christophe Leroy @ 2020-06-04 14:35 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon
  Cc: Stephen Rothwell, Linux Kernel Mailing List,
	Linux Next Mailing List, Andrew Morton, PowerPC, Thomas Gleixner
In-Reply-To: <20200604120007.GA4117@hirez.programming.kicks-ass.net>



On 06/04/2020 12:00 PM, Peter Zijlstra wrote:
> On Thu, Jun 04, 2020 at 12:17:23PM +0100, Will Deacon wrote:
>> Hi, [+Peter]
>>
>> On Thu, Jun 04, 2020 at 10:48:03AM +0000, Christophe Leroy wrote:
>>> Using mpc885_ads_defconfig with CONFIG_PPC_16K_PAGES instead of
>>> CONFIG_PPC_4K_PAGES, getting the following build failure:
>>>
>>>    CC      mm/gup.o
>>> In file included from ./include/linux/kernel.h:11:0,
>>>                   from mm/gup.c:2:
>>> In function 'gup_hugepte.constprop',
>>>      inlined from 'gup_huge_pd.isra.78' at mm/gup.c:2465:8:
>>> ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_257'
>>> declared with attribute error: Unsupported access size for
>>> {READ,WRITE}_ONCE().
>>>    _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>>>                                        ^
>>> ./include/linux/compiler.h:373:4: note: in definition of macro
>>> '__compiletime_assert'
>>>      prefix ## suffix();    \
>>>      ^
>>> ./include/linux/compiler.h:392:2: note: in expansion of macro
>>> '_compiletime_assert'
>>>    _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>>>    ^
>>> ./include/linux/compiler.h:405:2: note: in expansion of macro
>>> 'compiletime_assert'
>>>    compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
>>>    ^
>>> ./include/linux/compiler.h:291:2: note: in expansion of macro
>>> 'compiletime_assert_rwonce_type'
>>>    compiletime_assert_rwonce_type(x);    \
>>>    ^
>>> mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
>>>    pte = READ_ONCE(*ptep);
>>>          ^
>>> In function 'gup_get_pte',
>>>      inlined from 'gup_pte_range' at mm/gup.c:2228:9,
>>>      inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
>>>      inlined from 'gup_pud_range' at mm/gup.c:2641:15,
>>>      inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
>>>      inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
>>>      inlined from 'internal_get_user_pages_fast' at mm/gup.c:2785:3:
>>
>> At first glance, this looks like a real bug in the 16k page code -- you're
>> loading the pte non-atomically on the fast GUP path and so you're prone to
>> tearing, which probably isn't what you want. For a short-term hack, I'd
>> suggest having CONFIG_HAVE_FAST_GUP depend on !CONFIG_PPC_16K_PAGES, but if
>> you want to support this them you'll need to rework your pte_t so that it
>> can be loaded atomically.
> 
> Looking at commit 55c8fc3f49302, they're all the exact same value, so
> what they could do is grow another special gup_get_pte() variant that
> just loads the first value.
> 
> Also, per that very same commit, there's a distinct lack of WRITE_ONCE()
> in the pte_update() / __set_pte_at() paths for much of Power.
> 

Thanks for the idea.

Now I get the same issue at

    CC      mm/mincore.o
In file included from ./include/asm-generic/bug.h:5:0,
                  from ./arch/powerpc/include/asm/bug.h:109,
                  from ./include/linux/bug.h:5,
                  from ./include/linux/mmdebug.h:5,
                  from ./include/linux/mm.h:9,
                  from ./include/linux/pagemap.h:8,
                  from mm/mincore.c:11:
In function 'huge_ptep_get',
     inlined from 'mincore_hugetlb' at mm/mincore.c:35:20:
./include/linux/compiler.h:392:38: error: call to 
'__compiletime_assert_218' declared with attribute error: Unsupported 
access size for {READ,WRITE}_ONCE().
   _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                       ^
./include/linux/compiler.h:373:4: note: in definition of macro 
'__compiletime_assert'
     prefix ## suffix();    \
     ^
./include/linux/compiler.h:392:2: note: in expansion of macro 
'_compiletime_assert'
   _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
   ^
./include/linux/compiler.h:405:2: note: in expansion of macro 
'compiletime_assert'
   compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
   ^
./include/linux/compiler.h:291:2: note: in expansion of macro 
'compiletime_assert_rwonce_type'
   compiletime_assert_rwonce_type(x);    \
   ^
./include/asm-generic/hugetlb.h:125:9: note: in expansion of macro 
'READ_ONCE'
   return READ_ONCE(*ptep);
          ^
make[2]: *** [mm/mincore.o] Error 1

I guess for this one I have to implement platform specific huge_ptep_get()

Christophe

^ permalink raw reply

* Re: Boot issue with the latest Git kernel
From: Christophe Leroy @ 2020-06-04 14:29 UTC (permalink / raw)
  To: Christian Zigotzky, linuxppc-dev, Christoph Hellwig,
	Darren Stevens
  Cc: R.T.Dickinson
In-Reply-To: <16b46b7d-7bd0-83b8-307f-2b9be830b096@csgroup.eu>



Le 04/06/2020 à 16:26, Christophe Leroy a écrit :
> Hi,
> 
> 
> Le 04/06/2020 à 16:16, Christian Zigotzky a écrit :
>> Hi All,
>>
>> I tested the latest Git kernel today. [1]. Unfortunately it doesn't 
>> boot on my PowerPC machines.
>>
>> Could you please test the latest Git kernel with your PowerPC machine?
>>
>> BTW, it doesn't boot in a virtual QEMU PowerPC machine either.
>>
> 
> Which machine/platform ? Which defconfig are you using ?
> 


And are you able to perform a 'git bisect' to identify the guilty commit ?

Thanks
Christophe

^ permalink raw reply

* Re: Boot issue with the latest Git kernel
From: Christophe Leroy @ 2020-06-04 14:26 UTC (permalink / raw)
  To: Christian Zigotzky, linuxppc-dev, Christoph Hellwig,
	Darren Stevens
  Cc: R.T.Dickinson
In-Reply-To: <cca69c35-899b-38d8-73ee-2d62997484e5@xenosoft.de>

Hi,


Le 04/06/2020 à 16:16, Christian Zigotzky a écrit :
> Hi All,
> 
> I tested the latest Git kernel today. [1]. Unfortunately it doesn't boot 
> on my PowerPC machines.
> 
> Could you please test the latest Git kernel with your PowerPC machine?
> 
> BTW, it doesn't boot in a virtual QEMU PowerPC machine either.
> 

Which machine/platform ? Which defconfig are you using ?

Christophe


^ permalink raw reply

* Boot issue with the latest Git kernel
From: Christian Zigotzky @ 2020-06-04 14:16 UTC (permalink / raw)
  To: linuxppc-dev, Christoph Hellwig, Darren Stevens; +Cc: R.T.Dickinson

Hi All,

I tested the latest Git kernel today. [1]. Unfortunately it doesn't boot 
on my PowerPC machines.

Could you please test the latest Git kernel with your PowerPC machine?

BTW, it doesn't boot in a virtual QEMU PowerPC machine either.

Thanks,
Christian

[1] 
https://forum.hyperion-entertainment.com/viewtopic.php?f=58&p=50758&sid=3f816f078869510dea9fe4baca3605db#p50758

^ permalink raw reply

* Re: [PATCH] ASoC: fsl-asoc-card: Defer probe when fail to find codec device
From: Mark Brown @ 2020-06-04 14:05 UTC (permalink / raw)
  To: lgirdwood, tiwai, perex, nicoleotsuka, Xiubo.Lee, timur,
	Shengjiu Wang, linuxppc-dev, festevam, linux-kernel, alsa-devel
In-Reply-To: <1591251930-4111-1-git-send-email-shengjiu.wang@nxp.com>

On Thu, 4 Jun 2020 14:25:30 +0800, Shengjiu Wang wrote:
> Defer probe when fail to find codec device, because the codec
> device maybe probed later than machine driver.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl-asoc-card: Defer probe when fail to find codec device
      commit: e396dec46c5600d426b2ca8a01a877928b50d1d9

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply

* Re: linux-next: build failure on powerpc 8xx with 16k pages
From: Christophe Leroy @ 2020-06-04 13:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: Stephen Rothwell, peterz, Linux Kernel Mailing List,
	Linux Next Mailing List, Andrew Morton, PowerPC, Thomas Gleixner
In-Reply-To: <20200604111723.GA1267@willie-the-truck>



On 06/04/2020 11:17 AM, Will Deacon wrote:
> Hi, [+Peter]
> 
> On Thu, Jun 04, 2020 at 10:48:03AM +0000, Christophe Leroy wrote:
>> Using mpc885_ads_defconfig with CONFIG_PPC_16K_PAGES instead of
>> CONFIG_PPC_4K_PAGES, getting the following build failure:
>>
>>    CC      mm/gup.o
>> In file included from ./include/linux/kernel.h:11:0,
>>                   from mm/gup.c:2:
>> In function 'gup_hugepte.constprop',
>>      inlined from 'gup_huge_pd.isra.78' at mm/gup.c:2465:8:
>> ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_257'
>> declared with attribute error: Unsupported access size for
>> {READ,WRITE}_ONCE().
>>    _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>>                                        ^
>> ./include/linux/compiler.h:373:4: note: in definition of macro
>> '__compiletime_assert'
>>      prefix ## suffix();    \
>>      ^
>> ./include/linux/compiler.h:392:2: note: in expansion of macro
>> '_compiletime_assert'
>>    _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>>    ^
>> ./include/linux/compiler.h:405:2: note: in expansion of macro
>> 'compiletime_assert'
>>    compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
>>    ^
>> ./include/linux/compiler.h:291:2: note: in expansion of macro
>> 'compiletime_assert_rwonce_type'
>>    compiletime_assert_rwonce_type(x);    \
>>    ^
>> mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
>>    pte = READ_ONCE(*ptep);
>>          ^
>> In function 'gup_get_pte',
>>      inlined from 'gup_pte_range' at mm/gup.c:2228:9,
>>      inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
>>      inlined from 'gup_pud_range' at mm/gup.c:2641:15,
>>      inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
>>      inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
>>      inlined from 'internal_get_user_pages_fast' at mm/gup.c:2785:3:
> 
> At first glance, this looks like a real bug in the 16k page code -- you're
> loading the pte non-atomically on the fast GUP path and so you're prone to
> tearing, which probably isn't what you want. For a short-term hack, I'd
> suggest having CONFIG_HAVE_FAST_GUP depend on !CONFIG_PPC_16K_PAGES, but if
> you want to support this them you'll need to rework your pte_t so that it
> can be loaded atomically.

What do you mean by *rework* pte_t ?
pte are 32 bits words in size and are spread every 4 words in memory. 
Therefore pte_t has to be 128 bits because unlike huge_pte handling 
which always use huge_pte_offset() in loops, many many places in the 
kernel do pte++, so we need the pte type to be the size of the interval 
from one pte to the next one.

Christophe

^ permalink raw reply

* [PATCH v3 1/3] selftests: powerpc: Fix pkey access right updates
From: Sandipan Das @ 2020-06-04 12:56 UTC (permalink / raw)
  To: mpe
  Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
	linuxppc-dev, bauerman
In-Reply-To: <20200604125610.649668-1-sandipan@linux.ibm.com>

The Power ISA mandates that all writes to the Authority
Mask Register (AMR) must always be preceded as well as
succeeded by a context synchronizing instruction.

This makes sure that the tests follow this requirement
when attempting to update a pkey's access rights.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
---
 tools/testing/selftests/powerpc/include/reg.h        | 6 ++++++
 tools/testing/selftests/powerpc/ptrace/core-pkey.c   | 2 +-
 tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 2 +-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/include/reg.h b/tools/testing/selftests/powerpc/include/reg.h
index 022c5076b2c5..c0f2742a3a59 100644
--- a/tools/testing/selftests/powerpc/include/reg.h
+++ b/tools/testing/selftests/powerpc/include/reg.h
@@ -57,6 +57,12 @@
 #define SPRN_PPR       896	/* Program Priority Register */
 #define SPRN_AMR       13	/* Authority Mask Register - problem state */
 
+#define set_amr(v)	asm volatile("isync;" \
+				     "mtspr " __stringify(SPRN_AMR) ",%0;" \
+				     "isync" : \
+				    : "r" ((unsigned long)(v)) \
+				    : "memory")
+
 /* TEXASR register bits */
 #define TEXASR_FC	0xFE00000000000000
 #define TEXASR_FP	0x0100000000000000
diff --git a/tools/testing/selftests/powerpc/ptrace/core-pkey.c b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
index d5c64fee032d..bbc05ffc5860 100644
--- a/tools/testing/selftests/powerpc/ptrace/core-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
@@ -150,7 +150,7 @@ static int child(struct shared_info *info)
 	printf("%-30s AMR: %016lx pkey1: %d pkey2: %d pkey3: %d\n",
 	       user_write, info->amr, pkey1, pkey2, pkey3);
 
-	mtspr(SPRN_AMR, info->amr);
+	set_amr(info->amr);
 
 	/*
 	 * We won't use pkey3. This tests whether the kernel restores the UAMOR
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index bdbbbe8431e0..904c04f8c919 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -126,7 +126,7 @@ static int child(struct shared_info *info)
 	printf("%-30s AMR: %016lx pkey1: %d pkey2: %d pkey3: %d\n",
 	       user_write, info->amr1, pkey1, pkey2, pkey3);
 
-	mtspr(SPRN_AMR, info->amr1);
+	set_amr(info->amr1);
 
 	/* Wait for parent to read our AMR value and write a new one. */
 	ret = prod_parent(&info->child_sync);
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 3/3] selftests: powerpc: Add test for execute-disabled pkeys
From: Sandipan Das @ 2020-06-04 12:56 UTC (permalink / raw)
  To: mpe
  Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
	linuxppc-dev, bauerman
In-Reply-To: <20200604125610.649668-1-sandipan@linux.ibm.com>

Apart from read and write access, memory protection keys can
also be used for restricting execute permission of pages on
powerpc. This adds a test to verify if the feature works as
expected.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
---
 tools/testing/selftests/powerpc/mm/.gitignore |   1 +
 tools/testing/selftests/powerpc/mm/Makefile   |   3 +-
 .../selftests/powerpc/mm/pkey_exec_prot.c     | 388 ++++++++++++++++++
 3 files changed, 391 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c

diff --git a/tools/testing/selftests/powerpc/mm/.gitignore b/tools/testing/selftests/powerpc/mm/.gitignore
index 2ca523255b1b..8f841f925baa 100644
--- a/tools/testing/selftests/powerpc/mm/.gitignore
+++ b/tools/testing/selftests/powerpc/mm/.gitignore
@@ -8,3 +8,4 @@ wild_bctr
 large_vm_fork_separation
 bad_accesses
 tlbie_test
+pkey_exec_prot
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index 2389bf791fd6..f9fa0ba7435c 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -3,7 +3,7 @@ noarg:
 	$(MAKE) -C ../
 
 TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors wild_bctr \
-		  large_vm_fork_separation bad_accesses
+		  large_vm_fork_separation bad_accesses pkey_exec_prot
 TEST_GEN_PROGS_EXTENDED := tlbie_test
 TEST_GEN_FILES := tempfile
 
@@ -17,6 +17,7 @@ $(OUTPUT)/prot_sao: ../utils.c
 $(OUTPUT)/wild_bctr: CFLAGS += -m64
 $(OUTPUT)/large_vm_fork_separation: CFLAGS += -m64
 $(OUTPUT)/bad_accesses: CFLAGS += -m64
+$(OUTPUT)/pkey_exec_prot: CFLAGS += -m64
 
 $(OUTPUT)/tempfile:
 	dd if=/dev/zero of=$@ bs=64k count=1
diff --git a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
new file mode 100644
index 000000000000..7c7c93425c5e
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
@@ -0,0 +1,388 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/*
+ * Copyright 2020, Sandipan Das, IBM Corp.
+ *
+ * Test if applying execute protection on pages using memory
+ * protection keys works as expected.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include "reg.h"
+#include "utils.h"
+
+/*
+ * Older versions of libc use the Intel-specific access rights.
+ * Hence, override the definitions as they might be incorrect.
+ */
+#undef PKEY_DISABLE_ACCESS
+#define PKEY_DISABLE_ACCESS	0x3
+
+#undef PKEY_DISABLE_WRITE
+#define PKEY_DISABLE_WRITE	0x2
+
+#undef PKEY_DISABLE_EXECUTE
+#define PKEY_DISABLE_EXECUTE	0x4
+
+/* Older versions of libc do not not define this */
+#ifndef SEGV_PKUERR
+#define SEGV_PKUERR	4
+#endif
+
+#define SI_PKEY_OFFSET	0x20
+
+#define SYS_pkey_mprotect	386
+#define SYS_pkey_alloc		384
+#define SYS_pkey_free		385
+
+#define PKEY_BITS_PER_PKEY	2
+#define NR_PKEYS		32
+#define PKEY_BITS_MASK		((1UL << PKEY_BITS_PER_PKEY) - 1)
+
+#define PPC_INST_NOP	0x60000000
+#define PPC_INST_TRAP	0x7fe00008
+#define PPC_INST_BLR	0x4e800020
+
+#define sigsafe_err(msg)	({ \
+		ssize_t nbytes __attribute__((unused)); \
+		nbytes = write(STDERR_FILENO, msg, strlen(msg)); })
+
+static inline unsigned long pkeyreg_get(void)
+{
+	return mfspr(SPRN_AMR);
+}
+
+static inline void pkeyreg_set(unsigned long amr)
+{
+	set_amr(amr);
+}
+
+static void pkey_set_rights(int pkey, unsigned long rights)
+{
+	unsigned long amr, shift;
+
+	shift = (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY;
+	amr = pkeyreg_get();
+	amr &= ~(PKEY_BITS_MASK << shift);
+	amr |= (rights & PKEY_BITS_MASK) << shift;
+	pkeyreg_set(amr);
+}
+
+static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
+{
+	return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
+}
+
+static int sys_pkey_alloc(unsigned long flags, unsigned long rights)
+{
+	return syscall(SYS_pkey_alloc, flags, rights);
+}
+
+static int sys_pkey_free(int pkey)
+{
+	return syscall(SYS_pkey_free, pkey);
+}
+
+static volatile sig_atomic_t fault_pkey, fault_code, fault_type;
+static volatile sig_atomic_t remaining_faults;
+static volatile unsigned int *fault_addr;
+static unsigned long pgsize, numinsns;
+static unsigned int *insns;
+
+static void trap_handler(int signum, siginfo_t *sinfo, void *ctx)
+{
+	/* Check if this fault originated from the expected address */
+	if (sinfo->si_addr != (void *) fault_addr)
+		sigsafe_err("got a fault for an unexpected address\n");
+
+	_exit(1);
+}
+
+static void segv_handler(int signum, siginfo_t *sinfo, void *ctx)
+{
+	int signal_pkey;
+
+	/*
+	 * In older versions of libc, siginfo_t does not have si_pkey as
+	 * a member.
+	 */
+#ifdef si_pkey
+	signal_pkey = sinfo->si_pkey;
+#else
+	signal_pkey = *((int *)(((char *) sinfo) + SI_PKEY_OFFSET));
+#endif
+
+	fault_code = sinfo->si_code;
+
+	/* Check if this fault originated from the expected address */
+	if (sinfo->si_addr != (void *) fault_addr) {
+		sigsafe_err("got a fault for an unexpected address\n");
+		_exit(1);
+	}
+
+	/* Check if too many faults have occurred for a single test case */
+	if (!remaining_faults) {
+		sigsafe_err("got too many faults for the same address\n");
+		_exit(1);
+	}
+
+
+	/* Restore permissions in order to continue */
+	switch (fault_code) {
+	case SEGV_ACCERR:
+		if (mprotect(insns, pgsize, PROT_READ | PROT_WRITE)) {
+			sigsafe_err("failed to set access permissions\n");
+			_exit(1);
+		}
+		break;
+	case SEGV_PKUERR:
+		if (signal_pkey != fault_pkey) {
+			sigsafe_err("got a fault for an unexpected pkey\n");
+			_exit(1);
+		}
+
+		switch (fault_type) {
+		case PKEY_DISABLE_ACCESS:
+			pkey_set_rights(fault_pkey, 0);
+			break;
+		case PKEY_DISABLE_EXECUTE:
+			/*
+			 * Reassociate the exec-only pkey with the region
+			 * to be able to continue. Unlike AMR, we cannot
+			 * set IAMR directly from userspace to restore the
+			 * permissions.
+			 */
+			if (mprotect(insns, pgsize, PROT_EXEC)) {
+				sigsafe_err("failed to set execute permissions\n");
+				_exit(1);
+			}
+			break;
+		default:
+			sigsafe_err("got a fault with an unexpected type\n");
+			_exit(1);
+		}
+		break;
+	default:
+		sigsafe_err("got a fault with an unexpected code\n");
+		_exit(1);
+	}
+
+	remaining_faults--;
+}
+
+static int pkeys_unsupported(void)
+{
+	bool hash_mmu = false;
+	int pkey;
+
+	/* Protection keys are currently supported on Hash MMU only */
+	FAIL_IF(using_hash_mmu(&hash_mmu));
+	SKIP_IF(!hash_mmu);
+
+	/* Check if the system call is supported */
+	pkey = sys_pkey_alloc(0, 0);
+	SKIP_IF(pkey < 0);
+	sys_pkey_free(pkey);
+
+	return 0;
+}
+
+static int test(void)
+{
+	struct sigaction segv_act, trap_act;
+	int pkey, ret, i;
+
+	ret = pkeys_unsupported();
+	if (ret)
+		return ret;
+
+	/* Setup SIGSEGV handler */
+	segv_act.sa_handler = 0;
+	segv_act.sa_sigaction = segv_handler;
+	FAIL_IF(sigprocmask(SIG_SETMASK, 0, &segv_act.sa_mask) != 0);
+	segv_act.sa_flags = SA_SIGINFO;
+	segv_act.sa_restorer = 0;
+	FAIL_IF(sigaction(SIGSEGV, &segv_act, NULL) != 0);
+
+	/* Setup SIGTRAP handler */
+	trap_act.sa_handler = 0;
+	trap_act.sa_sigaction = trap_handler;
+	FAIL_IF(sigprocmask(SIG_SETMASK, 0, &trap_act.sa_mask) != 0);
+	trap_act.sa_flags = SA_SIGINFO;
+	trap_act.sa_restorer = 0;
+	FAIL_IF(sigaction(SIGTRAP, &trap_act, NULL) != 0);
+
+	/* Setup executable region */
+	pgsize = getpagesize();
+	numinsns = pgsize / sizeof(unsigned int);
+	insns = (unsigned int *) mmap(NULL, pgsize, PROT_READ | PROT_WRITE,
+				      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	FAIL_IF(insns == MAP_FAILED);
+
+	/* Write the instruction words */
+	for (i = 1; i < numinsns - 1; i++)
+		insns[i] = PPC_INST_NOP;
+
+	/*
+	 * Set the first instruction as an unconditional trap. If
+	 * the last write to this address succeeds, this should
+	 * get overwritten by a no-op.
+	 */
+	insns[0] = PPC_INST_TRAP;
+
+	/*
+	 * Later, to jump to the executable region, we use a branch
+	 * and link instruction (bctrl) which sets the return address
+	 * automatically in LR. Use that to return back.
+	 */
+	insns[numinsns - 1] = PPC_INST_BLR;
+
+	/* Allocate a pkey that restricts execution */
+	pkey = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
+	FAIL_IF(pkey < 0);
+
+	/*
+	 * Pick the first instruction's address from the executable
+	 * region.
+	 */
+	fault_addr = insns;
+
+	/* The following two cases will avoid SEGV_PKUERR */
+	fault_type = -1;
+	fault_pkey = -1;
+
+	/*
+	 * Read an instruction word from the address when AMR bits
+	 * are not set i.e. the pkey permits both read and write
+	 * access.
+	 *
+	 * This should not generate a fault as having PROT_EXEC
+	 * implies PROT_READ on GNU systems. The pkey currently
+	 * restricts execution only based on the IAMR bits. The
+	 * AMR bits are cleared.
+	 */
+	remaining_faults = 0;
+	FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+	printf("read from %p, pkey is execute-disabled, access-enabled\n",
+	       (void *) fault_addr);
+	i = *fault_addr;
+	FAIL_IF(remaining_faults != 0);
+
+	/*
+	 * Write an instruction word to the address when AMR bits
+	 * are not set i.e. the pkey permits both read and write
+	 * access.
+	 *
+	 * This should generate an access fault as having just
+	 * PROT_EXEC also restricts writes. The pkey currently
+	 * restricts execution only based on the IAMR bits. The
+	 * AMR bits are cleared.
+	 */
+	remaining_faults = 1;
+	FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+	printf("write to %p, pkey is execute-disabled, access-enabled\n",
+	       (void *) fault_addr);
+	*fault_addr = PPC_INST_TRAP;
+	FAIL_IF(remaining_faults != 0 || fault_code != SEGV_ACCERR);
+
+	/* The following three cases will generate SEGV_PKUERR */
+	fault_type = PKEY_DISABLE_ACCESS;
+	fault_pkey = pkey;
+
+	/*
+	 * Read an instruction word from the address when AMR bits
+	 * are set i.e. the pkey permits neither read nor write
+	 * access.
+	 *
+	 * This should generate a pkey fault based on AMR bits only
+	 * as having PROT_EXEC implicitly allows reads.
+	 */
+	remaining_faults = 1;
+	FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+	printf("read from %p, pkey is execute-disabled, access-disabled\n",
+	       (void *) fault_addr);
+	pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
+	i = *fault_addr;
+	FAIL_IF(remaining_faults != 0 || fault_code != SEGV_PKUERR);
+
+	/*
+	 * Write an instruction word to the address when AMR bits
+	 * are set i.e. the pkey permits neither read nor write
+	 * access.
+	 *
+	 * This should generate two faults. First, a pkey fault
+	 * based on AMR bits and then an access fault since
+	 * PROT_EXEC does not allow writes.
+	 */
+	remaining_faults = 2;
+	FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+	printf("write to %p, pkey is execute-disabled, access-disabled\n",
+	       (void *) fault_addr);
+	pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
+	*fault_addr = PPC_INST_NOP;
+	FAIL_IF(remaining_faults != 0 || fault_code != SEGV_ACCERR);
+
+	/*
+	 * Jump to the executable region when AMR bits are set i.e.
+	 * the pkey permits neither read nor write access.
+	 *
+	 * This should generate a pkey fault based on IAMR bits which
+	 * are set to not permit execution. AMR bits should not affect
+	 * execution.
+	 *
+	 * This also checks if the overwrite of the first instruction
+	 * word from a trap to a no-op succeeded.
+	 */
+	fault_addr = insns;
+	fault_type = PKEY_DISABLE_EXECUTE;
+	fault_pkey = pkey;
+	remaining_faults = 1;
+	FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+	pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
+	printf("execute at %p, pkey is execute-disabled, access-disabled\n",
+	       (void *) fault_addr);
+	asm volatile("mtctr	%0; bctrl" : : "r"(insns));
+	FAIL_IF(remaining_faults != 0 || fault_code != SEGV_PKUERR);
+
+	/*
+	 * Free the current pkey and allocate a new one that is
+	 * fully permissive.
+	 */
+	sys_pkey_free(pkey);
+	pkey = sys_pkey_alloc(0, 0);
+
+	/*
+	 * Jump to the executable region when AMR bits are not set
+	 * i.e. the pkey permits read and write access.
+	 *
+	 * This should not generate any faults as the IAMR bits are
+	 * also not set and hence will the pkey will not restrict
+	 * execution.
+	 */
+	fault_pkey = pkey;
+	remaining_faults = 0;
+	FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+	printf("execute at %p, pkey is execute-enabled, access-enabled\n",
+	       (void *) fault_addr);
+	asm volatile("mtctr	%0; bctrl" : : "r"(insns));
+	FAIL_IF(remaining_faults != 0);
+
+	/* Cleanup */
+	munmap((void *) insns, pgsize);
+	sys_pkey_free(pkey);
+
+	return 0;
+}
+
+int main(void)
+{
+	test_harness(test, "pkey_exec_prot");
+}
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 2/3] selftests: powerpc: Move Hash MMU check to utilities
From: Sandipan Das @ 2020-06-04 12:56 UTC (permalink / raw)
  To: mpe
  Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
	linuxppc-dev, bauerman
In-Reply-To: <20200604125610.649668-1-sandipan@linux.ibm.com>

This moves a function to test if the MMU is in Hash mode
under the generic test utilities.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
---
 .../testing/selftests/powerpc/include/utils.h |  1 +
 tools/testing/selftests/powerpc/mm/Makefile   |  2 +-
 .../selftests/powerpc/mm/bad_accesses.c       | 28 -------------------
 tools/testing/selftests/powerpc/utils.c       | 28 +++++++++++++++++++
 4 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h
index e089a0c30d9a..ad2728736ae5 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -60,6 +60,7 @@ static inline bool have_hwcap2(unsigned long ftr2)
 #endif
 
 bool is_ppc64le(void);
+int using_hash_mmu(bool *using_hash);
 
 /* Yes, this is evil */
 #define FAIL_IF(x)						\
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index b9103c4bb414..2389bf791fd6 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -10,7 +10,7 @@ TEST_GEN_FILES := tempfile
 top_srcdir = ../../../../..
 include ../../lib.mk
 
-$(TEST_GEN_PROGS): ../harness.c
+$(TEST_GEN_PROGS): ../harness.c ../utils.c
 
 $(OUTPUT)/prot_sao: ../utils.c
 
diff --git a/tools/testing/selftests/powerpc/mm/bad_accesses.c b/tools/testing/selftests/powerpc/mm/bad_accesses.c
index adc465f499ef..a864ed7e2008 100644
--- a/tools/testing/selftests/powerpc/mm/bad_accesses.c
+++ b/tools/testing/selftests/powerpc/mm/bad_accesses.c
@@ -64,34 +64,6 @@ int bad_access(char *p, bool write)
 	return 0;
 }
 
-static int using_hash_mmu(bool *using_hash)
-{
-	char line[128];
-	FILE *f;
-	int rc;
-
-	f = fopen("/proc/cpuinfo", "r");
-	FAIL_IF(!f);
-
-	rc = 0;
-	while (fgets(line, sizeof(line), f) != NULL) {
-		if (strcmp(line, "MMU		: Hash\n") == 0) {
-			*using_hash = true;
-			goto out;
-		}
-
-		if (strcmp(line, "MMU		: Radix\n") == 0) {
-			*using_hash = false;
-			goto out;
-		}
-	}
-
-	rc = -1;
-out:
-	fclose(f);
-	return rc;
-}
-
 static int test(void)
 {
 	unsigned long i, j, addr, region_shift, page_shift, page_size;
diff --git a/tools/testing/selftests/powerpc/utils.c b/tools/testing/selftests/powerpc/utils.c
index 5ee0e98c4896..933678f1ed0a 100644
--- a/tools/testing/selftests/powerpc/utils.c
+++ b/tools/testing/selftests/powerpc/utils.c
@@ -293,3 +293,31 @@ void set_dscr(unsigned long val)
 
 	asm volatile("mtspr %1,%0" : : "r" (val), "i" (SPRN_DSCR));
 }
+
+int using_hash_mmu(bool *using_hash)
+{
+	char line[128];
+	FILE *f;
+	int rc;
+
+	f = fopen("/proc/cpuinfo", "r");
+	FAIL_IF(!f);
+
+	rc = 0;
+	while (fgets(line, sizeof(line), f) != NULL) {
+		if (strcmp(line, "MMU		: Hash\n") == 0) {
+			*using_hash = true;
+			goto out;
+		}
+
+		if (strcmp(line, "MMU		: Radix\n") == 0) {
+			*using_hash = false;
+			goto out;
+		}
+	}
+
+	rc = -1;
+out:
+	fclose(f);
+	return rc;
+}
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 0/3] selftests: powerpc: Fixes and execute-disable test for pkeys
From: Sandipan Das @ 2020-06-04 12:56 UTC (permalink / raw)
  To: mpe
  Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
	linuxppc-dev, bauerman

This fixes the way the Authority Mask Register (AMR) is updated
by the existing pkey tests and adds a new test to verify the
functionality of execute-disabled pkeys.

Previous versions can be found at:
v2: https://lore.kernel.org/linuxppc-dev/20200527030342.13712-1-sandipan@linux.ibm.com/
v1: https://lore.kernel.org/linuxppc-dev/20200508162332.65316-1-sandipan@linux.ibm.com/

Changes in v3:
- Fixed AMR writes for existing pkey tests (new patch).
- Moved Hash MMU check under utilities (new patch) and removed duplicate
  code.
- Fixed comments on why the pkey permission bits were redefined.
- Switched to existing mfspr() macro for reading AMR.
- Switched to sig_atomic_t as data type for variables updated in the
  signal handlers.
- Switched to exit()-ing if the signal handlers come across an unexpected
  condition instead of trying to reset page and pkey permissions.
- Switched to write() from printf() for printing error messages from
  the signal handlers.
- Switched to getpagesize().
- Renamed fault counter to denote remaining faults.
- Dropped unnecessary randomization for choosing an address to fault at.
- Added additional information on change in permissions due to AMR and
  IAMR bits in comments.
- Switched the first instruction word of the executable region to a trap
  to test if it is actually overwritten by a no-op later.
- Added an new test scenario where the pkey imposes no restrictions and
  an attempt is made to jump to the executable region again.

Changes in v2:
- Added .gitignore entry for test binary.
- Fixed builds for older distros where siginfo_t might not have si_pkey as
  a formal member based on discussion with Michael.

Sandipan Das (3):
  selftests: powerpc: Fix pkey access right updates
  selftests: powerpc: Move Hash MMU check to utilities
  selftests: powerpc: Add test for execute-disabled pkeys

 tools/testing/selftests/powerpc/include/reg.h |   6 +
 .../testing/selftests/powerpc/include/utils.h |   1 +
 tools/testing/selftests/powerpc/mm/.gitignore |   1 +
 tools/testing/selftests/powerpc/mm/Makefile   |   5 +-
 .../selftests/powerpc/mm/bad_accesses.c       |  28 --
 .../selftests/powerpc/mm/pkey_exec_prot.c     | 388 ++++++++++++++++++
 .../selftests/powerpc/ptrace/core-pkey.c      |   2 +-
 .../selftests/powerpc/ptrace/ptrace-pkey.c    |   2 +-
 tools/testing/selftests/powerpc/utils.c       |  28 ++
 9 files changed, 429 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c

-- 
2.25.1


^ permalink raw reply

* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Michael Ellerman @ 2020-06-04 12:43 UTC (permalink / raw)
  To: Stephen Rothwell, Andrew Morton
  Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604184501.1ea5ba36@canb.auug.org.au>

Stephen Rothwell <sfr@canb.auug.org.au> writes:
> Hi all,
>
> On Thu, 4 Jun 2020 16:52:46 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>
>> diff --git a/arch/powerpc/mm/kasan/8xx.c b/arch/powerpc/mm/kasan/8xx.c
>> index db4ef44af22f..569d98a41881 100644
>> --- a/arch/powerpc/mm/kasan/8xx.c
>> +++ b/arch/powerpc/mm/kasan/8xx.c
>> @@ -10,7 +10,7 @@
>>  static int __init
>>  kasan_init_shadow_8M(unsigned long k_start, unsigned long k_end, void *block)
>>  {
>> -	pmd_t *pmd = pmd_ptr_k(k_start);
>> +	pmd_t *pmd = pmd_off_k(k_start);
>>  	unsigned long k_cur, k_next;
>>  
>>  	for (k_cur = k_start; k_cur != k_end; k_cur = k_next, pmd += 2, block += SZ_8M) {
>> @@ -59,7 +59,7 @@ int __init kasan_init_region(void *start, size_t size)
>>  		return ret;
>>  
>>  	for (; k_cur < k_end; k_cur += PAGE_SIZE) {
>> -		pmd_t *pmd = pmd_ptr_k(k_cur);
>> +		pmd_t *pmd = pmd_off_k(k_cur);
>>  		void *va = block + k_cur - k_start;
>>  		pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
>>  
>> diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
>> index 4bc491a4a1fd..a32b4640b9de 100644
>> --- a/arch/powerpc/mm/kasan/book3s_32.c
>> +++ b/arch/powerpc/mm/kasan/book3s_32.c
>> @@ -46,7 +46,7 @@ int __init kasan_init_region(void *start, size_t size)
>>  	kasan_update_early_region(k_start, k_cur, __pte(0));
>>  
>>  	for (; k_cur < k_end; k_cur += PAGE_SIZE) {
>> -		pmd_t *pmd = pmd_ptr_k(k_cur);
>> +		pmd_t *pmd = pmd_off_k(k_cur);
>>  		void *va = block + k_cur - k_start;
>>  		pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
>>  
>> diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
>> index 286441bbbe49..92e8929cbe3e 100644
>> --- a/arch/powerpc/mm/nohash/8xx.c
>> +++ b/arch/powerpc/mm/nohash/8xx.c
>> @@ -74,7 +74,7 @@ static pte_t __init *early_hugepd_alloc_kernel(hugepd_t *pmdp, unsigned long va)
>>  static int __ref __early_map_kernel_hugepage(unsigned long va, phys_addr_t pa,
>>  					     pgprot_t prot, int psize, bool new)
>>  {
>> -	pmd_t *pmdp = pmd_ptr_k(va);
>> +	pmd_t *pmdp = pmd_off_k(va);
>>  	pte_t *ptep;
>>  
>>  	if (WARN_ON(psize != MMU_PAGE_512K && psize != MMU_PAGE_8M))
>> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
>> index 45a0556089e8..1136257c3a99 100644
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -264,7 +264,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>>  #if defined(CONFIG_PPC_8xx)
>>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
>>  {
>> -	pmd_t *pmd = pmd_ptr(mm, addr);
>> +	pmd_t *pmd = pmd_off(mm, addr);
>>  	pte_basic_t val;
>>  	pte_basic_t *entry = &ptep->pte;
>>  	int num = is_hugepd(*((hugepd_t *)pmd)) ? 1 : SZ_512K / SZ_4K;
>> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
>> index e2d054c9575e..6eb4eab79385 100644
>> --- a/arch/powerpc/mm/pgtable_32.c
>> +++ b/arch/powerpc/mm/pgtable_32.c
>> @@ -40,7 +40,7 @@ notrace void __init early_ioremap_init(void)
>>  {
>>  	unsigned long addr = ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE);
>>  	pte_t *ptep = (pte_t *)early_fixmap_pagetable;
>> -	pmd_t *pmdp = pmd_ptr_k(addr);
>> +	pmd_t *pmdp = pmd_off_k(addr);
>>  
>>  	for (; (s32)(FIXADDR_TOP - addr) > 0;
>>  	     addr += PGDIR_SIZE, ptep += PTRS_PER_PTE, pmdp++)
>
> I have added the above hunks as to linux-next for tomorrow as a fix for
> mm-pgtable-add-shortcuts-for-accessing-kernel-pmd-and-pte.

Looks good. Thanks.

cheers

^ permalink raw reply

* Re: linux-next: build failure on powerpc 8xx with 16k pages
From: Peter Zijlstra @ 2020-06-04 12:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: Stephen Rothwell, Linux Kernel Mailing List,
	Linux Next Mailing List, Andrew Morton, PowerPC, Thomas Gleixner
In-Reply-To: <20200604111723.GA1267@willie-the-truck>

On Thu, Jun 04, 2020 at 12:17:23PM +0100, Will Deacon wrote:
> Hi, [+Peter]
> 
> On Thu, Jun 04, 2020 at 10:48:03AM +0000, Christophe Leroy wrote:
> > Using mpc885_ads_defconfig with CONFIG_PPC_16K_PAGES instead of
> > CONFIG_PPC_4K_PAGES, getting the following build failure:
> > 
> >   CC      mm/gup.o
> > In file included from ./include/linux/kernel.h:11:0,
> >                  from mm/gup.c:2:
> > In function 'gup_hugepte.constprop',
> >     inlined from 'gup_huge_pd.isra.78' at mm/gup.c:2465:8:
> > ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_257'
> > declared with attribute error: Unsupported access size for
> > {READ,WRITE}_ONCE().
> >   _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> >                                       ^
> > ./include/linux/compiler.h:373:4: note: in definition of macro
> > '__compiletime_assert'
> >     prefix ## suffix();    \
> >     ^
> > ./include/linux/compiler.h:392:2: note: in expansion of macro
> > '_compiletime_assert'
> >   _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> >   ^
> > ./include/linux/compiler.h:405:2: note: in expansion of macro
> > 'compiletime_assert'
> >   compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
> >   ^
> > ./include/linux/compiler.h:291:2: note: in expansion of macro
> > 'compiletime_assert_rwonce_type'
> >   compiletime_assert_rwonce_type(x);    \
> >   ^
> > mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
> >   pte = READ_ONCE(*ptep);
> >         ^
> > In function 'gup_get_pte',
> >     inlined from 'gup_pte_range' at mm/gup.c:2228:9,
> >     inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
> >     inlined from 'gup_pud_range' at mm/gup.c:2641:15,
> >     inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
> >     inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
> >     inlined from 'internal_get_user_pages_fast' at mm/gup.c:2785:3:
> 
> At first glance, this looks like a real bug in the 16k page code -- you're
> loading the pte non-atomically on the fast GUP path and so you're prone to
> tearing, which probably isn't what you want. For a short-term hack, I'd
> suggest having CONFIG_HAVE_FAST_GUP depend on !CONFIG_PPC_16K_PAGES, but if
> you want to support this them you'll need to rework your pte_t so that it
> can be loaded atomically.

Looking at commit 55c8fc3f49302, they're all the exact same value, so
what they could do is grow another special gup_get_pte() variant that
just loads the first value.

Also, per that very same commit, there's a distinct lack of WRITE_ONCE()
in the pte_update() / __set_pte_at() paths for much of Power.

^ permalink raw reply

* Re: [PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics
From: Amit Kucheria @ 2020-06-04 11:59 UTC (permalink / raw)
  To: David Rientjes
  Cc: Emanuele Giuseppe Esposito, linux-s390, kvm, David Hildenbrand,
	Cornelia Huck, Emanuele Giuseppe Esposito, LKML, kvm-ppc,
	Jonathan Adams, Christian Borntraeger, Alexander Viro,
	linux-fsdevel, Paolo Bonzini, Vitaly Kuznetsov, open list:MIPS,
	linuxppc-dev, Jim Mattson
In-Reply-To: <alpine.DEB.2.22.394.2005041429210.224786@chino.kir.corp.google.com>

On Tue, May 5, 2020 at 3:07 AM David Rientjes <rientjes@google.com> wrote:
>
> On Mon, 4 May 2020, Emanuele Giuseppe Esposito wrote:
>
> > There is currently no common way for Linux kernel subsystems to expose
> > statistics to userspace shared throughout the Linux kernel; subsystems
> > have to take care of gathering and displaying statistics by themselves,
> > for example in the form of files in debugfs. For example KVM has its own
> > code section that takes care of this in virt/kvm/kvm_main.c, where it sets
> > up debugfs handlers for displaying values and aggregating them from
> > various subfolders to obtain information about the system state (i.e.
> > displaying the total number of exits, calculated by summing all exits of
> > all cpus of all running virtual machines).
> >
> > Allowing each section of the kernel to do so has two disadvantages. First,
> > it will introduce redundant code. Second, debugfs is anyway not the right
> > place for statistics (for example it is affected by lockdown)
> >
> > In this patch series I introduce statsfs, a synthetic ram-based virtual
> > filesystem that takes care of gathering and displaying statistics for the
> > Linux kernel subsystems.
> >
>
> This is exciting, we have been looking in the same area recently.  Adding
> Jonathan Adams <jwadams@google.com>.
>
> In your diffstat, one thing I notice that is omitted: an update to
> Documentation/* :)  Any chance of getting some proposed Documentation/
> updates with structure of the fs, the per subsystem breakdown, and best
> practices for managing the stats from the kernel level?
>
> > The file system is mounted on /sys/kernel/stats and would be already used
> > by kvm. Statsfs was initially introduced by Paolo Bonzini [1].
> >
> > Statsfs offers a generic and stable API, allowing any kind of
> > directory/file organization and supporting multiple kind of aggregations
> > (not only sum, but also average, max, min and count_zero) and data types
> > (all unsigned and signed types plus boolean). The implementation, which is
> > a generalization of KVM’s debugfs statistics code, takes care of gathering
> > and displaying information at run time; users only need to specify the
> > values to be included in each source.
> >
> > Statsfs would also be a different mountpoint from debugfs, and would not
> > suffer from limited access due to the security lock down patches. Its main
> > function is to display each statistics as a file in the desired folder
> > hierarchy defined through the API. Statsfs files can be read, and possibly
> > cleared if their file mode allows it.
> >
> > Statsfs has two main components: the public API defined by
> > include/linux/statsfs.h, and the virtual file system which should end up
> > in /sys/kernel/stats.
> >
> > The API has two main elements, values and sources. Kernel subsystems like
> > KVM can use the API to create a source, add child
> > sources/values/aggregates and register it to the root source (that on the
> > virtual fs would be /sys/kernel/statsfs).
> >
> > Sources are created via statsfs_source_create(), and each source becomes a
> > directory in the file system. Sources form a parent-child relationship;
> > root sources are added to the file system via statsfs_source_register().
> > Every other source is added to or removed from a parent through the
> > statsfs_source_add_subordinate and statsfs_source_remote_subordinate APIs.
> > Once a source is created and added to the tree (via add_subordinate), it
> > will be used to compute aggregate values in the parent source.
> >
> > Values represent quantites that are gathered by the statsfs user. Examples
> > of values include the number of vm exits of a given kind, the amount of
> > memory used by some data structure, the length of the longest hash table
> > chain, or anything like that. Values are defined with the
> > statsfs_source_add_values function. Each value is defined by a struct
> > statsfs_value; the same statsfs_value can be added to many different
> > sources. A value can be considered "simple" if it fetches data from a
> > user-provided location, or "aggregate" if it groups all values in the
> > subordinates sources that include the same statsfs_value.
> >
>
> This seems like it could have a lot of overhead if we wanted to
> periodically track the totality of subsystem stats as a form of telemetry
> gathering from userspace.  To collect telemetry for 1,000 different stats,
> do we need to issue lseek()+read() syscalls for each of them individually
> (or, worse, open()+read()+close())?
>
> Any thoughts on how that can be optimized?  A couple of ideas:
>
>  - an interface that allows gathering of all stats for a particular
>    interface through a single file that would likely be encoded in binary
>    and the responsibility of userspace to disseminate, or
>
>  - an interface that extends beyond this proposal and allows the reader to
>    specify which stats they are interested in collecting and then the
>    kernel will only provide these stats in a well formed structure and
>    also be binary encoded.

Something akin to how ftrace allows you specify the list of functions
in /sys/kernel/debug/tracing/set_ftrace_filter would make this a lot
easier to use than the one-file-per-stat interface.

That would be useful, e.g. in capturing correlated stats periodically
e.g. scheduler, power and thermal stats

> We've found that the one-file-per-stat method is pretty much a show
> stopper from the performance view and we always must execute at least two
> syscalls to obtain a single stat.
>
> Since this is becoming a generic API (good!!), maybe we can discuss
> possible ways to optimize gathering of stats in mass?
>
> > For more information, please consult the kerneldoc documentation in patch
> > 2 and the sample uses in the kunit tests and in KVM.
> >
> > This series of patches is based on my previous series "libfs: group and
> > simplify linux fs code" and the single patch sent to kvm "kvm_host: unify
> > VM_STAT and VCPU_STAT definitions in a single place". The former
> > simplifies code duplicated in debugfs and tracefs (from which statsfs is
> > based on), the latter groups all macros definition for statistics in kvm
> > in a single common file shared by all architectures.
> >
> > Patch 1 adds a new refcount and kref destructor wrappers that take a
> > semaphore, as those are used later by statsfs. Patch 2 introduces the
> > statsfs API, patch 3 provides extensive tests that can also be used as
> > example on how to use the API and patch 4 adds the file system support.
> > Finally, patch 5 provides a real-life example of statsfs usage in KVM.
> >
> > [1] https://lore.kernel.org/kvm/5d6cdcb1-d8ad-7ae6-7351-3544e2fa366d@redhat.com/?fbclid=IwAR18LHJ0PBcXcDaLzILFhHsl3qpT3z2vlG60RnqgbpGYhDv7L43n0ZXJY8M
> >
> > Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> >
> > v1->v2 remove unnecessary list_foreach_safe loops, fix wrong indentation,
> > change statsfs in stats_fs
> >
> > Emanuele Giuseppe Esposito (5):
> >   refcount, kref: add dec-and-test wrappers for rw_semaphores
> >   stats_fs API: create, add and remove stats_fs sources and values
> >   kunit: tests for stats_fs API
> >   stats_fs fs: virtual fs to show stats to the end-user
> >   kvm_main: replace debugfs with stats_fs
> >
> >  MAINTAINERS                     |    7 +
> >  arch/arm64/kvm/Kconfig          |    1 +
> >  arch/arm64/kvm/guest.c          |    2 +-
> >  arch/mips/kvm/Kconfig           |    1 +
> >  arch/mips/kvm/mips.c            |    2 +-
> >  arch/powerpc/kvm/Kconfig        |    1 +
> >  arch/powerpc/kvm/book3s.c       |    6 +-
> >  arch/powerpc/kvm/booke.c        |    8 +-
> >  arch/s390/kvm/Kconfig           |    1 +
> >  arch/s390/kvm/kvm-s390.c        |   16 +-
> >  arch/x86/include/asm/kvm_host.h |    2 +-
> >  arch/x86/kvm/Kconfig            |    1 +
> >  arch/x86/kvm/Makefile           |    2 +-
> >  arch/x86/kvm/debugfs.c          |   64 --
> >  arch/x86/kvm/stats_fs.c         |   56 ++
> >  arch/x86/kvm/x86.c              |    6 +-
> >  fs/Kconfig                      |   12 +
> >  fs/Makefile                     |    1 +
> >  fs/stats_fs/Makefile            |    6 +
> >  fs/stats_fs/inode.c             |  337 ++++++++++
> >  fs/stats_fs/internal.h          |   35 +
> >  fs/stats_fs/stats_fs-tests.c    | 1088 +++++++++++++++++++++++++++++++
> >  fs/stats_fs/stats_fs.c          |  773 ++++++++++++++++++++++
> >  include/linux/kref.h            |   11 +
> >  include/linux/kvm_host.h        |   39 +-
> >  include/linux/refcount.h        |    2 +
> >  include/linux/stats_fs.h        |  304 +++++++++
> >  include/uapi/linux/magic.h      |    1 +
> >  lib/refcount.c                  |   32 +
> >  tools/lib/api/fs/fs.c           |   21 +
> >  virt/kvm/arm/arm.c              |    2 +-
> >  virt/kvm/kvm_main.c             |  314 ++-------
> >  32 files changed, 2772 insertions(+), 382 deletions(-)
> >  delete mode 100644 arch/x86/kvm/debugfs.c
> >  create mode 100644 arch/x86/kvm/stats_fs.c
> >  create mode 100644 fs/stats_fs/Makefile
> >  create mode 100644 fs/stats_fs/inode.c
> >  create mode 100644 fs/stats_fs/internal.h
> >  create mode 100644 fs/stats_fs/stats_fs-tests.c
> >  create mode 100644 fs/stats_fs/stats_fs.c
> >  create mode 100644 include/linux/stats_fs.h
> >
> > --
> > 2.25.2
> >
> >

^ permalink raw reply

* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Michael Ellerman @ 2020-06-04 11:28 UTC (permalink / raw)
  To: Stephen Rothwell, Andrew Morton
  Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604183805.6f384b23@canb.auug.org.au>

Stephen Rothwell <sfr@canb.auug.org.au> writes:
> Hi all,
>
> On Thu, 4 Jun 2020 16:52:46 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 25c3cb8272c0..a6799723cd98 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -1008,6 +1008,12 @@ extern struct page *p4d_page(p4d_t p4d);
>>  #define pud_page_vaddr(pud)	__va(pud_val(pud) & ~PUD_MASKED_BITS)
>>  #define p4d_page_vaddr(p4d)	__va(p4d_val(p4d) & ~P4D_MASKED_BITS)
>>  
>> +static inline unsigned long pgd_index(unsigned long address)
>> +{
>> +	return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
>> +}
>> +#define pgd_index pgd_index
>> +
>>  #define pte_ERROR(e) \
>>  	pr_err("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, pte_val(e))
>>  #define pmd_ERROR(e) \
>
> I have added that hunk to linux-next for tomorrow as a fix for
> mm-consolidate-pgd_index-and-pgd_offset_k-definitions.
>
> Its not strickly necessary, but Michael expressed a preference for the
> inline function.

That was because we just recently converted it into a static inline to
avoid UBSAN warnings:

commit c2e929b18cea6cbf71364f22d742d9aad7f4677a
Author:     Qian Cai <cai@lca.pw>
AuthorDate: Thu Mar 5 23:48:52 2020 -0500

    powerpc/64s/pgtable: fix an undefined behaviour

    Booting a power9 server with hash MMU could trigger an undefined
    behaviour because pud_offset(p4d, 0) will do,

    0 >> (PAGE_SHIFT:16 + PTE_INDEX_SIZE:8 + H_PMD_INDEX_SIZE:10)

    Fix it by converting pud_index() and friends to static inline
    functions.

    UBSAN: shift-out-of-bounds in arch/powerpc/mm/ptdump/ptdump.c:282:15
    shift exponent 34 is too large for 32-bit type 'int'
    CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200303+ #13
    Call Trace:
    dump_stack+0xf4/0x164 (unreliable)
    ubsan_epilogue+0x18/0x78
    __ubsan_handle_shift_out_of_bounds+0x160/0x21c
    walk_pagetables+0x2cc/0x700
    walk_pud at arch/powerpc/mm/ptdump/ptdump.c:282
    (inlined by) walk_pagetables at arch/powerpc/mm/ptdump/ptdump.c:311
    ptdump_check_wx+0x8c/0xf0
    mark_rodata_ro+0x48/0x80
    kernel_init+0x74/0x194
    ret_from_kernel_thread+0x5c/0x74


> I was wondering if pgd_index "Must be a compile-time
> constant" on one (or a few) architectures, then why not leave the
> default as an inline function and special case it as a macro where
> needed ...

AIUI that requirement comes from x86 which has:

#define KERNEL_PGD_BOUNDARY	pgd_index(PAGE_OFFSET)
#define KERNEL_PGD_PTRS		(PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)
...
#define MAX_PREALLOCATED_USER_PMDS KERNEL_PGD_PTRS
...
pgd_t *pgd_alloc(struct mm_struct *mm)
{
	pgd_t *pgd;
	pmd_t *u_pmds[MAX_PREALLOCATED_USER_PMDS];


Which will produce a variable length array if pgd_index() isn't a
compile-time constant.

cheers

^ permalink raw reply

* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Stephen Rothwell @ 2020-06-04 11:22 UTC (permalink / raw)
  To: Andrew Morton, Michael Ellerman
  Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604174925.3610fdd1@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 2123 bytes --]

Hi all,

On Thu, 4 Jun 2020 17:49:25 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> diff --cc arch/powerpc/include/asm/nohash/32/pgtable.h
> index 639f3b3713ec,eb8538c85077..1927e1b653f2
> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
> @@@ -342,15 -334,6 +337,10 @@@ static inline int pte_young(pte_t pte
>   	pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
>   #endif
>   
> - /* Find an entry in the third-level page table.. */
> - #define pte_index(address)		\
> - 	(((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
>  +#define pte_offset_kernel(dir, addr)	\
>  +	(pmd_bad(*(dir)) ? NULL : (pte_t *)pmd_page_vaddr(*(dir)) + \
>  +				  pte_index(addr))
> - #define pte_offset_map(dir, addr)	pte_offset_kernel((dir), (addr))
> - static inline void pte_unmap(pte_t *pte) { }
>  +
>   /*
>    * Encode and decode a swap entry.
>    * Note that the bits we use in a PTE for representing a swap entry

I have added this hunk (sort of - see below) to linux-next for tomorrow
as a fix for mm-consolidate-pte_index-and-pte_offset_-definitions.

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Thu, 4 Jun 2020 21:16:19 +1000
Subject: [PATCH] mm-consolidate-pte_index-and-pte_offset_-definitions-fix

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index c188a6f64bcd..d94bcd117c5b 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -341,6 +341,10 @@ static inline int pte_young(pte_t pte)
 	pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
 #endif
 
+#define pte_offset_kernel(dir, addr)	\
+	(pmd_bad(*(dir)) ? NULL : (pte_t *)pmd_page_vaddr(*(dir)) + \
+				  pte_index(addr))
+
 /*
  * Encode and decode a swap entry.
  * Note that the bits we use in a PTE for representing a swap entry
-- 
2.27.0.rc2

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related

* Re: linux-next: build failure on powerpc 8xx with 16k pages
From: Will Deacon @ 2020-06-04 11:17 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Stephen Rothwell, peterz, Linux Kernel Mailing List,
	Linux Next Mailing List, Andrew Morton, PowerPC, Thomas Gleixner
In-Reply-To: <dc2b16e1-b719-5500-508d-ae97bf50c4a6@csgroup.eu>

Hi, [+Peter]

On Thu, Jun 04, 2020 at 10:48:03AM +0000, Christophe Leroy wrote:
> Using mpc885_ads_defconfig with CONFIG_PPC_16K_PAGES instead of
> CONFIG_PPC_4K_PAGES, getting the following build failure:
> 
>   CC      mm/gup.o
> In file included from ./include/linux/kernel.h:11:0,
>                  from mm/gup.c:2:
> In function 'gup_hugepte.constprop',
>     inlined from 'gup_huge_pd.isra.78' at mm/gup.c:2465:8:
> ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_257'
> declared with attribute error: Unsupported access size for
> {READ,WRITE}_ONCE().
>   _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>                                       ^
> ./include/linux/compiler.h:373:4: note: in definition of macro
> '__compiletime_assert'
>     prefix ## suffix();    \
>     ^
> ./include/linux/compiler.h:392:2: note: in expansion of macro
> '_compiletime_assert'
>   _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>   ^
> ./include/linux/compiler.h:405:2: note: in expansion of macro
> 'compiletime_assert'
>   compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
>   ^
> ./include/linux/compiler.h:291:2: note: in expansion of macro
> 'compiletime_assert_rwonce_type'
>   compiletime_assert_rwonce_type(x);    \
>   ^
> mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
>   pte = READ_ONCE(*ptep);
>         ^
> In function 'gup_get_pte',
>     inlined from 'gup_pte_range' at mm/gup.c:2228:9,
>     inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
>     inlined from 'gup_pud_range' at mm/gup.c:2641:15,
>     inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
>     inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
>     inlined from 'internal_get_user_pages_fast' at mm/gup.c:2785:3:

At first glance, this looks like a real bug in the 16k page code -- you're
loading the pte non-atomically on the fast GUP path and so you're prone to
tearing, which probably isn't what you want. For a short-term hack, I'd
suggest having CONFIG_HAVE_FAST_GUP depend on !CONFIG_PPC_16K_PAGES, but if
you want to support this them you'll need to rework your pte_t so that it
can be loaded atomically.

Will

^ permalink raw reply

* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Stephen Rothwell @ 2020-06-04 11:08 UTC (permalink / raw)
  To: Andrew Morton, Michael Ellerman
  Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604174925.3610fdd1@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]

Hi all,

On Thu, 4 Jun 2020 17:49:25 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> diff --cc arch/powerpc/include/asm/nohash/32/pgtable.h
> index 639f3b3713ec,eb8538c85077..1927e1b653f2
> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
> @@@ -204,13 -205,6 +205,9 @@@ static inline void pmd_clear(pmd_t *pmd
>   	*pmdp = __pmd(0);
>   }
>   
> - 
> - /* to find an entry in a kernel page-table-directory */
> - #define pgd_offset_k(address) pgd_offset(&init_mm, address)
> - 
>  +/* to find an entry in a page-table-directory */
>  +#define pgd_index(address)	 ((address) >> PGDIR_SHIFT)
>  +#define pgd_offset(mm, address)	 ((mm)->pgd + pgd_index(address))
>   
>   /*
>    * PTE updates. This function is called whenever an existing
> @@@ -240,7 -234,7 +237,7 @@@ static inline pte_basic_t pte_update(st
>   	pte_basic_t old = pte_val(*p);
>   	pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
>   	int num, i;
> --	pmd_t *pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
> ++	pmd_t *pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, addr), addr), addr), addr);
>   
>   	if (!huge)
>   		num = PAGE_SIZE / SZ_4K;

I have added those hunks (more or less) to linux-next for tomorrow as a
fix for mm-consolidate-pgd_index-and-pgd_offset_k-definitions.
-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox