LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH 12/27] m68k: syscalls: switch to generic syscalltbl.sh
From: Geert Uytterhoeven @ 2021-01-28  8:17 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linux-Arch, open list:TENSILICA XTENSA PORT (xtensa),
	linux-ia64@vger.kernel.org, Parisc List, linux-kbuild,
	Linux-sh list, the arch/x86 maintainers, linux-um,
	Linux Kernel Mailing List, open list:BROADCOM NVRAM DRIVER,
	linux-m68k, alpha, sparclinux, linuxppc-dev, Linux ARM
In-Reply-To: <20210128005110.2613902-13-masahiroy@kernel.org>

Hi Yamada-san,

On Thu, Jan 28, 2021 at 1:54 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> As of v5.11-rc1, 12 architectures duplicate similar shell scripts in
> order to generate syscall table headers. My goal is to unify them into
> the single scripts/syscalltbl.sh.
>
> This commit converts m68k to use scripts/syscalltbl.sh.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Thanks a lot!

Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH 02/27] x86/syscalls: fix -Wmissing-prototypes warnings from COND_SYSCALL()
From: Sergei Shtylyov @ 2021-01-28  8:00 UTC (permalink / raw)
  To: Masahiro Yamada, linux-arch, x86
  Cc: linux-xtensa, linux-ia64, linux-parisc, linux-kbuild, linux-sh,
	linux-um, linux-kernel, linux-mips, linux-m68k, linux-alpha,
	sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <dd37a7f2-55e1-2e96-0c93-4a40980b8ef2@gmail.com>

On 28.01.2021 10:59, Sergei Shtylyov wrote:

[...]
>> Building kernel/sys_ni.c with W=1 omits tons of -Wmissing-prototypes
> 
>     Emits?
> 
>> warnings.
>>
>> $ make W=1 kernel/sys_ni.o
>>    [ snip ]
>>    CC      kernel/sys_ni.o
>> In file included from kernel/sys_ni.c:10:
>> ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous 
>> prototype for '__x64_sys_io_setup' [-Wmissing-prototypes]
>>     83 |  __weak long __##abi##_##name(const struct pt_regs *__unused) \
>>        |              ^~
>> ./arch/x86/include/asm/syscall_wrapper.h:100:2: note: in expansion of macro 
>> '__COND_SYSCALL'
>>    100 |  __COND_SYSCALL(x64, sys_##name)
>>        |  ^~~~~~~~~~~~~~
>> ./arch/x86/include/asm/syscall_wrapper.h:256:2: note: in expansion of macro 
>> '__X64_COND_SYSCALL'
>>    256 |  __X64_COND_SYSCALL(name)     \
>>        |  ^~~~~~~~~~~~~~~~~~
>> kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
>>     39 | COND_SYSCALL(io_setup);
>>        | ^~~~~~~~~~~~
>> ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous 
>> prototype for '__ia32_sys_io_setup' [-Wmissing-prototypes]
>>     83 |  __weak long __##abi##_##name(const struct pt_regs *__unused) \
>>        |              ^~
>> ./arch/x86/include/asm/syscall_wrapper.h:120:2: note: in expansion of macro 
>> '__COND_SYSCALL'
>>    120 |  __COND_SYSCALL(ia32, sys_##name)
>>        |  ^~~~~~~~~~~~~~
>> ./arch/x86/include/asm/syscall_wrapper.h:257:2: note: in expansion of macro 
>> '__IA32_COND_SYSCALL'
>>    257 |  __IA32_COND_SYSCALL(name)
>>        |  ^~~~~~~~~~~~~~~~~~~
>> kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
>>     39 | COND_SYSCALL(io_setup);
>>        | ^~~~~~~~~~~~
>>    ...
>>
>> __SYS_STUB0() and __SYS_STUBx() defined a few lines above have forward
>> declarations. Let's do likewise for __COND_SYSCALL() to fix the
>> warnings.
>>
>> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
>> ---
>>
>>   arch/x86/include/asm/syscall_wrapper.h | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/include/asm/syscall_wrapper.h 
>> b/arch/x86/include/asm/syscall_wrapper.h
>> index a84333adeef2..80c08c7d5e72 100644
>> --- a/arch/x86/include/asm/syscall_wrapper.h
>> +++ b/arch/x86/include/asm/syscall_wrapper.h
>> @@ -80,6 +80,7 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs 
>> *regs);
>>       }
>>   #define __COND_SYSCALL(abi, name)                    \
>> +    __weak long __##abi##_##name(const struct pt_regs *__unused);    \
>>       __weak long __##abi##_##name(const struct pt_regs *__unused)    \
> 
>     Aren't these two lines identical?

     Ah, got it! :-)
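
For anyone else tripping over the same thing: the two lines differ only in
the trailing semicolon. The added line is a forward declaration (a
prototype), and the following line opens the definition, so
-Wmissing-prototypes finds a previous prototype. A minimal userspace sketch
of the pattern, assuming GCC or Clang; COND_STUB and the generated function
names are hypothetical stand-ins, not the kernel macro:

```c
#include <errno.h>

/*
 * Hypothetical userspace analogue of __COND_SYSCALL(); not the kernel macro.
 * The first line of the expansion ends in ';' and is only a prototype.
 * The second line opens the weak definition that the prototype announces.
 */
#define COND_STUB(abi, name)						\
	__attribute__((weak)) long abi##_##name(const void *unused);	\
	__attribute__((weak)) long abi##_##name(const void *unused)	\
	{								\
		(void)unused;						\
		return -ENOSYS;	/* "not implemented" fallback */	\
	}

/* Each use expands to a prototype followed by a weak definition. */
COND_STUB(x64, sys_io_setup)
COND_STUB(ia32, sys_io_setup)
```

Dropping the first line of the macro body should bring the warning back when
building with -Wmissing-prototypes.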

[...]

MBR, Sergei


* Re: [PATCH 02/27] x86/syscalls: fix -Wmissing-prototypes warnings from COND_SYSCALL()
From: Sergei Shtylyov @ 2021-01-28  7:59 UTC (permalink / raw)
  To: Masahiro Yamada, linux-arch, x86
  Cc: linux-xtensa, linux-ia64, linux-parisc, linux-kbuild, linux-sh,
	linux-um, linux-kernel, linux-mips, linux-m68k, linux-alpha,
	sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210128005110.2613902-3-masahiroy@kernel.org>

Hello!

On 28.01.2021 3:50, Masahiro Yamada wrote:

> Building kernel/sys_ni.c with W=1 omits tons of -Wmissing-prototypes

    Emits?

> warnings.
> 
> $ make W=1 kernel/sys_ni.o
>    [ snip ]
>    CC      kernel/sys_ni.o
> In file included from kernel/sys_ni.c:10:
> ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous prototype for '__x64_sys_io_setup' [-Wmissing-prototypes]
>     83 |  __weak long __##abi##_##name(const struct pt_regs *__unused) \
>        |              ^~
> ./arch/x86/include/asm/syscall_wrapper.h:100:2: note: in expansion of macro '__COND_SYSCALL'
>    100 |  __COND_SYSCALL(x64, sys_##name)
>        |  ^~~~~~~~~~~~~~
> ./arch/x86/include/asm/syscall_wrapper.h:256:2: note: in expansion of macro '__X64_COND_SYSCALL'
>    256 |  __X64_COND_SYSCALL(name)     \
>        |  ^~~~~~~~~~~~~~~~~~
> kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
>     39 | COND_SYSCALL(io_setup);
>        | ^~~~~~~~~~~~
> ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous prototype for '__ia32_sys_io_setup' [-Wmissing-prototypes]
>     83 |  __weak long __##abi##_##name(const struct pt_regs *__unused) \
>        |              ^~
> ./arch/x86/include/asm/syscall_wrapper.h:120:2: note: in expansion of macro '__COND_SYSCALL'
>    120 |  __COND_SYSCALL(ia32, sys_##name)
>        |  ^~~~~~~~~~~~~~
> ./arch/x86/include/asm/syscall_wrapper.h:257:2: note: in expansion of macro '__IA32_COND_SYSCALL'
>    257 |  __IA32_COND_SYSCALL(name)
>        |  ^~~~~~~~~~~~~~~~~~~
> kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
>     39 | COND_SYSCALL(io_setup);
>        | ^~~~~~~~~~~~
>    ...
> 
> __SYS_STUB0() and __SYS_STUBx() defined a few lines above have forward
> declarations. Let's do likewise for __COND_SYSCALL() to fix the
> warnings.
> 
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
> 
>   arch/x86/include/asm/syscall_wrapper.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
> index a84333adeef2..80c08c7d5e72 100644
> --- a/arch/x86/include/asm/syscall_wrapper.h
> +++ b/arch/x86/include/asm/syscall_wrapper.h
> @@ -80,6 +80,7 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
>   	}
>   
>   #define __COND_SYSCALL(abi, name)					\
> +	__weak long __##abi##_##name(const struct pt_regs *__unused);	\
>   	__weak long __##abi##_##name(const struct pt_regs *__unused)	\

    Aren't these two lines identical?

[...]

MBR, Sergei


* Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()
From: Geert Uytterhoeven @ 2021-01-28  9:25 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Roy Zang, Lorenzo Pieralisi, PCI, LKML, Minghuan Lian,
	Michael Walle, linux-arm-kernel, Greg Kroah-Hartman,
	Bjorn Helgaas, linuxppc-dev, Mingkai Hu
In-Reply-To: <CAGETcx_81qOe2LvX-J_PBZWdouykPoPYdf5=yMVhnjgDxAkgaw@mail.gmail.com>

Hi Saravana,

On Wed, Jan 27, 2021 at 6:11 PM Saravana Kannan <saravanak@google.com> wrote:
> On Wed, Jan 27, 2021 at 8:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, Jan 27, 2021 at 5:42 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Tue, Jan 26, 2021 at 11:43 PM Geert Uytterhoeven
> > > <geert@linux-m68k.org> wrote:
> > > > On Wed, Jan 27, 2021 at 1:44 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Tue, Jan 26, 2021 at 12:50 AM Geert Uytterhoeven
> > > > > <geert@linux-m68k.org> wrote:
> > > > > > On Mon, Jan 25, 2021 at 11:42 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > On Mon, Jan 25, 2021 at 11:49 AM Michael Walle <michael@walle.cc> wrote:
> > > > > > > > Am 2021-01-21 12:01, schrieb Geert Uytterhoeven:
> > > > > > > > > On Thu, Jan 21, 2021 at 1:05 AM Saravana Kannan <saravanak@google.com>
> > > > > > > > > wrote:
> > > > > > > > >> On Wed, Jan 20, 2021 at 3:53 PM Michael Walle <michael@walle.cc>
> > > > > > > > >> wrote:
> > > > > > > > >> > Am 2021-01-20 20:47, schrieb Saravana Kannan:
> > > > > > > > >> > > On Wed, Jan 20, 2021 at 11:28 AM Michael Walle <michael@walle.cc>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > >>
> > > > > > > > >> > >> [RESEND, fat-fingered the buttons of my mail client and converted
> > > > > > > > >> > >> all CCs to BCCs :(]
> > > > > > > > >> > >>
> > > > > > > > >> > >> Am 2021-01-20 20:02, schrieb Saravana Kannan:
> > > > > > > > >> > >> > On Wed, Jan 20, 2021 at 6:24 AM Rob Herring <robh@kernel.org> wrote:
> > > > > > > > >> > >> >>
> > > > > > > > >> > >> >> On Wed, Jan 20, 2021 at 4:53 AM Michael Walle <michael@walle.cc>
> > > > > > > > >> > >> >> wrote:
> > > > > > > > >> > >> >> >
> > > > > > > > >> > >> >> > fw_devlink will defer the probe until all suppliers are ready. We can't
> > > > > > > > >> > >> >> > use builtin_platform_driver_probe() because it doesn't retry after probe
> > > > > > > > >> > >> >> > deferral. Convert it to builtin_platform_driver().
> > > > > > > > >> > >> >>
> > > > > > > > >> > >> >> If builtin_platform_driver_probe() doesn't work with fw_devlink, then
> > > > > > > > >> > >> >> shouldn't it be fixed or removed?
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > I was actually thinking about this too. The problem with fixing
> > > > > > > > >> > >> > builtin_platform_driver_probe() to behave like
> > > > > > > > >> > >> > builtin_platform_driver() is that these probe functions could be
> > > > > > > > >> > >> > marked with __init. But there are also only 20 instances of
> > > > > > > > >> > >> > builtin_platform_driver_probe() in the kernel:
> > > > > > > > >> > >> > $ git grep ^builtin_platform_driver_probe | wc -l
> > > > > > > > >> > >> > 20
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > So it might be easier to just fix them to not use
> > > > > > > > >> > >> > builtin_platform_driver_probe().
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > Michael,
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > Any chance you'd be willing to help me by converting all these to
> > > > > > > > >> > >> > builtin_platform_driver() and delete builtin_platform_driver_probe()?
> > > > > > > > >> > >>
> > > > > > > > > >> > >> If it's just moving the probe function to the _driver struct and
> > > > > > > > > >> > >> removing the __init annotations, I could look into that.
> > > > > > > > >> > >
> > > > > > > > >> > > Yup. That's pretty much it AFAICT.
> > > > > > > > >> > >
> > > > > > > > >> > > builtin_platform_driver_probe() also makes sure the driver doesn't ask
> > > > > > > > >> > > for async probe, etc. But I doubt anyone is actually setting async
> > > > > > > > >> > > flags and still using builtin_platform_driver_probe().
> > > > > > > > >> >
> > > > > > > > > >> > Doesn't module_platform_driver_probe() have the same problem? And there
> > > > > > > > > >> > are ~80 drivers which use that.
> > > > > > > > > >>
> > > > > > > > > >> Yeah. The biggest problem with all of these is the __init markers.
> > > > > > > > > >> Maybe someone familiar with coccinelle can help?
> > > > > > > > >
> > > > > > > > > And dropping them will increase memory usage.
> > > > > > > >
> > > > > > > > Although I do have the changes for the builtin_platform_driver_probe()
> > > > > > > > ready, I don't think it makes much sense to send these unless we agree
> > > > > > > > on the increased memory footprint. While there are just a few
> > > > > > > > builtin_platform_driver_probe() and memory increase _might_ be
> > > > > > > > negligible, there are many more module_platform_driver_probe().
> > > > > > >
> > > > > > > While it's good to drop code that'll not be used past kernel init, the
> > > > > > > module_platform_driver_probe() is going even more extreme. It doesn't
> > > > > > > even allow deferred probe (well before kernel init is done). I don't
> > > > > > > think that behavior is right and that's why we should delete it. Also,
> > > > > >
> > > > > > This construct is typically used for builtin hardware for which the
> > > > > > dependencies are registered very early, and thus known to probe at
> > > > > > first try (if present).
> > > > > >
> > > > > > > I doubt if any of these probe functions even take up 4KB of memory.
> > > > > >
> > > > > > How many 4 KiB pages do you have in a system with 10 MiB of SRAM?
> > > > > > How many can you afford to waste?
> > > > >
> > > > > There are only a few instances of this macro in the kernel. How many
> > > >
> > > > $ git grep -lw builtin_platform_driver_probe | wc -l
> > > > 21
> > > > $ git grep -lw module_platform_driver_probe | wc -l
> > > > 86
> > > >
> > > > + the ones that haven't been converted to the above yet:
> > > >
> > > > $ git grep -lw platform_driver_probe | wc -l
> > > > 58
> > > >
> > >
> > > Yeah, this adds up in terms of the number of places we'd need to fix.
> > > But thinking more about it, a couple of points:
> > > 1. Not all builtin_platform_driver_probe() are problems for
> > > fw_devlink. So we can just fix them as we go if we need to.
> > >
> > > 2. The problem with builtin_platform_driver_probe() isn't really with
> > > the use of __init. It's the fact that it doesn't allow deferred
> > > probes. builtin_platform_driver_probe()/platform_driver_probe() could
> > > still be fixed up to allow deferred probe until we get to the point
> > > where we free the __init section (so at least till late_initcall).
> >
> > That's intentional: it is used for cases that will (must) never be deferred.
> > That's why it's safe to use __init.
>
> So was the usage of builtin_platform_driver_probe() wrong in the
> driver Michael fixed? Because, deferring and probing again clearly
> works?

It wasn't.  The regression is that the driver no longer probes at first
try, because its dependencies are now probed later.  The question is:
why are the dependencies now probed later?  How to fix that?

> Also, "must never be deferred" seems like a weird condition to
> enforce. I think the real "rule" is that if it defers, the platform is
> not expected to work. But disallowing a probe reattempt seems weird.
> What is it going to hurt if it's attempted again? At worst it fails
> one more time?

"must never be deferred" is not the real condition, but "must not be
probed after __init is freed" is (one of them; there may be others, cf.
the last paragraph below).  The simplest way to guarantee that is to
probe the driver immediately, and only once.

> Also, I'd argue that all/most of the "can't defer, but I'm still a
> proper struct device" cases are all just patchwork to deal with the
> fact we were playing initcall chicken when there was no fw_devlink.
> I'm hoping we can move people away from that mindset. And the first

I agree, partially.  Still, how come the dependencies are no longer
probed before their consumers when fw_devlinks are enabled?
I thought fw_devlinks is supposed to avoid exactly that?

> step towards that would be to allow *platform_probe() to allow
> deferred probes until late_initcall().

At what increase in complexity, to keep track of which drivers can and
cannot be reprobed after late_initcall?
Still, many of these drivers use platform_driver_probe() early for a
reason: because they need to initialize the hardware early. Probing them
later may introduce subtle bugs.
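
To make the constraint concrete, here is a toy userspace model of the two
registration styles. It is only a sketch under stated assumptions, not the
driver core: every toy_* name is hypothetical, and 517 is assumed to be the
kernel-internal value of EPROBE_DEFER. The probe-once path forgets the probe
pointer after the first attempt (as if the __init text were about to be
freed), so a later retry has nothing to call:

```c
#include <stddef.h>

#define TOY_EPROBE_DEFER 517	/* assumed kernel-internal errno value */

struct toy_driver {
	int (*probe)(void);	/* NULL once the probe code is gone */
	int bound;		/* 1 after a successful probe */
};

int toy_supplier_ready;		/* 0 until the dependency has probed */

/* Hypothetical probe: succeeds only once its supplier is ready. */
int toy_probe(void)
{
	return toy_supplier_ready ? 0 : -TOY_EPROBE_DEFER;
}

/* Probe-once style: one attempt, then the probe pointer is dropped. */
int toy_register_probe_once(struct toy_driver *drv, int (*probe)(void))
{
	drv->bound = (probe() == 0);
	drv->probe = NULL;	/* probe may live in __init: never call again */
	return drv->bound ? 0 : -1;
}

/* Regular style: the probe pointer stays valid, so retries are possible. */
int toy_register(struct toy_driver *drv, int (*probe)(void))
{
	drv->probe = probe;
	drv->bound = (probe() == 0);
	return 0;
}

/* Deferred-probe retry: only works if the probe pointer was kept. */
int toy_retry_deferred(struct toy_driver *drv)
{
	if (!drv->bound && drv->probe)
		drv->bound = (drv->probe() == 0);
	return drv->bound;
}
```

If the supplier becomes ready only after registration, the regular driver
binds on retry while the probe-once driver stays unbound, which is the
regression pattern discussed in this thread.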

Gr{oetje,eeting}s,

                        Geert



* Re: [PATCH v6 08/39] powerpc: rearrange do_page_fault error case to be inside exception_enter
From: Christophe Leroy @ 2021-01-28  9:25 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210115165012.1260253-9-npiggin@gmail.com>



Le 15/01/2021 à 17:49, Nicholas Piggin a écrit :
> This keeps the context tracking over the entire interrupt handler which
> helps later with moving context tracking into interrupt wrappers.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/mm/fault.c | 28 ++++++++++++++++------------
>   1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index e476d7701413..e4121fd9fcf1 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -544,20 +544,24 @@ NOKPROBE_SYMBOL(__do_page_fault);
>   
>   long do_page_fault(struct pt_regs *regs)
>   {
> -	const struct exception_table_entry *entry;
> -	enum ctx_state prev_state = exception_enter();
> -	int rc = __do_page_fault(regs, regs->dar, regs->dsisr);
> -	exception_exit(prev_state);
> -	if (likely(!rc))
> -		return 0;
> -
> -	entry = search_exception_tables(regs->nip);
> -	if (unlikely(!entry))
> -		return rc;

Could we keep this layout by using a 'goto' to the end of the function, instead of pushing the
error handling to the right?

Because at the end of the series, once all context tracking has moved into helpers, the result
looks unfriendly.

It would look cleaner as:

static long __do_page_fault(struct pt_regs *regs)
{
	long err;
	const struct exception_table_entry *entry;

	err = ___do_page_fault(regs, regs->dar, regs->dsisr);
	if (likely(!err))
		return 0;

	entry = search_exception_tables(regs->nip);
	if (likely(entry)) {
		instruction_pointer_set(regs, extable_fixup(entry));
		return 0;
	} else if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
		/* 32 and 64e handle this in asm */
		return err;
	}
	__bad_page_fault(regs, err);
	return 0;
}
NOKPROBE_SYMBOL(__do_page_fault);



> +	enum ctx_state prev_state;
> +	long err;
> +
> +	prev_state = exception_enter();
> +	err = __do_page_fault(regs, regs->dar, regs->dsisr);
> +	if (unlikely(err)) {
> +		const struct exception_table_entry *entry;
> +
> +		entry = search_exception_tables(regs->nip);
> +		if (likely(entry)) {
> +			instruction_pointer_set(regs, extable_fixup(entry));
> +			err = 0;
> +		}
> +	}
>   
> -	instruction_pointer_set(regs, extable_fixup(entry));
> +	exception_exit(prev_state);
>   
> -	return 0;
> +	return err;
>   }
>   NOKPROBE_SYMBOL(do_page_fault);
>   
> 


* Re: [PATCH v4 2/2] powerpc/mce: Remove per cpu variables from MCE handlers
From: Ganesh @ 2021-01-28  9:27 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, mpe; +Cc: mahesh, npiggin
In-Reply-To: <90f24b44-1747-21f4-3829-7af20cf95e46@csgroup.eu>

On 1/25/21 2:54 PM, Christophe Leroy wrote:

>
>
> Le 22/01/2021 à 13:32, Ganesh Goudar a écrit :
>> Access to per-cpu variables requires translation to be enabled on
>> pseries machines running in hash MMU mode. Since part of the MCE handler
>> runs in real mode, and part of the MCE handling code is shared between
>> the pseries and powernv platforms, it becomes difficult to manage
>> these variables differently on each platform. So keep these variables
>> in the paca instead of having them as per-cpu variables, to avoid
>> complications.
>>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> ---
>> v2: Dynamically allocate memory for machine check event info
>>
>> v3: Remove check for hash mmu lpar, use memblock_alloc_try_nid
>>      to allocate memory.
>>
>> v4: Splitting the patch into two.
>> ---
>>   arch/powerpc/include/asm/mce.h     | 18 +++++++
>>   arch/powerpc/include/asm/paca.h    |  4 ++
>>   arch/powerpc/kernel/mce.c          | 79 ++++++++++++++++++------------
>>   arch/powerpc/kernel/setup-common.c |  2 +-
>>   4 files changed, 70 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/setup-common.c 
>> b/arch/powerpc/kernel/setup-common.c
>> index 71f38e9248be..17dc451f0e45 100644
>> --- a/arch/powerpc/kernel/setup-common.c
>> +++ b/arch/powerpc/kernel/setup-common.c
>> @@ -916,7 +916,6 @@ void __init setup_arch(char **cmdline_p)
>>       /* On BookE, setup per-core TLB data structures. */
>>       setup_tlb_core_data();
>>   #endif
>> -
>
> Is this line removal really required for this patch?
I will correct it, Thanks for catching.
>
>>       /* Print various info about the machine that has been gathered 
>> so far. */
>>       print_system_info();
>>   @@ -938,6 +937,7 @@ void __init setup_arch(char **cmdline_p)
>>       exc_lvl_early_init();
>>       emergency_stack_init();
>>   +    mce_init();
>
> You have to include mce.h to avoid build failure on PPC32.
Sure, thanks
>>       smp_release_cpus();
>>         initmem_init();
>>


* Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()
From: Tony Lindgren @ 2021-01-28 10:00 UTC (permalink / raw)
  To: Michael Walle
  Cc: Roy Zang, Lorenzo Pieralisi, Saravana Kannan, PCI, LKML,
	Kishon Vijay Abraham I, Minghuan Lian, Geert Uytterhoeven,
	Mingkai Hu, Greg Kroah-Hartman, Bjorn Helgaas, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <a24391e62b107040435766fff52bdd31@walle.cc>

Hi,

* Michael Walle <michael@walle.cc> [210125 19:52]:
> Although I do have the changes for the builtin_platform_driver_probe()
> ready, I don't think it makes much sense to send these unless we agree
> on the increased memory footprint. While there are just a few
> builtin_platform_driver_probe() and memory increase _might_ be
> negligible, there are many more module_platform_driver_probe().

I just noticed this thread today and have pretty much come to the same
conclusions. No need to post a patch for pci-dra7xx.c, I already posted
a patch for pci-dra7xx.c yesterday as part of genpd related changes.

For me probing started breaking as the power-domains property got added.

FYI, it's the following patch:

[PATCH 01/15] PCI: pci-dra7xx: Prepare for deferred probe with module_platform_driver

Regards,

Tony




* Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()
From: Tony Lindgren @ 2021-01-28 10:35 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Roy Zang, Lorenzo Pieralisi, Saravana Kannan, PCI, LKML,
	Minghuan Lian, Michael Walle, Mingkai Hu, Greg Kroah-Hartman,
	Bjorn Helgaas, linuxppc-dev, linux-arm-kernel
In-Reply-To: <CAMuHMdVHouzMFiGcUz=0M0_RFL-OBvkRvQiF5h56XKDMZuC7Kg@mail.gmail.com>

* Geert Uytterhoeven <geert@linux-m68k.org> [210128 09:32]:
> It wasn't.  The regression is that the driver no longer probes at first
> try, because its dependencies are now probed later.  The question is:
> why are the dependencies now probed later?  How to fix that?

I'm afraid that may be unfixable. It depends on things like the bus
driver probe, which might also get deferred.

As suggested, I agree it's best to get rid of builtin_platform_driver_probe
where possible at the cost of dropping the __init as needed.

To me it seems we can't even add a warning to __platform_driver_probe()
if there's drv->driver.of_match_table for example. That warning would
show up on all the devices with driver in question built in even if
the device has no such hardware.

Regards,

Tony


* RE: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: David Laight @ 2021-01-28 10:38 UTC (permalink / raw)
  To: 'Christopher M. Riedl', linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20210128040424.12720-3-cmr@codefail.de>

From: Christopher M. Riedl
> Sent: 28 January 2021 04:04
> 
> Reuse the "safe" implementation from signal.c except for calling
> unsafe_copy_from_user() to copy into a local buffer.
> 
> Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
> ---
>  arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> index 2559a681536e..c18402d625f1 100644
> --- a/arch/powerpc/kernel/signal.h
> +++ b/arch/powerpc/kernel/signal.h
> @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
>  				&buf[i], label);\
>  } while (0)
> 
> +#define unsafe_copy_fpr_from_user(task, from, label)	do {		\
> +	struct task_struct *__t = task;					\
> +	u64 __user *__f = (u64 __user *)from;				\
> +	u64 buf[ELF_NFPREG];						\

How big is that buffer?
Isn't it likely to be reasonably large compared to a reasonable
kernel stack frame?
Especially since this isn't even a leaf function.

> +	int i;								\
> +									\
> +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\

That really ought to be sizeof(buf).
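
A userspace sketch of both points, with memcpy() standing in for
unsafe_copy_from_user(). The toy_* names are hypothetical, and TOY_NFPREG
assumes ELF_NFPREG is 33 on powerpc (32 FPRs plus fpscr), which would make
the on-stack buffer 33 * 8 = 264 bytes. Using sizeof(buf) keeps the copy
length tied to the buffer even if its element type or count changes later:

```c
#include <stdint.h>
#include <string.h>

#define TOY_NFPREG 33	/* assumption: mirrors ELF_NFPREG, fpscr included */

/* Hypothetical stand-in for the thread's FP state. */
struct toy_fp_state {
	uint64_t fpr[TOY_NFPREG - 1];
	uint64_t fpscr;
};

/* Copy a flat register image into the FP state via a local bounce buffer. */
void toy_copy_fpr(struct toy_fp_state *st, const void *from)
{
	uint64_t buf[TOY_NFPREG];	/* 264 bytes of stack under the assumption above */
	int i;

	/* The length is derived from the buffer itself, not from its element type. */
	memcpy(buf, from, sizeof(buf));

	for (i = 0; i < TOY_NFPREG - 1; i++)
		st->fpr[i] = buf[i];
	st->fpscr = buf[i];		/* the last slot holds fpscr */
}
```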

	David


> +				label);					\
> +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
> +		__t->thread.TS_FPR(i) = buf[i];				\
> +	__t->thread.fp_state.fpscr = buf[i];				\
> +} while (0)
> +
> +#define unsafe_copy_vsx_from_user(task, from, label)	do {		\
> +	struct task_struct *__t = task;					\
> +	u64 __user *__f = (u64 __user *)from;				\
> +	u64 buf[ELF_NVSRHALFREG];					\
> +	int i;								\
> +									\
> +	unsafe_copy_from_user(buf, __f,					\
> +				ELF_NVSRHALFREG * sizeof(double),	\
> +				label);					\
> +	for (i = 0; i < ELF_NVSRHALFREG ; i++)				\
> +		__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];	\
> +} while (0)
> +
> +
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  #define unsafe_copy_ckfpr_to_user(to, task, label)	do {		\
>  	struct task_struct *__t = task;					\
> @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
>  	unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,	\
>  			    ELF_NFPREG * sizeof(double), label)
> 
> +#define unsafe_copy_fpr_from_user(task, from, label)			\
> +	unsafe_copy_from_user((task)->thread.fp_state.fpr, from,	\
> +			    ELF_NFPREG * sizeof(double), label)
> +
>  static inline unsigned long
>  copy_fpr_to_user(void __user *to, struct task_struct *task)
>  {
> @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from)
>  #else
>  #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
> 
> +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0)
> +
>  static inline unsigned long
>  copy_fpr_to_user(void __user *to, struct task_struct *task)
>  {
> --
> 2.26.1




* [PATCH v5 1/2] powerpc/mce: Reduce the size of event arrays
From: Ganesh Goudar @ 2021-01-28 10:41 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh, npiggin

The maximum recursive depth of an MCE is 4. Considering this maximum
depth, reduce the size of the event arrays from 100 to 10. This saves
us ~19kB of memory and has no fatal consequences.

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
v4: This patch is a fragment of the original patch, which is
    split into two.

v5: No changes.
---
 arch/powerpc/include/asm/mce.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index e6c27ae843dc..7d8b6679ec68 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -204,7 +204,7 @@ struct mce_error_info {
 	bool			ignore_event;
 };
 
-#define MAX_MC_EVT	100
+#define MAX_MC_EVT	10
 
 /* Release flags for get_mce_event() */
 #define MCE_EVENT_RELEASE	true
-- 
2.26.2



* [PATCH v5 2/2] powerpc/mce: Remove per cpu variables from MCE handlers
From: Ganesh Goudar @ 2021-01-28 10:41 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh, npiggin
In-Reply-To: <20210128104143.70668-1-ganeshgr@linux.ibm.com>

Access to per-cpu variables requires translation to be enabled on
pseries machines running in hash MMU mode. Since part of the MCE handler
runs in real mode, and part of the MCE handling code is shared between
the pseries and powernv platforms, it becomes difficult to manage
these variables differently on each platform. So keep these variables
in the paca instead of having them as per-cpu variables, to avoid
complications.
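
As a rough illustration of the access pattern this moves to, here is a toy
userspace model: a per-CPU anchor structure holds a pointer to an event
block, and a nest counter selects the slot. All toy_* names are
hypothetical, and the early return for an empty stack is an assumption about
code outside the hunks shown here; this is a sketch, not the kernel
implementation.

```c
#define TOY_MAX_MC_EVT 10

struct toy_event {
	int in_use;
	long addr;
};

struct toy_mce_info {
	int nest_count;
	struct toy_event event[TOY_MAX_MC_EVT];
};

/* Stand-in for the paca: reached without per-cpu address translation. */
struct toy_paca {
	struct toy_mce_info *mce_info;
};

/* Claim the next slot; the counter may run past the array, the index may not. */
int toy_save_event(struct toy_paca *paca, long addr)
{
	int index = paca->mce_info->nest_count++;

	if (index >= TOY_MAX_MC_EVT)
		return -1;	/* no room to log; the count is still bumped */
	paca->mce_info->event[index].in_use = 1;
	paca->mce_info->event[index].addr = addr;
	return index;
}

/* Read the innermost event and optionally release its slot. */
int toy_get_event(struct toy_paca *paca, struct toy_event *out, int release)
{
	int index = paca->mce_info->nest_count - 1;
	int ret = 0;

	if (index < 0)
		return ret;	/* nothing saved (assumed sanity check) */
	if (index < TOY_MAX_MC_EVT) {
		if (out)
			*out = paca->mce_info->event[index];
		if (release)
			paca->mce_info->event[index].in_use = 0;
		ret = 1;
	}
	if (release)
		paca->mce_info->nest_count--;	/* free the slot */
	return ret;
}
```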

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
v2: Dynamically allocate memory for machine check event info.

v3: Remove check for hash mmu lpar, use memblock_alloc_try_nid
    to allocate memory.

v4: Splitting the patch into two.

v5: Fix build error for PPC32.
---
 arch/powerpc/include/asm/mce.h     | 18 +++++++
 arch/powerpc/include/asm/paca.h    |  4 ++
 arch/powerpc/kernel/mce.c          | 79 ++++++++++++++++++------------
 arch/powerpc/kernel/setup-common.c |  2 +
 4 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 7d8b6679ec68..331d944280b8 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -206,6 +206,17 @@ struct mce_error_info {
 
 #define MAX_MC_EVT	10
 
+struct mce_info {
+	int mce_nest_count;
+	struct machine_check_event mce_event[MAX_MC_EVT];
+	/* Queue for delayed MCE events. */
+	int mce_queue_count;
+	struct machine_check_event mce_event_queue[MAX_MC_EVT];
+	/* Queue for delayed MCE UE events. */
+	int mce_ue_count;
+	struct machine_check_event  mce_ue_event_queue[MAX_MC_EVT];
+};
+
 /* Release flags for get_mce_event() */
 #define MCE_EVENT_RELEASE	true
 #define MCE_EVENT_DONTRELEASE	false
@@ -234,4 +245,11 @@ long __machine_check_early_realmode_p8(struct pt_regs *regs);
 long __machine_check_early_realmode_p9(struct pt_regs *regs);
 long __machine_check_early_realmode_p10(struct pt_regs *regs);
 #endif /* CONFIG_PPC_BOOK3S_64 */
+
+#ifdef CONFIG_PPC_BOOK3S_64
+void mce_init(void);
+#else
+static inline void mce_init(void) { };
+#endif /* CONFIG_PPC_BOOK3S_64 */
+
 #endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9454d29ff4b4..38e0c55e845d 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -29,6 +29,7 @@
 #include <asm/hmi.h>
 #include <asm/cpuidle.h>
 #include <asm/atomic.h>
+#include <asm/mce.h>
 
 #include <asm-generic/mmiowb_types.h>
 
@@ -273,6 +274,9 @@ struct paca_struct {
 #ifdef CONFIG_MMIOWB
 	struct mmiowb_state mmiowb_state;
 #endif
+#ifdef CONFIG_PPC_BOOK3S_64
+	struct mce_info *mce_info;
+#endif /* CONFIG_PPC_BOOK3S_64 */
 } ____cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 9f3e133b57b7..6ec5c68997ed 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -17,22 +17,13 @@
 #include <linux/irq_work.h>
 #include <linux/extable.h>
 #include <linux/ftrace.h>
+#include <linux/memblock.h>
 
 #include <asm/machdep.h>
 #include <asm/mce.h>
 #include <asm/nmi.h>
 
-static DEFINE_PER_CPU(int, mce_nest_count);
-static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event);
-
-/* Queue for delayed MCE events. */
-static DEFINE_PER_CPU(int, mce_queue_count);
-static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event_queue);
-
-/* Queue for delayed MCE UE events. */
-static DEFINE_PER_CPU(int, mce_ue_count);
-static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT],
-					mce_ue_event_queue);
+#include "setup.h"
 
 static void machine_check_process_queued_event(struct irq_work *work);
 static void machine_check_ue_irq_work(struct irq_work *work);
@@ -103,9 +94,10 @@ void save_mce_event(struct pt_regs *regs, long handled,
 		    struct mce_error_info *mce_err,
 		    uint64_t nip, uint64_t addr, uint64_t phys_addr)
 {
-	int index = __this_cpu_inc_return(mce_nest_count) - 1;
-	struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);
+	int index = local_paca->mce_info->mce_nest_count++;
+	struct machine_check_event *mce;
 
+	mce = &local_paca->mce_info->mce_event[index];
 	/*
 	 * Return if we don't have enough space to log mce event.
 	 * mce_nest_count may go beyond MAX_MC_EVT but that's ok,
@@ -191,7 +183,7 @@ void save_mce_event(struct pt_regs *regs, long handled,
  */
 int get_mce_event(struct machine_check_event *mce, bool release)
 {
-	int index = __this_cpu_read(mce_nest_count) - 1;
+	int index = local_paca->mce_info->mce_nest_count - 1;
 	struct machine_check_event *mc_evt;
 	int ret = 0;
 
@@ -201,7 +193,7 @@ int get_mce_event(struct machine_check_event *mce, bool release)
 
 	/* Check if we have MCE info to process. */
 	if (index < MAX_MC_EVT) {
-		mc_evt = this_cpu_ptr(&mce_event[index]);
+		mc_evt = &local_paca->mce_info->mce_event[index];
 		/* Copy the event structure and release the original */
 		if (mce)
 			*mce = *mc_evt;
@@ -211,7 +203,7 @@ int get_mce_event(struct machine_check_event *mce, bool release)
 	}
 	/* Decrement the count to free the slot. */
 	if (release)
-		__this_cpu_dec(mce_nest_count);
+		local_paca->mce_info->mce_nest_count--;
 
 	return ret;
 }
@@ -233,13 +225,14 @@ static void machine_check_ue_event(struct machine_check_event *evt)
 {
 	int index;
 
-	index = __this_cpu_inc_return(mce_ue_count) - 1;
+	index = local_paca->mce_info->mce_ue_count++;
 	/* If queue is full, just return for now. */
 	if (index >= MAX_MC_EVT) {
-		__this_cpu_dec(mce_ue_count);
+		local_paca->mce_info->mce_ue_count--;
 		return;
 	}
-	memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt));
+	memcpy(&local_paca->mce_info->mce_ue_event_queue[index],
+	       evt, sizeof(*evt));
 
 	/* Queue work to process this event later. */
 	irq_work_queue(&mce_ue_event_irq_work);
@@ -256,13 +249,14 @@ void machine_check_queue_event(void)
 	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
 		return;
 
-	index = __this_cpu_inc_return(mce_queue_count) - 1;
+	index = local_paca->mce_info->mce_queue_count++;
 	/* If queue is full, just return for now. */
 	if (index >= MAX_MC_EVT) {
-		__this_cpu_dec(mce_queue_count);
+		local_paca->mce_info->mce_queue_count--;
 		return;
 	}
-	memcpy(this_cpu_ptr(&mce_event_queue[index]), &evt, sizeof(evt));
+	memcpy(&local_paca->mce_info->mce_event_queue[index],
+	       &evt, sizeof(evt));
 
 	/* Queue irq work to process this event later. */
 	irq_work_queue(&mce_event_process_work);
@@ -289,9 +283,9 @@ static void machine_process_ue_event(struct work_struct *work)
 	int index;
 	struct machine_check_event *evt;
 
-	while (__this_cpu_read(mce_ue_count) > 0) {
-		index = __this_cpu_read(mce_ue_count) - 1;
-		evt = this_cpu_ptr(&mce_ue_event_queue[index]);
+	while (local_paca->mce_info->mce_ue_count > 0) {
+		index = local_paca->mce_info->mce_ue_count - 1;
+		evt = &local_paca->mce_info->mce_ue_event_queue[index];
 		blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
 #ifdef CONFIG_MEMORY_FAILURE
 		/*
@@ -304,7 +298,7 @@ static void machine_process_ue_event(struct work_struct *work)
 		 */
 		if (evt->error_type == MCE_ERROR_TYPE_UE) {
 			if (evt->u.ue_error.ignore_event) {
-				__this_cpu_dec(mce_ue_count);
+				local_paca->mce_info->mce_ue_count--;
 				continue;
 			}
 
@@ -320,7 +314,7 @@ static void machine_process_ue_event(struct work_struct *work)
 					"was generated\n");
 		}
 #endif
-		__this_cpu_dec(mce_ue_count);
+		local_paca->mce_info->mce_ue_count--;
 	}
 }
 /*
@@ -338,17 +332,17 @@ static void machine_check_process_queued_event(struct irq_work *work)
 	 * For now just print it to console.
 	 * TODO: log this error event to FSP or nvram.
 	 */
-	while (__this_cpu_read(mce_queue_count) > 0) {
-		index = __this_cpu_read(mce_queue_count) - 1;
-		evt = this_cpu_ptr(&mce_event_queue[index]);
+	while (local_paca->mce_info->mce_queue_count > 0) {
+		index = local_paca->mce_info->mce_queue_count - 1;
+		evt = &local_paca->mce_info->mce_event_queue[index];
 
 		if (evt->error_type == MCE_ERROR_TYPE_UE &&
 		    evt->u.ue_error.ignore_event) {
-			__this_cpu_dec(mce_queue_count);
+			local_paca->mce_info->mce_queue_count--;
 			continue;
 		}
 		machine_check_print_event_info(evt, false, false);
-		__this_cpu_dec(mce_queue_count);
+		local_paca->mce_info->mce_queue_count--;
 	}
 }
 
@@ -741,3 +735,24 @@ long hmi_exception_realmode(struct pt_regs *regs)
 
 	return 1;
 }
+
+void __init mce_init(void)
+{
+	struct mce_info *mce_info;
+	u64 limit;
+	int i;
+
+	limit = min(ppc64_bolted_size(), ppc64_rma_size);
+	for_each_possible_cpu(i) {
+		mce_info = memblock_alloc_try_nid(sizeof(*mce_info),
+						  __alignof__(*mce_info),
+						  MEMBLOCK_LOW_LIMIT,
+						  limit, cpu_to_node(i));
+		if (!mce_info)
+			goto err;
+		paca_ptrs[i]->mce_info = mce_info;
+	}
+	return;
+err:
+	panic("Failed to allocate memory for MCE event data\n");
+}
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 71f38e9248be..d480f091e0ad 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -64,6 +64,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cpu_has_feature.h>
 #include <asm/kasan.h>
+#include <asm/mce.h>
 
 #include "setup.h"
 
@@ -938,6 +939,7 @@ void __init setup_arch(char **cmdline_p)
 	exc_lvl_early_init();
 	emergency_stack_init();
 
+	mce_init();
 	smp_release_cpus();
 
 	initmem_init();
-- 
2.26.2
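
The queueing pattern that the patch above converts from per-CPU variables to
paca fields can be sketched in user space as follows. This is only an
illustrative model: struct evt, struct mce_info_model and queue_push() are
hypothetical names, and the MAX_MC_EVT value of 10 is an assumption, not
something shown in the diff.

```c
#include <assert.h>
#include <string.h>

#define MAX_MC_EVT 10	/* assumed value; not shown in the diff */

/* Hypothetical stand-in for struct machine_check_event. */
struct evt {
	int id;
};

/* Hypothetical stand-in for one counter/queue pair of struct mce_info. */
struct mce_info_model {
	int count;
	struct evt queue[MAX_MC_EVT];
};

/*
 * Post-increment yields the same index as the old
 * "__this_cpu_inc_return(count) - 1" form. When the queue is full,
 * the increment is rolled back and the event is dropped.
 * Returns the slot index used, or -1 if the event was dropped.
 */
static int queue_push(struct mce_info_model *info, const struct evt *e)
{
	int index = info->count++;

	if (index >= MAX_MC_EVT) {
		info->count--;
		return -1;
	}
	memcpy(&info->queue[index], e, sizeof(*e));
	return index;
}
```

The model leaves out what the real code depends on: the data lives in memory
reachable from the paca, and the handlers run in machine check context.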


^ permalink raw reply related

* [PATCH] ASoC: fsl_spdif: Utilize the defined parameter to clear code
From: Tang Bin @ 2021-01-28 11:27 UTC (permalink / raw)
  To: broonie, timur, nicoleotsuka, Xiubo.Lee, lgirdwood, perex, tiwai
  Cc: alsa-devel, linuxppc-dev, linux-kernel, Tang Bin

Utilize the defined parameter 'dev' to make the code cleaner.

Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
---
 sound/soc/fsl/fsl_spdif.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c
index 455f96908..b6d5563df 100644
--- a/sound/soc/fsl/fsl_spdif.c
+++ b/sound/soc/fsl/fsl_spdif.c
@@ -1215,7 +1215,7 @@ static int fsl_spdif_probe_txclk(struct fsl_spdif_priv *spdif_priv,
 
 	for (i = 0; i < STC_TXCLK_SRC_MAX; i++) {
 		sprintf(tmp, "rxtx%d", i);
-		clk = devm_clk_get(&pdev->dev, tmp);
+		clk = devm_clk_get(dev, tmp);
 		if (IS_ERR(clk)) {
 			dev_err(dev, "no rxtx%d clock in devicetree\n", i);
 			return PTR_ERR(clk);
@@ -1237,14 +1237,14 @@ static int fsl_spdif_probe_txclk(struct fsl_spdif_priv *spdif_priv,
 			break;
 	}
 
-	dev_dbg(&pdev->dev, "use rxtx%d as tx clock source for %dHz sample rate\n",
+	dev_dbg(dev, "use rxtx%d as tx clock source for %dHz sample rate\n",
 			spdif_priv->txclk_src[index], rate[index]);
-	dev_dbg(&pdev->dev, "use txclk df %d for %dHz sample rate\n",
+	dev_dbg(dev, "use txclk df %d for %dHz sample rate\n",
 			spdif_priv->txclk_df[index], rate[index]);
 	if (clk_is_match(spdif_priv->txclk[index], spdif_priv->sysclk))
-		dev_dbg(&pdev->dev, "use sysclk df %d for %dHz sample rate\n",
+		dev_dbg(dev, "use sysclk df %d for %dHz sample rate\n",
 				spdif_priv->sysclk_df[index], rate[index]);
-	dev_dbg(&pdev->dev, "the best rate for %dHz sample rate is %dHz\n",
+	dev_dbg(dev, "the best rate for %dHz sample rate is %dHz\n",
 			rate[index], spdif_priv->txrate[index]);
 
 	return 0;
-- 
2.20.1.windows.1




^ permalink raw reply related

* Re: [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christophe Leroy @ 2021-01-28 12:05 UTC (permalink / raw)
  To: David Laight, 'Christopher M. Riedl',
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <6a6ce1a53fcf4669a9848114d3460fef@AcuMS.aculab.com>



Le 28/01/2021 à 11:38, David Laight a écrit :
> From: Christopher M. Riedl
>> Sent: 28 January 2021 04:04
>>
>> Reuse the "safe" implementation from signal.c except for calling
>> unsafe_copy_from_user() to copy into a local buffer.
>>
>> Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
>> ---
>>   arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
>>   1 file changed, 33 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
>> index 2559a681536e..c18402d625f1 100644
>> --- a/arch/powerpc/kernel/signal.h
>> +++ b/arch/powerpc/kernel/signal.h
>> @@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
>>   				&buf[i], label);\
>>   } while (0)
>>
>> +#define unsafe_copy_fpr_from_user(task, from, label)	do {		\
>> +	struct task_struct *__t = task;					\
>> +	u64 __user *__f = (u64 __user *)from;				\
>> +	u64 buf[ELF_NFPREG];						\
> 
> How big is that buffer?

#define ELF_NFPREG	33

So that's 264 bytes.

That's a bit big but still reasonable I think.
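
The figure can be checked with a small user-space sketch. The u64 typedef
is a stand-in for the kernel type and fpr_buf_bytes() is a hypothetical
helper; only ELF_NFPREG comes from the define quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define ELF_NFPREG 33	/* value quoted above */

/* Stand-in for the kernel's u64 type in a user-space build. */
typedef uint64_t u64;

/* Size in bytes of an on-stack buffer declared as u64 buf[ELF_NFPREG]. */
static inline unsigned long fpr_buf_bytes(void)
{
	u64 buf[ELF_NFPREG];

	return sizeof(buf);
}

/* 33 elements of 8 bytes each. */
_Static_assert(sizeof(u64[ELF_NFPREG]) == 264, "buffer is 264 bytes");
```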

Christophe

> Isn't is likely to be reasonably large compared to a reasonable
> kernel stack frame.
> Especially since this isn't even a leaf function.
> 
>> +	int i;								\
>> +									\
>> +	unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double),	\
> 
> That really ought to be sizeof(buf).
> 
> 	David
> 
> 
>> +				label);					\
>> +	for (i = 0; i < ELF_NFPREG - 1; i++)				\
>> +		__t->thread.TS_FPR(i) = buf[i];				\
>> +	__t->thread.fp_state.fpscr = buf[i];				\
>> +} while (0)
>> +
>> +#define unsafe_copy_vsx_from_user(task, from, label)	do {		\
>> +	struct task_struct *__t = task;					\
>> +	u64 __user *__f = (u64 __user *)from;				\
>> +	u64 buf[ELF_NVSRHALFREG];					\
>> +	int i;								\
>> +									\
>> +	unsafe_copy_from_user(buf, __f,					\
>> +				ELF_NVSRHALFREG * sizeof(double),	\
>> +				label);					\
>> +	for (i = 0; i < ELF_NVSRHALFREG ; i++)				\
>> +		__t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];	\
>> +} while (0)
>> +
>> +
>>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>   #define unsafe_copy_ckfpr_to_user(to, task, label)	do {		\
>>   	struct task_struct *__t = task;					\
>> @@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
>>   	unsafe_copy_to_user(to, (task)->thread.fp_state.fpr,	\
>>   			    ELF_NFPREG * sizeof(double), label)
>>
>> +#define unsafe_copy_fpr_from_user(task, from, label)			\
>> +	unsafe_copy_from_user((task)->thread.fp_state.fpr, from,	\
>> +			    ELF_NFPREG * sizeof(double), label)
>> +
>>   static inline unsigned long
>>   copy_fpr_to_user(void __user *to, struct task_struct *task)
>>   {
>> @@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from)
>>   #else
>>   #define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
>>
>> +#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0)
>> +
>>   static inline unsigned long
>>   copy_fpr_to_user(void __user *to, struct task_struct *task)
>>   {
>> --
>> 2.26.1
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

^ permalink raw reply

* [PATCH 3/3] sched/core: Update task_prio() function header
From: Dietmar Eggemann @ 2021-01-28 13:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Jeremy Kerr, Arnd Bergmann,
	Michael Ellerman
  Cc: Juri Lelli, Hillf Danton, Vincent Guittot, linux-kernel,
	Steven Rostedt, linuxppc-dev
In-Reply-To: <20210128131040.296856-1-dietmar.eggemann@arm.com>

The description of the RT offset and the values for 'normal' tasks need
updating. Moreover, there are DL tasks now.
task_prio() has to stay like it is to guarantee compatibility with the
/proc/<pid>/stat priority field:

  # cat /proc/<pid>/stat | awk '{ print $18; }'
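
As a rough illustration of the mapping the new header documents, the
returned value is the kernel priority minus MAX_RT_PRIO. The helper name
below is hypothetical and the sketch is a user-space model, not the kernel
function itself:

```c
#include <assert.h>

#define MAX_RT_PRIO 100	/* as in include/linux/sched/prio.h */

/*
 * Hypothetical model of task_prio(): the value reported in /proc is the
 * kernel priority shifted down by MAX_RT_PRIO.
 */
static inline int task_prio_from_kernel_prio(int kernel_prio)
{
	return kernel_prio - MAX_RT_PRIO;
}
```

With this model, kernel priorities 100 and 139 map to 0 and 39, priorities
98 and 0 map to -2 and -100, and the deadline priority -1 maps to -101.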

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/core.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 625ec1e12064..be3a956c2d23 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5602,8 +5602,12 @@ SYSCALL_DEFINE1(nice, int, increment)
  * @p: the task in question.
  *
  * Return: The priority value as seen by users in /proc.
- * RT tasks are offset by -200. Normal tasks are centered
- * around 0, value goes from -16 to +15.
+ *
+ * sched policy         return value   kernel prio    user prio/nice
+ *
+ * normal, batch, idle     [0 ... 39]  [100 ... 139]          0/[-20 ... 19]
+ * fifo, rr             [-2 ... -100]     [98 ... 0]  [1 ... 99]
+ * deadline                     -101             -1           0
  */
 int task_prio(const struct task_struct *p)
 {
-- 
2.25.1


^ permalink raw reply related

* [PATCH 2/3] sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO
From: Dietmar Eggemann @ 2021-01-28 13:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Jeremy Kerr, Arnd Bergmann,
	Michael Ellerman
  Cc: Juri Lelli, Hillf Danton, Vincent Guittot, linux-kernel,
	Steven Rostedt, linuxppc-dev
In-Reply-To: <20210128131040.296856-1-dietmar.eggemann@arm.com>

The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the
SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic
Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore.

Commit fe443ef2ac42 ("[POWERPC] spusched: Dynamic timeslicing for
SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23.

Commit a4ec24b48dde ("sched: tidy up SCHED_RR") removed it from the task
scheduler in v2.6.24.

Commit 3ee237dddcd8 ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and
NICE_WIDTH in prio.h") introduced NICE_WIDTH much later.

With:

  MAX_USER_PRIO = USER_PRIO(MAX_PRIO)

                = MAX_PRIO - MAX_RT_PRIO

       MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH

  MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO

  MAX_USER_PRIO = NICE_WIDTH

MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the
{*_}USER_PRIO defines.
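
The substitution can be checked at compile time with a stand-alone sketch.
The macro definitions are copied from include/linux/sched/prio.h as it
looks before this patch; max_user_prio() is a hypothetical helper added
only for illustration.

```c
#include <assert.h>

#define MAX_NICE	19
#define MIN_NICE	(-20)
#define NICE_WIDTH	(MAX_NICE - MIN_NICE + 1)

#define MAX_RT_PRIO	100
#define MAX_PRIO	(MAX_RT_PRIO + NICE_WIDTH)
#define DEFAULT_PRIO	(MAX_RT_PRIO + NICE_WIDTH / 2)
#define NICE_TO_PRIO(nice)	((nice) + DEFAULT_PRIO)

/* The defines this patch removes. */
#define USER_PRIO(p)	((p)-MAX_RT_PRIO)
#define MAX_USER_PRIO	(USER_PRIO(MAX_PRIO))

/* MAX_USER_PRIO collapses to NICE_WIDTH. */
_Static_assert(MAX_USER_PRIO == NICE_WIDTH, "MAX_USER_PRIO == NICE_WIDTH");

/* The index used in the kernel/sched/sched.h comment is unchanged too. */
_Static_assert(USER_PRIO(NICE_TO_PRIO(0)) == NICE_TO_PRIO(0) - MAX_RT_PRIO,
	       "same sched_prio_to_weight index");

/* Hypothetical helper so the value can be inspected at run time. */
static inline int max_user_prio(void)
{
	return MAX_USER_PRIO;
}
```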

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 arch/powerpc/platforms/cell/spufs/sched.c | 2 +-
 include/linux/sched/prio.h                | 9 ---------
 kernel/sched/sched.h                      | 2 +-
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index f18d5067cd0f..aeb7f3922106 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -72,7 +72,7 @@ static struct timer_list spuloadavg_timer;
 #define DEF_SPU_TIMESLICE	(100 * HZ / (1000 * SPUSCHED_TICK))
 
 #define SCALE_PRIO(x, prio) \
-	max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_SPU_TIMESLICE)
+	max(x * (MAX_PRIO - prio) / (NICE_WIDTH / 2), MIN_SPU_TIMESLICE)
 
 /*
  * scale user-nice values [ -20 ... 0 ... 19 ] to time slice values:
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index d111f2fd77ea..ab83d85e1183 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -26,15 +26,6 @@
 #define NICE_TO_PRIO(nice)	((nice) + DEFAULT_PRIO)
 #define PRIO_TO_NICE(prio)	((prio) - DEFAULT_PRIO)
 
-/*
- * 'User priority' is the nice value converted to something we
- * can work with better when scaling various scheduler parameters,
- * it's a [ 0 ... 39 ] range.
- */
-#define USER_PRIO(p)		((p)-MAX_RT_PRIO)
-#define TASK_USER_PRIO(p)	USER_PRIO((p)->static_prio)
-#define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
-
 /*
  * Convert nice value [19,-20] to rlimit style value [1,40].
  */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 045b01064c1e..6edc67df3554 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -140,7 +140,7 @@ extern void call_trace_sched_update_nr_running(struct rq *rq, int count);
  * scale_load() and scale_load_down(w) to convert between them. The
  * following must be true:
  *
- *  scale_load(sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ *  scale_load(sched_prio_to_weight[NICE_TO_PRIO(0)-MAX_RT_PRIO]) == NICE_0_LOAD
  *
  */
 #define NICE_0_LOAD		(1L << NICE_0_LOAD_SHIFT)
-- 
2.25.1


^ permalink raw reply related

* [PATCH 0/3] sched: Task priority related cleanups
From: Dietmar Eggemann @ 2021-01-28 13:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Jeremy Kerr, Arnd Bergmann,
	Michael Ellerman
  Cc: Juri Lelli, Hillf Danton, Vincent Guittot, linux-kernel,
	Steven Rostedt, linuxppc-dev

(1) Removing MAX_USER_RT_PRIO was already discussed here in April 2020:

    https://lkml.kernel.org/r/20200423094403.6f1d2b8d@gandalf.local.home

(2) USER_PRIO() and related macros are not used anymore except in one
    case for powerpc where MAX_USER_PRIO can be replaced by NICE_WIDTH.
    set_load_weight(), task_prio(), cpu_weight_nice_write_s64(),
    __update_max_tr() don't use USER_PRIO() but priority - MAX_RT_PRIO.

(3) The function header of task_prio() needs an update. It looks
    ancient since it mentions a prio space [-16 ... 15] for normal
    tasks. I can't figure out why this range is mentioned here. Maybe
    the influence of the 'sleep-bonus interactivity' feature which was
    removed by commit f3479f10c5d6 ("sched: remove the sleep-bonus
    interactivity code")? 

Dietmar Eggemann (3):
  sched: Remove MAX_USER_RT_PRIO
  sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO
  sched/core: Update task_prio() function header

 arch/powerpc/platforms/cell/spufs/sched.c |  2 +-
 include/linux/sched/prio.h                | 18 +-----------------
 kernel/sched/core.c                       | 15 +++++++++------
 kernel/sched/sched.h                      |  2 +-
 4 files changed, 12 insertions(+), 25 deletions(-)

-- 
2.25.1


^ permalink raw reply

* [PATCH 1/3] sched: Remove MAX_USER_RT_PRIO
From: Dietmar Eggemann @ 2021-01-28 13:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Jeremy Kerr, Arnd Bergmann,
	Michael Ellerman
  Cc: Juri Lelli, Hillf Danton, Vincent Guittot, linux-kernel,
	Steven Rostedt, linuxppc-dev
In-Reply-To: <20210128131040.296856-1-dietmar.eggemann@arm.com>

Commit d46523ea32a7 ("[PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO")
was introduced due to a small time period in which the realtime patch
set was using different values for MAX_USER_RT_PRIO and MAX_RT_PRIO.

This is no longer true, i.e. now MAX_RT_PRIO == MAX_USER_RT_PRIO.

Get rid of MAX_USER_RT_PRIO and make everything use MAX_RT_PRIO
instead.
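
A minimal user-space model of the simplified range check is sketched
below. Both helper names are hypothetical; only the "> MAX_RT_PRIO-1"
comparison and the MAX_RT_PRIO-1 maximum are taken from the diff.

```c
#include <assert.h>

#define MAX_RT_PRIO 100

/*
 * Hypothetical model of the check in __sched_setscheduler() after this
 * patch: one upper bound for user tasks and kernel threads alike.
 * Returns 1 if the priority passes the range check, 0 otherwise.
 */
static inline int sched_priority_in_range(unsigned int sched_priority)
{
	return !(sched_priority > MAX_RT_PRIO - 1);
}

/* Hypothetical model of sched_get_priority_max() for SCHED_FIFO/SCHED_RR. */
static inline int rt_priority_max(void)
{
	return MAX_RT_PRIO - 1;
}
```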

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 include/linux/sched/prio.h | 9 +--------
 kernel/sched/core.c        | 7 +++----
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index 7d64feafc408..d111f2fd77ea 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -11,16 +11,9 @@
  * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
  * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
  * values are inverted: lower p->prio value means higher priority.
- *
- * The MAX_USER_RT_PRIO value allows the actual maximum
- * RT priority to be separate from the value exported to
- * user-space.  This allows kernel threads to set their
- * priority to a value higher than any user task. Note:
- * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
  */
 
-#define MAX_USER_RT_PRIO	100
-#define MAX_RT_PRIO		MAX_USER_RT_PRIO
+#define MAX_RT_PRIO		100
 
 #define MAX_PRIO		(MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO		(MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 06b449942adf..625ec1e12064 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5897,11 +5897,10 @@ static int __sched_setscheduler(struct task_struct *p,
 
 	/*
 	 * Valid priorities for SCHED_FIFO and SCHED_RR are
-	 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL,
+	 * 1..MAX_RT_PRIO-1, valid priority for SCHED_NORMAL,
 	 * SCHED_BATCH and SCHED_IDLE is 0.
 	 */
-	if ((p->mm && attr->sched_priority > MAX_USER_RT_PRIO-1) ||
-	    (!p->mm && attr->sched_priority > MAX_RT_PRIO-1))
+	if (attr->sched_priority > MAX_RT_PRIO-1)
 		return -EINVAL;
 	if ((dl_policy(policy) && !__checkparam_dl(attr)) ||
 	    (rt_policy(policy) != (attr->sched_priority != 0)))
@@ -6969,7 +6968,7 @@ SYSCALL_DEFINE1(sched_get_priority_max, int, policy)
 	switch (policy) {
 	case SCHED_FIFO:
 	case SCHED_RR:
-		ret = MAX_USER_RT_PRIO-1;
+		ret = MAX_RT_PRIO-1;
 		break;
 	case SCHED_DEADLINE:
 	case SCHED_NORMAL:
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
From: Zorro Lang @ 2021-01-28 13:52 UTC (permalink / raw)
  To: Jens Axboe, Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <66061f75-c8de-c1eb-aaaf-9594a31be790@kernel.dk>

On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
> On 1/27/21 8:13 PM, Zorro Lang wrote:
> > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
> >> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> >>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
> >>>>
> >>>>
> >>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
> >>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
> >>>>> The fail source line is:
> >>>>>
> >>>>>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> >>>>>        return SIGSEGV;
> >>>>>
> >>>>> The is_user() is based on user_mod(regs) only. This's not suit for
> >>>>> io_uring, where the helper thread can assume the user app identity
> >>>>> and could perform this fault just fine. So turn to use mm to decide
> >>>>> if this is valid or not.
> >>>>
> >>>> I don't understand why testing is_user would be an issue. KUAP purpose
> >>>> it to block any unallowed access from kernel to user memory
> >>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
> >>>> that is what is_user provides.
> >>>>
> >>>> If the kernel access is legitimate, kernel should have opened
> >>>> userspace access then you shouldn't get this "Bug: Read fault blocked
> >>>> by KUAP!".
> >>>>
> >>>> As far as I understand, the fault occurs in
> >>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
> >>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
> >>>> access and you really should get a KUAP fault.
> >>>>
> >>>> So the problem is somewhere else, I think you proposed patch just
> >>>> hides the problem, it doesn't fix it.
> >>>
> >>> If we do kthread_use_mm(), can we agree that the user access is valid?
> >>
> >> Yeah the io uring code is fine, provided it uses the uaccess primitives 
> >> like any other kernel code. It's looking more like a an arch/powerpc bug.
> >>
> >>> We should be able to copy to/from user space, and including faults, if
> >>> that's been done and the new mm assigned. Because it really should be.
> >>> If SMAP was a problem on x86, we would have seen it long ago.
> >>>
> >>> I'm assuming this may be breakage related to the recent uaccess changes
> >>> related to set_fs and friends? Or maybe recent changes on the powerpc
> >>> side?
> >>>
> >>> Zorro, did 5.10 work?
> >>
> >> Would be interesting to know.
> > 
> > Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
> > commit(be the HEAD) in 5.10 phase?
> 
> I forget which versions had what series of this, but 5.10 final - and if
> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
> 5.10 definitely has them.

I just built Linux v5.10 with the same .config file and ran the same test.
v5.10 (HEAD=2c85ebc57b Linux 5.10) does not reproduce this bug:

# ./check generic/013 generic/051
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch

generic/013 138s ...  77s
generic/051 103s ...  143s
Ran: generic/013 generic/051
Passed all 2 tests

> 
> -- 
> Jens Axboe
> 


^ permalink raw reply

* Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
From: Jens Axboe @ 2021-01-28 14:42 UTC (permalink / raw)
  To: Zorro Lang, Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <20210128135220.GQ14354@localhost.localdomain>

On 1/28/21 6:52 AM, Zorro Lang wrote:
> On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
>> On 1/27/21 8:13 PM, Zorro Lang wrote:
>>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
>>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
>>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>>>>>>
>>>>>>
>>>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>>>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>>>>>>> The fail source line is:
>>>>>>>
>>>>>>>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>>>>>        return SIGSEGV;
>>>>>>>
>>>>>>> The is_user() is based on user_mod(regs) only. This's not suit for
>>>>>>> io_uring, where the helper thread can assume the user app identity
>>>>>>> and could perform this fault just fine. So turn to use mm to decide
>>>>>>> if this is valid or not.
>>>>>>
>>>>>> I don't understand why testing is_user would be an issue. KUAP purpose
>>>>>> it to block any unallowed access from kernel to user memory
>>>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
>>>>>> that is what is_user provides.
>>>>>>
>>>>>> If the kernel access is legitimate, kernel should have opened
>>>>>> userspace access then you shouldn't get this "Bug: Read fault blocked
>>>>>> by KUAP!".
>>>>>>
>>>>>> As far as I understand, the fault occurs in
>>>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
>>>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
>>>>>> access and you really should get a KUAP fault.
>>>>>>
>>>>>> So the problem is somewhere else, I think you proposed patch just
>>>>>> hides the problem, it doesn't fix it.
>>>>>
>>>>> If we do kthread_use_mm(), can we agree that the user access is valid?
>>>>
>>>> Yeah the io uring code is fine, provided it uses the uaccess primitives 
>>>> like any other kernel code. It's looking more like a an arch/powerpc bug.
>>>>
>>>>> We should be able to copy to/from user space, and including faults, if
>>>>> that's been done and the new mm assigned. Because it really should be.
>>>>> If SMAP was a problem on x86, we would have seen it long ago.
>>>>>
>>>>> I'm assuming this may be breakage related to the recent uaccess changes
>>>>> related to set_fs and friends? Or maybe recent changes on the powerpc
>>>>> side?
>>>>>
>>>>> Zorro, did 5.10 work?
>>>>
>>>> Would be interesting to know.
>>>
>>> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
>>> commit(be the HEAD) in 5.10 phase?
>>
>> I forget which versions had what series of this, but 5.10 final - and if
>> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
>> 5.10 definitely has them.
> 
> I justed built linux v5.10 with same .config file, and gave it same test.
> v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
> 
> # ./check generic/013 generic/051
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
> MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
> 
> generic/013 138s ...  77s
> generic/051 103s ...  143s
> Ran: generic/013 generic/051
> Passed all 2 tests

Thanks for testing that, so I think it's safe to conclude that there's a
regression in powerpc fault handling for kthreads that use
kthread_use_mm in this release. A bisect would definitely find it, but
might be pointless if Christophe or Nick already have an idea of what it
is.

-- 
Jens Axboe


^ permalink raw reply

* Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
From: Christophe Leroy @ 2021-01-28 14:44 UTC (permalink / raw)
  To: Jens Axboe, Zorro Lang, Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <aedb880b-da2b-ec29-3b66-66f01733be9b@kernel.dk>



Le 28/01/2021 à 15:42, Jens Axboe a écrit :
> On 1/28/21 6:52 AM, Zorro Lang wrote:
>> On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
>>> On 1/27/21 8:13 PM, Zorro Lang wrote:
>>>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
>>>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
>>>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>>>>>>>
>>>>>>>
>>>>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>>>>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>>>>>>>> The fail source line is:
>>>>>>>>
>>>>>>>>     if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>>>>>>         return SIGSEGV;
>>>>>>>>
>>>>>>>> The is_user() is based on user_mod(regs) only. This's not suit for
>>>>>>>> io_uring, where the helper thread can assume the user app identity
>>>>>>>> and could perform this fault just fine. So turn to use mm to decide
>>>>>>>> if this is valid or not.
>>>>>>>
>>>>>>> I don't understand why testing is_user would be an issue. KUAP purpose
>>>>>>> it to block any unallowed access from kernel to user memory
>>>>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
>>>>>>> that is what is_user provides.
>>>>>>>
>>>>>>> If the kernel access is legitimate, kernel should have opened
>>>>>>> userspace access then you shouldn't get this "Bug: Read fault blocked
>>>>>>> by KUAP!".
>>>>>>>
>>>>>>> As far as I understand, the fault occurs in
>>>>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
>>>>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
>>>>>>> access and you really should get a KUAP fault.
>>>>>>>
>>>>>>> So the problem is somewhere else, I think you proposed patch just
>>>>>>> hides the problem, it doesn't fix it.
>>>>>>
>>>>>> If we do kthread_use_mm(), can we agree that the user access is valid?
>>>>>
>>>>> Yeah the io uring code is fine, provided it uses the uaccess primitives
>>>>> like any other kernel code. It's looking more like a an arch/powerpc bug.
>>>>>
>>>>>> We should be able to copy to/from user space, and including faults, if
>>>>>> that's been done and the new mm assigned. Because it really should be.
>>>>>> If SMAP was a problem on x86, we would have seen it long ago.
>>>>>>
>>>>>> I'm assuming this may be breakage related to the recent uaccess changes
>>>>>> related to set_fs and friends? Or maybe recent changes on the powerpc
>>>>>> side?
>>>>>>
>>>>>> Zorro, did 5.10 work?
>>>>>
>>>>> Would be interesting to know.
>>>>
>>>> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
>>>> commit(be the HEAD) in 5.10 phase?
>>>
>>> I forget which versions had what series of this, but 5.10 final - and if
>>> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
>>> 5.10 definitely has them.
>>
>> I justed built linux v5.10 with same .config file, and gave it same test.
>> v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
>>
>> # ./check generic/013 generic/051
>> FSTYP         -- xfs (non-debug)
>> PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
>> MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
>> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
>>
>> generic/013 138s ...  77s
>> generic/051 103s ...  143s
>> Ran: generic/013 generic/051
>> Passed all 2 tests
> 
> Thanks for testing that, so I think it's safe to conclude that there's a
> regression in powerpc fault handling for kthreads that use
> kthread_use_mm in this release. A bisect would definitely find it, but
> might be pointless if Christophe or Nick already have an idea of what it
> is.
> 

I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops.

Christophe

^ permalink raw reply

* Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
From: Zorro Lang @ 2021-01-28 15:20 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Jens Axboe, linuxppc-dev, Nicholas Piggin
In-Reply-To: <17ae2706-fe95-a5de-b9da-e3480800daf7@csgroup.eu>

On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote:
> 
> 
> Le 28/01/2021 à 15:42, Jens Axboe a écrit :
> > On 1/28/21 6:52 AM, Zorro Lang wrote:
> > > On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
> > > > On 1/27/21 8:13 PM, Zorro Lang wrote:
> > > > > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
> > > > > > Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> > > > > > > On 1/27/21 9:38 AM, Christophe Leroy wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Le 27/01/2021 à 15:56, Zorro Lang a écrit :
> > > > > > > > > On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
> > > > > > > > > The fail source line is:
> > > > > > > > > 
> > > > > > > > >     if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> > > > > > > > >         return SIGSEGV;
> > > > > > > > > 
> > > > > > > > > > is_user is based on user_mode(regs) only. This is not suitable for
> > > > > > > > > > io_uring, where the helper thread can assume the user app identity
> > > > > > > > > > and could perform this fault just fine. So switch to using mm to decide
> > > > > > > > > > whether this is valid or not.
> > > > > > > > 
> > > > > > > > > I don't understand why testing is_user would be an issue. KUAP's purpose
> > > > > > > > > is to block any unallowed access from kernel to user memory
> > > > > > > > > (equivalent to SMAP on x86). So it really must be based on the MSR_PR bit,
> > > > > > > > > which is what is_user provides.
> > > > > > > > 
> > > > > > > > If the kernel access is legitimate, kernel should have opened
> > > > > > > > userspace access then you shouldn't get this "Bug: Read fault blocked
> > > > > > > > by KUAP!".
> > > > > > > > 
> > > > > > > > As far as I understand, the fault occurs in
> > > > > > > > > iov_iter_fault_in_readable() which calls fault_in_pages_readable(), and
> > > > > > > > > fault_in_pages_readable() uses __get_user() so it is a legitimate
> > > > > > > > > access and you really shouldn't get a KUAP fault.
> > > > > > > > > 
> > > > > > > > > So the problem is somewhere else, I think your proposed patch just
> > > > > > > > > hides the problem, it doesn't fix it.
> > > > > > > 
> > > > > > > If we do kthread_use_mm(), can we agree that the user access is valid?
> > > > > > 
> > > > > > Yeah the io uring code is fine, provided it uses the uaccess primitives
> > > > > > > like any other kernel code. It's looking more like an arch/powerpc bug.
> > > > > > 
> > > > > > > We should be able to copy to/from user space, and including faults, if
> > > > > > > that's been done and the new mm assigned. Because it really should be.
> > > > > > > If SMAP was a problem on x86, we would have seen it long ago.
> > > > > > > 
> > > > > > > I'm assuming this may be breakage related to the recent uaccess changes
> > > > > > > related to set_fs and friends? Or maybe recent changes on the powerpc
> > > > > > > side?
> > > > > > > 
> > > > > > > Zorro, did 5.10 work?
> > > > > > 
> > > > > > Would be interesting to know.
> > > > > 
> > > > > > Sure Nick and Jens, which 5.10 rc version do you want me to test? Or any git
> > > > > > commit (such as the HEAD) in the 5.10 phase?
> > > > 
> > > > I forget which versions had what series of this, but 5.10 final - and if
> > > > that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
> > > > 5.10 definitely has them.
> > > 
> > > > I just built linux v5.10 with the same .config file, and gave it the same test.
> > > v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
> > > 
> > > # ./check generic/013 generic/051
> > > FSTYP         -- xfs (non-debug)
> > > PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
> > > MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
> > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
> > > 
> > > generic/013 138s ...  77s
> > > generic/051 103s ...  143s
> > > Ran: generic/013 generic/051
> > > Passed all 2 tests
> > 
> > Thanks for testing that, so I think it's safe to conclude that there's a
> > regression in powerpc fault handling for kthreads that use
> > kthread_use_mm in this release. A bisect would definitely find it, but
> > might be pointless if Christophe or Nick already have an idea of what it
> > is.
> > 
> 
> I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops.

OK, I don't have the vmlinux matching that bug report any more. I can prepare a new one, but
it will take a long time (10+ hours).

Thanks,
Zorro

> 
> Christophe
> 

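The decision discussed in the quoted thread can be modelled in a few lines of user-space C. This is only a minimal sketch under stated assumptions: struct fault_ctx_sketch, bad_kernel_fault_sketch() and page_fault_sketch() are hypothetical names, and the real bad_kernel_fault() in arch/powerpc performs more checks than the single condition shown here. It illustrates the point made above that the test is keyed on MSR_PR (is_user), and that a kernel-mode access to user memory is expected to pass only once user access has been opened.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical, simplified fault context; not the kernel's struct pt_regs. */
struct fault_ctx_sketch {
	bool msr_pr;        /* true when the fault was taken in user mode (MSR_PR set) */
	bool uaccess_open;  /* true when the kernel has opened user access */
	bool user_address;  /* true when the faulting address is in user space */
};

/*
 * Models only the KUAP aspect of bad_kernel_fault(): a kernel-mode access
 * to user memory is bad unless user access was opened first.
 */
static bool bad_kernel_fault_sketch(const struct fault_ctx_sketch *c)
{
	return c->user_address && !c->uaccess_open;
}

/* Returns SIGSEGV for a blocked access, 0 when the fault may be handled. */
static int page_fault_sketch(const struct fault_ctx_sketch *c)
{
	bool is_user = c->msr_pr;

	if (!is_user && bad_kernel_fault_sketch(c))
		return SIGSEGV;
	return 0;
}
```

In this model a kthread that has called kthread_use_mm() still faults in kernel mode (msr_pr false), so whether the access is blocked depends only on uaccess_open. That matches the view in the thread that the io_uring side is fine as long as it goes through the uaccess primitives.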

^ permalink raw reply

* Re: [PATCH 13/13] module: remove EXPORT_UNUSED_SYMBOL*
From: Christoph Hellwig @ 2021-01-28 16:09 UTC (permalink / raw)
  To: Jessica Yu
  Cc: Petr Mladek, Joe Lawrence, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Jiri Kosina, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	dri-devel, Thomas Zimmermann, Josh Poimboeuf, Frederic Barrat,
	Daniel Vetter, Miroslav Benes, linuxppc-dev, Christoph Hellwig
In-Reply-To: <YBFvcmUiHRjkucbf@gunter>

On Wed, Jan 27, 2021 at 02:49:38PM +0100, Jessica Yu wrote:
>> #ifdef CONFIG_MODULE_SIG
>> 	/* Signature was verified. */
>> 	bool sig_ok;
>> @@ -592,7 +580,6 @@ struct symsearch {
>> 		GPL_ONLY,
>> 		WILL_BE_GPL_ONLY,
>> 	} license;
>> -	bool unused;
>> };

> Thanks for the cleanups. While we're here, I noticed that struct
> symsearch is only used internally in kernel/module.c, so I don't think
> it actually needs to be in include/linux/module.h. I don't see it used
> anywhere else. We could maybe move that to kernel/module-internal.h.

I've added a patch to just move it directly into module.c.

^ permalink raw reply

* Re: [PATCH 03/13] livepatch: refactor klp_init_object
From: Christoph Hellwig @ 2021-01-28 16:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jiri Kosina, Andrew Donnellan, linux-kbuild, David Airlie,
	Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst, linux-kernel,
	Maxime Ripard, live-patching, Michal Marek, Joe Lawrence,
	dri-devel, Thomas Zimmermann, Jessica Yu, Frederic Barrat,
	Daniel Vetter, Miroslav Benes, linuxppc-dev, Christoph Hellwig
In-Reply-To: <YBFjbbuQ7sn4T7yT@alley>

On Wed, Jan 27, 2021 at 01:58:21PM +0100, Petr Mladek wrote:
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -54,9 +54,6 @@ static void klp_find_object_module(struct klp_object *obj)
> >  {
> >  	struct module *mod;
> >  
> > -	if (!klp_is_module(obj))
> > -		return;
> > -
> 
> We need to either update the function description or keep this check.
> 
> I prefer to keep the check. The function does the right thing also
> for the object "vmlinux". Also the livepatch code includes many
> similar paranoid checks that make the code less error prone
> against any further changes.

Well, the check is in the caller now where we have a conditional for
it.  So I'd be tempted to either update the comment, or just drop the
patch.

^ permalink raw reply

* Re: [PATCH 03/13] livepatch: refactor klp_init_object
From: Christoph Hellwig @ 2021-01-28 16:24 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jiri Kosina, Andrew Donnellan, linux-kbuild, David Airlie,
	Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst, linux-kernel,
	Maxime Ripard, live-patching, Michal Marek, Joe Lawrence,
	dri-devel, Thomas Zimmermann, Jessica Yu, Frederic Barrat,
	Daniel Vetter, Miroslav Benes, linuxppc-dev, Christoph Hellwig
In-Reply-To: <20210128162240.GA3417@lst.de>

On Thu, Jan 28, 2021 at 05:22:40PM +0100, Christoph Hellwig wrote:
> > We need to either update the function description or keep this check.
> > 
> > I prefer to keep the check. The function does the right thing also
> > for the object "vmlinux". Also the livepatch code includes many
> > similar paranoid checks that make the code less error prone
> > against any further changes.
> 
> Well, the check is in the caller now where we have a conditional for
> it.  So I'd be tempted to either update the comment, or just drop the
> patch.

Also even without the check I think it will do the right thing when
called for vmlinux given that it simply won't find a module called
vmlinux.

^ permalink raw reply

* Re: [PATCH] powerpc/sstep: Fix array out of bound warning
From: Naveen N. Rao @ 2021-01-28 17:20 UTC (permalink / raw)
  To: Ravi Bangoria; +Cc: naveen.n.rao, paulus, linuxppc-dev, jniethe5
In-Reply-To: <20210115061620.692500-1-ravi.bangoria@linux.ibm.com>

On 2021/01/15 11:46AM, Ravi Bangoria wrote:
> Compiling kernel with -Warray-bounds throws below warning:
> 
>   In function 'emulate_vsx_store':
>   warning: array subscript is above array bounds [-Warray-bounds]
>   buf.d[2] = byterev_8(reg->d[1]);
>   ~~~~~^~~
>   buf.d[3] = byterev_8(reg->d[0]);
>   ~~~~~^~~
> 
> Fix it by converting local variable 'union vsx_reg buf' into an array.
> Also consider function argument 'union vsx_reg *reg' as array instead
> of pointer because callers are actually passing an array to it.

I think you should change the function prototype to reflect this.

However, while I agree with this change in principle, it looks to be a 
lot of code churn for a fairly narrow use. Perhaps we should just 
address the specific bug. Something like the below (not tested)?

@@ -818,13 +818,15 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
                        break;
                if (rev) {
                        /* reverse 32 bytes */
-                       buf.d[0] = byterev_8(reg->d[3]);
-                       buf.d[1] = byterev_8(reg->d[2]);
-                       buf.d[2] = byterev_8(reg->d[1]);
-                       buf.d[3] = byterev_8(reg->d[0]);
-                       reg = &buf;
+                       union vsx_reg buf32[2];
+                       buf32[0].d[0] = byterev_8(reg[1].d[1]);
+                       buf32[0].d[1] = byterev_8(reg[1].d[0]);
+                       buf32[1].d[0] = byterev_8(reg[0].d[1]);
+                       buf32[1].d[1] = byterev_8(reg[0].d[0]);
+                       memcpy(mem, buf32, size);
+               } else {
+                       memcpy(mem, reg, size);
                }
-               memcpy(mem, reg, size);
                break;
        case 16:
                /* stxv, stxvx, stxvl, stxvll */


- Naveen
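
The indexing used in the suggested diff can be checked outside the kernel with a small stand-alone program. This is a minimal sketch under stated assumptions: union vsx_reg_sketch, byterev_8_sketch() and store_rev32_sketch() are hypothetical stand-ins for the kernel's union vsx_reg, byterev_8() and the 32-byte branch of emulate_vsx_store(), and the register is assumed to be 16 bytes made of two 64-bit doublewords.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for union vsx_reg: 16 bytes, i.e. two 64-bit doublewords. */
union vsx_reg_sketch {
	uint8_t  b[16];
	uint64_t d[2];
};

static_assert(sizeof(union vsx_reg_sketch) == 16, "one register is 16 bytes");

/* Reverse the byte order of one 64-bit value (portable stand-in for byterev_8()). */
static uint64_t byterev_8_sketch(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return r;
}

/* Reverse the 32 bytes held in reg[0..1] into mem; every d[] index stays below 2. */
static void store_rev32_sketch(void *mem, const union vsx_reg_sketch reg[2])
{
	union vsx_reg_sketch buf32[2];

	buf32[0].d[0] = byterev_8_sketch(reg[1].d[1]);
	buf32[0].d[1] = byterev_8_sketch(reg[1].d[0]);
	buf32[1].d[0] = byterev_8_sketch(reg[0].d[1]);
	buf32[1].d[1] = byterev_8_sketch(reg[0].d[0]);
	memcpy(mem, buf32, sizeof(buf32));
}
```

Filling the two registers with the bytes 0 to 31 and calling store_rev32_sketch() should yield 31 down to 0 in mem, with no subscript beyond d[1], which is the property -Warray-bounds complains about in the original code.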


^ permalink raw reply
