LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 11/15] selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
From: Christian Brauner @ 2020-09-15 16:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-12-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:16AM -0700, Kees Cook wrote:
> Instead of special-casing the specific case of shared registers, create
> a default SYSCALL_RET_SET() macro (mirroring SYSCALL_NUM_SET()), that
> writes to the SYSCALL_RET register. For architectures that can't set the
> return value (for whatever reason), they can define SYSCALL_RET_SET()
> without an associated SYSCALL_RET() macro. This also paves the way for
> architectures that need to do special things to set the return value
> (e.g. powerpc).
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 10/15] selftests/seccomp: Avoid redundant register flushes
From: Christian Brauner @ 2020-09-15 16:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-11-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:15AM -0700, Kees Cook wrote:
> When none of the registers have changed, don't flush them back. This can
> happen if the architecture uses a non-register way to change the syscall
> (e.g. arm64) , and a return value hasn't been written.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 09/15] selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
From: Christian Brauner @ 2020-09-15 16:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-10-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:14AM -0700, Kees Cook wrote:
> Consolidate the REGSET logic into the new ARCH_GETREG() and
> ARCH_SETREG() macros, avoiding more #ifdef code in function bodies.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH v2] powerpc/papr_scm: Fix warning triggered by perf_stats_show()
From: Vaibhav Jain @ 2020-09-15 16:03 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, linux-nvdimm
  Cc: Aneesh Kumar K . V, Santosh Sivaraj, Oliver O'Halloran,
	Dan Williams, Ira Weiny
In-Reply-To: <87imcfp9a7.fsf@mpe.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:

> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
>> A warning is reported by the kernel in case perf_stats_show() returns
>> an error code. The warning is of the form below:
>>
>>  papr_scm ibm,persistent-memory:ibm,pmemory@44100001:
>>  	  Failed to query performance stats, Err:-10
>>  dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
>>  fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count
>>
>> On investigation it looks like that the compiler is silently truncating the
>> return value of drc_pmem_query_stats() from 'long' to 'int', since the
>> variable used to store the return code 'rc' is an 'int'. This
>> truncated value is then returned back as a 'ssize_t' back from
>> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large
>> unsigned number and triggers this warning..
>>
>> To fix this we update the type of variable 'rc' from 'int' to
>> 'ssize_t' that prevents the compiler from truncating the return value
>> of drc_pmem_query_stats() and returning correct signed value back from
>> perf_stats_show().
>>
>> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance
>>        stats from PHYP')
>
> Please don't word wrap the Fixes tag it breaks b4.
>
> I've fixed it up this time.

Thanks Mpe

>
> cheers

-- 
Cheers
~ Vaibhav

^ permalink raw reply

* Re: [PATCH 08/15] selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
From: Christian Brauner @ 2020-09-15 16:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-9-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:13AM -0700, Kees Cook wrote:
> Instead of special-casing the get/set-registers routines, move the
> HAVE_GETREG logic into the new ARCH_GETREG() and ARCH_SETREG() macros.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 07/15] selftests/seccomp: Remove syscall setting #ifdefs
From: Christian Brauner @ 2020-09-15 16:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-8-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:12AM -0700, Kees Cook wrote:
> With all architectures now using the common SYSCALL_NUM_SET() macro, the
> arch-specific #ifdef can be removed from change_syscall() itself.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 06/15] selftests/seccomp: mips: Remove O32-specific macro
From: Christian Brauner @ 2020-09-15 16:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-7-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:11AM -0700, Kees Cook wrote:
> Instead of having the mips O32 macro special-cased, pull the logic into
> the SYSCALL_NUM() macro. Additionally include the ABI headers, since
> these appear to have been missing, leaving __NR_O32_Linux undefined.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 05/15] selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
From: Christian Brauner @ 2020-09-15 15:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-6-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:10AM -0700, Kees Cook wrote:
> Remove the arm64 special-case in change_syscall().
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

We're using iovecs in ptrace()??

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 04/15] selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
From: Christian Brauner @ 2020-09-15 15:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-5-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:09AM -0700, Kees Cook wrote:
> Remove the arm special-case in change_syscall().
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 03/15] selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
From: Christian Brauner @ 2020-09-15 15:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-4-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:08AM -0700, Kees Cook wrote:
> Remove the mips special-case in change_syscall().
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index 1c83e743bfb1..02a9a6599746 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -1742,6 +1742,13 @@ TEST_F(TRACE_poke, getpid_runs_normally)
>  # define ARCH_REGS		struct pt_regs
>  # define SYSCALL_NUM(_regs)	(_regs).regs[2]
>  # define SYSCALL_SYSCALL_NUM	regs[4]
> +# define SYSCALL_NUM_SET(_regs, _nr)			\
> +	do {						\
> +		if ((_regs).regs[2] == __NR_O32_Linux)	\
> +			(_regs).regs[4] = _nr;		\
> +		else					\
> +			(_regs).regs[2] = _nr;		\
> +	} while (0)

I think that

# define SYSCALL_NUM_SET(_regs, _nr)				\
	do {							\
		if (SYSCALL_NUM(_regs) == __NR_O32_Linux)	\
			(_regs).regs[4] = _nr;			\
		else						\
			(_regs).regs[2] = _nr;			\
	} while (0)

would read better but that's just a matter of taste. :)

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 02/15] selftests/seccomp: Provide generic syscall setting macro
From: Christian Brauner @ 2020-09-15 15:53 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-3-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:07AM -0700, Kees Cook wrote:
> In order to avoid "#ifdef"s in the main function bodies, create a new
> macro, SYSCALL_NUM_SET(), where arch-specific logic can live.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

SYSCALL_SWITCH(_regs, nr)?

But looks good either way!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* Re: [PATCH 01/15] selftests/seccomp: Refactor arch register macros to avoid xtensa special case
From: Christian Brauner @ 2020-09-15 15:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <20200912110820.597135-2-keescook@chromium.org>

On Sat, Sep 12, 2020 at 04:08:06AM -0700, Kees Cook wrote:
> To avoid an xtensa special-case, refactor all arch register macros to
> take the register variable instead of depending on the macro expanding
> as a struct member name.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Looks good!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply

* [Bug 209277] powerpc: obsolete driver: Marvell MV64X60 MPSC
From: bugzilla-daemon @ 2020-09-15 14:25 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-209277-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=209277

--- Comment #1 from Necip Fazil Yildiran (fazilyildiran@gmail.com) ---
The config MV64X60 in arch/powerpc/platforms/embedded6xx/Kconfig is 
non-prompt selected nowhere -- thus, cannot be enabled.

In addition, a few other configs cannot be enabled due to their dependency
on MV64X60, e.g., EDAC_MV64X60.

The last to use this driver was by PrPMC 280/2800, for which the support
was ended with the commit 3c8464a9b12b 
("powerpc: Delete old PrPMC 280/2800 support").

This looks like the related configs (e.g., MV64X60, EDAC_MV64X60) and the
code (e.g., drivers/edac/mv64x60_edac.c) for Marvell MV64X60 MPSC are now
obsolete.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* [Bug 209277] New: Dead code :q
From: bugzilla-daemon @ 2020-09-15 14:09 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209277

            Bug ID: 209277
           Summary: Dead code :q
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: 5.9-rc4
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: PPC-32
          Assignee: platform_ppc-32@kernel-bugs.osdl.org
          Reporter: fazilyildiran@gmail.com
        Regression: No

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [5.9.0-rc5-20200914] Kernel crash while running LTP(mlock201)
From: Matthew Wilcox @ 2020-09-15 13:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Sachin Sant, linux-next, linuxppc-dev
In-Reply-To: <87o8m7p9jd.fsf@mpe.ellerman.id.au>

On Tue, Sep 15, 2020 at 09:24:38PM +1000, Michael Ellerman wrote:
> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
> > While running LTP tests (specifically mlock201) against next-20200914 tree
> > on a POWER9 LPAR results in following crash.
> 
> Looks the same as:
> 
> https://lore.kernel.org/linux-mm/20200914085545.GB28738@shao2-debian/

https://lore.kernel.org/linux-mm/20200914112738.GM6583@casper.infradead.org/

^ permalink raw reply

* Re: Injecting SLB miltihit crashes kernel 5.9.0-rc5
From: Michael Ellerman @ 2020-09-15 12:54 UTC (permalink / raw)
  To: Michal Suchánek, linuxppc-dev, mahesh
In-Reply-To: <20200915084302.GG29778@kitsune.suse.cz>

Michal Suchánek <msuchanek@suse.de> writes:
> Hello,
>
> Using the SLB mutihit injection test module (which I did not write so I
> do not want to post it here) to verify updates on my 5.3 frankernekernel
> I found that the kernel crashes with Oops: kernel bad access.
>
> I tested on latest upstream kernel build that I have at hand and the
> result is te same (minus the message - nothing was logged and the kernel
> simply rebooted).

That's disappointing.

> Since the whole effort to write a real mode MCE handler was supposed to
> prevent this maybe the SLB injection module should be added to the
> kernel selftests?

Yes I'd like to see it upstream. I think it should be integrated into
LKDTM, which contains other dangerous things like that and is designed
for testing how the kernel handles/recovers from bad conditions.

cheers

^ permalink raw reply

* Re: [PATCH 00/15] selftests/seccomp: Refactor change_syscall()
From: Michael Ellerman @ 2020-09-15 12:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thadeu Lima de Souza Cascardo, Will Drewry, linux-xtensa,
	linux-kernel, Andy Lutomirski, Max Filippov, linux-arm-kernel,
	linux-kselftest, linux-mips, linuxppc-dev, Christian Brauner
In-Reply-To: <202009141321.366935EF52@keescook>

Kees Cook <keescook@chromium.org> writes:
> On Mon, Sep 14, 2020 at 10:15:18PM +1000, Michael Ellerman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>> > Hi,
>> >
>> > This refactors the seccomp selftest macros used in change_syscall(),
>> > in an effort to remove special cases for mips, arm, arm64, and xtensa,
>> > which paves the way for powerpc fixes.
>> >
>> > I'm not entirely done testing, but all-arch build tests and x86_64
>> > selftests pass. I'll be doing arm, arm64, and i386 selftests shortly,
>> > but I currently don't have an easy way to check xtensa, mips, nor
>> > powerpc. Any help there would be appreciated!
>> 
>> The series builds fine for me, and all the tests pass (see below).
>> 
>> Thanks for picking up those changes to deal with powerpc being oddball.
>> 
>> Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
>
> Awesome; thanks!
>
> However...
>
>> ./seccomp_bpf
>> TAP version 13
>> 1..86
>> # Starting 86 tests from 7 test cases.
>> #  RUN           global.kcmp ...
>> #            OK  global.kcmp
>> ok 1 global.kcmp
>> [...]
>> #  RUN           global.KILL_thread ...
>> TAP version 13
>> 1..86
>> # Starting 86 tests from 7 test cases.
>
> Was this a mis-paste, or has something very very bad happened here in
> global.KILL_one_arg_six finishes?
>
...
>> TAP version 13
>> 1..86
>> # Starting 86 tests from 7 test cases.
>> [...]
>> # PASSED: 86 / 86 tests passed.
>> # Totals: pass:86 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> And after every user_notification test? O_O

Haha, I thought that was normal :)

It's because of redirection, I run the tests with:

  find . -executable -type f -print -execdir '{}' ';' | tee test.log

If I just run it directly on the terminal everything is normal.

It'll be fork() vs libc buffering.

I can fix it with:

$ stdbuf -oL ./seccomp_bpf | tee test.log

Or the patch below.

I can send a proper patch for that tomorrow, I don't know that harness
code, but I think that's the right fix.

cheers


diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 4f78e4805633..b1bd00ff3d94 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -971,6 +971,7 @@ void __run_test(struct __fixture_metadata *f,
 
 	ksft_print_msg(" RUN           %s%s%s.%s ...\n",
 	       f->name, variant->name[0] ? "." : "", variant->name, t->name);
+	fflush(stdout);
 	t->pid = fork();
 	if (t->pid < 0) {
 		ksft_print_msg("ERROR SPAWNING TEST CHILD\n");

^ permalink raw reply related

* [PATCH] powerpc/64s: move the last of the page fault handling logic to C
From: Nicholas Piggin @ 2020-09-15 11:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The page fault handling still has some complex logic particularly around
hash table handling, in asm. Implement this in C instead.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/bug.h        |   1 +
 arch/powerpc/kernel/exceptions-64s.S  | 131 +++++---------------------
 arch/powerpc/mm/book3s64/hash_utils.c |  77 +++++++++------
 arch/powerpc/mm/fault.c               |  55 ++++++++++-
 4 files changed, 124 insertions(+), 140 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 338f36cd9934..d714d83bbc7c 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -112,6 +112,7 @@
 
 struct pt_regs;
 extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
+extern int hash__do_page_fault(struct pt_regs *, unsigned long, unsigned long);
 extern void bad_page_fault(struct pt_regs *, unsigned long, int);
 extern void _exception(int, struct pt_regs *, int, unsigned long);
 extern void _exception_pkey(struct pt_regs *, unsigned long, int);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f7d748b88705..f830b893fe03 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1403,14 +1403,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
  *
  * Handling:
  * - Hash MMU
- *   Go to do_hash_page first to see if the HPT can be filled from an entry in
- *   the Linux page table. Hash faults can hit in kernel mode in a fairly
+ *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
+ *   Linux page table. Hash faults can hit in kernel mode in a fairly
  *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
  *   "non-bolted" regions, e.g., vmalloc space. However these should always be
- *   backed by Linux page tables.
+ *   backed by Linux page table entries.
  *
- *   If none is found, do a Linux page fault. Linux page faults can happen in
- *   kernel mode due to user copy operations of course.
+ *   If no entry is found the Linux page fault handler is invoked (by
+ *   do_hash_fault). Linux page faults can happen in kernel mode due to user
+ *   copy operations of course.
  *
  * - Radix MMU
  *   The hardware loads from the Linux page table directly, so a fault goes
@@ -1438,13 +1439,17 @@ EXC_COMMON_BEGIN(data_access_common)
 	GEN_COMMON data_access
 	ld	r4,_DAR(r1)
 	ld	r5,_DSISR(r1)
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 BEGIN_MMU_FTR_SECTION
-	ld	r6,_MSR(r1)
-	li	r3,0x300
-	b	do_hash_page		/* Try to handle as hpte fault */
+	bl	do_hash_fault
 MMU_FTR_SECTION_ELSE
-	b	handle_page_fault
+	bl	do_page_fault
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
+        cmpdi	r3,0
+	beq+	interrupt_return
+	/* We need to restore NVGPRS */
+	REST_NVGPRS(r1)
+	b       interrupt_return
 
 	GEN_KVM data_access
 
@@ -1539,13 +1544,17 @@ EXC_COMMON_BEGIN(instruction_access_common)
 	GEN_COMMON instruction_access
 	ld	r4,_DAR(r1)
 	ld	r5,_DSISR(r1)
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 BEGIN_MMU_FTR_SECTION
-	ld      r6,_MSR(r1)
-	li	r3,0x400
-	b	do_hash_page		/* Try to handle as hpte fault */
+	bl	do_hash_fault
 MMU_FTR_SECTION_ELSE
-	b	handle_page_fault
+	bl	do_page_fault
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
+        cmpdi	r3,0
+	beq+	interrupt_return
+	/* We need to restore NVGPRS */
+	REST_NVGPRS(r1)
+	b       interrupt_return
 
 	GEN_KVM instruction_access
 
@@ -3197,99 +3206,3 @@ disable_machine_check:
 	RFI_TO_KERNEL
 1:	mtlr	r0
 	blr
-
-/*
- * Hash table stuff
- */
-	.balign	IFETCH_ALIGN_BYTES
-do_hash_page:
-#ifdef CONFIG_PPC_BOOK3S_64
-	lis	r0,(DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)@h
-	ori	r0,r0,DSISR_BAD_FAULT_64S@l
-	and.	r0,r5,r0		/* weird error? */
-	bne-	handle_page_fault	/* if not, try to insert a HPTE */
-
-	/*
-	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
-	 * don't call hash_page, just fail the fault. This is required to
-	 * prevent re-entrancy problems in the hash code, namely perf
-	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
-	 * hash fault. See the comment in hash_preload().
-	 */
-	ld	r11, PACA_THREAD_INFO(r13)
-	lwz	r0,TI_PREEMPT(r11)
-	andis.	r0,r0,NMI_MASK@h
-	bne	77f
-
-	/*
-	 * r3 contains the trap number
-	 * r4 contains the faulting address
-	 * r5 contains dsisr
-	 * r6 msr
-	 *
-	 * at return r3 = 0 for success, 1 for page fault, negative for error
-	 */
-	bl	__hash_page		/* build HPTE if possible */
-        cmpdi	r3,0			/* see if __hash_page succeeded */
-
-	/* Success */
-	beq	interrupt_return	/* Return from exception on success */
-
-	/* Error */
-	blt-	13f
-
-	/* Reload DAR/DSISR into r4/r5 for the DABR check below */
-	ld	r4,_DAR(r1)
-	ld      r5,_DSISR(r1)
-#endif /* CONFIG_PPC_BOOK3S_64 */
-
-/* Here we have a page fault that hash_page can't handle. */
-handle_page_fault:
-11:	andis.  r0,r5,DSISR_DABRMATCH@h
-	bne-    handle_dabr_fault
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	do_page_fault
-	cmpdi	r3,0
-	beq+	interrupt_return
-	mr	r5,r3
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	ld	r4,_DAR(r1)
-	bl	bad_page_fault
-	b	interrupt_return
-
-/* We have a data breakpoint exception - handle it */
-handle_dabr_fault:
-	ld      r4,_DAR(r1)
-	ld      r5,_DSISR(r1)
-	addi    r3,r1,STACK_FRAME_OVERHEAD
-	bl      do_break
-	/*
-	 * do_break() may have changed the NV GPRS while handling a breakpoint.
-	 * If so, we need to restore them with their updated values.
-	 */
-	REST_NVGPRS(r1)
-	b       interrupt_return
-
-
-#ifdef CONFIG_PPC_BOOK3S_64
-/* We have a page fault that hash_page could handle but HV refused
- * the PTE insertion
- */
-13:	mr	r5,r3
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	ld	r4,_DAR(r1)
-	bl	low_hash_fault
-	b	interrupt_return
-#endif
-
-/*
- * We come here as a result of a DSI at a point where we don't want
- * to call hash_page, such as when we are accessing memory (possibly
- * user memory) inside a PMU interrupt that occurred while interrupts
- * were soft-disabled.  We want to invoke the exception handler for
- * the access, or panic if there isn't a handler.
- */
-77:	addi	r3,r1,STACK_FRAME_OVERHEAD
-	li	r5,SIGSEGV
-	bl	bad_page_fault
-	b	interrupt_return
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 1da9dbba9217..fd7b6bb7030d 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1500,16 +1500,40 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap,
 }
 EXPORT_SYMBOL_GPL(hash_page);
 
-int __hash_page(unsigned long trap, unsigned long ea, unsigned long dsisr,
-		unsigned long msr)
+int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr)
 {
 	unsigned long access = _PAGE_PRESENT | _PAGE_READ;
 	unsigned long flags = 0;
-	struct mm_struct *mm = current->mm;
-	unsigned int region_id = get_region_id(ea);
+	struct mm_struct *mm;
+	unsigned int region_id;
+	int err;
+
+	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)))
+		goto _do_page_fault;
+
+	/*
+	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
+	 * don't call hash_page, just fail the fault. This is required to
+	 * prevent re-entrancy problems in the hash code, namely perf
+	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
+	 * hash fault. See the comment in hash_preload().
+	 *
+	 * We come here as a result of a DSI at a point where we don't want
+	 * to call hash_page, such as when we are accessing memory (possibly
+	 * user memory) inside a PMU interrupt that occurred while interrupts
+	 * were soft-disabled.  We want to invoke the exception handler for
+	 * the access, or panic if there isn't a handler.
+	 */
+	if (unlikely(in_nmi())) {
+		bad_page_fault(regs, ea, SIGSEGV);
+		return 0;
+	}
 
+	region_id = get_region_id(ea);
 	if ((region_id == VMALLOC_REGION_ID) || (region_id == IO_REGION_ID))
 		mm = &init_mm;
+	else
+		mm = current->mm;
 
 	if (dsisr & DSISR_NOHPTE)
 		flags |= HPTE_NOHPTE_UPDATE;
@@ -1525,13 +1549,31 @@ int __hash_page(unsigned long trap, unsigned long ea, unsigned long dsisr,
 	 * 2) user space access kernel space.
 	 */
 	access |= _PAGE_PRIVILEGED;
-	if ((msr & MSR_PR) || (region_id == USER_REGION_ID))
+	if (user_mode(regs) || (region_id == USER_REGION_ID))
 		access &= ~_PAGE_PRIVILEGED;
 
-	if (trap == 0x400)
+	if (regs->trap == 0x400)
 		access |= _PAGE_EXEC;
 
-	return hash_page_mm(mm, ea, access, trap, flags);
+	err = hash_page_mm(mm, ea, access, regs->trap, flags);
+	if (unlikely(err < 0)) {
+		// failed to instert a hash PTE due to an hypervisor error
+		if (user_mode(regs)) {
+			if (IS_ENABLED(CONFIG_PPC_SUBPAGE_PROT) && err == -2)
+				_exception(SIGSEGV, regs, SEGV_ACCERR, ea);
+			else
+				_exception(SIGBUS, regs, BUS_ADRERR, ea);
+		} else {
+			bad_page_fault(regs, ea, SIGBUS);
+		}
+		err = 0;
+
+	} else if (err) {
+_do_page_fault:
+		err = hash__do_page_fault(regs, ea, dsisr);
+	}
+
+	return err;
 }
 
 #ifdef CONFIG_PPC_MM_SLICES
@@ -1831,27 +1873,6 @@ void flush_hash_range(unsigned long number, int local)
 	}
 }
 
-/*
- * low_hash_fault is called when we the low level hash code failed
- * to instert a PTE due to an hypervisor error
- */
-void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
-{
-	enum ctx_state prev_state = exception_enter();
-
-	if (user_mode(regs)) {
-#ifdef CONFIG_PPC_SUBPAGE_PROT
-		if (rc == -2)
-			_exception(SIGSEGV, regs, SEGV_ACCERR, address);
-		else
-#endif
-			_exception(SIGBUS, regs, BUS_ADRERR, address);
-	} else
-		bad_page_fault(regs, address, SIGBUS);
-
-	exception_exit(prev_state);
-}
-
 long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
 			   unsigned long pa, unsigned long rflags,
 			   unsigned long vflags, int psize, int ssize)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 0add963a849b..ce43e401e0e0 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -405,7 +405,10 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
 		return 0;
 
-	if (unlikely(page_fault_is_bad(error_code))) {
+	if (unlikely(page_fault_is_bad(error_code) || (error_code & DSISR_DABRMATCH))) {
+		if (error_code & DSISR_DABRMATCH)
+			return -1;
+
 		if (is_user) {
 			_exception(SIGBUS, regs, BUS_OBJERR, address);
 			return 0;
@@ -548,12 +551,58 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 		  unsigned long error_code)
 {
 	enum ctx_state prev_state = exception_enter();
-	int rc = __do_page_fault(regs, address, error_code);
+	int err;
+
+	err = __do_page_fault(regs, address, error_code);
+
+#ifdef CONFIG_PPC_BOOK3S_64
+	/* 32 and 64e handle errors in their asm code */
+	if (unlikely(err)) {
+		if (err > 0) {
+			bad_page_fault(regs, address, err);
+			err = 0;
+		} else {
+			/*
+			 * do_break() may change NV GPRS while handling the
+			 * breakpoint. Return -ve to caller to do that.
+			 */
+			do_break(regs, address, error_code);
+		}
+	}
+#endif
+
 	exception_exit(prev_state);
-	return rc;
+
+	return err;
 }
 NOKPROBE_SYMBOL(do_page_fault);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+/* Same as do_page_fault but interrupt entry has already run in do_hash_fault */
+int hash__do_page_fault(struct pt_regs *regs, unsigned long address,
+		  unsigned long error_code)
+{
+	int err;
+
+	err = __do_page_fault(regs, address, error_code);
+	if (unlikely(err)) {
+		if (err > 0) {
+			bad_page_fault(regs, address, err);
+			err = 0;
+		} else {
+			/*
+			 * do_break() may change NV GPRS while handling the
+			 * breakpoint. Return -ve to caller to do that.
+			 */
+			do_break(regs, address, error_code);
+		}
+	}
+
+	return err;
+}
+NOKPROBE_SYMBOL(hash__do_page_fault);
+#endif
+
 /*
  * bad_page_fault is called when we have a bad access from the kernel.
  * It is called from the DSI and ISI handlers in head.S and from some
-- 
2.23.0


^ permalink raw reply related

* [PATCH 6/6] powerpc/64: irq replay remove decrementer overflow check
From: Nicholas Piggin @ 2020-09-15 11:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200915114650.3980244-1-npiggin@gmail.com>

This is an ad-hoc way to catch some cases of decrementer overflow. It
won't catch cases where interrupts were hard disabled before any soft
masked interrupts fired, for example. And it doesn't catch cases that
have overflowed an even number of times.

It's not clear what exactly what problem s being solved here. A lost
timer when we have an IRQ off latency of more than ~4.3 seconds could
be avoided (so long as it's also less than ~8.6s) but this is already
a hard lockup order of magnitude event, and the decrementer will wrap
again and provide a timer interrupt within the same latency magnitdue.

So the test catches some cases of lost decrementers in very exceptional
(buggy) latency event cases, reducing timer interrupt latency in that
case by up to 4.3 seconds. And for large decrementer, it's useless. It
is performed in potentially quite a hot path, reading the TB can be
a noticable overhead.

Perhaps more importantly it allows the clunky MSR[EE] vs
PACA_IRQ_HARD_DIS incoherency to be removed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/irq.c | 50 +--------------------------------------
 1 file changed, 1 insertion(+), 49 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 631e6d236c97..d7162f142f24 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -102,14 +102,6 @@ static inline notrace unsigned long get_irq_happened(void)
 	return happened;
 }
 
-static inline notrace int decrementer_check_overflow(void)
-{
- 	u64 now = get_tb_or_rtc();
-	u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
- 
-	return now >= *next_tb;
-}
-
 #ifdef CONFIG_PPC_BOOK3E
 
 /* This is called whenever we are re-enabling interrupts
@@ -142,35 +134,6 @@ notrace unsigned int __check_irq_replay(void)
 	trace_hardirqs_on();
 	trace_hardirqs_off();
 
-	/*
-	 * We are always hard disabled here, but PACA_IRQ_HARD_DIS may
-	 * not be set, which means interrupts have only just been hard
-	 * disabled as part of the local_irq_restore or interrupt return
-	 * code. In that case, skip the decrementr check becaus it's
-	 * expensive to read the TB.
-	 *
-	 * HARD_DIS then gets cleared here, but it's reconciled later.
-	 * Either local_irq_disable will replay the interrupt and that
-	 * will reconcile state like other hard interrupts. Or interrupt
-	 * retur will replay the interrupt and in that case it sets
-	 * PACA_IRQ_HARD_DIS by hand (see comments in entry_64.S).
-	 */
-	if (happened & PACA_IRQ_HARD_DIS) {
-		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-
-		/*
-		 * We may have missed a decrementer interrupt if hard disabled.
-		 * Check the decrementer register in case we had a rollover
-		 * while hard disabled.
-		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow()) {
-				local_paca->irq_happened |= PACA_IRQ_DEC;
-				happened |= PACA_IRQ_DEC;
-			}
-		}
-	}
-
 	if (happened & PACA_IRQ_DEC) {
 		local_paca->irq_happened &= ~PACA_IRQ_DEC;
 		return 0x900;
@@ -229,18 +192,6 @@ void replay_soft_interrupts(void)
 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 		WARN_ON_ONCE(mfmsr() & MSR_EE);
 
-	if (happened & PACA_IRQ_HARD_DIS) {
-		/*
-		 * We may have missed a decrementer interrupt if hard disabled.
-		 * Check the decrementer register in case we had a rollover
-		 * while hard disabled.
-		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow())
-				happened |= PACA_IRQ_DEC;
-		}
-	}
-
 	/*
 	 * Force the delivery of pending soft-disabled interrupts on PS3.
 	 * Any HV call will have this side effect.
@@ -345,6 +296,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 			WARN_ON_ONCE(!(mfmsr() & MSR_EE));
 		__hard_irq_disable();
+		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 	} else {
 		/*
 		 * We should already be hard disabled here. We had bugs
-- 
2.23.0


^ permalink raw reply related

* [PATCH 5/6] powerpc/64: make restore_interrupts 64e only
From: Nicholas Piggin @ 2020-09-15 11:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200915114650.3980244-1-npiggin@gmail.com>

This is not used by 64s.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/irq.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index b725509f9073..631e6d236c97 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -191,6 +191,25 @@ notrace unsigned int __check_irq_replay(void)
 
 	return 0;
 }
+
+/*
+ * This is specifically called by assembly code to re-enable interrupts
+ * if they are currently disabled. This is typically called before
+ * schedule() or do_signal() when returning to userspace. We do it
+ * in C to avoid the burden of dealing with lockdep etc...
+ *
+ * NOTE: This is called with interrupts hard disabled but not marked
+ * as such in paca->irq_happened, so we need to resync this.
+ */
+void notrace restore_interrupts(void)
+{
+	if (irqs_disabled()) {
+		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+		local_irq_enable();
+	} else
+		__hard_irq_enable();
+}
+
 #endif /* CONFIG_PPC_BOOK3E */
 
 void replay_soft_interrupts(void)
@@ -364,24 +383,6 @@ notrace void arch_local_irq_restore(unsigned long mask)
 }
 EXPORT_SYMBOL(arch_local_irq_restore);
 
-/*
- * This is specifically called by assembly code to re-enable interrupts
- * if they are currently disabled. This is typically called before
- * schedule() or do_signal() when returning to userspace. We do it
- * in C to avoid the burden of dealing with lockdep etc...
- *
- * NOTE: This is called with interrupts hard disabled but not marked
- * as such in paca->irq_happened, so we need to resync this.
- */
-void notrace restore_interrupts(void)
-{
-	if (irqs_disabled()) {
-		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
-		local_irq_enable();
-	} else
-		__hard_irq_enable();
-}
-
 /*
  * This is a helper to use when about to go into idle low-power
  * when the latter has the side effect of re-enabling interrupts
-- 
2.23.0


^ permalink raw reply related

* [PATCH 4/6] powerpc/64e: remove 64s specific interrupt soft-mask code
From: Nicholas Piggin @ 2020-09-15 11:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200915114650.3980244-1-npiggin@gmail.com>

Since the assembly soft-masking code was moved to 64e specific, there
are some 64s specific interrupt types still there. Remove them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64e.S | 10 ----------
 arch/powerpc/kernel/irq.c            |  2 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index ca444ca82b8d..f579ce46eef2 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1302,16 +1302,6 @@ fast_exception_return:
 	addi	r3,r1,STACK_FRAME_OVERHEAD;
 	bl	do_IRQ
 	b	ret_from_except
-1:	cmpwi	cr0,r3,0xf00
-	bne	1f
-	addi	r3,r1,STACK_FRAME_OVERHEAD;
-	bl	performance_monitor_exception
-	b	ret_from_except
-1:	cmpwi	cr0,r3,0xe60
-	bne	1f
-	addi	r3,r1,STACK_FRAME_OVERHEAD;
-	bl	handle_hmi_exception
-	b	ret_from_except
 1:	cmpwi	cr0,r3,0x900
 	bne	1f
 	addi	r3,r1,STACK_FRAME_OVERHEAD;
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 736a6b56e7d6..b725509f9073 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -113,7 +113,7 @@ static inline notrace int decrementer_check_overflow(void)
 #ifdef CONFIG_PPC_BOOK3E
 
 /* This is called whenever we are re-enabling interrupts
- * and returns either 0 (nothing to do) or 500/900/280/a00/e80 if
+ * and returns either 0 (nothing to do) or 500/900/280 if
  * there's an EE, DEC or DBELL to generate.
  *
  * This is called in two contexts: From arch_local_irq_restore()
-- 
2.23.0


^ permalink raw reply related

* [PATCH 3/6] powerpc/64e: remove PACA_IRQ_EE_EDGE
From: Nicholas Piggin @ 2020-09-15 11:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200915114650.3980244-1-npiggin@gmail.com>

This is not used anywhere.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/hw_irq.h    |  5 ++---
 arch/powerpc/kernel/exceptions-64e.S |  1 -
 arch/powerpc/kernel/irq.c            | 23 -----------------------
 3 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 35060be09073..50dc35711db3 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -25,9 +25,8 @@
 #define PACA_IRQ_DBELL		0x02
 #define PACA_IRQ_EE		0x04
 #define PACA_IRQ_DEC		0x08 /* Or FIT */
-#define PACA_IRQ_EE_EDGE	0x10 /* BookE only */
-#define PACA_IRQ_HMI		0x20
-#define PACA_IRQ_PMI		0x40
+#define PACA_IRQ_HMI		0x10
+#define PACA_IRQ_PMI		0x20
 
 /*
  * Some soft-masked interrupts must be hard masked until they are replayed
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index d9ed79415100..ca444ca82b8d 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -988,7 +988,6 @@ kernel_dbg_exc:
 .endm
 
 masked_interrupt_book3e_0x500:
-	// XXX When adding support for EPR, use PACA_IRQ_EE_EDGE
 	masked_interrupt_book3e PACA_IRQ_EE 1
 
 masked_interrupt_book3e_0x900:
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 3fdad9336885..736a6b56e7d6 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -181,16 +181,6 @@ notrace unsigned int __check_irq_replay(void)
 		return 0x500;
 	}
 
-	/*
-	 * Check if an EPR external interrupt happened this bit is typically
-	 * set if we need to handle another "edge" interrupt from within the
-	 * MPIC "EPR" handler.
-	 */
-	if (happened & PACA_IRQ_EE_EDGE) {
-		local_paca->irq_happened &= ~PACA_IRQ_EE_EDGE;
-		return 0x500;
-	}
-
 	if (happened & PACA_IRQ_DBELL) {
 		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
 		return 0x280;
@@ -270,19 +260,6 @@ void replay_soft_interrupts(void)
 			hard_irq_disable();
 	}
 
-	/*
-	 * Check if an EPR external interrupt happened this bit is typically
-	 * set if we need to handle another "edge" interrupt from within the
-	 * MPIC "EPR" handler.
-	 */
-	if (IS_ENABLED(CONFIG_PPC_BOOK3E) && (happened & PACA_IRQ_EE_EDGE)) {
-		local_paca->irq_happened &= ~PACA_IRQ_EE_EDGE;
-		regs.trap = 0x500;
-		do_IRQ(&regs);
-		if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
-			hard_irq_disable();
-	}
-
 	if (IS_ENABLED(CONFIG_PPC_DOORBELL) && (happened & PACA_IRQ_DBELL)) {
 		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
 		if (IS_ENABLED(CONFIG_PPC_BOOK3E))
-- 
2.23.0


^ permalink raw reply related

* [PATCH 1/6] powerpc/64: fix irq replay missing preempt
From: Nicholas Piggin @ 2020-09-15 11:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Prior to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt
replay in C"), replayed interrupts returned by the regular interrupt
exit code, which performs preemption in case an interrupt had set
need_resched.

This logic was missed by the conversion. Adding preempt_disable/enable
around the interrupt replay and final irq enable will reschedule if
needed.

Fixes: 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/irq.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index bf21ebd36190..77019699606a 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -368,6 +368,12 @@ notrace void arch_local_irq_restore(unsigned long mask)
 		}
 	}
 
+	/*
+	 * Disable preempt here, so that the below preempt_enable will
+	 * perform resched if required (a replayed interrupt may set
+	 * need_resched).
+	 */
+	preempt_disable();
 	irq_soft_mask_set(IRQS_ALL_DISABLED);
 	trace_hardirqs_off();
 
@@ -377,6 +383,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	trace_hardirqs_on();
 	irq_soft_mask_set(IRQS_ENABLED);
 	__hard_irq_enable();
+	preempt_enable();
 }
 EXPORT_SYMBOL(arch_local_irq_restore);
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH 2/6] powerpc/64: fix irq replay pt_regs->softe value
From: Nicholas Piggin @ 2020-09-15 11:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200915114650.3980244-1-npiggin@gmail.com>

Replayed interrupts get an "artificial" struct pt_regs constructed to
pass to interrupt handler functions. This did not get the softe field
set correctly, it's as though the interrupt has hit while irqs are
disabled. It should be IRQS_ENABLED.

This is possibly harmless, asynchronous handlers should not be testing
if irqs were disabled, but it might be possible for example some code
is shared with synchronous or NMI handlers, and it makes more sense if
debug output looks at this.

Fixes: 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 77019699606a..3fdad9336885 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -214,7 +214,7 @@ void replay_soft_interrupts(void)
 	struct pt_regs regs;
 
 	ppc_save_regs(&regs);
-	regs.softe = IRQS_ALL_DISABLED;
+	regs.softe = IRQS_ENABLED;
 
 again:
 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH v2] powerpc/papr_scm: Fix warning triggered by perf_stats_show()
From: Michael Ellerman @ 2020-09-15 11:30 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev, linux-nvdimm
  Cc: Santosh Sivaraj, Aneesh Kumar K . V, Oliver O'Halloran,
	Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200912081451.66225-1-vaibhav@linux.ibm.com>

Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> A warning is reported by the kernel in case perf_stats_show() returns
> an error code. The warning is of the form below:
>
>  papr_scm ibm,persistent-memory:ibm,pmemory@44100001:
>  	  Failed to query performance stats, Err:-10
>  dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
>  fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count
>
> On investigation it looks like that the compiler is silently truncating the
> return value of drc_pmem_query_stats() from 'long' to 'int', since the
> variable used to store the return code 'rc' is an 'int'. This
> truncated value is then returned back as a 'ssize_t' back from
> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large
> unsigned number and triggers this warning..
>
> To fix this we update the type of variable 'rc' from 'int' to
> 'ssize_t' that prevents the compiler from truncating the return value
> of drc_pmem_query_stats() and returning correct signed value back from
> perf_stats_show().
>
> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance
>        stats from PHYP')

Please don't word wrap the Fixes tag it breaks b4.

I've fixed it up this time.

cheers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox