LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v3 1/2] ASoC: dt-bindings: fsl_aud2htx: Add binding doc for aud2htx module
From: Mark Brown @ 2020-11-06 11:54 UTC (permalink / raw)
  To: Shengjiu Wang, perex, Xiubo.Lee, festevam, robh+dt, nicoleotsuka,
	timur, tiwai, devicetree, alsa-devel, lgirdwood
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1604281947-26874-1-git-send-email-shengjiu.wang@nxp.com>

On Mon, 2 Nov 2020 09:52:26 +0800, Shengjiu Wang wrote:
> AUD2HTX (Audio Subsystem TO HDMI TX Subsystem) is a new
> IP module found on i.MX8MP.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/2] ASoC: dt-bindings: fsl_aud2htx: Add binding doc for aud2htx module
      commit: 40f4c56d08f2a95f4f3b036987f171dde69ddd36
[2/2] ASoC: fsl_aud2htx: Add aud2htx module driver
      commit: 8a24c834c053ef1b0cdefbd9c5bcb487cbc5068f

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply

* Re: [PATCH v3 1/2] ASoC: dt-bindings: fsl_aud2htx: Add binding doc for aud2htx module
From: Mark Brown @ 2020-11-06 12:20 UTC (permalink / raw)
  To: Shengjiu Wang, perex, Xiubo.Lee, festevam, robh+dt, nicoleotsuka,
	timur, tiwai, devicetree, alsa-devel, lgirdwood
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <160466365499.22812.9217467877032314221.b4-ty@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]

On Fri, Nov 06, 2020 at 11:54:23AM +0000, Mark Brown wrote:
> On Mon, 2 Nov 2020 09:52:26 +0800, Shengjiu Wang wrote:
> > AUD2HTX (Audio Subsystem TO HDMI TX Subsystem) is a new
> > IP module found on i.MX8MP.
> 
> Applied to
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Sorry, looks like me queueing this raced with the review comments coming
in.  I think the review commments are small enough that it'll be OK to
fix incrementally?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: Kernel panic from malloc() on SUSE 15.1?
From: Michael Ellerman @ 2020-11-06 12:25 UTC (permalink / raw)
  To: Carl Jacobsen; +Cc: linuxppc-dev
In-Reply-To: <CAKkwB_TQJ=tg17SJd_s2SSXA7F3MeHmzjMC8f2uiV7yGvsLufg@mail.gmail.com>

Carl Jacobsen <cjacobsen@storix.com> writes:
> On Thu, Nov 5, 2020 at 2:19 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
>> Carl Jacobsen <cjacobsen@storix.com> writes:
>> > The panic (on a call to malloc from static linked libcrypto) looks like
>> > this:
>>
>> What hardware is this on?
>>
>
> Thank you for looking into this.
>
> The system that's panicking identifies like this:
>     # uname -a
>     Linux sl151pwr8 4.12.14-197.18-default #1 SMP Tue Sep 17 14:26:49 UTC
> 2019
>     (d75059b) ppc64le ppc64le ppc64le GNU/Linux
>     #
>     # cat /etc/os-release
>     NAME="SLES"
>     VERSION="15-SP1"
>     VERSION_ID="15.1"
>     PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
>     ID="sles"
>     ID_LIKE="suse"
>     ANSI_COLOR="0;32"
>     CPE_NAME="cpe:/o:suse:sles:15:sp1"
>
> The system is an LPAR running under PowerVM vios version 2.2.3.4.
> The underlying hardware is machine type-model 8284-22A.

OK thanks. That's a Power8.

>> Can you try booting with ppc_tm=off on the kernel command line, and see
>> if that changes anything?
>
> Yes. Output is down below. Doesn't appear to change much, but I don't have
> the background to interpret the registers.

Yeah looks like that's not the problem.

>> Can you put your compiled test program up somewhere we can download it
>> and look at? Or post the disassembly?
>>
>
> Here's the source file:
>     https://www.storix.com/download/support/misc/rand_test.c
>
> Here's the resulting executable:
>     https://www.storix.com/download/support/misc/rand_test

Thanks.

So something seems to have gone wrong linking this, I see eg:

0000000010004a8c <syscall_random>:
    10004a8c:   2b 10 40 3c     lis     r2,4139
    10004a90:   88 f7 42 38     addi    r2,r2,-2168
    10004a94:   a6 02 08 7c     mflr    r0
    10004a98:   10 00 01 f8     std     r0,16(r1)
    10004a9c:   f8 ff e1 fb     std     r31,-8(r1)
    10004aa0:   81 ff 21 f8     stdu    r1,-128(r1)
    10004aa4:   78 0b 3f 7c     mr      r31,r1
    10004aa8:   60 00 7f f8     std     r3,96(r31)
    10004aac:   68 00 9f f8     std     r4,104(r31)
    10004ab0:   00 00 00 60     nop
    10004ab4:   30 80 22 e9     ld      r9,-32720(r2)
    10004ab8:   00 00 a9 2f     cmpdi   cr7,r9,0
    10004abc:   30 00 9e 41     beq     cr7,10004aec <syscall_random+0x60>
    10004ac0:   60 00 7f e8     ld      r3,96(r31)
    10004ac4:   68 00 9f e8     ld      r4,104(r31)
    10004ac8:   39 b5 ff 4b     bl      10000000 <_init-0x1f00>

Notice that last bl (branch and link) to 0x10000000. But there's no text
at 0x10000000, that's the start of the page which happens to be the ELF
magic.

I've seen something like this before, but I can't remember when/where so
I haven't been able to track down what the problem was.

Anyway hopefully someone on the list will know.

That still doesn't explain the kernel crash though.


> Executable is linked to libcrypto from openssl-1.1.1g, configured with:
>     ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
>
> Executable is built (on SUSE 12) with:
>     gcc -ggdb3 -o rand_test rand_test.c libcrypto.a


> And running the executable (on SUSE 15.1) through gdb goes like this:
>
>     # gdb --args ./rand_test
>     GNU gdb (GDB; SUSE Linux Enterprise 15) 8.3.1
>     << snip intro text >>
>     Reading symbols from ./rand_test...
>     (gdb) b main
>     Breakpoint 1 at 0x1000288c: file rand_test.c, line 6.
>     (gdb) r
>     Starting program: /tmp/ossl/rand_test
>
>     Breakpoint 1, main (argc=1, argv=0x7ffffffff798) at rand_test.c:6
>     6           int has_enough_data = RAND_status();
>     (gdb) s
>     RAND_status () at crypto/rand/rand_lib.c:958
>     958         const RAND_METHOD *meth = RAND_get_rand_method();
>     (gdb)
>     RAND_get_rand_method () at crypto/rand/rand_lib.c:844
>     844         const RAND_METHOD *tmp_meth = NULL;
>     (gdb)
>     846         if (!RUN_ONCE(&rand_init, do_rand_init))
>     (gdb)
>     CRYPTO_THREAD_run_once (once=0x102a7d88 <rand_init>, > init=0x10002f30 <do_rand_init_ossl_>) at crypto/threads_none.c:67
>     67          if (*once != 0)
>     (gdb)
>     70          init();
>     (gdb)
>     do_rand_init_ossl_ () at crypto/rand/rand_lib.c:306
>     306     DEFINE_RUN_ONCE_STATIC(do_rand_init)
>     (gdb)
>     do_rand_init () at crypto/rand/rand_lib.c:309
>     309         rand_engine_lock = CRYPTO_THREAD_lock_new();
>     (gdb)
>     CRYPTO_THREAD_lock_new () at crypto/threads_none.c:24
>     24          if ((lock = OPENSSL_zalloc(sizeof(unsigned int))) == NULL) {
>     (gdb)
>     CRYPTO_zalloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24) > at crypto/mem.c:230
>     230         void *ret = CRYPTO_malloc(num, file, line);
>     (gdb)
>     CRYPTO_malloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24) > at crypto/mem.c:194
>     194         void *ret = NULL;
>     (gdb)
>     197         if (malloc_impl != NULL && malloc_impl != CRYPTO_malloc)
>     (gdb)
>     200         if (num == 0)
>     (gdb)
>     204         if (allow_customize) {
>     (gdb)
>     210             allow_customize = 0;
>     (gdb)
>     222         ret = malloc(num);
>     (gdb)
>     Bad kernel stack pointer 7fffffffef20 at 700


On my machine it doesn't crash the kernel, so I can catch it later. For
me it's here:

Program received signal SIGILL, Illegal instruction.
0x0000000010000004 in ?? ()
(gdb) bt
#0  0x0000000010000004 in ?? ()
#1  0x0000000010004acc in syscall_random (buf=0x102b0730, buflen=32)
    at crypto/rand/rand_unix.c:371
#2  0x00000000100053fc in rand_pool_acquire_entropy (pool=0x102b06e0)
    at crypto/rand/rand_unix.c:636
#3  0x0000000010002b58 in rand_drbg_get_entropy (drbg=0x102b02e0, 
    pout=0x7ffffffff3f0, entropy=256, min_len=32, max_len=2147483647, 
    prediction_resistance=0) at crypto/rand/rand_lib.c:198
#4  0x000000001001ed9c in RAND_DRBG_instantiate (drbg=0x102b02e0, 
    pers=0x10248d00 <ossl_pers_string> "OpenSSL NIST SP 800-90A DRBG", 
    perslen=28) at crypto/rand/drbg_lib.c:338
#5  0x0000000010020300 in drbg_setup (parent=0x0) at crypto/rand/drbg_lib.c:895
#6  0x0000000010020414 in do_rand_drbg_init () at crypto/rand/drbg_lib.c:924
#7  0x000000001002034c in do_rand_drbg_init_ossl_ ()
    at crypto/rand/drbg_lib.c:909
#8  0x0000000010005d1c in CRYPTO_THREAD_run_once (
    once=0x102ab4d8 <rand_drbg_init>, 
    init=0x1002032c <do_rand_drbg_init_ossl_>) at crypto/threads_none.c:70
#9  0x00000000100209c4 in RAND_DRBG_get0_master ()
    at crypto/rand/drbg_lib.c:1102
#10 0x0000000010020914 in drbg_status () at crypto/rand/drbg_lib.c:1084
#11 0x0000000010004a58 in RAND_status () at crypto/rand/rand_lib.c:961
#12 0x0000000010002890 in main (argc=1, argv=0x7ffffffffa68) at rand_test.c:6
(gdb) 


ie. in the syscall_random() that I mentioned above.

You should be able to catch it there too if you do:

(gdb) b *0x10000000
(gdb) r

Hopefully it will stop without crashing the kernel, and then a `bt` will
show that you're in the same place as me.

If you can get that to work, when you're stopped there, can you do an
`info registers` and send us the output.

cheers

^ permalink raw reply

* Re: Kernel panic from malloc() on SUSE 15.1?
From: Michal Suchánek @ 2020-11-06 12:51 UTC (permalink / raw)
  To: Carl Jacobsen; +Cc: linuxppc-dev
In-Reply-To: <CAKkwB_S6Bs_+5At2aajbQbJg==WE_4NLdhSK=Bj+td67215Htg@mail.gmail.com>

On Mon, Nov 02, 2020 at 12:14:27PM -0800, Carl Jacobsen wrote:
> I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
> simple
> test program, built in a slightly unusual way.
> 
> I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
> copy of libcrypto.a (from openssl-1.1.1g), built without threads.
> I have a 10 line C test program that compiles and runs fine on the
> SUSE 12 system. If I compile the same program on SUSE 15.1 (with
> gcc 7.4.1), it runs fine on SUSE 15.1.
> 
> But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
> system, the call to RAND_status() gets to a malloc() and then panics.
> (And, of course, if I just compile a call to malloc(), that runs fine
> on both systems.) Here's the test program, it's really just a call to
> RAND_status():
> 
>     #include <stdio.h>
>     #include <openssl/rand.h>
> 
>     int main(int argc, char **argv)
>     {
>         int has_enough_data = RAND_status();
>         printf("The PRNG %s been seeded with enough data\n",
>                has_enough_data ? "HAS" : "has NOT");
>         return 0;
>     }
> 
> openssl is configured/built with:
>     ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
>     make
> 
> and the test program is compiled with:
>     gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
> 
> The kernel on SUSE 12 is: 3.12.28-4-default
> And glibc is: 2.19
> 
> The kernel on SUSE 15.1 is: 4.12.14-197.18-default
> And glibc is: 2.26

SLE 12 SP5 has pretty much the same kernel as SLE 15 SP1 and pretty much
the same compiler as SLE 12 so it might be interesting data point to try
there.

Also I saw you are using very old VIOS (which should not make much of a
difference) but did not see what firmware version the machine has.

There have been cases of mysterious crashes solved by updating the
firmware.

Thanks

Michal

^ permalink raw reply

* Re: [PATCH 11/11 v3] ftrace: Add recording of functions that caused recursion
From: Petr Mladek @ 2020-11-06 13:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201106023548.102375687@goodmis.org>

On Thu 2020-11-05 21:32:46, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
> "recursed_functions" all the functions that caused recursion while a
> callback to the function tracer was running.
> 
> Changes since v2:
> 
>  - Use trace_recursion flags in current for protecting recursion of recursion recording
>  - Make the recursion logic a little cleaner
>  - Export GPL the recursion recording

JFYI, the code reading and writing the cache looks good to me.

It is still possible that some entries might stay unused (filled
with zeroes) but it should be hard to hit in practice. It
is good enough from my POV.

I do not give Reviewed-by tag just because I somehow do not have power
to review the entire patch carefully enough at the moment.

Best Regards,
Petr

^ permalink raw reply

* [PATCH] powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions
From: Christophe Leroy @ 2020-11-06 13:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

All hugetlb range freeing functions have a verification like the following,
which only differs by the mask used, depending on the page table level.

	start &= MASK;
	if (start < floor)
		return;
	if (ceiling) {
		ceiling &= MASK;
		if (! ceiling)
			return;
		}
	if (end - 1 > ceiling - 1)
		return;

Refactor that into a helper function which takes the mask as
an argument, returning true when [start;end[ is not fully
contained inside [floor;ceiling[

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/mm/hugetlbpage.c | 56 ++++++++++++-----------------------
 1 file changed, 19 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 36c3800769fb..f8d8a4988e15 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -294,6 +294,21 @@ static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
 static inline void hugepd_free(struct mmu_gather *tlb, void *hugepte) {}
 #endif
 
+/* Return true when the entry to be freed maps more than the area being freed */
+static bool range_is_outside_limits(unsigned long start, unsigned long end,
+				    unsigned long floor, unsigned long ceiling,
+				    unsigned long mask)
+{
+	if ((start & mask) < floor)
+		return true;
+	if (ceiling) {
+		ceiling &= mask;
+		if (!ceiling)
+			return true;
+	}
+	return end - 1 > ceiling - 1;
+}
+
 static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift,
 			      unsigned long start, unsigned long end,
 			      unsigned long floor, unsigned long ceiling)
@@ -309,15 +324,7 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
 	if (shift > pdshift)
 		num_hugepd = 1 << (shift - pdshift);
 
-	start &= pdmask;
-	if (start < floor)
-		return;
-	if (ceiling) {
-		ceiling &= pdmask;
-		if (! ceiling)
-			return;
-	}
-	if (end - 1 > ceiling - 1)
+	if (range_is_outside_limits(start, end, floor, ceiling, pdmask))
 		return;
 
 	for (i = 0; i < num_hugepd; i++, hpdp++)
@@ -334,18 +341,9 @@ static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
 				   unsigned long addr, unsigned long end,
 				   unsigned long floor, unsigned long ceiling)
 {
-	unsigned long start = addr;
 	pgtable_t token = pmd_pgtable(*pmd);
 
-	start &= PMD_MASK;
-	if (start < floor)
-		return;
-	if (ceiling) {
-		ceiling &= PMD_MASK;
-		if (!ceiling)
-			return;
-	}
-	if (end - 1 > ceiling - 1)
+	if (range_is_outside_limits(addr, end, floor, ceiling, PMD_MASK))
 		return;
 
 	pmd_clear(pmd);
@@ -395,15 +393,7 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 				  addr, next, floor, ceiling);
 	} while (addr = next, addr != end);
 
-	start &= PUD_MASK;
-	if (start < floor)
-		return;
-	if (ceiling) {
-		ceiling &= PUD_MASK;
-		if (!ceiling)
-			return;
-	}
-	if (end - 1 > ceiling - 1)
+	if (range_is_outside_limits(start, end, floor, ceiling, PUD_MASK))
 		return;
 
 	pmd = pmd_offset(pud, start);
@@ -446,15 +436,7 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
 		}
 	} while (addr = next, addr != end);
 
-	start &= PGDIR_MASK;
-	if (start < floor)
-		return;
-	if (ceiling) {
-		ceiling &= PGDIR_MASK;
-		if (!ceiling)
-			return;
-	}
-	if (end - 1 > ceiling - 1)
+	if (range_is_outside_limits(start, end, floor, ceiling, PGDIR_MASK))
 		return;
 
 	pud = pud_offset(p4d, start);
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH 11/11 v3] ftrace: Add recording of functions that caused recursion
From: Steven Rostedt @ 2020-11-06 13:41 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201106131317.GW20201@alley>

On Fri, 6 Nov 2020 14:13:17 +0100
Petr Mladek <pmladek@suse.com> wrote:

> JFYI, the code reading and writing the cache looks good to me.
> 
> It is still possible that some entries might stay unused (filled
> with zeroes) but it should be hard to hit in practice. It
> is good enough from my POV.

You mean the part that was commented?

> 
> I do not give Reviewed-by tag just because I somehow do not have power
> to review the entire patch carefully enough at the moment.

No problem. Thanks for looking at it.

I'm adding a link to this thread, so if someone wants proof you helped out
on this code, you can have them follow the links ;-)

Anyway, even if I push this to linux-next where I stop rebasing code
(because of test coverage), I do rebase for adding tags. So if you ever get
around at looking at this code, I can add that tag later (before the next
merge window), or if you find something, I could fix it with a new patch and
give you a Reported-by.

-- Steve

^ permalink raw reply

* Re: [PATCH 11/11 v3] ftrace: Add recording of functions that caused recursion
From: Petr Mladek @ 2020-11-06 14:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201106084131.7dfc3a30@gandalf.local.home>

On Fri 2020-11-06 08:41:31, Steven Rostedt wrote:
> On Fri, 6 Nov 2020 14:13:17 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > JFYI, the code reading and writing the cache looks good to me.
> > 
> > It is still possible that some entries might stay unused (filled
> > with zeroes) but it should be hard to hit in practice. It
> > is good enough from my POV.
> 
> You mean the part that was commented?

Yeah, it is the comment problem when nr_records is pushed forward.

> > 
> > I do not give Reviewed-by tag just because I somehow do not have power
> > to review the entire patch carefully enough at the moment.
> 
> No problem. Thanks for looking at it.
> 
> I'm adding a link to this thread, so if someone wants proof you helped out
> on this code, you can have them follow the links ;-)
> 
> Anyway, even if I push this to linux-next where I stop rebasing code
> (because of test coverage), I do rebase for adding tags. So if you ever get
> around at looking at this code, I can add that tag later (before the next
> merge window), or if you find something, I could fix it with a new patch and
> give you a Reported-by.

Good to know.

Best Regards,
Petr

^ permalink raw reply

* Re: [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
From: Andrew Donnellan @ 2020-11-06 14:59 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, brking, mmc, cforno12, drt
In-Reply-To: <87r1pfkj7z.fsf@linux.ibm.com>

On 30/10/20 11:10 pm, Nathan Lynch wrote:
>> And there's a zero chance that drmgr will ever be fixed on LE?
> 
> It's always used the sysfs interface on LE, and the only way to provoke
> it to attempt the syscalls is by doing something like this before
> running the migration:
> 
> # echo 0 > /tmp/fake_api_version
> # mount -o bind,ro /tmp/fake_api_version /sys/kernel/mobility/api_version
> 
> So I'm not aware of any circumstance that would actually motivate
> someone to make it work on LE. I'd say it's more likely that drmgr will
> drop its support for using the syscall altogether.
> 

Okay. librtas should handle the error fine, so I don't have any huge 
objection to this, though perhaps a documentation patch to librtas to 
mention that these calls are BE-only would be in order.

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

^ permalink raw reply

* [RFC PATCH 0/9] powerpc/64s: fast interrupt exit
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This series attempts to improve the speed of interrupts and system calls
in two major ways.

Firstly, the SRR/HSRR registers do not need to be reloaded if they were
not used or clobbered fur the duration of the interrupt.

Secondly, an alternate return location facility is added for soft-masked
asynchronous interrupts and then that's used to set everything up for
return without having to disable MSR RI or EE.

After this series, the entire system call / interrupt handler fast path
executes no mtsprs and one mtmsrd to enable interrupts initially, and
the system call vectored path doesn't even need to do that.

Thanks,
Nick

Nicholas Piggin (9):
  powerpc/64s: syscall real mode entry use mtmsrd rather than rfid
  powerpc/64s: system call avoid setting MSR[RI] until we set MSR[EE]
  powerpc/64s: introduce different functions to return from SRR vs HSRR
    interrupts
  powerpc/64s: avoid reloading (H)SRR registers if they are still valid
  powerpc/64: move interrupt return asm to interrupt_64.S
  powerpc/64s: save one more register in the masked interrupt handler
  powerpc/64s: allow alternate return locations for soft-masked
    interrupts
  powerpc/64s: interrupt soft-enable race fix
  powerpc/64s: use interrupt restart table to speed up return from
    interrupt

 arch/powerpc/Kconfig.debug                 |   5 +
 arch/powerpc/include/asm/asm-prototypes.h  |   4 +-
 arch/powerpc/include/asm/head-64.h         |   2 +-
 arch/powerpc/include/asm/interrupt.h       |  18 +
 arch/powerpc/include/asm/paca.h            |   3 +
 arch/powerpc/include/asm/ppc_asm.h         |   8 +
 arch/powerpc/include/asm/ptrace.h          |  28 +-
 arch/powerpc/kernel/asm-offsets.c          |   5 +
 arch/powerpc/kernel/entry_64.S             | 508 ---------------
 arch/powerpc/kernel/exceptions-64s.S       | 180 ++++--
 arch/powerpc/kernel/fpu.S                  |   2 +
 arch/powerpc/kernel/head_64.S              |   5 +-
 arch/powerpc/kernel/interrupt_64.S         | 720 +++++++++++++++++++++
 arch/powerpc/kernel/irq.c                  |  79 ++-
 arch/powerpc/kernel/kgdb.c                 |   2 +-
 arch/powerpc/kernel/kprobes-ftrace.c       |   2 +-
 arch/powerpc/kernel/kprobes.c              |  10 +-
 arch/powerpc/kernel/process.c              |  21 +-
 arch/powerpc/kernel/rtas.c                 |  13 +-
 arch/powerpc/kernel/signal.c               |   2 +-
 arch/powerpc/kernel/signal_64.c            |  14 +
 arch/powerpc/kernel/syscall_64.c           | 242 ++++---
 arch/powerpc/kernel/syscalls.c             |   2 +
 arch/powerpc/kernel/traps.c                |  18 +-
 arch/powerpc/kernel/vector.S               |   6 +-
 arch/powerpc/kernel/vmlinux.lds.S          |  10 +
 arch/powerpc/lib/Makefile                  |   2 +-
 arch/powerpc/lib/restart_table.c           |  26 +
 arch/powerpc/lib/sstep.c                   |   5 +-
 arch/powerpc/math-emu/math.c               |   2 +-
 arch/powerpc/mm/fault.c                    |   2 +-
 arch/powerpc/perf/core-book3s.c            |  19 +-
 arch/powerpc/platforms/powernv/opal-call.c |   3 +
 arch/powerpc/sysdev/fsl_pci.c              |   2 +-
 34 files changed, 1244 insertions(+), 726 deletions(-)
 create mode 100644 arch/powerpc/kernel/interrupt_64.S
 create mode 100644 arch/powerpc/lib/restart_table.c

-- 
2.23.0


^ permalink raw reply

* [RFC PATCH 1/9] powerpc/64s: syscall real mode entry use mtmsrd rather than rfid
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

Have the real mode system call entry handler branch to the kernel
0xc000... address and then use mtmsrd to enable the MMU, rather than use
SRRs and rfid.

Commit 8729c26e675c ("powerpc/64s/exception: Move real to virt switch
into the common handler") implemented this style of real mode entry for
other interrupt handlers, so this brings system calls into line with
them, which is the main motivcation for the change.

This tends to be slightly faster due to avoiding the mtsprs, and it also
does not clobber the SRR registers, which becomes important in a
subsequent change. The real mode entry points don't tend to be too
important for performance these days, but it is possible for a
hypervisor to run guests in AIL=0 mode for certian reasons.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S       | 6 ++++++
 arch/powerpc/kernel/exceptions-64s.S | 9 +++------
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 479fb58844fa..bd8cc7a214d3 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -225,6 +225,12 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
 	b	system_call_vectored_common
 #endif
 
+	.balign IFETCH_ALIGN_BYTES
+	.globl system_call_common_real
+system_call_common_real:
+	ld	r10,PACAKMSR(r13)	/* get MSR value for kernel */
+	mtmsrd	r10
+
 	.balign IFETCH_ALIGN_BYTES
 	.globl system_call_common
 system_call_common:
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 1db6b3438c88..ea7bb7cc0db1 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1901,12 +1901,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	HMT_MEDIUM
 
 	.if ! \virt
-	__LOAD_HANDLER(r10, system_call_common)
-	mtspr	SPRN_SRR0,r10
-	ld	r10,PACAKMSR(r13)
-	mtspr	SPRN_SRR1,r10
-	RFI_TO_KERNEL
-	b	.	/* prevent speculative execution */
+	__LOAD_HANDLER(r10, system_call_common_real)
+	mtctr	r10
+	bctr
 	.else
 	li	r10,MSR_RI
 	mtmsrd 	r10,1			/* Set RI (EE=0) */
-- 
2.23.0


^ permalink raw reply related

* [RFC PATCH 2/9] powerpc/64s: system call avoid setting MSR[RI] until we set MSR[EE]
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

This extends the MSR[RI]=0 window a little further into the system
call in order to pair RI and EE enabling with a single mtmsrd.

XXX: May need to make this radix-only opt. - might take SLB or HPT faults
on the context tracking, etc.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ea7bb7cc0db1..ad9f51e49806 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1905,8 +1905,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	mtctr	r10
 	bctr
 	.else
-	li	r10,MSR_RI
-	mtmsrd 	r10,1			/* Set RI (EE=0) */
 #ifdef CONFIG_RELOCATABLE
 	__LOAD_HANDLER(r10, system_call_common)
 	mtctr	r10
-- 
2.23.0


^ permalink raw reply related

* [RFC PATCH 3/9] powerpc/64s: introduce different functions to return from SRR vs HSRR interrupts
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

This makes no real difference yet except that HSRR type interrupts will
use hrfid to return. This is important for the next patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S       | 58 +++++++++++++++------
 arch/powerpc/kernel/exceptions-64s.S | 78 +++++++++++++++-------------
 arch/powerpc/kernel/vector.S         |  2 +-
 3 files changed, 85 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index bd8cc7a214d3..53027fc9cd31 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -648,47 +648,54 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	blr
 
 #ifdef CONFIG_PPC_BOOK3S
+
 	/*
 	 * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
 	 * touched, no exit work created, then this can be used.
 	 */
 	.balign IFETCH_ALIGN_BYTES
-	.globl fast_interrupt_return
-fast_interrupt_return:
-_ASM_NOKPROBE_SYMBOL(fast_interrupt_return)
+	.globl fast_interrupt_return_srr
+fast_interrupt_return_srr:
+_ASM_NOKPROBE_SYMBOL(fast_interrupt_return_srr)
 	kuap_check_amr r3, r4
 	ld	r5,_MSR(r1)
 	andi.	r0,r5,MSR_PR
-	bne	.Lfast_user_interrupt_return
+	bne	.Lfast_user_interrupt_return_srr
 	kuap_restore_amr r3, r4
 	andi.	r0,r5,MSR_RI
 	li	r3,0 /* 0 return value, no EMULATE_STACK_STORE */
-	bne+	.Lfast_kernel_interrupt_return
+	bne+	.Lfast_kernel_interrupt_return_srr
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	unrecoverable_exception
 	b	. /* should not get here */
 
+.macro interrupt_return_macro srr
 	.balign IFETCH_ALIGN_BYTES
-	.globl interrupt_return
-interrupt_return:
-_ASM_NOKPROBE_SYMBOL(interrupt_return)
+	.globl interrupt_return_\srr
+interrupt_return_\srr\():
+_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\())
 	ld	r4,_MSR(r1)
 	andi.	r0,r4,MSR_PR
-	beq	.Lkernel_interrupt_return
+	beq	.Lkernel_interrupt_return_\srr
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	interrupt_exit_user_prepare
 	cmpdi	r3,0
-	bne-	.Lrestore_nvgprs
+	bne-	.Lrestore_nvgprs_\srr
 
-.Lfast_user_interrupt_return:
+.Lfast_user_interrupt_return_\srr:
 	ld	r11,_NIP(r1)
 	ld	r12,_MSR(r1)
 BEGIN_FTR_SECTION
 	ld	r10,_PPR(r1)
 	mtspr	SPRN_PPR,r10
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+	.ifc \srr,srr
 	mtspr	SPRN_SRR0,r11
 	mtspr	SPRN_SRR1,r12
+	.else
+	mtspr	SPRN_HSRR0,r11
+	mtspr	SPRN_HSRR1,r12
+	.endif
 
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1		/* to clear the reservation */
@@ -715,24 +722,33 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 	REST_GPR(6, r1)
 	REST_GPR(0, r1)
 	REST_GPR(1, r1)
+	.ifc \srr,srr
 	RFI_TO_USER
+	.else
+	HRFI_TO_USER
+	.endif
 	b	.	/* prevent speculative execution */
 
-.Lrestore_nvgprs:
+.Lrestore_nvgprs_\srr\():
 	REST_NVGPRS(r1)
-	b	.Lfast_user_interrupt_return
+	b	.Lfast_user_interrupt_return_\srr
 
 	.balign IFETCH_ALIGN_BYTES
-.Lkernel_interrupt_return:
+.Lkernel_interrupt_return_\srr\():
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	interrupt_exit_kernel_prepare
 
-.Lfast_kernel_interrupt_return:
+.Lfast_kernel_interrupt_return_\srr\():
 	cmpdi	cr1,r3,0
 	ld	r11,_NIP(r1)
 	ld	r12,_MSR(r1)
+	.ifc \srr,srr
 	mtspr	SPRN_SRR0,r11
 	mtspr	SPRN_SRR1,r12
+	.else
+	mtspr	SPRN_HSRR0,r11
+	mtspr	SPRN_HSRR1,r12
+	.endif
 
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1		/* to clear the reservation */
@@ -766,7 +782,11 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 	REST_GPR(6, r1)
 	REST_GPR(0, r1)
 	REST_GPR(1, r1)
+	.ifc \srr,srr
 	RFI_TO_KERNEL
+	.else
+	HRFI_TO_KERNEL
+	.endif
 	b	.	/* prevent speculative execution */
 
 1:	/*
@@ -786,8 +806,16 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 	std	r9,0(r1) /* perform store component of stdu */
 	ld	r9,PACA_EXGEN+0(r13)
 
+	.ifc \srr,srr
 	RFI_TO_KERNEL
+	.else
+	HRFI_TO_KERNEL
+	.endif
 	b	.	/* prevent speculative execution */
+.endm
+
+interrupt_return_macro srr
+interrupt_return_macro hsrr
 #endif /* CONFIG_PPC_BOOK3S */
 
 #ifdef CONFIG_PPC_RTAS
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ad9f51e49806..1f725a3ac2f3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1236,7 +1236,7 @@ EXC_COMMON_BEGIN(machine_check_common)
 	mtmsrd 	r10,1
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	machine_check_exception
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM machine_check
 
@@ -1360,10 +1360,10 @@ MMU_FTR_SECTION_ELSE
 	bl	do_page_fault
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
         cmpdi	r3,0
-	beq+	interrupt_return
+	beq+	interrupt_return_srr
 	/* We need to restore NVGPRS */
 	REST_NVGPRS(r1)
-	b       interrupt_return
+	b       interrupt_return_srr
 
 	GEN_KVM data_access
 
@@ -1410,7 +1410,7 @@ BEGIN_MMU_FTR_SECTION
 	bl	do_slb_fault
 	cmpdi	r3,0
 	bne-	1f
-	b	fast_interrupt_return
+	b	fast_interrupt_return_srr
 1:	/* Error case */
 MMU_FTR_SECTION_ELSE
 	/* Radix case, access is outside page table range */
@@ -1419,7 +1419,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	std	r3,RESULT(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_bad_slb_fault
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM data_access_slb
 
@@ -1458,10 +1458,10 @@ MMU_FTR_SECTION_ELSE
 	bl	do_page_fault
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
         cmpdi	r3,0
-	beq+	interrupt_return
+	beq+	interrupt_return_srr
 	/* We need to restore NVGPRS */
 	REST_NVGPRS(r1)
-	b       interrupt_return
+	b       interrupt_return_srr
 
 	GEN_KVM instruction_access
 
@@ -1499,7 +1499,7 @@ BEGIN_MMU_FTR_SECTION
 	bl	do_slb_fault
 	cmpdi	r3,0
 	bne-	1f
-	b	fast_interrupt_return
+	b	fast_interrupt_return_srr
 1:	/* Error case */
 MMU_FTR_SECTION_ELSE
 	/* Radix case, access is outside page table range */
@@ -1508,7 +1508,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	std	r3,RESULT(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_bad_slb_fault
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM instruction_access_slb
 
@@ -1554,7 +1554,11 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
 	GEN_COMMON hardware_interrupt
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_IRQ
-	b	interrupt_return
+	BEGIN_FTR_SECTION
+	b	interrupt_return_hsrr
+	FTR_SECTION_ELSE
+	b	interrupt_return_srr
+	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 
 	GEN_KVM hardware_interrupt
 
@@ -1583,7 +1587,7 @@ EXC_COMMON_BEGIN(alignment_common)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	alignment_exception
 	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM alignment
 
@@ -1647,7 +1651,7 @@ EXC_COMMON_BEGIN(program_check_common)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	program_check_exception
 	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM program_check
 
@@ -1692,12 +1696,12 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_TM)
 #endif
 	bl	load_up_fpu
-	b	fast_interrupt_return
+	b	fast_interrupt_return_srr
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 2:	/* User process was in a transaction */
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	fp_unavailable_tm
-	b	interrupt_return
+	b	interrupt_return_srr
 #endif
 
 	GEN_KVM fp_unavailable
@@ -1738,7 +1742,7 @@ EXC_COMMON_BEGIN(decrementer_common)
 	GEN_COMMON decrementer
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	timer_interrupt
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM decrementer
 
@@ -1826,7 +1830,7 @@ EXC_COMMON_BEGIN(doorbell_super_common)
 #else
 	bl	unknown_async_exception
 #endif
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM doorbell_super
 
@@ -1995,7 +1999,7 @@ EXC_COMMON_BEGIN(single_step_common)
 	GEN_COMMON single_step
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	single_step_exception
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM single_step
 
@@ -2036,7 +2040,7 @@ BEGIN_MMU_FTR_SECTION
 MMU_FTR_SECTION_ELSE
 	bl      unknown_exception
 ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
-	b       interrupt_return
+	b       interrupt_return_hsrr
 
 	GEN_KVM h_data_storage
 
@@ -2063,7 +2067,7 @@ EXC_COMMON_BEGIN(h_instr_storage_common)
 	GEN_COMMON h_instr_storage
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	unknown_exception
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM h_instr_storage
 
@@ -2089,7 +2093,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	emulation_assist_interrupt
 	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM emulation_assist
 
@@ -2170,7 +2174,7 @@ EXC_COMMON_BEGIN(hmi_exception_common)
 	GEN_COMMON hmi_exception
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	handle_hmi_exception
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM hmi_exception
 
@@ -2202,7 +2206,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 #else
 	bl	unknown_async_exception
 #endif
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM h_doorbell
 
@@ -2230,7 +2234,7 @@ EXC_COMMON_BEGIN(h_virt_irq_common)
 	GEN_COMMON h_virt_irq
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_IRQ
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM h_virt_irq
 
@@ -2275,7 +2279,7 @@ EXC_COMMON_BEGIN(performance_monitor_common)
 	GEN_COMMON performance_monitor
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	performance_monitor_exception
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM performance_monitor
 
@@ -2314,19 +2318,19 @@ BEGIN_FTR_SECTION
   END_FTR_SECTION_NESTED(CPU_FTR_TM, CPU_FTR_TM, 69)
 #endif
 	bl	load_up_altivec
-	b	fast_interrupt_return
+	b	fast_interrupt_return_srr
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 2:	/* User process was in a transaction */
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	altivec_unavailable_tm
-	b	interrupt_return
+	b	interrupt_return_srr
 #endif
 1:
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	altivec_unavailable_exception
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM altivec_unavailable
 
@@ -2369,14 +2373,14 @@ BEGIN_FTR_SECTION
 2:	/* User process was in a transaction */
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	vsx_unavailable_tm
-	b	interrupt_return
+	b	interrupt_return_srr
 #endif
 1:
 END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 #endif
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	vsx_unavailable_exception
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM vsx_unavailable
 
@@ -2406,7 +2410,7 @@ EXC_COMMON_BEGIN(facility_unavailable_common)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	facility_unavailable_exception
 	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM facility_unavailable
 
@@ -2436,7 +2440,7 @@ EXC_COMMON_BEGIN(h_facility_unavailable_common)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	facility_unavailable_exception
 	REST_NVGPRS(r1) /* XXX Shouldn't be necessary in practice */
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM h_facility_unavailable
 
@@ -2469,7 +2473,7 @@ EXC_COMMON_BEGIN(cbe_system_error_common)
 	GEN_COMMON cbe_system_error
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_system_error_exception
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM cbe_system_error
 
@@ -2497,7 +2501,7 @@ EXC_COMMON_BEGIN(instruction_breakpoint_common)
 	GEN_COMMON instruction_breakpoint
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	instruction_breakpoint_exception
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM instruction_breakpoint
 
@@ -2619,7 +2623,7 @@ EXC_COMMON_BEGIN(denorm_exception_common)
 	GEN_COMMON denorm_exception
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	unknown_exception
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM denorm_exception
 
@@ -2640,7 +2644,7 @@ EXC_COMMON_BEGIN(cbe_maintenance_common)
 	GEN_COMMON cbe_maintenance
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_maintenance_exception
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM cbe_maintenance
 
@@ -2672,7 +2676,7 @@ EXC_COMMON_BEGIN(altivec_assist_common)
 #else
 	bl	unknown_exception
 #endif
-	b	interrupt_return
+	b	interrupt_return_srr
 
 	GEN_KVM altivec_assist
 
@@ -2693,7 +2697,7 @@ EXC_COMMON_BEGIN(cbe_thermal_common)
 	GEN_COMMON cbe_thermal
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_thermal_exception
-	b	interrupt_return
+	b	interrupt_return_hsrr
 
 	GEN_KVM cbe_thermal
 
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 801dc28fdcca..2c948e7b0d00 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -133,7 +133,7 @@ _GLOBAL(load_up_vsx)
 	/* enable use of VSX after return */
 	oris	r12,r12,MSR_VSX@h
 	std	r12,_MSR(r1)
-	b	fast_interrupt_return
+	b	fast_interrupt_return_srr
 
 #endif /* CONFIG_VSX */
 
-- 
2.23.0


^ permalink raw reply related

* [RFC PATCH 4/9] powerpc/64s: avoid reloading (H)SRR registers if they are still valid
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

When an interrupt is taken, the SRR registers are set to return to
where it left off. Unless they are modified in the meantime, or the
return address or MSR are modified, there is no need to reload these
registers when returning from interrupt.

Introduce per-CPU flags that track the validity of SRR and HSRR
registers, clear them when returning from interrupt, using the registers
for something else (e.g., OPAL calls), or adjusting return address or MSR.

This improves the performance of interrupt returns.

XXX: 64e build breaks

XXX: may not need to invalidate both hsrr and srr all the time

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/Kconfig.debug                 |  5 ++
 arch/powerpc/include/asm/paca.h            |  2 +
 arch/powerpc/include/asm/ptrace.h          | 15 +++++
 arch/powerpc/kernel/asm-offsets.c          |  2 +
 arch/powerpc/kernel/entry_64.S             | 75 +++++++++++++++++++---
 arch/powerpc/kernel/exceptions-64s.S       | 27 ++++++++
 arch/powerpc/kernel/fpu.S                  |  2 +
 arch/powerpc/kernel/kgdb.c                 |  2 +-
 arch/powerpc/kernel/kprobes-ftrace.c       |  2 +-
 arch/powerpc/kernel/kprobes.c              | 10 +--
 arch/powerpc/kernel/process.c              | 21 +++++-
 arch/powerpc/kernel/rtas.c                 | 13 +++-
 arch/powerpc/kernel/signal.c               |  2 +-
 arch/powerpc/kernel/signal_64.c            | 14 ++++
 arch/powerpc/kernel/syscalls.c             |  2 +
 arch/powerpc/kernel/traps.c                | 18 +++---
 arch/powerpc/kernel/vector.S               |  4 ++
 arch/powerpc/lib/sstep.c                   |  5 +-
 arch/powerpc/math-emu/math.c               |  2 +-
 arch/powerpc/mm/fault.c                    |  2 +-
 arch/powerpc/platforms/powernv/opal-call.c |  3 +
 arch/powerpc/sysdev/fsl_pci.c              |  2 +-
 22 files changed, 197 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index b88900f4832f..ad1f5bf6ab3d 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -84,6 +84,11 @@ config MSI_BITMAP_SELFTEST
 
 config PPC_IRQ_SOFT_MASK_DEBUG
 	bool "Include extra checks for powerpc irq soft masking"
+	depends on PPC64
+
+config PPC_RFI_SRR_DEBUG
+	bool "Include extra checks for RFI SRR register validity"
+	depends on PPC_BOOK3S_64
 
 config XMON
 	bool "Include xmon kernel debugger"
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9454d29ff4b4..58e9995c3184 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -170,6 +170,8 @@ struct paca_struct {
 #ifdef CONFIG_PPC_BOOK3E
 	u16 trap_save;			/* Used when bad stack is encountered */
 #endif
+	u8 hsrr_valid;			/* HSRRs set for HRFID */
+	u8 srr_valid;			/* SRRs set for RFID */
 	u8 irq_soft_mask;		/* mask for irq soft masking */
 	u8 irq_happened;		/* irq happened while soft-disabled */
 	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index e2c778c176a3..2c3e773ce292 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -110,6 +110,7 @@ struct pt_regs
 #endif /* __powerpc64__ */
 
 #ifndef __ASSEMBLY__
+#include <asm/paca.h>
 
 static inline unsigned long instruction_pointer(struct pt_regs *regs)
 {
@@ -160,6 +161,20 @@ static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
 	regs->gpr[3] = rc;
 }
 
+static inline void regs_set_return_ip(struct pt_regs *regs, unsigned long ip)
+{
+	regs->nip = ip;
+#ifdef CONFIG_PPC_BOOK3S
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
+#endif
+}
+
+static inline void regs_add_return_ip(struct pt_regs *regs, long offset)
+{
+	regs_set_return_ip(regs, regs->nip + offset);
+}
+
 #ifdef __powerpc64__
 #define user_mode(regs) ((((regs)->msr) >> MSR_PR_LG) & 0x1)
 #else
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index c2722ff36e98..ea13e35dd511 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -210,6 +210,8 @@ int main(void)
 	OFFSET(PACATOC, paca_struct, kernel_toc);
 	OFFSET(PACAKBASE, paca_struct, kernelbase);
 	OFFSET(PACAKMSR, paca_struct, kernel_msr);
+	OFFSET(PACAHSRR_VALID, paca_struct, hsrr_valid);
+	OFFSET(PACASRR_VALID, paca_struct, srr_valid);
 	OFFSET(PACAIRQSOFTMASK, paca_struct, irq_soft_mask);
 	OFFSET(PACAIRQHAPPENED, paca_struct, irq_happened);
 	OFFSET(PACA_FTRACE_ENABLED, paca_struct, ftrace_enabled);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 53027fc9cd31..6236a88a592f 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -65,6 +65,30 @@ exception_marker:
 	.align 7
 
 #ifdef CONFIG_PPC_BOOK3S
+.macro DEBUG_SRR_VALID srr
+#ifdef CONFIG_PPC_RFI_SRR_DEBUG
+	.ifc \srr,srr
+	mfspr	r11,SPRN_SRR0
+	ld	r12,_NIP(r1)
+100:	tdne	r11,r12
+	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
+	mfspr	r11,SPRN_SRR1
+	ld	r12,_MSR(r1)
+100:	tdne	r11,r12
+	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
+	.else
+	mfspr	r11,SPRN_HSRR0
+	ld	r12,_NIP(r1)
+100:	tdne	r11,r12
+	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
+	mfspr	r11,SPRN_HSRR1
+	ld	r12,_MSR(r1)
+100:	tdne	r11,r12
+	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
+	.endif
+#endif
+.endm
+
 .macro system_call_vectored name trapnr
 	.globl system_call_vectored_\name
 system_call_vectored_\name:
@@ -289,6 +313,9 @@ END_BTB_FLUSH_SECTION
 	ld	r11,exception_marker@toc(r2)
 	std	r11,-16(r10)		/* "regshere" marker */
 
+	li	r11,1
+	stb	r11,PACASRR_VALID(r13)
+
 	/*
 	 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
 	 * would clobber syscall parameters. Also we always enter with IRQs
@@ -310,18 +337,25 @@ END_BTB_FLUSH_SECTION
 	bl	syscall_exit_prepare
 
 	ld	r2,_CCR(r1)
+	ld	r6,_LINK(r1)
+	mtlr	r6
+
+	lbz	r4,PACASRR_VALID(r13)
+	cmpdi	r4,0
+	bne	1f
+	li	r4,0
+	stb	r4,PACASRR_VALID(r13)
 	ld	r4,_NIP(r1)
 	ld	r5,_MSR(r1)
-	ld	r6,_LINK(r1)
+	mtspr	SPRN_SRR0,r4
+	mtspr	SPRN_SRR1,r5
+1:
+	DEBUG_SRR_VALID srr
 
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1			/* to clear the reservation */
 END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
-	mtspr	SPRN_SRR0,r4
-	mtspr	SPRN_SRR1,r5
-	mtlr	r6
-
 	cmpdi	r3,0
 	bne	.Lsyscall_restore_regs
 	/* Zero volatile regs that may contain sensitive kernel data */
@@ -648,7 +682,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	blr
 
 #ifdef CONFIG_PPC_BOOK3S
-
 	/*
 	 * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
 	 * touched, no exit work created, then this can be used.
@@ -683,19 +716,32 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\())
 	bne-	.Lrestore_nvgprs_\srr
 
 .Lfast_user_interrupt_return_\srr:
-	ld	r11,_NIP(r1)
-	ld	r12,_MSR(r1)
 BEGIN_FTR_SECTION
 	ld	r10,_PPR(r1)
 	mtspr	SPRN_PPR,r10
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+	.ifc \srr,srr
+	lbz	r4,PACASRR_VALID(r13)
+	.else
+	lbz	r4,PACAHSRR_VALID(r13)
+	.endif
+	cmpdi	r4,0
+	li	r4,0
+	bne	1f
+	ld	r11,_NIP(r1)
+	ld	r12,_MSR(r1)
 	.ifc \srr,srr
 	mtspr	SPRN_SRR0,r11
 	mtspr	SPRN_SRR1,r12
+1:
+	stb	r4,PACASRR_VALID(r13)
 	.else
 	mtspr	SPRN_HSRR0,r11
 	mtspr	SPRN_HSRR1,r12
+1:
+	stb	r4,PACAHSRR_VALID(r13)
 	.endif
+	DEBUG_SRR_VALID \srr
 
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1		/* to clear the reservation */
@@ -740,15 +786,28 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
 .Lfast_kernel_interrupt_return_\srr\():
 	cmpdi	cr1,r3,0
+	.ifc \srr,srr
+	lbz	r4,PACASRR_VALID(r13)
+	.else
+	lbz	r4,PACAHSRR_VALID(r13)
+	.endif
+	cmpdi	r4,0
+	li	r4,0
+	bne	1f
 	ld	r11,_NIP(r1)
 	ld	r12,_MSR(r1)
 	.ifc \srr,srr
 	mtspr	SPRN_SRR0,r11
 	mtspr	SPRN_SRR1,r12
+1:
+	stb	r4,PACASRR_VALID(r13)
 	.else
 	mtspr	SPRN_HSRR0,r11
 	mtspr	SPRN_HSRR1,r12
+1:
+	stb	r4,PACAHSRR_VALID(r13)
 	.endif
+	DEBUG_SRR_VALID \srr
 
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1		/* to clear the reservation */
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 1f725a3ac2f3..b370a5f334fc 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -567,6 +567,20 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 	std	r0,GPR0(r1)		/* save r0 in stackframe	*/
 	std	r10,GPR1(r1)		/* save r1 in stackframe	*/
 
+	/* Mark our [H]SRRs valid for return */
+	li	r10,1
+	.if IHSRR_IF_HVMODE
+	BEGIN_FTR_SECTION
+	stb	r10,PACAHSRR_VALID(r13)
+	FTR_SECTION_ELSE
+	stb	r10,PACASRR_VALID(r13)
+	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
+	.elseif IHSRR
+	stb	r10,PACAHSRR_VALID(r13)
+	.else
+	stb	r10,PACASRR_VALID(r13)
+	.endif
+
 	.if ISET_RI
 	li	r10,MSR_RI
 	mtmsrd	r10,1			/* Set MSR_RI */
@@ -666,10 +680,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 .macro EXCEPTION_RESTORE_REGS hsrr=0
 	/* Move original SRR0 and SRR1 into the respective regs */
 	ld	r9,_MSR(r1)
+	li	r10,0
 	.if \hsrr
 	mtspr	SPRN_HSRR1,r9
+	stb	r10,PACAHSRR_VALID(r13)
 	.else
 	mtspr	SPRN_SRR1,r9
+	stb	r10,PACASRR_VALID(r13)
 	.endif
 	ld	r9,_NIP(r1)
 	.if \hsrr
@@ -1781,6 +1798,8 @@ EXC_COMMON_BEGIN(hdecrementer_common)
 	 *
 	 * Be careful to avoid touching the kernel stack.
 	 */
+	li	r10,0
+	stb	r10,PACAHSRR_VALID(r13)
 	ld	r10,PACA_EXGEN+EX_CTR(r13)
 	mtctr	r10
 	mtcrf	0x80,r9
@@ -2611,6 +2630,8 @@ BEGIN_FTR_SECTION
 	ld	r10,PACA_EXGEN+EX_CFAR(r13)
 	mtspr	SPRN_CFAR,r10
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
+	li	r10,0
+	stb	r10,PACAHSRR_VALID(r13)
 	ld	r10,PACA_EXGEN+EX_R10(r13)
 	ld	r11,PACA_EXGEN+EX_R11(r13)
 	ld	r12,PACA_EXGEN+EX_R12(r13)
@@ -2783,6 +2804,12 @@ masked_interrupt:
 	ori	r11,r11,PACA_IRQ_HARD_DIS
 	stb	r11,PACAIRQHAPPENED(r13)
 2:	/* done */
+	li	r10,0
+	.if \hsrr
+	stb	r10,PACAHSRR_VALID(r13)
+	.else
+	stb	r10,PACASRR_VALID(r13)
+	.endif
 	ld	r10,PACA_EXGEN+EX_CTR(r13)
 	mtctr	r10
 	mtcrf	0x80,r9
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 3ff9a8fafa46..ae80b4a3da48 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -105,6 +105,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	ori	r12,r12,MSR_FP
 	or	r12,r12,r4
 	std	r12,_MSR(r1)
+	li	r4,0
+	stb	r4,PACASRR_VALID(r13)
 #endif
 	li	r4,1
 	stb	r4,THREAD_LOAD_FP(r5)
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 409080208a6c..dcac6c74a93c 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -147,7 +147,7 @@ static int kgdb_handle_breakpoint(struct pt_regs *regs)
 		return 0;
 
 	if (*(u32 *)regs->nip == BREAK_INSTR)
-		regs->nip += BREAK_INSTR_SIZE;
+		regs_add_return_ip(regs, BREAK_INSTR_SIZE);
 
 	return 1;
 }
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 972cb28174b2..dc35d4f5564f 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -40,7 +40,7 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 			 * Emulate singlestep (and also recover regs->nip)
 			 * as if there is a nop
 			 */
-			regs->nip += MCOUNT_INSN_SIZE;
+			regs_add_return_ip(regs, MCOUNT_INSN_SIZE);
 			if (unlikely(p->post_handler)) {
 				kcb->kprobe_status = KPROBE_HIT_SSDONE;
 				p->post_handler(p, regs, 0);
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 01ab2163659e..8165ed71ab51 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -178,7 +178,7 @@ static nokprobe_inline void prepare_singlestep(struct kprobe *p, struct pt_regs
 	 * variant as values in regs could play a part in
 	 * if the trap is taken or not
 	 */
-	regs->nip = (unsigned long)p->ainsn.insn;
+	regs_set_return_ip(regs, (unsigned long)p->ainsn.insn);
 }
 
 static nokprobe_inline void save_previous_kprobe(struct kprobe_ctlblk *kcb)
@@ -415,7 +415,7 @@ static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 	 * we end up emulating it in kprobe_handler(), which increments the nip
 	 * again.
 	 */
-	regs->nip = orig_ret_address - 4;
+	regs_set_return_ip(regs, orig_ret_address - 4);
 	regs->link = orig_ret_address;
 
 	return 0;
@@ -450,7 +450,7 @@ int kprobe_post_handler(struct pt_regs *regs)
 	}
 
 	/* Adjust nip to after the single-stepped instruction */
-	regs->nip = (unsigned long)cur->addr + len;
+	regs_set_return_ip(regs, (unsigned long)cur->addr + len);
 	regs->msr |= kcb->kprobe_saved_msr;
 
 	/*Restore back the original saved kprobes variables and continue. */
@@ -490,7 +490,7 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 		 * and allow the page fault handler to continue as a
 		 * normal page fault.
 		 */
-		regs->nip = (unsigned long)cur->addr;
+		regs_set_return_ip(regs, (unsigned long)cur->addr);
 		regs->msr &= ~MSR_SINGLESTEP; /* Turn off 'trace' bits */
 		regs->msr |= kcb->kprobe_saved_msr;
 		if (kcb->kprobe_status == KPROBE_REENTER)
@@ -523,7 +523,7 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 		 * zero, try to fix up.
 		 */
 		if ((entry = search_exception_tables(regs->nip)) != NULL) {
-			regs->nip = extable_fixup(entry);
+			regs_set_return_ip(regs, extable_fixup(entry));
 			return 1;
 		}
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 0bdd3ed653df..ea36a29c8b01 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -97,6 +97,8 @@ static void check_if_tm_restore_required(struct task_struct *tsk)
 	    !test_thread_flag(TIF_RESTORE_TM)) {
 		tsk->thread.ckpt_regs.msr = tsk->thread.regs->msr;
 		set_thread_flag(TIF_RESTORE_TM);
+		local_paca->hsrr_valid = 0;
+		local_paca->srr_valid = 0;
 	}
 }
 
@@ -161,6 +163,8 @@ static void __giveup_fpu(struct task_struct *tsk)
 	if (cpu_has_feature(CPU_FTR_VSX))
 		msr &= ~MSR_VSX;
 	tsk->thread.regs->msr = msr;
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
 }
 
 void giveup_fpu(struct task_struct *tsk)
@@ -244,6 +248,8 @@ static void __giveup_altivec(struct task_struct *tsk)
 	if (cpu_has_feature(CPU_FTR_VSX))
 		msr &= ~MSR_VSX;
 	tsk->thread.regs->msr = msr;
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
 }
 
 void giveup_altivec(struct task_struct *tsk)
@@ -559,6 +565,8 @@ void notrace restore_math(struct pt_regs *regs)
 		msr_check_and_clear(new_msr);
 
 		regs->msr |= new_msr | fpexc_mode;
+		local_paca->hsrr_valid = 0;
+		local_paca->srr_valid = 0;
 	}
 }
 #endif /* CONFIG_PPC_BOOK3S_64 */
@@ -1295,6 +1303,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
 			atomic_read(&current->mm->context.vas_windows)))
 			asm volatile(PPC_CP_ABORT);
 	}
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 	return last;
@@ -1878,6 +1888,9 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 	thread_pkey_regs_init(&current->thread);
+
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
 }
 EXPORT_SYMBOL(start_thread);
 
@@ -1925,9 +1938,12 @@ int set_fpexc_mode(struct task_struct *tsk, unsigned int val)
 	if (val > PR_FP_EXC_PRECISE)
 		return -EINVAL;
 	tsk->thread.fpexc_mode = __pack_fe01(val);
-	if (regs != NULL && (regs->msr & MSR_FP) != 0)
+	if (regs != NULL && (regs->msr & MSR_FP) != 0) {
 		regs->msr = (regs->msr & ~(MSR_FE0|MSR_FE1))
 			| tsk->thread.fpexc_mode;
+		local_paca->hsrr_valid = 0;
+		local_paca->srr_valid = 0;
+	}
 	return 0;
 }
 
@@ -1979,6 +1995,9 @@ int set_endian(struct task_struct *tsk, unsigned int val)
 	else
 		return -EINVAL;
 
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
+
 	return 0;
 }
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 954f41676f69..85b5541a24f7 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -46,6 +46,13 @@
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
 
+static inline void do_enter_rtas(unsigned long args)
+{
+	enter_rtas(args);
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
+}
+
 struct rtas_t rtas = {
 	.lock = __ARCH_SPIN_LOCK_UNLOCKED
 };
@@ -384,7 +391,7 @@ static char *__fetch_rtas_last_error(char *altbuf)
 	save_args = rtas.args;
 	rtas.args = err_args;
 
-	enter_rtas(__pa(&rtas.args));
+	do_enter_rtas(__pa(&rtas.args));
 
 	err_args = rtas.args;
 	rtas.args = save_args;
@@ -430,7 +437,7 @@ va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
 	for (i = 0; i < nret; ++i)
 		args->rets[i] = 0;
 
-	enter_rtas(__pa(args));
+	do_enter_rtas(__pa(args));
 }
 
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret, ...)
@@ -1198,7 +1205,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	flags = lock_rtas();
 
 	rtas.args = args;
-	enter_rtas(__pa(&rtas.args));
+	do_enter_rtas(__pa(&rtas.args));
 	args = rtas.args;
 
 	/* A -1 return code indicates that the last command couldn't
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 44ec7b34b27e..ca1a504d442d 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -242,7 +242,7 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
 			regs->gpr[0] = __NR_restart_syscall;
 		else
 			regs->gpr[3] = regs->orig_gpr3;
-		regs->nip -= 4;
+		regs_add_return_ip(regs, - 4);
 		regs->result = 0;
 	} else {
 		if (trap_is_scv(regs)) {
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index bfc939360bad..8e60c89dcc87 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -693,6 +693,10 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 
 	/* This returns like rt_sigreturn */
 	set_thread_flag(TIF_RESTOREALL);
+
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
+
 	return 0;
 }
 
@@ -798,6 +802,10 @@ SYSCALL_DEFINE0(rt_sigreturn)
 		goto badframe;
 
 	set_thread_flag(TIF_RESTOREALL);
+
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
+
 	return 0;
 
 badframe:
@@ -878,6 +886,7 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
 	err |= put_user(regs->gpr[1], (unsigned long __user *)newsp);
 
 	/* Set up "regs" so we "return" to the signal handler. */
+	/* XXX: use set return IP */
 	if (is_elf2_task()) {
 		regs->ctr = (unsigned long) ksig->ka.sa.sa_handler;
 		regs->gpr[12] = regs->ctr;
@@ -910,6 +919,9 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
 	if (err)
 		goto badframe;
 
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
+
 	return 0;
 
 badframe:
@@ -917,6 +929,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
 		printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
 				   tsk->comm, tsk->pid, "setup_rt_frame",
 				   (long)frame, regs->nip, regs->link);
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
 
 	return 1;
 }
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 078608ec2e92..13bf62ccfa18 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -123,6 +123,8 @@ SYSCALL_DEFINE0(switch_endian)
 	struct thread_info *ti;
 
 	current->thread.regs->msr ^= MSR_LE;
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
 
 	/*
 	 * Set TIF_RESTOREALL so that r3 isn't clobbered on return to
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 95f84c542523..0796391edd16 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1049,7 +1049,7 @@ static void p9_hmi_special_emu(struct pt_regs *regs)
 #endif /* !__LITTLE_ENDIAN__ */
 
 	/* Go to next instruction */
-	regs->nip += 4;
+	regs_add_return_ip(regs, 4);
 }
 #endif /* CONFIG_VSX */
 
@@ -1487,7 +1487,7 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
 
 		if (!(regs->msr & MSR_PR) &&  /* not user-mode */
 		    report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) {
-			regs->nip += 4;
+			regs_add_return_ip(regs, 4);
 			return;
 		}
 		_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
@@ -1549,7 +1549,7 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
 	if (reason & (REASON_ILLEGAL | REASON_PRIVILEGED)) {
 		switch (emulate_instruction(regs)) {
 		case 0:
-			regs->nip += 4;
+			regs_add_return_ip(regs, 4);
 			emulate_single_step(regs);
 			return;
 		case -EFAULT:
@@ -1600,7 +1600,7 @@ DEFINE_INTERRUPT_HANDLER(alignment_exception)
 
 	if (fixed == 1) {
 		/* skip over emulated instruction */
-		regs->nip += inst_length(reason);
+		regs_add_return_ip(regs, inst_length(reason));
 		emulate_single_step(regs);
 		return;
 	}
@@ -1767,7 +1767,7 @@ DEFINE_INTERRUPT_HANDLER(facility_unavailable_exception)
 				pr_err("DSCR based mfspr emulation failed\n");
 				return;
 			}
-			regs->nip += 4;
+			regs_add_return_ip(regs, 4);
 			emulate_single_step(regs);
 		}
 		return;
@@ -2030,7 +2030,7 @@ DEFINE_INTERRUPT_HANDLER(altivec_assist_exception)
 	PPC_WARN_EMULATED(altivec, regs);
 	err = emulate_altivec(regs);
 	if (err == 0) {
-		regs->nip += 4;		/* skip emulated instruction */
+		regs_add_return_ip(regs, 4); /* skip emulated instruction */
 		emulate_single_step(regs);
 		return;
 	}
@@ -2094,7 +2094,7 @@ void SPEFloatingPointException(struct pt_regs *regs)
 
 	err = do_spe_mathemu(regs);
 	if (err == 0) {
-		regs->nip += 4;		/* skip emulated instruction */
+		regs_add_return_ip(regs, 4); /* skip emulated instruction */
 		emulate_single_step(regs);
 		return;
 	}
@@ -2125,10 +2125,10 @@ void SPEFloatingPointRoundException(struct pt_regs *regs)
 		giveup_spe(current);
 	preempt_enable();
 
-	regs->nip -= 4;
+	regs_add_return_ip(regs, - 4);
 	err = speround_handler(regs);
 	if (err == 0) {
-		regs->nip += 4;		/* skip emulated instruction */
+		regs_add_return_ip(regs, 4); /* skip emulated instruction */
 		emulate_single_step(regs);
 		return;
 	}
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 2c948e7b0d00..609e3526a66f 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -75,6 +75,8 @@ _GLOBAL(load_up_altivec)
 	addi	r5,r4,THREAD		/* Get THREAD */
 	oris	r12,r12,MSR_VEC@h
 	std	r12,_MSR(r1)
+	li	r4,0
+	stb	r4,PACASRR_VALID(r13)
 #endif
 	li	r4,1
 	stb	r4,THREAD_LOAD_VEC(r5)
@@ -133,6 +135,8 @@ _GLOBAL(load_up_vsx)
 	/* enable use of VSX after return */
 	oris	r12,r12,MSR_VSX@h
 	std	r12,_MSR(r1)
+	li	r4,0
+	stb	r4,PACASRR_VALID(r13)
 	b	fast_interrupt_return_srr
 
 #endif /* CONFIG_VSX */
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 855457ed09b5..6c2d6a3c9c8a 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -3033,7 +3033,7 @@ void emulate_update_regs(struct pt_regs *regs, struct instruction_op *op)
 	default:
 		WARN_ON_ONCE(1);
 	}
-	regs->nip = next_pc;
+	regs_set_return_ip(regs, next_pc);
 }
 NOKPROBE_SYMBOL(emulate_update_regs);
 
@@ -3310,6 +3310,9 @@ int emulate_step(struct pt_regs *regs, struct ppc_inst instr)
 	unsigned long val;
 	unsigned long ea;
 
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
+
 	r = analyse_instr(&op, regs, instr);
 	if (r < 0)
 		return r;
diff --git a/arch/powerpc/math-emu/math.c b/arch/powerpc/math-emu/math.c
index 30b4b69c6941..d92416d78aee 100644
--- a/arch/powerpc/math-emu/math.c
+++ b/arch/powerpc/math-emu/math.c
@@ -453,7 +453,7 @@ do_mathemu(struct pt_regs *regs)
 		break;
 	}
 
-	regs->nip += 4;
+	regs_add_return_ip(regs, 4);
 	return 0;
 
 illegal:
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 1999982200ab..1d82dc53386f 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -615,7 +615,7 @@ void bad_page_fault(struct pt_regs *regs, int sig)
 
 	/* Are we prepared to handle this fault?  */
 	if ((entry = search_exception_tables(regs->nip)) != NULL) {
-		regs->nip = extable_fixup(entry);
+		regs_set_return_ip(regs, extable_fixup(entry));
 		return;
 	}
 
diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
index 5cd0f52d258f..1a7bc261d156 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -100,6 +100,9 @@ static int64_t opal_call(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
 	bool mmu = (msr & (MSR_IR|MSR_DR));
 	int64_t ret;
 
+	local_paca->hsrr_valid = 0;
+	local_paca->srr_valid = 0;
+
 	msr &= ~MSR_EE;
 
 	if (unlikely(!mmu))
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 040b9d01c079..af78e7c3108f 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1072,7 +1072,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
 			ret = get_kernel_nofault(inst, (void *)regs->nip);
 
 		if (!ret && mcheck_handle_load(regs, inst)) {
-			regs->nip += 4;
+			regs_add_return_ip(regs, 4);
 			return 1;
 		}
 	}
-- 
2.23.0


^ permalink raw reply related

* [RFC PATCH 5/9] powerpc/64: move interrupt return asm to interrupt_64.S
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

The next patch would like to move interrupt return assembly code to a low
location before general text, so move it into its own file and include via
head_64.S

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/head-64.h |   2 +-
 arch/powerpc/kernel/entry_64.S     | 601 ----------------------------
 arch/powerpc/kernel/head_64.S      |   5 +-
 arch/powerpc/kernel/interrupt_64.S | 608 +++++++++++++++++++++++++++++
 4 files changed, 613 insertions(+), 603 deletions(-)
 create mode 100644 arch/powerpc/kernel/interrupt_64.S

diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index 4cb9efa2eb21..242204e12993 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -16,7 +16,7 @@
 	.section ".head.data.\name\()","a",@progbits
 .endm
 .macro use_ftsec name
-	.section ".head.text.\name\()"
+	.section ".head.text.\name\()","ax",@progbits
 .endm
 
 /*
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6236a88a592f..65ddd159974b 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -32,7 +32,6 @@
 #include <asm/irqflags.h>
 #include <asm/hw_irq.h>
 #include <asm/context_tracking.h>
-#include <asm/tm.h>
 #include <asm/ppc-opcode.h>
 #include <asm/barrier.h>
 #include <asm/export.h>
@@ -48,411 +47,7 @@
 /*
  * System calls.
  */
-	.section	".toc","aw"
-SYS_CALL_TABLE:
-	.tc sys_call_table[TC],sys_call_table
-
-#ifdef CONFIG_COMPAT
-COMPAT_SYS_CALL_TABLE:
-	.tc compat_sys_call_table[TC],compat_sys_call_table
-#endif
-
-/* This value is used to mark exception frames on the stack. */
-exception_marker:
-	.tc	ID_EXC_MARKER[TC],STACK_FRAME_REGS_MARKER
-
 	.section	".text"
-	.align 7
-
-#ifdef CONFIG_PPC_BOOK3S
-.macro DEBUG_SRR_VALID srr
-#ifdef CONFIG_PPC_RFI_SRR_DEBUG
-	.ifc \srr,srr
-	mfspr	r11,SPRN_SRR0
-	ld	r12,_NIP(r1)
-100:	tdne	r11,r12
-	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
-	mfspr	r11,SPRN_SRR1
-	ld	r12,_MSR(r1)
-100:	tdne	r11,r12
-	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
-	.else
-	mfspr	r11,SPRN_HSRR0
-	ld	r12,_NIP(r1)
-100:	tdne	r11,r12
-	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
-	mfspr	r11,SPRN_HSRR1
-	ld	r12,_MSR(r1)
-100:	tdne	r11,r12
-	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
-	.endif
-#endif
-.endm
-
-.macro system_call_vectored name trapnr
-	.globl system_call_vectored_\name
-system_call_vectored_\name:
-_ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-BEGIN_FTR_SECTION
-	extrdi.	r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
-	bne	.Ltabort_syscall
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
-#endif
-	INTERRUPT_TO_KERNEL
-	mr	r10,r1
-	ld	r1,PACAKSAVE(r13)
-	std	r10,0(r1)
-	std	r11,_NIP(r1)
-	std	r12,_MSR(r1)
-	std	r0,GPR0(r1)
-	std	r10,GPR1(r1)
-	std	r2,GPR2(r1)
-	ld	r2,PACATOC(r13)
-	mfcr	r12
-	li	r11,0
-	/* Can we avoid saving r3-r8 in common case? */
-	std	r3,GPR3(r1)
-	std	r4,GPR4(r1)
-	std	r5,GPR5(r1)
-	std	r6,GPR6(r1)
-	std	r7,GPR7(r1)
-	std	r8,GPR8(r1)
-	/* Zero r9-r12, this should only be required when restoring all GPRs */
-	std	r11,GPR9(r1)
-	std	r11,GPR10(r1)
-	std	r11,GPR11(r1)
-	std	r11,GPR12(r1)
-	std	r9,GPR13(r1)
-	SAVE_NVGPRS(r1)
-	std	r11,_XER(r1)
-	std	r11,_LINK(r1)
-	std	r11,_CTR(r1)
-
-	li	r11,\trapnr
-	std	r11,_TRAP(r1)
-	std	r12,_CCR(r1)
-	std	r3,ORIG_GPR3(r1)
-	addi	r10,r1,STACK_FRAME_OVERHEAD
-	ld	r11,exception_marker@toc(r2)
-	std	r11,-16(r10)		/* "regshere" marker */
-
-BEGIN_FTR_SECTION
-	HMT_MEDIUM
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-
-	/*
-	 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
-	 * would clobber syscall parameters. Also we always enter with IRQs
-	 * enabled and nothing pending. system_call_exception() will call
-	 * trace_hardirqs_off().
-	 *
-	 * scv enters with MSR[EE]=1, so don't set PACA_IRQ_HARD_DIS. The
-	 * entry vector already sets PACAIRQSOFTMASK to IRQS_ALL_DISABLED.
-	 */
-
-	/* Calling convention has r9 = orig r0, r10 = regs */
-	mr	r9,r0
-	bl	system_call_exception
-
-.Lsyscall_vectored_\name\()_exit:
-	addi    r4,r1,STACK_FRAME_OVERHEAD
-	li	r5,1 /* scv */
-	bl	syscall_exit_prepare
-
-	ld	r2,_CCR(r1)
-	ld	r4,_NIP(r1)
-	ld	r5,_MSR(r1)
-
-BEGIN_FTR_SECTION
-	stdcx.	r0,0,r1			/* to clear the reservation */
-END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
-
-BEGIN_FTR_SECTION
-	HMT_MEDIUM_LOW
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-
-	cmpdi	r3,0
-	bne	.Lsyscall_vectored_\name\()_restore_regs
-
-	/* rfscv returns with LR->NIA and CTR->MSR */
-	mtlr	r4
-	mtctr	r5
-
-	/* Could zero these as per ABI, but we may consider a stricter ABI
-	 * which preserves these if libc implementations can benefit, so
-	 * restore them for now until further measurement is done. */
-	ld	r0,GPR0(r1)
-	ld	r4,GPR4(r1)
-	ld	r5,GPR5(r1)
-	ld	r6,GPR6(r1)
-	ld	r7,GPR7(r1)
-	ld	r8,GPR8(r1)
-	/* Zero volatile regs that may contain sensitive kernel data */
-	li	r9,0
-	li	r10,0
-	li	r11,0
-	li	r12,0
-	mtspr	SPRN_XER,r0
-
-	/*
-	 * We don't need to restore AMR on the way back to userspace for KUAP.
-	 * The value of AMR only matters while we're in the kernel.
-	 */
-	mtcr	r2
-	ld	r2,GPR2(r1)
-	ld	r3,GPR3(r1)
-	ld	r13,GPR13(r1)
-	ld	r1,GPR1(r1)
-	RFSCV_TO_USER
-	b	.	/* prevent speculative execution */
-
-.Lsyscall_vectored_\name\()_restore_regs:
-	li	r3,0
-	mtmsrd	r3,1
-	mtspr	SPRN_SRR0,r4
-	mtspr	SPRN_SRR1,r5
-
-	ld	r3,_CTR(r1)
-	ld	r4,_LINK(r1)
-	ld	r5,_XER(r1)
-
-	REST_NVGPRS(r1)
-	ld	r0,GPR0(r1)
-	mtcr	r2
-	mtctr	r3
-	mtlr	r4
-	mtspr	SPRN_XER,r5
-	REST_10GPRS(2, r1)
-	REST_2GPRS(12, r1)
-	ld	r1,GPR1(r1)
-	RFI_TO_USER
-.endm
-
-system_call_vectored common 0x3000
-/*
- * We instantiate another entry copy for the SIGILL variant, with TRAP=0x7ff0
- * which is tested by system_call_exception when r0 is -1 (as set by vector
- * entry code).
- */
-system_call_vectored sigill 0x7ff0
-
-
-/*
- * Entered via kernel return set up by kernel/sstep.c, must match entry regs
- */
-	.globl system_call_vectored_emulate
-system_call_vectored_emulate:
-_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
-	li	r10,IRQS_ALL_DISABLED
-	stb	r10,PACAIRQSOFTMASK(r13)
-	b	system_call_vectored_common
-#endif
-
-	.balign IFETCH_ALIGN_BYTES
-	.globl system_call_common_real
-system_call_common_real:
-	ld	r10,PACAKMSR(r13)	/* get MSR value for kernel */
-	mtmsrd	r10
-
-	.balign IFETCH_ALIGN_BYTES
-	.globl system_call_common
-system_call_common:
-_ASM_NOKPROBE_SYMBOL(system_call_common)
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-BEGIN_FTR_SECTION
-	extrdi.	r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
-	bne	.Ltabort_syscall
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
-#endif
-	mr	r10,r1
-	ld	r1,PACAKSAVE(r13)
-	std	r10,0(r1)
-	std	r11,_NIP(r1)
-	std	r12,_MSR(r1)
-	std	r0,GPR0(r1)
-	std	r10,GPR1(r1)
-	std	r2,GPR2(r1)
-#ifdef CONFIG_PPC_FSL_BOOK3E
-START_BTB_FLUSH_SECTION
-	BTB_FLUSH(r10)
-END_BTB_FLUSH_SECTION
-#endif
-	ld	r2,PACATOC(r13)
-	mfcr	r12
-	li	r11,0
-	/* Can we avoid saving r3-r8 in common case? */
-	std	r3,GPR3(r1)
-	std	r4,GPR4(r1)
-	std	r5,GPR5(r1)
-	std	r6,GPR6(r1)
-	std	r7,GPR7(r1)
-	std	r8,GPR8(r1)
-	/* Zero r9-r12, this should only be required when restoring all GPRs */
-	std	r11,GPR9(r1)
-	std	r11,GPR10(r1)
-	std	r11,GPR11(r1)
-	std	r11,GPR12(r1)
-	std	r9,GPR13(r1)
-	SAVE_NVGPRS(r1)
-	std	r11,_XER(r1)
-	std	r11,_CTR(r1)
-	mflr	r10
-
-	/*
-	 * This clears CR0.SO (bit 28), which is the error indication on
-	 * return from this system call.
-	 */
-	rldimi	r12,r11,28,(63-28)
-	li	r11,0xc00
-	std	r10,_LINK(r1)
-	std	r11,_TRAP(r1)
-	std	r12,_CCR(r1)
-	std	r3,ORIG_GPR3(r1)
-	addi	r10,r1,STACK_FRAME_OVERHEAD
-	ld	r11,exception_marker@toc(r2)
-	std	r11,-16(r10)		/* "regshere" marker */
-
-	li	r11,1
-	stb	r11,PACASRR_VALID(r13)
-
-	/*
-	 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
-	 * would clobber syscall parameters. Also we always enter with IRQs
-	 * enabled and nothing pending. system_call_exception() will call
-	 * trace_hardirqs_off().
-	 */
-	li	r11,IRQS_ALL_DISABLED
-	li	r12,PACA_IRQ_HARD_DIS
-	stb	r11,PACAIRQSOFTMASK(r13)
-	stb	r12,PACAIRQHAPPENED(r13)
-
-	/* Calling convention has r9 = orig r0, r10 = regs */
-	mr	r9,r0
-	bl	system_call_exception
-
-.Lsyscall_exit:
-	addi    r4,r1,STACK_FRAME_OVERHEAD
-	li	r5,0 /* !scv */
-	bl	syscall_exit_prepare
-
-	ld	r2,_CCR(r1)
-	ld	r6,_LINK(r1)
-	mtlr	r6
-
-	lbz	r4,PACASRR_VALID(r13)
-	cmpdi	r4,0
-	bne	1f
-	li	r4,0
-	stb	r4,PACASRR_VALID(r13)
-	ld	r4,_NIP(r1)
-	ld	r5,_MSR(r1)
-	mtspr	SPRN_SRR0,r4
-	mtspr	SPRN_SRR1,r5
-1:
-	DEBUG_SRR_VALID srr
-
-BEGIN_FTR_SECTION
-	stdcx.	r0,0,r1			/* to clear the reservation */
-END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
-
-	cmpdi	r3,0
-	bne	.Lsyscall_restore_regs
-	/* Zero volatile regs that may contain sensitive kernel data */
-	li	r0,0
-	li	r4,0
-	li	r5,0
-	li	r6,0
-	li	r7,0
-	li	r8,0
-	li	r9,0
-	li	r10,0
-	li	r11,0
-	li	r12,0
-	mtctr	r0
-	mtspr	SPRN_XER,r0
-.Lsyscall_restore_regs_cont:
-
-BEGIN_FTR_SECTION
-	HMT_MEDIUM_LOW
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-
-	/*
-	 * We don't need to restore AMR on the way back to userspace for KUAP.
-	 * The value of AMR only matters while we're in the kernel.
-	 */
-	mtcr	r2
-	ld	r2,GPR2(r1)
-	ld	r3,GPR3(r1)
-	ld	r13,GPR13(r1)
-	ld	r1,GPR1(r1)
-	RFI_TO_USER
-	b	.	/* prevent speculative execution */
-
-.Lsyscall_restore_regs:
-	ld	r3,_CTR(r1)
-	ld	r4,_XER(r1)
-	REST_NVGPRS(r1)
-	mtctr	r3
-	mtspr	SPRN_XER,r4
-	ld	r0,GPR0(r1)
-	REST_8GPRS(4, r1)
-	ld	r12,GPR12(r1)
-	b	.Lsyscall_restore_regs_cont
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-.Ltabort_syscall:
-	/* Firstly we need to enable TM in the kernel */
-	mfmsr	r10
-	li	r9, 1
-	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
-	mtmsrd	r10, 0
-
-	/* tabort, this dooms the transaction, nothing else */
-	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
-	TABORT(R9)
-
-	/*
-	 * Return directly to userspace. We have corrupted user register state,
-	 * but userspace will never see that register state. Execution will
-	 * resume after the tbegin of the aborted transaction with the
-	 * checkpointed register state.
-	 */
-	li	r9, MSR_RI
-	andc	r10, r10, r9
-	mtmsrd	r10, 1
-	mtspr	SPRN_SRR0, r11
-	mtspr	SPRN_SRR1, r12
-	RFI_TO_USER
-	b	.	/* prevent speculative execution */
-#endif
-
-#ifdef CONFIG_PPC_BOOK3S
-_GLOBAL(ret_from_fork_scv)
-	bl	schedule_tail
-	REST_NVGPRS(r1)
-	li	r3,0	/* fork() return value */
-	b	.Lsyscall_vectored_common_exit
-#endif
-
-_GLOBAL(ret_from_fork)
-	bl	schedule_tail
-	REST_NVGPRS(r1)
-	li	r3,0	/* fork() return value */
-	b	.Lsyscall_exit
-
-_GLOBAL(ret_from_kernel_thread)
-	bl	schedule_tail
-	REST_NVGPRS(r1)
-	mtctr	r14
-	mr	r3,r15
-#ifdef PPC64_ELF_ABI_v2
-	mr	r12,r14
-#endif
-	bctrl
-	li	r3,0
-	b	.Lsyscall_exit
-
 #ifdef CONFIG_PPC_BOOK3E
 /* Save non-volatile GPRs, if not already saved. */
 _GLOBAL(save_nvgprs)
@@ -681,202 +276,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	addi	r1,r1,SWITCH_FRAME_SIZE
 	blr
 
-#ifdef CONFIG_PPC_BOOK3S
-	/*
-	 * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
-	 * touched, no exit work created, then this can be used.
-	 */
-	.balign IFETCH_ALIGN_BYTES
-	.globl fast_interrupt_return_srr
-fast_interrupt_return_srr:
-_ASM_NOKPROBE_SYMBOL(fast_interrupt_return_srr)
-	kuap_check_amr r3, r4
-	ld	r5,_MSR(r1)
-	andi.	r0,r5,MSR_PR
-	bne	.Lfast_user_interrupt_return_srr
-	kuap_restore_amr r3, r4
-	andi.	r0,r5,MSR_RI
-	li	r3,0 /* 0 return value, no EMULATE_STACK_STORE */
-	bne+	.Lfast_kernel_interrupt_return_srr
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	unrecoverable_exception
-	b	. /* should not get here */
-
-.macro interrupt_return_macro srr
-	.balign IFETCH_ALIGN_BYTES
-	.globl interrupt_return_\srr
-interrupt_return_\srr\():
-_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\())
-	ld	r4,_MSR(r1)
-	andi.	r0,r4,MSR_PR
-	beq	.Lkernel_interrupt_return_\srr
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	interrupt_exit_user_prepare
-	cmpdi	r3,0
-	bne-	.Lrestore_nvgprs_\srr
-
-.Lfast_user_interrupt_return_\srr:
-BEGIN_FTR_SECTION
-	ld	r10,_PPR(r1)
-	mtspr	SPRN_PPR,r10
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-	.ifc \srr,srr
-	lbz	r4,PACASRR_VALID(r13)
-	.else
-	lbz	r4,PACAHSRR_VALID(r13)
-	.endif
-	cmpdi	r4,0
-	li	r4,0
-	bne	1f
-	ld	r11,_NIP(r1)
-	ld	r12,_MSR(r1)
-	.ifc \srr,srr
-	mtspr	SPRN_SRR0,r11
-	mtspr	SPRN_SRR1,r12
-1:
-	stb	r4,PACASRR_VALID(r13)
-	.else
-	mtspr	SPRN_HSRR0,r11
-	mtspr	SPRN_HSRR1,r12
-1:
-	stb	r4,PACAHSRR_VALID(r13)
-	.endif
-	DEBUG_SRR_VALID \srr
-
-BEGIN_FTR_SECTION
-	stdcx.	r0,0,r1		/* to clear the reservation */
-FTR_SECTION_ELSE
-	ldarx	r0,0,r1
-ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
-
-	ld	r3,_CCR(r1)
-	ld	r4,_LINK(r1)
-	ld	r5,_CTR(r1)
-	ld	r6,_XER(r1)
-	li	r0,0
-
-	REST_4GPRS(7, r1)
-	REST_2GPRS(11, r1)
-	REST_GPR(13, r1)
-
-	mtcr	r3
-	mtlr	r4
-	mtctr	r5
-	mtspr	SPRN_XER,r6
-
-	REST_4GPRS(2, r1)
-	REST_GPR(6, r1)
-	REST_GPR(0, r1)
-	REST_GPR(1, r1)
-	.ifc \srr,srr
-	RFI_TO_USER
-	.else
-	HRFI_TO_USER
-	.endif
-	b	.	/* prevent speculative execution */
-
-.Lrestore_nvgprs_\srr\():
-	REST_NVGPRS(r1)
-	b	.Lfast_user_interrupt_return_\srr
-
-	.balign IFETCH_ALIGN_BYTES
-.Lkernel_interrupt_return_\srr\():
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	interrupt_exit_kernel_prepare
-
-.Lfast_kernel_interrupt_return_\srr\():
-	cmpdi	cr1,r3,0
-	.ifc \srr,srr
-	lbz	r4,PACASRR_VALID(r13)
-	.else
-	lbz	r4,PACAHSRR_VALID(r13)
-	.endif
-	cmpdi	r4,0
-	li	r4,0
-	bne	1f
-	ld	r11,_NIP(r1)
-	ld	r12,_MSR(r1)
-	.ifc \srr,srr
-	mtspr	SPRN_SRR0,r11
-	mtspr	SPRN_SRR1,r12
-1:
-	stb	r4,PACASRR_VALID(r13)
-	.else
-	mtspr	SPRN_HSRR0,r11
-	mtspr	SPRN_HSRR1,r12
-1:
-	stb	r4,PACAHSRR_VALID(r13)
-	.endif
-	DEBUG_SRR_VALID \srr
-
-BEGIN_FTR_SECTION
-	stdcx.	r0,0,r1		/* to clear the reservation */
-FTR_SECTION_ELSE
-	ldarx	r0,0,r1
-ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
-
-	ld	r3,_LINK(r1)
-	ld	r4,_CTR(r1)
-	ld	r5,_XER(r1)
-	ld	r6,_CCR(r1)
-	li	r0,0
-
-	REST_4GPRS(7, r1)
-	REST_2GPRS(11, r1)
-
-	mtlr	r3
-	mtctr	r4
-	mtspr	SPRN_XER,r5
-
-	/*
-	 * Leaving a stale exception_marker on the stack can confuse
-	 * the reliable stack unwinder later on. Clear it.
-	 */
-	std	r0,STACK_FRAME_OVERHEAD-16(r1)
-
-	REST_4GPRS(2, r1)
-
-	bne-	cr1,1f /* emulate stack store */
-	mtcr	r6
-	REST_GPR(6, r1)
-	REST_GPR(0, r1)
-	REST_GPR(1, r1)
-	.ifc \srr,srr
-	RFI_TO_KERNEL
-	.else
-	HRFI_TO_KERNEL
-	.endif
-	b	.	/* prevent speculative execution */
-
-1:	/*
-	 * Emulate stack store with update. New r1 value was already calculated
-	 * and updated in our interrupt regs by emulate_loadstore, but we can't
-	 * store the previous value of r1 to the stack before re-loading our
-	 * registers from it, otherwise they could be clobbered.  Use
-	 * PACA_EXGEN as temporary storage to hold the store data, as
-	 * interrupts are disabled here so it won't be clobbered.
-	 */
-	mtcr	r6
-	std	r9,PACA_EXGEN+0(r13)
-	addi	r9,r1,INT_FRAME_SIZE /* get original r1 */
-	REST_GPR(6, r1)
-	REST_GPR(0, r1)
-	REST_GPR(1, r1)
-	std	r9,0(r1) /* perform store component of stdu */
-	ld	r9,PACA_EXGEN+0(r13)
-
-	.ifc \srr,srr
-	RFI_TO_KERNEL
-	.else
-	HRFI_TO_KERNEL
-	.endif
-	b	.	/* prevent speculative execution */
-.endm
-
-interrupt_return_macro srr
-interrupt_return_macro hsrr
-#endif /* CONFIG_PPC_BOOK3S */
-
 #ifdef CONFIG_PPC_RTAS
 /*
  * On CHRP, the Run-Time Abstraction Services (RTAS) have to be
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 4e2591cb4bd1..b13a2d79dfb4 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -189,8 +189,9 @@ CLOSE_FIXED_SECTION(first_256B)
 
 /* This value is used to mark exception frames on the stack. */
 	.section ".toc","aw"
+/* This value is used to mark exception frames on the stack. */
 exception_marker:
-	.tc	ID_72656773_68657265[TC],0x7265677368657265
+	.tc	ID_EXC_MARKER[TC],STACK_FRAME_REGS_MARKER
 	.previous
 
 /*
@@ -206,6 +207,8 @@ OPEN_TEXT_SECTION(0x100)
 
 USE_TEXT_SECTION()
 
+#include "interrupt_64.S"
+
 #ifdef CONFIG_PPC_BOOK3E
 /*
  * The booting_thread_hwid holds the thread id we want to boot in cpu
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
new file mode 100644
index 000000000000..e121829ef717
--- /dev/null
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -0,0 +1,608 @@
+#include <asm/ppc_asm.h>
+#include <asm/head-64.h>
+#include <asm/bug.h>
+#include <asm/hw_irq.h>
+#include <asm/tm.h>
+#include <asm/mmu.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+#include <asm/ptrace.h>
+#include <asm/head-64.h>
+#include <asm/feature-fixups.h>
+#include <asm/kup.h>
+
+	.section	".toc","aw"
+SYS_CALL_TABLE:
+	.tc sys_call_table[TC],sys_call_table
+
+#ifdef CONFIG_COMPAT
+COMPAT_SYS_CALL_TABLE:
+	.tc compat_sys_call_table[TC],compat_sys_call_table
+#endif
+	.previous
+
+#ifdef CONFIG_PPC_BOOK3S
+.macro DEBUG_SRR_VALID srr
+#ifdef CONFIG_PPC_RFI_SRR_DEBUG
+	.ifc \srr,srr
+	mfspr	r11,SPRN_SRR0
+	ld	r12,_NIP(r1)
+100:	tdne	r11,r12
+	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
+	mfspr	r11,SPRN_SRR1
+	ld	r12,_MSR(r1)
+100:	tdne	r11,r12
+	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
+	.else
+	mfspr	r11,SPRN_HSRR0
+	ld	r12,_NIP(r1)
+100:	tdne	r11,r12
+	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
+	mfspr	r11,SPRN_HSRR1
+	ld	r12,_MSR(r1)
+100:	tdne	r11,r12
+	EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
+	.endif
+#endif
+.endm
+
+.macro system_call_vectored name trapnr
+	.globl system_call_vectored_\name
+system_call_vectored_\name:
+_ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+BEGIN_FTR_SECTION
+	extrdi.	r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
+	bne	.Ltabort_syscall
+END_FTR_SECTION_IFSET(CPU_FTR_TM)
+#endif
+	INTERRUPT_TO_KERNEL
+	mr	r10,r1
+	ld	r1,PACAKSAVE(r13)
+	std	r10,0(r1)
+	std	r11,_NIP(r1)
+	std	r12,_MSR(r1)
+	std	r0,GPR0(r1)
+	std	r10,GPR1(r1)
+	std	r2,GPR2(r1)
+	ld	r2,PACATOC(r13)
+	mfcr	r12
+	li	r11,0
+	/* Can we avoid saving r3-r8 in common case? */
+	std	r3,GPR3(r1)
+	std	r4,GPR4(r1)
+	std	r5,GPR5(r1)
+	std	r6,GPR6(r1)
+	std	r7,GPR7(r1)
+	std	r8,GPR8(r1)
+	/* Zero r9-r12, this should only be required when restoring all GPRs */
+	std	r11,GPR9(r1)
+	std	r11,GPR10(r1)
+	std	r11,GPR11(r1)
+	std	r11,GPR12(r1)
+	std	r9,GPR13(r1)
+	SAVE_NVGPRS(r1)
+	std	r11,_XER(r1)
+	std	r11,_LINK(r1)
+	std	r11,_CTR(r1)
+
+	li	r11,\trapnr
+	std	r11,_TRAP(r1)
+	std	r12,_CCR(r1)
+	std	r3,ORIG_GPR3(r1)
+	addi	r10,r1,STACK_FRAME_OVERHEAD
+	ld	r11,exception_marker@toc(r2)
+	std	r11,-16(r10)		/* "regshere" marker */
+
+BEGIN_FTR_SECTION
+	HMT_MEDIUM
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
+	/*
+	 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
+	 * would clobber syscall parameters. Also we always enter with IRQs
+	 * enabled and nothing pending. system_call_exception() will call
+	 * trace_hardirqs_off().
+	 *
+	 * scv enters with MSR[EE]=1, so don't set PACA_IRQ_HARD_DIS. The
+	 * entry vector already sets PACAIRQSOFTMASK to IRQS_ALL_DISABLED.
+	 */
+
+	/* Calling convention has r9 = orig r0, r10 = regs */
+	mr	r9,r0
+	bl	system_call_exception
+
+.Lsyscall_vectored_\name\()_exit:
+	addi    r4,r1,STACK_FRAME_OVERHEAD
+	li	r5,1 /* scv */
+	bl	syscall_exit_prepare
+
+	ld	r2,_CCR(r1)
+	ld	r4,_NIP(r1)
+	ld	r5,_MSR(r1)
+
+BEGIN_FTR_SECTION
+	stdcx.	r0,0,r1			/* to clear the reservation */
+END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
+
+BEGIN_FTR_SECTION
+	HMT_MEDIUM_LOW
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
+	cmpdi	r3,0
+	bne	.Lsyscall_vectored_\name\()_restore_regs
+
+	/* rfscv returns with LR->NIA and CTR->MSR */
+	mtlr	r4
+	mtctr	r5
+
+	/* Could zero these as per ABI, but we may consider a stricter ABI
+	 * which preserves these if libc implementations can benefit, so
+	 * restore them for now until further measurement is done. */
+	ld	r0,GPR0(r1)
+	ld	r4,GPR4(r1)
+	ld	r5,GPR5(r1)
+	ld	r6,GPR6(r1)
+	ld	r7,GPR7(r1)
+	ld	r8,GPR8(r1)
+	/* Zero volatile regs that may contain sensitive kernel data */
+	li	r9,0
+	li	r10,0
+	li	r11,0
+	li	r12,0
+	mtspr	SPRN_XER,r0
+
+	/*
+	 * We don't need to restore AMR on the way back to userspace for KUAP.
+	 * The value of AMR only matters while we're in the kernel.
+	 */
+	mtcr	r2
+	ld	r2,GPR2(r1)
+	ld	r3,GPR3(r1)
+	ld	r13,GPR13(r1)
+	ld	r1,GPR1(r1)
+	RFSCV_TO_USER
+	b	.	/* prevent speculative execution */
+
+.Lsyscall_vectored_\name\()_restore_regs:
+	li	r3,0
+	mtmsrd	r3,1
+	mtspr	SPRN_SRR0,r4
+	mtspr	SPRN_SRR1,r5
+
+	ld	r3,_CTR(r1)
+	ld	r4,_LINK(r1)
+	ld	r5,_XER(r1)
+
+	REST_NVGPRS(r1)
+	ld	r0,GPR0(r1)
+	mtcr	r2
+	mtctr	r3
+	mtlr	r4
+	mtspr	SPRN_XER,r5
+	REST_10GPRS(2, r1)
+	REST_2GPRS(12, r1)
+	ld	r1,GPR1(r1)
+	RFI_TO_USER
+.endm
+
+system_call_vectored common 0x3000
+/*
+ * We instantiate another entry copy for the SIGILL variant, with TRAP=0x7ff0
+ * which is tested by system_call_exception when r0 is -1 (as set by vector
+ * entry code).
+ */
+system_call_vectored sigill 0x7ff0
+
+
+/*
+ * Entered via kernel return set up by kernel/sstep.c, must match entry regs
+ */
+	.globl system_call_vectored_emulate
+system_call_vectored_emulate:
+_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
+	li	r10,IRQS_ALL_DISABLED
+	stb	r10,PACAIRQSOFTMASK(r13)
+	b	system_call_vectored_common
+#endif /* CONFIG_PPC_BOOK3S */
+
+	.balign IFETCH_ALIGN_BYTES
+	.globl system_call_common_real
+system_call_common_real:
+	ld	r10,PACAKMSR(r13)	/* get MSR value for kernel */
+	mtmsrd	r10
+
+	.balign IFETCH_ALIGN_BYTES
+	.globl system_call_common
+system_call_common:
+_ASM_NOKPROBE_SYMBOL(system_call_common)
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+BEGIN_FTR_SECTION
+	extrdi.	r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
+	bne	.Ltabort_syscall
+END_FTR_SECTION_IFSET(CPU_FTR_TM)
+#endif
+	mr	r10,r1
+	ld	r1,PACAKSAVE(r13)
+	std	r10,0(r1)
+	std	r11,_NIP(r1)
+	std	r12,_MSR(r1)
+	std	r0,GPR0(r1)
+	std	r10,GPR1(r1)
+	std	r2,GPR2(r1)
+#ifdef CONFIG_PPC_FSL_BOOK3E
+START_BTB_FLUSH_SECTION
+	BTB_FLUSH(r10)
+END_BTB_FLUSH_SECTION
+#endif
+	ld	r2,PACATOC(r13)
+	mfcr	r12
+	li	r11,0
+	/* Can we avoid saving r3-r8 in common case? */
+	std	r3,GPR3(r1)
+	std	r4,GPR4(r1)
+	std	r5,GPR5(r1)
+	std	r6,GPR6(r1)
+	std	r7,GPR7(r1)
+	std	r8,GPR8(r1)
+	/* Zero r9-r12, this should only be required when restoring all GPRs */
+	std	r11,GPR9(r1)
+	std	r11,GPR10(r1)
+	std	r11,GPR11(r1)
+	std	r11,GPR12(r1)
+	std	r9,GPR13(r1)
+	SAVE_NVGPRS(r1)
+	std	r11,_XER(r1)
+	std	r11,_CTR(r1)
+	mflr	r10
+
+	/*
+	 * This clears CR0.SO (bit 28), which is the error indication on
+	 * return from this system call.
+	 */
+	rldimi	r12,r11,28,(63-28)
+	li	r11,0xc00
+	std	r10,_LINK(r1)
+	std	r11,_TRAP(r1)
+	std	r12,_CCR(r1)
+	std	r3,ORIG_GPR3(r1)
+	addi	r10,r1,STACK_FRAME_OVERHEAD
+	ld	r11,exception_marker@toc(r2)
+	std	r11,-16(r10)		/* "regshere" marker */
+
+	li	r11,1
+	stb	r11,PACASRR_VALID(r13)
+
+	/*
+	 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
+	 * would clobber syscall parameters. Also we always enter with IRQs
+	 * enabled and nothing pending. system_call_exception() will call
+	 * trace_hardirqs_off().
+	 */
+	li	r11,IRQS_ALL_DISABLED
+	li	r12,PACA_IRQ_HARD_DIS
+	stb	r11,PACAIRQSOFTMASK(r13)
+	stb	r12,PACAIRQHAPPENED(r13)
+
+	/* Calling convention has r9 = orig r0, r10 = regs */
+	mr	r9,r0
+	bl	system_call_exception
+
+.Lsyscall_exit:
+	addi    r4,r1,STACK_FRAME_OVERHEAD
+	li	r5,0 /* !scv */
+	bl	syscall_exit_prepare
+
+	ld	r2,_CCR(r1)
+	ld	r6,_LINK(r1)
+	mtlr	r6
+
+	lbz	r4,PACASRR_VALID(r13)
+	cmpdi	r4,0
+	bne	1f
+	li	r4,0
+	stb	r4,PACASRR_VALID(r13)
+	ld	r4,_NIP(r1)
+	ld	r5,_MSR(r1)
+	mtspr	SPRN_SRR0,r4
+	mtspr	SPRN_SRR1,r5
+1:
+	DEBUG_SRR_VALID srr
+
+BEGIN_FTR_SECTION
+	stdcx.	r0,0,r1			/* to clear the reservation */
+END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
+
+	cmpdi	r3,0
+	bne	.Lsyscall_restore_regs
+	/* Zero volatile regs that may contain sensitive kernel data */
+	li	r0,0
+	li	r4,0
+	li	r5,0
+	li	r6,0
+	li	r7,0
+	li	r8,0
+	li	r9,0
+	li	r10,0
+	li	r11,0
+	li	r12,0
+	mtctr	r0
+	mtspr	SPRN_XER,r0
+.Lsyscall_restore_regs_cont:
+
+BEGIN_FTR_SECTION
+	HMT_MEDIUM_LOW
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
+	/*
+	 * We don't need to restore AMR on the way back to userspace for KUAP.
+	 * The value of AMR only matters while we're in the kernel.
+	 */
+	mtcr	r2
+	ld	r2,GPR2(r1)
+	ld	r3,GPR3(r1)
+	ld	r13,GPR13(r1)
+	ld	r1,GPR1(r1)
+	RFI_TO_USER
+	b	.	/* prevent speculative execution */
+
+.Lsyscall_restore_regs:
+	ld	r3,_CTR(r1)
+	ld	r4,_XER(r1)
+	REST_NVGPRS(r1)
+	mtctr	r3
+	mtspr	SPRN_XER,r4
+	ld	r0,GPR0(r1)
+	REST_8GPRS(4, r1)
+	ld	r12,GPR12(r1)
+	b	.Lsyscall_restore_regs_cont
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+.Ltabort_syscall:
+	/* Firstly we need to enable TM in the kernel */
+	mfmsr	r10
+	li	r9, 1
+	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
+	mtmsrd	r10, 0
+
+	/* tabort, this dooms the transaction, nothing else */
+	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
+	TABORT(R9)
+
+	/*
+	 * Return directly to userspace. We have corrupted user register state,
+	 * but userspace will never see that register state. Execution will
+	 * resume after the tbegin of the aborted transaction with the
+	 * checkpointed register state.
+	 */
+	li	r9, MSR_RI
+	andc	r10, r10, r9
+	mtmsrd	r10, 1
+	mtspr	SPRN_SRR0, r11
+	mtspr	SPRN_SRR1, r12
+	RFI_TO_USER
+	b	.	/* prevent speculative execution */
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S
+_GLOBAL(ret_from_fork_scv)
+	bl	schedule_tail
+	REST_NVGPRS(r1)
+	li	r3,0	/* fork() return value */
+	b	.Lsyscall_vectored_common_exit
+#endif
+
+_GLOBAL(ret_from_fork)
+	bl	schedule_tail
+	REST_NVGPRS(r1)
+	li	r3,0	/* fork() return value */
+	b	.Lsyscall_exit
+
+_GLOBAL(ret_from_kernel_thread)
+	bl	schedule_tail
+	REST_NVGPRS(r1)
+	mtctr	r14
+	mr	r3,r15
+#ifdef PPC64_ELF_ABI_v2
+	mr	r12,r14
+#endif
+	bctrl
+	li	r3,0
+	b	.Lsyscall_exit
+
+#ifdef CONFIG_PPC_BOOK3S
+	/*
+	 * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
+	 * touched, no exit work created, then this can be used.
+	 */
+	.balign IFETCH_ALIGN_BYTES
+	.globl fast_interrupt_return_srr
+fast_interrupt_return_srr:
+_ASM_NOKPROBE_SYMBOL(fast_interrupt_return_srr)
+	kuap_check_amr r3, r4
+	ld	r5,_MSR(r1)
+	andi.	r0,r5,MSR_PR
+	bne	.Lfast_user_interrupt_return_srr
+	kuap_restore_amr r3, r4
+	andi.	r0,r5,MSR_RI
+	li	r3,0 /* 0 return value, no EMULATE_STACK_STORE */
+	bne+	.Lfast_kernel_interrupt_return_srr
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	unrecoverable_exception
+	b	. /* should not get here */
+
+.macro interrupt_return_macro srr
+	.balign IFETCH_ALIGN_BYTES
+	.globl interrupt_return_\srr
+interrupt_return_\srr\():
+_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\())
+	ld	r4,_MSR(r1)
+	andi.	r0,r4,MSR_PR
+	beq	.Lkernel_interrupt_return_\srr
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	interrupt_exit_user_prepare
+	cmpdi	r3,0
+	bne-	.Lrestore_nvgprs_\srr
+
+.Lfast_user_interrupt_return_\srr:
+BEGIN_FTR_SECTION
+	ld	r10,_PPR(r1)
+	mtspr	SPRN_PPR,r10
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+	.ifc \srr,srr
+	lbz	r4,PACASRR_VALID(r13)
+	.else
+	lbz	r4,PACAHSRR_VALID(r13)
+	.endif
+	cmpdi	r4,0
+	li	r4,0
+	bne	1f
+	ld	r11,_NIP(r1)
+	ld	r12,_MSR(r1)
+	.ifc \srr,srr
+	mtspr	SPRN_SRR0,r11
+	mtspr	SPRN_SRR1,r12
+1:
+	stb	r4,PACASRR_VALID(r13)
+	.else
+	mtspr	SPRN_HSRR0,r11
+	mtspr	SPRN_HSRR1,r12
+1:
+	stb	r4,PACAHSRR_VALID(r13)
+	.endif
+	DEBUG_SRR_VALID \srr
+
+BEGIN_FTR_SECTION
+	stdcx.	r0,0,r1		/* to clear the reservation */
+FTR_SECTION_ELSE
+	ldarx	r0,0,r1
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
+
+	ld	r3,_CCR(r1)
+	ld	r4,_LINK(r1)
+	ld	r5,_CTR(r1)
+	ld	r6,_XER(r1)
+	li	r0,0
+
+	REST_4GPRS(7, r1)
+	REST_2GPRS(11, r1)
+	REST_GPR(13, r1)
+
+	mtcr	r3
+	mtlr	r4
+	mtctr	r5
+	mtspr	SPRN_XER,r6
+
+	REST_4GPRS(2, r1)
+	REST_GPR(6, r1)
+	REST_GPR(0, r1)
+	REST_GPR(1, r1)
+	.ifc \srr,srr
+	RFI_TO_USER
+	.else
+	HRFI_TO_USER
+	.endif
+	b	.	/* prevent speculative execution */
+
+.Lrestore_nvgprs_\srr\():
+	REST_NVGPRS(r1)
+	b	.Lfast_user_interrupt_return_\srr
+
+	.balign IFETCH_ALIGN_BYTES
+.Lkernel_interrupt_return_\srr\():
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	interrupt_exit_kernel_prepare
+
+.Lfast_kernel_interrupt_return_\srr\():
+	cmpdi	cr1,r3,0
+	.ifc \srr,srr
+	lbz	r4,PACASRR_VALID(r13)
+	.else
+	lbz	r4,PACAHSRR_VALID(r13)
+	.endif
+	cmpdi	r4,0
+	li	r4,0
+	bne	1f
+	ld	r11,_NIP(r1)
+	ld	r12,_MSR(r1)
+	.ifc \srr,srr
+	mtspr	SPRN_SRR0,r11
+	mtspr	SPRN_SRR1,r12
+1:
+	stb	r4,PACASRR_VALID(r13)
+	.else
+	mtspr	SPRN_HSRR0,r11
+	mtspr	SPRN_HSRR1,r12
+1:
+	stb	r4,PACAHSRR_VALID(r13)
+	.endif
+	DEBUG_SRR_VALID \srr
+
+BEGIN_FTR_SECTION
+	stdcx.	r0,0,r1		/* to clear the reservation */
+FTR_SECTION_ELSE
+	ldarx	r0,0,r1
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
+
+	ld	r3,_LINK(r1)
+	ld	r4,_CTR(r1)
+	ld	r5,_XER(r1)
+	ld	r6,_CCR(r1)
+	li	r0,0
+
+	REST_4GPRS(7, r1)
+	REST_2GPRS(11, r1)
+
+	mtlr	r3
+	mtctr	r4
+	mtspr	SPRN_XER,r5
+
+	/*
+	 * Leaving a stale exception_marker on the stack can confuse
+	 * the reliable stack unwinder later on. Clear it.
+	 */
+	std	r0,STACK_FRAME_OVERHEAD-16(r1)
+
+	REST_4GPRS(2, r1)
+
+	bne-	cr1,1f /* emulate stack store */
+	mtcr	r6
+	REST_GPR(6, r1)
+	REST_GPR(0, r1)
+	REST_GPR(1, r1)
+	.ifc \srr,srr
+	RFI_TO_KERNEL
+	.else
+	HRFI_TO_KERNEL
+	.endif
+	b	.	/* prevent speculative execution */
+
+1:	/*
+	 * Emulate stack store with update. New r1 value was already calculated
+	 * and updated in our interrupt regs by emulate_loadstore, but we can't
+	 * store the previous value of r1 to the stack before re-loading our
+	 * registers from it, otherwise they could be clobbered.  Use
+	 * PACA_EXGEN as temporary storage to hold the store data, as
+	 * interrupts are disabled here so it won't be clobbered.
+	 */
+	mtcr	r6
+	std	r9,PACA_EXGEN+0(r13)
+	addi	r9,r1,INT_FRAME_SIZE /* get original r1 */
+	REST_GPR(6, r1)
+	REST_GPR(0, r1)
+	REST_GPR(1, r1)
+	std	r9,0(r1) /* perform store component of stdu */
+	ld	r9,PACA_EXGEN+0(r13)
+
+	.ifc \srr,srr
+	RFI_TO_KERNEL
+	.else
+	HRFI_TO_KERNEL
+	.endif
+	b	.	/* prevent speculative execution */
+.endm
+
+interrupt_return_macro srr
+interrupt_return_macro hsrr
+
+#endif /* CONFIG_PPC_BOOK3S */
-- 
2.23.0


^ permalink raw reply related

* [RFC PATCH 6/9] powerpc/64s: save one more register in the masked interrupt handler
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

This frees up one more register (and takes advantage of that to
clean things up a little bit).

This register will be used in the following patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 34 ++++++++++++++++------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b370a5f334fc..a01f69e774b5 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2745,7 +2745,6 @@ INT_DEFINE_END(soft_nmi)
  * and run it entirely with interrupts hard disabled.
  */
 EXC_COMMON_BEGIN(soft_nmi_common)
-	mfspr	r11,SPRN_SRR0
 	mr	r10,r1
 	ld	r1,PACAEMERGSP(r13)
 	subi	r1,r1,INT_FRAME_SIZE
@@ -2780,19 +2779,24 @@ masked_Hinterrupt:
 	.else
 masked_interrupt:
 	.endif
-	lbz	r11,PACAIRQHAPPENED(r13)
-	or	r11,r11,r10
-	stb	r11,PACAIRQHAPPENED(r13)
+	stw	r9,PACA_EXGEN+EX_CCR(r13)
+	lbz	r9,PACAIRQHAPPENED(r13)
+	or	r9,r9,r10
+	stb	r9,PACAIRQHAPPENED(r13)
+
+	.if ! \hsrr
 	cmpwi	r10,PACA_IRQ_DEC
 	bne	1f
-	lis	r10,0x7fff
-	ori	r10,r10,0xffff
-	mtspr	SPRN_DEC,r10
+	LOAD_REG_IMMEDIATE(r9, 0x7fffffff)
+	mtspr	SPRN_DEC,r9
 #ifdef CONFIG_PPC_WATCHDOG
+	lwz	r9,PACA_EXGEN+EX_CCR(r13)
 	b	soft_nmi_common
 #else
 	b	2f
 #endif
+	.endif
+
 1:	andi.	r10,r10,PACA_IRQ_MUST_HARD_MASK
 	beq	2f
 	xori	r12,r12,MSR_EE	/* clear MSR_EE */
@@ -2801,17 +2805,19 @@ masked_interrupt:
 	.else
 	mtspr	SPRN_SRR1,r12
 	.endif
-	ori	r11,r11,PACA_IRQ_HARD_DIS
-	stb	r11,PACAIRQHAPPENED(r13)
+	ori	r9,r9,PACA_IRQ_HARD_DIS
+	stb	r9,PACAIRQHAPPENED(r13)
 2:	/* done */
-	li	r10,0
+	li	r9,0
 	.if \hsrr
-	stb	r10,PACAHSRR_VALID(r13)
+	stb	r9,PACAHSRR_VALID(r13)
 	.else
-	stb	r10,PACASRR_VALID(r13)
+	stb	r9,PACASRR_VALID(r13)
 	.endif
-	ld	r10,PACA_EXGEN+EX_CTR(r13)
-	mtctr	r10
+
+	ld	r9,PACA_EXGEN+EX_CTR(r13)
+	mtctr	r9
+	lwz	r9,PACA_EXGEN+EX_CCR(r13)
 	mtcrf	0x80,r9
 	std	r1,PACAR1(r13)
 	ld	r9,PACA_EXGEN+EX_R9(r13)
-- 
2.23.0


^ permalink raw reply related

* [RFC PATCH 7/9] powerpc/64s: allow alternate return locations for soft-masked interrupts
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

This is a variation of the exception table code which adjusts a failed
page fault return location if it was taken at an address specified in
an exception table, to a corresponding fixup handler address.

This patch adds a similar masked interrupt restart table that is checked
when when an asynchronous interrupt is taken while soft-masked.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/interrupt.h | 18 ++++++++++++++
 arch/powerpc/include/asm/ppc_asm.h   |  8 +++++++
 arch/powerpc/kernel/exceptions-64s.S | 36 +++++++++++++++++++++++++++-
 arch/powerpc/kernel/interrupt_64.S   |  3 +++
 arch/powerpc/kernel/vmlinux.lds.S    | 10 ++++++++
 arch/powerpc/lib/Makefile            |  2 +-
 arch/powerpc/lib/restart_table.c     | 26 ++++++++++++++++++++
 arch/powerpc/perf/core-book3s.c      | 19 ++++++++++++---
 8 files changed, 117 insertions(+), 5 deletions(-)
 create mode 100644 arch/powerpc/lib/restart_table.c

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 1aeb9a841cc6..5d68c510e5e0 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -18,6 +18,11 @@ static inline void nap_adjust_return(struct pt_regs *regs)
 #endif
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
+extern char __end_soft_masked[];
+unsigned long search_kernel_restart_table(unsigned long addr);
+#endif
+
 struct interrupt_state {
 #ifdef CONFIG_PPC_BOOK3E_64
 	enum ctx_state ctx_state;
@@ -47,6 +52,9 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
 		 */
 		if (TRAP(regs) != 0x700)
 			CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+		BUG_ON(regs->nip < (unsigned long)__end_soft_masked));
+		if (arch_irq_disabled_regs(regs))
+			BUG_ON(search_kernel_restart_table(regs->nip));
 	}
 #endif
 
@@ -126,6 +134,8 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte
 {
 #ifdef CONFIG_PPC64
 #ifdef CONFIG_PPC_BOOK3S_64
+	if (!(regs->msr & MSR_PR) && regs->nip < (unsigned long)__end_soft_masked)
+		regs->softe = IRQS_ALL_DISABLED;
 	state->irq_soft_mask = local_paca->irq_soft_mask;
 	state->irq_happened = local_paca->irq_happened;
 
@@ -163,6 +173,14 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter
 
 	nap_adjust_return(regs);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+	if (arch_irq_disabled_regs(regs)) {
+		unsigned long rst = search_kernel_restart_table(regs->nip);
+		if (rst)
+			regs_set_return_ip(regs, rst);
+	}
+#endif
+
 #ifdef CONFIG_PPC64
 	this_cpu_set_ftrace_enabled(state->ftrace_enabled);
 
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index c1199f6c75a3..da084d695cfe 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -791,6 +791,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)
 	stringify_in_c(.long (_target) - . ;)	\
 	stringify_in_c(.previous)
 
+#define RESTART_TABLE(_start, _end, _target)	\
+	stringify_in_c(.section __restart_table,"a";)\
+	stringify_in_c(.balign 8;)		\
+	stringify_in_c(.llong (_start);)	\
+	stringify_in_c(.llong (_end);)		\
+	stringify_in_c(.llong (_target);)	\
+	stringify_in_c(.previous)
+
 #ifdef CONFIG_PPC_FSL_BOOK3E
 #define BTB_FLUSH(reg)			\
 	lis reg,BUCSR_INIT@h;		\
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index a01f69e774b5..0615c2e724ea 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -514,8 +514,9 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 
 		/* Kernel code running below __end_interrupts is implicitly
 		 * soft-masked */
-		LOAD_HANDLER(r10, __end_interrupts)
+		LOAD_HANDLER(r10, __end_soft_masked)
 		cmpld	r11,r10
+
 		li	r10,IMASK
 		blt-	1f
 
@@ -673,6 +674,28 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	__GEN_COMMON_BODY \name
 .endm
 
+.macro SEARCH_RESTART_TABLE
+	LOAD_REG_IMMEDIATE_SYM(r9, r12, __start___restart_table)
+	LOAD_REG_IMMEDIATE_SYM(r10, r12, __stop___restart_table)
+300:
+	cmpd	r9,r10
+	beq	302f
+	ld	r12,0(r9)
+	cmpld	r11,r12
+	blt	301f
+	ld	r12,8(r9)
+	cmpld	r11,r12
+	bge	301f
+	ld	r12,16(r9)
+	b	303f
+301:
+	addi	r9,r9,24
+	b	300b
+302:
+	li	r12,0
+303:
+.endm
+
 /*
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
@@ -2758,6 +2781,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
 	mtmsrd	r9,1
 
 	kuap_restore_amr r9, r10
+
 	EXCEPTION_RESTORE_REGS hsrr=0
 	RFI_TO_KERNEL
 
@@ -2815,6 +2839,16 @@ masked_interrupt:
 	stb	r9,PACASRR_VALID(r13)
 	.endif
 
+	SEARCH_RESTART_TABLE
+	cmpdi	r12,0
+	beq	3f
+	.if \hsrr
+	mtspr	SPRN_HSRR0,r12
+	.else
+	mtspr	SPRN_SRR0,r12
+	.endif
+3:
+
 	ld	r9,PACA_EXGEN+EX_CTR(r13)
 	mtctr	r9
 	lwz	r9,PACA_EXGEN+EX_CCR(r13)
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index e121829ef717..9b44f6d3463b 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -605,4 +605,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 interrupt_return_macro srr
 interrupt_return_macro hsrr
 
+	.globl __end_soft_masked
+__end_soft_masked:
+DEFINE_FIXED_SYMBOL(__end_soft_masked)
 #endif /* CONFIG_PPC_BOOK3S */
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index e0548b4950de..211c82e96420 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -9,6 +9,14 @@
 #define EMITS_PT_NOTE
 #define RO_EXCEPTION_TABLE_ALIGN	0
 
+#define RESTART_TABLE(align)						\
+	. = ALIGN(align);						\
+	__restart_table : AT(ADDR(__restart_table) - LOAD_OFFSET) {	\
+		__start___restart_table = .;				\
+		KEEP(*(__restart_table))				\
+		__stop___restart_table = .;				\
+	}
+
 #include <asm/page.h>
 #include <asm-generic/vmlinux.lds.h>
 #include <asm/cache.h>
@@ -124,6 +132,8 @@ SECTIONS
 	RO_DATA(PAGE_SIZE)
 
 #ifdef CONFIG_PPC64
+	RESTART_TABLE(8)
+
 	. = ALIGN(8);
 	__stf_entry_barrier_fixup : AT(ADDR(__stf_entry_barrier_fixup) - LOAD_OFFSET) {
 		__start___stf_entry_barrier_fixup = .;
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 69a91b571845..5d90bebcf9cf 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -36,7 +36,7 @@ extra-$(CONFIG_PPC64)	+= crtsavres.o
 endif
 
 obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o copypage_power7.o \
-			       memcpy_power7.o
+			       memcpy_power7.o restart_table.o
 
 obj64-y	+= copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
 	   memcpy_64.o copy_mc_64.o
diff --git a/arch/powerpc/lib/restart_table.c b/arch/powerpc/lib/restart_table.c
new file mode 100644
index 000000000000..f2df84c46f24
--- /dev/null
+++ b/arch/powerpc/lib/restart_table.c
@@ -0,0 +1,26 @@
+struct restart_table_entry {
+	unsigned long start;
+	unsigned long end;
+	unsigned long fixup;
+};
+
+extern struct restart_table_entry __start___restart_table[];
+extern struct restart_table_entry __stop___restart_table[];
+
+/* Given an address, look for it in the kernel exception table */
+unsigned long search_kernel_restart_table(unsigned long addr)
+{
+	struct restart_table_entry *rte = __start___restart_table;
+
+	while (rte < __stop___restart_table) {
+		unsigned long start = rte->start;
+		unsigned long end = rte->end;
+		unsigned long fixup = rte->fixup;
+
+		if (addr >= start && addr < end)
+			return fixup;
+
+		rte++;
+	}
+	return 0;
+}
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 08643cba1494..4afd292a7a5e 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -332,9 +332,15 @@ static inline void perf_read_regs(struct pt_regs *regs)
  * If interrupts were soft-disabled when a PMU interrupt occurs, treat
  * it as an NMI.
  */
+extern char __end_soft_masked[];
 static inline int perf_intr_is_nmi(struct pt_regs *regs)
 {
-	return (regs->softe & IRQS_DISABLED);
+	if (regs->softe & IRQS_DISABLED)
+		return true;
+
+	if (!(regs->msr & MSR_PR) && regs->nip < (unsigned long)__end_soft_masked)
+		return true;
+	return false;
 }
 
 /*
@@ -2214,6 +2220,8 @@ static bool pmc_overflow(unsigned long val)
 	return false;
 }
 
+unsigned long search_kernel_restart_table(unsigned long addr);
+
 /*
  * Performance monitor interrupt stuff
  */
@@ -2301,10 +2309,15 @@ static void __perf_event_interrupt(struct pt_regs *regs)
 	 */
 	write_mmcr0(cpuhw, cpuhw->mmcr.mmcr0);
 
-	if (nmi)
+	if (nmi) {
+		unsigned long rst = search_kernel_restart_table(regs->nip);
+		if (rst)
+			regs_set_return_ip(regs, rst);
+
 		nmi_exit();
-	else
+	} else {
 		irq_exit();
+	}
 }
 
 static void perf_event_interrupt(struct pt_regs *regs)
-- 
2.23.0


^ permalink raw reply related

* [RFC PATCH 8/9] powerpc/64s: interrupt soft-enable race fix
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

Prevent interrupt restore from allowing racing hard interrupts going
ahead of previous soft-pending ones, by using the soft-masked restart
handler to allow a store to clear the soft-mask while knowing nothing
is soft-pending.

This probably doesn't matter much in practice, but it's a simple
demonstrator / test case to exercise the restart table logic.

XXX: 64e can't do this unless it adds restart table support

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/irq.c | 79 +++++++++++++++++++++++++--------------
 1 file changed, 50 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index c8185f709d26..9671468b2c51 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -317,52 +317,73 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	unsigned char irq_happened;
 
 	/* Write the new soft-enabled value */
-	irq_soft_mask_set(mask);
-	if (mask)
+	if (mask) {
+		irq_soft_mask_set(mask);
 		return;
+	}
 
 	/*
-	 * From this point onward, we can take interrupts, preempt,
-	 * etc... unless we got hard-disabled. We check if an event
-	 * happened. If none happened, we know we can just return.
-	 *
-	 * We may have preempted before the check below, in which case
-	 * we are checking the "new" CPU instead of the old one. This
-	 * is only a problem if an event happened on the "old" CPU.
+	 * After the stb, interrupts are unmasked and there are no interrupts
+	 * pending replay. The restart sequence makes this atomic with
+	 * respect to soft-masked interrupts. If this was just a simple code
+	 * sequence, a soft-masked interrupt could become pending right after
+	 * the comparison and before the stb.
 	 *
-	 * External interrupt events will have caused interrupts to
-	 * be hard-disabled, so there is no problem, we
-	 * cannot have preempted.
+	 * This allows interrupts to be unmasked without hard disabling, and
+	 * also without new hard interrupts coming in ahead of pending ones.
 	 */
+	asm_volatile_goto(
+"1:					\n"
+"		lbz	9,%0(13)	\n"
+"		cmpwi	9,0		\n"
+"		bne	%l[happened]	\n"
+"		stb	9,%1(13)	\n"
+"2:					\n"
+		RESTART_TABLE(1b, 2b, 1b)
+	: : "i" (offsetof(struct paca_struct, irq_happened)),
+	    "i" (offsetof(struct paca_struct, irq_soft_mask))
+	: "cr0", "r9"
+	: happened);
+
+	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+		WARN_ON_ONCE(!(mfmsr() & MSR_EE));
+
+	return;
+
+happened:
 	irq_happened = get_irq_happened();
-	if (!irq_happened) {
+	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+		WARN_ON_ONCE(!irq_happened);
+
+	if (irq_happened == PACA_IRQ_HARD_DIS) {
 		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
-			WARN_ON_ONCE(!(mfmsr() & MSR_EE));
+			WARN_ON_ONCE(mfmsr() & MSR_EE);
+		irq_soft_mask_set(IRQS_ENABLED);
+		local_paca->irq_happened = 0;
+		__hard_irq_enable();
 		return;
 	}
 
-	/* We need to hard disable to replay. */
+	/* Have interrupts to replay, need to hard disable first */
 	if (!(irq_happened & PACA_IRQ_HARD_DIS)) {
-		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
-			WARN_ON_ONCE(!(mfmsr() & MSR_EE));
+		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
+			if (!(mfmsr() & MSR_EE)) {
+				/*
+				 * An interrupt could have come in and cleared
+				 * MSR[EE] and set IRQ_HARD_DIS, so check
+				 * IRQ_HARD_DIS again and warn if it is still
+				 * clear.
+				 */
+				irq_happened = get_irq_happened();
+				WARN_ON_ONCE(!(irq_happened & PACA_IRQ_HARD_DIS));
+			}
+		}
 		__hard_irq_disable();
 	} else {
-		/*
-		 * We should already be hard disabled here. We had bugs
-		 * where that wasn't the case so let's dbl check it and
-		 * warn if we are wrong. Only do that when IRQ tracing
-		 * is enabled as mfmsr() can be costly.
-		 */
 		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
 			if (WARN_ON_ONCE(mfmsr() & MSR_EE))
 				__hard_irq_disable();
 		}
-
-		if (irq_happened == PACA_IRQ_HARD_DIS) {
-			local_paca->irq_happened = 0;
-			__hard_irq_enable();
-			return;
-		}
 	}
 
 	/*
-- 
2.23.0


^ permalink raw reply related

* [RFC PATCH 9/9] powerpc/64s: use interrupt restart table to speed up return from interrupt
From: Nicholas Piggin @ 2020-11-06 15:59 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201106155929.2246055-1-npiggin@gmail.com>

The restart table facility is used to return from interrupt without
disabling MSR[EE] or MSR[RI].

Interrupt return code is put into the low soft-masked region, and
critical code that has return SRRs set, soft-masked state set with no
soft-pending interrupts to replay, and no exit work, is put into
alternate return sections. r1 is saved in the paca. If an interrupt hits
at this point, our fixup location re-loads r1 and from there we can get
to regs, critical things are re-loaded, and we go around and do the
exit sequence again.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/asm-prototypes.h |   4 +-
 arch/powerpc/include/asm/paca.h           |   1 +
 arch/powerpc/include/asm/ptrace.h         |  13 +-
 arch/powerpc/kernel/asm-offsets.c         |   3 +
 arch/powerpc/kernel/interrupt_64.S        | 119 ++++++++++-
 arch/powerpc/kernel/syscall_64.c          | 242 +++++++++++++---------
 6 files changed, 277 insertions(+), 105 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 939f3c94c8f3..e0b0fc2913c1 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -71,8 +71,8 @@ void __init machine_init(u64 dt_ptr);
 #endif
 long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
 notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs, long scv);
-notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
-notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
+notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs);
+notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs);
 
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
 		      u32 len_high, u32 len_low);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 58e9995c3184..ab123dc85b81 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -167,6 +167,7 @@ struct paca_struct {
 	u64 kstack;			/* Saved Kernel stack addr */
 	u64 saved_r1;			/* r1 save for RTAS calls or PM or EE=0 */
 	u64 saved_msr;			/* MSR saved here by enter_rtas */
+	u64 exit_save_r1;		/* Syscall/interrupt R1 save */
 #ifdef CONFIG_PPC_BOOK3E
 	u16 trap_save;			/* Used when bad stack is encountered */
 #endif
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 2c3e773ce292..cd49769d4194 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -50,14 +50,23 @@ struct pt_regs
 
 	union {
 		struct {
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S_64
 			unsigned long ppr;
+			unsigned long exit_result;
 #endif
 #ifdef CONFIG_PPC_KUAP
 			unsigned long kuap;
 #endif
 		};
-		unsigned long __pad[2];	/* Maintain 16 byte interrupt stack alignment */
+
+		/* Maintain 16 byte interrupt stack alignment */
+#ifdef CONFIG_PPC_KUAP
+#ifdef CONFIG_PPC_BOOK3S_64
+		unsigned long __pad[4];
+#else
+		unsigned long __pad[2];
+#endif
+#endif
 	};
 };
 #endif
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ea13e35dd511..eec4e92bd938 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -290,6 +290,7 @@ int main(void)
 	OFFSET(ACCOUNT_STARTTIME_USER, paca_struct, accounting.starttime_user);
 	OFFSET(ACCOUNT_USER_TIME, paca_struct, accounting.utime);
 	OFFSET(ACCOUNT_SYSTEM_TIME, paca_struct, accounting.stime);
+	OFFSET(PACA_EXIT_SAVE_R1, paca_struct, exit_save_r1);
 #ifdef CONFIG_PPC_BOOK3E
 	OFFSET(PACA_TRAP_SAVE, paca_struct, trap_save);
 #endif
@@ -353,7 +354,9 @@ int main(void)
 	STACK_PT_REGS_OFFSET(_ESR, dsisr);
 #else /* CONFIG_PPC64 */
 	STACK_PT_REGS_OFFSET(SOFTE, softe);
+#ifdef CONFIG_PPC_BOOK3S_64
 	STACK_PT_REGS_OFFSET(_PPR, ppr);
+#endif
 #endif /* CONFIG_PPC64 */
 
 #ifdef CONFIG_PPC_KUAP
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 9b44f6d3463b..a686d625b8ee 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -113,9 +113,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	bl	system_call_exception
 
 .Lsyscall_vectored_\name\()_exit:
-	addi    r4,r1,STACK_FRAME_OVERHEAD
+	addi	r4,r1,STACK_FRAME_OVERHEAD
 	li	r5,1 /* scv */
 	bl	syscall_exit_prepare
+	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
+.Lsyscall_vectored_\name\()_rst_start:
+	lbz	r11,PACAIRQHAPPENED(r13)
+	andi.	r11,r11,(~PACA_IRQ_HARD_DIS)@l
+	bne-	.Lsyscall_vectored_\name\()_restart
+	li	r11,IRQS_ENABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
+	li	r11,0
+	stb	r11,PACAIRQHAPPENED(r13) # clear out possible HARD_DIS
 
 	ld	r2,_CCR(r1)
 	ld	r4,_NIP(r1)
@@ -165,8 +174,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	b	.	/* prevent speculative execution */
 
 .Lsyscall_vectored_\name\()_restore_regs:
-	li	r3,0
-	mtmsrd	r3,1
 	mtspr	SPRN_SRR0,r4
 	mtspr	SPRN_SRR1,r5
 
@@ -184,9 +191,26 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	REST_2GPRS(12, r1)
 	ld	r1,GPR1(r1)
 	RFI_TO_USER
+.Lsyscall_vectored_\name\()_rst_end:
+
+.Lsyscall_vectored_\name\()_restart:
+	GET_PACA(r13)
+	ld	r1,PACA_EXIT_SAVE_R1(r13)
+	ld	r2,PACATOC(r13)
+	ld	r3,RESULT(r1)
+	addi	r4,r1,STACK_FRAME_OVERHEAD
+	li	r11,IRQS_ALL_DISABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
+	bl	syscall_exit_restart
+	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
+	b	.Lsyscall_vectored_\name\()_rst_start
+
+RESTART_TABLE(.Lsyscall_vectored_\name\()_rst_start, .Lsyscall_vectored_\name\()_rst_end, .Lsyscall_vectored_\name\()_restart)
+
 .endm
 
 system_call_vectored common 0x3000
+
 /*
  * We instantiate another entry copy for the SIGILL variant, with TRAP=0x7ff0
  * which is tested by system_call_exception when r0 is -1 (as set by vector
@@ -289,9 +313,18 @@ END_BTB_FLUSH_SECTION
 	bl	system_call_exception
 
 .Lsyscall_exit:
-	addi    r4,r1,STACK_FRAME_OVERHEAD
+	addi	r4,r1,STACK_FRAME_OVERHEAD
 	li	r5,0 /* !scv */
 	bl	syscall_exit_prepare
+	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
+.Lsyscall_rst_start:
+	lbz	r11,PACAIRQHAPPENED(r13)
+	andi.	r11,r11,(~PACA_IRQ_HARD_DIS)@l
+	bne-	.Lsyscall_restart
+	li	r11,IRQS_ENABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
+	li	r11,0
+	stb	r11,PACAIRQHAPPENED(r13) # clear out possible HARD_DIS
 
 	ld	r2,_CCR(r1)
 	ld	r6,_LINK(r1)
@@ -356,6 +389,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	REST_8GPRS(4, r1)
 	ld	r12,GPR12(r1)
 	b	.Lsyscall_restore_regs_cont
+.Lsyscall_rst_end:
+
+.Lsyscall_restart:
+	GET_PACA(r13)
+	ld	r1,PACA_EXIT_SAVE_R1(r13)
+	ld	r2,PACATOC(r13)
+	ld	r3,RESULT(r1)
+	addi	r4,r1,STACK_FRAME_OVERHEAD
+	li	r11,IRQS_ALL_DISABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
+	bl	syscall_exit_restart
+	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
+	b	.Lsyscall_rst_start
+
+RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, .Lsyscall_restart)
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 .Ltabort_syscall:
@@ -439,16 +487,32 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\())
 	ld	r4,_MSR(r1)
 	andi.	r0,r4,MSR_PR
 	beq	.Lkernel_interrupt_return_\srr
+.Linterrupt_return_\srr\()_user:
+	li	r11,IRQS_ALL_DISABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	interrupt_exit_user_prepare
 	cmpdi	r3,0
 	bne-	.Lrestore_nvgprs_\srr
+.Lrestore_nvgprs_\srr\()_cont:
+
+	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
+.Linterrupt_return_\srr\()_user_rst_start:
+	lbz	r11,PACAIRQHAPPENED(r13)
+	andi.	r11,r11,(~PACA_IRQ_HARD_DIS)@l
+	bne-	.Linterrupt_return_\srr\()_user_restart
+	li	r11,IRQS_ENABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
+	li	r11,0
+	stb	r11,PACAIRQHAPPENED(r13) # clear out possible HARD_DIS
 
 .Lfast_user_interrupt_return_\srr:
 BEGIN_FTR_SECTION
 	ld	r10,_PPR(r1)
 	mtspr	SPRN_PPR,r10
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+	lbz	r4,PACAIRQSOFTMASK(r13)
+	tdnei	r4,IRQS_ENABLED
 	.ifc \srr,srr
 	lbz	r4,PACASRR_VALID(r13)
 	.else
@@ -503,16 +567,46 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 	HRFI_TO_USER
 	.endif
 	b	.	/* prevent speculative execution */
+.Linterrupt_return_\srr\()_user_rst_end:
 
 .Lrestore_nvgprs_\srr\():
 	REST_NVGPRS(r1)
-	b	.Lfast_user_interrupt_return_\srr
+	b	.Lrestore_nvgprs_\srr\()_cont
+
+.Linterrupt_return_\srr\()_user_restart:
+	GET_PACA(r13)
+	ld	r1,PACA_EXIT_SAVE_R1(r13)
+	ld	r2,PACATOC(r13)
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	li	r11,IRQS_ALL_DISABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
+	bl	interrupt_exit_user_restart
+	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
+	b	.Linterrupt_return_\srr\()_user_rst_start
+
+RESTART_TABLE(.Linterrupt_return_\srr\()_user_rst_start, .Linterrupt_return_\srr\()_user_rst_end, .Linterrupt_return_\srr\()_user_restart)
 
 	.balign IFETCH_ALIGN_BYTES
 .Lkernel_interrupt_return_\srr\():
+.Linterrupt_return_\srr\()_kernel:
+	li	r11,IRQS_ALL_DISABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	interrupt_exit_kernel_prepare
 
+	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
+.Linterrupt_return_\srr\()_kernel_rst_start:
+	lbz	r11,SOFTE(r1)
+	cmpwi	r11,IRQS_ENABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
+	bne	1f
+	lbz	r11,PACAIRQHAPPENED(r13)
+	andi.	r11,r11,(~PACA_IRQ_HARD_DIS)@l
+	bne-	.Linterrupt_return_\srr\()_kernel_restart
+	li	r11,0
+	stb	r11,PACAIRQHAPPENED(r13) # clear out possible HARD_DIS
+1:
+
 .Lfast_kernel_interrupt_return_\srr\():
 	cmpdi	cr1,r3,0
 	.ifc \srr,srr
@@ -600,6 +694,21 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 	HRFI_TO_KERNEL
 	.endif
 	b	.	/* prevent speculative execution */
+.Linterrupt_return_\srr\()_kernel_rst_end:
+
+.Linterrupt_return_\srr\()_kernel_restart:
+	GET_PACA(r13)
+	ld	r1,PACA_EXIT_SAVE_R1(r13)
+	ld	r2,PACATOC(r13)
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	li	r11,IRQS_ALL_DISABLED
+	stb	r11,PACAIRQSOFTMASK(r13)
+	bl	interrupt_exit_kernel_restart
+	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
+	b	.Linterrupt_return_\srr\()_kernel_rst_start
+
+RESTART_TABLE(.Linterrupt_return_\srr\()_kernel_rst_start, .Linterrupt_return_\srr\()_kernel_rst_end, .Linterrupt_return_\srr\()_kernel_restart)
+
 .endm
 
 interrupt_return_macro srr
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 672f2a796487..5add073a21b7 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -109,87 +109,27 @@ notrace long system_call_exception(long r3, long r4, long r5,
 	return f(r3, r4, r5, r6, r7, r8);
 }
 
-/*
- * local irqs must be disabled. Returns false if the caller must re-enable
- * them, check for new work, and try again.
- */
-static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri)
+static notrace inline bool prep_irq_for_enabled_exit(void)
 {
 	/* This must be done with RI=1 because tracing may touch vmaps */
 	trace_hardirqs_on();
 
-	/* This pattern matches prep_irq_for_idle */
-	if (clear_ri)
-		__hard_EE_RI_disable();
-	else
-		__hard_irq_disable();
 	if (unlikely(lazy_irq_pending_nocheck())) {
-		/* Took an interrupt, may have more exit work to do. */
-		if (clear_ri)
-			__hard_RI_enable();
 		trace_hardirqs_off();
-		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
-
 		return false;
 	}
-	local_paca->irq_happened = 0;
-	irq_soft_mask_set(IRQS_ENABLED);
 
 	return true;
 }
 
-/*
- * This should be called after a syscall returns, with r3 the return value
- * from the syscall. If this function returns non-zero, the system call
- * exit assembly should additionally load all GPR registers and CTR and XER
- * from the interrupt frame.
- *
- * The function graph tracer can not trace the return side of this function,
- * because RI=0 and soft mask state is "unreconciled", so it is marked notrace.
- */
-notrace unsigned long syscall_exit_prepare(unsigned long r3,
-					   struct pt_regs *regs,
-					   long scv)
+notrace unsigned long syscall_exit_prepare_main(unsigned long r3,
+					   struct pt_regs *regs)
 {
 	unsigned long *ti_flagsp = &current_thread_info()->flags;
 	unsigned long ti_flags;
 	unsigned long ret = 0;
 
-	CT_WARN_ON(ct_state() == CONTEXT_USER);
-
-	kuap_check_amr();
-
-	regs->result = r3;
-
-	/* Check whether the syscall is issued inside a restartable sequence */
-	rseq_syscall(regs);
-
-	ti_flags = *ti_flagsp;
-
-	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) {
-		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
-			r3 = -r3;
-			regs->ccr |= 0x10000000; /* Set SO bit in CR */
-		}
-	}
-
-	if (unlikely(ti_flags & _TIF_PERSYSCALL_MASK)) {
-		if (ti_flags & _TIF_RESTOREALL)
-			ret = _TIF_RESTOREALL;
-		else
-			regs->gpr[3] = r3;
-		clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp);
-	} else {
-		regs->gpr[3] = r3;
-	}
-
-	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
-		do_syscall_trace_leave(regs);
-		ret |= _TIF_RESTOREALL;
-	}
-
 again:
-	local_irq_disable();
 	ti_flags = READ_ONCE(*ti_flagsp);
 	while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
 		local_irq_enable();
@@ -233,15 +173,16 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 		}
 	}
 
-	user_enter_irqoff();
-
-	/* scv need not set RI=0 because SRRs are not used */
-	if (unlikely(!prep_irq_for_enabled_exit(!scv))) {
-		user_exit_irqoff();
+	if (unlikely(lazy_irq_pending_nocheck())) {
 		local_irq_enable();
+		local_irq_disable();
 		goto again;
 	}
 
+	trace_hardirqs_on();
+
+	user_enter_irqoff();
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	local_paca->tm_scratch = regs->msr;
 #endif
@@ -251,17 +192,95 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 	return ret;
 }
 
+/*
+ * This should be called after a syscall returns, with r3 the return value
+ * from the syscall. If this function returns non-zero, the system call
+ * exit assembly should additionally load all GPR registers and CTR and XER
+ * from the interrupt frame.
+ *
+ * The function graph tracer can not trace the return side of this function,
+ * because RI=0 and soft mask state is "unreconciled", so it is marked notrace.
+ */
+notrace unsigned long syscall_exit_prepare(unsigned long r3,
+					   struct pt_regs *regs,
+					   long scv)
+{
+	unsigned long *ti_flagsp = &current_thread_info()->flags;
+	unsigned long ti_flags;
+	unsigned long ret = 0;
+
+	CT_WARN_ON(ct_state() == CONTEXT_USER);
+
+	kuap_check_amr();
+
+	regs->result = r3;
+
+	/* Check whether the syscall is issued inside a restartable sequence */
+	rseq_syscall(regs);
+
+	ti_flags = *ti_flagsp;
+
+	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) {
+		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
+			r3 = -r3;
+			regs->ccr |= 0x10000000; /* Set SO bit in CR */
+		}
+	}
+
+	if (unlikely(ti_flags & _TIF_PERSYSCALL_MASK)) {
+		if (ti_flags & _TIF_RESTOREALL)
+			ret = _TIF_RESTOREALL;
+		else
+			regs->gpr[3] = r3;
+		clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp);
+	} else {
+		regs->gpr[3] = r3;
+	}
+
+	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
+		do_syscall_trace_leave(regs);
+		ret |= _TIF_RESTOREALL;
+	}
+
+	local_irq_disable();
+	ret |= syscall_exit_prepare_main(r3, regs);
+
+	regs->exit_result = ret;
+
+	return ret;
+}
+
+notrace unsigned long syscall_exit_restart(unsigned long r3, struct pt_regs *regs)
+{
+	/* This is called in when detecting a soft-pending interrupt as well,
+	 * so can't just have restart table returns clear SRR1[MSR] and set
+	 * PACA_IRQ_HARD_DIS here (unless the soft-pending case were to clear
+	 * MSR[EE] too).
+	 */
+	hard_irq_disable();
+
+	trace_hardirqs_off();
+	user_exit_irqoff();
+	account_cpu_user_entry();
+
+	BUG_ON(!user_mode(regs));
+
+	regs->exit_result |= syscall_exit_prepare_main(r3, regs);
+
+	return regs->exit_result;
+}
+
 #ifdef CONFIG_PPC_BOOK3S /* BOOK3E not yet using this */
-notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr)
+notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC_BOOK3E
 	struct thread_struct *ts = &current->thread;
 #endif
 	unsigned long *ti_flagsp = &current_thread_info()->flags;
 	unsigned long ti_flags;
-	unsigned long flags;
 	unsigned long ret = 0;
 
+	BUG_ON(!irqs_disabled());
 	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
 		BUG_ON(!(regs->msr & MSR_RI));
 	BUG_ON(!(regs->msr & MSR_PR));
@@ -275,8 +294,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
 	 */
 	kuap_check_amr();
 
-	local_irq_save(flags);
-
 again:
 	ti_flags = READ_ONCE(*ti_flagsp);
 	while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
@@ -310,14 +327,16 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
 		}
 	}
 
-	user_enter_irqoff();
-	if (unlikely(!prep_irq_for_enabled_exit(true))) {
-		user_exit_irqoff();
+	if (unlikely(lazy_irq_pending_nocheck())) {
 		local_irq_enable();
 		local_irq_disable();
 		goto again;
 	}
 
+	trace_hardirqs_on();
+
+	user_enter_irqoff();
+
 #ifdef CONFIG_PPC_BOOK3E
 	if (unlikely(ts->debug.dbcr0 & DBCR0_IDM)) {
 		/*
@@ -336,19 +355,21 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
 
 	account_cpu_user_exit();
 
+	regs->exit_result = ret;
+
 	return ret;
 }
 
 void unrecoverable_exception(struct pt_regs *regs);
 void preempt_schedule_irq(void);
 
-notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
+notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
 {
 	unsigned long *ti_flagsp = &current_thread_info()->flags;
-	unsigned long flags;
 	unsigned long ret = 0;
 	unsigned long amr;
 
+	BUG_ON(!irqs_disabled());
 	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
 		unrecoverable_exception(regs);
 	BUG_ON(regs->msr & MSR_PR);
@@ -362,13 +383,6 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
 
 	amr = kuap_get_and_check_amr();
 
-	if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
-		clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
-		ret = 1;
-	}
-
-	local_irq_save(flags);
-
 	if (regs->softe == IRQS_ENABLED) {
 		/* Returning to a kernel context with local irqs enabled. */
 		WARN_ON_ONCE(!(regs->msr & MSR_EE));
@@ -381,27 +395,33 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
 			}
 		}
 
-		if (unlikely(!prep_irq_for_enabled_exit(true))) {
-			/*
-			 * Can't local_irq_restore to replay if we were in
-			 * interrupt context. Must replay directly.
-			 */
-			if (irqs_disabled_flags(flags)) {
-				replay_soft_interrupts();
-			} else {
-				local_irq_restore(flags);
-				local_irq_save(flags);
-			}
+		if (unlikely(lazy_irq_pending_nocheck())) {
+			hard_irq_disable();
+			replay_soft_interrupts();
 			/* Took an interrupt, may have more exit work to do. */
 			goto again;
 		}
+		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+
+		trace_hardirqs_on();
 	} else {
 		/* Returning to a kernel context with local irqs disabled. */
-		__hard_EE_RI_disable();
 		if (regs->msr & MSR_EE)
 			local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
 	}
 
+	if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
+		/* Stack store can't be restarted (stack might have been clobbered) */
+		__hard_EE_RI_disable();
+		if (unlikely(lazy_irq_pending_nocheck()) && regs->softe == IRQS_ENABLED) {
+			local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+			__hard_RI_enable();
+			goto again;
+		}
+
+		clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
+		ret = 1;
+	}
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	local_paca->tm_scratch = regs->msr;
@@ -416,4 +436,34 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
 
 	return ret;
 }
+
+notrace unsigned long interrupt_exit_user_restart(struct pt_regs *regs)
+{
+	hard_irq_disable();
+
+	trace_hardirqs_off();
+	user_exit_irqoff();
+	account_cpu_user_entry();
+
+	BUG_ON(!user_mode(regs));
+
+	regs->exit_result |= interrupt_exit_user_prepare(regs);
+
+	return regs->exit_result;
+}
+
+notrace unsigned long interrupt_exit_kernel_restart(struct pt_regs *regs)
+{
+	hard_irq_disable();
+
+	set_kuap(AMR_KUAP_BLOCKED);
+
+	if (regs->softe == IRQS_ENABLED)
+		trace_hardirqs_off();
+
+	BUG_ON(user_mode(regs));
+
+	return interrupt_exit_kernel_prepare(regs);
+}
+
 #endif
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
From: Nathan Lynch @ 2020-11-06 16:44 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: tyreld, brking, mmc, cforno12, drt
In-Reply-To: <95c8b599-f939-ad28-5671-cfbf7436508e@linux.ibm.com>

Andrew Donnellan <ajd@linux.ibm.com> writes:
> On 30/10/20 11:10 pm, Nathan Lynch wrote:
>>> And there's a zero chance that drmgr will ever be fixed on LE?
>> 
>> It's always used the sysfs interface on LE, and the only way to provoke
>> it to attempt the syscalls is by doing something like this before
>> running the migration:
>> 
>> # echo 0 > /tmp/fake_api_version
>> # mount -o bind,ro /tmp/fake_api_version /sys/kernel/mobility/api_version
>> 
>> So I'm not aware of any circumstance that would actually motivate
>> someone to make it work on LE. I'd say it's more likely that drmgr will
>> drop its support for using the syscall altogether.
>> 
>
> Okay. librtas should handle the error fine, so I don't have any huge 
> objection to this, though perhaps a documentation patch to librtas to 
> mention that these calls are BE-only would be in order.

Good suggestion, I've drafted a documentation change as well as an
attempt to mark the appropriate APIs deprecated for LE here:

https://github.com/nathanlynch/librtas/tree/topic/document-suspend-filter

which I'll send to Tyrel if/when the Linux change is staged.

^ permalink raw reply

* [PATCH net-next] Revert ibmvnic merge do_change_param_reset into do_reset
From: Dany Madden @ 2020-11-06 19:17 UTC (permalink / raw)
  To: davem; +Cc: Dany Madden, netdev, linuxppc-dev

This reverts commit 16b5f5ce351f8709a6b518cc3cbf240c378305bf
where it restructures do_reset. There are patches being tested that
would require major rework if this is committed first. 

We will resend this after the other patches have been applied.

Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 147 ++++++++++++++++++++---------
 1 file changed, 104 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index f4167de30461..af4dfbe28d56 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1826,6 +1826,86 @@ static int ibmvnic_set_mac(struct net_device *netdev, void *p)
 	return rc;
 }
 
+/**
+ * do_change_param_reset returns zero if we are able to keep processing reset
+ * events, or non-zero if we hit a fatal error and must halt.
+ */
+static int do_change_param_reset(struct ibmvnic_adapter *adapter,
+				 struct ibmvnic_rwi *rwi,
+				 u32 reset_state)
+{
+	struct net_device *netdev = adapter->netdev;
+	int i, rc;
+
+	netdev_dbg(adapter->netdev, "Change param resetting driver (%d)\n",
+		   rwi->reset_reason);
+
+	netif_carrier_off(netdev);
+	adapter->reset_reason = rwi->reset_reason;
+
+	ibmvnic_cleanup(netdev);
+
+	if (reset_state == VNIC_OPEN) {
+		rc = __ibmvnic_close(netdev);
+		if (rc)
+			return rc;
+	}
+
+	release_resources(adapter);
+	release_sub_crqs(adapter, 1);
+	release_crq_queue(adapter);
+
+	adapter->state = VNIC_PROBED;
+
+	rc = init_crq_queue(adapter);
+
+	if (rc) {
+		netdev_err(adapter->netdev,
+			   "Couldn't initialize crq. rc=%d\n", rc);
+		return rc;
+	}
+
+	rc = ibmvnic_reset_init(adapter, true);
+	if (rc)
+		return IBMVNIC_INIT_FAILED;
+
+	/* If the adapter was in PROBE state prior to the reset,
+	 * exit here.
+	 */
+	if (reset_state == VNIC_PROBED)
+		return 0;
+
+	rc = ibmvnic_login(netdev);
+	if (rc) {
+		adapter->state = reset_state;
+		return rc;
+	}
+
+	rc = init_resources(adapter);
+	if (rc)
+		return rc;
+
+	ibmvnic_disable_irqs(adapter);
+
+	adapter->state = VNIC_CLOSED;
+
+	if (reset_state == VNIC_CLOSED)
+		return 0;
+
+	rc = __ibmvnic_open(netdev);
+	if (rc)
+		return IBMVNIC_OPEN_FAILED;
+
+	/* refresh device's multicast list */
+	ibmvnic_set_multi(netdev);
+
+	/* kick napi */
+	for (i = 0; i < adapter->req_rx_queues; i++)
+		napi_schedule(&adapter->napi[i]);
+
+	return 0;
+}
+
 /**
  * do_reset returns zero if we are able to keep processing reset events, or
  * non-zero if we hit a fatal error and must halt.
@@ -1841,12 +1921,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 	netdev_dbg(adapter->netdev, "Re-setting driver (%d)\n",
 		   rwi->reset_reason);
 
-	adapter->reset_reason = rwi->reset_reason;
-	/* requestor of VNIC_RESET_CHANGE_PARAM already has the rtnl lock */
-	if (!(adapter->reset_reason == VNIC_RESET_CHANGE_PARAM))
-		rtnl_lock();
+	rtnl_lock();
 
 	netif_carrier_off(netdev);
+	adapter->reset_reason = rwi->reset_reason;
 
 	old_num_rx_queues = adapter->req_rx_queues;
 	old_num_tx_queues = adapter->req_tx_queues;
@@ -1858,37 +1936,25 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 	if (reset_state == VNIC_OPEN &&
 	    adapter->reset_reason != VNIC_RESET_MOBILITY &&
 	    adapter->reset_reason != VNIC_RESET_FAILOVER) {
-		if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
-			rc = __ibmvnic_close(netdev);
-			if (rc)
-				goto out;
-		} else {
-			adapter->state = VNIC_CLOSING;
-
-			/* Release the RTNL lock before link state change and
-			 * re-acquire after the link state change to allow
-			 * linkwatch_event to grab the RTNL lock and run during
-			 * a reset.
-			 */
-			rtnl_unlock();
-			rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
-			rtnl_lock();
-			if (rc)
-				goto out;
+		adapter->state = VNIC_CLOSING;
 
-			if (adapter->state != VNIC_CLOSING) {
-				rc = -1;
-				goto out;
-			}
+		/* Release the RTNL lock before link state change and
+		 * re-acquire after the link state change to allow
+		 * linkwatch_event to grab the RTNL lock and run during
+		 * a reset.
+		 */
+		rtnl_unlock();
+		rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
+		rtnl_lock();
+		if (rc)
+			goto out;
 
-			adapter->state = VNIC_CLOSED;
+		if (adapter->state != VNIC_CLOSING) {
+			rc = -1;
+			goto out;
 		}
-	}
 
-	if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
-		release_resources(adapter);
-		release_sub_crqs(adapter, 1);
-		release_crq_queue(adapter);
+		adapter->state = VNIC_CLOSED;
 	}
 
 	if (adapter->reset_reason != VNIC_RESET_NON_FATAL) {
@@ -1897,9 +1963,7 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 		 */
 		adapter->state = VNIC_PROBED;
 
-		if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
-			rc = init_crq_queue(adapter);
-		} else if (adapter->reset_reason == VNIC_RESET_MOBILITY) {
+		if (adapter->reset_reason == VNIC_RESET_MOBILITY) {
 			rc = ibmvnic_reenable_crq_queue(adapter);
 			release_sub_crqs(adapter, 1);
 		} else {
@@ -1939,11 +2003,7 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 			goto out;
 		}
 
-		if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
-			rc = init_resources(adapter);
-			if (rc)
-				goto out;
-		} else if (adapter->req_rx_queues != old_num_rx_queues ||
+		if (adapter->req_rx_queues != old_num_rx_queues ||
 		    adapter->req_tx_queues != old_num_tx_queues ||
 		    adapter->req_rx_add_entries_per_subcrq !=
 		    old_num_rx_slots ||
@@ -2004,9 +2064,7 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 	rc = 0;
 
 out:
-	/* requestor of VNIC_RESET_CHANGE_PARAM should still hold the rtnl lock */
-	if (!(adapter->reset_reason == VNIC_RESET_CHANGE_PARAM))
-		rtnl_unlock();
+	rtnl_unlock();
 
 	return rc;
 }
@@ -2140,7 +2198,10 @@ static void __ibmvnic_reset(struct work_struct *work)
 		}
 		spin_unlock_irqrestore(&adapter->state_lock, flags);
 
-		if (adapter->force_reset_recovery) {
+		if (rwi->reset_reason == VNIC_RESET_CHANGE_PARAM) {
+			/* CHANGE_PARAM requestor holds rtnl_lock */
+			rc = do_change_param_reset(adapter, rwi, reset_state);
+		} else if (adapter->force_reset_recovery) {
 			/* Transport event occurred during previous reset */
 			if (adapter->wait_for_reset) {
 				/* Previous was CHANGE_PARAM; caller locked */
-- 
2.18.2


^ permalink raw reply related

* Re: [PATCH net-next] Revert ibmvnic merge do_change_param_reset into do_reset
From: ljp @ 2020-11-06 19:30 UTC (permalink / raw)
  To: Dany Madden, wvoigt; +Cc: netdev, linuxppc-dev, davem, Linuxppc-dev
In-Reply-To: <20201106191745.1679846-1-drt@linux.ibm.com>

On 2020-11-06 13:17, Dany Madden wrote:
> This reverts commit 16b5f5ce351f8709a6b518cc3cbf240c378305bf
> where it restructures do_reset. There are patches being tested that
> would require major rework if this is committed first.
> 
> We will resend this after the other patches have been applied.

I discussed with my manager, and he has agreed not revert this one
since it is in the net-next tree and will not affect net tree for
current bug fix patches.

Sorry for the confusion.

Thanks,
Lijun

> 
> Signed-off-by: Dany Madden <drt@linux.ibm.com>
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 147 ++++++++++++++++++++---------
>  1 file changed, 104 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c
> b/drivers/net/ethernet/ibm/ibmvnic.c
> index f4167de30461..af4dfbe28d56 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1826,6 +1826,86 @@ static int ibmvnic_set_mac(struct net_device
> *netdev, void *p)
>  	return rc;
>  }
> 
> +/**
> + * do_change_param_reset returns zero if we are able to keep 
> processing reset
> + * events, or non-zero if we hit a fatal error and must halt.
> + */
> +static int do_change_param_reset(struct ibmvnic_adapter *adapter,
> +				 struct ibmvnic_rwi *rwi,
> +				 u32 reset_state)
> +{
> +	struct net_device *netdev = adapter->netdev;
> +	int i, rc;
> +
> +	netdev_dbg(adapter->netdev, "Change param resetting driver (%d)\n",
> +		   rwi->reset_reason);
> +
> +	netif_carrier_off(netdev);
> +	adapter->reset_reason = rwi->reset_reason;
> +
> +	ibmvnic_cleanup(netdev);
> +
> +	if (reset_state == VNIC_OPEN) {
> +		rc = __ibmvnic_close(netdev);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	release_resources(adapter);
> +	release_sub_crqs(adapter, 1);
> +	release_crq_queue(adapter);
> +
> +	adapter->state = VNIC_PROBED;
> +
> +	rc = init_crq_queue(adapter);
> +
> +	if (rc) {
> +		netdev_err(adapter->netdev,
> +			   "Couldn't initialize crq. rc=%d\n", rc);
> +		return rc;
> +	}
> +
> +	rc = ibmvnic_reset_init(adapter, true);
> +	if (rc)
> +		return IBMVNIC_INIT_FAILED;
> +
> +	/* If the adapter was in PROBE state prior to the reset,
> +	 * exit here.
> +	 */
> +	if (reset_state == VNIC_PROBED)
> +		return 0;
> +
> +	rc = ibmvnic_login(netdev);
> +	if (rc) {
> +		adapter->state = reset_state;
> +		return rc;
> +	}
> +
> +	rc = init_resources(adapter);
> +	if (rc)
> +		return rc;
> +
> +	ibmvnic_disable_irqs(adapter);
> +
> +	adapter->state = VNIC_CLOSED;
> +
> +	if (reset_state == VNIC_CLOSED)
> +		return 0;
> +
> +	rc = __ibmvnic_open(netdev);
> +	if (rc)
> +		return IBMVNIC_OPEN_FAILED;
> +
> +	/* refresh device's multicast list */
> +	ibmvnic_set_multi(netdev);
> +
> +	/* kick napi */
> +	for (i = 0; i < adapter->req_rx_queues; i++)
> +		napi_schedule(&adapter->napi[i]);
> +
> +	return 0;
> +}
> +
>  /**
>   * do_reset returns zero if we are able to keep processing reset 
> events, or
>   * non-zero if we hit a fatal error and must halt.
> @@ -1841,12 +1921,10 @@ static int do_reset(struct ibmvnic_adapter 
> *adapter,
>  	netdev_dbg(adapter->netdev, "Re-setting driver (%d)\n",
>  		   rwi->reset_reason);
> 
> -	adapter->reset_reason = rwi->reset_reason;
> -	/* requestor of VNIC_RESET_CHANGE_PARAM already has the rtnl lock */
> -	if (!(adapter->reset_reason == VNIC_RESET_CHANGE_PARAM))
> -		rtnl_lock();
> +	rtnl_lock();
> 
>  	netif_carrier_off(netdev);
> +	adapter->reset_reason = rwi->reset_reason;
> 
>  	old_num_rx_queues = adapter->req_rx_queues;
>  	old_num_tx_queues = adapter->req_tx_queues;
> @@ -1858,37 +1936,25 @@ static int do_reset(struct ibmvnic_adapter 
> *adapter,
>  	if (reset_state == VNIC_OPEN &&
>  	    adapter->reset_reason != VNIC_RESET_MOBILITY &&
>  	    adapter->reset_reason != VNIC_RESET_FAILOVER) {
> -		if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
> -			rc = __ibmvnic_close(netdev);
> -			if (rc)
> -				goto out;
> -		} else {
> -			adapter->state = VNIC_CLOSING;
> -
> -			/* Release the RTNL lock before link state change and
> -			 * re-acquire after the link state change to allow
> -			 * linkwatch_event to grab the RTNL lock and run during
> -			 * a reset.
> -			 */
> -			rtnl_unlock();
> -			rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
> -			rtnl_lock();
> -			if (rc)
> -				goto out;
> +		adapter->state = VNIC_CLOSING;
> 
> -			if (adapter->state != VNIC_CLOSING) {
> -				rc = -1;
> -				goto out;
> -			}
> +		/* Release the RTNL lock before link state change and
> +		 * re-acquire after the link state change to allow
> +		 * linkwatch_event to grab the RTNL lock and run during
> +		 * a reset.
> +		 */
> +		rtnl_unlock();
> +		rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
> +		rtnl_lock();
> +		if (rc)
> +			goto out;
> 
> -			adapter->state = VNIC_CLOSED;
> +		if (adapter->state != VNIC_CLOSING) {
> +			rc = -1;
> +			goto out;
>  		}
> -	}
> 
> -	if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
> -		release_resources(adapter);
> -		release_sub_crqs(adapter, 1);
> -		release_crq_queue(adapter);
> +		adapter->state = VNIC_CLOSED;
>  	}
> 
>  	if (adapter->reset_reason != VNIC_RESET_NON_FATAL) {
> @@ -1897,9 +1963,7 @@ static int do_reset(struct ibmvnic_adapter 
> *adapter,
>  		 */
>  		adapter->state = VNIC_PROBED;
> 
> -		if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
> -			rc = init_crq_queue(adapter);
> -		} else if (adapter->reset_reason == VNIC_RESET_MOBILITY) {
> +		if (adapter->reset_reason == VNIC_RESET_MOBILITY) {
>  			rc = ibmvnic_reenable_crq_queue(adapter);
>  			release_sub_crqs(adapter, 1);
>  		} else {
> @@ -1939,11 +2003,7 @@ static int do_reset(struct ibmvnic_adapter 
> *adapter,
>  			goto out;
>  		}
> 
> -		if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM) {
> -			rc = init_resources(adapter);
> -			if (rc)
> -				goto out;
> -		} else if (adapter->req_rx_queues != old_num_rx_queues ||
> +		if (adapter->req_rx_queues != old_num_rx_queues ||
>  		    adapter->req_tx_queues != old_num_tx_queues ||
>  		    adapter->req_rx_add_entries_per_subcrq !=
>  		    old_num_rx_slots ||
> @@ -2004,9 +2064,7 @@ static int do_reset(struct ibmvnic_adapter 
> *adapter,
>  	rc = 0;
> 
>  out:
> -	/* requestor of VNIC_RESET_CHANGE_PARAM should still hold the rtnl 
> lock */
> -	if (!(adapter->reset_reason == VNIC_RESET_CHANGE_PARAM))
> -		rtnl_unlock();
> +	rtnl_unlock();
> 
>  	return rc;
>  }
> @@ -2140,7 +2198,10 @@ static void __ibmvnic_reset(struct work_struct 
> *work)
>  		}
>  		spin_unlock_irqrestore(&adapter->state_lock, flags);
> 
> -		if (adapter->force_reset_recovery) {
> +		if (rwi->reset_reason == VNIC_RESET_CHANGE_PARAM) {
> +			/* CHANGE_PARAM requestor holds rtnl_lock */
> +			rc = do_change_param_reset(adapter, rwi, reset_state);
> +		} else if (adapter->force_reset_recovery) {
>  			/* Transport event occurred during previous reset */
>  			if (adapter->wait_for_reset) {
>  				/* Previous was CHANGE_PARAM; caller locked */

^ permalink raw reply

* Re: [PATCH net-next] Revert ibmvnic merge do_change_param_reset into do_reset
From: Jakub Kicinski @ 2020-11-06 19:42 UTC (permalink / raw)
  To: ljp; +Cc: wvoigt, netdev, Linuxppc-dev, Dany Madden, linuxppc-dev, davem
In-Reply-To: <0ff353cbada91b031d1bbae250a975d5@linux.vnet.ibm.com>

On Fri, 06 Nov 2020 13:30:25 -0600 ljp wrote:
> On 2020-11-06 13:17, Dany Madden wrote:
> > This reverts commit 16b5f5ce351f8709a6b518cc3cbf240c378305bf
> > where it restructures do_reset. There are patches being tested that
> > would require major rework if this is committed first.
> > 
> > We will resend this after the other patches have been applied.  
> 
> I discussed with my manager, and he has agreed not revert this one
> since it is in the net-next tree and will not affect net tree for
> current bug fix patches.

We merge net into net-next periodically (~every week or so) so if you
keep making changes to both branches I will have to deal with the
fallout.

I'm assuming that the resolution for the current conflict which Stephen
Rothwell sent from linux-next is correct. Please confirm.

I will resolve it like he did when Linus pulls from net (hopefully
later today).

But if you know you have more fixes I'd rather revert this, get all the
relevant fixed into net, wait for net to be merged into net-next and
then redo the refactoring.

Hope that makes sense.

^ permalink raw reply

* Re: [PATCH net-next] Revert ibmvnic merge do_change_param_reset into do_reset
From: ljp @ 2020-11-06 20:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: wvoigt, netdev, Linuxppc-dev, Dany Madden, linuxppc-dev, davem
In-Reply-To: <20201106114208.4b0e8eec@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On 2020-11-06 13:42, Jakub Kicinski wrote:
> On Fri, 06 Nov 2020 13:30:25 -0600 ljp wrote:
>> On 2020-11-06 13:17, Dany Madden wrote:
>> > This reverts commit 16b5f5ce351f8709a6b518cc3cbf240c378305bf
>> > where it restructures do_reset. There are patches being tested that
>> > would require major rework if this is committed first.
>> >
>> > We will resend this after the other patches have been applied.
>> 
>> I discussed with my manager, and he has agreed not revert this one
>> since it is in the net-next tree and will not affect net tree for
>> current bug fix patches.
> 
> We merge net into net-next periodically (~every week or so) so if you
> keep making changes to both branches I will have to deal with the
> fallout.
> 
> I'm assuming that the resolution for the current conflict which Stephen
> Rothwell sent from linux-next is correct. Please confirm.

That fix is correct.

> 
> I will resolve it like he did when Linus pulls from net (hopefully
> later today).
> 
> But if you know you have more fixes I'd rather revert this, get all the
> relevant fixed into net, wait for net to be merged into net-next and
> then redo the refactoring.
> 
> Hope that makes sense.

^ permalink raw reply

* Re: [PATCH net-next] Revert ibmvnic merge do_change_param_reset into do_reset
From: ljp @ 2020-11-06 21:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: wvoigt, netdev, Linuxppc-dev, Dany Madden, linuxppc-dev, davem
In-Reply-To: <20201106114208.4b0e8eec@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On 2020-11-06 13:42, Jakub Kicinski wrote:
> On Fri, 06 Nov 2020 13:30:25 -0600 ljp wrote:
>> On 2020-11-06 13:17, Dany Madden wrote:
>> > This reverts commit 16b5f5ce351f8709a6b518cc3cbf240c378305bf
>> > where it restructures do_reset. There are patches being tested that
>> > would require major rework if this is committed first.
>> >
>> > We will resend this after the other patches have been applied.
>> 
>> I discussed with my manager, and he has agreed not revert this one
>> since it is in the net-next tree and will not affect net tree for
>> current bug fix patches.
> 
> We merge net into net-next periodically (~every week or so) so if you
> keep making changes to both branches I will have to deal with the
> fallout.
> 
> I'm assuming that the resolution for the current conflict which Stephen
> Rothwell sent from linux-next is correct. Please confirm.
> 
> I will resolve it like he did when Linus pulls from net (hopefully
> later today).
> 
> But if you know you have more fixes I'd rather revert this, get all the
> relevant fixed into net, wait for net to be merged into net-next and
> then redo the refactoring.
> 
> Hope that makes sense.

Jakub,

We had further discussion in the team based on your comments above.
You can revert it for now.
Thanks for your efforts.

Lijun

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox