* [PATCH 0/8] FP/VEC/VSX switching optimisations
@ 2015-11-18 3:26 Cyril Bur
2015-11-18 3:26 ` [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
` (8 more replies)
0 siblings, 9 replies; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
Hi,
These patches are an extension of the work done by Anton
(https://patchwork.ozlabs.org/patch/537621/); they need to be applied on
top of his series.
The goal of these patches is to rework how the 'math' registers (FP, VEC
and VSX) are context switched. Currently the kernel adopts a lazy approach:
it always switches userspace tasks with all three facilities disabled and
loads each set of registers upon receiving the corresponding unavailable
exception.
The kernel does try to avoid disabling the facilities in the syscall quick
path, but during testing it appears that even what should be a simple
syscall still causes the kernel to use some of the facilities (a vectorised
memcpy, for example) for itself and therefore disable them for the user
task.
The lazy approach means little time is spent restoring userspace state, and
if tasks don't use any of these facilities it is the correct thing to do.
In recent years, new workloads and new features such as auto-vectorisation
in GCC have increased userspace use of these facilities, so much so that
some workloads have a task take an FP unavailable exception and a VEC
unavailable exception on almost every time slice.
This series removes the general laziness in favour of a more selective
approach. If a task uses any of the 'math' facilities the kernel will load
the registers and enable the facilities for future time slices as the
assumption is that the use is likely to continue for some time. This
removes the cost of having to take an exception.
This series also adds logic to detect whether a task had been using a
facility and optimises the case where the registers are still hot. This
provides another speedup: not only is the cost of the exception saved, but
also the cost of copying up to 64 x 128 bit registers.
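The difference between the two strategies can be sketched in plain C. This is a simplified model only; the names `used_fp` and `fp_enabled` are illustrative stand-ins, not the kernel's actual fields:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two switching strategies; not the real
 * kernel code. */
struct task {
    bool used_fp;      /* set when the task takes an FP unavailable exception */
    bool fp_enabled;   /* facility bit in the task's MSR image */
};

/* Lazy approach: always switch in with the facility disabled; the task's
 * first FP instruction then traps and the handler loads the registers. */
static void switch_in_lazy(struct task *t)
{
    t->fp_enabled = false;       /* every use starts with an exception */
}

/* Selective approach: if the task used FP recently, assume it will again;
 * load the registers now and leave the facility enabled. */
static void switch_in_selective(struct task *t)
{
    if (t->used_fp)
        t->fp_enabled = true;    /* no unavailable exception on next use */
}
```

A task that touched FP last time slice pays the exception cost on every slice under the lazy model, but never again under the selective one.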
With these patches applied on top of Anton's patches I observe a significant
improvement with Anton's context switch microbenchmark using yield():
http://ozlabs.org/~anton/junkcode/context_switch2.c
Using an LE kernel compiled with pseries_le_defconfig
Running:
./context_switch2 --test=yield 8 8
and adding one of --fp, --altivec or --vector
Gives a 5% improvement on a POWER8 CPU.
./context_switch2 --test=yield --fp --altivec --vector 8 8
Gives a 15% improvement on a POWER8 CPU.
I'll take this opportunity to note that the 15% figure can be somewhat
misleading. It may be reasonable to assume that each of the optimisations
has had a compounding effect; this isn't incorrect, and the reason behind
the apparent compounding reveals a lot about where the current bottleneck
lies.
The tests always touch FP first, then VEC, then VSX, which is the
guaranteed worst case for the way the kernel currently operates: this
ordering triggers three successive unavailable exceptions. Since the kernel
currently enables all three facilities after taking a VSX unavailable
exception, if the tests are modified to touch the facilities in the order
VSX->VEC->FP, the improvement when touching all three is only 5%. There is
a compounding effect only in so far as the cost of taking multiple
unavailable exceptions is removed. This testing also demonstrates that the
exception itself is by far the most expensive part of the current lazy
approach.
Cyril Bur (8):
selftests/powerpc: Test the preservation of FPU and VMX regs across
syscall
selftests/powerpc: Test preservation of FPU and VMX regs across
preemption
selftests/powerpc: Test FPU and VMX regs in signal ucontext
powerpc: Explicitly disable math features when copying thread
powerpc: Restore FPU/VEC/VSX if previously used
powerpc: Add the ability to save FPU without giving it up
powerpc: Add the ability to save Altivec without giving it up
powerpc: Add the ability to save VSX without giving it up
arch/powerpc/include/asm/processor.h | 2 +
arch/powerpc/include/asm/switch_to.h | 5 +-
arch/powerpc/kernel/asm-offsets.c | 2 +
arch/powerpc/kernel/entry_64.S | 55 +++++-
arch/powerpc/kernel/fpu.S | 25 +--
arch/powerpc/kernel/ppc_ksyms.c | 4 -
arch/powerpc/kernel/process.c | 144 ++++++++++++--
arch/powerpc/kernel/vector.S | 45 +----
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/math/Makefile | 19 ++
tools/testing/selftests/powerpc/math/basic_asm.h | 26 +++
tools/testing/selftests/powerpc/math/fpu_asm.S | 185 +++++++++++++++++
tools/testing/selftests/powerpc/math/fpu_preempt.c | 92 +++++++++
tools/testing/selftests/powerpc/math/fpu_signal.c | 119 +++++++++++
tools/testing/selftests/powerpc/math/fpu_syscall.c | 79 ++++++++
tools/testing/selftests/powerpc/math/vmx_asm.S | 219 +++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_preempt.c | 92 +++++++++
tools/testing/selftests/powerpc/math/vmx_signal.c | 124 ++++++++++++
tools/testing/selftests/powerpc/math/vmx_syscall.c | 81 ++++++++
19 files changed, 1240 insertions(+), 81 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/math/Makefile
create mode 100644 tools/testing/selftests/powerpc/math/basic_asm.h
create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c
--
2.6.2
* [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
@ 2015-11-18 3:26 ` Cyril Bur
2015-11-23 0:23 ` Michael Neuling
2015-11-18 3:26 ` [PATCH 2/8] selftests/powerpc: Test preservation of FPU and VMX regs across preemption Cyril Bur
` (7 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
Test that the non-volatile floating point and Altivec registers are
correctly preserved across the fork() syscall.
fork() works nicely for this purpose: the registers should be the same for
both parent and child.
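The shape of the test can be sketched in portable C. This is a simplified model only showing the fork-and-compare pattern; the real test below loads the values into the FP/VMX registers from assembly so the compiler cannot interfere:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified model: load known values, fork(), and verify them in both
 * parent and child.  Returns 0 on success. */
static int fork_preserves(void)
{
    double vals[4] = { 0.1, 0.2, 0.3, 0.4 };
    pid_t pid = fork();
    int ok = (vals[0] == 0.1 && vals[1] == 0.2 &&
              vals[2] == 0.3 && vals[3] == 0.4);

    if (pid == 0)
        exit(ok ? 0 : 1);            /* child reports via exit status */
    if (pid == -1)
        return 1;

    int status;
    waitpid(pid, &status, 0);
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```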
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/math/Makefile | 14 ++
tools/testing/selftests/powerpc/math/basic_asm.h | 26 +++
tools/testing/selftests/powerpc/math/fpu_asm.S | 151 +++++++++++++++++
tools/testing/selftests/powerpc/math/fpu_syscall.c | 79 +++++++++
tools/testing/selftests/powerpc/math/vmx_asm.S | 183 +++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_syscall.c | 81 +++++++++
7 files changed, 536 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/powerpc/math/Makefile
create mode 100644 tools/testing/selftests/powerpc/math/basic_asm.h
create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c
diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 0c2706b..19e8191 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -22,7 +22,8 @@ SUB_DIRS = benchmarks \
switch_endian \
syscalls \
tm \
- vphn
+ vphn \
+ math
endif
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
new file mode 100644
index 0000000..896d9e2
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -0,0 +1,14 @@
+TEST_PROGS := fpu_syscall vmx_syscall
+
+all: $(TEST_PROGS)
+
+$(TEST_PROGS): ../harness.c
+$(TEST_PROGS): CFLAGS += -O2 -g
+
+fpu_syscall: fpu_asm.S
+vmx_syscall: vmx_asm.S
+
+include ../../lib.mk
+
+clean:
+ rm -f $(TEST_PROGS) *.o
diff --git a/tools/testing/selftests/powerpc/math/basic_asm.h b/tools/testing/selftests/powerpc/math/basic_asm.h
new file mode 100644
index 0000000..27aca79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/basic_asm.h
@@ -0,0 +1,26 @@
+#include <ppc-asm.h>
+#include <asm/unistd.h>
+
+#define LOAD_REG_IMMEDIATE(reg,expr) \
+ lis reg,(expr)@highest; \
+ ori reg,reg,(expr)@higher; \
+ rldicr reg,reg,32,31; \
+ oris reg,reg,(expr)@high; \
+ ori reg,reg,(expr)@l;
+
+#define PUSH_BASIC_STACK(size) \
+ std 2,24(sp); \
+ mflr r0; \
+ std r0,16(sp); \
+ mfcr r0; \
+ stw r0,8(sp); \
+ stdu sp,-size(sp);
+
+#define POP_BASIC_STACK(size) \
+ addi sp,sp,size; \
+ ld 2,24(sp); \
+ ld r0,16(sp); \
+ mtlr r0; \
+ lwz r0,8(sp); \
+ mtcr r0; \
+
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
new file mode 100644
index 0000000..d5412c1
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -0,0 +1,151 @@
+#include "basic_asm.h"
+
+#define PUSH_FPU(pos) \
+ stfd f14,pos(sp); \
+ stfd f15,pos+8(sp); \
+ stfd f16,pos+16(sp); \
+ stfd f17,pos+24(sp); \
+ stfd f18,pos+32(sp); \
+ stfd f19,pos+40(sp); \
+ stfd f20,pos+48(sp); \
+ stfd f21,pos+56(sp); \
+ stfd f22,pos+64(sp); \
+ stfd f23,pos+72(sp); \
+ stfd f24,pos+80(sp); \
+ stfd f25,pos+88(sp); \
+ stfd f26,pos+96(sp); \
+ stfd f27,pos+104(sp); \
+ stfd f28,pos+112(sp); \
+ stfd f29,pos+120(sp); \
+ stfd f30,pos+128(sp); \
+ stfd f31,pos+136(sp);
+
+#define POP_FPU(pos) \
+ lfd f14,pos(sp); \
+ lfd f15,pos+8(sp); \
+ lfd f16,pos+16(sp); \
+ lfd f17,pos+24(sp); \
+ lfd f18,pos+32(sp); \
+ lfd f19,pos+40(sp); \
+ lfd f20,pos+48(sp); \
+ lfd f21,pos+56(sp); \
+ lfd f22,pos+64(sp); \
+ lfd f23,pos+72(sp); \
+ lfd f24,pos+80(sp); \
+ lfd f25,pos+88(sp); \
+ lfd f26,pos+96(sp); \
+ lfd f27,pos+104(sp); \
+ lfd f28,pos+112(sp); \
+ lfd f29,pos+120(sp); \
+ lfd f30,pos+128(sp); \
+ lfd f31,pos+136(sp);
+
+#Careful calling this, it will 'clobber' fpu (by design)
+#Don't call this from C
+FUNC_START(load_fpu)
+ lfd f14,0(r3)
+ lfd f15,8(r3)
+ lfd f16,16(r3)
+ lfd f17,24(r3)
+ lfd f18,32(r3)
+ lfd f19,40(r3)
+ lfd f20,48(r3)
+ lfd f21,56(r3)
+ lfd f22,64(r3)
+ lfd f23,72(r3)
+ lfd f24,80(r3)
+ lfd f25,88(r3)
+ lfd f26,96(r3)
+ lfd f27,104(r3)
+ lfd f28,112(r3)
+ lfd f29,120(r3)
+ lfd f30,128(r3)
+ lfd f31,136(r3)
+ blr
+FUNC_END(load_fpu)
+
+FUNC_START(check_fpu)
+ mr r4,r3
+ li r3,1 #assume a bad result
+ lfd f0,0(r4)
+ fcmpu cr1,f0,f14
+ bne cr1,1f
+ lfd f0,8(r4)
+ fcmpu cr1,f0,f15
+ bne cr1,1f
+ lfd f0,16(r4)
+ fcmpu cr1,f0,f16
+ bne cr1,1f
+ lfd f0,24(r4)
+ fcmpu cr1,f0,f17
+ bne cr1,1f
+ lfd f0,32(r4)
+ fcmpu cr1,f0,f18
+ bne cr1,1f
+ lfd f0,40(r4)
+ fcmpu cr1,f0,f19
+ bne cr1,1f
+ lfd f0,48(r4)
+ fcmpu cr1,f0,f20
+ bne cr1,1f
+ lfd f0,56(r4)
+ fcmpu cr1,f0,f21
+ bne cr1,1f
+ lfd f0,64(r4)
+ fcmpu cr1,f0,f22
+ bne cr1,1f
+ lfd f0,72(r4)
+ fcmpu cr1,f0,f23
+ bne cr1,1f
+ lfd f0,80(r4)
+ fcmpu cr1,f0,f24
+ bne cr1,1f
+ lfd f0,88(r4)
+ fcmpu cr1,f0,f25
+ bne cr1,1f
+ lfd f0,96(r4)
+ fcmpu cr1,f0,f26
+ bne cr1,1f
+ lfd f0,104(r4)
+ fcmpu cr1,f0,f27
+ bne cr1,1f
+ lfd f0,112(r4)
+ fcmpu cr1,f0,f28
+ bne cr1,1f
+ lfd f0,120(r4)
+ fcmpu cr1,f0,f29
+ bne cr1,1f
+ lfd f0,128(r4)
+ fcmpu cr1,f0,f30
+ bne cr1,1f
+ lfd f0,136(r4)
+ fcmpu cr1,f0,f31
+ bne cr1,1f
+ li r3,0 #Success!!!
+1: blr
+
+FUNC_START(test_fpu)
+ #r3 holds pointer to where to put the result of fork
+ #f14-f31 are non volatiles
+ PUSH_BASIC_STACK(256)
+ std r3,40(sp) #Address of darray
+ std r4,48(sp) #Address of pid
+ PUSH_FPU(56)
+
+ bl load_fpu
+ nop
+ li r0,__NR_fork
+ sc
+
+ #pass the result of the fork to the caller
+ ld r9,48(sp)
+ std r3,0(r9)
+
+ ld r3,40(sp)
+ bl check_fpu
+ nop
+
+ POP_FPU(56)
+ POP_BASIC_STACK(256)
+ blr
+FUNC_END(test_fpu)
diff --git a/tools/testing/selftests/powerpc/math/fpu_syscall.c b/tools/testing/selftests/powerpc/math/fpu_syscall.c
new file mode 100644
index 0000000..a967fd6
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_syscall.c
@@ -0,0 +1,79 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+
+#include "utils.h"
+
+extern int test_fpu(double *darray, pid_t *pid);
+
+double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+ 2.1};
+
+int syscall_fpu(void)
+{
+ pid_t fork_pid;
+ int i;
+ int ret;
+ int child_ret;
+ for (i = 0; i < 1000; i++) {
+ /* test_fpu will fork() */
+ ret = test_fpu(darray, &fork_pid);
+ if (fork_pid == -1)
+ return -1;
+ if (fork_pid == 0)
+ exit(ret);
+ waitpid(fork_pid, &child_ret, 0);
+ if (ret || child_ret)
+ return 1;
+ }
+
+ return 0;
+}
+
+int test_syscall_fpu(void)
+{
+ /*
+ * Setup an environment with much context switching
+ */
+ pid_t pid2;
+ pid_t pid = fork();
+ int ret;
+ int child_ret;
+ FAIL_IF(pid == -1);
+
+ pid2 = fork();
+ /* Can't FAIL_IF(pid2 == -1); because we've already forked once */
+ if (pid2 == -1) {
+ /*
+ * Couldn't fork, ensure test is a fail
+ */
+ child_ret = ret = 1;
+ } else {
+ ret = syscall_fpu();
+ if (pid2)
+ waitpid(pid2, &child_ret, 0);
+ else
+ exit(ret);
+ }
+
+ ret |= child_ret;
+
+ if (pid)
+ waitpid(pid, &child_ret, 0);
+ else
+ exit(ret);
+
+ FAIL_IF(ret || child_ret);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_syscall_fpu, "syscall_fpu");
+
+}
diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
new file mode 100644
index 0000000..e642e67
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
@@ -0,0 +1,183 @@
+#include "basic_asm.h"
+
+#define PUSH_VMX(pos,reg) \
+ li reg,pos; \
+ stvx v20,reg,sp; \
+ addi reg,reg,16; \
+ stvx v21,reg,sp; \
+ addi reg,reg,16; \
+ stvx v22,reg,sp; \
+ addi reg,reg,16; \
+ stvx v23,reg,sp; \
+ addi reg,reg,16; \
+ stvx v24,reg,sp; \
+ addi reg,reg,16; \
+ stvx v25,reg,sp; \
+ addi reg,reg,16; \
+ stvx v26,reg,sp; \
+ addi reg,reg,16; \
+ stvx v27,reg,sp; \
+ addi reg,reg,16; \
+ stvx v28,reg,sp; \
+ addi reg,reg,16; \
+ stvx v29,reg,sp; \
+ addi reg,reg,16; \
+ stvx v30,reg,sp; \
+ addi reg,reg,16; \
+ stvx v31,reg,sp;
+
+#define POP_VMX(pos,reg) \
+ li reg,pos; \
+ lvx v20,reg,sp; \
+ addi reg,reg,16; \
+ lvx v21,reg,sp; \
+ addi reg,reg,16; \
+ lvx v22,reg,sp; \
+ addi reg,reg,16; \
+ lvx v23,reg,sp; \
+ addi reg,reg,16; \
+ lvx v24,reg,sp; \
+ addi reg,reg,16; \
+ lvx v25,reg,sp; \
+ addi reg,reg,16; \
+ lvx v26,reg,sp; \
+ addi reg,reg,16; \
+ lvx v27,reg,sp; \
+ addi reg,reg,16; \
+ lvx v28,reg,sp; \
+ addi reg,reg,16; \
+ lvx v29,reg,sp; \
+ addi reg,reg,16; \
+ lvx v30,reg,sp; \
+ addi reg,reg,16; \
+ lvx v31,reg,sp;
+
+#Careful, this will 'clobber' vmx (by design)
+#Don't call this from C
+FUNC_START(load_vmx)
+ li r5,0
+ lvx v20,r5,r3
+ addi r5,r5,16
+ lvx v21,r5,r3
+ addi r5,r5,16
+ lvx v22,r5,r3
+ addi r5,r5,16
+ lvx v23,r5,r3
+ addi r5,r5,16
+ lvx v24,r5,r3
+ addi r5,r5,16
+ lvx v25,r5,r3
+ addi r5,r5,16
+ lvx v26,r5,r3
+ addi r5,r5,16
+ lvx v27,r5,r3
+ addi r5,r5,16
+ lvx v28,r5,r3
+ addi r5,r5,16
+ lvx v29,r5,r3
+ addi r5,r5,16
+ lvx v30,r5,r3
+ addi r5,r5,16
+ lvx v31,r5,r3
+ blr
+FUNC_END(load_vmx)
+
+#Should be safe from C, only touches r4, r5 and v0,v1,v2
+FUNC_START(check_vmx)
+ PUSH_BASIC_STACK(16)
+ mr r4,r3
+ li r3,1 #assume a bad result
+ li r5,0
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v20
+ vmr v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v21
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v22
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v23
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v24
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v25
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v26
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v27
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v28
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v29
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v30
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v31
+ vand v2,v2,v1
+
+ li r5,0
+ stvx v2,r5,sp
+ ldx r0,r5,sp
+ cmpdi r0,0xffffffff
+ bne 1f
+ li r3,0
+1: POP_BASIC_STACK(16)
+ blr
+FUNC_END(check_vmx)
+
+#Safe from C
+FUNC_START(test_vmx)
+ #r3 holds pointer to where to put the result of fork
+ #v20-v31 are non-volatile
+ PUSH_BASIC_STACK(512)
+ std r3,40(sp) #Address of varray
+ std r4,48(sp) #address of pid
+ PUSH_VMX(56, r4)
+
+ bl load_vmx
+
+ li r0,__NR_fork
+ sc
+ #Pass the result of fork back to the caller
+ ld r9,48(sp)
+ std r3,0(r9)
+
+ ld r3,40(sp)
+ bl check_vmx
+
+ POP_VMX(56,r4)
+ POP_BASIC_STACK(512)
+ blr
+FUNC_END(test_vmx)
diff --git a/tools/testing/selftests/powerpc/math/vmx_syscall.c b/tools/testing/selftests/powerpc/math/vmx_syscall.c
new file mode 100644
index 0000000..7adff05
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_syscall.c
@@ -0,0 +1,81 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "utils.h"
+
+typedef int v4si __attribute__ ((vector_size (16)));
+v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+ {13,14,15,16},{17,18,19,20},{21,22,23,24},
+ {25,26,27,28},{29,30,31,32},{33,34,35,36},
+ {37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+extern int test_vmx(v4si *varray, pid_t *pid);
+
+int vmx_syscall(void)
+{
+ pid_t fork_pid;
+ int i;
+ int ret;
+ int child_ret;
+ for (i = 0; i < 1000; i++) {
+ /* test_vmx will fork() */
+ ret = test_vmx(varray, &fork_pid);
+ if (fork_pid == -1)
+ return -1;
+ if (fork_pid == 0)
+ exit(ret);
+ waitpid(fork_pid, &child_ret, 0);
+ if (ret || child_ret)
+ return 1;
+ }
+
+ return 0;
+}
+
+int test_vmx_syscall(void)
+{
+ /*
+ * Setup an environment with much context switching
+ */
+ pid_t pid2;
+ pid_t pid = fork();
+ int ret;
+ int child_ret;
+ FAIL_IF(pid == -1);
+
+ pid2 = fork();
+ ret = vmx_syscall();
+ /* Can't FAIL_IF(pid2 == -1); because we've already forked */
+ if (pid2 == -1) {
+ /*
+ * Couldn't fork, ensure child_ret is set and is a fail
+ */
+ ret = child_ret = 1;
+ } else {
+ if (pid2)
+ waitpid(pid2, &child_ret, 0);
+ else
+ exit(ret);
+ }
+
+ ret |= child_ret;
+
+ if (pid)
+ waitpid(pid, &child_ret, 0);
+ else
+ exit(ret);
+
+ FAIL_IF(ret || child_ret);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_vmx_syscall, "vmx_syscall");
+
+}
--
2.6.2
* [PATCH 2/8] selftests/powerpc: Test preservation of FPU and VMX regs across preemption
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
2015-11-18 3:26 ` [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
@ 2015-11-18 3:26 ` Cyril Bur
2015-11-23 0:34 ` Michael Neuling
2015-11-18 3:26 ` [PATCH 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
` (6 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
Loop in assembly checking the registers with many threads.
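The structure of the test can be sketched in C with pthreads. This is a simplified model: the real workers run the verification loop in assembly on the actual FP/VMX registers, while here a per-worker array of plain memory values stands in for them:

```c
#include <assert.h>
#include <pthread.h>

static volatile int not_ready;
static volatile int running;

/* Simplified worker: load per-thread values, signal readiness, then spin
 * verifying them until told to stop.  Returns NULL on success. */
static void *worker(void *arg)
{
    double mine[4] = { 0.1, 0.2, 0.3, 0.4 };
    (void)arg;

    __sync_fetch_and_sub(&not_ready, 1);   /* atomic decrement, as in the asm */
    while (running) {
        if (mine[0] != 0.1 || mine[1] != 0.2 ||
            mine[2] != 0.3 || mine[3] != 0.4)
            return (void *)1L;             /* corruption detected */
    }
    return NULL;
}

/* Spawn n workers (n <= 8), wait for all to start, stop them, and report
 * whether any detected corruption. */
static int run_workers(int nthreads)
{
    pthread_t tids[8];
    void *ret;
    int i, bad = 0;

    running = 1;
    not_ready = nthreads;
    for (i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    while (not_ready)                      /* wait for every worker to start */
        ;
    running = 0;                           /* then tell them all to stop */
    for (i = 0; i < nthreads; i++) {
        pthread_join(tids[i], &ret);
        bad |= (ret != NULL);
    }
    return bad;
}
```

The real test additionally sleeps for a while with the workers spinning, so the scheduler preempts them and the context-switch path is exercised.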
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
tools/testing/selftests/powerpc/math/Makefile | 7 +-
tools/testing/selftests/powerpc/math/fpu_asm.S | 34 ++++++++
tools/testing/selftests/powerpc/math/fpu_preempt.c | 92 ++++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_asm.S | 44 ++++++++++-
tools/testing/selftests/powerpc/math/vmx_preempt.c | 92 ++++++++++++++++++++++
5 files changed, 263 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
index 896d9e2..9fa690f 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,12 +1,15 @@
-TEST_PROGS := fpu_syscall vmx_syscall
+TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
all: $(TEST_PROGS)
$(TEST_PROGS): ../harness.c
-$(TEST_PROGS): CFLAGS += -O2 -g
+$(TEST_PROGS): CFLAGS += -O2 -g -pthread
fpu_syscall: fpu_asm.S
+fpu_preempt: fpu_asm.S
+
vmx_syscall: vmx_asm.S
+vmx_preempt: vmx_asm.S
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
index d5412c1..5ff0adc 100644
--- a/tools/testing/selftests/powerpc/math/fpu_asm.S
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -149,3 +149,37 @@ FUNC_START(test_fpu)
POP_BASIC_STACK(256)
blr
FUNC_END(test_fpu)
+
+#int preempt_fpu(double *darray, volatile int *not_ready, int *sentinal)
+#On starting will (atomically) decrement not_ready as a signal that the FPU
+#has been loaded with darray. Will proceed to check the validity of the FPU
+#registers while sentinal is not zero.
+FUNC_START(preempt_fpu)
+ PUSH_BASIC_STACK(256)
+ std r3,32(sp) #double *darray
+ std r4,40(sp) #volatile int *not_ready
+ std r5,48(sp) #int *sentinal
+ PUSH_FPU(56)
+
+ bl load_fpu
+
+ #Atomic DEC
+ ld r3,40(sp)
+1: lwarx r4,0,r3
+ addi r4,r4,-1
+ stwcx. r4,0,r3
+ bne- 1b
+
+2: ld r3, 32(sp)
+ bl check_fpu
+ cmpdi r3,0
+ bne 3f
+ ld r4, 48(sp)
+ ld r5, 0(r4)
+ cmpwi r5,0
+ bne 2b
+
+3: POP_FPU(56)
+ POP_BASIC_STACK(256)
+ blr
+FUNC_END(preempt_fpu)
diff --git a/tools/testing/selftests/powerpc/math/fpu_preempt.c b/tools/testing/selftests/powerpc/math/fpu_preempt.c
new file mode 100644
index 0000000..e24cf9b
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_preempt.c
@@ -0,0 +1,92 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+ 2.1};
+
+volatile int not_ready;
+int running;
+
+extern void preempt_fpu(double *darray, volatile int *not_ready, int *sentinal);
+
+void *preempt_fpu_c(void *p)
+{
+ int i;
+ srand(pthread_self());
+ for (i = 0; i < 21; i++)
+ darray[i] = rand();
+
+ /* Test failed if it ever returns */
+ preempt_fpu(darray, &not_ready, &running);
+
+ return p;
+}
+
+int test_preempt_fpu(void)
+{
+ int i, rc, threads;
+ pthread_t *tids;
+
+ threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+ tids = malloc((threads) * sizeof(pthread_t));
+ FAIL_IF(!tids);
+
+ running = true;
+ not_ready = threads;
+ for (i = 0; i < threads; i++) {
+ rc = pthread_create(&tids[i], NULL, preempt_fpu_c, NULL);
+ FAIL_IF(rc);
+ }
+
+ setbuf(stdout, NULL);
+ /* Not really necessary, but nice to wait for every thread to start */
+ printf("\tWaiting for all workers to start...");
+ while(not_ready);
+ printf("done\n");
+
+ printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
+ sleep(PREEMPT_TIME);
+ printf("done\n");
+
+ printf("\tKilling workers...");
+ running = 0;
+ for (i = 0; i < threads; i++) {
+ void *rc_p;
+ pthread_join(tids[i], &rc_p);
+
+ /*
+ * Harness will say the fail was here, look at why preempt_fpu
+ * returned
+ */
+ if ((long) rc_p)
+ printf("oops\n");
+ FAIL_IF((long) rc_p);
+ }
+ printf("done\n");
+
+ free(tids);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_preempt_fpu, "fpu_preempt");
+}
diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
index e642e67..23db4b3 100644
--- a/tools/testing/selftests/powerpc/math/vmx_asm.S
+++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
@@ -1,5 +1,6 @@
#include "basic_asm.h"
+#POS MUST BE 16 ALIGNED!
#define PUSH_VMX(pos,reg) \
li reg,pos; \
stvx v20,reg,sp; \
@@ -26,6 +27,7 @@
addi reg,reg,16; \
stvx v31,reg,sp;
+#POS MUST BE 16 ALIGNED!
#define POP_VMX(pos,reg) \
li reg,pos; \
lvx v20,reg,sp; \
@@ -84,7 +86,7 @@ FUNC_END(load_vmx)
#Should be safe from C, only touches r4, r5 and v0,v1,v2
FUNC_START(check_vmx)
- PUSH_BASIC_STACK(16)
+ PUSH_BASIC_STACK(32)
mr r4,r3
li r3,1 #assume a bad result
li r5,0
@@ -153,7 +155,7 @@ FUNC_START(check_vmx)
cmpdi r0,0xffffffff
bne 1f
li r3,0
-1: POP_BASIC_STACK(16)
+1: POP_BASIC_STACK(32)
blr
FUNC_END(check_vmx)
@@ -164,7 +166,7 @@ FUNC_START(test_vmx)
PUSH_BASIC_STACK(512)
std r3,40(sp) #Address of varray
std r4,48(sp) #address of pid
- PUSH_VMX(56, r4)
+ PUSH_VMX(64, r4)
bl load_vmx
@@ -177,7 +179,41 @@ FUNC_START(test_vmx)
ld r3,40(sp)
bl check_vmx
- POP_VMX(56,r4)
+ POP_VMX(64,r4)
POP_BASIC_STACK(512)
blr
FUNC_END(test_vmx)
+
+#int preempt_vmx(v4si *varray, volatile int *not_ready, int *sentinal)
+#On starting will (atomically) decrement not_ready as a signal that the VMX
+#registers have been loaded with varray. Will proceed to check the validity
+#of the VMX registers while sentinal is not zero.
+FUNC_START(preempt_vmx)
+ PUSH_BASIC_STACK(512)
+ std r3,32(sp) #v4si *varray
+ std r4,40(sp) #volatile int *not_ready
+ std r5,48(sp) #int *sentinal
+ PUSH_VMX(64,r4)
+
+ bl load_vmx
+
+ #Atomic DEC
+ ld r3,40(sp)
+1: lwarx r4,0,r3
+ addi r4,r4,-1
+ stwcx. r4,0,r3
+ bne- 1b
+
+2: ld r3,32(sp)
+ bl check_vmx
+ cmpdi r3,0
+ bne 3f
+ ld r4,48(sp)
+ ld r5,0(r4)
+ cmpwi r5,0
+ bne 2b
+
+3: POP_VMX(64,r4)
+ POP_BASIC_STACK(512)
+ blr
+FUNC_END(preempt_vmx)
diff --git a/tools/testing/selftests/powerpc/math/vmx_preempt.c b/tools/testing/selftests/powerpc/math/vmx_preempt.c
new file mode 100644
index 0000000..342db15
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_preempt.c
@@ -0,0 +1,92 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+typedef int v4si __attribute__ ((vector_size (16)));
+__thread v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+ {13,14,15,16},{17,18,19,20},{21,22,23,24},
+ {25,26,27,28},{29,30,31,32},{33,34,35,36},
+ {37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+volatile int not_ready;
+int running;
+
+extern void preempt_vmx(v4si *varray, volatile int *not_ready, int *sentinal);
+
+void *preempt_vmx_c(void *p)
+{
+ int i, j;
+ srand(pthread_self());
+ for (i = 0; i < 12; i++)
+ for (j = 0; j < 4; j++)
+ varray[i][j] = rand();
+
+ /* Test fails if it ever returns */
+ preempt_vmx(varray, &not_ready, &running);
+ return p;
+}
+
+int test_preempt_vmx(void)
+{
+ int i, rc, threads;
+ pthread_t *tids;
+
+ threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+ tids = malloc(threads * sizeof(pthread_t));
+ FAIL_IF(!tids);
+
+ running = true;
+ not_ready = threads;
+ for (i = 0; i < threads; i++) {
+ rc = pthread_create(&tids[i], NULL, preempt_vmx_c, NULL);
+ FAIL_IF(rc);
+ }
+
+ setbuf(stdout, NULL);
+ /* Not really necessary, but nice to wait for every thread to start */
+ printf("\tWaiting for all workers to start...");
+ while(not_ready);
+ printf("done\n");
+
+ printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
+ sleep(PREEMPT_TIME);
+ printf("done\n");
+
+ printf("\tKilling workers...");
+ running = 0;
+ for (i = 0; i < threads; i++) {
+ void *rc_p;
+ pthread_join(tids[i], &rc_p);
+
+ /*
+ * Harness will say the fail was here, look at why preempt_vmx
+ * returned
+ */
+ if ((long) rc_p)
+ printf("oops\n");
+ FAIL_IF((long) rc_p);
+ }
+ printf("done\n");
+
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_preempt_vmx, "vmx_preempt");
+}
--
2.6.2
* [PATCH 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
2015-11-18 3:26 ` [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
2015-11-18 3:26 ` [PATCH 2/8] selftests/powerpc: Test preservation of FPU and VMX regs across preemption Cyril Bur
@ 2015-11-18 3:26 ` Cyril Bur
2015-11-19 11:36 ` [3/8] " Michael Ellerman
2015-11-23 1:04 ` [PATCH 3/8] " Michael Neuling
2015-11-18 3:26 ` [PATCH 4/8] powerpc: Explicitly disable math features when copying thread Cyril Bur
` (5 subsequent siblings)
8 siblings, 2 replies; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
Load up the non-volatile FPU and VMX regs and ensure that they have the
expected values in a signal handler.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
tools/testing/selftests/powerpc/math/Makefile | 4 +-
tools/testing/selftests/powerpc/math/fpu_signal.c | 119 +++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_signal.c | 124 ++++++++++++++++++++++
3 files changed, 246 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
index 9fa690f..5ce000bf 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
+TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal
all: $(TEST_PROGS)
@@ -7,9 +7,11 @@ $(TEST_PROGS): CFLAGS += -O2 -g -pthread
fpu_syscall: fpu_asm.S
fpu_preempt: fpu_asm.S
+fpu_signal: fpu_asm.S
vmx_syscall: vmx_asm.S
vmx_preempt: vmx_asm.S
+vmx_signal: vmx_asm.S
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/math/fpu_signal.c b/tools/testing/selftests/powerpc/math/fpu_signal.c
new file mode 100644
index 0000000..ca61d1f
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_signal.c
@@ -0,0 +1,119 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Number of times each thread should receive the signal */
+#define ITERATIONS 10
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+ 2.1};
+
+bool bad_context;
+int running;
+volatile int not_ready;
+extern long preempt_fpu(double *darray, volatile int *not_ready, int *sentinal);
+
+void signal_fpu_sig(int sig, siginfo_t *info, void *context)
+{
+ int i;
+ ucontext_t *uc = context;
+ mcontext_t *mc = &uc->uc_mcontext;
+
+ /* Only the non volatiles were loaded up */
+ for (i = 14; i < 32; i++) {
+ if (mc->fp_regs[i] != darray[i - 14]) {
+ bad_context = true;
+ break;
+ }
+ }
+}
+
+void *signal_fpu_c(void *p)
+{
+ int i;
+ long rc;
+ struct sigaction act;
+ act.sa_sigaction = signal_fpu_sig;
+ act.sa_flags = SA_SIGINFO;
+ rc = sigaction(SIGUSR1, &act, NULL);
+ if (rc)
+ return p;
+
+ srand(pthread_self());
+ for (i = 0; i < 21; i++)
+ darray[i] = rand();
+
+ rc = preempt_fpu(darray, &not_ready, &running);
+
+ return (void *) rc;
+}
+
+int test_signal_fpu(void)
+{
+ int i, j, rc, threads;
+ void *rc_p;
+ pthread_t *tids;
+
+ threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+ tids = malloc(threads * sizeof(pthread_t));
+ FAIL_IF(!tids);
+
+ running = true;
+ not_ready = threads;
+ for (i = 0; i < threads; i++) {
+ rc = pthread_create(&tids[i], NULL, signal_fpu_c, NULL);
+ FAIL_IF(rc);
+ }
+
+ setbuf(stdout, NULL);
+ printf("\tWaiting for all workers to start...");
+ while (not_ready);
+ printf("done\n");
+
+ printf("\tSending signals to all threads %d times...", ITERATIONS);
+ for (i = 0; i < ITERATIONS; i++) {
+ for (j = 0; j < threads; j++) {
+ pthread_kill(tids[j], SIGUSR1);
+ }
+ sleep(1);
+ }
+ printf("done\n");
+
+ printf("\tKilling workers...");
+ running = 0;
+ for (i = 0; i < threads; i++) {
+ pthread_join(tids[i], &rc_p);
+
+ /*
+ * Harness will say the fail was here, look at why signal_fpu
+ * returned
+ */
+ if ((long) rc_p || bad_context)
+ printf("oops\n");
+ if (bad_context)
+ fprintf(stderr, "\t!! bad_context is true\n");
+ FAIL_IF((long) rc_p || bad_context);
+ }
+ printf("done\n");
+
+ free(tids);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_signal_fpu, "fpu_signal");
+}
diff --git a/tools/testing/selftests/powerpc/math/vmx_signal.c b/tools/testing/selftests/powerpc/math/vmx_signal.c
new file mode 100644
index 0000000..007ac9e
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_signal.c
@@ -0,0 +1,124 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Number of times each thread should receive the signal */
+#define ITERATIONS 10
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+typedef int v4si __attribute__ ((vector_size (16)));
+__thread v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+ {13,14,15,16},{17,18,19,20},{21,22,23,24},
+ {25,26,27,28},{29,30,31,32},{33,34,35,36},
+ {37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+bool bad_context;
+int running;
+volatile int not_ready;
+
+extern int preempt_vmx(v4si *varray, volatile int *not_ready, int *sentinal);
+
+void signal_vmx_sig(int sig, siginfo_t *info, void *context)
+{
+ int i;
+ ucontext_t *uc = context;
+ mcontext_t *mc = &uc->uc_mcontext;
+
+ /* Only the non volatiles were loaded up */
+ for (i = 20; i < 32; i++) {
+ if (memcmp(mc->v_regs->vrregs[i], &varray[i - 20], 16)) {
+ bad_context = true;
+ break;
+ }
+ }
+}
+
+void *signal_vmx_c(void *p)
+{
+ int i, j;
+ long rc;
+ struct sigaction act;
+ act.sa_sigaction = signal_vmx_sig;
+ act.sa_flags = SA_SIGINFO;
+ rc = sigaction(SIGUSR1, &act, NULL);
+ if (rc)
+ return p;
+
+ srand(pthread_self());
+ for (i = 0; i < 12; i++)
+ for (j = 0; j < 4; j++)
+ varray[i][j] = rand();
+
+ rc = preempt_vmx(varray, &not_ready, &running);
+
+ return (void *) rc;
+}
+
+int test_signal_vmx(void)
+{
+ int i, j, rc, threads;
+ void *rc_p;
+ pthread_t *tids;
+
+ threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+ tids = malloc(threads * sizeof(pthread_t));
+ FAIL_IF(!tids);
+
+ running = true;
+ not_ready = threads;
+ for (i = 0; i < threads; i++) {
+ rc = pthread_create(&tids[i], NULL, signal_vmx_c, NULL);
+ FAIL_IF(rc);
+ }
+
+ setbuf(stdout, NULL);
+ printf("\tWaiting for all workers to start...");
+ while (not_ready);
+ printf("done\n");
+
+ printf("\tSending signals to all threads %d times...", ITERATIONS);
+ for (i = 0; i < ITERATIONS; i++) {
+ for (j = 0; j < threads; j++) {
+ pthread_kill(tids[j], SIGUSR1);
+ }
+ sleep(1);
+ }
+ printf("done\n");
+
+ printf("\tKilling workers...");
+ running = 0;
+ for (i = 0; i < threads; i++) {
+ pthread_join(tids[i], &rc_p);
+
+ /*
+ * Harness will say the fail was here, look at why signal_vmx
+ * returned
+ */
+ if ((long) rc_p || bad_context)
+ printf("oops\n");
+ if (bad_context)
+ fprintf(stderr, "\t!! bad_context is true\n");
+ FAIL_IF((long) rc_p || bad_context);
+ }
+ printf("done\n");
+
+ free(tids);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_signal_vmx, "vmx_signal");
+}
--
2.6.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 4/8] powerpc: Explicitly disable math features when copying thread
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
` (2 preceding siblings ...)
2015-11-18 3:26 ` [PATCH 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
@ 2015-11-18 3:26 ` Cyril Bur
2015-11-23 1:08 ` Michael Neuling
2015-11-18 3:26 ` [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
` (4 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
With threads leaving the math bits enabled in their saved MSR to indicate
that the hardware is hot and a restore is not needed, child threads need to
have these bits cleared explicitly: when a child is first scheduled, there
is no way its registers could be hot.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/kernel/process.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 398f7bf..441d9e5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1243,6 +1243,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
f = ret_from_fork;
}
+ childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
sp -= STACK_FRAME_OVERHEAD;
/*
--
2.6.2
* [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
` (3 preceding siblings ...)
2015-11-18 3:26 ` [PATCH 4/8] powerpc: Explicitly disable math features when copying thread Cyril Bur
@ 2015-11-18 3:26 ` Cyril Bur
2015-11-20 11:01 ` Michael Ellerman
2015-11-23 1:29 ` Michael Neuling
2015-11-18 3:26 ` [PATCH 6/8] powerpc: Add the ability to save FPU without giving it up Cyril Bur
` (3 subsequent siblings)
8 siblings, 2 replies; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a
problem unless a process is using these facilities.
Modern versions of GCC are very good at automatically vectorising code; new
and modernised workloads make use of floating point and vector facilities,
and even the kernel makes use of vectorised memcpy.
All this combined greatly increases the cost of a syscall, since the kernel
sometimes uses these facilities even in the syscall fast path. This makes it
increasingly common for a thread to take an *_unavailable exception soon
after a syscall, and potentially all three in quick succession.
The obvious overcompensation to this problem is to simply always load all
the facilities on every exit to userspace. Loading up all FPU, VEC and VSX
registers every time can be expensive and if a workload does avoid using
them, it should not be forced to incur this penalty.
An 8-bit counter is used to detect whether the registers have been used in
the past, and the registers are always loaded until the counter wraps back
to zero.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/processor.h | 2 ++
arch/powerpc/kernel/asm-offsets.c | 2 ++
arch/powerpc/kernel/entry_64.S | 55 ++++++++++++++++++++++++++++--
arch/powerpc/kernel/fpu.S | 4 +++
arch/powerpc/kernel/process.c | 66 ++++++++++++++++++++++++++++++------
arch/powerpc/kernel/vector.S | 4 +++
6 files changed, 119 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ac23308..dcab21f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -236,11 +236,13 @@ struct thread_struct {
#endif
struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
unsigned long trap_nr; /* last trap # on this thread */
+ u8 load_fp;
#ifdef CONFIG_ALTIVEC
struct thread_vr_state vr_state;
struct thread_vr_state *vr_save_area;
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
+ u8 load_vec;
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
/* VSR status */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 221d584..0f593d7 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -95,12 +95,14 @@ int main(void)
DEFINE(THREAD_FPSTATE, offsetof(struct thread_struct, fp_state));
DEFINE(THREAD_FPSAVEAREA, offsetof(struct thread_struct, fp_save_area));
DEFINE(FPSTATE_FPSCR, offsetof(struct thread_fp_state, fpscr));
+ DEFINE(THREAD_LOAD_FP, offsetof(struct thread_struct, load_fp));
#ifdef CONFIG_ALTIVEC
DEFINE(THREAD_VRSTATE, offsetof(struct thread_struct, vr_state));
DEFINE(THREAD_VRSAVEAREA, offsetof(struct thread_struct, vr_save_area));
DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
DEFINE(VRSTATE_VSCR, offsetof(struct thread_vr_state, vscr));
+ DEFINE(THREAD_LOAD_VEC, offsetof(struct thread_struct, load_vec));
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index c8b4225..46e9869 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -210,7 +210,54 @@ system_call: /* label this so stack traces look sane */
li r11,-MAX_ERRNO
andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne- syscall_exit_work
- cmpld r3,r11
+
+ /*
+ * This is an assembly version of checks performed in restore_math()
+ * to avoid calling C unless absolutely necessary.
+ * Note: In order to simplify the assembly, if the FP or VEC registers
+ * are hot (and therefore restore_math() isn't called) the
+ * LOAD_{FP,VEC} thread counter doesn't get incremented.
+ * This is likely the best thing to do anyway because hot regs indicate
+ * that the workload is doing a lot of syscalls that can be handled
+ * quickly and without the need to touch FP or VEC regs (by the kernel).
+ * a) If this workload is long running then this is exactly what the
+ * kernel should be doing.
+ * b) If this workload isn't long running then we'll soon fall back to
+ * calling into C and the counter will be incremented regularly again
+ * anyway.
+ */
+ ld r9,PACACURRENT(r13)
+ andi. r0,r8,MSR_FP
+ addi r9,r9,THREAD
+ lbz r5,THREAD_LOAD_FP(r9)
+ /*
+ * Goto 2 if !r0 && r5
+ * The cmpb works because r5 can only have bits set in the lowest byte
+ * and r0 may or may not have bit 13 set (different byte) but will have
+ * a zero low byte therefore the low bytes must differ if r5 == true
+ * and the bit 13 byte must be the same if !r0
+ */
+ cmpb r7,r0,r5
+ cmpldi r7,0xff0
+#ifdef CONFIG_ALTIVEC
+ beq 2f
+
+ lbz r9,THREAD_LOAD_VEC(r9)
+ andis. r0,r8,MSR_VEC@h
+ /* Skip (goto 3) if r0 || !r9 */
+ bne 3f
+ cmpldi r9,0
+ beq 3f
+#else
+ bne 3f
+#endif
+2: addi r3,r1,STACK_FRAME_OVERHEAD
+ bl restore_math
+ ld r8,_MSR(r1)
+ ld r3,RESULT(r1)
+ li r11,-MAX_ERRNO
+
+3: cmpld r3,r11
ld r5,_CCR(r1)
bge- syscall_error
.Lsyscall_error_cont:
@@ -592,8 +639,8 @@ _GLOBAL(ret_from_except_lite)
/* Check current_thread_info()->flags */
andi. r0,r4,_TIF_USER_WORK_MASK
-#ifdef CONFIG_PPC_BOOK3E
bne 1f
+#ifdef CONFIG_PPC_BOOK3E
/*
* Check to see if the dbcr0 register is set up to debug.
* Use the internal debug mode bit to do this.
@@ -608,7 +655,9 @@ _GLOBAL(ret_from_except_lite)
mtspr SPRN_DBSR,r10
b restore
#else
- beq restore
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ bl restore_math
+ b restore
#endif
1: andi. r0,r4,_TIF_NEED_RESCHED
beq 2f
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 2117eac..b063524 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -130,6 +130,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
or r12,r12,r4
std r12,_MSR(r1)
#endif
+ /* Don't care if r4 overflows, this is desired behaviour */
+ lbz r4,THREAD_LOAD_FP(r5)
+ addi r4,r4,1
+ stb r4,THREAD_LOAD_FP(r5)
addi r10,r5,THREAD_FPSTATE
lfd fr0,FPSTATE_FPSCR(r10)
MTFSF_L(fr0)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 441d9e5..c602b67 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -366,6 +366,53 @@ void giveup_all(struct task_struct *tsk)
msr_check_and_clear(msr_all_available);
}
+void restore_math(struct pt_regs *regs)
+{
+ unsigned long msr;
+
+ if (!current->thread.load_fp
+#ifdef CONFIG_ALTIVEC
+ && !current->thread.load_vec)
+#else
+ )
+#endif
+ return;
+
+ msr = regs->msr;
+ msr_check_and_set(msr_all_available);
+
+ /*
+ * Only reload if the bit is not set in the user MSR; the bit being set
+ * indicates that the registers are hot
+ */
+#ifdef CONFIG_PPC_FPU
+ if (current->thread.load_fp && !(msr & MSR_FP)) {
+ load_fp_state(&current->thread.fp_state);
+ msr |= MSR_FP | current->thread.fpexc_mode;
+ current->thread.load_fp++;
+ }
+#endif
+#ifdef CONFIG_ALTIVEC
+ if (current->thread.load_vec && !(msr & MSR_VEC) &&
+ cpu_has_feature(CPU_FTR_ALTIVEC)) {
+ load_vr_state(&current->thread.vr_state);
+ current->thread.used_vr = 1;
+ msr |= MSR_VEC;
+ current->thread.load_vec++;
+ }
+#endif
+#ifdef CONFIG_VSX
+ if (!(msr & MSR_VSX) && (msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {
+ current->thread.used_vsr = 1;
+ msr |= MSR_VSX;
+ }
+#endif
+
+ msr_check_and_clear(msr_all_available);
+
+ regs->msr = msr;
+}
+
void flush_all_to_thread(struct task_struct *tsk)
{
if (tsk->thread.regs) {
@@ -806,17 +853,9 @@ void restore_tm_state(struct pt_regs *regs)
msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
- if (msr_diff & MSR_FP) {
- msr_check_and_set(MSR_FP);
- load_fp_state(&current->thread.fp_state);
- msr_check_and_clear(MSR_FP);
- regs->msr |= current->thread.fpexc_mode;
- }
- if (msr_diff & MSR_VEC) {
- msr_check_and_set(MSR_VEC);
- load_vr_state(&current->thread.vr_state);
- msr_check_and_clear(MSR_VEC);
- }
+
+ restore_math(regs);
+
regs->msr |= msr_diff;
}
@@ -977,6 +1016,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
batch = this_cpu_ptr(&ppc64_tlb_batch);
batch->active = 1;
}
+
+ /* Don't do this on a kernel thread */
+ if (current_thread_info()->task->thread.regs)
+ restore_math(current_thread_info()->task->thread.regs);
+
#endif /* CONFIG_PPC_BOOK3S_64 */
return last;
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 162d0f7..038cff8 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -91,6 +91,10 @@ _GLOBAL(load_up_altivec)
oris r12,r12,MSR_VEC@h
std r12,_MSR(r1)
#endif
+ /* Don't care if r4 overflows, this is desired behaviour */
+ lbz r4,THREAD_LOAD_VEC(r5)
+ addi r4,r4,1
+ stb r4,THREAD_LOAD_VEC(r5)
addi r6,r5,THREAD_VRSTATE
li r4,1
li r10,VRSTATE_VSCR
--
2.6.2
* [PATCH 6/8] powerpc: Add the ability to save FPU without giving it up
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
` (4 preceding siblings ...)
2015-11-18 3:26 ` [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
@ 2015-11-18 3:26 ` Cyril Bur
2015-11-18 3:26 ` [PATCH 7/8] powerpc: Add the ability to save Altivec " Cyril Bur
` (2 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
This patch adds the ability to save the FPU registers to the thread struct
without giving them up, that is, without disabling the facility for the next
return to userspace.
This patch optimises the thread copy path (as a result of a fork() or
clone()) so that the parent thread can return to userspace with hot
registers avoiding a possibly pointless reload of FPU register state.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/switch_to.h | 2 +-
arch/powerpc/kernel/fpu.S | 21 ++++------------
arch/powerpc/kernel/process.c | 46 +++++++++++++++++++++++++++++++++++-
3 files changed, 50 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 171ce13..8cf7fd6 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -28,7 +28,7 @@ extern void giveup_all(struct task_struct *);
extern void enable_kernel_fp(void);
extern void flush_fp_to_thread(struct task_struct *);
extern void giveup_fpu(struct task_struct *);
-extern void __giveup_fpu(struct task_struct *);
+extern void save_fpu(struct task_struct *);
static inline void disable_kernel_fp(void)
{
msr_check_and_clear(MSR_FP);
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index b063524..15da2b5 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
blr
/*
- * __giveup_fpu(tsk)
- * Disable FP for the task given as the argument,
- * and save the floating-point registers in its thread_struct.
+ * save_fpu(tsk)
+ * Save the floating-point registers in its thread_struct.
* Enables the FPU for use in the kernel on return.
*/
-_GLOBAL(__giveup_fpu)
+_GLOBAL(save_fpu)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r6,THREAD_FPSAVEAREA(r3)
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r6,0
bne 2f
addi r6,r3,THREAD_FPSTATE
-2: PPC_LCMPI 0,r5,0
- SAVE_32FPVSRS(0, R4, R6)
+2: SAVE_32FPVSRS(0, R4, R6)
mffs fr0
stfd fr0,FPSTATE_FPSCR(r6)
- beq 1f
- PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
- li r3,MSR_FP|MSR_FE0|MSR_FE1
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
- oris r3,r3,MSR_VSX@h
-END_FTR_SECTION_IFSET(CPU_FTR_VSX)
-#endif
- andc r4,r4,r3 /* disable FP for previous task */
- PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
blr
/*
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index c602b67..51e246a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits)
EXPORT_SYMBOL(__msr_check_and_clear);
#ifdef CONFIG_PPC_FPU
+void __giveup_fpu(struct task_struct *tsk)
+{
+ save_fpu(tsk);
+ tsk->thread.regs->msr &= ~MSR_FP;
+#ifdef CONFIG_VSX
+ if (cpu_has_feature(CPU_FTR_VSX))
+ tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
void giveup_fpu(struct task_struct *tsk)
{
check_if_tm_restore_required(tsk);
@@ -413,12 +423,46 @@ void restore_math(struct pt_regs *regs)
regs->msr = msr;
}
+void save_all(struct task_struct *tsk)
+{
+ unsigned long usermsr;
+
+ if (!tsk->thread.regs)
+ return;
+
+ usermsr = tsk->thread.regs->msr;
+
+ if ((usermsr & msr_all_available) == 0)
+ return;
+
+ msr_check_and_set(msr_all_available);
+
+#ifdef CONFIG_PPC_FPU
+ if (usermsr & MSR_FP)
+ save_fpu(tsk);
+#endif
+#ifdef CONFIG_ALTIVEC
+ if (usermsr & MSR_VEC)
+ __giveup_altivec(tsk);
+#endif
+#ifdef CONFIG_VSX
+ if (usermsr & MSR_VSX)
+ __giveup_vsx(tsk);
+#endif
+#ifdef CONFIG_SPE
+ if (usermsr & MSR_SPE)
+ __giveup_spe(tsk);
+#endif
+
+ msr_check_and_clear(msr_all_available);
+}
+
void flush_all_to_thread(struct task_struct *tsk)
{
if (tsk->thread.regs) {
preempt_disable();
BUG_ON(tsk != current);
- giveup_all(tsk);
+ save_all(tsk);
#ifdef CONFIG_SPE
if (tsk->thread.regs->msr & MSR_SPE)
--
2.6.2
* [PATCH 7/8] powerpc: Add the ability to save Altivec without giving it up
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
` (5 preceding siblings ...)
2015-11-18 3:26 ` [PATCH 6/8] powerpc: Add the ability to save FPU without giving it up Cyril Bur
@ 2015-11-18 3:26 ` Cyril Bur
2015-11-18 3:26 ` [PATCH 8/8] powerpc: Add the ability to save VSX " Cyril Bur
2015-11-18 14:51 ` [PATCH 0/8] FP/VEC/VSX switching optimisations David Laight
8 siblings, 0 replies; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
This patch adds the ability to save the VEC registers to the thread struct
without giving them up, that is, without disabling the facility for the next
return to userspace.
This patch builds on a previous optimisation for the FPU registers in the
thread copy path to avoid a possibly pointless reload of VEC state.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/switch_to.h | 2 +-
arch/powerpc/kernel/process.c | 12 +++++++++++-
arch/powerpc/kernel/vector.S | 24 ++++--------------------
3 files changed, 16 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 8cf7fd6..372f297 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -39,7 +39,7 @@ static inline void disable_kernel_fp(void)
extern void enable_kernel_altivec(void);
extern void flush_altivec_to_thread(struct task_struct *);
extern void giveup_altivec(struct task_struct *);
-extern void __giveup_altivec(struct task_struct *);
+extern void save_altivec(struct task_struct *);
static inline void disable_kernel_altivec(void)
{
msr_check_and_clear(MSR_VEC);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 51e246a..19e803a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -198,6 +198,16 @@ void enable_kernel_fp(void)
EXPORT_SYMBOL(enable_kernel_fp);
#ifdef CONFIG_ALTIVEC
+void __giveup_altivec(struct task_struct *tsk)
+{
+ save_altivec(tsk);
+ tsk->thread.regs->msr &= ~MSR_VEC;
+#ifdef CONFIG_VSX
+ if (cpu_has_feature(CPU_FTR_VSX))
+ tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
void giveup_altivec(struct task_struct *tsk)
{
check_if_tm_restore_required(tsk);
@@ -443,7 +453,7 @@ void save_all(struct task_struct *tsk)
#endif
#ifdef CONFIG_ALTIVEC
if (usermsr & MSR_VEC)
- __giveup_altivec(tsk);
+ save_altivec(tsk);
#endif
#ifdef CONFIG_VSX
if (usermsr & MSR_VSX)
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 038cff8..51b0c17 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -106,36 +106,20 @@ _GLOBAL(load_up_altivec)
blr
/*
- * __giveup_altivec(tsk)
- * Disable VMX for the task given as the argument,
- * and save the vector registers in its thread_struct.
+ * save_altivec(tsk)
+ * Save the vector registers to its thread_struct
*/
-_GLOBAL(__giveup_altivec)
+_GLOBAL(save_altivec)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r7,THREAD_VRSAVEAREA(r3)
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r7,0
bne 2f
addi r7,r3,THREAD_VRSTATE
-2: PPC_LCMPI 0,r5,0
- SAVE_32VRS(0,r4,r7)
+2: SAVE_32VRS(0,r4,r7)
mfvscr v0
li r4,VRSTATE_VSCR
stvx v0,r4,r7
- beq 1f
- PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
- lis r3,(MSR_VEC|MSR_VSX)@h
-FTR_SECTION_ELSE
- lis r3,MSR_VEC@h
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_VSX)
-#else
- lis r3,MSR_VEC@h
-#endif
- andc r4,r4,r3 /* disable FP for previous task */
- PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
blr
#ifdef CONFIG_VSX
--
2.6.2
* [PATCH 8/8] powerpc: Add the ability to save VSX without giving it up
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
` (6 preceding siblings ...)
2015-11-18 3:26 ` [PATCH 7/8] powerpc: Add the ability to save Altivec " Cyril Bur
@ 2015-11-18 3:26 ` Cyril Bur
2015-11-18 14:51 ` [PATCH 0/8] FP/VEC/VSX switching optimisations David Laight
8 siblings, 0 replies; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 3:26 UTC (permalink / raw)
To: mikey, anton, linuxppc-dev
This patch adds the ability to save the VSX registers to the thread struct
without giving them up, that is, without disabling the facility for the next
return to userspace.
This patch builds on a previous optimisation for the FPU and VEC registers
in the thread copy path to avoid a possibly pointless reload of VSX state.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/switch_to.h | 1 -
arch/powerpc/kernel/ppc_ksyms.c | 4 ----
arch/powerpc/kernel/process.c | 23 ++++++++++++++++++-----
arch/powerpc/kernel/vector.S | 17 -----------------
4 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 372f297..15843d3 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -50,7 +50,6 @@ static inline void disable_kernel_altivec(void)
extern void enable_kernel_vsx(void);
extern void flush_vsx_to_thread(struct task_struct *);
extern void giveup_vsx(struct task_struct *);
-extern void __giveup_vsx(struct task_struct *);
static inline void disable_kernel_vsx(void)
{
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 41e1607..ef7024da 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -28,10 +28,6 @@ EXPORT_SYMBOL(load_vr_state);
EXPORT_SYMBOL(store_vr_state);
#endif
-#ifdef CONFIG_VSX
-EXPORT_SYMBOL_GPL(__giveup_vsx);
-#endif
-
#ifdef CONFIG_EPAPR_PARAVIRT
EXPORT_SYMBOL(epapr_hypercall_start);
#endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 19e803a..e0bb3d7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -248,20 +248,33 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
-void giveup_vsx(struct task_struct *tsk)
+void __giveup_vsx(struct task_struct *tsk)
{
- check_if_tm_restore_required(tsk);
-
- msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
if (tsk->thread.regs->msr & MSR_FP)
__giveup_fpu(tsk);
if (tsk->thread.regs->msr & MSR_VEC)
__giveup_altivec(tsk);
+ tsk->thread.regs->msr &= ~MSR_VSX;
+}
+
+void giveup_vsx(struct task_struct *tsk)
+{
+ check_if_tm_restore_required(tsk);
+
+ msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
__giveup_vsx(tsk);
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
}
EXPORT_SYMBOL(giveup_vsx);
+void save_vsx(struct task_struct *tsk)
+{
+ if (tsk->thread.regs->msr & MSR_FP)
+ save_fpu(tsk);
+ if (tsk->thread.regs->msr & MSR_VEC)
+ save_altivec(tsk);
+}
+
void enable_kernel_vsx(void)
{
WARN_ON(preemptible());
@@ -457,7 +470,7 @@ void save_all(struct task_struct *tsk)
#endif
#ifdef CONFIG_VSX
if (usermsr & MSR_VSX)
- __giveup_vsx(tsk);
+ save_vsx(tsk);
#endif
#ifdef CONFIG_SPE
if (usermsr & MSR_SPE)
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 51b0c17..1c2e7a3 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -151,23 +151,6 @@ _GLOBAL(load_up_vsx)
std r12,_MSR(r1)
b fast_exception_return
-/*
- * __giveup_vsx(tsk)
- * Disable VSX for the task given as the argument.
- * Does NOT save vsx registers.
- */
-_GLOBAL(__giveup_vsx)
- addi r3,r3,THREAD /* want THREAD of task */
- ld r5,PT_REGS(r3)
- cmpdi 0,r5,0
- beq 1f
- ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
- lis r3,MSR_VSX@h
- andc r4,r4,r3 /* disable VSX for previous task */
- std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
- blr
-
#endif /* CONFIG_VSX */
--
2.6.2
* RE: [PATCH 0/8] FP/VEC/VSX switching optimisations
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
` (7 preceding siblings ...)
2015-11-18 3:26 ` [PATCH 8/8] powerpc: Add the ability to save VSX " Cyril Bur
@ 2015-11-18 14:51 ` David Laight
2015-11-18 23:01 ` Cyril Bur
8 siblings, 1 reply; 23+ messages in thread
From: David Laight @ 2015-11-18 14:51 UTC (permalink / raw)
To: 'Cyril Bur', mikey@neuling.org, anton@samba.org,
linuxppc-dev@ozlabs.org
From: Cyril Bur
> Sent: 18 November 2015 03:27
...
> The goal of these patches is to rework how the 'math' registers (FP, VEC
> and VSX) are context switched. Currently the kernel adopts a lazy approach,
> always switching userspace tasks with all three facilities disabled and
> loads in each set of registers upon receiving each unavailable exception.
> The kernel does try to avoid disabling the features in the syscall quick
> path but it during testing it appears that even what should be a simple
> syscall still causes the kernel to use some facilities (vectorised memcpy
> for example) for its self and therefore disable it for the user task.
Perhaps the kernel should be avoiding using these registers?
I wonder if the gain from using vectorised memcpy is typically
enough to warrant the cost of the save and restore?
There may even be scope for kernel code doing a save/restore
of a small number of registers onto an in-stack save area.
It would need to be linked to the data of the thread
that owns the fpu registers so that a save request could
be honoured.
Pre-emption would probably need to be disabled, but nested
use, and use from ISR should be ok.
	David
* Re: [PATCH 0/8] FP/VEC/VSX switching optimisations
2015-11-18 14:51 ` [PATCH 0/8] FP/VEC/VSX switching optimisations David Laight
@ 2015-11-18 23:01 ` Cyril Bur
0 siblings, 0 replies; 23+ messages in thread
From: Cyril Bur @ 2015-11-18 23:01 UTC (permalink / raw)
To: David Laight; +Cc: mikey@neuling.org, anton@samba.org, linuxppc-dev@ozlabs.org
On Wed, 18 Nov 2015 14:51:25 +0000
David Laight <David.Laight@ACULAB.COM> wrote:
> From: Cyril Bur
> > Sent: 18 November 2015 03:27
> ...
> > The goal of these patches is to rework how the 'math' registers (FP, VEC
> > and VSX) are context switched. Currently the kernel adopts a lazy approach,
> > always switching userspace tasks with all three facilities disabled and
> > loads in each set of registers upon receiving each unavailable exception.
> > The kernel does try to avoid disabling the features in the syscall quick
> > path but it during testing it appears that even what should be a simple
> > syscall still causes the kernel to use some facilities (vectorised memcpy
> > for example) for its self and therefore disable it for the user task.
>
Hi David,
> Perhaps the kernel should be avoiding using these registers?
> I wonder if the gain from using vectorised memcpy is typically
> enough to warrant the cost of the save and restore?
>
Yeah, on smaller copies that might be the way to go.
> There may even be scope for kernel code doing a save/restore
> of a small number of registers onto an in-stack save area.
This has been thrown up in the air; there are also the volatile/non-volatile
registers to consider, and the caveat that glibc doesn't quite respect the ABI
here.
As it turns out (and no one is more surprised than me), despite the other
attempts at optimising, this series really has boiled down to removing the need
for processes to take the facility unavailable interrupts.
I do plan to carry on with optimising in this area and will have a look to see
what I can do.
Cyril
> It would need to be linked to the data of the thread
> that owns the fpu registers so that a save request could
> be honoured.
> Pre-emption would probably need to be disabled, but nested
> use, and use from ISR should be ok.
>
> David
>
* Re: [3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext
2015-11-18 3:26 ` [PATCH 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
@ 2015-11-19 11:36 ` Michael Ellerman
2015-11-23 1:04 ` [PATCH 3/8] " Michael Neuling
1 sibling, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2015-11-19 11:36 UTC (permalink / raw)
To: Cyril Bur, mikey, anton, linuxppc-dev
On Wed, 2015-18-11 at 03:26:50 UTC, Cyril Bur wrote:
> Load up the non volatile FPU and VMX regs and ensure that they are the
> expected value in a signal handler
This is giving me:
$ /usr/bin/powerpc-linux-gnu-gcc -Wall -O2 -flto -Wall -Werror -DGIT_VERSION='"v4.0-rc1-55000-g906d582"' -I/home/buildbot/buildbot/slave/selftests-ppc64-gcc-ubuntu-be/build/tools/testing/selftests/powerpc -O2 -g -pthread fpu_signal.c ../harness.c fpu_asm.S -o fpu_signal
fpu_signal.c: In function ‘signal_fpu_sig’:
fpu_signal.c:33:19: error: initialization from incompatible pointer type [-Werror]
mcontext_t *mc = &uc->uc_mcontext;
^
fpu_signal.c:37:9: error: ‘mcontext_t’ has no member named ‘fp_regs’
if (mc->fp_regs[i] != darray[i - 14]) {
^
As well as:
$ /usr/bin/powerpc-linux-gnu-gcc -Wall -O2 -flto -Wall -Werror -DGIT_VERSION='"v4.0-rc1-55000-g906d582"' -I/home/buildbot/buildbot/slave/selftests-ppc64-gcc-ubuntu-be/build/tools/testing/selftests/powerpc -O2 -g -pthread vmx_syscall.c ../harness.c vmx_asm.S -o vmx_syscall
vmx_asm.S: Assembler messages:
vmx_asm.S:61: Error: unsupported relocation against v20
vmx_asm.S:63: Error: unsupported relocation against v21
Times infinity.
And same for vmx_preempt.
And:
$ /usr/bin/powerpc-linux-gnu-gcc -Wall -O2 -flto -Wall -Werror -DGIT_VERSION='"v4.0-rc1-55000-g906d582"' -I/home/buildbot/buildbot/slave/selftests-ppc64-gcc-ubuntu-be/build/tools/testing/selftests/powerpc -O2 -g -pthread vmx_signal.c ../harness.c vmx_asm.S -o vmx_signal
make[1]: *** [vmx_preempt] Error 1
vmx_signal.c: In function ‘signal_vmx_sig’:
vmx_signal.c:37:19: error: initialization from incompatible pointer type [-Werror]
mcontext_t *mc = &uc->uc_mcontext;
^
vmx_signal.c:41:16: error: ‘mcontext_t’ has no member named ‘v_regs’
if (memcmp(mc->v_regs->vrregs[i], &varray[i - 20], 16)) {
^
cheers
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used
2015-11-18 3:26 ` [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
@ 2015-11-20 11:01 ` Michael Ellerman
2015-11-22 22:18 ` Cyril Bur
2015-11-23 1:29 ` Michael Neuling
1 sibling, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2015-11-20 11:01 UTC (permalink / raw)
To: Cyril Bur, mikey, anton, linuxppc-dev
On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index c8b4225..46e9869 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -210,7 +210,54 @@ system_call: /* label this so stack traces look sane */
> li r11,-MAX_ERRNO
> andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> bne- syscall_exit_work
> - cmpld r3,r11
> +
> + /*
> + * This is an assembly version of checks performed in restore_math()
> + * to avoid calling C unless absolutely necessary.
> + * Note: In order to simplify the assembly, if the FP or VEC registers
> + * are hot (and therefore restore_math() isn't called) the
> + * LOAD_{FP,VEC} thread counter doesn't get incremented.
> + * This is likely the best thing to do anyway because hot regs indicate
> + * that the workload is doing a lot of syscalls that can be handled
> + * quickly and without the need to touch FP or VEC regs (by the kernel).
> + * a) If this workload is long running then this is exactly what the
> + * kernel should be doing.
> + * b) If this workload isn't long running then we'll soon fall back to
> + * calling into C and the counter will be incremented regularly again
> + * anyway.
> + */
> + ld r9,PACACURRENT(r13)
> + andi. r0,r8,MSR_FP
> + addi r9,r9,THREAD
> + lbz r5,THREAD_LOAD_FP(r9)
> + /*
> + * Goto 2 if !r0 && r5
> + * The cmpb works because r5 can only have bits set in the lowest byte
> + * and r0 may or may not have bit 13 set (different byte) but will have
> + * a zero low byte therefore the low bytes must differ if r5 == true
> + * and the bit 13 byte must be the same if !r0
> + */
> + cmpb r7,r0,r5
cmpb is new since Power6, which means it doesn't exist on Cell -> Program Check :)
I'm testing a patch using crandc, but I don't like it.
I'm not a big fan of the logic here, it's unpleasantly complicated. Did you
benchmark going to C to do the checks? Or I wonder if we could just check
THREAD_LOAD_FP || THREAD_LOAD_VEC and if either is set we go to restore_math().
Or on the other hand we check !MSR_FP && !MSR_VEC and if so we go to
restore_math()?
> + cmpldi r7,0xff0
> +#ifdef CONFIG_ALTIVEC
> + beq 2f
> +
> + lbz r9,THREAD_LOAD_VEC(r9)
> + andis. r0,r8,MSR_VEC@h
> + /* Skip (goto 3) if r0 || !r9 */
> + bne 3f
> + cmpldi r9,0
> + beq 3f
> +#else
> + bne 3f
> +#endif
> +2: addi r3,r1,STACK_FRAME_OVERHEAD
> + bl restore_math
> + ld r8,_MSR(r1)
> + ld r3,RESULT(r1)
> + li r11,-MAX_ERRNO
> +
> +3: cmpld r3,r11
> ld r5,_CCR(r1)
> bge- syscall_error
> .Lsyscall_error_cont:
cheers
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used
2015-11-20 11:01 ` Michael Ellerman
@ 2015-11-22 22:18 ` Cyril Bur
2015-11-22 23:07 ` Michael Ellerman
0 siblings, 1 reply; 23+ messages in thread
From: Cyril Bur @ 2015-11-22 22:18 UTC (permalink / raw)
To: Michael Ellerman; +Cc: mikey, anton, linuxppc-dev
On Fri, 20 Nov 2015 22:01:04 +1100
Michael Ellerman <mpe@ellerman.id.au> wrote:
> On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index c8b4225..46e9869 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -210,7 +210,54 @@ system_call: /* label this so stack traces look sane */
> > li r11,-MAX_ERRNO
> > andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> > bne- syscall_exit_work
> > - cmpld r3,r11
> > +
> > + /*
> > + * This is an assembly version of checks performed in restore_math()
> > + * to avoid calling C unless absolutely necessary.
> > + * Note: In order to simplify the assembly, if the FP or VEC registers
> > + * are hot (and therefore restore_math() isn't called) the
> > + * LOAD_{FP,VEC} thread counter doesn't get incremented.
> > + * This is likely the best thing to do anyway because hot regs indicate
> > + * that the workload is doing a lot of syscalls that can be handled
> > + * quickly and without the need to touch FP or VEC regs (by the kernel).
> > + * a) If this workload is long running then this is exactly what the
> > + * kernel should be doing.
> > + * b) If this workload isn't long running then we'll soon fall back to
> > + * calling into C and the counter will be incremented regularly again
> > + * anyway.
> > + */
> > + ld r9,PACACURRENT(r13)
> > + andi. r0,r8,MSR_FP
> > + addi r9,r9,THREAD
> > + lbz r5,THREAD_LOAD_FP(r9)
> > + /*
> > + * Goto 2 if !r0 && r5
> > + * The cmpb works because r5 can only have bits set in the lowest byte
> > + * and r0 may or may not have bit 13 set (different byte) but will have
> > + * a zero low byte therefore the low bytes must differ if r5 == true
> > + * and the bit 13 byte must be the same if !r0
> > + */
> > + cmpb r7,r0,r5
>
> cmpb is new since Power6, which means it doesn't exist on Cell -> Program Check :)
>
Oops, sorry.
> I'm testing a patch using crandc, but I don't like it.
>
> I'm not a big fan of the logic here, it's unpleasantly complicated. Did you
> benchmark going to C to do the checks? Or I wonder if we could just check
> THREAD_LOAD_FP || THREAD_LOAD_VEC and if either is set we go to restore_math().
>
I didn't benchmark going to C, mostly because you wanted to avoid calling C
unless necessary in that path. Based off the results I got benchmarking this
series, I expect calling C will also be in the noise of removing the
exception.
> Or on the other hand we check !MSR_FP && !MSR_VEC and if so we go to
> restore_math()?
>
That seems like the best check to leave in the assembly if you want to avoid
complicated assembly in there.
> > + cmpldi r7,0xff0
> > +#ifdef CONFIG_ALTIVEC
> > + beq 2f
> > +
> > + lbz r9,THREAD_LOAD_VEC(r9)
> > + andis. r0,r8,MSR_VEC@h
> > + /* Skip (goto 3) if r0 || !r9 */
> > + bne 3f
> > + cmpldi r9,0
> > + beq 3f
> > +#else
> > + bne 3f
> > +#endif
> > +2: addi r3,r1,STACK_FRAME_OVERHEAD
> > + bl restore_math
> > + ld r8,_MSR(r1)
> > + ld r3,RESULT(r1)
> > + li r11,-MAX_ERRNO
> > +
> > +3: cmpld r3,r11
> > ld r5,_CCR(r1)
> > bge- syscall_error
> > .Lsyscall_error_cont:
>
>
> cheers
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used
2015-11-22 22:18 ` Cyril Bur
@ 2015-11-22 23:07 ` Michael Ellerman
0 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2015-11-22 23:07 UTC (permalink / raw)
To: Cyril Bur; +Cc: mikey, anton, linuxppc-dev
On Mon, 2015-11-23 at 09:18 +1100, Cyril Bur wrote:
> On Fri, 20 Nov 2015 22:01:04 +1100
> Michael Ellerman <mpe@ellerman.id.au> wrote:
> > On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> > > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > > index c8b4225..46e9869 100644
> > > --- a/arch/powerpc/kernel/entry_64.S
> > > +++ b/arch/powerpc/kernel/entry_64.S
> > > @@ -210,7 +210,54 @@ system_call: /* label this so stack traces look sane */
> > > li r11,-MAX_ERRNO
> > > andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> > > bne- syscall_exit_work
> > > - cmpld r3,r11
> > > +
> > > + /*
> > > + * This is an assembly version of checks performed in restore_math()
> > > + * to avoid calling C unless absolutely necessary.
> > > + * Note: In order to simplify the assembly, if the FP or VEC registers
> > > + * are hot (and therefore restore_math() isn't called) the
> > > + * LOAD_{FP,VEC} thread counter doesn't get incremented.
> > > + * This is likely the best thing to do anyway because hot regs indicate
> > > + * that the workload is doing a lot of syscalls that can be handled
> > > + * quickly and without the need to touch FP or VEC regs (by the kernel).
> > > + * a) If this workload is long running then this is exactly what the
> > > + * kernel should be doing.
> > > + * b) If this workload isn't long running then we'll soon fall back to
> > > + * calling into C and the counter will be incremented regularly again
> > > + * anyway.
> > > + */
> > > + ld r9,PACACURRENT(r13)
> > > + andi. r0,r8,MSR_FP
> > > + addi r9,r9,THREAD
> > > + lbz r5,THREAD_LOAD_FP(r9)
> > > + /*
> > > + * Goto 2 if !r0 && r5
> > > + * The cmpb works because r5 can only have bits set in the lowest byte
> > > + * and r0 may or may not have bit 13 set (different byte) but will have
> > > + * a zero low byte therefore the low bytes must differ if r5 == true
> > > + * and the bit 13 byte must be the same if !r0
> > > + */
> > > + cmpb r7,r0,r5
> >
> > cmpb is new since Power6, which means it doesn't exist on Cell -> Program Check :)
> >
> Oops, sorry.
That's fine, there's almost no way for you to know that from reading the
documentation.
> > I'm testing a patch using crandc, but I don't like it.
> >
> > I'm not a big fan of the logic here, it's unpleasantly complicated. Did you
> > benchmark going to C to do the checks? Or I wonder if we could just check
> > THREAD_LOAD_FP || THREAD_LOAD_VEC and if either is set we go to restore_math().
> >
>
> I didn't benchmark going to C mostly because you wanted to avoid calling C
> unless necessary in that path. Based off the results I got benchmarking the
> this series I expect calling C will also be in the noise of removing the
> exception.
Yeah I figured it was probably me that said "avoid C at all costs". But I've
changed my mind ;)
> > Or on the other hand we check !MSR_FP && !MSR_VEC and if so we go to
> > restore_math()?
>
> That seems like the best check to leave in the assembly if you want to avoid
> complicated assembly in there.
Cool. If you can benchmark that that'd be great, mmkay.
cheers
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall
2015-11-18 3:26 ` [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
@ 2015-11-23 0:23 ` Michael Neuling
2015-11-23 0:58 ` Cyril Bur
0 siblings, 1 reply; 23+ messages in thread
From: Michael Neuling @ 2015-11-23 0:23 UTC (permalink / raw)
To: Cyril Bur, anton, linuxppc-dev
On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> Test that the non volatile floating point and Altivec registers get
> correctly preserved across the fork() syscall.
Can we add a test for VSX too? I realise it's the same registers, but
the enable bits in the MSR are different so it's easy to get them wrong
in the kernel.
Additional comments below.
> fork() works nicely for this purpose, the registers should be the same for
> both parent and child
>
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
>  tools/testing/selftests/powerpc/Makefile           |   3 +-
>  tools/testing/selftests/powerpc/math/Makefile      |  14 ++
>  tools/testing/selftests/powerpc/math/basic_asm.h   |  26 +++
>  tools/testing/selftests/powerpc/math/fpu_asm.S     | 151 +++++++++++++++++
>  tools/testing/selftests/powerpc/math/fpu_syscall.c |  79 +++++++++
>  tools/testing/selftests/powerpc/math/vmx_asm.S     | 183 +++++++++++++++++++
>  tools/testing/selftests/powerpc/math/vmx_syscall.c |  81 +++++++++
>  7 files changed, 536 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/math/Makefile
>  create mode 100644 tools/testing/selftests/powerpc/math/basic_asm.h
>  create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
>  create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
>  create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
>  create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c
>
> diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
> index 0c2706b..19e8191 100644
> --- a/tools/testing/selftests/powerpc/Makefile
> +++ b/tools/testing/selftests/powerpc/Makefile
> @@ -22,7 +22,8 @@ SUB_DIRS = benchmarks		\
> 		switch_endian		\
> 		syscalls		\
> 		tm			\
> -		vphn
> +		vphn			\
> +		math
>
>  endif
>
> diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
> new file mode 100644
> index 0000000..896d9e2
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/Makefile
> @@ -0,0 +1,14 @@
> +TEST_PROGS := fpu_syscall vmx_syscall
Add a new .gitignore in this dir for these new build objects.
> +
> +all: $(TEST_PROGS)
> +
> +$(TEST_PROGS): ../harness.c
> +$(TEST_PROGS): CFLAGS += -O2 -g
> +
> +fpu_syscall: fpu_asm.S
> +vmx_syscall: vmx_asm.S
> +
> +include ../../lib.mk
> +
> +clean:
> +	rm -f $(TEST_PROGS) *.o
> diff --git a/tools/testing/selftests/powerpc/math/basic_asm.h b/tools/testing/selftests/powerpc/math/basic_asm.h
> new file mode 100644
> index 0000000..27aca79
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/basic_asm.h
Can you put this up a directory since it's generically useful for
powerpc?
> @@ -0,0 +1,26 @@
> +#include
> +#include
> +
> +#define LOAD_REG_IMMEDIATE(reg,expr) \
> +	lis	reg,(expr)@highest;	\
> +	ori	reg,reg,(expr)@higher;	\
> +	rldicr	reg,reg,32,31;	\
> +	oris	reg,reg,(expr)@high;	\
> +	ori	reg,reg,(expr)@l;
> +
> +#define PUSH_BASIC_STACK(size) \
> +	std	2,24(sp); \
> +	mflr	r0; \
> +	std	r0,16(sp); \
> +	mfcr	r0; \
> +	stw	r0,8(sp); \
> +	stdu	sp,-size(sp);
> +
> +#define POP_BASIC_STACK(size) \
> +	addi	sp,sp,size; \
> +	ld	2,24(sp); \
> +	ld	r0,16(sp); \
> +	mtlr	r0; \
> +	lwz	r0,8(sp); \
> +	mtcr	r0;
> +
> diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
> new file mode 100644
> index 0000000..d5412c1
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
> @@ -0,0 +1,151 @@
> +#include "basic_asm.h"
> +
> +#define PUSH_FPU(pos) \
> +	stfd	f14,pos(sp); \
> +	stfd	f15,pos+8(sp); \
> +	stfd	f16,pos+16(sp); \
> +	stfd	f17,pos+24(sp); \
> +	stfd	f18,pos+32(sp); \
> +	stfd	f19,pos+40(sp); \
> +	stfd	f20,pos+48(sp); \
> +	stfd	f21,pos+56(sp); \
> +	stfd	f22,pos+64(sp); \
> +	stfd	f23,pos+72(sp); \
> +	stfd	f24,pos+80(sp); \
> +	stfd	f25,pos+88(sp); \
> +	stfd	f26,pos+96(sp); \
> +	stfd	f27,pos+104(sp); \
> +	stfd	f28,pos+112(sp); \
> +	stfd	f29,pos+120(sp); \
> +	stfd	f30,pos+128(sp); \
> +	stfd	f31,pos+136(sp);
> +
> +#define POP_FPU(pos) \
> +	lfd	f14,pos(sp); \
> +	lfd	f15,pos+8(sp); \
> +	lfd	f16,pos+16(sp); \
> +	lfd	f17,pos+24(sp); \
> +	lfd	f18,pos+32(sp); \
> +	lfd	f19,pos+40(sp); \
> +	lfd	f20,pos+48(sp); \
> +	lfd	f21,pos+56(sp); \
> +	lfd	f22,pos+64(sp); \
> +	lfd	f23,pos+72(sp); \
> +	lfd	f24,pos+80(sp); \
> +	lfd	f25,pos+88(sp); \
> +	lfd	f26,pos+96(sp); \
> +	lfd	f27,pos+104(sp); \
> +	lfd	f28,pos+112(sp); \
> +	lfd	f29,pos+120(sp); \
> +	lfd	f30,pos+128(sp); \
> +	lfd	f31,pos+136(sp);
> +
> +#Careful calling this, it will 'clobber' fpu (by design)
> +#Don't call this from C
> +FUNC_START(load_fpu)
> +	lfd	f14,0(r3)
> +	lfd	f15,8(r3)
> +	lfd	f16,16(r3)
> +	lfd	f17,24(r3)
> +	lfd	f18,32(r3)
> +	lfd	f19,40(r3)
> +	lfd	f20,48(r3)
> +	lfd	f21,56(r3)
> +	lfd	f22,64(r3)
> +	lfd	f23,72(r3)
> +	lfd	f24,80(r3)
> +	lfd	f25,88(r3)
> +	lfd	f26,96(r3)
> +	lfd	f27,104(r3)
> +	lfd	f28,112(r3)
> +	lfd	f29,120(r3)
> +	lfd	f30,128(r3)
> +	lfd	f31,136(r3)
> +	blr
> +FUNC_END(load_fpu)
> +
> +FUNC_START(check_fpu)
> +	mr	r4,r3
> +	li	r3,1 #assume a bad result
> +	lfd	f0,0(r4)
> +	fcmpu	cr1,f0,f14
> +	bne	cr1,1f
> +	lfd	f0,8(r4)
> +	fcmpu	cr1,f0,f15
> +	bne	cr1,1f
> +	lfd	f0,16(r4)
> +	fcmpu	cr1,f0,f16
> +	bne	cr1,1f
> +	lfd	f0,24(r4)
> +	fcmpu	cr1,f0,f17
> +	bne	cr1,1f
> +	lfd	f0,32(r4)
> +	fcmpu	cr1,f0,f18
> +	bne	cr1,1f
> +	lfd	f0,40(r4)
> +	fcmpu	cr1,f0,f19
> +	bne	cr1,1f
> +	lfd	f0,48(r4)
> +	fcmpu	cr1,f0,f20
> +	bne	cr1,1f
> +	lfd	f0,56(r4)
> +	fcmpu	cr1,f0,f21
> +	bne	cr1,1f
> +	lfd	f0,64(r4)
> +	fcmpu	cr1,f0,f22
> +	bne	cr1,1f
> +	lfd	f0,72(r4)
> +	fcmpu	cr1,f0,f23
> +	bne	cr1,1f
> +	lfd	f0,80(r4)
> +	fcmpu	cr1,f0,f24
> +	bne	cr1,1f
> +	lfd	f0,88(r4)
> +	fcmpu	cr1,f0,f25
> +	bne	cr1,1f
> +	lfd	f0,96(r4)
> +	fcmpu	cr1,f0,f26
> +	bne	cr1,1f
> +	lfd	f0,104(r4)
> +	fcmpu	cr1,f0,f27
> +	bne	cr1,1f
> +	lfd	f0,112(r4)
> +	fcmpu	cr1,f0,f28
> +	bne	cr1,1f
> +	lfd	f0,120(r4)
> +	fcmpu	cr1,f0,f29
> +	bne	cr1,1f
> +	lfd	f0,128(r4)
> +	fcmpu	cr1,f0,f30
> +	bne	cr1,1f
> +	lfd	f0,136(r4)
> +	fcmpu	cr1,f0,f31
> +	bne	cr1,1f
> +	li	r3,0 #Success!!!
> +1:	blr
> +
> +FUNC_START(test_fpu)
> +	#r3 holds pointer to where to put the result of fork
#r4 seems to hold a ptr to the pid
> +	#f14-f31 are non volatiles
> +	PUSH_BASIC_STACK(256)
> +	std	r3,40(sp) #Address of darray
> +	std	r4,48(sp) #Address of pid
> +	PUSH_FPU(56)
> +
> +	bl load_fpu
> +	nop
> +	li	r0,__NR_fork
> +	sc
> +
> +	#pass the result of the fork to the caller
> +	ld	r9,48(sp)
> +	std	r3,0(r9)
> +
> +	ld	r3,40(sp)
> +	bl check_fpu
> +	nop
> +
> +	POP_FPU(56)
> +	POP_BASIC_STACK(256)
> +	blr
> +FUNC_END(test_fpu)
> diff --git a/tools/testing/selftests/powerpc/math/fpu_syscall.c b/tools/testing/selftests/powerpc/math/fpu_syscall.c
> new file mode 100644
> index 0000000..a967fd6
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/fpu_syscall.c
> @@ -0,0 +1,79 @@
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "utils.h"
> +
> +extern int test_fpu(double *darray, pid_t *pid);
> +
> +double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
> +		1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
> +		2.1};
> +
> +int syscall_fpu(void)
> +{
> +	pid_t fork_pid;
> +	int i;
> +	int ret;
> +	int child_ret;
> +	for (i = 0; i < 1000; i++) {
> +		/* test_fpu will fork() */
> +		ret = test_fpu(darray, &fork_pid);
> +		if (fork_pid == -1)
> +			return -1;
> +		if (fork_pid == 0)
> +			exit(ret);
> +		waitpid(fork_pid, &child_ret, 0);
> +		if (ret || child_ret)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +int test_syscall_fpu(void)
> +{
> +	/*
> +	 * Setup an environment with much context switching
> +	 */
> +	pid_t pid2;
> +	pid_t pid = fork();
> +	int ret;
> +	int child_ret;
> +	FAIL_IF(pid == -1);
> +
> +	pid2 = fork();
> +	/* Can't FAIL_IF(pid2 == -1); because already forked once */
> +	if (pid2 == -1) {
> +		/*
> +		 * Couldn't fork, ensure test is a fail
> +		 */
> +		child_ret = ret = 1;
> +	} else {
> +		ret = syscall_fpu();
> +		if (pid2)
> +			waitpid(pid2, &child_ret, 0);
> +		else
> +			exit(ret);
> +	}
> +
> +	ret |= child_ret;
> +
> +	if (pid)
> +		waitpid(pid, &child_ret, 0);
> +	else
> +		exit(ret);
> +
> +	FAIL_IF(ret || child_ret);
> +	return 0;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	return test_harness(test_syscall_fpu, "syscall_fpu");
> +
> +}
> diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
> new file mode 100644
> index 0000000..e642e67
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
> @@ -0,0 +1,183 @@
> +#include "basic_asm.h"
> +
> +#define PUSH_VMX(pos,reg) \
> +	li	reg,pos; \
> +	stvx	v20,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v21,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v22,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v23,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v24,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v25,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v26,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v27,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v28,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v29,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v30,reg,sp; \
> +	addi	reg,reg,16; \
> +	stvx	v31,reg,sp;
> +
> +#define POP_VMX(pos,reg) \
> +	li	reg,pos; \
> +	lvx	v20,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v21,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v22,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v23,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v24,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v25,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v26,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v27,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v28,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v29,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v30,reg,sp; \
> +	addi	reg,reg,16; \
> +	lvx	v31,reg,sp;
> +#Careful, this will 'clobber' vmx (by design)
> +#Don't call this from C
> +FUNC_START(load_vmx)
> +	li	r5,0
> +	lvx	v20,r5,r3
> +	addi	r5,r5,16
> +	lvx	v21,r5,r3
> +	addi	r5,r5,16
> +	lvx	v22,r5,r3
> +	addi	r5,r5,16
> +	lvx	v23,r5,r3
> +	addi	r5,r5,16
> +	lvx	v24,r5,r3
> +	addi	r5,r5,16
> +	lvx	v25,r5,r3
> +	addi	r5,r5,16
> +	lvx	v26,r5,r3
> +	addi	r5,r5,16
> +	lvx	v27,r5,r3
> +	addi	r5,r5,16
> +	lvx	v28,r5,r3
> +	addi	r5,r5,16
> +	lvx	v29,r5,r3
> +	addi	r5,r5,16
> +	lvx	v30,r5,r3
> +	addi	r5,r5,16
> +	lvx	v31,r5,r3
> +	blr
> +FUNC_END(load_vmx)
> +
> +#Should be safe from C, only touches r4, r5 and v0,v1,v2
> +FUNC_START(check_vmx)
> +	PUSH_BASIC_STACK(16)
> +	mr	r4,r3
> +	li	r3,1 #assume a bad result
> +	li	r5,0
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v20
> +	vmr	v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v21
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v22
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v23
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v24
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v25
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v26
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v27
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v28
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v29
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v30
> +	vand	v2,v2,v1
> +
> +	addi	r5,r5,16
> +	lvx	v0,r5,r4
> +	vcmpequd.	v1,v0,v31
> +	vand	v2,v2,v1
> +
> +	li	r5,0
> +	stvx	v2,r5,sp
> +	ldx	r0,r5,sp
> +	cmpdi	r0,0xffffffff
> +	bne	1f
> +	li	r3,0
> +1:	POP_BASIC_STACK(16)
> +	blr
> +FUNC_END(check_vmx)
> +
> +#Safe from C
> +FUNC_START(test_vmx)
> +	#r3 holds pointer to where to put the result of fork
> +	#v20-v31 are non-volatile
> +	PUSH_BASIC_STACK(512)
> +	std	r3,40(sp) #Address of varray
> +	std	r4,48(sp) #address of pid
> +	PUSH_VMX(56, r4)
> +
> +	bl load_vmx
> +
> +	li	r0,__NR_fork
> +	sc
> +	#Pass the result of fork back to the caller
> +	ld	r9,48(sp)
> +	std	r3,0(r9)
> +
> +	ld	r3,40(sp)
> +	bl check_vmx
> +
> +	POP_VMX(56,r4)
> +	POP_BASIC_STACK(512)
> +	blr
> +FUNC_END(test_vmx)
> diff --git a/tools/testing/selftests/powerpc/math/vmx_syscall.c b/tools/testing/selftests/powerpc/math/vmx_syscall.c
> new file mode 100644
> index 0000000..7adff05
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/vmx_syscall.c
> @@ -0,0 +1,81 @@
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "utils.h"
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
> +	{13,14,15,16},{17,18,19,20},{21,22,23,24},
> +	{25,26,27,28},{29,30,31,32},{33,34,35,36},
> +	{37,38,39,40},{41,42,43,44},{45,46,47,48}};
> +
> +extern int test_vmx(v4si *varray, pid_t *pid);
> +
> +int vmx_syscall(void)
> +{
> +	pid_t fork_pid;
> +	int i;
> +	int ret;
> +	int child_ret;
> +	for (i = 0; i < 1000; i++) {
> +		/* test_vmx will fork() */
> +		ret = test_vmx(varray, &fork_pid);
> +		if (fork_pid == -1)
> +			return -1;
> +		if (fork_pid == 0)
> +			exit(ret);
> +		waitpid(fork_pid, &child_ret, 0);
> +		if (ret || child_ret)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +int test_vmx_syscall(void)
> +{
> +	/*
> +	 * Setup an environment with much context switching
> +	 */
> +	pid_t pid2;
> +	pid_t pid = fork();
> +	int ret;
> +	int child_ret;
> +	FAIL_IF(pid == -1);
> +
> +	pid2 = fork();
> +	ret = vmx_syscall();
> +	/* Can't FAIL_IF(pid2 == -1); because we've already forked */
> +	if (pid2 == -1) {
> +		/*
> +		 * Couldn't fork, ensure child_ret is set and is a fail
> +		 */
> +		ret = child_ret = 1;
> +	} else {
> +		if (pid2)
> +			waitpid(pid2, &child_ret, 0);
> +		else
> +			exit(ret);
> +	}
> +
> +	ret |= child_ret;
> +
> +	if (pid)
> +		waitpid(pid, &child_ret, 0);
> +	else
> +		exit(ret);
> +
> +	FAIL_IF(ret || child_ret);
> +	return 0;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	return test_harness(test_vmx_syscall, "vmx_syscall");
> +
> +}
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/8] selftests/powerpc: Test preservation of FPU and VMX regs across preemption
2015-11-18 3:26 ` [PATCH 2/8] selftests/powerpc: Test preservation of FPU and VMX regs across preemption Cyril Bur
@ 2015-11-23 0:34 ` Michael Neuling
0 siblings, 0 replies; 23+ messages in thread
From: Michael Neuling @ 2015-11-23 0:34 UTC (permalink / raw)
To: Cyril Bur, anton, linuxppc-dev
On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> Loop in assembly checking the registers with many threads.
>
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
>  tools/testing/selftests/powerpc/math/Makefile      |   7 +-
>  tools/testing/selftests/powerpc/math/fpu_asm.S     |  34 ++++++++
>  tools/testing/selftests/powerpc/math/fpu_preempt.c |  92 ++++++++++++++++++
>  tools/testing/selftests/powerpc/math/vmx_asm.S     |  44 ++++++++++-
>  tools/testing/selftests/powerpc/math/vmx_preempt.c |  92 ++++++++++++++++++
>  5 files changed, 263 insertions(+), 6 deletions(-)
>  create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
>  create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
>
> diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
> index 896d9e2..9fa690f 100644
> --- a/tools/testing/selftests/powerpc/math/Makefile
> +++ b/tools/testing/selftests/powerpc/math/Makefile
> @@ -1,12 +1,15 @@
> -TEST_PROGS := fpu_syscall vmx_syscall
> +TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
.gitignore for this new build object
>  all: $(TEST_PROGS)
>
>  $(TEST_PROGS): ../harness.c
> -$(TEST_PROGS): CFLAGS += -O2 -g
> +$(TEST_PROGS): CFLAGS += -O2 -g -pthread
>
>  fpu_syscall: fpu_asm.S
> +fpu_preempt: fpu_asm.S
> +
>  vmx_syscall: vmx_asm.S
> +vmx_preempt: vmx_asm.S
>
>  include ../../lib.mk
>
> diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
> index d5412c1..5ff0adc 100644
> --- a/tools/testing/selftests/powerpc/math/fpu_asm.S
> +++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
> @@ -149,3 +149,37 @@ FUNC_START(test_fpu)
> 	POP_BASIC_STACK(256)
> 	blr
>  FUNC_END(test_fpu)
> +
> +#int preempt_fpu(double *darray, volatile int *not_ready, int *sentinal)
> +#On starting will (atomically) decrement not_ready as a signal that the FPU
> +#has been loaded with darray. Will proceed to check the validity of the FPU
> +#registers while sentinal is not zero.
> +FUNC_START(preempt_fpu)
> +	PUSH_BASIC_STACK(256)
> +	std	r3,32(sp) #double *darray
> +	std	r4,40(sp) #volatile int *not_ready
> +	std	r5,48(sp) #int *sentinal
> +	PUSH_FPU(56)
> +
> +	bl load_fpu
> +
Memory barrier here.
> +	#Atomic DEC
> +	ld	r3,40(sp)
> +1:	lwarx	r4,0,r3
> +	addi	r4,r4,-1
> +	stwcx.	r4,0,r3
> +	bne-	1b
> +
> +2:	ld	r3,32(sp)
> +	bl check_fpu
> +	cmpdi	r3,0
> +	bne	3f
> +	ld	r4,48(sp)
> +	ld	r5,0(r4)
> +	cmpwi	r5,0
> +	bne	2b
> +
> +3:	POP_FPU(56)
> +	POP_BASIC_STACK(256)
> +	blr
> +FUNC_END(preempt_fpu)
> diff --git a/tools/testing/selftests/powerpc/math/fpu_preempt.c b/tools/testing/selftests/powerpc/math/fpu_preempt.c
> new file mode 100644
> index 0000000..e24cf9b
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/fpu_preempt.c
Needs a copyright and a description of the test here.
Same in patch 1.
> @@ -0,0 +1,92 @@
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <stdlib.h>
> +#include <pthread.h>
> +
> +#include "utils.h"
> +
> +/* Time to wait for workers to get preempted (seconds) */
> +#define PREEMPT_TIME 20
> +/*
> + * Factor by which to multiply number of online CPUs for total number of
> + * worker threads
> + */
> +#define THREAD_FACTOR 8
> +
> +
> +__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
> +> > > 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
> +> > > 2.1};
> +
> +volatile int not_ready;
> +int running;
> +
> +extern void preempt_fpu(double *darray, volatile int *not_ready, int *sentinal);
> +
> +void *preempt_fpu_c(void *p)
> +{
> +> > int i;
> +> > srand(pthread_self());
> +> > for (i = 0; i < 21; i++)
> +> > > darray[i] = rand();
> +
> +> > /* Test failed if it ever returns */
> +> > preempt_fpu(darray, &not_ready, &running);
> +
> +> > return p;
> +}
> +
> +int test_preempt_fpu(void)
> +{
> +> > int i, rc, threads;
> +> > pthread_t *tids;
> +
> +> > threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
> +> > tids = malloc((threads) * sizeof(pthread_t));
> +> > FAIL_IF(!tids);
> +
> +> > running = true;
> +> > not_ready = threads;
> +> > for (i = 0; i < threads; i++) {
> +> > > rc = pthread_create(&tids[i], NULL, preempt_fpu_c, NULL);
> +> > > FAIL_IF(rc);
> +> > }
> +
> +> > setbuf(stdout, NULL);
> +> > /* Not really necessary but nice to wait for every thread to start */
> +> > printf("\tWaiting for all workers to start...");
> +> > while(not_ready);
You need a memory barrier here and a matching one in the asm which is
decrementing it. Same in the VSX test.
> +> > printf("done\n");
> +
> +> > printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
> +> > sleep(PREEMPT_TIME);
> +> > printf("done\n");
> +
> +> > printf("\tKilling workers...");
> +> > running = 0;
> +> > for (i = 0; i < threads; i++) {
> +> > > void *rc_p;
> +> > > pthread_join(tids[i], &rc_p);
> +
> +> > > /*
> +> > > * Harness will say the fail was here, look at why preempt_fpu
> +> > > * returned
> +> > > */
> +> > > if ((long) rc_p)
> +> > > > printf("oops\n");
> +> > > FAIL_IF((long) rc_p);
> +> > }
> +> > printf("done\n");
> +
> +> > free(tids);
> +> > return 0;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +> > return test_harness(test_preempt_fpu, "fpu_preempt");
> +}
> diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
> index e642e67..23db4b3 100644
> --- a/tools/testing/selftests/powerpc/math/vmx_asm.S
> +++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
> @@ -1,5 +1,6 @@
> #include "basic_asm.h"
>
> +#POS MUST BE 16 ALIGNED!
> #define PUSH_VMX(pos,reg) \
> > > li> > reg,pos; \
> > > stvx> > v20,reg,sp; \
> @@ -26,6 +27,7 @@
> > > addi> > reg,reg,16; \
> > > stvx> > v31,reg,sp;
>
> +#POS MUST BE 16 ALIGNED!
> #define POP_VMX(pos,reg) \
> > > li> > reg,pos; \
> > > lvx> > v20,reg,sp; \
> @@ -84,7 +86,7 @@ FUNC_END(load_vmx)
>
> #Should be safe from C, only touches r4, r5 and v0,v1,v2
> FUNC_START(check_vmx)
> -> > PUSH_BASIC_STACK(16)
> +> > PUSH_BASIC_STACK(32)
> > > mr r4,r3
> > > li> > r3,1 #assume a bad result
> > > li> > r5,0
> @@ -153,7 +155,7 @@ FUNC_START(check_vmx)
> > > cmpdi> > r0,0xffffffff
> > > bne> > 1f
> > > li> > r3,0
> -1:> > POP_BASIC_STACK(16)
> +1:> > POP_BASIC_STACK(32)
> > > blr
> FUNC_END(check_vmx)
>
> @@ -164,7 +166,7 @@ FUNC_START(test_vmx)
> > > PUSH_BASIC_STACK(512)
> > > std> > r3,40(sp) #Address of varray
> > > std r4,48(sp) #address of pid
> -> > PUSH_VMX(56, r4)
> +> > PUSH_VMX(64, r4)
>
> > > bl load_vmx
>
> @@ -177,7 +179,41 @@ FUNC_START(test_vmx)
> > > ld r3,40(sp)
> > > bl check_vmx
>
> -> > POP_VMX(56,r4)
> +> > POP_VMX(64,r4)
> > > POP_BASIC_STACK(512)
> > > blr
> FUNC_END(test_vmx)
> +
> +#int preempt_vmx(v4si *varray, volatile int *not_ready, int *sentinal)
> +#On starting will (atomically) decrement not_ready as a signal that the VMX
> +#registers have been loaded with varray. Will proceed to check the validity
> +#of the VMX registers while sentinal is not zero.
> +FUNC_START(preempt_vmx)
> +> > PUSH_BASIC_STACK(512)
> +> > std r3,32(sp) #v4si *varray
> +> > std r4,40(sp) #volatile int *not_ready
> +> > std r5,48(sp) #int *sentinal
> +> > PUSH_VMX(64,r4)
> +
> +> > bl load_vmx
> +
Memory barrier here
> +> > #Atomic DEC
> +> > ld r3,40(sp)
> +1:> > lwarx r4,0,r3
> +> > addi r4,r4,-1
> +> > stwcx. r4,0,r3
> +> > bne- 1b
> +
> +2:> > ld r3,32(sp)
> +> > bl check_vmx
> +> > cmpdi r3,0
> +> > bne 3f
> +> > ld r4,48(sp)
> +> > ld r5,0(r4)
> +> > cmpwi r5,0
> +> > bne 2b
> +
> +3:> > POP_VMX(64,r4)
> +> > POP_BASIC_STACK(512)
> +> > blr
> +FUNC_END(preempt_vmx)
> diff --git a/tools/testing/selftests/powerpc/math/vmx_preempt.c b/tools/testing/selftests/powerpc/math/vmx_preempt.c
> new file mode 100644
> index 0000000..342db15
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/vmx_preempt.c
Needs a copyright and a description of the test here.
> @@ -0,0 +1,92 @@
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <stdlib.h>
> +#include <pthread.h>
> +
> +#include "utils.h"
> +
> +/* Time to wait for workers to get preempted (seconds) */
> +#define PREEMPT_TIME 20
> +/*
> + * Factor by which to multiply number of online CPUs for total number of
> + * worker threads
> + */
> +#define THREAD_FACTOR 8
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +__thread v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
> +> > {13,14,15,16},{17,18,19,20},{21,22,23,24},
> +> > {25,26,27,28},{29,30,31,32},{33,34,35,36},
> +> > {37,38,39,40},{41,42,43,44},{45,46,47,48}};
> +
> +volatile int not_ready;
I really hate this name. How about just "ready" and you negate the
code.
> +int running;
Now about not_running ;-P
> +
> +extern void preempt_vmx(v4si *varray, volatile int *not_ready, int *sentinal);
> +
> +void *preempt_vmx_c(void *p)
> +{
> +> > int i, j;
> +> > srand(pthread_self());
> +> > for (i = 0; i < 12; i++)
> +> > > for (j = 0; j < 4; j++)
> +> > > > varray[i][j] = rand();
> +
> +> > /* Test fails if it ever returns */
> +> > preempt_vmx(varray, &not_ready, &running);
> +> > return p;
> +}
> +
> +int test_preempt_vmx(void)
> +{
> +> > int i, rc, threads;
> +> > pthread_t *tids;
> +
> +> > threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
> +> > tids = malloc(threads * sizeof(pthread_t));
> +> > FAIL_IF(!tids);
> +
> +> > running = true;
> +> > not_ready = threads;
> +> > for (i = 0; i < threads; i++) {
> +> > > rc = pthread_create(&tids[i], NULL, preempt_vmx_c, NULL);
> +> > > FAIL_IF(rc);
> +> > }
> +
> +> > setbuf(stdout, NULL);
> +> > /* Not really necessary but nice to wait for every thread to start */
> +> > printf("\tWaiting for all workers to start...");
> +> > while(not_ready);
Again, a memory barrier here.
> +> > printf("done\n");
> +
> +> > printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
> +> > sleep(PREEMPT_TIME);
> +> > printf("done\n");
> +
> +> > printf("\tKilling workers...");
> +> > running = 0;
> +> > for (i = 0; i < threads; i++) {
> +> > > void *rc_p;
> +> > > pthread_join(tids[i], &rc_p);
> +
> +> > > /*
> +> > > * Harness will say the fail was here, look at why preempt_vmx
> +> > > * returned
> +> > > */
> +> > > if ((long) rc_p)
> +> > > > printf("oops\n");
> +> > > FAIL_IF((long) rc_p);
> +> > }
> +> > printf("done\n");
> +
> +> > return 0;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +> > return test_harness(test_preempt_vmx, "vmx_preempt");
> +}
* Re: [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall
2015-11-23 0:23 ` Michael Neuling
@ 2015-11-23 0:58 ` Cyril Bur
2015-11-23 1:06 ` Michael Neuling
0 siblings, 1 reply; 23+ messages in thread
From: Cyril Bur @ 2015-11-23 0:58 UTC (permalink / raw)
To: Michael Neuling; +Cc: anton, linuxppc-dev
On Mon, 23 Nov 2015 11:23:13 +1100
Michael Neuling <mikey@neuling.org> wrote:
> On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> > Test that the non volatile floating point and Altivec registers get
> > correctly preserved across the fork() syscall.
>
> Can we add a test for VSX too? I realise it's the same registers, but
> the enable bits in the MSR are different so it's easy to get them wrong
> in the kernel.
Yeah, I'm sure I could get that wrong haha.
Hmmmm this got me thinking. Today we always enable FP and Altivec when we
enable VSX but isn't there a world where we could actually run with FP and
Altivec disabled and VSX on? In which case, is the whole thing volatile or
does the kernel still need to save the subset of the matrix which corresponds
to non-volatile FPs and non-volatile Altivec?
>
> Additional comments below.
>
> > fork() works nicely for this purpose, the registers should be the same for
> > both parent and child
> >
> > Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> > ---
> > tools/testing/selftests/powerpc/Makefile | 3 +-
> > tools/testing/selftests/powerpc/math/Makefile | 14 ++
> > tools/testing/selftests/powerpc/math/basic_asm.h | 26 +++
> > tools/testing/selftests/powerpc/math/fpu_asm.S | 151 +++++++++++++++++
> > tools/testing/selftests/powerpc/math/fpu_syscall.c | 79 +++++++++
> > tools/testing/selftests/powerpc/math/vmx_asm.S | 183 +++++++++++++++++++++
> > tools/testing/selftests/powerpc/math/vmx_syscall.c | 81 +++++++++
> > 7 files changed, 536 insertions(+), 1 deletion(-)
> > create mode 100644 tools/testing/selftests/powerpc/math/Makefile
> > create mode 100644 tools/testing/selftests/powerpc/math/basic_asm.h
> > create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
> > create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
> > create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
> > create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c
> >
> > diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
> > index 0c2706b..19e8191 100644
> > --- a/tools/testing/selftests/powerpc/Makefile
> > +++ b/tools/testing/selftests/powerpc/Makefile
> > @@ -22,7 +22,8 @@ SUB_DIRS = benchmarks > > > \
> > > > switch_endian> > \
> > > > syscalls> > > \
> > > > tm> > > > \
> > -> > vphn
> > +> > vphn \
> > +> > math
> >
> > endif
> >
> > diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
> > new file mode 100644
> > index 0000000..896d9e2
> > --- /dev/null
> > +++ b/tools/testing/selftests/powerpc/math/Makefile
> > @@ -0,0 +1,14 @@
> > +TEST_PROGS := fpu_syscall vmx_syscall
>
>
> Add a new .gitignore in this dir for these new build objects.
>
Yep
> > +
> > +all: $(TEST_PROGS)
> > +
> > +$(TEST_PROGS): ../harness.c
> > +$(TEST_PROGS): CFLAGS += -O2 -g
> > +
> > +fpu_syscall: fpu_asm.S
> > +vmx_syscall: vmx_asm.S
> > +
> > +include ../../lib.mk
> > +
> > +clean:
> > +> > rm -f $(TEST_PROGS) *.o
> > diff --git a/tools/testing/selftests/powerpc/math/basic_asm.h b/tools/testing/selftests/powerpc/math/basic_asm.h
> > new file mode 100644
> > index 0000000..27aca79
> > --- /dev/null
> > +++ b/tools/testing/selftests/powerpc/math/basic_asm.h
>
> Can you put this up a directory since it's generically useful for
> powerpc?
>
Sure why not.
> > @@ -0,0 +1,26 @@
> > +#include
> > +#include
> > +
> > +#define LOAD_REG_IMMEDIATE(reg,expr) \
> > +> > lis> > reg,(expr)@highest;> > \
> > +> > ori> > reg,reg,(expr)@higher;> > \
> > +> > rldicr> > reg,reg,32,31;> > \
> > +> > oris> > reg,reg,(expr)@high;> > \
> > +> > ori> > reg,reg,(expr)@l;
> > +
> > +#define PUSH_BASIC_STACK(size) \
> > +> > std> > 2,24(sp); \
> > +> > mflr> > r0; \
> > +> > std> > r0,16(sp); \
> > +> > mfcr> > r0; \
> > +> > stw> > r0,8(sp); \
> > +> > stdu> > sp,-size(sp);
> > +
> > +#define POP_BASIC_STACK(size) \
> > +> > addi> > sp,sp,size; \
> > +> > ld> > 2,24(sp); \
> > +> > ld> > r0,16(sp); \
> > +> > mtlr> > r0; \
> > +> > lwz> > r0,8(sp); \
> > +> > mtcr> > r0; \
> > +
> > diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
> > new file mode 100644
> > index 0000000..d5412c1
> > --- /dev/null
> > +++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
> > @@ -0,0 +1,151 @@
> > +#include "basic_asm.h"
> > +
> > +#define PUSH_FPU(pos) \
> > +> > stfd> > f14,pos(sp); \
> > +> > stfd> > f15,pos+8(sp); \
> > +> > stfd> > f16,pos+16(sp); \
> > +> > stfd> > f17,pos+24(sp); \
> > +> > stfd> > f18,pos+32(sp); \
> > +> > stfd> > f19,pos+40(sp); \
> > +> > stfd> > f20,pos+48(sp); \
> > +> > stfd> > f21,pos+56(sp); \
> > +> > stfd> > f22,pos+64(sp); \
> > +> > stfd> > f23,pos+72(sp); \
> > +> > stfd> > f24,pos+80(sp); \
> > +> > stfd> > f25,pos+88(sp); \
> > +> > stfd> > f26,pos+96(sp); \
> > +> > stfd> > f27,pos+104(sp); \
> > +> > stfd> > f28,pos+112(sp); \
> > +> > stfd> > f29,pos+120(sp); \
> > +> > stfd> > f30,pos+128(sp); \
> > +> > stfd> > f31,pos+136(sp);
> > +
> > +#define POP_FPU(pos) \
> > +> > lfd> > f14,pos(sp); \
> > +> > lfd> > f15,pos+8(sp); \
> > +> > lfd> > f16,pos+16(sp); \
> > +> > lfd> > f17,pos+24(sp); \
> > +> > lfd> > f18,pos+32(sp); \
> > +> > lfd> > f19,pos+40(sp); \
> > +> > lfd> > f20,pos+48(sp); \
> > +> > lfd> > f21,pos+56(sp); \
> > +> > lfd> > f22,pos+64(sp); \
> > +> > lfd> > f23,pos+72(sp); \
> > +> > lfd> > f24,pos+80(sp); \
> > +> > lfd> > f25,pos+88(sp); \
> > +> > lfd> > f26,pos+96(sp); \
> > +> > lfd> > f27,pos+104(sp); \
> > +> > lfd> > f28,pos+112(sp); \
> > +> > lfd> > f29,pos+120(sp); \
> > +> > lfd> > f30,pos+128(sp); \
> > +> > lfd> > f31,pos+136(sp);
> > +
> > +#Careful calling this, it will 'clobber' fpu (by design)
> > +#Don't call this from C
> > +FUNC_START(load_fpu)
> > +> > lfd> > f14,0(r3)
> > +> > lfd> > f15,8(r3)
> > +> > lfd> > f16,16(r3)
> > +> > lfd> > f17,24(r3)
> > +> > lfd> > f18,32(r3)
> > +> > lfd> > f19,40(r3)
> > +> > lfd> > f20,48(r3)
> > +> > lfd> > f21,56(r3)
> > +> > lfd> > f22,64(r3)
> > +> > lfd> > f23,72(r3)
> > +> > lfd> > f24,80(r3)
> > +> > lfd> > f25,88(r3)
> > +> > lfd> > f26,96(r3)
> > +> > lfd> > f27,104(r3)
> > +> > lfd> > f28,112(r3)
> > +> > lfd> > f29,120(r3)
> > +> > lfd> > f30,128(r3)
> > +> > lfd> > f31,136(r3)
> > +> > blr
> > +FUNC_END(load_fpu)
> > +
> > +FUNC_START(check_fpu)
> > +> > mr r4,r3
> > +> > li> > r3,1 #assume a bad result
> > +> > lfd> > f0,0(r4)
> > +> > fcmpu> > cr1,f0,f14
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,8(r4)
> > +> > fcmpu> > cr1,f0,f15
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,16(r4)
> > +> > fcmpu> > cr1,f0,f16
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,24(r4)
> > +> > fcmpu> > cr1,f0,f17
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,32(r4)
> > +> > fcmpu> > cr1,f0,f18
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,40(r4)
> > +> > fcmpu> > cr1,f0,f19
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,48(r4)
> > +> > fcmpu> > cr1,f0,f20
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,56(r4)
> > +> > fcmpu> > cr1,f0,f21
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,64(r4)
> > +> > fcmpu> > cr1,f0,f22
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,72(r4)
> > +> > fcmpu> > cr1,f0,f23
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,80(r4)
> > +> > fcmpu> > cr1,f0,f24
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,88(r4)
> > +> > fcmpu> > cr1,f0,f25
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,96(r4)
> > +> > fcmpu> > cr1,f0,f26
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,104(r4)
> > +> > fcmpu> > cr1,f0,f27
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,112(r4)
> > +> > fcmpu> > cr1,f0,f28
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,120(r4)
> > +> > fcmpu> > cr1,f0,f29
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,128(r4)
> > +> > fcmpu> > cr1,f0,f30
> > +> > bne> > cr1,1f
> > +> > lfd> > f0,136(r4)
> > +> > fcmpu> > cr1,f0,f31
> > +> > bne> > cr1,1f
> > +> > li> > r3,0 #Success!!!
> > +1:> > blr
> > +
> > +FUNC_START(test_fpu)
> > +> > #r3 holds pointer to where to put the result of fork
>
> #r4 seems to hold a ptr to the pid
>
Thanks
> > + #f14-f31 are non volatiles
> > +> > PUSH_BASIC_STACK(256)
> > +> > std> > r3,40(sp) #Address of darray
> > +> > std r4,48(sp) #Address of pid
> > +> > PUSH_FPU(56)
> > +
> > +> > bl load_fpu
> > +> > nop
> > +> > li> > r0,__NR_fork
> > +> > sc
> > +
> > +> > #pass the result of the fork to the caller
> > +> > ld> > r9,48(sp)
> > +> > std> > r3,0(r9)
> > +
> > +> > ld r3,40(sp)
> > +> > bl check_fpu
> > +> > nop
> > +
> > +> > POP_FPU(56)
> > +> > POP_BASIC_STACK(256)
> > +> > blr
> > +FUNC_END(test_fpu)
> > diff --git a/tools/testing/selftests/powerpc/math/fpu_syscall.c b/tools/testing/selftests/powerpc/math/fpu_syscall.c
> > new file mode 100644
> > index 0000000..a967fd6
> > --- /dev/null
> > +++ b/tools/testing/selftests/powerpc/math/fpu_syscall.c
> > @@ -0,0 +1,79 @@
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#include "utils.h"
> > +
> > +extern int test_fpu(double *darray, pid_t *pid);
> > +
> > +double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
> > +> > > 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
> > +> > > 2.1};
> > +
> > +int syscall_fpu(void)
> > +{
> > +> > pid_t fork_pid;
> > +> > int i;
> > +> > int ret;
> > +> > int child_ret;
> > +> > for (i = 0; i < 1000; i++) {
> > +> > > /* test_fpu will fork() */
> > +> > > ret = test_fpu(darray, &fork_pid);
> > +> > > if (fork_pid == -1)
> > +> > > > return -1;
> > +> > > if (fork_pid == 0)
> > +> > > > exit(ret);
> > +> > > waitpid(fork_pid, &child_ret, 0);
> > +> > > if (ret || child_ret)
> > +> > > > return 1;
> > +> > }
> > +
> > +> > return 0;
> > +}
> > +
> > +int test_syscall_fpu(void)
> > +{
> > +> > /*
> > +> > * Setup an environment with much context switching
> > +> > */
> > +> > pid_t pid2;
> > +> > pid_t pid = fork();
> > +> > int ret;
> > +> > int child_ret;
> > +> > FAIL_IF(pid == -1);
> > +
> > +> > pid2 = fork();
> > +> > /* Can't FAIL_IF(pid2 == -1); because already forked once */
> > +> > if (pid2 == -1) {
> > +> > > /*
> > +> > > * Couldn't fork, ensure test is a fail
> > +> > > */
> > +> > > child_ret = ret = 1;
> > +> > } else {
> > +> > > ret = syscall_fpu();
> > +> > > if (pid2)
> > +> > > > waitpid(pid2, &child_ret, 0);
> > +> > > else
> > +> > > > exit(ret);
> > +> > }
> > +
> > +> > ret |= child_ret;
> > +
> > +> > if (pid)
> > +> > > waitpid(pid, &child_ret, 0);
> > +> > else
> > +> > > exit(ret);
> > +
> > +> > FAIL_IF(ret || child_ret);
> > +> > return 0;
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +> > return test_harness(test_syscall_fpu, "syscall_fpu");
> > +
> > +}
> > diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
> > new file mode 100644
> > index 0000000..e642e67
> > --- /dev/null
> > +++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
> > @@ -0,0 +1,183 @@
> > +#include "basic_asm.h"
> > +
> > +#define PUSH_VMX(pos,reg) \
> > +> > li> > reg,pos; \
> > +> > stvx> > v20,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v21,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v22,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v23,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v24,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v25,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v26,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v27,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v28,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v29,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v30,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > stvx> > v31,reg,sp;
> > +
> > +#define POP_VMX(pos,reg) \
> > +> > li> > reg,pos; \
> > +> > lvx> > v20,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v21,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v22,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v23,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v24,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v25,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v26,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v27,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v28,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v29,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v30,reg,sp; \
> > +> > addi> > reg,reg,16; \
> > +> > lvx> > v31,reg,sp;
> > +
> > +#Careful, this will 'clobber' vmx (by design)
> > +#Don't call this from C
> > +FUNC_START(load_vmx)
> > +> > li> > r5,0
> > +> > lvx> > v20,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v21,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v22,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v23,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v24,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v25,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v26,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v27,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v28,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v29,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v30,r5,r3
> > +> > addi> > r5,r5,16
> > +> > lvx> > v31,r5,r3
> > +> > blr
> > +FUNC_END(load_vmx)
> > +
> > +#Should be safe from C, only touches r4, r5 and v0,v1,v2
> > +FUNC_START(check_vmx)
> > +> > PUSH_BASIC_STACK(16)
> > +> > mr r4,r3
> > +> > li> > r3,1 #assume a bad result
> > +> > li> > r5,0
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v20
> > +> > vmr> > v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v21
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v22
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v23
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v24
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v25
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v26
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v27
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v28
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v29
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v30
> > +> > vand> > v2,v2,v1
> > +
> > +> > addi> > r5,r5,16
> > +> > lvx> > v0,r5,r4
> > +> > vcmpequd.> > v1,v0,v31
> > +> > vand> > v2,v2,v1
> > +
> > +> > li r5,0
> > +> > stvx> > v2,r5,sp
> > +> > ldx> > r0,r5,sp
> > +> > cmpdi> > r0,0xffffffff
> > +> > bne> > 1f
> > +> > li> > r3,0
> > +1:> > POP_BASIC_STACK(16)
> > +> > blr
> > +FUNC_END(check_vmx)
> > +
> > +#Safe from C
> > +FUNC_START(test_vmx)
> > +> > #r3 holds pointer to where to put the result of fork
> > +> > #v20-v31 are non-volatile
> > +> > PUSH_BASIC_STACK(512)
> > +> > std> > r3,40(sp) #Address of varray
> > +> > std r4,48(sp) #address of pid
> > +> > PUSH_VMX(56, r4)
> > +
> > +> > bl load_vmx
> > +
> > +> > li> > r0,__NR_fork
> > +> > sc
> > +> > #Pass the result of fork back to the caller
> > +> > ld> > r9,48(sp)
> > +> > std> > r3,0(r9)
> > +
> > +> > ld r3,40(sp)
> > +> > bl check_vmx
> > +
> > +> > POP_VMX(56,r4)
> > +> > POP_BASIC_STACK(512)
> > +> > blr
> > +FUNC_END(test_vmx)
> > diff --git a/tools/testing/selftests/powerpc/math/vmx_syscall.c b/tools/testing/selftests/powerpc/math/vmx_syscall.c
> > new file mode 100644
> > index 0000000..7adff05
> > --- /dev/null
> > +++ b/tools/testing/selftests/powerpc/math/vmx_syscall.c
> > @@ -0,0 +1,81 @@
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#include "utils.h"
> > +
> > +typedef int v4si __attribute__ ((vector_size (16)));
> > +v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
> > +> > {13,14,15,16},{17,18,19,20},{21,22,23,24},
> > +> > {25,26,27,28},{29,30,31,32},{33,34,35,36},
> > +> > {37,38,39,40},{41,42,43,44},{45,46,47,48}};
> > +
> > +extern int test_vmx(v4si *varray, pid_t *pid);
> > +
> > +int vmx_syscall(void)
> > +{
> > +> > pid_t fork_pid;
> > +> > int i;
> > +> > int ret;
> > +> > int child_ret;
> > +> > for (i = 0; i < 1000; i++) {
> > +> > > /* test_vmx will fork() */
> > +> > > ret = test_vmx(varray, &fork_pid);
> > +> > > if (fork_pid == -1)
> > +> > > > return -1;
> > +> > > if (fork_pid == 0)
> > +> > > > exit(ret);
> > +> > > waitpid(fork_pid, &child_ret, 0);
> > +> > > if (ret || child_ret)
> > +> > > > return 1;
> > +> > }
> > +
> > +> > return 0;
> > +}
> > +
> > +int test_vmx_syscall(void)
> > +{
> > +> > /*
> > +> > * Setup an environment with much context switching
> > +> > */
> > +> > pid_t pid2;
> > +> > pid_t pid = fork();
> > +> > int ret;
> > +> > int child_ret;
> > +> > FAIL_IF(pid == -1);
> > +
> > +> > pid2 = fork();
> > +> > ret = vmx_syscall();
> > +> > /* Can't FAIL_IF(pid2 == -1); because we've already forked */
> > +> > if (pid2 == -1) {
> > +> > > /*
> > +> > > * Couldn't fork, ensure child_ret is set and is a fail
> > +> > > */
> > +> > > ret = child_ret = 1;
> > +> > } else {
> > +> > > if (pid2)
> > +> > > > waitpid(pid2, &child_ret, 0);
> > +> > > else
> > +> > > > exit(ret);
> > +> > }
> > +
> > +> > ret |= child_ret;
> > +
> > +> > if (pid)
> > +> > > waitpid(pid, &child_ret, 0);
> > +> > else
> > +> > > exit(ret);
> > +
> > +> > FAIL_IF(ret || child_ret);
> > +> > return 0;
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +> > return test_harness(test_vmx_syscall, "vmx_syscall");
> > +
> > +}
* Re: [PATCH 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext
2015-11-18 3:26 ` [PATCH 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
2015-11-19 11:36 ` [3/8] " Michael Ellerman
@ 2015-11-23 1:04 ` Michael Neuling
1 sibling, 0 replies; 23+ messages in thread
From: Michael Neuling @ 2015-11-23 1:04 UTC (permalink / raw)
To: Cyril Bur, anton, linuxppc-dev
On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> Load up the non volatile FPU and VMX regs and ensure that they are the
> expected value in a signal handler
>
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
> tools/testing/selftests/powerpc/math/Makefile | 4 +-
> tools/testing/selftests/powerpc/math/fpu_signal.c | 119 +++++++++++++++++++++
> tools/testing/selftests/powerpc/math/vmx_signal.c | 124 ++++++++++++++++++++++
> 3 files changed, 246 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
> create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
>
> diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
> index 9fa690f..5ce000bf 100644
> --- a/tools/testing/selftests/powerpc/math/Makefile
> +++ b/tools/testing/selftests/powerpc/math/Makefile
> @@ -1,4 +1,4 @@
> -TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
> +TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal
.gitignore
> all: $(TEST_PROGS)
>
> @@ -7,9 +7,11 @@ $(TEST_PROGS): CFLAGS += -O2 -g -pthread
>
> fpu_syscall: fpu_asm.S
> fpu_preempt: fpu_asm.S
> +fpu_signal: fpu_asm.S
>
> vmx_syscall: vmx_asm.S
> vmx_preempt: vmx_asm.S
> +vmx_signal: vmx_asm.S
>
> include ../../lib.mk
>
> diff --git a/tools/testing/selftests/powerpc/math/fpu_signal.c b/tools/testing/selftests/powerpc/math/fpu_signal.c
> new file mode 100644
> index 0000000..ca61d1f
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/fpu_signal.c
> @@ -0,0 +1,119 @@
copyright + detailed description of test here.
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <stdlib.h>
> +#include <pthread.h>
> +
> +#include "utils.h"
> +
> +/* Number of times each thread should recieve the signal */
receive
> +#define ITERATIONS 10
> +/*
> + * Factor by which to multiply number of online CPUs for total number of
> + * worker threads
> + */
> +#define THREAD_FACTOR 8
> +
> +__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
> + 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
> + 2.1};
> +
> +bool bad_context;
> +int running;
> +volatile int not_ready;
Don't use volatile... Read Documentation/volatile-considered-harmful.txt
> +extern long preempt_fpu(double *darray, volatile int *not_ready, int *sentinal);
> +
> +void signal_fpu_sig(int sig, siginfo_t *info, void *context)
> +{
> + int i;
> + ucontext_t *uc = context;
> + mcontext_t *mc = &uc->uc_mcontext;
> +
> + /* Only the non volatiles were loaded up */
> + for (i = 14; i < 32; i++) {
> + if (mc->fp_regs[i] != darray[i - 14]) {
> + bad_context = true;
> + break;
> + }
> + }
> +}
> +
> +void *signal_fpu_c(void *p)
> +{
> + int i;
> + long rc;
> + struct sigaction act;
> + act.sa_sigaction = signal_fpu_sig;
> + act.sa_flags = SA_SIGINFO;
> + rc = sigaction(SIGUSR1, &act, NULL);
> + if (rc)
> + return p;
> +
> + srand(pthread_self());
> + for (i = 0; i < 21; i++)
> + darray[i] = rand();
> +
> + rc = preempt_fpu(darray, &not_ready, &running);
> +
> + return (void *) rc;
> +}
> +
> +int test_signal_fpu(void)
This is almost identical to test_signal_vmx. Can you merge them?
> +{
> + int i, j, rc, threads;
> + void *rc_p;
> + pthread_t *tids;
> +
> + threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
> + tids = malloc(threads * sizeof(pthread_t));
> + FAIL_IF(!tids);
> +
> + running = true;
> + not_ready = threads;
> + for (i = 0; i < threads; i++) {
> + rc = pthread_create(&tids[i], NULL, signal_fpu_c, NULL);
> + FAIL_IF(rc);
> + }
> +
> + setbuf(stdout, NULL);
> + printf("\tWaiting for all workers to start...");
> + while (not_ready);
A barrier is needed here.
> + printf("done\n");
> +
> + printf("\tSending signals to all threads %d times...", ITERATIONS);
> + for (i = 0; i < ITERATIONS; i++) {
> + for (j = 0; j < threads; j++) {
> + pthread_kill(tids[j], SIGUSR1);
> + }
> + sleep(1);
> + }
> + printf("done\n");
> +
> + printf("\tKilling workers...");
> + running = 0;
> + for (i = 0; i < threads; i++) {
> + pthread_join(tids[i], &rc_p);
> +
> + /*
> + * Harness will say the fail was here, look at why signal_fpu
> + * returned
> + */
> + if ((long) rc_p || bad_context)
bad_context is set in a signal handler but is that run in another
thread (I think)? You might have some memory barrier issues with
bad_context also.
> + printf("oops\n");
> + if (bad_context)
> + fprintf(stderr, "\t!! bad_context is true\n");
> + FAIL_IF((long) rc_p || bad_context);
> + }
> + printf("done\n");
> +
> + free(tids);
> + return 0;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + return test_harness(test_signal_fpu, "fpu_signal");
> +}
> diff --git a/tools/testing/selftests/powerpc/math/vmx_signal.c
> b/tools/testing/selftests/powerpc/math/vmx_signal.c
> new file mode 100644
> index 0000000..007ac9e
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/math/vmx_signal.c
> @@ -0,0 +1,124 @@
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <pthread.h>
> +
> +#include "utils.h"
> +
> +/* Number of times each thread should receive the signal */
> +#define ITERATIONS 10
> +/*
> + * Factor by which to multiply number of online CPUs for total
> number of
> + * worker threads
> + */
> +#define THREAD_FACTOR 8
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +__thread v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
> + {13,14,15,16},{17,18,19,20},{21,22,23,24},
> + {25,26,27,28},{29,30,31,32},{33,34,35,36},
> + {37,38,39,40},{41,42,43,44},{45,46,47,48}};
> +
> +bool bad_context;
> +int running;
> +volatile int not_ready;
> +
> +extern int preempt_vmx(v4si *varray, volatile int *not_ready, int *sentinal);
> +
> +void signal_vmx_sig(int sig, siginfo_t *info, void *context)
> +{
> + int i;
> + ucontext_t *uc = context;
> + mcontext_t *mc = &uc->uc_mcontext;
> +
> + /* Only the non volatiles were loaded up */
> + for (i = 20; i < 32; i++) {
> + if (memcmp(mc->v_regs->vrregs[i], &varray[i - 20], 16)) {
> + bad_context = true;
> + break;
> + break;
> + }
> + }
> +}
> +
> +void *signal_vmx_c(void *p)
> +{
> + int i, j;
> + long rc;
> + struct sigaction act;
> + act.sa_sigaction = signal_vmx_sig;
> + act.sa_flags = SA_SIGINFO;
> + rc = sigaction(SIGUSR1, &act, NULL);
> + if (rc)
> + return p;
> +
> + srand(pthread_self());
> + for (i = 0; i < 12; i++)
> + for (j = 0; j < 4; j++)
> + varray[i][j] = rand();
> +
> + rc = preempt_vmx(varray, &not_ready, &running);
> +
> + return (void *) rc;
> +}
> +
> +int test_signal_vmx(void)
> +{
> + int i, j, rc, threads;
> + void *rc_p;
> + pthread_t *tids;
> +
> + threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
> + tids = malloc(threads * sizeof(pthread_t));
> + FAIL_IF(!tids);
> +
> + running = true;
> + not_ready = threads;
> + for (i = 0; i < threads; i++) {
> + rc = pthread_create(&tids[i], NULL, signal_vmx_c, NULL);
> + FAIL_IF(rc);
> + }
> +
> + setbuf(stdout, NULL);
> + printf("\tWaiting for all workers to start...");
> + while (not_ready);
> + printf("done\n");
> +
> + printf("\tSending signals to all threads %d times...", ITERATIONS);
> + for (i = 0; i < ITERATIONS; i++) {
> + for (j =3D 0; j < threads; j++) {
> + pthread_kill(tids[j], SIGUSR1);
> + }
> + sleep(1);
> + }
> + printf("done\n");
> +
> + printf("\tKilling workers...");
> + running = 0;
> + for (i = 0; i < threads; i++) {
> + pthread_join(tids[i], &rc_p);
> +
> + /*
> + * Harness will say the fail was here, look at why signal_vmx
> + * returned
> + */
> + if ((long) rc_p || bad_context)
> + printf("oops\n");
> + if (bad_context)
> + fprintf(stderr, "\t!! bad_context is true\n");
> + FAIL_IF((long) rc_p || bad_context);
> + }
> + printf("done\n");
> +
> + free(tids);
> + return 0;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + return test_harness(test_signal_vmx, "vmx_signal");
> +}
* Re: [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall
2015-11-23 0:58 ` Cyril Bur
@ 2015-11-23 1:06 ` Michael Neuling
0 siblings, 0 replies; 23+ messages in thread
From: Michael Neuling @ 2015-11-23 1:06 UTC (permalink / raw)
To: Cyril Bur; +Cc: anton, linuxppc-dev
On Mon, 2015-11-23 at 11:58 +1100, Cyril Bur wrote:
> On Mon, 23 Nov 2015 11:23:13 +1100
> Michael Neuling <mikey@neuling.org> wrote:
>
> > On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> > > Test that the non volatile floating point and Altivec registers get
> > > correctly preserved across the fork() syscall.
> >=20
> > Can we add a test for VSX too? I realise it's the same registers,
> > but
> > the enable bits in the MSR are different so it's easy to get them
> > wrong
> > in the kernel.
>
> Yeah, I'm sure I could get that wrong haha.
>
> Hmmmm this got me thinking. Today we always enable FP and Altivec
> when we
> enable VSX but isn't there a world where we could actually run with
> FP and
> Altivec disabled and VSX on? In which case, is the whole thing
> volatile or
> does the kernel still need to save the subset of the matrix which
> corresponds
> to non-volatile FPs and non-volatile Altivec?
The hardware can run with FP and VMX off and VSX on but we should never
do that in Linux.
Mikey
* Re: [PATCH 4/8] powerpc: Explicitly disable math features when copying thread
2015-11-18 3:26 ` [PATCH 4/8] powerpc: Explicitly disable math features when copying thread Cyril Bur
@ 2015-11-23 1:08 ` Michael Neuling
2015-11-23 3:20 ` Cyril Bur
0 siblings, 1 reply; 23+ messages in thread
From: Michael Neuling @ 2015-11-23 1:08 UTC (permalink / raw)
To: Cyril Bur, anton, linuxppc-dev
On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> With threads leaving the math bits enabled in their saved MSR to
> indicate
> that the hardware is hot and a restore is not needed, children need
> to turn
> it off as when they do get scheduled, there's no way their registers
> could
> have been hot.
>
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
> arch/powerpc/kernel/process.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c
> index 398f7bf..441d9e5 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1243,6 +1243,7 @@ int copy_thread(unsigned long clone_flags,
> unsigned long usp,
>
> f = ret_from_fork;
> }
> + childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
Is this a current bug?
Mikey
> sp -= STACK_FRAME_OVERHEAD;
>
> /*
* Re: [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used
2015-11-18 3:26 ` [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
2015-11-20 11:01 ` Michael Ellerman
@ 2015-11-23 1:29 ` Michael Neuling
1 sibling, 0 replies; 23+ messages in thread
From: Michael Neuling @ 2015-11-23 1:29 UTC (permalink / raw)
To: Cyril Bur, anton, linuxppc-dev
On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> Currently the FPU, VEC and VSX facilities are lazily loaded. This is
> not a
> problem unless a process is using these facilities.
I would prefer to say facilities are "enabled" and registers are
"loaded". You're mixing the two here.
> Modern versions of GCC are very good at automatically vectorising
> code, new
> and modernised workloads make use of floating point and vector
> facilities,
> even the kernel makes use of vectorised memcpy.
>
> All this combined greatly increases the cost of a syscall since the
> kernel
> uses the facilities sometimes even in syscall fast-path making it
> increasingly common for a thread to take an *_unavailable exception
> soon
> after a syscall, not to mention potentially taking the trifecta.
>
> The obvious overcompensation to this problem is to simply always load
> all
> the facilities on every exit to userspace. Loading up all FPU, VEC
> and VSX
> registers every time can be expensive and if a workload does avoid
> using
> them, it should not be forced to incur this penalty.
>
> An 8bit counter is used to detect if the registers have been used in
> the
> past and the registers are always loaded until the value wraps to
> back to
> zero.
>
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
> arch/powerpc/include/asm/processor.h | 2 ++
> arch/powerpc/kernel/asm-offsets.c | 2 ++
> arch/powerpc/kernel/entry_64.S | 55 ++++++++++++++++++++++++++++--
> arch/powerpc/kernel/fpu.S | 4 +++
> arch/powerpc/kernel/process.c | 66 ++++++++++++++++++++++++++++------
> arch/powerpc/kernel/vector.S | 4 +++
> 6 files changed, 119 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/processor.h
> b/arch/powerpc/include/asm/processor.h
> index ac23308..dcab21f 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -236,11 +236,13 @@ struct thread_struct {
> #endif
> struct arch_hw_breakpoint hw_brk; /* info on the hardware
> breakpoint */
> unsigned long trap_nr; /* last trap # on this
> thread */
> + u8 load_fp;
> #ifdef CONFIG_ALTIVEC
> struct thread_vr_state vr_state;
> struct thread_vr_state *vr_save_area;
> unsigned long vrsave;
> int used_vr; /* set if process has
> used altivec */
> + u8 load_vec;
> #endif /* CONFIG_ALTIVEC */
> #ifdef CONFIG_VSX
> /* VSR status */
> diff --git a/arch/powerpc/kernel/asm-offsets.c
> b/arch/powerpc/kernel/asm-offsets.c
> index 221d584..0f593d7 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -95,12 +95,14 @@ int main(void)
> DEFINE(THREAD_FPSTATE, offsetof(struct thread_struct,
> fp_state));
> DEFINE(THREAD_FPSAVEAREA, offsetof(struct thread_struct,
> fp_save_area));
> DEFINE(FPSTATE_FPSCR, offsetof(struct thread_fp_state,
> fpscr));
> + DEFINE(THREAD_LOAD_FP, offsetof(struct thread_struct,
> load_fp));
> #ifdef CONFIG_ALTIVEC
> DEFINE(THREAD_VRSTATE, offsetof(struct thread_struct,
> vr_state));
> DEFINE(THREAD_VRSAVEAREA, offsetof(struct thread_struct,
> vr_save_area));
> DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct,
> vrsave));
> DEFINE(THREAD_USED_VR, offsetof(struct thread_struct,
> used_vr));
> DEFINE(VRSTATE_VSCR, offsetof(struct thread_vr_state,
> vscr));
> + DEFINE(THREAD_LOAD_VEC, offsetof(struct thread_struct,
> load_vec));
> #endif /* CONFIG_ALTIVEC */
> #ifdef CONFIG_VSX
> DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct,
> used_vsr));
> diff --git a/arch/powerpc/kernel/entry_64.S
> b/arch/powerpc/kernel/entry_64.S
> index c8b4225..46e9869 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -210,7 +210,54 @@ system_call: /* label this so stack traces look sane */
> li r11,-MAX_ERRNO
> andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> bne- syscall_exit_work
> - cmpld r3,r11
> +
> + /*
> + * This is an assembly version of checks performed in restore_math()
> + * to avoid calling C unless absolutely necessary.
> + * Note: In order to simplify the assembly, if the FP or VEC registers
> + * are hot (and therefore restore_math() isn't called) the
> + * LOAD_{FP,VEC} thread counter doesn't get incremented.
> + * This is likely the best thing to do anyway because hot regs indicate
> + * that the workload is doing a lot of syscalls that can be handled
> + * quickly and without the need to touch FP or VEC regs (by the kernel).
> + * a) If this workload is long running then this is exactly what the
> + * kernel should be doing.
> + * b) If this workload isn't long running then we'll soon fall back to
> + * calling into C and the counter will be incremented regularly again
> + * anyway.
> + */
> + ld r9,PACACURRENT(r13)
> + andi. r0,r8,MSR_FP
> + addi r9,r9,THREAD
> + lbz r5,THREAD_LOAD_FP(r9)
Would this work as a minor simplification?
ld r9,PACACURRENT(r13)
andi. r0,r8,MSR_FP
lbz r5,THREAD+THREAD_LOAD_FP(r9)
> + /*
> + * Goto 2 if !r0 && r5
> + * The cmpb works because r5 can only have bits set in the lowest byte
> + * and r0 may or may not have bit 13 set (different byte) but will have
> + * a zero low byte therefore the low bytes must differ if r5 == true
> + * and the bit 13 byte must be the same if !r0
> + */
> + cmpb r7,r0,r5
> + cmpldi r7,0xff0
> +#ifdef CONFIG_ALTIVEC
> + beq 2f
> +
> + lbz r9,THREAD_LOAD_VEC(r9)
> + andis. r0,r8,MSR_VEC@h
> + /* Skip (goto 3) if r0 || !r9 */
> + bne 3f
> + cmpldi r9,0
> + beq 3f
> +#else
> + bne 3f
> +#endif
> +2: addi r3,r1,STACK_FRAME_OVERHEAD
> + bl restore_math
> + ld r8,_MSR(r1)
> + ld r3,RESULT(r1)
> + li r11,-MAX_ERRNO
> +
> +3: cmpld r3,r11
> ld r5,_CCR(r1)
> bge- syscall_error
> .Lsyscall_error_cont:
> @@ -592,8 +639,8 @@ _GLOBAL(ret_from_except_lite)
>
> /* Check current_thread_info()->flags */
> andi. r0,r4,_TIF_USER_WORK_MASK
> -#ifdef CONFIG_PPC_BOOK3E
> bne 1f
> +#ifdef CONFIG_PPC_BOOK3E
why this change?
> /*
> * Check to see if the dbcr0 register is set up to debug.
> * Use the internal debug mode bit to do this.
> @@ -608,7 +655,9 @@ _GLOBAL(ret_from_except_lite)
> mtspr SPRN_DBSR,r10
> b restore
> #else
> - beq restore
> + addi r3,r1,STACK_FRAME_OVERHEAD
> + bl restore_math
> + b restore
> #endif
> 1: andi. r0,r4,_TIF_NEED_RESCHED
> beq 2f
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index 2117eac..b063524 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -130,6 +130,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> or r12,r12,r4
> std r12,_MSR(r1)
> #endif
> + /* Don't care if r4 overflows, this is desired behaviour */
> + lbz r4,THREAD_LOAD_FP(r5)
> + addi r4,r4,1
> + stb r4,THREAD_LOAD_FP(r5)
> addi r10,r5,THREAD_FPSTATE
> lfd fr0,FPSTATE_FPSCR(r10)
> MTFSF_L(fr0)
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c
> index 441d9e5..c602b67 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -366,6 +366,53 @@ void giveup_all(struct task_struct *tsk)
> msr_check_and_clear(msr_all_available);
> }
>
> +void restore_math(struct pt_regs *regs)
> +{
> + unsigned long msr;
> +
> + if (!current->thread.load_fp
> +#ifdef CONFIG_ALTIVEC
> + && !current->thread.load_vec)
> +#else
> + )
> +#endif
That is possibly the most revolting use of #ifdef I have ever seen.
And that's saying something with the crap state that process.c is
already in :-P
> + return;
> +
> + msr = regs->msr;
> + msr_check_and_set(msr_all_available);
> +
> + /*
> + * Only reload if the bit is not set in the user MSR, the bit BEING set
> + * indicates that the registers are hot
> + */
> +#ifdef CONFIG_PPC_FPU
> + if (current->thread.load_fp && !(msr & MSR_FP)) {
> + load_fp_state(&current->thread.fp_state);
> + msr |= MSR_FP | current->thread.fpexc_mode;
> + current->thread.load_fp++;
> + }
> +#endif
> +#ifdef CONFIG_ALTIVEC
> + if (current->thread.load_vec && !(msr & MSR_VEC) &&
> +     cpu_has_feature(CPU_FTR_ALTIVEC)) {
> + load_vr_state(&current->thread.vr_state);
> + current->thread.used_vr = 1;
> + msr |= MSR_VEC;
> + current->thread.load_vec++;
> + }
> +#endif
> +#ifdef CONFIG_VSX
> + if (!(msr & MSR_VSX) && (msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {
> + current->thread.used_vsr = 1;
> + msr |= MSR_VSX;
> + }
> +#endif
> +
> + msr_check_and_clear(msr_all_available);
> +
> + regs->msr = msr;
> +}
> +
> void flush_all_to_thread(struct task_struct *tsk)
> {
> if (tsk->thread.regs) {
> @@ -806,17 +853,9 @@ void restore_tm_state(struct pt_regs *regs)
>
> msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
> msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
> - if (msr_diff & MSR_FP) {
> - msr_check_and_set(MSR_FP);
> - load_fp_state(&current->thread.fp_state);
> - msr_check_and_clear(MSR_FP);
> - regs->msr |= current->thread.fpexc_mode;
> - }
> - if (msr_diff & MSR_VEC) {
> - msr_check_and_set(MSR_VEC);
> - load_vr_state(&current->thread.vr_state);
> - msr_check_and_clear(MSR_VEC);
> - }
> +
> + restore_math(regs);
> +
> regs->msr |= msr_diff;
> }
>
> @@ -977,6 +1016,11 @@ struct task_struct *__switch_to(struct
> task_struct *prev,
> batch = this_cpu_ptr(&ppc64_tlb_batch);
> batch->active = 1;
> }
> +
> + /* Don't do this on a kernel thread */
> + if (current_thread_info()->task->thread.regs)
> + restore_math(current_thread_info()->task->thread.regs);
> +
> #endif /* CONFIG_PPC_BOOK3S_64 */
>
> return last;
> diff --git a/arch/powerpc/kernel/vector.S
> b/arch/powerpc/kernel/vector.S
> index 162d0f7..038cff8 100644
> --- a/arch/powerpc/kernel/vector.S
> +++ b/arch/powerpc/kernel/vector.S
> @@ -91,6 +91,10 @@ _GLOBAL(load_up_altivec)
> oris r12,r12,MSR_VEC@h
> std r12,_MSR(r1)
> #endif
> + /* Don't care if r4 overflows, this is desired behaviour */
> + lbz r4,THREAD_LOAD_VEC(r5)
> + addi r4,r4,1
> + stb r4,THREAD_LOAD_VEC(r5)
> addi r6,r5,THREAD_VRSTATE
> li r4,1
> li r10,VRSTATE_VSCR
* Re: [PATCH 4/8] powerpc: Explicitly disable math features when copying thread
2015-11-23 1:08 ` Michael Neuling
@ 2015-11-23 3:20 ` Cyril Bur
0 siblings, 0 replies; 23+ messages in thread
From: Cyril Bur @ 2015-11-23 3:20 UTC (permalink / raw)
To: Michael Neuling; +Cc: anton, linuxppc-dev
On Mon, 23 Nov 2015 12:08:38 +1100
Michael Neuling <mikey@neuling.org> wrote:
> On Wed, 2015-11-18 at 14:26 +1100, Cyril Bur wrote:
> > With threads leaving the math bits enabled in their saved MSR to
> > indicate
> > that the hardware is hot and a restore is not needed, children need
> > to turn
> > it off as when they do get scheduled, there's no way their registers
> > could
> > have been hot.
> >
> > Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> > ---
> > arch/powerpc/kernel/process.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/kernel/process.c
> > b/arch/powerpc/kernel/process.c
> > index 398f7bf..441d9e5 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1243,6 +1243,7 @@ int copy_thread(unsigned long clone_flags,
> > unsigned long usp,
> >
> > f = ret_from_fork;
> > }
> > + childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
>
> Is this a current bug?
>
It is impossible currently because saving the registers (of the parent, before
creating the child) also forces a giveup of the facilities.
The next patch in the series decouples the saving and the giving up which
makes this situation possible.
> Mikey
>
> > sp -= STACK_FRAME_OVERHEAD;
> >
> > /*
end of thread, other threads:[~2015-11-23 3:20 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-18 3:26 [PATCH 0/8] FP/VEC/VSX switching optimisations Cyril Bur
2015-11-18 3:26 ` [PATCH 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
2015-11-23 0:23 ` Michael Neuling
2015-11-23 0:58 ` Cyril Bur
2015-11-23 1:06 ` Michael Neuling
2015-11-18 3:26 ` [PATCH 2/8] selftests/powerpc: Test preservation of FPU and VMX regs across preemption Cyril Bur
2015-11-23 0:34 ` Michael Neuling
2015-11-18 3:26 ` [PATCH 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
2015-11-19 11:36 ` [3/8] " Michael Ellerman
2015-11-23 1:04 ` [PATCH 3/8] " Michael Neuling
2015-11-18 3:26 ` [PATCH 4/8] powerpc: Explicitly disable math features when copying thread Cyril Bur
2015-11-23 1:08 ` Michael Neuling
2015-11-23 3:20 ` Cyril Bur
2015-11-18 3:26 ` [PATCH 5/8] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
2015-11-20 11:01 ` Michael Ellerman
2015-11-22 22:18 ` Cyril Bur
2015-11-22 23:07 ` Michael Ellerman
2015-11-23 1:29 ` Michael Neuling
2015-11-18 3:26 ` [PATCH 6/8] powerpc: Add the ability to save FPU without giving it up Cyril Bur
2015-11-18 3:26 ` [PATCH 7/8] powerpc: Add the ability to save Altivec " Cyril Bur
2015-11-18 3:26 ` [PATCH 8/8] powerpc: Add the ability to save VSX " Cyril Bur
2015-11-18 14:51 ` [PATCH 0/8] FP/VEC/VSX switching optimisations David Laight
2015-11-18 23:01 ` Cyril Bur