From: Cyril Bur <cyrilbur@gmail.com>
To: linuxppc-dev@ozlabs.org
Subject: [PATCH V2 0/8] FP/VEC/VSX switching optimisations
Date: Fri, 15 Jan 2016 16:04:06 +1100
Message-ID: <1452834254-22078-1-git-send-email-cyrilbur@gmail.com>
Cover-letter for V1 of the series is at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136350.html
Version one of this series used a cmpb instruction in handcrafted assembly,
which it turns out is not supported on older POWER machines. Michael
suggested replacing it with crandc, which works fine. Testing also showed
no difference in performance between using cmpb and crandc.
The primary objective is improving the syscall hot path. While the gut
feeling may be that avoiding C is quicker, it may also be the case that the
C is not significantly slower. If C is not slower, using C would provide a
distinct readability and maintainability advantage.
I have benchmarked a few possible scenarios:
1. Always calling into C.
2. Testing for the common case in assembly and calling into C.
3. Using crandc in the full assembly check.
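To make the difference between scenarios 1 and 2 concrete, here is a rough
user-space sketch in C. It is not the code from this series: the flag names,
struct math_state and both exit helpers are hypothetical, and it assumes the
"common case" is a task with no register set left to restore.

```c
#include <stdbool.h>

/* Hypothetical flags, one per register set; not the kernel's MSR bits. */
#define MATH_FP  (1u << 0)
#define MATH_VEC (1u << 1)
#define MATH_VSX (1u << 2)

/* Hypothetical per-task state, for illustration only. */
struct math_state {
	unsigned int used;	/* register sets the task has touched */
	unsigned int live;	/* register sets currently loaded */
};

/* Counts slow-path entries so the effect of the fast check is visible. */
static unsigned long slow_path_calls;

/* Stand-in for the C slow path: load every used set that is not live. */
static void restore_math_slow(struct math_state *s)
{
	slow_path_calls++;
	s->live |= s->used;
}

/* Scenario 1: always call into C, even when there is nothing to do. */
static void exit_always_c(struct math_state *s)
{
	restore_math_slow(s);
}

/* Scenario 2: cheap inline test first, call into C only when needed. */
static bool exit_check_first(struct math_state *s)
{
	if ((s->used & ~s->live) == 0)
		return false;	/* assumed common case: nothing to restore */
	restore_math_slow(s);
	return true;
}
```

In the sketch the only cost scenario 2 adds to the common case is one mask
and compare, which is the trade-off the benchmarks are trying to measure.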
All benchmarks are the average of 50 runs of Anton's context switch
benchmark, http://www.ozlabs.org/~anton/junkcode/context_switch2.c, with
the kernel and ramdisk run under QEMU/KVM on a POWER8.
To cover all cases, a variety of flags were passed to the benchmark to
see the effect of only touching a subset of the 'math' register space.
The absolute numbers are in context switches per second and can vary
greatly depending on how the kernel is run (virt/powernv/ramdisk/disk). As
such the units aren't very relevant here, as we're interested in a speedup.
The most interesting number here is the %speedup over the previous
scenario. In this case 100% means there was no difference, therefore <100%
indicates a decrease in performance and >100% an increase.
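For reference, the %speedup columns in the tables are consistent with a
plain ratio of the two averages. A minimal sketch in C, where
speedup_percent is a hypothetical helper name and not part of the benchmark:

```c
/*
 * Throughput of the current scenario relative to the previous one, as a
 * percentage. Both arguments are average context switches per second.
 * 100.0 means no difference, above 100.0 means the current one is faster.
 */
static double speedup_percent(double previous, double current)
{
	return current / previous * 100.0;
}
```

For example, the "fp vector" averages for scenarios 1 and 2 (1640951.76 and
1668912.96) give roughly 101.70.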
For 1 - Always calling into C
 Flags          |    Average |   Stddev |
========================================
 none           | 2059785.00 | 14217.64 |
 fp             | 1766297.65 | 10576.64 |
 fp altivec     | 1636125.04 |  5693.84 |
 fp vector      | 1640951.76 | 13141.93 |
 altivec        | 1815133.80 | 10450.46 |
 altivec vector | 1636438.60 |  5475.12 |
 vector         | 1639628.16 | 11456.06 |
 all            | 1629516.32 |  7785.36 |
For 2 - Common case checking in asm before calling into C
 Flags          |    Average |   Stddev |  %speedup vs 1 |
========================================================
 none           | 2058003.64 | 20464.22 |          99.91 |
 fp             | 1757245.80 | 14455.45 |          99.49 |
 fp altivec     | 1658240.12 |  6318.41 |         101.35 |
 fp vector      | 1668912.96 |  9451.47 |         101.70 |
 altivec        | 1815223.96 |  4819.82 |         100.00 |
 altivec vector | 1648805.32 | 15100.50 |         100.76 |
 vector         | 1663654.68 | 13814.79 |         101.47 |
 all            | 1644884.04 | 11315.74 |         100.94 |
For 3 - Full checking in asm using crandc instead of cmpb
 Flags          |    Average |   Stddev |  %speedup vs 2 |
========================================================
 none           | 2066930.52 | 19426.46 |         100.43 |
 fp             | 1781653.24 |  7744.55 |         101.39 |
 fp altivec     | 1653125.84 |  6727.36 |          99.69 |
 fp vector      | 1656011.04 | 11678.56 |          99.23 |
 altivec        | 1824934.72 | 16842.19 |         100.53 |
 altivec vector | 1649486.92 |  3219.14 |         100.04 |
 vector         | 1662420.20 |  9609.34 |          99.93 |
 all            | 1647933.64 | 11121.22 |         100.19 |
From these numbers it appears that reducing the calls to C in the common
case is beneficial, possibly a speedup of up to about 1.5% over always
calling C. The benefit of the more complicated asm checking appears to be
very slight, fractions of a percent at best. On balance it may prove wise
to use option 2: there are much bigger fish to fry in terms of performance,
and the complexity of the assembly is not worth a small fraction of one
percent improvement at this stage.
Version 2 of this series also addresses some comments from Mikey Neuling on
the tests, such as adding a .gitignore and forcing 64-bit compiles of the
tests, as they use 64-bit only instructions.
Cyril Bur (8):
selftests/powerpc: Test the preservation of FPU and VMX regs across
syscall
selftests/powerpc: Test preservation of FPU and VMX regs across
preemption
selftests/powerpc: Test FPU and VMX regs in signal ucontext
powerpc: Explicitly disable math features when copying thread
powerpc: Restore FPU/VEC/VSX if previously used
powerpc: Add the ability to save FPU without giving it up
powerpc: Add the ability to save Altivec without giving it up
powerpc: Add the ability to save VSX without giving it up
arch/powerpc/include/asm/processor.h | 2 +
arch/powerpc/include/asm/switch_to.h | 5 +-
arch/powerpc/kernel/asm-offsets.c | 2 +
arch/powerpc/kernel/entry_64.S | 21 +-
arch/powerpc/kernel/fpu.S | 25 +--
arch/powerpc/kernel/ppc_ksyms.c | 4 -
arch/powerpc/kernel/process.c | 144 +++++++++++--
arch/powerpc/kernel/vector.S | 45 +---
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/basic_asm.h | 26 +++
tools/testing/selftests/powerpc/math/.gitignore | 6 +
tools/testing/selftests/powerpc/math/Makefile | 19 ++
tools/testing/selftests/powerpc/math/fpu_asm.S | 195 ++++++++++++++++++
tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 ++++++++++
tools/testing/selftests/powerpc/math/fpu_signal.c | 135 ++++++++++++
tools/testing/selftests/powerpc/math/fpu_syscall.c | 90 ++++++++
tools/testing/selftests/powerpc/math/vmx_asm.S | 229 +++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 ++++++++++
tools/testing/selftests/powerpc/math/vmx_signal.c | 138 +++++++++++++
tools/testing/selftests/powerpc/math/vmx_syscall.c | 92 +++++++++
20 files changed, 1326 insertions(+), 81 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
create mode 100644 tools/testing/selftests/powerpc/math/Makefile
create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c
--
2.7.0