linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/8] FP/VEC/VSX switching optimisations
@ 2015-11-18  3:26 Cyril Bur
From: Cyril Bur @ 2015-11-18  3:26 UTC
  To: mikey, anton, linuxppc-dev

Hi,

These patches are an extension of the work done by Anton
(https://patchwork.ozlabs.org/patch/537621/) and need to be applied on top
of his patches.

The goal of these patches is to rework how the 'math' registers (FP, VEC
and VSX) are context switched. Currently the kernel adopts a lazy approach,
always switching userspace tasks with all three facilities disabled and
loading each set of registers only upon receiving the corresponding
unavailable exception. The kernel does try to avoid disabling the features
in the syscall quick path, but during testing it appears that even what
should be a simple syscall still causes the kernel to use some facilities
(vectorised memcpy for example) for itself and therefore disable them for
the user task.

The lazy approach makes for a small amount of time spent restoring
userspace state and if tasks don't use any of these facilities it is the
correct thing to do. In recent years, new workloads and new features such
as auto vectorisation in GCC have meant that the use of these facilities by
userspace has increased, so much so that some workloads can have a task
take an FP unavailable exception and a VEC unavailable exception almost
every time slice.

This series removes the general laziness in favour of a more selective
approach. If a task uses any of the 'math' facilities the kernel will load
the registers and enable the facilities for future time slices as the
assumption is that the use is likely to continue for some time. This
removes the cost of having to take an exception.

These patches also add logic to detect if a task had been using a facility
and optimise the case where the registers are still hot. This provides
another speedup, as not only is the cost of the exception saved but the
cost of copying up to 64 x 128 bit registers is also removed.

With these patches applied on top of Anton's patches I observe a significant
improvement with Anton's context switch microbenchmark using yield():

http://ozlabs.org/~anton/junkcode/context_switch2.c

Using an LE kernel compiled with pseries_le_defconfig

Running:
./context_switch2 --test=yield 8 8
and adding one of --fp, --altivec or --vector
Gives a 5% improvement on a POWER8 CPU.

./context_switch2 --test=yield --fp --altivec --vector 8 8
Gives a 15% improvement on a POWER8 CPU.

I'll take this opportunity to note that 15% can be somewhat misleading. It
may be reasonable to assume that each of the optimisations has had a
compounding effect. This isn't incorrect, and the reason behind the apparent
compounding reveals a lot about where the current bottleneck is.

The tests always touch FP first, then VEC, then VSX, which is the guaranteed
worst case for the way the kernel currently operates. This behaviour will
trigger three successive unavailable exceptions. Since the kernel currently
enables all three facilities after taking a VSX unavailable, the tests can
be modified to touch VSX->VEC->FP. In this order the difference in
performance when touching all three is only 5%. There is a compounding
effect in so far as the cost of taking multiple unavailable exceptions is
removed. This testing also demonstrates that the exception is by far the
most expensive part of the current lazy approach.

Cyril Bur (8):
  selftests/powerpc: Test the preservation of FPU and VMX regs across
    syscall
  selftests/powerpc: Test preservation of FPU and VMX regs across
    preemption
  selftests/powerpc: Test FPU and VMX regs in signal ucontext
  powerpc: Explicitly disable math features when copying thread
  powerpc: Restore FPU/VEC/VSX if previously used
  powerpc: Add the ability to save FPU without giving it up
  powerpc: Add the ability to save Altivec without giving it up
  powerpc: Add the ability to save VSX without giving it up

 arch/powerpc/include/asm/processor.h               |   2 +
 arch/powerpc/include/asm/switch_to.h               |   5 +-
 arch/powerpc/kernel/asm-offsets.c                  |   2 +
 arch/powerpc/kernel/entry_64.S                     |  55 +++++-
 arch/powerpc/kernel/fpu.S                          |  25 +--
 arch/powerpc/kernel/ppc_ksyms.c                    |   4 -
 arch/powerpc/kernel/process.c                      | 144 ++++++++++++--
 arch/powerpc/kernel/vector.S                       |  45 +----
 tools/testing/selftests/powerpc/Makefile           |   3 +-
 tools/testing/selftests/powerpc/math/Makefile      |  19 ++
 tools/testing/selftests/powerpc/math/basic_asm.h   |  26 +++
 tools/testing/selftests/powerpc/math/fpu_asm.S     | 185 +++++++++++++++++
 tools/testing/selftests/powerpc/math/fpu_preempt.c |  92 +++++++++
 tools/testing/selftests/powerpc/math/fpu_signal.c  | 119 +++++++++++
 tools/testing/selftests/powerpc/math/fpu_syscall.c |  79 ++++++++
 tools/testing/selftests/powerpc/math/vmx_asm.S     | 219 +++++++++++++++++++++
 tools/testing/selftests/powerpc/math/vmx_preempt.c |  92 +++++++++
 tools/testing/selftests/powerpc/math/vmx_signal.c  | 124 ++++++++++++
 tools/testing/selftests/powerpc/math/vmx_syscall.c |  81 ++++++++
 19 files changed, 1240 insertions(+), 81 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/math/Makefile
 create mode 100644 tools/testing/selftests/powerpc/math/basic_asm.h
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c

-- 
2.6.2

