linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/11 BROKEN] move FPU context loading to userspace switch
From: riel @ 2015-01-11 21:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, hpa, matt.fleming, bp, oleg, pbonzini, tglx, luto

Currently the kernel will always load the FPU context, even
when switching to a kernel thread, or to an idle thread. In
the case of a task on a KVM VCPU going idle for a bit, and
waking up again later, this creates a vastly inefficient
chain of FPU context saves & loads:

1) save task FPU context, load idle task FPU context (in KVM guest)
2) trap to host
3) save VCPU guest FPU context, load VCPU userspace context (__kernel_fpu_end)
4) save VCPU userspace context, load idle thread FPU context
5) save idle thread FPU context, load VCPU userspace FPU context
6) save VCPU userspace FPU context, load guest FPU context (__kernel_fpu_begin)
7) enter guest
8) save idle task FPU context, load task FPU context (in KVM guest)

This is a total of 6 FPU saves and 6 restores, touching 4 different
FPU contexts, only one of which is ever used. The hardware optimizes
FPU save and restore pretty well, but 12 operations involving 384
bytes of data add substantial overhead. Additionally, the XSAVEOPT
optimization does not work across VMENTER / VMEXIT boundaries, so
things are slower than they would be on bare metal.
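
The count above can be checked by tallying the eight steps. This is
only a bookkeeping sketch: the struct, the table and count_chain() are
made-up names for illustration, not anything from the kernel tree.

```c
#include <assert.h>
#include <stddef.h>

/* One step of the chain listed above: does it save and/or load FPU state? */
struct fpu_step {
	const char *what;
	int saves;
	int loads;
};

/* Steps 1-8 from the list above; steps 2 and 7 move no FPU state. */
static const struct fpu_step cover_letter_chain[] = {
	{ "guest: task -> idle task",              1, 1 },
	{ "trap to host",                          0, 0 },
	{ "host: guest ctx -> VCPU userspace ctx", 1, 1 },
	{ "host: VCPU userspace -> idle thread",   1, 1 },
	{ "host: idle thread -> VCPU userspace",   1, 1 },
	{ "host: VCPU userspace ctx -> guest ctx", 1, 1 },
	{ "enter guest",                           0, 0 },
	{ "guest: idle task -> task",              1, 1 },
};

struct fpu_totals {
	int saves;
	int loads;
};

/* Sum the saves and loads over a list of steps. */
static struct fpu_totals count_chain(const struct fpu_step *steps, size_t n)
{
	struct fpu_totals t = { 0, 0 };

	assert(steps != NULL);
	for (size_t i = 0; i < n; i++) {
		t.saves += steps[i].saves;
		t.loads += steps[i].loads;
	}
	return t;
}
```

Calling count_chain() on cover_letter_chain gives 6 saves and 6 loads,
the 12 operations mentioned above.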

This patch series reduces it to two saves, in steps (1) and (3), and
one load, in step (6), if the VCPU and the task inside the guest both
stay on the same CPU. The load could be optimized away in a subsequent
series, by recognizing that the emulator did not touch the in-memory
FPU state for the guest.

This could also give a small performance gain for bare metal
applications that wake up and go idle repeatedly, staying on the
same CPU.
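
To make the bare metal case concrete, here is a toy model of a task
that goes idle and wakes up again on the same CPU. It is a sketch under
stated assumptions, not the code in this series: struct toy_cpu,
regs_owner and the toy_* / eager_* / deferred_* helpers are all
hypothetical names, and the model only counts operations.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_IDLE 0	/* idle thread: never needs FPU registers in this model */
#define TOY_TASK 1	/* the one userspace task in this model */

struct toy_cpu {
	int regs_owner;	/* id of the task whose state the registers hold */
	int saves;
	int loads;
};

/* Eager model: every context switch saves prev and loads next. */
static void eager_switch(struct toy_cpu *cpu, int prev, int next)
{
	(void)prev;
	cpu->saves++;		/* save prev's registers to memory */
	cpu->loads++;		/* load next's state, even for the idle thread */
	cpu->regs_owner = next;
}

/* Deferred model: save on switch-out, but leave the registers alone. */
static void deferred_switch(struct toy_cpu *cpu, int prev, int next)
{
	(void)next;
	if (prev != TOY_IDLE && cpu->regs_owner == prev)
		cpu->saves++;	/* memory copy is now current, registers intact */
	/* nothing is loaded here */
}

/* Deferred model: load only if the registers hold someone else's state. */
static void deferred_return_to_user(struct toy_cpu *cpu, int task)
{
	assert(task != TOY_IDLE);
	if (cpu->regs_owner != task) {
		cpu->loads++;
		cpu->regs_owner = task;
	}
}

/* The task goes idle and wakes up again, `trips` times, on one CPU. */
static struct toy_cpu idle_round_trip(bool deferred, int trips)
{
	struct toy_cpu cpu = { .regs_owner = TOY_TASK, .saves = 0, .loads = 0 };

	for (int i = 0; i < trips; i++) {
		if (deferred) {
			deferred_switch(&cpu, TOY_TASK, TOY_IDLE);
			deferred_switch(&cpu, TOY_IDLE, TOY_TASK);
			deferred_return_to_user(&cpu, TOY_TASK);
		} else {
			eager_switch(&cpu, TOY_TASK, TOY_IDLE);
			eager_switch(&cpu, TOY_IDLE, TOY_TASK);
		}
	}
	return cpu;
}
```

In this model each idle round trip costs two saves and two loads in the
eager case, and one save and no load in the deferred case, because the
registers still hold the task's state when it returns to userspace.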

Where it all falls apart (probably due to a stupid mistake on my end)
is the signal handling code.

In the signal handling code, the registers (including FPU state) are
all saved to the user space stack, and on sigreturn they are loaded
back in. The signal handler setup code needs to be fixed to deal with
the other changes, but I am apparently doing that incorrectly.
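
As an illustration of the constraint (not a diagnosis of what is wrong
in patch 11), here is a toy model of the save to the signal frame and
the load on sigreturn. The assumption is that once loading is deferred,
a task's FPU state may live either in the registers or only in memory,
so the frame has to be filled from whichever copy is current. All names
(toy_thread, regs_valid, toy_setup_frame, toy_sigreturn) are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_FPU_BYTES 16	/* arbitrary size, for illustration only */

struct toy_thread {
	unsigned char regs[TOY_FPU_BYTES];	/* stands in for the FPU registers */
	unsigned char mem[TOY_FPU_BYTES];	/* in-memory copy of the state */
	bool regs_valid;	/* do the registers hold this task's state? */
};

struct toy_sigframe {
	unsigned char fpstate[TOY_FPU_BYTES];	/* lives on the user stack */
};

/* Signal delivery: copy the current FPU state into the frame. */
static void toy_setup_frame(const struct toy_thread *t, struct toy_sigframe *f)
{
	/* Assumption: pick the copy that is current for this task. */
	const unsigned char *src = t->regs_valid ? t->regs : t->mem;

	assert(t != NULL && f != NULL);
	memcpy(f->fpstate, src, TOY_FPU_BYTES);
}

/* sigreturn: load the state back from the frame into the registers. */
static void toy_sigreturn(struct toy_thread *t, const struct toy_sigframe *f)
{
	assert(t != NULL && f != NULL);
	memcpy(t->regs, f->fpstate, TOY_FPU_BYTES);
	t->regs_valid = true;
}
```

In this model, reading the registers unconditionally in
toy_setup_frame() would put stale data in the frame whenever
regs_valid is false. Whether sigreturn should load the registers
directly or only update the in-memory copy is a design choice the
sketch does not settle.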

I have been staring at the code for a few weeks now, and do not
appear to be any closer to figuring out what I did wrong in the last
patch of this series.

I would really appreciate it if people with better knowledge of the
signal handler and/or FPU code could take a look :)



Thread overview: 76+ messages
2015-01-11 21:46 [RFC PATCH 0/11 BROKEN] move FPU context loading to userspace switch riel
2015-01-11 21:46 ` [RFC PATCH 01/11] x86,fpu: document the data structures a little riel
2015-01-12 21:18   ` Borislav Petkov
2015-01-12 21:38     ` Rik van Riel
2015-01-12 21:52   ` Dave Hansen
2015-01-13 15:59     ` Rik van Riel
2015-01-11 21:46 ` [RFC PATCH 02/11] x86,fpu: replace fpu_switch_t with a thread flag riel
2015-01-13 15:24   ` Oleg Nesterov
2015-01-13 16:35     ` Rik van Riel
2015-01-13 16:55       ` Oleg Nesterov
2015-01-11 21:46 ` [RFC PATCH 03/11] x86,fpu: move __thread_fpu_begin to when the task has the fpu riel
2015-01-13 15:24   ` Oleg Nesterov
2015-01-13 16:37     ` Rik van Riel
2015-01-11 21:46 ` [RFC PATCH 04/11] x86,fpu: defer FPU restore until return to userspace riel
2015-01-13 15:53   ` Oleg Nesterov
2015-01-13 17:07   ` Andy Lutomirski
2015-01-13 17:11   ` Oleg Nesterov
2015-01-13 17:18     ` Andy Lutomirski
2015-01-13 17:44       ` Rik van Riel
2015-01-13 17:57         ` Andy Lutomirski
2015-01-13 18:13           ` Rik van Riel
2015-01-13 18:26             ` Andy Lutomirski
2015-01-13 17:54     ` Rik van Riel
2015-01-13 18:22       ` Oleg Nesterov
2015-01-13 18:30         ` Oleg Nesterov
2015-01-13 20:06           ` Rik van Riel
2015-01-14 17:56             ` Oleg Nesterov
2015-01-13 17:58   ` Oleg Nesterov
2015-01-13 19:32     ` Rik van Riel
2015-01-11 21:46 ` [RFC PATCH 05/11] x86,fpu: ensure FPU state is reloaded from memory if task is traced riel
2015-01-13 16:19   ` Oleg Nesterov
2015-01-13 16:33     ` Rik van Riel
2015-01-13 16:50       ` Oleg Nesterov
2015-01-13 16:57         ` Rik van Riel
2015-01-11 21:46 ` [RFC PATCH 06/11] x86,fpu: lazily skip fpu restore with eager fpu mode, too riel
2015-01-13 17:11   ` Andy Lutomirski
2015-01-13 20:43     ` Rik van Riel
2015-01-14 18:36   ` Oleg Nesterov
2015-01-15  2:49     ` Rik van Riel
2015-01-15 19:34       ` Oleg Nesterov
2015-01-11 21:46 ` [RFC PATCH 07/11] x86,fpu: store current fpu pointer, instead of fpu_owner_task riel
2015-01-11 21:46 ` [RFC PATCH 08/11] x86,fpu: restore user FPU state lazily after __kernel_fpu_end riel
2015-01-14 18:43   ` Oleg Nesterov
2015-01-14 19:08     ` Oleg Nesterov
2015-01-11 21:46 ` [RFC PATCH 09/11] x86,fpu,kvm: keep vcpu FPU active as long as it is resident riel
2015-01-11 21:46 ` [RFC PATCH 10/11] x86,fpu: fix fpu_copy to deal with not-loaded fpu riel
2015-01-11 21:46 ` [RFC PATCH 11/11] (BROKEN) x86,fpu: broken signal handler stack setup riel
2015-01-15 19:19 ` [PATCH 0/3] x86, fpu: kernel_fpu_begin/end initial cleanups/fix Oleg Nesterov
2015-01-15 19:19   ` [PATCH 1/3] x86, fpu: introduce per-cpu "bool in_kernel_fpu" Oleg Nesterov
2015-01-16  2:22     ` Rik van Riel
2015-01-20 12:54     ` [tip:x86/fpu] x86, fpu: Introduce per-cpu in_kernel_fpu state tip-bot for Oleg Nesterov
2015-01-15 19:20   ` [PATCH 2/3] x86, fpu: don't abuse ->has_fpu in __kernel_fpu_{begin,end}() Oleg Nesterov
2015-01-16  2:27     ` Rik van Riel
2015-01-16 15:54       ` Oleg Nesterov
2015-01-16 16:07         ` Rik van Riel
2015-01-20 12:55     ` [tip:x86/fpu] x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin /end() tip-bot for Oleg Nesterov
2015-01-15 19:20   ` [PATCH 3/3] x86, fpu: fix math_state_restore() race with kernel_fpu_begin() Oleg Nesterov
2015-01-16  2:30     ` Rik van Riel
2015-01-16 16:03       ` Oleg Nesterov
2015-01-20 12:55     ` [tip:x86/fpu] x86, fpu: Fix " tip-bot for Oleg Nesterov
2015-01-19 18:51   ` [PATCH 0/3] x86, fpu: more eagerfpu cleanups Oleg Nesterov
2015-01-19 18:51     ` [PATCH 1/3] x86, fpu: __kernel_fpu_begin() should clear fpu_owner_task even if use_eager_fpu() Oleg Nesterov
2015-01-20 14:15       ` Rik van Riel
2015-02-20 18:13       ` Borislav Petkov
2015-03-03 11:27       ` [tip:x86/fpu] x86/fpu: " tip-bot for Oleg Nesterov
2015-01-19 18:51     ` [PATCH 2/3] x86, fpu: always allow FPU in interrupt " Oleg Nesterov
2015-01-20 14:46       ` Rik van Riel
2015-01-20 22:46       ` Andy Lutomirski
2015-02-20 21:48       ` Borislav Petkov
2015-03-03 11:28       ` [tip:x86/fpu] x86/fpu: Always " tip-bot for Oleg Nesterov
2015-01-19 18:52     ` [PATCH 3/3] x86, fpu: don't abuse FPU in kernel threads " Oleg Nesterov
2015-01-20 14:53       ` Rik van Riel
2015-02-23 15:31       ` Borislav Petkov
2015-03-03 11:28       ` [tip:x86/fpu] x86/fpu: Don't " tip-bot for Oleg Nesterov
2015-02-20 12:10     ` [PATCH 0/3] x86, fpu: more eagerfpu cleanups Borislav Petkov
2015-02-20 13:30       ` Oleg Nesterov
