* [RFC/PATCH 0/7] powerpc: Implement lazy save of FP, VMX and VSX state in SMP
@ 2010-12-06 23:40 Michael Neuling
From: Michael Neuling @ 2010-12-06 23:40 UTC
To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev, linux-kernel
This implements lazy save of FP, VMX and VSX state on SMP 64-bit and
32-bit powerpc.
Currently we only do lazy save on UP; this patch set extends it to SMP.
We always do lazy restore.
For VMX, on a context switch we do the following:
 - if we are switching to a CPU that currently holds the new process's
   state, just turn on VMX in the MSR (this is the lazy/quick case).
 - if the new process's state is in the thread_struct, turn VMX off in
   the MSR.
 - if the new process's state is on another CPU, IPI that CPU to give
   up its state and turn VMX off in the MSR (the slow IPI case).
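The three cases above can be modelled as a small decision function. This is a user-space sketch only: `struct mock_task`, `struct mock_cpu`, `NO_CPU` and `mock_vmx_switch_to()` are hypothetical names, not code from this series, and the real decision is made in the powerpc context-switch path.

```c
#include <assert.h>

/* Hypothetical marker: the task's VMX state lives in its thread_struct. */
#define NO_CPU (-1)

/* Hypothetical per-task record of where its VMX state currently is. */
struct mock_task {
	int vmx_cpu;	/* CPU whose registers hold the state, or NO_CPU */
};

/* Hypothetical per-CPU record of whose VMX state is live in its registers. */
struct mock_cpu {
	int id;
	struct mock_task *vmx_owner;
};

enum vmx_switch_action {
	VMX_ON_LAZY,		/* state already on this CPU: turn VMX on in the MSR */
	VMX_OFF_SAVED,		/* state in thread_struct: turn VMX off */
	VMX_OFF_SEND_IPI,	/* state on another CPU: IPI it, turn VMX off */
};

/* Decide what to do when switching 'next' in on 'cpu'. In every case the
 * caller runs 'next' straight away; nothing here waits for the IPI. */
static enum vmx_switch_action
mock_vmx_switch_to(const struct mock_cpu *cpu, const struct mock_task *next)
{
	if (next->vmx_cpu == cpu->id) {
		/* Both records must agree on who owns this CPU's registers. */
		assert(cpu->vmx_owner == next);
		return VMX_ON_LAZY;
	}
	if (next->vmx_cpu == NO_CPU)
		return VMX_OFF_SAVED;
	return VMX_OFF_SEND_IPI;
}
```

Only the lazy case leaves VMX enabled; the other two leave it off so that the task's first VMX instruction traps.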
We always start the new process at this point, irrespective of whether
its state is in the thread_struct, on the current CPU or on another CPU.
So in the slow case, we attempt to hide the IPI latency by starting
the process immediately and only waiting for the state to be flushed
when the process actually needs VMX, i.e. when we take the VMX
unavailable exception after the context switch.
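A minimal sketch of the deferred wait, assuming C11 atomics as a stand-in for the kernel's barriers. The hypothetical remote side flushes its registers to the thread_struct and then publishes that the task's state is no longer on any CPU; the new CPU only spins on that flag when the task first uses VMX. `struct mock_task`, `saved_vr0`, `mock_giveup_vmx_ipi()` and `mock_vmx_unavailable()` are illustrative names, not code from this series.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_CPU (-1)	/* hypothetical: state lives in thread_struct */

struct mock_task {
	_Atomic int vmx_cpu;	/* CPU holding the live state, or NO_CPU */
	unsigned int saved_vr0;	/* stand-in for the thread_struct VMX save area */
};

/* Remote CPU, in its IPI handler: give up the state it holds for 't'. */
static void mock_giveup_vmx_ipi(struct mock_task *t, unsigned int live_vr0)
{
	/* The IPI should only reach a CPU that actually holds the state. */
	assert(atomic_load_explicit(&t->vmx_cpu, memory_order_relaxed) != NO_CPU);
	t->saved_vr0 = live_vr0;	/* flush registers to memory first... */
	/* ...then publish; release orders the flush before the flag. */
	atomic_store_explicit(&t->vmx_cpu, NO_CPU, memory_order_release);
}

/* New CPU, in the VMX unavailable exception: only now wait for the flush,
 * then return the value that would be loaded into the registers. */
static unsigned int mock_vmx_unavailable(struct mock_task *t)
{
	while (atomic_load_explicit(&t->vmx_cpu, memory_order_acquire) != NO_CPU)
		;	/* spin; kernel code would call cpu_relax() here */
	return t->saved_vr0;
}
```

If the IPI has already completed by the time the task touches VMX, the loop exits immediately and the IPI latency is fully hidden.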
FP is implemented in a similar way. VSX reuses the FP and VMX code, as
it doesn't have any additional state over what FP and VMX use.
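VSX can piggyback on the FP and VMX paths because of the architected register overlap: VSR0-31 contain FPR0-31 (as doubleword 0), and VSR32-63 are VR0-31. A small sketch of that mapping follows; `vsr_backing()` is a hypothetical helper, and it assumes the FP save area is wide enough to hold the full 128 bits of VSR0-31, which this sketch does not model.

```c
#include <assert.h>

enum vsx_backing {
	BACKED_BY_FP,	/* VSR0-31: FPR n is doubleword 0 of VSR n */
	BACKED_BY_VMX,	/* VSR32-63: VSR n is VR (n - 32) */
};

/* Hypothetical helper: which facility's registers back VSR 'vsr', and at
 * which index within that facility. */
static enum vsx_backing vsr_backing(int vsr, int *index)
{
	assert(vsr >= 0 && vsr < 64);
	if (vsr < 32) {
		*index = vsr;
		return BACKED_BY_FP;
	}
	*index = vsr - 32;
	return BACKED_BY_VMX;
}
```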
I've been benchmarking with Anton Blanchard's context_switch.c benchmark
found here:
http://ozlabs.org/~anton/junkcode/context_switch.c
Running this benchmark unmodified shows no performance degradation with
these patches applied.
Inserting a simple FP instruction into one of the threads (which
exercises the lazy save/restore case), I get about a 4% improvement in
context switching rates with my patches applied. I get similar results
for VMX.
With a simple VSX instruction in one thread (VSX state is 64 x 128-bit
registers), I get an 8% improvement in performance with these patches.
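For a rough sense of scale, here is the architected register-file size for each facility, counting only the data registers (FPSCR, VSCR and VRSAVE are ignored). This is back-of-envelope arithmetic, not a measurement; the hypothetical `state_bytes()` helper just multiplies it out.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to save 'nregs' registers of 'bits_per_reg' bits each. */
static size_t state_bytes(size_t nregs, size_t bits_per_reg)
{
	assert(bits_per_reg % 8 == 0);
	return nregs * bits_per_reg / 8;
}

/*
 * FP:  state_bytes(32, 64)  ==  256
 * VMX: state_bytes(32, 128) ==  512
 * VSX: state_bytes(64, 128) == 1024
 */
```

So a full VSX save moves four times the register data of an FP-only one, which is one plausible reason the VSX case benefits most from skipping it.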
With FP/VMX/VSX instructions in both threads, I get no degradation in
performance.
Running lmbench shows no degradation in performance either.
Most of my benchmarking and testing has been done on 64-bit systems.
I've tested 32-bit FP, but I haven't tested 32-bit VMX at all.
There are probably some optimisations that could also be made to my
asm code. I've been concentrating on correctness rather than speed in
the asm code, since on a lazy context switch you now skip all of the
asm anyway.
The whole series is bisectable: it compiles at each step with various
combinations of 64/32-bit, SMP/UP and FPU/VMX/VSX config options on and
off.
I really hate the include file changes in this series. Getting the
call_single_data into the powerpc thread_struct was a PITA :-)
Mikey
Signed-off-by: Michael Neuling <mikey@neuling.org>
* [RFC/PATCH 1/7] Add csd_locked function
From: Michael Neuling @ 2010-12-06 23:40 UTC
To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev, linux-kernel
[-- Attachment #1: csd_locked.patch --]
[-- Type: text/plain, Size: 1148 bytes --]
Add a csd_locked function to determine whether a struct
call_single_data is currently locked. This can be used to check whether
an IPI can be sent again using this call_single_data.
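A user-space sketch of the intended use, assuming C11 atomics in place of the kernel's smp_mb() and flag handling. `struct mock_csd`, `mock_csd_locked()`, `mock_try_send()`, `mock_ipi_done()` and `mock_count()` are hypothetical; the flag value is assumed to mirror CSD_FLAG_LOCK in kernel/smp.c. A caller that owns a csd checks the lock bit before reusing it, rather than queueing it again while the previous IPI is still pending.

```c
#include <assert.h>
#include <stdatomic.h>

#define MOCK_CSD_FLAG_LOCK 0x01u	/* assumed to mirror CSD_FLAG_LOCK */

struct mock_csd {
	void (*func)(void *info);
	void *info;
	atomic_uint flags;
};

/* Mirrors csd_locked(): full fence, then test the lock bit. */
static int mock_csd_locked(struct mock_csd *data)
{
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	return (atomic_load_explicit(&data->flags, memory_order_relaxed) &
		MOCK_CSD_FLAG_LOCK) != 0;
}

/* Sender: claim and (notionally) queue the csd only if it is free.
 * The check and the claim are not one atomic step, so this assumes a
 * single sender per csd. Returns 1 if an IPI would be sent, 0 if one
 * is already pending. */
static int mock_try_send(struct mock_csd *data)
{
	if (mock_csd_locked(data))
		return 0;
	atomic_fetch_or(&data->flags, MOCK_CSD_FLAG_LOCK);
	/* A real kernel would now queue the csd to the target CPU. */
	return 1;
}

/* Target CPU: run the callback, then release the csd for reuse. */
static void mock_ipi_done(struct mock_csd *data)
{
	assert(mock_csd_locked(data));
	data->func(data->info);
	atomic_fetch_and(&data->flags, ~MOCK_CSD_FLAG_LOCK);
}

/* Example callback: counts how many times the IPI ran. */
static void mock_count(void *info)
{
	++*(int *)info;
}
```

A second mock_try_send() before mock_ipi_done() returns 0 because the first IPI is still pending; after mock_ipi_done() the csd can be claimed again.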
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
kernel/smp.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
Index: linux-lazy/kernel/smp.c
===================================================================
--- linux-lazy.orig/kernel/smp.c
+++ linux-lazy/kernel/smp.c
@@ -12,6 +12,7 @@
 #include <linux/gfp.h>
 #include <linux/smp.h>
 #include <linux/cpu.h>
+#include <linux/hardirq.h>
 
 static struct {
 	struct list_head queue;
@@ -131,6 +132,22 @@
 }
 
 /*
+ * Determine if a csd is currently locked. This can be used to
+ * determine if an IPI is currently pending using this csd already.
+ */
+int csd_locked(struct call_single_data *data)
+{
+	WARN_ON(preemptible());
+
+	/* Ensure flags have propagated */
+	smp_mb();
+
+	if (data->flags & CSD_FLAG_LOCK)
+		return 1;
+	return 0;
+}
+
+/*
  * Insert a previously allocated call_single_data element
  * for execution on the given CPU. data must already have
  * ->func, ->info, and ->flags set.
* [RFC/PATCH 2/7] Rearrange include files to make struct call_single_data usable in more places
From: Michael Neuling @ 2010-12-06 23:40 UTC
To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev, linux-kernel
[-- Attachment #1: csd_thread_struct_prepare.patch --]
[-- Type: text/plain, Size: 2592 bytes --]
We need to put a struct call_single_data in the powerpc thread_struct,
but we can't do that without this patch.
The thread_struct is defined in asm/processor.h. To add a struct
call_single_data to the thread_struct, asm/processor.h must include
linux/smp.h. When linux/smp.h is included from asm/processor.h, this
creates an include loop through linux/list.h:
  linux/list.h includes:
    linux/prefetch.h, which includes:
      asm/processor.h (for powerpc), which includes:
        linux/smp.h, which includes:
          linux/list.h
This loop results in an "incomplete list type" compile error when using
struct list_head, as struct call_single_data does.
This patch rearranges some include files to avoid this loop.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
include/linux/call_single_data.h | 14 ++++++++++++++
include/linux/list.h | 4 +++-
include/linux/smp.h | 8 +-------
3 files changed, 18 insertions(+), 8 deletions(-)
Index: linux-lazy/include/linux/call_single_data.h
===================================================================
--- /dev/null
+++ linux-lazy/include/linux/call_single_data.h
@@ -0,0 +1,14 @@
+#ifndef __LINUX_CALL_SINGLE_DATA_H
+#define __LINUX_CALL_SINGLE_DATA_H
+
+#include <linux/list.h>
+
+struct call_single_data {
+	struct list_head list;
+	void (*func) (void *info);
+	void *info;
+	u16 flags;
+	u16 priv;
+};
+
+#endif /* __LINUX_CALL_SINGLE_DATA_H */
Index: linux-lazy/include/linux/list.h
===================================================================
--- linux-lazy.orig/include/linux/list.h
+++ linux-lazy/include/linux/list.h
@@ -4,7 +4,6 @@
 #include <linux/types.h>
 #include <linux/stddef.h>
 #include <linux/poison.h>
-#include <linux/prefetch.h>
 
 /*
  * Simple doubly linked list implementation.
@@ -16,6 +15,9 @@
  * using the generic single-entry routines.
  */
 
+#include <linux/prefetch.h>
+#include <asm/system.h>
+
 #define LIST_HEAD_INIT(name) { &(name), &(name) }
 
 #define LIST_HEAD(name) \
Index: linux-lazy/include/linux/smp.h
===================================================================
--- linux-lazy.orig/include/linux/smp.h
+++ linux-lazy/include/linux/smp.h
@@ -9,18 +9,12 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/call_single_data.h>
 #include <linux/cpumask.h>
 
 extern void cpu_idle(void);
 
 typedef void (*smp_call_func_t)(void *info);
-struct call_single_data {
-	struct list_head list;
-	smp_call_func_t func;
-	void *info;
-	u16 flags;
-	u16 priv;
-};
 
 /* total number of cpus in this system (may exceed NR_CPUS) */
 extern unsigned int total_cpus;