* [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Ryan Arnold @ 2009-03-13 20:23 UTC
To: linuxppc-dev; +Cc: Will Schmidt, Steven Munroe
Hi all,
Those of us working on the POWER toolchain can envision a certain class
of customers who may benefit from intelligently disabling certain
register class enable bits on context switches, i.e. not disabling by
default.
Currently, per process, if the MSR enable bits for the FPRs, VRs, or VSRs
are disabled, an interrupt is generated as soon as an FP, VMX, or VSX
instruction is encountered. At that point the kernel enables the
relevant bits in the MSR and returns.
Currently, the kernel disables all of these bits on a context switch.
If a customer _knows_ a process will be using a register class
extensively, e.g. the VRs, they're paying the interrupt->enable-VMX price
on every context switch. It'd be nice if we could intelligently leave
the bits enabled.
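Roughly, the flow today looks like this (a simplified sketch; the
function names are illustrative, not the actual arch/powerpc entry
points):

    /* at context switch out, the facility bits are cleared, so the
     * next use by this task will trap: */
    next->thread.regs->msr &= ~(MSR_FP | MSR_VEC | MSR_VSX);

    /* the first FP instruction afterwards takes the "FP unavailable"
     * interrupt, which reloads the state and re-enables the bit: */
    void fp_unavailable(struct pt_regs *regs)
    {
            load_fp_state(&current->thread);  /* reload FPRs and FPSCR */
            regs->msr |= MSR_FP;              /* set until the next switch */
    }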
Solutions:
- A boot flag which always enables VSRs, VRs, FPRs, etc. These are
cumulative, i.e. VSRs imply VRs and FPRs; VRs imply FPRs.
- A heuristic which permanently enables said register classes for a
process if they've been enabled during the previous X interrupts.
- The same heuristic could disable the register class bits after a
certain criterion is met.
We have some ideas on how to benchmark this to verify the expense of the
interrupt->enable path. As it presently works, this stands in the way of
using VMX or VSX for optimized string routines in GLIBC.
Regards,
Ryan S. Arnold
IBM Linux Technology Center
Linux Toolchain Development
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Kumar Gala @ 2009-03-13 21:15 UTC
To: rsa; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
On Mar 13, 2009, at 3:23 PM, Ryan Arnold wrote:
> Hi all,
>
> Those of us working on the POWER toolchain can envision a certain class
> of customers who may benefit from intelligently disabling certain
> register class enable bits on context switches, i.e. not disabling by
> default.
>
> Currently, per process, if the MSR enable bits for the FPRs, VRs, or VSRs
> are disabled, an interrupt is generated as soon as an FP, VMX, or VSX
> instruction is encountered. At that point the kernel enables the
> relevant bits in the MSR and returns.
>
> Currently, the kernel disables all of these bits on a context switch.
>
> If a customer _knows_ a process will be using a register class
> extensively, e.g. the VRs, they're paying the interrupt->enable-VMX price
> on every context switch. It'd be nice if we could intelligently leave
> the bits enabled.
>
> Solutions:
> - A boot flag which always enables VSRs, VRs, FPRs, etc. These are
> cumulative, i.e. VSRs imply VRs and FPRs; VRs imply FPRs.
>
> - A heuristic which permanently enables said register classes for a
> process if they've been enabled during the previous X interrupts.
>
> - The same heuristic could disable the register class bits after a
> certain criterion is met.
>
> We have some ideas on how to benchmark this to verify the expense of
> the interrupt->enable path. As it presently works, this stands in the
> way of using VMX or VSX for optimized string routines in GLIBC.
If these applications are aware they are heavy users (of FP, VMX, VSX),
can we not use a sysctl()? Doing so wouldn't be that difficult.
I think trying to do something based on a runtime heuristic sounds a
bit iffy.
- k
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Benjamin Herrenschmidt @ 2009-03-13 22:45 UTC
To: Kumar Gala; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
> If these applications are aware they are heavy users (of FP, VMX, VSX),
> can we not use a sysctl()? Doing so wouldn't be that difficult.
>
> I think trying to do something based on a runtime heuristic sounds a
> bit iffy.
Another option might be simply to say that if an app has used FP, VMX or
VSX -once-, then it's likely to do it again and just keep re-enabling
it :-)
I'm serious here: do we know of many cases where these things are used
only seldom, once in a while?
And if we do, maybe then a simple counter in the task struct... if the
app re-enables it over more than a few consecutive switches, then make it
stick. I have the feeling that would work out reasonably well.
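Something like this, say (a rough sketch only; the field names and the
threshold are made up):

    /* illustrative: make MSR_FP "sticky" once the task has re-enabled
     * it on several consecutive timeslices */
    #define FP_STICKY_THRESHOLD 4

    /* in the FP unavailable handler, after the state is restored: */
    if (++current->thread.fp_streak >= FP_STICKY_THRESHOLD)
            current->thread.fp_sticky = 1;

    /* at context switch, when deciding whether to clear the bit: */
    if (!next->thread.fp_sticky)
            next->thread.regs->msr &= ~MSR_FP;

A task that later stops using FP could have fp_streak reset and the
sticky flag dropped again.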
Cheers,
Ben.
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Josh Boyer @ 2009-03-13 23:52 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
On Sat, Mar 14, 2009 at 09:45:51AM +1100, Benjamin Herrenschmidt wrote:
>
>> If these applications are aware they are heavy users (of FP, VMX, VSX),
>> can we not use a sysctl()? Doing so wouldn't be that difficult.
>>
>> I think trying to do something based on a runtime heuristic sounds a
>> bit iffy.
>
>Another option might be simply to say that if an app has used FP, VMX or
>VSX -once-, then it's likely to do it again and just keep re-enabling
>it :-)
>
>I'm serious here: do we know of many cases where these things are used
>only seldom, once in a while?
This seems reasonable to me.
>And if we do, maybe then a simple counter in the task struct... if the
>app re-enables it over more than a few consecutive switches, then make it
>stick. I have the feeling that would work out reasonably well.
Gee. That sounds like a runtime heuristic :)
josh
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Ryan Arnold @ 2009-03-14 2:31 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
On Sat, 2009-03-14 at 09:45 +1100, Benjamin Herrenschmidt wrote:
> > If these applications are aware they are heavy users (of FP, VMX, VSX),
> > can we not use a sysctl()? Doing so wouldn't be that difficult.
> >
> > I think trying to do something based on a runtime heuristic sounds a
> > bit iffy.
>
> Another option might be simply to say that if an app has used FP, VMX or
> VSX -once-, then it's likely to do it again and just keep re-enabling
> it :-)
>
> I'm serious here: do we know of many cases where these things are used
> only seldom, once in a while?
>
> And if we do, maybe then a simple counter in the task struct... if the
> app re-enables it over more than a few consecutive switches, then make it
> stick. I have the feeling that would work out reasonably well.
Both of these thoughts came to mind. I don't have a particular
preference. It's very likely that a process which results in the
enabling of FP, VMX, or VSX will continue to use the facility for the
duration of its lifetime. Threads would be even more likely to exhibit
this behavior.
The case where this might not be true is if we use VMX or VSX for string
routine optimization in GLIBC. This will require metrics to prove its
utility, of course. Perhaps what I can do in the string routines is
check whether the bits are already set, and use the facility only if it
is already enabled and the usage scenario warrants it, i.e. if the size
and alignment of the data are in a sweet spot as indicated by profiling
data.
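For instance, the dispatch in a string routine might look something like
this (a sketch only; the 128-byte cutoff and the vector/scalar helper
names are placeholders pending that profiling data):

    #include <stddef.h>
    #include <stdint.h>

    void *memcpy_vmx(void *dst, const void *src, size_t n);    /* vector path */
    void *memcpy_scalar(void *dst, const void *src, size_t n); /* plain path */

    /* hypothetical gate: only take the VMX path when the copy is large
     * and aligned enough to amortize the enable/save/restore cost */
    #define VMX_MIN_BYTES 128

    void *memcpy_dispatch(void *dst, const void *src, size_t n)
    {
            if (n >= VMX_MIN_BYTES
                && (((uintptr_t)dst | (uintptr_t)src) & 0xf) == 0)
                    return memcpy_vmx(dst, src, n);
            return memcpy_scalar(dst, src, n);
    }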
Regards,
Ryan
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Benjamin Herrenschmidt @ 2009-03-14 3:22 UTC
To: rsa; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
> Both of these thoughts came to mind. I don't have a particular
> preference. It's very likely that a process which results in the
> enabling of FP, VMX, or VSX will continue to use the facility for the
> duration of its lifetime. Threads would be even more likely to exhibit
> this behavior.
>
> The case where this might not be true is if we use VMX or VSX for string
> routine optimization in GLIBC. This will require metrics to prove its
> utility, of course. Perhaps what I can do in the string routines is
> check whether the bits are already set, and use the facility only if it
> is already enabled and the usage scenario warrants it, i.e. if the size
> and alignment of the data are in a sweet spot as indicated by profiling
> data.
Or we could just add some instrumentation to today's kernel to see how
often those get enabled and then not re-enabled on the next time slice,
and do some stats with common workloads.
Ben.
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Michael Neuling @ 2009-03-14 8:20 UTC
To: rsa; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
> Hi all,
>
> Those of us working on the POWER toolchain can envision a certain class
> of customers who may benefit from intelligently disabling certain
> register class enable bits on context switches, i.e. not disabling by
> default.
>
> Currently, per process, if the MSR enable bits for the FPRs, VRs, or VSRs
> are disabled, an interrupt is generated as soon as an FP, VMX, or VSX
> instruction is encountered. At that point the kernel enables the
> relevant bits in the MSR and returns.
>
> Currently, the kernel disables all of these bits on a context switch.
>
> If a customer _knows_ a process will be using a register class
> extensively, e.g. the VRs, they're paying the interrupt->enable-VMX price
> on every context switch. It'd be nice if we could intelligently leave
> the bits enabled.
>
> Solutions:
> - A boot flag which always enables VSRs, VRs, FPRs, etc. These are
> cumulative, i.e. VSRs imply VRs and FPRs; VRs imply FPRs.
>
> - A heuristic which permanently enables said register classes for a
> process if they've been enabled during the previous X interrupts.
>
> - The same heuristic could disable the register class bits after a
> certain criterion is met.
Another option is to look at getting lazy save working on SMP. We
currently only enable it on UP compiles.
This would probably have the biggest performance impact when there is
only one FP/VMX/VSX application running per CPU.
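For reference, lazy save on UP is roughly the following (a sketch; it
reuses the existing last_task_used_math idea, but the details here are
illustrative):

    /* at switch-out, don't save the FP state at all; just remember
     * which task owns what is live in the registers: */
    if (prev->thread.regs->msr & MSR_FP)
            last_task_used_math = prev;

    /* only when some other task takes the FP unavailable interrupt
     * does the old owner's state actually get saved: */
    if (last_task_used_math && last_task_used_math != current)
            save_fp_state(&last_task_used_math->thread);

On SMP the old owner's state may be live in another CPU's registers, so
that save would need an IPI to the owning CPU, which is the tricky part.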
Mikey
>
> We have some ideas on how to benchmark this to verify the expense of the
> interrupt->enable path. As it presently works, this stands in the way of
> using VMX or VSX for optimized string routines in GLIBC.
>
> Regards,
>
> Ryan S. Arnold
> IBM Linux Technology Center
> Linux Toolchain Development
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Segher Boessenkool @ 2009-03-14 13:49 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
> Another option might be simply to say that if an app has used FP, VMX or
> VSX -once-, then it's likely to do it again and just keep re-enabling
> it :-)
>
> I'm serious here: do we know of many cases where these things are used
> only seldom, once in a while?
For FP, I believe many apps use it only sporadically. But for VMX and
VSX, yeah, it might well be optimal to keep it enabled all the time.
Someone should do some profiling...
Segher
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Segher Boessenkool @ 2009-03-14 13:55 UTC
To: rsa; +Cc: Will Schmidt, Steven Munroe, linuxppc-dev
> It's very likely that a process which results in the
> enabling of FP, VMX, or VSX will continue to use the facility for the
> duration of its lifetime. Threads would be even more likely to exhibit
> this behavior.
>
> The case where this might not be true is if we use VMX or VSX for string
> routine optimization in GLIBC. This will require metrics to prove its
> utility, of course.
OTOH, the main cost of using VMX etc. is exactly this register
save/restore, and it's a win otherwise pretty much always. I.e., as soon
as you take that initial hit, almost anything can get a speedup from VMX.
So even if you do not see an overall speedup from, say, only some
optimised string routines, it is probably worth it anyway, as it pays the
initial cost for enabling more optimisations later.
Segher
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Ryan Arnold @ 2009-03-14 14:58 UTC
To: Segher Boessenkool; +Cc: Will Schmidt, Steven Munroe, linuxppc-dev
On Sat, 2009-03-14 at 14:49 +0100, Segher Boessenkool wrote:
> > Another option might be simply to say that if an app has used FP, VMX or
> > VSX -once-, then it's likely to do it again and just keep re-enabling
> > it :-)
> >
> > I'm serious here: do we know of many cases where these things are used
> > only seldom, once in a while?
>
> For FP, I believe many apps use it only sporadically. But for VMX and
> VSX, yeah, it might well be optimal to keep it enabled all the time.
> Someone should do some profiling...
We can do some VMX testing on existing POWER6 machines. The VSX
instruction set hasn't been fully implemented in GCC yet, so we'll need
to wait a bit for that. Does anyone have an idea for a good VMX/AltiVec
benchmark?
Ryan
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Benjamin Herrenschmidt @ 2009-03-16 0:49 UTC
To: rsa; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
On Sat, 2009-03-14 at 09:58 -0500, Ryan Arnold wrote:
> We can do some VMX testing on existing POWER6 machines. The VSX
> instruction set hasn't been fully implemented in GCC yet, so we'll need
> to wait a bit for that. Does anyone have an idea for a good VMX/AltiVec
> benchmark?
Note that there are two aspects to the problem:
- Lazy save/restore on SMP. This would avoid both the save and restore
phases, and is thus where the most potential gain is to be made, at the
expense of some tricky IPI work when processes migrate between CPUs.
However, it will only be useful -if- a process using FP/VMX/VSX is
"interrupted" by another process that isn't using them, for example a
kernel thread. So it's unclear whether that's worth it in practice, i.e.,
does this happen that often?
- Always restoring the FP/VMX/VSX state on context switch "in" rather
than taking a fault. This is reasonably simple, but at the potential
expense of adding the save/restore overhead to applications that only
seldom use these facilities. (Some heuristics might help here.)
However, the question here is: what does this buy us?
I.e., in the worst-case scenario, which is HZ=1000, so every 1ms, the
process would have the overhead of an interrupt to do the restore of the
state. The state restore itself doesn't count, since it would be done
either way (at context switch vs. in the unavailable interrupt), so all
we win here is the overhead of the actual interrupt, which is
implemented as a fast interrupt in assembly. So what do we have here?
1000 cycles, to be pessimistic? On a 1GHz CPU, that is 1/1000 of the
time slice, and both of these are rather pessimistic numbers.
So that leaves us with the possible case of 2 tasks using the facility
and running a fraction of the timeslice each, for example because they
are ping-ponging with each other.
Is that something that happens often enough in practice to be noticeable?
Cheers,
Ben.
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Michael Neuling @ 2009-03-16 6:43 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Will Schmidt, Steven Munroe
> > We can do some VMX testing on existing POWER6 machines. The VSX
> > instruction set hasn't been fully implemented in GCC yet, so we'll need
> > to wait a bit for that. Does anyone have an idea for a good VMX/AltiVec
> > benchmark?
>
> Note that there are two aspects to the problem:
>
> - Lazy save/restore on SMP. This would avoid both the save and restore
> phases, and is thus where the most potential gain is to be made, at the
> expense of some tricky IPI work when processes migrate between CPUs.
>
> However, it will only be useful -if- a process using FP/VMX/VSX is
> "interrupted" by another process that isn't using them, for example a
> kernel thread. So it's unclear whether that's worth it in practice,
> i.e., does this happen that often?
>
> - Always restoring the FP/VMX/VSX state on context switch "in" rather
> than taking a fault. This is reasonably simple, but at the potential
> expense of adding the save/restore overhead to applications that only
> seldom use these facilities. (Some heuristics might help here.)
>
> However, the question here is: what does this buy us?
>
> I.e., in the worst-case scenario, which is HZ=1000, so every 1ms, the
> process would have the overhead of an interrupt to do the restore of the
> state. The state restore itself doesn't count, since it would be done
> either way (at context switch vs. in the unavailable interrupt), so all
> we win here is the overhead of the actual interrupt, which is
> implemented as a fast interrupt in assembly. So what do we have here?
> 1000 cycles, to be pessimistic? On a 1GHz CPU, that is 1/1000 of the
> time slice, and both of these are rather pessimistic numbers.
>
> So that leaves us with the possible case of 2 tasks using the facility
> and running a fraction of the timeslice each, for example because they
> are ping-ponging with each other.
>
> Is that something that happens often enough in practice to be noticeable?
I hacked up the below to put stats in /proc/self/sched.
A quick grep through /proc on a RHEL 5.2 machine (egrep
'(fp_count|switch_count)' /proc/*/sched) shows a few apps use FP a few
dozen times but then stop. Those are only init apps like hald, so we need
to check some real-world apps too.
Ryan: let me know if this allows you to collect some useful stats.
Subject: [PATCH] powerpc: add context switch, fpr & vr stats to /proc/self/sched.
Add counters for task switches and for FP and VR unavailable exceptions
to /proc/self/sched.
[root@p5-20-p6-e0 ~]# cat /proc/3422/sched |tail -3
switch_count : 559
fp_count : 317
vr_count : 0
[root@p5-20-p6-e0 ~]#
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/include/asm/processor.h | 3 +++
arch/powerpc/kernel/asm-offsets.c | 3 +++
arch/powerpc/kernel/fpu.S | 3 +++
arch/powerpc/kernel/process.c | 3 +++
arch/powerpc/kernel/setup-common.c | 10 ++++++++++
include/linux/seq_file.h | 12 ++++++++++++
kernel/sched_debug.c | 16 ++++------------
7 files changed, 38 insertions(+), 12 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/include/asm/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/include/asm/processor.h
+++ linux-2.6-ozlabs/arch/powerpc/include/asm/processor.h
@@ -174,11 +174,13 @@ struct thread_struct {
} fpscr;
int fpexc_mode; /* floating-point exception mode */
unsigned int align_ctl; /* alignment handling control */
+ unsigned long fp_count; /* FP restore count */
#ifdef CONFIG_PPC64
unsigned long start_tb; /* Start purr when proc switched in */
unsigned long accum_tb; /* Total accumilated purr for process */
#endif
unsigned long dabr; /* Data address breakpoint register */
+ unsigned long switch_count; /* switch count */
#ifdef CONFIG_ALTIVEC
/* Complete AltiVec register set */
vector128 vr[32] __attribute__((aligned(16)));
@@ -186,6 +188,7 @@ struct thread_struct {
vector128 vscr __attribute__((aligned(16)));
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
+ unsigned long vr_count; /* VR restore count */
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
/* VSR status */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,14 +74,17 @@ int main(void)
DEFINE(KSP, offsetof(struct thread_struct, ksp));
DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit));
DEFINE(PT_REGS, offsetof(struct thread_struct, regs));
+ DEFINE(THREAD_SWITCHCOUNT, offsetof(struct thread_struct, switch_count));
DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode));
DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0]));
DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr));
+ DEFINE(THREAD_FPCOUNT, offsetof(struct thread_struct, fp_count));
#ifdef CONFIG_ALTIVEC
DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0]));
DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
+ DEFINE(THREAD_VRCOUNT, offsetof(struct thread_struct, vr_count));
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -102,6 +102,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
ori r12,r12,MSR_FP
or r12,r12,r4
std r12,_MSR(r1)
+ ld r4,THREAD_FPCOUNT(r5)
+ addi r4, r4, 1
+ std r4,THREAD_FPCOUNT(r5)
#endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -744,17 +744,20 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
+ current->thread.switch_count = 0;
#ifdef CONFIG_VSX
current->thread.used_vsr = 0;
#endif
memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
current->thread.fpscr.val = 0;
+ current->thread.fp_count = 0;
#ifdef CONFIG_ALTIVEC
memset(current->thread.vr, 0, sizeof(current->thread.vr));
memset(&current->thread.vscr, 0, sizeof(current->thread.vscr));
current->thread.vscr.u[3] = 0x00010000; /* Java mode disabled */
current->thread.vrsave = 0;
current->thread.used_vr = 0;
+ current->thread.vr_count = 0;
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_SPE
memset(current->thread.evr, 0, sizeof(current->thread.evr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/setup-common.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/setup-common.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/setup-common.c
@@ -669,3 +669,13 @@ static int powerpc_debugfs_init(void)
}
arch_initcall(powerpc_debugfs_init);
#endif
+
+void arch_proc_sched_show_task(struct task_struct *p, struct seq_file *m) {
+ SEQ_printf(m, "%-35s:%21Ld\n",
+ "switch_count", (long long)p->thread.switch_count);
+ SEQ_printf(m, "%-35s:%21Ld\n",
+ "fp_count", (long long)p->thread.fp_count);
+ SEQ_printf(m, "%-35s:%21Ld\n",
+ "vr_count", (long long)p->thread.vr_count);
+}
+
Index: linux-2.6-ozlabs/include/linux/seq_file.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/seq_file.h
+++ linux-2.6-ozlabs/include/linux/seq_file.h
@@ -95,4 +95,16 @@ extern struct list_head *seq_list_start_
extern struct list_head *seq_list_next(void *v, struct list_head *head,
loff_t *ppos);
+/*
+ * This allows printing both to /proc/sched_debug and
+ * to the console
+ */
+#define SEQ_printf(m, x...) \
+ do { \
+ if (m) \
+ seq_printf(m, x); \
+ else \
+ printk(x); \
+ } while (0)
+
#endif
Index: linux-2.6-ozlabs/kernel/sched_debug.c
===================================================================
--- linux-2.6-ozlabs.orig/kernel/sched_debug.c
+++ linux-2.6-ozlabs/kernel/sched_debug.c
@@ -17,18 +17,6 @@
#include <linux/utsname.h>
/*
- * This allows printing both to /proc/sched_debug and
- * to the console
- */
-#define SEQ_printf(m, x...) \
- do { \
- if (m) \
- seq_printf(m, x); \
- else \
- printk(x); \
- } while (0)
-
-/*
* Ease the printing of nsec fields:
*/
static long long nsec_high(unsigned long long nsec)
@@ -370,6 +358,9 @@ static int __init init_sched_debug_procf
__initcall(init_sched_debug_procfs);
+void __attribute__ ((weak))
+arch_proc_sched_show_task(struct task_struct *p, struct seq_file *m) {}
+
void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
{
unsigned long nr_switches;
@@ -473,6 +464,7 @@ void proc_sched_show_task(struct task_st
SEQ_printf(m, "%-35s:%21Ld\n",
"clock-delta", (long long)(t1-t0));
}
+ arch_proc_sched_show_task(p, m);
}
void proc_sched_set_task(struct task_struct *p)
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Gabriel Paubert @ 2009-03-16 10:52 UTC
To: Segher Boessenkool; +Cc: Will Schmidt, Steven Munroe, linuxppc-dev
On Sat, Mar 14, 2009 at 02:49:02PM +0100, Segher Boessenkool wrote:
>> Another option might be simply to say that if an app has used FP, VMX
>> or VSX -once-, then it's likely to do it again and just keep
>> re-enabling it :-)
>>
>> I'm serious here: do we know of many cases where these things are used
>> only seldom, once in a while?
>
> For FP, I believe many apps use it only sporadically. But for VMX and
> VSX, yeah, it might well be optimal to keep it enabled all the time.
> Someone should do some profiling...
I concur. I have some apps that are mostly integer but from time to time
perform some statistics, which are so much easier to write by declaring
a few double variables. On the other hand, when you start with vector
instructions, it often means that you are going to use them for a while.
That said, I'm not opposed to a heuristic like: if the app has used
the FP/VMX/VSX registers systematically after having been scheduled a
few times (2 for VMX/VSX, 5 for FP), load the corresponding registers
on every schedule for the next n schedules, where n would be about 20
for VMX/VSX, and perhaps only 5 for FP.
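In sketch form (the fields are illustrative; the thresholds are the ones
suggested above, for the FP case):

    /* arm a preload window once FP has been used on 5 consecutive
     * schedules; then restore eagerly for the next 5 schedules */
    if (p->thread.fp_streak >= 5 && !p->thread.fp_preload)
            p->thread.fp_preload = 5;

    /* at switch-in: */
    if (p->thread.fp_preload) {
            p->thread.fp_preload--;
            load_fp_state(&p->thread);      /* restore without a trap */
            p->thread.regs->msr |= MSR_FP;
    }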
Gabriel