* [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Ryan Arnold @ 2009-03-13 20:23 UTC
To: linuxppc-dev; Cc: Will Schmidt, Steven Munroe

Hi all,

Those of us working on the POWER toolchain can envision a certain class
of customers who may benefit from intelligently disabling certain
register-class enable bits on context switches, i.e. not disabling them
by default.

Currently, per process, if the MSR enable bits for FPRs, VRs, or VSRs
are set to disabled, an interrupt is generated as soon as an FP, VMX,
or VSX instruction is encountered. At that point the kernel enables the
relevant bits in the MSR and returns.

Currently, the kernel disables all of the bits on a context switch.

If a customer _knows_ a process will be using a register class
extensively, e.g. VRs, they're paying the interrupt->enable-VMX price
with every context switch. It'd be nice if we could intelligently leave
the bits enabled.

Solutions:

- A boot flag which always enables VSRs, VRs, FPRs, etc. These are
  cumulative, i.e. VSRs imply VRs and FPRs; VRs imply FPRs.

- A heuristic which permanently enables said register classes for a
  process if they've been enabled during the previous X interrupts.

- The same heuristic could disable the register class bits after a
  certain criterion is met.

We have some ideas on how to benchmark this to verify the expense of
the interrupt->enable. As it presently works, this stands in the way of
using VMX or VSX for optimized string routines in GLIBC.

Regards,

Ryan S. Arnold
IBM Linux Technology Center
Linux Toolchain Development
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Kumar Gala @ 2009-03-13 21:15 UTC
To: rsa; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

On Mar 13, 2009, at 3:23 PM, Ryan Arnold wrote:
> If a customer _knows_ a process will be using a register class
> extensively, e.g. VRs, they're paying the interrupt->enable-VMX price
> with every context switch. It'd be nice if we could intelligently
> leave the bits enabled.

If these applications are aware they are heavy users (of FP, VMX, VSX),
can we not use a sysctl()? Doing so wouldn't be that difficult.

I think trying to do something based on a runtime heuristic sounds a
bit iffy.

- k
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Benjamin Herrenschmidt @ 2009-03-13 22:45 UTC
To: Kumar Gala; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

> If these applications are aware they are heavy users (of FP, VMX, VSX)
> can we not use a sysctl()? Doing so wouldn't be that difficult.
>
> I think trying to do something based on a runtime heuristic sounds a
> bit iffy.

Another option might be simply to say that if an app has used FP, VMX
or VSX -once-, then it's likely to do it again, and just keep
re-enabling it :-)

I'm serious here: do we know of many cases where these things are used
only seldom, once in a while?

And if we do, maybe then a simple counter in the task struct... if the
app re-enables it for more than a few consecutive switches, then make
it stick. I have the feeling that would work out reasonably well.

Cheers,
Ben.
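[Editor's aside: Ben's "make it stick" counter can be sketched in a few lines of plain C. This is a hypothetical user-space model, not kernel code: the struct, field names, and the threshold of 3 consecutive slices are all illustrative assumptions, not taken from any tree.]

```c
#include <stdbool.h>

/* Sketch of the heuristic Ben describes: count how many consecutive
 * time slices a task has actually used FP; past a threshold, stop
 * clearing MSR_FP on context switch so no unavailable fault is taken.
 * All names and the threshold value are illustrative assumptions. */

#define STICK_THRESHOLD 3

struct task_fp_state {
    int consecutive_uses;   /* slices in a row the task touched FP */
    bool fp_sticky;         /* once true, MSR_FP is left enabled */
};

/* Called at context-switch-out; returns whether to leave MSR_FP set.
 * used_fp_this_slice would come from having seen the FP-unavailable
 * fault (or MSR_FP already being on) during the slice just ended. */
bool keep_fp_enabled(struct task_fp_state *t, bool used_fp_this_slice)
{
    if (used_fp_this_slice) {
        if (++t->consecutive_uses >= STICK_THRESHOLD)
            t->fp_sticky = true;
    } else {
        /* Streak broken: fall back to lazy enabling. */
        t->consecutive_uses = 0;
        t->fp_sticky = false;
    }
    return t->fp_sticky;
}
```

A task that keeps faulting on FP every slice becomes "sticky" after three slices; an idle slice drops it back to the lazy path.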
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Josh Boyer @ 2009-03-13 23:52 UTC
To: Benjamin Herrenschmidt; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

On Sat, Mar 14, 2009 at 09:45:51AM +1100, Benjamin Herrenschmidt wrote:
> Another option might be simply to say that if an app has used FP, VMX
> or VSX -once-, then it's likely to do it again and just keep
> re-enabling it :-)
>
> I'm serious here, do we know that many cases where these things are
> used seldomly once in a while ?

This seems reasonable to me.

> An if we do, maybe then a simple counter in the task struct... if the
> app re-enables it more than a few consecutive switches, then make it
> stick. I have the feeling that would work out reasonably well.

Gee. That sounds like a runtime heuristic :)

josh
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Ryan Arnold @ 2009-03-14 2:31 UTC
To: Benjamin Herrenschmidt; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

On Sat, 2009-03-14 at 09:45 +1100, Benjamin Herrenschmidt wrote:
> An if we do, maybe then a simple counter in the task struct... if the
> app re-enables it more than a few consecutive switches, then make it
> stick. I have the feeling that would work out reasonably well.

Both of these thoughts came to mind. I don't have a particular
preference. It's very likely that a process which results in the
enabling of FP, VMX, or VSX will continue to use the facility for the
duration of its lifetime. Threads would be even more likely to exhibit
this behavior.

The case where this might not be true is if we use VMX or VSX for
string routine optimization in GLIBC. This will require metrics to
prove its utility, of course. Perhaps what I can do in the string
routines is check whether the bits are already set, and use the
facility only if it is already enabled and the usage scenario warrants
it, i.e. if the size and alignment of the data are in a sweet spot as
indicated by profiling data.

Regards,

Ryan
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Benjamin Herrenschmidt @ 2009-03-14 3:22 UTC
To: rsa; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

> The case where this might not be true is if we use VMX or VSX for
> string routine optimization in GLIBC. This will require metrics to
> prove its utility of course. Perhaps what I can do in the string
> routines is check if the bits are already set and use the facility if
> it is already enabled and the usage scenario warrants it, i.e. if the
> size and alignment of the data are in a sweet spot as indicated by
> profiling data.

Or we just add some instrumentation to today's kernel to see how often
those get enabled and then not re-enabled on the next time slice, and
do some stats with common workloads.

Ben.
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Segher Boessenkool @ 2009-03-14 13:55 UTC
To: rsa; Cc: Will Schmidt, Steven Munroe, linuxppc-dev

> It's very likely that a process which results in the enabling of FP,
> VMX, or VSX may continue to use the facility for the duration of its
> lifetime.
>
> The case where this might not be true is if we use VMX or VSX for
> string routine optimization in GLIBC. This will require metrics to
> prove its utility of course.

OTOH, the main cost of using VMX etc. is exactly this register
save/restore, and it's a win otherwise pretty much always. I.e., as
soon as you take that initial hit, almost anything can get a speedup
from VMX.

So even if you do not see an overall speedup from, say, only some
optimised string routines, it probably is worth it anyway, as it pays
the initial cost for enabling more optimisations later.

Segher
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Segher Boessenkool @ 2009-03-14 13:49 UTC
To: Benjamin Herrenschmidt; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

> Another option might be simply to say that if an app has used FP, VMX
> or VSX -once-, then it's likely to do it again and just keep
> re-enabling it :-)
>
> I'm serious here, do we know that many cases where these things are
> used seldomly once in a while ?

For FP, I believe many apps use it only sporadically. But for VMX and
VSX, yeah, it might well be optimal to keep it enabled all the time.
Someone should do some profiling...

Segher
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Ryan Arnold @ 2009-03-14 14:58 UTC
To: Segher Boessenkool; Cc: Will Schmidt, Steven Munroe, linuxppc-dev

On Sat, 2009-03-14 at 14:49 +0100, Segher Boessenkool wrote:
> For FP, I believe many apps use it only sporadically. But for VMX and
> VSX, yeah, it might well be optimal to keep it enabled all the time.
> Someone should do some profiling...

We can do some VMX testing on existing POWER6 machines. The VSX
instruction set hasn't been fully implemented in GCC yet, so we'll need
to wait a bit for that. Does anyone have an idea for a good VMX/Altivec
benchmark?

Ryan
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Benjamin Herrenschmidt @ 2009-03-16 0:49 UTC
To: rsa; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

On Sat, 2009-03-14 at 09:58 -0500, Ryan Arnold wrote:
> We can do some VMX testing on existing POWER6 machines. The VSX
> instruction set hasn't been fully implemented in GCC yet so we'll need
> to wait a bit for that. Does anyone have an idea for a good VMX/Altivec
> benchmark?

Note that there are two aspects to the problem:

- Lazy save/restore on SMP. This would avoid both the save and restore
  phases, and thus is where the most potential gain is to be made, at
  the expense of some tricky IPI work when processes migrate between
  CPUs.

  However, it will only be useful -if- a process using FP/VMX/VSX is
  "interrupted" by another process that isn't using them, for example a
  kernel thread. So it's unclear whether that's worth it in practice,
  ie, does this happen that often?

- Always restoring the FP/VMX/VSX state on context switch "in" rather
  than taking a fault. This is reasonably simple, but at the potential
  expense of adding the save/restore overhead to applications that only
  seldom use these facilities. (Some heuristics might help here.)

  However, the question here is: what does this buy us?

  IE, in the worst-case scenario, which is HZ=1000, every 1ms the
  process would have the overhead of an interrupt to do the restore of
  the state. The restore itself doesn't count, since it would be done
  either way (at context switch vs. in the unavailable interrupt), so
  all we win here is the overhead of the actual interrupt, which is
  implemented as a fast interrupt in assembly. So we have what here?
  1000 cycles, to be pessimistic? On a 1GHz CPU, that is 1/1000 of the
  time slice, and both of these are rather pessimistic numbers.

So that leaves us with the possible case of 2 tasks using the facility
and running a fraction of the timeslice each, for example because they
are ping-ponging with each other. Is that something that happens in
practice often enough to make it noticeable?

Cheers,
Ben.
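[Editor's aside: Ben's back-of-envelope arithmetic can be checked mechanically. The sketch below simply restates his pessimistic figures (HZ=1000, ~1000 cycles per facility-unavailable interrupt, a 1 GHz clock); the function name is invented and the numbers come from the discussion, not from measurement.]

```c
/* Fraction of a timeslice lost to one facility-unavailable interrupt:
 * fault_cycles out of (cpu_hz / sched_hz) cycles per slice. */
double fault_overhead_fraction(double cpu_hz, double sched_hz,
                               double fault_cycles)
{
    double cycles_per_slice = cpu_hz / sched_hz;  /* e.g. 1e9/1000 = 1e6 */
    return fault_cycles / cycles_per_slice;
}
```

With Ben's numbers, `fault_overhead_fraction(1e9, 1000, 1000)` gives 0.001, i.e. the 1/1000-of-a-timeslice upper bound he quotes.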
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Michael Neuling @ 2009-03-16 6:43 UTC
To: Benjamin Herrenschmidt; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

> So that leaves us with the possible case of 2 tasks using the facility
> and running a fraction of the timeslice each, for example because they
> are ping-ponging with each other.
>
> Is that something that happens in practice to make it noticeable ?

I hacked up the below to put stats in /proc/self/sched. A quick grep
through /proc on a rhel5.2 machine (egrep '(fp_count|switch_count)'
/proc/*/sched) shows a few apps use fp a few dozen times but then stop.
This is only init apps like hald, so we need to check some real-world
apps too.

Ryan: let me know if this allows you to collect some useful stats.

Subject: [PATCH] powerpc: add context switch, fpr & vr stats to /proc/self/sched.

Add a counter for every task switch, fp and vr exception to
/proc/self/sched.

[root@p5-20-p6-e0 ~]# cat /proc/3422/sched | tail -3
switch_count : 559
fp_count : 317
vr_count : 0
[root@p5-20-p6-e0 ~]#

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/processor.h |  3 +++
 arch/powerpc/kernel/asm-offsets.c    |  3 +++
 arch/powerpc/kernel/fpu.S            |  3 +++
 arch/powerpc/kernel/process.c        |  3 +++
 arch/powerpc/kernel/setup-common.c   | 10 ++++++++++
 include/linux/seq_file.h             | 12 ++++++++++++
 kernel/sched_debug.c                 | 16 ++++------------
 7 files changed, 38 insertions(+), 12 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/include/asm/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/include/asm/processor.h
+++ linux-2.6-ozlabs/arch/powerpc/include/asm/processor.h
@@ -174,11 +174,13 @@ struct thread_struct {
 	} fpscr;
 	int		fpexc_mode;	/* floating-point exception mode */
 	unsigned int	align_ctl;	/* alignment handling control */
+	unsigned long	fp_count;	/* FP restore count */
 #ifdef CONFIG_PPC64
 	unsigned long	start_tb;	/* Start purr when proc switched in */
 	unsigned long	accum_tb;	/* Total accumilated purr for process */
 #endif
 	unsigned long	dabr;		/* Data address breakpoint register */
+	unsigned long	switch_count;	/* switch count */
 #ifdef CONFIG_ALTIVEC
 	/* Complete AltiVec register set */
 	vector128	vr[32] __attribute__((aligned(16)));
@@ -186,6 +188,7 @@ struct thread_struct {
 	vector128	vscr __attribute__((aligned(16)));
 	unsigned long	vrsave;
 	int		used_vr;	/* set if process has used altivec */
+	unsigned long	vr_count;	/* VSX restore count */
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
 	/* VSR status */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,14 +74,17 @@ int main(void)
 	DEFINE(KSP, offsetof(struct thread_struct, ksp));
 	DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit));
 	DEFINE(PT_REGS, offsetof(struct thread_struct, regs));
+	DEFINE(THREAD_SWITCHCOUNT, offsetof(struct thread_struct, switch_count));
 	DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode));
 	DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0]));
 	DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr));
+	DEFINE(THREAD_FPCOUNT, offsetof(struct thread_struct, fp_count));
 #ifdef CONFIG_ALTIVEC
 	DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0]));
 	DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
+	DEFINE(THREAD_VRCOUNT, offsetof(struct thread_struct, vr_count));
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
 	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -102,6 +102,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	ori	r12,r12,MSR_FP
 	or	r12,r12,r4
 	std	r12,_MSR(r1)
+	ld	r4,THREAD_FPCOUNT(r5)
+	addi	r4, r4, 1
+	std	r4,THREAD_FPCOUNT(r5)
 #endif
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -744,17 +744,20 @@ void start_thread(struct pt_regs *regs,
 #endif
 	discard_lazy_cpu_state();
+	current->thread.switch_count = 0;
 #ifdef CONFIG_VSX
 	current->thread.used_vsr = 0;
 #endif
 	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
+	current->thread.fp_count = 0;
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
 	memset(&current->thread.vscr, 0, sizeof(current->thread.vscr));
 	current->thread.vscr.u[3] = 0x00010000;	/* Java mode disabled */
 	current->thread.vrsave = 0;
 	current->thread.used_vr = 0;
+	current->thread.vr_count = 0;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_SPE
 	memset(current->thread.evr, 0, sizeof(current->thread.evr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/setup-common.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/setup-common.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/setup-common.c
@@ -669,3 +669,13 @@ static int powerpc_debugfs_init(void)
 }
 arch_initcall(powerpc_debugfs_init);
 #endif
+
+void arch_proc_sched_show_task(struct task_struct *p, struct seq_file *m) {
+	SEQ_printf(m, "%-35s:%21Ld\n",
+		   "switch_count", (long long)p->thread.switch_count);
+	SEQ_printf(m, "%-35s:%21Ld\n",
+		   "fp_count", (long long)p->thread.fp_count);
+	SEQ_printf(m, "%-35s:%21Ld\n",
+		   "vr_count", (long long)p->thread.vr_count);
+}
+
Index: linux-2.6-ozlabs/include/linux/seq_file.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/seq_file.h
+++ linux-2.6-ozlabs/include/linux/seq_file.h
@@ -95,4 +95,16 @@ extern struct list_head *seq_list_start_
 extern struct list_head *seq_list_next(void *v, struct list_head *head, loff_t *ppos);
+/*
+ * This allows printing both to /proc/sched_debug and
+ * to the console
+ */
+#define SEQ_printf(m, x...)		\
+ do {					\
+	if (m)				\
+		seq_printf(m, x);	\
+	else				\
+		printk(x);		\
+ } while (0)
+
 #endif
Index: linux-2.6-ozlabs/kernel/sched_debug.c
===================================================================
--- linux-2.6-ozlabs.orig/kernel/sched_debug.c
+++ linux-2.6-ozlabs/kernel/sched_debug.c
@@ -17,18 +17,6 @@
 #include <linux/utsname.h>
 /*
- * This allows printing both to /proc/sched_debug and
- * to the console
- */
-#define SEQ_printf(m, x...)		\
- do {					\
-	if (m)				\
-		seq_printf(m, x);	\
-	else				\
-		printk(x);		\
- } while (0)
-
-/*
  * Ease the printing of nsec fields:
  */
 static long long nsec_high(unsigned long long nsec)
@@ -370,6 +358,9 @@ static int __init init_sched_debug_procf
 __initcall(init_sched_debug_procfs);
+void __attribute__ ((weak))
+arch_proc_sched_show_task(struct task_struct *p, struct seq_file *m) {}
+
 void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 {
 	unsigned long nr_switches;
@@ -473,6 +464,7 @@ void proc_sched_show_task(struct task_st
 		SEQ_printf(m, "%-35s:%21Ld\n",
 			"clock-delta", (long long)(t1-t0));
 	}
+	arch_proc_sched_show_task(p, m);
 }
 void proc_sched_set_task(struct task_struct *p)
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Gabriel Paubert @ 2009-03-16 10:52 UTC
To: Segher Boessenkool; Cc: Will Schmidt, Steven Munroe, linuxppc-dev

On Sat, Mar 14, 2009 at 02:49:02PM +0100, Segher Boessenkool wrote:
> For FP, I believe many apps use it only sporadically. But for VMX and
> VSX, yeah, it might well be optimal to keep it enabled all the time.
> Someone should do some profiling...

I concur. I have some apps that are mostly integer but from time to
time perform some statistics which are so much easier to write
declaring a few double variables. On the other hand, when you start
with vector instructions, it often means that you are going to use them
for a while.

This said, I'm not opposed to a heuristic like: if the app has used the
FP/VMX/VSX registers systematically after having been scheduled a few
times (2 for VMX/VSX, 5 for FP), load the corresponding registers on
every schedule for the next n schedules, where n would be about 20 for
VSX/VMX, and perhaps only 5 for FP.

Gabriel
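[Editor's aside: Gabriel's two-threshold scheme can be modeled in a few lines of C. This is a hypothetical user-space sketch, not kernel code: the struct, field names, and the arm/eager-window parameters are all illustrative; his suggested values (arm after 2 consecutive uses for VMX/VSX, then stay eager for ~20 schedules) would just be plugged into the fields.]

```c
#include <stdbool.h>

/* Sketch of Gabriel's heuristic: once the task has used the facility on
 * `arm_after` consecutive slices, eagerly restore its registers for the
 * next `eager_slices` schedules, then fall back to lazy enabling.
 * All names and the exact policy details are illustrative assumptions. */

struct facility_heuristic {
    int arm_after;      /* consecutive uses before going eager (e.g. 2) */
    int eager_slices;   /* how long to stay eager (e.g. ~20 for VMX)    */
    int use_streak;     /* current run of slices with facility use      */
    int eager_left;     /* remaining slices in the eager window         */
};

/* Called at each schedule-in; returns true if the registers should be
 * restored eagerly (i.e. no unavailable fault will be taken). */
bool eager_restore(struct facility_heuristic *h, bool used_last_slice)
{
    if (h->eager_left > 0) {
        h->eager_left--;            /* inside the eager window */
        return true;
    }
    if (used_last_slice && ++h->use_streak >= h->arm_after) {
        h->use_streak = 0;
        h->eager_left = h->eager_slices;  /* arm the eager window */
        return true;
    }
    if (!used_last_slice)
        h->use_streak = 0;          /* streak broken: stay lazy */
    return false;
}
```

A task that uses the facility every slice pays the fault only during the short arming phase; one that stops using it falls back to the lazy path once the window runs out.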
* Re: [RFC] Moving toward smarter disabling of FPRs, VRs, and VSRs in the MSR
From: Michael Neuling @ 2009-03-14 8:20 UTC
To: rsa; Cc: linuxppc-dev, Will Schmidt, Steven Munroe

> Solutions:
> - A boot flag which always enables VSRs, VRs, FPRs, etc. These are
>   cumulative, i.e. VSRs implies VRs and FPRS; VRs implies FPRs.
>
> - A heuristic which permanently enables said register classes for a
>   process if they've been enabled during the previous X interrupts.
>
> - The same heuristic could disable the register class bits after a
>   certain criteria is met.

Another option is to look at getting lazy save working on SMP. We
currently only enable it on UP compiles. This would probably have the
biggest performance impact when there is only one FP/VSX/VMX
application running per CPU.

Mikey

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
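[Editor's aside: the UP-only lazy scheme Michael refers to can be sketched abstractly — the kernel remembers which task's FP state currently lives in the physical registers and only transfers state when a *different* task takes the unavailable fault. The toy below is a simplified single-CPU model with made-up types; real kernels deal with MSR bits and interrupt context, and extending this to SMP is exactly the tricky IPI problem Ben mentions above.]

```c
#include <string.h>
#include <stddef.h>

/* Toy single-CPU model of lazy FP save/restore: the physical register
 * file keeps whichever task's state was loaded last; a save + restore
 * happens only when a different task faults on an FP instruction.
 * Types and names are illustrative assumptions, not kernel code. */

struct toy_task {
    double fpr[32];             /* saved FP register image */
};

static struct toy_task *owner;  /* task whose state is live in hardware */
static double hw_fpr[32];       /* stands in for the physical FPRs */
static int transfer_ops;        /* counts register-image copies */

/* FP-unavailable fault handler: make `t`'s state live, lazily. */
void fp_unavailable(struct toy_task *t)
{
    if (owner == t)
        return;                  /* still ours: just re-enable, no copy */
    if (owner) {
        memcpy(owner->fpr, hw_fpr, sizeof(hw_fpr));  /* save old owner */
        transfer_ops++;
    }
    memcpy(hw_fpr, t->fpr, sizeof(hw_fpr));          /* restore new owner */
    transfer_ops++;
    owner = t;
}
```

With one FP-heavy task per CPU — Michael's best case — the `owner == t` branch hits every time and no state is moved at all.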