* [PATCH] powerpc/kernel: Avoid redundancies on giveup_all

From: Breno Leitao @ 2017-06-22 20:27 UTC
To: linuxppc-dev; +Cc: Breno Leitao, Gustavo Romero

Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
__giveup_vsx(). But __giveup_vsx() also calls __giveup_fpu() and
__giveup_altivec() again, in a redundant manner.

Other than giving up FP and Altivec, __giveup_vsx() also disables
MSR_VSX in the MSR, but this is already done by __giveup_{fpu,altivec}().
As VSX cannot be enabled alone in the MSR (without FP and/or VEC
enabled), this is also a redundancy. VSX is never enabled alone (without
FP and VEC) because every time VSX is enabled, such as through
load_up_vsx() and restore_math(), FP is enabled together with it.

This change improves giveup_all() by only about 3% on average, but since
giveup_all() is called very frequently, around 8 times per CPU per second
on an idle machine, the change may show a noticeable improvement.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/process.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 2ad725ef4368..5d6af58270e6 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -494,10 +494,6 @@ void giveup_all(struct task_struct *tsk)
 	if (usermsr & MSR_VEC)
 		__giveup_altivec(tsk);
 #endif
-#ifdef CONFIG_VSX
-	if (usermsr & MSR_VSX)
-		__giveup_vsx(tsk);
-#endif
 #ifdef CONFIG_SPE
 	if (usermsr & MSR_SPE)
 		__giveup_spe(tsk);
--
2.11.0
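For reference, the redundancy described in the commit message can be
sketched roughly as below. This is a simplified illustration of the
pre-patch call structure, based only on what the commit message states;
it is not the verbatim process.c source.

	/* Sketch: giveup_all() before the patch (simplified). */
	void giveup_all(struct task_struct *tsk)
	{
		unsigned long usermsr = tsk->thread.regs->msr;
		/* ... */
		if (usermsr & MSR_FP)
			__giveup_fpu(tsk);	/* gives up FP, clears MSR_FP and MSR_VSX */
		if (usermsr & MSR_VEC)
			__giveup_altivec(tsk);	/* gives up VMX, clears MSR_VEC and MSR_VSX */
		if (usermsr & MSR_VSX)
			__giveup_vsx(tsk);	/* the branch removed by the patch above */
		/* ... */
	}

	/* Sketch: __giveup_vsx() repeats the work already done above. */
	static void __giveup_vsx(struct task_struct *tsk)
	{
		unsigned long msr = tsk->thread.regs->msr;

		if (msr & MSR_FP)
			__giveup_fpu(tsk);	/* second call, redundant */
		if (msr & MSR_VEC)
			__giveup_altivec(tsk);	/* second call, redundant */
		/*
		 * Clearing MSR_VSX here is also redundant: the two helpers
		 * above have already cleared it.
		 */
	}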
* Re: [PATCH] powerpc/kernel: Avoid redundancies on giveup_all

From: Cyril Bur @ 2017-06-23  6:03 UTC
To: Breno Leitao, linuxppc-dev; +Cc: Gustavo Romero

On Thu, 2017-06-22 at 17:27 -0300, Breno Leitao wrote:
> Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
> __giveup_vsx(). But __giveup_vsx() also calls __giveup_fpu() and
> __giveup_altivec() again, in a redundant manner.
>
> Other than giving up FP and Altivec, __giveup_vsx() also disables
> MSR_VSX in the MSR, but this is already done by __giveup_{fpu,altivec}().
> As VSX cannot be enabled alone in the MSR (without FP and/or VEC
> enabled), this is also a redundancy. VSX is never enabled alone (without
> FP and VEC) because every time VSX is enabled, such as through
> load_up_vsx() and restore_math(), FP is enabled together with it.
>
> This change improves giveup_all() by only about 3% on average, but since
> giveup_all() is called very frequently, around 8 times per CPU per second
> on an idle machine, the change may show a noticeable improvement.

So I totally agree, except this makes me quite nervous. I know we're
quite good at always disabling VSX when we disable FPU and ALTIVEC, and
we do always turn VSX on when we enable FPU and ALTIVEC. But still, if
we ever get that wrong...

I'm more interested in how this improves giveup_all() performance by so
much, but then hardware often surprises - I guess that's the cost of a
function call. Perhaps caching the thread.regs->msr isn't a good idea.
Could we branch over the call in the common case but still keep the
call to the function in case something goes horribly wrong?

> Signed-off-by: Breno Leitao <leitao@debian.org>
> Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/process.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 2ad725ef4368..5d6af58270e6 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -494,10 +494,6 @@ void giveup_all(struct task_struct *tsk)
>  	if (usermsr & MSR_VEC)
>  		__giveup_altivec(tsk);
>  #endif
> -#ifdef CONFIG_VSX
> -	if (usermsr & MSR_VSX)
> -		__giveup_vsx(tsk);
> -#endif
>  #ifdef CONFIG_SPE
>  	if (usermsr & MSR_SPE)
>  		__giveup_spe(tsk);
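One way the defensive variant suggested above might look - branch over
the call in the common case, but keep it as a fallback with a warning -
is sketched here. This is a hypothetical illustration, not a patch from
this thread; it assumes, per the commit message, that
__giveup_fpu()/__giveup_altivec() have already cleared MSR_VSX in the
thread's MSR by this point.

	#ifdef CONFIG_VSX
		/*
		 * Hypothetical fallback sketch: in the common case MSR_VSX has
		 * already been cleared by __giveup_fpu()/__giveup_altivec()
		 * above, so this branch is skipped.  If the live MSR still has
		 * MSR_VSX set, the FP/VEC/VSX invariants were broken somewhere:
		 * warn once and still perform the full give-up rather than
		 * silently leaving VSX state live.
		 */
		if (unlikely(tsk->thread.regs->msr & MSR_VSX)) {
			WARN_ON_ONCE(1);
			__giveup_vsx(tsk);
		}
	#endif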
* Re: [PATCH] powerpc/kernel: Avoid redundancies on giveup_all

From: Breno Leitao @ 2017-06-23 17:51 UTC
To: Cyril Bur; +Cc: linuxppc-dev, Gustavo Romero

Hi Cyril,

On Fri, Jun 23, 2017 at 04:03:12PM +1000, Cyril Bur wrote:
> On Thu, 2017-06-22 at 17:27 -0300, Breno Leitao wrote:
> > Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
> > __giveup_vsx(). But __giveup_vsx() also calls __giveup_fpu() and
> > __giveup_altivec() again, in a redundant manner.
> >
> > Other than giving up FP and Altivec, __giveup_vsx() also disables
> > MSR_VSX in the MSR, but this is already done by __giveup_{fpu,altivec}().
> > As VSX cannot be enabled alone in the MSR (without FP and/or VEC
> > enabled), this is also a redundancy. VSX is never enabled alone (without
> > FP and VEC) because every time VSX is enabled, such as through
> > load_up_vsx() and restore_math(), FP is enabled together with it.
> >
> > This change improves giveup_all() by only about 3% on average, but since
> > giveup_all() is called very frequently, around 8 times per CPU per second
> > on an idle machine, the change may show a noticeable improvement.
>
> So I totally agree, except this makes me quite nervous. I know we're
> quite good at always disabling VSX when we disable FPU and ALTIVEC, and
> we do always turn VSX on when we enable FPU and ALTIVEC. But still, if
> we ever get that wrong...

Right, I understand your point: we can consider this code a fallback in
case we somehow forget to disable VSX when disabling FPU/ALTIVEC. Good
point.

> I'm more interested in how this improves giveup_all() performance by so
> much, but then hardware often surprises - I guess that's the cost of a
> function call.

I got this number using ftrace. I used the function_graph tracer with
trace_options set to 'funcgraph-duration', and then set
set_ftrace_filter to giveup_all().

There is also a tool that helps with this, if you wish. It uses exactly
the same mechanism I used, but in a more automated way. The tool is
funcgraph, by Brendan Gregg:

https://github.com/brendangregg/perf-tools/blob/master/kernel/funcgraph

> Perhaps caching the thread.regs->msr isn't a good idea.

Yes, I looked at it, but it seems the compiler is optimizing it, keeping
the value in r30 rather than saving it to memory/the stack. This is the
code being generated here, where r9 contains the task pointer:

	usermsr = tsk->thread.regs->msr;
c0000000000199c4:	08 01 c9 eb	ld	r30,264(r9)

	if ((usermsr & msr_all_available) == 0)
c0000000000199c8:	60 5f 2a e9	ld	r9,24416(r10)
c0000000000199cc:	39 48 ca 7f	and.	r10,r30,r9
c0000000000199d0:	20 00 82 40	bne	c0000000000199f0 <giveup_all+0x60>

> Could we branch over the call in the common case but still keep the
> call to the function in case something goes horribly wrong?

Yes, we can revisit that in a future change. Thanks for sharing your
opinion.