From: Michael Neuling <mikey@neuling.org>
To: Jimi Xenidis <jimix@pobox.com>
Cc: Andrew T Tauferner <ataufer@us.ibm.com>,
Kumar Gala <kumar.gala@gmail.com>,
Jay Bryant <jsbryant@us.ibm.com>,
Josh Boyer <jwboyer@linux.vnet.ibm.com>,
Todd Inglett <tinglett@us.ibm.com>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [RFC] Add IBM Blue Gene/Q Platform
Date: Mon, 10 Dec 2012 11:47:05 +1100 [thread overview]
Message-ID: <3051.1355100425@neuling.org> (raw)
In-Reply-To: <1ECB6402-9BC0-4468-B405-A6F6E971486B@pobox.com>
Jimi Xenidis <jimix@pobox.com> wrote:
>
> On Dec 7, 2012, at 7:38 AM, Jimi Xenidis <jimix@pobox.com> wrote:
>
> >
> > On Dec 6, 2012, at 11:54 PM, Michael Neuling <mikey@neuling.org> wrote:
> >
> >>> commit 279c0615917b959a652e81f4ad0d886e2d426d85
> >>> Author: Jimi Xenidis <jimix@pobox.com>
> >>> Date: Wed Dec 5 13:43:22 2012 -0500
> >>>
> >>> powerpc/book3e: IBM Blue Gene/Q Quad Processing eXtention (QPX)
> >>>
> >>> This enables kernel support for the QPX extention and is intended for
> >>> processors that support it, usually an IBM Blue Gene processor.
> >>> Turning it on does not effect other processors but it does add code
> >>> and will quadruple the per thread save and restore area for the FPU
> >>> (hense the name). If you have enabled VSX it will only double the
> >>> space.
> >>>
> >>> Signed-off-by: Jimi Xenidis <jimix@pobox.com>
> >>
>
> <<snip>>
> >>
> >>
> >>
> >> +BEGIN_FTR_SECTION \
> >> + SAVE_32VSRS(n,c,base); \
> >> +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> >> +BEGIN_FTR_SECTION \
> >> + SAVE_32QRS(n,c,base); \
> >> +END_FTR_SECTION_IFSET(CPU_FTR_QPX);
> >>
> >> I don't think we want to do this. We are going to end up with 64
> >> NOPS here somewhere.
> >
> > Excellent point, NOPs are cheap on most processors but not A2 and a lot of embedded, I can wrap some branches with the FTR instead.
> > Do you have a concern on the code size?
>
> Thought about it a bit and came up with this solution for arch/powerpc/kernel/fpu.S.
> This should address the following issues
> - MSR_VSX vs MSR_VEC
> - Big chunks of NOPs in the code path
> - Less FTR space fixups at boot time.
> - IMNHSO easier to read especially when disassembled
Indeed, I think it looks better. I was going to mention that it was
already pretty complex to read, so a rewrite like this was probably
needed. So thanks!!
That being said, there is a pretty complex testing matrix of
CONFIG_VSX/VMX/FPU/QPX/SMP/64/32 CPU_FTR/VSX/FPU/QPX/VMX so I'd need to
look/test more carefully to make sure all of these are covered.
Also, transactional memory (see
http://lists.ozlabs.org/pipermail/linuxppc-dev/2012-November/102216.html)
will change this code. You should rebase on top of that if you really
want it considered for upstream.
Mikey
>
> I did consider using the LR and BLR, but the !CONFIG_SMP case only adds one more special block and uses a different register set.
> Also if this is agreeable I would like us to consider removing the *_32FPVSRS* macros entirely and put the FTR tests in the actual code.
> This would allow us to use #ifdefs and reduce the amount of code that actually gets compiled.
>
> Thoughts?
>
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index e0ada05..5964067 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -25,30 +25,81 @@
> #include <asm/asm-offsets.h>
> #include <asm/ptrace.h>
>
> -#ifdef CONFIG_VSX
> -#define __REST_32FPVSRS(n,c,base) \
> -BEGIN_FTR_SECTION \
> - b 2f; \
> -END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> - REST_32FPRS(n,base); \
> - b 3f; \
> -2: REST_32VSRS(n,c,base); \
> -3:
> -
> -#define __SAVE_32FPVSRS(n,c,base) \
> -BEGIN_FTR_SECTION \
> - b 2f; \
> -END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> - SAVE_32FPRS(n,base); \
> - b 3f; \
> -2: SAVE_32VSRS(n,c,base); \
> -3:
> -#else
> -#define __REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
> -#define __SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
> -#endif
> -#define REST_32FPVSRS(n,c,base) __REST_32FPVSRS(n,__REG_##c,__REG_##base)
> -#define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base)
> +
> +/*
> + * Restore subroutines, R4 is scratch and R5 is base
> + */
> +vsx_restore:
> + REST_32VSRS(0, __REG_R4, __REG_R5)
> + b after_restore
> +qpx_restore:
> + REST_32QRS(0, __REG_R4, __REG_R5)
> + b after_restore
> +fpu_restore:
> + REST_32FPRS(0, __REG_R5)
> + b after_restore
> +
> +#define REST_32FPVSRS(n, c, base) \
> +BEGIN_FTR_SECTION \
> + b vsx_restore; \
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX) \
> +BEGIN_FTR_SECTION \
> + b qpx_restore; \
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX) \
> + b fpu_restore; \
> +after_restore:
> +
> +/*
> + * Save subroutines, R4 is scratch and R3 is base
> + */
> +vsx_save:
> + SAVE_32VSRS(0, __REG_R4, __REG_R3)
> + b after_save
> +qpx_save:
> + SAVE_32QRS(0, __REG_R4, __REG_R3)
> + b after_save
> +fpu_save:
> + SAVE_32FPRS(0, __REG_R3)
> + b after_save
> +
> +#define SAVE_32FPVSRS(n, c, base) \
> +BEGIN_FTR_SECTION \
> + b vsx_save; \
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX) \
> +BEGIN_FTR_SECTION \
> + b qpx_save; \
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX) \
> + b fpu_save; \
> +after_save:
> +
> +#ifndef CONFIG_SMP
> +/*
> + * we need an extra save set for the !CONFIG_SMP case, see below
> + * Scratch it R5 and base is R4
> + */
> +vsx_save_nosmp:
> + SAVE_32VSRS(0,R5,R4)
> + b after_save_nosmp
> +
> +qpx_save_nosmp:
> + SAVE_32QRS(0,R5,R4)
> + b after_save_nosmp
> +
> +fpu_save_nosmp:
> + SAVE_32FPRS(0,R4)
> + b after_save_nosmp
> +
> +#define SAVE_32FPVSRS_NOSMP(n,c,base) \
> +BEGIN_FTR_SECTION \
> + b vsx_save_nosmp; \
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX) \
> +BEGIN_FTR_SECTION \
> + b qpx_save_nosmp; \
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX) \
> + b fpu_save_nosmp; \
> +after_save_nosmp:
> +
> +#endif /* !CONFIG_SMP */
>
> /*
> * This task wants to use the FPU now.
> @@ -65,6 +116,11 @@ BEGIN_FTR_SECTION
> oris r5,r5,MSR_VSX@h
> END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> #endif
> +#ifdef CONFIG_PPC_QPX
> +BEGIN_FTR_SECTION
> + oris r5,r5,MSR_VEC@h
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)
> +#endif
> SYNC
> MTMSRD(r5) /* enable use of fpu now */
> isync
> @@ -81,7 +137,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> beq 1f
> toreal(r4)
> addi r4,r4,THREAD /* want last_task_used_math->thread */
> - SAVE_32FPVSRS(0, R5, R4)
> + SAVE_32FPVSRS_NOSMP(0, R5, R4)
> mffs fr0
> stfd fr0,THREAD_FPSCR(r4)
> PPC_LL r5,PT_REGS(r4)
> @@ -94,7 +150,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> #endif /* CONFIG_SMP */
> /* enable use of FP after return */
> #ifdef CONFIG_PPC32
> - mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */
> + mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */
> lwz r4,THREAD_FPEXC_MODE(r5)
> ori r9,r9,MSR_FP /* enable FP for current */
> or r9,r9,r4
> @@ -132,6 +188,11 @@ BEGIN_FTR_SECTION
> oris r5,r5,MSR_VSX@h
> END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> #endif
> +#ifdef CONFIG_PPC_QPX
> +BEGIN_FTR_SECTION
> + oris r5,r5,MSR_VEC@h
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)
> +#endif
> SYNC_601
> ISYNC_601
> MTMSRD(r5) /* enable use of fpu now */
> @@ -148,10 +209,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> beq 1f
> PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> li r3,MSR_FP|MSR_FE0|MSR_FE1
> -#ifdef CONFIG_VSX
> +#if defined(CONFIG_VSX) || defined(CONFIG_PPC_QPX)
> BEGIN_FTR_SECTION
> oris r3,r3,MSR_VSX@h
> -END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX | CPU_FTR_QPX)
> #endif
> andc r4,r4,r3 /* disable FP for previous task */
> PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>
> -jx
>
>
> >
> >>
> >> I'd like to see this patch broken into different parts.
> >
> > I'm not sure how _this_ patch:
> > <https://github.com/jimix/linux-bgq/commit/279c0615917b959a652e81f4ad0d886e2d426d85>
> > could be broken up, please advise.
> >
> >>
> >> Also, have you boot tested this change on a VSX enabled box?
> >
> > I can try, I may bug you for help.
> > Is there a commonly test (or apps) I should run?
> >
> > -jx
> >
> >
> >>
> >> Mikey
> >
>
next prev parent reply other threads:[~2012-12-10 0:47 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-07 1:53 [RFC] Add IBM Blue Gene/Q Platform Jimi Xenidis
2012-12-07 5:41 ` Michael Neuling
2012-12-07 13:12 ` Jimi Xenidis
2012-12-10 0:12 ` Michael Neuling
2012-12-07 5:54 ` Michael Neuling
2012-12-07 5:55 ` Michael Neuling
2012-12-07 13:38 ` Jimi Xenidis
2012-12-08 22:22 ` Jimi Xenidis
2012-12-10 0:47 ` Michael Neuling [this message]
2012-12-10 5:56 ` Jimi Xenidis
2012-12-10 6:06 ` Michael Neuling
2012-12-10 0:18 ` Michael Neuling
2012-12-07 5:56 ` Michael Neuling
2012-12-07 13:44 ` Jimi Xenidis
2012-12-07 14:31 ` Andrew Tauferner
2012-12-10 21:32 ` Jimi Xenidis
2012-12-10 21:33 ` Jimi Xenidis
2012-12-10 0:26 ` Michael Neuling
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3051.1355100425@neuling.org \
--to=mikey@neuling.org \
--cc=ataufer@us.ibm.com \
--cc=jimix@pobox.com \
--cc=jsbryant@us.ibm.com \
--cc=jwboyer@linux.vnet.ibm.com \
--cc=kumar.gala@gmail.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=tinglett@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.