Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Gabriel Paubert <paubert@iram.es>
To: James Yang <James.Yang@freescale.com>
Cc: Chris Proctor <cproctor@csc.com.au>,
	Stephen N Chivers <schivers@csc.com.au>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
Date: Tue, 11 Feb 2014 08:26:07 +0100	[thread overview]
Message-ID: <20140211072606.GA26514@visitor2.iram.es> (raw)
In-Reply-To: <alpine.LRH.2.00.1402101056410.10318@ra8135-ec1.am.freescale.net>

	Hi James,

On Mon, Feb 10, 2014 at 11:03:07AM -0600, James Yang wrote:

[snipped]
> > Ok, if you have measured that method1 is faster than method2, let us go for it.
> > I believe method2 would be faster if you had a large out-of-order execution
> > window, because more parallelism can be extracted from it, but this is probably
> > only true for high end cores, which do not need FPU emulation in the first place.
> 
> Yeah, 8548 can issue 2 SFX instructions per cycle which is what the 
> compiler generated.   
> 

Then it is method1.

>  
> > The other place where we can optimize is the generation of FEX. Here is 
> > my current patch:
> > 
> > 
> > diff --git a/arch/powerpc/math-emu/mtfsf.c b/arch/powerpc/math-emu/mtfsf.c
> > index dbce92e..b57b3fa8 100644
> > --- a/arch/powerpc/math-emu/mtfsf.c
> > +++ b/arch/powerpc/math-emu/mtfsf.c
> > @@ -11,48 +11,35 @@ mtfsf(unsigned int FM, u32 *frB)
> >  	u32 mask;
> >  	u32 fpscr;
> >  
> > -	if (FM == 0)
> > +	if (likely(FM == 0xff))
> > +		mask = 0xffffffff;
> > +	else if (unlikely(FM == 0))
> >  		return 0;
> > -
> > -	if (FM == 0xff)
> > -		mask = 0x9fffffff;
> >  	else {
> > -		mask = 0;
> > -		if (FM & (1 << 0))
> > -			mask |= 0x90000000;
> > -		if (FM & (1 << 1))
> > -			mask |= 0x0f000000;
> > -		if (FM & (1 << 2))
> > -			mask |= 0x00f00000;
> > -		if (FM & (1 << 3))
> > -			mask |= 0x000f0000;
> > -		if (FM & (1 << 4))
> > -			mask |= 0x0000f000;
> > -		if (FM & (1 << 5))
> > -			mask |= 0x00000f00;
> > -		if (FM & (1 << 6))
> > -			mask |= 0x000000f0;
> > -		if (FM & (1 << 7))
> > -			mask |= 0x0000000f;
> > +		mask = (FM & 1);
> > +		mask |= (FM << 3) & 0x10;
> > +		mask |= (FM << 6) & 0x100;
> > +		mask |= (FM << 9) & 0x1000;
> > +		mask |= (FM << 12) & 0x10000;
> > +		mask |= (FM << 15) & 0x100000;
> > +		mask |= (FM << 18) & 0x1000000;
> > +		mask |= (FM << 21) & 0x10000000;
> > +		mask *= 15;
> 
> 
> Needs to also mask out bits 1 and 2, they aren't to be set from frB.
> 
> 		mask &= 0x9FFFFFFF;
> 
> 

Look at the following lines:

> 
> 
> >  	}
> >  
> > -	__FPU_FPSCR &= ~(mask);
> > -	__FPU_FPSCR |= (frB[1] & mask);
> > +	fpscr = ((__FPU_FPSCR & ~mask) | (frB[1] & mask)) &
> > +		~(FPSCR_VX | FPSCR_FEX);
 
It's here (masking FPSCR_VX and FPSCR_FEX).

Actually the previous code was redundant, it cleared FEX and VX in the
mask computation and later again when recomputing them. Clearing them
once should be enough.

> >  
> > -	__FPU_FPSCR &= ~(FPSCR_VX);
> > -	if (__FPU_FPSCR & (FPSCR_VXSNAN | FPSCR_VXISI | FPSCR_VXIDI |
> > +	if (fpscr & (FPSCR_VXSNAN | FPSCR_VXISI | FPSCR_VXIDI |
> >  		     FPSCR_VXZDZ | FPSCR_VXIMZ | FPSCR_VXVC |
> >  		     FPSCR_VXSOFT | FPSCR_VXSQRT | FPSCR_VXCVI))
> > -		__FPU_FPSCR |= FPSCR_VX;
> > -
> > -	fpscr = __FPU_FPSCR;
> > -	fpscr &= ~(FPSCR_FEX);
> > -	if (((fpscr & FPSCR_VX) && (fpscr & FPSCR_VE)) ||
> > -	    ((fpscr & FPSCR_OX) && (fpscr & FPSCR_OE)) ||
> > -	    ((fpscr & FPSCR_UX) && (fpscr & FPSCR_UE)) ||
> > -	    ((fpscr & FPSCR_ZX) && (fpscr & FPSCR_ZE)) ||
> > -	    ((fpscr & FPSCR_XX) && (fpscr & FPSCR_XE)))
> > -		fpscr |= FPSCR_FEX;
> > +		fpscr |= FPSCR_VX;
> > +
> > +	/* The bit order of exception enables and exception status
> > +	 * is the same. Simply shift and mask to check for enabled
> > +	 * exceptions.
> > +	 */
> > +	if (fpscr & (fpscr >> 22) &  0xf8) fpscr |= FPSCR_FEX;
> >  	__FPU_FPSCR = fpscr;
> >  
> >  #ifdef DEBUG
> >  mtfsf.c |   57 ++++++++++++++++++++++-----------------------------------
> >  1 file changed, 22 insertions(+), 35 deletions(-)
> > 
> > 
> > Notes: 
> > 
> > 1) I'm really unsure on whether 0xff is frequent or not. So the likely()
> > statement at the beginning may be wrong. Actually, if it is not very likely,
> > it might be better to remove the special casef for FM==0xff. A look at 
> > GCC sources shows that it never generates a mask of 0xff. From glibc
> > sources, there vast majority of cases uses 0x1, only isnan() uses 0xff.
> 
> Can't handle all cases here.  

That's why I would go for the simplest possible code. Conditionals are
expensive and minimizing cache footprint is often the best measure of 
performance for infrequently used code. With this in mind, I would get 
rid of all the tests for special FM values and rely on the optimized
generic case.


>  
> > 2) it may be better to remove the check for FM==0, after all, the instruction
> > effectively becomes a nop, and generating the instruction in the first place
> > would be too stupid for words.
> 
> Hmm a heavy no-op.  I wonder if it is heavier than a sync.

In theory not. It contains the equivalent of several isync (taking an exception
and returning from it), but not any synchronization wrt the memory accesses.

	Gabriel

next prev parent reply	other threads:[~2014-02-11  7:51 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-06  2:09 arch/powerpc/math-emu/mtfsf.c - incorrect mask? Stephen N Chivers
2014-02-06  8:26 ` Gabriel Paubert
2014-02-07  1:27   ` Stephen N Chivers
2014-02-07 10:10     ` Gabriel Paubert
2014-02-07 20:49       ` James Yang
2014-02-09 19:42         ` Stephen N Chivers
2014-02-10 16:50           ` James Yang
2014-02-10 11:03         ` Gabriel Paubert
2014-02-10 11:17           ` David Laight
2014-02-10 12:21             ` Gabriel Paubert
2014-02-10 12:32               ` David Laight
2014-02-10 13:00                 ` Gabriel Paubert
2014-02-10 17:03           ` James Yang
2014-02-11  7:26             ` Gabriel Paubert [this message]
2014-02-11 20:57               ` Linux-3.14-rc2: Order of serial node compatibles in DTS files Stephen N Chivers
2014-02-11 22:33                 ` Kumar Gala
2014-02-11 22:51                   ` Sebastian Hesselbarth
2014-02-11 23:38                     ` Stephen N Chivers
2014-02-11 23:43                       ` Sebastian Hesselbarth
2014-02-12 11:00                         ` Arnd Bergmann
2014-02-11 23:41                     ` Scott Wood
2014-02-11 23:46                       ` Sebastian Hesselbarth
2014-02-12  0:21                         ` Stephen N Chivers
2014-02-12  5:28                           ` Kevin Hao
2014-02-12  8:30                             ` Sebastian Hesselbarth
2014-02-12 10:31                               ` Kevin Hao
2014-02-12 11:26                                 ` Sebastian Hesselbarth
2014-02-12 11:32                                   ` Kevin Hao
2014-02-12  8:25                           ` Sebastian Hesselbarth
2014-02-12 10:35                             ` Kevin Hao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140211072606.GA26514@visitor2.iram.es \
    --to=paubert@iram.es \
    --cc=James.Yang@freescale.com \
    --cc=cproctor@csc.com.au \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=schivers@csc.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox