All of lore.kernel.org
 help / color / mirror / Atom feed
* Optimisation
@ 2004-07-08  6:38 Sridhar Adagada
  2004-07-08  8:59 ` Optimisation Geert Uytterhoeven
  0 siblings, 1 reply; 4+ messages in thread
From: Sridhar Adagada @ 2004-07-08  6:38 UTC (permalink / raw)
  To: linux-mips

Hello everybody,

I am sorry if the mail is too long had to include assembly code, which
i am still learning.

I am trying some optimization on some my code.  Listed below is the
c-code of a function which is obvious,  but the assembly from compiler
with optimization for speed, is confusing to me can any one explain?

As you can see $6 is the length, my confusion is at the lines 12-14,
19, 20 why is the length added with 65535 and the comparison with 0
for max and the right shift.  The same operation is done the base ($7)
 can any one explain why this is needed.  Can i replace all these
three instructions with "ori $9 $0 $6"

One more thing does the instruction at line 25 "madl    $25, $24" do?

Any help with be greatly appresciated

Thanks
Sri

C-code:
short cal_xxx(short *abs, short *coef, short len, short base)
{
 short i;
 short sum = 0;

 for (i = 0; i < length; i++)
 {
   sum += ( (unsigned int)abs[i] * (unsigned int)coef[i] );
 }
 return ( ((sum + 1 << (base -1)) >> base) );
}

Assmebly code:
    1         .set    noat
    2         .set    noreorder
    3         .set    nomacro
    4         .text
    5         .align  4
    6         .globl  cal_xxx
    7 /*  short i;
    8     short sum = 0; */
    9         ori     $9, $0, 0
   10         ori     $3, $0, 0
   11 /*  for( i = 0; i < length; i++ ) */
   12         andi    $6, $6, 65535
   13         imax    $8, $6, 0
   14         srl     $10, $8, 3
   15         beq     $10, $0, .L62
   16         andi    $7, $7, 65535
   17         move    $2, $5
   18         move    $11, $4
   19         sll     $24, $10, 3
   20         andi    $9, $24, 65528
   21 .L78:
   22         lh      $25, 0($11)
   23         lh      $24, 0($2)
   24         mtlo    $3
   25         madl    $25, $24
   26         lh      $25, 2($11)
   27         lh      $24, 2($2)
   28         nop
   29         madl    $25, $24
   30         lh      $25, 4($11)
   31         lh      $24, 4($2)
   32         nop
   33         madl    $25, $24
   34         lh      $25, 6($11)
   35         lh      $24, 6($2)
   36         nop
   37         madl    $25, $24
   38         lh      $25, 8($11)
   39         lh      $24, 8($2)
   40         nop
   41         madl    $25, $24
   42         lh      $25, 10($11)
   43         lh      $24, 10($2)
   44         nop
   45         madl    $25, $24
   46         lh      $25, 12($11)
   47         lh      $24, 12($2)
   48         nop
   49         madl    $25, $24
   50         lh      $25, 14($11)
   51         lh      $24, 14($2)
   52         addiu   $10, $10, -1
   53         madl    $25, $24
   54         addiu   $11, $11, 16
   55         addiu   $2, $2, 16
   56         mflo    $3
   57         bne     $10, $0, .L78
   58         nop
   59         nop
   60 .L62:
   61         andi    $10, $8, 7
   62         beq     $10, $0, .L44
   63         sll     $9, $9, 1
   64         addu    $2, $5, $9
   65         addu    $25, $4, $9
   66         addiu   $10, $10, -1
   67 .L1000082:
   68         lh      $15, 0($2)
   69         lh      $24, 0($25)
   70         mtlo    $3
   71         madl    $24, $15
   72         addiu   $25, $25, 2
   73         addiu   $2, $2, 2
   74         mflo    $3
   75         bne     $10, $0, .L1000082
   76         addiu   $10, $10, -1
   77 .L44:
   78 /*     } for-end */
   79         addiu   $25, $7, -1
   80         ori     $15, $0, 1
   81         sllv    $24, $15, $25
   82         addu    $14, $3, $24
   83         srav    $13, $14, $7
   84         sll     $2, $13, 16
   85  #          .ef
   86         jr      $ra
   87         sra     $2, $2, 16
   88         .type   cal_xxx,@function
   89         .size   cal_xxx,.-cal_xxx
   90         .align  4
   91  #i     $9      local
   92  #sum   $3      local
   93
   94  #abs  $4      param
   95  #coef $5      param
   96  #length        $6      param
   97  #base  $7      param
   98
   99         .data
  100 .L139:
  101         .text
  102 /*  } end-function */

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Optimisation
  2004-07-08  6:38 Optimisation Sridhar Adagada
@ 2004-07-08  8:59 ` Geert Uytterhoeven
  2004-07-08  9:21   ` Optimisation Sridhar Adagada
  0 siblings, 1 reply; 4+ messages in thread
From: Geert Uytterhoeven @ 2004-07-08  8:59 UTC (permalink / raw)
  To: Sridhar Adagada; +Cc: Linux/MIPS Development

On Thu, 8 Jul 2004, Sridhar Adagada wrote:
> As you can see $6 is the length, my confusion is at the lines 12-14,
> 19, 20 why is the length added with 65535 and the comparison with 0

It's not `added with 65535', but `ANDed with 65535'. MIPS32 has 32-bit integer
operations only. If you want to do 16-bit math, all data has to be masked.

Anyway, for performance, it's better to do 32-bit math only.

> short cal_xxx(short *abs, short *coef, short len, short base)
> {
>  short i;
>  short sum = 0;
>
>  for (i = 0; i < length; i++)
>  {
>    sum += ( (unsigned int)abs[i] * (unsigned int)coef[i] );

Why cast to unsigned int while sum is a short? Unless you really want to rely
on sum being a short, you better make it int and do the truncation to short
after the loop.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Optimisation
  2004-07-08  8:59 ` Optimisation Geert Uytterhoeven
@ 2004-07-08  9:21   ` Sridhar Adagada
  2004-07-08  9:48     ` Optimisation Sridhar Adagada
  0 siblings, 1 reply; 4+ messages in thread
From: Sridhar Adagada @ 2004-07-08  9:21 UTC (permalink / raw)
  To: Geert Uytterhoeven

Thank you.  For some reason i have been reading ANDI ans ADDI.  But i
am still confused at lines 13, 14 and 15
  13         imax    $8, $6, 0
  14         srl     $10, $8, 3
  15         beq     $10, $0, .L62

Thanks for correcting me

Sri


On Thu, 8 Jul 2004 10:59:58 +0200 (MEST), Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Thu, 8 Jul 2004, Sridhar Adagada wrote:
> > As you can see $6 is the length, my confusion is at the lines 12-14,
> > 19, 20 why is the length added with 65535 and the comparison with 0
> 
> It's not `added with 65535', but `ANDed with 65535'. MIPS32 has 32-bit integer
> operations only. If you want to do 16-bit math, all data has to be masked.
> 
> Anyway, for performance, it's better to do 32-bit math only.
> 
> > short cal_xxx(short *abs, short *coef, short len, short base)
> > {
> >  short i;
> >  short sum = 0;
> >
> >  for (i = 0; i < length; i++)
> >  {
> >    sum += ( (unsigned int)abs[i] * (unsigned int)coef[i] );
> 
> Why cast to unsigned int while sum is a short? Unless you really want to rely
> on sum being a short, you better make it int and do the truncation to short
> after the loop.
> 
> Gr{oetje,eeting}s,
> 
>                                                Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                                            -- Linus Torvalds
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Optimisation
  2004-07-08  9:21   ` Optimisation Sridhar Adagada
@ 2004-07-08  9:48     ` Sridhar Adagada
  0 siblings, 0 replies; 4+ messages in thread
From: Sridhar Adagada @ 2004-07-08  9:48 UTC (permalink / raw)
  To: Geert Uytterhoeven

Thank you! I got it now If the length is less then 7, the loop is
handled differently for the fast access of the abs and coef.

Thanks you very much
Sri

On Thu, 8 Jul 2004 14:51:59 +0530, Sridhar Adagada <asridhars@gmail.com> wrote:
> Thank you.  For some reason i have been reading ANDI ans ADDI.  But i
> am still confused at lines 13, 14 and 15
>  13         imax    $8, $6, 0
>  14         srl     $10, $8, 3
>  15         beq     $10, $0, .L62
> 
> Thanks for correcting me
> 
> Sri
> 
> 
> 
> 
> On Thu, 8 Jul 2004 10:59:58 +0200 (MEST), Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > On Thu, 8 Jul 2004, Sridhar Adagada wrote:
> > > As you can see $6 is the length, my confusion is at the lines 12-14,
> > > 19, 20 why is the length added with 65535 and the comparison with 0
> >
> > It's not `added with 65535', but `ANDed with 65535'. MIPS32 has 32-bit integer
> > operations only. If you want to do 16-bit math, all data has to be masked.
> >
> > Anyway, for performance, it's better to do 32-bit math only.
> >
> > > short cal_xxx(short *abs, short *coef, short len, short base)
> > > {
> > >  short i;
> > >  short sum = 0;
> > >
> > >  for (i = 0; i < length; i++)
> > >  {
> > >    sum += ( (unsigned int)abs[i] * (unsigned int)coef[i] );
> >
> > Why cast to unsigned int while sum is a short? Unless you really want to rely
> > on sum being a short, you better make it int and do the truncation to short
> > after the loop.
> >
> > Gr{oetje,eeting}s,
> >
> >                                                Geert
> >
> > --
> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> >
> > In personal conversations with technical people, I call myself a hacker. But
> > when I'm talking to journalists I just say "programmer" or something like that.
> >                                                            -- Linus Torvalds
> >
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-07-08  9:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-08  6:38 Optimisation Sridhar Adagada
2004-07-08  8:59 ` Optimisation Geert Uytterhoeven
2004-07-08  9:21   ` Optimisation Sridhar Adagada
2004-07-08  9:48     ` Optimisation Sridhar Adagada

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.