linuxppc-dev.lists.ozlabs.org archive mirror
* Going to sleep atomically
From: Ralph Blach @ 2001-03-08 13:37 UTC (permalink / raw)
  To: Embedded Linux list

In a device driver, what is the best way to go to sleep atomically?

Thanks

Chip



* speed and space Optimization
From: Srinivas Rao.M @ 2001-03-08 14:25 UTC (permalink / raw)
  Cc: Embedded Linux list


hi,

Please tell me which sets of compilation flags we should use in order
to optimize our code. I want to maintain two separate versions of my
codebase: one optimized for speed and the other optimized for space.
I have tried enabling -O, but it is a general optimizer. I want to
have two distinct codebases.

Thank you in advance.


Srini...
--

Give everything you have today, because anything left over is lost forever.
-Mick Rashid


** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: speed and space Optimization
From: Gabriel Paubert @ 2001-03-08 14:54 UTC (permalink / raw)
  To: Srinivas Rao.M; +Cc: Embedded Linux list


On Thu, 8 Mar 2001, Srinivas Rao.M wrote:

>
> hi,
>
> Please tell me which sets of compilation flags we should use in order
> to optimize our code. I want to maintain two separate versions of my
> codebase: one optimized for speed and the other optimized for space.
> I have tried enabling -O, but it is a general optimizer. I want to
> have two distinct codebases.

Actually, gcc's code generation for PPC is lacking in this area. There are
very few differences between -O2 and -Os (the first is the standard for
speed, although you might want to try -O3). -Os is in theory the space
optimization, but very few places in the rs6000/ppc machine description
depend on whether the code should be optimized for space or not.
There will be some difference, but a relatively small one.

	Regards,
	Gabriel.



* Re: speed and space Optimization
From: Srinivas Rao.M @ 2001-03-08 15:18 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Embedded Linux list


OK, thanks for the reply, Mr. Gabriel. How can I benchmark these
options? Can I get some more information from somewhere on the net?
-Srini...



* Re: speed and space Optimization
From: Gabriel Paubert @ 2001-03-08 15:48 UTC (permalink / raw)
  To: Srinivas Rao.M; +Cc: Embedded Linux list


On Thu, 8 Mar 2001, Srinivas Rao.M wrote:

>
> OK, thanks for the reply, Mr. Gabriel. How can I benchmark these
> options? Can I get some more information from somewhere on the net?
> -Srini...

Try "info gcc" for the optimization and machine specific options. There
are also specialized gcc mailing lists, but first read the documentation
before bothering the people on the lists.

The best benchmark is *always* the code that you want to run. In my tests
the difference between size and speed optimizations was always small
(although above the noise level), and sometimes surprising (-Os was faster
than -O2, which was faster than -O3, for some routines). The only thing you
can count on is that -O0 will generate the biggest and slowest code by
far; -O1 will already make it much smaller and faster, but still
noticeably slower than any of the higher optimization options.

But every application is different, and you might notice a bigger impact
than I did depending on processor and compiler options (I did these tests
about 18 months ago; the compiler has improved since then, but of course
not as much as we would like).
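
[Editor's sketch, not part of the original message.] One way to act on
"benchmark the code you want to run" is to build the same source at each
optimization level and time the result. The sketch below assumes a native
gcc on the build machine and a program that can be run where it was built;
for a cross-compiled embedded target the timing step would have to happen
on the target instead. The names build_command, best_time and compare,
and the "bench-Os" style output names, are hypothetical.

```python
import subprocess
import time

# Optimization levels discussed in this thread.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]


def build_command(source, level, compiler="gcc"):
    """Return (argv, output_name) for building `source` at one -O level."""
    output = f"bench{level}"  # e.g. "bench-Os" (hypothetical naming scheme)
    return [compiler, level, "-o", output, source], output


def best_time(argv, repeats=5):
    """Run argv `repeats` times; return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        best = min(best, time.perf_counter() - start)
    return best


def compare(source, levels=OPT_LEVELS, compiler="gcc"):
    """Build `source` at each level and map the level to its best run time."""
    results = {}
    for level in levels:
        argv, output = build_command(source, level, compiler)
        subprocess.run(argv, check=True)  # compile; raises if the build fails
        results[level] = best_time(["./" + output])
    return results
```

Calling compare("fft.c") (with a source file of your own) returns a
dictionary of timings per flag. Taking the best of several runs is one
simple way to stay above the noise level mentioned above; it is a design
choice of this sketch, not something prescribed in the thread.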

Lies, damn lies, statistics, and benchmarks...

	Regards,
	Gabriel.



* Re: speed and space Optimization
From: Wolfgang Denk @ 2001-03-08 17:18 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Srinivas Rao.M, Embedded Linux list


In message <Pine.HPX.4.10.10103081551290.9166-100000@gra-ux1.iram.es> you wrote:
>
> Actually, gcc's code generation for PPC is lacking in this area. There are
> very few differences between -O2 and -Os (the first is the standard for
> speed, although you might want to try -O3). -Os is in theory the space
> optimization, but very few places in the rs6000/ppc machine description
> depend on whether the code should be optimized for space or not.
> There will be some difference, but a relatively small one.

Just tested on our PPCBoot firmware:

   text    data     bss     dec     hex filename	==> size
  80628   22096   10476  113200   1ba30 ppcboot-Os	100%
  86332   22096   10476  118904   1d078 ppcboot-O2	105%
  90464   22328   10476  123268   1e184 ppcboot-O3	109%

Differences are small, but they exist (and they can decide whether the
code fits in a 128k boot ROM or not).
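
[Editor's sketch, not part of the original message.] The percentage column
above can be reproduced from the text, data and bss figures. The sketch
below copies the numbers from the listing; the function names are
hypothetical, and the ROM estimate assumes that only text and data are
stored in the ROM image while bss is zero-filled in RAM at startup.

```python
# Figures copied from the `size` listing above: (text, data, bss) in bytes.
BUILDS = {
    "ppcboot-Os": (80628, 22096, 10476),
    "ppcboot-O2": (86332, 22096, 10476),
    "ppcboot-O3": (90464, 22328, 10476),
}


def relative_sizes(builds, baseline):
    """Map each build to (total bytes, percent of the baseline total)."""
    base_total = sum(builds[baseline])
    return {
        name: (sum(parts), round(100 * sum(parts) / base_total))
        for name, parts in builds.items()
    }


def rom_footprint(parts):
    """Assumption: only text and data occupy ROM; bss is zeroed in RAM."""
    text, data, _bss = parts
    return text + data


for name, (total, percent) in relative_sizes(BUILDS, "ppcboot-Os").items():
    rom = rom_footprint(BUILDS[name])
    print(f"{name}: {total} bytes ({percent}%), ROM image about {rom} bytes")
```

The totals match the "dec" column of the listing, and the rounded
percentages match its last column.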


Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
Hindsight is an exact science.


* Re: speed and space Optimization
From: Graham Stoney @ 2001-03-08 23:47 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Srinivas Rao.M, Embedded Linux list


On Thu, Mar 08, 2001 at 04:48:21PM +0100, Gabriel Paubert wrote:
> -Os was faster than -O2 which was faster than -O3 for some routines.

I had similar results benchmarking our application on an 855T: -Os was both
smallest and fastest on gcc-2.95.2.  This makes sense since the I-cache on
embedded CPUs is relatively small, and tighter/smaller code takes less time
to fetch.  The differences between -Os and -O2 in terms of speed and space
were only very minor.  -O3 made code larger and slower, by inlining functions
and blowing the cache out too much; I'd strongly recommend against using it.

My suggestion is that the best starting point is to use -Os.

Regards,
Graham
--
Graham Stoney
Assistant Technology Manager
Canon Information Systems Research Australia
Ph: +61 2 9805 2909  Fax: +61 2 9805 2929


* Re: speed and space Optimization
From: Gabriel Paubert @ 2001-03-09  0:26 UTC (permalink / raw)
  To: Graham Stoney; +Cc: Srinivas Rao.M, Embedded Linux list


On Fri, 9 Mar 2001, Graham Stoney wrote:

> On Thu, Mar 08, 2001 at 04:48:21PM +0100, Gabriel Paubert wrote:
> > -Os was faster than -O2 which was faster than -O3 for some routines.
>
> I had similar results benchmarking our application on an 855T: -Os was both
> smallest and fastest on gcc-2.95.2.  This makes sense since the I-cache on
> embedded CPUs is relatively small, and tighter/smaller code takes less time
> to fetch.  The differences between -Os and -O2 in terms of speed and space
> were only very minor.  -O3 made code larger and slower, by inlining functions
> and blowing the cache out too much; I'd strongly recommend against using it.

In my case it was an FFT on a 603e. The code easily fit in the cache and
did not call any other subroutine, so inlining was not a factor, and I
did not see any loop being unrolled. Basically the code was only accessing
data arrays and performing fmul/fadd and fmadd FPU operations, with lots
of shifts and masks for addressing with powers of 2. Even in this case -Os
turned out to be better. I suspect that this is due to imperfect
scheduling by the compiler: on a 603e or 750, the rules for retirement are
fairly restrictive, much more so than for issuing. The second instruction
retired can only be an integer or load, so FP followed by store blocks, as
does FP followed by FP, and I don't think that gcc takes these rules into
account.

It was just an experiment and had no serious scientific value, but
IIRC, the higher optimization levels had a tendency to end up grouping all
the loads at the beginning of the loop, the FP operations in the middle
and the stores at the end. The compiler might be better now, I don't know.

>
> My suggestion is that the best starting point is to use -Os.

Indeed. I often compile my kernels with -Os too. A large part of the
kernel is straight inline code for which cache footprint is a very
important consideration. The few routines which have loops with large
iteration counts have been heavily optimized.

	Regards,
	Gabriel.



Thread overview: 8+ messages
2001-03-08 13:37 Going to sleep atomically Ralph Blach
2001-03-08 14:25 ` speed and space Optimization Srinivas Rao.M
2001-03-08 14:54   ` Gabriel Paubert
2001-03-08 15:18     ` Srinivas Rao.M
2001-03-08 15:48       ` Gabriel Paubert
2001-03-08 23:47         ` Graham Stoney
2001-03-09  0:26           ` Gabriel Paubert
2001-03-08 17:18     ` Wolfgang Denk
