* hint?
@ 2006-09-14 19:58 Henio Paszczak
2006-09-14 22:05 ` hint? Marcin Kościelnicki
2006-09-14 23:19 ` hint? Robert Plantz
0 siblings, 2 replies; 3+ messages in thread
From: Henio Paszczak @ 2006-09-14 19:58 UTC
To: linux-assembly
Hi.
I read a hint in a magazine that if consecutive lines do
not use the same registers, a newer processor (mine is a
fairly old PIII (Coppermine) 600 MHz) can execute the
instructions simultaneously, e.g.:
mov %eax,%ebx
mov %ecx,%edx
can be executed at almost the same time. I checked it by
using an extremely simple example:
movl $3,%ebx
movl $3,%edx
movl $0b111111111111111111111111111,%ecx
loop:
sall $2,%eax
orl %ebx,%eax
sall $2,%edi
orl %edx,%edi
loop loop
it runs (contrary to the magazine's claim) faster than
movl $3,%ebx
movl $3,%edx
movl $0b111111111111111111111111111,%ecx
loop:
sall $2,%eax
sall $2,%edi
orl %ebx,%eax
orl %edx,%edi
loop loop
So what is the real answer?
Second, I have always tried not to use memory because
it is extremely slow.
But again, if I exchange values between registers
using temporary memory:
movl %eax,temp
movl %ebx,%eax
movl temp,%ebx
it runs faster than:
movl %eax,%edx
movl %ebx,%eax
movl %edx,%ebx
Why? Maybe Linux is doing something in the meantime?
The amazing thing is that
xchgl %eax,%ebx
is the slowest :) Why?
Speed really matters in my programs; that's why I'm looking
for any optimization...
Lukas
* Re: hint?
From: Marcin Kościelnicki @ 2006-09-14 22:05 UTC
To: linux-assembly
> Hi.
> I read a hint in a magazine that if consecutive lines do
> not use the same registers, a newer processor (mine is a
> fairly old PIII (Coppermine) 600 MHz) can execute the
> instructions simultaneously, e.g.:
>
> mov %eax,%ebx
> mov %ecx,%edx
>
> can be executed at almost the same time. I checked it by
> using an extremely simple example:
>
> movl $3,%ebx
> movl $3,%edx
> movl $0b111111111111111111111111111,%ecx
>
> loop:
> sall $2,%eax
> orl %ebx,%eax
> sall $2,%edi
> orl %edx,%edi
> loop loop
>
> it runs (contrary to the magazine's claim) faster than
>
> movl $3,%ebx
> movl $3,%edx
> movl $0b111111111111111111111111111,%ecx
>
> loop:
> sall $2,%eax
> sall $2,%edi
> orl %ebx,%eax
> orl %edx,%edi
> loop loop
>
> So what is the real answer?
Doing mutually independent computations [that is, ones that can be executed
in parallel without changing the outcome] simultaneously does indeed speed up
the computation. HOWEVER, modern x86 CPUs look ahead several instructions at a
time and can execute some instructions slightly out of order when that
provably does not change the result. Since your code snippets are quite
short, the CPU sees the whole thing at once and effectively rearranges the
code into the parallel version anyway. So basically, both versions should
execute in about the same time, with minor differences due to internal chip
details that vary between CPU models.
Scheduling operations for mutual independence is important when you're dealing
with bigger pieces of code. If your function does the same long computation
on 3 sets of data [and you can manage to fit 3 simultaneous computations in
x86's small register set], it's much faster to interleave these three
computations than to do them one after another.
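To make the two orderings concrete, here is a rough sketch in C rather than
assembly (so it stays portable). The step() function, ROUNDS, and the two
run_* functions are made-up names for illustration, not anything from the
original code. Both functions compute the same three results; whether the
interleaved one is actually faster depends on the compiler and the CPU, so
treat it as something to measure, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical stand-in for "the same long computation": each step
 * depends on the previous value, so one chain cannot overlap with itself. */
static inline uint32_t step(uint32_t x) {
    return x * 1664525u + 1013904223u;  /* unsigned wrap-around is well defined */
}

#define ROUNDS 1000000u

/* One after another: finish the chain for v[0], then v[1], then v[2]. */
void run_sequential(uint32_t v[3]) {
    for (int k = 0; k < 3; k++) {
        uint32_t x = v[k];
        for (uint32_t i = 0; i < ROUNDS; i++)
            x = step(x);
        v[k] = x;
    }
}

/* Interleaved: every iteration advances all three independent chains,
 * so adjacent operations do not depend on each other. */
void run_interleaved(uint32_t v[3]) {
    uint32_t a = v[0], b = v[1], c = v[2];
    for (uint32_t i = 0; i < ROUNDS; i++) {
        a = step(a);
        b = step(b);
        c = step(c);
    }
    v[0] = a;
    v[1] = b;
    v[2] = c;
}
```

Calling both functions on copies of the same three inputs should leave
identical values in the arrays; only the order of the work differs. An
optimizing compiler may reschedule either loop on its own, so check the
generated assembly before drawing conclusions from timings.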
> Second, I have always tried not to use memory because
> it is extremely slow.
>
> But again, if I exchange values between registers
> using temporary memory:
>
> movl %eax,temp
> movl %ebx,%eax
> movl temp,%ebx
>
> it runs faster than:
>
> movl %eax,%edx
> movl %ebx,%eax
> movl %edx,%ebx
>
> Why? Maybe Linux is doing something in the meantime?
Where did you get that result? On my machine, the version using memory is 1.5x
slower than register-only, as expected.
> The amazing thing is that
>
> xchgl %eax,%ebx
>
> is the slowest :) Why?
See, x86 is 20 years old. Some things that seemed like a good idea 20 years
ago have proved to be teh suck by now. One major category of sucky things in
x86 is useless instructions. One of them is xchg for register-to-register
exchange. It is slow because Intel only bothers to speed up instructions
that people actually use. And no one uses it, since code rarely swaps the
contents of registers around. Variables usually just stay in one register,
and there's no reason to move them around later -- what's gained by freeing
one register only to occupy another?
Note: xchg with a memory operand is an entirely different beast. It is way
slower than you might assume. That's because xchg with memory is actually
useful, but only for one thing: as an atomic operation for implementing
locking mechanisms for multi-threaded and/or multi-processor stuff. With a
memory operand, xchg is implicitly locked, and locking needs special measures
so that this location stays consistent among all CPU caches. And that is slow.
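As an illustration of that locking use, here is a minimal sketch of a spinlock
built on an atomic exchange, written in C11 instead of assembly. The type
xchg_lock and the lock_* function names are invented for this example. On x86,
compilers typically turn the atomic exchange into an xchg with a memory
operand, but that is an assumption about the compiler, not something the
C standard promises.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock: 0 means free, 1 means held. */
typedef struct {
    atomic_int held;
} xchg_lock;

void lock_init(xchg_lock *l) {
    atomic_init(&l->held, 0);
}

/* One attempt: atomically store 1 and inspect the previous value.
 * If the previous value was 0, this caller now owns the lock. */
bool lock_try(xchg_lock *l) {
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

/* Busy-wait until an attempt succeeds. */
void lock_acquire(xchg_lock *l) {
    while (!lock_try(l))
        ;  /* spin */
}

/* Release by storing 0 so another caller's exchange can see it. */
void lock_release(xchg_lock *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

The exchange does the test and the set in one indivisible step, which is why
two CPUs can never both see the old value 0. A real lock would usually add
back-off or sleeping instead of spinning forever; this sketch leaves that out.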
> Speed really matters in my programs; that's why I'm looking
> for any optimization...
>
> Lukas
Marcin Kościelnicki
* Re: hint?
From: Robert Plantz @ 2006-09-14 23:19 UTC
To: Henio Paszczak, linux-assembly
Boy, do I feel stupid. I've been writing assembly language
using the GNU assembler for seven years and have even
written a textbook about it. Although it is not the subject
of this question, the example given:
Henio Paszczak wrote:
> -----------
> movl $3,%ebx
> movl $3,%edx
> movl $0b111111111111111111111111111,%ecx
>
> ----------
is the first time I learned that you can specify literals
in binary. I always thought that you had to use the
C syntax and express bit patterns in hexadecimal or
octal.
On the plus side, I'm still eager to learn things. :-)