* [parisc-linux] more cpup.c results
[not found] <20050105055412.68E06495698@palinux.hppa>
@ 2005-01-05 6:16 ` Grant Grundler
2005-01-05 8:20 ` Joel Soete
2005-01-05 8:40 ` Ryan Bradetich
[not found] ` <20050107095143.GN18497@tausq.org>
1 sibling, 2 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-05 6:16 UTC (permalink / raw)
To: parisc-linux
On Tue, Jan 04, 2005 at 10:54:12PM -0700, Grant Grundler wrote:
> add prefetching to copy_user_page_asm
> matches asm now checked into build-tools/cpup.c
I committed a new version of copy_user_page_asm based on
the results of build-tools/cpup.c.
Here's the output from the last set of cpup2 (4 regs) runs:
grundler <577>while :; do ./cpup2; done
First Loop : min 9247 avg 12037 median 11250
Later Loops : min 5568 avg 7006 median 6906
First Loop : min 9180 avg 12051 median 11244
Later Loops : min 5557 avg 7003 median 6904
First Loop : min 9204 avg 12027 median 11239
Later Loops : min 5556 avg 7002 median 6901
First Loop : min 9197 avg 12032 median 11237
Later Loops : min 5546 avg 6996 median 6901
First Loop : min 9300 avg 12032 median 11225
Later Loops : min 5584 avg 7001 median 6901
It's essentially indistinguishable from the cpup3 (6 regs) routine:
grundler <579>while :; do ./cpup3; done
First Loop : min 9188 avg 11992 median 11223
Later Loops : min 5493 avg 7002 median 6874
First Loop : min 9213 avg 11988 median 11224
Later Loops : min 5487 avg 7004 median 6873
First Loop : min 9252 avg 11991 median 11204
Later Loops : min 5487 avg 7004 median 6874
First Loop : min 9228 avg 12021 median 11219
Later Loops : min 5550 avg 7003 median 6879
First Loop : min 9200 avg 11994 median 11215
Later Loops : min 5514 avg 6997 median 6874
Which tells me the L1 cache is accessible in 1 cycle on PA8700.
And if other CPU implementations need 2 cycles, it wouldn't
hurt to commit the 6regs version.
Can folks try this on PA8000 and PA8200 for me?
Check /proc/cpuinfo if you aren't sure what you have.
Should be a simple cut/paste of 4 lines to a shell prompt:
gcc -O2 -o cpup0 cpup.c
gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c
gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c
for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done
Please post the output to the mailing list along with /proc/cpuinfo.
thanks,
grant
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* RE: [parisc-linux] more cpup.c results
2005-01-05 6:16 ` [parisc-linux] more cpup.c results Grant Grundler
@ 2005-01-05 8:20 ` Joel Soete
2005-01-05 8:40 ` Ryan Bradetich
1 sibling, 0 replies; 14+ messages in thread
From: Joel Soete @ 2005-01-05 8:20 UTC (permalink / raw)
To: Grant Grundler, parisc-linux
Hello Grant,
> -- Original Message --
> Date: Tue, 4 Jan 2005 23:16:13 -0700
> From: Grant Grundler <grundler@parisc-linux.org>
> To: parisc-linux@lists.parisc-linux.org
> Subject: [parisc-linux] more cpup.c results
>
>
> On Tue, Jan 04, 2005 at 10:54:12PM -0700, Grant Grundler wrote:
> > add prefetching to copy_user_page_asm
> > matches asm now checked into build-tools/cpup.c
>
[...]
> Which tells me the L1 cache is accessible in 1 cycle on PA8700.
> And if other CPU implementations need 2 cycles, it wouldn't
> hurt to commit the 6regs version.
>
> Can folks try this on PA8000 and PA8200 for me?
> Check /proc/cpuinfo if you aren't sure what you have.
>
Unfortunately I only have PA8600 machines (n4k and b2k) ...
> Should be a simple cut/paste of 4 lines to a shell prompt:
>
> gcc -O2 -o cpup0 cpup.c
> gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c
> gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c
> for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done
>
Anyway, here are some results from a b2k (running, obviously, a 64-bit 2.6.10-pa4 kernel):
# for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done
TEST 1
First Loop : min 14462 avg 17576 median 15409
Later Loops : min 6628 avg 8597 median 7953
First Loop : min 10313 avg 13497 median 11727
Later Loops : min 3581 avg 4843 median 4568
First Loop : min 10714 avg 13703 median 11897
Later Loops : min 3630 avg 5033 median 4778
TEST 2
First Loop : min 14445 avg 17452 median 15428
Later Loops : min 6616 avg 8605 median 7945
First Loop : min 10358 avg 13510 median 11755
Later Loops : min 3597 avg 4835 median 4567
First Loop : min 10669 avg 13708 median 11885
Later Loops : min 3618 avg 5034 median 4780
TEST 3
First Loop : min 14459 avg 17437 median 15432
Later Loops : min 6621 avg 8592 median 7943
First Loop : min 10345 avg 13541 median 11732
Later Loops : min 3584 avg 4853 median 4566
First Loop : min 10658 avg 13695 median 11879
Later Loops : min 3637 avg 5032 median 4775
TEST 4
First Loop : min 14503 avg 17455 median 15429
Later Loops : min 6605 avg 8595 median 7945
First Loop : min 10265 avg 13515 median 11740
Later Loops : min 3566 avg 4835 median 4562
First Loop : min 10681 avg 13720 median 11886
Later Loops : min 3651 avg 5035 median 4778
TEST 5
First Loop : min 14472 avg 17460 median 15425
Later Loops : min 6627 avg 8590 median 7938
First Loop : min 10376 avg 13555 median 11742
Later Loops : min 3597 avg 4843 median 4570
First Loop : min 10684 avg 13689 median 11891
Later Loops : min 3663 avg 5027 median 4775
hth,
Joel
* Re: [parisc-linux] more cpup.c results
2005-01-05 6:16 ` [parisc-linux] more cpup.c results Grant Grundler
2005-01-05 8:20 ` Joel Soete
@ 2005-01-05 8:40 ` Ryan Bradetich
2005-01-05 16:02 ` Grant Grundler
1 sibling, 1 reply; 14+ messages in thread
From: Ryan Bradetich @ 2005-01-05 8:40 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
Grant,
> Can folks try this on PA8000 and PA8200 for me?
> Check /proc/cpuinfo if you aren't sure what you have.
processor : 0
cpu family : PA-RISC 2.0
cpu : PA8200 (PCX-U+)
cpu MHz : 200.000000
model : 9000/782/C200+
model name : Raven U 200 (9000/780/C200)
hversion : 0x000059d0
sversion : 0x00000481
I-cache : 512 KB
D-cache : 1024 KB (WB, 0-way associative)
ITLB entries : 120
DTLB entries : 120 - shared with ITLB
bogomips : 395.26
software id : 2005736878
> Should be a simple cut/paste of 4 lines to a shell prompt:
>
> gcc -O2 -o cpup0 cpup.c
> gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c
> gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c
> for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done
This is on a 64-bit kernel:
$ uname -a
Linux vega 2.6.10-pa3 #1 Sun Jan 2 14:28:36 MST 2005 parisc64 GNU/Linux
TEST 1
First Loop : min 9990 avg 11444 median 10352
Later Loops : min 6290 avg 8673 median 8885
First Loop : min 8758 avg 10370 median 9312
Later Loops : min 5842 avg 7168 median 7024
First Loop : min 8701 avg 10277 median 9215
Later Loops : min 5670 avg 7244 median 7124
TEST 2
First Loop : min 9990 avg 11451 median 10353
Later Loops : min 6197 avg 8669 median 8880
First Loop : min 8748 avg 10379 median 9318
Later Loops : min 5768 avg 7166 median 7022
First Loop : min 8657 avg 10280 median 9208
Later Loops : min 5773 avg 7239 median 7123
TEST 3
First Loop : min 9993 avg 11442 median 10353
Later Loops : min 6278 avg 8670 median 8880
First Loop : min 8745 avg 10408 median 9318
Later Loops : min 5804 avg 7163 median 7023
First Loop : min 8681 avg 10340 median 9266
Later Loops : min 5663 avg 7238 median 7120
TEST 4
First Loop : min 9990 avg 11453 median 10347
Later Loops : min 6282 avg 8661 median 8877
First Loop : min 8751 avg 10400 median 9324
Later Loops : min 5750 avg 7171 median 7024
First Loop : min 8622 avg 10283 median 9213
Later Loops : min 5680 avg 7235 median 7119
TEST 5
First Loop : min 10032 avg 11442 median 10348
Later Loops : min 6224 avg 8688 median 8884
First Loop : min 8799 avg 10396 median 9323
Later Loops : min 5751 avg 7165 median 7021
First Loop : min 8622 avg 10286 median 9221
Later Loops : min 5653 avg 7240 median 7120
This is on a 32-bit kernel:
$ uname -a
Linux vega 2.6.10-pa5 #1 Wed Jan 5 01:14:00 MST 2005 parisc GNU/Linux
TEST 1
First Loop : min 10924 avg 11555 median 11090
Later Loops : min 7744 avg 8196 median 8130
First Loop : min 9584 avg 10251 median 9790
Later Loops : min 6451 avg 6836 median 6784
First Loop : min 9487 avg 10104 median 9673
Later Loops : min 6202 avg 6604 median 6550
TEST 2
First Loop : min 10927 avg 11553 median 11097
Later Loops : min 7687 avg 8193 median 8130
First Loop : min 9594 avg 10267 median 9790
Later Loops : min 6451 avg 6853 median 6784
First Loop : min 9477 avg 10117 median 9654
Later Loops : min 6243 avg 6606 median 6549
TEST 3
First Loop : min 10943 avg 11549 median 11104
Later Loops : min 7670 avg 8197 median 8130
First Loop : min 9607 avg 10255 median 9790
Later Loops : min 6451 avg 6836 median 6783
First Loop : min 9487 avg 10144 median 9674
Later Loops : min 6216 avg 6604 median 6550
TEST 4
First Loop : min 10924 avg 11527 median 11083
Later Loops : min 6848 avg 8176 median 8118
First Loop : min 9610 avg 10260 median 9810
Later Loops : min 6451 avg 6837 median 6784
First Loop : min 9493 avg 10140 median 9670
Later Loops : min 6215 avg 6605 median 6552
TEST 5
First Loop : min 10924 avg 11538 median 11087
Later Loops : min 7703 avg 8191 median 8132
First Loop : min 9583 avg 10236 median 9793
Later Loops : min 6451 avg 6837 median 6783
First Loop : min 9487 avg 10132 median 9674
Later Loops : min 6171 avg 6606 median 6550
The K460 has a PA8000 processor ... I'll see if I can get the K460
installed and updated to give you results from that system as well. I
am also working on getting you results from a 715/100
(currently in the middle of a new Debian install).
Thanks,
- Ryan
> thanks,
> grant
--
Ryan Bradetich <rbradetich@uswest.net>
* Re: [parisc-linux] more cpup.c results
2005-01-05 8:40 ` Ryan Bradetich
@ 2005-01-05 16:02 ` Grant Grundler
0 siblings, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-05 16:02 UTC (permalink / raw)
To: Ryan Bradetich; +Cc: parisc-linux
On Wed, Jan 05, 2005 at 01:40:56AM -0700, Ryan Bradetich wrote:
> This is on a 64-bit kernel:
> $ uname -a
> Linux vega 2.6.10-pa3 #1 Sun Jan 2 14:28:36 MST 2005 parisc64 GNU/Linux
>
> TEST 1
> First Loop : min 9990 avg 11444 median 10352
> Later Loops : min 6290 avg 8673 median 8885
...
thanks for the results!
> This is on a 32-bit kernel:
> $ uname -a
> Linux vega 2.6.10-pa5 #1 Wed Jan 5 01:14:00 MST 2005 parisc GNU/Linux
>
> TEST 1
> First Loop : min 10924 avg 11555 median 11090
> Later Loops : min 7744 avg 8196 median 8130
> First Loop : min 9584 avg 10251 median 9790
> Later Loops : min 6451 avg 6836 median 6784
> First Loop : min 9487 avg 10104 median 9673
> Later Loops : min 6202 avg 6604 median 6550
Interesting that cpup3 is slightly faster than cpup2 with
the 32-bit kernel. Since user space is 32-bit always, I wouldn't
have expected a difference in "Later Loops" output.
> The K460 is a 8000 processor ... I'll see if I can get the K460
> installed and updated to give you results from that system as well. I
Well, don't sweat it. Others might have a PA8000 box already up and running.
> am also working on getting you results from a 715/100
> as well (currently in the middle of a new debian install).
The 715 is PA1.1. cpup2/3 won't work there.
It would be worth trying variants of cpup0 (32-bit) scheduling
on PA1.1 machines. I'll leave that as an exercise for others.
thanks,
grant
* [parisc-linux] pa_memcpy: 2 small question
[not found] ` <20050107095143.GN18497@tausq.org>
@ 2005-01-09 19:07 ` Joel Soete
2005-01-10 0:13 ` [parisc-linux] " Randolph Chung
2005-02-20 23:44 ` [parisc-linux] revisit copy_user_page_asm microbenchmarks Grant Grundler
1 sibling, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-09 19:07 UTC (permalink / raw)
To: Randolph Chung; +Cc: parisc-linux
Hello Randolph,
I am just studying your pa_memcpy code (still to see if I can use it to improve the stuff that you suggested to me: l*).
And I would like to understand some values in copy_dstalign():
[...]
In the shift computation:
/* Calculate how to shift a word read at the memory operation
aligned srcp to make it aligned for copy. */
sh_1 = 8 * (src % sizeof(unsigned int));
sh_2 = 8 * sizeof(unsigned int) - sh_1;
What does '8' mean (== 2 * word size; i.e. 2 * 32 bits, because MERGE uses shrpw and so 2 (a pair of words)?)
Next, in
switch (len % 4)
{
is it '4' because, as mentioned in the copy_dstalign() description, this 'Handles _4_ words per loop'?
Thanks in advance,
Joel
* [parisc-linux] Re: pa_memcpy: 2 small question
2005-01-09 19:07 ` [parisc-linux] pa_memcpy: 2 small question Joel Soete
@ 2005-01-10 0:13 ` Randolph Chung
2005-01-10 8:44 ` Joel Soete
0 siblings, 1 reply; 14+ messages in thread
From: Randolph Chung @ 2005-01-10 0:13 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
> in the shift computing:
> /* Calculate how to shift a word read at the memory operation
> aligned srcp to make it aligned for copy. */
> sh_1 = 8 * (src % sizeof(unsigned int));
> sh_2 = 8 * sizeof(unsigned int) - sh_1;
>
> what means '8' (== 2 * word size; i.e. 2 * 32 bit because MERGE use shrpw
> and so 2 (a pair of word)? )
no. 8 is # bits/byte. sh_1 is the number of bits to shift a 32-bit
integer.
what we are trying to achieve is that given two adjacent 32-bit numbers,
we want to extract a 32-bit number "in the middle" of the two (aligned)
32-bit values.
if you look carefully, actually the implementation of MERGE does not use
both sh_1 and sh_2. In the original implementation, MERGE was
implemented using two SHIFT operations plus an OR operation. This was
optimized to use shrpw because this PA insn can do all three operations
in a single step, so it saves a lot of cycles.
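A minimal C sketch of the two-shifts-plus-OR form of MERGE described above, for illustration only. It assumes big-endian byte order (as on PA-RISC) and a source that is not word aligned; the names shift_bits and merge_be are hypothetical and are not taken from pa_memcpy, and this is not the shrpw version.

```c
#include <assert.h>
#include <stdint.h>

/* Bit shift for a source address that is not word aligned:
   8 bits per byte times the byte offset within the 32-bit word. */
static unsigned int shift_bits(uintptr_t src)
{
    return 8u * (unsigned int)(src % sizeof(uint32_t));
}

/* Two shifts plus an OR, big-endian byte order as on PA-RISC.
   w0 is the aligned word at the lower address, w1 the next one.
   sh_1 must be 8, 16 or 24: with sh_1 == 0 (aligned source) sh_2
   would be 32, which is an undefined shift for a 32-bit value. */
static uint32_t merge_be(uint32_t w0, uint32_t w1, unsigned int sh_1)
{
    unsigned int sh_2 = 8u * (unsigned int)sizeof(uint32_t) - sh_1;

    assert(sh_1 == 8 || sh_1 == 16 || sh_1 == 24);
    return (w0 << sh_1) | (w1 >> sh_2);
}
```

For example, if memory holds the bytes 11 22 33 44 55 66 77 88 and the source starts at byte offset 1, merge_be(0x11223344, 0x55667788, 8) yields 0x22334455, the 32-bit value "in the middle" of the two aligned words.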
> next in
> switch (len % 4)
> {
>
> is '4' because as mentioned in copy_dstalign() description this 'Handles
> _4_ words per loop'
yes.
randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
* [parisc-linux] Re: pa_memcpy: 2 small question
2005-01-10 0:13 ` [parisc-linux] " Randolph Chung
@ 2005-01-10 8:44 ` Joel Soete
2005-01-10 8:54 ` Randolph Chung
0 siblings, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-10 8:44 UTC (permalink / raw)
To: Randolph Chung; +Cc: parisc-linux
> > in the shift computing:
> > /* Calculate how to shift a word read at the memory operation
> > aligned srcp to make it aligned for copy. */
> > sh_1 = 8 * (src % sizeof(unsigned int));
> > sh_2 = 8 * sizeof(unsigned int) - sh_1;
> >
> > what means '8' (== 2 * word size; i.e. 2 * 32 bit because MERGE use shrpw
> > and so 2 (a pair of word)? )
>
> no. 8 is # bits/byte. sh_1 is the number of bits to shift a 32-bit
> integer.
>
Ah Ok ;-)
> what we are trying to achieve is that given two adjacent 32-bit numbers,
> we want to extract a 32-bit number "in the middle" of the two (aligned)
> 32-bit values.
>
> if you look carefully, actually the implementation of MERGE does not use
> both sh_1 and sh_2.
Yes ...
> In the original implementation, MERGE was
> implemented using two SHIFT operations plus an OR operation. This was
> optimized to use shrpw because this PA insn can do all three operations
> in a single step, so it saves a lot of cycles.
>
Cool :-)
> > next in
> > switch (len % 4)
> > {
> >
> > is '4' because as mentioned in copy_dstalign() description this 'Handles
> > _4_ words per loop'
>
> yes.
>
I obviously need to understand those details because, to work with str, I would
need to check somehow whether there is a byte == 0 in the current word (I already
found a formula to do this in another arch) and then jump to byte_copy (so more
work is needed ;-)
Thanks a lot,
Joel
* [parisc-linux] Re: pa_memcpy: 2 small question
2005-01-10 8:44 ` Joel Soete
@ 2005-01-10 8:54 ` Randolph Chung
2005-01-10 17:12 ` Joel Soete
2005-01-11 18:14 ` Joel Soete
0 siblings, 2 replies; 14+ messages in thread
From: Randolph Chung @ 2005-01-10 8:54 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
> I need obviously to understand those details because to work with str I would
> need to check somehow if there are a byte == 0 in the on going word (I already
> find a formula to do this into other arch) to jump to byte_copy (so need
> more work ;-)
maybe you can use something like (not tested):
loop:
ldw 0(source),tmp
uaddcm,nbz tmp,%r0,%r0
b,n byte_copy
b loop
stw tmp, 0(dst)
byte_copy:
....
uaddcm should be able to let you determine if there are any 0's in the
current word in a single insn.
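A hedged C sketch of the same loop structure, for illustration only. The zero-byte test here is the well-known subtract-and-mask formula, used as a stand-in for the uaddcm condition (it is not a model of that instruction). The names has_zero_byte and word_strcpy are hypothetical, and the sketch assumes the source buffer is readable up to the end of the 4-byte chunk that holds the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of x is 0x00. */
static int has_zero_byte(uint32_t x)
{
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* Copy a NUL-terminated string one 32-bit word at a time, dropping
   to a byte loop for the word that contains the terminator.
   Assumption: src may be read in whole 4-byte chunks from its start,
   i.e. up to 3 bytes past the terminator are readable. */
static char *word_strcpy(char *dst, const char *src)
{
    char *d = dst;
    uint32_t tmp;

    assert(dst != NULL && src != NULL);

    for (;;) {
        memcpy(&tmp, src, sizeof tmp);  /* load one word from the source */
        if (has_zero_byte(tmp))
            break;                      /* terminator is in this word */
        memcpy(d, &tmp, sizeof tmp);    /* store the whole word */
        src += sizeof tmp;
        d += sizeof tmp;
    }

    /* byte_copy: finish the last partial word, including the NUL. */
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}
```

Using memcpy for the word load and store keeps the sketch free of alignment assumptions; a real implementation would align the source first so that ldw/stw can be used directly.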
randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
* [parisc-linux] Re: pa_memcpy: 2 small question
2005-01-10 8:54 ` Randolph Chung
@ 2005-01-10 17:12 ` Joel Soete
2005-01-10 17:17 ` Grant Grundler
2005-01-10 20:02 ` Stuart Brady
2005-01-11 18:14 ` Joel Soete
1 sibling, 2 replies; 14+ messages in thread
From: Joel Soete @ 2005-01-10 17:12 UTC (permalink / raw)
To: Randolph Chung; +Cc: parisc-linux
btw I just recovered the libc _wordcopy_fwd_dest_aligned()
>
> > I need obviously to understand those details because to work with str I would
> > need to check somehow if there are a byte == 0 in the on going word (I already
> > find a formula to do this into other arch) to jump to byte_copy (so need
> > more work ;-)
>
> maybe you can use something like (not tested):
>
> loop:
> ldw 0(source),tmp
> uaddcm,nbz tmp,%r0,%r0
Hmm, I am always confused by %r0: it is a magic register containing zero (when read, iirc).
What is uaddcm supposed to do: tmp + ~%r0 (i.e. tmp + 0xffffffff)? Well, I have
to study it ;-)
Well, anyway, if no trap occurs the result is put in %r0: I don't yet understand
well this usage of %r0 as a target reg (except for prefetching)?
> b,n byte_copy
> b loop
> stw tmp, 0(dst)
>
> byte_copy:
> ....
>
> uaddcm should be able to let you determine if there are any 0's in the
> current word in a single insn.
>
Thanks ...
I am still trying to understand the formula I found earlier somewhere:
; NOTE: If a null char. exists, return 0.
; if ((x - 0x01010101) & ~x & 0x80808080)
; return 0;
(here it comes from m32r/lib/usercopy.c)
Joel
* Re: [parisc-linux] Re: pa_memcpy: 2 small question
2005-01-10 17:12 ` Joel Soete
@ 2005-01-10 17:17 ` Grant Grundler
2005-01-10 20:02 ` Stuart Brady
1 sibling, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-10 17:17 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
On Mon, Jan 10, 2005 at 06:12:24PM +0100, Joel Soete wrote:
> well any way if no trap occurs the results is put in %r0: I don't yet well
> understand this usage of %r0 as target reg (excepted for prefetching)?
writes to %r0 are simply discarded.
It's a handy target when we don't care about the result since
it will not cause any interlocks with previous or successive instructions.
grant
* Re: [parisc-linux] Re: pa_memcpy: 2 small question
2005-01-10 17:12 ` Joel Soete
2005-01-10 17:17 ` Grant Grundler
@ 2005-01-10 20:02 ` Stuart Brady
1 sibling, 0 replies; 14+ messages in thread
From: Stuart Brady @ 2005-01-10 20:02 UTC (permalink / raw)
To: parisc-linux
On Mon, Jan 10, 2005 at 06:12:24PM +0100, Joel Soete wrote:
> I still trying to understand the previous formula I found somewhere :
> ; NOTE: If a null char. exists, return 0.
> ; if ((x - 0x01010101) & ~x & 0x80808080)
> ; return 0;
> (here it comes from m32r/lib/usercopy.c)
It's subtracting one from each byte and checking for overflow. If the
most significant bit of a byte is set in x - 1, but not in x, then that
byte of x must be 0.
The & with ~x is used to mask out the most significant bit in each
byte, if it was already set in x.
If a byte is equal to 0, bytes to the "left" of it will be affected by
the overflow, but that doesn't matter. Quite a neat trick, really.
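A small C sketch that checks the formula against a naive per-byte test, for illustration only. The function names are hypothetical, and the sample values are arbitrary choices, not taken from the m32r code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The formula quoted above: the result is nonzero exactly when
   some byte of x is 0x00. */
static uint32_t zero_byte_mask(uint32_t x)
{
    return (x - 0x01010101u) & ~x & 0x80808080u;
}

/* Naive reference: look at each of the four bytes in turn. */
static int any_byte_zero(uint32_t x)
{
    int i;

    for (i = 0; i < 4; i++)
        if (((x >> (8 * i)) & 0xffu) == 0)
            return 1;
    return 0;
}

/* Assert that the trick and the reference agree on a few samples. */
static void check_samples(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x01010101u, 0x80808080u, 0x7f7f7f7fu,
        0x41004243u, 0x01004142u, 0xffffff00u, 0x41424344u
    };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert((zero_byte_mask(samples[i]) != 0) ==
               any_byte_zero(samples[i]));
}
```

The sample 0x01004142 shows the borrow effect mentioned above: its mask is 0x80800000, so the 0x01 byte to the left of the zero byte is flagged too. The mask is therefore reliable as a yes/no answer for the whole word, but not as a per-byte position above the lowest zero byte.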
--
Stuart Brady
* [parisc-linux] Re: pa_memcpy: 2 small question
2005-01-10 8:54 ` Randolph Chung
2005-01-10 17:12 ` Joel Soete
@ 2005-01-11 18:14 ` Joel Soete
2005-01-12 1:49 ` Randolph Chung
1 sibling, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-11 18:14 UTC (permalink / raw)
To: Randolph Chung; +Cc: parisc-linux
Hello Randolph,
>
> > I need obviously to understand those details
Well, I think that I have understood (for the most part) cpy_dstalign(); what
nice work, and so interesting, isn't it :-)
btw I just have an additional question (not to work on now, but later I risk
forgetting it):
why don't you save the usage of OPSIZ (defined as sizeof(unsigned long int))
and use shrpd, ldd, std when ifdef __LP64__? (certainly something else I
missed, sorry)
> > because to work with str I would
> > need to check somehow if there are a byte == 0 in the on going word (I already
> > find a formula to do this into other arch) to jump to byte_copy (so need
> > more work ;-)
>
I can try to continue now ;-)
Thanks again to all for help,
Joel
* [parisc-linux] Re: pa_memcpy: 2 small question
2005-01-11 18:14 ` Joel Soete
@ 2005-01-12 1:49 ` Randolph Chung
0 siblings, 0 replies; 14+ messages in thread
From: Randolph Chung @ 2005-01-12 1:49 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
> why don't you save the usage of OPSIZ (defined sizeof(unsigned long int))
> and use shrpd, ldd, std when ifdef __LP64__ ? (certainly another stuff I
> missed sorry )
that can be done as well, but cpy_dstaligned is supposed to be a
slow-path for the copy routine, and doing __LP64__ stuff means i can't
easily test it in userspace, so i didn't bother. you are certainly
welcome to try.
randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
* [parisc-linux] revisit copy_user_page_asm microbenchmarks
[not found] ` <20050107095143.GN18497@tausq.org>
2005-01-09 19:07 ` [parisc-linux] pa_memcpy: 2 small question Joel Soete
@ 2005-02-20 23:44 ` Grant Grundler
1 sibling, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-02-20 23:44 UTC (permalink / raw)
To: Randolph Chung; +Cc: parisc-linux
On Fri, Jan 07, 2005 at 01:51:43AM -0800, Randolph Chung wrote:
> Grant, do you have any results of a somewhat more macrobenchmark that
> shows what happens with this patch installed?
I don't. I wanted to run the ones that you suggested but just
didn't have time. Before this gets totally lost, here are the results
I did collect for 64-bit kernel using ldd/std in copy_user_page_asm.
Can someone collect a comparable set using the original copy_user_page_asm?
This is on a J6700 (750MHz PA8700) w/4GB RAM.
> e.g. what does it do to "dd if=/dev/zero of=foo bs=1k count=500000"?
Linux gggj6k 2.6.10-pa6-64SMP #1 SMP Thu Jan 6 22:18:36 PST 2005 parisc64 GNU/Linux
root@gggj6k:/home# time dd if=/dev/zero of=foo bs=1k count=500000
"dd" time  "dd" B/s  "time" real  user      sys
17.018413  30085062  0m17.034s    0m0.136s  0m11.153s
15.877587  32246713  0m21.932s    0m0.145s  0m11.866s
11.948184  42851700  0m12.642s    0m0.149s  0m11.785s
14.944728  34259573  0m27.936s    0m0.129s  0m11.861s
12.329126  41527680  0m13.035s    0m0.156s  0m11.896s
11.369272  45033666  0m22.073s    0m0.134s  0m11.605s
> what does it do to a kernel compile?
2.6.10-rc3-pa8 kernel:
real 47m42.656s
user 20m15.723s
sys 66m52.259s
> what does it do to a bonnie run?
Linux gggj6k 2.6.10-pa6-64SMP #1 SMP Thu Jan 6 22:18:36 PST 2005 parisc64 GNU/Linux
root@gggj6k:/mnt# bonnie -u grundler -f -x 3 -m 64SMP -d /mnt
Using uid:1001, gid:1001.
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
64SMP,16G,,,57333,25,15918,12,,,25829,10,166.8,1,16,2368,98,+++++,+++,+++++,+++,2589,99,+++++,+++,7093,99
...
64SMP,16G,,,58014,26,15895,12,,,25461,10,181.7,1,16,2416,98,+++++,+++,+++++,+++,2598,99,+++++,+++,7361,99
...
64SMP,16G,,,57416,25,16201,13,,,25841,10,186.1,1,16,2452,96,+++++,+++,+++++,+++,2621,99,+++++,+++,7544,100
And one more test that I'm not sure is relevant:
time sgp_dd if=/dev/sda of=/dev/sdc bpt=16k count=17781520
2.6.10-rc3-pa8 kernel:
real 15m13.011s
user 0m0.042s
sys 4m6.301s