* [parisc-linux] more cpup.c results
  [not found] <20050105055412.68E06495698@palinux.hppa>
@ 2005-01-05 6:16 ` Grant Grundler
  2005-01-05 8:20 ` Joel Soete
  2005-01-05 8:40 ` Ryan Bradetich
  [not found] ` <20050107095143.GN18497@tausq.org>
  1 sibling, 2 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-05 6:16 UTC
To: parisc-linux

On Tue, Jan 04, 2005 at 10:54:12PM -0700, Grant Grundler wrote:
> add prefetching to copy_user_page_asm
> matches asm now checked into build-tools/cpup.c

I committed a new version of copy_user_page_asm based on the results of build-tools/cpup.c. Here's the output from the last set of cpup2 (4regs) runs:

grundler <577>while :; do ./cpup2; done
First Loop : min 9247 avg 12037 median 11250
Later Loops : min 5568 avg 7006 median 6906
First Loop : min 9180 avg 12051 median 11244
Later Loops : min 5557 avg 7003 median 6904
First Loop : min 9204 avg 12027 median 11239
Later Loops : min 5556 avg 7002 median 6901
First Loop : min 9197 avg 12032 median 11237
Later Loops : min 5546 avg 6996 median 6901
First Loop : min 9300 avg 12032 median 11225
Later Loops : min 5584 avg 7001 median 6901

It's essentially indistinguishable from the cpup3 (6 regs) routine:

grundler <579>while :; do ./cpup3; done
First Loop : min 9188 avg 11992 median 11223
Later Loops : min 5493 avg 7002 median 6874
First Loop : min 9213 avg 11988 median 11224
Later Loops : min 5487 avg 7004 median 6873
First Loop : min 9252 avg 11991 median 11204
Later Loops : min 5487 avg 7004 median 6874
First Loop : min 9228 avg 12021 median 11219
Later Loops : min 5550 avg 7003 median 6879
First Loop : min 9200 avg 11994 median 11215
Later Loops : min 5514 avg 6997 median 6874

Which tells me the L1 cache is accessible in 1 cycle on PA8700. And if other CPU implementations need 2 cycles, it wouldn't hurt to commit the 6regs version.

Can folks try this on PA8000 and PA8200 for me? Check /proc/cpuinfo if you aren't sure what you have.

Should be a simple cut/paste of 4 lines to a shell prompt:

gcc -O2 -o cpup0 cpup.c
gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c
gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c
for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done

Please post the output to the mailing list along with /proc/cpuinfo.

thanks,
grant
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* RE: [parisc-linux] more cpup.c results
  2005-01-05 6:16 ` [parisc-linux] more cpup.c results Grant Grundler
@ 2005-01-05 8:20 ` Joel Soete
  2005-01-05 8:40 ` Ryan Bradetich
  1 sibling, 0 replies; 14+ messages in thread
From: Joel Soete @ 2005-01-05 8:20 UTC
To: Grant Grundler, parisc-linux

Hello Grant,

> -- Original Message --
> Date: Tue, 4 Jan 2005 23:16:13 -0700
> From: Grant Grundler <grundler@parisc-linux.org>
> To: parisc-linux@lists.parisc-linux.org
> Subject: [parisc-linux] more cpup.c results
>
>
> On Tue, Jan 04, 2005 at 10:54:12PM -0700, Grant Grundler wrote:
> > add prefetching to copy_user_page_asm
> > matches asm now checked into build-tools/cpup.c
> [...]
> Which tells me the L1 cache is accessible in 1 cycle on PA8700.
> And if other CPU implementations need 2 cycles, it wouldn't
> hurt to commit the 6regs version.
>
> Can folks try this on PA8000 and PA8200 for me?
> Check /proc/cpuinfo if you aren't sure what you have.
>
Unfortunately I only have pa8600 machines (n4k and b2k) ...

> Should be a simple cut/paste of 4 lines to a shell prompt:
>
> gcc -O2 -o cpup0 cpup.c
> gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c
> gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c
> for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done
>
Anyway, here are some results from a b2k (running a 64-bit 2.6.10-pa4 kernel, obviously):

# for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done
TEST 1
First Loop : min 14462 avg 17576 median 15409
Later Loops : min 6628 avg 8597 median 7953
First Loop : min 10313 avg 13497 median 11727
Later Loops : min 3581 avg 4843 median 4568
First Loop : min 10714 avg 13703 median 11897
Later Loops : min 3630 avg 5033 median 4778
TEST 2
First Loop : min 14445 avg 17452 median 15428
Later Loops : min 6616 avg 8605 median 7945
First Loop : min 10358 avg 13510 median 11755
Later Loops : min 3597 avg 4835 median 4567
First Loop : min 10669 avg 13708 median 11885
Later Loops : min 3618 avg 5034 median 4780
TEST 3
First Loop : min 14459 avg 17437 median 15432
Later Loops : min 6621 avg 8592 median 7943
First Loop : min 10345 avg 13541 median 11732
Later Loops : min 3584 avg 4853 median 4566
First Loop : min 10658 avg 13695 median 11879
Later Loops : min 3637 avg 5032 median 4775
TEST 4
First Loop : min 14503 avg 17455 median 15429
Later Loops : min 6605 avg 8595 median 7945
First Loop : min 10265 avg 13515 median 11740
Later Loops : min 3566 avg 4835 median 4562
First Loop : min 10681 avg 13720 median 11886
Later Loops : min 3651 avg 5035 median 4778
TEST 5
First Loop : min 14472 avg 17460 median 15425
Later Loops : min 6627 avg 8590 median 7938
First Loop : min 10376 avg 13555 median 11742
Later Loops : min 3597 avg 4843 median 4570
First Loop : min 10684 avg 13689 median 11891
Later Loops : min 3663 avg 5027 median 4775

hth,
Joel
* Re: [parisc-linux] more cpup.c results 2005-01-05 6:16 ` [parisc-linux] more cpup.c results Grant Grundler 2005-01-05 8:20 ` Joel Soete @ 2005-01-05 8:40 ` Ryan Bradetich 2005-01-05 16:02 ` Grant Grundler 1 sibling, 1 reply; 14+ messages in thread From: Ryan Bradetich @ 2005-01-05 8:40 UTC (permalink / raw) To: Grant Grundler; +Cc: parisc-linux Grant, > Can folks try this on PA8000 and PA82000 for me? > Check /proc/cpuinfo if you aren't sure what you have. processor : 0 cpu family : PA-RISC 2.0 cpu : PA8200 (PCX-U+) cpu MHz : 200.000000 model : 9000/782/C200+ model name : Raven U 200 (9000/780/C200) hversion : 0x000059d0 sversion : 0x00000481 I-cache : 512 KB D-cache : 1024 KB (WB, 0-way associative) ITLB entries : 120 DTLB entries : 120 - shared with ITLB bogomips : 395.26 software id : 2005736878 > Should be a simple cut/paste of 4 lines to a shell prompt: > > gcc -O2 -o cpup0 cpup.c > gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c > gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c > for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done This is on a 64-bit kernel: $ uname -a Linux vega 2.6.10-pa3 #1 Sun Jan 2 14:28:36 MST 2005 parisc64 GNU/Linux TEST 1 First Loop : min 9990 avg 11444 median 10352 Later Loops : min 6290 avg 8673 median 8885 First Loop : min 8758 avg 10370 median 9312 Later Loops : min 5842 avg 7168 median 7024 First Loop : min 8701 avg 10277 median 9215 Later Loops : min 5670 avg 7244 median 7124 TEST 2 First Loop : min 9990 avg 11451 median 10353 Later Loops : min 6197 avg 8669 median 8880 First Loop : min 8748 avg 10379 median 9318 Later Loops : min 5768 avg 7166 median 7022 First Loop : min 8657 avg 10280 median 9208 Later Loops : min 5773 avg 7239 median 7123 TEST 3 First Loop : min 9993 avg 11442 median 10353 Later Loops : min 6278 avg 8670 median 8880 First Loop : min 8745 avg 10408 median 9318 Later Loops : min 5804 avg 7163 median 7023 First Loop : min 8681 avg 10340 median 9266 Later Loops : min 5663 avg 7238 median 7120 
TEST 4 First Loop : min 9990 avg 11453 median 10347 Later Loops : min 6282 avg 8661 median 8877 First Loop : min 8751 avg 10400 median 9324 Later Loops : min 5750 avg 7171 median 7024 First Loop : min 8622 avg 10283 median 9213 Later Loops : min 5680 avg 7235 median 7119 TEST 5 First Loop : min 10032 avg 11442 median 10348 Later Loops : min 6224 avg 8688 median 8884 First Loop : min 8799 avg 10396 median 9323 Later Loops : min 5751 avg 7165 median 7021 First Loop : min 8622 avg 10286 median 9221 Later Loops : min 5653 avg 7240 median 7120 This is on a 32-bit kernel: $ uname -a Linux vega 2.6.10-pa5 #1 Wed Jan 5 01:14:00 MST 2005 parisc GNU/Linux TEST 1 First Loop : min 10924 avg 11555 median 11090 Later Loops : min 7744 avg 8196 median 8130 First Loop : min 9584 avg 10251 median 9790 Later Loops : min 6451 avg 6836 median 6784 First Loop : min 9487 avg 10104 median 9673 Later Loops : min 6202 avg 6604 median 6550 TEST 2 First Loop : min 10927 avg 11553 median 11097 Later Loops : min 7687 avg 8193 median 8130 First Loop : min 9594 avg 10267 median 9790 Later Loops : min 6451 avg 6853 median 6784 First Loop : min 9477 avg 10117 median 9654 Later Loops : min 6243 avg 6606 median 6549 TEST 3 First Loop : min 10943 avg 11549 median 11104 Later Loops : min 7670 avg 8197 median 8130 First Loop : min 9607 avg 10255 median 9790 Later Loops : min 6451 avg 6836 median 6783 First Loop : min 9487 avg 10144 median 9674 Later Loops : min 6216 avg 6604 median 6550 TEST 4 First Loop : min 10924 avg 11527 median 11083 Later Loops : min 6848 avg 8176 median 8118 First Loop : min 9610 avg 10260 median 9810 Later Loops : min 6451 avg 6837 median 6784 First Loop : min 9493 avg 10140 median 9670 Later Loops : min 6215 avg 6605 median 6552 TEST 5 First Loop : min 10924 avg 11538 median 11087 Later Loops : min 7703 avg 8191 median 8132 First Loop : min 9583 avg 10236 median 9793 Later Loops : min 6451 avg 6837 median 6783 First Loop : min 9487 avg 10132 median 9674 Later Loops : min 6171 
avg 6606 median 6550 The K460 is a 8000 processor ... I'll see if I can get the K460 installed and updated to give you results from that system as well. I am also working on getting you results from a 715/100 as well (currently in the middle of a new debian install). Thanks, - Ryan > thanks, > grant > _______________________________________________ > parisc-linux mailing list > parisc-linux@lists.parisc-linux.org > http://lists.parisc-linux.org/mailman/listinfo/parisc-linux > -- Ryan Bradetich <rbradetich@uswest.net> _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [parisc-linux] more cpup.c results
  2005-01-05 8:40 ` Ryan Bradetich
@ 2005-01-05 16:02 ` Grant Grundler
  0 siblings, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-05 16:02 UTC
To: Ryan Bradetich; +Cc: parisc-linux

On Wed, Jan 05, 2005 at 01:40:56AM -0700, Ryan Bradetich wrote:
> This is on a 64-bit kernel:
> $ uname -a
> Linux vega 2.6.10-pa3 #1 Sun Jan 2 14:28:36 MST 2005 parisc64 GNU/Linux
>
> TEST 1
> First Loop : min 9990 avg 11444 median 10352
> Later Loops : min 6290 avg 8673 median 8885
...

thanks for the results!

> This is on a 32-bit kernel:
> $ uname -a
> Linux vega 2.6.10-pa5 #1 Wed Jan 5 01:14:00 MST 2005 parisc GNU/Linux
>
> TEST 1
> First Loop : min 10924 avg 11555 median 11090
> Later Loops : min 7744 avg 8196 median 8130
> First Loop : min 9584 avg 10251 median 9790
> Later Loops : min 6451 avg 6836 median 6784
> First Loop : min 9487 avg 10104 median 9673
> Later Loops : min 6202 avg 6604 median 6550

Interesting that cpup3 is slightly faster than cpup2 with the 32-bit kernel. Since user space is always 32-bit, I wouldn't have expected a difference in the "Later Loops" output.

> The K460 is a 8000 processor ... I'll see if I can get the K460
> installed and updated to give you results from that system as well. I

Well, don't sweat it. Others might have a PA8000 box already up and running.

> am also working on getting you results from a 715/100
> as well (currently in the middle of a new debian install).

715 is PA1.1. cpup2/3 won't work there. It would be worth trying variants of cpup0 (32-bit) scheduling on PA1.1 machines. I'll leave that as an exercise for others.

thanks,
grant
[parent not found: <20050107095143.GN18497@tausq.org>]
* [parisc-linux] pa_memcpy: 2 small question
  [not found] ` <20050107095143.GN18497@tausq.org>
@ 2005-01-09 19:07 ` Joel Soete
  2005-01-10 0:13 ` [parisc-linux] " Randolph Chung
  2005-02-20 23:44 ` [parisc-linux] revisit copy_user_page_asm microbenchmarks Grant Grundler
  1 sibling, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-09 19:07 UTC
To: Randolph Chung; +Cc: parisc-linux

Hello Randolph,

I am just studying your pa_memcpy code (still to see if I can use it to improve the stuff that you suggested to me: l*). I would like to understand some values in copy_dstalign():

[...]
in the shift computing:
/* Calculate how to shift a word read at the memory operation
aligned srcp to make it aligned for copy. */
sh_1 = 8 * (src % sizeof(unsigned int));
sh_2 = 8 * sizeof(unsigned int) - sh_1;

What does '8' mean (== 2 * word size, i.e. 2 * 32 bits, because MERGE uses shrpw and so 2 (a pair of words)?)

Next, in
switch (len % 4)
{

is it '4' because, as mentioned in the copy_dstalign() description, this 'Handles _4_ words per loop'?

Thanks in advance,
Joel
* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-09 19:07 ` [parisc-linux] pa_memcpy: 2 small question Joel Soete
@ 2005-01-10 0:13 ` Randolph Chung
  2005-01-10 8:44 ` Joel Soete
  0 siblings, 1 reply; 14+ messages in thread
From: Randolph Chung @ 2005-01-10 0:13 UTC
To: Joel Soete; +Cc: parisc-linux

> in the shift computing:
> /* Calculate how to shift a word read at the memory operation
> aligned srcp to make it aligned for copy. */
> sh_1 = 8 * (src % sizeof(unsigned int));
> sh_2 = 8 * sizeof(unsigned int) - sh_1;
>
> what means '8' (== 2 * word size; i.e. 2 * 32 bit because MERGE use shrpw
> and so 2 (a pair of word)? )

no. 8 is # bits/byte. sh_1 is the number of bits to shift a 32-bit integer.

what we are trying to achieve is that given two adjacent 32-bit numbers, we want to extract a 32-bit number "in the middle" of the two (aligned) 32-bit values.

if you look carefully, actually the implementation of MERGE does not use both sh_1 and sh_2. In the original implementation, MERGE was implemented using two SHIFT operations plus an OR operation. This was optimized to use shrpw because this PA insn can do all three operations in a single step, so it saves a lot of cycles.

> next in
> switch (len % 4)
> {
>
> is '4' because as mentioned in copy_dstalign() description this 'Handles
> _4_ words per loop'

yes.

randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
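The shift computation discussed above can be illustrated with a small C sketch. It uses the two-shifts-plus-OR form that Randolph describes as the original MERGE, not the single shrpw instruction. The helper names (load_be32, merge_words, extract_unaligned) are made up for this illustration and are not taken from pa_memcpy; words are assembled in big-endian order by hand so the result matches what a big-endian PA-RISC load would give, whatever the host byte order. The sketch also assumes the source really is misaligned (offset 1..3), since an offset of 0 would make sh_2 a shift by 32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read 4 bytes as one big-endian 32-bit word,
 * mimicking an aligned ldw on PA-RISC. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Two shifts plus an OR: take the low bytes of w0 and the high bytes
 * of w1 to form the word that straddles them. */
static uint32_t merge_words(uint32_t w0, unsigned sh_1,
                            uint32_t w1, unsigned sh_2)
{
    return (w0 << sh_1) | (w1 >> sh_2);
}

/* Extract the 32-bit value starting `offset` bytes into an aligned
 * 8-byte window. offset plays the role of src % sizeof(unsigned int). */
static uint32_t extract_unaligned(const unsigned char *aligned,
                                  unsigned offset)
{
    assert(offset >= 1 && offset <= 3);   /* assumption: misaligned case only */

    unsigned sh_1 = 8 * offset;           /* 8 = bits per byte */
    unsigned sh_2 = 8 * 4 - sh_1;         /* remaining bits of a 32-bit word */

    uint32_t w0 = load_be32(aligned);     /* first aligned word */
    uint32_t w1 = load_be32(aligned + 4); /* next aligned word */

    return merge_words(w0, sh_1, w1, sh_2);
}
```

For the bytes 00 11 22 33 44 55 66 77 and an offset of 1, sh_1 is 8 and sh_2 is 24, so the result is (0x00112233 << 8) | (0x44556677 >> 24), the word made of bytes 11 22 33 44.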
* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10 0:13 ` [parisc-linux] " Randolph Chung
@ 2005-01-10 8:44 ` Joel Soete
  2005-01-10 8:54 ` Randolph Chung
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-10 8:44 UTC
To: Randolph Chung; +Cc: parisc-linux

> > in the shift computing:
> > /* Calculate how to shift a word read at the memory operation
> > aligned srcp to make it aligned for copy. */
> > sh_1 = 8 * (src % sizeof(unsigned int));
> > sh_2 = 8 * sizeof(unsigned int) - sh_1;
> >
> > what means '8' (== 2 * word size; i.e. 2 * 32 bit because MERGE use shrpw
> > and so 2 (a pair of word)? )
>
> no. 8 is # bits/byte. sh_1 is the number of bits to shift a 32-bit
> integer.
>
Ah Ok ;-)

> what we are trying to achieve is that given two adjacent 32-bit numbers,
> we want to extract a 32-bit number "in the middle" of the two (aligned)
> 32-bit values.
>
> if you look carefully, actually the implementation of MERGE does not use
> both sh_1 and sh_2.
Yes ...

> In the original implementation, MERGE was
> implemented using two SHIFT operations plus an OR operation. This was
> optimized to use shrpw because this PA insn can do all three operations
> in a single step, so it saves a lot of cycles.
>
Cool :-)

> > next in
> > switch (len % 4)
> > {
> >
> > is '4' because as mentioned in copy_dstalign() description this 'Handles
> > _4_ words per loop'
>
> yes.
>
I obviously need to understand those details because, to work with str, I would need to check somehow whether there is a byte == 0 in the current word (I already found a formula to do this in another arch) in order to jump to byte_copy (so more work is needed ;-)

Thanks a lot,
Joel
* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10 8:44 ` Joel Soete
@ 2005-01-10 8:54 ` Randolph Chung
  2005-01-10 17:12 ` Joel Soete
  2005-01-11 18:14 ` Joel Soete
  0 siblings, 2 replies; 14+ messages in thread
From: Randolph Chung @ 2005-01-10 8:54 UTC
To: Joel Soete; +Cc: parisc-linux

> I need obvioulsy to understand those details beacuse to work with str I would
> need to check somehow if there are a byte == 0 in the on going word (I already
> find a formula to do this into other arch) to jump to byte_copy (so need
> more work ;-)

maybe you can use something like (not tested):

loop:
    ldw 0(source),tmp
    uaddcm,nbz tmp,%r0,%r0
    b,n byte_copy
    b loop
    stw tmp, 0(dst)

byte_copy:
    ....

uaddcm should be able to let you determine if there are any 0's in the current word in a single insn.

randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
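The control flow of that loop (load a word, test it for a zero byte, store it and continue, otherwise fall through to a byte copy) can be mirrored in C roughly as below. This is only a sketch under stated assumptions: word_has_zero_byte() and word_strcpy() are hypothetical names, the zero test is done one byte lane at a time as a stand-in for the single-instruction test suggested above, and the source buffer is assumed to be readable up to the next multiple of 4 bytes past the terminator, because the word load may read beyond the NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the single-instruction test: nonzero if any of the
 * four byte lanes of w is 0. Works for either byte order. */
static int word_has_zero_byte(uint32_t w)
{
    return (w & 0xff000000u) == 0 || (w & 0x00ff0000u) == 0 ||
           (w & 0x0000ff00u) == 0 || (w & 0x000000ffu) == 0;
}

/* Copy a NUL-terminated string one word at a time, switching to a
 * byte copy once the current word contains the terminator.
 * Assumptions: src is readable up to the next 4-byte multiple past
 * its NUL, and dst has room for strlen(src) + 1 bytes. */
static char *word_strcpy(char *dst, const char *src)
{
    size_t i = 0;
    uint32_t tmp;

    assert(dst != NULL && src != NULL);

    for (;;) {
        memcpy(&tmp, src + i, sizeof tmp);   /* corresponds to ldw */
        if (word_has_zero_byte(tmp))
            break;                           /* corresponds to b,n byte_copy */
        memcpy(dst + i, &tmp, sizeof tmp);   /* corresponds to stw */
        i += sizeof tmp;
    }

    /* byte_copy: finish byte by byte, including the terminator */
    do {
        dst[i] = src[i];
    } while (src[i++] != '\0');

    return dst;
}
```

memcpy is used for the word accesses so the sketch stays valid C for any alignment; a real implementation would first align the pointers, which is where the copy_dstalign() shifting discussed earlier in the thread comes in.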
* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10 8:54 ` Randolph Chung
@ 2005-01-10 17:12 ` Joel Soete
  2005-01-10 17:17 ` Grant Grundler
  2005-01-10 20:02 ` Stuart Brady
  2005-01-11 18:14 ` Joel Soete
  1 sibling, 2 replies; 14+ messages in thread
From: Joel Soete @ 2005-01-10 17:12 UTC
To: Randolph Chung; +Cc: parisc-linux

btw I just recovered the libc _wordcopy_fwd_dest_aligned()

> > I need obvioulsy to understand those details beacuse to work with str I would
> > need to check somehow if there are a byte == 0 in the on going word (I already
> > find a formula to do this into other arch) to jump to byte_copy (so need
> > more work ;-)
>
> maybe you can use something like (not tested):
>
> loop:
> ldw 0(source),tmp
> uaddcm,nbz tmp,%r0,%r0
Hmm, I am always confused by %r0: it is a magic register containing zero (when read, iirc). What is uaddcm supposed to do: tmp + ~%r0 (i.e. tmp + 0xffffffff)? Well, I have to study ;-)
Anyway, if no trap occurs the result is put in %r0: I don't yet understand well this use of %r0 as a target reg (except for prefetching)?

> b,n byte_copy
> b loop
> stw tmp, 0(dst)
>
> byte_copy:
> ....
>
> uaddcm should be able to let you determine if there are any 0's in the
> current word in a single insn.
>
Thanks ...

I am still trying to understand the formula I found earlier:
; NOTE: If a null char. exists, return 0.
; if ((x - 0x01010101) & ~x & 0x80808080)
; return 0;
(here it comes from m32r/lib/useropcy.c)

Joel
* Re: [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10 17:12 ` Joel Soete
@ 2005-01-10 17:17 ` Grant Grundler
  2005-01-10 20:02 ` Stuart Brady
  1 sibling, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-10 17:17 UTC
To: Joel Soete; +Cc: parisc-linux

On Mon, Jan 10, 2005 at 06:12:24PM +0100, Joel Soete wrote:
> well any way if no trap occurs the results is put in %r0: I don't yet well
> understand this usage of %r0 as target reg (excepted for prefetching)?

Writes to %r0 are simply discarded. It's a handy target when we don't care about the result, since it will not cause any interlocks with previous or successive instructions.

grant
* Re: [parisc-linux] Re: pa_memcpy: 2 small question 2005-01-10 17:12 ` Joel Soete 2005-01-10 17:17 ` Grant Grundler @ 2005-01-10 20:02 ` Stuart Brady 1 sibling, 0 replies; 14+ messages in thread From: Stuart Brady @ 2005-01-10 20:02 UTC (permalink / raw) To: parisc-linux On Mon, Jan 10, 2005 at 06:12:24PM +0100, Joel Soete wrote: > I still trying to understand the previous formula I found somewhere : > ; NOTE: If a null char. exists, return 0. > ; if ((x - 0x01010101) & ~x & 0x80808080) > ; return 0; > (here it comes from m32r/lib/useropcy.c) It's subtracting one from each byte and checking for overflow. If the most significant bit is set in x - 1, but not in x, then x must be 0. The & with ~x is used to mask out the most significant bit in each byte, if it was already set in x. If a byte is equal to 0, bytes to the "left" of it will be affected by the overflow, but that doesn't matter. Quite a neat trick, really. -- Stuart Brady _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 14+ messages in thread
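Stuart's explanation can be checked with a few lines of C. The expression is the one Joel quoted; the function names has_zero_byte() and has_zero_byte_selfcheck() are made up for this sketch, and the intermediate values in the comments were worked out by hand for three sample words.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if any byte of x is 0x00.
 *   x - 0x01010101 : subtract one from every byte (with borrows)
 *   & ~x           : ignore bytes whose top bit was already set in x
 *   & 0x80808080   : keep only the top bit of each byte */
static int has_zero_byte(uint32_t x)
{
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* Hand-worked samples; call this from a test program to verify them. */
static void has_zero_byte_selfcheck(void)
{
    /* "ABCD": 0x41424344 - 0x01010101 = 0x40414243,
     * ~x = 0xBEBDBCBB, AND = 0x00010003, masked = 0 -> no zero byte */
    assert(!has_zero_byte(0x41424344u));

    /* 'A' 00 'C' 'D': 0x41004344 - 0x01010101 = 0x3FFF4243,
     * ~x = 0xBEFFBCBB, AND = 0x3EFF0003, masked = 0x00800000 -> zero byte.
     * The borrow also changed the byte to the left (0x41 became 0x3F),
     * which is harmless, as noted above. */
    assert(has_zero_byte(0x41004344u));

    /* Bytes with the top bit already set: 0x80818283 - 0x01010101
     * = 0x7F808182, ~x = 0x7F7E7D7C, AND = 0x7F000100, masked = 0.
     * The ~x term is what prevents a false positive here. */
    assert(!has_zero_byte(0x80818283u));
}
```

Note the sense of the result: this sketch returns nonzero when a zero byte is present, whereas the quoted m32r snippet returns 0 in that case.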
* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10 8:54 ` Randolph Chung
  2005-01-10 17:12 ` Joel Soete
@ 2005-01-11 18:14 ` Joel Soete
  2005-01-12 1:49 ` Randolph Chung
  1 sibling, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-11 18:14 UTC
To: Randolph Chung; +Cc: parisc-linux

Hello Randolph,

> > I need obvioulsy to understand those details

Well, I think that I have understood (for the most part) cpy_dstalign(); what nice work, and so interesting, isn't it :-)

btw I just have an additional question (not to work on now, but I risk forgetting it later): why don't you keep the use of OPSIZ (defined as sizeof(unsigned long int)) and use shrpd, ldd, std when __LP64__ is defined? (certainly something else I missed, sorry)

> > beacuse to work with str I would
> > need to check somehow if there are a byte == 0 in the on going word (I already
> > find a formula to do this into other arch) to jump to byte_copy (so need
> > more work ;-)
>
I can try to continue now ;-)

Thanks again to all for the help,
Joel
* [parisc-linux] Re: pa_memcpy: 2 small question 2005-01-11 18:14 ` Joel Soete @ 2005-01-12 1:49 ` Randolph Chung 0 siblings, 0 replies; 14+ messages in thread From: Randolph Chung @ 2005-01-12 1:49 UTC (permalink / raw) To: Joel Soete; +Cc: parisc-linux > why don't you save the usage of OPSIZ (defined sizeof(unsigned long int)) > and use shrpd, ldd, std when ifdef __LP64__ ? (certainly another stuff I > missed sorry ) that can be done as well, but cpy_dstaligned is supposed to be a slow-path for the copy routine, and doing __LP64__ stuff means i can't easily test it in userspace, so i didn't bother. you are certainly welcome to try. randolph -- Randolph Chung Debian GNU/Linux Developer, hppa/ia64 ports http://www.tausq.org/ _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 14+ messages in thread
* [parisc-linux] revisit copy_user_page_asm microbenchmarks
  [not found] ` <20050107095143.GN18497@tausq.org>
  2005-01-09 19:07 ` [parisc-linux] pa_memcpy: 2 small question Joel Soete
@ 2005-02-20 23:44 ` Grant Grundler
  1 sibling, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-02-20 23:44 UTC
To: Randolph Chung; +Cc: parisc-linux

On Fri, Jan 07, 2005 at 01:51:43AM -0800, Randolph Chung wrote:
> Grant, do you have any results of a somewhat more macrobenchmark that
> shows what happens with this patch installed?

I don't. I wanted to run the ones that you suggested but just didn't have time. Before this gets totally lost, here are the results I did collect for 64-bit kernel using ldd/std in copy_user_page_asm. Can someone collect a comparable set using the original copy_user_page_asm?

This is on J6700 (750 MHz PA8700) w/4GB RAM.

> e.g. what does it do to "dd if=/dev/zero of=foo bs=1k count=500000"?

Linux gggj6k 2.6.10-pa6-64SMP #1 SMP Thu Jan 6 22:18:36 PST 2005 parisc64 GNU/Linux
root@gggj6k:/home# time dd if=/dev/zero of=foo bs=1k count=500000

"dd" time   "dd" B/s   "time" real   user       sys
17.018413   30085062   0m17.034s     0m0.136s   0m11.153s
15.877587   32246713   0m21.932s     0m0.145s   0m11.866s
11.948184   42851700   0m12.642s     0m0.149s   0m11.785s
14.944728   34259573   0m27.936s     0m0.129s   0m11.861s
12.329126   41527680   0m13.035s     0m0.156s   0m11.896s
11.369272   45033666   0m22.073s     0m0.134s   0m11.605s

> what does it do to a kernel compile?

2.6.10-rc3-pa8 kernel:
real 47m42.656s
user 20m15.723s
sys 66m52.259s

> what does it do to a bonnie run?

Linux gggj6k 2.6.10-pa6-64SMP #1 SMP Thu Jan 6 22:18:36 PST 2005 parisc64 GNU/Linux
root@gggj6k:/mnt# bonnie -u grundler -f -x 3 -m 64SMP -d /mnt
Using uid:1001, gid:1001.
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
64SMP,16G,,,57333,25,15918,12,,,25829,10,166.8,1,16,2368,98,+++++,+++,+++++,+++,2589,99,+++++,+++,7093,99
...
64SMP,16G,,,58014,26,15895,12,,,25461,10,181.7,1,16,2416,98,+++++,+++,+++++,+++,2598,99,+++++,+++,7361,99
...
64SMP,16G,,,57416,25,16201,13,,,25841,10,186.1,1,16,2452,96,+++++,+++,+++++,+++,2621,99,+++++,+++,7544,100

And one more test that I'm not sure is relevant:

time sgp_dd if=/dev/sda of=/dev/sdc bpt=16k count=17781520

2.6.10-rc3-pa8 kernel:
real 15m13.011s
user 0m0.042s
sys 4m6.301s
end of thread, other threads:[~2005-02-20 23:44 UTC | newest]
Thread overview: 14+ messages
[not found] <20050105055412.68E06495698@palinux.hppa>
2005-01-05 6:16 ` [parisc-linux] more cpup.c results Grant Grundler
2005-01-05 8:20 ` Joel Soete
2005-01-05 8:40 ` Ryan Bradetich
2005-01-05 16:02 ` Grant Grundler
[not found] ` <20050107095143.GN18497@tausq.org>
2005-01-09 19:07 ` [parisc-linux] pa_memcpy: 2 small question Joel Soete
2005-01-10 0:13 ` [parisc-linux] " Randolph Chung
2005-01-10 8:44 ` Joel Soete
2005-01-10 8:54 ` Randolph Chung
2005-01-10 17:12 ` Joel Soete
2005-01-10 17:17 ` Grant Grundler
2005-01-10 20:02 ` Stuart Brady
2005-01-11 18:14 ` Joel Soete
2005-01-12 1:49 ` Randolph Chung
2005-02-20 23:44 ` [parisc-linux] revisit copy_user_page_asm microbenchmarks Grant Grundler