* [parisc-linux] more cpup.c results
       [not found] <20050105055412.68E06495698@palinux.hppa>
@ 2005-01-05  6:16 ` Grant Grundler
  2005-01-05  8:20   ` Joel Soete
  2005-01-05  8:40   ` Ryan Bradetich
       [not found] ` <20050107095143.GN18497@tausq.org>
  1 sibling, 2 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-05  6:16 UTC (permalink / raw)
  To: parisc-linux

On Tue, Jan 04, 2005 at 10:54:12PM -0700, Grant Grundler wrote:
> 	add prefetching to copy_user_page_asm
> 	matches asm now checked into build-tools/cpup.c

I committed a new version of copy_user_page_asm based on
the results of build-tools/cpup.c.

Here's the output from the last set of cpup2 (4 regs) runs:
grundler <577>while :; do ./cpup2; done
          First Loop : min   9247  avg  12037  median  11250
         Later Loops : min   5568  avg   7006  median   6906
          First Loop : min   9180  avg  12051  median  11244
         Later Loops : min   5557  avg   7003  median   6904
          First Loop : min   9204  avg  12027  median  11239
         Later Loops : min   5556  avg   7002  median   6901
          First Loop : min   9197  avg  12032  median  11237
         Later Loops : min   5546  avg   6996  median   6901
          First Loop : min   9300  avg  12032  median  11225
         Later Loops : min   5584  avg   7001  median   6901

It's essentially indistinguishable from the cpup3 (6 regs) routine:
grundler <579>while :; do ./cpup3; done
          First Loop : min   9188  avg  11992  median  11223
         Later Loops : min   5493  avg   7002  median   6874
          First Loop : min   9213  avg  11988  median  11224
         Later Loops : min   5487  avg   7004  median   6873
          First Loop : min   9252  avg  11991  median  11204
         Later Loops : min   5487  avg   7004  median   6874
          First Loop : min   9228  avg  12021  median  11219
         Later Loops : min   5550  avg   7003  median   6879
          First Loop : min   9200  avg  11994  median  11215
         Later Loops : min   5514  avg   6997  median   6874

Which tells me the L1 cache is accessible in 1 cycle on PA8700.
And if other CPU implementations need 2 cycles, it wouldn't
hurt to commit the 6regs version.

Can folks try this on PA8000 and PA8200 for me?
Check /proc/cpuinfo if you aren't sure what you have.

Should be a simple cut/paste of 4 lines to a shell prompt:

gcc -O2 -o cpup0 cpup.c
gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c
gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c
for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done

Please post the output to the mailing list along with /proc/cpuinfo.

thanks,
grant
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

* RE: [parisc-linux] more cpup.c results
  2005-01-05  6:16 ` [parisc-linux] more cpup.c results Grant Grundler
@ 2005-01-05  8:20   ` Joel Soete
  2005-01-05  8:40   ` Ryan Bradetich
  1 sibling, 0 replies; 14+ messages in thread
From: Joel Soete @ 2005-01-05  8:20 UTC (permalink / raw)
  To: Grant Grundler, parisc-linux

Hello Grant,

> -- Original Message --
> Date: Tue, 4 Jan 2005 23:16:13 -0700
> From: Grant Grundler <grundler@parisc-linux.org>
> To: parisc-linux@lists.parisc-linux.org
> Subject: [parisc-linux] more cpup.c results
> 
> 
> On Tue, Jan 04, 2005 at 10:54:12PM -0700, Grant Grundler wrote:
> > 	add prefetching to copy_user_page_asm
> > 	matches asm now checked into build-tools/cpup.c
> 
[...]
> Which tells me the L1 cache is accessible in 1 cycle on PA8700.
> And if other CPU implementations need 2 cycles, it wouldn't
> hurt to commit the 6regs version.
> 
> Can folks try this on PA8000 and PA8200 for me?
> Check /proc/cpuinfo if you aren't sure what you have.
> 
Unfortunately I only have PA8600 machines (n4k and b2k) ...

> Should be a simple cut/paste of 4 lines to a shell prompt:
> 
> gcc -O2 -o cpup0 cpup.c
> gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c
> gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c
> for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done
> 
Anyway, here are some results from a b2k (running a 64-bit 2.6.10-pa4 kernel, obviously):
# for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done
TEST 1
          First Loop : min  14462  avg  17576  median  15409
         Later Loops : min   6628  avg   8597  median   7953
          First Loop : min  10313  avg  13497  median  11727
         Later Loops : min   3581  avg   4843  median   4568
          First Loop : min  10714  avg  13703  median  11897
         Later Loops : min   3630  avg   5033  median   4778
TEST 2
          First Loop : min  14445  avg  17452  median  15428
         Later Loops : min   6616  avg   8605  median   7945
          First Loop : min  10358  avg  13510  median  11755
         Later Loops : min   3597  avg   4835  median   4567
          First Loop : min  10669  avg  13708  median  11885
         Later Loops : min   3618  avg   5034  median   4780
TEST 3
          First Loop : min  14459  avg  17437  median  15432
         Later Loops : min   6621  avg   8592  median   7943
          First Loop : min  10345  avg  13541  median  11732
         Later Loops : min   3584  avg   4853  median   4566
          First Loop : min  10658  avg  13695  median  11879
         Later Loops : min   3637  avg   5032  median   4775
TEST 4
          First Loop : min  14503  avg  17455  median  15429
         Later Loops : min   6605  avg   8595  median   7945
          First Loop : min  10265  avg  13515  median  11740
         Later Loops : min   3566  avg   4835  median   4562
          First Loop : min  10681  avg  13720  median  11886
         Later Loops : min   3651  avg   5035  median   4778
TEST 5
          First Loop : min  14472  avg  17460  median  15425
         Later Loops : min   6627  avg   8590  median   7938
          First Loop : min  10376  avg  13555  median  11742
         Later Loops : min   3597  avg   4843  median   4570
          First Loop : min  10684  avg  13689  median  11891
         Later Loops : min   3663  avg   5027  median   4775

hth,
      Joel

* Re: [parisc-linux] more cpup.c results
  2005-01-05  6:16 ` [parisc-linux] more cpup.c results Grant Grundler
  2005-01-05  8:20   ` Joel Soete
@ 2005-01-05  8:40   ` Ryan Bradetich
  2005-01-05 16:02     ` Grant Grundler
  1 sibling, 1 reply; 14+ messages in thread
From: Ryan Bradetich @ 2005-01-05  8:40 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

Grant,

> Can folks try this on PA8000 and PA8200 for me?
> Check /proc/cpuinfo if you aren't sure what you have.

processor       : 0
cpu family      : PA-RISC 2.0
cpu             : PA8200 (PCX-U+)
cpu MHz         : 200.000000
model           : 9000/782/C200+
model name      : Raven U 200 (9000/780/C200)
hversion        : 0x000059d0
sversion        : 0x00000481
I-cache         : 512 KB
D-cache         : 1024 KB (WB, 0-way associative)
ITLB entries    : 120
DTLB entries    : 120 - shared with ITLB
bogomips        : 395.26
software id     : 2005736878

> Should be a simple cut/paste of 4 lines to a shell prompt:
> 
> gcc -O2 -o cpup0 cpup.c
> gcc -O2 -march=2.0 -DLP64 -o cpup2 cpup.c
> gcc -O2 -march=2.0 -DLP64 -DUSE6REGS -o cpup3 cpup.c
> for i in 1 2 3 4 5; do echo TEST $i; ./cpup0; ./cpup2; ./cpup3; done

This is on a 64-bit kernel:
$ uname -a
Linux vega 2.6.10-pa3 #1 Sun Jan 2 14:28:36 MST 2005 parisc64 GNU/Linux

TEST 1
          First Loop : min   9990  avg  11444  median  10352
         Later Loops : min   6290  avg   8673  median   8885
          First Loop : min   8758  avg  10370  median   9312
         Later Loops : min   5842  avg   7168  median   7024
          First Loop : min   8701  avg  10277  median   9215
         Later Loops : min   5670  avg   7244  median   7124
TEST 2
          First Loop : min   9990  avg  11451  median  10353
         Later Loops : min   6197  avg   8669  median   8880
          First Loop : min   8748  avg  10379  median   9318
         Later Loops : min   5768  avg   7166  median   7022
          First Loop : min   8657  avg  10280  median   9208
         Later Loops : min   5773  avg   7239  median   7123
TEST 3
          First Loop : min   9993  avg  11442  median  10353
         Later Loops : min   6278  avg   8670  median   8880
          First Loop : min   8745  avg  10408  median   9318
         Later Loops : min   5804  avg   7163  median   7023
          First Loop : min   8681  avg  10340  median   9266
         Later Loops : min   5663  avg   7238  median   7120
TEST 4
          First Loop : min   9990  avg  11453  median  10347
         Later Loops : min   6282  avg   8661  median   8877
          First Loop : min   8751  avg  10400  median   9324
         Later Loops : min   5750  avg   7171  median   7024
          First Loop : min   8622  avg  10283  median   9213
         Later Loops : min   5680  avg   7235  median   7119
TEST 5
          First Loop : min  10032  avg  11442  median  10348
         Later Loops : min   6224  avg   8688  median   8884
          First Loop : min   8799  avg  10396  median   9323
         Later Loops : min   5751  avg   7165  median   7021
          First Loop : min   8622  avg  10286  median   9221
         Later Loops : min   5653  avg   7240  median   7120



This is on a 32-bit kernel:
$ uname -a
Linux vega 2.6.10-pa5 #1 Wed Jan 5 01:14:00 MST 2005 parisc GNU/Linux

TEST 1
          First Loop : min  10924  avg  11555  median  11090
         Later Loops : min   7744  avg   8196  median   8130
          First Loop : min   9584  avg  10251  median   9790
         Later Loops : min   6451  avg   6836  median   6784
          First Loop : min   9487  avg  10104  median   9673
         Later Loops : min   6202  avg   6604  median   6550
TEST 2
          First Loop : min  10927  avg  11553  median  11097
         Later Loops : min   7687  avg   8193  median   8130
          First Loop : min   9594  avg  10267  median   9790
         Later Loops : min   6451  avg   6853  median   6784
          First Loop : min   9477  avg  10117  median   9654
         Later Loops : min   6243  avg   6606  median   6549
TEST 3
          First Loop : min  10943  avg  11549  median  11104
         Later Loops : min   7670  avg   8197  median   8130
          First Loop : min   9607  avg  10255  median   9790
         Later Loops : min   6451  avg   6836  median   6783
          First Loop : min   9487  avg  10144  median   9674
         Later Loops : min   6216  avg   6604  median   6550
TEST 4
          First Loop : min  10924  avg  11527  median  11083
         Later Loops : min   6848  avg   8176  median   8118
          First Loop : min   9610  avg  10260  median   9810
         Later Loops : min   6451  avg   6837  median   6784
          First Loop : min   9493  avg  10140  median   9670
         Later Loops : min   6215  avg   6605  median   6552
TEST 5
          First Loop : min  10924  avg  11538  median  11087
         Later Loops : min   7703  avg   8191  median   8132
          First Loop : min   9583  avg  10236  median   9793
         Later Loops : min   6451  avg   6837  median   6783
          First Loop : min   9487  avg  10132  median   9674
         Later Loops : min   6171  avg   6606  median   6550


The K460 has a PA8000 processor ... I'll see if I can get the K460
installed and updated to give you results from that system as well.  I
am also working on getting you results from a 715/100
(currently in the middle of a new Debian install).

Thanks,

- Ryan

-- 
Ryan Bradetich <rbradetich@uswest.net>

* Re: [parisc-linux] more cpup.c results
  2005-01-05  8:40   ` Ryan Bradetich
@ 2005-01-05 16:02     ` Grant Grundler
  0 siblings, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-05 16:02 UTC (permalink / raw)
  To: Ryan Bradetich; +Cc: parisc-linux

On Wed, Jan 05, 2005 at 01:40:56AM -0700, Ryan Bradetich wrote:
> This is on a 64-bit kernel:
> $ uname -a
> Linux vega 2.6.10-pa3 #1 Sun Jan 2 14:28:36 MST 2005 parisc64 GNU/Linux
> 
> TEST 1
>           First Loop : min   9990  avg  11444  median  10352
>          Later Loops : min   6290  avg   8673  median   8885
...


thanks for the results!

> This is on a 32-bit kernel:
> $ uname -a
> Linux vega 2.6.10-pa5 #1 Wed Jan 5 01:14:00 MST 2005 parisc GNU/Linux
> 
> TEST 1
>           First Loop : min  10924  avg  11555  median  11090
>          Later Loops : min   7744  avg   8196  median   8130
>           First Loop : min   9584  avg  10251  median   9790
>          Later Loops : min   6451  avg   6836  median   6784
>           First Loop : min   9487  avg  10104  median   9673
>          Later Loops : min   6202  avg   6604  median   6550

Interesting that cpup3 is slightly faster than cpup2 with
the 32-bit kernel. Since user space is 32-bit always, I wouldn't
have expected a difference in "Later Loops" output.

> The K460 has a PA8000 processor ... I'll see if I can get the K460
> installed and updated to give you results from that system as well.  I

Well, don't sweat it. Others might have a PA8000 box already up and running.

> am also working on getting you results from a 715/100
> (currently in the middle of a new Debian install).

715 is PA1.1. cpup2/3 won't work there.
It would be worth trying variants of cpup0 (32-bit) scheduling
on PA1.1 machines. I'll leave that as an exercise for others.

thanks,
grant

* [parisc-linux] pa_memcpy: 2 small question
       [not found] ` <20050107095143.GN18497@tausq.org>
@ 2005-01-09 19:07   ` Joel Soete
  2005-01-10  0:13     ` [parisc-linux] " Randolph Chung
  2005-02-20 23:44   ` [parisc-linux] revisit copy_user_page_asm microbenchmarks Grant Grundler
  1 sibling, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-09 19:07 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

Hello Randolph,

I am studying your pa_memcpy code (still to see if I can use it to improve the stuff you suggested to me: l*).
I would like to understand some values in copy_dstalign():
[...]
in the shift computing:
         /* Calculate how to shift a word read at the memory operation
            aligned srcp to make it aligned for copy.  */
         sh_1 = 8 * (src % sizeof(unsigned int));
         sh_2 = 8 * sizeof(unsigned int) - sh_1;

What does '8' mean (== 2 * word size, i.e. 2 * 32 bits, because MERGE uses shrpw and so a pair of words)?

next in
          switch (len % 4)
         {

Is it '4' because, as mentioned in the copy_dstalign() description, this 'Handles _4_ words per loop'?

Thanks in advance,
	Joel

* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-09 19:07   ` [parisc-linux] pa_memcpy: 2 small question Joel Soete
@ 2005-01-10  0:13     ` Randolph Chung
  2005-01-10  8:44       ` Joel Soete
  0 siblings, 1 reply; 14+ messages in thread
From: Randolph Chung @ 2005-01-10  0:13 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

> in the shift computing:
>         /* Calculate how to shift a word read at the memory operation
>            aligned srcp to make it aligned for copy.  */
>         sh_1 = 8 * (src % sizeof(unsigned int));
>         sh_2 = 8 * sizeof(unsigned int) - sh_1;
> 
> What does '8' mean (== 2 * word size, i.e. 2 * 32 bits, because MERGE uses shrpw
> and so a pair of words)?

no. 8 is # bits/byte. sh_1 is the number of bits to shift a 32-bit
integer.

what we are trying to achieve is that given two adjacent 32-bit numbers,
we want to extract a 32-bit number "in the middle" of the two (aligned)
32-bit values.

if you look carefully, actually the implementation of MERGE does not use
both sh_1 and sh_2. In the original implementation, MERGE was
implemented using two SHIFT operations plus an OR operation. This was
optimized to use shrpw because this PA insn can do all three operations
in a single step, so it saves a lot of cycles.
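
A minimal C sketch of the merge described above, assuming word values are held big-endian as on PA-RISC (the byte at the lowest address is the most significant). The names merge_be and merge_funnel are illustrative and are not taken from pa_memcpy; merge_funnel only models the "shift a pair of words" idea attributed to shrpw here, it is not the instruction itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract the 32-bit value that starts `offset` bytes into the aligned
 * word w0 and continues into the next aligned word w1, using two shifts
 * and an OR (the "original implementation" mentioned in the thread).
 */
uint32_t merge_be(uint32_t w0, uint32_t w1, unsigned int offset)
{
    /* offset 0 would need a 32-bit shift, which is undefined in C */
    assert(offset > 0 && offset < sizeof(uint32_t));

    unsigned int sh_1 = 8 * offset;                  /* bits taken from w1 */
    unsigned int sh_2 = 8 * sizeof(uint32_t) - sh_1; /* bits kept from w0  */

    return (w0 << sh_1) | (w1 >> sh_2);
}

/*
 * Same result computed as one right shift of the 64-bit pair w0:w1,
 * keeping the low 32 bits. Only one shift amount is needed, which is
 * why a single-instruction version does not use both sh_1 and sh_2.
 */
uint32_t merge_funnel(uint32_t w0, uint32_t w1, unsigned int offset)
{
    assert(offset > 0 && offset < sizeof(uint32_t));

    uint64_t pair = ((uint64_t)w0 << 32) | w1;
    return (uint32_t)(pair >> (8 * (sizeof(uint32_t) - offset)));
}
```

For example, with w0 = 0x00112233, w1 = 0x44556677 and a source pointer one byte past an aligned address, both functions yield 0x11223344.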

> next in
>          switch (len % 4)
>         {
> 
> Is it '4' because, as mentioned in the copy_dstalign() description, this 'Handles
> _4_ words per loop'?

yes.

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/

* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10  0:13     ` [parisc-linux] " Randolph Chung
@ 2005-01-10  8:44       ` Joel Soete
  2005-01-10  8:54         ` Randolph Chung
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-10  8:44 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux


> > in the shift computing:
> >         /* Calculate how to shift a word read at the memory operation
> >            aligned srcp to make it aligned for copy.  */
> >         sh_1 = 8 * (src % sizeof(unsigned int));
> >         sh_2 = 8 * sizeof(unsigned int) - sh_1;
> > 
> > What does '8' mean (== 2 * word size, i.e. 2 * 32 bits, because MERGE uses shrpw
> > and so a pair of words)?
> 
> no. 8 is # bits/byte. sh_1 is the number of bits to shift a 32-bit
> integer.
> 
Ah Ok ;-)

> what we are trying to achieve is that given two adjacent 32-bit numbers,
> we want to extract a 32-bit number "in the middle" of the two (aligned)
> 32-bit values.
> 
> if you look carefully, actually the implementation of MERGE does not use
> both sh_1 and sh_2.
Yes ...

> In the original implementation, MERGE was
> implemented using two SHIFT operations plus an OR operation. This was
> optimized to use shrpw because this PA insn can do all three operations
> in a single step, so it saves a lot of cycles.
> 
Cool :-)

> > next in
> >          switch (len % 4)
> >         {
> > 
> > Is it '4' because, as mentioned in the copy_dstalign() description, this 'Handles
> > _4_ words per loop'?
> 
> yes.
> 
I obviously need to understand those details because, to work with str, I would
need to check somehow whether there is a byte == 0 in the current word (I have
already found a formula to do this in another arch) and jump to byte_copy (so
more work is needed ;-)

Thanks a lot,
    Joel

* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10  8:44       ` Joel Soete
@ 2005-01-10  8:54         ` Randolph Chung
  2005-01-10 17:12           ` Joel Soete
  2005-01-11 18:14           ` Joel Soete
  0 siblings, 2 replies; 14+ messages in thread
From: Randolph Chung @ 2005-01-10  8:54 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

> I obviously need to understand those details because, to work with str, I would
> need to check somehow whether there is a byte == 0 in the current word (I have
> already found a formula to do this in another arch) and jump to byte_copy (so
> more work is needed ;-)

maybe you can use something like (not tested):

loop:
ldw 0(source),tmp
uaddcm,nbz tmp,%r0,%r0
b,n byte_copy
b loop
stw tmp, 0(dst)

byte_copy:
....

uaddcm should be able to let you determine if there are any 0's in the
current word in a single insn.
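
The control flow of the loop above can be sketched in C as follows. This is only an illustration under stated assumptions: copy_words_until_nul and word_has_zero_byte are hypothetical names, the zero-byte test is done byte by byte rather than with a single instruction, and the caller is assumed to pass whole 32-bit words so that nothing is read past the end of the source buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-by-byte stand-in for the single-instruction zero-byte test. */
int word_has_zero_byte(uint32_t w)
{
    unsigned char b[sizeof w];

    memcpy(b, &w, sizeof w);
    return !b[0] || !b[1] || !b[2] || !b[3];
}

/*
 * Copy whole words from src to dst until a word containing a zero byte
 * is found or max_words have been copied. Returns the number of words
 * copied; the remaining tail is left for a byte-wise copy.
 */
size_t copy_words_until_nul(uint32_t *dst, const uint32_t *src,
                            size_t max_words)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    for (i = 0; i < max_words; i++) {
        uint32_t tmp = src[i];          /* load one word              */

        if (word_has_zero_byte(tmp))    /* terminator is in this word */
            break;                      /* hand over to the byte copy */
        dst[i] = tmp;                   /* store the word and loop    */
    }
    return i;
}
```

With the string "abcdefghij" stored in three words, the function copies the first two words ("abcdefgh") and returns 2, leaving "ij" and the terminator for the byte-wise path.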

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/

* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10  8:54         ` Randolph Chung
@ 2005-01-10 17:12           ` Joel Soete
  2005-01-10 17:17             ` Grant Grundler
  2005-01-10 20:02             ` Stuart Brady
  2005-01-11 18:14           ` Joel Soete
  1 sibling, 2 replies; 14+ messages in thread
From: Joel Soete @ 2005-01-10 17:12 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

btw I just recover the libc _wordcopy_fwd_dest_aligned()

> 
> > I obviously need to understand those details because, to work with str, I would
> > need to check somehow whether there is a byte == 0 in the current word (I have
> > already found a formula to do this in another arch) and jump to byte_copy (so
> > more work is needed ;-)
> 
> maybe you can use something like (not tested):
> 
> loop:
> ldw 0(source),tmp
> uaddcm,nbz tmp,%r0,%r0
Hmm, I am always confused by %r0: it is a magic register containing zero (when read, iirc).
What is uaddcm supposed to do: tmp + ~%r0 (i.e. tmp + 0xffffffff)? Well, I have
to study it ;-)
Anyway, if no trap occurs the result is put in %r0: I don't yet fully
understand this use of %r0 as a target register (except for prefetching)?

> b,n byte_copy
> b loop
> stw tmp, 0(dst)
> 
> byte_copy:
> ....
> 
> uaddcm should be able to let you determine if there are any 0's in the
> current word in a single insn.
> 
Thanks ...
I am still trying to understand the formula I found earlier:
	; NOTE: If a null char. exists, return 0.
	; if ((x - 0x01010101) & ~x & 0x80808080)
	;     return 0;
(it comes from m32r/lib/usercopy.c)

Joel

* Re: [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10 17:12           ` Joel Soete
@ 2005-01-10 17:17             ` Grant Grundler
  2005-01-10 20:02             ` Stuart Brady
  1 sibling, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-01-10 17:17 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Mon, Jan 10, 2005 at 06:12:24PM +0100, Joel Soete wrote:
> Anyway, if no trap occurs the result is put in %r0: I don't yet fully
> understand this use of %r0 as a target register (except for prefetching)?

writes to %r0 are simply discarded.
It's a handy target when we don't care about the result since
it will not cause any interlocks with previous or successive instructions.

grant

* Re: [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10 17:12           ` Joel Soete
  2005-01-10 17:17             ` Grant Grundler
@ 2005-01-10 20:02             ` Stuart Brady
  1 sibling, 0 replies; 14+ messages in thread
From: Stuart Brady @ 2005-01-10 20:02 UTC (permalink / raw)
  To: parisc-linux

On Mon, Jan 10, 2005 at 06:12:24PM +0100, Joel Soete wrote:
> I am still trying to understand the formula I found earlier:
> 	; NOTE: If a null char. exists, return 0.
> 	; if ((x - 0x01010101) & ~x & 0x80808080)
> 	;     return 0;
> (it comes from m32r/lib/usercopy.c)

It's subtracting one from each byte and checking for overflow.  If the
most significant bit is set in x - 1, but not in x, then x must be 0.
The & with ~x is used to mask out the most significant bit in each
byte, if it was already set in x.

If a byte is equal to 0, bytes to the "left" of it will be affected by
the overflow, but that doesn't matter.  Quite a neat trick, really.
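
A small C sketch of the formula as explained above, for 32-bit words. The function names are illustrative; unlike the quoted m32r comment, has_zero_byte returns nonzero when a zero byte is present. The slow version is included only as a reference to compare against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if any of the four bytes of x is zero. */
int has_zero_byte(uint32_t x)
{
    /*
     * x - 0x01010101 subtracts one from each byte; a byte that was 0
     * wraps around and gets its top bit set. "& ~x" drops bytes whose
     * top bit was already set in x, and "& 0x80808080" keeps only the
     * top bit of each byte.
     */
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* Reference implementation: test each byte on its own. */
int has_zero_byte_slow(uint32_t x)
{
    for (int shift = 0; shift < 32; shift += 8)
        if (((x >> shift) & 0xffu) == 0)
            return 1;
    return 0;
}

/* Compare both versions on a few hand-picked words. */
void check_zero_byte_trick(void)
{
    const uint32_t samples[] = {
        0x00000000u, 0x01010101u, 0x80808080u,
        0x11003344u, 0xffffff00u, 0x12345678u,
    };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(has_zero_byte(samples[i]) == has_zero_byte_slow(samples[i]));
}
```

Note that 0x80808080 is correctly reported as having no zero byte: the top bits are set after the subtraction is masked, but "& ~x" removes them because they were already set in x.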
-- 
Stuart Brady

* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-10  8:54         ` Randolph Chung
  2005-01-10 17:12           ` Joel Soete
@ 2005-01-11 18:14           ` Joel Soete
  2005-01-12  1:49             ` Randolph Chung
  1 sibling, 1 reply; 14+ messages in thread
From: Joel Soete @ 2005-01-11 18:14 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

Hello Randolph,

> 
> > I obviously need to understand those details
Well, I think I have understood (most of) cpy_dstalign(). What nice work, and
so interesting, isn't it :-)

BTW, I have an additional question (not to work on now, but I risk forgetting
it later):
why don't you keep the use of OPSIZ (defined as sizeof(unsigned long int))
and use shrpd, ldd, std under #ifdef __LP64__? (Certainly something else I
missed, sorry.)

> > because, to work with str, I would
> > need to check somehow whether there is a byte == 0 in the current word (I have
> > already found a formula to do this in another arch) and jump to byte_copy (so
> > more work is needed ;-)
> 
I can try to continue now ;-)

Thanks again to all for help,
    Joel

* [parisc-linux] Re: pa_memcpy: 2 small question
  2005-01-11 18:14           ` Joel Soete
@ 2005-01-12  1:49             ` Randolph Chung
  0 siblings, 0 replies; 14+ messages in thread
From: Randolph Chung @ 2005-01-12  1:49 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

> why don't you keep the use of OPSIZ (defined as sizeof(unsigned long int))
> and use shrpd, ldd, std under #ifdef __LP64__? (Certainly something else I
> missed, sorry.)

that can be done as well, but cpy_dstaligned is supposed to be a
slow-path for the copy routine, and doing __LP64__ stuff means i can't
easily test it in userspace, so i didn't bother. you are certainly
welcome to try.

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/

* [parisc-linux] revisit copy_user_page_asm microbenchmarks
       [not found] ` <20050107095143.GN18497@tausq.org>
  2005-01-09 19:07   ` [parisc-linux] pa_memcpy: 2 small question Joel Soete
@ 2005-02-20 23:44   ` Grant Grundler
  1 sibling, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-02-20 23:44 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

On Fri, Jan 07, 2005 at 01:51:43AM -0800, Randolph Chung wrote:
> Grant, do you have any results of a somewhat more macrobenchmark that
> shows what happens with this patch installed?

I don't. I wanted to run the ones that you suggested but just
didn't have time. Before this gets totally lost, here are the results
I did collect for 64-bit kernel using ldd/std in copy_user_page_asm.
Can someone collect a comparable set using the original copy_user_page_asm?

This is on a J6700 (750MHz PA8700) w/4GB RAM.

>  e.g. what does it do to "dd if=/dev/zero of=foo bs=1k count=500000"?

Linux gggj6k 2.6.10-pa6-64SMP #1 SMP Thu Jan 6 22:18:36 PST 2005 parisc64 GNU/Linux

root@gggj6k:/home# time dd if=/dev/zero of=foo bs=1k count=500000

"dd" time	"dd" B/s	"time" real	user		sys
17.018413	30085062	0m17.034s	0m0.136s	0m11.153s
15.877587	32246713	0m21.932s	0m0.145s	0m11.866s
11.948184	42851700	0m12.642s	0m0.149s	0m11.785s
14.944728	34259573	0m27.936s	0m0.129s	0m11.861s
12.329126	41527680	0m13.035s	0m0.156s	0m11.896s
11.369272	45033666	0m22.073s	0m0.134s	0m11.605s
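
The "dd" B/s column appears to be the total bytes written (count * bs = 500000 * 1024 = 512000000) divided by dd's elapsed time. A minimal sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

/* Bytes per second for a dd run that wrote `count` blocks of `block_size` bytes. */
double dd_bytes_per_second(unsigned long count, unsigned long block_size,
                           double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return ((double)count * (double)block_size) / elapsed_seconds;
}
```

For the first row, dd_bytes_per_second(500000, 1024, 17.018413) comes out at roughly 30.09 million B/s, which matches the reported 30085062 to within rounding of the printed elapsed time.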


> what does it do to a kernel compile?

2.6.10-rc3-pa8 kernel:
real    47m42.656s
user    20m15.723s
sys     66m52.259s


> what does it do to a bonnie run?

Linux gggj6k 2.6.10-pa6-64SMP #1 SMP Thu Jan 6 22:18:36 PST 2005 parisc64 GNU/Linux

root@gggj6k:/mnt# bonnie -u grundler -f -x 3 -m 64SMP -d /mnt
Using uid:1001, gid:1001.
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
64SMP,16G,,,57333,25,15918,12,,,25829,10,166.8,1,16,2368,98,+++++,+++,+++++,+++,2589,99,+++++,+++,7093,99
...
64SMP,16G,,,58014,26,15895,12,,,25461,10,181.7,1,16,2416,98,+++++,+++,+++++,+++,2598,99,+++++,+++,7361,99
...
64SMP,16G,,,57416,25,16201,13,,,25841,10,186.1,1,16,2452,96,+++++,+++,+++++,+++,2621,99,+++++,+++,7544,100



And one more test that I'm not sure is relevant:

	time sgp_dd if=/dev/sda of=/dev/sdc bpt=16k count=17781520

2.6.10-rc3-pa8 kernel:
real    15m13.011s
user    0m0.042s
sys     4m6.301s

