* glibc: optimized ppc bcopy
@ 2003-04-11 21:04 Rob Latham
From: Rob Latham @ 2003-04-11 21:04 UTC
  To: linuxppc-dev


I noticed something when comparing lmbench numbers between OS X and
Linux on the same hardware: Linux wins or is close in nearly every
category, with one big exception: Bcopy (libc), where OS X is almost
twice as fast.

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
aragorn   Linux 2.4.20- 198. 216. 90.1  206.5  173.1  122.8  124.2 173. 358.3
os-x         Darwin 6.4 124. 121. 80.5  150.6  178.1  239.8  123.4 178. 411.9

So I looked a bit closer at glibc: there are no optimized PowerPC
string or memory operations (later confirmed by the glibc web
pages).

I know there are a zillion PowerPC variants: would it be hard to
write assembly that works with all of them?  I know almost nothing about
PowerPC assembly, but this might be a fun place to start learning.  Of
course, if anyone else has already started such an undertaking, I'll
defer to them and go work on something else.

For those curious, the full lmbench run can be found here:
http://terizla.org/~robl/pbook/benchmarks/lmbench-linux_vs_osx.1
(linux does quite well :> )
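
For anyone wanting to get a comparable libc copy figure outside lmbench, a rough C sketch of that kind of measurement follows. It is not lmbench's code: the helper name copy_bandwidth_mbs, the use of memcpy and clock(), and the choice of 1 MB = 10^6 bytes are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of lmbench: copy `size` bytes `reps`
   times with the libc memcpy and return the rate in MB/s, taking
   1 MB = 10^6 bytes.  Returns -1.0 on bad input or failed allocation. */
double copy_bandwidth_mbs(size_t size, int reps)
{
	unsigned char *src, *dst;
	volatile unsigned char sink = 0;	/* keeps the copies observable */
	clock_t start, end;
	double secs;
	int i;

	if (size == 0 || reps <= 0)
		return -1.0;
	src = malloc(size);
	dst = malloc(size);
	if (src == NULL || dst == NULL) {
		free(src);
		free(dst);
		return -1.0;
	}
	memset(src, 1, size);		/* touch both buffers before timing */
	memset(dst, 0, size);

	start = clock();		/* CPU time; enough for a sketch */
	for (i = 0; i < reps; i++) {
		src[0] = (unsigned char)i;	/* make every copy differ */
		memcpy(dst, src, size);
		sink ^= dst[0];
	}
	end = clock();

	free(src);
	free(dst);

	secs = (double)(end - start) / CLOCKS_PER_SEC;
	if (secs <= 0.0)
		secs = 1.0 / CLOCKS_PER_SEC;	/* below timer resolution */
	return ((double)size * reps) / secs / 1e6;
}
```

A call such as copy_bandwidth_mbs(8 << 20, 20), with a buffer well beyond the cache sizes, should give a number in the same spirit as the Bcopy (libc) column, though not one directly comparable with lmbench's own result.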

==rob

--
Rob Latham                                        Chicago, IL USA

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: glibc: optimized ppc bcopy
From: Kenneth Johansson @ 2003-04-11 21:37 UTC
  To: Rob Latham; +Cc: linuxppc-dev@lists.linuxppc.org

On Fri, 2003-04-11 at 23:04, Rob Latham wrote:
 
> So I looked a bit closer at glibc: there are no optimized PowerPC
> string or memory operations (later confirmed by the glibc web
> pages).
> 
> I know there are a zillion PowerPC variants: would it be hard to
> write assembly that works with all of them?  I know almost nothing about
> PowerPC assembly, but this might be a fun place to start learning.  Of
> course, if anyone else has already started such an undertaking, I'll
> defer to them and go work on something else.

You could start with these. I have not used them with a recent version
of glibc, but they used to work.
 
-- 
Kenneth Johansson	
Ericsson AB                       Tel: +46 8 719 70 20
Tellusborgsvägen  90              Fax: +46 8 719 29 45
126 25 Stockholm                  ken@switchboard.ericsson.se

[-- Attachment #2: bcopy.S --]
[-- Type: text/plain, Size: 1132 bytes --]

/* Optimized bcopy `implementation' for PowerPC.
   Copyright (C) 1999 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include <sysdep.h>

ENTRY(bcopy)
/* void bcopy(const void *src [r3], void *dest [r4], size_t n [r5]) */
	mr	%r6,%r3			/* swap the first two arguments */
	mr	%r3,%r4
	mr	%r4,%r6
	b	memmove@local		/* bcopy must cope with overlapping regions */
END(bcopy)
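
In C terms the stub above does nothing but swap its first two arguments and tail-call a copy routine. A minimal sketch, assuming the copy routine should tolerate overlap (bcopy has traditionally been specified that way); the name sketch_bcopy is hypothetical and chosen only to avoid clashing with the real bcopy:

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: bcopy takes (src, dest, n), memmove takes (dest, src, n).
   memmove is used because the regions may overlap. */
void sketch_bcopy(const void *src, void *dest, size_t n)
{
	memmove(dest, src, n);
}
```

Because the routine returns nothing and the arguments are already in registers, the assembly version needs no stack frame: three register moves and one branch.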

[-- Attachment #3: memcpy.S --]
[-- Type: text/plain, Size: 1908 bytes --]

/* Optimized memcpy implementation for PowerPC.
   Copyright (C) 1996 Paul Mackerras.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include <sysdep.h>

ENTRY(memcpy)
/* void * [r3] memcpy(void *dest [r3], const void *src [r4], size_t n [r5]) */

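/* Registers used:
   r0:  temporary
   r3:	saved `dest'
   r4:	pointer to previous word in src
   r5:	number of bytes left to copy
   r6:	pointer to previous word in dest
   r7:  temporary
   r8:  temporary (used to move words)
*/
	srwi.	%r7,%r5,3		/* r7 = r5 >> 3 (number of 8-byte blocks) */
	addi	%r6,%r3,-4
	addi	%r4,%r4,-4
	beq	2f			/* if less than 8 bytes to do */
	andi.	%r0,%r6,3		/* is dest word aligned? */
	mtctr	%r7
	bne	5f			/* no: copy leading bytes first */
1:	lwz	%r7,4(%r4)		/* main loop: two words per iteration */
	lwzu	%r8,8(%r4)
	stw	%r7,4(%r6)
	stwu	%r8,8(%r6)
	bdnz	1b
	andi.	%r5,%r5,7		/* r5 = bytes left after the 8-byte blocks */
2:	cmplwi	0,%r5,4
	blt	3f
	lwzu	%r0,4(%r4)		/* copy one more word if >= 4 bytes left */
	addi	%r5,%r5,-4
	stwu	%r0,4(%r6)
3:	cmpwi	0,%r5,0
	beqlr				/* nothing left: return */
	mtctr	%r5
	addi	%r4,%r4,3		/* bias pointers for byte-wise lbzu/stbu */
	addi	%r6,%r6,3
4:	lbzu	%r0,1(%r4)		/* copy the trailing bytes */
	stbu	%r0,1(%r6)
	bdnz	4b
	blr
5:	subfic	%r0,%r0,4		/* r0 = bytes needed to word-align dest */
	mtctr	%r0
6:	lbz	%r7,4(%r4)
	addi	%r4,%r4,1
	stb	%r7,4(%r6)
	addi	%r6,%r6,1
	bdnz	6b
	subf	%r5,%r0,%r5		/* r5 -= bytes just copied */
	srwi.	%r7,%r5,3		/* recompute the 8-byte block count */
	beq	2b
	mtctr	%r7
	b	1b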
END(memcpy)
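
For readers who, like the original poster, are new to PowerPC assembly, the control flow of memcpy.S can be sketched in C roughly as follows. This is an illustration of the strategy only, not the glibc code: sketch_memcpy is a hypothetical name, and the 4-byte memcpy calls stand in for the lwz/stw word accesses, which in the assembly may hit an unaligned src and rely on the CPU tolerating that.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the strategy used in memcpy.S. */
void *sketch_memcpy(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	uint32_t w0, w1;

	/* Labels 5/6: with at least 8 bytes to do, copy single bytes
	   until dest is word aligned (at most 3 bytes). */
	if (n >= 8) {
		while (((uintptr_t)d & 3) != 0) {
			*d++ = *s++;
			n--;
		}
	}
	/* Label 1: two words (8 bytes) per loop iteration. */
	while (n >= 8) {
		memcpy(&w0, s, 4);
		memcpy(&w1, s + 4, 4);
		memcpy(d, &w0, 4);
		memcpy(d + 4, &w1, 4);
		s += 8;
		d += 8;
		n -= 8;
	}
	/* Label 2: one more word if at least 4 bytes remain. */
	if (n >= 4) {
		memcpy(&w0, s, 4);
		memcpy(d, &w0, 4);
		s += 4;
		d += 4;
		n -= 4;
	}
	/* Label 4: the last 0 to 3 bytes, one at a time. */
	while (n > 0) {
		*d++ = *s++;
		n--;
	}
	return dest;
}
```

Only dest is aligned before the word loop; src is taken as it comes. That keeps the code short and portable across PowerPC variants, at the cost of unaligned loads whenever src and dest have different alignments.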

[-- Attachment #4: memmove.S --]
[-- Type: text/plain, Size: 2350 bytes --]

/* Optimized memmove implementation for PowerPC.
   Copyright (C) 1996 Paul Mackerras.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include <sysdep.h>

ENTRY(memmove)
/* void * [r3] memmove(void *dest [r3], const void *src [r4], size_t n [r5]) */
	cmplw	0,%r3,%r4
	bgt	backwards_memcpy
	/* fall through */

forward_memcpy:
	srwi.	%r7,%r5,3		/* r7 = r5 >> 3 */
	addi	%r6,%r3,-4
	addi	%r4,%r4,-4
	beq	2f			/* if less than 8 bytes to do */
	andi.	%r0,%r6,3		/* get dest word aligned */
	mtctr	%r7
	bne	5f
1:	lwz	%r7,4(%r4)
	lwzu	%r8,8(%r4)
	stw	%r7,4(%r6)
	stwu	%r8,8(%r6)
	bdnz	1b
	andi.	%r5,%r5,7
2:	cmplwi	0,%r5,4
	blt	3f
	lwzu	%r0,4(%r4)
	addi	%r5,%r5,-4
	stwu	%r0,4(%r6)
3:	cmpwi	0,%r5,0
	beqlr
	mtctr	%r5
	addi	%r4,%r4,3
	addi	%r6,%r6,3
4:	lbzu	%r0,1(%r4)
	stbu	%r0,1(%r6)
	bdnz	4b
	blr
5:	subfic	%r0,%r0,4
	mtctr	%r0
6:	lbz	%r7,4(%r4)
	addi	%r4,%r4,1
	stb	%r7,4(%r6)
	addi	%r6,%r6,1
	bdnz	6b
	subf	%r5,%r0,%r5
	srwi.	%r7,%r5,3
	beq	2b
	mtctr	%r7
	b	1b

backwards_memcpy:
	rlwinm.	%r7,%r5,32-3,3,31	/* r7 = r5 >> 3 */
	add	%r6,%r3,%r5
	add	%r4,%r4,%r5
	beq	2f
	andi.	%r0,%r6,3
	mtctr	%r7
	bne	5f
1:	lwz	%r7,-4(%r4)
	lwzu	%r8,-8(%r4)
	stw	%r7,-4(%r6)
	stwu	%r8,-8(%r6)
	bdnz	1b
	andi.	%r5,%r5,7
2:	cmplwi	0,%r5,4
	blt	3f
	lwzu	%r0,-4(%r4)
	subi	%r5,%r5,4
	stwu	%r0,-4(%r6)
3:	cmpwi	0,%r5,0
	beqlr
	mtctr	%r5
4:	lbzu	%r0,-1(%r4)
	stbu	%r0,-1(%r6)
	bdnz	4b
	blr
5:	mtctr	%r0
6:	lbzu	%r7,-1(%r4)
	stbu	%r7,-1(%r6)
	bdnz	6b
	subf	%r5,%r0,%r5
	rlwinm.	%r7,%r5,32-3,3,31
	beq	2b
	mtctr	%r7
	b	1b
END(memmove)
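
The forward path of memmove.S is the same code as memcpy.S; what memmove adds is the unsigned pointer comparison at entry that picks a copy direction. A byte-at-a-time C sketch of just that decision (sketch_memmove is a hypothetical name, and the word-sized fast paths are deliberately left out):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the direction choice made at the top of
   memmove.S; bytes only, no word-at-a-time fast path. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if ((uintptr_t)d > (uintptr_t)s) {
		/* backwards_memcpy: start just past the end and walk
		   down, so overlapping source bytes are read before
		   they are overwritten. */
		d += n;
		s += n;
		while (n > 0) {
			*--d = *--s;
			n--;
		}
	} else {
		/* forward_memcpy: dest is at or below src. */
		while (n > 0) {
			*d++ = *s++;
			n--;
		}
	}
	return dest;
}
```

The comparison does not check whether the regions actually overlap; copying backwards whenever dest is above src is always safe, so one compare and branch is all the entry path costs.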

[-- Attachment #5: mempcpy.S --]
[-- Type: text/plain, Size: 1784 bytes --]

/* Optimized mempcpy implementation for PowerPC.
   Copyright (C) 1996 Paul Mackerras.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include <sysdep.h>

ENTRY(__mempcpy)
/* void * [r3] __mempcpy(void *dest [r3], const void *src [r4], size_t n [r5])
 */
	srwi.	%r7,%r5,3		/* r7 = r5 >> 3 */
	addi	%r6,%r3,-4
	addi	%r4,%r4,-4
	add	%r3,%r3,%r5		/* set up return value */
	beq	2f			/* if less than 8 bytes to do */
	andi.	%r0,%r6,3		/* get dest word aligned */
	mtctr	%r7
	bne	5f
1:	lwz	%r7,4(%r4)
	lwzu	%r8,8(%r4)
	stw	%r7,4(%r6)
	stwu	%r8,8(%r6)
	bdnz	1b
	andi.	%r5,%r5,7
2:	cmplwi	0,%r5,4
	blt	3f
	lwzu	%r0,4(%r4)
	addi	%r5,%r5,-4
	stwu	%r0,4(%r6)
3:	cmpwi	0,%r5,0
	beqlr
	mtctr	%r5
	addi	%r4,%r4,3
	addi	%r6,%r6,3
4:	lbzu	%r0,1(%r4)
	stbu	%r0,1(%r6)
	bdnz	4b
	blr
5:	subfic	%r0,%r0,4
	mtctr	%r0
6:	lbz	%r7,4(%r4)
	addi	%r4,%r4,1
	stb	%r7,4(%r6)
	addi	%r6,%r6,1
	bdnz	6b
	subf	%r5,%r0,%r5
	srwi.	%r7,%r5,3
	beq	2b
	mtctr	%r7
	b	1b
END(__mempcpy)

weak_alias (__mempcpy, mempcpy)
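
mempcpy.S is again the memcpy body; the one extra instruction, "add %r3,%r3,%r5", makes the function return dest + n instead of dest. In C the difference looks like this (sketch_mempcpy is a hypothetical name used to avoid clashing with the GNU extension itself):

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the mempcpy contract: copy n bytes, then return a pointer
   to the byte just past the last one written. */
void *sketch_mempcpy(void *dest, const void *src, size_t n)
{
	return (char *)memcpy(dest, src, n) + n;
}
```

The returned pointer is where the next piece of data should go, so consecutive copies can be chained without recomputing offsets, for example p = sketch_mempcpy(buf, "foo", 3); followed by p = sketch_mempcpy(p, "bar", 4);.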


* Re: glibc: optimized ppc bcopy
From: Rob Baxter @ 2003-04-11 22:21 UTC
  To: Rob Latham; +Cc: linuxppc-dev


On Fri, Apr 11, 2003 at 04:04:49PM -0500, Rob Latham wrote:
> [...]
>
> I know there are a zillion PowerPC variants: would it be hard to
> write assembly that works with all of them?  I know almost nothing about
> PowerPC assembly, but this might be a fun place to start learning.  Of
> course, if anyone else has already started such an undertaking, I'll
> defer to them and go work on something else.
>
> [...]

Another route would be to use an AltiVec-coded library variant:

http://e-www.motorola.com/webapp/sps/site/overview.jsp?nodeId=03C1TR0467mKqW5Nf2d9nb

Rob Baxter

