[PATCH][PPC64] Better memset

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Paul Mackerras <paulus@samba.org>
To: akpm@osdl.org
Cc: anton@samba.org, torvalds@osdl.org, linux-kernel@vger.kernel.org
Subject: [PATCH][PPC64] Better memset
Date: Thu, 24 Jun 2004 22:15:31 +1000	[thread overview]
Message-ID: <16602.50659.207406.720480@cargo.ozlabs.ibm.com> (raw)

Anton noticed in some traces that we were spending an awfully long
time doing a memset.  The ppc64 memset is basically unchanged from the
ppc32 version, and it only does 4-byte stores and doesn't unroll the
loop.  Here's a memset that performs a bit better.  I have been using
it for 3 weeks now, and Anton has tested it on a variety of machines,
without problems.  Please apply.

Signed-off-by: Paul Mackerras <paulus@samba.org>

diff -urN prom-cleanup/arch/ppc64/lib/string.S g5-preempt/arch/ppc64/lib/string.S
--- prom-cleanup/arch/ppc64/lib/string.S	2003-06-15 12:12:49.000000000 +1000
+++ g5-preempt/arch/ppc64/lib/string.S	2004-05-29 21:39:26.000000000 +1000
@@ -66,28 +66,69 @@
 	blr
 
 _GLOBAL(memset)
+	neg	r0,r5
 	rlwimi	r4,r4,8,16,23
+	andi.	r0,r0,7			/* # bytes to be 8-byte aligned */
 	rlwimi	r4,r4,16,0,15
-	addi	r6,r3,-4
-	cmplwi	0,r5,4
-	blt	7f
-	stwu	r4,4(r6)
-	beqlr
-	andi.	r0,r6,3
-	add	r5,r0,r5
-	subf	r6,r0,r6
-	srwi	r0,r5,2
+	cmplw	cr1,r5,r0		/* do we get that far? */
+	rldimi	r4,r4,32,0
+	mr	r6,r3
+	mtcrf	1,r0
+	mr	r6,r3
+	blt	cr1,8f
+	beq+	3f			/* if already 8-byte aligned */
+	subf	r5,r0,r5
+	bf	31,1f
+	stb	r4,0(r6)
+	addi	r6,r6,1
+1:	bf	30,2f
+	sth	r4,0(r6)
+	addi	r6,r6,2
+2:	bf	29,3f
+	stw	r4,0(r6)
+	addi	r6,r6,4
+3:	srdi.	r0,r5,6
+	clrldi	r5,r5,58
 	mtctr	r0
-	bdz	6f
-1:	stwu	r4,4(r6)
-	bdnz	1b
-6:	andi.	r5,r5,3
-7:	cmpwi	0,r5,0
-	beqlr
-	mtctr	r5
-	addi	r6,r6,3
-8:	stbu	r4,1(r6)
-	bdnz	8b
+	beq	5f
+4:	std	r4,0(r6)
+	std	r4,8(r6)
+	std	r4,16(r6)
+	std	r4,24(r6)
+	std	r4,32(r6)
+	std	r4,40(r6)
+	std	r4,48(r6)
+	std	r4,56(r6)
+	addi	r6,r6,64
+	bdnz	4b
+5:	srwi.	r0,r5,3
+	clrlwi	r5,r5,29
+	mtcrf	1,r0
+	beq	8f
+	bf	29,6f
+	std	r4,0(r6)
+	std	r4,8(r6)
+	std	r4,16(r6)
+	std	r4,24(r6)
+	addi	r6,r6,32
+6:	bf	30,7f
+	std	r4,0(r6)
+	std	r4,8(r6)
+	addi	r6,r6,16
+7:	bf	31,8f
+	std	r4,0(r6)
+	addi	r6,r6,8
+8:	cmpwi	r5,0
+	mtcrf	1,r5
+	beqlr+
+	bf	29,9f
+	stw	r4,0(r6)
+	addi	r6,r6,4
+9:	bf	30,10f
+	sth	r4,0(r6)
+	addi	r6,r6,2
+10:	bflr	31
+	stb	r4,0(r6)
 	blr
 
 _GLOBAL(memmove)

                 reply	other threads:[~2004-06-24 12:28 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=16602.50659.207406.720480@cargo.ozlabs.ibm.com \
    --to=paulus@samba.org \
    --cc=akpm@osdl.org \
    --cc=anton@samba.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.