linuxppc-dev.lists.ozlabs.org archive mirror
* FW: Improved copy_page() function, about 30% speed up for mpc860!
From: Joakim Tjernlund @ 2003-02-28 10:41 UTC
  To: linuxppc-dev@lists.linuxppc.org


Hi

Hmm, no reply to this on the embedded list. Maybe this list is a better place.

 Jocke

-----Original Message-----
From: Joakim Tjernlund [mailto:joakim.tjernlund@lumentis.se]
Sent: Thursday, February 27, 2003 14:09
To: linuxppc-embedded@lists.linuxppc.org
Subject: Improved copy_page() function, about 30% speed up for mpc860!


Hi all

I have been playing with the copy_page() function in arch/ppc/kernel/misc.S
and gained about 30% speed up for my mpc860, rev D4 MHz.

This is what I did:
- Use dcbz on 8xx, but clear one cache line ahead (performance is really crappy
  if I don't clear ahead). This is the biggest improvement.
- Use prefetch (dcbt) on 8xx as well.

I know that dcbz is buggy on some 8xx CPUs, but I don't know which ones.
For me it works just fine, except in copy_tofrom_user (I don't know why).

I would like to get some feedback & test results both for 8xx and non 8xx.
Please include exact CPU and revision.

 Thanks
         Jocke

_GLOBAL(copy_page)
	addi	r3,r3,-4
	addi	r4,r4,-4
	li	r5,4
#if MAX_COPY_PREFETCH > 1
	/* This will prefetch past end of page, does not seem to be a problem? */
	li	r0,MAX_COPY_PREFETCH
	li	r11,4
	mtctr	r0
11:	dcbt	r11,r4
	addi	r11,r11,L1_CACHE_LINE_SIZE
	bdnz	11b
#else /* MAX_COPY_PREFETCH == 1 */
	dcbt	r5,r4
	li	r11,L1_CACHE_LINE_SIZE+4
#endif /* MAX_COPY_PREFETCH */
	dcbz	r5,r3 /* older 8xx CPUs may have buggy dcbz instructions, if so try "dcbt r5,r3" instead */
	addi	r5,r5,L1_CACHE_LINE_SIZE
	li	r0,4096/L1_CACHE_LINE_SIZE-1 /* all but the last cache line, because the dcbz below clears one line ahead */
	mtctr	r0
1:
	dcbt	r11,r4
	dcbz	r5,r3 /* zero the cache line after the one that is being copied;
		       * older 8xx CPUs may have buggy dcbz instructions, if so try "dcbt r5,r3" instead */
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 32
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 64
	COPY_16_BYTES
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 128
	COPY_16_BYTES
	COPY_16_BYTES
	COPY_16_BYTES
	COPY_16_BYTES
#endif
#endif
#endif
	bdnz	1b
/* Copy the last cache line of data */
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 32
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 64
	COPY_16_BYTES
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 128
	COPY_16_BYTES
	COPY_16_BYTES
	COPY_16_BYTES
	COPY_16_BYTES
#endif
#endif
#endif
	blr
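
The loop structure of the routine above can be modeled in portable C. This
is only a sketch under stated assumptions: a 16-byte cache line (the 8xx
value of L1_CACHE_LINE_SIZE), memset() standing in for dcbz, memcpy()
standing in for COPY_16_BYTES, and the dcbt prefetches left out. The names
model_dcbz, model_copy_page and model_self_check are hypothetical and do not
exist in the kernel. The model says nothing about speed or about the 8xx
dcbz bug; it only shows why the loop runs 4096/L1_CACHE_LINE_SIZE-1 times
and why the last line is copied without another dcbz.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096
#define LINE_SIZE  16	/* assumption: 16-byte L1 cache line, as on the 8xx */

/* Stand-in for dcbz: zero one whole destination cache line. */
static void model_dcbz(uint8_t *line)
{
	memset(line, 0, LINE_SIZE);
}

/* Model of the posted loop structure (hypothetical helper, not kernel code). */
static void model_copy_page(uint8_t *dst, const uint8_t *src)
{
	size_t off;

	assert(PAGE_BYTES % LINE_SIZE == 0);

	model_dcbz(dst);	/* the first line is zeroed before the loop */

	/* PAGE_BYTES/LINE_SIZE - 1 iterations, matching the value loaded into CTR */
	for (off = 0; off < PAGE_BYTES - LINE_SIZE; off += LINE_SIZE) {
		model_dcbz(dst + off + LINE_SIZE);	/* clear one line ahead */
		memcpy(dst + off, src + off, LINE_SIZE);	/* copy the current line */
	}

	/* The last line was zeroed by the final iteration, so it is only copied. */
	memcpy(dst + off, src + off, LINE_SIZE);
}

/* Returns 1 if the page was copied exactly and the guard line stayed intact. */
static int model_self_check(void)
{
	static uint8_t src[PAGE_BYTES];
	static uint8_t dst[PAGE_BYTES + LINE_SIZE];	/* page plus one guard line */
	size_t i;

	for (i = 0; i < PAGE_BYTES; i++)
		src[i] = (uint8_t)(i * 7 + 1);
	memset(dst, 0xAA, sizeof(dst));

	model_copy_page(dst, src);

	if (memcmp(dst, src, PAGE_BYTES) != 0)
		return 0;
	for (i = PAGE_BYTES; i < sizeof(dst); i++)
		if (dst[i] != 0xAA)
			return 0;
	return 1;
}
```

Calling model_self_check() from a small main() exercises the model. The
guard line after the page is there because an extra loop iteration would
make the clear-ahead step zero memory beyond the destination page.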


** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: FW: Improved copy_page() function, about 30% speed up for mpc860!
From: Daniel Jacobowitz @ 2003-03-01  5:17 UTC
  To: linuxppc-dev@lists.linuxppc.org


On Fri, Feb 28, 2003 at 11:41:52AM +0100, Joakim Tjernlund wrote:
>
> Hi
>
> hmm, no reply on this on the embedded list. Maybe this list is a better place.

I can't tell you what revs they were, but all of the MPC860's I could
get my hands on here the last time I tried to use dcbz on them were
faulty.  You may just not be triggering the bug.


--
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


* Re: FW: Improved copy_page() function, about 30% speed up for mpc860!
From: Joakim Tjernlund @ 2003-03-01 14:23 UTC
  To: drow; +Cc: linuxppc-dev


>>On Fri, Feb 28, 2003 at 11:41:52AM +0100, Joakim Tjernlund wrote:
>>
>> Hi
>>
>> hmm, no reply on this on the embedded list. Maybe this list is a better place.
>
> I can't tell you what revs they were, but all of the MPC860's I could
> get my hands on here the last time I tried to use dcbz on them were
> faulty. You may just not be triggering the bug.

Hmm, which boards were these?
I am planning to do a larger test here with all our custom mpc860 and mpc862 boards. We have them in
100, 80 and 50 MHz variants.

Maybe the bug is related to board design? Is there an official erratum from Motorola
regarding this bug? I can't find any.

Anyhow, I had a flaw in my test program, so you can throw this version of copy_page() away.
But enabling the use of dcbz in the current version still gives me a 30%+ performance increase.
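
The test program itself was not posted to this list. As a rough illustration
only, a user-space timing harness for a page copy might look like the sketch
below. Everything in it is an assumption: the names time_copy, copy_memcpy
and copy_short are hypothetical, memcpy() stands in for the routine under
test, and clock() measures CPU time in user space, which is not the same as
timing copy_page() inside the kernel. The point of the sketch is the
correctness check before the timed loop, so that a copy that does less work
than it should cannot be reported as a speed up.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096

typedef void (*copy_fn)(void *dst, const void *src);

/* Reference copy: stands in for the routine being measured. */
static void copy_memcpy(void *dst, const void *src)
{
	memcpy(dst, src, PAGE_BYTES);
}

/* Deliberately wrong copy (skips the last 16 bytes), to show the check firing. */
static void copy_short(void *dst, const void *src)
{
	memcpy(dst, src, PAGE_BYTES - 16);
}

/*
 * Returns the average CPU time per call in nanoseconds, or -1.0 if
 * iterations is 0 or the function did not produce an identical page.
 */
static double time_copy(copy_fn fn, unsigned long iterations)
{
	static uint8_t src[PAGE_BYTES], dst[PAGE_BYTES];
	static volatile uint8_t sink;	/* keeps the copies observable */
	clock_t t0, t1;
	unsigned long i;

	if (iterations == 0)
		return -1.0;

	for (i = 0; i < PAGE_BYTES; i++)
		src[i] = (uint8_t)(i ^ (i >> 8));
	memset(dst, 0, sizeof(dst));

	/* Verify the result once before trusting any timing numbers. */
	fn(dst, src);
	if (memcmp(dst, src, PAGE_BYTES) != 0)
		return -1.0;

	t0 = clock();
	for (i = 0; i < iterations; i++) {
		fn(dst, src);
		sink = dst[i % PAGE_BYTES];
	}
	t1 = clock();

	return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)iterations;
}
```

With this sketch, time_copy(copy_memcpy, 100000) returns a non-negative
time per page, while time_copy(copy_short, 100000) returns -1.0 because the
last 16 bytes of the destination never match the source.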

See the embedded list for details.

 Jocke
