linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] omap4: enable L2 prefetching
@ 2010-11-15 16:20 Nishanth Menon
       [not found] ` <yw1xaala8qex.fsf@unicorn.mansr.com>
  2010-11-16 18:49 ` Kevin Hilman
  0 siblings, 2 replies; 4+ messages in thread
From: Nishanth Menon @ 2010-11-15 16:20 UTC (permalink / raw)
  To: linux-arm-kernel

From: Mans Rullgard <mans@mansr.com>

Enabling L2 prefetching gives a measurable performance improvement, as
shown by the mem test results from a Panda ES2.1 board below. I think we
should consider it, even though it hurts "writes" a bit. (rebased to k.org)
Usually prefetch is enabled at both levels together (L1 + L2). However,
the CP15 prefetch engines can only be enabled from secure mode, so on
GP devices (e.g. PandaBoard) we cannot enable them. Enabling PL310
prefetch alone still seems to provide a performance improvement, as
shown in the data below (from Ubuntu), and would be a great thing
to pull in.

Measurement Data:
==
STOCK 10.10 WITHOUT PATCH
========================
~# ./memspeed
size    8388608 8192k 8M
offset  8388608, 0
buffers 0x2aaad000 0x2b2ad000
copy  libc          133 MB/s
copy  Android v5    273 MB/s
copy  Android NEON  235 MB/s
copy  INT32         116 MB/s
copy  ASM ARM       187 MB/s
copy  ASM VLDM 64   204 MB/s
copy  ASM VLDM 128  173 MB/s
copy  ASM VLD1      216 MB/s
read  ASM ARM       286 MB/s
read  ASM VLDM      242 MB/s
read  ASM VLD1      286 MB/s
write libc         1947 MB/s
write ASM ARM      1943 MB/s
write ASM VSTM     1942 MB/s
write ASM VST1     1935 MB/s

10.10 + PATCH
=============
~# ./memspeed
size    8388608 8192k 8M
offset  8388608, 0
buffers 0x2ab17000 0x2b317000
copy  libc          129 MB/s
copy  Android v5    256 MB/s
copy  Android NEON  356 MB/s
copy  INT32         127 MB/s
copy  ASM ARM       321 MB/s
copy  ASM VLDM 64   337 MB/s
copy  ASM VLDM 128  321 MB/s
copy  ASM VLD1      350 MB/s
read  ASM ARM       496 MB/s
read  ASM VLDM      470 MB/s
read  ASM VLD1      488 MB/s
write libc         1701 MB/s
write ASM ARM      1682 MB/s
write ASM VSTM     1693 MB/s
write ASM VST1     1681 MB/s

Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

Signed-off-by: Mans Rullgard <mans@mansr.com>
---
Original:
http://git.mansr.com/?p=linux-panda;a=commit;h=450b17993ba7c36cea3f2c746ae26c268563ee59
http://git.tif.ti.com/vstehle/kernel-ubuntu.git?a=shortlog;h=refs/heads/vincent/mans-patches

 arch/arm/mach-omap2/omap4-common.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index 2f89555..a5e6126 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -64,6 +64,10 @@ static int __init omap_l2_cache_init(void)
 	l2cache_base = ioremap(OMAP44XX_L2CACHE_BASE, SZ_4K);
 	BUG_ON(!l2cache_base);
 
+	if (omap_rev() != OMAP4430_REV_ES1_0)
+		omap_smc1(0x109, 0x7e470000);
+
+
 	/* Enable PL310 L2 Cache controller */
 	omap_smc1(0x102, 0x1);
 
@@ -75,7 +79,7 @@ static int __init omap_l2_cache_init(void)
 	if (omap_rev() == OMAP4430_REV_ES1_0)
 		l2x0_init(l2cache_base, 0x0e050000, 0xc0000fff);
 	else
-		l2x0_init(l2cache_base, 0x0e070000, 0xc0000fff);
+		l2x0_init(l2cache_base, 0x7e470000, 0xc0000fff);
 
 	/*
 	 * Override default outer_cache.disable with a OMAP4
-- 
1.6.3.3
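
To put the memspeed figures above into relative terms, the change per
test can be computed as (after - before) / before. The following is a
minimal sketch and not part of the patch: the struct and function names
are made up for illustration, and only a few rows of the two tables are
copied in.

```c
#include <stddef.h>
#include <stdio.h>

/* One memspeed test with its throughput before and after the patch. */
struct memspeed_row {
	const char *name;
	double before_mbps;
	double after_mbps;
};

/* Figures copied from the two memspeed runs in the patch description. */
static const struct memspeed_row memspeed_rows[] = {
	{ "copy  Android NEON",  235.0,  356.0 },
	{ "copy  ASM ARM",       187.0,  321.0 },
	{ "read  ASM ARM",       286.0,  496.0 },
	{ "read  ASM VLD1",      286.0,  488.0 },
	{ "write libc",         1947.0, 1701.0 },
	{ "write ASM ARM",      1943.0, 1682.0 },
};

/* Relative change in percent; positive means faster with the patch. */
double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print one line per test with the relative change. */
void print_memspeed_changes(void)
{
	size_t i;

	for (i = 0; i < sizeof(memspeed_rows) / sizeof(memspeed_rows[0]); i++) {
		const struct memspeed_row *r = &memspeed_rows[i];

		printf("%-20s %+6.1f%%\n", r->name,
		       pct_change(r->before_mbps, r->after_mbps));
	}
}
```

On these numbers the read tests gain roughly 70% or more, while the
write tests lose about 13%.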

* [PATCH] omap4: enable L2 prefetching
       [not found] ` <yw1xaala8qex.fsf@unicorn.mansr.com>
@ 2010-11-16 18:11   ` Tony Lindgren
  0 siblings, 0 replies; 4+ messages in thread
From: Tony Lindgren @ 2010-11-16 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

* Måns Rullgård <mans@mansr.com> [101115 09:01]:
> Nishanth Menon <nm@ti.com> writes:
> 
> > From: Mans Rullgard <mans@mansr.com>
> >
> > Enabling L2 prefetching improves performance as shown on Panda
> > ES2.1 board with mem test, and it has measurable impact on
> > performances. I think we should consider it, even though it damages
> > "writes" a bit. (rebased to k.org)
> > Usually the prefetch is used at both levels together L1 + L2, however,
> > to enable the CP15 prefetch engines, these are under security, and on
> > GP devices, we cannot enable it(e.g. on PandaBoard). However, just
> > enabling PL310 prefetch seems to provide performance improvement,
> > as shown in the data below (from Ubuntu) and would be a great thing
> > to pull in.
> 
> What this does is enable automatic next line prefetching.  With this
> enabled, whenever the PL310 receives a cachable read request, it
> automatically prefetches the following cache line as well.  A larger
> offset can be programmed in secure mode, but the TI ROM authors
> neglected to include this.
> 
> Testing with FFmpeg showed a speedup of 10% with this patch in some
> cases.

Måns and Nishanth, care to repost this with the updated comments?

Regards,

Tony
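
The next-line prefetch behaviour described in the quoted text can be
illustrated with a toy model. This is only a sketch under simplifying
assumptions: a direct-mapped cache with 1024 lines (an arbitrary size),
32-byte lines as in the PL310, no timing, and a prefetch that completes
instantly. All names are hypothetical and none of this is kernel code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u	/* PL310 cache line size in bytes */
#define NUM_LINES 1024u	/* toy capacity, direct mapped (assumption) */

struct toy_cache {
	uint32_t tag[NUM_LINES];
	bool valid[NUM_LINES];
	bool next_line_prefetch;
	unsigned long demand_misses;
};

static bool toy_lookup(const struct toy_cache *c, uint32_t line)
{
	uint32_t idx = line % NUM_LINES;

	return c->valid[idx] && c->tag[idx] == line;
}

static void toy_fill(struct toy_cache *c, uint32_t line)
{
	uint32_t idx = line % NUM_LINES;

	c->valid[idx] = true;
	c->tag[idx] = line;
}

/* One cacheable read: count a demand miss, then prefetch the next line. */
static void toy_read(struct toy_cache *c, uint32_t addr)
{
	uint32_t line = addr / LINE_SIZE;

	if (!toy_lookup(c, line)) {
		c->demand_misses++;
		toy_fill(c, line);
	}
	if (c->next_line_prefetch)
		toy_fill(c, line + 1);
}

/* Read `bytes` bytes sequentially, one 32-bit word at a time. */
unsigned long toy_sequential_misses(bool prefetch, uint32_t bytes)
{
	struct toy_cache c;
	uint32_t addr;

	memset(&c, 0, sizeof(c));
	c.next_line_prefetch = prefetch;
	for (addr = 0; addr < bytes; addr += 4)
		toy_read(&c, addr);
	return c.demand_misses;
}
```

In this model a sequential read over 8 KiB takes 256 demand misses
without prefetch and a single one with it. Real hardware will see
smaller gains, because prefetches take time and compete for bandwidth.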

* [PATCH] omap4: enable L2 prefetching
  2010-11-15 16:20 [PATCH] omap4: enable L2 prefetching Nishanth Menon
       [not found] ` <yw1xaala8qex.fsf@unicorn.mansr.com>
@ 2010-11-16 18:49 ` Kevin Hilman
  2010-11-19 16:46   ` Santosh Shilimkar
  1 sibling, 1 reply; 4+ messages in thread
From: Kevin Hilman @ 2010-11-16 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

Nishanth Menon <nm@ti.com> writes:

> From: Mans Rullgard <mans@mansr.com>
>
> Enabling L2 prefetching improves performance as shown on Panda
> ES2.1 board with mem test, and it has measurable impact on
> performances. I think we should consider it, even though it damages
> "writes" a bit. (rebased to k.org)
> Usually the prefetch is used at both levels together L1 + L2, however,
> to enable the CP15 prefetch engines, these are under security, and on
> GP devices, we cannot enable it(e.g. on PandaBoard). However, just
> enabling PL310 prefetch seems to provide performance improvement,
> as shown in the data below (from Ubuntu) and would be a great thing
> to pull in.

[...]

>  arch/arm/mach-omap2/omap4-common.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
> index 2f89555..a5e6126 100644
> --- a/arch/arm/mach-omap2/omap4-common.c
> +++ b/arch/arm/mach-omap2/omap4-common.c
> @@ -64,6 +64,10 @@ static int __init omap_l2_cache_init(void)
>  	l2cache_base = ioremap(OMAP44XX_L2CACHE_BASE, SZ_4K);
>  	BUG_ON(!l2cache_base);
>  
>
> +	if (omap_rev() != OMAP4430_REV_ES1_0)
> +		omap_smc1(0x109, 0x7e470000);
>
>  	/* Enable PL310 L2 Cache controller */
>  	omap_smc1(0x102, 0x1);
>  
> @@ -75,7 +79,7 @@ static int __init omap_l2_cache_init(void)
>  	if (omap_rev() == OMAP4430_REV_ES1_0)
>  		l2x0_init(l2cache_base, 0x0e050000, 0xc0000fff);
>  	else
> -		l2x0_init(l2cache_base, 0x0e070000, 0xc0000fff);
> +		l2x0_init(l2cache_base, 0x7e470000, 0xc0000fff);
>  
>  	/*
>  	 * Override default outer_cache.disable with a OMAP4

Adding/updating the in-code comments would be helpful as well.

The existing use of hard-coded constants in this code is rather
unreadable; symbolic constants would make it much more readable, and
this change just continues the pattern.

Ideally, switching this code to use symbolic constants and then adding
the new feature would be a cleaner approach.

Kevin
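
As a sketch of what symbolic constants could look like here, the three
magic values passed to l2x0_init() can be rebuilt from named bit fields.
The bit positions follow the PL310 Auxiliary Control Register layout as
documented in the ARM TRM; the macro and function names are invented for
this example and are not taken from the patch, so treat them as
assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* PL310 Auxiliary Control Register fields (illustrative names). */
#define AUX_ASSOC_16WAY		(1u << 16)	/* 16-way associativity */
#define AUX_WAY_SIZE(n)		((uint32_t)(n) << 17)	/* 0x2 = 32KB, 0x3 = 64KB */
#define AUX_SHARE_OVERRIDE	(1u << 22)	/* shared attribute override */
#define AUX_REPLACE_RR		(1u << 25)	/* round-robin replacement */
#define AUX_NS_LOCKDOWN		(1u << 26)	/* non-secure lockdown enable */
#define AUX_NS_INT_CTRL		(1u << 27)	/* non-secure interrupt access */
#define AUX_DATA_PREFETCH	(1u << 28)	/* data prefetch enable */
#define AUX_INSTR_PREFETCH	(1u << 29)	/* instruction prefetch enable */
#define AUX_EARLY_BRESP		(1u << 30)	/* early write response */

/* Bits shared by every value used in omap_l2_cache_init(). */
#define AUX_COMMON	(AUX_ASSOC_16WAY | AUX_REPLACE_RR | \
			 AUX_NS_LOCKDOWN | AUX_NS_INT_CTRL)

#define AUX_ES1_0	(AUX_COMMON | AUX_WAY_SIZE(0x2))	/* 0x0e050000 */
#define AUX_ES2_OLD	(AUX_COMMON | AUX_WAY_SIZE(0x3))	/* 0x0e070000 */
#define AUX_ES2_PREFETCH (AUX_ES2_OLD | AUX_SHARE_OVERRIDE | \
			 AUX_DATA_PREFETCH | AUX_INSTR_PREFETCH | \
			 AUX_EARLY_BRESP)			/* 0x7e470000 */

/* Bits that differ between two register values. */
uint32_t aux_changed_bits(uint32_t old_val, uint32_t new_val)
{
	return old_val ^ new_val;
}

/* Print the name of every single-bit flag set in `aux`. */
void aux_print_flags(uint32_t aux)
{
	static const struct {
		uint32_t bit;
		const char *name;
	} flags[] = {
		{ AUX_ASSOC_16WAY,	"16-way" },
		{ AUX_SHARE_OVERRIDE,	"share override" },
		{ AUX_REPLACE_RR,	"round-robin" },
		{ AUX_NS_LOCKDOWN,	"NS lockdown" },
		{ AUX_NS_INT_CTRL,	"NS interrupt access" },
		{ AUX_DATA_PREFETCH,	"data prefetch" },
		{ AUX_INSTR_PREFETCH,	"instruction prefetch" },
		{ AUX_EARLY_BRESP,	"early BRESP" },
	};
	size_t i;

	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++)
		if (aux & flags[i].bit)
			printf("%s\n", flags[i].name);
}
```

Written this way, the patch reduces to adding four flags for non-ES1.0
silicon: shared attribute override, data prefetch, instruction prefetch
and early BRESP.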

* [PATCH] omap4: enable L2 prefetching
  2010-11-16 18:49 ` Kevin Hilman
@ 2010-11-19 16:46   ` Santosh Shilimkar
  0 siblings, 0 replies; 4+ messages in thread
From: Santosh Shilimkar @ 2010-11-19 16:46 UTC (permalink / raw)
  To: linux-arm-kernel

> -----Original Message-----
> From: linux-omap-owner at vger.kernel.org [mailto:linux-omap-
> owner at vger.kernel.org] On Behalf Of Kevin Hilman
> Sent: Wednesday, November 17, 2010 12:19 AM
> To: Nishanth Menon
> Cc: linux-omap; linux-arm; Mans Rullgard
> Subject: Re: [PATCH] omap4: enable L2 prefetching
>
> Nishanth Menon <nm@ti.com> writes:
>
> > From: Mans Rullgard <mans@mansr.com>
> >
> > Enabling L2 prefetching improves performance as shown on Panda
> > ES2.1 board with mem test, and it has measurable impact on
> > performances. I think we should consider it, even though it damages
> > "writes" a bit. (rebased to k.org)
> > Usually the prefetch is used at both levels together L1 + L2, however,
> > to enable the CP15 prefetch engines, these are under security, and on
> > GP devices, we cannot enable it(e.g. on PandaBoard). However, just
> > enabling PL310 prefetch seems to provide performance improvement,
> > as shown in the data below (from Ubuntu) and would be a great thing
> > to pull in.
>
> [...]
>
> >  arch/arm/mach-omap2/omap4-common.c |    6 +++++-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-
> omap2/omap4-common.c
> > index 2f89555..a5e6126 100644
> > --- a/arch/arm/mach-omap2/omap4-common.c
> > +++ b/arch/arm/mach-omap2/omap4-common.c
> > @@ -64,6 +64,10 @@ static int __init omap_l2_cache_init(void)
> >  	l2cache_base = ioremap(OMAP44XX_L2CACHE_BASE, SZ_4K);
> >  	BUG_ON(!l2cache_base);
> >
> >
> > +	if (omap_rev() != OMAP4430_REV_ES1_0)
> > +		omap_smc1(0x109, 0x7e470000);
> >
> >  	/* Enable PL310 L2 Cache controller */
> >  	omap_smc1(0x102, 0x1);
> >
> > @@ -75,7 +79,7 @@ static int __init omap_l2_cache_init(void)
> >  	if (omap_rev() == OMAP4430_REV_ES1_0)
> >  		l2x0_init(l2cache_base, 0x0e050000, 0xc0000fff);
> >  	else
> > -		l2x0_init(l2cache_base, 0x0e070000, 0xc0000fff);
> > +		l2x0_init(l2cache_base, 0x7e470000, 0xc0000fff);
> >
> >  	/*
> >  	 * Override default outer_cache.disable with a OMAP4
>
> Adding/updating the in-code comments would be helpful as well.
>
> The existing use of hard-coded constants in this code is rather
> unreadable; symbolic constants would make it much more readable, and
> this change just continues the pattern.
>
> Ideally, switching this code to use symbolic constants and then adding
> the new feature would be a cleaner approach.
>
I have cleaned up the code a bit based on Kevin's comments and
also updated the change log in Mans's patch. I have also added a couple
of relevant patches to this series, which I haven't posted yet.

Will post the series on the list soon.

Regards,
Santosh
