From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Peter 'p2' De Schrijver"
Subject: Re: [PATCH 01/13] OMAP3: PM: Update clean_l2 to use v7_flush_dcache_all
Date: Fri, 19 Nov 2010 11:57:40 +0200
Message-ID: <20101119095740.GV26003@nokia.com>
References: <1290131698-6194-1-git-send-email-nm@ti.com>
 <1290131698-6194-2-git-send-email-nm@ti.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Received: from smtp.nokia.com ([147.243.128.24]:56706 "EHLO
 mgw-da01.nokia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752168Ab0KSJ6R (ORCPT );
 Fri, 19 Nov 2010 04:58:17 -0500
Content-Disposition: inline
In-Reply-To:
Sender: linux-omap-owner@vger.kernel.org
List-Id: linux-omap@vger.kernel.org
To: ext Jean Pihet
Cc: Nishanth Menon , linux-omap , Kevin , Vishwanath Sripathy , Tony

On Fri, Nov 19, 2010 at 10:46:19AM +0100, ext Jean Pihet wrote:
> On Fri, Nov 19, 2010 at 2:54 AM, Nishanth Menon wrote:
> > From: Richard Woodruff
> >
> > Analysis in TI kernel with ETM showed that using cache mapped flush
> > in kernel instead of SO mapped flush cost drops by 65% (3.39mS down
> > to 1.17mS) for clean_l2 which is used during sleep sequences.
> > Overall:
> >        - speed up
> >        - unfortunately there isn't a good alternative flush method today
> >        - code reduction and less maintenance and potential bug in
> >          unmaintained code
> >
> > This also fixes the bug with the clean_l2 function usage.
> >
> > Reported-by: Tony Lindgren
> >
> > [nm@ti.com: ported rkw's proposal to 2.6.37-rc2]
> > Signed-off-by: Nishanth Menon
> > Signed-off-by: Richard Woodruff
> > ---
> >
> > Side note: just dcache needs to be flushed based on inputs from TI internal team
> >
> >  arch/arm/mach-omap2/sleep34xx.S |   79 ++++++--------------------------------
> >  1 files changed, 13 insertions(+), 66 deletions(-)
> >
> > diff --git a/arch/arm/mach-omap2/sleep34xx.S b/arch/arm/mach-omap2/sleep34xx.S
> > index 2fb205a..8f207b2 100644
> > --- a/arch/arm/mach-omap2/sleep34xx.S
> > +++ b/arch/arm/mach-omap2/sleep34xx.S
> > @@ -520,72 +520,17 @@ clean_caches:
> >        cmp     r9, #1 /* Check whether L2 inval is required or not*/
> >        bne     skip_l2_inval
> >  clean_l2:
> > -       /* read clidr */
> > -       mrc     p15, 1, r0, c0, c0, 1
> > -       /* extract loc from clidr */
> > -       ands    r3, r0, #0x7000000
> > -       /* left align loc bit field */
> > -       mov     r3, r3, lsr #23
> > -       /* if loc is 0, then no need to clean */
> > -       beq     finished
> > -       /* start clean at cache level 0 */
> > -       mov     r10, #0
> > -loop1:
> > -       /* work out 3x current cache level */
> > -       add     r2, r10, r10, lsr #1
> > -       /* extract cache type bits from clidr*/
> > -       mov     r1, r0, lsr r2
> > -       /* mask of the bits for current cache only */
> > -       and     r1, r1, #7
> > -       /* see what cache we have at this level */
> > -       cmp     r1, #2
> > -       /* skip if no cache, or just i-cache */
> > -       blt     skip
> > -       /* select current cache level in cssr */
> > -       mcr     p15, 2, r10, c0, c0, 0
> > -       /* isb to sych the new cssr&csidr */
> > -       isb
> > -       /* read the new csidr */
> > -       mrc     p15, 1, r1, c0, c0, 0
> > -       /* extract the length of the cache lines */
> > -       and     r2, r1, #7
> > -       /* add 4 (line length offset) */
> > -       add     r2, r2, #4
> > -       ldr     r4, assoc_mask
> > -       /* find maximum number on the way size */
> > -       ands    r4, r4, r1, lsr #3
> > -       /* find bit position of way size increment */
> > -       clz     r5, r4
> > -       ldr     r7, numset_mask
> > -       /* extract max number of the index size*/
> > -       ands    r7, r7, r1, lsr #13
> > -loop2:
> > -       mov     r9, r4
> > -       /* create working copy of max way size*/
> > -loop3:
> > -       /* factor way and cache number into r11 */
> > -       orr     r11, r10, r9, lsl r5
> > -       /* factor index number into r11 */
> > -       orr     r11, r11, r7, lsl r2
> > -       /*clean & invalidate by set/way */
> > -       mcr     p15, 0, r11, c7, c10, 2
> > -       /* decrement the way*/
> > -       subs    r9, r9, #1
> > -       bge     loop3
> > -       /*decrement the index */
> > -       subs    r7, r7, #1
> > -       bge     loop2
> > -skip:
> > -       add     r10, r10, #2
> > -       /* increment cache number */
> > -       cmp     r3, r10
> > -       bgt     loop1
> > -finished:
> > -       /*swith back to cache level 0 */
> > -       mov     r10, #0
> > -       /* select current cache level in cssr */
> > -       mcr     p15, 2, r10, c0, c0, 0
> > -       isb
> > +       /*
> > +        * jump out to kernel flush routine
> > +        *  - resue that code is better
> Typo: 'reuse'
>
> > +        *  - it executes in a cached space so is faster than refetch per-block
> > +        *  - should be faster and will change with kernel
> > +        *  - 'might' have to copy address, load and jump to it
> > +        */
> > +       ldr r1, kernel_flush
> > +       mov lr, pc
> > +       bx  r1
> It is simpler and more efficient to use:
> bl v7_flush_dcache_all

This doesn't work from SRAM though, because the linker will generate a
PC relative branch which is wrong if the code is moved to SRAM at
runtime. So the original version needs to stay :)

Cheers,

Peter.
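
To make the PC-relative point concrete, here is a minimal sketch in
Python of the address arithmetic involved. It is only an illustration
under stated assumptions: all three addresses are hypothetical (they are
not taken from the patch or from any OMAP3 memory map), and the helper
names bl_offset and bl_destination are made up for this example. The one
architectural fact it relies on is that in ARM (A32) state the PC reads
as the address of the current instruction plus 8, so a BL encodes
"target - (instruction address + 8)".

```python
MASK32 = 0xFFFFFFFF  # keep all arithmetic in 32-bit address space


def bl_offset(insn_addr, target):
    """Byte offset the linker would encode for a BL located at insn_addr.

    In A32 state the PC reads as insn_addr + 8, so the encoded offset
    is relative to that value, not to the instruction address itself.
    """
    return (target - (insn_addr + 8)) & MASK32


def bl_destination(insn_addr, offset):
    """Address a BL with the given encoded offset jumps to when it
    executes at insn_addr."""
    return (insn_addr + 8 + offset) & MASK32


# Hypothetical addresses, chosen only for illustration.
LINK_ADDR = 0xC0030000   # where the BL sits in the linked kernel image
FLUSH_ADDR = 0xC0040000  # where v7_flush_dcache_all is linked
SRAM_ADDR = 0x40200000   # where the same instruction lands after the copy

# The offset is fixed at link time, based on LINK_ADDR.
offset = bl_offset(LINK_ADDR, FLUSH_ADDR)

# Executed in place, the branch reaches the flush routine.
from_kernel = bl_destination(LINK_ADDR, offset)

# Executed from the SRAM copy, the same offset is applied to a
# different PC, so the branch lands somewhere else entirely.
from_sram = bl_destination(SRAM_ADDR, offset)

# "ldr r1, kernel_flush" instead loads an absolute address stored as a
# data word; copying the code does not change that value.
via_literal = FLUSH_ADDR

print(hex(from_kernel), hex(from_sram), hex(via_literal))
```

The branch taken from the SRAM copy misses the routine by exactly the
distance the code was moved, while the address loaded by the ldr/bx
sequence is unaffected by the copy.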