linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence
@ 2013-11-19 15:29 Lorenzo Pieralisi
  2013-11-19 16:14 ` Catalin Marinas
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Lorenzo Pieralisi @ 2013-11-19 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

Set-associative caches on all v7 implementations map the index bits
to physical address LSBs and the tag bits to MSBs. On most systems with
sane DRAM controller configurations, this means that the current v7
cache flush routine using set/way operations triggers a DRAM memory
controller precharge/activate for every cache line writeback, since the
routine cleans lines by first fixing the index and then looping
through the ways.

Given the random content of cache tags, swapping the order of the index
and way loops does not prevent DRAM page precharge and activate cycles,
but on average it improves the chances that either multiple lines hit
the same page or multiple lines belong to different DRAM banks,
improving throughput significantly.

This patch swaps the inner loops in the v7 cache flushing routine to
carry out the clean operations first on all sets belonging to a given
way (looping through sets) and then decrementing the way.

Benchmarks showed that swapping the order in which sets and ways are
decremented in the v7 cache flushing routine, which uses set/way
operations, significantly reduces the time required to flush the caches,
owing to improved writeback throughput to the DRAM controller.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/mm/cache-v7.S | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index b5c467a..778bcf8 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -146,18 +146,18 @@ flush_levels:
 	ldr	r7, =0x7fff
 	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
 loop1:
-	mov	r9, r4				@ create working copy of max way size
+	mov	r9, r7				@ create working copy of max index
 loop2:
- ARM(	orr	r11, r10, r9, lsl r5	)	@ factor way and cache number into r11
- THUMB(	lsl	r6, r9, r5		)
+ ARM(	orr	r11, r10, r4, lsl r5	)	@ factor way and cache number into r11
+ THUMB(	lsl	r6, r4, r5		)
  THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
- ARM(	orr	r11, r11, r7, lsl r2	)	@ factor index number into r11
- THUMB(	lsl	r6, r7, r2		)
+ ARM(	orr	r11, r11, r9, lsl r2	)	@ factor index number into r11
+ THUMB(	lsl	r6, r9, r2		)
  THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
 	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way
-	subs	r9, r9, #1			@ decrement the way
+	subs	r9, r9, #1			@ decrement the index
 	bge	loop2
-	subs	r7, r7, #1			@ decrement the index
+	subs	r4, r4, #1			@ decrement the way
 	bge	loop1
 skip:
 	add	r10, r10, #2			@ increment cache number
-- 
1.8.2.2
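The loop swap above can be modelled in Python. This is a sketch of the loop structure only, not the assembly: the shift amounts below are illustrative (in the real code they come from CLIDR/CCSIDR via r5 and r2), and the function name is hypothetical. Both orderings issue exactly the same set of clean/invalidate operations; only the issue order differs:

```python
def setway_ops(num_ways, num_sets, way_shift, set_shift, level, set_inner):
    """Return the DCCISW operands (way | set | level) in issue order.

    set_inner=False models the old code (fix the set, loop over ways);
    set_inner=True models the patched code (fix the way, loop over sets).
    """
    ops = []
    if set_inner:                                  # patched: way outer, set inner
        for way in range(num_ways - 1, -1, -1):
            for s in range(num_sets - 1, -1, -1):
                ops.append((way << way_shift) | (s << set_shift) | (level << 1))
    else:                                          # old: set outer, way inner
        for s in range(num_sets - 1, -1, -1):
            for way in range(num_ways - 1, -1, -1):
                ops.append((way << way_shift) | (s << set_shift) | (level << 1))
    return ops

# Illustrative geometry: 16 ways, 1024 sets, cache level 1 (L2).
old = setway_ops(16, 1024, 28, 6, 1, set_inner=False)
new = setway_ops(16, 1024, 28, 6, 1, set_inner=True)
assert sorted(old) == sorted(new)   # same lines cleaned, different order
assert old != new
```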

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence
  2013-11-19 15:29 [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence Lorenzo Pieralisi
@ 2013-11-19 16:14 ` Catalin Marinas
  2013-11-19 16:58 ` Nicolas Pitre
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Catalin Marinas @ 2013-11-19 16:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 19, 2013 at 03:29:53PM +0000, Lorenzo Pieralisi wrote:
> Set-associative caches on all v7 implementations map the index bits
> to physical addresses LSBs and tag bits to MSBs. On most systems with
> sane DRAM controller configurations, this means that the current v7
> cache flush routine using set/way operations triggers a DRAM memory
> controller precharge/activate for every cache line writeback since the
> cache routine cleans lines by first fixing the index and then looping
> through ways.
> 
> Given the random content of cache tags, swapping the order between
> indexes and ways loops do not prevent DRAM pages precharge and
> activate cycles but at least, on average, improves the chances that
> either multiple lines hit the same page or multiple lines belong to
> different DRAM banks, improving throughput significantly.
> 
> This patch swaps the inner loops in the v7 cache flushing routine to
> carry out the clean operations first on all sets belonging to a given
> way (looping through sets) and then decrementing the way.
> 
> Benchmarks showed that by swapping the ordering in which sets and ways
> are decremented in the v7 cache flushing routine, that uses set/way
> operations, time required to flush caches is reduced significantly,
> owing to improved writebacks throughput to the DRAM controller.
> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
>  arch/arm/mm/cache-v7.S | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index b5c467a..778bcf8 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -146,18 +146,18 @@ flush_levels:
>  	ldr	r7, =0x7fff
>  	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
>  loop1:
> -	mov	r9, r4				@ create working copy of max way size
> +	mov	r9, r7				@ create working copy of max index
>  loop2:
> - ARM(	orr	r11, r10, r9, lsl r5	)	@ factor way and cache number into r11
> - THUMB(	lsl	r6, r9, r5		)
> + ARM(	orr	r11, r10, r4, lsl r5	)	@ factor way and cache number into r11
> + THUMB(	lsl	r6, r4, r5		)
>   THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
> - ARM(	orr	r11, r11, r7, lsl r2	)	@ factor index number into r11
> - THUMB(	lsl	r6, r7, r2		)
> + ARM(	orr	r11, r11, r9, lsl r2	)	@ factor index number into r11
> + THUMB(	lsl	r6, r9, r2		)
>   THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
>  	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way

Acked-by: Catalin Marinas <catalin.marinas@arm.com>


* [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence
  2013-11-19 15:29 [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence Lorenzo Pieralisi
  2013-11-19 16:14 ` Catalin Marinas
@ 2013-11-19 16:58 ` Nicolas Pitre
  2013-11-19 17:04   ` Santosh Shilimkar
  2013-11-19 17:16   ` Catalin Marinas
  2013-11-19 17:35 ` Dave Martin
  2013-12-09 14:24 ` Lorenzo Pieralisi
  3 siblings, 2 replies; 8+ messages in thread
From: Nicolas Pitre @ 2013-11-19 16:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 19 Nov 2013, Lorenzo Pieralisi wrote:

> Set-associative caches on all v7 implementations map the index bits
> to physical addresses LSBs and tag bits to MSBs. On most systems with
> sane DRAM controller configurations, this means that the current v7
> cache flush routine using set/way operations triggers a DRAM memory
> controller precharge/activate for every cache line writeback since the
> cache routine cleans lines by first fixing the index and then looping
> through ways.
> 
> Given the random content of cache tags, swapping the order between
> indexes and ways loops do not prevent DRAM pages precharge and
> activate cycles but at least, on average, improves the chances that
> either multiple lines hit the same page or multiple lines belong to
> different DRAM banks, improving throughput significantly.
> 
> This patch swaps the inner loops in the v7 cache flushing routine to
> carry out the clean operations first on all sets belonging to a given
> way (looping through sets) and then decrementing the way.
> 
> Benchmarks showed that by swapping the ordering in which sets and ways
> are decremented in the v7 cache flushing routine, that uses set/way
> operations, time required to flush caches is reduced significantly,
> owing to improved writebacks throughput to the DRAM controller.
> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Could you include some benchmark results so we have an idea of the 
expected improvement scale?  Other than that...

Acked-by: Nicolas Pitre <nico@linaro.org>

> ---
>  arch/arm/mm/cache-v7.S | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index b5c467a..778bcf8 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -146,18 +146,18 @@ flush_levels:
>  	ldr	r7, =0x7fff
>  	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
>  loop1:
> -	mov	r9, r4				@ create working copy of max way size
> +	mov	r9, r7				@ create working copy of max index
>  loop2:
> - ARM(	orr	r11, r10, r9, lsl r5	)	@ factor way and cache number into r11
> - THUMB(	lsl	r6, r9, r5		)
> + ARM(	orr	r11, r10, r4, lsl r5	)	@ factor way and cache number into r11
> + THUMB(	lsl	r6, r4, r5		)
>   THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
> - ARM(	orr	r11, r11, r7, lsl r2	)	@ factor index number into r11
> - THUMB(	lsl	r6, r7, r2		)
> + ARM(	orr	r11, r11, r9, lsl r2	)	@ factor index number into r11
> + THUMB(	lsl	r6, r9, r2		)
>   THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
>  	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way
> -	subs	r9, r9, #1			@ decrement the way
> +	subs	r9, r9, #1			@ decrement the index
>  	bge	loop2
> -	subs	r7, r7, #1			@ decrement the index
> +	subs	r4, r4, #1			@ decrement the way
>  	bge	loop1
>  skip:
>  	add	r10, r10, #2			@ increment cache number
> -- 
> 1.8.2.2
> 
> 


* [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence
  2013-11-19 16:58 ` Nicolas Pitre
@ 2013-11-19 17:04   ` Santosh Shilimkar
  2013-11-19 17:16   ` Catalin Marinas
  1 sibling, 0 replies; 8+ messages in thread
From: Santosh Shilimkar @ 2013-11-19 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 19 November 2013 11:58 AM, Nicolas Pitre wrote:
> On Tue, 19 Nov 2013, Lorenzo Pieralisi wrote:
> 
>> Set-associative caches on all v7 implementations map the index bits
>> to physical addresses LSBs and tag bits to MSBs. On most systems with
>> sane DRAM controller configurations, this means that the current v7
>> cache flush routine using set/way operations triggers a DRAM memory
>> controller precharge/activate for every cache line writeback since the
>> cache routine cleans lines by first fixing the index and then looping
>> through ways.
>>
>> Given the random content of cache tags, swapping the order between
>> indexes and ways loops do not prevent DRAM pages precharge and
>> activate cycles but at least, on average, improves the chances that
>> either multiple lines hit the same page or multiple lines belong to
>> different DRAM banks, improving throughput significantly.
>>
>> This patch swaps the inner loops in the v7 cache flushing routine to
>> carry out the clean operations first on all sets belonging to a given
>> way (looping through sets) and then decrementing the way.
>>
>> Benchmarks showed that by swapping the ordering in which sets and ways
>> are decremented in the v7 cache flushing routine, that uses set/way
>> operations, time required to flush caches is reduced significantly,
>> owing to improved writebacks throughput to the DRAM controller.
>>
>> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> 
> Could you include some benchmark results so we have an idea of the 
> expected improvement scale?  Other than that...
> 
I am curious about the results as well. For the patch itself:
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>


* [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence
  2013-11-19 16:58 ` Nicolas Pitre
  2013-11-19 17:04   ` Santosh Shilimkar
@ 2013-11-19 17:16   ` Catalin Marinas
  2013-11-19 18:20     ` Lorenzo Pieralisi
  1 sibling, 1 reply; 8+ messages in thread
From: Catalin Marinas @ 2013-11-19 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 19, 2013 at 04:58:58PM +0000, Nicolas Pitre wrote:
> On Tue, 19 Nov 2013, Lorenzo Pieralisi wrote:
> 
> > Set-associative caches on all v7 implementations map the index bits
> > to physical addresses LSBs and tag bits to MSBs. On most systems with
> > sane DRAM controller configurations, this means that the current v7
> > cache flush routine using set/way operations triggers a DRAM memory
> > controller precharge/activate for every cache line writeback since the
> > cache routine cleans lines by first fixing the index and then looping
> > through ways.
> > 
> > Given the random content of cache tags, swapping the order between
> > indexes and ways loops do not prevent DRAM pages precharge and
> > activate cycles but at least, on average, improves the chances that
> > either multiple lines hit the same page or multiple lines belong to
> > different DRAM banks, improving throughput significantly.
> > 
> > This patch swaps the inner loops in the v7 cache flushing routine to
> > carry out the clean operations first on all sets belonging to a given
> > way (looping through sets) and then decrementing the way.
> > 
> > Benchmarks showed that by swapping the ordering in which sets and ways
> > are decremented in the v7 cache flushing routine, that uses set/way
> > operations, time required to flush caches is reduced significantly,
> > owing to improved writebacks throughput to the DRAM controller.
> > 
> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> 
> Could you include some benchmark results so we have an idea of the 
> expected improvement scale?

Lorenzo should have some numbers.

The issue was initially raised by the hardware people, and the ARM ARM
was changed in this respect between revB and revC.

-- 
Catalin


* [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence
  2013-11-19 15:29 [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence Lorenzo Pieralisi
  2013-11-19 16:14 ` Catalin Marinas
  2013-11-19 16:58 ` Nicolas Pitre
@ 2013-11-19 17:35 ` Dave Martin
  2013-12-09 14:24 ` Lorenzo Pieralisi
  3 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2013-11-19 17:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 19, 2013 at 03:29:53PM +0000, Lorenzo Pieralisi wrote:
> Set-associative caches on all v7 implementations map the index bits
> to physical addresses LSBs and tag bits to MSBs. On most systems with
> sane DRAM controller configurations, this means that the current v7
> cache flush routine using set/way operations triggers a DRAM memory
> controller precharge/activate for every cache line writeback since the
> cache routine cleans lines by first fixing the index and then looping
> through ways.
> 
> Given the random content of cache tags, swapping the order between
> indexes and ways loops do not prevent DRAM pages precharge and
> activate cycles but at least, on average, improves the chances that
> either multiple lines hit the same page or multiple lines belong to
> different DRAM banks, improving throughput significantly.
> 
> This patch swaps the inner loops in the v7 cache flushing routine to
> carry out the clean operations first on all sets belonging to a given
> way (looping through sets) and then decrementing the way.
> 
> Benchmarks showed that by swapping the ordering in which sets and ways
> are decremented in the v7 cache flushing routine, that uses set/way
> operations, time required to flush caches is reduced significantly,
> owing to improved writebacks throughput to the DRAM controller.

For the correctness of this patch:

Reviewed-by: Dave Martin <Dave.Martin@arm.com>

My understanding of the performance implications is more limited, so
I'm happy to defer to others on that.

Cheers
---Dave

> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
>  arch/arm/mm/cache-v7.S | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index b5c467a..778bcf8 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -146,18 +146,18 @@ flush_levels:
>  	ldr	r7, =0x7fff
>  	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
>  loop1:
> -	mov	r9, r4				@ create working copy of max way size
> +	mov	r9, r7				@ create working copy of max index
>  loop2:
> - ARM(	orr	r11, r10, r9, lsl r5	)	@ factor way and cache number into r11
> - THUMB(	lsl	r6, r9, r5		)
> + ARM(	orr	r11, r10, r4, lsl r5	)	@ factor way and cache number into r11
> + THUMB(	lsl	r6, r4, r5		)
>   THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
> - ARM(	orr	r11, r11, r7, lsl r2	)	@ factor index number into r11
> - THUMB(	lsl	r6, r7, r2		)
> + ARM(	orr	r11, r11, r9, lsl r2	)	@ factor index number into r11
> + THUMB(	lsl	r6, r9, r2		)
>   THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
>  	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way
> -	subs	r9, r9, #1			@ decrement the way
> +	subs	r9, r9, #1			@ decrement the index
>  	bge	loop2
> -	subs	r7, r7, #1			@ decrement the index
> +	subs	r4, r4, #1			@ decrement the way
>  	bge	loop1
>  skip:
>  	add	r10, r10, #2			@ increment cache number
> -- 
> 1.8.2.2
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence
  2013-11-19 17:16   ` Catalin Marinas
@ 2013-11-19 18:20     ` Lorenzo Pieralisi
  0 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Pieralisi @ 2013-11-19 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 19, 2013 at 05:16:06PM +0000, Catalin Marinas wrote:
> On Tue, Nov 19, 2013 at 04:58:58PM +0000, Nicolas Pitre wrote:
> > On Tue, 19 Nov 2013, Lorenzo Pieralisi wrote:
> > 
> > > Set-associative caches on all v7 implementations map the index bits
> > > to physical addresses LSBs and tag bits to MSBs. On most systems with
> > > sane DRAM controller configurations, this means that the current v7
> > > cache flush routine using set/way operations triggers a DRAM memory
> > > controller precharge/activate for every cache line writeback since the
> > > cache routine cleans lines by first fixing the index and then looping
> > > through ways.
> > > 
> > > Given the random content of cache tags, swapping the order between
> > > indexes and ways loops do not prevent DRAM pages precharge and
> > > activate cycles but at least, on average, improves the chances that
> > > either multiple lines hit the same page or multiple lines belong to
> > > different DRAM banks, improving throughput significantly.
> > > 
> > > This patch swaps the inner loops in the v7 cache flushing routine to
> > > carry out the clean operations first on all sets belonging to a given
> > > way (looping through sets) and then decrementing the way.
> > > 
> > > Benchmarks showed that by swapping the ordering in which sets and ways
> > > are decremented in the v7 cache flushing routine, that uses set/way
> > > operations, time required to flush caches is reduced significantly,
> > > owing to improved writebacks throughput to the DRAM controller.
> > > 
> > > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > 
> > Could you include some benchmark results so we have an idea of the 
> > expected improvement scale?
> 
> Lorenzo should have some numbers.

Thanks for the acks.

Throughput improvements strictly depend on what is in the cache at
runtime, since the throughput to DRAM depends on the tag addresses of the
writebacks. I have seen 2x throughput on TC2 when L2 is likely to contain
sequential tags (same DRAM row, with bank interleaving at row switches). I
say "likely" because I can measure the number of writebacks using PMU
counters to verify L2 dirtiness, but I have no crystal ball: I cannot
check the tag RAM content, I can only craft code that tries to set it to
specific values.

The current kernel code triggers the worst case: every cache line
writeback implies a precharge/activate set of commands to the DRAM, since
the bits mapping sets are PA[15:6] on e.g. the A15 L2, and looping through
ways with a fixed index implies opening a new page on common DRAM
controller configurations.
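As a sanity check on that bit range, assuming a 1 MB, 16-way L2 with 64-byte lines (illustrative numbers only, chosen to match the PA[15:6] claim), the set index indeed occupies PA[15:6]:

```python
# Hypothetical cache geometry; set count = size / (ways * line_size).
size, ways, line = 1 << 20, 16, 64
num_sets = size // (ways * line)                  # 1024 sets
set_lsb = line.bit_length() - 1                   # bit 6, just above the line offset
set_msb = set_lsb + num_sets.bit_length() - 2     # bit 15
assert (set_lsb, set_msb) == (6, 15)              # set index = PA[15:6]
```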

On TC2 I saw the L2 clean/invalidate time go up to 1.8 ms with the
current kernel code.

With this patch applied the timing depends on the L2 content, but it
cannot get worse than the current scenario; it can only improve, and that
is what happens. With a synthetic idle benchmark (that memsets and sleeps)
the worst case goes down to 1.2 ms, and the average follows the same
pattern.
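A toy open-page DRAM model illustrates the effect. Under the optimistic assumption of sequential tags (lines in consecutive sets of one way landing in the same DRAM row; the address layout and row size below are hypothetical), set-inner iteration turns most writebacks into row hits, while way-inner iteration opens a new row on every writeback:

```python
def activates(writebacks, row_bits=15):
    """Count precharge/activate events in a single-bank open-page model."""
    open_row, acts = None, 0
    for addr in writebacks:
        row = addr >> row_bits
        if row != open_row:           # row miss: precharge + activate
            open_row, acts = row, acts + 1
    return acts

ways, sets, line = 4, 256, 64
# Sequential tags: within one way, consecutive sets hold consecutive lines.
addr = lambda way, s: (way << 20) | (s * line)
way_inner = [addr(w, s) for s in range(sets) for w in range(ways)]  # old order
set_inner = [addr(w, s) for w in range(ways) for s in range(sets)]  # new order
assert activates(way_inner) == ways * sets  # every writeback opens a row
assert activates(set_inner) == ways         # one activate per way
```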

I can improve the commit log with HW details. If anyone could give this a
go on systems with embedded L2 that would be grand; I am actually really
keen on getting this code into -next asap for testing reasons.

Thank you,
Lorenzo


* [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence
  2013-11-19 15:29 [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence Lorenzo Pieralisi
                   ` (2 preceding siblings ...)
  2013-11-19 17:35 ` Dave Martin
@ 2013-12-09 14:24 ` Lorenzo Pieralisi
  3 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Pieralisi @ 2013-12-09 14:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

Can I add this patch to the patch system, please? It was merged into -next
two weeks ago, no regressions have been reported, and it has been reviewed
and acked.

Thanks,
Lorenzo

On Tue, Nov 19, 2013 at 03:29:53PM +0000, Lorenzo Pieralisi wrote:
> Set-associative caches on all v7 implementations map the index bits
> to physical addresses LSBs and tag bits to MSBs. On most systems with
> sane DRAM controller configurations, this means that the current v7
> cache flush routine using set/way operations triggers a DRAM memory
> controller precharge/activate for every cache line writeback since the
> cache routine cleans lines by first fixing the index and then looping
> through ways.
> 
> Given the random content of cache tags, swapping the order between
> indexes and ways loops do not prevent DRAM pages precharge and
> activate cycles but at least, on average, improves the chances that
> either multiple lines hit the same page or multiple lines belong to
> different DRAM banks, improving throughput significantly.
> 
> This patch swaps the inner loops in the v7 cache flushing routine to
> carry out the clean operations first on all sets belonging to a given
> way (looping through sets) and then decrementing the way.
> 
> Benchmarks showed that by swapping the ordering in which sets and ways
> are decremented in the v7 cache flushing routine, that uses set/way
> operations, time required to flush caches is reduced significantly,
> owing to improved writebacks throughput to the DRAM controller.
> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
>  arch/arm/mm/cache-v7.S | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index b5c467a..778bcf8 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -146,18 +146,18 @@ flush_levels:
>  	ldr	r7, =0x7fff
>  	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
>  loop1:
> -	mov	r9, r4				@ create working copy of max way size
> +	mov	r9, r7				@ create working copy of max index
>  loop2:
> - ARM(	orr	r11, r10, r9, lsl r5	)	@ factor way and cache number into r11
> - THUMB(	lsl	r6, r9, r5		)
> + ARM(	orr	r11, r10, r4, lsl r5	)	@ factor way and cache number into r11
> + THUMB(	lsl	r6, r4, r5		)
>   THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
> - ARM(	orr	r11, r11, r7, lsl r2	)	@ factor index number into r11
> - THUMB(	lsl	r6, r7, r2		)
> + ARM(	orr	r11, r11, r9, lsl r2	)	@ factor index number into r11
> + THUMB(	lsl	r6, r9, r2		)
>   THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
>  	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way
> -	subs	r9, r9, #1			@ decrement the way
> +	subs	r9, r9, #1			@ decrement the index
>  	bge	loop2
> -	subs	r7, r7, #1			@ decrement the index
> +	subs	r4, r4, #1			@ decrement the way
>  	bge	loop1
>  skip:
>  	add	r10, r10, #2			@ increment cache number
> -- 
> 1.8.2.2
> 


end of thread, other threads:[~2013-12-09 14:24 UTC | newest]

Thread overview: 8+ messages
2013-11-19 15:29 [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence Lorenzo Pieralisi
2013-11-19 16:14 ` Catalin Marinas
2013-11-19 16:58 ` Nicolas Pitre
2013-11-19 17:04   ` Santosh Shilimkar
2013-11-19 17:16   ` Catalin Marinas
2013-11-19 18:20     ` Lorenzo Pieralisi
2013-11-19 17:35 ` Dave Martin
2013-12-09 14:24 ` Lorenzo Pieralisi
