* [PATCH v3 0/2] arm64: change PoC D-cache flush to PoU
@ 2015-12-16 10:11 Ashok Kumar
2015-12-16 10:11 ` [PATCH v3 1/2] arm64: Defer dcache flush in __cpu_copy_user_page Ashok Kumar
2015-12-16 10:11 ` [PATCH v3 2/2] arm64: Use PoU cache instr for I/D coherency Ashok Kumar
From: Ashok Kumar @ 2015-12-16 10:11 UTC
To: linux-arm-kernel

To keep the I-cache and D-cache coherent, cleaning the D-cache to the PoU
(Point of Unification) should be sufficient. There is no need to go all
the way to the PoC (Point of Coherency). In SoCs with more levels of cache,
going to the PoC can hurt performance, because __flush_dcache_area both
cleans and invalidates.

This series introduces a new API, __clean_dcache_area_pou, which only
cleans to the PoU. It also defers the D-cache flush in
__cpu_copy_user_page to __sync_icache_dcache.
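The sketch below is a userspace toy model, not kernel code. It illustrates why the PoU/PoC distinction matters by decoding the two CLIDR_EL1 fields that say how many cache levels must be maintained to reach each point:

- LoUU (bits [29:27]) for the PoU.
- LoC (bits [26:24]) for the PoC.

The helper names are hypothetical. We read the "PoU at L1 and PoC at L3" configuration from patch 2/2 as LoUU = 1 and LoC = 3.

```c
#include <assert.h>
#include <stdint.h>

/* CLIDR_EL1.LoUU, bits [29:27]: cache levels to maintain to reach the PoU. */
static unsigned int clidr_louu(uint64_t clidr)
{
	return (unsigned int)((clidr >> 27) & 0x7);
}

/* CLIDR_EL1.LoC, bits [26:24]: cache levels to maintain to reach the PoC. */
static unsigned int clidr_loc(uint64_t clidr)
{
	return (unsigned int)((clidr >> 24) & 0x7);
}

/* Hypothetical helper: build a CLIDR_EL1 value with only LoUU and LoC set. */
static uint64_t make_clidr(unsigned int louu, unsigned int loc)
{
	assert(louu <= 7 && loc <= 7);
	return ((uint64_t)louu << 27) | ((uint64_t)loc << 24);
}
```

In this model, with LoUU = 1 and LoC = 3, a clean to the PoU only has to write back the L1 D-cache. In contrast, the `dc civac` used by __flush_dcache_area cleans and invalidates every level up to the PoC, which is three levels here.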

Changes since v2 [2]:
Incorporated Mark Rutland's review comments:
* fixed comments
* created a helper function for __sync_icache_dcache

Changes since v1 [1]:
Incorporated Mark Rutland's review comments:
* renamed __flush_dcache_area_pou to __clean_dcache_area_pou
* used the inner shareable domain for the dsb in __clean_dcache_area_pou
* added a common macro for __flush_dcache_area and
  __clean_dcache_area_pou

Thanks,
Ashok
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393527.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393837.html
Ashok Kumar (2):
arm64: Defer dcache flush in __cpu_copy_user_page
arm64: Use PoU cache instr for I/D coherency
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/mm/cache.S | 50 +++++++++++++++++++++++++++++--------
arch/arm64/mm/copypage.c | 3 ++-
arch/arm64/mm/flush.c | 33 +++++++++++++-----------
4 files changed, 60 insertions(+), 27 deletions(-)
--
2.1.0

* [PATCH v3 1/2] arm64: Defer dcache flush in __cpu_copy_user_page
From: Ashok Kumar @ 2015-12-16 10:11 UTC
To: linux-arm-kernel

Defer dcache flushing to __sync_icache_dcache by calling
flush_dcache_page, which clears the PG_dcache_clean flag.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
---
 arch/arm64/mm/copypage.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 13bbc3be..22e4cb4 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -24,8 +24,9 @@
 
 void __cpu_copy_user_page(void *kto, const void *kfrom, unsigned long vaddr)
 {
+	struct page *page = virt_to_page(kto);
 	copy_page(kto, kfrom);
-	__flush_dcache_area(kto, PAGE_SIZE);
+	flush_dcache_page(page);
 }
 EXPORT_SYMBOL_GPL(__cpu_copy_user_page);
 
--
2.1.0
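Below is a minimal userspace model of the deferral in patch 1/2, assuming only the PG_dcache_clean bit matters:

- flush_dcache_page() just clears the bit.
- The actual maintenance happens at most once, in __sync_icache_dcache(), when test_and_set_bit() finds the bit clear.

`struct fake_page` and the `model_*` functions are hypothetical stand-ins. Unlike the kernel's test_and_set_bit(), the bit update here is not atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only PG_dcache_clean is modelled. */
struct fake_page {
	bool dcache_clean;
};

/* Counts how often the (simulated) cache maintenance actually runs. */
static int maintenance_ops;

/* Models arm64 flush_dcache_page(): mark the page as needing maintenance. */
static void model_flush_dcache_page(struct fake_page *p)
{
	assert(p != NULL);
	p->dcache_clean = false;
}

/* Non-atomic model of test_and_set_bit(): set the flag, return the old value. */
static bool model_test_and_set_clean(struct fake_page *p)
{
	bool old = p->dcache_clean;

	p->dcache_clean = true;
	return old;
}

/* Models __sync_icache_dcache(): maintain only if the page was not clean. */
static void model_sync_icache_dcache(struct fake_page *p)
{
	assert(p != NULL);
	if (!model_test_and_set_clean(p))
		maintenance_ops++;	/* stands in for the D/I-cache maintenance */
}
```

In this model, a freshly copied page is marked not clean. The first simulated sync does the maintenance, and later syncs skip it until the page is marked dirty again.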

* [PATCH v3 2/2] arm64: Use PoU cache instr for I/D coherency
From: Ashok Kumar @ 2015-12-16 10:11 UTC
To: linux-arm-kernel

In systems with three levels of cache (PoU at L1 and PoC at L3),
PoC cache flush instructions flush the L2 and L3 caches, which could
affect performance.
For cache flushes for I and D coherency, PoU should suffice.
So change all I and D coherency related cache flushes to PoU.

Introduced a new __clean_dcache_area_pou API for cleaning the dcache to
the PoU and provided a common macro for __flush_dcache_area and
__clean_dcache_area_pou.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
---
 arch/arm64/include/asm/cacheflush.h |  1 +
 arch/arm64/mm/cache.S               | 50 +++++++++++++++++++++++++++++--------
 arch/arm64/mm/flush.c               | 33 +++++++++++++-----------
 3 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index c75b8d0..6a5ecbd 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -68,6 +68,7 @@
 extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
 extern void flush_icache_range(unsigned long start, unsigned long end);
 extern void __flush_dcache_area(void *addr, size_t len);
+extern void __clean_dcache_area_pou(void *addr, size_t len);
 extern long __flush_cache_user_range(unsigned long start, unsigned long end);
 
 static inline void flush_cache_mm(struct mm_struct *mm)
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index eb48d5d..b700a97 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -79,28 +79,56 @@ ENDPROC(flush_icache_range)
 ENDPROC(__flush_cache_user_range)
 
 /*
+ * Macro to perform a data cache maintenance for the interval
+ * [kaddr, kaddr + size)
+ *
+ *	op:		operation passed to dc instruction
+ *	domain:		domain used in dsb instruction
+ *	kaddr:		starting virtual address of the region
+ *	size:		size of the region
+ *	Corrupts:	kaddr, size, tmp1, tmp2
+ */
+	.macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
+	dcache_line_size \tmp1, \tmp2
+	add	\size, \kaddr, \size
+	sub	\tmp2, \tmp1, #1
+	bic	\kaddr, \kaddr, \tmp2
+1:	dc	\op, \kaddr
+	add	\kaddr, \kaddr, \tmp1
+	cmp	\kaddr, \size
+	b.lo	1b
+	dsb	\domain
+	.endm
+
+/*
  *	__flush_dcache_area(kaddr, size)
  *
- *	Ensure that the data held in the page kaddr is written back to the
- *	page in question.
+ *	Ensure that any D-cache lines for the interval [kaddr, kaddr+size)
+ *	are cleaned and invalidated to the PoC.
  *
  *	- kaddr   - kernel address
  *	- size    - size in question
  */
 ENTRY(__flush_dcache_area)
-	dcache_line_size x2, x3
-	add	x1, x0, x1
-	sub	x3, x2, #1
-	bic	x0, x0, x3
-1:	dc	civac, x0			// clean & invalidate D line / unified line
-	add	x0, x0, x2
-	cmp	x0, x1
-	b.lo	1b
-	dsb	sy
+	dcache_by_line_op civac, sy, x0, x1, x2, x3
 	ret
 ENDPROC(__flush_dcache_area)
 
 /*
+ *	__clean_dcache_area_pou(kaddr, size)
+ *
+ *	Ensure that any D-cache lines for the interval [kaddr, kaddr+size)
+ *	are cleaned to the PoU.
+ *
+ *	- kaddr   - kernel address
+ *	- size    - size in question
+ */
+ENTRY(__clean_dcache_area_pou)
+	dcache_by_line_op cvau, ish, x0, x1, x2, x3
+	ret
+ENDPROC(__clean_dcache_area_pou)
+
+/*
  *	__inval_cache_range(start, end)
  *	- start   - start address of region
  *	- end     - end address of region
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index c26b804..46649d6 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -34,19 +34,24 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
 		__flush_icache_all();
 }
 
+static void sync_icache_aliases(void *kaddr, unsigned long len)
+{
+	unsigned long addr = (unsigned long)kaddr;
+
+	if (icache_is_aliasing()) {
+		__clean_dcache_area_pou(kaddr, len);
+		__flush_icache_all();
+	} else {
+		flush_icache_range(addr, addr + len);
+	}
+}
+
 static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 				unsigned long uaddr, void *kaddr,
 				unsigned long len)
 {
-	if (vma->vm_flags & VM_EXEC) {
-		unsigned long addr = (unsigned long)kaddr;
-		if (icache_is_aliasing()) {
-			__flush_dcache_area(kaddr, len);
-			__flush_icache_all();
-		} else {
-			flush_icache_range(addr, addr + len);
-		}
-	}
+	if (vma->vm_flags & VM_EXEC)
+		sync_icache_aliases(kaddr, len);
 }
 
 /*
@@ -74,13 +79,11 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)
 	if (!page_mapping(page))
 		return;
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags)) {
-		__flush_dcache_area(page_address(page),
-				PAGE_SIZE << compound_order(page));
+	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+		sync_icache_aliases(page_address(page),
+				    PAGE_SIZE << compound_order(page));
+	else if (icache_is_aivivt())
 		__flush_icache_all();
-	} else if (icache_is_aivivt()) {
-		__flush_icache_all();
-	}
 }
 
 /*
--
2.1.0
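This is a userspace C model of the address arithmetic in dcache_by_line_op. The function names are hypothetical, and no cache maintenance is performed. The model works in two steps:

1. Like the `dcache_line_size` macro, it takes the line size from CTR_EL0.DminLine (bits [19:16], log2 of the line size in 4-byte words).
2. It mirrors the macro's loop: round kaddr down to a line boundary, count one `dc` per line, and stop once the address reaches kaddr + size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Like dcache_line_size: CTR_EL0.DminLine (bits [19:16]) is log2(words). */
static uint64_t model_dcache_line_size(uint64_t ctr_el0)
{
	return (uint64_t)4 << ((ctr_el0 >> 16) & 0xf);
}

/* Counts the dc operations dcache_by_line_op issues for [kaddr, kaddr + size). */
static size_t model_dcache_by_line_op(uint64_t kaddr, uint64_t size, uint64_t line)
{
	uint64_t end = kaddr + size;	/* add  size = kaddr + size */
	uint64_t mask = line - 1;	/* sub  tmp2 = tmp1 - 1 */
	size_t ops = 0;

	assert(line != 0 && (line & mask) == 0);	/* power-of-two line size */
	kaddr &= ~mask;			/* bic  align kaddr down to a line */
	do {
		ops++;			/* 1:   dc op, kaddr */
		kaddr += line;		/* add  kaddr = kaddr + tmp1 */
	} while (kaddr < end);		/* cmp + b.lo (unsigned lower) */
	return ops;
}
```

The `dc` comes before the `cmp`/`b.lo` pair, so the loop body runs at least once. As a result, even `size == 0` issues one operation in this model.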

* [PATCH v3 2/2] arm64: Use PoU cache instr for I/D coherency
From: Catalin Marinas @ 2015-12-16 11:39 UTC
To: linux-arm-kernel

On Wed, Dec 16, 2015 at 02:11:30AM -0800, Ashok Kumar wrote:
> @@ -74,13 +79,11 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)
>  	if (!page_mapping(page))
>  		return;
>
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags)) {
> -		__flush_dcache_area(page_address(page),
> -				PAGE_SIZE << compound_order(page));
> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +		sync_icache_aliases(page_address(page),
> +				    PAGE_SIZE << compound_order(page));
> +	else if (icache_is_aivivt())
>  		__flush_icache_all();
> -	} else if (icache_is_aivivt()) {
> -		__flush_icache_all();
> -	}
>  }

You changed the original code path slightly here. We had a
__flush_icache_all() even for non-aliasing VIPT, but it now does the
I-cache invalidation per page. It may be an improvement; I can't tell
without benchmarks, but you should at least mention this in the commit
log so that we remember in the future.

Apart from this:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
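Catalin's point can be checked with a small truth-table model of the two versions of __sync_icache_dcache. This is only a sketch, and only the I-cache side of the maintenance is tracked. The enum and the boolean parameters are hypothetical stand-ins for:

- PG_dcache_clean
- icache_is_aliasing()
- icache_is_aivivt()

```c
#include <assert.h>
#include <stdbool.h>

/* I-cache maintenance a path ends up performing (toy model). */
enum ic_action {
	IC_NONE,		/* no I-cache maintenance */
	IC_INVAL_ALL,		/* __flush_icache_all() */
	IC_INVAL_RANGE,		/* flush_icache_range() over the page */
};

/* Before the patch: a page that was not clean always got a full invalidate. */
static enum ic_action old_sync_icache(bool was_clean, bool aliasing, bool aivivt)
{
	(void)aliasing;		/* the old code did not consult this */
	if (!was_clean)
		return IC_INVAL_ALL;
	return aivivt ? IC_INVAL_ALL : IC_NONE;
}

/* After the patch: sync_icache_aliases() picks full vs. range invalidation. */
static enum ic_action new_sync_icache(bool was_clean, bool aliasing, bool aivivt)
{
	if (!was_clean)
		return aliasing ? IC_INVAL_ALL : IC_INVAL_RANGE;
	return aivivt ? IC_INVAL_ALL : IC_NONE;
}
```

In this model, the only row that changes is a not-yet-clean page on a non-aliasing I-cache. It previously got __flush_icache_all() and now gets a range invalidation through flush_icache_range(), which is the behaviour change Catalin asked to have recorded in the commit log.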

* [PATCH v3 2/2] arm64: Use PoU cache instr for I/D coherency
From: Will Deacon @ 2015-12-16 12:18 UTC
To: linux-arm-kernel

On Wed, Dec 16, 2015 at 02:11:30AM -0800, Ashok Kumar wrote:
> In systems with three levels of cache (PoU at L1 and PoC at L3),
> PoC cache flush instructions flush the L2 and L3 caches, which could
> affect performance.
> For cache flushes for I and D coherency, PoU should suffice.
> So change all I and D coherency related cache flushes to PoU.
>
> Introduced a new __clean_dcache_area_pou API for cleaning the dcache to
> the PoU and provided a common macro for __flush_dcache_area and
> __clean_dcache_area_pou.
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
> ---
>  arch/arm64/include/asm/cacheflush.h |  1 +
>  arch/arm64/mm/cache.S               | 50 +++++++++++++++++++++++++++++--------
>  arch/arm64/mm/flush.c               | 33 +++++++++++++-----------
>  3 files changed, 58 insertions(+), 26 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index c75b8d0..6a5ecbd 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -68,6 +68,7 @@
>  extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
>  extern void flush_icache_range(unsigned long start, unsigned long end);
>  extern void __flush_dcache_area(void *addr, size_t len);
> +extern void __clean_dcache_area_pou(void *addr, size_t len);
>  extern long __flush_cache_user_range(unsigned long start, unsigned long end);
>
>  static inline void flush_cache_mm(struct mm_struct *mm)
> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index eb48d5d..b700a97 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -79,28 +79,56 @@ ENDPROC(flush_icache_range)
>  ENDPROC(__flush_cache_user_range)
>
>  /*
> + * Macro to perform a data cache maintenance for the interval
> + * [kaddr, kaddr + size)
> + *
> + *	op:		operation passed to dc instruction
> + *	domain:		domain used in dsb instruction
> + *	kaddr:		starting virtual address of the region
> + *	size:		size of the region
> + *	Corrupts:	kaddr, size, tmp1, tmp2
> + */
> +	.macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
> +	dcache_line_size \tmp1, \tmp2
> +	add	\size, \kaddr, \size
> +	sub	\tmp2, \tmp1, #1
> +	bic	\kaddr, \kaddr, \tmp2
> +1:	dc	\op, \kaddr
> +	add	\kaddr, \kaddr, \tmp1
> +	cmp	\kaddr, \size
> +	b.lo	1b
> +	dsb	\domain
> +	.endm

Minor comment, but can you stick this in proc-macros.S and change that
label from 1: to something like 9998 please?

Other than that, looks good. I can take the next version for 4.5.

Cheers,

Will