* [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO
@ 2023-11-14 14:33 Jisheng Zhang
2023-11-14 14:33 ` [PATCH RESEND v4 1/2] " Jisheng Zhang
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Jisheng Zhang @ 2023-11-14 14:33 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou; +Cc: linux-riscv, linux-kernel
Previously, we use alternative mechanism to dynamically patch
the CMO operations for THEAD C906/C910 during boot for performance
reason. But as pointed out by Arnd, "there is already a significant
cost in accessing the invalidated cache lines afterwards, which is
likely going to be much higher than the cost of an indirect branch".
And indeed, there's no performance difference with GMAC and EMMC per
my test on Sipeed Lichee Pi 4A board.
Use riscv_nonstd_cache_ops for THEAD C906/C910 CMO to simplify
the alternative code, and to acchieve Arnd's goal -- "I think
moving the THEAD ops at the same level as all nonstandard operations
makes sense, but I'd still leave CMO as an explicit fast path that
avoids the indirect branch. This seems like the right thing to do both
for readability and for platforms on which the indirect branch has a
noticeable overhead."
To make bisect easy, I use two patches here: patch1 does the conversion
which just mimics current CMO behavior via. riscv_nonstd_cache_ops, I
assume no functionalities changes. patch2 uses T-HEAD PA based CMO
instructions so that we don't need to covert PA to VA.
since v4:
- rebase on 6.7rc1
since v3:
- collect Reviewed-by tag
since v2:
- collect Reviewed-by tag (but missed them in fact)
- fix typo
since v1:
- collect Tested-by tag
- add patch2 to use T-HEAD PA based CMO instructions.
Jisheng Zhang (2):
riscv: errata: thead: use riscv_nonstd_cache_ops for CMO
riscv: errata: thead: use pa based instructions for CMO
arch/riscv/Kconfig.errata | 1 +
arch/riscv/errata/thead/errata.c | 69 +++++++++++++++++++++++++++-
arch/riscv/include/asm/errata_list.h | 50 +++-----------------
3 files changed, 74 insertions(+), 46 deletions(-)
--
2.40.0
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH RESEND v4 1/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO
2023-11-14 14:33 [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO Jisheng Zhang
@ 2023-11-14 14:33 ` Jisheng Zhang
2023-11-14 14:33 ` [PATCH RESEND v4 2/2] riscv: errata: thead: use pa based instructions " Jisheng Zhang
2024-01-11 14:50 ` [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops " patchwork-bot+linux-riscv
2 siblings, 0 replies; 4+ messages in thread
From: Jisheng Zhang @ 2023-11-14 14:33 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv, linux-kernel, Emil Renner Berthing, Conor Dooley
Previously, we use alternative mechanism to dynamically patch
the CMO operations for THEAD C906/C910 during boot for performance
reason. But as pointed out by Arnd, "there is already a significant
cost in accessing the invalidated cache lines afterwards, which is
likely going to be much higher than the cost of an indirect branch".
And indeed, there's no performance difference with GMAC and EMMC per
my test on Sipeed Lichee Pi 4A board.
Use riscv_nonstd_cache_ops for THEAD C906/C910 CMO to simplify
the alternative code, and to acchieve Arnd's goal -- "I think
moving the THEAD ops at the same level as all nonstandard operations
makes sense, but I'd still leave CMO as an explicit fast path that
avoids the indirect branch. This seems like the right thing to do both
for readability and for platforms on which the indirect branch has a
noticeable overhead."
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
---
arch/riscv/Kconfig.errata | 1 +
arch/riscv/errata/thead/errata.c | 75 +++++++++++++++++++++++++++-
arch/riscv/include/asm/errata_list.h | 50 +++----------------
3 files changed, 80 insertions(+), 46 deletions(-)
diff --git a/arch/riscv/Kconfig.errata b/arch/riscv/Kconfig.errata
index e2c731cfed8c..dedb8b238e73 100644
--- a/arch/riscv/Kconfig.errata
+++ b/arch/riscv/Kconfig.errata
@@ -79,6 +79,7 @@ config ERRATA_THEAD_CMO
depends on ERRATA_THEAD && MMU
select DMA_DIRECT_REMAP
select RISCV_DMA_NONCOHERENT
+ select RISCV_NONSTANDARD_CACHE_OPS
default y
help
This will apply the cache management errata to handle the
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index 0554ed4bf087..c07d957b1468 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -12,8 +12,10 @@
#include <asm/alternative.h>
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
+#include <asm/dma-noncoherent.h>
#include <asm/errata_list.h>
#include <asm/hwprobe.h>
+#include <asm/io.h>
#include <asm/patch.h>
#include <asm/vendorid_list.h>
@@ -33,6 +35,75 @@ static bool errata_probe_pbmt(unsigned int stage,
return false;
}
+/*
+ * th.dcache.ipa rs1 (invalidate, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ * 0000001 01010 rs1 000 00000 0001011
+ * th.dcache.iva rs1 (invalidate, virtual address)
+ * 0000001 00110 rs1 000 00000 0001011
+ *
+ * th.dcache.cpa rs1 (clean, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ * 0000001 01001 rs1 000 00000 0001011
+ * th.dcache.cva rs1 (clean, virtual address)
+ * 0000001 00101 rs1 000 00000 0001011
+ *
+ * th.dcache.cipa rs1 (clean then invalidate, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ * 0000001 01011 rs1 000 00000 0001011
+ * th.dcache.civa rs1 (clean then invalidate, virtual address)
+ * 0000001 00111 rs1 000 00000 0001011
+ *
+ * th.sync.s (make sure all cache operations finished)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ * 0000000 11001 00000 000 00000 0001011
+ */
+#define THEAD_INVAL_A0 ".long 0x0265000b"
+#define THEAD_CLEAN_A0 ".long 0x0255000b"
+#define THEAD_FLUSH_A0 ".long 0x0275000b"
+#define THEAD_SYNC_S ".long 0x0190000b"
+
+#define THEAD_CMO_OP(_op, _start, _size, _cachesize) \
+asm volatile("mv a0, %1\n\t" \
+ "j 2f\n\t" \
+ "3:\n\t" \
+ THEAD_##_op##_A0 "\n\t" \
+ "add a0, a0, %0\n\t" \
+ "2:\n\t" \
+ "bltu a0, %2, 3b\n\t" \
+ THEAD_SYNC_S \
+ : : "r"(_cachesize), \
+ "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \
+ "r"((unsigned long)(_start) + (_size)) \
+ : "a0")
+
+static void thead_errata_cache_inv(phys_addr_t paddr, size_t size)
+{
+ void *vaddr = phys_to_virt(paddr);
+
+ THEAD_CMO_OP(INVAL, vaddr, size, riscv_cbom_block_size);
+}
+
+static void thead_errata_cache_wback(phys_addr_t paddr, size_t size)
+{
+ void *vaddr = phys_to_virt(paddr);
+
+ THEAD_CMO_OP(CLEAN, vaddr, size, riscv_cbom_block_size);
+}
+
+static void thead_errata_cache_wback_inv(phys_addr_t paddr, size_t size)
+{
+ void *vaddr = phys_to_virt(paddr);
+
+ THEAD_CMO_OP(FLUSH, vaddr, size, riscv_cbom_block_size);
+}
+
+static const struct riscv_nonstd_cache_ops thead_errata_cmo_ops = {
+ .wback = &thead_errata_cache_wback,
+ .inv = &thead_errata_cache_inv,
+ .wback_inv = &thead_errata_cache_wback_inv,
+};
+
static bool errata_probe_cmo(unsigned int stage,
unsigned long arch_id, unsigned long impid)
{
@@ -48,6 +119,7 @@ static bool errata_probe_cmo(unsigned int stage,
if (stage == RISCV_ALTERNATIVES_BOOT) {
riscv_cbom_block_size = L1_CACHE_BYTES;
riscv_noncoherent_supported();
+ riscv_noncoherent_register_cache_ops(&thead_errata_cmo_ops);
}
return true;
@@ -77,8 +149,7 @@ static u32 thead_errata_probe(unsigned int stage,
if (errata_probe_pbmt(stage, archid, impid))
cpu_req_errata |= BIT(ERRATA_THEAD_PBMT);
- if (errata_probe_cmo(stage, archid, impid))
- cpu_req_errata |= BIT(ERRATA_THEAD_CMO);
+ errata_probe_cmo(stage, archid, impid);
if (errata_probe_pmu(stage, archid, impid))
cpu_req_errata |= BIT(ERRATA_THEAD_PMU);
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 83ed25e43553..ea33288f8a25 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -24,9 +24,8 @@
#ifdef CONFIG_ERRATA_THEAD
#define ERRATA_THEAD_PBMT 0
-#define ERRATA_THEAD_CMO 1
-#define ERRATA_THEAD_PMU 2
-#define ERRATA_THEAD_NUMBER 3
+#define ERRATA_THEAD_PMU 1
+#define ERRATA_THEAD_NUMBER 2
#endif
#ifdef __ASSEMBLY__
@@ -94,54 +93,17 @@ asm volatile(ALTERNATIVE( \
#define ALT_THEAD_PMA(_val)
#endif
-/*
- * th.dcache.ipa rs1 (invalidate, physical address)
- * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
- * 0000001 01010 rs1 000 00000 0001011
- * th.dache.iva rs1 (invalida, virtual address)
- * 0000001 00110 rs1 000 00000 0001011
- *
- * th.dcache.cpa rs1 (clean, physical address)
- * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
- * 0000001 01001 rs1 000 00000 0001011
- * th.dcache.cva rs1 (clean, virtual address)
- * 0000001 00101 rs1 000 00000 0001011
- *
- * th.dcache.cipa rs1 (clean then invalidate, physical address)
- * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
- * 0000001 01011 rs1 000 00000 0001011
- * th.dcache.civa rs1 (... virtual address)
- * 0000001 00111 rs1 000 00000 0001011
- *
- * th.sync.s (make sure all cache operations finished)
- * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
- * 0000000 11001 00000 000 00000 0001011
- */
-#define THEAD_INVAL_A0 ".long 0x0265000b"
-#define THEAD_CLEAN_A0 ".long 0x0255000b"
-#define THEAD_FLUSH_A0 ".long 0x0275000b"
-#define THEAD_SYNC_S ".long 0x0190000b"
-
#define ALT_CMO_OP(_op, _start, _size, _cachesize) \
-asm volatile(ALTERNATIVE_2( \
- __nops(6), \
+asm volatile(ALTERNATIVE( \
+ __nops(5), \
"mv a0, %1\n\t" \
"j 2f\n\t" \
"3:\n\t" \
CBO_##_op(a0) \
"add a0, a0, %0\n\t" \
"2:\n\t" \
- "bltu a0, %2, 3b\n\t" \
- "nop", 0, RISCV_ISA_EXT_ZICBOM, CONFIG_RISCV_ISA_ZICBOM, \
- "mv a0, %1\n\t" \
- "j 2f\n\t" \
- "3:\n\t" \
- THEAD_##_op##_A0 "\n\t" \
- "add a0, a0, %0\n\t" \
- "2:\n\t" \
- "bltu a0, %2, 3b\n\t" \
- THEAD_SYNC_S, THEAD_VENDOR_ID, \
- ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \
+ "bltu a0, %2, 3b\n\t", \
+ 0, RISCV_ISA_EXT_ZICBOM, CONFIG_RISCV_ISA_ZICBOM) \
: : "r"(_cachesize), \
"r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \
"r"((unsigned long)(_start) + (_size)) \
--
2.40.0
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH RESEND v4 2/2] riscv: errata: thead: use pa based instructions for CMO
2023-11-14 14:33 [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO Jisheng Zhang
2023-11-14 14:33 ` [PATCH RESEND v4 1/2] " Jisheng Zhang
@ 2023-11-14 14:33 ` Jisheng Zhang
2024-01-11 14:50 ` [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops " patchwork-bot+linux-riscv
2 siblings, 0 replies; 4+ messages in thread
From: Jisheng Zhang @ 2023-11-14 14:33 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv, linux-kernel, Guo Ren
T-HEAD CPUs such as C906/C910/C920 support phy address based CMO, use
them so that we don't need to convert to virt address.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
---
arch/riscv/errata/thead/errata.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index c07d957b1468..b1c410bbc1ae 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -58,9 +58,9 @@ static bool errata_probe_pbmt(unsigned int stage,
* | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
* 0000000 11001 00000 000 00000 0001011
*/
-#define THEAD_INVAL_A0 ".long 0x0265000b"
-#define THEAD_CLEAN_A0 ".long 0x0255000b"
-#define THEAD_FLUSH_A0 ".long 0x0275000b"
+#define THEAD_INVAL_A0 ".long 0x02a5000b"
+#define THEAD_CLEAN_A0 ".long 0x0295000b"
+#define THEAD_FLUSH_A0 ".long 0x02b5000b"
#define THEAD_SYNC_S ".long 0x0190000b"
#define THEAD_CMO_OP(_op, _start, _size, _cachesize) \
@@ -79,23 +79,17 @@ asm volatile("mv a0, %1\n\t" \
static void thead_errata_cache_inv(phys_addr_t paddr, size_t size)
{
- void *vaddr = phys_to_virt(paddr);
-
- THEAD_CMO_OP(INVAL, vaddr, size, riscv_cbom_block_size);
+ THEAD_CMO_OP(INVAL, paddr, size, riscv_cbom_block_size);
}
static void thead_errata_cache_wback(phys_addr_t paddr, size_t size)
{
- void *vaddr = phys_to_virt(paddr);
-
- THEAD_CMO_OP(CLEAN, vaddr, size, riscv_cbom_block_size);
+ THEAD_CMO_OP(CLEAN, paddr, size, riscv_cbom_block_size);
}
static void thead_errata_cache_wback_inv(phys_addr_t paddr, size_t size)
{
- void *vaddr = phys_to_virt(paddr);
-
- THEAD_CMO_OP(FLUSH, vaddr, size, riscv_cbom_block_size);
+ THEAD_CMO_OP(FLUSH, paddr, size, riscv_cbom_block_size);
}
static const struct riscv_nonstd_cache_ops thead_errata_cmo_ops = {
--
2.40.0
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO
2023-11-14 14:33 [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO Jisheng Zhang
2023-11-14 14:33 ` [PATCH RESEND v4 1/2] " Jisheng Zhang
2023-11-14 14:33 ` [PATCH RESEND v4 2/2] riscv: errata: thead: use pa based instructions " Jisheng Zhang
@ 2024-01-11 14:50 ` patchwork-bot+linux-riscv
2 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+linux-riscv @ 2024-01-11 14:50 UTC (permalink / raw)
To: Jisheng Zhang; +Cc: linux-riscv, paul.walmsley, palmer, aou, linux-kernel
Hello:
This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:
On Tue, 14 Nov 2023 22:33:36 +0800 you wrote:
> Previously, we use alternative mechanism to dynamically patch
> the CMO operations for THEAD C906/C910 during boot for performance
> reason. But as pointed out by Arnd, "there is already a significant
> cost in accessing the invalidated cache lines afterwards, which is
> likely going to be much higher than the cost of an indirect branch".
> And indeed, there's no performance difference with GMAC and EMMC per
> my test on Sipeed Lichee Pi 4A board.
>
> [...]
Here is the summary with links:
- [RESEND,v4,1/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO
https://git.kernel.org/riscv/c/a4ff64edf9ed
- [RESEND,v4,2/2] riscv: errata: thead: use pa based instructions for CMO
https://git.kernel.org/riscv/c/3690492612ec
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-01-11 14:51 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-14 14:33 [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops for CMO Jisheng Zhang
2023-11-14 14:33 ` [PATCH RESEND v4 1/2] " Jisheng Zhang
2023-11-14 14:33 ` [PATCH RESEND v4 2/2] riscv: errata: thead: use pa based instructions " Jisheng Zhang
2024-01-11 14:50 ` [PATCH RESEND v4 0/2] riscv: errata: thead: use riscv_nonstd_cache_ops " patchwork-bot+linux-riscv
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox