* [PATCH v2 0/9] CFI for ARM32 using LLVM
@ 2024-03-07 14:21 Linus Walleij
2024-03-07 14:22 ` [PATCH v2 1/9] ARM: Support CLANG CFI Linus Walleij
` (8 more replies)
0 siblings, 9 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:21 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
This is a first patch set to support CLANG CFI (Control Flow
Integrity) on ARM32.
For information about what CFI is, see:
https://clang.llvm.org/docs/ControlFlowIntegrity.html
For the kernel KCFI flavor, see:
https://lwn.net/Articles/898040/
The base changes required to bring up KCFI on ARM32 were mostly
related to the use of custom vtables in the kernel, combined
with defines to call into these vtable members directly from
sites where they are used.
The approach to all of these vtable+define issues has been
the same: instead of a define, wrap the call in a static inline
function that explicitly calls the vtable member.
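As a rough userspace illustration of the define-to-inline conversion
described above (not code from this series): the demo_* names are
hypothetical, and __nocfi is defined as empty here so the sketch
builds with any compiler, whereas in the kernel it is a real
attribute.

```c
#include <assert.h>

/* Stand-in for the kernel's __nocfi annotation. It is left empty here
 * (an assumption for this sketch) so the file builds with any compiler. */
#define __nocfi

/* Hypothetical vtable, loosely modelled on struct cpu_tlb_fns. */
struct demo_tlb_fns {
	void (*flush_kern_range)(unsigned long start, unsigned long end);
};

static unsigned long demo_flushed_bytes;

/* One per-CPU implementation that the vtable can point at. */
static void demo_v7_flush_kern_range(unsigned long start, unsigned long end)
{
	demo_flushed_bytes += end - start;
}

static struct demo_tlb_fns demo_tlb = {
	.flush_kern_range = demo_v7_flush_kern_range,
};

/*
 * Old style, a bare alias with no place to hang an attribute:
 *   #define demo_flush_kern_range demo_tlb.flush_kern_range
 *
 * New style: a real function with a prototype, which can be annotated.
 */
static inline void __nocfi demo_flush_kern_range(unsigned long start,
						 unsigned long end)
{
	demo_tlb.flush_kern_range(start, end);
}

/* Callers look the same as before the conversion. */
static unsigned long demo_tlb_example(void)
{
	demo_flushed_bytes = 0;
	demo_flush_kern_range(0x1000, 0x3000);
	assert(demo_flushed_bytes == 0x2000);
	return demo_flushed_bytes;
}
```

The call sites do not change; only the definition moves from a
preprocessor alias to a function that can carry the annotation.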
The permissive mode handles the new breakpoint type (0x03) that
LLVM CLANG is defining.
To runtime-test the patches:
- Enable CONFIG_LKDTM
- echo CFI_FORWARD_PROTO > /sys/kernel/debug/provoke-crash/DIRECT
The patch set has been booted to userspace on the following
test platforms:
- Arm Versatile (QEMU)
- Arm Versatile Express (QEMU)
- multi_v7 booted on Versatile Express (QEMU)
- Footbridge Netwinder (SA110 ARMv4)
- Ux500 (ARMv7 SMP)
I am not saying there will not be corner cases that we need
to fix in addition to this, but it is enough to get started.
Looking at what was fixed for arm64 I am a bit wary that
e.g. BPF might need something to trampoline properly.
But hopefully people can get to testing it and help me fix
remaining issues before the final version, or we can fix it
in-tree.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Changes in v2:
- Add the missing ftrace graph tracer stub.
- Enable permissive mode using a breakpoint handler.
- Link to v1: https://lore.kernel.org/r/20240225-arm32-cfi-v1-0-6943306f065b@linaro.org
---
Linus Walleij (9):
ARM: Support CLANG CFI
ARM: tlbflush: Make TLB flushes into static inlines
ARM: bugs: Check in the vtable instead of defined aliases
ARM: proc: Use inlines instead of defines
ARM: delay: Turn delay functions into static inlines
ARM: turn CPU cache flush functions into static inlines
ARM: page: Turn highpage accesses into static inlines
ARM: ftrace: Define ftrace_stub_graph
ARM: KCFI: Allow permissive CFI mode
arch/arm/Kconfig | 1 +
arch/arm/common/mcpm_entry.c | 10 ++-----
arch/arm/include/asm/cacheflush.h | 45 ++++++++++++++++++++++------
arch/arm/include/asm/delay.h | 16 ++++++++--
arch/arm/include/asm/hw_breakpoint.h | 1 +
arch/arm/include/asm/page.h | 36 ++++++++++++++++++-----
arch/arm/include/asm/proc-fns.h | 57 +++++++++++++++++++++++++++++-------
arch/arm/include/asm/tlbflush.h | 18 ++++++++----
arch/arm/kernel/bugs.c | 2 +-
arch/arm/kernel/entry-ftrace.S | 4 +++
arch/arm/kernel/hw_breakpoint.c | 10 +++++++
arch/arm/mach-sunxi/mc_smp.c | 7 +----
arch/arm/mm/dma.h | 28 ++++++++++++++----
arch/arm/mm/proc-syms.c | 7 +----
arch/arm/mm/proc-v7-bugs.c | 4 +--
15 files changed, 182 insertions(+), 64 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240115-arm32-cfi-65d60f201108
Best regards,
--
Linus Walleij <linus.walleij@linaro.org>
* [PATCH v2 1/9] ARM: Support CLANG CFI
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 14:22 ` [PATCH v2 2/9] ARM: tlbflush: Make TLB flushes into static inlines Linus Walleij
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
Support Control Flow Integrity (CFI) when compiling with
CLANG.
In the as-of-writing LLVM CLANG implementation (v17)
the 32-bit ARM platform is supported by the generic CFI
implementation, which isn't tailored specifically for ARM32
but works well enough to enable the feature.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0af6709570d1..1216656a40bc 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -34,6 +34,7 @@ config ARM
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_CFI_CLANG
select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_USE_BUILTIN_BSWAP
--
2.34.1
* [PATCH v2 2/9] ARM: tlbflush: Make TLB flushes into static inlines
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
2024-03-07 14:22 ` [PATCH v2 1/9] ARM: Support CLANG CFI Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 14:22 ` [PATCH v2 3/9] ARM: bugs: Check in the vtable instead of defined aliases Linus Walleij
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
Instead of just using defines to define the TLB flush functions,
use static inlines.
This has the upside that we can tag those as __nocfi so we can
execute a CFI-enabled kernel.
Move the variables around a bit so the functions can find their
global variable cpu_tlb.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/include/asm/tlbflush.h | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 38c6e4a2a0b6..7340518ee0e9 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -210,13 +210,23 @@ struct cpu_tlb_fns {
unsigned long tlb_flags;
};
+extern struct cpu_tlb_fns cpu_tlb;
+
+#define __cpu_tlb_flags cpu_tlb.tlb_flags
+
/*
* Select the calling method
*/
#ifdef MULTI_TLB
-#define __cpu_flush_user_tlb_range cpu_tlb.flush_user_range
-#define __cpu_flush_kern_tlb_range cpu_tlb.flush_kern_range
+static inline void __nocfi __cpu_flush_user_tlb_range(unsigned long s, unsigned long e, struct vm_area_struct *vma)
+{
+ cpu_tlb.flush_user_range(s, e, vma);
+}
+static inline void __nocfi __cpu_flush_kern_tlb_range(unsigned long s, unsigned long e)
+{
+ cpu_tlb.flush_kern_range(s, e);
+}
#else
@@ -228,10 +238,6 @@ extern void __cpu_flush_kern_tlb_range(unsigned long, unsigned long);
#endif
-extern struct cpu_tlb_fns cpu_tlb;
-
-#define __cpu_tlb_flags cpu_tlb.tlb_flags
-
/*
* TLB Management
* ==============
--
2.34.1
* [PATCH v2 3/9] ARM: bugs: Check in the vtable instead of defined aliases
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
2024-03-07 14:22 ` [PATCH v2 1/9] ARM: Support CLANG CFI Linus Walleij
2024-03-07 14:22 ` [PATCH v2 2/9] ARM: tlbflush: Make TLB flushes into static inlines Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 14:22 ` [PATCH v2 4/9] ARM: proc: Use inlines instead of defines Linus Walleij
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
Instead of checking if cpu_check_bugs() exists, check for this
callback directly in the CPU vtable: this is better because the
function is just a define to the vtable entry and this is why
the code works. But we want to be able to specify a proper
function for cpu_check_bugs() so look into the vtable instead.
In bugs.c assign PROC_VTABLE(switch_mm) instead of
assigning cpu_do_switch_mm where again this is just a define
into the vtable: this makes it possible to make
cpu_do_switch_mm() into a real function.
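To illustrate why the NULL check has to move to the vtable slot, here
is a small userspace sketch. All demo_* names and DEMO_PROC_VTABLE()
are hypothetical stand-ins, and __nocfi is defined as empty. Once
cpu_check_bugs() becomes a real function its address is never NULL,
so only the slot can tell whether a callback was installed.

```c
#include <assert.h>
#include <stddef.h>

/* Empty stand-in for the kernel attribute (assumption for this sketch). */
#define __nocfi

/* Hypothetical per-CPU vtable with an optional callback. */
struct demo_processor {
	void (*check_bugs)(void);
};

static int demo_bug_checks_run;

static void demo_v7_check_bugs(void)
{
	demo_bug_checks_run++;
}

/* Static storage: check_bugs starts out NULL. */
static struct demo_processor demo_proc;

/* Hypothetical equivalent of PROC_VTABLE(): names a slot in the vtable. */
#define DEMO_PROC_VTABLE(f) demo_proc.f

/* A real function wrapping the slot; its own address is never NULL. */
static inline void __nocfi demo_cpu_check_bugs(void)
{
	DEMO_PROC_VTABLE(check_bugs)();
}

/* Returns 1 if the callback was present and called, 0 otherwise. */
static int demo_check_other_bugs(void)
{
	/* Test the slot, not the wrapper function. */
	if (DEMO_PROC_VTABLE(check_bugs)) {
		demo_cpu_check_bugs();
		return 1;
	}
	return 0;
}

/* Exercise both the empty and the populated slot. */
static int demo_bugs_example(void)
{
	demo_proc.check_bugs = NULL;
	assert(demo_check_other_bugs() == 0);
	demo_proc.check_bugs = demo_v7_check_bugs;
	assert(demo_check_other_bugs() == 1);
	return demo_bug_checks_run;
}
```

Assigning through the slot macro, as the SPECTRE_V2 workaround code
does with PROC_VTABLE(switch_mm), keeps working for the same reason:
the macro still names an lvalue, which a function name does not.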
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/kernel/bugs.c | 2 +-
arch/arm/mm/proc-v7-bugs.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/kernel/bugs.c b/arch/arm/kernel/bugs.c
index 087bce6ec8e9..35d39efb51ed 100644
--- a/arch/arm/kernel/bugs.c
+++ b/arch/arm/kernel/bugs.c
@@ -7,7 +7,7 @@
void check_other_bugs(void)
{
#ifdef MULTI_CPU
- if (cpu_check_bugs)
+ if (PROC_VTABLE(check_bugs))
cpu_check_bugs();
#endif
}
diff --git a/arch/arm/mm/proc-v7-bugs.c b/arch/arm/mm/proc-v7-bugs.c
index 8bc7a2d6d6c7..ea3ee2bd7b56 100644
--- a/arch/arm/mm/proc-v7-bugs.c
+++ b/arch/arm/mm/proc-v7-bugs.c
@@ -87,14 +87,14 @@ static unsigned int spectre_v2_install_workaround(unsigned int method)
case SPECTRE_V2_METHOD_HVC:
per_cpu(harden_branch_predictor_fn, cpu) =
call_hvc_arch_workaround_1;
- cpu_do_switch_mm = cpu_v7_hvc_switch_mm;
+ PROC_VTABLE(switch_mm) = cpu_v7_hvc_switch_mm;
spectre_v2_method = "hypervisor";
break;
case SPECTRE_V2_METHOD_SMC:
per_cpu(harden_branch_predictor_fn, cpu) =
call_smc_arch_workaround_1;
- cpu_do_switch_mm = cpu_v7_smc_switch_mm;
+ PROC_VTABLE(switch_mm) = cpu_v7_smc_switch_mm;
spectre_v2_method = "firmware";
break;
}
--
2.34.1
* [PATCH v2 4/9] ARM: proc: Use inlines instead of defines
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
` (2 preceding siblings ...)
2024-03-07 14:22 ` [PATCH v2 3/9] ARM: bugs: Check in the vtable instead of defined aliases Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 14:22 ` [PATCH v2 5/9] ARM: delay: Turn delay functions into static inlines Linus Walleij
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
We currently access the per-cpu vtable with defines such
as:
#define cpu_proc_init PROC_VTABLE(_proc_init)
Convert all of these instances to static inlines instead:
static inline __nocfi void cpu_proc_init(void)
{
PROC_VTABLE(_proc_init)();
}
This has the upside that we can add the __nocfi tag to
the inline function so CFI can skip over this and work,
and we can simplify some platform code that was looking
into the symbol table to be able to call cpu_reset(),
now we can just call it instead.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/common/mcpm_entry.c | 10 ++------
arch/arm/include/asm/proc-fns.h | 57 +++++++++++++++++++++++++++++++++--------
arch/arm/mach-sunxi/mc_smp.c | 7 +----
3 files changed, 50 insertions(+), 24 deletions(-)
diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c
index e013ff1168d3..3e19f246caff 100644
--- a/arch/arm/common/mcpm_entry.c
+++ b/arch/arm/common/mcpm_entry.c
@@ -234,13 +234,10 @@ int mcpm_cpu_power_up(unsigned int cpu, unsigned int cluster)
return ret;
}
-typedef typeof(cpu_reset) phys_reset_t;
-
void mcpm_cpu_power_down(void)
{
unsigned int mpidr, cpu, cluster;
bool cpu_going_down, last_man;
- phys_reset_t phys_reset;
mpidr = read_cpuid_mpidr();
cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
@@ -298,8 +295,7 @@ void mcpm_cpu_power_down(void)
* the kernel as if the power_up method just had deasserted reset
* on the CPU.
*/
- phys_reset = (phys_reset_t)(unsigned long)__pa_symbol(cpu_reset);
- phys_reset(__pa_symbol(mcpm_entry_point), false);
+ cpu_reset(__pa_symbol(mcpm_entry_point), false);
/* should never get here */
BUG();
@@ -376,7 +372,6 @@ static int __init nocache_trampoline(unsigned long _arg)
unsigned int mpidr = read_cpuid_mpidr();
unsigned int cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
unsigned int cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
- phys_reset_t phys_reset;
mcpm_set_entry_vector(cpu, cluster, cpu_resume_no_hyp);
setup_mm_for_reboot();
@@ -387,8 +382,7 @@ static int __init nocache_trampoline(unsigned long _arg)
__mcpm_outbound_leave_critical(cluster, CLUSTER_DOWN);
__mcpm_cpu_down(cpu, cluster);
- phys_reset = (phys_reset_t)(unsigned long)__pa_symbol(cpu_reset);
- phys_reset(__pa_symbol(mcpm_entry_point), false);
+ cpu_reset(__pa_symbol(mcpm_entry_point), false);
BUG();
}
diff --git a/arch/arm/include/asm/proc-fns.h b/arch/arm/include/asm/proc-fns.h
index 280396483f5d..9bd6bf5f901a 100644
--- a/arch/arm/include/asm/proc-fns.h
+++ b/arch/arm/include/asm/proc-fns.h
@@ -131,18 +131,55 @@ static inline void init_proc_vtable(const struct processor *p)
}
#endif
-#define cpu_proc_init PROC_VTABLE(_proc_init)
-#define cpu_check_bugs PROC_VTABLE(check_bugs)
-#define cpu_proc_fin PROC_VTABLE(_proc_fin)
-#define cpu_reset PROC_VTABLE(reset)
-#define cpu_do_idle PROC_VTABLE(_do_idle)
-#define cpu_dcache_clean_area PROC_TABLE(dcache_clean_area)
-#define cpu_set_pte_ext PROC_TABLE(set_pte_ext)
-#define cpu_do_switch_mm PROC_VTABLE(switch_mm)
+static inline void __nocfi cpu_proc_init(void)
+{
+ PROC_VTABLE(_proc_init)();
+}
+static inline void __nocfi cpu_check_bugs(void)
+{
+ PROC_VTABLE(check_bugs)();
+}
+static inline void __nocfi cpu_proc_fin(void)
+{
+ PROC_VTABLE(_proc_fin)();
+}
+static inline void __nocfi cpu_reset(unsigned long addr, bool hvc)
+{
+ PROC_VTABLE(reset)(addr, hvc);
+}
+static inline int __nocfi cpu_do_idle(void)
+{
+ return PROC_VTABLE(_do_idle)();
+}
+static inline void __nocfi cpu_dcache_clean_area(void *addr, int size)
+{
+ PROC_TABLE(dcache_clean_area)(addr, size);
+}
+#ifdef CONFIG_ARM_LPAE
+static inline void __nocfi cpu_set_pte_ext(pte_t *ptep, pte_t pte)
+{
+ PROC_TABLE(set_pte_ext)(ptep, pte);
+}
+#else
+static inline void __nocfi cpu_set_pte_ext(pte_t *ptep, pte_t pte, unsigned int ext)
+{
+ PROC_TABLE(set_pte_ext)(ptep, pte, ext);
+}
+#endif
+static inline void __nocfi cpu_do_switch_mm(phys_addr_t pgd_phys, struct mm_struct *mm)
+{
+ PROC_VTABLE(switch_mm)(pgd_phys, mm);
+}
/* These two are private to arch/arm/kernel/suspend.c */
-#define cpu_do_suspend PROC_VTABLE(do_suspend)
-#define cpu_do_resume PROC_VTABLE(do_resume)
+static inline void __nocfi cpu_do_suspend(void *p)
+{
+ PROC_VTABLE(do_suspend)(p);
+}
+static inline void __nocfi cpu_do_resume(void *p)
+{
+ PROC_VTABLE(do_resume)(p);
+}
#endif
extern void cpu_resume(void);
diff --git a/arch/arm/mach-sunxi/mc_smp.c b/arch/arm/mach-sunxi/mc_smp.c
index 277f6aa8e6c2..791eabb7d433 100644
--- a/arch/arm/mach-sunxi/mc_smp.c
+++ b/arch/arm/mach-sunxi/mc_smp.c
@@ -646,17 +646,12 @@ static bool __init sunxi_mc_smp_cpu_table_init(void)
*
* We need the trampoline code to enable CCI-400 on the first cluster
*/
-typedef typeof(cpu_reset) phys_reset_t;
-
static int __init nocache_trampoline(unsigned long __unused)
{
- phys_reset_t phys_reset;
-
setup_mm_for_reboot();
sunxi_cluster_cache_disable_without_axi();
- phys_reset = (phys_reset_t)(unsigned long)__pa_symbol(cpu_reset);
- phys_reset(__pa_symbol(sunxi_mc_smp_resume), false);
+ cpu_reset(__pa_symbol(sunxi_mc_smp_resume), false);
BUG();
}
--
2.34.1
* [PATCH v2 5/9] ARM: delay: Turn delay functions into static inlines
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
` (3 preceding siblings ...)
2024-03-07 14:22 ` [PATCH v2 4/9] ARM: proc: Use inlines instead of defines Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 14:22 ` [PATCH v2 6/9] ARM: turn CPU cache flush " Linus Walleij
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
The members of the vector table arm_delay_ops are called
directly using defines, but this is really confusing for
KCFI. Wrap the calls in static inlines and tag them with
__nocfi so things start to work.
Without this patch, platforms without a delay timer will
not boot (they get stuck in the calibration loop, for example).
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/include/asm/delay.h | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 1d069e558d8d..7d611b810b6c 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -55,7 +55,10 @@ extern struct arm_delay_ops {
unsigned long ticks_per_jiffy;
} arm_delay_ops;
-#define __delay(n) arm_delay_ops.delay(n)
+static inline void __nocfi __delay(unsigned long n)
+{
+ arm_delay_ops.delay(n);
+}
/*
* This function intentionally does not exist; if you see references to
@@ -76,8 +79,15 @@ extern void __bad_udelay(void);
* first constant multiplications gets optimized away if the delay is
* a constant)
*/
-#define __udelay(n) arm_delay_ops.udelay(n)
-#define __const_udelay(n) arm_delay_ops.const_udelay(n)
+static inline void __nocfi __udelay(unsigned long n)
+{
+ arm_delay_ops.udelay(n);
+}
+
+static inline void __nocfi __const_udelay(unsigned long n)
+{
+ arm_delay_ops.const_udelay(n);
+}
#define udelay(n) \
(__builtin_constant_p(n) ? \
--
2.34.1
* [PATCH v2 6/9] ARM: turn CPU cache flush functions into static inlines
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
` (4 preceding siblings ...)
2024-03-07 14:22 ` [PATCH v2 5/9] ARM: delay: Turn delay functions into static inlines Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 14:22 ` [PATCH v2 7/9] ARM: page: Turn highpage accesses " Linus Walleij
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
The members of the vector table struct cpu_cache_fns cpu_cache
are called directly using defines, but this is really confusing
for KCFI. Wrap the calls in static inlines and tag them with
__nocfi so things start to work.
A similar approach is used for the __glue() helpers, which
resolve by define to an assembly ENTRY(symbol) for the respective
CPU variant. We wrap these into static inlines and prefix them
with __nocfi as well. (This happens on !MULTI_CACHE systems.)
For this case we also need to invoke the __glue() macro to
provide a proper function prototype for the inner function.
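A minimal userspace sketch of the glue pattern, for illustration
only. DEMO_GLUE(), DEMO_CACHE and the v7 prefix are hypothetical
stand-ins for __glue(), _CACHE and the configured CPU variant, and
the C body of v7_dma_map_area() stands in for what would be an
assembly ENTRY() in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Empty stand-in for the kernel attribute (assumption for this sketch). */
#define __nocfi

/* Two-level paste so that DEMO_CACHE is expanded before ## is applied. */
#define DEMO_PASTE(name, fn) name##fn
#define DEMO_GLUE(name, fn) DEMO_PASTE(name, fn)

/* Hypothetical build-time selection of the single CPU variant. */
#define DEMO_CACHE v7

static int demo_map_calls;

/* Expands to: void v7_dma_map_area(const void *, size_t, int); */
void DEMO_GLUE(DEMO_CACHE, _dma_map_area)(const void *start, size_t sz, int dir);

/* Stand-in body for the per-CPU routine. */
void v7_dma_map_area(const void *start, size_t sz, int dir)
{
	(void)start;
	(void)sz;
	(void)dir;
	demo_map_calls++;
}

/* The wrapper gives callers one stable, annotated name. */
static inline void __nocfi demo_dmac_map_area(const void *start, size_t sz,
					      int dir)
{
	DEMO_GLUE(DEMO_CACHE, _dma_map_area)(start, sz, dir);
}

/* One call through the wrapper reaches the pasted symbol. */
static int demo_glue_example(void)
{
	demo_map_calls = 0;
	demo_dmac_map_area(NULL, 16, 0);
	assert(demo_map_calls == 1);
	return demo_map_calls;
}
```

The explicit declaration through the glue macro is what gives the
inner symbol a prototype; the old define only renamed it.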
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/include/asm/cacheflush.h | 45 +++++++++++++++++++++++++++++++--------
arch/arm/mm/dma.h | 28 ++++++++++++++++++------
2 files changed, 58 insertions(+), 15 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 1075534b0a2e..76fb665162a4 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -122,14 +122,38 @@ struct cpu_cache_fns {
extern struct cpu_cache_fns cpu_cache;
-#define __cpuc_flush_icache_all cpu_cache.flush_icache_all
-#define __cpuc_flush_kern_all cpu_cache.flush_kern_all
-#define __cpuc_flush_kern_louis cpu_cache.flush_kern_louis
-#define __cpuc_flush_user_all cpu_cache.flush_user_all
-#define __cpuc_flush_user_range cpu_cache.flush_user_range
-#define __cpuc_coherent_kern_range cpu_cache.coherent_kern_range
-#define __cpuc_coherent_user_range cpu_cache.coherent_user_range
-#define __cpuc_flush_dcache_area cpu_cache.flush_kern_dcache_area
+static inline void __nocfi __cpuc_flush_icache_all(void)
+{
+ cpu_cache.flush_icache_all();
+}
+static inline void __nocfi __cpuc_flush_kern_all(void)
+{
+ cpu_cache.flush_kern_all();
+}
+static inline void __nocfi __cpuc_flush_kern_louis(void)
+{
+ cpu_cache.flush_kern_louis();
+}
+static inline void __nocfi __cpuc_flush_user_all(void)
+{
+ cpu_cache.flush_user_all();
+}
+static inline void __nocfi __cpuc_flush_user_range(unsigned long start, unsigned long end, unsigned int flags)
+{
+ cpu_cache.flush_user_range(start, end, flags);
+}
+static inline void __nocfi __cpuc_coherent_kern_range(unsigned long start, unsigned long end)
+{
+ cpu_cache.coherent_kern_range(start, end);
+}
+static inline int __nocfi __cpuc_coherent_user_range(unsigned long start, unsigned long end)
+{
+ return cpu_cache.coherent_user_range(start, end);
+}
+static inline void __nocfi __cpuc_flush_dcache_area(void *kaddr, size_t sz)
+{
+ cpu_cache.flush_kern_dcache_area(kaddr, sz);
+}
/*
* These are private to the dma-mapping API. Do not use directly.
@@ -137,7 +161,10 @@ extern struct cpu_cache_fns cpu_cache;
* is visible to DMA, or data written by DMA to system memory is
* visible to the CPU.
*/
-#define dmac_flush_range cpu_cache.dma_flush_range
+static inline void __nocfi dmac_flush_range(const void *start, const void *end)
+{
+ cpu_cache.dma_flush_range(start, end);
+}
#else
diff --git a/arch/arm/mm/dma.h b/arch/arm/mm/dma.h
index aaef64b7f177..251b8a9fffc1 100644
--- a/arch/arm/mm/dma.h
+++ b/arch/arm/mm/dma.h
@@ -5,8 +5,6 @@
#include <asm/glue-cache.h>
#ifndef MULTI_CACHE
-#define dmac_map_area __glue(_CACHE,_dma_map_area)
-#define dmac_unmap_area __glue(_CACHE,_dma_unmap_area)
/*
* These are private to the dma-mapping API. Do not use directly.
@@ -14,8 +12,20 @@
* is visible to DMA, or data written by DMA to system memory is
* visible to the CPU.
*/
-extern void dmac_map_area(const void *, size_t, int);
-extern void dmac_unmap_area(const void *, size_t, int);
+
+/* These turn into function declarations for each per-CPU glue function */
+void __glue(_CACHE,_dma_map_area)(const void *, size_t, int);
+void __glue(_CACHE,_dma_unmap_area)(const void *, size_t, int);
+
+static inline void __nocfi dmac_map_area(const void *start, size_t sz, int flags)
+{
+ __glue(_CACHE,_dma_map_area)(start, sz, flags);
+}
+
+static inline void __nocfi dmac_unmap_area(const void *start, size_t sz, int flags)
+{
+ __glue(_CACHE,_dma_unmap_area)(start, sz, flags);
+}
#else
@@ -25,8 +35,14 @@ extern void dmac_unmap_area(const void *, size_t, int);
* is visible to DMA, or data written by DMA to system memory is
* visible to the CPU.
*/
-#define dmac_map_area cpu_cache.dma_map_area
-#define dmac_unmap_area cpu_cache.dma_unmap_area
+static inline void __nocfi dmac_map_area(const void *start, size_t sz, int flags)
+{
+ cpu_cache.dma_map_area(start, sz, flags);
+}
+static inline void __nocfi dmac_unmap_area(const void *start, size_t sz, int flags)
+{
+ cpu_cache.dma_unmap_area(start, sz, flags);
+}
#endif
--
2.34.1
* [PATCH v2 7/9] ARM: page: Turn highpage accesses into static inlines
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
` (5 preceding siblings ...)
2024-03-07 14:22 ` [PATCH v2 6/9] ARM: turn CPU cache flush " Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 14:22 ` [PATCH v2 8/9] ARM: ftrace: Define ftrace_stub_graph Linus Walleij
2024-03-07 14:22 ` [PATCH v2 9/9] ARM: KCFI: Allow permissive CFI mode Linus Walleij
8 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
Clearing and copying pages in highmem uses either the cpu_user
vtable or the __glue() assembler stubs to call into per-CPU
versions of these functions.
This is all really confusing for KCFI so wrap these into static
inlines and prefix each inline function with __nocfi.
__cpu_clear_user_highpage() and __cpu_copy_user_highpage() are
exported in arch/arm/mm/proc-syms.c which causes a problem with
using static inlines, but it turns out that these exports are
completely unused, so we can just delete them.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/include/asm/page.h | 36 +++++++++++++++++++++++++++++-------
arch/arm/mm/proc-syms.c | 7 +------
2 files changed, 30 insertions(+), 13 deletions(-)
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 119aa85d1feb..8bf297228627 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -138,17 +138,39 @@ void xscale_mc_clear_user_highpage(struct page *page, unsigned long vaddr);
#ifdef MULTI_USER
extern struct cpu_user_fns cpu_user;
-#define __cpu_clear_user_highpage cpu_user.cpu_clear_user_highpage
-#define __cpu_copy_user_highpage cpu_user.cpu_copy_user_highpage
+static inline void __nocfi __cpu_clear_user_highpage(struct page *page,
+ unsigned long vaddr)
+{
+ cpu_user.cpu_clear_user_highpage(page, vaddr);
+}
+
+static inline void __nocfi __cpu_copy_user_highpage(struct page *to,
+ struct page *from, unsigned long vaddr,
+ struct vm_area_struct *vma)
+{
+ cpu_user.cpu_copy_user_highpage(to, from, vaddr, vma);
+}
#else
-#define __cpu_clear_user_highpage __glue(_USER,_clear_user_highpage)
-#define __cpu_copy_user_highpage __glue(_USER,_copy_user_highpage)
+/* These turn into function declarations for each per-CPU glue function */
+void __glue(_USER,_clear_user_highpage)(struct page *page, unsigned long vaddr);
+void __glue(_USER,_copy_user_highpage)(struct page *to, struct page *from,
+ unsigned long vaddr, struct vm_area_struct *vma);
+
+static inline void __nocfi __cpu_clear_user_highpage(struct page *page,
+ unsigned long vaddr)
+{
+ __glue(_USER,_clear_user_highpage)(page, vaddr);
+}
+
+static inline void __nocfi __cpu_copy_user_highpage(struct page *to,
+ struct page *from, unsigned long vaddr,
+ struct vm_area_struct *vma)
+{
+ __glue(_USER,_copy_user_highpage)(to, from, vaddr, vma);
+}
-extern void __cpu_clear_user_highpage(struct page *page, unsigned long vaddr);
-extern void __cpu_copy_user_highpage(struct page *to, struct page *from,
- unsigned long vaddr, struct vm_area_struct *vma);
#endif
#define clear_user_highpage(page,vaddr) \
diff --git a/arch/arm/mm/proc-syms.c b/arch/arm/mm/proc-syms.c
index e21249548e9f..c93fec38d9f4 100644
--- a/arch/arm/mm/proc-syms.c
+++ b/arch/arm/mm/proc-syms.c
@@ -31,14 +31,9 @@ EXPORT_SYMBOL(__cpuc_flush_dcache_area);
EXPORT_SYMBOL(cpu_cache);
#endif
-#ifdef CONFIG_MMU
-#ifndef MULTI_USER
-EXPORT_SYMBOL(__cpu_clear_user_highpage);
-EXPORT_SYMBOL(__cpu_copy_user_highpage);
-#else
+#if defined(CONFIG_MMU) && defined(MULTI_USER)
EXPORT_SYMBOL(cpu_user);
#endif
-#endif
/*
* No module should need to touch the TLB (and currently
--
2.34.1
* [PATCH v2 8/9] ARM: ftrace: Define ftrace_stub_graph
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
` (6 preceding siblings ...)
2024-03-07 14:22 ` [PATCH v2 7/9] ARM: page: Turn highpage accesses " Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 14:22 ` [PATCH v2 9/9] ARM: KCFI: Allow permissive CFI mode Linus Walleij
8 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
Several architectures define this stub for the graph tracer,
and it is needed for CFI, as it needs a separate symbol for it.
The trick from include/asm-generic/vmlinux.lds.h to define
ftrace_stub_graph to ftrace_stub isn't working when using CFI.
Commit 883bbbffa5a4 contains the details.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/kernel/entry-ftrace.S | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index 3e7bcaca5e07..bc598e3d8dd2 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -271,6 +271,10 @@ ENTRY(ftrace_stub)
ret lr
ENDPROC(ftrace_stub)
+ENTRY(ftrace_stub_graph)
+ ret lr
+ENDPROC(ftrace_stub_graph)
+
#ifdef CONFIG_DYNAMIC_FTRACE
__INIT
--
2.34.1
* [PATCH v2 9/9] ARM: KCFI: Allow permissive CFI mode
2024-03-07 14:21 [PATCH v2 0/9] CFI for ARM32 using LLVM Linus Walleij
` (7 preceding siblings ...)
2024-03-07 14:22 ` [PATCH v2 8/9] ARM: ftrace: Define ftrace_stub_graph Linus Walleij
@ 2024-03-07 14:22 ` Linus Walleij
2024-03-07 18:58 ` Kees Cook
8 siblings, 1 reply; 11+ messages in thread
From: Linus Walleij @ 2024-03-07 14:22 UTC (permalink / raw)
To: Russell King, Sami Tolvanen, Kees Cook, Nathan Chancellor,
Nick Desaulniers, Ard Biesheuvel, Arnd Bergmann
Cc: linux-arm-kernel, llvm, Linus Walleij
This registers a breakpoint handler for the new breakpoint type
(0x03) inserted by LLVM CLANG for CFI breakpoints.
If we are in permissive mode, just print a backtrace and continue.
Example with CONFIG_CFI_PERMISSIVE enabled:
root@Vexpress:/ echo CFI_FORWARD_PROTO > /sys/kernel/debug/provoke-crash/DIRECT
lkdtm: Performing direct entry CFI_FORWARD_PROTO
lkdtm: Calling matched prototype ...
lkdtm: Calling mismatched prototype ...
hw-breakpoint: Permissive CFI breakpoint
CPU: 0 PID: 114 Comm: sh Not tainted 6.8.0-rc1+ #111
Hardware name: ARM-Versatile Express
unwind_backtrace from show_stack+0x28/0x30
(...)
lkdtm: FAIL: survived mismatched prototype function call!
lkdtm: Unexpected! This kernel (6.8.0-rc1+ armv7l) was
built with CONFIG_CFI_CLANG=y
As you can see the LKDTM test fails, but I believe this is the
expected behaviour in permissive mode.
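The control flow of the handler can be modelled in userspace roughly
as follows. This is a sketch only: struct demo_regs, the enum and
demo_handle_cfi_trap() are hypothetical, and the fixed 4-byte step
assumes an ARM (not Thumb) instruction at the trap site.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, minimal stand-in for struct pt_regs. */
struct demo_regs {
	unsigned long pc;
};

/* What the caller should do after the trap has been examined. */
enum demo_cfi_action {
	DEMO_CFI_CONTINUE,	/* permissive: report and resume */
	DEMO_CFI_DIE,		/* enforcing: treat as a fatal oops */
};

/* Assumption: the breaking instruction is a 4-byte ARM instruction. */
#define DEMO_ARM_INSN_SIZE 4

static enum demo_cfi_action demo_handle_cfi_trap(struct demo_regs *regs,
						 bool permissive)
{
	if (permissive) {
		/* Step past the breakpoint so execution can resume. */
		regs->pc += DEMO_ARM_INSN_SIZE;
		return DEMO_CFI_CONTINUE;
	}
	/* Enforcing mode leaves the registers untouched for the oops. */
	return DEMO_CFI_DIE;
}

/* Returns the pc after one permissive and one enforcing trap. */
static unsigned long demo_cfi_example(void)
{
	struct demo_regs regs = { .pc = 0x8000 };

	assert(demo_handle_cfi_trap(&regs, true) == DEMO_CFI_CONTINUE);
	assert(demo_handle_cfi_trap(&regs, false) == DEMO_CFI_DIE);
	return regs.pc;
}
```

In permissive mode the program counter advances past the trapping
instruction, which is why the mismatched call in the log above goes
on to run and LKDTM reports that it survived.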
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
arch/arm/include/asm/hw_breakpoint.h | 1 +
arch/arm/kernel/hw_breakpoint.c | 10 ++++++++++
2 files changed, 11 insertions(+)
diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h
index 62358d3ca0a8..e7f9961c53b2 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -84,6 +84,7 @@ static inline void decode_ctrl_reg(u32 reg,
#define ARM_DSCR_MOE(x) ((x >> 2) & 0xf)
#define ARM_ENTRY_BREAKPOINT 0x1
#define ARM_ENTRY_ASYNC_WATCHPOINT 0x2
+#define ARM_ENTRY_CFI_BREAKPOINT 0x3
#define ARM_ENTRY_SYNC_WATCHPOINT 0xa
/* DSCR monitor/halting bits. */
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index dc0fb7a81371..256146684813 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -932,6 +932,16 @@ static int hw_breakpoint_pending(unsigned long addr, unsigned int fsr,
case ARM_ENTRY_SYNC_WATCHPOINT:
watchpoint_handler(addr, fsr, regs);
break;
+ case ARM_ENTRY_CFI_BREAKPOINT:
+ if (IS_ENABLED(CONFIG_CFI_PERMISSIVE)) {
+ pr_err("Permissive CFI breakpoint\n");
+ dump_stack();
+ /* Skip the breaking instruction */
+ instruction_pointer(regs) += 4;
+ } else {
+ die("Oops - CFI", regs, 0);
+ }
+ break;
default:
ret = 1; /* Unhandled fault. */
}
--
2.34.1
* Re: [PATCH v2 9/9] ARM: KCFI: Allow permissive CFI mode
2024-03-07 14:22 ` [PATCH v2 9/9] ARM: KCFI: Allow permissive CFI mode Linus Walleij
@ 2024-03-07 18:58 ` Kees Cook
0 siblings, 0 replies; 11+ messages in thread
From: Kees Cook @ 2024-03-07 18:58 UTC (permalink / raw)
To: Linus Walleij
Cc: Russell King, Sami Tolvanen, Nathan Chancellor, Nick Desaulniers,
Ard Biesheuvel, Arnd Bergmann, linux-arm-kernel, llvm
On Thu, Mar 07, 2024 at 03:22:08PM +0100, Linus Walleij wrote:
> This registers a breakpoint handler for the new breakpoint type
> (0x03) inserted by LLVM CLANG for CFI breakpoints.
>
> If we are in permissive mode, just print a backtrace and continue.
>
> Example with CONFIG_CFI_PERMISSIVE enabled:
>
> root@Vexpress:/ echo CFI_FORWARD_PROTO > /sys/kernel/debug/provoke-crash/DIRECT
> lkdtm: Performing direct entry CFI_FORWARD_PROTO
> lkdtm: Calling matched prototype ...
> lkdtm: Calling mismatched prototype ...
> hw-breakpoint: Permissive CFI breakpoint
> CPU: 0 PID: 114 Comm: sh Not tainted 6.8.0-rc1+ #111
> Hardware name: ARM-Versatile Express
> unwind_backtrace from show_stack+0x28/0x30
> (...)
> lkdtm: FAIL: survived mismatched prototype function call!
> lkdtm: Unexpected! This kernel (6.8.0-rc1+ armv7l) was
> built with CONFIG_CFI_CLANG=y
>
> As you can see the LKDTM test fails, but I expect that this would be
> expected behaviour in the permissive mode.
>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> arch/arm/include/asm/hw_breakpoint.h | 1 +
> arch/arm/kernel/hw_breakpoint.c | 10 ++++++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h
> index 62358d3ca0a8..e7f9961c53b2 100644
> --- a/arch/arm/include/asm/hw_breakpoint.h
> +++ b/arch/arm/include/asm/hw_breakpoint.h
> @@ -84,6 +84,7 @@ static inline void decode_ctrl_reg(u32 reg,
> #define ARM_DSCR_MOE(x) ((x >> 2) & 0xf)
> #define ARM_ENTRY_BREAKPOINT 0x1
> #define ARM_ENTRY_ASYNC_WATCHPOINT 0x2
> +#define ARM_ENTRY_CFI_BREAKPOINT 0x3
> #define ARM_ENTRY_SYNC_WATCHPOINT 0xa
>
> /* DSCR monitor/halting bits. */
> diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
> index dc0fb7a81371..256146684813 100644
> --- a/arch/arm/kernel/hw_breakpoint.c
> +++ b/arch/arm/kernel/hw_breakpoint.c
> @@ -932,6 +932,16 @@ static int hw_breakpoint_pending(unsigned long addr, unsigned int fsr,
> case ARM_ENTRY_SYNC_WATCHPOINT:
> watchpoint_handler(addr, fsr, regs);
> break;
> + case ARM_ENTRY_CFI_BREAKPOINT:
> + if (IS_ENABLED(CONFIG_CFI_PERMISSIVE)) {
> + pr_err("Permissive CFI breakpoint\n");
> + dump_stack();
> + /* Skip the breaking instruction */
Instead of open-coding this, can you make a call to report_cfi_failure()
instead? This will keep the failure output the same across
architectures. I think it would look something like:
if (report_cfi_failure(regs, addr, ...) == BUG_TRAP_TYPE_WARN)
instruction_pointer(regs) += 4;
else
die("Oops - CFI", regs, 0);
-Kees
--
Kees Cook